The definitions of NaNs, signed zeros and infinities, and denormalized (subnormal) numbers from [3] are reproduced here for convenience. The definitions for quadruple-precision floating point numbers are analogous to those for single- and double-precision floating point numbers, and are also given in [3].
In the following, 'S' stands for the sign bit, 'E' for the exponent, and 'F' for the fractional part. The symbol 'u' stands for an undefined bit (0 or 1).
For single-precision floating point numbers:
Type                  S (1 bit)   E (8 bits)    F (23 bits)
----                  ---------   ----------    -----------
signalling NaN        u           255 (max)     .0uuuuu---u
                                                (with at least
                                                 one 1 bit)
quiet NaN             u           255 (max)     .1uuuuu---u
negative infinity     1           255 (max)     .000000---0
positive infinity     0           255 (max)     .000000---0
negative zero         1           0             .000000---0
positive zero         0           0             .000000---0
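As an illustration of the single-precision table, the following sketch splits a 32-bit encoding into S, E, and F and names the special value it represents. The function name `classify_single` and the big-endian byte order are assumptions made for this example; they are not part of the definitions above.

```python
import struct


def classify_single(raw: bytes) -> str:
    """Classify a 4-byte single-precision encoding (assumed big-endian)."""
    (bits,) = struct.unpack(">I", raw)
    sign = bits >> 31                 # S: 1 bit
    exponent = (bits >> 23) & 0xFF    # E: 8 bits
    fraction = bits & 0x7FFFFF        # F: 23 bits

    if exponent == 255:               # maximum exponent: infinity or NaN
        if fraction == 0:
            return "negative infinity" if sign else "positive infinity"
        # The most significant fraction bit separates quiet from signalling.
        return "quiet NaN" if fraction >> 22 else "signalling NaN"
    if exponent == 0 and fraction == 0:
        return "negative zero" if sign else "positive zero"
    return "other"                    # normalized or subnormal number


if __name__ == "__main__":
    for hex_bytes in ("7f800000", "ff800000", "7fc00000", "7f800001", "80000000"):
        print(hex_bytes, classify_single(bytes.fromhex(hex_bytes)))
```

The same field extraction applies to the wider formats once the shift and mask widths are changed to match their E and F sizes.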
For double-precision floating point numbers:
Type                  S (1 bit)   E (11 bits)   F (52 bits)
----                  ---------   -----------   -----------
signalling NaN        u           2047 (max)    .0uuuuu---u
                                                (with at least
                                                 one 1 bit)
quiet NaN             u           2047 (max)    .1uuuuu---u
negative infinity     1           2047 (max)    .000000---0
positive infinity     0           2047 (max)    .000000---0
negative zero         1           0             .000000---0
positive zero         0           0             .000000---0
For quadruple-precision floating point numbers:
Type                  S (1 bit)   E (15 bits)   F (112 bits)
----                  ---------   -----------   ------------
signalling NaN        u           32767 (max)   .0uuuuu---u
                                                (with at least
                                                 one 1 bit)
quiet NaN             u           32767 (max)   .1uuuuu---u
negative infinity     1           32767 (max)   .000000---0
positive infinity     0           32767 (max)   .000000---0
negative zero         1           0             .000000---0
positive zero         0           0             .000000---0
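Most languages have no native quadruple-precision type, so the special values in the table may have to be assembled from the fields directly. The sketch below does this with plain integers. The helper `pack_fields`, its parameter names, and the big-endian output order are hypothetical choices made for this example.

```python
def pack_fields(sign: int, exponent: int, fraction: int,
                exp_bits: int = 15, frac_bits: int = 112) -> bytes:
    """Assemble S, E, and F into an encoding (assumed big-endian).

    The defaults describe the quadruple-precision layout; passing
    (8, 23) or (11, 52) gives the single or double layout.
    """
    if sign not in (0, 1):
        raise ValueError("sign must be 0 or 1")
    if not 0 <= exponent < (1 << exp_bits):
        raise ValueError("exponent does not fit in the E field")
    if not 0 <= fraction < (1 << frac_bits):
        raise ValueError("fraction does not fit in the F field")
    bits = (sign << (exp_bits + frac_bits)) | (exponent << frac_bits) | fraction
    return bits.to_bytes((1 + exp_bits + frac_bits) // 8, "big")


QUAD_MAX_EXP = (1 << 15) - 1  # 32767

positive_infinity = pack_fields(0, QUAD_MAX_EXP, 0)
negative_zero = pack_fields(1, 0, 0)
# Quiet NaN: the leading fraction bit is 1; the remaining bits are arbitrary.
quiet_nan = pack_fields(0, QUAD_MAX_EXP, 1 << 111)
# Signalling NaN: the leading fraction bit is 0 and at least one other bit is 1.
signalling_nan = pack_fields(0, QUAD_MAX_EXP, 1)
```

Because the 'u' bits are unspecified, the two NaN encodings shown are only one valid choice among many.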
Subnormal numbers, which have a zero exponent field and a nonzero fractional part, are represented as follows:
Precision            Exponent       Value
---------            --------       -----
Single               0              (-1)**S * 2**(-126) * 0.F
Double               0              (-1)**S * 2**(-1022) * 0.F
Quadruple            0              (-1)**S * 2**(-16382) * 0.F
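The value formulas in the table can be evaluated exactly with rational arithmetic, reading 0.F as the integer F divided by 2 raised to the width of the fraction field. The following is a minimal sketch; the names `LAYOUT` and `subnormal_value` are hypothetical, and the cross-checks at the end assume big-endian encodings.

```python
import struct
from fractions import Fraction

# Per precision: (exponent used for subnormals, width of F in bits).
LAYOUT = {
    "single": (-126, 23),
    "double": (-1022, 52),
    "quadruple": (-16382, 112),
}


def subnormal_value(sign: int, fraction: int, precision: str) -> Fraction:
    """Evaluate (-1)**S * 2**(min_exp) * 0.F exactly for a zero exponent field."""
    min_exp, frac_bits = LAYOUT[precision]
    if not 0 < fraction < (1 << frac_bits):
        raise ValueError("a subnormal needs a nonzero fraction that fits in F")
    zero_point_f = Fraction(fraction, 1 << frac_bits)  # 0.F as a rational
    return (-1) ** sign * Fraction(2) ** min_exp * zero_point_f


# Cross-check against the platform's own decoding of the smallest subnormals.
smallest_single = struct.unpack(">f", bytes.fromhex("00000001"))[0]
smallest_double = struct.unpack(">d", bytes.fromhex("0000000000000001"))[0]
assert subnormal_value(0, 1, "single") == Fraction(smallest_single)
assert subnormal_value(0, 1, "double") == Fraction(smallest_double)
```

Exact rationals are used here because the quadruple-precision values fall far below the range of a native double and would otherwise underflow to zero.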