Why does the IEEE 754 standard use a bias of 127?

When working with the excess (biased) representation of integers, I would pick a bias of 2^(n-1). However, the IEEE 754 standard uses 2^(n-1) - 1 instead.

The only advantage I can think of is a larger positive range. Are there any other reasons why this decision was made?

1 answer

The reason is both the Infinities/NaNs and gradual underflow.

If you use the exponent to represent both integer (n >= 0) and fractional (n < 0) values 2^n, you have the problem that one exponent is needed for 2^0 = 1. The remaining range is then odd, so you can grant the larger half either to the fractions or to the integers. With single precision we have 256 exponent values, 255 once the exponent for 2^0 is taken. Now IEEE 754 reserves the highest exponent field (255) for special values: ±Infinity and NaNs (Not a Number), which indicate failures. That brings us back to an even count (254, split equally between the integer and fractional sides), but with the smaller of the two possible biases.
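
As a sketch of how that bias shows up in practice, here is a minimal Python example (standard library only; the helper name decode is my own) that pulls apart the raw bit fields of a single-precision float:

    import struct

    def decode(x):
        # Reinterpret the 32-bit single-precision pattern as an unsigned int.
        bits = struct.unpack(">I", struct.pack(">f", x))[0]
        sign     = bits >> 31            # 1 sign bit
        exponent = (bits >> 23) & 0xFF   # 8-bit stored (biased) exponent, 0..255
        mantissa = bits & 0x7FFFFF       # 23 explicit mantissa bits
        return sign, exponent, mantissa

    print(decode(1.0))           # (0, 127, 0): true exponent 0 + bias 127
    print(decode(2.0))           # (0, 128, 0): true exponent 1 + bias 127
    print(decode(float("inf")))  # (0, 255, 0): field 255 is reserved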

The second reason is gradual underflow. The standard declares that normally all numbers are normalized, meaning the exponent gives the position of the leading bit. To gain an extra bit of precision, that leading bit is not stored but implied (the hidden bit): the first bit after the exponent field is already the second bit of the number, because the leading bit of a normalized binary number is always 1. If you enforce normalization, you face the problem that you cannot encode zero, and even if you encode zero as a special value, numerical accuracy is hampered. ±Infinity (the highest exponent) makes it obvious that something is wrong, but underflow to zero for very small numbers is perfectly "normal" and therefore easy to overlook as a potential problem. So Kahan, the designer of the standard, decided to introduce denormalized numbers, or subnormals, and they should include 1/MAX_FLOAT.
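
To check that last claim, here is a small Python sketch, assuming the same design carries over to double precision (sys.float_info describes the platform's IEEE 754 doubles): the reciprocal of the largest finite value lands in the subnormal range rather than underflowing to zero.

    import sys

    # sys.float_info describes the platform's IEEE 754 doubles.
    tiny = 1.0 / sys.float_info.max   # reciprocal of the largest finite double

    print(tiny)                       # ~5.56e-309: a subnormal, not zero
    print(tiny < sys.float_info.min)  # True: below the smallest *normal* double
    print(tiny > 0.0)                 # True: gradual underflow preserved it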

EDIT: Allan asked why "numerical accuracy is hampered" if you encode zero as a special value. I should phrase it better as "numerical accuracy is still hampered". In fact, this was the implementation of the historic DEC VAX floating-point format: if the exponent field in the raw bit encoding was 0, the value was considered zero. As an example I use the 32-bit format, still rampant in GPUs.

 X 00000000 XXXXXXXXXXXXXXXXXXXXXXX

In this case the contents of the mantissa field on the right could be ignored completely and were usually filled with zeros. The sign field on the left could still be valid, distinguishing a normal zero from a "negative zero" (you can get a negative zero from, e.g., -1.0 * 0.0, or by rounding a tiny negative result toward zero).
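
A quick illustration of negative zero in present-day IEEE 754 terms (a minimal Python sketch, standard library only):

    import math

    neg_zero = -1.0 * 0.0                 # one way to produce a negative zero

    print(neg_zero)                       # -0.0
    print(neg_zero == 0.0)                # True: compares equal to +0.0
    print(math.copysign(1.0, neg_zero))   # -1.0: the sign bit survives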

Gradual underflow and the subnormal values of IEEE 754, in contrast, do use the mantissa field. Only

 X 00000000 00000000000000000000000 

equals zero. All other bit combinations are valid and, even more practically, they forewarn you that your results are dwindling toward underflow.
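
A small Python sketch (standard library only) decoding one such non-zero pattern, the smallest positive single-precision subnormal:

    import struct

    # Exponent field 0, mantissa 000...001: the smallest positive
    # single-precision subnormal. Reinterpret the raw bits as a float.
    bits = 0x00000001
    value = struct.unpack(">f", struct.pack(">I", bits))[0]

    print(value)         # ~1.4e-45: nonzero although the exponent field is 0
    print(value > 0.0)   # True: only the all-zero pattern encodes zero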

So what is the point? Consider these two numbers:

 A 0 00000001 10010101111001111111111
 B 0 00000001 10010101111100001010000

These are valid floating-point numbers, very small but still finite. However, as you can see, their first 11 mantissa bits are identical. If you now compute A - B or B - A, the first significant bit of the result falls below the lower end of the exponent range, so without gradual underflow the result would be... 0. So A != B, but A - B = 0. Ouch. Countless people have fallen into this trap, and it is safe to assume that most of them never knew it. The same goes for multiplication and division: there you add or subtract the exponents, and if the result falls below the lower threshold: 0. And as you know: 0 * anything = 0. You can have a product S*T*X*Y*Z, and as soon as one partial product is 0, the result is 0, even if a perfectly valid and even huge number would be the correct result. It must be said that these anomalies can never be avoided completely, due to rounding, but with gradual underflow they became rare. Very rare.
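
The same trap, demonstrated in double precision with a minimal Python sketch (standard library only; the factors 1.25 and 1.75 are arbitrary values chosen to sit just above the smallest normal double):

    import sys

    # Two distinct doubles just above the smallest normal value.
    a = 1.25 * sys.float_info.min
    b = 1.75 * sys.float_info.min

    diff = b - a          # falls below the normal exponent range

    print(a != b)         # True
    print(diff)           # ~1.11e-308: a nonzero subnormal
    print(diff != 0.0)    # True: with flush-to-zero this would be False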


Source: https://habr.com/ru/post/906298/

