Why doesn't numeric_limits<float>::min() actually give the smallest possible float?

It seems we can trivially produce floats smaller than numeric_limits<float>::min(). Why is that? If numeric_limits<float>::min() is not supposed to be the smallest positive float, what is it supposed to be?

    #include <iostream>
    #include <limits>
    using namespace std;

    int main() {
        float mind = numeric_limits<float>::min();
        float smaller_than_mind = numeric_limits<float>::min() / 2;
        cout << (mind > smaller_than_mind && smaller_than_mind > 0) << endl;
    }

Run it here: https://onlinegdb.com/ry3AcxjXz

+5
3 answers

For a floating-point type, min() returns the smallest positive value that has the full precision of the format: all bits of its significand are available for use.

Smaller positive values are called subnormal. Although they are representable, their high significand bits are necessarily zero.

The IEEE-754 64-bit binary floating-point format represents a number with a sign (+ or -, encoded as 0 or 1), an exponent (-1022 to +1023, encoded as 1 to 2046, plus 0 and 2047 as special cases), and a 53-bit significand (encoded with 52 bits plus a clue from the exponent field).

For normal values, the exponent field is 1 to 2046 (representing exponents of -1022 to +1023), and the significand (in binary) is 1.xxx...xxx, where xxx...xxx stands for 52 more bits. In all of these values, the lowest bit of the significand is worth 2^-52 times the most significant bit (the leading 1).

For subnormal values, the exponent field is 0. This still represents the exponent -1022, but it means the high bit of the significand is 0. The significand is now 0.xxx...xxx. As lower and lower values are used in this range, more leading bits of the significand become zero. Now the lowest bit of the significand is worth more than 2^-52 times the highest nonzero bit. You cannot adjust numbers as finely in this interval as in the normal interval, because not all bits of the significand are available for arbitrary values: some leading bits are fixed at 0 to set the scale.

Because of this, the relative errors that occur when working with numbers in this range tend to be larger than the relative errors in the normal range. The floating-point format has this subnormal range because, if it did not, the numbers would simply cut off at the smallest normal value, and the gap between that normal value and zero would be a huge relative jump: 100% of the value in a single step. With subnormal numbers included, relative errors increase more gradually, and absolute errors stay constant from this point down to zero.

It is important to know where the bottom of the normal range is. min() tells you this. denorm_min() tells you the smallest positive value of all.

+7

According to en.cppreference.com:

For floating point types with denormalization, min returns the minimum positive normalized value. Note that this behavior may be unexpected, especially when compared to the min behavior for integral types.

float is a type with denormalization. For background, look up normalized floating-point numbers.

+3

Because numeric_limits::min is documented this way: "For floating types with subnormal numbers, returns the minimum positive normalized value." You can divide that value by 2 and get a subnormal (also called denormal) number on some systems. These numbers do not keep the full precision of the float type, but they let you store values that would otherwise become 0.0.

+2

Source: https://habr.com/ru/post/1274489/

