What is the maximum number of base 10 digits in the fractional part of a floating-point number

If a floating-point number is printed so that there is no truncation of the value (for example, using setprecision ), and the number is displayed in fixed notation (for example, using fixed ), what buffer size is guaranteed to hold the entire fractional part of the floating-point number?

I am hoping there is something in the standard, like a #define , or something in numeric_limits , that will tell me the maximum number of base-10 digits in the fractional part of a floating-point type.

I asked about the maximum number of base-10 digits in the integral part of a floating-point type here: What is the maximum number of base 10 digits in the integral part of a floating point number

But I understand that this case may be more complicated. For example, 1.0 / 3.0 is an infinitely repeating decimal. When I output it using fixed formatting, I get this many places before the trailing 0s:

0.333333333333333314829616256247390992939472198486328125

But I can’t say that this is the maximum precision, because I don’t know how many of those digits were actually represented in the floating-point value, as opposed to being produced by a shift from a negative exponent.

Is min_exponent10 what I should be looking at for this?

+5
4 answers

For 32-bit and 64-bit IEEE 754 numbers, the answer can be calculated as described below.

It's all about negative powers of 2. So let's see how each exponent contributes:

 2^-1 = 0.5      ie 1 digit
 2^-2 = 0.25     ie 2 digits
 2^-3 = 0.125    ie 3 digits
 2^-4 = 0.0625   ie 4 digits
 ....
 2^-N = 0.0000.. ie N digits

Since these base-10 representations always end in 5, you can see that the number of base-10 digits increases by 1 each time the exponent decreases by 1. So 2^(-N) requires N digits.

Also note that when such contributions are added together, the number of resulting digits is determined by the smallest one. So what you need to know is the smallest exponent that can contribute.

For the 32-bit IEEE 754 you have:

The smallest exponent is -126

There are 23 fraction bits

Thus the smallest exponent is -126 + -23 = -149, so the smallest contribution comes from 2^-149, i.e.

For a 32-bit IEEE 754 number printed in base 10, there can be at most 149 fractional digits

For the 64-bit IEEE 754 you have:

The smallest exponent is -1022

There are 52 fraction bits

Thus the smallest exponent is -1022 + -52 = -1074, so the smallest contribution comes from 2^-1074, i.e.

For a 64-bit IEEE 754 number printed in base 10, there can be at most 1074 fractional digits

+6

I am fairly sure the standard does not (and cannot, without imposing restrictions it currently does not) provide a predefined constant giving the number you are asking for.

Floating point is most often represented in base 2, but base 16 and base 10 are also in reasonably widespread use.

In all these cases, the only prime factors of the base (2 and possibly 5) are also prime factors of 10. As a result, we never get an infinitely repeating expansion when converting from them to base 10 (decimal).

The standard does not restrict floating point to such representations, though. In theory, an implementation could use (for example) base 3 or base 7 for its floating-point representation. If it did, it would be trivial to store a number that repeats endlessly when converted to decimal. For example, 0.1 in base 3 represents 1/3, which repeats endlessly when converted to base 10. Although I have never heard of anyone doing this, I believe such an implementation could meet the requirements of the standard.

For a typical binary representation, min_exponent should be a reasonable proxy for the value you need. Unfortunately, it is probably impossible to be much more precise than that.

For example, an implementation is allowed to carry intermediate values at greater precision than the type stored in memory, so it is possible that (for example) writing 1.0/3.0 as a literal in the source code gives a result that differs from reading the inputs 1 and 3 at run time and dividing them. In the first case, the division can be performed at compile time, so the printed result is exactly a double, with nothing extra. When you enter the two values at run time, the division happens at run time, and you may get a result with greater precision.

The standard also requires the floating-point base to be reported as std::numeric_limits<T>::radix . From that, you could compute an approximation of the maximum number of places after the decimal point, based on radix and min_exponent , provided the prime factors of the base divide 10.

+2

You don't really want to know how many "digits are in the fractional part"; that phrasing suggests you are not 100% clear on what happens under the hood in a floating-point representation. There is no separate precision for the integral and fractional parts.

What you really want to know is accuracy.

1) A 32-bit IEEE 754 single-precision number has 24 bits of mantissa, which gives a precision of 24 * log10(2) = 7.2 digits.

2) A 64-bit IEEE 754 double-precision number has 53 bits of mantissa, which gives a precision of 53 * log10(2) = 16.0 digits.

Suppose you are working with double-precision numbers. If you have a very small base-10 value, say between 0 and 1, then you get about 16 decimal digits of precision after the decimal point. Your 1.0/3.0 example above shows this: you know the answer should be 0.3 repeating, but you get sixteen 3s after the decimal point before the answer turns into garbage.

If you have a very large number, say a billion, divided by three ( 1000000000.0/3.0 ), then on my machine the answer looks something like this:

 1000000000.0/3.0 = 333333333.333333313465118 

In this case you still have about 16 digits of precision, but the precision is split across the integral and fractional parts. The integral part has 9 exact digits and the fractional part has 7; the eighth digit of the fractional part onwards is garbage.

Similarly, suppose we divide one quintillion (18 zeros) by three. On my machine:

 1000000000000000000.0/3.0 = 333333333333333312.000000000000000 

You still have sixteen digits of precision, but zero of those digits land after the decimal point.

+1

std::numeric_limits<double>::min_exponent

The lowest negative integer n such that radix raised to the power (n - 1) is a valid normalized floating-point number. Equivalent to FLT_MIN_EXP, DBL_MIN_EXP or LDBL_MIN_EXP for the floating-point types.

min_exponent10 is also available:

The lowest negative integer n such that 10 raised to the power n is a valid normalized floating-point number. Equivalent to FLT_MIN_10_EXP, DBL_MIN_10_EXP or LDBL_MIN_10_EXP for the floating-point types.

0

Source: https://habr.com/ru/post/1257649/

