Floating-point precision: why do the iteration counts differ?

Here are two similar MATLAB programs; one iterates 10 times and the other 11 times.

One:

 i = 0; x = 0.0; h = 0.1;
 while x < 1.0
     i = i + 1;
     x = i * h;        % x is recomputed from the counter on every pass
     disp([i,x]);
 end

Other:

 i = 0; x = 0.0; h = 0.1;
 while x < 1.0
     i = i + 1;
     x = x + h;        % x is accumulated by repeated addition
     disp([i,x]);
 end

I do not understand why accumulating x by floating-point addition gives a different result from assigning it the product i * h.

+4
3 answers

Compare the output of the following:

 >> fprintf('%0.20f\n', 0.1.*(1:10))
 0.10000000000000001000
 0.20000000000000001000
 0.30000000000000004000
 0.40000000000000002000
 0.50000000000000000000
 0.60000000000000009000
 0.70000000000000007000
 0.80000000000000004000
 0.90000000000000002000
 1.00000000000000000000

 >> fprintf('%0.20f\n', cumsum(repmat(0.1,1,10)))
 0.10000000000000001000
 0.20000000000000001000
 0.30000000000000004000
 0.40000000000000002000
 0.50000000000000000000
 0.59999999999999998000
 0.69999999999999996000
 0.79999999999999993000
 0.89999999999999991000
 0.99999999999999989000

Also compare with the result of the MATLAB colon operator:

 >> fprintf('%0.20f\n', 0.1:0.1:1)
 0.10000000000000001000
 0.20000000000000001000
 0.30000000000000004000
 0.40000000000000002000
 0.50000000000000000000
 0.59999999999999998000
 0.69999999999999996000
 0.80000000000000004000
 0.90000000000000002000
 1.00000000000000000000

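If you want to see the 64-bit binary representation (displayed in hexadecimal), use the following. The columns are, in order, the colon operator, multiplication, and cumulative sum: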

 >> format hex
 >> [(0.1:0.1:1)' (0.1.*(1:10))' cumsum(repmat(0.1,10,1))]
 3fb999999999999a   3fb999999999999a   3fb999999999999a
 3fc999999999999a   3fc999999999999a   3fc999999999999a
 3fd3333333333334   3fd3333333333334   3fd3333333333334
 3fd999999999999a   3fd999999999999a   3fd999999999999a
 3fe0000000000000   3fe0000000000000   3fe0000000000000
 3fe3333333333333   3fe3333333333334   3fe3333333333333
 3fe6666666666666   3fe6666666666667   3fe6666666666666
 3fe999999999999a   3fe999999999999a   3fe9999999999999
 3feccccccccccccd   3feccccccccccccd   3feccccccccccccc
 3ff0000000000000   3ff0000000000000   3fefffffffffffff


+2

You must be very careful when iterating with floating-point counters. As an example, here is what happens in your case (this is a Java program, but your case should behave the same):

 double h = 0.1;
 System.out.println(10*h-1.0);
 System.out.println(h+h+h+h+h+h+h+h+h+h-1.0);

It simply prints the difference from 1 when the value is computed by multiplication and when it is computed by repeated addition.

Since the floating-point representation of 0.1 is not exact, the output is:

 0.0
 -1.1102230246251565E-16

Thus, if you use the summed value in the loop condition, there will be one additional iteration, because 1.0 has not yet been reached.

Try using the counter variable i, which holds an integer, in the loop condition, and you will not run into such problems.
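As a rough illustration of this advice, here is a minimal sketch in Python rather than MATLAB or Java (Python floats are the same IEEE 754 doubles, so the arithmetic should match). The function names are hypothetical and only serve this example. The first two functions mirror the two loops in the question, and the third bounds the loop with the integer counter instead of the float.

```python
def iterations_by_multiplication(h=0.1, limit=1.0):
    """Mirror of the first loop in the question: x is recomputed as i * h."""
    i = 0
    x = 0.0
    while x < limit:
        i += 1
        x = i * h
    return i


def iterations_by_accumulation(h=0.1, limit=1.0):
    """Mirror of the second loop in the question: x is accumulated with x + h."""
    i = 0
    x = 0.0
    while x < limit:
        i += 1
        x = x + h
    return i


def values_with_integer_counter(n=10, h=0.1):
    """Bound the loop with an integer count and derive x from it."""
    return [i * h for i in range(1, n + 1)]


print(iterations_by_multiplication())       # 10
print(iterations_by_accumulation())         # 11
print(len(values_with_integer_counter()))   # 10
```

With the integer bound, the number of iterations no longer depends on how the rounding of x happens to fall.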

+4

The representation of floats IS exact; the catch is that floating-point arithmetic is done in base 2, and decimal numbers such as 0.1 have an infinite binary expansion. Since floats have a finite number of bits, the infinite expansion of 0.1 must be rounded, and the rounding error accumulates under repeated addition, which leads to the discrepancy.
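To make the rounding of 0.1 concrete, here is a small Python sketch (Python is used only for illustration; its floats are IEEE 754 doubles). It relies on the standard-library decimal and fractions modules, which convert a float to its exact stored value. The variable names are made up for this example.

```python
from decimal import Decimal
from fractions import Fraction

# Converting the float 0.1 to Decimal or Fraction is exact: it reveals the
# value the double actually stores, not the decimal literal that was typed.
stored_decimal = Decimal(0.1)
stored_fraction = Fraction(0.1)

print(stored_decimal)   # 0.1000000000000000055511151231257827021181583404541015625
print(stored_fraction)  # 3602879701896397/36028797018963968
print((0.1).hex())      # 0x1.999999999999ap-4

# Exact difference between the stored double and the true decimal 1/10.
rounding_error = stored_fraction - Fraction(1, 10)
print(float(rounding_error))
```

The stored value is slightly larger than one tenth, and the repeating 9s in the hexadecimal significand are the infinite binary expansion cut off and rounded up to the final a.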

However, most floating-point operations are inexact: their results USUALLY require more bits of precision than fit in the fixed number of bits available, so the processor automatically rounds each result to the available precision. Such rounding errors accumulate in long chains of calculations, as you have noticed, and sometimes lead to huge discrepancies between the actual and the "correct" result. ("Correct" is defined as the result obtained in arithmetic with infinite precision.)
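The gap between the rounded and the "correct" result can be measured for the sum in the question. The sketch below, again in Python purely for illustration, adds the stored value of 0.1 ten times, once in doubles and once in exact rational arithmetic via fractions.Fraction, which stands in here for infinite-precision arithmetic. The variable names are assumptions of this example.

```python
from fractions import Fraction

h = 0.1

# Floating-point accumulation: every addition rounds to the nearest double.
rounded_sum = 0.0
for _ in range(10):
    rounded_sum += h

# Exact accumulation of the same stored value using rational arithmetic.
exact_sum = sum(Fraction(h) for _ in range(10))

print(rounded_sum)        # 0.9999999999999999
print(float(exact_sum))   # 1.0 (the exact sum, rounded once at the end)

# Total error introduced by the intermediate roundings.
accumulated_error = Fraction(rounded_sum) - exact_sum
print(float(accumulated_error))
```

Rounding once at the end gives 1.0, while rounding after every addition falls just short of it. That is why the accumulating loop runs an eleventh time.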

0

Source: https://habr.com/ru/post/917521/

