How to perform high precision calculations in D?

For some university work, I have to approximate some numbers, for example Euler's number e with a series. So I have to add very small numbers, but I run into accuracy problems: once a term is small enough, adding it no longer affects the result.

    real s = 0;  // sum of all previous terms (real.init is NaN, so initialize it)
    ulong k = 1; // factorial
    s += 1.0 / k;

After each step 1/k becomes smaller still (k keeps growing), but after about the 10th iteration the result stops changing and gets stuck at 2.71828.
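Two details of this setup are worth ruling out before reaching for more precision (a hedged guess, not a diagnosis): a `ulong` factorial can only hold up to 20! and overflows at 21!, and `writeln` formats floating-point values with only 6 significant digits by default, which is exactly what `2.71828` is. The sketch below keeps the current term 1/k! in a `real` instead of a factorial and prints the sum with an explicit precision. `eSeries` is a hypothetical name used only for illustration.

```d
import std.stdio;

// Hypothetical helper: sums 1/0! + 1/1! + ... + 1/(terms-1)! in real precision.
// It carries the current term 1/k! instead of a ulong factorial, which would
// overflow after 20!.
real eSeries(uint terms)
{
    real s = 0;    // running sum; real.init is NaN, so start from 0 explicitly
    real term = 1; // current term, starting at 1/0!
    foreach (k; 1 .. terms + 1)
    {
        s += term;
        term /= k; // turns 1/(k-1)! into 1/k!
    }
    return s;
}

void main()
{
    real e = eSeries(25);
    writeln(e);           // default formatting: only 6 significant digits
    writefln("%.18f", e); // ask for more digits explicitly
}
```

If the second line shows many more correct digits than the first, the "stuck" value was mostly a display issue; the precision limits discussed in the answers below still apply once you need more digits than `real` can hold.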

3 answers

If you need a solution that works with native types, you can get reasonable results by always adding numbers of similar magnitude. One way to do this is to compute the first N terms of the series and then repeatedly replace the two smallest numbers with their sum:

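    import std.algorithm : sort;

    // Sum the first n terms Fn(0), Fn(1), ... of a series,
    // always adding the two smallest values first.
    real sumSmallestFirst(alias Fn)(size_t n)
    {
        auto data = new real[n];
        foreach (i, ref v; data)
            v = Fn(i);
        while (data.length > 1)
        {
            data.sort();        // std.algorithm.sort replaced the old built-in .sort
            data[1] += data[0]; // fold the smallest value into the next smallest
            data = data[1 .. $];
        }
        return data[0];
    }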

(A min-heap would make it faster, since it avoids re-sorting the whole array on every iteration.)
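A minimal sketch of the min-heap variant, assuming Phobos's `std.container.binaryheap`: `heapify` with the ordering `"a > b"` makes the heap's `front` the smallest element, so each round pops the two smallest values and pushes their sum back. `heapSum` is a hypothetical helper name, not a library function.

```d
import std.container.binaryheap : heapify;
import std.stdio : writefln;

// Hypothetical helper: sums `values` by repeatedly replacing the two smallest
// entries with their sum, using a min-heap instead of re-sorting each round.
real heapSum(real[] values)
{
    assert(values.length > 0, "need at least one value");
    // With "a > b" as the ordering, the heap's front is the smallest element.
    auto heap = heapify!"a > b"(values.dup);
    while (heap.length > 1)
    {
        real smallest = heap.front;
        heap.removeFront();
        real next = heap.front;
        heap.removeFront();
        heap.insert(smallest + next); // two slots were just freed, so this fits
    }
    return heap.front;
}

void main()
{
    // Terms 1/k! for k = 0 .. 24, summed smallest-first.
    real[] terms;
    real term = 1;
    foreach (k; 1 .. 26)
    {
        terms ~= term;
        term /= k;
    }
    writefln("%.18f", heapSum(terms));
}
```

Each round costs O(log n) heap operations instead of a full sort. For a series like this one, whose terms already shrink monotonically, simply adding the terms from the smallest to the largest gets much of the same benefit.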


Fixed-precision floating-point types, the ones supported by your floating-point processor (float, double, real), are not well suited to calculations that need many digits of precision, such as the example you gave.

The problem is that these floating-point types carry a finite number of significant digits (binary digits, really), which limits how much of a number the type can represent. The float type holds roughly 7 significant decimal digits (for example, 3.141593); double holds roughly 15 to 16 (for example, 3.141592653589793); and real, which on x86 is the 80-bit extended-precision format, holds roughly 18 to 19, only slightly more than double.

Adding an extremely small number to a much larger floating-point value causes the small number to be lost. See what happens when we add the following two float values:

    import std.stdio;

    void main()
    {
        float a = 1.234567f, b = 0.0000000001234567f;
        float c = a + b;
        writefln("a = %f, b = %e, c = %f", a, b, c); // c prints the same as a
    }

Both a and b are valid float values, and each one stores about 7 significant digits. But when they are added, only the first 7 or so significant digits of the sum are kept, because the result is rounded back to a float:

       1.2345670001234567
    => 1.234567|0001234567
               ^^^^^^^^^^^ sent to the bit bucket
    => 1.234567

So c ends up equal to a, because the low-order digits contributed by b are discarded.
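D exposes these limits as properties of each floating-point type: `.dig` is the number of decimal digits guaranteed to survive, and `.epsilon` is the gap between 1 and the next representable value. A rough sketch to inspect them and to check whether a small addend is absorbed; `absorbs` is a hypothetical helper, not a library function. Because D allows intermediate results to be computed at higher precision, the helper stores the sum in a variable of type `T` before comparing, which on typical compilers rounds it to `T`.

```d
import std.stdio;

// Hypothetical helper: true if adding `small` to `big` leaves `big` unchanged
// once the result is stored back as type T.
bool absorbs(T)(T big, T small)
{
    T sum = big + small;
    return sum == big;
}

void main()
{
    // .dig: guaranteed decimal digits; .epsilon: gap between 1 and the next value.
    writefln("float:  dig = %s, epsilon = %s", float.dig, float.epsilon);
    writefln("double: dig = %s, epsilon = %s", double.dig, double.epsilon);
    writefln("real:   dig = %s, epsilon = %s", real.dig, real.epsilon);

    writeln(absorbs(1.234567f, 0.0000000001234567f)); // the float example above
    writeln(absorbs(1.0, double.epsilon / 4));          // below half a step: lost
    writeln(absorbs(1.0, double.epsilon));              // one full step: kept
}
```

The `real` line is platform-dependent: on x86 it reports the 80-bit extended format, while on some other targets `real` is the same as `double`.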

Here is another explanation of the concept, probably much better than mine.


The answer to this problem is arbitrary-precision arithmetic. Unfortunately, arbitrary-precision arithmetic is not supported by CPU hardware, so it is not (usually) built into your programming language. However, there are many libraries that provide arbitrary-precision floating-point types and the math you want to do on them. See this question for some suggestions. You probably won't find D-specific libraries for this today, but there are many C libraries (GMP, MPFR, etc.) that should be easy enough to use on their own, and even more so if you can find D bindings for one of them.


As already mentioned, you need a third-party arbitrary-precision floating-point library (I think Tango and Phobos only have modules for arbitrary-length integer arithmetic, such as Phobos's std.bigint).
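For a series as simple as this one, Phobos's arbitrary-length integers (`std.bigint.BigInt`) can still stand in for high-precision decimals by working in fixed point: scale everything by 10^digits and keep only integers. This is a swapped-in technique, not a replacement for MPFR in general. A minimal sketch, where `eScaled` is a hypothetical helper: each term 10^digits / k! is truncated, so the last couple of digits can come out low, which is why `main` computes a few guard digits and then drops them.

```d
import std.bigint : BigInt;
import std.stdio : writeln;

// Hypothetical helper: e * 10^digits as a BigInt, using fixed-point arithmetic
// on arbitrary-length integers. Each term 10^digits / k! is truncated.
BigInt eScaled(uint digits)
{
    BigInt scale = 1;
    foreach (_; 0 .. digits)
        scale *= 10;

    BigInt sum = 0;
    BigInt term = scale; // 10^digits / 0!
    uint k = 0;
    while (term > 0)
    {
        sum += term;
        ++k;
        term /= k;       // 10^digits / k!, truncated
    }
    return sum;
}

void main()
{
    enum guard = 10;                // extra digits to absorb truncation error
    BigInt e = eScaled(50 + guard);
    BigInt drop = 1;
    foreach (_; 0 .. guard)
        drop *= 10;
    writeln(e / drop);              // e * 10^50: "2" followed by 50 decimals
}
```

The same idea works for any series whose terms can be produced by integer multiplication and division; anything needing roots, logarithms, or arbitrary exponents is where a real arbitrary-precision floating-point library pays off.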

dil is a D project that uses MPFR; you should be able to find the bindings there.


Source: https://habr.com/ru/post/1339014/

