Optimising BigInt calls

I am currently using the book "Programming in D" to learn D. I tried to solve the problem of summing the squares of the numbers from 1 to 10,000,000. I first took a functional approach, solving the problem with map and reduce, but as the numbers grow I have to convert them to BigInt to get the correct result.

    long num = 10000001;
    BigInt result = iota(1, num)
        .map!(a => to!BigInt(a * a))
        .reduce!((a, b) => a + b);
    writeln("The sum is : ", result);

The above takes 7 s to complete when compiled with dmd -O. I profiled the program, and most of the time is spent in BigInt calls. Although the square of each number fits in a long, the values have to be typed as BigInt so that the reduce function accumulates and returns the correct sum. The equivalent Python program takes only 3 seconds, and with num = 100000000 the D program takes 1 minute 13 seconds. Is there a way to optimise the BigInt calls? I tried pushing the squares into a BigInt array, but that is also slower. I also tried converting all the numbers to BigInt up front:

    auto bigs_map_nums = iota(1, num).map!(a => to!BigInt(a)).array;
    auto bigs_map = sum(bigs_map_nums.map!(a => a * a).array);

But this is also slower. I have read the answers to "How to optimize this short factorial function in scala? (Creation of 50'000 BigInts)". Is this a problem with D's implementation of multiplication for large integers? Is there a way to optimise the function calls on BigInt?

Python code:

    timeit.timeit('print sum(map(lambda num : num * num, range(1,10000000)))', number=1)
    333333283333335000000
    3.58552622795105
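As a sanity check on the printed value, the sum of squares has a well-known closed form, 1² + 2² + … + m² = m(m+1)(2m+1)/6, which for m = 9,999,999 (the last value produced by `range(1, 10000000)`) reproduces the figure above:

```python
# Closed form for the sum of squares 1^2 + ... + m^2.
m = 10000000 - 1                       # range(1, 10000000) ends at 9999999
print(m * (m + 1) * (2 * m + 1) // 6)  # → 333333283333335000000
```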

The code was run on a dual-core 64-bit Linux laptop with 2 GB of RAM. Python: 2.7.4; dmd: DMD64 D Compiler v2.066.1.

3 answers

Without ranges:

    foreach (x; 0 .. num) result += x * x;

With ranges:

    import std.functional : reverseArgs;
    result = iota(1, num)
        .map!(a => a * a)
        .reverseArgs!(reduce!((a, b) => a + b))(BigInt(0) /* seed */);

The key is to avoid converting each element to BigInt, of course.

The range version is slightly slower than the non-range version. Both are significantly faster than the Python version.

Edit: Ooh! Ooh! With std.algorithm.sum it can be made much nicer:

    result = iota(1, num)
        .map!(a => a * a)
        .sum(BigInt(0));
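The same idea can be sketched in Python for comparison (a hypothetical `sum_of_squares` helper, not part of the answer): each square stays a native-sized value, and only the running total ever grows large, playing the role of the single BigInt accumulator in the D version.

```python
def sum_of_squares(n):
    # Each x * x is a small native-sized value; only the accumulator
    # `total` ever grows beyond 64 bits (the BigInt in the D version).
    total = 0
    for x in range(1, n):
        total += x * x
    return total

print(sum_of_squares(10000000))  # → 333333283333335000000
```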

The Python code is not equivalent to the D code; in fact it does much less work.

Python uses an int, and only promotes it to a long() when the result grows larger than what can be stored in the int() type. Internally (in CPython, at least), a long is used to store integers bigger than 256, and it is at least 32 bits. Until that overflows, normal CPU instructions are used for multiplication, which are far faster than bigint multiplication.
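That overflow test can be sketched in Python as a toy model of the check in CPython 2's int_mul (quoted further down). The helper name and the explicit bit-masking are mine, assuming 64-bit native longs; CPython does this with real machine arithmetic.

```python
BITS = 64

def mul_with_fallback(a, b):
    # Wraparound product, as native long arithmetic would compute it.
    longprod = (a * b) & ((1 << BITS) - 1)
    if longprod >= 1 << (BITS - 1):        # reinterpret as signed
        longprod -= 1 << BITS
    doubleprod = float(a) * float(b)       # approximate product as a double
    absdiff = abs(float(longprod) - doubleprod)
    absprod = abs(doubleprod)
    # "5 good bits is close enough": the native result was not clobbered.
    if 32.0 * absdiff <= absprod:
        return longprod                    # fast path: native multiply
    return a * b                           # fall back to big-int multiply
```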

D's BigInt implementation treats the numbers as BigInt from the start and uses the expensive multiplication operation from 1 all the way to the end. Much more work is being done there.

One can only wonder how complicated multiplication gets once BigInts are involved.

The D implementation:

https://github.com/D-Programming-Language/phobos/blob/v2.066.1/std/internal/math/biguintcore.d#L1246

Python starts out with:

    static PyObject *
    int_mul(PyObject *v, PyObject *w)
    {
        long a, b;
        long longprod;           /* a*b in native long arithmetic */
        double doubled_longprod; /* (double)longprod */
        double doubleprod;       /* (double)a * (double)b */

        CONVERT_TO_LONG(v, a);
        CONVERT_TO_LONG(w, b);
        /* casts in the next line avoid undefined behaviour on overflow */
        longprod = (long)((unsigned long)a * b);
        ... /* check if we have overflowed */
        {
            const double diff = doubled_longprod - doubleprod;
            const double absdiff = diff >= 0.0 ? diff : -diff;
            const double absprod = doubleprod >= 0.0 ? doubleprod : -doubleprod;
            /* absdiff/absprod <= 1/32 iff
               32 * absdiff <= absprod -- 5 good bits is "close enough" */
            if (32.0 * absdiff <= absprod)
                return PyInt_FromLong(longprod);
            else
                return PyLong_Type.tp_as_number->nb_multiply(v, w);
        }
    }

and if the number is bigger than a long, it falls back to Karatsuba multiplication. The implementation is in:

http://svn.python.org/projects/python/trunk/Objects/longobject.c (k_mul function)
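For illustration, Karatsuba's trick replaces four half-size multiplications with three. This toy Python version shows only the recurrence; CPython's real k_mul operates on arrays of 15-bit digits, not on Python ints.

```python
def karatsuba(x, y):
    # Toy Karatsuba on non-negative ints: three half-size multiplies
    # instead of four.  Illustration only, not CPython's k_mul.
    if x < 10 or y < 10:
        return x * y                         # small enough: multiply directly
    n = max(x.bit_length(), y.bit_length()) // 2
    xh, xl = x >> n, x & ((1 << n) - 1)      # split x = xh*2^n + xl
    yh, yl = y >> n, y & ((1 << n) - 1)      # split y = yh*2^n + yl
    a = karatsuba(xh, yh)                    # high parts
    b = karatsuba(xl, yl)                    # low parts
    c = karatsuba(xh + xl, yh + yl) - a - b  # both cross terms, one multiply
    return (a << (2 * n)) + (c << n) + b
```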

Truly equivalent code would hold off on using BigInts until the numbers no longer fit in a native data type.
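That strategy can be sketched in Python with a hypothetical helper: keep the per-element work in small, native-sized partial sums and touch the (potentially huge) running total only once per chunk, the moral equivalent of deferring BigInt until values outgrow a long.

```python
def chunked_sum_of_squares(n, chunk=4096):
    # Hypothetical sketch: each `partial` stays small enough for native
    # arithmetic; the big `total` is updated only once per chunk.
    total = 0
    for start in range(1, n, chunk):
        partial = 0
        for x in range(start, min(start + chunk, n)):
            partial += x * x
        total += partial
    return total
```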


The DMD backend does not generate highly optimized code. For fast programs, compile using GDC or LDC.

On my computer, I get these timings:

    Python:                     3.01
    dmd -O -inline -release:    3.92
    ldmd2 -O -inline -release:  2.14

Source: https://habr.com/ru/post/981129/

