Let N be an unsigned integer known at compile time.
GCC can optimize
unsigned sum = 0; for(unsigned i=0; i<N; i++) sum += a;
to just a*N. This can be understood because unsigned arithmetic wraps modulo 2^32, and modular arithmetic is associative: (a%k + b%k)%k = (a+b)%k.
However, GCC will not optimize
float sum = 0; for(unsigned i=0; i<N; i++) sum += a;
to a*(float)N.
But with associative math enabled, e.g. with -Ofast, I found that GCC can reduce this to order log2(N) operations. For example, for N=8 it can do the sum in three additions:
sum = a + a
sum = sum + sum
sum = sum + sum
However, at some point past N=16, GCC falls back to doing N-1 sequential additions.
My question is: why does GCC not produce a*(float)N with -Ofast?
Instead of O(N) or O(log(N)) it could just be O(1). Since N is known at compile time, GCC can determine whether N is exactly representable in a float. And even if N is too large for a float, it could do sum = a*(float)(N & 0x0000ffff) + a*(float)(N & 0xffff0000). In fact, I did a little test to check the accuracy, and a*(float)N is more accurate anyway (see code and results below).
//gcc -O3 foo.c //don't use -Ofast or -ffast-math or -fassociative-math
The result is 61848396.000000 52707136.000000 52707136.000000, which shows that Kahan summation and the multiplication give the same result, which, I think, shows that the multiplication is more accurate than the simple sum.
Tags: gcc, optimization, floating-point
Z boson, Oct 15 '15 at 15:23