Array summation is slower than summing individual variables in Julia

Question


OK, I've been running a series of tests lately. I have an MC simulation with several variables (20) that it makes sense to put in a one-dimensional array, because that makes several things easier to read.

But I have one problem: I need to sum the variables at each iteration, and the simulation runs for a very large number of iterations, so I ran into this problem (reduced here to 7 variables):

```julia
function sumtest1(N)
    s = 0.0
    a = 1.0
    b = 2.0
    c = 3.0
    d = 4.0
    e = 5.0
    f = 6.0
    g = 7.0
    for i = 1:N
        s += (a+b+c+d+e+f+g)
    end
    return s
end

function sumtest2(N)
    s = 0.0
    A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    for i = 1:N
        s += sum(A)
    end
    return s
end

@time sumtest1(1_000_000_000)
# elapsed time: 0.998272756 seconds (96 bytes allocated)
@time sumtest2(1_000_000_000)
# elapsed time: 7.522198967 seconds (208 bytes allocated)
```

Is this expected, or am I doing something wrong? I would really like my variables to be indexed, for other reasons that would take too long to explain, but I can't accept this performance penalty.


performance arrays sum julia-lang

Esteban Apr 22 '16 at 18:42


1 answer

Stefankarpinski · Accepted Answer · 2016-04-22T19:16:47+0000

Look at the LLVM code generated for `sumtest1`:

```
julia> @code_llvm sumtest1(10^9)

define double @julia_sumtest1_21391(i64) {
top:
  %1 = icmp sgt i64 %0, 0
  %2 = select i1 %1, i64 %0, i64 0
  %3 = icmp eq i64 %2, 0
  br i1 %3, label %L3, label %L.preheader

L.preheader:                                      ; preds = %top
  %4 = icmp sgt i64 %0, 0
  %smax = select i1 %4, i64 %0, i64 0
  br label %L

L:                                                ; preds = %L, %L.preheader
  %lsr.iv = phi i64 [ %smax, %L.preheader ], [ %lsr.iv.next, %L ]
  %s.0 = phi double [ %5, %L ], [ 0.000000e+00, %L.preheader ]
  %5 = fadd double %s.0, 2.800000e+01
  %lsr.iv.next = add i64 %lsr.iv, -1
  %6 = icmp eq i64 %lsr.iv.next, 0
  br i1 %6, label %L3, label %L

L3:                                               ; preds = %L, %top
  %s.1 = phi double [ 0.000000e+00, %top ], [ %5, %L ]
  ret double %s.1
}
```

This is pretty complicated, but one thing stands out in the body of the loop, block `L`:

```
%5 = fadd double %s.0, 2.800000e+01
```

In each iteration, the precomputed constant 28.0 is added to the accumulator `s`. The compiler can tell that you never change any of the local variables, so it knows that the same amount is added every time. The only reason the loop has to run at all is that repeated floating-point addition is not exactly equivalent to multiplication. If you change all the local variables to integers, for which repeated addition is exactly equivalent to multiplication, the loop is eliminated entirely:

```
julia> @time sumtest1_int(10^9)
  0.000005 seconds (6 allocations: 192 bytes)
28000000000

julia> @code_llvm sumtest1_int(10^9)

define i64 @julia_sumtest1_int_21472(i64) {
top:
  %1 = icmp slt i64 %0, 1
  br i1 %1, label %L3, label %L.preheader

L.preheader:                                      ; preds = %top
  %2 = icmp sgt i64 %0, 0
  %.op = mul i64 %0, 28
  %3 = select i1 %2, i64 %.op, i64 0
  br label %L3

L3:                                               ; preds = %L.preheader, %top
  %s.1 = phi i64 [ 0, %top ], [ %3, %L.preheader ]
  ret i64 %s.1
}
```
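The answer does not show the definition of `sumtest1_int`. A plausible reconstruction, assuming it is simply `sumtest1` with every literal changed to an integer, is sketched below. It is paired with a hypothetical helper, `repeated_add`, that illustrates why the integer loop can be folded into a single multiplication while the floating-point loop cannot:

```julia
# Hypothetical reconstruction: sumtest1 with every literal changed to an integer.
function sumtest1_int(N)
    s = 0
    a = 1; b = 2; c = 3; d = 4; e = 5; f = 6; g = 7
    for i = 1:N
        s += (a+b+c+d+e+f+g)
    end
    return s
end

# Illustrative helper (not from the original answer): adds x to itself n times.
function repeated_add(x, n)
    s = zero(x)
    for _ in 1:n
        s += x
    end
    return s
end

repeated_add(0.1, 10) == 0.1 * 10   # false: rounding errors accumulate in the repeated sum
repeated_add(1, 10) == 1 * 10       # true: integer addition is exact
```

Because the integer case is exact, the compiler is free to replace the loop with `N*28`; for `Float64` it must keep the loop to reproduce the rounding of each individual addition.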

The LLVM code for `sumtest1_int` translates roughly to this Julia:

```julia
sumtest1_int(N) = N < 1 ? 0 : ifelse(N > 0, N*28, 0)
```

This is a bit redundant, since the body could be simplified to `ifelse(N > 0, N*28, 0)` (which in turn could be changed to just `28N` if we didn't care about negative values of `N`), but it is all still much faster than executing a loop.
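The claim about the redundant condition can be checked directly. In this sketch, `llvm_form`, `simplified`, and `naive` are hypothetical names for the three expressions discussed in the answer:

```julia
# The three forms discussed in the answer, as separate functions for comparison.
llvm_form(N)  = N < 1 ? 0 : ifelse(N > 0, N*28, 0)   # rough translation of the IR
simplified(N) = ifelse(N > 0, N*28, 0)               # redundant N < 1 test dropped
naive(N)      = 28N                                  # ignores negative N

# The first two agree for every N; 28N only agrees when N >= 0.
all(llvm_form(N) == simplified(N) for N in -5:5)     # true
all(simplified(N) == naive(N) for N in 0:5)          # true
simplified(-3) == naive(-3)                          # false (0 vs -84)
```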

The `sumtest2` function cannot be analyzed or optimized as easily. To do so, the compiler would have to prove that the array `A` is never modified, which is quite difficult. So the compiler has no choice but to do all the work, which is, of course, much slower than not doing it. In your simulation it may still be faster to use local variables than to store the values in an array, but it may not be. To be sure, you will need to benchmark code in which the computation is harder for the compiler to optimize away completely.
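A minimal sketch of such a measurement, assuming a toy update (`a += 1e-9`) stands in for whatever the real simulation does to its variables; `sum_locals` and `sum_array` are hypothetical names. Because one value changes every iteration, neither version can be reduced to adding a precomputed constant. It uses only Base's `@time`; for more careful numbers, BenchmarkTools.jl's `@btime` is a common choice.

```julia
# Hypothetical benchmark: the first value drifts every iteration, so the
# per-iteration sum cannot be precomputed as a constant.
function sum_locals(N)
    s = 0.0
    a, b, c, d, e, f, g = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0
    for i = 1:N
        a += 1e-9                 # toy state change (assumption)
        s += (a+b+c+d+e+f+g)
    end
    return s
end

function sum_array(N)
    s = 0.0
    A = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    for i = 1:N
        A[1] += 1e-9              # same toy state change, applied to the array
        s += sum(A)
    end
    return s
end

sum_locals(10); sum_array(10)     # compile both before timing
@time sum_locals(10_000_000)
@time sum_array(10_000_000)
```

Any remaining gap between the two timings then reflects the cost of array access and `sum`, not constant folding.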
