Take a look at the LLVM code that is generated for sumtest1:
julia> @code_llvm sumtest1(10^9)

define double @julia_sumtest1_21391(i64) {
top:
  %1 = icmp sgt i64 %0, 0
  %2 = select i1 %1, i64 %0, i64 0
  %3 = icmp eq i64 %2, 0
  br i1 %3, label %L3, label %L.preheader

L.preheader:                                      ; preds = %top
  %4 = icmp sgt i64 %0, 0
  %smax = select i1 %4, i64 %0, i64 0
  br label %L

L:                                                ; preds = %L, %L.preheader
  %lsr.iv = phi i64 [ %smax, %L.preheader ], [ %lsr.iv.next, %L ]
  %s.0 = phi double [ %5, %L ], [ 0.000000e+00, %L.preheader ]
  %5 = fadd double %s.0, 2.800000e+01
  %lsr.iv.next = add i64 %lsr.iv, -1
  %6 = icmp eq i64 %lsr.iv.next, 0
  br i1 %6, label %L3, label %L

L3:                                               ; preds = %L, %top
  %s.1 = phi double [ 0.000000e+00, %top ], [ %5, %L ]
  ret double %s.1
}
This is fairly complicated, but one thing stands out in the body of the loop, L:
%5 = fadd double %s.0, 2.800000e+01
On each iteration, the precomputed constant 28.0 is added to the accumulator, s. The compiler can tell that you never change any of the local variables, so it knows that the same value is added every time. The only reason the loop has to execute at all is that repeated floating-point addition is not exactly equivalent to multiplication. If all the local variables are changed to integers, where repeated addition is exactly equivalent to multiplication, the loop is eliminated completely:
julia> @time sumtest1_int(10^9)
  0.000005 seconds (6 allocations: 192 bytes)
28000000000

julia> @code_llvm sumtest1_int(10^9)

define i64 @julia_sumtest1_int_21472(i64) {
top:
  %1 = icmp slt i64 %0, 1
  br i1 %1, label %L3, label %L.preheader

L.preheader:                                      ; preds = %top
  %2 = icmp sgt i64 %0, 0
  %.op = mul i64 %0, 28
  %3 = select i1 %2, i64 %.op, i64 0
  br label %L3

L3:                                               ; preds = %L.preheader, %top
  %s.1 = phi i64 [ 0, %top ], [ %3, %L.preheader ]
  ret i64 %s.1
}
Which translates roughly to Julia as:
sumtest1_int(N) = N < 1 ? 0 : ifelse(N > 0, N*28, 0)
This is slightly redundant, since the body could be simplified to ifelse(N > 0, N*28, 0) (which in turn could be reduced to just 28N, since we do not care about negative values of N), but it is still far faster than executing the loop.
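The difference between the floating-point and integer cases can be checked directly. The following is a minimal sketch, not part of the original functions: repeated_add and closed_form are hypothetical helper names, and 0.1 is used instead of 28.0 only because its rounding error shows up after very few additions.

```julia
# Hypothetical helper: add `x` to an accumulator `n` times, as the loop does.
function repeated_add(x, n)
    s = zero(x)
    for _ in 1:n
        s += x
    end
    return s
end

# Integers: repeated addition and multiplication agree exactly.
println(repeated_add(28, 1000) == 28 * 1000)   # true

# Floating point: each addition rounds, so the two results can differ.
println(repeated_add(0.1, 10) == 0.1 * 10)     # false: 0.9999999999999999 vs 1.0

# Closed form corresponding to the optimized integer code.
closed_form(N) = ifelse(N > 0, N * 28, 0)
println(all(closed_form(N) == repeated_add(28, N) for N in -3:100))   # true
```

Because the compiler must preserve the rounding behaviour of every floating-point addition, it keeps the loop in the Float64 version, whereas the integer version can be collapsed into a single multiplication.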
The sumtest2 function cannot be analyzed or optimized as easily. To do so, the compiler would have to prove that the array A is never modified, which is quite hard. The compiler therefore has no choice but to do all the work, which is of course much slower than not doing it. In your simulation, it may still be faster to use local variables than to store values in an array, but it may not be. To be sure, you will need to measure code that does enough real work that it cannot be optimized away entirely.
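One way to set up such a measurement is sketched below. The functions work_locals and work_array are hypothetical stand-ins, not the original sumtest1 and sumtest2. Their increment depends on the loop index, on the assumption that this keeps the compiler from collapsing the loop into a single operation.

```julia
# Hypothetical workload: values held in local variables.
function work_locals(N)
    a, b, c = 1.0, 2.0, 3.0
    s = 0.0
    for i in 1:N
        s += (a + b + c) * (i % 7)   # increment varies with i
    end
    return s
end

# Hypothetical workload: the same values stored in an array.
function work_array(N)
    A = [1.0, 2.0, 3.0]
    s = 0.0
    for i in 1:N
        s += (A[1] + A[2] + A[3]) * (i % 7)
    end
    return s
end

# Call each function once first so that compilation time is not measured.
work_locals(10); work_array(10)

t_locals = @elapsed work_locals(10^7)
t_array  = @elapsed work_array(10^7)
println("locals: ", t_locals, " s, array: ", t_array, " s")
```

The timings depend on the machine and the Julia version, so no expected numbers are given here. It is also worth inspecting the output of @code_llvm for both functions to confirm that the loop is actually still present.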