How does array access affect performance?

Question

How does array access affect performance?

int steps = 256 * 1024 * 1024; int[] a = new int[2]; // Loop 1 for (int i=0; i<steps; i++) { a[0]++; a[0]++; } // Loop 2 for (int i=0; i<steps; i++) { a[0]++; a[1]++; }

Can someone explain why the second cycle is 20 times smaller than the first (19 ms vs 232 ms)?

Here is how I define it:

 long start_time = System.currentTimeMillis(); // Loop long end_time = System.currentTimeMillis(); System.out.println(end_time - start_time);

+5

java performance algorithm

user6611764 Jul 21 '17 at 19:19

source share

1 answer

Peter de Rivaz · Accepted Answer · 2017-07-21T20:04:54+0000

Summary

The JIT compiler converts the first cycle to multiplication, but does not really optimize the second cycle.

Discussion

The bytecode for both loops is basically the same (you can view this with javap -c test.class ).

In Java, bytecode is converted to x86 instructions by the JIT compiler, which has the ability to perform additional optimizations.

In fact, you can view the assembly created by JIT via java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly ... if you have the hsdis plugin.

I changed the value added to each element to 0xbad to make it easier to determine the appropriate code and change the loop counter to long .

The first loop creates:

  mov r11d,dword ptr [r13+10h] Load from memory a[0] ... add r11d,175ah Add 2 * 0xbad to the value mov dword ptr [r13+10h],r11d Store to memory a[0]

The second cycle creates:

  mov ebx,dword ptr [rax+10h] Load from memory a[0] add ebx,0badh Add 0xbad ... mov dword ptr [rax+10h],ebx Store to memory ... mov ebx,dword ptr [rax+14h] Load from memory a[1] add ebx,0badh Add 0xbad ... mov dword ptr [rax+14h],ebx Store to memory a[1]

so that you can see that the compiler can already optimize the first loop with fewer instructions.

In particular, he noticed that two additions to the same array element can be combined into one addition, twice the value.

When I changed the loop counter to int , I noticed that the compiler manages to do even better with your first loop:

 mov r10d,dword ptr [r14+10h] imul ecx,r13d,175ah This line converts lots of adds of 0xbad into a single multiply mov r11d,r10d sub r11d,ecx add r10d,175ah mov dword ptr [r14+10h],r10d

In this case, he noticed that he can actually implement several iterations of your loop in a single pass using multiplication! This explains how the first cycle can be an order of magnitude faster than the second.

How does array access affect performance?

Summary

Discussion

More articles: