Interpreting Cachegrind Output

This is part of a cachegrind output. This part of the code is executed 1224 times. elmg1 is a 16 x 20 array of unsigned long. On my machine the L1 cache is 32 KB, the cache line is 64 B, and the cache is 8-way set associative.

  • for (i = 0; i < 20; i++) 78,336 2,448 2 50,184 0 0 1,224 0 0
  • {
  • telm01 = elmg1[i]; 146,880 0 0 73,440 0 0 24,480 0 0
  • telm31 = (telm01 << 3) ^ val1; 97,920 0 0 48,960 0 0 24,480 0 0
  • telm21 = (telm01 << 2) ^ (val1 >> 1); 146,880 1,224 1 48,960 0 0 24,480 0 0
  • telm11 = (telm01 << 1) ^ (val1 >> 2); 146,880 0 0 48,960 0 0 24,480 0 0
  • }

A. The reason I am posting this is that on the third line inside the for loop I see a number of I1 misses (and one L2 miss as well). This is somewhat confusing, and I could not work out the reason.

B. I am trying to optimize the running time of a portion of the code; the above is only a small fragment. I think memory access is costing my program a lot. As in the example above, elmg1 is a 16 x 20 array of unsigned longs. Whenever I use it in the code there are always some misses, and variables like this occur a lot in my program. Any suggestions?

C. I need to allocate (and sometimes initialize) these unsigned longs. Which should I prefer: calloc, or an array declaration followed by explicit initialization? And will there be any difference in how the cache handles them?

Thanks.

1 answer

?

  • L1 . L2 1224 , - .
  • L2 ?
  • calloc(), , , . , , - , , .

edit: , .

For line 5 of the listing (the telm21 assignment), the counters are:

Ir    146,880
I1mr  1,224
ILmr  1
Dr    48,960
D1mr  0
DLmr  0
Dw    24,480
D1mw  0
DLmw  0

The 32 KB L1 figure applies to I1 and D1. IL and DL refer to the last-level cache, which is L2 or L3 depending on the machine.

I1mr is the number of instruction reads that missed in the I1 cache.

I1 1 5 3672, 3 1224, , , 3 I1 64 , , 128-192 , 3 -. , I1 5, .

KCachegrind cachegrind

: .

, 1224 , , , I1.

I1 32 512 ( 64 ). "8- " , 8 512 . - 32 , I1, . , 8 64- 8 . , 1 ( ), 8 32 (1 /32 ) 8 .

lwn.net

, ( ) (.. , ). GCC /, /, (, ).

I1 , .


Source: https://habr.com/ru/post/1772425/

