Comparison of algorithms in C, what's the difference?

#define IMGX 8192
#define IMGY 8192
int red_freq[256];
char img[IMGY][IMGX][3];

main(){ 

int i, j;
  long long total;
  long long redness;

  for (i = 0; i < 256; i++) 
    red_freq[i] = 0;

  for (i = 0; i < IMGY; i++) 
    for (j = 0; j < IMGX; j++) 
      red_freq[img[i][j][0]] += 1;

  total = 0;
  for (i = 0; i < 256; i++) 
    total += (long long)i * (long long)red_freq[i];

  redness = (total + (IMGX*IMGY/2))/(IMGX*IMGY); 

what's the difference when replacing the second cycle with

for (j = 0; j < IMGX; j++) 
    for (i = 0; i < IMGY; i++) 
      red_freq[img[i][j][0]] += 1;

everything else remains the same and why is the first algorithm faster than the second algorithm?

Does it have anything to do with memory allocation?

+3
source share
6 answers

The first version changes the memory in sequence, so it optimally uses the processor cache. The second version uses one value from each loaded cache line, so it uses pessimal to use the cache.

It is clear that the cache is divided into lines, each of which will contain many values ​​in the general structure.

(SIMD-), .

+8

, , , - . CPU, , , .

+5

, (, ) , "" ( ) , . , , , , , .

, , " " - , .

+2

- , , , , .

+1

, , . , .

(i * (IMGY * IMGX)) + (j * IMGX) + 0

(i * (IMGY * IMGX)) gets calculates 8192 times
(j * IMGX) gets calculated 8192 * 8192 times

(i * (IMGY * IMGX)) gets calculates 8192 * 8192 times
(j * IMGX) gets calculated 8192 times

(i * (IMGY * IMGX)) 

, .

+1

, - . img, 3 . ( , 4 ). . , , sizeof (char[IMGX][3]) , 24 . . , , . , . , , , , , . , , .

, , , . , Row-major Order . , C.

0
source

Source: https://habr.com/ru/post/1705541/


All Articles