Matrix Computing in C

I recently noticed that seemingly small changes in how a matrix is accessed in C can have a big impact on performance. For example, consider these two pieces of C code. This one:

for(i = 0; i < 2048; i++)
{
    for(j = 0; j < 2048; j++) {
        Matrix[i][j] = 9999;
    }
}

And this one:

for(j = 0; j < 2048; j++)
{
    for(i = 0; i < 2048; i++) {
        Matrix[i][j] = 9999;
    }
}

The second version is about twice as slow as the first. Why? I think this is related to memory management: on each iteration, the first version accesses memory locations that are next to each other, while the second has to “jump” to a different region each time. Is this intuition right? Also, if I make the matrix small (e.g. 64x64), there is no difference in performance. Why? I would appreciate an explanation that is both intuitive and rigorous. By the way, I am using Ubuntu 14.04 LTS.

+4
for(i=0;i<2048;i++)
{
    for(j=0;j<2048;j++) {
        Matrix[i][j]=9999;
    }
}

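Your assumption is correct: this is related to the L1, L2 and L3 caches. When you iterate with j over Matrix[i][j], you access the elements Matrix[i][0], Matrix[i][1]... and so on, which sit at consecutive addresses (they differ by exactly sizeof(Matrix[i][0])), so accessing Matrix[i][0] also brings the next element, Matrix[i][1], into the cache.

In the other case, however,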

for(j=0;j<2048;j++)
{
    for(i=0;i<2048;i++) {
        Matrix[i][j]=9999;
    }
}

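you access the elements Matrix[0][j], Matrix[1][j]... and so on. The address of Matrix[1][j] is the address of Matrix[0][j] plus 2048*sizeof(Matrix[0][0]), assuming you allocated 2048 entries for the array Matrix[0].

So Matrix[0][j] lies in a different cache block than Matrix[1][j], and each access has to fetch a new block instead of reusing one that is already in the cache.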

+5

When you declare a 2D array:

uint8_t Matrix[4][4]

what you are really saying is:

allocate 16 bytes, and access them as a 2D array, 4x4

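The array is stored row by row, 4 bytes per row:

[Figure: 2D array in memory]

With a stride of 4, accessing [0][0], [1][0], [2][0],... jumps from row to row, so a single column is spread (in pieces) across all 16 bytes!

Accessing [0][0], [0][1], [0][2],... instead walks in order through one row of the 2D array, 4 adjacent bytes.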


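This is where the caches come in: L1, L2 and L3.

The memory hierarchy, from fastest to slowest:

  • L1 (smallest, fastest)
  • L2
  • L3 (?)
  • (beyond the caches: main memory, then HDD, each far larger and far slower).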
+3

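Your intuition is right. This comes down to the CPU cache (and to how it rewards access to neighbouring memory).

Main memory (that is, DRAM) is much slower than the L1 cache (which is typically split into an L1 data cache and an L1 I-cache for instructions).

You could (with care) use __builtin_prefetch (though GCC can sometimes do this itself, for example by emitting PREFETCH instructions). Using __builtin_prefetch badly can slow the code down.

And of course, compile with gcc -Wall -O2 -march=native, then benchmark (perhaps also trying -O3 instead of -O2...).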

+2


There are circuits in the computer that fetch the data adjacent to whatever is being read, because nearby data will most likely be read soon. You cannot control how these circuits work. All you can do is adapt your code to their behavior.

+2

Source: https://habr.com/ru/post/1688458/

