Effective Access Matrix Columns

Effective access issue: I need to access a large matrix column (more than 2000x2000), for my algorithm I need a pass of 1 row and 1 column. Line skipping is good for memory efficiency (cache miss), but how to reduce cache skipping in a column pass? I need efficiency.

The only thing I had was: declare n a local variable (based on memory sample size),

int a1, a2, a3, a4; for ( int j = 0 ; j < DIM_Y ; j+=4 ) for ( int i = 0 ; i < DIM_X ; i++ ) a1 = matrix[i][j]; ... ; a4 = matrix[i][j+4]; // make the column processing on the 4 variables.

This is in C or C ++, as well as in an array of either int or char.

Any suggestions or comments are welcome.

Thanks.

0
source share
2 answers

Two main methods are used:

1) cycle lock

Instead

  for (j=0;j<2000;j++) for (i=0;i<2000;i++) process_element(i,j); 

use

 for (j=0;j<2000;j+=8) for (i=0;i<2000;i+=8) process_block_of_8x8(i,j); 

2) invalidity of the 2nd row (for example, 8192 bytes + 64) - pad if necessary

in this case line [i] .. line [i + 7] will not fight for the same cache line

data should be in the area of โ€‹โ€‹continuous memory with a manually calculated addition.

+1
source

An efficient way to store a 2D matrix uses a C style array like this:

 | a11 a12 a13 | | a21 a22 a23 | -> memory: [a11,a12,a13,a21,a22,a23,a31,a32,a33] | a31 a32 a33 | Element(i,j) = memory[N_COL*i+j] 

where i is the index of the row number starting at 0 , and j index of the column number also starting at 0 and N_COL number of columns.

Hopefully the / jit compiler is about to put all the values โ€‹โ€‹sequentially into memory for quick access. Usually, the more you try to trick the compiler (for example, manually unrolling the loop), the more you feel in performance. Write clean code and let the compiler do its job.

0
source

Source: https://habr.com/ru/post/985195/


All Articles