Why do I see performance degradation when using row-major order?

Question

I have a piece of code that runs over a large matrix and calculates binned statistics per column, where the bin of each item is given in the vector binvec.

The code looks roughly like this:

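/* uint8_t comes from <stdint.h>, size_t from <stddef.h> */
for (size_t item = 0; item < items; item++) {
    uint8_t bin = binvec[item];              /* bin of this item (row) */
    for (size_t col = 0; col < columns; col++) {
        /* size_t rather than int: millions of items times hundreds of columns can overflow int */
        size_t idx = item * items_stride + col * cols_stride;
        uint8_t val = matrix[idx];
        float x = matrix2[idx];
        count[bin][val][col] += x;           /* accumulate per (bin, val, col) */
    }
}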

Assume the number of columns is known at compile time. The values in matrix have no specific structure or order; they are purely random. The data size is quite large: several million items and hundreds of columns.
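
For reference, here is a minimal, self-contained sketch of the loop as a function, assuming the column count is a compile-time constant. The names accumulate, COLUMNS, NVALS and nbins are hypothetical and only serve the illustration; the value 4 for COLUMNS is a placeholder, not the real size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COLUMNS 4   /* placeholder; the question only says it is known at compile time */
#define NVALS   256 /* number of distinct uint8_t values */

/* Accumulate matrix2 into count[bin][val][col], following the loop in the question.
 * items_stride and cols_stride select the memory layout of matrix and matrix2. */
static void accumulate(size_t items, size_t nbins,
                       const uint8_t *binvec,
                       const uint8_t *matrix, const float *matrix2,
                       size_t items_stride, size_t cols_stride,
                       float count[][NVALS][COLUMNS])
{
    for (size_t item = 0; item < items; item++) {
        uint8_t bin = binvec[item];
        assert(bin < nbins); /* sanity check: bin must index into count */
        for (size_t col = 0; col < COLUMNS; col++) {
            size_t idx = item * items_stride + col * cols_stride;
            uint8_t val = matrix[idx];
            count[bin][val][col] += matrix2[idx];
        }
    }
}
```

Because COLUMNS is a constant, the compiler knows the extent of the innermost dimension of count, and the inner loop has a fixed trip count.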

Looking at the code, I assumed that the best performance would be achieved if:

(1) matrix is row-major, for better cache locality.
(2) count is laid out as count[bin][col][val], so that the address computation for count[bin][col] can be optimized, which should simplify prefetching, etc.
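
To make the layout options for matrix concrete, here is a small sketch of how the two strides from the loop map to row-major and column-major storage. The type layout_t and the helper names are hypothetical, introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: matrix dimensions plus the two strides used in the loop. */
typedef struct {
    size_t items;        /* number of rows */
    size_t columns;      /* number of columns */
    size_t items_stride; /* distance between consecutive items in one column */
    size_t cols_stride;  /* distance between consecutive columns in one item */
} layout_t;

/* Row-major: all columns of one item are contiguous. */
static layout_t row_major(size_t items, size_t columns)
{
    layout_t l = { items, columns, columns, 1 };
    return l;
}

/* Column-major: all items of one column are contiguous. */
static layout_t col_major(size_t items, size_t columns)
{
    layout_t l = { items, columns, 1, items };
    return l;
}

/* Same index formula as in the inner loop of the question. */
static size_t matrix_index(layout_t l, size_t item, size_t col)
{
    assert(item < l.items && col < l.columns);
    return item * l.items_stride + col * l.cols_stride;
}
```

With row-major storage the inner loop over col walks consecutive elements of matrix; with column-major storage each step jumps by the number of items.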

In practice, with matrix and count laid out as in (1) and (2), performance degrades by 50%. Why?

optimization c caching simd nested-loops

Moti 23 . '17 10:35

tgregory · Accepted Answer · 2017-06-23T12:33:43+0000

With this layout of count, val is the middle index:

count[bin][val][col]

With the alternative layout, val is the last index:

count[bin][col][val]
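
As an illustration of how the two count layouts differ in addressing, the sketch below computes the flat element offset for each one. The function names and the ncols parameter are hypothetical; NVALS is 256 on the assumption that val is an 8-bit value, as in the question's code. It shows only the address arithmetic and makes no claim about which layout is faster.

```c
#include <assert.h>
#include <stddef.h>

#define NVALS 256 /* val is assumed to be 8 bits wide, so 256 possible values */

/* Flat element offset for the layout count[bin][val][col]. */
static size_t offset_val_col(size_t bin, size_t val, size_t col, size_t ncols)
{
    assert(val < NVALS && col < ncols);
    return (bin * NVALS + val) * ncols + col;
}

/* Flat element offset for the layout count[bin][col][val]. */
static size_t offset_col_val(size_t bin, size_t val, size_t col, size_t ncols)
{
    assert(val < NVALS && col < ncols);
    return (bin * ncols + col) * NVALS + val;
}
```

For a fixed bin and val, moving from col to col + 1 advances the offset by 1 element in count[bin][val][col] and by NVALS elements in count[bin][col][val]. Since val is random in the question's data, the actual distance between consecutive accesses also includes a random term in both layouts: a multiple of ncols in the first, and a value below NVALS in the second.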
