How to avoid ruining the cache when working with long vectors in memory?

Premise

I want to do some kind of computation involving k long vectors of data (each of length n), which I receive in main memory, and write some kind of result back to main memory. For simplicity, assume the computation is merely

for(i = 0; i < n; i++)
    v_out[i] = foo(v_1[i],v_2[i], ... ,v_k[i])

or maybe

for(i = 0; i < n; i++)
    v_out[i] = bar_k(...bar_2(bar_1(v_1[i]),v_2[i]), ... ),v_k[i])

(This is not code, it is pseudocode.) foo() and bar_i() are functions with no side effects. k is a constant (known at compile time); n is known only just before this computation, and it is relatively large: at least several times larger than the entire L2 cache, and possibly more.
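To make the two pseudocode loops concrete, here is a minimal C++ sketch for k = 3. The bodies of foo(), bar_1(), bar_2() and bar_3() are hypothetical stand-ins (the question only says they are cheap and have no side effects), and float is an assumed element type.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for foo(): cheap and free of side effects.
inline float foo(float a, float b, float c) { return a * b + c; }

// Hypothetical stand-ins for bar_1, bar_2, bar_3 in the chained form.
inline float bar_1(float a) { return a + 1.0f; }
inline float bar_2(float acc, float b) { return acc * b; }
inline float bar_3(float acc, float c) { return acc - c; }

// First form: v_out[i] = foo(v_1[i], v_2[i], v_3[i]).
std::vector<float> apply_foo(const std::vector<float>& v_1,
                             const std::vector<float>& v_2,
                             const std::vector<float>& v_3) {
    assert(v_1.size() == v_2.size() && v_2.size() == v_3.size());
    const std::size_t n = v_1.size();
    std::vector<float> v_out(n);
    for (std::size_t i = 0; i < n; ++i)
        v_out[i] = foo(v_1[i], v_2[i], v_3[i]);
    return v_out;
}

// Second form: v_out[i] = bar_3(bar_2(bar_1(v_1[i]), v_2[i]), v_3[i]).
std::vector<float> apply_bars(const std::vector<float>& v_1,
                              const std::vector<float>& v_2,
                              const std::vector<float>& v_3) {
    assert(v_1.size() == v_2.size() && v_2.size() == v_3.size());
    const std::size_t n = v_1.size();
    std::vector<float> v_out(n);
    for (std::size_t i = 0; i < n; ++i)
        v_out[i] = bar_3(bar_2(bar_1(v_1[i]), v_2[i]), v_3[i]);
    return v_out;
}
```

Either loop reads k input streams and writes one output stream per iteration, which is the access pattern the rest of the question is about.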

[...] x86_64 (Intel or AMD [...]). [...] foo() (resp. bar_i()) [...] n (resp. k x n) invocations of foo() (resp. bar_i()).

[...]:

  • [...] bar_j(...bar_1(v_1[i])...) [...] L1 [...] v_{j+1}[i] ... v_k[i] [...] L2.
  • [...] L1 [...] L2.
  • [...] v_out [...].

[...]:

  • bar_i [...] w.r.t. v_out.

v_1[i], v_2[i], ..., v_k[i] [...] k [...].

[...] L1 [...].

[...] _mm_prefetch [...].
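Since `_mm_prefetch` is named here, this is a minimal sketch of how such a call is commonly placed in a streaming loop. It is not the answerer's code: the prefetch distance `kPrefetchDistance`, the helper `foo2()`, the `PREFETCH_T0` macro and the two-input shape are all assumptions made for illustration. Hardware prefetchers usually handle plain sequential streams well already, so any benefit should be measured rather than assumed.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>  // _mm_prefetch, _MM_HINT_T0
#define PREFETCH_T0(p) _mm_prefetch(reinterpret_cast<const char*>(p), _MM_HINT_T0)
#else
#define PREFETCH_T0(p) ((void)0)  // no-op on targets without this intrinsic
#endif

// Hypothetical tuning knob: how many elements ahead to request.
constexpr std::size_t kPrefetchDistance = 64;

// Hypothetical cheap, side-effect-free function of two inputs.
inline float foo2(float a, float b) { return a + 2.0f * b; }

void apply_with_prefetch(const float* v_1, const float* v_2,
                         float* v_out, std::size_t n) {
    assert(v_1 != nullptr && v_2 != nullptr && v_out != nullptr);
    for (std::size_t i = 0; i < n; ++i) {
        // Request data that will be needed kPrefetchDistance iterations later.
        if (i + kPrefetchDistance < n) {
            PREFETCH_T0(v_1 + i + kPrefetchDistance);
            PREFETCH_T0(v_2 + i + kPrefetchDistance);
        }
        v_out[i] = foo2(v_1[i], v_2[i]);
    }
}
```

A prefetch only needs to be issued once per cache line, so a tuned version would typically unroll the loop by the number of elements per line instead of prefetching on every iteration.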

[...] CPU [...] k [...] k x n [...].

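// Row-major n x k matrix: row i holds v_0[i] ... v_{k-1}[i] contiguously.
// Note: in C11, the size passed to aligned_alloc should be a multiple of the alignment.
type* pMat = (type*)aligned_alloc(CACHE_LINE_SIZE, n * k * sizeof(type));
// Element i of each logical vector maps to:
//   v_0[i]     -> pMat[i * k + 0]
//   v_1[i]     -> pMat[i * k + 1]
//   ...
//   v_{k-1}[i] -> pMat[i * k + (k - 1)]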
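The pMat indexing above can be sketched as a small packing routine. This is an illustration under stated assumptions, not the answerer's implementation: `K = 3`, `float` elements, a 64-byte `CACHE_LINE_SIZE`, and the names `interleave`, `AlignedBuffer` and `FreeDeleter` are all hypothetical. It uses C++17 `std::aligned_alloc`, which is not available on every toolchain.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>

constexpr std::size_t CACHE_LINE_SIZE = 64;  // assumption: typical for x86_64
constexpr std::size_t K = 3;                 // hypothetical compile-time k

// Memory from std::aligned_alloc must be released with std::free.
struct FreeDeleter {
    void operator()(float* p) const { std::free(p); }
};
using AlignedBuffer = std::unique_ptr<float[], FreeDeleter>;

// Packs K separate vectors of length n into one row-major n x K matrix,
// so that row i holds v[0][i], v[1][i], ..., v[K-1][i] side by side.
AlignedBuffer interleave(const std::array<const float*, K>& v, std::size_t n) {
    // Round the byte count up to a multiple of the alignment.
    std::size_t bytes = n * K * sizeof(float);
    bytes = (bytes + CACHE_LINE_SIZE - 1) / CACHE_LINE_SIZE * CACHE_LINE_SIZE;
    if (bytes == 0) bytes = CACHE_LINE_SIZE;

    AlignedBuffer pMat(
        static_cast<float*>(std::aligned_alloc(CACHE_LINE_SIZE, bytes)));
    assert(pMat != nullptr);

    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < K; ++j)
            pMat[i * K + j] = v[j][i];
    return pMat;
}
```

Packing costs a full extra pass over the data, so this layout is most attractive when the inputs can be produced in interleaved form to begin with or are reused across several computations.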

[...] v_0, ..., v_k [...] SIMD [...].

[...] transcendental [...].

[...] v_out [...].

[...] (_mm_prefetch).


[...] k [...] 1 [...] k. [...]

struct VectorData
{
    Type1 Var1;
    Type2 Var2;
    // ...
    TypeK VarK;
};

std::vector<VectorData> v_in;

for (std::size_t i = 0; i < n; i++) {
    // All k inputs for element i sit next to each other in memory.
    v_out[i] = foo(v_in[i].Var1, v_in[i].Var2, ... , v_in[i].VarK);
    // Or just pass the whole element instead:
    // v_out[i] = foo(v_in[i]);
}
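A compilable instance of the struct-per-element layout above might look like the following. The three field types, the body of `foo()` and the name `apply_aos` are hypothetical choices for illustration; the original leaves `Type1` ... `TypeK` unspecified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record with k = 3 fields of different types.
struct VectorData {
    float        Var1;
    std::int32_t Var2;
    double       Var3;
};
// With these field types, four records fit in one 64-byte cache line.
static_assert(sizeof(VectorData) == 16, "unexpected padding in VectorData");

// Hypothetical cheap, side-effect-free foo() taking the whole element.
inline double foo(const VectorData& e) { return e.Var1 * e.Var2 + e.Var3; }

std::vector<double> apply_aos(const std::vector<VectorData>& v_in) {
    std::vector<double> v_out(v_in.size());
    for (std::size_t i = 0; i < v_in.size(); ++i)
        v_out[i] = foo(v_in[i]);  // one sequential input stream instead of k
    return v_out;
}
```

With this layout the loop walks a single input stream, at the cost of making per-field SIMD loads less direct than with k separate arrays.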

Source: https://habr.com/ru/post/1584734/

