Should I prefer stride-one memory access for reading or for writing?

It is well known that stride-one memory access is best for performance.

In situations where

  • I must access one region of memory for reading,
  • I must access another region for writing, and
  • I can access only one of the two regions with stride one,

should I prefer stride-one reads or stride-one writes?

One simple, concrete example is a BLAS-like copy-and-permute operation such as y := P x. The permutation matrix P is completely determined by some permutation vector q(i), which has a corresponding inverse permutation vector qinv(i). You can code the required loop either as y[qinv(i)] = x[i] or as y[i] = x[q(i)], where the former reads from x with stride one and the latter writes to y with stride one.
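As a minimal sketch, the two loop forms might look like this in C. The function names, the use of double elements, and the size_t index arrays are assumptions made for illustration, not part of the question; qinv is assumed to satisfy qinv[q[i]] == i.

```c
#include <stddef.h>

/* Scatter form: reads x with stride one, writes y through qinv. */
void permute_scatter(double *y, const double *x, const size_t *qinv, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[qinv[i]] = x[i];
}

/* Gather form: writes y with stride one, reads x through q. */
void permute_gather(double *y, const double *x, const size_t *q, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = x[q[i]];
}

/* Hypothetical helper: build qinv from q so that qinv[q[i]] == i. */
void invert_permutation(size_t *qinv, const size_t *q, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        qinv[q[i]] = i;
}
```

With q = {2, 0, 3, 1} and x = {10, 20, 30, 40}, both forms produce y = {30, 10, 40, 20}; only the memory access pattern differs.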

Ideally, you could always code both possibilities, profile them under representative conditions, and choose the faster version. Suppose you could code only one version: which access pattern would you expect to be faster, based on the behavior of modern memory architectures? Does your answer change in a multithreaded environment?
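Profiling both forms could be sketched roughly as follows. This is an illustrative harness, not a rigorous benchmark: the helper names are hypothetical, clock() measures CPU time at coarse resolution, and rand() may not cover very large n on platforms with a small RAND_MAX.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill q with a pseudo-random permutation of 0..n-1 (Fisher-Yates shuffle). */
void random_permutation(size_t *q, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        q[i] = i;
    for (size_t i = n; i > 1; --i) {
        size_t j = (size_t)rand() % i;  /* slight modulo bias, acceptable here */
        size_t t = q[i - 1];
        q[i - 1] = q[j];
        q[j] = t;
    }
}

/* Run `reps` passes of one loop form and return elapsed CPU seconds.
   gather != 0 selects y[i] = x[idx[i]]; otherwise y[idx[i]] = x[i]. */
double time_permute(double *y, const double *x, const size_t *idx,
                    size_t n, int reps, int gather)
{
    clock_t start = clock();
    for (int r = 0; r < reps; ++r) {
        if (gather) {
            for (size_t i = 0; i < n; ++i)
                y[i] = x[idx[i]];
        } else {
            for (size_t i = 0; i < n; ++i)
                y[idx[i]] = x[i];
        }
    }
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

To compare the two, call time_permute once with q and gather set to 1, and once with the inverse permutation and gather set to 0, using arrays much larger than the last-level cache. An optimizing compiler may restructure the repeated passes, so it is worth inspecting the generated code before trusting the numbers.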

1 answer

The access pattern you call "stride-one writes" (y[i] = x[q(i)]) is usually faster.

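If the memory is cached and your data elements are smaller than a cache line, this access pattern requires less memory bandwidth. A scattered write that touches only part of a line usually forces the cache to read the whole line in before modifying it and to write it back later, whereas a scattered read costs only the line fetch.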

Modern processors typically have more load units than store units. In addition, Intel's next architecture, Haswell, supports only a GATHER instruction; SCATTER is not yet in their plans. All of this also favors the "stride-one writes" pattern.

Working in a multithreaded environment does not change this.


Source: https://habr.com/ru/post/906951/