It is well known that unit-stride memory access is best for performance.
In situations where
- I have to access one region of memory for reading,
- I have to access another region for writing, and
- I can access only one of the two regions with unit stride,
should I prefer unit-stride reads or unit-stride writes?
One simple, concrete example is a BLAS-like copy-and-permute operation such as y := P x. The permutation matrix P is completely determined by some permutation vector q(i), which has a corresponding inverse permutation vector qinv(i). You can code the required loop either as y[qinv(i)] = x[i] or as y[i] = x[q(i)], where the first reads from x with unit stride and the second writes to y with unit stride.
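To make the two variants concrete, here is a minimal sketch in C. The function names permute_scatter and permute_gather, the double element type, and the size_t index arrays are my own assumptions for illustration; they do not come from any BLAS interface. The sketch also assumes that q and qinv are true inverses of each other (q[qinv[i]] == i) and that x and y do not overlap.

```c
#include <stddef.h>

/* Scatter form: x is read with unit stride, y is written through qinv.
 * Assumes qinv is a valid permutation of 0..n-1 and x, y do not overlap. */
void permute_scatter(size_t n, const size_t *qinv, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[qinv[i]] = x[i];
}

/* Gather form: x is read through q, y is written with unit stride.
 * Assumes q is a valid permutation of 0..n-1 and x, y do not overlap. */
void permute_gather(size_t n, const size_t *q, const double *x, double *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = x[q[i]];
}
```

Both functions should produce the same y for a matching q and qinv pair; they differ only in which array sees the irregular access pattern.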
Ideally, you could always code both possibilities, profile them under representative conditions, and choose the faster version. Suppose you could code only one version: which access pattern would you expect to be faster, based on the behavior of modern memory architectures? Would your answer change in a streaming environment?