Memory performance figures are extremely uncertain. I think what you are actually looking for is the CPU cache, since there is a factor of roughly 10 between a cache access and a main-memory access.
For a complete reference on the underlying mechanics of the cache, you can read Ulrich Drepper's wonderful series of articles ("What Every Programmer Should Know About Memory") on lwn.net.
In short:
Aim for locality
You should avoid jumping around in memory, so try (if possible) to group together the elements that are used together.
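As a hypothetical sketch of that grouping (the particle struct and field names below are made up for illustration), a struct-of-arrays layout keeps the data the hot loop actually uses contiguous, instead of interleaving it with rarely touched fields:

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch: keep together the data the hot loop uses.
// In a struct-of-arrays layout the x coordinates are contiguous, so a
// scan over them touches consecutive cache lines and nothing else.
struct ParticlesSoA {
    std::vector<float> x, y, z;  // hot data, scanned every frame
    // cold per-particle data (names, debug info, ...) would live elsewhere
};

float sum_x(const ParticlesSoA& p) {
    float s = 0.0f;
    for (float v : p.x) s += v;  // sequential access, prefetcher-friendly
    return s;
}
```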
Aim for predictability
If your memory accesses are predictable, the processor will most likely prefetch the memory for the next chunk of work, so that it is available immediately, or shortly after, the current chunk is finished.
A typical example is a for loop on arrays:
for (int i = 0; i != MAX; ++i)
    for (int j = 0; j != MAX; ++j)
        array[i][j] += 1;
Change array[i][j] += 1; to array[j][i] += 1; and the performance changes... at low optimization levels ;)
The compiler should catch the obvious cases, but some are more insidious. For example, using node-based containers (linked lists, binary search trees) instead of array-based containers (vector, some hash tables) can slow an application down.
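To illustrate that point with a small sketch, the two functions below compute the same sum; the vector version streams through contiguous memory, while the list version chases pointers to heap nodes that may be scattered:

```cpp
#include <cassert>
#include <list>
#include <numeric>
#include <vector>

// Same algorithm over two layouts. std::vector stores its elements
// contiguously; std::list allocates one node per element, so a
// traversal follows pointers and has poor spatial locality.
int sum_vector(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0);
}

int sum_list(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), 0);
}
```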
Don't waste space... beware of false sharing
Try to pack your structures. Because of alignment, padding holes inside a structure can artificially inflate its size and waste cache space.
A typical rule of thumb is to order the members of a structure by decreasing size (use sizeof). It is dumb, but it works well. If you know more about sizes and alignments, just avoid the holes :) Note: this is only useful for structures that have many instances...
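A small sketch of that rule (the member names are made up; the byte counts in the comments assume a typical 64-bit ABI where uint64_t has 8-byte alignment):

```cpp
#include <cassert>
#include <cstdint>

// Careless order: padding is inserted after `flag` (to align `value`)
// and after `tag` (to pad the struct out to its own alignment).
struct Careless {
    std::uint8_t  flag;   // 1 byte + 7 bytes padding
    std::uint64_t value;  // 8 bytes
    std::uint8_t  tag;    // 1 byte + 7 bytes tail padding -> 24 bytes total
};

// Members sorted by decreasing size: the two small fields share one slot.
struct Sorted {
    std::uint64_t value;  // 8 bytes
    std::uint8_t  flag;   // 1 byte
    std::uint8_t  tag;    // 1 byte + 6 bytes tail padding -> 16 bytes total
};
```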
However, beware of false sharing. In multi-threaded programs, concurrent access to two variables that are close enough to share the same cache line is expensive, because it triggers a lot of cache-line invalidation and arbitration between processors for ownership of the line.
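A minimal sketch of avoiding it, assuming a 64-byte cache line (the common size on x86-64; C++17 also offers std::hardware_destructive_interference_size as a portable constant):

```cpp
#include <atomic>
#include <cassert>

// Two counters updated by different threads. Without padding they would
// likely sit on the same cache line, and that line would ping-pong
// between cores. alignas(64) gives each counter its own (assumed
// 64-byte) cache line.
struct Counters {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};
```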
Profile
Unfortunately, this is difficult to determine.
If you program on Unix, Callgrind (part of the Valgrind suite) can be run with cache simulation enabled and will pinpoint the parts of the code that trigger cache misses.
I suppose there are other tools; I have just never used them.