Performance of nested vectors compared to contiguous arrays

Are there any reliable tests that clearly demonstrate the performance difference between reading and writing nested vectors compared to built-in C++ arrays? I have heard that nested (multidimensional) vectors usually carry some overhead compared to accessing elements of a single array (where all elements are stored in contiguous memory), but this seems purely hypothetical to me: I have not seen any tests that actually show the difference. Is it significant? I am sure it depends on the scenario, but as an inexperienced programmer I am not quite sure at what point these differences become significant.

+5
2 answers

It definitely depends on the scenario, so I don't think the question of which approach is fastest can be answered in general. The fastest approach will be the one whose access pattern has better data locality, and that depends heavily both on the access pattern itself and on how the structures are laid out in memory, which in the case of nested vectors depends on the allocator and probably differs slightly between compilers.

I would follow the general optimization rule: first write things in the simplest way, and only try to optimize once you can prove there is a bottleneck.

+5

Two things affect the timing difference between nested and flat arrays:
caching and indirection.

  • CPUs use a cache hierarchy because accessing RAM directly is slow. Caching exploits the fact that most memory accesses exhibit either spatial locality (addresses near a recently accessed one are likely to be accessed) or temporal locality (what was accessed recently is likely to be accessed again).
    This means that if the innermost arrays of your nested structure are reasonably large, you will notice little difference from a flat array, provided you iterate over the values in a contiguous manner: when iterating over a range of values, the inner loop for a flat array should walk sequential elements, and for a nested array the innermost loop should walk one innermost array.
  • If, however, your access pattern is random, indirection becomes the most important factor in the timing difference. For a flat array, you use something like A[(Z * M + Y) * N + X] , so you perform four arithmetic operations followed by a single memory access.
    For a nested array, you use A[Z][Y][X] , which actually performs three interdependent memory accesses: you need to know A[Z] before you can access A[Z][Y] , and so on. Due to the superscalar architecture of modern processors, operations that can be performed in parallel execute notably efficiently; interdependent operations cannot be overlapped this way. So on one side you have a few arithmetic operations plus one memory load, and on the other three interdependent loads, which is much slower. It is possible that for nested arrays the contents of A , as well as A[Z] for some values of Z , can be found in the cache hierarchy, but if your nested array is large enough it will never fit completely into the cache, so a single random access results in several cache misses and memory loads (nested) instead of just one cache miss and one load (flat).

Also see this question, especially the shorter answers there, for a more detailed discussion of caching (my answer) and indirection (Peter's answer); the latter also gives an example where there is no noticeable difference between nested and flat arrays (after fixing the indexing error, of course ;) ).

So, if you want to know if there are significant differences in execution time between them, I would answer:

  • If you are doing random access, you will definitely notice the extra indirections, which lead to longer execution times for nested arrays.

  • If you perform contiguous access and use the correct loop order (innermost loop = innermost array for nested, or last index for a flat array), and the innermost dimension of your multidimensional array is large enough, then the difference will be negligible, since the compiler will be able to hoist all the indirections out of the innermost loop.
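The loop-order rule above can be sketched as follows (a minimal example with made-up function names; both traversals walk memory sequentially in their innermost loop):

```cpp
#include <cstddef>
#include <vector>

// Sum a flat 2D array in cache-friendly order: the innermost loop walks
// consecutive addresses, so each cache line is fully used before eviction.
long long sum_flat(const std::vector<int>& a, std::size_t rows, std::size_t cols) {
    long long s = 0;
    for (std::size_t y = 0; y < rows; ++y)      // outer loop: rows
        for (std::size_t x = 0; x < cols; ++x)  // inner loop: last index
            s += a[y * cols + x];
    return s;
}

// Same rule for a nested vector: the innermost loop iterates over one
// innermost vector, whose elements are contiguous. The row indirection
// a[y] is resolved once per row, not once per element, which is exactly
// the hoisting the compiler can do when the loop order is right.
long long sum_nested(const std::vector<std::vector<int>>& a) {
    long long s = 0;
    for (const auto& row : a)  // one indirection per row
        for (int v : row)      // contiguous within the row
            s += v;
    return s;
}
```

With the loops swapped (column-major traversal of a row-major layout), both versions would jump across memory and lose the benefit of caching.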

+2

Source: https://habr.com/ru/post/1270980/
