Fast memory access in C++?

What should be considered, in terms of fast memory access, when developing a game in C++?

Should loadable data be allocated statically, and should I pack it into one contiguous block of memory?

Also, how do I organize variables within structures to improve performance?

+6
source share
6 answers

Memory performance is an extremely vague topic.

I think what you are looking for is CPU cache handling, since there is a factor of roughly 10 between a cache access and a main-memory access.

For a complete reference on the underlying mechanisms of the cache, you can read the wonderful series of articles by Ulrich Drepper on lwn.net .

In short:

Aim for locality

You should not jump around in memory, so try (if possible) to group together the items that will be used together.

Aim for predictability

If your memory accesses are predictable, the processor will most likely prefetch the memory for the next piece of work, so that it is available immediately (or shortly) after the current piece completes.

A typical example is a for loop on arrays:

 for (int i = 0; i != MAX; ++i)
     for (int j = 0; j != MAX; ++j)
         array[i][j] += 1;

Change array[i][j] += 1; to array[j][i] += 1; and the performance changes ... at low optimization levels ;)

The compiler should catch such obvious cases, but some are more insidious. For example, using node-based containers (linked lists, binary search trees) instead of array-based containers (vector, some hash tables) can slow down the application.

Do not waste your space ... beware of false sharing

Try to pack your structures. This is related to alignment: padding inserted to satisfy alignment requirements can artificially inflate the structure size and waste cache space.

A typical rule of thumb is to order the members of a structure by decreasing size (use sizeof ). It is simplistic, but it works well. If you know more about sizes and alignments, just avoid holes :) Note: this only matters for structures with many instances ...

However, beware of false sharing. In multi-threaded programs, concurrent access to two variables that are close enough to share a cache line is expensive, because it involves a lot of cache-line invalidation and processors fighting over ownership of the line.

Profile

Unfortunately, all of this is difficult to diagnose by hand.

If you are developing on Unix, Callgrind (part of the Valgrind suite) can be run with cache simulation to identify the parts of the code that trigger cache misses.

I suppose there are other tools; I have just never used them.

+11
source

Don't worry about it. Such things are likely to be micro-optimizations of the smallest kind. Get it running first; if it is too slow, find out which parts are slow and optimize those (hint: it is more likely to be library calls and the like than raw memory access).

+7
source

I agree with the earlier answers. You should write your game, then find out where the time is being spent, and try to improve there.

However, in the spirit of providing some potentially useful [and potentially distracting from the real problems :-)] tips, here are some common pitfalls you may encounter:

  • Function pointers and virtual methods provide greater design flexibility, but if they are used very, very often, you will find them slower than calls that can be inlined. This is mainly because it is harder for the CPU to do branch prediction for calls through a function pointer. A good mitigation for this in C++ is to use templates, which can give you similar design flexibility at compile time.

    One potential drawback of this approach is that inlining will increase your code size. The good news is that your compiler decides whether or not to inline, and it can probably make better decisions about this than you can. In many cases, your optimizer knows your specific processor architecture and can make good trade-offs for it.

  • Avoid indirection in commonly used data structures.

For instance:

 struct Foo {
     // [snip] other members here...
     Bar *barObject; // pointer to a separate allocation owned by Foo
 };

can sometimes create less efficient memory layouts than this:

 struct Foo {
     // [snip] other members here...
     Bar barObject; // object is part of Foo, with no indirection
 };

This may seem trivial, and in most cases you will not notice any difference. But the general idea is that "unnecessary indirection" is something to avoid. Do not go too far out of your way to do this, but it is worth keeping in mind.

One potential drawback of this approach is that it can make your Foo objects less cache-friendly, since each one becomes larger ...

  • Along the lines of the previous two bullets ... in C++, containers and the STL algorithms can produce fairly efficient object-oriented code. With <algorithm> , the functor you pass to the various algorithms can easily be inlined, which helps avoid unnecessary indirect calls while keeping the routines generic. With containers, the STL can declare objects of type T directly inside list nodes, etc., which helps avoid unnecessary indirection in the data structures.

  • Yes, memory access patterns can make a difference ... An example would be looping through the pixels of a large image. Processing the image one column at a time can be worse than processing it one row at a time: in the most common image formats, the pixel at (x, y) is adjacent in memory to the pixel at (x + 1, y), while the pixel at (x, y + 1) is usually (width) pixels away from (x, y).

  • Along the lines of the second bullet: once, working on an image-manipulation project (admittedly on hardware that is old by today's standards), I saw that even the arithmetic involved in computing a pixel's location mattered. For example, if you deal in (x, y) coordinates, it is intuitive to reference the pixel as buf[y * bytes_per_line + x] . If your processor is slow at multiplication and the image is large, this can add up. In that case it is better to walk row by row with a pointer than to keep recomputing the location from (x, y) for each coordinate pair.

Of course, the overall design of your game should drive your early decisions, and measurement should guide your performance improvements. You should not go out of your way to implement these points if it keeps you from doing the "real work" or makes the design harder to understand. These bullets are just meant to give some examples of where you might run into problems, and to provide some context for what can cause performance problems in practice, alongside other measures such as algorithmic complexity.

+1
source

Looking for a solution before the problem occurs is not productive.

It is better to focus on your design and leave those details for later; who knows, maybe you will not have any performance problems at all thanks to a good overall design.

0
source
Memory does not have to be contiguous. If you can halve the amount of memory you use, that may help a little.

In terms of structure layout, you should store the bytes together, then the shorts together, and so on. Otherwise the compiler will waste memory by inserting padding to align the smaller members (bytes and shorts) on their natural boundaries.

One more tip: if you use a class, you can put it on the stack rather than allocating it with new.

I mean:

 CmyClass x;
 // instead of
 CmyClass *px = new CmyClass;
 ...
 delete px;

** edit: When you call new() or malloc, you call into the C++ heap; sometimes the heap can return a new block of memory in a few cycles, sometimes it cannot. When you declare a class on the stack, you still use the same amount of memory (possibly more), but the class is simply pushed onto the stack and no function calls are required. Ever. When the function completes, the stack is unwound and that memory is reclaimed.

0
source

An address read from the cache is much faster than one read from main memory. Therefore, try to keep the addresses you read from as close to each other as possible.

For example, when creating a linked list, you are probably better off allocating one large block for all of your nodes (which can then be laid out more or less in order) than calling malloc separately for each node (which can fragment your data structure badly)

0
source

Source: https://habr.com/ru/post/893164/
