Dynamic allocation of objects with aligned members - possible solutions?

I am considering using SSE to speed up some code in my project. This usually requires 16-byte alignment of the data I'm working on. For static allocation, I suppose __declspec(align(16)) solves the problem, but my question is: what is the best way to make sure this also holds for dynamic allocations? Especially in cases where the allocated object does not require alignment itself, but has members that do (which makes it much easier to forget to align it correctly). I came up with the following solutions:

  • Always assume that any data that is not statically allocated may be unaligned, and use unaligned load instructions. From what I have read, this is slow, and in that case I might as well not bother with SSE at all. I could implement this and measure it, but I would rather ask about better solutions before investing that much work only to find out that it is not worth it or that there is another way.

  • Be very careful and use only _aligned_malloc / _aligned_free to allocate any object that requires alignment, and any object that has such objects as members. This is probably very easy to forget and therefore error-prone.

  • Overload new / delete globally and / or create custom malloc / free functions that align memory, then use them for everything. However, literally aligning everything that is dynamically allocated is probably not the best idea.

  • Create a base class with overloaded new / delete operators, then make sure that any class that requires alignment, and any class that has such classes as members, inherits from it. Then just use new / delete for most / all dynamic allocations. Probably less error-prone than option 2.

  • Is there some other way that I have not thought of or do not know about?

Options 1-3 are probably not the best ideas. What about 4? Am I mistaken about anything I mentioned? Suggestions, opinions, useful links on this topic?

Thanks in advance:)

+4
2 answers

On 64-bit Windows, malloc returns memory aligned to 16 bytes ( msdn ); 32-bit builds only guarantee 8 bytes. If malloc on your platform guarantees less than 16-byte alignment, you need to use the aligned versions of malloc for objects used with SSE.
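To see what an allocation actually gives you, you can test the returned pointer, and request the alignment explicitly where 16 bytes is not guaranteed. The following is only a minimal sketch under some assumptions: `is_aligned`, `sse_alloc` and `sse_free` are hypothetical names made up for illustration, and the non-Windows branch assumes a POSIX system that provides `posix_memalign`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdlib.h>      // posix_memalign, free (POSIX branch)
#if defined(_WIN32)
#include <malloc.h>      // _aligned_malloc, _aligned_free
#endif

// Returns true if p is a multiple of `alignment`.
// `alignment` must be a non-zero power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Hypothetical helper: allocate `size` bytes aligned to 16 bytes.
// Returns nullptr on failure.
inline void* sse_alloc(std::size_t size) {
#if defined(_WIN32)
    return _aligned_malloc(size, 16);
#else
    void* p = nullptr;
    return posix_memalign(&p, 16, size) == 0 ? p : nullptr;
#endif
}

// Hypothetical helper: release memory obtained from sse_alloc.
inline void sse_free(void* p) {
#if defined(_WIN32)
    _aligned_free(p);    // must pair with _aligned_malloc
#else
    free(p);             // posix_memalign memory is released with free
#endif
}
```

Calling `is_aligned(malloc(n), 16)` on your target tells you whether plain malloc happens to be enough there, but a passing check is not a guarantee; the explicit helpers are the safer route.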

EDIT: if you have a specific class of objects that need SSE support, you can overload operator new / delete for that class only.
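A rough sketch of what that could look like, written as a small base class so it also covers the "class that only contains aligned members" case from the question. Everything here is an assumption for illustration: the names `Aligned16`, `Vec4` and `Particle` are made up, `alignas(16)` (C++11) stands in for `__declspec(align(16))`, and the non-Windows branch assumes `posix_memalign` is available.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>           // std::bad_alloc
#include <stdlib.h>      // posix_memalign, free (POSIX branch)
#if defined(_WIN32)
#include <malloc.h>      // _aligned_malloc, _aligned_free
#endif

// Hypothetical mix-in: any class deriving from it is allocated on a
// 16-byte boundary when created with new / new[].
struct Aligned16 {
    static void* operator new(std::size_t size)   { return allocate(size); }
    static void* operator new[](std::size_t size) { return allocate(size); }
    static void operator delete(void* p) noexcept   { release(p); }
    static void operator delete[](void* p) noexcept { release(p); }

private:
    static void* allocate(std::size_t size) {
#if defined(_WIN32)
        void* p = _aligned_malloc(size, 16);
#else
        void* p = nullptr;
        if (posix_memalign(&p, 16, size) != 0) p = nullptr;
#endif
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void release(void* p) {
#if defined(_WIN32)
        _aligned_free(p);
#else
        free(p);
#endif
    }
};

// A type that needs 16-byte alignment for SSE loads and stores.
struct alignas(16) Vec4 : Aligned16 {
    float v[4];
};

// A type that only *contains* an aligned member. The operators of Vec4
// are not used when allocating a Particle, so it has to derive from the
// mix-in as well.
struct Particle : Aligned16 {
    int  id;
    Vec4 position;
};
```

With this in place, `new Particle` and `new Vec4[n]` go through the aligned allocator, while objects of unrelated classes keep using the default one. The weak spot is the one the question already names: a class that holds a `Vec4` but forgets to derive from `Aligned16` silently falls back to the default operator new.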

+3

Not sure if this is practical for your purposes, but you can use the Doug Lea allocator and define the MALLOC_ALIGNMENT macro to suit your needs (up to 128 bytes).

You don't even need to replace the default allocator: you can use the Doug Lea-specific dlmalloc and dlfree only for your SSE needs and keep using the default allocator for everything else.

0

Source: https://habr.com/ru/post/1395523/

