Speed difference between char and int arrays?

I am currently working on video processing software in which image data (8-bit, signed and unsigned) is stored in 16-byte-aligned arrays of int, allocated as

__declspec(align(16)) int *pData = (__declspec(align(16)) int *)_mm_malloc(width*height*sizeof(int),16); 

As a general rule, wouldn't reading and writing be faster if unsigned char arrays were used instead, like this:

__declspec(align(16)) unsigned char *pData = (__declspec(align(16)) unsigned char *)_mm_malloc(width*height*sizeof(unsigned char),16);

I don't know much about cache line sizes and data transfer optimization, but I at least know that they matter here. In addition, SSE will be used in the future, and in that case char arrays, unlike int arrays, are already in packed format. So which version will be faster?

+4
4 answers

If you plan to use SSE, storing the data in its native size (8 bits) is almost certainly the better choice: many operations can be performed without unpacking, and even if you need to unpack for pmaddwd or similar instructions, it is still faster because less data has to be loaded.
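As an illustration of an operation that needs no unpacking, here is a minimal sketch in C with SSE2 intrinsics. It adds a constant to 16 unsigned 8-bit pixels per iteration using saturating addition (_mm_adds_epu8, i.e. PADDUSB). The function name brighten_u8 and its parameters are hypothetical, and the buffer is assumed to be 16-byte aligned, as in the question.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add `delta` to every 8-bit pixel, clamping at 255.
   Assumes `pixels` is 16-byte aligned (e.g. obtained from _mm_malloc). */
void brighten_u8(unsigned char *pixels, size_t count, unsigned char delta)
{
    assert(((uintptr_t)pixels & 15) == 0);

    const __m128i vdelta = _mm_set1_epi8((char)delta);
    size_t i = 0;

    /* Main loop: 16 pixels per iteration, directly on the packed bytes. */
    for (; i + 16 <= count; i += 16) {
        __m128i v = _mm_load_si128((const __m128i *)(pixels + i));
        v = _mm_adds_epu8(v, vdelta);  /* unsigned saturating add */
        _mm_store_si128((__m128i *)(pixels + i), v);
    }

    /* Scalar tail for the remaining (count % 16) pixels. */
    for (; i < count; ++i) {
        unsigned sum = (unsigned)pixels[i] + delta;
        pixels[i] = (unsigned char)(sum > 255 ? 255 : sum);
    }
}
```

With an int array, the same loop would handle only 4 pixels per 16-byte register, and the clamping would have to be done separately.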

Even in scalar code, loading 8-bit or 16-bit values is no slower than loading 32-bit values, since movzx/movsx are as fast as mov. So you simply save memory, which certainly can't hurt.

+4

It really depends on your target CPU: you should read its specifications and run some tests, as everyone has already said. Many factors can affect performance. The first obvious one that comes to mind is that your array of ints is 2 to 4 times larger than an array of chars, so if the array is big enough you will get fewer data cache hits, which will definitely hurt performance.
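To put rough numbers on the size argument, the arithmetic can be sketched in C as follows. The helper names are hypothetical, and the 64-byte cache line used in the usage note is an assumption (common on x86, but check your CPU).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes occupied by one frame when each pixel takes `bytes_per_pixel` bytes. */
size_t frame_bytes(size_t width, size_t height, size_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Number of cache lines touched when `bytes` are read once from a buffer
   that starts on a cache-line boundary. `line_size` is an assumption
   about the target CPU. */
size_t cache_lines_touched(size_t bytes, size_t line_size)
{
    assert(line_size != 0);
    return (bytes + line_size - 1) / line_size;  /* round up */
}
```

For a 1920x1080 frame, frame_bytes gives 2,073,600 bytes with 1-byte pixels and 8,294,400 bytes with 4-byte ints, i.e. 32,400 versus 129,600 cache lines at 64 bytes per line. Whether that difference matters in practice depends on the cache sizes and the access pattern, so it should still be measured.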

0

On the contrary, packing and unpacking are expensive CPU instructions.

If you want to do a lot of random pixel operations, it is faster to make it an int array so that each pixel has its own address.

But if you are iterating sequentially through your image, you want a char array, so that it is small and reduces the chance of page faults (especially for large images).
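For reference, this is roughly what unpacking looks like with SSE2, sketched in C. The example widens 8-bit pixels to 16 bits with _mm_unpacklo_epi8/_mm_unpackhi_epi8 (PUNPCKLBW/PUNPCKHBW) and then uses _mm_madd_epi16 (PMADDWD) to compute a sum of squares. The function name sum_of_squares_u8 is hypothetical and the buffer is assumed to be 16-byte aligned. How costly the two unpack instructions are relative to the rest of the loop is something to measure on the target CPU.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum of squares of `count` unsigned 8-bit pixels.
   Assumes `pixels` is 16-byte aligned. */
uint64_t sum_of_squares_u8(const unsigned char *pixels, size_t count)
{
    assert(((uintptr_t)pixels & 15) == 0);

    const __m128i zero = _mm_setzero_si128();
    uint64_t total = 0;
    size_t i = 0;

    for (; i + 16 <= count; i += 16) {
        __m128i v  = _mm_load_si128((const __m128i *)(pixels + i));
        /* Unpack: interleave with zero to widen bytes to 16-bit lanes. */
        __m128i lo = _mm_unpacklo_epi8(v, zero);  /* pixels 0..7  */
        __m128i hi = _mm_unpackhi_epi8(v, zero);  /* pixels 8..15 */
        /* pmaddwd: multiply 16-bit lanes, add adjacent products (4 x 32-bit). */
        __m128i acc = _mm_add_epi32(_mm_madd_epi16(lo, lo),
                                    _mm_madd_epi16(hi, hi));
        /* Horizontal sum per iteration: simple rather than fast. */
        uint32_t lanes[4];
        _mm_storeu_si128((__m128i *)lanes, acc);
        total += (uint64_t)lanes[0] + lanes[1] + lanes[2] + lanes[3];
    }

    /* Scalar tail for the remaining pixels. */
    for (; i < count; ++i)
        total += (uint64_t)pixels[i] * pixels[i];

    return total;
}
```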

-1

Char arrays can be slower in some cases. As a very general rule of thumb, the native word size is the best choice, which will most likely be 4 bytes (32-bit) or 8 bytes (64-bit). Even better is to have everything aligned to 16 bytes, as you have already done... this allows faster copies if you use SSE (MOVNTA) instructions. If you are only concerned with moving the data around, this will have a much greater impact than the type used by the array...
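A minimal sketch of such an aligned copy in C follows. It assumes that "MOVNTA" refers to SSE non-temporal (streaming) stores and uses _mm_stream_si128 (MOVNTDQ) for that; this reading is an assumption, not something the answer states. The function name copy_aligned_stream is hypothetical, both pointers must be 16-byte aligned, and the size must be a multiple of 16.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy `bytes` bytes using aligned loads and
   non-temporal stores. Assumes 16-byte aligned pointers and a size
   that is a multiple of 16. */
void copy_aligned_stream(void *dst, const void *src, size_t bytes)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);
    assert(bytes % 16 == 0);

    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;

    for (size_t i = 0; i < bytes / 16; ++i)
        _mm_stream_si128(d + i, _mm_load_si128(s + i));

    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();
}
```

Streaming stores bypass the cache, so they tend to help only when the destination is large and not read again soon. The copy works the same way for char and int buffers, since only the byte count and the alignment matter.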

-1

Source: https://habr.com/ru/post/1277205/

