16-byte memory alignment using SSE instructions

I am trying to get rid of unaligned loads and stores for SSE instructions in my application by replacing

 _mm_loadu_ps() 

with

 _mm_load_ps() 

and by allocating memory with:

 float *ptr = (float *) _mm_malloc(h*w*sizeof(float),16) 

instead of:

 float *ptr = (float *) malloc(h*w*sizeof(float)) 

However, when I print the pointer addresses using:

 printf("%p\n", &ptr) 

I get the output:

 0x2521d20 0x2521d28 0x2521d30 0x2521d38 0x2521d40 0x2521d48 ... 

This is not 16-byte alignment, even though I used the _mm_malloc function. And when I use aligned load / store operations for SSE instructions, I get a segmentation fault, because the data is not aligned to 16 bytes.

Any ideas why they are not aligned correctly or any other ideas to fix this?

Thanks in advance!


Update

Using

 printf("%p\n",ptr) 

solved the memory alignment problem; the data really is correctly aligned.

However, I still get a segmentation fault when I use aligned load / store operations on this data, and I suspect this is a pointer problem.

When allocating memory:

 contents* instance; instance.values = (float *) _mm_malloc(h*w*sizeof(float),16); 

I have a structure with:

 typedef struct{ ... float** values; ... }contents; 

Later in the code, in another function that receives a pointer to the contents struct as an argument, I execute:

 __m128 tmp = _mm_load_ps(&contents.values); 

Do you guys see something that I am missing? Thanks for all the help so far :)

1 answer

Change:

 printf("%p\n", &ptr) 

to

 printf("%p\n", ptr) 

It is the memory pointed to by ptr that must be 16-byte aligned, not the pointer variable itself.
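To make the difference concrete, here is a minimal sketch, assuming GCC or Clang on x86 where <xmmintrin.h> also declares _mm_malloc and _mm_free. The helper names is_aligned_16, alloc_and_report and sum_first4 are hypothetical and only for illustration; they are not from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <xmmintrin.h> /* SSE intrinsics; assumed to also declare _mm_malloc/_mm_free */

/* Returns 1 if the address p is a multiple of 16, otherwise 0. */
static int is_aligned_16(const void *p) {
    return ((uintptr_t)p % 16) == 0;
}

/* Hypothetical helper: allocates n floats on a 16-byte boundary and prints
   both the buffer address (ptr) and the address of the variable (&ptr). */
static float *alloc_and_report(size_t n) {
    float *ptr = (float *)_mm_malloc(n * sizeof(float), 16);
    if (ptr == NULL) {
        return NULL;
    }
    /* This is the address that _mm_load_ps cares about. */
    printf("buffer   (ptr):  %p aligned=%d\n", (void *)ptr, is_aligned_16(ptr));
    /* This is just where the pointer variable itself is stored. */
    printf("variable (&ptr): %p\n", (void *)&ptr);
    return ptr;
}

/* Hypothetical helper: sums the first four floats with an aligned load.
   The buffer must be 16-byte aligned and hold at least four floats. */
static float sum_first4(const float *p) {
    assert(is_aligned_16(p));
    __m128 v = _mm_load_ps(p); /* pass the buffer pointer, not its address */
    float out[4];
    _mm_storeu_ps(out, v);     /* unaligned store into a plain local array */
    return out[0] + out[1] + out[2] + out[3];
}
```

Calling alloc_and_report(h*w) and then passing the returned pointer (not its address) to sum_first4 should go through without a fault; release the buffer with _mm_free, not free.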


Source: https://habr.com/ru/post/1386082/

