SSE Access Violation

I have the following code:

float *mu_x_ptr;
__m128 *tmp;
__m128 *mm_mu_x;

mu_x_ptr = _aligned_malloc(4*sizeof(float), 16);
mm_mu_x = (__m128*) mu_x_ptr;
for(row = 0; row < ker_size; row++) {
    tmp = (__m128*) &original[row*width + col];
    *mm_mu_x = _mm_add_ps(*tmp, *mm_mu_x);
}

From this I get:

First-chance exception at 0x00ad192e in SSIM.exe: 0xC0000005: Access violation reading location 0x00000000.
Unhandled exception at 0x00ad192e in SSIM.exe: 0xC0000005: Access violation reading location 0x00000000.
The program '[4452] SSIM.exe: Native' has exited with code -1073741819 (0xc0000005)

when I run the program. The error occurs on the _mm_add_ps line.

original is allocated with _aligned_malloc(..., 16) and then passed to the function, so, as far as I understand SSE, it should not be unaligned.

Can anyone see why this is happening? I cannot figure it out.

EDIT: width and col are always multiples of 4: col is either 0 or 4, and width is always a multiple of 4.

EDIT2: It looks like my original array is not aligned. Shouldn't this:

function(float *original);
.
.
.
    original = _aligned_malloc(width*height*sizeof(float), 16);
    function(original);
    _aligned_free(original);
}

guarantee that original is aligned inside the function?

EDIT3: Actually, this is really weird. When I do this:

float *orig;
orig = _aligned_malloc(width*height*sizeof(float), 16);
assert(isAligned(orig));

the assertion fails, with isAligned defined as:

#define isAligned(p) (((unsigned long)(p)) & 15 == 0)
2 answers

Try using the load intrinsic,

__m128 tmp = _mm_load_ps( &original[row * width + col] );

instead of casting the pointer:

tmp = (__m128 *)&original[row * width + col];

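A __m128 holds 4 floats (16 bytes), and dereferencing a __m128 pointer requires the address to be 16-byte aligned.

Also check that the index [row * width + col] stays inside the allocated buffer.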
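A minimal sketch of the load-intrinsic approach, under some assumptions: the helper name sum_rows4 and its out parameter are hypothetical and not from the question. It uses _mm_loadu_ps, which accepts any address, where the answer's _mm_load_ps requires 16-byte alignment. It also starts the accumulator at zero, because _aligned_malloc does not zero the memory it returns.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps */

/* Hypothetical helper (not from the question): adds ker_size rows of four
 * consecutive floats starting at column col and writes the four sums to out. */
static void sum_rows4(const float *original, int width, int col,
                      int ker_size, float out[4])
{
    __m128 acc = _mm_setzero_ps();  /* start from zero, not from uninitialized memory */
    int row;

    assert(original != NULL && out != NULL);
    for (row = 0; row < ker_size; row++) {
        /* _mm_loadu_ps works for any address; _mm_load_ps needs 16-byte alignment */
        __m128 v = _mm_loadu_ps(&original[(size_t)row * width + col]);
        acc = _mm_add_ps(acc, v);
    }
    _mm_storeu_ps(out, acc);  /* unaligned store, so out can be an ordinary array */
}
```

Once the alignment of original and of the index is confirmed, the unaligned load and store can be swapped for _mm_load_ps and _mm_store_ps.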

+3

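tmp will only be aligned if width and col allow it: both width and col must be multiples of 4.

To be sure, I would add assertions, with the alignment macro parenthesized correctly: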

#define IsAligned(p) ((((unsigned long)(p)) & 15) == 0)

float *mu_x_ptr;
__m128 *tmp;
__m128 *mm_mu_x;

assert(original != NULL && IsAligned(original));
mu_x_ptr = _aligned_malloc(4 * sizeof(float), 16);
assert(mu_x_ptr != NULL && IsAligned(mu_x_ptr));
mm_mu_x = (__m128 *)mu_x_ptr;
assert(IsAligned(mm_mu_x));
for (row = 0; row < ker_size; row++)
{
    tmp = (__m128 *)&original[row * width + col];
    assert(IsAligned(tmp));
    *mm_mu_x = _mm_add_ps(*tmp, *mm_mu_x);
}
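The difference between the two macros comes down to operator precedence: in C, == binds tighter than &, so the question's version always evaluates to 0 and its assertion fails for every pointer. A small sketch of that, assuming uintptr_t in place of unsigned long (which is only 32 bits on 64-bit Windows); the macro names and the helper require_aligned16 are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* As written in the question: == binds tighter than &, so this parses as
 * p & (15 == 0), i.e. p & 0, which is 0 for every p. */
#define IS_ALIGNED_AS_WRITTEN(p) (((uintptr_t)(p)) & 15 == 0)

/* Parenthesized so the mask is applied before the comparison. */
#define IS_ALIGNED_16(p) ((((uintptr_t)(p)) & 15) == 0)

/* Hypothetical helper: fails an assertion if p is not 16-byte aligned
 * (the check is compiled out when NDEBUG is defined). */
static void require_aligned16(const void *p)
{
    assert(IS_ALIGNED_16(p));
}
```

Compilers such as GCC typically warn about the first macro (suggesting parentheses) when warnings are enabled, which is a quick way to catch this kind of mistake.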
+1

Source: https://habr.com/ru/post/1757785/

