There are many questions about accessing unallocated memory, which is clearly undefined behavior. But what about the following corner case?
Consider the following struct, which is aligned to 16 bytes but whose data occupies only 8 of them:
struct alignas(16) A
{
    float data[2];
};
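As a side note, the layout can be checked at compile time. The following is only a minimal sketch: the struct is repeated so the snippet compiles on its own, the helper name `tail_padding_bytes` is hypothetical, and the value of 8 assumes a typical target where `sizeof(float)` is 4. Because `sizeof` of a type is always a multiple of its alignment, the 8 bytes that follow `data` are expected to be tail padding inside the object itself.

```cpp
#include <cstddef>

struct alignas(16) A
{
    float data[2];
};

// Hypothetical helper: bytes of the object that are not covered by `data`.
constexpr std::size_t tail_padding_bytes()
{
    return sizeof(A) - sizeof(A::data);
}

// These hold on any conforming implementation.
static_assert(alignof(A) == 16, "A is aligned to 16 bytes");
static_assert(sizeof(A) % alignof(A) == 0, "size is a multiple of alignment");
```

On a typical x86 target, `sizeof(A)` comes out as 16 and `tail_padding_bytes()` as 8.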
Now we access 16 bytes of data using the SSE load/store intrinsics:
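#include <xmmintrin.h> // __m128, _mm_load_ps, _mm_store_ps

// Loads 16 bytes starting at a.data, although data itself is only 8 bytes.
__m128 test_load(const A &a)
{
    return _mm_load_ps(a.data);
}

// Stores 16 bytes starting at a.data.
void test_store(A &a, __m128 v)
{
    _mm_store_ps(a.data, v);
}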
Is this also undefined behavior, and should I use padding instead?
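For illustration, the padding alternative could look roughly like the sketch below. The names `PaddedA`, `test_load_padded`, and `test_store_padded` are hypothetical, and the snippet assumes an x86 target with SSE. The array is widened to four floats so that the 16 bytes touched by the intrinsics all belong to the declared member; only the first two elements are meant to carry values.

```cpp
#include <xmmintrin.h> // SSE intrinsics: __m128, _mm_load_ps, _mm_store_ps

// Hypothetical padded variant of A: data[0] and data[1] hold the values,
// data[2] and data[3] exist only as explicit padding.
struct alignas(16) PaddedA
{
    float data[4];
};

static_assert(sizeof(PaddedA) == 16, "assumes sizeof(float) == 4");

// The 16-byte load stays within the bounds of the array.
inline __m128 test_load_padded(const PaddedA &a)
{
    return _mm_load_ps(a.data);
}

// The 16-byte store also stays within the bounds of the array.
inline void test_store_padded(PaddedA &a, __m128 v)
{
    _mm_store_ps(a.data, v);
}
```

The cost of this variant is that the padding becomes part of the member, so code that iterates over `data` has to know that only the first two elements are meaningful.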
In any case, since Intel intrinsics are not standard C++, is accessing a partially allocated but aligned memory block (no larger than the alignment size) undefined behavior in standard C++?
I am asking about both the intrinsics case and the standard C++ case. I'm interested in both of them.