I want to load an __m256 vector with four copies of the same pair of floats, where the two floats are adjacent in memory.
That is, if I have a pointer float *x to an array containing {a, b}, I want to end up with an __m256 containing {a, b, a, b, a, b, a, b}.
My question is: are there any potential problems with using _mm256_broadcast_sd to achieve this, after casting x to a pointer to double?
Like this:
__m256 vect = (__m256)_mm256_broadcast_sd((double *)x);
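For comparison, here is a minimal sketch of a variant that avoids both the float-to-double pointer cast and the compiler-specific (__m256) cast. It copies the eight bytes of the pair into a double with memcpy, broadcasts that, and reinterprets the result with _mm256_castpd_ps. The names broadcast_float_pair, store_pair_x4 and TARGET_AVX are hypothetical, made up for this illustration. The target attribute is a GCC/Clang feature and is assumed here only so the snippet builds without -mavx; whether the compiler folds the memcpy into a single broadcast load is also an assumption that should be checked in the generated assembly.

```c
#include <assert.h>
#include <immintrin.h>
#include <string.h>

/* Assumption: GCC/Clang. Enables AVX for these functions only, so the
   file also builds without -mavx. Other compilers get an empty macro. */
#if defined(__GNUC__)
#define TARGET_AVX __attribute__((target("avx")))
#else
#define TARGET_AVX
#endif

/* Hypothetical helper: returns {x[0], x[1], x[0], x[1], x[0], x[1], x[0], x[1]}. */
TARGET_AVX static __m256 broadcast_float_pair(const float *x)
{
    double bits;

    assert(x != NULL);
    /* Copy the raw 8 bytes of x[0] and x[1]; no float-to-double
       conversion happens, and no double* ever points at float data. */
    memcpy(&bits, x, sizeof bits);

    /* Broadcast the 64-bit pattern to all four 64-bit lanes, then
       reinterpret the register as eight floats (a pure type cast). */
    return _mm256_castpd_ps(_mm256_broadcast_sd(&bits));
}

/* Hypothetical wrapper that writes the vector to memory for inspection. */
TARGET_AVX static void store_pair_x4(const float *x, float out[8])
{
    _mm256_storeu_ps(out, broadcast_float_pair(x));
}
```

Calling store_pair_x4 on an array {a, b} fills out with a and b alternating. The code still needs a CPU that supports AVX at run time.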