I want to load an __m256 vector with four copies of the same pair of floats, which are adjacent in memory.
That is, if I have a pointer float *x to an array {a, b}, I want to end up with an __m256 containing {a, b, a, b, a, b, a, b}.
My question is: are there any potential problems with using _mm256_broadcast_sd to achieve this, after casting x to a pointer to double?
So:
__m256 vect = (__m256)_mm256_broadcast_sd((double *)x);
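For anyone who wants to try this out, here is a minimal sketch of the same idea wrapped in a small function. The name broadcast_pair and the out parameter are made up for illustration and are not part of any library. It uses _mm256_castpd_ps instead of the C-style vector cast, since the cast syntax is a GCC/clang extension, and it stores the result to a plain float array so the lanes can be inspected. I am assuming here that the 64-bit broadcast load has no alignment requirement and that the compiler treats the intrinsic as alias-safe; the target attribute is only there so the snippet builds with GCC or clang without -mavx, and the CPU still has to support AVX at run time.

```c
#include <immintrin.h>

/* Hypothetical helper (illustration only): writes {x[0], x[1]} repeated
 * four times into out[0..7]. Requires a CPU with AVX support. */
#if defined(__GNUC__) || defined(__clang__)
__attribute__((target("avx")))  /* lets this build without -mavx on GCC/clang */
#endif
void broadcast_pair(const float *x, float out[8])
{
    /* Load the 64 bits holding x[0] and x[1] as if they were one double
     * and replicate that 64-bit element into all four lanes. */
    __m256d pd = _mm256_broadcast_sd((const double *)x);

    /* Reinterpret the same 256 bits as eight floats. This is a type-level
     * cast only; it does not convert the values. */
    __m256 v = _mm256_castpd_ps(pd);

    /* Unaligned store so out needs no special alignment. */
    _mm256_storeu_ps(out, v);
}
```

Calling broadcast_pair with x pointing at {a, b} should leave out holding a, b, a, b, a, b, a, b.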