With SSE, you can load one float from memory into all 4 slots of an __m128 with the intrinsic _mm_load1_ps().
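For reference, here is a minimal sketch of the SSE behaviour described above. The helper name broadcast4 is hypothetical and only serves the illustration; _mm_load1_ps and _mm_storeu_ps are the standard intrinsics from xmmintrin.h.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_load1_ps, _mm_storeu_ps */

/* Hypothetical helper (not part of any library): broadcast the float at
 * src into all four lanes of an __m128, then copy the lanes to out[0..3]. */
static void broadcast4(const float *src, float out[4])
{
    __m128 v = _mm_load1_ps(src);  /* v = { *src, *src, *src, *src } */
    _mm_storeu_ps(out, v);         /* unaligned store, so out needs no special alignment */
}
```

Calling broadcast4(&x, out) with x = 1.5f should leave all four elements of out equal to 1.5f.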
With 256-bit SIMD under AVX, there seems to be no corresponding _mm256_load1_ps() to load one float from memory into all 8 slots of the vector.
Why was this omitted, and what is the best way to do it?
Or even better: is there a way to load one float into a chosen slot (0..7) of the vector?