There's no reason to ever use _mm256_lddqu_si256; consider it a synonym for _mm256_loadu_si256. lddqu exists only for historical reasons, as x86 evolved toward better unaligned vector load support, and CPUs that support the AVX version run the two identically. There is no AVX512 version.
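If you want to convince yourself that the two intrinsics return the same data, a minimal sketch along these lines should do. The helper names (avx_loads_match, compare_unaligned_loads) are made up for illustration, and the code assumes GCC or Clang, since it relies on the target attribute and __builtin_cpu_supports. It only checks the loaded bytes; it says nothing about timing.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* Hypothetical helper: load 32 bytes from p with both intrinsics and
   report whether the results are byte-for-byte identical.
   The target attribute (GCC/Clang) enables AVX for this function only. */
__attribute__((target("avx")))
static int avx_loads_match(const uint8_t *p)
{
    __m256i a = _mm256_loadu_si256((const __m256i *)p);  /* normally vmovdqu */
    __m256i b = _mm256_lddqu_si256((const __m256i *)p);  /* normally vlddqu  */
    uint8_t bytes_a[32], bytes_b[32];

    _mm256_storeu_si256((__m256i *)bytes_a, a);
    _mm256_storeu_si256((__m256i *)bytes_b, b);
    return memcmp(bytes_a, bytes_b, 32) == 0 && memcmp(bytes_a, p, 32) == 0;
}
#endif

/* Returns 1 if both loads agree, 0 if they differ,
   and -1 if AVX is not available on this machine or target. */
static int compare_unaligned_loads(const uint8_t *p)
{
    assert(p != NULL);
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("avx"))
        return avx_loads_match(p);
#endif
    return -1;
}
```

Calling compare_unaligned_loads(buf + 1) on any buffer with at least 33 readable bytes exercises a deliberately misaligned address.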
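Compilers do still respect the lddqu intrinsic and emit that instruction, so you could use it if you want your code to run identically but have different machine-code bytes or a different checksum.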
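No x86 microarchitecture runs vlddqu any differently from vmovdqu. That is, the two opcodes probably decode to the same internal uop on all AVX CPUs. They probably always will, unless some very-low-power or specialized microarchitecture comes along without efficient unaligned vector loads (which have been a thing since Nehalem). Compilers never use vlddqu when auto-vectorizing.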
lddqu was different from movdqu on Pentium 4. See Intel's blog post "History of ... one CPU instructions: Part 1. LDDQU/movdqu explained".
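lddqu is allowed to (and on P4 does) do two aligned 16B loads and take a window of that data. movdqu architecturally only ever loads from the expected 16 bytes. This has implications for store-forwarding: if you're loading data that was just stored with an unaligned store, use movdqu, because store-forwarding only works for loads that are fully contained within a previous store. Otherwise, you generally always wanted lddqu. (This is why they didn't just make movdqu always use "the good way", and instead introduced a new instruction for programmers to worry about. Luckily for us, they changed the design, so we no longer have to worry about which unaligned load instruction to use.)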
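It also has implications for the correctness of observable behavior on UnCacheable (UC) or Uncacheable Speculate Write-combining (UCSW, aka WC) memory types (which might have MMIO registers behind them.)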
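To make the "two aligned 16B loads plus a window" idea concrete, here is a rough model in plain C. This is only a sketch of the strategy, not how the hardware is built; load_window16 is a hypothetical name, and the code assumes every 16-byte-aligned block overlapped by the requested 16 bytes is readable and belongs to the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper modelling the lddqu-on-P4 strategy:
   read the aligned 16-byte block(s) covering [p, p+16), then copy out
   the 16-byte window that starts at p.
   Assumption: those aligned blocks are readable parts of the same buffer. */
static void load_window16(const uint8_t *p, uint8_t out[16])
{
    assert(p != NULL && out != NULL);

    size_t off = (size_t)((uintptr_t)p & 15u);  /* misalignment, 0..15 */
    const uint8_t *base = p - off;              /* round down to 16 bytes */
    uint8_t blocks[32];

    memcpy(blocks, base, 16);                   /* first aligned 16B load */
    if (off != 0)
        memcpy(blocks + 16, base + 16, 16);     /* second aligned 16B load */

    memcpy(out, blocks + off, 16);              /* take the window */
}
```

Note that the model touches bytes outside [p, p+16) whenever p is misaligned. That is exactly the property that matters for store-forwarding and for UC/WC memory.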
There's no code-size difference between the asm instructions:
4000e3: 0f 10 07 movups xmm0, [rdi]
4000e6: f2 0f f0 07 lddqu xmm0, [rdi]
4000ea: f3 0f 6f 07 movdqu xmm0, [rdi]
4000ee: c5 fb f0 07 vlddqu xmm0, [rdi]
4000f2: c5 fa 6f 07 vmovdqu xmm0, [rdi]
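On Core2 and later, there's no reason to use lddqu, but also no downside vs. movdqu. Intel dropped the special lddqu handling for Core2, so both options suck equally.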
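On Core2 specifically, avoiding cache-line splits in software with two aligned loads and SSSE3 palignr is sometimes a win vs. movdqu, especially on 2nd-gen Core2 (Penryn), where palignr is only one shuffle uop instead of 2 on Merom/Conroe. (Penryn widened the shuffle execution unit to 128b.)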
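See Dark Shikari's 2009 Diary Of An x264 Developer blog post, "Cacheline splits, take two", for more about unaligned-load strategies back in the bad old days.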
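For reference, whether a given load is a cache-line split is just address arithmetic. A small sketch, assuming 64-byte cache lines (true for the Intel CPUs discussed here, but still an assumption) and using a hypothetical helper name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 64u  /* assumption: 64-byte cache lines */

/* Hypothetical helper: true if the byte range [addr, addr + size)
   touches more than one cache line. */
static bool crosses_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0)
        return false;
    assert(addr <= UINTPTR_MAX - (size - 1));  /* no address wrap-around */

    uintptr_t first_line = addr / CACHE_LINE_BYTES;
    uintptr_t last_line  = (addr + (size - 1)) / CACHE_LINE_BYTES;
    return first_line != last_line;
}
```

A 16-byte load at offset 48 within a line is fine, while the same load at offset 49 is a split; for 32-byte loads, any offset above 32 within the line splits.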
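The generation after Core2 is Nehalem, where movdqu is a single-uop instruction with dedicated hardware support in the load ports. It's still useful to tell compilers when pointers are aligned (especially for auto-vectorization, and especially without AVX), but it's not a performance disaster for them to just use movdqu everywhere, especially if the data is in fact aligned at run-time.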
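I don't know why Intel even made an AVX version of lddqu at all. I guess it's simpler for the decoders to just treat that opcode as an alias for movdqu/vmovdqu in all modes (with legacy SSE prefixes, or with AVX128/AVX256), instead of having that opcode decode to something else with VEX prefixes.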
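All current AVX-supporting CPUs have efficient hardware unaligned load/store support that handles it as optimally as possible. For example, when the data is aligned at run-time, there's exactly zero performance difference vs. vmovdqa.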
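This was not the case before Nehalem; movdqu and lddqu used to decode to multiple uops to handle potentially misaligned addresses, instead of putting hardware support for that right in the load ports, where a single uop can activate it instead of faulting on unaligned addresses.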
However, Intel's ISA ref manual entry for lddqu says the 256b version can load up to 64 bytes (implementation dependent):
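"This instruction may improve performance relative to (V)MOVDQU if the source operand crosses a cache line boundary. In situations that require the data loaded by (V)LDDQU be modified and stored to the same location, use (V)MOVDQU or (V)MOVDQA instead of (V)LDDQU. To move a double quadword to or from memory locations that are known to be aligned on 16-byte boundaries, use the (V)MOVDQA instruction."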
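IDK how much of that was written deliberately, and how much of it just came from prepending (V) when updating the entry for AVX. I don't think Intel's optimization manual recommends really using vlddqu anywhere, but I didn't check.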
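There is no AVX512 version of vlddqu, so I think that means Intel has decided that an alternate-strategy unaligned load instruction is no longer useful, and isn't even worth keeping their options open for.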