What is the difference between _mm256_lddqu_si256 and _mm256_loadu_si256?

I had been using _mm256_lddqu_si256, based on an example I found online. Later I discovered _mm256_loadu_si256. The Intel Intrinsics Guide only states that the lddqu version may perform better when crossing a cache line boundary. What might be the advantages of loadu? In general, how do these functions differ?

1 answer

There is no reason to ever use _mm256_lddqu_si256; consider it a synonym for _mm256_loadu_si256. lddqu exists only for historical reasons, from when x86 was evolving toward better unaligned vector load support, and CPUs that support the AVX version run the two identically. There is no AVX512 version.

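Compilers do still respect the lddqu intrinsic and emit that instruction, so you could use it if you want your code to run identically but have different machine-code bytes, and therefore a different checksum.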
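To make the equivalence concrete, here is a minimal sketch in C that loads the same 32 bytes with both intrinsics and compares the results. The helper name loads_match is hypothetical, and the target attribute is a GCC/Clang extension assumed here so that the function builds without -mavx; it only checks that the loaded values agree and says nothing about performance.

```c
#include <immintrin.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the original answer): load 32 bytes from a
   possibly unaligned address with both intrinsics and return 1 if the two
   results are identical and equal to the source bytes.
   Assumption: GCC or Clang on x86; the target attribute enables AVX for this
   function only, so the caller should check for AVX support at run time. */
__attribute__((target("avx")))
int loads_match(const uint8_t *p)
{
    __m256i a = _mm256_loadu_si256((const __m256i *)p); /* normally vmovdqu */
    __m256i b = _mm256_lddqu_si256((const __m256i *)p); /* normally vlddqu  */

    uint8_t out_a[32], out_b[32];
    _mm256_storeu_si256((__m256i *)out_a, a);
    _mm256_storeu_si256((__m256i *)out_b, b);

    return memcmp(out_a, out_b, sizeof out_a) == 0
        && memcmp(out_a, p, sizeof out_a) == 0;
}
```

Calling it as loads_match(buf + 1) on a 64-byte buffer, after confirming __builtin_cpu_supports("avx"), should return 1 for any offset that leaves 32 readable bytes.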


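No x86 microarchitecture runs vlddqu any differently from vmovdqu. That is, the two opcodes probably decode to the same internal uop on all AVX CPUs. They probably always will, unless some very low-power or specialized microarchitecture comes along without efficient unaligned vector loads (which have been a thing since Nehalem). Compilers never use vlddqu when auto-vectorizing.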

lddqu behaved differently from movdqu on Pentium 4. For the history, see the article "...: 1. LDDQU/movdqu".

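lddqu is allowed to do (and on P4 does do) two aligned 16B loads and take a window of that data. movdqu architecturally only ever loads from the expected 16 bytes. This has implications for store forwarding: if you are loading data that was just stored with an unaligned store, use movdqu, because store forwarding only works for loads that are fully contained within a previous store. Otherwise, you generally always wanted lddqu. (This is why they did not simply make movdqu always use "the good way", and instead introduced a new instruction for programmers to worry about. Luckily for us, the design has since changed, so we no longer have to worry about which unaligned load instruction to use.)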

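It also has implications for the correctness of observable behaviour on UnCacheable (UC) or Uncacheable Speculate Write-combining (UCSW, aka WC) memory types (which might have MMIO registers behind them.)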


There is no code-size difference between the two asm instructions:

  # SSE packed-single instructions are shorter than SSE2 integer / packed-double
  4000e3:       0f 10 07                movups xmm0, [rdi]   

  4000e6:       f2 0f f0 07             lddqu  xmm0, [rdi]
  4000ea:       f3 0f 6f 07             movdqu xmm0, [rdi]

  4000ee:       c5 fb f0 07             vlddqu xmm0, [rdi]
  4000f2:       c5 fa 6f 07             vmovdqu xmm0, [rdi]
  # AVX-256 is the same as AVX-128, but with one more bit set in the VEX prefix

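On Core2 and later, there is no reason to use lddqu, but also no downside vs. movdqu. Intel dropped the special lddqu handling for Core2, so both options perform equally poorly there.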

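On Core2 specifically, avoiding cache-line splits in software with two aligned loads and SSSE3 palignr is sometimes a win vs. movdqu, especially on second-generation Core2 (Penryn), where palignr is only one shuffle uop instead of 2 on Merom/Conroe. (Penryn widened the shuffle execution unit to 128b.)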
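The two-aligned-loads-plus-palignr idea can be sketched in C roughly as follows. This is an illustration under stated assumptions, not code from the original answer: the helper name load_offset4_palignr is hypothetical, the byte offset is fixed at 4 because palignr takes its shift count as a compile-time immediate, and the target attribute is a GCC/Clang extension assumed so the function builds without -mssse3.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: emulate an unaligned 16-byte load from
   aligned_base + 4 using two aligned loads and palignr.
   Assumptions: aligned_base is 16-byte aligned, 32 bytes are readable from
   it, and the CPU supports SSSE3 (the caller should check at run time). */
__attribute__((target("ssse3")))
void load_offset4_palignr(const uint8_t *aligned_base, uint8_t out[16])
{
    /* Two aligned loads covering bytes 0..15 and 16..31. */
    __m128i lo = _mm_load_si128((const __m128i *)aligned_base);
    __m128i hi = _mm_load_si128((const __m128i *)(aligned_base + 16));

    /* palignr concatenates hi:lo, shifts right by 4 bytes, and keeps the
       low 16 bytes, which are bytes 4..19 of the original buffer. */
    __m128i v = _mm_alignr_epi8(hi, lo, 4);

    _mm_storeu_si128((__m128i *)out, v);
}
```

Because the shift count is an immediate, real code from that era needed a separate code path (or macro instantiation) per misalignment, which is part of why this trick was only sometimes a win.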

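See Dark Shikari's 2009 Diary Of An x264 Developer blog post, "Cacheline splits, take two", for more about unaligned-load strategies back in the bad old days.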

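The generation after Core2 is Nehalem, where movdqu is a single-uop instruction with dedicated hardware support in the load ports. It is still useful to tell compilers when pointers are aligned (especially for auto-vectorization, and especially without AVX), but it is not a performance disaster for them to just use movdqu everywhere, especially if the data is in fact aligned at run time.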


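I don't know why Intel bothered to make an AVX version of lddqu at all. My guess is that it is simpler for the decoders to treat that opcode as an alias for movdqu/vmovdqu in all modes (with legacy SSE prefixes, or with AVX128/AVX256) than to have it decode to something else under VEX prefixes.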

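All current AVX-supporting CPUs have efficient hardware support for unaligned loads and stores that handles them as well as possible. For example, when the data is aligned at run time, there is exactly zero performance difference vs. vmovdqa.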

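This was not the case before Nehalem; movdqu and lddqu used to decode to multiple uops to handle potentially misaligned addresses, instead of having hardware support for that right in the load ports, where a single uop can activate it instead of faulting on unaligned addresses.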

However, Intel's ISA reference manual entry for lddqu says the 256b version can load up to 64 bytes (implementation dependent):

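"This instruction may improve performance relative to (V)MOVDQU if the source operand crosses a cache line boundary. In situations that require the data loaded by (V)LDDQU be modified and stored to the same location, use (V)MOVDQU or (V)MOVDQA instead of (V)LDDQU. To move a double quadword to or from memory locations that are known to be aligned on 16-byte boundaries, use the (V)MOVDQA instruction."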

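IDK how much of that was written deliberately, and how much of it just came from prepending (V) when updating the entry for AVX. I don't think Intel's optimization manual recommends actually using vlddqu anywhere, but I didn't check.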

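There is no AVX512 version of vlddqu, so I think that means Intel has decided that an alternate-strategy unaligned load instruction is no longer useful, and is not even worth keeping their options open for.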


Source: https://habr.com/ru/post/1689661/

