Broadcast a byte value to all 16 XMM slots in Delphi ASM

This is easy in AVX with the VBROADCASTSS / VBROADCASTSD instructions, or in SSE when the value is a float or a double.

How do I broadcast a single 8-bit value to every slot of an XMM register in Delphi ASM?

+4
3 answers

Do you mean that you have a byte in the least significant byte of an XMM register and want to duplicate it across all the lanes of that register? I don't know the syntax of Delphi's inline assembler, but in Intel / MASM syntax it can be done something like this:

punpcklbw xmm0,xmm0    ; xxxxxxxxABCDEFGH -> xxxxxxxxEEFFGGHH
punpcklwd xmm0,xmm0    ; xxxxxxxxEEFFGGHH -> xxxxxxxxGGGGHHHH
punpckldq xmm0,xmm0    ; xxxxxxxxGGGGHHHH -> xxxxxxxxHHHHHHHH
punpcklqdq xmm0,xmm0   ; xxxxxxxxHHHHHHHH -> HHHHHHHHHHHHHHHH
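
For anyone who wants to check the unpack chain outside Delphi, here is a minimal sketch in C with SSE2 intrinsics. The function names are my own and only for illustration. Each intrinsic corresponds to the instruction named in its comment, although a compiler is free to choose a different instruction sequence.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Broadcast byte 0 of v to all 16 byte lanes using four unpack steps. */
__m128i broadcast_byte_unpack(__m128i v)
{
    v = _mm_unpacklo_epi8(v, v);    /* punpcklbw:  byte 0 now fills word 0   */
    v = _mm_unpacklo_epi16(v, v);   /* punpcklwd:  byte 0 now fills dword 0  */
    v = _mm_unpacklo_epi32(v, v);   /* punpckldq:  byte 0 now fills qword 0  */
    v = _mm_unpacklo_epi64(v, v);   /* punpcklqdq: qword 0 copied to qword 1 */
    return v;
}

/* Small self-check; the input value is arbitrary. */
void self_test_unpack(void)
{
    __m128i r = broadcast_byte_unpack(_mm_cvtsi32_si128(0x11223344));
    __m128i want = _mm_set1_epi8(0x44);          /* byte 0 of the input */
    assert(_mm_movemask_epi8(_mm_cmpeq_epi8(r, want)) == 0xFFFF);
}
```

In C itself you would normally just write `_mm_set1_epi8(b)` and let the compiler pick the sequence. The sketch only spells out the four steps.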
+3

Alternatively, if SSSE3 is available, you can use pshufb.

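Assuming (1) the 8-bit value is in AL (part of EAX), (2) the result should end up in XMM1, and (3) another register, say XMM0, is free to use as scratch, this should do it: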

movd   xmm1, eax  ;// move value in AL (part of EAX) into XMM1
pxor   xmm0, xmm0 ;// clear xmm0 to create the appropriate mask for pshufb
pshufb xmm1, xmm0 ;// broadcast lowest value into all slots of xmm1

One caveat: this depends on whether your version of Delphi's BASM supports SSSE3.

+4

The fastest option is SSSE3 pshufb, if it is available.

; SSSE3
pshufb      xmm0,  xmm1       ; where xmm1 is zeroed, e.g. with pxor xmm1,xmm1

Otherwise, use this:

; SSE2 only
punpcklbw   xmm0, xmm0        ; xxxxxxxxABCDEFGH -> xxxxxxxxEEFFGGHH
pshuflw     xmm0, xmm0, 0     ; xxxxxxxxEEFFGGHH -> xxxxxxxxHHHHHHHH
punpcklqdq  xmm0, xmm0        ; xxxxxxxxHHHHHHHH -> HHHHHHHHHHHHHHHH

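This is better than punpckl bw/wd → pshufd xmm0, xmm0, 0, because some CPUs have only 64-bit shuffle units (including Merom and K8). On those CPUs pshuflw is fast, and so is punpcklqdq, but pshufd and punpck with a granularity smaller than 64 bits are slow. So this sequence uses only one "slow shuffle" instruction, versus 3 for bw/wd/pshufd.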

On other CPUs the two 3-instruction sequences perform the same. See http://agner.org/optimize/ for instruction tables.

In short, prefer the pshuflw version.
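
The SSE2-only sequence with pshuflw can be tried out in C with intrinsics. This is only a sketch under my own naming, not Delphi code, and the compiler may emit something other than the three instructions named in the comments.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* Broadcast byte 0 of v to all 16 bytes with a single sub-64-bit shuffle. */
__m128i broadcast_byte_pshuflw(__m128i v)
{
    v = _mm_unpacklo_epi8(v, v);     /* punpcklbw:  byte 0 -> word 0         */
    v = _mm_shufflelo_epi16(v, 0);   /* pshuflw 0:  word 0 -> low four words */
    v = _mm_unpacklo_epi64(v, v);    /* punpcklqdq: low qword -> both qwords */
    return v;
}

/* Small self-check; the input value is arbitrary. */
void self_test_pshuflw(void)
{
    __m128i r = broadcast_byte_pshuflw(_mm_cvtsi32_si128(0x11223344));
    __m128i want = _mm_set1_epi8(0x44);          /* byte 0 of the input */
    assert(_mm_movemask_epi8(_mm_cmpeq_epi8(r, want)) == 0xFFFF);
}
```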


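If your byte starts out in an integer register, you can multiply it by 0x01010101 to replicate it into 4 bytes. For example: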

; movzx   eax, whatever

imul   edx, eax, 0x01010101    ; edx = al repeated 4 times

movd   xmm0, edx               ; move the replicated dword (the imul result), not eax
pshufd xmm0, xmm0, 0

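Note that imul's 3-operand immediate form is non-destructive, and that the source must be zero-extended: the upper 24 bits of the 32-bit register have to be zero.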


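This needs fewer shuffle instructions, but the multiply adds latency before the movd to xmm. (pinsrb requires SSE4.1, at which point pshufb is available anyway, so movd remains the way to get the byte into the register.)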
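
The multiply trick is easy to check in C. The sketch below uses my own function name, and taking the byte as a `uint8_t` stands in for the zero-extended register. The multiplication cannot overflow, since 0xFF * 0x01010101 = 0xFFFFFFFF.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* Replicate b into a dword with an integer multiply, then broadcast the dword. */
__m128i broadcast_byte_mul(uint8_t b)
{
    uint32_t rep = (uint32_t)b * 0x01010101u;   /* imul: b repeated 4 times   */
    __m128i v = _mm_cvtsi32_si128((int)rep);    /* movd: dword into low lane  */
    return _mm_shuffle_epi32(v, 0);             /* pshufd 0: dword 0 -> all 4 */
}

/* Small self-check; the input value is arbitrary. */
void self_test_mul(void)
{
    __m128i r = broadcast_byte_mul(0x5A);
    __m128i want = _mm_set1_epi8(0x5A);
    assert(_mm_movemask_epi8(_mm_cmpeq_epi8(r, want)) == 0xFFFF);
}
```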

If instruction throughput is more of a concern than latency, you should consider pmuludq when you cannot use pshufb, even though it has a latency of 5 cycles on most CPUs.

; low 32 bits of xmm0 = your byte, **zero extended**
pmuludq xmm0, xmm7        ; xmm7 = 0x01010101 in the low 32 bits
pshufd  xmm0, xmm0, 0
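
The pmuludq variant looks roughly like this with C intrinsics. It is a sketch with names of my own, and the `ones` constant plays the role of xmm7 above. The input is assumed to hold the zero-extended byte in its low 32 bits.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics */

/* v: low 32 bits hold the byte, zero-extended. Returns the byte in all 16 lanes. */
__m128i broadcast_byte_pmuludq(__m128i v)
{
    const __m128i ones = _mm_cvtsi32_si128(0x01010101);  /* multiplier in dword 0 */
    v = _mm_mul_epu32(v, ones);        /* pmuludq: dword 0 = byte repeated 4 times */
    return _mm_shuffle_epi32(v, 0);    /* pshufd 0: dword 0 -> all four dwords     */
}

/* Small self-check; the input value is arbitrary. */
void self_test_pmuludq(void)
{
    __m128i r = broadcast_byte_pmuludq(_mm_cvtsi32_si128(0x5A));
    __m128i want = _mm_set1_epi8(0x5A);
    assert(_mm_movemask_epi8(_mm_cmpeq_epi8(r, want)) == 0xFFFF);
}
```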
+2

Source: https://habr.com/ru/post/1569988/

