128-bit shifts using assembly language?

What is the most efficient way to do a 128-bit shift on a modern Intel processor (Core i7, Sandy Bridge)?

Similar code is in my innermost loop:

    u128 a[N];

    void xor() {
        for (int i = 0; i < N; ++i) {
            a[i] = a[i] ^ (a[i] >> 1) ^ (a[i] >> 2);
        }
    }

The data in a[N] are almost random.

2 answers

Use the double-precision shift instructions, SHLD or SHRD, because SSE is not intended for this purpose. This is the classic method; below are test cases for a 128-bit left shift by 16 bits in 32-bit and 64-bit processor modes.

By chaining these instructions you can shift an operand of any size by up to 31 bits (with 32-bit registers) or 63 bits (with 64-bit registers). The shift count can be an immediate value or the number in the CL register. The first (destination) operand of the instruction can also be a variable in memory.
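To make the chaining concrete, here is a minimal C sketch of the same idea for an operand split into 32-bit words. The function name `shl_limbs` and the little-endian word layout are assumptions made for illustration, not something from the answer; each loop step models what one SHLD does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift a multi-word integer left by n bits, 1 <= n <= 31.
   w[0] is the least significant word (assumed layout).
   Each step models SHLD: the word keeps its own bits shifted up and
   receives the top n bits of the next lower word. */
static void shl_limbs(uint32_t *w, size_t count, unsigned n)
{
    assert(n >= 1 && n <= 31);
    /* Walk from the most significant word down to word 1. */
    for (size_t i = count; i-- > 1; )
        w[i] = (w[i] << n) | (w[i - 1] >> (32 - n));
    if (count > 0)
        w[0] <<= n;   /* the lowest word needs only a plain SHL */
}
```

Called with four words and `n = 16`, this should produce the same result as a chain of three SHLD instructions followed by one SHL.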

128-bit left shift by 16 bits in 32-bit x86 processor mode:

    mov eax, $04030201
    mov ebx, $08070605
    mov ecx, $0C0B0A09
    mov edx, $100F0E0D
    shld edx, ecx, 16
    shld ecx, ebx, 16
    shld ebx, eax, 16
    shl eax, 16

And a 128-bit left shift by 16 bits in 64-bit x86 processor mode:

    mov rax, $0807060504030201
    mov rdx, $100F0E0D0C0B0A09
    shld rdx, rax, 16
    shl rax, 16
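For comparison, the 64-bit sequence can be modeled in portable C roughly as follows. This is only a sketch: the `u128_t` struct and the `shl128` name are hypothetical, and the shift count is assumed to be between 1 and 63.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_t;

/* 128-bit left shift by n bits, 1 <= n <= 63. */
static u128_t shl128(u128_t x, unsigned n)
{
    assert(n >= 1 && n <= 63);
    u128_t r;
    r.hi = (x.hi << n) | (x.lo >> (64 - n));  /* models: shld rdx, rax, n */
    r.lo = x.lo << n;                         /* models: shl  rax, n      */
    return r;
}
```

An optimizing compiler targeting x86-64 may well turn this pattern into SHLD and SHL on its own, but that depends on the compiler and its flags, so check the generated code before relying on it.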

In this particular case, you can use a combination of the x86 SHR and RCR instructions:

    ; a0 - bits 0-31 of a[i]
    ; a1 - bits 32-63 of a[i]
    ; a2 - bits 64-95 of a[i]
    ; a3 - bits 96-127 of a[i]
    mov eax, a0
    mov ebx, a1
    mov ecx, a2
    mov edx, a3
    ; start at the most significant word so the carry moves downward
    shr edx, 1
    rcr ecx, 1
    rcr ebx, 1
    rcr eax, 1
    ; b0 - bits 0-31 of b[i] := a[i] >> 1
    ; b1 - bits 32-63 of b[i] := a[i] >> 1
    ; b2 - bits 64-95 of b[i] := a[i] >> 1
    ; b3 - bits 96-127 of b[i] := a[i] >> 1
    mov b0, eax
    mov b1, ebx
    mov b2, ecx
    mov b3, edx
    shr edx, 1
    rcr ecx, 1
    rcr ebx, 1
    rcr eax, 1
    ; c0 - bits 0-31 of c[i] := a[i] >> 2 = b[i] >> 1
    ; c1 - bits 32-63 of c[i] := a[i] >> 2 = b[i] >> 1
    ; c2 - bits 64-95 of c[i] := a[i] >> 2 = b[i] >> 1
    ; c3 - bits 96-127 of c[i] := a[i] >> 2 = b[i] >> 1
    mov c0, eax
    mov c1, ebx
    mov c2, ecx
    mov c3, edx

If you are targeting x86-64, this becomes simpler:

    ; a0 - bits 0-63 of a[i]
    ; a1 - bits 64-127 of a[i]
    mov rax, a0
    mov rbx, a1
    ; shift the high half first so its bit 0 carries into the low half
    shr rbx, 1
    rcr rax, 1
    ; b0 - bits 0-63 of b[i] := a[i] >> 1
    ; b1 - bits 64-127 of b[i] := a[i] >> 1
    mov b0, rax
    mov b1, rbx
    shr rbx, 1
    rcr rax, 1
    ; c0 - bits 0-63 of c[i] := a[i] >> 2 = b[i] >> 1
    ; c1 - bits 64-127 of c[i] := a[i] >> 2 = b[i] >> 1
    mov c0, rax
    mov c1, rbx
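As a cross-check of the intended result, the loop from the question can be sketched in portable C with two 64-bit halves. The `u128_t` struct and the names `shr1` and `xor_shifts` are assumptions for illustration; the one-bit right shift moves bit 0 of the high half into bit 63 of the low half, which is the job the carry flag does in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct { uint64_t lo, hi; } u128_t;

/* Logical right shift by one bit across both halves. */
static u128_t shr1(u128_t x)
{
    u128_t r;
    r.lo = (x.lo >> 1) | (x.hi << 63);  /* bit 0 of hi becomes bit 63 of lo */
    r.hi = x.hi >> 1;
    return r;
}

/* a[i] = a[i] ^ (a[i] >> 1) ^ (a[i] >> 2) for every element. */
static void xor_shifts(u128_t *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        u128_t b = shr1(a[i]);  /* a[i] >> 1 */
        u128_t c = shr1(b);     /* a[i] >> 2 */
        a[i].lo ^= b.lo ^ c.lo;
        a[i].hi ^= b.hi ^ c.hi;
    }
}
```

Whether this is faster or slower than hand-written assembly depends on the compiler and the target, so it is best treated as a reference to test the assembly against.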

Update: fixed typos in the 64-bit version


Source: https://habr.com/ru/post/899933/

