GCC extended asm syntax: using a 128-bit memory operand as the source

GCC generates this code for shuffle() below:

 movaps xmm0,XMMWORD PTR [rip+0x125]
 pshufb xmm4,xmm0

Ideally, this should be:

 pshufb xmm4,XMMWORD PTR [rip+0x125] 

What is the extended asm syntax for generating this single instruction?

Thank you very much,
Adam

PS: the commented-out intrinsic generates the optimal code for this example. It does not work in general, though (GCC is likely to generate unnecessary register copies when global register variables are present).

 #include <stdint.h>

 typedef int8_t xmm_t __attribute__ ((vector_size (16)));
 const xmm_t xmm_shuf={128, 0, 1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 12, 13, 14, 15};
 register xmm_t xmm __asm__("xmm4");

 #define NTL ".intel_syntax noprefix\n"
 #define ATT ".att_syntax\n"

 void shuffle()
 {
   //xmm=__builtin_ia32_pshufb128(xmm, xmm_shuf);
   __asm__(NTL"pshufb %0, %1\n"ATT : "=x" (xmm) : "x" (xmm_shuf));
 }

 int main()
 {
 }

$ gcc -Os -std=gnu99 -msse4.1 -flax-vector-conversions pshufb_128bit_constant.c && objdump -d -m i386:x86-64:intel a.out | less

 0000000000400494 <shuffle>:
   400494: 0f 28 05 25 01 00 00   movaps xmm0,XMMWORD PTR [rip+0x125] # 4005c0 <xmm_shuf+0x10>
   40049b: 66 0f 38 00 e0         pshufb xmm4,xmm0
   4004a0: c3                     ret
1 answer

Change the constraint on the input operand to "xm", so that a memory location is allowed in addition to an SSE register.

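However, when I tested this with the Intel-syntax wrapper, the compiler generated code that was not valid Intel syntax, presumably because GCC still substitutes operands in AT&T form unless told otherwise. So in the end I dropped the wrapper and used AT&T operand order (source first, destination last), with "+x" because pshufb both reads and writes its destination: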

 __asm__("pshufb %1, %0" : "+x" (xmm) : "xm" (xmm_shuf)); 
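To see the two constraints in isolation, here is a minimal sketch that avoids the global register variable. The type name v16u8 and the helper shuffle_bytes are made up for illustration and do not appear in the original post; the sketch assumes an x86-64 target with SSSE3 and a GCC-compatible compiler.

```c
#include <stdint.h>

/* 16 unsigned bytes in one SSE register (GCC/Clang vector extension). */
typedef uint8_t v16u8 __attribute__((vector_size(16)));

/*
 * shuffle_bytes is a hypothetical helper, not part of the original post.
 * AT&T operand order: source (%1) first, destination (%0) last.
 *   "+x"  data is both read and written, and must be in an SSE register
 *   "xm"  the mask may be an SSE register or a 16-byte memory operand
 */
static v16u8 shuffle_bytes(v16u8 data, const v16u8 *mask)
{
    __asm__("pshufb %1, %0" : "+x"(data) : "xm"(*mask));
    return data;
}
```

For each output byte, pshufb writes zero when the corresponding mask byte has its top bit set, and otherwise copies the data byte selected by the low four bits of the mask byte. With the question's mask, whose first byte is 128 (0x80), byte 0 of the result is zero and byte 1 is byte 0 of the input. Note that a memory operand to pshufb must be 16-byte aligned; objects declared with the vector type get that alignment by default.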

Source: https://habr.com/ru/post/1300006/

