GCC inline assembly with stack operations

I need inline assembly code like this:

  • I have a pair of push / pop operations inside the assembly (so the stack is balanced)
  • I also have a variable in memory (so not in a register) as input

like this:

    __asm__ __volatile__ ("push %%eax\n\t"
            // ... some operations that use ECX as a temporary
            "mov %0, %%ecx\n\t"
            // ... some other operation
            "pop %%eax"
    : : "m"(foo));
    // foo is my local variable, that is to say, on the stack

When I disassemble the compiled code, the compiler addresses the memory operand as something like 0xc(%esp), that is, relative to esp. This fragment therefore does not work correctly, because I have a push before the mov. How can I tell the compiler that I do not want foo addressed relative to esp, but relative to ebp instead, something like -8(%ebp)?

P.S. You may suggest that I put eax in the clobbers, but this is just sample code. I would rather not go into the full reason why that solution does not work for me.

3 answers

Modifying ESP inside inline asm should generally be avoided when you have any memory inputs / outputs, so that you don't have to disable optimizations or force the compiler to set up a stack frame with EBP in some other way. One major advantage is that you (or the compiler) can then use EBP as an extra free register; that is potentially a significant speedup if you already have to spill / reload things. If you are writing inline asm, this is presumably a hot spot, so it is worth spending the extra code size on ESP-relative addressing modes.

In x86-64 code there is an additional obstacle to using push / pop safely, because you cannot tell the compiler that you want to clobber the red zone below RSP. You can get problems like this one, where you clobber the compiler's data on the stack. No 32-bit x86 ABI has a red zone, though, so this applies only to x86-64 System V.

You only need to disable -fomit-frame-pointer (that is, use -fno-omit-frame-pointer) for this function if you want to do asm-only things such as using push as a stack data structure, so that the number of pushes is variable. Or perhaps when optimizing for code size.

You can always write a whole non-inline function in asm and put it in a separate file; then you have full control. But do this only if your function includes a whole loop; don't make the compiler call a short, loop-free function from inside a C inner loop.


It seems you are using push / pop inside the inline asm because you do not have enough registers and need to save / reload something. You do not need push / pop to save / restore. Instead, use dummy output operands with "=m" constraints to make the compiler allocate stack space for you, and use mov to / from those slots. (Of course, you are not limited to mov; it can be a win to use a memory source operand for an ALU instruction if you only need the value once or twice.)

This may be slightly worse for code size, but it is usually not worse for performance (and may be better). If that is not good enough, write the entire function (or the whole loop) in asm so that you do not have to wrestle with the compiler.

    int foo(char *p, int a, int b) {
        int t1,t2;  // dummy output spill slots
        int r1,r2;  // dummy output tmp registers
        int res;

        asm ("# operands: %0  %1  %2  %3  %4  %5  %6  %7  %8\n\t"
             "imull  $123, %[b], %[res]\n\t"
             "mov   %[res], %[spill1]\n\t"
             "mov   %[a], %%ecx\n\t"
             "mov   %[b], %[tmp1]\n\t"  // let the compiler allocate tmp regs, unless you need specific regs e.g. for a shift count
             "mov   %[spill1], %[res]\n\t"
             : [res] "=&r" (res),
               [tmp1] "=&r" (r1), [tmp2] "=&r" (r2),  // early-clobber
               [spill1] "=m" (t1), [spill2] "=&rm" (t2)  // allow spilling to a register if there are spare regs
               , [p] "+&r" (p)
               , "+m" (*(char (*)[]) p) // dummy in/output instead of memory clobber
             : [a] "rmi" (a), [b] "rm" (b)  // a can be an immediate, but b can't
             : "ecx"
        );

        return res;

        // p unused in the rest of the function
        // so it's really just an input to the asm,
        // which the asm is allowed to destroy
    }

This compiles to the following asm with gcc 7.3 -O3 -m32 on the Godbolt compiler explorer. Note the asm comment showing what the compiler chose for each operand of the template: it picked 12(%esp) for %[spill1] and %edi for %[spill2] (because I used "=&rm" for that operand, the compiler saved / restored %edi outside the asm and gave it to us for that dummy operand).

    foo(char*, int, int):
        pushl   %ebp
        pushl   %edi
        pushl   %esi
        pushl   %ebx
        subl    $16, %esp
        movl    36(%esp), %edx
        movl    %edx, %ebp
    #APP
    # 19 "/tmp/compiler-explorer-compiler118120-55-w92ge8.v797i/example.cpp" 1
        # operands: %eax  %ebx  %esi  12(%esp)  %edi  %ebp  (%edx)  40(%esp)  44(%esp)
        imull  $123, 44(%esp), %eax
        mov   %eax, 12(%esp)
        mov   40(%esp), %ecx
        mov   44(%esp), %ebx
        mov   12(%esp), %eax
    # 0 "" 2
    #NO_APP
        addl    $16, %esp
        popl    %ebx
        popl    %esi
        popl    %edi
        popl    %ebp
        ret

Hmm, the dummy memory operand that tells the compiler which memory we modify seems to have resulted in a register being dedicated to it, I guess because the p operand is early-clobber, so the two cannot share a register. I think you could risk leaving off the early-clobber if you are confident that none of the other inputs will be given the same register as p (that is, that they do not have the same value).
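Reduced to the shape of the question, the same idea might look like the sketch below: a value is saved in a compiler-allocated "=m" slot instead of being pushed, so ESP never moves and the "m" input stays addressable. The function name scale_with_spill and the arithmetic are made up for illustration and are not from the question; the non-x86 branch is only a plain C fallback so the file still builds elsewhere.

```c
/* Hypothetical example: computes foo * 3 + keep.
   `keep` is parked in a stack slot chosen by the compiler (the "=m"
   dummy output) instead of being saved with push / pop. */
int scale_with_spill(int foo, int keep)
{
    int slot;    /* dummy output: spill slot, address picked by the compiler */
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "movl  %[keep], %[slot]\n\t"     /* save, instead of push */
        "movl  %[foo], %%ecx\n\t"        /* memory input; the stack pointer is untouched */
        "imull $3, %%ecx, %[res]\n\t"    /* ECX used as a temporary */
        "addl  %[slot], %[res]"          /* reload, instead of pop */
        : [res] "=&r" (result),          /* early-clobber: written before keep is dead */
          [slot] "=m" (slot)
        : [foo] "m" (foo), [keep] "r" (keep)
        : "ecx", "cc");
#else
    /* Assumption: on non-x86 targets just do the arithmetic in C. */
    slot = keep;
    result = foo * 3 + slot;
#endif
    return result;
}
```

Calling scale_with_spill(5, 7) goes through one save and one reload of keep without any push / pop, so it does not matter whether the compiler addresses foo relative to ESP or EBP.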


Using the stack pointer directly to reference local variables is probably a result of compiler optimization. I think you could work around the problem in a couple of ways:

  • Disable the frame-pointer omission optimization ( -fno-omit-frame-pointer in GCC);
  • List esp in the clobbers so that the compiler knows its value is being modified ( check your compiler for compatibility ).

Instead of moving foo to ecx inside the assembly code, have the compiler put the operand in ecx directly:

  : : "c"(foo) 
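As a rough sketch of what that could look like in a complete function (the name add_one_via_ecx and the operation are invented for illustration), the "c" constraint makes the compiler load foo into ECX before the asm starts, so the template never needs a memory operand and the addressing mode of foo stops mattering:

```c
/* Hypothetical example: returns foo + 1, with foo delivered in ECX. */
int add_one_via_ecx(int foo)
{
    int result;
#if defined(__i386__) || defined(__x86_64__)
    __asm__ (
        "movl %%ecx, %[res]\n\t"   /* foo is already in ECX on entry */
        "addl $1, %[res]"
        : [res] "=r" (result)
        : "c" (foo)                /* compiler loads foo into ECX for us */
        : "cc");
#else
    /* Assumption: plain C fallback so the file builds on non-x86 targets. */
    result = foo + 1;
#endif
    return result;
}
```

Note that this sketch only reads ECX. If the asm also overwrites ECX, as the question's "temporary" usage suggests, a pure input is not enough: the register would have to be declared as an in/out operand (for example "+c" on a dummy variable), since an input register cannot also be listed as a clobber.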

Source: https://habr.com/ru/post/1275616/

