Modifying ESP inside inline asm should generally be avoided when you have any memory inputs / outputs, so that you don't need to disable optimization or force the compiler to set up a stack frame with EBP in some other way. One major advantage is that you (or the compiler) can then use EBP as an extra free register; that is potentially a significant speedup if you already have to spill / reload things. If you are writing inline asm, this is presumably a hot spot, so it is worth spending the extra code size to use ESP-relative addressing modes.
In x86-64 code there is an additional obstacle to using push / pop safely, because you cannot tell the compiler that you want to clobber the red zone below RSP. You can run into problems where you clobber the compiler's data on the stack. No 32-bit x86 ABI has a red zone, though, so this applies only to x86-64 System V.
You need to disable -fomit-frame-pointer for this function (i.e. use -fno-omit-frame-pointer) only if you want to do asm-only things like using push as a stack data structure, so that there is a variable number of pushes. Or perhaps when optimizing for code size.
You can always write a whole non-inline function in asm and put it in a separate file; then you have full control. But do this only if your function includes a whole loop; don't make the compiler call a short non-looping function inside a C inner loop.
It seems you are using push / pop inside inline asm because you do not have enough registers and you need to save / reload something. You do not need push / pop to save / restore. Instead, use dummy output operands with "=m" constraints to make the compiler allocate stack space for you, and use mov to / from these slots. (Of course, you are not limited to mov; it can be a win to use a memory source operand for an ALU instruction if you need the value only once or twice.)
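As a rough illustration of the dummy "=m" operand idea, here is a minimal sketch. It is not the code from this answer: the function name spill_demo and the operand names are made up for the example, and it assumes GNU C extended asm in the default AT&T syntax on x86. The local variable spill exists only so that the compiler picks a memory slot for the asm to use.

```c
/* Hypothetical helper (not from the original answer): returns 2*a + b.
 * `spill` is a dummy output whose only job is to provide a
 * compiler-chosen memory slot, so ESP/RSP is never modified. */
int spill_demo(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    int res;
    int spill;   /* dummy output: the compiler decides where this lives */
    __asm__ (
        "movl %[a], %[spill]\n\t"    /* save a into the slot (instead of push) */
        "movl %[b], %[res]\n\t"      /* res = b */
        "addl %[spill], %[res]\n\t"  /* memory-source ALU op: res += a */
        "addl %[spill], %[res]"      /* and once more: res += a */
        : [res] "=&r" (res), [spill] "=m" (spill)
        : [a] "r" (a), [b] "r" (b)
        : "cc");
    return res;
#else
    return 2 * a + b;   /* plain C fallback so the sketch builds on other targets */
#endif
}
```

Because the asm never pushes or pops, the compiler stays free to address the slot relative to ESP / RSP and to omit the frame pointer. Calling spill_demo(3, 4) should give 10.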
This may be slightly worse for code size, but it is usually not worse for performance (and may be better). If this is not good enough, write the entire function (or the whole loop) in asm so that you do not have to deal with the compiler.
int foo(char *p, int a, int b) { int t1,t2;
This compiles to the following asm with gcc7.3 -O3 -m32 on the Godbolt compiler explorer. Note the asm comment showing what the compiler chose for all the template operands: it picked 12(%esp) for %[spill1] and %edi for %[spill2] (because I used "=&rm" for that operand, so the compiler saved / restored %edi outside the asm and gave it to us for this dummy operand).
foo(char*, int, int):
    pushl %ebp
    pushl %edi
    pushl %esi
    pushl %ebx
    subl $16, %esp
    movl 36(%esp), %edx
    movl %edx, %ebp
Hmm, the dummy memory operand that tells the compiler which memory we modify seems to have led to a register being dedicated to it, I think because the p operand is early-clobber, so it cannot share the same register. I think you could risk leaving off the early-clobber if you are sure that none of the other inputs will use the same register as p (that is, that they do not have the same value).
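To make the early-clobber point concrete, here is a small sketch. It is again not the code from this answer: add_then_double is a hypothetical name, and GNU C extended asm with AT&T syntax on x86 is assumed. The output is written by the first instruction while input b is still needed by the second, so the "&" modifier is what stops the compiler from placing out and b in the same register.

```c
/* Hypothetical helper (not from the original answer): returns 2 * (a + b). */
int add_then_double(int a, int b)
{
#if defined(__i386__) || defined(__x86_64__)
    int out;
    __asm__ (
        "movl %[a], %[out]\n\t"   /* out is written here, before b has been read */
        "addl %[b], %[out]\n\t"   /* b must still be intact at this point */
        "addl %[out], %[out]"     /* out = 2 * (a + b) */
        : [out] "=&r" (out)       /* '&': out may not share a register with an input */
        : [a] "r" (a), [b] "r" (b)
        : "cc");
    return out;
#else
    return 2 * (a + b);   /* plain C fallback so the sketch builds on other targets */
#endif
}
```

With a plain "=r" the compiler would be allowed to give out the same register as b, and the first mov would then destroy b before it is read. Dropping the "&" is only safe when you can rule that out, which is the risk described above.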