Where is "2 + 2" in this assembly code (gcc to C translation)

Question

I wrote this simple C code

int main()
{
    int calc = 2+2;
    return 0;
}

I wanted to see how it looks in assembly, so I compiled it with gcc

 $ gcc -S -o asm.s test.c

The result was about 65 lines (Mac OS X 10.8.3), and these are the only ones I found to be related:

[image: screenshot of the relevant assembly lines]

Where to look for my 2+2 in this code?

Edit:

One part of the question has not been addressed.

If %rbp, %rsp, %eax are variables, what values do they hold in this case?

+4

c assembly gcc

Morgan wilde Sep 16 '13 at 17:26

6 answers

The compiler worked out that 2+2 = 4 and inlined the result. The constant is stored on line 10 ($4). To verify this, change the expression to 2+3 and you will see $5.

EDIT: as for the registers themselves, %rsp is the stack pointer, %rbp is the frame pointer, and %eax is a general-purpose register.

+10

Sheetjs Sep 16 '13 at 17:30

Your program has no observable behavior, which means that in the general case the compiler might not generate any machine code for it at all, except for some minimal startup and wrap-up instructions intended to ensure that zero is returned to the calling environment. At the very least, declare the variable as volatile. Or print its value after evaluating it. Or return it from main.
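A minimal sketch of two of these suggestions, using only standard C. The function names are hypothetical and chosen for illustration. Each variant gives the result an observable use, so the compiler has to keep the value around, although it will typically still fold 2 + 2 into 4 itself.

```c
/* Variant 1: a volatile object must really be written and then read back. */
int calc_volatile(void)
{
    volatile int calc = 2 + 2;
    return calc;
}

/* Variant 2: returning the value makes it the function's result,
   so it can no longer be discarded as unused. */
int calc_returned(void)
{
    int calc = 2 + 2;
    return calc;
}
```

Compiling either function with gcc -S should show a $4 that survives optimization, but the exact instructions depend on the compiler version and target.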

Also note that in C, 2 + 2 qualifies as an integral constant expression. This means that the compiler is not just allowed, but actually required, to know the result of this expression at compile time. Given this, it would be strange to expect the compiler to evaluate 2 + 2 at run time when the final value is known at compile time (even if you disable optimization completely).
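To illustrate the point about constant expressions, here is a minimal sketch of places where C demands a value at compile time. It assumes a C11 compiler for static_assert, and all names are hypothetical. Since 2 + 2 is accepted in each of these positions, the compiler must be able to evaluate it without running the program.

```c
#include <assert.h>   /* static_assert (C11) */

/* An enumerator value must be an integer constant expression. */
enum { FOUR = 2 + 2 };

/* So must the size of an array declared at file scope. */
int table[2 + 2];

/* static_assert is checked entirely at compile time. */
static_assert(2 + 2 == 4, "2 + 2 is evaluated by the compiler");

/* A case label also has to be an integer constant expression. */
int is_four(int x)
{
    switch (x) {
    case 2 + 2:
        return 1;
    default:
        return 0;
    }
}

/* Number of elements in table, derived from its type. */
int table_length(void)
{
    return (int)(sizeof table / sizeof table[0]);
}
```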

+2

AnT Sep 16 '13 at 17:56

Here is an explanation of the assembly code:

 pushq %rbp

This saves a copy of the frame pointer on the stack. The function itself does not need this; it is there so that debuggers or exception handlers can find frames on the stack.

 movq %rsp, %rbp

This starts a new frame by setting the frame pointer to the current top of the stack. Again, the function does not need this; it is housekeeping to maintain a proper stack.

 movl $4, -12(%rbp)

Here the compiler initializes calc to 4. Several things happened here. First, the compiler itself evaluated 2+2 and used the result, 4, in the assembly code. The arithmetic is not performed in the executing program; it was completed in the compiler. Second, calc was assigned a location 12 bytes below the frame pointer. (This is interesting because it is also below the stack pointer. The OS X ABI for this architecture includes a “red zone” below the stack pointer that programs are allowed to use, which is unusual.) Third, the program was evidently compiled without optimization. We know that because the optimizer would recognize that this code has no effect and is useless, and would remove it.

 movl $0, -8(%rbp)

This code stores 0 in the place that the compiler allocated to prepare the return value of main .

 movl -8(%rbp), %eax
 movl %eax, -4(%rbp)

This copies data from the place where the return value is prepared to a temporary holding location. This is even more useless than the previous code, reinforcing the conclusion that optimization was not used. It looks like code I would expect with a negative optimization level.

 movl -4(%rbp), %eax

This moves the return value from the temporary holding location to the register in which it is returned to the caller.

 popq %rbp

This restores the frame pointer, thereby removing the previously pushed frame from the stack.

ret

This puts the program out of its misery.
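Read back into C, the instruction sequence above behaves roughly like the following sketch. The names main_unoptimized, retval, and tmp are hypothetical labels for the function and for the stack slots at -8(%rbp) and -4(%rbp). This is only a model of the data movement, not code that gcc works from.

```c
/* Hypothetical C rendering of the unoptimized instruction sequence. */
int main_unoptimized(void)
{
    int calc;     /* slot at -12(%rbp) */
    int retval;   /* slot at  -8(%rbp) */
    int tmp;      /* slot at  -4(%rbp) */

    calc = 4;         /* movl $4, -12(%rbp): 2+2 already folded to 4 */
    retval = 0;       /* movl $0, -8(%rbp) */
    tmp = retval;     /* movl -8(%rbp), %eax ; movl %eax, -4(%rbp) */

    (void)calc;       /* calc is never read, exactly as in the original */
    return tmp;       /* movl -4(%rbp), %eax ; popq %rbp ; ret */
}
```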

+2

Eric Postpischil Sep 16 '13 at 18:50

The compiler optimized it: it precomputed the answer and simply stored the result. If you want the compiler to emit an addition, you cannot let it “see” the constants you are feeding it.

If you compile this code by itself as an object (gcc -O2 -c test_add.c -o test_add.o), you will force the compiler to generate the add code. But the operands will be in registers or on the stack.

 int test_add ( int a, int b ) { return(a+b); }

Then, if you call it from code in a separate source file (gcc -O2 -c test.c -o test.o), you will see the two operands being passed to the function.

 extern int test_add ( int, int );
 int test ( void ) { return(test_add(2,2)); }

and you can disassemble both of these objects (objdump -D test.o, objdump -D test_add.o)

When you do something simple in one file

 int main ( void ) { int a,b,c; a=2; b=2; c=a+b; return(0); }

The compiler can optimize your code into one of several equivalents. My example here does nothing: the math and the result serve no purpose and are never used, so they can simply be removed as dead code. Your optimization did this.

 int main ( void ) { int c; c=4; return(0); }

But this is also a legitimate optimization of the above code.

 int main ( void ) { return(0); }

EDIT:

Where is calc = 2 + 2 located?

I think that

 movl $4,-12(%rbp)

is the 2 + 2 (the answer is precomputed and simply stored in calc, which is on the stack).

 movl $0,-8(%rbp)

I assume this is the 0 in your return(0);

The actual math of adding the two numbers has been optimized out.

+1

old_timer Sep 16 '13 at 18:16

I think it is line 10; it was optimized since everything is a constant.

0

Gar Sep 16 '13 at 17:31

zwol · Accepted Answer · 2013-09-16T17:40:20+0000

Almost all of the code you got is just useless stack manipulation. With optimization on (gcc -S -O2 test.c) you get something like

 main:
 .LFB0:
     .cfi_startproc
     xorl %eax, %eax
     ret
     .cfi_endproc
 .LFE0:

Ignore every line starting with a period or ending with a colon: there are only two assembly instructions:

  xorl %eax, %eax
  ret

and they encode return 0;. (XORing a register with itself sets it to all-bits-zero. Function return values go in the %eax register under every x86 ABI.) Everything related to your int calc = 2+2; was discarded as unused.
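The XOR idiom can be checked from C. A minimal sketch with a hypothetical function name: xor_with_self models what xorl %eax, %eax does to the register, whatever value it held before.

```c
#include <stdint.h>

/* Each bit XORed with itself is 0, so the result is 0 for any input. */
uint32_t xor_with_self(uint32_t reg)
{
    return reg ^ reg;
}
```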

If you changed the code to

 int main(void) { return 2+2; }

instead you get

  movl $4, %eax
  ret

where the 4 comes from the compiler doing the addition itself, rather than making the generated program do it (this is called constant folding).

Perhaps more interesting if you change the code to

 int main(int argc, char **argv) { return argc + 2; }

then you get

  leal 2(%rdi), %eax
  ret

which does some work at run time! In the 64-bit ELF ABI, %rdi holds the first argument to the function, argc in this case. leal 2(%rdi), %eax is x86 assembly language for "%eax = %edi + 2", and it is done this way mainly because the more familiar add instruction takes only two operands, so you cannot use it to add 2 to %rdi and put the result in %eax all in one instruction. (Ignore the difference between %rdi and %edi for now.)
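The difference between the two instruction forms can be modelled in C. This is only a sketch with hypothetical function names: lea_style writes source plus constant into a separate destination in one step, while add_style mimics the two-operand add, which needs a copy first because it overwrites its destination.

```c
/* Models leal 2(%rdi), %eax: destination = source + constant, in one step. */
int lea_style(int edi)
{
    int eax = edi + 2;
    return eax;
}

/* Models movl %edi, %eax followed by addl $2, %eax:
   add overwrites its destination, so the value is copied first. */
int add_style(int edi)
{
    int eax = edi;
    eax += 2;
    return eax;
}
```

An optimizing compiler is free to emit the same instruction for both functions; they only spell out the data movement.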
