Why does initializing the `i` variable to 0 or to a large value result in the same program size?

There is something that puzzles me.

```c
int main(int argc, char *argv[]) { int i = 12345678; return 0; }
```

```c
int main(int argc, char *argv[]) { int i = 0; return 0; }
```

[Screenshot: the sizes of the two compiled programs]

Both programs are exactly the same size in bytes. Why?

And where is the literal value actually stored: in the text segment or somewhere else?

[Image: memory map]

4 answers

Both programs are exactly the same size in bytes. Why?

There are two possibilities:

  • The compiler optimizes the variable away. It is never used, so there is no reason to keep it.

  • If the first case does not apply, the program sizes are equal anyway. Why shouldn't they be? 0 takes up as many bytes as 12345678: a variable of type T always occupies sizeof(T) bytes, whatever value it holds.
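A quick way to convince yourself of the second point: sizeof is computed from the type at compile time, so it cannot depend on the stored value. The two helper names below are hypothetical, chosen only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers: each returns the size of a local int initialized
   to a different value. "sizeof i" is computed from the type of i at
   compile time; the value stored in i is never consulted. */
size_t size_of_int_holding_zero(void)
{
    int i = 0;
    return sizeof i;
}

size_t size_of_int_holding_large(void)
{
    int i = 12345678;
    return sizeof i;
}
```

Both functions return the same number, sizeof(int), which is 4 on common x86_64 systems.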

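And where is the literal value actually stored?

The variable lives on the stack: local variables are usually stored there. (The constant 12345678 itself is an immediate operand inside the instruction that initializes i, so it sits in the text segment; see the last answer.)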


Think about your bedroom. Whether you fill it with stuff or leave it empty, its floor area stays the same. Likewise, an int always occupies sizeof(int) bytes; it doesn't matter what value you store in it.


Possibly because your program was optimized: during compilation, the compiler detected that i is never used and removed it.

If no optimization took place, the other simple explanation is that one int is the same size as any other int, whatever value it holds.


TL;DR

First question: the two programs are the same size because the instructions generated for them are essentially identical (more on this below). Moreover, the size (number of bytes) of your int never changes, whatever value it holds.

Second question: the variable i is stored in main's local variable frame, on the stack. The value assigned to i is hard-coded into the instructions themselves, in the text segment.


Gdb and assembly

I know that you use Windows, but consider this code and its output on Linux. I used the same sources as you.

For the first program, with i = 12345678, main consists of the following machine instructions:

```
(gdb) disass main
Dump of assembler code for function main:
   0x00000000004004ed <+0>:     push   %rbp
   0x00000000004004ee <+1>:     mov    %rsp,%rbp
   0x00000000004004f1 <+4>:     mov    %edi,-0x14(%rbp)
   0x00000000004004f4 <+7>:     mov    %rsi,-0x20(%rbp)
   0x00000000004004f8 <+11>:    movl   $0xbc614e,-0x4(%rbp)
   0x00000000004004ff <+18>:    mov    $0x0,%eax
   0x0000000000400504 <+23>:    pop    %rbp
   0x0000000000400505 <+24>:    retq
End of assembler dump.
```

As for the other program, with i = 0, main is:

```
(gdb) disass main
Dump of assembler code for function main:
   0x00000000004004ed <+0>:     push   %rbp
   0x00000000004004ee <+1>:     mov    %rsp,%rbp
   0x00000000004004f1 <+4>:     mov    %edi,-0x14(%rbp)
   0x00000000004004f4 <+7>:     mov    %rsi,-0x20(%rbp)
   0x00000000004004f8 <+11>:    movl   $0x0,-0x4(%rbp)
   0x00000000004004ff <+18>:    mov    $0x0,%eax
   0x0000000000400504 <+23>:    pop    %rbp
   0x0000000000400505 <+24>:    retq
End of assembler dump.
```

The only difference between the two listings is the value stored in your variable. We go through these lines step by step below (my machine is x86_64; on a different architecture the instructions will differ).


Opcodes

And here are the raw bytes of main's instructions for i = 12345678 (from objdump):

```
00000000004004ed <main>:
  4004ed:   55                      push   %rbp
  4004ee:   48 89 e5                mov    %rsp,%rbp
  4004f1:   89 7d ec                mov    %edi,-0x14(%rbp)
  4004f4:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
  4004f8:   c7 45 fc 4e 61 bc 00    movl   $0xbc614e,-0x4(%rbp)
  4004ff:   b8 00 00 00 00          mov    $0x0,%eax
  400504:   5d                      pop    %rbp
  400505:   c3                      retq
  400506:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40050d:   00 00 00
```

To see the actual byte differences, run objdump -D prog1 > prog1_dump and objdump -D prog2 > prog2_dump, then diff prog1_dump prog2_dump:

```
2c2
< draft1:     file format elf64-x86-64
---
> draft2:     file format elf64-x86-64
51,58c51,58
<   400283:   00 bc f6 06 64 9f ba    add    %bh,-0x45609bfa(%rsi,%rsi,8)
<   40028a:   01 3b                   add    %edi,(%rbx)
<   40028c:   14 d1                   adc    $0xd1,%al
<   40028e:   12 cf                   adc    %bh,%cl
<   400290:   cd 2e                   int    $0x2e
<   400292:   11 77 5d                adc    %esi,0x5d(%rdi)
<   400295:   79 fe                   jns    400295 <_init-0x113>
<   400297:   3b                      .byte 0x3b
---
>   400283:   00 e8                   add    %ch,%al
>   400285:   f1                      icebp
>   400286:   6e                      outsb  %ds:(%rsi),(%dx)
>   400287:   8a f8                   mov    %al,%bh
>   400289:   a8 05                   test   $0x5,%al
>   40028b:   ab                      stos   %eax,%es:(%rdi)
>   40028c:   48 2d 3f e9 e2 b2       sub    $0xffffffffb2e2e93f,%rax
>   400292:   f7 06 53 df ba af       testl  $0xafbadf53,(%rsi)
287c287
<   4004f8:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
---
>   4004f8:   c7 45 fc 4e 61 bc 00    movl   $0xbc614e,-0x4(%rbp)
```

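Look at address 0x4004f8: your number appears as 4e 61 bc 00 in prog2 and as 00 00 00 00 in prog1. Both are 4 bytes, which equals sizeof(int). The bytes look reversed because x86_64 is little-endian: 0x00bc614e is stored least significant byte first. The bytes c7 45 fc are the rest of the instruction (move an immediate value to an offset from rbp). The first differing block, around 0x400283, is also 21 bytes long in both files. It is most likely the GNU build ID (a 20-byte hash in the .note.gnu.build-id section), which changes whenever the file contents change and which objdump -D disassembles as meaningless instructions. So although the files differ slightly, they are exactly the same size.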
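If you only want to confirm "different bytes, same size" without reading disassembly, a byte-by-byte comparison of the two executables is enough. The sketch below is similar in spirit to `cmp -l prog1 prog2`: it counts differing positions and reports both file sizes. count_differing_bytes is a hypothetical helper, not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, similar in spirit to `cmp -l`: compares two files
   byte by byte. Returns the number of positions (within the shorter file)
   where the bytes differ, or -1 if either file cannot be opened.
   The total size of each file is written to *size_a and *size_b. */
long count_differing_bytes(const char *path_a, const char *path_b,
                           long *size_a, long *size_b)
{
    assert(size_a != NULL && size_b != NULL);
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    if (fa == NULL || fb == NULL) {
        if (fa != NULL) fclose(fa);
        if (fb != NULL) fclose(fb);
        return -1;
    }

    long diffs = 0, na = 0, nb = 0;
    for (;;) {
        int ca = fgetc(fa);
        int cb = fgetc(fb);
        if (ca == EOF && cb == EOF)
            break;
        if (ca != EOF) na++;
        if (cb != EOF) nb++;
        /* Count a difference only where both files still have data. */
        if (ca != EOF && cb != EOF && ca != cb)
            diffs++;
    }

    fclose(fa);
    fclose(fb);
    *size_a = na;
    *size_b = nb;
    return diffs;
}
```

For the two programs here you would expect equal sizes and a small count of differing bytes, assuming both were compiled and linked the same way.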


Step by Step Using Assembly Instructions

  • push %rbp; mov %rsp,%rbp: this sets up the stack frame and is the standard prologue of C functions (unless you compile with gcc -fomit-frame-pointer, which GCC enables by default at -O1 and higher). It lets the function reach the stack and its local variables through a fixed register, in this case rbp.

  • mov %edi,-0x14(%rbp): this moves the contents of the edi register (argc) into our local variable frame, at offset -0x14.

  • mov %rsi,-0x20(%rbp): same thing, but this time it saves rsi (argv). This follows from the x86_64 calling convention, which passes the first arguments in registers instead of pushing everything onto the stack as 32-bit x86 does. Without optimization, the function then stores these arguments in its local frame rather than keeping them in registers. Registers are much faster than memory, but there are few of them, and they are where the processor actually does its work, so the more free registers there are, the better.

Note: edi is the lower 4-byte half of the rdi register, and the x86_64 calling convention passes the first argument in rdi. The first argument of main is int argc, a 4-byte value, so the compiler uses the 4-byte edi. rsi holds the second argument, argv, which is a char ** (a pointer to pointers to char); on a 64-bit architecture a pointer fits exactly into a 64-bit register.
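The register assignment described in the note can be written down as a small lookup table. This is a sketch of the System V AMD64 rules (used on Linux and macOS) for integer and pointer arguments only: floating-point arguments go to xmm registers, and Windows x64 uses rcx, rdx, r8, r9 instead. arg_register is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* System V AMD64 ABI: registers for the first six integer or pointer
   arguments, in order. Further arguments are passed on the stack. */
static const char *const arg_regs64[] = {"rdi", "rsi", "rdx", "rcx", "r8", "r9"};
/* Lower 32-bit halves of the same registers, used for 4-byte values
   such as int argc. */
static const char *const arg_regs32[] = {"edi", "esi", "edx", "ecx", "r8d", "r9d"};

/* Hypothetical helper: name of the register carrying the n-th (0-based)
   integer argument, or NULL if that argument goes on the stack. */
const char *arg_register(size_t n, int is_64bit_value)
{
    size_t count = sizeof arg_regs64 / sizeof arg_regs64[0];
    assert(count == sizeof arg_regs32 / sizeof arg_regs32[0]);
    if (n >= count)
        return NULL;
    return is_64bit_value ? arg_regs64[n] : arg_regs32[n];
}
```

For main(int argc, char *argv[]), arg_register(0, 0) gives "edi" and arg_register(1, 1) gives "rsi", matching the two mov instructions in the listing.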

  1. <+11>: movl $0xbc614e,-0x4(%rbp): this is the actual line int i = 12345678 (0xbc614e = 12345678 in decimal). Notice that this value is "moved" in much the same way as the arguments of main were stored: the offset -0x4(%rbp) places it in the local variable frame (this answers your question about where it is stored).

  2. mov $0x0,%eax; pop %rbp; retq: the routine epilogue. It puts the return value 0 in eax, restores the caller's frame pointer and returns (returning from main ends the program).

  3. In the second example, the only difference is the line <+11>: movl $0x0,-0x4(%rbp), which stores zero; in C terms, int i = 0.
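Why is the instruction the same length for 0 and for 12345678? A 32-bit MOV to memory always carries a full 4-byte immediate; unlike ADD or SUB, it has no sign-extended 8-bit immediate form. The sketch below assembles `movl $imm, disp8(%rbp)` by hand and reproduces the bytes c7 45 fc ... from the objdump listing. encode_movl_imm_rbp is a hypothetical helper that covers only this one addressing form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: hand-assembles `movl $imm, disp(%rbp)` with an
   8-bit displacement, the form GCC used for `int i = ...` above.
     C7     opcode: MOV r/m32, imm32 (ModRM.reg field must be 0)
     45     ModRM: mod=01 (8-bit displacement follows), reg=000, rm=101 (%rbp)
     disp8  signed offset, e.g. -4 is encoded as 0xfc
     imm32  the constant, least significant byte first (little-endian)
   Returns the instruction length, which is 7 whatever imm is. */
size_t encode_movl_imm_rbp(int8_t disp, uint32_t imm, uint8_t out[7])
{
    assert(out != NULL);
    out[0] = 0xC7;
    out[1] = 0x45;
    out[2] = (uint8_t)disp;
    for (int k = 0; k < 4; k++)
        out[3 + k] = (uint8_t)(imm >> (8 * k));
    return 7;
}
```

Calling it with disp = -4 and imm = 12345678 yields c7 45 fc 4e 61 bc 00; with imm = 0 it yields c7 45 fc 00 00 00 00. Same length, different contents.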

So both main functions compile to the same instructions with the same lengths, which is why the final sizes match. (This assumes you compiled both the same way, because the compiler also puts many other things into the binary, such as data and library code. On Linux, you can get the full disassembly with objdump -D program.)

Note 2: these listings contain no instruction that subtracts from rsp to allocate stack space. That is expected here: main calls no other function, and the x86_64 System V ABI guarantees a 128-byte "red zone" below rsp that signal handlers will not clobber, so such a leaf function can keep its locals there without adjusting rsp.
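The size argument can be checked with simple arithmetic on the objdump listing: main runs from 0x4004ed up to the padding at 0x400506, i.e. 0x19 = 25 bytes, and the per-instruction lengths add up to the same total for both programs. The table below is copied by hand from that listing; total_length is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Lengths in bytes of main's instructions, copied by hand from the
   objdump listing: push, mov, mov, mov, movl, mov, pop, retq.
   The list is identical for i = 0 and i = 12345678; only the
   4 immediate bytes inside the 7-byte movl differ. */
static const size_t main_insn_lengths[] = {1, 3, 3, 4, 7, 5, 1, 1};

/* Hypothetical helper: sums a list of instruction lengths. */
size_t total_length(const size_t *lengths, size_t count)
{
    assert(lengths != NULL || count == 0);
    size_t total = 0;
    for (size_t k = 0; k < count; k++)
        total += lengths[k];
    return total;
}
```

total_length(main_insn_lengths, 8) returns 25, the same as 0x400506 - 0x4004ed.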


Stack view

The stack is the same in both cases; only the value of i, the 4 bytes at -0x4(%rbp), changes:

```
              |        ...         |   Higher memory addresses
              +--------------------+
  0x8(%rbp)   |   return address   |
              +--------------------+
  0x0(%rbp)   |    previous rbp    |   <- pushed by push %rbp
              +--------------------+
 -0x4(%rbp)   |   i = 0x11223344   |
              +--------------------+
              |        ...         |
              +--------------------+
 -0x14(%rbp)  |        argc        |
              +--------------------+
 -0x20(%rbp)  |        argv        |
              +--------------------+
              |        ...         |   Lower memory addresses
```
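To make "same layout, different contents" concrete, here is a toy model of the frame as a byte array, with rbp as an index into it. This is purely an illustration under that assumption (FRAME_SIZE, RBP and both helpers are hypothetical), not a description of what the CPU does internally: each variable has a fixed offset from rbp, and changing i only rewrites the 4 bytes at -0x4.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model, for illustration only: the frame is a byte array and the
   hypothetical constant RBP is the index that %rbp would point to.
   Offsets match the listing: argv at -0x20, argc at -0x14, i at -0x4. */
#define FRAME_SIZE 64
#define RBP 40

/* Mimics `movl $value, offset(%rbp)`: copies the int's 4 bytes into
   its slot, in the host's byte order. */
void store_at_rbp(uint8_t frame[FRAME_SIZE], int offset, int32_t value)
{
    assert(RBP + offset >= 0 && RBP + offset + (int)sizeof value <= FRAME_SIZE);
    memcpy(&frame[RBP + offset], &value, sizeof value);
}

/* Reads back the 4-byte int stored at offset(%rbp). */
int32_t load_at_rbp(const uint8_t frame[FRAME_SIZE], int offset)
{
    int32_t value;
    assert(RBP + offset >= 0 && RBP + offset + (int)sizeof value <= FRAME_SIZE);
    memcpy(&value, &frame[RBP + offset], sizeof value);
    return value;
}
```

Storing 12345678 or 0 at offset -0x4 touches the same 4 bytes and leaves the argc slot at -0x14 alone.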

Note 3: the direction in which the stack grows depends on your architecture, and so does the byte order (endianness) in which values are written to memory.
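You can check the byte order of your own machine from C. This sketch copies an int's in-memory representation into a byte array; on a little-endian machine such as x86_64, 12345678 comes out as 4e 61 bc 00, matching the bytes in the objdump output. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies the in-memory representation of value
   into out, exactly as it would sit in the stack slot of i. */
void int32_bytes(int32_t value, uint8_t out[4])
{
    memcpy(out, &value, sizeof value);
}

/* Returns 1 if the least significant byte is stored first
   (little-endian, as on x86_64), 0 otherwise. */
int is_little_endian(void)
{
    uint8_t b[4];
    int32_bytes(1, b);
    return b[0] == 1;
}
```

On x86_64, int32_bytes(12345678, b) fills b with 4e 61 bc 00 and is_little_endian() returns 1.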



Source: https://habr.com/ru/post/1234132/

