The compiler is trying to maintain 16-byte alignment on the stack. These days this applies to 32-bit code as well, not just 64-bit code. The idea is that the stack should be aligned on a 16-byte boundary at the moment the CALL instruction is executed.
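A minimal C sketch of the alignment rule, assuming a 32-bit address model. `is_aligned16` and `align_down16` are hypothetical helper names; `align_down16` performs the same masking that `and esp,0xfffffff0` does in `main`.

```c
#include <assert.h>
#include <stdint.h>

/* The rule described above: ESP % 16 == 0 right before CALL executes. */
enum { STACK_ALIGN = 16 };

/* Hypothetical helper: true if addr is a multiple of 16. */
static int is_aligned16(uint32_t addr)
{
    return (addr & (STACK_ALIGN - 1)) == 0;
}

/* Hypothetical helper: round addr down to a 16-byte boundary, which is
   what `and esp,0xfffffff0` does. The stack grows toward lower addresses,
   so rounding down can only reserve extra space, never release it. */
static uint32_t align_down16(uint32_t addr)
{
    return addr & ~(uint32_t)(STACK_ALIGN - 1);
}

/* The mask in main's AND is exactly ~15 as a 32-bit value. */
static_assert(~(uint32_t)15 == 0xfffffff0u, "mask used by main");
```

For example, an ESP value of 0xbffff5e8 rounds down to 0xbffff5e0, which is 16-byte aligned.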
Since you compiled without optimization, there are some extraneous instructions.
0x0804835a <main+3>:    sub    esp,0x18        ; Allocate local stack space
0x0804835d <main+6>:    and    esp,0xfffffff0  ; Ensure `main` has a 16 byte aligned stack
0x08048360 <main+9>:    mov    eax,0x0         ; Extraneous, not needed
0x08048365 <main+14>:   sub    esp,eax         ; Extraneous, not needed
ESP is now 16-byte aligned after the last instruction above. The arguments for the call are then moved onto the stack, starting at the top of the stack (ESP). This is done using:
0x08048367 <main+16>:   mov    DWORD PTR [esp+12],0x4
0x0804836f <main+24>:   mov    DWORD PTR [esp+8],0x3
0x08048377 <main+32>:   mov    DWORD PTR [esp+4],0x2
0x0804837f <main+40>:   mov    DWORD PTR [esp],0x1
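The four MOVs above can be modelled in C as a hedged sketch. `store_args`, `load_arg`, and the byte array standing in for the memory at ESP are hypothetical illustrative names, not anything GCC emits. Argument *i* lands at [esp + 4*i], so the first argument sits at the lowest address. Because the arguments are written with MOV relative to ESP rather than PUSHed, ESP itself does not move and is still 16-byte aligned when CALL executes; the four 4-byte arguments fill exactly 16 bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: `stack` stands in for the 16 bytes starting at ESP,
   so offset 0 is [esp], offset 4 is [esp+4], and so on. */
enum { ARG_COUNT = 4, ARG_SIZE = 4 };

/* Write argument i to [esp + 4*i], mirroring the four MOVs in main.
   Returns the number of bytes written (4 * 4 = 16). */
static size_t store_args(unsigned char *stack, const uint32_t args[ARG_COUNT])
{
    for (size_t i = 0; i < ARG_COUNT; i++)
        memcpy(stack + i * ARG_SIZE, &args[i], ARG_SIZE);
    return (size_t)ARG_COUNT * ARG_SIZE;
}

/* Read the 32-bit value at byte offset `off` from ESP. */
static uint32_t load_arg(const unsigned char *stack, size_t off)
{
    uint32_t v;
    assert(off % ARG_SIZE == 0 && off < (size_t)ARG_COUNT * ARG_SIZE);
    memcpy(&v, stack + off, sizeof v);
    return v;
}
```

Calling `store_args` with {1, 2, 3, 4} places 1 at offset 0 and 4 at offset 12, matching `mov DWORD PTR [esp],0x1` and `mov DWORD PTR [esp+12],0x4`.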
Then CALL pushes a 4-byte return address onto the stack. After the call, we get the following instructions:
0x08048344 <test_function+0>:   push   ebp       ; 4 bytes pushed on stack
0x08048345 <test_function+1>:   mov    ebp,esp   ; Setup stackframe
This pushes another 4 bytes onto the stack. Together with the 4-byte return address, the stack is now 8 bytes away from a 16-byte boundary. To get back to 16-byte alignment, the compiler has to spend an additional 8 bytes of stack. That is why there are 8 extra bytes in this instruction:
0x08048347 <test_function+3>: sub esp,0x28
- 0x08 bytes already on the stack due to the return address (4 bytes) and EBP (4 bytes)
- 0x08 padding bytes needed to bring the stack back to 16-byte alignment
- 0x20 bytes (32) needed for local variables. 32 is evenly divisible by 16, so alignment is maintained
The second and third values added together (0x08 + 0x20) give the 0x28 that the compiler computed and used in sub esp,0x28.
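The arithmetic in the list can be written as a small C function. `frame_sub` is a hypothetical name, and the sketch assumes the compiler simply pads the locals until ESP is a multiple of 16 again. It reproduces only the total; where GCC actually places the padding is the compiler's choice.

```c
#include <assert.h>

/* Hypothetical helper: return N for `sub esp,N` given
   - already_pushed: bytes pushed since ESP was last 16-byte aligned
     (here the 4-byte return address plus the 4-byte saved EBP), and
   - locals: bytes the compiler has decided the locals need.
   The result is the locals plus whatever padding makes
   already_pushed + N a multiple of 16. */
static unsigned frame_sub(unsigned already_pushed, unsigned locals)
{
    assert(already_pushed % 4 == 0); /* i386 pushes are 4 bytes each */
    unsigned pad = (16 - (already_pushed + locals) % 16) % 16;
    return locals + pad;
}
```

With `already_pushed = 8` and `locals = 0x20`, the padding is 8 and the result is 0x28, the value in `sub esp,0x28`.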
0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69
So why [ebp-12] in this instruction? The first 8 bytes, [ebp-8] through [ebp-1], are the padding bytes used to keep the stack 16-byte aligned. After that, local data appears on the stack. In this case, [ebp-12] through [ebp-9] are the 4 bytes for the 32-bit integer flag (0x7a69 is 31337 in decimal).
Then we have this instruction, which sets buffer[0] to the character 'A':
0x08048351 <test_function+13>: mov BYTE PTR [ebp-40],0x41
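To check these offsets, here is a hedged C sketch that traces ESP and EBP through the call. `trace_call` and `struct frame_trace` are hypothetical names, and the starting address is arbitrary (only its alignment matters). Starting from an aligned ESP at the CALL, EBP ends up 8 bytes past a 16-byte boundary, so [ebp-8] and [ebp-40] (the start of buffer) are both 16-byte aligned, as is ESP after `sub esp,0x28`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical trace of the stack pointer through the call into
   test_function. */
struct frame_trace {
    uint32_t esp_at_call; /* ESP just before CALL (16-byte aligned) */
    uint32_t ebp;         /* EBP after `mov ebp,esp` */
    uint32_t esp_body;    /* ESP after `sub esp,0x28` */
};

static struct frame_trace trace_call(uint32_t esp_at_call)
{
    struct frame_trace t;
    assert(esp_at_call % 16 == 0); /* the caller's alignment guarantee */
    t.esp_at_call = esp_at_call;
    uint32_t esp = esp_at_call;
    esp -= 4;       /* CALL pushes the 4-byte return address */
    esp -= 4;       /* push ebp */
    t.ebp = esp;    /* mov ebp,esp */
    esp -= 0x28;    /* sub esp,0x28 */
    t.esp_body = esp;
    return t;
}
```

For instance, `trace_call(0xbffff5e0u)` gives EBP = 0xbffff5d8 and a body ESP of 0xbffff5b0, which is also the address [ebp-40] where buffer[0] is stored.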
The oddity, then, is why the 28 bytes from [ebp-40] (the start of the array) to [ebp-13] are used for a 10-byte character array. The best guess I can make is that the compiler believes it can treat a 10-byte character array as a 128-bit (16-byte) vector. That would force the compiler to align the buffer on a 16-byte boundary and pad the array out to 16 bytes (128 bits). From the compiler's point of view, your code seems to act as if it were defined as:
Output from GodBolt for GCC 4.9.0, generating 32-bit code with SSE2 enabled, looks like this:
test_function:
        push    ebp                         #
        mov     ebp, esp                    #,
        sub     esp, 40                     #,same as: sub esp,0x28
        mov     DWORD PTR [ebp-12], 31337   # flag,
        mov     BYTE PTR [ebp-40], 65       # bufu.buffer,
        leave
        ret
This is similar to your disassembly in GDB.
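To make the size and alignment effect concrete, here is a hypothetical C11 sketch. `bufu_t` and its `vec` member are illustrative names, and an `alignas(16)` byte array stands in for a real 128-bit vector type, so this is not a claim about how GCC represents the buffer internally. Putting the 10-byte buffer in a union with a 16-byte, 16-byte-aligned member yields an object that is exactly 16 bytes and must start on a 16-byte boundary, which matches what the disassembly suggests for buffer.

```c
#include <assert.h>
#include <stdalign.h>

/* Hypothetical illustration only: a 16-byte-aligned, 16-byte member
   forces the whole union to 16-byte size and alignment. */
typedef union {
    char buffer[10];
    alignas(16) unsigned char vec[16]; /* stand-in for a 128-bit vector */
} bufu_t;

static_assert(sizeof(bufu_t) == 16, "padded out to 16 bytes");
static_assert(alignof(bufu_t) == 16, "aligned on a 16-byte boundary");
```

Writing `u.buffer[0] = 'A'` on such a union touches the first byte of the same 16-byte object, just as `mov BYTE PTR [ebp-40],0x41` writes the first byte of the aligned slot.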
If you compiled with optimizations (e.g. -O1, -O2, -O3), the optimizer could simplify test_function because it is a leaf function in your example. A leaf function is a function that does not call any other function, so the compiler can apply certain shortcuts to it.
As for why the character array appears to be aligned on a 16-byte boundary and padded out to 16 bytes: this probably cannot be answered with certainty until we know which version of GCC you are using (gcc --version will tell you). It would also be useful to know your OS and its version. Even better would be to add the output of this command to your question: gcc -Q -v -g my_program.c