Waste in memory allocation for local variables

This is my program:

    void test_function(int a, int b, int c, int d){
        int flag;
        char buffer[10];

        flag = 31337;
        buffer[0] = 'A';
    }

    int main() {
        test_function(1, 2, 3, 4);
    }

I compile this program using the debug option:

 gcc -g my_program.c 

I am using gdb and I disassemble test_function with Intel syntax:

    (gdb) disassemble test_function
    Dump of assembler code for function test_function:
    0x08048344 <test_function+0>:   push   ebp
    0x08048345 <test_function+1>:   mov    ebp,esp
    0x08048347 <test_function+3>:   sub    esp,0x28
    0x0804834a <test_function+6>:   mov    DWORD PTR [ebp-12],0x7a69
    0x08048351 <test_function+13>:  mov    BYTE PTR [ebp-40],0x41
    0x08048355 <test_function+17>:  leave
    0x08048356 <test_function+18>:  ret
    End of assembler dump.

And I disassemble main:

    (gdb) disassemble main
    Dump of assembler code for function main:
    0x08048357 <main+0>:   push   ebp
    0x08048358 <main+1>:   mov    ebp,esp
    0x0804835a <main+3>:   sub    esp,0x18
    0x0804835d <main+6>:   and    esp,0xfffffff0
    0x08048360 <main+9>:   mov    eax,0x0
    0x08048365 <main+14>:  sub    esp,eax
    0x08048367 <main+16>:  mov    DWORD PTR [esp+12],0x4
    0x0804836f <main+24>:  mov    DWORD PTR [esp+8],0x3
    0x08048377 <main+32>:  mov    DWORD PTR [esp+4],0x2
    0x0804837f <main+40>:  mov    DWORD PTR [esp],0x1
    0x08048386 <main+47>:  call   0x8048344 <test_function>
    0x0804838b <main+52>:  leave
    0x0804838c <main+53>:  ret
    End of assembler dump.

I set a breakpoint at address 0x08048355 (the leave instruction of test_function) and ran the program.

I look at the stack as follows:

    (gdb) x/16w $esp
    0xbffff7d0: 0x00000041 0x08049548 0xbffff7e8 0x08048249
    0xbffff7e0: 0xb7f9f729 0xb7fd6ff4 0xbffff818 0x00007a69
    0xbffff7f0: 0xb7fd6ff4 0xbffff8ac 0xbffff818 0x0804838b
    0xbffff800: 0x00000001 0x00000002 0x00000003 0x00000004

0x0804838b is the return address, 0xbffff818 is the saved frame pointer (main's ebp), and the flag variable is stored 12 bytes below ebp. Why 12?

I do not understand this instruction:

 0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69 

Why isn't the value 0x00007a69 stored at ebp-4, where 0xbffff8ac currently sits?

The same question for the buffer. Why 40?

Aren't we wasting memory? Are 0xb7fd6ff4 0xbffff8ac and 0xb7f9f729 0xb7fd6ff4 0xbffff818 0x08049548 0xbffff7e8 0x08048249 unused?

This output is for gcc -Q -v -g my_program.c :

    Reading specs from /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/specs
    Configured with: ../src/configure -v --enable-languages=c,c++ --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-gxx-include-dir=/usr/include/c++/3.3 --enable-shared --enable-__cxa_atexit --with-system-zlib --enable-nls --without-included-gettext --enable-clocale=gnu --enable-debug i486-linux-gnu
    Thread model: posix
    gcc version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1)
     /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/cc1 -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=6 notesearch.c -dumpbase notesearch.c -auxbase notesearch -g -version -o /tmp/ccGT0kTf.s
    GNU C version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1) (i486-linux-gnu)
            compiled by GNU C version 3.3.6 (Ubuntu 1:3.3.6-15ubuntu1).
    GGC heuristics: --param ggc-min-expand=99 --param ggc-min-heapsize=129473
    options passed: -v -D__GNUC__=3 -D__GNUC_MINOR__=3 -D__GNUC_PATCHLEVEL__=6 -auxbase -g
    options enabled: -fpeephole -ffunction-cse -fkeep-static-consts -fpcc-struct-return -fgcse-lm -fgcse-sm -fsched-interblock -fsched-spec -fbranch-count-reg -fcommon -fgnu-linker -fargument-alias -fzero-initialized-in-bss -fident -fmath-errno -ftrapping-math -m80387 -mhard-float -mno-soft-float -mieee-fp -mfp-ret-in-387 -maccumulate-outgoing-args -mcpu=pentiumpro -march=i486
    ignoring nonexistent directory "/usr/local/include/i486-linux-gnu"
    ignoring nonexistent directory "/usr/i486-linux-gnu/include"
    ignoring nonexistent directory "/usr/include/i486-linux-gnu"
    #include "..." search starts here:
    #include <...> search starts here:
     /usr/local/include
     /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/include
     /usr/include
    End of search list.
     gnu_dev_major gnu_dev_minor gnu_dev_makedev stat lstat fstat mknod fatal ec_malloc dump main print_notes find_user_note search_note
    Execution times (seconds)
     preprocessing    : 0.00 ( 0%) usr 0.01 (25%) sys 0.00 ( 0%) wall
     lexical analysis : 0.00 ( 0%) usr 0.01 (25%) sys 0.00 ( 0%) wall
     parser           : 0.02 (100%) usr 0.01 (25%) sys 0.00 ( 0%) wall
     TOTAL            : 0.02 0.04 0.00
     as -V -Qy -o /tmp/ccugTYeu.o /tmp/ccGT0kTf.s
    GNU assembler version 2.17.50 (i486-linux-gnu) using BFD version 2.17.50 20070103 Ubuntu
     /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/collect2 --eh-frame-hdr -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crt1.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crti.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/crtbegin.o -L/usr/lib/gcc-lib/i486-linux-gnu/3.3.6 -L/usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../.. /tmp/ccugTYeu.o -lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s --no-as-needed /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/crtend.o /usr/lib/gcc-lib/i486-linux-gnu/3.3.6/../../../crtn.o

NOTE: I am reading the book "The Art of Exploitation" and I am using the VM provided with the book.

2 answers

The compiler is trying to keep the stack aligned to 16 bytes. These days this applies to 32-bit code as well, not just 64-bit. The idea is that the stack should be aligned on a 16-byte boundary at the moment the CALL instruction is executed.

Since you compiled without optimization, there are some extraneous instructions.

    0x0804835a <main+3>:   sub    esp,0x18         ; Allocate local stack space
    0x0804835d <main+6>:   and    esp,0xfffffff0   ; Ensure `main` has a 16-byte aligned stack
    0x08048360 <main+9>:   mov    eax,0x0          ; Extraneous, not needed
    0x08048365 <main+14>:  sub    esp,eax          ; Extraneous, not needed

ESP is now 16-byte aligned after the last instruction above. We store the parameters for the call starting at the top of the stack, at ESP. This is done with:

    0x08048367 <main+16>:  mov    DWORD PTR [esp+12],0x4
    0x0804836f <main+24>:  mov    DWORD PTR [esp+8],0x3
    0x08048377 <main+32>:  mov    DWORD PTR [esp+4],0x2
    0x0804837f <main+40>:  mov    DWORD PTR [esp],0x1

Then CALL pushes a 4-byte return address onto the stack. The called function begins with the following instructions:

    0x08048344 <test_function+0>:  push   ebp       ; 4 bytes pushed on stack
    0x08048345 <test_function+1>:  mov    ebp,esp   ; Setup stackframe

This pushes another 4 bytes on the stack. Together with the 4 bytes of the return address, the stack is now 8 bytes off its 16-byte alignment. To restore 16-byte alignment, an additional 8 bytes of stack have to be spent. That is why the following instruction allocates 8 extra bytes:

 0x08048347 <test_function+3>: sub esp,0x28 
  • 0x08 bytes already on the stack due to the return address (4 bytes) and EBP (4 bytes)
  • 0x08 bytes of padding needed to bring the stack back to 16-byte alignment
  • 0x20 bytes needed for local variable allocation = 32 bytes. 32 is evenly divisible by 16, so alignment is maintained

The second and third numbers added together give the value 0x28 that the compiler computed and used in sub esp,0x28.

 0x0804834a <test_function+6>: mov DWORD PTR [ebp-12],0x7a69 

So why [ebp-12] in this instruction? The first 8 bytes, [ebp-8] through [ebp-1], are the padding bytes used to align the stack to 16 bytes. The local data comes after that on the stack. In this case, [ebp-12] through [ebp-9] are the 4 bytes of the 32-bit integer flag.

Then we have this to update buffer[0] with the character "A":

 0x08048351 <test_function+13>: mov BYTE PTR [ebp-40],0x41 

The oddity is why the 10-byte character array occupies the region from [ebp-40] (the beginning of the array) through [ebp-13], which is 28 bytes. My best guess is that the compiler believes it can treat the 10-byte character array as a 128-bit (16-byte) vector. That would force the compiler to align the buffer on a 16-byte boundary and pad the array out to 16 bytes (128 bits). From the compiler's point of view, your code seems to behave as if it were defined as:

    #include <xmmintrin.h>

    void test_function(int a, int b, int c, int d){
        int flag;
        union {
            char buffer[10];
            __m128 m128buffer;   /* 16-byte variable that needs to be 16-byte aligned */
        } bufu;

        flag = 31337;
        bufu.buffer[0] = 'A';
    }

The output from Godbolt for GCC 4.9.0, generating 32-bit code with SSE2 enabled, looks like this:

    test_function:
            push    ebp     #
            mov     ebp, esp        #,
            sub     esp, 40         #,   same as: sub esp,0x28
            mov     DWORD PTR [ebp-12], 31337       # flag,
            mov     BYTE PTR [ebp-40], 65   # bufu.buffer,
            leave
            ret

This is similar to your disassembly in GDB.

If you had compiled with optimizations (e.g. -O1, -O2, -O3), the optimizer could have simplified test_function, because it is a leaf function in your example. A leaf function is a function that does not call any other function, which lets the compiler apply certain shortcuts.

As for why the character array seems to be aligned on a 16-byte boundary and padded out to 16 bytes: that probably cannot be answered with certainty until we know which GCC version you are using (gcc --version will tell you). It would also help to know the OS and its version. Even better would be to add the output of this command to your question: gcc -Q -v -g my_program.c


Unless you are trying to improve gcc itself, understanding why un-optimized code is as bad as it is will mostly be a waste of time. Look at the output from -O3 if you want to know what the compiler does with your code, or from -Og if you want to see a more literal translation of your source into asm. Write functions that take their inputs as args and produce their output in globals or return values, so that the optimized asm is not just ret.


You should not expect anything efficient from gcc -O0. It produces the most braindead literal translation of your source.

I cannot reproduce this asm output with any version of gcc or clang at http://gcc.godbolt.org/ (gcc 4.4.7 to gcc 5.3.0, clang 3.0 to clang 3.7.1). (Note that godbolt uses g++, but you can use -xc to treat the input as C instead of compiling it as C++. This can sometimes change the asm output, even when you don't use any features that C99/C11 has but C++ lacks, e.g. C99 variable-length arrays.)

Some gcc versions emit extra code by default unless I use -fno-stack-protector.

At first I thought the extra space reserved by test_function was for copying its args down into its own stack frame, but at least modern gcc does not do that. (64-bit gcc does store its args to memory when they arrive in registers, but that is a different case. 32-bit gcc will modify an arg in place on the stack, without copying it.)

The ABI allows the called function to clobber its args on the stack, so a caller that wants to make repeated calls to a function with the same args has to store them again between calls.

clang 3.7.1 with -O0 does copy its args down to locals, but it still only reserves 32 (0x20) bytes.

This is the best answer you will get if you don't tell us which version of gcc you are using ...



Source: https://habr.com/ru/post/1242468/

