GCC argument register spilling on x86-64

I am experimenting with x86-64 assembly. I compiled this dummy function:

    long myfunc(long a, long b, long c, long d,
                long e, long f, long g, long h)
    {
        long xx = a * b * c * d * e * f * g * h;
        long yy = a + b + c + d + e + f + g + h;
        long zz = utilfunc(xx, yy, xx % yy);
        return zz + 20;
    }

With gcc -O0 -g, I was surprised to find the following at the beginning of the function's assembly:

    0000000000400520 <myfunc>:
      400520: 55                    push   rbp
      400521: 48 89 e5              mov    rbp,rsp
      400524: 48 83 ec 50           sub    rsp,0x50
      400528: 48 89 7d d8           mov    QWORD PTR [rbp-0x28],rdi
      40052c: 48 89 75 d0           mov    QWORD PTR [rbp-0x30],rsi
      400530: 48 89 55 c8           mov    QWORD PTR [rbp-0x38],rdx
      400534: 48 89 4d c0           mov    QWORD PTR [rbp-0x40],rcx
      400538: 4c 89 45 b8           mov    QWORD PTR [rbp-0x48],r8
      40053c: 4c 89 4d b0           mov    QWORD PTR [rbp-0x50],r9
      400540: 48 8b 45 d8           mov    rax,QWORD PTR [rbp-0x28]
      400544: 48 0f af 45 d0        imul   rax,QWORD PTR [rbp-0x30]
      400549: 48 0f af 45 c8        imul   rax,QWORD PTR [rbp-0x38]
      40054e: 48 0f af 45 c0        imul   rax,QWORD PTR [rbp-0x40]
      400553: 48 0f af 45 b8        imul   rax,QWORD PTR [rbp-0x48]
      400558: 48 0f af 45 b0        imul   rax,QWORD PTR [rbp-0x50]
      40055d: 48 0f af 45 10        imul   rax,QWORD PTR [rbp+0x10]
      400562: 48 0f af 45 18        imul   rax,QWORD PTR [rbp+0x18]

Strangely, gcc spills all the argument registers onto the stack and then reads them back from memory for the subsequent operations.

This only happens at -O0 (there is no such problem at -O1), but still, why? It looks like an anti-optimization to me. Why would gcc do this?

+6
2 answers

I am by no means an expert on GCC internals, but I will give it a shot. Unfortunately, most of the information about GCC's register allocation and spilling seems to be out of date (it references files like local-alloc.c that no longer exist).

I am looking at the gcc-4.5-20110825 source code.

GNU C Compiler Internals mentions that the initial function code is generated by expand_function_start in gcc/function.c. There we find the following for handling the parameters:

    /* Initialize rtx for parameters and local variables.
       In some cases this requires emitting insns.  */
    assign_parms (subr);

In assign_parms, the code that decides where each argument is stored is the following:

    if (assign_parm_setup_block_p (&data))
      assign_parm_setup_block (&all, parm, &data);
    else if (data.passed_pointer || use_register_for_decl (parm))
      assign_parm_setup_reg (&all, parm, &data);
    else
      assign_parm_setup_stack (&all, parm, &data);

assign_parm_setup_block_p handles aggregate data types and does not apply in this case, and since the data is not passed as a pointer, GCC checks use_register_for_decl.

Here's the relevant part:

    if (optimize)
      return true;

    if (!DECL_REGISTER (decl))
      return false;

DECL_REGISTER tests whether the variable was declared with the register keyword. And now we have our answer: most parameters live on the stack when optimizations are not enabled, and are then handled by assign_parm_setup_stack. The route taken through the source code before it ends up spilling the value is slightly more complicated for pointer arguments, but can be traced in the same file if you are curious.
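To make the decision chain easier to follow, here is a minimal C sketch that models only the checks quoted above. The struct, the enum and the model_* function names are hypothetical stand-ins of mine, not GCC's real data structures, and the real use_register_for_decl performs further checks after the two shown.

```c
#include <stdbool.h>

/* Hypothetical model of the facts the quoted excerpts look at.
   These are NOT GCC's real types. */
struct parm_model {
    bool is_block;          /* stands in for assign_parm_setup_block_p () */
    bool passed_pointer;    /* stands in for data.passed_pointer */
    bool declared_register; /* stands in for DECL_REGISTER (decl) */
};

enum parm_home { HOME_BLOCK, HOME_REG, HOME_STACK };

/* Simplified model of use_register_for_decl: only the two quoted checks.
   Assumption: anything that passes both checks is treated as "register". */
bool model_use_register(const struct parm_model *p, int optimize)
{
    if (optimize)
        return true;
    if (!p->declared_register)
        return false;
    return true;
}

/* Model of the if/else chain quoted from assign_parms. */
enum parm_home model_assign_parm(const struct parm_model *p, int optimize)
{
    if (p->is_block)
        return HOME_BLOCK;
    if (p->passed_pointer || model_use_register(p, optimize))
        return HOME_REG;
    return HOME_STACK;
}
```

In this model, a plain long parameter with optimize set to 0 ends up as HOME_STACK, which matches the spills seen in the question; the same parameter with optimize set to 1, or declared with register, ends up as HOME_REG.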

Why does GCC spill all arguments and local variables with optimizations disabled? To help debugging. Consider this simple function:

    1  extern int bar(int);
    2  int foo(int a) {
    3      int b = bar(a | 1);
    4      b += 42;
    5      return b;
    6  }

Compiled with gcc -O1 -c, this generates the following on my machine:

     0: 48 83 ec 08           sub    $0x8,%rsp
     4: 83 cf 01              or     $0x1,%edi
     7: e8 00 00 00 00        callq  c <foo+0xc>
     c: 83 c0 2a              add    $0x2a,%eax
     f: 48 83 c4 08           add    $0x8,%rsp
    13: c3                    retq

This is fine, except that if you break on line 5 and try to print the value of a, you get

    (gdb) print a
    $1 = <value optimized out>

The argument has been overwritten, because it is not used after the call to bar.
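If you need an argument to stay inspectable in an optimized build, one workaround I would try is copying it into a volatile local, since a volatile object has to be stored to memory. The sketch below is my own illustration under that assumption: foo_debuggable and a_copy are hypothetical names, and bar is given a dummy body only so that the example links.

```c
/* Dummy stand-in for the external bar() from the example above. */
int bar(int x)
{
    return x * 2;
}

int foo_debuggable(int a)
{
    /* Assumption: the volatile qualifier forces a real store, so the
       original value of a should still be readable from this copy
       after the register that held a has been reused. */
    volatile int a_copy = a;

    int b = bar(a | 1);
    b += 42;

    (void)a_copy; /* reads the copy; has no effect on the result */
    return b;
}
```

With this, printing a_copy at the return statement should show the original argument even when printing a reports that the value was optimized out. The cost is one extra store and load per call.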

+7

A few reasons:

  • In general, a function argument has to be treated like a local variable, because it may be stored to or have its address taken inside the function. Therefore, the easiest approach is to allocate a stack slot for each argument.
  • Debugging information becomes much easier to emit with stack locations: the value of the argument is always at one specific location, rather than moving between registers and memory.
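The first reason can be illustrated with a short C sketch. The function names are hypothetical; the point is only that taking the address of a parameter, or assigning to it, is legal C, so an unoptimized compiler has to be prepared for every parameter to live in memory.

```c
/* Hypothetical helper: writes through a pointer, so whatever it points
   to must have an address. */
void clamp_to_zero(long *p)
{
    if (*p < 0)
        *p = 0;
}

long sum_nonneg(long a, long b)
{
    clamp_to_zero(&a);   /* &a requires a to have an address, i.e. a stack slot */
    b = b < 0 ? 0 : b;   /* a parameter can be assigned like any other local */
    return a + b;
}
```

At -O0 the compiler does not analyze which parameters actually need an address, so giving every one of them a stack slot up front handles this case uniformly.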

When you look at -O0 code in general, keep in mind that the compiler's top priorities are to reduce compilation time as much as possible and to generate high-quality debugging information.

+6

Source: https://habr.com/ru/post/895955/

