Why an empty function doesn't just return

If I compile an empty C function

void nothing(void) { } 

using gcc -O2 -S (and clang) on macOS, it generates:

    _nothing:
        pushq   %rbp
        movq    %rsp, %rbp
        popq    %rbp
        ret

Why doesn't gcc remove everything except the ret? It looks like an easy optimization, unless the prologue and epilogue really do something (as far as I can tell, they don't). The same pattern (push/mov at the beginning, pop at the end) also shows up in non-empty functions where rbp is not otherwise used.

On Linux, using a more recent gcc (4.4.5), I see only

    nothing:
        rep ret

Why the rep? The rep is absent in non-empty functions.

+4
4 answers

Why the rep?

The reasons are explained in this blog post. In short, jumping directly to a single-byte ret instruction would mess up branch prediction on some AMD processors. Rather than putting a nop before the ret, a meaningless prefix byte was added, which saves instruction decoding bandwidth.

The rep is absent in non-empty functions.

To quote the blog post I linked to: "[rep ret] is preferred to the simple ret either when it is the target of any kind of branch, conditional (jne/je/...) or unconditional (jmp/call/...)."
In an empty function, the ret is the direct target of a call. In a non-empty function, it is not.
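To make the distinction concrete, here is a minimal sketch in C. The function store_one is a made-up name for illustration, and what the compiler actually emits depends on its version and tuning, so treat the comments as a description of the typical case rather than a guarantee.

```c
/* Empty function: its body compiles to nothing, so the return
 * instruction is the very first instruction at the function's address.
 * A call to nothing() therefore branches straight to the ret, which is
 * the situation the quoted rule is about. */
void nothing(void) { }

/* Non-empty function (hypothetical example): a store typically precedes
 * the return, so the call lands on the store, not on the ret. The ret
 * is reached by falling through, not as the target of a branch. */
void store_one(int *p) { *p = 1; }
```

Note that a non-empty function can still end up with a ret that is a branch target, for example when a conditional jump skips the rest of the body, so the presence or absence of rep is not strictly tied to the function being empty.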

Why doesn't gcc remove everything except ret?

Some compilers may not drop the frame pointer setup code even if you specify -O2. With gcc, at least, you can explicitly tell the compiler to omit it with the -fomit-frame-pointer option.
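If you want to compare the two outputs yourself, something like the following should do. This is only a sketch: the file name empty.c, the output file names, and the function twice are assumptions made up for illustration, and the exact assembly will vary with the compiler version and target.

```c
/* empty.c (hypothetical file name)
 *
 * Compile twice and compare the generated assembly:
 *
 *   gcc -O2 -S empty.c -o with_default.s
 *   gcc -O2 -fomit-frame-pointer -S empty.c -o omit_fp.s
 *
 * On a target where gcc keeps the frame pointer by default, the first
 * listing would be expected to show the push/mov ... pop sequence and
 * the second would not.
 */

/* The empty function from the question. */
void nothing(void) { }

/* A small non-empty function (assumed example) that has no need for
 * rbp, to check whether the same prologue/epilogue shows up there. */
int twice(int x) { return 2 * x; }
```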

+3

As explained here: http://support.amd.com/us/Processor_TechDocs/25112.PDF , a two-byte near-return instruction (i.e. rep ret) is used because a single-byte return can be mispredicted on some amd64 processors in some situations, such as this one.

If you fiddle with the processor gcc is targeting, you may find that you can get it to generate a single-byte ret. -mtune=nocona worked for me.

+2

I suspect your last code is a bug, as johnfound has said. The first code is there because every C compiler must always follow the _cdecl calling convention, which inside the function means (in Intel syntax; sorry, I don't know the AT&T syntax):

Function definition:

    _functionA:
        push rbp
        mov rbp, rsp
        ; Some function
        pop rbp
        ret

In the caller:

    call _functionA
    sub esp, 0 ; if it is zero, some compilers may strip it

As for why GCC follows the _cdecl calling convention even where doing so is pointless: the compiler is no smarter than an advanced assembly programmer, so it follows _cdecl at all costs.

+1

That is because even so-called "optimizing compilers" are too dumb to always generate good machine code.

They cannot generate better code than their creators made them generate.

Since an empty function is meaningless, they probably just did not bother to optimize it, or even to detect this special case.

That said, the lone "rep" prefix is probably a bug. It does nothing when used without a string instruction, but on some newer processors it could, in theory, raise an exception (and it should).

-3

Source: https://habr.com/ru/post/1495234/

