Compiler Code Mapping

Question

It all started with this question: an unsigned int and an unsigned char hold the same value, yet behave differently. Why?

I wrote the following application to understand what is going on behind the scenes (for example, how the compiler deals with this problem).

    #include <stdio.h>

    int main()
    {
        {
            unsigned char k=-1;
            if(k==-1)
            {
                puts("uc ok\n");
            }
        }
        {
            unsigned int k=-1;
            if(k==-1)
            {
                puts("ui ok");
            }
        }
    }

And when compiling with GCC:

    gcc -O0 -S -masm=intel h.c

I get the following assembly file:

        .file   "h.c"
        .intel_syntax noprefix
        .section    .rodata
    .LC0:
        .string "ui ok"
        .text
        .globl  main
        .type   main, @function
    main:
    .LFB0:
        .cfi_startproc
        push    rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        mov     rbp, rsp
        .cfi_def_cfa_register 6
        sub     rsp, 16
        mov     BYTE PTR [rbp-1], -1
        mov     DWORD PTR [rbp-8], -1
        cmp     DWORD PTR [rbp-8], -1
        jne     .L3
        mov     edi, OFFSET FLAT:.LC0
        call    puts
    .L3:
        leave
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
    .LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
        .section    .note.GNU-stack,"",@progbits

And to my great surprise, the first check IS NOT EVEN THERE.

However, if I compile the same code with Microsoft Visual C++ (2010), I get the following (I have cut a lot of garbage from this listing, which is why it is not entirely valid):

    00B81780  push  ebp
    00B81781  mov   ebp,esp
    00B81783  sub   esp,0D8h
    00B81789  push  ebx
    00B8178A  push  esi
    00B8178B  push  edi
    00B8178C  lea   edi,[ebp-0D8h]
    00B81792  mov   ecx,36h
    00B81797  mov   eax,0CCCCCCCCh
    00B8179C  rep stos dword ptr es:[edi]
    00B8179E  mov   byte ptr [k],0FFh
    00B817A2  movzx eax,byte ptr [k]
    00B817A6  cmp   eax,0FFFFFFFFh
    00B817A9  jne   wmain+42h (0B817C2h)
    00B817AB  mov   esi,esp
    00B817AD  push  offset string "uc ok\n" (0B857A8h)
    00B817B2  call  dword ptr [__imp__puts (0B882ACh)]
    00B817B8  add   esp,4
    00B817BB  cmp   esi,esp
    00B817BD  call  @ILT+435(__RTC_CheckEsp) (0B811B8h)
    00B817C2  mov   dword ptr [k],0FFFFFFFFh
    00B817C9  cmp   dword ptr [k],0FFFFFFFFh
    00B817CD  jne   wmain+66h (0B817E6h)
    00B817CF  mov   esi,esp
    00B817D1  push  offset string "ui ok" (0B857A0h)
    00B817D6  call  dword ptr [__imp__puts (0B882ACh)]
    00B817DC  add   esp,4
    00B817DF  cmp   esi,esp
    00B817E1  call  @ILT+435(__RTC_CheckEsp) (0B811B8h)

Question: Why is this happening? Why does GCC skip the first IF, and how can I get GCC not to skip it? Optimizations are disabled, but it looks like it is still optimizing something ...

c assembly gcc compiler-construction visual-c++

fritzone May 27 '13 at 12:00

4 answers

unwind · Answer 1 · 2013-05-27T12:05:14+0000

My guess (I'm not a GCC developer) is that it does enough static analysis to prove to itself that the first if test is never true.

This should not be too hard, since there is no code between the initialization and the test, and no side effect or external object that could change the variable.

Just out of curiosity, try making the variable static and/or volatile to see whether anything changes.

Oak · Answer 2 · 2013-05-27T12:21:55+0000

This looks like a bug in GCC, although admittedly a very minor one.

From the GCC documentation website (emphasis mine):

Without any optimization option, the compiler's goal is to reduce the cost of compilation and to make debugging produce the expected results. Statements are independent: if you stop the program with a breakpoint between statements, you can then assign a new value to any variable or change the program counter to any other statement in the function and get exactly the results you expect from the source code.

Thus, with -O0 you should be able to place a breakpoint between unsigned char k=-1; and if(k==-1), change k while stopped at that breakpoint, and expect the branch to be taken; but this is not possible with the emitted code.

Marco van de voort · Answer 3 · 2013-05-27T12:05:18+0000

Updated: My guess is that the char, being a type smaller than the base (int) type, is simply promoted to int for the comparison (assuming the compiler took the literal as an int, and that it generally prefers a word-sized integer over a byte-sized one).

And since the value is unsigned, it is zero-extended and therefore never negative (note the MOVZX instead of the sign-extending variant!), so the check was probably optimized away by basic constant propagation.

You could try to force the literal to be a byte (with a cast or a suffix), for example by comparing with ((unsigned char)(-1)); perhaps the compiler will then emit a 1-byte comparison, and the result may be different.

cmaster · Answer 4 · 2013-06-02T14:25:08+0000

There are several small points here:

The compiler does not even have to look at the initialization of k to prove that the condition k == -1 can never be true in the unsigned char case. The unsigned 8-bit value has to be promoted to 32 bits, since the right-hand side of the comparison is an integer constant, which is 32 bits by default. Since k is unsigned, the result of this promotion is 00000000 00000000 00000000 xxxxxxxx. The constant -1 has the bit pattern 11111111 11111111 11111111 11111111, so whatever xxxxxxxx is, the result of the comparison is always false.
I may be wrong on this, but I believe that even if k were declared volatile, the compiler would only have to load it into a register (since the load may cause some desired side effect in the hardware); it would not have to perform the comparison or generate code for the unreachable if block.
Actually, omitting the assembly for unreachable code is fully consistent with the -O0 goal of speeding up compilation.
AFAIK, comparing an unsigned value with a negative constant is not undefined, just surprising. There is no machine instruction that handles the mixed-sign case, and compilers do not insert code to handle it in software, as you can see from the disassembly. All you get is an implicit conversion between signed and unsigned: for the unsigned int, -1 is converted to unsigned, which wraps modulo 2^32 (this is well defined and is not an overflow), followed by a comparison of unmixed signedness.
