How exactly does gcc do the optimization?

To find out exactly how gcc does its optimization, I compiled two versions of a program with -O2, and there is a difference in the generated assembly. In my programs I want to print "hello" in a loop, with some delay between each output. The two programs are only meant to illustrate my question; I know that I could use volatile or inline asm in program 1 to achieve my goal.

Program 1

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        unsigned long i = 0;
        while (1) {
            if (++i > 0x1fffffffUL) {
                printf("hello\n");
                i = 0;
            }
        }
    }

Compiled with -O2, the generated assembly is:

    Disassembly of section .text.startup:

    00000000 <_main>:
    #include <stdio.h>
    int main(int argc, char **argv)
    {
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 e4 f0                and    $0xfffffff0,%esp
       6:   83 ec 10                sub    $0x10,%esp
       9:   e8 00 00 00 00          call   e <_main+0xe>
       e:   66 90                   xchg   %ax,%ax
      10:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
      17:   e8 00 00 00 00          call   1c <_main+0x1c>
      1c:   eb f2                   jmp    10 <_main+0x10>
      1e:   90                      nop
      1f:   90                      nop

Program 2

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        unsigned long i = 0;
        while (1) {
            if (i > 0x1fffffffUL) {
                printf("hello\n");
                i = 0;
            }
            i++;
        }
    }

Compiled with -O2, the generated assembly is:

    Disassembly of section .text.startup:

    00000000 <_main>:
    #include <stdio.h>
    int main(int argc, char **argv)
    {
       0:   55                      push   %ebp
       1:   89 e5                   mov    %esp,%ebp
       3:   83 e4 f0                and    $0xfffffff0,%esp
       6:   83 ec 10                sub    $0x10,%esp
       9:   e8 00 00 00 00          call   e <_main+0xe>
       e:   31 c0                   xor    %eax,%eax
      10:   83 c0 01                add    $0x1,%eax
      13:   3d ff ff ff 1f          cmp    $0x1fffffff,%eax
      18:   76 f6                   jbe    10 <_main+0x10>
      1a:   c7 04 24 00 00 00 00    movl   $0x0,(%esp)
        while (1) {
            if (i > 0x1fffffffUL) {
                printf("hello\n");
                i = 0;
            }
            i++;
      21:   e8 00 00 00 00          call   26 <_main+0x26>
    int main(int argc, char **argv)
    {
        unsigned long i = 0;
        while (1) {
            if (i > 0x1fffffffUL) {
      26:   31 c0                   xor    %eax,%eax
      28:   eb e6                   jmp    10 <_main+0x10>
                printf("hello\n");
      2a:   90                      nop
      2b:   90                      nop
      2c:   90                      nop
      2d:   90                      nop
      2e:   90                      nop
      2f:   90                      nop

In program 1 the increment of i is optimized away, but in program 2 it is not. Why does this happen? What rules does gcc follow when optimizing these two programs with -O2?
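For reference, the volatile variant of program 1 that I mentioned above would look roughly like this (just a sketch of that idea, not one of the two programs under discussion; marking i as volatile forces the compiler to perform every increment, so the delay loop survives -O2):

    #include <stdio.h>

    int main(int argc, char **argv)
    {
        volatile unsigned long i = 0;   /* volatile: each access must really happen */
        while (1) {
            if (++i > 0x1fffffffUL) {
                printf("hello\n");
                i = 0;
            }
        }
    }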

+5
2 answers

Asking "why" about optimizers is usually a waste of time, because there are no "rules" an optimizer works by other than the "as-if" rule: the optimizer must not change the observable behavior of conforming code.

The "observed behavior" of both of your programs is to print hi several times.

In your first program, the counting is optimized away, so the observable behavior happens faster. That is the optimizer's job. Be happy that your code is now more efficient!

In your second program, the counting is not optimized away, because somehow the optimizer (in this version of this compiler, with these settings) did not see that it could be removed. Why? Who knows (other than the people who maintain the compiler's optimizer)?

If the behavior you want is a delay between outputs, use something like thrd_sleep(). Empty counting loops were a way to add delays in C64 BASIC 2.0 programs, but they are not usable in C for the reason you just observed: you never know what the optimizer will do.
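For example, a minimal sketch using C11's thrd_sleep() (assuming your toolchain provides the optional <threads.h>; nanosleep() or a platform sleep call would work the same way):

    #include <stdio.h>
    #include <time.h>      /* struct timespec */
    #include <threads.h>   /* C11 threads; optional in C11, so check your libc */

    int main(void)
    {
        while (1) {
            printf("hello\n");
            /* Sleep one second between outputs instead of burning CPU in an
               empty counting loop that the optimizer is free to remove. */
            struct timespec delay = { .tv_sec = 1, .tv_nsec = 0 };
            thrd_sleep(&delay, NULL);
        }
    }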

+9

The branch in the if statement now depends on what happened in the previous iteration of the loop. In particular, in program 1 the compiler can easily determine that i is incremented on every iteration of the while loop (since the increment sits right at the top of it), whereas in program 2 that is not the case.
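To make the difference concrete, the two disassemblies above correspond roughly to the following C (my paraphrase of what -O2 produced, not actual compiler output):

    #include <stdio.h>

    /* Program 1 after -O2: the counting loop is gone entirely, so
       "hello" is printed back-to-back with no delay at all. */
    static void program1_as_compiled(void)
    {
        for (;;)
            printf("hello\n");
    }

    /* Program 2 after -O2: the counting loop survives as a real
       busy-wait delay before each printf. */
    static void program2_as_compiled(void)
    {
        for (;;) {
            unsigned long i = 0;
            do
                i++;
            while (i <= 0x1fffffffUL);
            printf("hello\n");
        }
    }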

In any case, compiler optimization is very complicated. See below:

gcc -O2 is shorthand for the following flags (from the documentation):

  -fauto-inc-dec -fbranch-count-reg -fcombine-stack-adjustments -fcompare-elim
  -fcprop-registers -fdce -fdefer-pop -fdelayed-branch -fdse -fforward-propagate
  -fguess-branch-probability -fif-conversion2 -fif-conversion
  -finline-functions-called-once -fipa-pure-const -fipa-profile -fipa-reference
  -fmerge-constants -fmove-loop-invariants -freorder-blocks -fshrink-wrap
  -fsplit-wide-types -fssa-backprop -fssa-phiopt -ftree-bit-ccp -ftree-ccp
  -ftree-ch -ftree-coalesce-vars -ftree-copy-prop -ftree-dce -ftree-dominator-opts
  -ftree-dse -ftree-forwprop -ftree-fre -ftree-phiprop -ftree-sink -ftree-slsr
  -ftree-sra -ftree-pta -ftree-ter -funit-at-a-time -fthread-jumps
  -falign-functions -falign-jumps -falign-loops -falign-labels -fcaller-saves
  -fcrossjumping -fcse-follow-jumps -fcse-skip-blocks -fdelete-null-pointer-checks
  -fdevirtualize -fdevirtualize-speculatively -fexpensive-optimizations -fgcse
  -fgcse-lm -fhoist-adjacent-loads -finline-small-functions -findirect-inlining
  -fipa-cp -fipa-cp-alignment -fipa-sra -fipa-icf
  -fisolate-erroneous-paths-dereference -flra-remat -foptimize-sibling-calls
  -foptimize-strlen -fpartial-inlining -fpeephole2 -freorder-blocks-algorithm=stc
  -freorder-blocks-and-partition -freorder-functions -frerun-cse-after-loop
  -fsched-interblock -fsched-spec -fschedule-insns -fschedule-insns2
  -fstrict-aliasing -fstrict-overflow -ftree-builtin-call-dce
  -ftree-switch-conversion -ftree-tail-merge -ftree-pre -ftree-vrp -fipa-ra

Each of these flags enables a different optimization that the compiler is allowed to apply.

+2

Source: https://habr.com/ru/post/1244204/

