Optimization Switches - What Do They Really Do?

Probably everyone uses some kind of optimization switches (in the case of gcc, the most common is -O2, I think).

But what does gcc (and other compilers like VS, Clang) really do in the presence of such parameters?

Of course, there is no definite answer, since it very much depends on the platform, version of the compiler, etc. However, if possible, I would like to put together a set of β€œrules of thumb”. When should I think of some tricks to speed up the code, and when should I just leave the task to the compiler?

For example, how far will the compiler go (the examples are a little artificial...) at different optimization levels:

1) sin(3.141592) // will it be evaluated at compile time or should I think of a lookup table to speed up the calculations?

2) int a = 0; a = exp(18), cos(1.57), 2; // will the compiler calculate exp and cos, although this is not necessary, since the value of the expression is 2?

3)

 for (size_t i = 0; i < 10; ++i) { int a = 10 + i; } 

// will the compiler skip the whole loop because it has no visible side effects?

Perhaps you can come up with other examples.

+4
3 answers

If you want to know what the compiler does, it is best to look at the compiler's documentation. For optimizations you can, for example, look through the LLVM Analysis and Transform Passes.

1) sin(3.141592) // will this be evaluated at compile time?

Maybe. IEEE floating-point calculations have very precise semantics, so the compiler can evaluate this at compile time and still produce exactly the right bits. This may be surprising if you change the processor's floating-point flags at run time, by the way.

2) int a = 0; a = exp(18), cos(1.57), 2;

It depends:

  • whether exp and cos are inlined or not
  • if they are not, whether they are properly annotated (so the compiler knows they have no side effects).

Functions taken from your standard C or C++ library should be correctly recognized/annotated.

Regarding the elimination of the calculation:

  • -adce : Aggressive Dead Code Elimination
  • -dce : Dead Code Elimination
  • -die : Dead Instruction Elimination
  • -dse : Dead Store Elimination

compilers love finding code that is useless :)

3)

Similar to 2): the result of the store is not used, and the expression has no side effects.

  • -loop-deletion : Delete dead loops

And for the finale: why not put the compiler to the test?

 #include <math.h>
 #include <stdio.h>

 int main(int argc, char* argv[]) {
     double d = sin(3.141592);
     printf("%f", d);

     int a = 0;
     a = (exp(18), cos(1.57), 2); /* need parentheses here */
     printf("%d", a);

     for (size_t i = 0; i < 10; ++i) { int a = 10 + i; }

     return 0;
 }

Clang tries to be helpful already at compile time:

 12814_0.c:8:28: warning: expression result unused [-Wunused-value]
     int a = 0; a = (exp(18), cos(1.57), 2);
                     ^~~              ~~~~
 12814_0.c:12:9: warning: unused variable 'a' [-Wunused-variable]
         int a = 10 + i;
             ^

And the emitted code (LLVM IR):

 @.str = private unnamed_addr constant [3 x i8] c"%f\00", align 1
 @.str1 = private unnamed_addr constant [3 x i8] c"%d\00", align 1

 define i32 @main(i32 %argc, i8** nocapture %argv) nounwind uwtable {
   %1 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([3 x i8]* @.str, i64 0, i64 0), double 0x3EA5EE4B2791A46F) nounwind
   %2 = tail call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([3 x i8]* @.str1, i64 0, i64 0), i32 2) nounwind
   ret i32 0
 }

Notice that:

  • as predicted, the sin computation was resolved at compile time
  • as predicted, exp and cos were eliminated
  • as expected, the loop is gone as well.

If you want to delve into compiler optimization, I would advise you:

  • learn to read IR (it's really quite simple, much simpler than assembly)
  • use the LLVM Try Out page to test your assumptions.
+6

The compiler has several optimization passes. Each pass is responsible for a number of small optimizations. For example, you might have a pass that folds arithmetic expressions at compile time (so you can express 5 MB as 5 * (1024 * 1024) without a penalty). Another pass inlines functions. Another searches for unreachable code and kills it. And so on.

Then the compiler developers decide which of these passes they want to execute in which order. For example, suppose you have this code:

 int foo(int a, int b) {
     return a + b;
 }

 void bar() {
     if (foo(1, 2) > 5)
         std::cout << "foo is large\n";
 }

If you run dead-code elimination on this code, nothing happens. Similarly, if you perform constant folding, nothing happens. But the inliner can decide that foo is small enough to inline, so it replaces the call in bar with the body of the function, substituting the arguments:

 void bar() {
     if (1 + 2 > 5)
         std::cout << "foo is large\n";
 }

If you now perform constant folding, it will first decide that 1 + 2 is 3, and then decide that 3 > 5 is false. So you get:

 void bar() {
     if (false)
         std::cout << "foo is large\n";
 }

And now dead-code elimination will see the if (false) and kill it, so the result is:

 void bar() {
 }

But now bar is suddenly very tiny, when before it was bigger and more complex. So if you run the inliner again, it will be able to inline bar into its callers. That may expose even more room for optimization, and so on.

For compiler developers, this is a trade-off between compilation time and generated code quality. They decide which optimizers to run, and in what order, based on heuristics, testing, and experience. But since one size does not fit all, they expose some knobs for tuning. The main knob for gcc and clang is the -O option family: -O1 runs a short list of optimizers; -O3 runs a much longer list containing more expensive optimizers, and passes are repeated more often.

Besides determining which optimizers run, the options also tune the internal heuristics used by the various passes. The inliner, for example, typically has many parameters that determine when to inline a function. Pass -O3 and those parameters lean toward inlining whenever there is a chance of a performance gain; pass -Os and only tiny functions (or functions provably called exactly once) get inlined, since anything else would grow the executable.

+1

Compilers do all kinds of optimizations that you wouldn't even think of. Especially C++ compilers.

They do things such as unrolling loops, inlining functions, eliminating dead code, replacing several instructions with just one, and so on.

I can give one piece of advice: with C/C++ compilers, you can trust them to perform many optimizations.

Take a look at [1].

[1] http://en.wikipedia.org/wiki/Compiler_optimization

0

Source: https://habr.com/ru/post/1432924/
