Inline speed and compiler optimization

I am doing a bit of research into the speed benefits of making a function inline. I do not have the book with me, but one text I read suggested that calling a function carries a fairly large overhead, and that whenever executable size is either negligible or can be spared, a function should be declared inline for speed.

I wrote the following code to test this theory, and from what I can tell there is no speed benefit from declaring a function inline. Both functions, when called 4294967295 times on my computer, execute in 196 seconds.

My question is: what are your thoughts on why this is happening? Is it modern compiler optimization? Or is it because no large calculations take place in the function?

Any insight into this matter would be appreciated. Thanks in advance, friends.

    #include <iostream>
    #include <cstdio>   // getchar
    #include <time.h>

    // RESEARCH                                                   Jared Thomson 2010
    ////////////////////////////////////////////////////////////////////////////////
    // Two functions that perform an identical arbitrary floating point calculation
    // one function is inline, the other is not.

    double test(double a, double b, double c);
    double inlineTest(double a, double b, double c);

    double test(double a, double b, double c){
        a = (3.1415 / 1.2345) / 4 + 5;
        b = 9.999 / a + (a * a);
        c = a *=b;
        return c;
    }

    inline
    double inlineTest(double a, double b, double c){
        a = (3.1415 / 1.2345) / 4 + 5;
        b = 9.999 / a + (a * a);
        c = a *=b;
        return c;
    }

    // ENTRY POINT                                                Jared Thomson 2010
    ////////////////////////////////////////////////////////////////////////////////
    int main(){
        const unsigned int maxUINT = -1;
        clock_t start = clock();

        //============================ NON-INLINE TEST ===============================//
        for(unsigned int i = 0; i < maxUINT; ++i)
            test(1.1,2.2,3.3);

        clock_t end = clock();
        std::cout << maxUINT << " calls to non inline function took "
                  << (end - start)/CLOCKS_PER_SEC << " seconds.\n";

        start = clock();

        //============================ INLINE TEST ===================================//
        for(unsigned int i = 0; i < maxUINT; ++i)
            test(1.1,2.2,3.3);

        end = clock();
        std::cout << maxUINT << " calls to inline function took "
                  << (end - start)/CLOCKS_PER_SEC << " seconds.\n";

        getchar(); // Wait for input.
        return 0;
    } // Main.

+3
9 answers

The inline keyword is mostly useless here. It is only a suggestion: the compiler is free to ignore it and refuse to inline the function, and it is just as free to inline a function declared without inline.

If you really want to measure the overhead of a function call, you should check the resulting assembly to make sure the function really was (or was not) inlined. I am not familiar with VC++, but it may have a compiler-specific way of forcing or preventing the inlining of a function (the standard C++ inline keyword will not do it, however).
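As a rough sketch of what such compiler-specific control can look like: MSVC provides __forceinline and __declspec(noinline), while GCC and Clang provide the always_inline and noinline attributes. The macro and function names below (FORCE_INLINE, NO_INLINE, scaleInlined, scaleCalled, sumInlined, sumCalled) are made up for this illustration, and even a "forced" inline can still be declined by the compiler in some situations, so the assembly remains the only real proof.

```cpp
// Hypothetical helper macros (not from the original post) that map to the
// compiler-specific ways of requesting or forbidding inlining.
#if defined(_MSC_VER)
#  define FORCE_INLINE __forceinline
#  define NO_INLINE    __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE inline __attribute__((always_inline))
#  define NO_INLINE    __attribute__((noinline))
#else
#  define FORCE_INLINE inline   // fallback: plain hint only
#  define NO_INLINE
#endif

// Two functions with identical bodies; only the inlining request differs.
FORCE_INLINE double scaleInlined(double x) { return x * 2.0 + 1.0; }
NO_INLINE    double scaleCalled(double x)  { return x * 2.0 + 1.0; }

// Sums the results for i = 0 .. n-1 using the function we asked to inline.
double sumInlined(unsigned n) {
    double sum = 0.0;
    for (unsigned i = 0; i < n; ++i) sum += scaleInlined(static_cast<double>(i));
    return sum;
}

// Same loop, using the function we asked the compiler not to inline.
double sumCalled(unsigned n) {
    double sum = 0.0;
    for (unsigned i = 0; i < n; ++i) sum += scaleCalled(static_cast<double>(i));
    return sum;
}
```

Both loops compute the same value; what differs, if the compiler honours the requests, is whether a call instruction appears in the generated code for the loop body.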

So I suppose the answer to the wider context of your research is: don't worry about explicit inlining. Modern compilers know when to inline and when not to, and will usually make better decisions than even very experienced programmers. That is why the inline keyword is so often ignored outright. You should not worry about explicitly forcing or preventing the inlining of a function unless you have a specific need to (because profiling your program has shown that a bottleneck could be resolved by forcing an inlining that the compiler, for some reason, did not do).

Re: assembly:

    ; 30   :    const unsigned int maxUINT = -1;
    ; 31   :    clock_t start = clock();

        mov     esi, DWORD PTR __imp__clock
        push    edi
        call    esi
        mov     edi, eax

    ; 32   :
    ; 33   :    //============================ NON-INLINE TEST ===============================//
    ; 34   :    for(unsigned int i = 0; i < maxUINT; ++i)
    ; 35   :        blank(1.1,2.2,3.3);
    ; 36   :
    ; 37   :    clock_t end = clock();

        call    esi

This assembly:

  • Reads the clock
  • Stores the clock value
  • Reads the clock again

Note what is missing: calling your function a whole bunch of times.

The compiler noticed that you are not doing anything with the result of the function and that the function has no side effects, so it is not called at all.

You can probably get it to call a function anyway by compiling with optimization turned off (in debug mode).
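An alternative to disabling optimization is to make the work observable, so that the optimizer has no licence to delete it. The following is only a sketch under assumptions: work, Timing, g_sink and timeCalls are hypothetical names, and work is a stand-in whose result depends on its arguments (unlike the test function in the question). Each result is added to a running sum that is finally stored to a volatile variable, which counts as a side effect the compiler must preserve.

```cpp
#include <chrono>

// Hypothetical stand-in for the question's test function. The result depends
// on the arguments, so it cannot be folded into a single constant.
double work(double a, double b, double c) {
    a = (a / 1.2345) / 4 + 5;
    b = 9.999 / a + (a * a) + b;
    return a * b + c;
}

struct Timing {
    double seconds;   // wall-clock duration of the loop
    double checksum;  // sum of all results, proving the work was done
};

// A store to a volatile object is an observable side effect, so the
// computation feeding it cannot be removed as dead code.
volatile double g_sink = 0.0;

Timing timeCalls(unsigned iterations) {
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (unsigned i = 0; i < iterations; ++i) {
        // The loop counter feeds the call, so each iteration differs.
        sum += work(static_cast<double>(i), 2.2, 3.3);
    }
    g_sink = sum;  // publish the result before stopping the clock
    const auto end = std::chrono::steady_clock::now();
    return {std::chrono::duration<double>(end - start).count(), sum};
}
```

Calling timeCalls with a large iteration count and printing both fields gives a timing that reflects real work. The compiler may still inline work, so the assembly is still worth checking before drawing conclusions about call overhead.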

+15

Both functions may be inlined. The definition of the non-inline function is in the same compilation unit as the point of use, so the compiler is within its rights to inline it even without your asking.

Post the assembly and we can confirm it for you.

EDIT: the MSVC compiler pragma to prevent inlining:

    #pragma auto_inline(off)
    void myFunction() {
        // ...
    }
    #pragma auto_inline(on)

+1

Two things could be happening:

  • The compiler may be inlining both functions, or neither. Check your compiler documentation to see how to control that.

  • Your function may be complex enough that the overhead of the function call is too small to make a big difference in the tests.

Inlining is great for very small functions, but it is not always better. Code bloat can keep the CPU from caching the code.

In general, inline getters, setters, and other one-liners. Then, during performance tuning, you can try inlining functions where you think you will get a boost.

+1

Um, shouldn't

    //============================ INLINE TEST ===================================//
    for(unsigned int i = 0; i < maxUINT; ++i)
        test(1.1,2.2,3.3);

be

    //============================ INLINE TEST ===================================//
    for(unsigned int i = 0; i < maxUINT; ++i)
        inlineTest(1.1,2.2,3.3);

?

But if that was just a typo, I would recommend looking at the disassembly (or Reflector) to see whether the code is really inlined or still goes through the stack with a call.

+1

The code as posted contains a couple of oddities.

1) The math and the output of your test functions are completely independent of the function parameters. If the compiler is smart enough to detect that these functions always return the same value, that may give it an incentive to optimize them away entirely, inline or not.

2) Your main function calls test for both the inline and the non-inline tests. If this is the actual code you ran, that would play a pretty large part in why you saw the same results.

As others have suggested, you would do well to examine the actual assembly code generated by the compiler to determine that you are really testing what you intended to.

+1

Sorry for the little flame ...

Compilers think in assembly language. You should too. Whatever else you do, just step through the code at the assembly level. Then you will know exactly what the compiler did.

Do not think about performance in absolute terms such as "fast" or "slow". It is all relative, a matter of percentages. The way software is made fast is by removing, in successive steps, the things that take too large a share of the time.

Here is the flame: if the compiler can do a fairly good job of inlining functions that clearly need it, and if it can do a really good job of managing registers, I think that is exactly what it should do. If it can do a reasonable job of unrolling loops that can obviously use it, I can live with that. If it goes out of its way to outsmart me by removing calls to functions that I clearly wrote and intended to be called, or scrambles my code in a sanctimonious attempt to save a JMP when that JMP takes 0.000001% of the running time (the way Fortran does), I get frankly annoyed.

There seems to be a notion in the compiler world that there is no such thing as an unhelpful optimization. No matter how smart the compiler is, real optimization is the programmer's job and nobody else's.

+1

If this test took 196 seconds for each loop, then you must not have had optimizations turned on; with optimizations off, compilers generally do not inline anything.

With optimization on, however, the compiler is free to notice that your test function can be completely evaluated at compile time and reduce it to "return [constant]". At that point it may well decide to inline both functions, since they are so trivial, and then notice that the loops are pointless because the function value is never used, and remove those too. This is basically what I got when I tried it.

So either way, you are not testing what you thought you were testing.
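To see why the function can be evaluated entirely at compile time, it can help to ask the compiler to do so explicitly. The sketch below copies the arithmetic from the question's test function and marks it constexpr, which is an addition made here for illustration and assumes C++14 or later; the names testFolded, kResult and foldedResult are hypothetical. If this compiles, the compiler has computed the result itself, with no call left at run time.

```cpp
// Same arithmetic as the question's test(). constexpr is added here only to
// make the compile-time evaluation explicit (needs C++14 or later).
// The incoming values of a, b and c are overwritten before they are read,
// so the result is the same constant for every call.
constexpr double testFolded(double a, double b, double c) {
    a = (3.1415 / 1.2345) / 4 + 5;
    b = 9.999 / a + (a * a);
    c = a *= b;
    return c;
}

// Evaluated by the compiler: kResult is a plain constant in the binary.
constexpr double kResult = testFolded(1.1, 2.2, 3.3);

// A compile-time check that the value really was computed during compilation.
static_assert(kResult > 189.0 && kResult < 189.1,
              "testFolded was evaluated at compile time");

// At run time this is just "return [constant]".
double foldedResult() { return kResult; }
```

An optimizer can reach the same conclusion for the original, non-constexpr function on its own, which is why the timed loops in the question can end up doing nothing.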


Function call overhead is not what it used to be, compared with the overhead of blowing out the level 1 instruction cache, which is what aggressive inlining does to you. You can easily find reports online of the gcc -Os option (optimize for size) being a better default choice for large projects than -O2, and a big reason for that is that -O2 inlines more aggressively. I would expect it to be much the same with MSVC.

0

The only way I know of to guarantee a function is inlined is to #define it.

For instance:

 #define RADTODEG(x) ((x) * 57.29578) 

However, the only time I would worry about something like this is on an embedded system. On a desktop or server, the performance difference is negligible.
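For comparison, here is a sketch of the same conversion written both ways. The function name radToDeg is hypothetical and not part of the answer above. The macro is substituted textually by the preprocessor, so no call can ever exist; the function version is type-checked and visible in a debugger, and for something this small an optimizing compiler will very likely inline it anyway, though that is not guaranteed.

```cpp
// The macro from the answer: expanded textually before compilation.
#define RADTODEG(x) ((x) * 57.29578)

// Hypothetical function equivalent. The inline keyword here is only a hint;
// the compiler decides whether a call is actually emitted.
inline double radToDeg(double radians) {
    return radians * 57.29578;
}

// Helper that uses both forms so the results can be compared directly.
bool sameResult(double radians) {
    return RADTODEG(radians) == radToDeg(radians);
}
```

Both forms perform the same multiplication, so the choice mostly comes down to type safety and debuggability versus the hard guarantee that no call is generated.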

0

Run it in a debugger and look at the generated code to see whether your function is always or never inlined. I find it is always a good idea to look at the assembly code when you want to learn more about the optimizations the compiler performs.

0

Source: https://habr.com/ru/post/989906/