Selectively suggesting to the compiler which function calls to inline

Suppose I have the following code:

    struct Foo {
        void helper()     { ... }
        void fast_path()  { ...; helper(); ... }
        void slow_path1() { ...; helper(); ... }
        void slow_path2() { ...; helper(); ... }
    };

The fast_path() method is performance critical, so all (reasonable) efforts must be made to make it as fast as possible. The slow_path1() and slow_path2() methods are not performance critical.

As I see it, a typical compiler could look at this code and decide not to inline helper() if it is complex enough that inlining it would increase overall code size, since helper() is shared between several member functions. The same compiler might well inline helper() if the slow-path methods did not exist.

Given our required performance characteristics, we want the compiler to inline the helper() call inside fast_path(), but to fall back on its default behaviour for slow_path1() and slow_path2().

The workaround is to put the slow-path function definitions and the fast_path() definition in separate compilation units, so the compiler never sees that helper() is shared with fast_path(). But preserving this separation requires special care and cannot be enforced by the compiler. In addition, splitting the code across files (Foo.h, FooINLINES.cpp, and now Foo.cpp) is undesirable, and the additional compilation units complicate the build, possibly ruling out a header-only library.

Is there a better way?

Ideally, I would like a new C++ keyword, do_not_inline_function_calls_inside_me, which I could use as follows:

    do_not_inline_function_calls_inside_me void slow_path1() { ... }
    do_not_inline_function_calls_inside_me void slow_path2() { ... }

Alternatively, a keyword inline_function_calls_inside_me, for example:

  inline_function_calls_inside_me void fast_path() { ... } 

Note that these hypothetical keywords adorn the *_path*() methods, not the helper() method.

An example context in which you might have such performance requirements is a programming competition in which each participant writes an application that listens to sparse network broadcasts of data of types A and B. When a broadcast message of type B is received, each application must perform a computation that depends on the sequence of previously broadcast type-A messages and transmit the result to a central server. The first correct responder to each type-B broadcast scores a point. The nature of the computational problem may allow precomputation as type-A updates arrive; there is no advantage to handling those quickly.

+5
1 answer

Generally speaking, you should not try to be smarter than the compiler. Modern compilers do a great job of deciding which functions to inline, and humans are notoriously poor at it.

In my experience, the best thing you can do is make all the relevant functions available as inline functions in the same translation unit, so that the compiler can see their definitions and inline them as it sees fit. Beyond that, leave the final decision on whether to inline a given function to the compiler, and use "forced inline" attributes very sparingly, and only if you have evidence that they have a positive effect in your situation.

To help the compiler along, you can provide it with additional information about your program. In GCC and Clang, you can use function attributes to do this.

    struct Foo {
        void helper();
        void fast_path() __attribute__ ((hot));
        void slow_path1() __attribute__ ((cold));
        void slow_path2() __attribute__ ((cold));
    };

    inline void Foo::helper() { … }
    inline void Foo::fast_path() { … }
    inline void Foo::slow_path1() { … }
    inline void Foo::slow_path2() { … }

This tells the compiler to optimize Foo::fast_path more aggressively for speed, and Foo::slow_path1 and Foo::slow_path2 for small code size. If any of these functions calls Foo::helper, the compiler can then decide independently at each call site whether or not to inline it. (See the GCC manual for the exact effect of these attributes.)

An even better way to help the compiler is to give it actual profiling data. With GCC, you can compile your program with the -fprofile-generate option. This instruments your binary with code that collects profile statistics. Now run your program on a representative set of inputs. This produces *.gcda files containing the profile data. Then recompile with the -fprofile-use option: GCC will use the collected profile information to decide which paths in your code are hot and how they interact with each other. This technique is known as profile-guided optimization (PGO).

Of course, if you are worried about such things, first make sure you have turned on an appropriate optimization level ( -O2 ). Heavily templated C++ code in particular (i.e., almost everything that uses the standard library or Boost) can generate really ugly machine code when compiled without decent optimization. Also consider whether you want assertions compiled into your code (define -DNDEBUG to remove them).

+3

Source: https://habr.com/ru/post/1239846/

