Suppose I have the following code:
    struct Foo {
        void helper() { ... }
        void fast_path() { ...; helper(); ... }
        void slow_path1() { ...; helper(); ... }
        void slow_path2() { ...; helper(); ... }
    };
The fast_path() method is performance critical, so all (reasonable) efforts must be made to make it as fast as possible. The slow_path1() and slow_path2() methods are not performance critical.
As I understand it, a typical compiler may look at this code and decide not to inline helper() if it is complex enough, in order to reduce overall code size, since helper() is shared among several member functions. The same compiler might well inline helper() if the slow-path methods did not exist.
Given our required performance characteristics, we want the compiler to inline the helper() call inside fast_path(), but we prefer the compiler's default behavior for slow_path1() and slow_path2().
One workaround is to put the slow-path function definitions and the calls to fast_path() in separate compilation units, so that the compiler never sees helper() being shared between fast_path() and the slow paths. But preserving this separation requires special care and cannot be enforced by the compiler. In addition, the proliferation of files (Foo.h, FooINLINES.cpp, and now Foo.cpp) is undesirable, and the additional compilation units complicate the build of what might otherwise be a header-only library.
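To make the workaround concrete, here is a rough sketch of how the three files might be laid out. The split shown is my assumption of a typical arrangement, and the `int` signatures and function bodies are hypothetical placeholders (the real bodies are elided above). It is shown as a single listing with comments marking the file boundaries.

```cpp
// ---- Foo.h (assumed layout): declarations only ----
struct Foo {
    int helper(int x);
    int fast_path(int x);
    int slow_path1(int x);
    int slow_path2(int x);
};

// ---- FooINLINES.cpp (assumed to be included at the end of Foo.h) ----
// In a translation unit that only calls fast_path(), this is the only
// use of helper() the compiler sees, so inlining it looks cheap.
inline int Foo::helper(int x) { return x * 2 + 1; }      // placeholder body
inline int Foo::fast_path(int x) { return helper(x) + 1; }

// ---- Foo.cpp: slow paths, compiled separately ----
// Here helper() has several callers, so the compiler is free to keep it
// out of line to limit code size.
int Foo::slow_path1(int x) { return helper(x) + 2; }
int Foo::slow_path2(int x) { return helper(x) + 3; }
```

Nothing in this layout stops someone from later calling a slow path from the fast-path translation unit, which is the enforcement problem described above.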
Is there a better way?
Ideally, I would like a new C++ keyword, do_not_inline_function_calls_inside_me, which I could use as follows:
    do_not_inline_function_calls_inside_me void slow_path1() { ... }
    do_not_inline_function_calls_inside_me void slow_path2() { ... }
Alternatively, a keyword inline_function_calls_inside_me, for example:
    inline_function_calls_inside_me void fast_path() { ... }
Note that these hypothetical keywords adorn the *_path*() methods, not the helper() method.
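For comparison, the closest existing thing I am aware of is the non-standard `flatten` function attribute in GCC and Clang, which asks the compiler to inline, where possible, the calls made inside the function it adorns. The sketch below assumes a GCC-compatible compiler; the `int` signatures, the `state` member, and the bodies are hypothetical placeholders, and whether the attribute actually changes the generated code for a given helper() would need to be checked in the compiler output.

```cpp
// Sketch only: [[gnu::flatten]] is a GCC/Clang extension, not standard C++.
struct Foo {
    int state = 0;  // hypothetical member, just to give helper() some work

    // Placeholder body; the real helper() is elided in the question.
    int helper(int x) {
        state += x;
        return state * 2;
    }

    // The attribute adorns the caller, not helper(): calls made in this
    // body are inlined if the compiler is able to do so.
    [[gnu::flatten]] int fast_path(int x) { return helper(x) + 1; }

    // No attribute: the compiler's default inlining heuristics apply.
    int slow_path1(int x) { return helper(x) + 2; }
    int slow_path2(int x) { return helper(x) + 3; }
};
```

Because it is a vendor extension, other compilers may ignore the attribute (possibly with a warning), so it is not a portable equivalent of the keyword I have in mind.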
An example context in which such performance requirements might arise is a programming competition in which each participant writes an application that listens to sparse global broadcasts of data of types A and B. When a type-B broadcast is received, each application must perform a calculation that depends on the sequence of previously broadcast type-A messages and send the result to a central server. The first correct responder to each type-B broadcast scores a point. The nature of the computational problem may allow some precomputation on type-A updates; there is no advantage to processing those quickly.