Why not mark everything inline?

First, I'm not looking for a way to force the compiler to inline the implementation of every function.

To reduce erroneous answers, make sure you understand what the inline keyword actually means. Here is a good description of inline vs static vs extern.

So my question is: why not mark every function definition inline? I.e., ideally the only compilation unit would be main.cpp. Or perhaps a few more for the functions that cannot be defined in a header file (the pimpl idiom, etc.).

The theory behind this odd request is that it would give the optimizer maximum information to work with. It could inline function implementations, of course, but it could also do cross-module optimization, since there is only one module. Are there any other advantages?
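For concreteness, a minimal sketch of the layout I am asking about might look like this (file and function names are invented for illustration):

    // util.h -- every function definition lives in a header, marked inline
    #pragma once

    inline int square(int x) { return x * x; }
    inline int cube(int x)   { return x * square(x); }

    // main.cpp -- ideally the only compilation unit in the program
    #include <iostream>
    #include "util.h"

    int main() {
        std::cout << cube(3) << '\n';  // the optimizer sees every definition here
    }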

Has anyone tried this with a real application? Did performance go up? Down?!?

What are the disadvantages of marking every function definition inline?

  • Compilation may be slower and will consume much more memory.
  • Incremental builds break; the entire application will need to be rebuilt after every change.
  • Link time can be astronomical.

All of these drawbacks affect only the developer. What are the runtime drawbacks?

+42
c++ optimization inline
Oct 22 '10 at 18:27
11 answers

Did you really mean "#include everything"? That would give you only a single module and let the optimizer see the entire program at once.

Actually, Microsoft Visual C++ does exactly that when you use the /GL (Whole Program Optimization) switch: it does not actually compile anything until the linker runs and has access to all the code. Other compilers have similar options.
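As a hedged sketch of what this buys you: cross-module inlining without putting anything in a header (the flags are real; exact behavior varies by toolchain version, and the file names are made up):

    // foo.cpp
    int foo(int x) { return x + 1; }   // defined out-of-line, in its own module

    // main.cpp
    int foo(int x);                    // declaration only; no body visible here
    int main() { return foo(41); }     // can still be inlined at link time

    // Build with whole-program optimization:
    //   MSVC:  cl /O2 /GL foo.cpp main.cpp /link /LTCG
    //   GCC:   g++ -O2 -flto foo.cpp main.cpp
    //   Clang: clang++ -O2 -flto foo.cpp main.cpp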

+22
Oct 22 '10

sqlite uses this idea. During development it uses the traditional source structure, but for actual use there is one huge C file (112k lines). They do this for maximum optimization and claim a 5-10% performance improvement:

http://www.sqlite.org/amalgamation.html
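The same trick can be applied by hand. A minimal sketch of an amalgamation-style build, with invented file names:

    // everything.cpp -- the only file handed to the compiler; each .cpp is
    // a normal translation unit in the regular development build, but here
    // they all collapse into one unit the optimizer can see at once.
    #include "parser.cpp"
    #include "eval.cpp"
    #include "main.cpp"

    // Caveat: file-local names (statics, anonymous namespaces) from the
    // different .cpp files now share one scope and can collide.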

+15
Oct 22 '10 at 19:11

We (and some other game companies) tried doing this by making one uber-.CPP that #included all the others; it is a known technique. In our case it did not seem to affect runtime much, but the compile-time drawbacks you mention turned out to be utterly crippling. With a half-hour compile after every single change, iterating becomes impossible. (And that is with the app already divided into over a dozen different libraries.)

We tried configuring things so that we would have multiple .objs while debugging and use the uber-CPP only in release-opt builds, but then we ran into the compiler running out of memory. For a large enough app, the tools are simply not up to compiling a multimillion-line cpp file.

We also tried LTCG, and that provided a small but nice runtime win, in the rare cases where it did not simply crash during the link phase.

+8
Oct 22 '10

This is semi-related, but note that Visual C++ has the ability to do cross-module optimization, including inlining across modules. See http://msdn.microsoft.com/en-us/library/0zza0de8%28VS.80%29.aspx for information.

To answer the original question: I don't think there would be a drawback at run time, assuming the optimizer is smart enough (hence why it was added as an optimization option in Visual Studio). Just use a compiler smart enough to do it automatically, without creating all the problems you mention. :)

+7
Oct 22 '10

Interesting question! You are certainly right that all of the listed drawbacks are specific to the developer. I would suggest, however, that a disadvantaged developer is far less likely to produce a quality product. There may be no runtime drawbacks, but imagine how reluctant a developer will be to make small changes if every compile takes hours (or even days).

I would look at this from the angle of "premature optimization": modular code in multiple files makes life easier for the programmer, so there is an obvious benefit in doing it that way. Only if a specific application turned out to be too slow, and it could be shown that inlining everything made a measured improvement, would I even consider inconveniencing the developers. Even then, it would be after most of the development had been done (so it could be measured), and would probably only be done for production builds.

+7
Oct 22 '10

Little benefit
On a good compiler for a modern platform, inline will affect only very few functions. It is just a hint to the compiler; modern compilers are fairly good at making this decision themselves, and the overhead of a function call has become rather small (often, the main benefit of inlining is not to reduce call overhead but to open up further optimizations).
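To illustrate (what actually happens depends entirely on the compiler and optimization level): with optimizations on, a mainstream compiler will typically inline a small function whether or not it is marked, and may decline to inline a marked one.

    inline int big(int x) {
        // ...imagine a few hundred lines here; the compiler may well
        // decide NOT to inline this, despite the keyword
        return x * 37 % 1000;
    }

    static int tiny(int x) { return x + 1; }  // no keyword; likely inlined anyway

    int use(int x) { return tiny(big(x)); }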

Compile time
However, since inline also changes semantics, you will have to #include everything into one huge compilation unit. That usually increases compile time significantly, which is a killer in large projects.

Code size
If you move away from current desktop platforms and their high-performance compilers, things change a lot. In that case, the increased code size generated by a less clever compiler will be a problem, to the point of making the code significantly slower. On embedded platforms, code size is usually the first restriction.

Still, some projects can and do profit from "inline everything". It gives you effectively the same thing as link-time optimization, at least if your compiler does not blindly follow the inline keyword.

+3
Oct 23 '10

This has already been done in some cases. It is very similar to the idea of unity builds, and the advantages and disadvantages are not far from what you describe:

  • more opportunities for the compiler to optimize
  • link time mostly goes away (if everything is in a single translation unit, there is really nothing to link)
  • compile time changes, well, one way or the other. As you mention, incremental builds become impossible. On the other hand, a complete rebuild will be faster than it would be otherwise (since every line of code is compiled exactly once; in a regular build, code in headers ends up being compiled in every translation unit that includes the header)

But in cases where you already have a lot of header-only code (for example, if you use a lot of Boost), it can be a very worthwhile optimization, both in terms of build time and executable performance.

As always when it comes to performance, it depends. It is not a bad idea, but it is not universally applicable either.

As far as build time goes, you basically have two ways to optimize it:

  • minimize the number of translation units (so that your headers are included in fewer places), or
  • minimize the amount of code in the headers (so that the cost of including a header in multiple translation units decreases)

C code usually takes the second option, pretty much to its extreme: almost nothing except forward declarations and macros is kept in the headers. C++ often lies around the middle, which is where you get the worst possible total build time (though precompiled headers and/or incremental builds can shave some time off it again); moving further in the other direction, minimizing the number of translation units, can really do wonders for the total build time.
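A small sketch of that second option (names invented): the header carries only declarations, so including it in many translation units costs almost nothing.

    // widget.h -- declarations only: cheap to #include everywhere
    #pragma once

    class Widget;                    // forward declaration; no definition needed
    Widget* widget_create();
    int     widget_frob(Widget* w);

    // widget.cpp -- the class definition and the function bodies are
    // compiled exactly once, in this one translation unit.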

+3
Oct 24 '10

That is largely the philosophy behind Whole Program Optimization and Link Time Code Generation (LTCG): optimization opportunities are best with global knowledge.

From a practical point of view it is kind of a pain, because now every change you make requires recompiling the entire source tree. Generally speaking, you need an optimized build less often than you need to make arbitrary changes.

I tried this in the Metrowerks era (it is pretty easy to set up with a "Unity"-style build), and the compilation never finished. I mention it only to point out that this is a workflow setup that can tax the toolchain in ways it was not designed to handle.

+2
Oct 22 '10 at 18:40

This assumes that the compiler cannot optimize across functions. That is a limitation of specific compilers, not a general problem. Using it as a general solution to a specific problem can be bad: the compiler may very well bloat your program, as what could have been a single reusable function at one memory address (helping the cache) instead gets compiled separately wherever it is used (losing performance because of the cache).

Large functions in general cost the optimizer; there is a balance between the overhead of local variables and the amount of code in the function. Keeping the number of variables in a function (passed in, local, and global) within the number of disposable registers for the platform means most of them can stay in registers and never be spilled, and a stack frame may not even be required (depending on the target), so function-calling overhead drops noticeably. That is hard to do in real-world applications all the time, but the alternative, a small number of big functions with lots of local variables, means the code will spend a significant amount of time evicting and loading registers with variables to/from RAM (depending on the target).
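A toy illustration of the trade-off (actual code generation depends entirely on the target and the compiler):

    // Few live values: the arguments typically arrive in registers,
    // nothing spills, and on many targets no stack frame is emitted.
    int small_fn(int a, int b, int c) {
        return a * b + c;
    }

    // A function juggling many simultaneously-live locals runs out of
    // registers, and the generated code spends instructions spilling
    // and reloading values to/from the stack instead of computing.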

Try LLVM: it can optimize across the entire program, not just function by function. Release 2.7 had caught up with GCC's optimizer, at least for a test or two (I did not do exhaustive performance testing), and 2.8 is out, so I assume it is better. Even with only a few files, the number of tuning-knob combinations is too large to mess with. I find it best not to optimize at all until you have the whole program in one file, then perform your optimization, giving the optimizer the whole program to work with, which is basically what you are trying to do with inlining, but without the baggage.

+2
Oct 22 '10

The problem with inlining is that you want your high-performance functions to fit in cache. You might think the function-call overhead is the big performance hit, but on many architectures a cache miss will blow a couple of pushes and pops out of the water. For example, if you have a large (perhaps deep) function that needs to be called very rarely from your main high-performance path, inlining it could cause your main high-performance loop to grow to the point where it no longer fits in the L1 icache. That will slow your code down far more than the occasional function call.
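A sketch of that scenario (the attributes are GCC/Clang extensions; the principle holds for any compiler):

    // Large, rarely-taken path: keeping it out-of-line keeps the hot
    // loop below small enough to stay resident in the L1 icache.
    __attribute__((noinline, cold))
    static void report_error(long i) { /* imagine a large, deep body */ }

    long sum(const int* v, long n) {
        long s = 0;
        for (long i = 0; i < n; ++i) {
            if (v[i] < 0) report_error(i);  // the rare path stays a plain call
            s += v[i];
        }
        return s;
    }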

0
Oct 22 '10

Suppose foo() and bar() both call some helper(). If everything is in one compilation unit, the compiler may choose not to inline helper(), in order to reduce total instruction size. That causes foo() to make a non-inlined function call to helper().
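In code, the situation looks something like this (bodies invented for illustration):

    inline int helper(int x) { return x * x + 1; }  // imagine a mid-sized body

    int foo(int x) { return helper(x) + 2; }  // hot: you want helper() inlined
    int bar(int x) { return helper(x) - 2; }  // cold: inlining only adds size

    // Seeing both callers at once, the compiler may emit helper() exactly
    // once and have foo() and bar() both call it, trading foo()'s speed
    // for smaller total code size.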

The compiler does not know that a nanosecond improvement to the running time of foo() adds $100/day to your bottom line in expectation. It does not know that improving or degrading the performance of anything outside foo() has no impact on your bottom line.

Only you as the programmer know these things (after careful profiling and analysis, of course). The decision not to inline bar() is a way of telling the compiler what you know.

0
Jan 05 '16 at 2:52


