Is the C++ linker smart enough to avoid linking unused libraries?

Question


I am far from fully understanding how the C++ linker works, and I have a specific question about it.

Say I have the following:

Utils.h

    namespace Utils {
        void func1();
        void func2();
    }

Utils.cpp

    #include "some_huge_lib" // needed only by func2()

    namespace Utils {
        void func1() { /* do something */ }
        void func2() { /* make use of some functions defined in some_huge_lib */ }
    }

main.cpp

    #include "Utils.h"

    int main() {
        Utils::func1();
    }

My goal is to generate the smallest possible binaries.

My question is: will some_huge_lib be included in the output object file?


c++ linker

idanshmu, Sep 08 '14 at 10:07


4 answers

It depends on what tools and switches you use to link and compile.

First, if you link some_huge_lib as a shared library, all of its code and dependencies will need to be resolved when the shared library itself is linked. So yes, it gets pulled in somewhere.

If you link some_huge_lib as an archive (a static library), then it depends. It is good practice, for the sake of anyone reading the code, to put func1 and func2 in separate source files. In that case the linker will generally be able to disregard unused object files and their dependencies.

If, however, you have both functions in the same file, some compilers need to be told to generate a separate section for each function. Some compilers do this automatically; some cannot do it at all. If you don't have that option, pulling in func1 will bring in all the code for func2 as well, and all of func2's dependencies will need to be resolved.


Tom Tanner, Sep 08 '14 at 10:42

Think of each function as a node in a graph.
Each node is associated with a piece of binary code: the compiled binary code of that node's function.
There is a link (a directed edge) between two nodes if one node's function depends on (calls) the other.

A static library is primarily a list of such nodes (plus an index).

The program's starting node is the main() function.
The linker traverses the graph from main() and links all nodes reachable from main() into the executable. That is why it is called a linker: the linking resolves the addresses of function calls within the executable.

Unused functions have no incoming links from the nodes reachable from main().
Such disconnected nodes are therefore unreachable and are not included in the final executable.

The executable (unlike the static library) is primarily a list of all nodes reachable from main() (plus an index and startup code, among other things).


Adi Shavit, Sep 08 '14 at 12:48

In addition to the other answers, I should say that linkers usually work in terms of sections, not functions.

Compilers usually let you configure whether they put all of your object code into one monolithic section or split it into a number of smaller ones. For example, the GCC options that enable splitting are -ffunction-sections (for code) and -fdata-sections (for data); the MSVC option is /Gy (for both). Conversely, -fno-function-sections, -fno-data-sections, and /Gy- put all the code or data into one section.

You can "play" with compiling your modules in both modes and then dump them (objdump for GCC, dumpbin for MSVC) to see the structure of the generated object files.

Once the compiler has formed a section, the linker treats it as a single unit. Sections define symbols and refer to symbols defined in other sections. The linker builds a dependency graph between the sections (starting from a few roots) and then either discards or keeps each section in its entirety. So if a section contains both a used and an unused function, the unused function will be kept.

Both modes have advantages and disadvantages. Turning splitting on means smaller executables, but larger object files and longer link times.

It should also be noted that in C++, unlike C, there are certain situations where the one definition rule is relaxed and multiple definitions of a function or data object are allowed (for example, inline functions). The rules are formulated so that the linker is allowed to pick any one of the definitions.

From the point of view of sections, putting inline functions together with non-inline ones would mean that, in a typical use case, the linker would usually be forced to keep virtually every definition of every inline function, which would cause excessive code bloat. Therefore such functions and data are usually placed in their own sections regardless of the compiler's command-line options.

UPDATE: As @janm correctly pointed out in his comment, the linker must also be told to discard unreferenced sections, by specifying --gc-sections (GNU) or /OPT:REF (MS).


ach, 09 Sep '14 at 1:44

Marco A. · Accepted Answer · 2014-09-08 10:15

Including or linking against large libraries usually won't make a difference unless you actually use them. Linkers should perform dead code elimination and thereby ensure that your build won't produce large binaries full of unused code. Read your compiler/linker manual to learn more; this behavior is not mandated by the C++ standard.

Including lots of headers won't increase your binary size either, although it can substantially increase compilation time (cf. precompiled headers). Some exceptions are global objects and dynamic libraries, which can't be stripped. I also recommend reading the passage (GCC only) about splitting code into multiple sections.

A last performance note: if you use a lot of position-dependent code (that is, code that can't simply be mapped to any address using relative offsets, but needs some "hotpatching" via a relocation table or similar), there will be a startup cost.
