Why are unused objects in the STATIC lib included in the final binary when the SHARED lib references them?

Summary:

Cross-references between functions in the STATIC lib and the SHARED lib cause all STATIC lib objects (even unused ones!) to be included in the final binary!

You don't see what I mean, I suppose? :-P

Sit down and read the whole story below! Names have been changed to protect the innocent. The example aims for simplicity and reproducibility.

Teaser: there is an SSCCE! (Short, Self Contained, Correct (Compilable), Example: http://www.sscce.org/)

In the beginning I had:

  • a binary (main) calling a function (func1a()) stored in a STATIC lib (libsub.a). main also has an internal function (mainsub()).

  • a STATIC lib (libsub.a) containing the NECESSARY objects, each of which has several functions used by other sources.

Linking main results in a binary containing ONLY copies of the object(s) from the STATIC lib that define the functions it references. In the example below, main contains only the functions of the shared1.o object (because main calls func1a()) and NOT the functions of shared2.o (because nothing references them).

OK!

      main.c                  libsub.a
    +-------------+         +------------+
    | main        |         | shared1.o  |
    |  func1a()   | <-----> |  func1a()  |
    |  mainsub()  |         |  func1b()  |
    +-------------+         |  ----      |
                            | shared2.o  |
                            |  func2a()  |
                            |  func2b()  |
                            +------------+

As an improvement, I wanted "external" people to be able to override the functions called in main with their own code, without having to recompile MY binary.

In any case, I provide neither the source nor my static lib.

To do this, I intended to provide skeleton source for ready-to-fill functions. (Is this called a USER EXIT?!) A SHARED/DYNAMIC lib could do this, IMHO. The functions that can be overridden are either internal to main (mainsub()) or common functions (func1a()...), and their replacements will be stored in a shared library (.so) that is added/referenced at link time.

New sources were created with the prefix "c"; they contain the "Client" versions of the "standard" functions. The switch that decides whether or not to use an overriding function is out of scope: just assume the UE (user exit) flag is always true, so the override always happens.

cmain.c is a new source containing Client_mainsub(), which could be called a "replacement" for mainsub().

cshared1.c is a new source containing Client_func1a(), which could be called a "replacement" for func1a(). In fact, every function in shared1.c can have its replacement in cshared1.c.

cshared2.c is a new source containing Client_func2a(), which could be called a "replacement" for func2a().

The overview becomes:

      main.c                      libsub.a                     clibsub.so
    +--------------------+      +---------------------+      +-------------------+
    | main               |      | shared1.o           |      | cshared1.o        |
    |  func1a() {}       |      |  func1a()           |      |  Client_func1a()  |
    |  mainsub()         | <--> |  { if UE            | <--> |  {do ur stuff }   |
    |  { if UE           |      |    Client_func1a()  |      |                   |
    |   Client_mainsub() |      |    return }         |      | cshared2.o        |
    |   return }         |      |  func1b()           |      |  Client_func2a()  |
    +--------------------+      |  -------            |     >|  {do ur stuff }   |
              ^                 | shared2.o           |    / +-------------------+
              |                 |  func2a()           |   /
              v                 |  { if UE            |  /
      cmain.c                   |    Client_func2a()  | <
    +--------------------+      |    return }         |
    | cmain              |      |  func2b()           |
    |  Client_mainsub()  |      +---------------------+
    |  {do ur stuff }    |
    +--------------------+

Here again, since main does not call func2a() or func2b(), the (STATIC) shared2.o object is not included in the binary, and there is no reference to the (SHARED) Client_func2a(). OK!

Finally, simply overriding functions was not enough (or too much!). I wanted external people to be able to call my function (or not)... but ALSO to let them do something right BEFORE and/or right AFTER my function.

Therefore, instead of stupidly replacing func2a() with Client_func2a(), we would have, roughly, in pseudo-code:

    shared2.c                     |  cshared2.c (assume UE=true)
                                  |
    func2a() {                    |  Client_func2a() {
      if UE {                     |
        Client_func2a()     ==>   |    do (or not) some stuff PRE call
                                  |
                                  |    if (DOIT) {  // activate or not standard call
                                  |      UE=false
                                  |      func2a()   // do standard stuff
                                  |      UE=true
                                  |    } else {
                                  |      do ur bespoke stuff
                                  |    }
                                  |
                                  |    do (or not) some stuff POST call
                            <==   |  }
      } else {
        do standard stuff
      }
    }

Remember that cshared2.c is provided to other people, who may (or may not) do their own thing in the provided skeleton.

(Note: setting UE to false, and back to true afterwards, in Client_func2a() avoids an endless loop when calling func2a()! ;-))
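A minimal single-file sketch of the PRE/POST pseudo-code above, so the control flow and the UE guard can be checked in isolation. It is only an illustration under assumptions: the trace buffer, log_step() and the global DOIT switch are hypothetical stand-ins for "do (or not) some stuff", and in the real setup func2a() lives in libsub.a(shared2.o) while Client_func2a() lives in libcshared.so.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical trace of which steps ran, so the call order can be checked. */
char trace[128];

static void log_step(const char *step)
{
    strncat(trace, step, sizeof trace - strlen(trace) - 1);
    strncat(trace, ";", sizeof trace - strlen(trace) - 1);
}

int UE = 1;    /* user exit enabled, as in the pseudo-code */
int DOIT = 1;  /* hypothetical client switch: also run the standard code? */

void Client_func2a(void);

/* "Vendor" side (shared2.c): divert to the client exit while UE is set. */
void func2a(void)
{
    if (UE) {
        Client_func2a();
        return;
    }
    log_step("std");               /* the standard behaviour */
}

/* "Client" side (cshared2.c): optional PRE/POST work around the standard call. */
void Client_func2a(void)
{
    log_step("pre");
    if (DOIT) {
        UE = 0;                    /* so func2a() runs its standard code... */
        func2a();
        UE = 1;                    /* ...and the exit is re-armed afterwards */
    } else {
        log_step("bespoke");
    }
    log_step("post");
    assert(UE == 1);               /* the guard must always be restored */
}
```

Calling func2a() with DOIT set leaves trace holding "pre;std;post;"; with DOIT cleared it holds "pre;bespoke;post;". It is exactly this mutual reference (func2a() calls Client_func2a(), which may call func2a()) that ties shared2.o to cshared2.o once they are split across the static and the shared library.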

Now, my problem.

In this case, the resulting binary now includes the shared2.o object, even though main never calls anything in shared2.c or cshared2.c!!!!!

After some searching, it looks like this is due to the cross-calls/cross-references:

    shared2.o  contains func2a()        that may call Client_func2a()
    cshared2.o contains Client_func2a() that may call func2a()

So why does main binary contain shared2.o?

    >dump -Tv main

    main:

            ***Loader Section***

            ***Loader Symbol Table Information***
    [Index]  Value        Scn    IMEX   Sclass  Type    IMPid             Name

    [0]      0x00000000   undef  IMP    RW      EXTref  libc.a(shr_64.o)  errno
    [1]      0x00000000   undef  IMP    DS      EXTref  libc.a(shr_64.o)  __mod_init
    [2]      0x00000000   undef  IMP    DS      EXTref  libc.a(shr_64.o)  exit
    [3]      0x00000000   undef  IMP    DS      EXTref  libc.a(shr_64.o)  printf
    [4]      0x00000000   undef  IMP    RW      EXTref  libc.a(shr_64.o)  __n_pthreads
    [5]      0x00000000   undef  IMP    RW      EXTref  libc.a(shr_64.o)  __crt0v
    [6]      0x00000000   undef  IMP    RW      EXTref  libc.a(shr_64.o)  __malloc_user_defined_name
    [7]      0x00000000   undef  IMP    DS      EXTref  libcmain.so       Client_mainsub1
    [8]      0x00000000   undef  IMP    DS      EXTref  libcshared.so     Client_func1b
    [9]      0x00000000   undef  IMP    DS      EXTref  libcshared.so     Client_func1a
    [10]     0x00000000   undef  IMP    DS      EXTref  libcshared.so     Client_func2b  <<< but why??? OK, because func2b() is referenced...
    [11]     0x00000000   undef  IMP    DS      EXTref  libcshared.so     Client_func2a  <<< but why??? OK, because func2a() is referenced...
    [12]     0x110000b50  .data  ENTpt  DS      SECdef  [noIMid]          __start
    [13]     0x110000b78  .data  EXP    DS      SECdef  [noIMid]          func1a
    [14]     0x110000b90  .data  EXP    DS      SECdef  [noIMid]          func1b
    [15]     0x110000ba8  .data  EXP    DS      SECdef  [noIMid]          func2b  <<< but why this? Not a single call is made in main???
    [16]     0x110000bc0  .data  EXP    DS      SECdef  [noIMid]          func2a  <<< but why this? Not a single call is made in main???

Please note that simply commenting out the func2a() (and func2b()) calls solves the linkage problem (by breaking the cross-reference)... but that is not an option, since I want to keep the generic lib!?

This behavior happens on AIX 7.1 with IBM XL C/C++ 12.1, but it looks the same on Linux (Red Hat 5 + GCC 5.4, with slight changes to the compilation options).

    IBM XL C/C++ for AIX, V12.1 (5765-J02, 5725-C72)
    Version: 12.01.0000.0000
    Driver Version: 12.01(C/C++) Level: 120315
    C Front End Version: 12.01(C/C++) Level: 120322
    High-Level Optimizer Version: 12.01(C/C++) and 14.01(Fortran) Level: 120315
    Low-Level Optimizer Version: 12.01(C/C++) and 14.01(Fortran) Level: 120321

So I suppose this is certainly a misunderstanding on my part. Can anyone explain?


As promised, here is the SSCCE. You can reproduce my problem by recreating/downloading the following small files and running go.sh (see the comments inside the script).

Edit1: code added to the question itself, rather than to an external site, as suggested.

main.c

    #include <stdio.h>
    #include "inc.h"

    extern void func1a(), func1b();
    void mainsub(void);

    int UEXIT(char* file, char* func)
    {
        printf(" UEXIT file=<%s> func=<%s>\n", file, func);
        return 1; /* always true for testing */
    }

    int main(void)
    {
        printf(">>> main\n");
        func1a();
        mainsub();
        printf("<<< main\n");
        return 0;
    }

    void mainsub(void)
    {
        printf(">>> mainsub\n");
        if (UEXIT("main", "mainsub")) {
            Client_mainsub1();
            return;
        }
        printf("<<< mainsub\n");
    }

cmain.c

    #include <stdio.h>
    #include "inc.h"

    void Client_mainsub1()
    {
        printf(">>>>>> Client_mainsub1\n");
        printf("<<<<<< Client_mainsub1\n");
        return;
    }

inc.h

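    extern int UEXIT(char * fileName, char * functionName);

    /* prototypes, so the cross-calls between files are not implicit declarations */
    extern void func1a(void), func1b(void), func2a(void), func2b(void);
    extern void Client_mainsub1(void);
    extern void Client_func1a(void), Client_func1b(void);
    extern void Client_func2a(void), Client_func2b(void);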

shared1.c

    #include <stdio.h>
    #include "inc.h"

    void func1a()
    {
        printf(">>>>> func1a\n");
        if (UEXIT("main", "func1a")) {
            Client_func1a();
            return;
        }
        printf("<<<<< func1a\n");
    }

    void func1b()
    {
        printf(">>>>> func1b\n");
        if (UEXIT("main", "func1b")) {
            Client_func1b();
            return;
        }
        printf("<<<<< func1b\n");
    }

shared2.c

    #include <stdio.h>
    #include "inc.h"

    void func2a()
    {
        printf(">>>>> func2a\n");
        if (UEXIT("main", "func2a")) {
            Client_func2a();
            return;
        }
        printf("<<<<< func2a\n");
    }

    void func2b()
    {
        printf(">>>>> func2b\n");
        if (UEXIT("main", "func2b")) {
            Client_func2b();
            return;
        }
        printf("<<<<< func2b\n");
    }

cshared1.c

    #include <stdio.h>
    #include "inc.h"

    void Client_func1a()
    {
        int standardFunctionCall = 0;
        printf("\t>>>> Client_func1a\n");
        if (standardFunctionCall) {
            func1a();
        }
        printf("\t<<< Client_func1a\n");
        return;
    }

    void Client_func1b()
    {
        int standardFunctionCall = 0;
        printf("\t>>>> Client_func1b\n");
        if (standardFunctionCall) {
            func1b();
        }
        printf("\t<<< Client_func1b\n");
        return;
    }

cshared2.c

    #include <stdio.h>
    #include "inc.h"

    void Client_func2a()
    {
        int standardFunctionCall = 0;
        printf("\t>>>> Client_func2a\n");
        if (standardFunctionCall) {
            func2a(); /* !!!!!! comment this to avoid crossed link with shared2.c !!!!! */
        }
        printf("\t<<< Client_func2a\n");
        return;
    }

    void Client_func2b()
    {
        int standardFunctionCall = 0;
        printf("\t>>>> Client_func2b\n");
        if (standardFunctionCall) {
            func2b(); /* !!!!!! ALSO comment this to avoid crossed link with shared2.c !!!!! */
        }
        printf("\t<<< Client_func2b\n");
        return;
    }

go.sh

    #!/bin/bash
    ## usage :
    ##   . ./go.sh
    ## so that the redefinition of LIBPATH is propagated to the calling ENV ...
    ## otherwise : "Dependent module libcshared.so could not be loaded."

    # default OBJECT_MODE to 64 bit (avoids explicitly setting -X64 options...)
    export OBJECT_MODE=64
    export LIBPATH=.:$LIBPATH

    # Compile client functions for target binary
    cc -q64 -c -o cmain.o cmain.c

    # (1) Shared lib for internal function
    cc -G -q64 -o libcmain.so cmain.o

    # Compile common functions
    cc -c shared2.c shared1.c

    # Compile client overrides of common functions
    cc -c cshared2.c cshared1.c

    # (2) Build libsub.a for common functions (STATIC)
    ar -rv libsub.a shared1.o shared2.o

    # (3) Build libcshared.so for client overrides of common functions (SHARED)
    cc -G -q64 -o libcshared.so cshared1.o cshared2.o

    # Finally build the binary using (1) (2) (3) above
    # main only calls func1a(), so it should only include object shared1
    # But in practice shared2 is also included if cshared2 references a possible call to func2() in shared2 !!!!????
    # Check this with "nm main |grep shared2" or "nm main |grep func2" or "dump -Tv main |grep func2"
    cc -q64 -o main main.c -bstatic libsub.a -bshared libcmain.so libcshared.so

    # result is the same without specifying -bstatic or -bshared
    #cc -q64 -o main2 main.c libsub.a libcmain.so libcshared.so

    # If I split libcshared.so into libcshared1.so and libcshared2.so, it is also the same:
    #cc -G -q64 -o libcshared1.so cshared1.o
    #cc -G -q64 -o libcshared2.so cshared2.o
    #cc -q64 -o main4 main.c -bstatic libsub.a -bshared libcmain.so libcshared1.so libcshared2.so

    # If I do not include libcshared2.so, the binary of course works fine, without any reference to cshared2 or shared2.
    # So why does the linker choose to add STATIC shared2.o only if libcshared2.so is listed?
    # Is there a way to avoid adding this unused code?
    #cc -q64 -o main4 main.c -bstatic libsub.a -bshared libcmain.so libcshared1.so

Edit2: added a version of the go.sh script for Red Hat, on request.

gored.sh

    ## usage :
    ##   . ./gored.sh
    ## so that the redefinition of LD_LIBRARY_PATH is propagated to the calling ENV ...
    ## otherwise : "Dependent module libcshared.so could not be loaded."
    export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH

    # Compile client functions for target binary
    gcc -fPIC -c cmain.c

    # (1) Shared lib for internal function
    gcc -shared -o libcmain.so cmain.o

    # Compile common functions
    gcc -c shared2.c shared1.c

    # Compile client overrides of common functions
    gcc -fPIC -c cshared2.c cshared1.c

    # (2) Build libsub.a for common functions (STATIC)
    ar -rv libsub.a shared1.o shared2.o

    # (3) Build libcshared.so for client overrides of common functions (SHARED)
    gcc -shared -o libcshared.so cshared1.o cshared2.o

    # Finally build the binary using (1) (2) (3) above
    # main only calls func1a(), so it should only include object shared1
    # But in practice shared2 is also included if cshared2 references a possible call to func2() in shared2 !!!!????
    # Check this with "nm main |grep shared2" or "nm main |grep func2" or "dump -Tv main |grep func2"
    gcc -o main main.c libcmain.so libcshared.so libsub.a

    # If I split libcshared.so into libcshared1.so and libcshared2.so, it is also the same:
    gcc -shared -o libcshared1.so cshared1.o
    gcc -shared -o libcshared2.so cshared2.o
    cc -o main2 main.c libcmain.so libcshared1.so libcshared2.so libsub.a

    # If I do not include libcshared2.so, the binary of course works fine, without any reference to cshared2 or shared2.
    # So why does the linker choose to add STATIC shared2.o only if libcshared2.so is listed?
    # Is there a way to avoid adding this unused code?
    cc -o main3 main.c libcmain.so libcshared1.so libsub.a

Or here are the complete files (without gored.sh) in a single .tar.bz2 (6 KB):

https://pastebin.com/KsaqacAu

Just copy/paste it into a new file (e.g. poc.uue), then enter:

 uudecode poc.uue

and you should get poc.tar.bz2.

Unzip, untar, go to the poc folder and run:

 . ./go.sh 

then

 dump -Tv main 

or, on Red Hat,

 nm main

Example result after gored.sh:

    poc>nm main |grep func2
                     U Client_func2a
                     U Client_func2b
    0000000000400924 T func2a
    000000000040095d T func2b
    poc>nm main2 |grep func2
                     U Client_func2a
                     U Client_func2b
    0000000000400934 T func2a
    000000000040096d T func2b
    poc>nm main3 |grep func2
    poc>

Edit3: ASCII ART! :-)
Here is the "visual" final state with the unused objects/links. I think the linker is wrong to include them, or at least not smart enough to detect them as unused. Perhaps this is normal, or perhaps it is possible to avoid having unused static code in the final binary. This does not look like a difficult case: the code surrounded by the "UNUSED?!" tag is not connected to anything, is it?

      main.c                      libsub.a                        clibsub.so
    +----------------------+    +---------------------+         +-------------------------+
    | main                 |    | shared1.o           |         | cshared1.o              |
    |  func1a();           |--->|  func1a() {         |<-+  +-->|  Client_func1a() {      |
    |  mainsub()           |    |    if UE {          |  |  |   |    PRE-stuff            |
    |  { if UE {           |    |      Client_func1a()|--|--+   |    if (DOIT) {          |
    |    Client_mainsub()  |    |      return         |  |      |      UE=false           |
    |    return  |         |    |    } else {         |  +------|      func1a()           |
    |    }       |         |    |      do std stuff   |         |      UE=true            |
    |  }         |         |    |    }                |         |    } else {             |
    +------------|---------+    |  }                  |         |      do bespoke stuff   |
                 |              |                     |         |    }                    |
                 |              |  func1b() {         |         |    POST-stuff           |
                 |              |    same as above    |         |  }                      |
                 |              |  }                  |         |  Client_func1b() {}     |
                 |              +---------------------+         +-------------------------+
                 |           *****U*N*U*S*E*D*?!*******U*N*U*S*E*D*?!*******U*N*U*S*E*D*?!***
                 |           *  +---------------------+         +-------------------------+ *
     cmain.so    v           U  | shared2.o           |         | cshared2.o              | U
    +--------------------+   *  |  func2a() {         |<-+  +-->|  Client_func2a() {      | *
    | cmain.o            |   N  |    if UE {          |  |  |   |    PRE-stuff            | N
    | Client_mainsub()   |   *  |      Client_func2a()|--|--+   |    if (DOIT) {          | *
    |   {do ur stuff }   |   U  |      return         |  |      |      UE=false           | U
    +--------------------+   *  |    } else {         |  +------|      func2a()           | *
                             S  |      do std stuff   |         |      UE=true            | S
                             *  |    }                |         |    } else {             | *
                             E  |  }                  |         |      do bespoke stuff   | E
                             *  |                     |         |    }                    | *
                             D  |  func2b() {         |         |    POST-stuff           | D
                             *  |    same as above    |         |  }                      | *
                             ?  |  }                  |         |  Client_func2b() {}     | ?
                             !  +---------------------+         +-------------------------+ !
                             *****U*N*U*S*E*D*?!*******U*N*U*S*E*D*?!*******U*N*U*S*E*D*?!***

Any constructive answer to put me on the right track is welcome.

Thanks.

2 answers

Here is a very simplified illustration of the linker behavior that puzzles you:

main.c

    extern void foo(void);

    int main(void)
    {
        foo();
        return 0;
    }

foo.c

    #include <stdio.h>

    void foo(void)
    {
        puts(__func__);
    }

bar.c

    #include <stdio.h>

    extern void do_bar(void);

    void bar(void)
    {
        do_bar();
    }

do_bar.c

    #include <stdio.h>

    void do_bar(void)
    {
        puts(__func__);
    }

Compile all these source files into object files:

 $ gcc -Wall -c main.c foo.c bar.c do_bar.c 

Now let's try to link a program, like this:

    $ gcc -o prog main.o foo.o bar.o
    bar.o: In function `bar':
    bar.c:(.text+0x5): undefined reference to `do_bar'

The undefined function do_bar is referenced only within the definition of bar, and bar is not referenced in the program at all. So why does the linkage fail?

Quite simply, this linkage fails because we told the linker to link bar.o into the program, so it did; and bar.o contains the definition of bar, which references do_bar, which is not defined anywhere in the linkage. bar is not referenced, but do_bar is: by bar, which is linked into the program.

By default, the linker insists that every symbol referenced in the linkage of a program is defined in the linkage. If we force it to link a definition of bar, it will insist on a definition of do_bar, because without a definition of do_bar it has not really got a definition of bar. Once it links a definition of bar, it does not second-guess whether it needed to, and then tolerate an undefined reference to do_bar if the answer is No.

The linkage failure can of course be fixed with:

    $ gcc -o prog main.o foo.o bar.o do_bar.o
    $ ./prog
    foo

Now, in this illustration, linking bar.o into the program is simply gratuitous. We can also link successfully just by not telling the linker to link bar.o:

    $ gcc -o prog main.o foo.o
    $ ./prog
    foo

bar.o and do_bar.o are both redundant for executing main, but the program can only be linked with both of them, or with neither.
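To make the rule concrete, here is a toy model (not how ld is implemented) that checks a set of object files the way the answer describes: every reference made by any linked object must be defined somewhere in the linkage, whether or not the referencing function is ever called. The struct object type, count_undefined() and the object variables are made up for illustration, and puts/libc is left out of the model.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 2

/* Toy model of an object file: symbols it defines and symbols it references. */
struct object {
    const char *name;
    const char *defs[MAX_SYMS];   /* unused slots are NULL */
    const char *refs[MAX_SYMS];
};

/* The four object files from the example above (libc is not modelled). */
const struct object main_o   = { "main.o",   { "main" },   { "foo" } };
const struct object foo_o    = { "foo.o",    { "foo" },    { NULL } };
const struct object bar_o    = { "bar.o",    { "bar" },    { "do_bar" } };
const struct object do_bar_o = { "do_bar.o", { "do_bar" }, { NULL } };

static int defined_in(const struct object *const objs[], int n, const char *sym)
{
    for (int j = 0; j < n; j++)
        for (int d = 0; d < MAX_SYMS && objs[j]->defs[d]; d++)
            if (strcmp(objs[j]->defs[d], sym) == 0)
                return 1;
    return 0;
}

/* Every reference made by ANY linked object must be defined by some linked
   object, whether or not anything calls the referencing function.
   Returns the number of undefined references, printing each one. */
int count_undefined(const struct object *const objs[], int n)
{
    int missing = 0;
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        for (int r = 0; r < MAX_SYMS && objs[i]->refs[r]; r++)
            if (!defined_in(objs, n, objs[i]->refs[r])) {
                printf("%s: undefined reference to `%s'\n",
                       objs[i]->name, objs[i]->refs[r]);
                missing++;
            }
    return missing;
}
```

With {main.o, foo.o, bar.o} it reports that bar.o has an undefined reference to do_bar and returns 1; adding do_bar.o, or dropping bar.o, brings the count back to 0, mirroring the three gcc commands above.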

But suppose foo and bar were defined in the same file?

They might be defined in the same object file, foobar.o:

    $ ld -r -o foobar.o foo.o bar.o

And then:

    $ gcc -o prog main.o foobar.o
    foobar.o: In function `bar':
    (.text+0x18): undefined reference to `do_bar'
    collect2: error: ld returned 1 exit status

Now the linker cannot link the definition of foo without linking the definition of bar. So once again we must link a definition of do_bar:

    $ gcc -o prog main.o foobar.o do_bar.o
    $ ./prog
    foo

Linked like that, prog contains definitions of foo, bar and do_bar:

    $ nm prog | grep -e foo -e bar
    000000000000065d T bar
    0000000000000669 T do_bar
    000000000000064a T foo

(T = defined function symbol.)

Equally, foo and bar might be defined in the same shared library:

    $ gcc -Wall -fPIC -c foo.c bar.c
    $ gcc -shared -o libfoobar.so foo.o bar.o

And then this linkage:

    $ gcc -o prog main.o -L. -lfoobar -Wl,-rpath=$(pwd)
    ./libfoobar.so: undefined reference to `do_bar'
    collect2: error: ld returned 1 exit status

fails just as before, and can be fixed in the same way:

    $ gcc -o prog main.o do_bar.o -L. -lfoobar -Wl,-rpath=$(pwd)
    $ ./prog
    foo

When we link the shared library libfoobar.so rather than the object file foobar.o, our prog has a different symbol table:

    $ nm prog | grep -e foo -e bar
    00000000000007aa T do_bar
                     U foo

This time, prog does not contain a definition of either foo or bar. It contains an undefined (U) reference to foo, because it calls foo, and of course that reference will now be resolved at runtime by the definition in libfoobar.so. There is not even an undefined reference to bar, nor should there be, because the program never calls bar.

But still, prog contains a definition of do_bar, which is now referenced by none of the functions in its symbol table.

This echoes your own SSCCE, but in a less convoluted way. In your case:

  • The object file libsub.a(shared2.o) is linked into the program to provide definitions of func2a and func2b.

  • Those definitions must be found and linked because they are referenced, respectively, in the definitions of Client_func2a and Client_func2b, which are defined in libcshared.so.

  • libcshared.so must be linked to provide the definition of Client_func1a.

  • A definition of Client_func1a must be found and linked because it is referenced from the definition of func1a.

  • And func1a is called by main.

This is why we see:

    $ nm main | grep func2
                     U Client_func2a
                     U Client_func2b
    00000000004009f7 T func2a
    0000000000400a30 T func2b

in the symbol table of your program.

It is not at all unusual for definitions of functions that a program never calls to be linked into it. It usually happens as we saw: the linkage, recursively resolving symbol references starting with main, finds that it needs a definition of f, which it can only get by linking some object file file.o, and along with f, file.o also links the definition of a function g that is never called.
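The chain in the bullets above can be mimicked with a toy model of archive-member selection. This is a sketch under simplifying assumptions, not how the AIX or GNU linkers are written: the inputs and their symbol lists are transcribed by hand from the SSCCE (libc symbols such as printf are omitted), libcshared.so is split into libcshared1.so and libcshared2.so as in the commented-out part of go.sh, and the loop is order-independent, which roughly matches GNU ld when the shared libraries come before libsub.a on the command line, as in gored.sh. The struct input type and all function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_SYMS 4

/* Toy model of one linker input: the symbols it defines and references. */
struct input {
    const char *name;
    int on_cmdline;               /* 1: .o/.so named explicitly; 0: archive member */
    const char *defs[MAX_SYMS];   /* unused slots are NULL */
    const char *refs[MAX_SYMS];
    int linked;
};

/* Does any already-linked input define (want_def=1) or reference (0) sym? */
static int linked_has(const struct input *f, int n, const char *sym, int want_def)
{
    for (int i = 0; i < n; i++) {
        if (!f[i].linked)
            continue;
        const char *const *list = want_def ? f[i].defs : f[i].refs;
        for (int k = 0; k < MAX_SYMS && list[k]; k++)
            if (strcmp(list[k], sym) == 0)
                return 1;
    }
    return 0;
}

/* Repeat until nothing changes: pull in any archive member that defines a
   symbol which some linked input (object OR shared library) references but
   which nothing linked defines yet. Once a member is in, its own references
   join the set, so the resolution is recursive. */
static void resolve(struct input *f, int n)
{
    for (int i = 0; i < n; i++)
        f[i].linked = f[i].on_cmdline;

    for (int changed = 1; changed; ) {
        changed = 0;
        for (int i = 0; i < n; i++) {
            if (f[i].linked)
                continue;
            for (int k = 0; k < MAX_SYMS && f[i].defs[k]; k++) {
                const char *s = f[i].defs[k];
                if (linked_has(f, n, s, 0) && !linked_has(f, n, s, 1)) {
                    f[i].linked = 1;
                    changed = 1;
                    break;
                }
            }
        }
    }
}

/* The question's SSCCE. Returns 1 if libsub.a(shared2.o) ends up linked. */
int shared2_is_linked(int with_libcshared2)
{
    struct input f[] = {
        { "main.o",              1, { "main", "mainsub", "UEXIT" }, { "func1a", "Client_mainsub1" }, 0 },
        { "libcmain.so",         1, { "Client_mainsub1" }, { NULL }, 0 },
        { "libcshared1.so",      1, { "Client_func1a", "Client_func1b" }, { "func1a", "func1b" }, 0 },
        { "libsub.a(shared1.o)", 0, { "func1a", "func1b" }, { "UEXIT", "Client_func1a", "Client_func1b" }, 0 },
        { "libsub.a(shared2.o)", 0, { "func2a", "func2b" }, { "UEXIT", "Client_func2a", "Client_func2b" }, 0 },
        { "libcshared2.so",      1, { "Client_func2a", "Client_func2b" }, { "func2a", "func2b" }, 0 },
    };
    int n = with_libcshared2 ? 6 : 5;   /* 5: leave libcshared2.so off the command line */
    assert(n <= (int)(sizeof f / sizeof f[0]));

    resolve(f, n);
    for (int i = 0; i < n; i++)
        printf("%-20s %s\n", f[i].name, f[i].linked ? "linked" : "not linked");
    return f[4].linked;
}
```

In this model, shared2_is_linked(1) reports libsub.a(shared2.o) as linked purely because libcshared2.so references func2a and func2b, while shared2_is_linked(0) leaves it out, which agrees with the main3 result shown in the question.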

What is rather odd is to end up with a program like your main, or like my last version of prog, which contains a definition of an uncalled function (e.g. do_bar) that was linked to resolve a reference from the definition of another uncalled function (e.g. bar) that is not defined in the program. Even when there are redundant function definitions, we can usually trace them back to one or more object files in the linkage where the first redundant definition is dragged in along with some necessary ones.

This oddity arises in a case like:

    $ gcc -o prog main.o do_bar.o -L. -lfoobar -Wl,-rpath=$(pwd)

because the first redundant function definition that has to be linked (bar) is provided by linking the shared library libfoobar.so, while the definition of do_bar that bar requires is not in that shared library, or in any other shared library, but in an object file.

The definition of bar provided by libfoobar.so stays there when the program is linked with that shared library; it is not physically linked into the program. That is the nature of dynamic linking. But any object file required by the linkage, whether a standalone object file like do_bar.o or one the linker extracts from an archive, like libsub.a(shared2.o), can only be physically linked into the program. So the redundant do_bar appears in the symbol table of prog, but the redundant bar, which explains why do_bar is there, does not. It is in the symbol table of libfoobar.so.

When you spot dead code in your program, you may want to make the linker smarter. It usually can be made smarter, with some extra effort. You need to ask it to garbage-collect unused sections, and before that you need to ask the compiler to pave the way by emitting separate data sections and function sections in the object files. See How to remove unused C/C++ symbols with GCC and ld? and its answer.

But this way of pruning dead code will not work in the unusual case where the dead code is linked into the program to satisfy redundant references from a shared library that the linkage requires. The linker can only recursively garbage-collect unused sections from among those it outputs to the program, and it only outputs sections that are input from object files, not from shared libraries that are to be dynamically linked.
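A toy model of section garbage collection may help show why. It is a mark-from-roots sketch under stated assumptions, not the real ld algorithm: each function sits in its own section (as -ffunction-sections arranges), the only root inside the program itself is main, and every symbol referenced by a linked shared library is treated as an additional root, because the shared library is not collected along with the program. The section numbering and gc_sections() are hypothetical.

```c
#include <assert.h>

/* One input section per function; edges[i][j] != 0 means the code in
   section i references section j. */
enum { MAIN, FUNC1A, FUNC2A, FUNC2B, NSEC };

static const int edges[NSEC][NSEC] = {
    [MAIN] = { [FUNC1A] = 1 },   /* main() calls func1a() */
    /* func1a, func2a and func2b reference no other section in this model */
};

/* Mark section s and everything reachable from it (depth-first). */
static void mark(int keep[NSEC], int s)
{
    assert(s >= 0 && s < NSEC);
    if (keep[s])
        return;
    keep[s] = 1;
    for (int t = 0; t < NSEC; t++)
        if (edges[s][t])
            mark(keep, t);
}

/* Returns a bitmask of the sections that survive garbage collection.
   dyn_roots is a bitmask of sections whose symbols are referenced by a
   shared library in the linkage; they are kept as extra roots. */
unsigned gc_sections(unsigned dyn_roots)
{
    int keep[NSEC] = { 0 };
    mark(keep, MAIN);                    /* the entry point is always a root */
    for (int s = 0; s < NSEC; s++)
        if (dyn_roots & (1u << s))
            mark(keep, s);

    unsigned kept = 0;
    for (int s = 0; s < NSEC; s++)
        if (keep[s])
            kept |= 1u << s;
    return kept;
}
```

With no shared-library roots only main and func1a survive; once libcshared.so's references to func2a and func2b are counted as roots, nothing can be discarded, which is the situation described above.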

The correct way to avoid the dead code in your main and in my prog is not to do that peculiar kind of linkage, in which a shared library contains undefined references to functions the program never calls, but which must be resolved by linking dead object code into the program.

Instead, when you build a shared library, either do not leave any undefined references in it, or leave only undefined references that are to be satisfied by its own dynamic dependencies.

So the correct way to build my libfoobar.so is:

    $ gcc -shared -o libfoobar.so foo.o bar.o do_bar.o

This gives me a shared library with the API:

    void foo(void);
    void bar(void);

for whoever wants one or both of them, and no undefined references. Then I build my program, which is a client only of foo:

    $ gcc -o prog main.o -L. -lfoobar -Wl,-rpath=$(pwd)
    $ ./prog
    foo

And it contains no dead code:

    $ nm prog | grep -e foo -e bar
                     U foo

Similarly, if you build your libcshared.so without undefined references, for example:

    $ gcc -c -fPIC shared2.c shared1.c
    $ ar -crs libsub.a shared1.o shared2.o
    $ gcc -shared -o libcshared.so cshared1.o cshared2.o -L. -lsub

and then link your program:

    $ gcc -o main main.c libcmain.so libcshared.so

it will not contain dead code either:

    $ nm main | grep func
                     U func1a

If you don't like the fact that this solution physically links libsub.a(shared1.o) and libsub.a(shared2.o) into libcshared.so, then take the other orthodox approach to linking the shared library: leave all the func* functions undefined in libcshared.so, and make libsub a shared library too, which is then a dynamic dependency of libcshared.so.


If you just want to get rid of unused functions, you may not need a shared library at all. For GCC, try this. For XL, replace -fdata-sections -ffunction-sections with -qfuncsect. An important related topic is the use of export/import lists and visibility attributes: they control whether additional symbols in your library are exported outside it or not. See here for more details.


Source: https://habr.com/ru/post/1275131/

