Reducing the footprint of debugging symbols (executable bloated to 4 GB)

The main problem is that my built executable is 4 GB in size with debugging symbols turned on (75 MB to 300 MB without debugging symbols, depending on the optimization level). How can I diagnose where all these symbols come from, and which ones are the biggest offenders? I found a few questions about reducing the size of the executable itself (although none covered it in much depth), but here I am mainly concerned with reducing the mess of debugging symbols. The executable is so large that it takes a long time to load all the symbols, which makes debugging difficult. Perhaps reducing code bloat is the fundamental task, but first I would like to know where my 4 GB is being spent.

Running size --format=SysV on the executable, I get the following output:

 section                     size       addr
 .interp                       28    4194872
 .note.ABI-tag                 32    4194900
 .note.gnu.build-id            36    4194932
 .gnu.hash                 714296    4194968
 .dynsym                  2728248    4909264
 .dynstr                 13214041    7637512
 .gnu.version              227354   20851554
 .gnu.version_r               528   21078912
 .rela.dyn                  37680   21079440
 .rela.plt                  15264   21117120
 .init                         26   21132384
 .plt                       10192   21132416
 .text                   25749232   21142608
 .fini                          9   46891840
 .rodata                  3089441   46891872
 .eh_frame_hdr             584228   49981316
 .eh_frame                2574372   50565544
 .gcc_except_table        1514577   53139916
 .init_array                 2152   56753888
 .fini_array                    8   56756040
 .jcr                           8   56756048
 .data.rel.ro              332264   56756064
 .dynamic                     992   57088328
 .got                         704   57089320
 .got.plt                    5112   57090048
 .data                      22720   57095168
 .bss                     1317872   57117888
 .comment                      44          0
 .debug_aranges           2978704          0
 .debug_info            278337429          0
 .debug_abbrev            1557345          0
 .debug_line             13416850          0
 .debug_str            3620467085          0
 .debug_loc             236168202          0
 .debug_ranges           37473728          0
 Total                 4242540803

from which, I think, we can see that ".debug_str" takes up about 3.6 GB. I don't know for certain what ".debug_str" is, but I assume it holds the literal string names of the debug symbols? Does this tell me that the mangled names of my symbols are just insanely long? How can I determine which ones to fix?

I think I could do something with "nm" by inspecting the symbol names directly, but the output is huge, and I'm not sure how best to search through it. Are there any tools for this kind of analysis?

The compiler used was "c++ (GCC) 4.9.2". I should also mention that I am working in a Linux environment.

+5
3 answers

One trick I use is to run strings on the executable, which will print all of those long (probably due to templates) and numerous (ditto) debugging symbol names. You can pipe the output to sort | uniq -c | sort -n and look at the results. In many large C++ executables, you will see patterns like these:

 my_template<std::basic_string<char, traits, allocator>, std::unordered_map<std::basic_string<char, traits, allocator>, 1L>
 my_template<std::basic_string<char, traits, allocator>, std::unordered_map<std::basic_string<char, traits, allocator>, 2L>
 my_template<std::basic_string<char, traits, allocator>, std::unordered_map<std::basic_string<char, traits, allocator>, 3L>

You get the idea.
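As a rough sketch of that pipeline (the function name rank_strings and the file name my_executable are placeholders, not something from the original setup), the count can also be weighted by string length, so that a handful of enormous names is not buried under many short ones:

```shell
# Read strings on stdin; print "count total_bytes string" for each distinct
# string, with the largest total first. total_bytes counts one extra byte
# per copy for the terminator.
rank_strings() {
    sort | uniq -c | LC_ALL=C awk '{
        count = $1
        sub(/^ *[0-9]+ /, "")      # strip the count that uniq -c prepends
        print count, count * (length($0) + 1), $0
    }' | sort -k2,2nr
}

# Usage (my_executable is a placeholder):
#   strings my_executable | rank_strings | head -n 50
```

The first lines of the output are then the strings that cost the most bytes overall, whether because they are long, frequent, or both.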

In some cases, I decided to simply reduce the number of templates. Sometimes it gets out of control. In other cases, you can gain something by using explicit template instantiation, by compiling certain parts of your project without debugging symbols, or even by disabling RTTI if you do not rely on dynamic_cast or typeid.

+1

I think I can somehow do something with "nm" by directly checking the symbol names, but the result is huge, and I'm not sure how to look for it better. Are there any tools for such an analysis?

You can do the following to sort all the symbols reported by nm by the length of the symbol name:

 nm --no-demangle -a -P --size-sort myexecutable \
   | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2-

(Kudos to "Sort a text file by line length including spaces" for everything after the first |.) This will show the longest names last. You can then pipe the output into c++filt -t to get the demangled names, which may help you in your search.

Depending on your situation, it may be useful to split the executable and its debugging symbols into separate files, which lets you distribute the less bloated executable to target environments, clients, etc. and keep the debugging symbols in one place for when they are needed. See "How to generate gcc debug symbol outside the build target?" for some details.
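A minimal sketch of that split, assuming GNU binutils (objcopy and strip) are installed; the function name split_debug and the ".debug" suffix are just conventions chosen for this example:

```shell
# Move the debug sections of an ELF executable into a side file.
# $1 = path to the executable; writes "$1.debug" next to it.
split_debug() {
    exe=$1
    # 1. copy only the debug sections into the side file
    objcopy --only-keep-debug "$exe" "$exe.debug" &&
    # 2. remove the debug sections from the executable itself
    strip --strip-debug "$exe" &&
    # 3. record the side file name in the executable so a debugger can find it
    objcopy --add-gnu-debuglink="$exe.debug" "$exe"
}

# Usage (myexecutable is a placeholder):
#   split_debug ./myexecutable
```

With the .debug file kept next to the executable, gdb should pick it up through the debug link. Note that this only makes the shipped binary smaller; the total amount of debug information stays the same.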

+1

So, I tracked down the main culprit by doing the following, based mostly on John Zwinck's answer. Essentially, I just followed his suggestion to run strings on the executable and analyze the output.

 strings my_executable > exec_strings.txt

Then I sorted the result, mostly following mindriot's method:

 cat exec_strings.txt | awk '{ print length, $0 }' | sort -n -s | cut -d" " -f2- > exec_strings_sorted.txt 

and looked at the longest lines. It all looked like crazy template bloat coming from one specific library. Then I dug a little further by counting:

 cat exec_strings.txt | wc -l
 2928189
 cat exec_strings.txt | grep <culprit_libname> | wc -l
 1108426

to see that, of the roughly 3 million lines that were extracted, about 1 million came from this library. Finally, running

 cat exec_strings.txt | wc -c
 3659369876
 cat exec_strings.txt | grep <culprit_libname> | wc -c
 3601918899

it became apparent that those million lines are very long and make up most of the debug symbol bloat. So, at least for now, I can focus on this library and try to get at the root of the problem.
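For what it's worth, the two counts can be combined into one pass. The following is only a sketch: the function name pattern_share is made up for this example, and unlike grep it matches its argument as a literal substring rather than a regular expression. With the figures above, the library accounts for roughly 98% of the bytes.

```shell
# Report how many lines and bytes of a strings dump contain a given substring.
# $1 = literal substring (for example, the namespace of the suspected library)
# $2 = file produced by: strings my_executable > exec_strings.txt
pattern_share() {
    LC_ALL=C PAT="$1" awk '
        { total += length($0) + 1 }            # +1 for the newline, as wc -c counts it
        index($0, ENVIRON["PAT"]) {            # literal match, not a regex
            lines++
            bytes += length($0) + 1
        }
        END {
            if (total > 0)
                printf "%d lines, %.0f bytes, %.1f%% of all bytes\n",
                       lines, bytes, 100 * bytes / total
        }
    ' "$2"
}

# Usage (culprit_libname is a placeholder):
#   pattern_share culprit_libname exec_strings.txt
```

Running it for a few candidate namespaces makes it easy to compare their share of the total.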

+1

Source: https://habr.com/ru/post/1258753/

