C++ binary identification (manifest)

We have a large set of C++ projects (GCC, Linux, mostly static libraries) with many dependencies between them. We then compile an executable from these libraries and deploy the binary on the front end. It would be extremely useful to be able to identify that binary. Ideally, we would like a small script that extracts the following information from a binary file:

    $ ident binary
    $binary : Product=PRODUCT_NAME;Version=0.0.1;Build=xxx;User=xxx...
    $ dependency: Product=PRODUCT_NAME1;Version=0.1.1;Build=xxx;User=xxx...
    $ dependency: Product=PRODUCT_NAME2;Version=1.0.1;Build=xxx;User=xxx...

That is, it should display the information for the binary itself and for all of its dependencies.

Currently, our approach:

  • At compile time, for each product, we generate Manifest.h and Manifest.cpp and then link Manifest.o into the binary.

  • The ident script scans the target binary, finds the generated data in it, and prints this information.
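
The scanning step could be sketched roughly as follows. This is only an illustration under stated assumptions, not the asker's actual script: the `$manifest: ` marker, the `find_manifests` name, and the rule that a record ends at the next NUL byte or newline are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical marker that a generated Manifest.cpp would put in front of
// each record, e.g.  const char kManifest[] = "$manifest: Product=...";
const std::string kMarker = "$manifest: ";

// Return every marker-prefixed record found in the raw bytes of a binary.
// A record runs from the end of the marker to the next NUL byte or newline
// (or to the end of the input if neither follows).
std::vector<std::string> find_manifests(const std::string& bytes)
{
    assert(!kMarker.empty());  // an empty marker would never advance
    const std::string terminators("\0\n", 2);
    std::vector<std::string> records;
    std::size_t pos = 0;
    while ((pos = bytes.find(kMarker, pos)) != std::string::npos) {
        const std::size_t begin = pos + kMarker.size();
        std::size_t end = bytes.find_first_of(terminators, begin);
        if (end == std::string::npos) end = bytes.size();
        records.push_back(bytes.substr(begin, end - begin));
        pos = end;  // continue searching after this record
    }
    return records;
}
```

To use it, read the whole executable into a `std::string` (for example from a `std::ifstream` opened in binary mode) and print each returned record on its own line.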

However, this approach is not always reliable across different versions of GCC. I would like to ask the SO community: is there a better approach to solving this problem?

Thanks for any advice.

+4
3 answers

One catch with storing the data in source code (your Manifest.h and Manifest.cpp) is the size limit on literal data, which depends on the compiler.

My suggestion is to use ld. It allows you to store arbitrary binary data in your ELF file (objcopy can do this too). If you prefer to write your own solution, have a look at libbfd.

Say we have hello.cpp containing the usual C++ "Hello world" example. Now we have the following makefile (GNUmakefile):

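    # Note: recipe lines must start with a tab character in the real makefile.
    hello: hello.o hello.om
    	$(LINK.cpp) $^ $(LOADLIBES) $(LDLIBS) -o $@

    # Wrap the raw manifest bytes in a relocatable ELF object (-r).
    %.om: %.manifest
    	ld -r -b binary -o $@ $<

    %.manifest:
    	echo "$@" > $@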

What I'm doing here is separating out the link stage, because I want the manifest (after conversion to the ELF object format) to be linked into the binary as well. Since I am using pattern rules, this is one way to go; others are certainly possible, including a better naming scheme for the manifests in which they also end up as .o files and GNU make can figure out how to create them. Here I am being explicit about the recipe. So we have .om files, which are the manifests (arbitrary binary data), created from .manifest files. The recipe for .om converts the binary input into an ELF object. The recipe for the .manifest itself simply writes a string into the file.

Obviously, the hard part in your case is not storing the manifest data but generating it. And frankly, I know too little about your build system to even attempt to suggest a recipe for generating the .manifest.

Whatever you put into your .manifest file should probably be some kind of structured text that can be interpreted by the script you mention, or that can even be printed by the binary itself if you implement a command-line switch (ignoring .so files, and .so files hacked into behaving like regular executables when run from the shell).
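
As an illustration of "structured text", the `Key=Value;Key=Value` layout shown in the question can be parsed with a few lines of code. The sketch below assumes that exact layout (semicolon-separated fields, `=` between key and value, no escaping); the function name `parse_manifest` is hypothetical.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parse one record such as "Product=foo;Version=0.0.1;Build=42;User=bob"
// into key/value pairs. Fields that contain no '=' are skipped.
std::map<std::string, std::string> parse_manifest(const std::string& record)
{
    std::map<std::string, std::string> fields;
    std::istringstream in(record);
    std::string field;
    while (std::getline(in, field, ';')) {
        const std::string::size_type eq = field.find('=');
        if (eq == std::string::npos) continue;  // not a key=value pair
        fields[field.substr(0, eq)] = field.substr(eq + 1);
    }
    return fields;
}
```

The same function could serve both an external ident tool and a command-line switch inside the binary, so that the two always agree on the format.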

The makefile above does not take dependencies into account, or rather, it will not help you build the list of dependencies in any way. You can probably get GNU make to help you with that if you express your dependencies explicitly for each target (i.e. the static libraries and so on). But it may not be worth going down that route ...

If you need specific names for the symbols generated from the data (in your case, the manifest), you need to take a slightly different route and use the method described by John Ripley.

How do you access the symbols? Easy. Declare them as external (C linkage!) data and then use them:

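    #include <cstddef>  // std::ptrdiff_t
    #include <cstdio>

    // Symbols generated by ld from hello.manifest (C linkage!).
    extern "C" char _binary_hello_manifest_start;
    extern "C" char _binary_hello_manifest_end;

    int main(int argc, char** argv)
    {
        const std::ptrdiff_t len = &_binary_hello_manifest_end - &_binary_hello_manifest_start;
        // %.*s prints at most len bytes; the data need not be NUL-terminated.
        printf("Hello world: %.*s\n", (int)len, &_binary_hello_manifest_start);
        return 0;
    }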

The symbols are literally the chars/bytes. You could also declare them as char[], but that would cause problems down the road, for example in the call to printf.

The reason I calculate the size myself is that a.) I don't know whether the buffer is guaranteed to be zero-terminated and b.) I did not find any documentation on how to work with the *_size symbol.

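Side note: a * in the format specifier tells printf to read a length from the argument list and then take the next argument as the string to print. Only the precision form (%.*s) limits how many bytes are printed; the width form (%*s) merely pads the output and still expects a NUL-terminated string.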
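
If the manifest is needed as an ordinary string elsewhere in the program, one option is to copy the bytes between the two boundary symbols once. The helper below is a minimal sketch; the name `blob_to_string` is hypothetical, and stripping trailing line breaks is an assumption that fits a manifest written with echo.

```cpp
#include <cassert>
#include <string>

// Copy the bytes in [start, end) into a std::string without relying on a
// terminating NUL, then drop trailing line breaks (echo appends one).
std::string blob_to_string(const char* start, const char* end)
{
    assert(start <= end);
    std::string text(start, end);
    while (!text.empty() && (text.back() == '\n' || text.back() == '\r')) {
        text.pop_back();
    }
    return text;
}
```

With the symbols from the example above, the call would be `blob_to_string(&_binary_hello_manifest_start, &_binary_hello_manifest_end)`.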

+5

You can put any data you like into a .comment section of the output binary. You can do this with the linker after the fact, but it is probably easier to put it in your C++ code like this:

    asm (".section .comment.manifest\n\t"
         ".string \"hello, this is a comment\"\n\t"
         ".section .text");

    int main() {
      ....

Here the asm statement must be placed outside any function. This should work as long as your compiler puts ordinary functions in the .text section. If it does not, make the obvious substitution.

The linker should collect all .comment.manifest sections into one block in the final binary. You can extract them from any .o or executable file like this:

    objdump -j .comment.manifest -s example.o
+2

Have you considered using the standard packaging system for your distribution? In our company we have thousands of packages, and hundreds of them are automatically deployed every day.

We use Debian packages, which contain all the necessary information:

  • A complete changelog that includes:
    • authors;
    • version;
    • brief descriptions and timestamps of the changes.
  • Dependency information:
    • a list of all packages that must be installed for the current one to work correctly.
  • Installation scripts that set up the environment for the package.

I think you may not need to create manifests in your own way, since a ready-made solution already exists. See the Debian packaging HowTo.

0

Source: https://habr.com/ru/post/1435873/

