Is it possible to programmatically obtain a function signature in a shared library?

The title says it all: we can load the library with dlopen(), and so on.

But how can I get the signatures of the functions in it?

6 answers

This question cannot be answered in general. Technically, if you compiled your executable with comprehensive debugging information (the code can still be an optimized release build), the executable will contain additional sections that provide a kind of binary reflection. On *nix systems (you mentioned dlopen), this is implemented with DWARF debugging data stored in additional ELF sections. It works similarly for Mach-O binaries on Mac OS X.
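To check whether a given binary actually carries that DWARF data, you can look for its sections. Below is a minimal sketch in C. It assumes Linux with glibc's <elf.h>, a 64-bit ELF file, and the host byte order. elf_has_section is a hypothetical helper name, not a library API. It only reports whether a section such as .debug_info exists. Reading the DW_TAG_subprogram and DW_TAG_formal_parameter entries inside that section, which is where the signatures actually live, needs a real DWARF reader such as libdwarf or elfutils' libdw.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: does the 64-bit ELF file at `path` (executable or .so) contain a
   section called `name`, e.g. ".debug_info"? Assumes the file uses the host's
   byte order (not checked). Returns 1 if found, 0 if not, -1 on error. */
int elf_has_section(const char *path, const char *name)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *shdrs = NULL;
    const Elf64_Shdr *strsec;
    char *names = NULL;
    size_t shnum;
    int result = -1;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;

    /* ELF header: check the magic bytes, the 64-bit class and entry size. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        (size_t)eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;
    shnum = eh.e_shnum;

    /* Load the whole section header table. */
    shdrs = calloc(shnum, sizeof *shdrs);
    if (!shdrs || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(shdrs, sizeof *shdrs, shnum, f) != shnum)
        goto out;

    /* Load the section-name string table (.shstrtab), NUL-terminated. */
    strsec = &shdrs[eh.e_shstrndx];
    names = malloc(strsec->sh_size + 1);
    if (!names || fseek(f, (long)strsec->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strsec->sh_size, f) != strsec->sh_size)
        goto out;
    names[strsec->sh_size] = '\0';

    /* Compare each section's name with the one we are looking for. */
    result = 0;
    for (size_t i = 0; i < shnum; i++) {
        if (shdrs[i].sh_name < strsec->sh_size &&
            strcmp(names + shdrs[i].sh_name, name) == 0) {
            result = 1;
            break;
        }
    }

out:
    free(names);
    free(shdrs);
    fclose(f);
    return result;
}
```

For example, elf_has_section("libfoo.so", ".debug_info") would be expected to return 1 for a library built with -g, and typically 0 after running strip on it. Here libfoo.so is only a placeholder name.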

However, Windows PE uses a completely different format, so unfortunately DWARF is not cross-platform. (In fact, in the early stages of developing my 3D engine, I implemented an ELF/DWARF loader for Windows so that I could use a common format for the engine's various modules. So with some serious effort, it can be done.)

If you don't want to implement your own loaders or debug-information access tools, you can embed reflection information through some additional exported symbols (following some standard naming scheme). These symbols refer to a table of function names and their signatures. For C source files, writing a parser that extracts this information from the source itself is fairly trivial. C++, OTOH, is so notoriously hard to parse correctly that you need a full-fledged compiler to get everything right. GCCXML was developed for exactly this purpose: technically it is GCC, but it emits the AST in XML form instead of a binary object. The emitted XML is then much easier to parse.
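For simple cases, the C-parsing idea can be sketched as follows. This is a naive splitter for a single, already preprocessed prototype. struct proto and parse_prototype are hypothetical names. The splitter does not cope with function-pointer return types, macros, or K&R-style definitions. A real extractor would run the preprocessor first and then use a proper parser.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical result type for the naive prototype splitter below. */
struct proto {
    char ret[64];     /* return type, e.g. "char *"          */
    char name[64];    /* function name, e.g. "strdup"        */
    char params[192]; /* raw parameter list, "const char *s" */
    int nparams;      /* number of parameters (0 for "void") */
};

/* Copy [begin, end) into dst with surrounding whitespace trimmed. */
static void copy_trimmed(char *dst, size_t cap, const char *begin, const char *end)
{
    while (begin < end && isspace((unsigned char)*begin))
        begin++;
    while (end > begin && isspace((unsigned char)end[-1]))
        end--;
    size_t n = (size_t)(end - begin);
    if (n >= cap)
        n = cap - 1;
    memcpy(dst, begin, n);
    dst[n] = '\0';
}

/* Naive splitter for one simple C prototype such as "int add(int a, int b);".
   Returns 0 on success, -1 if the text does not look like a prototype. */
int parse_prototype(const char *decl, struct proto *out)
{
    const char *open = strchr(decl, '(');
    const char *close = strrchr(decl, ')');
    if (!open || !close || close < open)
        return -1;

    /* The function name is the identifier directly before the first '('. */
    const char *name_end = open;
    while (name_end > decl && isspace((unsigned char)name_end[-1]))
        name_end--;
    const char *name_begin = name_end;
    while (name_begin > decl &&
           (isalnum((unsigned char)name_begin[-1]) || name_begin[-1] == '_'))
        name_begin--;
    if (name_begin == name_end)
        return -1;

    copy_trimmed(out->ret, sizeof out->ret, decl, name_begin);
    copy_trimmed(out->name, sizeof out->name, name_begin, name_end);
    copy_trimmed(out->params, sizeof out->params, open + 1, close);

    /* Count commas outside nested parentheses; "" and "void" mean none. */
    if (out->params[0] == '\0' || strcmp(out->params, "void") == 0) {
        out->nparams = 0;
    } else {
        int depth = 0;
        out->nparams = 1;
        for (const char *p = out->params; *p; p++) {
            if (*p == '(' || *p == '[')
                depth++;
            else if (*p == ')' || *p == ']')
                depth--;
            else if (*p == ',' && depth == 0)
                out->nparams++;
        }
    }
    return 0;
}
```

Because commas are only counted at nesting depth 0, a prototype like qsort's (whose last parameter is itself a function pointer) still yields 4 parameters.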

From the extracted information, generate a source file containing a linked list/array/etc. structure that describes each function. Instead of exporting every function symbol directly, you can initialize a field in the reflection structure with the function pointer. This gives you a really nice and clean annotated export scheme. Technically, you could also put this information in a separate section of the binary, but putting it in a read-only data section does the job too.
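Here is a sketch of such a reflection table. It assumes the extraction step has already produced the signature strings. The struct layout and the names fn_info, reflect_table, reflect_count and reflect_lookup are illustrative assumptions standing in for the "standard naming scheme". add and scale are dummy library functions.

```c
#include <stddef.h>
#include <string.h>

/* Dummy library functions that we want to describe. */
int add(int a, int b) { return a + b; }
double scale(double x, double factor) { return x * factor; }

/* Hypothetical descriptor: one entry per exported function. The signature
   is plain text produced by the extraction step. */
struct fn_info {
    const char *name;
    const char *signature;
    void (*fn)(void); /* generic function pointer; cast back before calling */
};

/* Declared const so it can live in read-only data (in a shared library,
   tables holding pointers typically go to a section that is made
   read-only after relocation). Only these two symbols need exporting. */
const struct fn_info reflect_table[] = {
    { "add",   "int add(int a, int b)",                  (void (*)(void))add   },
    { "scale", "double scale(double x, double factor)",  (void (*)(void))scale },
};
const size_t reflect_count = sizeof reflect_table / sizeof reflect_table[0];

/* Find a function's descriptor by name; NULL if it is not described. */
const struct fn_info *reflect_lookup(const char *name)
{
    for (size_t i = 0; i < reflect_count; i++)
        if (strcmp(reflect_table[i].name, name) == 0)
            return &reflect_table[i];
    return NULL;
}
```

A host program that has opened the library with dlopen() would only need dlsym(handle, "reflect_table") and dlsym(handle, "reflect_count") to enumerate everything. Before calling through fn, it would cast the pointer back to the real function type recorded in signature.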


However, suppose you were handed a third-party binary: say, in the worst case, one compiled from C source with no debugging information and with all symbols that are not referenced externally stripped. Then you are pretty much out of luck. The best you can do is some binary analysis of how the function accesses the various locations in which parameters can be passed.

This will only tell you the number of parameters and the size of each parameter value, not their type or name/meaning. In reverse engineering (for example, malware analysis or security audits), determining the types and meaning of the parameters passed to functions is one of the main efforts. Recently I came across a driver that I had to reverse-engineer for debugging purposes. You wouldn't believe how amazed I was to find C++ symbols in a Linux kernel module (you can't use C++ in the Linux kernel image). I was also relieved, because the C++ name mangling gave me a lot of information.
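Here is a concrete illustration of "how the function accesses parameter locations". On x86-64 Linux (System V AMD64 ABI), the first six integer or pointer arguments arrive in rdi, rsi, rdx, rcx, r8 and r9. The toy heuristic below assumes that some disassembly pass (not shown, hypothetical) already tells you which registers a function reads before writing them. guess_int_params is a hypothetical name. The heuristic ignores floating-point (xmm) and stack arguments, and it only recognizes the 64- and 32-bit register names. As the answer says, it yields counts and sizes, not types.

```c
#include <stddef.h>
#include <string.h>

/* System V AMD64 integer/pointer argument registers, in order:
   [k][0] is the 64-bit name, [k][1] the 32-bit name. */
static const char *const sysv_int_args[6][2] = {
    { "rdi", "edi" }, { "rsi", "esi" }, { "rdx", "edx" },
    { "rcx", "ecx" }, { "r8",  "r8d" }, { "r9",  "r9d" },
};

/* Toy heuristic. `regs` lists the registers the function reads before
   writing them (input from a hypothetical disassembler). If, say, rdx is
   read, the function presumably takes at least three arguments. Fills
   widths[k] with 64 or 32 for each argument register seen (0 otherwise)
   and returns the guessed number of integer/pointer parameters. */
int guess_int_params(const char *const regs[], size_t n, int widths[6])
{
    int count = 0;
    for (int k = 0; k < 6; k++)
        widths[k] = 0;
    for (size_t i = 0; i < n; i++) {
        for (int k = 0; k < 6; k++) {
            for (int w = 0; w < 2; w++) {
                if (strcmp(regs[i], sysv_int_args[k][w]) != 0)
                    continue;
                int bits = (w == 0) ? 64 : 32;
                if (bits > widths[k])
                    widths[k] = bits;
                if (k + 1 > count)
                    count = k + 1;
            }
        }
    }
    return count;
}
```

For example, a function like int add(int a, int b) would typically read edi and esi, giving a count of 2 with two 32-bit values. Whether those are ints, enums or something else remains unknown.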


No, it is impossible. A function's signature means nothing at run time; it is part of the information used at compile time to check your program.


You can't. Either the library publishes a public API in a header, or you have to learn the signature by other means.


At the low level, a function's parameters come down to how many arguments in the stack frame you consider and how you interpret them. Therefore, once a function has been compiled into object code, its signature cannot be recovered. One remote possibility is to disassemble the code and read how it works to learn the number of parameters, but their types will still be hard or impossible to determine. In a word, it is impossible.


This information is not available. Even the debugger does not know:

    $ cat foo.c
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char* argv[])
    {
        char foo[10] = { 0 };
        char bar[10] = { 0 };
        printf("%s\n", "foo");
        memcpy(bar, foo, sizeof(foo));
        return 0;
    }
    $ gcc -g -o foo foo.c
    $ gdb foo
    Reading symbols from foo...done.
    (gdb) b main
    Breakpoint 1 at 0x4005f3: file foo.c, line 5.
    (gdb) r
    Starting program: foo

    Breakpoint 1, main (argc=1, argv=0x7fffffffe3e8) at foo.c:5
    5       {
    (gdb) ptype printf
    type = int ()
    (gdb) ptype memcpy
    type = int ()
    (gdb)

On Linux (or Mac) you can use a combination of nm and c++filt (for C++ libraries):

nm mylibrary.so | c++filt

or

nm mylibrary.a | c++filt

nm gives you the mangled names, and c++filt tries to demangle them into a more readable form. You can use some of nm's options to filter the results, especially if the library is large (or you can grep the final output to find a specific item).

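You may want the same list from code rather than from the command line. A rough C equivalent of the nm step is to walk the ELF symbol table. The sketch below assumes Linux/glibc's <elf.h> and a 64-bit ELF file in host byte order. elf_print_functions and read_at are hypothetical helpers. The sketch prints mangled names, so its output would still go through c++filt, and it gives names only, no signatures.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read `size` bytes at `offset` into a new buffer with one extra NUL byte,
   so string tables are always terminated. Caller frees. NULL on error. */
static char *read_at(FILE *f, Elf64_Off offset, size_t size)
{
    char *buf = malloc(size + 1);
    if (!buf)
        return NULL;
    if (fseek(f, (long)offset, SEEK_SET) != 0 || fread(buf, 1, size, f) != size) {
        free(buf);
        return NULL;
    }
    buf[size] = '\0';
    return buf;
}

/* Print every defined function symbol from the symbol table of type `which`
   (SHT_SYMTAB is roughly what plain nm reads, SHT_DYNSYM what nm -D reads),
   one name per line. Returns the number printed, or -1 on error. */
int elf_print_functions(const char *path, Elf64_Word which, FILE *out)
{
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    int count = -1;
    FILE *f = fopen(path, "rb");

    if (!f)
        return -1;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        (size_t)eh.e_shentsize != sizeof(Elf64_Shdr) || eh.e_shnum == 0)
        goto out;
    sh = (Elf64_Shdr *)read_at(f, eh.e_shoff, (size_t)eh.e_shnum * sizeof *sh);
    if (!sh)
        goto out;

    count = 0;
    for (size_t i = 0; i < (size_t)eh.e_shnum; i++) {
        if (sh[i].sh_type != which || sh[i].sh_link >= (Elf64_Word)eh.e_shnum ||
            sh[i].sh_entsize != sizeof(Elf64_Sym))
            continue;
        /* sh_link points at the string table holding the symbol names. */
        const Elf64_Shdr *strtab = &sh[sh[i].sh_link];
        Elf64_Sym *syms = (Elf64_Sym *)read_at(f, sh[i].sh_offset, sh[i].sh_size);
        char *names = read_at(f, strtab->sh_offset, strtab->sh_size);
        size_t nsyms = syms ? sh[i].sh_size / sizeof(Elf64_Sym) : 0;
        for (size_t j = 0; names && j < nsyms; j++) {
            /* Keep functions defined in this file; skip undefined imports. */
            if (ELF64_ST_TYPE(syms[j].st_info) == STT_FUNC &&
                syms[j].st_shndx != SHN_UNDEF &&
                syms[j].st_name < strtab->sh_size) {
                fprintf(out, "%s\n", names + syms[j].st_name);
                count++;
            }
        }
        free(syms);
        free(names);
    }
out:
    free(sh);
    fclose(f);
    return count;
}
```

For example, a small main() calling elf_print_functions("mylibrary.so", SHT_DYNSYM, stdout) would list roughly what nm -D --defined-only shows, restricted to functions. It will not read a static archive (.a) directly, since that is an ar container of object files rather than a single ELF file.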

Source: https://habr.com/ru/post/893927/

