Measuring the size of a function generated by Clang/LLVM?

Recently, while working on a project, I needed to measure the size of a C function so that I could copy it somewhere else, but I could not find any "clean" solution (ultimately, I just wanted a label inserted at the end of the function that I could reference).

Having written the LLVM backend for this architecture (although it may look like ARM, it is not) and knowing that it emits assembly code for this architecture, I settled on the following hack (I think the comment explains it pretty well):

    /***************************************************************************
     * if ENABLE_SDRAM_CALLGATE is enabled, this function should NEVER be called
     * from C code as it will corrupt the stack pointer, since it returns before
     * its epilog. this is done because clang does not provide a way to get the
     * size of the function so we insert a label with inline asm to measure the
     * function. in addition to that, it should not call any non-forceinlined
     * functions to avoid generating a PC relative branch (which would fail if
     * the function has been copied)
     **************************************************************************/
    void sdram_init_late(sdram_param_t* P) {
        /* ... */
    #ifdef ENABLE_SDRAM_CALLGATE
        asm(
            "b lr\n"
            ".globl sdram_init_late_END\n"
            "sdram_init_late_END:"
        );
    #endif
    }

It worked as desired, but it required some assembler glue code to invoke the function, and it is a rather dirty hack that worked only because I could assume a few things about the code generation process.

I also considered other ways of doing this that would work better if LLVM were emitting machine code (since this approach will break as soon as I add an MC emitter to my LLVM backend). The approach I considered involved walking the function and searching for a terminator instruction (which would be either a b lr instruction or a variation of pop ..., lr), but that could also introduce additional complications (although it seemed better than my initial solution).
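A minimal sketch of that terminator scan, assuming fixed-width 32-bit instructions and a caller-supplied predicate. The names scan_function_size, is_fake_b_lr and FAKE_B_LR are hypothetical, and the opcode value is a placeholder, not a real encoding for this architecture:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical predicate type: returns nonzero if `insn` ends a function
 * (for example a `b lr` or a `pop ..., lr` variant on the real target). */
typedef int (*is_terminator_fn)(uint32_t insn);

/* Walk at most `max_words` instructions starting at `code`.
 * Returns the size in bytes up to and including the first terminator,
 * or 0 if no terminator was found within the limit. */
size_t scan_function_size(const uint32_t *code, size_t max_words,
                          is_terminator_fn is_terminator)
{
    for (size_t i = 0; i < max_words; i++) {
        if (is_terminator(code[i]))
            return (i + 1) * sizeof(uint32_t);
    }
    return 0;
}

/* ASSUMPTION: placeholder value standing in for the encoding of `b lr`.
 * It is NOT a real opcode; substitute the target's actual encoding. */
#define FAKE_B_LR 0xDEADBEEFu

int is_fake_b_lr(uint32_t insn)
{
    return insn == FAKE_B_LR;
}
```

Note that the scan stops at the first terminator it sees, so a function with an early return in the middle of its body would be measured too short. That is one of the complications such an approach has to deal with.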

Can anyone suggest a cleaner way to get the size of a C function without resorting to incredibly ugly and unreliable hacks like the ones described above?

1 answer

I think you're right that there is no truly portable way to do this. Compilers are allowed to reorder functions, so taking the address of the next function in source order is unsafe (although it works in some cases).


If you can parse an object file (possibly with libbfd), you can get function sizes from that.

clang's asm output contains this metadata (the .size assembler directive after every function), but I'm not sure whether it ends up in the object file.

    int foo(int a) { return a * a * 2; }

    ## clang-3.8 -O3 for amd64:
    ## some debug-info lines manually removed
            .globl  foo
    foo:
    .Lfunc_begin0:
            .cfi_startproc
            imul    edi, edi
            lea     eax, [rdi + rdi]
            ret
    .Lfunc_end0:
            .size   foo, .Lfunc_end0-foo    ####### This line

Compiling this to a .o with clang-3.8 -O3 -Wall -Wextra func-size.c -c, I can do the following:

    $ readelf --symbols func-size.o

    Symbol table '.symtab' contains 4 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS func-size.c
         2: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
         3: 0000000000000000     7 FUNC    GLOBAL DEFAULT    2 foo     ### This line

The three instructions total 7 bytes, which matches the size reported here. It does not include padding to align the entry point of this function or of the following one: the .align directives are outside the two labels that are subtracted to compute .size.

This probably doesn't work well for stripped executables. Even their global functions will no longer be present in the executable's symbol table. Thus, you may need a two-step build process:

  • compile your "normal" code
  • get the function sizes you need into a table using readelf | some text processing > sizes.c
  • compile sizes.c
  • link it all together
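The "some text processing" step above could be sketched in C roughly as follows. This is only an illustration that assumes the column layout of the readelf --symbols output shown earlier; parse_func_size and format_size_entry are hypothetical helper names, not part of any tool:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of `readelf --symbols` output.
 * Returns 1 and stores the Size column in *size_out if the line describes
 * a FUNC symbol called `name`; returns 0 for any other line. */
int parse_func_size(const char *line, const char *name, long *size_out)
{
    long size;
    char type[16], bind[16], vis[16], ndx[16], sym[128];

    assert(line != NULL && name != NULL && size_out != NULL);

    /* Assumed columns: Num: Value Size Type Bind Vis Ndx Name.
     * Num and Value are skipped; %li also accepts a 0x-prefixed size. */
    int n = sscanf(line, " %*d: %*x %li %15s %15s %15s %15s %127s",
                   &size, type, bind, vis, ndx, sym);
    if (n != 6)
        return 0; /* header line, blank line, or symbol without a name */
    if (strcmp(type, "FUNC") != 0 || strcmp(sym, name) != 0)
        return 0;

    *size_out = size;
    return 1;
}

/* Format the definition that would be written into sizes.c.
 * Returns the value of snprintf (number of characters needed). */
int format_size_entry(char *buf, size_t buflen, const char *name, long size)
{
    return snprintf(buf, buflen, "const unsigned long %s_size = %ld;\n",
                    name, size);
}
```

Feeding it the foo line from the readelf output above would yield a sizes.c entry of the form const unsigned long foo_size = 7; which the rest of the program can then reference as an extern constant.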

Caveat

A truly smart compiler could compile several similar functions so that they share a common implementation, with one function jumping into the middle of another function's body. If you're lucky, all the functions are grouped together, with the "size" of each measured from its entry point to the end of the blocks of code it uses. (But this overlap would make the sizes add up to more than the file size.)

Current compilers do not do this, but you can guard against it by placing the function in a separate compilation unit and not using link-time whole-program optimization.

The compiler may also decide to put a conditionally executed block of code before the function's entry point, so that a branch can use a shorter encoding for a small displacement. This makes the block look like a static "helper" function, which probably won't be included in the "size" calculated for the function. However, current compilers do not do this either.


Another idea, which I'm not sure is safe:

Put an asm volatile containing only a label definition at the end of your function, and then assume the function's size is at most the offset of that label + 32 bytes or so. So when you copy the function, you allocate a buffer 32B larger than your "calculated" size. Hopefully there is only a "ret" insn after the label, but in practice the label probably lands before the function epilogue, which pops all the call-preserved registers the function used.

I don't think the optimizer can duplicate an asm volatile statement, so it should force the compiler to jump to a common epilogue instead of duplicating the epilogue, as it sometimes might for early-out conditions.

But I'm not sure there is an upper bound on how much code can end up after the asm volatile.
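A rough sketch of this label idea, assuming GNU C inline asm (GCC or Clang) and an ELF target. The names square_twice, square_twice_end and square_twice_size_estimate are made up for illustration, and the slack value is a guess that nothing in the toolchain guarantees:

```c
#include <stddef.h>
#include <stdint.h>

/* Label defined by the inline asm below. The __asm__ suffix pins the
 * assembler-level name so it matches the label text exactly.
 * ASSUMPTION: the compiler emits the asm statement exactly once. */
extern const char square_twice_end[] __asm__("square_twice_end");

__attribute__((noinline))
int square_twice(int a)
{
    int r = a * a * 2;
    /* Emits only a label, no instructions. Listing r as an in/out operand
     * makes the computation of r happen before the label. */
    __asm__ volatile("square_twice_end:" : "+r"(r));
    return r;
}

/* Estimated size: distance from the entry point to the label, plus
 * `slack` bytes to cover whatever epilogue follows the label. */
size_t square_twice_size_estimate(size_t slack)
{
    const char *start = (const char *)(uintptr_t)&square_twice;
    return (size_t)(square_twice_end - start) + slack;
}
```

Calling square_twice_size_estimate(32) gives the buffer size to allocate under the "+ 32 bytes" assumption. If the compiler ever inlines or clones the function, the label would be defined twice and the build would fail, which at least makes that failure mode visible.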


Source: https://habr.com/ru/post/1247315/

