I think you're right that there is no truly portable way to do this. Compilers are allowed to reorder functions, so taking the address of the next function in source order is not safe (though it does work in some cases).
If you can parse the object file (perhaps with libbfd), you can get function sizes from that.
clang's asm output includes this metadata (a .size assembler directive after every function), but I'm not sure whether it ends up in the object file.
    int foo(int a) { return a * a * 2; }

    ## clang-3.8 -O3 for amd64:
    ## some debug-info lines manually removed
        .globl  foo
    foo:
    .Lfunc_begin0:
        .cfi_startproc
        imul    edi, edi
        lea     eax, [rdi + rdi]
        ret
    .Lfunc_end0:
        .size   foo, .Lfunc_end0-foo    ####### This line
Compiling this to a .o with clang-3.8 -O3 -Wall -Wextra func-size.c -c, I can then do:
    $ readelf --symbols func-size.o

    Symbol table '.symtab' contains 4 entries:
       Num:    Value          Size Type    Bind   Vis      Ndx Name
         0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
         1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS func-size.c
         2: 0000000000000000     0 SECTION LOCAL  DEFAULT    2
         3: 0000000000000000     7 FUNC    GLOBAL DEFAULT    2 foo
The three instructions total 7 bytes, which matches the Size column for foo here. That size does not include padding to align the entry point or the following function: the .align directives are outside the two labels that are subtracted to compute .size.
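As a sanity check on the 7-byte figure, here is a small sketch that writes out the machine code for the three instructions and lets the compiler count it. The byte values are my own hand encoding of the listing for x86-64, not something taken from the clang output, and the names foo_code and foo_code_size are made up for illustration.

```c
#include <stddef.h>

/* Hand-encoded x86-64 machine code for the three instructions of foo.
 * These bytes are an assumption derived from the asm listing, not
 * copied from a disassembly. */
static const unsigned char foo_code[] = {
    0x0f, 0xaf, 0xff,   /* imul edi, edi         (3 bytes) */
    0x8d, 0x04, 0x3f,   /* lea  eax, [rdi + rdi] (3 bytes) */
    0xc3                /* ret                   (1 byte)  */
};

/* Number of code bytes, which should match the .size value. */
size_t foo_code_size(void)
{
    return sizeof foo_code;
}
```

3 + 3 + 1 gives the same 7 that readelf reports, with no alignment padding counted.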
This probably doesn't work well for stripped executables: even their global functions will no longer be present in the executable's symbol table. So you might need a two-step build process:
- compile your "normal" code
- get the sizes of the functions you need into a table, with readelf | some text processing > sizes.c
- compile sizes.c
- link it all together
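The "some text processing" step could be done with any tool; as a rough sketch in C, the helper below parses one row of readelf --symbols output and a second helper formats a table row for sizes.c. The function names and the { "name", size } row layout are hypothetical, and the parser assumes the column layout shown in the output above with the size printed in decimal.

```c
#include <stdio.h>
#include <string.h>

/* Parse one row of `readelf --symbols` output.
 * Returns 1 and fills name/size for FUNC rows, 0 for anything else
 * (header lines, FILE/SECTION rows, rows without a name).
 * Assumption: the Size column is decimal, as in the sample output. */
int parse_func_symbol(const char *line, char name[64], unsigned long *size)
{
    char type[16];
    char parsed_name[64];
    unsigned long parsed_size;

    /* Columns: Num: Value Size Type Bind Vis Ndx Name */
    int fields = sscanf(line, " %*d: %*s %lu %15s %*s %*s %*s %63s",
                        &parsed_size, type, parsed_name);
    if (fields != 3 || strcmp(type, "FUNC") != 0)
        return 0;

    strcpy(name, parsed_name);
    *size = parsed_size;
    return 1;
}

/* Format one initializer row for a hypothetical table in sizes.c,
 * e.g.   { "foo", 7 },
 * Returns what snprintf returns. */
int format_sizes_entry(char *out, size_t cap,
                       const char *name, unsigned long size)
{
    return snprintf(out, cap, "    { \"%s\", %lu },", name, size);
}
```

A driver would read readelf's output line by line, call parse_func_symbol on each line, and print the formatted rows between the opening and closing lines of the table definition.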
Caveat
A really clever compiler could compile several similar functions to share a common implementation, so that one function jumps into the middle of another function's body. If you're lucky, all the functions are grouped together, with the "size" of each one measured from its entry point to the end of the code blocks it uses. (But that overlap would make the sizes add up to more than the size of the file.)
Current compilers don't do this, and you can prevent it by putting the function in a separate compilation unit and not using whole-program link-time optimization.
A compiler could also decide to put a conditionally executed block of code before the function's entry point, so that a branch can use a shorter encoding for a small displacement. That would make the block look like a static "helper" function, which probably wouldn't be included in the "size" computed for the function. Current compilers don't do this either, though.
Another idea, which I'm not confident is safe:
Put an asm volatile containing only a label definition at the end of your function, and then assume the function's size is at most that offset + 32 bytes or something like that. So when you copy the function, you allocate a buffer 32B larger than your "calculated" size. Hopefully there is only a "ret" insn after the label, but in practice the label probably lands before the function epilogue that pops all the call-preserved registers the function used.
I don't think the optimizer can duplicate an asm volatile statement, so it would force the compiler to jump to a common epilogue instead of duplicating the epilogue, as it sometimes might for early-out conditions.
But I'm still not sure there is an upper bound on how much code can end up after the asm volatile.
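For concreteness, the label idea might look roughly like the sketch below. It assumes GCC or Clang targeting an ELF platform such as x86-64 Linux (other object formats may need a different symbol name), and the names foo_end_marker, EPILOGUE_SLACK and foo_size_upper_guess are all made up for illustration. The 32-byte slack is a guess, not a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical allowance for whatever the compiler emits after the label. */
#define EPILOGUE_SLACK 32

/* The label must be emitted exactly once, so forbid inlining (and, on GCC,
 * cloning), which would duplicate the asm statement and the label. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_DUPLICATE __attribute__((noinline, noclone))
#else
#define NO_DUPLICATE __attribute__((noinline))
#endif

NO_DUPLICATE int foo(int a)
{
    int result = a * a * 2;
    /* Label-only asm. The "+r" operand ties it to result, so the
     * computation has to be finished before the label is emitted. */
    __asm__ volatile(".globl foo_end_marker\n"
                     "foo_end_marker:"
                     : "+r"(result));
    return result;
}

/* The label defined inside foo, viewed from C as an address. */
extern const char foo_end_marker[];

/* Distance from foo's entry point to the label, plus the guessed slack.
 * This is a guess at an upper bound, not a measured size. */
size_t foo_size_upper_guess(void)
{
    uintptr_t start = (uintptr_t)&foo;
    uintptr_t marker = (uintptr_t)foo_end_marker;
    return (size_t)(marker - start) + EPILOGUE_SLACK;
}
```

Nothing here guarantees that the compiler keeps everything after the label within the slack, which is exactly the open question above.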