Given the return address, how to get the function address?

Question

Suppose that in a piece of C code I have a function foo that calls bar. Inside bar, I can use assembly to obtain the address that bar will return to. How can I use this information to determine the address of foo?

One approach would be to obtain the return address that foo will return to, and get the address from the opcode of the call instruction that calls foo. However, this requires knowing which flavor of call (e.g. offset / absolute) is used, and is therefore unreliable. Is there a simpler way to determine the caller's address?

edit: I forgot to mention that this question is about IA32 assembly on 32-bit Unix-like machines.

c assembly x86

user2467539 Jun 09 '13 at 3:35

3 answers

Ejp · Answer 1 · 2013-06-09T03:53:10+0000

One approach would be to obtain the return address that foo will return to, and get the address from the opcode of the call instruction that calls foo.

Eh? That will give you the address of bar, not foo.

All you need is the highest procedure entry point that is less than the return address.
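
The lookup described above can be sketched as a binary search over a sorted table of function entry points. This is only an illustration: the table `entries` and the helper name `find_entry` are hypothetical, and in practice the entry points would have to come from somewhere such as a symbol table or a linker map.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: `entries` holds function entry addresses sorted in
 * ascending order. Returns the highest entry that is strictly less than
 * ret_addr, or 0 if no entry lies below ret_addr. */
uintptr_t find_entry(const uintptr_t *entries, size_t n, uintptr_t ret_addr)
{
    size_t lo = 0, hi = n;              /* search window is [lo, hi) */

    assert(entries != NULL || n == 0);

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (entries[mid] < ret_addr)
            lo = mid + 1;               /* entries[mid] is a candidate; look higher */
        else
            hi = mid;                   /* entries[mid] is at or past ret_addr */
    }
    return lo == 0 ? 0 : entries[lo - 1];
}
```

The comparison is strict because a return address can coincide with the entry point of the next function when a call is the last instruction of the caller.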

500 - Internal Server Error · Answer 2 · 2013-06-09T07:18:13+0000

Assuming that there are regular stack frames, and that bar was called with a normal call (as opposed to an indirect call through a register), to get the address of bar you again “go out” one level and find the call bar instruction.

So while in foo, your stack will look something like this:

 .
 .
 parameters to bar (if any)
 return address, i.e. address following 'call bar'
 saved base pointer (ebp register) value
 locals to bar
 ...
 parameters to foo (if any)
 return address, i.e. address following 'call foo' within bar
 saved base pointer (ebp register) value
 locals to foo

So, to get the address of bar from within foo, you would do something like the following (this is untested, so minor adjustments may be needed, but you should get the general idea).

 mov eax, [ebp]       // load calling scope (bar's) frame pointer
 mov eax, [eax+4]     // load the return address for bar
 mov edx, [eax-4]     // load offset from the call instruction that called bar
 lea eax, [eax+edx]   // adjust (or something similar) to convert from offset to abs
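
The same arithmetic can be written in C with a check on the opcode, since the offset is only meaningful for a direct call. A minimal sketch, assuming the call site is the 5-byte direct near call (`E8` followed by a 32-bit relative displacement); the helper name `direct_call_target` is hypothetical, and it works on a byte buffer plus an offset so that it can be tried without touching a live stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper. `code` is a buffer of 32-bit x86 machine code assumed
 * to be loaded at address `base`; `ret_off` is the offset of a return address
 * within it. If the 5 bytes before the return address encode a direct near
 * call (E8 rel32), store the call target in *target and return 1. Otherwise
 * (e.g. an indirect call, or too close to the buffer start) return 0. */
int direct_call_target(const uint8_t *code, uint32_t base,
                       size_t ret_off, uint32_t *target)
{
    const uint8_t *d;
    uint32_t rel;

    assert(code != NULL && target != NULL);

    if (ret_off < 5 || code[ret_off - 5] != 0xE8)
        return 0;

    /* The displacement is the last 4 bytes of the instruction, little-endian. */
    d = code + ret_off - 4;
    rel = (uint32_t)d[0] | (uint32_t)d[1] << 8 |
          (uint32_t)d[2] << 16 | (uint32_t)d[3] << 24;

    /* rel32 is relative to the next instruction, which is exactly the return
     * address. Unsigned arithmetic wraps modulo 2^32, so negative
     * displacements work as well. */
    *target = base + (uint32_t)ret_off + rel;
    return 1;
}
```

A return value of 0 does not mean the address is invalid, only that the caller was not reached through this particular encoding, so the target cannot be read out of the instruction bytes.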

FrankH. · Answer 3 · 2013-06-10T15:07:20+0000

On Linux, you can use dladdr() to resolve the calling function, using:

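 #define _GNU_SOURCE
 #include <dlfcn.h>
 #include <stdio.h>
 ...
 /* return address of the current function, i.e. a location inside its caller */
 void *retAddr = __builtin_extract_return_addr(__builtin_return_address(0));
 Dl_info d;
 (void)dladdr(retAddr, &d);   /* no error handling: d.dli_sname may be NULL */
 printf("%s called from %s + 0x%tx\n", __func__, d.dli_sname,
        (char *)retAddr - (char *)d.dli_saddr);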

See the GCC documentation for __builtin_return_address() and the Linux man page dladdr(3) for more details.

The dladdr() function is also available on Solaris / Mac OS X / *BSD, but requires a different preprocessor define than _GNU_SOURCE to become visible; see the man pages for the respective operating system(s).

Edit: Please note that since this depends on the presence of a symbol table, it may not resolve successfully in stripped binaries. I did not try to add error handling above; in general, support for any kind of automatic backtrace (with support for resolving function names) depends on the symbol tables not having been stripped.

For really quick stuff, I sometimes just use:

 #include <execinfo.h>
 #include <unistd.h>   /* for STDERR_FILENO */
 ...
 void *retAddr[10];
 backtrace_symbols_fd(retAddr, backtrace(retAddr, 10), STDERR_FILENO);

as that captures a stack up to ten frames deep. Again, this relies on the symbols not having been stripped. There is a performance penalty for this, since you are resolving more than one address.

Edit2: Without symbol tables (which, among other things, contain the start address and size of each function in the executable file / library), the notion of a "start address" is meaningless. As far as the CPU itself is concerned, there is no record of how the instruction pointer arrived where it is at a given moment: the assembly equivalent of goto (jmp), or other bizarre incantations such as self-modifying code, are just as "valid" to the CPU as properly structured, compiler-generated code. x86 instructions are variable-length and the opcode map is quite dense, so almost any random sequence of bytes constitutes a "valid" instruction stream; hence heuristic backtracing (disassembling binaries backwards) is not 100% safe.

Symbol tables in this sense also lay down “markers” for debuggers. You can expect to find a valid instruction stream if you start disassembling functions at the start addresses recorded in the symbol table, and you can cross-check this by verifying that any return addresses found in backtraces are preceded by a call instruction.
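
The cross-check described above can be sketched as a test on the bytes immediately before a candidate return address. This is a heuristic illustration only: it assumes 32-bit code, ignores instruction prefixes and far calls, and can report false positives because other instructions may happen to end in the same bytes. The helper names `indirect_call_len` and `preceded_by_call` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Length in bytes of an indirect near call (opcode FF, ModRM reg field 2)
 * starting at insn in 32-bit mode, or 0 if the bytes are not that
 * instruction. `avail` is the number of readable bytes at insn. */
static size_t indirect_call_len(const uint8_t *insn, size_t avail)
{
    if (avail < 2 || insn[0] != 0xFF)
        return 0;

    uint8_t mod = insn[1] >> 6, reg = (insn[1] >> 3) & 7, rm = insn[1] & 7;
    size_t len = 2;                       /* opcode + ModRM */

    if (reg != 2)
        return 0;                         /* FF with another /digit is not a near call */
    if (mod == 3)
        return len;                       /* call through a register */
    if (rm == 4) {                        /* SIB byte follows */
        if (avail < 3)
            return 0;
        len += 1;
        if (mod == 0 && (insn[2] & 7) == 5)
            len += 4;                     /* SIB without base: disp32 */
    }
    if (mod == 0 && rm == 5)
        len += 4;                         /* absolute [disp32] */
    if (mod == 1)
        len += 1;                         /* disp8 */
    if (mod == 2)
        len += 4;                         /* disp32 */
    return len;
}

/* Returns 1 if the bytes ending at code[ret_off] look like a near call
 * (direct E8 rel32, or an indirect FF /2 form), 0 otherwise. */
int preceded_by_call(const uint8_t *code, size_t ret_off)
{
    static const size_t lens[] = {2, 3, 4, 6, 7};  /* possible FF /2 lengths */

    if (ret_off >= 5 && code[ret_off - 5] == 0xE8)
        return 1;
    for (size_t i = 0; i < sizeof lens / sizeof lens[0]; i++) {
        size_t n = lens[i];
        if (ret_off >= n && indirect_call_len(code + ret_off - n, n) == n)
            return 1;
    }
    return 0;
}
```

A result of 1 only makes a return address plausible; a result of 0 is a stronger signal that the value found on the stack is not a return address at all.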
