There is no absolutely reliable way to do this for each export.
Each export only specifies an offset in the executable file; logically, any code that refers to it can treat it as either code or data.
As you already mentioned, you can come up with a heuristic that determines the type of an export in almost all cases, but it is easy to construct counterexamples that defeat any given heuristic. Take, for example, the rule you suggested:
An exported entry will be considered a valid exported function if it has a ret instruction, more than <min> valid instructions exist, and IDA recognizes the function's calling convention.
False negatives: You may have a function that uses tail-call optimization and ends with a jmp rather than a ret instruction. Any short function will also fail the test. And there are several ways in which IDA can be confused into not treating the code as a function.
False positives: There may be a string in memory followed by a C3 or C2 byte (the ret opcodes), like db 'BACKGAMMON0',0,0C3h. This disassembles as a valid 11-instruction function ending in ret and taking no arguments.
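Both failure modes can be reproduced with a toy version of the rule. The following is only a sketch: decode is a hypothetical hand-written decoder for 32-bit x86 that covers just the handful of opcodes needed here, looks_like_function is a simplified stand-in for the suggested heuristic, and IDA's calling-convention check is left out because it cannot be reproduced in a few lines.

```python
# Toy 32-bit x86 decoder: covers only the opcodes used in this example.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]
REGS8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def decode(code):
    """Linear sweep; returns a list of mnemonics, or None on an unknown byte."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        if 0x40 <= op <= 0x47:            # inc r32
            out.append("inc " + REGS[op - 0x40])
            i += 1
        elif 0x48 <= op <= 0x4F:          # dec r32
            out.append("dec " + REGS[op - 0x48])
            i += 1
        elif op == 0x30 and i + 1 < len(code):   # xor r/m8, r8
            modrm = code[i + 1]
            mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
            if mod != 0 or rm in (4, 5):
                return None               # addressing forms not handled here
            out.append(f"xor byte ptr [{REGS[rm]}], {REGS8[reg]}")
            i += 2
        elif op == 0xE9 and i + 4 < len(code):   # jmp rel32
            out.append("jmp rel32")
            break                         # unconditional jump ends the sweep
        elif op == 0xC3:                  # ret
            out.append("ret")
            break
        else:
            return None
    return out


def looks_like_function(code, min_insns=5):
    """Simplified rule: decodes cleanly, ends in ret, more than min_insns instructions."""
    insns = decode(code)
    return insns is not None and insns[-1:] == ["ret"] and len(insns) > min_insns


# The string from the false-positive example: db 'BACKGAMMON0',0,0C3h
string_export = b"BACKGAMMON0" + b"\x00\xC3"
# A made-up function body that ends in a tail call (jmp) instead of ret.
tail_call_export = bytes([0x40] * 6 + [0xE9, 0, 0, 0, 0])

print(decode(string_export))
print(looks_like_function(string_export))     # data accepted as a function
print(looks_like_function(tail_call_export))  # real code rejected
```

The string is accepted and the tail-calling code is rejected, which is exactly backwards. Raising or lowering min_insns only shifts the trade-off between the two kinds of error.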
The lines blur even further once you consider that an export can logically be treated as both code and data. Imagine an exported sequence of bytes that is copied to dynamically allocated memory, potentially even in another process, where it is later executed as code.
Perhaps a reasonable suggestion would be to simply trust IDA and treat the export as code if IDA considers it code. Much of IDA's functionality relies on automatically guessing logical data types, and it is usually pretty good at this. As you have shown, it is sometimes wrong. But you cannot get 100% accuracy. The best you can do is strike a balance between false negatives and false positives.
Proof that this problem is undecidable:
Whether an export will be executed as code is undecidable. Whether an export will be read as data is also undecidable. Since we cannot guarantee that either is the case, it is impossible to distinguish seemingly ambiguous cases.
Proof. Suppose we have an oracle A(P,I,E) that returns 1 if program P (including all its dependencies) executes (or reads) export E (from any DLL loaded during the execution of P) given "input" (external state) I. Otherwise, it returns 0.
We construct a minimal program Z(P,I,E) that executes (or reads) export E (from a DLL that it loads into its address space) if and only if A(P,I,E) returns 0.
Now consider the result of Z(Z,I,E):
If Z(Z,I,E) executes (or reads) export E, then A(Z,I,E) returns 1. But Z(Z,I,E) is defined so that it does not access export E unless A(Z,I,E) returns 0. This is a contradiction.
If Z(Z,I,E) does not execute (or read) export E, then A(Z,I,E) returns 0. But Z(Z,I,E) is defined so that it accesses export E when A(Z,I,E) returns 0. This is a contradiction.
Therefore, our initial assumption that the oracle A(P,I,E) exists is false.
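The diagonal construction can be tried out on concrete candidate oracles. This is only an illustrative sketch, not a proof: make_Z, oracle_is_wrong, and the accessed list (which stands in for "executing or reading the export") are hypothetical names, and the two candidate oracles are deliberately trivial.

```python
def make_Z(oracle):
    """Build Z from a claimed oracle A(P, I, E) that returns 0 or 1."""
    def Z(program, inp, export, accessed):
        # Z accesses the export if and only if the oracle says it will not.
        if oracle(program, inp, export) == 0:
            accessed.append(export)
    return Z


def oracle_is_wrong(oracle, inp="I", export="E"):
    """Run Z on itself and compare the oracle's prediction with what happened."""
    Z = make_Z(oracle)
    accessed = []
    Z(Z, inp, export, accessed)
    actual = 1 if export in accessed else 0
    return oracle(Z, inp, export) != actual


def always_no(P, I, E):
    return 0   # claims the export is never accessed


def always_yes(P, I, E):
    return 1   # claims the export is always accessed


print(oracle_is_wrong(always_no))
print(oracle_is_wrong(always_yes))
```

Whatever the candidate oracle answers about Z(Z,I,E), Z does the opposite, so the prediction is wrong in both cases.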
But you can do better with instrumentation...
Depending on the exact problem you are trying to solve, you may be able to determine at runtime which exports are valid functions.
For example, you can write an application that debugs the program you are analyzing and sets guard pages on every page containing an export you want to hook. This means that whenever such a page is accessed (executed / read / written), an exception is raised and the debugger gains control.
The debugger can inspect the program's context to find out what type of access was made and whether it involves an export. If the access targets an export, the debugger can run its hooking logic before returning control to the program. Otherwise, it can simply return control to the program.
In either case, the PAGE_GUARD modifier is removed after each exception, so you will need to reapply it every time.
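The bookkeeping side of such a debugger can be sketched without any OS calls. In a real tool the pages would be guarded with something like VirtualProtectEx and PAGE_GUARD, and the faults would arrive as guard-page debug events; all of that is omitted here. The export names and addresses, the 4 KiB page size, and the helper names pages_to_guard and classify_access are assumptions made for illustration.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages


def pages_to_guard(exports):
    """Base addresses of all pages that contain at least one export."""
    return sorted({addr & ~(PAGE_SIZE - 1) for addr in exports.values()})


def classify_access(exports, fault_addr, kind):
    """Map a guard-page fault to (export name, 'code' or 'data'), or None.

    kind is 'execute', 'read', or 'write', as reported by the debugger.
    Only accesses that hit an export's exact address are treated as related.
    """
    by_addr = {addr: name for name, addr in exports.items()}
    name = by_addr.get(fault_addr)
    if name is None:
        return None  # unrelated access on a guarded page: just resume
    return (name, "code" if kind == "execute" else "data")


# Hypothetical export table: name -> virtual address.
exports = {"Foo": 0x10001010, "Bar": 0x10001F00, "Table": 0x10003008}

print([hex(p) for p in pages_to_guard(exports)])
print(classify_access(exports, 0x10001010, "execute"))
print(classify_access(exports, 0x10003008, "read"))
print(classify_access(exports, 0x10001234, "execute"))
```

After every fault, whether related or not, the debugger would have to reapply the guard to the affected page before resuming the program.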
Unsurprisingly, this makes the program run very slowly, since any R/W/X access to any page containing an export causes an expensive context switch. This will probably include most of the instructions that belong to your exported functions, as well as many others that have nothing to do with them.
You can use a similar approach with other tools such as Pin.
Please note that instrumentation cannot tell you how every export is used. This is because you may first need to determine what input / external state makes the program access each export before you can find out whether it is used as code or as data (if at all).
Also note that a single export can potentially be both executed and read (or even written).