I am writing a concolic engine for Python using the sys.settrace() functionality.
The main task during this kind of execution is to write down restrictions on the input variables. Constraints are nothing more than the conditions of if statements that create two branches (the "then" and "else" branches).
When the execution is completed, the engine selects the restriction and finds the corresponding values ββfor the inputs, so that the execution will go down another branch (when executing x, it goes to the "then" branch, when executing x + 1 it goes along the "else" branch).
This should be a bit of context, why am I doing what I'm trying to do ...
By combining the settrace() module and dis , I get to see the bytecode of each line in the source, just before it executes. That way, I can easily write down if conditions that appear at runtime.
But then I have a big problem. I need to know which way I went, which branch I was executed. Therefore, if my code looks something like this:
if x > a: print x else: print a
at some point my trace will see:
t: if x > 0:
then the python interpreter will do if and jump (or not) somewhere. And I will see:
t + 1: print x
So, the instruction t + 1 in the then branch or in the else? Keep in mind that the trace function only sees some bytecode in the current block.
I know two ways to do this. One of them is to evaluate the condition in order to determine exactly whether it is true or false. This only works in the absence of side effects.
Another way is to try to look at the pointer to t + 1 and try to understand where we are in the code. This is how I use it now, but it is very delicate, because in t + 1 I could find myself completely different (different module, built-in function, etc.).
So the question I have is this: is there a way to get the result of the last conditional jump from Python itself or from the C / extension / what module?
Alternatively, are there finer-grained trace parameters? Something like executing the bytecode of one operation code at a time. With the settrace() function, the maximum resolution that I get is whole lines of source code.
In the worst case scenario, I think I can modify the Python interpreter to expose such information, but I would leave this as a last resort for obvious reasons.