Python tracing and conditional jumps

I am writing a concolic engine for Python using the sys.settrace() functionality.

The main task during this kind of execution is to record constraints on the input variables. The constraints are simply the conditions of if statements, each of which creates two branches (the "then" branch and the "else" branch).

When an execution is complete, the engine picks a constraint and finds input values for which execution will go down the other branch (for example, with input x it takes the "then" branch, and with input x + 1 it takes the "else" branch).

That is a bit of context on why I am trying to do this...

By combining sys.settrace() with the dis module, I can see the bytecode of each source line just before it executes. That way, I can easily record the if conditions as they are encountered at runtime.
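A minimal sketch of that combination, assuming a recent Python 3: a trace function that, on every "line" event, looks up the bytecode instructions belonging to the line about to run. The names `instructions_for_line`, `tracer`, `target` and `seen` are made up for this illustration.

```python
import dis
import sys


def instructions_for_line(code, lineno):
    """Return the instructions of `code` that belong to source line `lineno`."""
    line_starts = dict(dis.findlinestarts(code))  # bytecode offset -> line number
    current = None
    found = []
    for instr in dis.get_instructions(code):
        if instr.offset in line_starts:
            current = line_starts[instr.offset]
        if current == lineno:
            found.append(instr)
    return found


seen = []  # (line number, [opcode names]) in execution order


def tracer(frame, event, arg):
    # Only follow the function under test; ignore every other frame.
    if frame.f_code is not target.__code__:
        return None
    if event == "line":
        ops = [i.opname for i in instructions_for_line(frame.f_code, frame.f_lineno)]
        seen.append((frame.f_lineno, ops))
    return tracer


def target(x, a):
    if x > a:
        return x
    else:
        return a


sys.settrace(tracer)
try:
    target(3, 1)
finally:
    sys.settrace(None)

for lineno, ops in seen:
    print(lineno, ops)
```

The line holding the if shows up with a comparison opcode followed by a conditional jump, which is enough to recognise a branch point, but not to tell which way the jump went.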

But then I have a big problem. I need to know which way execution went, that is, which branch was taken. So if my code looks something like this:

    if x > a:
        print(x)
    else:
        print(a)

at some point my trace will see:

    t: if x > a:

then the Python interpreter will execute the if and jump (or not) somewhere. And I will see:

    t + 1: print(x)

So is the instruction at t + 1 in the "then" branch or in the "else" branch? Keep in mind that the trace function only sees some of the bytecode in the current block.

I know of two ways to do this. One is to re-evaluate the condition to determine whether it is true or false. This only works if the condition has no side effects.

The other way is to look at the instruction pointer at t + 1 and try to work out where we are in the code. This is the approach I use now, but it is very fragile, because at t + 1 I could find myself somewhere completely different (another module, a built-in function, etc.).

So my question is this: is there a way to get the result of the last conditional jump from Python itself, or from a C extension or some other module?

Alternatively, are there finer-grained tracing options? Something like executing the bytecode one opcode at a time. With settrace(), the finest resolution I get is whole lines of source code.

In the worst case, I think I can modify the Python interpreter to expose this information, but I would rather leave that as a last resort, for obvious reasons.

2 answers

In the end, this is what I did. I instrumented the AST, and it works very well.

When instrumenting the AST, you need to move all function calls (and also attribute accesses and subscripts, because of getattr() and friends) out of the if conditions, storing their results in temporary variables. You also need to split up and and or.

Then add a call to your own function at the beginning of each branch, passing the boolean True for the "then" branch and False for the "else" branch.
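A rough sketch of this marker-insertion step, not the exact transformation used here. The names `BranchMarker` and `_branch` are hypothetical, and passing the line number of the if alongside the boolean is an extra assumption added so that the markers can be told apart.

```python
import ast


class BranchMarker(ast.NodeTransformer):
    """Insert a call to _branch(lineno, outcome) at the start of each if branch."""

    def visit_If(self, node):
        self.generic_visit(node)  # instrument nested ifs first
        node.body.insert(0, self._marker(node.lineno, True))
        # An if without an else gets one, holding only the marker.
        node.orelse.insert(0, self._marker(node.lineno, False))
        return node

    @staticmethod
    def _marker(lineno, outcome):
        call = ast.Call(
            func=ast.Name(id="_branch", ctx=ast.Load()),  # hypothetical hook name
            args=[ast.Constant(value=lineno), ast.Constant(value=outcome)],
            keywords=[],
        )
        return ast.Expr(value=call)


source = """
def biggest(x, a):
    if x > a:
        return x
    else:
        return a
"""

taken = []  # (line of the if, True for "then" / False for "else")


def _branch(lineno, outcome):
    taken.append((lineno, outcome))


tree = BranchMarker().visit(ast.parse(source))
ast.fix_missing_locations(tree)  # give the new nodes line/column information

namespace = {"_branch": _branch}
exec(compile(tree, "<instrumented>", "exec"), namespace)
namespace["biggest"](3, 1)
namespace["biggest"](0, 1)
print(taken)  # [(3, True), (3, False)]
```

Because the marker runs inside the branch itself, the outcome is known without re-evaluating the condition, so side effects in the condition are not a problem.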

After that, I wrote an AST-to-source converter (there is one somewhere online, but it does not work with recent versions of Python).
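On Python 3.9 and later, a hand-written converter may no longer be needed, because the standard library provides `ast.unparse()`. A small sketch, assuming such an interpreter is available:

```python
import ast

tree = ast.parse("if a and b:\n    f()\n")

# ast.unparse() (Python 3.9+) turns a tree back into source text.
regenerated = ast.unparse(tree)
print(regenerated)

# The regenerated source parses back to an equivalent tree.
assert ast.dump(ast.parse(regenerated)) == ast.dump(tree)
```

Comments and the original formatting are not preserved, since the AST does not store them.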

Working with the AST is quite simple and straightforward. I ended up doing three transformation passes, and adding a few import statements as well.

This is the first pass, as an example. It splits if conditions that contain the or or and operators:

    import ast

    class SplitBoolOpPass1(ast.NodeTransformer):
        def visit_If(self, node):
            # Peel operands off the end of the BoolOp until a plain test remains.
            while isinstance(node.test, ast.BoolOp):
                new_node = ast.If(test=node.test.values.pop(), body=node.body, orelse=node.orelse)
                if isinstance(node.test.op, ast.And):
                    if len(node.test.values) == 1:
                        node.test = node.test.values[0]
                    # "a and b": test b only inside the then branch of a
                    node.body = [new_node]
                else:
                    if len(node.test.values) == 1:
                        node.test = node.test.values[0]
                    # "a or b": test b only inside the else branch of a
                    node.orelse = [new_node]
            node = self.generic_visit(node)  # recursion
            return node

This is probably not very useful for code coverage applications, because it changes the code quite heavily.


The trace function receives no information about the last branch taken.

What I did to implement branch coverage measurement in coverage.py was to keep, for each stack frame, a record of the last line executed. The next time the trace function is called, I can record a pair of line numbers that forms a from-to arc of execution.
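The arc idea can be sketched as follows. This is an illustration only, not the actual implementation of any coverage tool; `classify`, `arcs` and the tracer names are made up for the example.

```python
import sys

arcs = set()  # (from_line, to_line) pairs observed during execution


def global_tracer(frame, event, arg):
    # Called once per new frame; only follow the function under test.
    if frame.f_code is not classify.__code__:
        return None
    last_line = None  # last line executed in this particular frame

    def local_tracer(frame, event, arg):
        nonlocal last_line
        if event == "line":
            if last_line is not None:
                arcs.add((last_line, frame.f_lineno))
            last_line = frame.f_lineno
        return local_tracer

    return local_tracer


def classify(x, a):
    if x > a:
        result = "then"
    else:
        result = "else"
    return result


sys.settrace(global_tracer)
try:
    classify(3, 1)
finally:
    sys.settrace(None)

first = classify.__code__.co_firstlineno
relative_arcs = sorted((a - first, b - first) for a, b in arcs)
print(relative_arcs)  # line offsets relative to the def line
```

An arc from the if line to the first line of the "then" body means the condition was true; an arc to the first line of the "else" body means it was false. This relies on the two branches starting on different lines.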

About finer-grained tracing: you can trick the Python interpreter into giving you bytecode-level trace information. My experiment with this is described here: Wicked hack: Python Byte Code Tracing
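On Python 3.7 and later there is also a supported route that needs no trick: setting `frame.f_trace_opcodes = True` makes the trace function receive an "opcode" event before each bytecode instruction. A minimal sketch, with `biggest`, `tracer` and `executed` as illustrative names:

```python
import dis
import sys

executed = []  # (bytecode offset, opcode name) in execution order


def tracer(frame, event, arg):
    if frame.f_code is not biggest.__code__:
        return None
    frame.f_trace_opcodes = True  # request per-opcode events for this frame
    if event == "opcode":
        opcode = frame.f_code.co_code[frame.f_lasti]
        executed.append((frame.f_lasti, dis.opname[opcode]))
    return tracer


def biggest(x, a):
    if x > a:
        return x
    else:
        return a


sys.settrace(tracer)
try:
    biggest(3, 1)
finally:
    sys.settrace(None)

for offset, name in executed:
    print(offset, name)
```

Comparing the offset recorded right after a conditional jump with the jump's target (available from `dis.get_instructions()`) shows whether the jump was taken. Exact opcode names and offsets vary between Python versions.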

I would be very interested to see how this work progresses!


Source: https://habr.com/ru/post/894297/

