Loop until steady state of a complex data structure in Python

I have a more or less complex data structure (a list of dictionaries of sets) on which I perform a bunch of operations in a loop until the data structure reaches a steady state, i.e. stops changing. The number of iterations required varies greatly depending on the input.

I would like to know whether there is an established way to form a stop condition in this case. The best I could come up with is to pickle the data structure, hash it with md5, and check whether the hash has changed since the previous iteration. Since this is more expensive than my operations, I only do it every 20 iterations, but it still feels wrong.
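Roughly, my current check looks like this; data and do_operations stand in for my real structure and operations:

    import hashlib
    import pickle

    def fingerprint(structure):
        # pickle the whole structure, then hash the resulting bytes
        return hashlib.md5(pickle.dumps(structure)).hexdigest()

    previous = fingerprint(data)
    iteration = 0
    while True:
        do_operations(data)          # mutates the structure in place
        iteration += 1
        if iteration % 20 == 0:      # the check costs more than the operations,
            current = fingerprint(data)  # so only do it every 20 iterations
            if current == previous:
                break
            previous = current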

Is there a better or cheaper way to test deep equality so that I know when to stop?

Thanks!

+4
4 answers

I think these are your only choices:

  • Have each update set a "dirty flag" when it changes a value from its previous state.

  • Do a whole-structure analysis (for example, the pickle/md5 combination you proposed).

  • Just run a fixed number of iterations known to reach a steady state (possibly running more times than necessary, but without the overhead of a termination test).

Option 1 is analogous to what Python itself does with reference counting. Option 2 is analogous to what Python does with its garbage collector. Option 3 is common in numerical analysis (e.g., always running 20 iterations of Newton's method to compute a square root, because that is known to be enough); a toy version is sketched below.
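A minimal illustration of option 3, assuming a fixed count known to be enough for the inputs you care about (the function and the count here are illustrative):

    def approx_sqrt(x, iterations=20):
        # Newton's method with no convergence test: always run a fixed
        # number of steps chosen in advance. Assumes x > 0 and of moderate
        # magnitude, so that 20 iterations is known to suffice.
        guess = x if x >= 1.0 else 1.0
        for _ in range(iterations):
            guess = (guess + x / guess) / 2.0
        return guess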

+3

Take a look at python-deep. It should do what you want, and if it is not fast enough, you can modify it yourself.

It also depends a lot on how expensive the comparison operation is and how expensive one iteration of the calculation is. Say one calculation iteration takes time c, one test takes time t, and the probability of finishing on any given iteration is p; then the optimal test frequency is:

 (t * p) / c 

This assumes c < t; if it does not hold, you should obviously test on every loop.

So, since you can track c and t dynamically and estimate p as you go (adapting it in code if the calculation looks close to finishing), you can keep your test frequency near the optimal value.
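A sketch of what that could look like; I am reading the frequency (t * p) / c as tests per iteration, so the interval between tests is its reciprocal, and the function names, the timing scheme, and the default guess for p are all mine:

    import time

    def run_to_steady_state(step, data, p=0.05):
        # `step` performs one calculation iteration and returns the new
        # structure; `p` is a rough estimate of the chance that any given
        # iteration is the last one.
        interval, since_test = 1, 0
        while True:
            start = time.perf_counter()
            new_data = step(data)
            c = time.perf_counter() - start      # cost of one iteration
            since_test += 1
            if since_test >= interval:
                since_test = 0
                start = time.perf_counter()
                done = (new_data == data)
                t = time.perf_counter() - start  # cost of one test
                if done:
                    return new_data
                # test every loop unless c < t; otherwise space the tests
                # out by the reciprocal of the frequency above
                interval = 1 if c >= t else max(1, round(c / (t * p)))
            data = new_data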

+4

Checking for equality does not seem like the right approach to me. Provided you have full control over the operations you perform, I would add a "modified" flag (a boolean variable) that is set to False at the beginning of each iteration. Whenever one of your operations changes (part of) the data structure, it sets the flag to True, and you keep iterating until modified stays False for an entire iteration.
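For instance, something like this, assuming each operation can report whether it changed anything (the names here are illustrative):

    def iterate_until_steady(data, operations):
        # `operations` is a list of functions that mutate `data` in place
        # and return True if they actually changed something.
        modified = True
        while modified:
            modified = False
            for op in operations:
                if op(data):
                    modified = True
        return data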

+2

I would expect Python's equality operator to be efficient at comparing compositions of built-in objects, and in particular faster than pickling plus hashing, given that Python's test for list equality looks something like this:

    def __eq__(a, b):
        if type(a) == list and type(b) == list:
            if len(a) != len(b):
                return False
            for i in range(len(a)):
                if a[i] != b[i]:
                    return False
            return True
        # testing for other types goes here

Since the function returns as soon as it finds two elements that do not match, in the average case it will not need to iterate over the whole structure. Compare that with hashing, which has to iterate over the entire data structure even in the best case.

Here's how I would do it:

    import copy

    def perform_a_bunch_of_operations(data):
        # take care not to modify the original data, as we will be using it later
        my_shiny_new_data = copy.deepcopy(data)
        # do lots of math here...
        return my_shiny_new_data

    data = get_initial_data()
    while True:
        nextData = perform_a_bunch_of_operations(data)
        if data == nextData:
            # steady state reached
            break
        data = nextData

This has the disadvantage that you need to make a deep copy of your data on each iteration, but it may still be faster than hashing; the only way to know for sure is to profile your particular case.

+1

Source: https://habr.com/ru/post/1380196/

