Minimum and maximum of the last 1000 values of a changing list

I am writing an iterative algorithm (a Monte Carlo method). The algorithm returns a value at each iteration, producing a stream of values.

I need to analyze these values and stop the algorithm once, say, the last 1000 returned values agree to within some epsilon.

I decided to compute the max and min of the last 1000 values, then compute the error with the formula (max-min)/min and compare it with epsilon: if error <= epsilon, stop iterating and return the result.

  • My first idea was to use a list, append values to it, and compute the max and min of the last 1000 values after each addition.

  • Then I decided there was no point in storing more than the last 1000 values, so I remembered deque. That was a good idea, since adding and removing at either end of a deque is O(1). But it did not solve the problem of having to scan all of the last 1000 values at every iteration to compute min and max.

  • Then I remembered the heapq module. It stores data so that the smallest item can be retrieved efficiently at any moment. But I need both the smallest and the largest. In addition, I need to keep the elements in insertion order so that only the 1000 most recent values are kept, and I don't see how to do that with heapq.
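For reference, the deque approach from the second bullet, combined with the (max-min)/min stopping rule, might look like the sketch below. It is not the asker's code: the function name `iterate_until_stable` and its `stream` parameter (any iterable of the algorithm's return values) are hypothetical, and the values are assumed to be positive so that dividing by min makes sense. `deque(maxlen=...)` drops the oldest value automatically, but `min()` and `max()` still scan the whole window, which is exactly the O(window) cost per iteration the question complains about.

```python
from collections import deque


def iterate_until_stable(stream, eps, window=1000):
    """Hypothetical helper: stop once the last `window` values satisfy
    (max - min) / min <= eps. Values are assumed to be positive.

    Returns (last value seen, number of values consumed).
    """
    last = deque(maxlen=window)  # appending to a full deque discards the oldest value
    value, n = None, 0
    for n, value in enumerate(stream, start=1):
        last.append(value)
        if len(last) == window:               # only judge a full window
            lo, hi = min(last), max(last)     # O(window) scan on every iteration
            if (hi - lo) / lo <= eps:
                break
    return value, n
```

With `window=3` and the stream `[10, 5, 1, 1.0, 1.0, 2.0]`, the window `[1, 1.0, 1.0]` has zero spread, so the loop stops after the fifth value.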

With all this in mind, I decided to ask here:

How can I solve this problem most effectively?

6 answers

If you are free (or willing) to change your definition of error, you might consider using the variance instead of (max-min)/min.

You can compute the variance incrementally. With this method you never remove any values from your stream, so the variance depends on all values seen so far. Does that matter? With enough values, the first few have little influence on the variance, and variance/n (the variance of the mean) becomes small once enough values are clustered around some fixed value.

So you can stop when variance/n < epsilon .
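The answer does not name a specific incremental method. One common choice is Welford's online algorithm, sketched below under that assumption; the names `RunningVariance` and `run_until_stable` are hypothetical. Each update costs O(1) time and O(1) memory, and the loop stops as soon as variance/n drops below epsilon.

```python
class RunningVariance:
    """Welford's online mean/variance: O(1) time and memory per value."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses the already-updated mean

    @property
    def variance(self):
        # Sample variance; reported as infinite until two values have been seen,
        # so a stopping test cannot fire on the very first value.
        return self.m2 / (self.n - 1) if self.n > 1 else float("inf")


def run_until_stable(stream, eps):
    """Hypothetical driver: stop once variance / n (variance of the mean) < eps."""
    rv = RunningVariance()
    for x in stream:
        rv.update(x)
        if rv.variance / rv.n < eps:
            break
    return rv.mean, rv.n
```

Unlike the window-based approaches, nothing is ever removed here, which matches the trade-off described in the answer.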


As a refinement of @unutbu's excellent idea, you might consider using an exponentially weighted moving variance. It can be computed in O(1) time per observation, requires O(1) space, and has the advantage of automatically giving older observations less weight.

The following article gives the relevant formulas: link. See equations (140)-(143).

Finally, you can work with the standard deviation instead of the variance. It is simply the square root of the variance and has the advantage of being in the same units as the original data, which should make it easier to formulate meaningful stopping criteria.
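Since the article's link is lost, here is a sketch of one widely used form of the exponentially weighted update for mean and variance. Treat it as an assumption and compare it with equations (140)-(143) of the article before relying on it. The class name, the parameter `alpha` (the weight given to the newest observation, 0 < alpha <= 1), and seeding the mean with the first observation are choices of this sketch.

```python
import math


class EWMovingVariance:
    """Exponentially weighted moving mean/variance: O(1) time and space."""

    def __init__(self, alpha):
        self.alpha = alpha  # weight of the newest observation, 0 < alpha <= 1
        self.mean = None
        self.var = 0.0

    def update(self, x):
        if self.mean is None:
            self.mean = x  # seed the mean with the first observation
            return
        diff = x - self.mean
        incr = self.alpha * diff
        self.mean += incr
        # Older contributions decay by a factor (1 - alpha) per step.
        self.var = (1 - self.alpha) * (self.var + diff * incr)

    @property
    def std(self):
        # Standard deviation: same units as the data.
        return math.sqrt(self.var)
```

A stopping rule could then compare `std` (rather than `var`) with a tolerance expressed in the data's own units; a larger `alpha` forgets old values faster.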


How about numpy?

Just to compare speed:

    import numpy as np
    a = range(1000)
    b = np.arange(1000)
    max(a)     # 29.7us
    b.max()    # 7.29us

And you can keep writing into this array indefinitely:

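    import numpy as np
    b = np.full(1000, np.nan)   # 1000 NaN slots, same as np.empty([1000]) + np.nan
    i = 0
    for value in stream:        # your loop; `stream` stands for the algorithm's output
        b[i % 1000] = value     # overwrite the oldest slot
        i += 1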

Values older than 1000 iterations are overwritten. You get the minimum/maximum with np.nanmin(b) and np.nanmax(b).

The point of the NaNs is that you initialize the array with 1000 NaNs and overwrite them one by one. nanmin and nanmax ignore the remaining NaNs.
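Putting the ring buffer together with the asker's (max-min)/min criterion gives something like the sketch below. The function name and its parameters are hypothetical, and the values are assumed positive. Note that `np.nanmin`/`np.nanmax` still examine all slots on every call; the speedup over the pure-Python version comes from doing that scan in NumPy's compiled code, not from a better algorithm.

```python
import numpy as np


def iterate_until_stable_np(stream, eps, window=1000):
    """Hypothetical helper: ring buffer in a fixed NumPy array.

    Stops once (max - min) / min <= eps over the last `window` values.
    Returns (last value seen, number of values consumed).
    """
    buf = np.full(window, np.nan)  # NaN marks slots that were never written
    i, value = 0, None
    for value in stream:
        buf[i % window] = value    # overwrite the oldest slot
        i += 1
        if i >= window:            # judge only a full window of real values
            lo, hi = np.nanmin(buf), np.nanmax(buf)  # NaN-aware, as in the answer
            if (hi - lo) / lo <= eps:
                break
    return value, i
```

Checking only after the buffer is full means the criterion always sees `window` real values; the NaN-aware functions are kept so the same code also works if you decide to test a partially filled buffer.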


I'm afraid I can't give a good answer in Python, but here is an outline of the data structure you should use:

Keep the 1000 items in a FIFO queue. Keep pointers to the largest and smallest items in the queue. If one of them leaves the queue, search the queue for the new largest/smallest item (the amortized cost depends on your data). If a newly added item is the new highest/lowest value, simply update the pointer (O(1)). Assuming your data is converging, this should work well.
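The answer gives no code, so here is a minimal Python sketch of that outline. All names (`WindowExtremes`, `push`, `max_val`, `min_val`) are hypothetical. "Pointers" are modelled as absolute stream positions, so a pointer has left the queue when it is smaller than the position of the queue's front item. The current max and min values are cached next to their positions because indexing into the middle of a deque is not O(1).

```python
from collections import deque


class WindowExtremes:
    """FIFO queue of the last `size` values plus 'pointers' to its max and min.

    A pointer is the stream position of an item; the item at the front of
    the queue has position self.count - len(self.queue).
    """

    def __init__(self, size=1000):
        self.queue = deque(maxlen=size)  # appending when full drops the oldest item
        self.count = 0                   # total values pushed so far
        self.max_val = self.min_val = None
        self.max_pos = self.min_pos = None

    def _rescan(self, first):
        # O(size) search for the new largest and smallest items.
        self.max_val = self.min_val = None
        for offset, v in enumerate(self.queue):
            if self.max_val is None or v >= self.max_val:  # >= prefers the newest tie
                self.max_val, self.max_pos = v, first + offset
            if self.min_val is None or v <= self.min_val:
                self.min_val, self.min_pos = v, first + offset

    def push(self, value):
        self.queue.append(value)
        self.count += 1
        first = self.count - len(self.queue)  # position of the oldest item kept
        if self.max_pos is None or self.max_pos < first or self.min_pos < first:
            self._rescan(first)               # an extreme left the queue (or first push)
            return
        newest = self.count - 1
        if value >= self.max_val:             # new highest value: O(1) pointer update
            self.max_val, self.max_pos = value, newest
        if value <= self.min_val:             # new lowest value: O(1) pointer update
            self.min_val, self.min_pos = value, newest
```

Preferring the newest item on ties keeps each pointer valid for as long as possible, which reduces the number of rescans on flat, converged data.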


Create a subclass of deque with minvalue and maxvalue properties. When adding or removing entries, compare them with the current min and max: you only need to rescan the deque for the min/max if the value being removed is the current min or max. When adding, simply compare the new value with the current min and max and update them accordingly. This avoids most of the scans of your deque for the min/max values.
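A minimal sketch of such a deque subclass, assuming values enter only through `append()` on a deque created with a `maxlen`. The class name `MinMaxDeque` is hypothetical; `minvalue` and `maxvalue` follow the answer's naming but are plain attributes here rather than properties. Other mutating methods (`appendleft`, `extend`, `pop`, ...) are not overridden, so calling them would leave the cached values stale.

```python
from collections import deque


class MinMaxDeque(deque):
    """deque(maxlen=...) that caches its min and max on append()."""

    def __init__(self, maxlen):
        super().__init__(maxlen=maxlen)
        self.minvalue = None
        self.maxvalue = None

    def append(self, value):
        full = len(self) == self.maxlen
        dropped = self[0] if full else None  # the item maxlen is about to evict
        super().append(value)                # evicts self[0] when the deque is full
        if len(self) == 1:                   # first value (or maxlen == 1)
            self.minvalue = self.maxvalue = value
            return
        # Rescan only if the evicted value was the current min or max.
        if full and dropped == self.minvalue:
            self.minvalue = min(self)        # O(n) rescan
        else:
            self.minvalue = min(self.minvalue, value)  # O(1) update
        if full and dropped == self.maxvalue:
            self.maxvalue = max(self)
        else:
            self.maxvalue = max(self.maxvalue, value)
```

For example, appending 5, 1, 4, 2, 3 to `MinMaxDeque(3)` leaves `[4, 2, 3]` with `minvalue == 2` and `maxvalue == 4`; only the evictions of 5 and 1 trigger a rescan.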


You can use two Fibonacci heaps. Insertion is O(1), deletion is O(log(n)). In your question you already mention the heapq module. I'm not sure what kind of heap it provides, but an ordinary heap will also work just fine.

The problem is that you can only extract at least one heap, but not the maximum, can be solved by saving two heaps. Since I do not know the heapq module, you can either provide it with your own comparison function, or simply use -value instead of value for the key of the second heap.
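A sketch of the two-heap idea using heapq, which provides ordinary binary heaps rather than Fibonacci heaps. heapq's `heappush`/`heappop` take no comparison function, so the second heap stores `-value`. Removing a specific old value from the middle of a heap is awkward, so this sketch swaps in "lazy deletion": each entry remembers its stream position, and expired entries are discarded only when they reach the top, plus an occasional rebuild so the heaps stay bounded. The class and method names are hypothetical.

```python
import heapq


class HeapWindow:
    """Min and max of the last `size` values using two binary heaps."""

    def __init__(self, size=1000):
        self.size = size
        self.count = 0       # total values pushed so far
        self.min_heap = []   # entries (value, position)
        self.max_heap = []   # entries (-value, position): a max-heap via negation

    def _first(self):
        # Stream position of the oldest value still inside the window.
        return max(0, self.count - self.size)

    def _compact(self, heap):
        # Rebuild without expired entries so a heap cannot grow without bound.
        first = self._first()
        heap[:] = [entry for entry in heap if entry[1] >= first]
        heapq.heapify(heap)

    def push(self, value):
        pos = self.count
        self.count += 1
        heapq.heappush(self.min_heap, (value, pos))   # O(log n)
        heapq.heappush(self.max_heap, (-value, pos))
        for heap in (self.min_heap, self.max_heap):
            if len(heap) > 2 * self.size:
                self._compact(heap)

    def minimum(self):
        first = self._first()
        while self.min_heap[0][1] < first:  # lazily drop entries that left the window
            heapq.heappop(self.min_heap)
        return self.min_heap[0][0]

    def maximum(self):
        first = self._first()
        while self.max_heap[0][1] < first:
            heapq.heappop(self.max_heap)
        return -self.max_heap[0][0]
```

Both queries require at least one pushed value. The newest entry is never expired, so the `while` loops cannot empty a heap.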


Source: https://habr.com/ru/post/899960/

