Find span where condition is True using NumPy

Imagine I have a NumPy array and need to find the intervals (ranges) where some condition is true. For example, in the following array I want to find the spans where the elements are greater than 1:

[0, 0, 0, 2, 2, 0, 2, 2, 2, 0] 

I will need to find the indices (start, stop):

 (3, 5) (6, 9) 

The fastest thing I could implement is to create a logical array:

 truth = data > threshold 

and then step through the array using numpy.argmax and numpy.argmin to find the start and end positions:

    pos = 0
    truth = container[RATIO, :] > threshold
    while pos < len(truth):
        start = numpy.argmax(truth[pos:]) + pos + offset
        end = numpy.argmin(truth[start:]) + start + offset
        if not truth[start]:  # nothing more
            break
        if start == end:  # goes to the end
            end = len(truth)
        pos = end

But this is too slow for the billions of positions in my arrays, especially since the spans I find are usually only a few positions long. Does anyone know a faster way to find these spans?

2 answers

Here is one way. First, take the boolean array that you already have:

    In [11]: a
    Out[11]: array([0, 0, 0, 2, 2, 0, 2, 2, 2, 0])

    In [12]: a1 = a > 1

Shift it one position to the right (so that each index holds the state of the previous element) using roll:

    In [13]: a1_rshifted = np.roll(a1, 1)

    In [14]: starts = a1 & ~a1_rshifted  # it's True but the previous isn't

    In [15]: ends = ~a1 & a1_rshifted    # it's False but the previous is True

The non-zero positions of starts mark the beginning of each run of True values; those of ends mark the position just past the end of each run:

    In [16]: np.nonzero(starts)[0], np.nonzero(ends)[0]
    Out[16]: (array([3, 6]), array([5, 9]))

And zipping them together:

    In [17]: list(zip(np.nonzero(starts)[0].tolist(), np.nonzero(ends)[0].tolist()))
    Out[17]: [(3, 5), (6, 9)]
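
One caveat with np.roll is that it wraps around: the last element is carried over to position 0, so a run of True values that touches either end of the array can be reported incorrectly. A minimal sketch of an alternative, assuming a 1-D boolean mask, pads the mask with False on both sides and looks at the transitions with np.diff. The helper name true_runs is hypothetical and not part of the answer above.

```python
import numpy as np

def true_runs(mask):
    """Return an (n, 2) array of [start, stop) pairs, one per run of True in a 1-D mask."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False on both sides so runs touching either edge still show a transition.
    padded = np.concatenate(([False], mask, [False]))
    # After casting to a signed type, diff is +1 where a run starts
    # and -1 one position past where it ends.
    changes = np.diff(padded.astype(np.int8))
    starts = np.flatnonzero(changes == 1)
    stops = np.flatnonzero(changes == -1)
    return np.column_stack((starts, stops))

a = np.array([0, 0, 0, 2, 2, 0, 2, 2, 2, 0])
print(true_runs(a > 1).tolist())  # [[3, 5], [6, 9]]
```

Because of the padding, a mask that is True all the way to the last element yields a stop equal to len(mask), which matches the usual half-open slice convention.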

If you have access to the scipy library:

You can use scipy.ndimage.measurements.label to identify any regions of nonzero values. It returns an array in which the value of each element is the identifier of the span or range it belongs to in the original array.

You can then use scipy.ndimage.measurements.find_objects to return the slices needed to extract these ranges. You can read the start and stop values directly from these slices. (In recent SciPy releases the same functions are available directly as scipy.ndimage.label and scipy.ndimage.find_objects.)

In your example:

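    import numpy
    # The scipy.ndimage.measurements namespace is deprecated in recent SciPy;
    # the same functions are importable from scipy.ndimage.
    from scipy.ndimage import label, find_objects

    data = numpy.array([0, 0, 0, 2, 2, 0, 2, 2, 2, 0])
    # label() assigns a distinct integer to each connected region of nonzero values
    labels, number_of_regions = label(data)
    # find_objects() returns one tuple of slices per labelled region
    ranges = find_objects(labels)
    for identified_range in ranges:
        print(identified_range[0].start, identified_range[0].stop)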

You should see:

    3 5
    6 9
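
Note that label treats every nonzero element as part of a region, so the example above only matches the "greater than 1" condition because the data contains nothing but 0 and 2. A minimal sketch for the general case, assuming you pass the boolean mask instead of the raw data (the sample values and the name threshold here are illustrative, not from the question):

```python
import numpy as np
from scipy.ndimage import label, find_objects

# Illustrative data containing values that are nonzero but not above the threshold.
data = np.array([0, 1, 3, 3, 1, 0, 5, 5, 5, 1])
threshold = 1

# Label runs of the condition itself rather than of the raw nonzero values.
labels, n_regions = label(data > threshold)

# For a 1-D input each entry from find_objects is a 1-tuple holding one slice.
spans = [(s[0].start, s[0].stop) for s in find_objects(labels)]
print(spans)  # [(2, 4), (6, 9)]
```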

Hope this helps!


Source: https://habr.com/ru/post/1486647/

