Getting ranges of sequences of identical entries with minimum length in a numpy array

Question


Consider an array whose elements are exclusively -1 or 1. How can I get the ranges of all slices that contain only 1s and have a minimum length t (for example, t=3)?

Example:

    >>> a = np.array([-1,-1,1,1,1,1,1,-1,1,-1,-1,1,1,1,1], dtype=int)
    >>> a
    array([-1, -1, 1, 1, 1, 1, 1, -1, 1, -1, -1, 1, 1, 1, 1])

Then the desired output for t=3 would be [(2,7),(11,15)].
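The pairs in that output read most naturally as half-open (start, stop) ranges, with stop one past the last 1 of the run. The following minimal sketch checks that reading against the example array; the variable names `expected` and `run` are only illustrative.

```python
import numpy as np

a = np.array([-1, -1, 1, 1, 1, 1, 1, -1, 1, -1, -1, 1, 1, 1, 1], dtype=int)
expected = [(2, 7), (11, 15)]

for start, stop in expected:
    run = a[start:stop]  # half-open slice: index `stop` itself is excluded
    # Every element inside the range is 1 and the run has at least 3 elements
    print((start, stop), bool((run == 1).all()), len(run))
```

The lone 1 at index 8 is absent from the output because its run has length 1, which is below t=3.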

+5

python arrays numpy

corinna Oct 21 '15 at 9:45


2 answers

I don't know numpy very well, but wouldn't it be simpler to use a plain function?

    def slices(a, t):
        start = None  # index where the current run of 1s began, if any
        result = []
        for i, val in enumerate(a):
            if val == 1:
                # start of sequence
                if start is None:
                    start = i
            else:
                # -1: end of sequence
                if start is not None:
                    if i - start >= t:  # check sequence for minimum size
                        result.append((start, i))
                    start = None
        # if sequence of 1 doesn't end with -1 within array
        if start is not None:
            if len(a) - start >= t:
                result.append((start, len(a)))
        return result

0

Radek Luner Oct 21 '15 at 10:23


Divakar · Accepted Answer · 2015-10-21T09:57:57+0000

One approach, using np.diff and np.where, is:

    # Append with `-1s` at either ends and get the differentiation
    dfa = np.diff(np.hstack((-1,a,-1)))

    # Get the positions of starts and stops of 1s in `a`
    starts = np.where(dfa==2)[0]
    stops = np.where(dfa==-2)[0]

    # Get valid mask for pairs from starts and stops being of at least 3 in length
    valid_mask = (stops - starts) >= 3

    # Finally collect the valid pairs as the output
    out = np.column_stack((starts,stops))[valid_mask].tolist()
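To make the minimum length a parameter, the same idea can be wrapped in a function. This is only a sketch of the approach above, not the answer's own code: the function name `runs_of_ones` is hypothetical, and `np.concatenate` and `np.flatnonzero` are swapped in for `np.hstack` and `np.where(...)[0]`. It relies on the fact that, after padding with -1, a difference of 2 marks a -1 to 1 transition and a difference of -2 marks a 1 to -1 transition.

```python
import numpy as np

def runs_of_ones(a, t):
    """Return half-open (start, stop) ranges of runs of 1 with length >= t.

    Hypothetical helper; assumes `a` contains only -1 and 1.
    """
    a = np.asarray(a)
    # Pad with -1 on both sides so runs touching either end are delimited too
    padded = np.concatenate(([-1], a, [-1]))
    d = np.diff(padded)
    starts = np.flatnonzero(d == 2)   # -1 -> 1 transitions
    stops = np.flatnonzero(d == -2)   # 1 -> -1 transitions (exclusive end)
    keep = (stops - starts) >= t      # drop runs shorter than t
    return list(zip(starts[keep].tolist(), stops[keep].tolist()))

a = np.array([-1, -1, 1, 1, 1, 1, 1, -1, 1, -1, -1, 1, 1, 1, 1], dtype=int)
print(runs_of_ones(a, 3))  # [(2, 7), (11, 15)]
```

With t=1 the single 1 at index 8 is reported as well, as (8, 9).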
