Efficient way to create an array that is a sequence of variable length ranges in numpy

Suppose I have an array

    import numpy as np
    x = np.array([5, 7, 2])

I want to create an array made of consecutive ranges 1..n, where the length n of each range is given by x:

 y=np.hstack([np.arange(1,n+1) for n in x]) 

Is there a way to do this without the speed penalty of a list comprehension or a loop? (x can be a very large array.)

The result should be

 y == np.array([1,2,3,4,5,1,2,3,4,5,6,7,1,2]) 
2 answers

You can use a cumulative sum (np.cumsum):

    import numpy as np

    def my_sequences(x):
        x = x[x != 0]  # you can skip this if you do not have 0s in x
        # End positions of each range (= start positions of the next one):
        y = np.cumsum(x, dtype=np.intp)
        # Create result array, filled with ones:
        a = np.ones(y[-1], dtype=np.intp)
        # At each range start, subtract the previous range's length,
        # so the running sum drops back to 1 there:
        a[y[:-1]] -= x[:-1]
        # and just add it all up (np.add.accumulate is equivalent);
        # doing this in-place should be safe:
        return np.cumsum(a, out=a)

(One word of caution: if the result array would need more elements than np.iinfo(np.intp).max, this may, with some bad luck, return incorrect results instead of failing cleanly.)
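To make the trick easier to follow, here is the same computation unrolled for the example x = [5, 7, 2], printing the intermediate array before the final cumsum. It is just an illustration of the steps in my_sequences (zero filtering omitted, since this x has no zeros):

```python
import numpy as np

# Worked example of the cumsum trick for x = [5, 7, 2].
x = np.array([5, 7, 2])

# End of each range, which is also the start index of the next range.
ends = np.cumsum(x, dtype=np.intp)   # [ 5 12 14]

# All ones: a plain cumsum of this would give 1, 2, 3, ...
a = np.ones(ends[-1], dtype=np.intp)

# At the first element of each new range, subtract the previous range's
# length so the running sum falls back to 1 there.
a[ends[:-1]] -= x[:-1]               # a[5] = 1 - 5, a[12] = 1 - 7
print(a)   # [ 1  1  1  1  1 -4  1  1  1  1  1  1 -6  1]

y = np.cumsum(a)
print(y)   # [1 2 3 4 5 1 2 3 4 5 6 7 1 2]
```

Only one full-size temporary (a) is needed, and the final cumsum can overwrite it in place, which is why this approach is cheap in memory as well as time.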

And since everyone always wants timings (compared with Ofion's method from the other answer):

    In [11]: x = np.random.randint(0, 20, 1000000)

    In [12]: %timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
    1 loops, best of 3: 753 ms per loop

    In [13]: %timeit my_sequences(x)
    1 loops, best of 3: 191 ms per loop

Of course, my_sequences also does not degrade when the values in x become large.
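For comparison, another common vectorized formulation (swapped in here, not taken from either answer, and not timed against my_sequences) builds the global index 0..N-1 and subtracts, for each position, the offset at which its range starts. The function name sequences_repeat is hypothetical. It handles zeros and an empty x without special-casing, at the cost of one extra full-size array (offsets):

```python
import numpy as np

def sequences_repeat(x):
    """Hypothetical alternative: global position minus range start, plus 1."""
    x = np.asarray(x, dtype=np.intp)
    total = int(x.sum())
    # Start offset of every range: 0, x[0], x[0] + x[1], ...
    starts = np.cumsum(x) - x
    # Repeat each start offset once per element of its range
    # (a count of 0 simply contributes nothing).
    offsets = np.repeat(starts, x)
    # Position within the range is 0..n-1; add 1 to get 1..n.
    return np.arange(total, dtype=np.intp) - offsets + 1
```

For example, sequences_repeat([5, 7, 2]) gives the same 14 values as in the question. Whether the extra temporary outweighs avoiding the scatter step is something to measure on your own data.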


First idea: avoiding the repeated np.arange calls (by slicing a single precomputed range) and using np.concatenate instead of hstack should be much faster:

    >>> import numpy as np
    >>> x = np.array([5, 7, 2])
    >>> a = np.arange(1, x.max() + 1)
    >>> np.hstack([a[:k] for k in x])
    array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])
    >>> np.concatenate([a[:k] for k in x])
    array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])

If x contains many repeated values, this seems more efficient:

    >>> ua, uind = np.unique(x, return_inverse=True)
    >>> a = [np.arange(1, k+1) for k in ua]
    >>> np.concatenate(np.take(a, uind))
    array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])
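A caveat on newer NumPy: np.take(a, uind) first converts the list a to an array, and since NumPy 1.24 a list of arrays with different lengths raises a ValueError instead of silently becoming an object array. A sketch of the same unique/take idea that builds the object array explicitly (the function name sequences_unique is hypothetical; it assumes x is a non-empty 1-D integer array):

```python
import numpy as np

def sequences_unique(x):
    """Unique/take approach with an explicit object-array lookup table."""
    x = np.asarray(x)
    ua, uind = np.unique(x, return_inverse=True)
    # One arange per distinct length: only len(ua) Python-level iterations.
    table = np.empty(len(ua), dtype=object)
    for i, k in enumerate(ua):
        table[i] = np.arange(1, k + 1)
    # Gather the precomputed ranges in original order, then join them.
    # ravel() guards against inverse-index shape differences across versions.
    return np.concatenate(table[uind.ravel()])
```

Filling the table element by element avoids NumPy ever trying to guess a shape from the ragged list.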

Some timings for your case:

 x=np.random.randint(0,20,1000000) 

Original code from the question:

    # Using hstack
    %timeit np.hstack([np.arange(1,n+1) for n in x])
    1 loops, best of 3: 7.46 s per loop

    # Using concatenate
    %timeit np.concatenate([np.arange(1,n+1) for n in x])
    1 loops, best of 3: 5.27 s per loop

First code:

    # Using hstack
    %timeit a=np.arange(1,x.max()+1);np.hstack([a[:k] for k in x])
    1 loops, best of 3: 3.03 s per loop

    # Using concatenate
    %timeit a=np.arange(1,x.max()+1);np.concatenate([a[:k] for k in x])
    10 loops, best of 3: 998 ms per loop

Second code:

    %timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
    10 loops, best of 3: 522 ms per loop

Looks like we get roughly a 14x speedup with the final code (7.46 s vs. 522 ms).

A small sanity check:

    >>> ua, uind = np.unique(x, return_inverse=True)
    >>> a = [np.arange(1, k+1) for k in ua]
    >>> out = np.concatenate(np.take(a, uind))
    >>> out.shape
    (9498409,)
    >>> np.sum(x)
    9498409
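Comparing out.shape with np.sum(x) only confirms the total length. A stronger check compares the actual values against the question's loop on a smaller random input. The helper names below (reference, cumsum_version, slicing_version) are hypothetical wrappers around the code from this thread, redefined so the snippet runs on its own:

```python
import numpy as np

def reference(x):
    # Straightforward version from the question, used as ground truth.
    return np.concatenate([np.arange(1, n + 1) for n in x])

def cumsum_version(x):
    # Cumsum approach from the first answer (zeros filtered out first).
    x = x[x != 0]
    ends = np.cumsum(x, dtype=np.intp)
    a = np.ones(ends[-1], dtype=np.intp)
    a[ends[:-1]] -= x[:-1]
    return np.cumsum(a, out=a)

def slicing_version(x):
    # "First code" from the second answer: slice one precomputed range.
    a = np.arange(1, x.max() + 1)
    return np.concatenate([a[:k] for k in x])

rng = np.random.default_rng(0)
x = rng.integers(0, 20, 10_000)

expected = reference(x)
for f in (cumsum_version, slicing_version):
    # array_equal compares values, so differing integer dtypes are fine.
    assert np.array_equal(f(x), expected), f.__name__
print("all methods agree on", expected.size, "elements")
```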

Source: https://habr.com/ru/post/1495557/

