Efficient way to create an array that is a sequence of variable length ranges in numpy

Question

Suppose I have an array

 import numpy as np
 x = np.array([5, 7, 2])

I want to create an array containing a sequence of ranges laid end to end, with the length of each range given by x:

 y=np.hstack([np.arange(1,n+1) for n in x])

Is there a way to do this without the speed penalty of a list comprehension or a loop? (x can be a very large array.)

The result should be

 y == np.array([1,2,3,4,5,1,2,3,4,5,6,7,1,2])

python numpy

dlm Aug 6 '13 at 15:02

2 answers

seberg · Answer 1 · 2013-08-06T18:18:49+0000

You can use a cumulative sum (accumulation):

 def my_sequences(x):
     x = x[x != 0]  # you can skip this if you do not have 0s in x.

     # Create result array, filled with ones:
     y = np.cumsum(x, dtype=np.intp)
     a = np.ones(y[-1], dtype=np.intp)

     # Set all beginnings to - previous length:
     a[y[:-1]] -= x[:-1]

     # and just add it all up (btw. np.add.accumulate is equivalent):
     return np.cumsum(a, out=a)  # here, in-place should be safe.

(One word of caution: if the result array would be larger than the maximum possible size, np.iinfo(np.intp).max, this might, with some bad luck, return incorrect results instead of failing cleanly...)
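To see why the cumulative-sum trick works, it can help to trace the intermediate arrays on the small input from the question. The following is only an illustrative sketch; the helper name `trace_sequences` and the dictionary it returns are assumptions made for this example and are not part of the answer's code.

```python
import numpy as np

def trace_sequences(x):
    """Return the intermediate arrays of the cumulative-sum approach (hypothetical helper)."""
    x = np.asarray(x)
    x = x[x != 0]                        # zero-length ranges contribute nothing
    ends = np.cumsum(x, dtype=np.intp)   # index one past the end of each range
    steps = np.ones(ends[-1], dtype=np.intp)
    # At the start of every range except the first, step back by the previous
    # range's length so that the running sum restarts at 1.
    steps[ends[:-1]] -= x[:-1]
    return {"ends": ends, "steps": steps, "result": np.cumsum(steps)}

t = trace_sequences([5, 7, 2])
print(t["ends"].tolist())    # [5, 12, 14]
print(t["steps"].tolist())   # [1, 1, 1, 1, 1, -4, 1, 1, 1, 1, 1, 1, -6, 1]
print(t["result"].tolist())  # [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2]
```

Each negative entry in `steps` cancels the count accumulated by the preceding range, which is why a single cumulative sum over the whole array produces every range in one pass.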

And because everyone always wants timings (compared to the np.unique-based method from the other answer):

 In [11]: x = np.random.randint(0, 20, 1000000)

 In [12]: %timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
 1 loops, best of 3: 753 ms per loop

 In [13]: %timeit my_sequences(x)
 1 loops, best of 3: 191 ms per loop

Of course, the my_sequences function will not perform badly when the values in x get large.

Daniel · Answer 2 · 2013-08-06T15:43:06+0000

First idea: avoid the multiple np.arange calls, and note that concatenate should be much faster than hstack:

 >>> import numpy as np
 >>> x = np.array([5, 7, 2])
 >>> a = np.arange(1, x.max() + 1)
 >>> np.hstack([a[:k] for k in x])
 array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])
 >>> np.concatenate([a[:k] for k in x])
 array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])

If there are many non-unique (repeated) values, this seems more efficient:

 >>> ua, uind = np.unique(x, return_inverse=True)
 >>> a = [np.arange(1, k + 1) for k in ua]
 >>> np.concatenate(np.take(a, uind))
 array([1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2])
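On recent NumPy releases, passing a list of arrays with different lengths to np.take may raise an error, because the list is first converted to an array and ragged input is no longer accepted implicitly. A minimal sketch of the same idea, storing the prebuilt ranges in an explicit object array, could look like the following. The function name `sequences_via_unique` is an assumption for illustration, and this is a variation on the snippet above, not the code that was timed in this answer.

```python
import numpy as np

def sequences_via_unique(x):
    """Build each distinct range once, then stitch them together (illustrative sketch)."""
    x = np.asarray(x).ravel()
    ua, uind = np.unique(x, return_inverse=True)

    # An explicit object array avoids the implicit ragged-list conversion.
    ranges = np.empty(len(ua), dtype=object)
    for i, k in enumerate(ua):
        ranges[i] = np.arange(1, k + 1)

    # Fancy indexing picks the prebuilt range for every entry of x.
    # Note: an empty x is not handled here (np.concatenate needs at least one array).
    return np.concatenate(list(ranges[uind]))

print(sequences_via_unique([5, 7, 2]).tolist())
# [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 6, 7, 1, 2]
```

The Python-level loop runs only over the unique values, so its cost stays small when x contains many repeats.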

Some timings for your case:

 x=np.random.randint(0,20,1000000)

Original code:

 #Using hstack
 %timeit np.hstack([np.arange(1,n+1) for n in x])
 1 loops, best of 3: 7.46 s per loop

 #Using concatenate
 %timeit np.concatenate([np.arange(1,n+1) for n in x])
 1 loops, best of 3: 5.27 s per loop

First code:

 #Using hstack
 %timeit a=np.arange(1,x.max()+1);np.hstack([a[:k] for k in x])
 1 loops, best of 3: 3.03 s per loop

 #Using concatenate
 %timeit a=np.arange(1,x.max()+1);np.concatenate([a[:k] for k in x])
 10 loops, best of 3: 998 ms per loop

Second code:

 %timeit ua,uind=np.unique(x,return_inverse=True);a=[np.arange(1,k+1) for k in ua];np.concatenate(np.take(a,uind))
 10 loops, best of 3: 522 ms per loop

Looks like we get roughly a 14x speedup over the original hstack version with the final code (7.46 s vs. 522 ms).

A small sanity check:

 >>> ua, uind = np.unique(x, return_inverse=True)
 >>> a = [np.arange(1, k + 1) for k in ua]
 >>> out = np.concatenate(np.take(a, uind))
 >>> out.shape
 (9498409,)
 >>> np.sum(x)
 9498409
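The shape check only confirms the total length. A slightly stronger check, sketched here under the assumption that the input is small enough for the plain list-comprehension version to serve as ground truth, compares the full output element by element. The helper names `reference` and `via_slices` are hypothetical and were chosen for this example.

```python
import numpy as np

def reference(x):
    # Straightforward version from the question, used as ground truth.
    return np.concatenate([np.arange(1, n + 1) for n in x])

def via_slices(x):
    # "First code" from this answer: slice one shared arange.
    a = np.arange(1, x.max() + 1)
    return np.concatenate([a[:k] for k in x])

rng = np.random.default_rng(0)        # fixed seed so the check is repeatable
x = rng.integers(0, 20, size=1000)    # same value range as the timings, smaller size

out = via_slices(x)
assert out.shape == (x.sum(),)            # same length check as above
assert np.array_equal(out, reference(x))  # and the same contents
```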
