Why is numpy.random.choice so slow?

While writing a script, I came across the numpy.random.choice function and used it because it was much cleaner than the equivalent if statement. After running the script, however, I realized that it is much slower than the if statement.

The following is the MWE. The first method takes 0.0 s and the second takes 7.2 s. If you increase the number of iterations in the outer loop, you will see how quickly random.choice falls behind.

Can anyone comment on why random.choice is so much slower?

    import numpy as np
    import numpy.random as rand
    import time as tm

    #-------------------------------------------------------------------------------

    tStart = tm.time()
    for i in xrange(100):
        for j in xrange(1000):
            tmp = rand.rand()
            if tmp < 0.25:
                var = 1
            elif tmp < 0.5:
                var = -1
    print('Time: %.1f s' % (tm.time() - tStart))

    #-------------------------------------------------------------------------------

    tStart = tm.time()
    for i in xrange(100):
        for j in xrange(1000):
            var = rand.choice([-1, 0, 1], p=[0.25, 0.5, 0.25])
    print('Time: %.1f s' % (tm.time() - tStart))
4 answers

You are using it wrong. Vectorize the operation, or numpy will do you no good:

    var = numpy.random.choice([-1, 0, 1], size=1000, p=[0.25, 0.5, 0.25])

Timing data:

    >>> timeit.timeit('''numpy.random.choice([-1, 0, 1],
    ...                                      size=1000,
    ...                                      p=[0.25, 0.5, 0.25])''',
    ...               'import numpy', number=10000)
    2.380380242513752
    >>> timeit.timeit('''
    ... var = []
    ... for i in xrange(1000):
    ...     tmp = rand.rand()
    ...     if tmp < 0.25:
    ...         var.append(1)
    ...     elif tmp < 0.5:
    ...         var.append(-1)
    ...     else:
    ...         var.append(0)''',
    ... setup='import numpy.random as rand', number=10000)
    5.673041396894519
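
If the values really are consumed one at a time inside a loop, another pattern (a sketch of mine, not part of this answer) is to draw the whole batch up front and index into it, which amortizes choice's per-call overhead across all iterations:

    import numpy as np

    # Sketch: pre-draw all 100 * 1000 samples in one vectorized call,
    # then consume them one at a time inside the original loops.
    samples = np.random.choice([-1, 0, 1], size=(100, 1000), p=[0.25, 0.5, 0.25])
    for i in range(100):
        for j in range(1000):
            var = samples[i, j]  # same distribution, no per-iteration choice() call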

It took me a long time to realize that my data generator was very slow because it picked random keys with np.random.choice.

In case a non-uniform distribution is NOT necessary, here is the suitable solution I found:

replace

    def get_random_key(a_huge_key_list):
        return np.random.choice(a_huge_key_list)

with

    def get_random_key(a_huge_key_list):
        L = len(a_huge_key_list)
        i = np.random.randint(0, L)
        return a_huge_key_list[i]

which sped things up for me by a factor of about 60.
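
Untimed here, but if only a single uniform pick is needed, the standard library's random.choice is another option worth knowing; a minimal sketch:

    import random

    def get_random_key(a_huge_key_list):
        # The stdlib random.choice does one index lookup and skips the
        # array conversion and argument validation that np.random.choice
        # performs on every call.
        return random.choice(a_huge_key_list)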


I suspect that the generality of np.random.choice is what slows it down, more so for small samples than for large ones.

Rough vectorization of the if version:

    def foo(n):
        x = np.random.rand(n)
        var = np.zeros(n)
        var[x < .25] = -1
        var[x > .75] = 1
        return var

Running in ipython I get:

    timeit np.random.choice([-1, 0, 1], size=1000, p=[.25, .5, .25])
    1000 loops, best of 3: 293 us per loop

    timeit foo(1000)
    10000 loops, best of 3: 83.4 us per loop

    timeit np.random.choice([-1, 0, 1], size=100000, p=[.25, .5, .25])
    100 loops, best of 3: 11 ms per loop

    timeit foo(100000)
    100 loops, best of 3: 8.12 ms per loop

So for a size of 1000, choice is 3-4x slower, but with large vectors the difference begins to disappear.
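
For arbitrary options and probabilities, one way to generalize foo (my sketch, not part of this answer) is a cumulative sum plus a vectorized binary search:

    import numpy as np

    def weighted_draws(options, probs, n):
        # Cumulative probabilities: [0.25, 0.5, 0.25] -> [0.25, 0.75, 1.0].
        cum = np.cumsum(probs)
        # For each uniform draw, find the first bin whose cumulative
        # probability reaches it; this is foo's if-chain done in C.
        idx = np.searchsorted(cum, np.random.rand(n))
        return np.asarray(options)[idx]

    var = weighted_draws([-1, 0, 1], [0.25, 0.5, 0.25], 1000)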


This solution, using a cumulative sum, is about 25 times faster:

    import time
    from collections import Counter

    import numpy as np

    def choice(options, probs):
        x = np.random.rand()
        cum = 0
        for i, p in enumerate(probs):
            cum += p
            if x < cum:
                break
        return options[i]

    options = ['a', 'b', 'c', 'd']
    probs = [0.2, 0.6, 0.15, 0.05]
    runs = 100000

    now = time.time()
    temp = []
    for i in range(runs):
        op = choice(options, probs)
        temp.append(op)
    temp = Counter(temp)
    for op, x in temp.items():
        print(op, x / runs)
    print(time.time() - now)
    print("")

    now = time.time()
    temp = []
    for i in range(runs):
        op = np.random.choice(options, p=probs)
        temp.append(op)
    temp = Counter(temp)
    for op, x in temp.items():
        print(op, x / runs)
    print(time.time() - now)

Running it, I get:

    b 0.59891
    a 0.20121
    c 0.15007
    d 0.04981
    0.16232800483703613

    b 0.5996
    a 0.20138
    c 0.14856
    d 0.05046
    3.8451428413391113
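
A side note not in the answer above: since Python 3.6 the standard library offers random.choices, which implements the same cumulative-weight idea (via bisect) and avoids numpy's per-call overhead for this kind of small weighted pick:

    import random

    options = ['a', 'b', 'c', 'd']
    probs = [0.2, 0.6, 0.15, 0.05]

    # k sets how many values to draw; weights need not sum to 1.
    op = random.choices(options, weights=probs, k=1)[0]
    draws = random.choices(options, weights=probs, k=100000)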
