A fast way to compute this numpy query

Question

I have a boolean numpy array mask of length n. I also have a numpy array a of length <= n, containing numbers from 0 (inclusive) to n-1 (inclusive), with no duplicates. The query I want to calculate is np.array([x for x in a if mask[x]]), but I don't think this is the fastest way to do this.

Is there a faster way to do this in numpy than the one I just wrote?

optimization numpy

michaelsnowden Mar 23 '17 at 18:24

1 answer

michaelsnowden · Accepted Answer · 2017-03-23T18:38:57+0000

It seems the fastest way to do this is simply a[mask[a]]. I wrote a quick test that shows the difference in speed between the two methods as a function of the coverage of the mask, p (the number of true elements / n).

import timeit
import matplotlib.pyplot as plt
import numpy as np
n = 10000  # length of the mask
slow_times = []
fast_times = []
p_space = np.linspace(0, 1, 100)  # mask coverages to test
for p in p_space:
    mask = np.random.choice([True, False], n, p=[p, 1 - p])
    a = np.arange(n)
    np.random.shuffle(a)
    y = np.array([x for x in a if mask[x]])
    z = a[mask[a]]
    assert np.array_equal(y, z)  # both methods must give the same result
    n_test = 100
    t1 = timeit.timeit(lambda: np.array([x for x in a if mask[x]]), number=n_test)
    t2 = timeit.timeit(lambda: a[mask[a]], number=n_test)
    slow_times.append(t1)
    fast_times.append(t2)
plt.plot(p_space, slow_times, label='slow')
plt.plot(p_space, fast_times, label='fast')
plt.xlabel('p (fraction of true items in mask)')
plt.ylabel(f'time for {n_test} runs (s)')  # timeit.timeit returns total seconds
plt.legend()
plt.title('Speed of method vs. coverage of mask')
plt.show()

This gave me a plot of the timings of both methods against p.

Thus, a[mask[a]] is much faster than the list comprehension, regardless of the coverage of the mask.
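To see why a[mask[a]] returns the same elements as the list comprehension, here is a minimal sketch on small made-up arrays (the values of mask and a below are hypothetical, chosen only for illustration). It splits the expression into its two indexing steps: mask[a] looks up the mask value for each element of a, and indexing a with that boolean array keeps only the elements whose mask value is True.

```python
import numpy as np

# Hypothetical small inputs, chosen only for illustration.
mask = np.array([True, False, True, True, False])  # boolean mask of length n = 5
a = np.array([3, 0, 4, 1])                         # unique values in [0, n-1]

# Step 1: integer-array indexing looks up the mask value for each element of a.
selected = mask[a]

# Step 2: boolean indexing keeps the elements of a whose mask value is True.
fast = a[selected]  # equivalent to a[mask[a]]

# Reference result from the list comprehension in the question.
slow = np.array([x for x in a if mask[x]])

print(selected)
print(fast)
print(np.array_equal(fast, slow))
```

Both indexing steps run inside numpy's compiled loops rather than iterating element by element in Python, which is the likely reason for the speed difference in the test above.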
