A fast way to compute this numpy query

I have a boolean numpy array mask of length n. I also have a numpy array a of length <= n, containing numbers from 0 (inclusive) to n-1 (inclusive), with no duplicates. The query I want to calculate is np.array([x for x in a if mask[x]]), but I don't think this is the fastest way to do this.

Is there a faster way to do this in numpy than the one I just wrote?

1 answer

It seems the fastest way to do this is simply a[mask[a]]. I wrote a quick test that shows the difference in speed between the two methods as a function of the mask coverage p (the number of true elements / n).

import timeit
import matplotlib.pyplot as plt
import numpy as np

n = 10000
n_test = 100  # number of calls per timing measurement
slow_times = []
fast_times = []
p_space = np.linspace(0, 1, 100)
for p in p_space:
    # Boolean mask in which each entry is True with probability p
    mask = np.random.choice([True, False], n, p=[p, 1 - p])
    # Random permutation of 0..n-1, so a has no duplicates
    a = np.arange(n)
    np.random.shuffle(a)
    # Sanity check: both methods must select the same elements in the same order
    y = np.array([x for x in a if mask[x]])
    z = a[mask[a]]
    assert np.array_equal(y, z)
    # timeit.timeit returns the total time in seconds for n_test calls
    t1 = timeit.timeit(lambda: np.array([x for x in a if mask[x]]), number=n_test)
    t2 = timeit.timeit(lambda: a[mask[a]], number=n_test)
    slow_times.append(t1)
    fast_times.append(t2)
plt.plot(p_space, slow_times, label='slow')
plt.plot(p_space, fast_times, label='fast')
plt.xlabel('p (fraction of true items in mask)')
plt.ylabel(f'total time for {n_test} calls (s)')
plt.legend()
plt.title('Speed of method vs. coverage of mask')
plt.show()

This gave me the following plot:

[Plot: "Speed of method vs. coverage of mask", showing time against p for the slow and fast methods]

Thus, a[mask[a]] is much faster than the list comprehension, regardless of the coverage of the mask.
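
To make the indexing concrete, here is a minimal sketch that breaks a[mask[a]] into its two steps and compares it with the list comprehension from the question. The small arrays are made-up values for illustration only, and the names keep, fast and slow are hypothetical.

```python
import numpy as np

# Made-up inputs for illustration (assumption: n = 5)
mask = np.array([True, False, True, True, False])  # boolean array of length n
a = np.array([3, 0, 4, 2])                         # unique values in 0..n-1

# Step 1: integer-array indexing looks up the mask value for each entry of a
keep = mask[a]

# Step 2: boolean indexing keeps the entries of a whose mask value is True
fast = a[keep]  # equivalent to a[mask[a]]

# The list comprehension from the question, for comparison
slow = np.array([x for x in a if mask[x]])

print(fast)                        # [3 0 2]
print(np.array_equal(fast, slow))  # True
```

Both steps run in compiled numpy code rather than in a Python-level loop, which is consistent with the speed difference measured above. The order of the elements in a is preserved.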

Source: https://habr.com/ru/post/1673073/

