Efficient numpy argsort with a condition while preserving the original indices

Question

I am wondering what the most efficient way is to argsort an array subject to a condition while keeping the original indices.

    import numpy as np
    x = np.array([0.63, 0.5, 0.7, 0.65])
    np.argsort(x)  # corrected argsort(x) output
    Out[99]: array([1, 0, 3, 2])

I want to argsort this array subject to the condition x > 0.6. Since 0.5 < 0.6, index 1 should not be included.

    x = np.array([0.63, 0.5, 0.7, 0.65])
    index = x.argsort()
    list(filter(lambda i: x[i] > 0.6, index))
    [0, 3, 2]

This is inefficient because it is not vectorized.

EDIT: the filter will remove most of the elements, so ideally the solution should filter first and then sort, while preserving the original indices.

+5

python numpy

parasu Jan 25 '18 at 0:59

4 answers

A bit late to the party. The idea is that we can reorder one array using the argsort indices of another array.

    y = np.arange(x.shape[0])  # y preserves the original indices
    mask = x > thresh
    y = y[mask]
    x = x[mask]
    ans = y[np.argsort(x)]  # reorder y by the sorted order of x

The method adds an array y whose only purpose is to record the indices of x. We then filter both arrays with the boolean mask x > thresh, argsort the filtered x, and finally use the indices returned by argsort to reorder y.
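As a quick check, the steps above can be wrapped in a function and run on the array from the question. This is only a sketch: the function name filtered_argsort and its local variable names are made up for illustration and do not appear in the answer.

```python
import numpy as np

def filtered_argsort(x, thresh):
    """Return the original indices of elements > thresh, ordered by value.

    Hypothetical helper name; the logic follows the steps described above.
    """
    idx = np.arange(x.shape[0])   # original positions 0..n-1
    mask = x > thresh             # boolean mask of elements to keep
    kept_idx = idx[mask]          # original positions of the kept elements
    kept_val = x[mask]            # values of the kept elements
    return kept_idx[np.argsort(kept_val)]

x = np.array([0.63, 0.5, 0.7, 0.65])
result = filtered_argsort(x, 0.6)
print(result)  # [0 3 2]
```

Unlike the snippet above, this version does not rebind x, so the caller's array stays available afterwards.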

+6

Tai Jan 25 '18 at 2:58

Method 1 (answer @jp_data_analysis)

You should use this unless you have a reason not to.

    def meth1(x, thresh):
        # sort everything, then skip the elements that fail the condition
        return np.argsort(x)[(x <= thresh).sum():]

Method 2

If the filter significantly reduces the number of elements in the array and the array is large, then the following may help:

    def meth2(x, thresh):
        m = x > thresh
        idxs = np.argsort(x[m])    # positions within the filtered array
        offsets = (~m).cumsum()    # count of rejected elements so far
        return idxs + offsets[m][idxs]
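The offset arithmetic in meth2 may be easier to follow on a small input. The trace below is a sketch that reuses the array from the question; the intermediate names skipped and original are introduced here for illustration only. The idea is that a kept element's original index equals its position in the filtered array plus the number of rejected elements that precede it.

```python
import numpy as np

x = np.array([0.63, 0.5, 0.7, 0.65])
thresh = 0.6

m = x > thresh                    # [True, False, True, True]
idxs = np.argsort(x[m])           # argsort of [0.63, 0.7, 0.65] -> [0, 2, 1]
offsets = (~m).cumsum()           # rejected elements seen so far -> [0, 1, 1, 1]
skipped = offsets[m]              # rejected elements before each kept one -> [0, 1, 1]
original = idxs + skipped[idxs]   # [0 + 0, 2 + 1, 1 + 1] -> [0, 3, 2]

print(original)  # [0 3 2]
```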

Speed comparison

    x = np.random.rand(10000000)
    %timeit meth1(x, 0.99)
    # 2.81 s ± 244 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    %timeit meth2(x, 0.99)
    # 104 ms ± 1.22 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

+3

Alex Jan 25 '18 at 1:25

Here is another, hacky approach: overwrite the excluded elements of the original array in place with an arbitrary large number that is unlikely to occur in the data.

    In [50]: x = np.array([0.63, 0.5, 0.7, 0.65])
    In [51]: invmask = ~(x > 0.6)
    # replace it with some huge number which will not occur in your original array
    In [52]: x[invmask] = 9999.0
    In [53]: np.argsort(x)[:-sum(invmask)]
    Out[53]: array([0, 3, 2])
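Two things are worth watching with this snippet: it overwrites x, and when no element is excluded the slice becomes [:-0], which returns an empty array. A possible variant that avoids both is sketched below. It is not the answerer's code: the function name argsort_above is hypothetical, np.inf is swapped in for the 9999.0 sentinel, and the sketch assumes that x holds finite values. It still sorts the full array, so it is not expected to be faster.

```python
import numpy as np

def argsort_above(x, thresh):
    """Sentinel-based filtered argsort that leaves x untouched.

    Hypothetical helper; assumes x contains only finite values.
    """
    mask = x > thresh
    # np.where builds a new array, so the caller's x is not modified
    padded = np.where(mask, x, np.inf)
    # excluded elements sort to the end; keep only the first mask.sum() indices
    return np.argsort(padded)[:mask.sum()]

x = np.array([0.63, 0.5, 0.7, 0.65])
kept = argsort_above(x, 0.6)
everything = argsort_above(x, 0.0)   # nothing excluded
print(kept)        # [0 3 2]
print(everything)  # [1 0 3 2]
```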

0

kmario23 Jan 25 '18 at 3:46

AGN Gazer · Accepted Answer · 2018-01-25T03:27:48+0000

Method 1 (same idea as Tai's method, but using integer indexing)

I'm late to the party; if my solution duplicates one that has already been posted, ping me and I will delete it.

    def meth_agn_v1(x, thresh):
        idx = np.arange(x.size)[x > thresh]  # original indices of kept elements
        return idx[np.argsort(x[idx])]

Then

    In [143]: meth_agn_v1(x, 0.5)
    Out[143]: array([0, 3, 2])

Method 2 (significant performance improvement)

This builds on the observation from the last section of my answer (the comparison with Tai's method): integer indexing is faster than boolean indexing when only a small number of elements is expected to be selected. Using np.where also eliminates the need to create the initial index array.

    def meth_agn_v2(x, thresh):
        idx, = np.where(x > thresh)  # integer indices of kept elements
        return idx[np.argsort(x[idx])]

Timing

    In [144]: x = np.random.rand(100000)
    In [145]: timeit meth_jp(x, 0.99)
    100 loops, best of 3: 7.43 ms per loop
    In [146]: timeit meth_alex(x, 0.99)
    1000 loops, best of 3: 498 µs per loop
    In [147]: timeit meth_tai(x, 0.99)
    1000 loops, best of 3: 298 µs per loop
    In [148]: timeit meth_agn_v1(x, 0.99)
    1000 loops, best of 3: 232 µs per loop
    In [161]: timeit meth_agn_v2(x, 0.99)
    10000 loops, best of 3: 95 µs per loop
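Timings like these only mean something if every function returns the same result, so a sanity check along the following lines may be useful. It is a sketch under assumptions: meth_jp and meth_alex are taken to be Alex's meth1 and meth2 from the other answer, and the test data is a shuffled grid of distinct values so that tie-breaking in argsort cannot make the outputs differ.

```python
import numpy as np

def meth_jp(x, thresh):       # assumed to be meth1 from Alex's answer
    return np.argsort(x)[(x <= thresh).sum():]

def meth_alex(x, thresh):     # assumed to be meth2 from Alex's answer
    m = x > thresh
    idxs = np.argsort(x[m])
    offsets = (~m).cumsum()
    return idxs + offsets[m][idxs]

def meth_tai(x, thresh):
    y = np.arange(x.shape[0])
    mask = x > thresh
    return y[mask][np.argsort(x[mask])]

def meth_agn_v2(x, thresh):
    idx, = np.where(x > thresh)
    return idx[np.argsort(x[idx])]

rng = np.random.default_rng(0)
x = rng.permutation(1000) / 1000.0   # distinct values, so no ties
thresh = 0.9

ref = meth_jp(x, thresh)
all_agree = all(
    np.array_equal(ref, f(x, thresh))
    for f in (meth_alex, meth_tai, meth_agn_v2)
)
print(all_agree)
```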

Comparison of method v1 with Tai's method

My first version of the answer is very similar to Tai's answer, but not identical.

Tai's method, as originally posted:

    def meth_tai(x, thresh):
        y = np.arange(x.shape[0])
        y = y[x > thresh]
        x = x[x > thresh]  # x = x[y] is used in my method
        return y[np.argsort(x)]

So my method differs in that it uses integer-array indexing instead of the boolean indexing used by Tai. When only a small number of elements is selected, integer indexing is faster than boolean indexing, which makes this method more efficient than Tai's, even after Tai optimized his code.
