Steps:
- Use lex-sort to put the xy pairs in sequence, so that identical pairs end up next to each other. Alternatively, use a scaling method: scale one of the arrays by the range of values of the other, add the result to that other array, and use argsort on the sum to get indices equivalent to a lex-sort.
- Use np.minimum.reduceat to get the minimum z values in the intervals defined by those paired groupings.
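Here is a minimal sketch of the two steps in isolation. It uses the same x, y, z values as the run example. The scaling variant assumes integer-valued x and y, because the scaled key is only collision-free under that assumption. It also uses a stable argsort so that tied pairs come out in the same order as with np.lexsort:

```python
import numpy as np

x = np.array([0., 2., 1., 0., 1., 1., 2.])
y = np.array([1., 0., 0., 2., 7., 0., 0.])
z = np.array([69., 47., 62., 33., 32., 50., 55.])

# Step 1a: lex-sort. np.lexsort treats the LAST key as the primary one,
# so this orders by y first and then by x.
sidx_lex = np.lexsort((x, y))

# Step 1b: scaling alternative (assumes integer-valued x and y).
# Scale y by the span of x so y dominates, add the shifted x, then argsort.
span = x.max() - x.min() + 1
key = y * span + (x - x.min())
sidx_scaled = key.argsort(kind="stable")  # stable, so ties match lexsort
print(np.array_equal(sidx_lex, sidx_scaled))  # True

# Step 2: mark where a new (x, y) group starts, then reduce each interval.
xs, ys, zs = x[sidx_lex], y[sidx_lex], z[sidx_lex]
starts = np.flatnonzero(np.r_[True, (xs[1:] != xs[:-1]) | (ys[1:] != ys[:-1])])
print(starts.tolist())                           # [0, 2, 4, 5, 6]
print(np.minimum.reduceat(zs, starts).tolist())  # [50.0, 47.0, 69.0, 33.0, 32.0]
```

Both orderings group by y first and then by x. That is the order seen in the vectorized output of the run example.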
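Putting these together gives a vectorized solution, for example:
import numpy as np

def selectMinz_vectorized(x, y, z):
    # Step 1: lex-sort the (x, y) pairs. np.lexsort uses the last key as the
    # primary one, so the order is by y first and then by x
    sidx = np.lexsort((x, y))
    x_s, y_s, z_s = x[sidx], y[sidx], z[sidx]

    # A new group starts wherever the (x, y) pair differs from the previous one
    new_grp = np.r_[True, (x_s[1:] != x_s[:-1]) | (y_s[1:] != y_s[:-1])]
    cut_idx = np.flatnonzero(new_grp)

    # Step 2: minimum z within each group's interval
    minZ = np.minimum.reduceat(z_s, cut_idx)
    return (list(zip(x_s[cut_idx].tolist(), y_s[cut_idx].tolist())),
            minZ.tolist())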
Run Example -
In [120]: np.c_[x,y,z]
Out[120]: 
array([[  0.,   1.,  69.],
       [  2.,   0.,  47.],
       [  1.,   0.,  62.],
       [  0.,   2.,  33.],
       [  1.,   7.,  32.],
       [  1.,   0.,  50.],
       [  2.,   0.,  55.]])

In [121]: selectMinz(x,y,z) # original method
Out[121]: 
([(0.0, 1.0), (2.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.0, 7.0)],
 [69.0, 47.0, 50.0, 33.0, 32.0])

In [122]: selectMinz_vectorized(x,y,z)
Out[122]: 
([(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 7.0)],
 [50.0, 47.0, 69.0, 33.0, 32.0])
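The two outputs contain the same groups and minima. Only the order differs: the original method uses first-occurrence order, while the vectorized one uses lex-sorted order. The original selectMinz is not shown here, so the sketch below uses a hypothetical loop-based reference, select_minz_loop. It compares the results as dicts, which ignores order:

```python
import numpy as np

x = np.array([0., 2., 1., 0., 1., 1., 2.])
y = np.array([1., 0., 0., 2., 7., 0., 0.])
z = np.array([69., 47., 62., 33., 32., 50., 55.])

def select_minz_loop(x, y, z):
    """Hypothetical loop reference: min z per (x, y) pair, first-seen order."""
    best = {}
    for xi, yi, zi in zip(x.tolist(), y.tolist(), z.tolist()):
        key = (xi, yi)
        if key not in best or zi < best[key]:
            best[key] = zi  # updating an existing key keeps its position
    return list(best.keys()), list(best.values())

# Vectorized result copied from Out[122] above
pairs_vec = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 7.0)]
minz_vec = [50.0, 47.0, 69.0, 33.0, 32.0]

pairs_ref, minz_ref = select_minz_loop(x, y, z)
print(pairs_ref)  # [(0.0, 1.0), (2.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.0, 7.0)]
print(minz_ref)   # [69.0, 47.0, 50.0, 33.0, 32.0]

# Same groups and minima; only the group order differs
print(dict(zip(pairs_ref, minz_ref)) == dict(zip(pairs_vec, minz_vec)))  # True
```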
Here is my initial approach, which involved creating a stacked array and then performing these operations on it. The implementation looked something like this:
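import numpy as np

def selectMinz_vectorized_v2(x, y, z):
    # Stack x, y, z as the columns of one (N, 3) array
    d = np.column_stack((x, y, z))
    # Lex-sort the rows by their (x, y) columns (y is the primary key)
    sidx = np.lexsort(d[:, :2].T)
    b = d[sidx]
    # Group starts: rows whose (x, y) differs from the previous row
    cut_idx = np.r_[0, np.flatnonzero(~(b[1:, :2] == b[:-1, :2]).all(1)) + 1]
    # Minimum z per group
    minZ = np.minimum.reduceat(b[:, -1], cut_idx)
    return ([tuple(i) for i in b[cut_idx, :2].tolist()], minZ.tolist())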
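The cut_idx line is the densest part of the v2 function. The sketch below runs it on the example data and prints each intermediate value. This lets you check the row-wise comparison and the +1 shift by hand:

```python
import numpy as np

x = np.array([0., 2., 1., 0., 1., 1., 2.])
y = np.array([1., 0., 0., 2., 7., 0., 0.])
z = np.array([69., 47., 62., 33., 32., 50., 55.])

# Stack and lex-sort the same way as in the v2 function
d = np.column_stack((x, y, z))
b = d[np.lexsort(d[:, :2].T)]
print(b[:, 2].tolist())       # [62.0, 50.0, 47.0, 55.0, 69.0, 33.0, 32.0]

# Row i+1 "matches" row i only if BOTH x and y agree, hence .all(1)
same_as_prev = (b[1:, :2] == b[:-1, :2]).all(1)
print(same_as_prev.tolist())  # [True, False, True, False, False, False]

# Non-matching rows start new groups; +1 because the comparison began at
# row 1, and row 0 always starts a group
cut_idx = np.r_[0, np.flatnonzero(~same_as_prev) + 1]
print(cut_idx.tolist())       # [0, 2, 4, 5, 6]

print(np.minimum.reduceat(b[:, -1], cut_idx).tolist())  # [50.0, 47.0, 69.0, 33.0, 32.0]
```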
Benchmarking the vectorized approaches
Approaches -
Timings -
In [141]: x,y,z=getData(10000)

In [142]: %timeit selectMinz_vectorized_pruned(x, y, z)
     ...: %timeit numpy_indexed_app(x,y,z)
     ...: 
1000 loops, best of 3: 763 µs per loop
1000 loops, best of 3: 1.09 ms per loop

In [143]: x,y,z=getData(100000)

In [144]: %timeit selectMinz_vectorized_pruned(x, y, z)
     ...: %timeit numpy_indexed_app(x,y,z)
     ...: 
100 loops, best of 3: 8.53 ms per loop
100 loops, best of 3: 12.9 ms per loop
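getData is not defined in this answer, and neither are the two benchmarked functions. As an assumption, a hypothetical getData could draw small integer-valued x and y, so that pairs repeat. The harness below uses it to time the two Step-1 sorting options (lex-sort vs. the scaling method). It uses the standard timeit module instead of IPython's %timeit and only prints per-call times, so results will vary by machine and NumPy version:

```python
import timeit
import numpy as np

def getData(n, seed=0):
    """Hypothetical stand-in for the undefined getData: n points with small
    integer-valued x, y (so pairs repeat) and random z."""
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 100, n).astype(float)
    y = rng.integers(0, 100, n).astype(float)
    z = rng.random(n) * 100
    return x, y, z

def sort_lex(x, y):
    # Step 1 via lex-sort: y is the primary key, x the secondary one
    return np.lexsort((x, y))

def sort_scaled(x, y):
    # Step 1 via the scaling method; assumes integer-valued x and y
    span = x.max() - x.min() + 1
    return (y * span + (x - x.min())).argsort(kind="stable")

for n in (10_000, 100_000):
    x, y, z = getData(n)
    # Both options must produce the same grouping order
    assert np.array_equal(sort_lex(x, y), sort_scaled(x, y))
    for fn in (sort_lex, sort_scaled):
        t = timeit.timeit(lambda: fn(x, y), number=20) / 20
        print(f"n={n:>7} {fn.__name__:<12} {t * 1e3:.3f} ms per call")
```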