Efficiently get the minimum value from a third array for each pair of elements from two arrays

I have two floating-point arrays of length N (which act as (x, y) coordinates and can contain duplicates) and an associated floating-point array z of length N (which acts as a weight for each coordinate pair).

For each unique floating-point (x, y) pair I need to select the one with the lowest associated z value. I defined a selectMinz() function that does this (see the code below), but it takes too long.

How can I improve the performance of this function?

    import numpy as np
    import time


    def getData():
        N = 100000
        x = np.arange(0.0005, 0.03, 0.001)
        y = np.arange(6., 10., .05)
        # Select N values for x,y, where values can be repeated
        x = np.random.choice(x, N)
        y = np.random.choice(y, N)
        z = np.random.uniform(10., 15., N)
        return x, y, z


    def selectMinz(x, y, z):
        """
        Select the minimum z for each (x,y) pair.
        """
        xy_unq, z_unq = [], []
        # For each (x,y) pair
        for i, xy in enumerate(zip(*[x, y])):
            # If this xy pair was already stored in the xy_unq list
            if xy in xy_unq:
                # If the stored z value associated with this xy pair is
                # larger than this new z[i] value
                if z_unq[xy_unq.index(xy)] > z[i]:
                    # Store this smaller value instead
                    z_unq[xy_unq.index(xy)] = z[i]
            else:
                # Store the xy pair, and its associated z value
                xy_unq.append(xy)
                z_unq.append(z[i])

        return xy_unq, z_unq


    # Define data with the proper format.
    x, y, z = getData()

    s = time.clock()
    xy_unq, z_unq = selectMinz(x, y, z)  # <-- TAKES TOO LONG (~15s in my system)
    print(time.clock() - s)
3 answers

Steps:

  • Use a lexicographic sort to bring identical xy pairs next to each other. Alternatively, we can use the scaling method: scale one of the arrays by the value range of the other, sum it with the other array, and finally use argsort to get the lexsort-equivalent indices.
  • Use np.minimum.reduceat to get the minimum values within the intervals defined by those pair groupings (a short sketch of both building blocks follows this list).
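
As a minimal illustration of these two building blocks (this toy snippet and its values are mine, not part of the original answer):

    import numpy as np

    # Toy data: three distinct (x, y) pairs, some of them repeated.
    x = np.array([1., 0., 1., 0., 1.])
    y = np.array([2., 3., 2., 3., 5.])
    z = np.array([9., 4., 7., 6., 1.])

    # np.lexsort uses its last key as the primary one, so this sorts by y,
    # then x -- enough to bring identical (x, y) pairs next to each other.
    sidx = np.lexsort([x, y])                     # array([0, 2, 1, 3, 4])

    # Positions in the sorted order where a new (x, y) group starts.
    cut_idx = np.array([0, 2, 4])

    # Minimum of z over each interval [cut_idx[i], cut_idx[i+1]).
    print(np.minimum.reduceat(z[sidx], cut_idx))  # [7. 4. 1.]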

Thus, we would have a vectorized solution, like so:

    def selectMinz_vectorized(x, y, z):
        # Get grouped lex-sort indices
        sidx = (y + x * (y.max() - y.min() + 1)).argsort()
        # or sidx = np.lexsort([x, y])

        # Lex-sort x, y, z
        x_sorted = x[sidx]
        y_sorted = y[sidx]
        z_sorted = z[sidx]

        # Get equality mask between each sorted X and Y elem against previous ones.
        # The non-zero indices of its inverted mask give us the indices where the
        # new groupings start. We are calling those cut_idx.
        seq_eq_mask = (x_sorted[1:] == x_sorted[:-1]) & (y_sorted[1:] == y_sorted[:-1])
        cut_idx = np.flatnonzero(np.concatenate(([True], ~seq_eq_mask)))

        # Use those cut_idx to get intervalled minimum values
        minZ = np.minimum.reduceat(z_sorted, cut_idx)

        # Make tuples of the groupings of x,y and the corresponding min Z values
        return (zip(x_sorted[cut_idx], y_sorted[cut_idx]), minZ.tolist())

Sample run -

    In [120]: np.c_[x,y,z]
    Out[120]:
    array([[  0.,   1.,  69.],
           [  2.,   0.,  47.],
           [  1.,   0.,  62.],
           [  0.,   2.,  33.],
           [  1.,   7.,  32.],
           [  1.,   0.,  50.],
           [  2.,   0.,  55.]])

    In [121]: selectMinz(x,y,z)  # original method
    Out[121]:
    ([(0.0, 1.0), (2.0, 0.0), (1.0, 0.0), (0.0, 2.0), (1.0, 7.0)],
     [69.0, 47.0, 50.0, 33.0, 32.0])

    In [122]: selectMinz_vectorized(x,y,z)
    Out[122]:
    ([(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0), (1.0, 7.0)],
     [50.0, 47.0, 69.0, 33.0, 32.0])
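
Both methods return the same (x, y) pairs and minima, just in a different order. A quick way to confirm that (my own check, not part of the original answer) is to compare the outputs after putting them into a canonical order:

    xy1, z1 = selectMinz(x, y, z)             # original loop-based method
    xy2, z2 = selectMinz_vectorized(x, y, z)
    # zip() is a lazy iterator on Python 3, so materialize it before sorting.
    assert sorted(zip(xy1, z1)) == sorted(zip(list(xy2), z2))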

My initial approach involved stacking the columns into one 2D array and then performing those operations on it. The implementation looked something like this:

    def selectMinz_vectorized_v2(x, y, z):
        d = np.column_stack((x, y, z))
        sidx = np.lexsort(d[:, :2].T)
        b = d[sidx]
        cut_idx = np.r_[0, np.flatnonzero(~(b[1:, :2] == b[:-1, :2]).all(1)) + 1]
        minZ = np.minimum.reduceat(b[:, -1], cut_idx)
        return ([tuple(i) for i in b[cut_idx, :2]], minZ.tolist())

Benchmarking the vectorized approaches

Approaches -

    # Pruned version of the approach posted earlier
    def selectMinz_vectorized_pruned(x, y, z):
        sidx = (y + x * (y.max() - y.min() + 1)).argsort()
        x_sorted = x[sidx]
        y_sorted = y[sidx]
        z_sorted = z[sidx]
        seq_eq_mask = (x_sorted[1:] == x_sorted[:-1]) & (y_sorted[1:] == y_sorted[:-1])
        cut_idx = np.flatnonzero(np.concatenate(([True], ~seq_eq_mask)))
        minZ = np.minimum.reduceat(z_sorted, cut_idx)
        return x_sorted[cut_idx], y_sorted[cut_idx], minZ


    def numpy_indexed_app(x, y, z):  # @Eelco Hoogendoorn's soln
        return npi.group_by((x, y)).min(z)
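
Note that these benchmark functions assume numpy_indexed has been imported as npi and that getData accepts the sample size N (the question's version hard-codes N = 100000). A sketch of the setup assumed here:

    import numpy as np
    import numpy_indexed as npi  # needed by numpy_indexed_app


    def getData(N):
        # Same data generation as in the question, but with N as a parameter
        # so the benchmarks can vary the problem size.
        x = np.random.choice(np.arange(0.0005, 0.03, 0.001), N)
        y = np.random.choice(np.arange(6., 10., .05), N)
        z = np.random.uniform(10., 15., N)
        return x, y, z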

Timings -

    In [141]: x, y, z = getData(10000)

    In [142]: %timeit selectMinz_vectorized_pruned(x, y, z)
         ...: %timeit numpy_indexed_app(x, y, z)
    1000 loops, best of 3: 763 µs per loop
    1000 loops, best of 3: 1.09 ms per loop

    In [143]: x, y, z = getData(100000)

    In [144]: %timeit selectMinz_vectorized_pruned(x, y, z)
         ...: %timeit numpy_indexed_app(x, y, z)
    100 loops, best of 3: 8.53 ms per loop
    100 loops, best of 3: 12.9 ms per loop

Changing the data structures of xy_unq and z_unq to a single dictionary that stores both pieces of information reduced the time from ~7 s to ~0.1 s on my system.

    def selectMinz(x, y, z):
        """
        Select the minimum z for each (x,y) pair.
        """
        xy_unq = {}
        # For each (x,y) pair
        for i, xy in enumerate(zip(*[x, y])):
            # If this xy pair was already stored in the xy_unq dictionary
            if xy in xy_unq:
                # If the stored z value associated with this xy pair is
                # larger than this new z[i] value
                if xy_unq[xy] > z[i]:
                    # Store this smaller value instead
                    xy_unq[xy] = z[i]
            else:
                # Store the xy pair, and its associated z value
                xy_unq[xy] = z[i]

        return xy_unq.keys(), xy_unq.values()

The time for the above approach ranged from ~0.106 s to ~0.11 s for me. Here is an alternative with fewer lines of code, but it takes a little longer (~0.14 s):

    def selectMinz(x, y, z):
        """
        Select the minimum z for each (x,y) pair.
        """
        xy_unq = {}
        # For each (x,y) pair
        for i, xy in enumerate(zip(*[x, y])):
            # If this xy pair was already stored in the xy_unq dictionary
            if xy in xy_unq:
                # Keep the smaller of the stored value and the new z[i]
                xy_unq[xy] = min(xy_unq[xy], z[i])
            else:
                # Store the xy pair, and its associated z value
                xy_unq[xy] = z[i]

        return xy_unq.keys(), xy_unq.values()
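
On Python 3, dict.keys() and dict.values() return views rather than lists, so the caller may want to wrap them in list(). The lookup-and-update can also be collapsed into one line with dict.get; a sketch of that variant (mine, and not timed against the versions above):

    def selectMinz_get(x, y, z):
        """Select the minimum z for each (x, y) pair using dict.get."""
        xy_unq = {}
        for xy, zi in zip(zip(x, y), z):
            # Fall back to +inf for unseen pairs, so min() always keeps the
            # smallest z observed so far for this (x, y) pair.
            xy_unq[xy] = min(xy_unq.get(xy, float('inf')), zi)
        return list(xy_unq.keys()), list(xy_unq.values())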

The numpy_indexed package (disclaimer: I am the author) contains the functionality to solve these grouping problems in an elegant and efficient way:

    import numpy_indexed as npi
    xy_unq, z_unq = npi.group_by((x, y)).min(z)
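
For completeness, a minimal end-to-end usage sketch with the question's data (the call signature is taken from the line above; the surrounding code is mine):

    import numpy_indexed as npi

    x, y, z = getData()                         # data generator from the question
    xy_unq, z_unq = npi.group_by((x, y)).min(z)
    print(len(z_unq), 'unique (x, y) pairs')    # one minimum z per unique pair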

Source: https://habr.com/ru/post/1271240/

