Difference between each pair of columns of two numpy arrays (how to do this more efficiently)?

Question

Difference between each pair of columns of two numpy arrays (how to do this more efficiently)?

I am trying to optimize some code, and by profiling I noticed that this particular loop takes a lot of time. Can you help me write this faster?

import numpy as np rows_a, rows_v, cols = (10, 15, 3) a = np.arange(rows_a*cols).reshape(rows_a,cols) v = np.arange(rows_v*cols).reshape(rows_v,cols) c = 0 for i in range(len(v)): D = ((av[i])**2).sum(axis=-1) c += D.min() print(c)

Is there a numpy function that can do this efficiently?

+5

performance python arrays numpy

marcotama Sep 22 '14 at 0:45

source share

1 answer

unutbu · Accepted Answer · 2014-09-22T01:37:06+0000

 import numpy as np rows_a, rows_v, cols = (10, 15, 3) a = np.arange(rows_a*cols).reshape(rows_a,cols) v = np.arange(rows_v*cols).reshape(rows_v,cols) def using_loop(): c = 0 for i in range(len(v)): D = ((av[i])**2).sum(axis=-1) c += D.min() return c def using_broadcasting(): return ((a[:,np.newaxis,:]-v)**2).sum(axis=-1).min(axis=0).sum()

 In [106]: %timeit using_loop() 1000 loops, best of 3: 233 µs per loop In [107]: %timeit using_broadcasting() 10000 loops, best of 3: 29.1 µs per loop In [108]: assert using_loop() == using_broadcasting()

When using NumPy, it usually helps to eliminate for-loops (if possible) and express calculations using operations performed on all arrays, or at least on arrays as much as possible. By doing so, you load most of the work into fast algorithms written in C or Fortran without intermediate Python code.

In the source code, D has the form (10,) for each iteration of the loop. Since there are 15 iterations of the loop, if we could express all the values for D from all 15 iterations at once as one large array, then D would have the form (10, 15) . In fact, we can do this:

Since a has the form (10,3) , a[:, np.newaxis, :] has the form (10,1,3) .

Using NumPy broadcast , since v has the form (15,3) ,

 a[:,np.newaxis,:]-v

has the form (10,15,3) . Squaring and then summing on the last axis gives an array of shape (10, 15) . This is the new D :

 In [109]: ((a[:,np.newaxis,:]-v)**2).sum(axis=-1).shape Out[109]: (10, 15)

Once you have D , the rest of the calculation is natural.

Difference between each pair of columns of two numpy arrays (how to do this more efficiently)?

More articles: