import numpy as np rows_a, rows_v, cols = (10, 15, 3) a = np.arange(rows_a*cols).reshape(rows_a,cols) v = np.arange(rows_v*cols).reshape(rows_v,cols) def using_loop(): c = 0 for i in range(len(v)): D = ((av[i])**2).sum(axis=-1) c += D.min() return c def using_broadcasting(): return ((a[:,np.newaxis,:]-v)**2).sum(axis=-1).min(axis=0).sum()
In [106]: %timeit using_loop() 1000 loops, best of 3: 233 ยตs per loop In [107]: %timeit using_broadcasting() 10000 loops, best of 3: 29.1 ยตs per loop In [108]: assert using_loop() == using_broadcasting()
When using NumPy, it usually helps to eliminate for-loops (if possible) and express calculations using operations performed on all arrays, or at least on arrays as much as possible. By doing so, you load most of the work into fast algorithms written in C or Fortran without intermediate Python code.
In the source code, D has the form (10,) for each iteration of the loop. Since there are 15 iterations of the loop, if we could express all the values โโfor D from all 15 iterations at once as one large array, then D would have the form (10, 15) . In fact, we can do this:
Since a has the form (10,3) , a[:, np.newaxis, :] has the form (10,1,3) .
Using NumPy broadcast , since v has the form (15,3) ,
a[:,np.newaxis,:]-v
has the form (10,15,3) . Squaring and then summing on the last axis gives an array of shape (10, 15) . This is the new D :
In [109]: ((a[:,np.newaxis,:]-v)**2).sum(axis=-1).shape Out[109]: (10, 15)
Once you have D , the rest of the calculation is natural.