Is it possible to vectorize this python function?

Question

Is it possible to vectorize this python function?

I am working on this function, which generates some parameters that I need for the modeling code that I am developing, and hit the wall, increasing its performance.

Profiling the code shows that this is the main bottleneck, so any improvements I can make for it, but minor, would be great.

I wanted to try to vectorize parts of this function, but I'm not sure if this is possible.

The main problem is that the parameters that are stored in my params array depend on the parameter indices. The only direct solution I've seen is to use np.ndenumerate , but that seems pretty slow.

Is it possible to vectorize this type of operation, where the values stored in the array depend on where they are stored? Or would it be smarter / faster to create a generator that would just give me tuples of array indices?

 import numpy as np from scipy.sparse import linalg as LA def get_params(num_bonds, energies): """ Returns the interaction parameters of different pairs of atoms. Parameters ---------- num_bonds : ndarray, shape = (M, 20) Sparse array containing the number of nearest neighbor bonds for different pairs of atoms (denoted by their column) and next- nearest neighbor bonds. Columns 0-9 contain nearest neighbors, 10-19 contain next-nearest neighbors energies : ndarray, shape = (M, ) Energy vector corresponding to each atomic system stored in each row of num_bonds. """ # -- Compute the bond energies x = LA.lsqr(num_bonds, energies, show=False)[0] params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4]) nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4], (1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6], (1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9], (3,2): x[9]} nnn = {(0,0): x[10], (1,1): x[11], (2,2): x[12], (3,3): x[13], (0,1): x[14], (1,0): x[14], (0,2): x[15], (2,0): x[15], (0,3): x[16], (3,0): x[16], (1,2): x[17], (2,1): x[17], (1,3): x[18], (3,1): x[18], (2,3): x[19], (3,2): x[19]} """ params contains the energy contribution of each site due to its local environment. The shape is given by the number of possible atom types and the number of sites in the lattice. """ for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params): params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \ nn[(i,m)] + nnn[(i,jj)] + \ nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)] return np.ascontiguousarray(params)

+5

performance python vectorization python-2.7 numpy

Steve Oct 12 '16 at 18:47

source share

1 answer

Divakar · Accepted Answer · 2016-10-12T21:40:55+0000

Here a vector approach is used using broadcasted sumations -

 # Gather the elements sorted by the keys in (row,col) order of a dense # 2D array for both nn and nnn sidx0 = np.ravel_multi_index(np.array(nn.keys()).T,(4,4)).argsort() a0 = np.array(nn.values())[sidx0].reshape(4,4) sidx1 = np.ravel_multi_index(np.array(nnn.keys()).T,(4,4)).argsort() a1 = np.array(nnn.values())[sidx1].reshape(4,4) # Perform the summations keep the first axis aligned for nn and nnn parts parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \ a0[:,None,None,:,None] + a0[:,None,None,None,:] parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \ a1[:,None,None,:,None] + a1[:,None,None,None,:] # Finally add up sums from nn and nnn for final output out = parte0[...,None,None,None,None] + parte1[:,None,None,None,None]

Runtime test

Function Defects -

 def vectorized_approach(nn,nnn): sidx0 = np.ravel_multi_index(np.array(nn.keys()).T,(4,4)).argsort() a0 = np.array(nn.values())[sidx0].reshape(4,4) sidx1 = np.ravel_multi_index(np.array(nnn.keys()).T,(4,4)).argsort() a1 = np.array(nnn.values())[sidx1].reshape(4,4) parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \ a0[:,None,None,:,None] + a0[:,None,None,None,:] parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \ a1[:,None,None,:,None] + a1[:,None,None,None,:] return parte0[...,None,None,None,None] + parte1[:,None,None,None,None] def original_approach(nn,nnn): params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4]) for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params): params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \ nn[(i,m)] + nnn[(i,jj)] + \ nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)] return params

Setup Inputs -

 # Setup inputs x = np.random.rand(30) nn = {(0,0): x[0], (1,1): x[1], (2,2): x[2], (3,3): x[3], (0,1): x[4], (1,0): x[4], (0,2): x[5], (2,0): x[5], (0,3): x[6], (3,0): x[6], (1,2): x[7], (2,1): x[7], (1,3): x[8], (3,1): x[8], (2,3): x[9], (3,2): x[9]} nnn = {(0,0): x[10], (1,1): x[11], (2,2): x[12], (3,3): x[13], (0,1): x[14], (1,0): x[14], (0,2): x[15], (2,0): x[15], (0,3): x[16], (3,0): x[16], (1,2): x[17], (2,1): x[17], (1,3): x[18], (3,1): x[18], (2,3): x[19], (3,2): x[19]}

Dates -

 In [98]: np.allclose(original_approach(nn,nnn),vectorized_approach(nn,nnn)) Out[98]: True In [99]: %timeit original_approach(nn,nnn) 1 loops, best of 3: 884 ms per loop In [100]: %timeit vectorized_approach(nn,nnn) 1000 loops, best of 3: 708 µs per loop

Welcome to 1000x+ speedup!

For the system of the total number of such external works, there exists a general solution that is implemented through these sizes -

 m,n = a0.shape # size of output array along each axis N = 4 # Order of system out = a0.copy() for i in range(1,N): out = out[...,None] + a0.reshape((m,)+(1,)*i+(n,)) for i in range(N): out = out[...,None] + a1.reshape((m,)+(1,)*(i+n)+(n,))

Is it possible to vectorize this python function?

More articles: