Here's a vectorized approach using broadcasted summations (see `vectorized_approach` in the function definitions below) -
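To make the idea concrete, here is a minimal sketch of a broadcasted summation on a small array. The array `a` and its shape are made up for illustration and are not part of the original problem.

```python
import numpy as np

# Hypothetical 2x3 array standing in for a small lookup table.
a = np.arange(6).reshape(2, 3)

# out[i, j, k] = a[i, j] + a[i, k], built without Python loops:
# a[:, :, None] has shape (2, 3, 1), a[:, None, :] has shape (2, 1, 3),
# and broadcasting stretches both to (2, 3, 3).
out = a[:, :, None] + a[:, None, :]

# Loop-based reference for comparison.
ref = np.empty((2, 3, 3), dtype=a.dtype)
for i in range(2):
    for j in range(3):
        for k in range(3):
            ref[i, j, k] = a[i, j] + a[i, k]

print(out.shape)
print(np.array_equal(out, ref))
```

Each `None` inserts a length-1 axis, so every term contributes along its own output axis while sharing the leading index `i`.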
Runtime test
Function definitions -

    import numpy as np

    def vectorized_approach(nn,nnn):
        # Sort the dict entries by flattened (i,j) index and reshape to 4x4 arrays.
        # list(...) is needed on Python 3, where keys()/values() return views.
        sidx0 = np.ravel_multi_index(np.array(list(nn.keys())).T,(4,4)).argsort()
        a0 = np.array(list(nn.values()))[sidx0].reshape(4,4)

        sidx1 = np.ravel_multi_index(np.array(list(nnn.keys())).T,(4,4)).argsort()
        a1 = np.array(list(nnn.values()))[sidx1].reshape(4,4)

        # Each term gets its own output axis; axis 0 (index i) is shared.
        parte0 = a0[:,:,None,None,None] + a0[:,None,:,None,None] + \
                 a0[:,None,None,:,None] + a0[:,None,None,None,:]

        parte1 = a1[:,:,None,None,None] + a1[:,None,:,None,None] + \
                 a1[:,None,None,:,None] + a1[:,None,None,None,:]

        # Combine into the final 9D array of shape (4,)*9.
        return parte0[...,None,None,None,None] + parte1[:,None,None,None,None]

    def original_approach(nn,nnn):
        params = np.zeros([4, 4, 4, 4, 4, 4, 4, 4, 4])
        for (i,j,k,l,m,jj,kk,ll,mm), val in np.ndenumerate(params):
            params[i,j,k,l,m,jj,kk,ll,mm] = nn[(i,j)] + nn[(i,k)] + nn[(i,l)] + \
                                            nn[(i,m)] + nnn[(i,jj)] + \
                                            nnn[(i,kk)] + nnn[(i,ll)] + nnn[(i,mm)]
        return params
Setup Inputs -
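One way to build inputs of the expected form is sketched below. It assumes each dictionary maps every `(i, j)` pair with `i, j` in `range(4)` to a float; the random values and the seed are assumptions for illustration only.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for reproducibility

# Assumed layout: keys are (i, j) index pairs on a 4x4 grid, values are floats.
keys = list(itertools.product(range(4), repeat=2))
nn = {key: rng.random() for key in keys}
nnn = {key: rng.random() for key in keys}
```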
Timings -

    In [98]: np.allclose(original_approach(nn,nnn),vectorized_approach(nn,nnn))
    Out[98]: True

    In [99]: %timeit original_approach(nn,nnn)
    1 loops, best of 3: 884 ms per loop

    In [100]: %timeit vectorized_approach(nn,nnn)
    1000 loops, best of 3: 708 µs per loop
Welcome to 1000x+ speedup!
For a generic number of such outer summations, here is a generic solution that iterates over the number of terms -

    m,n = a0.shape # size of output array along each axis
    N = 4 # Order of system
    out = a0.copy()
    for i in range(1,N):
        out = out[...,None] + a0.reshape((m,)+(1,)*i+(n,))
    for i in range(N):
        # i+N singleton axes: skip the N axes taken by a0 plus the i already taken by a1
        # (i+n only works when n == N, as it does here)
        out = out[...,None] + a1.reshape((m,)+(1,)*(i+N)+(n,))
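As a quick sanity check, the loop can be wrapped in a function and compared against an explicit construction for a small order. The function name `generic_outer_sum` and the random test arrays are hypothetical, and the sketch writes the number of singleton axes in the second loop as `i + N` so that it also holds when `n != N`.

```python
import numpy as np

def generic_outer_sum(a0, a1, N):
    """Hypothetical wrapper: sum N broadcasted copies of a0, then N of a1.

    The result has shape (m,) + (n,) * (2 * N).
    """
    m, n = a0.shape
    out = a0.copy()
    # Remaining N-1 copies of a0, each on a new trailing axis.
    for i in range(1, N):
        out = out[..., None] + a0.reshape((m,) + (1,) * i + (n,))
    # N copies of a1, placed after the N axes used by a0.
    for i in range(N):
        out = out[..., None] + a1.reshape((m,) + (1,) * (i + N) + (n,))
    return out

rng = np.random.default_rng(0)
a0 = rng.random((3, 5))  # m=3, n=5, so n differs from N below
a1 = rng.random((3, 5))
res = generic_outer_sum(a0, a1, N=2)

# Explicit construction for N=2, with output indices (i, j, k, jj, kk).
expected = (a0[:, :, None, None, None] + a0[:, None, :, None, None]
            + a1[:, None, None, :, None] + a1[:, None, None, None, :])

print(res.shape)
print(np.allclose(res, expected))
```

Using a test shape with `n != N` is deliberate: it exercises the axis count in the second loop, which a 4x4 input with `N = 4` would not.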