Noting that your code has four nested for loops, I focused on optimizing the inner pair. Here's the old code:
```python
for i in xrange(K.shape[0]):
    for j in xrange(K.shape[1]):
        print(i, j)
        '''create an r vector '''
        r = (i*distX, j*distY, z)

        for x in xrange(img.shape[0]):
            for y in xrange(img.shape[1]):
                '''create an ksi vector, then calculate it norm, and the dot product of r and ksi'''
                ksi = (x*distX, y*distY, z)
                ksiNorm = np.linalg.norm(ksi)
                ksiDotR = float(np.dot(ksi, r))

                '''calculate the integrand'''
                temp[x, y] = img[x, y] * np.exp(1j*k*ksiDotR/ksiNorm)

        '''interpolate so that we can do the integral and take the integral'''
        temp2 = rbs(a, b, temp.real)
        K[i, j] = temp2.integral(0, n, 0, m)
```
Since K and img are each approximately 2000x2000, the innermost statements must be executed roughly sixteen trillion times (2000² × 2000² ≈ 1.6 × 10¹³). That is simply not practical in pure Python loops, but we can push the work into C and/or Fortran by using NumPy vectorization. I took this one careful step at a time to try to make sure the results stay consistent; here is what I ended up with:
```python
'''create all r vectors'''
R = np.empty((K.shape[0], K.shape[1], 3))
R[:,:,0] = np.repeat(np.arange(K.shape[0]), K.shape[1]).reshape(K.shape) * distX
R[:,:,1] = np.arange(K.shape[1]) * distY
R[:,:,2] = z

'''create all ksi vectors'''
KSI = np.empty((img.shape[0], img.shape[1], 3))
KSI[:,:,0] = np.repeat(np.arange(img.shape[0]), img.shape[1]).reshape(img.shape) * distX
KSI[:,:,1] = np.arange(img.shape[1]) * distY
KSI[:,:,2] = z
```
The inner pair of loops is now completely gone, replaced by vectorized operations done in advance (at a memory cost that is only linear in the size of the inputs).
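The snippet above only shows the precomputation. As a minimal sketch of how the inner pair of loops then disappears (assuming the precomputed `R` and `KSI` above and the same `k`, `img`, `rbs`, `a`, `b`, `n`, `m` from your code; my final version may differ in details), the body of the remaining two loops becomes:

```python
# precompute the norm of every ksi vector once (shape: img.shape)
KSInorm = np.linalg.norm(KSI, axis=2)

for i in xrange(K.shape[0]):
    for j in xrange(K.shape[1]):
        # dot product of every ksi vector with the single r vector R[i, j]
        ksiDotR = np.tensordot(KSI, R[i, j], axes=([2], [0]))

        # integrand for the whole image at once, no inner loops
        temp = img * np.exp(1j * k * ksiDotR / KSInorm)

        # interpolate and integrate the real part, exactly as before
        temp2 = rbs(a, b, temp.real)
        K[i, j] = temp2.integral(0, n, 0, m)
```

The point is that `np.tensordot` and the element-wise arithmetic operate on the whole 2000x2000 image at once in compiled code, rather than one pixel at a time in Python.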
This reduces the per-iteration time of the outer two loops from 340 seconds to 1.3 seconds on my 1.6 GHz i5 MacBook Air, without using Numba. Of those 1.3 seconds per iteration, 0.68 seconds are spent in the `rbs` function, which is `scipy.interpolate.RectBivariateSpline`. There are probably opportunities for further optimization - here are a few ideas:
- Re-apply Numba. I don't have it on my system, so I couldn't test it. At this point it may not make much difference, but it's easy for you to check.
- Do domain-specific optimization, for example try to simplify the fundamental calculations being done. My optimizations are lossless, and I don't know your problem domain, so I can't optimize as deeply as you can.
- Try to vectorize the remaining loops. This may be tough unless you're willing to replace the scipy `rbs` function with something that supports evaluating multiple calculations per call.
- Get a faster CPU. Mine is rather slow; you can probably get at least a 2x speedup just by using a better computer than my tiny laptop.
- Downsample the data. Your test images are 2000x2000 pixels, but contain fairly little detail. If you cut their linear dimensions by 2-10x, you would get an enormous speedup (see the sketch after this list).
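On the last point, here is a minimal sketch of what downsampling could look like. The factor of 4, the use of plain slicing rather than something like `scipy.ndimage.zoom` (which would average/interpolate instead of subsample), and the `img_small` / `distX_small` / `distY_small` names are all just illustrative choices, not part of your code:

```python
# Illustrative downsampling by a factor of 4 in each dimension.
factor = 4

# Keep every 4th pixel in each direction.
img_small = img[::factor, ::factor]

# Scale the grid spacings to match, so the geometry stays the same.
distX_small = distX * factor
distY_small = distY * factor
```

Since the total work grows with the fourth power of the linear size, a 4x reduction in each dimension would cut the number of innermost operations by a factor of 256.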
That's all I have for now. So where does this leave you? Assuming a slightly better computer and no further optimization work, even the optimized code will take about a month to process your test images. If you only need to do this once, maybe that's fine. If you need to do it more often, or need to iterate on the code while trying different things, you will probably have to keep optimizing, starting with that RBS function, which consumes more than half the time.
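As one possible direction for that, offered only as an unbenchmarked sketch: if the spline fit is not essential and a plain quadrature over the regular grid is acceptable, and assuming (on my part) that `a` and `b` are the grid coordinate arrays and that the integration limits `0..n`, `0..m` cover the whole grid, then `np.trapz` avoids rebuilding a spline on every iteration:

```python
# Hypothetical replacement for the rbs(...).integral(...) step, assuming
# a and b are the grid coordinates along axes 0 and 1 and the integral
# runs over the full grid.
def grid_integral(values, a, b):
    inner = np.trapz(values, x=b, axis=1)  # trapezoidal rule along axis 1
    return np.trapz(inner, x=a)            # then along axis 0

# inside the i, j loops, instead of building a spline:
K[i, j] = grid_integral(temp.real, a, b)
```

`scipy.integrate.simpson` (called `simps` in older SciPy versions) is a higher-order alternative with the same calling pattern. Whether the accuracy is acceptable for your problem is something only you can judge.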
Bonus tip: your code would be much easier to work with if it didn't use nearly identical variable names such as `k` and `K`, and if it didn't use `j` both as a loop variable and as the complex-number suffix (`0j`).