Noting that your code has four nested for loops, I focused on optimizing the inner pair. Here's the old code:
```python
for i in xrange(K.shape[0]):
    for j in xrange(K.shape[1]):
        print(i, j)
        '''create an r vector '''
        r = (i*distX, j*distY, z)

        for x in xrange(img.shape[0]):
            for y in xrange(img.shape[1]):
                '''create an ksi vector, then calculate it norm, and the dot product of r and ksi'''
                ksi = (x*distX, y*distY, z)
                ksiNorm = np.linalg.norm(ksi)
                ksiDotR = float(np.dot(ksi, r))

                '''calculate the integrand'''
                temp[x, y] = img[x, y] * np.exp(1j*k*ksiDotR/ksiNorm)

        '''interpolate so that we can do the integral and take the integral'''
        temp2 = rbs(a, b, temp.real)
        K[i, j] = temp2.integral(0, n, 0, m)
```
Since K and img are each approximately 2000x2000, the innermost statements must be executed roughly sixteen trillion times (2000² × 2000² ≈ 1.6 × 10¹³). That is simply not practical in pure Python loops, but we can push the work into C and/or Fortran by using NumPy vectorization. I took this one careful step at a time to try to make sure the results stay consistent; here is what I ended up with:
```python
'''create all r vectors'''
R = np.empty((K.shape[0], K.shape[1], 3))
R[:,:,0] = np.repeat(np.arange(K.shape[0]), K.shape[1]).reshape(K.shape) * distX
R[:,:,1] = np.arange(K.shape[1]) * distY
R[:,:,2] = z

'''create all ksi vectors'''
KSI = np.empty((img.shape[0], img.shape[1], 3))
KSI[:,:,0] = np.repeat(np.arange(img.shape[0]), img.shape[1]).reshape(img.shape) * distX
KSI[:,:,1] = np.arange(img.shape[1]) * distY
KSI[:,:,2] = z
```
The inner pair of loops is now completely gone, replaced by vectorized operations done in advance (at a memory cost that is only linear in the size of the inputs).
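The snippet above only shows the precomputation. As a minimal sketch of how the inner pair of loops then disappears (assuming the precomputed `R` and `KSI` above and the same `k`, `img`, `rbs`, `a`, `b`, `n`, `m` from your code; my final version may differ in details), the body of the remaining two loops becomes:

```python
# precompute the norm of every ksi vector once (shape: img.shape)
KSInorm = np.linalg.norm(KSI, axis=2)

for i in xrange(K.shape[0]):
    for j in xrange(K.shape[1]):
        # dot product of every ksi vector with the single r vector R[i, j]
        ksiDotR = np.tensordot(KSI, R[i, j], axes=([2], [0]))

        # integrand for the whole image at once, no inner loops
        temp = img * np.exp(1j * k * ksiDotR / KSInorm)

        # interpolate and integrate the real part, exactly as before
        temp2 = rbs(a, b, temp.real)
        K[i, j] = temp2.integral(0, n, 0, m)
```

The point is that `np.tensordot` and the element-wise arithmetic operate on the whole 2000x2000 image at once in compiled code, rather than one pixel at a time in Python.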
This reduces the per-iteration time of the outer two loops from 340 seconds to 1.3 seconds on my 1.6 GHz i5 MacBook Air, without using Numba. Of those 1.3 seconds per iteration, 0.68 seconds are spent in the `rbs` function, which is `scipy.interpolate.RectBivariateSpline`. There are probably opportunities for further optimization - here are a few ideas:
- Re-apply Numba. I don't have it on my system, so I couldn't test it. At this point it may not make much difference, but it's easy for you to check.
- Do domain-specific optimization, for example try to simplify the fundamental calculations being done. My optimizations are lossless, and I don't know your problem domain, so I can't optimize as deeply as you can.
- Try to vectorize the remaining loops. This may be tough unless you're willing to replace the scipy `rbs` function with something that supports evaluating multiple calculations per call.
- Get a faster CPU. Mine is rather slow; you can probably get at least a 2x speedup just by using a better computer than my tiny laptop.
- Downsample the data. Your test images are 2000x2000 pixels, but contain fairly little detail. If you cut their linear dimensions by 2-10x, you would get an enormous speedup (see the sketch after this list).
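On the last point, here is a minimal sketch of what downsampling could look like. The factor of 4, the use of plain slicing rather than something like `scipy.ndimage.zoom` (which would average/interpolate instead of subsample), and the `img_small` / `distX_small` / `distY_small` names are all just illustrative choices, not part of your code:

```python
# Illustrative downsampling by a factor of 4 in each dimension.
factor = 4

# Keep every 4th pixel in each direction.
img_small = img[::factor, ::factor]

# Scale the grid spacings to match, so the geometry stays the same.
distX_small = distX * factor
distY_small = distY * factor
```

Since the total work grows with the fourth power of the linear size, a 4x reduction in each dimension would cut the number of innermost operations by a factor of 256.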
That's all I have for now. So where does this leave you? Assuming a slightly better computer and no further optimization work, even the optimized code will take about a month to process your test images. If you only need to do this once, maybe that's fine. If you need to do it more often, or need to iterate on the code while trying different things, you will probably have to keep optimizing, starting with that RBS function, which consumes more than half the time.
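As one possible direction for that, offered only as an unbenchmarked sketch: if the spline fit is not essential and a plain quadrature over the regular grid is acceptable, and assuming (on my part) that `a` and `b` are the grid coordinate arrays and that the integration limits `0..n`, `0..m` cover the whole grid, then `np.trapz` avoids rebuilding a spline on every iteration:

```python
# Hypothetical replacement for the rbs(...).integral(...) step, assuming
# a and b are the grid coordinates along axes 0 and 1 and the integral
# runs over the full grid.
def grid_integral(values, a, b):
    inner = np.trapz(values, x=b, axis=1)  # trapezoidal rule along axis 1
    return np.trapz(inner, x=a)            # then along axis 0

# inside the i, j loops, instead of building a spline:
K[i, j] = grid_integral(temp.real, a, b)
```

`scipy.integrate.simpson` (called `simps` in older SciPy versions) is a higher-order alternative with the same calling pattern. Whether the accuracy is acceptable for your problem is something only you can judge.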
Bonus tip: your code would be much easier to work with if it didn't use nearly identical variable names such as `k` and `K`, and if it didn't use `j` both as a loop variable and as the complex-number suffix (`0j`).