How to avoid using for-loops with numpy?

I already wrote the following code snippet that does exactly what I want, but it is too slow. I am sure there is a way to do this faster, but I cannot find how. The first part of the code below just shows what form the data takes:

- two measurement images (VV1 and HH1)
- pre-calculated model values for VV and HH, which both depend on 3 parameters (pre-calculated on a grid of (101, 31, 11) values)
- the last index of size 2 just puts the VV and HH model values into the same ndarray, instead of creating two 3-D arrays

    import numpy

    VV1 = numpy.ndarray((54, 43)).flatten()
    HH1 = numpy.ndarray((54, 43)).flatten()
    precomp = numpy.ndarray((101, 31, 11, 2))

Two of the three parameters are allowed to vary:

    comp = numpy.zeros((len(parameter1), len(parameter2)))
    for i, (vv, hh) in enumerate(zip(VV1, HH1)):
        comp0 = numpy.zeros((len(parameter1), len(parameter2)))
        for j in range(len(parameter1)):
            for jj in range(len(parameter2)):
                comp0[j, jj] = numpy.min((vv - precomp[j, jj, :, 0])**2
                                         + (hh - precomp[j, jj, :, 1])**2)
        comp += comp0

The obvious thing I know I have to do is get rid of as many for-loops as possible, but I don't know how to get numpy.min to behave correctly when dealing with a lot of dimensions.
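For concreteness, `numpy.min` takes an `axis` argument that reduces along just one dimension and keeps the others; a minimal standalone sketch (toy array, not the data from the question):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# Reduce only along the last axis: the other dimensions survive.
m = np.min(a, axis=2)
print(m.shape)  # (2, 3)
```

Here `axis=2` plays the same role as the third (size-11) dimension of `precomp` in the question.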

The second thing (less important if the code can be vectorized, but still interesting): I noticed that the code is mostly CPU-bound, not RAM-bound. I searched for a long time, but I can't find an equivalent of MATLAB's "parfor" to use instead of "for". Is it possible to parallelize the loop (e.g. with some @parallel decorator) if I move it into a separate method?

edit, in response to Janne Karila: yes, that definitely improves it a lot:

    for (vv, hh) in zip(VV1, HH1):
        comp += numpy.min((vv - precomp[..., 0])**2 + (hh - precomp[..., 1])**2, axis=2)

Definitely much faster, but is there any way to remove the outer loop too? And is there a way to make the loop parallel, using @parallel or something else?

+4
3 answers

One way to parallelize a loop is to structure it so that it uses map. In that case, you can use multiprocessing.Pool to get a parallel map.

I would change this:

    for (vv, hh) in zip(VV1, HH1):
        comp += numpy.min((vv - precomp[..., 0])**2 + (hh - precomp[..., 1])**2, axis=2)

to something like this:

    def buildcomp(vvhh):
        vv, hh = vvhh
        return numpy.min((vv - precomp[..., 0])**2 + (hh - precomp[..., 1])**2, axis=2)

    if __name__ == '__main__':
        from multiprocessing import Pool
        nthreads = 2
        p = Pool(nthreads)
        complist = p.map(buildcomp, np.column_stack((VV1, HH1)))
        comp = np.dstack(complist).sum(-1)

Note that dstack assumes that each comp has ndim 2, because it adds a third axis and we sum along it. This will slow things down a bit, because you have to build a list, stack it, and then sum, but all of these are either parallel or fast numpy operations.

I also changed the zip to a numpy np.column_stack operation, since zip is much slower for long arrays, assuming they are already 1-D arrays (which they are in your example).
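As a small illustration (toy data, not the arrays from the question), column_stack pairs the two 1-D arrays row by row, so Pool.map receives one (vv, hh) pair per call:

```python
import numpy as np

VV1 = np.array([1.0, 2.0, 3.0])
HH1 = np.array([4.0, 5.0, 6.0])

# Shape (3, 2): row i holds the pair (VV1[i], HH1[i]).
pairs = np.column_stack((VV1, HH1))
print(pairs[0])  # [1. 4.]
```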

I cannot easily verify this, so if there is a problem, feel free to let me know.

0

This can replace the inner loops over j and jj:

    comp0 = numpy.min((vv - precomp[..., 0])**2 + (hh - precomp[..., 1])**2, axis=2)

This may even replace the whole loop, although all that indexing stretches my mind a bit. (It does create a large intermediate array, though.)

    comp = numpy.sum(
        numpy.min((VV1.reshape(-1, 1, 1, 1) - precomp[numpy.newaxis, ..., 0])**2
                  + (HH1.reshape(-1, 1, 1, 1) - precomp[numpy.newaxis, ..., 1])**2,
                  axis=3),  # after adding the leading sample axis, the size-11 axis is axis 3
        axis=0)
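A quick sanity check of the broadcasting (with made-up small shapes in place of the real (101, 31, 11, 2) grid): after the leading sample axis is added, the parameter3 axis sits at position 3, so the min must be taken over axis=3 to match the loop version.

```python
import numpy as np

rng = np.random.default_rng(0)
VV1 = rng.random(6)
HH1 = rng.random(6)
precomp = rng.random((4, 3, 5, 2))  # toy stand-in for (101, 31, 11, 2)

# Single-loop version from the question's edit.
comp_loop = np.zeros((4, 3))
for vv, hh in zip(VV1, HH1):
    comp_loop += np.min((vv - precomp[..., 0])**2
                        + (hh - precomp[..., 1])**2, axis=2)

# Fully vectorized version: broadcast shape is (6, 4, 3, 5),
# so the size-5 parameter axis is axis 3.
comp_vec = np.sum(
    np.min((VV1.reshape(-1, 1, 1, 1) - precomp[np.newaxis, ..., 0])**2
           + (HH1.reshape(-1, 1, 1, 1) - precomp[np.newaxis, ..., 1])**2,
           axis=3),
    axis=0)

print(np.allclose(comp_loop, comp_vec))  # True
```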
+5

In computer science there is the concept of Big O notation, used to estimate roughly how much work an algorithm does. To make a program fast, do as little as possible.

This is why Janne's answer is much faster: it simply does less computation. Following the same principle, we can apply memoization, since you are CPU-bound rather than RAM-bound. You can use a memoization library if your needs are more complex than the following example.

    class AutoVivification(dict):
        """Implementation of perl's autovivification feature."""
        def __getitem__(self, item):
            try:
                return dict.__getitem__(self, item)
            except KeyError:
                value = self[item] = type(self)()
                return value

    memo = AutoVivification()

    def memoize(n, arr, end):
        # Key on the scalar and the slice index; ndarrays are not hashable,
        # and arr is always precomp here anyway.
        if end not in memo[n]:
            memo[n][end] = (n - arr[..., end])**2
        return memo[n][end]

    for (vv, hh) in zip(VV1, HH1):
        first = memoize(vv, precomp, 0)
        second = memoize(hh, precomp, 1)
        comp += numpy.min(first + second, axis=2)

Everything that has already been calculated is stored in the dictionary, so we can look it up later instead of recomputing it. You can even break the math into smaller memoized steps if necessary.

The AutoVivification dictionary is just to make it easy to save results inside memoize, because I'm lazy. Again, you can memoize any of the math you do, so if numpy.min is slow, memoize that too.
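If the measured values repeat exactly, the standard library's functools.lru_cache gives the same effect without a hand-rolled dictionary; a sketch with toy shapes (the (4, 3, 5, 2) precomp and the sample values here are made up for illustration):

```python
import numpy as np
from functools import lru_cache

rng = np.random.default_rng(1)
precomp = rng.random((4, 3, 5, 2))  # toy stand-in for (101, 31, 11, 2)

# ndarrays are not hashable, so the cache key is the scalar measurement
# and the slice index only; precomp never changes here, so that is safe.
@lru_cache(maxsize=None)
def sq_diff(value, end):
    return (value - precomp[..., end])**2

VV1 = np.array([0.2, 0.2, 0.7])  # repeated values become cache hits
HH1 = np.array([0.5, 0.5, 0.1])
comp = np.zeros(precomp.shape[:2])
for vv, hh in zip(VV1, HH1):
    comp += np.min(sq_diff(float(vv), 0) + sq_diff(float(hh), 1), axis=2)
```

This only helps if identical measurement values actually recur; otherwise the cache just burns memory.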

0

Source: https://habr.com/ru/post/1481371/
