Using numpy.take for faster fancy indexing

Question

Using numpy.take for faster fancy indexing

EDIT I saved a more complex problem that I encountered below, but my problems with np.take can be summarized as follows. Let's say you have an array of img forms (planes, rows) and another array of lut forms (planes, 256) , and you want to use them to create a new array of out form (planes, rows) , where out[p,j] = lut[p, img[p, j]] , This can be achieved with fantastic indexing as follows:

 In [4]: %timeit lut[np.arange(planes).reshape(-1, 1), img] 1000 loops, best of 3: 471 us per loop

But if you use take and a python loop over planes instead of an indexing fantasy, events can be significantly accelerated:

 In [6]: %timeit for _ in (lut[j].take(img[j]) for j in xrange(planes)) : pass 10000 loops, best of 3: 59 us per loop

Can lut and img be rebuilt in some way so that the whole operation is done without python loops, but using numpy.take (or an alternative method) instead of the usual fancy indexing to keep the speed advantage?

ORIGINAL QUESTION I have a set of lookup tables (LUTs) that I want to use in an image. An array containing the LUT has the shape (planes, 256, n) , and the image has the shape (planes, rows, cols) . Both have dtype = 'uint8' corresponding to the 256 LUT axis. The idea is to start the p image plane through each of the n LUTs from the p LUT plane.

If my lut and img as follows:

 planes, rows, cols, n = 3, 4000, 4000, 4 lut = np.random.randint(-2**31, 2**31 - 1, size=(planes * 256 * n // 4,)).view('uint8') lut = lut.reshape(planes, 256, n) img = np.random.randint(-2**31, 2**31 - 1, size=(planes * rows * cols // 4,)).view('uint8') img = img.reshape(planes, rows, cols)

I can achieve what I get after using fancy indexing like

 out = lut[np.arange(planes).reshape(-1, 1, 1), img]

which gives me an array of forms (planes, rows, cols, n) , where out[i, :, :, j] contains the i th plane img passing through the j th LUT of the i th plane LUT ...

All is well, except for this:

 In [2]: %timeit lut[np.arange(planes).reshape(-1, 1, 1), img] 1 loops, best of 3: 5.65 s per loop

which is completely unacceptable, especially since I have all of the following not very pleasant alternatives using np.take , which are faster:

One LUT on one plane works faster on approximately x70:

 In [2]: %timeit np.take(lut[0, :, 0], img[0]) 10 loops, best of 3: 78.5 ms per loop

The python loop going through all the combinations you need ends faster x6:

 In [2]: %timeit for _ in (np.take(lut[j, :, k], img[j]) for j in xrange(planes) for k in xrange(n)) : pass 1 loops, best of 3: 947 ms per loop

Even starting all combinations of planes in the LUT and image and then discarding unwanted planes**2 - planes faster than fantastic indexing:
```
 In [2]: %timeit np.take(lut, img, axis=1)[np.arange(planes), np.arange(planes)] 1 loops, best of 3: 3.79 s per loop 
```

And the fastest combination I could come up with has a python loop iterating over the planes and finishes x13 faster:

 In [2]: %timeit for _ in (np.take(lut[j], img[j], axis=0) for j in xrange(planes)) : pass 1 loops, best of 3: 434 ms per loop

The question, of course, is that there is no way to do this with np.take without any python loop? Ideally, any change in shape or change in size should occur on the LUT, and not on the image, but I'm open to everything you can think of ...

+13

python numpy

Jaime Jan 23 '13 at 23:43

source share

1 answer

Saullo Castro · Answer 1 · 2013-05-09 11:03

Fist of all I have to say, I really liked your question. Without rebuilding LUT or IMG , the following solution worked:

 %timeit a=np.take(lut, img, axis=1) # 1 loops, best of 3: 1.93s per loop

But from the result you need to request the diagonal: a [0,0], a [1,1], a [2,2]; to get what you want. I tried to find a way to do this indexing only for diagonal elements, but still failed.

Here are some ways to reorder your LUT and IMG : The following works if the indices in the IMG are 0-255 for the 1st plane, 256-511 for the 2nd plane and 512-767 for the third plane, but this will prevent you from using 'uint8' that can be a big problem ...:

 lut2 = lut.reshape(-1,4) %timeit np.take(lut2,img,axis=0) # 1 loops, best of 3: 716 ms per loop # or %timeit np.take(lut2, img.flatten(), axis=0).reshape(3,4000,4000,4) # 1 loops, best of 3: 709 ms per loop

in my car, your solution is still the best option and very adequate, because you just need diagonal estimates, that is, the plane 1-plane1, the plane 2-plane2 and the plane 3-plane3:

 %timeit for _ in (np.take(lut[j], img[j], axis=0) for j in xrange(planes)) : pass # 1 loops, best of 3: 677 ms per loop

Hope this can give you some idea of a better solution. It would be nice to find additional options with flatten() and similar methods like np.apply_over_axes() or np.apply_along_axis() which seem promising.

I used this code below to generate data:

 import numpy as np num = 4000 planes, rows, cols, n = 3, num, num, 4 lut = np.random.randint(-2**31, 2**31-1,size=(planes*256*n//4,)).view('uint8') lut = lut.reshape(planes, 256, n) img = np.random.randint(-2**31, 2**31-1,size=(planes*rows*cols//4,)).view('uint8') img = img.reshape(planes, rows, cols)

Using numpy.take for faster fancy indexing

More articles: