EDIT I saved a more complex problem that I encountered below, but my problems with np.take can be summarized as follows. Let's say you have an array of img forms (planes, rows) and another array of lut forms (planes, 256) , and you want to use them to create a new array of out form (planes, rows) , where out[p,j] = lut[p, img[p, j]] , This can be achieved with fantastic indexing as follows:
In [4]: %timeit lut[np.arange(planes).reshape(-1, 1), img] 1000 loops, best of 3: 471 us per loop
But if you use take and a python loop over planes instead of an indexing fantasy, events can be significantly accelerated:
In [6]: %timeit for _ in (lut[j].take(img[j]) for j in xrange(planes)) : pass 10000 loops, best of 3: 59 us per loop
Can lut and img be rebuilt in some way so that the whole operation is done without python loops, but using numpy.take (or an alternative method) instead of the usual fancy indexing to keep the speed advantage?
ORIGINAL QUESTION I have a set of lookup tables (LUTs) that I want to use in an image. An array containing the LUT has the shape (planes, 256, n) , and the image has the shape (planes, rows, cols) . Both have dtype = 'uint8' corresponding to the 256 LUT axis. The idea is to start the p image plane through each of the n LUTs from the p LUT plane.
If my lut and img as follows:
planes, rows, cols, n = 3, 4000, 4000, 4 lut = np.random.randint(-2**31, 2**31 - 1, size=(planes * 256 * n // 4,)).view('uint8') lut = lut.reshape(planes, 256, n) img = np.random.randint(-2**31, 2**31 - 1, size=(planes * rows * cols // 4,)).view('uint8') img = img.reshape(planes, rows, cols)
I can achieve what I get after using fancy indexing like
out = lut[np.arange(planes).reshape(-1, 1, 1), img]
which gives me an array of forms (planes, rows, cols, n) , where out[i, :, :, j] contains the i th plane img passing through the j th LUT of the i th plane LUT ...
All is well, except for this:
In [2]: %timeit lut[np.arange(planes).reshape(-1, 1, 1), img] 1 loops, best of 3: 5.65 s per loop
which is completely unacceptable, especially since I have all of the following not very pleasant alternatives using np.take , which are faster:
One LUT on one plane works faster on approximately x70:
In [2]: %timeit np.take(lut[0, :, 0], img[0]) 10 loops, best of 3: 78.5 ms per loop
The python loop going through all the combinations you need ends faster x6:
In [2]: %timeit for _ in (np.take(lut[j, :, k], img[j]) for j in xrange(planes) for k in xrange(n)) : pass 1 loops, best of 3: 947 ms per loop
Even starting all combinations of planes in the LUT and image and then discarding unwanted planes**2 - planes faster than fantastic indexing:
In [2]: %timeit np.take(lut, img, axis=1)[np.arange(planes), np.arange(planes)] 1 loops, best of 3: 3.79 s per loop
And the fastest combination I could come up with has a python loop iterating over the planes and finishes x13 faster:
In [2]: %timeit for _ in (np.take(lut[j], img[j], axis=0) for j in xrange(planes)) : pass 1 loops, best of 3: 434 ms per loop
The question, of course, is that there is no way to do this with np.take without any python loop? Ideally, any change in shape or change in size should occur on the LUT, and not on the image, but I'm open to everything you can think of ...