Optimizing the implementation of a rotating mask in numpy / scipy

This is my first attempt at using steps in numpy, and it has improved speed compared to just iterating over various filters, but still it is rather slow (and it seems that there are at least one or two things that are completely redundant or inefficient).

So my question is: are there any better ways to accomplish this or tricks for my code that would make it significantly faster?

The algorithm locally evaluates 9 different filters for each pixel and selects the one that has the least standard deviation (my attempt to implement Nagau and Matsuyma (1980) "Structural analysis of complex areal photographs", as described in the image analysis book). The result is both a smooth and sharp image (pretty cool if you ask me!)

import numpy as np from scipy import ndimage from numpy.lib import stride_tricks def get_rotating_kernels(): kernels = list() protokernel = np.arange(9).reshape(3, 3) for k in xrange(9): ax1, ax2 = np.where(protokernel==k) kernel = np.zeros((5,5), dtype=bool) kernel[ax1: ax1+3, ax2: ax2+3] = 1 kernels.append(kernel) return kernels def get_rotation_smooth(im, **kwargs): kernels = np.array([k.ravel() for k in get_rotating_kernels()], dtype=bool) def rotation_matrix(section): multi_s = stride_tricks.as_strided(section, shape=(9,25), strides=(0, section.itemsize)) rot_filters = multi_s[kernels].reshape(9,9) return rot_filters[rot_filters.std(1).argmin(),:].mean() return ndimage.filters.generic_filter(im, rotation_matrix, size=5, **kwargs) from scipy import lena im = lena() im2 = get_rotation_smooth(im) 

(Just a comment, get_rotating_kernel is actually not optimized, since in any case there is no time wasted)

It took 126 seconds on my netbook, and Lena was a very small image.

Edit:

I received a proposal to change rot_filters.std(1) to rot_filters.var(1) to save a lot of square roots and shaved something in the order of 5 seconds.

+4
source share
2 answers

I find it difficult for you to optimize time using Python + scipy . However, I was able to make a slight improvement by using as_strided to generate rot_filters directly, rather than through Boolean indexing. This is based on the very simple n-dimensional windows function. (I wrote this to solve this problem before I realized that scipy has a 2-bit convolution function.) The following code provides moderate 10% acceleration on my machine; see below for an explanation of how this works:

 import numpy as np from scipy import ndimage from numpy.lib import stride_tricks # pass in `as_strided` as a default arg to save a global lookup def rotation_matrix2(section, _as_strided=stride_tricks.as_strided): section = section.reshape(5, 5) # sqrt(section.size), sqrt(section.size) windows_shape = (3, 3, 3, 3) # 5 - 3 + 1, 5 - 3 + 1, 3, 3 windows_strides = section.strides + section.strides windows = _as_strided(section, windows_shape, windows_strides) rot_filters = windows.reshape(9, 9) return rot_filters[rot_filters.std(1).argmin(),:].mean() def get_rotation_smooth(im, _rm=rotation_matrix2, **kwargs): return ndimage.filters.generic_filter(im, _rm, size=5, **kwargs) if __name__ == '__main__': import matplotlib.pyplot as plt from scipy.misc import lena im = lena() im2 = get_rotation_smooth(im) #plt.gray() # Uncomment these lines for #plt.imshow(im2) # demo purposes. #plt.show() 

The above rotation_matrix2 function is equivalent to the following two functions (which are actually a bit slower than your original function, because windows more generalized). This does exactly what your source code does - it creates 9 3x3 windows in a 5x5 array, and then converts them into a 9x9 array for processing.

 def windows(a, w, _as_strided=stride_tricks.as_strided): windows_shape = tuple(sa - sw + 1 for sa, sw in zip(a.shape, w)) windows_shape += w windows_strides = a.strides + a.strides return _as_strided(a, windows_shape, windows_strides) def rotation_matrix1(section, _windows=windows): rot_filters = windows(section.reshape(5, 5), (3, 3)).reshape(9, 9) return rot_filters[rot_filters.std(1).argmin(),:].mean() 

windows works with arrays of any dimension if the window has the same number of dimensions. Here's a breakdown of how this works:

  windows_shape = tuple(sa - sw + 1 for sa, sw in zip(a.shape, w)) 

We can consider the windows array as an nd array of nd arrays. The shape of the external nd array is dictated by the degree of freedom of the window within the larger array; in each dimension, the number of positions a window can take is equal to the length of the larger array minus the length of the window plus one. In this case, we have a 3x3 window into a 5x5 array, so the external 2-dimensional array is a 3x3 array.

  windows_shape += w 

The shape of the internal nd array is the same as the shape of the window itself. In our case, it is again a 3x3 array.

Now for the step. We must define the steps for the external nd array and for the internal nd array. But it turns out that they are the same! After all, a window moves around a larger array in the same way a single index moves around an array, right?

  windows_strides = a.strides + a.strides 

Now we have all the information needed to create windows:

  return _as_strided(a, windows_shape, windows_strides) 
+1
source

For complex single-pixel + neighborhood operations, you can use cython to improve performance. This allows you to efficiently write code for both loops in syntax similar to python syntax, which is later converted to C code.

For inspiration, you can, for example, take a look at the scikit-image code:

https://github.com/scikit-image/scikit-image/blob/master/skimage/filter/_denoise.pyx

+1
source

Source: https://habr.com/ru/post/1440680/


All Articles