You can do some awesome things with the as_strided function in combination with numpy cast. Here are two versions of your function:
import numpy as np from numpy.lib.stride_tricks import as_strided def sumsqdiff(input_image, template, valid_mask=None): if valid_mask is None: valid_mask = np.ones_like(template) total_weight = valid_mask.sum() window_size = template.shape ssd = np.empty((input_image.shape[0] - window_size[0] + 1, input_image.shape[1] - window_size[1] + 1)) for i in xrange(ssd.shape[0]): for j in xrange(ssd.shape[1]): sample = input_image[i:i + window_size[0], j:j + window_size[1]] dist = (template - sample) ** 2 ssd[i, j] = (dist * valid_mask).sum() return ssd def sumsqdiff2(input_image, template, valid_mask=None): if valid_mask is None: valid_mask = np.ones_like(template) total_weight = valid_mask.sum() window_size = template.shape
This is where the ipython session is executed for comparison.
The template that I will use for demonstration:
In [72]: template Out[72]: array([[-1, 1, -1], [ 1, 2, 1], [-1, 1, -1]])
A little input so that we can check the result:
In [73]: x Out[73]: array([[ 0., 1., 2., 3., 4., 5., 6.], [ 7., 8., 9., 10., 11., 12., 13.], [ 14., 15., 16., 17., 18., 19., 20.], [ 21., 22., 23., 24., 25., 26., 27.], [ 28., 29., 30., 31., 32., 33., 34.]])
Apply two functions to x and make sure we get the same result:
In [74]: sumsqdiff(x, template) Out[74]: array([[ 856., 1005., 1172., 1357., 1560.], [ 2277., 2552., 2845., 3156., 3485.], [ 4580., 4981., 5400., 5837., 6292.]]) In [75]: sumsqdiff2(x, template) Out[75]: array([[ 856., 1005., 1172., 1357., 1560.], [ 2277., 2552., 2845., 3156., 3485.], [ 4580., 4981., 5400., 5837., 6292.]])
Now make a much larger โimageโ input:
In [76]: z = np.random.randn(500, 500)
and check the performance:
In [77]: %timeit sumsqdiff(z, template) 1 loops, best of 3: 3.55 s per loop In [78]: %timeit sumsqdiff2(z, template) 10 loops, best of 3: 33 ms per loop
Not too shabby. :)
Two disadvantages:
- The calculation in
sumsqdiff2 will generate a temporary array, which for a 3x3 template will be 9 times the size of input_image . (In general, this will be template.size times larger than input_image .) - These walking tricks will not help you when you execute the Cythonize code. When you go into Cython, you often return to loops that you free when you vectorize with numpy.