Search for matching submatrices within a matrix

I have a 100x200 2D array, expressed as a numpy array, consisting of black (0) and white (255) cells; it comes from a raster file. I also have 2D shapes (the easiest way to think of them is as letters), which are likewise 2D arrays of black and white cells.

I know that I can naively iterate over the matrix, but this will be the "hot" part of my code, so speed is a concern. Is there a fast way to accomplish this in numpy/scipy?
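For reference, the naive approach I have in mind is roughly this (naive_find and pattern are just my names for the search function and the small shape):

    import numpy as np

    def naive_find(image, pattern):
        # slide the pattern over every offset and compare exactly
        rows, cols = image.shape
        h, w = pattern.shape
        hits = []
        for i in range(rows - h + 1):
            for j in range(cols - w + 1):
                if np.array_equal(image[i:i+h, j:j+w], pattern):
                    hits.append((i, j))
        return hits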

I briefly looked at SciPy's correlate function. I'm not interested in "fuzzy matches", only exact matches. I also looked at some academic papers, but they are over my head.

2 answers

You can use correlation. You need to set your black values to -1 and your white values to 1 (or vice versa) so that you know what the value of the correlation peak will be, and so that it only occurs at the correct letter.

The following code does what I think you want.

    import numpy
    from scipy import signal

    # Set up the inputs
    a = numpy.random.randn(100, 200)
    a[a<0] = 0
    a[a>0] = 255

    b = numpy.random.randn(20, 20)
    b[b<0] = 0
    b[b>0] = 255

    # put b somewhere in a
    a[37:37+b.shape[0], 84:84+b.shape[1]] = b

    # Now the actual solution...

    # Set the black values to -1
    a[a==0] = -1
    b[b==0] = -1

    # and the white values to 1
    a[a==255] = 1
    b[b==255] = 1

    max_peak = numpy.prod(b.shape)

    # c will contain max_peak where the overlap is perfect
    c = signal.correlate(a, b, 'valid')

    overlaps = numpy.where(c == max_peak)
    print(overlaps)

This prints (array([37]), array([84])), which is the offset at which b was placed earlier in the code.
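If you prefer the matches as (row, column) pairs rather than parallel arrays, a small follow-up step (just rearranging the numpy.where output) could be:

    hits = list(zip(*overlaps))   # e.g. [(37, 84)]
    print(hits)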

You will most likely find that if your template size multiplied by the size of your large array is bigger than roughly N log(N), where N is the corresponding size of the large array you are searching over (for each dimension), then you will probably get a speed-up from an FFT-based algorithm such as scipy.signal.fftconvolve (bearing in mind that you need to flip each axis of one of the data sets if you use a convolution rather than a correlation, hence flipud and fliplr). The only modification would be the assignment to c:

 c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid') 

A timing comparison at the sizes above:

    In [5]: timeit c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
    100 loops, best of 3: 6.78 ms per loop

    In [6]: timeit c = signal.correlate(a, b, 'valid')
    10 loops, best of 3: 151 ms per loop
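One small caveat with the FFT route: fftconvolve works in floating point, so an exact c == max_peak test can in principle miss because of rounding, and numpy.isclose is a safer comparison. A quick check, reusing a, b and max_peak from the first snippet:

    c_direct = signal.correlate(a, b, 'valid')
    c_fft = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')

    print(numpy.allclose(c_direct, c_fft))               # the two agree to rounding error
    print(numpy.where(numpy.isclose(c_fft, max_peak)))   # again points at (array([37]), array([84]))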

Here is a method that you can use or adapt, depending on the details of your requirements. It uses ndimage.label and ndimage.find_objects:

  • Label the image with ndimage.label; this finds all the blobs in the array and numbers them with integers.
  • Get the slices of those blobs using ndimage.find_objects.
  • Then use set intersection to see whether the found blobs are the blobs you want (a sketch of this step follows the example images below).

Code for steps 1 and 2:

    import scipy.misc
    from scipy import ndimage
    import matplotlib.pyplot as plt

    # flatten to ensure greyscale
    # (note: scipy.misc.imread needs an older SciPy with PIL/Pillow installed;
    #  it has been removed from recent SciPy releases)
    im = scipy.misc.imread('letters.png', flatten=1)

    objects, number_of_objects = ndimage.label(im)
    letters = ndimage.find_objects(objects)

    # to save the images, for illustrative purposes only:
    plt.imsave('ob.png', objects)
    for i, j in enumerate(letters):
        plt.imsave('ob' + str(i) + '.png', objects[j])

Input example:

(image: the input letters.png)

Labelled:

(image: the labelled blobs produced by ndimage.label)

Isolated blobs for testing against:

(images: the individual extracted blobs, one per letter)
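The third step (deciding whether a found blob is one of the letters you want) is not in the code above. A minimal sketch, assuming you have a template array wanted loaded the same way as im (wanted, blob_mask and blob_matches are my names), comparing the foreground coordinates of each blob as sets:

    import numpy as np

    def blob_matches(blob_mask, wanted_mask):
        # Compare the two shapes via the sets of their foreground pixel
        # coordinates, relative to each bounding box. For equal-shaped
        # boolean masks this is equivalent to np.array_equal.
        if blob_mask.shape != wanted_mask.shape:
            return False
        return set(zip(*np.nonzero(blob_mask))) == set(zip(*np.nonzero(wanted_mask)))

    wanted_mask = wanted > 0  # template letter as a boolean mask

    for i, sl in enumerate(letters):
        blob_mask = (objects[sl] == i + 1)  # boolean mask of this blob only
        if blob_matches(blob_mask, wanted_mask):
            print('blob %d matches the wanted letter' % i)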


Source: https://habr.com/ru/post/918330/

