Search for matching submatrices within a matrix

I have a 100x200 2D array, expressed as a numpy array, consisting of black (0) and white (255) cells; it comes from a raster file. I also have 2D shapes (the easiest way to think of them is as letters), which are likewise 2D arrays of black and white cells.

I know that I can naively iterate over the matrix, but this will be the "hot" part of my code, so speed is a concern. Is there a fast way to accomplish this in numpy/scipy?
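For reference, the naive approach I have in mind is roughly this (naive_find and pattern are just my names for the search function and the small shape):

    import numpy as np

    def naive_find(image, pattern):
        # slide the pattern over every offset and compare exactly
        rows, cols = image.shape
        h, w = pattern.shape
        hits = []
        for i in range(rows - h + 1):
            for j in range(cols - w + 1):
                if np.array_equal(image[i:i+h, j:j+w], pattern):
                    hits.append((i, j))
        return hits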

I briefly looked at SciPy's correlate function. I'm not interested in "fuzzy matches", only exact matches. I also looked at some academic papers, but they are over my head.

2 answers

You can use correlation. You need to set your black values to -1 and your white values to 1 (or vice versa) so that you know what the value of the correlation peak will be, and so that it only occurs at the correct letter.

The following code does what I think you want.

    import numpy
    from scipy import signal

    # Set up the inputs
    a = numpy.random.randn(100, 200)
    a[a<0] = 0
    a[a>0] = 255

    b = numpy.random.randn(20, 20)
    b[b<0] = 0
    b[b>0] = 255

    # put b somewhere in a
    a[37:37+b.shape[0], 84:84+b.shape[1]] = b

    # Now the actual solution...

    # Set the black values to -1
    a[a==0] = -1
    b[b==0] = -1

    # and the white values to 1
    a[a==255] = 1
    b[b==255] = 1

    max_peak = numpy.prod(b.shape)

    # c will contain max_peak where the overlap is perfect
    c = signal.correlate(a, b, 'valid')

    overlaps = numpy.where(c == max_peak)
    print(overlaps)

This prints (array([37]), array([84])), which is the offset at which b was placed earlier in the code.
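If you prefer the matches as (row, column) pairs rather than parallel arrays, a small follow-up step (just rearranging the numpy.where output) could be:

    hits = list(zip(*overlaps))   # e.g. [(37, 84)]
    print(hits)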

You will most likely find that if your template size multiplied by the size of your large array is bigger than roughly N log(N), where N is the corresponding size of the large array you are searching over (for each dimension), then you will probably get a speed-up from an FFT-based algorithm such as scipy.signal.fftconvolve (bearing in mind that you need to flip each axis of one of the data sets if you use a convolution rather than a correlation, hence flipud and fliplr). The only modification would be the assignment to c:

 c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid') 

A timing comparison at the sizes above:

    In [5]: timeit c = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')
    100 loops, best of 3: 6.78 ms per loop

    In [6]: timeit c = signal.correlate(a, b, 'valid')
    10 loops, best of 3: 151 ms per loop
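One small caveat with the FFT route: fftconvolve works in floating point, so an exact c == max_peak test can in principle miss because of rounding, and numpy.isclose is a safer comparison. A quick check, reusing a, b and max_peak from the first snippet:

    c_direct = signal.correlate(a, b, 'valid')
    c_fft = signal.fftconvolve(a, numpy.fliplr(numpy.flipud(b)), 'valid')

    print(numpy.allclose(c_direct, c_fft))               # the two agree to rounding error
    print(numpy.where(numpy.isclose(c_fft, max_peak)))   # again points at (array([37]), array([84]))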

Here is a method that you can use or adapt, depending on the details of your requirements. It uses ndimage.label and ndimage.find_objects:

  • Label the image with ndimage.label; this finds all the blobs in the array and numbers them with integers.
  • Get the slices of those blobs using ndimage.find_objects.
  • Then use set intersection to see whether the found blobs are the blobs you want (a sketch of this step follows the example images below).

Code for steps 1 and 2:

    import scipy.misc
    from scipy import ndimage
    import matplotlib.pyplot as plt

    # flatten to ensure greyscale
    # (note: scipy.misc.imread needs an older SciPy with PIL/Pillow installed;
    #  it has been removed from recent SciPy releases)
    im = scipy.misc.imread('letters.png', flatten=1)

    objects, number_of_objects = ndimage.label(im)
    letters = ndimage.find_objects(objects)

    # to save the images, for illustrative purposes only:
    plt.imsave('ob.png', objects)
    for i, j in enumerate(letters):
        plt.imsave('ob' + str(i) + '.png', objects[j])

Input example:

(image: the input letters.png)

Labelled:

(image: the labelled blobs produced by ndimage.label)

Isolated blobs for testing against:

(images: the individual extracted blobs, one per letter)
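The third step (deciding whether a found blob is one of the letters you want) is not in the code above. A minimal sketch, assuming you have a template array wanted loaded the same way as im (wanted, blob_mask and blob_matches are my names), comparing the foreground coordinates of each blob as sets:

    import numpy as np

    def blob_matches(blob_mask, wanted_mask):
        # Compare the two shapes via the sets of their foreground pixel
        # coordinates, relative to each bounding box. For equal-shaped
        # boolean masks this is equivalent to np.array_equal.
        if blob_mask.shape != wanted_mask.shape:
            return False
        return set(zip(*np.nonzero(blob_mask))) == set(zip(*np.nonzero(wanted_mask)))

    wanted_mask = wanted > 0  # template letter as a boolean mask

    for i, sl in enumerate(letters):
        blob_mask = (objects[sl] == i + 1)  # boolean mask of this blob only
        if blob_matches(blob_mask, wanted_mask):
            print('blob %d matches the wanted letter' % i)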


Source: https://habr.com/ru/post/918330/

