The number of occurrences of an array without overlapping in another array

I have a matrix mxn A , where m%t = n%t = 0 , so the smaller matrix txt B splits the matrix without borders or overlaps. I want to check if A consists entirely of tiles from B without calculating as a secret step as an intermediate step as efficiently as possible. Also, for my special use, there is no need to know B It is enough to check if A strictly repeats each tile txt in each direction.

Numeric examples:

 A = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1]] B.shape = [2,2] --> True B.shape = [1,1] --> False 

So far, I am calculating the comparison matrix C , which is simply a partition of B by size A :

 import numpy as np x,y = B.shape x_a, y_a = A.shape x_t = x_a/x y_t = y_a/y B_dash = A[:x, :y] C = np.tile(B_dash,(x_t, y_t)) np.count_nonzero(AC) 

Is there a faster way without computing C ?

+5
source share
2 answers

Appproach # 1: It seems we are counting the number of occurrences of B in separate blocks. So we can use skimage.util.view_as_blocks -

 from skimage.util import view_as_blocks as viewW out = np.count_nonzero((viewW(A, B.shape) == B).all((2,3))) 

Appproach # 2: Staying with NumPy , we would have -

 m1,n1 = A.shape m2,n2 = B.shape out = np.count_nonzero((A.reshape(m1//m2,m2,n1//n2,n2) == B[:,None]).all((1,3))) 

Run Examples -

 In [274]: A Out[274]: array([[2, 0, 2, 0], [5, 3, 5, 1], [3, 3, 2, 6], [1, 0, 3, 1]]) In [275]: B Out[275]: array([[3, 3], [1, 0]]) In [276]: np.count_nonzero((viewW(A, B.shape) == B).all((2,3))) Out[276]: 1 In [278]: A Out[278]: array([[2, 0, 3, 3], [5, 3, 1, 0], [3, 3, 2, 6], [1, 0, 3, 1]]) In [279]: B Out[279]: array([[3, 3], [1, 0]]) In [280]: np.count_nonzero((viewW(A, B.shape) == B).all((2,3))) Out[280]: 2 
+3
source

You can get the result without generating C. You can do this:

 x,y = B.shape x_a, y_a = A.shape np.array_equal(A[:, :y_a-y], A[:, y:]) and np.array_equal(A[:x_a-x, :], A[x:, :]) 

That is, to compare A the first columns y_a-y and the last columns y_a-y , then do the same for the rows. I have not tested the code above, but it should be faster, because with this numpy method no new memories will be allocated.

The last statement can be optimized for:

 np.array_equal(A[:, :y_a-y], A[:, y:]) and np.array_equal(A[:x_a-x, :y_a], A[x:, :y_a]) 

Since if the first term is True , we already know that columns A are repeated t times.

0
source

Source: https://habr.com/ru/post/1264250/


All Articles