Looking for intersection of two matrices in Python within tolerance?

I am looking for the most efficient way to find the intersection of two matrices of different sizes. Each matrix has three variables (columns) and a different number of observations (rows). For example, matrices a and b:

import numpy as np

a = np.matrix('1 5 1003; 2 4 1002; 4 3 1008; 8 1 2005')
b = np.matrix('7 9 1006; 4 4 1007; 7 7 1050; 8 2 2003; 9 9 3000; 7 7 1000')

If I set the tolerance for each column as col1 = 1, col2 = 2 and col3 = 10, I need a function that returns the indices of the rows in a and b that are within their respective tolerances, for example:

[x1, x2] = func(a, b, col1, col2, col3)
print x1
>> [2 3]
print x2
>> [1 3]

You can see from the indices that row 2 of a is within the tolerances of row 1 of b, and row 3 of a is within the tolerances of row 3 of b.

I think I could loop over each row of a, check whether it is within the tolerances of each row in b, and collect the matches that way. But this is inefficient for very large datasets.

Any suggestions on alternatives to the loop method to accomplish this?

1 answer

If you don't mind working with NumPy arrays, you can use broadcasting for a vectorized solution. Here's the implementation:

# Set tolerance values for each column
tol = [1, 2, 10]

# Get absolute differences between a and b keeping their columns aligned
diffs = np.abs(np.asarray(a[:,None]) - np.asarray(b))

# Compare each row with the triplet from `tol`.
# Get mask of all matching rows and finally get the matching indices
x1,x2 = np.nonzero((diffs < tol).all(2))

Sample run:

In [46]: # Inputs
    ...: a=np.matrix('1 5 1003; 2 4 1002; 4 3 1008; 8 1 2005')
    ...: b=np.matrix('7 9 1006; 4 4 1007; 7 7 1050; 8 2 2003; 9 9 3000; 7 7 1000')
    ...: 

In [47]: # Set tolerance values for each column
    ...: tol = [1, 2, 10]
    ...: 
    ...: # Get absolute differences between a and b keeping their columns aligned
    ...: diffs = np.abs(np.asarray(a[:,None]) - np.asarray(b))
    ...: 
    ...: # Compare each row with the triplet from `tol`.
    ...: # Get mask of all matching rows and finally get the matching indices
    ...: x1,x2 = np.nonzero((diffs < tol).all(2))
    ...: 

In [48]: x1,x2
Out[48]: (array([2, 3]), array([1, 3]))
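
The same steps can be wrapped in a function shaped like the func(a, b, ...) call in the question. This is only a sketch: the name match_within_tol is hypothetical, the inputs are converted to plain 2-D arrays with np.asarray, and the comparison is strict (<), as in the code above.

```python
import numpy as np

def match_within_tol(a, b, tol):
    """Return (rows of a, rows of b) whose columns all differ by less than tol.

    Hypothetical helper name; a and b are anything np.asarray turns into
    2-D arrays with the same number of columns.
    """
    a = np.asarray(a)
    b = np.asarray(b)
    tol = np.asarray(tol)
    # a[:, None, :] has shape (na, 1, ncols); b[None, :, :] has shape (1, nb, ncols).
    # Broadcasting yields every pairwise difference, shape (na, nb, ncols).
    diffs = np.abs(a[:, None, :] - b[None, :, :])
    # A pair of rows matches only if every column is inside its tolerance.
    mask = (diffs < tol).all(axis=2)
    return np.nonzero(mask)

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])
x1, x2 = match_within_tol(a, b, [1, 2, 10])
print(x1, x2)  # [2 3] [1 3]
```

The intermediate diffs array holds na * nb * ncols values, so its memory use grows with the product of the two row counts.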

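Note: for large datasets, since the number of columns is small (3), a loop of 3 iterations, one per column, avoids building the 3D array of differences and uses less memory: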

na = a.shape[0]
nb = b.shape[0]
accum = np.ones((na,nb),dtype=bool)
for i in range(a.shape[1]):
    accum &=  np.abs((a[:,i] - b[:,i].ravel())) < tol[i]
x1,x2 = np.nonzero(accum)
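
The column loop above relies on np.matrix slicing, where a[:,i] stays a column and b[:,i].ravel() becomes a row. For plain NumPy arrays those slices are 1-D, so the shapes have to be spelled out. A minimal sketch of the same idea, with the hypothetical name match_by_column:

```python
import numpy as np

def match_by_column(a, b, tol):
    """Column-by-column variant; the function name is hypothetical."""
    a = np.asarray(a)
    b = np.asarray(b)
    # Start with every (row of a, row of b) pair marked as a match.
    accum = np.ones((a.shape[0], b.shape[0]), dtype=bool)
    for i in range(a.shape[1]):
        # (na, 1) minus (1, nb) broadcasts to an (na, nb) table of differences
        # for column i; pairs outside the tolerance are switched off.
        accum &= np.abs(a[:, i, None] - b[None, :, i]) < tol[i]
    return np.nonzero(accum)

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])
x1, x2 = match_by_column(a, b, [1, 2, 10])
print(x1, x2)  # [2 3] [1 3]
```

Only (na, nb) arrays are alive at any time here, rather than one (na, nb, ncols) array.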

Source: https://habr.com/ru/post/1614386/

