Get intersecting rows across two NumPy 2D arrays

Question


I want to get the intersecting (common) rows of two 2D NumPy arrays. For example, if the following arrays are passed as input:

    array([[1, 4], [2, 5], [3, 6]])
    array([[1, 4], [3, 6], [7, 8]])

the output should be:

    array([[1, 4], [3, 6]])

I know how to do this with loops. I am looking for the Pythonic / NumPy way to do it.


python numpy

Karthik Nov 29 '11 at 20:07


7 answers

You can use Python sets:

    >>> import numpy as np
    >>> A = np.array([[1,4],[2,5],[3,6]])
    >>> B = np.array([[1,4],[3,6],[7,8]])
    >>> aset = set([tuple(x) for x in A])
    >>> bset = set([tuple(x) for x in B])
    >>> np.array([x for x in aset & bset])
    array([[1, 4],
           [3, 6]])

As Rob Cowie notes, this can be written more concisely:

 np.array([x for x in set(tuple(x) for x in A) & set(tuple(x) for x in B)])

There is probably a way to do this without converting the arrays to tuples, but it is not coming to me right now. Note also that sets are unordered, so the rows of the result can come out in any order.
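If the row order of A matters, a small variation of the set idea can help. This is a sketch; common_rows_in_order is a hypothetical name, not a NumPy function. It builds the set from B only once and uses it to filter A, so the result keeps A's original order and any duplicate rows A contains:

```python
import numpy as np

def common_rows_in_order(A, B):
    # Hypothetical helper: set of B's rows, built once
    bset = set(map(tuple, B))
    # Boolean mask over A's rows: True where the row also occurs in B
    mask = np.array([tuple(row) in bset for row in A], dtype=bool)
    # Indexing with the mask keeps A's order, dtype and duplicates
    return A[mask]

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])
print(common_rows_in_order(A, B))
```

Unlike aset & bset, this does not de-duplicate A, which may or may not be what you want.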


mtrw Nov 29 '11 at 20:17


I couldn't understand why no pure NumPy way had been proposed, so I found one that uses NumPy broadcasting. The main idea is to turn one of the arrays into a 3D array by swapping its axes. First, build two arrays:

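    import numpy as np

    a = np.random.randint(10, size=(5, 3))
    b = np.zeros_like(a)
    # fill the first 4 rows of b with randomly chosen rows of a; the last row stays zero
    b[:4, :] = a[np.random.randint(a.shape[0], size=4), :]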

On my run, this gave:

    a = array([[5, 6, 3],
               [8, 1, 0],
               [2, 1, 4],
               [8, 0, 6],
               [6, 7, 6]])
    b = array([[2, 1, 4],
               [2, 1, 4],
               [6, 7, 6],
               [5, 6, 3],
               [0, 0, 0]])

The steps (the roles of the two arrays can be swapped):

    # a is n x m and b is k x m
    # a[:, :, None] is n x m x 1; swapping axes 1 and 2 gives n x 1 x m, so the
    # comparison with b broadcasts to n x k x m, where c[i, j, :] holds the
    # element-wise comparison of a[i, :] and b[j, :]
    c = np.swapaxes(a[:, :, None], 1, 2) == b
    # Reduce to n x k with a product: c[i, j] is 1 only if a[i] equals b[j]
    c = np.prod(c, axis=2)
    # To get around duplicates: //
    # cumulative sum along axis 0 (over the rows of a); multiplied by c, a
    # value of 1 marks the first row of a that matches a given row of b
    c = c * np.cumsum(c, axis=0)
    # compare with 1, so that each matched row of b yields only one True
    c = c == 1
    # //
    # sum along axis 1 (over the rows of b) to get a length-n vector
    c = np.sum(c, axis=1).astype(bool)
    # The intersection of a and b is a[c]
    result = a[c]

As a two-line function, to reduce memory usage (correct me if I'm wrong):

    def array_row_intersection(a, b):
        tmp = np.prod(np.swapaxes(a[:, :, None], 1, 2) == b, axis=2)
        return a[np.sum(np.cumsum(tmp, axis=0) * tmp == 1, axis=1).astype(bool)]

which, for my example, gives:

 result=array([[5, 6, 3], [2, 1, 4], [6, 7, 6]])
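The same broadcasting idea can be sketched with Boolean reductions (.all and .any) in place of np.prod and np.sum. This is a variation, not the answer's own code; row_intersection_bool and its keep_duplicates flag are hypothetical names, and the optional branch reproduces the cumsum trick from the "//" steps:

```python
import numpy as np

def row_intersection_bool(a, b, keep_duplicates=False):
    # Hypothetical variant: a[:, None, :] is n x 1 x m and b[None, :, :] is
    # 1 x k x m, so the comparison broadcasts to n x k x m.
    # eq[i, j] is True when row a[i] equals row b[j]
    eq = (a[:, None, :] == b[None, :, :]).all(axis=2)
    if not keep_duplicates:
        # For each row of b, keep only the first row of a that matches it:
        # the running count along axis 0 equals 1 there
        eq = eq & (np.cumsum(eq, axis=0) == 1)
    # A row of a is kept if it was marked for at least one row of b
    return a[eq.any(axis=1)]

a = np.array([[5, 6, 3], [8, 1, 0], [2, 1, 4], [8, 0, 6], [6, 7, 6]])
b = np.array([[2, 1, 4], [2, 1, 4], [6, 7, 6], [5, 6, 3], [0, 0, 0]])
print(row_intersection_bool(a, b))
```

Boolean reductions avoid the integer products, but the n x k x m intermediate is still created, so memory use is the same order as in the original.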

This can be faster than the other solutions, since it uses only simple NumPy operations and reduces the dimensionality step by step, but it is not always the fastest: which approach wins depends on the shapes of the arrays (see the benchmark below). Note also that the comparison builds an intermediate n x k x m array, so memory use grows with the product of the two row counts. I may be mistaken in some of my comments, as I arrived at this answer through experiments and intuition.

The equivalent for intersecting columns can be obtained either by transposing the arrays or by slightly changing the steps. If duplicates should be kept, skip the steps between the "//" markers. The function can also be modified to return only the Boolean mask, which came in handy when I needed the indices at which two arrays hold the same vector.

Benchmark of the voted answer against mine (the number of elements along each dimension determines which one to choose):
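The two modifications mentioned above can be sketched as follows, assuming the duplicates-kept behaviour (that is, with the "//" steps skipped) is what you want; row_match_mask is a hypothetical helper name. It returns only the Boolean mask, from which np.nonzero gives indices, and intersects columns by transposing both arrays:

```python
import numpy as np

def row_match_mask(a, b):
    # Hypothetical helper: True where row a[i] equals some row of b.
    # All duplicate rows of a are marked (the "//" steps are skipped).
    return (a[:, None, :] == b[None, :, :]).all(axis=2).any(axis=1)

a = np.array([[5, 6, 3], [8, 1, 0], [2, 1, 4], [8, 0, 6], [6, 7, 6]])
b = np.array([[2, 1, 4], [2, 1, 4], [6, 7, 6], [5, 6, 3], [0, 0, 0]])

mask = row_match_mask(a, b)
row_indices = np.nonzero(mask)[0]  # indices of the rows of a found in b
common_rows = a[mask]

# Columns: transpose both arrays so that their columns become rows
a2 = np.array([[1, 2, 3], [4, 5, 6]])
b2 = np.array([[3, 9], [6, 9]])
col_mask = row_match_mask(a2.T, b2.T)
common_cols = a2[:, col_mask]
print(row_indices, common_cols)
```

For the column case, both arrays need the same number of rows, just as the row case needs the same number of columns.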

code:

    import timeit
    import numpy as np

    # array_row_intersection is the function defined above

    def voted_answer(A, B):
        nrows, ncols = A.shape
        dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
                 'formats': ncols * [A.dtype]}
        C = np.intersect1d(A.view(dtype), B.view(dtype))
        return C.view(A.dtype).reshape(-1, ncols)

    # each b_* holds rows of the matching a_* picked at random (with repetition)
    a_small = np.random.randint(10, size=(10, 10))
    b_small = a_small[np.random.randint(a_small.shape[0], size=[a_small.shape[0]]), :]
    a_big_row = np.random.randint(10, size=(10, 1000))
    b_big_row = a_big_row[np.random.randint(a_big_row.shape[0], size=[a_big_row.shape[0]]), :]
    a_big_col = np.random.randint(10, size=(1000, 10))
    b_big_col = a_big_col[np.random.randint(a_big_col.shape[0], size=[a_big_col.shape[0]]), :]
    a_big_all = np.random.randint(10, size=(100, 100))
    b_big_all = a_big_all[np.random.randint(a_big_all.shape[0], size=[a_big_all.shape[0]]), :]

    cases = [('Small arrays', a_small, b_small),
             ('Big column arrays', a_big_col, b_big_col),
             ('Big row arrays', a_big_row, b_big_row),
             ('Big arrays', a_big_all, b_big_all)]
    for name, a, b in cases:
        print(name + ':')
        print('\t Voted answer:', timeit.timeit(lambda: voted_answer(a, b), number=100) / 100)
        print('\t Proposed answer:', timeit.timeit(lambda: array_row_intersection(a, b), number=100) / 100)

with the results (average seconds per call):

    Small arrays:
         Voted answer: 7.47108459473e-05
         Proposed answer: 2.47001647949e-05
    Big column arrays:
         Voted answer: 0.00198730945587
         Proposed answer: 0.0560171294212
    Big row arrays:
         Voted answer: 0.00500325918198
         Proposed answer: 0.000308241844177
    Big arrays:
         Voted answer: 0.000864889621735
         Proposed answer: 0.00257176160812

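The verdict: if you need to intersect two large 2D arrays of low-dimensional points (many rows, few columns), use the voted answer. If the matrices are large in all dimensions, the voted answer is also the best choice. The proposed approach was faster only for the small arrays and for the arrays with few rows and many columns, so which one to choose depends on the shapes involved each time.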


Vasilis Lemonidis Nov 15 '16 at 2:18


Another way to achieve this is with a structured array:

    >>> a = np.array([[3, 1, 2], [5, 8, 9], [7, 4, 3]])
    >>> b = np.array([[2, 3, 0], [3, 1, 2], [7, 4, 3]])
    >>> av = a.view([('', a.dtype)] * a.shape[1]).ravel()
    >>> bv = b.view([('', b.dtype)] * b.shape[1]).ravel()
    >>> np.intersect1d(av, bv).view(a.dtype).reshape(-1, a.shape[1])
    array([[3, 1, 2],
           [7, 4, 3]])

Just for clarity, the structured view looks like this:

    >>> a.view([('', a.dtype)] * a.shape[1])
    array([[(3, 1, 2)],
           [(5, 8, 9)],
           [(7, 4, 3)]],
          dtype=[('f0', '<i8'), ('f1', '<i8'), ('f2', '<i8')])
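On NumPy 1.13 or later, np.unique accepts an axis argument, so the structured view can be avoided. The sketch below (intersect_rows_unique is a hypothetical name) de-duplicates the rows of each array, concatenates them and keeps the rows that occur twice, which is essentially the idea intersect1d applies to 1-D data. Like the structured-array version, it returns the common rows sorted lexicographically:

```python
import numpy as np

def intersect_rows_unique(A, B):
    # Hypothetical helper; assumes A and B have the same number of columns.
    # Unique rows of each array, stacked: a row appears twice only if it is in both
    both = np.concatenate([np.unique(A, axis=0), np.unique(B, axis=0)])
    rows, counts = np.unique(both, axis=0, return_counts=True)
    return rows[counts > 1]

a = np.array([[3, 1, 2], [5, 8, 9], [7, 4, 3]])
b = np.array([[2, 3, 0], [3, 1, 2], [7, 4, 3]])
print(intersect_rows_unique(a, b))
```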


Ram Kumar Karn Nov 30 '11 at 9:13


    np.array(list(set(map(tuple, a)) & set(map(tuple, b))))

This also works, although sets are unordered, so the row order of the result is not guaranteed.


Espoir murhabazi Feb 05 '18 at 18:40


    import numpy as np

    A = np.array([[1,4],[2,5],[3,6]])
    B = np.array([[1,4],[3,6],[7,8]])

    def matching_rows(A, B):
        # indices of the rows of B that appear somewhere in A
        matches = [i for i in range(B.shape[0]) if np.any(np.all(A == B[i], axis=1))]
        if len(matches) == 0:
            return B[matches]
        return np.unique(B[matches], axis=0)

    >>> matching_rows(A, B)
    array([[1, 4],
           [3, 6]])

This, of course, assumes that all rows have the same length.


John blackmore Aug 2 '19 at 0:54


    import numpy as np

    A = np.array([[1, 4], [2, 5], [3, 6]])
    B = np.array([[1, 4], [3, 6], [7, 8]])

    # for each row of A, check whether it equals any row of B
    intersectingRows = [(B == irow).all(axis=1).any() for irow in A]
    print(A[intersectingRows])


SzorgosDiák Aug 28 '19 at 18:07


Joe Kington · Accepted Answer · 2011-11-29T20:37:59+0000

For short arrays, using sets is probably the clearest and most readable way to do it.

Another way is to use numpy.intersect1d. You have to trick it into treating each row as a single value, though, which makes things a little less readable:

    import numpy as np

    A = np.array([[1,4],[2,5],[3,6]])
    B = np.array([[1,4],[3,6],[7,8]])

    nrows, ncols = A.shape
    dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
             'formats': ncols * [A.dtype]}

    C = np.intersect1d(A.view(dtype), B.view(dtype))

    # This last bit is optional if you're okay with "C" being a structured array...
    C = C.view(A.dtype).reshape(-1, ncols)

For large arrays, this should be significantly faster than using sets.
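One caveat with the view trick: viewing an array with a wider dtype requires its last axis to be contiguous, so it raises an error on, for example, a transposed array; B also has to share A's dtype for the fields to line up. The following is a hedged sketch of a wrapper that guards against both (intersect_rows is a hypothetical name):

```python
import numpy as np

def intersect_rows(A, B):
    # Hypothetical wrapper around the structured-view trick.
    # .view() with a wider dtype needs a contiguous last axis, so copy if
    # necessary; B is cast to A's dtype so the field layouts match.
    A = np.ascontiguousarray(A)
    B = np.ascontiguousarray(B, dtype=A.dtype)
    nrows, ncols = A.shape
    dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
             'formats': ncols * [A.dtype]}
    C = np.intersect1d(A.view(dtype), B.view(dtype))
    return C.view(A.dtype).reshape(-1, ncols)

A = np.array([[1, 4], [2, 5], [3, 6]])
B = np.array([[1, 4], [3, 6], [7, 8]])
At = np.array([[1, 2, 3], [4, 5, 6]]).T  # same rows as A, but not contiguous
print(intersect_rows(At, B))
```

np.ascontiguousarray returns its input unchanged when no copy or cast is needed, so the guard costs little for already-contiguous arrays.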
