I could not understand why no pure numpy way to do this had been proposed, so I found one that uses numpy broadcasting. The main idea is to turn one of the arrays into a 3d array by swapping its axes. Let's build 2 arrays:
    import numpy as np

    a = np.random.randint(10, size=(5, 3))
    b = np.zeros_like(a)
    b[:4, :] = a[np.random.randint(a.shape[0], size=4), :]
On my run, it gave:
    a = array([[5, 6, 3],
               [8, 1, 0],
               [2, 1, 4],
               [8, 0, 6],
               [6, 7, 6]])
    b = array([[2, 1, 4],
               [2, 1, 4],
               [6, 7, 6],
               [5, 6, 3],
               [0, 0, 0]])
Steps (arrays can be used interchangeably):
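The steps can be sketched as follows. This is a reconstruction from the two-line function in this answer, not the original step list; the intermediate names `c`, `first` and `mask` are mine, and the `//` comments are placed where the duplicate handling appears to sit, which is an assumption.

```python
import numpy as np

# Example arrays copied from the run shown in this answer.
a = np.array([[5, 6, 3], [8, 1, 0], [2, 1, 4], [8, 0, 6], [6, 7, 6]])
b = np.array([[2, 1, 4], [2, 1, 4], [6, 7, 6], [5, 6, 3], [0, 0, 0]])

# a is (n, m) and b is (k, m).
# 1. Reshape a to (n, 1, m) so that the comparison broadcasts to (n, k, m).
c = np.swapaxes(a[:, :, None], 1, 2) == b
# 2. Collapse the last axis: c[i, j] is 1 when a[i] equals b[j] element-wise.
c = np.prod(c, axis=2)
# // duplicate handling starts (assumed placement of the markers)
# 3. Running count of matches over the rows of a, zeroed where there is no match.
first = np.cumsum(c, axis=0) * c
#    Only the first row of a that matches a given row of b has a count of 1.
c = first == 1
# // duplicate handling ends
# 4. Keep a row of a if it is the first match for at least one row of b.
mask = np.sum(c, axis=1).astype(bool)
result = a[mask]
print(result)  # rows [5 6 3], [2 1 4] and [6 7 6] for these arrays
```

Skipping step 3 leaves every matching row of `a` in the mask, repeated rows included.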
As a two-line function, to reduce memory usage (correct me if I am wrong):
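    def array_row_intersection(a, b):
        # tmp[i, j] is 1 when row i of a equals row j of b
        tmp = np.prod(np.swapaxes(a[:, :, None], 1, 2) == b, axis=2)
        # keep a row of a only where it is the first match for some row of b
        return a[np.sum(np.cumsum(tmp, axis=0) * tmp == 1, axis=1).astype(bool)]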
which gave the result for my example:
    result = array([[5, 6, 3],
                    [2, 1, 4],
                    [6, 7, 6]])
This can be faster than the given solutions, since it uses only simple numpy operations and reduces the number of dimensions at every step, which makes it a good fit for large matrices of the right shape. I may be mistaken in my comments, as I arrived at the answer through experiments and instinct. The equivalent for column intersection can be found either by transposing the arrays or by slightly changing the steps. Also, if duplicates are wanted, the steps between the "//" markers (the cumulative-sum comparison that keeps only the first match) should be skipped. The function can be edited to return only the boolean index array, which came in handy when I needed to index several different arrays with the same mask.

Benchmark for the voted answer and mine (the number of elements in each dimension determines which one to choose):
code:
    import timeit
    import numpy as np

    def voted_answer(A, B):
        nrows, ncols = A.shape
        # view each row as a single structured element so intersect1d compares whole rows
        dtype = {'names': ['f{}'.format(i) for i in range(ncols)],
                 'formats': ncols * [A.dtype]}
        C = np.intersect1d(A.view(dtype), B.view(dtype))
        return C.view(A.dtype).reshape(-1, ncols)

    # b_* is built by sampling rows of a_* with replacement
    a_small = np.random.randint(10, size=(10, 10))
    b_small = np.zeros_like(a_small)
    b_small = a_small[np.random.randint(a_small.shape[0], size=[a_small.shape[0]]), :]
    a_big_row = np.random.randint(10, size=(10, 1000))
    b_big_row = a_big_row[np.random.randint(a_big_row.shape[0], size=[a_big_row.shape[0]]), :]
    a_big_col = np.random.randint(10, size=(1000, 10))
    b_big_col = a_big_col[np.random.randint(a_big_col.shape[0], size=[a_big_col.shape[0]]), :]
    a_big_all = np.random.randint(10, size=(100, 100))
    b_big_all = a_big_all[np.random.randint(a_big_all.shape[0], size=[a_big_all.shape[0]]), :]

    print('Small arrays:')
    print('\t Voted answer:', timeit.timeit(lambda: voted_answer(a_small, b_small), number=100) / 100)
    print('\t Proposed answer:', timeit.timeit(lambda: array_row_intersection(a_small, b_small), number=100) / 100)
    print('Big column arrays:')
    print('\t Voted answer:', timeit.timeit(lambda: voted_answer(a_big_col, b_big_col), number=100) / 100)
    print('\t Proposed answer:', timeit.timeit(lambda: array_row_intersection(a_big_col, b_big_col), number=100) / 100)
    print('Big row arrays:')
    print('\t Voted answer:', timeit.timeit(lambda: voted_answer(a_big_row, b_big_row), number=100) / 100)
    print('\t Proposed answer:', timeit.timeit(lambda: array_row_intersection(a_big_row, b_big_row), number=100) / 100)
    print('Big arrays:')
    print('\t Voted answer:', timeit.timeit(lambda: voted_answer(a_big_all, b_big_all), number=100) / 100)
    print('\t Proposed answer:', timeit.timeit(lambda: array_row_intersection(a_big_all, b_big_all), number=100) / 100)
with the results (seconds per call):
    Small arrays:
         Voted answer: 7.47108459473e-05
         Proposed answer: 2.47001647949e-05
    Big column arrays:
         Voted answer: 0.00198730945587
         Proposed answer: 0.0560171294212
    Big row arrays:
         Voted answer: 0.00500325918198
         Proposed answer: 0.000308241844177
    Big arrays:
         Voted answer: 0.000864889621735
         Proposed answer: 0.00257176160812
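One possible way to read these timings is to count how many elements the broadcast comparison creates, since the proposed function builds an intermediate array of shape (n, k, m) for inputs of shape (n, m) and (k, m). The helper `broadcast_cells` below is hypothetical and only does the arithmetic for the four benchmark shapes; it is a rough guide, not a performance model.

```python
def broadcast_cells(a_shape, b_shape):
    """Number of elements in the (n, k, m) comparison array."""
    n, m = a_shape
    k, _ = b_shape
    return n * k * m

# Shapes used in the benchmark; a and b have the same shape in every case.
shapes = {
    "small": (10, 10),
    "big column": (1000, 10),
    "big row": (10, 1000),
    "big all": (100, 100),
}
for name, shape in shapes.items():
    print(name, broadcast_cells(shape, shape))
# small 1000
# big column 10000000
# big row 100000
# big all 1000000
```

The ordering of these counts matches the ordering of the proposed answer's timings, with the many-row case being by far the largest.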
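The verdict is that if you need to compare 2 large 2d arrays of 2d points (many rows, few columns), use the voted answer. If your matrices are large in all dimensions, the voted answer is also the best choice. The proposed answer was faster for the small arrays and for arrays with few, long rows. So the choice depends on the shape of your arrays each time.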
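The variants mentioned earlier (returning only the boolean mask, keeping duplicates, and intersecting columns by transposing) could look like the sketch below. The names `row_intersection_mask`, `keep_duplicates` and `column_intersection` are hypothetical, and `a[:, None, :]` with `.all()` and `.any()` stands in for the `swapaxes`, `prod` and `sum` calls of the function in this answer.

```python
import numpy as np

def row_intersection_mask(a, b, keep_duplicates=False):
    """Boolean mask over the rows of a that also occur as rows of b."""
    # (n, 1, m) == (k, m) broadcasts to (n, k, m); all() collapses it to (n, k).
    eq = (a[:, None, :] == b).all(axis=2)
    if keep_duplicates:
        # Every row of a that matches any row of b, repeated rows included.
        return eq.any(axis=1)
    # Keep only the first row of a that matches each row of b.
    first = (np.cumsum(eq, axis=0) * eq) == 1
    return first.any(axis=1)

def column_intersection(a, b):
    """Columns of a that also occur as columns of b, via transposition."""
    return a.T[row_intersection_mask(a.T, b.T)].T

# Small illustrative inputs (not from the benchmark).
a = np.array([[1, 2], [1, 2], [3, 4]])
b = np.array([[1, 2], [5, 6]])
print(row_intersection_mask(a, b))                        # [ True False False]
print(row_intersection_mask(a, b, keep_duplicates=True))  # [ True  True False]
print(column_intersection(a.T, b.T))                      # one column: [[1], [2]]
```

The same mask can then index any other array that has one entry per row of `a`.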