Compare large arrays of arrays

I have a numpy array A of n 1x3 arrays, where n is the total number of possible 1x3 combinations in which each element ranges from 0 to 50. That is

 A = [[0,0,0],[0,0,1]...[0,1,0]...[50,50,50]]

and

 len(A) = 50*50*50 = 125000

I also have a numpy array B of m 1x3 arrays, where m = 10 million, and each of these arrays can take any of the values in the set described by A.

I want to count how many times each combination is present in B, that is, how many times [0,0,0] appears in B, how many times [0,0,1] appears ... how many times [50,50,50] appears. So far I have the following:

for i in range(len(A)):
    for j in range(len(B)):
        if np.array_equal(A[i], B[j]):
            y[i] += 1

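where y keeps track of how many times the ith array occurs. So y[0] is how many times [0,0,0] appeared in B, y[1] is how many times [0,0,1] appeared ... y[124999] is how many times [50,50,50] appeared, and so on.

As you can imagine, this takes a very long time: each of the 10 million rows is compared against 125000 combinations. Is there a faster way?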


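You can use a dict() to speed this up, so that the work is reduced to a single pass over the 10 million rows of B.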

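First, convert the rows of A to tuples, because arrays and lists are not hashable and so cannot be used as keys in a dict.

Convert each row like this: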

A = [tuple(i) for i in A]

Then create a dict() with every combination as a key and every value initialised to 0.

d = {i:0 for i in A}

Then, for each subarray in your numpy array B, increment d[tuple(subarray)] by 1

for subarray in B:
    d[tuple(subarray)] += 1

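d is now a dictionary in which each key is a combination and each value is the number of times that combination appears in B.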
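The steps of this answer can be put together as a small runnable sketch. The array sizes here are hypothetical stand-ins (values 0 to 2 and 1000 rows instead of 0 to 49 and 10 million rows), and the use of tolist() before hashing is an assumption made for speed, not something the answer prescribes.

```python
import numpy as np
from itertools import product

# Hypothetical small stand-ins for the arrays in the question.
rng = np.random.default_rng(0)
A = np.array(list(product(range(3), repeat=3)))  # all 27 combinations
B = rng.integers(0, 3, size=(1000, 3))

# One dict entry per combination, initialised to 0.
d = {tuple(row): 0 for row in A.tolist()}

# Single pass over B. tolist() yields plain Python ints, which hash
# equal to the ints used for the keys above.
for row in B.tolist():
    d[tuple(row)] += 1

# Counts in the same order as A, matching y in the question.
y = [d[tuple(row)] for row in A.tolist()]
```

The final list comprehension is only needed if the counts must line up with the rows of A; otherwise d can be queried directly, for example d[(0, 0, 1)].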


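You can get the unique rows of B and their counts by using np.unique with return_counts=True. Then, to find which of those rows also exist in A, you can use broadcasting together with the ndarray.all and ndarray.any methods. The steps are: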

In [82]: unique, counts = np.unique(B, axis=0, return_counts=True)

In [83]: indices = np.where((unique == A[:,None,:]).all(axis=2).any(axis=0))[0]

# Get items from A that exist in B
In [84]: unique[indices]

# Get the counts 
In [85]: counts[indices]

Demo:

In [86]: arr = np.array([[2 ,3, 4], [5, 6, 0], [2, 3, 4], [1, 0, 4], [3, 3, 3], [5, 6, 0], [2, 3, 4]])

In [87]: a = np.array([[2, 3, 4], [1, 9, 5], [3, 3, 3]])

In [88]: unique, counts = np.unique(arr, axis=0, return_counts=True)

In [89]: indices = np.where((unique == a[:,None,:]).all(axis=2).any(axis=0))[0]

In [90]: unique[indices]
Out[90]: 
array([[2, 3, 4],
       [3, 3, 3]])

In [91]: counts[indices]
Out[91]: array([3, 1])
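The session above returns counts only for the rows of A that occur in B. If the counts are needed in the order of A, with zeros for absent rows, one possible sketch is shown below. The position dict and the loop are illustrative additions, not part of the answer; they also avoid the broadcast comparison, whose intermediate boolean array has shape (len(A), len(unique), 3) and can become very large for 125000 combinations.

```python
import numpy as np

arr = np.array([[2, 3, 4], [5, 6, 0], [2, 3, 4], [1, 0, 4],
                [3, 3, 3], [5, 6, 0], [2, 3, 4]])
a = np.array([[2, 3, 4], [1, 9, 5], [3, 3, 3]])

unique, counts = np.unique(arr, axis=0, return_counts=True)

# Hypothetical helper: position of each row of a, keyed by its tuple.
position = {tuple(row): i for i, row in enumerate(a.tolist())}

# Start from zeros so rows of a that never occur in arr keep a count of 0.
y = np.zeros(len(a), dtype=counts.dtype)
for row, count in zip(unique.tolist(), counts.tolist()):
    i = position.get(tuple(row))
    if i is not None:
        y[i] = count

print(y)  # [3 0 1]
```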

y=[np.where(np.all(B==arr,axis=1))[0].shape[0] for arr in A]

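For each arr in A, np.all checks which rows of B are equal to it, np.where returns the indices of those rows, and the shape of that index array gives the number of matches.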
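A small check of this one-liner, using hypothetical toy arrays. The second variant, which sums the boolean mask, is a swapped-in equivalent rather than the answer's own code. Note that either form still scans all of B once per row of A, so the total work remains proportional to len(A) * len(B), only vectorised.

```python
import numpy as np

# Hypothetical toy data.
B = np.array([[2, 3, 4], [5, 6, 0], [2, 3, 4], [1, 0, 4],
              [3, 3, 3], [5, 6, 0], [2, 3, 4]])
A = np.array([[2, 3, 4], [1, 9, 5], [3, 3, 3]])

# The one-liner from the answer: count the indices of matching rows.
y_where = [np.where(np.all(B == arr, axis=1))[0].shape[0] for arr in A]

# Equivalent variant: True counts as 1 when the mask is summed.
y_sum = [int(np.all(B == arr, axis=1).sum()) for arr in A]

print(y_where)  # [3, 0, 1]
print(y_sum)    # [3, 0, 1]
```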


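Here is a fast approach. It handles 10 million rows over range(50)^3 in a fraction of a second and is roughly 100 times faster than the next best solution (@Primusa's):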

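First, note that the rows of A can be translated to the integers 0 to 50^3 - 1 and back. (A lists the combinations in exactly this order, so the integer is simply the row's index in A.) You can use np.ravel_multi_index and np.unravel_index for the conversion.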

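Once B has been translated to integers, they can be counted very efficiently with np.bincount. Below, the result is reshaped to 50x50x50 so that it can be indexed directly by a combination. (I am assuming the values go from 0 to 49, since len(A) is 125000):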

>>> B = np.random.randint(0, 50, (10000000, 3))
>>> Br = np.ravel_multi_index(B.T, (50, 50, 50))
>>> result = np.bincount(Br, minlength=125000).reshape(50, 50, 50)
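The question lists [50,50,50] as a valid row, which does not agree with len(A) = 125000. If the values really do run from 0 to 50 inclusive, the shape passed to np.ravel_multi_index has to be (51, 51, 51). The sketch below uses two hypothetical rows to show that, with the default mode='raise', an out-of-range value is rejected rather than silently miscounted.

```python
import numpy as np

# Two hypothetical rows, one of them using the value 50.
B = np.array([[0, 0, 0], [50, 50, 50]])

try:
    np.ravel_multi_index(B.T, (50, 50, 50))
    out_of_range = False
except ValueError:
    # Default mode='raise' rejects coordinates outside the given shape.
    out_of_range = True

# With 51 values per axis there are 51**3 = 132651 combinations.
Br = np.ravel_multi_index(B.T, (51, 51, 51))
result = np.bincount(Br, minlength=51**3).reshape(51, 51, 51)
```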

A smaller example:

>>> B = np.random.randint(0, 3, (10, 3))
>>> Br = np.ravel_multi_index(B.T, (3, 3, 3))
>>> result = np.bincount(Br, minlength=27).reshape(3, 3, 3)
>>> 
>>> B
array([[1, 1, 2],
       [2, 1, 2],
       [2, 0, 0],
       [2, 1, 0],
       [2, 0, 2],
       [0, 0, 2],
       [0, 0, 2],
       [0, 2, 2],
       [2, 0, 0],
       [0, 2, 0]])
>>> result
array([[[0, 0, 2],
        [0, 0, 0],
        [1, 0, 1]],

       [[0, 0, 0],
        [0, 0, 1],
        [0, 0, 0]],

       [[2, 0, 1],
        [1, 0, 1],
        [0, 0, 0]]])

For example, to look up how many times [2,1,0] occurs in B, you would do

>>> result[2,1,0]
1

As mentioned above: to convert between indices into A and the actual rows of A (which are the indices into my result), you can use np.ravel_multi_index and np.unravel_index. Or you can leave out the final reshape (i.e. use result = np.bincount(Br, minlength=125000)); then the counts are indexed exactly the same way as A.
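A minimal sketch of that round trip, assuming a hypothetical range of 3 values per element instead of 50 so that the arrays stay small. Building A from np.unravel_index is an illustrative choice; the question does not say how A was built.

```python
import numpy as np

n = 3  # hypothetical small range; the question uses 50
shape = (n, n, n)

# Rows of A in the order used in the question: [0,0,0], [0,0,1], ...
A = np.stack(np.unravel_index(np.arange(n**3), shape), axis=1)

# Going the other way recovers each row's position in A: 0, 1, ..., n**3 - 1.
flat = np.ravel_multi_index(A.T, shape)

rng = np.random.default_rng(0)
B = rng.integers(0, n, size=(1000, 3))

# Flat counts without the reshape: y[i] is how often A[i] occurs in B.
y = np.bincount(np.ravel_multi_index(B.T, shape), minlength=n**3)

# The reshaped form is indexed by the combination itself.
result = y.reshape(shape)
```

With these names, y[5] and result[tuple(A[5])] refer to the same count.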


Source: https://habr.com/ru/post/1695946/

