Find the most common row or vector matrix mode - Python / NumPy

Question


I have a NumPy array of shape (?, n) that represents a collection of n-dimensional vectors.

I want to find the most frequent row.

The best way still seems to be iterating over all the rows and keeping a count, but it seems odd that NumPy or SciPy would not have something built in for this task.
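
For reference, the iterate-and-count baseline described above could look roughly like the sketch below. The function name mode_rows_counter and the sample array are made up for illustration; this is just one plain-Python way to do the counting.

```python
from collections import Counter

import numpy as np


def mode_rows_counter(arr):
    """Return the most frequent row of a 2-D array by counting row tuples."""
    # Rows are converted to tuples so they can serve as hashable Counter keys.
    counts = Counter(map(tuple, arr))
    row, _ = counts.most_common(1)[0]
    return np.array(row, dtype=arr.dtype)


# Hypothetical sample data: the row [1, 2] appears twice.
arr = np.array([[1, 2], [3, 4], [1, 2], [5, 6]])
print(mode_rows_counter(arr))  # [1 2]
```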

+4

python numpy scipy

dant Apr 22 '17 at 3:12


3 answers

If you can use Pandas, here is one way to do it:

import numpy as np
import pandas as pd

# generate sample data
ncol = 5
nrow = 20000
matrix = np.random.randint(0,ncol,ncol*nrow).reshape(nrow,ncol)
df = pd.DataFrame(matrix)

df.head()
   0  1  2  3  4
0  3  0  4  4  4
1  4  0  0  2  0
2  3  3  2  0  0
3  0  3  4  3  3
4  1  1  3  3  3

# count duplicated rows
(df.groupby(df.columns.tolist())
   .size()
   .sort_values(ascending=False))

Output:

0  1  2  3  4
4  2  2  1  1    17
2  2  4  2  3    16
3  2  1  2  2    15
   1  2  4  3    15
                 ..
4  1  3  0  1     1
1  2  3  0  4     1
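
To pull just the most frequent row out of the grouped counts, something like the following sketch may work. The small matrix and the names counts and most_frequent_row are illustrative and not part of the code above. It assumes the frame has at least two columns, so that the group index is a MultiIndex and idxmax() returns a tuple of row values.

```python
import numpy as np
import pandas as pd

# Small illustrative array (made up for this sketch): [3, 0, 4] appears three times.
matrix = np.array([[3, 0, 4],
                   [1, 1, 3],
                   [3, 0, 4],
                   [3, 0, 4],
                   [1, 1, 3]])
df = pd.DataFrame(matrix)

# Number of occurrences of each distinct row, indexed by the row values.
counts = df.groupby(df.columns.tolist()).size()

# The index label of the largest count is a tuple holding the row values.
most_frequent_row = np.array(counts.idxmax())
print(most_frequent_row, counts.max())
```

This skips the full sort, since only the largest count is needed.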


0

andrew_reece Apr 22 '17 at 4:03

The numpy_indexed package (disclaimer: I am its author) has functionality that does just that, and it works on any number of dimensions:

import numpy_indexed as npi
row = npi.mode(arr)

Under the hood, this is similar to Divakar's solution in terms of algorithm and complexity, with a few more bells and whistles; see the "weights" and "return_indices" kwargs.

0

Eelco hoogendoorn Apr 22 '17 at 7:42


Divakar · Accepted Answer · 2017-04-22T05:37:58+0000

Here's an approach using NumPy views that should be pretty efficient -

import numpy as np

def mode_rows(a):
    # View each row as one opaque (void) item so np.unique compares whole rows
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[1:])))
    # ids: index of the first occurrence of each unique row; count: its frequency
    _, ids, count = np.unique(a.view(void_dt).ravel(),
                              return_index=True, return_counts=True)
    largest_count_id = ids[count.argmax()]
    most_frequent_row = a[largest_count_id]
    return most_frequent_row

Sample run -

In [45]: # Let's create a random array with three identical rows (2, 4, 8) and two
    ...: # identical rows (1, 7). Thus, the most frequent row must be row 2 here.
    ...: a = np.random.randint(0,9,(9,5))
    ...: a[4] = a[8]
    ...: a[2] = a[4]
    ...: 
    ...: a[1] = a[7]
    ...: 

In [46]: a
Out[46]: 
array([[8, 8, 7, 0, 7],
       [7, 8, 2, 6, 1],
       [2, 2, 5, 7, 6],
       [6, 5, 8, 8, 5],
       [2, 2, 5, 7, 6],
       [5, 7, 3, 6, 3],
       [2, 8, 7, 2, 0],
       [7, 8, 2, 6, 1],
       [2, 2, 5, 7, 6]])

In [47]: mode_rows(a)
Out[47]: array([2, 2, 5, 7, 6])
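
As a possible alternative on NumPy 1.13 or later, np.unique accepts an axis argument, so the void-view step could be left to NumPy. The sketch below is not the function above; the name mode_rows_unique is made up, and the array is the one from the sample run -

```python
import numpy as np


def mode_rows_unique(a):
    """Most frequent row via np.unique along axis 0 (needs NumPy 1.13 or later)."""
    # rows: the distinct rows in sorted order; counts: how often each one occurs
    rows, counts = np.unique(a, axis=0, return_counts=True)
    return rows[counts.argmax()]


a = np.array([[8, 8, 7, 0, 7],
              [7, 8, 2, 6, 1],
              [2, 2, 5, 7, 6],
              [6, 5, 8, 8, 5],
              [2, 2, 5, 7, 6],
              [5, 7, 3, 6, 3],
              [2, 8, 7, 2, 0],
              [7, 8, 2, 6, 1],
              [2, 2, 5, 7, 6]])
print(mode_rows_unique(a))  # [2 2 5 7 6]
```

If several rows tie for the highest count, this version returns the one that comes first in the sorted order of the unique rows.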
