Find all maximum indices in Pandas DataFrame

I need to find, for each row of a pandas DataFrame, all the columns where that row's maximum value occurs. For example, if I have a DataFrame like this:

   cat1  cat2  cat3
0     0     2     2
1     3     0     1
2     1     1     0

then the method I'm looking for should return something like:

[['cat2', 'cat3'],
 ['cat1'],
 ['cat1', 'cat2']]

This is a list of lists, but another data structure would also be fine.

I can't use df.idxmax(axis=1) because it returns only the first maximum.
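To see the problem concretely, here is a small check of what idxmax returns on the example frame. This is a sketch assuming a current pandas version: the ties in rows 0 and 2 collapse to their first column.

```python
import pandas as pd

# The example frame from the question
df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})

# idxmax reports only the first column holding each row's maximum,
# so 'cat3' (row 0) and 'cat2' (row 2) are dropped
first_max = df.idxmax(axis=1)
print(first_max.tolist())  # ['cat2', 'cat1', 'cat1']
```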

2 answers

Here is the information in another data structure:

In [8]: df = pd.DataFrame({'cat1':[0,3,1], 'cat2':[2,0,1], 'cat3':[2,1,0]})

In [9]: df
Out[9]: 
   cat1  cat2  cat3
0     0     2     2
1     3     0     1
2     1     1     0

[3 rows x 3 columns]

In [10]: rowmax = df.max(axis=1)

Maximum values are indicated by True values:

In [82]: df.values == rowmax.values[:, None]
Out[82]: 
array([[False,  True,  True],
       [ True, False, False],
       [ True,  True, False]], dtype=bool)
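A pandas-only way to build the same mask, sketched here assuming pandas 1.x or 2.x, is DataFrame.eq with axis=0, which aligns the Series of row maxima with the DataFrame's index. It also sidesteps rowmax[:, None], since recent pandas versions no longer accept NumPy-style multi-dimensional indexing on a Series.

```python
import pandas as pd

df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})

# Compare every row against its own maximum; axis=0 matches the
# row-max Series to the DataFrame's index instead of its columns
mask = df.eq(df.max(axis=1), axis=0)
print(mask)
```

The result is a boolean DataFrame with the same labels as df; mask.to_numpy() gives the plain array shown above.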

np.where returns the indices where the array from In [82] is True.

In [84]: np.where(df.values == rowmax.values[:, None])
Out[84]: (array([0, 0, 1, 2, 2]), array([1, 2, 0, 0, 1]))

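The first array holds the row indices (axis=0) and the second the column indices (axis=1). Each array has 5 entries because there are five True values.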
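If (row, column) pairs are easier to work with than two parallel arrays, np.argwhere returns the same positions stacked as rows. A minimal sketch, assuming the same example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})
vals = df.to_numpy()

# Boolean mask of row maxima, as in the session above
is_max = vals == vals.max(axis=1)[:, None]

# np.argwhere gives one [row, col] pair per True entry, in row-major order
pairs = np.argwhere(is_max)
print(pairs.tolist())  # [[0, 1], [0, 2], [1, 0], [2, 0], [2, 1]]
```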


You can use itertools.groupby to build the list of lists from the question:

In [46]: import itertools as IT

In [47]: import operator

In [48]: idx = np.where(df.values == rowmax.values[:, None])

In [49]: groups = IT.groupby(zip(*idx), key=operator.itemgetter(0))

In [50]: [[df.columns[j] for i, j in grp] for k, grp in groups]
Out[50]: [['cat2', 'cat3'], ['cat1'], ['cat1', 'cat2']]
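Note that itertools.groupby only merges consecutive items with equal keys. It works here because np.where scans the array in row-major order, so each row's hits are adjacent. A dictionary-based sketch (an alternative, not part of the original answer) avoids relying on that ordering and still produces an entry for a row with no hits:

```python
from collections import defaultdict

import numpy as np
import pandas as pd

df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})
vals = df.to_numpy()
rows, cols = np.where(vals == vals.max(axis=1)[:, None])

# Collect column labels per row; unlike itertools.groupby, this does not
# require equal row indices to be adjacent in the input
by_row = defaultdict(list)
for i, j in zip(rows, cols):
    by_row[int(i)].append(df.columns[j])

# Rows that never appear in `rows` get an empty list from the defaultdict
result = [by_row[i] for i in range(len(df))]
print(result)  # [['cat2', 'cat3'], ['cat1'], ['cat1', 'cat2']]
```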

In [2560]: cols = df.columns.values

In [2561]: vals = df.values

In [2562]: [cols[v].tolist() for v in vals == vals.max(1)[:, None]]
Out[2562]: [['cat2', 'cat3'], 
            ['cat1'], 
            ['cat1', 'cat2']]
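The same idea can be wrapped in a small helper that keeps the DataFrame's index. The name all_idxmax is hypothetical (pandas has no such method), and this sketch assumes NaN cells should simply be ignored, which is what DataFrame.max does by default:

```python
import pandas as pd


def all_idxmax(frame: pd.DataFrame) -> pd.Series:
    """Return, per row, the list of column labels holding that row's maximum.

    Hypothetical helper: combines a row-max mask with the boolean
    column selection used in the answer above.
    """
    # DataFrame.max skips NaN by default, and NaN never compares equal,
    # so NaN cells are never reported as maxima
    mask = frame.eq(frame.max(axis=1), axis=0)
    cols = frame.columns.to_numpy()
    return pd.Series([cols[row].tolist() for row in mask.to_numpy()],
                     index=frame.index)


df = pd.DataFrame({'cat1': [0, 3, 1], 'cat2': [2, 0, 1], 'cat3': [2, 1, 0]})
print(all_idxmax(df).tolist())  # [['cat2', 'cat3'], ['cat1'], ['cat1', 'cat2']]
```

Returning a Series rather than a plain list keeps the row labels, which matters when the index is not simply 0..n-1.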

Source: https://habr.com/ru/post/1525787/

