Finding the maximum value in a Python column

Question

I have a pandas data frame (combined_ranking_df) like this:

                Id  Rank                         Activity
0              14035   8.0                         deployed
1              47728   8.0                         deployed
2              24259   1.0                         NaN
3              24259   6.0                         WIP
4              14251   8.0                         deployed
5              14250   1.0                         NaN
6              14250   6.0                         WIP
7              14250   5.0                         NaN
8              14250   5.0                         NaN
9              14250   1.0                         NaN

I am trying to get the row with the maximum Rank for each Id. For example, for 14250 it should be 6.0, and for 24259 it should also be 6.0. The expected result is:

                Id  Rank                         Activity
0              14035   8.0                         deployed
1              47728   8.0                         deployed
3              24259   6.0                         WIP
4              14251   8.0                         deployed
6              14250   6.0                         WIP

I tried combined_ranking_df.groupby(['Id'], sort=False)['Rank'].max(), but the result looked like the first data frame (nothing had changed).

What am I doing wrong?

python pandas group-by pandas-groupby

Adam Jul 12 '17 at 17:27

4 answers

piRSquared · Answer 1 · 2017-07-12T17:32:36+0000

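Option 1
Same idea as @ayhan.
Sort by 'Rank', then use pd.DataFrame.drop_duplicates on 'Id' and keep the last row. After sorting, the last row for each 'Id' is the one with the largest 'Rank'.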

df.sort_values('Rank').drop_duplicates('Id', keep='last')

      Id  Rank  Activity
3  24259   6.0       WIP
6  14250   6.0       WIP
0  14035   8.0  deployed
1  47728   8.0  deployed
4  14251   8.0  deployed

To restore the original row order, append sort_index():

df.sort_values('Rank').drop_duplicates('Id', keep='last').sort_index()

      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
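
For anyone who wants to run Option 1 end to end, here is a minimal sketch. The data frame is rebuilt by hand from the table in the question; the variable name result is only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [14035, 47728, 24259, 24259, 14251,
           14250, 14250, 14250, 14250, 14250],
    "Rank": [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    "Activity": ["deployed", "deployed", np.nan, "WIP", "deployed",
                 np.nan, "WIP", np.nan, np.nan, np.nan],
})

# Sort ascending by Rank so the largest Rank for each Id comes last,
# keep only that last row per Id, then restore the original row order.
result = (
    df.sort_values("Rank")
      .drop_duplicates("Id", keep="last")
      .sort_index()
)
print(result)
```

If several rows of the same Id share the maximum Rank, this keeps only one of them.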

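Option 2
groupby and idxmax
Get the index label of the largest 'Rank' in each group and select those rows. @MaxU's answer is more general: it can return the n largest rows for each 'Id'.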

df.loc[df.groupby('Id', sort=False).Rank.idxmax()]

      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
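
The idxmax approach can be split into two steps to make it easier to see what happens. This is a sketch on the question's data, rebuilt by hand; best_rows and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [14035, 47728, 24259, 24259, 14251,
           14250, 14250, 14250, 14250, 14250],
    "Rank": [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    "Activity": ["deployed", "deployed", np.nan, "WIP", "deployed",
                 np.nan, "WIP", np.nan, np.nan, np.nan],
})

# Step 1: for each Id, find the index label of the row holding the largest Rank.
# sort=False keeps the groups in order of first appearance.
best_rows = df.groupby("Id", sort=False)["Rank"].idxmax()

# Step 2: use those labels to pull the complete rows out of the original frame.
result = df.loc[best_rows]
print(result)
```

Because idxmax returns a single label per group, ties on the maximum Rank yield one row per Id (the first occurrence).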

MaxU · Answer 2 · 2017-07-12T17:30:08+0000

IIUC:

In [40]: df.groupby('Id', as_index=False, sort=False) \
    ...:   .apply(lambda x: x.nlargest(1, ['Rank'])) \
    ...:   .reset_index(level=1, drop=True)
Out[40]:
      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
2  24259   6.0       WIP
3  14251   8.0  deployed
4  14250   6.0       WIP

A variant suggested by @piRSquared:

In [41]: df.groupby('Id', group_keys=False, sort=False) \
    ...:   .apply(pd.DataFrame.nlargest, n=1, columns='Rank')
Out[41]:
      Id  Rank  Activity
0  14035   8.0  deployed
1  47728   8.0  deployed
3  24259   6.0       WIP
4  14251   8.0  deployed
6  14250   6.0       WIP
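
If more than one row per Id is wanted, a different technique from the apply/nlargest calls above is to sort in descending order and take the head of each group. The sketch below assumes n = 2 purely for illustration; top2 is a made-up name.

```python
import pandas as pd

# Id and Rank columns copied from the question (Activity omitted for brevity).
df = pd.DataFrame({
    "Id": [14035, 47728, 24259, 24259, 14251,
           14250, 14250, 14250, 14250, 14250],
    "Rank": [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
})

# Sort so the largest Rank values come first, then keep the first
# two rows of every Id group. Groups with fewer rows are kept whole.
top2 = (
    df.sort_values("Rank", ascending=False)
      .groupby("Id", sort=False)
      .head(2)
)
print(top2)
```

When ranks are tied inside a group (the two 5.0 rows for 14250), which of the tied rows is kept depends on the sort order.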

Diego Aguado · Answer 3 · 2017-07-12T17:29:57+0000

Call max() on the grouped object and select the columns you need:

groups = combined_ranking_df.groupby(['Id'], as_index=False, sort=False).max()[['Id','Rank']]

      Id  Rank
0  14035   8.0
1  47728   8.0
2  24259   6.0
3  14251   8.0
4  14250   6.0

Alexander · Answer 4 · 2017-07-12T17:50:35+0000

You can create a boolean mask that checks whether each row's Rank equals the maximum Rank for its Id. Then use boolean indexing to extract those rows from the data frame.

The mask is created using groupby on Id together with transform, which preserves the original shape of the data frame.

>>> df[(df[['Rank']] == df[['Id', 'Rank']].groupby('Id').transform(max)).squeeze().tolist()]
      Id  Rank  Activity
0  14035     8  deployed
1  47728     8  deployed
3  24259     6       WIP
4  14251     8  deployed
6  14250     6       WIP
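
The same mask can probably be written a little more directly by comparing two Series instead of two single-column data frames. This is a sketch on the question's data; mask and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": [14035, 47728, 24259, 24259, 14251,
           14250, 14250, 14250, 14250, 14250],
    "Rank": [8.0, 8.0, 1.0, 6.0, 8.0, 1.0, 6.0, 5.0, 5.0, 1.0],
    "Activity": ["deployed", "deployed", np.nan, "WIP", "deployed",
                 np.nan, "WIP", np.nan, np.nan, np.nan],
})

# transform("max") broadcasts each group's maximum back onto every row,
# so the comparison yields one boolean per row of the original frame.
mask = df["Rank"] == df.groupby("Id")["Rank"].transform("max")

# Boolean indexing keeps only the rows where Rank equals the group maximum.
result = df[mask]
print(result)
```

Unlike the idxmax and drop_duplicates approaches, this keeps every row that ties for the maximum within an Id.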
