Pandas groupby and adding a new column

Question

I am trying to filter a DataFrame that has 3 columns. I want to group by COL1 and COL2 and get the maximum value of COL3, and also get the second-largest value of COL3, inserted as a new column, COL4.

I managed to group it using the code below, but I don't know how to get the second maximum and insert it as another column:

grouped = df.groupby(['COL1', 'COL2']).agg({'COL3': 'max'})

   COL1  COL2  COL3
0   A    1      0.2 
1   A    1      0.4
3   B    4      0.7

Required output:

   COL1  COL2  COL3  COL4
0   A    1      0.4  0.2
3   B    4      0.7  0.7

+4

python pandas

hmaxx Feb 05 '18 at 21:43

3 answers

Use sort_values with head and iat to select the second value in each group, or the first if no second value exists:

grouped = (df.sort_values(['COL1','COL2','COL3'], ascending=[True, True, False])
             .groupby(['COL1', 'COL2'])['COL3']
             .agg(['max', lambda x: x.head(2).iat[-1]])
          )
grouped.columns = ['COL3','COL4']
grouped = grouped.reset_index()
print (grouped)
  COL1  COL2  COL3  COL4
0    A     1   0.4   0.2
1    B     4   0.7   0.7
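
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption based on the sample data in the question, and the helper name second_value is hypothetical. It shows why head(2).iat[-1] falls back to the maximum: in a single-row group, head(2) returns only that one row.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed, including the index).
df = pd.DataFrame(
    {"COL1": ["A", "A", "B"], "COL2": [1, 1, 4], "COL3": [0.2, 0.4, 0.7]},
    index=[0, 1, 3],
)


def second_value(s):
    # Hypothetical helper: s arrives sorted in descending order, so the
    # last of the first two rows is the second-largest value. A group with
    # a single row yields that row, i.e. the maximum itself.
    return s.head(2).iat[-1]


grouped = (
    df.sort_values(["COL1", "COL2", "COL3"], ascending=[True, True, False])
    .groupby(["COL1", "COL2"])["COL3"]
    .agg(["max", second_value])
)
grouped.columns = ["COL3", "COL4"]
grouped = grouped.reset_index()
print(grouped)
```

The sort has to happen before the groupby, since the helper relies on the rows inside each group already being in descending order.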

+1

jezrael Feb 05 '18 at 21:50

Use the nlargest function with groupby and then reset the index:

df2 = df.groupby(
          ['COL1', 'COL2']
      ).apply(
          lambda x: pd.Series(x.COL3.nlargest(2).values, index=['COL3', 'COL4'])
      ).reset_index()

outputs:

   COL1  COL2  COL3  COL4
0   A    1      0.4  0.2
1   B    4      0.7  0.7

0

Haleemur ali Feb 05 '18 at 22:10

Alex · Accepted Answer · 2018-02-05T21:59:34+0000

You can use .nlargest. The following solution relies on the Series constructor broadcasting a single value to match the shape of the index.

df.groupby(['COL1', 'COL2'])['COL3'].apply(
    lambda s: pd.Series(s.nlargest(2).values, index=['COL3', 'COL4'])
).unstack()

returns

           COL3  COL4
COL1 COL2            
A    1      0.4   0.2
B    4      0.7   0.7
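
Whether the Series constructor repeats a single value across a two-label index may depend on the pandas version, so it is worth checking on your install. As a hedged alternative, here is a minimal sketch that does not rely on that behaviour. It assumes a pandas version that supports named aggregation on a grouped column, and the DataFrame is reconstructed from the sample data in the question.

```python
import pandas as pd

# Sample data reconstructed from the question (assumed, including the index).
df = pd.DataFrame(
    {"COL1": ["A", "A", "B"], "COL2": [1, 1, 4], "COL3": [0.2, 0.4, 0.7]},
    index=[0, 1, 3],
)

result = (
    df.groupby(["COL1", "COL2"])["COL3"]
    .agg(
        COL3="max",
        # nlargest(2) returns at most two values in descending order;
        # iloc[-1] is the second largest, or the only value when the
        # group has a single row.
        COL4=lambda s: s.nlargest(2).iloc[-1],
    )
    .reset_index()
)
print(result)
```

This sketch does not handle a group whose COL3 values are all NaN: nlargest drops NaN, so iloc[-1] would fail on the empty result.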
