Pandas groupby and adding a new column

I have a dataframe with 3 columns. I want to group by COL1 and COL2, get the maximum value of COL3, and also get the second-largest value of COL3 as a new column, COL4.

I managed to do the grouping with the code below, but I don't know how to get the second max and insert it as another column:

grouped = df.groupby(['COL1', 'COL2']).agg({'COL3': 'max'})

   COL1  COL2  COL3
0   A    1      0.2 
1   A    1      0.4
3   B    4      0.7   

Desired output:

   COL1  COL2  COL3  COL4
0   A    1      0.4  0.2
3   B    4      0.7  0.7 
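
For anyone who wants to reproduce this, here is a minimal sketch that builds the sample frame and runs the aggregation from the question. The values and row labels (0, 1, 3) are copied from the table above; everything else is just the question's own code.

```python
import pandas as pd

# Sample data copied from the table in the question (row labels 0, 1, 3).
df = pd.DataFrame(
    {"COL1": ["A", "A", "B"], "COL2": [1, 1, 4], "COL3": [0.2, 0.4, 0.7]},
    index=[0, 1, 3],
)

# The aggregation from the question: one row per (COL1, COL2) pair,
# holding only the maximum of COL3 (no second-largest value yet).
grouped = df.groupby(["COL1", "COL2"]).agg({"COL3": "max"})
print(grouped)
```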
3 answers

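You can use .nlargest. The following solution relies on the Series constructor broadcasting a single value to match the shape of the index, which is what fills COL4 for group B, where only one row exists. Recent pandas versions may raise a ValueError on that length mismatch instead.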

df.groupby(['COL1', 'COL2'])['COL3'].apply(
    lambda s: pd.Series(s.nlargest(2).values, index=['COL3', 'COL4'])
).unstack()

returns

           COL3  COL4
COL1 COL2            
A    1      0.4   0.2
B    4      0.7   0.7
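
The snippet above depends on a one-element array being stretched to fit a two-label index. A minimal sketch that avoids that dependency is shown below. It is not part of the original answer: it swaps in named aggregation and takes the last element of nlargest(2), and it assumes every group has at least one non-NaN value in COL3.

```python
import pandas as pd

df = pd.DataFrame(
    {"COL1": ["A", "A", "B"], "COL2": [1, 1, 4], "COL3": [0.2, 0.4, 0.7]}
)

result = (
    df.groupby(["COL1", "COL2"])["COL3"]
      .agg(
          COL3="max",
          # nlargest(2) returns one or two values in descending order;
          # the last one is the second largest, or the max itself when
          # the group has a single row.
          COL4=lambda s: s.nlargest(2).iloc[-1],
      )
      .reset_index()
)
print(result)
```

Because the fallback to the maximum is explicit here, the result does not depend on how the Series constructor treats a length mismatch.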

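Use sort_values to order COL3 in descending order within each group, then take the first two rows with head and select the last of them with iat: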

grouped = (df.sort_values(['COL1','COL2','COL3'], ascending=[True, True, False])
             .groupby(['COL1', 'COL2'])['COL3']
             .agg(['max', lambda x: x.head(2).iat[-1]])
          )
grouped.columns = ['COL3','COL4']
grouped = grouped.reset_index()
print(grouped)
  COL1  COL2  COL3  COL4
0    A     1   0.4   0.2
1    B     4   0.7   0.7
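
To see why head(2).iat[-1] yields the second-largest value and falls back to the maximum for single-row groups, here is a small sketch on two hand-made Series. The helper name second_or_max is hypothetical, and the inputs are assumed to be sorted in descending order already, as they are after the sort_values call above.

```python
import pandas as pd

def second_or_max(x):
    # head(2) keeps at most two rows; iat[-1] picks the last of them.
    return x.head(2).iat[-1]

two_rows = pd.Series([0.4, 0.2])  # group A, sorted descending
one_row = pd.Series([0.7])        # group B, a single value

print(second_or_max(two_rows))  # second-largest value of group A
print(second_or_max(one_row))   # only value of group B
```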

Use the nlargest function with groupby and then reset the index:

df2 = df.groupby(
          ['COL1', 'COL2']
      ).apply(
          lambda x: pd.Series(x.COL3.nlargest(2).values, index=['COL3', 'COL4'])
      ).reset_index()

outputs:

   COL1  COL2  COL3  COL4
0   A    1      0.4  0.2
1   B    4      0.7  0.7 

Source: https://habr.com/ru/post/1693212/

