How to get maximum value based on another series in pandas groupby

Question

I have a large data frame with columns named currency and amount_in_euros. The currency column contains values such as EUR, GBR, etc., and amount_in_euros contains a floating-point value. For each customer, I want to sum the amount per currency (EUR, GBR, etc.) and put the currency with the largest total in a new series. How can I do this in pandas?

Input:

Customer  currency   amount_in_euros
1           EUR      10
1           GBR      6
1           GBR      18
1           EUR      2
1           EUR      3
2           IND      12 
.
.
.

Output:

Customer  currency   amount_in_euros   max
1           EUR      10                GBR
1           GBR      6                 GBR
1           GBR      18                GBR
1           EUR      2                 GBR
1           EUR      3                 GBR 
2           IND      12                IND
. 
. 
.

What I have tried so far:

import pandas as pd

df=pd.read_csv('analysis.csv')
res=pd.DataFrame()
for u,v in df.groupby(['Customer']):
   temp= v[['currency','amount_in_euros']].groupby(['currency'])['amount_in_euros'].sum().reset_index().sort_values('amount_in_euros',ascending=False)
   v['max']=temp['currency'].iloc[0]
   res=res.append(v)

The code above works, but the append operation takes a lot of time. Please help me solve this problem. Thanks in advance.

python pandas pandas-groupby

Mohamed thasin ah Feb 27 '18 at 11:50

jezrael · Accepted Answer · 2018-02-27T11:58:56+0000

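Use:

First aggregate with sum by both columns, Customer and currency.
Then sort with sort_values.
Use drop_duplicates to keep only the row with the max total per Customer.
Create a Series indexed by Customer with set_index.
Last, fill the new column with map.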

df1 = df.groupby(['Customer', 'currency'], as_index=False)['amount_in_euros'].sum()
s = (df1.sort_values(['Customer','amount_in_euros'])
        .drop_duplicates('Customer', keep='last')
        .set_index('Customer')['currency'])

df['max'] = df['Customer'].map(s)
print (df)
   Customer currency  amount_in_euros  max
0         1      EUR               10  GBR
1         1      GBR                6  GBR
2         1      GBR               18  GBR
3         1      EUR                2  GBR
4         1      EUR                3  GBR
5         2      IND               12  IND
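
As an alternative to sorting and dropping duplicates, the top currency per customer can also be picked with idxmax. This is only a minimal sketch and not part of the original answer: it rebuilds the sample data from the question inline, and the names totals, top_rows and top are illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's input table.
df = pd.DataFrame({
    'Customer': [1, 1, 1, 1, 1, 2],
    'currency': ['EUR', 'GBR', 'GBR', 'EUR', 'EUR', 'IND'],
    'amount_in_euros': [10, 6, 18, 2, 3, 12],
})

# Total amount per (Customer, currency) pair.
totals = df.groupby(['Customer', 'currency'], as_index=False)['amount_in_euros'].sum()

# idxmax gives, for each customer, the row label of the largest total.
top_rows = totals.loc[totals.groupby('Customer')['amount_in_euros'].idxmax()]

# Series mapping Customer -> currency with the largest total.
top = top_rows.set_index('Customer')['currency']

df['max'] = df['Customer'].map(top)
print(df)
```

When two currencies have the same total for a customer, idxmax keeps the first matching row, so the result may differ from the sort_values / keep='last' approach in that case.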

EDIT:

To also get the second, third, and further currencies per customer, ranked by total amount:

print (df)
   Customer currency  amount_in_euros
0         1      EUR               10
1         1      GBR                6
2         1      GBR               18
3         1      EUR                2
4         1      USD                1
5         1      USD                2
6         1      EUR                3
7         2      IND               12
8         2      USD                2

df1 = df.groupby(['Customer', 'currency'], as_index=False)['amount_in_euros'].sum()
df2 = df1.sort_values(['Customer','amount_in_euros'])
df2 = (df2.set_index(['Customer', 
                      df2.groupby(['Customer']).cumcount(ascending=False)])['currency']
          .unstack()
          .add_prefix('max_'))

print (df2)
         max_0 max_1 max_2
Customer                  
1          GBR   EUR   USD
2          IND   USD  None

df = df.join(df2, on='Customer')

print (df)
   Customer currency  amount_in_euros max_0 max_1 max_2
0         1      EUR               10   GBR   EUR   USD
1         1      GBR                6   GBR   EUR   USD
2         1      GBR               18   GBR   EUR   USD
3         1      EUR                2   GBR   EUR   USD
4         1      USD                1   GBR   EUR   USD
5         1      USD                2   GBR   EUR   USD
6         1      EUR                3   GBR   EUR   USD
7         2      IND               12   IND   USD  None
8         2      USD                2   IND   USD  None
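
If only the first few ranks are needed, the same idea can be cut off at a fixed number of columns. The sketch below is an assumption-laden variation, not the answer's own code: n is a hypothetical cutoff, the rank column and the names ranked, wide and out are illustrative, and pivot is used in place of set_index plus unstack.

```python
import pandas as pd

# Sample data rebuilt from the EDIT example above.
df = pd.DataFrame({
    'Customer': [1, 1, 1, 1, 1, 1, 1, 2, 2],
    'currency': ['EUR', 'GBR', 'GBR', 'EUR', 'USD', 'USD', 'EUR', 'IND', 'USD'],
    'amount_in_euros': [10, 6, 18, 2, 1, 2, 3, 12, 2],
})

n = 2  # hypothetical cutoff: keep only the n largest currencies per customer

# Total amount per (Customer, currency) pair.
totals = df.groupby(['Customer', 'currency'], as_index=False)['amount_in_euros'].sum()

# Largest totals first within each customer, then number the rows 0, 1, 2, ...
ranked = totals.sort_values(['Customer', 'amount_in_euros'], ascending=[True, False])
ranked['rank'] = ranked.groupby('Customer').cumcount()

# Keep ranks below n and spread them into columns max_0, max_1, ...
wide = (ranked[ranked['rank'] < n]
        .pivot(index='Customer', columns='rank', values='currency')
        .add_prefix('max_'))

out = df.join(wide, on='Customer')
print(out)
```

Customers with fewer than n currencies get missing values in the trailing columns, as in the max_2 column of the output above.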
