Assume the following setup:
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})
The resulting DataFrame is:
     A      B         C         D
0  foo    one -1.085631  1.265936
1  bar    one  0.997345 -0.866740
2  foo    two  0.282978 -0.678886
3  bar  three -1.506295 -0.094709
4  foo    two -0.578600  1.491390
5  bar    two  1.651437 -0.638902
6  foo    one -2.426679 -0.443982
7  foo  three -0.428913 -0.434351
I want to group df by B, compute the sum of column C times the sum of column D for each group, and finally join this grouped result back onto the original df. In Python:
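# Per-group product of the column sums, one row per value of B
grouped = df.groupby('B').apply(lambda group: group['C'].sum() * group['D'].sum()).reset_index()
grouped.columns = ['B', 'new_value']
# Attach each group's value back onto every row of df
result = df.join(grouped.set_index('B'), on='B')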
Is there a more Pythonic and efficient way to solve this kind of problem?
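One possible direction, sketched under the assumption of a reasonably recent pandas: `GroupBy.transform('sum')` returns a Series aligned to the original index that holds each row's group total, so the product can be computed row-wise without the `reset_index`/`join` round trip. The column name `new_value` is taken from the question; the variable `g` is just an illustrative name.

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'A': ['foo', 'bar', 'foo', 'bar',
                         'foo', 'bar', 'foo', 'foo'],
                   'B': ['one', 'one', 'two', 'three',
                         'two', 'two', 'one', 'three'],
                   'C': np.random.randn(8),
                   'D': np.random.randn(8)})

# Group once and reuse the GroupBy object for both columns.
g = df.groupby('B')

# transform('sum') broadcasts each group's sum back to every row of that group,
# so both Series share df's index and multiply element-wise with no join.
df['new_value'] = g['C'].transform('sum') * g['D'].transform('sum')

print(df)
```

Because the result is already aligned with `df`, it can be assigned directly as a new column; the trade-off is that each group's value is stored once per row rather than once per group.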