Pandas: group data and join the result back to the original DataFrame

Assume the following setup:

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                           'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})

So, the data frame is as follows:

     A      B         C         D
0  foo    one -1.085631  1.265936
1  bar    one  0.997345 -0.866740
2  foo    two  0.282978 -0.678886
3  bar  three -1.506295 -0.094709
4  foo    two -0.578600  1.491390
5  bar    two  1.651437 -0.638902
6  foo    one -2.426679 -0.443982
7  foo  three -0.428913 -0.434351

I want to group df by B, compute the sum of column C times the sum of column D for each group, and then join this grouped result back onto the original df. In Python:

grouped = df.groupby('B').apply(lambda group: sum(group['C'])*sum(group['D'])).reset_index()
grouped.columns = ['B', 'new_value']
df.join(grouped.set_index('B'), on='B')

Is there a more Pythonic and efficient way to solve this kind of problem?

2 answers

Solution 1:

In [25]: df.groupby('B')[['C', 'D']].transform('sum').prod(axis=1)
Out[25]:
0    0.112635
1    0.112635
2    0.235371
3    1.023841
4    0.235371
5    0.235371
6    0.112635
7    1.023841
dtype: float64

Solution 2:

In [18]: grp = df.groupby('B')

In [19]: grp['C'].transform('sum') * grp['D'].transform('sum')
Out[19]:
0    0.112635
1    0.112635
2    0.235371
3    1.023841
4    0.235371
5    0.235371
6    0.112635
7    1.023841
dtype: float64
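Both solutions rely on transform returning a result indexed like the original frame instead of one row per group. A minimal sketch of that difference, assuming a recent pandas where the column list has to be passed as [['C', 'D']]; the names per_group and per_row are illustrative only:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})

grp = df.groupby('B')[['C', 'D']]

# sum() collapses each group to one row, indexed by the values of B.
per_group = grp.sum()

# transform('sum') broadcasts each group's sum back to its member rows,
# so the result has the same index as df.
per_row = grp.transform('sum')

print(per_group.shape)  # one row per distinct value of B
print(per_row.shape)    # one row per row of df

# The row-wise product of the broadcast sums is the new column.
new_value = per_row.prod(axis=1)
print(new_value)
```

Because per_row shares df's index, new_value can be assigned to a column of df directly, with no join step.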

Demo:

In [20]: df
Out[20]:
     A      B         C         D
0  foo    one -1.085631  1.265936
1  bar    one  0.997345 -0.866740
2  foo    two  0.282978 -0.678886
3  bar  three -1.506295 -0.094709
4  foo    two -0.578600  1.491390
5  bar    two  1.651437 -0.638902
6  foo    one -2.426679 -0.443982
7  foo  three -0.428913 -0.434351

In [21]: grp = df.groupby('B')

In [22]: df['new'] = grp['C'].transform('sum') * grp['D'].transform('sum')

In [23]: df
Out[23]:
     A      B         C         D       new
0  foo    one -1.085631  1.265936  0.112635
1  bar    one  0.997345 -0.866740  0.112635
2  foo    two  0.282978 -0.678886  0.235371
3  bar  three -1.506295 -0.094709  1.023841
4  foo    two -0.578600  1.491390  0.235371
5  bar    two  1.651437 -0.638902  0.235371
6  foo    one -2.426679 -0.443982  0.112635
7  foo  three -0.428913 -0.434351  1.023841


In [26]: df['new2'] = df.groupby('B')[['C', 'D']].transform('sum').prod(axis=1)

In [27]: df
Out[27]:
     A      B         C         D       new      new2
0  foo    one -1.085631  1.265936  0.112635  0.112635
1  bar    one  0.997345 -0.866740  0.112635  0.112635
2  foo    two  0.282978 -0.678886  0.235371  0.235371
3  bar  three -1.506295 -0.094709  1.023841  1.023841
4  foo    two -0.578600  1.491390  0.235371  0.235371
5  bar    two  1.651437 -0.638902  0.235371  0.235371
6  foo    one -2.426679 -0.443982  0.112635  0.112635
7  foo  three -0.428913 -0.434351  1.023841  1.023841

Check:

In [28]: df.new.eq(df.new2).all()
Out[28]: True
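
The eq(...).all() check compares floats exactly. It passes in the demo, but two computations that sum or multiply in a different order can differ in the last bits, so a tolerance-based comparison is often the safer habit. A small sketch using numpy.allclose, rebuilding the relevant columns of the frame so that it runs on its own:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})

grp = df.groupby('B')
new = grp['C'].transform('sum') * grp['D'].transform('sum')
new2 = df.groupby('B')[['C', 'D']].transform('sum').prod(axis=1)

# Compare with a tolerance instead of requiring bit-for-bit equality.
same = np.allclose(new, new2)
print(same)
```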

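Solution 1:

Group by B and sum the columns C and D, then take prod with axis=1 (that is, across the columns). This gives a Series indexed by the unique values of B. Then join with on='B', which matches df's column B against that index. For the join to work, the pd.Series has to be renamed, because its name becomes the label of the new column.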

df.join(df.groupby('B')[['C', 'D']].sum().prod(axis=1).rename('newCol'), on='B')
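
Spelled out step by step, the join approach looks roughly like this. It is a sketch that rebuilds the question's frame; the intermediate name per_group is illustrative, and the double brackets in [['C', 'D']] assume a pandas version that no longer accepts the older ['C', 'D'] form on a groupby:

```python
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': np.random.randn(8),
    'D': np.random.randn(8),
})

# One value per distinct B: (sum of C) * (sum of D).
per_group = df.groupby('B')[['C', 'D']].sum().prod(axis=1)

# join() needs a named Series; the name becomes the new column's label.
per_group = per_group.rename('newCol')

# Match df's column B against the index of per_group.
result = df.join(per_group, on='B')
print(result)
```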

Solution 2:

The same idea as Solution 1, but using map + assign to add the new column to df.

df.assign(newCol=df.B.map(df.groupby('B')[['C', 'D']].sum().prod(axis=1)))

     A      B         C         D    newCol
0  foo    one -1.085631  1.265936  0.112635
1  bar    one  0.997345 -0.866740  0.112635
2  foo    two  0.282978 -0.678886  0.235371
3  bar  three -1.506295 -0.094709  1.023841
4  foo    two -0.578600  1.491390  0.235371
5  bar    two  1.651437 -0.638902  0.235371
6  foo    one -2.426679 -0.443982  0.112635
7  foo  three -0.428913 -0.434351  1.023841
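
If this pattern comes up often, the map + assign version could be wrapped in a small helper. The function name add_group_product and its parameters are hypothetical and not part of pandas, and the toy frame is made up for illustration; this is a sketch that assumes the key column has no missing values:

```python
import pandas as pd


def add_group_product(frame, key, cols, name):
    """Return a copy of frame with a new column holding, for each row,
    the product of its group's per-column sums over cols.

    Hypothetical helper for illustration, not a pandas API.
    """
    # Per-group lookup table: index is the distinct keys, values are the products.
    lookup = frame.groupby(key)[list(cols)].sum().prod(axis=1)
    # map() looks up each row's key in the lookup Series;
    # assign() returns a new frame and leaves the input untouched.
    return frame.assign(**{name: frame[key].map(lookup)})


toy = pd.DataFrame({
    'B': ['x', 'x', 'y'],
    'C': [1.0, 2.0, 3.0],
    'D': [4.0, 5.0, 6.0],
})
out = add_group_product(toy, 'B', ['C', 'D'], 'newCol')
print(out['newCol'].tolist())  # [27.0, 27.0, 18.0]
```

For group x the sums are 3.0 and 9.0, giving 27.0; for group y they are 3.0 and 6.0, giving 18.0.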

Source: https://habr.com/ru/post/1668201/

