Pandas group data and join

Question

Assume this:

import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'A' : ['foo', 'bar', 'foo', 'bar',
                          'foo', 'bar', 'foo', 'foo'],
                   'B' : ['one', 'one', 'two', 'three',
                           'two', 'two', 'one', 'three'],
                   'C' : np.random.randn(8),
                   'D' : np.random.randn(8)})

So, the data frame is as follows:

     A      B         C         D
0  foo    one -1.085631  1.265936
1  bar    one  0.997345 -0.866740
2  foo    two  0.282978 -0.678886
3  bar  three -1.506295 -0.094709
4  foo    two -0.578600  1.491390
5  bar    two  1.651437 -0.638902
6  foo    one -2.426679 -0.443982
7  foo  three -0.428913 -0.434351

I want to group df by B, calculate the sum of column C times the sum of column D for each group, and finally join this grouped result back onto the original df. In Python:

grouped = df.groupby('B').apply(lambda group: sum(group['C'])*sum(group['D'])).reset_index()
grouped.columns = ['B', 'new_value']
df.join(grouped.set_index('B'), on='B')

Is there a more Pythonic and efficient way to solve this kind of problem?

python join pandas group-by

enneppi Jan 29 '17 at 20:31

2 answers

Solution 1

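Use groupby with the column subset ['C', 'D'] and sum, then prod with axis=1 (the product across columns, i.e. the sum of C times the sum of D). The result is a Series indexed by B. Pass it to join with on='B' to align it with the original rows. Note that the pd.Series is renamed first, because join needs a name for the new column.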

df.join(df.groupby('B')[['C', 'D']].sum().prod(axis=1).rename('newCol'), on='B')
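
The one-liner above chains three steps. As a minimal sketch, the same computation can be split up so that each intermediate result can be inspected. The data values are copied (rounded) from the table in the question rather than regenerated, and the names sums and per_group are illustrative, not part of the original answer. The columns are selected with a list, [['C', 'D']], which is the form current pandas versions accept.

```python
import pandas as pd

# Values copied (rounded) from the table shown in the question
df = pd.DataFrame({
    'A': ['foo', 'bar', 'foo', 'bar', 'foo', 'bar', 'foo', 'foo'],
    'B': ['one', 'one', 'two', 'three', 'two', 'two', 'one', 'three'],
    'C': [-1.085631, 0.997345, 0.282978, -1.506295,
          -0.578600, 1.651437, -2.426679, -0.428913],
    'D': [1.265936, -0.866740, -0.678886, -0.094709,
          1.491390, -0.638902, -0.443982, -0.434351],
})

# Step 1: per-group sums of C and D (one row per distinct value of B)
sums = df.groupby('B')[['C', 'D']].sum()

# Step 2: multiply the two sums row-wise; the result is a Series indexed by B.
# The Series is named so that join knows what to call the new column.
per_group = sums.prod(axis=1).rename('newCol')

# Step 3: align the per-group value with every original row via column B
result = df.join(per_group, on='B')
print(result)
```

Printing sums and per_group separately is a quick way to confirm that each group holds the expected value before it is joined back.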

Solution 2

The same idea as Solution 1, but using map + assign to build a new df

df.assign(newCol=df.B.map(df.groupby('B')[['C', 'D']].sum().prod(axis=1)))

     A      B         C         D    newCol
0  foo    one -1.085631  1.265936  0.112635
1  bar    one  0.997345 -0.866740  0.112635
2  foo    two  0.282978 -0.678886  0.235371
3  bar  three -1.506295 -0.094709  1.023841
4  foo    two -0.578600  1.491390  0.235371
5  bar    two  1.651437 -0.638902  0.235371
6  foo    one -2.426679 -0.443982  0.112635
7  foo  three -0.428913 -0.434351  1.023841
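
To make the map step easier to follow, here is a small sketch on a toy frame with integer values that can be checked by hand. The frame and the names lookup, via_map and via_join are made up for illustration. It shows that map looks up each row's B value in the index of the per-group Series, and that this gives the same column as the join variant.

```python
import pandas as pd

# Toy data (hypothetical), small enough to verify by hand:
# group 'x': (1 + 3) * (10 + 30) = 160
# group 'y': (2 + 4) * (20 + 40) = 360
df = pd.DataFrame({'B': ['x', 'y', 'x', 'y'],
                   'C': [1, 2, 3, 4],
                   'D': [10, 20, 30, 40]})

# Per-group lookup table: a Series indexed by the distinct values of B
lookup = df.groupby('B')[['C', 'D']].sum().prod(axis=1)

# map replaces each value of B with the matching entry of lookup
via_map = df.assign(newCol=df['B'].map(lookup))

# The join variant needs a named Series and the on= key
via_join = df.join(lookup.rename('newCol'), on='B')

print(via_map)
print((via_map['newCol'] == via_join['newCol']).all())
```

Since assign returns a new frame, the original df is left unchanged in both variants.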

piRSquared Jan 29 '17 at 20:47

Maxu · Accepted Answer · 2017-01-29T20:42:05+0000

Solution 1:

In [25]: df.groupby('B')[['C','D']].transform('sum').prod(axis=1)
Out[25]:
0    0.112635
1    0.112635
2    0.235371
3    1.023841
4    0.235371
5    0.235371
6    0.112635
7    1.023841
dtype: float64

Solution 2:

In [18]: grp = df.groupby('B')

In [19]: grp['C'].transform('sum') * grp['D'].transform('sum')
Out[19]:
0    0.112635
1    0.112635
2    0.235371
3    1.023841
4    0.235371
5    0.235371
6    0.112635
7    1.023841
dtype: float64

Demo:

In [20]: df
Out[20]:
     A      B         C         D
0  foo    one -1.085631  1.265936
1  bar    one  0.997345 -0.866740
2  foo    two  0.282978 -0.678886
3  bar  three -1.506295 -0.094709
4  foo    two -0.578600  1.491390
5  bar    two  1.651437 -0.638902
6  foo    one -2.426679 -0.443982
7  foo  three -0.428913 -0.434351

In [21]: grp = df.groupby('B')

In [22]: df['new'] = grp['C'].transform('sum') * grp['D'].transform('sum')

In [23]: df
Out[23]:
     A      B         C         D       new
0  foo    one -1.085631  1.265936  0.112635
1  bar    one  0.997345 -0.866740  0.112635
2  foo    two  0.282978 -0.678886  0.235371
3  bar  three -1.506295 -0.094709  1.023841
4  foo    two -0.578600  1.491390  0.235371
5  bar    two  1.651437 -0.638902  0.235371
6  foo    one -2.426679 -0.443982  0.112635
7  foo  three -0.428913 -0.434351  1.023841


In [26]: df['new2'] = df.groupby('B')[['C','D']].transform('sum').prod(axis=1)

In [27]: df
Out[27]:
     A      B         C         D       new      new2
0  foo    one -1.085631  1.265936  0.112635  0.112635
1  bar    one  0.997345 -0.866740  0.112635  0.112635
2  foo    two  0.282978 -0.678886  0.235371  0.235371
3  bar  three -1.506295 -0.094709  1.023841  1.023841
4  foo    two -0.578600  1.491390  0.235371  0.235371
5  bar    two  1.651437 -0.638902  0.235371  0.235371
6  foo    one -2.426679 -0.443982  0.112635  0.112635
7  foo  three -0.428913 -0.434351  1.023841  1.023841

Check:

In [28]: df.new.eq(df.new2).all()
Out[28]: True
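
Exact equality with eq works in the check above because both columns are products of the same two per-group sums. When the transform result is compared against a value computed a different way (for example with Python's built-in sum, as in the question), the last floating-point digits may differ, so a tolerance-based comparison is probably the safer choice. The sketch below assumes a small made-up frame; the names fast, by_hand and slow are illustrative.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame with non-integer values
df = pd.DataFrame({'B': ['one', 'one', 'two', 'three', 'two'],
                   'C': [0.1, 0.2, 0.3, 0.4, 0.5],
                   'D': [1.1, 1.2, 1.3, 1.4, 1.5]})

# transform('sum') returns one value per original row, so no join is needed
grp = df.groupby('B')
fast = grp['C'].transform('sum') * grp['D'].transform('sum')

# Reference computation: loop over the groups and use Python's built-in sum
by_hand = {key: sum(g['C']) * sum(g['D']) for key, g in df.groupby('B')}
slow = df['B'].map(by_hand)

# Compare with a tolerance rather than bit-for-bit equality
print(np.allclose(fast, slow))
```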
