How to sum over many columns using pandas groupby?

Question


I have a dataframe that looks like this:

day  type  col  d_1  d_2  d_3  d_4  d_5...
1    A     1    1    0    1    0
1    A     2    1    0    1    0
2    B     1    1    1    0    0
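For anyone who wants to try the answers below, here is one way to rebuild the sample frame. The sample rows only contain values for d_1 to d_4, so this sketch assumes there are no further d_ columns.

```python
import pandas as pd

# Sample data from the question; only d_1..d_4 have values there, so d_5 is left out
df = pd.DataFrame({
    'day':  [1, 1, 2],
    'type': ['A', 'A', 'B'],
    'col':  [1, 2, 1],
    'd_1':  [1, 1, 1],
    'd_2':  [0, 0, 1],
    'd_3':  [1, 1, 0],
    'd_4':  [0, 0, 0],
})
print(df)
```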

That is, I have one regular column (col) and many columns with the prefix d_.

I need to group by day and type, and I want to calculate the sum of the values in each d_ column for each day/type combination. I also need to apply other aggregation functions to other columns in my data (e.g. col in the example).

I can use:

agg_df=df.groupby(['day','type']).agg({'d_1': 'sum', 'col': 'mean'})

but this calculates the sum for only one of the d_ columns. How can I specify all of the d_ columns in my data?

In other words, I would like to write something like

agg_df=df.groupby(['day','type']).agg({'d_*': 'sum', 'col': 'mean'})

so that the expected result is:

day  type  col  d_1  d_2  d_3  d_4  d_5...
1    A     1.5  2    0    2    0    ...
2    B     1    1    1    0    0

As you can see, col is aggregated by mean, while d_ columns are summed.
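agg does not expand a pattern key like 'd_*' by itself; the dict keys have to be actual column labels. A minimal sketch of the wildcard syntax from the question, using a hypothetical helper (expand_wildcards, not part of pandas) that expands shell-style patterns with the standard fnmatch module before calling agg:

```python
import fnmatch

import pandas as pd

df = pd.DataFrame({
    'day': [1, 1, 2], 'type': ['A', 'A', 'B'], 'col': [1, 2, 1],
    'd_1': [1, 1, 1], 'd_2': [0, 0, 1], 'd_3': [1, 1, 0], 'd_4': [0, 0, 0],
})


def expand_wildcards(spec, columns):
    """Hypothetical helper: turn {'d_*': 'sum', ...} into one entry per matching column."""
    expanded = {}
    for pattern, func in spec.items():
        # fnmatchcase: shell-style matching, always case-sensitive
        matches = [c for c in columns if fnmatch.fnmatchcase(c, pattern)]
        if not matches:
            raise KeyError(f"no column matches {pattern!r}")
        for name in matches:
            expanded[name] = func
    return expanded


spec = expand_wildcards({'col': 'mean', 'd_*': 'sum'}, list(df.columns))
agg_df = df.groupby(['day', 'type']).agg(spec)
print(agg_df)
```

Keep in mind that 'd_*' matches any label starting with d_, so unrelated columns that happen to share the prefix would be summed too.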

Thank you for your help!


python pandas

ℕʘʘḆḽḘ 08 . '16 12:32

IIUC, you can select the d_* columns (plus col) with str.contains:

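# Select every label containing "d_" or "col"; a plain alternation avoids
# the match-group warning that '(d_)+|col' triggers in str.contains
cols = df.columns[df.columns.str.contains('d_|col')]
agg_df = df.groupby(['day', 'type'])[cols].sum()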


In [150]: df
Out[150]:
   day type  col  d_1  d_2  d_3  d_4
0    1    A    1    1    0    1    0
1    1    A    2    1    0    1    0
2    2    B    1    1    1    0    0

In [155]: agg_df
Out[155]:
          col  d_1  d_2  d_3  d_4
day type
1   A       3    2    0    2    0
2   B       1    1    1    0    0

If you don't need col, remove it (together with the |) from the pattern passed to contains.
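The selection above sums col along with the d_ columns (col is 3 for day 1, type A). A sketch that keeps the column-selection idea but averages col, assuming the d_ columns are exactly those whose names start with d_ (str.startswith avoids regex altogether):

```python
import pandas as pd

df = pd.DataFrame({
    'day': [1, 1, 2], 'type': ['A', 'A', 'B'], 'col': [1, 2, 1],
    'd_1': [1, 1, 1], 'd_2': [0, 0, 1], 'd_3': [1, 1, 0], 'd_4': [0, 0, 0],
})

# Boolean mask over the column labels: True only for a real "d_" prefix
d_cols = df.columns[df.columns.str.startswith('d_')]

# Sum every d_ column, average col
agg_df = df.groupby(['day', 'type']).agg({**{c: 'sum' for c in d_cols}, 'col': 'mean'})
print(agg_df)
```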


Anton Protopopov 08 . '16 12:54

Colonel Beauvel · Accepted Answer · 2016-02-08T12:53:34+0000

You can use filter:

In [23]: df.groupby(['day','type'], as_index=False)[df.filter(regex='d_.*').columns].sum()

Out[23]:
   day type  d_1  d_2  d_3  d_4
0    1    A    2    0    2    0
1    2    B    1    1    0    0
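DataFrame.filter(regex=...) keeps every label for which re.search finds a match, so 'd_.*' also picks up labels that merely contain d_ somewhere. A small sketch with a hypothetical extra column (old_d_x) showing the difference an anchored pattern makes:

```python
import pandas as pd

# Hypothetical frame: old_d_x contains "d_" but is not one of the d_ columns
df = pd.DataFrame({'day': [1, 1], 'type': ['A', 'A'], 'd_1': [1, 2], 'old_d_x': [5, 6]})

unanchored = list(df.filter(regex='d_.*').columns)  # matches "d_" anywhere in the label
anchored = list(df.filter(regex='^d_').columns)     # only labels that start with d_
print(unanchored)  # ['d_1', 'old_d_x']
print(anchored)    # ['d_1']

sums = df.groupby(['day', 'type'], as_index=False)[anchored].sum()
print(sums)
```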

To average col as well, build a dict that maps each column to its aggregation function:

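# Sum every d_ column and average col; string names avoid the
# deprecation warning newer pandas emits for np.sum / np.mean callables
dic = {c: 'sum' for c in df.filter(regex='d_.*').columns}
dic['col'] = 'mean'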

In [48]: df.groupby(['day','type'], as_index=False).agg(dic)
Out[48]:
   day type  d_2  d_3  d_1  col  d_4
0    1    A    0    2    2  1.5    0
1    2    B    1    0    1  1.0    0
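On pandas 0.25 or later, named aggregation is another option that also fixes the output column order. A sketch, assuming the d_ columns are the ones starting with d_; each keyword is output_name=(input_column, aggfunc), and the dict unpacking generates one keyword per d_ column:

```python
import pandas as pd

df = pd.DataFrame({
    'day': [1, 1, 2], 'type': ['A', 'A', 'B'], 'col': [1, 2, 1],
    'd_1': [1, 1, 1], 'd_2': [0, 0, 1], 'd_3': [1, 1, 0], 'd_4': [0, 0, 0],
})

d_cols = df.filter(regex='^d_').columns

agg_df = df.groupby(['day', 'type'], as_index=False).agg(
    col=('col', 'mean'),                  # average col
    **{c: (c, 'sum') for c in d_cols},    # sum every d_ column
)
print(agg_df)
```

The columns come out as day, type, col, d_1 ... d_4, which is the layout asked for in the question.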
