How to perform pandas groupby operation in one column, but save another in the resulting data frame

Question

My question is about groupby operation with pandas. I have the following DataFrame:

    In [4]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est", "Est", "West", "West"]})

    In [5]: df
    Out[5]:
       A   B     C
    0  0  PO   Est
    1  1  PO   Est
    2  2  PA  West
    3  3  PA  West

This is what I would like to do: group by column B and sum column A, but still have column C in the resulting DataFrame. If I do this:

    In [8]: df.groupby(by="B").aggregate(pd.np.sum)
    Out[8]:
        A
    B
    PA  5
    PO  1

This does the job, but column C is missing. I can also do this:

    In [9]: df.groupby(by=["B", "C"]).aggregate(pd.np.sum)
    Out[9]:
             A
    B  C
    PA West  5
    PO Est   1

or

    In [11]: df.groupby(by=["B", "C"], as_index=False).aggregate(pd.np.sum)
    Out[11]:
        B     C  A
    0  PA  West  5
    1  PO   Est  1

But in both cases the data is grouped by B AND C, not by B alone with the value of C simply carried along. Is what I want to do ill-advised, or is there a way to do it?

python pandas group-by dataframe

Ger Nov 03 '16 at 8:44

1 answer

Maxu · Accepted Answer · 2016-11-03T08:47:12+0000

Try using the DataFrameGroupBy.agg() method with a dict of {column -> function}:

    In [6]: df.groupby('B').agg({'A':'sum', 'C':'first'})
    Out[6]:
           C  A
    B
    PA  West  5
    PO   Est  1

From the docs:

Function to use for aggregating groups. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If passed a dict, the keys must be DataFrame column names.
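As a self-contained script, the dict-based aggregation might look like the sketch below. It reuses the DataFrame from the question; the `as_index=False` argument and the `result` variable name are my additions, chosen so that B comes back as an ordinary column rather than as the index.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(4),
    "B": ["PO", "PO", "PA", "PA"],
    "C": ["Est", "Est", "West", "West"],
})

# Sum A within each B group and keep the first C value seen in that group.
# as_index=False keeps B as a regular column instead of the index.
result = df.groupby("B", as_index=False).agg({"A": "sum", "C": "first"})
print(result)
```

Note that 'first' only gives a meaningful C when C is constant within each B group, as it is in this data.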

Or something like this, depending on your goals:

    In [8]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est1", "Est2", "West1", "West2"]})

    In [9]: df.groupby('B').agg({'A':'sum', 'C':'first'})
    Out[9]:
            C  A
    B
    PA  West1  5
    PO   Est1  1

    In [10]: df['sum_A'] = df.groupby('B')['A'].transform('sum')

    In [11]: df
    Out[11]:
       A   B      C  sum_A
    0  0  PO   Est1      1
    1  1  PO   Est2      1
    2  2  PA  West1      5
    3  3  PA  West2      5
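One possible way to choose between the two approaches is to check first whether C actually varies within a B group. The sketch below does that with `nunique` and then applies the `transform` variant; the check and the `c_per_group` name are my additions, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(4),
    "B": ["PO", "PO", "PA", "PA"],
    "C": ["Est1", "Est2", "West1", "West2"],
})

# Count distinct C values per B group. Anything above 1 means that
# aggregating C with 'first' would silently drop the other values.
c_per_group = df.groupby("B")["C"].nunique()
print(c_per_group)

# transform returns one value per original row, so every row
# (and therefore every C value) is preserved alongside the group sum.
df["sum_A"] = df.groupby("B")["A"].transform("sum")
print(df)
```

Here each group holds two distinct C values, so `transform` keeps all of them, whereas `agg` with 'first' would keep only Est1 and West1.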
