How to group by one column in pandas but keep another column in the resulting DataFrame

My question is about the groupby operation in pandas. I have the following DataFrame:

    In [4]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est", "Est", "West", "West"]})

    In [5]: df
    Out[5]:
       A   B     C
    0  0  PO   Est
    1  1  PO   Est
    2  2  PA  West
    3  3  PA  West

This is what I would like to do: group by column B and sum column A, but still have column C in the resulting DataFrame. If I do this:

    In [8]: df.groupby(by="B").aggregate(pd.np.sum)
    Out[8]:
        A
    B
    PA  5
    PO  1

That does the job, but column C is missing. I can also do this:

    In [9]: df.groupby(by=["B", "C"]).aggregate(pd.np.sum)
    Out[9]:
             A
    B  C
    PA West  5
    PO Est   1

or

    In [11]: df.groupby(by=["B", "C"], as_index=False).aggregate(pd.np.sum)
    Out[11]:
        B     C  A
    0  PA  West  5
    1  PO   Est  1

But both of these group by B and C, rather than grouping by B alone and keeping the value of C. Does what I want to do make no sense, or is there a way to do it?

1 answer

Try using the DataFrameGroupBy.agg() method with a dict of {column -> function}:

    In [6]: df.groupby('B').agg({'A':'sum', 'C':'first'})
    Out[6]:
           C  A
    B
    PA  West  5
    PO   Est  1

From the docs:

Function to use for aggregating groups. If a function, must either work when passed a DataFrame or when passed to DataFrame.apply. If passed a dict, the keys must be DataFrame column names.
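For anyone who wants to run this outside an IPython session, here is a minimal self-contained sketch of the same idea. It shows the dict form from above and, as an alternative, named aggregation (available since pandas 0.25). The output column name `total_A` is only an illustrative choice, and `reset_index()` is used on the assumption that you want B back as an ordinary column.

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(4),
    "B": ["PO", "PO", "PA", "PA"],
    "C": ["Est", "Est", "West", "West"],
})

# Dict form: sum A and take the first C within each group of B.
# reset_index() turns the group key B back into a regular column.
by_dict = df.groupby("B").agg({"A": "sum", "C": "first"}).reset_index()

# Named aggregation: output_column=(input_column, function).
# "total_A" is a hypothetical name chosen for this example.
by_name = (
    df.groupby("B")
      .agg(total_A=("A", "sum"), C=("C", "first"))
      .reset_index()
)

print(by_dict)
print(by_name)
```

Both results have one row per value of B, with C carried along from the first row of each group.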

Or something like this, depending on your goals:

    In [8]: df = pd.DataFrame({"A": range(4), "B": ["PO", "PO", "PA", "PA"], "C": ["Est1", "Est2", "West1", "West2"]})

    In [9]: df.groupby('B').agg({'A':'sum', 'C':'first'})
    Out[9]:
            C  A
    B
    PA  West1  5
    PO   Est1  1

    In [10]: df['sum_A'] = df.groupby('B')['A'].transform('sum')

    In [11]: df
    Out[11]:
       A   B      C  sum_A
    0  0  PO   Est1      1
    1  1  PO   Est2      1
    2  2  PA  West1      5
    3  3  PA  West2      5
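Note that when C varies within a group, as in this second DataFrame, 'first' keeps only one of the values. A rough sketch of how you might check for that before aggregating, and one possible alternative that joins the distinct values into a single string (the variable names and the ", " separator are just assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "A": range(4),
    "B": ["PO", "PO", "PA", "PA"],
    "C": ["Est1", "Est2", "West1", "West2"],
})

# Count distinct C values per group of B. A count above 1 means
# 'first' would silently drop some of them.
c_per_group = df.groupby("B")["C"].nunique()
ambiguous = c_per_group[c_per_group > 1].index.tolist()

# Alternative to 'first': keep every distinct C value, joined into one string.
collapsed = (
    df.groupby("B")
      .agg(A=("A", "sum"), C=("C", lambda s: ", ".join(s.unique())))
      .reset_index()
)

print(ambiguous)
print(collapsed)
```

If `ambiguous` is empty, 'first' loses nothing. Otherwise it is a judgment call whether to take the first value, join them, or use the transform approach and keep all the rows.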

Source: https://habr.com/ru/post/1259195/