Pandas: filter-based aggregate in another column

I have a dataframe that looks like this:

    Month  Fruit    Sales
    1      Apple    45
    1      Bananas  12
    3      Apple    6
    1      Kiwi     34
    12     Melon    12

I am trying to get a dataframe that looks like this:

    Fruit    Sales (month=1)  Sales (month=2)
    Apple    55               65
    Bananas  12               102
    Kiwi     54               78
    Melon    132              43

So far I have:

    df = df.groupby(['Fruit']).agg({'Sales': np.sum}).reset_index()

There must be some way to filter what agg() aggregates based on the "Month" column, but I could not find it in the docs. Any help?
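Read literally, the request is "filter by Month, then aggregate". A minimal sketch of that, assuming the sample data above and borrowing the "Sales (month=N)" labels from the desired output: for each month, a boolean mask keeps that month's rows, the Sales are summed per Fruit, and pd.concat aligns the per-month results on Fruit.

```python
import pandas as pd

# Sample data from the question (assumed; any frame with these columns works)
df = pd.DataFrame({
    "Month": [1, 1, 3, 1, 12],
    "Fruit": ["Apple", "Bananas", "Apple", "Kiwi", "Melon"],
    "Sales": [45, 12, 6, 34, 12],
})

# For each month, keep only that month's rows (boolean mask), then sum
# Sales per Fruit. The dict keys become the column labels.
per_month = {
    f"Sales (month={m})": df.loc[df["Month"] == m].groupby("Fruit")["Sales"].sum()
    for m in sorted(df["Month"].unique())
}

# concat aligns the Series on Fruit; fruits absent in a month get NaN, then 0
result = (
    pd.concat(per_month, axis=1)
    .fillna(0)
    .astype(int)
    .rename_axis("Fruit")
    .reset_index()
)
print(result)
```

This scans the frame once per month, so with many months a single groupby over both Fruit and Month is the more economical route.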

Edit: Thanks for the solution. To complicate things further, I would also like to sum another column. Example:

    Month  Fruit    Sales  Revenue
    1      Apple    45     45
    1      Bananas  12     12
    3      Apple    6      6
    1      Kiwi     34     34
    12     Melon    12     12

The preferred output would be similar to:

                 Sales          Revenue
       Fruit     1    3    12   1    3    12
    0  Apple     61   6    0    61   6    0
    1  Bananas   12   6    0    12   6    0
    2  Kiwi      34   0    0    34   0    0
    3  Melon     0    0    12   0    0    12

I managed to get this with df.pivot_table(values=['Sales','Revenue'], index='Fruit', columns=['Month'], aggfunc='sum').reset_index(), so my problem is resolved.
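By default pivot_table puts NaN in Fruit/Month cells that have no rows; its fill_value parameter gives the zeros shown in the preferred output. A sketch assuming the edited question's sample data (aggfunc takes the string 'sum' or the callable np.sum, not the string 'np.sum'):

```python
import pandas as pd

# Sample data from the edited question
df = pd.DataFrame({
    "Month": [1, 1, 3, 1, 12],
    "Fruit": ["Apple", "Bananas", "Apple", "Kiwi", "Melon"],
    "Sales": [45, 12, 6, 34, 12],
    "Revenue": [45, 12, 6, 34, 12],
})

# fill_value=0 replaces the NaN that would otherwise appear for
# Fruit/Month combinations with no rows.
table = df.pivot_table(
    values=["Sales", "Revenue"],
    index="Fruit",
    columns="Month",
    aggfunc="sum",
    fill_value=0,
)
# reset_index turns Fruit back into a column, keyed ("Fruit", "")
result = table.reset_index()
print(result)
```

Note that pivot_table may order the top column level differently from the values list, so look columns up by label, e.g. ("Sales", 1), rather than by position.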

I tried to do the same with df.groupby(['Fruit', 'Month'])['Sales','Revenue'].sum().unstack('Month', fill_value=0).rename_axis(None, 1).reset_index(), but this raises a TypeError. Is it possible to do the above with groupby?

2 answers

To answer the updated question, you need a different approach. First group by the two key columns, Month and Fruit. Then sum each group and unstack the Month level of the resulting DataFrame, which moves the months into the columns and leaves Fruit as the index.

    import pandas as pd
    from io import StringIO

    data = '''\
    Month Fruit Sales Revenue
    1 Apple 45 45
    1 Bananas 12 12
    1 Apple 16 16
    3 Apple 6 6
    1 Kiwi 34 34
    3 Bananas 6 6
    12 Melon 12 12
    '''
    df = pd.read_csv(StringIO(data), sep=r'\s+')
    df.groupby(['Month', 'Fruit'])\
      .sum()\
      .unstack(level=0)

Result:

            Sales             Revenue
    Month   1     3     12    1     3     12
    Fruit
    Apple   61.0  6.0   NaN   61.0  6.0   NaN
    Bananas 12.0  6.0   NaN   12.0  6.0   NaN
    Kiwi    34.0  NaN   NaN   34.0  NaN   NaN
    Melon   NaN   NaN   12.0  NaN   NaN   12.0

Old answer:

Use the pivot_table method:

    import pandas as pd
    from io import StringIO

    data = '''\
    Month Fruit Sales
    1 Apple 45
    1 Bananas 12
    1 Apple 16
    3 Apple 6
    1 Kiwi 34
    3 Bananas 6
    12 Melon 12
    '''
    df = pd.read_csv(StringIO(data), sep=r'\s+')
    df.pivot_table('Sales', index='Fruit', columns=['Month'], aggfunc='sum')

Result:

    Month    1     3     12
    Fruit
    Apple    61.0  6.0   NaN
    Bananas  12.0  6.0   NaN
    Kiwi     34.0  NaN   NaN
    Melon    NaN   NaN   12.0
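pd.crosstab, which neither answer uses, can build the same Fruit-by-Month table when it is given values= and aggfunc=. A sketch assuming the answer's sample data; like pivot_table without fill_value, it leaves NaN where a combination has no rows:

```python
import pandas as pd

# Same rows as the sample data in this answer
df = pd.DataFrame({
    "Month": [1, 1, 1, 3, 1, 3, 12],
    "Fruit": ["Apple", "Bananas", "Apple", "Apple", "Kiwi", "Bananas", "Melon"],
    "Sales": [45, 12, 16, 6, 34, 6, 12],
})

# Rows come from Fruit, columns from Month, cells are the summed Sales.
table = pd.crosstab(
    index=df["Fruit"],
    columns=df["Month"],
    values=df["Sales"],
    aggfunc="sum",
)
print(table)
```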

UPDATE:

    In [177]: df
    Out[177]:
       Month    Fruit  Sales  Revenue
    0      1    Apple     45       45
    1      1  Bananas     12       12
    2      3    Apple      6        6
    3      1     Kiwi     34       34
    4     12    Melon     12       12

    In [178]: df.groupby(['Fruit', 'Month'])[['Sales','Revenue']].sum().unstack('Month', fill_value=0)
    Out[178]:
            Sales        Revenue
    Month       1  3  12       1  3  12
    Fruit
    Apple      45  6   0      45  6   0
    Bananas    12  0   0      12  0   0
    Kiwi       34  0   0      34  0   0
    Melon       0  0  12       0  0  12
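If flat column labels in the question's "Sales (month=1)" style are wanted instead of the two-level header, one option is to join each (column, month) pair of the MultiIndex into a single string before reset_index(). A sketch using the same data; the label format is simply the one from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "Month": [1, 1, 3, 1, 12],
    "Fruit": ["Apple", "Bananas", "Apple", "Kiwi", "Melon"],
    "Sales": [45, 12, 6, 34, 12],
    "Revenue": [45, 12, 6, 34, 12],
})

wide = (
    df.groupby(["Fruit", "Month"])[["Sales", "Revenue"]]
    .sum()
    .unstack("Month", fill_value=0)
)

# wide.columns holds (value column, month) pairs; join each pair into one
# string label so that reset_index() produces ordinary flat columns.
wide.columns = [f"{col} (month={month})" for col, month in wide.columns]
result = wide.reset_index()
print(result)
```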

Old answer:

Alternatively, you can use groupby() + unstack():

    In [206]: df.groupby(['Fruit', 'Month'])['Sales'].sum().unstack('Month', fill_value=0) \
         ...:   .rename_axis(None, axis=1).reset_index()
         ...:
    Out[206]:
         Fruit   1  3  12
    0    Apple  61  6   0
    1  Bananas  12  6   0
    2     Kiwi  34  0   0
    3    Melon   0  0  12
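This chain can be extended to both Sales and Revenue, which is what the question's groupby attempt was after. Once two value columns are selected, the column axis becomes a two-level MultiIndex, and pandas refuses a single None as the names of a MultiIndex (set_names raises a TypeError asking for a list-like), which is a likely source of the TypeError in the question. A sketch, assuming the edited question's data, that passes one name per level instead:

```python
import pandas as pd

df = pd.DataFrame({
    "Month": [1, 1, 3, 1, 12],
    "Fruit": ["Apple", "Bananas", "Apple", "Kiwi", "Melon"],
    "Sales": [45, 12, 6, 34, 12],
    "Revenue": [45, 12, 6, 34, 12],
})

out = (
    df.groupby(["Fruit", "Month"])[["Sales", "Revenue"]]  # select with a list
    .sum()
    .unstack("Month", fill_value=0)
    # the columns are a 2-level MultiIndex, so give one name per level
    .rename_axis([None, None], axis=1)
    .reset_index()
)
print(out)
```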

Source: https://habr.com/ru/post/1263753/

