How to group pandas DataFrame by changing dates?

I am trying to collapse daily data into fiscal quarter data. For example, I have a table with fiscal quarter end dates:

    Company  Period  Quarter_End
    M        2016Q1  05/02/2015
    M        2016Q2  08/01/2015
    M        2016Q3  10/31/2015
    M        2016Q4  01/30/2016
    WFM      2015Q2  04/12/2015
    WFM      2015Q3  07/05/2015
    WFM      2015Q4  09/27/2015
    WFM      2016Q1  01/17/2016

and daily data table:

    Company  Date        Price
    M        06/20/2015  1.05
    M        06/22/2015  4.05
    M        07/10/2015  3.45
    M        07/29/2015  1.86
    M        08/24/2015  1.58
    M        09/02/2015  8.64
    M        09/22/2015  2.56
    M        10/20/2015  5.42
    M        11/02/2015  1.58
    M        11/24/2015  4.58
    M        12/03/2015  6.48
    M        12/05/2015  4.56
    M        01/03/2016  7.14
    M        01/30/2016  6.34
    WFM      06/20/2015  1.05
    WFM      06/22/2015  4.05
    WFM      07/10/2015  3.45
    WFM      07/29/2015  1.86
    WFM      08/24/2015  1.58
    WFM      09/02/2015  8.64
    WFM      09/22/2015  2.56
    WFM      10/20/2015  5.42
    WFM      11/02/2015  1.58
    WFM      11/24/2015  4.58
    WFM      12/03/2015  6.48
    WFM      12/05/2015  4.56
    WFM      01/03/2016  7.14
    WFM      01/17/2016  6.34

And I would like to create the table below.

    Company  Period  Quarter_end  Sum(Price)
    M        2016Q2  8/1/2015     10.41
    M        2016Q3  10/31/2015   18.2
    M        2016Q4  1/30/2016    30.68
    WFM      2015Q3  7/5/2015     5.1
    WFM      2015Q4  9/27/2015    18.09
    WFM      2016Q1  1/17/2016    36.1

However, I do not know how to group by changing dates without going through each record. Any help is appreciated.

Thanks!

2 answers

I think you can use merge_ordered:

    # first convert the date columns to datetime
    df1.Quarter_End = pd.to_datetime(df1.Quarter_End)
    df2.Date = pd.to_datetime(df2.Date)

    df = pd.merge_ordered(df1, df2,
                          left_on=['Company', 'Quarter_End'],
                          right_on=['Company', 'Date'],
                          how='outer')
    print(df)
        Company  Period  Quarter_End        Date  Price
     0        M  2016Q1   2015-05-02         NaT    NaN
     1        M     NaN          NaT  2015-06-20   1.05
     2        M     NaN          NaT  2015-06-22   4.05
     3        M     NaN          NaT  2015-07-10   3.45
     4        M     NaN          NaT  2015-07-29   1.86
     5        M  2016Q2   2015-08-01         NaT    NaN
     6        M     NaN          NaT  2015-08-24   1.58
     7        M     NaN          NaT  2015-09-02   8.64
     8        M     NaN          NaT  2015-09-22   2.56
     9        M     NaN          NaT  2015-10-20   5.42
    10        M  2016Q3   2015-10-31         NaT    NaN
    11        M     NaN          NaT  2015-11-02   1.58
    12        M     NaN          NaT  2015-11-24   4.58
    13        M     NaN          NaT  2015-12-03   6.48
    14        M     NaN          NaT  2015-12-05   4.56
    15        M     NaN          NaT  2016-01-03   7.14
    16        M  2016Q4   2016-01-30  2016-01-30   6.34
    17      WFM  2015Q2   2015-04-12         NaT    NaN
    18      WFM     NaN          NaT  2015-06-20   1.05
    19      WFM     NaN          NaT  2015-06-22   4.05
    20      WFM  2015Q3   2015-07-05         NaT    NaN
    21      WFM     NaN          NaT  2015-07-10   3.45
    22      WFM     NaN          NaT  2015-07-29   1.86
    23      WFM     NaN          NaT  2015-08-24   1.58
    24      WFM     NaN          NaT  2015-09-02   8.64
    25      WFM     NaN          NaT  2015-09-22   2.56
    26      WFM  2015Q4   2015-09-27         NaT    NaN
    27      WFM     NaN          NaT  2015-10-20   5.42
    28      WFM     NaN          NaT  2015-11-02   1.58
    29      WFM     NaN          NaT  2015-11-24   4.58
    30      WFM     NaN          NaT  2015-12-03   6.48
    31      WFM     NaN          NaT  2015-12-05   4.56
    32      WFM     NaN          NaT  2016-01-03   7.14
    33      WFM  2016Q1   2016-01-17  2016-01-17   6.34

Then back-fill the NaN values in Period and Quarter_End with bfill, so that every daily row carries the quarter it belongs to, and aggregate Price with sum. To remove the quarters that have no prices, add Series.dropna and finish with reset_index:

    df.Period = df.Period.bfill()
    df.Quarter_End = df.Quarter_End.bfill()

    # min_count=1 keeps the sum of an all-NaN group as NaN (pandas 0.22 and
    # later return 0 otherwise), so dropna() can remove quarters without prices
    print(df.groupby(['Company', 'Period', 'Quarter_End'])['Price']
            .sum(min_count=1)
            .dropna()
            .reset_index())
       Company  Period  Quarter_End  Price
    0        M  2016Q2   2015-08-01  10.41
    1        M  2016Q3   2015-10-31  18.20
    2        M  2016Q4   2016-01-30  30.68
    3      WFM  2015Q3   2015-07-05   5.10
    4      WFM  2015Q4   2015-09-27  18.09
    5      WFM  2016Q1   2016-01-17  36.10
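
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same merge_ordered / bfill / sum idea. It uses only a subset of the rows from the question, and the names `periods`, `daily`, `merged` and `quarterly` are made up for the example. One change relative to the snippet above, which is my own assumption about what is wanted: the back-fill is done per company, so a price dated after a company's last known quarter end is left unassigned instead of picking up the next company's first quarter.

```python
import pandas as pd

# Subset of the question's data. The M price on 08/24/2015 falls after the
# last M quarter end listed here, so it should not be assigned to any quarter.
periods = pd.DataFrame({
    "Company": ["M", "M", "WFM", "WFM"],
    "Period": ["2016Q1", "2016Q2", "2015Q2", "2015Q3"],
    "Quarter_End": pd.to_datetime(
        ["05/02/2015", "08/01/2015", "04/12/2015", "07/05/2015"],
        format="%m/%d/%Y"),
})
daily = pd.DataFrame({
    "Company": ["M"] * 5 + ["WFM"] * 2,
    "Date": pd.to_datetime(
        ["06/20/2015", "06/22/2015", "07/10/2015", "07/29/2015",
         "08/24/2015", "06/20/2015", "06/22/2015"],
        format="%m/%d/%Y"),
    "Price": [1.05, 4.05, 3.45, 1.86, 1.58, 1.05, 4.05],
})

# Interleave quarter-end rows and daily rows, ordered by company and date.
merged = pd.merge_ordered(
    periods, daily,
    left_on=["Company", "Quarter_End"],
    right_on=["Company", "Date"],
    how="outer",
)

# Back-fill within each company only, so labels never leak across companies.
cols = ["Period", "Quarter_End"]
merged[cols] = merged.groupby("Company")[cols].bfill()

# min_count=1 leaves quarters without any price as NaN so dropna() removes them;
# rows whose Period is still NaN are dropped by groupby itself.
quarterly = (
    merged.groupby(["Company", "Period", "Quarter_End"])["Price"]
          .sum(min_count=1)
          .dropna()
          .reset_index()
)
print(quarterly)
```

With this subset only M 2016Q2 and WFM 2015Q3 should remain, since the other two quarters have no prices before their end dates.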
  • set_index
  • pd.concat for index alignment
  • groupby with agg

    prd_df = period_df.set_index(['Company', 'Quarter_End'])
    prc_df = price_df.set_index(['Company', 'Date'], drop=False)

    df = pd.concat([prd_df, prc_df], axis=1)

    df.groupby([df.index.get_level_values(0), df.Period.bfill()]) \
      .agg(dict(Date='last', Price='sum')).dropna()
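
As a possible alternative to both answers (not something either of them uses), pd.merge_asof can attach to each daily row the first quarter end on or after its date, which avoids the back-fill step entirely. The sketch below rebuilds the question's two tables; `periods`, `daily`, `matched` and `result` are names chosen for the example, and it assumes every price should go to the nearest quarter end that is not earlier than its date.

```python
import pandas as pd

periods = pd.DataFrame({
    "Company": ["M"] * 4 + ["WFM"] * 4,
    "Period": ["2016Q1", "2016Q2", "2016Q3", "2016Q4",
               "2015Q2", "2015Q3", "2015Q4", "2016Q1"],
    "Quarter_End": pd.to_datetime(
        ["05/02/2015", "08/01/2015", "10/31/2015", "01/30/2016",
         "04/12/2015", "07/05/2015", "09/27/2015", "01/17/2016"],
        format="%m/%d/%Y"),
})

# Both companies share the first 13 dates and all 14 prices in the question;
# only the final date differs (01/30/2016 for M, 01/17/2016 for WFM).
dates = ["06/20/2015", "06/22/2015", "07/10/2015", "07/29/2015", "08/24/2015",
         "09/02/2015", "09/22/2015", "10/20/2015", "11/02/2015", "11/24/2015",
         "12/03/2015", "12/05/2015", "01/03/2016"]
prices = [1.05, 4.05, 3.45, 1.86, 1.58, 8.64, 2.56,
          5.42, 1.58, 4.58, 6.48, 4.56, 7.14, 6.34]
daily = pd.DataFrame({
    "Company": ["M"] * 14 + ["WFM"] * 14,
    "Date": pd.to_datetime(
        dates + ["01/30/2016"] + dates + ["01/17/2016"], format="%m/%d/%Y"),
    "Price": prices * 2,
})

# merge_asof needs both frames sorted by their date key. direction="forward"
# picks, per company, the first Quarter_End that is >= Date.
matched = pd.merge_asof(
    daily.sort_values("Date"),
    periods.sort_values("Quarter_End"),
    left_on="Date",
    right_on="Quarter_End",
    by="Company",
    direction="forward",
)

# Daily rows with no later quarter end get NaN keys and are dropped by groupby.
result = (
    matched.groupby(["Company", "Period", "Quarter_End"], as_index=False)["Price"]
           .sum()
)
print(result.round(2))
```

Because only daily rows are grouped, quarters without any prices (M 2016Q1 and WFM 2015Q2 here) never appear, so no dropna is needed.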


Source: https://habr.com/ru/post/1258474/

