Extra bin with pandas resample

I have a pandas data frame defined as follows:

    import datetime
    import pandas

    last_4_weeks_range = pandas.date_range(
        start=datetime.datetime(2001, 5, 4), periods=28)
    last_4_weeks = pandas.DataFrame(
        [{'REST_KEY': 1, 'DLY_TRN_QT': 80, 'DLY_SLS_AMT': 90,
          'COOP_DLY_TRN_QT': 30, 'COOP_DLY_SLS_AMT': 20}] * 28 +
        [{'REST_KEY': 2, 'DLY_TRN_QT': 70, 'DLY_SLS_AMT': 10,
          'COOP_DLY_TRN_QT': 50, 'COOP_DLY_SLS_AMT': 20}] * 28,
        index=last_4_weeks_range.append(last_4_weeks_range))
    last_4_weeks.sort(inplace=True)

and when I resample it:

    In [265]: last_4_weeks.resample('7D', how='sum')
    Out[265]:
                COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  REST_KEY
    2001-05-04               280              560          700        1050        21
    2001-05-11               280              560          700        1050        21
    2001-05-18               280              560          700        1050        21
    2001-05-25               280              560          700        1050        21
    2001-06-01                 0                0            0           0         0

As a result, I get an extra empty bin that I did not expect to see: 2001-06-01. I would not expect this bin to be there, since my 28 days divide evenly into the 7-day resample I am doing. I have tried playing with the closed kwarg, but I cannot get rid of this extra bin. Why does this extra bin appear when there is no data to put in it, and how can I avoid creating it?
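To see where an empty trailing bin can come from, here is a minimal pure-Python sketch (no pandas) of left-closed 7-day binning over the 28 days in the question. It assumes, as a reading of Out[265] rather than anything confirmed from the pandas 0.11 source, that the bin edges are extended one full bin past the last observation. The helpers bin_edges and daily_counts_per_bin are hypothetical, for illustration only; dropping bins with no observations is shown as one possible workaround.

```python
from datetime import date, timedelta


def bin_edges(start, n_bins, width_days):
    """Left edges of n_bins consecutive bins, each width_days long (hypothetical helper)."""
    return [start + timedelta(days=width_days * i) for i in range(n_bins)]


def daily_counts_per_bin(days, edges, width_days):
    """Count how many days fall in each left-closed bin [edge, edge + width)."""
    counts = []
    for edge in edges:
        end = edge + timedelta(days=width_days)
        counts.append(sum(1 for d in days if edge <= d < end))
    return counts


start = date(2001, 5, 4)
days = [start + timedelta(days=i) for i in range(28)]  # 2001-05-04 .. 2001-05-31

# 28 consecutive days fill exactly four left-closed 7-day bins.
exact = daily_counts_per_bin(days, bin_edges(start, 4, 7), 7)

# Assumption: if the edges run one bin past the data, as Out[265] suggests,
# a fifth bin starting 2001-06-01 is created with nothing in it.
padded_edges = bin_edges(start, 5, 7)
padded = daily_counts_per_bin(days, padded_edges, 7)

# Possible workaround: keep only bins that actually received observations.
non_empty = [(edge, n) for edge, n in zip(padded_edges, padded) if n > 0]

print(exact)             # [7, 7, 7, 7]
print(padded)            # [7, 7, 7, 7, 0]
print(padded_edges[-1])  # 2001-06-01
print(len(non_empty))    # 4
```

Counting observations per bin (rather than testing the summed value) is what makes the filter safe: a bin whose data genuinely sums to zero is still kept.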

What I'm ultimately trying to do is get 7-day averages per REST_KEY, so:

    In [266]: last_4_weeks.groupby('REST_KEY').resample('7D', how='sum').mean(level=0)
    Out[266]:
              COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  REST_KEY
    REST_KEY
    1                      112              168          504         448       5.6
    2                      112              280           56         392      11.2

but this extra bin throws off my averages: for COOP_DLY_SLS_AMT, for example, I get 112, which is (20 * 7 * 4) / 5, rather than 140 = (20 * 7 * 4) / 4, which is what I would get without the extra bin. I also did not expect REST_KEY to appear in the aggregated columns, since it is the grouping key, but that is a minor issue.
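The arithmetic above can be checked with a small pure-Python sketch. REST_KEY 1 has COOP_DLY_SLS_AMT = 20 on each of 28 days, so each full 7-day bin sums to 140. Averaging over five bins, one of them the empty 2001-06-01 bin, gives 112; averaging only over bins that received data gives the expected 140. The variable names are illustrative and the code does not use pandas.

```python
# COOP_DLY_SLS_AMT for REST_KEY 1: 20 per day for 28 days (values from the question).
daily = [20] * 28

# Sum each consecutive 7-day chunk: four full weeks.
weekly = [sum(daily[i:i + 7]) for i in range(0, len(daily), 7)]

# Append the spurious empty bin that resample produced (2001-06-01).
weekly_with_extra = weekly + [0]

# Mean over all five bins, including the empty one.
mean_with_extra = sum(weekly_with_extra) / len(weekly_with_extra)

# Mean over bins that received data. Caveat: this treats a zero sum as "empty",
# so a bin whose data genuinely sums to 0 would also be dropped; counting rows
# per bin is the safer test.
filled_bins = [w for w in weekly_with_extra if w != 0]
mean_filled = sum(filled_bins) / len(filled_bins)

print(weekly)           # [140, 140, 140, 140]
print(mean_with_extra)  # 112.0
print(mean_filled)      # 140.0
```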

P.S. I am using pandas 0.11.0.

1 answer

I think this is a bug:

Output from pandas 0.9.0dev on a Mac:

    In [3]: pandas.__version__
    Out[3]: '0.9.0.dev-1e68fd9'

    In [6]: last_4_weeks.resample('7D', how='sum')
    Out[6]:
                COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  REST_KEY
    2001-05-04                40               80          100         150         3
    2001-05-11               280              560          700        1050        21
    2001-05-18               280              560          700        1050        21
    2001-05-25               280              560          700        1050        21
    2001-06-01               240              480          600         900        18

    In [4]: last_4_weeks.groupby('REST_KEY').resample('7D', how='sum').mean(level=0)
    Out[4]:
              COOP_DLY_SLS_AMT  COOP_DLY_TRN_QT  DLY_SLS_AMT  DLY_TRN_QT  REST_KEY
    REST_KEY
    1                      112              168          504         448       5.6
    2                      112              280           56         392      11.2
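The 0.9.0dev totals above are consistent with bins that are closed on the right and labelled by their right edge: the 2001-05-04 row holds only one day (40 = 20 + 20 for COOP_DLY_SLS_AMT across both REST_KEYs) and the 2001-06-01 row holds the last six days (240 = 6 * 40). The pure-Python sketch below reproduces those totals under that assumption; it is an interpretation of the printed output, not code taken from pandas.

```python
from datetime import date, timedelta

start = date(2001, 5, 4)
days = [start + timedelta(days=i) for i in range(28)]  # 2001-05-04 .. 2001-05-31

# Combined daily COOP_DLY_SLS_AMT across both REST_KEYs: 20 + 20.
daily_total = 40

# Assumed binning: right-closed bins (label - 7 days, label], labelled by the
# right edge, with the first label placed at the first timestamp.
labels = [start + timedelta(days=7 * i) for i in range(5)]

sums = {}
for label in labels:
    left = label - timedelta(days=7)
    n_days = sum(1 for d in days if left < d <= label)
    sums[label.isoformat()] = n_days * daily_total

for label, total in sums.items():
    print(label, total)
# 2001-05-04 40
# 2001-05-11 280
# 2001-05-18 280
# 2001-05-25 280
# 2001-06-01 240
```

Under this reading no row is empty, but the 28 days are still spread over five bins (1 + 7 + 7 + 7 + 6 days), so a mean over the bins would again be divided by 5.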

I am using these versions (from pip freeze):

    numpy==1.8.0.dev-9597b1f-20120920
    pandas==0.9.0.dev-1e68fd9-20120920

Source: https://habr.com/ru/post/1481033/

