Rolling window with a resampling step in pandas

Suppose I have daily data (not at regular intervals), and I want to calculate the standard deviation (or an arbitrary non-linear function) over a 5-month window, stepping forward one month at a time. For example, for May 2012 I would calculate the standard deviation over the period from January 2012 to May 2012 (5 months). For June 2012, the period starts in February 2012, and so on. The end result is a time series with monthly values.

I can't apply a rolling window because, first, it would produce a value every day, and second, I would need to specify the number of values (the rolling window does not aggregate by time frame; some posts address this, but they don't solve my problem, since the window would still advance with every new day).

I can't apply resampling either, because then I would get one sample every 5 months, e.g. I would only have values for May 2012, October 2012, March 2013, ... Finally, since the function is not linear, I can't reconstruct it by first resampling monthly and then applying a 5-period (5-month) rolling window on the result.

So I need some kind of resampling function applied to a rolling window that is defined by a time interval (not by a number of values).
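For reference, here is a brute-force sketch of the desired result (not taken from either answer): loop over the months and apply the function to every observation that falls in the 5 calendar months ending at that month. It assumes a Series with a DatetimeIndex; the name monthly_window and its parameters are hypothetical. It scans the whole series once per month, so it is mostly useful for checking faster approaches.

```python
import pandas as pd

def monthly_window(s, months=5, func=pd.Series.std):
    """Apply func to each window of `months` calendar months, one result per month."""
    # Calendar month of every observation
    periods = s.index.to_period("M")
    results = {}
    # The first full window ends `months - 1` months after the first month with data
    for end in pd.period_range(periods.min() + (months - 1), periods.max(), freq="M"):
        start = end - (months - 1)
        # Observations from `start` through `end`, inclusive
        in_window = (periods >= start) & (periods <= end)
        results[end] = func(s[in_window])
    # Index: one Period per month, labeled by the window's last month
    return pd.Series(results)
```

For example, monthly_window(s) gives the 5-month standard deviation labeled by end month (May 2012 covers January to May 2012), and monthly_window(s, func=pd.Series.sum) gives a 5-month sum.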

How can I do this in pandas? One approach might be to combine several (5 in this example) resampled (5-month) time series, each offset by one month, and then align all of them into one series ... but I don't know how to implement this.
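The offset idea described in the question could be sketched as follows, assuming a Series with a DatetimeIndex. Instead of calling resample, this hypothetical offset_pass_windows helper numbers the months as integers, splits them into non-overlapping 5-month blocks, repeats that once for each of the 5 possible one-month offsets, and interleaves the results. Each window is labeled by its last month, and windows not fully inside the data range are dropped.

```python
import pandas as pd

def offset_pass_windows(s, months=5, func=pd.Series.std):
    """Combine `months` non-overlapping passes, each shifted by one month."""
    base = pd.Period("1970-01", freq="M")
    # Integer month number of every observation (months since 1970-01)
    month_no = (s.index.year - 1970) * 12 + (s.index.month - 1)
    results = []
    for k in range(months):
        # Pass k: blocks of `months` consecutive months, shifted by k months
        bucket = (month_no - k) // months
        agg = s.groupby(bucket).agg(func)
        # Label each block by its last month (the window's end)
        last = [base + int(b * months + k + months - 1) for b in agg.index]
        results.append(pd.Series(agg.to_numpy(), index=pd.PeriodIndex(last, freq="M")))
    # Interleave the passes into a single monthly series
    out = pd.concat(results).sort_index()
    first = base + int(month_no.min()) + months - 1
    last_month = base + int(month_no.max())
    # Drop windows that start before the data or end after it
    return out[(out.index >= first) & (out.index <= last_month)]
```

Every end month belongs to exactly one of the 5 passes, so each month appears once in the combined result.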

2 answers

Here is an attempt; it isn't super clean, but it should work.

Dummy data:

```python
import pandas as pd

df = pd.DataFrame(data={'a': 1.}, index=pd.date_range(start='2001-1-1', periods=1000))
```

First, define a function that moves a date back by n months (returning the first day of that month). It needs cleaning up, but it works for n <= 12.

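```python
from datetime import datetime

def decrease_month(date, n):
    """Return the first day of the month n months before `date` (n <= 12)."""
    assert n <= 12
    new_month = date.month - n
    year_offset = 0
    if new_month <= 0:
        # Wrap around into the previous year
        year_offset = -1
        new_month = 12 + new_month
    return datetime(date.year + year_offset, new_month, 1)
```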

Then add 5 new columns, one for each of the 5 rolling periods that include each date.

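```python
rolling_period = 5  # window length in months

for n in range(rolling_period):
    # m_n: first day of the month n months before each date
    df['m_' + str(n)] = df.index.map(lambda x: decrease_month(x, n))
```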
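The decrease_month helper and the loop can also be written with pandas period arithmetic, which shifts by whole months and so lifts the n <= 12 restriction. A minimal sketch; the name add_month_columns is hypothetical, and it should produce the same month-start dates in m_0 ... m_4 as the loop above.

```python
import pandas as pd

def add_month_columns(df, rolling_period=5):
    """Add columns m_0 .. m_{rolling_period-1}: first day of the month n months earlier."""
    out = df.copy()
    months = out.index.to_period("M")
    for n in range(rolling_period):
        # Subtracting an integer from a monthly PeriodIndex shifts by n months;
        # to_timestamp() turns each period into its first day
        out[f"m_{n}"] = (months - n).to_timestamp()
    return out
```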

Then use the melt function to convert the data from wide to long format, so that each rolling period gets its own record.

```python
df_m = pd.melt(df, id_vars='a')
```
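To see what melt does here, a tiny hypothetical two-row frame with only m_0 and m_1 is enough: the id column a is kept, and each m_* column contributes one row per original row, with the column name in variable and the date in value. The original DatetimeIndex is dropped, which is fine because only a and the window label are needed for the grouping.

```python
import pandas as pd

# Hypothetical two-row frame shaped like df after the loop (only m_0 and m_1)
wide = pd.DataFrame(
    {
        "a": [1.0, 1.0],
        "m_0": pd.to_datetime(["2001-01-01", "2001-02-01"]),
        "m_1": pd.to_datetime(["2000-12-01", "2001-01-01"]),
    },
    index=pd.to_datetime(["2001-01-15", "2001-02-15"]),
)
# Two value columns x two rows -> four rows in long format
long = pd.melt(wide, id_vars="a")
print(long)
```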

You can then group by the newly created value column; each date in it labels the 5-month period that starts in that month.

```
In [222]: df_m.groupby('value').sum()
Out[222]:
              a
value
2000-09-01   31
2000-10-01   59
2000-11-01   90
2000-12-01  120
2001-01-01  151
2001-02-01  150
2001-03-01  153
2001-04-01  153
2001-05-01  153
2001-06-01  153
2001-07-01  153
...
```
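The same duplicate-then-group idea can apply the non-linear function from the question directly. A sketch under these assumptions: the input is a Series with a DatetimeIndex, windows are labeled by their last month as the question asks (rather than by their first month as in the output above), and func is anything groupby's agg accepts. The helper name windows_by_end_month is hypothetical. As in the output above, windows at both ends of the data are partial (fewer than 5 months of data).

```python
import pandas as pd

def windows_by_end_month(s, rolling_period=5, func="std"):
    """Apply func to every rolling_period-month window, labeled by its last month."""
    months = s.index.to_period("M")
    parts = []
    for n in range(rolling_period):
        # Each observation also belongs to the window ending n months later
        parts.append(pd.DataFrame({"value": s.to_numpy(), "window_end": months + n}))
    # Long format: every observation appears rolling_period times
    long = pd.concat(parts, ignore_index=True)
    return long.groupby("window_end")["value"].agg(func)
```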

I had a similar problem with a timedelta-indexed series, where I wanted to take a moving average and then resample. Here is an example with 100 seconds of data: I take the moving average over 10-second windows, and then resample every 5 seconds, taking the first entry in each bin. The result is the average over the preceding 10 seconds, in 5-second increments. You could do something similar with months instead of seconds:

```python
import pandas as pd

df = pd.DataFrame(range(0, 100), index=pd.to_timedelta(range(0, 100), unit='s'))
# 10 s rolling mean, then the first value in each 5 s bin
df.rolling('10s').mean().resample('5s').first()
```

Result:

```
             0
00:00:00   0.0
00:00:05   2.5
00:00:10   5.5
00:00:15  10.5
00:00:20  15.5
00:00:25  20.5
00:00:30  25.5
00:00:35  30.5
00:00:40  35.5
00:00:45  40.5
00:00:50  45.5
00:00:55  50.5
00:01:00  55.5
00:01:05  60.5
00:01:10  65.5
00:01:15  70.5
00:01:20  75.5
00:01:25  80.5
00:01:30  85.5
00:01:35  90.5
```
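Offset-based rolling windows in pandas need a fixed length, so a calendar-month window cannot be passed to rolling directly. A hedged approximation of the rolling-then-resample idea for the monthly case uses a fixed 150-day window (roughly 5 months, an assumption) and keeps the last value in each calendar month, i.e. the window ending at that month's last observation. It groups by calendar month instead of calling resample, to avoid depending on the month-end alias. The name approx_monthly_rolling is hypothetical, and because 150 days is not exactly 5 calendar months, results can differ slightly from a true calendar-month window.

```python
import numpy as np
import pandas as pd

def approx_monthly_rolling(s, days=150, func=np.std):
    """Rolling `days`-day window, then one value per calendar month (approximation)."""
    # Fixed-length window in days; note np.std uses ddof=0 (population std)
    rolled = s.rolling(f"{days}D").apply(func, raw=True)
    # Keep the last rolled value in each calendar month
    return rolled.groupby(rolled.index.to_period("M")).last()
```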

Source: https://habr.com/ru/post/1271125/

