How to iterate over time periods in pandas

Question

I have a pandas Series with a daily-frequency DatetimeIndex. I want to iterate over this Series at an arbitrary frequency with an arbitrary lookback window. For example: iterate every six months with a one-year lookback window.

Something like this would be ideal:

 for df_year in df.timegroup(freq='6m', lookback='1y'): # df_year will span one year of daily prices and be generated every 6 months

I know about TimeGrouper, but I did not understand how to do this with it. I could code it manually, but was hoping for a clever pandas one-liner.

Edit: This is getting a little closer:

 pd.rolling_apply(df, 252, lambda s: s.sum(), freq=pd.datetools.BMonthEnd())

This does not work because it applies a lookback window of 252 * BMonthEnd(), whereas I would like the two to be independent: a lookback window of 252 days at each month end.
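One way to get the behaviour described in the edit with the current rolling API (pd.rolling_apply and pd.datetools have since been removed from pandas) is to compute the trailing sum on every row and then keep only the last row of each month. A minimal sketch, assuming that 252 means 252 observations (rows) rather than calendar days, and using made-up data:

```python
import pandas as pd

# Made-up data: one unit per business day.
idx = pd.date_range("2013-01-01", "2015-01-01", freq="B")
s = pd.Series(1.0, index=idx)

# Trailing sum over the last 252 observations, evaluated on every row.
# Rows with fewer than 252 observations behind them come out as NaN.
trailing = s.rolling(window=252).sum()

# Keep only the last observation of each calendar month, then drop the
# month ends that did not yet have a full 252-row window.
month_end = trailing.groupby(trailing.index.to_period("M")).tail(1).dropna()

print(month_end.head())
```

The window length and the sampling frequency are independent here: changing 252 does not affect which dates are reported, and grouping by a different period (for example "Q") does not affect the window.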

+6

python pandas

twiecki Mar 17 '15 at 13:40


2 answers

This solution is a one-liner using a list comprehension. Starting from the left side of the time series and iterating forward (reverse iteration is also possible), each iteration returns a slice of the index whose length equals the lookback window, and advances by a step equal to the frequency. Note that the last period is likely a stub that is shorter than the lookback window.

This method steps in fixed numbers of days rather than calendar months or weeks.

 freq = 30      # Days
 lookback = 60  # Days
 idx = pd.date_range('2010-01-01', '2015-01-01')
 [idx[(freq * n):(lookback + freq * n)] for n in range(int(len(idx) / freq))]

 Out[86]:
 [<class 'pandas.tseries.index.DatetimeIndex'>
  [2010-01-01, ..., 2010-03-01]
  Length: 60, Freq: D, Timezone: None,
  <class 'pandas.tseries.index.DatetimeIndex'>
  [2010-01-31, ..., 2010-03-31]
  Length: 60, Freq: D, Timezone: None,
  ...
  Length: 60, Freq: D, Timezone: None,
  <class 'pandas.tseries.index.DatetimeIndex'>
  [2014-11-06, ..., 2015-01-01]
  Length: 57, Freq: D, Timezone: None]
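For calendar-based steps rather than fixed day counts, the same idea can be sketched with pd.DateOffset. This is only a sketch: iter_windows is a hypothetical helper, not a pandas function, and the price data is made up. It assumes a sorted DatetimeIndex and treats each window as the half-open interval (end - lookback, end].

```python
import pandas as pd


def iter_windows(series, freq, lookback):
    """Yield (end, window) pairs. Hypothetical helper, not a pandas API.

    freq and lookback are pd.DateOffset objects, so steps follow the calendar.
    """
    first, last = series.index[0], series.index[-1]
    end = first + lookback  # first date with a full lookback span behind it
    while end <= last:
        # Select the half-open interval (end - lookback, end] with a mask.
        mask = (series.index > end - lookback) & (series.index <= end)
        yield end, series[mask]
        end = end + freq


# Made-up daily data over the same date range as the answer above.
idx = pd.date_range("2010-01-01", "2015-01-01", freq="D")
prices = pd.Series(range(len(idx)), index=idx)

windows = list(
    iter_windows(
        prices,
        freq=pd.DateOffset(months=6),
        lookback=pd.DateOffset(years=1),
    )
)

for end, window in windows:
    print(end.date(), len(window))
```

With this data each window holds 365 or 366 daily rows, depending on whether it contains a leap day. Unlike the fixed-day version, no stub window is produced, because iteration starts only once a full lookback span is available.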

+2

Alexander Mar 17 '15 at 15:02


Jeff · Accepted Answer · 2015-03-17T21:59:30+0000

I think this is what you are looking for.

Construct a series with a given frequency (business days here). Every value is 1 for clarity.

 In [77]: i = pd.date_range('20110101','20150101',freq='B')

 In [78]: s = pd.Series(1,index=i)

 In [79]: s
 Out[79]:
 2011-01-03    1
 2011-01-04    1
 2011-01-05    1
 2011-01-06    1
 2011-01-07    1
 ..
 2014-12-26    1
 2014-12-29    1
 2014-12-30    1
 2014-12-31    1
 2015-01-01    1
 Freq: B, dtype: int64

 In [80]: len(s)
 Out[80]: 1044

Conform the index to a different frequency. Here, this relabels every index element with the end of its month.

 In [81]: s.index = s.index.to_period('M').to_timestamp('M')

 In [82]: s
 Out[82]:
 2011-01-31    1
 2011-01-31    1
 2011-01-31    1
 2011-01-31    1
 2011-01-31    1
 ..
 2014-12-31    1
 2014-12-31    1
 2014-12-31    1
 2014-12-31    1
 2015-01-31    1
 dtype: int64

Then it is straightforward to resample to another frequency. In this case, that gives the number of business days in each period.

 In [83]: s.resample('3M',how='sum')
 Out[83]:
 2011-01-31    21
 2011-04-30    64
 2011-07-31    65
 2011-10-31    66
 2012-01-31    66
 ..
 2014-01-31    66
 2014-04-30    63
 2014-07-31    66
 2014-10-31    66
 2015-01-31    44
 Freq: 3M, dtype: int64
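In recent pandas versions the how= argument to resample is no longer available, and the aggregation is chained as a method call instead. The following is a minimal sketch of the same three steps under that assumption. It passes a MonthEnd(3) offset object rather than the '3M' string, because the month-end alias string differs between pandas versions, and it uses to_timestamp(how='end').normalize() to obtain month-end labels; both are substitutions made for this sketch, not part of the original answer.

```python
import pandas as pd

# Step 1: business-day index with every value set to 1.
i = pd.date_range("2011-01-01", "2015-01-01", freq="B")
s = pd.Series(1, index=i)
n_rows = len(s)

# Step 2: relabel every observation with the last day of its month.
# how="end" returns the final nanosecond of the month, so normalize()
# is applied to move each label back to midnight.
s.index = s.index.to_period("M").to_timestamp(how="end").normalize()

# Step 3: resample to three-month bins and count the business days in each.
counts = s.resample(pd.offsets.MonthEnd(3)).sum()

print(counts)
```

Because every value is 1, the sum per bin is simply the number of business days that fall into it, and the bin totals add up to the length of the original series.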
