Creating overlapping groups with pandas TimeGrouper

I use pandas TimeGrouper to group data points in a pandas DataFrame in Python:

grouped = data.groupby(pd.TimeGrouper('30S')) 

I would like to know if there is a way to get overlapping windows, as suggested in this question: Window overlap in Pandas, while keeping the pandas DataFrame as the data structure.
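If you need genuinely overlapping *groups* (so that any aggregation works, not only a mean), one option is to copy each window's rows into a long DataFrame tagged with the window start and then group on that tag. This is a minimal sketch, not a pandas feature: the helper `overlapping_groups`, the 30 s window and 5 s step, and the random sample data are all assumptions for illustration. It trades memory for generality, because every row is copied once per window it falls in (6 copies here).

```python
import numpy as np
import pandas as pd

# Made-up sample data: 90 one-second rows, columns A and B.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((90, 2)),
    columns=list("AB"),
    index=pd.date_range("2013-01-01 09:01:01", freq="s", periods=90),
)


def overlapping_groups(frame, window="30s", step="5s"):
    """Hypothetical helper: stack one copy of the rows for every window.

    Windows are (start, start + window]; starts are aligned to multiples of
    `step`, and the last window ends no later than the last timestamp.
    The result has a 'window_start' index level for ordinary groupby.
    """
    first_start = frame.index[0].floor(step)  # step is still a string here
    window = pd.Timedelta(window)
    starts = pd.date_range(first_start, frame.index[-1] - window, freq=step)
    pieces = {
        start: frame[(frame.index > start) & (frame.index <= start + window)]
        for start in starts
    }
    return pd.concat(pieces, names=["window_start", "time"])


groups = overlapping_groups(df)
window_means = groups.groupby(level="window_start").mean()
window_sizes = groups.groupby(level="window_start").size()
```

With groups in hand, `groups.groupby(level="window_start")` accepts `median()`, `quantile(...)`, `agg(...)` and so on, which a rolling mean alone does not give you.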

Update: timing of the three solutions from the answer below (TimeGrouper, resample, and the rolling mean sampled every 30 rows):

 %timeit df.groupby(pd.TimeGrouper('30s',closed='right')).mean()
 %timeit df.resample('30s',how='mean',closed='right')
 %timeit pd.rolling_mean(df,window=30).iloc[29::30]

gives:

 1000 loops, best of 3: 336 µs per loop
 1000 loops, best of 3: 349 µs per loop
 1000 loops, best of 3: 199 µs per loop
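`pd.TimeGrouper`, `pd.rolling_mean` and the `how=` argument of `resample` are deprecated and no longer available in current pandas. A sketch of the same timing comparison with their current equivalents (`pd.Grouper`, `.resample(...).mean()`, `.rolling(...).mean()`), using the standard `timeit` module instead of the IPython magic; the random data mirrors the answer below, and absolute timings depend on machine and pandas version, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

# Same shape of data as in the answer: 90 one-second rows, columns A and B.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((90, 2)),
    columns=list("AB"),
    index=pd.date_range("2013-01-01 09:01:01", freq="s", periods=90),
)

candidates = {
    "groupby + Grouper": lambda: df.groupby(pd.Grouper(freq="30s", closed="right")).mean(),
    "resample": lambda: df.resample("30s", closed="right").mean(),
    "rolling + iloc": lambda: df.rolling(window=30).mean().iloc[29::30],
}

# Best of 3 repeats of 100 calls each, reported per call in microseconds.
timings_us = {
    name: min(timeit.repeat(fn, number=100, repeat=3)) / 100 * 1e6
    for name, fn in candidates.items()
}
for name, us in timings_us.items():
    print(f"{name}: {us:.0f} µs per loop")
```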
1 answer

Create some data covering exactly 3 x 30 s:

 In [51]: df = DataFrame(randn(90,2),columns=list('AB'),index=date_range('20130101 9:01:01',freq='s',periods=90)) 

Using a TimeGrouper in this way is equivalent to resampling (in fact, that is essentially what resample does internally). Note that I used closed='right' to make sure that exactly 30 observations fall into each bin:

 In [57]: df.groupby(pd.TimeGrouper('30s',closed='right')).mean()
 Out[57]:
                             A         B
 2013-01-01 09:01:00 -0.214968 -0.162200
 2013-01-01 09:01:30 -0.090708 -0.021484
 2013-01-01 09:02:00 -0.160335 -0.135074

 In [52]: df.resample('30s',how='mean',closed='right')
 Out[52]:
                             A         B
 2013-01-01 09:01:00 -0.214968 -0.162200
 2013-01-01 09:01:30 -0.090708 -0.021484
 2013-01-01 09:02:00 -0.160335 -0.135074
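In current pandas the same two calls are written with `pd.Grouper(freq=...)` and a method call on the resampler. A minimal sketch on randomly generated data (so the numbers differ from the output above) that checks the two agree and that `closed='right'` puts exactly 30 rows in each bin:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((90, 2)),
    columns=list("AB"),
    index=pd.date_range("2013-01-01 09:01:01", freq="s", periods=90),
)

# closed='right' makes each bin (start, start + 30 s]; labels stay at the left edge.
by_grouper = df.groupby(pd.Grouper(freq="30s", closed="right")).mean()
by_resample = df.resample("30s", closed="right").mean()

# Rows per bin: 09:01:01-09:01:30, 09:01:31-09:02:00, 09:02:01-09:02:30.
counts = df.resample("30s", closed="right").size()

# Raises if the two results differ (the index freq attribute is not compared).
pd.testing.assert_frame_equal(by_grouper, by_resample, check_freq=False)
print(counts.tolist())  # [30, 30, 30]
```

With the default `closed='left'` the first bin would instead be [09:01:00, 09:01:30) and hold only 29 rows, which is why the answer sets `closed` explicitly.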

A rolling mean gives the same numbers if you then select the rows at 30 s intervals:

 In [55]: pd.rolling_mean(df,window=30).iloc[28:40]
 Out[55]:
                             A         B
 2013-01-01 09:01:29       NaN       NaN
 2013-01-01 09:01:30 -0.214968 -0.162200
 2013-01-01 09:01:31 -0.150401 -0.180492
 2013-01-01 09:01:32 -0.160755 -0.142534
 2013-01-01 09:01:33 -0.114918 -0.181424
 2013-01-01 09:01:34 -0.098945 -0.221110
 2013-01-01 09:01:35 -0.052450 -0.169884
 2013-01-01 09:01:36 -0.011172 -0.185132
 2013-01-01 09:01:37  0.100843 -0.178179
 2013-01-01 09:01:38  0.062554 -0.097637
 2013-01-01 09:01:39  0.048834 -0.065808
 2013-01-01 09:01:40  0.003585 -0.059181
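With the current `.rolling()` API the same selection is `df.rolling(window=30).mean().iloc[29::30]`. Note that the timestamps differ from the resample output: a rolling result is stamped at the end of each window (09:01:30), while the bins above are labelled at their left edge (09:01:00). A sketch on random data that lines the two up by asking `resample` for right-edge labels (`label='right'`):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((90, 2)),
    columns=list("AB"),
    index=pd.date_range("2013-01-01 09:01:01", freq="s", periods=90),
)

rolled = df.rolling(window=30).mean()  # first 29 rows are NaN
every_30th = rolled.iloc[29::30]       # windows ending 09:01:30, 09:02:00, 09:02:30

# label='right' stamps each (start, end] bin at its end, like the rolling result.
binned = df.resample("30s", closed="right", label="right").mean()

same_index = every_30th.index.equals(binned.index)
same_values = np.allclose(every_30th.to_numpy(), binned.to_numpy())
print(same_index, same_values)  # True True
```

The rolling version computes a mean at every one of the 90 rows and then discards most of them; it only reproduces the bins because the data arrive exactly once per second.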

So, depending on what you want to achieve, it is easy to get overlapping windows with rolling_mean and then select any "frequency" you want. For example, here is a 30 s window sampled every 5 s:

 In [61]: pd.rolling_mean(df,window=30)[9::5]
 Out[61]:
                             A         B
 2013-01-01 09:01:10       NaN       NaN
 2013-01-01 09:01:15       NaN       NaN
 2013-01-01 09:01:20       NaN       NaN
 2013-01-01 09:01:25       NaN       NaN
 2013-01-01 09:01:30 -0.214968 -0.162200
 2013-01-01 09:01:35 -0.052450 -0.169884
 2013-01-01 09:01:40  0.003585 -0.059181
 2013-01-01 09:01:45 -0.055886 -0.111228
 2013-01-01 09:01:50 -0.110191 -0.045032
 2013-01-01 09:01:55  0.093662 -0.036177
 2013-01-01 09:02:00 -0.090708 -0.021484
 2013-01-01 09:02:05 -0.286759  0.020365
 2013-01-01 09:02:10 -0.273221 -0.073886
 2013-01-01 09:02:15 -0.222720 -0.038865
 2013-01-01 09:02:20 -0.175630  0.001389
 2013-01-01 09:02:25 -0.301671 -0.025603
 2013-01-01 09:02:30 -0.160335 -0.135074
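`window=30` counts rows, so it means 30 seconds only because this data has exactly one row per second. As a swapped-in technique not used in the answer, current pandas also accepts a time offset as the window (`rolling("30s")`), covering (t - 30 s, t] regardless of how many rows fall in it. A minimal sketch, assuming the same kind of 1 s random data plus a made-up gap; `min_periods=30` reproduces the leading NaNs of the count-based version, since offset windows otherwise default to `min_periods=1`:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((90, 2)),
    columns=list("AB"),
    index=pd.date_range("2013-01-01 09:01:01", freq="s", periods=90),
)

# Count-based: last 30 rows, sampled every 5 s starting at 09:01:10 (as above).
by_count = df.rolling(window=30).mean().iloc[9::5]

# Time-based: rows in (t - 30 s, t], sampled at the same timestamps.
by_time = df.rolling("30s", min_periods=30).mean().iloc[9::5]

# On regular 1 s data the two agree, including the NaNs before 09:01:30.
agree = np.allclose(by_count.to_numpy(), by_time.to_numpy(), equal_nan=True)

# Made-up irregular case: drop 09:01:41-09:01:50. The time-based window ending
# at 09:02:00 now holds only 20 rows, while window=30 would reach back 40 s.
sparse = df.drop(df.index[40:50])
end = pd.Timestamp("2013-01-01 09:02:00")
rows_by_time = sparse["A"].rolling("30s").count().loc[end]
print(agree, rows_by_time)
```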

Source: https://habr.com/ru/post/1498442/

