Pandas: how to compute the rolling sum of a variable over the last few days, but only at a given hour?

Question

I have a dataframe as follows:

import numpy as np
import pandas as pd

df = pd.DataFrame({'X': np.random.randn(50000)}, index=pd.date_range('1/1/2000', periods=50000, freq='T'))

df.head(10)
Out[37]: 
                            X
2000-01-01 00:00:00 -0.699565
2000-01-01 00:01:00 -0.646129
2000-01-01 00:02:00  1.339314
2000-01-01 00:03:00  0.559563
2000-01-01 00:04:00  1.529063
2000-01-01 00:05:00  0.131740
2000-01-01 00:06:00  1.282263
2000-01-01 00:07:00 -1.003991
2000-01-01 00:08:00 -1.594918
2000-01-01 00:09:00 -0.775230

I would like to create a variable that contains the sum of X

over the last 5 days (not including the current observation),
only taking into account the observations that occur at the same hour as the current observation.

In other words:

At index 2000-01-01 00:00:00, df['rolling_sum_same_hour'] contains the sum of the X values observed at 00:00:00 during the last 5 days in the data (not counting 2000-01-01, of course).
At index 2000-01-01 00:01:00, df['rolling_sum_same_hour'] contains the sum of the X values observed at 00:01:00 during the last 5 days, etc.

I tried the following, but it does not work:

df['rolling_sum_same_hour']=df.at_time(df.index.minute).rolling(window=5).sum()

Any ideas?

Thanks!

+4

python pandas

ℕʘʘḆḽḘ Sep 12 '16 at 19:28

2 Answers

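IIUC, you want a rolling sum computed only over observations that share the same time of day. This can be done by grouping on hour and minute: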

df.X.groupby([df.index.hour, df.index.minute]).apply(lambda g: g.rolling(window=5).sum())

(Note that windows of both 5 and 10 periods come up in this thread; this uses 5.) For example:

In [43]: df.X.groupby([df.index.hour, df.index.minute]).apply(lambda g: g.rolling(window=5).sum()).tail()
Out[43]: 
2000-02-04 17:15:00   -2.135887
2000-02-04 17:16:00   -3.056707
2000-02-04 17:17:00    0.813798
2000-02-04 17:18:00   -1.092548
2000-02-04 17:19:00   -0.997104
Freq: T, Name: X, dtype: float64
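
On recent pandas versions, apply on a grouped Series may prepend the group keys to the result's index, so the output is not guaranteed to look like the one above. A minimal sketch of the same idea, assuming transform is an acceptable substitute for apply here (that swap is not part of the answer), and using a small all-ones frame so the result is easy to check:

```python
import numpy as np
import pandas as pd

# Stand-in for the question's frame (assumption): 7 days of minute data, X == 1 everywhere.
idx = pd.date_range("2000-01-01", periods=7 * 1440, freq="min")
df = pd.DataFrame({"X": np.ones(len(idx))}, index=idx)

# Group by hour and minute, as in the answer above.
keys = [df.index.hour, df.index.minute]

# transform returns a Series aligned with the original index.
# As in the answer, the window includes the current observation (no shift).
rolled = df["X"].groupby(keys).transform(lambda g: g.rolling(window=5).sum())

print(rolled.tail())
```

With this data, every timestamp from the fifth day onward should sum to 5.0, and the first four days are NaN because fewer than five same-time observations are available.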

+2

Ami Tavory Sep 12 '16 at 20:05

StarFox · Accepted Answer · 2016-09-12T20:05:14+0000

groupby!

# df as defined in the question
# transform keeps the result aligned with df's index on current pandas versions
df['rolling_sum_by_time'] = df.groupby(df.index.time)['X'].transform(lambda x: x.shift(1).rolling(10).sum())

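The groupby groups the observations by time of day (as python datetime.time objects); within each group, the values are shifted by one (so the current observation is excluded) and a rolling sum is taken.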
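
One way to check that the shift really excludes the current observation is to compare the result against a manual sum on data with known values. The sketch below is only an illustration under a few assumptions: X is a simple counter rather than random noise, the window is 5 (matching the 5 days in the question), the column name rolling_sum_same_time is made up, and transform is used instead of apply so the result stays aligned with the original index on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): 8 days of minute data, X counts 0, 1, 2, ...
idx = pd.date_range("2000-01-01", periods=8 * 1440, freq="min")
df = pd.DataFrame({"X": np.arange(len(idx), dtype=float)}, index=idx)

# Group by time of day; shift(1) drops the current day, rolling(5) sums the 5 days before it.
df["rolling_sum_same_time"] = (
    df.groupby(df.index.time)["X"]
      .transform(lambda x: x.shift(1).rolling(5).sum())
)

# Manual check for one timestamp: sum X at the same time on the 5 previous days.
ts = pd.Timestamp("2000-01-06 00:01:00")
previous_days = [ts - pd.Timedelta(days=k) for k in range(1, 6)]
manual = df.loc[previous_days, "X"].sum()

print(df.loc[ts, "rolling_sum_same_time"], manual)
```

The first five days come out as NaN, since there are not yet five earlier observations at the same time of day.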
