I have a dataframe as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame({'X': np.random.randn(50000)}, index=pd.date_range('1/1/2000', periods=50000, freq='min'))
df.head(10)
Out[37]:
X
2000-01-01 00:00:00 -0.699565
2000-01-01 00:01:00 -0.646129
2000-01-01 00:02:00 1.339314
2000-01-01 00:03:00 0.559563
2000-01-01 00:04:00 1.529063
2000-01-01 00:05:00 0.131740
2000-01-01 00:06:00 1.282263
2000-01-01 00:07:00 -1.003991
2000-01-01 00:08:00 -1.594918
2000-01-01 00:09:00 -0.775230
I would like to create a variable containing the sum of X
- over the last 5 days (not including the current observation)
- only taking into account the observations that occur at the same time of day as the current observation.
In other words:
- At index 2000-01-01 00:00:00, df['rolling_sum_same_hour'] contains the sum of the X values observed at 00:00:00 over the last 5 days in the data (not counting 2000-01-01, of course).
- At index 2000-01-01 00:01:00, df['rolling_sum_same_hour'] contains the sum of the X values observed at 00:01:00 over the last 5 days, and so on.
Here is what I tried:
df['rolling_sum_same_hour']=df.at_time(df.index.minute).rolling(window=5).sum()
However, this does not work. Any ideas? Thanks!
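One possible way to get this, offered as a minimal sketch rather than a definitive answer: group the rows by time of day, shift each group by one row so the current day is left out, and then take a rolling sum over 5 rows. This assumes there is exactly one observation per day for each time of day with no missing days, so "the last 5 days" is treated as "the previous 5 rows in the group". The helper name same_time is only illustrative.

```python
import numpy as np
import pandas as pd

# Same setup as in the question: one observation per minute.
df = pd.DataFrame(
    {"X": np.random.randn(50000)},
    index=pd.date_range("1/1/2000", periods=50000, freq="min"),
)

# Group rows by time of day, so each group holds one observation per day.
same_time = df.groupby(df.index.time)["X"]

# Within each group, shift by one row to exclude the current day,
# then sum the previous 5 observations (assumes no missing days).
df["rolling_sum_same_hour"] = same_time.transform(
    lambda s: s.shift(1).rolling(window=5).sum()
)
```

The first 5 days come out as NaN because fewer than 5 earlier observations exist for them. If the data can have gaps, the row-count window would no longer correspond to calendar days and a time-based window would be needed instead.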