How to get a date-based lookback moving average of a time series in numpy/pandas?

I have a time series like this:

                  times | data
1994-07-25 15:15:00.000 | 165
1994-07-25 16:00:00.000 | 165
1994-07-26 18:45:00.000 | 165

1994-07-27 15:15:00.000 | 165
1994-07-27 16:00:00.000 | 165

1994-07-28 18:45:00.000 | 165
1994-07-28 19:15:00.000 | 63
1994-07-28 20:35:00.000 | 64
1994-07-28 21:55:00.000 | 64

1994-07-29 14:15:00.000 | 62

1994-07-30 15:35:00.000 | 62
1994-07-30 16:55:00.000 | 61

I would like to compute a backward-looking (lookback) moving average on this data, but with the window measured in calendar dates, not in rows or in datetimes.


For example, let's say lookback = 3 days. Then for

1994-07-29 14:15:00.000 | 62

the lookback moving average should be the average of

1994-07-26 18:45:00.000 | 165

1994-07-27 15:15:00.000 | 165
1994-07-27 16:00:00.000 | 165

1994-07-28 18:45:00.000 | 165
1994-07-28 19:15:00.000 | 63
1994-07-28 20:35:00.000 | 64
1994-07-28 21:55:00.000 | 64

Because the lookback is 3 days, the average starts from 1994-07-26 and covers 3 days, no matter how many rows there are in one day.


In addition, multiple rows with the same date (ignoring the time) should all get the same lookback average.


How can I easily achieve this?

With pandas, you can put the times in a DatetimeIndex and group the rows by date.

Then take a rolling mean over the grouped result:

import numpy as np
import pandas
df = pandas.DataFrame({'times': np.array(['1994-07-25 15:15:00.000',
                                '1994-07-25 16:00:00.000', 
                                '1994-07-26 18:45:00.000', 
                                '1994-07-27 15:15:00.000', 
                                '1994-07-27 16:00:00.000', 
                                '1994-07-28 18:45:00.000', 
                                '1994-07-28 19:15:00.000', 
                                '1994-07-28 20:35:00.000', 
                                '1994-07-28 21:55:00.000', 
                                '1994-07-29 14:15:00.000', 
                                '1994-07-30 15:35:00.000', 
                                '1994-07-30 16:55:00.000'], dtype='datetime64'),
                       'data': [165,165,165,165,165,165,63,64,64,62,62,61]})
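df = df.set_index('times')
# Group rows by calendar date (the time of day is ignored).
g = df.groupby(df.index.date)
days = 3
# pandas.rolling_mean is no longer available in current pandas; use .rolling() instead.
g.sum().rolling(days).mean()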

Output:

1994-07-25         NaN
1994-07-26         NaN
1994-07-27  275.000000
1994-07-28  283.666667
1994-07-29  249.333333
1994-07-30  180.333333

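Depending on what you need, also look at the center and min_periods options of the rolling mean, which control the window alignment and how incomplete windows are handled.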
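Note that the rolling mean above averages the daily sums, so it weights each day equally regardless of how many rows it has. If the goal is the mean over all rows in the previous 3 calendar dates, as in the question's example for 1994-07-29, one possible sketch is below. The names `lookback`, `daily` and `lookback_avg` are illustrative, and `min_periods=1` (allowing partial windows at the start) is an assumption, not something the question specifies.

```python
import pandas as pd

times = pd.to_datetime([
    "1994-07-25 15:15", "1994-07-25 16:00", "1994-07-26 18:45",
    "1994-07-27 15:15", "1994-07-27 16:00", "1994-07-28 18:45",
    "1994-07-28 19:15", "1994-07-28 20:35", "1994-07-28 21:55",
    "1994-07-29 14:15", "1994-07-30 15:35", "1994-07-30 16:55",
])
df = pd.DataFrame(
    {"data": [165, 165, 165, 165, 165, 165, 63, 64, 64, 62, 62, 61]},
    index=times,
)

lookback = 3  # number of calendar days before the current date (assumed)

# Per-day sum and row count on a gap-free daily calendar (empty days give 0).
daily = df["data"].resample("D").agg(["sum", "count"])

# Rolling totals over `lookback` days, shifted by one day so that the
# window ends on the previous date and excludes the current one.
window = daily.rolling(lookback, min_periods=1).sum().shift(1)
daily_avg = window["sum"] / window["count"]

# Every row receives the value computed for its calendar date.
df["lookback_avg"] = daily_avg.reindex(df.index.normalize()).to_numpy()
print(df)
```

For the row at 1994-07-29 14:15 this gives 851 / 7, the average of the seven rows from 1994-07-26 to 1994-07-28, and both rows on 1994-07-30 share one value. The first date has no earlier data, so its result is NaN.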

If you use pandas, you can use resample:

import pandas as pd

Then read your data from the csv:

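# A multi-character sep is treated as a regex, so the pipe must be escaped.
df = pd.read_csv('yourfile.txt', sep=r'\s*\|\s*', engine='python', parse_dates=True, index_col=0)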

Then resample to daily frequency ('D'), taking the mean of each day:

df2 = df.resample('D').mean()

To get the last 3 days:

df2[-3:]

Output:

            data
1994-07-28  89.0
1994-07-29  62.0
1994-07-30  61.5

Here, yourfile.txt contains:

times | data
1994-07-25 15:15:00.000 | 165
1994-07-25 16:00:00.000 | 165
1994-07-26 18:45:00.000 | 165
1994-07-27 15:15:00.000 | 165
1994-07-27 16:00:00.000 | 165
1994-07-28 18:45:00.000 | 165
1994-07-28 19:15:00.000 | 63
1994-07-28 20:35:00.000 | 64
1994-07-28 21:55:00.000 | 64
1994-07-29 14:15:00.000 | 62
1994-07-30 15:35:00.000 | 62
1994-07-30 16:55:00.000 | 61
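
To turn the daily means into a moving average, one option is to chain a rolling window onto the resampled result. The sketch below reads the same text from an in-memory string (via `io.StringIO`, used here only to keep the example self-contained) and assumes a 3-day window with `min_periods=1`. Keep in mind that this averages the daily means, so each day counts equally no matter how many rows it has.

```python
import io
import pandas as pd

raw = """times | data
1994-07-25 15:15:00.000 | 165
1994-07-25 16:00:00.000 | 165
1994-07-26 18:45:00.000 | 165
1994-07-27 15:15:00.000 | 165
1994-07-27 16:00:00.000 | 165
1994-07-28 18:45:00.000 | 165
1994-07-28 19:15:00.000 | 63
1994-07-28 20:35:00.000 | 64
1994-07-28 21:55:00.000 | 64
1994-07-29 14:15:00.000 | 62
1994-07-30 15:35:00.000 | 62
1994-07-30 16:55:00.000 | 61
"""

# The separator is a regex: optional whitespace around a literal pipe.
df = pd.read_csv(io.StringIO(raw), sep=r"\s*\|\s*", engine="python",
                 parse_dates=True, index_col=0)

# One mean per calendar day.
daily_mean = df.resample("D").mean()

# 3-day moving average of the daily means, including the current day.
rolled = daily_mean.rolling(3, min_periods=1).mean()
print(rolled)
```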
Source: https://habr.com/ru/post/1613644/

