I have a dataframe with a date type column and a float type column.
                 date   value
0 2010-01-01 01:23:00    21.2
1 2010-01-02 01:33:00    63.4
2 2010-01-03 06:02:00    80.6
3 2010-01-04 06:05:00    50.1
4 2010-01-05 06:20:00   346.5
5 2010-01-06 07:44:00   111.8
6 2010-01-07 08:00:00   113.1
7 2010-01-08 08:22:00    10.6
8 2010-01-09 09:00:00   287.2
9 2010-01-10 09:14:00  1652.6
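For anyone who wants to reproduce this, the frame above can be rebuilt roughly as follows. This is only a sketch: it assumes the date column is parsed with pd.to_datetime and that the frame is named df, as in the code further down.

```python
import pandas as pd

# Sample frame copied from the table above; 'date' is parsed to datetime64.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2010-01-01 01:23:00", "2010-01-02 01:33:00", "2010-01-03 06:02:00",
        "2010-01-04 06:05:00", "2010-01-05 06:20:00", "2010-01-06 07:44:00",
        "2010-01-07 08:00:00", "2010-01-08 08:22:00", "2010-01-09 09:00:00",
        "2010-01-10 09:14:00",
    ]),
    "value": [21.2, 63.4, 80.6, 50.1, 346.5, 111.8, 113.1, 10.6, 287.2, 1652.6],
})

print(df.dtypes)
```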
I want to add a new column that holds, for each row, the mean of value over the one hour before that row's timestamp.
[UPDATE] Example:
If the current row is 4 2010-01-05 06:20:00 346.5, I need to calculate (50.1 + 80.6) / 2, i.e. the average of the values whose timestamps fall in the range 2010-01-05 05:20:00 ~ 2010-01-05 06:20:00.
                 date  value  before_1hr_mean
4 2010-01-05 06:20:00  346.5            65.35
I solved this with iterrows(), as in the code below. But this approach is very slow, and iterrows() is generally not recommended in pandas.
[UPDATE]
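import numpy as np
import pandas as pd

df['before_1hr_mean'] = np.nan
for index, row in df.iterrows():
    # rows whose date falls in [row date - 1 hour, row date), current row excluded
    in_window = (df['date'] < row['date']) & (df['date'] >= row['date'] - pd.Timedelta(hours=1))
    df.loc[index, 'before_1hr_mean'] = df.loc[in_window, 'value'].mean()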
Is there a better way to handle this situation?
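One direction that might work is a time-based rolling window. The sketch below is not a confirmed solution. It assumes date is a datetime64 column that can be sorted ascending, and it uses three illustrative rows: the values from the example above, placed on the same day as the example implies. With closed="left" the window should cover [date - 1 hour, date), which mirrors the >= and < comparisons in the loop.

```python
import numpy as np
import pandas as pd

# Illustrative rows only: the three values from the example, assumed to share one day.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2010-01-05 06:02:00",
        "2010-01-05 06:05:00",
        "2010-01-05 06:20:00",
    ]),
    "value": [80.6, 50.1, 346.5],
})

# Time-based rolling requires the 'date' column to be sorted ascending.
df = df.sort_values("date").reset_index(drop=True)

# closed="left" keeps the window start and drops the current row: [date - 1h, date).
df["before_1hr_mean"] = df.rolling("1h", on="date", closed="left")["value"].mean()

print(df)
```

Rows with nothing in the preceding hour come out as NaN, the same as .mean() on an empty selection in the loop version.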