Fill in missing values with the nearest row values within a 5-minute range

Using this code:

import numpy as np 
import pandas as pd
df1 = pd.read_csv('wind.txt', header=0, delim_whitespace=True)

The dataframe looks something like this:

Date               Vel Dir
2016-07-12 16:15:00 2.8  1.8
2016-07-12 16:16:00 3.9  21.8
2016-07-12 16:17:00 9.8  4.8
2016-07-12 16:18:00 16.9 5.8
2016-07-12 16:19:00 17.0 7.1
2016-07-12 16:20:00 NaN  NaN
2016-07-12 16:21:00 2.8  1.8
2016-07-12 16:22:00 3.9  21.8
...                 ...  ...
...                 ...  ...
2017-01-01 00:00:00 21.2  19.7

Sometimes there is a lot of missing data in the dataframe, for example here:

Date               Vel   Dir
2016-07-12 17:56:00 2.8  1.8
2016-07-12 17:57:00 NaN  NaN
2016-07-12 17:58:00 9.8  4.8
2016-07-12 17:59:00 NaN  NaN
2016-07-12 18:00:00 NaN  NaN
2016-07-12 18:01:00 NaN  NaN
2016-07-12 18:02:00 2.8  1.8
2016-07-12 18:03:00 NaN  NaN
...                 ...  ...
...                 ...  ...
2017-01-01 00:00:00 21.2  19.7

The first goal was to create a new dataframe that uses a 3-hour time step instead of 1 minute. Using this code:

df2 = pd.DataFrame({'Date':pd.date_range(start='2016-07-12 18:00:00',end='2017-01-01 00:00:00',freq='3H')})

So far, so good. This generates a dataframe with only the Date column, without Vel and Dir, like this:

Date               
2016-07-12 18:00:00
2016-07-12 21:00:00
2016-07-13 00:00:00
2016-07-13 03:00:00
...        ...
...        ...
2017-01-01 00:00:00

Now the challenge is to populate df2 with the Vel and Dir values from df1, matched on Date, even though some data is missing. Knowing this, I tried merge_asof in this code:

df3 = pd.merge_asof(df2,df1, on='Date', tolerance=pd.Timedelta("5 minutes")).fillna('NaN')

It worked, but it only fills in the missing data from the preceding row. The goal is to use the values in the rows both before and after to fill in the missing data. For example, given this dataframe:

Date               Vel   Dir
2016-07-12 17:56:00 2.8  1.8
2016-07-12 17:57:00 NaN  NaN
2016-07-12 17:58:00 9.8  4.8
2016-07-12 17:59:00 NaN  NaN
2016-07-12 18:00:00 NaN  NaN
2016-07-12 18:01:00 NaN  NaN
2016-07-12 18:02:00 2.8  1.8
2016-07-12 18:03:00 NaN  NaN
...                 ...  ...
...                 ...  ...
2017-01-01 00:00:00 21.2  19.7

the expected row in df3 would be:

2016-07-12 18:00:00 9.8  4.8

And for this dataframe:

Date               Vel   Dir
2016-07-12 17:56:00 NaN  NaN
2016-07-12 17:57:00 NaN  NaN
2016-07-12 17:58:00 NaN  NaN
2016-07-12 17:59:00 NaN  NaN
2016-07-12 18:00:00 NaN  NaN
2016-07-12 18:01:00 NaN  NaN
2016-07-12 18:02:00 2.8  1.8
2016-07-12 18:03:00 NaN  NaN
...                 ...  ...
...                 ...  ...
2017-01-01 00:00:00 21.2  19.7

the expected row would be:

2016-07-12 18:00:00 2.8  1.8

If there is no value within 5 minutes before or after, Vel and Dir should remain NaN.


In Pandas 0.20.1, pd.merge_asof supports direction='nearest':

df3 = pd.merge_asof(df2,df1, on='Date', tolerance=pd.Timedelta("5 minutes"), direction='nearest').fillna('NaN')
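
One thing to watch for: merge_asof matches on the Date key, not on whether Vel and Dir hold a value, so a row in df1 that exists at exactly 18:00 but contains NaN would still be the "nearest" match. A minimal sketch of one way around this, assuming df1 keeps Date as a regular column and that dropping the empty rows beforehand is acceptable (the sample data and the name `valid` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Minute data modelled on the second example in the question:
# everything before 18:02 is missing.
df1 = pd.DataFrame({
    'Date': pd.date_range('2016-07-12 17:56:00', periods=8, freq='min'),
    'Vel': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 2.8, np.nan],
    'Dir': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1.8, np.nan],
})

# Target timestamps every 3 hours; 21:00 has no observation nearby.
df2 = pd.DataFrame({
    'Date': pd.date_range('2016-07-12 18:00:00', periods=2, freq='3h')
})

# Assumption: rows without measurements can be discarded before matching,
# otherwise 18:00 would match its own (empty) row at distance zero.
valid = df1.dropna(subset=['Vel', 'Dir'])

# Both frames must be sorted by Date, which they already are here.
df3 = pd.merge_asof(df2, valid, on='Date',
                    tolerance=pd.Timedelta('5 minutes'),
                    direction='nearest')
print(df3)
```

With this sample, the 18:00 row picks up 2.8 and 1.8 from 18:02, while 21:00 stays NaN because no measurement lies within 5 minutes of it.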

Source: https://habr.com/ru/post/1677473/

