Python pandas dataframe interpolate missing data

I have the dataset shown below. It only has values for the last day of each month, and I am trying to interpolate the rest. Is this the right way to do it?

    Date        Australia  China
    2011-01-01  NaN        NaN
    2011-01-02  NaN        NaN
    ...
    2011-01-31  4.75       5.81
    2011-02-01  NaN        NaN
    2011-02-02  NaN        NaN
    ...
    2011-02-28  4.75       5.81
    2011-03-01  NaN        NaN
    2011-03-02  NaN        NaN
    ...
    2011-03-31  4.75       6.06
    2011-04-01  NaN        NaN
    2011-04-02  NaN        NaN
    ...
    2011-04-30  4.75       6.06

To fill in the missing NaN values of this data frame by interpolation, I use the following code:

    import pandas as pd

    df = pd.read_csv("data.csv", index_col="Date")
    df.index = pd.DatetimeIndex(df.index)
    df.interpolate(method='linear', axis=0).ffill().bfill()

But I get the error "TypeError: Unable to interpolate with all NaN".

What could be wrong here, and how can I fix it?

Thanks.

2 answers

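This error usually means the columns are not numeric (for example, they have object dtype), so you can try converting the DataFrame to float with astype before interpolating: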

    import pandas as pd

    df = pd.read_csv("data.csv", index_col=['Date'], parse_dates=['Date'])
    print(df)

                Australia  China
    Date
    2011-01-31       4.75   5.81
    2011-02-28       4.75   5.81
    2011-03-31       4.75   6.06
    2011-04-30       4.75   6.06

    # reindex to a daily index covering the whole period
    df = df.reindex(pd.date_range("2011-01-01", "2011-10-31"), fill_value="NaN")
    # convert to float
    df = df.astype(float)
    df = df.interpolate(method='linear', axis=0).ffill().bfill()
    print(df)

                Australia  China
    2011-01-01       4.75   5.81
    2011-01-02       4.75   5.81
    2011-01-03       4.75   5.81
    2011-01-04       4.75   5.81
    2011-01-05       4.75   5.81
    2011-01-06       4.75   5.81
    2011-01-07       4.75   5.81
    2011-01-08       4.75   5.81
    2011-01-09       4.75   5.81
    2011-01-10       4.75   5.81
    2011-01-11       4.75   5.81
    2011-01-12       4.75   5.81
    2011-01-13       4.75   5.81
    2011-01-14       4.75   5.81
    2011-01-15       4.75   5.81
    2011-01-16       4.75   5.81
    2011-01-17       4.75   5.81
    2011-01-18       4.75   5.81
    2011-01-19       4.75   5.81
    2011-01-20       4.75   5.81
    2011-01-21       4.75   5.81
    2011-01-22       4.75   5.81
    2011-01-23       4.75   5.81
    2011-01-24       4.75   5.81
    2011-01-25       4.75   5.81
    2011-01-26       4.75   5.81
    2011-01-27       4.75   5.81
    2011-01-28       4.75   5.81
    2011-01-29       4.75   5.81
    2011-01-30       4.75   5.81
    ...               ...    ...
    2011-10-02       4.75   6.06
    2011-10-03       4.75   6.06
    2011-10-04       4.75   6.06
    2011-10-05       4.75   6.06
    2011-10-06       4.75   6.06
    2011-10-07       4.75   6.06
    2011-10-08       4.75   6.06
    2011-10-09       4.75   6.06
    2011-10-10       4.75   6.06
    2011-10-11       4.75   6.06
    2011-10-12       4.75   6.06
    2011-10-13       4.75   6.06
    2011-10-14       4.75   6.06
    2011-10-15       4.75   6.06
    2011-10-16       4.75   6.06
    2011-10-17       4.75   6.06
    2011-10-18       4.75   6.06
    2011-10-19       4.75   6.06
    2011-10-20       4.75   6.06
    2011-10-21       4.75   6.06
    2011-10-22       4.75   6.06
    2011-10-23       4.75   6.06
    2011-10-24       4.75   6.06
    2011-10-25       4.75   6.06
    2011-10-26       4.75   6.06
    2011-10-27       4.75   6.06
    2011-10-28       4.75   6.06
    2011-10-29       4.75   6.06
    2011-10-30       4.75   6.06
    2011-10-31       4.75   6.06

    [304 rows x 2 columns]
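To try the conversion without the CSV file, here is a minimal sketch. It assumes the values arrived as strings (which is only a guess about what read_csv produced for the asker); the frame `raw` and its date range are hypothetical stand-ins for data.csv, using two of the month-end values from the question.

```python
import pandas as pd

# Hypothetical stand-in for data.csv: every cell is a string, so the
# columns are not numeric (an assumption about the asker's file).
idx = pd.date_range("2011-02-27", "2011-04-01")
raw = pd.DataFrame({"Australia": "NaN", "China": "NaN"}, index=idx)
raw.loc[pd.Timestamp("2011-02-28")] = ["4.75", "5.81"]
raw.loc[pd.Timestamp("2011-03-31")] = ["4.75", "6.06"]

print(raw.dtypes)  # inspect the dtypes before interpolating

# Convert to float first, then interpolate and fill the edges.
fixed = raw.astype(float)
result = fixed.interpolate(method="linear", axis=0).ffill().bfill()

print(result.loc[pd.Timestamp("2011-03-15")])
```

Checking `df.dtypes` on the real data is a quick way to see whether this is the cause: if the columns are already float64, the conversion changes nothing.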

You can also omit ffill(): after interpolate(), the only NaN values left are in the first rows of the DataFrame, and bfill() takes care of those. So this line:

    df = df.interpolate(method='linear', axis=0).ffill().bfill()

becomes:

    df = df.interpolate(method='linear', axis=0).bfill()
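The reason bfill() is enough can be seen on a tiny example. This is a sketch with a hypothetical series `s` and pandas' default interpolate settings: gaps between known values are filled linearly, NaNs after the last known value take that value, and only the NaNs before the first known value remain.

```python
import numpy as np
import pandas as pd

# Hypothetical series with NaNs before, between, and after the known values.
s = pd.Series([np.nan, np.nan, 1.0, np.nan, 3.0, np.nan, np.nan])

step1 = s.interpolate(method="linear")
# Interior gap filled linearly, trailing NaNs carry the last valid value,
# leading NaNs are still missing.
print(step1.tolist())  # [nan, nan, 1.0, 2.0, 3.0, 3.0, 3.0]

step2 = step1.bfill()
# bfill() copies the first valid value backwards into the leading rows.
print(step2.tolist())  # [1.0, 1.0, 1.0, 2.0, 3.0, 3.0, 3.0]
```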

You can try dropping the NaN rows from the dataset before interpolating:

    import pandas as pd

    df = pd.read_csv("data.csv", index_col="Date")
    # drop the rows with missing values (this keeps only the rows that have data)
    df = df.dropna()
    df.index = pd.DatetimeIndex(df.index)
    df.interpolate(method='linear', axis=0).ffill().bfill()
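Note that after dropna() only the month-end rows are left, so the daily rows have to be re-created before there is anything to interpolate. One possible way to do that, sketched here with a hypothetical in-memory frame in place of data.csv, is asfreq("D"), which rebuilds a daily index between the first and last remaining dates.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: daily rows, values only at month end.
idx = pd.date_range("2011-02-26", "2011-04-02")
df = pd.DataFrame({"Australia": np.nan, "China": np.nan}, index=idx)
df.loc[pd.Timestamp("2011-02-28")] = [4.75, 5.81]
df.loc[pd.Timestamp("2011-03-31")] = [4.75, 6.06]

known = df.dropna()        # only the two month-end rows remain
daily = known.asfreq("D")  # daily rows between the first and last known date
filled = daily.interpolate(method="linear")

print(len(known), len(filled))
print(filled.head())
```

With this approach the days before the first and after the last observation are not restored; if they are needed, reindexing to the full date range and using bfill()/ffill() as in the other answer covers them.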

Source: https://habr.com/ru/post/1240230/

