Python pandas dataframe interpolate missing data

Question

Python pandas dataframe interpolate missing data

I have a dataset as shown below. We only have data for the last day of the month. I'm trying to interpolate the rest of it, is it right to do this?

Date Australia China 2011-01-01 NaN NaN 2011-01-02 NaN NaN - - - - - - 2011-01-31 4.75 5.81 2011-02-01 NaN NaN 2011-02-02 NaN NaN - - - - - - 2011-02-28 4.75 5.81 2011-03-01 NaN NaN 2011-03-02 NaN NaN - - - - - - 2011-03-31 4.75 6.06 2011-04-01 NaN NaN 2011-04-02 NaN NaN - - - - - - 2011-04-30 4.75 6.06

To interpolate this data frame to find the missing NaN values, I use the following code

 import pandas as pd df = pd.read_csv("data.csv", index_col="Date") df.index = pd.DatetimeIndex(df.index) df.interpolate(method='linear', axis=0).ffill().bfill()

But I get the error "TypeError: Unable to interpolate with all NaN".

What could be wrong here, how can I fix it?

Thanks.

+5

python pandas interpolation

Unnikrishnan Jan 9 '16 at 11:48

source share

2 answers

You can try dropping the NaN from the dataset before interpolating.

 import pandas as pd df = pd.read_csv("data.csv", index_col="Date") df = df.dropna() df.index = pd.DatetimeIndex(df.index) df.interpolate(method='linear', axis=0).ffill().bfill()

+1

station Jan 9 '16 at 11:54

source share

jezrael · Accepted Answer · 2016-01-09T14:39:44+0000

You can try converting dataframe to float with astype :

 import pandas as pd df = pd.read_csv("data.csv", index_col=['Date'], parse_dates=['Date']) print df Australia China Date 2011-01-31 4.75 5.81 2011-02-28 4.75 5.81 2011-03-31 4.75 6.06 2011-04-30 4.75 6.06 df = df.reindex(pd.date_range("2011-01-01", "2011-10-31"), fill_value="NaN") #convert to float df = df.astype(float) df = df.interpolate(method='linear', axis=0).ffill().bfill()

 print df Australia China 2011-01-01 4.75 5.81 2011-01-02 4.75 5.81 2011-01-03 4.75 5.81 2011-01-04 4.75 5.81 2011-01-05 4.75 5.81 2011-01-06 4.75 5.81 2011-01-07 4.75 5.81 2011-01-08 4.75 5.81 2011-01-09 4.75 5.81 2011-01-10 4.75 5.81 2011-01-11 4.75 5.81 2011-01-12 4.75 5.81 2011-01-13 4.75 5.81 2011-01-14 4.75 5.81 2011-01-15 4.75 5.81 2011-01-16 4.75 5.81 2011-01-17 4.75 5.81 2011-01-18 4.75 5.81 2011-01-19 4.75 5.81 2011-01-20 4.75 5.81 2011-01-21 4.75 5.81 2011-01-22 4.75 5.81 2011-01-23 4.75 5.81 2011-01-24 4.75 5.81 2011-01-25 4.75 5.81 2011-01-26 4.75 5.81 2011-01-27 4.75 5.81 2011-01-28 4.75 5.81 2011-01-29 4.75 5.81 2011-01-30 4.75 5.81 ... ... ... 2011-10-02 4.75 6.06 2011-10-03 4.75 6.06 2011-10-04 4.75 6.06 2011-10-05 4.75 6.06 2011-10-06 4.75 6.06 2011-10-07 4.75 6.06 2011-10-08 4.75 6.06 2011-10-09 4.75 6.06 2011-10-10 4.75 6.06 2011-10-11 4.75 6.06 2011-10-12 4.75 6.06 2011-10-13 4.75 6.06 2011-10-14 4.75 6.06 2011-10-15 4.75 6.06 2011-10-16 4.75 6.06 2011-10-17 4.75 6.06 2011-10-18 4.75 6.06 2011-10-19 4.75 6.06 2011-10-20 4.75 6.06 2011-10-21 4.75 6.06 2011-10-22 4.75 6.06 2011-10-23 4.75 6.06 2011-10-24 4.75 6.06 2011-10-25 4.75 6.06 2011-10-26 4.75 6.06 2011-10-27 4.75 6.06 2011-10-28 4.75 6.06 2011-10-29 4.75 6.06 2011-10-30 4.75 6.06 2011-10-31 4.75 6.06 [304 rows x 2 columns]

And you can omit ffill() because ffill() are only in the first lines of the dataframe :

 df = df.interpolate(method='linear', axis=0).ffill().bfill()

in

 df = df.interpolate(method='linear', axis=0).bfill()

Python pandas dataframe interpolate missing data

More articles: