Remove NaN values from data frame without fillna or Interpolate

Question

I have a dataset:

                367235  419895  992194
    1999-01-11       8       5       1
    1999-03-23     NaN       4     NaN
    1999-04-30     NaN     NaN       1
    1999-06-02     NaN       9     NaN
    1999-08-08       2     NaN     NaN
    1999-08-12     NaN       3     NaN
    1999-08-17     NaN     NaN      10
    1999-10-22     NaN       3     NaN
    1999-12-04     NaN     NaN       4
    2000-03-04       2     NaN     NaN
    2000-09-29       9     NaN     NaN
    2000-09-30       9     NaN     NaN
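For anyone who wants to reproduce this, here is a minimal sketch of how the frame could be rebuilt. It assumes the dates are the index and the three numeric headers are column labels; the `date` header and the single-space layout are my own additions for parsing, not part of the original data.

```python
import io

import pandas as pd

# Data copied from the question; the "date" header is an assumption added
# so that the first column can be named and used as the index.
raw = """date 367235 419895 992194
1999-01-11 8 5 1
1999-03-23 NaN 4 NaN
1999-04-30 NaN NaN 1
1999-06-02 NaN 9 NaN
1999-08-08 2 NaN NaN
1999-08-12 NaN 3 NaN
1999-08-17 NaN NaN 10
1999-10-22 NaN 3 NaN
1999-12-04 NaN NaN 4
2000-03-04 2 NaN NaN
2000-09-29 9 NaN NaN
2000-09-30 9 NaN NaN
"""

# Split on whitespace, parse the first column as dates, read "NaN" as missing
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", index_col="date", parse_dates=True)

# Number of real (non-missing) observations per column
print(df.count())
```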

When I plot it using plt.plot(df, '-o'), I get the following:

But I would like the data from each column to be connected in a row like this:

I understand that matplotlib does not connect data points that are separated by NaN values. I have considered all the options here for handling missing data, but all of them would substantially distort the data in the data frame. This is because each value in the data frame represents an incident; if I replace NaN with scalar values or interpolate, I get a bunch of points that are not actually in my dataset. Here's what the interpolation looks like:

df_wanted2 = df.apply(pd.Series.interpolate)

If I use dropna, I lose entire rows or columns from the data frame, and these rows hold valuable data.

Does anyone know a way to connect my points? I suspect that I need to extract individual arrays from the data frame and plot them as indicated here, but this seems like a lot of work (and my actual data frame is much bigger). Does anyone have a solution?

+6

python matplotlib pandas plot

oymonk Dec 20 '16 at 22:49

3 answers

Try iterating over the columns with apply, and drop the missing values inside the applied function:

    def make_plot(s):
        # Drop this column's NaN values so the remaining points get connected
        s.dropna().plot()

    df.apply(make_plot)
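The same idea can be written against matplotlib directly. This is only a rough sketch: it uses the first five rows of the question's data, picks the headless Agg backend so it runs without a display, and the variable names are my own.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# First five rows of the data from the question
idx = pd.to_datetime(
    ["1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02", "1999-08-08"]
)
df = pd.DataFrame(
    {
        "367235": [8, np.nan, np.nan, np.nan, 2],
        "419895": [5, 4, np.nan, 9, np.nan],
        "992194": [1, np.nan, 1, np.nan, np.nan],
    },
    index=idx,
)

fig, ax = plt.subplots()
for name, col in df.items():
    points = col.dropna()  # keep only the real observations of this column
    ax.plot(points.index, points.values, "-o", label=name)
ax.legend()
```

Because each column is plotted on its own, no row of the frame is discarded and no value is invented; every line simply skips the dates on which that column has nothing recorded.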

+4

Ted Petrou Dec 20 '16 at 10:55

An alternative is to hand the NaN handling to the Plotly graphing library by setting connectgaps=True on each trace.

    import plotly
    import pandas as pd

    txt = """367235 419895 992194
    1999-01-11 8 5 1
    1999-03-23 NaN 4 NaN
    1999-04-30 NaN NaN 1
    1999-06-02 NaN 9 NaN
    1999-08-08 2 NaN NaN
    1999-08-12 NaN 3 NaN
    1999-08-17 NaN NaN 10
    1999-10-22 NaN 3 NaN
    1999-12-04 NaN NaN 4
    2000-03-04 2 NaN NaN
    2000-09-29 9 NaN NaN
    2000-09-30 9 NaN NaN"""

    # Skip the header row; each remaining line is "date value value value"
    data_points = [line.split() for line in txt.splitlines()[1:]]
    df = pd.DataFrame(data_points)

    data = []
    for i in range(1, len(df.columns)):
        data.append(plotly.graph_objs.Scatter(
            x=df.iloc[:, 0].tolist(),
            y=df.iloc[:, i].tolist(),
            mode='lines',
            connectgaps=True,  # draw the line across missing values
        ))

    fig = dict(data=data)
    # Note: in Plotly 4 and later, plotly.plotly moved to the chart_studio package
    plotly.plotly.sign_in('user', 'token')
    plot_url = plotly.plotly.plot(fig)

+3

Maximilian Peters Dec 20 '16 at 23:23

piRSquared · Accepted Answer · 2016-12-20T22:55:27+0000

Use the interpolate method with the 'index' method, which weights the interpolation by the index values (here, the dates):

 df.interpolate('index').plot(marker='o')
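To see what the 'index' method changes, here is a minimal sketch on a made-up three-point series (the dates and values are illustrative, not taken from the question). With the default method the gap is filled as if the rows were evenly spaced; with 'index' the fill is weighted by the actual time between rows, so the filled values sit on the straight line between the real points when plotted against the dates.

```python
import numpy as np
import pandas as pd

# Hypothetical series: the gap is one day after the first point
# and nine days before the last one.
idx = pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-11"])
s = pd.Series([0.0, np.nan, 10.0], index=idx)

by_position = s.interpolate()             # default 'linear': ignores the dates
by_index = s.interpolate(method="index")  # weights by distance along the index

print(by_position.iloc[1])  # halfway between 0 and 10
print(by_index.iloc[1])     # about one tenth of the way from 0 to 10
```

Note that .plot(marker='o') on the interpolated frame also draws a marker at every filled value. If markers should appear only at real incidents, one option is to draw the line from the interpolated data and the markers from the original columns.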

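Alternative answer

Iterate over the columns and plot each one after dropping its NaN values (items is used here because iteritems was removed in pandas 2.0):

    for _, c in df.items():
        c.dropna().plot(marker='o')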

Bonus

Only interpolate from the first valid index to the last valid index of each column:

    for _, c in df.items():
        fi, li = c.first_valid_index(), c.last_valid_index()
        c.loc[fi:li].interpolate('index').plot(marker='o')
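A possible loop-free variant of the same restriction, sketched under the assumption that the installed pandas supports the limit_area argument of interpolate (added in 0.23, to my knowledge). The small frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "a" has missing values at both ends
idx = pd.to_datetime(
    ["2000-01-01", "2000-01-02", "2000-01-04", "2000-01-07", "2000-01-08"]
)
df = pd.DataFrame(
    {
        "a": [np.nan, 1.0, np.nan, 4.0, np.nan],
        "b": [2.0, np.nan, np.nan, 8.0, 9.0],
    },
    index=idx,
)

# limit_area="inside" fills only NaNs that lie between two valid values,
# so leading and trailing NaNs of each column are left untouched.
inside = df.interpolate(method="index", limit_area="inside")
print(inside)

# inside.plot(marker='o') would then draw each column only over its own span
```

Without limit_area, interpolate carries the last valid value forward past the end of a column by default, which would draw a flat tail that is not in the data.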
