Remove NaN values from data frame without fillna or Interpolate

I have a dataset:

                367235  419895  992194
    1999-01-11       8       5       1
    1999-03-23     NaN       4     NaN
    1999-04-30     NaN     NaN       1
    1999-06-02     NaN       9     NaN
    1999-08-08       2     NaN     NaN
    1999-08-12     NaN       3     NaN
    1999-08-17     NaN     NaN      10
    1999-10-22     NaN       3     NaN
    1999-12-04     NaN     NaN       4
    2000-03-04       2     NaN     NaN
    2000-09-29       9     NaN     NaN
    2000-09-30       9     NaN     NaN

When I plot it with plt.plot(df, '-o'), I get the following:

(image: data plotting)

But I would like the points in each column to be connected by a line, like this:

(image: desired output from the data frame graph)

I understand that matplotlib does not connect data points that are separated by NaN values. I have considered all the options here for handling missing data, but all of them would substantially distort the data in the data frame. This is because each value in the data frame represents an incident; if I replace NaN with scalar values or interpolate, I get a bunch of points that are not actually in my dataset. Here's what the interpolation looks like:

df_wanted2 = df.apply(pd.Series.interpolate)


If I try to use dropna, I will lose entire rows or columns from the data frame, and these rows hold valuable data.

Does anyone know a way to connect my points? I suspect that I need to extract individual arrays from the data frame and plot them as indicated here, but this seems like a lot of work (and my actual data frame is much bigger). Does anyone have a solution?

3 answers

Use the interpolate method with method='index', which spaces the interpolated values according to the index (here, the dates):

 df.interpolate('index').plot(marker='o') 

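
As a rough illustration of what method='index' does, here is a minimal sketch on the first column of the question's data, cut down to its first five dates. The variable names (s, filled, synthetic) are only for this example. It also shows why the asker objects to interpolation: the filled rows are values that never occurred.

```python
import pandas as pd

nan = float("nan")

# First column (367235) of the question's data, first five dates only.
s = pd.Series(
    [8, nan, nan, nan, 2],
    index=pd.to_datetime(
        ["1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02", "1999-08-08"]
    ),
)

# method="index" interpolates in proportion to the time between dates,
# rather than treating the rows as equally spaced.
filled = s.interpolate(method="index")

# Rows that were NaN in the original now hold synthetic values.
synthetic = filled[s.isna()]
print(synthetic)
```

If those in-between values must not appear as markers, the dropna-based approaches further down avoid creating them at all.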

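Alternative answer

Loop over the columns and plot each one after dropping its NaN values (iteritems was removed in pandas 2.0; items is the current name):

    for _, c in df.items():
        c.dropna().plot(marker='o')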

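
For anyone who wants to try this end to end, below is a self-contained sketch that rebuilds the question's data frame and draws every column on shared axes. The "date" header label, the Agg backend, and the variable names are my own assumptions for the example, not part of the original post.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The question's data; the "date" header label is added for parsing only.
raw = """date 367235 419895 992194
1999-01-11 8 5 1
1999-03-23 NaN 4 NaN
1999-04-30 NaN NaN 1
1999-06-02 NaN 9 NaN
1999-08-08 2 NaN NaN
1999-08-12 NaN 3 NaN
1999-08-17 NaN NaN 10
1999-10-22 NaN 3 NaN
1999-12-04 NaN NaN 4
2000-03-04 2 NaN NaN
2000-09-29 9 NaN NaN
2000-09-30 9 NaN NaN
"""
df = pd.read_csv(io.StringIO(raw), sep=" ", index_col="date", parse_dates=True)

fig, ax = plt.subplots()
for name, col in df.items():
    # Dropping NaN per column keeps every real point and adds none.
    col.dropna().plot(ax=ax, marker="o", label=name)
ax.legend()
```

Because dropna is applied to one column at a time, no row is lost from the data frame itself; each line simply skips the dates where that column has no incident.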


Extra credit
Only interpolate from the first valid index to the last valid index for each column (again using items in place of the removed iteritems):

    for _, c in df.items():
        fi, li = c.first_valid_index(), c.last_valid_index()
        c.loc[fi:li].interpolate('index').plot(marker='o')

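
A possible shortcut, not from the original answer: interpolate also accepts limit_area='inside', which should fill only the NaN values that lie between valid values, so the loop over first and last valid index may not be needed. A sketch under that assumption, with the question's data rebuilt so it runs on its own (the "date" header label is made up for parsing):

```python
import io

import pandas as pd

# The question's data; the "date" header label is added for parsing only.
raw = """date 367235 419895 992194
1999-01-11 8 5 1
1999-03-23 NaN 4 NaN
1999-04-30 NaN NaN 1
1999-06-02 NaN 9 NaN
1999-08-08 2 NaN NaN
1999-08-12 NaN 3 NaN
1999-08-17 NaN NaN 10
1999-10-22 NaN 3 NaN
1999-12-04 NaN NaN 4
2000-03-04 2 NaN NaN
2000-09-29 9 NaN NaN
2000-09-30 9 NaN NaN
"""
df = pd.read_csv(io.StringIO(raw), sep=" ", index_col="date", parse_dates=True)

# Fill gaps between valid values only; NaN after a column's last
# valid value stays NaN instead of being extended.
inside = df.interpolate(method="index", limit_area="inside")
print(inside)
```

Calling inside.plot(marker='o') would then draw all columns at once. Note that this still puts markers on interpolated points, the same as the loop version above.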


Try iterating with apply, then drop the missing values inside the applied function:

    def make_plot(s):
        # Plot one column with its NaN values removed.
        s.dropna().plot()

    df.apply(make_plot)
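
The same idea also works on plain arrays, which is roughly the "extract individual arrays" route the question mentions. A minimal sketch using the third column (992194) of the question's data and a boolean mask; the variable names and the Agg backend are assumptions for the example:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

nan = np.nan

# Dates and third column (992194) from the question's data.
dates = np.array(
    ["1999-01-11", "1999-03-23", "1999-04-30", "1999-06-02",
     "1999-08-08", "1999-08-12", "1999-08-17", "1999-10-22",
     "1999-12-04", "2000-03-04", "2000-09-29", "2000-09-30"],
    dtype="datetime64[D]",
)
values = np.array([1, nan, 1, nan, nan, nan, 10, nan, 4, nan, nan, nan])

# Keep only the positions that hold a real number.
mask = np.isfinite(values)

fig, ax = plt.subplots()
(line,) = ax.plot(dates[mask], values[mask], "-o")
```

With NaN masked out before plotting, matplotlib sees an unbroken sequence and joins every remaining point.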

An alternative would be to hand the NaN handling over to the Plotly graphing library, using the connectgaps option of its scatter traces.

    import plotly
    import pandas as pd

    txt = """367235 419895 992194
    1999-01-11 8 5 1
    1999-03-23 NaN 4 NaN
    1999-04-30 NaN NaN 1
    1999-06-02 NaN 9 NaN
    1999-08-08 2 NaN NaN
    1999-08-12 NaN 3 NaN
    1999-08-17 NaN NaN 10
    1999-10-22 NaN 3 NaN
    1999-12-04 NaN NaN 4
    2000-03-04 2 NaN NaN
    2000-09-29 9 NaN NaN
    2000-09-30 9 NaN NaN"""

    # Skip the header row; every other line is "date v1 v2 v3".
    data_points = [line.split() for line in txt.splitlines()[1:]]
    df = pd.DataFrame(data_points)

    data = []
    for i in range(1, len(df.columns)):
        data.append(plotly.graph_objs.Scatter(
            x=df.iloc[:, 0].tolist(),
            # Convert the text to floats so 'NaN' becomes a real gap.
            y=pd.to_numeric(df.iloc[:, i], errors='coerce').tolist(),
            mode='lines',  # 'line' is not a valid mode
            connectgaps=True,
        ))

    fig = dict(data=data)
    # In plotly 4 and later, plotly.plotly lives in the separate
    # chart_studio package (chart_studio.plotly).
    plotly.plotly.sign_in('user', 'token')
    plot_url = plotly.plotly.plot(fig)



Source: https://habr.com/ru/post/1013366/

