Getting the regression line to plot from a pandas regression

Question

I tried using (pandas) pd.ols and (statsmodels) sm.OLS to get a regression scatter plot with the regression line. I can get the scatter plot, but I can't seem to get the parameters needed to draw the regression line on the plot. It's probably obvious that I'm fumbling a bit with my coding here :-( (I am using this as a guide: http://nbviewer.ipython.org/github/weecology/progbio/blob/master/ipynbs/statistics.ipynb).

My data is in a pandas DataFrame; the x column is merged2[:-1].lastqu and the y column is merged2[:-1].Units. My code to get the regression currently looks like this:

    def fit_line2(x, y):
        X = sm.add_constant(x, prepend=True)  # Add a column of ones to allow the calculation of the intercept
        model = sm.OLS(y, X, missing='drop').fit()
        """Return slope, intercept of best fit line."""
        X = sm.add_constant(x)
        return model

    model = fit_line2(merged2[:-1].lastqu, merged2[:-1].Units)
    print model.summary()

^^^^ this seems to work fine

    intercept, slope = model.params  # << I don't think this is quite right
    plt.plot(merged2[:-1].lastqu, merged2[:-1].Units, 'bo')
    plt.hold(True)

^^^^^ this gives me a scatter plot, but the code below does not give me a regression line

    x = np.array([min(merged2[:-1].lastqu), max(merged2[:-1].lastqu)])
    y = intercept + slope * x
    plt.plot(x, y, 'r-')
    plt.show()

DataFrame snippet (the [:-1] removes the current period from the data, which will later be a projection):

                Units       lastqu   Uperchg  lqperchg         fcast  errpercent  nfcast
    date
    2000-12-31   7177          NaN       NaN       NaN           NaN         NaN     NaN
    2001-12-31  10694  2195.000000  0.490038       NaN  10658.719019    1.003310     NaN
    2002-12-31  11725  2469.000000

Edit:

I found that I could do:

    fig = plt.figure(figsize=(12,8))
    fig = sm.graphics.plot_regress_exog(model, "lastqu", fig=fig)

as described in the statsmodels documentation. That seems to give me most of what I wanted (and more), but I would still like to know where I went wrong in the earlier code!

python matplotlib pandas statsmodels

dartdog Jan 23 '14 at 19:27

1 answer

Josef · Accepted Answer · 2014-01-24T19:56:29+0000

Check what values you have in your arrays and variables.
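One way to run that check, as a minimal sketch: the small frame below is made up to mimic the layout in the question (a leading NaN in lastqu) and is not the asker's real data. It counts the missing values in x and prints the two endpoints that the plotting code would compute.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for merged2[:-1]; the values are illustrative only.
df = pd.DataFrame(
    {"Units": [7177.0, 10694.0, 11725.0], "lastqu": [np.nan, 2195.0, 2469.0]}
)

x = df["lastqu"]
n_missing = int(x.isna().sum())         # number of NaNs in x
endpoints = np.array([min(x), max(x)])  # what the plotting code computes
print(n_missing, endpoints)
```

If the endpoints come out as NaN, matplotlib has nothing to draw for the line, which would match the symptom in the question.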

My guess is that your x is just NaNs, because you are using Python's built-in min and max. At least that is what happens with the pandas version I have open right now.

The Series min and max methods should work, since they know how to handle NaN or missing values:

    >>> x = pd.Series([np.nan,2], index=['const','slope'])
    >>> x
    const   NaN
    slope     2
    dtype: float64
    >>> min(x)
    nan
    >>> max(x)
    nan
    >>> x.min()
    2.0
    >>> x.max()
    2.0
