Different linear regression coefficients using statsmodels and sklearn

I planned to use sklearn linear_model to plot the linear regression result and statsmodels.api to get a detailed summary of the training result. However, the two packages give very different results on the same input.

For example, the intercept from sklearn is 7.8e-14, but the intercept from statsmodels is 48.6. (I added a column of 1s to x for the constant term when using both methods.) My code for both methods is succinct:

import statsmodels.api as sm
from sklearn import linear_model

# Use statsmodels linear regression to get a result (summary) for the model.
def reg_statsmodels(y, x):
    results = sm.OLS(y, x).fit()
    return results

# Use sklearn linear regression to compute the coefficients for the prediction.
def reg_sklearn(y, x):
    lr = linear_model.LinearRegression()
    lr.fit(x, y)
    return lr.coef_
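A minimal, self-contained way to reproduce the setup (the data below is a synthetic stand-in, chosen only for illustration):

import numpy as np
import statsmodels.api as sm
from sklearn import linear_model

# Synthetic stand-in for the real input.
rng = np.random.RandomState(0)
x = sm.add_constant(rng.rand(100, 3))              # prepend a column of 1s
y = x @ np.array([48.6, 1.5, -2.0, 0.5]) + rng.randn(100) * 0.1

print(reg_statsmodels(y, x).params)  # statsmodels: ~48.6 in the first slot
print(reg_sklearn(y, x))             # sklearn: coefficient of the 1s column ~0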

The input is too complicated to post here. Is it possible that a (nearly) singular input x caused this problem?

After making a 3D plot using PCA, it seems that the sklearn result is not a good approximation. What could explain this? I still want to make the visualization, so it would be very useful to fix whatever is wrong in my sklearn linear regression.

1 answer

You say that

 I added a column of 1s to x for the constant term when using both methods 

But the LinearRegression documentation says that

 LinearRegression(fit_intercept=True, [...]) 

which means it fits an intercept by default. This may explain the difference you see in the constant term.
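To test this, assuming x still carries the explicit column of 1s, you can turn sklearn's own intercept off so that only one constant term is estimated; a minimal sketch with made-up data:

import numpy as np
import statsmodels.api as sm
from sklearn import linear_model

rng = np.random.RandomState(0)
x = sm.add_constant(rng.rand(50, 2))             # design matrix with a 1s column
y = x @ np.array([48.6, 1.5, -2.0]) + rng.randn(50) * 0.1

# Disable sklearn's built-in intercept; the 1s column is now the only constant.
lr = linear_model.LinearRegression(fit_intercept=False).fit(x, y)
print(lr.coef_)                   # should now agree with...
print(sm.OLS(y, x).fit().params)  # ...the statsmodels estimates

Alternatively, drop the 1s column, keep fit_intercept=True, and read the constant from lr.intercept_ instead.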

As for the other coefficients, differences may arise when two of your variables are strongly correlated. Consider the most extreme case, where two of your columns are identical: a decrease in the coefficient of one can then be compensated by an increase in the other, so the individual values are not uniquely determined. This is the first thing I would check.
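Here is a small sketch of that extreme case, with synthetic data, where the design matrix is rank-deficient and the split between the two coefficients is arbitrary:

import numpy as np
from sklearn import linear_model

rng = np.random.RandomState(0)
a = rng.rand(50)
x = np.column_stack([a, a])          # two identical columns: perfectly collinear
y = 3.0 * a + rng.randn(50) * 0.01

lr = linear_model.LinearRegression().fit(x, y)
print(lr.coef_)  # only the sum of the two coefficients is pinned down (~3 here);
                 # different solvers can return different individual values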

