By chance, I noticed that OLS models implemented with sklearn and statsmodels report different R² values when no intercept is fitted. Otherwise they agree. The following code:
import numpy as np
import sklearn
import statsmodels
import sklearn.linear_model as sl
import statsmodels.api as sm

np.random.seed(42)
N = 1000
X = np.random.normal(loc=1, size=(N, 1))
Y = 2 * X.flatten() + 4 + np.random.normal(size=N)

# sklearn fits, with and without an intercept
sklernIntercept = sl.LinearRegression(fit_intercept=True).fit(X, Y)
sklernNoIntercept = sl.LinearRegression(fit_intercept=False).fit(X, Y)

# statsmodels fits; the intercept must be added explicitly via add_constant
statsmodelsIntercept = sm.OLS(Y, sm.add_constant(X))
statsmodelsNoIntercept = sm.OLS(Y, X)

print(sklernIntercept.score(X, Y), statsmodelsIntercept.fit().rsquared)
print(sklernNoIntercept.score(X, Y), statsmodelsNoIntercept.fit().rsquared)
print(sklearn.__version__, statsmodels.__version__)
prints:
0.78741906105 0.78741906105
-0.950825182861 0.783154483028
0.19.1 0.8.0
Where does the difference come from?
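To narrow it down, here is a minimal check of one hypothesis (an assumption to verify, not a confirmed cause): that the two packages use different definitions of the total sum of squares when no intercept is present, with sklearn's score() always centering Y and statsmodels switching to the uncentered sum when the model has no constant. It reuses the variables from the snippet above:

# Assumption being tested: both no-intercept fits produce the same
# residuals, and only the total-sum-of-squares definition differs.
resid = Y - sklernNoIntercept.predict(X)
ss_res = np.sum(resid ** 2)

ss_tot_centered = np.sum((Y - Y.mean()) ** 2)  # centered TSS (hypothesized sklearn convention)
ss_tot_uncentered = np.sum(Y ** 2)             # uncentered TSS (hypothesized statsmodels convention without a constant)

print(1 - ss_res / ss_tot_centered)    # should match sklearn's -0.9508... if the hypothesis holds
print(1 - ss_res / ss_tot_uncentered)  # should match statsmodels' 0.7832... if the hypothesis holds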
This question is different from Linear Regression Coefficients with statsmodels and sklearn, since there sklearn.linear_model.LinearRegression (with intercept) is fitted on X prepared the same way as for statsmodels.api.OLS, i.e. with a constant column already added; a sketch of that setup follows.
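For clarity, here is a minimal sketch of that setup, assuming the linked question used sm.add_constant to prepare X. With the constant column present, both packages report the same R², so that question does not explain my case:

# Sketch of the linked question's setup (assumed): X carries an explicit
# constant column, so both fits model an intercept and their R² agree.
X_const = sm.add_constant(X)
lrConst = sl.LinearRegression(fit_intercept=True).fit(X_const, Y)
olsConst = sm.OLS(Y, X_const).fit()
print(lrConst.score(X_const, Y), olsConst.rsquared)  # the two values match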
This question is also different from Statsmodels: Calculate fitted values and R squared, since it addresses the difference between the two Python packages (statsmodels and scikit-learn), while that related question concerns statsmodels and the general definition of R². The same answer resolves both questions; however, whether sharing an answer means the questions should be closed as duplicates has been discussed here.