I planned to use sklearn linear_model to plot the linear regression result and statsmodels.api to get a detailed summary of the training result. However, the two packages give very different results on the same input.
For example, the constant term from sklearn is 7.8e-14, but the constant term from statsmodels is 48.6. (I added a column of ones to x to serve as the constant for both methods.) My code for both methods is succinct:
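    import statsmodels.api as sm

    # Use statsmodels linear regression to get a result (summary) for the model.
    def reg_statsmodels(y, x):
        # x is expected to already contain the column of ones for the constant.
        results = sm.OLS(y, x).fit()
        return results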
The input is too complicated to post here. Is it possible that a singular input x caused this problem?
Having made the 3rd graph using PCA, it seems that the sklearn result is not a good approximation. What could explain this? I still want to make the visualization, so it would be very useful to fix the problems in my use of sklearn linear regression.