Why does R-Squared decrease when I add an exogenous variable to OLS using python statsmodels

If I understand the OLS model correctly, this should never be the case.

    trades['const'] = 1
    Y = trades['ret'] + trades['comms']
    # X = trades[['potential', 'pVal', 'startVal', 'const']]
    X = trades[['potential', 'pVal', 'startVal']]
    from statsmodels.regression.linear_model import OLS
    ols = OLS(Y, X)
    res = ols.fit()
    res.summary()

If I include const, I get an R-squared of 0.22, and without it I get 0.43. How is this possible?

1 answer

See the answer here: Statsmodels: Calculate fitted values and R squared

R-squared follows a different definition depending on whether or not the model includes a constant.

R-squared in a linear model with a constant uses the standard definition, which takes a mean-only model as the reference: the total sum of squares is demeaned.

R-squared in a linear model without a constant compares against a model that has no regressors at all, i.e. one in which the effect of the constant is zero. In this case the R-squared calculation uses the total sum of squares without demeaning.
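Written out (my notation, not part of the original answer), with $SSR$ the residual sum of squares:

$$
R^2_{\text{with const}} = 1 - \frac{SSR}{\sum_i (y_i - \bar{y})^2},
\qquad
R^2_{\text{no const}} = 1 - \frac{SSR}{\sum_i y_i^2}
$$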

Since the definition changes when we add or drop a constant, R-squared can go either way. The explained sum of squares itself will always increase when we add explanatory variables, or stay unchanged if the new variable contributes nothing.
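A minimal sketch reproducing both definitions (synthetic data, since the trades DataFrame from the question is not available here; rsquared and ssr are real OLSResults attributes):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 3))
    y = 5.0 + x @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=200)

    # With a constant: the reference is a mean-only model, so the
    # total sum of squares is demeaned (centered).
    res_c = sm.OLS(y, sm.add_constant(x)).fit()
    print(res_c.rsquared)
    print(1 - res_c.ssr / ((y - y.mean()) ** 2).sum())  # matches rsquared

    # Without a constant: the reference is a model that predicts zero,
    # so the total sum of squares is NOT demeaned (uncentered).
    res_n = sm.OLS(y, x).fit()
    print(res_n.rsquared)
    print(1 - res_n.ssr / (y ** 2).sum())               # matches rsquared

Since the question's X has no constant column, statsmodels falls back to the uncentered definition there, so the 0.43 and 0.22 are computed against different baselines and are not directly comparable.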


Source: https://habr.com/ru/post/1275407/
