Why does lm return values when there is no variance in the response?

Consider the following R code (which, I think, ultimately calls some Fortran):

 X <- 1:1000
 Y <- rep(1, 1000)
 summary(lm(Y ~ X))

Why does summary return values? Shouldn't this model fail to fit, since there is no variance in Y? More importantly, why is the model's R^2 ≈ 0.5?

Edit

I traced the code from lm to lm.fit and found this call:

 z <- .Fortran("dqrls",
               qr = x, n = n, p = p, y = y, ny = ny,
               tol = as.double(tol),
               coefficients = mat.or.vec(p, ny),
               residuals = y, effects = y,
               rank = integer(1L), pivot = 1L:p,
               qraux = double(p), work = double(2 * p),
               PACKAGE = "base")

This is where the actual fitting happens. Looking at http://svn.r-project.org/R/trunk/src/appl/dqrls.f did not help me understand what was going on, because I do not know Fortran.

3 answers

Statistically speaking, what should we expect (I want to say "expect", but that is a loaded term here ;-) )? The coefficients should be an intercept of 1 and a slope of 0, not a failure to fit. The slope estimate is Cov(X, Y) divided by Var(X), not the other way around. Since X has non-zero variance, there is no problem. Since the covariance is 0, the estimated coefficient for X should be 0. So, within machine tolerance, this is exactly the answer you get.
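To make this concrete, here is a minimal sketch reproducing the setup from the question (same X and Y); it shows the fitted intercept is 1 and the slope is 0 up to machine tolerance:

```r
# Constant response: the expected fit is intercept = 1, slope = 0
X <- 1:1000
Y <- rep(1, 1000)

fit <- lm(Y ~ X)
coefs <- coef(fit)

# Intercept is 1 (to within rounding); the slope is a tiny
# floating-point residue of the QR decomposition, not a real effect
print(coefs)
</imports>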

There is no statistical anomaly here; there may be a statistical misunderstanding. There is also machine tolerance at play, but a coefficient on the order of 1e-19 is negligible given the scale of the predictor and response values.

Update 1: A quick review of simple linear regression can be found on this Wikipedia page. The key thing to note is that Var(x) is in the denominator and Cov(x,y) is in the numerator. Here the numerator is 0 and the denominator is non-zero, so there is no reason to expect NaN or NA. One may still ask why the resulting coefficient for x is not exactly 0, and that is due to the finite numerical precision of the QR decomposition.
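You can check the slope formula directly; a sketch, using the same X and Y as in the question:

```r
X <- 1:1000
Y <- rep(1, 1000)

# Simple-regression slope: beta = Cov(x, y) / Var(x)
num <- cov(X, Y)   # exactly 0, since Y is constant
den <- var(X)      # non-zero, since X varies
beta <- num / den  # 0 / non-zero = 0, so no NaN or NA
```

Note this textbook formula gives exactly 0 here; lm's tiny non-zero slope comes from the QR route, not from this ratio.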


I believe this is simply because the QR decomposition is implemented in floating-point arithmetic.

The singular.ok parameter actually refers to the design matrix (i.e., to X only). Try:

 lm.fit(cbind(X, X), Y)

versus

 lm.fit(cbind(X, X), Y, singular.ok = FALSE)

I agree that the problem is likely floating point, but I don't think it is a singularity.

If you use solve(t(x1) %*% x1) %*% (t(x1) %*% Y) (the normal equations) instead of QR, t(x1) %*% x1 is not singular.

Use x1 <- cbind(rep(1, 1000), X), because lm(Y ~ X) includes an intercept.
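Putting that together, a sketch of the normal-equations fit described above (the textbook (X'X)^{-1} X'y formula, not what lm actually uses internally):

```r
X <- 1:1000
Y <- rep(1, 1000)

# Design matrix with an explicit intercept column, as lm(Y ~ X) builds it
x1 <- cbind(rep(1, 1000), X)

# Normal equations: beta = (X'X)^{-1} X'y
# t(x1) %*% x1 is invertible here, so solve() succeeds
beta <- solve(t(x1) %*% x1) %*% (t(x1) %*% Y)

# beta[1] ~ 1 (intercept), beta[2] ~ 0 (slope), up to floating-point error
print(beta)
```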


Source: https://habr.com/ru/post/908235/

