In stats::glm(), why does the subset argument give different results than when I subset the data argument myself?

Consider the following code:

library(ISLR)

row_list <- structure(list(`1` = 1:40, `2` = 41:79, `3` = 80:118, `4` = 119:157, 
               `5` = 158:196, `6` = 197:235, `7` = 236:274, `8` = 275:313, 
               `9` = 314:352, `10` = 353:392), 
          .Names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
test <- row_list[[1]]
train <- setdiff(unlist(row_list), row_list[[1]])

Output 1:

> glm(mpg ~ poly(horsepower, 1), data = Auto, subset = train)

Call:  glm(formula = mpg ~ poly(horsepower, 1), data = Auto, subset = train)

Coefficients:
        (Intercept)  poly(horsepower, 1)  
              23.37              -133.05  

Degrees of Freedom: 351 Total (i.e. Null);  350 Residual
Null Deviance:      21460 
Residual Deviance: 8421     AIC: 2122

Output 2:

> glm(mpg ~ poly(horsepower, 1), data = Auto[train,])

Call:  glm(formula = mpg ~ poly(horsepower, 1), data = Auto[train, ])

Coefficients:
        (Intercept)  poly(horsepower, 1)  
              24.05              -114.19  

Degrees of Freedom: 351 Total (i.e. Null);  350 Residual
Null Deviance:      21460 
Residual Deviance: 8421     AIC: 2122

As you can see above, the values of (Intercept) and poly(horsepower, 1) differ between the two outputs. Why is this?

At least for lm(), Introduction to Statistical Learning suggests (see page 191) that row indices can be passed to the subset argument. Does this also apply to glm(), or am I just not using subset correctly?

1 answer

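This is due to how poly() constructs orthogonal polynomials. With the subset argument, poly() is evaluated on all rows of data and the rows are selected afterwards; with data = Auto[train, ], poly() only ever sees the training rows. The orthogonal basis is therefore centered and scaled differently in the two calls. The fitted model itself is the same, which is why the deviance and AIC agree in both outputs; only the coefficients change.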
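A minimal sketch of the difference in the basis itself, using the built-in mtcars data and rows 10:32 as an illustrative subset (the variable names basis_full and basis_sub are made up for this example). It assumes that the subset argument is applied after the formula terms have been evaluated on the full data, which is how model.frame() behaves:

```r
# Basis computed on all 32 rows, then restricted to rows 10:32
# (mirrors what `subset = 10:32` does).
basis_full <- poly(mtcars$hp, 1)[10:32, 1]

# Basis computed on rows 10:32 only
# (mirrors `data = mtcars[10:32, ]`).
basis_sub <- poly(mtcars$hp[10:32], 1)[, 1]

# The two columns contain different numbers ...
isTRUE(all.equal(basis_full, basis_sub))

# ... but each is an exact linear function of the other,
# so both describe the same straight-line model in hp.
cor(basis_full, basis_sub)
```

Because the two columns differ only by a shift and a rescaling, the regression absorbs the difference into the intercept and slope.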

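If you use raw polynomials instead, both approaches give the same coefficients (this holds for glm as well as lm).

For example: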

> coef(glm(mpg~poly(hp,1),data=mtcars,subset=10:32))
(Intercept) poly(hp, 1) 
   20.63307   -28.66876 
> coef(glm(mpg~poly(hp,1),data=mtcars[10:32,]))
(Intercept) poly(hp, 1) 
   19.93043   -25.43935 
> coef(glm(mpg~poly(hp,1,raw=TRUE),data=mtcars,subset=10:32))
            (Intercept) poly(hp, 1, raw = TRUE) 
            31.64927851             -0.07509986 
> coef(glm(mpg~poly(hp,1,raw=TRUE),data=mtcars[10:32,]))
            (Intercept) poly(hp, 1, raw = TRUE) 
            31.64927851             -0.07509986 
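
As a further check, the two orthogonal-polynomial fits can be compared directly. This is only a sketch on the same mtcars subset; the object names fit_subset and fit_rows, and the use of rows 1:9 as held-out data, are assumptions made for illustration:

```r
fit_subset <- glm(mpg ~ poly(hp, 1), data = mtcars, subset = 10:32)
fit_rows   <- glm(mpg ~ poly(hp, 1), data = mtcars[10:32, ])

# The coefficients differ because the basis differs ...
coef(fit_subset)
coef(fit_rows)

# ... but the fitted values and the deviance agree.
same_fitted <- all.equal(unname(fitted(fit_subset)), unname(fitted(fit_rows)))
same_dev    <- all.equal(deviance(fit_subset), deviance(fit_rows))

# Predictions on held-out rows agree as well, since predict()
# rebuilds each model's own basis for the new data.
same_pred <- all.equal(
  unname(predict(fit_subset, newdata = mtcars[1:9, ])),
  unname(predict(fit_rows,   newdata = mtcars[1:9, ]))
)
```

So for cross-validation either form should be usable, as long as predictions come from predict() rather than from applying the coefficients by hand to a separately computed poly() column.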

Source: https://habr.com/ru/post/1688175/

