Consider the following code:
library(ISLR)
row_list <- structure(list(`1` = 1:40,   `2` = 41:79,   `3` = 80:118,  `4` = 119:157,
                           `5` = 158:196, `6` = 197:235, `7` = 236:274, `8` = 275:313,
                           `9` = 314:352, `10` = 353:392),
                      .Names = c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
test <- row_list[[1]]
train <- setdiff(unlist(row_list), row_list[[1]])
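(For what it's worth, the same split can be built more compactly, and the sizes line up with the 351 total degrees of freedom reported by glm() below:)

```r
# Equivalent, compact construction of the same 10 folds of row indices
folds <- split(1:392, rep(1:10, times = c(40, rep(39, 8), 40)))
test  <- folds[[1]]                    # rows 1..40, same as row_list[[1]]
train <- setdiff(unlist(folds), test)  # rows 41..392
c(length(test), length(train))         # 40 held out, 352 used for fitting
```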
Output 1:
> glm(mpg ~ poly(horsepower, 1), data = Auto, subset = train)
Call: glm(formula = mpg ~ poly(horsepower, 1), data = Auto, subset = train)
Coefficients:
        (Intercept)  poly(horsepower, 1)
              23.37              -133.05
Degrees of Freedom: 351 Total (i.e. Null); 350 Residual
Null Deviance: 21460
Residual Deviance: 8421 AIC: 2122
Output 2:
> glm(mpg ~ poly(horsepower, 1), data = Auto[train,])
Call: glm(formula = mpg ~ poly(horsepower, 1), data = Auto[train, ])
Coefficients:
        (Intercept)  poly(horsepower, 1)
              24.05              -114.19
Degrees of Freedom: 351 Total (i.e. Null); 350 Residual
Null Deviance: 21460
Residual Deviance: 8421 AIC: 2122
As you can see above, the values of (Intercept) and poly(horsepower, 1) differ between the two outputs. Why is this?

At least for lm(), Introduction to Statistical Learning suggests (see page 191) that row indices can be passed in the subset argument. Does this also apply to glm(), or is subset just not being used correctly here?
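In case it narrows things down: with a raw predictor instead of poly(), the two calls appear to agree. Here is a stand-in check using only the built-in mtcars data (mtcars and the arbitrary 1:20 split are just placeholders for Auto and train):

```r
# Stand-in check with built-in data: does subset = <row indices>
# match manual row indexing for glm() when no poly() is involved?
train_rows <- 1:20                                    # arbitrary split
fit_subset <- glm(mpg ~ hp, data = mtcars, subset = train_rows)
fit_index  <- glm(mpg ~ hp, data = mtcars[train_rows, ])
all.equal(coef(fit_subset), coef(fit_index))          # the two fits agree
```

So the coefficient difference above seems tied to how poly() interacts with subset rather than to glm() itself, though I may be misreading it.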