Consider the following code:
library(ISLR)
row_list <- list(`1` = 1:40,   `2` = 41:79,   `3` = 80:118,  `4` = 119:157,
                 `5` = 158:196, `6` = 197:235, `7` = 236:274, `8` = 275:313,
                 `9` = 314:352, `10` = 353:392)
test  <- row_list[[1]]                              # fold 1 as the test rows
train <- setdiff(unlist(row_list), row_list[[1]])   # all remaining rows as the training set
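For reference, this split puts 352 row indices in train and 40 in test (a quick sanity check, added here for illustration):
length(test)                    # 40 test rows
length(train)                   # 352 training rows
length(intersect(test, train))  # 0, so the two index sets are disjoint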
Output 1:
> glm(mpg ~ poly(horsepower, 1), data = Auto, subset = train)
Call: glm(formula = mpg ~ poly(horsepower, 1), data = Auto, subset = train)
Coefficients:
(Intercept) poly(horsepower, 1)
23.37 -133.05
Degrees of Freedom: 351 Total (i.e. Null); 350 Residual
Null Deviance: 21460
Residual Deviance: 8421 AIC: 2122
Output 2:
> glm(mpg ~ poly(horsepower, 1), data = Auto[train,])
Call: glm(formula = mpg ~ poly(horsepower, 1), data = Auto[train, ])
Coefficients:
(Intercept) poly(horsepower, 1)
24.05 -114.19
Degrees of Freedom: 351 Total (i.e. Null); 350 Residual
Null Deviance: 21460
Residual Deviance: 8421 AIC: 2122
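To make the comparison easier to reproduce, the two calls can be stored as objects and inspected side by side; this is a minimal sketch added for illustration, not part of the original session:
fit_subset <- glm(mpg ~ poly(horsepower, 1), data = Auto, subset = train)
fit_index  <- glm(mpg ~ poly(horsepower, 1), data = Auto[train, ])
coef(fit_subset)      # intercept 23.37, slope -133.05 (as in Output 1)
coef(fit_index)       # intercept 24.05, slope -114.19 (as in Output 2)
deviance(fit_subset)  # 8421 for both fits, per the outputs above
deviance(fit_index)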
As you can see above, the values of (Intercept) and poly(horsepower, 1) differ between the two outputs. Why is this?
At least for lm(), Introduction to Statistical Learning suggests (see page 191) that row indices can be passed via the subset argument. Does this also apply to glm(), or is subset just not being used correctly here?
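For comparison, the same pair of calls can be made with lm() (the function used on page 191 of ISLR); this is a sketch to check whether the behaviour is specific to glm():
lm_subset <- lm(mpg ~ poly(horsepower, 1), data = Auto, subset = train)
lm_index  <- lm(mpg ~ poly(horsepower, 1), data = Auto[train, ])
coef(lm_subset)
coef(lm_index)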