In the following example, predict() throws an error when the model was created by caret::train using the formula interface (y ~ x). predict() works if the model was created using the x, y specification. Why is this? Should the x, y specification always be used? Until now, I thought the choice between the two was a matter of user preference. Is there a way to get predict() to work with the formula specification?
I thought the models might differ because of how factor variables are handled. However, both models seem to produce identical regression equations, and their in-sample predictions do not differ.
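library(ggplot2)  # provides the diamonds data set
library(caret)

data("diamonds")
set.seed(42)

# 80/20 train/test split, stratified on price
trainIndex <- createDataPartition(diamonds$price, p = 0.8, list = FALSE)
train <- diamonds[trainIndex, ]
test <- diamonds[-trainIndex, ]

# Model 1: formula interface
lm_formula <- train(
  price ~ ., train,
  method = "lm",
  trControl = trainControl(method = "none")
)

# Model 2: x, y interface
lm_xy <- train(
  y = train$price,
  x = train[, -which(colnames(train) == "price")],
  method = "lm",
  trControl = trainControl(method = "none")
)

# Predicting from the formula model's finalModel raises the error
pred_formula <- predict(lm_formula$finalModel, test)
# Predicting from the x, y model's finalModel works
pred_xy <- predict(lm_xy$finalModel, test)

# Sum of squared differences between the in-sample fitted values of both models
sum((lm_formula$finalModel$fitted.values - lm_xy$finalModel$fitted.values)^2)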
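One thing that may be worth checking is whether the error comes from calling predict() on $finalModel rather than on the train object itself. The sketch below is only an illustration under assumptions: it uses the built-in iris data as a smaller stand-in for diamonds, and the names train_df, test_df and fit_formula are made up for this example. It assumes that the formula interface dummy-encodes factor columns before fitting, so the underlying lm no longer sees the original columns, and that caret's predict method for train objects re-applies that encoding to new data.

```r
library(caret)

# Small stand-in data set with a factor predictor (Species), so the
# formula interface has to encode it before fitting.
set.seed(42)
idx <- createDataPartition(iris$Sepal.Length, p = 0.8, list = FALSE)
train_df <- iris[idx, ]
test_df <- iris[-idx, ]

fit_formula <- train(
  Sepal.Length ~ ., data = train_df,
  method = "lm",
  trControl = trainControl(method = "none")
)

# Coefficient names of the underlying lm: these show which columns the
# final model was actually fitted on.
print(names(coef(fit_formula$finalModel)))

# Predict through the train object instead of through $finalModel.
pred <- predict(fit_formula, newdata = test_df)
print(head(pred))
```

If this runs without an error while the $finalModel call does not, the difference between the two interfaces would lie in how new data is prepared for prediction, not in the fitted regression itself.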