Why does predict generate an error when caret::train is called with a formula instead of x, y?

In the following example, the predict function generates an error when the model is created by caret::train called with a formula (y ~ x). The predict function works if the model was created using the x, y specification. Why is this? Should the x, y specification always be used? Until now I thought the choice was purely a matter of user preference. Is there a way to get predict to work with the formula specification?

I thought the models might differ because of the factor variables. However, both models appear to produce identical regression equations, and their in-sample predictions do not differ.

library(ggplot2)
library(caret)
data("diamonds")
set.seed(42)
trainIndex <- createDataPartition(diamonds$price, p=0.8, list = FALSE)
train <- diamonds[trainIndex,]
test <- diamonds[-trainIndex,]
lm_formula <- train(
    price ~ ., train,
    method = "lm",
    trControl=trainControl(method="none")
)
lm_xy <- train(y = train$price,
                 x = train[,-which(colnames(train)=="price")],
                 method = "lm",
                 trControl=trainControl(method="none")
)

# the following generates the error shown beneath it
pred_formula <- predict(lm_formula$finalModel,test)
# Error in eval(predvars, data, env) : object 'cut.L' not found

pred_xy <- predict(lm_xy$finalModel,test)

# The following produces zero indicating the in-sample fits are identical
sum((lm_formula$finalModel$fitted.values-lm_xy$finalModel$fitted.values)^2)
1 answer

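The models are the same. The problem is that calling predict on finalModel uses predict.lm, because finalModel is a plain lm object. Call predict on the train object itself instead: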

class(lm_xy$finalModel) #lm

pred_formula <- predict(lm_formula, test)
pred_xy <- predict(lm_xy, test)

With predict.train, the two sets of predictions are equal:

all.equal(pred_xy, pred_formula)
#TRUE

To check that the fitted models are the same, compare their summaries:

summary(lm_formula$finalModel)
summary(lm_xy$finalModel)

So when you call predict on `finalModel`, you are using predict.lm. Use predict.train instead.
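As for why predict.lm fails only for the formula model: the error message suggests that the formula interface expands factors into numeric columns (such as cut.L) before fitting, so the underlying lm expects those columns in the new data. The sketch below reproduces this in base R without caret. The toy data frame `d`, the factor `g`, and the helper objects are made up for illustration, and the expansion via model.matrix is an assumption about what train does internally, not a copy of its code.

```r
# Toy data with an ordered factor, standing in for diamonds$cut (hypothetical data).
set.seed(1)
d <- data.frame(
  y = rnorm(30),
  g = factor(rep(c("lo", "mid", "hi"), 10),
             levels = c("lo", "mid", "hi"), ordered = TRUE)
)

# Assumed analogue of the formula interface: expand factors into
# numeric contrast columns first, then fit lm on the expanded columns.
X <- as.data.frame(model.matrix(y ~ ., data = d)[, -1, drop = FALSE])
expanded_names <- names(X)          # contrast columns derived from g
X$.outcome <- d$y
fit_expanded <- lm(.outcome ~ ., data = X)

# Analogue of the x, y interface: the factor is handed to lm unchanged.
fit_raw <- lm(y ~ g, data = d)

# predict.lm on the expanded fit looks for the contrast columns,
# which the raw data frame does not contain, so it fails.
res <- try(predict(fit_expanded, d), silent = TRUE)
failed <- inherits(res, "try-error")

# Expanding the new data the same way makes predict.lm work again.
X_new <- as.data.frame(model.matrix(y ~ ., data = d)[, -1, drop = FALSE])
p_expanded <- predict(fit_expanded, X_new)
p_raw <- predict(fit_raw, d)
same <- isTRUE(all.equal(unname(p_expanded), unname(p_raw)))
```

Under these assumptions the two fits give the same predictions, and only the column names expected at prediction time differ. Calling predict on the train object avoids the manual expansion step, which is why it is the safer choice for either interface.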


Source: https://habr.com/ru/post/1691854/
