R-package (rpart): building a classification tree

I have been struggling for several days to build a classification tree with the caret package. The problem is my factor variables. I can fit a tree, but when I try to use the best model to predict on a test sample, it fails: the train function creates dummy variables for my factors, and the prediction function then cannot find these newly created dummies in the test set. How can I solve this problem?

My code is as follows:

install.packages("caret", dependencies = c("Depends", "Suggests"))
library(caret)
db <- read.csv("db.csv", header = TRUE, sep = ";", na.strings = "?")
fix(db)  # opens an interactive editor; remove when running as a script
db$defaillance <- factor(db$defaillance)
db$def <- factor(ifelse(db$defaillance == 0, "No", "Yes"))
db$defaillance <- NULL
db$canal <- factor(db$canal)
db$sect_isodev <- factor(db$sect_isodev)
db$sect_risq <- factor(db$sect_risq)

# delete near-zero-variance predictors (def is the last column, so indices line up)
nzv <- nearZeroVar(db[, names(db) != "def"])
db_new <- if (length(nzv) > 0) db[, -nzv] else db

inTrain <- createDataPartition(y = db_new$def, p = .75, list = FALSE)                               
training <- db_new[inTrain,]
testing <- db_new[-inTrain,]
str(training)
str(testing)
dim(training)
dim(testing)
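A side note on the near-zero-variance step: when nearZeroVar() flags no columns it returns an empty integer vector, and negative indexing with an empty vector drops every column instead of none. Below is a minimal base-R sketch of the pitfall and a guard; the toy data frame is hypothetical and only stands in for db.

```r
# Hypothetical toy data frame standing in for db
toy <- data.frame(a = 1:3, b = c(2, 5, 7), def = factor(c("No", "Yes", "No")))

# What nearZeroVar() returns when no column qualifies
nzv <- integer(0)

# Pitfall: -integer(0) is still integer(0), so no columns are selected
bad <- toy[, -nzv, drop = FALSE]
ncol(bad)   # 0

# Guard: only drop columns when there is something to drop
good <- if (length(nzv) > 0) toy[, -nzv, drop = FALSE] else toy
ncol(good)  # 3
```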

An excerpt of the str() output for training/testing:

 $ FDR        : num  1305 211 162 131 143 ...
 $ FCYC       : num  0.269 0.18 0.154 0.119 0.139 ...
 $ BFDR       : num  803 164 108 72 76 63 100 152 188 80 ...
 $ TRES       : num  502 47 54 59 67 49 53 -7 -103 -109 ...
 $ sect_isodev: Factor w/ 9 levels "1","2","3","4",..: 4 3 3 3 3 3 3 3 3 3 ...
 $ sect_risq  : Factor w/ 6 levels "0","1","2","3",..: 6 6 6 6 6 6 6 6 6 6 ...
 $ def        : Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
> dim(training)
[1] 14553    42
> dim(testing)
[1] 4850   42

Then my code looks like this:

fitControl <- trainControl(method = "repeatedcv",
                           number = 10,
                           repeats = 10,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

#CART1
set.seed(1234)
tree1 <- train(def ~ .,
               data = training,
               method = "rpart",
               tuneLength = 20,
               metric = "ROC",
               trControl = fitControl)

An excerpt of the variable importance reported by summary(tree1$finalModel):

RNTB          38.397731
sect_isodev1   6.742289
sect_isodev3   4.005016
sect_isodev8   2.520850
sect_risq3     9.909127
sect_risq4     6.737908
sect_risq5     3.085714
SOLV          73.067539
TRES          47.906884
sect_isodev2   0.000000
sect_isodev4   0.000000
sect_isodev5   0.000000
sect_isodev6   0.000000
sect_isodev7   0.000000
sect_isodev9   0.000000
sect_risq0     0.000000
sect_risq1     0.000000
sect_risq2     0.000000

And here is the error:

model.tree1 <- predict(tree1$finalModel, testing)
Error in eval(expr, envir, enclos) : object 'sect_isodev1' not found

In Max Kuhn's "Predictive Modeling with R", the prediction step is written like this:

predict(rpartTune$finalModel, newdata, type = "class")

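Here rpartTune$finalModel is the fitted rpart object, so the call dispatches to predict.rpart; type = "class" returns the predicted classes, while type = "prob" returns the class probabilities.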
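To see the difference between the two type values, here is a minimal sketch on a hypothetical two-class toy data set (the column names SOLV and def only mimic the question's data). It calls rpart directly rather than through train.

```r
library(rpart)

# Hypothetical data: def is "Yes" exactly when SOLV > 50
set.seed(1)
n <- 100
toy <- data.frame(SOLV = runif(n, 0, 100))
toy$def <- factor(ifelse(toy$SOLV > 50, "Yes", "No"), levels = c("No", "Yes"))

fit <- rpart(def ~ SOLV, data = toy, method = "class")
new_obs <- data.frame(SOLV = c(10, 90))

# Default for a classification tree: a matrix of class probabilities
p <- predict(fit, newdata = new_obs)
print(p)

# type = "class": a factor of predicted labels
cl <- predict(fit, newdata = new_obs, type = "class")
print(cl)
```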


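There are two issues here:

  • When you call predict on tree1$finalModel, R dispatches to predict.rpart, because tree1$finalModel is an rpart object, not a train object. For a classification tree, predict.rpart returns class probabilities by default; to get the classes you need type = "class".
  • When you call train with the formula interface, it converts your factors into dummy variables (such as sect_isodev1), and the fitted rpart model refers to those columns. With the x/y interface, the factors are passed to rpart unchanged, so no dummy columns are created.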
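A minimal base-R sketch of where names like sect_isodev1 come from. It uses model.matrix() with R's default treatment contrasts on a hypothetical toy data frame; caret's own dummy coding may differ in detail (the question's output lists a column for every level), but the mechanism is the same.

```r
# Hypothetical training data with a factor predictor like sect_isodev
train_df <- data.frame(
  TRES = c(502, 47, 54, 59, 67, 49),
  sect_isodev = factor(c("1", "2", "3", "1", "2", "3"))
)

# A formula interface expands factors into dummy columns via model.matrix()
mm <- model.matrix(~ ., data = train_df)
colnames(mm)
# "(Intercept)" "TRES" "sect_isodev2" "sect_isodev3"

# The raw data frame has no such columns, so a model fitted on the dummies
# cannot find them in an untransformed test set
"sect_isodev2" %in% names(train_df)  # FALSE
```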

Using the x/y interface instead, and calling predict.rpart explicitly from the rpart package:

tree1 = train (y = training$def,
               x = training[, -which(colnames(training) == "def")],
               method = "rpart",
               tuneLength=20,
               metric="ROC",
               trControl = fitControl)
summary(tree1$finalModel)
# This still results in Error: could not find function "predict.rpart":
model.tree1 <- predict.rpart(tree1$finalModel, newdata = testing)
# Explicitly calling predict.rpart from the rpart package works:
rpart:::predict.rpart(object = tree1$finalModel, 
                      newdata = testing, 
                      type = "class") 
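For comparison, a sketch with hypothetical simulated data (rpart called directly, no caret) showing that when the factor column reaches rpart unchanged, the raw test set needs no extra columns:

```r
library(rpart)

# Hypothetical data: def depends only on the factor sect_risq
set.seed(42)
n <- 200
toy <- data.frame(
  sect_risq = factor(sample(c("0", "1", "2", "3"), n, replace = TRUE)),
  TRES = rnorm(n)
)
toy$def <- factor(ifelse(toy$sect_risq %in% c("2", "3"), "Yes", "No"),
                  levels = c("No", "Yes"))

train_idx <- sample(n, 150)
training <- toy[train_idx, ]
testing  <- toy[-train_idx, ]

# rpart splits on factor levels directly, so no dummy columns are created
fit <- rpart(def ~ ., data = training, method = "class")
names(fit$variable.importance)

# testing still has sect_risq as a factor with the same levels, so this works
pred <- predict(fit, newdata = testing, type = "class")
mean(pred == testing$def)
```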

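Alternatively, simply call predict(tree1, testing). This dispatches to predict.train, which works on the whole train object and applies to the new data the same processing that train used, so you do not need to call predict.rpart yourself.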


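You should not use predict.rpart on train$finalModel. The rpart model was fitted on the data as train prepared it (here, with the dummy variables created by the formula interface), so the new data would have to go through exactly the same processing. predict.train takes care of all that minutia, so use predict(tree1, testing) instead.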

Max

EDIT -

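Also watch out for type = "class" versus type = "prob".

For a classification tree, predict.rpart returns class probabilities by default, so you have to ask for type = "class" to get the predicted classes.

predict.train returns the predicted classes by default; use type = "prob" if you want the class probabilities.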
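To make the "minutia" concrete, here is a hand-rolled sketch of the idea, assuming only base R and rpart: build dummy columns from the training data, fit on them, then rebuild exactly the same columns for the test set from the stored terms and factor levels. It illustrates the principle and is not caret's actual implementation; the data are hypothetical.

```r
library(rpart)

# Hypothetical data: def depends only on whether sect_isodev is "3"
set.seed(7)
n <- 200
toy <- data.frame(
  sect_isodev = factor(sample(c("1", "2", "3"), n, replace = TRUE)),
  TRES = rnorm(n)
)
toy$def <- factor(ifelse(toy$sect_isodev == "3", "Yes", "No"),
                  levels = c("No", "Yes"))
training <- toy[1:150, ]
testing  <- toy[151:n, ]

# Formula-style preprocessing on the training data: factors become dummies
tt <- delete.response(terms(def ~ ., data = training))
mf <- model.frame(tt, training)
xlev <- .getXlevels(tt, mf)  # remember the training factor levels
x_train <- as.data.frame(model.matrix(tt, mf))[, -1, drop = FALSE]  # drop intercept
x_train$def <- training$def
fit <- rpart(def ~ ., data = x_train, method = "class")

# Predicting on the raw test set fails, as in the question
err <- tryCatch(predict(fit, newdata = testing, type = "class"),
                error = function(e) conditionMessage(e))
print(err)

# Rebuild the same dummy columns for the test set from the stored terms/levels
mf_test <- model.frame(tt, testing, xlev = xlev)
x_test <- as.data.frame(model.matrix(tt, mf_test))[, -1, drop = FALSE]
pred <- predict(fit, newdata = x_test, type = "class")
mean(pred == testing$def)
```

Keeping the terms and factor levels from the training step is what guarantees that the test set gets the same columns in the same order, even if some level happens to be absent from the test rows.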


Source: https://habr.com/ru/post/1568185/
