Problems using randomForest in caret with ROC as the optimization metric

I ran into a problem building random forest models with caret. I have a data set of about 46 thousand rows and 10 columns (one of which is the target variable). I am trying to compare different classifiers on this data set. I have done the following:

library(caret)

# Bootstrap resampling with class probabilities, so twoClassSummary can compute ROC
ctrl = trainControl(method="boot", classProbs=TRUE, summaryFunction=twoClassSummary)

# GLM Model:
model.glm = train(x=d[,2:10], y=d$CONV_BT, method='glm', trControl=ctrl, metric="ROC", family="binomial")

# Random Forest Model:
model.rf = train(x=d[,2:10], y=d$CONV_BT, method='rf', trControl=ctrl, metric="ROC")

# Naive Bayes Model:
model.nb = train(x=d[,2:10], y=d$CONV_BT, method='nb', trControl=ctrl, metric="ROC")

The model.glm and model.nb models look pretty decent. I can look at the 25 bootstrap replicates, and each one has a ROC of about 0.7. However, something seems wrong with model.rf, because the reported ROC estimates are all around 0.3. This tells me that something is incorrect, because I could just switch my predictions from the rf model from p to 1-p, and then my ROC would be 0.7, right?
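To check the p versus 1-p intuition, here is a minimal base R sketch. It does not use caret or my real data: the `auc` helper, the simulated `label` and `p` vectors, and the seed are all made up for illustration. It computes the area under the ROC curve from ranks and compares the value for p with the value for 1-p, which should sum to 1.

```r
# AUC via the rank-sum (Mann-Whitney) identity; base R only.
# 'auc' is a hypothetical helper, not a caret function.
auc <- function(score, label) {
  r <- rank(score)                  # average ranks, so ties are handled
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(1)                         # simulated data, NOT the real data set
label <- rbinom(1000, 1, 0.5)
# Scores deliberately pointing the wrong way: positives get lower scores
p <- plogis(rnorm(1000, mean = ifelse(label == 1, -0.7, 0.7)))

auc_p <- auc(p, label)              # below 0.5 by construction
auc_flipped <- auc(1 - p, label)    # equals 1 - auc_p

print(c(auc_p = auc_p, auc_flipped = auc_flipped))
```

If that holds, a ROC of about 0.3 would mean the rf probabilities carry roughly the same information as the other models but are being scored against the opposite class. That is only my assumption about what is happening here.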

I apologize that I cannot provide the data (it is too large to upload and is proprietary). Another strange thing: when I simulate data, I no longer have this problem. Any idea what this could be? Thank you for your help!


Source: https://habr.com/ru/post/1482015/

