Problems using randomForest in caret with ROC as the optimization metric

Question

I ran into a problem building random forest models with caret. I have a data set of about 46 thousand rows and 10 columns (one of which is the target variable), and I am trying to compare different classifiers on it. I have done the following:

    ctrl = trainControl(method = "boot",
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary)

    # GLM model:
    model.glm = train(x = d[, 2:10], y = d$CONV_BT, method = "glm",
                      trControl = ctrl, metric = "ROC", family = "binomial")

    # Random forest model:
    model.rf = train(x = d[, 2:10], y = d$CONV_BT, method = "rf",
                     trControl = ctrl, metric = "ROC")

    # Naive Bayes model:
    model.nb = train(x = d[, 2:10], y = d$CONV_BT, method = "nb",
                     trControl = ctrl, metric = "ROC")

The model.glm and model.nb models look pretty decent. I can look at the 25 bootstrap replicates, and in each case the ROC (area under the curve) is about .7. However, something seems wrong with model.rf, because its reported ROC estimates are all around .3. That tells me something is incorrect: I could simply flip the predictions from the rf model from p to 1-p, and then my ROC would be .7, right?
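To illustrate the p versus 1-p reasoning, here is a minimal base-R sketch on simulated data (not my real data). The helper `auc()` and the variables `label` and `p` are hypothetical names made up for this example; the AUC is computed with the rank-sum (Mann-Whitney) identity rather than with caret, so this only demonstrates the symmetry and does not reproduce what `twoClassSummary` does internally.

```r
# AUC via the rank-sum (Mann-Whitney) identity; base R only.
# score: numeric predictions; label: logical, TRUE marks the positive class.
auc <- function(score, label) {
  r <- rank(score)                 # average ranks are used for ties
  n_pos <- sum(label)
  n_neg <- sum(!label)
  (sum(r[label]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(1)                        # simulated data, purely illustrative
label <- rbinom(1000, 1, 0.3) == 1

# Deliberately "inverted" scores: positives tend to get LOWER values,
# which mimics a model whose AUC falls below 0.5.
p <- plogis(rnorm(1000, mean = ifelse(label, -0.7, 0)))

auc_p       <- auc(p, label)       # below 0.5 by construction
auc_flipped <- auc(1 - p, label)   # AUC after flipping p to 1 - p

cat("AUC(p)     =", auc_p, "\n")
cat("AUC(1 - p) =", auc_flipped, "\n")
cat("sum        =", auc_p + auc_flipped, "\n")
```

In this sketch the two values add up to 1, which is why an ROC of about .3 looks like a mirrored .7 rather than a genuinely bad model.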

I apologize that I cannot provide the data (it is quite large to upload, and it is proprietary). Another strange thing: when I simulate data, I no longer have this problem. Any idea what this could be? Thank you for your help!

r machine-learning classification random-forest roc

random_forest_fanatic May 21 '13 at 18:19

No one has answered this question yet.
