Using randomForest() for classification in R?

Initially, I had a data frame with 12 columns and N rows. The last column is my class (0 or 1). I had to convert the entire data frame to numeric using

training <- sapply(training.temp,as.numeric) 

But then I thought the class column had to be a factor in order to use randomForest() as a classifier, so I did

 training[,"Class"] <- factor(training[,ncol(training)]) 

I then moved on to growing the forest with

 training_rf <- randomForest(Class ~., data = trainData, importance = TRUE, do.trace = 100) 

But I get two errors:

 1: In Ops.factor(training[, "Status"], factor(training[, ncol(training)])) : <= this is not relevant for factors (roughly translated)
 2: In randomForest.default(m, y, ...) : The response has five or fewer unique values. Are you sure you want to do regression?

I would appreciate it if someone could point out the formatting error I am making.

Thanks!

+6
2 answers

So the problem is actually quite simple. It turns out that my training data was an atomic vector (sapply() returns a matrix, not a data frame), so it first had to be converted to a data frame. I needed to add the following line:

 training <- as.data.frame(training) 

The problem is solved!
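To illustrate why the conversion matters, here is a minimal base R sketch. The contents of `training.temp` are invented toy data (the original data set is not shown), and `numeric_matrix` and `attempt` are hypothetical names used only for this example. It shows that a factor assigned into a matrix column does not stay a factor, while the same assignment on a data frame does.

```r
# Toy stand-in for training.temp (assumed; the real data is not shown).
training.temp <- data.frame(
  a = c("1.5", "2.5", "3.5", "4.5"),
  b = c(10L, 20L, 30L, 40L),
  Class = c("0", "1", "1", "0"),
  stringsAsFactors = FALSE
)

# sapply() simplifies its result to a numeric matrix, not a data frame.
numeric_matrix <- sapply(training.temp, as.numeric)
is.matrix(numeric_matrix)

# A matrix holds a single atomic type, so the factor class is lost on assignment.
attempt <- numeric_matrix
attempt[, "Class"] <- factor(attempt[, "Class"])
is.factor(attempt[, "Class"])

# A data frame can hold columns of different types, so the factor is kept.
training <- as.data.frame(numeric_matrix)
training[, "Class"] <- factor(training[, "Class"])
is.factor(training[, "Class"])
levels(training$Class)
```

With a matrix, the response that reaches randomForest() is still numeric, which would explain the regression warning quoted in the question.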

+6

Firstly, your coercion to a factor does not work because of syntax errors. Secondly, you should always use indexing when specifying the RF model. Here are the changes to your code that should make it work.

 # sapply() returns a matrix, which cannot hold a factor column, so convert to a data frame first
 training <- as.data.frame(sapply(training.temp, as.numeric))
 training[,"Class"] <- as.factor(training[,"Class"])
 training_rf <- randomForest(x=training[,1:(ncol(training)-1)], y=training[,"Class"], importance=TRUE, do.trace=100)

 # You can also coerce to a factor directly in the model statement
 training_rf <- randomForest(x=training[,1:(ncol(training)-1)], y=as.factor(training[,"Class"]), importance=TRUE, do.trace=100)
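As a rough check of the point about the response type, the following sketch fits the same toy data twice, once with a factor response and once with a numeric one. It assumes the randomForest package is installed; the data frame `toy`, its columns `x1` and `x2`, and the object names are invented for illustration and are not from the original post.

```r
library(randomForest)

set.seed(42)
# Invented toy data: two numeric predictors and a 0/1 class label.
toy <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
toy$Class <- ifelse(toy$x1 + toy$x2 > 0, 1, 0)

predictors <- toy[, c("x1", "x2")]

# Factor response: randomForest() grows a classification forest.
fit_factor <- randomForest(x = predictors, y = as.factor(toy$Class), ntree = 50)

# Numeric 0/1 response: randomForest() falls back to regression and emits
# the "five or fewer unique values" warning quoted in the question.
fit_numeric <- randomForest(x = predictors, y = toy$Class, ntree = 50)

fit_factor$type
fit_numeric$type
```

Inspecting the `type` component of the fitted object is a quick way to confirm which mode was actually used.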
+5

Source: https://habr.com/ru/post/955685/

