How to create a gain diagram in R for a decision tree model?

I built a decision tree model in R. The target variable is salary: we are trying to predict whether a person's salary is above or below 50 thousand based on the other input variables.

    df <- salary.data

    train = sample(1:nrow(df), nrow(df)/2)
    train = sample(1:nrow(df), size=0.2*nrow(df))
    test = - train

    training_data = df[train, ]
    testing_data = df[test, ]

    fit <- rpart(training_data$INCOME ~ ., method="class", data=training_data)  ## generate tree
    testing_data$predictionsOutput = predict(fit, newdata=testing_data, type="class")  ## make prediction

After that, I tried to create a gain diagram as follows:

    # Gain Chart
    pred <- prediction(testing_data$predictionsOutput, testing_data$INCOME)
    gain <- performance(pred,"tpr","fpr")
    plot(gain, col="orange", lwd=2)

From the reference I was following, I can't work out how to use the ROCR package's prediction() function to build the chart. Does it only work for binary target variables? I get the error "Prediction format is invalid".

Any help with building a gain diagram for the above model would be greatly appreciated. Thanks!

      AGE         EMPLOYER    DEGREE            MSTATUS           JOBTYPE  SEX C.GAIN C.LOSS HOURS
    1  39        State-gov Bachelors      Never-married      Adm-clerical Male   2174      0    40
    2  50 Self-emp-not-inc Bachelors Married-civ-spouse   Exec-managerial Male      0      0    13
    3  38          Private   HS-grad           Divorced Handlers-cleaners Male      0      0    40
            COUNTRY INCOME
    1 United-States  <=50K
    2 United-States  <=50K
    3 United-States  <=50K
2 answers

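Convert the predictions to a plain vector using c(); prediction() does not accept a factor. (From R 4.1.0 on, c() keeps factors as factors, so use as.numeric() there instead.)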

    library('rpart')
    library('ROCR')

    setwd('C:\\Users\\John\\Google Drive\\working\\R\\questions')
    df <- read.csv(file='salary-class.csv', header=TRUE)

    # Train on 20% of the rows, test on the rest
    # (the earlier nrow(df)/2 sample was immediately overwritten, so it is dropped)
    train = sample(1:nrow(df), size=0.2*nrow(df))
    test = - train
    training_data = df[train, ]
    testing_data = df[test, ]

    # Use INCOME ~ . rather than training_data$INCOME ~ . so that "." does not
    # pull INCOME itself in as a predictor
    fit <- rpart(INCOME ~ ., method="class", data=training_data)  ## generate tree
    testing_data$predictionsOutput = predict(fit, newdata=testing_data, type="class")  ## make prediction

    # Doesn't work: prediction() rejects a factor of predicted classes
    # pred <- prediction(testing_data$predictionsOutput, testing_data$INCOME)

    # c() drops the factor class here; in R >= 4.1.0 use as.numeric(...) instead
    v <- c(pred = testing_data$predictionsOutput)
    pred <- prediction(v, testing_data$INCOME)
    gain <- performance(pred,"tpr","fpr")
    plot(gain, col="orange", lwd=2)


This should work if you change

 predict(fit, newdata=testing_data, type="class") 

to

 predict(fit, newdata=testing_data, type="prob") 

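A gain chart needs the cases ranked by the model's predicted probability, not by hard class labels. Note that type="prob" returns a matrix with one column per class, so pass the column for the positive class to prediction().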
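
As a minimal sketch of that change (not the original poster's data): the code below uses synthetic data with hypothetical AGE, HOURS, and INCOME columns standing in for the salary data set, takes the ">50K" column of the type="prob" matrix as the score, and plots true positive rate against the rate of positive predictions (ROCR's "rpp" measure), which is the usual cumulative gain layout. Plotting "tpr" against "fpr", as in the question, gives an ROC curve instead.

```r
library(rpart)
library(ROCR)

set.seed(1)

# Synthetic stand-in for the salary data (assumption: these columns and this
# relationship are invented for illustration only)
n  <- 1000
df <- data.frame(AGE   = round(runif(n, 18, 70)),
                 HOURS = round(runif(n, 10, 60)))
p_high    <- plogis(-6 + 0.06 * df$AGE + 0.08 * df$HOURS)
df$INCOME <- factor(ifelse(runif(n) < p_high, ">50K", "<=50K"),
                    levels = c("<=50K", ">50K"))

# Train on 20% of the rows and test on the rest, as in the question
train         <- sample(seq_len(n), size = floor(0.2 * n))
training_data <- df[train, ]
testing_data  <- df[-train, ]

fit <- rpart(INCOME ~ ., data = training_data, method = "class")

# type = "prob" returns a matrix with one column per class level;
# keep the probability of the positive class as the ranking score
probs <- predict(fit, newdata = testing_data, type = "prob")
score <- probs[, ">50K"]

# label.ordering lists the negative class first, then the positive class
pred <- prediction(score, testing_data$INCOME,
                   label.ordering = c("<=50K", ">50K"))

# Cumulative gain: true positive rate vs. rate of positive predictions
gain <- performance(pred, measure = "tpr", x.measure = "rpp")
plot(gain, col = "orange", lwd = 2, main = "Cumulative gain chart")
abline(0, 1, lty = 2)  # baseline: picking cases at random
```

Because a single tree only produces as many distinct probabilities as it has leaves, the curve will be made of a few straight segments rather than a smooth line.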


Source: https://habr.com/ru/post/979375/

