I am using the ScikitLearn interface of DecisionTree.jl (described at the bottom of the DecisionTree.jl main page; this is what I mean by "ScikitLearn") to build a random forest model for a binary classification problem on one of the RDatasets datasets. I am also using the MLBase package to evaluate the model.
I have built a random forest model of my data and would like to create an ROC curve for it. From the documentation, I understand what an ROC curve is in theory; I just can't figure out how to create one for a specific model.
On the Wikipedia page, the part that confuses me is the end of the first sentence, which I have put in bold italics: "In statistics, a receiver operating characteristic (ROC), or ROC curve, is a graphical plot that illustrates the performance of a binary classifier system ***as its discrimination threshold is varied***." The threshold is discussed further throughout the article, but it still confuses me for binary classification problems. What is the threshold value, and how do I vary it?
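As I understand it, a classifier such as a random forest can output a score for each sample (for example, an estimated probability of the positive class) instead of only a hard label, and the discrimination threshold is the cut-off that turns that score into a label. The minimal sketch below uses plain Julia and made-up scores (not output from the model in this question) to show how moving the threshold changes the predicted labels, and with them the true positive rate (TPR) and false positive rate (FPR). `classify` and `rates` are hypothetical helper names for illustration only.

```julia
# Hypothetical scores: estimated probability that each sample is positive (class 1)
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
truth  = [1, 1, 0, 1, 0, 0]          # 1 = positive, 0 = negative

# Turn scores into hard labels: positive (1) if score >= threshold, else 0
classify(scores, threshold) = Int.(scores .>= threshold)

# False positive rate and true positive rate for one set of hard predictions
function rates(truth, pred)
    tp = count((truth .== 1) .& (pred .== 1))
    fp = count((truth .== 0) .& (pred .== 1))
    p  = count(==(1), truth)
    n  = count(==(0), truth)
    return (fpr = fp / n, tpr = tp / p)
end

# Varying the threshold gives different predictions and a different (FPR, TPR) pair
for t in (0.9, 0.5, 0.2)
    pred = classify(scores, t)
    println("threshold = $t  predictions = $pred  ", rates(truth, pred))
end
```

Each threshold yields one (FPR, TPR) point: a high threshold predicts few positives (low FPR, low TPR), while a low threshold predicts many (high FPR, high TPR).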
In addition, the MLBase documentation on ROC curves says "Compute an ROC instance or an ROC curve (a vector of ROC instances), based on given scores and a threshold thres", but it does not mention this threshold anywhere else.
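The quoted docstring suggests that passing several thresholds is what turns a single ROC instance into a curve. Since I have not checked how MLBase encodes the positive class, the sketch below does the same sweep in plain Julia without MLBase: it uses every distinct score as a threshold, collects one (FPR, TPR) point per threshold, and estimates the area under the curve (AUC) with the trapezoidal rule. `roc_points` and `trapezoid_auc` are hypothetical helper names, and the scores are made up.

```julia
# Compute ROC points by sweeping the threshold over every distinct score.
# truth: 1 = positive, 0 = negative; scores: higher means "more likely positive".
function roc_points(truth::AbstractVector{<:Integer}, scores::AbstractVector{<:Real})
    p = count(==(1), truth)
    n = count(==(0), truth)
    # Start above the maximum score (nothing predicted positive), then lower the threshold
    thresholds = vcat(Inf, sort(unique(scores); rev = true))
    fpr = Float64[]
    tpr = Float64[]
    for t in thresholds
        pred_pos = scores .>= t
        push!(tpr, count(pred_pos .& (truth .== 1)) / p)
        push!(fpr, count(pred_pos .& (truth .== 0)) / n)
    end
    return thresholds, fpr, tpr
end

# Area under the ROC curve via the trapezoidal rule (fpr must be non-decreasing)
function trapezoid_auc(fpr, tpr)
    area = 0.0
    for i in 2:length(fpr)
        area += (fpr[i] - fpr[i-1]) * (tpr[i] + tpr[i-1]) / 2
    end
    return area
end

scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
truth  = [1, 1, 0, 1, 0, 0]
thresholds, fpr, tpr = roc_points(truth, scores)
println("AUC = ", trapezoid_auc(fpr, tpr))
```

For a random forest, the scores would be an estimated probability of the positive class (for example from `predict_proba` in DecisionTree.jl's ScikitLearn interface, assuming your version provides it) rather than the hard labels returned by `predict`. With hard labels there is only one meaningful threshold, so `roc(labels, r_f_prediction)` can only describe a single point.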
The sample code for my project is below. Basically, I want to create an ROC curve for the random forest, but I'm not sure how to do it, or whether it is even suitable for this model.
```julia
using DecisionTree
using RDatasets
using MLBase

quakes_data = dataset("datasets", "quakes")

# Binary target: 1 if the magnitude is greater than 5.0, otherwise 0
quakes_data.MagGT5 = Int32.(quakes_data.Mag .> 5.0)

# Features: Lat, Long, Depth and Stations (columns 1-3 and 5); label: MagGT5 (column 6)
features = Matrix(quakes_data[:, [1:3; 5]])
labels = Vector(quakes_data[:, 6])
labels[labels .== 0] .= 2

r_f_model = RandomForestClassifier(n_subfeatures = 3, n_trees = 50, partial_sampling = 0.7, max_depth = 4)
DecisionTree.fit!(r_f_model, features, labels)

# predict returns hard class labels (1 or 2), one per sample
r_f_prediction = convert(Array{Int64,1}, DecisionTree.predict(r_f_model, features))
TrainingROC = roc(labels, r_f_prediction)
```
I also read this question and did not find it useful.