I have a very large dataset (ds). One of its columns is popularity, a factor with levels 'High' and 'Low'.
I split the data 70% / 30% into a training set (ds_tr) and a test set (ds_te).
I created the following model using logistic regression:
mdl <- glm(formula = popularity ~ . - url, family = "binomial", data = ds_tr)
Then I computed the predicted probabilities (and repeated this for ds_te):
y_hat <- predict(mdl, newdata = ds_tr, type = 'response')
I want to find the precision and recall values corresponding to a cutoff threshold of 0.5, so I did:
library(ROCR)
pred <- prediction(y_hat, ds_tr$popularity)
perf <- performance(pred, "prec", "rec")
The result is an object holding many values, one per cutoff:
str(perf)
Formal class 'performance' [package "ROCR"] with 6 slots
  ..@ x.name      : chr "Recall"
  ..@ y.name      : chr "Precision"
  ..@ alpha.name  : chr "Cutoff"
  ..@ x.values    :List of 1
  .. ..$ : num [1:27779] 0.00 7.71e-05 7.71e-05 1.54e-04 2.31e-04 ...
  ..@ y.values    :List of 1
  .. ..$ : num [1:27779] NaN 1 0.5 0.667 0.75 ...
  ..@ alpha.values:List of 1
  .. ..$ : num [1:27779] Inf 0.97 0.895 0.89 0.887 ...
How do I find the specific precision and recall values corresponding to a cutoff threshold of 0.5?
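For reference, here is a minimal base R sketch of what "precision and recall at a cutoff of 0.5" could mean when computed directly from the predicted probabilities, without ROCR. The data below is simulated as a stand-in for y_hat and ds_tr$popularity, and the helper name prec_rec_at is hypothetical. It assumes the factor levels are ordered 'High', 'Low', so that glm() with family = "binomial" models the probability of 'Low' (the second level), and it counts a probability exactly equal to the cutoff as a positive prediction.

```r
# Simulated stand-in data (assumption): the real y_hat and ds_tr$popularity
# are not available here, so this only illustrates the calculation.
set.seed(1)
n <- 200
truth <- factor(sample(c("High", "Low"), n, replace = TRUE),
                levels = c("High", "Low"))
# Fake probabilities of the second level ("Low"), higher when truth is "Low"
y_hat <- ifelse(truth == "Low", rbeta(n, 4, 2), rbeta(n, 2, 4))

# Hypothetical helper: precision and recall at a single cutoff.
# `positive` defaults to the second factor level, which is the level
# whose probability a binomial glm() predicts.
prec_rec_at <- function(prob, truth, cutoff = 0.5,
                        positive = levels(truth)[2]) {
  predicted_pos <- prob >= cutoff   # classified as positive at this cutoff
  actual_pos <- truth == positive   # truly positive
  tp <- sum(predicted_pos & actual_pos)
  fp <- sum(predicted_pos & !actual_pos)
  fn <- sum(!predicted_pos & actual_pos)
  c(precision = tp / (tp + fp), recall = tp / (tp + fn))
}

res <- prec_rec_at(y_hat, truth, cutoff = 0.5)
print(res)
```

With the ROCR object above, a comparable lookup might be to take the index i of the smallest cutoff in perf@alpha.values[[1]] that is still at least 0.5, for example i <- max(which(perf@alpha.values[[1]] >= 0.5)), and then read perf@x.values[[1]][i] (recall) and perf@y.values[[1]][i] (precision). This assumes the cutoffs are stored in decreasing order, as the str() output suggests.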