Plotting a ROC curve in scikit-learn gives only 3 points

TL;DR: scikit-learn's roc_curve function returns only 3 points for a certain dataset. Why might this be the case, and how can we control how many points are returned?

I'm trying to plot a ROC curve, but I consistently get a “ROC triangle”.

    from sklearn.linear_model import LogisticRegression

    lr = LogisticRegression(multi_class='multinomial', solver='newton-cg')
    y = data['target'].values
    X = data[['feature']].values
    model = lr.fit(X, y)

    # get (log-)probabilities from the classifier
    probas_ = model.predict_log_proba(X)

Just to make sure the lengths match:

    print(len(y))
    print(len(probas_[:, 1]))

Both print 13759.

Then I run:

    from sklearn.metrics import roc_curve

    false_pos_rate, true_pos_rate, thresholds = roc_curve(y, probas_[:, 1])
    print(false_pos_rate)

This prints [0. 0.28240129 1.]

If I print thresholds, I get array([0.4822225, -0.5177775, -0.84595197]) (always only 3 points).

Therefore, it is not surprising that my ROC curve looks like a triangle.

What I cannot understand is why scikit-learn's roc_curve returns only 3 points. Any help is much appreciated.


2 answers

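The number of points depends on the number of unique values in the input scores: roc_curve evaluates one threshold per distinct score value, plus one extra starting point at (0, 0). Since your score vector has only 2 unique values, 3 points is the correct result.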
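To illustrate the point about unique values, here is a minimal pure-Python sketch of how ROC points arise from thresholds. It is not scikit-learn's actual implementation; `roc_points` is a hypothetical helper and the toy labels and scores are made up for the example.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr) pairs: the (0, 0) origin plus one point per distinct score."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    # Threshold above every score: nothing is predicted positive.
    points = [(0.0, 0.0)]
    # Only distinct score values can change which samples are predicted positive.
    for thr in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= thr and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up data: same labels, scored with 2 distinct values vs. 6 distinct values.
y_true = [0, 0, 1, 1, 0, 1]
two_valued = [-0.85, -0.85, -0.52, -0.52, -0.52, -0.85]
many_valued = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]

triangle = roc_points(y_true, two_valued)
curve = roc_points(y_true, many_valued)
print(len(triangle), len(curve))
```

With two distinct scores the sketch yields three points (a triangle), while six distinct scores yield seven. A feature that takes only two values, as seems to be the case in the question, can only ever produce two distinct scores from a logistic regression.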


I had the same problem with a different example. My mistake was passing the predictions for a fixed threshold (hard class labels), rather than the probabilities, as the y_score argument of roc_curve. That also gives a three-point plot.
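A small sketch of this mistake, assuming scikit-learn and NumPy are installed; the labels and probabilities below are invented for illustration. `drop_intermediate=False` is passed so that roc_curve keeps every threshold instead of pruning some of them.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Invented toy data: true labels and predicted probabilities of class 1.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
proba = np.array([0.10, 0.35, 0.40, 0.80, 0.30, 0.60, 0.70, 0.90])

# The mistake: thresholding first and passing hard 0/1 labels as y_score.
hard_labels = (proba >= 0.5).astype(int)
fpr_l, tpr_l, thr_l = roc_curve(y_true, hard_labels)

# The fix: pass the continuous scores themselves.
fpr_p, tpr_p, thr_p = roc_curve(y_true, proba, drop_intermediate=False)

print(len(fpr_l))  # only 3 points, because hard labels have 2 distinct values
print(len(fpr_p))  # one point per distinct probability, plus the origin
```

In the question's case the scores really are continuous outputs of the model, so this fix only helps when labels were passed by accident.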


Source: https://habr.com/ru/post/986567/

