TL;DR: scikit-learn's roc_curve function returns only 3 points for a certain dataset. Why could this be the case, and how can we control how many points are returned?
I am trying to plot a ROC curve, but I consistently get a "ROC triangle".
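from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

lr = LogisticRegression(multi_class='multinomial', solver='newton-cg')
y = data['target'].values
X = data[['feature']].values
model = lr.fit(X, y)
# get log-probabilities from the fitted classifier
probas_ = model.predict_log_proba(X)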
Just to make sure the lengths match:
print(len(y))
print(len(probas_[:, 1]))
Both return 13759.
Then I run:
false_pos_rate, true_pos_rate, thresholds = roc_curve(y, probas_[:, 1])
print(false_pos_rate)
This prints [0. 0.28240129 1.]
If I print thresholds, I get array([0.4822225, -0.5177775, -0.84595197]) (always only 3 points).
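One thing that may be worth checking is how many distinct score values the model produces. A ROC curve can have at most one point per distinct score, plus the starting point at (0, 0). The first threshold printed above, 0.4822225, equals -0.5177775 + 1, which looks like the extra starting threshold that older scikit-learn versions prepend, so there may be only two distinct scores here. The sketch below is a hand-rolled illustration, not scikit-learn's implementation, and `roc_points`, `y_true` and `scores` are hypothetical names and made-up data.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) pairs: the (0, 0) start plus one point per distinct score."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Sweep the thresholds from the highest distinct score down to the lowest.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up data: a feature with two possible values yields only two distinct scores.
y_true = [0, 0, 0, 1, 1, 0, 1, 1]
scores = [-0.85, -0.85, -0.52, -0.52, -0.52, -0.85, -0.52, -0.85]

points = roc_points(y_true, scores)
print(len(set(scores)), "distinct scores ->", len(points), "ROC points")
```

On the real data, something like `len(set(probas_[:, 1]))` would show whether the scores take only two values, which would be the case if `feature` itself has only two distinct values.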
Therefore, it is not surprising that my ROC curve looks like a triangle.
What I cannot understand is why scikit-learn's roc_curve returns only 3 points. Any help is much appreciated.
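On controlling the number of points: `roc_curve` has a `drop_intermediate` parameter (default `True`) that removes points lying on a straight segment of the curve. Setting it to `False` keeps every distinct threshold, but it cannot create points that the scores do not support. The sketch below uses synthetic data, not the dataset from the question, and all variable names are assumptions made for illustration.

```python
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
n = 200
y_true = rng.integers(0, 2, size=n)
y_true[:2] = [0, 1]  # guarantee that both classes are present

# Scores with only two distinct values, as a two-valued feature would produce.
binary_scores = np.where(np.arange(n) % 2 == 1, -0.52, -0.85)
# Continuous scores, for comparison.
continuous_scores = rng.random(n)

_, _, thr_binary = roc_curve(y_true, binary_scores, drop_intermediate=False)
_, _, thr_all = roc_curve(y_true, continuous_scores, drop_intermediate=False)
_, _, thr_default = roc_curve(y_true, continuous_scores)

print("two-valued scores, drop_intermediate=False:", len(thr_binary))
print("continuous scores, drop_intermediate=False:", len(thr_all))
print("continuous scores, default settings:", len(thr_default))
```

If the two-valued case still gives 3 thresholds with `drop_intermediate=False`, the limit comes from the scores rather than from `roc_curve`, and a feature with more distinct values would be needed to get a smoother curve.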
