Calculating specificity in scikit-learn

I need the specificity of my classification, which is defined as TN/(TN+FP).

I am writing a custom scorer function:

    from sklearn.metrics import make_scorer

    def specificity_loss_func(ground_truth, predictions):
        print(predictions)
        tp, tn, fn, fp = 0.0, 0.0, 0.0, 0.0
        for l, m in enumerate(ground_truth):
            if m == predictions[l] and m == 1:
                tp += 1
            if m == predictions[l] and m == 0:
                tn += 1
            if m != predictions[l] and m == 1:
                fn += 1
            if m != predictions[l] and m == 0:
                fp += 1
        return tn / (tn + fp)

    score = make_scorer(specificity_loss_func, greater_is_better=True)

Then

    from sklearn.dummy import DummyClassifier

    clf_dummy = DummyClassifier(strategy='most_frequent', random_state=0)
    ground_truth = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1]
    p = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]

    clf_dummy = clf_dummy.fit(ground_truth, p)
    score(clf_dummy, ground_truth, p)

When I run this code, I get the following output:

    [0 0 0 0 0 0 0 0 0 0 0 0 0]
    1.0

Why do the printed predictions come out as a series of zeros when I explicitly set p = [0,0,0,1,0,1,1,1,1,0,0,1,0]?

3 answers

First of all, you need to know that:

 DummyClassifier(strategy='most_frequent'... 

gives you a classifier that always returns the most frequent label from your training set. It does not even look at the features in X. You can pass anything instead of ground_truth in this line:

 clf_dummy = clf_dummy.fit(ground_truth, p) 

and the result of training, and hence the predictions, will stay the same, because the majority of the labels in p is "0".
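To see this concretely, here is a minimal sketch (my own toy data, variable names are illustrative): the features passed to fit are ignored entirely, and predict always returns the majority label of the training y.

```python
from sklearn.dummy import DummyClassifier

p = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]  # 7 zeros, 6 ones -> majority is 0
clf = DummyClassifier(strategy='most_frequent')

# X can be anything with the right number of rows; it is never looked at
clf.fit([[i] for i in range(len(p))], p)
print(clf.predict([[100], [-5]]))  # [0 0] -- always the majority label
```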

The second thing you need to know: make_scorer returns a function with the interface scorer(estimator, X, y). This function calls the estimator's predict method on X and computes your specificity function between the predicted labels and y.

So score calls clf_dummy's predict method on whatever data set you pass (it doesn't matter which, since the classifier always returns 0), gets back a vector of zeros, and then computes the specificity between ground_truth and those predictions. Your predictions are all 0 because 0 was the majority class in the training set. Your score is 1 because there are no false positive predictions.

I adjusted your code slightly so that the features and the labels are kept separate:

    from sklearn.dummy import DummyClassifier

    clf_dummy = DummyClassifier(strategy='most_frequent', random_state=0)
    X = [[0], [0], [1], [0], [1], [1], [1], [0], [0], [1], [0], [0], [1]]
    p = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]

    clf_dummy = clf_dummy.fit(X, p)
    score(clf_dummy, X, p)
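The main benefit of the scorer(estimator, X, y) interface is that the scorer can also be plugged into cross-validation and grid search. A self-contained sketch, reusing the question's specificity metric with the loop condensed (the data and cv=3 are my own choices):

```python
from sklearn.dummy import DummyClassifier
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

def specificity_loss_func(ground_truth, predictions):
    # tn: actual negatives predicted as negative; fp: actual negatives predicted as positive
    tn = sum(1 for t, p in zip(ground_truth, predictions) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(ground_truth, predictions) if t == 0 and p == 1)
    return tn / (tn + fp)

score = make_scorer(specificity_loss_func, greater_is_better=True)

X = [[0], [0], [1], [0], [1], [1], [1], [0], [0], [1], [0], [0], [1]]
p = [0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0]

clf = DummyClassifier(strategy='most_frequent', random_state=0)
# one specificity value per fold
scores = cross_val_score(clf, X, p, cv=3, scoring=score)
```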

You can get the specificity from the confusion matrix. For a binary classification problem, it would be something like:

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 0, 1, 1, 1, 1, 1]
    y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    specificity = tn / (tn + fp)
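This confusion-matrix computation can also be wrapped into a reusable function and scorer, tying it back to the original question (the function name is my own, not part of sklearn):

```python
from sklearn.metrics import confusion_matrix, make_scorer

def specificity_score(y_true, y_pred):
    # labels=[0, 1] fixes the matrix layout even if one class is absent from y_pred
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return tn / (tn + fp)

specificity_scorer = make_scorer(specificity_score)

y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]
print(specificity_score(y_true, y_pred))  # 2 of 3 actual negatives correct -> 2/3
```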

I appreciate this is an old question, but thought I'd mention that sklearn pretty much does this already (at least in scikit-learn v0.21.2, but I'm sure it has for a long time).

As I understand it, "specificity" is just a special case of "recall". Recall is calculated for the actual positive class (TP / [TP + FN]), while "specificity" is the same type of calculation but for the actual negative class (TN / [TN + FP]).

It only really makes sense to have such specific terminology for binary classification problems. For a multiclass classification problem, it is more convenient to talk about recall with respect to each class. There is no reason why you cannot talk about recall in this way even when dealing with a binary classification problem (e.g. recall for class 0, recall for class 1).

For example, recall tells us the proportion of patients who actually have cancer that were successfully diagnosed as having cancer. More generally, the recall of class X tells us the proportion of samples actually belonging to class X that were successfully predicted as belonging to class X.
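Given that, specificity can be computed directly as the recall of the negative class. A minimal sketch using recall_score with pos_label (the toy data here is my own):

```python
from sklearn.metrics import recall_score

y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]

# recall of class 1 is the usual sensitivity; recall of class 0 is the specificity
sensitivity = recall_score(y_true, y_pred, pos_label=1)  # TP / (TP + FN) = 3/5
specificity = recall_score(y_true, y_pred, pos_label=0)  # TN / (TN + FP) = 2/3
```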

Given this, you can use from sklearn.metrics import classification_report to produce a report (or, with output_dict=True, a dictionary) of the precision, recall, f1-score and support for each label/class. You can also rely on from sklearn.metrics import precision_recall_fscore_support, depending on your preference. See the sklearn.metrics documentation for details.
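A quick sketch of both APIs on the same toy data as above; the recall entry for class "0" is exactly the specificity:

```python
from sklearn.metrics import classification_report, precision_recall_fscore_support

y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]

# output_dict=True returns the report as a nested dictionary keyed by class label
report = classification_report(y_true, y_pred, output_dict=True)
specificity = report['0']['recall']  # recall of class 0

# precision_recall_fscore_support returns per-class arrays instead
precision, recall, fscore, support = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1])
# recall[0] is again the specificity, recall[1] the sensitivity
```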


Source: https://habr.com/ru/post/1234247/

