I trained several models in scikit-learn. Here is the code:
import numpy as np
import matplotlib.pyplot as plt

def SGDlogistic(k_fold, train_X, train_Y):
    """Logistic regression (log loss, elastic net penalty) trained with
    Stochastic Gradient Descent, evaluated by cross-validation.
    """
    from sklearn.linear_model import SGDClassifier
    scores_sgd_lr = []
    for train_indices, test_indices in k_fold:
        train_X_cv = train_X[train_indices]
        train_Y_cv = train_Y[train_indices]
        test_X_cv = train_X[test_indices]
        test_Y_cv = train_Y[test_indices]
        # note: recent scikit-learn versions spell this loss 'log_loss'
        sgd_lr = SGDClassifier(loss='log', penalty='elasticnet')
        scores_sgd_lr.append(sgd_lr.fit(train_X_cv, train_Y_cv).score(test_X_cv, test_Y_cv))
    print("The mean accuracy of Stochastic Gradient Descent Logistic on CV data is:", np.mean(scores_sgd_lr))
    return sgd_lr
def test_performance(test_X, test_Y, classifier, name):
    """Check the performance of a fitted classifier on the test data."""
    from sklearn import metrics
    print("The accuracy of " + name + " on test data is:", classifier.score(test_X, test_Y))
    print("Classification Metrics for " + name)
    print(metrics.classification_report(test_Y, classifier.predict(test_X)))
    print("Confusion matrix")
    print(metrics.confusion_matrix(test_Y, classifier.predict(test_X)))
def plot_ROC(test_X, test_Y, classifier):
    """Plot the ROC curve of the classifier on the test data."""
    from sklearn.metrics import roc_curve, auc
    false_positive_rate, true_positive_rate, thresholds = roc_curve(test_Y, classifier.predict(test_X))
    roc_auc = auc(false_positive_rate, true_positive_rate)
    plt.title('Receiver Operating Characteristic')
    plt.plot(false_positive_rate, true_positive_rate, 'b', label='AUC = %0.2f' % roc_auc)
    plt.legend(loc='lower right')
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.show()
The first function fits logistic regression with an elastic net penalty using SGD. The second function checks the performance of a trained algorithm on the test data: it prints the accuracy, a classification report, and the confusion matrix. plot_ROC plots the ROC curve on the test data.
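For context, this is roughly how I call these functions. The data below is a made-up stand-in (my real dataset and loading code are omitted); the only point is the call pattern, with KFold.split supplying the (train_indices, test_indices) pairs that SGDlogistic iterates over.

import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-in data, just to illustrate the call pattern
# (not my real dataset): an imbalanced binary target like mine.
rng = np.random.RandomState(0)
train_X = rng.randn(1000, 10)
train_Y = (rng.rand(1000) < 0.1).astype(int)
test_X = rng.randn(200, 10)
test_Y = (rng.rand(200) < 0.1).astype(int)

# KFold.split yields (train_indices, test_indices) pairs.
k_fold = KFold(n_splits=5).split(train_X)

clf = SGDlogistic(k_fold, train_X, train_Y)
test_performance(test_X, test_Y, clf, 'Logistic with Elastic Net')
plot_ROC(test_X, test_Y, clf)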
Here is what I see.
The accuracy of Logistic with Elastic Net on test data is: 0.90566607467092586
Classification Metrics for Logistic with Elastic Net
             precision    recall  f1-score   support

          0       0.91      1.00      0.95    227948
          1       0.50      0.00      0.00     23743

avg / total       0.87      0.91      0.86    251691
Confusion matrix
[[227944      4]
 [ 23739      4]]

(array([ 0.        ,  0.00001755,  1.        ]),
 array([ 0.        ,  0.00016847,  1.        ]),
 array([2, 1, 0]))
As you can see, the accuracy on the test data is 90%, and even the confusion matrix looks reasonable at first glance, so this does not seem to be a simple case of misleading accuracy. But the AUC from the ROC is 0.50? That is strange: according to the ROC the model behaves like a random guess, while the accuracy and the confusion matrix paint a different picture.
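For reference, the three arrays above are the fpr, tpr, and thresholds returned by roc_curve. A tiny self-contained repro with made-up labels (not my data) reproduces that shape: when roc_curve is given the hard 0/1 output of predict, it can only produce three points.

import numpy as np
from sklearn.metrics import roc_curve, auc

# Made-up toy labels, mimicking my setup: a heavily imbalanced target
# and a classifier that almost always predicts 0.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.array([0] * 94 + [1] + [0] * 4 + [1])  # one FP, one TP

fpr, tpr, thresholds = roc_curve(y_true, y_pred)
print(fpr, tpr, thresholds)  # only three points, since y_pred takes two values
print(auc(fpr, tpr))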
Help pls
Edit 2:

OK, I understand the AUC issue now. Using probability scores instead of the predicted labels, the AUC is 0.71. But for the SVM there is no predict_proba; in SGDClassifier only the modified Huber loss supports it. How do I get the AUC in that case?
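A minimal sketch of what I mean, reusing the clf and test_X/test_Y names from above; it assumes the classifier was fitted with loss='log' (or loss='modified_huber'), since those are the SGDClassifier losses that expose predict_proba:

from sklearn.metrics import roc_curve, auc

# Score with the probability of the positive class, not with predict().
probas = clf.predict_proba(test_X)[:, 1]
fpr, tpr, thresholds = roc_curve(test_Y, probas)
print(auc(fpr, tpr))

# For the hinge-loss (SVM) variant there is no predict_proba; I am
# wondering whether the continuous decision_function score is an
# acceptable input to roc_curve instead:
# scores = clf.decision_function(test_X)
# fpr, tpr, _ = roc_curve(test_Y, scores)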