I trained several models in scikit-learn. Here is the code:
import numpy as np
import matplotlib.pyplot as plt

def SGDlogistic(k_fold, train_X, train_Y):
    """Logistic regression (log loss, elastic net penalty) trained with
    Stochastic Gradient Descent, evaluated by cross-validation.
    """
    from sklearn.linear_model import SGDClassifier
    scores_sgd_lr = []
    for train_indices, test_indices in k_fold:
        train_X_cv = train_X[train_indices]
        train_Y_cv = train_Y[train_indices]
        test_X_cv = train_X[test_indices]
        test_Y_cv = train_Y[test_indices]
        # note: recent scikit-learn versions spell this loss 'log_loss'
        sgd_lr = SGDClassifier(loss='log', penalty='elasticnet')
        scores_sgd_lr.append(sgd_lr.fit(train_X_cv, train_Y_cv).score(test_X_cv, test_Y_cv))
    print("The mean accuracy of Stochastic Gradient Descent Logistic on CV data is:", np.mean(scores_sgd_lr))
    return sgd_lr
def test_performance(test_X, test_Y, classifier, name):
    """Check the performance of a fitted classifier on the test data."""
    from sklearn import metrics
    print("The accuracy of " + name + " on test data is:", classifier.score(test_X, test_Y))
    print("Classification Metrics for " + name)
    print(metrics.classification_report(test_Y, classifier.predict(test_X)))
    print("Confusion matrix")
    print(metrics.confusion_matrix(test_Y, classifier.predict(test_X)))
def plot_ROC(test_X, test_Y, classifier):
    """Plot the ROC curve of the classifier on the test data."""
    from sklearn.metrics import roc_curve, auc
    false_positive_rate, true_positive_rate, thresholds = roc_curve(test_Y, classifier.predict(test_X))
    roc_auc = auc(false_positive_rate, true_positive_rate)
    plt.title('Receiver Operating Characteristic')
    plt.plot(false_positive_rate, true_positive_rate, 'b', label='AUC = %0.2f' % roc_auc)
    plt.legend(loc='lower right')
    plt.ylabel('True Positive Rate')
    plt.xlabel('False Positive Rate')
    plt.show()
The first function fits logistic regression with an elastic net penalty using SGD. The second function checks the performance of a trained algorithm on the test data: it prints the accuracy, a classification report, and the confusion matrix. plot_ROC plots the ROC curve on the test data.
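For context, this is roughly how I call these functions. The data below is a made-up stand-in (my real dataset and loading code are omitted); the only point is the call pattern, with KFold.split supplying the (train_indices, test_indices) pairs that SGDlogistic iterates over.

import numpy as np
from sklearn.model_selection import KFold

# Hypothetical stand-in data, just to illustrate the call pattern
# (not my real dataset): an imbalanced binary target like mine.
rng = np.random.RandomState(0)
train_X = rng.randn(1000, 10)
train_Y = (rng.rand(1000) < 0.1).astype(int)
test_X = rng.randn(200, 10)
test_Y = (rng.rand(200) < 0.1).astype(int)

# KFold.split yields (train_indices, test_indices) pairs.
k_fold = KFold(n_splits=5).split(train_X)

clf = SGDlogistic(k_fold, train_X, train_Y)
test_performance(test_X, test_Y, clf, 'Logistic with Elastic Net')
plot_ROC(test_X, test_Y, clf)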
Here is what I see.
The accuracy of Logistic with Elastic Net on test data is: 0.90566607467092586
Classification Metrics for Logistic with Elastic Net
             precision    recall  f1-score   support

          0       0.91      1.00      0.95    227948
          1       0.50      0.00      0.00     23743

avg / total       0.87      0.91      0.86    251691
Confusion matrix
[[227944      4]
 [ 23739      4]]

(array([ 0.        ,  0.00001755,  1.        ]),
 array([ 0.        ,  0.00016847,  1.        ]),
 array([2, 1, 0]))
As you can see, the accuracy on the test data is 90%, and even the confusion matrix looks reasonable at first glance, so this does not seem to be a simple case of misleading accuracy. But the AUC from the ROC is 0.50? That is strange: according to the ROC the model behaves like a random guess, while the accuracy and the confusion matrix paint a different picture.
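For reference, the three arrays above are the fpr, tpr, and thresholds returned by roc_curve. A tiny self-contained repro with made-up labels (not my data) reproduces that shape: when roc_curve is given the hard 0/1 output of predict, it can only produce three points.

import numpy as np
from sklearn.metrics import roc_curve, auc

# Made-up toy labels, mimicking my setup: a heavily imbalanced target
# and a classifier that almost always predicts 0.
y_true = np.array([0] * 95 + [1] * 5)
y_pred = np.array([0] * 94 + [1] + [0] * 4 + [1])  # one FP, one TP

fpr, tpr, thresholds = roc_curve(y_true, y_pred)
print(fpr, tpr, thresholds)  # only three points, since y_pred takes two values
print(auc(fpr, tpr))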
Help pls
Edit 2:

OK, I understand the AUC issue now. Using probability scores instead of the predicted labels, the AUC is 0.71. But for the SVM there is no predict_proba; in SGDClassifier only the modified Huber loss supports it. How do I get the AUC in that case?
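A minimal sketch of what I mean, reusing the clf and test_X/test_Y names from above; it assumes the classifier was fitted with loss='log' (or loss='modified_huber'), since those are the SGDClassifier losses that expose predict_proba:

from sklearn.metrics import roc_curve, auc

# Score with the probability of the positive class, not with predict().
probas = clf.predict_proba(test_X)[:, 1]
fpr, tpr, thresholds = roc_curve(test_Y, probas)
print(auc(fpr, tpr))

# For the hinge-loss (SVM) variant there is no predict_proba; I am
# wondering whether the continuous decision_function score is an
# acceptable input to roc_curve instead:
# scores = clf.decision_function(test_X)
# fpr, tpr, _ = roc_curve(test_Y, scores)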