I am using the SVM from sklearn v0.13.1 to solve a binary classification problem. I use k-fold cross-validation and compute the area under the ROC curve (roc_auc) to evaluate the quality of my model. However, for some folds the roc_auc is less than 0.5, even on the training data. Isn't that impossible? Shouldn't an algorithm always be able to reach at least 0.5 on the data it was trained on?
Here is my code:
from sklearn import svm, cross_validation
from sklearn.metrics import roc_curve, auc

# myData and classVector are NumPy arrays
classifier = svm.SVC(kernel='poly', degree=3, probability=True, max_iter=100000)
kf = cross_validation.KFold(len(myData), n_folds=3, indices=False)
for train, test in kf:
    Fit = classifier.fit(myData[train], classVector[train])
    # AUC on the held-out fold
    probas_ = Fit.predict_proba(myData[test])
    fpr, tpr, thresholds = roc_curve(classVector[test], probas_[:, 1])
    roc_auc = auc(fpr, tpr)
    # AUC on the training fold
    probas_ = Fit.predict_proba(myData[train])
    fpr2, tpr2, thresholds2 = roc_curve(classVector[train], probas_[:, 1])
    roc_auc2 = auc(fpr2, tpr2)
    print "Training auc: ", roc_auc2, " Testing auc: ", roc_auc
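One thing I checked while debugging: the columns of predict_proba are ordered according to clf.classes_, so probas_[:, 1] is the probability of classes_[1], which is not necessarily the label roc_curve treats as positive. A minimal sketch with hypothetical toy data (written against a current sklearn and Python 3, not the 0.13.1 code above):

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical toy data: one feature, binary labels 0/1
X = np.linspace(0.0, 1.0, 20).reshape(-1, 1)
y = (X.ravel() > 0.5).astype(int)

clf = SVC(kernel='linear', probability=True, random_state=0).fit(X, y)

# predict_proba columns follow clf.classes_, so column 1 is the
# probability of clf.classes_[1] -- verify this really is the label
# you treat as positive when you call roc_curve
print(clf.classes_)        # [0 1]
proba = clf.predict_proba(X)
print(proba.shape)         # (20, 2)
```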
The result is as follows:
Training auc: 0.423920939062 Testing auc: 0.388436883629
Training auc: 0.525472613736 Testing auc: 0.565581854043
Training auc: 0.470917930528 Testing auc: 0.259344660194
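For context on why values below 0.5 can appear at all: if the scores handed to roc_curve are effectively inverted (for example, the wrong predict_proba column, or Platt-scaled probabilities that disagree with the decision function), the ROC curve is mirrored and the AUC becomes 1 minus the true AUC. A small self-contained demonstration (current sklearn and Python 3, made-up scores):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

# Toy labels and reasonable scores for the positive class
y = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

fpr, tpr, _ = roc_curve(y, scores)
auc_correct = auc(fpr, tpr)          # 0.75

# Inverted scores mirror the curve: AUC drops to 1 - 0.75 = 0.25,
# i.e. below 0.5, without any change to the underlying ranking quality
fpr_i, tpr_i, _ = roc_curve(y, 1 - scores)
auc_inverted = auc(fpr_i, tpr_i)     # 0.25

print(auc_correct, auc_inverted)
```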