Sklearn SVM: area under ROC less than 0.5 for training data

I am using the SVM in sklearn v0.13.1 to solve a binary classification problem. I use k-fold cross-validation and compute the area under the ROC curve (roc_auc) to check the quality of my model. However, for some folds roc_auc is less than 0.5, even on the training data. Shouldn't that be impossible? Shouldn't an algorithm always be able to reach at least 0.5 on the data it was trained on?

Here is my code:

import numpy as np
from sklearn import svm, cross_validation
from sklearn.metrics import roc_curve, auc

# Polynomial-kernel SVM; probability=True enables predict_proba (Platt scaling)
classifier = svm.SVC(kernel='poly', degree=3, probability=True, max_iter=100000)
kf = cross_validation.KFold(len(myData), n_folds=3, indices=False)
for train, test in kf:
    Fit = classifier.fit(myData[train], classVector[train])

    # AUC on the held-out fold
    probas_ = Fit.predict_proba(myData[test])
    fpr, tpr, thresholds = roc_curve(classVector[test], probas_[:,1])
    roc_auc = auc(fpr, tpr)

    # AUC on the training fold
    probas_ = Fit.predict_proba(myData[train])
    fpr2, tpr2, thresholds2 = roc_curve(classVector[train], probas_[:,1])
    roc_auc2 = auc(fpr2, tpr2)

    print "Training auc: ", roc_auc2, " Testing auc: ", roc_auc

The result is as follows:

    Training auc: 0.423920939062  Testing auc: 0.388436883629
    Training auc: 0.525472613736  Testing auc: 0.565581854043
    Training auc: 0.470917930528  Testing auc: 0.259344660194



It is surprising to get AUROCs < 0.5, especially on training data. First double-check that the class vector you pass to classifier.fit is exactly the same one, in the same order, that you pass to roc_curve. A mismatch there, or picking the wrong probability column, will invert the ranking. Also check the pos_label argument of roc_curve and make sure it matches the labels actually present in your y_true, and that the column probas_[:,1] really corresponds to that positive class.

Note that you can always obtain AUROCs > 0.5 simply by inverting the predictions, so consistently sub-0.5 scores usually indicate swapped labels or probability columns, not a model that is genuinely worse than chance.
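As a sanity check on the inversion argument above, here is a minimal, library-free sketch (the `roc_auc` helper is my own, not sklearn's) showing that negating the scores, which is what selecting the wrong predict_proba column amounts to, turns an AUC of a into exactly 1 - a:

```python
def roc_auc(y_true, scores):
    """Rank-based AUC: the probability that a random positive example
    is scored higher than a random negative one (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2]   # a sensible ranking: AUC > 0.5

a = roc_auc(y, scores)
flipped = roc_auc(y, [-s for s in scores])  # equivalent to swapping columns
print(a, flipped)                           # flipped == 1 - a
```

So if your training AUC sits symmetrically below 0.5 (e.g. 0.42 instead of 0.58), the model has likely learned something; the score orientation is just reversed.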


Source: https://habr.com/ru/post/1525494/
