Using f-score in xgb

I am trying to use the f-score from scikit-learn as an evaluation metric in an xgb classifier. Here is my code:

import xgboost as xgb
from sklearn.metrics import f1_score

def xg_f1(y, t):
    t = t.get_label()
    return 'f1', f1_score(t, y)

clf = xgb.XGBClassifier(max_depth=8, learning_rate=0.004, n_estimators=100,
                        silent=False, objective='binary:logistic', nthread=-1,
                        gamma=0, min_child_weight=1, max_delta_step=0,
                        subsample=0.8, colsample_bytree=0.6, base_score=0.5,
                        seed=0, missing=None)
scores = []
predictions = []
for train, test, ans_train, y_test in zip(trains, tests, ans_trains, ans_tests):
    clf.fit(train, ans_train, eval_metric=xg_f1,
            eval_set=[(train, ans_train), (test, y_test)],
            early_stopping_rounds=900)
    y_pred = clf.predict(test)
    predictions.append(y_pred)
    scores.append(f1_score(y_test, y_pred))

But there is an error:

Unable to handle a combination of binary and continuous

1 answer

The problem is that f1_score is trying to compare binary targets with continuous predictions, and by default it performs binary averaging. From the documentation: average : string, [None, 'binary' (default), 'micro', 'macro', 'samples', 'weighted'].

In other words, your predictions are continuous, like [0.001, 0.7889, 0.33, ...], while your targets are binary, like [0, 1, 0, ...]. So if you know your threshold, I recommend binarizing your predictions before passing them to f1_score. The usual threshold is 0.5.
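As a quick standalone illustration (the values below are made up), thresholding the continuous scores at 0.5 before calling f1_score avoids the error:

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 0, 1]                 # binary targets
y_prob = [0.001, 0.7889, 0.33, 0.91]  # continuous classifier output

# Calling f1_score(y_true, y_prob) directly raises the
# "mix of binary and continuous" error.
y_bin = [1 if p > 0.5 else 0 for p in y_prob]  # apply the 0.5 threshold
print(f1_score(y_true, y_bin))
```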

A tested version of your evaluation function that no longer raises the error:

 def xg_f1(y, t):
     t = t.get_label()
     y_bin = [1. if y_cont > 0.5 else 0. for y_cont in y]  # binarizing your output
     return 'f1', f1_score(t, y_bin)
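You can check the fixed metric without training a model by calling it directly with a minimal stand-in for xgboost's DMatrix (the FakeDMatrix class below is hypothetical, just enough to supply the get_label() method the metric uses):

```python
from sklearn.metrics import f1_score

def xg_f1(y, t):
    t = t.get_label()
    y_bin = [1. if y_cont > 0.5 else 0. for y_cont in y]  # binarize predictions
    return 'f1', f1_score(t, y_bin)

# Hypothetical stand-in exposing the one DMatrix method the metric calls
class FakeDMatrix:
    def __init__(self, labels):
        self._labels = labels
    def get_label(self):
        return self._labels

name, score = xg_f1([0.2, 0.9, 0.4, 0.8], FakeDMatrix([0, 1, 0, 1]))
print(name, score)
```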

Source: https://habr.com/ru/post/1243015/
