XGBoost: using the AUC metric correctly

I have a slightly imbalanced dataset for a binary classification problem, with a positive-to-negative ratio of about 0.6. I recently learned about the AUC metric from this answer: https://stats.stackexchange.com/a/132832/128229 and decided to use it.
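For context, a minimal sketch of the setup I have in mind (synthetic data and hypothetical hyperparameters, just to make the question concrete):

```python
import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))
y = (rng.random(10_000) < 0.375).astype(int)   # pos/neg ratio of roughly 0.6

X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=0)
dtrain = xgb.DMatrix(X_tr, label=y_tr)
dvalid = xgb.DMatrix(X_va, label=y_va)

params = {
    "objective": "binary:logistic",
    "eval_metric": "auc",       # AUC is reported on the watchlist every round
    "eta": 0.1,
    "max_depth": 4,
}
bst = xgb.train(params, dtrain, num_boost_round=200,
                evals=[(dvalid, "valid")], early_stopping_rounds=20)
```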

But then I came across another link, http://fastml.com/what-you-wanted-to-know-about-auc/ , which claims that ROC AUC is insensitive to class imbalance and that the area under the precision-recall curve should be used instead.
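To make the distinction concrete, both quantities can be computed on the same scores, for example with scikit-learn (the labels and scores below are toy placeholders):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

y_valid = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 0])                 # imbalanced toy labels
scores  = np.array([.1, .2, .15, .3, .4, .35, .8, .6, .45, .55])   # model scores

print("ROC AUC:", roc_auc_score(y_valid, scores))
print("PR AUC (average precision):", average_precision_score(y_valid, scores))
```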

The XGBoost docs are unclear about which AUC they use: is it ROC AUC? The same link also says that AUC should only be used if you do not care about the probabilities and only care about the ranking.
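To illustrate that ranking point, here is a small sketch showing that a strictly monotone transformation of the scores (which destroys any probability interpretation) leaves ROC AUC unchanged:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([0, 1, 0, 1, 1, 0])
p = np.array([0.2, 0.7, 0.4, 0.9, 0.35, 0.3])

print(roc_auc_score(y, p))           # about 0.889
print(roc_auc_score(y, p ** 3))      # same value: the ranking is preserved
print(roc_auc_score(y, np.log(p)))   # still the same
```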

However, since I am using the binary:logistic objective, I think I do care about the probabilities, because I have to set a threshold on the predictions.
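A minimal sketch of what I mean by setting a threshold (the probabilities below are placeholders for what predict() would return from the training sketch above):

```python
import numpy as np

# binary:logistic gives probabilities in [0, 1], not hard class labels
proba = np.array([0.05, 0.8, 0.4, 0.65, 0.3])
labels = (proba > 0.5).astype(int)   # 0.5 is only the obvious default cut-off
print(labels)
```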

The XGBoost parameter tuning guide https://github.com/dmlc/xgboost/blob/master/doc/how_to/param_tuning.md also describes an alternative way of handling class imbalance: instead of re-balancing the positive and negative samples, it suggests setting max_delta_step = 1.
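A sketch of the two alternatives as I understand them from that guide (the class counts are placeholders, and the logloss metric in the second variant is my own choice, not something the guide prescribes):

```python
n_pos, n_neg = 3750, 6250        # class counts on the training set (placeholder)

# (a) re-balance the classes and evaluate with AUC, if only the ranking matters
params_ranking = {
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "scale_pos_weight": n_neg / n_pos,
}

# (b) keep the original distribution (to preserve the probabilities) and
#     set max_delta_step to a small finite value to help convergence
params_probability = {
    "objective": "binary:logistic",
    "eval_metric": "logloss",    # my choice; the guide only mentions max_delta_step
    "max_delta_step": 1,
}
```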

So can someone explain when AUC should be preferred over the other methods XGBoost offers for handling class imbalance? And if I do use AUC, what threshold should I set for prediction, or, more generally, how exactly should I use AUC to handle class imbalance in binary classification with XGBoost?
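To make the threshold part of the question concrete, this is the kind of procedure I am wondering about: choosing the cut-off on a validation set, for example by maximizing F1 along the precision-recall curve (toy numbers below, standing in for validation labels and predicted probabilities):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_valid = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
proba   = np.array([0.1, 0.4, 0.35, 0.8, 0.65, 0.7, 0.2, 0.55, 0.9, 0.3])

precision, recall, thresholds = precision_recall_curve(y_valid, proba)
f1 = 2 * precision * recall / (precision + recall + 1e-12)
best = np.argmax(f1[:-1])            # the last P/R point has no threshold
print("best threshold:", thresholds[best], "F1 there:", f1[best])
```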


Source: https://habr.com/ru/post/1666340/

