Does sklearn support a cost matrix?

Is it possible to train classifiers in sklearn with a cost matrix that assigns different costs to different kinds of errors? For example, in a two-class problem, the cost matrix would be a 2 by 2 square matrix, where A_ij = the cost of classifying i as j.

The main classifier that I use is a random forest.

Thanks.

+6
4 answers

The cost-matrix scenario that you are describing is not supported in scikit-learn by any of the classifiers that we have.

+3

One way around this limitation is to use under- or oversampling. For example, if you are doing binary classification with an imbalanced dataset and want to make errors on the minority class more costly, you can oversample it. You may want to look at imbalanced-learn, which is a package from scikit-learn-contrib.
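As a rough sketch of the oversampling idea, the snippet below duplicates minority-class rows with `sklearn.utils.resample` until both classes are the same size. The toy data and the 90/10 split are assumptions made up for illustration; imbalanced-learn offers dedicated samplers for the same purpose.

```python
import numpy as np
from sklearn.utils import resample

# Toy imbalanced data (hypothetical): 90 majority (0) and 10 minority (1) samples.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))
y = np.array([0] * 90 + [1] * 10)

X_min, y_min = X[y == 1], y[y == 1]
X_maj, y_maj = X[y == 0], y[y == 0]

# Oversample the minority class with replacement until it matches the majority.
X_min_up, y_min_up = resample(
    X_min, y_min, replace=True, n_samples=len(y_maj), random_state=0
)

X_balanced = np.vstack([X_maj, X_min_up])
y_balanced = np.concatenate([y_maj, y_min_up])

print(np.bincount(y_balanced))  # class counts after oversampling
```

Only the training split should be resampled; evaluating on resampled data would hide the real class balance.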

+3

You can always just look at your ROC curve. Each point on the ROC curve corresponds to a separate confusion matrix. So by choosing a threshold for the classifier, you pick a confusion matrix, and that choice implies some sort of cost-weighting scheme. You then just have to choose the confusion matrix that corresponds to the cost matrix you want.
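One possible way to make that concrete is to sweep the threshold over the predicted scores and keep the one with the lowest total cost. This is only a sketch: the helper names `total_cost` and `best_threshold`, the cost values, and the scores are all made up for illustration, and the scores stand in for something like `predict_proba(X)[:, 1]`.

```python
import numpy as np

def total_cost(y_true, y_pred, cost):
    """Sum cost[i, j] over all samples with true class i predicted as j."""
    return float(sum(cost[t, p] for t, p in zip(y_true, y_pred)))

def best_threshold(y_true, scores, cost):
    """Return the (threshold, cost) pair with the lowest total cost."""
    candidates = np.unique(scores)
    costs = [total_cost(y_true, (scores >= t).astype(int), cost) for t in candidates]
    best = int(np.argmin(costs))
    return float(candidates[best]), costs[best]

# Hypothetical cost matrix: a missed positive (row 1, column 0) costs 5,
# a false alarm (row 0, column 1) costs 1, correct predictions cost 0.
cost = np.array([[0, 1],
                 [5, 0]])
y_true = np.array([0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.9])

threshold, cost_at_threshold = best_threshold(y_true, scores, cost)
print(threshold, cost_at_threshold)
```

The threshold should be picked on held-out data rather than on the training set, otherwise the estimated cost will be optimistic.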

On the other hand, if you really have your heart set on it and really do want to "train" an algorithm using a cost matrix, you can "sort of" do it in sklearn.

Although it is not possible to directly train an algorithm to be cost-sensitive in sklearn, you can use a cost-matrix style setup to tune your hyperparameters. I did something similar to this using a genetic algorithm. It does not really do a great job, but it should give a modest boost to performance.
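A minimal sketch of cost-based hyperparameter tuning, using a plain grid search in place of the genetic algorithm mentioned above. The cost values, the parameter grid, and the `misclassification_cost` helper are assumptions for illustration; `make_scorer` with `greater_is_better=False` turns the cost into something `GridSearchCV` will minimize.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import GridSearchCV

# Hypothetical cost matrix: COST[i, j] is the cost of classifying i as j.
COST = np.array([[0, 1],
                 [5, 0]])

def misclassification_cost(y_true, y_pred):
    """Total cost: confusion-matrix counts weighted elementwise by COST."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
    return float((cm * COST).sum())

# greater_is_better=False makes the search minimize the cost.
cost_scorer = make_scorer(misclassification_cost, greater_is_better=False)

X, y = make_classification(n_samples=300, weights=[0.85, 0.15], random_state=0)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, None], "class_weight": [None, "balanced"]},
    scoring=cost_scorer,
    cv=3,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)  # best settings, mean cost per fold
```

The forest itself is still trained with its usual criterion; the cost matrix only decides which hyperparameter setting wins.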

+2

This may not directly answer your question (since you are asking about a random forest), but for SVM (in sklearn) you can use the class_weight parameter to specify the weights of the different classes. Essentially, you pass in a dictionary.

You can refer to this page to see an example using class_weight.
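For illustration, here is a short sketch of passing such a dictionary. The weights and the toy data are made up; note that `RandomForestClassifier` accepts a `class_weight` parameter as well, so the same dictionary can be used with a random forest.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, weights=[0.85, 0.15], random_state=0)

# Hypothetical weights: errors on class 1 count five times as much as on class 0.
weights = {0: 1, 1: 5}

svm = SVC(class_weight=weights).fit(X, y)
forest = RandomForestClassifier(
    n_estimators=50, class_weight=weights, random_state=0
).fit(X, y)

print(svm.predict(X[:5]), forest.predict(X[:5]))
```

This gives one weight per class rather than a full cost matrix, so it only approximates the A_ij setup from the question; for two classes with zero cost on correct predictions the two are roughly equivalent.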

+1

Source: https://habr.com/ru/post/973201/

