How to deal with class imbalance in sklearn random forests: should I use sample_weight or class_weight?

I am trying to solve a binary classification problem with class imbalance. I have a data set of 210,000 records, of which 92% are 0s and 8% are 1s. I use sklearn (v 0.16) in Python for random forests.

I see that when building the classifier there are two weighting parameters, sample_weight and class_weight. I am currently using the option class_weight="auto".
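Roughly, my current setup looks like the sketch below (the data here is synthetic, standing in for my real records; note that "auto" was renamed to "balanced" in sklearn 0.17, so newer versions need the new spelling):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic imbalanced data: roughly 92% zeros, 8% ones,
# mirroring the ratio in my real data set.
rng = np.random.RandomState(0)
X = rng.randn(1000, 4)
y = (rng.rand(1000) < 0.08).astype(int)

# class_weight="balanced" (formerly "auto") reweights each class by
# n_samples / (n_classes * n_samples_in_class), so the minority class
# counts proportionally more during training.
clf = RandomForestClassifier(
    n_estimators=100,
    class_weight="balanced",
    random_state=0,
)
clf.fit(X, y)
```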

Am I using this correctly? What do class_weight and sample_weight actually do, and which should I use?

1 answer

Class weights are what you should use.

Sample weights let you specify a multiplier for the influence a particular sample has. Weighting a sample with a weight of 2.0 is roughly the same as if that point were present twice in the data (although the exact effect depends on the estimator).
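The "weight 2.0 ≈ duplicate the point" intuition can be checked directly. A sketch using a single decision tree (where, unlike a bootstrapped forest, the equivalence holds exactly for impurity computations):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Four trivially separable 1-D points.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])

# Weight the last sample 2.0 via sample_weight in fit()...
w = np.array([1.0, 1.0, 1.0, 2.0])
clf_weighted = DecisionTreeClassifier(random_state=0).fit(X, y, sample_weight=w)

# ...which behaves like duplicating that sample in the training set.
X_dup = np.vstack([X, X[-1:]])
y_dup = np.append(y, y[-1])
clf_duplicated = DecisionTreeClassifier(random_state=0).fit(X_dup, y_dup)

# Both trees learn the same decision boundary on this data.
same = np.array_equal(clf_weighted.predict(X), clf_duplicated.predict(X))
```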

Class weights have the same effect, but they apply a fixed factor to every sample that falls into the specified class. Functionally you can use either, but class_weight is provided for convenience so that you do not have to weight each sample manually. You can also combine the two, in which case the class weights are multiplied by the sample weights.
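A sketch of both mechanisms together (the dict values and the "recency" weights here are illustrative assumptions, not values from the question): class_weight goes to the constructor, sample_weight goes to fit(), and the effective weight of each sample is their product.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.randn(500, 3)
y = (rng.rand(500) < 0.08).astype(int)

# Explicit class-weight dict: upweight the minority class by ~92/8 = 11.5,
# the same factor class_weight="balanced" would derive automatically.
clf = RandomForestClassifier(
    n_estimators=50,
    class_weight={0: 1.0, 1: 11.5},
    random_state=0,
)

# Hypothetical per-sample weights (e.g. favoring more recent records);
# these are multiplied with the class weights during fitting.
recency = np.linspace(0.5, 1.0, len(y))
clf.fit(X, y, sample_weight=recency)
```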

Note that sample_weight is passed to the fit() method rather than to the constructor; some meta-estimators, such as AdaBoostClassifier, manipulate sample weights internally.


Source: https://habr.com/ru/post/1623264/
