Balanced random forest in scikit-learn (python)

I am wondering whether there is an implementation of Balanced Random Forest (BRF) in the latest versions of scikit-learn. BRF is used for imbalanced data. It works like a regular RF, but for every bootstrap iteration it balances the class distribution by down-sampling. For example, given two classes with N0 = 100 and N1 = 30 samples, at each bootstrap draw it samples (with replacement) 30 instances from the minority class and the same number from the majority class, i.e. it trains each tree on a balanced data set. See this article for more information.
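The balanced bootstrap described above can be sketched in plain Python (the function name `balanced_bootstrap` is my own, for illustration): draw the minority-class count with replacement from each class, so every tree trains on a 50/50 sample.

```python
import random

def balanced_bootstrap(majority, minority, seed=0):
    """One balanced bootstrap draw: sample len(minority) items
    with replacement from EACH class, so a tree fit on the
    result sees a 50/50 class distribution."""
    rng = random.Random(seed)
    n = len(minority)
    sample = ([rng.choice(majority) for _ in range(n)]
              + [rng.choice(minority) for _ in range(n)])
    rng.shuffle(sample)
    return sample

# 100 vs 30 samples, as in the question's example
majority = [("maj", i) for i in range(100)]
minority = [("min", i) for i in range(30)]
boot = balanced_bootstrap(majority, minority)
print(len(boot))  # 60 = 30 from each class
```

A full BRF would repeat this draw once per tree and fit each tree on its own balanced sample.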

RandomForestClassifier() has a class_weight parameter that can be set to "balanced", but I'm not sure whether that corresponds to down-sampling of the bootstrap training samples.

1 answer

I know this is 10 months late, but I think you are looking for BalancedBaggingClassifier from imblearn.

imblearn.ensemble.BalancedBaggingClassifier(base_estimator=None, n_estimators=10, max_samples=1.0, max_features=1.0, bootstrap=True, bootstrap_features=False, oob_score=False, warm_start=False, ratio='auto', replacement=False, n_jobs=1, random_state=None, verbose=0) 

Effectively, what it allows you to do is successively undersample your majority class while fitting an estimator on top. You can use a random forest or any base scikit-learn estimator. Here is an example.


Source: https://habr.com/ru/post/1259641/
