How can I build an ensemble (multi-classifier) using scikit-learn?

I have a fairly limited dataset on which I do supervised, multi-class text classification using scikit-learn. To compensate somewhat for the lack of data, I would like to do the following:

  • Extract n-grams from the content that I want to classify, combine them with the content's unigrams, and run the classification on the combined features

  • Implement (or use an existing implementation of) a voting-based ensemble classifier to improve classification accuracy. For example, Multinomial Naive Bayes and KNN each seem to give good results for different classes: ideally, I would combine them so that I get somewhat better (and, hopefully, no worse) performance than the poor ~50% I currently get with my limited dataset.
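The first step can be sketched with `CountVectorizer` and its `ngram_range` parameter, which emits unigrams and higher-order n-grams into one feature matrix. This is only a minimal illustration: the two toy documents and the choice of bigrams are assumptions, not part of the original question.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents, made up purely for illustration.
docs = ["the cat sat", "the dog barked"]

# ngram_range=(1, 2) keeps unigrams AND bigrams in the same feature space.
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)

# vocabulary_ maps each unigram/bigram to its column index in X.
features = sorted(vectorizer.vocabulary_)
print(features)
```

The resulting sparse matrix `X` can be passed directly to a classifier's `fit` method.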

While the first step is trivial, I can't find much about how to build an ensemble classifier using scikit-learn. I noticed that scikit-learn has some ensemble classes, but they don't seem to be exactly what I'm looking for.

Does anyone know a concrete example of this using scikit-learn?

1 answer

I also struggled with this issue. After a lot of experimentation, the best way I found to build an ensemble in scikit-learn is to average the clf.predict_proba(X) outputs of each trained model. Over many runs (50 or more), the averaged prediction does better than any single model.
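A minimal sketch of that averaging, assuming the two models mentioned in the question (Multinomial Naive Bayes and KNN). The toy corpus, the labels, the TF-IDF features, and the helper name `predict_by_average` are all hypothetical and only there to make the example runnable:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier

# Toy training data, made up for illustration only.
train_texts = [
    "cheap flights to paris", "hotel deals in rome", "book a beach holiday",
    "football match tonight", "tennis final results", "basketball league scores",
    "new phone released", "laptop battery review", "fast graphics card",
]
train_labels = ["travel"] * 3 + ["sport"] * 3 + ["tech"] * 3

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(train_texts)

# Fit every model on the same labels so their classes_ arrays line up.
models = [MultinomialNB(), KNeighborsClassifier(n_neighbors=3)]
for model in models:
    model.fit(X_train, train_labels)

def predict_by_average(models, X):
    """Average predict_proba over the models and pick the most likely class."""
    probas = [m.predict_proba(X) for m in models]
    mean_proba = np.mean(probas, axis=0)
    classes = models[0].classes_
    return classes[np.argmax(mean_proba, axis=1)], mean_proba

X_new = vectorizer.transform(["tennis match results", "cheap laptop review"])
predicted, mean_proba = predict_by_average(models, X_new)
print(predicted)
```

Recent scikit-learn releases also ship `sklearn.ensemble.VotingClassifier`; with `voting="soft"` it averages predicted probabilities in the same spirit, so it may be worth trying before maintaining a hand-written helper.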

If you know that some of your trained models are stronger than others, you can also look at using weighted averages or a multi-armed bandit approach to ensembling.

http://en.wikipedia.org/wiki/Multi-armed_bandit
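The weighted-average variant could look roughly like the sketch below. The probability arrays and the weights are invented for illustration; in practice the weights might come from each model's validation accuracy, which is an assumption rather than something stated above.

```python
import numpy as np

def weighted_average_proba(probas, weights):
    """Combine per-model class probabilities using per-model weights."""
    # np.average normalizes by the sum of the weights.
    return np.average(np.asarray(probas), axis=0, weights=weights)

# Hypothetical predict_proba outputs of two models for one sample, two classes.
proba_strong = np.array([[0.8, 0.2]])
proba_weak = np.array([[0.4, 0.6]])

# Hypothetical weights: trust the first model three times as much.
combined = weighted_average_proba([proba_strong, proba_weak], weights=[3, 1])
print(combined)
```

With equal weights this reduces to the plain average of the models' probabilities.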


Source: https://habr.com/ru/post/1569031/
