I have a fairly limited dataset on which I do supervised, multi-class text classification using scikit-learn. To compensate a bit for the lack of data, I would like to do the following:
1. Extract n-grams from the content that I want to classify, combine them with the content's unigrams, and perform the classification.
2. Introduce (or reuse an existing implementation of) a voting-based ensemble classifier to improve classification accuracy. For example, Multinomial Naive Bayes and KNN each seem to give good results for different classes. Ideally, I would combine them so that I end up with slightly better (and, hopefully, no worse) performance than the crappy ~50% I currently get with my limited dataset.
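For reference, this is roughly what I mean by the first step. It is only a minimal sketch: the two toy documents are made up, and I am assuming `CountVectorizer` with `ngram_range=(1, 2)` is an acceptable way to get unigrams and bigrams into a single feature matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents, made up purely for illustration
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
]

# ngram_range=(1, 2) keeps the unigrams and adds bigrams,
# so both end up as columns of the same sparse matrix
vectorizer = CountVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)

# vocabulary_ maps each unigram/bigram to its column index
vocab = vectorizer.vocabulary_
print(X.shape)
print(sorted(vocab))
```

The resulting `X` can be passed to any scikit-learn classifier as usual.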
While the first step is trivial, I couldn't find much on how to do ensemble classification using scikit-learn. I noticed that scikit-learn has some ensemble classes, such as this one, but they don't seem to be quite what I'm looking for.
Does anyone know a concrete example of this using scikit-learn?
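To make it concrete, the kind of combination I have in mind would look something like the sketch below. It assumes a scikit-learn version that ships `sklearn.ensemble.VotingClassifier` and that soft voting (averaging the predicted class probabilities) is a reasonable way to combine the two models. The texts, labels, and `n_neighbors=3` are placeholders I made up, and nothing here guarantees the ensemble beats either classifier on its own.

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: three classes, three documents each
texts = [
    "the striker scored a late goal",
    "the team won the football match",
    "the coach praised the goalkeeper",
    "the senate passed the new bill",
    "the president signed the law",
    "voters elected a new parliament",
    "the new phone has a faster chip",
    "the laptop ships with more memory",
    "the software update fixes a bug",
]
labels = ["sport"] * 3 + ["politics"] * 3 + ["tech"] * 3

# Soft voting averages predict_proba from both classifiers
ensemble = VotingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("knn", KNeighborsClassifier(n_neighbors=3)),
    ],
    voting="soft",
)

# Unigram + bigram TF-IDF features feed the ensemble
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), ensemble)
model.fit(texts, labels)

pred = model.predict(["the goalkeeper saved the match"])
print(pred)
```

Is something along these lines the recommended approach, or is there a better way to weight the individual classifiers?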