Store Naive Bayes Classifier in NLTK

I am a little confused about how to save a trained classifier. Retraining the classifier every time I want to use it is obviously slow and wasteful. How can I save it and load it again when I need it? The code is below. Thanks in advance for your help. I am using Python with the NLTK Naive Bayes classifier.

    classifier = nltk.NaiveBayesClassifier.train(training_set)

    # look inside the classifier train method in the source code of the NLTK library
    def train(labeled_featuresets, estimator=nltk.probability.ELEProbDist):
        # Create the P(label) distribution
        label_probdist = estimator(label_freqdist)
        # Create the P(fval|label, fname) distribution
        feature_probdist = {}
        return NaiveBayesClassifier(label_probdist, feature_probdist)
+45
python machine-learning classification nltk bayesian
Apr 04 '12 at 18:24
2 answers

To save:

    import pickle

    # Write the trained classifier to disk in binary mode
    with open('my_classifier.pickle', 'wb') as f:
        pickle.dump(classifier, f)

To load it later:

    import pickle

    # Read the classifier back from disk in binary mode
    with open('my_classifier.pickle', 'rb') as f:
        classifier = pickle.load(f)
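
The two steps above can be wrapped in a pair of small helpers. This is only a sketch: `save_classifier` and `load_classifier` are hypothetical names, not part of NLTK, and the dictionary below is a stand-in so the example runs without NLTK installed. In real use you would pass the object returned by `nltk.NaiveBayesClassifier.train(training_set)`.

```python
import os
import pickle
import tempfile


def save_classifier(classifier, path):
    """Serialize any picklable object (e.g. a trained classifier) to path."""
    with open(path, "wb") as f:
        pickle.dump(classifier, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_classifier(path):
    """Load an object previously written by save_classifier."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in object (an assumption for illustration only); replace it with
# the classifier returned by nltk.NaiveBayesClassifier.train(training_set).
trained = {"labels": ["pos", "neg"], "note": "stand-in for a trained classifier"}

path = os.path.join(tempfile.mkdtemp(), "my_classifier.pickle")
save_classifier(trained, path)
restored = load_classifier(path)
```

Unpickling can execute arbitrary code, so only load pickle files that you created yourself or otherwise trust.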
+80
Apr 04 '12 at 22:05

I ran into the same problem and could not save the object, since it is an instance of NLTK's ELEProbDist class. In any case, NLTK is very slow. Training took 45 minutes on a decent-sized set, so I decided to implement my own version of the algorithm (run it with PyPy, or rename it to .pyx and install Cython). It takes about 3 minutes on the same set, and it can simply save the data as JSON (I plan to switch to pickle, which is faster and better).
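
As a rough illustration of the JSON route mentioned above: the counts behind a Naive Bayes model are plain numbers and strings, so they can be written out with the standard `json` module. This is not the code from the linked project; the function names, the `"name=value"` key layout, and the toy training set are all assumptions made for this sketch.

```python
import json
import os
import tempfile
from collections import Counter, defaultdict


def count_features(labeled_featuresets):
    """Collect the raw label and feature counts a Naive Bayes model needs."""
    label_counts = Counter()
    feature_counts = defaultdict(Counter)
    for features, label in labeled_featuresets:
        label_counts[label] += 1
        for name, value in features.items():
            # JSON object keys must be strings, so fold name and value into one key
            feature_counts[label][f"{name}={value}"] += 1
    return {
        "labels": dict(label_counts),
        "features": {lab: dict(c) for lab, c in feature_counts.items()},
    }


def save_counts(model, path):
    """Write the count tables to path as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)


def load_counts(path):
    """Read count tables previously written by save_counts."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Toy data in the same (featureset, label) shape that NLTK classifiers use
training_set = [
    ({"contains_free": True}, "spam"),
    ({"contains_free": False}, "ham"),
    ({"contains_free": True}, "spam"),
]

model = count_features(training_set)
path = os.path.join(tempfile.mkdtemp(), "nb_counts.json")
save_counts(model, path)
restored = load_counts(path)
```

Storing counts rather than a pickled object keeps the file readable and independent of any class definitions, at the cost of having to rebuild the probability estimates after loading.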

I started a simple GitHub project; check out the code here

+5
Apr 04 '12 at 18:39