Store Naive Bayes Classifier in NLTK

Question

I am a little confused about how to save a trained classifier. Retraining the classifier every time I want to use it is obviously slow and wasteful, so how can I save it and load it again when I need it? The code is below. Thanks in advance for your help. I am using Python with the NLTK Naive Bayes classifier.

    classifier = nltk.NaiveBayesClassifier.train(training_set)

    # Look inside the classifier's train method in the NLTK source code:
    def train(labeled_featuresets, estimator=nltk.probability.ELEProbDist):
        # Create the P(label) distribution
        label_probdist = estimator(label_freqdist)
        # Create the P(fval|label, fname) distribution
        feature_probdist = {}
        return NaiveBayesClassifier(label_probdist, feature_probdist)

+45

python machine-learning classification nltk bayesian

user179169 Apr 04 '12 at 18:24

2 answers

I ran into the same problem, and you cannot save the object, since it is an ELEFreqDistr NLTK class. In any case, NLTK is very slow. Training took 45 minutes on a decent-sized set, so I decided to implement my own version of the algorithm (run it with PyPy, or rename it to .pyx and install Cython). It takes about 3 minutes on the same set, and it can simply save its data as JSON (I plan to use pickle, which is faster / better).

I started a simple GitHub project; check out the code here
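The idea of keeping a Naive Bayes model as plain counts that can be written out as JSON can be sketched as follows. This is not the code from the project mentioned above. It is a minimal illustration, and the function names `train_counts` and `classify`, the `label|name|value` key format, the file name `nb_counts.json`, and the add-one smoothing for two-valued features are all assumptions made for the example.

```python
import json
import math
import os
import tempfile
from collections import defaultdict


def train_counts(labeled_featuresets):
    """Count labels and (label, feature name, feature value) occurrences."""
    label_counts = defaultdict(int)
    feature_counts = defaultdict(int)
    for features, label in labeled_featuresets:
        label_counts[label] += 1
        for name, value in features.items():
            # JSON object keys must be strings, so build a string key up front.
            feature_counts[f"{label}|{name}|{value}"] += 1
    return {"labels": dict(label_counts), "features": dict(feature_counts)}


def classify(model, features):
    """Return the most probable label (assumes two-valued features)."""
    total = sum(model["labels"].values())
    best_label, best_score = None, float("-inf")
    for label, count in model["labels"].items():
        score = math.log(count / total)  # log P(label)
        for name, value in features.items():
            seen = model["features"].get(f"{label}|{name}|{value}", 0)
            # Add-one smoothing over two possible feature values.
            score += math.log((seen + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Tiny hypothetical training set in NLTK's (featureset, label) format.
training_set = [
    ({"contains_free": True}, "spam"),
    ({"contains_free": True}, "spam"),
    ({"contains_free": False}, "ham"),
    ({"contains_free": False}, "ham"),
]
model = train_counts(training_set)

# Save the counts as JSON, then load them back without retraining.
path = os.path.join(tempfile.gettempdir(), "nb_counts.json")
with open(path, "w") as f:
    json.dump(model, f)
with open(path) as f:
    restored = json.load(f)
```

Because the model is only nested dictionaries of strings and integers, the file is readable by any language with a JSON parser, and `classify(restored, ...)` works on the reloaded counts exactly as on the original ones.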

+5

luke14free Apr 04 '12 at 18:39

Jacob · Accepted Answer · 2012-04-04 22:05

To save:

    import pickle

    with open('my_classifier.pickle', 'wb') as f:
        pickle.dump(classifier, f)

To load it later:

    import pickle

    with open('my_classifier.pickle', 'rb') as f:
        classifier = pickle.load(f)
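The save and load steps can also be wrapped in two small helpers. This is only a sketch: the names `save_classifier` and `load_classifier` are hypothetical, and a plain dictionary stands in for the trained classifier so that the example runs without NLTK installed. In real use you would pass the object returned by `nltk.NaiveBayesClassifier.train(training_set)`.

```python
import os
import pickle
import tempfile


def save_classifier(classifier, path):
    """Write any picklable object (e.g. a trained classifier) to disk."""
    with open(path, "wb") as f:
        pickle.dump(classifier, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_classifier(path):
    """Read back an object previously written by save_classifier."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a trained classifier (assumption: the real object is picklable,
# as the answer above relies on).
stand_in = {"label_probdist": {"pos": 0.5, "neg": 0.5}, "feature_probdist": {}}

path = os.path.join(tempfile.gettempdir(), "my_classifier.pickle")
save_classifier(stand_in, path)
restored = load_classifier(path)
```

The `with` blocks close the file even if pickling raises an exception. Only unpickle files you created yourself, since `pickle.load` can execute code embedded in the file.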
