Does it have to be an SVM? NLTK has built-in tools for POS tagging: see "Categorizing and Tagging Words" in the NLTK book.
If you want to use your own classifier, look here: http://www.nltk.org/api/nltk.classify.html and Ctrl + F "svm". NLTK provides a wrapper for scikit-learn classifiers called SklearnClassifier. Then look at http://www.nltk.org/api/nltk.tag.html and Ctrl + F "classifier". There is a class nltk.tag.sequential.ClassifierBasedPOSTagger that, apparently, can use wrapped classifiers from sklearn.
I have not tried this, but it could work.
EDIT: It should work as follows:
import nltk
from nltk.classify import SklearnClassifier
from sklearn.svm import SVC
# train_sents: a list of tagged sentences, e.g. [[('The', 'DT'), ('dog', 'NN')], ...]
clf = SklearnClassifier(SVC(), sparse=False)
cpos = nltk.tag.sequential.ClassifierBasedPOSTagger(
    train=train_sents,
    classifier_builder=lambda train_feats: clf.train(train_feats))
The only problem is that sklearn classifiers accept only numeric features, so any non-numeric feature values need to be converted somehow.
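One common way to do that conversion is one-hot encoding: every (feature name, string value) pair gets its own 0/1 column, while numeric features keep their value. Below is a minimal pure-Python sketch of the idea. The helper names build_vocabulary and vectorize and the sample feature dicts are made up for illustration and are not part of NLTK or sklearn. As far as I know, sklearn's DictVectorizer does a similar encoding, so in practice you would probably use that instead of rolling your own.

```python
# Sketch only: build_vocabulary and vectorize are hypothetical helpers,
# not NLTK or sklearn functions.

def build_vocabulary(feature_dicts):
    """Assign a column index to each numeric feature name and to each
    (name, string value) pair seen in the training feature dicts."""
    vocab = {}
    for feats in feature_dicts:
        for name, value in feats.items():
            if value is None:
                continue  # assumption: missing values are simply skipped
            key = f"{name}={value}" if isinstance(value, str) else name
            vocab.setdefault(key, len(vocab))
    return vocab


def vectorize(feats, vocab):
    """Turn one feature dict into a list of floats using the vocabulary."""
    row = [0.0] * len(vocab)
    for name, value in feats.items():
        if value is None:
            continue
        if isinstance(value, str):
            key, number = f"{name}={value}", 1.0  # one-hot column
        else:
            key, number = name, float(value)      # numeric value kept as-is
        if key in vocab:                          # unseen values are ignored
            row[vocab[key]] = number
    return row


# Made-up feature dicts of the kind a POS feature detector might produce.
samples = [
    {"word": "dog", "suffix": "og", "length": 3},
    {"word": "runs", "suffix": "ns", "length": 4},
]
vocab = build_vocabulary(samples)
matrix = [vectorize(s, vocab) for s in samples]
```

Each row of matrix is then a plain numeric vector that a classifier such as SVC could be trained on. A string value that never appeared in training just leaves all of its columns at zero.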