I am a complete newbie to machine learning, NLP, and data analysis, but I am very motivated to understand them better. I have read a couple of books on NLTK, scikit-learn, etc. I discovered the Python module TextBlob and found it easy to get started with, so I created a sample demo Python script, which is located at: https://gist.github.com/dpnishant/367cef57a8033138eb0a . I am trying to find the most suitable algorithm for analyzing moods and classifying text. My questions are as follows:
Why is sentiment analysis with NaiveBayesClassifier so slow, even on such a small training set? Is this time roughly constant, or will it grow even more with more training data? Also, the sentiment analysis result is incorrect (see the script output: it says "negative" for the input text "the sandwich is good"). What am I doing wrong?
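To make the mechanics concrete, here is a from-scratch bag-of-words Naive Bayes sketch in pure Python. It is not TextBlob's actual implementation (TextBlob delegates to NLTK's feature-based classifier, which does extra per-document feature extraction — a likely source of the slowness you see), but it shows why a tiny training set gives fragile answers: every decision comes down to a handful of word counts. The training sentences below are made up for illustration.

```python
# Minimal bag-of-words Naive Bayes with Laplace smoothing (stdlib only).
# A sketch of the idea behind TextBlob/NLTK's NaiveBayesClassifier,
# not their actual code.
from collections import Counter, defaultdict
import math

def tokens(text):
    return text.lower().split()

def train(examples):
    """examples: list of (text, label) pairs. Returns (label_counts, word_counts, vocab)."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for w in tokens(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return label_counts, word_counts, vocab

def classify(model, text):
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best, best_lp = None, float("-inf")
    for label in label_counts:
        # log prior + sum of log likelihoods, with add-one smoothing
        lp = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokens(text):
            lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

train_data = [
    ("I love this sandwich", "pos"),
    ("this is an amazing place", "pos"),
    ("I do not like this restaurant", "neg"),
    ("this is my worst sandwich ever", "neg"),
]
model = train(train_data)
print(classify(model, "the sandwich is good"))  # "pos" with this toy data
```

With this particular toy set the result happens to be "pos", but swapping one training sentence can flip it: "good" never appears in training, so the verdict rests entirely on smoothed counts for "sandwich" and "is". That is the usual reason a tiny training set misclassifies an "obviously" positive sentence.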
I read in the TextBlob documentation that NaiveBayesClassifier is trained on the movie_reviews corpus. Is there an API where I can change it to something else, maybe nps_chat? Something that is not very clear to me is the role of the corpora. I mean, we train the classifier with our own training examples, so in what way does a more domain-specific corpus, e.g. nps_chat, product_reviews, movie_reviews, etc., help?
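One point worth separating (my understanding of the TextBlob docs — verify for your version): it is the prebuilt NaiveBayesAnalyzer sentiment backend that comes trained on movie reviews, while textblob.classifiers.NaiveBayesClassifier is trained on whatever (text, label) pairs you pass it. So "switching corpora" mostly means reshaping your corpus of choice into that pair format. A sketch, with made-up nps_chat-style records (the real NLTK corpus exposes posts differently, via nltk.corpus.nps_chat):

```python
# Reshape an arbitrary corpus into the (text, label) pairs that
# textblob.classifiers.NaiveBayesClassifier accepts.
# `raw_corpus` is invented stand-in data shaped like chat posts.
raw_corpus = [
    {"text": "brb gtg", "tag": "Bye"},
    {"text": "lol that is funny", "tag": "Emotion"},
    {"text": "what time is it", "tag": "whQuestion"},
]

train = [(rec["text"], rec["tag"]) for rec in raw_corpus]
print(train[0])

# With TextBlob installed, this list could then be passed straight in:
#   from textblob.classifiers import NaiveBayesClassifier
#   cl = NaiveBayesClassifier(train)
```

A domain-specific corpus helps because the word statistics the classifier learns match the text you will classify: a model trained on chat posts knows "brb", one trained on movie reviews does not.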
I understand that I need to train a classifier so that it works on unlabeled data. But if the training data becomes huge, what is the best way to handle it? Should the program build a model from the training data on every run, or is there a way to save the model to a file (something like a pickle) and read it back from there? Is this possible with TextBlob, and does this methodology bring an improvement?
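The usual answer is the train-once / load-many pattern with the standard-library pickle module. Here is a minimal sketch of that pattern; the `model` below is a stand-in dict, but a trained classifier object can generally be pickled the same way (assumption: the classifier holds no unpicklable resources such as open file handles — check this for your TextBlob/NLTK version).

```python
# Train-once / load-many pattern: serialize a trained model to disk
# with pickle, then restore it on later runs instead of retraining.
import os
import pickle
import tempfile

# Stand-in for a trained classifier object.
model = {"pos": {"love": 1, "amazing": 1}, "neg": {"worst": 1}}

path = os.path.join(tempfile.gettempdir(), "classifier.pickle")

# Save once, after training.
with open(path, "wb") as f:
    pickle.dump(model, f)

# On subsequent runs: load instead of retraining.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)  # True
```

This turns per-run cost from "retrain on the whole corpus" into "deserialize one file", which is the improvement you are asking about; only unpickle files you trust, since pickle can execute arbitrary code on load.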
In my script, should I use SklearnClassifier or NLTKClassifier, and why? More generally, among the classifiers reachable through nltk.classify and TextBlob — e.g. Megam, LogisticRegression, SVM, BernoulliNB, GaussianNB, etc. — which is most suitable for this kind of mood/text classification, and how should I choose between them?