Naive Bayes vs. SVM for text data classification

I am working on a problem that involves classifying a large database of texts. The texts are very short (3-8 words each), and there are 10-12 categories into which I want to sort them. For features, I simply use the tf-idf frequency of each word. Thus, the number of features is roughly equal to the number of words that appear across the texts overall (I remove stop words and some others).

In trying to come up with a model to use, I have had the following two ideas:

  • Naive Bayes (probably scikit-learn's multinomial naive Bayes implementation)
  • Support vector machine (with stochastic gradient descent used in training, also a scikit-learn implementation)

I have built both models and am currently comparing the results.
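
For reference, a minimal sketch of what these two pipelines might look like in scikit-learn; the texts and labels below are hypothetical stand-ins for the real database:

    # Two sketched pipelines: multinomial naive Bayes and a linear SVM
    # trained with SGD, both on tf-idf features.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import SGDClassifier
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = [
        "cheap flights to london this weekend",
        "easy weeknight pasta recipe",
        "hotel deals near the airport",
        "how to bake sourdough bread",
    ]
    labels = ["travel", "food", "travel", "food"]

    nb_model = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        MultinomialNB(),
    )
    svm_model = make_pipeline(
        TfidfVectorizer(stop_words="english"),
        SGDClassifier(loss="hinge", random_state=0),  # hinge loss = linear SVM
    )

    for name, model in [("MNB", nb_model), ("SVM", svm_model)]:
        model.fit(texts, labels)
        print(name, model.predict(["cheap hotel in london"]))  # likely 'travel'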

What are the theoretical pros and cons of each model? Why might one of them be better for this type of problem? I am new to machine learning, so what I would really like to understand is why one might do better.

Many thanks!

+4
2 answers

, "", , , SVM , ​​(, rbf, poly ..). , , , , , , , SVM , , .

The consensus among ML researchers and practitioners is that in almost all cases the SVM is better than naive Bayes.

From a theoretical point of view, it is a little hard to compare the two methods: one is probabilistic in nature, the other geometric. However, it is quite easy to come up with a function where there are dependencies between variables that naive Bayes cannot capture (e.g. y(a, b) = ab), so we know it is not a universal approximator. SVMs with a proper choice of kernel are (as are 2- or 3-layer neural networks), so from that point of view the theory matches the practice.

But in the end it comes down to performance on your problem: ideally you want to choose the simpler method (naive Bayes) if it gives you performance comparable to the more complex model, etc.
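
To make the y(a, b) = ab point concrete, here is a minimal synthetic sketch where the label depends only on the interaction between two features: naive Bayes, which models each feature independently, stays near chance level, while an RBF-kernel SVM captures the interaction.

    # Sketch: the label is y = sign(a * b), a pure interaction effect.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(2000, 2))
    y = np.sign(X[:, 0] * X[:, 1])        # depends only on the product a*b

    X_train, y_train = X[:1500], y[:1500]
    X_test, y_test = X[1500:], y[1500:]

    nb = GaussianNB().fit(X_train, y_train)
    svm = SVC(kernel="rbf").fit(X_train, y_train)

    print("naive Bayes:", nb.score(X_test, y_test))  # roughly 0.5 (chance)
    print("RBF SVM:   ", svm.score(X_test, y_test))  # close to 1.0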

+11
  • Support vector machines (SVM) perform better on longer documents.
  • Multinomial Naive Bayes (MNB) performs better on short snippets.

MNB is stronger for snippets than for longer documents. While (Ng and Jordan, 2002) showed that NB is better than SVM/logistic regression (LR) with few training cases, MNB is also better with short documents. In contrast to their result that an SVM usually beats NB when it has more than 30-50 training cases, the paper shows that MNB is still better on snippets even with relatively large training sets (9k cases).

In short, NBSVM seems to be an appropriate and very strong baseline for sophisticated classification of text data.

Code: https://github.com/prakhar-agarwal/Naive-Bayes-SVM

Paper: http://nlp.stanford.edu/pubs/sidaw12_simple_sentiment.pdf

Cite: Wang, Sida, and Christopher D. Manning. "Baselines and bigrams: Simple, good sentiment and topic classification." Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers - Volume 2. Association for Computational Linguistics, 2012.
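
The repository linked above has a full implementation; as a rough illustration, the core NBSVM trick for a binary task looks something like the sketch below (the toy texts are hypothetical, and the paper's final interpolation between MNB and SVM weights is omitted): scale binarized features element-wise by the naive Bayes log-count ratio r, then fit a linear SVM on the scaled features.

    # Simplified NBSVM sketch (binary task), after Wang & Manning (2012):
    # compute the NB log-count ratio r and train a linear SVM on r * x.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.svm import LinearSVC

    texts = ["good great fun", "bad awful boring",
             "great movie", "awful boring plot"]      # hypothetical toy data
    y = np.array([1, 0, 1, 0])

    vec = CountVectorizer(binary=True)                # binarized term presence
    X = vec.fit_transform(texts)

    alpha = 1.0                                       # Laplace smoothing
    p = alpha + np.asarray(X[y == 1].sum(axis=0)).ravel()  # positive counts
    q = alpha + np.asarray(X[y == 0].sum(axis=0)).ravel()  # negative counts
    r = np.log((p / p.sum()) / (q / q.sum()))         # log-count ratio

    X_nb = X.multiply(r).tocsr()                      # element-wise scale r * x
    clf = LinearSVC().fit(X_nb, y)

    x_new = vec.transform(["great fun plot"]).multiply(r).tocsr()
    print(clf.predict(x_new))                         # likely class 1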

+2

Source: https://habr.com/ru/post/1628472/

