Configure multinomial Naive Bayes? scikit-learn

Question


Does anyone know how to set the alpha parameter when classifying with Naive Bayes?

For example: first, I used a bag of words to build a feature matrix in which each cell is a word count, and then I used tf (term frequency) to normalize the matrix.

But when I used Naive Bayes to build the classifier model, I chose multinomial NB (which I think is the correct choice, rather than Bernoulli or Gaussian). The default alpha setting is 1.0 (the docs say this is Laplace smoothing; I have no idea what that is).

The result is very poor: recall for the positive class (the target class) is only 21%. But when I set alpha = 0.0001 (a value I picked arbitrarily), recall rises to 95%.

In addition, I checked the multinomial NB formula, and I think the problem lies with alpha: if I use word counts as features, alpha = 1 barely affects the results; however, since tf values lie between 0 and 1, alpha = 1 strongly affects the output of the formula.
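For reference, scikit-learn's MultinomialNB estimates each per-class feature probability as (N_yi + alpha) / (N_y + alpha * n), where N_yi is the total of feature i over the class, N_y is the sum of those totals, and n is the number of features. The sketch below plugs made-up numbers into that formula to illustrate the point above. The vocabulary size, the class totals, and the assumption that every document has exactly 200 words are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def smoothed_probs(feature_totals, alpha):
    """Smoothed estimate: (N_yi + alpha) / (N_y + alpha * n_features)."""
    totals = np.asarray(feature_totals, dtype=float)
    return (totals + alpha) / (totals.sum() + alpha * totals.size)

def seen_to_unseen_ratio(feature_totals, alpha):
    """How much more probable the most frequent word is than an unseen word."""
    probs = smoothed_probs(feature_totals, alpha)
    return probs[0] / probs[-1]

# Hypothetical class: 1000-word vocabulary, only three words ever occur.
n_features = 1000
words_per_doc = 200  # assumption: every document has the same length

count_totals = np.zeros(n_features)
count_totals[:3] = [600, 300, 100]        # raw counts summed over the class
tf_totals = count_totals / words_per_doc  # the same data after tf normalization

for name, totals in [("counts", count_totals), ("tf", tf_totals)]:
    for alpha in (1.0, 1e-4):
        ratio = seen_to_unseen_ratio(totals, alpha)
        print(f"{name:6s} alpha={alpha:<6g} seen/unseen ratio = {ratio:.1f}")
```

Under these assumptions, alpha = 1 leaves the most frequent word about 601 times more probable than an unseen word when raw counts are used, but only about 4 times more probable after tf normalization, because the tf totals are small compared with alpha. A much smaller alpha restores the contrast.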

I also tested without tf, using only the raw word counts, and recall is also 95%. So does anyone know how to set the alpha value? I have to use tf as the feature.

Thanks.


python scikit-learn classification naivebayes

HAO CHEN Nov 20 '15 at 15:59


1 answer

jakevdp · Answer 1 · 2015-11-21T06:39:32+0000

In multinomial Naive Bayes, the alpha parameter is what is known as a hyperparameter; that is, a parameter that controls the form of the model itself. In most cases, the best way to determine optimal hyperparameter values is a grid search over possible parameter values, using cross-validation to evaluate the model's performance on your data at each value. See the scikit-learn documentation on grid search and cross-validation for details on how to do this.
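As a minimal sketch of that approach (not taken from the original answer), the snippet below runs a cross-validated grid search over alpha using recall as the score, since recall is the metric the question reports. The toy corpus, the labels, the candidate alpha values, and the step names "counts", "tf", and "nb" are all assumptions made for illustration; substitute your own data and grid.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = positive (target) class, 0 = negative class.
texts = [
    "cheap pills buy now", "win money now", "buy cheap watches",
    "free money offer", "win a free prize now", "cheap offer buy today",
    "meeting at noon tomorrow", "see you at lunch", "project report attached",
    "lunch meeting moved", "please review the report", "call me tomorrow morning",
]
labels = [1] * 6 + [0] * 6

# Bag of words -> term-frequency normalization (no idf) -> multinomial NB.
pipeline = Pipeline([
    ("counts", CountVectorizer()),
    ("tf", TfidfTransformer(use_idf=False, norm="l1")),
    ("nb", MultinomialNB()),
])

# Candidate alpha values on a log scale; the range is an arbitrary choice.
alphas = [1e-4, 1e-3, 1e-2, 1e-1, 1.0]
param_grid = {"nb__alpha": alphas}

# 3-fold cross-validation, scored by recall on the positive class.
search = GridSearchCV(pipeline, param_grid, scoring="recall", cv=3)
search.fit(texts, labels)

print("best alpha:", search.best_params_["nb__alpha"])
print("mean cross-validated recall:", search.best_score_)
```

Placing the vectorizer and the tf step inside the pipeline means they are refit on each training fold, so the held-out fold does not leak into the vocabulary.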
