Tuning multinomial Naive Bayes? scikit-learn

Does anyone know how to set the alpha parameter for Naive Bayes classification?

E.g., first I used a bag of words to build a feature matrix in which each cell is a word count, and then I used tf (term frequency) to normalize the matrix.

But when I used Naive Bayes to build the classifier model, I chose multinomial NB (which I think is the right choice, rather than Bernoulli or Gaussian). The default alpha is 1.0 (the docs say this is Laplace smoothing; I have no idea what that is).
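For reference, the scikit-learn documentation gives the smoothed estimate that MultinomialNB uses for the probability of feature i in class y. Here N_yi is the sum of feature i over the class-y training samples T, N_y is the total over all features for that class, and n is the number of features; alpha = 1 is called Laplace smoothing and alpha < 1 Lidstone smoothing:

```latex
\hat{\theta}_{yi} = \frac{N_{yi} + \alpha}{N_y + \alpha n},
\qquad N_{yi} = \sum_{x \in T} x_i,
\qquad N_y = \sum_{i=1}^{n} N_{yi}
```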

The results are very poor: recall for the positive class (the target class) is only 21%. But when I set alpha = 0.0001 (a value I picked arbitrarily), recall goes up to 95%.

In addition, I checked the multinomial NB formula, and I think the problem is alpha: if I use raw word counts as features, alpha = 1 does not affect the results, but since tf values lie between 0 and 1, alpha = 1 really does affect the result of this formula.
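A small numeric sketch of that reasoning, plugging hypothetical numbers into the smoothed estimate (N_yi + alpha) / (N_y + alpha * n). It assumes a 1,000-word vocabulary and a class of 50 documents with 100 words each, in which one word makes up 1% of the text, and it takes tf to be l1-normalized counts. Because both estimates share a denominator, the ratio between an informative word and an unseen word reduces to (N_yi + alpha) / alpha, so alpha = 1 swamps small fractional tf totals but not raw counts:

```python
def smoothed_theta(n_yi, n_y, alpha, n_features):
    """Smoothed estimate (N_yi + alpha) / (N_y + alpha * n) from multinomial NB."""
    return (n_yi + alpha) / (n_y + alpha * n_features)


N_FEATURES = 1000  # hypothetical vocabulary size

# Hypothetical class: 50 documents x 100 words = 5000 words, and one
# informative word accounts for 1% of them (50 occurrences).
raw = {"n_yi": 50.0, "n_y": 5000.0}
# Same class after l1 tf normalization: each row sums to 1, so N_y equals the
# number of documents and the informative word adds 0.01 per document.
tf = {"n_yi": 0.5, "n_y": 50.0}

ratios = {}
for name, stats in (("raw counts", raw), ("tf", tf)):
    for alpha in (1.0, 1e-4):
        informative = smoothed_theta(stats["n_yi"], stats["n_y"], alpha, N_FEATURES)
        unseen = smoothed_theta(0.0, stats["n_y"], alpha, N_FEATURES)
        ratios[(name, alpha)] = informative / unseen
        print(f"{name:10s} alpha={alpha:<7g} informative={informative:.5f} "
              f"unseen={unseen:.6f} ratio={informative / unseen:.1f}")
```

With these numbers the informative-to-unseen ratio at alpha = 1 is 51 for raw counts but only 1.5 for tf, while tf with alpha = 0.0001 gives a ratio of about 5001.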

I also tested without tf, using only raw word counts, and recall is also 95%. So does anyone know how to set the alpha value? I have to use tf as the features.

Thanks.

1 answer

In multinomial Naive Bayes, the alpha parameter is what is called a hyperparameter, that is, a parameter that controls the form of the model itself. In most cases, the best way to find optimal hyperparameter values is a grid search over candidate values, using cross-validation to evaluate the model's performance on your data for each value. See the scikit-learn documentation on grid search (GridSearchCV) for details on how to do this.
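A minimal sketch of that approach with scikit-learn's GridSearchCV. It assumes synthetic Poisson word counts in place of the real corpus, l1 tf normalization as the feature step, and recall on the positive class as the metric; the data, grid values, and fold count are arbitrary illustrative choices:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical stand-in for a document-term count matrix: 200 documents,
# 50 vocabulary words, two classes whose word rates run in opposite directions.
rng = np.random.default_rng(0)
n_docs, n_words = 200, 50
y = np.repeat([0, 1], n_docs // 2)
rates = np.where(y[:, None] == 1,
                 np.linspace(0.5, 3.0, n_words),
                 np.linspace(3.0, 0.5, n_words))
X_counts = rng.poisson(rates)

# tf normalization happens inside the pipeline, so it is applied per CV fold.
pipeline = Pipeline([
    ("tf", TfidfTransformer(use_idf=False, norm="l1")),
    ("nb", MultinomialNB()),
])

# Candidate smoothing values spanning several orders of magnitude.
param_grid = {"nb__alpha": [1e-4, 1e-3, 1e-2, 1e-1, 1.0]}

search = GridSearchCV(
    pipeline,
    param_grid,
    scoring="recall",  # recall on the positive class (label 1)
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
)
search.fit(X_counts, y)

print("best alpha:", search.best_params_["nb__alpha"])
print("cross-validated recall:", round(search.best_score_, 3))
```

If the best alpha lands at an edge of the grid, it is usually worth extending the grid in that direction and searching again.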


Source: https://habr.com/ru/post/1236449/

