How good "accuracy" is as an indicator of performance depends on your problem. If misclassifying "A" as "B" is just as bad/good as misclassifying "B" as "A", then there is no reason to do anything other than simply label everything as "A", since you know that will reliably get you 98% accuracy (as long as that imbalanced class distribution reflects the true distribution).
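To make the point concrete, here is a minimal sketch (assuming scikit-learn and a hypothetical 98/2 label split, neither of which is named in the question) showing that a classifier which always predicts the majority class already reaches ~98% accuracy:

    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import accuracy_score

    # Hypothetical imbalanced labels: 98 examples of "A", 2 of "B".
    y = np.array(["A"] * 98 + ["B"] * 2)
    X = np.zeros((len(y), 1))  # features are irrelevant for this baseline

    baseline = DummyClassifier(strategy="most_frequent")
    baseline.fit(X, y)
    print(accuracy_score(y, baseline.predict(X)))  # ~0.98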
Without knowing your problem (and whether accuracy is the measure you should be using), the best answer I can give is "it depends on the data set." You might be able to get 99% accuracy with standard Naive Bayes, although that may be unlikely. For Naive Bayes in particular, there is one thing you can do: disable the use of priors (essentially the prior proportion of each class). This effectively pretends that every class is equally likely, even though the model parameters are still learned from unequal amounts of data.
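A sketch of the "disable the priors" idea, assuming scikit-learn's MultinomialNB for text classification (the library choice and the toy corpus are my assumptions, not part of this answer):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # Hypothetical, heavily imbalanced toy corpus.
    docs = ["cheap pills now", "win money fast", "meeting at noon",
            "lunch tomorrow?", "quarterly report attached", "see you at the gym"]
    labels = ["spam", "spam", "ham", "ham", "ham", "ham"]

    vec = CountVectorizer()
    X = vec.fit_transform(docs)

    # fit_prior=False replaces the learned class priors with a uniform prior,
    # so each class is treated as equally likely a priori, even though the
    # per-class feature parameters are still estimated from unequal data.
    clf = MultinomialNB(fit_prior=False)
    clf.fit(X, labels)
    print(clf.predict(vec.transform(["free money now"])))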
Your proposed solution is common practice; sometimes it works well. Another practice is to generate synthetic data for the smaller class (how to do that depends on your data; for text documents I don't know of a particularly good way). Yet another practice is to increase the weight of the data points in the underrepresented classes, as sketched below.
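One way to up-weight underrepresented classes, again assuming scikit-learn (the answer only says to give minority-class points more weight; this particular API and toy data are my choice):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.utils.class_weight import compute_sample_weight

    # Hypothetical imbalanced corpus.
    docs = ["cheap pills now", "meeting at noon", "lunch tomorrow?",
            "quarterly report attached"]
    labels = ["spam", "ham", "ham", "ham"]

    vec = CountVectorizer()
    X = vec.fit_transform(docs)

    # "balanced" assigns each document a weight inversely proportional to its
    # class frequency, so the rare "spam" examples count for more during fitting.
    weights = compute_sample_weight(class_weight="balanced", y=labels)

    clf = MultinomialNB()
    clf.fit(X, labels, sample_weight=weights)
    print(clf.predict(vec.transform(["cheap lunch now"])))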
You can search for "imbalanced classification" and find much more information about these kinds of problems (they are among the more difficult ones).
If accuracy is not really a good measure for your problem, searching for "cost-sensitive classification" should turn up useful information as well.