How do you deal with data imbalance in SVM?

If I train an SVM on a large training set where the class variable is True or False, will having very few True values compared to the number of False values affect the trained model and its results? Should the classes be equal in size? If my training set does not have an equal distribution of True and False, how can I handle this so that my training is as effective as possible?

2 answers

It is fine to have unbalanced data, because the SVM can assign a greater penalty to misclassification errors on the less frequent class ("True" in your case) instead of weighting all errors equally. Equal weighting would otherwise lead to an undesirable classifier that assigns everything to the majority class. That said, you are still likely to get better results with balanced data; in the end it all depends on your data.
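As a minimal sketch of the class-weighting idea, here is how it could look with scikit-learn's `SVC` and its `class_weight` option (the original answer does not name a library, so this is just one possible realization):

```python
# Sketch: give minority-class errors a larger penalty via class_weight.
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Imbalanced toy data: roughly 95% "False" vs 5% "True".
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" scales the penalty C for each class by
# n_samples / (n_classes * n_samples_in_class), so rare-class errors cost more.
clf = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```

You can also pass an explicit dict such as `class_weight={0: 1, 1: 10}` if you want to control the penalty ratio yourself.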

You can also artificially resample the data to make it more balanced. Have a look at this paper: http://pages.stern.nyu.edu/~fprovost/Papers/skew.PDF .
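One simple way to rebalance, sketched below, is to randomly oversample the minority class until it matches the majority class (the function name and setup here are illustrative, not from the original answer):

```python
# Sketch: random oversampling of minority classes with NumPy.
import numpy as np

def oversample_minority(X, y, random_state=0):
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        idx = np.where(y == c)[0]
        # Sample with replacement up to the majority-class count.
        picked = rng.choice(idx, size=n_max, replace=True)
        X_parts.append(X[picked])
        y_parts.append(y[picked])
    return np.concatenate(X_parts), np.concatenate(y_parts)

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)   # 8 "False" vs 2 "True"
X_bal, y_bal = oversample_minority(X, y)
print(np.bincount(y_bal))         # both classes now have 8 samples
```

Note that plain oversampling duplicates minority points, which can encourage overfitting; undersampling the majority class or weighting errors (as above) are common alternatives.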


In my experience, standard SVM classifiers do not work very well on unbalanced data. I have seen this with C-SVM, and it is even worse with nu-SVM. You might want to look at P-SVM, which offers a mode specifically suited to unbalanced data.


Source: https://habr.com/ru/post/921746/