How do you deal with data imbalance in SVM?

If I train an SVM on a large training set where the class variable is True or False, will having very few True values compared to the number of False values affect the trained model and its results? Should the classes be equal in size? If my training set does not have an equal distribution of True and False, how can I handle this so that my training is as effective as possible?

2 answers

It is fine to have unbalanced data, because the SVM can assign a greater penalty to misclassification errors on the less frequent class ("True" in your case) instead of weighting all errors equally. Equal weighting would otherwise lead to an undesirable classifier that assigns everything to the majority class. That said, you are still likely to get better results with balanced data; in the end it all depends on your data.
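As a minimal sketch of the class-weighting idea, here is how it could look with scikit-learn's `SVC` and its `class_weight` option (the original answer does not name a library, so this is just one possible realization):

```python
# Sketch: give minority-class errors a larger penalty via class_weight.
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# Imbalanced toy data: roughly 95% "False" vs 5% "True".
X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" scales the penalty C for each class by
# n_samples / (n_classes * n_samples_in_class), so rare-class errors cost more.
clf = SVC(kernel="rbf", class_weight="balanced").fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```

You can also pass an explicit dict such as `class_weight={0: 1, 1: 10}` if you want to control the penalty ratio yourself.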

You can also artificially resample the data to make it more balanced. Have a look at this paper: http://pages.stern.nyu.edu/~fprovost/Papers/skew.PDF .
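One simple way to rebalance, sketched below, is to randomly oversample the minority class until it matches the majority class (the function name and setup here are illustrative, not from the original answer):

```python
# Sketch: random oversampling of minority classes with NumPy.
import numpy as np

def oversample_minority(X, y, random_state=0):
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    X_parts, y_parts = [], []
    for c in classes:
        idx = np.where(y == c)[0]
        # Sample with replacement up to the majority-class count.
        picked = rng.choice(idx, size=n_max, replace=True)
        X_parts.append(X[picked])
        y_parts.append(y[picked])
    return np.concatenate(X_parts), np.concatenate(y_parts)

X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)   # 8 "False" vs 2 "True"
X_bal, y_bal = oversample_minority(X, y)
print(np.bincount(y_bal))         # both classes now have 8 samples
```

Note that plain oversampling duplicates minority points, which can encourage overfitting; undersampling the majority class or weighting errors (as above) are common alternatives.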


In my experience, standard SVM classifiers do not work very well on unbalanced data. I have seen this with C-SVM, and it is even worse with nu-SVM. You might want to look at P-SVM, which offers a mode specifically suited to unbalanced data.


Source: https://habr.com/ru/post/921746/