LinearSVC vs SVC (kernel = 'linear'): conflicting arguments?

From my research, I found three conflicting results:

  1. SVC(kernel="linear") better
  2. LinearSVC better
  3. Irrelevant

Can someone explain when to use LinearSVC versus SVC(kernel="linear")?

LinearSVC seems to be slightly better than SVC, and is usually more finicky. But if scikit-learn decided to spend time implementing a special case for linear classification, why wouldn't LinearSVC outperform SVC?

2 answers

Mathematically, SVM training is a convex optimization problem, usually with a unique minimizer, meaning there is only one solution to the underlying mathematical optimization problem.

The differences in results come from several aspects. SVC and LinearSVC are supposed to optimize the same problem, but in fact all liblinear estimators penalize the intercept, whereas libsvm ones don't (IIRC). This leads to a different mathematical optimization problem and thus to different results. There may also be other subtle differences, such as scaling and the default loss function (edit: make sure you set loss='hinge' in LinearSVC). Lastly, in multiclass classification, liblinear does one-vs-rest by default, while libsvm does one-vs-one.
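The one-vs-rest versus one-vs-one difference is easy to see in the shapes of the fitted coefficients. A minimal sketch (my own illustration on synthetic data, not from the answer): with 4 classes, libsvm's one-vs-one trains 4*(4-1)/2 = 6 binary problems, while liblinear's one-vs-rest trains one per class.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

# Synthetic 4-class problem (hypothetical data, just for shape inspection)
X, y = make_classification(n_samples=200, n_features=5, n_informative=4,
                           n_redundant=0, n_classes=4, random_state=0)

svc = SVC(kernel="linear").fit(X, y)
lsvc = LinearSVC(loss="hinge", dual=True, max_iter=10000).fit(X, y)

# libsvm, one-vs-one: 4*(4-1)/2 = 6 binary classifiers
print(svc.coef_.shape)   # (6, 5)
# liblinear, one-vs-rest: one binary classifier per class
print(lsvc.coef_.shape)  # (4, 5)
```

Even in the binary case, where both train a single hyperplane, the coefficients generally differ slightly because of the intercept penalization mentioned above.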

SGDClassifier(loss='hinge') differs from the other two in the sense that it uses stochastic gradient descent rather than exact gradient descent, and may not converge to the same solution. However, the resulting solution may generalize better.

Between SVC and LinearSVC, one important decision criterion is that LinearSVC tends to converge faster the larger the number of samples is. This is due to the fact that the linear kernel is a special case which is optimized for in liblinear, but not in libsvm.
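A rough timing sketch of this (my own example; absolute numbers depend entirely on the machine, and liblinear's specialized linear solver usually pulls ahead as n_samples grows):

```python
import time

from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

t0 = time.perf_counter()
SVC(kernel="linear").fit(X, y)       # generic kernel machinery (libsvm)
t_svc = time.perf_counter() - t0

t0 = time.perf_counter()
LinearSVC(dual=True, max_iter=10000).fit(X, y)  # specialized linear solver
t_linear = time.perf_counter() - t0

print(f"SVC(kernel='linear'): {t_svc:.2f}s")
print(f"LinearSVC:            {t_linear:.2f}s")
```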


The real issue is with the scikit-learn approach: they call something SVM which is not an SVM. LinearSVC actually minimizes the squared hinge loss instead of just the hinge loss; in addition, it penalizes the size of the bias (which is not SVM-like). For more details, see this other question: Under what parameters are SVC and LinearSVC in scikit-learn equivalent?
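This is visible directly in the estimator's defaults. A small sketch (my own illustration): the default loss really is the squared hinge, and you have to ask for the classic hinge explicitly.

```python
from sklearn.svm import LinearSVC

# The default objective is the *squared* hinge loss, not the SVM hinge loss
print(LinearSVC().loss)  # 'squared_hinge'

# Requesting the classic hinge loss explicitly (requires the dual solver)
clf = LinearSVC(loss="hinge", dual=True)
print(clf.loss)          # 'hinge'

# The intercept is included in the regularization term; a large
# intercept_scaling reduces the effect of that penalization
clf2 = LinearSVC(loss="hinge", dual=True, intercept_scaling=10.0)
```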

So which one to use? That is a purely empirical question. Because of the no free lunch theorem, it is impossible to say "this loss function is best, period". Sometimes the squared hinge loss will work better, sometimes the plain hinge.


Source: https://habr.com/ru/post/1241772/

