Is it possible to create an ROC curve from an SVM with a precomputed kernel in scikit-learn?

Question

I'm using this example to create an ROC graph from SVM classification results: http://scikit-learn.org/0.13/auto_examples/plot_roc.html

However, each data point effectively consists of 4 length-d vectors combined using a user-defined kernel function that doesn't fit the standard K(X, X) paradigm. So I have to provide a precomputed kernel to scikit-learn in order to do the classification. It looks something like this:

    K = numpy.zeros(shape = (n, n))

    # w1 + w2 + w3 + w4 = 1.0

    # v1: array, shape (n, d)
    # w1: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v1, v1)
    mu = 1.0 / numpy.mean(chi)
    K += w1 * numpy.exp(-mu * chi)

    # v2: array, shape (n, d)
    # w2: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v2, v2)
    mu = 1.0 / numpy.mean(chi)
    K += w2 * numpy.exp(-mu * chi)

    # v3: array, shape (n, d)
    # w3: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v3, v3)
    mu = 1.0 / numpy.mean(chi)
    K += w3 * numpy.exp(-mu * chi)

    # v4: array, shape (n, d)
    # w4: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v4, v4)
    mu = 1.0 / numpy.mean(chi)
    K += w4 * numpy.exp(-mu * chi)

    return K

The main obstacle to creating the ROC graph (from the link above) appears to be splitting the data into two sets and then calling predict_proba() on the test set. Is it possible to do this in scikit-learn using a precomputed kernel?

+4

python scikit-learn machine-learning svm roc

Magsol May 23 '13 at 16:30

1 answer

Bull · Accepted Answer · 2013-05-24T04:44:39+0000

The short answer is "maybe not." Have you tried something like below?

Based on the example at http://scikit-learn.org/stable/modules/svm.html, you need something like:

    import numpy
    import sklearn.metrics.pairwise
    from sklearn import svm

    # Toy data from the linked docs example; in practice y holds the n training labels.
    X = numpy.array([[0, 0], [1, 1]])
    y = [0, 1]

    # probability=True is required for predict_proba() to be available
    clf = svm.SVC(kernel='precomputed', probability=True)

    # kernel computation
    K = numpy.zeros(shape = (n, n))

    # "At the moment, the kernel values between all training vectors
    # and the test vectors must be provided."
    # according to the scikit-learn web page.
    # -- This is the problem!

    # v1: array, shape (n, d)
    # w1: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v1, v1)
    mu = 1.0 / numpy.mean(chi)
    K += w1 * numpy.exp(-mu * chi)

    # v2: array, shape (n, d)
    # w2: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v2, v2)
    mu = 1.0 / numpy.mean(chi)
    K += w2 * numpy.exp(-mu * chi)

    # v3: array, shape (n, d)
    # w3: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v3, v3)
    mu = 1.0 / numpy.mean(chi)
    K += w3 * numpy.exp(-mu * chi)

    # v4: array, shape (n, d)
    # w4: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(v4, v4)
    mu = 1.0 / numpy.mean(chi)
    K += w4 * numpy.exp(-mu * chi)

    # scikit-learn wraps LIBSVM, and looking at the LIBSVM README file
    # it seems you need kernel values for the test data, something like this:
    Kt = numpy.zeros(shape = (nt, n))

    # t1: array, shape (nt, d)
    # w1: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(t1, v1)
    mu = 1.0 / numpy.mean(chi)
    Kt += w1 * numpy.exp(-mu * chi)

    # t2: array, shape (nt, d)
    # w2: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(t2, v2)
    mu = 1.0 / numpy.mean(chi)
    Kt += w2 * numpy.exp(-mu * chi)

    # t3: array, shape (nt, d)
    # w3: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(t3, v3)
    mu = 1.0 / numpy.mean(chi)
    Kt += w3 * numpy.exp(-mu * chi)

    # t4: array, shape (nt, d)
    # w4: float in [0, 1)
    chi = sklearn.metrics.pairwise.chi2_kernel(t4, v4)
    mu = 1.0 / numpy.mean(chi)
    Kt += w4 * numpy.exp(-mu * chi)

    clf.fit(K, y)

    # predict on testing examples
    probas_ = clf.predict_proba(Kt)
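The four near-identical stanzas above could probably be folded into one helper. Below is a minimal sketch, assuming a hypothetical function named `combined_chi2_kernel` (it is not part of scikit-learn) and random non-negative stand-in data, since `chi2_kernel` expects non-negative features. It also lets the test kernel reuse the `mu` values computed on the training data, which is a design choice of this sketch and differs from the code above, where `mu` is recomputed from the test kernel.

```python
import numpy as np
from sklearn.metrics.pairwise import chi2_kernel


def combined_chi2_kernel(rows, cols, weights, mus=None):
    """Weighted sum of exp(-mu * chi2_kernel(a, b)) over feature channels.

    Hypothetical helper for illustration, not a scikit-learn function.
    rows, cols: lists of arrays with shapes (n_rows, d) and (n_cols, d)
    weights:    one float per channel
    mus:        optional per-channel scale factors to reuse (e.g. from training)
    """
    K = np.zeros((rows[0].shape[0], cols[0].shape[0]))
    used_mus = []
    for i, (a, b, w) in enumerate(zip(rows, cols, weights)):
        chi = chi2_kernel(a, b)
        # Either derive mu from this kernel block or reuse a supplied value.
        mu = 1.0 / np.mean(chi) if mus is None else mus[i]
        K += w * np.exp(-mu * chi)
        used_mus.append(mu)
    return K, used_mus


rng = np.random.RandomState(0)
train = [rng.rand(6, 3) for _ in range(4)]  # stand-ins for v1..v4
test = [rng.rand(2, 3) for _ in range(4)]   # stand-ins for t1..t4
weights = [0.25, 0.25, 0.25, 0.25]          # w1 + w2 + w3 + w4 = 1.0

K, mus = combined_chi2_kernel(train, train, weights)         # shape (n, n)
Kt, _ = combined_chi2_kernel(test, train, weights, mus=mus)  # shape (nt, n)
```

The key point is the shape convention: the training kernel is (n, n), while the test kernel is (nt, n), with one row per test example and one column per training example.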

From here, just copy the bottom of http://scikit-learn.org/0.13/auto_examples/plot_roc.html
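For reference, here is a rough end-to-end sketch of how the pieces might fit together. The data is synthetic, and a plain `chi2_kernel` stands in for the weighted four-part kernel from the question; both are assumptions made purely for illustration. The train/test split is done by slicing the full kernel matrix: rows and columns from the training indices for fitting, test rows against training columns for prediction.

```python
import numpy as np
from sklearn import svm
from sklearn.metrics import roc_curve, auc
from sklearn.metrics.pairwise import chi2_kernel

rng = np.random.RandomState(0)
n = 80
y = np.array([0, 1] * (n // 2))
# Synthetic non-negative features; class 1 is shifted so the classes differ.
X = rng.rand(n, 5) + 0.5 * y[:, None]

# Stand-in for the user-defined weighted kernel, computed on all n points.
K_full = chi2_kernel(X, X)

# Random half/half split expressed as index arrays.
idx = rng.permutation(n)
train_idx, test_idx = idx[: n // 2], idx[n // 2:]

K_train = K_full[np.ix_(train_idx, train_idx)]  # shape (n_train, n_train)
K_test = K_full[np.ix_(test_idx, train_idx)]    # shape (n_test, n_train)

# probability=True enables predict_proba() on the fitted classifier.
clf = svm.SVC(kernel='precomputed', probability=True, random_state=0)
clf.fit(K_train, y[train_idx])
probas_ = clf.predict_proba(K_test)

# Same ROC computation as at the bottom of the linked plot_roc example.
fpr, tpr, thresholds = roc_curve(y[test_idx], probas_[:, 1])
roc_auc = auc(fpr, tpr)
print("Area under the ROC curve: %f" % roc_auc)
```

Note that if a data-dependent scale such as `mu` is derived from the full kernel matrix before splitting, information from the test set leaks into training; computing it from the training block only avoids that.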
