I am using this example to plot a ROC curve from SVM classification results: http://scikit-learn.org/0.13/auto_examples/plot_roc.html
However, each data point effectively consists of 4 length-d vectors, combined using a custom kernel function that does not fit the usual K(X, X) paradigm. I therefore have to provide a precomputed kernel to scikit-learn in order to do the classification. It looks something like this:
    import numpy
    import sklearn.metrics.pairwise

    # The function name is only illustrative.
    def combined_kernel(v1, v2, v3, v4, w1, w2, w3, w4):
        # v1..v4: arrays, shape (n, d)
        # w1..w4: floats in [0, 1), with w1 + w2 + w3 + w4 = 1.0
        n = v1.shape[0]
        K = numpy.zeros(shape=(n, n))
        # Same computation for each of the four vector sets
        for v, w in ((v1, w1), (v2, w2), (v3, w3), (v4, w4)):
            chi = sklearn.metrics.pairwise.chi2_kernel(v, v)
            mu = 1.0 / numpy.mean(chi)
            K += w * numpy.exp(-mu * chi)
        return K
The main obstacle to generating the ROC curve (from the link above) seems to be splitting the data into two sets and then calling predict_proba() on the test set. Is it possible to do this in scikit-learn with a precomputed kernel?
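To make the question concrete, here is a minimal sketch of the workflow I have in mind. It is not a confirmed solution. The synthetic data, the plain chi2_kernel Gram matrix standing in for my combined kernel, and the hand-rolled stratified split are all assumptions made only for illustration. The idea is to index the rows and columns of K instead of the rows of X: a (train, train) block for fitting and a (test, train) block for predict_proba().

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.metrics.pairwise import chi2_kernel
from sklearn.svm import SVC

rng = np.random.RandomState(0)
n, d = 80, 5

# Synthetic stand-in data (assumption): non-negative features, two classes.
y = np.array([0, 1] * (n // 2))
X = rng.rand(n, d) + 0.5 * y[:, None]

# Stand-in for the full precomputed (n, n) kernel matrix (assumption).
K = chi2_kernel(X, X)

# Simple stratified split on indices: a quarter of all samples per class for training.
train = np.concatenate([np.where(y == c)[0][: n // 4] for c in (0, 1)])
test = np.setdiff1d(np.arange(n), train)

clf = SVC(kernel="precomputed", probability=True, random_state=0)

# Fit on the (n_train, n_train) block of the kernel matrix.
clf.fit(K[np.ix_(train, train)], y[train])

# Predict with the (n_test, n_train) block: test rows against training columns.
probas = clf.predict_proba(K[np.ix_(test, train)])

# Column 1 holds the probability of clf.classes_[1], i.e. the positive class.
fpr, tpr, thresholds = roc_curve(y[test], probas[:, 1])
roc_auc = auc(fpr, tpr)
print("Area under the ROC curve: %f" % roc_auc)
```

One caveat with my kernel: mu is computed from the mean of the whole matrix, so slicing a kernel built on all the data lets test points influence the scaling. I am not sure whether that matters here.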