Can I extract y-values (data labels) from within cross-validation in scikit-learn?

Question

My text classification pipeline has these steps:

Chunking, with a custom transformer that takes several parameters (input: XML text file; output: a set of documents and the labels for those documents)
Vectorization, with TfidfVectorizer (input: list of documents; output: DxF matrix, where D is the number of documents and F is the number of features)
Sparse-to-dense matrix transformer (input: sparse matrix; output: dense matrix)
Dimensionality reduction, with PCA or a similar technique (input: DxF matrix; output: DxN matrix, where N is a parameter: the number of desired components)
Prediction, using GaussianMixture (input: DxN matrix; output: cluster assignments, i.e. a grouping of the documents)
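
For concreteness, steps 2 to 5 could be wired together roughly as below. This is only a sketch under assumptions: the custom chunker from step 1 is left out (its interface is not shown here), `to_dense` is a hypothetical helper standing in for the sparse-to-dense transformer, and the six toy documents and all parameter values are made up for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.mixture import GaussianMixture
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Hypothetical helper: convert the sparse TF-IDF matrix to a dense array."""
    return X.toarray()


pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),                                  # step 2: D x F sparse matrix
    ("dense", FunctionTransformer(to_dense, accept_sparse=True)),  # step 3: dense D x F
    ("pca", PCA(n_components=2)),                                  # step 4: D x N
    ("gmm", GaussianMixture(n_components=2, random_state=0)),      # step 5: cluster assignment
])

# Toy documents standing in for the chunker output (illustrative only).
docs = [
    "the cat sat on the mat",
    "a cat and a kitten play",
    "cats chase mice at night",
    "stock prices fell sharply today",
    "the market closed lower on friday",
    "investors sold shares and bonds",
]

pipeline.fit(docs)
clusters = pipeline.predict(docs)  # one cluster index per document
```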

Each of these steps has so many parameters that it is impractical to try all possible parameter combinations by hand, so I am trying to do a grid search with cross-validation using GridSearchCV(). It can use a scorer to compare the predicted groupings with the original groupings (the labels). (I am using the scorer metrics.adjusted_rand_score().)
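
The usual way to set up such a search is sketched below. It assumes the documents and labels already exist as plain Python lists before the search starts (for example, produced by running the chunker beforehand), so it does not cover a chunker that sits inside the pipeline. The toy data, the `to_dense` helper and the parameter grid are all made up for illustration.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import adjusted_rand_score, make_scorer
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Hypothetical helper: convert the sparse TF-IDF matrix to a dense array."""
    return X.toarray()


pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("dense", FunctionTransformer(to_dense, accept_sparse=True)),
    ("pca", PCA()),
    ("gmm", GaussianMixture(n_components=2, random_state=0)),
])

# Step parameters are addressed as "<step name>__<parameter name>".
param_grid = {
    "pca__n_components": [2, 3],
    "gmm__covariance_type": ["diag", "spherical"],
}

# Toy documents and labels, assumed to be known before the search starts.
docs = [
    "the cat sat on the mat", "stock prices fell sharply today",
    "a cat and a kitten play", "the market closed lower on friday",
    "cats chase mice at night", "investors sold shares and bonds",
    "my cat sleeps all day", "bond yields rose this week",
    "kittens like warm milk", "shares rallied after earnings",
    "the old cat purred softly", "traders watched the stock index",
]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# The scorer calls pipeline.predict(X_test) and compares the predicted
# cluster indices with the held-out labels; the adjusted Rand index does
# not care how the clusters are numbered.
search = GridSearchCV(
    pipeline,
    param_grid,
    scoring=make_scorer(adjusted_rand_score),
    cv=3,
)
search.fit(docs, labels)

best_params = search.best_params_
```

The labels are passed to `fit` and forwarded to every pipeline step, but the unsupervised steps ignore them; only the scorer uses them.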

python scikit-learn machine-learning nlp cross-validation

Jono 17 . '16 21:32