Grid search parameters and cross-validated data sets with the KNN classifier in scikit-learn

I am trying to build my first KNN classifier with scikit-learn. I am following the User Guide and other online examples, but there are some things I'm not sure about. Assume the following throughout this post.

X = data
Y = target

1) On most machine learning pages I read, it seems you need a training set, a validation set, and a test set. As far as I understand, cross-validation lets you combine the training and validation sets to train the model, and then you test it on the test set to get a score. However, I have seen in the docs that in many cases you can simply cross-validate on the entire data set and then report the CV score as the accuracy. I understand that in an ideal world you would test on separate data, but if this is legitimate, I would like to cross-validate on my entire data set and report those scores.

2) So, let's start the process.

I define my KNN classifier as follows:

knn = KNeighborsClassifier(algorithm = 'brute')

I then search for the best n_neighbors using:

clf = GridSearchCV(knn, parameters, cv=5)

Now if I call

clf.fit(X,Y)

it will perform the grid search. Then, after checking

clf.best_params_

will

clf.score(X,Y)

give me the accuracy using the best parameter found by the search?

For example, if clf.best_params_ gives n_neighbors = 14, would running

knn2 = KNeighborsClassifier(n_neighbors = 14, algorithm='brute')
cross_val_score(knn2, X, Y, cv=5)

give the same scores? In other words, does cross_val_score on this knn simply repeat the cross-validation that clf.fit already performed for that parameter value?

3) I understand that the "proper" way to do this would be:

X_train, X_test, Y_train, Y_test = train_test_split(X, Y)

and then:

knn = KNeighborsClassifier(algorithm = 'brute')
clf = GridSearchCV(knn, parameters, cv=5)
clf.fit(X_train,Y_train)
clf.best_params_

clf.score(X_test,Y_test)

Is this the correct approach?


Yes, the approach in 3) is the right one. If you tune the hyperparameters by cross-validating on the whole data set and then report those same CV scores as your accuracy, the estimate is optimistic, because every sample was used, one way or another, to select the model.

Also note that GridSearchCV (with refit=True, the default) retrains the best estimator on all of the data you pass to fit.
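That train/test protocol can be sketched end to end; the synthetic data from make_classification and the specific parameter grid below are my own assumptions, just to make the snippet self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data; replace with your own X, Y.
X, Y = make_classification(n_samples=300, random_state=0)

# Hold out a test set that the grid search never sees.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(algorithm='brute')
parameters = {'n_neighbors': [3, 5, 7, 9, 11, 13]}
clf = GridSearchCV(knn, parameters, cv=5)
clf.fit(X_train, Y_train)          # CV and the final refit use only X_train

print(clf.best_params_)            # hyperparameter chosen by CV
print(clf.score(X_test, Y_test))   # estimate on data the search never touched
```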

+7
  • Yes, you can cross-validate on your entire data set; it is viable. But I would still suggest splitting the data into at least 2 sets, one for CV and one for final testing.

  • .score returns a float: the accuracy of the best estimator (the one refit by GridSearchCV) evaluated on the X, Y you pass in.

  • Yes, if the best parameter turns out to be 14, then knn2 with n_neighbors = 14 and cross_val_score will reproduce the same per-fold scores, provided the folds are split the same way (with cv=5 and no shuffling the splits are deterministic). Note, however, that those per-fold CV scores are not the same thing as clf.score(X, Y), which is computed with the refit estimator.
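A sketch of checking this equivalence between GridSearchCV's per-fold scores and cross_val_score (the synthetic make_classification data is my assumption; only the values 5 and 14 are searched to keep it short):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, Y = make_classification(n_samples=200, random_state=0)

clf = GridSearchCV(KNeighborsClassifier(algorithm='brute'),
                   {'n_neighbors': [5, 14]}, cv=5)
clf.fit(X, Y)

# Per-fold validation scores GridSearchCV already computed for n_neighbors=14:
i = clf.cv_results_['params'].index({'n_neighbors': 14})
grid_fold_scores = [clf.cv_results_['split%d_test_score' % k][i]
                    for k in range(5)]

# Re-running the CV by hand reproduces them, because with cv=5 and no
# shuffling the folds are split the same deterministic way:
manual_fold_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=14, algorithm='brute'), X, Y, cv=5)

print(np.allclose(grid_fold_scores, manual_fold_scores))  # True
```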

Hope this helps :)

+6

You do not need to carve out a validation set yourself; GridSearchCV does that for you. With 5-fold cross-validation, each time you call clf.fit(X, y) the data is repeatedly split into a training part (80%) and a validation part (20%).
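A quick way to see those proportions (toy balanced data of my own; for a classifier with an integer cv, scikit-learn uses stratified folds under the hood):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data: 100 samples, 2 balanced classes.
X = np.arange(100).reshape(-1, 1)
y = np.array([0, 1] * 50)

# cv=5 for a classifier means StratifiedKFold(n_splits=5):
# each iteration trains on 4 folds (80%) and validates on the 5th (20%).
for train_idx, val_idx in StratifiedKFold(n_splits=5).split(X, y):
    print(len(train_idx), len(val_idx))  # 80 20 on every iteration
```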

After fitting, you can inspect the details in clf.cv_results_. In particular, look at mean_test_score (there is one value per n_neighbors candidate). With return_train_score=True you also get 'mean_train_score', which is useful for spotting over- or underfitting. One more note: since KNN is a distance-based ML algorithm, you should scale the features, e.g. with a StandardScaler:

    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    pipe = Pipeline([
        ('sc', StandardScaler()),
        ('knn', KNeighborsClassifier(algorithm='brute'))
    ])
    params = {
        'knn__n_neighbors': [3, 5, 7, 9, 11]  # usually odd numbers
    }
    clf = GridSearchCV(estimator=pipe,
                       param_grid=params,
                       cv=5,
                       return_train_score=True)  # Turn on cv train scores
    clf.fit(X, y)

Note: you do not have to retrain a new model with the best n_neighbors by hand; with the default refit=True, GridSearchCV refits the best estimator on the whole data set once the search is done. It is available as clf.best_estimator_, and clf itself delegates predict and score to it.
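A minimal illustration of the refit behavior (made-up data again; the grid values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

clf = GridSearchCV(KNeighborsClassifier(algorithm='brute'),
                   {'n_neighbors': [3, 5, 7]}, cv=5)
clf.fit(X, y)

# With refit=True (the default) the winning model is already retrained
# on all of X, y -- no need to build a new KNeighborsClassifier by hand.
best = clf.best_estimator_
print(best.n_neighbors == clf.best_params_['n_neighbors'])  # True
print(clf.predict(X[:3]))  # clf delegates predict/score to best_estimator_
```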

+1

Source: https://habr.com/ru/post/1660956/
