Grid search parameters and cross-validated data sets with the KNN classifier in scikit-learn

I am trying to build my first KNN classifier with scikit-learn. I am following the User Guide and other online examples, but there are some things I'm not sure about. Assume the following throughout this post.

X = data
Y = target

1) On most machine learning pages I read, it seems you need a training set, a validation set, and a test set. As far as I understand, cross-validation lets you combine the training and validation sets to train the model, and then you test it on the test set to get a score. However, I have seen in the docs that in many cases you can simply cross-validate on the entire data set and then report the CV score as the accuracy. I understand that in an ideal world you would test on separate data, but if this is legitimate, I would like to cross-validate on my entire data set and report those scores.

2) So, let's start the process.

I define my KNN classifier as follows:

knn = KNeighborsClassifier(algorithm = 'brute')

I then search for the best n_neighbors using:

clf = GridSearchCV(knn, parameters, cv=5)

Now if I call

clf.fit(X,Y)

it will perform the grid search. Then, after checking

clf.best_params_

will

clf.score(X,Y)

give me the accuracy using the best parameter found by the search?

For example, if clf.best_params_ gives n_neighbors = 14, would running

knn2 = KNeighborsClassifier(n_neighbors = 14, algorithm='brute')
cross_val_score(knn2, X, Y, cv=5)

give the same scores? In other words, does cross_val_score on this knn simply repeat the cross-validation that clf.fit already performed for that parameter value?

3) I understand that the "proper" way to do this would be:

X_train, X_test, Y_train, Y_test = train_test_split(X, Y)

and then:

knn = KNeighborsClassifier(algorithm = 'brute')
clf = GridSearchCV(knn, parameters, cv=5)
clf.fit(X_train,Y_train)
clf.best_params_

clf.score(X_test,Y_test)

Is this the correct approach?


Yes, the approach in 3) is the right one. If you tune the hyperparameters by cross-validating on the whole data set and then report those same CV scores as your accuracy, the estimate is optimistic, because every sample was used, one way or another, to select the model.

Also note that GridSearchCV (with refit=True, the default) retrains the best estimator on all of the data you pass to fit.
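That train/test protocol can be sketched end to end; the synthetic data from make_classification and the specific parameter grid below are my own assumptions, just to make the snippet self-contained:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data; replace with your own X, Y.
X, Y = make_classification(n_samples=300, random_state=0)

# Hold out a test set that the grid search never sees.
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=0)

knn = KNeighborsClassifier(algorithm='brute')
parameters = {'n_neighbors': [3, 5, 7, 9, 11, 13]}
clf = GridSearchCV(knn, parameters, cv=5)
clf.fit(X_train, Y_train)          # CV and the final refit use only X_train

print(clf.best_params_)            # hyperparameter chosen by CV
print(clf.score(X_test, Y_test))   # estimate on data the search never touched
```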

+7
  • Yes, you can cross-validate on your entire data set; it is viable. But I would still suggest splitting the data into at least 2 sets, one for CV and one for final testing.

  • .score returns a float: the accuracy of the best estimator (the one refit by GridSearchCV) evaluated on the X, Y you pass in.

  • Yes, if the best parameter turns out to be 14, then knn2 with n_neighbors = 14 and cross_val_score will reproduce the same per-fold scores, provided the folds are split the same way (with cv=5 and no shuffling the splits are deterministic). Note, however, that those per-fold CV scores are not the same thing as clf.score(X, Y), which is computed with the refit estimator.
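A sketch of checking this equivalence between GridSearchCV's per-fold scores and cross_val_score (the synthetic make_classification data is my assumption; only the values 5 and 14 are searched to keep it short):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, Y = make_classification(n_samples=200, random_state=0)

clf = GridSearchCV(KNeighborsClassifier(algorithm='brute'),
                   {'n_neighbors': [5, 14]}, cv=5)
clf.fit(X, Y)

# Per-fold validation scores GridSearchCV already computed for n_neighbors=14:
i = clf.cv_results_['params'].index({'n_neighbors': 14})
grid_fold_scores = [clf.cv_results_['split%d_test_score' % k][i]
                    for k in range(5)]

# Re-running the CV by hand reproduces them, because with cv=5 and no
# shuffling the folds are split the same deterministic way:
manual_fold_scores = cross_val_score(
    KNeighborsClassifier(n_neighbors=14, algorithm='brute'), X, Y, cv=5)

print(np.allclose(grid_fold_scores, manual_fold_scores))  # True
```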

Hope this helps :)

+6

You do not need to carve out a validation set yourself; GridSearchCV does that for you. With 5-fold cross-validation, each time you call clf.fit(X, y) the data is repeatedly split into a training part (80%) and a validation part (20%).
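A quick way to see those proportions (toy balanced data of my own; for a classifier with an integer cv, scikit-learn uses stratified folds under the hood):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data: 100 samples, 2 balanced classes.
X = np.arange(100).reshape(-1, 1)
y = np.array([0, 1] * 50)

# cv=5 for a classifier means StratifiedKFold(n_splits=5):
# each iteration trains on 4 folds (80%) and validates on the 5th (20%).
for train_idx, val_idx in StratifiedKFold(n_splits=5).split(X, y):
    print(len(train_idx), len(val_idx))  # 80 20 on every iteration
```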

After fitting, you can inspect the details in clf.cv_results_. In particular, look at mean_test_score (there is one value per n_neighbors candidate). With return_train_score=True you also get 'mean_train_score', which is useful for spotting over- or underfitting. One more note: since KNN is a distance-based ML algorithm, you should scale the features, e.g. with a StandardScaler:

    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    pipe = Pipeline([
        ('sc', StandardScaler()),
        ('knn', KNeighborsClassifier(algorithm='brute'))
    ])
    params = {
        'knn__n_neighbors': [3, 5, 7, 9, 11]  # usually odd numbers
    }
    clf = GridSearchCV(estimator=pipe,
                       param_grid=params,
                       cv=5,
                       return_train_score=True)  # Turn on cv train scores
    clf.fit(X, y)

Note: you do not have to retrain a new model with the best n_neighbors by hand; with the default refit=True, GridSearchCV refits the best estimator on the whole data set once the search is done. It is available as clf.best_estimator_, and clf itself delegates predict and score to it.
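A minimal illustration of the refit behavior (made-up data again; the grid values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)

clf = GridSearchCV(KNeighborsClassifier(algorithm='brute'),
                   {'n_neighbors': [3, 5, 7]}, cv=5)
clf.fit(X, y)

# With refit=True (the default) the winning model is already retrained
# on all of X, y -- no need to build a new KNeighborsClassifier by hand.
best = clf.best_estimator_
print(best.n_neighbors == clf.best_params_['n_neighbors'])  # True
print(clf.predict(X[:3]))  # clf delegates predict/score to best_estimator_
```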

+1

Source: https://habr.com/ru/post/1660956/
