I am trying to run my first KNN classifier using scikit-learn. I am following the User Guide and other online examples, but there are a few things I am unsure about. Throughout this post, assume:
X = data
Y = target
1) On most machine learning pages I read, it seems you need a training set, a validation set, and a test set. As far as I understand, cross-validation lets you combine the training and validation sets to train the model, and you then test it on the test set to get an estimate of performance. However, I have seen in the docs that in many cases you can simply cross-validate on the entire data set and then report the CV score as the accuracy. I understand that in an ideal world you would test on separate data, but if this is legitimate, I would like to cross-validate on my entire data set and report those scores.
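To make option (1) concrete, here is a minimal sketch of what I mean by "cross-validate on the entire data set and report the CV score" (the iris data stands in for my X and Y; the classifier settings mirror what I use below):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# iris is only a stand-in for my actual X, Y
X, Y = load_iris(return_X_y=True)

knn = KNeighborsClassifier(algorithm='brute')
scores = cross_val_score(knn, X, Y, cv=5)  # one accuracy per fold
mean_cv_accuracy = scores.mean()
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```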
2) So, starting the process:
I define my KNN classifier as follows:
knn = KNeighborsClassifier(algorithm = 'brute')
I am looking for the best n_neighbors using
clf = GridSearchCV(knn, parameters, cv=5)
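For completeness, here is the full setup as I run it. The `parameters` grid shown is an assumption on my part for this post (a plain search over n_neighbors); substitute your own range:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, Y = load_iris(return_X_y=True)  # stand-in for my data

# Hypothetical grid: search n_neighbors from 1 to 30
parameters = {'n_neighbors': list(range(1, 31))}

knn = KNeighborsClassifier(algorithm='brute')
clf = GridSearchCV(knn, parameters, cv=5)
clf.fit(X, Y)
best_k = clf.best_params_['n_neighbors']
```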
Now if I say
clf.fit(X,Y)
I can then inspect
clf.best_params_
and call
clf.score(X,Y)
but will that score give me the cross-validated accuracy I described in point 1, or is it computed on data the refit model has already seen?
Say clf.best_params_ returns n_neighbors = 14; should I then run:
knn2 = KNeighborsClassifier(n_neighbors = 14, algorithm='brute')
cross_val_score(knn2, X, Y, cv=5)
Or is this redundant: does clf.fit already perform this cross-validation internally, so that cross_val_score on knn2 just repeats what GridSearchCV computed for the best knn?
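To show exactly what I am comparing, here is a sketch of both paths side by side (iris stands in for my data, and the parameter grid is my assumption). My understanding, which I would like confirmed, is that `clf.best_score_` already holds the mean cross-validated accuracy of the winning parameters:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, Y = load_iris(return_X_y=True)          # stand-in for my data
parameters = {'n_neighbors': list(range(1, 31))}  # hypothetical grid

clf = GridSearchCV(KNeighborsClassifier(algorithm='brute'), parameters, cv=5)
clf.fit(X, Y)

# Path A: refit a fresh KNN with the winning k and cross-validate again
knn2 = KNeighborsClassifier(n_neighbors=clf.best_params_['n_neighbors'],
                            algorithm='brute')
manual_cv = cross_val_score(knn2, X, Y, cv=5).mean()

# Path B: the mean CV accuracy GridSearchCV already computed for those
# same parameters -- is the extra cross_val_score call redundant?
print(clf.best_score_, manual_cv)
```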
3) Alternatively, is the "proper" way to do this the following: first split the data into X_train, X_test, Y_train, Y_test (e.g. with train_test_split), then
knn = KNeighborsClassifier(algorithm = 'brute')
clf = GridSearchCV(knn, parameters, cv=5)
clf.fit(X_train,Y_train)
clf.best_params_
clf.score(X_test,Y_test)
Is this correct, and is clf.score(X_test, Y_test) then the accuracy I should report?
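The steps in (3) end to end, as I understand them, would look like this (iris and the split fraction are stand-ins; the parameter grid is again my assumption):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, Y = load_iris(return_X_y=True)  # stand-in for my data

# Hold out a test set; tune only on the training portion
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0, stratify=Y)

parameters = {'n_neighbors': list(range(1, 31))}  # hypothetical grid
knn = KNeighborsClassifier(algorithm='brute')
clf = GridSearchCV(knn, parameters, cv=5)
clf.fit(X_train, Y_train)      # CV happens inside the training split only

# GridSearchCV refits the best model on all of X_train, Y_train,
# so this scores that refit model on genuinely unseen data
test_accuracy = clf.score(X_test, Y_test)
```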
Sorry if this is worded confusingly. I am quite new to machine learning, and I have been reading a lot of conflicting information, so I wanted to check that my understanding is right.
Note: I am running the grid search with the brute-force KNN on a small data set, so runtime is not a concern.