Scikit-learn GridSearchCV with multiple repetitions

I am trying to find the best set of parameters for an SVR model. I would like to use GridSearchCV over different values of C. However, from a previous test I noticed that the split into training and test sets has a huge impact on overall performance (r2 in this case). To address this, I would like to repeat 5-fold cross-validation 10 times (10 x 5CV). Is there a built-in way to do this with GridSearchCV?
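To make the problem concrete, here is a minimal sketch that keeps the model fixed and changes only the random train/test split. It assumes synthetic data from make_regression as a stand-in for the real dataset, and a fixed C=100; both are illustrative choices, not values from the question.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in for the asker's data (assumption)
X, y = make_regression(n_samples=200, n_features=5, noise=20.0, random_state=0)

r2_per_split = []
for seed in range(10):
    # Only the random split changes between iterations; model and C stay fixed
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = SVR(kernel="rbf", C=100.0).fit(X_train, y_train)
    r2_per_split.append(r2_score(y_test, model.predict(X_test)))

print("r2 per split:", np.round(r2_per_split, 3))
print("min {:.3f}  max {:.3f}".format(min(r2_per_split), max(r2_per_split)))
```

The spread between the minimum and maximum r2 is the split-dependence the question describes, which is why averaging over many splits is attractive.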

QUICK SOLUTION:

Following the idea presented in the official scikit-learn documentation, here is a quick solution:

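import numpy as np
from sklearn.model_selection import GridSearchCV, KFold

# svr, p_grid, X and y are assumed to be defined earlier
NUM_TRIALS = 10
scores = []
for i in range(NUM_TRIALS):
    cv = KFold(n_splits=5, shuffle=True, random_state=i)
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=cv)
    clf.fit(X, y)  # the search must be fitted before best_score_ exists
    scores.append(clf.best_score_)
print("Average Score: {0} STD: {1}".format(np.mean(scores), np.std(scores)))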
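A self-contained, runnable version of the loop above might look like the sketch below. The dataset (make_regression), the grid {"C": [1, 10, 100]} and scoring="r2" are assumptions chosen to match the question, not part of the original answer. It also records which C wins in each trial.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVR

# Hypothetical stand-ins for the asker's data and grid
X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)
svr = SVR(kernel="rbf")
p_grid = {"C": [1, 10, 100]}

NUM_TRIALS = 10
scores = []
best_Cs = []
for i in range(NUM_TRIALS):
    # A different shuffle seed per trial gives a different 5-fold partition
    cv = KFold(n_splits=5, shuffle=True, random_state=i)
    clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=cv, scoring="r2")
    clf.fit(X, y)
    scores.append(clf.best_score_)
    best_Cs.append(clf.best_params_["C"])

print("Average r2: {0:.3f} STD: {1:.3f}".format(np.mean(scores), np.std(scores)))
print("Best C per trial:", best_Cs)
```

Note that each trial may select a different C, so the averaged best_score_ mixes scores of possibly different configurations; it mainly tells you how stable the search is across partitions.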

This is called nested cross_validation.

Here is an example:

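from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X_iris, y_iris = load_iris(return_X_y=True)

# SVC suits the iris classification data; for a regression target use sklearn.svm.SVR
svr = SVC(kernel="rbf")
c_grid = {"C": [1, 10, 100, ...  ]}  # replace ... with the remaining C values to try

# Other CV techniques also work here: GroupKFold, LeaveOneOut, LeaveOneGroupOut, etc.

# Seed for the shuffles; change it (e.g. in a loop over trials) to repeat the procedure
i = 0

# To be used within GridSearchCV (5 in your case)
inner_cv = KFold(n_splits=5, shuffle=True, random_state=i)

# To be used in outer CV (you asked for 10)
outer_cv = KFold(n_splits=10, shuffle=True, random_state=i)

# Non-nested parameter search and scoring
clf = GridSearchCV(estimator=svr, param_grid=c_grid, cv=inner_cv)
clf.fit(X_iris, y_iris)
non_nested_score = clf.best_score_

# Pass the GridSearchCV estimator to cross_val_score
# This will be your required 10 x 5 CVs:
# 10 for the outer CV and 5 for the grid search's internal CV
clf = GridSearchCV(estimator=svr, param_grid=c_grid, cv=inner_cv)
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv).mean()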

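Here is what happens when a GridSearchCV() object is passed to cross_val_score():

  1. clf = GridSearchCV(estimator, param_grid, cv=inner_cv).
  2. Pass clf, X, y and outer_cv to cross_val_score.
  3. Inside cross_val_score, X is split into X_outer_train and X_outer_test using outer_cv. The same happens to y.
  4. X_outer_test is held back, and X_outer_train is passed to clf for fit() (GridSearchCV in our case). From here on, call X_outer_train X_inner, since it is passed to the inner estimator; likewise, y_outer_train becomes y_inner.
  5. X_inner is now split into X_inner_train and X_inner_test using inner_cv inside GridSearchCV. The same happens to y.
  6. The grid search trains each candidate on X_inner_train and y_inner_train and scores it on X_inner_test and y_inner_test.
  7. Steps 5 and 6 are repeated for inner_cv_iters iterations (5 here).
  8. The hyperparameters with the best average score over all inner iterations are used for clf.best_estimator_, which is refitted on all of X_inner (i.e. X_outer_train).
  9. This clf (gridsearch.best_estimator_) is then scored on X_outer_test and y_outer_test.
  10. Steps 3 to 9 are repeated for outer_cv_iters iterations (10 here), and cross_val_score returns an array of scores.
  11. Finally, mean() turns that array into nested_score.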
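The steps above can be reproduced with an explicit outer loop, which is a useful sanity check that cross_val_score really does what the list describes. This is a sketch assuming the iris data and a small illustrative grid {"C": [1, 10, 100]}; sklearn.base.clone gives each outer fold a fresh, unfitted copy of the search, as cross_val_score does internally.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=10, shuffle=True, random_state=0)
clf = GridSearchCV(estimator=SVC(kernel="rbf"), param_grid={"C": [1, 10, 100]}, cv=inner_cv)

manual_scores = []
for outer_train, outer_test in outer_cv.split(X, y):  # step 3
    # Steps 4-8: GridSearchCV splits X[outer_train] with inner_cv,
    # picks the best C and refits it on all of X[outer_train]
    search = clone(clf).fit(X[outer_train], y[outer_train])
    # Step 9: score the refitted best estimator on the held-back outer fold
    manual_scores.append(search.score(X[outer_test], y[outer_test]))

manual_scores = np.array(manual_scores)
library_scores = cross_val_score(clf, X, y, cv=outer_cv)
print(manual_scores.mean(), library_scores.mean())  # step 11
```

Because both KFold objects use a fixed random_state, the manual loop and cross_val_score see identical splits and should produce identical per-fold scores.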

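You can supply different cross-validation generators to GridSearchCV. For binary or multiclass classification problems the default is StratifiedKFold; otherwise it uses KFold. But you can supply your own. In your case, it looks like you want RepeatedKFold or RepeatedStratifiedKFold.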

# RepeatedKFold suits regression; use RepeatedStratifiedKFold for classification
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# Define svr here
...

# Specify cross-validation generator, in this case (10 x 5CV)
cv = RepeatedKFold(n_splits=5, n_repeats=10)
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=cv)

# Continue as usual
clf.fit(...)
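A runnable sketch of the same idea follows, assuming synthetic data from make_regression, an illustrative grid {"C": [1, 10, 100]} and scoring="r2" (none of these come from the answer). It shows that every candidate C is scored on all 5 x 10 = 50 splits, and that cv_results_ reports the mean and standard deviation over those splits.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, RepeatedKFold
from sklearn.svm import SVR

# Hypothetical data and grid standing in for the asker's
X, y = make_regression(n_samples=150, n_features=5, noise=10.0, random_state=0)
p_grid = {"C": [1, 10, 100]}

# 10 repetitions of 5-fold CV = 50 train/test splits per candidate
cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
clf = GridSearchCV(estimator=SVR(kernel="rbf"), param_grid=p_grid, cv=cv, scoring="r2")
clf.fit(X, y)

print("splits per candidate:", cv.get_n_splits())
print("best C:", clf.best_params_["C"])
# mean_test_score / std_test_score are computed over all 50 splits
for C, mean, std in zip(clf.cv_results_["param_C"],
                        clf.cv_results_["mean_test_score"],
                        clf.cv_results_["std_test_score"]):
    print("C={}: r2 = {:.3f} +/- {:.3f}".format(C, mean, std))
```

Unlike the loop of separate searches, this picks a single C based on its average over all 50 splits, which directly addresses the split-dependence in the question.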

Source: https://habr.com/ru/post/1669710/

