KFold provides train/test indices to split data into training and test sets. It divides the dataset into k
consecutive folds (without shuffling by default). Each fold is then used once as the test set, while the remaining k - 1
folds form the training set (source).
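To make "k consecutive folds" concrete, here is a minimal sketch of how the fold boundaries can be computed. The helper `consecutive_fold_bounds` is hypothetical (not part of scikit-learn); it assumes the sizing rule that, as far as I know, scikit-learn's KFold uses when the number of samples is not divisible by k: the first `n_samples % k` folds get one extra sample.

```python
import numpy as np

def consecutive_fold_bounds(n_samples, k):
    """Return (start, stop) pairs for k consecutive, unshuffled folds.

    Hypothetical helper: when n_samples is not divisible by k,
    the first n_samples % k folds are one sample larger.
    """
    fold_sizes = np.full(k, n_samples // k, dtype=int)
    fold_sizes[: n_samples % k] += 1          # spread the remainder over the first folds
    stops = np.cumsum(fold_sizes)             # end index (exclusive) of each fold
    starts = stops - fold_sizes               # start index of each fold
    return list(zip(starts.tolist(), stops.tolist()))

print(consecutive_fold_bounds(12, 3))  # [(0, 4), (4, 8), (8, 12)]
print(consecutive_fold_bounds(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

With 12 samples every fold has 4 indices; with 10 samples the first fold absorbs the leftover sample.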
Suppose you have data indices from 1 to 10. If you use n_folds=k, then in the i-th iteration (i <= k) you get the i-th
fold as the test indices and the remaining k - 1 folds (everything except fold i) together as the train indices.
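The iteration described above can be sketched in plain NumPy for the indices 1 to 10. The generator `kfold_split` is a hypothetical illustration, not scikit-learn's code; it relies on `np.array_split`, which cuts an array into k consecutive chunks (the first `len % k` chunks are one element longer), so fold i becomes the test set and the other k - 1 folds are concatenated into the training set.

```python
import numpy as np

def kfold_split(data, k):
    """Yield (train, test) arrays: fold i is the test set, the other k - 1 folds are train.

    Hypothetical helper built on np.array_split (consecutive chunks, no shuffling).
    """
    folds = np.array_split(np.asarray(data), k)
    for i in range(k):
        test = folds[i]
        # Join every fold except the i-th one to form the training set.
        train = np.concatenate(folds[:i] + folds[i + 1:])
        yield train, test

for i, (train, test) in enumerate(kfold_split(range(1, 11), 3), start=1):
    print(f"Fold {i}: train={train.tolist()} test={test.tolist()}")
# Fold 1: train=[5, 6, 7, 8, 9, 10] test=[1, 2, 3, 4]
# Fold 2: train=[1, 2, 3, 4, 8, 9, 10] test=[5, 6, 7]
# Fold 3: train=[1, 2, 3, 4, 5, 6, 7] test=[8, 9, 10]
```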
Example
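```python
# sklearn < 0.20 API: the sklearn.cross_validation module was removed in 0.20
import numpy as np
from sklearn.cross_validation import KFold

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
kf = KFold(12, n_folds=3)  # 12 samples split into 3 consecutive folds
for train_index, test_index in kf:
    print(train_index, test_index)
```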
Output
Fold 1: [4 5 6 7 8 9 10 11] [0 1 2 3]
Fold 2: [0 1 2 3 8 9 10 11] [4 5 6 7]
Fold 3: [0 1 2 3 4 5 6 7] [8 9 10 11]
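KFold only yields positions, not the samples themselves. A small sketch of how those indices select the data in `x`, assuming `x` is converted to a NumPy array first (a plain Python list cannot be indexed with an index array). The index arrays are copied by hand from "Fold 1" of the output above.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# Indices copied from "Fold 1" of the output above.
train_index = np.array([4, 5, 6, 7, 8, 9, 10, 11])
test_index = np.array([0, 1, 2, 3])

# Fancy indexing with the index arrays picks out the actual samples.
x_train, x_test = x[train_index], x[test_index]
print(x_train)  # [ 5  6  7  8  9 10 11 12]
print(x_test)   # [1 2 3 4]
```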
Update for sklearn 0.20:
KFold now lives in the sklearn.model_selection module; the old sklearn.cross_validation module was removed in version 0.20.
To import KFold in sklearn 0.20+, use from sklearn.model_selection import KFold
. See the current KFold documentation (source).
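For reference, a sketch of the same example with the `sklearn.model_selection` API (as far as I know, available since 0.18). The main differences are that the fold count is passed as `n_splits` and the index pairs come from `kf.split(X)` instead of iterating over the KFold object. The second splitter shows the optional permutation mentioned at the top: `shuffle=True` shuffles the indices before cutting the folds, and `random_state` (an arbitrary value here) makes it reproducible.

```python
import numpy as np
from sklearn.model_selection import KFold  # sklearn >= 0.18; required from 0.20 on

X = np.arange(1, 13).reshape(-1, 1)  # 12 samples with one feature each

# The fold count is now n_splits, and the index pairs come from kf.split(X).
kf = KFold(n_splits=3)
splits = list(kf.split(X))
for i, (train_index, test_index) in enumerate(splits, start=1):
    print(f"Fold {i}:", train_index, test_index)

# shuffle=True permutes the sample indices before cutting the folds;
# random_state fixes the permutation so the run is reproducible.
kf_shuffled = KFold(n_splits=3, shuffle=True, random_state=0)
shuffled_splits = list(kf_shuffled.split(X))
```

Without shuffling this reproduces the three folds shown in the output above; with shuffling each index still lands in exactly one test fold, just not in consecutive order.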