Difference between StratifiedKFold and StratifiedShuffleSplit in sklearn

As the name I wonder what is the difference between

StratifiedKFold with shuffle = True

StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

and

StratifiedShuffleSplit

StratifiedShuffleSplit(n_splits=10, test_size=’default’, train_size=None, random_state=0)

and what are the benefits of using StratifiedShuffleSplit

+4
source share
1 answer

In KFolds, each test set should not overlap, even with shuffling. Using KFolds and shuffle, data is shuffled once at the beginning, and then divided by the number of desired partitions. Test data is always one of the sections, the rest is train data.

ShuffleSplit , . , .

. . ShuffleSplit.

splits = 5

tx = range(10)
ty = [0] * 5 + [1] * 5

from sklearn.model_selection import StratifiedShuffleSplit, StratifiedKFold
from sklearn import datasets

kfold = StratifiedKFold(n_splits=splits, shuffle=True, random_state=42)
shufflesplit = StratifiedShuffleSplit(n_splits=splits, random_state=42, test_size=2)

print("KFold")
for train_index, test_index in kfold.split(tx, ty):
    print("TRAIN:", train_index, "TEST:", test_index)

print("Shuffle Split")
for train_index, test_index in shufflesplit.split(tx, ty):
    print("TRAIN:", train_index, "TEST:", test_index)

:

KFold
TRAIN: [0 2 3 4 5 6 7 9] TEST: [1 8]
TRAIN: [0 1 2 3 5 7 8 9] TEST: [4 6]
TRAIN: [0 1 3 4 5 6 8 9] TEST: [2 7]
TRAIN: [1 2 3 4 6 7 8 9] TEST: [0 5]
TRAIN: [0 1 2 4 5 6 7 8] TEST: [3 9]
Shuffle Split
TRAIN: [8 4 1 0 6 5 7 2] TEST: [3 9]
TRAIN: [7 0 3 9 4 5 1 6] TEST: [8 2]
TRAIN: [1 2 5 6 4 8 9 0] TEST: [3 7]
TRAIN: [4 6 7 8 3 5 1 2] TEST: [9 0]
TRAIN: [7 2 6 5 4 3 0 9] TEST: [1 8]

, , KFolds , ShuffleSplit 2 /. , .

+11

Source: https://habr.com/ru/post/1684778/


All Articles