How does the max_samples keyword for the Bagging classifier affect the number of samples used for each of the base estimators?

I want to understand how the max_samples value for the Bagging classifier affects the number of samples used for each of the base estimators.

This is the output of GridSearchCV:

GridSearchCV(cv=5, error_score='raise',
       estimator=BaggingClassifier(base_estimator=DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            presort=False, random_state=1, spl... n_estimators=100, n_jobs=-1, oob_score=False,
         random_state=1, verbose=2, warm_start=False),
       fit_params={}, iid=True, n_jobs=-1,
       param_grid={'max_features': [0.6, 0.8, 1.0], 'max_samples': [0.6, 0.8, 1.0]},
       pre_dispatch='2*n_jobs', refit=True, scoring=None, verbose=2)

Here I look up the best score and parameters:

print(gs5.best_score_, gs5.best_params_)
0.828282828283 {'max_features': 0.6, 'max_samples': 1.0}

Now I select the best estimator from the grid search and try to see how many samples this particular Bagging classifier used for each of its 100 base decision tree estimators.

import numpy as np

val = []
for i in np.arange(100):
    # estimators_samples_[i] is a boolean mask over the training set;
    # bincount(...)[1] counts the True entries, i.e. the unique samples drawn
    x = np.bincount(gs5.best_estimator_.estimators_samples_[i])[1]
    val.append(x)
print(np.max(val))
print(np.mean(val), np.std(val))

587
563.92 10.3399032877

Now the size of the training set is 891. Since CV is 5, only 891 * 0.8 = 712.8 samples should be available for each fit during cross-validation, and since max_samples is 1.0, shouldn't 712.8 * 1.0 = 712.8 be the number of samples for each base estimator, or something close to that?

So why am I seeing 564 +/- 10, with a maximum of 587, instead of something close to 712?


Once the search is finished, GridSearchCV refits the best estimator on the entire training set (this is the default, refit=True), not on a CV fold. So the cross-validation split plays no role in the numbers you inspected.

That means the BaggingClassifier returned by GridSearchCV was fit on all 891 samples. With max_samples = 1.0, each base estimator draws 891 samples. However, it draws them with replacement, so the number of unique samples per estimator is smaller. This would only change if you set the bootstrap parameter of BaggingClassifier to False.
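A quick simulation (plain NumPy, independent of the classifier itself) makes this concrete: if each of 100 hypothetical base estimators draws 891 indices with replacement, the average number of unique samples lands right around 563.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 891           # training-set size from the question
n_estimators = 100

# bootstrap=True with max_samples=1.0 means: draw n indices WITH replacement.
# Count how many distinct samples each simulated estimator actually sees.
unique_counts = [np.unique(rng.integers(0, n, size=n)).size
                 for _ in range(n_estimators)]

print(np.mean(unique_counts), np.std(unique_counts))
```

The mean hovers around 563 with a spread of roughly 10, matching the 563.92 +/- 10.3 observed above.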

So how many unique samples should we expect?

When you draw n samples with replacement from a set of n, the expected number of unique samples is n * (1 - ((n - 1)/n)^n). With 891 samples:

>>> 891 * (1.- (890./891)**891)
563.4034437025824

This expected value (563.4) agrees very well with your observed mean (563.9), so bootstrap sampling with replacement fully explains the numbers you see.
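For large n the factor 1 - ((n - 1)/n)^n approaches the well-known bootstrap constant 1 - 1/e ≈ 0.632, so each base estimator sees roughly 63.2% of the training set regardless of its size. A short check (my own arithmetic, not part of the original answer):

```python
import math

n = 891
expected_unique = n * (1.0 - ((n - 1) / n) ** n)
print(expected_unique)              # ~563.4

# Limit as n grows: a fraction 1 - 1/e of the samples are unique.
print((1.0 - 1.0 / math.e) * n)    # ~563.2
```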


Source: https://habr.com/ru/post/1650201/

