Random forest with GridSearchCV - Error in param_grid

I am trying to build a Random Forest model with GridSearchCV, but I get an error related to param_grid: "ValueError: Invalid parameter max_features for estimator Pipeline. Check the list of available parameters with `estimator.get_params().keys()`." I am classifying documents, so I also add a tf-idf vectorizer to the pipeline. Here is the code:

from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.grid_search import GridSearchCV
from sklearn.metrics import classification_report, f1_score, accuracy_score, precision_score, confusion_matrix
from sklearn.pipeline import Pipeline

# Classifier pipeline
pipeline = Pipeline([
    ('tfidf', TfidfVectorizer()),
    ('classifier', RandomForestClassifier())
])

# Params for classifier
params = {"max_depth": [3, None],
          "max_features": [1, 3, 10],
          "min_samples_split": [1, 3, 10],
          "min_samples_leaf": [1, 3, 10],
          # "bootstrap": [True, False],
          "criterion": ["gini", "entropy"]}

# Grid search execution
rf_grid = GridSearchCV(estimator=pipeline, param_grid=params)  # cv=10
rf_detector = rf_grid.fit(X_train, Y_train)
print(rf_grid.grid_scores_)

I cannot understand why this error appears. The same thing happens, by the way, when I run a decision tree with GridSearchCV. (scikit-learn 0.17)

2 answers

You must assign the parameters to the specific pipeline step they belong to, in your case the classifier step. Do this by adding the classifier__ prefix to each parameter name, as shown in the example below:

 params = {"classifier__max_depth": [3, None], "classifier__max_features": [1, 3, 10], "classifier__min_samples_split": [1, 3, 10], "classifier__min_samples_leaf": [1, 3, 10], # "bootstrap": [True, False], "classifier__criterion": ["gini", "entropy"]} 

Try running get_params() on the whole pipeline, not just on the estimator. That lists every available parameter name, including the step prefixes that the grid parameters need:

 sorted(pipeline.get_params().keys()) 

['classifier', 'classifier__bootstrap', 'classifier__class_weight', 'classifier__criterion',
 'classifier__max_depth', 'classifier__max_features', 'classifier__max_leaf_nodes',
 'classifier__min_impurity_split', 'classifier__min_samples_leaf', 'classifier__min_samples_split',
 'classifier__min_weight_fraction_leaf', 'classifier__n_estimators', 'classifier__n_jobs',
 'classifier__oob_score', 'classifier__random_state', 'classifier__verbose', 'classifier__warm_start',
 'steps',
 'tfidf', 'tfidf__analyzer', 'tfidf__binary', 'tfidf__decode_error', 'tfidf__dtype',
 'tfidf__encoding', 'tfidf__input', 'tfidf__lowercase', 'tfidf__max_df', 'tfidf__max_features',
 'tfidf__min_df', 'tfidf__ngram_range', 'tfidf__norm', 'tfidf__preprocessor', 'tfidf__smooth_idf',
 'tfidf__stop_words', 'tfidf__strip_accents', 'tfidf__sublinear_tf', 'tfidf__token_pattern',
 'tfidf__tokenizer', 'tfidf__use_idf', 'tfidf__vocabulary']
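
As a quick sanity check (a sketch that assumes the same two-step pipeline from the question), you can filter that list down to one step and verify that every key in a candidate grid is actually a known pipeline parameter:

# Parameters that can be tuned on the 'classifier' step
classifier_keys = [k for k in pipeline.get_params() if k.startswith('classifier__')]
print(classifier_keys)

# Check a candidate grid before handing it to GridSearchCV
params = {"classifier__max_depth": [3, None], "classifier__criterion": ["gini", "entropy"]}
unknown = set(params) - set(pipeline.get_params())
print(unknown or "all param_grid keys are valid")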

This is especially useful if you use the make_pipeline() shorthand for Pipelines, where you do not choose the names of the pipeline steps yourself:

pipeline = make_pipeline(TfidfVectorizer(), RandomForestClassifier())
sorted(pipeline.get_params().keys())

['randomforestclassifier', 'randomforestclassifier__bootstrap', 'randomforestclassifier__class_weight',
 'randomforestclassifier__criterion', 'randomforestclassifier__max_depth',
 'randomforestclassifier__max_features', 'randomforestclassifier__max_leaf_nodes',
 'randomforestclassifier__min_impurity_split', 'randomforestclassifier__min_samples_leaf',
 'randomforestclassifier__min_samples_split', 'randomforestclassifier__min_weight_fraction_leaf',
 'randomforestclassifier__n_estimators', 'randomforestclassifier__n_jobs',
 'randomforestclassifier__oob_score', 'randomforestclassifier__random_state',
 'randomforestclassifier__verbose', 'randomforestclassifier__warm_start',
 'steps',
 'tfidfvectorizer', 'tfidfvectorizer__analyzer', 'tfidfvectorizer__binary',
 'tfidfvectorizer__decode_error', 'tfidfvectorizer__dtype', 'tfidfvectorizer__encoding',
 'tfidfvectorizer__input', 'tfidfvectorizer__lowercase', 'tfidfvectorizer__max_df',
 'tfidfvectorizer__max_features', 'tfidfvectorizer__min_df', 'tfidfvectorizer__ngram_range',
 'tfidfvectorizer__norm', 'tfidfvectorizer__preprocessor', 'tfidfvectorizer__smooth_idf',
 'tfidfvectorizer__stop_words', 'tfidfvectorizer__strip_accents', 'tfidfvectorizer__sublinear_tf',
 'tfidfvectorizer__token_pattern', 'tfidfvectorizer__tokenizer', 'tfidfvectorizer__use_idf',
 'tfidfvectorizer__vocabulary']
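
A short sketch of how those auto-generated prefixes are then used in a grid; the parameter choices are arbitrary examples, and GridSearchCV is imported from its scikit-learn 0.17 location:

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection.GridSearchCV in 0.18+
from sklearn.pipeline import make_pipeline

pipeline = make_pipeline(TfidfVectorizer(), RandomForestClassifier())

# With make_pipeline the prefix is the lowercased class name of each step
params = {"randomforestclassifier__max_depth": [3, None],
          "tfidfvectorizer__ngram_range": [(1, 1), (1, 2)]}

grid = GridSearchCV(estimator=pipeline, param_grid=params, cv=2)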


Source: https://habr.com/ru/post/1240966/

