Using the `get_params()` method, you can access the individual steps of the pipeline and their internal parameters. Here is an example that accesses the 'vect' step:
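```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])
print(text_clf.get_params()['vect'])  # the CountVectorizer step
```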
which prints the following for me (the exact output depends on your Python and scikit-learn versions):
```
CountVectorizer(analyzer=u'word', binary=False, decode_error=u'strict', dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content', lowercase=True, max_df=1.0, max_features=None, min_df=1, ngram_range=(1, 1), preprocessor=None, stop_words=None, strip_accents=None, token_pattern=u'(?u)\\b\\w\\w+\\b', tokenizer=None, vocabulary=None)
```
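As a side note, `get_params()` is deep by default, so the returned dict also holds each step's own hyperparameters under `<step>__<param>` keys, and `set_params()` accepts the same keys. `named_steps` is another way to reach a step by name. A small sketch (the `(1, 2)` n-gram range is just an illustrative value, not anything the pipeline needs):

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])

params = text_clf.get_params()  # deep=True by default
vect = params['vect']           # the step object itself
print(params['vect__ngram_range'])  # (1, 1), the step's hyperparameter

# The same "<step>__<param>" key works with set_params(); it updates the step in place.
text_clf.set_params(vect__ngram_range=(1, 2))
print(text_clf.named_steps['vect'].ngram_range)  # (1, 2)
```

Since `get_params()['vect']` and `named_steps['vect']` return the same object, changes made through either are visible through the other.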
I did not fit the pipeline to any data in this example, so calling `get_feature_names()` on the vectorizer at this point raises an error: the vocabulary is only learned when the pipeline is fitted.
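Once the pipeline has been fitted, the same `get_params()['vect']` lookup gives you a vectorizer with a learned vocabulary. A minimal sketch, assuming a made-up three-document corpus and labels (purely hypothetical data). Newer scikit-learn (1.0+) renamed the method to `get_feature_names_out()` and removed the old name in 1.2, so the snippet tries the new name first:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB

text_clf = Pipeline([('vect', CountVectorizer()),
                     ('tfidf', TfidfTransformer()),
                     ('clf', MultinomialNB())])

# Hypothetical toy data, only there so the pipeline has something to fit.
docs = ["the cat sat", "the dog barked", "a cat and a dog"]
labels = [0, 1, 1]
text_clf.fit(docs, labels)

vect = text_clf.get_params()['vect']  # now the fitted CountVectorizer
if hasattr(vect, "get_feature_names_out"):  # scikit-learn >= 1.0
    names = list(vect.get_feature_names_out())
else:                                       # older releases
    names = vect.get_feature_names()
print(names)  # ['and', 'barked', 'cat', 'dog', 'sat', 'the']
```

The single-letter "a" is missing from the vocabulary because the default `token_pattern` only keeps tokens of two or more word characters.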