Insert or remove step in scikit-learn Pipeline

Is it possible to delete or insert a step in a sklearn.pipeline.Pipeline object?

I am trying to run a grid search with and without one particular step in a Pipeline object, and I wonder whether I can insert or delete a step in the pipeline. I saw in the source code of Pipeline that there is a self.steps object containing all the steps, and that we can get the steps through named_steps. Before modifying it, I want to make sure that I do not cause unexpected effects.

Here is some sample code:

 from sklearn.pipeline import Pipeline
 from sklearn.svm import SVC
 from sklearn.decomposition import PCA

 estimators = [('reduce_dim', PCA()), ('svm', SVC())]
 clf = Pipeline(estimators)
 clf

Is it possible to do something like steps = clf.named_steps, and then insert into or delete from it? Does this have an undesirable effect on the clf object?

3 answers

Based on rudimentary testing, you can safely remove a step from a scikit-learn pipeline, just as you would remove any list item, with a simple

 clf_pipeline.steps.pop(n) 

where n is the position (zero-based index) of the step that you are trying to remove.
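
For illustration, here is a minimal sketch of that removal. The iris dataset and the n_components=2 setting are my own assumptions, picked only so that the example runs end to end; they are not part of the question.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumed toy data, only to make the sketch runnable.
X, y = load_iris(return_X_y=True)

clf_pipeline = Pipeline([("reduce_dim", PCA(n_components=2)), ("svm", SVC())])

# steps is a plain list of (name, estimator) pairs, so pop() returns the removed pair.
removed_name, removed_step = clf_pipeline.steps.pop(0)
step_names = [name for name, _ in clf_pipeline.steps]

# Refit after changing the steps; anything fitted before the change is stale.
clf_pipeline.fit(X, y)
n_predictions = len(clf_pipeline.predict(X))
```

After the pop, the pipeline contains only the svm step and behaves like a one-step pipeline once it has been refitted.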


I see that everyone has mentioned only removing a step. If you also want to insert a step into the pipeline:

 pipe.steps.append(('step name', transformer()))  # (name, estimator) tuple, as in the constructor

pipe.steps is an ordinary Python list, so you can also insert an element at a specific position:

 pipe.steps.insert(1, ('estimator', transformer()))  # insert as the second step
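
A runnable sketch of the insertion might look like the following. StandardScaler, the step name "scale" and the iris data are assumptions of mine, chosen only to have a concrete transformer and dataset to work with.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumed toy data, only to make the sketch runnable.
X, y = load_iris(return_X_y=True)

pipe = Pipeline([("reduce_dim", PCA(n_components=2)), ("svm", SVC())])

# Insert a transformer at the front; the final step stays the predictor.
pipe.steps.insert(0, ("scale", StandardScaler()))
step_names = [name for name, _ in pipe.steps]

# The pipeline has to be fitted again after its steps have changed.
pipe.fit(X, y)
accuracy = pipe.score(X, y)
```

The new step is a (name, estimator) tuple, the same shape the Pipeline constructor expects.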

Yes, it is possible, but you must satisfy the same requirements that Pipeline enforces at initialization. You cannot insert a predictor at any position except the last. You must call fit after updating Pipeline.steps, because after such an update all steps (which may have been fitted in previous calls to fit) are invalid. The last step of the Pipeline must always implement fit, and all preceding steps must be transformers, that is, they must implement fit (or fit_transform) and transform.
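
To make the first requirement concrete, here is a small sketch of what happens when a predictor ends up in a non-final position. The step name "early_svm" and the iris data are illustrative assumptions. The check is triggered here by calling fit on the modified pipeline.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Assumed toy data, only to make the sketch runnable.
X, y = load_iris(return_X_y=True)

pipe = Pipeline([("reduce_dim", PCA(n_components=2)), ("svm", SVC())])

# SVC has no transform() method, so it is not valid as an intermediate step.
pipe.steps.insert(0, ("early_svm", SVC()))

error_message = None
try:
    pipe.fit(X, y)  # step validation runs as part of fit
except TypeError as exc:
    error_message = str(exc)
```

Nothing complains at the moment the list is mutated; the problem only surfaces later, when the pipeline is fitted.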

So, yes, this will work in the current codebase, but I do not think it is a good solution for your task, because it makes your code more dependent on the current Pipeline implementation. I think it is more convenient to create a new pipeline with the modified steps, because Pipeline will at least check all your steps during initialization. Creating a new Pipeline will also not differ significantly in speed from changing the steps of an existing one. And, as I said, creating a new Pipeline after each modification of the steps is safer in case someone significantly changes the Pipeline implementation in the future.
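
As a rough sketch of that approach, the helper below builds a fresh pipeline without a given step instead of mutating the existing one. The function name without_step is hypothetical (it is not part of scikit-learn), and the iris data is assumed only for the demo; clone from sklearn.base is used so that the new pipeline gets unfitted copies of the estimators.

```python
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC


def without_step(pipeline, name):
    """Hypothetical helper: return a new, unfitted Pipeline that omits step `name`."""
    kept = [(n, clone(est)) for n, est in pipeline.steps if n != name]
    return Pipeline(kept)


# Assumed toy data, only to make the sketch runnable.
X, y = load_iris(return_X_y=True)

original = Pipeline([("reduce_dim", PCA(n_components=2)), ("svm", SVC())])
reduced = without_step(original, "reduce_dim")

# The original pipeline is left untouched; each variant is fitted separately.
original.fit(X, y)
reduced.fit(X, y)
original_names = [n for n, _ in original.steps]
reduced_names = [n for n, _ in reduced.steps]
```

Both variants can then be compared side by side, for example with cross-validation, without either one affecting the other.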


Source: https://habr.com/ru/post/1238439/

