Using multiple custom classes with Pipeline sklearn (Python)

I am trying to build a textbook Pipeline example for students, but I am stuck. I am not an expert, but I am trying to improve, so thank you for your patience. I am trying to run a pipeline that performs several steps when preparing a DataFrame for a classifier:

  • Step 1: Data Frame Description
  • Step 2: Fill in NaN Values
  • Step 3: Convert Categorical Values to Numbers

Here is my code:

class Descr_df(object):

    def transform (self, X):
        print ("Structure of the data: \n {}".format(X.head(5)))
        print ("Features names: \n {}".format(X.columns))
        print ("Target: \n {}".format(X.columns[0]))
        print ("Shape of the data: \n {}".format(X.shape))

    def fit(self, X, y=None):
        return self

class Fillna(object):

    def transform(self, X):
        non_numerics_columns = X.columns.difference(X._get_numeric_data().columns)
        for column in X.columns:
            if column in non_numerics_columns:
                X[column] = X[column].fillna(X[column].value_counts().idxmax())
            else:
                X[column] = X[column].fillna(X[column].mean())
        return X

    def fit(self, X,y=None):
        return self

class Categorical_to_numerical(object):

    def transform(self, X):
        non_numerics_columns = X.columns.difference(X._get_numeric_data().columns)
        le = LabelEncoder()
        for column in non_numerics_columns:
            X[column] = X[column].fillna(X[column].value_counts().idxmax())
            le.fit(X[column])
            X[column] = le.transform(X[column]).astype(int)
        return X

    def fit(self, X, y=None):
        return self

If I run steps 1 and 2, or steps 1 and 3, it works. But if I run steps 1, 2 and 3 together, I get this error:

pipeline = Pipeline([('df_intropesction', Descr_df()), ('fillna',Fillna()), ('Categorical_to_numerical', Categorical_to_numerical())])
pipeline.fit(X, y)
AttributeError: 'NoneType' object has no attribute 'columns'

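The problem is in the transform method of Descr_df.

For each intermediate step, a Pipeline calls fit() and then transform(), and passes the output of transform() on to the next step.

In your case, the following happens: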

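  • Descr_df.fit(X) → does nothing and returns self
  • newX = Descr_df.transform(X) → should return the value to assign to newX, which is passed on to the next step, but your method only prints and has no return statement. So None is returned implicitly
  • Fillna.fit(newX) → returns self
  • Fillna.transform(newX) → accesses newX.columns, but newX is None from the second step. Hence the error.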
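
The sequence above can be reproduced without scikit-learn. The following is only a minimal sketch: PrintOnly, NeedsData and run_steps are hypothetical names made up for illustration, and run_steps is a simplified stand-in for the way a pipeline chains its steps, not the library's actual implementation.

```python
class PrintOnly:
    """Hypothetical stand-in for Descr_df: transform prints but returns nothing."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        print("Number of rows: {}".format(len(X)))
        # No return statement, so Python returns None implicitly.


class NeedsData:
    """Hypothetical stand-in for a later step that actually uses its input."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [value * 2 for value in X]


def run_steps(steps, X):
    """Simplified chaining: fit each step, transform, pass the result on."""
    for step in steps:
        X = step.fit(X).transform(X)
    return X


data = [1, 2, 3]

new_X = PrintOnly().fit(data).transform(data)
print(new_X is None)  # True

error = None
try:
    run_steps([PrintOnly(), NeedsData()], data)
except TypeError as exc:
    # The second step received None instead of the data.
    error = exc
print(type(error).__name__)
```

The second step fails because it receives None, which is the same mechanism that produces the AttributeError on newX.columns in the question.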

Solution: change the transform method of Descr_df so that it returns the DataFrame unchanged:

def transform (self, X):
    print ("Structure of the data: \n {}".format(X.head(5)))
    print ("Features names: \n {}".format(X.columns))
    print ("Target: \n {}".format(X.columns[0]))
    print ("Shape of the data: \n {}".format(X.shape))
    return X

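Suggestion: make your classes inherit from the BaseEstimator and TransformerMixin classes in scikit-learn, which is the usual practice for custom transformers.

That is, change class Descr_df(object) to class Descr_df(BaseEstimator, TransformerMixin), Fillna(object) to Fillna(BaseEstimator, TransformerMixin), and so on.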
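
Putting both changes together, a minimal sketch might look like the following. It assumes pandas and scikit-learn are installed. The toy DataFrame and its column names are made up for illustration, and the X.copy() call and the pd.api.types.is_numeric_dtype check are choices of this sketch rather than part of the original code.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class Descr_df(BaseEstimator, TransformerMixin):
    """Prints a short description and passes the DataFrame through."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        print("Shape of the data: \n {}".format(X.shape))
        return X  # returning X is what lets the next step receive the data


class Fillna(BaseEstimator, TransformerMixin):
    """Fills NaN with the mean (numeric) or the most frequent value (other)."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X = X.copy()  # assumption of this sketch: avoid mutating the input
        for column in X.columns:
            if pd.api.types.is_numeric_dtype(X[column]):
                X[column] = X[column].fillna(X[column].mean())
            else:
                X[column] = X[column].fillna(X[column].value_counts().idxmax())
        return X


# Hypothetical toy data, used only to exercise the pipeline.
toy = pd.DataFrame({
    "age": [20.0, None, 40.0],
    "city": ["Paris", None, "Paris"],
})

pipeline = Pipeline([("describe", Descr_df()), ("fillna", Fillna())])
result = pipeline.fit_transform(toy)
print(result)
```

Inheriting from TransformerMixin also gives each class a fit_transform method, and BaseEstimator provides get_params and set_params.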



Source: https://habr.com/ru/post/1016634/

