I am trying to use FeatureUnion for the first time in a scikit-learn pipeline to combine numeric features (2 columns) and text features (1 column) for multi-class classification.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Select the text column and the two numeric columns from the input
get_text_data = FunctionTransformer(lambda x: x['text'], validate=False)
get_numeric_data = FunctionTransformer(lambda x: x[['num1', 'num2']], validate=False)

process_and_join_features = FeatureUnion(
    [
        ('numeric_features', Pipeline([
            ('selector', get_numeric_data),
            ('clf', OneVsRestClassifier(LogisticRegression()))
        ])),
        ('text_features', Pipeline([
            ('selector', get_text_data),
            ('vec', CountVectorizer()),
            ('clf', OneVsRestClassifier(LogisticRegression()))
        ]))
    ]
)
In this code, "text" is the text column, and "num1" and "num2" are the two numeric columns.
Error message:
TypeError: All estimators should implement fit and transform. 'Pipeline(memory=None,
steps=[('selector', FunctionTransformer(accept_sparse=False,
func=<function <lambda> at 0x7fefa8efd840>, inv_kw_args=None,
inverse_func=None, kw_args=None, pass_y='deprecated',
validate=False)), ('clf', OneVsRestClassifier(estimator=LogisticRegression(C=1.0, class_weigh...=None, solver='liblinear', tol=0.0001,
verbose=0, warm_start=False),
n_jobs=1))])' (type <class 'sklearn.pipeline.Pipeline'>) doesn't
Is there a step I missed?
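For reference, the error message says that every estimator passed to FeatureUnion must implement both fit and transform. A Pipeline whose last step is a classifier exposes predict but not transform, so it cannot be a member of the union. One possible arrangement, sketched here under assumptions, keeps only transformers inside the FeatureUnion and places a single classifier after it in an outer Pipeline. The toy DataFrame, the labels, and the names `model` and `predictions` are hypothetical and only serve to make the sketch runnable; the column names follow the question.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data; only the column names come from the question.
df = pd.DataFrame({
    "text": ["red apple", "green pear", "red cherry",
             "green lime", "yellow banana", "yellow lemon"],
    "num1": [1.0, 2.0, 1.5, 2.5, 3.0, 3.5],
    "num2": [0.1, 0.2, 0.1, 0.3, 0.5, 0.4],
})
y = ["a", "b", "a", "b", "c", "c"]  # hypothetical class labels

get_text_data = FunctionTransformer(lambda x: x["text"], validate=False)
get_numeric_data = FunctionTransformer(lambda x: x[["num1", "num2"]], validate=False)

# Every branch of the union ends in a transformer, not a classifier.
process_and_join_features = FeatureUnion([
    ("numeric_features", Pipeline([
        ("selector", get_numeric_data),
    ])),
    ("text_features", Pipeline([
        ("selector", get_text_data),
        ("vec", CountVectorizer()),
    ])),
])

# A single classifier is applied once, after the features are joined.
model = Pipeline([
    ("union", process_and_join_features),
    ("clf", OneVsRestClassifier(LogisticRegression())),
])

model.fit(df, y)
predictions = model.predict(df)
```

With this layout the classifier is trained on the concatenated numeric and bag-of-words features rather than on each group separately. Whether that matches the intended design is an assumption on my part.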