Given data such as:
from sklearn.preprocessing import OneHotEncoder
import numpy as np
dt = 'object, i4, i4'
d = np.array([('aaa', 1, 1), ('bbb', 2, 2)], dtype=dt)
I want to one-hot encode only the integer columns and leave the text column out, using OneHotEncoder's own options.
Why does the following not work?
ohe = OneHotEncoder(categorical_features=np.array([False,True,True], dtype=bool))
ohe.fit(d)
ValueError: could not convert string to float: 'bbb'
The documentation says:
categorical_features: "all" or array of indices or mask :
Specify what features are treated as categorical.
‘all’ (default): All features are treated as categorical.
array of indices: Array of categorical feature indices.
mask: Array of length n_features and with dtype=bool.
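As I read that description, a mask and an array of indices should be two ways of naming the same columns. A minimal NumPy sketch of what I expect each form to select (the 2-D array `X` is a hypothetical numeric stand-in, not the structured array above, and this only illustrates column selection, not what OneHotEncoder does internally):

```python
import numpy as np

# Hypothetical 2-D numeric stand-in with three feature columns
# (an assumption for illustration; not the structured array `d`).
X = np.array([[10, 1, 1],
              [20, 2, 2]])

mask = np.array([False, True, True], dtype=bool)  # one flag per feature
indices = np.array([1, 2])                        # positions of the same features

by_mask = X[:, mask]        # columns where the mask is True
by_indices = X[:, indices]  # columns listed by position

# Both forms pick out columns 1 and 2 and skip column 0.
print(np.array_equal(by_mask, by_indices))
```

So I would expect column 0 (the text column in my data) to be left alone in either case.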
I am passing a mask, yet the encoder still tries to convert the text column to float.
The same thing happens even when I also pass dtype explicitly:
ohe = OneHotEncoder(categorical_features=np.array([False, True, True], dtype=bool),
                    dtype=dt)
ohe.fit(d)
I get the same error.
It also fails the same way when I pass an array of indices instead of a mask:
ohe = OneHotEncoder(categorical_features=np.array([1, 2]), dtype=dt)
ohe.fit(d)