I was recently told that if you have a dataframe df like this:
   A      B   C
0  0   Boat  45
1  1    NaN  12
2  2    Cat   6
3  3  Moose  21
4  4   Boat  43
You can automatically encode categorical data with pd.get_dummies:
df1 = pd.get_dummies(df)
Which gives this:
   A   C  B_Boat  B_Cat  B_Moose
0  0  45     1.0    0.0      0.0
1  1  12     0.0    0.0      0.0
2  2   6     0.0    1.0      0.0
3  3  21     0.0    0.0      1.0
4  4  43     1.0    0.0      0.0
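For anyone who wants to reproduce this, here is a minimal sketch of the example. It assumes a reasonably recent pandas; I pass dtype=float only so the dummy columns print as 1.0/0.0 like above (newer pandas versions may default to booleans). The dummy_na=True call is an extra I added to show one way the NaN row could get its own indicator column instead of all zeros.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame; column B is categorical with one missing value.
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 4],
    "B": ["Boat", np.nan, "Cat", "Moose", "Boat"],
    "C": [45, 12, 6, 21, 43],
})

# One-hot encode every object/string column; numeric columns pass through.
# dtype=float is an assumption made to match the float output shown above.
df1 = pd.get_dummies(df, dtype=float)
print(df1)

# Optional: give missing values their own indicator column, so the NaN row
# is no longer encoded as all zeros.
df2 = pd.get_dummies(df, dummy_na=True, dtype=float)
print(df2.columns.tolist())
```

Note that in df1 the NaN row is simply all zeros across the B_ columns, so it is only distinguishable from the other rows by the absence of any 1.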
I usually use LabelEncoder().fit_transform for this task before passing the result to pd.get_dummies, but if I can skip a few steps, that would be desirable.
Am I losing anything by simply using pd.get_dummies on my full dataframe to encode it?
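To make the comparison concrete, this is roughly what my current two-step workflow looks like next to the one-step call. It is only a sketch: it assumes scikit-learn is installed, and the "Missing" placeholder used to fill the NaN before label encoding is something I made up for illustration, not a requirement of either library.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({
    "A": [0, 1, 2, 3, 4],
    "B": ["Boat", np.nan, "Cat", "Moose", "Boat"],
    "C": [45, 12, 6, 21, 43],
})

# Two-step approach: label-encode B to integers, then one-hot encode it.
# "Missing" is a hypothetical placeholder so the encoder sees only strings.
encoded = df.copy()
encoded["B"] = LabelEncoder().fit_transform(encoded["B"].fillna("Missing"))

# After label encoding, B is an integer column, so get_dummies has to be
# told explicitly (via columns=) to encode it.
two_step = pd.get_dummies(encoded, columns=["B"], dtype=float)

# One-step approach: let get_dummies detect the string column itself.
one_step = pd.get_dummies(df, dtype=float)

print(two_step.columns.tolist())
print(one_step.columns.tolist())
```

In the two-step version the dummy columns are named after the integer codes rather than the original labels, and the placeholder gets its own column.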