LabelEncoder().fit_transform vs. pd.get_dummies for categorical encoding

I was recently informed that if you have a dataframe df like this:

   A      B   C
0  0   Boat  45
1  1    NaN  12
2  2    Cat   6
3  3  Moose  21
4  4   Boat  43

You can automatically encode categorical data with pd.get_dummies:

df1 = pd.get_dummies(df)

which gives this:

   A   C  B_Boat  B_Cat  B_Moose
0  0  45     1.0    0.0      0.0
1  1  12     0.0    0.0      0.0
2  2   6     0.0    1.0      0.0
3  3  21     0.0    0.0      1.0
4  4  43     1.0    0.0      0.0
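
For reference, the example above can be reproduced with a short script. This is only a sketch: the frame is rebuilt by hand from the printed table, and dtype=float is passed on the assumption that a recent pandas is installed, where the dummy columns would otherwise come out as booleans rather than 1.0/0.0. The dummy_na=True call is an optional extra, not something the question used.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 4],
    "B": ["Boat", np.nan, "Cat", "Moose", "Boat"],
    "C": [45, 12, 6, 21, 43],
})

# Only the string column B is expanded; A and C pass through unchanged.
# dtype=float reproduces the 1.0/0.0 values shown in the output above.
df1 = pd.get_dummies(df, dtype=float)
print(df1)

# The NaN in row 1 becomes an all-zero row by default.
# dummy_na=True adds an explicit indicator column for missing values.
df2 = pd.get_dummies(df, dtype=float, dummy_na=True)
print(df2.columns.tolist())
```

Note that the missing value in row 1 is not dropped: it simply has no dummy column set, which may or may not be what you want downstream.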

I usually use LabelEncoder().fit_transform for this task before passing the result to pd.get_dummies, but if I can skip a few steps that would be desirable.

Am I losing anything by simply using pd.get_dummies on my full dataframe to encode it?

+4
1 answer

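Yes, you can skip LabelEncoder if your categorical columns contain strings. However, if a categorical column is stored as integers (rather than strings), pd.get_dummies leaves it as is (see, for example, columns A and C in your frame). In that case, use OneHotEncoder.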
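
To make the integer-column case concrete, here is a minimal sketch. The frame is hypothetical (column A is treated as an integer category code purely for illustration), and it assumes a scikit-learn version in which OneHotEncoder accepts string columns directly; older releases required integer input.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical frame: "A" holds integer category codes, "B" holds strings.
df = pd.DataFrame({
    "A": [0, 1, 2, 1, 0],
    "B": ["Boat", "Cat", "Cat", "Moose", "Boat"],
})

# By default only the string column is expanded; "A" passes through unchanged.
default = pd.get_dummies(df)
print(default.columns.tolist())

# Naming the columns explicitly makes get_dummies expand the integer column too.
forced = pd.get_dummies(df, columns=["A", "B"])
print(forced.columns.tolist())

# OneHotEncoder expands every column it is given, integer or string.
# The result is sparse by default, so .toarray() makes it dense for display.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df[["A", "B"]]).toarray()
print(encoder.categories_)
print(encoded.shape)
```

If you stay in pandas, the columns argument is probably the smallest change. OneHotEncoder is more useful when the same encoding has to be reapplied to new data, since the fitted encoder remembers the categories it saw.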

+5

Source: https://habr.com/ru/post/1655508/

