Code:
import numpy as np
import pandas as pd

# Create some test data containing a few exact zeros
random_data = np.random.random([3, 3])
random_data[0, 0] = 0.0
random_data[1, 2] = 0.0
df = pd.DataFrame(random_data,
                  columns=['A', 'B', 'C'],
                  index=['first', 'second', 'third'])
print(df)

# Binarize: 1 where the value is greater than 0, otherwise 0
df_ = (df > 0).astype(int)
print(df_)
Output:
               A         B         C
first   0.000000  0.610263  0.301024
second  0.728070  0.229802  0.000000
third   0.243811  0.335131  0.863908
        A  B  C
first   0  1  1
second  1  1  0
third   1  1  1
Explanation:
- get_dummies() looks at the unique values of each column it encodes and adds one new column per unique value, marking whether a row holds that value
- e.g. if column A has 20 unique values, 20 new columns are added; in each row exactly one of them is true and the rest are false
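To make the get_dummies() behaviour described above concrete, here is a minimal sketch. The frame `toy` and its values are made up for illustration and are not part of the question's data; `dtype=int` is passed only so the indicator columns print as 0/1.

```python
import pandas as pd

# Hypothetical toy column with three unique values (illustration only)
toy = pd.DataFrame({'A': ['x', 'y', 'z', 'x']})

# One indicator column is created per unique value of 'A'
dummies = pd.get_dummies(toy, columns=['A'], dtype=int)
print(dummies)

# In every row exactly one indicator column is set
print(dummies.sum(axis=1))
```

So get_dummies() expands a column into one column per unique value rather than thresholding it, which is why the plain comparison is used for binarizing.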