I have a numeric age column in a pandas DataFrame. I discretize it into ageD with qcut, and then I want to create open bounds from the qcut bin edges:
    import pandas as pd
    from itertools import chain

    d = {'age': {0: 5, 1: 23, 2: 43, 3: 70, 4: 30}}
    df = pd.DataFrame.from_dict(d)
    df['ageD'] = pd.qcut(df.iloc[:, 0], 2)
    df.ageD.cat.categories
From the resulting Index([u'[5, 30]', u'(30, 70]'], dtype='object'), I build bopens:
    >>> bopens = get_open_bounds(df)
    >>> bopens
Then I convert the categorical variable into indicator (dummy) columns with get_dummies:
    df = pd.get_dummies(df)
    print(df)
I want to enrich the data frame with one column per open bound. df.shape will be quite large, roughly (10e6, 32). What is the best way to create the 6 bopen columns for each row?
The target df should look like this:
    >>> df
       age  age_[5, 30]  age_(30, 70]  (-inf, 5]  (-inf, 30]  (-inf, 70]  (5, +inf)  (30, +inf)  (70, +inf)
    0    5            1             0          1           1           1          0           0           0
    1   23            1             0          0           1           1          1           0           0
    2   43            0             1          0           0           1          1           1           0
    3   70            0             1          0           0           1          1           1           0
    4   30            1             0          0           1           1          1           0           0
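To make the target concrete, here is a minimal sketch of one way those six columns could be produced without a per-row loop, using NumPy broadcasting. It assumes the unique bin edges are already known as `edges = [5, 30, 70]`; the names `edges`, `open_df` and `result` are mine, for illustration only. I do not know whether this is the best approach at ~10e6 rows, which is the actual question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'age': [5, 23, 43, 70, 30]})
edges = [5, 30, 70]  # assumption: the unique, sorted qcut bin edges

# Compare every age against every edge at once:
# (n, 1) against (1, k) broadcasts to an (n, k) boolean matrix.
age = df['age'].to_numpy()[:, None]
e = np.asarray(edges)[None, :]

le = (age <= e).astype(np.int8)  # indicators for (-inf, edge]
gt = (age > e).astype(np.int8)   # indicators for (edge, +inf)

le_cols = ['(-inf, %d]' % b for b in edges]
gt_cols = ['(%d, +inf)' % b for b in edges]

open_df = pd.DataFrame(np.hstack([le, gt]),
                       columns=le_cols + gt_cols,
                       index=df.index)
result = pd.concat([df, open_df], axis=1)
print(result)
```

The int8 dtype is only there to keep memory down on a large frame; the dummy columns from get_dummies would be concatenated the same way.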
PS: this is the get_open_bounds function used to create bopens:
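    def get_open_bounds(df):
        # Parse each category label, e.g. '[5, 30]' or '(30, 70]',
        # into an integer (lower, upper) pair
        bounds = [(int(x[1:]), int(y[:-1]))
                  for x, y in [c.split(', ') for c in df.ageD.cat.categories]]
        # Flatten the (lower, upper) pairs into a single list of edges
        bounds = list(chain(*bounds))
        return bounds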
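A side note on getting the edges: the string parsing above relies on the categories being plain labels such as '[5, 30]'. In newer pandas versions (0.20 and later, I believe) the categories are Interval objects, so `c.split` would not work there. A sketch of an alternative that avoids parsing altogether is to ask qcut for the numeric edges with `retbins=True`; the variable names `binned`, `bins` and `edges` are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'age': [5, 23, 43, 70, 30]})

# retbins=True makes qcut return the bin edges alongside the binned column
binned, bins = pd.qcut(df['age'], 2, retbins=True)

# bins is a float array of edges; cast to int to match the integer ages
edges = [int(b) for b in bins]
print(edges)
```

This gives each edge once, so no de-duplication of the shared inner edge (30) is needed.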