R dcast equivalent in python pandas

Question


I am trying to do the equivalent of the following R commands in Python:

```r
test <- data.frame(convert_me = c('Convert1', 'Convert2', 'Convert3'),
                   values = rnorm(3, 45, 12),
                   age_col = c('23', '33', '44'))
test

library(reshape2)
t <- dcast(test, values ~ convert_me + age_col, length)
t
```

That is:

```
convert_me   values age_col
  Convert1 21.71502      23
  Convert2 58.35506      33
  Convert3 60.41639      44
```

becomes the following:

```
    values Convert2_33 Convert1_23 Convert3_44
  21.71502           0           1           0
  58.35506           1           0           0
  60.41639           0           0           1
```

I know that with dummy variables I can turn the values of a column into column names, but is there an easy way to combine several columns into one set of column names, as R does?
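To follow along in Python, the input data frame from the question can be rebuilt in pandas. This is a sketch: because rnorm() draws random numbers, the values column is hard-coded from the printed R output rather than regenerated, and the name df is simply the one the answers below use.

```python
import pandas as pd

# Values copied from the printed R output; rnorm() is random, so they are
# fixed here to keep the example reproducible.
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    # age_col stays as strings, matching c('23', '33', '44') in R
    "age_col": ["23", "33", "44"],
})
print(df)
```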


python pandas r

Adriano almeida Sep 2 '14 at 8:05


2 answers

We can use the pd.get_dummies function. In pandas 0.22.0 (current at the time of writing), pd.get_dummies is the usual way to one-hot encode the columns of a DataFrame.

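```python
import pandas as pd
df = pd.DataFrame({"convert_me": ["Convert1", "Convert2", "Convert3"],
                   "values": [21.71502, 58.35506, 60.41639],
                   "age_col": ["23", "33", "44"]})
# Join convert_me and age_col row by row into keys such as "Convert1_23",
# then one-hot encode them; astype(int) shows 0/1 even on pandas >= 2.0,
# where get_dummies returns booleans by default.
df_dummies = pd.get_dummies(
    df[['convert_me', 'age_col']].apply(lambda x: '_'.join(x.astype(str)), axis=1),
    prefix_sep='').astype(int)
df = pd.concat([df["values"], df_dummies], axis=1)
# Out[39]:
#      values  Convert1_23  Convert2_33  Convert3_44
# 0  21.71502            1            0            0
# 1  58.35506            0            1            0
# 2  60.41639            0            0            1
```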
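A variant of the same idea, sketched under the assumption that both key columns can be cast to strings: the combined key can be built with vectorized string concatenation instead of a row-wise apply, which avoids calling a Python lambda once per row. The names key, dummies and result are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# Build the combined "convert_me_age_col" key with vectorized string
# concatenation instead of apply(..., axis=1).
key = df["convert_me"] + "_" + df["age_col"].astype(str)

# One indicator column per distinct key; dtype=int gives 0/1 rather than
# booleans (pandas >= 2.0 returns bool by default).
dummies = pd.get_dummies(key, dtype=int)

result = pd.concat([df["values"], dummies], axis=1)
print(result)
```

Note that get_dummies produces only 0/1 flags, so if the same values/key pair occurred twice it would still show up as two separate rows rather than a count of 2.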


Keiku Feb 22 '18 at 2:44


joris · Accepted Answer · 2014-09-02T08:30:12+0000

You can use the crosstab function to do this:

```
In [14]: pd.crosstab(index=df['values'], columns=[df['convert_me'], df['age_col']])
Out[14]:
convert_me  Convert1  Convert2  Convert3
age_col           23        33        44
values
21.71502           1         0         0
58.35506           0         1         0
60.41639           0         0         1
```

or pivot_table (with len as the aggregation function; here, though, you have to replace the NaNs with zeros manually using fillna):

```
In [18]: df.pivot_table(index=['values'], columns=['age_col', 'convert_me'], aggfunc=len).fillna(0)
Out[18]:
age_col           23        33        44
convert_me  Convert1  Convert2  Convert3
values
21.71502           1         0         0
58.35506           0         1         0
60.41639           0         0         1
```
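pivot_table can also do the zero-filling itself through its fill_value argument. The sketch below is a swapped-in variant rather than the exact call above: it adds a hypothetical helper column n of ones and sums it, so each cell holds the number of matching rows, the same quantity dcast(..., length) computes.

```python
import pandas as pd

df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})

# "n" is a hypothetical helper column of ones; summing it counts the rows
# in each cell. fill_value=0 replaces missing combinations, so no separate
# fillna() call is needed.
wide = df.assign(n=1).pivot_table(
    index="values",
    columns=["convert_me", "age_col"],
    values="n",
    aggfunc="sum",
    fill_value=0,
)
print(wide)
```

Passing values as a single string (rather than a list) keeps an extra "n" level out of the column index.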

See the docs here: http://pandas.pydata.org/pandas-docs/stable/reshaping.html#pivot-tables-and-cross-tabulations
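One point worth checking when porting dcast(..., length): each cell holds the number of rows that fall into it, not just a 0/1 flag. The sketch below adds a hypothetical duplicated row to the question's data to show that crosstab counts it twice, in line with R's length aggregate.

```python
import pandas as pd

# The question's data plus one hypothetical duplicate of the first row.
df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3", "Convert1"],
    "values": [21.71502, 58.35506, 60.41639, 21.71502],
    "age_col": ["23", "33", "44", "23"],
})

# Rows are the distinct values; columns are the observed
# (convert_me, age_col) combinations; cells are row counts.
counts = pd.crosstab(index=df["values"], columns=[df["convert_me"], df["age_col"]])
print(counts)
```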

Both functions return a multi-level (hierarchical) index, in this case on the columns. If you want to flatten it into a single level, as in R, you can do:

```
In [15]: df_cross = pd.crosstab(index=df['values'], columns=[df['convert_me'], df['age_col']])

In [16]: df_cross.columns = ["{0}_{1}".format(l1, l2) for l1, l2 in df_cross.columns]

In [17]: df_cross
Out[17]:
          Convert1_23  Convert2_33  Convert3_44
values
21.71502            1            0            0
58.35506            0            1            0
60.41639            0            0            1
```
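The crosstab-then-flatten steps can be wrapped in a small helper. dcast_length below is a hypothetical name, not a pandas function; it is a sketch that takes one index column and any number of column variables and joins the column labels with an underscore, as in the question's expected output.

```python
import pandas as pd


def dcast_length(df, index, columns, sep="_"):
    """Hypothetical helper approximating reshape2's dcast(df, index ~ a + b, length).

    Counts rows per (index, column-combination) cell with pd.crosstab, then
    flattens the column labels into "a_b" style names.
    """
    counts = pd.crosstab(index=df[index], columns=[df[c] for c in columns])
    # With several column variables each label is a tuple such as
    # ("Convert1", "23"); with a single variable it is a plain value.
    counts.columns = [
        sep.join(map(str, col)) if isinstance(col, tuple) else str(col)
        for col in counts.columns
    ]
    return counts


df = pd.DataFrame({
    "convert_me": ["Convert1", "Convert2", "Convert3"],
    "values": [21.71502, 58.35506, 60.41639],
    "age_col": ["23", "33", "44"],
})
result = dcast_length(df, "values", ["convert_me", "age_col"])
print(result)
```

The isinstance check matters: joining a plain string label character by character would otherwise turn "Convert1" into "C_o_n_v_e_r_t_1".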
