How to flip one column of a DataFrame into a truth table with columns based on another DataFrame?

Question

I have df with user_id and category columns. I would like to convert it into a truth table that shows whether each user has at least one entry in each category. However, the summary table should also have columns for every category that appears in df_list, including categories that may not appear in df at all.

Currently I create the truth table with groupby + size, then check which columns are missing and manually set them to False. I was wondering whether there is a way to do this in the initial groupby step instead.

Here is an example:

import pandas as pd
df = pd.DataFrame({'user_id': [1,1,1,2,2],
                 'category': ['A', 'B', 'D', 'A', 'F']})
df_list = pd.DataFrame({'category': ['A', 'B', 'C', 'D', 'E', 'F']})

df_truth = df.groupby(['user_id', 'category']).size().unstack(fill_value=0).astype(bool)
#category     A      B      D      F
#user_id                            
#1         True   True   True  False
#2         True  False  False   True

To get the desired result, I then add the missing columns by hand:

missing_vals = df_list.category.unique()[~pd.Series(df_list.category.unique()).isin(df_truth.columns)]
for element in missing_vals:
    df_truth.loc[:,element] = False
#category     A      B      D      F      C      E
#user_id                                          
#1         True   True   True  False  False  False
#2         True  False  False   True  False  False
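The loop in the workaround above can be collapsed into a single step. This is a minimal sketch, assuming pandas 1.x or later. Index.difference finds the categories listed in df_list that are missing from df_truth, and DataFrame.assign adds them all at once as False columns. Like the original loop, it appends the new columns at the end rather than in df_list order.

```python
import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2],
                   'category': ['A', 'B', 'D', 'A', 'F']})
df_list = pd.DataFrame({'category': ['A', 'B', 'C', 'D', 'E', 'F']})

df_truth = df.groupby(['user_id', 'category']).size().unstack(fill_value=0).astype(bool)

# Categories listed in df_list but absent from df_truth's columns (sorted: C, E)
missing_vals = pd.Index(df_list['category']).difference(df_truth.columns)

# Add every missing column in one call, each filled with False
df_truth = df_truth.assign(**{col: False for col in missing_vals})
print(df_truth)
```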

python pandas dataframe categorical-data pandas-groupby

ALollz · 31 Mar '18 22:58

cᴏʟᴅsᴘᴇᴇᴅ · Accepted Answer · 2018-03-31T23:04:59+0000

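Option 1: crosstab
Convert the category column to a Categorical whose categories are taken from df_list. crosstab/pivot then includes every category, even those that never occur in df.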

i = df.user_id
j = pd.Categorical(df.category, categories=df_list.category)

pd.crosstab(i, j).astype(bool)

col_0       A      B      C      D      E      F
user_id                                         
1        True   True  False   True  False  False
2        True  False  False  False  False   True
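The same Categorical idea also lets the initial groupby step handle the missing categories, which is what the question asked for. When a grouping key is categorical and observed=False is passed, groupby keeps a group for every declared category, and size() reports 0 for categories that never occur. This is a minimal sketch, assuming pandas 0.23 or later (where observed exists). observed is passed explicitly because newer pandas versions deprecate relying on its default.

```python
import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2],
                   'category': ['A', 'B', 'D', 'A', 'F']})
df_list = pd.DataFrame({'category': ['A', 'B', 'C', 'D', 'E', 'F']})

# Declare the full set of categories up front, taken from df_list
cat = pd.Categorical(df['category'], categories=df_list['category'])

df_truth = (df.assign(category=cat)
              # observed=False keeps unobserved categories (C, E) as groups
              .groupby(['user_id', 'category'], observed=False)
              .size()                  # unobserved groups get a size of 0
              .unstack(fill_value=0)   # categories become columns, in df_list order
              .astype(bool))
print(df_truth)
```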

Option 2: unstack + reindex
If you want to keep your groupby approach, add a reindex call:

(df.groupby(['user_id', 'category'])
   .size()
   .unstack(fill_value=0)
   .reindex(df_list.category, axis=1, fill_value=0)
   .astype(bool)
)

category     A      B      C      D      E      F
user_id                                          
1         True   True  False   True  False  False
2         True  False  False  False  False   True
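Another route, offered here as an alternative rather than taken from the answer, builds indicator columns with pd.get_dummies, collapses them per user with groupby(...).any(), and then applies the same reindex step. This is a minimal sketch, assuming pandas 1.x or later. get_dummies returns bool columns in pandas 2.x and uint8 in older versions, but any() yields bool either way.

```python
import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2],
                   'category': ['A', 'B', 'D', 'A', 'F']})
df_list = pd.DataFrame({'category': ['A', 'B', 'C', 'D', 'E', 'F']})

# One indicator column per category that occurs in df (A, B, D, F)
dummies = pd.get_dummies(df['category'])

df_truth = (dummies.groupby(df['user_id'])   # one row per user
                   .any()                     # True if the user has >= 1 entry
                   # add the categories that never occur, filled with False
                   .reindex(columns=df_list['category'], fill_value=False))
print(df_truth)
```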
