I have a df with a user_id and a category. I would like to convert this to a truth table showing whether each user has at least one entry for each category. However, the final table should also include columns for all categories that appear in df_list, some of which may not appear in df at all.
Right now I create the truth table with groupby + size, then check which columns are missing and manually set those columns to False, but I was wondering whether there is a way to accomplish this in the initial groupby step.
Here is an example:
import pandas as pd
df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2],
                   'category': ['A', 'B', 'D', 'A', 'F']})
df_list = pd.DataFrame({'category': ['A', 'B', 'C', 'D', 'E', 'F']})
df_truth = df.groupby(['user_id', 'category']).size().unstack(fill_value=0).astype(bool)
To then get the desired result, I do the following:
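# categories listed in df_list that did not end up as columns of df_truth
missing_vals = df_list.category.unique()[~pd.Series(df_list.category.unique()).isin(df_truth.columns)]
for element in missing_vals:
    # add each missing category as an all-False column
    df_truth.loc[:, element] = False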
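
For reference, the end result I am after looks like what the sketch below produces. It chains a reindex on the columns after the unstack, so it is still a post-processing step rather than something done inside the groupby itself, and I have not checked whether it is the most idiomatic option. The name all_categories is just something I made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'user_id': [1, 1, 1, 2, 2],
                   'category': ['A', 'B', 'D', 'A', 'F']})
df_list = pd.DataFrame({'category': ['A', 'B', 'C', 'D', 'E', 'F']})

# hypothetical helper name: every category that should become a column
all_categories = df_list['category'].unique()

df_truth = (
    df.groupby(['user_id', 'category'])
      .size()
      .unstack(fill_value=0)   # one column per category seen in df
      .astype(bool)            # count > 0 becomes True
      # add the categories that never occur in df as all-False columns
      .reindex(columns=all_categories, fill_value=False)
)

print(df_truth)
```

Unlike the loop version, this also puts the columns in the order in which they appear in df_list.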