Pandas or python equivalent for tidyr complete

I have data that looks like this:

    library("tidyverse")
    df <- tibble(user = c(1, 1, 2, 3, 3, 3),
                 x = c("a", "b", "a", "a", "c", "d"),
                 y = 1)
    df
    #   user x     y
    # 1    1 a     1
    # 2    1 b     1
    # 3    2 a     1
    # 4    3 a     1
    # 5    3 c     1
    # 6    3 d     1

The same data in Python:

    import pandas as pd
    df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                       'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                       'y': 1})

I would like to "populate" the data frame so that each user has an entry for every possible x, with y set to 0 in the padded rows.

This is somewhat trivial in R (tidyverse / tidyr):

    df %>% complete(nesting(user), x = c("a", "b", "c", "d"), fill = list(y = 0))
    #    user x     y
    #  1    1 a     1
    #  2    1 b     1
    #  3    1 c     0
    #  4    1 d     0
    #  5    2 a     1
    #  6    2 b     0
    #  7    2 c     0
    #  8    2 d     0
    #  9    3 a     1
    # 10    3 b     0
    # 11    3 c     1
    # 12    3 d     1

Is there an equivalent to complete in pandas / python that will give the same result?


You can use reindex with a MultiIndex built by MultiIndex.from_product:

    df = df.set_index(['user', 'x'])
    mux = pd.MultiIndex.from_product([df.index.levels[0], df.index.levels[1]],
                                     names=['user', 'x'])
    df = df.reindex(mux, fill_value=0).reset_index()
    print(df)
        user  x  y
    0      1  a  1
    1      1  b  1
    2      1  c  0
    3      1  d  0
    4      2  a  1
    5      2  b  0
    6      2  c  0
    7      2  d  0
    8      3  a  1
    9      3  b  0
    10     3  c  1
    11     3  d  1
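The levels used above come from the values that actually occur in the data, so an x value that never appears would not get any rows. complete() in R takes the x values explicitly (x = c("a", "b", "c", "d")). A minimal sketch of the same idea, assuming the full list of x values is known up front (the names all_x and out are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1})

# Explicit x values, mirroring x = c("a", "b", "c", "d") in complete();
# any value listed here gets a row even if it never occurs in df
all_x = ['a', 'b', 'c', 'd']

# Every (user, x) pair: the observed users crossed with the explicit x values
mux = pd.MultiIndex.from_product([df['user'].unique(), all_x],
                                 names=['user', 'x'])

# Pairs missing from df are added with y = 0
out = (df.set_index(['user', 'x'])
         .reindex(mux, fill_value=0)
         .reset_index())
print(out)
```

With these inputs the result matches the R output; adding a value such as 'e' to all_x would add a zero row for it under every user.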

Or set_index + unstack + stack:

    df = (df.set_index(['user', 'x'])['y']
            .unstack(fill_value=0)
            .stack()
            .reset_index(name='y'))
    print(df)
        user  x  y
    0      1  a  1
    1      1  b  1
    2      1  c  0
    3      1  d  0
    4      2  a  1
    5      2  b  0
    6      2  c  0
    7      2  d  0
    8      3  a  1
    9      3  b  0
    10     3  c  1
    11     3  d  1
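Another option, not part of the answer above, is to build the full grid of combinations with a cross join and then left-join the observed rows back in, which reads somewhat like complete(). This is a sketch assuming pandas 1.2 or later (for merge(how='cross')) and an integer y column; the helper name complete_xy is hypothetical:

```python
import pandas as pd


def complete_xy(df, x_values, fill_value=0):
    """Hypothetical helper: give every user a row for each value in x_values."""
    # Unique users and the requested x values, then every combination of them
    users = df[['user']].drop_duplicates()
    xs = pd.DataFrame({'x': x_values})
    grid = users.merge(xs, how='cross')
    # Bring the observed y values back; combinations absent from df get NaN
    out = grid.merge(df, on=['user', 'x'], how='left')
    # Replace the NaN padding and cast back to int (assumes integer y)
    out['y'] = out['y'].fillna(fill_value).astype(int)
    return out


df = pd.DataFrame({'user': [1, 1, 2, 3, 3, 3],
                   'x': ['a', 'b', 'a', 'a', 'c', 'd'],
                   'y': 1})
print(complete_xy(df, ['a', 'b', 'c', 'd']))
```

The left join introduces NaN before the fill, so y temporarily becomes float; the astype(int) restores the original dtype.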

Source: https://habr.com/ru/post/1268404/

