Consider this data set:
    import pandas as pd

    data_dict = {'ind': [1, 2, 3, 4],
                 'location': [301, 301, 302, 303],
                 'ind_var': [4, 8, 10, 15],
                 'loc_var': [1, 1, 7, 3]}
    df = pd.DataFrame(data_dict)
    df_indexed = df.set_index(['ind', 'location'])
    df_indexed
which looks like
                  ind_var  loc_var
    ind location
    1   301             4        1
    2   301             8        1
    3   302            10        7
    4   303            15        3
ind_var is a variable that varies by ind (= individual), and loc_var depends on location. (I also have an extra variable that varies with both individual and location, but I omit it to keep the presentation simple.)
I need to convert the data so that each individual index contains all possible locations. I can reindex this way (just showing individuals 1 to 3):
    new_shape = [(1, 301), (1, 302), (1, 303),
                 (2, 301), (2, 302), (2, 303),
                 (3, 301), (3, 302), (3, 303)]
    idx = pd.Index(new_shape)
    df2 = df_indexed.reindex(idx, method=None)
    df2.index.names = ['id', 'location']
which gives
                 ind_var  loc_var
    id location
    1  301             4        1
       302           NaN      NaN
       303           NaN      NaN
    2  301             8        1
       302           NaN      NaN
       303           NaN      NaN
    3  301           NaN      NaN
       302            10        7
       303           NaN      NaN
but I need a way to fill in the missing values, so I get:
                 ind_var  loc_var
    id location
    1  301             4        1
       302             4        7
       303             4        3
    2  301             8        1
       302             8        7
       303             8        3
    3  301            10        1
       302            10        7
       303            10        3
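As an aside on the reindexing step: the full grid of (individual, location) pairs does not have to be typed by hand. A minimal sketch, assuming every individual and every location of interest appears at least once in `df`; `full_idx` and `df_full` are names introduced here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({
    'ind': [1, 2, 3, 4],
    'location': [301, 301, 302, 303],
    'ind_var': [4, 8, 10, 15],
    'loc_var': [1, 1, 7, 3],
})
df_indexed = df.set_index(['ind', 'location'])

# Cartesian product of the unique individuals and the unique locations
full_idx = pd.MultiIndex.from_product(
    [df['ind'].unique(), df['location'].unique()]
)

# Pairs that were never observed become rows of NaN
df_full = df_indexed.reindex(full_idx)
df_full.index.names = ['id', 'location']
```

Unlike the hand-written list, this covers all four individuals, so the result has 4 x 3 rows.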
I tried two different things without success:
1) Using loc_dict = {301: 1, 302: 7, 303: 3} to replace loc_var and ind_dict = {1: 4, 2: 8, 3: 10, 4: 15} to replace ind_var
2) Using the groupby method.
    # First reset index
    df_non_indexed = df2.reset_index()
    # Forward-fill loc_var within each location (x.ffill() is the same as x.fillna(method='ffill'))
    df_non_indexed['loc_var'] = df_non_indexed.groupby(['location'])['loc_var'].transform(lambda x: x.ffill())
It almost works, but it fills in only one direction (forward or backward), so some missing values in each group remain.
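For concreteness, here is a minimal sketch of both attempts on the 3 x 3 grid from above, assuming `Series.map` is used for the dictionary lookup and `transform('first')` (which broadcasts the first non-null value of each group) replaces the one-directional fill. The names `filled` and `alt` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ind': [1, 2, 3, 4],
    'location': [301, 301, 302, 303],
    'ind_var': [4, 8, 10, 15],
    'loc_var': [1, 1, 7, 3],
})
df_indexed = df.set_index(['ind', 'location'])

# Rebuild the 3 x 3 grid for individuals 1 to 3
idx = pd.MultiIndex.from_product([[1, 2, 3], [301, 302, 303]])
df2 = df_indexed.reindex(idx)
df2.index.names = ['id', 'location']

# Attempt 1: look each value up by its key in the dictionaries
loc_dict = {301: 1, 302: 7, 303: 3}
ind_dict = {1: 4, 2: 8, 3: 10, 4: 15}
filled = df2.reset_index()
filled['loc_var'] = filled['location'].map(loc_dict)
filled['ind_var'] = filled['id'].map(ind_dict)

# Attempt 2: broadcast the first non-null value within each group
alt = df2.reset_index()
alt['ind_var'] = alt.groupby('id')['ind_var'].transform('first')
alt['loc_var'] = alt.groupby('location')['loc_var'].transform('first')
# Caveat: location 303 was only observed for individual 4, which is not
# in this grid, so its loc_var group is all NaN and stays NaN here.
```

In this sketch the dictionary lookup reproduces the desired table, while the groupby variant can only fill groups that still contain at least one observed value.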
There must be a very simple way to do this, but I could not figure it out! Thank you for your time.
Note: this is related to my question on reshaping from wide to long. I took a different approach and simplified the example in the hope that it is easier to understand.