Well, it took longer than I expected, but here is a more general answer that works with an arbitrary number of options for each person. I'm sure there are simpler ways, so it would be great if someone could suggest something better for some of the following code.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'location':     ['A', 'A', 'A', 'B', 'B', 'B'],
        'dist_to_A':    [0, 0, 0, 50, 50, 50],
        'dist_to_B':    [50, 50, 50, 0, 0, 0],
        'location_var': [10, 10, 10, 14, 14, 14],
        'ind_var':      [3, 8, 10, 1, 3, 4],
    })
which gives
       dist_to_A  dist_to_B  ind_var location  location_var
    0          0         50        3        A            10
    1          0         50        8        A            10
    2          0         50       10        A            10
    3         50          0        1        B            14
    4         50          0        3        B            14
    5         50          0        4        B            14
Then do:
    df.index.names = ['ind']

    # Add choice var
    df['choice'] = 1

    # Create dictionaries we'll use later
    ind_to_loc = dict(df['location'])
    # gives ind_to_loc equal to {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B'}
    ind_dict = dict(df['ind_var'])
    # gives {0: 3, 1: 8, 2: 10, 3: 1, 4: 3, 5: 4}
    loc_dict = dict(df.groupby('location')['location_var'].agg(lambda x: int(np.mean(x))))
    # gives {'A': 10, 'B': 14}
Now I create a MultiIndex and re-index to get the long form:
    df = df.set_index([df.index, df['location']])
    df.index.names = ['ind', 'location']

    # re-index to long shape: one row per (individual, candidate location)
    loc_list = ['A', 'B']
    ind_list = [0, 1, 2, 3, 4, 5]
    idx = pd.MultiIndex.from_product([ind_list, loc_list])
    df_long = df.reindex(idx)
    df_long.index.names = ['ind', 'loc']
The long form looks like this:
             dist_to_A  dist_to_B  ind_var  location  location_var  choice
    ind loc
    0   A            0         50        3         A            10       1
        B          NaN        NaN      NaN       NaN           NaN     NaN
    1   A            0         50        8         A            10       1
        B          NaN        NaN      NaN       NaN           NaN     NaN
    2   A            0         50       10         A            10       1
        B          NaN        NaN      NaN       NaN           NaN     NaN
    3   A          NaN        NaN      NaN       NaN           NaN     NaN
        B           50          0        1         B            14       1
    4   A          NaN        NaN      NaN       NaN           NaN     NaN
        B           50          0        3         B            14       1
    5   A          NaN        NaN      NaN       NaN           NaN     NaN
        B           50          0        4         B            14       1
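Now fill in the NaN values using the dictionaries, and set choice to 0 on the rows that were not chosen:

    ind_level = df_long.index.get_level_values('ind')
    loc_level = df_long.index.get_level_values('loc')
    df_long['ind_var'] = ind_level.map(ind_dict)
    df_long['location'] = ind_level.map(ind_to_loc)
    df_long['location_var'] = loc_level.map(loc_dict)
    # rows added by the reindex have choice == NaN; they are the non-chosen options
    df_long['choice'] = df_long['choice'].fillna(0).astype(int)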
Finally, all that remains is to create dist_S.
I will cheat here and assume that I can create a nested dictionary like this:
    nested_loc = {'A': {'A': 0, 'B': 50},
                  'B': {'A': 50, 'B': 0}}
(That is: if you are at location A, then location A is 0 km away and location B is 50 km away.)
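If hard-coding feels like too much of a cheat, the same dictionary can probably be derived from the dist_to_* columns. This is only a minimal sketch, assuming every dist_to_X column is named after a location X and that distances are constant within a location; `wide`, `per_loc` and `derived_nested_loc` are names made up for this illustration:

```python
import pandas as pd

# Same toy data as above, restricted to the columns needed here.
wide = pd.DataFrame({
    'location':  ['A', 'A', 'A', 'B', 'B', 'B'],
    'dist_to_A': [0, 0, 0, 50, 50, 50],
    'dist_to_B': [50, 50, 50, 0, 0, 0],
})

# One row per origin location (assumption: distances do not vary within a location).
per_loc = wide.drop_duplicates('location').set_index('location')

# Strip the 'dist_to_' prefix so the column labels are destination names.
per_loc.columns = [c.replace('dist_to_', '', 1) for c in per_loc.columns]

# Nested mapping of the form {origin: {destination: distance}}.
derived_nested_loc = per_loc.to_dict(orient='index')
```

This should scale to more locations as long as the naming convention holds, since nothing in it lists the locations by hand.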
    def nested_f(row):
        # distance from the individual's own location to the candidate location
        return nested_loc[row['location']][row['loc']]

    df_long = df_long.reset_index()
    df_long['dist_S'] = df_long[['loc', 'location']].apply(nested_f, axis=1)
    df_long = df_long.drop(['dist_to_A', 'dist_to_B', 'location'], axis=1)
    df_long
This gives the desired result:
        ind loc  ind_var  location_var  choice  dist_S
    0     0   A        3            10       1       0
    1     0   B        3            14       0      50
    2     1   A        8            10       1       0
    3     1   B        8            14       0      50
    4     2   A       10            10       1       0
    5     2   B       10            14       0      50
    6     3   A        1            10       0      50
    7     3   B        1            14       1       0
    8     4   A        3            10       0      50
    9     4   B        3            14       1       0
    10    5   A        4            10       0      50
    11    5   B        4            14       1       0
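For what it's worth, one possibly shorter route to the same table is to melt the dist_to_* columns instead of re-indexing. This is a sketch of an alternative, not the method used above, and it assumes the distance columns all follow the dist_to_X naming; `alt` and `loc_var` are names invented for this example:

```python
import pandas as pd

df = pd.DataFrame({
    'location':     ['A', 'A', 'A', 'B', 'B', 'B'],
    'dist_to_A':    [0, 0, 0, 50, 50, 50],
    'dist_to_B':    [50, 50, 50, 0, 0, 0],
    'location_var': [10, 10, 10, 14, 14, 14],
    'ind_var':      [3, 8, 10, 1, 3, 4],
})
df.index.name = 'ind'

# Location-level attribute: one value per location.
loc_var = df.groupby('location')['location_var'].first()

# Melt the distance columns: one row per (individual, candidate location).
alt = df.reset_index().melt(
    id_vars=['ind', 'location', 'ind_var'],
    value_vars=['dist_to_A', 'dist_to_B'],
    var_name='loc',
    value_name='dist_S',
)
alt['loc'] = alt['loc'].str.replace('dist_to_', '', regex=False)

# The chosen option is the one matching the individual's own location.
alt['choice'] = (alt['loc'] == alt['location']).astype(int)

# Attach the attribute of the candidate location, not of the chosen one.
alt['location_var'] = alt['loc'].map(loc_var)

alt = (alt.sort_values(['ind', 'loc'])
          .reset_index(drop=True)
          [['ind', 'loc', 'ind_var', 'location_var', 'choice', 'dist_S']])
```

Because the distances already live in the wide columns, this version does not need the nested dictionary at all.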