Reindexing and populating NaN values in Pandas

Question


Consider this data set:

    import pandas as pd

    data_dict = {'ind': [1, 2, 3, 4],
                 'location': [301, 301, 302, 303],
                 'ind_var': [4, 8, 10, 15],
                 'loc_var': [1, 1, 7, 3]}
    df = pd.DataFrame(data_dict)
    df_indexed = df.set_index(['ind', 'location'])
    df_indexed

which looks like

                  ind_var  loc_var
    ind location
    1   301             4        1
    2   301             8        1
    3   302            10        7
    4   303            15        3

ind_var is a variable that varies by ind (= individual), and loc_var depends on location. (I also have an extra variable that varies by both individual and location, but I omit it to simplify the presentation.)

I need to convert the data so that each individual index contains all possible locations. I can reindex this way (just showing individuals 1 to 3):

    new_shape = [(1, 301), (1, 302), (1, 303),
                 (2, 301), (2, 302), (2, 303),
                 (3, 301), (3, 302), (3, 303)]
    idx = pd.Index(new_shape)
    df2 = df_indexed.reindex(idx, method=None)
    df2.index.names = ['id', 'location']

which gives

                 ind_var  loc_var
    id location
    1  301             4        1
       302           NaN      NaN
       303           NaN      NaN
    2  301             8        1
       302           NaN      NaN
       303           NaN      NaN
    3  301           NaN      NaN
       302            10        7
       303           NaN      NaN

but I need a way to fill in the missing values, so I get:

                 ind_var  loc_var
    id location
    1  301             4        1
       302             4        7
       303             4        3
    2  301             8        1
       302             8        7
       303             8        3
    3  301            10        1
       302            10        7
       303            10        3

I tried two different things without success:

1) Using loc_dict = {301: 1, 302: 7, 303: 3} to replace loc_var and ind_dict = {1: 4, 2: 8, 3: 10, 4: 15} to replace ind_var

2) Using the groupby method.

    # First reset index
    df_non_indexed = df2.reset_index()
    df_non_indexed['loc_var'] = df_non_indexed.groupby(['location'])['loc_var'].transform(lambda x: x.fillna(method='ffill'))

It almost works, but it only fills in one direction (forward or backward), so gaps on the other side of the observed value stay NaN.
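For reference, here is a minimal sketch of filling in both directions within each group by chaining `ffill()` and `bfill()`. It assumes the full set of individuals (1 to 4) is kept so that every location has at least one observed row, and it builds the index with `pd.MultiIndex.from_product` rather than a hand-written tuple list; the names `full_index` and `flat` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ind': [1, 2, 3, 4],
    'location': [301, 301, 302, 303],
    'ind_var': [4, 8, 10, 15],
    'loc_var': [1, 1, 7, 3],
})

# Every (individual, location) combination, built from the observed values.
full_index = pd.MultiIndex.from_product(
    [df['ind'].unique(), df['location'].unique()],
    names=['id', 'location'],
)
flat = df.set_index(['ind', 'location']).reindex(full_index).reset_index()

# Fill forward, then backward, inside each group so the single observed
# value reaches rows on both sides of it.
flat['loc_var'] = flat.groupby('location')['loc_var'].transform(
    lambda s: s.ffill().bfill())
flat['ind_var'] = flat.groupby('id')['ind_var'].transform(
    lambda s: s.ffill().bfill())
```

A group with no observed value at all stays NaN. That would happen to location 303 if individual 4 were dropped before reindexing, because 303 is only observed for that individual.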

There must be a very simple way to do this, but I could not figure it out! Thank you for your time.

Note: this is related to my question on reshaping from wide to long. I took a different approach and simplified it in the hope that this is easier to understand.

+4

python pandas reshape

cd98 Jul 18 '13 at 17:29


2 answers

A cleaner solution than my original one. Thanks, @cd98.

    In [41]: loc_dict = {301 : 1, 302 : 7, 303 : 3}

    In [42]: ind_dict = {1 : 4, 2: 8, 3: 10}

    In [198]: df2 = df2.reset_index()

    In [199]: df2
    Out[199]:
       index  id  location  ind_var  loc_var
    0      0   1       301        4        1
    1      1   1       302      NaN      NaN
    2      2   1       303      NaN      NaN
    3      3   2       301        8        1
    4      4   2       302      NaN      NaN
    5      5   2       303      NaN      NaN
    6      6   3       301      NaN      NaN
    7      7   3       302       10        7
    8      8   3       303      NaN      NaN

    In [200]: df2['ind_var'] = df2.id.map(ind_dict)

    In [201]: df2['loc_var'] = df2.location.map(loc_dict)

    In [202]: df2
    Out[202]:
       index  id  location  ind_var  loc_var
    0      0   1       301        4        1
    1      1   1       302        4        7
    2      2   1       303        4        3
    3      3   2       301        8        1
    4      4   2       302        8        7
    5      5   2       303        8        3
    6      6   3       301       10        1
    7      7   3       302       10        7
    8      8   3       303       10        3

    In [203]: df2 = df2.set_index(['id', 'location'])

    In [204]: df2
    Out[204]:
                 index  ind_var  loc_var
    id location
    1  301           0        4        1
       302           1        4        7
       303           2        4        3
    2  301           3        8        1
       302           4        8        7
       303           5        8        3
    3  301           6       10        1
       302           7       10        7
       303           8       10        3
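If typing the dictionaries by hand is impractical, the same `map` idea can be sketched with lookups derived from the original data. This assumes each individual has exactly one `ind_var` value and each location exactly one `loc_var` value, as in the question; the names `ind_map`, `loc_map`, `full_index` and `filled` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'ind': [1, 2, 3, 4],
    'location': [301, 301, 302, 303],
    'ind_var': [4, 8, 10, 15],
    'loc_var': [1, 1, 7, 3],
})

# Lookup Series: one value per individual and one value per location.
ind_map = df.drop_duplicates('ind').set_index('ind')['ind_var']
loc_map = df.drop_duplicates('location').set_index('location')['loc_var']

# All (id, location) pairs as plain columns.
full_index = pd.MultiIndex.from_product(
    [df['ind'].unique(), df['location'].unique()],
    names=['id', 'location'],
)
filled = full_index.to_frame(index=False)

# Map each key column onto its lookup, as in the session above.
filled['ind_var'] = filled['id'].map(ind_map)
filled['loc_var'] = filled['location'].map(loc_map)
filled = filled.set_index(['id', 'location'])
```

Because nothing is reindexed here, no NaN values are introduced and the columns keep their integer dtype.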

+2

Tomugspurger Jul 18 '13 at 18:48


Zhen sun · Accepted Answer · 2014-08-14T00:22:57+0000

This can be done with stack/unstack and groupby very easily:

    # unstack to wide, fill the missing combinations with 0
    df_wide = df_indexed.unstack().fillna(0)

    # stack back to long
    df_long = df_wide.stack()

    # replace the 0s with the group maximum using groupby
    df_long['ind_var'] = df_long['ind_var'].groupby(level=0).transform(lambda x: x.max())
    df_long['loc_var'] = df_long['loc_var'].groupby(level=1).transform(lambda x: x.max())

    print(df_long)

This gives you the results:

                  ind_var  loc_var
    ind location
    1   301             4        1
        302             4        7
        303             4        3
    2   301             8        1
        302             8        7
        303             8        3
    3   301            10        1
        302            10        7
        303            10        3
    4   301            15        1
        302            15        7
        303            15        3
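One caveat: filling with 0 and then taking the maximum only works when the real values are positive, since a 0 would win over a negative value. A possible variant, sketched here as an assumption rather than as part of the answer, keeps the NaN values (which `max` skips) and reindexes directly instead of going through unstack/stack; `full_index` is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'ind': [1, 2, 3, 4],
    'location': [301, 301, 302, 303],
    'ind_var': [4, 8, 10, 15],
    'loc_var': [1, 1, 7, 3],
})
df_indexed = df.set_index(['ind', 'location'])

# Cartesian product of the existing index levels.
full_index = pd.MultiIndex.from_product(
    df_indexed.index.levels, names=df_indexed.index.names)
df_long = df_indexed.reindex(full_index)

# max ignores NaN, so each group is broadcast its single observed value
# without needing a 0 placeholder.
df_long['ind_var'] = df_long.groupby(level='ind')['ind_var'].transform('max')
df_long['loc_var'] = df_long.groupby(level='location')['loc_var'].transform('max')

# No NaN values remain, so the columns can go back to integers.
df_long = df_long.astype(int)
```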
