When stacking a pandas DataFrame, a Series is returned. Usually, after I stack the DataFrame, I convert it back to a DataFrame. However, the default names that come out of the stacked data make renaming the columns a bit hacky. I am looking for a simpler / inline way to give the columns reasonable names after stacking.
For example, for the following DataFrame:
    In [64]: df = pd.DataFrame({'id':[1,2,3],
       ...:                    'date':['2015-09-31']*3,
       ...:                    'value':[100, 95, 42],
       ...:                    'value2':[200, 57, 27]}).set_index(['id','date'])

    In [65]: df
    Out[65]:
                   value  value2
    id date
    1  2015-09-31    100     200
    2  2015-09-31     95      57
    3  2015-09-31     42      27
I stack it and convert it back to a DataFrame as follows:
    In [68]: df.stack().reset_index()
    Out[68]:
       id        date level_2    0
    0   1  2015-09-31   value  100
    1   1  2015-09-31  value2  200
    2   2  2015-09-31   value   95
    3   2  2015-09-31  value2   57
    4   3  2015-09-31   value   42
    5   3  2015-09-31  value2   27
So, to properly name these columns, I would need to do something like this:
    In [72]: stacked = df.stack()

    In [73]: stacked
    Out[73]:
    id  date
    1   2015-09-31  value     100
                    value2    200
    2   2015-09-31  value      95
                    value2     57
    3   2015-09-31  value      42
                    value2     27
    dtype: int64

    In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True)

    In [88]: stacked.reset_index().rename(columns={0:'value'})
    Out[88]:
       id        date var_name  value
    0   1  2015-09-31    value    100
    1   1  2015-09-31   value2    200
    2   2  2015-09-31    value     95
    3   2  2015-09-31   value2     57
    4   3  2015-09-31    value     42
    5   3  2015-09-31   value2     27
Ideally, the solution would look something like this:
df.stack(new_index_name='var_name', new_col_name='value')
But looking at the docs, it doesn't look like stack accepts any such arguments. Is there an easier / inline way in pandas to handle this workflow?
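For reference, here is one possible inline chain, offered as a sketch rather than a definitive answer. It assumes a pandas version in which `DataFrame.rename_axis` accepts a `columns=` keyword and `Series.reset_index` accepts `name=`. The idea is to name the column axis before stacking, so the new index level inherits that name, and then to name the value column while resetting the index. The variable name `tidy` is only illustrative.

```python
import pandas as pd

# Same example frame as in the question.
df = pd.DataFrame({'id': [1, 2, 3],
                   'date': ['2015-09-31'] * 3,
                   'value': [100, 95, 42],
                   'value2': [200, 57, 27]}).set_index(['id', 'date'])

tidy = (
    df.rename_axis(columns='var_name')   # name the column axis; stack() carries it into the new index level
      .stack()                           # Series indexed by (id, date, var_name)
      .reset_index(name='value')         # name the values column instead of the default 0
)

print(tidy.columns.tolist())
```

This avoids both the `inplace=True` call on the index and the separate `rename(columns={0: ...})` step, at the cost of relying on the column-axis name being propagated by `stack`.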