Set column names when laying pandas DataFrame

When stacking a pandas DataFrame , a Series returned. Usually after I fold the DataFrame , I convert it back to a DataFrame . However, the default names coming from the data stacked on the stack make renaming the columns a bit hacked. I am looking for a simpler / inline way to give the columns reasonable names after stacking.

For example, for the following DataFrame :

 In [64]: df = pd.DataFrame({'id':[1,2,3], ...: 'date':['2015-09-31']*3, ...: 'value':[100, 95, 42], ...: 'value2':[200, 57, 27]}).set_index(['id','date']) In [65]: df Out[65]: value value2 id date 1 2015-09-31 100 200 2 2015-09-31 95 57 3 2015-09-31 42 27 

I add and convert it back to a DataFrame as follows:

 In [68]: df.stack().reset_index() Out[68]: id date level_2 0 0 1 2015-09-31 value 100 1 1 2015-09-31 value2 200 2 2 2015-09-31 value 95 3 2 2015-09-31 value2 57 4 3 2015-09-31 value 42 5 3 2015-09-31 value2 27 

So, to properly name these columns, I would need to do something like this:

 In [72]: stacked = df.stack() In [73]: stacked Out[73]: id date 1 2015-09-31 value 100 value2 200 2 2015-09-31 value 95 value2 57 3 2015-09-31 value 42 value2 27 dtype: int64 In [74]: stacked.index.set_names('var_name', level=len(stacked.index.names)-1, inplace=True) In [88]: stacked.reset_index().rename(columns={0:'value'}) Out[88]: id date var_name value 0 1 2015-09-31 value 100 1 1 2015-09-31 value2 200 2 2 2015-09-31 value 95 3 2 2015-09-31 value2 57 4 3 2015-09-31 value 42 5 3 2015-09-31 value2 27 

Ideally, the solution would look something like this:

 df.stack(new_index_name='var_name', new_col_name='value') 

But looking at the docs , it doesn't look like stack accepts any such arguments. Is there an easier / inline way in pandas to work with this workflow?

+5
source share
2 answers

pd.melt often useful for converting DataFrames from wide to long. You can use pd.melt here if you first convert the id and date index levels into columns:

 In [56]: pd.melt(df.reset_index(), id_vars=['id', 'date'], value_vars=['value', 'value2'], var_name='var_name', value_name='value') Out[56]: id date var_name value 0 1 2015-09-31 value 100 1 2 2015-09-31 value 95 2 3 2015-09-31 value 42 3 1 2015-09-31 value2 200 4 2 2015-09-31 value2 57 5 3 2015-09-31 value2 27 
+6
source

So, here you can find a little cleaner using the fact that columns and Series can also carry names.

 In [45]: df Out[45]: value value2 id date 1 2015-09-31 100 200 2 2015-09-31 95 57 3 2015-09-31 42 27 In [46]: df.columns.name = 'var_name' In [47]: s = df.stack() In [48]: s.name = 'value' In [49]: s.reset_index() Out[49]: id date var_name value 0 1 2015-09-31 value 100 1 1 2015-09-31 value2 200 2 2 2015-09-31 value 95 3 2 2015-09-31 value2 57 4 3 2015-09-31 value 42 5 3 2015-09-31 value2 27 
+7
source

Source: https://habr.com/ru/post/1233187/


All Articles