Subplots from pandas multi-index data grouped by level

Question

How can I make multiple plots from a multi-indexed pandas DataFrame, one per value of one of the multi-index levels?

I have results from a model that uses different technologies in different scenarios. The results may look something like this:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(abs(np.random.randn(12, 4)), columns=[2011, 2012, 2013, 2014])
    df['scenario'] = ['s1', 's1', 's1', 's2', 's2', 's3', 's3', 's3', 's3', 's4', 's4', 's4']
    df['technology'] = ['t1', 't2', 't5', 't2', 't6', 't1', 't3', 't4', 't5', 't1', 't3', 't4']
    dfg = df.groupby(['scenario', 'technology']).sum().transpose()

dfg holds the technology use per year for each scenario. I would like one subplot per scenario, with all subplots sharing a single legend.

If I just pass subplots=True, it draws every scenario/technology combination separately (12 subplots):

 dfg.plot(kind='bar',stacked=True,subplots=True)

Based on this answer, I came close to what I was looking for:

    import matplotlib.pyplot as plt

    f, a = plt.subplots(2, 2)
    fig1 = dfg['s1'].plot(kind='bar', ax=a[0, 0])
    fig2 = dfg['s2'].plot(kind='bar', ax=a[0, 1])
    fig3 = dfg['s3'].plot(kind='bar', ax=a[1, 0])
    fig4 = dfg['s4'].plot(kind='bar', ax=a[1, 1])  # the last panel is s4, not s3 again
    plt.tight_layout()

But the result is not ideal: each subplot has its own legend, which makes the figure quite difficult to read. There should be an easier way to make subplots from a multi-indexed DataFrame. Thanks!

EDIT1: Ted Petrou suggested a good solution using seaborn's factorplot, but I have two problems with it. First, I already have a style set up, and I would prefer not to use the seaborn style (one solution could be to change seaborn's parameters). Second, I wanted a stacked bar chart, which requires significant additional customization. Can I do something similar with Matplotlib?

python matplotlib pandas multi-index subplot

Nabla Jan 23 '17 at 16:42

1 answer

Ted Petrou · Accepted Answer · 2017-01-23T17:40:34+0000

In my opinion, data analysis is easier when you "tidy" your data so that each column represents one variable. Here, the 4 years are spread across different columns. Pandas has one function and one method for turning wide (messy) data into long (tidy) data: you can use df.stack or pd.melt(df). You can then take advantage of the excellent seaborn library, which expects tidy data, to plot most things you want with little effort.

Tidy the data

    df1 = pd.melt(df, id_vars=['scenario', 'technology'], var_name='year')
    print(df1.head())

      scenario technology  year     value
    0       s1         t1  2011  0.406830
    1       s1         t2  2011  0.495418
    2       s1         t5  2011  0.116925
    3       s2         t2  2011  0.904891
    4       s2         t6  2011  0.525101
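For comparison, here is a rough sketch of how the same long table could be built with df.stack, the other option mentioned above. The name df2 and the method chain are only an illustration, not part of the original answer. The rows come out in a different order than with pd.melt (grouped by scenario and technology rather than by year), but the columns are the same.

```python
import numpy as np
import pandas as pd

# Same shape of data as in the question (values are random).
df = pd.DataFrame(np.abs(np.random.randn(12, 4)), columns=[2011, 2012, 2013, 2014])
df['scenario'] = ['s1', 's1', 's1', 's2', 's2', 's3', 's3', 's3', 's3', 's4', 's4', 's4']
df['technology'] = ['t1', 't2', 't5', 't2', 't6', 't1', 't3', 't4', 't5', 't1', 't3', 't4']

# Move the identifier columns into the index, name the column axis 'year',
# then stack the year columns into rows and flatten back to a DataFrame.
df2 = (
    df.set_index(['scenario', 'technology'])
      .rename_axis(columns='year')
      .stack()
      .rename('value')
      .reset_index()
)
print(df2.head())
```

Either df1 or df2 can be passed as the data argument in the seaborn call.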

Use seaborn

    import seaborn as sns

    # Note: factorplot was renamed to catplot in seaborn 0.9.
    sns.factorplot(x='year', y='value', hue='technology', col='scenario',
                   data=df1, kind='bar', col_wrap=2, sharey=False)
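Regarding the EDIT in the question (keeping your own style and using stacked bars), something along these lines might work with plain Matplotlib and the pandas plotting API. This is only a sketch, not a tested recipe for every layout: the 'tab10' colormap, the 2x2 grid, the figure size and the legend placement are all assumptions you would adapt. The idea is to fix one color per technology, turn off the per-subplot legends, and draw a single figure-level legend.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.patches import Patch

# Same data as in the question (values are random).
df = pd.DataFrame(np.abs(np.random.randn(12, 4)), columns=[2011, 2012, 2013, 2014])
df['scenario'] = ['s1', 's1', 's1', 's2', 's2', 's3', 's3', 's3', 's3', 's4', 's4', 's4']
df['technology'] = ['t1', 't2', 't5', 't2', 't6', 't1', 't3', 't4', 't5', 't1', 't3', 't4']
dfg = df.groupby(['scenario', 'technology']).sum().transpose()

scenarios = dfg.columns.get_level_values('scenario').unique()
technologies = sorted(dfg.columns.get_level_values('technology').unique())

# One fixed color per technology, so a single legend is valid for every subplot.
cmap = plt.get_cmap('tab10')  # assumed palette; swap in your own style's colors
colors = {tech: cmap(i) for i, tech in enumerate(technologies)}

fig, axes = plt.subplots(2, 2, sharex=True, figsize=(8, 6))
for ax, scenario in zip(axes.flat, scenarios):
    sub = dfg[scenario]  # columns are now only this scenario's technologies
    sub.plot(kind='bar', stacked=True, ax=ax, legend=False,
             color=[colors[t] for t in sub.columns])
    ax.set_title(scenario)

# Single shared legend built from proxy patches, placed to the right of the grid.
handles = [Patch(color=colors[t], label=t) for t in technologies]
fig.legend(handles=handles, title='technology', loc='center right')
fig.tight_layout(rect=[0, 0, 0.88, 1])  # leave room for the legend
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Because the colors are looked up by technology name, t1 has the same color in s1, s3 and s4 even though each scenario uses a different subset of technologies.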
