How to access MultiIndex column after groupby in pandas?

With a DataFrame that has single-level columns, a column can be selected from the groupby object:

df1 = pd.DataFrame({'a':[2,2,4,4], 'b': [5,6,7,8]})
df1.groupby('a')['b'].sum() -> 

a
2    11
4    15

But with MultiIndex columns, when grouping by a column rather than by a level, the columns can no longer be selected from the groupby object:

df = pd.concat([df1, df1], keys=['c', 'd'], axis=1)
df -> 

   c     d
   a  b  a  b
0  2  5  2  5
1  2  6  2  6
2  4  7  4  7
3  4  8  4  8

df.groupby([('c','a')])[('c','b')].sum() -> 
KeyError: "Columns not found: 'b', 'c'"

This works as a workaround, but it is inefficient because it does not use the cythonized aggregator, not to mention that it looks awkward.

df.groupby([('c','a')]).apply(lambda df: df[('c', 'b')].sum())

Is there a way to access a MultiIndex column in the groupby object that I have missed?

2 answers

Adding a comma after the ('c','b') tuple seems to work:

df.groupby([('c','a')])[('c','b'),].sum()

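Presumably pandas otherwise unpacks the tuple ('c','b') into two separate column keys, 'c' and 'b', which matches the KeyError above; the trailing comma wraps the tuple so that it is passed as a single key.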
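The same idea can also be written with an explicit list of column keys instead of a trailing comma. This is a minimal sketch that is not part of the original answer; it reuses the example frame from the question, and the variable name `result` is only illustrative. Note that it returns a one-column DataFrame rather than a Series.

```python
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({'a': [2, 2, 4, 4], 'b': [5, 6, 7, 8]})
df = pd.concat([df1, df1], keys=['c', 'd'], axis=1)

# Pass a list that holds the full column key, so the tuple
# ('c', 'b') is kept together as one MultiIndex label
result = df.groupby([('c', 'a')])[[('c', 'b')]].sum()
print(result)
```

Because the selection is a list, the built-in sum aggregation is still used; no apply with a lambda is needed.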


Alternatively, sum every column, or select several MultiIndex columns at once:

df.groupby([('c','a')]).sum()

         c  d    
         b  a   b
(c, a)           
2       11  4  11
4       15  8  15

df.groupby([('c','a')])[('c','b'),('d','b')].sum()

         c   d
         b   b
(c, a)        
2       11  11
4       15  15
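Another way to sidestep column selection on the groupby object altogether, sketched here as an assumption rather than as something either answer proposes, is to pick the value column first and group it by the key column passed as a Series. The name `totals` is illustrative.

```python
import pandas as pd

# Rebuild the example frame from the question
df1 = pd.DataFrame({'a': [2, 2, 4, 4], 'b': [5, 6, 7, 8]})
df = pd.concat([df1, df1], keys=['c', 'd'], axis=1)

# Select the value column as a Series, then group it by the
# key column (also a Series) and use the built-in sum
totals = df[('c', 'b')].groupby(df[('c', 'a')]).sum()
print(totals)
```

This yields a Series indexed by the values of ('c','a'), much like the single-level example in the question.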

Source: https://habr.com/ru/post/1649953/

