Why do I see all the original index elements in the slice of the data?

I have a DataFrame with a MultiIndex:

import pandas as pd
import numpy as np


df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3))*6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})

df_mult = df.set_index(['ind1', 'ind2', 'ind3'])

                val1  val2
ind1 ind2 ind3            
a    c    0      100    70
          1      101    71
          2      102    72
     d    0      103    73
          1      104    74
          2      105    75
     e    0      106    76
          1      107    77
          2      108    78
b    c    0      109    79
          1      110    80
          2      111    81
     d    0      112    82
          1      113    83
          2      114    84
     e    0      115    85
          1      116    86
          2      117    87

Now I can select a subset of it using .loc, like this

df_subs = df_mult.loc[pd.IndexSlice['a', ['c', 'd'], :], :]

which gives the expected

                val1  val2
ind1 ind2 ind3            
a    c    0      100    70
          1      101    71
          2      102    72
     d    0      103    73
          1      104    74
          2      105    75

When I type

df_subs.index

I get

MultiIndex(levels=[[u'a', u'b'], [u'c', u'd', u'e'], [0, 1, 2]],
           labels=[[0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]],
           names=[u'ind1', u'ind2', u'ind3'])

Why does level 0 still contain both a and b, and not just a?

This can be a problem if I want to use the index elements for something else. For example,

df_subs.index.levels[0]

gives me

Index([u'a', u'b'], dtype='object', name=u'ind1')

However

df_subs.index.get_level_values('ind1').unique()

gives me

Index([u'a'], dtype='object', name=u'ind1')

which looks inconsistent to me.

Is this a bug or intended behavior?

1 answer

There is a discussion of this behavior on the pandas GitHub issue tracker.

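In short, the levels of a MultiIndex are not recomputed from the values it actually contains: unused levels persist after values are removed from the MultiIndex. Slicing only selects rows and carries the levels of the original MultiIndex over unchanged, which is why df_mult and df_subs report the same levels.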
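The persistence of levels can be checked directly. The following is a minimal sketch that rebuilds the frames from the question; it assumes a recent pandas version in which the integer positions are exposed as `MultiIndex.codes` (older versions, such as the one that produced the output in the question, call them `labels`). The names `same_levels` and `used_level0` are introduced here only for illustration.

```python
import pandas as pd

# Rebuild the frames from the question.
df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3)) * 6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})
df_mult = df.set_index(['ind1', 'ind2', 'ind3'])
df_subs = df_mult.loc[pd.IndexSlice['a', ['c', 'd'], :], :]

# Every level of the slice holds the same values as in the original index.
same_levels = all(sub.equals(orig)
                  for sub, orig in zip(df_subs.index.levels,
                                       df_mult.index.levels))
print(same_levels)   # True

# Assumption: pandas >= 0.24, where the attribute is named `codes`.
# The codes of the slice only point at position 0 of level 0, i.e. 'a'.
used_level0 = sorted(set(df_subs.index.codes[0].tolist()))
print(used_level0)   # [0]
```

So `levels` describes the values the index could refer to, while the codes describe the values that are actually present in it.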

If you want to drop the unused levels, which creates a new MultiIndex, use MultiIndex.remove_unused_levels().

>>> df_subs.index.remove_unused_levels().levels[0]
Index(['a'], dtype='object', name='ind1')
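Note that `remove_unused_levels()` returns a new MultiIndex and does not modify `df_subs` in place. If the trimmed levels should stay attached to the data, the result has to be assigned back. A minimal sketch, assuming pandas 0.20 or later (where the method is available) and using the hypothetical name `df_clean` for the trimmed frame:

```python
import pandas as pd

# Rebuild the frames from the question.
df = pd.DataFrame({'ind1': list('aaaaaaaaabbbbbbbbb'),
                   'ind2': list('cccdddeeecccdddeee'),
                   'ind3': list(range(3)) * 6,
                   'val1': list(range(100, 118)),
                   'val2': list(range(70, 88))})
df_mult = df.set_index(['ind1', 'ind2', 'ind3'])
df_subs = df_mult.loc[pd.IndexSlice['a', ['c', 'd'], :], :]

# Work on a copy so that df_subs stays available for comparison.
df_clean = df_subs.copy()
df_clean.index = df_clean.index.remove_unused_levels()

print(list(df_clean.index.levels[0]))   # ['a']
print(list(df_clean.index.levels[1]))   # ['c', 'd']
print(list(df_subs.index.levels[0]))    # ['a', 'b'] (unchanged)
```

The rows and values are the same in both frames; only the level metadata of the index differs.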

Source: https://habr.com/ru/post/1687088/

