How to use Pandas Dataframe with multi-index, which includes intervals?

I am trying to slice a DataFrame with a MultiIndex consisting of an IntervalIndex and a regular index. Code example:

import pandas as pd
from pandas import Interval as ntv

df = pd.DataFrame.from_records([
   {'id': 1, 'var1': 0.1, 'ntv': ntv(0, 10), 'E': 1},
   {'id': 2, 'var1': 0.5, 'ntv': ntv(0, 12), 'E': 0}
], index=('ntv', 'id'))

It looks like:

            E  var1
ntv     id
(0, 10] 1   1   0.1
(0, 12] 2   0   0.5

I would like to slice the DataFrame at a specific value and return all rows whose interval contains that value. For example:

df.loc[4]

should return (trivially)

    E  var1
id
1   1   0.1
2   0   0.5

The problem is that I keep getting a TypeError about the index, even though the docs show a similar operation (on a single-level index) that does produce what I'm looking for:

TypeError: only integer scalar arrays can be converted to a scalar index

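I've tried several things and nothing works. I could keep id as a regular column instead, but I'd rather keep my index unique and not have to call set_index('id') all the time.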

It seems that either a) I'm missing something about MultiIndexes, or b) there is a bug/ambiguity when using an IntervalIndex inside a MultiIndex.


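Since we are dealing with intervals, there is the get_loc method, which finds the rows whose interval contains a given value. For example: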

import pandas as pd
from pandas import Interval as ntv

df = pd.DataFrame.from_records([
   {'id': 1, 'var1': 0.1, 'ntv': ntv(0, 10), 'E': 1},
   {'id': 2, 'var1': 0.5, 'ntv': ntv(0, 12), 'E': 0}
], index=('ntv', 'id'))

df.iloc[(df.index.get_level_values(0).get_loc(4))]
            E  var1
ntv     id         
(0, 10] 1   1   0.1
(0, 12] 2   0   0.5

df.iloc[(df.index.get_level_values(0).get_loc(11))]
             E  var1
ntv     id         
(0, 12] 2   0   0.5
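In recent pandas versions, IntervalIndex.get_loc returns a plain integer when exactly one interval matches, so df.iloc[...] gives a Series for 11 rather than the one-row frame shown above; with several matches it returns a slice or a boolean mask. A minimal sketch, assuming a current pandas release, that normalizes all three cases to an array of positions. The helper name `positions_containing` is hypothetical, and the frame is built with MultiIndex.from_arrays only to make the index construction explicit:

```python
import numpy as np
import pandas as pd

# Same data as above; building the MultiIndex explicitly is an illustrative choice
index = pd.MultiIndex.from_arrays(
    [pd.IntervalIndex.from_tuples([(0, 10), (0, 12)]), [1, 2]],
    names=["ntv", "id"],
)
df = pd.DataFrame({"E": [1, 0], "var1": [0.1, 0.5]}, index=index)


def positions_containing(intervals, value):
    """Hypothetical helper: positions of all intervals that contain value.

    get_loc may return an int, a slice, or a boolean mask; indexing an
    arange with any of these and wrapping it in atleast_1d always yields
    a 1-D array. Raises KeyError if no interval contains value.
    """
    loc = intervals.get_loc(value)
    return np.atleast_1d(np.arange(len(intervals))[loc])


intervals = df.index.get_level_values("ntv")
print(df.iloc[positions_containing(intervals, 4)])   # both rows
print(df.iloc[positions_containing(intervals, 11)])  # one-row DataFrame, not a Series
```

Passing an array (rather than the raw get_loc result) to iloc is what keeps the result a DataFrame in every case.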

This also works when several rows share the same interval, i.e.

df = pd.DataFrame.from_records([
   {'id': 1, 'var1': 0.1, 'ntv': ntv(0,10), 'E': 1}, 
   {'id': 3, 'var1': 0.1, 'ntv': ntv(0,10), 'E': 1},
   {'id':2, 'var1': 0.5, 'ntv': ntv(0,12), 'E': 0}
], index=('ntv', 'id'))

df.iloc[(df.index.get_level_values(0).get_loc(4))]

            E  var1
ntv     id         
(0, 10] 1   1   0.1
        3   1   0.1
(0, 12] 2   0   0.5

It is also faster than a list comprehension on a larger frame, i.e.

ndf = pd.concat([df]*10000)

%%timeit
ndf.iloc[ndf.index.get_level_values(0).get_loc(4)]
10 loops, best of 3: 32.8 ms per loop

%%timeit
intervals = ndf.index.get_level_values(0)
mask = [4 in i for i in intervals]
ndf.loc[mask]
1 loop, best of 3: 193 ms per loop
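The list comprehension tests each Interval one by one in Python. Since pandas 0.25, IntervalIndex.contains builds the same boolean mask elementwise in a single call. A sketch using the three-row frame from above (not benchmarked here, so no timing claim is made):

```python
import pandas as pd

# Three-row example with a repeated interval, as in the answer above
index = pd.MultiIndex.from_arrays(
    [pd.IntervalIndex.from_tuples([(0, 10), (0, 10), (0, 12)]), [1, 3, 2]],
    names=["ntv", "id"],
)
df = pd.DataFrame({"E": [1, 1, 0], "var1": [0.1, 0.1, 0.5]}, index=index)

intervals = df.index.get_level_values("ntv")

# Python-level check, one Interval at a time
mask_loop = [11 in i for i in intervals]
# Vectorized equivalent: one boolean per row
mask_vec = intervals.contains(11)

result = df[mask_vec]
print(result)  # only the (0, 12] row with id 2
```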

Digging into it, the error involves the object "slice(array([0, 1], dtype=int64), array([1, 2], dtype=int64), None)", a slice whose bounds are arrays rather than integers.

( index_type, Pandas)

index_type , index_type. .

>>> arrays = [[1, 1, 2, 2], ['red', 'blue', 'red', 'blue']]
>>> pd.MultiIndex.from_arrays(arrays, names=('number', 'color'))
MultiIndex(levels=[[1, 2], ['blue', 'red']],
           labels=[[0, 0, 1, 1], [1, 0, 1, 0]],
           names=['number', 'color'])

For example, labels[1][1] is 0, which points to levels[1][0], i.e. 'blue'.
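Since pandas 0.24 the `labels` attribute shown above is called `codes`. A quick check of that mapping, assuming a recent pandas:

```python
import pandas as pd

arrays = [[1, 1, 2, 2], ["red", "blue", "red", "blue"]]
mi = pd.MultiIndex.from_arrays(arrays, names=("number", "color"))

# Each level stores its unique values sorted; codes are positions into them
print(mi.levels[1].tolist())  # ['blue', 'red']
print(mi.codes[1].tolist())   # [1, 0, 1, 0]

# codes[1][1] is 0, so the second row's color is levels[1][0]
code = mi.codes[1][1]
print(mi.levels[1][code])  # blue
```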

There seems to be a related IntervalIndex discussion on GitHub: https://github.com/pandas-dev/pandas/issues/7640

"IntervalIndex .

- . , numba, , . , ?


Adding to @Dark's answer: Index.get_loc relies on Index.get_indexer internally, so calling get_indexer directly may be more efficient:

idx = df.index.get_level_values(0)
df.iloc[idx.get_indexer([4])]
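In recent pandas versions, get_indexer refuses an IntervalIndex whose intervals overlap, such as (0, 10] and (0, 12] (it raises InvalidIndexError), and by design it returns only one position per target value. get_indexer_non_unique returns every match. A sketch, assuming a current pandas release:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [pd.IntervalIndex.from_tuples([(0, 10), (0, 12)]), [1, 2]],
    names=["ntv", "id"],
)
df = pd.DataFrame({"E": [1, 0], "var1": [0.1, 0.5]}, index=index)

idx = df.index.get_level_values(0)
print(idx.is_overlapping)  # True

# Returns (indexer, missing): positions of all matching intervals, plus
# -1 entries for target values that matched nothing
positions, missing = idx.get_indexer_non_unique([4])
# Drop -1 entries, otherwise iloc would silently pick the last row
positions = positions[positions != -1]
print(df.iloc[positions])
```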

Or with a list comprehension:

intervals = df.index.get_level_values(0)
mask = [4 in i for i in intervals]
df.loc[mask]

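I'm still curious why the two approaches behave differently; the problem seems to be the combination of MultiIndex / IntervalIndex / loc: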

df.reset_index(level=1, drop=True).loc[4] # good
df.loc[4]  # TypeError
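Along the same lines, moving `id` into a column (instead of dropping it) keeps it available after the `.loc` lookup on the remaining single-level IntervalIndex, and `set_index('id')` then gives the frame asked for in the question. A sketch, assuming a recent pandas:

```python
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [pd.IntervalIndex.from_tuples([(0, 10), (0, 12)]), [1, 2]],
    names=["ntv", "id"],
)
df = pd.DataFrame({"E": [1, 0], "var1": [0.1, 0.5]}, index=index)

# Turn the 'id' level into a column; the remaining index is an IntervalIndex
flat = df.reset_index(level="id")

# Scalar .loc on an IntervalIndex selects the rows whose interval contains 4
result = flat.loc[4].set_index("id")
print(result)
```

A value contained in only one interval, such as 11, makes `flat.loc[11]` return a Series rather than a DataFrame, so `set_index` would not apply in that case.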

, , , , ( ). , , , , , , ?

Interestingly, I found this issue on GitHub:

ENH: MultiIndex.is_monotonic_decreasing #17455

With non-overlapping intervals, e.g. (0, 6) and (7, 12):

df = pd.DataFrame.from_records([
   {'id': 1, 'var1': 0.1, 'ntv': ntv(0, 6), 'E': 1}, 
   {'id': 2, 'var1': 0.5, 'ntv': ntv(7,12), 'E': 0}
], index=('ntv', 'id'))

loc works:

df.loc[4]

    E  var1
id         
1   1   0.1
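The two examples differ in whether the intervals overlap: (0, 10] and (0, 12] do, while (0, 6] and (7, 12] do not. IntervalIndex.is_overlapping (pandas 0.24+) makes that check explicit; a small sketch:

```python
import pandas as pd

overlapping = pd.IntervalIndex.from_tuples([(0, 10), (0, 12)])
disjoint = pd.IntervalIndex.from_tuples([(0, 6), (7, 12)])

print(overlapping.is_overlapping)  # True
print(disjoint.is_overlapping)     # False

# With disjoint intervals each value falls in at most one interval,
# so get_loc returns a single position
print(disjoint.get_loc(4))  # 0
```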
def check_value(num):
    # Keep the rows whose interval (first index level) contains num
    return df[[num in i for i in df.index.get_level_values(0)]]

a = check_value(4)
a
>> 
            E  var1
ntv     id         
(0, 10] 1   1   0.1
(0, 12] 2   0   0.5  

Then, to drop the interval level from the index:

a.index = a.index.droplevel(0)

Source: https://habr.com/ru/post/1690205/

