Most efficient way to exclude indexed rows in a pandas DataFrame

I am relatively new to Python and pandas and am struggling with (hierarchical) indexes. I have the basics down, but I'm lost with more advanced slicing and cross-sectioning.

For example, with the following dataframe:

import pandas as pd
import numpy as np
data = pd.DataFrame(np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'), columns=pd.Index(['one', 'two', 'three'], name='number'))

I want to select everything except the row with index 'Colorado'. For a small dataset, I could do:

data.ix[['Ohio','New York']]

But if the number of unique index values is large, that is impractical. Naively, I would expect a syntax like

data.ix[['state' != 'Colorado']]

However, this only returns the first record, 'Ohio', and does not return 'New York'. The following works, but is cumbersome:

filter = list(set(data.index.get_level_values(0).unique()) - set(['Colorado']))
data.loc[filter]

Surely there is a more Pythonic, less verbose way to do this?

+4

This is a Python issue, not a pandas one: 'state' != 'Colorado' is True, so what pandas gets is data.ix[[True]]. You could do this instead:

>>> data.loc[data.index != "Colorado"]
number    one  two  three
state                    
Ohio        0    1      2
New York    6    7      8

[2 rows x 3 columns]
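To make the difference concrete, here is a minimal sketch that rebuilds the question's frame and compares the naive expression with the index comparison. The `isin` and `drop` variants are not part of the answer above; they are standard pandas calls added here as alternatives for the same exclusion.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

# Plain Python compares the two string literals before pandas is involved.
naive = ['state' != 'Colorado']
print(naive)  # [True]

# Comparing the index itself yields one boolean per row.
mask = data.index != 'Colorado'
kept = data.loc[mask]

# Excluding several labels at once (an addition, not from the answer).
kept_many = data.loc[~data.index.isin(['Colorado', 'Ohio'])]

# drop() returns a copy without the given index labels.
dropped = data.drop('Colorado')
```

The mask keeps the original row order, which matters if the index is not sorted.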

Or use DataFrame.query:

>>> data.query("state != 'New York'")
number    one  two  three
state                    
Ohio        0    1      2
Colorado    3    4      5

[2 rows x 3 columns]

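This avoids repeating data. (Passing the expression to .query() as a quoted string is one of the few ways around the fact that Python would otherwise evaluate the comparison before pandas ever saw it.)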
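When the labels to exclude live in a variable, the query string can refer to it with the @ prefix. The following is a minimal sketch assuming the frame from the question; the names `label` and `excluded` are hypothetical and introduced only for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

label = 'Colorado'               # hypothetical single label to exclude
excluded = ['Colorado', 'Ohio']  # hypothetical list of labels to exclude

# The named index level 'state' can be used inside the query string,
# and @name pulls in a Python variable from the calling scope.
one_out = data.query("state != @label")
many_out = data.query("state not in @excluded")
```

Because the whole expression is a string, pandas parses the comparison itself instead of receiving a precomputed True or False.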

+8

Here is a solution that also works with a MultiIndex. First, for a single index:

excluded = ['Ohio']
indices = data.index.get_level_values('state').difference(excluded)
indx = pd.IndexSlice[indices.values]

In [77]: data.loc[indx]
Out[77]:
number    one  two  three
state
Colorado    3    4      5
New York    6    7      8
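One detail worth checking with this approach is row order: Index.difference returns a sorted result by default, so the selected rows can come back in a different order than in the original frame. A minimal sketch, assuming the single-index frame from the question and excluding 'Colorado' instead of 'Ohio':

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(9).reshape((3, 3)),
    index=pd.Index(['Ohio', 'Colorado', 'New York'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)

excluded = ['Colorado']

# difference() is a set operation; by default it sorts the remaining labels.
indices = data.index.get_level_values('state').difference(excluded)
by_difference = data.loc[indices.values]

# A boolean mask keeps the rows in their original order instead.
by_mask = data.loc[~data.index.isin(excluded)]

print(list(by_difference.index))  # ['New York', 'Ohio']
print(list(by_mask.index))        # ['Ohio', 'New York']
```

Both selections contain the same rows; only the ordering differs.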

MultiIndex

Here I extend this to a MultiIndex example...

data = pd.DataFrame(
    np.arange(18).reshape(6, 3),
    # codes= is the current name of the older labels= keyword (renamed in pandas 0.24)
    index=pd.MultiIndex(
        levels=[[u'AU', u'UK'], [u'Derby', u'Kensington', u'Newcastle', u'Sydney']],
        codes=[[0, 0, 0, 1, 1, 1], [0, 2, 3, 0, 1, 2]],
        names=[u'country', u'town']),
    columns=pd.Index(['one', 'two', 'three'], name='number'))

Suppose we want to exclude 'Newcastle' from this new MultiIndex:

excluded = ['Newcastle']
indices = data.index.get_level_values('town').difference(excluded)
indx = pd.IndexSlice[:, indices.values]

In [115]: data.loc[indx, :]
Out[115]:
number              one  two  three
country town
AU      Derby         0    1      2
        Sydney        6    7      8
UK      Derby         9   10     11
        Kensington   12   13     14

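  • Make sure your index is sorted: data.sort_index(inplace=True)
  • Make sure you include the null slice for the columns: data.loc[indx, :]
  • Sometimes indx = pd.IndexSlice[:, indices] is good enough, but I found I often needed to use indx = pd.IndexSlice[:, indices.values]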
+1

Source: https://habr.com/ru/post/1525961/

