Pandas - filter multi-index by condition for all values inside the index

I am trying to filter a dataframe with a multi-index like the following.

import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.rand(8),
             index=[list('AABBCCDD'),
                    ['M', 'F']*4])
data['Count'] = [1,2,15,17,8,12,11,20]

I would like to select all rows where the "Count" for both "M" and "F" within a given outer index level is greater than 10. So for the example frame, all "B" and "D" rows should be selected, but none of the other rows. The only way I can think of to do this is to iterate over the outer index, but since loops in pandas are almost never the best way, I think there should be a better solution.

3 answers

Use groupby with filter + all; you can also change the threshold (10 here):

data.groupby(level=0).filter(lambda x : x['Count'].gt(10).all())
Out[495]: 
            0  Count
B M  0.232856     15
  F  0.536026     17
D M  0.375064     11
  F  0.795447     20
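To make the threshold easy to change, the same filter can be wrapped in a small helper. This is only a sketch: the function name `groups_all_above` and its parameters are hypothetical, not part of pandas or of the answer above.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
data = pd.DataFrame(np.random.rand(8),
                    index=[list('AABBCCDD'), ['M', 'F'] * 4])
data['Count'] = [1, 2, 15, 17, 8, 12, 11, 20]


def groups_all_above(df, column, thresh, level=0):
    """Hypothetical helper: keep the groups (by index level) in which
    every value of `column` is strictly greater than `thresh`."""
    return df.groupby(level=level).filter(
        lambda g: g[column].gt(thresh).all()
    )


selected = groups_all_above(data, 'Count', 10)
print(selected.index.get_level_values(0).unique().tolist())  # ['B', 'D']

# With a lower threshold, group C (counts 8 and 12) qualifies as well.
lower = groups_all_above(data, 'Count', 7)
print(lower.index.get_level_values(0).unique().tolist())  # ['B', 'C', 'D']
```

filter calls the lambda once per group and keeps or drops each group as a whole, which is why a single scalar from all() is enough.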

Or, following jpp's idea, use isin:

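# groupby(level=0).min() replaces Series.min(level=0), which was removed in pandas 2.0
s = data['Count'].groupby(level=0).min().gt(10)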
data.loc[data.index.get_level_values(0).isin(s[s].index)]
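It may help to look at the intermediate objects in the isin approach. The sketch below assumes a recent pandas, so the per-group minimum is written as groupby(level=0).min() rather than with a level argument; the variable names `keep` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
data = pd.DataFrame(np.random.rand(8),
                    index=[list('AABBCCDD'), ['M', 'F'] * 4])
data['Count'] = [1, 2, 15, 17, 8, 12, 11, 20]

# One boolean per outer label: is the smallest Count in the group above 10?
s = data['Count'].groupby(level=0).min().gt(10)
print(s.tolist())  # [False, True, False, True] for A, B, C, D

# s[s] keeps only the True entries, so its index holds the labels to keep.
keep = s[s].index
print(list(keep))  # ['B', 'D']

# Select every row whose outer label is one of those labels.
result = data.loc[data.index.get_level_values(0).isin(keep)]
print(result['Count'].tolist())  # [15, 17, 11, 20]
```

Checking the minimum against the threshold is equivalent to checking that all values exceed it, which is why this gives the same rows as the filter version.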

You can use groupby.transform:

res = data[data.groupby(data.index.get_level_values(0))['Count'].transform('min') > 10]

print(res)

#             0  Count
# B M  0.143501     15
#   F  0.964689     17
# D M  0.092362     11
#   F  0.981470     20
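The key step here is that transform returns a result with the same index as the original frame, so it can be used directly as a row mask. A minimal sketch on the example data, grouping with level=0 (assumed here to be equivalent to grouping by get_level_values(0)):

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
data = pd.DataFrame(np.random.rand(8),
                    index=[list('AABBCCDD'), ['M', 'F'] * 4])
data['Count'] = [1, 2, 15, 17, 8, 12, 11, 20]

# Each row receives the minimum Count of its own outer-level group.
group_min = data.groupby(level=0)['Count'].transform('min')
print(group_min.tolist())  # [1, 1, 15, 15, 8, 8, 11, 11]

# The comparison yields one boolean per row, aligned with data's index.
mask = group_min > 10
res = data[mask]
print(res['Count'].tolist())  # [15, 17, 11, 20]
```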

Option 1

Unstacking and stacking with a level mask

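# groupby(level=0).all() replaces Series.all(level=0), which was removed in pandas 2.0
data.unstack()[data['Count'].gt(10).groupby(level=0).all()].stack()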

            0  Count
B F  0.778883     17
  M  0.548054     15
D F  0.035073     20
  M  0.544838     11

Option 2

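Using a per-level mask (pandas.Series.all evaluated for each outer index label) and pd.DataFrame.reindex.
This avoids unstacking / stacking.

# groupby(level=0).all() replaces Series.all(level=0), which was removed in pandas 2.0
mask = data['Count'].gt(10).groupby(level=0).all()
data.reindex(mask.index[mask], level=0)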

            0  Count
B M  0.548054     15
  F  0.778883     17
D M  0.544838     11
  F  0.035073     20
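As a quick sanity check, the approaches from the answers can be run side by side on the example frame. This is a sketch assuming a recent pandas (per-level reductions written with groupby(level=0)); the dictionary keys are just labels chosen for the comparison. Row order is normalized with sort_index before comparing, since the unstack / stack variant is known to reorder rows.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question.
data = pd.DataFrame(np.random.rand(8),
                    index=[list('AABBCCDD'), ['M', 'F'] * 4])
data['Count'] = [1, 2, 15, 17, 8, 12, 11, 20]

# Per-level mask: True where every Count in the outer group exceeds 10.
mask = data['Count'].gt(10).groupby(level=0).all()
labels = mask.index[mask]

results = {
    'filter': data.groupby(level=0).filter(
        lambda g: g['Count'].gt(10).all()),
    'isin': data.loc[data.index.get_level_values(0).isin(labels)],
    'transform': data[
        data.groupby(level=0)['Count'].transform('min') > 10],
    'reindex': data.reindex(labels, level=0),
}

# Compare the selected rows after sorting, so row order does not matter.
reference = results['filter'].sort_index()
for name, frame in results.items():
    same = list(frame.sort_index().index) == list(reference.index)
    print(name, same)
```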

Source: https://habr.com/ru/post/1696226/

