Delete row groups based on condition

I have the following pandas dataframe:

df =

A          B       C
111-ABC    123    EEE
111-ABC    222    EEE
111-ABC    444    XXX
222-CCC    222    YYY
222-CCC    333    67T
333-DDD    123    TTT
333-DDD    123    BTB
333-DDD    444    XXX
333-DDD    555    AAA
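To make the question reproducible, the DataFrame above can be built directly. This is a minimal sketch that assumes column B holds integers (not strings) and that the default 0-based index is fine:

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
# Assumption: B is numeric, so later comparisons use 123, not "123".
df = pd.DataFrame({
    "A": ["111-ABC", "111-ABC", "111-ABC",
          "222-CCC", "222-CCC",
          "333-DDD", "333-DDD", "333-DDD", "333-DDD"],
    "B": [123, 222, 444, 222, 333, 123, 123, 444, 555],
    "C": ["EEE", "EEE", "XXX", "YYY", "67T", "TTT", "BTB", "XXX", "AAA"],
})
print(df)
```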

I want to delete all row groups (grouped by A) that do not contain 123 in column B.

The expected result is the following (row group 222-CCC deleted):

result =

A          B       C
111-ABC    123    EEE
111-ABC    222    EEE
111-ABC    444    XXX
333-DDD    123    TTT
333-DDD    123    BTB
333-DDD    444    XXX
333-DDD    555    AAA

How can I do this? I suppose I should use groupby first, but how do I filter out whole groups of rows rather than individual rows?

result = df.groupby("A").... ??
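One way to finish the groupby idea is transform, which broadcasts a per-group result back to every row of that group and so yields a boolean mask aligned with df. This is a sketch of an alternative technique not used in either answer, assuming B is numeric:

```python
import pandas as pd

# Example data from the question (assumption: B is an integer column).
df = pd.DataFrame({
    "A": ["111-ABC"] * 3 + ["222-CCC"] * 2 + ["333-DDD"] * 4,
    "B": [123, 222, 444, 222, 333, 123, 123, 444, 555],
    "C": ["EEE", "EEE", "XXX", "YYY", "67T", "TTT", "BTB", "XXX", "AAA"],
})

# Row-level test first, then "does any row of this A-group pass?",
# repeated on every row of the group so the mask lines up with df.
mask = df["B"].eq(123).groupby(df["A"]).transform("any")
result = df[mask]
print(result)
```

Because the mask keeps the original index, the surviving rows keep their original labels (0, 1, 2, 5, 6, 7, 8).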
2 answers

Using query:

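# Collect the A values of every group that has at least one row with B == 123
a = df.query('B == 123').A.unique()
# Keep all rows whose A is in that collection (@a refers to the local variable a)
df.query('A in @a')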

         A    B    C
0  111-ABC  123  EEE
1  111-ABC  222  EEE
2  111-ABC  444  XXX
5  333-DDD  123  TTT
6  333-DDD  123  BTB
7  333-DDD  444  XXX
8  333-DDD  555  AAA

You can include additional conditions in the first query:

b = df.query('B == 123 & C == "EEE"').A.unique()
df.query('A in @b')

          A    B    C
0  111-ABC  123  EEE
1  111-ABC  222  EEE
2  111-ABC  444  XXX
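The same extra condition can also be written with groupby().filter(). This sketch assumes both conditions must hold on the same row, which is what the & inside the first query means:

```python
import pandas as pd

# Example data from the question (assumption: B is an integer column).
df = pd.DataFrame({
    "A": ["111-ABC"] * 3 + ["222-CCC"] * 2 + ["333-DDD"] * 4,
    "B": [123, 222, 444, 222, 333, 123, 123, 444, 555],
    "C": ["EEE", "EEE", "XXX", "YYY", "67T", "TTT", "BTB", "XXX", "AAA"],
})

# Keep a group only if some single row has both B == 123 and C == "EEE".
result = df.groupby("A").filter(
    lambda g: ((g["B"] == 123) & (g["C"] == "EEE")).any()
)
print(result)
```

333-DDD is dropped here because its B == 123 rows have C values TTT and BTB, not EEE.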

If speed is important, try this:

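import numpy as np

# Work on the underlying NumPy arrays to avoid query's expression parsing
cond1 = df.B.values == 123             # row mask: B equals 123
a = np.unique(df.A.values[cond1])      # distinct A keys that have such a row
df.loc[df.A.isin(a)]                   # keep every row belonging to those keys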
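A pandas-only variant can skip np.unique, since isin tolerates repeated values. This is a sketch; whether it is faster or slower than the NumPy version depends on the data, so time both on your own frame:

```python
import numpy as np
import pandas as pd

# Example data from the question (assumption: B is an integer column).
df = pd.DataFrame({
    "A": ["111-ABC"] * 3 + ["222-CCC"] * 2 + ["333-DDD"] * 4,
    "B": [123, 222, 444, 222, 333, 123, 123, 444, 555],
    "C": ["EEE", "EEE", "XXX", "YYY", "67T", "TTT", "BTB", "XXX", "AAA"],
})

# NumPy version from the answer.
cond1 = df.B.values == 123
a = np.unique(df.A.values[cond1])
numpy_result = df.loc[df.A.isin(a)]

# Pandas-only variant: the matching A values may repeat, which isin handles.
keys = df.loc[df["B"] == 123, "A"]
pandas_result = df[df["A"].isin(keys)]

print(numpy_result.equals(pandas_result))  # True
```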

You can use groupby().filter():

df.groupby('A').filter(lambda g: (g.B == 123).any())
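filter calls the lambda once per group and keeps all of that group's rows only when it returns True; the kept rows come back in their original order with their original index. A sketch checking that it agrees with the isin-based approach (the comparison is only for illustration):

```python
import pandas as pd

# Example data from the question (assumption: B is an integer column).
df = pd.DataFrame({
    "A": ["111-ABC"] * 3 + ["222-CCC"] * 2 + ["333-DDD"] * 4,
    "B": [123, 222, 444, 222, 333, 123, 123, 444, 555],
    "C": ["EEE", "EEE", "XXX", "YYY", "67T", "TTT", "BTB", "XXX", "AAA"],
})

# Group-wise filter: keep a group if any of its B values equals 123.
filtered = df.groupby("A").filter(lambda g: (g["B"] == 123).any())

# Key-lookup approach for comparison.
keys = df.loc[df["B"] == 123, "A"].unique()
expected = df[df["A"].isin(keys)]

print(filtered.equals(expected))  # True
```

Passing dropna=False to filter instead keeps every row but fills the rows of failing groups with NaN.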



Source: https://habr.com/ru/post/1664703/

