Python Pandas: string contains and does not contain

Question

I am trying to match rows of a Pandas DataFrame that contain one specific string and do not contain another. For example:

import pandas
df = pandas.Series(['ab1', 'ab2', 'b2', 'c3'])
df[df.str.contains("b")]

Output:

0    ab1
1    ab2
2     b2
dtype: object

Desired output:

2     b2
dtype: object

Question: is there an elegant way to say something like this?

df[[df.str.contains("b")==True] and [df.str.contains("a")==False]]
# Doesn't give desired outcome

Sam perry Dec 03 '15 at 0:12

2 answers

Either combine two masks (ts here is the Series from the question):

>>> ts.str.contains('b') & ~ts.str.contains('a')
0    False
1    False
2     True
3    False
dtype: bool

or use regex:

>>> ts.str.contains('^[^a]*b[^a]*$')
0    False
1    False
2     True
3    False
dtype: bool
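
To make the comparison concrete, here is a minimal sketch that builds both masks on the sample data from the question and checks that they agree. The names mask_ops, mask_regex and same are illustrative only and are not part of the answer above.

```python
import pandas as pd

# Sample data from the question; the answer above calls it ts
ts = pd.Series(['ab1', 'ab2', 'b2', 'c3'])

# Approach 1: combine two boolean masks, negating the second with ~
mask_ops = ts.str.contains('b') & ~ts.str.contains('a')

# Approach 2: one regex that matches strings made of non-"a" characters
# on both sides of at least one "b"
mask_regex = ts.str.contains(r'^[^a]*b[^a]*$')

# Both masks should select exactly the same rows
same = mask_ops.equals(mask_regex)
print(same)
print(ts[mask_regex].tolist())
```

The regex keeps everything in a single call, but the two-mask form is probably easier to read and to extend with further conditions.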

behzad.nouri Dec 03 '15 at 0:22

maxymoo · Accepted Answer · 2015-12-03T00:25:17+0000

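You are almost there; you just did not get the syntax exactly right. Python's and does not work element by element, so the two conditions have to be wrapped in parentheses and combined with &. It should be: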

df[(df.str.contains("b") == True) & (df.str.contains("a") == False)]
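
Since str.contains already returns a boolean Series, the comparisons with True and False can also be dropped and the second mask negated with ~. A minimal sketch of that variant, using the sample data from the question (the names s, has_a, has_b and result are illustrative):

```python
import pandas as pd

s = pd.Series(['ab1', 'ab2', 'b2', 'c3'])

# str.contains already yields a boolean mask, so "== True" is not needed
has_b = s.str.contains("b")
has_a = s.str.contains("a")

# & combines the masks element-wise; ~ inverts the "contains a" mask
result = s[has_b & ~has_a]
print(result)
```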

Another approach, which may be cleaner if you have many conditions to apply, is to chain your filters together with a reduce or a loop:

from functools import reduce
filters = [("a", False), ("b", True)]
reduce(lambda df, f: df[df.str.contains(f[0]) == f[1]], filters, df)
#outputs b2
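
The loop form of the same idea might look like the sketch below. It assumes the same filters list as the reduce version; the names filtered, substring and should_contain are illustrative.

```python
import pandas as pd

s = pd.Series(['ab1', 'ab2', 'b2', 'c3'])

# Each entry is (substring, whether the string should contain it)
filters = [("a", False), ("b", True)]

filtered = s
for substring, should_contain in filters:
    # Build the mask on the rows that survived the previous filters
    mask = filtered.str.contains(substring)
    # Keep rows whose match result equals the required value
    filtered = filtered[mask == should_contain]

print(filtered.tolist())  # ['b2']
```

Each pass narrows the Series further, so the order of the filters does not change the final result, only how many rows the later passes have to scan.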