How to check if a string contains one of the substrings in the list, in pandas?

Question

Is there a function equivalent to a combination of df.isin() and df[col].str.contains()?

For example, say I have the series s = pd.Series(['cat','hat','dog','fog','pet']) and I want to find all the elements of s that contain any of ['og', 'at']. I would like to get everything except 'pet'.

I have a solution, but it's not very elegant:

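import pandas as pd
s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']
found = [s.str.contains(x) for x in searchfor]  # one boolean Series per substring
result = pd.DataFrame(found)  # rows: substrings, columns: elements of s
result.any()  # True for each element where at least one substring matched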

Is there a better way to do this?

+70

python string pandas match dataframe

ari Oct 26 '14 at 20:23

2 answers

You can use str.contains with the regex OR operator (|):

s[s.str.contains('og|at')]

To filter a dataframe, index it with the same str.contains mask:

df = pd.DataFrame(s)
df[s.str.contains('og|at')]

Output:

0 cat
1 hat
2 dog
3 fog

+32

l'L'l Oct 26 '14 at 21:33

Alex Riley · Accepted Answer · 2014-10-26T20:40:33+0000

One option is to use the regex character | to try to match each of the substrings in the words in your series s (still using str.contains).

You can construct the regex by joining the words in searchfor with |:

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object
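To make the intermediate step visible, here is a minimal sketch that builds the pattern and the boolean mask separately before indexing. The names pattern and mask are only illustrative and are not part of the original answer.

```python
import pandas as pd

s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet'])
searchfor = ['og', 'at']

# Joining with | yields a single alternation pattern: 'og|at'
pattern = '|'.join(searchfor)

# str.contains returns a boolean Series, True where either substring occurs
mask = s.str.contains(pattern)

print(mask.tolist())     # [True, True, True, True, False]
print(s[mask].tolist())  # ['cat', 'hat', 'dog', 'fog']
```

Keeping the mask in its own variable lets you reuse it, for example to index a dataframe that shares the same index.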

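As @AndyHayden pointed out, take care if your substrings contain special characters such as $ and ^ that you want to match literally. These characters have special meanings in regular expressions and will affect the matching.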

You can make your list of substrings safer by escaping the non-alphanumeric characters with re.escape:

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

The strings in this new list will match each character literally when used with str.contains.
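Putting the two pieces together, a rough sketch of the escaped search might look like the following. The series prices and its values are made up for illustration; only matches comes from the example above.

```python
import re
import pandas as pd

# Hypothetical data: two strings with regex metacharacters, two without
prices = pd.Series(['$money', 'x^y', 'money', 'xy'])
matches = ['$money', 'x^y']

# Unescaped: $ and ^ act as anchors, so the literal text is never matched
raw_pattern = '|'.join(matches)
raw_mask = prices.str.contains(raw_pattern)

# Escaped: every character is matched literally
safe_pattern = '|'.join(re.escape(m) for m in matches)
safe_mask = prices.str.contains(safe_pattern)

print(raw_mask.tolist())           # [False, False, False, False]
print(prices[safe_mask].tolist())  # ['$money', 'x^y']
```

The unescaped pattern silently matches nothing here, which is why escaping is worth doing whenever the substrings come from user input or other uncontrolled data.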
