How to check if a string contains one of the substrings in the list, in pandas?

Is there any function that would be equivalent to a combination of df.isin() and df[col].str.contains()?

For example, say I have a series s = pd.Series(['cat','hat','dog','fog','pet']) and I want to find all the places where s contains any of ['og', 'at']. I would like to get everything except 'pet'.

I have a solution, but it's not very elegant:

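import pandas as pd

s = pd.Series(['cat','hat','dog','fog','pet'])
searchfor = ['og', 'at']
found = [s.str.contains(x) for x in searchfor]  # one boolean Series per substring
result = pd.DataFrame(found)
result.any()  # True for each position where at least one substring matched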

Is there a better way to do this?

+70
2 answers

One option is to use the regex | character to try to match each of the substrings in the words in your series s (still using str.contains).

You can construct the regex by joining the words in searchfor with |:

>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0    cat
1    hat
2    dog
3    fog
dtype: object
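
The line above filters the series directly. If you want the "places" themselves, a minimal sketch is to keep the boolean mask that str.contains returns. The None entry and the na=False argument below are additions for illustration (they are not part of the question); na=False makes missing values count as non-matches so the mask stays purely boolean.

```python
import pandas as pd

# Same data as in the question, plus one missing value (an assumption for illustration).
s = pd.Series(['cat', 'hat', 'dog', 'fog', 'pet', None])
searchfor = ['og', 'at']

# Boolean mask: True where any substring in searchfor occurs.
# na=False treats missing values as non-matches instead of propagating them.
mask = s.str.contains('|'.join(searchfor), na=False)

matched = s[mask]      # rows containing 'og' or 'at'
unmatched = s[~mask]   # everything else, including 'pet' and the missing value
print(mask.tolist())
print(matched.tolist())
```

The mask can also be reused to filter a DataFrame that shares the same index.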

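As @AndyHayden noted, take care if your substrings contain special characters such as $ and ^ that you want to match literally. These characters have specific meanings in regular expressions and will affect the matching.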

You can make your list of substrings safer by escaping non-alphanumeric characters with re.escape:

>>> import re
>>> matches = ['$money', 'x^y']
>>> safe_matches = [re.escape(m) for m in matches]
>>> safe_matches
['\\$money', 'x\\^y']

The strings in this new list will match each character literally when used with str.contains.
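
Putting the two steps together, a minimal sketch of escaping and then joining might look like the following. The sample series here is made up for illustration; only matches comes from the answer above.

```python
import re
import pandas as pd

# Hypothetical sample data chosen to include regex metacharacters.
s = pd.Series(['cost $money', 'x^y', 'money', 'xy'])
matches = ['$money', 'x^y']

# Escape each substring so $ and ^ are matched literally,
# then join with | to build a single alternation pattern.
pattern = '|'.join(re.escape(m) for m in matches)

mask = s.str.contains(pattern)
result = s[mask]
print(result.tolist())
```

Without the escaping step, $ and ^ would be read as end-of-string and start-of-string anchors, so neither substring would be matched literally.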

+128

You can use str.contains alone with a regex pattern using OR (|):

s[s.str.contains('og|at')]

Or you can add the series to a dataframe and then use str.contains:

df = pd.DataFrame(s)
df[s.str.contains('og|at')] 

Output:

0 cat
1 hat
2 dog
3 fog 
+32

Source: https://habr.com/ru/post/1693179/

