The fastest way to check if all items in a string list are in a string

Question


I have a string

"My name is Andrew, I am pretty awesome"

Suppose I have a list of lists, for example

[['andrew', 'name', 'awesome'], ['andrew', 'designation', 'awesome']]

I need my solution to return

['andrew', 'name', 'awesome']

Naive solution:

myString = 'My name is Andrew, I am pretty awesome'
keywords = [['andrew', 'name', 'awesome'], ['andrew', 'designation', 'awesome']]
results = []
for i in keywords:
    if all(substring in myString.lower() for substring in i):
        results.append(i)
print(results)

My problem is that when the keywords list is very large (say, 100,000 entries), performance becomes a bottleneck. I need to know the most efficient way to do this.


python string string-matching search

suzee Jan 16 '18 at 9:35


1 answer

cᴏʟᴅsᴘᴇᴇᴅ · Accepted Answer · 2018-01-16T09:41:45+0000


string = "My name is Andrew, I am pretty awesome"
choices = [['andrew', 'name', 'awesome'], ['andrew', 'designation', 'awesome']]

Option 1
Use in. For strings, the in operator is implemented in C using a Boyer-Moore-style search, so it is already fast.
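For intuition about why in is fast, the following is a simplified, pure-Python sketch of Boyer-Moore-Horspool substring search. The name horspool_find is hypothetical and used only for illustration. The sketch compares each window right to left. On a mismatch, it skips ahead using a precomputed bad-character table. CPython's real C search is a more heavily tuned variant of this idea, so in practice you should keep using in. This sketch will be far slower than the built-in.

```python
def horspool_find(haystack: str, needle: str) -> int:
    """Return the index of the first occurrence of needle in haystack, or -1.

    Hypothetical illustration of Boyer-Moore-Horspool, not CPython's code.
    """
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: for each character in needle[:-1], the distance
    # from its last occurrence to the end of the needle.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        # Compare the current window right to left.
        j = m - 1
        while j >= 0 and haystack[pos + j] == needle[j]:
            j -= 1
        if j < 0:
            return pos
        # Skip based on the character under the needle's last position;
        # characters absent from the table allow a full-length skip.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1


print(horspool_find("my name is andrew", "andrew"))  # 11
```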

>>> [c for c in choices if all(y in string.lower() for y in c)]
[['andrew', 'name', 'awesome']]

That works. One nitpick, though: string.lower() is recomputed for every word of every sublist, so call it once up front:

v = string.lower()
%timeit [c for c in choices if all(y in v for y in c)]
1000000 loops, best of 3: 2.05 µs per loop
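%timeit is only available in IPython. Outside IPython, the same idea can be wrapped in a plain function. This is a minimal sketch, and filter_by_substrings is a hypothetical helper name, not part of any library.

```python
def filter_by_substrings(text, keyword_lists):
    """Return the keyword lists whose items all occur as substrings of text.

    Hypothetical helper: the text is lowercased once, not once per keyword.
    """
    lowered = text.lower()
    # all() stops at the first keyword that is missing from the text.
    return [words for words in keyword_lists
            if all(word in lowered for word in words)]


string = "My name is Andrew, I am pretty awesome"
choices = [['andrew', 'name', 'awesome'], ['andrew', 'designation', 'awesome']]
print(filter_by_substrings(string, choices))
# [['andrew', 'name', 'awesome']]
```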

Option 2
Use re.split + set.issuperset:

>>> import re
>>> [c for c in choices if set(re.split(r'\W', string.lower())).issuperset(c)]
[['andrew', 'name', 'awesome']]

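re.split breaks the lowercased string into words at every non-word character, and set.issuperset checks that every keyword is one of those words. Note that this matches whole words only, whereas in matches substrings anywhere in the string.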
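The difference between whole-word and substring matching can change the result. Below is a minimal sketch, assuming a hypothetical helper words_of. It uses r'\W+' (one or more non-word characters) rather than r'\W', and drops empty strings so that runs of separators are harmless.

```python
import re


def words_of(text):
    """Hypothetical helper: split lowercased text into a set of words."""
    # re.split can yield empty strings (e.g. after trailing punctuation); drop them.
    return {w for w in re.split(r'\W+', text.lower()) if w}


text = "My username is Andrew"
keywords = ['name', 'andrew']

substring_match = all(k in text.lower() for k in keywords)
word_match = words_of(text).issuperset(keywords)

print(substring_match)  # True: 'name' occurs inside 'username'
print(word_match)       # False: 'name' is not a whole word here
```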

Again, the set only needs to be built once, outside the comprehension:

v = set(re.split(r'\W', string.lower()))
%timeit [c for c in choices if v.issuperset(c)]
1000000 loops, best of 3: 1.13 µs per loop
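The timings above were measured on only two keyword lists. To compare both options at the scale described in the question, a standalone timeit script along these lines could be used. The workload is an assumption, not data from the question: 100,000 randomly generated three-word lists drawn from a made-up vocabulary. The numbers it prints will depend on your machine and data.

```python
import random
import re
import timeit

text = "My name is Andrew, I am pretty awesome"

# Assumed synthetic workload (hypothetical): 100,000 three-word lists drawn
# from a small vocabulary so that some of them match the text.
vocab = ['andrew', 'name', 'awesome', 'designation', 'pretty', 'cool']
random.seed(0)
choices = [random.sample(vocab, 3) for _ in range(100_000)]

lowered = text.lower()                    # Option 1: lowercase once
word_set = set(re.split(r'\W', lowered))  # Option 2: build the word set once


def option_substrings():
    return [c for c in choices if all(w in lowered for w in c)]


def option_word_set():
    return [c for c in choices if word_set.issuperset(c)]


for fn in (option_substrings, option_word_set):
    # Best of 3 repeats, averaged over 5 calls each.
    seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f"{fn.__name__}: {seconds * 1000:.2f} ms per call")
```

With this vocabulary both options return identical lists, because every vocabulary word that occurs in the text also occurs there as a whole word. With real data, the set-based option may return fewer matches.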


