The fastest way to check if a string contains any word from a list

I have a Python application.

I have a list of 450 prohibited phrases and a message received from a user. I want to check whether the message contains any of these phrases. What is the fastest way to do this?

I currently have this code:

message = "sometext"
lista = ["a", "b", "c"]

isContaining = False

for phrase in lista:
    # Python strings have no contains() method; "in" does the substring check
    if phrase in message:
        isContaining = True
        break

Is there a faster way to do this? I need to process a message (maximum 500 characters) in less than 1 second.

4 answers

Python's built-in any() function is made for exactly this:

>>> message = "sometext"
>>> lista = ["a","b","c"]
>>> any(a in message for a in lista)
False
>>> lista = ["a","b","e"]
>>> any(a in message for a in lista)
True
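A minimal sketch building on the same any() idea, in case you also want case-insensitive matching and want to know which phrase matched. The function name find_prohibited and the sample list are hypothetical, and lowercasing both sides is an assumption about how you want to compare:

```python
def find_prohibited(message, phrases):
    """Return the first prohibited phrase found in message, or None.

    Assumption: matching is case-insensitive, so both sides are lowercased.
    """
    text = message.lower()
    # next() stops at the first match, just like any() does
    return next((p for p in phrases if p.lower() in text), None)


lista = ["bad phrase", "Spam", "scam"]  # hypothetical list
print(find_prohibited("This is SPAM, obviously", lista))  # Spam
print(find_prohibited("Nothing wrong here", lista))  # None
```

In real use you would lowercase the 450 phrases once at startup instead of on every call.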

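Alternatively, you can check the intersection of sets. Note that set(message) is the set of individual characters in the message, so this form only works when the list items are single characters:

>>> lista = ["a","b","c"]
>>> set(message) & set(lista)
set()
>>> lista = ["a","b","e"]
>>> set(message) & set(lista)
{'e'}
>>> set(['test','sentence'])&set(['this','is','my','sentence'])
{'sentence'}

However, only whole elements are compared, so a word inside a longer string is not found:

>>> set(['test','sentence'])&set(['this is my sentence'])
set()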
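One way around the whole-element limitation, sketched here as an illustration rather than as part of the answer: split the message into words and compare word sequences (n-grams) against the prohibited phrases, so multi-word phrases can still be found with a set intersection. The function name, the \w+ tokenization, and case-insensitive matching are all assumptions. Unlike the substring check with in, this matches whole words only, so "sentences" does not match "sentence":

```python
import re


def contains_phrase_by_words(message, phrases):
    """Check for whole-word phrases using set intersection on word n-grams.

    Assumption: text is split on non-word characters and compared lowercase.
    """
    words = re.findall(r"\w+", message.lower())
    # Normalize each prohibited phrase to a tuple of lowercase words
    targets = {tuple(re.findall(r"\w+", p.lower())) for p in phrases}
    targets.discard(())  # ignore phrases that contain no word characters
    if not targets:
        return False
    max_len = max(len(t) for t in targets)
    # Every run of 1..max_len consecutive words in the message
    grams = {
        tuple(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }
    return bool(grams & targets)


print(contains_phrase_by_words("this is my sentence", ["test", "sentence"]))  # True
print(contains_phrase_by_words("this is my sentence", ["my sentence"]))  # True
print(contains_phrase_by_words("sentences", ["sentence"]))  # False
```

For repeated checks, the targets set would normally be built once from the 450 phrases rather than on every call.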

You can use a regular expression: escape each phrase, join them with |, and search the message once:

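import re

lista = [...]  # the prohibited phrases
lista_escaped = [re.escape(item) for item in lista]
# Flags belong in re.compile(); the second argument of search() is a start position
bad_match = re.compile('|'.join(lista_escaped), re.IGNORECASE)
is_bad = bad_match.search(message) is not None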
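A runnable sketch of the regex approach with a hypothetical phrase list. It also shows a whole-word variant, which is an addition and not part of the answer: wrapping the alternation in \b(?:...)\b stops "spam" from matching inside "spammer" and "1+1" from matching inside "11+12". The flags are passed to re.compile, because the second positional argument of Pattern.search is a start position, not flags:

```python
import re

# Hypothetical list of prohibited phrases, for illustration only
lista = ["bad phrase", "spam", "1+1"]
# re.escape makes characters such as "+" match literally
escaped = "|".join(re.escape(p) for p in lista)

# Substring match, as in the answer
bad_match = re.compile(escaped, re.IGNORECASE)
# Whole-word variant (an addition): \b requires a word boundary at both ends
bad_word = re.compile(r"\b(?:" + escaped + r")\b", re.IGNORECASE)

for message in ["Buy SPAM now", "I am not a spammer", "1+1 is two", "11+12"]:
    found = bad_match.search(message)
    whole = bad_word.search(message)
    print(repr(message),
          found.group(0) if found else None,
          whole.group(0) if whole else None)
```

Compile the pattern once when the application starts and reuse it for every message. Note that \b only behaves as expected when a phrase starts and ends with a word character.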

I would combine the built-in any() with the in operator:

isContaining = any(a in message for a in lista)

I do not know if this is the fastest way, but it seems to me the easiest.
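Since the question is about speed, the simplest way to settle it is to measure with your own data. Below is a rough timing sketch with timeit; the 450 random two-word phrases and the 500-character message are generated test data (an assumption, not real input), and the random phrases almost never match, which is the worst case because every phrase gets checked. Results will vary with the data and the Python version, so no numbers are claimed here:

```python
import random
import re
import string
import timeit

# Hypothetical test data: 450 random two-word phrases, a 500-character message
random.seed(0)


def random_word():
    return "".join(random.choices(string.ascii_lowercase, k=random.randint(3, 8)))


lista = [f"{random_word()} {random_word()}" for _ in range(450)]
# 150 words of at least 3 letters guarantee at least 599 characters before slicing
message = " ".join(random_word() for _ in range(150))[:500]

pattern = re.compile("|".join(map(re.escape, lista)))


def with_any():
    return any(phrase in message for phrase in lista)


def with_regex():
    return pattern.search(message) is not None


assert with_any() == with_regex()  # both approaches must agree
for fn in (with_any, with_regex):
    seconds = timeit.timeit(fn, number=200) / 200
    print(f"{fn.__name__}: {seconds * 1e6:.1f} microseconds per message")
```

At this size (450 phrases, 500 characters) either approach should fit comfortably within the 1-second budget; the measurement tells you which is faster on your data.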


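We can also use the set.intersection method. Since list(message) splits the message into single characters, this only detects list items that are single characters: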

>>> message = "sometext"
>>> lista = ["a","b","c"]
>>> isContaining = False
>>> if set(list(message)).intersection(set(lista)):
...    isContaining = True
... 
>>> isContaining
False
>>> message = "sometext a"
>>> list(message)
['s', 'o', 'm', 'e', 't', 'e', 'x', 't', ' ', 'a']
>>> if set(list(message)).intersection(set(lista)):
...    isContaining = True
... 
>>> isContaining
True

Source: https://habr.com/ru/post/1570001/

