A quick solution would be to build a trie from your sentences and convert that trie into a regular expression. For your example, the pattern would look like this:
(?:bla\ bla|h(?:ave\ a\ tea|y\ i\ m\ luca)|i\ love\ (?:android|ios))
Here is an example on Debuggex (a visualization of the pattern above).

It may be a good idea to add '\b' word boundaries around the pattern, to avoid matching "have a team".
You'll need a small Trie script. It isn't an official package yet, but you can simply download it here and save it as trie.py in your current directory.
You can then use this code to generate the trie and the regex:
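As a small illustration of the word-boundary point, the sketch below compiles the pattern shown above with and without '\b'. The variable names are made up for this example.

```python
import re

# Pattern copied from above (an escaped space matches a literal space).
alternation = r"(?:bla\ bla|h(?:ave\ a\ tea|y\ i\ m\ luca)|i\ love\ (?:android|ios))"

without_boundaries = re.compile(alternation)
with_boundaries = re.compile(r"\b" + alternation + r"\b")

text = "we have a team"
print(without_boundaries.findall(text))  # ['have a tea']
print(with_boundaries.findall(text))     # []
```

Without the boundaries, "have a tea" is found inside "have a team"; with them, the trailing "m" prevents the match.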
    import re
    from trie import Trie

    to_find_sentences = [
        'bla bla',
        'have a tea',
        'hy i m luca',
        'i love android',
        'i love ios',
    ]

    trie = Trie()
    for sentence in to_find_sentences:
        trie.add(sentence)

    print(trie.pattern())

You invest some time up front to build the trie and the regex, but the processing afterwards should be very fast.
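To give an idea of what such a script does, here is a minimal sketch of a trie that can emit a regex. It is not the trie.py mentioned above: the MiniTrie class and its internals are hypothetical, written only for illustration, and it leaves out any optimizations a real script might apply.

```python
import re


class MiniTrie:
    """Hypothetical stand-in for trie.py: stores strings, emits one regex."""

    def __init__(self):
        self.root = {}

    def add(self, sentence):
        node = self.root
        for char in sentence:
            node = node.setdefault(char, {})
        node[""] = True  # the empty key marks the end of a stored sentence

    def _to_regex(self, node):
        ends_here = "" in node
        # One regex branch per child character, in sorted order.
        branches = [
            re.escape(char) + self._to_regex(node[char])
            for char in sorted(key for key in node if key != "")
        ]
        if not branches:
            return ""  # leaf: nothing more to match
        if len(branches) == 1 and not ends_here:
            return branches[0]  # single continuation: no group needed
        group = "(?:" + "|".join(branches) + ")"
        # If a sentence ends at this node, the rest is optional.
        return group + "?" if ends_here else group

    def pattern(self):
        return self._to_regex(self.root)


mini = MiniTrie()
for sentence in ['bla bla', 'have a tea', 'hy i m luca',
                 'i love android', 'i love ios']:
    mini.add(sentence)

print(mini.pattern())
# (?:bla\ bla|h(?:ave\ a\ tea|y\ i\ m\ luca)|i\ love\ (?:android|ios))
```

Shared prefixes such as "i love " are stored once, so the regex engine only has to try them once instead of once per sentence.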
Here's a related answer (Speed up millions of regex replacements in Python 3) if you need more information.
Note that this approach won't find overlapping sentences:
    to_find_sentences = [
        'i love android',
        'android Marshmallow'
    ]
    # ...
    print(re.findall(pattern, "I love android Marshmallow"))
    # ['I love android']
You would need to modify the regex with a positive lookahead in order to find overlapping sentences.
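A minimal sketch of that lookahead idea, under a few assumptions: a plain alternation stands in for the trie-generated pattern, matching is made case-insensitive with re.IGNORECASE so that "I love" matches 'i love', and the variable names are invented for this example.

```python
import re

to_find_sentences = ['i love android', 'android Marshmallow']

# Plain alternation used here in place of the trie-generated pattern.
alternation = "|".join(re.escape(s) for s in to_find_sentences)

# The lookahead makes every match zero-width, so the search can resume
# inside the previous sentence; the capture group keeps the matched text.
overlapping = re.compile(r"(?=\b(" + alternation + r")\b)", re.IGNORECASE)

print(overlapping.findall("I love android Marshmallow"))
# ['I love android', 'android Marshmallow']
```

Note that this reports at most one sentence per starting position (the first alternative that matches there), so two sentences beginning at the same character would still need extra handling.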