Use regex. Here is the implementation:
import re
def filterlines(prefixes, lines):
pattern = "|".join([re.escape(p) for p in prefixes])
regex = re.compile(pattern)
for line in lines:
if regex.match(line):
yield line
First we build and compile a regular expression (expensive, but only one), but then the match is very fast.
Test code for the above:
with open("/usr/share/dict/words") as words:
prefixes = [line.strip() for line in words]
lines = [
"zoo this should match",
"000 this shouldn't match",
]
print(list(filterlines(prefixes, lines)))
source
share