I have a Python script with roughly 100 lines of regular expressions, each line corresponding to specific words.
The script consumes up to 100% of the processor each time it runs (I basically pass it a sentence and it returns the matching words it finds).
I want to combine them into 4 or 5 different "compiled" regular expression parsers, such as:
>>> words = ('hello', 'good\-bye', 'red', 'blue')
>>> pattern = re.compile('(' + '|'.join(words) + ')', re.IGNORECASE)
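For example, something like this is roughly what I have in mind for each combined parser (the word list and the sentence below are just placeholders; I'm using re.escape and \b word boundaries so literal characters like '-' and partial words don't cause surprises):

>>> import re
>>> words = ('hello', 'good-bye', 'red', 'blue', 'green', 'yellow')
>>> # escape each word so punctuation is treated literally, and wrap the
>>> # alternation in word boundaries so 'red' won't match inside 'bored'
>>> pattern = re.compile(r'\b(' + '|'.join(re.escape(w) for w in words) + r')\b', re.IGNORECASE)
>>> pattern.findall("Hello there, I said good-bye to the red balloon")
['Hello', 'good-bye', 'red']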
How many words can I safely put in one of these, and will it matter? Right now, if I run a cycle of a thousand random sentences, it processes maybe 10 per second; I'm trying to dramatically increase that speed so it can handle 500 per second (if possible).
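Roughly, the measurement I mean looks like this (the word list and sentence here are just stand-ins for the real data):

>>> import re, timeit
>>> words = ('hello', 'good-bye', 'red', 'blue')
>>> pattern = re.compile('(' + '|'.join(re.escape(w) for w in words) + ')', re.IGNORECASE)
>>> sentence = "I said hello and good-bye to the red and blue team"
>>> # time 1000 findall calls against the precompiled pattern
>>> seconds = timeit.timeit(lambda: pattern.findall(sentence), number=1000)
>>> print(1000 / seconds)  # sentences per second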
Also, is such a list possible?
>>> words = (r'\d{4,4}\.\d{2,2}\.\d{2,2}', r'\d{2,2}\s\d{2,2}\s\d{4,4}\.')
>>> pattern = re.compile('(' + '|'.join(words) + ')', re.IGNORECASE)
>>> print pattern.findall("Today is 2010 11 08")
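For instance, with a made-up string that actually fits those two formats, I'd expect something like:

>>> import re
>>> words = (r'\d{4,4}\.\d{2,2}\.\d{2,2}', r'\d{2,2}\s\d{2,2}\s\d{4,4}\.')
>>> pattern = re.compile('(' + '|'.join(words) + ')', re.IGNORECASE)
>>> # each alternative is tried left to right at every position in the string
>>> pattern.findall("Today is 08 11 2010. It was 2010.11.07 yesterday")
['08 11 2010.', '2010.11.07']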