Suppose these regular expressions really are regular. Then each of them can be transformed into a nondeterministic finite state machine (NFA), which can be converted into a deterministic finite state machine (DFA), which can be evaluated in O(n) time in the length of the input.
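To make that last step concrete, here is a minimal sketch (the automaton and all names are made up for illustration) of a hand-built DFA for the regular expression ab*c, evaluated in a single O(n) pass over the input:

# Hand-built DFA for ab*c: state -> {symbol: next state}.
DFA = {
    0: {"a": 1},
    1: {"b": 1, "c": 2},
    2: {},
}
ACCEPTING = {2}

def dfa_match(s):
    state = 0
    for ch in s:
        state = DFA[state].get(ch)  # a missing transition acts as a trap state
        if state is None:
            return False
    return state in ACCEPTING

assert dfa_match("abbbc")
assert not dfa_match("abcb")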
The same applies to the problem of matching multiple regular expressions at once. Combine them into one regex of the form (regexp1|regexp2|...) and turn that into a single NFA/DFA. Add some bookkeeping to the machine to keep track of which particular regular expression produced the matching path, and you have your matcher, still O(n) in the length of the input string.
This method will not support any "regular expression" features that make the language non-regular, such as backreferences.
Another disadvantage is that the resulting DFA can be large. It is also possible to evaluate the NFA directly, which is probably slower but has better memory behavior.
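To illustrate that trade-off, here is a sketch (epsilon transitions omitted, automaton made up for illustration) of evaluating an NFA directly: keep the set of currently active states and advance the whole set one input symbol at a time. Memory stays proportional to the NFA itself, but each symbol now costs a set operation instead of a single table lookup:

# Hand-built epsilon-free NFA for (ab)|(ac): (state, symbol) -> set of states.
NFA = {
    (0, "a"): {1, 2},
    (1, "b"): {3},
    (2, "c"): {3},
}
ACCEPTING = {3}

def nfa_match(s):
    current = {0}
    for ch in s:
        current = set().union(*(NFA.get((q, ch), set()) for q in current))
        if not current:
            return False
    return bool(current & ACCEPTING)

assert nfa_match("ab") and nfa_match("ac")
assert not nfa_match("ad")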
In fact, it is quite easy to express this idea in code without building the automaton yourself. Just use capturing groups:
combined_regexp = (regexp1)|(regexp2)|...
When evaluating a match, just check which group actually matched the input.
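Here is a minimal sketch of this with Python's standard re module (the three patterns are made-up examples); the match object's lastindex tells you which group, and therefore which original pattern, matched:

import re

patterns = [r"\d+", r"[a-z]+", r"#\w+"]   # hypothetical example patterns
combined = re.compile("|".join(f"({p})" for p in patterns))

def which_pattern(text):
    m = combined.fullmatch(text)
    return None if m is None else m.lastindex - 1

print(which_pattern("1234"))   # -> 0
print(which_pattern("hello"))  # -> 1
print(which_pattern("#tag"))   # -> 2
print(which_pattern("???"))    # -> None

One caveat: if the individual patterns contain capturing groups of their own, lastindex may point at a nested group; named groups ((?P<r0>...), (?P<r1>...)) together with m.lastgroup are more robust in that case.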
Keep in mind that most regex implementations/libraries have pathological behavior in some cases, where they can take exponential time to compile or match a regex. I am not sure whether that is a problem in practice. Google's RE2 library is one that was specifically designed to avoid this pathological behavior, but there may be others.
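For concreteness, here is the classic pathological case in a backtracking engine, shown with Python's re (a textbook example, not tied to any particular library bug): nested quantifiers plus a forced failure at the end. Each additional 'a' roughly doubles the running time; exact timings depend on the machine:

import re, time

pattern = re.compile(r"(a+)+b")          # textbook catastrophic pattern
for n in (18, 20, 22):
    start = time.perf_counter()
    pattern.match("a" * n + "c")         # can never match: there is no 'b'
    print(n, round(time.perf_counter() - start, 3), "s")

A guaranteed-linear engine such as RE2 handles this in O(n), since it never backtracks.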
Another caveat: if your regular expression implementation does not specifically advertise O(n) behavior, it may simply try each alternative in turn. In that case, this approach will not buy you anything.
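In that situation the combined pattern effectively degenerates to the explicit loop sketched below (same hypothetical patterns as above), and you might as well write it that way; only a DFA-based engine genuinely checks all alternatives in a single pass:

import re

patterns = [re.compile(p) for p in (r"\d+", r"[a-z]+", r"#\w+")]

def which_pattern_in_turn(text):
    # What a backtracking engine effectively does with (p1)|(p2)|...:
    # try each alternative in order until one matches.
    for i, p in enumerate(patterns):
        if p.fullmatch(text):
            return i
    return None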