You already know that your regular expression is the problem on large files, so perhaps you can make it a bit smarter.
For example, you use + to match one or more characters. Say you have a string of 10,000 characters: the engine may have to try up to 10,000 different match lengths for that one group to find the longest match. Now combine it with a similar group. Say you have a line of 20,000 characters and two + groups: how many ways can they divide the line between them? Roughly 10,000 x 10,000 possibilities. And so on.
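The cost described above is easy to reproduce. The sketch below (input string and pattern are illustrative, not from the original answer) uses two greedy + groups that compete for the same run of characters; when the overall match fails, the engine has to try every split point before giving up:

```python
import re

# Two greedy '+' groups over the same character: on a line with no
# final 'b', the engine must try every way of splitting the run of
# 'a's between the two groups at every start position before it can
# report failure -- roughly n * n attempts for an n-character line.
line = "a" * 500
match = re.search(r"a+a+b", line)
print(match)  # None, but only after all that backtracking
```

On a 500-character line this is still quick; on a 20,000-character line the same quadratic behaviour is what makes the original pattern crawl.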
If you can limit the number of characters (which fits something like an email address), say by capping the local part of the address at 256 characters and the domain name at 256 as well, then there are "only" 256 x 256 possibilities to check:
/[a-z0-9_\-\+]{1,256}@[a-z0-9\-]{1,256}\.([a-z]{2,3})(?:\.[a-z]{2})?/i
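As a quick sanity check, here is the bounded pattern applied in Python (the sample string is my own, purely for illustration):

```python
import re

# Bounded quantifiers cap the search space: each part can stretch at
# most 256 characters, so there are at most 256 * 256 combinations to
# try instead of an unbounded number.
pattern = re.compile(
    r"[a-z0-9_\-\+]{1,256}@[a-z0-9\-]{1,256}\.([a-z]{2,3})(?:\.[a-z]{2})?",
    re.IGNORECASE,
)
m = pattern.search("write to user@example.co.uk today")
print(m.group(0))  # user@example.co.uk
```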
This is probably already much faster. Next, making those quantifiers possessive will cut out backtracking in PCRE:
/[a-z0-9_\-\+]{1,256}+@[a-z0-9\-]{1,256}+\.([a-z]{2,3})(?:\.[a-z]{2})?/i
Which should speed it up again.
hakre