The most efficient way to find the first match in a batch of regular expressions with the pcre library

There are several ways to set up the pcre library, and for some time I have been thinking about the most efficient way to match against a list of regular expressions.

In my use case I only care whether there is a match: I want to stop after the first match, and I don't care about the content of the match. I understand that I can use match_limit.

Efficiency is important.

The regular expressions come as a list, and I could combine all or some of them with "or" into a single regular expression, e.g. (?:(?:regex1)|(?:regex2)). That may be problematic, though: I might have to compile a particularly long string, and memory usage is duplicated while I build that string, which seems unpleasant. I'm also not sure how it would perform compared with matching against a list of compiled pcre patterns, so I think I have to measure both.
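To make the "or" combination concrete, here is a minimal sketch. It uses Python's built-in re module as a stand-in for pcre, so it only illustrates how the combined pattern is built. The helper names (combine_patterns, matches_any) and the sample patterns are made up for illustration.

```python
import re

def combine_patterns(patterns):
    """Join patterns into one alternation, each wrapped in its own
    non-capturing group so the alternatives cannot bleed into each other."""
    return "(?:" + "|".join(f"(?:{p})" for p in patterns) + ")"

# Hypothetical sample patterns, only for demonstration.
patterns = [r"\d{3}-\d{4}", r"[a-z]+@[a-z]+\.com", r"^ERROR:"]
combined = re.compile(combine_patterns(patterns))

def matches_any(text):
    # search() returns as soon as one alternative matches;
    # only a yes/no answer is needed, not the matched content.
    return combined.search(text) is not None
```

One caveat with this approach: capture groups inside the individual patterns are renumbered in the combined pattern, so a numbered backreference such as \1 in a later pattern would point at the wrong group.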

I will write some benchmarks and try out the following approaches as best I can:

  • Combine the regular expressions into one large regular expression.
  • Compile the regular expressions separately and match the list sequentially.
  • Compile the regular expressions separately and match the list in parallel worker threads.
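The three approaches above could be compared with a harness along these lines. This is only a sketch under assumptions: it uses Python's re, timeit, and ThreadPoolExecutor rather than pcre itself, and the patterns and function names are hypothetical. Python threads do not run regex matching truly in parallel, so the numbers say nothing about pcre; the point is the shape of the benchmark.

```python
import re
import timeit
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample patterns, only for demonstration.
PATTERNS = [r"\d{3}-\d{4}", r"[a-z]+@[a-z]+\.com", r"^ERROR:"]
COMPILED = [re.compile(p) for p in PATTERNS]
COMBINED = re.compile("|".join(f"(?:{p})" for p in PATTERNS))

def match_combined(text):
    # One pass with the single large alternation.
    return COMBINED.search(text) is not None

def match_sequential(text):
    # any() short-circuits: patterns after the first hit are never tried.
    return any(rx.search(text) for rx in COMPILED)

def match_parallel(text, pool):
    # Every pattern is submitted to the pool up front; any() stops reading
    # results at the first hit, but tasks already submitted still run.
    return any(pool.map(lambda rx: rx.search(text) is not None, COMPILED))

def benchmark(text, number=1000):
    """Return the total seconds each approach takes for `number` calls."""
    with ThreadPoolExecutor(max_workers=len(COMPILED)) as pool:
        return {
            "combined": timeit.timeit(lambda: match_combined(text), number=number),
            "sequential": timeit.timeit(lambda: match_sequential(text), number=number),
            "parallel": timeit.timeit(lambda: match_parallel(text, pool), number=number),
        }
```

Calling benchmark("user@example.com") returns a dict of timings keyed by approach. Before drawing conclusions, it is worth running it on both matching and non-matching inputs, since a non-match forces every pattern to be tried.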

Source: https://habr.com/ru/post/1623271/

