Large regex patterns: PCRE won't do it

I have a long list of words that I want to search for in a large string. There are about 500 words, and the string is usually around 500K in size.

PCRE throws an error: preg_match_all: Compilation failed: regular expression is too large at offset 704416

Is there an alternative to this? I know that I can recompile PCRE with a larger internal linkage size, but I want to avoid messing with server packages.

0
4 answers

Could you approach the problem from a different direction?

  • Use a regex to clean up your 500K of HTML and pull all the words into one big array. Something like \b(\w+)\b (sorry, I haven't tested this).

  • - 500 , . , , ( ) . - ( ) .

  • (1), -.

  • -.
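A minimal sketch of the word-extraction idea from the first step above. It assumes the input is HTML and that matching should be case-insensitive; both are assumptions, and the sample $html and $keywords values are made up for illustration. It uses strip_tags instead of a regex to drop the markup, then checks the extracted words against the keyword list with array functions, so no large pattern is ever compiled.

```php
<?php
// Hypothetical sample data; in practice $html is the ~500K document
// and $keywords is the list of ~500 words.
$html = '<p>The quick brown fox jumps over the lazy dog. The fox sleeps.</p>';
$keywords = ['fox', 'dog', 'cat'];

// Drop the markup, then pull every word into one big array.
$text = strip_tags($html);
preg_match_all('/\b\w+\b/', $text, $matches);
$words = array_map('strtolower', $matches[0]);

// Count each distinct word, then keep only the entries that are keywords.
$counts = array_count_values($words);
$found = array_intersect_key($counts, array_flip($keywords));

print_r($found);
```

The pattern here is tiny regardless of how many keywords there are, so the PCRE size limit should not come into play.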

+2

, , , , ?

+3

re2.

, , , .

0

You can use str_word_count, or explode the string on a space (or whatever other delimiter makes sense for the context of your document), and then filter the results against your keywords.

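// Split the document into an array of words (format 1 returns the words themselves).
$allWordsArray = str_word_count($content, 1);
// Keep only the words that appear in the keyword list.
$matchedWords = array_filter($allWordsArray, function ($word) use ($keywordsArray) {
    return in_array($word, $keywordsArray, true);
});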

This assumes PHP 5.3+ because of the closure, but it can be replaced with create_function in earlier versions of PHP.
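With around 500 keywords, in_array scans the keyword list for every word in the document. A possible variation, sketched here with made-up sample data, flips the keyword list into array keys once so that each check becomes an isset lookup. The variable names $keywordLookup and $uniqueMatches are just illustrative.

```php
<?php
// Hypothetical sample data standing in for the real document and keyword list.
$content = 'apple banana cherry apple durian';
$keywordsArray = ['apple', 'durian', 'mango'];

// Flip the keyword list once so each keyword becomes an array key.
$keywordLookup = array_flip($keywordsArray);

$allWordsArray = str_word_count($content, 1);
$matchedWords = array_filter($allWordsArray, function ($word) use ($keywordLookup) {
    // isset() on a key is a hash lookup rather than a scan of the whole list.
    return isset($keywordLookup[$word]);
});

// array_filter keeps the original indexes, so deduplicate and reindex.
$uniqueMatches = array_values(array_unique($matchedWords));
print_r($uniqueMatches);
```

Note that this comparison is case-sensitive; lowercasing both the words and the keywords first would be one way to relax that.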

0

Source: https://habr.com/ru/post/1791889/

