A correct PHP regular expression does not work with Unicode in PHP 5.3.3-7

I am trying to get this regular expression to work. It is designed to find two words near each other in a sentence.

echo (int)preg_match('/\bHello\W+(?:\w+\W+){0,6}?World\b/ui', 'Hello, world!', $matches).PHP_EOL;
print_r($matches);

And it works great:

1
Array
(
    [0] => Hello, world
)

... but only with Latin words. On Unicode strings such as "Privit, svitu!" (Ukrainian for "Hello, world!") it finds nothing. The syntax itself should not be the issue, since the pattern comes from a book (chapter 8, "Find two words next to each other").

I have checked almost every possible cause:

✓ I use the 'u' flag in the regular expression pattern.

✓ I enable UTF-8 support in code before running this statement:

 ini_set('default_charset', 'UTF-8');
 mb_internal_encoding('UTF-8');
 mb_regex_encoding('UTF-8');

✓ PCRE on Debian Linux is compiled with UTF-8 and Unicode property support:

 # pcretest -C
 PCRE version 8.02 2010-03-19
 Compiled with
   UTF-8 support
   Unicode properties support
   Newline sequence is LF
   \R matches all Unicode newlines
   Internal link size = 2
   POSIX malloc threshold = 10
   Default match limit = 10000000
   Default recursion depth limit = 10000000
   Match recursion uses stack
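One caveat worth checking: pcretest reports on the system libpcre, but PHP may be built against a different (for example, bundled) PCRE. The sketch below asks PHP itself which PCRE it uses and whether `\w` matches a Cyrillic letter in `u` mode. `PCRE_VERSION` and `PHP_VERSION` are built-in PHP constants; the variable name `$unicodeWord` is just for illustration.

```php
<?php
// pcretest describes the system library; PCRE_VERSION is the PCRE that PHP
// actually uses, which is not necessarily the same one.
echo 'PHP:  ', PHP_VERSION, PHP_EOL;
echo 'PCRE: ', PCRE_VERSION, PHP_EOL;

// The real question: does \w match a Cyrillic letter when /u is set?
// (The source file must be saved as UTF-8.)
$unicodeWord = preg_match('/^\w$/u', 'і') === 1;
echo '\w is Unicode-aware: ', $unicodeWord ? 'yes' : 'no', PHP_EOL;
```

If this prints "no", the `\w`, `\W` and `\b` in the pattern cannot match Cyrillic words, regardless of the `default_charset` and `mb_*` settings above, which do not affect `preg_*` functions.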

✓ I tried the (*UTF8) verb with the Ukrainian phrase:

echo (int)preg_match('/(*UTF8)\bПривіт\W+(?:\w+\W+){0,6}?світу\b/ui', 'Привіт, світу!', $matches).PHP_EOL;
print_r($matches);

Output:

0
Array
(
)

At the same time, Unicode matching as such works. For example:

echo (int)preg_match('/і/ui', 'Привіт, світу!', $matches).PHP_EOL;
print_r($matches);

Output:

1
Array
(
    [0] => і
)

So the problem appears to be somewhere in the regex itself.
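One way to narrow it down is to test each piece of the pattern on its own against the Ukrainian phrase. This is a hypothetical diagnostic sketch, not from the original post; the probe names are arbitrary. On a PHP/PCRE combination where `\w` is ASCII-only, the literal probe should still match while the `\w` and `\b` probes return 0.

```php
<?php
// Test each escape separately. array() syntax keeps this runnable on PHP 5.3.
// The source file must be saved as UTF-8.
$subject = 'Привіт, світу!';
$probes = array(
    'literal' => '/Привіт/u',      // plain UTF-8 literal, no \w or \b
    '\w+'     => '/^\w+/u',        // word characters at the start of the string
    '\b'      => '/\bПривіт\b/u',  // word boundaries around a Cyrillic word
);

$results = array();
foreach ($probes as $name => $pattern) {
    $results[$name] = preg_match($pattern, $subject);
    echo str_pad($name, 8), ' => ', $results[$name], PHP_EOL;
}
```

If only the literal probe matches, the `u` modifier is handling UTF-8 correctly and the failure comes from `\w`/`\W`/`\b` treating Cyrillic letters as non-word characters.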

I searched Stackoverflow but did not find an answer.


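Answer: this looks like a problem with UTF-8 support in older PHP versions. The pattern fails on early 5.3 releases, as this comparison shows: http://3v4l.org/7HurJ. Starting with 5.3.4 it works, so upgrade if you can; otherwise you will need a workaround. The reason is that in PCRE the u modifier on its own only switches on UTF-8 mode, while \w, \W and \b keep matching ASCII characters only. From PHP 5.3.4 on, u also sets PCRE's PCRE_UCP option (available in PCRE 8.10 and later), which makes these escapes Unicode-aware. Note that the PCRE 8.02 reported by pcretest above predates 8.10, so if PHP is linked against that library, upgrading PHP alone may not be enough.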
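A possible workaround, sketched here as an assumption rather than taken from the answer: replace `\w`, `\W` and `\b` with Unicode property classes, which PCRE supports when built with Unicode properties (as the pcretest output shows). `[\p{L}\p{N}_]` mirrors PCRE's documented UCP meaning of `\w` (letters, numbers, underscore), and the lookarounds stand in for `\b` at the start and end of the phrase. The variable names are illustrative.

```php
<?php
// Unicode-property replacements for the ASCII-only escapes (no PCRE_UCP needed).
$word    = '[\p{L}\p{N}_]';          // instead of \w
$nonWord = '[^\p{L}\p{N}_]';         // instead of \W
$start   = '(?<![\p{L}\p{N}_])';     // instead of \b before the first word
$end     = '(?![\p{L}\p{N}_])';      // instead of \b after the last word

// Same structure as the book's pattern: first word, then up to 6 words, then the second word.
$pattern = '/' . $start . 'Привіт' . $nonWord . '+(?:' . $word . '+' . $nonWord . '+){0,6}?'
         . 'світу' . $end . '/ui';

$found = preg_match($pattern, 'Привіт, світу!', $matches);
echo $found, PHP_EOL;   // 1
print_r($matches);      // [0] => Привіт, світу
```

The lookarounds are not a full replacement for `\b` in every position, but next to a letter, as here, they behave the same way.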


Source: https://habr.com/ru/post/1544567/

