I am trying to get this regular expression to work. It is designed to find two words near each other in a sentence, with at most six other words between them:
echo (int)preg_match('/\bHello\W+(?:\w+\W+){0,6}?World\b/ui', 'Hello, world!', $matches).PHP_EOL;
print_r($matches);
And it works great:
1
Array
(
[0] => Hello, world
)
... but only with Latin words. As soon as I switch to Unicode text, it finds nothing, for example with a string such as "Privit, svitu!" (in Ukrainian). The syntax itself should not be the issue, because the pattern comes from a book (chapter 8, "Find two words next to each other").
I have already checked most of the likely causes:
✓ I use the 'u' flag in the regular expression pattern.
✓ I enable UTF-8 support in the code before executing this statement, as follows:
ini_set('default_charset', 'UTF-8');
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');
✓ PCRE on Debian Linux is compiled with UTF-8 and Unicode properties support:
PCRE version 8.02 2010-03-19
Compiled with
UTF-8 support
Unicode properties support
Newline sequence is LF
\R matches all Unicode newlines
Internal link size = 2
POSIX malloc threshold = 10
Default match limit = 10000000
Default recursion depth limit = 10000000
Match recursion uses stack
✓ I also tried adding the (*UTF8) prefix to the pattern:
echo (int)preg_match('/(*UTF8)\bі\W+(?:\w+\W+){0,6}?і\b/ui', 'і, і!', $matches).PHP_EOL;
print_r($matches);
Result:
0
Array
(
)
So is the problem with Unicode support itself? Apparently not, because a plain match works:
echo (int)preg_match('/і/ui', 'і, і!', $matches).PHP_EOL;
print_r($matches);
Result:
1
Array
(
[0] => і
)