Why does \b not work correctly for some languages?

Here is my code: (It works correctly for English)

$str1 = "itt is a testt";
$str2 = "it is a testt";
$str3 = "itt is a test";
$str4 = "it is a test";

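echo preg_match("[\b(?:it|test)\b]", $str1) ? 1 : 2; // output: 2 (does not match)
echo preg_match("[\b(?:it|test)\b]", $str2) ? 1 : 2; // output: 1 (it matches)
echo preg_match("[\b(?:it|test)\b]", $str3) ? 1 : 2; // output: 1 (it matches)
echo preg_match("[\b(?:it|test)\b]", $str4) ? 1 : 2; // output: 1 (it matches)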

But for some reason the same regex does not work correctly for Persian text (it always returns 1):

$str1 = "دیوار";
$str2 = "دیوارر";

echo preg_match("/[\b(?:دیوار|خوب)\b]/u", $str1) ? 1 : 2; // output: 1
echo preg_match("/[\b(?:دیوار|خوب)\b]/u", $str2) ? 1 : 2; // output: 1 (it should be 2)

How can I fix this?

2 answers

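You put your whole regular expression inside a character class in "/[\b(?:دیوار|خوب)\b]/u". In your English example the [ and ] were the pattern delimiters, but here / is the delimiter, so [...] is a character class that matches any single listed character. Remove the [] from it: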

"/\b(?:دیوار|خوب)\b/u"
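To see the difference between brackets used as delimiters and brackets used as a character class, here is a minimal sketch using the English strings from the question. The helper name matches() is made up for illustration.

```php
<?php
// Same pattern text, two different delimiter choices.
$asDelimiters = "[\b(?:it|test)\b]";   // [ and ] are the delimiters; the regex is \b(?:it|test)\b
$asCharClass  = "/[\b(?:it|test)\b]/"; // / is the delimiter; [ ] is a character class

// Hypothetical helper: returns 1 on a match and 2 otherwise, like the question's ternary.
function matches(string $pattern, string $subject): int
{
    return preg_match($pattern, $subject) === 1 ? 1 : 2;
}

echo matches($asDelimiters, "itt is a testt"), "\n"; // 2: no whole word "it" or "test"
echo matches($asCharClass, "itt is a testt"), "\n";  // 1: the single character "i" is in the class
```

With the character class, any subject containing one of the listed characters matches, which is why the Persian version always returned 1.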

You can also replace \b with an explicit alternative:

"/(?:^|\s)(?:دیوار|خوب)(?:\s|$)/u"
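A quick sketch of how this whitespace-based pattern behaves, assuming the source file is saved as UTF-8. The function name matchesWholeWord() is hypothetical.

```php
<?php
// Hypothetical helper: 1 if one of the words is surrounded by whitespace
// or by the start/end of the string, 2 otherwise.
function matchesWholeWord(string $subject): int
{
    $pattern = '/(?:^|\s)(?:دیوار|خوب)(?:\s|$)/u';
    return preg_match($pattern, $subject) === 1 ? 1 : 2;
}

echo matchesWholeWord("دیوار"), "\n";             // 1: the word is the whole string
echo matchesWholeWord("دیوارر"), "\n";            // 2: an extra letter follows the word
echo matchesWholeWord("این دیوار خوب است"), "\n"; // 1: the word is surrounded by spaces
echo matchesWholeWord("دیوار."), "\n";            // 2: punctuation is not whitespace
```

Note that this variant treats only whitespace as a boundary, so a word followed directly by punctuation is not matched.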

You can also change \s to a negated character class that lists the Arabic letters. I do not know all of them, but something like: [^دیوارخوب]...
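Instead of listing the letters by hand, one option is the Unicode script property \p{Arabic}, which PCRE supports with the /u modifier. This is my substitution, not something the answer itself uses, and it is only a sketch: combining marks that Unicode assigns to the Inherited script are not covered. The function name matchesArabicWord() is hypothetical.

```php
<?php
// Hypothetical helper: a "boundary" here is any character that is not in the
// Arabic script, or the start/end of the string.
function matchesArabicWord(string $subject): int
{
    $pattern = '/(?:^|[^\p{Arabic}])(?:دیوار|خوب)(?:[^\p{Arabic}]|$)/u';
    return preg_match($pattern, $subject) === 1 ? 1 : 2;
}

echo matchesArabicWord("دیوار."), "\n"; // 1: "." is not an Arabic-script character
echo matchesArabicWord("(خوب)"), "\n";  // 1: parentheses act as boundaries
echo matchesArabicWord("دیوارر"), "\n"; // 2: the trailing letter is Arabic script
```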


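Inside a character class, \b is a backspace, not a word boundary.

Fix: remove the brackets, and either use single quotes or double the backslash before b.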

  • '/\b(?:دیوار|خوب)\b/u' ...
  • "/\\b(?:دیوار|خوب)\\b/u"

IDEONE:

echo preg_match('/\b(?:دیوار|خوب)\b/u', $str1) ? 1 : 2; // output: 1
echo preg_match('/\b(?:دیوار|خوب)\b/u', $str2) ? 1 : 2; // output: 2 (no match, as intended)
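For completeness, a small self-contained check of the two quoting styles. As far as I know, PHP does not treat \b as an escape sequence inside double quotes, so all three literals below hold the same pattern. The match results assume a PCRE build with Unicode support, where the /u modifier makes \b Unicode-aware.

```php
<?php
$single  = '/\b(?:دیوار|خوب)\b/u';
$double  = "/\b(?:دیوار|خوب)\b/u";   // "\b" stays a backslash followed by b
$escaped = "/\\b(?:دیوار|خوب)\\b/u"; // "\\" collapses to a single backslash

// All three strings are byte-for-byte identical.
var_dump($single === $double, $double === $escaped);

$str1 = "دیوار";
$str2 = "دیوارر";

echo preg_match($single, $str1) ? 1 : 2, "\n"; // 1: whole word
echo preg_match($single, $str2) ? 1 : 2, "\n"; // 2: extra trailing letter, no word boundary
```

Single quotes are the safer habit for regex literals, since sequences such as \n or \t would be interpreted by PHP inside double quotes before the regex engine sees them.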

Source: https://habr.com/ru/post/1615571/

