preg_replace does not work for some words / characters

$str = 'کس نے موسیٰ کے بارے میں سنا ہے؟';
$str = preg_replace('/(?<=\b)موسیٰ(?=\b)/u', 'Musa', $str);
$str = preg_replace('/(?<=\b)سنا(?=\b)/u', 'suna', $str);
echo $str;

This does not replace موسیٰ. It should give کس نے Musa کے بارے میں suna ہے؟, but instead gives کس نے موسیٰ کے بارے میں suna ہے؟.

This happens for all words that end in ٰ, for example تعالیٰ. It works for words where ٰ is in the middle of the word (words never start with ٰ). Does this mean that \b just doesn't work with ٰ? Is this a bug?

+4
3 answers

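The reason is that a word boundary matches at the following positions:

  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not.

The character "ٰ" is U+0670 ARABIC LETTER SUPERSCRIPT ALEF. It belongs to \p{Mn} (the Unicode nonspacing mark category), so it is not a word character. After it, \b matches only if the next character is a word char, that is, one matched by \w (a letter, digit, or _).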

To fix this, use lookarounds instead of \b:

$str = 'کس نے موسیٰ کے بارے میں سنا ہے؟';
$str = preg_replace('/(?<!\w)موسیٰ(?!\w)/u', 'Musa', $str);
$str = preg_replace('/(?<!\w)سنا(?!\w)/u', 'suna', $str);
echo $str; // => کس نے Musa کے بارے میں suna ہے؟


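(?<!\w) is a negative lookbehind that fails the match if there is a word char immediately to the left of the current position, and (?!\w) is a negative lookahead that fails the match if there is a word char immediately to the right of it.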
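
The same lookarounds can be wrapped in a small helper when several words need replacing. The following is only a sketch: `replace_whole_words` is a hypothetical name that does not appear in the answer, and `preg_quote()` is added on the assumption that the words may come from user input.

```php
<?php
// Hypothetical helper (not from the original answer): replaces whole words
// using the (?<!\w) ... (?!\w) lookarounds instead of \b.
function replace_whole_words(string $text, array $map): string
{
    foreach ($map as $word => $replacement) {
        // preg_quote() escapes any regex metacharacters inside the word.
        $pattern = '/(?<!\w)' . preg_quote($word, '/') . '(?!\w)/u';
        $text = preg_replace($pattern, $replacement, $text);
    }
    return $text;
}

$musa = 'موسیٰ';
$suna = 'سنا';
$text = "کس نے {$musa} کے بارے میں {$suna} ہے؟";

$result = replace_whole_words($text, [$musa => 'Musa', $suna => 'suna']);
echo $result, PHP_EOL;
```

Because the lookahead rejects a following word character, a longer word that merely starts with سنا is left untouched.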

+1

Try this:

<?php
$str = 'کس نے موسی کے بارے میں سنا ہے?';
$patterns = ['/موسی/u', '/سنا/'];
$replacements = ['Musa', 'suna'];
echo preg_replace($patterns, $replacements, $str);
?>

Or, perhaps, with lookarounds for a space or the start/end of the string?

$str = 'کس نے موسیٰ کے بارے میں سنا ہے؟';
$patterns[]='/(?<= |^)موسیٰ(?= |$)/u';
$patterns[]='/\bسنا\b/u';
// or \s perhaps instead of blank space
$replacements=['Musa','suna'];
echo preg_replace($patterns,$replacements,$str);

Output:

کس نے Musa کے بارے میں suna ہے؟
+1

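As already mentioned, the problem is this:

\b matches at a position between a \w character and a non-\w character.

By default \w matches ASCII characters only; with (*UCP) or the u modifier, \w also matches Unicode letters and digits.

So \b does not match after ٰ, because it is a combining mark rather than a letter or digit, and therefore not a word character.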
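
The category claim can be checked directly with Unicode property escapes. This is a minimal sketch; the variable names are illustrative, and U+06CC (ARABIC LETTER FARSI YEH) is used only as an assumed example of an ordinary letter for comparison.

```php
<?php
$mark   = "\u{0670}"; // ARABIC LETTER SUPERSCRIPT ALEF
$letter = "\u{06CC}"; // ARABIC LETTER FARSI YEH, used here only for comparison

// \p{Mn} = nonspacing mark, \p{L} = any letter.
$markIsMn     = preg_match('/^\p{Mn}$/u', $mark);
$markIsLetter = preg_match('/^\p{L}$/u', $mark);
$letterIsL    = preg_match('/^\p{L}$/u', $letter);

printf("U+0670 is a nonspacing mark: %d\n", $markIsMn);     // 1
printf("U+0670 is a letter:          %d\n", $markIsLetter); // 0
printf("U+06CC is a letter:          %d\n", $letterIsL);    // 1
```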

What you are actually trying to do is check whether any non-whitespace character precedes or follows the word موسیٰ, so negated \S assertions do the job:

(?<!\S)موسیٰ(?!\S)
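
The whitespace-based assertions are stricter than the \w-based lookarounds: punctuation directly next to the word blocks the match. A small sketch of the difference follows, using an assumed sample with an Arabic comma (U+060C) after the word; the variable names are illustrative.

```php
<?php
$word   = 'موسیٰ';
$quoted = preg_quote($word, '/');

$whitespaceBound = '/(?<!\S)' . $quoted . '(?!\S)/u'; // needs whitespace or string edge
$wordCharBound   = '/(?<!\w)' . $quoted . '(?!\w)/u'; // needs a non-word char or string edge

$spaced     = "کس نے {$word} کے";          // word surrounded by spaces
$punctuated = "{$word}\u{060C} کس نے";     // word followed by an Arabic comma

$a = preg_match($whitespaceBound, $spaced);     // 1
$b = preg_match($whitespaceBound, $punctuated); // 0: the comma is non-whitespace
$c = preg_match($wordCharBound, $punctuated);   // 1: the comma is not a word char

printf("%d %d %d\n", $a, $b, $c);
```

Which form is preferable depends on whether a word next to punctuation should still count as a match.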

Another way to solve this problem is to transliterate the entire input string with the ICU library to remove all nonspacing marks, and then match the word موسی, which no longer includes the combining mark ٰ:

<?php

$strings = [
    'is' => 'کس نے موسیٰ کے بارے میں سنا ہے؟', // input string
    'wts' => 'موسیٰ' // word to search
];

array_walk($strings, function(&$value) {
    $value = transliterator_transliterate('[:Nonspacing Mark:] Remove;', $value);
});

// word boundaries now can be used
echo preg_replace('/\b' . $strings['wts'] . '\b/u', 'musa', $strings['is']);

Outputs:

کس نے musa کے بارے میں سنا ہے؟
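
If the intl extension is not available, the same idea can be approximated with PCRE alone. This is a swapped-in technique rather than the ICU transliterator used above: the sketch assumes that deleting every \p{Mn} character is acceptable, and `strip_marks` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: removes all nonspacing marks using PCRE only.
function strip_marks(string $s): string
{
    return preg_replace('/\p{Mn}+/u', '', $s);
}

$word = 'موسیٰ';
$text = "کس نے {$word} کے بارے میں سنا ہے؟";

// After stripping, the word ends in a letter, so \b can be used again.
$pattern = '/\b' . preg_quote(strip_marks($word), '/') . '\b/u';
$result  = preg_replace($pattern, 'musa', strip_marks($text));

echo $result, PHP_EOL;
```

As with the transliterator version, the marks are removed from the whole output string, not only from the matched word.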
0

Source: https://habr.com/ru/post/1676970/

