Replace / replaceAll with regex on unicode issues

Is there a way to apply the replace method to Unicode text in general (Arabic is the concern here)? In the example below, replacing the whole word works fine in the English text, but the Arabic word is not detected and therefore not replaced. I added u as a flag to enable Unicode parsing, but that didn't help. In the Arabic example below, the word النجوم should be replaced, but not والنجوم, yet no replacement happens at all.

<!DOCTYPE html>
<html>
<body>
<p>Click to replace...</p>
<button onclick="myFunction()">replace</button>
<p id="demo"></p>
<script>
function myFunction() {
  var str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
  var rep = 'النجوم';
  var repWith = 'الليل';

  //var str = "the sun and the stars, then the starsz and the day";
  //var rep = 'stars';
  //var repWith = 'night';

  var result = str.replace(new RegExp("\\b"+rep+"\\b", "ug"), repWith);
  document.getElementById("demo").innerHTML = result;
}
</script>
</body>
</html>

Whatever solution you propose, please keep it variable-based, as in the code above (the variable rep), because the words to replace are passed in through function calls.


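A \bword\b pattern can be written as (^|[^A-Za-z0-9_])word(?![A-Za-z0-9_]). When replacing, prepend $1 to the replacement so that the character captured before the word is kept.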
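To see why the $1 is needed, here is a minimal ASCII-only sketch of that rewritten pattern. It reuses the commented-out English sample values from the question and assumes rep contains no regex metacharacters.

```javascript
// ASCII-only whole-word replace without \b; the sample values come from the question.
var str = "the sun and the stars, then the starsz and the day";
var rep = "stars";
var repWith = "night";

// Group 1 captures the start of the string or one non-word character before the word.
// The negative lookahead rejects a match that is followed by a word character.
var regex = new RegExp("(^|[^A-Za-z0-9_])" + rep + "(?![A-Za-z0-9_])", "g");

// "$1" puts the captured leading character back in front of the replacement.
var result = str.replace(regex, "$1" + repWith);
console.log(result);
```

Without "$1", the space captured before stars would be swallowed by the replacement.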

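To account for Unicode, it makes sense to use the XRegExp library, which supports the "base" \pL class that matches any Unicode letter. Replace A-Za-z in the pattern above with \pL: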

var str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
var rep = 'النجوم';
var repWith = 'الليل';

var regex = new XRegExp('(^|[^\\pL0-9_])' + rep + '(?![\\pL0-9_])');
var result = XRegExp.replace(str, regex, '$1' + repWith, 'all');
console.log(result);
<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>

Update by @mohsenmadi: to use this in an Angular app:

  • Run npm install xregexp, which adds the library to package.json
  • Import the functions: import { replace, build } from 'xregexp/xregexp-all.js';
  • Build the regex: let regex = build('(^|[^\\pL0-9_])' + rep + '(?![\\pL0-9_])');
  • Replace: let result = replace(str, regex, '$1' + repWith, 'all');
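If loading XRegExp is not an option, the same idea can probably be expressed with the native RegExp, assuming an engine that supports Unicode property escapes (\p{L} with the u flag, ES2018). This is only a sketch, not part of the answer above: the names escapeRegExp and replaceWholeWord are hypothetical helpers, and adding \p{M} and \p{N} to the word-character class is an assumption made so that combining marks and digits in any script also count as part of a word.

```javascript
// Hypothetical helper: escape regex metacharacters so rep is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replace rep only where it stands alone as a word.
function replaceWholeWord(str, rep, repWith) {
  // Word characters: Unicode letters, combining marks, numbers, underscore (an assumption).
  var wordChars = "\\p{L}\\p{M}\\p{N}_";
  var regex = new RegExp(
    "(^|[^" + wordChars + "])" + escapeRegExp(rep) + "(?![" + wordChars + "])",
    "gu"
  );
  // A replacer function keeps the captured prefix and avoids "$" handling in repWith.
  return str.replace(regex, function (match, before) {
    return before + repWith;
  });
}

var str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
var result = replaceWholeWord(str, "النجوم", "الليل");
console.log(result);
```

Here only the standalone النجوم is replaced; والنجوم stays untouched because it is preceded by a letter.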

If a whitespace boundary is acceptable instead of a word boundary, this regex can be used:

// text is the word to find (rep in the question)
var Rx = new RegExp(
   "(^|[\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u2000-\\u200A\\u2028-\\u2029\\u202F\\u205F\\u3000])"
   + text +
   "(?![^\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u2000-\\u200A\\u2028-\\u2029\\u202F\\u205F\\u3000])"
   ,"ug");

// '$1' restores the whitespace captured before the match
var result = str.replace( Rx, '$1' + repWith );

 (                             # (1 start), simulated whitespace boundary
      ^                             # BOL
   |                              # or whitespace
      [\u0009-\u000D\u0020\u0085\u00A0\u1680\u2000-\u200A\u2028-\u2029\u202F\u205F\u3000] 
 )                             # (1 end)

 text                          # To find

 (?!                           # Whitespace boundary
      [^\u0009-\u000D\u0020\u0085\u00A0\u1680\u2000-\u200A\u2028-\u2029\u202F\u205F\u3000] 
 )

Note that in an engine that supports lookbehind assertions, the regex is simply: (?<!\S)text(?!\S).
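A minimal sketch of that lookbehind form in JavaScript, assuming an engine with lookbehind support (ES2018). The function name replaceBetweenWhitespace is hypothetical, rep is assumed to contain no regex metacharacters, and JavaScript's \s is close to, but not identical to, the explicit code point list above.

```javascript
// Hypothetical helper: replace rep only when whitespace (or a string edge) surrounds it.
function replaceBetweenWhitespace(str, rep, repWith) {
  // (?<!\S) means "not preceded by a non-whitespace character",
  // (?!\S) means "not followed by a non-whitespace character".
  var regex = new RegExp("(?<!\\S)" + rep + "(?!\\S)", "gu");
  // Lookarounds consume nothing, so no "$1" is needed here.
  return str.replace(regex, function () {
    return repWith;
  });
}

var arabic = replaceBetweenWhitespace(
  "الشمس والقمر والنجوم، ثم النجوم والنهار", "النجوم", "الليل");
var english = replaceBetweenWhitespace(
  "the sun and the stars, then the stars and the day", "stars", "night");
console.log(arabic);
console.log(english);
```

Unlike a word boundary, a whitespace boundary does not match a word that is directly followed by punctuation, so the first stars (followed by a comma) is left alone in the English sample.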


Source: https://habr.com/ru/post/1691310/