How to define regular expression campaigns for non-Latin words?

The following regular expression works as an advertising query pattern in Google Custom Search. It matches any search query that starts with the word "How", followed by no more than 25 characters:

How\b.{0,25}\b 

However, this does not work when I use a non-Latin (UTF-8) word, for example the Arabic "كيف":

 كيف\b.{0,25}\b 

Does anyone know a solution?

1 answer

I'm not very familiar with Google Custom Search, but it looks like this is a JavaScript regular expression, right?

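If so, the problem is that \b matches the boundary between a word character and a non-word character, where a word character is an ASCII letter, digit, or underscore ( [A-Za-z0-9_] ). Arabic letters therefore count as non-word characters, so \b does not match between the Arabic word and the space that follows it.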
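To see this behavior directly, here is a small sketch that can be run in Node.js or a browser console. It assumes the sample word is the Arabic كيف (written with \u escapes so the code survives encoding problems); the test strings are made up for illustration.

```javascript
// The Arabic word from the question, U+0643 U+064A U+0641 (assumed sample word).
const word = "\u0643\u064A\u0641";

// Same pattern shape as in the question, once with "How" and once with the Arabic word.
const latin = /How\b.{0,25}\b/;
const arabic = new RegExp(word + "\\b.{0,25}\\b");

// "w" is a word character and " " is not, so \b matches after "How".
const latinMatches = latin.test("How to cook rice");

// The last Arabic letter and the space are both non-word characters: no boundary.
const arabicMatches = arabic.test(word + " \u062D\u0627\u0644\u0643");

// Oddly, \b does match when an ASCII letter directly follows the Arabic word.
const arabicThenAscii = arabic.test(word + "abc");

console.log(latinMatches, arabicMatches, arabicThenAscii);
```

The last case shows that \b is not merely useless here: for non-ASCII words it reports a boundary in exactly the places where the word continues with ASCII text.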

There is no general solution based on \b: JavaScript's \b and \w treat only ASCII characters as word characters, so they cannot tell which non-ASCII characters are letters and which are not. But you can write something like this:

 /^كيف(?:\s.{0,24})?$/ 

to match any query that is either just the word كيف, or the word كيف followed by a whitespace character and up to 24 more characters. I think this should come close to meeting your requirements.
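A quick way to check this pattern is to run it against a few queries. The sketch below assumes the word is the Arabic كيف (written with \u escapes); the helper name isHowQuery and the sample queries are hypothetical, chosen only for illustration.

```javascript
// The Arabic word from the question, U+0643 U+064A U+0641 (assumed sample word).
const word = "\u0643\u064A\u0641";

// The word alone, or the word, one whitespace character, and up to 24 more characters.
const pattern = new RegExp("^" + word + "(?:\\s.{0,24})?$");

// Hypothetical helper: true if the query starts with the word as a separate token.
function isHowQuery(query) {
  return pattern.test(query);
}

const results = {
  wordOnly: isHowQuery(word),                               // just the word
  wordAndMore: isHowQuery(word + " \u062D\u0627\u0644\u0643"), // word, space, more text
  glued: isHowQuery(word + "\u0645\u0627"),                 // letters attached to the word
  tooLong: isHowQuery(word + " " + "x".repeat(25)),         // 25 characters after the space
  maxLength: isHowQuery(word + " " + "x".repeat(24)),       // exactly 24 after the space
};

console.log(results);
```

Requiring an explicit whitespace character after the word stands in for the \b that does not work here, at the cost of not accepting punctuation immediately after the word.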


Source: https://habr.com/ru/post/1440590/