Regex to allow non-ASCII and foreign letters?

Is it possible to create a regular expression that allows non-ASCII letters along with the Latin alphabet, for example Chinese or Greek characters (so that A 汉语 AbN 漢語 is allowed)?

Currently, I have the following: ^[\w\d][\w\d_\-\.\s]*$ , which only allows Latin letters.

1 answer

In .NET,

 ^[\p{L}\d_][\p{L}\d_.\s-]*$ 

is equivalent to your regular expression, but additionally allows other Unicode letters.

Explanation:

\p{L} is a shorthand for the Unicode property "Letter".

Caveat: I think you did not want to allow the underscore as the initial character (as evidenced by its presence only in the second character class). Since \w includes the underscore, your regex did allow it, though. You may want to remove it from the first character class in my solution (it is, of course, not included in \p{L} ).
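The underscore caveat is easy to check. The sketch below uses JavaScript purely for illustration and assumes an engine that supports \p{L} under the u flag (ES2018 or later). The names `original` and `noLeadingUnderscore` are hypothetical, and the second pattern is the variant suggested above with _ removed from the first character class.

```javascript
// Pattern from the question: \w already includes "_",
// so an underscore is accepted as the first character.
const original = /^[\w\d][\w\d_\-\.\s]*$/;

// Variant of the answer's pattern with "_" removed from the first class.
// The "u" flag is required for \p{L} in JavaScript (assumption: ES2018+ engine).
const noLeadingUnderscore = /^[\p{L}\d][\p{L}\d_.\s-]*$/u;

console.log(original.test("_abc"));            // true: \w matches "_"
console.log(noLeadingUnderscore.test("_abc")); // false: "_" is not a letter or digit
console.log(noLeadingUnderscore.test("a_bc")); // true: "_" is still allowed later on
console.log(/\p{L}/u.test("_"));               // false: \p{L} does not include "_"
```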

In ECMAScript, things are not so simple. In engines without Unicode property escapes (\p{...} with the u flag, added in ES2018), you will need to define your own Unicode character ranges. Fortunately, a fellow StackOverflow user has already risen to the occasion and developed a JavaScript regex converter:

fooobar.com/questions/9626 / ...
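Where the target engine does support ES2018 Unicode property escapes, the .NET pattern can be used almost unchanged by adding the u flag. This is a minimal sketch under that assumption; `NAME_PATTERN` and `isAllowedName` are hypothetical names, not part of the original answer.

```javascript
// Same pattern as the .NET version; the "u" flag enables \p{L} in JavaScript.
const NAME_PATTERN = /^[\p{L}\d_][\p{L}\d_.\s-]*$/u;

// Returns true if the whole input matches the allowed character set.
function isAllowedName(input) {
  return NAME_PATTERN.test(input);
}

console.log(isAllowedName("A 汉语 AbN 漢語")); // true: CJK characters are letters
console.log(isAllowedName("αβγ"));            // true: Greek letters
console.log(isAllowedName("-abc"));           // false: "-" is not allowed as first character
console.log(isAllowedName(""));               // false: at least one character is required
```

Note that \p{L} covers letters only. Combining marks are a separate Unicode category, so text stored in decomposed form may need additional handling.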


Source: https://habr.com/ru/post/1442052/

