How to match international letters (English a-z plus non-English) with a regular expression?

I want to accept only input made up of letters from the English alphabet and from other alphabets.

For example German letters like öäü, French letters like áê, or Chinese characters like ...

How can I customize my regular expression to accept all alphabetic characters from the international alphabet?

+11
regex unicode
Mar 06
4 answers

Since you specifically ask about Unicode: \p{L} is the shorthand for a Unicode letter. However, not all regular expression flavors support this syntax. .NET, Perl, Java, and the JGSoft regex engine do, for example; Python does not.

So, for example, \b\p{L}+\b will match a whole word made of Unicode letters.
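
Since Python's built-in re module is one of the flavors without \p{L}, here is a minimal sketch of two stand-ins. The names `is_all_letters` and `find_letter_words` are made up for this illustration. The first checks Unicode general categories directly; the second uses `[^\W\d_]`, which is only an approximation of \p{L} (it can also let through a few non-decimal numeric characters such as superscript digits).

```python
import re
import unicodedata

# Approximation of \b\p{L}+\b for Python's built-in re, which has no \p{...} syntax.
# [^\W\d_] reads as "a word character that is neither a decimal digit nor an underscore".
LETTER_WORD = re.compile(r"\b[^\W\d_]+\b")


def is_all_letters(text):
    # True only if text is non-empty and every character has a
    # Unicode general category starting with "L" (Lu, Ll, Lt, Lm, Lo).
    return bool(text) and all(
        unicodedata.category(ch).startswith("L") for ch in text
    )


def find_letter_words(text):
    # Return every whole word that consists of letter-like characters only.
    return LETTER_WORD.findall(text)


if __name__ == "__main__":
    print(is_all_letters("Müller"))
    print(is_all_letters("abc123"))
    print(find_letter_words("Grüße, 世界 and 42"))
```

If installing a third-party package is acceptable, a regex engine with native \p{L} support would avoid the approximation entirely.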

+13
Mar 06 '10 at 12:11

With PCRE, you can use \w, the word character class. It also matches Unicode letters when Unicode support is enabled. Note that \w accepts digits and the underscore as well, not only letters.
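
Python's re module is not PCRE, but it shows the same idea: whether \w covers non-ASCII letters depends on a mode switch. The sketch below assumes Python 3, where string patterns are Unicode-aware by default and `re.ASCII` turns that off. The helper name `is_word` is made up for this example.

```python
import re

# Python 3 str patterns are Unicode-aware by default.
WORD_UNICODE = re.compile(r"\w+")
# re.ASCII restricts \w to [a-zA-Z0-9_].
WORD_ASCII = re.compile(r"\w+", re.ASCII)


def is_word(text, unicode_aware=True):
    # Return True if the whole string consists of \w characters
    # under the chosen mode.
    pattern = WORD_UNICODE if unicode_aware else WORD_ASCII
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    print(is_word("Grüße"))                       # Unicode mode
    print(is_word("Grüße", unicode_aware=False))  # ASCII-only mode
    print(is_word("abc_123"))                     # digits and underscore count too
```

The last call is a reminder that \w is broader than "letters only", so it may accept more than the question asks for.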

+1
Mar 06 '10 at 10:48

It varies. Some languages have a "Unicode" flag that extends \d, \w, etc. Some support equivalence classes; for example, [[=e=]] matches e, é, ê, etc. The regex documentation for your language or library will explain which options are available.
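
Python's re module has no [[=e=]] syntax, so the following is only a rough approximation of the idea, not the same mechanism: it strips accents via Unicode decomposition and compares the remaining base letter. The function names are hypothetical, and letters that do not decompose (ß, ø) are left unchanged.

```python
import unicodedata


def base_letters(text):
    # NFD splits "é" into "e" plus a combining accent;
    # dropping the combining marks leaves the base letter.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def in_equivalence_class(ch, base):
    # True if ch reduces to the given base letter once accents are removed.
    return base_letters(ch) == base


if __name__ == "__main__":
    for candidate in ["e", "é", "ê", "a"]:
        print(candidate, in_equivalence_class(candidate, "e"))
```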

+1
Mar 06

In languages with Unicode support, you can simply put the Unicode characters directly into the character class: [a-zäöüß], etc.
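
A minimal sketch of this approach in Python 3, assuming the source file is saved as UTF-8. The class here adds uppercase letters to the one above, and the NFC normalization step is an extra precaution (not part of the answer) so that a "ü" typed as "u" plus a combining diaeresis still matches. The name `is_german_word` is made up for the example.

```python
import re
import unicodedata

# Explicit whitelist: ASCII letters plus the German umlauts and ß.
GERMAN_LETTERS = re.compile(r"[A-Za-zÄÖÜäöüß]+")


def is_german_word(text):
    # Compose characters first so "u" + combining diaeresis becomes "ü".
    composed = unicodedata.normalize("NFC", text)
    return GERMAN_LETTERS.fullmatch(composed) is not None


if __name__ == "__main__":
    print(is_german_word("Straße"))
    print(is_german_word("été"))  # é is not in the whitelist
```

The trade-off of an explicit class is that every additional alphabet has to be listed by hand, which does not scale to "all letters".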

0
Mar 06


