How to match the international alphabet (English a-z, plus non-English letters) with a regular expression?

Question

I want to accept only input made up of letters from the English alphabet and from other alphabets, such as German (öäü), French (áê), or Chinese (...).

How can I customize my regular expression to accept all alphabetic characters from the international alphabet?

regex unicode

msfanboy Mar 06

4 answers

Tim Pietzcker · Answer 1 · 2010-03-06 12:11

Since you specifically ask about Unicode, \p{L} is the shorthand for a Unicode letter. However, not all regex flavors support this syntax. .NET, Perl, Java, and the JGSoft regex engine do, for example; Python does not.

So, for example, \b\p{L}+\b will match a whole word made up of Unicode letters.
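For flavors without \p{L}, such as Python's built-in re module, a common workaround is the class [^\W\d_] ("a word character that is neither a digit nor an underscore"). The sketch below is only an approximation of \b\p{L}+\b, not an exact equivalent: Python's \w also admits a few numeric characters that are not decimal digits. The names LETTER_WORD and find_letter_words are made up for illustration.

```python
import re

# Approximation of \b\p{L}+\b for Python's built-in re module, which lacks \p{L}.
# [^\W\d_] reads as "a word character that is neither a digit nor an underscore".
LETTER_WORD = re.compile(r"\b[^\W\d_]+\b")


def find_letter_words(text):
    """Return every run of letters that stands as a whole word in text."""
    return LETTER_WORD.findall(text)


if __name__ == "__main__":
    # The digits are skipped; the German, French, and Chinese words are kept.
    print(find_letter_words("Grüße, café 123 中文"))
```

If only a yes/no check of a whole string is needed, str.isalpha() may be the simpler and stricter choice in Python, since it tests each character against the Unicode letter categories.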

Wolph · Answer 2 · 2010-03-06 10:48

With PCRE, that would be \w, the word character. It also matches Unicode letters when the engine is configured for Unicode.
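The same dependence on configuration can be seen outside PCRE. As a rough illustration using Python's re module (an assumption on my part, not the PCRE API), \w is Unicode-aware by default for str patterns and falls back to ASCII only when re.ASCII is passed. Note that \w also accepts digits and the underscore, so it is broader than "letters only". The helper names are hypothetical.

```python
import re


def unicode_words(text):
    """Split text into \\w+ runs with Unicode-aware matching (the default for str)."""
    return re.findall(r"\w+", text)


def ascii_words(text):
    """Split text into \\w+ runs with \\w restricted to [a-zA-Z0-9_]."""
    return re.findall(r"\w+", text, flags=re.ASCII)


if __name__ == "__main__":
    sample = "naïve café_1"
    print(unicode_words(sample))  # accented letters stay inside the words
    print(ascii_words(sample))    # accented letters split the words apart
```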

Ignacio Vazquez-Abrams · Answer 3 · 2010-03-06 10:51

This varies. Some languages have a "Unicode" flag, which extends \d, \w, etc. to non-ASCII characters. Some support equivalence classes; for example, [[=e=]] matches e, é, ê, etc. The regex documentation for your language or library will explain which options are available.
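Where [[=e=]] is not available (Python's re module, for instance, has no equivalence classes), the idea can be roughly emulated by decomposing a character and comparing its base letter. This is only a sketch under that assumption: real equivalence classes are defined by the locale's collation rules, which this does not consult, and the function names are invented for the example.

```python
import unicodedata


def base_letter(ch):
    """Return the first code point of ch's canonical decomposition, casefolded."""
    return unicodedata.normalize("NFD", ch)[0].casefold()


def in_equivalence_class(ch, base):
    """Roughly emulate [[=base=]]: True if the single character ch is base plus accents."""
    return len(ch) == 1 and base_letter(ch) == base.casefold()


if __name__ == "__main__":
    for candidate in ["e", "é", "ê", "a"]:
        print(candidate, in_equivalence_class(candidate, "e"))
```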

poke · Answer 4 · 2010-03-06 14:36

In Unicode-aware languages, you can simply put the Unicode characters directly into the character class: [a-zäöüß], etc.
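A minimal Python sketch of this approach, restricted to the German letters from the example class. The function name is_german_word is hypothetical, and the NFC normalization step is my own addition: without it, a "ü" typed as "u" plus a combining diaeresis would not match the precomposed "ü" in the class.

```python
import re
import unicodedata

# Explicit whitelist: ASCII letters plus the German umlauts and eszett.
GERMAN_WORD = re.compile(r"[a-zäöüß]+", flags=re.IGNORECASE)


def is_german_word(text):
    """True if text consists only of the whitelisted letters (case-insensitive)."""
    # Compose characters first so decomposed input compares equal to the class members.
    normalized = unicodedata.normalize("NFC", text)
    return GERMAN_WORD.fullmatch(normalized) is not None


if __name__ == "__main__":
    for word in ["Straße", "Müller", "café"]:
        print(word, is_german_word(word))
```

The whitelist only covers what is listed, so every additional alphabet has to be added by hand; for "any letter in any language" one of the Unicode-aware approaches from the other answers scales better.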
