Starting with Java 7, you can use the Pattern.UNICODE_CHARACTER_CLASS flag:
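    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String s = "Müller";
    // With UNICODE_CHARACTER_CLASS, \w also matches non-ASCII letters such as ü
    Pattern p = Pattern.compile("^\\w+$", Pattern.UNICODE_CHARACTER_CLASS);
    Matcher m = p.matcher(s);
    if (m.find()) {
        System.out.println(m.group());
    } else {
        System.out.println("not found");
    }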
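Without the flag, \w matches only ASCII word characters ([a-zA-Z_0-9]), so the pattern does not match the word "Müller" and the code prints "not found". With Pattern.UNICODE_CHARACTER_CLASS, the pattern matches and the code prints "Müller".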
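Here is a minimal sketch of that with/without comparison, assuming Java 7 or later. The class WordMatchDemo and its helper isWord are hypothetical names chosen for illustration. The input is written as "M\u00fcller" so the result does not depend on the source-file encoding.

```java
import java.util.regex.Pattern;

public class WordMatchDemo {
    // Hypothetical helper: does the whole string consist of \w characters?
    static boolean isWord(String s, boolean unicodeClasses) {
        int flags = unicodeClasses ? Pattern.UNICODE_CHARACTER_CLASS : 0;
        return Pattern.compile("^\\w+$", flags).matcher(s).find();
    }

    public static void main(String[] args) {
        // "Muller" with u-umlaut (U+00FC), escaped so the source encoding does not matter
        String s = "M\u00fcller";
        System.out.println(isWord(s, false)); // false: \w is [a-zA-Z_0-9] by default
        System.out.println(isWord(s, true));  // true: \w now includes Unicode letters
    }
}
```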
The flag enables the Unicode version of the predefined character classes (such as \w, \d, and \s) and of the POSIX character classes (such as \p{Alpha}).
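A small sketch, assuming Java 7 or later, of how the flag changes a predefined class (\d) and a POSIX class (\p{Alpha}). It uses the embedded flag expression (?U), which the Pattern Javadoc documents as an alternative way to enable UNICODE_CHARACTER_CLASS. The class UnicodeClassesDemo and its helper fullMatch are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

public class UnicodeClassesDemo {
    // Hypothetical helper: true if the entire input matches the regex
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String arabicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE
        String eAcute = "\u00e9";      // LATIN SMALL LETTER E WITH ACUTE

        // Predefined class \d: ASCII [0-9] by default, Unicode decimal digits with (?U)
        System.out.println(fullMatch("\\d", arabicThree));     // false
        System.out.println(fullMatch("(?U)\\d", arabicThree)); // true

        // POSIX class \p{Alpha}: ASCII [a-zA-Z] by default, Unicode alphabetic with (?U)
        System.out.println(fullMatch("\\p{Alpha}", eAcute));     // false
        System.out.println(fullMatch("(?U)\\p{Alpha}", eAcute)); // true
    }
}
```

The Javadoc also notes that the flag implies UNICODE_CASE and that specifying it may impose a performance penalty, so it is worth enabling only where non-ASCII input actually matters.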
See the Javadoc for details.
You can also look here for more information on Unicode in Java 7.
And here, on regular-expressions.info, is an overview of Unicode scripts, properties, and blocks.
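For comparison with that overview, here is a sketch of Java's own syntax for those kinds of properties, assuming Java 7 or later: \p{IsLatin} for a script, \p{InBasicLatin} for a block, \p{L} for a general category, and \p{IsAlphabetic} for a binary property. The class ScriptBlockDemo and its helper are hypothetical. Note that "Müller" is entirely Latin script but not entirely inside the Basic Latin block, because ü (U+00FC) lives in the Latin-1 Supplement block.

```java
import java.util.regex.Pattern;

public class ScriptBlockDemo {
    // Hypothetical helper: true if the entire input matches the regex
    static boolean fullMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String muller = "M\u00fcller"; // "Muller" with u-umlaut (U+00FC)

        // Script: every character belongs to the Latin script
        System.out.println(fullMatch("\\p{IsLatin}+", muller));      // true
        // Block: U+00FC is in Latin-1 Supplement, not in Basic Latin
        System.out.println(fullMatch("\\p{InBasicLatin}+", muller)); // false
        // General category: every character is a letter (L)
        System.out.println(fullMatch("\\p{L}+", muller));            // true
        // Binary property: every character is Alphabetic
        System.out.println(fullMatch("\\p{IsAlphabetic}+", muller)); // true
    }
}
```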
See tchrist's well-known answer on the pitfalls of Java regex, including an update on what changed in Java 7 (as opposed to what was said to be coming in Java 8).
stema, Feb 29, 2018-12-12T00:00Z