Regex for checking alphabets and numbers in a localized string

I have an input field that is localized. I need to add validation with a regular expression that accepts only letters and digits. I could use [a-z0-9] if I only had to support English.

I am currently using Character.isLetterOrDigit(name.charAt(i)) (yes, I iterate over each character) to check for letters from different languages.

Are there any better ways to do this? Are any regular expressions or other libraries available for this?

+11
java regex unicode localization
Feb 29 '12
3 answers

Starting with Java 7 you can use Pattern.UNICODE_CHARACTER_CLASS

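    // Requires java.util.regex.Pattern and java.util.regex.Matcher.
    String s = "Müller";
    // The flag makes \w use the Unicode definition of a word character.
    Pattern p = Pattern.compile("^\\w+$", Pattern.UNICODE_CHARACTER_CLASS);
    Matcher m = p.matcher(s);
    if (m.find()) {
        System.out.println(m.group());
    } else {
        System.out.println("not found");
    }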

Without the flag, the pattern does not match the word "Müller"; with Pattern.UNICODE_CHARACTER_CLASS it does. The flag enables the Unicode version of the predefined character classes and POSIX character classes.
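
To make the difference concrete, here is a minimal sketch that compiles the same pattern with and without the flag. The class and method names (UnicodeWordDemo, matchesAscii, matchesUnicode) are made up for illustration. Note that \w also accepts the underscore, so on its own it is slightly broader than "letters and digits only".

```java
import java.util.regex.Pattern;

class UnicodeWordDemo {
    // Without the flag, \w is ASCII-only: [a-zA-Z_0-9].
    static final Pattern ASCII_WORD = Pattern.compile("^\\w+$");

    // With the flag, \w follows the Unicode definition of a word character.
    static final Pattern UNICODE_WORD =
            Pattern.compile("^\\w+$", Pattern.UNICODE_CHARACTER_CLASS);

    static boolean matchesAscii(String s) {
        return ASCII_WORD.matcher(s).matches();
    }

    static boolean matchesUnicode(String s) {
        return UNICODE_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String name = "M\u00FCller"; // "Müller", escaped to avoid source-encoding issues
        System.out.println(matchesAscii(name));    // false
        System.out.println(matchesUnicode(name));  // true
        System.out.println(matchesUnicode("a_b")); // true: underscore is a word character
    }
}
```

If the underscore must be rejected, a stricter character class than \w is needed.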

See the documentation for details.

You can also look here for more information on Unicode in Java 7.

And regular-expressions.info has an overview of Unicode scripts, properties, and blocks.

See also tchrist's famous answer about the caveats of Java regex, including an update on what has changed with Java 7 (or will change in Java 8).

+18
Feb 29 '12
 boolean foundMatch = name.matches("[\\p{L}\\p{Nd}]*"); 

should work.

[\p{L}\p{Nd}] matches a single character that is a Unicode letter or decimal digit. The String.matches() method ensures that the entire string matches the pattern. Because of the *, the empty string is accepted as well.
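
If the check runs often, the pattern can be compiled once and reused. The following is only a sketch: the class name LetterDigitRegex and the method isValid are hypothetical, and it assumes an empty input should be rejected, so it uses + where the one-liner above uses *.

```java
import java.util.regex.Pattern;

class LetterDigitRegex {
    // Compiled once; \p{L} is any Unicode letter, \p{Nd} any decimal digit.
    // Assumption: empty input is invalid, hence + rather than *.
    private static final Pattern LETTERS_AND_DIGITS =
            Pattern.compile("[\\p{L}\\p{Nd}]+");

    static boolean isValid(String name) {
        // matches() requires the whole input to match the pattern.
        return LETTERS_AND_DIGITS.matcher(name).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("M\u00FCller"));          // true  ("Müller")
        System.out.println(isValid("\u0391\u03B2\u03B3123")); // true  (Greek letters plus digits)
        System.out.println(isValid("a_b"));                   // false (underscore)
        System.out.println(isValid(""));                      // false (empty input)
    }
}
```

One caveat: a letter written as a base character plus a combining mark (for example u followed by U+0308) is rejected, because combining marks are not in \p{L}. Normalizing the input to NFC first, or adding \p{M} to the class, are possible ways around that.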

+8
Feb 29 '12 at 13:41

Some people, faced with a problem, think: "I know, I will use regular expressions." Now they have two problems.

- Jamie Zawinski

I say this in jest, but iterating through the String as you are doing will perform at least as well as any regular expression. There is no way a regular expression can do what you want any faster, and you avoid the overhead of compiling the pattern in the first place.

So as long as:

  • you do not need any other regex-like checks for the validation (none were mentioned in the question)
  • the intent of the loop over the String is clear (and if it is not, refactor until it is)

then why replace it with regex just because you can?
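
For reference, a loop of that kind might look like the sketch below. The class and method names (LetterDigitLoop, isLettersAndDigits) are hypothetical. It walks the string by code point rather than by char, which is a small change from the charAt(i) version in the question, so that letters outside the Basic Multilingual Plane are classified correctly.

```java
class LetterDigitLoop {
    // Returns true if every code point in name is a letter or a digit.
    // Note: an empty string passes; add a length check if that is unwanted.
    static boolean isLettersAndDigits(String name) {
        for (int i = 0; i < name.length(); ) {
            int cp = name.codePointAt(i);
            if (!Character.isLetterOrDigit(cp)) {
                return false;
            }
            // Advance by 1 or 2 chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isLettersAndDigits("M\u00FCller"));  // true  ("Müller")
        System.out.println(isLettersAndDigits("abc123"));       // true
        System.out.println(isLettersAndDigits("a-b"));          // false (hyphen)
        System.out.println(isLettersAndDigits("\uD835\uDC00")); // true  (U+1D400, a supplementary letter)
    }
}
```

Wrapping the loop in a well-named method like this is one way to keep its intent clear at the call site.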

+1
Feb 29 '12


