Pattern object does not match characters from other languages

I have the following regular expression, which works fine when the user enters English text but always fails on Portuguese characters.

 Pattern p = Pattern.compile("^[a-zA-Z]*$");
 Matcher matcher = p.matcher(fieldName);
 if (!matcher.matches()) {
     ....
 }

Is there a way to get the Pattern object to recognize valid Portuguese characters, such as ÁÃÃÃÇÉÇç ....?

thanks

+4
3 answers

You need a regular expression that matches the class of all letters. Across all the scripts of the world there are a great many of them, but fortunately we can tell the Java 6 RE engine that this is what we are after, and it will use the magic of Unicode classes to do the rest. In particular, the L class matches all types of letters: upper case, lower case, and "oh, that concept does not apply in my language":

 Pattern p = Pattern.compile("^\\p{L}*$");
 // the rest is identical, so won't repeat it...

When reading the docs, remember that backslashes must be doubled when they appear in a Java string literal, to stop the Java compiler from interpreting them as something else. (Also keep in mind that this RE is not suitable for things like validating people's names, which is a completely different and much more complex problem.)
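As a rough sketch of how this fits into the question's validation code: the class name LetterCheck and the helper isLettersOnly are made up for illustration, and the sample strings are written with Unicode escapes so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class LetterCheck {
    // \p{L} is the Unicode "Letter" general category. The backslash is
    // doubled because the pattern is written as a Java string literal.
    private static final Pattern LETTERS_ONLY = Pattern.compile("^\\p{L}*$");

    // Hypothetical helper (not from the question): true when every
    // character of the input is a Unicode letter, or the input is empty.
    static boolean isLettersOnly(String fieldName) {
        return LETTERS_ONLY.matcher(fieldName).matches();
    }

    public static void main(String[] args) {
        // "Joao" with a-tilde (U+00E3)
        System.out.println(isLettersOnly("Jo\u00e3o"));
        // "Acucar" with c-cedilla (U+00E7) and u-acute (U+00FA)
        System.out.println(isLettersOnly("A\u00e7\u00facar"));
        // Digits are not letters, so this one is rejected
        System.out.println(isLettersOnly("abc123"));
    }
}
```

One caveat: if the input arrives in decomposed form (a plain "a" followed by a combining tilde, U+0303), the combining mark is not in category L and the match fails. Normalizing to NFC with java.text.Normalizer before matching is one possible way around that.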

+5

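It should work with "^\\p{IsAlphabetic}*$" (written as a Java string literal, hence the doubled backslash), which takes Unicode characters into account. Binary properties such as IsAlphabetic require Java 7 or later. For reference, see the Pattern documentation at http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html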
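A small sketch of the IsAlphabetic variant, assuming Java 7 or later. The class name AlphabeticCheck and the method isAlphabeticOnly are illustrative only. The Alphabetic property is somewhat broader than the L category: it also covers letter-like numbers such as Roman numeral characters, which may or may not be what you want for a given field.

```java
import java.util.regex.Pattern;

public class AlphabeticCheck {
    // Binary Unicode property "Alphabetic" (Java 7+). Backslash doubled
    // because this is a Java string literal.
    private static final Pattern ALPHABETIC_ONLY =
            Pattern.compile("^\\p{IsAlphabetic}*$");

    // Letter general category, for comparison.
    private static final Pattern LETTERS_ONLY = Pattern.compile("^\\p{L}*$");

    // Hypothetical helper: true when every character is Alphabetic.
    static boolean isAlphabeticOnly(String s) {
        return ALPHABETIC_ONLY.matcher(s).matches();
    }

    // Hypothetical helper: true when every character is in category L.
    static boolean isLettersOnly(String s) {
        return LETTERS_ONLY.matcher(s).matches();
    }

    public static void main(String[] args) {
        String name = "Concei\u00e7\u00e3o";   // c-cedilla, a-tilde
        String roman = "\u2167";               // ROMAN NUMERAL EIGHT (category Nl)

        System.out.println(isAlphabeticOnly(name));
        System.out.println(isLettersOnly(name));
        // The two classes disagree on letter-number characters:
        System.out.println(isAlphabeticOnly(roman));
        System.out.println(isLettersOnly(roman));
    }
}
```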

+5

Check out the Pattern doc and especially the Unicode section:

Unicode blocks and categories are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property. Blocks are specified with the prefix In, as in InMongolian. Categories may be specified with the optional prefix Is: both \p{L} and \p{IsL} denote the category of Unicode letters. Blocks and categories can be used both inside and outside of a character class.

(This is from the Java 1.4.x docs.) I suspect you are interested in identifying Unicode letters in general rather than Portuguese letters specifically?
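To make the quoted rules concrete, here is a short sketch exercising \p, \P, the Is prefix, an In block, and use inside a character class. The class UnicodeClassDemo and its methods are invented for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeClassDemo {
    // Hypothetical helper: does the whole input match the regex?
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Hypothetical helper: index of the first character that is NOT a
    // letter (\P{L} is the negation of \p{L}), or -1 if there is none.
    static int firstNonLetter(String input) {
        Matcher m = Pattern.compile("\\P{L}").matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String cCedilla = "\u00e7";

        // \p{L} and \p{IsL} denote the same category.
        System.out.println(matches("\\p{L}", cCedilla));
        System.out.println(matches("\\p{IsL}", cCedilla));

        // \P{L} matches only characters that are not letters.
        System.out.println(matches("\\P{L}", cCedilla));

        // Blocks use the In prefix; c-cedilla lies outside Basic Latin.
        System.out.println(matches("\\p{InBasicLatin}", "a"));
        System.out.println(matches("\\p{InBasicLatin}", cCedilla));

        // Inside a character class: letters or spaces only.
        System.out.println(matches("[\\p{L} ]*", "S\u00e3o Paulo"));

        // Locate the offending character in a rejected input.
        System.out.println(firstNonLetter("Jo\u00e3o1"));
    }
}
```

Reporting the position of the first non-letter, as firstNonLetter does, can be handy when the validation message should point the user at the offending character.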

+3

Source: https://habr.com/ru/post/1389076/

