Canonical equivalence in a pattern

I am referring to the test harness described here: http://docs.oracle.com/javase/tutorial/essential/regex/test_harness.html

The only change I made to the class is that the pattern is created as follows:

    Pattern pattern = Pattern.compile(
            console.readLine("%nEnter your regex(Pattern.CANON_EQ set): "),
            Pattern.CANON_EQ);

As shown in the http://docs.oracle.com/javase/tutorial/essential/regex/pattern.html lesson, I enter the regular expression a\u030A and the input string \u00E5 expecting a match, but the harness reports No Match Found. As far as I can see, both strings are a lowercase "a" with a ring on top.

Did I understand the use case correctly?

1 answer

The behavior you see has nothing to do with the Pattern.CANON_EQ flag.

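Input read from the console is not the same as a Java string literal. The compiler translates \u escapes in source code, but console.readLine returns the characters exactly as they were typed. So when a user (presumably you, testing this flag) types \u00E5 into the console, the string returned by console.readLine is equivalent to the literal "\\u00E5" (six characters), not "å". See for yourself: http://ideone.com/lF7D1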
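To make the difference concrete, here is a minimal sketch in Java. The class name ConsoleEscapeDemo and the helper unescapeUnicode are hypothetical and are not part of the tutorial's harness. The helper shows one possible way to decode typed escapes before handing the input to the matcher, assuming only four-digit hexadecimal escapes need to be handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsoleEscapeDemo {

    // Hypothetical helper: replaces every typed "backslash, u, four hex digits"
    // sequence with the single character it denotes. Other text is left as-is.
    static String unescapeUnicode(String typed) {
        Matcher m = Pattern.compile("\\\\u([0-9a-fA-F]{4})").matcher(typed);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement guards against '$' or backslash in the replacement
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String typed = "\\u00E5";   // the six characters a user types at the console
        String literal = "\u00E5";  // the single character the compiler produces

        System.out.println(typed.length());                         // 6
        System.out.println(literal.length());                       // 1
        System.out.println(unescapeUnicode(typed).equals(literal)); // true
    }
}
```

In the harness, the string returned by console.readLine for the input to search could be passed through such a helper before calling pattern.matcher.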

As for Pattern.CANON_EQ, it behaves exactly as documented:

    Pattern withCE = Pattern.compile("^a\u030A$", Pattern.CANON_EQ);
    Pattern withoutCE = Pattern.compile("^a\u030A$");
    String input = "\u00E5";

    System.out.println("Matches with canon eq: " + withCE.matcher(input).matches());       // true
    System.out.println("Matches without canon eq: " + withoutCE.matcher(input).matches()); // false

http://ideone.com/nEV1V
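For background on why these two strings count as canonically equivalent, the following sketch compares them with java.text.Normalizer. The class name CanonEqDemo and the helper canonicallyEquivalent are hypothetical, used for illustration only. The sketch treats two strings as equivalent when their NFD (canonical decomposition) forms are equal, and it does not go through the regex engine at all.

```java
import java.text.Normalizer;

class CanonEqDemo {

    // True when both strings have the same canonical decomposition (NFD).
    static boolean canonicallyEquivalent(String a, String b) {
        return Normalizer.normalize(a, Normalizer.Form.NFD)
                .equals(Normalizer.normalize(b, Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        String decomposed = "a\u030A";  // 'a' followed by a combining ring above
        String precomposed = "\u00E5";  // the same letter as a single code point

        System.out.println(decomposed.equals(precomposed));                 // false
        System.out.println(canonicallyEquivalent(decomposed, precomposed)); // true
    }
}
```

A plain equals comparison sees two different char sequences, which is also why the pattern compiled without Pattern.CANON_EQ does not match.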


Source: https://habr.com/ru/post/913826/

