Another Java RegEx Question

I have the following code:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args){
        StringBuilder content = new StringBuilder("abcd efg h i. -  – jk(lmn) qq zz.");
        String patternSource = "[.-–]($| )";
        Pattern pattern = Pattern.compile(patternSource);
        Matcher matcher = pattern.matcher(content);
        System.out.println(matcher.replaceAll(""));
    }
}

where the character class in patternSource consists of a period, a hyphen-minus, and the \u2013 character (an en dash, something like a long dash). When executed, it prints

abcefi-  jk(lmn) qzz

If I change the order of the characters in my character class in any way, it starts working fine and gives

abcd efg h i jk(lmn) qq zz

What the heck?

Tested under JDK / JRE 1.6.0_23

1 answer

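An unescaped hyphen inside a character class has a special meaning: it denotes a range of characters. For example, [A-Z] matches every character from A to Z. In your pattern, [.-–] is therefore read as the range from the period (U+002E) to the en dash (U+2013), which includes all ASCII letters and digits but not the hyphen itself (U+002D).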

The exception is a hyphen at the beginning or end of a character class, in which case it is treated literally and matches only a hyphen. That is why reordering the characters fixes the problem.
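The difference between the two readings can be checked with a small sketch. It reuses the input string from the question; the class name HyphenInCharClass and the helper strip are made up for illustration, and the en dash is written as \u2013 so the result does not depend on the source file encoding. The third pattern escapes the hyphen in place, which is another way to make it literal.

```java
import java.util.regex.Pattern;

public class HyphenInCharClass {
    // Hypothetical helper: removes every match of the pattern from the input.
    public static String strip(String patternSource, String input) {
        return Pattern.compile(patternSource).matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "abcd efg h i. -  \u2013 jk(lmn) qq zz.";

        // Hyphen between the period and the en dash: a range from U+002E
        // to U+2013, which also covers ASCII letters and digits.
        String range = "[.-\u2013]($| )";
        // Hyphen moved to the end of the class: treated as a literal.
        String literalAtEnd = "[.\u2013-]($| )";
        // Hyphen escaped in place: also a literal.
        String escaped = "[.\\-\u2013]($| )";

        System.out.println(strip(range, input));        // abcefi-  jk(lmn) qzz
        System.out.println(strip(literalAtEnd, input)); // abcd efg h i jk(lmn) qq zz
        System.out.println(strip(escaped, input));      // abcd efg h i jk(lmn) qq zz
    }
}
```

Escaping the hyphen keeps the pattern correct even if someone later reorders the characters in the class.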


Source: https://habr.com/ru/post/1791146/