Java regexp error: \( is not a valid character

I used Java regexps today and found that the following regex is not allowed:

String pattern = "[a-zA-Z\\s\\.-\\)\\(]*"; 

If I use it, it fails and tells me that \( is not a valid character.

But if I change the regexp to

 String pattern = "[[a-zA-Z\\s\\.-]|[\\(\\)]]*"; 

Then it works. Is this a bug in the regexp engine, or do I not understand how to work with it?

EDIT: There was an error in my line: it should not start with two opening brackets [[, only one. This is now fixed.

4 answers

Your regex has two problems.

  • You did not close the character class.

  • - acts as a range operator with . on the LHS and ) on the RHS. But ) precedes . in Unicode, so this results in an invalid range.

To fix problem 1, either close the char class or, if you do not want [ among the permitted characters, delete one of the [ .

To fix problem 2, either escape the - as \\- , or move the - to the beginning or end of the char class.

So you can use:

 String pattern = "[a-zA-Z\\s\\.\\-\\)\\(]*"; 

or

 String pattern = "[a-zA-Z\\s\\.\\)\\(-]*"; 

or

 String pattern = "[-a-zA-Z\\s\\.\\)\\(]*"; 
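
As a quick check, here is a minimal sketch that runs all three variants against the same input. The class name, the helper method and the sample strings are made up for illustration; only the three patterns come from the answer.

```java
import java.util.regex.Pattern;

class DashPlacement {
    // The three corrected patterns listed above.
    static final String[] PATTERNS = {
        "[a-zA-Z\\s\\.\\-\\)\\(]*",  // dash escaped
        "[a-zA-Z\\s\\.\\)\\(-]*",    // dash moved to the end
        "[-a-zA-Z\\s\\.\\)\\(]*"     // dash moved to the beginning
    };

    // Returns true only if the whole input matches every one of the patterns.
    static boolean allMatch(String input) {
        for (String p : PATTERNS) {
            if (!Pattern.matches(p, input)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allMatch("J. Smith-Jones (Jr.)")); // true
        System.out.println(allMatch("abc123"));               // false: digits are not in the class
    }
}
```

All three should accept letters, whitespace, dots, dashes and parentheses, so which one to use is mostly a matter of taste.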

You should put the dash at the end of the character class, since it is normally used to denote a range (as in a-z ). Reorder it:

 String pattern = "[[a-zA-Z\\s\\.\\)\\(-]*"; 

Also, I don't think you need to escape the ( . ) characters inside the brackets.

Update: As others have pointed out, you should also escape [ inside a Java regex character class.


The problem is that \.-\) ( "\\.-\\)" in a Java string literal) tries to define a range from . to ) . Because the Unicode code point of . (U+002E) is higher than that of ) (U+0029), this is an error.

Try this pattern and you will see: [z-a] .
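
To see this without reading a stack trace, a small sketch like the following reports whether a pattern compiles. The class and method names are hypothetical; it relies only on Pattern.compile throwing PatternSyntaxException for a bad pattern.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RangeCheck {
    // Returns true if the regex compiles, false if the engine rejects it.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("[a-z]"));      // true:  ascending range
        System.out.println(compiles("[z-a]"));      // false: descending range
        System.out.println(compiles("[\\.-\\)]"));  // false: U+002E down to U+0029
        System.out.println(compiles("[\\)-\\.]"));  // true:  U+0029 up to U+002E
    }
}
```

The last line shows that the same two characters form a legal range once they are in ascending order, although that is a range of six characters rather than the three literals the question wanted.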

The correct solution is to either put the dash at the end of the character class (where it loses its special meaning) or escape it.

You also need to close the unclosed opening square bracket, or escape it if it is not meant to start a character class.

Also, escaping the full stop . is not necessary inside a character class.
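
A short sketch of that last point, with made-up class and method names: inside a character class the dot and the parentheses are plain literals, whether escaped or not.

```java
import java.util.regex.Pattern;

class DotInClass {
    // True if the whole input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full(".", "a"));       // true:  a bare dot matches any character
        System.out.println(full("[.]", "a"));     // false: inside a class the dot is literal
        System.out.println(full("[.]", "."));     // true
        System.out.println(full("[\\.]", "."));   // true:  the escape is harmless but redundant
        System.out.println(full("[()]+", "()"));  // true:  parentheses are literal here too
    }
}
```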


You need to escape the dash and close the unmatched square bracket. So you get two errors with this regex:

 java.util.regex.PatternSyntaxException: Illegal character range near index 14 

because the dash denotes a range, and \) is not a valid end for a range that starts at \. . If you escape the dash, making it [[a-zA-Z\s\.\-\)\(]* , you will get

 java.util.regex.PatternSyntaxException: Unclosed character class near index 19 

which means that you have an extra opening square bracket, which starts a character class. I don't know what you intended by adding the extra bracket, but either escaping or deleting it will make this a valid regular expression.
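
If you want to reproduce both errors yourself, something along these lines should do. It is only a sketch: the class name and the describe helper are made up, and the exact wording and index of the messages may differ between Java versions, so nothing here depends on them.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorReport {
    // Returns "description @ index" if the regex is rejected, or null if it compiles.
    static String describe(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription() + " @ " + e.getIndex();
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "[[a-zA-Z\\s\\.-\\)\\(]*",    // extra [ and unescaped dash
            "[[a-zA-Z\\s\\.\\-\\)\\(]*",  // dash escaped, extra [ still there
            "[a-zA-Z\\s\\.\\-\\)\\(]*"    // both problems fixed
        };
        for (String regex : candidates) {
            String problem = describe(regex);
            System.out.println(regex + " -> " + (problem == null ? "compiles" : problem));
        }
    }
}
```

Note that the engine reports only the first problem it hits, which is why the second error shows up only after the dash is escaped.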


Source: https://habr.com/ru/post/1343181/

