Java error replaceAll regex

I want to convert every "*" to ".*", except for an escaped "\*".

 String regex01 = "\\*toto".replaceAll("[^\\\\]\\*", ".*");
 assertTrue("*toto".matches(regex01)); // True
 String regex02 = "toto*".replaceAll("[^\\\\]\\*", ".*");
 assertTrue("tototo".matches(regex02)); // True
 String regex03 = "*toto".replaceAll("[^\\\\]\\*", ".*");
 assertTrue("tototo".matches(regex03)); // Error

If "*" is the first character, an error occurs: java.util.regex.PatternSyntaxException: Dangling meta character '*' near index 0

What is the correct regular expression?

3 answers

You need to use a negative lookbehind here:

 String regex01 = input.replaceAll("(?<!\\\\)\\*", ".*");

(?<!\\\\) is a negative lookbehind, so the pattern matches * only when it is not preceded by a backslash.

Examples:

 regex01 = "\\*toto".replaceAll("(?<!\\\\)\\*", ".*"); //=> \*toto
 regex01 = "*toto".replaceAll("(?<!\\\\)\\*", ".*"); //=> .*toto
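As a quick check, the lookbehind replacement can be run against the three inputs from the question. This is only a sketch: the class name LookbehindDemo and the helper toRegex are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Pattern;

class LookbehindDemo {
    // Replace every '*' that is not directly preceded by a backslash with ".*".
    // (Hypothetical helper name, used only for this illustration.)
    static String toRegex(String glob) {
        return glob.replaceAll("(?<!\\\\)\\*", ".*");
    }

    public static void main(String[] args) {
        String[] inputs = { "\\*toto", "toto*", "*toto" };
        for (String input : inputs) {
            String regex = toRegex(input);
            // Pattern.compile throws PatternSyntaxException if the result is not a valid regex.
            Pattern.compile(regex);
            System.out.println(input + " -> " + regex);
        }
        // The case that failed in the question now compiles and matches.
        System.out.println("tototo".matches(toRegex("*toto")));
    }
}
```

Because the lookbehind does not consume the preceding character, nothing needs to be put back in the replacement string.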

This is currently the only solution that works with multiple consecutive escaping backslashes (\) in the input:

 String regex = input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*"); 

How it works

Let's print the regex string to see the actual pattern processed by the regex engine:

 \G((?:[^\\*]|\\[\\*])*)[*] 

((?:[^\\*]|\\[\\*])*) matches a sequence made up of characters other than \ and * , or of the escape sequences \\ and \* . It matches all the characters that we don't want to touch and puts them in a capture group so that we can put them back in the replacement.

That sequence must be followed by an unescaped asterisk, which is matched by [*] .

To make sure we don't "jump" ahead when the regular expression cannot match an unescaped * , \G is used so that the next match can only start at the beginning of the string or where the previous match ended.
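The effect of \G can be seen by running the same pattern with and without it. The sketch below is an illustration only: the class AnchorDemo and the helpers withG and withoutG are hypothetical names. The input is three backslashes followed by * , that is, an escaped backslash followed by an escaped asterisk, so nothing should be replaced.

```java
class AnchorDemo {
    // The pattern from the answer, minus the leading \G.
    static final String BODY = "((?:[^\\\\*]|\\\\[\\\\*])*)[*]";

    // With \G each match must start where the previous one ended.
    static String withG(String s) {
        return s.replaceAll("\\G" + BODY, "$1.*");
    }

    // Without \G the engine may retry from the middle of an escape sequence.
    static String withoutG(String s) {
        return s.replaceAll(BODY, "$1.*");
    }

    public static void main(String[] args) {
        String input = "\\\\\\*"; // the four characters \ \ \ *
        System.out.println(withG(input));    // left unchanged
        System.out.println(withoutG(input)); // the escaped * is wrongly rewritten
    }
}
```

Without \G, the attempt at index 0 fails, the engine retries at index 1, reads the second and third backslash as an escape pair, and then treats the final * as unescaped.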

Why such a long solution? A look-behind would need to check whether the number of consecutive \ preceding the * is odd or even, and that kind of unbounded look-behind is not officially supported by Java regex. Therefore, we need to consume the string from left to right, taking the escape sequences into account, until we encounter an unescaped * and replace it with .* .
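To illustrate the odd/even problem, the sketch below compares a single-character negative look-behind with the left-to-right pattern on the input \\*toto (an escaped backslash followed by an unescaped asterisk). The class ParityDemo and its method names are hypothetical and exist only for this comparison.

```java
class ParityDemo {
    // Looks at only one character before '*', so it cannot count backslashes.
    static String lookbehind(String s) {
        return s.replaceAll("(?<!\\\\)\\*", ".*");
    }

    // Consumes escape sequences pairwise from left to right.
    static String scanning(String s) {
        return s.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
    }

    public static void main(String[] args) {
        String input = "\\\\*toto";   // the characters \ \ * t o t o
        String subject = "\\abctoto"; // a literal backslash, then abctoto

        String a = lookbehind(input); // '*' follows a backslash, so it is left alone
        String b = scanning(input);   // '*' follows an escaped backslash, so it is replaced

        System.out.println(a + " matches: " + subject.matches(a));
        System.out.println(b + " matches: " + subject.matches(b));
    }
}
```

With the look-behind, the result still contains \\* , which the regex engine reads as "zero or more backslashes", so the wildcard meaning of the asterisk is lost.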

Testing program

 String inputs[] = {
     "toto*", "\\*toto", "\\\\*toto", "*toto",
     "\\\\\\\\*toto", "\\\\*\\\\\\*\\*\\\\\\\\*"
 };
 for (String input: inputs) {
     String regex = input.replaceAll("\\G((?:[^\\\\*]|\\\\[\\\\*])*)[*]", "$1.*");
     System.out.println(input);
     System.out.println(Pattern.compile(regex));
     System.out.println();
 }

Output example

 toto*
 toto.*

 \*toto
 \*toto

 \\*toto
 \\.*toto

 *toto
 .*toto

 \\\\*toto
 \\\\.*toto

 \\*\\\*\*\\\\*
 \\.*\\\*\*\\\\.*

You should handle the case of a string starting with * in your regular expression:

 (^|[^\\\\])\\* 

A lone caret (^) is the "start of input" anchor.

Edit

Besides the fix above, the replacement string in the replaceAll call should be $1.* instead of .* so that the character matched before the * is not lost.
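Putting both parts of this answer together gives something like the sketch below. The class CaretFixDemo and the helper names are hypothetical; the second helper is included only to show what happens when $1 is left out of the replacement.

```java
class CaretFixDemo {
    // Match '*' at the start of the input or after any character other than a
    // backslash, and put that preceding character back with $1.
    static String toRegex(String glob) {
        return glob.replaceAll("(^|[^\\\\])\\*", "$1.*");
    }

    // Same pattern, but the replacement drops the captured character.
    static String withoutGroup(String glob) {
        return glob.replaceAll("(^|[^\\\\])\\*", ".*");
    }

    public static void main(String[] args) {
        System.out.println(toRegex("*toto"));      // the leading '*' is now handled
        System.out.println(toRegex("toto*"));      // the 'o' before '*' is preserved
        System.out.println(withoutGroup("toto*")); // the 'o' before '*' is lost
        System.out.println(toRegex("\\*toto"));    // the escaped '*' is untouched
    }
}
```

Unlike the look-behind version, this pattern consumes the character before the asterisk, which is why the capture group has to be restored in the replacement.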


Source: https://habr.com/ru/post/986649/

