Cannot understand String.replaceAll non-greedy behavior

Possible duplicate:
Java regex anomaly?

Any idea why the following test fails (it returns "xx" instead of "x")?

@Test public void testReplaceAll(){ assertEquals("x", "xyz".replaceAll(".*", "x")); } 

I do not want to use "^.*$"; I want to understand this behavior. Any clues?

+6
2 answers

Yes, this is exactly the same behavior as described in that question!

.* will first match the entire input, and then also the empty string at the end of the input.

Let's denote the regex engine's position with | and the input with <...> in your example.

  • input: <xyz>;
  • regex engine, before the first run: <|xyz>;
  • regex engine, after the first run: <xyz|> (matched text: "xyz");
  • regex engine, after the second run: <xyz>| (matched text: "").
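The runs listed above can be observed directly with java.util.regex.Matcher. The following is only a minimal sketch: the class name MatchTrace and the helper trace are made up for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, for illustration only.
public class MatchTrace {
    // Records every match the engine finds as "start-end:text".
    static List<String> trace(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.start() + "-" + m.end() + ":" + m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // First run matches "xyz" (0-3), second run matches "" at the end (3-3).
        System.out.println(trace(".*", "xyz"));          // [0-3:xyz, 3-3:]
        // replaceAll substitutes both matches, hence "xx".
        System.out.println("xyz".replaceAll(".*", "x")); // xx
    }
}
```

Each of the two matches is replaced by "x", which is where the second "x" in the failing test comes from.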

Not all regex engines behave this way. Java does, however. So does Perl. Sed, as a counterexample, will position its cursor after the end of the input in step 3.

Now you also need to understand one important thing: regex engines, when they encounter a zero-length match, always advance one character. Otherwise, consider what would happen if you tried to replace '^' with 'a': '^' matches a position, so it is a zero-length match. If the engine did not advance one character, "x" would be replaced with "ax", which would be replaced with "aax", and so on. So, after the second match, which is empty, Java's regex engine advances one "character"... of which there are none: end of processing.
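A small check of the zero-length case, as a sketch only (the class name ZeroLengthDemo and the method prefixWithA are hypothetical names chosen for this example):

```java
// Hypothetical demo class, for illustration only.
public class ZeroLengthDemo {
    // '^' matches the position before the first character: a zero-length match.
    static String prefixWithA(String input) {
        return input.replaceAll("^", "a");
    }

    public static void main(String[] args) {
        // The engine moves on after the empty match, so this terminates.
        System.out.println(prefixWithA("x"));            // ax
        // The empty pattern matches at every position, once each.
        System.out.println("xyz".replaceAll("", "-"));   // -x-y-z-
    }
}
```

The second call shows the same rule from another angle: an empty pattern matches once at each position, including the end of the input, rather than looping forever at position 0.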

+9

@Test public void testReplaceAll(){ assertEquals("x", "xyz".replaceAll(".+", "x")); }

This will probably do the trick, since + requires one or more characters, which prevents the behavior in which * can match zero characters and replace that empty match with "x".
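For comparison, here is a hedged sketch of how the two quantifiers behave side by side; the class name PlusVersusStar and its methods are invented for this example. Note that the two patterns also differ on empty input, which may or may not matter for your use case.

```java
// Hypothetical comparison class, for illustration only.
public class PlusVersusStar {
    static String withStar(String input) {
        return input.replaceAll(".*", "x");
    }

    static String withPlus(String input) {
        return input.replaceAll(".+", "x");
    }

    public static void main(String[] args) {
        System.out.println(withStar("xyz")); // xx (full match plus empty match at the end)
        System.out.println(withPlus("xyz")); // x  (the empty match at the end is rejected)
        // On empty input, .* still matches once, while .+ matches nothing.
        System.out.println("[" + withStar("") + "]"); // [x]
        System.out.println("[" + withPlus("") + "]"); // []
        // replaceFirst stops after the first match, so .* yields a single "x".
        System.out.println("xyz".replaceFirst(".*", "x")); // x
    }
}
```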

0

Source: https://habr.com/ru/post/904695/

