What exactly does a quantifier do * +?

The idea of ​​lazy and greedy is easy to understand, but I only saw / used *+ once in my regular expression (in Java) [A]|[^B]*+(?!C) (A, B, C are arbitrary values) simply because it worked when a lazy modifier led to a StackOverflow error.

Due to the inability of most search engines to search for characters, I cannot find any documentation on this subject. So what exactly does * + do and how does it do it?

+4
source share
3 answers

The greedy quantifier matches everything it can, and then the pattern goes back until the match succeeds.

The Len quantifier forward tracks until the match is complete.

Possessive quantifier matches all possibilities, but never returns .

+ stands for possessive quantifier. If they can be used, for example, ++ or *+ .

This ability to prevent backtracking means that it can stop catastrophic backtracking .

+7
source

As the other answers indicate, *+ is a possessive quantifier. It matches the previous element as many times as possible, just like a greedy quantifier, but never returns.

Why is this useful? Only as a performance optimization. In addition, only as a performance optimization when the regular expression does not match. This is an important point for understanding regular expressions: their worst performance always occurs when they do not match.

Depending on the regular expression engine used and the details of the regular expression itself, the worst case performance can sometimes be stunningly bad. For a simple example, take this regex: a*a*a*b matched with this line: aaaaac .

In connection with this situation, the standard regular expression engine of the NFA type will do the following:

  • Try doing the 1st a 5 times, the second a zero time and the third a zero time.
  • Try matching b with c - it doesn't work.
  • "Backtrack" and match 1st a 4 times, 2nd 1st time and 3rd zero time.
  • Try b again - it does not work.
  • Go back, try the 1st a 4 times, the second zero time and the third 1 time.
  • ...
  • Come back, try 1st a 3 times, 2nd 2 times and 3rd zero time ...

(I think you can complete the next few hundred steps yourself.)

If the regular expression was a*+a*a*b , this will never happen. It would be more like:

  • Try to execute the 1st a 5 times, the second zero time and the third zero time.
  • Try b - it does not work.
  • Since 1st a is “possessive,” once it matches 5 times, it will never be able to try to match fewer times. And since there is no a in the line, but for the other a* - they can only coincide with zero time. There is nothing to try, so the match as a whole will not work.
  • There is no step 4. You are done. While your friends are waiting for their regular expressions to complete, you can drop your heels and crack your own cold drink of your choice.
+2
source

X * + means X, zero or more times (Possessive)

Possessive quantifiers always eat the entire input line, trying once (and only once) to match. Unlike greedy quantifiers, possessive quantifiers never back down, even if this allows a coincidence of complete success.

+1
source

Source: https://habr.com/ru/post/1495405/


All Articles