The difference between possessive quantifiers and once-only subpatterns

I was reading the PCRE documentation and noticed that the possessive quantifier + and once-only subpatterns (?>), also known as atomic groups, are somewhat similar in concept. Is there a significant difference?

+4
3 answers

(?>) is actually an atomic group.

From Atomic Grouping on regular-expressions.info:

An atomic group is a group that, when the regex engine exits from it, automatically throws away all backtracking positions remembered by any tokens inside the group. Atomic groups are non-capturing. The syntax is (?>group).

From Possessive Quantifiers on regular-expressions.info:

Possessive quantifiers are a way to prevent the regex engine from trying all permutations. This is primarily useful for performance reasons. You can also use possessive quantifiers to eliminate certain matches.

On the same page:

Technically, possessive quantifiers are a notational convenience to place an atomic group around a single quantifier. All regex flavors that support possessive quantifiers also support atomic grouping. But not all regex flavors that support atomic grouping support possessive quantifiers. With those flavors, you can achieve the exact same results using an atomic group.

Basically, instead of X*+, write (?>X*). It is important to notice that both the quantified token X and the quantifier are inside the atomic group. Even if X is a group, you still need to put an extra atomic group around it to achieve the same effect. (?:a|b)*+ is equivalent to (?>(?:a|b)*) but not to (?>a|b)*. The latter is a valid regular expression, but it won't have the same effect when used as part of a larger regular expression.

+7

If you look at this regular-expressions.info page, you will see in the table that "x++ is identical to (?>x+)".

The only difference:

Possessive quantifiers are a limited yet syntactically cleaner alternative to atomic grouping.

So possessive quantifiers are more limited than atomic groups, but they can be considered cleaner.

+1

Note that (?>X+) is not exactly the same as X++ from a backtracking point of view. Inside the parentheses the regex engine can still backtrack, so it always records backtracking positions within the atomic group (and discards them once the closing parenthesis is reached), which is of course not the case with a possessive quantifier. Example:

Consider the string aaaabbbb.

(?>a+)ab, like a++ab, will fail, because the regex engine cannot backtrack into the atomic group once its closing parenthesis has been passed.

But

(?>a+ab) will succeed, since backtracking positions are still recorded inside the atomic group.

(?:a+|ab)+(?<!a)b will succeed, but (?>a+|ab)+(?<!a)b will fail, because the atomic group is closed after each repetition.

Conclusion: the exact synonym of (?>X+) is not X++, but (?:X+){1}+

+1

Source: https://habr.com/ru/post/1497608/
