The meaning of a `+` following a `*` when the latter is used as a quantifier in a regular expression

Today I came across the following regex and wanted to know what Ruby would do with it:

    > "#a" =~ /^[\W].*+$/
    => 0
    > "1a" =~ /^[\W].*+$/
    => nil

In this case, Ruby seems to ignore the + character. If that is not true, I am not sure what it is doing with it. I assume it is not interpreted as a quantifier, since the * is not escaped and is itself used as a quantifier. In Perl / Ruby regular expressions, when a character (e.g. - ) appears in a context where it cannot be interpreted as a special character, it is sometimes treated as a literal. But if that were happening here, I would expect the first match to fail, because there is no + in the string on the left-hand side.

Is it syntactically correct to use the + symbol here? Is the above behavior a bug? Am I missing something?

1 answer

Yes, you can use + after * . You can read a little about it on this site . A + placed after * turns it into a possessive quantifier.

What does it do? It prevents * from backtracking.

Usually, when you have something like .*c and use it to match abcde , .* will first match the entire string ( abcde ), and since the regular expression cannot match c after .* , the engine gives back one character at a time to check whether there is a match (this is backtracking).

As soon as it has backtracked to c , you get the match abc from abcde .
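A small Ruby sketch of the backtracking described above; the variable name `greedy` is only for illustration:

```ruby
# Greedy .* first grabs all of "abcde", then gives characters back
# one at a time until the following c can match.
greedy = "abcde".match(/.*c/)
puts greedy[0]   # prints "abc"
```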

Now imagine that the engine has to backtrack over several hundred characters. If you have nested groups and several * (or + or {m,n} ) quantifiers, you can quickly end up with thousands or millions of backtracking steps, which is called catastrophic backtracking.

This is where possessive quantifiers come in. They prevent any form of backtracking. With the regex .*+c , abcde will not match: as soon as .*+ has consumed the entire string, it cannot give anything back, and since there is no c left at the end of the string, the match fails.
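The same experiment with the possessive form, as a minimal Ruby sketch. The atomic group `(?>...)` is shown only as an assumed equivalent spelling for comparison; the variable names are illustrative:

```ruby
# Possessive .*+ consumes "abcde" and refuses to give anything back,
# so the trailing c can never match.
possessive = "abcde".match(/.*+c/)
p possessive   # prints nil

# An atomic group behaves the same way: once .* has matched inside it,
# the engine will not backtrack into the group.
atomic = "abcde".match(/(?>.*)c/)
p atomic       # prints nil
```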

So another use of possessive quantifiers is that they can improve the performance of some regular expressions, provided the engine supports them.

For your regular expression /^[\W].*+$/ , I don't think the possessive quantifier gives any improvement (perhaps a slight one). Lastly, the character class around \W is unnecessary, so it can easily be rewritten as /^\W.*+$/ .
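To check this against the two strings from the question, here is a short Ruby sketch. The `simplified` pattern, which also drops the possessive + , is an assumption added for comparison and is not part of the answer above:

```ruby
original   = /^[\W].*+$/   # regex from the question
simplified = /^\W.*$/      # no character class, plain greedy *

["#a", "1a"].each do |s|
  # =~ returns the match position, or nil when there is no match
  puts "#{s.inspect}: #{(s =~ original).inspect} / #{(s =~ simplified).inspect}"
end
```

For these two inputs both patterns agree: the string starting with # matches at position 0, and the one starting with a digit does not match.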


Source: https://habr.com/ru/post/954465/

