Why greedy quantifiers are cheaper than lazy quantifiers

http://www.rexegg.com/regex-quantifiers.html#tempered_greed



Greedy quantifiers (the default) work by first swallowing as many characters as possible and then giving characters back one by one to make room for the rest of the pattern.

For example, the regular expression .*world applied to the string hello world will first try to consume the entire string into .* . But then world cannot match, so .* begins to give back characters one by one until it has released world at the end of the string - at that point the entire regular expression can match.

Lazy quantifiers work in much the same way, just in reverse: they try to match as few characters as possible, and expand one character at a time until the rest of the pattern can match.
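The two strategies produce visibly different results when the target appears more than once. A minimal illustration with Python's re module (the sample string is my own):

```python
import re

s = "say hello world and hello world again"

# Greedy: .* first consumes the whole string, then backs off
# character by character until the trailing "world" can match,
# so it settles on the LAST occurrence.
greedy = re.search(r".*world", s)
print(greedy.group())   # say hello world and hello world

# Lazy: .*? starts with zero characters and expands one at a
# time until "world" can match, so it stops at the FIRST occurrence.
lazy = re.search(r".*?world", s)
print(lazy.group())     # say hello world
```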


But according to the article I read, these seemingly mirror-image processes for greedy and lazy quantifiers differ in cost - supposedly only lazy quantifiers need to "back off". But don't greedy quantifiers also need to "back off" when they give back previously swallowed characters?

Why is this so? Intuitively the two processes seem symmetric.

2 answers

Greedy and lazy quantifiers are equally expensive when used correctly. However, lazy quantifiers have earned a bad reputation for being slow, because they can be, and often are, used to compensate for imprecision in the pattern.

Consider a simple example: <.*?> vs. <.*> .

When both expressions apply to the same input

 <abcdefghijklmnopqrstuvwxyz0123456789> 

they arrive at exactly the same match. The only difference is how they get there: the lazy expression tries longer and longer substrings, reaching the match after 40 steps ( demo 1 ). The greedy expression, on the other hand, shoots all the way to the end of the string and then gives characters back, arriving at the match in only 5 steps ( demo 2 ).

Note that the situation can be reversed if more characters are added after the > :

 <abcdefghijklmnopqrstuvwxyz0123456789>abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789abcdefghijklmnopqrstuvwxyz0123456789 

Now the greedy expression becomes "slow", taking 149 steps ( demo 3 ), while the lazy expression keeps taking the same 40 steps as before ( demo 4 ).
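The step counts above come from a regex debugger; plain Python's re only exposes the final match, but it is enough to confirm that both patterns land on the same text in both cases (a small sanity check, not a step counter):

```python
import re

short_text = "<abcdefghijklmnopqrstuvwxyz0123456789>"
long_text = short_text + "abcdefghijklmnopqrstuvwxyz0123456789" * 4

for text in (short_text, long_text):
    lazy_m = re.search(r"<.*?>", text)
    greedy_m = re.search(r"<.*>", text)
    # Both quantifiers arrive at the same match; only the number
    # of internal engine steps (invisible here) differs.
    print(lazy_m.group() == greedy_m.group())  # True
```

Note that on the longer input the greedy .* has to run past the trailing junk and backtrack all the way to the single > , which is exactly where the extra steps go.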


Sometimes greedy sucks.

Example input: [asdfsikbfqew0787072143k*(*&^(**&]

Benchmark

 Regex1: \[[^\]]*\]
 Options: <none>
 Completed iterations: 200 / 200 (x 1000)
 Matches found per iteration: 1
 Elapsed Time: 0.48 s, 476.21 ms, 476211 µs

 Regex2: \[.*?\]
 Options: <none>
 Completed iterations: 200 / 200 (x 1000)
 Matches found per iteration: 1
 Elapsed Time: 0.32 s, 322.73 ms, 322730 µs

Never, ever use a greedy quantifier with a dot unless you actually need everything to the end of the line.

Someone once told me that the lookahead (?=.*d) starts checking from the next character. That is absolutely untrue. It immediately runs the pointer to the end of the line and then checks backward as it backtracks. Watch out for this, because it skips over everything.
It is slower than watching paint dry.

I used to use greedy quantifiers all the time; now I use them only under duress, and I don't recommend them to anyone.


Source: https://habr.com/ru/post/1244328/
