Regular Expression Capture Order: Wrong Alternative Matched After a Greedy Pattern

I have this pattern:

(\w+)(sin|in|pak|red)$ 

And the replacement string is:

 $1tak 

The problem is that this word:

setesin

will be converted to:

setestak

instead of:

setetak

For some reason, in always takes priority over sin in the pattern, even though sin is listed first.
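
The behaviour can be reproduced with a short sketch. The question does not say which regex engine is used, so Python's re module is an assumption here; in Python the replacement $1tak is written as \1tak.

```python
import re

# Pattern from the question. Python spells the replacement "$1tak" as r"\1tak".
pattern = re.compile(r"(\w+)(sin|in|pak|red)$")

match = pattern.search("setesin")
stem, suffix = match.groups()
print(stem, suffix)  # setes in

result = pattern.sub(r"\1tak", "setesin")
print(result)  # setestak
```

The first group ends up holding setes, which leaves only in for the alternation.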

How can I make the pattern respect the order of the alternatives?

+5
2 answers

Use a lazy quantifier:

 (\w+?)(sin|in|pak|red)$

See the regex demo.

\w+ contains a greedy quantifier: it first grabs as many characters as it can (note that \w matches s, i, and every other letter, digit, and underscore), and then backtracks, giving up one character at a time from right to left, to make room for the subsequent patterns. Because in is the first alternative that fits during this backtracking, the group is considered matched, and the regex engine goes on to check for the end of the string with $. With a lazy quantifier, the engine leaves \w+? after matching a single word character and tries the subsequent patterns; if they fail, it extends \w+? by one character at a time, moving from left to right.
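
The difference between the two search orders can be sketched in Python (an assumed language; the question does not name the engine). The helper first_split is hypothetical and only a simplified model: it considers a match starting at the first character, treats the alternatives as literal strings, and lets the comparison with the rest of the word stand in for $.

```python
import re

ALTERNATIVES = ["sin", "in", "pak", "red"]  # same order as in the pattern

def first_split(word, greedy):
    """Return the first (stem, suffix) split this simplified model accepts.

    The stem needs at least one character, and the suffix must be one of
    the literal alternatives reaching the end of the word (the job of $).
    """
    if greedy:
        lengths = range(len(word), 0, -1)  # longest stem first, then give back
    else:
        lengths = range(1, len(word) + 1)  # shortest stem first, then extend
    for n in lengths:
        stem, rest = word[:n], word[n:]
        for alt in ALTERNATIVES:
            if rest == alt:
                return stem, alt
    return None

greedy_split = first_split("setesin", greedy=True)
lazy_split = first_split("setesin", greedy=False)
print(greedy_split)  # ('setes', 'in')
print(lazy_split)    # ('sete', 'sin')

# Cross-check the model against the real engine.
greedy_re = re.search(r"(\w+)(sin|in|pak|red)$", "setesin").groups()
lazy_re = re.search(r"(\w+?)(sin|in|pak|red)$", "setesin").groups()

# Python spells the replacement "$1tak" as r"\1tak".
fixed = re.sub(r"(\w+?)(sin|in|pak|red)$", r"\1tak", "setesin")
print(fixed)  # setetak
```

Going right to left, the greedy search reaches the split setes + in before it ever tests sete + sin; going left to right, the lazy search reaches sete + sin first.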

+9

Do not use the quantifier at all:

 (\w)(?:sin|in|pak|red)$ 

with the same replacement

or

 \B(?:sin|in|pak|red)$ 

with tak as a replacement. The non-word boundary \B ensures that there is a word character before the first character of the suffix (if a word character is not required before the alternation, remove \B ).

In both approaches, the leftmost occurrence of an alternative is found first, because no greedy quantifier consumes it beforehand.
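
Both quantifier-free patterns can be checked with a short Python sketch (Python is an assumption, since the question does not name the engine; the replacement $1tak becomes \1tak there). The bare word in is an extra test input, not taken from the question, used only to show what \B changes.

```python
import re

# Variant 1: capture a single word character and put it back.
one_char = re.sub(r"(\w)(?:sin|in|pak|red)$", r"\1tak", "setesin")
print(one_char)  # setetak

# Variant 2: capture nothing, only require a word character before the suffix.
with_b = re.sub(r"\B(?:sin|in|pak|red)$", "tak", "setesin")
print(with_b)  # setetak

# A word that consists of the suffix alone has no word character before it,
# so \B blocks the match and the word is left unchanged.
bare_with_b = re.sub(r"\B(?:sin|in|pak|red)$", "tak", "in")
print(bare_with_b)  # in

# Without \B the bare suffix is replaced as well.
bare_without_b = re.sub(r"(?:sin|in|pak|red)$", "tak", "in")
print(bare_without_b)  # tak
```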

+3

Source: https://habr.com/ru/post/1260454/

