Regex matches a string - positive lookahead

Regexp: (?=(\d+))\w+\1 String: 456x56

Hi,

I can't figure out how this regular expression matches “56x56” in the string “456x56”.

  • The lookahead (?=(\d+)) captures 456 and stores it in \1, via (\d+)
  • The word-character pattern, \w+, matches the entire line ("456x56")
  • \1, which is 456, must then follow \w+
  • After backtracking through the string, it should not find a match, since there is no "456" preceded by a word character

However, the regular expression matches 56x56.

5 answers

You have not anchored your regular expression, as has already been said. Another issue is that \w also matches digits... Now let's see how the regex engine matches your input:

 # begin
 regex: |(?=(\d+))\w+\1
 input: |456x56
 # lookahead (first group = '456')
 regex: (?=(\d+))|\w+\1
 input: |456x56
 # \w+
 regex: (?=(\d+))\w+|\1
 input: 456x56|
 # \1 cannot be satisfied: backtrack on \w+
 regex: (?=(\d+))\w+|\1
 input: 456x5|6
 # And again, and again... Until the beginning of the input: \1 cannot match
 # Regex engine therefore decides to start from the next character:
 regex: |(?=(\d+))\w+\1
 input: 4|56x56
 # lookahead (first group = '56')
 regex: (?=(\d+))|\w+\1
 input: 4|56x56
 # \w+
 regex: (?=(\d+))\w+|\1
 input: 456x56|
 # \1 cannot be satisfied: backtrack
 regex: (?=(\d+))\w+|\1
 input: 456x5|6
 # \1 cannot be satisfied: backtrack
 regex: (?=(\d+))\w+|\1
 input: 456x|56
 # \1 satisfied: match
 regex: (?=(\d+))\w+\1|
 input: 4<56x56>
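
The end result of this trace can be checked against a real engine. Here is a minimal sketch in Python. The question does not say which engine is in use, so the re module is an assumption, and the variable names are only illustrative:

```python
import re

pattern = re.compile(r"(?=(\d+))\w+\1")

# search() tries start positions left to right and stops at the first success
m = pattern.search("456x56")

print(m.group(0))  # 56x56  (the overall match)
print(m.group(1))  # 56     (what the lookahead captured)
print(m.span())    # (1, 6) (the match starts after the leading 4)
```

The span shows that the match begins at offset 1, which is the "start from the next character" step of the trace.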

5) The regex engine concludes that it cannot find a match when the search starts at the 4, so it skips one character and searches again. This time it captures two digits (56) in \1 and ends up matching 56x56.

If you want to match only whole lines, use ^(?=(\d+))\w+\1$

 ^ matches the beginning of the string
 $ matches the end of the string
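
The effect of the anchors can be seen in a small Python sketch. Python's re module is an assumption here, and the names anchored, whole_456 and whole_56 are made up for illustration:

```python
import re

anchored = re.compile(r"^(?=(\d+))\w+\1$")

# With ^ the engine may only start at offset 0, where group 1 is 456,
# and 456 never appears again after \w+, so there is no match at all.
whole_456 = anchored.search("456x56")

# A line that really has the form <digits><word chars><same digits> still matches.
whole_56 = anchored.search("56x56")

print(whole_456)          # None
print(whole_56.group(0))  # 56x56
```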

The points you listed are almost, but not quite, entirely wrong!

  1) The group (?=(\d+)) matches a sequence of one or more digits, not necessarily 456
  2) \w matches word characters (letters, digits and the underscore), not only letters
  3) \1 is a backreference to what the group matched

Thus, the regular expression means: look ahead for a sequence of digits, then match a sequence of word characters followed by the same digit sequence that the lookahead captured. Hence the 56x56 match.
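
Which characters \w covers is easy to check. A minimal sketch, assuming Python's re module (the results dictionary and the sample strings are only for illustration):

```python
import re

samples = ["456", "abc", "a_1", "x-y"]

# fullmatch() succeeds only if every character of the sample is matched by \w
results = {text: bool(re.fullmatch(r"\w+", text)) for text in samples}

for text, ok in results.items():
    print(text, ok)
# 456 True
# abc True
# a_1 True
# x-y False
```

Since digits count as word characters, \w+ is free to swallow 456x56 or 56x56 whole before it backtracks.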

OK, so this is a positive lookahead:

  (?=(\d+))\w+\1 

You are right that the first \d+ would match 456, so \1 would also have to be 456, but in that case the expression cannot match the string.

The only characters common to both sides of the x are 56, and that is what the engine settles on to get a successful match.

The + operator is greedy and backtracks as necessary. The lookahead (?=(\d+)) will first capture 456, then 56 if the rest of the regex fails, then 6 if it fails again. First attempt: 456. It matches, and group 1 contains 456. Then we have \w+, which is greedy and takes 456x56. Nothing remains, but we still need to match \1, i.e. 456. Thus: failure. Then \w+ gives back one character at a time until it has nothing left to give back. Still a failure.

We consume one character from the string. The next attempt starts at the substring 56x56. The lookahead matches, and group 1 contains 56. \w+ matches to the end of the line and takes 56x56, and then we try to match 56: fail. So \w+ backtracks until 56 is left in the string, and then the whole regular expression matches.

You should try it in the debug mode of a regex tool.
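
The attempt at each start position can be reproduced one at a time. This is a sketch under the assumption that Python's re module is used. Pattern.match(text, start) tries only the given start position, and the attempts list is a made-up name for illustration:

```python
import re

pattern = re.compile(r"(?=(\d+))\w+\1")
text = "456x56"

attempts = []
for start in range(len(text)):
    # match() does not scan ahead: it succeeds only if a match begins exactly at `start`
    m = pattern.match(text, start)
    attempts.append((start, m.group(0) if m else None, m.group(1) if m else None))

for start, matched, group1 in attempts:
    print(start, matched, group1)
# 0 None None
# 1 56x56 56
# 2 6x56 6
# 3 None None
# 4 None None
# 5 None None
```

A normal search stops at the first start position that succeeds, which is offset 1, so the 6x56 candidate at offset 2 is never reached.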

Source: https://habr.com/ru/post/905399/

