Using backreference regex as part of regex in Python

I am trying to use the regex part as input for the later part of the regex.

What I still have (which I could not say):

import re regex = re.compile(r"(?P<length>\d+)(\d){(?P=length)}") assert bool(regex.match("3123")) is True assert bool(regex.match("100123456789")) is True 

Destroying this, the first digit denotes the number of digits that must be matched subsequently. In the first statement, I get a 3 as the first character, which means that there must be exactly three digits after it, otherwise more than 9 digits. If there are more than 9 digits, then the first group should be expanded and checked for the rest of the digits.

The regular expression 3(\d){3} will correctly match the first statement, however I cannot get the regular expression according to the general case when the brackets {} get the regular expression backreference: {(?P=length)}

Call a regex with the re.DEBUG flag:

 subpattern 1 max_repeat 1 4294967295 in category category_digit subpattern 2 in category category_digit literal 123 groupref 1 literal 125 

Curly braces { ( 123 ) and } ( 125 ) appear to be interpreted as literals when there is a backreference inside them.
When there is no backlink, for example {3} , I see that {3} interpreted as max_repeat 3 3

Is using a backreference as part of a regular expression possible?

+6
source share
1 answer

Cannot put backreference as a quantifier bounding argument inside a template. To solve your current problem, I can suggest the following code (see Built-in comments explaining the logic):

 import re def checkNum(s): first = '' if s == '0': return True # Edge case, 0 is valid input m = re.match(r'\d{2,}$', s) # The string must be all digits, at least 2 if m: lim = '' for i, n in enumerate(s): lim = lim + n if re.match(r'{0}\d{{{0}}}$'.format(lim), s): return True elif int(s[0:i+1]) > len(s[i+1:]): return False print(checkNum('3123')) # Meets the pattern (123 is 3 digit chunk after 3) print(checkNum('1234567')) # Does not meet the pattern, just 7 digits print(checkNum('100123456789')) # Meets the condition, 10 is followed with 10 digits print(checkNum('9123456789')) # Meets the condition, 9 is followed with 9 digits 

See a demo of Python on the Internet .

The pattern used with re.match (which anchors the pattern at the beginning of the line), {0}\d{{{0}}}$ will look like 3\d{3}$ if 3123 is passed to the checkNum method. It will match a line starting with 3 , and will match exactly 3 digits, followed by the end of the line marker ( $ ).

+3
source

Source: https://habr.com/ru/post/1013060/


All Articles