Python regex module: repeated backreferences don't work correctly

Note: I am using the alternative regex module from PyPI.

I have a Python program in which I look for duplicate labels in a specific format, separated by commas.

Format: (words ...) #(number)

For example: Trial #1, Trial #2, Run #3, and Spring Trial #13 will fit the format.

I use ([\w ]*#\d\d?,)\1* on the source string as my regex pattern.
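To make the label format concrete, here is a minimal sketch that checks a few sample labels against the capturing group of the pattern above (without the \1* repeat). It uses the standard re module rather than regex, and the sample labels and the name single_label are only illustrative assumptions.

```python
import re

# The capturing group from the question's pattern, without the \1* repeat.
# (The name `single_label` and the sample labels are illustrative only.)
single_label = re.compile(r'[\w ]*#\d\d?,')

samples = ["Trial #1,", "Run #3,", "Spring Trial #13,", "Trial 1,", "Run #123,"]

# fullmatch() succeeds only if the whole label fits the format.
results = {label: bool(single_label.fullmatch(label)) for label in samples}

for label, ok in results.items():
    print(repr(label), ok)
```

The last two samples fail: one has no # sign, and the other has three digits where the pattern allows at most two.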

In Java and in various regex testers, using findall() with this pattern on the string:

Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,(...

...)Run #20,Run #20,Run #20,Run #20,Run #20,Run #20,Run #20

returns:

match 1: Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,

match 2: Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,

... etc.

but in Python it returns:

match 1: Run #1,

match 2: Run #2,

... etc.

I want it to return the first result (the one returned by Java and the other regex engines).

Is there a difference in the Python regex engine? How should I approach this?

My code:

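import regex

# Read the two header rows of the CSV file.
with open('Pendulum Data.csv', mode='r') as file:
    header1 = file.readline()
    header2 = file.readline()

# One capturing group for a label, then zero or more repeats of that label.
pattern1 = regex.compile(r'([\w ]*#\d\d?,)\1*', flags=regex.V0)
# findall() returns the contents of group 1 for each match.
header1Match = pattern1.findall(header1)
for x in header1Match:
    print(x)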

The for loop is only there for printing.

( : regex.findall()? findall() , , ?)

... , .


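Python's .findall returns only the captured groups if the pattern contains any, not the whole match. To get the whole match, use .finditer.

From the Python documentation for re.finditer:

Return an iterator yielding MatchObject instances over all non-overlapping matches for the RE pattern in string. The string is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result.

re.findall:

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.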
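The difference between the two functions can be seen side by side in a small sketch. It uses the standard re module and a shortened input string that I made up for illustration; the pattern is the one from the question.

```python
import re

# The pattern from the question: one capturing group, then repeats of it.
pattern = re.compile(r'([\w ]*#\d\d?,)\1*')

# Shortened sample input (an assumption, not the asker's real data).
text = "Run #1,Run #1,Run #1,Run #2,Run #2,"

# findall() returns the contents of group 1 for each match, not the whole match.
groups_only = pattern.findall(text)

# finditer() yields match objects; group() with no argument is the whole match.
whole_matches = [m.group() for m in pattern.finditer(text)]

print(groups_only)    # ['Run #1,', 'Run #2,']
print(whole_matches)  # ['Run #1,Run #1,Run #1,', 'Run #2,Run #2,']
```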

Using re.finditer:

import re
p = re.compile(r'([\w ]*#\d\d?,)\1*')
test_str = "Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3, (..."
# group() with no argument returns the whole match, not just group 1.
print([x.group() for x in p.finditer(test_str)])

Output:

['Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,Run #1,', 'Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,Run #2,', 'Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,Run #3,']
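If you would rather keep findall(), one alternative (not part of the approach above, just a sketch) is to wrap the whole pattern in an outer group. The outer group becomes group 1 and the label becomes group 2, so the backreference changes from \1 to \2. The input string and the name runs are assumptions for illustration.

```python
import re

# Group 1 is the whole run of repeats; group 2 is a single label.
pattern = re.compile(r'(([\w ]*#\d\d?,)\2*)')

# Shortened sample input (an assumption, not the asker's real data).
text = "Run #1,Run #1,Run #1,Run #2,Run #2,Run #3,"

# With two groups, findall() returns (group 1, group 2) tuples.
# The whole run is N copies of the label, so N is a ratio of lengths.
runs = [(label, len(whole) // len(label)) for whole, label in pattern.findall(text)]

print(runs)  # [('Run #1,', 3), ('Run #2,', 2), ('Run #3,', 1)]
```

This also gives the number of consecutive duplicates for each label, which is handy when the goal is to find duplicated labels.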

Note that this example uses the built-in re module.


Source: https://habr.com/ru/post/1615832/

