Grouping regular expression matches and non-matches

The script I'm working on currently runs three regular expression searches over a file. Consider the following input:

2018-01-22 04.02.03: Wurk: 98745061 (12345678)
 Replies (pos: 2) are missing/not sent on assignment: Asdf (55461)

2018-01-22 04.02.03: Wurk: 98885612 (87654321)
 Gorp: 98885612 is not registered for arrival!
 Brork: 98885612 is not registered for arrival!

2018-01-22 04.02.08: Wurk: 88855521 (885052)
 Blam: 12365479 is not registered for arrival!
 Fork: 56564123 is not registered for arrival!

2018-01-22 04.02.08: Wurk: A0885521 (885052)
 Blam: 12365479 is not registered for arrival!
 Fork: 56564123 is not registered for arrival!

Each regular expression selects the lines that carry today's date and whose value after Wurk: starts with a particular digit (9, 8 or 7), then captures the eight characters after Wurk:.

import glob
import re
import time

# Take the first log file that matches the pattern
logpath = glob.glob('path\\to\\log*.log')[0]
daysdate = time.strftime("%Y-%m-%d")
nine = []
eight = []
seven = []
no_match = []  # nothing is appended to this list yet
# The with-block closes the file once the loop is done
with open(logpath, "r") as readfile:
    for line in readfile:
        for match in re.finditer(daysdate + r'.*Wurk: (9.{7})', line):
            nine.append(match.group(1))
        for match in re.finditer(daysdate + r'.*Wurk: (8.{7})', line):
            eight.append(match.group(1))
        for match in re.finditer(daysdate + r'.*Wurk: (7.{7})', line):
            seven.append(match.group(1))
print("\nNine:\n%s\n" % ",\n".join(map(str, nine)) +
   "\nEight:\n%s\n" % ",\n".join(map(str, eight)) +
   "\nSeven:\n%s\n" % ",\n".join(map(str, seven)) +
   "\nNo matches found:\n%s\n" % ",\n".join(map(str, no_match)))

Currently outputting:

Nine:
98745061,
98885612

Eight:
88855521

Seven:

No matches found:

The problem now is how to write a regular expression that captures the eight characters after Wurk: when none of the previous expressions matched them. The new output should be:

Nine:
98745061,
98885612

Eight:
88855521

Seven:

No matches found:
A0885521

TL;DR

How do you match strings that don't meet the criteria of any of the previous regular expressions?


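You don't need a separate regex for each group. Capture every value with a single regex, then pick the target list from the value's first character, falling back to no_match: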

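import re

seven, eight, nine, no_match = [], [], [], []

# Map the first character of a value to the list that collects it
wurk_map = {'7': seven,
            '8': eight,
            '9': nine}

# text is expected to hold the contents of the log file as one string
wurks = re.findall(r'(?<=Wurk: ).{8}', text)
for wurk in wurks:
    # Any first character without an entry in wurk_map goes to no_match
    wurk_map.get(wurk[0], no_match).append(wurk)

print(seven)     # []
print(eight)     # ['88855521']
print(nine)      # ['98745061', '98885612']
print(no_match)  # ['A0885521']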
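
Note that the snippet above no longer filters by date, which the code in the question does. A minimal sketch that keeps that filter, assuming the log lines look like the sample input; the function name group_wurks and the sample string are illustrative and not part of the original post:

```python
import re

def group_wurks(lines, day):
    """Group the 8-character Wurk values found on lines dated `day`."""
    groups = {'9': [], '8': [], '7': []}
    no_match = []
    # re.escape keeps the date from being read as regex syntax
    pattern = re.compile(re.escape(day) + r'.*Wurk: (.{8})')
    for line in lines:
        m = pattern.search(line)
        if m:
            wurk = m.group(1)
            # Unknown first characters fall through to no_match
            groups.get(wurk[0], no_match).append(wurk)
    return groups, no_match

# Hypothetical sample mirroring the input shown in the question
sample = """2018-01-22 04.02.03: Wurk: 98745061 (12345678)
 Replies (pos: 2) are missing/not sent on assignment: Asdf (55461)
2018-01-22 04.02.03: Wurk: 98885612 (87654321)
 Gorp: 98885612 is not registered for arrival!
2018-01-22 04.02.08: Wurk: 88855521 (885052)
 Blam: 12365479 is not registered for arrival!
2018-01-22 04.02.08: Wurk: A0885521 (885052)
 Fork: 56564123 is not registered for arrival!
"""

groups, no_match = group_wurks(sample.splitlines(), "2018-01-22")
print(groups['9'])  # ['98745061', '98885612']
print(groups['8'])  # ['88855521']
print(groups['7'])  # []
print(no_match)     # ['A0885521']
```

For a real log, an open file object can be passed as lines and time.strftime("%Y-%m-%d") as day, as in the question.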
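
If a dedicated fourth regular expression is preferred, which is what the question literally asks for, a negated character class is one option. This is only a sketch and not part of the original answer; find_unmatched and the sample text are hypothetical names:

```python
import re

def find_unmatched(text, day):
    """Return Wurk values on lines dated `day` that do not start with 7, 8 or 9."""
    # [^789\n] accepts any first character except the three digits that
    # the other expressions handle; \n keeps the match on a single line
    pattern = re.compile(re.escape(day) + r'.*Wurk: ([^789\n].{7})')
    return pattern.findall(text)

# Hypothetical sample: only the dated lines from the question's input
sample = (
    "2018-01-22 04.02.03: Wurk: 98745061 (12345678)\n"
    "2018-01-22 04.02.08: Wurk: 88855521 (885052)\n"
    "2018-01-22 04.02.08: Wurk: A0885521 (885052)\n"
)

print(find_unmatched(sample, "2018-01-22"))  # ['A0885521']
```

The drawback is that the excluded characters have to be kept in sync with the other three expressions by hand, which a dictionary lookup avoids.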

Source: https://habr.com/ru/post/1692537/

