Using Python's re.match() regex method to get the strings before and after the underscore

I have the following code ...

tablesInDataset = ["henry_jones_12345678", "henry_jones", "henry_jones_123"]

for table in tablesInDataset:
    tableregex = re.compile("\d{8}")
    tablespec = re.match(tableregex, table)

    everythingbeforedigits = tablespec.group(0)
    digits = tablespec.group(1)

My regex should only match if the string contains 8 digits after the underscore. When it matches, I want to use .match() to get two groups via the .group() method. The first group should contain all the characters before the digits, and the second should contain the 8 digits.

Can someone please help me figure out the right regex to get the results I'm looking for using .match() and .group()?

tableregex = re.compile(r"(.*)_(\d{8})")
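
A minimal sketch of how this pattern might be dropped into the loop from the question. Capture groups are numbered from 1 (group(0) is the whole match), and match() returns None for names without the 8-digit suffix, so the result is checked before use. The `results` list is a hypothetical name added only for illustration.

```python
import re

tablesInDataset = ["henry_jones_12345678", "henry_jones", "henry_jones_123"]
tableregex = re.compile(r"(.*)_(\d{8})")

results = []  # hypothetical container, not part of the original code
for table in tablesInDataset:
    tablespec = tableregex.match(table)
    if tablespec is None:
        # No underscore followed by 8 digits: skip this table name
        continue
    everythingbeforedigits = tablespec.group(1)  # text before the last "_"
    digits = tablespec.group(2)                  # the 8 digits
    results.append((everythingbeforedigits, digits))

print(results)  # [('henry_jones', '12345678')]
```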

With named groups and findall():

>>> import re
>>> s = 'henry_jones_12345678'
>>> pat = re.compile(r'(?P<name>.*)_(?P<number>\d{8})')
>>> pat.findall(s)
[('henry_jones', '12345678')]

The same pattern with match() and groupdict():

>>> match = pat.match(s)
>>> match.groupdict()
{'name': 'henry_jones', 'number': '12345678'}

I think that this pattern should match what you need: (.*?_)(\d{8}).

The first group captures everything up to the 8 digits, including the trailing underscore. The second group captures the 8 digits.

If you do not want the underscore in the first group, use this instead: (.*?)_(\d{8})
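
To make the difference between the two patterns concrete, here is a small sketch that runs both against one of the names from the question. The variable names are illustrative assumptions, not part of the answer.

```python
import re

name = "henry_jones_12345678"

# Underscore captured inside the first group
with_underscore = re.match(r"(.*?_)(\d{8})", name)
# Underscore matched but left outside both groups
without_underscore = re.match(r"(.*?)_(\d{8})", name)

print(with_underscore.groups())     # ('henry_jones_', '12345678')
print(without_underscore.groups())  # ('henry_jones', '12345678')
```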


Here you go:

import re

tablesInDataset = ["henry_jones_12345678", "henry_jones", "henry_jones_123"]
rx = re.compile(r'^(\D+)_(\d{8})$')

matches = [match.groups()
           for item in tablesInDataset
           for match in [rx.search(item)]
           if match]
print(matches)

Better than any dot-star-soup :)
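
A rough sketch of where the anchored pattern and the dot-star pattern can disagree. The two extra table names are hypothetical inputs invented for this comparison. Which behaviour is preferable depends on what the real table names look like.

```python
import re

anchored = re.compile(r'^(\D+)_(\d{8})$')  # pattern from this answer
loose = re.compile(r'(.*)_(\d{8})')        # dot-star pattern from above

nine_digits = "henry_jones_123456789"  # hypothetical: 9 trailing digits
digit_in_name = "table2_12345678"      # hypothetical: digit inside the name

# The anchored pattern rejects a 9-digit suffix; the loose one
# silently takes the first 8 digits and ignores the ninth.
print(anchored.search(nine_digits))       # None
print(loose.match(nine_digits).groups())  # ('henry_jones', '12345678')

# \D+ refuses any digit in the name part, so the anchored pattern
# rejects this name while the loose one accepts it.
print(anchored.search(digit_in_name))       # None
print(loose.match(digit_in_name).groups())  # ('table2', '12345678')
```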


Source: https://habr.com/ru/post/1652512/

