How to return a regular expression matching the text?

Question

The answer to the JavaScript regex question "Return the part of the regex that matched" is: "No, because compilation breaks the connection between the regular expression text and the matching logic."

But Python keeps match objects, and m.groups() returns the specific group(s) that produced the match. It should be simple to store the regular expression text of each group as part of the match object and return it, but there does not seem to be a call that does so.

    import re

    pat = r"(^\d+$)|(^\w+$)|(^\W+$)"
    test = ['a', 'c3', '36d', '51', '29.5', '#$%&']
    for t in test:
        m = re.search(pat, t)
        s = (m.lastindex, m.groups()) if m else ''
        print(str(bool(m)), s)

This returns:

    True (2, (None, 'a', None))
    True (2, (None, 'c3', None))
    True (2, (None, '36d', None))
    True (1, ('51', None, None))
    False
    True (3, (None, None, '#$%&'))

The compiler obviously knows that there are three groups in this pattern. Is there a way to extract the subpattern of each group in a regular expression, with something like:

    >>> print(m.regex_group_text)
    ('^\d+$', '^\w+$', '^\W+$')

Yes, you could write your own pattern parser, for example one that splits on '|' for this particular pattern. But it would be much simpler and more reliable to use the re compiler's own understanding of the text in each group.

+5

python python-3.x regex

Dave Mar 11 '16 at 22:27

3 answers

This may or may not be useful, depending on the problem you are actually trying to solve... But Python allows you to name groups:

    r = re.compile(r'(?P<int>^\d+$)|(?P<word>^\w+$)')

From there, when you have a match, you can check groupdict to see which groups are present:

    r.match('foo').groupdict()  # {'int': None, 'word': 'foo'}
    r.match('10').groupdict()   # {'int': '10', 'word': None}

Of course, this does not tell you the exact regular expression associated with the match. You will need to track this yourself based on the group name.
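One way to do that bookkeeping, as a minimal sketch: keep the sub-patterns in a dict keyed by group name, build the combined pattern from that dict, and use the match object's lastgroup attribute to look the text up again. The names SUBPATTERNS, COMBINED and matched_subpattern are made up for this illustration, and the approach assumes a flat alternation of named groups like the one in the question.

```python
import re

# Hypothetical bookkeeping table: group name -> sub-pattern text.
SUBPATTERNS = {
    'int': r'^\d+$',
    'word': r'^\w+$',
    'other': r'^\W+$',
}

# Build one alternation of named groups from the table.
COMBINED = re.compile(
    '|'.join('(?P<{}>{})'.format(name, sub) for name, sub in SUBPATTERNS.items())
)


def matched_subpattern(text):
    """Return (group name, sub-pattern text) for the branch that matched, or None."""
    m = COMBINED.search(text)
    if m is None:
        return None
    # lastgroup is the name of the last matched capturing group; in a flat
    # alternation like this one, that is the branch that produced the match.
    return m.lastgroup, SUBPATTERNS[m.lastgroup]


for t in ['a', 'c3', '36d', '51', '29.5', '#$%&']:
    print(t, matched_subpattern(t))
```

Because the regular expression is generated from the table, the group names and the sub-pattern texts cannot drift apart.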

If you really want to go beyond this, you probably need something more sophisticated than plain regular expression parsing. In that case, I would suggest something like pyparsing. Don't let the old-school style of the website (or the lack of a PEP 8 compliant API) fool you: the library is quite powerful once you get used to it.

+4

mgilson Mar 11 '16 at 10:38

It is up to you to keep track of which regular expressions you feed to re.search. Something like:

    import re

    patts = {
        'a': r'\d+',
        'b': r'^\w+',
        'c': r'\W+',
    }
    # Wrap the alternation in (?:...) so that ^ and $ apply to every branch.
    pat = '^(?:' + '|'.join('({})'.format(x) for x in patts.values()) + ')$'

    test = ['a', 'c3', '36d', '51', '29.5', '#$%&']
    for t in test:
        m = re.search(pat, t)
        if m:
            for g in m.groups():
                for key, regex in patts.items():
                    # fullmatch: skip sub-patterns that only match part of g
                    if g and re.fullmatch(regex, g):
                        print("t={} matched regex={} ({})".format(t, key, regex))
                        break

+2

user2926055 Mar 11 '16 at 10:52

jbndlr · Accepted Answer · 2016-03-11T22:49:56+0000

If the indexes are not enough and you absolutely need to know the exact part of the regular expression, there is probably no way around parsing the expression's groups yourself.

This is not a big deal, though: you can simply count the opening and closing parentheses and record their indices:

    def locateBraces(inp):
        bracePositions = []  # (open index, close index) of each group
        braceStack = []      # indices of parentheses that are still open
        depth = 0
        for i in range(len(inp)):
            if inp[i] == '(':
                braceStack.append(i)
                depth += 1
            if inp[i] == ')':
                depth -= 1
                # Check before popping, so an unmatched ')' gives a clear error.
                if depth < 0:
                    raise SyntaxError('Too many closing braces.')
                bracePositions.append((braceStack.pop(), i))
        if depth != 0:
            raise SyntaxError('Too many opening braces.')
        return bracePositions

Edit: This naive implementation only counts opening and closing parentheses. However, a regular expression may contain escaped parentheses, e.g. \(, which this method treats as regular group-defining parentheses. You may want to adapt it to skip parentheses that have an odd number of backslashes right in front of them. I leave this as an exercise for you ;)
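One possible take on that exercise, as a rough sketch rather than a complete regex parser: walk the pattern once, skip every backslash-escaped character, ignore parentheses inside [...] character classes, and record only capturing groups. The function name locate_groups is made up here. It does not handle (?#...) comments or re.VERBOSE comments, and it sorts the spans by opening parenthesis so that their order follows group numbering.

```python
import re


def locate_groups(pattern):
    """Return (start, end) index pairs of capturing groups, in group-number order."""
    spans = []        # (open index, close index) of capturing groups
    stack = []        # (open index, is_capturing) for each unclosed parenthesis
    in_class = False  # True while inside a [...] character class
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\':
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if ch == ']':
                in_class = False
        elif ch == '[':
            in_class = True
            # A ']' directly after '[' or '[^' is a literal, not the end of the class.
            if pattern[i + 1:i + 2] == '^':
                i += 1
            if pattern[i + 1:i + 2] == ']':
                i += 1
        elif ch == '(':
            rest = pattern[i + 1:i + 4]
            # '(?...' is an extension; of those, only '(?P<name>...' captures.
            capturing = not rest.startswith('?') or rest.startswith('?P<')
            stack.append((i, capturing))
        elif ch == ')':
            if not stack:
                raise ValueError('Too many closing parentheses.')
            start, capturing = stack.pop()
            if capturing:
                spans.append((start, i))
        i += 1
    if stack:
        raise ValueError('Too many opening parentheses.')
    # Groups are numbered by their opening parenthesis, so sort by start index.
    spans.sort()
    return spans


pat = r'(^\d+$)|(^\(\w+\)$)|(^[()]+$)'
group_text = [pat[s:e + 1] for s, e in locate_groups(pat)]
# Sanity check against the number of capturing groups the re compiler counts.
assert len(group_text) == re.compile(pat).groups

m = re.search(pat, '(abc)')
print(group_text[m.lastindex - 1])
```

Comparing the number of spans with the compiled pattern's groups attribute is a cheap way to notice when the hand-written scan and the real compiler disagree.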

Using this function, your example will look like this:

    import re

    pat = r"(^\d+$)|(^\w+$)|(^\W+$)"
    bloc = locateBraces(pat)
    test = ['a', 'c3', '36d', '51', '29.5', '#$%&']
    for t in test:
        m = re.search(pat, t)
        print(str(bool(m)), end='')
        if m:
            # lastindex is 1-based; bloc holds (open, close) index pairs
            h = bloc[m.lastindex - 1]
            print(' %s' % (pat[h[0]:h[1] + 1]))
        else:
            print()

Which returns:

    True (^\w+$)
    True (^\w+$)
    True (^\w+$)
    True (^\d+$)
    False
    True (^\W+$)

Edit: To get a list of your groups, a simple list comprehension will of course do:

    gtxt = [pat[b[0]:b[1] + 1] for b in bloc]
