How to handle nested parentheses with regular expressions?

Question

I came up with a regex that splits text into three categories:

in parentheses
in brackets
neither.

Like this:

\[.+?\]|\(.+?\)|[\w+ ?]+

I only care about the outermost delimiters. So, given a(b[c]d)e, the split should be:

 a || (b[c]d) || e

It works fine with brackets inside parentheses or parentheses inside brackets, but it breaks down when there are brackets inside brackets or parentheses inside parentheses. For example, a[b[c]d]e is split as

 a || [b[c] || d || ] || e.
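The behavior described above can be reproduced with a few lines of Python. This is only a sketch: the variable names are illustrative and not from the original post, and it assumes `re.findall` is used for the split. Note that `findall` silently skips the stray `]`, because none of the three alternatives can match it.

```python
import re

# The pattern from the question, unchanged.
pattern = re.compile(r"\[.+?\]|\(.+?\)|[\w+ ?]+")

# Brackets inside parentheses: the outer group is kept whole.
mixed = pattern.findall("a(b[c]d)e")
print(mixed)      # ['a', '(b[c]d)', 'e']

# Brackets inside brackets: the lazy .+? stops at the first ']'.
same_kind = pattern.findall("a[b[c]d]e")
print(same_kind)  # ['a', '[b[c]', 'd', 'e']
```

The non-greedy `.+?` has no notion of depth, so it always closes the group at the first closing delimiter of the same kind.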

Is there a way to handle this with a regular expression without resorting to using code to count the number of open / closed parentheses? Thanks!

+4

python regex

Minas abovyan Jun 29 '13 at 20:41

2 answers

Well, once you give up the idea that parsing nested expressions has to work at unlimited depth, you can use regular expressions just fine by specifying a maximum depth in advance. Here is how:

    import re

    def nested_matcher(n):
        # Poor man's matched-paren scanning; gives up after n+1 levels.
        # Matches any string with balanced parens or brackets inside; add
        # the outer parens yourself if needed. Non-greedy. Does not
        # distinguish parens and brackets, as that would cause the
        # expression to grow exponentially rather than linearly in size.
        return "[^][()]*?(?:[([]"*n + "[^][()]*?" + "[])][^][()]*?)*?"*n

    # Either a run of non-delimiter characters, or one balanced group.
    p = re.compile('[^][()]+|[([]' + nested_matcher(10) + '[])]')

    print(p.findall('a(b[c]d)e'))
    print(p.findall('a[b[c]d]e'))
    print(p.findall('[hello [world]] abc [123] [xyz jkl]'))

This outputs:

    ['a', '(b[c]d)', 'e']
    ['a', '[b[c]d]', 'e']
    ['[hello [world]]', ' abc ', '[123]', ' ', '[xyz jkl]']

0

user3489112 Apr 2 '14 at 12:43

arshajii · Accepted Answer · 2013-06-29T20:44:02+0000

Standard¹ regular expressions are not powerful enough to match nested structures. The best approach is probably to traverse the string and keep track of opening / closing delimiter pairs.
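A minimal sketch of that traversal in Python is shown below. The function name `split_top_level` and its error handling are assumptions made for illustration, not something from the original answer. It keeps a stack of expected closing delimiters and cuts the string whenever the stack becomes empty.

```python
def split_top_level(text):
    """Split text into top-level chunks: balanced groups and the text between them."""
    pairs = {"(": ")", "[": "]"}
    closers = set(pairs.values())
    parts = []
    stack = []   # closing delimiters we still expect, innermost last
    start = 0    # index where the current chunk begins

    for i, ch in enumerate(text):
        if ch in pairs:
            if not stack and i > start:
                # Flush the plain text that precedes a new top-level group.
                parts.append(text[start:i])
                start = i
            stack.append(pairs[ch])
        elif ch in closers:
            if not stack or stack[-1] != ch:
                raise ValueError(f"unbalanced {ch!r} at index {i}")
            stack.pop()
            if not stack:
                # A top-level group just closed; emit it whole.
                parts.append(text[start:i + 1])
                start = i + 1

    if stack:
        raise ValueError("unclosed delimiter")
    if start < len(text):
        parts.append(text[start:])
    return parts


print(split_top_level("a(b[c]d)e"))  # ['a', '(b[c]d)', 'e']
print(split_top_level("a[b[c]d]e"))  # ['a', '[b[c]d]', 'e']
```

Unlike a fixed-depth regex, this works at any nesting depth and also reports mismatched or unclosed delimiters, at the cost of a few lines of explicit bookkeeping.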

¹ I said standard, but not all regex engines are actually standard. You could do this in Perl, for instance, by using recursive regular expressions. For example:

    $str = "[hello [world]] abc [123] [xyz jkl]";

    my @matches = $str =~ /[^\[\]\s]+ | \[ (?: (?R) | [^\[\]]+ )+ \] /gx;

    foreach (@matches) {
        print "$_\n";
    }

    [hello [world]]
    abc
    [123]
    [xyz jkl]

EDIT: I see that you are using Python; check out pyparsing.
