Is there a regex-like tool capable of parsing matching symbols?

This regex

 /\(.*\)/

does not match the matching closing bracket, but the last closing bracket on the line. Is there a regex extension, or something similar with an appropriate syntax, that allows for this? For instance:

 there are (many (things (on) the)) box (except (carrots (and apples))) 

/OPEN(.*CLOSE)/ should match (many (things (on) the))

There can be arbitrarily many levels of parentheses.

+1
3 answers

If you have only one level of parentheses, then there are two possibilities.

Option 1: use ungreedy (lazy) repetition:

 /\(.*?\)/ 

This stops at the first ) it encounters.

Option 2: use a negated character class:

 /\([^)]*\)/ 

This can only repeat characters that are not ), so it can never go past the first closing parenthesis. This option is usually preferred for performance reasons. It is also easier to extend to allow for escaped parentheses, so that you can match the complete string (some\)thing) instead of stopping at (some\) and throwing away thing). But this is probably rarely necessary.
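As a rough illustration of that extension, here is a minimal JavaScript sketch. It assumes a backslash escapes the character that follows it; the names plain and escapeAware are only illustrative and do not come from the answer.

```javascript
// Negated character class: stops at the first ")" whether or not it is escaped.
const plain = /\([^)]*\)/;

// Escape-aware variant (assumption: "\" escapes the following character).
// Each step consumes either an escaped character or any character
// that is neither ")" nor "\".
const escapeAware = /\((?:\\.|[^)\\])*\)/;

// String.raw keeps the backslash literally in the test string.
const input = String.raw`(some\)thing)`;

const plainMatch = input.match(plain)[0];
const escapeAwareMatch = input.match(escapeAware)[0];

console.log(plainMatch);       // (some\)
console.log(escapeAwareMatch); // (some\)thing)
```

The only change is inside the repetition: the alternative for an escaped character is tried first, so an escaped ) is consumed as content rather than ending the match.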

However, if you want nested structures, this is generally too complicated for regular expressions (although some flavors, such as PCRE, support recursive patterns). In that case, you should just walk through the string and count parentheses to keep track of the current nesting level.
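The counting approach could look roughly like the following JavaScript sketch. The helper name outermostGroups is hypothetical, and the handling of unbalanced input (a stray ) is ignored, an unclosed ( yields no group) is an assumption rather than something the answer specifies.

```javascript
// Returns every outermost balanced "(...)" group found in `text`.
// Hypothetical helper: a stray ")" is ignored, an unclosed "(" yields no group.
function outermostGroups(text) {
  const groups = [];
  let depth = 0;   // current nesting level
  let start = -1;  // index of the "(" that opened the current outermost group

  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (ch === "(") {
      if (depth === 0) start = i; // entering a new outermost group
      depth++;
    } else if (ch === ")" && depth > 0) {
      depth--;
      if (depth === 0) groups.push(text.slice(start, i + 1)); // group closed
    }
  }
  return groups;
}

const sample = "there are (many (things (on) the)) box (except (carrots (and apples)))";
console.log(outermostGroups(sample));
// [ '(many (things (on) the))', '(except (carrots (and apples)))' ]
```

A single pass with one counter is enough here because only the outermost groups are reported.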

Just as a side note on these recursive patterns: in PCRE, (?R) simply stands for the whole pattern, so inserting it somewhere makes the whole pattern recursive. But then the content of every pair of parentheses must have the same structure as the whole match. Also, it is not really possible to do meaningful one-step replacements with this, or to use capturing groups on multiple nested levels. All in all, you are better off not using regular expressions for nested structures.

Update: Since you seem to be set on a regex solution, here is how you would match your example using PCRE (example implementation in PHP):

 $str = 'there are (many (things (on) the)) box (except (carrots (and apples)))';
 preg_match_all('/\([^()]*(?:(?R)[^()]*)*\)/', $str, $matches);
 print_r($matches);

This outputs:

 Array
 (
     [0] => Array
         (
             [0] => (many (things (on) the))
             [1] => (except (carrots (and apples)))
         )

 )

What the pattern does:

 \(        # opening bracket
 [^()]*    # arbitrarily many non-bracket characters
 (?:       # start a non-capturing group for later repetition
 (?R)      # recursion! (match any nested brackets)
 [^()]*    # arbitrarily many non-bracket characters
 )*        # close the group and repeat it arbitrarily many times
 \)        # closing bracket

This allows for arbitrarily many nested levels, and also for arbitrarily many parallel (sibling) groups.

Note that it is not possible to get all nested levels as separate capturing groups. You will always get only the outermost (largest) group. Also, a recursive replacement is not possible this way.
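If every nested level is needed separately, one alternative outside of regular expressions is a stack-based scan. The following JavaScript sketch is only an illustration of that idea, not part of the PCRE solution; the helper name allGroups and the shape of its result objects are assumptions.

```javascript
// Collects every balanced "(...)" group together with its nesting depth.
// Hypothetical helper; a ")" without a matching "(" is ignored.
function allGroups(text) {
  const stack = []; // indices of the "(" characters that are still open
  const found = [];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === "(") {
      stack.push(i);
    } else if (text[i] === ")" && stack.length > 0) {
      const open = stack.pop();
      // After the pop, stack.length is the number of enclosing groups.
      found.push({ depth: stack.length + 1, text: text.slice(open, i + 1) });
    }
  }
  return found;
}

const nested = "(many (things (on) the))";
for (const g of allGroups(nested)) {
  console.log(g.depth, g.text);
}
// 3 (on)
// 2 (things (on) the)
// 1 (many (things (on) the))
```

Groups are reported in the order in which they close, so inner groups appear before the groups that contain them.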

+7

Classical regular expressions are not powerful enough to find matching brackets, because brackets are nested structures. There is a simple algorithm for finding matching brackets, though, which is described in this answer.

If you are simply trying to find the first closing parenthesis in an expression, you should use a non-greedy match in your regular expression. In this case, the non-greedy version of your regular expression is the following:

 /\(.*?\)/ 
+2

For a string containing nested matching brackets, you can either match the innermost sets with this (non-recursive JavaScript) regular expression:

 var re = /\([^()]*\)/g; 

Or you can match the outermost sets with this (recursive PHP) regular expression:

 $re = '/\((?:[^()]++|(?R))*\)/'; 

But you cannot easily match the sets of matching parentheses that lie between the innermost and the outermost.

Please also note that the (naive and common) expression /\(.*?\)/ will always fail to match correctly on nested input: it matches neither the innermost nor the outermost sets.
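A quick way to see the difference is to run the lazy pattern and the innermost pattern on a small nested string. This JavaScript sketch uses a made-up input, and the variable names are only illustrative.

```javascript
const nestedInput = "(a (b) c)";

// Lazy quantifier: starts at the first "(" and stops at the first ")",
// which belongs to the inner group.
const lazyMatch = nestedInput.match(/\(.*?\)/)[0];

// Innermost pattern: no parentheses are allowed between the delimiters.
const innermostMatch = nestedInput.match(/\([^()]*\)/)[0];

console.log(lazyMatch);      // (a (b)
console.log(innermostMatch); // (b)
```

The lazy result is unbalanced, which is why it corresponds to neither the innermost nor the outermost group.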

+1

Source: https://habr.com/ru/post/947034/

