Recursive regular expression with text in front of nested brackets

I have the following text

$text = 'This is a test to see if something(try_(this(once))) works'; 

I need to extract something(try_(this(once))) from $text with a regular expression. The problem is this:

  • The nesting depth is not constant; my text may contain

    • something(try_(this(once))) or
    • something(try_this(once)) or
    • something(try_thisonce)

I have tried several regular expressions found on this site, but cannot make them work. Here is the closest I came:

EXAMPLE 1:

 $text = 'This is a test to see if something(try_(this(once))) works';
 $output = preg_match_all('/(\(([^()]|(?R))*\))/', $text, $out);
 ?><pre><?php var_dump($out[0]); ?></pre><?php 

Displays

 array(1) {
   [0]=>
   string(18) "(try_(this(once)))"
 }

No matter where I add the word something (for example, '/something(\(([^()]|(?R))*\))/' and '/(\something(([^()]|(?R))*\))/'), I get an empty array or NULL.

EXAMPLE 2:

 $text2 = 'This is a test to see if something(try_(this(once))) works';
 $output2 = preg_match_all('/something\((.*?)\)/', $text2, $out2);
 ?><pre><?php var_dump($out2[0]); ?></pre><?php 

With this code, the match does include the word something,

 array(1) {
   [0]=>
   string(25) "something(try_(this(once)"
 }

but the match ends at the first closing ), which is expected, since the expression is not recursive.

How do I recursively match and return the bracketed text together with the word something in front of the first opening ( ? And, if that is possible, how do I handle the case where there may or may not be a space after the word something, for example:

  • something(try_(this(once))) or
  • something (try_(this(once)))
3 answers

(?R) is not a magic spell for building a pattern that can handle balanced things (like brackets). (?R) is the same as (?0): it is an alias for "capture group zero", in other words, the whole pattern.

In the same way, you can use (?1), (?2), etc. as aliases for the subpatterns in groups 1, 2, etc.

Note that, with the exception of (?0) and (?R), which are obviously always inside their own subpattern since that subpattern is the whole pattern, (?1), (?2), etc. cause recursion only when they appear inside their own groups. Anywhere else, they simply save you from rewriting part of the pattern.

something\((?:[^()]|(?R))*\) does not work, because it requires every opening bracket, nested or not, to be preceded by something in your string.

Conclusion: you cannot use (?R) here, and you need to create a capture group that handles only the nested parentheses:
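
The difference between the two uses of (?1) can be sketched in PHP as follows. This is only an illustration: the variable names and test strings are made up for the example and do not come from the question.

```php
<?php
// Outside group 1, (?1) is a plain subroutine call: it re-runs the
// subpattern \d{2} at the current position, with no recursion involved.
$reuse = '/^(\d{2})-(?1)-(?1)$/';
$reuseOk = preg_match($reuse, '12-34-56');

// Inside group 1, (?1) makes the group call itself, which is what
// lets it follow parentheses nested to an arbitrary depth.
$nested = '/^(\((?:[^()]|(?1))*\))$/';
$balancedOk   = preg_match($nested, '(a(b)c)');
$unbalancedOk = preg_match($nested, '(a(b)c');

var_dump($reuseOk, $balancedOk, $unbalancedOk); // int(1) int(1) int(0)
```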
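
A quick way to see this, assuming a few made-up test strings: the pattern succeeds when there is no nesting, or when every nested bracket happens to be preceded by something, and fails on the input from the question.

```php
<?php
// (?R) re-runs the WHOLE pattern, including the literal "something".
$pattern = '/something\((?:[^()]|(?R))*\)/';

// No nested bracket: (?R) is never needed, so this matches.
$flat = preg_match($pattern, 'something(try_thisonce)', $m1);

// Nested brackets without "something" in front of them: no match.
$nested = preg_match($pattern, 'something(try_(this(once)))', $m2);

// Nested bracket that IS preceded by "something": matches again.
$prefixed = preg_match($pattern, 'something(a something(b))', $m3);

var_dump($flat, $nested, $prefixed); // int(1) int(0) int(1)
```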

 (\((?:[^()]|(?1))*\)) 

which can be written in a more efficient way:

 (\([^()]*(?:(?1)[^()]*)*+\)) 

To finish, add something in front of the group, where it is no longer part of the recursion:

 something(\([^()]*(?:(?1)[^()]*)*+\)) 
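
A minimal PHP sketch of how this pattern might be used on the inputs from the question. The helper name extractSomething is hypothetical, and the \s* between something and the group is an assumption added here to cover the optional space the question asks about.

```php
<?php
// Hypothetical helper: returns "something" plus its balanced bracket
// block, or null when no such block is found in the text.
function extractSomething(string $text): ?string
{
    // \s* is an assumption: it allows optional whitespace before "(".
    // Group 1 recurses into itself through (?1); "something" stays
    // outside the group, so it is not repeated on each nesting level.
    $pattern = '/something\s*(\([^()]*(?:(?1)[^()]*)*+\))/';

    return preg_match($pattern, $text, $m) === 1 ? $m[0] : null;
}

$text = 'This is a test to see if something(try_(this(once))) works';
var_dump(extractSomething($text));
var_dump(extractSomething('something (try_(this(once)))'));
var_dump(extractSomething('something(try_this(once))'));
var_dump(extractSomething('something(try_thisonce)'));
```

Each call is expected to return the whole something(...) block, with the space preserved in the second case.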

Note that if something is itself a subpattern with an undetermined number of capture groups, it is more convenient to refer to the last opened capture group with a relative reference, for example:

 som(eth)ing(\([^()]*(?:(?-1)[^()]*)*+\)) 
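
As a small illustration in PHP (the sample string is made up): here the bracket group is group 2, but (?-1) still recurses into it, because it refers to the most recently opened group rather than to a fixed number.

```php
<?php
// Group 1 is (eth); the bracket group is therefore group 2.
// (?-1) sits inside group 2 and refers back to it relatively.
$pattern = '/som(eth)ing(\([^()]*(?:(?-1)[^()]*)*+\))/';

preg_match($pattern, 'see something(try_(this(once))) here', $m);

var_dump($m[0]); // whole match: something(try_(this(once)))
var_dump($m[1]); // group 1: eth
var_dump($m[2]); // group 2: (try_(this(once)))
```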
+3
source
 [^() ]*(\((?:[^()]|(?1))*\)) 

You need to use (?1). (?1) recurses into the 1st subpattern. See the demo.

https://regex101.com/r/cJ6zQ3/4
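
A short PHP check of how this pattern behaves, using made-up variations of the question's text. Because the prefix is [^() ]* rather than the literal something, it picks up whatever word touches the opening bracket, and it drops the word when a space separates the two.

```php
<?php
$pattern = '/[^() ]*(\((?:[^()]|(?1))*\))/';

// Word directly in front of the bracket: the word is included.
preg_match($pattern, 'This is a test to see if something(try_(this(once))) works', $a);

// Any other word works the same way, not only "something".
preg_match($pattern, 'foo(bar)', $b);

// A space before "(" is excluded by [^() ]*, so only the brackets match.
preg_match($pattern, 'a test of something (try_(this(once))) here', $c);

var_dump($a[0], $b[0], $c[0]);
```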

+3
source

This is a pretty literal way of matching the desired text and handling nested parentheses:

 something\s*\(.*?\)+ 

https://regex101.com/r/cN6nQ9/1
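
A small PHP check of this pattern, with test strings assumed for illustration. It matches up to the first ) and then takes every ) that follows immediately, so it covers the inputs from the question; it does not count brackets, which shows when text follows an inner closing bracket.

```php
<?php
$pattern = '/something\s*\(.*?\)+/';

$inputs = [
    'something(try_(this(once)))',
    'something (try_(this(once)))',
    'something(try_this(once))',
    'something(try_thisonce)',
    'something(a(b)c)', // text after an inner ")": match stops early
];

$results = [];
foreach ($inputs as $input) {
    preg_match($pattern, $input, $m);
    $results[] = $m[0];
}

var_dump($results);
```

The first four inputs are matched in full; for the last one the match is something(a(b) only.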

+1
source

Source: https://habr.com/ru/post/1232989/

