Recursive regex doesn't work

The line I'm working on is as follows:

abc {def ghi {jkl mno} pqr stv} xy z 

I need to replace the curly braces with tags, so the result should look like this:

 abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z 

I've tried

 '#(?<!\pL)\{ ( ([^{}]+) | (?R) )* \}(?!\pL)#xu' 

but I just get <tag>xy z</tag>. Please help: what am I doing wrong?

2 answers

How about two steps:

s!{!<tag>!g;
s!}!</tag>!g;

(Perl syntax; translate to whatever language you are using)

or maybe this:

1 while s!{([^{}]*)}!<tag>$1</tag>!g;


Nested structures are, by definition, too complex for regular expressions (yes, PCRE supports recursion, but that does not help with this replacement problem). Using regular expressions, you have two options. First, you can simply replace every opening brace with an opening tag and every closing brace with a closing tag. This, however, also converts unmatched braces:

 $str = preg_replace('/\{/', '<tag>', $str);
 $str = preg_replace('/\}/', '</tag>', $str);
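To make the caveat about unmatched braces concrete, here is a minimal sketch of the same two replacements applied to a balanced and an unbalanced string. The helper name naiveTag and the second sample input are assumptions made for illustration only.

```php
<?php
// Hypothetical helper: replaces every brace, whether or not it has a partner.
function naiveTag(string $str): string
{
    // Passing arrays applies the two replacements one after the other.
    return preg_replace(['/\{/', '/\}/'], ['<tag>', '</tag>'], $str);
}

$balanced   = naiveTag('abc {def ghi {jkl mno} pqr stv} xy z');
$unbalanced = naiveTag('abc {def ghi');

echo $balanced, PHP_EOL;   // abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z
echo $unbalanced, PHP_EOL; // abc <tag>def ghi
```

The second call shows the drawback: the lone opening brace still becomes an opening tag that is never closed.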

The other option is to replace only matching { and } pairs, but you need to do this repeatedly, because a single call to preg_replace cannot replace multiple levels of nesting:

 do {
     // Replace the innermost {...} pairs; repeat until nothing is left to replace.
     $str = preg_replace('/\{([^{]*?)\}/', '<tag>$1</tag>', $str, -1, $count);
 } while ($count > 0);
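As a rough sketch of how the loop behaves, it can be wrapped in a function and run on the string from the question. The function name tagMatchedBraces and the second input are assumptions for illustration; the pattern is the one from the loop.

```php
<?php
// Hypothetical wrapper around the loop; the name is an assumption.
function tagMatchedBraces(string $str): string
{
    do {
        // Matches a "{", then lazily anything but "{" up to the first "}",
        // so each pass only replaces pairs that contain no other braces.
        $str = preg_replace('/\{([^{]*?)\}/', '<tag>$1</tag>', $str, -1, $count);
    } while ($count > 0);

    return $str;
}

$nested    = tagMatchedBraces('abc {def ghi {jkl mno} pqr stv} xy z');
$unmatched = tagMatchedBraces('a {b} c } d');

echo $nested, PHP_EOL;    // abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z
echo $unmatched, PHP_EOL; // a <tag>b</tag> c } d
```

On the question's input the first pass converts the inner pair, the second pass converts the outer pair, and the third pass finds nothing and ends the loop. A brace without a partner is left as it is.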

EDIT: Although PCRE supports recursion with (?R), it most likely will not help with the replacement. The reason is that when a capture group is repeated, its backreference holds only the last capture (e.g., when matching /(a|b)+/ against aaaab, $1 will contain b). I believe the same holds for recursion. This is why you could only replace the innermost match: it is the last match of the capture group in the recursion. Likewise, you could not capture { and } with recursion and replace them, because they too can be matched an arbitrary number of times, and only the last match would be replaced.
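The behaviour of a repeated capture group is easy to check. This small sketch covers only the /(a|b)+/ example, not recursion; the variable names and the '[$1]' replacement are assumptions for illustration.

```php
<?php
// A repeated group keeps only the text of its final iteration.
preg_match('/(a|b)+/', 'aaaab', $m);
$whole = $m[0]; // the full match: 'aaaab'
$last  = $m[1]; // the group's last iteration: 'b'

// The same applies in a replacement: $1 refers to the last iteration only.
$replaced = preg_replace('/(a|b)+/', '[$1]', 'aaaab');

echo $whole, ' ', $last, ' ', $replaced, PHP_EOL; // aaaab b [b]
```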

Simply matching the correctly nested syntax and then replacing the innermost or outermost pair of braces will not help either (with a single call to preg_replace), because matches never overlap (so if three nested pairs are found, the inner two are not considered for further matches).
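The non-overlapping behaviour can be seen with a recursive pattern and preg_match_all. The sketch below also shows one possible workaround that is not part of the answer above: preg_replace_callback, with the callback running the same replacement again on the text inside each match. The pattern and the function name tagNested are assumptions for illustration.

```php
<?php
// Recursive pattern: a balanced {...} group, with the inner text in group 1.
$pattern = '/\{((?:[^{}]++|(?R))*)\}/';
$input   = 'abc {def ghi {jkl mno} pqr stv} xy z';

// Only the outermost group is reported; the inner pair is consumed by it.
preg_match_all($pattern, $input, $matches);
$found = $matches[0];

// Hypothetical workaround: rerun the replacement on each match's inner text.
function tagNested(string $str, string $pattern): string
{
    return preg_replace_callback(
        $pattern,
        function (array $m) use ($pattern): string {
            return '<tag>' . tagNested($m[1], $pattern) . '</tag>';
        },
        $str
    );
}

$tagged = tagNested($input, $pattern);

print_r($found);       // a single element: {def ghi {jkl mno} pqr stv}
echo $tagged, PHP_EOL; // abc <tag>def ghi <tag>jkl mno</tag> pqr stv</tag> xy z
```

This still needs one preg_replace_callback call per nesting level, which is consistent with the point above: a single pass cannot rewrite all levels, so the inner text has to be processed again.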


Source: https://habr.com/ru/post/1439443/

