Get the outermost "pair" when nested

I use the regex <@(.+?)@> to match patterns such as:

 <@set:template default.spt @> 

It works fine, but I have run into situations where I need to nest one tag inside another, for example:

 <@set:template <@get:oldtemplate @> @> 

Instead of getting the outer pair (<@ and @>), I get the following:

 <@set:template <@get:oldtemplate @> 

I do not want the match to stop at the child's closing @>; I want the outermost pair in every nested situation. How can I fix my regular expression so that it does this? I believe I could do it if I knew how to require that every <@ inside the parent has a matching @>, but I do not know how to express that.
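The behaviour described in the question can be reproduced with a short Python sketch. The variable names here are illustrative and not part of the original post; the pattern and the input string are taken from the question.

```python
import re

# The pattern from the question. ".+?" is non-greedy, so the match
# ends at the first "@>" it can reach, which belongs to the inner tag.
PATTERN = re.compile(r"<@(.+?)@>")

text = "<@set:template <@get:oldtemplate @> @>"
first_match = PATTERN.search(text).group(0)
print(first_match)
```

The printed match ends at the inner tag's @>, so the final " @>" of the outer pair is left out.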

2 answers

What you are describing is a non-regular language. It cannot be parsed with a regular expression.

Well, if you are willing to limit the level of nesting, technically you can do this with a regular expression. But it will be ugly.

Here is how to parse your tags for a few (increasing) maximum nesting depths, provided you can forbid @ inside your tags:

 no nesting: <@[^@]+@>
 up to 1:    <@[^@]+(<@[^@]+@>)?[^@]*@>
 up to 2:    <@[^@]+(<@[^@]+(<@[^@]+@>)?[^@]*@>)?[^@]*@>
 up to 3:    <@[^@]+(<@[^@]+(<@[^@]+(<@[^@]+@>)?[^@]*@>)?[^@]*@>)?[^@]*@>
 ...

If you cannot forbid a lone @ in your tags, you will need to replace each instance of [^@] with this: (?:[^<@]|<[^@]|@[^>]).

Think about that, and then think about expanding your regex to parse up to 10 levels of nesting.

Here I will do it for you:

 <@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[ ^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|< [^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@ [^>])+(<@(?:[^<@]|<[^@]|@[^>]) +@ >)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>] )*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@ >)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)? (?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@> 
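Rather than typing such a pattern by hand, it could be generated. The following Python sketch builds the same kind of expression for any maximum depth; the names ATOM and nested_pattern are hypothetical and only serve this illustration.

```python
import re

# The replacement for [^@] given in the answer: a character other than
# "<" and "@", or a "<" not followed by "@", or a "@" not followed by ">"
# (the last two alternatives consume two characters).
ATOM = r"(?:[^<@]|<[^@]|@[^>])"

def nested_pattern(max_depth: int) -> str:
    """Build the regex for tags nested up to max_depth levels deep."""
    pattern = f"<@{ATOM}+@>"  # depth 0: no nesting allowed
    for _ in range(max_depth):
        # Wrap the previous pattern as an optional child of a new outer pair.
        pattern = f"<@{ATOM}+({pattern})?{ATOM}*@>"
    return pattern

text = "<@set:template <@get:oldtemplate @> @>"
outer = re.search(nested_pattern(1), text).group(0)  # whole outer tag
inner = re.search(nested_pattern(0), text).group(0)  # only the inner tag
```

With a depth of 1 the search returns the whole outer tag; with a depth of 0 the outer pair cannot match, so the search falls through to the inner tag. The depth limit stays hard-coded either way, which is the weakness the answer is pointing at.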

I hope my answer shows that a regular expression is not the right tool for parsing a nested language. The traditional combination of a lexer (tokenizer) and a parser will do a much better job, be significantly faster, and handle unlimited nesting depth.
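As a rough illustration of the non-regex approach, here is a minimal Python sketch that scans the text once and tracks nesting depth with a counter. It is not a full lexer and parser; the function name outermost_tags is hypothetical, and unbalanced input is simply ignored rather than reported as an error.

```python
from typing import List

def outermost_tags(text: str) -> List[str]:
    """Return every top-level <@ ... @> span, nested children included."""
    spans = []
    depth = 0   # how many "<@" are currently open
    start = 0   # index where the current top-level tag began
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair == "<@":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "@>" and depth > 0:
            depth -= 1
            i += 2
            if depth == 0:
                # The outermost pair just closed: record the whole span.
                spans.append(text[start:i])
        else:
            i += 1
    return spans

tags = outermost_tags("a <@x @> b <@y <@z @> @>")
```

Because the counter can grow without bound, the same loop handles any nesting depth in a single pass over the input.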


I don't think you can do this with a regex; see the answer to this question, which asks a similar thing. Regexes are not powerful enough to handle arbitrary levels of nesting. If you only have 2 levels of nesting, it should be possible, but regular expressions are probably not the best tool for the job.


Source: https://habr.com/ru/post/1481254/

