What you are describing is a "non-regular language." It cannot be parsed with a regular expression.
Well, if you are willing to put a limit on the nesting level, technically you can do it with a regular expression. But it will be ugly.
Here's how to parse your input with a few (increasing) maximum nesting depths, provided you can impose the condition that no @ appears inside your tags:
no nesting: <@[^@]+@>
up to 1: <@[^@]+(<@[^@]+@>)?[^@]*@>
up to 2: <@[^@]+(<@[^@]+(<@[^@]+@>)?[^@]*@>)?[^@]*@>
up to 3: <@[^@]+(<@[^@]+(<@[^@]+(<@[^@]+@>)?[^@]*@>)?[^@]*@>)?[^@]*@>
...
If you cannot forbid lone @ characters in your tags, you will need to replace each instance of [^@] with this: (?:[^<@]|<[^@]|@[^>])
Think about that, and then think about extending your regex to handle nesting up to 10 levels deep.
Here, I'll do it for you:
<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+(<@(?:[^<@]|<[^@]|@[^>])+@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>)?(?:[^<@]|<[^@]|@[^>])*@>
I hope my answer shows that regular expressions are not the right tool for parsing a language. A traditional lexer (tokenizer) and parser combination will do a much better job, be significantly faster, and handle indefinite nesting.
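To give an idea of the lexer-plus-parser approach, here is a minimal sketch in Python. It is only one possible shape, under a few assumptions of mine: the function names `tokenize` and `parse` are made up, the parser is a simple hand-written stack rather than a generated one, the result is a nested list, and empty tags are accepted (the regexes above require at least one character inside a tag).

```python
import re

# Lexer: a token is "<@", "@>", or a run of ordinary text.
# A lone "<" or "@" counts as text (checked with lookaheads).
TOKEN = re.compile(r"<@|@>|(?:[^<@]|<(?!@)|@(?!>))+")


def tokenize(text):
    """Split text into a flat list of tokens."""
    return TOKEN.findall(text)


def parse(text):
    """Build a tree: plain strings for text, nested lists for <@ ... @> tags."""
    root = []
    stack = [root]                      # stack[-1] is the tag being filled
    for token in tokenize(text):
        if token == "<@":
            node = []
            stack[-1].append(node)      # attach the new tag to its parent
            stack.append(node)
        elif token == "@>":
            if len(stack) == 1:
                raise ValueError("unmatched '@>'")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unclosed '<@'")
    return root


if __name__ == "__main__":
    print(parse("x<@a<@b@>c@>y"))
```

Because the nesting is tracked on an explicit stack instead of being spelled out in the pattern, the depth is limited only by available memory, and the work done is a single pass over the input.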
Tobia