I use a regex to parse some BBCode, so the regex has to work recursively in order to match tags nested inside other tags. Most BBCode tags take an argument, which is sometimes quoted, though not always.
Here is a simplified equivalent of the regular expression I use (with HTML-style tags to reduce the need for escaping):
'~<(\")?a(?(1)\1)> #Match the tag, and require a closing quote if an opening one provided ([^<]+ | (?R))* #Match the contents of the tag, including recursively </a>~x'
However, if I have a test line that looks like this:
<"a">Content<a>Also Content</a></a>
it matches only <a>Also Content</a>. When the regex tries to match starting at the first tag, the first capturing group \1 is set to ", and this value is not overwritten when the regular expression recurses to match the inner tag. Since the inner tag is not quoted, the conditional still demands a closing quote, the recursion fails, and the outer tag does not match either.
If I instead use quotation marks consistently (either on every tag or on none), it works fine, but I cannot be sure that this will be the case with the content that I need to parse. Is there any way around this?
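To make the problem easy to reproduce, here is a minimal sketch that runs the simplified pattern against the test line above, next to one possible workaround. The workaround is only an assumption about what might help, not a confirmed fix for the full regex: it spells out the quoted and unquoted forms as an alternation, so no capturing group has to carry state into the recursion. The helper first_match() and the variable names are made up for this illustration.

```php
<?php
// Hypothetical helper: return the first full match, or null if there is none.
function first_match(string $pattern, string $subject): ?string
{
    return preg_match($pattern, $subject, $m) === 1 ? $m[0] : null;
}

// The test line from the question.
$subject = '<"a">Content<a>Also Content</a></a>';

// Simplified pattern from the question: group 1 records the opening quote
// and the conditional (?(1)\1) demands the same quote before ">".
$conditional = '~<(\")?a(?(1)\1)>   # tag, closing quote required if one was opened
    ([^<]+ | (?R))*                 # contents, including nested tags
    </a>~x';

// Assumed alternative: list both tag forms explicitly instead of using a
// conditional on a capturing group.
$alternation = '~<(?:"a"|a)>        # quoted or unquoted tag
    (?:[^<]+ | (?R))*               # contents, including nested tags
    </a>~x';

// With the conditional version I only get the inner tag back.
echo 'conditional: ', first_match($conditional, $subject) ?? '(no match)', PHP_EOL;

// The alternation version has no capturing group, so nothing leaks into
// the recursion and the whole test line matches.
echo 'alternation: ', first_match($alternation, $subject) ?? '(no match)', PHP_EOL;
```

For the real spoiler tag, the same idea would presumably mean replacing the quote group and its conditional with an alternation over a double-quoted option, a single-quoted option, and an unquoted option, but I have not verified that against the full regex.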
The full regex that I use to match [spoiler]content[/spoiler], [spoiler=option]content[/spoiler] and [spoiler="option"]content[/spoiler] is:
"~\[spoiler\s*+ #Match the opening tag
(?:=\s*+(\"|\')?((?(1)(?!\\1).|[^\]]){0,100})(?(1)\\1))?+\s*\] #If an option exists, match that
(?:\ *(?:\n|<br />))?+ #Get rid of an extra new line before the start of the content if necessary
((?:[^\[\n]++ #Capture all characters until the closing tag
|\n(?!\[spoiler]) #Capture new line separately so backtracking doesn't run away due to above
|\[(?!/?spoiler(?:\s*=[^\]*])?) #Also match all tags that aren't spoilers
|(?R))*+) #Allow the pattern to recurse - we also want to match spoilers inside spoilers,
# without messing up nesting
\n? #Get rid of an extra new line before the closing tag if necessary
\[/spoiler] #match the closing tag
~xi"
There are a few other mistakes in it, though.