Regex alone does not seem good enough, but since you are working with Sublime scripts here, there is a way to simplify both the code and the process. Keep in mind that I am a vim user and not familiar with Sublime internals. Also, I usually work with JavaScript regular expressions, not PCRE (which seems to be the format Sublime uses, or the closest to it).
The idea is this:
- use regex to get the tag, attributes (per line) and tag content
- use capture groups for further processing and matching if necessary
In this case, I made this regex:
<([a-z]+)\ ?([a-z]+=\".*?\"\ ?)?>([.\n\sa-z]*)(<\/\1>)?
It starts by finding the opening tag and creates a capture group for the tag name. If it finds a space, it matches the whole run of attributes. Inside the pattern \"...\" I could use \"[^\"]*?\" to match only non-quote characters, but I purposely match any character lazily up to a closing quote, which should capture most attributes so that we can process them later. It then matches any text between the tags and finally the closing tag.
This creates 4 capture groups:
- tag name
- attribute string
- tag content
- closing tag
As you can see in this demo, if there is no closing tag we do not get a capture group for it, and the same goes for attributes, but we always get a capture group for the tag content. This could be a problem in general (since we cannot assume that a captured value will land in a particular group), but it is not one here. In the ambiguous case, where we get neither attributes nor content, the second capture group is empty and we can simply take that to mean there are no attributes, while an empty third group speaks for itself. If there is nothing to parse, nothing can be parsed incorrectly.
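To make the group behaviour concrete, here is a minimal sketch using Python's `re` module as a stand-in engine (an assumption; it is neither the JavaScript flavour of the demo nor the editor's own engine). The character classes are read as `[a-z]` ranges, and `tag_groups` is a hypothetical helper name.

```python
import re

# The tag pattern from above, with the character classes read as [a-z] ranges.
TAG_RE = re.compile(r'<([a-z]+)\ ?([a-z]+=\".*?\"\ ?)?>([.\n\sa-z]*)(<\/\1>)?')


def tag_groups(text):
    """Return the four capture groups of the first tag in text, or None."""
    match = TAG_RE.search(text)
    return match.groups() if match else None


# A full tag, a tag without attributes or closing tag, and a bare tag.
for sample in ['<p class="intro">hello world</p>', '<b>bold text', '<br>']:
    print(sample, '->', tag_groups(sample))
```

In Python a group that did not take part in the match comes back as `None`, whereas the content group is always present and is simply an empty string when there is no content.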
Now, to analyze the attributes, we can just do it with
([a-z]+=\"[^\"]*?\")
Demo here. This reliably gives us the individual attributes. If a Sublime script lets you get this far, it will certainly let you continue processing if necessary. You can, of course, always use something like this:
(([a-z]+)=\"([^\"]*?)\")
which will provide capture groups for the attribute as a whole and its name and value separately.
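As a rough illustration of that last pattern, the sketch below applies it to an attribute string with Python's `re` module (again an assumed stand-in engine). The character classes are read as `[a-z]` ranges, and `parse_attributes` is a hypothetical helper name.

```python
import re

# Group 1: the whole attribute, group 2: its name, group 3: its value.
ATTR_RE = re.compile(r'(([a-z]+)=\"([^\"]*?)\")')


def parse_attributes(attr_string):
    """Map each attribute name found in attr_string to its value."""
    return {m.group(2): m.group(3) for m in ATTR_RE.finditer(attr_string)}


print(parse_attributes('class="intro" id="main"'))
```

A dictionary is only one convenient shape for the result. If attribute order or duplicate names matter, collecting the match tuples in a list would preserve them.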
Using this approach, you should be able to parse the tags well enough for highlighting in 2-3 passes and hand the content off to whatever highlighter you want (or just highlight it as plain text in whatever way is convenient for you).
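The passes could be chained roughly as follows. This is only a sketch under assumptions: Python's `re` module stands in for the real engine, the character classes are read as `[a-z]` ranges, and `parse_tag` with its result keys is a hypothetical name chosen for illustration.

```python
import re

# Pass 1: tag name, raw attribute string, content, optional closing tag.
TAG_RE = re.compile(r'<([a-z]+)\ ?([a-z]+=\".*?\"\ ?)?>([.\n\sa-z]*)(<\/\1>)?')
# Pass 2: split the raw attribute string into name/value pairs.
ATTR_RE = re.compile(r'(([a-z]+)=\"([^\"]*?)\")')


def parse_tag(line):
    """Parse the first tag in line into a small dict, or return None."""
    match = TAG_RE.search(line)
    if match is None:
        return None
    name, attr_string, content, closing = match.groups()
    attributes = {}
    if attr_string:  # None when the tag has no attributes
        attributes = {a.group(2): a.group(3) for a in ATTR_RE.finditer(attr_string)}
    return {
        'tag': name,
        'attributes': attributes,
        'content': content,
        'closed': closing is not None,
    }


print(parse_tag('<a href="x.html" title="home">link text</a>'))
```

The `content` value is what would then be passed on to a further pass or to a separate highlighter.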