You can use this pattern:
(?|\G(?!\A)(?|-{2,}+([^->][^-]*)|(-[^-]+)|-+(?=-->)|-->[^<]*(*SKIP)(*FAIL))|[^<]*<+(?>[^<]+<+)*?(?:!--\K|[^<]*\z\K(*ACCEPT))(?|-*+([^->][^-]*)|-+(?=-->)|-?+([^-]+)|-->[^<]*(*SKIP)(*FAIL)()))
Details:
    (?|
        \G(?!\A)                 # contiguous to the precedent match (inside a comment)
        (?|
            -{2,}+([^->][^-]*)   # duplicate hyphens, not part of the closing sequence
          |
            (-[^-]+)             # preserve isolated hyphens
          |
            -+ (?=-->)           # hyphens before closing sequence, break contiguity
          |
            -->[^<]*             # closing sequence, go to next <
            (*SKIP)(*FAIL)       # break contiguity
        )
      |
        [^<]*<+                  # reach the next < (outside comment)
        (?> [^<]+ <+ )*?         # next < until !-- or the end of the string
        (?: !-- \K | [^<]*\z\K (*ACCEPT) ) # new comment or end of the string
        (?|
            -*+ ([^->][^-]*)     # possible hyphens not followed by >
          |
            -+ (?=-->)           # hyphens before closing sequence, break contiguity
          |
            -?+ ([^-]+)          # one hyphen followed by >
          |
            -->[^<]*             # closing sequence, go to next <
            (*SKIP)(*FAIL) ()    # break contiguity (note: "()" avoids a mysterious bug
        )                        # in regex101, you can remove it)
    )
With this replacement: \1
online demo
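If PCRE is not available, the same goal can be approximated in two passes: find each comment, then clean its body. The sketch below uses Python's standard `re` module, which is my own substitution and not part of the pattern above; the names `COMMENT`, `HYPHEN_RUN` and `strip_duplicate_hyphens` are made up for illustration. It assumes terminated comments and does not reproduce every edge case of the single-pattern version (a single hyphen right after `<!--`, for instance, or an unterminated comment).

```python
import re

# A comment body is everything between "<!--" and the first following "-->".
COMMENT = re.compile(r"<!--(.*?)-->", re.DOTALL)
# A run of two or more hyphens.
HYPHEN_RUN = re.compile(r"-{2,}")

def strip_duplicate_hyphens(html):
    """Drop runs of 2+ hyphens inside comments; text outside comments is untouched."""
    def fix(match):
        body = HYPHEN_RUN.sub("", match.group(1))
        # Also drop hyphens sitting right before the closing sequence.
        body = body.rstrip("-")
        return "<!--" + body + "-->"
    return COMMENT.sub(fix, html)

print(strip_duplicate_hyphens("a--b <!-- x--y - z -->"))
```

Isolated hyphens inside the comment are kept, and hyphens outside comments are never touched, because the second substitution only ever sees a comment body.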
The \G anchor ensures that matches are contiguous. Two techniques are used to break the contiguity:
- a lookahead (?=-->)
- the backtracking control verbs (*SKIP)(*FAIL), which force the pattern to fail and prevent the characters matched before them from being retried.
So, when contiguity is broken, or at the start of the string, the first main branch fails (because of the \G anchor) and the second branch is used.
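To make the idea of contiguity concrete outside PCRE: Python's `re` has no \G, but `Pattern.match(text, pos)` only succeeds when a match starts exactly at `pos`, which gives a similar effect. This is only a sketch; the token pattern and the function name are hypothetical and unrelated to the comment pattern above.

```python
import re

# Hypothetical token: a run of lowercase letters with an optional trailing comma.
TOKEN = re.compile(r"[a-z]+,?")

def contiguous_matches(text, start=0):
    """Collect matches that each begin exactly where the previous one ended."""
    pos = start
    found = []
    while pos < len(text):
        m = TOKEN.match(text, pos)  # anchored at pos, much like \G
        if m is None:
            break                   # contiguity is broken here
        found.append(m.group())
        pos = m.end()
    return found, pos

print(contiguous_matches("ab,cd,ef 12,gh"))
```

The loop stops at the space: nothing after it is examined by this "branch", just as the \G branch of the pattern gives up and leaves the rest to the second branch.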
\K removes everything to its left from the match result.
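For readers working in an engine without \K (Python's standard `re` is one), a common workaround is to capture the part that \K would discard and put it back in the replacement. A toy sketch, with a made-up prefix `foo` and target `bar`:

```python
import re

def replace_after_prefix(text):
    # In PCRE one could write foo\Kbar and replace with "X":
    # only "bar" would be part of the match result.
    # Here the prefix is captured and re-emitted instead.
    return re.sub(r"(foo)bar", r"\g<1>X", text)

print(replace_after_prefix("foobar foobaz"))
```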
(*ACCEPT) makes the pattern succeed unconditionally.
This pattern makes heavy use of the branch reset feature (?|...(..)...|...(..)...|...), so all capture groups have the same number (in other words, there is only one group, group 1).
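To see what branch reset buys you, compare with an engine that lacks it. In Python's standard `re`, the groups of an alternation are numbered separately, so the code has to look up whichever group took part in the match. A small sketch with a hypothetical two-branch pattern:

```python
import re

# With branch reset, (?|a(\d)|b([a-z])) would expose both captures as group 1.
# Without it, the two captures are groups 1 and 2.
PAT = re.compile(r"a(\d)|b([a-z])")

def captured(match):
    """Return the text of the only group that participated in the match."""
    return match.group(match.lastindex)

print([captured(m) for m in PAT.finditer("a1 bx a7")])
```

With branch reset, the replacement string can simply be \1 whichever alternative matched, which is what the pattern above relies on.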
Note: even though this pattern is long, it needs few steps to obtain a match. The impact of non-greedy quantifiers is reduced as much as possible, and the alternatives are ordered and written to be as efficient as possible. One goal is to reduce the total number of matches needed to process a string.