Match text with a regular expression up to the second dot, excluding HTML tags

I am trying to match all the text from the beginning up to the second dot, ignoring any dots that appear inside HTML tags.

The regular expression /^([^\.]*[\.]){0,2}/ works fine when the text contains no HTML tags: it selects everything from the very beginning up to the second dot.
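A quick check of that behaviour, written as a JavaScript sketch (JavaScript is an assumption based on the /.../ literal; the sample sentence is made up for illustration):

```javascript
// The asker's pattern: anchored at the start, up to two repetitions of
// "any run of non-dot characters, then a dot".
const upToSecondDot = /^([^\.]*[\.]){0,2}/;

// Hypothetical plain-text input with no HTML tags.
const plain = "First sentence. Second sentence. Third sentence.";
const plainMatch = plain.match(upToSecondDot)[0];
console.log(plainMatch);
// prints: First sentence. Second sentence.
```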

However, when I have this:

<p><img src="example.image.com" alt="foo">Text. More text.</p>

I would like my regular expression to stop at the second dot in the text (the one after "More text"), not at the dot between "image" and "com".
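Running the same pattern against that snippet shows where it goes wrong. This is a JavaScript sketch, assuming a JavaScript regex engine as the /.../ literal suggests:

```javascript
// Same pattern as in the question: it counts every dot, including the
// ones inside the src attribute of the <img> tag.
const upToSecondDot = /^([^\.]*[\.]){0,2}/;

const html = '<p><img src="example.image.com" alt="foo">Text. More text.</p>';
const naive = html.match(upToSecondDot)[0];
console.log(naive);
// prints: <p><img src="example.image.
```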

I also know that \.(?![^><]*>) selects all the dots outside the HTML tags, but I'm really lost here, and I would really appreciate your help!
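One way to build on that lookahead without a single all-in-one regex: collect every dot outside a tag with matchAll and cut the string right after the second one. This is a hedged JavaScript sketch; the helper name upToSecondTextDot is hypothetical, and because the lookahead treats any dot that is followed by a ">" (before the next "<") as being inside a tag, it assumes the text content itself never contains a raw ">".

```javascript
// Hypothetical helper: returns the input up to and including the second dot
// that lies outside HTML tags, using the lookahead from the question.
function upToSecondTextDot(html) {
  // A dot is skipped when a ">" follows it before any "<" (i.e. it is inside a tag).
  const dots = [...html.matchAll(/\.(?![^><]*>)/g)];
  if (dots.length < 2) {
    return null; // fewer than two dots outside tags (a design choice for this sketch)
  }
  return html.slice(0, dots[1].index + 1);
}

const html = '<p><img src="example.image.com" alt="foo">Text. More text.</p>';
console.log(upToSecondTextDot(html));
// prints: <p><img src="example.image.com" alt="foo">Text. More text.
```

Returning null when there are fewer than two dots is only one option; returning the whole string would mimic the {0,2} behaviour of the original pattern more closely.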

1 answer

Try this:

(?:(?:(?:<[^>]+>)*[^<.]*)*\.){2}

The same pattern in free-spacing mode (the x flag, where the regex flavor supports it), with comments:

(?:                  # start of non-capturing group
    (?:              # start of non-capturing group
        (?:          # start of non-capturing group
            <[^>]+>  # matches an HTML tag
        )*           # zero or more tags in a row
        [^<.]*       # matches a run of characters that are neither "<" nor "."
    )*               # repeat: more tags and more non-dot text
    \.               # match a dot (one that is outside any tag)
){2}                 # all of the above exactly twice, i.e. up to the second dot
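Applied in JavaScript (an assumption based on the /.../ literal in the question), the single-line version returns everything up to the second dot in the text. JavaScript has no free-spacing flag, so the commented version cannot be pasted in as is.

```javascript
// The answer's pattern as a JavaScript regex literal (single-line form only).
// Caveat: the nested * quantifiers can backtrack very heavily when the input
// has fewer than two dots outside tags, so keep inputs short or add a guard.
const upToSecondDotOutsideTags = /(?:(?:(?:<[^>]+>)*[^<.]*)*\.){2}/;

const html = '<p><img src="example.image.com" alt="foo">Text. More text.</p>';
const match = html.match(upToSecondDotOutsideTags);
console.log(match[0]);
// prints: <p><img src="example.image.com" alt="foo">Text. More text.
```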


Source: https://habr.com/ru/post/1537123/