I am working on a small Python script to clear HTML documents. It works by accepting a tag list for KEEP, and then parsing HTML tags that are not in the list that I used for regular expressions, and I was able to match the opening tags and the self-closing tags but not closing the tags. The pattern I experimented to match the closing tags is - </(?!a)>. It seems logical to me, so why doesn't it work? (?!a)should coincide with everything that is NOT an anchor tag (not that "a" can be anything - it's just an example).
Edit: AGG! I think the regex did not show!
source
share