I want to find the text between a pair of <a> tags that link to a given site.
Here's the re line that I use to find the content:
r'''(<a([^<>]*)href=("|')(http://)?(www\.)?%s([^'"]*)("|')([^<>]*)>([^<]*))</a>''' % our_url
The result will be something like this:
r'''(<a([^<>]*)href=("|')(http://)?(www\.)?stackoverflow.com([^'"]*)("|')([^<>]*)>([^<]*))</a>'''
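For context, here is a minimal sketch of how the pattern gets built and applied. The value of `our_url` and the sample HTML are made up for illustration, and the `re.escape` call is an addition (not in the line above) so that the dot in the domain matches only a literal dot.

```python
import re

our_url = "stackoverflow.com"  # hypothetical value, for illustration only

# Same pattern as above; re.escape is an added assumption so that "."
# in the domain is matched literally rather than as "any character".
pattern = re.compile(
    r'''(<a([^<>]*)href=("|')(http://)?(www\.)?%s([^'"]*)("|')([^<>]*)>([^<]*))</a>'''
    % re.escape(our_url)
)

# Made-up sample input containing a plain link with no nested tags.
html = '<p><a href="http://www.stackoverflow.com/questions">Questions</a></p>'

match = pattern.search(html)
# Group 9 is the last capture group: the text between <a ...> and </a>.
link_text = match.group(9) if match else None
```

With a plain link like this one, group 9 holds the link text.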
This works great for most links, but it fails for links that have other tags inside them. I tried changing the final part of the regex:
([^<]*))</a>'''
to
(.*))</a>'''
But that just matched everything on the page after the link, which I don't want. Are there any suggestions for solving this problem?
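To illustrate the behaviour described above, here is a rough sketch comparing the greedy `(.*)` with a non-greedy `(.*?)`, which is one possible direction rather than a definitive fix. The sample HTML, the `our_url` value, the `re.escape` call, and the `re.DOTALL` flag are all assumptions made for the example. A regex like this would likely still break on nested or malformed markup, where an HTML parser may be more robust.

```python
import re

our_url = "stackoverflow.com"  # hypothetical value, for illustration only

# Everything up to and including the ">" that closes the opening <a ...> tag.
# The first "(" is left open here and closed in the suffixes below.
prefix = r'''(<a([^<>]*)href=("|')(http://)?(www\.)?%s([^'"]*)("|')([^<>]*)>''' % re.escape(our_url)

# re.DOTALL is an assumption so that "." can also cross line breaks.
greedy = re.compile(prefix + r'(.*))</a>', re.DOTALL)   # runs to the LAST </a>
lazy = re.compile(prefix + r'(.*?))</a>', re.DOTALL)    # stops at the FIRST </a>

# Made-up sample: a matching link with a nested tag, followed by another link.
html = (
    '<a href="http://stackoverflow.com/tags"><b>Tags</b></a> '
    'and <a href="http://example.com/">elsewhere</a>'
)

greedy_text = greedy.search(html).group(9)  # swallows the rest of the input
lazy_text = lazy.search(html).group(9)      # just the content of the first link
```

Here `lazy_text` contains only the inner markup of the first link, while `greedy_text` extends through to the final `</a>` in the input.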