I am trying to parse a bit of HTML, and I would like to extract a link that matches a specific pattern. I use the find method with a regex, but it does not give me the correct link. Here is my snippet. Can someone tell me what I am doing wrong?
from BeautifulSoup import BeautifulSoup
import re
html = """
<div class="entry">
<a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
<a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> –
<a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> –
</div>
"""
soup = BeautifulSoup(html)
print soup.find('a', href = re.compile(r".*title/tt.*"))['href']
I need to get the IMDB title link, but BS always returns the first link. The first link doesn't even match my regex, so why is it returned?
Thanks.
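As a sanity check, here is a minimal sketch that tests the regex against the three href values directly, without BeautifulSoup. The `hrefs` list and the `matches` name are my own additions for illustration, with the URLs copied by hand from the HTML above. It only shows what the pattern itself matches, not how BS applies it.

```python
import re

# Same pattern as in the snippet above.
pattern = re.compile(r".*title/tt.*")

# The three href values from the HTML, copied by hand (illustrative list).
hrefs = [
    "http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/",
    "http://www.imdb.com/video/imdb/vi2496267289/",
    "http://www.imdb.com/title/tt1196141/",
]

# Keep only the hrefs that the pattern matches.
matches = [h for h in hrefs if pattern.match(h)]
print(matches)
```

Only the IMDB title URL ends up in `matches`, so the pattern on its own appears to behave as I expect.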