Failed to get the correct link in BeautifulSoup

Question


I am trying to parse a bit of HTML, and I would like to extract a link that matches a specific pattern. I use the find method with a regex, but it does not give me the correct link. Here is my snippet. Can someone tell me what I am doing wrong?

from BeautifulSoup import BeautifulSoup
import re

html = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash; 
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash; 
</div>
"""

soup = BeautifulSoup(html)
print soup.find('a', href = re.compile(r".*title/tt.*"))['href']

I need to get the IMDB title link, but BS always returns the first link. The href of the first link doesn't even match my regex, so why does it return it?
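As a quick sanity check that does not involve BeautifulSoup at all, plain re.search can be run over the three href values copied from the HTML above. This is only a sketch to confirm that the regex itself is not the problem: the pattern matches the IMDB title URL and nothing else.

```python
import re

# The three href values from the question's HTML snippet.
hrefs = [
    "http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/",
    "http://www.imdb.com/video/imdb/vi2496267289/",
    "http://www.imdb.com/title/tt1196141/",
]

# The same pattern the question passes to find().
pattern = re.compile(r".*title/tt.*")

# Keep only the hrefs where the pattern matches somewhere in the string.
matching = [h for h in hrefs if pattern.search(h)]
print(matching)
```

If this list contains only the title URL, a wrong result from the parser points at how the call is made (or which module was imported), not at the regex.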

Thanks.


python beautifulsoup

Mridang agarwalla Jul 23 '10 at 8:11


2 answers

Works for me after changing the import from

import BeautifulSoup

to

from BeautifulSoup import BeautifulSoup

With BeautifulSoup 3.1.0.1, the output is:

http://www.imdb.com/title/tt1196141/
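The snippet in the question is Python 2 code for BeautifulSoup 3. For readers on a current setup, here is a minimal sketch of the same lookup, assuming Python 3 and the beautifulsoup4 package (imported as bs4). The parser is named explicitly so the result does not depend on which parsers happen to be installed.

```python
import re

from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

html = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash;
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash;
</div>
"""

# Use the stdlib-backed parser explicitly.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first <a> whose href matches the pattern, or None if none does.
link = soup.find("a", href=re.compile(r".*title/tt.*"))
href = link["href"] if link is not None else None
print(href)
```

Checking for None before indexing avoids a TypeError when no tag matches, which the original one-liner would raise.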


miku Jul 23 '10 at 8:13

katrielalex · Accepted Answer · 2010-07-23T09:03:29+0000

find returns only the first matching <a> tag. If you want every match, use findAll instead.
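A minimal sketch of the difference between the two calls, again assuming Python 3 and BeautifulSoup 4, where findAll is spelled find_all (bs4 still accepts the old name as an alias). The HTML is a trimmed copy of the question's snippet.

```python
import re

from bs4 import BeautifulSoup  # assumes the beautifulsoup4 package is installed

html = """
<div class="entry">
    <a href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a>
    <a href="http://www.imdb.com/title/tt1196141/">IMDB</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match: with no href filter, that is the RT link.
first = soup.find("a")["href"]

# find_all() (findAll in BeautifulSoup 3) returns every matching tag as a list.
all_hrefs = [a["href"] for a in soup.find_all("a")]

# Combining find_all() with the regex filter keeps only the IMDB title links.
title_hrefs = [
    a["href"] for a in soup.find_all("a", href=re.compile(r".*title/tt.*"))
]

print(first)
print(all_hrefs)
print(title_hrefs)
```

find is convenient when exactly one tag is expected; find_all is the safer choice when the page may contain several links that match the pattern.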
