Failed to get the correct link in BeautifulSoup

I am trying to parse a bit of HTML and would like to extract a link that matches a specific pattern. I use the find method with a regex, but it does not give me the correct link. Here is my snippet. Can someone tell me what I am doing wrong?

from BeautifulSoup import BeautifulSoup
import re

html = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash; 
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash; 
</div>
"""

soup = BeautifulSoup(html)
print soup.find('a', href = re.compile(r".*title/tt.*"))['href']

I need to get the second link, but BS always returns the first link. The href of the first link doesn't even match my regex, so why is it returned?

Thanks.

2 answers

find returns only the first <a> tag. You want findAll.


It works for me after changing

import BeautifulSoup

to

from BeautifulSoup import BeautifulSoup

With beautifulsoup 3.1.0.1, it prints:

http://www.imdb.com/title/tt1196141/
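
As a cross-check that does not depend on BeautifulSoup at all, here is a minimal sketch (Python 3, standard library only) that collects every href in the question's HTML and keeps the ones the regex matches. The names HrefCollector and matching_hrefs are made up for this illustration and are not part of any library. If the pattern is right, only the IMDB title link should survive the filter.

```python
import re
from html.parser import HTMLParser

# The HTML and the pattern are copied from the question.
HTML = """
<div class="entry">
    <a target="_blank" href="http://www.rottentomatoes.com/m/diary_of_a_wimpy_kid/">RT</a>
    <a target="_blank" href="http://www.imdb.com/video/imdb/vi2496267289/">Trailer</a> &ndash;
    <a target="_blank" href="http://www.imdb.com/title/tt1196141/">IMDB</a> &ndash;
</div>
"""
PATTERN = re.compile(r".*title/tt.*")


class HrefCollector(HTMLParser):
    """Hypothetical helper: records the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self.hrefs.append(href)


def matching_hrefs(html, pattern):
    """Return all hrefs in html that the compiled pattern matches anywhere."""
    collector = HrefCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if pattern.search(href)]


matches = matching_hrefs(HTML, PATTERN)
print(matches)
```

The first element of the list corresponds to what a "first match" lookup should return, and the whole list to an "all matches" lookup. If your own run of the BeautifulSoup snippet gives a different link, the problem is more likely in the installed library version or the import than in the regex.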

Source: https://habr.com/ru/post/1756137/

