Complex Beautiful Soup query

Question

Here is a snippet of an HTML file that I am exploring with Beautiful Soup.

<td width="50%"> <strong class="sans"><a href="http:/website">Site</a></strong> <br />

I would like to get the <a href> for any row that has <strong class="sans"> and is inside a <td width="50%">.

Is it possible to query an HTML file for these multiple conditions using Beautiful Soup?

+3

python beautifulsoup

user41767 Apr 01 '09 at 16:51

2 answers

    >>> import BeautifulSoup
    >>> soup = BeautifulSoup.BeautifulSoup("""<html><td width="50%">
    ... <strong class="sans"><a href="http:/website">Site</a></strong> <br />
    ... </html>""")
    >>> soup
    <html><td width="50%">
    <strong class="sans"><a href="http:/website">Site</a></strong> <br />
    </td></html>
    >>> [a for td in soup.findAll("td", width="50%")
    ...    for strong in td.findAll("strong", attrs={"class": "sans"})
    ...    for a in strong.findAll("a")]
    [<a href="http:/website">Site</a>]
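
For readers on the newer bs4 package, the same nested search can be sketched as below. This assumes bs4 is installed and uses the built-in "html.parser" backend; the CSS selector line is an alternative that was not part of the original answer, and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Same markup as in the question, with the <td> closed explicitly.
html = ('<html><td width="50%"> <strong class="sans">'
        '<a href="http:/website">Site</a></strong> <br /> </td></html>')
soup = BeautifulSoup(html, "html.parser")

# Outer loop first: each <td width="50%">, then each <strong class="sans">
# inside it, then each <a> inside that.
nested = [a
          for td in soup.find_all("td", width="50%")
          for strong in td.find_all("strong", class_="sans")
          for a in strong.find_all("a")]

# Alternative (assumption: a bs4 version with CSS selector support).
selected = soup.select('td[width="50%"] strong.sans a')

nested_hrefs = [a["href"] for a in nested]
selected_hrefs = [a["href"] for a in selected]
print(nested_hrefs)
```

Both lists should contain the single link from the snippet.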

0

Aaron maenpaa Apr 01 '09 at 17:19

Jarret hardie · Accepted Answer · 2009-04-01T17:15:12+0000

BeautifulSoup's search methods accept a callable, which the docs seem to recommend for your case: "If you need to impose complex or interlocking restrictions on a tag's attributes, pass in a callable object for name ...". (Well ... they are talking about attributes specifically, but the advice reflects the underlying spirit of the BeautifulSoup API.)

If you want a one-liner:

    soup.findAll(lambda tag: tag.name == 'a' and
                 tag.findParent('strong', 'sans') and
                 tag.findParent('strong', 'sans').findParent('td', attrs={'width': '50%'}))

I've used a lambda in this example, but in practice you may want to define a named function if you have multiple chained requirements, because this lambda has to make two findParent('strong', 'sans') calls to avoid raising an exception when an <a> has no strong parent. With a proper function, you can look up the parent once and make the test more efficient.
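
A minimal sketch of that named-function variant, written against the newer bs4 package rather than the BeautifulSoup 3 API used above. The function name is_wanted_link and the extra table cells in the sample markup are made up for illustration; only the first cell comes from the question.

```python
from bs4 import BeautifulSoup

# First cell is from the question; the other two are hypothetical
# near-misses added to show what the filter rejects.
HTML = """
<table><tr>
  <td width="50%"><strong class="sans"><a href="http:/website">Site</a></strong><br /></td>
  <td width="30%"><strong class="sans"><a href="http:/other">Other</a></strong></td>
  <td width="50%"><a href="http:/plain">Plain</a></td>
</tr></table>
"""


def is_wanted_link(tag):
    """True for an <a> inside <strong class="sans"> inside <td width="50%">."""
    if tag.name != "a":
        return False
    # Look up the <strong class="sans"> ancestor once and reuse the result.
    strong = tag.find_parent("strong", class_="sans")
    if strong is None:
        return False
    return strong.find_parent("td", attrs={"width": "50%"}) is not None


soup = BeautifulSoup(HTML, "html.parser")
hrefs = [a["href"] for a in soup.find_all(is_wanted_link)]
print(hrefs)  # ['http:/website']
```

The function performs the parent lookup once per tag and returns early when a condition fails, which is the efficiency gain the lambda cannot offer.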
