Complex Beautiful Soup query

Here is a snippet of the HTML file that I am exploring with Beautiful Soup.

<td width="50%"> <strong class="sans"><a href="http:/website">Site</a></strong> <br /> 

I would like to get the <a href> from every line that has <strong class="sans"> and is inside a <td width="50%">.

Is it possible to query an HTML file for multiple conditions like these using Beautiful Soup?

+3
2 answers

BeautifulSoup's search methods accept a callable, which seems to be the recommended approach for your case: "If you need to impose complex or interlocking restrictions on a tag's attributes, pass in a callable object for name ...". (Well, they are talking about attributes specifically, but the advice reflects the underlying spirit of the BeautifulSoup API.)

If you want a one-liner:

 soup.findAll(lambda tag: tag.name == 'a' and
              tag.findParent('strong', 'sans') and
              tag.findParent('strong', 'sans').findParent('td', attrs={'width': '50%'}))

In this example I used a lambda, but in practice you may prefer to define a named function when you have several chained requirements. This lambda has to call findParent('strong', 'sans') twice to avoid raising an exception when an <a> has no matching strong parent. With a proper function you can store that result and make the test more efficient.
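As a rough sketch of the named-function approach, the check could look something like the following. It assumes Beautiful Soup 4 (the `bs4` package) with the built-in `html.parser`, where `find_all` and `find_parent` are the current names for `findAll` and `findParent`. The function name `is_wanted_link` and the two extra `<td>` cells in the sample HTML are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Sample markup: the first cell is the one from the question; the other two
# are hypothetical cells added to show what the filter rejects.
HTML = """
<td width="50%"> <strong class="sans"><a href="http:/website">Site</a></strong> <br /> </td>
<td width="30%"> <strong class="sans"><a href="http:/other">Other</a></strong> </td>
<td width="50%"> <a href="http:/plain">Plain</a> </td>
"""


def is_wanted_link(tag):
    """Match an <a> inside <strong class="sans"> inside <td width="50%">."""
    if tag.name != "a":
        return False
    # Look the <strong> ancestor up once and reuse it, instead of calling
    # find_parent twice as the lambda has to.
    strong = tag.find_parent("strong", class_="sans")
    if strong is None:
        return False
    return strong.find_parent("td", attrs={"width": "50%"}) is not None


soup = BeautifulSoup(HTML, "html.parser")
hrefs = [a.get("href") for a in soup.find_all(is_wanted_link)]
print(hrefs)
```

Because the function returns early, the `<td>` lookup only runs for links that already passed the `<strong>` test, and the list comprehension pulls out the `href` values the question asks for.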

+9
 >>> import BeautifulSoup
 >>> soup = BeautifulSoup.BeautifulSoup("""<html><td width="50%">
 ... <strong class="sans"><a href="http:/website">Site</a></strong> <br />
 ... </html>""")
 >>> soup
 <html><td width="50%">
 <strong class="sans"><a href="http:/website">Site</a></strong> <br />
 </td></html>
 >>> [a
 ...  for td in soup.findAll("td", width="50%")  # outermost search comes first
 ...  for strong in td.findAll("strong", attrs={"class": "sans"})
 ...  for a in strong.findAll("a")]
 [<a href="http:/website">Site</a>]
0

Source: https://habr.com/ru/post/1441939/

