BeautifulSoup: searching for a nested pattern?

soup.find_all will search a BeautifulSoup document for all occurrences of a single tag. Is there a way to search for a particular pattern of nested tags?

For example, I would like to search for all occurrences of this pattern:

<div class="separator">
  <a>
    <img />
  </a>
</div>
+4
2 answers

There are several ways to search for such a pattern, but the easiest is to use a CSS selector:

for img in soup.select('div.separator > a > img'):
    print(img)  # or img.parent.parent to get the "div"

Demo:

>>> from bs4 import BeautifulSoup
>>> data = """
... <div>
...     <div class="separator">
...       <a>
...         <img src="test1"/>
...       </a>
...     </div>
... 
...     <div class="separator">
...       <a>
...         <img src="test2"/>
...       </a>
...     </div>
... 
...     <div>test3</div>
... 
...     <div>
...         <a>test4</a>
...     </div>
... </div>
... """
>>> soup = BeautifulSoup(data, "html.parser")
>>> 
>>> for img in soup.select('div.separator > a > img'):
...     print(img.get('src'))
... 
test1
test2
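
If the enclosing div is what you need rather than the img, the same pattern can probably be written from the div's side. This is only a minimal sketch, assuming bs4 4.7 or later, where select is backed by the soupsieve package and understands the :has() pseudo-class. The sample markup and the variable names are illustrative, not taken from the question:

```python
from bs4 import BeautifulSoup

# Illustrative markup: only the first div matches the full pattern.
html = """
<div>
  <div class="separator"><a><img src="test1"/></a></div>
  <div class="separator"><a>no image here</a></div>
  <div><a><img src="test3"/></a></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# :has(> a > img) keeps only the divs that have an <a> child
# which in turn has an <img> child.
divs = soup.select("div.separator:has(> a > img)")
srcs = [div.a.img["src"] for div in divs]
print(srcs)
```

This returns the div elements themselves, so there is no need to walk back up with img.parent.parent.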

I understand that, strictly speaking, this does not enforce the exact pattern: the selector still matches if the div has more children than the single a, or if the a contains something besides the img tag. If that matters, the solution can be improved with additional checks (I will edit the answer if needed).
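
The additional checks mentioned above could look something like the following. It is a sketch under assumptions, not part of the original answer: the helper names only_child_tag and strict_matches are hypothetical, and "exact" is taken to mean one child tag and no text other than whitespace.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def only_child_tag(tag, name):
    """Return the single child tag of `tag` if it is called `name`, else None.

    Whitespace-only text is ignored; any other text node or any extra
    child tag disqualifies the element.
    """
    tags = [c for c in tag.contents if isinstance(c, Tag)]
    has_text = any(
        not isinstance(c, Tag) and str(c).strip() for c in tag.contents
    )
    if len(tags) == 1 and tags[0].name == name and not has_text:
        return tags[0]
    return None


def strict_matches(soup):
    """Yield div.separator elements that contain exactly <a><img/></a>."""
    for div in soup.select("div.separator"):
        a = only_child_tag(div, "a")
        if a is not None and only_child_tag(a, "img") is not None:
            yield div


# Illustrative markup: only the first div follows the pattern exactly.
html = """
<div class="separator"><a><img src="ok"/></a></div>
<div class="separator"><a><img src="extra-text"/>caption</a></div>
<div class="separator"><a><img src="two-links"/></a><a>more</a></div>
"""
soup = BeautifulSoup(html, "html.parser")
matched = [div.a.img["src"] for div in strict_matches(soup)]
print(matched)
```

The plain selector 'div.separator > a > img' would match all three divs here, while the stricter version keeps only the first one.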

+1

See the docs: find_all also accepts a function as a filter, for example:

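def nested_img(tag):
    # True for a <div> whose first child tag is an <a> whose first child tag is an <img>.
    # find(True, recursive=False) returns the first child tag, skipping whitespace text nodes.
    a = tag.find(True, recursive=False) if tag.name == "div" else None
    img = a.find(True, recursive=False) if a is not None and a.name == "a" else None
    return img is not None and img.name == "img"

# Pass the function as the only filter; a second positional argument is treated as a CSS class filter.
soup.find_all(nested_img)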

P.S.: .

+1

Source: https://habr.com/ru/post/1541821/

