How to use BeautifulSoup to get deeply nested div values?

I need to get the values of deeply nested <span> elements in a DOM structure that looks like this:

<div class="panda">
    <div class="that">
        <ul class="foo">
            <li class="bar">
                <div class="hi">
                    <p class="bye">
                        <span class="cheese">Cheddar</span>

The problem with

soup.findAll("span", {"class": "cheese"})

is that there are hundreds of span elements on the page with the "cheese" class, so I need to keep only those nested inside a div with the "panda" class. I need to get a list of values like ["Cheddar", "Parmesan", "Swiss"].

1 answer

Use a CSS selector:

[e.get_text() for e in soup.select('.panda .cheese')]
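
As a minimal, self-contained sketch of the selector approach: the markup below is hypothetical sample data modelled on the structure in the question (the second list item and the "other" div are assumptions added to show the filtering), and "html.parser" is simply the parser bundled with Python. The space in the selector is a descendant combinator, so the span may sit at any depth inside the "panda" div.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two "cheese" spans inside the "panda" div,
# and one outside it that should be ignored.
html = """
<div class="panda">
  <div class="that">
    <ul class="foo">
      <li class="bar"><div class="hi"><p class="bye"><span class="cheese">Cheddar</span></p></div></li>
      <li class="bar"><div class="hi"><p class="bye"><span class="cheese">Parmesan</span></p></div></li>
    </ul>
  </div>
</div>
<div class="other"><span class="cheese">Brie</span></div>
"""

soup = BeautifulSoup(html, "html.parser")

# "div.panda span.cheese" matches every span.cheese that has a div.panda
# somewhere among its ancestors, regardless of nesting depth.
cheeses = [e.get_text() for e in soup.select("div.panda span.cheese")]
print(cheeses)  # ['Cheddar', 'Parmesan']
```

Adding the tag names to the selector is optional; '.panda .cheese' selects the same elements here.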

Or if you prefer find_all:

# Calling a soup or tag object directly is shorthand for calling find_all on it

[e.get_text() for panda in soup('div', {'class': 'panda'}) 
              for e in panda('span', {'class': 'cheese'})]
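
The same idea can be sketched with explicit find_all calls, which may be easier to read than the call shorthand. The markup is again hypothetical sample data, and the class_ keyword and get_text(strip=True) are used here as a stylistic choice rather than something the answer requires. find_all searches all descendants by default, so the nesting depth does not matter.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two matching spans inside "panda", one outside.
html = """
<div class="panda"><ul><li><p><span class="cheese"> Cheddar </span></p></li>
<li><p><span class="cheese">Swiss</span></p></li></ul></div>
<span class="cheese">Brie</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Explicit form: find each div.panda, then search inside it for span.cheese.
values = [
    span.get_text(strip=True)  # strip=True trims surrounding whitespace
    for panda in soup.find_all("div", class_="panda")
    for span in panda.find_all("span", class_="cheese")
]

# Shorthand form from the answer: calling the object is the same as find_all.
shorthand = [
    e.get_text(strip=True)
    for panda in soup("div", {"class": "panda"})
    for e in panda("span", {"class": "cheese"})
]

print(values)  # ['Cheddar', 'Swiss']
```

One difference worth keeping in mind: if "panda" divs are nested inside one another, this nested loop visits the inner spans once per enclosing "panda" div and so yields duplicates, whereas select returns each matching element once.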

Source: https://habr.com/ru/post/1625238/

