
How to use Beautiful Soup 4 to filter on multiple classes

from bs4 import BeautifulSoup

html = """
    <div class="aa bb"></div>
    <div class="aa ccc"></div>
    <div class="aa"></div>
"""


def find(aclass):
    print(aclass)
    return aclass != "bb"

soup = BeautifulSoup(html, 'lxml')

div = soup.find_all('div', attrs={'class': find})

print(div)

I only want the div whose class is exactly 'aa', not 'aa bb' or any other combination. Please help me! Thanks!

2 answers

You can also use a simple CSS selector:

soup.select("div[class=aa]")

Demo:

>>> from bs4 import BeautifulSoup
>>> 
>>> html = """
...     <div class="aa bb"></div>
...     <div class="aa ccc"></div>
...     <div class="aa"></div>
... """
>>> soup = BeautifulSoup(html, 'lxml')
>>> 
>>> for elm in soup.select("div[class=aa]"):
...     print(str(elm))
... 
<div class="aa"></div>

This was already answered in "BeautifulSoup webscraping find_all(): finding an exact match".

The following returns only the div tags whose class attribute is exactly "aa":

div = soup.find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['aa'])
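
For context, here is a minimal, self-contained sketch of the exact-match approach. It assumes the standard-library html.parser backend instead of lxml so that it runs without extra dependencies, and the names reject_bb, is_plain_aa_div, loose and exact are made up for illustration. The first part shows why the attempt in the question matches every div: Beautiful Soup treats class as a multi-valued attribute and, as far as I can tell, tests the filter function against each class name separately, so "aa" on its own already passes.

```python
from bs4 import BeautifulSoup

html = """
    <div class="aa bb"></div>
    <div class="aa ccc"></div>
    <div class="aa"></div>
"""

# Assumption: html.parser is used here only to avoid the lxml dependency.
soup = BeautifulSoup(html, "html.parser")


def reject_bb(class_name):
    # Called with one class name at a time (e.g. "aa", then "bb"),
    # so "aa" passes this test for every div in the sample.
    return class_name != "bb"


# Same call shape as in the question: every div has "aa", so all of them match.
loose = soup.find_all("div", attrs={"class": reject_bb})


def is_plain_aa_div(tag):
    # tag.get("class") is the full list of class names, so comparing it
    # to ["aa"] only accepts a div whose class attribute is exactly "aa".
    return tag.name == "div" and tag.get("class") == ["aa"]


exact = soup.find_all(is_plain_aa_div)

print(len(loose))
print(exact)
```

Comparing against the whole list is what makes the match exact; a filter applied to individual class names cannot see which other classes the tag carries.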

Source: https://habr.com/ru/post/1624929/

