How to use BeautifulSoup4 to filter multiple classes
from bs4 import BeautifulSoup
html = """
<div class="aa bb"></div>
<div class="aa ccc"></div>
<div class="aa"></div>
"""
def find(aclass):
    print(aclass)
    return aclass != "bb"
soup = BeautifulSoup(html, 'lxml')
div = soup.find_all('div', attrs={'class': find})
print(div)
I only want the div whose class is exactly 'aa', not 'aa bb' or any other combination. Please help me! Thanks!!
2 answers
You can also use a simple CSS selector:
soup.select("div[class=aa]")
Demo:
>>> from bs4 import BeautifulSoup
>>>
>>> html = """
... <div class="aa bb"></div>
... <div class="aa ccc"></div>
... <div class="aa"></div>
... """
>>> soup = BeautifulSoup(html, 'lxml')
>>>
>>> for elm in soup.select("div[class=aa]"):
...     print(str(elm))
...
<div class="aa"></div>
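To show why the attribute selector is the right tool here, the sketch below contrasts it with the class selector `div.aa`, which matches any div that has `aa` among its classes. It assumes a Beautiful Soup version whose `select()` supports attribute selectors, as in the demo above. It uses the built-in `html.parser` instead of `lxml` only to avoid an extra dependency, and the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div class="aa bb"></div>
<div class="aa ccc"></div>
<div class="aa"></div>
"""

# Assumption: html.parser is used here only so the sketch runs without lxml.
soup = BeautifulSoup(html, "html.parser")

# Class selector: matches every div that has "aa" among its classes.
any_aa = soup.select("div.aa")

# Attribute selector: matches only when the whole class attribute equals "aa".
only_aa = soup.select('div[class="aa"]')

print(len(any_aa))
print(only_aa)
```

Quoting the value, as in `[class="aa"]`, is optional for a simple identifier like `aa` but is needed once the value contains spaces.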
This was answered in "BeautifulSoup webscraping find_all( ): finding an exact match".
This will only give you a tag with the class "aa".
div = soup.find_all(lambda tag: tag.name == 'div' and tag.get('class') == ['aa'])
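The one-liner above works because Beautiful Soup stores `class` as a list of strings, so comparing against `['aa']` only succeeds when `aa` is the sole class. A minimal runnable sketch of the same idea follows. The helper name `divs_with_exact_classes` is hypothetical, and `html.parser` is assumed in place of `lxml` only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

HTML = """
<div class="aa bb"></div>
<div class="aa ccc"></div>
<div class="aa"></div>
"""


def divs_with_exact_classes(html, classes):
    """Return the <div> tags whose class list equals `classes` exactly.

    Hypothetical helper for illustration; it is not part of Beautiful Soup.
    """
    # Assumption: html.parser is used so the sketch runs without lxml.
    soup = BeautifulSoup(html, "html.parser")
    wanted = list(classes)
    # tag.get("class") returns a list such as ["aa", "bb"], or None when the
    # attribute is missing, so the equality check rejects both cases.
    return soup.find_all(
        lambda tag: tag.name == "div" and tag.get("class") == wanted
    )


matches = divs_with_exact_classes(HTML, ["aa"])
print(matches)
```

Because the comparison is against the full list, order matters: `["aa", "bb"]` would not match a tag written as `class="bb aa"`.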