Get all HTML tags using Beautiful Soup

I am trying to get a list of all HTML tags from Beautiful Soup.

I see find_all(), but it seems I need to know the tag name before searching.

If I have text like this:

html = """<div>something</div> <div>something else</div> <div class='magical'>hi there</div> <p>ok</p>""" 

how do I get a list like this:

 list_of_tags = ["<div>", "<div>", "<div class='magical'>", "<p>"] 

I know how to do this with a regex, but I'm trying to learn BS4.

1 answer

You do not need to pass any arguments to find_all(): called without arguments, BeautifulSoup returns every tag in the tree, searching it recursively. Example:

 >>> from bs4 import BeautifulSoup
 >>>
 >>> html = """<div>something</div>
 ... <div>something else</div>
 ... <div class='magical'>hi there</div>
 ... <p>ok</p>"""
 >>> soup = BeautifulSoup(html, "html.parser")
 >>> [tag.name for tag in soup.find_all()]
 ['div', 'div', 'div', 'p']
 >>> [str(tag) for tag in soup.find_all()]
 ['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
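To see what "recursively" means in practice, here is a small sketch using hypothetical nested markup (not from the question). With no arguments, find_all() returns tags at every depth in document order; passing recursive=False restricts the search to the direct children of the object you call it on, so on the soup itself you get only the top-level tags.

```python
from bs4 import BeautifulSoup

# Hypothetical nested markup, used only to illustrate recursion.
html = "<div><p>one</p><span>two <b>three</b></span></div><p>four</p>"
soup = BeautifulSoup(html, "html.parser")

# No arguments: every tag at every depth, in document order.
all_names = [tag.name for tag in soup.find_all()]
print(all_names)  # ['div', 'p', 'span', 'b', 'p']

# recursive=False: only the direct children of soup, i.e. the top-level tags.
top_level = [tag.name for tag in soup.find_all(recursive=False)]
print(top_level)  # ['div', 'p']
```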

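The question asks for opening tags such as "<div class='magical'>" rather than whole elements. Beautiful Soup does not return that string directly as far as I know, but it can be rebuilt from each tag's name and attrs. The sketch below uses a hypothetical helper, opening_tag(); it is only an approximation: multi-valued attributes like class come back as lists and are joined with spaces, attribute values are not escaped (a value containing a single quote would produce broken markup), and the original quoting style is not preserved, so single quotes are used throughout to match the question's example.

```python
from bs4 import BeautifulSoup

html = """<div>something</div> <div>something else</div> <div class='magical'>hi there</div> <p>ok</p>"""

def opening_tag(tag):
    """Hypothetical helper: rebuild an approximate opening tag from a Tag."""
    parts = [tag.name]
    for key, value in tag.attrs.items():
        # Multi-valued attributes such as class are stored as lists.
        if isinstance(value, list):
            value = " ".join(value)
        # Values are inserted as-is, without escaping.
        parts.append(f"{key}='{value}'")
    return "<" + " ".join(parts) + ">"

soup = BeautifulSoup(html, "html.parser")
list_of_tags = [opening_tag(tag) for tag in soup.find_all()]
print(list_of_tags)  # ['<div>', '<div>', "<div class='magical'>", '<p>']
```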
Source: https://habr.com/ru/post/1245400/

