Get all HTML tags using Beautiful Soup

Question


I am trying to get a list of all HTML tags from Beautiful Soup.

I see find_all(), but it seems I need to know the tag name before searching.

Given some HTML, for example:

html = """<div>something</div> <div>something else</div> <div class='magical'>hi there</div> <p>ok</p>"""

how do I get a list such as:

 list_of_tags = ["<div>", "<div>", "<div class='magical'>", "<p>"]

I know how to do this with a regex, but I'm trying to learn BS4.

+5

python html beautifulsoup

humanbeing Mar 19 '16 at 23:43


1 answer

alecxe · Accepted Answer · 2016-03-20T00:25:38+0000

You don't need to pass any arguments to find_all(). With no arguments, BeautifulSoup returns every tag in the tree, recursively. Example:

    >>> from bs4 import BeautifulSoup
    >>>
    >>> html = """<div>something</div>
    ... <div>something else</div>
    ... <div class='magical'>hi there</div>
    ... <p>ok</p>"""
    >>> soup = BeautifulSoup(html, "html.parser")
    >>> [tag.name for tag in soup.find_all()]
    [u'div', u'div', u'div', u'p']
    >>> [str(tag) for tag in soup.find_all()]
    ['<div>something</div>', '<div>something else</div>', '<div class="magical">hi there</div>', '<p>ok</p>']
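The two comprehensions above give either the bare tag name or the whole element including its contents. If only the opening tag is wanted, as in the question's list_of_tags, one possible approach is to rebuild it from tag.name and tag.attrs. This is a minimal sketch, not a built-in BeautifulSoup feature: opening_tag is a hypothetical helper, and it assumes single-quoted attribute values with no escaping.

```python
from bs4 import BeautifulSoup

html = """<div>something</div>
<div>something else</div>
<div class='magical'>hi there</div>
<p>ok</p>"""


def opening_tag(tag):
    """Rebuild an opening tag string from a Tag's name and attributes.

    Hypothetical helper for illustration: values are wrapped in single
    quotes and are not escaped, so it is only a rough reconstruction.
    """
    parts = [tag.name]
    for key, value in tag.attrs.items():
        # Multi-valued attributes such as class are stored as lists.
        if isinstance(value, list):
            value = " ".join(value)
        parts.append("{}='{}'".format(key, value))
    return "<" + " ".join(parts) + ">"


soup = BeautifulSoup(html, "html.parser")
list_of_tags = [opening_tag(tag) for tag in soup.find_all()]
print(list_of_tags)
```

For the sample HTML this should reproduce the list from the question. Note that the result is rebuilt from the parsed tree, so it may not match the original markup character for character (quoting style and whitespace inside the tag are not preserved).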
