How to get a list of all parent tags in BeautifulSoup?

Say I have a structure like this:

<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>

If I point to a bookmark, which command will retrieve all of the folders that enclose it? For instance,

bookmarks = soup.findAll('bookmark')

then beautifulsoupcommand(bookmarks[0]) will return:

[<folder name="folder1">,<folder name="folder2">]

I also want to know when end tags are deleted. Any ideas?

Thanks in advance!

2 answers

Here is my attempt:

>>> from BeautifulSoup import BeautifulSoup
>>> html = """<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>
"""
>>> soup = BeautifulSoup(html)
>>> bookmarks = soup.findAll('bookmark')
>>> [p.get('name') for p in bookmarks[0].findAllPrevious(name = 'folder')]
[u'folder2', u'folder1']

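The key difference from @eumiro's answer is that I use findAllPrevious instead of findParents. When I tested @eumiro's solution, I found that findParents returned only the first (immediate) parent, since the parent and the grandparent have the same name. Note that findAllPrevious matches every earlier folder tag in the document, not only ancestors, so it can return extra tags when other folders precede the bookmark.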

>>> [p.get('name') for p in bookmarks[0].findParents('folder')]
[u'folder2']

>>> [p.get('name') for p in bookmarks[0].findParents()]
[u'folder2', None]

If the parent and the grandparent have different names, findParents returns both:

>>> html = """<folder name="folder1">
     <folder_parent name="folder2">
          <bookmark href="link.html">
     </folder_parent>
</folder>
"""
>>> soup = BeautifulSoup(html)
>>> bookmarks = soup.findAll('bookmark')
>>> [p.get('name') for p in bookmarks[0].findParents()]
[u'folder2', u'folder1', None]
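
For comparison, here is a minimal sketch of the same idea without BeautifulSoup, using only the standard library's html.parser. It is not how BeautifulSoup works internally; it simply keeps a stack of open tags and snapshots that stack whenever the target tag starts. The class name AncestorCollector and its attributes are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class AncestorCollector(HTMLParser):
    """Record the chain of open tags above each occurrence of a target tag."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.open_tags = []   # stack of (tag, attrs dict) currently open
        self.ancestors = []   # one list of ancestors per target tag found

    def handle_starttag(self, tag, attrs):
        if tag == self.target:
            # Snapshot the stack, outermost ancestor first.
            self.ancestors.append(list(self.open_tags))
        self.open_tags.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag; this also discards
        # unclosed tags such as <bookmark> in the sample markup.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i:]
                break


html = """<folder name="folder1">
     <folder name="folder2">
          <bookmark href="link.html">
     </folder>
</folder>
"""

collector = AncestorCollector("bookmark")
collector.feed(html)
collector.close()

# Names of the folders enclosing the first bookmark, outermost first.
parent_names = [attrs.get("name") for tag, attrs in collector.ancestors[0]]
print(parent_names)
```

Because the stack only ever holds tags that are still open, same-named nested tags are kept apart, and the result lists true ancestors only, in outermost-first order.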

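bookmarks[0].findParents('folder') returns a list of the parent folder tags. You can then iterate over them and read each one's name attribute.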


Source: https://habr.com/ru/post/1765636/

