Extract text between HTML comments using BeautifulSoup

Using Python 3 and BeautifulSoup 4, I would like to extract text from an HTML page that is identified only by the comment above it. Example:

<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text

I found various ways to extract page text or comments, but none of them does what I'm looking for. Any help would be greatly appreciated.

3 answers

You just need to iterate over all the comments, check whether each one is among the comments you are looking for, and then print the text of the element that follows it:

from bs4 import BeautifulSoup, Comment

html = """
<html>
<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
</html>
"""
soup = BeautifulSoup(html, 'lxml')

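# Select only Comment nodes, then keep the ones we are interested in
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if comment in ['UNIQUE COMMENT', 'SECOND UNIQUE COMMENT']:
        # The element right after the comment is the text we want
        print(comment.next_element.strip())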

This prints:

I would like to get this text
I would also like to find this text
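
One caveat with `next_element`: it returns only the single node after the comment, so if the wanted text contains inline tags, only the first fragment is returned. A minimal sketch of one way around this is to walk `next_siblings` until the next comment. The helper name `text_after_comment` and the `<b>` tag added to the sample HTML are illustrative assumptions, and `html.parser` is used here only to avoid depending on lxml:

```python
from bs4 import BeautifulSoup, Comment, NavigableString

# Sample HTML; the <b> tag is added here only to illustrate mixed content.
html = """
<html>
<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get <b>this</b> text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
</html>
"""


def text_after_comment(soup, marker):
    """Hypothetical helper: text between the comment `marker` and the next
    sibling comment (or the end of the parent element)."""
    comment = soup.find(
        string=lambda s: isinstance(s, Comment) and s.strip() == marker
    )
    if comment is None:
        return None
    parts = []
    for node in comment.next_siblings:
        if isinstance(node, Comment):
            break  # reached the next marker comment
        if isinstance(node, NavigableString):
            parts.append(str(node))
        else:
            parts.append(node.get_text())  # a Tag: take its text content
    # Collapse runs of whitespace and newlines into single spaces
    return " ".join("".join(parts).split())


soup = BeautifulSoup(html, "html.parser")
first = text_after_comment(soup, "UNIQUE COMMENT")
second = text_after_comment(soup, "SECOND UNIQUE COMMENT")
print(first)
print(second)
```

This only looks at siblings of the comment, so it assumes the marker comments and the text sit at the same level of the tree, as they do in the question.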

A variation on the answer above: search for the specific comments directly in the filter function:

comments_to_search_for = {'UNIQUE COMMENT', 'SECOND UNIQUE COMMENT'}
for comment in soup.find_all(string=lambda text: isinstance(text, Comment) and text in comments_to_search_for):
    print(comment.next_element.strip())

I would like to get this text
I would also like to find this text
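
Note that the exact match above depends on the comment having no padding: `<!-- UNIQUE COMMENT -->` yields the string `' UNIQUE COMMENT '`, which is not in the set. A small sketch of a whitespace-tolerant filter follows; the function name `is_wanted_comment` and the padded sample comment are assumptions made for illustration, and `html.parser` is used only to avoid the lxml dependency:

```python
from bs4 import BeautifulSoup, Comment

# The first comment is padded with spaces on purpose.
html = """
<body>
<!-- UNIQUE COMMENT -->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
"""

wanted = {"UNIQUE COMMENT", "SECOND UNIQUE COMMENT"}


def is_wanted_comment(node):
    # A Comment keeps any padding inside <!-- ... -->, so strip it
    # before checking membership.
    return isinstance(node, Comment) and node.strip() in wanted


soup = BeautifulSoup(html, "html.parser")
found = [c.next_element.strip() for c in soup.find_all(string=is_wanted_comment)]
print(found)
```

Passing a named function instead of a lambda also keeps the `find_all` call short when the matching rule grows.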

The bs4 Python module has a Comment class. You can use it to extract the comments.

from bs4 import BeautifulSoup, Comment

html = """
<html>
<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
</html>
"""
soup = BeautifulSoup(html, 'lxml')
# Collect every Comment node in the document
comments = soup.find_all(string=lambda text: isinstance(text, Comment))

This gives you the Comment elements:

['UNIQUE COMMENT', 'SECOND UNIQUE COMMENT']

Source: https://habr.com/ru/post/1623400/

