
HTML processing

I want to process some HTML code and remove the tags, as in the example:

"<p><b>This</b> is a very interesting paragraph.</p>" should become "This is a very interesting paragraph."

I'm using Python. Do you know of any library I can use to remove HTML tags?

Thanks!


This question may help you: Strip HTML from strings in Python
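For reference, here is a minimal sketch of a standard-library approach using Python 3's html.parser module, which collects only the text between tags. The names TagStripper and strip_tags are illustrative and not necessarily the code from the linked question.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects only the text outside of tags, discarding the tags themselves."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for each run of text found between tags
        self.parts.append(data)

    def get_text(self):
        return "".join(self.parts)


def strip_tags(html):
    stripper = TagStripper()
    stripper.feed(html)
    stripper.close()  # flush any buffered text at the end of the input
    return stripper.get_text()


print(strip_tags("<p><b>This</b> is a very interesting paragraph.</p>"))
# prints: This is a very interesting paragraph.
```

Unlike an XML parser, html.parser does not require well-formed input, so unclosed tags such as a stray </p> do not raise an error.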


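import libxml2

text = "<p><b>This</b> is a very interesting paragraph.</p>"
# parseDoc expects well-formed XML; libxml2.htmlParseDoc(text, None) is more forgiving of real-world HTML
root = libxml2.parseDoc(text)
# .content returns the concatenated text of all descendant nodes
print(root.content)
root.freeDoc()  # documents from the libxml2 bindings must be freed manually

# prints: This is a very interesting paragraph.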
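If installing the libxml2 bindings is inconvenient, the standard library's xml.etree.ElementTree can do the same job for well-formed input. This is a swapped-in alternative, not part of the answer above, and the helper name text_content is hypothetical.

```python
import xml.etree.ElementTree as ET


def text_content(markup):
    # ET.fromstring requires well-formed XML, just like libxml2.parseDoc
    root = ET.fromstring(markup)
    # itertext() yields every text and tail fragment in document order
    return "".join(root.itertext())


print(text_content("<p><b>This</b> is a very interesting paragraph.</p>"))
# prints: This is a very interesting paragraph.
```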

You can match the tags with the regular expression /<(.|\n)*?>/ and replace them with an empty string.
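A minimal sketch of that regex approach using Python 3's re module; the names TAG_RE and remove_tags are illustrative.

```python
import re

# "<", then any characters including newlines, matched lazily
# (as few as possible) up to the first ">"
TAG_RE = re.compile(r"<(.|\n)*?>")


def remove_tags(html):
    # Replace every matched tag with an empty string
    return TAG_RE.sub("", html)


print(remove_tags("<p><b>This</b> is a very interesting paragraph.</p>"))
# prints: This is a very interesting paragraph.
```

This is fragile: a > inside an attribute value or a comment ends the match early, and entities such as &amp; are left undecoded. The (.|\n) alternation is only there because . does not match newlines by default; r"<.*?>" compiled with re.DOTALL behaves the same way.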


Source: https://habr.com/ru/post/1770940/

