HTML processing

Question


I want to process some HTML code and remove the tags, as in the example:

"<b> This </b> is a very interesting paragraph. </p>" leads to "This is a very interesting paragraph."

I'm using Python. Do you know of any framework I can use to remove the HTML tags?

Thanks!

+3

python html-parsing

Laurențiu Dascălu Oct 22 '10 at 15:07


5 answers

BeautifulSoup
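As a minimal sketch of what this suggestion could look like, the snippet below assumes the `bs4` package (BeautifulSoup 4) is installed and uses Python's built-in `html.parser` backend. The helper name `strip_tags_bs4` is hypothetical and only for illustration.

```python
from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def strip_tags_bs4(html):
    """Return only the text nodes of an HTML fragment (hypothetical helper)."""
    # "html.parser" is the parser backend that ships with Python itself.
    soup = BeautifulSoup(html, "html.parser")
    # get_text() concatenates every text node and drops the tags.
    return soup.get_text()


print(strip_tags_bs4("<p><b>This</b> is a very interesting paragraph.</p>"))
```

Because BeautifulSoup is built to tolerate sloppy markup, this should also cope with fragments that have unbalanced tags.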

+4

kevingessner Oct 22 '10 at 15:11

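import libxml2

text = "<p><b>This</b> is a very interesting paragraph.</p>"
# parseDoc expects well-formed XML, so every tag must be closed
root = libxml2.parseDoc(text)
# .content is the text of the document with the tags removed
print(root.content)
# release the underlying libxml2 document
root.freeDoc()

# 'This is a very interesting paragraph.'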

+2

eumiro Oct 22 '10 at 15:14

A regular expression such as /<(.|\n)*?>/ will match the tags.
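A rough sketch of how the pattern above might be applied with Python's standard `re` module follows. The helper name `strip_tags_regex` and the whitespace clean-up step are assumptions added for illustration, not part of the original answer.

```python
import re

# Same pattern as above: "<", then as few characters as possible
# (newlines included), then ">".
TAG_RE = re.compile(r"<(.|\n)*?>")


def strip_tags_regex(html):
    """Delete anything that looks like a tag, then tidy the whitespace."""
    without_tags = TAG_RE.sub("", html)
    # Removing tags tends to leave doubled or leading spaces behind.
    return " ".join(without_tags.split())


print(strip_tags_regex("<b> This </b> is a very interesting paragraph. </p>"))
```

This is only a heuristic: a `>` inside an attribute value, a comment, or the contents of a `<script>` element can all defeat the pattern, so it is best kept for simple, trusted input.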

0

Daniel Mendel Oct 22 '10 at 15:16

lxml.
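One way this might look in practice, assuming the third-party `lxml` package is installed; the helper name `strip_tags_lxml` is hypothetical.

```python
import lxml.html  # third-party: pip install lxml


def strip_tags_lxml(html):
    """Parse an HTML fragment and return its text content (hypothetical helper)."""
    element = lxml.html.fromstring(html)
    # text_content() returns the text of the element and all its descendants.
    return str(element.text_content())


print(strip_tags_lxml("<p><b>This</b> is a very interesting paragraph.</p>"))
```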

0

ghostdog74 Oct 22 '10 at 15:26

Colin O'Dell · Accepted Answer · 2010-10-22T15:11:28+0000

This question may help you: Strip HTML from strings in Python
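For readers who want to avoid third-party packages, here is a minimal sketch built on the standard library's `html.parser` module (Python 3). The class name `TagStripper` and the function `strip_tags` are illustrative names chosen here, and this is not necessarily the approach taken in the linked question.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Collects the text between tags and ignores the tags themselves."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called by the parser for every run of plain text.
        self._chunks.append(data)

    def get_text(self):
        return "".join(self._chunks)


def strip_tags(html):
    parser = TagStripper()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return parser.get_text()


print(strip_tags("<p><b>This</b> is a very interesting paragraph.</p>"))
```

Since the parser is event-driven rather than tree-building, it does not require balanced tags, which suits fragments like the one in the question.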

, , . , - HTML, HTML .
