How can I use Python to replace HTML escape characters?

Question

Possible duplicate:
Decode HTML entities in Python string?

I have a string full of HTML escape sequences such as &quot;, along with the entities for ” and —.

Do any Python libraries offer a reliable way to replace all of these escape sequences with their corresponding actual characters?

For example, I want every &quot; replaced with ".

+7

python

dangerChihuahua007 Jul 10 '12 at 2:55

1 answer

Francis yaconiello · Accepted Answer · 2012-07-10T03:04:14+0000

You want to use this:

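    try:
        # Python 3.4+; HTMLParser.unescape was removed in Python 3.9
        from html import unescape
    except ImportError:
        # Python 2 (ModuleNotFoundError does not exist there, so catch ImportError)
        from HTMLParser import HTMLParser
        unescape = HTMLParser().unescape

    html_decoded_string = unescape(html_encoded_string)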
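To see what the decoding does, here is a minimal sketch that assumes Python 3.4 or later, where the standard library provides html.unescape. The sample string is made up for illustration and mixes a named entity, a numeric entity, and &amp;.

```python
import html

# Hypothetical sample input, not taken from the question.
html_encoded_string = "&quot;Hello&quot; &#8221;world&#8221; &mdash; done &amp; dusted"

# html.unescape converts named and numeric character references
# to the corresponding Unicode characters.
html_decoded_string = html.unescape(html_encoded_string)

print(html_decoded_string)
```

Text without any entities passes through unchanged, so calling it on already-decoded strings is harmless unless they contain a literal sequence such as &amp;.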

I also see a lot of love for BeautifulSoup:

    # BeautifulSoup 3 API
    from BeautifulSoup import BeautifulSoup

    html_decoded_string = BeautifulSoup(
        html_encoded_string,
        convertEntities=BeautifulSoup.HTML_ENTITIES,
    )

Also, this question duplicates these existing questions:

Decode HTML entities in Python string?

Decoding HTML Entities with Python
