How can I use Python to replace HTML escape characters?

Possible duplicate:
Decode HTML entities in Python string?

I have a string full of HTML escape characters such as `&quot;`, `&rdquo;` and `&mdash;`.

Are there any reliable ways in any Python libraries to replace all these escape characters with the corresponding actual characters?

For example, I want to replace every `&quot;` with `"`.

1 answer

You can use the standard library for this:

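```python
try:
    # Python 3.4+: the standard library exposes html.unescape()
    from html import unescape
except ImportError:
    # Python 2: fall back to the undocumented HTMLParser.unescape() method
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

html_decoded_string = unescape(html_encoded_string)
```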
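A quick check of what `html.unescape` does with the kinds of references from the question. The sample string is made up for illustration; it mixes named entities (`&quot;`, `&mdash;`) with decimal (`&#8221;`) and hexadecimal (`&#x2014;`) numeric references, all of which are decoded in one call.

```python
import html

# Hypothetical sample input mixing named and numeric character references.
encoded = "&quot;Quoted&#8221; &mdash; and a hex one: &#x2014;"

decoded = html.unescape(encoded)
print(decoded)  # "Quoted” — and a hex one: —
```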

BeautifulSoup is also a popular choice:

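```python
# beautifulsoup4 (pip install beautifulsoup4); entities are decoded while parsing.
from bs4 import BeautifulSoup

# Note: get_text() returns only the text, so any tags in the input are dropped.
html_decoded_string = BeautifulSoup(html_encoded_string, "html.parser").get_text()
```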
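When choosing between the two, it may help to know that `html.unescape` works purely on the string: it decodes character references but leaves any markup in place, so nothing is parsed or stripped. A small sketch with a made-up fragment:

```python
import html

# Hypothetical fragment: escaped text inside a real <p> tag.
fragment = "<p>Tom &amp; Jerry &ndash; &lt;classic&gt;</p>"

result = html.unescape(fragment)
print(result)  # <p>Tom & Jerry – <classic></p>
```

The `<p>` tags survive unchanged, and `&lt;classic&gt;` becomes literal `<classic>`, so the output is no longer safe to re-insert into HTML as-is.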

This question is also a duplicate of these existing ones:

Decode HTML entities in Python string?

Decoding HTML Entities With Python


Source: https://habr.com/ru/post/920092/