With lxml as a backend parser:
import html5lib body = "<p>Hello World. Greetings from <strong>Mars.</strong></p>" doc = html5lib.parse(body, treebuilder="lxml") print doc.text_content()
Honestly, this is actually a hoax, as it is equivalent to the following (only the relevant parts change):
from lxml import html doc = html.fromstring(body) print doc.text_content()
If you really need the html5lib :
from lxml.html import html5parser doc = html5parser.fromstring(body) print doc.xpath("string()")
source share