Recommended way to generate XHTML documents using lxml

The Python lxml library seems to provide several builders for creating HTML documents. What is the difference between the two?

But they generate plain HTML, not XHTML. Although I could manually add xmlns declarations, this is not elegant. So what is the recommended way to create XHTML documents with lxml?

lxml.builder.E

Example from http://lxml.de/tutorial.html#the-e-factory :

  >>> from lxml import etree
  >>> from lxml.builder import E
  >>> def CLASS(*args): # class is a reserved word in Python
  ...     return {"class":' '.join(args)}
  >>> html = page = (
  ...   E.html(       # create an Element called "html"
  ...     E.head(
  ...       E.title("This is a sample document")
  ...     ),
  ...     E.body(
  ...       E.h1("Hello!", CLASS("title")),
  ...       E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
  ...       E.p("This is another paragraph, with a", "\n ",
  ...         E.a("link", href="http://www.python.org"), "."),
  ...       E.p("Here are some reserved characters: <spam&egg>."),
  ...       etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
  ...     )
  ...   )
  ... )

lxml.html.builder

Example from http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory :

  >>> import lxml.html
  >>> from lxml.html import builder as E
  >>> from lxml.html import usedoctest
  >>> html = E.HTML(
  ...   E.HEAD(
  ...     E.LINK(rel="stylesheet", href="great.css", type="text/css"),
  ...     E.TITLE("Best Page Ever")
  ...   ),
  ...   E.BODY(
  ...     E.H1(E.CLASS("heading"), "Top News"),
  ...     E.P("World News only on this page", style="font-size: 200%"),
  ...     "Ah, and here some more text, by the way.",
  ...     lxml.html.fromstring("<p>... and this is a parsed fragment ...</p>")
  ...   )
  ... )
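A quick way to confirm that both builders really produce un-namespaced HTML is to inspect the tags they create. This is my own small check, not part of the lxml examples above; it assumes an lxml version where etree.QName accepts an element.

```python
from lxml import etree
from lxml.builder import E
from lxml.html import builder as H

# Build the same tiny page with each of the two builders from the question.
generic_root = E.html(E.body(E.p("Hello")))
html_root = H.HTML(H.BODY(H.P("Hello")))

for root in (generic_root, html_root):
    # QName(element).namespace is None when the tag carries no namespace URI.
    print(root.tag, etree.QName(root).namespace)  # html None
    # No xmlns declaration appears in the XML serialization either.
    print(etree.tostring(root))  # b'<html><body><p>Hello</p></body></html>'
```

Both loops print the same thing, which is why neither builder on its own gives you XHTML.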

Python's lxml library provides several builders for creating HTML documents. What is the difference between the two?

lxml.html.builder uses the factory pattern:

  import lxml.html
  from lxml.html import builder as E
  # "from lxml.html import usedoctest" is omitted: it only works inside a doctest

  html = E.HTML(
    E.HEAD(
      E.LINK(rel="stylesheet", href="great.css", type="text/css"),
      E.TITLE("Best Page Ever")
    ),
    E.BODY(
      E.H1(E.CLASS("heading"), "Top News"),
      E.P("World News only on this page", style="font-size: 200%"),
      "Ah, and here some more text, by the way.",
      lxml.html.fromstring("<p>... and this is a parsed fragment ...</p>")
    )
  )
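For the lxml.html.builder route, lxml.html also has an html_to_xhtml() helper that moves every un-namespaced tag of an existing tree into the XHTML namespace, in place. A minimal sketch, assuming that helper is available in your lxml version. Since no default-namespace mapping is declared, the XML serializer may pick a generated prefix (something like ns0) rather than writing a plain xmlns declaration.

```python
import lxml.html
from lxml import etree
from lxml.html import builder as E

XHTML_NS = "http://www.w3.org/1999/xhtml"

html = E.HTML(
    E.HEAD(E.TITLE("Best Page Ever")),
    E.BODY(E.H1(E.CLASS("heading"), "Top News")),
)

# Rewrites each tag in place, e.g. "h1" becomes "{http://www.w3.org/1999/xhtml}h1".
lxml.html.html_to_xhtml(html)

print(html.tag)  # {http://www.w3.org/1999/xhtml}html
# Serialize as XML; the namespace prefix chosen here is up to lxml.
print(etree.tostring(html, pretty_print=True).decode())
```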

lxml.builder uses the prototype pattern:

  from lxml import etree
  from lxml.builder import E

  def CLASS(*args):  # class is a reserved word in Python
      return {"class": " ".join(args)}

  html = page = (
    E.html(  # create an Element called "html"
      E.head(
        E.title("This is a sample document")
      ),
      E.body(
        E.h1("Hello!", CLASS("title")),
        E.p("This is a paragraph with ", E.b("bold"), " text in it!"),
        E.p("This is another paragraph, with a", "\n",
          E.a("link", href="http://www.python.org"), "."),
        E.p("Here are some reserved characters: <spam&egg>."),
        etree.XML("<p>And finally an embedded XHTML fragment.</p>"),
      )
    )
  )
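Whatever one calls the two patterns, one difference you can check directly is the element class. As far as I can tell from the lxml sources, lxml.html.builder is an ElementMaker wired to the HTML parser's makeelement, so it returns lxml.html.HtmlElement objects with helpers such as text_content(), while lxml.builder.E returns plain lxml.etree elements. A small sketch:

```python
import lxml.html
from lxml.builder import E
from lxml.html import builder as H

generic = E.html(E.body(E.p("Hello ", E.b("world"))))
html_specific = H.HTML(H.BODY(H.P("Hello ", H.B("world"))))

# lxml.html.builder elements are HtmlElement instances with lxml.html helpers.
print(isinstance(html_specific, lxml.html.HtmlElement))  # True
print(html_specific.text_content())                       # Hello world

# lxml.builder.E elements are plain etree elements without those helpers.
print(isinstance(generic, lxml.html.HtmlElement))         # False
print(hasattr(generic, "text_content"))                   # False
```

Neither class changes the namespace situation, so the choice between them is mostly about which API you want on the resulting tree.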

While I could manually add xmlns declarations, this is inelegant.

XSLT would be another option:

  <?xml version="1.0" encoding="utf-8"?>
  <xsl:stylesheet version="1.0"
      xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xmlns="http://www.w3.org/1999/xhtml">
    <xsl:output method="xml" encoding="utf-8" version="" indent="yes"
        standalone="no" media-type="text/html" omit-xml-declaration="no"
        doctype-system="about:legacy-compat" />
    <xsl:template match="/">
      <html xmlns="http://www.w3.org/1999/xhtml">
        <xsl:copy-of select="."/>
      </html>
    </xsl:template>
  </xsl:stylesheet>
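To run an XSLT from Python, lxml.etree.XSLT compiles a stylesheet into a callable that transforms a tree. The stylesheet in this sketch is a swapped-in variant, not the one above: that template copies the input inside a new html element, which (if I read it right) would nest the input's own html root, so this version instead rebuilds each element under the same local name in the XHTML namespace.

```python
from lxml import etree
from lxml.builder import E

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical variant stylesheet: move every element into the XHTML namespace.
XSL = etree.XML("""\
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="utf-8" indent="yes"/>
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="http://www.w3.org/1999/xhtml">
      <!-- attributes must be copied before any child content -->
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:element>
  </xsl:template>
  <xsl:template match="text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>
</xsl:stylesheet>""")

to_xhtml = etree.XSLT(XSL)

# Plain, un-namespaced input built with lxml.builder.E.
doc = E.html(
    E.head(E.title("Test page")),
    E.body(E.p("Hello world", {"class": "greeting"})),
)

result = to_xhtml(etree.ElementTree(doc))
root = result.getroot()
print(root.tag)     # {http://www.w3.org/1999/xhtml}html
print(str(result))  # serialized according to xsl:output
```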


Mixing ElementMaker and E from lxml.builder does the trick for me:

  from lxml import etree
  from lxml.builder import ElementMaker, E

  M = ElementMaker(namespace=None, nsmap={None: "http://www.w3.org/1999/xhtml"})
  html = M.html(E.head(E.title("Test page")), E.body(E.p("Hello world")))
  result = etree.tostring(
      html,
      xml_declaration=True,
      doctype='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">',
      encoding='utf-8',
      standalone=False,
      with_tail=False,
      method='xml',
      pretty_print=True,
  )
  print(result.decode('utf-8'))  # tostring() returns bytes when an encoding is given

Result

  <?xml version='1.0' encoding='utf-8' standalone='no'?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
  <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title>Test page</title>
    </head>
    <body>
      <p>Hello world</p>
    </body>
  </html>
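One caveat with namespace=None: the root only carries an xmlns declaration, while lxml still treats the tags as having no namespace (html.tag is 'html'), so namespace-aware lookups on the in-memory tree will not find XHTML elements until the output is re-parsed. A sketch of a variant (my assumption, not the answer's code) that also passes the XHTML URI as namespace, so every element really lives in the XHTML namespace:

```python
from lxml import etree
from lxml.builder import ElementMaker

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The answer's maker: declares the default namespace but leaves tags un-namespaced.
naive = ElementMaker(namespace=None, nsmap={None: XHTML_NS})
# Variant: put each tag in the XHTML namespace and map it to the default prefix.
X = ElementMaker(namespace=XHTML_NS, nsmap={None: XHTML_NS})

naive_html = naive.html(naive.body(naive.p("Hello world")))
html = X.html(X.head(X.title("Test page")), X.body(X.p("Hello world")))

print(naive_html.tag)  # html
print(html.tag)        # {http://www.w3.org/1999/xhtml}html

result = etree.tostring(
    html,
    xml_declaration=True,
    doctype='<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" '
            '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">',
    encoding="utf-8",
    pretty_print=True,
)
print(result.decode("utf-8"))
```

Because the namespace is mapped to the default prefix, the serialized output should carry the xmlns declaration on the root with unprefixed tags, much like the result shown above, but the in-memory tree now agrees with it.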

Source: https://habr.com/ru/post/917322/

