Define default namespace (no prefix) in lxml

When rendering XHTML using lxml, everything is fine, unless you are using Firefox, which seems to be unrelated to XHTML elements with pre-encrypted space and javascript. Although Opera can execute javascript (this is true for both jQuery and MathJax), regardless of whether the XHTML namespace has a prefix ( h: in my case) or not, in Firefox scripts will be interrupted with strange errors ( this.head undefined in case of MathJax).

I know about the register_namespace function, but does not accept None and "" as a namespace prefix. I heard about _namespace_map in the lxml.etree module, but my Python complains that this attribute does not exist (version problem?)

Is there any other way to remove the namespace prefix for the XHTML namespace? Please note that str.replace , as suggested in the answer to another related question, is not a method that I could accept, since it does not know the XML semantics and can easily ruin the resulting document.

Upon request, you will find two examples ready to use. One with namespace prefixes and one without . The first one will display 0 in Firefox (wrong), and the second will display 1 (correct). Opera will do both the right thing. This is obviously a Firefox bug , but it justifies the need for prefix XHTML with lxml - there are other good reasons to reduce traffic for mobile clients, etc. (even h: pretty much if you count dozens or hundret html tags).

+2
source share
2 answers

This XSL transform removes all prefixes from content , preserving the namespaces defined in the root directory of the node:

 import lxml.etree as ET content = '''\ <?xml version='1.0' encoding='utf-8'?> <!DOCTYPE html> <h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns:ml="http://foo"> <h:head> <h:title>MathJax Test Page</h:title> <h:script type="text/javascript"><![CDATA[ function test() { alert(document.getElementsByTagName("p").length); }; ]]></h:script> </h:head> <h:body onload="test();"> <h:p>test</h:p> <ml:foo></ml:foo> </h:body> </h:html> ''' dom = ET.fromstring(content) xslt = '''\ <xsl:stylesheet version="1.0" xmlns="http://www.w3.org/1999/xhtml" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" indent="no"/> <!-- identity transform for everything else --> <xsl:template match="/|comment()|processing-instruction()|*|@*"> <xsl:copy> <xsl:apply-templates /> </xsl:copy> </xsl:template> <!-- remove NS from XHTML elements --> <xsl:template match="*[namespace-uri() = 'http://www.w3.org/1999/xhtml']"> <xsl:element name="{local-name()}"> <xsl:apply-templates select="@*|node()" /> </xsl:element> </xsl:template> <!-- remove NS from XHTML attributes --> <xsl:template match="@*[namespace-uri() = 'http://www.w3.org/1999/xhtml']"> <xsl:attribute name="{local-name()}"> <xsl:value-of select="." /> </xsl:attribute> </xsl:template> </xsl:stylesheet> ''' xslt_doc = ET.fromstring(xslt) transform = ET.XSLT(xslt_doc) dom = transform(dom) print(ET.tostring(dom, pretty_print = True, encoding = 'utf-8')) 

gives

 <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title>MathJax Test Page</title> <script type="text/javascript"> function test() { alert(document.getElementsByTagName("p").length); }; </script> </head> <body onload="test();"> <p>test</p> <ml:foo xmlns:ml="http://foo"/> </body> </html> 
+2
source

Use ElementMaker and give it nsmap , which maps None to the default namespace.

 #!/usr/bin/env python # dogeml.py from lxml.builder import ElementMaker from lxml import etree E = ElementMaker( nsmap={ None: "http://wow/" # <--- This is the special sauce } ) doge = E.doge( E.such('markup'), E.many('very namespaced', syntax="tricks") ) options = { 'pretty_print': True, 'xml_declaration': True, 'encoding': 'UTF-8', } serialized_bytes = etree.tostring(doge, **options) print(serialized_bytes.decode(options['encoding'])) 

As you can see in the output of this script, the default namespace is defined, but tags are not prefixed.

 <?xml version='1.0' encoding='UTF-8'?> <doge xmlns="http://wow/"> <such>markup</such> <many syntax="tricks">very namespaced</many> </doge> 

I tested this code with Python 2.7.6, 3.3.5 and 3.4.0 in combination with lxml 3.3.1.

+3
source

Source: https://habr.com/ru/post/951070/


All Articles