Unwanted namespace declaration in lxml XPath

I want to select the first child of a specific element (subelement), but the child's namespace differs from its parent's. Moreover, the child can be in any namespace.

    from lxml import etree

    xml = '''<root xmlns="default_ns">
        <subelement>
            <!-- here we can have an element of any namespace -->
            <some_prefix:a xmlns:some_prefix="some_namespace">
                <some_prefix:b/>
            </some_prefix:a>
        </subelement>
    </root>'''

    root = etree.fromstring(xml)
    # Bind the document's default namespace to a prefix usable in XPath
    evaluator = etree.XPathEvaluator(root, namespaces={'def': 'default_ns'})
    # First element child of subelement, whatever its namespace
    child = evaluator.evaluate('//def:subelement/child::*')[0]
    a_string = etree.tostring(child)
    print(a_string.decode())

This gives:

    <some_prefix:a xmlns:some_prefix="some_namespace" xmlns="default_ns">
        <some_prefix:b/>
    </some_prefix:a>

but what I want is the child without the namespace declaration xmlns="default_ns" inherited from the parent:

    <some_prefix:a xmlns:some_prefix="some_namespace">
        <some_prefix:b/>
    </some_prefix:a>
2 answers

Dimitre fully explained why namespaces are inherited and how to get rid of them using XSLT.

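I used deepcopy from the copy module to remove the unwanted namespace. The copy is detached from the original tree, so the default namespace declared on root is no longer in scope when the copy is serialized.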

This is my final solution using Python:

    from copy import deepcopy

    from lxml import etree

    xml = '''<root xmlns="default_ns">
        <subelement>
            <!-- here we can have an element of any namespace -->
            <some_prefix:a xmlns:some_prefix="some_namespace">
                <some_prefix:b/>
            </some_prefix:a>
        </subelement>
    </root>'''

    root = etree.fromstring(xml)
    evaluator = etree.XPathEvaluator(root, namespaces={'def': 'default_ns'})
    child = evaluator.evaluate('//def:subelement/child::*')[0]
    # Detach the element from its parents so their namespaces no longer apply
    child = deepcopy(child)
    a_string = etree.tostring(child)
    print(a_string.decode())

but what I want is the child without the namespace declaration xmlns="default_ns" inherited from the parent.

This cannot be achieved by evaluating an XPath expression alone.

In XML, every element inherits all of its parent's namespace nodes, unless it redeclares a particular namespace itself.

This means that some_prefix:a inherits the default namespace "default_ns" from its parent (subelement), which in turn inherits the same default namespace node from the top element, root.
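The inheritance described above can be observed directly. The following is a minimal sketch, not part of the original answer, that uses the standard library's xml.etree.ElementTree instead of lxml and records which namespace bindings are in scope when each element starts. The helper name in_scope_namespaces is hypothetical. In lxml, the nsmap property of an element exposes the same information.

```python
import io
import xml.etree.ElementTree as ET

XML = b'''<root xmlns="default_ns">
  <subelement>
    <some_prefix:a xmlns:some_prefix="some_namespace">
      <some_prefix:b/>
    </some_prefix:a>
  </subelement>
</root>'''


def in_scope_namespaces(xml_bytes):
    """Map each element tag to the prefix -> URI bindings in scope there.

    Hypothetical helper for illustration only.
    """
    stack = []    # currently open namespace declarations, innermost last
    result = {}
    events = ("start-ns", "start", "end-ns")
    for event, payload in ET.iterparse(io.BytesIO(xml_bytes), events=events):
        if event == "start-ns":
            stack.append(payload)    # payload is a (prefix, uri) tuple
        elif event == "end-ns":
            stack.pop()              # innermost declaration leaves scope
        else:
            # "start": payload is the element; inner bindings override outer
            result[payload.tag] = dict(stack)
    return result


scopes = in_scope_namespaces(XML)
for tag, bindings in scopes.items():
    print(tag, bindings)
```

For some_prefix:a the recorded bindings contain both "some_namespace" and the inherited "default_ns", even though the element itself only declares the former.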

XPath is a query language for XML documents. It only selects nodes: evaluating an XPath expression never destroys, adds, or modifies nodes, including namespace nodes.

Because of this, the default namespace node belonging to some_prefix:a cannot be destroyed by evaluating your XPath expression, so this namespace node shows up when some_prefix:a is serialized to text.

Solution: use your favorite programming language that hosts XPath to remove the unwanted namespace node.

For example, if the hosting language is XSLT:

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:d="default_ns">
        <xsl:output omit-xml-declaration="yes" indent="yes"/>
        <xsl:strip-space elements="*"/>

        <xsl:template match="/">
            <xsl:apply-templates mode="delNS" select="/*/d:subelement/*[1]"/>
        </xsl:template>

        <xsl:template match="*" mode="delNS">
            <xsl:element name="{name()}" namespace="{namespace-uri()}">
                <xsl:copy-of select="namespace::*[name()]"/>
                <xsl:copy-of select="@*"/>
                <xsl:apply-templates mode="delNS" select="node()"/>
            </xsl:element>
        </xsl:template>
    </xsl:stylesheet>

When this transformation is applied to the provided XML document:

    <root xmlns="default_ns">
        <subelement>
            <!-- here we can have an element of any namespace -->
            <some_prefix:a xmlns:some_prefix="some_namespace">
                <some_prefix:b/>
            </some_prefix:a>
        </subelement>
    </root>

the wanted, correct result is produced:

    <some_prefix:a xmlns:some_prefix="some_namespace">
        <some_prefix:b/>
    </some_prefix:a>
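Since the question uses lxml, the stylesheet above can probably be run from Python without leaving that library. The following is a sketch under that assumption: it relies on lxml's etree.XSLT, and the function name first_child_without_inherited_ns is hypothetical. The stylesheet text is the one from this answer, unchanged.

```python
from lxml import etree

XML = b'''<root xmlns="default_ns">
  <subelement>
    <!-- here we can have an element of any namespace -->
    <some_prefix:a xmlns:some_prefix="some_namespace">
      <some_prefix:b/>
    </some_prefix:a>
  </subelement>
</root>'''

# The stylesheet from the answer above, unchanged.
STYLESHEET = b'''<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:d="default_ns">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/">
    <xsl:apply-templates mode="delNS" select="/*/d:subelement/*[1]"/>
  </xsl:template>

  <xsl:template match="*" mode="delNS">
    <xsl:element name="{name()}" namespace="{namespace-uri()}">
      <xsl:copy-of select="namespace::*[name()]"/>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates mode="delNS" select="node()"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>'''


def first_child_without_inherited_ns(xml_bytes):
    """Apply the stylesheet and return the serialized result as text.

    Hypothetical helper name, for illustration only.
    """
    transform = etree.XSLT(etree.fromstring(STYLESHEET))
    result = transform(etree.fromstring(xml_bytes))
    # str() serializes according to the stylesheet's xsl:output settings
    return str(result)


output = first_child_without_inherited_ns(XML)
print(output)
```

Compared with deepcopy, this route keeps the selection and the namespace cleanup in one stylesheet, at the cost of maintaining XSLT next to the Python code.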

Source: https://habr.com/ru/post/909581/
