Remove all namespaces in lxml?

I am working with some Google Data APIs using the lxml library in Python. Namespaces are a huge pain. For most of what I do (mainly XPath), it would be nice to just ignore them.

Is there an easy way to ignore XML namespaces in Python/lxml?

Thanks!

2 answers

If you want to remove all namespaces from elements and attributes, I suggest the code shown below.

Context: in my application, I receive XML representations of SOAP response streams, but I'm not interested in building objects on the client side; I only care about the XML itself. Nor do I care about namespaces, which only make things more complicated than they need to be for my purposes. So I simply strip the namespaces from the elements and remove every attribute whose name carries a namespace.

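```python
import os
import lxml.etree as etree


def dropns(root):
    """Strip namespaces from element tags and drop namespaced attributes."""
    for elem in root.iter():
        # Comments and processing instructions have a non-string tag; skip them.
        if not isinstance(elem.tag, str):
            continue
        # Handles "{uri}local" (declared namespaces) as well as a literal
        # "prefix:local" (undeclared prefixes kept by recover=True).
        elem.tag = elem.tag.rsplit('}', 1)[-1].split(':')[-1]
        # Collect the names first: deleting while iterating attrib is unsafe.
        entries = [a for a in elem.attrib if ':' in a or a.startswith('{')]
        for entry in entries:
            del elem.attrib[entry]
    # Remove xmlns declarations that no element or attribute uses any more.
    etree.cleanup_namespaces(root)


# Test case
name = os.path.expanduser('~/tmp/mantisbt/test.xml')
parser = etree.XMLParser(ns_clean=True, recover=True)
with open(name, 'rb') as f:
    root = etree.parse(f, parser=parser)

sep = '====================================================================='
print(sep)
print(etree.tostring(root, pretty_print=True).decode(), end='')
print(sep)
dropns(root)
print(etree.tostring(root, pretty_print=True).decode(), end='')
print(sep)
```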

which prints:

```
=====================================================================
<SOAP-ENV:Envelope SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
  <SOAP-ENV:Body>
    <ns1:mc_issue_getResponse>
      <return xsi:type="tns:IssueData">
        <id xsi:type="xsd:integer">356</id>
        <view_state xsi:type="tns:ObjectRef">
          <id xsi:type="xsd:integer">10</id>
          <name xsi:type="xsd:string">public</name>
        </view_state>
      </return>
    </ns1:mc_issue_getResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
=====================================================================
<Envelope>
  <Body>
    <mc_issue_getResponse>
      <return>
        <id>356</id>
        <view_state>
          <id>10</id>
          <name>public</name>
        </view_state>
      </return>
    </mc_issue_getResponse>
  </Body>
</Envelope>
=====================================================================
```
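The test file above appears to lack xmlns declarations (none show up in the printed output), so with recover=True the prefixes stay in the tag names literally. When namespaces are properly declared, lxml stores tags as {uri}local instead. The following self-contained sketch applies the same idea to such a document; the helper name strip_namespaces, the sample document and the urn:example URI are all hypothetical. It handles both tag forms, skips comments, and calls etree.cleanup_namespaces to drop the declarations that become unused.

```python
import lxml.etree as etree

# Hypothetical sample document with declared prefixes, so lxml stores tags in
# Clark notation ("{uri}local"). urn:example is a placeholder namespace URI.
SAMPLE = b"""<SOAP-ENV:Envelope
    xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:ns1="urn:example">
  <!-- a comment node, whose .tag is not a string -->
  <SOAP-ENV:Body>
    <ns1:mc_issue_getResponse>
      <id xsi:type="xsd:integer">356</id>
    </ns1:mc_issue_getResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>"""


def strip_namespaces(root):
    """Hypothetical helper: strip namespaces from tags, drop namespaced attributes."""
    for elem in root.iter():
        if not isinstance(elem.tag, str):  # skip comments and PIs
            continue
        # "{uri}local" -> "local", and "prefix:local" -> "local"
        elem.tag = elem.tag.rsplit('}', 1)[-1].split(':')[-1]
        for attr in [a for a in elem.attrib if a.startswith('{') or ':' in a]:
            del elem.attrib[attr]
    # Remove the now-unused xmlns declarations from the serialized output.
    etree.cleanup_namespaces(root)
    return root


root = strip_namespaces(etree.fromstring(SAMPLE))
# Plain XPath now works without a namespace map.
print(root.xpath('//id/text()'))
print(etree.tostring(root).decode())
```

Cleaning up the declarations is optional for XPath, which looks at each node's namespace rather than at the xmlns attributes, but it keeps the serialized XML free of stale prefixes.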

In lxml, some_element.tag is a string of the form {namespace-uri}local-name if the element has a namespace, and just local-name otherwise. Keep in mind that for non-element nodes (such as comments) tag is not a string at all.
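A quick way to see both points is to inspect a small tree. This sketch uses the hypothetical sample feed below and etree.QName, which splits a tag into its namespace and local name so you don't have to parse the {...} prefix by hand; comments are collected separately because their tag is not a string.

```python
import lxml.etree as etree

# Hypothetical sample: a feed in the Atom namespace with a comment and one entry.
doc = etree.fromstring(
    b'<feed xmlns="http://www.w3.org/2005/Atom">'
    b'<!-- not an element --><entry/></feed>'
)

names = []    # (namespace, local name) for each element
skipped = []  # non-element nodes, e.g. comments
for node in doc.iter():
    if isinstance(node.tag, str):
        # QName splits "{namespace-uri}local-name" for us.
        qn = etree.QName(node)
        names.append((qn.namespace, qn.localname))
    else:
        # For a comment, node.tag is the etree.Comment factory, not a string.
        skipped.append(node)

print(doc.tag)  # raw Clark-notation tag of the root element
print(names)
```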

Try the following:

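```python
for node in some_tree.iter():
    # node.tag is a string for elements but not for comments/PIs,
    # so look up startswith on the tag and skip nodes that lack it.
    startswith = getattr(node.tag, 'startswith', None)
    if startswith and startswith('{'):
        # Keep only the local name after "{namespace-uri}".
        node.tag = node.tag.rsplit('}', 1)[-1]
```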

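In Python 2.x, a tag can be either an ASCII byte string or a Unicode string. Both types have a startswith method, so looking it up with getattr works for either, and it quietly skips comments, whose tag has no such method.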
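To see why this helps with XPath, here is a self-contained run of the same loop on a hypothetical Atom-style feed (Atom being the format the Google Data APIs were built on). Before stripping, an unprefixed XPath step finds nothing, because in XPath 1.0 a bare name only matches elements in no namespace; afterwards the same expression matches. The etree.cleanup_namespaces call is an optional addition, not part of the answer above.

```python
import lxml.etree as etree

# Hypothetical feed fragment using the Atom default namespace.
some_tree = etree.ElementTree(etree.fromstring(
    b'<feed xmlns="http://www.w3.org/2005/Atom">'
    b'<!-- comments have no string tag -->'
    b'<entry><title>first</title></entry>'
    b'<entry><title>second</title></entry>'
    b'</feed>'
))

# Before stripping: bare names do not match namespaced elements.
before = some_tree.xpath('//entry/title/text()')

for node in some_tree.iter():
    startswith = getattr(node.tag, 'startswith', None)
    if startswith and startswith('{'):
        node.tag = node.tag.rsplit('}', 1)[-1]

# Optional: drop the now-unused xmlns declaration from serialized output.
etree.cleanup_namespaces(some_tree)

after = some_tree.xpath('//entry/title/text()')
print(before, after)
```

The alternative, if you would rather keep the tree intact, is to pass a prefix mapping to xpath() via its namespaces argument; stripping is simply more convenient when you never need the namespaces again.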


Source: https://habr.com/ru/post/1396892/

