Python: namespaces in xml ElementTree (or lxml)

I want to get the old xml file, manipulate and save it.

Here is my code:

from xml.etree import cElementTree as ET NS = "{http://www.somedomain.com/XI/Traffic/10}" def fix_xml(filename): f = ET.parse(filename) root = f.getroot() eventlist = root.findall("%(ns)Event" % {'ns':NS }) xpath = "%(ns)sEventDetail/%(ns)sEventDescription" % {'ns':NS } for event in eventlist: desc = event.find(xpath) desc.text = desc.text.upper() # do some editting to the text. ET.ElementTree(root, nsmap=NS).write("out.xml", encoding="utf-8") shorten_xml("test.xml") 

The download file contains:

 xmlns="http://www.somedomain.com/XI/Traffic/10" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.somedomain.com/XI/Traffic/10 10.xds" 

in the root tag.

I have the following namespace issues:

  • As you can see, for each tag call, I gave a namespace at the beginning to return the child.
  • The generated xml file does not have <?xml version="1.0" encoding="utf-8"?> At the beginning.
  • The output tags contain such <ns0:eventDescription> , while I need the output as the original <eventDescription> , with no namespace at the beginning.

How can they be solved?

+4
source share
2 answers

See the lxml section in the namespace . Also this article is about namespaces in ElementTree .

Problem 1: Rely on her, like everyone else. Instead of "%(ns)Event" % {'ns':NS } try NS+"Event" .

Problem 2: By default, an XML declaration is written only when necessary. You can force it (lxml only) using xml_declaration=True in your write() call.

Problem 3: The nsmap argument looks like lxml-only. AFAICT he needs MAPping, not a string. Try nsmap={None: NS} . The effbot article has a section describing a workaround for this.

+4
source

To answer your questions in order:

  • you can't just ignore the namespace, not the syntax of the path that uses .findall() , but not in the "real" xpath (supported by lxml): there you will still be forced to use a prefix and you still need to provide some prefix mapping- uri.

  • use xml_declaration=True , as well as encoding='utf-8' with a call to .write() (available in lxml, but in stdlib xml.etree only with python 2.7, I believe)

  • I believe lxml will behave the way you want

+1
source

Source: https://habr.com/ru/post/1338248/


All Articles