Empty XML element processing in Python

I am puzzled by how minidom parses an empty element, as the following code shows.

import xml.dom.minidom

# Case 1: parse an empty element from a string
doc = xml.dom.minidom.parseString('<value></value>')
print(repr(doc.firstChild.nodeValue))
# Out: None
print(doc.firstChild.toxml())
# Out: <value/>

# Case 2: build the same element by hand, with an empty text node inside
doc = xml.dom.minidom.Document()
v = doc.appendChild(doc.createElement('value'))
v.appendChild(doc.createTextNode(''))
print(repr(v.firstChild.nodeValue))
# Out: ''
print(doc.firstChild.toxml())
# Out: <value></value>

How can I get consistent behavior? I would like to get an empty string as the value of an empty element (this is what I put into the XML structure in the first place).

3 answers

Digging into the xml.dom.minidom source and searching for "/>", we find this:

# Method of the Element(Node) class.
def writexml(self, writer, indent="", addindent="", newl=""):
    # [snip]
    if self.childNodes:
        writer.write(">%s"%(newl))
        for node in self.childNodes:
            node.writexml(writer,indent+addindent,addindent,newl)
        writer.write("%s</%s>%s" % (indent,self.tagName,newl))
    else:
        writer.write("/>%s"%(newl))

From this we can conclude that the short empty-element form (<v/>) is written only when childNodes is an empty list. An interactive session confirms it:

>>> from xml.dom.minidom import Document
>>> doc = Document()
>>> v = doc.appendChild(doc.createElement('v'))
>>> v.toxml()
'<v/>'
>>> v.childNodes
[]
>>> v.appendChild(doc.createTextNode(''))
<DOM Text node "''">
>>> v.childNodes
[<DOM Text node "''">]
>>> v.toxml()
'<v></v>'
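Since minidom uses the long form whenever childNodes is non-empty, one way to make a parsed document behave like the hand-built one is to give every childless element an empty text node before reading or serializing it. A minimal sketch, assuming Python 3; the helper name fill_empty_elements is hypothetical, not part of minidom:

```python
from xml.dom import minidom, Node


def fill_empty_elements(node):
    """Append an empty text node to every element below `node` that has no children.

    Afterwards elem.firstChild.nodeValue is '' for such elements, and
    toxml() writes <tag></tag> instead of <tag/>.
    """
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            if child.childNodes:
                # Recurse into elements that already have content.
                fill_empty_elements(child)
            else:
                # Childless element: give it an empty text node.
                child.appendChild(child.ownerDocument.createTextNode(""))


doc = minidom.parseString("<root><value></value><other/></root>")
fill_empty_elements(doc)
print(repr(doc.documentElement.firstChild.firstChild.nodeValue))  # ''
print(doc.documentElement.toxml())  # <root><value></value><other></other></root>
```

This changes the tree itself, so anything else that later inspects childNodes will also see the extra text nodes.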

The XML specification itself makes no distinction between the two forms.

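xml.dom.minidom simply picks one form or the other depending on whether the element has child nodes. If you want consistent output, subclass Element and override its serialization (toxml delegates the actual writing to writexml) so that it emits the long end-tag form even when there are no child nodes, then monkeypatch the module so that it creates your subclass instead of Element.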
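A minimal sketch of the subclass-and-monkeypatch approach, assuming Python 3. The class name LongFormElement is hypothetical. It overrides writexml (which toxml calls) rather than toxml itself, and writes attributes with xml.sax.saxutils.quoteattr, so attribute quoting may differ slightly from minidom's own output. Replacing minidom.Element relies on minidom's parser and Document.createElement looking the class up at call time, which is an implementation detail rather than a documented API, and it affects every later minidom call in the process.

```python
from xml.dom import minidom
from xml.sax.saxutils import quoteattr


class LongFormElement(minidom.Element):
    """Element that always serializes as <tag></tag>, never <tag/>."""

    def writexml(self, writer, indent="", addindent="", newl=""):
        if self.childNodes:
            # Elements with children already use the long form.
            super().writexml(writer, indent, addindent, newl)
            return
        # No children: write the start tag, attributes, and an explicit end tag.
        writer.write(indent + "<" + self.tagName)
        for name, value in self.attributes.items():
            writer.write(" %s=%s" % (name, quoteattr(value)))
        writer.write("></%s>%s" % (self.tagName, newl))


# Monkeypatch: nodes created after this point by parsing or createElement
# are LongFormElement instances.
minidom.Element = LongFormElement

doc = minidom.parseString("<value></value>")
print(doc.documentElement.toxml())  # <value></value>
```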

value = thing.firstChild.nodeValue or ''
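If thing in the line above is the element itself, note that for the parsed <value></value> element firstChild is None, so .nodeValue raises AttributeError. A slightly more defensive sketch, assuming Python 3 (the helper name text_of is hypothetical), collects the element's direct text children and returns '' for both of the question's cases:

```python
from xml.dom import minidom, Node


def text_of(elem):
    """Return the concatenated text of elem's direct text/CDATA children, or ''."""
    return "".join(
        child.data
        for child in elem.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    )


# Parsed empty element: no child nodes at all.
parsed = minidom.parseString("<value></value>").documentElement

# Hand-built element holding an empty text node.
doc = minidom.Document()
built = doc.appendChild(doc.createElement("value"))
built.appendChild(doc.createTextNode(""))

print(repr(text_of(parsed)), repr(text_of(built)))  # '' ''
```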

The XML specification does not distinguish between these two cases.
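A parser reflects this: minidom builds the same tree from both spellings, and neither produces a text node. A small check, assuming Python 3:

```python
from xml.dom import minidom

# The empty-element tag and an explicit start/end pair with no content.
short = minidom.parseString("<value/>").documentElement
long_ = minidom.parseString("<value></value>").documentElement

print(short.childNodes, long_.childNodes)  # [] []
print(short.toxml(), long_.toxml())        # <value/> <value/>
```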


Source: https://habr.com/ru/post/1713645/

