How to read XML header in Python

How can I read the title of an XML document in Python 3?

Ideally, I would use the defusedxml module, since the documentation claims that it is safer , but for now (after several hours of trying this), I would agree to any parser.

For example, I have a document (this is actually from an exercise) that looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0"> <!-- this is root -->
    <!-- CONTENTS -->
</plist>

I am wondering how to access everything to the root of a node.

This seems like such a general question that I thought I could easily find the answer on the Internet, but I was probably mistaken. The closest I found was this stack overflow question that really didn't help (I looked at xml.sax but couldn't find anything suitable).

+4
3

minidom, , . :

from xml.dom.minidom import parse

dom = parse('file.xml')
print('<?xml version="{}" encoding="{}"?>'.format(dom.version, dom.encoding))
print(dom.doctype.toxml())
#or
print(dom.getElementsByTagName('plist')[0].previousSibling.toxml())
#or
print(dom.childNodes[0].toxml())

:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>
<!DOCTYPE plist  PUBLIC '-//Apple Computer//DTD PLIST 1.0//EN'  'http://www.apple.com/DTDs/PropertyList-1.0.dtd'>

minidom defusedxml. from defusedxml.minidom import parse, .

+2

lxml DocInfo.

from lxml import etree

tree = etree.parse('input.xml')
info = tree.docinfo
v, e, d = info.xml_version, info.encoding, info.doctype

print('<?xml version="{}" encoding="{}"?>'.format(v, e))
print(d)

:

 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
+2

! xml 's'.

MyParser, XmlDecl XML, XML-. , ParserCreate(), xml.parsers.

"" MyParser .

from xml.parsers import expat

s = """<?xml version='1.0' encoding='iso-8859-1'?>
       <book>
           <title>Title</title>
           <chapter>Chapter 1</chapter>
       </book>"""

class MyParser(object):
    def XmlDecl(self, version, encoding, standalone):
        print ("XmlDecl", version, encoding, standalone)

    def Parse(self, data):
        Parser = expat.ParserCreate()
        Parser.XmlDeclHandler = self.XmlDecl
        Parser.Parse(data, 1)

parser = MyParser()
parser.Parse(s)
0

Source: https://habr.com/ru/post/1694024/


All Articles