Reading DOCTYPE XML Information Using Python

I need to read the version of an XML file that starts as follows.

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE twReport [ 
<!ELEMENT twReport (twHead?, (twWarn | twDebug | twInfo)*, twBody, twSum?, 
               twDebug*, twFoot?, twClientInfo?)> 
<!ATTLIST twReport version CDATA "10,4"> <----- VERSION INFO HERE

I use xml.dom.minidom to parse the XML file, and I need to read the file's version, which is recorded as an attribute default in the embedded DTD.

  • Can I use xml.dom.minidom for this purpose?
  • Is there any Python XML parser that can do this?
+3
2 answers

How about the xmlproc DTD API?

Here is a random piece of code that I wrote many years ago to do some work with DTDs from Python, which may give you an idea of how to work with this library:

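# xmlproc comes from the legacy PyXML package, so it must be installed separately.
from xml.parsers.xmlproc import dtdparser

attr_separator = '_'
child_separator = '_'

# Load a standalone DTD file and build its element declarations.
dtd = dtdparser.load_dtd('schedule.dtd')

for name, element in dtd.elems.items():
    # One line per attribute declared on this element.
    for attr in element.attrlist:
        print('%s%s%s = ' % (name, attr_separator, attr))
    # One line per child element allowed from the content model's start state.
    for child in element.get_valid_elements(element.get_start_state()):
        print('%s%s%s = ' % (name, child_separator, child))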

(FYI, I found this by searching for "python dtd parser".)

+2

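Both standard-library XML libraries (xml.dom.minidom and xml.etree) use the same parser (xml.parsers.expat), so you are limited in the "quality" of XML data that you can parse successfully.

You may be better off with a third-party module such as lxml or BeautifulSoup.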
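For the specific case in the question, here is a minimal sketch that uses xml.parsers.expat directly: its AttlistDeclHandler callback is called for each attribute declared in the DTD, together with the declared default value. The sample document is an assumption. The DTD in the question is cut off, so a shortened, well-formed stand-in is used here, and the helper name read_attlist_defaults is made up for illustration.

```python
import xml.parsers.expat

# Hypothetical, shortened sample: the DTD in the question is truncated,
# so this stand-in keeps only the ATTLIST line that carries the version.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE twReport [
<!ELEMENT twReport ANY>
<!ATTLIST twReport version CDATA "10,4">
]>
<twReport/>
"""


def read_attlist_defaults(xml_bytes):
    """Return {(element, attribute): default} for each ATTLIST declaration."""
    defaults = {}

    def on_attlist(elname, attname, type_, default, required):
        # expat passes None as the default when the DTD declares none.
        defaults[(elname, attname)] = default

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return defaults


defaults = read_attlist_defaults(SAMPLE)
print(defaults.get(("twReport", "version")))
```

With the sample above this prints 10,4. The callback only sees declarations that the parser actually reads, so this sketch covers the internal DTD subset shown in the question, not a DTD kept in a separate file.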

0

Source: https://habr.com/ru/post/1730221/

