Try the XML parser from the xml.sax package in the standard library.
    from sys import argv
    from xml.sax import parse
    from xml.sax.handler import ContentHandler

    class Handler(ContentHandler):
        # The namespace-aware callbacks simply forward to the plain ones.
        def startElementNS(self, name, qname, attrs):
            self.startElement(name, attrs)

        def endElementNS(self, name, qname):
            self.endElement(name)

        def startElement(self, name, attrs):
            pass  # ... do whatever you like on tag start ...

        def characters(self, content):
            pass  # ... on tag content ...

        def endElement(self, name):
            pass  # ... on tag closing ...

    if __name__ == "__main__":
        parse(argv[1], Handler())
Here I assume that argv[1] is the path to the file you want to parse (the first argument to parse() is a file name or a stream). It is easy to turn this into a loop: collect the information you need in the methods above and append it to a list or stack, then iterate over it once parsing has finished.
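As a rough sketch of that "collect, then loop" idea: the handler below appends a (tag, text) pair to a list each time an element closes, and the caller iterates over the list afterwards. The names CollectingHandler, collect and SAMPLE are made up for illustration, and it parses an in-memory byte string with parseString() instead of a file so that it runs on its own.

```python
from xml.sax import parseString
from xml.sax.handler import ContentHandler


class CollectingHandler(ContentHandler):
    """Hypothetical handler that records (tag, text) pairs while parsing."""

    def __init__(self):
        super().__init__()
        self.items = []    # finished (tag, text) pairs, in closing order
        self._stack = []   # one list of text chunks per currently open element

    def startElement(self, name, attrs):
        self._stack.append([])

    def characters(self, content):
        # characters() may be called several times for one text node.
        if self._stack:
            self._stack[-1].append(content)

    def endElement(self, name):
        text = "".join(self._stack.pop()).strip()
        self.items.append((name, text))


def collect(xml_bytes):
    """Parse xml_bytes and return the collected (tag, text) pairs."""
    handler = CollectingHandler()
    parseString(xml_bytes, handler)
    return handler.items


# Illustrative sample document, not taken from the question.
SAMPLE = b"<root><a>one</a><b>two</b></root>"

if __name__ == "__main__":
    # The loop runs only after parsing has finished.
    for tag, text in collect(SAMPLE):
        print(tag, text)
```

Elements are recorded when they close, so a parent shows up after its children. If you need document order instead, append in startElement and fill in the text later.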