
Python xml to dictionary to iterate over elements

I have the following XML example

 <?xml version="1.0"?>
 <test>
  <items>
   <item>item 1</item>
   <item>item 2</item>
  </items>
 </test>

I need to iterate over each tag in a for loop in Python. I have tried a lot of things, but I just can't get it to work.

Thanks for the help.

+4
4 answers

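I personally use xml.etree.cElementTree, since I have found that it is fast, easy to use, and works well with large (> 2 GB) files. (On Python 3.3 and later, plain xml.etree.ElementTree picks up the same C implementation automatically, and cElementTree was removed in Python 3.9.)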

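 import xml.etree.ElementTree as etree  # xml.etree.cElementTree on Python 2

 # iterparse yields (event, element) pairs; by default only "end" events
 with open(xml_file_path, "rb") as xml_file:
     for event, elem in etree.iterparse(xml_file):
         if elem.tag == "item":
             print(elem.text)
             elem.clear()  # release the element's contents; helps with large files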

In the interactive console:

 >>> x = """<?xml version="1.0"?>
 ... <test>
 ...  <items>
 ...  <item>item 1</item>
 ...  <item>item 2</item>
 ...  </items>
 ... </test>"""
 >>> x
 '<?xml version="1.0"?>\n<test>\n <items>\n <item>item 1</item>\n <item>item 2</item>\n </items>\n</test>'
 >>> import xml.etree.ElementTree as etree
 >>> tree = etree.fromstring(x)
 >>> tree
 <Element 'test' at 0xb63ad248>
 >>> for i in tree:
 ...     for j in i:
 ...         print(j)
 ...
 <Element 'item' at 0xb63ad2f0>
 <Element 'item' at 0xb63ad338>
 >>> for i in tree:
 ...     for j in i:
 ...         j.text
 ...
 'item 1'
 'item 2'
 >>>
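
Since the question title asks for a dictionary, here is a minimal sketch of one way to turn the parsed tree into nested dicts and lists and then loop over the result. The helper name element_to_dict and the output shape (every child tag maps to a list) are assumptions made for illustration, not a standard API; attributes and mixed text are ignored.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0"?>
<test>
 <items>
  <item>item 1</item>
  <item>item 2</item>
 </items>
</test>"""


def element_to_dict(elem):
    """Hypothetical helper: a leaf becomes its text, anything else a dict
    mapping each child tag to a list of converted children."""
    children = list(elem)
    if not children:
        return elem.text
    result = {}
    for child in children:
        # Always use a list so repeated tags such as <item> are kept.
        result.setdefault(child.tag, []).append(element_to_dict(child))
    return result


root = ET.fromstring(XML)
data = {root.tag: element_to_dict(root)}

# Walk down to the list of item texts and iterate over it.
for text in data["test"]["items"][0]["item"]:
    print(text)
```

Wrapping every child in a list keeps the shape predictable whether a tag appears once or many times, at the cost of an extra [0] when indexing single children.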
+6

Try the XML parser from the xml.sax package in the standard library.

 from xml.sax import parse
 from xml.sax.handler import ContentHandler
 from sys import argv

 class Handler(ContentHandler):
     # The namespace-aware callbacks simply delegate to the plain ones.
     def startElementNS(self, name, qname, attrs):
         self.startElement(name, attrs)

     def endElementNS(self, name, qname):
         self.endElement(name)

     def startElement(self, name, attrs):
         pass  # ... do whatever you like on tag start ...

     def characters(self, content):
         pass  # ... on tag content ...

     def endElement(self, name):
         pass  # ... on tag closing ...

 if __name__ == "__main__":
     parse(argv[1], Handler())

Here I assume that argv[1] is the path to the file you want to parse (the first argument to parse() is a file name or a stream). It is easy to turn this into a loop: collect the information you need in the methods above, append it to a list or stack, and iterate over that once parsing has finished.

+1
 import xml.dom.minidom as md

 x = '''<?xml version="1.0"?>
 <test>
  <items>
   <item>item 1</item>
   <item>item 2</item>
  </items>
 </test>
 '''
 xml = md.parseString(x)
 items = xml.getElementsByTagName("item")
 # [<DOM Element: item at 0xc16e40>, <DOM Element: item at 0xc16ee0>]

Since items is a list of DOM elements (a NodeList), you can loop over it with a for loop.
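
A short sketch of that loop, assuming each item element contains only a single text node, as in the question's XML:

```python
import xml.dom.minidom as md

XML = """<?xml version="1.0"?>
<test>
 <items>
  <item>item 1</item>
  <item>item 2</item>
 </items>
</test>"""

doc = md.parseString(XML)

texts = []
for node in doc.getElementsByTagName("item"):
    # firstChild is the text node inside <item>; .data holds its string.
    texts.append(node.firstChild.data)

for text in texts:
    print(text)
```

An empty element such as <item/> has no firstChild, so real input may need a check for None before reading .data.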

+1

You would probably like something like ElementTree. It is a well-known library; I have not used it personally, but I always hear good things about it.

Also, as of Python 2.5, it is part of the standard library.
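
For reference, a minimal sketch of what iterating with ElementTree might look like for the XML in the question. It uses Element.iter() to find every item element regardless of depth; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

XML = """<?xml version="1.0"?>
<test>
 <items>
  <item>item 1</item>
  <item>item 2</item>
 </items>
</test>"""

root = ET.fromstring(XML)

# iter("item") walks the whole tree and yields only <item> elements.
texts = [item.text for item in root.iter("item")]

for text in texts:
    print(text)

# findall() with a path does the same when the nesting is known.
same_texts = [item.text for item in root.findall("./items/item")]
```

iter() is convenient when the nesting depth is unknown, while the findall() path is stricter and only matches item elements directly under items.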

0

Source: https://habr.com/ru/post/1303000/

