Recursive XML parsing in Python using ElementTree

I am trying to parse the XML below using Python's ElementTree to produce the output shown further down. I am trying to write functions that print the top-level elements. However, this is a little complicated, because a category element may or may not have property elements, and a category element may contain another category element.

I looked at previous questions on this topic, but none of them dealt with nested elements that share the same name.

My code: http://pastebin.com/Fsv2Xzqf

work.xml:

    <suite id="1" name="MainApplication">
        <displayNameKey>my Application</displayNameKey>
        <displayName>my Application</displayName>
        <application id="2" name="Sub Application1">
            <displayNameKey>sub Application1</displayNameKey>
            <displayName>sub Application1</displayName>
            <category id="2423" name="about">
                <displayNameKey>subApp.about</displayNameKey>
                <displayName>subApp.about</displayName>
                <category id="2423" name="comms">
                    <displayNameKey>subApp.comms</displayNameKey>
                    <displayName>subApp.comms</displayName>
                    <property id="5909" name="copyright" type="string_property" width="40">
                        <value>2014</value>
                    </property>
                    <property id="5910" name="os" type="string_property" width="40">
                        <value>Linux 2.6.32-431.29.2.el6.x86_64</value>
                    </property>
                </category>
                <property id="5908" name="releaseNumber" type="string_property" width="40">
                    <value>9.1.0.3.0.54</value>
                </property>
            </category>
        </application>
    </suite>

The output should be as follows:

    Suite: MainApplication
        Application: Sub Application1
            Category: about
                property: releaseNumber | 9.1.0.3.0.54
                category: comms
                    property: copyright | 2014
                    property: os | Linux 2.6.32-431.29.2.el6.x86_64

Any pointers in the right direction will help.

1 answer
    import xml.etree.ElementTree as ET

    tree = ET.ElementTree(file='work.xml')

    indent = 0
    ignoreElems = ['displayNameKey', 'displayName']

    def printRecur(root):
        """Recursively prints the tree."""
        global indent  # must be declared before 'indent' is first used
        if root.tag in ignoreElems:
            return
        # Prefer the 'name' attribute; fall back to the element text.
        print(' ' * indent + '%s: %s' % (root.tag.title(), root.attrib.get('name', root.text)))
        indent += 4
        for elem in root:  # iterate children directly; getchildren() was removed in Python 3.9
            printRecur(elem)
        indent -= 4

    root = tree.getroot()
    printRecur(root)

OUTPUT:

    Suite: MainApplication
        Application: Sub Application1
            Category: about
                Category: comms
                    Property: copyright
                        Value: 2014
                    Property: os
                        Value: Linux 2.6.32-431.29.2.el6.x86_64
                Property: releaseNumber
                    Value: 9.1.0.3.0.54

This is the closest I could get in 5 minutes. You just need to call the processing function recursively and it will take care of the nesting. You can improve on it from here :)
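The same recursion can also be written without a module-level `indent` counter, by passing the nesting depth down as an argument and collecting the lines instead of printing them. The following is only a minimal sketch of that idea, not the code from the answer: the names `format_tree`, `IGNORED` and `SAMPLE` are made up for illustration, and `SAMPLE` is a shortened stand-in for work.xml so that the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for work.xml (an assumption, so the sketch is self-contained).
SAMPLE = """
<suite id="1" name="MainApplication">
  <displayName>my Application</displayName>
  <category id="2423" name="about">
    <property id="5908" name="releaseNumber" type="string_property" width="40">
      <value>9.1.0.3.0.54</value>
    </property>
  </category>
</suite>
"""

IGNORED = {'displayNameKey', 'displayName'}


def format_tree(elem, depth=0, step=4):
    """Return the indented lines for elem and all of its descendants."""
    if elem.tag in IGNORED:
        return []
    # Prefer the 'name' attribute; fall back to the element text.
    label = elem.attrib.get('name', elem.text)
    lines = [' ' * (depth * step) + '%s: %s' % (elem.tag.title(), label)]
    for child in elem:  # iterating an Element yields its direct children
        lines.extend(format_tree(child, depth + 1, step))
    return lines


lines = format_tree(ET.fromstring(SAMPLE))
print('\n'.join(lines))
```

Because the depth travels with each call, nothing has to be reset after the traversal, and returning a list makes the result easy to test or write to a file.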


You can also define a handler function for each tag and put them all in a dictionary for easy lookup. Then you can check whether there is a handler function for the current tag and call it; otherwise, just keep printing blindly. For instance:

    # Maps a tag name to the name of its handler function.
    # Add further '<tag_name>': '<handler_function>' entries as needed.
    HANDLERS = {
        'property': 'handle_property',
    }

    def handle_property(root):
        """Takes property root element and prints the values."""
        data = ' ' * indent + '%s: %s ' % (root.tag.title(), root.attrib['name'])
        values = []
        for elem in root:
            if elem.tag == 'value':
                values.append(elem.text)
        print(data + '| %s' % (', '.join(values)))

    # printRecur would get modified accordingly.
    def printRecur(root):
        """Recursively prints the tree."""
        global indent
        if root.tag in ignoreElems:
            return
        indent += 4
        if root.tag in HANDLERS:
            # The handler prints the element together with its children,
            # so there is no further recursion for this tag.
            handler = globals()[HANDLERS[root.tag]]
            handler(root)
        else:
            print(' ' * indent + '%s: %s' % (root.tag.title(), root.attrib.get('name', root.text)))
            for elem in root:
                printRecur(elem)
        indent -= 4

Output with the code above:

        Suite: MainApplication
            Application: Sub Application1
                Category: about
                    Category: comms
                        Property: copyright | 2014
                        Property: os | Linux 2.6.32-431.29.2.el6.x86_64
                    Property: releaseNumber | 9.1.0.3.0.54

I find this approach very useful, since it avoids putting tons of if/else branches in the code.
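As a variation on the handler dictionary, the dictionary can hold the function objects themselves rather than their names, which removes the `globals()` lookup. This is a hedged sketch rather than the answer's code: `render`, `render_property`, `render_default`, `IGNORED` and the inline `SAMPLE` fragment are hypothetical names chosen for the example, and the functions return lines instead of printing them.

```python
import xml.etree.ElementTree as ET

# Fragment of work.xml (an assumption, so the sketch is self-contained).
SAMPLE = """
<category id="2423" name="about">
  <displayName>subApp.about</displayName>
  <category id="2423" name="comms">
    <property id="5909" name="copyright" type="string_property" width="40">
      <value>2014</value>
    </property>
  </category>
  <property id="5908" name="releaseNumber" type="string_property" width="40">
    <value>9.1.0.3.0.54</value>
  </property>
</category>
"""

IGNORED = {'displayNameKey', 'displayName'}


def render_property(elem, depth):
    """Render a property and its <value> children on a single line."""
    values = [v.text or '' for v in elem.findall('value')]
    return [' ' * depth + 'Property: %s | %s' % (elem.attrib['name'], ', '.join(values))]


def render_default(elem, depth):
    """Render any other element, then recurse into its children."""
    label = elem.attrib.get('name', elem.text)
    lines = [' ' * depth + '%s: %s' % (elem.tag.title(), label)]
    for child in elem:
        lines.extend(render(child, depth + 4))
    return lines


# Tag name -> handler function; tags not listed fall back to render_default.
HANDLERS = {'property': render_property}


def render(elem, depth=0):
    """Dispatch elem to its handler, skipping ignored tags."""
    if elem.tag in IGNORED:
        return []
    return HANDLERS.get(elem.tag, render_default)(elem, depth)


result = render(ET.fromstring(SAMPLE))
print('\n'.join(result))
```

Supporting a new tag then means writing one function and adding one dictionary entry; `render` itself stays unchanged.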


Source: https://habr.com/ru/post/981674/
