XML sorting in python etree

I know this question has been asked before, but I have tried my best to make the existing answers work with my example and would really appreciate some help. What I'm trying to achieve looks straightforward: I have two files. The first is similar to the one below; the second is almost the same, except that it has only a LAYER containing the named TESTs, i.e. no MASTER element.

    <MASTER>
      <LAYER NAME="LAYER B">
        <TEST NAME="Soup1">
          <TITLE>Title 2</TITLE>
          <SCRIPTFILE>PAth 2</SCRIPTFILE>
          <ASSET_FILE PATH="Path 22" />
          <ARGS>
            <ARG ID="arg_21">some_Arg11</ARG>
            <ARG ID="arg_22">some_Arg12</ARG>
          </ARGS>
          <TIMEOUT OSTYPE="111">1200</TIMEOUT>
        </TEST>
        <TEST NAME="Bread2">
          <TITLE>Title 1</TITLE>
          <SCRIPTFILE>PAth 1</SCRIPTFILE>
          <ASSET_FILE PATH="Path 11" />
          <ARGS>
            <ARG ID="arg_11">some_Arg12</ARG>
            <ARG ID="arg_12">some_Arg22</ARG>
          </ARGS>
          <TIMEOUT OSTYPE="2222">1000</TIMEOUT>
        </TEST>
      </LAYER>
      <LAYER NAME="LAYER A">
        <TEST NAME="Soup2">
          <TITLE>Title 2</TITLE>
          <SCRIPTFILE>PAth 2</SCRIPTFILE>
          <ASSET_FILE PATH="Path 22" />
          <ARGS>
            <ARG ID="arg_21">some_Arg11</ARG>
            <ARG ID="arg_22">some_Arg12</ARG>
          </ARGS>
          <TIMEOUT OSTYPE="111">1200</TIMEOUT>
        </TEST>
        <TEST NAME="Bread2">
          <TITLE>Title 1</TITLE>
          <SCRIPTFILE>PAth 1</SCRIPTFILE>
          <ASSET_FILE PATH="Path 11" />
          <ARGS>
            <ARG ID="arg_11">some_Arg12</ARG>
            <ARG ID="arg_12">some_Arg22</ARG>
          </ARGS>
          <TIMEOUT OSTYPE="2222">1000</TIMEOUT>
        </TEST>
      </LAYER>
    </MASTER>

All I am trying to do is sort these files by NAME, keeping the contents of each LAYER within that LAYER.

In the scenario above, LAYER A must come before LAYER B, and inside each layer the tests must be ordered by NAME, which means Bread before Soup. In my second scenario there are no such sublayers:

    <LAYER>
      <TEST NAME="Soup1">
        <TITLE>Title 2</TITLE>
        <SCRIPTFILE>PAth 2</SCRIPTFILE>
        <ASSET_FILE PATH="Path 22" />
        <ARGS>
          <ARG ID="arg_21">some_Arg11</ARG>
          <ARG ID="arg_22">some_Arg12</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="111">1200</TIMEOUT>
      </TEST>
      <TEST NAME="Bread2">
        <TITLE>Title 1</TITLE>
        <SCRIPTFILE>PAth 1</SCRIPTFILE>
        <ASSET_FILE PATH="Path 11" />
        <ARGS>
          <ARG ID="arg_11">some_Arg12</ARG>
          <ARG ID="arg_12">some_Arg22</ARG>
        </ARGS>
        <TIMEOUT OSTYPE="2222">1000</TIMEOUT>
      </TEST>
    </LAYER>

and I want its TEST elements sorted by NAME.
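Both requirements can be expressed with the standard library's xml.etree.ElementTree by checking which shape the root element has. This is a minimal sketch that assumes only the MASTER/LAYER/TEST layout shown above; the function names `sort_by_name`, `sort_document` and `sort_file`, and the trimmed sample strings, are hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def sort_by_name(parent):
    """Reorder parent's children by their NAME attribute ('' if missing)."""
    # Slice assignment replaces the children in place with the sorted list.
    parent[:] = sorted(parent, key=lambda child: child.get("NAME", ""))


def sort_document(root):
    """Sort either file shape: a <MASTER> of LAYERs, or a bare <LAYER>."""
    if root.tag == "MASTER":
        sort_by_name(root)        # LAYER A before LAYER B
        for layer in root:
            sort_by_name(layer)   # TESTs inside each LAYER
    elif root.tag == "LAYER":
        sort_by_name(root)        # only the TEST level exists
    return root


def sort_file(in_path, out_path):
    """Hypothetical helper: parse in_path, sort it, write the result to out_path."""
    tree = ET.parse(in_path)
    sort_document(tree.getroot())
    tree.write(out_path, encoding="utf-8", xml_declaration=True)


# Trimmed samples: only the NAME attributes matter for the ordering.
master = ET.fromstring(
    '<MASTER><LAYER NAME="LAYER B"><TEST NAME="Soup1"/><TEST NAME="Bread2"/></LAYER>'
    '<LAYER NAME="LAYER A"><TEST NAME="Soup2"/><TEST NAME="Bread2"/></LAYER></MASTER>'
)
layer = ET.fromstring('<LAYER><TEST NAME="Soup1"/><TEST NAME="Bread2"/></LAYER>')
sort_document(master)
sort_document(layer)
print([lay.get("NAME") for lay in master])  # ['LAYER A', 'LAYER B']
print([t.get("NAME") for t in layer])       # ['Bread2', 'Soup1']
```

Only the levels that carry a NAME are sorted, so the children of each TEST (TITLE, SCRIPTFILE, ...) keep their original order.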

Thanks in advance for your help.

2 answers

Using ElementTree, you can do this:

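    import xml.etree.ElementTree as ET

    def sortchildrenby(parent, attr):
        # Replace parent's children with a copy sorted by the given attribute.
        # The '' default keeps children without that attribute from producing
        # None keys, which cannot be compared in Python 3 (TypeError).
        parent[:] = sorted(parent, key=lambda child: child.get(attr, ''))

    tree = ET.parse('input.xml')
    root = tree.getroot()

    # Sort the root's children (LAYERs in the first file, TESTs in the second) ...
    sortchildrenby(root, 'NAME')
    # ... then the children of each of those (the TESTs inside each LAYER).
    for child in root:
        sortchildrenby(child, 'NAME')

    tree.write('output.xml')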
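One detail matters when this kind of loop reaches elements whose children have no NAME attribute, as happens with the second file (its root is LAYER, so the loop descends into each TEST). `Element.get` returns None for a missing attribute, and Python 3 cannot order None values, so a plain `child.get(attr)` key raises TypeError there. A small sketch of the failure and of a default-key fix, using a trimmed TEST element as sample data:

```python
import xml.etree.ElementTree as ET

# One TEST from the second file: none of its children carry a NAME attribute.
test = ET.fromstring(
    '<TEST NAME="Soup1"><TITLE>Title 2</TITLE><SCRIPTFILE>PAth 2</SCRIPTFILE>'
    '<TIMEOUT OSTYPE="111">1200</TIMEOUT></TEST>'
)

try:
    # Every key is None, and None < None is not defined in Python 3.
    sorted(test, key=lambda child: child.get("NAME"))
except TypeError as exc:
    print("TypeError:", exc)

# A default of "" makes all keys equal; sorted() is stable, so the order is kept.
test[:] = sorted(test, key=lambda child: child.get("NAME", ""))
print([child.tag for child in test])  # ['TITLE', 'SCRIPTFILE', 'TIMEOUT']
```

Because the sort is stable, elements that share a key (or lack the attribute entirely) stay in their original document order.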

If you want to sort recursively, handle comments, and sort by all attributes when no specific one is given:

    #!/usr/bin/env python
    # encoding: utf-8
    from __future__ import print_function

    import io
    import logging

    from lxml import etree


    def get_node_key(node, attr=None):
        """Return the sorting key of an xml node using tag and attributes."""
        if attr is None:
            # No attribute given: key on the tag plus all attribute values,
            # taken in alphabetical order of the attribute names.
            return '%s:%s' % (node.tag, ':'.join([node.get(a) for a in sorted(node.attrib)]))
        if attr in node.attrib:
            return '%s:%s' % (node.tag, node.get(attr))
        return '%s' % node.tag


    def sort_children(node, attr=None):
        """Sort children along tag and given attribute.

        If attr is None, sort along all attributes."""
        if not isinstance(node.tag, str):  # PYTHON 2: use basestring instead
            # not a TAG, it is a comment or DATA: no need to sort
            return
        # sort children along attr
        node[:] = sorted(node, key=lambda child: get_node_key(child, attr))
        # and recurse
        for child in node:
            sort_children(child, attr)


    def sort(unsorted_file, sorted_file, attr=None):
        """Sort unsorted xml file and save to sorted_file."""
        tree = etree.parse(unsorted_file)
        root = tree.getroot()
        sort_children(root, attr)
        sorted_unicode = etree.tostring(root, pretty_print=True, encoding='unicode')
        # io.open accepts an encoding argument on both Python 2 and 3
        with io.open(sorted_file, 'w', encoding='utf-8') as output_fp:
            output_fp.write(sorted_unicode)
        logging.info('written sorted file %s', sorted_file)

    # e.g. sort('input.xml', 'output.xml', 'NAME')

Note: I am using lxml.etree (http://lxml.de/tutorial.html).
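For readers without lxml, a rough standard-library counterpart of the recursive approach might look like the sketch below. It is an illustration, not the answerer's code: `sort_recursively` is a hypothetical name, it assumes Python 3.9+ for `ET.indent`, and it keys on the tag name plus a single attribute. The default ElementTree parser discards comments, so unlike the lxml version there are no comment nodes to skip. Note that keying on the tag also reorders the children of each TEST alphabetically (ARGS before TITLE), which may or may not be what you want.

```python
import xml.etree.ElementTree as ET


def sort_recursively(node, attr="NAME"):
    """Sort children by (tag, attribute value), then recurse into each child."""
    # Missing attributes sort as "", so every key is a (str, str) tuple.
    node[:] = sorted(node, key=lambda child: (child.tag, child.get(attr, "")))
    for child in node:
        sort_recursively(child, attr)


root = ET.fromstring(
    '<LAYER><TEST NAME="Soup1"><TITLE>Title 2</TITLE><ARGS/></TEST>'
    '<TEST NAME="Bread2"><TITLE>Title 1</TITLE><ARGS/></TEST></LAYER>'
)
sort_recursively(root)
ET.indent(root)  # Python 3.9+: rebuild the indentation after reordering
print(ET.tostring(root, encoding="unicode"))
```

Re-indenting after the sort is useful because moved elements carry their original whitespace (the `tail` text) with them, which otherwise leaves the output unevenly indented.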


Source: https://habr.com/ru/post/1200468/