Match siblings according to tag attributes in xml using Python and libxml2


I am new to programming and may be missing some of the basics.

I have xml:

    <mother>
      <daughter nr='1' state='nice' name='Ada'/>
      <daughter nr='2' state='naughty' name='Beta'/>
      <daughter nr='3' state='nice' name='Cecilia'/>
      <daughter nr='4' state='neither' name='Dora'/>
      <daughter nr='5' state='naughty' name='Elis'/>
    </mother>

I need to pair up the nice and naughty daughters according to their number (each nice daughter with her closest following naughty one) and print the pairs:

    Ada Beta
    Cecilia Elis

My code is:

    import libxml2, sys

    doc = libxml2.parseFile("file.xml")
    tree = doc.xpathNewContext()

    # select every nice daughter, in document order
    nice = tree.xpathEval("//daughter[@state='nice']")
    for l in nice:
        print(l.prop("name"))

    # collect the numbers of the nice daughters
    nice_nr = []
    for n in nice:
        nice_nr.append(n.prop("nr"))

    # and the same for the naughty daughters

    doc.freeDoc()

So, I can get the values of my attributes, but I can't figure out how to pair them.
The closest thing I have found is the following-sibling axis in XPath, but from the examples I could find I'm not sure whether it can be used here. The syntax is quite different, and it selects all of the following siblings. Any help is appreciated.

3 answers

Use:

    /*/daughter[@state = 'nice'][1]
      |
    /*/daughter[@state = 'nice'][1]
        /following-sibling::daughter[@state = 'naughty'][1]

This selects the pair made up of the first nice daughter and her closest following naughty sibling.

To select the second such pair, use:

    /*/daughter[@state = 'nice'][2]
      |
    /*/daughter[@state = 'nice'][2]
        /following-sibling::daughter[@state = 'naughty'][1]

... etc.

Note that these expressions do not guarantee that any node will be selected at all: there may be no daughter elements, or a nice daughter element may have no following sibling daughter element that is naughty.
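As a rough sketch of the same "nice daughter plus her closest following naughty sibling" logic, here is a version that loops in Python instead of writing one expression per pair. It uses the standard library's xml.etree.ElementTree rather than libxml2 (an assumption made so it runs without extra packages); ElementTree's XPath subset has no following-sibling axis, so the sibling search is done by hand. The function name pair_nice_with_next_naughty is made up for this example, and the XML is written with self-closing daughter tags.

```python
import xml.etree.ElementTree as ET

XML = """<mother>
  <daughter nr='1' state='nice' name='Ada'/>
  <daughter nr='2' state='naughty' name='Beta'/>
  <daughter nr='3' state='nice' name='Cecilia'/>
  <daughter nr='4' state='neither' name='Dora'/>
  <daughter nr='5' state='naughty' name='Elis'/>
</mother>"""


def pair_nice_with_next_naughty(xml_text):
    """Return (nice_name, naughty_name) tuples; naughty_name is None if absent."""
    root = ET.fromstring(xml_text)
    # findall on a plain tag name returns direct children in document order
    daughters = root.findall("daughter")
    pairs = []
    for i, daughter in enumerate(daughters):
        if daughter.get("state") != "nice":
            continue
        # hand-written stand-in for following-sibling::daughter[@state='naughty'][1]
        partner = next(
            (s for s in daughters[i + 1:] if s.get("state") == "naughty"),
            None,
        )
        partner_name = partner.get("name") if partner is not None else None
        pairs.append((daughter.get("name"), partner_name))
    return pairs


pairs = pair_nice_with_next_naughty(XML)
for nice_name, naughty_name in pairs:
    print(nice_name, naughty_name)
```

A nice daughter with no naughty sibling after her is reported with None instead of being dropped, which makes the "nothing selected" case visible.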

If it is guaranteed that the daughter elements in the document strictly alternate between the states 'nice' and 'naughty', then you can use a very simple XPath expression to get all pairs:

    /*/daughter[@state = 'nice' or @state = 'naughty']

This selects all daughter elements that are children of the top element and whose state attribute is 'nice' or 'naughty'; under the guarantee above, their states alternate: nice, naughty, nice, naughty, ...

If the XPath API in use returns them as an array of objects, then for each even k (counting from 0) a pair of daughters is found in the k-th and (k+1)-th members of this array.
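The even-k pairing can be sketched as follows. This is only an illustration and assumes the alternation guarantee holds. It uses the standard library's xml.etree.ElementTree, whose XPath subset does not support `or` inside a predicate, so the filtering on state is done in Python rather than in the expression itself.

```python
import xml.etree.ElementTree as ET

XML = """<mother>
  <daughter nr='1' state='nice' name='Ada'/>
  <daughter nr='2' state='naughty' name='Beta'/>
  <daughter nr='3' state='nice' name='Cecilia'/>
  <daughter nr='4' state='neither' name='Dora'/>
  <daughter nr='5' state='naughty' name='Elis'/>
</mother>"""

root = ET.fromstring(XML)

# keep only nice/naughty daughters, in document order
names = [
    d.get("name")
    for d in root.findall("daughter")
    if d.get("state") in ("nice", "naughty")
]

# for each even k (0-based), members k and k + 1 form one pair
pairs = [(names[k], names[k + 1]) for k in range(0, len(names) - 1, 2)]

for nice_name, naughty_name in pairs:
    print(nice_name, naughty_name)
```

If the alternation guarantee does not hold, this pairs the wrong daughters without any warning, so it is only worth using when the input is known to be well behaved.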


Each XPath expression returns a list of nodes in document order. Just zip the lists together to find the matching pairs:

    xpath = lambda state: tree.xpathEval("//daughter[@state='%s']" % state)

    for nodes in zip(xpath('nice'), xpath('naughty')):
        print(' '.join(n.prop('name') for n in nodes))

Above, xpath is a function that evaluates an XPath expression returning the daughters that match a given state. The two resulting lists are then passed to zip, which yields a tuple of the i-th element from each list.

If the daughter nodes are not listed in the XML file in order, you can sort them by the nr attribute before passing them to zip.
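A minimal sketch of that sorting step is below. It uses the standard library's xml.etree.ElementTree instead of libxml2 so that it runs without extra packages, and the shuffled sample data (including nr='10') is invented for the illustration. The helper name by_state is also hypothetical. The nr attribute is converted to int because, compared as strings, '10' would sort before '2'.

```python
import xml.etree.ElementTree as ET

# hypothetical, deliberately out-of-order input
XML = """<mother>
  <daughter nr='10' state='naughty' name='Elis'/>
  <daughter nr='3' state='nice' name='Cecilia'/>
  <daughter nr='2' state='naughty' name='Beta'/>
  <daughter nr='1' state='nice' name='Ada'/>
</mother>"""

root = ET.fromstring(XML)


def by_state(state):
    """Return the daughters with the given state, sorted numerically by nr."""
    nodes = root.findall("daughter[@state='%s']" % state)
    return sorted(nodes, key=lambda n: int(n.get("nr")))


pairs = [
    (nice.get("name"), naughty.get("name"))
    for nice, naughty in zip(by_state("nice"), by_state("naughty"))
]

for nice_name, naughty_name in pairs:
    print(nice_name, naughty_name)
```

Note that zip pairs the i-th nice daughter with the i-th naughty one, which matches the "closest following naughty" rule only when the two states alternate.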


I have a solution without XPath. It also takes the daughters' numbers into account, and the document is traversed only once.

    from lxml.etree import fromstring

    data = """the-xml-above"""

    def fetch_sorted_daughters(data):
        # load data into an xml document
        doc = fromstring(data)
        nice = []
        naughty = []
        # extract (number, name) tuples
        for subelement in doc:
            if subelement.tag == 'daughter':
                # convert to int so that e.g. 10 sorts after 2
                nr = int(subelement.get('nr'))
                name = subelement.get('name')
                if subelement.get('state') == 'nice':
                    nice.append((nr, name))
                if subelement.get('state') == 'naughty':
                    naughty.append((nr, name))
        del doc  # release document
        # sort tuples by number
        nice.sort(key=lambda x: x[0])
        naughty.sort(key=lambda x: x[0])
        # get sorted names from tuples
        nice = tuple([double[1] for double in nice])
        naughty = tuple([double[1] for double in naughty])
        return nice, naughty

    nice, naughty = fetch_sorted_daughters(data)
    pairs = zip(nice, naughty)
    print(list(pairs))

Source: https://habr.com/ru/post/1343494/

