Match siblings according to tag attributes in xml using Python and libxml2

Question

I am new to programming and may be missing some of the basics.

I have this XML:

    <mother>
        <daughter nr='1' state='nice' name='Ada'/>
        <daughter nr='2' state='naughty' name='Beta'/>
        <daughter nr='3' state='nice' name='Cecilia'/>
        <daughter nr='4' state='neither' name='Dora'/>
        <daughter nr='5' state='naughty' name='Elis'/>
    </mother>

I need to pair up nice and naughty daughters according to their number (each nice daughter with her closest naughty one) and print the pairs:

    Ada Beta
    Cecilia Elis

My code is:

    import libxml2

    doc = libxml2.parseFile("file.xml")
    tree = doc.xpathNewContext()

    # select all daughters whose state is 'nice'
    nice = tree.xpathEval("//daughter[@state='nice']")
    for l in nice:
        print(l.prop("name"))

    # collect their numbers
    nice_nr = []
    for n in nice:
        nice_nr.append(n.prop("nr"))

    # and the same for the naughty daughters

    doc.freeDoc()

So I can get the values of the attributes, but I can't figure out how to pair them.
The closest thing I found is the following-sibling axis in XPath, but from the examples I could find I am not sure it can be used here. The syntax is quite different, and it selects all of the following siblings. Any help is appreciated.

python xpath attributes libxml2

zufanka Mar 13 '11 at 0:18

3 answers

Each XPath expression returns a list of nodes in document order. Just zip the two lists together to get the matching pairs:

    xpath = lambda state: tree.xpathEval("//daughter[@state='%s']" % state)

    for nodes in zip(xpath('nice'), xpath('naughty')):
        print(' '.join(n.prop('name') for n in nodes))

Above, xpath is a function that evaluates an XPath expression returning the daughters that match a given state. The two resulting lists are then passed to zip, which yields a tuple of the i-th element from each list.

If the daughter nodes are not listed in order in the XML file, you can sort each list by the nr attribute before passing it to zip.
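The sorting step could look roughly like the sketch below. It is only an illustration under a few assumptions: it uses the standard library's xml.etree.ElementTree instead of libxml2 so that it runs without extra packages, the sample document is a shuffled, well-formed variant of the question's XML, and sorted_by_nr is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the question's daughters, deliberately out of order.
XML = """<mother>
  <daughter nr='3' state='nice' name='Cecilia'/>
  <daughter nr='2' state='naughty' name='Beta'/>
  <daughter nr='5' state='naughty' name='Elis'/>
  <daughter nr='1' state='nice' name='Ada'/>
  <daughter nr='4' state='neither' name='Dora'/>
</mother>"""


def sorted_by_nr(root, state):
    """Return the daughters with the given state, ordered by numeric nr."""
    nodes = root.findall("daughter[@state='%s']" % state)
    # Convert nr to int so that '10' sorts after '2'.
    return sorted(nodes, key=lambda node: int(node.get("nr")))


root = ET.fromstring(XML)
pairs = [
    (nice.get("name"), naughty.get("name"))
    for nice, naughty in zip(sorted_by_nr(root, "nice"),
                             sorted_by_nr(root, "naughty"))
]
for pair in pairs:
    print(" ".join(pair))
```

Note that this pairs the i-th nice daughter with the i-th naughty one, which matches "closest naughty" only when the two states roughly alternate, as they do in the sample.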

samplebias Mar 13 '11 at 2:34

Here is a solution without XPath. It also takes the daughters' numbers into account, and the document is traversed only once.

    from lxml.etree import fromstring

    data = """the-xml-above"""

    def fetch_sorted_daughters(data):
        # load data into an XML document
        doc = fromstring(data)
        nice = []
        naughty = []
        # extract (number, name) tuples in a single pass
        for subelement in doc:
            if subelement.tag == 'daughter':
                # convert to int so the sort is numeric, not alphabetical
                nr = int(subelement.get('nr'))
                name = subelement.get('name')
                if subelement.get('state') == 'nice':
                    nice.append((nr, name))
                elif subelement.get('state') == 'naughty':
                    naughty.append((nr, name))
        del doc  # release the document
        # sort the tuples by number
        nice.sort(key=lambda x: x[0])
        naughty.sort(key=lambda x: x[0])
        # keep only the names, now in sorted order
        nice = tuple(name for nr, name in nice)
        naughty = tuple(name for nr, name in naughty)
        return nice, naughty

    nice, naughty = fetch_sorted_daughters(data)
    pairs = list(zip(nice, naughty))
    print(pairs)

vonPetrushev Mar 13 '11 at 10:24

Dimitre Novatchev · Accepted Answer · 2011-03-13T02:34:05+0000

Use:

    /*/daughter[@state = 'nice'][1]
    |
    /*/daughter[@state = 'nice'][1]
        /following-sibling::daughter[@state = 'naughty'][1]

This selects the pair consisting of the first nice daughter and her closest following naughty daughter.

To select the second such pair, use:

    /*/daughter[@state = 'nice'][2]
    |
    /*/daughter[@state = 'nice'][2]
        /following-sibling::daughter[@state = 'naughty'][1]

... etc.

Note that these expressions do not guarantee that any node will be selected at all: there may be no daughter elements, or a nice daughter element may have no following sibling daughter element that is naughty.
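The same selection can be sketched in plain Python for all nice daughters at once. This is only an illustration, not the XPath approach itself: the standard library's xml.etree.ElementTree does not support the following-sibling axis, so the sketch emulates it with a list slice. The function name closest_naughty_pairs and the well-formed sample XML are assumptions.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the question's XML with self-closing daughter tags.
XML = """<mother>
  <daughter nr='1' state='nice' name='Ada'/>
  <daughter nr='2' state='naughty' name='Beta'/>
  <daughter nr='3' state='nice' name='Cecilia'/>
  <daughter nr='4' state='neither' name='Dora'/>
  <daughter nr='5' state='naughty' name='Elis'/>
</mother>"""


def closest_naughty_pairs(root):
    """Pair each nice daughter with the first naughty sibling after her.

    The second item of a pair is None when no such sibling exists.
    """
    daughters = root.findall("daughter")
    pairs = []
    for i, node in enumerate(daughters):
        if node.get("state") != "nice":
            continue
        # Emulates following-sibling::daughter[@state='naughty'][1].
        partner = next(
            (d for d in daughters[i + 1:] if d.get("state") == "naughty"),
            None,
        )
        partner_name = partner.get("name") if partner is not None else None
        pairs.append((node.get("name"), partner_name))
    return pairs


pairs = closest_naughty_pairs(ET.fromstring(XML))
for nice_name, naughty_name in pairs:
    print(nice_name, naughty_name)
```

A nice daughter without a later naughty sibling is reported with None instead of being dropped, which makes the "nothing selected" case visible to the caller.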

If it is guaranteed that the nice and naughty daughter elements in the document strictly alternate ('nice', 'naughty', 'nice', 'naughty', ...), then you can use a very simple XPath expression to get all the pairs:

    /*/daughter[@state = 'nice' or @state = 'naughty']

This selects all daughter elements that are children of the top element and whose state attribute has the alternating values nice, naughty, nice, naughty, ...

If the XPath API you use returns them as an array of objects, then for each even k (counting from 0) a pair of daughters is in the k-th and (k+1)-th members of this array.
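Reading the pairs out of such an array might look like the following sketch. It assumes the strict nice/naughty alternation described above and uses the standard library's xml.etree.ElementTree, whose limited XPath subset has no "or" operator, so the state filter is written in Python instead; the sample XML is the question's document made well-formed.

```python
import xml.etree.ElementTree as ET

# Assumed sample: the question's XML with self-closing daughter tags.
XML = """<mother>
  <daughter nr='1' state='nice' name='Ada'/>
  <daughter nr='2' state='naughty' name='Beta'/>
  <daughter nr='3' state='nice' name='Cecilia'/>
  <daughter nr='4' state='neither' name='Dora'/>
  <daughter nr='5' state='naughty' name='Elis'/>
</mother>"""

root = ET.fromstring(XML)

# Plain-Python stand-in for /*/daughter[@state = 'nice' or @state = 'naughty'].
selected = [
    d for d in root.findall("daughter")
    if d.get("state") in ("nice", "naughty")
]
names = [d.get("name") for d in selected]

# Even index k holds the nice daughter, index k + 1 her naughty partner.
pairs = [(names[k], names[k + 1]) for k in range(0, len(names) - 1, 2)]
for pair in pairs:
    print(" ".join(pair))
```

If the alternation does not actually hold, this pairing silently produces wrong results, so it is only safe when the document structure is guaranteed.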
