I recommend using an XML parser that fully supports XPath expressions. The XPath subset supported by xml.etree is not enough for tasks like this.
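To illustrate the limitation, here is a minimal sketch using only the standard library. The sample document and the variable names are made up for the illustration. It shows that ElementTree rejects a predicate built on `count()`, so the "no children" filter has to be written by hand in Python:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
XML = "<item><a1>value1</a1><b1><a11>value11</a11></b1></item>"
root = ET.fromstring(XML)

# ElementTree only implements a small XPath subset; an expression that
# uses count() and the child:: axis is rejected with a SyntaxError.
try:
    root.findall(".//*[count(child::*) = 0]")
    count_supported = True
except SyntaxError:
    count_supported = False

# Manual equivalent: walk the tree and keep elements without children.
# len(element) is the number of direct children.
leaves = [el.tag for el in root.iter() if len(el) == 0]

print(count_supported)
print(leaves)
```

The manual loop works for this one case, but every new condition needs more hand-written traversal code, which is what a full XPath engine avoids.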
For example, in lxml I can do:
"give me all the children of the children of the <item> node":
doc.xpath('//item/*/child::*')
or,
"give me all the children of <item> that have no children themselves":
doc.xpath('/item/*[count(child::*) = 0]')
Out[20]:
[<Element a1 at 0x7f60ec1c1588>,
 <Element a2 at 0x7f60ec1c15c8>,
 <Element a3 at 0x7f60ec1c1608>]
or,
"give me ALL elements that have no children":
doc.xpath('//*[count(child::*) = 0]')
Out[29]:
[<Element a1 at 0x7f60ec1c1588>,
 <Element a2 at 0x7f60ec1c15c8>,
 <Element a3 at 0x7f60ec1c1608>,
 <Element a11 at 0x7f60ec1c1348>,
 <Element a22 at 0x7f60ec1c1888>]

# and if I only care about the text from those nodes...
doc.xpath('//*[count(child::*) = 0]/text()')
Out[30]: ['value1', 'value2', 'value3', 'value222', 'value22']
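For anyone who wants to try the three queries, here is a self-contained sketch. The XML below is a made-up stand-in, since the original input document is not shown; the element names `b1` and `b2` and the text values are assumptions, and it requires the third-party lxml package:

```python
from lxml import etree

# Hypothetical document: the real input is not shown above, so the
# structure, the b1/b2 wrapper names, and the values are invented.
XML = """
<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <b1><a11>value11</a11></b1>
  <b2><a22>value22</a22></b2>
</item>
"""
doc = etree.fromstring(XML)

# All children of the children of <item> (the grandchildren).
grandchildren = [el.tag for el in doc.xpath('//item/*/child::*')]

# Children of <item> that have no children themselves.
childless_children = [
    el.tag for el in doc.xpath('/item/*[count(child::*) = 0]')
]

# Text of every element in the document that has no children.
leaf_texts = doc.xpath('//*[count(child::*) = 0]/text()')

# not(*) is a shorter XPath 1.0 way to say "has no child elements".
leaf_tags = [el.tag for el in doc.xpath('//*[not(*)]')]

print(grandchildren)
print(childless_children)
print(leaf_texts)
print(leaf_tags)
```

Note that `/item/...` only matches when `<item>` is the root element, as it is in this sample; use `//item/...` if it can appear deeper in the tree.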