Check if Element has children or not.

I am parsing XML documents this way:

    import urllib2
    import xml.etree.ElementTree as ET

    root = ET.parse(urllib2.urlopen(url))
    for child in root.findall("item"):
        a1 = child[0].text  # ok
        a2 = child[1].text  # ok
        a3 = child[2].text  # ok
        a4 = child[3].text  # BOOM
        # ...

XML looks like this:

    <item>
      <a1>value1</a1>
      <a2>value2</a2>
      <a3>value3</a3>
      <a4>
        <a11>value222</a11>
        <a22>value22</a22>
      </a4>
    </item>

How can I check whether a4 (in this particular case, but it could be any other element) has children?

+6
4 answers

You can try calling list() on the element:

    >>> import xml.etree.ElementTree as ET
    >>> xml = """<item>
    ...   <a1>value1</a1>
    ...   <a2>value2</a2>
    ...   <a3>value3</a3>
    ...   <a4>
    ...     <a11>value222</a11>
    ...     <a22>value22</a22>
    ...   </a4>
    ... </item>"""
    >>> root = ET.fromstring(xml)
    >>> list(root[0])
    []
    >>> list(root[3])
    [<Element 'a11' at 0x2321e10>, <Element 'a22' at 0x2321e48>]
    >>> len(list(root[3]))
    2
    >>> print("has children" if len(list(root[3])) else "no child")
    has children
    >>> print("has children" if len(list(root[2])) else "no child")
    no child
    >>> # Or simpler, without a call to list within len, it also works:
    >>> print("has children" if len(root[3]) else "no child")
    has children

I changed your sample because calling findall("item") on the item root does not work (findall searches the direct children, not the current element itself). If you want to access the text of the child elements later in your program, you can do:

    for child in root.findall("item"):
        # if there are children, get their text content as well.
        if len(child):
            for subchild in child:
                subchild.text
        # else just get the current child text.
        else:
            child.text

This would be a good fit for a recursive function.
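A recursive version might look like the sketch below. It is only an illustration under a few assumptions: `collect_text` is a hypothetical helper name, the sample XML from the question is parsed from a string, and sibling tags are assumed to be unique (repeated tags would overwrite each other in the dictionary).

```python
import xml.etree.ElementTree as ET

XML = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""


def collect_text(element):
    """Return the text of a leaf element, or a dict of child tag -> contents."""
    if len(element) == 0:
        # No children: keep the element's own text.
        return element.text
    # Has children: recurse into each of them.
    return {child.tag: collect_text(child) for child in element}


root = ET.fromstring(XML)
result = collect_text(root)
print(result)
# {'a1': 'value1', 'a2': 'value2', 'a3': 'value3',
#  'a4': {'a11': 'value222', 'a22': 'value22'}}
```

The `len(element) == 0` test is the same children check used above; the recursion simply applies it at every level of the tree.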

+6

The easiest way I could find is to use the bool value of the element directly. This means that you can use a4 as-is in a conditional:

    from xml.etree.ElementTree import Element

    a4 = Element('a4')
    if a4:
        print('Has kids')
    else:
        print('No kids yet')

    a4.append(Element('x'))
    if a4:
        print('Has kids now')
    else:
        print('Still no kids')

Running this code will print:

    No kids yet
    Has kids now

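The boolean value of an element says nothing about its text, tail, or attributes. It indicates only the presence or absence of children, which is what the original question asked. Note that truth-testing an Element is deprecated as of Python 3.12; the documentation recommends an explicit len(elem) or elem is not None test instead.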
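An explicit length check expresses the same test without relying on the element's truth value. The following is a minimal sketch; `has_children` is a hypothetical helper name, not part of ElementTree, and the text and attribute values are made up for illustration.

```python
from xml.etree.ElementTree import Element, SubElement


def has_children(element):
    """Return True if the element has at least one child element."""
    return len(element) > 0


a4 = Element("a4")
a4.text = "some text"   # text alone does not count as a child
a4.set("id", "42")      # neither do attributes
before = has_children(a4)

SubElement(a4, "a11")   # now a4 has one child element
after = has_children(a4)

print(before, after)    # False True
```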

+2

The Element class has a getchildren() method (deprecated, and removed in Python 3.9). You could use something like this to check whether an element has children and store the result in a dictionary keyed by tag name:

    result = {}
    for child in root.findall("item"):
        if child.getchildren() == []:
            result[child.tag] = child.text
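
On Python versions where getchildren() is no longer available, the same idea can be written by iterating over the element directly. This is a sketch only, assuming the sample XML from the question is parsed from a string so that the root is the item element itself:

```python
import xml.etree.ElementTree as ET

XML = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""

item = ET.fromstring(XML)

result = {}
for child in item:           # iterating an Element yields its direct children
    if len(child) == 0:      # no children of its own: store its text
        result[child.tag] = child.text

print(result)  # {'a1': 'value1', 'a2': 'value2', 'a3': 'value3'}
```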
0

I personally recommend using an XML parser that fully supports XPath expressions. The XPath subset supported by xml.etree is not enough for such tasks.

For example, in lxml I can do:

"give me all the children of the children of the <item> node":

    doc.xpath('//item/*/child::*')  # equivalent to '//item/*/*', if you're being terse
    Out[18]:
    [<Element a11 at 0x7f60ec1c1348>,
     <Element a22 at 0x7f60ec1c1888>]

or,

"give me all the children of <item> that have no children themselves":

    doc.xpath('/item/*[count(child::*) = 0]')
    Out[20]:
    [<Element a1 at 0x7f60ec1c1588>,
     <Element a2 at 0x7f60ec1c15c8>,
     <Element a3 at 0x7f60ec1c1608>]

or,

"give me ALL elements that have no children":

    doc.xpath('//*[count(child::*) = 0]')
    Out[29]:
    [<Element a1 at 0x7f60ec1c1588>,
     <Element a2 at 0x7f60ec1c15c8>,
     <Element a3 at 0x7f60ec1c1608>,
     <Element a11 at 0x7f60ec1c1348>,
     <Element a22 at 0x7f60ec1c1888>]

    # and if I only care about the text from those nodes...
    doc.xpath('//*[count(child::*) = 0]/text()')
    Out[30]: ['value1', 'value2', 'value3', 'value222', 'value22']
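
For readers limited to the standard library, the same selections can be approximated for this particular document with plain iteration and len(). This is a rough sketch, not a general XPath replacement; the variable names are illustrative and the sample XML from the question is assumed:

```python
import xml.etree.ElementTree as ET

XML = """<item>
  <a1>value1</a1>
  <a2>value2</a2>
  <a3>value3</a3>
  <a4>
    <a11>value222</a11>
    <a22>value22</a22>
  </a4>
</item>"""

doc = ET.fromstring(XML)  # doc is the <item> element

# Children of the children of <item>.
grandchildren = [g.tag for child in doc for g in child]

# Children of <item> that have no children themselves.
leaf_children = [c.tag for c in doc if len(c) == 0]

# All elements without children; iter() walks the whole tree,
# starting with doc itself, in document order.
all_leaves = [e.tag for e in doc.iter() if len(e) == 0]
leaf_texts = [e.text for e in doc.iter() if len(e) == 0]

print(grandchildren)  # ['a11', 'a22']
print(leaf_children)  # ['a1', 'a2', 'a3']
print(all_leaves)     # ['a1', 'a2', 'a3', 'a11', 'a22']
print(leaf_texts)     # ['value1', 'value2', 'value3', 'value222', 'value22']
```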
0

Source: https://habr.com/ru/post/975579/

