As noted in Kevin Guerra's answer, the `root.clear()` strategy from the ElementTree documentation only removes fully parsed children of the root. If those children anchor huge branches, it does not help much.
He touched on the ideal solution but did not post any code, so here is an example:

    import xml.etree.ElementTree as ET

    # stream: a filename or a file object opened in binary mode
    element_stack = []
    context = ET.iterparse(stream, events=('start', 'end'))
    for event, elem in context:
        if event == 'start':
            element_stack.append(elem)
        elif event == 'end':
            element_stack.pop()
            # see if elem is one of interest and do something with it here
            if element_stack:
                # detach elem from its parent so it can be garbage-collected
                element_stack[-1].remove(elem)
    del context
With this approach, the element of interest will have no sub-elements by the time you inspect it: they are removed as soon as their end tags are seen. That may be fine if all you need is the element's text or attributes.
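To make that concrete, here is a minimal self-contained sketch of the same loop run on a tiny in-memory document. The sample XML, the `item` tag, and the `collect_items` helper are all made up for illustration; only `ET.iterparse` and `Element.remove` come from the standard library.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = b"<root><item id='1'>a<sub>x</sub></item><item id='2'>b</item></root>"

def collect_items(stream):
    """Return (id, text, child_count) for each <item>, pruning as we go."""
    results = []
    element_stack = []
    for event, elem in ET.iterparse(stream, events=('start', 'end')):
        if event == 'start':
            element_stack.append(elem)
        else:  # 'end'
            element_stack.pop()
            if elem.tag == 'item':
                # Attributes and text are still available here,
                # but the children have already been removed.
                results.append((elem.get('id'), elem.text, len(elem)))
            if element_stack:
                # Detach the finished element from its parent.
                element_stack[-1].remove(elem)
    return results

items = collect_items(io.BytesIO(SAMPLE))
```

In this sketch `items` comes out as `[('1', 'a', 0), ('2', 'b', 0)]`: the first item's `<sub>` child is already gone when the item's own end tag is reached, while its `id` attribute and text are intact.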
If you want to query an element's descendants, you need to let a full branch be built for it. To do this, maintain a flag, implemented as a depth counter, for those elements, and only call `.remove()` when the depth is zero:

    import xml.etree.ElementTree as ET

    # stream: a filename or a file object opened in binary mode
    element_stack = []
    interesting_element_depth = 0
    context = ET.iterparse(stream, events=('start', 'end'))
    for event, elem in context:
        if event == 'start':
            element_stack.append(elem)
            if elem.tag == 'foo':
                interesting_element_depth += 1
        elif event == 'end':
            element_stack.pop()
            if elem.tag == 'foo':
                interesting_element_depth -= 1
                # do something with elem and its descendants here
            if element_stack and not interesting_element_depth:
                # only prune when we are outside every 'foo' element
                element_stack[-1].remove(elem)
    del context
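As a quick check of the depth-counter variant, the sketch below wraps it in a function and runs it on a small in-memory document. The sample XML, the `bar` and `junk` tags, and the `bar_texts_per_foo` helper are hypothetical names chosen for illustration; `foo` is the tag of interest, as in the snippet above.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
SAMPLE = (b"<root><foo><bar>1</bar><bar>2</bar></foo>"
          b"<junk><bar>9</bar></junk>"
          b"<foo><bar>3</bar></foo></root>")

def bar_texts_per_foo(stream):
    """For each <foo>, collect the text of every <bar> descendant."""
    results = []
    element_stack = []
    interesting_element_depth = 0
    for event, elem in ET.iterparse(stream, events=('start', 'end')):
        if event == 'start':
            element_stack.append(elem)
            if elem.tag == 'foo':
                interesting_element_depth += 1
        else:  # 'end'
            element_stack.pop()
            if elem.tag == 'foo':
                interesting_element_depth -= 1
                # The whole branch under this <foo> is still intact here.
                results.append([bar.text for bar in elem.iter('bar')])
            if element_stack and not interesting_element_depth:
                # Outside every <foo>: safe to detach the finished element.
                element_stack[-1].remove(elem)
    return results

groups = bar_texts_per_foo(io.BytesIO(SAMPLE))
```

Here `groups` is `[['1', '2'], ['3']]`: each `<foo>` keeps its descendants until its own end tag, while the `<bar>` under `<junk>` is pruned immediately and never collected.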