How to remove an item in lxml

Question


I need to completely remove elements based on the value of an attribute, using Python's lxml. Example:

    import lxml.etree as et

    xml = """
    <groceries>
      <fruit state="rotten">apple</fruit>
      <fruit state="fresh">pear</fruit>
      <fruit state="fresh">starfruit</fruit>
      <fruit state="rotten">mango</fruit>
      <fruit state="fresh">peach</fruit>
    </groceries>
    """

    tree = et.fromstring(xml)

    for bad in tree.xpath("//fruit[@state='rotten']"):
        pass  # remove this element from the tree

    # tostring() returns bytes, so decode before printing
    print(et.tostring(tree, pretty_print=True).decode())

I would like this to print:

    <groceries>
      <fruit state="fresh">pear</fruit>
      <fruit state="fresh">starfruit</fruit>
      <fruit state="fresh">peach</fruit>
    </groceries>

Is there a way to do this without building the output by hand in a temporary variable, like this:

    newxml = "<groceries>\n"
    for elt in tree.xpath("//fruit[@state='fresh']"):
        # encoding="unicode" makes tostring() return str instead of bytes
        newxml += et.tostring(elt, encoding="unicode")
    newxml += "</groceries>"

+64

python xml lxml

ewok Nov 02 2018-11-11T00:


4 answers

You are looking for the remove() method. Call it on the parent of the element you want to delete and pass it that element.

    import lxml.etree as et

    xml = """
    <groceries>
      <fruit state="rotten">apple</fruit>
      <fruit state="fresh">pear</fruit>
      <punnet>
        <fruit state="rotten">strawberry</fruit>
        <fruit state="fresh">blueberry</fruit>
      </punnet>
      <fruit state="fresh">starfruit</fruit>
      <fruit state="rotten">mango</fruit>
      <fruit state="fresh">peach</fruit>
    </groceries>
    """

    tree = et.fromstring(xml)

    for bad in tree.xpath("//fruit[@state='rotten']"):
        # remove() must be called on the element's direct parent
        bad.getparent().remove(bad)

    # tostring() returns bytes, so decode before printing
    print(et.tostring(tree, pretty_print=True).decode())

Result:

    <groceries>
      <fruit state="fresh">pear</fruit>
      <punnet>
        <fruit state="fresh">blueberry</fruit>
      </punnet>
      <fruit state="fresh">starfruit</fruit>
      <fruit state="fresh">peach</fruit>
    </groceries>

+24

Acorn Nov 02 2018-11-11T00:


I ran into this situation:

    <div>
      <script>
        some code
      </script>
      text here
    </div>

div.remove(script) also removes the tail text "text here", which I did not intend.

Following the answer here, I found that etree.strip_elements is the best solution for me, because the with_tail=(bool) parameter lets you control whether the tail text is deleted too.

I still don't know whether it can be combined with an XPath filter, since it selects elements by tag name. I'm posting this for reference.

Here is the documentation:

strip_elements(tree_or_element, *tag_names, with_tail=True)
Delete all elements with the provided tag names from a tree or subtree. This will remove the elements and their entire subtree, including all their attributes, text content and descendants. It will also remove the tail text of the element unless you explicitly set the with_tail keyword argument to False.
Tag names can contain wildcards, as in _Element.iter.
Note that this will not delete the element (or ElementTree root element) that you passed in, even if it matches. It will only treat its descendants. If you want to include the root element, check its tag name directly before calling this function.
Example usage::

    strip_elements(some_element,
        'simpletagname',             # non-namespaced tag
        '{http://some/ns}tagname',   # namespaced tag
        '{http://some/other/ns}*',   # any tag from a namespace
        lxml.etree.Comment           # comments
        )
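To make the with_tail behaviour concrete, here is a minimal sketch applied to the div/script case above. The markup and the variable names are my own illustration, not part of the lxml documentation; it only assumes that lxml is installed.

```python
from lxml import etree

markup = "<div><script>some code</script>text here</div>"

# Default (with_tail=True): the <script> element and its tail text both go.
div_default = etree.fromstring(markup)
etree.strip_elements(div_default, "script")
without_tail = etree.tostring(div_default, encoding="unicode")

# with_tail=False: the element is removed, but its tail text is kept.
div_keep = etree.fromstring(markup)
etree.strip_elements(div_keep, "script", with_tail=False)
with_tail_kept = etree.tostring(div_keep, encoding="unicode")

print(without_tail)
print(with_tail_kept)
```

Note that strip_elements selects by tag name only, so it would strip every script element under the div; if you need to pick elements by attribute value, you are back to XPath plus remove().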

+9

zephor Dec 28 '16 at 9:46


As already mentioned, you can use the remove() method to delete (sub)elements from the tree:

    for bad in tree.xpath("//fruit[@state='rotten']"):
        bad.getparent().remove(bad)

But this removes the element together with its tail, which is a problem if you are processing mixed-content documents such as HTML:

 <div><fruit state="rotten">avocado</fruit> Hello!</div>

becomes

 <div></div>

which I assume is not always what you want :) I wrote a helper function that removes only the element and keeps its tail:

    def remove_element(el):
        parent = el.getparent()
        # tail can be None, so check it before calling strip()
        if el.tail and el.tail.strip():
            prev = el.getprevious()
            # compare with None: an element without children is falsy
            if prev is not None:
                prev.tail = (prev.tail or '') + el.tail
            else:
                parent.text = (parent.text or '') + el.tail
        parent.remove(el)

    for bad in tree.xpath("//fruit[@state='rotten']"):
        remove_element(bad)

This way it will save the tail text:

 <div> Hello!</div>

0

Messa Dec 01 '18 at 16:33


Cédric Julien · Accepted Answer · 2011-11-02 14:22

Use the remove method of an xmlElement:

    tree = et.fromstring(xml)

    for bad in tree.xpath("//fruit[@state='rotten']"):
        # grab the parent of the element and call remove() directly on it
        bad.getparent().remove(bad)

    # tostring() returns bytes, so decode before printing
    print(et.tostring(tree, pretty_print=True, xml_declaration=True).decode())

Compared with @Acorn's version, mine works even if the elements to remove are not directly under the root node of your XML.
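A small sketch of why the parent matters, assuming lxml is installed. The sample document and the variable names below are made up for illustration: calling remove() on the root for an element that is nested deeper raises ValueError, while going through getparent() works at any depth.

```python
from lxml import etree

xml = """
<groceries>
  <fruit state="rotten">apple</fruit>
  <punnet>
    <fruit state="rotten">strawberry</fruit>
    <fruit state="fresh">blueberry</fruit>
  </punnet>
  <fruit state="fresh">pear</fruit>
</groceries>
"""
root = etree.fromstring(xml)

# The strawberry sits inside <punnet>, so it is not a direct child of root.
nested = root.xpath("//fruit[text()='strawberry']")[0]
try:
    root.remove(nested)
    root_remove_failed = False
except ValueError:
    # lxml refuses to remove an element from a node that is not its parent
    root_remove_failed = True

# Asking each element for its own parent works regardless of nesting depth.
for bad in root.xpath("//fruit[@state='rotten']"):
    bad.getparent().remove(bad)

remaining = [el.text for el in root.iter("fruit")]
print(root_remove_failed, remaining)
```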
