It's easy to completely remove a given element from an XML document with lxml's implementation of the ElementTree API, but I can't see an easy way of consistently replacing an element with some text. For example, given the following input:
    input = '''<everything>
    <m>Some text before <r/></m>
    <m><r/> and some text after.</m>
    <m><r/></m>
    <m>Text before <r/> and after</m>
    <m><b/> Text after a sibling <r/> Text before a sibling<b/></m>
    </everything>
    '''
... you can easily remove each <r> element with:
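    from lxml import etree

    # Parse the sample document defined above (the string is named `input`).
    f = etree.fromstring(input)

    # xpath() returns a list, so removing elements inside the loop is safe.
    # Note: in lxml, remove() also drops the removed element's tail text.
    for r in f.xpath('//r'):
        r.getparent().remove(r)

    print(etree.tostring(f, pretty_print=True, encoding='unicode'))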
However, how would you go about replacing each <r> element with text instead, to get this output:
    <everything>
    <m>Some text before DELETED</m>
    <m>DELETED and some text after.</m>
    <m>DELETED</m>
    <m>Text before DELETED and after</m>
    <m><b/>Text after a sibling DELETED Text before a sibling<b/></m>
    </everything>
It seems to me that because the ElementTree API handles text through the .text and .tail attributes of each element rather than as nodes in the tree, you have to deal with a lot of different cases: whether the element has sibling elements or not, whether the element itself has a .tail, and so on. Am I missing some easy way of doing this?
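To make those cases concrete, here is a minimal sketch of the case analysis, not a definitive solution. The helper name `replace_with_text` is hypothetical (it is not part of lxml). It relies only on lxml's `getparent()` and `getprevious()` methods and on the `.text` and `.tail` attributes. The sample document is written without the newlines between elements so the result is easier to check.

```python
from lxml import etree


def replace_with_text(element, text):
    """Hypothetical helper: swap `element` for `text`, keeping its tail."""
    parent = element.getparent()
    if parent is None:
        raise ValueError("cannot replace the root element")
    # Text that must survive: the replacement plus whatever followed the element.
    new_text = text + (element.tail or '')
    previous = element.getprevious()
    if previous is not None:
        # A preceding sibling exists, so the text belongs on its .tail.
        previous.tail = (previous.tail or '') + new_text
    else:
        # No preceding sibling, so the text belongs on the parent's .text.
        parent.text = (parent.text or '') + new_text
    # The tail was copied above; clear it so it cannot be kept twice.
    element.tail = None
    parent.remove(element)


doc = etree.fromstring(
    '<everything>'
    '<m>Some text before <r/></m>'
    '<m><r/> and some text after.</m>'
    '<m><r/></m>'
    '<m>Text before <r/> and after</m>'
    '<m><b/> Text after a sibling <r/> Text before a sibling<b/></m>'
    '</everything>'
)
for r in doc.xpath('//r'):
    replace_with_text(r, 'DELETED')
result = etree.tostring(doc, encoding='unicode')
print(result)
```

Under these assumptions only two cases are left. The replacement text is appended either to the preceding sibling's .tail or, when there is no preceding sibling, to the parent's .text. The replaced element's own .tail is carried along in both.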
python xml lxml elementtree
Mark Longair Mar 24 '11 at 11:11