How can I replace an element with text in lxml?

It's easy to completely remove a given element from an XML document using lxml's implementation of the ElementTree API, but I don't see an easy way to consistently replace an element with some text. For example, given the following input:

    data = '''<everything>
    <m>Some text before <r/></m>
    <m><r/> and some text after.</m>
    <m><r/></m>
    <m>Text before <r/> and after</m>
    <m><b/> Text after a sibling <r/> Text before a sibling<b/></m>
    </everything>
    '''

... you can easily remove each <r> element with:

    from lxml import etree

    f = etree.fromstring(data)
    # remove() drops each <r/> together with its tail text
    for r in f.xpath('//r'):
        r.getparent().remove(r)
    print(etree.tostring(f, pretty_print=True).decode())

However, how would you go about replacing each <r> element with text to get this result:

    <everything>
    <m>Some text before DELETED</m>
    <m>DELETED and some text after.</m>
    <m>DELETED</m>
    <m>Text before DELETED and after</m>
    <m><b/> Text after a sibling DELETED Text before a sibling<b/></m>
    </everything>

It seems to me that, because the ElementTree API handles text through the .text and .tail attributes of each element rather than through nodes in the tree, you have to deal with many different cases depending on whether the element has siblings, whether the existing element has a .tail, and so on. Am I missing some easy way to do this?
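To make the .text/.tail model concrete, here is a minimal sketch (the one-element document is made up for illustration) showing where lxml stores the text around an element, and that remove() takes the tail text with it:

```python
from lxml import etree

m = etree.fromstring('<m>Text before <r/> and after</m>')
r = m[0]

# Text before the first child is stored on the parent ...
parent_text = m.text
# ... and text following an element is stored on that element's tail.
tail_text = r.tail

# remove() drops the element together with its tail text.
m.remove(r)
after_remove = etree.tostring(m, encoding='unicode')

print(repr(parent_text))
print(repr(tail_text))
print(after_remove)
```

This is why a plain remove() loses " and after": the text belongs to the removed element, not to the parent.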

+8
python xml lxml elementtree
Mar 24 '11 at 11:11
3 answers

I think unutbu's XSLT solution is probably the right way to achieve your goal.

However, here's a somewhat hacky way to achieve it: modify the tails of the <r/> tags and then use etree.strip_elements.

    from lxml import etree

    data = '''<everything>
    <m>Some text before <r/></m>
    <m><r/> and some text after.</m>
    <m><r/></m>
    <m>Text before <r/> and after</m>
    <m><b/> Text after a sibling <r/> Text before a sibling<b/></m>
    </everything>
    '''

    f = etree.fromstring(data)
    for r in f.xpath('//r'):
        # Prepend the replacement text to the tail (the tail may be None)
        r.tail = 'DELETED' + (r.tail or '')

    # with_tail=False keeps each tail, which now carries the replacement text
    etree.strip_elements(f, 'r', with_tail=False)
    print(etree.tostring(f, pretty_print=True).decode())

Gives you:

    <everything>
    <m>Some text before DELETED</m>
    <m>DELETED and some text after.</m>
    <m>DELETED</m>
    <m>Text before DELETED and after</m>
    <m><b/> Text after a sibling DELETED Text before a sibling<b/></m>
    </everything>
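The trick depends on the with_tail flag of strip_elements. A small sketch on a made-up one-line document shows the difference between keeping and dropping the tail:

```python
from lxml import etree

xml = '<m>before <r/> after</m>'

# with_tail=False: the element goes, its tail text stays in the tree.
kept = etree.fromstring(xml)
etree.strip_elements(kept, 'r', with_tail=False)
kept_result = etree.tostring(kept, encoding='unicode')

# Default (with_tail=True): the tail text is removed with the element.
dropped = etree.fromstring(xml)
etree.strip_elements(dropped, 'r')
dropped_result = etree.tostring(dropped, encoding='unicode')

print(kept_result)
print(dropped_result)
```

With the tail kept, any replacement text prepended to the tail beforehand survives the stripping.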
+12
Mar 24 '11

Using strip_elements has the disadvantage that you cannot make it keep some of the <r> elements while replacing others. It also requires an ElementTree instance (which you may not have). Lastly, you cannot use it to replace XML comments or processing instructions. The following should do the job:

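    for r in f.xpath('//r'):
        # r.tail is None when no text follows the element
        text = 'DELETED' + (r.tail or '')
        parent = r.getparent()
        if parent is not None:
            previous = r.getprevious()
            if previous is not None:
                # Append to the tail of the preceding sibling
                previous.tail = (previous.tail or '') + text
            else:
                # No preceding sibling: append to the parent's text
                parent.text = (parent.text or '') + text
            parent.remove(r)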
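The same logic can be wrapped in a helper so that only selected elements are replaced. This is a sketch, not part of lxml: the function name replace_with_text and the keep attribute used to mark elements that should stay are both made up for illustration.

```python
from lxml import etree


def replace_with_text(element, text):
    """Replace `element` with `text`, preserving the element's tail."""
    parent = element.getparent()
    if parent is None:
        raise ValueError('cannot replace the root element')
    text += element.tail or ''
    previous = element.getprevious()
    if previous is not None:
        # Text after a sibling is stored on that sibling's tail.
        previous.tail = (previous.tail or '') + text
    else:
        # First child: the text belongs to the parent.
        parent.text = (parent.text or '') + text
    parent.remove(element)


# Hypothetical input: one <r/> is marked with keep="yes" and must survive.
root = etree.fromstring('<m><b/> one <r/> two <r keep="yes"/> three</m>')
for r in root.xpath('//r[not(@keep)]'):
    replace_with_text(r, 'DELETED')

result = etree.tostring(root, encoding='unicode')
print(result)
```

Because the XPath expression decides which elements are passed in, the helper can leave some <r> elements in place while replacing others.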
+6
May 9 '12 at 16:50

Using ET.XSLT:

    import lxml.etree as ET

    data = '''<everything>
    <m>Some text before <r/></m>
    <m><r/> and some text after.</m>
    <m><r/></m>
    <m>Text before <r/> and after</m>
    <m><b/> Text after a sibling <r/> Text before a sibling<b/></m>
    </everything>
    '''

    f = ET.fromstring(data)
    xslt = '''\
    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Replace r nodes with DELETED
         http://www.w3schools.com/xsl/el_template.asp -->
    <xsl:template match="r">DELETED</xsl:template>

    <!-- How to copy XML without changes
         http://mrhaki.blogspot.com/2008/07/copy-xml-as-is-with-xslt.html -->
    <xsl:template match="*">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <xsl:template match="@*|text()|comment()|processing-instruction()">
        <xsl:copy-of select="."/>
    </xsl:template>
    </xsl:stylesheet>
    '''

    # ET.XML parses the stylesheet straight from the string
    xslt_doc = ET.XML(xslt)
    transform = ET.XSLT(xslt_doc)
    f = transform(f)
    print(ET.tostring(f, encoding='unicode'))

gives

    <everything>
    <m>Some text before DELETED</m>
    <m>DELETED and some text after.</m>
    <m>DELETED</m>
    <m>Text before DELETED and after</m>
    <m><b/> Text after a sibling DELETED Text before a sibling<b/></m>
    </everything>
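If the replacement text should not be hard-coded in the stylesheet, one option is to pass it in as a stylesheet parameter. The sketch below assumes a parameter named replacement and a one-line sample document, both chosen here for illustration; ET.XSLT.strparam quotes the Python string so it arrives as an XPath string value.

```python
import lxml.etree as ET

stylesheet = ET.XML('''\
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Hypothetical parameter holding the replacement text -->
  <xsl:param name="replacement"/>
  <xsl:template match="r">
    <xsl:value-of select="$replacement"/>
  </xsl:template>
  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
''')
transform = ET.XSLT(stylesheet)

doc = ET.fromstring('<m>Text before <r/> and after</m>')
result = transform(doc, replacement=ET.XSLT.strparam('REMOVED'))
output = ET.tostring(result.getroot(), encoding='unicode')
print(output)
```

The compiled transform can then be reused with different replacement strings without rebuilding the stylesheet.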
+3
Mar 24 '11 at 12:31


