Python XML filtering
I have the following XML document:
    <node0>
      <node1>
        <node2 a1="x1"> ... </node2>
        <node2 a1="x2"> ... </node2>
        <node2 a1="x1"> ... </node2>
      </node1>
    </node0>

I want to filter out node2 when a1="x2". The user provides the XPath values and attributes that need to be tested and filtered. I looked at some Python solutions such as BeautifulSoup, but they are too complex and do not preserve the case of the text. I want the document to stay the same as before, only with some nodes filtered out.
Can you recommend a simple and concise solution? Judging by how it looks, this should not be too complicated. The actual XML document is not as simple as the one above, but the idea is the same.
This approach uses xml.etree.ElementTree, which is in the standard library:
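    import xml.etree.ElementTree as xee

    data = '''\
    <node1>
      <node2 a1="x1"> ... </node2>
      <node2 a1="x2"> ... </node2>
      <node2 a1="x1"> ... </node2>
    </node1>
    '''
    doc = xee.fromstring(data)
    # findall returns a list, so removing children inside the loop is safe
    for tag in doc.findall('node2'):
        # .get avoids a KeyError on node2 elements that have no a1 attribute
        if tag.get('a1') == 'x2':
            doc.remove(tag)
    # encoding='unicode' makes tostring return str instead of bytes on Python 3
    print(xee.tostring(doc, encoding='unicode'))
    # <node1>
    #   <node2 a1="x1"> ... </node2>
    #   <node2 a1="x1"> ... </node2>
    # </node1>

This approach uses lxml, which is not in the standard library but has a more powerful syntax: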
    import lxml.etree

    data = '''\
    <node1>
      <node2 a1="x1"> ... </node2>
      <node2 a1="x2"> ... </node2>
      <node2 a1="x1"> ... </node2>
    </node1>
    '''
    doc = lxml.etree.XML(data)
    # findall with an attribute predicate returns every match;
    # find would return only the first one
    for e in doc.findall('node2[@a1="x2"]'):
        doc.remove(e)
    print(lxml.etree.tostring(doc, encoding='unicode'))
    # <node1>
    #   <node2 a1="x1"> ... </node2>
    #   <node2 a1="x1"> ... </node2>
    # </node1>

Edit: If node2 is nested deeper in the XML, you can iterate over all tags, check each parent tag to see whether node2 is one of its children, and remove it if it is:
Using only xml.etree.ElementTree:
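    doc = xee.fromstring(data)
    # getiterator() was removed in Python 3.9; iter() replaces it.
    # list() takes a snapshot so the tree can be modified while looping.
    for parent in list(doc.iter()):
        for child in parent.findall('node2'):
            if child.get('a1') == 'x2':
                parent.remove(child)

Using lxml: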
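    doc = lxml.etree.XML(data)
    # list() takes a snapshot so the tree can be modified while looping
    for parent in list(doc.iter('*')):
        # findall removes every matching child, not only the first one
        for child in parent.findall('node2[@a1="x2"]'):
            parent.remove(child)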
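
Since the question mentions that the user supplies the path and the attribute to test, the loops above can be wrapped in a small helper. This is only a sketch using the standard library: the function name remove_matching and its parameters are made up for illustration, and it assumes the path is one that ElementTree's limited XPath subset understands. ElementTree elements do not know their parent, so the helper builds a child-to-parent lookup first.

```python
import xml.etree.ElementTree as ET


def remove_matching(root, path, attr, value):
    """Remove every element found by `path` whose `attr` equals `value`.

    Hypothetical helper, not part of ElementTree. Returns the number of
    removed elements. The root element itself is never removed.
    """
    # Map each child to its parent, because Element has no parent pointer.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    removed = 0
    # findall returns a list, so the tree can be modified inside the loop.
    for elem in root.findall(path):
        if elem.get(attr) == value and elem in parent_of:
            parent_of[elem].remove(elem)
            removed += 1
    return removed


data = """\
<node0>
  <node1>
    <node2 a1="x1"> ... </node2>
    <node2 a1="x2"> ... </node2>
    <node2 a1="x1"> ... </node2>
  </node1>
</node0>
"""

root = ET.fromstring(data)
# ".//node2" matches node2 at any depth below the root.
count = remove_matching(root, ".//node2", "a1", "x2")
result = ET.tostring(root, encoding="unicode")
print(count)
print(result)
```

Note that remove also drops the text that directly follows the removed element (its tail), so if that text matters it has to be copied elsewhere before the element is removed.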