Python xml filtering

Question

I have the following XML document:

    <node0>
        <node1>
            <node2 a1="x1"> ... </node2>
            <node2 a1="x2"> ... </node2>
            <node2 a1="x1"> ... </node2>
        </node1>
    </node0>

I want to filter out the node2 elements where a1="x2". The user supplies the XPath values and the attributes that need to be tested and filtered. I looked at some Python solutions such as BeautifulSoup, but they are too complicated and do not preserve the case of the text. I want the document to stay the same as before, apart from the filtered nodes.

Can you recommend a simple and concise solution? By the looks of it, this should not be too complicated. The actual XML document is not as simple as the one above, but the idea is the same.

python xml xpath elementtree

user236215 May 19, '10 at 21:30

1 answer

unutbu · Accepted Answer · 2010-05-19T23:13:55+0000

Here is a solution using xml.etree.ElementTree, which is in the standard library:

    import xml.etree.ElementTree as xee

    data = '''\
    <node1>
        <node2 a1="x1"> ... </node2>
        <node2 a1="x2"> ... </node2>
        <node2 a1="x1"> ... </node2>
    </node1>
    '''
    doc = xee.fromstring(data)
    # findall() returns a list, so removing children inside the loop is safe
    for tag in doc.findall('node2'):
        if tag.get('a1') == 'x2':
            doc.remove(tag)
    # encoding='unicode' returns a str instead of bytes on Python 3
    print(xee.tostring(doc, encoding='unicode'))
    # <node1>
    #     <node2 a1="x1"> ... </node2>
    #     <node2 a1="x1"> ... </node2>
    # </node1>
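ElementTree's `findall` also understands a limited XPath subset, including attribute predicates such as `[@a1="x2"]`, so the attribute test can move into the path itself. A minimal sketch, assuming the same `<node1>` sample; because `findall` returns every match, repeated `a1="x2"` nodes are all removed:

```python
import xml.etree.ElementTree as ET

data = '''\
<node1>
    <node2 a1="x1"> ... </node2>
    <node2 a1="x2"> ... </node2>
    <node2 a1="x1"> ... </node2>
</node1>
'''

doc = ET.fromstring(data)
# The predicate selects only the direct node2 children whose a1 attribute is "x2".
for tag in doc.findall('node2[@a1="x2"]'):
    doc.remove(tag)

result = ET.tostring(doc, encoding='unicode')
print(result)
```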

Here is the same filter using lxml, which is not in the standard library but has a more powerful syntax:

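    import lxml.etree

    data = '''\
    <node1>
        <node2 a1="x1"> ... </node2>
        <node2 a1="x2"> ... </node2>
        <node2 a1="x1"> ... </node2>
    </node1>
    '''
    doc = lxml.etree.XML(data)
    # find() would return only the first match (or None if there is none);
    # findall() lets us remove every node2 whose a1 attribute is "x2"
    for e in doc.findall('node2[@a1="x2"]'):
        doc.remove(e)
    print(lxml.etree.tostring(doc, encoding='unicode'))
    # <node1>
    #     <node2 a1="x1"> ... </node2>
    #     <node2 a1="x1"> ... </node2>
    # </node1>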

Edit: If node2 sits deeper in the XML, you can iterate over all tags, check each parent tag to see whether a node2 is one of its children, and remove it if so:

Using only xml.etree.ElementTree:

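    doc = xee.fromstring(data)
    # getiterator() was removed in Python 3.9; iter() walks every element.
    # list() takes a snapshot so the tree can be modified inside the loop.
    for parent in list(doc.iter()):
        for child in parent.findall('node2'):
            if child.get('a1') == 'x2':
                parent.remove(child)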
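ElementTree elements do not keep a reference to their parent, which is why the loop above visits every potential parent. An alternative sketch, assuming Python 3, builds a child-to-parent map once and then removes every match found at any depth below the root. The helper `remove_matching` and its parameters are hypothetical, standing in for the tag, attribute and value the user supplies; the value is assumed not to contain a single quote, since it is pasted into the path string:

```python
import xml.etree.ElementTree as ET


def remove_matching(root, tag, attr, value):
    """Hypothetical helper: remove every descendant <tag> of root whose
    attribute attr equals value. The root element itself is never removed."""
    # ElementTree elements have no parent pointer, so map each child to its parent.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    # './/' searches at any depth below root; findall() returns a list,
    # so removing elements inside the loop is safe.
    for elem in root.findall(f".//{tag}[@{attr}='{value}']"):
        parent_of[elem].remove(elem)
    return root


data = '''\
<node0>
    <node1>
        <node2 a1="x1"> ... </node2>
        <node2 a1="x2"> ... </node2>
    </node1>
    <node3><node2 a1="x2"> ... </node2></node3>
</node0>
'''

doc = remove_matching(ET.fromstring(data), "node2", "a1", "x2")
result = ET.tostring(doc, encoding="unicode")
print(result)
```

Building the map costs one pass over the tree, after which each removal is a dictionary lookup rather than a search through every parent.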

Using lxml:

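    doc = lxml.etree.XML(data)
    # list() takes a snapshot so removing elements cannot disturb the iterator
    for parent in list(doc.iter('*')):
        # findall() removes every match under this parent, not just the first
        for child in parent.findall('node2[@a1="x2"]'):
            parent.remove(child)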
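The question also asks for the document to stay the same apart from the filtered nodes. By default ElementTree drops comments when parsing; on Python 3.8+ a `TreeBuilder(insert_comments=True)` keeps comments inside the root element, and `ElementTree.write` can emit an XML declaration. A sketch under those assumptions, writing to an in-memory buffer in place of a real output file. The result is equivalent rather than byte-identical: details such as attribute quoting, empty-element syntax and anything outside the root element (for example a DOCTYPE) are not preserved.

```python
import io
import xml.etree.ElementTree as ET

data = '''\
<node1>
    <!-- a comment -->
    <node2 a1="x1"> ... </node2>
    <node2 a1="x2"> ... </node2>
</node1>
'''

# Python 3.8+: ask the tree builder to keep comments inside the root element.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
root = ET.fromstring(data, parser=parser)

for elem in root.findall('node2[@a1="x2"]'):
    root.remove(elem)

# Write the filtered tree back out; a filename can be passed instead of the buffer.
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding="utf-8", xml_declaration=True)
output = buffer.getvalue().decode("utf-8")
print(output)
```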
