How to check if two XML files are equivalent with Python?

Question

How to check if two XML files are equivalent?

I need to check whether two XML files contain the same textual information, regardless of element order. For example, these two files should count as the same even though the order is different:

<a>
   <b>hello</b>
   <c><d>world</d></c>
</a>

<a>
   <c><d>world</d></c>
   <b>hello</b>
</a>

Are there any tools for this?

prosseek Oct 20 '10 at 12:59

2 answers

… XML … XML … there are a bazillion XML … (lxml, for example).

Gintautas Miliauskas Oct 20 '10 at 13:02

Steven · Accepted Answer · 2010-10-20T16:02:25+0000

It all depends on your definition of "equivalent."

Assuming that you really only care about text nodes (for example, the d tags in your example don't even matter; you only care about the content world), you can simply build a set of the text nodes of each document and compare the sets. Using lxml, it might look like this:

from lxml import etree

tree1 = etree.parse('example1.xml')
tree2 = etree.parse('example2.xml')

print(set(tree1.getroot().itertext()) == set(tree2.getroot().itertext()))
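To try this without creating files, here is a minimal sketch that uses the standard library's xml.etree.ElementTree instead of lxml (Element.itertext() exists there too) and inlines the two documents from the question as strings. DOC1, DOC2 and text_set are illustrative names only:

```python
import xml.etree.ElementTree as ET

# The two documents from the question, inlined instead of read from files.
DOC1 = """<a>
   <b>hello</b>
   <c><d>world</d></c>
</a>"""

DOC2 = """<a>
   <c><d>world</d></c>
   <b>hello</b>
</a>"""


def text_set(xml_text):
    """Return the set of all text nodes, including whitespace-only ones."""
    root = ET.fromstring(xml_text)
    return set(root.itertext())


print(text_set(DOC1) == text_set(DOC2))  # True
```

Whitespace-only text nodes (the indentation) are part of these sets, so this only works here because both documents happen to be indented the same way.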

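If you want to ignore whitespace-only text nodes (such as the newlines and indentation between tags), you could use something like this instead: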

set(i for i in tree.getroot().itertext() if i.strip())
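A sketch of the difference that filter makes, again with the standard library's xml.etree.ElementTree rather than lxml. The COMPACT document is a made-up, unindented variant of the question's example, and ignore_whitespace is a hypothetical option of this helper, not part of any library:

```python
import xml.etree.ElementTree as ET

INDENTED = """<a>
   <b>hello</b>
   <c><d>world</d></c>
</a>"""

# Same content, reordered and written on one line without indentation.
COMPACT = "<a><c><d>world</d></c><b>hello</b></a>"


def text_set(xml_text, ignore_whitespace=True):
    root = ET.fromstring(xml_text)
    texts = root.itertext()
    if ignore_whitespace:
        # Keep only text nodes that contain something besides whitespace.
        texts = (t for t in texts if t.strip())
    return set(texts)


# Without the filter, the indentation text nodes make the sets differ.
print(text_set(INDENTED, ignore_whitespace=False) ==
      text_set(COMPACT, ignore_whitespace=False))  # False
# With the filter, only "hello" and "world" remain in both sets.
print(text_set(INDENTED) == text_set(COMPACT))  # True
```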

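Note that a set ignores how many times each piece of text occurs in the document (which may not matter to you). If it does matter, you could count the occurrences instead, for example with collections.defaultdict() or with collections.Counter (available since Python 2.7).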
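A minimal sketch of the counting variant with collections.Counter, using the standard library's xml.etree.ElementTree instead of lxml (ONCE, TWICE and text_counts are illustrative names). Two Counter objects compare equal only when every text occurs the same number of times:

```python
import xml.etree.ElementTree as ET
from collections import Counter


def text_counts(xml_text):
    """Count how often each non-whitespace text node occurs."""
    root = ET.fromstring(xml_text)
    return Counter(t for t in root.itertext() if t.strip())


ONCE = "<a><b>hello</b><c>world</c></a>"
TWICE = "<a><b>hello</b><c>world</c><b>hello</b></a>"

# Sets of the texts cannot tell these documents apart...
print(set(text_counts(ONCE)) == set(text_counts(TWICE)))  # True
# ...but the counts differ: "hello" occurs once vs. twice.
print(text_counts(ONCE) == text_counts(TWICE))  # False
```

Comparing sorted lists of the texts would give the same answer; the Counter just makes the intent explicit.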

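If you need to take more into account (element names, attributes, and so on) but the order of the elements still doesn't matter, at least at the top level (the children of a in your example), you could compare sets of the canonical XML (C14N) serializations of those elements, something like this: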

from lxml import etree

tree1 = etree.parse('example1.xml')
tree2 = etree.parse('example2.xml')

set1 = set(etree.tostring(i, method='c14n') for i in tree1.getroot())
set2 = set(etree.tostring(i, method='c14n') for i in tree2.getroot())

print(set1 == set2)
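If lxml isn't available, something similar should be possible with the standard library on Python 3.8 or later through xml.etree.ElementTree.canonicalize(). Be aware that it implements C14N 2.0, whereas lxml's method='c14n' produces C14N 1.0, so the exact serialized strings may differ between the two; the set comparison works the same way. The helper name child_c14n_set is made up for this sketch:

```python
import copy
import xml.etree.ElementTree as ET  # ET.canonicalize() requires Python 3.8+

DOC1 = """<a>
   <b>hello</b>
   <c><d>world</d></c>
</a>"""

DOC2 = """<a>
   <c><d>world</d></c>
   <b>hello</b>
</a>"""


def child_c14n_set(xml_text):
    """Set of C14N 2.0 serializations of the root element's direct children."""
    root = ET.fromstring(xml_text)
    result = set()
    for child in root:
        # Work on a shallow copy so the parsed tree is not modified, and drop
        # the tail (text after the closing tag, here just indentation),
        # which ET.tostring() would otherwise include.
        child = copy.copy(child)
        child.tail = None
        result.add(ET.canonicalize(ET.tostring(child, encoding="unicode")))
    return result


print(child_c14n_set(DOC1) == child_c14n_set(DOC2))  # True

# Canonicalization also sorts attributes, so attribute order doesn't matter:
print(child_c14n_set('<r><e x="1" y="2"/></r>') ==
      child_c14n_set('<r><e y="2" x="1"/></r>'))  # True
```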

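Note that with lxml you need to use etree.tostring() with method='c14n', since the write_c14n() method of ElementTree writes its output to a file (although you could work around that with StringIO()).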
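Going one step further than the answer: if element order should be ignored at every level, not only among the root's children, one option is to turn each element into a normalized, hashable key recursively and sort the child keys. This is a different technique from C14N, sketched with the standard library under the assumptions listed in the docstring; normalize and xml_equivalent are hypothetical names, and namespaces or mixed content get no special treatment:

```python
import xml.etree.ElementTree as ET


def normalize(elem):
    """Return an order-insensitive, hashable key for an element subtree.

    Assumptions (adjust to your own definition of "equivalent"):
    - the order of child elements never matters, at any depth;
    - attributes are compared as unordered name/value pairs;
    - leading/trailing whitespace in text and tail is ignored.
    """
    text = (elem.text or "").strip()
    tail = (elem.tail or "").strip()
    attrs = tuple(sorted(elem.attrib.items()))
    # Sorting the children's keys is what makes the comparison order-insensitive.
    children = tuple(sorted(normalize(child) for child in elem))
    return (elem.tag, attrs, text, tail, children)


def xml_equivalent(xml1, xml2):
    return normalize(ET.fromstring(xml1)) == normalize(ET.fromstring(xml2))


# Reordering inside <c> (below the top level) still counts as equivalent...
print(xml_equivalent("<a><c><d>1</d><d>2</d></c></a>",
                     "<a><c><d>2</d><d>1</d></c></a>"))  # True
# ...but a duplicated element does not.
print(xml_equivalent("<a><b>x</b></a>", "<a><b>x</b><b>x</b></a>"))  # False
```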

