How to check if two XML files are equivalent with Python?

How to check if two XML files are equivalent?

For example, two XML files are the same, even if the order is different. I need to check if two XML files contain the same textual information as the order.

<a>
   <b>hello</b>
   <c><d>world</d></c>
</a>

<a>
   <c><d>world</d></c>
   <b>hello</b>
</a>

Are there any tools for this?

+3
source share
2 answers

It all depends on your definition of "equivalent."

Assuming that you really only care about text nodes (for example: the tags din your example don't even matter, you only care about the content word), you can simply create a set of text nodes for each document and compare the sets. Using lxml, it might look like this:

from lxml import etree

tree1 = etree.parse('example1.xml')
tree2 = etree.parse('example2.xml')

print set(tree1.getroot().itertext()) == set(tree2.getroot().itertext())

, , - :

set(i for i in tree.getroot().itertext() if i.strip())

, , , ( , , ). , - , (, collections.defaultdict() collections.Counter python 2.7)

( , a), , , , , , xml , ( , , ).

from lxml import etree

tree1 = etree.parse('example1.xml')
tree2 = etree.parse('example2.xml')

set1 = set(etree.tostring(i, method='c14n') for i in tree1.getroot())
set2 = set(etree.tostring(i, method='c14n') for i in tree2.getroot())

print set1 == set2

. , lxml , etree.tostring() method='c14n', c14n() ElementTree, . , StringIO() ).

, , , .

: : , , "", !

+7

XML, , , . XML, , , , , , bazillion XML- ( lxml, ).

+2

Source: https://habr.com/ru/post/1770530/


All Articles