Determine where documents differ in Python

I use Python's difflib library to find where two documents differ. Differ().compare() does this, but very slowly: at least 100x slower than diff for large HTML documents.

How can I efficiently determine where two documents differ in Python? (Ideally, I am after the position and the text itself, like what SequenceMatcher().get_opcodes() returns.)
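
To make the desired output concrete, here is a minimal sketch of collecting positions and text from get_opcodes(). The helper name changed_spans is made up for illustration, and autojunk=False is my own choice (it turns off difflib's heuristic that ignores very frequent elements in long sequences); this shows the shape of the result, not a fast solution.

```python
import difflib

def changed_spans(a_seq, b_seq):
    """Collect every non-equal opcode together with the pieces it covers.

    Works on any pair of sequences: two strings (character offsets)
    or two lists of lines (line offsets).
    """
    matcher = difflib.SequenceMatcher(None, a_seq, b_seq, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "delete", "insert"
            spans.append((tag, i1, i2, j1, j2, a_seq[i1:i2], b_seq[j1:j2]))
    return spans

if __name__ == "__main__":
    for span in changed_spans("abc", "abd"):
        print(span)
```

Passing lists of lines instead of whole strings usually keeps the sequences much shorter, at the cost of only getting line-level positions.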

3 answers
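from itertools import zip_longest

# Read both files in binary mode so that len() counts bytes.
with open("file1.txt", "rb") as fa, open("file2.txt", "rb") as fb:
    a = fa.readlines()
    b = fb.readlines()

pos = 0  # byte offset of the current line within file1.txt

# Note: lines are compared at the same index only, so one inserted line
# makes every later line count as different.
# zip_longest pads the shorter file with None, so extra trailing lines
# in either file are reported as well.
for count, (al, bl) in enumerate(zip_longest(a, b), start=1):
    if al != bl:
        print("files differ on line %d, byte %d" % (count, pos))
    if al is not None:
        pos += len(al)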

Google's diff library has a Python API and may be usable for HTML documents.

An ugly and stupid solution: if diff is faster, use it. Call it from Python via subprocess and parse the command's output for the information you need. It will not be as fast as diff alone, but it may be faster than difflib.
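
A rough sketch of that idea, assuming a diff executable is on PATH and that it prints the default "normal" format (change commands such as 3c3, 5,7d4 or 8a9,10). The function names parse_normal_diff and diff_files are made up for illustration, and only line numbers are extracted, not the changed text.

```python
import re
import subprocess

# Change commands in diff's default output: 3c3, 5,7d4, 8a9,10, ...
_CHANGE = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")

def parse_normal_diff(output):
    """Return (op, a_first, a_last, b_first, b_last) tuples, 1-based lines.

    op is "a" (added), "c" (changed) or "d" (deleted). Content lines
    start with "<", ">", "-" or "\\" and therefore never match.
    """
    changes = []
    for line in output.splitlines():
        m = _CHANGE.match(line)
        if m:
            a1, a2, op, b1, b2 = m.groups()
            changes.append((op, int(a1), int(a2 or a1), int(b1), int(b2 or b1)))
    return changes

def diff_files(path_a, path_b):
    """Run the external diff tool on two files and parse what it prints."""
    result = subprocess.run(
        ["diff", path_a, path_b],
        capture_output=True, text=True, errors="replace",
    )
    # diff exits with 0 for identical files, 1 for differences, 2 on trouble.
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return parse_normal_diff(result.stdout)
```

The line numbers can then be mapped back to the text by indexing into the lines of each file, which avoids parsing the "<" and ">" content lines at all.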

Source: https://habr.com/ru/post/1727219/

