Determine where documents differ with Python

Question

I use Python's difflib library to find where two documents differ. The Differ().compare() method does this, but very slowly: at least 100x slower than diff for large HTML documents.

How can I efficiently determine where two documents differ in Python? (Ideally I am after the position and the text itself, like what SequenceMatcher().get_opcodes() returns.)
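For reference, a minimal sketch of the kind of output meant here, using difflib's SequenceMatcher.get_opcodes(). The helper name changed_spans is made up for illustration, and passing autojunk=False is an assumption about what is wanted (it turns off difflib's popular-element heuristic); this only shows the shape of the result, not a fast solution.

```python
from difflib import SequenceMatcher


def changed_spans(a, b):
    """Return (tag, start, end, old, new) for every non-equal opcode.

    a and b can be strings or lists of lines; start/end index into a.
    changed_spans is a hypothetical helper name, not part of difflib.
    """
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    spans = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # tag is one of "replace", "delete" or "insert"
            spans.append((tag, i1, i2, a[i1:i2], b[j1:j2]))
    return spans


if __name__ == "__main__":
    for span in changed_spans("abcd", "abXd"):
        print(span)
```

Each tuple gives the position in the first document plus the old and new text, which is the information asked for above.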

+3

python diff document difflib

hoju Jan 4 '10 at 11:39


3 answers

Google has a diff library with a Python API, which should also work on HTML documents.

+2

Raja Selvaraj Jan 4 '10 at 13:13

An ugly and stupid solution: if diff is faster, use it. Call it from Python via subprocess and parse the command's output for the information you need. It will not be as fast as diff alone, but perhaps faster than difflib.
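A rough sketch of that idea, assuming a diff binary that accepts -U0 (unified output with no context lines) is on the PATH. The names parse_hunks and diff_files are made up for illustration. It only extracts line positions from the hunk headers; the changed text would have to be collected from the "+" and "-" lines that follow each header.

```python
import re
import subprocess

# Unified diff hunk header, e.g. "@@ -3,2 +3,4 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunks(diff_text):
    """Return (old_start, old_count, new_start, new_count) for each hunk."""
    hunks = []
    for line in diff_text.splitlines():
        m = HUNK_RE.match(line)
        if m:
            old_start, old_count, new_start, new_count = m.groups()
            hunks.append((
                int(old_start),
                1 if old_count is None else int(old_count),
                int(new_start),
                1 if new_count is None else int(new_count),
            ))
    return hunks


def diff_files(path_a, path_b):
    """Run the external diff command and return the parsed hunk positions."""
    # diff exits with 0 (identical) or 1 (different); anything higher is an error,
    # so check=True is deliberately not used here.
    result = subprocess.run(
        ["diff", "-U0", path_a, path_b],
        capture_output=True,
        text=True,
    )
    if result.returncode > 1:
        raise RuntimeError(result.stderr)
    return parse_hunks(result.stdout)
```

Calling diff_files("file1.txt", "file2.txt") would then return one tuple of 1-based line numbers and line counts per changed region.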

+1

miku Jan 4 '10 at 12:18


Kimvais · Accepted Answer · 2010-01-04T12:30:02+0000

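# Compare two files line by line and report where they differ.
# Binary mode keeps "pos" a true byte offset into file1.txt.
with open("file1.txt", "rb") as fa, open("file2.txt", "rb") as fb:
    pos = 0  # byte offset of the start of the current line in file1.txt
    # zip stops at the end of the shorter file, so extra trailing
    # lines in the longer file are not reported.
    for count, (al, bl) in enumerate(zip(fa, fb), start=1):
        if al != bl:
            print("files differ on line %d, byte %d" % (count, pos))
        pos += len(al)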
