Finding the differences between two strings

I have the following function, which takes the original string and the changed string and highlights the changed words in the latter.

    def appendBoldChanges(s1, s2):
        "Adds <b></b> tags to words that are changed"
        l1 = s1.split(' ')
        l2 = s2.split(' ')
        # Compare the two strings word by word, position by position
        for i, val in enumerate(l1):
            if l1[i].lower() != l2[i].lower():
                s2 = s2.replace(l2[i], "<b>%s</b>" % l2[i])
        return s2

    print(appendBoldChanges("britney spirs", "britney spears"))  # prints britney <b>spears</b>

It works well on strings with the same number of words, but it fails when the word counts differ, as with "sora iro days" and "sorairo days".

How can I take the spaces into account?

+6
3 answers

You can use difflib and do it like this:

    from difflib import Differ

    def appendBoldChanges(s1, s2):
        "Adds <b></b> tags to words that are changed"
        l1 = s1.split(' ')
        l2 = s2.split(' ')
        dif = list(Differ().compare(l1, l2))
        # Keep unchanged words as-is, wrap added words in <b></b>,
        # and skip removed words ('-') and hint lines ('?')
        return " ".join(['<b>' + i[2:] + '</b>' if i[:1] == '+' else i[2:]
                         for i in dif if i[:1] not in '-?'])

    print(appendBoldChanges("britney spirs", "britney sprears"))
    print(appendBoldChanges("sora iro days", "sorairo days"))

    # Output:
    # britney <b>sprears</b>
    # <b>sorairo</b> days
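To see what the list comprehension is filtering, it may help to look at the raw Differ output. This is only a small sketch, and `raw_diff` is a made-up helper name, not part of difflib. Each item starts with a two-character code: `'- '` for a word only in the first string, `'+ '` for a word only in the second, `'  '` for a word common to both, and `'? '` for hint lines that Differ sometimes adds for similar items.

```python
from difflib import Differ

def raw_diff(s1, s2):
    """Return the raw Differ lines for a word-by-word comparison (hypothetical helper)."""
    # Differ.compare() accepts any two sequences of strings, here lists of words
    return list(Differ().compare(s1.split(' '), s2.split(' ')))

for line in raw_diff("sora iro days", "sorairo days"):
    # repr() makes the two-character prefix easy to see
    print(repr(line))
```

The answer's function keeps the `'  '` items unchanged, wraps the `'+ '` items in bold tags, and drops everything else.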
+21

Take a look at the difflib module. You can use SequenceMatcher to find the changed regions in the text.
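As a rough sketch of that idea, assuming the comparison should be word by word and case-insensitive like the function in the question: `bold_changed_words` is a hypothetical name, and only `SequenceMatcher` and `get_opcodes()` come from difflib.

```python
from difflib import SequenceMatcher

def bold_changed_words(s1, s2):
    """Wrap the words of s2 that differ from s1 in <b></b> tags (hypothetical helper)."""
    old_words = s1.split()
    new_words = s2.split()
    # Compare lowercased words so that case-only differences are not marked
    matcher = SequenceMatcher(None,
                              [w.lower() for w in old_words],
                              [w.lower() for w in new_words])
    out = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        chunk = new_words[b0:b1]
        if tag == 'equal':
            out.extend(chunk)
        else:
            # 'replace' and 'insert' contribute new words;
            # 'delete' has an empty chunk, so nothing is added
            out.extend('<b>%s</b>' % w for w in chunk)
    return ' '.join(out)

print(bold_changed_words("britney spirs", "britney spears"))  # britney <b>spears</b>
print(bold_changed_words("sora iro days", "sorairo days"))    # <b>sorairo</b> days
```

Because the opcodes describe ranges rather than fixed positions, the two strings do not need to have the same number of words.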

+2

A small update to @fraxel's answer that returns two outputs: the original and the new version, each with its changes marked. I also changed the one-liner into what is, in my opinion, a more readable version.

    import difflib

    def show_diff(text, n_text):
        seqm = difflib.SequenceMatcher(None, text, n_text)
        output_orig = []
        output_new = []
        for opcode, a0, a1, b0, b1 in seqm.get_opcodes():
            orig_seq = seqm.a[a0:a1]
            new_seq = seqm.b[b0:b1]
            if opcode == 'equal':
                # Unchanged text goes into both outputs
                output_orig.append(orig_seq)
                output_new.append(orig_seq)
            elif opcode == 'insert':
                output_new.append("<font color=green>{}</font>".format(new_seq))
            elif opcode == 'delete':
                output_orig.append("<font color=red>{}</font>".format(orig_seq))
            elif opcode == 'replace':
                output_new.append("<font color=blue>{}</font>".format(new_seq))
                output_orig.append("<font color=blue>{}</font>".format(orig_seq))
            else:
                print('Error')
        return ''.join(output_orig), ''.join(output_new)
+1

Source: https://habr.com/ru/post/916789/