Diff algorithm, i.e. The shortest way to edit graphics

I am trying to understand the algorithm from the famous "diff" paper here , running on characters in two command line arguments. However, my code does not produce the expected results. I wrote an algorithm so that it matched their variables:

$ ./diff abbcabba cbabac #Hmm.. I think it should be 5 SES: 10 $ ./diff abcdefg abcdefg #0! Great! SES: 0 $ ./diff abcXefg abcYefg # 2 seems reasonable SES: 2 $ ./diff abcXefg abcefg # clearly wrong SES: 7 

Here is my code (Sorry for the code wall):

  a = argv[1]; b = argv[2]; alen = strlen(a); blen = strlen(b); tlen = alen + blen; maxd = tlen; vp = (int *)calloc(2 * maxd + 1, sizeof(int)); // How I'm thinking about pointer arithmetic: // vp in [0, 2*maxd + 1) == [0, 2*maxd] // v in [-maxd, maxd + 1) == [-maxd, maxd] v = vp + maxd; v[1] = 0; for (D = 0; D <= maxd; D++) { for (k = -D; k <= D; k += 2) { if (k == -D || (k != D && v[k-1] < v[k+1])) { x = v[k + 1]; } else { x = v[k - 1] + 1; } y = x - k; while (x < alen && y < blen && a[x] == b[x]) { x++; y++; } v[k] = x; if (x >= alen && y >= blen) { printf("SES: %d\n", D); goto out; } } } printf("Nothing found. SES > %d\n", maxd); 

Any idea where the flaw is here? I find it very difficult to find this problem on the Internet ...

+4
source share
1 answer

It seems that the problem is in this line:

 while (x < alen && y < blen && a[x] == b[x]) { 

Here b[x] should be b[y] , which gives:

 while (x < alen && y < blen && a[x] == b[y]) { 

With this change, the results for your examples are 6, 0, 2, and 1. This seems accurate.

+4
source

Source: https://habr.com/ru/post/1399273/


All Articles