Levenshtein matrix using only a diagonal strip

Question

According to Wikipedia, there is a modification of the Wagner–Fischer algorithm that can decide whether the Levenshtein distance of two words is below a certain threshold. If that's all you want to know, it is much faster than the original.

"By examining diagonals instead of rows, and by using lazy evaluation, we can find the Levenshtein distance in O(m (1 + d)) time (where d is the Levenshtein distance), which is much faster than the regular dynamic programming algorithm if the distance is small."

How does this solution work? I find it very difficult to visualize, because it seems that the value of any cell in the matrix depends on the values of the cells above it, to its left, and diagonally up and to its left, so I'm not sure how to go through the matrix using only the diagonal strip.

algorithm matrix levenshtein distance string-metric

Nyfiken gul Feb 08 '17 at 12:12

1 answer

David Eisenstat · Accepted Answer · 2017-02-08T14:03:01+0000

Second attempt at explanation:

Suppose that we're finding the distance between a length-m word and a length-n word. Let the matrix entries be indexed by [0, m] × [0, n], where entry (i, j) is the edit distance between the length-i prefix of the length-m word and the length-j prefix of the length-n word.

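We can view the dynamic program as finding a shortest path from (0, 0) to (m, n) in a directed graph whose vertices are the matrix entries, with length-1 rightward arcs, length-1 downward arcs, and diagonal arcs of length 0 or 1 depending on whether the characters at i and j match. The idea, in a nutshell, is to use A* with the length-difference heuristic H(i, j) = |(m - i) - (n - j)|. Then we never expand nodes whose A* value is greater than d. The result is that only some of the diagonals need to be opened: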

   o t h e r w o r d
 t * * *
 h   * * *
 e     * * *
 w       * * *
 o         * * *
 r           * * *
 d             * * *
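
The A* view can be made concrete with a small sketch. This is an illustration under assumptions, not the answer's own code: the function name astar_within and the use of a binary heap are choices made here. The rows word plays the role of the length-m word and the columns word the length-n word.

```python
import heapq

def astar_within(a, b, d):
    """Return the edit distance between a and b if it is <= d, else None."""
    m, n = len(a), len(b)

    def h(i, j):
        # Length-difference heuristic: at least this many insertions or
        # deletions are still needed to get from (i, j) to (m, n).
        return abs((m - i) - (n - j))

    if h(0, 0) > d:
        return None
    best = {(0, 0): 0}             # cheapest known cost to reach each node
    heap = [(h(0, 0), 0, 0, 0)]    # entries are (f = g + h, g, i, j)
    while heap:
        f, g, i, j = heapq.heappop(heap)
        if (i, j) == (m, n):
            return g
        if g > best[(i, j)]:
            continue               # stale queue entry, a cheaper one exists
        steps = []
        if i < m:
            steps.append((i + 1, j, 1))      # downward arc, length 1
        if j < n:
            steps.append((i, j + 1, 1))      # rightward arc, length 1
        if i < m and j < n:
            # diagonal arc, length 0 on a match and 1 otherwise
            steps.append((i + 1, j + 1, 0 if a[i] == b[j] else 1))
        for ni, nj, cost in steps:
            ng = g + cost
            nf = ng + h(ni, nj)
            if nf > d:
                continue           # prune: A* value exceeds the threshold
            if ng < best.get((ni, nj), d + 1):
                best[(ni, nj)] = ng
                heapq.heappush(heap, (nf, ng, ni, nj))
    return None
```

For example, astar_within("theword", "otherword", 2) only ever queues nodes on the three diagonals starred above, because every other node already has an A* value greater than 2.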

First attempt at explanation:

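Each matrix entry (i, j) is bounded below by |i - j|, because at least that many insertions or deletions are needed to make a length-i string and a length-j string the same length. This means that a test for distance at most d only needs to look at the matrix entries (i, j) with |i - j| ≤ d, like so,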

   o t h e r w o r d
 t * * *
 h * * * *
 e * * * * *
 w   * * * * *
 o     * * * * *
 r       * * * * *
 d         * * * * *

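for d = 2. Entries outside the strip are all greater than d, so they never need to be computed. When an entry inside the strip depends on one outside it, every value ≤ d comes out the same if the missing entry is treated as d + 1, so the strip can be filled in on its own.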
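
A minimal sketch of filling in only the strip |i - j| ≤ d, assuming that any entry outside the strip can stand in as d + 1. The function name within_distance and the row-by-row dictionary layout are illustrative choices made here, not something taken from the answer.

```python
def within_distance(a, b, d):
    """Return the Levenshtein distance of a and b if it is <= d, else None."""
    m, n = len(a), len(b)
    if abs(m - n) > d:
        return None                        # (m, n) lies outside the strip
    big = d + 1                            # stand-in for any value > d
    # prev maps column j to entry (i - 1, j), for columns inside the strip
    prev = {j: j for j in range(0, min(n, d) + 1)}
    for i in range(1, m + 1):
        cur = {}
        for j in range(max(0, i - d), min(n, i + d) + 1):
            if j == 0:
                cur[j] = i                 # first column: i deletions
                continue
            # diagonal neighbour is on the same diagonal, so always in strip
            sub = prev.get(j - 1, big) + (a[i - 1] != b[j - 1])
            up = prev.get(j, big) + 1      # entry above, may be outside
            left = cur.get(j - 1, big) + 1 # entry to the left, may be outside
            cur[j] = min(sub, up, left, big)
        prev = cur
    result = prev.get(n, big)
    return result if result <= d else None
```

Each row touches at most 2d + 1 entries, so the work is proportional to m (1 + d) rather than m n. Capping every entry at d + 1 is safe here because any path through a capped entry already costs more than d.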

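Taking the length difference into account as well, so that |i - j| + |(m - i) - (n - j)| ≤ d, narrows the strip to

   o t h e r w o r d
 t * * *
 h   * * *
 e     * * *
 w       * * *
 o         * * *
 r           * * *
 d             * * *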
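
One way to check which diagonals survive is to combine the two lower bounds, |i - j| on the cost so far and |(m - i) - (n - j)| on the cost still to come. The sketch below assumes diagonals are numbered k = j - i; the helper name strip_diagonals is hypothetical.

```python
def strip_diagonals(m, n, d):
    """List the diagonals k = j - i that can lie on a path of cost <= d."""
    delta = n - m
    # On diagonal k, the cost so far is at least |k| and the remaining
    # cost is at least |(m - i) - (n - j)|, which equals |k - delta|.
    return [k for k in range(-d, d + 1) if abs(k) + abs(k - delta) <= d]
```

For the words in the diagram, strip_diagonals(7, 9, 2) keeps exactly the three diagonals k = 0, 1, 2, which matches the three stars per row.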
