Optimizing the version of the Levenshtein algorithm with Damerra is better than O (n * m)

Here is the algorithm (in ruby)

#http://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance def self.dameraulevenshtein(seq1, seq2) oneago = nil thisrow = (1..seq2.size).to_a + [0] seq1.size.times do |x| twoago, oneago, thisrow = oneago, thisrow, [0] * seq2.size + [x + 1] seq2.size.times do |y| delcost = oneago[y] + 1 addcost = thisrow[y - 1] + 1 subcost = oneago[y - 1] + ((seq1[x] != seq2[y]) ? 1 : 0) thisrow[y] = [delcost, addcost, subcost].min if (x > 0 and y > 0 and seq1[x] == seq2[y-1] and seq1[x-1] == seq2[y] and seq1[x] != seq2[y]) thisrow[y] = [thisrow[y], twoago[y-2] + 1].min end end end return thisrow[seq2.size - 1] end 

My problem is that with seq1 of length 780 and seq2 of length 7238 it takes about 25 seconds to work on i7 laptop. Ideally, I would like it to be reduced to about a second, as it works as part of a web application.

I found that there is a way to optimize the distance from the vanilla levenshtein , so the runtime drops from O (n * m) to O (n + d ^ 2), where n is the length of the longer line, d is the editing distance. So my question is, can the same optimization apply to the damerau version that I have (above)?

+4
source share
1 answer

Yes, optimization can be applied to the damereau version. Here is the haskell code for this (I don't know Ruby):

 distd :: Eq a => [a] -> [a] -> Int distd ab = last (if lab == 0 then mainDiag else if lab > 0 then lowers !! (lab - 1) else{- < 0 -} uppers !! (-1 - lab)) where mainDiag = oneDiag ab (head uppers) (-1 : head lowers) uppers = eachDiag ab (mainDiag : uppers) -- upper diagonals lowers = eachDiag ba (mainDiag : lowers) -- lower diagonals eachDiag a [] diags = [] eachDiag a (bch:bs) (lastDiag:diags) = oneDiag a bs nextDiag lastDiag : eachDiag a bs diags where nextDiag = head (tail diags) oneDiag ab diagAbove diagBelow = thisdiag where doDiag [_] b nw nw = [] doDiag a [_] nw nw = [] doDiag (apr:ach:as) (bpr:bch:bs) nw nw = me : (doDiag (ach:as) (bch:bs) me (tail n) (tail w)) where me = if ach == bch then nw else if ach == bpr && bch == apr then nw else 1 + min3 (head w) nw (head n) firstelt = 1 + head diagBelow thisdiag = firstelt : doDiag ab firstelt diagAbove (tail diagBelow) lab = length a - length b min3 xyz = if x < y then x else min yz distance :: [Char] -> [Char] -> Int distance ab = distd ('0':a) ('0':b) 

The above code is an adaptation of this code .

0
source

Source: https://habr.com/ru/post/1403988/


All Articles