Levenshtein Algorithm: How do I meet these requirements for text editing?

I use the levenshtein algorithm to satisfy these requirements:

When finding a word of N characters, the words suggested as correction in my dictionary database are as follows:

Each word is a word of N characters that has 1 difference character with the found word. Example: found word: bearn, dictionary word: bears

Each word is a word of N + 1 characters, which has N characters equal to the found word. Example: found word: bear, dictionary word: bears

Each word dictionary has N-1 characters that have N-1 characters equal to the found word. Example: found word: bears, dictionary word: bear

I use this implementation of the Levenshtein algorithm in C ++ to find when a word has a Levenshtein number 1 (which is the Levenshtein number for all three cases), but then how can I choose a word to suggest? I read about Boyer-Moore-Horspool and Knut-Morris-Pratt, but I'm not sure how they can be useful.

#include <string>
#include <vector>
#include <algorithm>

using namespace std;

int levenshtein(const string &s1, const string &s2)
{
   string::size_type N1 = s1.length();
   string::size_type N2 = s2.length();
   string::size_type i, j;
   vector<int> T(N2+1);

   for ( i = 0; i <= N2; i++ )
      T[i] = i;

   for ( i = 0; i < N1; i++ ) {
      T[0] = i+1;
      int corner = i;
      for ( j = 0; j < N2; j++ ) {
         int upper = T[j+1];
         if ( s1[i] == s2[j] )
            T[j+1] = corner;
         else
            T[j+1] = min(T[j], min(upper, corner)) + 1;
         corner = upper;
      }
   }
   return T[N2];
}
+3
source share
4 answers

You can also add Norvig an excellent reading proofing article .

Some time has passed since I read it, but I remember that it is very similar to what you are writing about.

+6
source

, - . , .

++ . .

+2

, ? , - . ​​ , .

, , , N + 1, . , , , , , .

, Python, , .

0

, . , Levenshtein, - , , - . , , ?

Interestingly, the Damerau extension for Levenshtein may be of interest to you, where it is also believed that two replaced characters give a rating of 1 instead of 2, which returns vanilla Levenshtein.

0
source

Source: https://habr.com/ru/post/1702323/


All Articles