I use the levenshtein algorithm to satisfy these requirements:
When finding a word of N characters, the words suggested as correction in my dictionary database are as follows:
Each word is a word of N characters that has 1 difference character with the found word. Example: found word: bearn, dictionary word: bears
Each word is a word of N + 1 characters, which has N characters equal to the found word. Example: found word: bear, dictionary word: bears
Each word dictionary has N-1 characters that have N-1 characters equal to the found word. Example: found word: bears, dictionary word: bear
I use this implementation of the Levenshtein algorithm in C ++ to find when a word has a Levenshtein number 1 (which is the Levenshtein number for all three cases), but then how can I choose a word to suggest? I read about Boyer-Moore-Horspool and Knut-Morris-Pratt, but I'm not sure how they can be useful.
#include <string>
#include <vector>
#include <algorithm>
using namespace std;
int levenshtein(const string &s1, const string &s2)
{
string::size_type N1 = s1.length();
string::size_type N2 = s2.length();
string::size_type i, j;
vector<int> T(N2+1);
for ( i = 0; i <= N2; i++ )
T[i] = i;
for ( i = 0; i < N1; i++ ) {
T[0] = i+1;
int corner = i;
for ( j = 0; j < N2; j++ ) {
int upper = T[j+1];
if ( s1[i] == s2[j] )
T[j+1] = corner;
else
T[j+1] = min(T[j], min(upper, corner)) + 1;
corner = upper;
}
}
return T[N2];
}
source
share