Matching Two Lists

I have a table of data entered by people. One column must correspond to another list: each value a person types in should exactly match one entry in a master list of allowed values.

The problem is that the entered values are abbreviated, misspelled, and so on. Is there a mechanism that performs some kind of similarity search to find out what the person actually meant to enter?

Examples

**Human Entered**         **Should Be**
Carbon-12                 Carbon(12)
South Korea               Republic of Korea
farenheit                 Fahrenheit

The only idea I have is to split the entered value into three-character chunks, check how many of them occur in each entry of the master list, and simply pick the entry with the highest score. Later on, the user could be offered a choice among the top 10 candidates or so.
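The chunking idea can be sketched as follows. This is a minimal sketch under several assumptions that the question does not fix: chunks are overlapping three-character substrings (trigrams), comparison ignores case, and each candidate is scored with the Dice coefficient, 2 × shared chunks / (chunks in A + chunks in B). The class and method names (`TrigramMatcher`, `Trigrams`, `Score`, `TopMatches`) are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper illustrating the "split into 3-character chunks" idea.
public static class TrigramMatcher
{
    // Overlapping 3-character chunks, lower-cased: "korea" -> "kor", "ore", "rea".
    public static HashSet<string> Trigrams(string s)
    {
        s = s.ToLowerInvariant();
        var chunks = new HashSet<string>();
        for (int i = 0; i + 3 <= s.Length; i++)
            chunks.Add(s.Substring(i, 3));
        return chunks;
    }

    // Assumed scoring rule: Dice coefficient, from 0.0 (nothing shared) to 1.0.
    public static double Score(string entered, string candidate)
    {
        var a = Trigrams(entered);
        var b = Trigrams(candidate);
        if (a.Count + b.Count == 0)
            return 0.0; // both strings are shorter than 3 characters

        var shared = new HashSet<string>(a);
        shared.IntersectWith(b);
        return 2.0 * shared.Count / (a.Count + b.Count);
    }

    // Master-list entries ordered by score, best first; keeps at most n of them.
    public static List<string> TopMatches(string entered, IEnumerable<string> masterList, int n)
    {
        return masterList
            .OrderByDescending(candidate => Score(entered, candidate))
            .Take(n)
            .ToList();
    }
}
```

For example, `TopMatches("farenheit", new[] { "Celsius", "Fahrenheit", "Kelvin" }, 1)` returns `Fahrenheit`, since the two spellings share 5 trigrams. Chunk overlap only catches spelling variants, though: "South Korea" scores about 0.42 against "South Africa" but only about 0.33 against "Republic of Korea", so synonyms like that probably have to be handled some other way.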

I am not looking for a perfect solution either: if it got roughly 70% of the entries right, it would already save a lot of time going through the list.


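You can try computing how similar two strings are using the Levenshtein distance (the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other):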

```csharp
using System;

// Hypothetical container class name; both methods must live in the same class.
public static partial class StringSimilarity
{
    private static int CalcLevensteinDistance(string left, string right)
    {
        if (left == right)
            return 0;

        // matrix[i, j] = edits needed to turn the first i characters of left
        // into the first j characters of right
        int[,] matrix = new int[left.Length + 1, right.Length + 1];

        for (int i = 0; i <= left.Length; i++)
            matrix[i, 0] = i; // i deletions

        for (int j = 0; j <= right.Length; j++)
            matrix[0, j] = j; // j insertions

        for (int i = 0; i < left.Length; i++)
        {
            for (int j = 0; j < right.Length; j++)
            {
                if (left[i] == right[j])
                    matrix[i + 1, j + 1] = matrix[i, j]; // characters match, no extra cost
                else
                {
                    // deletion or insertion
                    matrix[i + 1, j + 1] = Math.Min(matrix[i, j + 1] + 1, matrix[i + 1, j] + 1);

                    // substitution
                    matrix[i + 1, j + 1] = Math.Min(matrix[i + 1, j + 1], matrix[i, j] + 1);
                }
            }
        }

        return matrix[left.Length, right.Length];
    }
}
```
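The loops fill the matrix according to the standard Levenshtein recurrence. Writing $D_{i,j}$ for `matrix[i, j]` and using the same 0-based character indices as the code:

$$
\begin{aligned}
D_{i,0} &= i, \qquad D_{0,j} = j,\\
D_{i+1,\,j+1} &=
\begin{cases}
D_{i,j} & \text{if } \mathit{left}[i] = \mathit{right}[j],\\
1 + \min\left(D_{i,\,j+1},\; D_{i+1,\,j},\; D_{i,j}\right) & \text{otherwise.}
\end{cases}
\end{aligned}
$$

For instance, with case ignored, "farenheit" and "fahrenheit" are at distance 1: a single inserted "h".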

Now compute the similarity between the two strings as a value from 0 to 1 (multiply by 100 for a percentage):

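```csharp
using System;

public static partial class StringSimilarity
{
    // Returns a similarity from 0.0 (completely different) to 1.0 (identical).
    public static double CalcSimilarity(string left, string right, bool ignoreCase)
    {
        if (ignoreCase)
        {
            left = left.ToLowerInvariant();
            right = right.ToLowerInvariant();
        }

        double distance = CalcLevensteinDistance(left, right);
        if (distance == 0.0)
            return 1.0;

        // The distance never exceeds the longer length, so the result stays in [0, 1].
        double longestStringSize = Math.Max(left.Length, right.Length);
        double percent = distance / longestStringSize;

        return 1.0 - percent;
    }
}
```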
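To apply this to the question, each entered value can be scored against every entry of the master list and the best few candidates kept. The sketch below assumes case-insensitive comparison and uses the same score as `CalcSimilarity` (1 minus the distance divided by the longer length); `ListMatcher`, `Similarity` and `TopMatches` are hypothetical names. Its distance function keeps only two rows of the matrix, which yields the same value with less memory; that is a design choice for scanning long lists, not something the answer requires.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: ranks a master list against one hand-entered value.
public static class ListMatcher
{
    // Levenshtein distance keeping only the previous and current matrix rows.
    static int Distance(string left, string right)
    {
        var previous = new int[right.Length + 1];
        var current = new int[right.Length + 1];
        for (int j = 0; j <= right.Length; j++)
            previous[j] = j; // j insertions

        for (int i = 0; i < left.Length; i++)
        {
            current[0] = i + 1; // i + 1 deletions
            for (int j = 0; j < right.Length; j++)
            {
                int cost = left[i] == right[j] ? 0 : 1;
                current[j + 1] = Math.Min(
                    Math.Min(previous[j + 1] + 1,  // deletion
                             current[j] + 1),      // insertion
                    previous[j] + cost);           // substitution or match
            }
            var swap = previous; previous = current; current = swap;
        }
        return previous[right.Length]; // last computed row after the final swap
    }

    // Case-insensitive similarity from 0.0 to 1.0, same formula as CalcSimilarity.
    public static double Similarity(string left, string right)
    {
        left = left.ToLowerInvariant();
        right = right.ToLowerInvariant();
        int longest = Math.Max(left.Length, right.Length);
        if (longest == 0)
            return 1.0; // two empty strings are identical
        return 1.0 - (double)Distance(left, right) / longest;
    }

    // Up to n master-list entries with their scores, best first
    // (e.g. n = 10 to offer the user a short list of choices).
    public static List<(string Candidate, double Score)> TopMatches(
        string entered, IEnumerable<string> masterList, int n)
    {
        return masterList
            .Select(c => (Candidate: c, Score: Similarity(entered, c)))
            .OrderByDescending(match => match.Score)
            .Take(n)
            .ToList();
    }
}
```

With this score, "Carbon-12" gets 0.8 against "Carbon(12)" and 0.7 against "Carbon(13)", so the right entry comes first, but the gap is small; showing the user the top few candidates, as the question suggests, is probably safer than silently accepting the first one.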

Source: https://habr.com/ru/post/1782036/

