Are there any string comparison algorithms that are "better" than the Levenshtein distance?

I use it for a project I'm working on, but some of the results are not what I would choose. For instance:

When "Date" is compared to:

  • "State" has a Levenshtein distance of 2
  • "Today Date" has a Levenshtein distance of 9

This is what we would expect from the algorithm, but I'm curious whether anyone knows of something that gives a closer match to candidate strings that contain an exact match of the original string ("Date"). In other words, "Today Date" would get a better rating because it contains "Date".

Bonus points if you can find a .NET library that implements this.

+4
3 answers

I think you mean that you want to tokenize the words before applying Levenshtein. Alternatively, there is the Jaro-Winkler distance.

There is a .NET library, SimMetrics, that seems to cover a few alternatives.
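
As a rough illustration of comparing word by word rather than against the whole string, here is a minimal Python sketch (not .NET, and not taken from SimMetrics). The helper names `levenshtein` and `best_token_distance` are made up for this example, and splitting on whitespace is an assumption about what counts as a word.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute), case-sensitive."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def best_token_distance(query: str, candidate: str) -> int:
    """Smallest distance between the query and any whitespace-separated token."""
    tokens = candidate.split() or [candidate]
    return min(levenshtein(query, token) for token in tokens)


for candidate in ("State", "Today Date"):
    print(candidate, best_token_distance("Date", candidate))
```

With this scoring, "Today Date" gets a distance of 0 because one of its words is exactly "Date", while "State" stays at 2. Whether the best word or some average over words is the right choice depends on the data.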

+1

Perhaps you wanted to find the longest common subsequence?
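
For reference, a minimal Python sketch of scoring by longest common subsequence, using the strings from the question. The names `lcs_length` and `lcs_score` are hypothetical, and dividing by the query length is just one possible normalization. Note that a subsequence does not have to be contiguous, so this rewards shared characters in order rather than strictly an exact embedded match.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (case-sensitive)."""
    previous = [0] * (len(b) + 1)
    for ch_a in a:
        current = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the best subsequence that ends before both characters.
                current.append(previous[j - 1] + 1)
            else:
                # Drop one character from either string and keep the better result.
                current.append(max(previous[j], current[j - 1]))
        previous = current
    return previous[-1]


def lcs_score(query: str, candidate: str) -> float:
    """Fraction of the query's characters found, in order, in the candidate."""
    return lcs_length(query, candidate) / len(query) if query else 0.0


for candidate in ("State", "Today Date"):
    print(candidate, lcs_score("Date", candidate))
```

Here "Today Date" scores 1.0 because all four characters of "Date" appear in order, while "State" scores 0.75 (only "ate" is shared).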

+2

To do it right, you need some kind of usage context.

If you are searching by address, then "Nosuch STREET" might count as a perfect match for "Nosuch ROAD"; on a no-fly list, you want all 20 spellings of Gadaffi to match.

If you are trying to analyze how much a piece of historical text has changed through copying, you need a different algorithm.

0

Source: https://habr.com/ru/post/1339566/

