Approximation of lines (selection of the nearest line from the dictionary)

Is there any string matching code or algorithm that gives us a roughly consistent string from dictionay (contains a predefined set of strings)?

For example: if there are 10 lines in the dictionary (Set of Strings), if the user enters some string, then the algorithm should tell you a string with a close match from the dictionary. It would be great if I get a string with a matching value (or percentage).

+4
source share
4 answers

I think it’s better to use the lucene library, it has a package called org.apache.lucene.search.spell , you can easily use it. It provides 3 algorithms NGramDistance, LevensteinDistance, JaroWinklerDistance . try it

+2
source

You can calculate the Levenshtein distance between lines and lines in the dictionary to find the closest matches. This may not be the best for spellchecking, as it does not do any good to replace the letter or words that are phonetically similar. for example, the question is closer to rest than quitcum.

For more examples, you can read http://en.wikipedia.org/wiki/Approximate_string_matching

+1
source

You can try Levenshtein techinque distance .

A simple idea consists of four basic operations:

  • Insert (hell β†’ hell o )
  • Replacement (good β†’ r ice)
  • Deletion (bowlin g β†’ bowlin)
  • Paging (brohter β†’ bro th er)

You must calculate the distance between your word and each word in the dictionary. The smallest distance means that this word more closely matches the given input.

0
source

I just wanted to add that StringUtils also has a convenient Levenshtein distance method since version 3.0

 public static int getLevenshteinDistance(CharSequence s, CharSequence t) 

After that, it is as simple as iterating through the collection and remembering the closest match:

 public static Object findClosestMatch(Collection<?> collection, Object target) { int distance = Integer.MAX_VALUE; Object closest = null; for (Object compareObject : collection) { int currentDistance = StringUtils.getLevenshteinDistance(compareObject.toString(), target.toString()); if(currentDistance < distance) { distance = currentDistance; closest = compareObject; } } return closest; } 

Note that the method described above requires the collection to be uptime and the toString () function to be implemented with feeling.

0
source

Source: https://habr.com/ru/post/1432032/


All Articles