Android and fuzzy match, n-gram and Levenshtein distance

I am creating an Android application that accepts string input and returns a list of books using the Google API.

I am looking for a way to compare the free-form string that a user enters with the first item in the list, to see whether what they typed is "likely" to refer to a single book. I have a lot of information about each book (title, author, description, etc.), so I can match against any of those fields.

Example:

  'eyre affair fforde', 'fforde eyre affair', 'the eyre affair'
 ----> 
 'Likely' to be 'The Eyre Affair by Jasper Fforde'

What would be the best way to do this? I looked at the Levenshtein distance, but I don't think it will work well with such free-form input. N-grams or some other kind of fuzzy match seem more promising.

Any other ideas?

1 answer

I would go with one of these:

SimMetrics (an extensible open source library of similarity and distance metrics, e.g. Levenshtein distance, L2 distance, cosine similarity, Jaccard similarity, etc.)

The Levenshtein distance from Apache Commons Lang (StringUtils.getLevenshteinDistance)

Or, to tolerate phonetic or spelling mistakes: Soundex or Metaphone.
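
Whichever library you pick, plain Levenshtein distance over the whole string will penalise reordered input such as 'fforde eyre affair'. One way around that is to compare token by token. The following is only a rough sketch with no external dependencies: the class name FuzzyBookMatch, its method names, and the idea of concatenating title and author into a single string are all my own assumptions, not part of any of the libraries above. Each query word is matched to its closest word in the book text using a hand-rolled, normalised Levenshtein similarity, and the results are averaged.

```java
import java.util.Locale;

// Hypothetical helper, not taken from SimMetrics or Commons Lang.
final class FuzzyBookMatch {

    // Levenshtein distance via dynamic programming, keeping two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Maps the distance to a similarity in [0, 1]; 1 means identical.
    static double similarity(String a, String b) {
        int max = Math.max(a.length(), b.length());
        return max == 0 ? 1.0 : 1.0 - (double) levenshtein(a, b) / max;
    }

    // Lower-cases, replaces punctuation with spaces, splits on spaces.
    static String[] tokens(String s) {
        String cleaned = s.toLowerCase(Locale.ROOT)
                          .replaceAll("[^\\p{L}\\p{N}]+", " ")
                          .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    // Average, over query tokens, of the best similarity to any book token.
    // Word order in the query does not affect the result.
    static double score(String query, String bookText) {
        String[] q = tokens(query);
        String[] b = tokens(bookText);
        if (q.length == 0 || b.length == 0) {
            return 0.0;
        }
        double total = 0.0;
        for (String qt : q) {
            double best = 0.0;
            for (String bt : b) {
                best = Math.max(best, similarity(qt, bt));
            }
            total += best;
        }
        return total / q.length;
    }
}
```

You would call something like FuzzyBookMatch.score(userInput, title + " " + author) for the first result and treat it as a "likely" match above a cut-off. The cut-off itself is a guess that you would have to tune on real queries; I have not measured one.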


Source: https://habr.com/ru/post/1341107/