Android and fuzzy match, n-gram and Levenshtein distance

Question


I am creating an Android application that accepts string input and returns a list of books using the Google API.

I am looking for a way to compare the free-form string a user enters with the first item in the returned list, to decide whether what they typed is "likely" to refer to a single book. I have plenty of information about each book (title, author, description, etc.), so I can search in any of those fields.

Example:

  'eyre affair fforde', 'fforde eyre affair', 'the eyre affair'
 ----> 
 'Likely' to be 'The Eyre Affair by Jasper Fforde'

What would be the best way to do this? I looked at Levenshtein distance, but I don't think it will work with such open-ended input. N-grams, or some other form of fuzzy matching, seem like a better fit.

Any other ideas?
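To make the n-gram idea concrete, here is a rough sketch of one way it could work: split both strings into character trigrams per word, then measure what fraction of the query's trigrams also appear in the book's text. The class and method names (`NgramMatch`, `ngrams`, `containment`) are made up for illustration, and the candidate string is assumed to be the title and author joined together. Working per word means the order of the words in the input does not matter.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of any library.
class NgramMatch {

    // Collect the character n-grams of each whitespace-separated word.
    // Words no longer than n are kept whole.
    static Set<String> ngrams(String text, int n) {
        Set<String> grams = new HashSet<>();
        String cleaned = text.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9 ]", " ");
        for (String word : cleaned.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            if (word.length() <= n) {
                grams.add(word);
                continue;
            }
            for (int i = 0; i + n <= word.length(); i++) {
                grams.add(word.substring(i, i + n));
            }
        }
        return grams;
    }

    // Fraction of the query's n-grams that also occur in the candidate (0.0 to 1.0).
    static double containment(String query, String candidate, int n) {
        Set<String> q = ngrams(query, n);
        if (q.isEmpty()) {
            return 0.0;
        }
        Set<String> c = ngrams(candidate, n);
        int shared = 0;
        for (String gram : q) {
            if (c.contains(gram)) {
                shared++;
            }
        }
        return (double) shared / q.size();
    }
}
```

With the candidate "The Eyre Affair Jasper Fforde" and n = 3, all three example inputs above score 1.0, while an unrelated query such as "pride and prejudice" scores 0.0. The threshold at which a score counts as "likely" would need tuning against real queries.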


java android levenshtein-distance fuzzy-search n-gram

Carrie hall Feb 24 '11 at 8:40


1 answer

Chris · Accepted Answer · 2011-02-24T08:51:43+0000

I would go with one of these:

SimMetrics, an extensible open source library of similarity and distance metrics (e.g. Levenshtein distance, L2 distance, cosine similarity, Jaccard similarity, etc.)

Commons Lang LevenshteinDistance

Or, to cope with phonetic or spelling mistakes: Soundex or Metaphone.
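As an illustration of how Levenshtein distance could still be used with free-form input, the sketch below applies it word by word instead of to the whole string: every word of the query has to be within a few edits of some word in the book's title or author. This is a hand-written version for illustration, not the SimMetrics or Commons Lang API, and the names `TokenLevenshtein`, `distance` and `likelyMatch` are made up.

```java
import java.util.Locale;

// Hypothetical helper, not part of any library.
class TokenLevenshtein {

    // Classic dynamic-programming Levenshtein distance, keeping only two rows.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // True if every query word is within maxEdits of some word in the candidate.
    static boolean likelyMatch(String query, String candidate, int maxEdits) {
        String[] queryWords = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        String[] candidateWords = candidate.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (String q : queryWords) {
            int best = Integer.MAX_VALUE;
            for (String c : candidateWords) {
                best = Math.min(best, distance(q, c));
            }
            if (best > maxEdits) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `likelyMatch("eyre afair forde", "The Eyre Affair by Jasper Fforde", 1)` returns true despite the two misspellings, while `likelyMatch("pride prejudice", "The Eyre Affair by Jasper Fforde", 1)` returns false. Matching per word keeps the result independent of word order, which is where whole-string Levenshtein struggles.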
