Lightweight fuzzy search library

Can you suggest a lightweight fuzzy text search library?

I want users to be able to find the right data even when their search terms contain typos.

I could use a full-text search engine like Lucene, but I think that is overkill.

Edit:
To make the question clearer, this is the main scenario for this library:
I have a large list of strings. I want to be able to search this list (something like MSVS IntelliSense), and it should also be possible to filter the list with a string that is not in it but is close enough to some string that is.
Example:

  • Red
  • Green
  • Blue

When I type "Gren" or "Geen" in the text box, I want to see "Green" in the result set.
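
To make the scenario concrete, here is a minimal sketch of the desired behavior using Python's standard `difflib` module. It only illustrates the requirement and is not a recommendation of a specific library; `fuzzy_filter` and the `cutoff` value of 0.6 are assumptions made for this example.

```python
import difflib

def fuzzy_filter(query, items, limit=5, cutoff=0.6):
    """Return the items that are close to the query, ignoring case."""
    # Map lowercase keys back to the original spelling
    # (if two items differ only by case, the last one wins).
    by_key = {item.lower(): item for item in items}
    keys = difflib.get_close_matches(
        query.lower(), list(by_key), n=limit, cutoff=cutoff
    )
    return [by_key[key] for key in keys]

colors = ["Red", "Green", "Blue"]
print(fuzzy_filter("Gren", colors))  # ['Green']
print(fuzzy_filter("Geen", colors))  # ['Green']
```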

The main language for indexed data will be English.

I think that Lucene is too heavy for this task.

Update

I found one product that meets my requirements: ShuffleText.
Do you know any alternatives?

+14
fuzzy-search
03 Sep '08 at 15:49
9 answers

Lucene is very scalable, which means it also works well for small applications. You can create an index in memory very quickly if that is all you need.

For fuzzy searching, you really need to decide which algorithm you want to use. In information retrieval work, I have used an n-gram technique with Lucene successfully. But that is a specialized indexing technique, not a "library" in itself.
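
As a rough illustration of the n-gram idea, here is a minimal sketch in plain Python. It does not use Lucene and is not how Lucene indexes n-grams internally; the padding scheme, the Jaccard measure, the threshold of 0.3, and the names `ngrams`, `similarity`, and `search` are all assumptions made for this example.

```python
def ngrams(text, n=3):
    """Return the set of character n-grams of text, padded so word edges count."""
    padded = " " * (n - 1) + text.lower() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard similarity of the two n-gram sets, between 0.0 and 1.0."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)

def search(query, items, threshold=0.3):
    """Return items scoring at least threshold, best match first."""
    scored = [(similarity(query, item), item) for item in items]
    scored.sort(reverse=True)
    return [item for score, item in scored if score >= threshold]

colors = ["Red", "Green", "Blue"]
print(search("Gren", colors))  # ['Green']
print(search("Geen", colors))  # ['Green']
```

A real index would store the n-grams once per entry instead of recomputing them for every query, which is what makes the technique scale to large lists.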

Without knowing more about your application, it is hard to recommend a suitable library. How much data are you searching? What format is the data in? How often is it updated?

+3
Sep 03 '08 at 16:34

I'm not sure how well suited Lucene is to fuzzy searching; a custom library may be a better choice. For example, this search is implemented in Java and works quite quickly, but it is custom-made for such a task: http://www.softcorporation.com/products/people/

+2
Oct 25

Soundex is very "English" in its encoding - Daitch-Mokotoff works better for many names, especially European (Germanic) and Jewish names. In my UK-centric world, it is what I use.

Wiki is here.
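
For readers who have not seen it, a minimal sketch of classic American Soundex in Python follows. This is the plain Soundex variant, not Daitch-Mokotoff, and the function name `soundex` is chosen here for illustration only.

```python
# Letter groups of classic American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return the 4-character Soundex code of name ("" if it has no a-z letters)."""
    letters = [ch for ch in name.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of equal codes
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat across a vowel is coded again
    return (result + "000")[:4]

print(soundex("Green"))  # G650
print(soundex("Gren"))   # G650
print(soundex("Geen"))   # G500
```

Note that "Gren" gets the same code as "Green" but "Geen" does not, so a phonetic code alone would miss one of the two typos from the question.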

+1
Sep 03 '08 at 16:10

If you can use a database, I recommend PostgreSQL and its fuzzy string matching functions.

If you can use Ruby, I suggest taking a look at the amatch library.

+1
Mar 12 '09 at 4:25

You did not specify your development platform, but if it is PHP, I suggest you look at the Zend Lucene library:

http://ifacethoughts.net/2008/02/07/zend-brings-lucene-to-php/
http://framework.zend.com/manual/en/zend.search.lucene.html

On a LAMP stack it is much lighter than Lucene on Java, and it can easily be extended to other file types, provided you can find a conversion library or command-line converter - there are many OSS solutions for this.

+1
Jan 05 '10 at 3:23

Try Walnutil. It is based on the Lucene API and integrates with SQL Server and Oracle databases. You can create any type of index and then use it. For simple searches you can use the methods provided by walnutilsoft; for more complex cases you can use the Lucene API directly. See the web example, which uses indexes created with the Walnutil tools. There is also sample code in Java and C# that you can use to build other kinds of search. The tools are free. http://www.walnutilsoft.com/

+1
10 Sep '10 at 13:32

@aku - links to working Soundex libraries are at the bottom of the page.

As for Levenshtein distance, the Wikipedia article on it also lists implementations at the bottom.
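
The distance itself is short enough to write by hand. Below is a minimal Python sketch of the standard dynamic-programming version; it is not copied from the Wikipedia article, and the helper `close_matches` with its `max_distance` of 1 is a hypothetical wrapper added to match the scenario in the question.

```python
def levenshtein(a, b):
    """Return the minimum number of insertions, deletions and substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def close_matches(query, items, max_distance=1):
    """Return the items within max_distance edits of the query, ignoring case."""
    query = query.lower()
    return [item for item in items
            if levenshtein(query, item.lower()) <= max_distance]

colors = ["Red", "Green", "Blue"]
print(close_matches("Gren", colors))  # ['Green']
print(close_matches("Geen", colors))  # ['Green']
```

This compares the query against every entry, so it is fine for a few thousand strings but will need an index (n-grams, a BK-tree, or similar) for a really large list.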

0
Sep 03 '08 at 15:58

Sphinx is a powerful, lightweight solution.

It is smaller than Lucene and supports ambiguity.

It is written in C++, it is fast and battle-tested, it has libraries for every environment, and it is used by large companies such as craigslist.org.

0
Mar 10 2018-12-12T00:

Check this link. It uses a Levenshtein distance metric, but is much faster. http://narenonit.blogspot.com/2012/07/fuzzy-matching-autocomplete-library.html

0
Aug 01 '12 at 3:27


