Smart Search Implementation / Fuzzy String Comparison

I have a page in an ASP.NET MVC application where customers search for suppliers. Suppliers enter their own data on the website. The client wants a "smart search" feature that finds suppliers even if the supplier's name is spelled "slightly differently" from what is typed in the search box.

I have no idea what the client means by "slightly different." I have looked into implementing a custom Soundex algorithm. It converts a word into a code based on how the word sounds, and that code is then used for comparison.

For instance:

Zach

Zack

will both encode to the same value. Are there any other options I could look at?
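To make the Soundex idea concrete, here is a minimal sketch of the classic American Soundex rules. It is written in Python only for brevity, the function name `soundex` is illustrative, and it is not the custom implementation mentioned above.

```python
def soundex(word: str) -> str:
    """Classic American Soundex: first letter plus three digits."""
    # Consonant groups that share a digit; vowels, H, W and Y have no digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same digit
        digit = codes.get(ch, "")
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated digit is kept after it
    return (result + "000")[:4]  # pad or truncate to four characters


print(soundex("Zach"), soundex("Zack"))
```

Both names produce the same code, which is why a comparison on the code treats them as a match.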

+6
2 answers

You can use Levenshtein distance in combination with a "tags" field in the "Suppliers" table of your database to get "smart search" style functionality.

This is pretty simple, but it works well for cases like Zach / Zack.

Adding tags to your database lets you handle cases where people search for a supplier by an abbreviation or another common name.

See "How to calculate a measure of similarity between two strings?" and http://www.dotnetperls.com/levenshtein for implementation details.
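As a rough illustration of this approach, the sketch below computes the Levenshtein distance and compares a query against each supplier's name and tags. It is in Python for brevity. The supplier record layout (`name`, `tags`), the `find_suppliers` helper and the `max_distance` threshold are all assumptions made for the example, not part of any real schema.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution or match
        prev = cur
    return prev[-1]


def find_suppliers(query, suppliers, max_distance=1):
    """Return supplier names whose name or any tag is within max_distance edits."""
    q = query.casefold()
    matches = []
    for supplier in suppliers:
        candidates = [supplier["name"], *supplier["tags"]]
        best = min(levenshtein(q, c.casefold()) for c in candidates)
        if best <= max_distance:
            matches.append((best, supplier["name"]))
    return [name for _, name in sorted(matches)]  # closest matches first


# Hypothetical data: tags hold abbreviations and other common names.
suppliers = [
    {"name": "Zach Tools", "tags": ["zach", "zt"]},
    {"name": "International Business Machines", "tags": ["IBM"]},
]
print(find_suppliers("zack", suppliers))
print(find_suppliers("ibm", suppliers))
```

In a real application the threshold would probably need to scale with the query length, since one edit means far more in a three-letter name than in a long one.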

+6

What you need is an indexed search using a phonetic analysis filter.

Lucene.NET offers exactly that.

http://lucene.apache.org/core/4_0_0/analyzers-phonetic/org/apache/lucene/analysis/phonetic/PhoneticFilterFactory.html

How to perform a phonetic and approximative search in Lucene.net
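The idea behind an indexed phonetic search can be sketched without Lucene: each token is run through a phonetic encoder at index time, and the query is encoded the same way at search time. The Python sketch below is only a toy model of that idea. `PhoneticIndex` and `crude_key` are hypothetical names, and `crude_key` is a deliberately simplistic stand-in for a real encoder such as Soundex, Metaphone or Beider-Morse; none of this is Lucene.NET's API.

```python
from collections import defaultdict


def crude_key(token: str) -> str:
    """Toy phonetic key: unify c/ch/ck as k, drop later vowels, collapse repeats."""
    t = token.lower()
    for old, new in (("ck", "k"), ("ch", "k"), ("c", "k")):
        t = t.replace(old, new)
    if not t:
        return ""
    key = t[0]
    for ch in t[1:]:
        if ch in "aeiouy":
            continue  # vowels after the first letter are ignored
        if ch != key[-1]:
            key += ch
    return key


class PhoneticIndex:
    """Inverted index from phonetic key to the ids of suppliers containing it."""

    def __init__(self, encoder=crude_key):
        self.encoder = encoder
        self.postings = defaultdict(set)

    def add(self, supplier_id, name):
        for token in name.split():
            self.postings[self.encoder(token)].add(supplier_id)

    def search(self, query):
        """Return ids of suppliers matching every query token phonetically."""
        result = None
        for token in query.split():
            ids = self.postings.get(self.encoder(token), set())
            result = ids if result is None else result & ids
        return set(result) if result else set()


index = PhoneticIndex()
index.add(1, "Zach Tools")
index.add(2, "Acme Paper")
print(index.search("Zack"))
print(index.search("Akme"))
```

Because the encoder is applied once per token at index time, a lookup is a dictionary access rather than a comparison against every supplier, which is the main advantage over scanning with a string metric.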

A .NET version of Phonetix is available here:
http://sourceforge.net/projects/phonetixnet/

Here is some more information on how to implement it in C#:
lucene.net phonetic filter

You can also use BeiderMorseEncoder, which is designed to handle many languages.

As for finding similarly spelled words, why not use a fuzzy search?

How to do a fuzzy search in Lucene.net in asp.net?
Fuzzy phrase search in Lucene.net

There are also many string metric functions that can be called through a CLR stored procedure: http://anastasiosyal.com/post/2009/01/11/Beyond-SoundEx-Functions-for-Fuzzy-Searching-in-MS-SQL-Server

+7

Source: https://habr.com/ru/post/972811/

