Fuzzy name matching algorithm

I have a database containing the names of blacklisted companies and individuals. Whenever a transaction is created, its details must be screened against these blacklisted names. The names in a transaction may be misspelled; for example, "Wilson" might be entered in several slightly different spellings. The fuzzy search logic or utility should still match the name "Wilson" in the blacklist database and, based on the accuracy percentage required by the user, show each matching name together with its match percentage.

Transactions will be sent in batches or in real time to be checked against the blacklisted names.

I would appreciate it if users who have had a similar requirement and implemented it could share their views and their implementation.

1 answer

T-SQL is weak at fuzzy searching. Your best bet is a third-party library, but if you would rather not deal with one, the best option is the DIFFERENCE function built into SQL Server. For instance:

SELECT * FROM tblUsers U WHERE DIFFERENCE(U.Name, @nameEntered) >= 3 

DIFFERENCE compares the SOUNDEX codes of the two strings and returns an integer from 0 to 4; a higher value indicates a closer match. The disadvantage is that the algorithm favors words that sound alike, which may not be the behavior you want.
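To make "words that sound alike" concrete, here is a minimal Python sketch of the classic American Soundex encoding. It is only an illustration and not SQL Server's implementation: SQL Server's SOUNDEX and DIFFERENCE may treat some letters differently, and the function name soundex is chosen here for the example.

```python
# Letter groups of classic American Soundex; vowels, Y, H and W get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return a 4-character Soundex code (illustrative sketch)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # digit of the previous letter
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal digits
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code              # new consonant sound
        prev = code                     # a vowel resets prev to ""
    return (result + "000")[:4]         # pad or truncate to 4 characters


print(soundex("Dylan"), soundex("Dillon"))
```

Under this sketch "Dylan" and "Dillon" receive the same code, which is why a phonetic comparison treats them as a strong match even though their spellings differ in several letters.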

The following example shows how to get the best match from a table:

DECLARE @users TABLE (Name VARCHAR(255))
INSERT INTO @users VALUES ('Dylan'), ('Bob'), ('Tester'), ('Dude')
SELECT *, MAX(DIFFERENCE(Name, 'Dillon')) AS SCORE
FROM @users
GROUP BY Name
ORDER BY SCORE DESC

It returns:

Name   | Score
Dylan  | 4
Dude   | 3
Bob    | 2
Tester | 0
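DIFFERENCE only yields a score from 0 to 4, whereas the question asks for a user-defined match percentage. One possible alternative, sketched here in Python and run outside the database, is a similarity percentage based on Levenshtein edit distance. This is a swapped-in technique, not something the answer above proposes; the names levenshtein, similarity_percent and screen, the misspelling "Wilsn", and the 80 percent threshold are all assumptions made for illustration.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a                      # keep the inner row the shorter one
    previous = list(range(len(b) + 1))   # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity_percent(a, b):
    """Case-insensitive similarity scaled to the range 0..100."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def screen(name, blacklist, threshold):
    """Return (entry, percent) pairs scoring at least threshold, best first."""
    scored = [(entry, similarity_percent(name, entry)) for entry in blacklist]
    hits = [pair for pair in scored if pair[1] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: screen one transaction name against a small blacklist.
for entry, percent in screen("Wilsn", ["Wilson", "Dylan", "Bob"], 80):
    print(f"{entry}: {percent:.1f}%")
```

Comparing every transaction name against every blacklist entry like this is quadratic work, so for large batches it would probably need a pre-filter (for example by first letter or by phonetic code) before the edit-distance step.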

Source: https://habr.com/ru/post/1485996/

