Search for similar names in multiple tables

I have several tables containing client names, and I am trying to find out how many times the same name appears in them. The challenge is that someone could have entered the name as "John Smith" or as "Smith, John".

There are more than 40 tables, each containing 40,000 rows. I need to query them without knowing the names in advance and still get the names back.

Basically, I need to group the names without using a condition like:

WHERE cust_name LIKE '%john%'

How can you match columns across multiple tables when the data is not stored in a consistent format? What is the best way to clean the data to remove commas, spaces, and so on?

3 answers

SSIS provides fuzzy matching. I have used Fuzzy Grouping successfully to find duplicates, although you will want to match on more than the name, since many people share the same name. I matched on name, address, phone, and email; Fuzzy Grouping lets you use multiple fields for matching.



If the names are in either "FirstName LastName" or "LastName, FirstName" format, you could do something like this:

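SELECT
    CASE
        -- "LastName, FirstName": move the part after the comma to the front
        WHEN name LIKE '%,%'
            THEN LTRIM(SUBSTRING(name, CHARINDEX(',', name) + 1, LEN(name))) + ' ' +
                 RTRIM(SUBSTRING(name, 1, CHARINDEX(',', name) - 1))
        -- already "FirstName LastName"
        ELSE name
    END AS name
FROM your_table  -- placeholder: substitute the actual table name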
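To count how often each name occurs across several tables, the same CASE expression can feed a GROUP BY. The following is only a minimal sketch: `clients_a`, `clients_b`, and the `name` column are placeholder names, and the query assumes the two formats mentioned in the question are the only ones present.

```sql
-- Sketch only: clients_a, clients_b and the name column are placeholder names.
WITH all_names AS (
    SELECT name FROM clients_a
    UNION ALL              -- UNION ALL keeps duplicates so they can be counted
    SELECT name FROM clients_b
),
normalized AS (
    SELECT
        LOWER(
            CASE
                -- "LastName, FirstName" becomes "FirstName LastName"
                WHEN name LIKE '%,%'
                    THEN LTRIM(SUBSTRING(name, CHARINDEX(',', name) + 1, LEN(name))) + ' ' +
                         RTRIM(SUBSTRING(name, 1, CHARINDEX(',', name) - 1))
                ELSE LTRIM(RTRIM(name))
            END
        ) AS clean_name
    FROM all_names
)
SELECT clean_name, COUNT(*) AS occurrences
FROM normalized
GROUP BY clean_name
HAVING COUNT(*) > 1        -- only names that appear more than once
ORDER BY occurrences DESC;
```

With 40 tables, the `all_names` part would need one `SELECT` per table. Loading the normalized names into a staging table first may be easier to maintain than one long `UNION ALL`.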


You could also look at SOUNDEX to catch names that sound alike but are spelled differently.
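As a rough illustration of the soundex idea, T-SQL's built-in `SOUNDEX` and `DIFFERENCE` functions can pair up names that sound alike. This sketch assumes the names have already been split into a hypothetical table `clean_names` with `id`, `first_name`, and `last_name` columns. The split matters because a SOUNDEX code is only four characters long and mostly reflects the start of a string.

```sql
-- Sketch only: clean_names, id, first_name and last_name are hypothetical.
SELECT a.id AS id_1,
       b.id AS id_2,
       a.first_name + ' ' + a.last_name AS name_1,
       b.first_name + ' ' + b.last_name AS name_2
FROM clean_names AS a
JOIN clean_names AS b
    ON a.id < b.id                                   -- report each pair once
   AND SOUNDEX(a.last_name) = SOUNDEX(b.last_name)   -- last names share a code
   AND DIFFERENCE(a.first_name, b.first_name) >= 3;  -- scale: 0 (weak) to 4 (strong)
```

A self-join like this compares every row with every other row, so on tables of this size it is probably worth restricting it to candidates that share some other attribute first. Phonetic matches are best treated as suggestions to review rather than confirmed duplicates.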


Source: https://habr.com/ru/post/1752288/

