Besides the SOUNDEX() / DIFFERENCE() option (which is a very good one, by the way), you could look into SSIS more.
If your data is in English, and is not limited to people's names, you can do a lot with these components:
Term Extraction
Term Lookup
Fuzzy Grouping
Fuzzy Lookup
The main flow would be a multi-stage structure in which you try to find duplicates using increasingly specific methods. Instead of applying the changes automatically, you send all the names and keys needed to apply them to a staging area, where they can be reviewed and, if appropriate, applied.
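To make the staged approach concrete, here is a minimal sketch in plain Python rather than SSIS. It assumes rows arrive as (key, name) pairs, uses the standard library's difflib as a stand-in for a fuzzy matching component, and the 0.85 threshold and the function names are purely illustrative. Nothing is merged here: the function only returns proposals for a reviewer.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Case-fold and collapse whitespace so trivial variants compare equal.
    return " ".join(name.lower().split())


def stage_candidates(rows, threshold=0.85):
    """Return proposed merges as (stage, keep_key, duplicate_key, score).

    rows is an iterable of (key, name) pairs. The result is meant to be
    written to a staging table for review, not applied directly.
    """
    staged = []
    survivors = {}  # normalized name -> key of the row we keep
    for key, name in rows:
        norm = normalize(name)
        # Stage 1: cheap exact match on the normalized name.
        if norm in survivors:
            staged.append(("exact", survivors[norm], key, 1.0))
            continue
        # Stage 2: fuzzy match against every surviving name.
        best_key, best_score = None, 0.0
        for other_norm, other_key in survivors.items():
            score = SequenceMatcher(None, norm, other_norm).ratio()
            if score > best_score:
                best_key, best_score = other_key, score
        if best_score >= threshold:
            staged.append(("fuzzy", best_key, key, round(best_score, 2)))
        else:
            survivors[norm] = key
    return staged


rows = [(1, "John Smith"), (2, "john  smith"), (3, "Jon Smith"), (4, "Mary Jones")]
for proposal in stage_candidates(rows):
    print(proposal)
```

Each proposal carries the stage that produced it, so a reviewer can approve the exact matches in bulk and look more closely at the fuzzy ones.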
If you want to get really clever, you can use the reviewed data as a store to make the package "learn". For example, "iu" is hardly ever valid in English, so if it is detected and changing it to "ui" produces a valid English word, you might want to start applying that change automatically at some point.
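One way to picture the "learning" step, again as a hedged Python sketch and not anything built into SSIS: keep a tally of how reviewers ruled on each substitution, and only mark a substitution as safe to auto-apply once it has been approved often enough. The class name, the word list, and the min_reviews and min_approval thresholds are all assumptions made up for this example.

```python
from collections import defaultdict


class CorrectionRules:
    """Tracks reviewer decisions per substitution rule, e.g. 'iu' -> 'ui'."""

    def __init__(self, dictionary, min_reviews=3, min_approval=0.9):
        self.dictionary = set(dictionary)  # known valid words (hypothetical list)
        self.min_reviews = min_reviews
        self.min_approval = min_approval
        self.history = defaultdict(lambda: [0, 0])  # rule -> [approved, total]

    def record_review(self, wrong, right, approved):
        # Store the outcome of one manual review from the staging area.
        stats = self.history[(wrong, right)]
        stats[0] += int(approved)
        stats[1] += 1

    def trusted(self, wrong, right):
        # A rule is trusted once it has enough reviews and a high approval rate.
        approved, total = self.history.get((wrong, right), (0, 0))
        if total == 0 or total < self.min_reviews:
            return False
        return approved / total >= self.min_approval

    def suggest(self, word):
        """Return (corrected_word, auto_apply), or None if no rule helps."""
        for wrong, right in list(self.history):
            if wrong in word:
                candidate = word.replace(wrong, right)
                # Only propose the change if it yields a known valid word.
                if candidate in self.dictionary:
                    return candidate, self.trusted(wrong, right)
        return None


rules = CorrectionRules({"build", "guild"})
rules.record_review("iu", "ui", approved=True)
print(rules.suggest("biuld"))  # proposed, but still needs review
```

Until a rule is trusted, its suggestions would keep going to the staging area; after that, the package could apply them directly.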
Another thing to keep in mind is to maintain a list of all confirmed names, and to use it both to check for duplicates of those names and to avoid unnecessary repeat checks and loading when processing the source data.
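As a rough illustration of the confirmed-names idea, the Python sketch below splits incoming names into those already confirmed and those that still need the duplicate checks. The function name and the normalization used (lower-casing and collapsing whitespace) are assumptions for the example; in practice the confirmed list would live in a table.

```python
def split_by_confirmed(source_names, confirmed):
    """Return (known, to_check): names already confirmed vs. names still to examine."""

    def key(name):
        # Same light normalization on both sides so trivial variants match.
        return " ".join(name.lower().split())

    confirmed_keys = {key(name) for name in confirmed}
    known, to_check = [], []
    for name in source_names:
        if key(name) in confirmed_keys:
            known.append(name)
        else:
            to_check.append(name)
    return known, to_check


known, to_check = split_by_confirmed(
    ["John Smith", "JOHN  SMITH", "Jon Smyth"],
    confirmed=["John Smith"],
)
print(known)     # already confirmed, skip the expensive checks
print(to_check)  # send these through the duplicate detection stages
```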