Get records with similar-sounding names

I want to find all duplicate names in the contacts table, where two names count as duplicates if they sound alike. For example: Rita or Reeta, Microsoft or Microsift, Mukherjee or Mukherji.

I used the query below:

SELECT contacts.id 
FROM contacts 
INNER JOIN (
    SELECT first_name, last_name, count(*) AS rows 
    FROM contacts 
    WHERE deleted = 0 
    GROUP BY SOUNDEX(first_name), SOUNDEX(last_name) 
    HAVING count(rows) > 1
) AS p 
WHERE contacts.deleted = 0 
AND p.first_name SOUNDS LIKE contacts.first_name 
AND p.last_name SOUNDS LIKE contacts.last_name 
ORDER BY contacts.date_entered DESC

The above query gives the correct results, but takes a long time when there are many records.

+4
2 answers

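The slow part is SOUNDEX(). The codes are computed for every row at query time, so MySQL cannot use an ordinary index on the name columns for the comparison. Note that in MySQL, SOUNDS LIKE is only shorthand, so each condition in your WHERE clause amounts to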

[...] AND SOUNDEX(p.first_name) = SOUNDEX(contacts.first_name) [...]
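
A quick way to confirm that SOUNDS LIKE and the SOUNDEX() = SOUNDEX() spelling agree, using the names from the question as sample inputs (a sketch for any MySQL client; the column aliases are made up for illustration):

```sql
-- Each pair should produce identical SOUNDEX codes,
-- and same_sound should be 1 for all three rows.
SELECT SOUNDEX('Rita')      AS code_a,
       SOUNDEX('Reeta')     AS code_b,
       'Rita' SOUNDS LIKE 'Reeta' AS same_sound
UNION ALL
SELECT SOUNDEX('Microsoft'),
       SOUNDEX('Microsift'),
       'Microsoft' SOUNDS LIKE 'Microsift'
UNION ALL
SELECT SOUNDEX('Mukherjee'),
       SOUNDEX('Mukherji'),
       'Mukherjee' SOUNDS LIKE 'Mukherji';
```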

Either way, the comparison runs on computed values (no index!), so it gets slow on a large table!

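Also, the derived table does not look necessary. It groups the rows and keeps only groups with HAVING COUNT(*) > 1, and then you join back to contacts to find the members of those groups. Why not just join the table to itself!?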

I would try something like this:

SELECT c1.id as contact_id, 
       c2.id as similar_id
  FROM contacts c1 
  JOIN contacts c2
    ON c2.id <> c1.id
   AND c2.deleted = 0
   AND SOUNDEX(c2.first_name) = SOUNDEX(c1.first_name)
   AND SOUNDEX(c2.last_name) = SOUNDEX(c1.last_name)
 WHERE c1.deleted = 0 
ORDER BY c1.date_entered DESC

Better still, store the SOUNDEX values in their own columns and compare those:

SELECT c1.id as contact_id, 
       c2.id as similar_id
  FROM contacts c1 
  JOIN contacts c2
    ON c2.id <> c1.id
   AND c2.deleted = 0
   AND c2.first_name_soundex = c1.first_name_soundex
   AND c2.last_name_soundex = c1.last_name_soundex
 WHERE c1.deleted = 0 
ORDER BY c1.date_entered DESC

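Here first_name_soundex holds SOUNDEX(first_name), and likewise for last_name_soundex. Add an index on deleted, first_name_soundex, last_name_soundex. (AFAIK MySQL has no partial indexes, so you cannot index only the rows where deleted = 0.)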
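
One possible way to set this up, as a sketch only: the column types, the index name idx_contacts_soundex, and the one-off backfill are assumptions and not part of the original schema. New and changed rows would also need their soundex columns kept current, for example in application code.

```sql
-- Add columns that store the precomputed SOUNDEX codes (types are assumed).
ALTER TABLE contacts
  ADD COLUMN first_name_soundex VARCHAR(255) NULL,
  ADD COLUMN last_name_soundex  VARCHAR(255) NULL;

-- Backfill the existing rows once.
UPDATE contacts
   SET first_name_soundex = SOUNDEX(first_name),
       last_name_soundex  = SOUNDEX(last_name);

-- Composite index covering the filter and join columns.
CREATE INDEX idx_contacts_soundex
    ON contacts (deleted, first_name_soundex, last_name_soundex);
```

With the index in place, running EXPLAIN on the self-join should show whether MySQL actually picks it up for c2.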

0

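SOUNDEX is (IMHO) not a very good algorithm. Consider two spellings of a name that are pronounced the same but get different codes: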

SELECT SOUNDEX('cholmondley');
+------------------------+
| SOUNDEX('cholmondley') |
+------------------------+
| C4534                  |
+------------------------+

SELECT SOUNDEX('chumleigh');
+----------------------+
| SOUNDEX('chumleigh') |
+----------------------+
| C542                 |
+----------------------+
0

Source: https://habr.com/ru/post/1535905/

