How to find similar values in one column using PostgreSQL


I'm a complete newbie to SQL, so I am not very familiar with its functionality.
Here is my problem.
I have the following table of more than 100,000 companies (let me call it "comp"):

 id | title                | name
----+----------------------+---------------------
  1 | XYZ                  | xyz
  2 | Smarts               | smarts
  3 | XYZ LTD              | xyzltd
  4 | Outsmarts            | outsmarts
  5 | XYZ Entertainment    | xyzentertainment
  6 | Smarts Entertainment | smartsentertainment

where "title" is the name of the company and "name" is the same name, but lowercase and without spaces. Is there a way to find all companies with similar names (using either "title" or "name")? Basically, I want to get:

 id | title                | name
----+----------------------+---------------------
  1 | XYZ                  | xyz
  3 | XYZ LTD              | xyzltd
  5 | XYZ Entertainment    | xyzentertainment
  2 | Smarts               | smarts
  6 | Smarts Entertainment | smartsentertainment

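In other words:
1) "XYZ", "XYZ LTD" and "XYZ Entertainment" are similar;
2) "Smarts" and "Smarts Entertainment" are similar;
but "XYZ Entertainment" is not similar to "Smarts Entertainment", and "Smarts" is not similar to "Outsmarts".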
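Read literally, these examples describe prefix matching rather than fuzzy matching: `xyz` is the beginning of `xyzltd` and `xyzentertainment`, and `smarts` is the beginning of `smartsentertainment`, whereas `smarts` only occurs inside `outsmarts`. If that reading holds (an assumption drawn from the sample rows, not something the question states), the groups can be found without comparing every pair, because once the names are sorted, all names that start with a given name sit in one run directly after it. A minimal Python sketch, using a hypothetical `prefix_groups` helper:

```python
def prefix_groups(names):
    """Group names under the shortest listed name they start with.

    Hypothetical helper: assumes "similar" means "begins with another name".
    """
    ordered = sorted(set(names))
    groups = {}
    i = 0
    while i < len(ordered):
        root = ordered[i]
        j = i + 1
        # In sorted order, every name starting with `root` follows it in one run.
        while j < len(ordered) and ordered[j].startswith(root):
            j += 1
        if j > i + 1:
            groups[root] = ordered[i + 1:j]
        # Skip the run so longer names are not reported as roots of their own.
        i = j
    return groups


if __name__ == "__main__":
    names = ["xyz", "smarts", "xyzltd", "outsmarts",
             "xyzentertainment", "smartsentertainment"]
    for root, members in prefix_groups(names).items():
        print(root, "->", ", ".join(members))
```

On the six sample names this yields `{"smarts": ["smartsentertainment"], "xyz": ["xyzentertainment", "xyzltd"]}` and leaves `outsmarts` out, which matches the desired output. A rough SQL counterpart of the same condition is `left(c2.name, length(c1.name)) = c1.name`.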

I tried the following, but it did not work:

SELECT set_limit(0.8);

SELECT
  similarity(c1.name, c2.name) AS sim,
  c1.name,
  c2.name
FROM comp AS c1
  JOIN comp AS c2
    ON c1.name != c2.name
       AND c1.name % c2.name
ORDER BY sim DESC;

By "did not work" I mean … 7 … Is there a better way to do this?
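Two things probably explain why the query above did not work. The first is the threshold: `%` only matches pairs whose trigram similarity exceeds the limit, and 0.8 is very strict. The Python sketch below approximates how pg_trgm computes `similarity()` so the sample pairs can be checked by hand. It is not pg_trgm's own code. It assumes ASCII letters and digits and follows the documented rule that each lowercased word is padded with two spaces in front and one behind before its distinct trigrams are collected.

```python
import re


def trigrams(text: str) -> set:
    """Trigram set in the style of pg_trgm's show_trgm().

    Assumptions: only ASCII letters/digits form words; each word is
    lowercased and padded with two leading spaces and one trailing space.
    """
    result = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for k in range(len(padded) - 2):
            result.add(padded[k:k + 3])
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta and not tb:
        return 0.0  # avoid division by zero when neither string has letters/digits
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    pairs = [("xyz", "xyzltd"), ("xyz", "xyzentertainment"),
             ("smarts", "smartsentertainment"), ("smarts", "outsmarts")]
    for a, b in pairs:
        print(f"{a:>8} ~ {b:<20} {similarity(a, b):.3f}")
```

Under these assumptions `xyz`/`xyzltd` scores 0.375, `xyz`/`xyzentertainment` about 0.176 and `smarts`/`smartsentertainment` 0.3, so none of the wanted pairs gets near 0.8, while the unwanted `smarts`/`outsmarts` scores about 0.417, higher than a wanted pair. The second thing is cost: the self-join compares each row with every other row, about 100,000 × 99,999 ≈ 10^10 pairs, unless an index that supports `%` (for example `CREATE INDEX ON comp USING gin (name gin_trgm_ops);`) lets PostgreSQL narrow down the candidates. In PostgreSQL 9.6 and later, `set_limit()` is deprecated in favor of `SET pg_trgm.similarity_threshold = ...`.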


You could try the Levenshtein distance instead, for example:

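CREATE EXTENSION IF NOT EXISTS fuzzystrmatch;  -- provides levenshtein()
SELECT levenshtein(c1.name, c2.name) AS dist, c1.name, c2.name
FROM comp AS c1 JOIN comp AS c2 ON c1.id < c2.id  -- each pair only once
ORDER BY dist ASC;  -- smaller distance = more similar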
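Keep in mind that `levenshtein()` comes from the fuzzystrmatch extension (its inputs are limited to 255 characters) and that this query is still a full self-join over every pair of rows. The Python sketch below is a standard dynamic-programming edit distance, not fuzzystrmatch's source, included to check the sample names by hand. With unit costs for insertion, deletion and substitution it should agree with the two-argument `levenshtein(a, b)`.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between a[:i-1] and b[:j]
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    pairs = [("xyz", "xyzltd"), ("xyz", "xyzentertainment"),
             ("smarts", "smartsentertainment"), ("smarts", "outsmarts")]
    for a, b in pairs:
        print(f"{a:>8} -> {b:<20} {levenshtein(a, b)}")
```

For the sample data, `xyz`/`xyzltd` and `smarts`/`outsmarts` both come out at distance 3, while `xyz`/`xyzentertainment` and `smarts`/`smartsentertainment` are both 13. On its own, a raw distance therefore ranks the unwanted `outsmarts` pair ahead of the wanted "Entertainment" pairs.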

Source: https://habr.com/ru/post/1661673/