SQL - comparing rows from two tables (fuzzy match ... sorta)

I searched through the existing questions and found similar ones, but no solution that I think I can use. This looks like a fuzzy match ... sorta. I need help comparing two tables. One is the company reference table, and the other is a table that receives raw company data daily. The reference table is clean and has a company id associated with each individual company. The daily data that is imported does not have a company id. What I'm trying to do is have the daily data look up the company reference table by company name and update the company table's company_state column based on company_name. Unfortunately, the company_name string in the daily data is not the same from day to day. There may be various characters (a-z, 0-9, +, -, .) and spaces before or after the actual company name, of varying length each day, so I don't believe I can use charindex to clean it up.

Company Reference Table

company_id  company_name  company_state
1           Awesome Inc   NY
2           Excel-guru    AL
3           Clean All     MI 

Company table

company_name              company_state
abc123 Awesome   Inc      NULL
Excel gur xyz-987         NULL
Clean All Cleanall        NULL

This is the result I want. Sort of like a fuzzy match.

Company table

company_name              company_state
abc123 Awesome   Inc      NY
Excel gur xyz-987         AL
Clean All Cleanall        MI

Any help is greatly appreciated. Thanks.

+4
3 answers

Try the query below to update the company table:

update company c INNER JOIN company_ref cr
ON c.company_name LIKE concat('%', cr.company_name, '%') 
SET c.company_state = cr.company_state;

Alternatively, just SELECT the matching rows:

SELECT c.*, cr.* FROM company c INNER JOIN company_ref cr
ON c.company_name LIKE concat('%', cr.company_name, '%');

SQL Fiddle: http://sqlfiddle.com/#!2/ec76f/1
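
To see how far the LIKE join gets on the sample rows from the question, here is a small sketch. It assumes SQLite through Python's sqlite3 module rather than MySQL, so `concat()` is written with `||`; the table and column names are the ones used in the queries above. On this data only the "Clean All" row matches: the first row has extra spaces inside the name, and the second is misspelled.

```python
import sqlite3

# In-memory database holding the question's sample rows (SQLite is an assumption;
# the answer above targets MySQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company_ref (company_id INTEGER, company_name TEXT, company_state TEXT);
    CREATE TABLE company (company_name TEXT, company_state TEXT);
    INSERT INTO company_ref VALUES
        (1, 'Awesome Inc', 'NY'), (2, 'Excel-guru', 'AL'), (3, 'Clean All', 'MI');
    INSERT INTO company VALUES
        ('abc123 Awesome   Inc', NULL),
        ('Excel gur xyz-987', NULL),
        ('Clean All Cleanall', NULL);
""")

# Same join condition as the answer, with || instead of MySQL's concat().
matched = conn.execute("""
    SELECT c.company_name, cr.company_state
    FROM company c
    INNER JOIN company_ref cr
      ON c.company_name LIKE '%' || cr.company_name || '%'
""").fetchall()

print(matched)  # [('Clean All Cleanall', 'MI')]
```

So a plain substring join only helps when the clean name appears verbatim inside the raw string.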

+1

, company_name , , . , , , A B. , MySQL , ( ):

select c.company_name, r.company_state
from company_table c, reference_table r
where locate(r.company_name, c.company_name) != 0

In MySQL, locate(A, B) returns 0 if A does not occur in B.

0

, , . , , .

, , - -, , . :

  • (, "Awesome Inc" → "Awesome Inc" )
  • -
  • , ?

- :

  • ( ), , ( ). , , .
  • , (, " " " " ) - . . , > 1 , .
  • , , " " , ().

, , , , , , .

You can also store all previous matches for each company, which means that your system can improve over time. How much this helps depends on how much the data changes from day to day.
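
A minimal sketch of that idea, in Python rather than SQL. Everything here is an assumption made for illustration: the `normalize` helper, the in-memory `history` dict (in practice this would more likely be a table), and the use of `difflib.get_close_matches` as the fuzzy fallback with a cutoff of 0.6.

```python
import difflib
import re

# Reference data from the question, keyed by normalized name -> company_id.
REFERENCE = {"awesome inc": 1, "excel guru": 2, "clean all": 3}

def normalize(name):
    """Lowercase, replace runs of non-alphanumerics with one space, trim."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).strip()

# Hypothetical history store: normalized raw name -> company_id.
history = {}

def resolve(raw_name, cutoff=0.6):
    """Return a company_id for raw_name, or None if nothing is close enough."""
    key = normalize(raw_name)
    if key in history:
        # Seen before: reuse the stored match instead of matching again.
        return history[key]
    close = difflib.get_close_matches(key, list(REFERENCE), n=1, cutoff=cutoff)
    if not close:
        return None
    history[key] = REFERENCE[close[0]]  # remember the match for next time
    return history[key]

first = resolve("abc123 Awesome   Inc")   # matched by the fuzzy fallback
again = resolve("  ABC123 awesome inc ")  # same normalized key, served from history
```

Here every fuzzy hit is stored automatically; storing only matches that someone has confirmed would keep a wrong guess from being reused.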

0

Source: https://habr.com/ru/post/1530364/

