Addresses stored in SQL Server have many small variations (errors)

I have a table in my database that stores packing lists and their information. I am trying to query this table and get each unique address. I have come close, but I still get a lot of near-duplicates, and I'm looking for a way to exclude them from my select.

Sample data

CompanyCode   CompanyName                     Addr1                City         State   Zip
10033         UNITED DIE  CUTTING & FINISHIN  3610 HAMILTON AVE    CLEVELAND    Ohio    44114
10033         UNITED DIE CUTTING & FINISHING  3610 HAMILTON AVE    CLEVELAND    Ohio    44114
10033         UNITED DIE CUTTING & FINISHING  3610 HAMILTON AVE.   CLEVELAND    Ohio    44114
10033         UNITED DIE CUTTING & FINISHING  3610 HAMILTON AVENUE CLEVELAND    Ohio    44114
10033         UNITED DIECUTTING & FINISHING   3610 HAMILTON AVE    CLEVELAND    Ohio    44144
10033         UNITED FINISHING                3610 HAMILTON AVE    CLEVLAND     Ohio    44114
10033         UNITED FINISHING & DIE CUTTING  3610 HAMILTON AVE    CLEVELAND    Ohio    44114

All I want is one record. Is there a way to get an “average” record? That is, if most records say CLEVELAND rather than CLEVLAND, I want my one record to say CLEVELAND. Is there any way to compare the rows and get the result I'm looking for?

Desired output

 CompanyCode   CompanyName                     Addr1                City         State   Zip
 10033         UNITED DIE CUTTING & FINISHING  3610 HAMILTON AVE    CLEVELAND    Ohio    44114
+1
3 answers

Try this select:

 select CompanyCode,
    (select top 1 CompanyName from Table1 where CompanyCode=X.CompanyCode 
     group by CompanyName order by count(*) desc) CompanyName,
    (select top 1 Addr1 from Table1 where CompanyCode=X.CompanyCode 
     group by Addr1 order by count(*) desc) Addr1,
    (select top 1 City from Table1 where CompanyCode=X.CompanyCode 
     group by City order by count(*) desc) City,
    (select top 1 State from Table1 where CompanyCode=X.CompanyCode 
     group by State order by count(*) desc) State,
    (select top 1 Zip from Table1 where CompanyCode=X.CompanyCode 
     group by Zip order by count(*) desc) Zip
from    Table1 X
group by CompanyCode
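
To make it easier to see what the query above computes, here is a minimal Python sketch of the same idea run outside the database: for each CompanyCode, take the most frequent value of every column independently. The rows are copied from the question; the function name most_common_record is hypothetical, and ties are broken by whichever value is seen first, which is an assumption rather than something the SQL guarantees.

```python
from collections import Counter

# Sample rows from the question:
# (CompanyCode, CompanyName, Addr1, City, State, Zip)
ROWS = [
    ("10033", "UNITED DIE  CUTTING & FINISHIN", "3610 HAMILTON AVE", "CLEVELAND", "Ohio", "44114"),
    ("10033", "UNITED DIE CUTTING & FINISHING", "3610 HAMILTON AVE", "CLEVELAND", "Ohio", "44114"),
    ("10033", "UNITED DIE CUTTING & FINISHING", "3610 HAMILTON AVE.", "CLEVELAND", "Ohio", "44114"),
    ("10033", "UNITED DIE CUTTING & FINISHING", "3610 HAMILTON AVENUE", "CLEVELAND", "Ohio", "44114"),
    ("10033", "UNITED DIECUTTING & FINISHING", "3610 HAMILTON AVE", "CLEVELAND", "Ohio", "44144"),
    ("10033", "UNITED FINISHING", "3610 HAMILTON AVE", "CLEVLAND", "Ohio", "44114"),
    ("10033", "UNITED FINISHING & DIE CUTTING", "3610 HAMILTON AVE", "CLEVELAND", "Ohio", "44114"),
]


def most_common_record(rows):
    """Return one record per CompanyCode, using the most frequent value of each column."""
    by_code = {}
    for row in rows:
        by_code.setdefault(row[0], []).append(row)

    result = []
    for code, group in by_code.items():
        record = [code]
        for col in range(1, len(group[0])):
            # Plays the role of "select top 1 ... group by col order by count(*) desc".
            value, _count = Counter(r[col] for r in group).most_common(1)[0]
            record.append(value)
        result.append(tuple(record))
    return result
```

Note that each column is chosen on its own, so the combined record may not match any single stored row exactly.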
0

[Answer text not recoverable. Surviving fragments: a three-item list, one item of which mentions "Ave" and "."; and a closing sentence that ends in SELECT DISTINCT...]
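
One possible reading of the "Ave" and "." fragments together with SELECT DISTINCT is "normalize first, then deduplicate". The following is a minimal Python sketch of that reading, done outside the database. It is an assumption, not a reconstruction of the answer: the SUFFIXES map and both function names are hypothetical, and the rules cover only the variants visible in the question's sample data.

```python
import re

# Hypothetical abbreviation map; extend it to match your own data.
SUFFIXES = {"AVENUE": "AVE", "STREET": "ST"}


def normalize(text):
    """Uppercase, drop periods, collapse runs of whitespace, abbreviate known words."""
    words = re.sub(r"\.", "", text.upper()).split()
    return " ".join(SUFFIXES.get(word, word) for word in words)


def distinct_normalized(values):
    """Return the distinct normalized values in first-seen order."""
    return list(dict.fromkeys(normalize(v) for v in values))
```

With these rules, "3610 HAMILTON AVE", "3610 HAMILTON AVE." and "3610 HAMILTON AVENUE" collapse to a single value. Misspellings such as CLEVLAND are not handled and would still need a reference list or a frequency-based choice.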

+2

[Opening sentence not recoverable; the only surviving fragment is a parenthetical mention of SQL.]

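-- Pick one CompanyName per CompanyCode (here the alphabetically first one),
-- then return the rows that carry that name.
select C1.*
from Company C1
join (select CompanyCode, min(CompanyName) as CompanyNameSelected
      from Company
      group by CompanyCode) C2
  on C1.CompanyCode = C2.CompanyCode
 and C1.CompanyName = C2.CompanyNameSelected;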

You can use any aggregate function in place of min (as long as it returns a CompanyName, of course), or even write your own stored function. The one thing you must do is express, in the query language, why record #1 is better than record #2.
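
The point about stating why one record beats another can be sketched in Python as a scoring function passed to a small selector. This is only an illustration under assumptions: pick_best and longest_name are hypothetical names, and "prefer the longest CompanyName" is an arbitrary example rule, not one proposed in the thread.

```python
def pick_best(rows, score):
    """Keep, for each CompanyCode, the row with the highest score."""
    best = {}
    for row in rows:
        code = row["CompanyCode"]
        # Replace the current pick only when the new row scores strictly higher.
        if code not in best or score(row) > score(best[code]):
            best[code] = row
    return list(best.values())


def longest_name(row):
    # Hypothetical rule: a longer CompanyName is less likely to be truncated.
    return len(row["CompanyName"])
```

Swapping in a different score function changes which record wins without touching the selection logic, which mirrors swapping min for another aggregate in the query.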

+1

Source: https://habr.com/ru/post/1791082/

