What's the SQL query to list all rows that are duplicates of each other on two columns?

OK. I have a table with redundant data, and I'm trying to identify all rows that have duplicate sub-rows (for lack of a better word). By sub-rows, I mean considering COL1 and COL2 only.

So let's say I have something like this:

    COL1 COL2 COL3
    ---------------------
    aa   111  blah_x
    aa   111  blah_j
    aa   112  blah_m
    ab   111  blah_s
    bb   112  blah_d
    bb   112  blah_d
    cc   112  blah_w
    cc   113  blah_p

I need a SQL query that returns this:

    COL1 COL2 COL3
    ---------------------
    aa   111  blah_x
    aa   111  blah_j
    bb   112  blah_d
    bb   112  blah_d
+4
10 answers

Does this work for you?

    select t.*
    from table t
    left join (
        select col1, col2, count(*) as count
        from table
        group by col1, col2
    ) c on t.col1 = c.col1 and t.col2 = c.col2
    where c.count > 1
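For anyone who wants to try this without a database server, here is a minimal sketch in Python using the standard-library sqlite3 module and the sample data from the question. The table name `t1` and the alias `cnt` are stand-ins I picked (`table` and `count` are keywords in many dialects), and I used a plain inner join, on the assumption that the `where` clause throws away unmatched rows anyway.

```python
import sqlite3

# Sample rows from the question; the table name t1 is a placeholder.
ROWS = [
    ("aa", 111, "blah_x"), ("aa", 111, "blah_j"), ("aa", 112, "blah_m"),
    ("ab", 111, "blah_s"), ("bb", 112, "blah_d"), ("bb", 112, "blah_d"),
    ("cc", 112, "blah_w"), ("cc", 113, "blah_p"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", ROWS)

# Count rows per (col1, col2), then keep rows whose pair occurs more than once.
QUERY = """
SELECT t.col1, t.col2, t.col3
FROM t1 t
JOIN (
    SELECT col1, col2, COUNT(*) AS cnt
    FROM t1
    GROUP BY col1, col2
) c ON t.col1 = c.col1 AND t.col2 = c.col2
WHERE c.cnt > 1
ORDER BY t.col1, t.col2, t.col3
"""
duplicates = conn.execute(QUERY).fetchall()
for row in duplicates:
    print(row)
```

On the question's data this prints the two aa/111 rows and both bb/112 rows, including the pair that is identical in all three columns.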
+8

With the data you have listed, your query is not possible. Rows 5 and 6 are identical to each other.

Assuming that your table is named "quux", if you start with something like this:

    SELECT a.COL1, a.COL2, a.COL3
    FROM quux a, quux b
    WHERE a.COL1 = b.COL1
      AND a.COL2 = b.COL2
      AND a.COL3 <> b.COL3
    ORDER BY a.COL1, a.COL2

You'll end up with this answer:

    COL1 COL2 COL3
    ---------------------
    aa   111  blah_x
    aa   111  blah_j

That's because rows 5 and 6 have the same value for COL3. Any query that returns both rows 5 and 6 will also return duplicates of ALL of the rows in this dataset.

On the other hand, if you have a primary key (ID), you can use this query instead:

    SELECT a.COL1, a.COL2, a.COL3
    FROM quux a, quux b
    WHERE a.COL1 = b.COL1
      AND a.COL2 = b.COL2
      AND a.ID <> b.ID
    ORDER BY a.COL1, a.COL2

[Edited to simplify the WHERE clause]

And you will get the desired results:

    COL1 COL2 COL3
    ---------------------
    aa   111  blah_x
    aa   111  blah_j
    bb   112  blah_d
    bb   112  blah_d

I just tested this on SQL Server 2000, but you should see the same results in any modern SQL database.
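One caveat, sketched below in Python with the standard-library sqlite3 module: the `a.ID <> b.ID` join returns a row once for every other row it matches, so a group of three or more rows comes back with repeats. Adding DISTINCT (and selecting the ID so that genuinely identical rows stay separate) is one possible way around that. The third aa/111 row is a hypothetical addition to the question's data, and the `id` column is assumed, as in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quux (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER, col3 TEXT)"
)
conn.executemany(
    "INSERT INTO quux (col1, col2, col3) VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"), ("aa", 111, "blah_j"),
        ("aa", 111, "blah_q"),  # hypothetical extra row, not in the question
        ("bb", 112, "blah_d"), ("bb", 112, "blah_d"),
        ("cc", 112, "blah_w"),
    ],
)

# Same self-join as above, with an optional DISTINCT slotted in.
SELF_JOIN = """
SELECT {distinct} a.id, a.col1, a.col2, a.col3
FROM quux a, quux b
WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.id <> b.id
ORDER BY a.id
"""
plain = conn.execute(SELF_JOIN.format(distinct="")).fetchall()
deduped = conn.execute(SELF_JOIN.format(distinct="DISTINCT")).fetchall()

print(len(plain))    # each aa/111 row matches two others, so it shows up twice
print(len(deduped))  # one result per underlying row
```

With only two rows per group, as in the question's data, the plain join already returns each row once, which is why the results above look clean.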

blorgbeard proved me wrong - good for him!

+5

Join the table to itself like this:

    SELECT a.col3, b.col3, a.col1, a.col2
    FROM tablename a, tablename b
    WHERE a.col1 = b.col1
      AND a.col2 = b.col2
      AND a.col3 != b.col3

If you're using PostgreSQL, you can use the oid to make it return fewer duplicated results, like this:

    SELECT a.col3, b.col3, a.col1, a.col2
    FROM tablename a, tablename b
    WHERE a.col1 = b.col1
      AND a.col2 = b.col2
      AND a.col3 != b.col3
      AND a.oid < b.oid
+4

I don't have a database handy to test this, but I think it should work...

    select *
    from theTable
    where col1||col2 in (
        select col1||col2
        from theTable
        group by col1||col2
        having count(*) > 1
    )
+2

My naive attempt was

    select a.*, b.*
    from table a, table b
    where a.col1 = b.col1
      and a.col2 = b.col2
      and a.col3 != b.col3;

but this would return all the rows twice. I'm not sure how you'd restrict it to returning them just once. Perhaps if there were a primary key, you could add "and a.pkey < b.pkey".

Like I said, that's not elegant, and there's probably a better way to do this.
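The "a.pkey < b.pkey" idea can be sketched as follows, in Python with the standard-library sqlite3 module. SQLite's implicit `rowid` stands in for the primary key here, and `t1` is a placeholder table name; both are assumptions on my part. I dropped the col3 comparison in this version, since the key comparison already stops a row from pairing with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"), ("aa", 111, "blah_j"), ("aa", 112, "blah_m"),
        ("ab", 111, "blah_s"), ("bb", 112, "blah_d"), ("bb", 112, "blah_d"),
        ("cc", 112, "blah_w"), ("cc", 113, "blah_p"),
    ],
)

# a.rowid < b.rowid keeps exactly one ordering of each matching pair,
# and also rules out a row pairing with itself.
PAIRS = """
SELECT a.col1, a.col2, a.col3, b.col3
FROM t1 a
JOIN t1 b
  ON a.col1 = b.col1 AND a.col2 = b.col2 AND a.rowid < b.rowid
ORDER BY a.rowid
"""
pairs = conn.execute(PAIRS).fetchall()
for pair in pairs:
    print(pair)
```

Each duplicate pair comes back as a single combined row, so on the question's data there is one row for the aa/111 pair and one for the bb/112 pair.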

+2

Something like this should work:

    SELECT a.COL1, a.COL2, a.COL3
    FROM YourTable a
    JOIN YourTable b
      ON b.COL1 = a.COL1
     AND b.COL2 = a.COL2
     AND b.COL3 <> a.COL3

In general, the JOIN clause should include every column that you consider part of the "duplicate" (COL1 and COL2 in this case), plus at least one column (or as many as it takes) to keep a row from joining to itself (COL3, in this case).

+2

This is pretty similar to the self-join, except it will not have the duplicates.

    select COL1, COL2, COL3
    from theTable a
    where exists (
        select 'x'
        from theTable b
        where a.col1 = b.col1
          and a.col2 = b.col2
          and a.col3 <> b.col3
    )
    order by col1, col2, col3
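Worth noting: because the two bb/112 rows in the question have the same COL3, comparing on COL3 leaves them out. The sketch below, in Python with the standard-library sqlite3 module, runs the EXISTS query both ways. The `rowid` comparison is an assumption that relies on SQLite's implicit row identifier (another database would need a real key column), and `t1` is a placeholder table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"), ("aa", 111, "blah_j"), ("aa", 112, "blah_m"),
        ("ab", 111, "blah_s"), ("bb", 112, "blah_d"), ("bb", 112, "blah_d"),
        ("cc", 112, "blah_w"), ("cc", 113, "blah_p"),
    ],
)

# Same EXISTS shape as above; only the "is this another row?" test varies.
EXISTS_QUERY = """
SELECT col1, col2, col3
FROM t1 a
WHERE EXISTS (
    SELECT 1 FROM t1 b
    WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND {differs}
)
ORDER BY col1, col2, col3
"""
by_col3 = conn.execute(EXISTS_QUERY.format(differs="a.col3 <> b.col3")).fetchall()
by_rowid = conn.execute(EXISTS_QUERY.format(differs="a.rowid <> b.rowid")).fetchall()

print(by_col3)   # only the aa/111 rows
print(by_rowid)  # the aa/111 rows and both bb/112 rows
```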
+2

Here is how you find duplicates. Tested in Oracle 10g with your data.

    select *
    from tst
    where (col1, col2) in (
        select col1, col2
        from tst
        group by col1, col2
        having count(*) > 1
    )

+1

    select COL1, COL2, COL3
    from table
    group by COL1, COL2, COL3
    having count(*) > 1

0

Forget joins - use an analytic function:

    select col1, col2, col3
    from (
        select col1, col2, col3,
               count(*) over (partition by col1, col2) rows_per_col1_col2
        from table
    )
    where rows_per_col1_col2 > 1
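The same idea can be tried locally with a short Python sketch using the standard-library sqlite3 module. It assumes the bundled SQLite is version 3.25 or later, which is when window functions were added. The table name `t1` and the subquery alias `counted` are placeholders I introduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"), ("aa", 111, "blah_j"), ("aa", 112, "blah_m"),
        ("ab", 111, "blah_s"), ("bb", 112, "blah_d"), ("bb", 112, "blah_d"),
        ("cc", 112, "blah_w"), ("cc", 113, "blah_p"),
    ],
)

# COUNT(*) OVER (PARTITION BY ...) attaches the group size to every row,
# so no join back to the table is needed. Requires SQLite 3.25 or later.
QUERY = """
SELECT col1, col2, col3
FROM (
    SELECT col1, col2, col3,
           COUNT(*) OVER (PARTITION BY col1, col2) AS rows_per_col1_col2
    FROM t1
) counted
WHERE rows_per_col1_col2 > 1
ORDER BY col1, col2, col3
"""
duplicates = conn.execute(QUERY).fetchall()
for row in duplicates:
    print(row)
```

The table is read once and every row of a duplicated group is kept, including rows that are identical in all three columns.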
0

Source: https://habr.com/ru/post/1277138/