What is the SQL query to list all rows that have two-column sub-rows as duplicates?

Question

I have a table with redundant data, and I'm trying to identify all rows that have duplicate sub-rows (for lack of a better word). By sub-rows, I mean considering COL1 and COL2 only.

So let's say I have something like this:

  COL1   COL2   COL3
  ---------------------
  aa     111    blah_x
  aa     111    blah_j
  aa     112    blah_m
  ab     111    blah_s
  bb     112    blah_d
  bb     112    blah_d
  cc     112    blah_w
  cc     113    blah_p

I need a SQL query that returns this:

  COL1   COL2   COL3
  ---------------------
  aa     111    blah_x
  aa     111    blah_j
  bb     112    blah_d
  bb     112    blah_d

+4

sql database

fuentesjr Sep 25 '08 at 1:33

10 answers

With the data you have listed, your query is not possible. Rows 5 and 6 are identical to each other.

Assuming that your table is named "quux", if you start with something like this:

 SELECT a.COL1, a.COL2, a.COL3
 FROM quux a, quux b
 WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.COL3 <> b.COL3
 ORDER BY a.COL1, a.COL2

As a result, you will get the answer:

  COL1   COL2   COL3
  ---------------------
  aa     111    blah_x
  aa     111    blah_j

This is because rows 5 and 6 have the same value for COL3. Any query that returns both rows 5 and 6 will also return duplicates of ALL of the rows in this dataset.

On the other hand, if you have a primary key (ID), you can use this query instead:

 SELECT a.COL1, a.COL2, a.COL3
 FROM quux a, quux b
 WHERE a.COL1 = b.COL1 AND a.COL2 = b.COL2 AND a.ID <> b.ID
 ORDER BY a.COL1, a.COL2

[Edited to simplify the WHERE clause]

And you will get the desired results:

  COL1   COL2   COL3
  ---------------------
  aa     111    blah_x
  aa     111    blah_j
  bb     112    blah_d
  bb     112    blah_d

I just tested this on SQL Server 2000, but you should see the same results in any modern SQL database.
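For anyone who wants to compare the two self-joins above without a server at hand, here is a minimal sketch using Python's built-in sqlite3 module. It is not the SQL Server 2000 test mentioned above. The `id` column and the `self_join` helper are assumptions added for illustration, since the question's table has no key.

```python
import sqlite3

# Sample rows from the question. The id column is an assumption: it is
# added here only so that otherwise identical rows can be told apart.
ROWS = [
    ("aa", 111, "blah_x"),
    ("aa", 111, "blah_j"),
    ("aa", 112, "blah_m"),
    ("ab", 111, "blah_s"),
    ("bb", 112, "blah_d"),
    ("bb", 112, "blah_d"),
    ("cc", 112, "blah_w"),
    ("cc", 113, "blah_p"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quux (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER, col3 TEXT)"
)
conn.executemany("INSERT INTO quux (col1, col2, col3) VALUES (?, ?, ?)", ROWS)


def self_join(extra_condition):
    """Run the self-join with one extra condition that stops a row matching itself."""
    sql = f"""
        SELECT a.col1, a.col2, a.col3
        FROM quux a, quux b
        WHERE a.col1 = b.col1
          AND a.col2 = b.col2
          AND {extra_condition}
        ORDER BY a.col1, a.col2, a.id
    """
    return conn.execute(sql).fetchall()


by_col3 = self_join("a.col3 <> b.col3")  # misses the identical bb rows
by_id = self_join("a.id <> b.id")        # finds both aa/111 and bb/112 rows

print(by_col3)
print(by_id)
```

Note that when a (COL1, COL2) group holds three or more rows, this kind of join returns each row once per matching partner, so the output would contain repeats.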

blorgbeard proved me wrong - good for him!

+5

Craig Trader Sep 25 '08 at 1:40

Join the table to itself like this:

 SELECT a.col3, b.col3, a.col1, a.col2
 FROM tablename a, tablename b
 WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3

If you're using PostgreSQL, you can use the oid to make this return fewer duplicated results, like this:

 SELECT a.col3, b.col3, a.col1, a.col2
 FROM tablename a, tablename b
 WHERE a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 != b.col3
   AND a.oid < b.oid

+4

Jerub Sep 25 '08 at 1:35

I don't have a database handy to test this, but I think it should work...

 select *
 from theTable
 where col1 in (
     select col1
     from theTable
     group by col1||col2
     having count(col1||col2) > 1
 )

+2

dacracot Sep 25 '08 at 1:37

My naive attempt was

 select a.*, b.*
 from table a, table b
 where a.col1 = b.col1 and a.col2 = b.col2 and a.col3 != b.col3;

but this would return all the rows twice. I'm not sure how you would restrict it to returning them just once. Perhaps if there were a primary key, you could add "and a.pkey < b.pkey".
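The effect of the extra key comparison can be checked with a small sketch in Python's built-in sqlite3 module. The table name `t` and the `pkey` column are hypothetical, since the question's table has no primary key, and only the COL3 values of each pair are selected to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "t" and "pkey" are hypothetical names used only for this illustration.
conn.execute(
    "CREATE TABLE t (pkey INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER, col3 TEXT)"
)
conn.executemany(
    "INSERT INTO t (col1, col2, col3) VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"),
        ("aa", 111, "blah_j"),
        ("aa", 112, "blah_m"),
        ("ab", 111, "blah_s"),
        ("bb", 112, "blah_d"),
        ("bb", 112, "blah_d"),
        ("cc", 112, "blah_w"),
        ("cc", 113, "blah_p"),
    ],
)

# The self-join from the answer, reduced to the col3 value of each side.
PAIRS = """
    SELECT a.col3, b.col3
    FROM t a, t b
    WHERE a.col1 = b.col1
      AND a.col2 = b.col2
      AND a.col3 != b.col3
"""

# Without the key comparison every pair shows up in both orders.
mirrored = conn.execute(PAIRS + " ORDER BY a.pkey").fetchall()
# With it, only the ordering where a comes first survives.
once = conn.execute(PAIRS + " AND a.pkey < b.pkey ORDER BY a.pkey").fetchall()

print(mirrored)
print(once)
```

The identical bb rows still do not appear, because the `a.col3 != b.col3` condition filters them out either way.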

As I said, this is not elegant, and there is probably a better way to do this.

+2

Paul Tomblin Sep 25 '08 at 1:38

Something like this should work:

 SELECT a.COL1, a.COL2, a.COL3
 FROM YourTable a
 JOIN YourTable b
   ON b.COL1 = a.COL1 AND b.COL2 = a.COL2 AND b.COL3 <> a.COL3

In general, the JOIN clause should include every column that you consider part of a "duplicate" (COL1 and COL2 in this case), plus at least one column (or as many as it takes) to stop a row from joining to itself (COL3, in this case).

+2

Jonathan Schuster Sep 25 '08 at 1:43

This is very similar to the self-join, except that it will not return duplicates.

 select COL1, COL2, COL3
 from theTable a
 where exists (
     select 'x'
     from theTable b
     where a.col1 = b.col1 and a.col2 = b.col2 and a.col3 <> b.col3
 )
 order by col1, col2, col3

+2

IK. Sep 25 '08 at 1:48

Here is how you find duplicates. Tested in Oracle 10g with your data.

 select *
 from tst
 where (col1, col2) in (
     select col1, col2
     from tst
     group by col1, col2
     having count(*) > 1
 )

+1

Kyle Dyer Oct 1 '08 at 4:46

 select COL1, COL2, COL3
 from table
 group by COL1, COL2, COL3
 having count(*) > 1

0

pappes Sep 25 '08 at 2:43

Forget joins and use an analytic function:

 select col1, col2, col3
 from (
     select col1, col2, col3,
            count(*) over (partition by col1, col2) rows_per_col1_col2
     from table
 )
 where rows_per_col1_col2 > 1
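A minimal sketch of the analytic-function approach, run through Python's built-in sqlite3 module on the question's sample data. It assumes the bundled SQLite is version 3.25 or later (the first release with window functions). The table name `tst`, the subquery alias `counted`, and the ORDER BY are additions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "tst" is a stand-in table name; the answer above just says "table".
conn.execute("CREATE TABLE tst (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO tst VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"),
        ("aa", 111, "blah_j"),
        ("aa", 112, "blah_m"),
        ("ab", 111, "blah_s"),
        ("bb", 112, "blah_d"),
        ("bb", 112, "blah_d"),
        ("cc", 112, "blah_w"),
        ("cc", 113, "blah_p"),
    ],
)

# Count the rows in each (col1, col2) partition, then keep the rows whose
# partition holds more than one row. ORDER BY only makes the output stable.
sql = """
    SELECT col1, col2, col3
    FROM (
        SELECT col1, col2, col3,
               COUNT(*) OVER (PARTITION BY col1, col2) AS rows_per_col1_col2
        FROM tst
    ) AS counted
    WHERE rows_per_col1_col2 > 1
    ORDER BY col1, col2, col3
"""
dupes = conn.execute(sql).fetchall()
print(dupes)
```

Because the count is taken per partition rather than through a join, the two identical bb rows are both returned, and no key column is needed.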

0

David Aldridge Sep 25 '08 at 3:27

Blorgbeard · Accepted Answer · 2008-09-25T01:40:43+0000

Does this work for you?

 select t.*
 from table t
 left join (
     select col1, col2, count(*) as count
     from table
     group by col1, col2
 ) c on t.col1 = c.col1 and t.col2 = c.col2
 where c.count > 1
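To try this query against the sample data from the question, here is a small sketch using Python's built-in sqlite3 module. The table name `tst` is a stand-in for "table", the count alias is renamed to `cnt` to avoid any clash with the function name, and the ORDER BY is added only to make the output deterministic. All three are assumptions of this sketch, not part of the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "tst" is a hypothetical table name standing in for "table".
conn.execute("CREATE TABLE tst (col1 TEXT, col2 INTEGER, col3 TEXT)")
conn.executemany(
    "INSERT INTO tst VALUES (?, ?, ?)",
    [
        ("aa", 111, "blah_x"),
        ("aa", 111, "blah_j"),
        ("aa", 112, "blah_m"),
        ("ab", 111, "blah_s"),
        ("bb", 112, "blah_d"),
        ("bb", 112, "blah_d"),
        ("cc", 112, "blah_w"),
        ("cc", 113, "blah_p"),
    ],
)

# Count rows per (col1, col2) in a grouped subquery, join the counts back
# to the table, and keep only the rows whose group has more than one member.
sql = """
    SELECT t.col1, t.col2, t.col3
    FROM tst t
    LEFT JOIN (
        SELECT col1, col2, COUNT(*) AS cnt
        FROM tst
        GROUP BY col1, col2
    ) c ON t.col1 = c.col1 AND t.col2 = c.col2
    WHERE c.cnt > 1
    ORDER BY t.col1, t.col2, t.col3
"""
result = conn.execute(sql).fetchall()
print(result)
```

On this data the query returns the four rows the question asks for, including both identical bb rows, without needing a key column.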
