How to find duplicate counts across several columns?

Question

Here is an example table that mimics my scenario:

COL_1   COL_2   COL_3   COL_4   LAST_COL
A       P       X       NY      10
A       P       X       NY      11
A       P       Y       NY      12
A       P       Y       NY      13
A       P       X       NY      14
B       Q       X       NY      15
B       Q       Y       NY      16
B       Q       Y       CA      17
B       Q       Y       CA      18

LAST_COL is the primary key, so it will be different each time.

I want to ignore LAST_COL and collect some statistics related to the other four columns.

Basically, I have millions of rows in my table, and I want to know which combination of COL_1, COL_2, COL_3 and COL_4 has the most rows.

So I need a query that returns every unique combination along with its number of occurrences.

COL_1   COL_2   COL_3   COL_4   TOTAL
A       P       X       NY      3
A       P       Y       NY      2
B       Q       X       NY      1
B       Q       Y       NY      1
B       Q       Y       CA      2

Thanks to everyone who helps me with this.

* I use MS SQL if that matters.

+3

sql database tsql sql-server-2005

bits Mar 01 '11 at 20:38

4 answers

Use GROUP BY, for example:

SELECT COL_1, COL_2, COL_3, COL_4, COUNT(*)
FROM my_table
GROUP BY COL_1, COL_2, COL_3, COL_4

+1

Adrian Smith Mar 01 '11 at 20:41

If I understand correctly, all you need is something like:

SELECT COL_1, COL_2, COL_3, COL_4, COUNT(*) AS TOTAL
FROM MyTable  -- substitute your table name; TABLE itself is a reserved word
GROUP BY COL_1, COL_2, COL_3, COL_4

+1

pilavdzice Mar 01 '11 at 20:44

"have the most rows"

So you want to count the rows in each group and then ORDER BY that count, descending:

SELECT    COL_1, COL_2, COL_3, COL_4, COUNT(*) COUNT_ROWS
FROM      TBL
GROUP BY  COL_1, COL_2, COL_3, COL_4
ORDER BY  COUNT_ROWS DESC
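
If only the largest group is of interest, one possible variation is to keep just the top row of that ordering. This is a sketch rather than part of the original answer; it assumes the same placeholder table name TBL used above and relies on TOP ... WITH TIES, which SQL Server supports, so that several combinations sharing the highest count are all returned.

```sql
-- Sketch: return only the combination(s) with the highest row count.
-- TBL is the placeholder table name from the query above.
SELECT TOP 1 WITH TIES
          COL_1, COL_2, COL_3, COL_4, COUNT(*) AS COUNT_ROWS
FROM      TBL
GROUP BY  COL_1, COL_2, COL_3, COL_4
ORDER BY  COUNT(*) DESC;
```

With the sample data from the question this would return the single row A, P, X, NY with a count of 3.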

+1

RichardTheKiwi Mar 01 '11 at 21:07

squillman · Accepted Answer · 2011-03-01T20:40:57+0000

SELECT COL_1, COL_2, COL_3, COL_4, COUNT(*)
FROM MyTable
GROUP BY COL_1, COL_2, COL_3, COL_4

If you ever want to filter out the combinations that have no duplicates:

SELECT COL_1, COL_2, COL_3, COL_4, COUNT(*)
FROM MyTable
GROUP BY COL_1, COL_2, COL_3, COL_4
HAVING COUNT(*) > 1
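
To try these queries against the sample rows from the question, a minimal sketch could look like the following. The temporary table name #Sample, the column types, and the TOTAL alias are assumptions made for illustration; the question only states that LAST_COL is the primary key.

```sql
-- Hypothetical temp table holding the question's sample rows.
-- Column types are assumed; only LAST_COL is known to be the primary key.
CREATE TABLE #Sample (
    COL_1    CHAR(1) NOT NULL,
    COL_2    CHAR(1) NOT NULL,
    COL_3    CHAR(1) NOT NULL,
    COL_4    CHAR(2) NOT NULL,
    LAST_COL INT     NOT NULL PRIMARY KEY
);

-- INSERT ... SELECT ... UNION ALL is used because SQL Server 2005
-- does not accept multi-row VALUES lists.
INSERT INTO #Sample (COL_1, COL_2, COL_3, COL_4, LAST_COL)
SELECT 'A', 'P', 'X', 'NY', 10 UNION ALL
SELECT 'A', 'P', 'X', 'NY', 11 UNION ALL
SELECT 'A', 'P', 'Y', 'NY', 12 UNION ALL
SELECT 'A', 'P', 'Y', 'NY', 13 UNION ALL
SELECT 'A', 'P', 'X', 'NY', 14 UNION ALL
SELECT 'B', 'Q', 'X', 'NY', 15 UNION ALL
SELECT 'B', 'Q', 'Y', 'NY', 16 UNION ALL
SELECT 'B', 'Q', 'Y', 'CA', 17 UNION ALL
SELECT 'B', 'Q', 'Y', 'CA', 18;

-- One row per distinct combination of the four columns, largest groups first.
-- The extra sort columns only make the order of tied counts predictable.
SELECT   COL_1, COL_2, COL_3, COL_4, COUNT(*) AS TOTAL
FROM     #Sample
GROUP BY COL_1, COL_2, COL_3, COL_4
ORDER BY TOTAL DESC, COL_1, COL_2, COL_3, COL_4;

DROP TABLE #Sample;
```

This returns the five combinations listed in the question, with A, P, X, NY (3 rows) first. Adding HAVING COUNT(*) > 1 before the ORDER BY would leave only the three combinations that occur more than once.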
