How to count duplicates across several columns?

Here is an example table that mimics my scenario:

COL_1   COL_2   COL_3   COL_4   LAST_COL
A       P       X       NY      10
A       P       X       NY      11
A       P       Y       NY      12
A       P       Y       NY      13
A       P       X       NY      14
B       Q       X       NY      15
B       Q       Y       NY      16
B       Q       Y       CA      17
B       Q       Y       CA      18

LAST_COL is the primary key, so its value is different in every row.

I want to ignore LAST_COL and collect some statistics related to the other four columns.

Basically, I have millions of rows in my table, and I want to know which combination of COL_1, COL_2, COL_3 and COL_4 has the most rows.

So, I need a query that returns every unique combination together with its number of occurrences.

COL_1   COL_2   COL_3   COL_4   TOTAL
A       P       X       NY      3
A       P       Y       NY      2
B       Q       X       NY      1
B       Q       Y       NY      1
B       Q       Y       CA      2

Thanks to everyone who helps me with this.

* I use MS SQL if that matters.

+3
4 answers
SELECT COL_1, COL_2, COL_3, COL_4, COUNT(*)
FROM MyTable
GROUP BY COL_1, COL_2, COL_3, COL_4

If you ever want to exclude the combinations that have no duplicates:

SELECT COL_1, COL_2, COL_3, COL_4, COUNT(*)
FROM MyTable
GROUP BY COL_1, COL_2, COL_3, COL_4
HAVING COUNT(*) > 1
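
To check both queries against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite as a stand-in for MS SQL (the GROUP BY and HAVING clauses used here are standard SQL), the TOTAL alias is taken from the question's expected output, and the helper name build_sample_table is hypothetical.

```python
import sqlite3

# Sample rows copied from the question (LAST_COL is the primary key).
SAMPLE_ROWS = [
    ("A", "P", "X", "NY", 10),
    ("A", "P", "X", "NY", 11),
    ("A", "P", "Y", "NY", 12),
    ("A", "P", "Y", "NY", 13),
    ("A", "P", "X", "NY", 14),
    ("B", "Q", "X", "NY", 15),
    ("B", "Q", "Y", "NY", 16),
    ("B", "Q", "Y", "CA", 17),
    ("B", "Q", "Y", "CA", 18),
]


def build_sample_table():
    """Hypothetical helper: load the sample rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable ("
        "COL_1 TEXT, COL_2 TEXT, COL_3 TEXT, COL_4 TEXT, "
        "LAST_COL INTEGER PRIMARY KEY)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


# The grouping query from the answer; LAST_COL is simply left out.
GROUP_QUERY = """
SELECT COL_1, COL_2, COL_3, COL_4, COUNT(*) AS TOTAL
FROM MyTable
GROUP BY COL_1, COL_2, COL_3, COL_4
"""

conn = build_sample_table()

# Every unique combination with its row count.
all_groups = sorted(conn.execute(GROUP_QUERY).fetchall())

# Same query, keeping only combinations that occur more than once.
duplicates_only = sorted(
    conn.execute(GROUP_QUERY + "HAVING COUNT(*) > 1").fetchall()
)

for row in all_groups:
    print(*row)
```

On the sample data this yields five groups, three of which survive the HAVING filter, which matches the expected output in the question.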
+9

GROUP BY is what you need here:

SELECT COL_1, COL_2, COL_3, COL_4, COUNT(*)
FROM my_table
GROUP BY COL_1, COL_2, COL_3, COL_4
+1

If I understand correctly, all you need is something like:

SELECT COL_1,COL_2,COL_3,COL_4, COUNT(*) AS TOTAL
FROM MyTable -- TABLE is a reserved word, so use your actual table name here
GROUP BY COL_1,COL_2,COL_3,COL_4
+1

"...has the most rows"

So you want to count the rows in each group, and then sort by that count in descending order:

SELECT    COL_1, COL_2, COL_3, COL_4, COUNT(*) COUNT_ROWS
FROM      TBL
GROUP BY  COL_1, COL_2, COL_3, COL_4
ORDER BY  COUNT_ROWS DESC
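
As a rough check of the ordering, the sketch below runs this query on the question's sample data with Python's built-in sqlite3 module. SQLite is an assumption here, standing in for MS SQL; the table name TBL and the alias COUNT_ROWS come from the answer, and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TBL ("
    "COL_1 TEXT, COL_2 TEXT, COL_3 TEXT, COL_4 TEXT, "
    "LAST_COL INTEGER PRIMARY KEY)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO TBL VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "P", "X", "NY", 10),
        ("A", "P", "X", "NY", 11),
        ("A", "P", "Y", "NY", 12),
        ("A", "P", "Y", "NY", 13),
        ("A", "P", "X", "NY", 14),
        ("B", "Q", "X", "NY", 15),
        ("B", "Q", "Y", "NY", 16),
        ("B", "Q", "Y", "CA", 17),
        ("B", "Q", "Y", "CA", 18),
    ],
)

# Count rows per combination, largest groups first.
ranked = conn.execute(
    """
    SELECT    COL_1, COL_2, COL_3, COL_4, COUNT(*) AS COUNT_ROWS
    FROM      TBL
    GROUP BY  COL_1, COL_2, COL_3, COL_4
    ORDER BY  COUNT_ROWS DESC
    """
).fetchall()

# The first row is the combination with the most rows.
largest = ranked[0]
print(largest)  # ('A', 'P', 'X', 'NY', 3)
```

If only the largest combination is needed, MS SQL can restrict the result with SELECT TOP 1 (or TOP 1 WITH TIES to keep groups that share the highest count) instead of reading the first row on the client side.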
+1

Source: https://habr.com/ru/post/1795596/

