PostgreSQL equivalent for MySQL GROUP BY

Question

I need to find duplicates in a table. In MySQL, I just write:

SELECT *,count(id) count FROM `MY_TABLE` GROUP BY SOME_COLUMN ORDER BY count DESC

This query is convenient because it:

Finds duplicates based on SOME_COLUMN and gives the number of repetitions.
Sorts by repetition count in descending order, which is useful for quickly scanning the worst duplicates.
Selects an arbitrary value for all other columns, giving me an idea of the values in those columns.

A similar query in Postgres greets me with an error:

column "MY_TABLE.SOME_COLUMN" must appear in the GROUP BY clause or be used in an aggregate function

What is the Postgres equivalent of this query?

PS: I know that this MySQL behavior deviates from the SQL standard.

+6

sql mysql aggregate-functions group-by postgresql

jerrymouse May 01 '12 at 13:41

4 answers

Here's another approach using DISTINCT ON:

 SELECT DISTINCT ON (ct, some_column)
        *, count(id) OVER (PARTITION BY some_column) AS ct
 FROM   my_table x
 ORDER  BY ct DESC, some_column, id

Sample data:

 CREATE TABLE my_table (some_column int, id int, col1 int);

 INSERT INTO my_table VALUES
   (1, 3, 4)
 , (2, 4, 1)
 , (2, 5, 1)
 , (3, 6, 4)
 , (3, 7, 3)
 , (4, 8, 3)
 , (4, 9, 4)
 , (5, 10, 1)
 , (5, 11, 2)
 , (5, 11, 3);

Output:

 SOME_COLUMN  ID  COL1  CT
           5  10     1   3
           2   4     1   2
           3   6     4   2
           4   8     3   2
           1   3     4   1

Live test: http://www.sqlfiddle.com/#!1/e2509/1

DISTINCT ON Documentation: http://www.postgresonline.com/journal/archives/4-Using-Distinct-ON-to-return-newest-order-for-each-customer.html

+3

Michael Buen May 03 '12 at 12:01

MySQL allows non-aggregated columns in the SELECT list to be omitted from the GROUP BY list; it then returns the first row found for each unique combination of the grouped columns. This is non-standard SQL behavior.

Postgres, on the other hand, conforms to the SQL standard.

There is no equivalent query in Postgres.
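
For comparison, a standard-conforming query has to either group by or aggregate every selected column. A minimal sketch of what that looks like, assuming the my_table(some_column, id, col1) sample table from the DISTINCT ON answer; the aliases sample_id and sample_col1 are made up for illustration:

```sql
-- Every output column is either in GROUP BY or wrapped in an aggregate,
-- so Postgres accepts the query.
SELECT some_column,
       count(*)  AS ct,
       min(id)   AS sample_id,    -- smallest id in the group
       min(col1) AS sample_col1   -- smallest col1 in the group
FROM   my_table
GROUP  BY some_column
ORDER  BY ct DESC, some_column;
```

Note that min(id) and min(col1) are computed independently, so the two values are not guaranteed to come from the same row, and SELECT * is not possible this way.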

+1

Bohemian May 01, '12 at 13:45

Below is a CTE that allows the use of SELECT *. key0 is the intended unique key; {key1, key2} are additional key elements needed to address the non-unique rows. Use at your own risk, YMMV.

 WITH zcte AS (
     SELECT DISTINCT tt.key0
          , MIN(tt.key1) AS key1
          , MIN(tt.key2) AS key2
          , COUNT(*) AS cnt
     FROM ztable tt
     GROUP BY tt.key0
     HAVING COUNT(*) > 1
 )
 SELECT zt.*
      , zc.cnt AS cnt
 FROM ztable zt
 JOIN zcte zc ON zc.key0 = zt.key0
             AND zc.key1 = zt.key1
             AND zc.key2 = zt.key2
 ORDER BY zt.key0, zt.key1, zt.key2
 ;

BTW: to get the intended behavior for the OP, the HAVING COUNT(*) > 1 must be omitted.

+1

wildplasser May 01, '12 at 15:08

Erwin Brandstetter · Accepted Answer · 2012-05-01T13:44:16+0000

Back-ticks are a non-standard MySQL thing. Use the canonical double quotes to quote identifiers (possible in MySQL too). That applies if your table is actually named "MY_TABLE" (all upper case). If you (more sensibly) named it my_table (all lower case), you can drop the double quotes or write the name in lower case.
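
A small sketch of how Postgres treats quoted and unquoted identifiers. The table names "Demo_Table" and demo_table are hypothetical and exist only for this illustration:

```sql
-- Hypothetical tables, only to illustrate identifier quoting.
CREATE TEMP TABLE "Demo_Table" (id int);  -- quoted: case is preserved
CREATE TEMP TABLE demo_table (id int);    -- unquoted: stored as demo_table

INSERT INTO "Demo_Table" VALUES (1);
INSERT INTO demo_table VALUES (2);

SELECT id FROM "Demo_Table";  -- reads the mixed-case table
SELECT id FROM Demo_Table;    -- unquoted, so folded to demo_table
SELECT id FROM demo_table;    -- same table as the previous statement
```

Because unquoted names are folded to lower case, the two tables are distinct objects and can coexist.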

In addition, I use ct instead of count as an alias because it is bad practice to use function names as identifiers.

Simple case

This works in PostgreSQL 9.1 or later:

 SELECT *, count(id) ct
 FROM   my_table
 GROUP  BY primary_key_column(s)
 ORDER  BY ct DESC;

It requires the primary key column(s) in the GROUP BY clause. The results are identical to those of the MySQL query, but ct is always 1 (or 0 if id IS NULL), so it is useless for finding duplicates.
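
To make the point concrete, here is a minimal sketch with a hypothetical table pk_demo whose primary key is id (the table and its rows are assumptions, not part of the question):

```sql
-- Hypothetical table with a primary key (requires PostgreSQL 9.1 or later).
CREATE TEMP TABLE pk_demo (
    id          int PRIMARY KEY,
    some_column int,
    col1        int
);
INSERT INTO pk_demo VALUES (1, 2, 1), (2, 2, 1), (3, 5, 4);

-- Grouping by the primary key makes every other column functionally
-- dependent on it, so SELECT * is allowed. Each group holds one row.
SELECT *, count(id) AS ct
FROM   pk_demo
GROUP  BY id
ORDER  BY ct DESC;
-- ct is 1 in every row, even though some_column = 2 appears twice.
```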

Group by other than primary key columns

If you want to group by other columns, things get more complicated. This query mimics the behavior of your MySQL query, and you can use *.

 SELECT DISTINCT ON (1, some_column)
        count(*) OVER (PARTITION BY some_column) AS ct
       ,*
 FROM   my_table
 ORDER  BY 1 DESC, some_column, id, col1;

This works because DISTINCT ON (PostgreSQL-specific), like DISTINCT (SQL standard), is applied after the window function count(*) OVER (...). Window functions (with the OVER clause) require PostgreSQL 8.4 or later and are not available in MySQL before version 8.0.

Works with any table, regardless of primary or unique constraints.

The 1 in DISTINCT ON and ORDER BY is just shorthand for the ordinal position of an item in the SELECT list.
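
As an alternative that avoids the PostgreSQL-specific DISTINCT ON, the same "one row per group plus a count" idea can be sketched with row_number() in a subquery. This is a different technique from the query above, not a rewrite of it; it assumes the my_table(some_column, id, col1) sample table, and the aliases rn and sub are made up for illustration:

```sql
-- Alternative sketch using only standard window functions.
SELECT some_column, id, col1, ct
FROM  (
    SELECT *,
           count(*)     OVER (PARTITION BY some_column) AS ct,
           -- number the rows inside each group; rn = 1 is the row we keep
           row_number() OVER (PARTITION BY some_column
                              ORDER BY id, col1) AS rn
    FROM   my_table
) sub
WHERE  rn = 1
ORDER  BY ct DESC, some_column;
```

The outer SELECT lists columns explicitly to keep the helper column rn out of the result; that is the price for not using DISTINCT ON.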

SQL Fiddle demonstrating both side by side.

See more in this closely related answer:

Select the first row in each GROUP BY?

count(*) vs count(id)

If you're looking for duplicates, you're better off with count(*) than count(id). There is a subtle difference if id can be NULL: count(id) ignores rows where id is NULL, while count(*) counts all rows. If id is defined NOT NULL, the results are the same, but count(*) is generally more appropriate (and slightly faster).
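
A minimal sketch of the difference, using a hypothetical table count_demo with a nullable id (table, data, and aliases are assumptions for illustration):

```sql
-- Hypothetical table where id may be NULL.
CREATE TEMP TABLE count_demo (id int, some_column int);
INSERT INTO count_demo VALUES (1, 10), (NULL, 10), (2, 20);

SELECT some_column,
       count(*)  AS all_rows,      -- counts every row in the group
       count(id) AS non_null_ids   -- skips rows where id IS NULL
FROM   count_demo
GROUP  BY some_column
ORDER  BY some_column;
-- some_column = 10: all_rows = 2, non_null_ids = 1
-- some_column = 20: all_rows = 1, non_null_ids = 1
```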
