How can I SUM values in a Postgres database when there are duplicate records?

Imagine a table that looks like this:

[screenshot: table with duplicate data]

The SQL to retrieve this data was simply SELECT *. The first column is "row_id", the second ("id") is the order identifier, and the third ("total") is the revenue.

I'm not sure why there are duplicate rows in the database, but when I do SUM(total) the duplicate entries are included even though the order ID is the same, so my numbers come out higher than if I select the distinct (id, total) pairs, export them to Excel, and sum the values manually.

So my question is: how can I SUM the total only once per order identifier, so that I get the same revenue as when I export a single row per order identifier and add the values up myself?
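To make the situation concrete, here is a minimal, hypothetical reproduction; the table name orders and the sample rows are borrowed from the answers below, not from the actual screenshot:

 -- Hypothetical reproduction of the problem; the table name "orders" and the
 -- sample rows are taken from the answers below, not from the real data.
 create table orders (row_id int, id int, total numeric(15,2));

 insert into orders values
   (6395,  1509, 112),
   (22986, 1509, 112),    -- duplicate of order 1509
   (1393,  3284, 40.37),
   (24360, 3284, 40.37);  -- duplicate of order 3284

 -- The naive sum counts every duplicate row, so it comes out too high:
 select sum(total) from orders;   -- 304.74 instead of the expected 152.37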

Thanks in advance!

+8
6 answers

Easy - just divide by the count:

select id, sum(total) / count(id) from orders group by id 

This also handles any level of duplication, e.g. rows that appear three times, and so on.
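Against the sample orders table sketched in the question above, this would give one row per order with the un-inflated total (a usage sketch, with the expected values shown as a comment):

 -- expected: id 1509 -> 112, id 3284 -> 40.37 (one row per order)
 select id, sum(total) / count(id) as total
 from orders
 group by id;

Note that this relies on every duplicate of an order carrying exactly the same total, which appears to be the case in the data shown.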

+8

You can try something like this (with your example):

Table

 create table test (
   row_id int,
   id int,
   total decimal(15,2)
 );

 insert into test values
   (6395, 1509, 112),
   (22986, 1509, 112),
   (1393, 3284, 40.37),
   (24360, 3284, 40.37);

Query

 with distinct_records as (
   select distinct id, total
   from test
 )
 select
   a.id,
   b.actual_total,
   array_agg(a.row_id) as row_ids
 from test a
 inner join (
   select id, sum(total) as actual_total
   from distinct_records
   group by id
 ) b on a.id = b.id
 group by a.id, b.actual_total

Result

 |  id  | actual_total |  row_ids   |
 |------|--------------|------------|
 | 1509 | 112          | 6395,22986 |
 | 3284 | 40.37        | 1393,24360 |

Explanation

We do not know why orders and totals appear more than once with different row_id values. So, using a common table expression (CTE) introduced with the WITH clause, we first get the distinct id and total pairs.

From the CTE we use this de-duplicated data to compute the totals per id. We then join back to the source table on the identifier and aggregate over the distinct values. Finally, we collect the row_ids into a comma-separated list with array_agg so the information looks cleaner.

SQLFiddle example

http://sqlfiddle.com/#!15/72639/3

+2

You can use DISTINCT in your aggregate functions:

 SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id 

Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
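One thing to keep in mind with this approach (a hypothetical illustration, not from the question's data): SUM(DISTINCT ...) would also collapse two genuinely different rows of the same order that happen to have equal totals, so it relies on duplicates always being exact copies:

 -- Hypothetical rows: order 7777 really has two separate line items of 10 each.
 select id, sum(distinct total) as total
 from (values (1, 7777, 10.0), (2, 7777, 10.0)) as t(row_id, id, total)
 group by id;
 -- returns 10 (not 20)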

+2

If we can assume that the total for one order really belongs to a single row, we can exclude the duplicates in a subquery by selecting the MAX of the PK id column. Example:

 CREATE TABLE test2 (id int, order_id int, total int);

 insert into test2 values (1, 1, 50);
 insert into test2 values (2, 1, 50);
 insert into test2 values (5, 1, 50);
 insert into test2 values (3, 2, 100);
 insert into test2 values (4, 2, 100);

 select order_id, sum(total)
 from test2 t
 join (
   select max(id) as id
   from test2
   group by order_id
 ) as sq on t.id = sq.id
 group by order_id

sql script
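For reference, with the rows inserted above this should return 50 for order 1 and 100 for order 2: the join keeps only the row with the highest id per order, which is safe here because every duplicate of an order carries the same total.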

0

In difficult cases:

 select
   id,
   (
     SELECT SUM(value::int4)
     FROM jsonb_each_text(jsonb_object_agg(row_id, total))
   ) as total
 from orders
 group by id
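Presumably the trick here is that a jsonb object keeps only one value per key, so rows that repeat with the same row_id (for example after a join fan-out) collapse before the inner SUM. A standalone sketch of that building block, using made-up values:

 -- Made-up values: the "row" with key 1 appears twice (same key, same value).
 with agg as (
   select jsonb_object_agg(row_id, total) as obj
   from (values (1, 50), (1, 50), (2, 100)) as t(row_id, total)
 )
 select sum(value::int4) as total
 from agg, jsonb_each_text(agg.obj);
 -- returns 150, not 200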
0

I would suggest just using a subquery:

 SELECT "a"."id", SUM("a"."total") FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a" GROUP BY "a"."id" 

The above will give you the sum for each id.

Use the query below if you want a single overall total with the duplicates removed:

 SELECT SUM("a"."total") FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a" 
0

Source: https://habr.com/ru/post/1246798/

