How can I SUM values in a Postgres database when there are duplicate records?

Imagine a table that looks like this:

[screenshot: table with duplicate data]

The SQL to retrieve this data was simply SELECT *. The first column is "row_id", the second ("id") is the order identifier, and the third ("total") is the revenue.

I'm not sure why there are duplicate rows in the database, but when I do SUM(total) the duplicate entries are included even though the order ID is the same, so my numbers come out higher than if I select the distinct (id, total) pairs, export them to Excel, and sum the values manually.

So my question is: how can I SUM the total only once per order identifier, so that I get the same revenue as when I export a single row per order identifier and add the values up myself?
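To make the situation concrete, here is a minimal, hypothetical reproduction; the table name orders and the sample rows are borrowed from the answers below, not from the actual screenshot:

 -- Hypothetical reproduction of the problem; the table name "orders" and the
 -- sample rows are taken from the answers below, not from the real data.
 create table orders (row_id int, id int, total numeric(15,2));

 insert into orders values
   (6395,  1509, 112),
   (22986, 1509, 112),    -- duplicate of order 1509
   (1393,  3284, 40.37),
   (24360, 3284, 40.37);  -- duplicate of order 3284

 -- The naive sum counts every duplicate row, so it comes out too high:
 select sum(total) from orders;   -- 304.74 instead of the expected 152.37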

Thanks in advance!

+8
6 answers

Easy - just divide by the count:

select id, sum(total) / count(id) from orders group by id 

This also handles any level of duplication, e.g. rows that appear three times, and so on.
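Against the sample orders table sketched in the question above, this would give one row per order with the un-inflated total (a usage sketch, with the expected values shown as a comment):

 -- expected: id 1509 -> 112, id 3284 -> 40.37 (one row per order)
 select id, sum(total) / count(id) as total
 from orders
 group by id;

Note that this relies on every duplicate of an order carrying exactly the same total, which appears to be the case in the data shown.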

+8

You can try something like this (with your example):

Table

 create table test (
   row_id int,
   id int,
   total decimal(15,2)
 );

 insert into test values
   (6395, 1509, 112),
   (22986, 1509, 112),
   (1393, 3284, 40.37),
   (24360, 3284, 40.37);

Query

 with distinct_records as (
   select distinct id, total
   from test
 )
 select
   a.id,
   b.actual_total,
   array_agg(a.row_id) as row_ids
 from test a
 inner join (
   select id, sum(total) as actual_total
   from distinct_records
   group by id
 ) b on a.id = b.id
 group by a.id, b.actual_total

Result

 |  id  | actual_total |  row_ids   |
 |------|--------------|------------|
 | 1509 | 112          | 6395,22986 |
 | 3284 | 40.37        | 1393,24360 |

Explanation

We do not know why orders and totals appear more than once with different row_id values. So, using a common table expression (CTE) introduced with the WITH clause, we first get the distinct id and total pairs.

From the CTE we use this de-duplicated data to compute the totals per id. We then join back to the source table on the identifier and aggregate over the distinct values. Finally, we collect the row_ids into a comma-separated list with array_agg so the information looks cleaner.

SQLFiddle example

http://sqlfiddle.com/#!15/72639/3

+2

You can use DISTINCT in your aggregate functions:

 SELECT id, SUM(DISTINCT total) FROM orders GROUP BY id 

Documentation here: https://www.postgresql.org/docs/9.6/static/sql-expressions.html#SYNTAX-AGGREGATES
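One thing to keep in mind with this approach (a hypothetical illustration, not from the question's data): SUM(DISTINCT ...) would also collapse two genuinely different rows of the same order that happen to have equal totals, so it relies on duplicates always being exact copies:

 -- Hypothetical rows: order 7777 really has two separate line items of 10 each.
 select id, sum(distinct total) as total
 from (values (1, 7777, 10.0), (2, 7777, 10.0)) as t(row_id, id, total)
 group by id;
 -- returns 10 (not 20)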

+2

If we can assume that the total for one order really belongs to a single row, we can exclude the duplicates in a subquery by selecting the MAX of the PK id column. Example:

 CREATE TABLE test2 (id int, order_id int, total int);

 insert into test2 values (1, 1, 50);
 insert into test2 values (2, 1, 50);
 insert into test2 values (5, 1, 50);
 insert into test2 values (3, 2, 100);
 insert into test2 values (4, 2, 100);

 select order_id, sum(total)
 from test2 t
 join (
   select max(id) as id
   from test2
   group by order_id
 ) as sq on t.id = sq.id
 group by order_id

sql script
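For reference, with the rows inserted above this should return 50 for order 1 and 100 for order 2: the join keeps only the row with the highest id per order, which is safe here because every duplicate of an order carries the same total.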

0

In difficult cases:

 select
   id,
   (
     SELECT SUM(value::int4)
     FROM jsonb_each_text(jsonb_object_agg(row_id, total))
   ) as total
 from orders
 group by id
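Presumably the trick here is that a jsonb object keeps only one value per key, so rows that repeat with the same row_id (for example after a join fan-out) collapse before the inner SUM. A standalone sketch of that building block, using made-up values:

 -- Made-up values: the "row" with key 1 appears twice (same key, same value).
 with agg as (
   select jsonb_object_agg(row_id, total) as obj
   from (values (1, 50), (1, 50), (2, 100)) as t(row_id, total)
 )
 select sum(value::int4) as total
 from agg, jsonb_each_text(agg.obj);
 -- returns 150, not 200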
0

I would suggest just using a subquery:

 SELECT "a"."id", SUM("a"."total") FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a" GROUP BY "a"."id" 

The above will give you the sum for each id.

Use the query below if you want a single overall total with the duplicates removed:

 SELECT SUM("a"."total") FROM (SELECT DISTINCT ON ("id") * FROM "Database"."Schema"."Table") AS "a" 
0

Source: https://habr.com/ru/post/1246798/

