Maintaining Double Counted Rows in GROUP BY

Here are the main problems of my schema and problems: http://sqlfiddle.com/#!1/72ec9/4/0

Please note that the period table can refer to a variable time range - it can be a whole season, it can be several games or one game. For a given team and year, all lines of a period are exceptional time intervals.

I have a written query that joins tables and uses the GROUP BY.year period to add season points (see sqlfiddle). However, if the trainer has two positions in the same year, GROUP BY will read the same line of the period twice. How can I omit duplicates when the trainer occupied two positions, but still summarizes the periods when the year consists of several periods? If you had a better way to make a circuit, I would appreciate it too if you would point it to me.

+4
source share
3 answers

I explained the underlying problem (joining multiple tables with multiple matches) in this close answer .

First, I simplified your query syntactically to make it easier to read:

select pe.year ,sum(pe.wins) AS wins ,sum(pe.losses) AS losses ,sum(pe.ties) AS ties ,array_agg(po.id) AS position_id ,array_agg(po.name) AS position_names from periods_positions_coaches_linking pp join positions po ON po.id = pp.position join periods pe ON pe.id = pp.period where pp.coach = 1 group by pe.year order by pe.year; 

Sets the same (incorrect) result as your original, but easier / faster / easier to read.

  • It makes no sense to join the table trainer until you use the columns in the SELECT list. I completely removed it and replaced the WHERE clause with where pp.coach = 1 .

  • You do not need COALESCE at all. NULL values ​​are ignored in the aggregate function sum() . It makes no sense to replace 0 .

  • Use table aliases to make reading easier

Next, I solved your problem as follows:

 SELECT * FROM ( SELECT pe.year ,array_agg(DISTINCT po.id) AS position_id ,array_agg(DISTINCT po.name) AS position_names FROM periods_positions_coaches_linking pp JOIN positions po ON po.id = pp.position JOIN periods pe ON pe.id = pp.period WHERE pp.coach = 1 GROUP BY pe.year ) po LEFT JOIN ( SELECT pe.year ,sum(pe.wins) AS wins ,sum(pe.losses) AS losses ,sum(pe.ties) AS ties FROM ( SELECT period FROM periods_positions_coaches_linking WHERE coach = 1 GROUP BY period ) pp JOIN periods pe ON pe.id = pp.period GROUP BY pe.year ) pe USING (year) ORDER BY year 
  • Join positions and periods separately before joining them.

  • In the first list of position subqueries, only once, simply using DISTINCT .

  • In the second subquery

    • GROUP BY , because a coach can have several positions per period.
    • JOIN to the data periods after that, and then aggregate to get the amounts.

Updated sqlfiddle demonstrates solution.

+6
source

use distinct as shown here

code:

 select periods.year as year, sum(coalesce(periods.wins, 0)) as wins, sum(coalesce(periods.losses, 0)) as losses, sum(coalesce(periods.ties, 0)) as ties, array_agg( distinct positions.id) as position_id, array_agg( distinct positions.name) as position_names from periods_positions_coaches_linking join coaches on coaches.id = periods_positions_coaches_linking.coach join positions on positions.id = periods_positions_coaches_linking.position join periods on periods.id = periods_positions_coaches_linking.period where coaches.id = 1 group by periods.year, positions.id order by periods.year; 
+2
source

In your case, the simplest way would be to split the positions:

 select periods.year as year, sum(coalesce(periods.wins, 0))/COUNT(distinct positions.id) as wins, sum(coalesce(periods.losses, 0))/COUNT(distinct positions.id) as losses, sum(coalesce(periods.ties, 0))/COUNT(distinct positions.id) as ties, array_agg(distinct positions.id) as position_id, array_agg(distinct positions.name) as position_names from periods_positions_coaches_linking join coaches on coaches.id = periods_positions_coaches_linking.coach join positions on positions.id = periods_positions_coaches_linking.position join periods on periods.id = periods_positions_coaches_linking.period where coaches.id = 1 group by periods.year order by periods.year; 

The number of positions scales the winnings, losses and ties, so their division corrects the calculations.

+1
source

Source: https://habr.com/ru/post/1437666/


All Articles