Why do you need to include a field in GROUP BY when using OVER (PARTITION BY x)?

I have a table for which I want to make a simple sum of a field grouped by two columns. Then I want the total value for all values ​​for each year_num.

See an example: http://rextester.com/QSLRS68794

This query throws: "42803: the column" foo.num_cust "should appear in the GROUP BY clause or be used in the aggregate function," and I cannot understand why. Why does an aggregate function using OVER (PARTITION BY x) require the summed field to be in GROUP BY ??

select year_num ,age_bucket ,sum(num_cust) --,sum(num_cust) over (partition by year_num) --THROWS ERROR!! from foo group by year_num ,age_bucket order by 1,2 

Table:

 | loc_id | year_num | gen | cust_category | cust_age | num_cust | age_bucket | |--------|-----------|------|----------------|-----------|-----------|-------------| | 1 | 2016 | M | cash | 41 | 2 | 04_<45 | | 1 | 2016 | F | Prepaid | 41 | 1 | 03_<35 | | 1 | 2016 | F | cc | 61 | 1 | 05_45+ | | 1 | 2016 | F | cc | 19 | 2 | 02_<25 | | 1 | 2016 | M | cc | 64 | 1 | 05_45+ | | 1 | 2016 | F | cash | 46 | 1 | 05_45+ | | 1 | 2016 | F | cash | 27 | 3 | 03_<35 | | 1 | 2016 | M | cash | 42 | 1 | 04_<45 | | 1 | 2017 | F | cc | 35 | 1 | 04_<45 | | 1 | 2017 | F | cc | 37 | 1 | 04_<45 | | 1 | 2017 | F | cash | 46 | 1 | 05_45+ | | 1 | 2016 | F | cash | 19 | 4 | 02_<25 | | 1 | 2017 | M | cash | 43 | 1 | 04_<45 | | 1 | 2017 | M | cash | 29 | 1 | 03_<35 | | 1 | 2016 | F | cc | 13 | 1 | 01_<18 | | 1 | 2017 | F | cash | 16 | 2 | 01_<18 | | 1 | 2016 | F | cc | 17 | 2 | 01_<18 | | 1 | 2016 | M | cc | 17 | 2 | 01_<18 | | 1 | 2017 | F | cash | 18 | 9 | 02_<25 | 

DESIRED OUTPUT:

 | year_num | age_bucket | sum | sum over (year_num) | |----------|------------|-----|---------------------| | 2016 | 01_<18 | 5 | 21 | | 2016 | 02_<25 | 6 | 21 | | 2016 | 03_<35 | 4 | 21 | | 2016 | 04_<45 | 3 | 21 | | 2016 | 05_45+ | 3 | 21 | | 2017 | 01_<18 | 2 | 16 | | 2017 | 02_<25 | 9 | 16 | | 2017 | 03_<35 | 1 | 16 | | 2017 | 04_<45 | 3 | 16 | | 2017 | 05_45+ | 1 | 16 | 
+5
source share
1 answer

You need to sum() s:

 select year_num, age_bucket, sum(num_cust), sum(sum(num_cust)) over (partition by year_num) --WORKS!! from foo group by year_num, age_bucket order by 1, 2; 

Why? Well, the window function does not perform aggregation. The argument must be an expression that can be evaluated after group by (because it is an aggregation request). Since num_cust not a group by key, it needs an aggregation function.

Perhaps this is more understandable if you used a subquery:

 select year_num, age_bucket, sum_num_cust, sum(sum_num_cust) over (partition by year_num) from (select year_num, age_bucket, sum(num_cust) as sum_num_cust from foo group by year_num, age_bucket ) ya order by 1, 2; 

These two queries do the same. But with a subquery, it should be more obvious why you need additional aggregation.

+6
source

Source: https://habr.com/ru/post/1272408/


All Articles