Computing the median with GROUP BY in AWS Redshift

I've seen other posts about using the MEDIAN() window function in Redshift, but how would you use it in a query that has a GROUP BY at the end?

For example, suppose the course table is:

    Course | Subject | Num_Students
    -------------------------------
    1      | Math    | 4
    2      | Math    | 6
    3      | Math    | 10
    4      | Science | 2
    5      | Science | 10
    6      | Science | 12
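To try the queries in this thread, a table matching the sample data can be created as below. This is a minimal sketch: the table name course is taken from the queries in the question, and the column types are assumptions, since the question does not show the DDL.

```sql
-- Hypothetical setup reproducing the question's sample data.
-- Column types are assumptions; the original DDL is not given.
CREATE TABLE course (
    course       INTEGER,
    subject      VARCHAR(20),
    num_students INTEGER
);

-- Multi-row INSERT with the six sample rows.
INSERT INTO course (course, subject, num_students) VALUES
    (1, 'Math',     4),
    (2, 'Math',     6),
    (3, 'Math',    10),
    (4, 'Science',  2),
    (5, 'Science', 10),
    (6, 'Science', 12);
```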

I want to get the median number of students for each subject. How do I write a query that gives the following result?

    Subject | Median
    ----------------
    Math    | 6
    Science | 10

I tried:

    SELECT subject, median(num_students) over ()
    FROM course
    GROUP BY 1;

But it lists every row of each subject, all with the same median, like this (this is made-up data, so the actual value returned is not 6; the point is that it shows the same value for every row):

    Subject | Median
    ----------------
    Math    | 6
    Math    | 6
    Math    | 6
    Science | 6
    Science | 6
    Science | 6
4 answers

You just need to remove the "over ()" part.

    SELECT subject, median(num_students)
    FROM course
    GROUP BY 1;
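With the sample data, the aggregate form returns one row per subject: a median of 6 for Math (values 4, 6, 10) and 10 for Science (values 2, 10, 12). Below is a sketch of the same query with a column alias and an ORDER BY added; both are additions, not part of the answer, and the ORDER BY is there because GROUP BY alone does not guarantee row order.

```sql
-- MEDIAN used as an aggregate: one output row per GROUP BY key.
SELECT subject,
       MEDIAN(num_students) AS median_students
FROM course
GROUP BY subject
ORDER BY subject;
-- Math    -> 6   (middle value of 4, 6, 10)
-- Science -> 10  (middle value of 2, 10, 12)
```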

The query below gives exactly the result you are looking for:

    SELECT DISTINCT subject, median(num_students) over (partition by subject)
    FROM course
    ORDER BY subject;

You have not defined a partition for the window. Instead of OVER () you need OVER (PARTITION BY subject).
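To see what the partitioned window function does on its own, a query like the following (a sketch against the sample table) keeps every input row and attaches its subject's median to it. Because a window function never collapses rows, a DISTINCT or a GROUP BY is still needed to get one row per subject.

```sql
-- Window form: MEDIAN is computed per subject partition,
-- but every input row is still returned.
SELECT course,
       subject,
       num_students,
       MEDIAN(num_students) OVER (PARTITION BY subject) AS median_students
FROM course
ORDER BY subject, course;
-- Courses 1-3 (Math) all carry median_students = 6;
-- courses 4-6 (Science) all carry median_students = 10.
```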


If you also want to calculate other aggregates by subject, for example avg(), you need to use a subquery:

    WITH subject_numstudents_medianstudents AS (
        SELECT subject,
               num_students,
               median(num_students) over (partition BY subject) AS median_students
        FROM course
    )
    SELECT subject,
           median_students,
           avg(num_students) AS avg_students
    FROM subject_numstudents_medianstudents
    GROUP BY 1, 2;
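If the aggregate form of MEDIAN from the first answer is used instead of the window form, the other aggregates can likely sit in the same GROUP BY without a CTE. A sketch, assuming the sample table; the DECIMAL cast is an addition, included because AVG over an integer column may return a truncated integer result in Redshift.

```sql
-- Aggregate MEDIAN alongside another aggregate in one GROUP BY.
-- The DECIMAL(10, 2) cast is an assumption to avoid integer averaging.
SELECT subject,
       MEDIAN(num_students)              AS median_students,
       AVG(num_students::DECIMAL(10, 2)) AS avg_students
FROM course
GROUP BY subject
ORDER BY subject;
```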

Source: https://habr.com/ru/post/982460/

