Calculating the median in T-SQL

For an even number of rows, the median is the mean of the two middle values: (104.5 + 108) / 2 for the six-row table below. For an odd number of rows, it is the middle value: 108 for the seven-row table below:

    Total    Total
    100      100
    101      101
    104.5    104.5
    108      108
    108.3    108.3
    112      112
             114

I wrote this query and it calculates the correct median when the number of rows is odd:

    WITH a AS (
        SELECT Total,
               ROW_NUMBER() OVER (ORDER BY CAST(Total AS FLOAT) ASC) rownumber
        FROM [Table] A
    ),
    b AS (
        SELECT TOP 2 Total, isodd
        FROM (
            SELECT TOP 50 PERCENT Total, rownumber % 2 isodd
            FROM a
            ORDER BY CAST(Total AS FLOAT) ASC
        ) a
        ORDER BY CAST(total AS FLOAT) DESC
    )
    SELECT * FROM b

What is the general T-SQL query to find the median in both situations, that is, when the number of rows is odd and also when it is even?

Can my query be adjusted so that it returns the median for both an even and an odd number of rows?

+6
5 answers

This method does not work. I cannot delete my answer because it was accepted. Do not use this approach.

    select avg(Total) median
    from (
        select Total,
               rnasc = row_number() over(order by Total),
               rndesc = row_number() over(order by Total desc)
        from [Table]
    ) b
    where rnasc between rndesc - 1 and rndesc + 1
+5

I wrote a blog post about mean, median and mode a couple of years ago. I recommend you read it.

Compute Mean, Median, and Mode with SQL Server

    SELECT ((
            SELECT TOP 1 Total
            FROM (
                SELECT TOP 50 PERCENT Total
                FROM [TABLE] A
                WHERE Total IS NOT NULL
                ORDER BY Total
            ) AS A
            ORDER BY Total DESC
        ) + (
            SELECT TOP 1 Total
            FROM (
                SELECT TOP 50 PERCENT Total
                FROM [TABLE] A
                WHERE Total IS NOT NULL
                ORDER BY Total DESC
            ) AS A
            ORDER BY Total ASC
        )) / 2
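The query above relies on SQL Server rounding a fractional row count up when TOP ... PERCENT is used, so with an odd number of rows both halves include the middle row. A minimal sketch that exposes the two middle values separately, assuming the seven-row example from the question; the table variable @t and the aliases lower_half, upper_half, lower_middle and upper_middle are hypothetical names chosen for illustration:

```sql
-- Hypothetical table variable holding the seven-row example from the question.
DECLARE @t TABLE (Total DECIMAL(10, 1));
INSERT INTO @t (Total)
VALUES (100), (101), (104.5), (108), (108.3), (112), (114);

SELECT
    -- Largest value of the lower half (TOP 50 PERCENT of 7 rows rounds up to 4 rows).
    (SELECT MAX(Total)
     FROM (SELECT TOP 50 PERCENT Total
           FROM @t
           WHERE Total IS NOT NULL
           ORDER BY Total ASC) AS lower_half) AS lower_middle,
    -- Smallest value of the upper half (again 4 rows, taken from the top end).
    (SELECT MIN(Total)
     FROM (SELECT TOP 50 PERCENT Total
           FROM @t
           WHERE Total IS NOT NULL
           ORDER BY Total DESC) AS upper_half) AS upper_middle;
```

With seven rows each half holds four rows, so both columns should return 108 and their average is the middle value. After removing the 114 row each half holds three rows, the columns become 104.5 and 108, and their average is the mean of the two middle values.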
+8

I know that you were looking for a solution that works with SQL Server 2008, but if someone is looking for the MEDIAN() aggregate function in SQL Server 2012, they can emulate it with the PERCENTILE_CONT() inverse distribution function:

    WITH t(value) AS (
        SELECT 1 UNION ALL
        SELECT 2 UNION ALL
        SELECT 100
    )
    SELECT DISTINCT
        percentile_cont(0.5) WITHIN GROUP (ORDER BY value) OVER (PARTITION BY 1)
    FROM t;

This emulation of MEDIAN() through PERCENTILE_CONT() is also described here. Unfortunately, SQL Server supports this function only as a window function, not as a regular ordered-set aggregate function as Oracle and PostgreSQL do.
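Applied to the data from the question, the same idea might look like the sketch below. It assumes SQL Server 2012 or later; the table variable @t and the column alias median are hypothetical names used only for this illustration:

```sql
-- Hypothetical table variable holding the six-row (even) example from the question.
DECLARE @t TABLE (Total DECIMAL(10, 1));
INSERT INTO @t (Total)
VALUES (100), (101), (104.5), (108), (108.3), (112);

-- PERCENTILE_CONT is a window function here, so it returns one value per row;
-- DISTINCT collapses the identical values into a single result row.
SELECT DISTINCT
    PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY Total) OVER () AS median
FROM @t;
```

For these six rows PERCENTILE_CONT(0.5) interpolates between the two middle values 104.5 and 108, giving 106.25. Adding the seventh row (114) should give 108, the middle value.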

+8

t-clausen's answer unfortunately does not work correctly when the list contains many duplicate values. In that case the row numbers produced by the two separate OVER clauses are not predictable enough for the query to work.

In my case, the following query worked well:

    WITH SortedTable AS (
        SELECT Total,
               rnasc,
               rndesc = ROW_NUMBER() OVER(ORDER BY rnasc DESC)
        FROM (
            SELECT Total,
                   rnasc = ROW_NUMBER() OVER(ORDER BY Total)
            FROM [Table]
        ) SourceTable
    )
    SELECT DISTINCT AVG(Total) median
    FROM SortedTable
    WHERE rnasc = rndesc OR ABS(rnasc-rndesc) = 1

The WHERE clause now also clearly distinguishes between an even and an odd number of records.

+3

An example for the problem mentioned in my comment on the accepted answer:

    select avg(Total) median
    from (
        select Total,
               rnasc = row_number() over(order by Total),
               rndesc = row_number() over(order by Total desc)
        from [Table]
    ) b
    where rnasc between rndesc - 1 and rndesc + 1

This snippet is not guaranteed to work if the input dataset contains duplicate values, because row_number() may then not assign the expected numbers.

For example, for this input:

    DROP TABLE #b
    CREATE TABLE #b (id INT IDENTITY, Total INT)
    INSERT INTO #b
    SELECT 1 UNION ALL
    SELECT 1 UNION ALL
    SELECT 5 UNION ALL
    SELECT 5 UNION ALL
    SELECT 5

The inner query returns the following (I think this may differ between servers):

    Total  rnasc  rndesc
    5      3      1
    5      4      2
    5      5      3
    1      1      4
    1      2      5

Running the outer query then returns NULL, since there is no row where rnasc is between rndesc - 1 and rndesc + 1.

A simple solution is to add a surrogate key to the dataset (I used the identity column) and include that column in the OVER() clauses:

    SELECT avg(Total) median
    from (
        SELECT Total,
               rnasc = row_number() over(order by Total, id),
               rndesc = row_number() over(order by Total DESC, id desc)
        from #b
    ) b
    WHERE rnasc between rndesc - 1 and rndesc + 1

Now the sort order is deterministic and the inner query returns:

    Total  rnasc  rndesc
    5      5      1
    5      4      2
    5      3      3
    1      2      4
    1      1      5

And the result is correct :)
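One more detail may matter when Total is an INT column, as it is in #b: AVG over an integer column returns an integer in SQL Server, so with an even number of rows the mean of the two middle values can be truncated. A minimal sketch of the tie-broken query with that adjustment; the temp table #m, its sample values and the aliases int_median and decimal_median are hypothetical and only serve as an illustration:

```sql
-- Hypothetical temp table; id is the surrogate key used as a tie-breaker.
CREATE TABLE #m (id INT IDENTITY, Total INT);
INSERT INTO #m (Total) VALUES (1), (1), (2), (5);

SELECT AVG(Total)       AS int_median,      -- AVG over INT returns INT, so the result is truncated
       AVG(1.0 * Total) AS decimal_median   -- multiplying by 1.0 promotes the values to a decimal type
FROM (
    SELECT Total,
           rnasc  = ROW_NUMBER() OVER (ORDER BY Total, id),
           rndesc = ROW_NUMBER() OVER (ORDER BY Total DESC, id DESC)
    FROM #m
) AS b
WHERE rnasc BETWEEN rndesc - 1 AND rndesc + 1;

DROP TABLE #m;
```

Here the two middle values are 1 and 2, so int_median comes out as 1 while decimal_median is 1.5.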

+3

Source: https://habr.com/ru/post/896107/

