Calculate medians for multiple columns in one table in one query

StackOverflow to the rescue!, I need to find medians for five columns at once, in one query.

The median calculations below work for individual columns, but when combined, multiple use of "rownum" discards the query. How can I update this to work for multiple columns? THANKS. This is to create a web-based tool where nonprofits can compare their financial performance with user-defined peer groups.

SELECT t1_wages.totalwages_pctoftotexp AS median_totalwages_pctoftotexp FROM ( SELECT @rownum := @rownum +1 AS `row_number` , d_wages.totalwages_pctoftotexp FROM data_990_c3 d_wages, ( SELECT @rownum :=0 )r_wages WHERE totalwages_pctoftotexp >0 ORDER BY d_wages.totalwages_pctoftotexp ) AS t1_wages, ( SELECT COUNT( * ) AS total_rows FROM data_990_c3 d_wages WHERE totalwages_pctoftotexp >0 ) AS t2_wages WHERE 1 AND t1_wages.row_number = FLOOR( total_rows /2 ) +1 --- [that was one median, below is another] --- SELECT t1_solvent.solvent_days AS median_solvent_days FROM ( SELECT @rownum := @rownum +1 AS `row_number` , d_solvent.solvent_days FROM data_990_c3 d_solvent, ( SELECT @rownum :=0 )r_solvent WHERE solvent_days >0 ORDER BY d_solvent.solvent_days ) AS t1_solvent, ( SELECT COUNT( * ) AS total_rows FROM data_990_c3 d_solvent WHERE solvent_days >0 ) AS t2_solvent WHERE 1 AND t1_solvent.row_number = FLOOR( total_rows /2 ) +1 

[these two are only five, I will ultimately need to find medians right away)

+4
source share
2 answers

Such a thing is a big neck pain in MySQL. It might be wise to use the free edition of Oracle Express or postgreSQL if you are going to do the tonnage of this statistical ranking job. They all have MEDIAN(value) aggregate functions that are either built-in or available as extensions. Here a little sqlfiddle demonstrates this. http://sqlfiddle.com/#!4/53de8/6/0

But you did not ask about it.

In MySQL, the main problem is the scope of variables like @rownum. You also have a rotation problem: i.e. You need to turn the rows of your query into columns.

First solve the problem with the prop. What you are going to do is create a union of several large thick queries. For instance:

 SELECT 'median_wages' AS tag, wages AS value FROM (big fat query making median wages) A UNION SELECT 'median_volunteer_hours' AS tag, hours AS value FROM (big fat query making median volunteer hours) B UNION SELECT 'median_solvent_days' AS tag, days AS value FROM (big fat query making median solvency days) C 

So, here are your results in a table of tag / value pairs. You can rotate this table to get one row with a value in each column.

 SELECT SUM( CASE tag WHEN 'median_wages' THEN value ELSE 0 END ) AS median_wages, SELECT SUM( CASE tag WHEN 'median_volunteer_hours' THEN value ELSE 0 END ) AS median_volunteer_hours, SELECT SUM( CASE tag WHEN 'median_solvent_days' THEN value ELSE 0 END ) AS median_solvent_days FROM ( /* the above gigantic UNION query */ ) Q 

The way you rotate rows (from a UNION query in this case) into columns. Here is a tutorial on this topic. http://www.artfulsoftware.com/infotree/qrytip.php?id=523

Now we need to tackle the subqueries of median computing. The code in your question looks pretty good. I have no data, so it’s hard for me to evaluate them.

But you need to avoid reusing the @rownum variable. Call it @ rownum1 in one of your queries, @ rownum2 in the next, and so on. Here dinky sql fiddle does only one of them. http://sqlfiddle.com/#!2/2f770/1/0

Now let's tweak it a bit by making two different medians. Here is the fiddle http://sqlfiddle.com/#!2/2f770/2/0 and here is the UNION request. Note that the second half of the join request uses @rownum2 instead of @rownum .

Finally, here is the complete request with a twist. http://sqlfiddle.com/#!2/2f770/13/0

  SELECT SUM( CASE tag WHEN 'Boston' THEN value ELSE 0 END ) AS Boston, SUM( CASE tag WHEN 'Bronx' THEN value ELSE 0 END ) AS Bronx FROM ( SELECT 'Boston' AS tag, pop AS VALUE FROM ( SELECT @rownum := @rownum +1 AS `row_number` , pop FROM pops, (SELECT @rownum :=0)r WHERE pop >0 AND city = 'Boston' ORDER BY pop ) AS ordered_rows, ( SELECT COUNT( * ) AS total_rows FROM pops WHERE pop >0 AND city = 'Boston' ) AS rowcount WHERE ordered_rows.row_number = FLOOR( total_rows /2 ) +1 UNION ALL SELECT 'Bronx' AS tag, pop AS VALUE FROM ( SELECT @rownum2 := @rownum2 +1 AS `row_number` , pop FROM pops, (SELECT @rownum2 :=0)r WHERE pop >0 AND city = 'Bronx' ORDER BY pop ) AS ordered_rows, ( SELECT COUNT( * ) AS total_rows FROM pops WHERE pop >0 AND city = 'Bronx' ) AS rowcount WHERE ordered_rows.row_number = FLOOR( total_rows /2 ) +1 ) D 

These are just two medians. You need five. I think it's easy to make this median computation absurdly hard to do in MySQL in a single query.

+2
source

Suppose you have a table with three columns, such as table (key, value1, value2).

this query gives the median value of two columns of values ​​for each key:

 SELECT key, ((array_agg(value1 order by value1 asc) )[floor( (count(*)+1)::float/2)] + (array_agg(value1 order by value1 asc) )[ceiling( (count(*)+1)::float/2) ] )/2, ((array_agg(value2 order by value2 asc) )[floor( (count(*)+1)::float/2)] + (array_agg(value2 order by value2 asc) )[ceiling( (count(*)+1)::float/2) ] )/2 FROM table GROUP BY key 
0
source

Source: https://habr.com/ru/post/1489485/


All Articles