How can I return the boxplot numeric data for all results using a single MySQL query?

[tbl_votes]

  • id: unique id of the vote
  • item_id: the item (by id) that the vote belongs to
  • vote: number 1-10

Of course, we can solve this by getting:

  • smallest observation (so)
  • lower quartile (lq)
  • median (me)
  • upper quartile (uq)
  • and largest observation (lo)

.. one by one using multiple queries, but I wonder if this can be done with a single query.
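As a point of reference, the five values can be sketched in plain Python, outside the database. This is only an illustration: the quartile convention (elements at positions n // 4, n // 2 and 3 * n // 4 of the sorted list, no interpolation), the function names and the sample rows are assumptions, not something the question specifies.

```python
from collections import defaultdict


def five_number_summary(values):
    """Return (so, lq, me, uq, lo) for a non-empty list of numbers.

    Assumption: quartiles are the elements at positions n // 4, n // 2
    and 3 * n // 4 of the sorted list (no interpolation).
    """
    ordered = sorted(values)
    n = len(ordered)
    return (ordered[0], ordered[n // 4], ordered[n // 2],
            ordered[3 * n // 4], ordered[-1])


def summaries_by_item(rows):
    """Group (item_id, vote) pairs and summarise each item's votes."""
    votes = defaultdict(list)
    for item_id, vote in rows:
        votes[item_id].append(vote)
    return {item_id: five_number_summary(v) for item_id, v in votes.items()}


# Hypothetical sample of (item_id, vote) pairs shaped like tbl_votes.
sample_rows = [(1, 3), (1, 7), (1, 5), (1, 9), (1, 1), (2, 10), (2, 4)]
print(summaries_by_item(sample_rows))
```

The goal of the question is to get the same tuple per item_id directly from the database instead of post-processing rows like this.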

In Oracle, I can use COUNT OVER and RATIO_TO_REPORT, but these are not supported in MySQL.

For those who do not know what a boxplot is: http://en.wikipedia.org/wiki/Box_plot

Any help would be greatly appreciated.

+6
3 answers

Below is an example of calculating quartiles for e256 values within e32 groups. An index on (e32, e256) is required in this case:

    SELECT
        @group := IF(e32 = @group, e32, GREATEST(@index := -1, e32)) as e32_,
        MIN(e256) as so,
        MAX(IF(lq_i = (@index := @index + 1), e256, NULL)) as lq,
        MAX(IF(me_i = @index, e256, NULL)) as me,
        MAX(IF(uq_i = @index, e256, NULL)) as uq,
        MAX(e256) as lo
    FROM (SELECT @index := NULL, @group := NULL) as init, test t
    JOIN (
        SELECT e32,
               COUNT(*) as cnt,
               (COUNT(*) div 4) as lq_i,     -- lq value index within the group
               (COUNT(*) div 2) as me_i,     -- me value index within the group
               (COUNT(*) * 3 div 4) as uq_i  -- uq value index within the group
        FROM test
        GROUP BY e32
    ) as cnts USING (e32)
    GROUP BY e32;

If no groups are needed, the query becomes a little simpler.

P.S. test is my playground table of random values, where e32 is the result of Python int(random.expovariate(1.0) * 32), etc.
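For anyone who wants similar data to experiment with, rows for such a table could be generated roughly like this. It is a sketch under assumptions: the answer only gives the formula for e32, so generating e256 the same way with a factor of 256, the fixed seed and the name make_playground_rows are all guesses for illustration.

```python
import random


def make_playground_rows(n, seed=0):
    """Generate n (e32, e256) rows of exponentially distributed integers.

    Assumption: e256 is produced like e32 but scaled by 256.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [
        (int(rng.expovariate(1.0) * 32), int(rng.expovariate(1.0) * 256))
        for _ in range(n)
    ]


rows = make_playground_rows(1000)
print(rows[:5])
```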

0

I found a solution for PostgreSQL using PL/Python.

However, I am leaving the question open in case someone else comes up with a solution in MySQL.

    CREATE TYPE boxplot_values AS (
        min     numeric,
        q1      numeric,
        median  numeric,
        q3      numeric,
        max     numeric
    );

    CREATE OR REPLACE FUNCTION _final_boxplot(strarr numeric[])
        RETURNS boxplot_values AS
    $$
        x = strarr.replace("{","[").replace("}","]")
        a = eval(str(x))
        a.sort()
        i = len(a)
        return ( a[0], a[i/4], a[i/2], a[i*3/4], a[-1] )
    $$
    LANGUAGE 'plpythonu' IMMUTABLE;

    CREATE AGGREGATE boxplot(numeric) (
        SFUNC=array_append,
        STYPE=numeric[],
        FINALFUNC=_final_boxplot,
        INITCOND='{}'
    );

Example:

    SELECT customer_id as cid, (boxplot(price)).* FROM orders GROUP BY customer_id;

      cid  |   min   |   q1    | median  |   q3    |   max
    -------+---------+---------+---------+---------+---------
      1001 | 7.40209 | 7.80031 |  7.9551 | 7.99059 | 7.99903
      1002 | 3.44229 | 4.38172 | 4.72498 | 5.25214 | 5.98736
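The selection logic of the final function can be tried outside the database with a small standalone sketch. It assumes, as the function above does, that the array arrives as a text literal such as '{3,1,2}'. Two things are swapped in here and are not part of the original: ast.literal_eval in place of eval, and // in place of /, so that the indices stay integers under Python 3 as well. The name final_boxplot and the sample literal are hypothetical.

```python
import ast


def final_boxplot(array_literal):
    """Pick (min, q1, median, q3, max) from a literal such as '{3,1,2}'."""
    # Turn the brace-delimited literal into Python list syntax and parse it
    # with ast.literal_eval, which accepts literals only (unlike eval).
    values = ast.literal_eval(
        array_literal.replace("{", "[").replace("}", "]")
    )
    values.sort()
    n = len(values)
    # // is floor division in Python 2 and 3; a plain / would give a float
    # index under Python 3 and raise a TypeError.
    return (values[0], values[n // 4], values[n // 2],
            values[n * 3 // 4], values[-1])


print(final_boxplot("{7.4, 7.99, 7.8, 7.95, 7.999}"))
```

Like the original, this sketch does not handle an empty array.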

Source: http://www.christian-rossow.de/articles/PostgreSQL_boxplot_median_quartiles_aggregate_function.php

+2

Well, I can do this in two queries. Run the first query to get the quartile positions, then use those positions as LIMIT offsets in the second query to fetch the values.

mysql> select (select floor(count(*)/4)) as first_q, (select floor(count(*)/2) from customer_data) as mid_pos, (select floor(count(*)/4*3) from customer_data) as third_q from customer_data order by measure limit 1;

mysql> select min(measure), (select measure from customer_data order by measure limit 0,1) as firstq, (select measure from customer_data order by measure limit 5,1) as median, (select measure from customer_data order by measure limit 8,1) as last_q, max(measure) from customer_data;
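The two-step idea can be sketched end to end in Python. To keep the example runnable without a server it uses the standard-library sqlite3 module as a stand-in for MySQL, which is an assumption of this sketch and not part of the answer. The function name boxplot_two_step and the sample values are hypothetical. Integer division in SQLite plays the role of floor() here, since the counts are non-negative.

```python
import sqlite3


def boxplot_two_step(conn):
    """Return (min, first_q, median, third_q, max) of customer_data.measure."""
    cur = conn.cursor()

    # Step 1: quartile positions (integer division truncates like FLOOR
    # for non-negative counts).
    first_q, mid_pos, third_q = cur.execute(
        "SELECT COUNT(*) / 4, COUNT(*) / 2, COUNT(*) * 3 / 4 FROM customer_data"
    ).fetchone()

    # Step 2: fetch the value at each position with LIMIT 1 OFFSET pos.
    def value_at(pos):
        row = cur.execute(
            "SELECT measure FROM customer_data ORDER BY measure LIMIT 1 OFFSET ?",
            (pos,),
        ).fetchone()
        return row[0]

    lowest, highest = cur.execute(
        "SELECT MIN(measure), MAX(measure) FROM customer_data"
    ).fetchone()
    return (lowest, value_at(first_q), value_at(mid_pos),
            value_at(third_q), highest)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_data (measure INTEGER)")
conn.executemany(
    "INSERT INTO customer_data VALUES (?)",
    [(v,) for v in (50, 10, 40, 20, 30, 60, 110, 90, 70, 80, 100)],
)
result = boxplot_two_step(conn)
print(result)
```

With the eleven sample rows the positions come out as 2, 5 and 8. In MySQL the positions from the first query would be written into the second query as literal LIMIT offsets, as in the statements above.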

0

Source: https://habr.com/ru/post/904505/

