SQL Server NTILE - same value in different quartiles

I have a scenario where I split a set of results into groups using the SQL Server NTILE function, as shown below. The goal is to have an equal number of rows in each class.

    case NTILE(4) over (order by t2.TotalStd)
        when 1 then 'A'
        when 2 then 'B'
        when 3 then 'C'
        else 'D'
    end as Class

The table of results is given below, and there is a division (9,9,8,8) between the four groups of classes A, B, C and D.

Two results cause me a problem: both rows have the same TotalStd value of 30, but they are assigned to different quartiles.

    8    30    A
    2    30    B

I am wondering if there is a way to guarantee that rows with the same value are assigned to the same quartile? Can I group or split with another column to get this behavior?

    Pos  TotalStd  Class
    1    16        A
    2    23        A
    3    21        A
    4    29        A
    5    25        A
    6    26        A
    7    28        A
    8    30        A
    9    29        A
    1    31        B
    2    30        B
    3    32        B
    4    32        B
    5    34        B
    6    32        B
    7    34        B
    8    32        B
    9    33        B
    1    36        C
    2    35        C
    3    35        C
    4    35        C
    5    40        C
    6    38        C
    7    41        C
    8    43        C
    1    43        D
    2    48        D
    3    45        D
    4    47        D
    5    44        D
    6    48        D
    7    46        D
    8    57        D
4 answers

I'm not sure what you expect here. As you asked, SQL Server divided the data into 4 groups that are as equal in size as possible. What would you like it to do instead? Take a look at this example:

    declare @data table ( x int )

    insert @data values (1),(2), (2),(3), (3),(4), (4),(5)

    select x, NTILE(4) over (order by x) as ntile
    from @data

Results:

    x           ntile
    ----------- ----------
    1           1
    2           1
    2           2
    3           2
    3           3
    4           3
    4           4
    5           4

Now every ntile group shares a value with the group(s) next to it! But what else should it do?


Try the following:

    ; with a as (
        select TotalStd,
               Class = case ntile(4) over (order by TotalStd)
                           when 1 then 'A'
                           when 2 then 'B'
                           when 3 then 'C'
                           when 4 then 'D'
                       end
        from t2
        group by TotalStd
    )
    select d.*, a.Class
    from t2 d
    inner join a on a.TotalStd = d.TotalStd
    order by Class, Pos;

You will need to build your own NTILE using the RANK function. RANK gives the same rank to rows with the same value. After a tie, the rank "jumps" to the value that ROW_NUMBER would have given. We can use this behavior to mimic NTILE while forcing it to give the same NTILE value to rows with the same value. However, this will cause the NTILE buckets to have different sizes. See the example below for a new NTILE using 4 bins:

    declare @data table ( x int )

    insert @data values (1),(2), (2),(3), (3),(4), (4),(5)

    select x,
           1 + (rank() over (order by x) - 1) * 4 / count(1) over (partition by (select 1)) as new_ntile
    from @data

Results:

    x           new_ntile
    ----------- ----------
    1           1
    2           1
    2           1
    3           2
    3           2
    4           3
    4           3
    5           4
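To connect this back to the question, here is a minimal sketch that maps the rank-based bin to the class letters A to D. The table variable `@scores` and its sample values are assumptions made up for illustration, and `COUNT(*) OVER ()` is used in place of `count(1) over (partition by (select 1))`, which should behave the same way.

```sql
-- Hypothetical sample data standing in for the asker's t2 result set.
DECLARE @scores TABLE (TotalStd INT);

INSERT @scores (TotalStd)
VALUES (16), (29), (29), (30), (30), (31), (35), (43), (43), (57);

SELECT
    TotalStd,
    -- RANK() gives tied rows the same rank, so tied rows always get the same bin.
    -- Integer division maps ranks 1..N onto bins 1..4.
    CASE 1 + (RANK() OVER (ORDER BY TotalStd) - 1) * 4 / COUNT(*) OVER ()
        WHEN 1 THEN 'A'
        WHEN 2 THEN 'B'
        WHEN 3 THEN 'C'
        ELSE 'D'
    END AS Class
FROM @scores
ORDER BY TotalStd;
```

The trade-off is the one described above: rows with equal TotalStd stay together, but the classes are no longer guaranteed to be the same size.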

Here we have a table of 34 rows.

    DECLARE @x TABLE (TotalStd INT)

    INSERT @x (TotalStd)
    VALUES (16), (21), (23), (25), (26), (28), (29), (29), (30), (30),
           (31), (32), (32), (32), (32), (33), (34), (34), (35), (35),
           (35), (36), (38), (40), (41), (43), (43), (44), (45), (46),
           (47), (48), (48), (57)

    SELECT '@x', TotalStd FROM @x ORDER BY TotalStd

We want to divide the rows into quartiles. If we use NTILE, the buckets will be about the same size (8 to 9 rows each), but ties will be broken arbitrarily:

 SELECT '@x with NTILE', TotalStd, NTILE(4) OVER (ORDER BY TotalStd) quantile FROM @x 

See how 30 appears twice: once in quantile 1 and once in quantile 2. Similarly, 43 appears in both quantiles 3 and 4.
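As a quick way to spot such splits, the following sketch lists every value that NTILE spreads over more than one quantile. It declares its own small `@x` so that it runs as a standalone batch; the sample values are an assumption chosen for illustration.

```sql
-- Small hypothetical sample; with 8 rows, NTILE(4) makes buckets of 2 rows each.
DECLARE @x TABLE (TotalStd INT);

INSERT @x (TotalStd)
VALUES (29), (30), (30), (31), (43), (43), (44), (57);

-- Report each value whose rows fall into more than one quantile.
SELECT
    t.TotalStd,
    MIN(t.quantile) AS first_quantile,
    MAX(t.quantile) AS last_quantile
FROM (
    SELECT TotalStd, NTILE(4) OVER (ORDER BY TotalStd) AS quantile
    FROM @x
) AS t
GROUP BY t.TotalStd
HAVING MIN(t.quantile) <> MAX(t.quantile);
```

An empty result means no ties straddle a bucket boundary, in which case plain NTILE is already sufficient.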

What I want to end up with is 8 elements in quantile 1, 10 in quantile 2, 7 in quantile 3 and 9 in quantile 4 (i.e. not the near-even 9-9-8-8 split that NTILE produces, but such a split is impossible if we are not allowed to break ties arbitrarily). I can do this by using NTILE to define cutoff points in a table variable:

    DECLARE @cutoffs TABLE (quantile INT, min_value INT, max_value INT)

    INSERT @cutoffs (quantile, min_value)
    SELECT y.quantile, MIN(y.TotalStd)
    FROM (SELECT TotalStd, NTILE(4) OVER (ORDER BY TotalStd) AS quantile FROM @x) y
    GROUP BY y.quantile

    -- The max value of each quantile is the min value of the next quantile
    UPDATE c1
    SET c1.max_value = ISNULL(c2.min_value, (SELECT MAX(TotalStd) + 1 FROM @x))
    FROM @cutoffs c1
    LEFT OUTER JOIN @cutoffs c2 ON c2.quantile - 1 = c1.quantile

    SELECT '@cutoffs', * FROM @cutoffs

We will use the boundary values in the @cutoffs table to create the resulting table:

    SELECT x.TotalStd, c.quantile
    FROM @x x
    INNER JOIN @cutoffs c
        ON x.TotalStd >= c.min_value AND x.TotalStd < c.max_value
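The same assignment can probably be written as a single statement without a cutoffs table. This is an alternative sketch, not the method above: it runs NTILE first and then gives every row the highest quantile found among rows with the same value, which should match the cutoff join because each cutoff starts at a quantile's minimum value. The sample data is again an assumption for illustration.

```sql
-- Hypothetical sample data; both 30s and both 43s should end up together.
DECLARE @x TABLE (TotalStd INT);

INSERT @x (TotalStd)
VALUES (29), (30), (30), (31), (43), (43), (44), (57);

WITH tiled AS (
    -- Plain NTILE: ties may be split across adjacent quantiles.
    SELECT TotalStd, NTILE(4) OVER (ORDER BY TotalStd) AS quantile
    FROM @x
)
SELECT
    TotalStd,
    -- Move every tied row to the highest quantile that any of its ties received.
    MAX(quantile) OVER (PARTITION BY TotalStd) AS quantile
FROM tiled
ORDER BY TotalStd;
```

Using `MIN` instead of `MAX` would pull tied rows into the lower quantile instead; either choice keeps equal values together at the cost of uneven bucket sizes.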

Source: https://habr.com/ru/post/1397069/

