Calculating percentile rank using NTILE?

I need to calculate the percentile rank (1st-99th percentile) for each student, based on their score on a single test.

I am a little confused by the MSDN definition of NTILE, because it does not explicitly mention percentile rank. I need some assurance that NTILE is the right keyword to use for calculating percentile rank.

    declare @temp table ( StudentId int, Score int )

    insert into @temp
    select 1, 20
    union
    select 2, 25
    .....

    select NTILE(100) OVER (order by Score) PercentileRank
    from @temp

This looks right to me, but is it the correct way to calculate percentile rank?

+6
5 answers

NTILE is absolutely NOT the same as percentile rank. NTILE simply divides a set of data evenly into the number of groups provided (as noted by RoyiNamir above). If you plot the results of both functions, NTILE will be a perfectly straight line from 1 to n, whereas percentile rank will [usually] have some curves to it, depending on your data.

Percentile rank is much more complicated than simply dividing the rows into N groups. It takes each row's value and works out where in the distribution it lies, interpolating when necessary (which is very CPU intensive). I have an Excel sheet of 525,000 rows, and it pins my 8-core machine's CPU at 100% for 15-20 minutes just to compute the PERCENTRANK function for a single column.

This article gives the best explanation of percentile rank and how to compute it in SQL:

http://sqlmag.com/t-sql/calculate-percentiles

+3

There is a problem with your code: the NTILE distribution is not uniform. If you have 213 students, the first 13 groups will have 3 students each and the remaining 87 will have 2 students each. This is not what you would ideally want in a percentile distribution.

You might want to use RANK() or ROW_NUMBER() and then divide by the row count to get the percentile group.
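The uneven group sizes are easy to check. The following is a minimal sketch, assuming SQL Server; the table variable @students and its 213 placeholder rows are made up for illustration and are not part of the question. It uses sys.all_objects only as a convenient source of at least 213 rows.

```sql
-- Hypothetical data: 213 students with arbitrary distinct scores.
declare @students table (StudentId int, Score int);

insert into @students (StudentId, Score)
select top (213)
       row_number() over (order by (select null)),
       row_number() over (order by (select null))
from sys.all_objects;

-- Count how many rows land in each NTILE(100) group,
-- then count how many groups have each size.
select GroupSize, count(*) as NumberOfGroups
from (
    select Tile, count(*) as GroupSize
    from (
        select ntile(100) over (order by Score) as Tile
        from @students
    ) as t
    group by Tile
) as g
group by GroupSize
order by GroupSize desc;
```

With 213 rows this should report 13 groups of 3 rows and 87 groups of 2 rows (13 * 3 + 87 * 2 = 213), since NTILE assigns the larger groups first.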

+1

One way to think about it is "the percentage of students with a score lower than this."

Here is one way to get this type of percentile in SQL Server using RANK() :

    select *
         , (rank() over (order by Score) - 1.0) / (select count(*) from @temp) * 100 as PercentileRank
    from @temp

Note that this will always be less than 100% unless you round up, and you will always get 0% for the lowest value(s). It does not necessarily put the median value at 50%, nor does it interpolate, as some percentile calculations do.

Feel free to round or cast the whole expression (for example, cast(... as decimal(4,2)) ) for nicer-looking reports, or even replace - 1.0 with - 1e to force a floating point calculation.

NTILE() is not really what you are looking for in this case, because it essentially divides the row numbers of an ordered set into groups, rather than the values. It will assign a different percentile to two instances of the same value if those instances happen to straddle a crossover point between groups. You would then have to additionally group by that value and take the maximum or minimum percentile of the group in order to use NTILE() in the same way as we are using RANK() .
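As an illustration of the cast, here is a small sketch that wraps the RANK() expression from this answer in cast(... as decimal(4,2)). The sample rows are invented, and 1e0 is used as an explicit float literal in place of 1.0; both are assumptions made for the example.

```sql
-- Same shape as the question's @temp table; the scores are made up.
declare @temp table (StudentId int, Score int);
insert into @temp (StudentId, Score)
values (1, 20), (2, 25), (3, 25), (4, 30), (5, 40);

select StudentId,
       Score,
       -- 1e0 is a float literal, so the division is not done in integer arithmetic.
       -- decimal(4,2) is wide enough because the result is always below 100.
       cast((rank() over (order by Score) - 1e0)
            / (select count(*) from @temp) * 100 as decimal(4,2)) as PercentileRank
from @temp
order by Score;
```

Both students who scored 25 get the same PercentileRank here, because RANK() gives tied values the same rank.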
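The tie problem, and the grouping workaround described above, can be sketched as follows. The data is invented, and NTILE(2) is used instead of NTILE(100) only to keep the example small.

```sql
-- Made-up scores with a tie: two students scored 20.
declare @temp table (StudentId int, Score int);
insert into @temp (StudentId, Score)
values (1, 10), (2, 20), (3, 20), (4, 30);

select StudentId,
       Score,
       Tile,                                           -- raw NTILE group
       min(Tile) over (partition by Score) as TileMin, -- lowest group per score, so ties agree
       RankOrder                                       -- RANK() already gives ties the same value
from (
    select StudentId,
           Score,
           ntile(2) over (order by Score) as Tile,
           rank()   over (order by Score) as RankOrder
    from @temp
) as t
order by Score, StudentId;
```

With four rows and two groups, the boundary falls between the two students who scored 20, so they end up in different Tile values even though their scores are equal. Which of the two lands in which group is arbitrary, because the ordering does not break the tie. TileMin and RankOrder are the same for both.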

+1

Is there a typo?

 select NTILE(100) OVER (order by Score) PercentileRank from @temp 

And your script looks good. If you think something is wrong, could you explain what exactly?

0

I know this is an old thread, but there is a lot of misinformation about this topic making its way around the Internet.

NTILE is not designed for calculating percentile rank (AKA percent rank).

If you are using NTILE to calculate percent rank, you are doing it wrong. Anyone who tells you otherwise is misinformed and mistaken. If you use NTILE(100) and get the correct answer, it is purely coincidental.

Tim Lenner perfectly explained the problem.

"It will assign a different percentile to two instances of the same value if those instances happen to straddle a crossover point."

In other words, using NTILE to calculate where students rank based on their test scores can result in two students with the same test score receiving different percentile ranks. Conversely, two students with different scores can receive the same percentile rank.

For a more detailed explanation of why NTILE is the wrong tool for this job, as well as a much more efficient alternative to PERCENT_RANK, see: Nasty Fast PERCENT_RANK. http://www.sqlservercentral.com/articles/PERCENT_RANK/141532/
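For reference, here is a minimal sketch of the built-in PERCENT_RANK() window function (available in SQL Server 2012 and later), next to the same definition written by hand. This is not the optimized approach from the linked article, and the sample rows are invented for illustration.

```sql
-- Made-up scores with a tie: two students scored 25.
declare @temp table (StudentId int, Score int);
insert into @temp (StudentId, Score)
values (1, 20), (2, 25), (3, 25), (4, 30), (5, 40);

select StudentId,
       Score,
       percent_rank() over (order by Score) * 100 as PercentRankBuiltIn,
       -- Same definition written by hand: (rank - 1) / (rows - 1).
       -- nullif avoids a divide-by-zero when there is only one row;
       -- in that case this column is NULL, while percent_rank() returns 0.
       (rank() over (order by Score) - 1e0)
           / nullif(count(*) over () - 1, 0) * 100 as PercentRankManual
from @temp
order by Score;
```

Note that PERCENT_RANK() divides by the row count minus one, so the highest score gets 100, whereas the RANK() expression earlier in the thread divides by the row count and therefore always stays below 100. Tied scores receive the same value in both columns.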

0

Source: https://habr.com/ru/post/911441/

