Python Pandas - how the 25th percentile is calculated by the describe function

For a given dataset in a DataFrame, when I apply the describe function, I get basic statistics that include min, max, 25%, 50%, and so on.

For instance:

import pandas as pd

data_1 = pd.DataFrame({'One': [4, 6, 8, 10]}, columns=['One'])
data_1.describe()

Output:

        One
count   4.000000
mean    7.000000
std     2.581989
min     4.000000
25%     5.500000
50%     7.000000
75%     8.500000
max     10.000000

My question is: what is the mathematical formula for calculating the 25% value?

1) Based on what I know, this is:

formula = percentile * n (n is number of values)

In this case:

25/100 * 4 = 1

So the value at the first position is 4, but according to the describe function it is 5.5.

2) Another example says: if you get a whole number, take the average of 4 and 6, which is 5. That still does not match the 5.5 given by describe.

3) Another tutorial says: take the difference between the two numbers, multiply it by 25%, and add the result to the lower number:

25/100 * (6-4) = 1/4*2 = 0.5

Adding that to the lower number: 4 + 0.5 = 4.5

That is still not 5.5.

Can someone clarify?


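The pandas documentation describes how percentiles are computed and points to numpy.percentile:

Return value at the given quantile, a la numpy.percentile.

The numpy.percentile documentation, in turn, says that the default interpolation method is linear:

linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j

The index is q * (n - 1), counted from 0 over the sorted values. For the 25th percentile it is 0.25 * 3 = 0.75, so i = 4, j = 6 and fraction = 3/4:

res_25 = 4 + (6-4)*(3/4) = 5.5

For the 75th percentile it is 0.75 * 3 = 2.25, so i = 8, j = 10 and fraction = 1/4:

res_75 = 8 + (10-8)*(1/4) = 8.5

If you set the interpolation method to "midpoint", you get the result you expected in your second attempt: (4 + 6) / 2 = 5.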
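The linear rule, and the midpoint variant, can be sketched by hand in plain Python. This is only an illustration of the formula as quoted, not the pandas or NumPy implementation; the function name quantile_by_hand and its method argument are made up for this example.

```python
import math

def quantile_by_hand(values, q, method="linear"):
    """Hypothetical helper: quantile q (between 0 and 1) of a small list."""
    data = sorted(values)
    pos = q * (len(data) - 1)   # fractional index, counted from 0
    i = data[math.floor(pos)]   # lower neighbour
    j = data[math.ceil(pos)]    # upper neighbour
    fraction = pos - math.floor(pos)
    if method == "linear":
        return i + (j - i) * fraction
    if method == "midpoint":
        return (i + j) / 2
    raise ValueError(f"unsupported method: {method}")

data = [4, 6, 8, 10]
q25 = quantile_by_hand(data, 0.25)              # 4 + (6 - 4) * 0.75
q75 = quantile_by_hand(data, 0.75)              # 8 + (10 - 8) * 0.25
q25_mid = quantile_by_hand(data, 0.25, method="midpoint")
print(q25, q75, q25_mid)                        # 5.5 8.5 5.0
```

The linear results agree with the 25% and 75% rows of the describe output above, while the midpoint result is the 5 from the second attempt in the question.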


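I think it is easier to understand this calculation as min + (max-min) * percentile. For this data it gives the same result as the function described in NumPy:

linear: i + (j - i) * fraction, where fraction is the fractional part of the index surrounded by i and j

res_25 = 4+(10-4)*percentile = 4+(10-4)*25% = 5.5
res_75 = 4+(10-4)*percentile = 4+(10-4)*75% = 8.5

Note that this shortcut matches only because the values 4, 6, 8, 10 are evenly spaced; for unevenly spaced data it generally differs from NumPy's result.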
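One way to check the min + (max-min) * percentile shortcut is to compare it with np.percentile on both evenly and unevenly spaced data. A rough sketch: the helper range_shortcut and the uneven list are invented for illustration, and np.percentile is called with its default (linear) settings.

```python
import numpy as np

def range_shortcut(values, q):
    """Hypothetical helper: min + (max - min) * q, with q between 0 and 1."""
    lo, hi = min(values), max(values)
    return lo + (hi - lo) * q

even = [4, 6, 8, 10]    # evenly spaced, as in the question
uneven = [1, 2, 3, 10]  # made-up data with one large gap

even_shortcut = range_shortcut(even, 0.25)
even_numpy = float(np.percentile(even, 25))
uneven_shortcut = range_shortcut(uneven, 0.25)
uneven_numpy = float(np.percentile(uneven, 25))

print(even_shortcut, even_numpy)      # both 5.5
print(uneven_shortcut, uneven_numpy)  # 3.25 versus 1.75
```

On the evenly spaced data the two agree, but with the large gap the shortcut overshoots, so it is better treated as a mental aid for this dataset than as the general formula.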

Source: https://habr.com/ru/post/1655035/

