Convert array to percentiles

I have an array that I want to convert to percentiles. For example, let's say I have a normally distributed array:

import numpy as np
import matplotlib.pyplot as plt

arr = np.random.normal(0, 1, 1000)
plt.hist(arr)


For each value in this array, I want to calculate the percentile of that value (for example, 0 is the 50th percentile of the above distribution, so 0 → 0.5). The result should be uniformly distributed, since each percentile should have equal weight.


I found np.percentile, but it works in the opposite direction: given an array and a quantile, it returns the value. I need the reverse: given an array and a value, return the quantile.

Is there a reasonably efficient way to do this?

import numpy as np
import pandas as pd
from scipy.stats import percentileofscore

# generate example data
arr = np.random.normal(0, 1, 10)

# pre-sort array
arr_sorted = sorted(arr)

# calculate percentiles using scipy func percentileofscore on each array element
s = pd.Series(arr)
percentiles = s.apply(lambda x: percentileofscore(arr_sorted, x))

Check the result:

df = pd.DataFrame({'data': s, 'percentiles': percentiles})    
df.sort_values(by='data')

       data  percentiles
3 -1.692881         10.0
8 -1.395427         20.0
7 -1.162031         30.0
6 -0.568550         40.0
9  0.047298         50.0
5  0.296661         60.0
0  0.534816         70.0
4  0.542267         80.0
1  0.584766         90.0
2  1.185000        100.0
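
Calling percentileofscore once per element rescans the array each time, which gets slow for large inputs. A possible vectorized sketch of the same idea is below. It assumes that ranking with scipy.stats.rankdata (default average-rank handling of ties) is an acceptable stand-in for percentileofscore with its default kind='rank'. The names rng and quantiles are illustrative only, not from the answer above.

```python
import numpy as np
from scipy.stats import rankdata

# example data (seeded only so the sketch is repeatable)
rng = np.random.default_rng(0)
arr = rng.normal(0, 1, 1000)

# rankdata assigns ranks 1..n (ties get their average rank);
# dividing by n maps each element to a quantile in (0, 1]
quantiles = rankdata(arr) / arr.size

# multiply by 100 for percentiles on the 0-100 scale
percentiles = 100 * quantiles
```

The result keeps the original element order, so quantiles[i] belongs to arr[i], and plt.hist(quantiles) should look flat.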

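Here is an alternative approach. The code below estimates the inverse of the empirical distribution function, inverted_edf.

It computes the empirical df of SAMPLE, evaluates it at the distinct sample values, and interpolates linearly between those points to obtain inverted_edf.

As an example, it draws a sample of 1000 and prints the estimate for the 0.5 quantile.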

import statsmodels.distributions.empirical_distribution as edf
from scipy.interpolate import interp1d
import numpy as np
import matplotlib.pyplot as plt

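# draw a sample and build its empirical distribution function (EDF)
SAMPLE = np.random.normal(0, 1, 1000)
sample_edf = edf.ECDF(SAMPLE)

# the EDF only changes at the distinct sample values
slope_changes = sorted(set(SAMPLE))

# evaluate the EDF at those points, then interpolate with the axes swapped
# to get an approximate inverse: probability -> value
sample_edf_values_at_slope_changes = [sample_edf(item) for item in slope_changes]
inverted_edf = interp1d(sample_edf_values_at_slope_changes, slope_changes)

# plot the inverse over probabilities inside the interpolation range
x = np.linspace(0.005, 1)
y = inverted_edf(x)
plt.plot(x, y, 'b-')
plt.show()

p = 0.5
print('%s percentile:' % (100*p), inverted_edf(p))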

Running it shows the plot and prints the estimated 50th percentile:

PIT Chart

50.0 percentile: -0.05917394517540461
50.0 percentile: -0.0034011090849578695
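
The code above goes from a probability to a value. The question asks for the other direction, value to quantile, which is the empirical distribution function itself. A minimal NumPy-only sketch of that direction follows. The helper empirical_cdf and the variable names are hypothetical, and it assumes the "fraction of sample points less than or equal to the value" definition is the one wanted.

```python
import numpy as np

# example sample (seeded only so the sketch is repeatable)
rng = np.random.default_rng(0)
sample = rng.normal(0, 1, 1000)
sorted_sample = np.sort(sample)

def empirical_cdf(values):
    """Fraction of sample points that are <= each value."""
    # side='right' counts how many sorted points are <= the value
    counts = np.searchsorted(sorted_sample, values, side='right')
    return counts / sorted_sample.size

# quantile of every element of the sample, in the original order
quantiles = empirical_cdf(sample)
```

Sorting once and then using a binary search per value keeps the cost at O(n log n). empirical_cdf also accepts values that are not in the sample, for example empirical_cdf(0.0).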

Source: https://habr.com/ru/post/1679539/

