Here I have a function that computes a percentile column based on two other columns in a DataFrame: for each row, the function rebuilds a mini-DataFrame containing only the last 20 rows, computes the absolute difference for each of them, and then assigns the resulting percentile to the current row.
The respondent to my previous question suggested that I post this more specific question about the implementation.
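import numpy as np
import pandas as pd

grid = np.random.rand(40, 1)  # one column, to match the single 'value' label
full = pd.DataFrame(grid, columns=['value'])

def percentile(x, df):
    i = int(x.name)
    if i < 20:
        return np.nan  # not enough history for a 20-row lookback
    # .loc slicing is inclusive: the current row plus the 20 rows before it
    window = df.loc[i - 20:i, 'value']
    bucketed = [b for b in window if b < window.loc[i]]
    return len(bucketed) / 0.2  # count out of 20, scaled to 0-100

full['percentile'] = full.apply(percentile, axis=1, args=(full,))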
This is out of intellectual curiosity, since the code works: does anyone have a faster way to approach the problem?
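One possible direction, offered as a sketch rather than a benchmarked answer: pandas' `rolling(...).apply(..., raw=True)` passes each window to the function as a NumPy array, which avoids slicing the frame again for every row. The helper name `rolling_percentile` and its `lookback` parameter are made up for illustration, and the window is `lookback + 1` rows wide on the assumption that the inclusive `.loc` slice above (current row plus the 20 before it) is the intended behaviour.

```python
import numpy as np
import pandas as pd


def rolling_percentile(values: pd.Series, lookback: int = 20) -> pd.Series:
    """Percent of the previous `lookback` values strictly below the current one.

    Hypothetical helper, not part of pandas.
    """
    def pct_below_last(window: np.ndarray) -> float:
        # `window` holds the `lookback` previous values followed by the current one
        return (window[:-1] < window[-1]).sum() * 100.0 / lookback

    # raw=True hands each window over as a plain ndarray instead of a Series
    return values.rolling(lookback + 1).apply(pct_below_last, raw=True)


rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
full = pd.DataFrame(rng.random((40, 1)), columns=['value'])
full['percentile'] = rolling_percentile(full['value'])
```

The first 20 rows come out as NaN because the window is not yet full, which matches the rows the original function skips.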