I understand how to create simple quantiles in pandas using pd.qcut, but after searching I can't find anything for weighted quantiles. Specifically, I want to create a variable that bins the values of the variable of interest (ordered from smallest to largest) so that each bin contains equal weight. So far this is what I have:
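import pandas as pd

def wtdQuantile(dataframe, var, weight=None, n=10):
    if weight is None:
        return pd.qcut(dataframe[var], n, labels=False)
    else:
        # sort a copy instead of mutating the caller's DataFrame in place
        df = dataframe.sort_values(var, ascending=True)
        cum_sum = df[weight].cumsum()
        cutoff = cum_sum.max() / n
        quantile = cum_sum / cutoff
        # the last row equals exactly n; move it into the top bin (n - 1)
        quantile.iloc[-1] -= 1
        # truncate to integer bin labels and restore the original row order
        return quantile.astype(int).reindex(dataframe.index)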
Is there an easier way or something from Pandas that I am missing?
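For comparison, here is a minimal sketch of the same cumulative-weight idea written with pd.cut on the running share of total weight. As far as I know pd.qcut has no weights parameter, so this is not a built-in pandas feature; weighted_qcut is a hypothetical helper name, and the sketch assumes a unique index, non-negative weights, and a positive total weight. pd.cut uses right-closed bins here, so a row whose cumulative share lands exactly on k/n stays in the lower bin, which can differ from the int() truncation above at exact ties.

```python
import numpy as np
import pandas as pd


def weighted_qcut(values: pd.Series, weights: pd.Series, n: int = 10) -> pd.Series:
    """Hypothetical helper: 0-based bin labels with roughly 1/n of total weight per bin.

    Assumes a unique index, non-negative weights and a positive total weight.
    """
    # Order rows by value (stable sort so ties keep their original order)
    order = values.sort_values(kind="mergesort").index
    # Share of total weight accumulated up to and including each row
    cum_share = weights.loc[order].cumsum() / weights.sum()
    # Right-closed edges [0, 1/n], (1/n, 2/n], ..., ((n-1)/n, 1]
    edges = np.linspace(0, 1, n + 1)
    labels = pd.cut(cum_share, bins=edges, labels=False, include_lowest=True)
    # Put the labels back in the caller's row order
    return labels.reindex(values.index).astype(int)
```

With the sample data from the edit, weighted_qcut(df["Var"], df["Weight"], n=5) gives 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, the same labels as Desired_Rslt.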
Edit: As requested, here is some sample data. Below, I try to bin the variable "Var" using "Weight" as the weight. With pd.qcut, each bin gets an equal number of observations. Instead, I want equal weight in each bin, or in this case as close to it as possible.
Weight  Var  pd.qcut(n=5)  Desired_Rslt
    10    1             0             0
    14    2             0             0
    18    3             1             0
    15    4             1             1
    30    5             2             1
    12    6             2             2
    20    7             3             2
    25    8             3             3
    29    9             4             3
    45   10             4             4
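To make the difference between the two label columns concrete, this sketch totals the weight that lands in each bin under each labelling. The labels are typed in by hand from the table rather than recomputed. The total weight is 218, so an ideal bin would hold 218 / 5 = 43.6.

```python
import pandas as pd

# Sample data and both label columns, copied from the table
df = pd.DataFrame({
    "Weight": [10, 14, 18, 15, 30, 12, 20, 25, 29, 45],
    "Var": list(range(1, 11)),
    "pd.qcut(n=5)": [0, 0, 1, 1, 2, 2, 3, 3, 4, 4],
    "Desired_Rslt": [0, 0, 0, 1, 1, 2, 2, 3, 3, 4],
})

# Ideal weight per bin when splitting into 5 bins
target = df["Weight"].sum() / 5

# Total weight per bin under each labelling
qcut_totals = df.groupby("pd.qcut(n=5)")["Weight"].sum()
desired_totals = df.groupby("Desired_Rslt")["Weight"].sum()

print(qcut_totals.tolist())     # [24, 33, 42, 45, 74]
print(desired_totals.tolist())  # [42, 45, 32, 54, 45]
print(target)                   # 43.6
```

The equal-count bins range from 24 to 74, while the desired bins stay between 32 and 54, which is about as even as contiguous bins over these ten rows allow.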