I have the following pandas dataframe:
data_df = pd.DataFrame({'ind':['la','p','la','la','p','g','g','la'], 'dist':[10.,5.,7.,8.,7.,2.,5.,3.], 'diff':[0.54,3.2,8.6,7.2,2.1,1.,3.5,4.5], 'cas':[1.,2.,3.,4.,5.,6.,7.,8.]})
i.e
cas diff dist ind 0 1 0.54 10 la 1 2 3.20 5 p 2 3 8.60 7 la 3 4 7.20 8 la 4 5 2.10 7 p 5 6 1.00 2 g 6 7 3.50 5 g 7 8 4.50 3 la
I need to calculate the weighted average of all the columns where the weights are in the "dist" column and group the values with "ind".
For example, for 'ind' = 'la' and the column 'diff':
((10*0.54)+(8.60*7)+(7.20*8)+(4.50*3))/(10+7+8+3) = 4.882143
The result I want is the following
cas diff ind g 6.714286 2.785714 la 3.107143 4.882143 p 3.750000 2.558333
which is obtained by multiplying each value of each column by the corresponding value in the "dist" column, summarize the results with the same "ind", and then divide the result by the sum of all the "dist" values corresponding to the same index.
I thought it would be an easy task performed by the dataframe groupby method, but in fact it is quite complicated.
Can anyone help me out?