Calculate weighted average with pandas dataframe

I have the following pandas dataframe:

data_df = pd.DataFrame({'ind':['la','p','la','la','p','g','g','la'], 'dist':[10.,5.,7.,8.,7.,2.,5.,3.], 'diff':[0.54,3.2,8.6,7.2,2.1,1.,3.5,4.5], 'cas':[1.,2.,3.,4.,5.,6.,7.,8.]}) 

i.e

  cas diff dist ind 0 1 0.54 10 la 1 2 3.20 5 p 2 3 8.60 7 la 3 4 7.20 8 la 4 5 2.10 7 p 5 6 1.00 2 g 6 7 3.50 5 g 7 8 4.50 3 la 

I need to calculate the weighted average of all the columns where the weights are in the "dist" column and group the values ​​with "ind".

For example, for 'ind' = 'la' and the column 'diff':

 ((10*0.54)+(8.60*7)+(7.20*8)+(4.50*3))/(10+7+8+3) = 4.882143 

The result I want is the following

  cas diff ind g 6.714286 2.785714 la 3.107143 4.882143 p 3.750000 2.558333 

which is obtained by multiplying each value of each column by the corresponding value in the "dist" column, summarize the results with the same "ind", and then divide the result by the sum of all the "dist" values ​​corresponding to the same index.

I thought it would be an easy task performed by the dataframe groupby method, but in fact it is quite complicated.

Can anyone help me out?

+5
source share
1 answer

You can get within the normalized group weights using transform :

 >>> df['weight'] = df['dist'] / df.groupby('ind')['dist'].transform('sum') >>> df['weight'] 0 0.357143 1 0.416667 2 0.250000 3 0.285714 4 0.583333 5 0.285714 6 0.714286 7 0.107143 Name: weight, dtype: float64 

Then you just need to multiply this weight by the values ​​and take the sum:

 >>> df['wcas'], df['wdiff'] = (df[n] * df['weight'] for n in ('cas', 'diff')) >>> df.groupby('ind')[['wcas', 'wdiff']].sum() wcas wdiff ind g 6.714286 2.785714 la 3.107143 4.882143 p 3.750000 2.558333 

Edit: using mutation in place:

 >>> backup = df.copy() # make a backup copy to mutate in place >>> cols = df.columns[:2] # cas, diff >>> df[cols] = df['weight'].values[:, None] * df[cols] >>> df.groupby('ind')[cols].sum() cas diff ind g 6.714286 2.785714 la 3.107143 4.882143 p 3.750000 2.558333 
+6
source

Source: https://habr.com/ru/post/1235727/


All Articles