You can return any number of aggregated values from a groupby object with apply. Just return a Series and the index values will become the new column names.
Take a look at a quick example:
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                       'd1': [5, 10, 100, 30],
                       'd2': [7, 1, 3, 20],
                       'weights': [.2, .8, .4, .6]},
                      columns=['group', 'd1', 'd2', 'weights'])
    df

      group   d1  d2  weights
    0     a    5   7      0.2
    1     a   10   1      0.8
    2     b  100   3      0.4
    3     b   30  20      0.6
Define a custom function to be passed to apply. For each group, apply passes the group's rows to the function as a DataFrame, so the data parameter is a DataFrame. Notice how the function uses multiple columns at once, which is not possible with the agg groupby method because agg hands the function one column at a time:
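    def weighted_average(data):
        # data is the sub-DataFrame holding the rows of one group
        d = {}
        d['d1_wa'] = np.average(data['d1'], weights=data['weights'])
        d['d2_wa'] = np.average(data['d2'], weights=data['weights'])
        # the keys of d become the column names of the result
        return pd.Series(d)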
Call the groupby apply method using our custom function:
    df.groupby('group').apply(weighted_average)

           d1_wa  d2_wa
    group
    a        9.0    2.2
    b       58.0   13.2
You can get better performance by precalculating the weighted totals into new DataFrame columns, as described in other answers, and avoiding apply altogether.
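As a rough illustration of that faster route, here is a minimal sketch that multiplies the value columns by the weights up front and then uses only built-in groupby sums. It reuses the example data from above; the names weighted, sums, and result are just illustrative, and the other answers may structure this differently.

```python
import pandas as pd

df = pd.DataFrame({'group': ['a', 'a', 'b', 'b'],
                   'd1': [5, 10, 100, 30],
                   'd2': [7, 1, 3, 20],
                   'weights': [.2, .8, .4, .6]})

# Multiply each value column by the weights in one vectorized step.
weighted = df[['d1', 'd2']].mul(df['weights'], axis=0)
weighted['weights'] = df['weights']

# Sum the weighted values and the weights within each group.
sums = weighted.groupby(df['group']).sum()

# Weighted average = sum(value * weight) / sum(weight), per group.
result = sums[['d1', 'd2']].div(sums['weights'], axis=0)
result.columns = ['d1_wa', 'd2_wa']
print(result)
```

This should give the same weighted averages as the apply version while keeping all of the work in vectorized pandas operations, which tends to matter most when there are many groups.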
Ted Petrou, Nov 04 '17 at 18:16