The way I read the question, you want to be able to do something arbitrary with the individual values from both columns. You just need to make sure that you return a dataframe of the same size as the one you were passed. I think the best way is to just make a new column, like this:
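
    import pandas as pd

    df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                       'b': [1, 2, 3, 4, 5, 6],
                       'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                       'd': ['z', 'z', 'z', 'o', 'o', 'o']})
    df['e'] = 0  # placeholder column; f fills it in

    def f(x):
        # x holds the rows of one (c, d) group
        y = (x['a'] + x['b']) / sum(x['b'])
        # return a frame with the same shape and columns as the group
        return pd.DataFrame({'e': y, 'a': x['a'], 'b': x['b']})

    df.groupby(['c', 'd']).transform(f)
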
This returns:

       a  b         e
    0  1  1  0.333333
    1  2  2  0.666667
    2  3  3  1.000000
    3  4  4  2.000000
    4  5  5  0.909091
    5  6  6  1.090909

If you have a very complicated dataframe, you can pick out just the columns you need (e.g. df.groupby(['c'])[['a','b','e']].transform(f)).
This certainly looks very inelegant to me, but it is still much faster than apply on large datasets.
Another alternative is to use set_index to capture all the columns you need in the index, and then pass just one column to transform.
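A minimal sketch of the set_index idea, using the same example data. The helper name f_indexed is hypothetical, and I'm assuming that moving column a into the index is acceptable for your data:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                   'b': [1, 2, 3, 4, 5, 6],
                   'c': ['q', 'q', 'q', 'q', 'w', 'w'],
                   'd': ['z', 'z', 'z', 'o', 'o', 'o']})

# move 'a' into the index so it travels along with column 'b'
indexed = df.set_index('a')

def f_indexed(s):
    # s is column 'b' for one (c, d) group; its index holds the 'a' values
    a = s.index.to_numpy()
    # return a plain array with one value per row of the group
    return (a + s.to_numpy()) / s.sum()

result = indexed.groupby(['c', 'd'])['b'].transform(f_indexed)

# transform preserves the row order, so the values can be assigned positionally
df['e'] = result.to_numpy()
print(df)
```

Here transform only ever sees a single Series, so the function has to pull the second column out of the index itself.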
Victor Chubukov, May 23 '16 at 23:09