Nine months later, and this is still one of the top search results for working with multiprocessing and pandas. I hope you have found an answer by now, but if not, here is one that seems to work, and hopefully it will help others who run into this issue.
import pandas as pd
import numpy as np

# sample data
df = pd.DataFrame([[1,2,3,1,2,3,1,2,3,1],
                   [2,2,2,2,2,2,2,2,2,2],
                   [1,3,5,7,9,2,4,6,8,0],
                   [2,4,6,8,0,1,3,5,7,9]]).transpose()
df.columns = ['a','b','c','d']

df
   a  b  c  d
0  1  2  1  2
1  2  2  3  4
2  3  2  5  6
3  1  2  7  8
4  2  2  9  0
5  3  2  2  1
6  1  2  4  3
7  2  2  6  5
8  3  2  8  7
9  1  2  0  9

# this one function performs the three aggregations you used in your
# question; each item yielded by groupby is a (key, sub-DataFrame)
# tuple, so x[1] is the group's data. You could obviously add more
# functions, or different ones for other groupby operations.
def f(x):
    return [np.mean(x[1]['c']), np.mean(x[1]['d']), x[1]['d'].sum()]

# set up a pool with 4 worker processes
from multiprocessing import Pool
pool = Pool(4)

# run the statistics you wanted on each group in parallel
group_df = pd.DataFrame(pool.map(f, df.groupby(['a','b'])))

group_df
   0         1   2
0  3  5.500000  22
1  6  3.000000   9
2  5  4.666667  14

# pool.map preserves input order, and .groups yields the keys in the
# same sorted group order, so the labels line up with the rows
group_df['keys'] = list(df.groupby(['a','b']).groups.keys())

group_df
   0         1   2    keys
0  3  5.500000  22  (1, 2)
1  6  3.000000   9  (2, 2)
2  5  4.666667  14  (3, 2)
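One caveat: on platforms where multiprocessing starts workers with "spawn" (Windows, and macOS on recent Pythons), the Pool must be created under an if __name__ == '__main__': guard, and the worker function has to be defined at module top level so it can be pickled (a lambda will not work). Below is a minimal sketch of the same idea with that guard, a context-managed pool, and the group key returned from the worker itself so the labels can never drift out of line with the rows. The function name summarize and the dict-per-group return shape are just my choices for illustration, not anything from the original question.

import pandas as pd
import numpy as np
from multiprocessing import Pool

def summarize(item):
    # each item from df.groupby(...) is a (key, sub-DataFrame) tuple;
    # returning the key alongside the stats keeps them paired
    key, g = item
    return {'keys': key,
            'c_mean': g['c'].mean(),
            'd_mean': g['d'].mean(),
            'd_sum': g['d'].sum()}

if __name__ == '__main__':
    df = pd.DataFrame({'a': [1,2,3,1,2,3,1,2,3,1],
                       'b': [2]*10,
                       'c': [1,3,5,7,9,2,4,6,8,0],
                       'd': [2,4,6,8,0,1,3,5,7,9]})
    # the context manager closes the pool and joins the workers for us
    with Pool(4) as pool:
        rows = pool.map(summarize, df.groupby(['a','b']))
    group_df = pd.DataFrame(rows)
    print(group_df)

Since pool.map preserves input order, the rows of group_df come out in the same sorted group order you would get from iterating the groupby directly, with the keys already attached.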
Either way, I hope this helps someone looking into this in the future.