The function you pass to apply must take a pandas.DataFrame as its first argument. You can pass additional positional or keyword arguments to apply, and they will be forwarded to that function. Thus, your example will work with a small modification. Change ols_res to
def ols_res(df, xcols, ycol):
    return sm.OLS(df[ycol], df[xcols]).fit().predict()
Then you can use groupby and apply like this:
df.groupby('grp').apply(ols_res, xcols=['x1', 'x2'], ycol='y')
or
df.groupby('grp').apply(ols_res, ['x1', 'x2'], 'y')
EDIT
The code above does not perform several one-dimensional regressions; instead, it performs one multivariate regression per group. With another small modification, it fits each x column separately:
def ols_res(df, xcols, ycol):
    return pd.DataFrame({xcol: sm.OLS(df[ycol], df[xcol]).fit().predict()
                         for xcol in xcols})
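A sketch of the per-column variant under the same assumptions as before: a NumPy least-squares stand-in for sm.OLS, no intercept, and made-up random data. Each column in xcols gets its own single-regressor fit. Because each group returns a DataFrame with a fresh 0..n-1 index, groupby stacks the pieces under the group label.

```python
import numpy as np
import pandas as pd

# Illustrative data, same layout as in the question.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'y': rng.standard_normal(20),
    'x1': rng.standard_normal(20),
    'x2': rng.standard_normal(20),
    'grp': ['a', 'b'] * 10,
})

def ols_res(df, xcols, ycol):
    """One no-intercept, single-regressor fit per column in xcols.

    NumPy stand-in for sm.OLS(df[ycol], df[xcol]).fit().predict().
    """
    y = df[ycol].to_numpy()
    fitted = {}
    for xcol in xcols:
        X = df[[xcol]].to_numpy()  # shape (n, 1): one regressor at a time
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        fitted[xcol] = X @ beta
    return pd.DataFrame(fitted)

res = df.groupby('grp')[['x1', 'x2', 'y']].apply(ols_res, xcols=['x1', 'x2'], ycol='y')
print(res.head())
```

For a single regressor without an intercept the slope is x·y / x·x, which is an easy way to spot-check the fitted values of any one column.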
EDIT 2
Although the above solution works, I think the following is a bit more pandas-y:
import statsmodels.api as sm
import pandas as pd
import numpy as np

df = pd.DataFrame({
    'y': np.random.randn(20),
    'x1': np.random.randn(20),
    'x2': np.random.randn(20),
    'grp': ['a', 'b'] * 10})

def ols_res(x, y):
    return pd.Series(sm.OLS(y, x).fit().predict())

df.groupby('grp').apply(lambda x: x[['x1', 'x2']].apply(ols_res, y=x['y']))
For some reason, if I define ols_res() as it was originally (returning the raw array from predict() instead of a pd.Series), the resulting DataFrame does not have the group label in its index.