I am summarizing my pandas DataFrame, data. In particular, I want the mean and the sum of amount for each (origin, type) pair. For the averaging and summing, I tried the functions below:
import numpy as np
import pandas as pd

result = data.groupby(groupbyvars).agg({'amount': [pd.Series.sum, pd.Series.mean]}).reset_index()
My problem is that the amount column contains NaNs, so the result of the code above has a lot of NaN means and sums.
I know that both pd.Series.sum and pd.Series.mean have skipna=True by default, so why am I still getting NaN here?
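For reference, here is a minimal sketch of the default behaviour I am relying on. The Series values are made up for illustration and are not my real data:

```python
import numpy as np
import pandas as pd

# Hypothetical toy Series with one missing value
s = pd.Series([1.0, np.nan, 3.0])

total = s.sum()                     # skipna=True by default, NaN is ignored
average = s.mean()                  # likewise ignores the NaN
total_strict = s.sum(skipna=False)  # NaN propagates only when skipping is disabled

print(total, average, total_strict)
```

On a plain Series the NaN is skipped as expected, which is why the NaN results after the groupby surprise me.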
I also tried this, which obviously didn't work:
data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum(skipna=True), pd.Series.mean(skipna=True)]}).reset_index()
EDIT: At the suggestion of @Korem, I also tried using partial, as shown below:
from functools import partial

s_na_mean = partial(pd.Series.mean, skipna=True)
data.groupby(groupbyvars).agg({'amount': [np.nansum, s_na_mean]}).reset_index()
but I get this error:
error: 'functools.partial' object has no attribute '__name__'
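One possible way around the missing `__name__` is to use plain named functions instead of `partial`, since a `def` always carries a name that can be used as the output column label. This is only a sketch: `nan_sum`, `nan_mean`, and the sample values in `data` are hypothetical and chosen for illustration.

```python
import numpy as np
import pandas as pd

def nan_sum(series):
    # Named wrapper: sum of the group, explicitly skipping NaN
    return series.sum(skipna=True)

def nan_mean(series):
    # Named wrapper: mean of the group, explicitly skipping NaN
    return series.mean(skipna=True)

# Hypothetical sample data; each group has one NaN amount
data = pd.DataFrame({
    'origin': ['A', 'A', 'B', 'B'],
    'type':   ['x', 'x', 'y', 'y'],
    'amount': [1.0, np.nan, np.nan, 4.0],
})
groupbyvars = ['origin', 'type']

result = data.groupby(groupbyvars).agg({'amount': [nan_sum, nan_mean]}).reset_index()
print(result)
```

The resulting columns are labelled with the function names, e.g. `('amount', 'nan_sum')` and `('amount', 'nan_mean')`.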