Pandas aggregation ignoring NaN

Question


I am summarizing my pandas DataFrame, data. In particular, I want the mean and the sum of amount for each (origin, type) pair. For averaging and summing, I tried the functions below:

 import numpy as np
 import pandas as pd
 result = data.groupby(groupbyvars).agg({'amount': [pd.Series.sum, pd.Series.mean]}).reset_index()

My problem is that the amount column includes NaNs, so the result of the code above contains many NaN averages and sums.

I know that both pd.Series.sum and pd.Series.mean have skipna=True by default, so why am I still getting NaN here?

I also tried this, which obviously didn't work:

 data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum(skipna=True), pd.Series.mean(skipna=True)]}).reset_index()

EDIT: At the suggestion of @Korem, I also tried using partial, as shown below:

 s_na_mean = partial(pd.Series.mean, skipna=True)
 data.groupby(groupbyvars).agg({'amount': [np.nansum, s_na_mean]}).reset_index()

but I get this error:

 error: 'functools.partial' object has no attribute '__name__'


python numpy pandas aggregate nan

Rhubarb 01 Oct '14 at 16:01


1 answer

Korem · Answer 1 · 2014-10-01T19:06:27+0000

Use NumPy's nansum and nanmean:

 from numpy import nansum, nanmean
 data.groupby(groupbyvars).agg({'amount': [nansum, nanmean]}).reset_index()
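For comparison, here is a minimal sketch of the same aggregation written with the pandas string aliases 'sum' and 'mean', which skip NaN by default in a groupby. This swaps in a different technique from the NumPy functions above. The sample data is made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
data = pd.DataFrame({
    "origin": ["a", "a", "b", "b", "c"],
    "type": ["x", "x", "y", "y", "z"],
    "amount": [1.0, np.nan, 3.0, 5.0, np.nan],
})
groupbyvars = ["origin", "type"]

# The string aliases dispatch to the groupby's own sum/mean,
# which ignore NaN values within each group.
result = (
    data.groupby(groupbyvars)
        .agg({"amount": ["sum", "mean"]})
        .reset_index()
)

# Group (c, z) contains only NaN: its sum is 0.0 and its mean is NaN.
print(result)
```

Note that a group containing nothing but NaN still has a NaN mean, since there is nothing to average. That may account for some of the NaN averages that remain after NaN values are skipped.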

If your version of NumPy is too old to have nanmean, there is a workaround, which also shows how to fix your pd.Series.sum(skipna=True) attempt:

Writing pd.Series.sum(skipna=True) calls the method on the spot instead of handing agg a function. To fix the keyword argument without calling the method, wrap it in functools.partial. So, if you do not have nanmean, define s_na_mean and use it in its place:

 from functools import partial
 s_na_mean = partial(pd.Series.mean, skipna=True)
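Regarding the `'functools.partial' object has no attribute '__name__'` error from the question's edit: a partial object carries no `__name__`, which the pandas version in the question evidently uses to label the output column. One way around that, sketched here with made-up sample data, is a plain named wrapper function. The names s_na_sum and s_na_mean are illustrative helpers, not part of pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
data = pd.DataFrame({
    "origin": ["a", "a", "b", "b", "c"],
    "type": ["x", "x", "y", "y", "z"],
    "amount": [1.0, np.nan, 3.0, 5.0, np.nan],
})
groupbyvars = ["origin", "type"]

# Named wrappers have a __name__, so agg can label the output columns.
def s_na_sum(series):
    return series.sum(skipna=True)

def s_na_mean(series):
    return series.mean(skipna=True)

result = (
    data.groupby(groupbyvars)
        .agg({"amount": [s_na_sum, s_na_mean]})
        .reset_index()
)
print(result)
```

The output columns are labeled after the wrapper functions, so the result has ('amount', 's_na_sum') and ('amount', 's_na_mean').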
