Pandas aggregation ignoring NaN

I am summarizing my Pandas DataFrame, data. In particular, I want to get the average and the sum of amount for each [origin, type] tuple. For averaging and summing, I tried the functions below:

    import numpy as np
    import pandas as pd

    result = data.groupby(groupbyvars).agg({'amount': [pd.Series.sum, pd.Series.mean]}).reset_index()

My problem is that the amount column contains NaNs, so the result of the code above has many NaN averages and sums.

I know that both pd.Series.sum and pd.Series.mean have skipna=True by default, so why am I still getting NaN here?

I also tried this, which obviously didn't work:

 data.groupby(groupbyvars).agg({'amount': [ pd.Series.sum(skipna=True), pd.Series.mean(skipna=True)]}).reset_index() 

EDIT: At @Korem's suggestion, I also tried using partial, as shown below:

    s_na_mean = partial(pd.Series.mean, skipna=True)
    data.groupby(groupbyvars).agg({'amount': [np.nansum, s_na_mean]}).reset_index()

but I get this error:

 error: 'functools.partial' object has no attribute '__name__' 
1 answer

Use NumPy's nansum and nanmean:

    from numpy import nansum, nanmean

    data.groupby(groupbyvars).agg({'amount': [nansum, nanmean]}).reset_index()
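To make the snippet above easy to try, here is a minimal self-contained sketch. The sample rows are invented for illustration; only the column names (origin, type, amount) and the groupbyvars variable come from the question. Recent pandas versions may emit a FutureWarning when NumPy reducers are passed to agg, so treat this as a sketch rather than the definitive form.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: only the column names come from the question.
data = pd.DataFrame({
    "origin": ["A", "A", "B", "B"],
    "type": ["x", "x", "y", "y"],
    "amount": [1.0, np.nan, 3.0, 5.0],
})
groupbyvars = ["origin", "type"]

# Aggregate amount per (origin, type) group, ignoring NaN values.
result = (
    data.groupby(groupbyvars)
        .agg({"amount": [np.nansum, np.nanmean]})
        .reset_index()
)
print(result)
```

In this toy data the NaN in group (A, x) is skipped, so that group's sum and mean are both computed from the single value 1.0.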

As a workaround for older versions of NumPy that lack nanmean, and as a way to fix your last attempt:

When you write pd.Series.sum(skipna=True), you are calling the method immediately instead of passing a function to agg. To fix the keyword argument in advance and still pass a callable, wrap the method with functools.partial. So if you do not have nanmean, define s_na_mean and use it in its place:

    from functools import partial

    s_na_mean = partial(pd.Series.mean, skipna=True)
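The difference between calling the method and passing a callable can be sketched as follows. The last part is an assumption on my side and not part of the original answer: since the error in the question complains that the partial object has no `__name__`, one possible workaround is to use plain named functions, which do carry a `__name__`. The helper names na_sum and na_mean and the sample rows are hypothetical.

```python
from functools import partial

import numpy as np
import pandas as pd

# Calling the method on the class without a Series raises a TypeError,
# because the required `self` argument is missing.
try:
    pd.Series.mean(skipna=True)
    call_failed = False
except TypeError:
    call_failed = True

# partial() returns a callable with skipna fixed in advance.
s_na_mean = partial(pd.Series.mean, skipna=True)
partial_mean = s_na_mean(pd.Series([1.0, np.nan, 3.0]))  # averages 1.0 and 3.0

# Hypothetical named wrappers: plain functions have a __name__ attribute.
def na_sum(series):
    return series.sum(skipna=True)

def na_mean(series):
    return series.mean(skipna=True)

# Hypothetical sample data: only the column names come from the question.
data = pd.DataFrame({
    "origin": ["A", "A", "B", "B"],
    "type": ["x", "x", "y", "y"],
    "amount": [1.0, np.nan, 3.0, 5.0],
})
groupbyvars = ["origin", "type"]

result = data.groupby(groupbyvars).agg({"amount": [na_sum, na_mean]}).reset_index()
print(result)
```

Named functions also give the aggregated columns readable labels (here na_sum and na_mean), which a partial object cannot provide on its own.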

Source: https://habr.com/ru/post/976117/

