Summing rows in grouped pandas data and returning NaN

Question

Example

import pandas as pd
import numpy as np
d = {'l':  ['left', 'right', 'left', 'right', 'left', 'right'],
     'r': ['right', 'left', 'right', 'left', 'right', 'left'],
     'v': [-1, 1, -1, 1, -1, np.nan]}
df = pd.DataFrame(d)

Problem

When a grouped DataFrame contains an np.nan value, I want the grouped sum to be NaN, as it is with the flag skipna=False for pd.Series.sum and pd.DataFrame.sum:

In [235]: df.v.sum(skipna=False)
Out[235]: nan

However, this behavior is not reflected in the pandas.DataFrame.groupby object:

In [237]: df.groupby('l')['v'].sum()['right']
Out[237]: 2.0

and it cannot be obtained by applying np.sum directly either:

In [238]: df.groupby('l')['v'].apply(np.sum)['right']
Out[238]: 2.0

Workaround

I can get around this by doing

check_cols = ['v']
df['flag'] = df[check_cols].isnull().any(axis=1)
# Sum both columns per group, then return NaN wherever the group had a missing value
df.groupby('l')[['v', 'flag']].sum().apply(
    lambda x: x['v'] if not x['flag'] else np.nan,
    axis=1
)

but it is ugly. Is there a better way?

+4

python numpy pandas nan dataframe

Alexander McFarlane Mar 13 '17 at 17:55

3 answers

You can pass skipna=False through agg by using the Series.sum method directly:

>>> series_sum = pd.core.series.Series.sum
>>> df.groupby('l')['v'].agg(series_sum, skipna=False)
l
left     -3
right   NaN
Name: v, dtype: float64

This sum is the same method as df.v.sum, which accepts a skipna argument:

>>> help(df.v.sum)
Help on method sum in module pandas.core.generic:

sum(axis=None, skipna=None, level=None, numeric_only=None, **kwargs) method 
of pandas.core.series.Series instance
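
As a minimal sketch of the same idea, a lambda can make the skipna=False call explicit for each group, so nothing depends on how agg forwards keyword arguments. The data frame is rebuilt from the question; the variable name result is only illustrative.

```python
import numpy as np
import pandas as pd

# Same data as in the question (the 'r' column is not needed here)
df = pd.DataFrame({
    'l': ['left', 'right', 'left', 'right', 'left', 'right'],
    'v': [-1, 1, -1, 1, -1, np.nan],
})

# Each group arrives as a Series, so Series.sum can be called with skipna=False
result = df.groupby('l')['v'].agg(lambda s: s.sum(skipna=False))
print(result)
```

Here the 'right' group contains a NaN, so its sum comes out as NaN, while 'left' sums normally.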

+1

alexis Mar 13 '17 at 20:11

Is this what you want?

In [24]: df.groupby('l')['v'].agg(lambda x: np.nan if x.isnull().any() else x.sum())
Out[24]:
l
left    -3.0
right    NaN
Name: v, dtype: float64

or

In [22]: df.groupby('l')['v'].agg(lambda x: x.sum() if x.notnull().all() else np.nan)
Out[22]:
l
left    -3.0
right    NaN
Name: v, dtype: float64
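
If the same rule is needed for several columns, the lambda can be pulled out into a named function. This is only a sketch: the helper name nan_aware_sum and the extra column 'w' are hypothetical and do not appear in the question.

```python
import numpy as np
import pandas as pd

def nan_aware_sum(s):
    """Return NaN if the group has any missing value, otherwise its sum."""
    return np.nan if s.isna().any() else s.sum()

df = pd.DataFrame({
    'l': ['left', 'right', 'left', 'right', 'left', 'right'],
    'v': [-1, 1, -1, 1, -1, np.nan],
    'w': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # hypothetical second column, no NaN
})

# agg calls the helper once per column and per group
result = df.groupby('l')[['v', 'w']].agg(nan_aware_sum)
print(result)
```

Only the 'v' sum of the 'right' group becomes NaN; the 'w' column has no missing values and is summed as usual.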

0

Maxu Mar 13 '17 at 18:46

B. M. · Accepted Answer · 2017-03-13T18:47:47+0000

I think this is inherent in pandas. A workaround may be:

df.groupby('l')['v'].apply(np.array).apply(sum)

to mimic the numpy way

or

df.groupby('l')['v'].apply(pd.Series.sum,skipna=False) # for series, or
df.groupby('l')['v'].apply(pd.DataFrame.sum,skipna=False) # for dataframes.

to call the appropriate sum function with skipna=False.
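
A different technique, not taken from any of the answers, is to avoid apply altogether: compute the ordinary grouped sum, then mask the groups that contain a missing value. The sketch below assumes the question's data; the names sums, has_nan and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'l': ['left', 'right', 'left', 'right', 'left', 'right'],
    'v': [-1, 1, -1, 1, -1, np.nan],
})

# Ordinary grouped sum (NaN values are skipped here)
sums = df.groupby('l')['v'].sum()

# True for every group that contains at least one missing value
has_nan = df['v'].isna().groupby(df['l']).any()

# Replace the sum with NaN wherever the group had a missing value
result = sums.mask(has_nan)
print(result)
```

Both intermediate Series are indexed by the group key, so mask aligns them by label. This keeps the work in built-in groupby reductions, which may matter on larger frames.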
