Batch Type Provisioning

Question

Consider the following example:

import pandas as pd
import numpy as np
foo = pd.DataFrame(dict(letter=['a', 'a', 'a', 'b', 'b', 'b', 'a', 'b'],
                 number=[1,1,2,2,3,np.nan, np.nan,4]))
grouped = foo.groupby(foo.number)
print(grouped['letter'].transform(lambda x: sum(x == 'a')))

Out[18]: 
0    2
1    2
2    1
3    1
4    0
5    b
6    a
7    0

Instead of showing 1, rows 5 and 6 show 'b' and 'a', presumably because the group key for those rows is np.nan. Is there a way to prevent this without replacing the NaN values with some dummy value? Also, why is this happening?

python numpy pandas

Hillary sanders Dec 02 '15 at 10:32

1 answer

toasteez · Accepted Answer · 2015-12-02T22:59:52+0000

The pandas docs explain this here: http://pandas.pydata.org/pandas-docs/stable/missing_data.html

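NA groups in GroupBy are automatically excluded, which is consistent with R. Because rows with a NaN key belong to no group, transform has nothing to compute for them, which would explain why the original letter values show through in rows 5 and 6.

Earlier versions of pandas included NA groups, but that behavior has since been removed.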
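
For readers on a newer pandas, here is a minimal sketch of one way to count the NaN rows as a group without substituting a dummy value. It assumes pandas 1.1 or later, where groupby accepts a dropna parameter. That parameter is not part of the original answer, and the names is_a, counts and per_row are just illustrative.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({
    "letter": ["a", "a", "a", "b", "b", "b", "a", "b"],
    "number": [1, 1, 2, 2, 3, np.nan, np.nan, 4],
})

# Boolean mask: True where the letter is 'a'.
is_a = foo["letter"].eq("a")

# Assumption: pandas >= 1.1. dropna=False keeps NaN as its own group key
# instead of excluding those rows from every group.
grouped = is_a.groupby(foo["number"], dropna=False)

# One count of 'a' per group, including the NaN group.
counts = grouped.sum()

# The same count broadcast back onto every row, as in the question.
per_row = grouped.transform("sum")

print(counts)
print(per_row)
```

With dropna=False, rows 5 and 6 fall into a shared NaN group, so they should each get that group's count rather than their original letter.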
