Batch Type Provisioning

Consider the following example:

import pandas as pd
import numpy as np

foo = pd.DataFrame(dict(letter=['a', 'a', 'a', 'b', 'b', 'b', 'a', 'b'],
                        number=[1, 1, 2, 2, 3, np.nan, np.nan, 4]))
grouped = foo.groupby(foo.number)
# For each row, count how many rows in its group have letter 'a'
print(grouped['letter'].transform(lambda x: sum(x == 'a')))

Out[18]: 
0    2
1    2
2    1
3    1
4    0
5    b
6    a
7    0

Instead of showing 1 in rows 5 and 6, the output shows the original values 'b' and 'a', presumably because the group key for those rows is np.nan. Is there a way to prevent this without replacing the NaN values with some dummy value? Also, why is this happening?

1 answer

The pandas docs explain this here: http://pandas.pydata.org/pandas-docs/stable/missing_data.html

NaN groups are excluded from groupby; this is consistent with R.

Earlier versions of pandas included them, but that behavior has since been removed.
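
As a possible way to keep the NaN rows in their own group without substituting a dummy value, here is a minimal sketch that assumes pandas 1.1 or later, where groupby accepts a dropna argument. This option postdates the question, so treat it as one approach rather than the behavior of the version used above.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame(dict(letter=['a', 'a', 'a', 'b', 'b', 'b', 'a', 'b'],
                        number=[1, 1, 2, 2, 3, np.nan, np.nan, 4]))

# dropna=False (assumes pandas >= 1.1) keeps NaN as its own group key
# instead of silently excluding those rows from the grouping.
grouped = foo.groupby('number', dropna=False)

# For each row, count how many rows in its group have letter 'a'.
result = grouped['letter'].transform(lambda x: (x == 'a').sum())
print(result)
```

With the NaN key treated as a regular group, rows 5 and 6 should each get the count of 'a' within the NaN group.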


Source: https://habr.com/ru/post/1618350/

