How to group categorical values in Pandas?

I am trying to convert a column to a categorical type and then group by it in pandas.

For example, I tried the following:

import pandas as pd

df = pd.DataFrame()
df['A'] = ['C1', 'C1', 'C2', 'C2', 'C3', 'C3']
df['B'] = [1,2,3,4,5,6]

df['A'] = df.loc[:,'A'].astype('category')

df2 = df[0:3]

result = df2.groupby(by='A')['B'].nunique()

print(result)

Unfortunately, I get an exception:

File "C:\Python34\lib\site-packages\pandas\core\internals.py", line 86, in __init__
    len(self.values), len(self.mgr_locs)))

ValueError: Wrong number of items passed 2, placement implies 3

Edit: Unfortunately, the workaround suggested by @joris does not work for my application. Here is a new counterexample:

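import numpy as np
import pandas as pd

df = pd.DataFrame()
# np.nan is used directly because the pd.np alias is not available in recent pandas versions
df['A'] = ['C1', 'C1', 'C2', np.nan, 'C3', 'C3']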
df['B'] = [1,2,3,4,5,6]

df['A'] = df.loc[:,'A'].astype('category')

df2 = df[0:4]

df2['A'] = df2['A'].cat.remove_unused_categories()

result = df2.groupby(by='A')['B'].nunique()

print(result)
1 answer

This looks like a bug in pandas 0.17.0: https://github.com/pydata/pandas/issues/11635

As a workaround, you can call the Series method nunique through apply on the groupby:

In [22]: df2.groupby(by='A')['B'].apply(lambda x: x.nunique())
Out[22]:
A
C1    2
C2    1
C3    0
Name: B, dtype: int64
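For readers who want to try the workaround outside an interactive session, here is a minimal self-contained sketch. It rebuilds the data from the question and assumes a newer pandas than the 0.17 discussed here (0.23 or later), where groupby accepts the observed parameter. Depending on the pandas version, the unused category C3 may or may not show up in the result.

```python
import pandas as pd

# Same data as in the question; "A" is stored as a categorical column.
df = pd.DataFrame({
    "A": pd.Categorical(["C1", "C1", "C2", "C2", "C3", "C3"]),
    "B": [1, 2, 3, 4, 5, 6],
})

# First three rows only, so category "C3" has no rows in the slice.
df2 = df.iloc[0:3]

# Workaround: call Series.nunique on each group via apply instead of
# calling nunique on the groupby object directly.
# observed=False is an assumption about the pandas version (0.23+); it asks
# groupby to consider all categories rather than only those present.
result = df2.groupby("A", observed=False)["B"].apply(lambda x: x.nunique())
print(result)
```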

Alternatively, you can call remove_unused_categories() on the column before grouping. The bug should be fixed in 0.17.1 (https://github.com/pydata/pandas/pull/11639)
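A rough sketch of the remove_unused_categories() approach follows, again using the data from the question. The explicit .copy() and the observed=False argument are assumptions for a newer pandas (0.23 or later) and are not part of the original answer; the copy only keeps the assignment from targeting a slice of df.

```python
import pandas as pd

df = pd.DataFrame({
    "A": pd.Categorical(["C1", "C1", "C2", "C2", "C3", "C3"]),
    "B": [1, 2, 3, 4, 5, 6],
})

# Take an explicit copy so the assignment below modifies df2 only.
df2 = df.iloc[0:3].copy()

# Drop categories that no longer occur in the slice ("C3" here).
df2["A"] = df2["A"].cat.remove_unused_categories()

# With only "C1" and "C2" left as categories, every group has rows.
result = df2.groupby("A", observed=False)["B"].nunique()
print(result)
```

After the unused category is removed, the grouping only sees categories that have at least one row, which is the situation the direct nunique call handled correctly.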

Source: https://habr.com/ru/post/1616332/