Does the pandas groupby aggregate function handle inline functions differently?

This seemingly strange behavior came up while discussing qaru.site/questions/1689990 / ... .

The OP had this DataFrame:

import pandas as pd

x = pd.DataFrame.from_dict({
    'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

and wanted to find the unique values of cat2 for each group of cat1.

One option is to aggregate and use a lambda to build a set of unique values:

x.groupby('cat1').agg(lambda x: set(x))

# Returns
        cat2
cat1        
A     {X, Y}
B        {Y}
C     {Z, Y}

I suggested that using set alone would be equivalent to the lambda here, since set is itself a callable; however:

x.groupby('cat1').agg(set)

# Returns
              cat2
cat1              
A     {cat1, cat2}
B     {cat1, cat2}
C     {cat1, cat2}

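With the lambda, it seems that each column is passed to the function as a pandas Series. With set alone, it seems to be called on the whole DataFrame, which would return the set of column names.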
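The {cat1, cat2} result is what set produces when it is handed a whole DataFrame rather than a single column, because iterating a DataFrame yields its column labels. A minimal sketch of that difference, reusing the frame from the question (the names labels and values are only illustrative):

```python
import pandas as pd

x = pd.DataFrame.from_dict({
    'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

# Iterating a DataFrame yields its column labels, not its values.
labels = set(x)

# Iterating a Series yields the values stored in that column.
values = set(x['cat2'])

print(labels)
print(values)
```

This only shows what set does with each kind of object; which of the two a given pandas version passes to the aggregation function is a separate matter and may differ between releases.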

Is there an explanation for this? Does pandas handle inline functions differently?

However, it works with SeriesGroupBy.agg. "Error: ... type ...".

x.groupby('cat1')['cat2'].agg(set)
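As a rough sketch of why selecting the column first avoids the ambiguity: the aggregation then runs on a SeriesGroupBy, so the function receives the cat2 values of each group. The variable names below are assumptions for illustration, and SeriesGroupBy.unique is shown only as an alternative that was not part of the original discussion:

```python
import pandas as pd

x = pd.DataFrame.from_dict({
    'cat1': ['A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'],
    'cat2': ['X', 'X', 'Y', 'Y', 'Y', 'Y', 'Z', 'Z']})

# Select the column before aggregating, so set() is applied to the
# cat2 values of each group rather than to a sub-DataFrame.
as_sets = x.groupby('cat1')['cat2'].agg(set)

# Alternative: unique() keeps the values in order of first appearance
# and returns an array per group instead of a set.
as_arrays = x.groupby('cat1')['cat2'].unique()

print(as_sets)
print(as_arrays)
```

If a set is not strictly required, unique() has the advantage of a deterministic order within each group.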

Source: https://habr.com/ru/post/1689986/