How can I filter the Pandas GroupBy object and return the GroupBy object?

When I call filter on the result of a pandas groupby operation, it returns a DataFrame. But if I want to do further group-wise computations, I have to call groupby again, which seems rather roundabout. Is there a more idiomatic way to do this?

EDIT:

To illustrate what I'm talking about:

We shamelessly steal a toy DataFrame from the pandas docs and group it:

>>> import numpy as np
>>> import pandas as pd
>>> dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
>>> grouped = dff.groupby('B')
>>> type(grouped)
<class 'pandas.core.groupby.DataFrameGroupBy'>

This returns a groupby object that we can iterate over, perform group-wise operations on, etc. But if we filter:

>>> filtered = grouped.filter(lambda x: len(x) > 2)
>>> type(filtered)
<class 'pandas.core.frame.DataFrame'>

We get back a DataFrame. Is there a good idiomatic way to get the filtered groups back, and not just the original rows that belong to the filtered groups?

1 answer

If you want to combine the filter and the aggregation, the best way I can think of is to use a ternary if inside apply, returning None for the filtered-out groups, and then call dropna to remove those rows from the final result:

 grouped.apply(lambda x: x.sum() if len(x) > 2 else None).dropna() 
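As a hedged alternative to the one-liner above, the same "filter, then aggregate" result can be sketched without relying on None and dropna: compute the aggregate and the group sizes separately, then index one with the other. This is not the approach from the answer, just an equivalent illustration. It assumes the toy frame from the question and restricts the sum to column 'A'.

```python
import numpy as np
import pandas as pd

# Toy frame from the question
dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
grouped = dff.groupby('B')

# Aggregate every group, then keep only the groups that pass the size test.
sizes = grouped.size()      # rows per group, indexed by the values of 'B'
sums = grouped['A'].sum()   # sum of 'A' per group, same index as sizes
result = sums[sizes > 2]    # boolean indexing on the aligned group index
```

Here only group 'b' has more than two rows, so result holds a single entry for 'b'.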

If you want to iterate over the groups, say to concatenate them back together, you can use a generator expression:

 pd.concat(g for i,g in grouped if len(g)>2) 
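To get back to something you can keep doing group-wise work on, the concatenated frame still has to be grouped again. The sketch below shows that step, together with a boolean-mask variant built on transform('size') that selects the same rows without a Python-level loop. The mask variant is an addition for illustration, not part of the answer, and the names kept, mask, and regrouped are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy frame from the question
dff = pd.DataFrame({'A': np.arange(8), 'B': list('aabbbbcc')})
grouped = dff.groupby('B')

# Approach from the answer: keep groups with more than two rows, then regroup.
# Note: pd.concat raises ValueError if no group passes the test.
kept = pd.concat(g for _, g in grouped if len(g) > 2)
regrouped = kept.groupby('B')

# Illustrative variant: broadcast each group's size back onto its rows,
# build a boolean mask, and regroup the selected rows.
mask = grouped['A'].transform('size') > 2
regrouped_mask = dff[mask].groupby('B')
```

Both regrouped and regrouped_mask are ordinary groupby objects over the surviving rows, so they can be iterated over or aggregated as usual, for example with regrouped['A'].sum().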

Ultimately, I think it would be better if groupby.filter were able to return a groupby object.


Source: https://habr.com/ru/post/1244541/

