I have the following dataset:
location  category  percent
A         5           100.0
B         3           100.0
C         2            50.0
          4            13.0
D         2            75.0
          3            59.0
          4            13.0
          5             4.0
I'm trying to get the largest elements of each category in a dataframe, grouped by location. That is, I want the top 2 highest percentages for each group, so the result would be as follows:
location  category  percent
A         5           100.0
B         3           100.0
C         2            50.0
          4            13.0
D         2            75.0
          3            59.0
It seems that in pandas this is relatively straightforward using pandas.core.groupby.SeriesGroupBy.nlargest, but dask has no nlargest function for groupby. I played with apply, but could not get it to work correctly.
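For reference, here is a minimal sketch of the pandas behaviour I am after. It assumes location, category and percent are ordinary columns (in my real data they may be index levels), and the frame is rebuilt by hand from the sample above; the name top2 is just illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the table above (assumed to be flat columns).
df = pd.DataFrame({
    "location": ["A", "B", "C", "C", "D", "D", "D", "D"],
    "category": [5, 3, 2, 4, 2, 3, 4, 5],
    "percent": [100.0, 100.0, 50.0, 13.0, 75.0, 59.0, 13.0, 4.0],
})

# Keep category as the index so it survives in the result, then take the
# two largest percent values within each location.
top2 = df.set_index("category").groupby("location")["percent"].nlargest(2)

print(top2)
```

The result is a Series indexed by (location, category), which is the shape of output I would like to get from dask as well.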
df.groupby(['location']).apply(lambda x: x['percent'].nlargest(2)).compute()
But I just get the error message ValueError: Wrong number of items passed 0, placement implies 8