If df has an index without duplicate values, you can use idxmax to return the index label of the row holding the maximum value in each group. Then use df.loc to select the entire row:
    In [322]: df.loc[df.groupby('type').votes.agg('idxmax')]
    Out[322]:
      name type  votes
    3  max  cat      9
    0  bob  dog     10
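As a self-contained sketch of the approach above: the sample data here is an assumption chosen to mirror the output shown, with a default (unique) integer index.

```python
import pandas as pd

# Illustrative data (assumed); the default RangeIndex 0..3 is unique.
df = pd.DataFrame({'name': ['bob', 'pete', 'fluffy', 'max'],
                   'type': ['dog', 'cat', 'dog', 'cat'],
                   'votes': [10, 8, 5, 9]})

# For each 'type', the index label of the row with the most votes.
labels = df.groupby('type').votes.agg('idxmax')

# Because the labels are unique, df.loc returns exactly one row per group.
result = df.loc[labels]
print(result)
```

Here `df.groupby('type')['votes'].idxmax()` should be an equivalent spelling of the `agg('idxmax')` call.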
If df.index has duplicate values, i.e. is not a unique index, then first make the index unique:
df = df.reset_index()
then use idxmax:
result = df.loc[df.groupby('type').votes.agg('idxmax')]
If you really need to, you can return df to its original state:
df = df.set_index(['index'], drop=True)
but overall life is much better with a unique index.
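The reset, select, restore sequence can be sketched end to end as follows. The sample data and the variable names `best` and `restored` are assumptions for illustration; the only thing relied on is that reset_index() stores the old labels of an unnamed index in a column called 'index'.

```python
import pandas as pd

# Illustrative data (assumed) with a non-unique index.
df = pd.DataFrame({'name': ['bob', 'pete', 'fluffy', 'max'],
                   'type': ['dog', 'cat', 'dog', 'cat'],
                   'votes': [10, 8, 5, 9]},
                  index=list('AABB'))

# Only reset when the index really contains duplicates.
if not df.index.is_unique:
    df = df.reset_index()  # old labels move into the 'index' column

# With a unique index, idxmax labels identify exactly one row per group.
best = df.loc[df.groupby('type').votes.agg('idxmax')]

# Optionally put the original labels back on both frames.
restored = df.set_index(['index'], drop=True)
restored.index.name = None  # the original index had no name
best = best.set_index(['index'], drop=True)
print(best)
```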
Here is an example showing what happens when df does not have a unique index. Suppose the index is ['A', 'A', 'B', 'B']:
    import pandas as pd
    df = pd.DataFrame({'name': ['bob', 'pete', 'fluffy', 'max'],
                       'type': ['dog', 'cat', 'dog', 'cat'],
                       'votes': [10, 8, 5, 9]},
                      index=list('AABB'))
    print(df)
idxmax returns the index labels A and B:
    print(df.groupby('type').votes.agg('idxmax'))
    type
    cat    B
    dog    A
    Name: votes, dtype: object
But A and B do not uniquely identify the desired rows. df.loc[...] returns all rows whose index label is A or B:
print(df.loc[df.groupby('type').votes.agg('idxmax')])
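To make the problem concrete, here is a small sketch (sample data assumed, as in the example above) that counts how many rows come back when the index has duplicates:

```python
import pandas as pd

# Illustrative data (assumed) with the duplicated index A, A, B, B.
df = pd.DataFrame({'name': ['bob', 'pete', 'fluffy', 'max'],
                   'type': ['dog', 'cat', 'dog', 'cat'],
                   'votes': [10, 8, 5, 9]},
                  index=list('AABB'))

labels = df.groupby('type').votes.agg('idxmax')  # one label per group
selected = df.loc[labels]                        # every row labelled A or B

# Two groups, yet every row of df is selected, because each label
# matches two rows.
print(len(labels), len(selected))
```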
In contrast, if we reset the index:
df = df.reset_index()
then idxmax returns unique integer labels, which df.loc can use to select the desired rows:
print(df.groupby('type').votes.agg('idxmax'))