Groupby: return the full row where the max occurs

How do I get the full row of data for a groupby result?

df
   a   b   c  d   e
0  a  25  12  1  20
1  a  15   1  1   1
2  b  12   1  1   1
3  n  25   2  3   3

In [4]: df = pd.read_clipboard()

In [5]: df.groupby('a')['b'].max()
Out[5]: 
a
a    25
b    12
n    25
Name: b, dtype: int64

How do I get the full row?

a   b   c  d   e
a  25  12  1  20
b  12   1  1   1
n  25   2  3   3

I tried to filter with df[df.e == df.groupby('a')['b'].max()], but the sizes are different :(

Initial data:

0          1       2        3     4        5     6      7       8    9   
EVE00101  Trial  DRY RUN  PASS  1610071  1610071  Y  20140808  NaN  29   

10        11                12           13                 14  
FF1  ./ff1.sh  Event Validation  Hive Tables  2015-11-30 9:40:34 

groupby([1,7])[14].max() gives me the result, but only with the grouping columns 1 and 7 as the index; I need the corresponding columns of each row as well. The data has 15,000 rows; I have provided 1 row as a sample.
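For the multi-column case described above, one possible approach is to ask each group for the index label of its maximum and then select those rows. The sketch below is only an illustration: the three rows are hypothetical stand-ins for the real data (only columns 0, 1, 7 and 14 are reproduced, and the extra IDs and timestamps are made up), and it assumes column 14 can be parsed as datetimes.

```python
import pandas as pd

# Hypothetical stand-in for the real 15,000-row data set.
# Integer column labels mirror the question; values other than the first row are invented.
df = pd.DataFrame({
    0: ["EVE00101", "EVE00102", "EVE00103"],
    1: ["Trial", "Trial", "Final"],
    7: [20140808, 20140808, 20140809],
    14: pd.to_datetime([
        "2015-11-30 09:40:34",
        "2015-11-30 10:15:00",
        "2015-12-01 08:00:00",
    ]),
})

# For each (1, 7) group, idxmax() gives the index label of the row
# holding the largest value in column 14.
latest_idx = df.groupby([1, 7])[14].idxmax()

# Selecting those labels with .loc returns the complete rows.
latest_rows = df.loc[latest_idx]
print(latest_rows)
```

Because the selection goes through index labels, this assumes the index of df is unique; calling reset_index(drop=True) first is one way to guarantee that.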

3 answers

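You can use idxmax() (in current pandas, argmax() returns an integer position rather than an index label, so its result should not be passed to .loc):

In [287]: df.groupby('a', as_index=False).apply(lambda x: x.loc[x.b.idxmax()])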
Out[287]:
   a   b   c  d   e
0  a  25  12  1  20
1  b  12   1  1   1
2  n  25   2  3   3

This way it works even if b is not the largest.
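The same idea can also be written without apply. This is a minimal sketch that rebuilds the sample frame from the question by hand (instead of pd.read_clipboard()) so that it runs on its own; the names idx and result are just illustrative.

```python
import pandas as pd

# Sample data from the question, typed in directly
df = pd.DataFrame({
    "a": ["a", "a", "b", "n"],
    "b": [25, 15, 12, 25],
    "c": [12, 1, 1, 2],
    "d": [1, 1, 1, 3],
    "e": [20, 1, 1, 3],
})

# Index label of the row with the largest "b" inside each group of "a"
idx = df.groupby("a")["b"].idxmax()

# Pull the complete rows for those labels
result = df.loc[idx]
print(result)
```

If a group contains several rows tied for the maximum, idxmax() reports only the first of them, so exactly one row per group is returned.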


Overwrite 'b' with the group maximum using transform, then call drop_duplicates on 'a':

In [331]:
df['b'] = df.groupby('a')['b'].transform('max')
df

Out[331]:
   a   b   c  d   e
0  a  25  12  1  20
1  a  25   1  1   1
2  b  12   1  1   1
3  n  25   2  3   3

In [332]:    
df.drop_duplicates('a')

Out[332]:
   a   b   c  d   e
0  a  25  12  1  20
2  b  12   1  1   1
3  n  25   2  3   3
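Note that drop_duplicates('a') keeps the first row of each group, so the approach above returns the right row only when the maximum happens to come first. A possible variant, sketched here under the assumption that ties should all be kept, uses the transform result as a boolean mask instead of overwriting 'b'. The reordered frame is a hypothetical rearrangement of the question's sample, chosen so that the maximum of group 'a' is not in the first row.

```python
import pandas as pd

# Question's sample data with the two "a" rows swapped (hypothetical ordering)
df = pd.DataFrame({
    "a": ["a", "a", "b", "n"],
    "b": [15, 25, 12, 25],
    "c": [1, 12, 1, 2],
    "d": [1, 1, 1, 3],
    "e": [1, 20, 1, 3],
})

# transform("max") broadcasts each group's maximum back to every row,
# so the comparison has the same length as df
mask = df["b"] == df.groupby("a")["b"].transform("max")

# Keep only the rows whose "b" equals their group's maximum
result = df[mask]
print(result)
```

Because the mask has the same length as the frame, this also avoids the size mismatch from the question's filtering attempt.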

You can simply omit the ['b'] selection and get the whole DataFrame back:

In [41]: df.groupby('a').max()
Out[41]:
    b   c  d   e
a
a  25  12  1  20
b  12   1  1   1
n  25   2  3   3
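One caveat worth checking before relying on this: groupby(...).max() takes the maximum of every column independently, so the values in one output row can come from different input rows. The sketch below uses a small hypothetical frame (not the question's data) where that happens, and compares it with selecting whole rows through idxmax().

```python
import pandas as pd

# Hypothetical data: in group "a" the largest "b" and the largest "c"
# sit in different rows
df = pd.DataFrame({
    "a": ["a", "a", "b"],
    "b": [25, 15, 12],
    "c": [1, 99, 1],
})

# Column-wise maximum: group "a" gets b=25 (row 0) and c=99 (row 1)
per_column = df.groupby("a").max()
print(per_column)

# Whole rows at the position of the largest "b": group "a" keeps c=1
full_rows = df.loc[df.groupby("a")["b"].idxmax()]
print(full_rows)
```

On the question's sample data both results agree, because the row with the largest b in group 'a' also has the largest c, d and e.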

Source: https://habr.com/ru/post/1619479/

