Comparing two or more rows in a pandas DataFrame

I have a dataframe that looks like this:

Reference |   ID  | Length
ref101    |123456 | 10
ref101    |123789 | 5
ref202    |654321 | 20
ref202    |653212 | 40

I am trying to determine, for each value in the Reference column, which row has the longest length (based on the value in the Length column). For example, ref101 with ID 123456 has a greater length than ref101 with ID 123789.

I experimented with .groupby(), but I'm not getting anywhere. Is there any way to perform such an operation in pandas?
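To try the answers below, the sample frame can be rebuilt from the table in the question. This is a minimal sketch; the column names and values are copied from the table, and storing ID as an integer column is an assumption.

```python
import pandas as pd

# Sample data copied from the table in the question (ID assumed to be an integer)
df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212],
    "Length": [10, 5, 20, 40],
})
print(df)
```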

4 answers

If you want the whole row, use groupby + idxmax:

df.loc[df.groupby('Reference').Length.idxmax()]

  Reference      ID  Length
0    ref101  123456      10
3    ref202  653212      40
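One caveat: idxmax returns only the first row holding the maximum, so if two rows in the same group tie for the longest Length, only one of them survives. If all tied rows are needed, a common alternative is to compare each row against its group maximum obtained with transform. This is a minimal sketch; the extra tied row with ID 111111 is hypothetical data added only to demonstrate the tie.

```python
import pandas as pd

df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212, 111111],  # 111111: hypothetical tied row
    "Length": [10, 5, 20, 40, 40],
})

# Broadcast each group's maximum Length back onto every row of that group
group_max = df.groupby("Reference")["Length"].transform("max")

# Keep every row whose Length equals its group's maximum (ties included)
longest_all = df[df["Length"] == group_max]

# For comparison: idxmax keeps only the first maximal row per group
longest_first = df.loc[df.groupby("Reference")["Length"].idxmax()]

print(longest_all)
print(longest_first)
```

With the tie, longest_all keeps both ref202 rows with Length 40, while the idxmax version keeps only the first of them.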

If you only need the length, groupby + max is enough:

df.groupby('Reference').Length.max()

Reference
ref101    10
ref202    40
Name: Length, dtype: int64

Using idxmax:

df.groupby('Reference').Length.idxmax()
Out[495]: 
Reference
ref101    0
ref202    3
Name: Length, dtype: int64

Or using nlargest:

df.groupby('Reference').Length.nlargest(1)
Out[496]: 
Reference   
ref101     0    10
ref202     3    40
Name: Length, dtype: int64
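As the output shows, nlargest returns a Series indexed by two levels: the Reference group and the original row label. Taking the second level gives row labels that can be passed back to .loc to recover the full rows. A minimal sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212],
    "Length": [10, 5, 20, 40],
})

# nlargest(1) returns a Series indexed by (Reference, original row label)
top = df.groupby("Reference")["Length"].nlargest(1)

# Level 1 of that MultiIndex holds the original row labels, usable with .loc
rows = df.loc[top.index.get_level_values(1)]
print(rows)
```

Changing nlargest(1) to nlargest(2) would return the two longest rows per Reference instead of one.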

Alternatively, sort and drop duplicates:

df = df.sort_values(['Reference', 'Length'], ascending=False).drop_duplicates(['Reference'])
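Note that ascending=False applies to both sort keys, so Reference is sorted descending too and ref202 comes out before ref101. Passing a list to ascending sorts each key independently. This minimal sketch keeps Reference ascending while still putting the longest Length first within each group, so drop_duplicates keeps the longest row:

```python
import pandas as pd

df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212],
    "Length": [10, 5, 20, 40],
})

# Reference ascending, Length descending: the longest row leads each group
ordered = df.sort_values(["Reference", "Length"], ascending=[True, False])

# drop_duplicates keeps the first occurrence per Reference (keep="first" is the default)
longest = ordered.drop_duplicates(subset=["Reference"], keep="first")
print(longest)
```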

Sort by Length with sort_values, then group by Reference and take the first row of each group with head(1):

result_df = df.sort_values('Length', ascending=False).groupby('Reference').head(1)
print(result_df)

Result:

  Reference      ID  Length
3    ref202  653212      40
0    ref101  123456      10
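As a quick sanity check, the row-returning approaches in this thread can be run side by side on the sample data. They return rows in different orders, so this minimal sketch sorts each result by index before comparing:

```python
import pandas as pd

df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212],
    "Length": [10, 5, 20, 40],
})

by_idxmax = df.loc[df.groupby("Reference")["Length"].idxmax()]
by_dedup = df.sort_values(["Reference", "Length"], ascending=False).drop_duplicates(["Reference"])
by_head = df.sort_values("Length", ascending=False).groupby("Reference").head(1)

# Normalize row order so only the selected rows are compared
results = [r.sort_index() for r in (by_idxmax, by_dedup, by_head)]
all_agree = all(r.equals(results[0]) for r in results)
print(all_agree)
print(results[0])
```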

Source: https://habr.com/ru/post/1695065/

