Comparing two or more rows in a Pandas DataFrame

Question


I have a dataframe that looks like this:

Reference | ID     | Length
ref101    | 123456 | 10
ref101    | 123789 | 5
ref202    | 654321 | 20
ref202    | 653212 | 40

For each value in the Reference column, I am trying to determine which row has the greatest value in the Length column. For example, ref101 with ID 123456 has a greater Length than ref101 with ID 123789.

I played with .groupby(), but I'm not getting anywhere. Is there any way to perform such an operation in Pandas?


python pandas

DanielH Mar 19 '18 at 16:01


4 answers

Use idxmax:

df.groupby('Reference').Length.idxmax()
Out[495]: 
Reference
ref101    0
ref202    3
Name: Length, dtype: int64

or nlargest:

df.groupby('Reference').Length.nlargest(1)
Out[496]: 
Reference   
ref101     0    10
ref202     3    40
Name: Length, dtype: int64
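
For anyone who wants to run the two calls above, here is a minimal self-contained sketch. It assumes the frame from the question is rebuilt by hand, and the names idx and top are only illustrative.

```python
import pandas as pd

# Rebuild the frame from the question (values copied from the post).
df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212],
    "Length": [10, 5, 20, 40],
})

# idxmax gives, per Reference, the index label of the row with the largest Length.
idx = df.groupby("Reference")["Length"].idxmax()

# nlargest(1) gives the largest Length per Reference,
# keyed by (Reference, original index label).
top = df.groupby("Reference")["Length"].nlargest(1)

print(idx)
print(top)
```

Note that idxmax returns index labels rather than rows, so it still has to be passed to df.loc if the full rows are needed.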


Wen Mar 19 '18 at 16:06

Sort in descending order, then drop the duplicate references:

df = df.sort_values(['Reference', 'Length'], ascending=False).drop_duplicates(['Reference'])
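
A runnable sketch of the sort-then-drop_duplicates approach, assuming the frame from the question. The second variant is only a guess at what may be preferred: it keeps Reference in ascending order while still taking the longest row per group. The names result and result_sorted are illustrative.

```python
import pandas as pd

# Frame from the question.
df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212],
    "Length": [10, 5, 20, 40],
})

# Both keys descending: the first row seen for each Reference is its longest one.
# drop_duplicates keeps the first occurrence by default.
result = (
    df.sort_values(["Reference", "Length"], ascending=False)
      .drop_duplicates(["Reference"])
)

# Variant (assumption): Reference ascending, Length descending,
# so the output stays in Reference order.
result_sorted = (
    df.sort_values(["Reference", "Length"], ascending=[True, False])
      .drop_duplicates("Reference")
)

print(result)
print(result_sorted)
```

With ascending=False applied to both keys, ref202 comes out before ref101. Passing a list to ascending controls each key separately.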


zipa Mar 19 '18 at 16:10

Use sort_values on Length in descending order, then groupby on Reference and take head(1):

result_df = df.sort_values('Length', ascending=False).groupby('Reference').head(1)
print(result_df)

Result:

  Reference      ID  Length
3    ref202  653212      40
0    ref101  123456      10
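
The same idea as a self-contained sketch, assuming the frame from the question. It also shows that head(n) with a larger n returns the top n rows per group, which idxmax cannot do directly. The names by_length, top1 and top2 are illustrative.

```python
import pandas as pd

# Frame from the question.
df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212],
    "Length": [10, 5, 20, 40],
})

# Sort once so the longest rows come first.
by_length = df.sort_values("Length", ascending=False)

# head(1) keeps the first (longest) row of every Reference group.
top1 = by_length.groupby("Reference").head(1)

# head(2) keeps the two longest rows per group, still in sorted order.
top2 = by_length.groupby("Reference").head(2)

print(top1)
print(top2)
```

groupby(...).head(n) keeps the rows in the order of the sorted frame, which is why ref202 appears first in the result.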


0p3n5ourcE Mar 19 '18 at 16:19


cᴏʟᴅsᴘᴇᴇᴅ · Accepted Answer · 2018-03-19T16:06:26+0000

If you want the whole row, use groupby + idxmax:

df.loc[df.groupby('Reference').Length.idxmax()]

  Reference      ID  Length
0    ref101  123456      10
3    ref202  653212      40

If you only need the length, then groupby + max will be enough:

df.groupby('Reference').Length.max()

Reference
ref101    10
ref202    40
Name: Length, dtype: int64
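
Both snippets can be tried end to end with the sketch below, which assumes the frame from the question. The last part is not from the answer: it is one possible way to keep every row that ties for the maximum, since idxmax only returns the first occurrence per group. The names longest_rows, longest_lengths and with_ties are illustrative.

```python
import pandas as pd

# Frame from the question.
df = pd.DataFrame({
    "Reference": ["ref101", "ref101", "ref202", "ref202"],
    "ID": [123456, 123789, 654321, 653212],
    "Length": [10, 5, 20, 40],
})

# Whole rows: look up the index labels returned by idxmax.
longest_rows = df.loc[df.groupby("Reference")["Length"].idxmax()]

# Only the values: the maximum Length per Reference.
longest_lengths = df.groupby("Reference")["Length"].max()

# Variant (assumption, not part of the answer): keep all rows equal to the
# group maximum, so ties are not dropped.
group_max = df.groupby("Reference")["Length"].transform("max")
with_ties = df[df["Length"] == group_max]

print(longest_rows)
print(longest_lengths)
print(with_ties)
```

On this data there are no ties, so with_ties and longest_rows contain the same two rows.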
