Sort twice in pandas

Question


I have a DataFrame A with three columns: 'id', 'value' and 'date'. While looking at the rows for a single id, I noticed something strange: if I sort twice in a row, first by value and then by date, the resulting row order depends on whether I filter by id before or after sorting. Note the order of the rows with indices 42915 and 42916:

A.sort_values('value').sort_values('date')[A.sort_values('value').sort_values('date')['id'] == '0001249666']

               id     value      date
42913  0001249666  113845.0  20130408
42914  0001249666  114597.0  20130430
42916  0001249666  125972.0  20140414
42915  0001249666  125971.0  20140414
42917  0001249666  136154.0  20150410
42918  0001249666  145551.0  20160407
42919  0001249666  152911.0  20170413

A[A['id'] == '0001249666'].sort_values('value').sort_values('date')

               id     value      date
42913  0001249666  113845.0  20130408
42914  0001249666  114597.0  20130430
42915  0001249666  125971.0  20140414
42916  0001249666  125972.0  20140414
42917  0001249666  136154.0  20150410
42918  0001249666  145551.0  20160407
42919  0001249666  152911.0  20170413

If, however, I pass both columns to a single .sort_values call, the result is the same whether I filter before or after sorting:

A.sort_values(['date','value'])[A.sort_values(['date','value'])['id'] == '0001249666']

               id     value      date
42913  0001249666  113845.0  20130408
42914  0001249666  114597.0  20130430
42915  0001249666  125971.0  20140414
42916  0001249666  125972.0  20140414
42917  0001249666  136154.0  20150410
42918  0001249666  145551.0  20160407
42919  0001249666  152911.0  20170413

A[A['id'] == '0001249666'].sort_values(['date','value'])

               id     value      date
42913  0001249666  113845.0  20130408
42914  0001249666  114597.0  20130430
42915  0001249666  125971.0  20140414
42916  0001249666  125972.0  20140414
42917  0001249666  136154.0  20150410
42918  0001249666  145551.0  20160407
42919  0001249666  152911.0  20170413
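A small, self-contained way to check the two-column behavior is sketched below. The frame, its values and its index labels are made up for illustration; only the id and the tie on 20140414 are borrowed from the tables above. Because ties on date are broken by value inside a single sort call, filtering before or after sorting should give the same order.

```python
import pandas as pd

# Hypothetical toy data: two rows of id '0001249666' share the date 20140414,
# mirroring rows 42915/42916 above; rows 1 and 2 belong to some other id.
A = pd.DataFrame(
    {
        "id": ["0001249666", "0000000001", "0001249666", "0001249666", "0000000001"],
        "value": [125972.0, 1.0, 125971.0, 113845.0, 2.0],
        "date": [20140414, 20140414, 20140414, 20130408, 20130408],
    },
    index=[42916, 1, 42915, 42913, 2],
)

target_id = "0001249666"

# Sort the whole frame by (date, value), then keep one id.
sorted_then_filtered = A.sort_values(["date", "value"])
sorted_then_filtered = sorted_then_filtered[sorted_then_filtered["id"] == target_id]

# Keep one id first, then sort by (date, value).
filtered_then_sorted = A[A["id"] == target_id].sort_values(["date", "value"])

# Both print [42913, 42915, 42916]: ties on date are resolved by value.
print(sorted_then_filtered.index.tolist())
print(filtered_then_sorted.index.tolist())
```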

I know this is not the smartest way to do what I want, but I'm really interested in understanding what explains this behavior. What am I missing?


python sorting pandas

ollipolli Mar 05 '18 at 12:03


1 answer

jdehesa · Accepted Answer · 2018-03-05T12:10:33+0000

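The issue is that sorting is not stable by default: rows that compare equal on the sort key (here, rows with the same date) are not guaranteed to keep the relative order they had before (the order by value produced by the first sort). sort_values uses quicksort by default, which is not a stable algorithm; if you pass kind='mergesort', which is stable, the rows with equal dates keep the order given by the previous sort.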
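A minimal sketch of that fix, assuming a hypothetical random frame with many repeated dates (the names, sizes and seed are illustrative only): if both passes use kind='mergesort', the two-pass result should match a single sort on ['date', 'value'] row for row. According to the pandas documentation, kind is only applied when sorting on a single column, which is consistent with the two-column calls in the question never showing the problem.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data: only 30 distinct dates, so ties on 'date' are very common.
df = pd.DataFrame(
    {
        "value": rng.integers(0, 1000, n).astype(float),
        "date": rng.integers(20130101, 20130131, n),
    }
)

# Two passes: secondary key first, primary key last.
# The second pass must be stable so the value order survives within each date;
# the first pass is stable too, so rows tied on both keys keep their original order.
two_pass = (
    df.sort_values("value", kind="mergesort")
      .sort_values("date", kind="mergesort")
)

# One pass on both keys (pandas uses a stable lexicographic sort here).
one_pass = df.sort_values(["date", "value"])

assert two_pass.index.equals(one_pass.index)
```

Sorting once on both columns remains the simpler option, since it does not rely on the stability of the individual passes.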

, , .

