Pandas: how to sort data by column AND by index

Given a DataFrame:

import pandas as pd
df = pd.DataFrame([6, 4, 2, 4, 5], index=[2, 6, 3, 4, 5], columns=['A'])

Results in:

   A
2  6
6  4
3  2
4  4
5  5

Now, I would like to sort by column A and index values.

For example:

df.sort_values(by='A')

Returns

   A
3  2
6  4
4  4
5  5
2  6

While I would like

   A
3  2
4  4
6  4
5  5
2  6

How can I sort by column A first and then by the index to break ties?

2 answers

Using lexsort from numpy is another way, and in the timing below on this small frame it is faster than resetting the index:

import numpy as np
df.iloc[np.lexsort((df.index, df.A.values))] # Sort by A.values, then by index

Result:

   A
3  2
4  4
6  4
5  5
2  6
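The order of the keys passed to lexsort is easy to get backwards, so here is a minimal sketch of the same call spelled out step by step. np.lexsort treats the last key in the tuple as the primary sort key, which is why the index comes first and column A comes last. The variable names order and result are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([6, 4, 2, 4, 5], index=[2, 6, 3, 4, 5], columns=['A'])

# np.lexsort uses the LAST key as the primary key, so the tie-breaker
# (the index) is listed first and the main key (column A) is listed last.
order = np.lexsort((df.index.to_numpy(), df['A'].to_numpy()))

# order holds row positions, so it is applied with iloc, not loc.
result = df.iloc[order]
print(result)
```

Because lexsort returns positions rather than labels, iloc is the right indexer here even though the index is made of integers.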

Comparison with timeit:

%%timeit
df.iloc[np.lexsort((df.index, df.A.values))] # Sort by A.values, then by index

Result:

1000 loops, best of 3: 278 µs per loop

Resetting the index, sorting on both columns, and setting the index again:

%%timeit
df.reset_index().sort_values(by=['A','index']).set_index('index')

Result:

100 loops, best of 3: 2.09 ms per loop
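The %%timeit magic only works in IPython or Jupyter. As a rough sketch, assuming a plain Python script, the two approaches can be wrapped in functions (with_lexsort and with_reset_index are hypothetical names) and compared with the standard timeit module. The absolute numbers depend on the machine, the pandas version, and the size of the frame, so they will not match the figures above.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame([6, 4, 2, 4, 5], index=[2, 6, 3, 4, 5], columns=['A'])


def with_lexsort(frame):
    # Primary key is column A (last in the tuple), tie-breaker is the index.
    order = np.lexsort((frame.index.to_numpy(), frame['A'].to_numpy()))
    return frame.iloc[order]


def with_reset_index(frame):
    # An unnamed index becomes a column called 'index' after reset_index().
    out = frame.reset_index().sort_values(by=['A', 'index']).set_index('index')
    # Drop the index name again so the result looks like the original frame.
    return out.rename_axis(None)


# Both approaches should produce the same ordering.
a = with_lexsort(df)
b = with_reset_index(df)
print(a.index.tolist() == b.index.tolist())

# Time 100 calls of each; results vary between machines and versions.
print('lexsort:     ', timeit.timeit(lambda: with_lexsort(df), number=100))
print('reset_index: ', timeit.timeit(lambda: with_reset_index(df), number=100))
```

On a frame with only five rows the timings mostly measure fixed overhead, so it is worth repeating the comparison on data of realistic size before picking one approach for speed.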

You can sort by index first and then by column A using kind='mergesort'.

This works because mergesort is stable: rows with equal values in A keep the relative order they had after the index sort.

res = df.sort_index().sort_values('A', kind='mergesort')

Result:

   A
3  2
4  4
6  4
5  5
2  6
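To make the role of stability visible, the two steps can be run separately. This is a small sketch on the frame from the question; by_index is just an illustrative intermediate name.

```python
import pandas as pd

df = pd.DataFrame([6, 4, 2, 4, 5], index=[2, 6, 3, 4, 5], columns=['A'])

# Step 1: sort by the tie-breaker (the index).
by_index = df.sort_index()
print(by_index)

# Step 2: stable sort by the primary key. Rows with equal A (index 4 and 6)
# keep the order from step 1, so 4 stays ahead of 6.
res = by_index.sort_values('A', kind='mergesort')
print(res)
```

The secondary key is sorted first and the primary key last. Leaving out kind='mergesort' falls back to the default quicksort, which does not guarantee that ties keep their earlier order.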

Source: https://habr.com/ru/post/1695036/

