Pandas sort with capital letters

Running this code:

import pandas as pd

df = pd.DataFrame(['ADc', 'Abc', 'AEc'], columns=['Test'], index=[0, 1, 2])
df.sort_values(by=['Test'], axis=0, ascending=False, inplace=True)

This returns a dataframe whose Test column is ordered as [Abc, AEc, ADc]. I expected ADc to come before AEc. What is happening?

2 answers

I don't think this is a pandas bug. It is how Python's own sorting treats mixed-case strings: the comparison is case-sensitive, so uppercase letters sort before lowercase ones.

You get the same order with a plain list:

In [1]: l1 = ['ADc','Abc','AEc']
In [2]: l1.sort(reverse=True)
In [3]: l1
Out[3]: ['Abc', 'AEc', 'ADc']
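
The ordering follows from the character codes: Python compares strings one code point at a time, and every uppercase ASCII letter has a lower code than every lowercase one. A minimal sketch in plain Python (the variable names are illustrative only):

```python
words = ['ADc', 'Abc', 'AEc']

# Uppercase ASCII letters (65-90) come before lowercase ones (97-122),
# so 'D' < 'E' < 'b' when the second characters are compared.
codes = {ch: ord(ch) for ch in 'DEb'}
print(codes)  # {'D': 68, 'E': 69, 'b': 98}

# Default (case-sensitive) descending sort, as in the question.
case_sensitive = sorted(words, reverse=True)
print(case_sensitive)  # ['Abc', 'AEc', 'ADc']

# Case-insensitive ascending sort: compare lowercased copies instead.
case_insensitive = sorted(words, key=str.lower)
print(case_insensitive)  # ['Abc', 'ADc', 'AEc']
```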

Since the pandas sort method does not seem to let you control how values are compared, sort on a lowercased copy of the column and drop that copy afterwards:

In [4]: df = pd.DataFrame(['ADc','Abc','AEc'],columns = ['Test'],index=[0,1,2])
In [5]: df['test'] = df['Test'].str.lower()
In [6]: df.sort_values(by=['test'], axis=0, ascending=True, inplace=True)
In [7]: df.drop('test', axis=1, inplace=True)
In [8]: df
Out[8]:
  Test
1  Abc
0  ADc
2  AEc

Note that this example sorts with ascending=True.
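
In newer pandas releases the temporary column may not be needed. The sketch below is not part of the original answer; it assumes pandas 1.1 or later, where sort_values accepts a key argument that is applied to the column before comparison.

```python
import pandas as pd

df = pd.DataFrame(['ADc', 'Abc', 'AEc'], columns=['Test'], index=[0, 1, 2])

# Assumption: pandas >= 1.1, where sort_values takes a `key` callable.
# The callable receives the column as a Series and must return a Series
# of the same length; here it lowercases the values used for comparison.
result = df.sort_values(by='Test', key=lambda col: col.str.lower())

print(result)
```

The original values are left untouched; only the comparison uses the lowercased copies.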

EDIT:

DSM suggested a shorter version that avoids the temporary column:

df = df.loc[df["Test"].str.lower().sort_values().index]
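
The same idea can be wrapped in a small helper. The name sort_case_insensitive is hypothetical, and the sketch uses Series.sort_values to obtain the row order:

```python
import pandas as pd


def sort_case_insensitive(df, column, ascending=True):
    """Return df with rows ordered by `column`, ignoring case (hypothetical helper)."""
    # Sort a lowercased copy of the column; its index records the row order.
    # A stable sort keeps rows that differ only by case in their original order.
    order = df[column].str.lower().sort_values(ascending=ascending, kind='mergesort').index
    # Select the rows in that order; assumes the index labels are unique.
    return df.loc[order]


df = pd.DataFrame(['ADc', 'Abc', 'AEc'], columns=['Test'], index=[0, 1, 2])
result = sort_case_insensitive(df, 'Test')
print(result)
```

Because the rows are selected by label, this sketch only behaves as intended when the index has no duplicate labels.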

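You can also use reindex, building on @Zero's answer. The example sorts by two columns, (SORT_INDEX1) and (SORT_INDEX2). Column (SORT_INDEX2) is sorted case-insensitively, while (SORT_INDEX1) uses the default sort.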

import pandas as pd

df = pd.DataFrame([['q', '1'],['a', '1'],['B', '1'],['C', '1'],
                   ['q', '0'],['a', '0'],['B', '0'],['C', '0']])

SORT_INDEX1 = 1
SORT_INDEX2 = 0

# Cannot change sorting algorithm used internally by pandas.
df_default = df.sort_values(by=[SORT_INDEX1, SORT_INDEX2])

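# Pair each index label with its value, sort the pairs case-insensitively,
# and keep only the index labels, now in sorted order.
sorted_index = [idx for idx, _ in sorted(zip(df.index, df[SORT_INDEX2]), key=lambda t: t[1].lower())]

# Reindex into that order, then sort by the other column. A stable sort
# (mergesort) keeps the case-insensitive order among equal values.
df_new = df.reindex(sorted_index).sort_values(by=SORT_INDEX1, kind='mergesort')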

print('Original dataframe:')
print(df)

print('Default case-sensitive sort:')
print(df_default)

print('Case-insensitive sort:')
print(df_new)

Output:

Original dataframe:
   0  1
0  q  1
1  a  1
2  B  1
3  C  1
4  q  0
5  a  0
6  B  0
7  C  0
Default case-sensitive sort:
   0  1
6  B  0
7  C  0
5  a  0
4  q  0
2  B  1
3  C  1
1  a  1
0  q  1
Case-insensitive sort:
   0  1
5  a  0
6  B  0
7  C  0
4  q  0
1  a  1
2  B  1
3  C  1
0  q  1
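
For comparison, here is a sketch of the same two-column sort using the key argument of sort_values. It is not part of the original answer and assumes pandas 1.1 or later; the key callable is applied to each sort column in turn.

```python
import pandas as pd

df = pd.DataFrame([['q', '1'], ['a', '1'], ['B', '1'], ['C', '1'],
                   ['q', '0'], ['a', '0'], ['B', '0'], ['C', '0']])

# Assumption: pandas >= 1.1. The key callable is applied to each column in `by`
# separately, so both columns must hold strings for .str.lower() to work.
df_key = df.sort_values(by=[1, 0], key=lambda col: col.str.lower())

print(df_key)
```

With this data the rows come out in the same order as the case-insensitive result above.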


Source: https://habr.com/ru/post/1584914/

