I have users (mostly Germans) who need to select items from a long list. I will implement autocomplete, but I also want to present the elements to them in the expected order. I asked a couple of users for typical strings to sort them and found that it was (basically) consistent. However, implementing this order is difficult:
user_expectation(l) " < @ 1 2 10 10abc A e é E Z
sorted(l) " 1 10 10abc 2 < @ A E Z e é
sorted(l, key=lambda w: w.lower()) " 1 10 10abc 2 < @ A e E Z é
ns.natsorted(l) 1 2 10 10abc " < @ A E Z e é
ns.natsorted(l, alg=ns.I) 1 2 10 10abc " < @ A E Z e é
ns.natsorted(l, alg=ns.LOCALE | ns.LF | ns.G) 1 2 10 10abc " < @ A E e Z é
ns.natsorted(l, alg=ns.LOCALE | ns.LF | ns.G), en 1 2 10 10abc < " @ A E e é Z
ns.natsorted(l, alg=ns.LOCALE | ns.LF | ns.G), de 1 2 10 10abc < " @ A E e é Z
ns.natsorted(l, alg=ns.LF | ns.G), de 1 2 10 10abc " < @ A e E Z é
Hence:
- Special characters first - the order does not matter if it is consistent.
- Further. Sort numbers numerically by matching prefix (from here ['1', '10', '2'])
- Characters (Latin1?)
- Without first accent
- Lower case before capital case (although this is probably not so important
- Accented / special later
the code
from __future__ import unicode_literals
import natsort as ns
import locale
def custom_print(name, l):
s = u"{:<50}".format(name)
for el in l:
s += u"{:<5}\t".format(el)
print(u"\t" + s.strip())
l = ['"', "<", "@", "1", "2", "10", "10abc", "A", "e", "é", "E", "Z"]
custom_print("user_expectation(l)", l)
custom_print("sorted(l)", sorted(l))
custom_print("sorted(l, key=lambda w: w.lower())",
sorted(l, key=lambda w: w.lower()))
custom_print("ns.natsorted(l)", ns.natsorted(l))
custom_print("ns.natsorted(l, alg=ns.I)", ns.natsorted(l, alg=ns.I))
custom_print("ns.natsorted(l, alg=ns.LOCALE | ns.LF | ns.G)",
ns.natsorted(l, alg=ns.LOCALE | ns.LF | ns.G))
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
custom_print("ns.natsorted(l, alg=ns.LOCALE | ns.LF | ns.G), en",
ns.natsorted(l, alg=ns.LOCALE | ns.LF | ns.G))
locale.setlocale(locale.LC_ALL, 'de_DE.UTF-8')
custom_print("ns.natsorted(l, alg=ns.LOCALE | ns.LF | ns.G), de",
ns.natsorted(l, alg=ns.LOCALE | ns.LF | ns.G))
custom_print("ns.natsorted(l, alg=ns.LF | ns.G), de",
ns.natsorted(l, alg=ns.LF | ns.G))
natsort IGNORECASE, LOWERCASEFIRST, LOCALE (de en), GROUP . , . ? ( LF, , )