You can use the cmp parameter to sort by a key in Python 2.3 or earlier. But key-style sorting is sometimes easier to read, and in any case it does less work, since cmp will be called O(n log n) times while the key function will be called only O(n) times.
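The difference in call counts is easy to check empirically. The following is only an illustrative sketch: it assumes a modern Python where functools.cmp_to_key exists (it does not in Python 2.3), and the names counting_cmp and counting_key are made up for the demonstration.

```python
import random
from functools import cmp_to_key  # assumption: Python 2.7+/3.2+, not available in 2.3

calls = {'cmp': 0, 'key': 0}

def counting_cmp(p, q):
    # Old-style comparison function: negative, zero, or positive result.
    calls['cmp'] += 1
    return (p > q) - (p < q)

def counting_key(x):
    # Key function: invoked once per element.
    calls['key'] += 1
    return x

random.seed(0)
data = [random.random() for _ in range(1000)]

by_cmp = sorted(data, key=cmp_to_key(counting_cmp))
by_key = sorted(data, key=counting_key)

print(calls)
```

On random input like this, the cmp count should come out several times larger than the key count, which stays at exactly one call per element.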
With that in mind, here is a way to reproduce the behavior of the key parameter from later versions of Python. It uses the decorate-sort-undecorate idiom, a.k.a. the Schwartzian Transform. It is not as space-efficient because it makes copies, but for large lists it is likely to be more time-efficient. I named it sorted because it roughly reproduces the sorted function added in 2.4; check the Python version and define it conditionally so that you don't clobber the built-in sorted in newer versions, or just rename it.
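    def sorted(seq, key=lambda x: None, reverse=False):
        # Decorate: pair each item with its key and its original index.
        # Note: with the default key every key is None, so the index decides
        # and the original order is kept.
        seq = [(key(x), i, x) for i, x in enumerate(seq)]
        seq.sort()
        if reverse:
            seq.reverse()
        # Undecorate: strip the key and index again.
        return [x for k, i, x in seq]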
Note that enumerate is only necessary if you care about stable sorting of unequal values with equal keys; it slows the function down by a hair. Tested on your data (with the function renamed to my_sorted):
    >>> key=lambda x: (x.count('YES'), x.count('MAYBE'), x.count('NO'))
    >>> my_sorted(mylist, key=key, reverse=True)
    [['ITEM C', 'YES', 'YES', 'YES', 'YES', 'NO', 'NO', 'MAYBE', 'NO', 'MAYBE'],
     ['ITEM B', 'YES', 'NO', 'YES', 'YES', 'NO', 'NO', 'NO', 'NO', 'MAYBE'],
     ['ITEM A', 'YES', 'NO', 'YES', 'YES', 'NO', 'NO', 'NO', 'NO', 'NO']]
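To see what the index from enumerate buys you, here is a small sketch comparing the decorated sort with and without it. The helper names and the sample data are invented for illustration.

```python
def decorate_with_index(seq, key):
    # (key, index, item): ties on key are broken by original position.
    decorated = [(key(x), i, x) for i, x in enumerate(seq)]
    decorated.sort()
    return [x for k, i, x in decorated]

def decorate_without_index(seq, key):
    # (key, item): ties on key are broken by comparing the items themselves.
    decorated = [(key(x), x) for x in seq]
    decorated.sort()
    return [x for k, x in decorated]

items = [('b', 1), ('a', 1), ('c', 0)]
second = lambda pair: pair[1]

print(decorate_with_index(items, second))     # ties keep their input order
print(decorate_without_index(items, second))  # ties are ordered by the items
```

The first call keeps ('b', 1) ahead of ('a', 1) because that is the input order; the second puts ('a', 1) first because the tuples themselves are compared.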
You can also use a dictionary to do the counting; that way, only one pass is required. However, count is optimized enough that three passes are still faster than one Python for loop, at least on my machine. So use this only if you need to count a larger number of values. I'll leave it here for posterity:
    def my_key(inner_list):
        counts = {'YES':0, 'MAYBE':0, 'NO':0}
        for i in inner_list:
            if i in counts:
                counts[i] += 1
        return (counts['YES'], counts['MAYBE'], counts['NO'])
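As a quick sanity check, the two key styles should produce the same tuple for a given row. The sketch below uses the names key_count and key_dict from the benchmark definitions later in this post, and one row of the sample data.

```python
def key_count(row):
    # Three passes over the row, one per value, each done by list.count.
    return (row.count('YES'), row.count('MAYBE'), row.count('NO'))

def key_dict(row):
    # One pass over the row, tallying into a dictionary.
    counts = {'YES': 0, 'MAYBE': 0, 'NO': 0}
    for item in row:
        if item in counts:
            counts[item] += 1
    return (counts['YES'], counts['MAYBE'], counts['NO'])

row = ['ITEM C', 'YES', 'YES', 'YES', 'YES', 'NO', 'NO', 'MAYBE', 'NO', 'MAYBE']
print(key_count(row), key_dict(row))
```

Both return (4, 2, 3) for this row; the 'ITEM C' label is ignored by either approach.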
I did some testing; apologies for the long post. What follows is only for the curious.
My tests show that, even on a small list, decorate-sort-undecorate is already faster than using the built-in sort with cmp. On a larger list, the difference becomes more dramatic. Definitions:
    def key_count(x):
        return (x.count('YES'), x.count('MAYBE'), x.count('NO'))

    def key_dict(inner_list):
        counts = {'YES':0, 'MAYBE':0, 'NO':0}
        for i in inner_list:
            if i in counts:
                counts[i] += 1
        return (counts['YES'], counts['MAYBE'], counts['NO'])

    def decorate_sort(seq, key=lambda x: None, reverse=False):
        seq = [(key(x), i, x) for i, x in enumerate(seq)]
        seq.sort()
        if reverse:
            seq.reverse()
        return [x for k, i, x in seq]

    def builtin_sort(seq, key, reverse=False):
        seq.sort(lambda p, q: cmp(key(p), key(q)))
        if reverse:
            seq.reverse()
Tests:
    >>> mylist = [
    ...     ['ITEM A', 'YES', 'NO', 'YES', 'YES', 'NO', 'NO', 'NO', 'NO', 'NO'],
    ...     ['ITEM B', 'YES', 'NO', 'YES', 'YES', 'NO', 'NO', 'NO', 'NO', 'MAYBE'],
    ...     ['ITEM C', 'YES', 'YES', 'YES', 'YES', 'NO', 'NO', 'MAYBE', 'NO', 'MAYBE']
    ... ]
    >>> %timeit decorate_sort(mylist, key=key_count, reverse=True)
    100000 loops, best of 3: 5.03 us per loop
    >>> %timeit builtin_sort(mylist, key=key_count, reverse=True)
    100000 loops, best of 3: 5.28 us per loop
The built-in version is already slower! The less-generalized version, mylist.sort(lambda p, q: -cmp(key(p), key(q))), is better for a short list by a hair, because of the addition of enumerate to decorate_sort; without it, decorate_sort is faster (4.28 us per loop in my previous test):
    >>> %timeit mylist.sort(lambda p, q: -cmp(key_count(p), key_count(q)))
    100000 loops, best of 3: 4.74 us per loop
Using key_dict is a mistake in this case, though:
    >>> %timeit decorate_sort(mylist, key=key_dict, reverse=True)
    100000 loops, best of 3: 8.97 us per loop
    >>> %timeit builtin_sort(mylist, key=key_dict, reverse=True)
    100000 loops, best of 3: 11.4 us per loop
Testing on a larger list, basically the same results hold:
    >>> import random
    >>> mylist = [[random.choice(('YES', 'MAYBE', 'NO')) for _ in range(1000)] for _ in range(100)]
    >>> %timeit decorate_sort(mylist, key=key_count, reverse=True)
    100 loops, best of 3: 6.93 ms per loop
    >>> %timeit builtin_sort(mylist, key=key_count, reverse=True)
    10 loops, best of 3: 34.5 ms per loop
The less-generalized version is now slower than decorate_sort.
    >>> %timeit mylist.sort(lambda p, q: -cmp(key_count(p), key_count(q)))
    100 loops, best of 3: 13.5 ms per loop
And key_dict is still slower. (But faster than builtin_sort!)
    >>> %timeit decorate_sort(mylist, key=key_dict, reverse=True)
    10 loops, best of 3: 20.4 ms per loop
    >>> %timeit builtin_sort(mylist, key=key_dict, reverse=True)
    10 loops, best of 3: 103 ms per loop
So the upshot is that the Schwartzian transform provides a solution that is both faster and more generalized, a rare and wonderful combination.
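If you want to rerun the comparison without IPython, a rough harness using the standard timeit module might look like the sketch below. It is written for a modern Python, so the cmp-based sort is emulated with functools.cmp_to_key (an assumption; it is not the Python 2 cmp code path), and cmp_sort is a hypothetical stand-in for builtin_sort. Expect the absolute numbers to differ from the ones above.

```python
import random
import timeit
from functools import cmp_to_key  # assumption: stands in for the old cmp argument

def key_count(x):
    return (x.count('YES'), x.count('MAYBE'), x.count('NO'))

def decorate_sort(seq, key, reverse=False):
    # Key is computed once per item, then plain tuple sorting does the rest.
    decorated = [(key(x), i, x) for i, x in enumerate(seq)]
    decorated.sort()
    if reverse:
        decorated.reverse()
    return [x for k, i, x in decorated]

def cmp_sort(seq, key, reverse=False):
    # Emulates a cmp-style sort: key is recomputed on both sides of every comparison.
    def compare(p, q):
        kp, kq = key(p), key(q)
        return (kp > kq) - (kp < kq)
    result = sorted(seq, key=cmp_to_key(compare))
    if reverse:
        result.reverse()
    return result

random.seed(0)
mylist = [[random.choice(('YES', 'MAYBE', 'NO')) for _ in range(1000)]
          for _ in range(100)]

# Best of 3 runs, 10 sorts per run, as a rough analogue of %timeit.
t_decorate = min(timeit.repeat(lambda: decorate_sort(mylist, key_count, True),
                               number=10, repeat=3))
t_cmp = min(timeit.repeat(lambda: cmp_sort(mylist, key_count, True),
                          number=10, repeat=3))
print('decorate_sort: %.4f s, cmp_sort: %.4f s' % (t_decorate, t_cmp))
```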