Why is this key class for sorting heterogeneous sequences behaving strangely?

Question

Why is this key class for sorting heterogeneous sequences behaving strangely?

Python 3.x's sorted sorted() function cannot be used to sort heterogeneous sequences, since most pairs of different types are unordered (numeric types such as int , float , decimal.Decimal , etc. are decimal.Decimal exceptions):

 Python 3.4.2 (default, Oct 8 2014, 08:07:42) [GCC 4.8.2] on linux Type "help", "copyright", "credits" or "license" for more information. >>> sorted(["one", 2.3, "four", -5]) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: unorderable types: float() < str()

In contrast, comparisons between objects that do not have a natural order are arbitrary but consistent in Python 2.x, so sorted() works:

 Python 2.7.8 (default, Aug 8 2014, 14:55:30) [GCC 4.8.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> sorted(["one", 2.3, "four", -5]) [-5, 2.3, 'four', 'one']

To replicate the behavior of Python 2.x in Python 3.x, I wrote a class to use as the key parameter for sorted() , which is based on the fact that sorted() guaranteed to use only less than comparisons:

 class motley: def __init__(self, value): self.value = value def __lt__(self, other): try: return self.value < other.value except TypeError: return repr(type(self.value)) < repr(type(other.value))

Usage example:

 >>> sorted(["one", 2.3, "four", -5], key=motley) [-5, 2.3, 'four', 'one']

So far, so good.

However, I noticed an amazing behavior when sorted(s, key=motley) is called with specific sequences containing complex numbers:

 >>> sorted([0.0, 1, (1+0j), False, (2+3j)], key=motley) [(1+0j), 0.0, False, (2+3j), 1]

I would expect that 0.0 , False and 1 would be in one group (because they are mutually ordered), and (1+0j) and (2+3j) in another (because they are of the same type). The fact that complex numbers in this result are not only separated from each other, but also one of them is in the middle of a group of objects that are comparable with each other, but not with it, is somewhat puzzling.

What's going on here?

+8

python sorting python-3.x python-2.x order complex-numbers

Zero Piraeus Oct. 25 '14 at 21:55

source share

1 answer

BrenBarn · Accepted Answer · 2014-10-25 22:12

You do not know in what order comparisons are made or even which elements are compared, which means that you cannot know exactly what effect your __lt__ will have. Your specific __lt__ sometimes depends on actual values, and sometimes on string representations of types, but both versions can be used for the same object during sorting. This means that your order is not determined solely by the objects in the list, but may also depend on their initial order. This, in turn, means that just because the objects are mutually comparable does not mean that they will be sorted together; they can be “blocked” by an incomparable object between them.

You can get an idea of what is going on by putting some debugging prints to see what it compares:

 class motley: def __init__(self, value): self.value = value def __lt__(self, other): fallback = False try: result = self.value < other.value except TypeError: fallback = True result = repr(type(self.value)) < repr(type(other.value)) symbol = "<" if result else ">" print(self.value, symbol, other.value, end="") if fallback: print(" -- because", repr(type(self.value)), symbol, repr(type(other.value))) else: print() return result

Then:

 >>> sorted([0.0, 1, (1+0j), False, (2+3j)], key=motley) 1 > 0.0 (1+0j) < 1 -- because <class 'complex'> < <class 'int'> (1+0j) < 1 -- because <class 'complex'> < <class 'int'> (1+0j) < 0.0 -- because <class 'complex'> < <class 'float'> False > 0.0 False < 1 (2+3j) > False -- because <class 'complex'> > <class 'bool'> (2+3j) < 1 -- because <class 'complex'> < <class 'int'> [(1+0j), 0.0, False, (2+3j), 1]

You can see, for example, that type-based ordering is used to compare a complex number with 1, but not to compare 1 and 0. Similarly, 0.0 < False for "normal" reasons, but 2+3j > False for a type based on the name.

As a result, it sorts 1+0j to the beginning, and then leaves 2+3j , where it is above False. He never tries to compare two complex numbers with each other, and the only element with which they are compared is 1.

More generally, your approach may result in persistent ordering with the appropriate choice for type names. For example, if you define classes A, B and C such that A and C can be compared, but they cause exceptions compared to B, then by creating objects a , b and c (from the corresponding classes), such that c < a , you can create a loop a < b < c < a . a < b < c will be true because classes will be compared based on their names, but c < a , because these types can be directly mapped. With an intransitive order, there is no hope of a “correct” sort order.

You can even do this with built-in types, although it takes a bit of creativity to create one to think of objects whose type names are in the correct alphabetical sequence:

 >>> motley(1.0) < motley(lambda: 1) < motley(0) < motley(1.0) True

(Because 'float' < 'function'

Why is this key class for sorting heterogeneous sequences behaving strangely?

More articles: