The short answer to your question (about why adding NaN keys to a Python dict creates multiple entries) is that floating-point NaNs are unordered, i.e. a NaN value is not equal to, greater than, or less than anything, including itself. This behavior is defined in the IEEE 754 standard for floating-point arithmetic. An explanation of why this is so is given by an IEEE 754 committee member in this answer.
For a longer, Python-specific answer, we'll first look at how item insertion and key comparison work in CPython dictionaries.
When you write d[key] = val, PyDict_SetItem() is called for the dictionary d, which in turn calls the internal insertdict(), which either updates an existing dictionary entry or inserts a new one (possibly resizing the hash table as a result).
The first step of an insertion is to look up the key in the hash table of dictionary keys. The general-purpose lookup function that is called in your case (because the keys are not strings) is lookdict().
lookdict uses the key's hash value to locate the key, iterating over the candidate keys with the same hash value and comparing them first by address, then by calling the keys' equality operator (see the excellent comments in Objects/dictobject.c for more on how Python's open-addressing implementation resolves hash collisions).
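The per-candidate test described above can be modelled in a few lines of Python. This is only a simplified sketch of the comparison order (identity first, then hash plus equality), not the actual C code; the helper name keys_match is hypothetical and does not exist in CPython.

```python
def keys_match(stored_key, stored_hash, key, key_hash):
    """Simplified model of how one candidate slot is tested (hypothetical helper)."""
    # 1. Identity (address) check: the very same object always matches.
    if stored_key is key:
        return True
    # 2. Otherwise the hashes must agree and the keys must compare equal.
    return stored_hash == key_hash and stored_key == key


nan1 = float('nan')
nan2 = float('nan')

# The same NaN object matches itself through the identity shortcut.
same_object = keys_match(nan1, hash(nan1), nan1, hash(nan1))
# Two distinct NaN objects never match, because nan1 == nan2 is False.
different_objects = keys_match(nan1, hash(nan1), nan2, hash(nan2))

print(same_object, different_objects)
```

The identity shortcut is what lets a single NaN object be found again even though it does not compare equal to itself.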
Since each float('nan') has the same hash value (in the CPython version shown here; from Python 3.10 on, the hash of a NaN depends on the object's identity), yet each one is a distinct object (with a different identity, that is, memory address), and they do not compare equal by their float values:
>>> a, b = float('nan'), float('nan')
>>> hash(a), hash(b)
(0, 0)
>>> id(a), id(b)
(94753433907296, 94753433907272)
>>> a == b
False
when you write:
d = dict()
d[float('nan')] = 1
d[float('nan')] = 2
lookdict will search for the second NaN by its hash (0), then try to resolve the hash collision by iterating over the keys with the same hash and comparing them by identity/address (they differ), and then by calling the (expensive) PyObject_RichCompareBool / do_richcompare, which in turn calls float_richcompare, which compares floats the same way C does:
/* Comparison is pretty much a nightmare. When comparing float to float,
 * we do it as straightforwardly (and long-windedly) as conceivable, so
 * that, e.g., Python x == y delivers the same result as the platform
 * C x == y when x and/or y is a NaN.
which behaves according to the IEEE 754 standard (from the GNU C Library documentation):
20.5.2 Infinity and NaN
[...]
The basic operations and math functions all accept infinity and NaN and produce sensible output. Infinities propagate through calculations as one would expect: for example, 2 + ∞ = ∞, 4/∞ = 0, atan(∞) = π/2. NaN, on the other hand, infects any calculation that involves it. Unless the calculation would produce the same result no matter what real value replaced NaN, the result is NaN.
In comparison operations, positive infinity is larger than all values except itself and NaN, and negative infinity is smaller than all values except itself and NaN. NaN is unordered: it is not equal to, greater than, or less than anything, including itself. x == x is false if the value of x is NaN. You can use this to test whether a value is NaN or not, but the recommended way to test for NaN is with the isnan function (see Floating Point Classes). In addition, <, >, <=, and >= will raise an exception when applied to NaNs.
and which will return false for NaN == NaN .
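The same unordered behavior can be observed directly from Python. The following is a minimal sketch; note that, unlike the floating-point exception mentioned in the C documentation above, Python's ordering operators on floats simply return False for NaN rather than raising.

```python
import math

nan = float('nan')

# Every equality or ordering comparison involving NaN evaluates to False...
comparisons = [nan == nan, nan < 1.0, nan > 1.0, nan <= nan, nan >= nan]
print(comparisons)

# ...so "not equal" is the only comparison that holds, even against itself.
print(nan != nan)

# The recommended way to test for NaN is math.isnan().
print(math.isnan(nan))
```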
Therefore, Python decides that the second NaN object deserves a new dictionary entry. It may have the same hash, but both its address and the equality test say that it is different from every other NaN object.
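The net effect is easy to check. This is a small sketch, assuming CPython, where each float('nan') call produces a new object:

```python
d = {}
d[float('nan')] = 1
d[float('nan')] = 2

# Each distinct NaN object became its own entry.
print(len(d))
print(sorted(d.values()))

# A freshly created NaN matches neither stored key, so it cannot be used
# to look the entries up again.
found = float('nan') in d
print(found)
```

The stored values are still reachable by iterating over the dictionary, just not by lookup with a new NaN key.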
Please note that if you always use the same NaN object (with the same address), you will get the expected behavior, since the address is checked before float equality:
>>> nan = float('nan')
>>> d = dict()
>>> d[nan] = 1
>>> d[nan] = 2
>>> d
{nan: 2}