Strange Python set and hash behavior - how does it work?

I have a class called GraphEdge that I would like to uniquely define in a set (built-in set type) by its tail and head members, which are set via __init__ .

If I do not define __hash__ , I see the following behavior:

 >>> E = GraphEdge('A', 'B') >>> H = GraphEdge('A', 'B') >>> hash(E) 139731804758160 >>> hash(H) 139731804760784 >>> S = set() >>> S.add(E) >>> S.add(H) >>> S set([('A', 'B'), ('A', 'B')]) 

A lot of people don’t know that E and H match by my definition, since they have different hashes (this is what the set type uses to determine uniqueness, as far as I know), so it adds as separate elements. Therefore, I define a rather naive hash function for GraphEdge as follows:

 def __hash__( self ): return hash( self.tail ) ^ hash( self.head ) 

Now the above works as expected:

 >>> E = GraphEdge('A', 'B') >>> H = GraphEdge('A', 'B') >>> hash(E) 409150083 >>> hash(H) 409150083 >>> S = set() >>> S.add(E) >>> S.add(H) >>> S set([('A', 'B')]) 

But it is clear that ('A', 'B') and ('B', 'A') will return the same hash in this case, so I would expect that I could not add ('B', 'A') to a set already containing ('A', 'B') . But this is not what happens:

 >>> E = GraphEdge('A', 'B') >>> H = GraphEdge('B', 'A') >>> hash(E) 409150083 >>> hash(H) 409150083 >>> S = set() >>> S.add(E) >>> S.add(H) >>> S set([('A', 'B'), ('B', 'A')]) 

So, is a hash type set or not? If so, how is the last scenario possible? If not, why doesn't the first script work (no __hash__ )? Did I miss something?

Edit: For reference to future readers, I have already defined __eq__ (also based on tail and head ).

+4
source share
3 answers

You have a hash collision. When hashes collide in a set, the == operator is used to check if they are really equal to each other.

+14
source

It is important to understand how hash and == work together because both are used by collections. For two values ​​of x and y, the rule is important:

 x == y ==> hash(x) == hash(y) 

(x is equal to y means that x and y hashes are equal). But the inverse is wrong: two unequal values ​​can have the same hash.

Settings (and dicts) will use the hash to get closer to equality, but will use the real equality operation to find out if the two values ​​match or not.

+7
source

You should always define both __eq__() and __hash__() if you need at least one of them. If the hashes of the two objects are equal, an additional __eq__() check is performed to check for uniqueness.

+6
source

Source: https://habr.com/ru/post/1299707/


All Articles