What happens when objects in a set are modified so that they match each other?

Question

As the title implies, I have a question about changing objects in a set so that they become exactly the same (as far as the set is concerned). Just curious.

I am asking about Python, but feel free to generalize.

If I understand correctly, in Python a set will determine if objects are "equal" by equating their hashes. So for objects a and b this would be:

hash(a) == hash(b)

For any class you write, you can override the default hash function, __hash__, as you wish.

Suppose you write a hash function that takes some or all of the attributes of your object and combines their hashes into its own (for example, by ORing them).

Now, if you have several initially different objects in one set, and you then go through that set and change the objects so that their attributes match, what happens to the set? Will they all stay in there, will some get kicked out, or does nothing happen until an operation is performed on the set? Or is an error raised?

+6

python set hash

Vincent ketelaars Nov 13 '13 at 12:04

4 answers

You are not allowed to modify a member of a set in a way that changes its hash value.

In Python, you can only store hashable objects in a set. From the documentation (emphasis mine):

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
All of Python's immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is their id().

If you violate this contract (as you propose in your question), the set cannot do its job, and all bets are off.

The correct way to change a member of a set is to remove it, modify it, and re-add it. This will behave as you would expect.
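A minimal sketch of the remove, modify, re-add pattern, in Python 3 syntax. The Item class and its key attribute are made up for illustration; they are not from the question.

```python
class Item:
    """Hypothetical element type whose hash and equality depend on `key`."""

    def __init__(self, key):
        self.key = key

    def __hash__(self):
        return hash(self.key)

    def __eq__(self, other):
        return isinstance(other, Item) and self.key == other.key


a, b = Item(1), Item(2)
s = {a, b}

# Take the member out BEFORE touching anything its hash depends on.
s.remove(a)
a.key = 2      # a now compares equal to b
s.add(a)       # the set already holds an equal element, so nothing is added

print(len(s), a in s, b in s)
```

Because a is outside the set while its hash changes, the set stays consistent, and the two now-equal objects collapse into a single member.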

[set] will determine if objects are "equal" by equating their hashes

This is not entirely correct. Hash comparisons cannot be used to establish that objects are equal. They can only be used to establish that objects are unequal: different hashes mean different objects, but equal hashes still require an equality check. This is a subtle but important difference.
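To illustrate the difference, here is a small sketch (Python 3 syntax) with a deliberately degenerate hash. The Point class is a made-up example: every instance hashes to the same value, yet the set still tells unequal instances apart through __eq__.

```python
class Point:
    """Hypothetical class: all instances share one hash value."""

    def __init__(self, x):
        self.x = x

    def __hash__(self):
        return 42  # every Point collides on purpose

    def __eq__(self, other):
        return isinstance(other, Point) and self.x == other.x


p, q, p_again = Point(1), Point(2), Point(1)

same_hash = hash(p) == hash(q)   # True, but p != q
s = {p, q, p_again}              # p_again equals p, so it is not added

print(same_hash, len(s))
```

Equal hashes only send the set to the same place in the table; whether two objects count as the same member is decided by the equality comparison.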

+5

NPE Nov 13 '13 at 12:07

First of all, set elements must be hashable:

Set elements must be hashable.

While hashable means:

An object is hashable if it has a hash value which never changes during its lifetime [...]

So, as long as you change the object in a way that leaves its hash value (the result of its __hash__ method) unchanged, everything is fine.

Typically, in Python, immutable objects are hashable, while mutable ones are not:

All of Python's immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are.
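As a small sketch of a harmless modification (Python 3 syntax; the User class and its attributes are invented for this example), changing an attribute that takes no part in __hash__ or __eq__ leaves set membership intact:

```python
class User:
    """Hypothetical class: only `user_id` takes part in hashing and equality."""

    def __init__(self, user_id, nickname):
        self.user_id = user_id
        self.nickname = nickname

    def __hash__(self):
        return hash(self.user_id)

    def __eq__(self, other):
        return isinstance(other, User) and self.user_id == other.user_id


u = User(7, "vincent")
members = {u}

u.nickname = "vk"              # does not affect the hash
still_member = u in members

print(still_member)
```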

+2

Bartoszkp Nov 13 '13 at 12:07

ORing hashes together would make a particularly bad hash function, since the results would be biased toward values with more bits set. That said, sets and dictionaries use hashes only to index a hash table; collisions are expected, and a deeper comparison is made between objects with equal hash values. However, you lose the advantage of the hash table, O(1) lookup, if the hash function is bad.

As the other answers say, sets should only contain immutable values. Changing the hash value of an object after inserting it into a set violates the requirements of the set type, and operations such as checking whether the object is in the set, or even removing it from the set, will fail. However, I expect you could still find it by iterating over the set.
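A rough way to see how much information ORing throws away (Python 3 syntax). The or_hash helper is a made-up stand-in for the hash function described in the question, compared here against the built-in tuple hash.

```python
from itertools import combinations


def or_hash(pair):
    """Hypothetical hash from the question: OR the member hashes together."""
    a, b = pair
    return hash(a) | hash(b)


# 120 distinct pairs drawn from the integers 0..15
pairs = list(combinations(range(16), 2))

or_hashes = {or_hash(p) for p in pairs}      # distinct ORed hash values
tuple_hashes = {hash(p) for p in pairs}      # distinct built-in tuple hashes

print(len(pairs), len(or_hashes), len(tuple_hashes))
```

In CPython, where small integers hash to themselves, the OR of two values in 0..15 is again in 0..15, so the 120 pairs are squeezed into at most 16 hash values. The set still works, but most lookups have to fall back on equality comparisons.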
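A short sketch of those failure modes, in Python 3 syntax and as observed on CPython (the language does not promise any particular behavior once the contract is broken). Class A is a bare-bones example type whose hash is a plain attribute.

```python
class A:
    """Example type whose hash can be changed from outside."""

    def __init__(self, h):
        self.h = h

    def __hash__(self):
        return self.h


x, y = A(1), A(2)
s = {x, y}

x.h = 3   # contract violation: the hash changes while x is in the set

found_by_lookup = x in s        # the lookup uses the new hash and misses

s.discard(x)                    # silently removes nothing
try:
    s.remove(x)
    remove_failed = False
except KeyError:
    remove_failed = True

# Iteration walks the table directly, so x is still reachable this way.
found_by_iteration = any(obj is x for obj in s)

print(found_by_lookup, remove_failed, found_by_iteration, len(s))
```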

0

Yann vernier Nov 13 '13 at 12:13

georg · Accepted Answer · 2013-11-13T12:13:28+0000

Consider this test:

    class A:
        def __init__(self, h):
            self.h = h

        def __hash__(self):
            return self.h

    x = A(1)
    y = A(2)

    a = {x, y}

    print x in a, y in a
    print a

    print "----"

    # Change the hash of x while it is already stored in the set
    x.h = 2

    print x in a, y in a
    print a

Result:

    True True
    set([<__main__.A instance at 0x10d94fd40>, <__main__.A instance at 0x10d94fd88>])
    ----
    False True
    set([<__main__.A instance at 0x10d94fd40>, <__main__.A instance at 0x10d94fd88>])

As you can see, the first object x is still in the set, but the in operator says it is not. Why did this happen?

As I understand it, set objects are implemented using hash tables, and a hash table usually has this structure:

    hash_value => list of objects with this hash value
    another_hash_value => list of objects with this hash value

When the set answers an in query, it first computes the hash value of the argument, and then tries to find the argument in the corresponding list. Our set a initially looks like this:

    1 => [x]
    2 => [y]

Now we change the hash of x and ask the set whether the object is there. The set computes the hash value (which is now 2), tries to find x in the second list, and fails; hence False.
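The model above can be sketched as a toy class (Python 3 syntax). ToySet is purely hypothetical and only mirrors the simplified picture in this answer; CPython's real set uses open addressing rather than per-hash lists, but the stale-hash effect is the same.

```python
class ToySet:
    """Toy hash set: maps each hash value to a list of objects (illustration only)."""

    def __init__(self):
        self.buckets = {}

    def add(self, obj):
        bucket = self.buckets.setdefault(hash(obj), [])
        if not any(o is obj or o == obj for o in bucket):
            bucket.append(obj)

    def __contains__(self, obj):
        # Only the bucket for the object's CURRENT hash is searched.
        bucket = self.buckets.get(hash(obj), [])
        return any(o is obj or o == obj for o in bucket)


class A:
    def __init__(self, h):
        self.h = h

    def __hash__(self):
        return self.h


x, y = A(1), A(2)
s = ToySet()
s.add(x)
s.add(y)

before = x in s   # looks in bucket 1, finds x
x.h = 2
after = x in s    # looks in bucket 2, which holds only y

print(before, after)
```

The object never left bucket 1; the lookup simply goes to the wrong bucket once the hash has changed.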

To make things more interesting, let's do

    a.add(x)
    print x in a, y in a
    print a

Result:

    True True
    set([<__main__.A instance at 0x107cbfd40>, <__main__.A instance at 0x107cbfd88>, <__main__.A instance at 0x107cbfd40>])

Now we have the same object twice in the set! As you can see, there are no automatic adjustments and no errors. Python is a grown-up language and always assumes that you know what you are doing.
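For readers on Python 3, the same experiment can be sketched as follows. This is behavior observed on CPython, not something the language guarantees; the final step, rebuilding the set from a list so that every element is hashed again, is an addition that is not part of the original test.

```python
class A:
    def __init__(self, h):
        self.h = h

    def __hash__(self):
        return self.h


x, y = A(1), A(2)
a = {x, y}

x.h = 2     # change the hash while x is in the set
a.add(x)    # x is not found under its new hash, so it is stored again

size_after_readd = len(a)
copies_of_x = sum(1 for obj in a if obj is x)

# Going through a list forces every element to be hashed afresh.
rebuilt = set(list(a))

print(size_after_readd, copies_of_x, len(rebuilt))
```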
