Weird HashSet contains () behavior

HashSet in java greatly confused me when contains () is used, will it look for hashcode () and equals () result? But in this case, he does not behave normally. Sometimes a problem arises if you put this kind of code in a large project. The problem is why the last statement prints FALSE? What contains () to do under the hood?

class R { int count; public R(int count) { this.count = count; } public String toString() { return "R(count attribution:" + count + ")"; } public boolean equals(Object obj) { if (obj instanceof R) { R r = (R)obj; if (r.count == this.count) { return true; } } return false; } public int hashCode() { return this.count; } } public class TestHashSet2 { public static void main(String[] args) { HashSet hs = new HashSet(); hs.add(new R(5)); hs.add(new R(-3)); hs.add(new R(9)); hs.add(new R(-2)); System.out.println(hs); //change first element Iterator it = hs.iterator(); R first = (R)it.next(); first.count = -3; System.out.println(hs); //remove hs.remove(new R(-3)); System.out.println(hs); R r1 = new R(-3); System.out.println(r1.hashCode()); Iterator i = hs.iterator(); R r2 = (R)i.next(); System.out.println(r2.hashCode()); //same hashcode -3 System.out.println(r1.equals(r2)); //equals true System.out.println("hs contains object which count=-3 ?" + hs.contains(new R(-3))); //false } } 
+4
source share
4 answers

HashSet stores values ​​in codes; the bucket index is computed when an item is added to a hash set. The idea behind this: now the set can read the hash code of the objects and calculate the bucket in one step. In other words: contains() is an O (1) operation.

Imagine a trivial hash set:

 bucket object(hashcode) #1 5 #2 -3 #3 6 

with a hash function for calculating buckets such as:

 f(hashcode) := | 5 -> 1 | -3 -> 2 | 6 -> 3 

Now let's see what you did in your example: you deleted the object in bucket 2 (changed the function) and changed the hash code of the object in bucket 1.

The new function looks like this:

 f(hashcode) := | 5 -> 1 | 6 -> 3 

f(-3) will return null ( contains() returns false), and your actual object with hashcode -3 will be stored where the object with hashcode 5 should be.

+2
source

By changing the value of an object after inserting it into a HashSet , you destroy the integrity of the data structure. After that, you cannot rely on this when doing your job.

It is usually a bad idea to use mutable objects as keys for any map or values ​​for a set. Fortunately, the classes used most often for this purpose ( String , Integer ) are immutable.

+6
source

That is why you should not use mutable objects as keys in HashSets and HashMaps.

The first iterator returned an R object with hashCode 5. Then you changed the property of this object (count). But this does not make the hash recount. So, for the HashSet, the object for which you changed the score to -3 is still in the bucket corresponding to the hash code 5. Then you deleted the object which is in the bucket corresponding to the -3 hash code, which was the original object R (-3). Thus, after this operation, the bucket does not have a hash of code 3, therefore contains(new R(-3)) returns false.

+2
source

The problem is that the hash of the object R can change . This is a breach of contract that the hashCode() method must obey.


To understand why this matters, you need to understand how the hash table works. Java HashSet has in its heart an array of record lists. When you put an object in a hash table, it first computes the hash code of the object. Then it reduces the hash code to an index in the array, computing

 index = hashcode % array.length 

He then looks for the chain starting with array[index] , and if the object is not in the list, he adds it.

And to check if the HashSet contains an object, it performs the same calculations and looks for the same hash chain.

However, if you are doing something for the object so that its hash code changes while it is in the table, then the above algorithm (usually) looks for the object in another chain in the chain to which it was originally added. And, of course, it will not find.

The end result is that the HashSet will behave abnormally if the hashcode contract is broken for any object while the object is a member of the collection.


Here is what Java javadoc says (see java.jang.Object # hashcode ()):

"General hashCode contract:

  • Whenever it is called on the same object more than once during the execution of a Java application, the hashCode method must consistently return the same integer if the information used in equal comparisons with the object does not change. This integer should not remain consistent with one execution of the application on another execution of the same application.

  • ...

"No information provided ..." the clause puzzles me. I think this only works if there is a rule that you should not change the hash codes of the object when they are in the hash table. Unfortunately, this rule is not specified in any of the places that you expect to find. Error in the documentation?


Perhaps we should name the requirement not to change the hash codes "verbal agreement"? :-)

+1
source

Source: https://habr.com/ru/post/1398581/


All Articles