Why does Object.hashcode () have conflicts in Java?

I am running the code below in Hotspot JDK 1.6 on Windows XP, I ran it twice and I got the results below.

Basically, does object.hashcode() have conflicts? it doesn't seem to return the memory address in the virtual machine.

However, a comment in the JDK said that the values ​​should be different, can anyone explain?

As far as reasonably practical, the hashCode method defined by the Object class returns different integers for different objects. (Usually this is done by converting the internal address of the object to an integer, but this implementation of the technique is not required by Java TM .)

 @return a hash code value for this object. @see java.lang.Object#equals(java.lang.Object) @see java.util.Hashtable 

This is the first result:

 i,hashcode(): 361,9578500 i,hashcode(): 1886,9578500 conflict:1886, 361 i,hashcode(): 1905,14850080 i,hashcode(): 2185,14850080 conflict:2185, 1905 9998 

This is the second result:

 i,hashcode(): 361,5462872 i,hashcode(): 1886,29705835 conflict:1887, 362 i,hashcode(): 1905,9949222 i,hashcode(): 2185,2081190 conflict:2186, 1906 9998 10000 

My code is:

 @Test public void testAddr() { Set<Integer> s = new TreeSet<Integer>(); Map<Integer, Integer> m = new TreeMap<Integer, Integer>(); Set<Object> os = new HashSet<Object>(); for(int i = 0; i < 10000; ++i) { Object o = new Object(); os.add(o); Integer h = o.hashCode(); if((i == 361) || (i == 1886) || (i == 2185) || (i == 1905)) { System.out.println("i,hashcode(): " + i + "," + h); } if(s.contains(h)) { System.out.println("conflict:" + i + ", " + m.get(h)); } else { s.add(h); m.put(h, i); } } System.out.println(s.size()); int c = 0; for(Object o: os) { c++; } System.out.println(c); } 
+2
source share
5 answers

hashCode() supposed to be used to place objects in hash tables . The rule for hashCode not that hashCode should never create conflicts, although this is a desired property, but equal objects should have the same hash codes. This does not exclude the presence of equal hashes of unequal objects.

You have found a case where the default implementation of Object.hashCode() generates the same hash codes for unequal objects. It is required that the hash code of an object does not change if there were no changes in some field equality of this object with another. One possible reason is that the garbage collector collected the memory, so that the subsequent instance o was in the same place as the earlier creation o (i.e., in the loop, you selected two objects o , and the garbage collector regrouped the memory between two distributions so that the old o moved from one memory location, and then the new o allocated in this place). Then, even if the hash code for the old o cannot be changed, the hash code for the new o is the address where the new o is stored in memory, which turns out to be the hash code for the old o .

+5
source

This, unfortunately, is a common misinterpretation of API documents. From an uncontrollable error (1 vote) for this some time ago.

(spec) System.identityHashCode doc inadequate, default Object.hashCode docs implementation error

[...]

From discussions of Usenet and open source software, it seems that many, perhaps most programmers, think that by default, and therefore System.identityHashCode, it will create unique hash codes.

The proposed implementation method is not even suitable for modern wireless JVMs, and should go just like the JVM Spec Chapter 9.

The qualification “To the extent practicable” is not enough to clearly show that hash codes are absent, in practice, distinct.

+2
source

It is possible that a long-term program created can create, call hashCode() on and discard many billions of objects during its execution. Thus, it would be mathematically impossible to guarantee that if some hashCode object returned a specific number, no other object would ever return the same amount for the life of the program. Even if hashCode() somehow managed to return unique values ​​for the first objects 4,294,967,296, he would have no choice but to return the already used value for the next (since the previous call would use the last remaining previously unused value).

The fact that hashCode() clearly cannot guarantee that the hash values ​​will not be reused for the life of the program does not mean that it cannot guarantee that the hash codes will not be reused throughout the life of the objects in question. for some memory management schemes, such a guarantee can be made relatively cheaply. For example, the 1984 Macintosh broke the heap into two parts, one of which contained descriptors of objects of a fixed size and one of which contained data of an object of variable size. Object descriptors once created will never move; if any objects were deleted, the space used by their descriptors will be reused when creating new objects. With this scheme, the address of the object descriptor will be a unique and unchanging representation of its identity as long as the object exists, and therefore can be used as a hashCode() value. Unfortunately, such schemes tend to have more overhead than some other approaches in which objects do not have a fixed address associated with them.

+1
source

The comment does not say that it is different. It says that it differs as much as reasonably practical.

Apparently you found a case where it was impractical.

0
source

Haskods do not have to be unique, just consistent. Although most often they are usually quite unique.

In addition to your excerpt above, Object should say the following.

As reasonably practical, the hashCode method defined by the Object class returns different integers for different objects. (This is usually done by converting the internal address of the object to an integer, but this implementation method is not required by the JavaTM programming language.)

Doc object

0
source

Source: https://habr.com/ru/post/1485367/


All Articles