Java: getting a unique hash value of an object

Question

I am trying to get a unique hash value for a Java object, for example, the following:

If A == B then A.HashValue() == B.HashValue()
If A != B then A.HashValue() != B.HashValue()

Assume that the object contains several boolean and integer fields.

+4

java

Gjorgji Feb 15 '11 at 5:09

6 answers

This is not possible in general. You must ensure that if a.equals(b), then a.hashCode() == b.hashCode(). You cannot guarantee the converse: collisions are always possible, because the hashCode method has only a 32-bit space, while your JVM can have a 64-bit space for identity hash codes.
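
A small sketch of why the converse cannot hold, using String, whose hashCode formula is fixed by the Java API. The class name CollisionDemo and its helper method are made up for illustration.

```java
// Hypothetical demo class; only String.equals() and String.hashCode() are real API.
class CollisionDemo {
    // True when two non-equal strings share the same hash code.
    static boolean collides(String s, String t) {
        return !s.equals(t) && s.hashCode() == t.hashCode();
    }

    public static void main(String[] args) {
        // "Aa" -> 65 * 31 + 97 = 2112, "BB" -> 66 * 31 + 66 = 2112
        System.out.println(collides("Aa", "BB")); // prints "true"
    }
}
```

The two strings are unequal, yet they hash to the same int, which the hashCode contract explicitly allows.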

+4

sjr Feb 15 '11 at 5:24

You can do this if you can limit the number of instances of your class to 2^32. Here is one way:

    import java.util.concurrent.atomic.AtomicInteger;

    class UniqueHash {
        // Next hash code to hand out; starts at 0 and counts up.
        private static AtomicInteger NEXT_HASH_CODE = new AtomicInteger();
        private final int hashCode;

        UniqueHash() {
            // Retry loop: claim the next value atomically.
            while (true) {
                int nextHashCode = NEXT_HASH_CODE.get();
                // -1 is the last value before wrapping back to 0.
                if (nextHashCode == -1) {
                    throw new RuntimeException("Too many instances!");
                }
                if (NEXT_HASH_CODE.compareAndSet(nextHashCode, nextHashCode + 1)) {
                    hashCode = nextHashCode;
                    break;
                }
            }
        }

        public int hashCode() {
            return hashCode;
        }
    }

Edit: this assumed that by "a == b" you meant a == b in the sense of object identity. You mention in the comments that you actually mean that the fields are equal. See the answers by @Mark Peters and @sjr.

Edit 2: Fixed the bug noted by @Tom Hawtin - tackline; left the other bad practice in place. :)

Edit 3: there was a race in my “fix”. Race fixed.

+3

rlibby Feb 15 '11 at 5:31

I am trying to get a unique hash value for a Java object ... Assume that the object contains several boolean and integer fields.

To do this, you need either something wider than a 32-bit integer or bounds on the ranges of your fields. It is simply impossible to store more than 32 bits of information in 32 bits, and just one int plus one boolean is already 33 bits of information (assuming every int value is possible).

Even a long won't be big enough if you have multiple int fields (two ints already fill all 64 bits, leaving no room for a boolean). You would need to go to BigInteger, BitSet, or an array of bytes.
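
As a rough illustration of the "as wide as the data" point, here is a sketch that packs two int fields into one long without losing information. The class and method names are hypothetical, not from the answer.

```java
// Hypothetical helper; names are illustrative only.
class WidePacking {
    // Packs two ints into one long: x in the upper 32 bits, y in the lower 32.
    static long pack(int x, int y) {
        // Mask y so that a negative value does not sign-extend into x's half.
        return ((long) x << 32) | (y & 0xFFFFFFFFL);
    }

    // Recovers the upper half.
    static int unpackX(long packed) {
        return (int) (packed >> 32);
    }

    // Recovers the lower half (the cast drops the upper 32 bits).
    static int unpackY(long packed) {
        return (int) packed;
    }
}
```

Because every (x, y) pair can be recovered from the packed value, distinct pairs always produce distinct longs. The same cannot be done with a 32-bit int.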

In any case, let's say your data does not span more than 32 bits. Then it is just a matter of placing your data in the bit field represented by int.

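    byte a;
    byte b;
    boolean c;
    boolean d;
    // Mask each byte to 8 bits first: bytes are signed, so a negative b
    // would otherwise sign-extend and overwrite a's bits.
    int hash = ((a & 0xFF) << 24) | ((b & 0xFF) << 16) | (c ? 0x02 : 0) | (d ? 0x01 : 0);

    // layout
    // index: ...                         3210
    // aaaa aaaa bbbb bbbb 0000 0000 0000 00cd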

This does not make it a well-distributed hash (for use in a hash table, for example). However, if you want to guarantee uniqueness, you are probably not trying to use it for a hash table, are you?

I am curious why you have this strange requirement. The usual purpose of a hash is to get a value that is probably unique, but of a fixed (reduced) size. Your requirement forces the hash to be at least as wide as the data it represents.

+3

Mark Peters Feb 15 '11 at 5:34

Use System.identityHashCode()

http://download.oracle.com/javase/1.5.0/docs/api/java/lang/System.html#identityHashCode(java.lang.Object)

Edit: it is true that you cannot guarantee the uniqueness of hash codes with this method; however, I think it is the best you can do, given that you cannot get the memory location of the object. Any other hash function you come up with will necessarily have the property that two structurally equivalent objects hash to the same value, whereas this function at least gives you a chance that all objects created by your program have different hash codes.

For completeness: the default hash code of an object is typically computed once, from the object's initial memory location (the Object.hashCode documentation describes this as typical but not required). In such an implementation, if more than one object is created at the same initial memory address, they will necessarily have the same hash code.
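
A short sketch of how System.identityHashCode behaves next to an overridden hashCode. The class IdentityHashDemo and its helper are hypothetical; only System.identityHashCode and the String methods are real API.

```java
// Hypothetical demo class for illustration.
class IdentityHashDemo {
    // Compares the identity hash codes of two references.
    static boolean sameIdentityHash(Object a, Object b) {
        return System.identityHashCode(a) == System.identityHashCode(b);
    }

    public static void main(String[] args) {
        String s1 = new String("hello");
        String s2 = new String("hello");

        // Equal strings must share the overridden hashCode().
        System.out.println(s1.hashCode() == s2.hashCode()); // prints "true"

        // The identity hash ignores the override. For two distinct objects
        // it usually differs, but that is not guaranteed.
        System.out.println(sameIdentityHash(s1, s2));

        // The same object always reports the same identity hash.
        System.out.println(sameIdentityHash(s1, s1)); // prints "true"
    }
}
```

The second line of output is deliberately left without an expected value, since distinct objects are allowed to share an identity hash code.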

+1

danben Feb 15 '11 at 5:15

How to get a "unique identifier" - I really do not recommend this one :-) However, it fulfills the requirements in the question. See IdentityHashMap and consider weak references.

Use a Map of object → integer, where the integer is a counter.
For each new object seen, increase the counter and add it to the map.
For each existing object, return the stored value.
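
The three steps above can be sketched roughly as follows. ObjectIds and idOf are hypothetical names; IdentityHashMap is the real java.util class, which compares keys with == instead of equals().

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical registry that hands out one counter value per object identity.
class ObjectIds {
    // Keys are compared by reference (==), so equal-but-distinct objects get separate ids.
    private final Map<Object, Integer> ids = new IdentityHashMap<Object, Integer>();
    private int next = 0;

    // Returns the stored id for a known object, or assigns the next counter value.
    synchronized int idOf(Object o) {
        Integer id = ids.get(o);
        if (id == null) {
            id = next++;
            ids.put(o, id);
        }
        return id;
    }
}
```

Note that this sketch keeps a strong reference to every object it has seen, so none of them can be garbage collected; that is where the weak references mentioned above would come in. The int counter also limits it to 2^31 distinct objects before wrapping.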

There may also be implementation-specific methods: for example, on Sun's JVM, I believe that Object.toString (the base method) always returns a unique string for the lifetime of the object. The "encoded number" can be extracted and is, AFAIK, the "internal reference".

I do not guarantee the accuracy of the previous paragraph. YMMV. Happy coding.

+1

user166390 Feb 15 '11 at 5:28

corsiKa · Accepted Answer · 2011-02-15T05:24:23+0000

// Very important edit...

Gjorgji, I know you accepted the answer below as correct, but I have found it to be incorrect.

If you have a class like this:

    class tiny {
        int a;

        public int hashCode() {
            return a;
        }
    }

You have already used up all possible hash codes. (If it is not clear why, say so.)

So, if you add ANY additional information to the object and want that information represented in the hashCode, you are going to have a collision somewhere.

But, for that matter, you don't really want a hashCode that is 100% unique to an object. That is really not the point of hashCode!

The point of hashCode is to give you a "unique enough" identifier for the object so that you can place it in a hash bucket. It is not so much for identification as it is for classification. The idea is that if you have a whole bunch of objects, you probably won't have many collisions, so you will probably get pretty quick access to what you are looking for if you group the elements by their hashCode.

If this means you un-accept my answer, that is fine. It really isn't correct for what you are looking for. I hope this explanation of hashCode leads you to the proper usage, so that the answer keeps some value. But, as Mark clearly pointed out, it does not actually solve the problem you stated.
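
To make the "classification, not identification" idea concrete, here is a toy sketch of grouping items into buckets by hash code. BucketDemo and its methods are hypothetical, and real hash tables such as java.util.HashMap use a more refined scheme than a plain modulo.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy bucketing scheme, for illustration only.
class BucketDemo {
    // Chooses a bucket from the hash code; floorMod keeps the index non-negative.
    static int bucketIndex(Object o, int bucketCount) {
        return Math.floorMod(o.hashCode(), bucketCount);
    }

    // Distributes the items over bucketCount lists according to their hash codes.
    static List<List<String>> group(List<String> items, int bucketCount) {
        List<List<String>> buckets = new ArrayList<List<String>>();
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<String>());
        }
        for (String s : items) {
            buckets.get(bucketIndex(s, bucketCount)).add(s);
        }
        return buckets;
    }
}
```

A lookup then only has to compare against the few items in one bucket with equals(), which is why a collision costs some speed but not correctness.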

Below is the old answer:

==============================================================

A good article about it is here, from Effective Java (hands down the best "I want to learn to be a good Java developer" book out there).

http://www.linuxtopia.org/online_books/programming_books/thinking_in_java/TIJ313_029.htm

    class Gjorgji {
        boolean a;
        boolean b;
        boolean c;
        int x;
        int y;

        // EDIT: I almost forgot a VERY important rule...
        // WHEN YOU OVERRIDE hashCode, OVERRIDE EQUALS (and vice versa)
        public boolean equals(Object o) {
            if (!(o instanceof Gjorgji)) return false;
            Gjorgji g = (Gjorgji) o;
            return a == g.a && b == g.b && c == g.c && x == g.x && y == g.y;
        }

        public int hashCode() {
            int hash = x ^ y;
            hash *= a ? 31 : 17; // pick some small primes
            hash *= b ? 13 : 19;
            hash *= c ? 11 : 29;
            return hash;
        }
    }
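
For comparison, a minimal sketch of the same equals/hashCode pairing on a separate, hypothetical class (Settings is not from the answer), using java.util.Objects.hash, which is available from Java 7 onward.

```java
import java.util.Objects;

// Hypothetical value class used only to illustrate the equals/hashCode contract.
final class Settings {
    private final boolean enabled;
    private final int level;

    Settings(boolean enabled, int level) {
        this.enabled = enabled;
        this.level = level;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Settings)) return false;
        Settings s = (Settings) o;
        return enabled == s.enabled && level == s.level;
    }

    @Override
    public int hashCode() {
        // Built from exactly the fields equals() compares, so equal objects hash alike.
        return Objects.hash(enabled, level);
    }
}
```

With both methods overridden consistently, a HashSet treats two Settings objects with the same field values as one element. The hash code is still not unique: unequal objects may collide, and equals() is what tells them apart.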
