Java hash function

I know Java has beautiful built-in support for HashMaps or HashTables.

Does anyone know which hash functions or methods Java uses?

Can I customize them for a specific application, to improve performance and reduce access time?

Thanks a lot for reading!

+4
java function oop hash
Mar 26 '09 at 5:17
8 answers

Java allows you to override the hashCode() method in your own classes, so you can use a hash algorithm that suits not only your application but also each individual type:

    public class Employee {
        private int id;

        // A default implementation might want to use "name" as part of hashCode
        private String name;

        @Override
        public int hashCode() {
            // We know that ID is always unique, so don't use name in calculating
            // the hash code.
            return id;
        }
    }
+11
Mar 26 '09 at 5:23

Go nuts.

http://www.docjar.com/html/api/java/util/HashMap.java.html

In addition, you can always set the resize threshold (load factor) and the initial capacity so that they are as large as you need, which reduces the cost of insertions when the map is almost full. If your map is shared between threads, you will also usually get a large performance boost from using ConcurrentHashMap.
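As a rough sketch of the sizing advice above: HashMap has a constructor that takes an initial capacity and a load factor. The class name PresizedMapDemo, the helper newPresizedMap and the entry count are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class PresizedMapDemo {
    // Builds a map sized so that inserting `expectedEntries` keys
    // should not force the internal table to grow.
    static Map<String, Integer> newPresizedMap(int expectedEntries) {
        float loadFactor = 0.75f; // HashMap's documented default
        // HashMap grows once size exceeds capacity * loadFactor,
        // so request enough capacity to stay under that threshold.
        int initialCapacity = (int) Math.ceil(expectedEntries / (double) loadFactor);
        return new HashMap<>(initialCapacity, loadFactor);
    }

    public static void main(String[] args) {
        Map<String, Integer> map = newPresizedMap(10_000);
        for (int i = 0; i < 10_000; i++) {
            map.put("key" + i, i);
        }
        System.out.println(map.size());
    }
}
```

Whether this helps measurably depends on the workload; it mainly avoids repeated rehashing while a large map is being filled.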

+4
Mar 26 '09 at 5:19

As a note, if you intend to override hashCode, you must also override equals.
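A minimal sketch of overriding both methods together, assuming (as in the Employee example) that the id alone identifies an object. The class name EmployeeKey is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class EmployeeKey {
    private final int id;
    private final String name;

    EmployeeKey(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof EmployeeKey)) {
            return false;
        }
        // Assumption: two keys are equal when their ids match; name is ignored.
        return id == ((EmployeeKey) other).id;
    }

    @Override
    public int hashCode() {
        // Uses the same field as equals(), so equal objects share a hash code.
        return id;
    }

    public static void main(String[] args) {
        Map<EmployeeKey, String> desks = new HashMap<>();
        desks.put(new EmployeeKey(7, "Ann"), "Desk 12");
        // A distinct but equal instance finds the same entry.
        System.out.println(desks.get(new EmployeeKey(7, "Ann")));
    }
}
```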

+4
Mar 26 '09 at 15:08

The hash code is calculated on the object stored in the collection, typically using a standard algorithm such as the one described in Effective Java.

You can indeed override the hashCode() method in each class. A convenient way to implement it is with HashCodeBuilder, which is part of the Apache Commons Lang library (see here):

http://commons.apache.org/lang/
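HashCodeBuilder needs the Commons Lang jar on the classpath. As a stand-in that runs on the JDK alone, the sketch below uses java.util.Objects.hash (Java 7 and later), which combines fields in the same spirit. It is a substitute, not the HashCodeBuilder API, and the Person class is invented for the example.

```java
import java.util.Objects;

final class Person {
    private final String firstName;
    private final String lastName;
    private final int birthYear;

    Person(String firstName, String lastName, int birthYear) {
        this.firstName = firstName;
        this.lastName = lastName;
        this.birthYear = birthYear;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof Person)) {
            return false;
        }
        Person that = (Person) other;
        return birthYear == that.birthYear
                && Objects.equals(firstName, that.firstName)
                && Objects.equals(lastName, that.lastName);
    }

    @Override
    public int hashCode() {
        // Combines exactly the fields that equals() compares.
        return Objects.hash(firstName, lastName, birthYear);
    }
}
```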

For more information on the hash code, see this article:

http://www.ibm.com/developerworks/java/library/j-jtp05273.html

Hope this helps.

+3
Mar 26 '09 at 5:31

"I know Java has beautiful built-in support for HashMaps or HashTables."

Given that the language has no hash syntax at all, I would not say that...

In any case, as others have pointed out, each class specifies what its hashCode() should be (by default, it is typically derived from the object's identity, such as its memory address). If you implement your own, make sure you follow the hashCode() contract (in particular, it must be consistent with equals()); otherwise the class will not work as a key in a HashMap.

You can also look at the source code for java.util.HashMap and friends to see how they are implemented. For example, HashMap uses an array of buckets, and a bucket that receives more than one entry overflows into a linked list.
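To make the bucket idea concrete, here is a deliberately simplified toy table with linked-list chaining. It is not the JDK's code: the real HashMap also resizes, spreads hash bits, and handles null keys. ToyHashMap and its members are names invented for this sketch.

```java
import java.util.LinkedList;

class ToyHashMap<K, V> {
    private static final class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry<K, V>>[] buckets;

    @SuppressWarnings("unchecked")
    ToyHashMap(int bucketCount) {
        buckets = new LinkedList[bucketCount];
        for (int i = 0; i < bucketCount; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Maps a key to a bucket; floorMod keeps the index non-negative
    // even when hashCode() is negative. Null keys are not supported.
    private int indexFor(Object key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    void put(K key, V value) {
        LinkedList<Entry<K, V>> bucket = buckets[indexFor(key)];
        for (Entry<K, V> e : bucket) {
            if (e.key.equals(key)) {
                e.value = value; // key already present: overwrite
                return;
            }
        }
        bucket.add(new Entry<>(key, value)); // collision or new key: chain it
    }

    V get(Object key) {
        for (Entry<K, V> e : buckets[indexFor(key)]) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
```

Note how get() relies on hashCode() to pick the bucket and on equals() to pick the entry within it, which is why the two methods have to agree.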

For further reading, you might want to take a look at ConcurrentHashMap, which can be safely accessed by many threads at the same time, as well as TreeMap, which offers a way to build a map for keys that can be ordered (and not necessarily hashed).
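A short, hedged sketch of both classes in use. OtherMapsDemo and its two helper methods are hypothetical names; the TreeMap and ConcurrentHashMap calls are standard JDK API.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

class OtherMapsDemo {
    // TreeMap keeps keys sorted by their natural ordering (compareTo),
    // so hashCode() is never consulted.
    static TreeMap<String, Integer> sortedWordLengths(String... words) {
        TreeMap<String, Integer> lengths = new TreeMap<>();
        for (String w : words) {
            lengths.put(w, w.length());
        }
        return lengths;
    }

    // ConcurrentHashMap.merge is atomic per key, so several threads
    // (here: a parallel stream) can update the counts safely.
    static Map<String, Integer> countWords(String... words) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Arrays.stream(words).parallel().forEach(w -> counts.merge(w, 1, Integer::sum));
        return counts;
    }
}
```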

+1
Mar 26 '09 at 5:32

In general, don't worry too much about the hash functions of the standard JDK classes. Even if you could override the hash function of String (you can't), in practice it is almost always "good enough". There are perhaps a few exceptions: for example, some classes, such as BigInteger and the collections, recalculate their hash code on every call by iterating over every element they contain, which in some cases is quite wasteful. But how often do you use instances of these classes as keys?

When designing hash codes for your own classes, what you are trying to do is spread the hash codes "randomly" over the range of integers. To do this, you generally need to "mix" the bits of successive fields in your object (you may be interested in an article on my website that clearly illustrates how the String hash code mixes bits). Multiplying the current hash by an odd number (usually a prime), then adding the next element to the hash, usually works quite well as a first attempt. (However, this method can run into problems when, for example, the numbers or hash codes being combined tend to have zeros in their low-order bits. In general, no practical hash function is absolutely guaranteed to work well in all cases.)

Then you can test your hash code. Create a series of random objects (or even use some real ones), calculate their hash codes, AND off the bottom, say, 16 bits of each hash code, and then see how many collisions you get. Check that the number of collisions is roughly what you would expect by chance. For example, if you keep only the lower 16 bits of the hash code (& 0xffff), then after 1000 random objects you would expect about 8 collisions. After 2000, you would expect about 30.

As far as performance is concerned, I think that, up to a point, getting a well-distributed hash code is generally more beneficial these days than sacrificing hash quality for hash computation speed.
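The multiply-then-add scheme can be sketched as follows. The multiplier 31 and the seed 17 are conventional choices rather than requirements, and Point3D is a class made up for the example.

```java
final class Point3D {
    private final int x;
    private final int y;
    private final int z;

    Point3D(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof Point3D)) {
            return false;
        }
        Point3D that = (Point3D) other;
        return x == that.x && y == that.y && z == that.z;
    }

    @Override
    public int hashCode() {
        int h = 17;     // arbitrary non-zero starting value
        h = 31 * h + x; // multiply by an odd prime, then add the next field
        h = 31 * h + y;
        h = 31 * h + z;
        return h;       // int overflow simply wraps around, which is fine here
    }
}
```

Because each step multiplies before adding, swapping two field values generally changes the result, so (1, 2, 3) and (3, 2, 1) do not collide the way a plain sum would.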
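A minimal sketch of such a check, assuming random strings stand in for your real objects. CollisionCheck and its method names are invented here, and the expectation formula is the usual birthday-style approximation (number of pairs divided by the number of possible 16-bit values).

```java
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class CollisionCheck {
    // Counts hash codes whose low 16 bits repeat a value already seen.
    static int countCollisions(int[] hashCodes) {
        Set<Integer> seen = new HashSet<>();
        int collisions = 0;
        for (int h : hashCodes) {
            if (!seen.add(h & 0xffff)) {
                collisions++;
            }
        }
        return collisions;
    }

    // Approximate number of colliding pairs expected by chance for
    // n uniformly random 16-bit values: n(n-1)/2 pairs, each 1/65536.
    static double expectedCollisions(int n) {
        return n * (n - 1.0) / 2.0 / 65536.0;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int n = 1000;
        int[] hashCodes = new int[n];
        for (int i = 0; i < n; i++) {
            // Stand-in for "random objects": random numeric strings.
            hashCodes[i] = Long.toString(random.nextLong()).hashCode();
        }
        System.out.println("observed: " + countCollisions(hashCodes));
        System.out.println("expected: " + expectedCollisions(n));
    }
}
```

The observed count varies from run to run; what matters is that it stays in the same ballpark as the expectation rather than being several times larger.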

+1
Mar 26 '09 at 6:26

There is a "hashCode/equals contract" that you must adhere to, which states that objects that are equal to each other according to the equals() method must return the same hashCode() value. However, objects with the same hash code are not required to be equal. You should take a look at http://java.sun.com/javase/6/docs/api/java/lang/Object.html#hashCode(), which gives the details.

At first it's a little difficult to wrap your head around this asymmetry, but it is definitely worth understanding if you don't want strange behavior in your application when you put objects that do not adhere to this contract into a HashMap and friends.
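To illustrate the kind of strange behavior meant here, the sketch below uses a deliberately broken class that overrides equals() but not hashCode(). ContractDemo and BrokenId are hypothetical names.

```java
import java.util.HashSet;
import java.util.Set;

class ContractDemo {
    // Deliberately breaks the contract: equals() is overridden,
    // hashCode() is left as the identity-based default.
    static final class BrokenId {
        final int value;

        BrokenId(int value) {
            this.value = value;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof BrokenId && ((BrokenId) other).value == value;
        }
    }

    public static void main(String[] args) {
        Set<BrokenId> ids = new HashSet<>();
        ids.add(new BrokenId(42));

        // The two instances are equal according to equals()...
        System.out.println(new BrokenId(42).equals(new BrokenId(42)));
        // ...but they almost certainly have different default hash codes,
        // so the set looks in the wrong bucket and reports the id as absent.
        System.out.println(ids.contains(new BrokenId(42)));
    }
}
```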

I also recommend purchasing a copy of Effective Java and reading the hashCode / equals chapters to fully understand it.

+1
Mar 26 '09 at 15:24

If you know that you need fast hash maps, I suggest using a different implementation: try fastutil ( http://fastutil.dsi.unimi.it/ ) or Trove ( http://trove4j.sourceforge.net/ ). They are apparently faster, but they are type-specific.

0
Mar 26 '09 at 11:49


