Understanding the weird Java hash function

Below is the source code for the hash function in java.util.HashMap . Comments explain well enough what he is doing. , but how? What do ^ and >>> operators do? Can someone explain how the code really does what the comments say?

 /** * Applies a supplemental hash function to a given hashCode, which * defends against poor quality hash functions. This is critical * because HashMap uses power-of-two length hash tables, that * otherwise encounter collisions for hashCodes that do not differ * in lower bits. Note: Null keys always map to hash 0, thus index 0. */ static int hash(int h) { // This function ensures that hashCodes that differ only by // constant multiples at each bit position have a bounded // number of collisions (approximately 8 at default load factor). h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); } 
+32
java hash
Feb 17 2018-12-17T00:
source share
6 answers

Dunno 'is about English, but here is some code and sample output:

 public static void main ( String[] args ) { int h = 0xffffffff; int h1 = h >>> 20; int h2 = h >>> 12; int h3 = h1 ^ h2; int h4 = h ^ h3; int h5 = h4 >>> 7; int h6 = h4 >>> 4; int h7 = h5 ^ h6; int h8 = h4 ^ h7; printBin ( h ); printBin ( h1 ); printBin ( h2 ); printBin ( h3 ); printBin ( h4 ); printBin ( h5 ); printBin ( h6 ); printBin ( h7 ); printBin ( h8 ); } static void printBin ( int h ) { System.out.println ( String.format ( "%32s", Integer.toBinaryString ( h ) ).replace ( ' ', '0' ) ); } 

What prints:

 11111111111111111111111111111111 00000000000000000000111111111111 00000000000011111111111111111111 00000000000011111111000000000000 11111111111100000000111111111111 00000001111111111110000000011111 00001111111111110000000011111111 00001110000000001110000011100000 11110001111100001110111100011111 

So, the code breaks the hash function into steps so you can see what happens. The first shift of 20 xor positions with the second shift of 12 positions creates a mask that can flip 0 or more of the lower 20 bits of int. This way you can get some randomness inserted into the lower bits, which uses potentially more distributed higher bits. It is then applied via xor to the original value to add this randomness to the low-order bits. A second shift at 7 x positions or a shift at 4 positions creates a mask that can flip 0 or more of the lower 28 bits, which again leads to some randomness to the lower bits and to some of the more significant ones, using the capitalization of the previous xor which already examined some of the distributions in low bits. The end result is a smoother bit allocation through the hash value.

Since the hashmap in java computes the bucket index by combining the hash with the number of buckets, you need to have a uniform distribution of the least significant bits of the hash value in order to evenly distribute the entries in each bucket.

As for the evidence of the claim that this limits the number of collisions, I have no input. Also, see here for good info on creating hash functions and a few details on why xor of two numbers tends to randomly distribute bits as a result.

+45
Feb 17 2018-12-17T00:
source share

>>> - bit-sphere with zero filling.

^ is XOR.

XOR also called exceptional, or is a mathematical operator that combines two numbers. See http://en.wikipedia.org/wiki/Exclusive_or

The correct n -bit is similar to resetting the n least significant bits of a number. Therefore, if the number 00010111 , and you shifted it by 1, you will get 00001011 .

+5
Feb 17 2018-12-12T00:
source share

Here's an article that discusses entire hash functions and some of the considerations for which they are designed. This is not very detailed, but most importantly:

operations must use a computational chain to reach the avalanche. An avalanche means that one bit of difference in input will be about 1/2 of the output bits will be different.

In principle, the goal is for the additional hash function to remove any patterns in the input signal, because this can lead to degeneration of the hash table.

+4
Feb 17 '12 at 22:05
source share
+1
Feb 17 2018-12-12T00:
source share

>>> seems to be an unsigned correct bitwise shift, and ^ is a bitwise XOR

http://docs.oracle.com/javase/tutorial/java/nutsandbolts/op3.html

0
Feb 17 '12 at 20:50
source share

This is a combination of a bitwise exclusive OR and an unsigned right shift.

See here for more details: http://www.roseindia.net/java/master-java/bitwise-bitshift-operators.shtml

-one
Feb 17 '12 at 20:51
source share



All Articles