Faster hash function

I am trying to implement my own hash function in Java. I add up the ASCII codes of the characters in each string to get a hash code, then take that sum modulo the size of the hash table (sum % size) to pick a slot. Is there a way to keep the same process but reduce collisions when searching for a string?

Thanks in advance.
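A minimal sketch of the process described above, assuming "ASCII numbers" means the character codes of the string. The names AdditiveHash, additiveHash and bucketFor are hypothetical:

```java
// Hypothetical sketch of the additive hash described in the question.
class AdditiveHash {
    // Sum the character codes of the string.
    static int additiveHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    // Map the sum onto a table slot with the modulo operator.
    static int bucketFor(String s, int tableSize) {
        return additiveHash(s) % tableSize;
    }

    public static void main(String[] args) {
        // 97 + 98 + 99 = 294, and 294 % 10 = 4
        System.out.println(bucketFor("abc", 10));
    }
}
```

Because addition ignores the order of the characters, any rearrangement of the same characters ("abc", "cab", "bca") lands in the same slot.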

java hashtable
Dec 11 '12 at 17:39
2 answers

I would look at the source code of String and HashMap. They have a low collision rate, and HashMap avoids % while still handling negative hash codes correctly.

From the source of String:

    public int hashCode() {
        int h = hash;
        if (h == 0 && value.length > 0) {
            char val[] = value;
            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
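The key difference from plain addition is the multiply-by-31 step, which gives each character a weight that depends on its position. A standalone sketch of the same loop (without the caching in the private hash field); the names PolynomialHash and polyHash are hypothetical:

```java
// Sketch: the base-31 polynomial hash used by String.hashCode(),
// written as a standalone function without hash caching.
class PolynomialHash {
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            // Multiply the running hash by 31, then add the next character.
            // int overflow simply wraps around, as in String.hashCode().
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "hash table";
        System.out.println(polyHash(s) == s.hashCode()); // true
    }
}
```

This keeps the "loop over the characters" structure from the question, so it is a drop-in replacement for summing, but reordered characters now usually produce different values.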

From the source of HashMap:

    /**
     * Retrieve object hash code and applies a supplemental hash function to the
     * result hash, which defends against poor quality hash functions. This is
     * critical because HashMap uses power-of-two length hash tables, that
     * otherwise encounter collisions for hashCodes that do not differ
     * in lower bits. Note: Null keys always map to hash 0, thus index 0.
     */
    final int hash(Object k) {
        int h = 0;
        if (useAltHashing) {
            if (k instanceof String) {
                return sun.misc.Hashing.stringHash32((String) k);
            }
            h = hashSeed;
        }

        h ^= k.hashCode();

        // This function ensures that hashCodes that differ only by
        // constant multiples at each bit position have a bounded
        // number of collisions (approximately 8 at default load factor).
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }
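To see what the supplemental shift/XOR steps buy, here is a small sketch that feeds 16 hash codes differing only in bits 16 to 19 into a 16-slot power-of-two table, where the slot is just the low 4 bits. The names SpreadDemo, spread and distinctBuckets are hypothetical; spread copies only the last two lines of the excerpt (no alternative hashing):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: bucket spread with and without the supplemental hash steps.
class SpreadDemo {
    // The shift/XOR steps from the HashMap excerpt.
    static int spread(int h) {
        h ^= (h >>> 20) ^ (h >>> 12);
        return h ^ (h >>> 7) ^ (h >>> 4);
    }

    // Power-of-two table: the index is the low bits of the hash.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    // Counts distinct buckets used by hash codes i << 16, i = 0..15,
    // which differ only in bits 16..19.
    static int distinctBuckets(boolean useSpread, int length) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < 16; i++) {
            int h = i << 16;
            buckets.add(indexFor(useSpread ? spread(h) : h, length));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        System.out.println("without spread: " + distinctBuckets(false, 16)); // 1
        System.out.println("with spread:    " + distinctBuckets(true, 16));  // 16
    }
}
```

Without the extra steps every key lands in slot 0; with them, the high bits are folded into the low bits and the 16 keys occupy 16 different slots.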

Since a HashMap's table length is always a power of 2, you can do what it does:

    hash = (null != key) ? hash(key) : 0;
    bucketIndex = indexFor(hash, table.length);

and

    /**
     * Returns index for hash code h.
     */
    static int indexFor(int h, int length) {
        return h & (length-1);
    }

Using & is much faster than %, and because length is positive (so length - 1 has no sign bit), the result is never negative. By contrast, h % length is negative whenever h is negative.
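A quick sketch of the negative-number point, using a hash code of -7 and a 16-slot table. The class name IndexForDemo is hypothetical; Math.floorMod (Java 8+) is shown as the usual alternative when the table size is not a power of two:

```java
// Sketch: % versus & for a negative hash code and a power-of-two table.
class IndexForDemo {
    // Same masking as HashMap.indexFor; only valid when length is a power of two.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        int h = -7;       // hash codes can be negative, e.g. after int overflow
        int length = 16;  // power-of-two table size
        System.out.println(h % length);               // -7, not a valid array index
        System.out.println(Math.floorMod(h, length)); // 9
        System.out.println(indexFor(h, length));      // 9
    }
}
```

For power-of-two sizes the mask and floorMod agree; for other sizes only floorMod gives a valid slot.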

Dec 11 '12 at 18:05

Java's String.hashCode() is a compromise between a really good hash function and the most efficient one. Simply adding up the character values of a string does not make a reliable hash function.

For example, consider the two strings "dog" and "god". Since both contain 'd', 'g', and 'o', no method that only adds the characters can ever give them different hash codes.
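A short sketch that checks this directly, comparing a plain character sum with String.hashCode() on the two anagrams. The names AnagramDemo and sumOfChars are hypothetical; String.chars() requires Java 8+:

```java
// Sketch: an additive hash collides on anagrams; String.hashCode() does not here.
class AnagramDemo {
    // Hypothetical additive hash: the sum of the character codes.
    static int sumOfChars(String s) {
        return s.chars().sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfChars("dog") + " " + sumOfChars("god"));  // 314 314
        System.out.println("dog".hashCode() + " " + "god".hashCode());    // 99644 102524
    }
}
```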

Joshua Bloch, who implemented a large part of the Java libraries, discusses String.hashCode() in his book Effective Java. He explains that in Java releases prior to 1.2, String.hashCode() examined at most 16 characters of a given string. That was slightly faster than the current implementation, but in some situations it performed remarkably badly.

In general, if your dataset is very well defined and you can exploit some property that makes its items distinct, you can probably improve on the hash function. For general-purpose strings, good luck.

Dec 11 '12 at 17:57
