What is the effective HashCode () for small x values, large y?

I map the x, y values ​​on the Cartesian plane using a HashMap. What would be an effective HashCode for very small x, very large y values?

I am currently using:

public int hashCode() { return ((y * 31) ^ x); // & Typical x,y values would be, (with many collisions on x): [4, 1000001] [9, 1000000] [5, 999996] [6, 999995] [4, 999997] [6, 999997] [6, 1000003] [10, 999994] [8, 999997] [10, 999997] [5, 999999] [4, 999998] [5, 1000003] [2, 1000005] [3, 1000004] [6, 1000000] [3, 1000005] 

I insert x, y pairs in the hash map key using the .put method to avoid duplicating x, y pairs. Not sure if this is the most effective solution.

+2
source share
3 answers

Sometimes the best way to find out is to just run some brute force tests on your ranges. Ultimately, you can always write a hash function and go back and fix it later if you get poor performance. Premature optimization is evil. However, hashing is easy to verify.

I ran this program and got 0 collisions:

 import java.util.HashMap; import java.util.Map; import java.util.Map.Entry; public class Testing { public static void main(String[] args) { int minX = 0; int minY = 100000; int maxX = 20; int maxY = 2000000; Map<Integer, Integer> hashToCounts = new HashMap<Integer, Integer>(); for (int x = minX; x < maxX; x++) { for (int y = minY; y < maxY; y++) { int hash = hash(x, y); Integer count = hashToCounts.get(hash); if (count == null) count = 0; hashToCounts.put(hash, ++count); } } int totalCollisions = 0; for (Entry<Integer, Integer> hashCountEntry : hashToCounts.entrySet()) if (hashCountEntry.getValue() > 1) totalCollisions += hashCountEntry.getValue() - 1; System.out.println("Total collisions: " + totalCollisions); } private static int hash(int x, int y) { return 7 + y * 31 + x * 23; } } 

And the conclusion:

Total Collisions: 0

Please note that my function was 7 + y * 31 + x * 23 .

Of course, don't take my word for it. Stir ranges to tune it to your dataset and try to calculate it yourself.

Using (y * 31) ^ x gave me:

Total Collisions: 475,000

And using only x * y :

Total Collisions: 20439,039

Be warned that this program can use a pretty good chunk of memory and processing power. I ran it on a fairly powerful server. I do not know how this will work on the local machine.

Some good rules for hashing:

  • Mix your operators. By mixing your operators, you can make the results change more. Using just x * y in this test, I had a very large number of collisions.
  • Use prime numbers for multiplication. Prime numbers have interesting binary properties that lead to more unstable multiplication.
  • Avoid using shift operators (unless you really know what you are doing). They insert many zeros or ones into the binary number, reducing the volatility of other operations and potentially even reducing the number of possible outputs.
+3
source

It seems that x * y will work well, especially if the result matches int .

You can use a HashSet: it is inside a HashMap with only keys, no values. This would make the intention to avoid duplication more obvious.

0
source

It’s hard to predict. HashMap does some rebooting using the hash () method shown below, and then takes the bottom X bits. So, in an ideal world, ignoring the hash () method, which interferes, your least significant bits should be well distributed.

 static int hash(int h) { // This function ensures that hashCodes that differ only by // constant multiples at each bit position have a bounded // number of collisions (approximately 8 at default load factor). h ^= (h >>> 20) ^ (h >>> 12); return h ^ (h >>> 7) ^ (h >>> 4); } 

Usually I start with something very simple, for example. x ^ y (or x shifted by something ^ y or vice versa), and create a HashMap, and see if there are too many conflicts.

0
source

Source: https://habr.com/ru/post/1485369/


All Articles