I have my own private-hashset / public addressing class (i.e. non-linked lists). It is very specific to my needs - it is not general (only for positive long numbers), it requires the number of inserted records to be predetermined and does not support deletion - but it should be as small as possible.
Since it has so little functionality, it is really a small and simple class. However, for some reason, when I insert a lot of records, the number of collisions gets too high too fast.
Some code (Java):
public class MyHashSet { private long[] _entries; public MyHashSet(int numOfEntries) { int neededSize = (int)(numOfEntries / 0.65D); _entries = new long[neededSize]; } public void add(long num) { int cell = ((Long) (num % _entries.length)).intValue(); while (_entries[cell] != 0) { if (++cell >= _entries.length) cell = 0; } _entries[cell] = num; } ...
I have a main object that initializes a MyHashSet object with 10 million as a parameter, and then calls add () 10 million times with another randomly generated (but positive) long number. Although on a regular Java HashSet, this insert takes about a second overall, it takes about 13 seconds to complete work with MyHashSet. I added a counter for collisions, and indeed, the number of collisions by 3-6 billion is more than expected (I would assume that about 30-40 million is expected).
Am I doing something wrong? Is there something wrong with hashing? Why were there so many clashes, and what can I do about it?
Thanks!
PS: The number 0.65 in the code means that the table will be filled only by 65%, which, as I know, should work well in closed hash networks. For this, even if I set it to 20%, the insertion still takes> 10 seconds.
- EDIT -
This is pretty tricky, but my test code recreated a random object (with System.currentTimeMillis () as a seed) on each iteration of the loop, instead of using the same one for the entire run.
After fixing, it takes about 2-3 seconds to insert. This still seems too big in comparison - why does java HashSet only need a second to insert by default when it is more complex than MyHashSet? Now I get only about 9 million collisions. I also tried disabling the registration code to see if it helps, but it still won't change the situation. I would appreciate any ideas and apologize again for the confusion.