In general, there are a number of changes to how hash tables handle overflows.
Many (including Java if memory is used) are resized when the load factor (percentage of cells used) exceeds a certain percentage. The disadvantage of this is that the speed is independent - most inserts will be O (1), but some of them will be O (N).
To improve this problem, resize it a bit instead: when the load factor exceeds the magic number, they:
- Create a second (large) hash table.
- Insert a new item into the new hash table.
- Move some items from the existing hash table to a new one.
Then each subsequent insertion moves a different fragment from the old hash table to the new one. This preserves the average complexity of O (1) and can be written so that the complexity for each insert is essentially constant: when the hash table becomes βfullβ (i.e., the load factor exceeds the trigger point), you double the size of the table. Then, each insert, you insert a new element and move one element from the old table to the new. The old table will be filled in exactly the same way as the new one is filled, so each insert will include exactly two operations: insert one new element and move one old, so the insert speed remains almost constant.
There are other strategies. I especially like to make a hash table a table of balanced trees. In doing so, you usually ignore overflow completely. As you fill out the hash table, you simply get more items in each tree. Theoretically, this means that the complexity is O (log N), but for any practical size it is proportional to log N/M , where M = the number of buckets. For practical size ranges (for example, up to several billion elements) that are essentially constant ( log N grows very slowly), and, and this is often a little faster for the largest table that you can put in memory, and is lost faster for smaller sizes . The disadvantage is that it is really real when the stored objects are large enough - if you saved (for example) one character per node, the overhead of two pointers (plus, usually, balance information) on the node will be extremely high.
source share