Why IdentityHashMap uses linear sensing to resolve conflicts

As we know in the framework of java collections, each class in Map uses Chaining to resolve conflicts, but IdentityHashMap uses linear sensing for it.

If you see java docs, he mentioned:

For many JRE implementations and working mixtures, this class will provide better performance than the HashMap (which uses a chain rather than a linear probe).

My questions:

  • why did the developers use IdentityHashMap only line research instead of all Map implementations if performance is better in linear research

  • Why there is a performance enhancement in linear sensing , and then chaining .

Tanks.

+4
source share
3 answers

This may shed some light (taken from the Oracle website):

Implementation Note: This is a simple hash table with a linear probe, as described, for example, in the Sedgewick and Knuth texts. An array alternates keys and values. (This has better locality for large tables than using separate arrays.) For many JRE implementations and work mixes, this class will have better performance than HashMap (which uses chaining rather than linear probing).

Although binding may be better for most implementations, this is not true for every implementation.

EDIT Also found this, maybe it's less trivial (taken from here ):

The motivation for using sensing is that it is somewhat faster than after a linked list, but this is only true when a reference to the value can be placed directly in the array. This is not practical for all other hash-based collections, as they store the hash code as well as the value. This is for efficiency reasons: the get operation should check if it found the correct key, and since equality is an expensive operation, it makes sense to first check if it even has the correct hash code. Of course, this reasoning does not apply to IdentityHashMap , which checks the identity of an object, not the equality of an object.

As a background / refinement, a IdentityHashMap differs from a regular HashMap in that two keys are considered equal only if they are physically the same object: an identifier is used to compare keys, not equal.

EDIT: A discussion that helps find the answer (from the comments below):

Attempt:

but this is only true when the reference to the value can be placed directly in the array. This is not practical for all other hash-based collections, as they store the hash code as well as the value. I have a doubt why hashMap cannot put the key, value and hash code into an array and use linear probing if linked list binding is more expensive than a direct array?

wlyles:

probably due to the use of space. It will take more data in each slot. And I have to point out that, although bypassing is less expensive for linear probing, the general search operation can be more expensive (and less predictable), as linear probing often suffers from clustering when many keys have the same hash value. As @delnan said in another comment, for example, if the keys are 1.20 hash for consecutive slots and the 21st hash in the same slot as the 1st, find it (or for a non-existing key that hashes the 1st slot) 20 probes required. Using a list will require fewer probes. For further clarification: since IdentityHashMap compares key values, the probability of collision is very low. Thus, the main weakness of the linear probing collisions leading to thickening is largely eliminated, which makes it more desirable in this implementation.

For further clarification: due to the way IdentityHashMap compares key values, the probability of collision is very low. Thus, the main weakness of the linear probing collisions leading to thickening is largely eliminated, which makes it more desirable in this implementation

+3
source

When you create a hash identification map, there is no way to find two instances that are equal to each other but are not the same object. It also uses identityHashCode , which has a chance of collisions that are known to the IdentityHashMap developers in advance and are known to be very small. In these "laboratory" conditions, linear sensing is apparently the best choice in terms of performance.

I suspect that the class library developers used a chain rather than linear probing in β€œregular” hash maps β€” this is their desire to maintain decent performance, even if the hash functions are suboptimal.

+7
source

From Documents :

Implementation Note: This is a simple hash table with a linear probe, as described, for example, in the Sedgewick and Knuth texts. An array alternates keys and values. (This has better locality for large tables than using separate arrays.) . For many JRE implementations and work mixes, this class will yield better performance than HashMap (which uses chaining rather than linear sensing).

The reason is that this class "will give better performance than HashMap."

-1
source

Source: https://habr.com/ru/post/1487082/


All Articles