ConcurrentHashMap vs HashMap Performance

How does ConcurrentHashMap work compared to HashMap, especially the .get() method? (I am particularly interested in the case of only a few elements, in the range of 0 to 5000.)

Is there any reason not to use ConcurrentHashMap instead of HashMap?

(I know null values are not allowed.)
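That restriction is easy to demonstrate. The sketch below is illustrative code of my own (class and method names are mine, not from the question): HashMap accepts a null key, while ConcurrentHashMap throws a NullPointerException for null keys and values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    // Returns true if the given map throws NullPointerException on a null key.
    static boolean rejectsNullKey(Map<String, String> map) {
        try {
            map.put(null, "value");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap rejects null key: "
                + rejectsNullKey(new HashMap<>()));            // false
        System.out.println("ConcurrentHashMap rejects null key: "
                + rejectsNullKey(new ConcurrentHashMap<>()));  // true
    }
}
```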

Update

Just to clarify: it is obvious that performance will suffer under actual concurrent access, but how do the two compare in the absence of concurrent access?

+45
java collections hashmap
Sep 04 '09 at 10:00
7 answers

I was very surprised to find that this topic is so old and yet no one has posted any benchmarks on the question. Using ScalaMeter, I created add, get and remove benchmarks for HashMap and ConcurrentHashMap in two scenarios:

  • using a single thread
  • using as many threads as I have cores available. Note that since HashMap is not thread-safe, I simply created a separate HashMap for each thread, but used one shared ConcurrentHashMap.

The code is available in my repository .
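The author's ScalaMeter benchmarks are not reproduced here, but as a rough illustration of the single-threaded get comparison, a minimal timing harness might look like the following. This is my own sketch, not the author's code, and a naive nanoTime loop is only indicative next to a proper harness such as ScalaMeter or JMH.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GetBenchmark {
    // Fills a map with `size` entries and times `rounds` passes of get() over all keys.
    static long timeGets(Map<Integer, Integer> map, int size, int rounds) {
        for (int i = 0; i < size; i++) map.put(i, i);
        long sink = 0;                         // keeps the loop observable to the JIT
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++)
            for (int i = 0; i < size; i++) sink += map.get(i);
        long elapsed = System.nanoTime() - start;
        if (sink == 42) System.out.println(sink); // prevent dead-code elimination
        return elapsed;
    }

    public static void main(String[] args) {
        int size = 5000, rounds = 1000;
        timeGets(new HashMap<>(), size, rounds);           // warm-up
        timeGets(new ConcurrentHashMap<>(), size, rounds); // warm-up
        System.out.printf("HashMap:           %d ns%n", timeGets(new HashMap<>(), size, rounds));
        System.out.printf("ConcurrentHashMap: %d ns%n", timeGets(new ConcurrentHashMap<>(), size, rounds));
    }
}
```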

The results are as follows:

  • the X axis (size) represents the number of elements written to the map(s)
  • the Y axis (value) represents the time in milliseconds

[Charts: add method, get method, remove method]

Summary

  • If you want to operate on your data as fast as possible, use all available threads. That seems obvious: each thread does 1/n-th of the full work.

  • If single-threaded access is all you need, use HashMap; it is simply faster. For the add method it is as much as 3 times more efficient. Only get is faster on ConcurrentHashMap, and not by much.

  • Working on a ConcurrentHashMap with many threads is about as effective as working on separate HashMaps, one per thread. So there is no need to partition your data into separate structures.

To summarize, the performance of ConcurrentHashMap is worse when you use a single thread, but adding more threads to do the work will certainly speed up the process.

Test platform

AMD FX-6100, 16 GB RAM
Xubuntu 16.04, Oracle JDK 8 update 91, Scala 2.11.8

+37
Aug 21 '15 at 13:39

Thread safety is a complex question. If you want to make an object thread-safe, do it consciously, and document that choice. People who use your class will thank you if it is thread-safe when that simplifies their usage, but they will curse you if an object that was once thread-safe stops being so in a future version. Thread safety, while very nice, is not just for Christmas!

So now for your question:

ConcurrentHashMap (at least in Sun's current implementation) works by partitioning the underlying map into a number of separate buckets. Getting an element does not require any locking per se, but it does use atomic/volatile operations, which imply a memory barrier (potentially very costly, and interfering with other possible optimisations).

Even if all the overhead of the atomic operations can be eliminated by the JIT compiler in the single-threaded case, there is still the overhead of deciding which of the buckets to look in. Admittedly this is a relatively quick calculation, but nevertheless it cannot be eliminated.
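To illustrate that "which bucket" calculation, here is a simplified sketch of my own. It mirrors the general idea (spread the hashCode, then mask it down to an index into a power-of-two table) rather than the actual JDK source, which varies across versions.

```java
public class BucketIndex {
    // Mix the high bits of the hash into the low bits, so that keys whose
    // hashCodes differ only in the upper bits still land in different buckets.
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // tableLength must be a power of two, so the mask keeps only the low bits.
    static int bucketIndex(Object key, int tableLength) {
        return spread(key.hashCode()) & (tableLength - 1);
    }

    public static void main(String[] args) {
        System.out.println(bucketIndex("hello", 16)); // always in [0, 16)
    }
}
```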

As for deciding which implementation to use, the choice is probably simple.

If it is a static field, you almost certainly want to use ConcurrentHashMap, unless testing shows it is a real performance killer. Your class has different thread-safety expectations from instances of that class.

If it is a local variable, then a HashMap is most likely sufficient, unless you know that references to the object may leak to another thread. By coding to the Map interface, you allow yourself to change the implementation easily if you later discover a problem.
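Coding to the interface means declaring the variable as Map so that the concrete type appears only at the construction site. A small sketch (the class and method are hypothetical examples of mine):

```java
import java.util.HashMap;
import java.util.Map;

public class LocalMapUsage {
    // Declared against the Map interface: switching to ConcurrentHashMap later
    // is a one-line change at the `new` expression.
    static Map<String, Integer> countWords(String[] words) {
        Map<String, Integer> counts = new HashMap<>(); // swap for new ConcurrentHashMap<>() if needed
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(new String[] {"a", "b", "a"}));
    }
}
```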

If it is an instance field and the class was not designed to be thread-safe, then document it as not thread-safe and use a HashMap.

If you know that this instance field is the only reason the class is not thread-safe, and you are willing to live with the restrictions that promising thread safety implies, then use a ConcurrentHashMap, unless testing shows significant performance implications. In that case, you might allow the user of the class to choose a thread-safe version of the object, perhaps via a different factory method.
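That factory-method idea can be sketched like this. The Registry class here is a hypothetical example of mine, not from the answer; its only mutable state is the one map field, so the caller decides whether to pay for thread safety.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Registry {
    private final Map<String, String> entries;

    private Registry(Map<String, String> entries) {
        this.entries = entries;
    }

    // For single-threaded use: backed by a plain HashMap.
    public static Registry newInstance() {
        return new Registry(new HashMap<>());
    }

    // Safe to share across threads: backed by a ConcurrentHashMap.
    public static Registry newConcurrentInstance() {
        return new Registry(new ConcurrentHashMap<>());
    }

    public void register(String key, String value) { entries.put(key, value); }
    public String lookup(String key) { return entries.get(key); }
}
```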

In either case, document the class as thread-safe (or conditionally thread-safe) so that people who use your class know they can use the objects from multiple threads, and people who edit your class know they must maintain thread safety in the future.

+81
Sep 04 '09 at 12:14

I would recommend you measure it, since (for one thing) there may be a dependence on the hash distribution of the particular objects you are storing.

+3
Sep 04 '09 at 10:05

The standard HashMap provides no concurrency protection, whereas ConcurrentHashMap does. Before it was available, you could wrap a HashMap to get thread-safe access, but this used coarse-grained locking and meant that all concurrent access was serialized, which could really hurt performance.

ConcurrentHashMap uses lock striping and locks only the portions of the map that are covered by a particular lock. If you run on a modern VM such as HotSpot, the VM will try to use lock biasing, coarsening and elision where possible, so you only pay the penalty for the locks when you actually need them.

So, if your map is going to be accessed by concurrent threads and you need to guarantee a consistent view of its state, use ConcurrentHashMap.
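The contrast between the coarse-grained wrapper and ConcurrentHashMap can be sketched as follows. This is my own illustrative code; I use Collections.synchronizedMap as one concrete example of a whole-map-lock wrapper. Both variants end up with correct counts, since merge is atomic in each (via the single mutex in one case and fine-grained internal synchronization in the other); the difference is how much the threads contend.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class WrappingDemo {
    // Launches `threads` threads that each increment the "hits" counter
    // `perThread` times, then returns the final count.
    static int hammer(Map<String, Integer> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) map.merge("hits", 1, Integer::sum);
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return map.get("hits");
    }

    public static void main(String[] args) {
        // One lock for the whole map: every access from every thread is serialized.
        Map<String, Integer> coarse = Collections.synchronizedMap(new HashMap<>());
        // Fine-grained internal locking; reads do not block.
        Map<String, Integer> fine = new ConcurrentHashMap<>();
        System.out.println(hammer(coarse, 4, 500)); // 2000
        System.out.println(hammer(fine, 4, 500));   // 2000
    }
}
```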

+3
Sep 04 '09 at 10:20

In the case of a hash table of 1000 elements, using 10 locks for the whole table saves almost half the time when 10,000 threads are inserting and 10,000 threads are deleting from it.

There is an interesting runtime difference here.

Always use a concurrent data structure, unless the downside of striping (mentioned below) becomes a frequent operation. In that case you would have to acquire all the locks; I read that recursion is the best way to do that.

Lock striping is useful when there is a way to break a highly contended lock into multiple locks without compromising data integrity. Whether this is possible takes some thought, and it is not always the case. The data structure matters too. If we use a big array to implement a hash table, using a single lock for the entire hash table to synchronize it serializes all access to the data structure. If two threads are accessing the same place in the hash table, that serialization is necessary, but what if they are accessing the two extremes of the table?

The downside of lock striping is that it is difficult to get at the state of the data structure as a whole. In the example, computing the size of the table, or trying to list/enumerate the whole table, is cumbersome, since we need to acquire all of the striped locks.
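A minimal sketch of the idea, written by me for illustration only (real implementations, such as the segments of pre-Java-8 ConcurrentHashMap, are far more sophisticated): per-key operations take one stripe's lock, while the whole-map size() must take all of them.

```java
import java.util.HashMap;
import java.util.Map;

public class StripedMap {
    private static final int STRIPES = 10;
    private final Object[] locks = new Object[STRIPES];
    private final Map<Integer, String>[] slices;

    @SuppressWarnings("unchecked")
    public StripedMap() {
        slices = new Map[STRIPES];
        for (int i = 0; i < STRIPES; i++) {
            locks[i] = new Object();
            slices[i] = new HashMap<>();
        }
    }

    private int stripe(int key) {
        return Math.floorMod(key, STRIPES); // pick the lock that guards this key
    }

    public void put(int key, String value) {
        int s = stripe(key);
        synchronized (locks[s]) { slices[s].put(key, value); }
    }

    public String get(int key) {
        int s = stripe(key);
        synchronized (locks[s]) { return slices[s].get(key); }
    }

    // The downside described above: a whole-map operation must visit every stripe.
    public int size() {
        int total = 0;
        for (int i = 0; i < STRIPES; i++) {
            synchronized (locks[i]) { total += slices[i].size(); }
        }
        return total;
    }
}
```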

+2
Feb 15 '12 at 18:51

What answer do you expect here?

Obviously, it will depend on the number of reads happening concurrently with writes, and on how long the map would be blocked by a write operation in your application (and on whether you would make use of the putIfAbsent method on ConcurrentMap). Any benchmark will be largely meaningless.
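The putIfAbsent point is worth spelling out: a check-then-act sequence on a plain map is two separate operations, so another thread can slip in between, while ConcurrentMap.putIfAbsent does the same thing atomically. A small sketch of mine (the registerOnce helper is hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentDemo {
    // putIfAbsent returns the previous value, or null if the key was absent,
    // so "first writer wins" needs no external locking.
    static String registerOnce(ConcurrentMap<String, String> map, String key, String value) {
        String previous = map.putIfAbsent(key, value);
        return previous == null ? value : previous;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> map = new ConcurrentHashMap<>();
        System.out.println(registerOnce(map, "id", "first"));  // prints first
        System.out.println(registerOnce(map, "id", "second")); // still prints first
    }
}
```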

0
Sep 04 '09 at 10:17

It is not clear what you mean. If you need thread safety, you have almost no choice: only ConcurrentHashMap. And it definitely has performance/memory penalties in the get() call: access to volatile variables, and locking if you are unlucky.

0
Sep 04 '09 at 10:18
