Is it safe to use parallelStream() to populate Maps in Java 8?

I have a list of 1 million objects that I need to put into a map. I want to reduce the time it takes to fill the map, and for this I plan to use Java 8's parallelStream() as follows:

```java
List<Person> list = new LinkedList<>();
Map<String, String> map = new HashMap<>();
list.parallelStream().forEach(person -> {
    map.put(person.getName(), person.getAge());
});
```

Is it safe to fill a map like this from parallel threads? Or can concurrency problems occur, so that some entries are lost from the Map?

It is perfectly safe to use parallelStream() to collect into a HashMap. However, it is not safe to use parallelStream() with forEach and a consumer that adds entries to a HashMap.

HashMap is not a synchronized class, and putting elements into it concurrently will not work properly. That is exactly what forEach does: it invokes the given consumer, which puts entries into the HashMap, from multiple threads, possibly at the same time. Here is simple code that demonstrates the problem:

```java
List<Integer> list = IntStream.range(0, 10000).boxed().collect(Collectors.toList());
Map<Integer, Integer> map = new HashMap<>();
list.parallelStream().forEach(i -> {
    map.put(i, i);
});
System.out.println(list.size());
System.out.println(map.size());
```

Run it several times. There is a very good chance (the joy of concurrency) that the printed map size will not be 10000, the size of the list, but slightly smaller.
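To make the "run it several times" step easier, here is a self-contained sketch that repeats the experiment and counts the runs in which the map came out wrong. The class name RaceDemo, its method names, and the run count are illustrative choices, not part of the answer. Because this is a data race, the count varies by machine and run (it can even be 0), and a corrupted HashMap can occasionally throw instead of just losing entries, which the sketch also counts as a bad run.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class RaceDemo {

    // The UNSAFE pattern: HashMap.put is called from several threads at once.
    static int unsafeFill(List<Integer> list) {
        Map<Integer, Integer> map = new HashMap<>();
        list.parallelStream().forEach(i -> map.put(i, i));
        return map.size();
    }

    // Repeats the unsafe fill and counts runs where the map did not end up with
    // one entry per list element, or where HashMap threw because of corruption.
    static int countBadRuns(int n, int runs) {
        List<Integer> list = IntStream.range(0, n).boxed().collect(Collectors.toList());
        int bad = 0;
        for (int r = 0; r < runs; r++) {
            try {
                if (unsafeFill(list) != list.size()) {
                    bad++;
                }
            } catch (RuntimeException e) {
                bad++; // racing puts/resizes can leave the map internally inconsistent
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        int runs = 20;
        System.out.println("runs with wrong size: " + countBadRuns(10_000, runs) + " of " + runs);
    }
}
```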

The solution here, as always, is not to use forEach but to perform a mutable reduction with the collect method and the built-in toMap collector:

```java
Map<Integer, Integer> map = list.parallelStream().collect(Collectors.toMap(i -> i, i -> i));
```

Use this line in place of the map declaration and the forEach in the example above, and the map size will always be 10000. The Stream API guarantees that collecting into a non-thread-safe container is safe, even in parallel. This also means you do not need toConcurrentMap for safety: that collector is for when you specifically want a ConcurrentMap as the result rather than a general Map. As far as thread safety with collect is concerned, you can use either one.
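Applied to the question's own case, the collect approach might look like the sketch below. Person here is a hypothetical minimal class with the getName() and getAge() accessors the question uses (both returning String, to match Map<String, String>). One caveat: the two-argument Collectors.toMap throws IllegalStateException when two elements map to the same key, so if names can repeat you need the three-argument overload with a merge function. Keeping the first value is just one example policy.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.Collectors;

public class CollectToMap {

    // Hypothetical minimal stand-in for the question's Person class.
    static class Person {
        private final String name;
        private final String age;

        Person(String name, String age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        String getAge() { return age; }
    }

    // Mutable reduction with toMap: safe in parallel even though the result is a plain map.
    // The merge function (a, b) -> a keeps the value of the first person (in list order)
    // with a given name instead of throwing IllegalStateException on a duplicate key.
    static Map<String, String> byName(List<Person> people) {
        return people.parallelStream()
                .collect(Collectors.toMap(Person::getName, Person::getAge, (a, b) -> a));
    }

    // Same idea, but explicitly asking for a ConcurrentMap as the result type.
    // For duplicate names, which value survives is not defined with this unordered collector.
    static ConcurrentMap<String, String> byNameConcurrent(List<Person> people) {
        return people.parallelStream()
                .collect(Collectors.toConcurrentMap(Person::getName, Person::getAge, (a, b) -> a));
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Alice", "31"),
                new Person("Bob", "25"),
                new Person("Alice", "40"));
        System.out.println(byName(people)); // two entries; "Alice" keeps "31"
    }
}
```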

HashMap is not thread-safe, but ConcurrentHashMap is; use it instead:

```java
Map<String, String> map = new ConcurrentHashMap<>();
```

and your code will work as expected.
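A self-contained sketch of this variant, again with a hypothetical minimal Person class. Two ConcurrentHashMap behaviors are worth keeping in mind: it rejects null keys and values with a NullPointerException, and with forEach on a parallel stream, if two people share a name, which age ends up in the map depends on thread timing rather than list order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentFill {

    // Hypothetical minimal stand-in for the question's Person class.
    static class Person {
        private final String name;
        private final String age;

        Person(String name, String age) {
            this.name = name;
            this.age = age;
        }

        String getName() { return name; }
        String getAge() { return age; }
    }

    // The question's forEach approach with the HashMap swapped for a ConcurrentHashMap,
    // whose put is safe to call from multiple threads at once.
    static Map<String, String> byName(List<Person> people) {
        Map<String, String> map = new ConcurrentHashMap<>();
        people.parallelStream().forEach(p -> map.put(p.getName(), p.getAge()));
        return map;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            people.add(new Person("name" + i, String.valueOf(i % 100)));
        }
        System.out.println(byName(people).size()); // 1000000: unique names, no entries lost
    }
}
```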


forEach() vs toMap() performance comparison

After warming up the JVM, with 1M elements, parallel streams, and median timings, the forEach() version was consistently 2-3 times faster than the toMap() version.

The results were consistent across inputs with unique keys, 25% duplicate keys, and 100% duplicate keys.
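The benchmark code behind these numbers is not given. The following is only a rough sketch of how such a comparison could be set up (warm-up runs, parallel streams, median of several timings), assuming Integer keys where "x% duplicate" means roughly x% of the elements repeat an earlier key; the class and method names are hypothetical. A hand-rolled System.nanoTime loop is easily distorted by JIT and GC effects, so a dedicated harness such as JMH is the better tool for numbers you intend to rely on. With duplicate keys, the toMap version needs a merge function, otherwise it throws IllegalStateException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ForEachVsToMap {

    static long sink; // accumulates result sizes so the JIT cannot discard the timed work

    // Builds n keys where roughly dupShare of them repeat an earlier key (assumed interpretation).
    static List<Integer> keys(int n, double dupShare) {
        int distinct = Math.max(1, (int) (n * (1.0 - dupShare)));
        return IntStream.range(0, n).map(i -> i % distinct).boxed().collect(Collectors.toList());
    }

    static Map<Integer, Integer> viaForEach(List<Integer> keys) {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        keys.parallelStream().forEach(k -> map.put(k, k));
        return map;
    }

    static Map<Integer, Integer> viaToMap(List<Integer> keys) {
        // The merge function is required once keys repeat; values are equal here, so any choice works.
        return keys.parallelStream().collect(Collectors.toMap(k -> k, k -> k, (a, b) -> a));
    }

    // Runs the task `runs` times and returns the median wall-clock time in milliseconds.
    static double medianMillis(Function<List<Integer>, Map<Integer, Integer>> task,
                               List<Integer> keys, int runs) {
        long[] times = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            sink += task.apply(keys).size();
            times[r] = System.nanoTime() - start;
        }
        Arrays.sort(times);
        return times[runs / 2] / 1_000_000.0;
    }

    public static void main(String[] args) {
        for (double dup : new double[] {0.0, 0.25, 1.0}) {
            List<Integer> keys = keys(1_000_000, dup);
            // Warm-up so the JIT has compiled both paths before timing.
            for (int i = 0; i < 5; i++) {
                sink += viaForEach(keys).size() + viaToMap(keys).size();
            }
            System.out.printf("dup=%.2f forEach=%.1f ms toMap=%.1f ms%n",
                    dup,
                    medianMillis(ForEachVsToMap::viaForEach, keys, 11),
                    medianMillis(ForEachVsToMap::viaToMap, keys, 11));
        }
    }
}
```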

Source: https://habr.com/ru/post/1011682/

