How do I know if Java Stream (Collectors.toMap) is parallelized?

Question

I have the following code that tries to populate a map from a list in parallel by going through the Java Stream API:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    class NameId {...}

    public class TestStream {
        public static void main(String[] args) {
            List<NameId> niList = new ArrayList<>();
            niList.add(new NameId("Alice", "123456"));
            niList.add(new NameId("Bob", "223456"));
            niList.add(new NameId("Carl", "323456"));

            Stream<NameId> niStream = niList.parallelStream();
            Map<String, String> niMap =
                niStream.collect(Collectors.toMap(NameId::getName, NameId::getId));
        }
    }

How do I know whether the map is being populated by multiple threads, i.e. in parallel? Do I need to call Collectors.toConcurrentMap instead of Collectors.toMap? Is this a sensible way to parallelize populating a map? How do I know which concrete map type backs the new niMap (e.g. HashMap)?

java parallel-processing java-stream

user1332148 Dec 05 '15 at 0:06

3 answers

Cardano · Answer 1 · 2015-12-05T00:13:43+0000

From the Javadoc:

The returned Collector is not concurrent. For parallel stream pipelines, the combiner function operates by merging the keys from one map into another, which can be an expensive operation. If it is not required that results are inserted into the Map in encounter order, using toConcurrentMap(Function, Function) may offer better parallel performance.

So it looks like toConcurrentMap will parallelize the inserts.

The default backing map is HashMap. The two-argument toMap simply calls the overload of toMap that takes a Supplier<M> and passes HashMap::new.
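If the concrete map type matters to you, one option is to call that four-argument overload yourself and pass the supplier explicitly. A minimal sketch, assuming a simple NameId with getName() and getId(); the question elides the real class, so the stand-in below, the class name ExplicitMapSupplier and its helper methods are hypothetical:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class ExplicitMapSupplier {
    // Hypothetical stand-in for the NameId class elided in the question.
    static final class NameId {
        private final String name;
        private final String id;
        NameId(String name, String id) { this.name = name; this.id = id; }
        String getName() { return name; }
        String getId() { return id; }
    }

    // Sample data taken from the question.
    static List<NameId> sample() {
        return Arrays.asList(
                new NameId("Alice", "123456"),
                new NameId("Bob", "223456"),
                new NameId("Carl", "323456"));
    }

    // Collects into a TreeMap instead of relying on the default map type.
    static Map<String, String> toTreeMap(List<NameId> items) {
        return items.parallelStream().collect(Collectors.toMap(
                NameId::getName,
                NameId::getId,
                // Merge function: reject duplicate keys with an exception.
                (first, second) -> { throw new IllegalStateException("Duplicate key"); },
                TreeMap::new));
    }

    public static void main(String[] args) {
        Map<String, String> niMap = toTreeMap(sample());
        System.out.println(niMap.getClass().getSimpleName()); // TreeMap
    }
}
```

Any Map implementation with a no-argument constructor can be passed the same way. The collector is still non-concurrent, so on a parallel stream each thread fills its own map and the maps are merged afterwards.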

Peter Lawrey · Answer 2 · 2015-12-05T00:14:42+0000

How do I know whether the map is being populated by multiple threads, i.e. in parallel?

Hard to say. If your code is unexpectedly slow, it may be because you are trying to use multiple threads.
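One rough way to check is to record which threads actually run the mapping functions. This is only a sketch: it uses plain strings instead of the question's NameId, the class and method names are made up for illustration, and the thread names you see will vary from run to run.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

class ObserveThreads {
    // Returns the names of the threads that evaluated the key mapper.
    static Set<String> threadsUsed(List<String> names) {
        // Thread-safe set, because several threads may add to it at once.
        Set<String> threads = ConcurrentHashMap.newKeySet();
        // The result is assigned only to give type inference a target.
        Map<String, Integer> lengths = names.parallelStream().collect(Collectors.toMap(
                name -> {
                    threads.add(Thread.currentThread().getName());
                    return name;
                },
                String::length));
        return threads;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Alice", "Bob", "Carl");
        System.out.println(threadsUsed(names));
    }
}
```

Seeing more than one name means several threads took part. Seeing a single name does not prove the stream was sequential, since a tiny input may end up being handled by one thread.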

Do I need to call Collectors.toConcurrentMap instead of Collectors.toMap?

It will help make the parallel version more efficient or, to put it another way, slightly less inefficient.

Is this a sensible way to parallelize populating a map?

You can do it as you suggest, but note that starting a new thread costs far more than everything you are doing here, so adding even one thread will slow it down.

How do I know which concrete map type backs the new niMap (e.g. HashMap)?

The documentation says that you cannot know for sure. The last time I checked, toMap used HashMap and groupingBy used LinkedHashMap, but you cannot assume that it is any particular Map.

Tagir Valeev · Answer 3 · 2015-12-06T11:29:54+0000

You can use toConcurrentMap with a sequential stream and toMap with a parallel stream; both combinations work. The difference is:

toConcurrentMap() is usually the faster choice for a parallel stream
toMap() is usually the faster choice for a sequential stream

If you don't know where your stream came from and want the faster option in both cases, you can write it like this:

    Map<String, String> niMap = niStream.collect(
        niStream.isParallel()
            ? Collectors.toConcurrentMap(NameId::getName, NameId::getId)
            : Collectors.toMap(NameId::getName, NameId::getId));

The difference is that toConcurrentMap() is a CONCURRENT collector, which means that it uses a concurrent data structure (ConcurrentHashMap in the current implementation) that can be populated simultaneously from different threads. For a sequential stream this adds some unnecessary overhead, but for a parallel stream it is faster than toMap(). With toMap(), a separate non-concurrent Map instance is created for each parallel task, and these maps are then merged, which is not very fast for large maps.
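You can see this distinction directly by asking each collector for its characteristics. A small sketch; the class name CollectorCharacteristics and the helper isConcurrent are made up for illustration, and string keys stand in for the question's NameId:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collector;
import java.util.stream.Collectors;

class CollectorCharacteristics {
    // True if the collector declares the CONCURRENT characteristic.
    static boolean isConcurrent(Collector<?, ?, ?> collector) {
        return collector.characteristics().contains(Collector.Characteristics.CONCURRENT);
    }

    public static void main(String[] args) {
        Collector<String, ?, Map<String, Integer>> plain =
                Collectors.toMap(Function.identity(), String::length);
        Collector<String, ?, ConcurrentMap<String, Integer>> concurrent =
                Collectors.toConcurrentMap(Function.identity(), String::length);

        System.out.println(isConcurrent(plain));      // false
        System.out.println(isConcurrent(concurrent)); // true
    }
}
```

Note that this tells you about the collector, not about the stream; whether the pipeline itself runs in parallel is still reported by isParallel().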

Please note that my StreamEx library, which extends the standard Stream API, adds a toMap() method that uses a concurrent collector for a parallel stream and a non-concurrent collector for a sequential one:

    Map<String, String> niMap = StreamEx.of(niStream)
        .toMap(NameId::getName, NameId::getId);
