Why should I use a concurrent characteristic in a parallel stream with collect?

Why use a concurrent collector in a parallel stream with collect:

    List<Integer> list = Collections.synchronizedList(new ArrayList<>(Arrays.asList(1, 2, 4)));
    Map<Integer, Integer> collect = list.stream().parallel()
            .collect(Collectors.toConcurrentMap(k -> k, v -> v, (c, c2) -> c + c2));

And not:

    Map<Integer, Integer> collect = list.stream().parallel()
            .collect(Collectors.toMap(k -> k, v -> v, (c, c2) -> c + c2));

In other words, what are the side effects of not using this characteristic? Is it useful for the internal stream operations?

+5
3 answers

These two collectors work in fundamentally different ways.

First, the Stream framework splits the workload into independent chunks that can be processed in parallel (which is why you do not need a special collection as the source; synchronizedList is unnecessary).
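To illustrate the point about the source, here is a minimal sketch of the question's pipeline running over a plain, unsynchronized ArrayList. The class and method names are hypothetical, chosen only for this example, and the sketch assumes the list is not modified while the stream is running.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class PlainSourceExample {
    // Same pipeline as in the question, but the source is an ordinary ArrayList.
    static Map<Integer, Integer> collectFromPlainList() {
        List<Integer> list = new ArrayList<>(Arrays.asList(1, 2, 4));
        // The stream only reads from the list; the framework hands each
        // worker thread its own chunk of the elements.
        return list.stream().parallel()
                .collect(Collectors.toMap(k -> k, v -> v, (c, c2) -> c + c2));
    }

    public static void main(String[] args) {
        System.out.println(collectFromPlainList());
    }
}
```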

When using a non-concurrent collector, each chunk is processed by creating a local container (here, a Map) using the Collector's supplier and accumulating into that local container (putting entries). These partial results then have to be merged, i.e. one map is put into the other, to get the final result.
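The steps a non-concurrent collector goes through can be imitated by hand. The following is only a sequential sketch, not what the Stream framework actually executes: the chunk boundaries (1, 2) and (2, 4) and all names are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class LocalContainersSketch {
    // Accumulates one chunk into its own private map, as one worker thread would.
    static Map<Integer, Integer> accumulateChunk(int... chunk) {
        Map<Integer, Integer> local = new HashMap<>();   // supplier: a fresh local container
        for (int value : chunk) {
            local.merge(value, value, Integer::sum);     // accumulator: put, or merge on a duplicate key
        }
        return local;
    }

    // Combiner: folds the right partial result into the left one.
    static Map<Integer, Integer> combine(Map<Integer, Integer> left, Map<Integer, Integer> right) {
        right.forEach((k, v) -> left.merge(k, v, Integer::sum));
        return left;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> first = accumulateChunk(1, 2);
        Map<Integer, Integer> second = accumulateChunk(2, 4);
        System.out.println(combine(first, second));
    }
}
```

Note that the combine step touches every entry of the right-hand map once more, which is where the extra cost of this approach comes from.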

A concurrent collector supports concurrent accumulation, so only one ConcurrentMap is created and all threads accumulate into that map at the same time. Therefore, no merge step is required after completion, since there is only one map.
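By way of contrast, the single shared container can be imitated with a parallel forEach writing into one ConcurrentHashMap. This is a hand-written sketch of the idea with hypothetical names, not the library's internal code; it relies on ConcurrentHashMap.merge being atomic.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SharedContainerSketch {
    // All worker threads write into the same map; there is nothing to combine afterwards.
    static ConcurrentMap<Integer, Integer> accumulateShared(List<Integer> values) {
        ConcurrentMap<Integer, Integer> shared = new ConcurrentHashMap<>();
        // merge is atomic on ConcurrentHashMap, so concurrent updates to the same key are not lost.
        values.parallelStream().forEach(v -> shared.merge(v, v, Integer::sum));
        return shared;
    }

    public static void main(String[] args) {
        System.out.println(accumulateShared(Arrays.asList(1, 2, 2, 4)));
    }
}
```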


Thus, both collectors are thread safe, but they can have completely different performance characteristics depending on the task. If the Stream's workload before collecting the result is heavy, the difference may be negligible. If, as in your example, there is no relevant work before the collect operation, the outcome depends heavily on how often mappings have to be merged, i.e. how often the same key occurs, and on how the actual target ConcurrentMap deals with contention in the concurrent case.

If you mostly have distinct keys, the merge step of a non-concurrent collector can be as expensive as the preceding accumulation, destroying any benefit of parallel processing. But if you have many duplicate keys that require merging of values, contention on the same key can degrade the performance of the concurrent collector.

So there is no simple "which is better" answer (well, if there were such an answer, why bother adding the other variant). It depends on your actual operation. You can use the expected scenario as a starting point for choosing one, but you should then measure with real-life data. Since both are equivalent, you can change your choice at any time.
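A rough way to start measuring is sketched below. It is a naive System.nanoTime comparison with made-up data sizes and key counts, so the printed numbers are only indicative; a proper benchmark harness such as JMH and your real data would give more trustworthy results. All names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class CollectorTimingSketch {
    // Counts occurrences per key with the non-concurrent collector.
    static Map<Integer, Integer> withToMap(List<Integer> data, int distinctKeys) {
        return data.parallelStream()
                .collect(Collectors.toMap(v -> v % distinctKeys, v -> 1, Integer::sum));
    }

    // Same computation with the concurrent collector.
    static Map<Integer, Integer> withToConcurrentMap(List<Integer> data, int distinctKeys) {
        return data.parallelStream()
                .collect(Collectors.toConcurrentMap(v -> v % distinctKeys, v -> 1, Integer::sum));
    }

    // Naive wall-clock timing of a single run, without warm-up.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        List<Integer> data = IntStream.range(0, 1_000_000).boxed().collect(Collectors.toList());
        // Few keys means many merges on the same key; many keys means mostly distinct entries.
        for (int distinctKeys : new int[] { 10, 1_000_000 }) {
            long plain = timeNanos(() -> withToMap(data, distinctKeys));
            long concurrent = timeNanos(() -> withToConcurrentMap(data, distinctKeys));
            System.out.println(distinctKeys + " keys: toMap " + plain
                    + " ns, toConcurrentMap " + concurrent + " ns");
        }
    }
}
```

Because both collectors produce equal maps for the same input, swapping one for the other in such a test changes only the timing, not the result.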

+11

First of all, I gave a +1 to Holger's answer; it is a good one. I would try to simplify it just a bit by saying that:

CONCURRENT -> multiple threads throw data into the same container in no particular order (ConcurrentHashMap)

NON-CONCURRENT -> multiple threads combine their intermediate results.

The easiest way to understand this (IMHO) is to write a custom collector and play with each of its methods: supplier, accumulator, combiner.
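A possible starting point for such an experiment is sketched below. The collector counts how often its supplier and combiner are invoked, so the same pipeline can be run with and without the CONCURRENT characteristic. The class, method, and counter names are invented for this example, and the key function (v % 10) is arbitrary.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collector;
import java.util.stream.IntStream;

class CountingCollectorDemo {
    static final AtomicInteger supplierCalls = new AtomicInteger();
    static final AtomicInteger combinerCalls = new AtomicInteger();

    // Counts elements per last digit; the container is thread safe so that the
    // same collector can also be used with the CONCURRENT characteristic.
    static Collector<Integer, Map<Integer, Integer>, Map<Integer, Integer>> counting(
            Collector.Characteristics... characteristics) {
        return Collector.of(
                () -> { supplierCalls.incrementAndGet(); return new ConcurrentHashMap<>(); },
                (map, v) -> map.merge(v % 10, 1, Integer::sum),
                (left, right) -> {
                    combinerCalls.incrementAndGet();
                    right.forEach((k, n) -> left.merge(k, n, Integer::sum));
                    return left;
                },
                characteristics);
    }

    // Returns { supplier calls, combiner calls, number of keys in the result }.
    static int[] run(Collector.Characteristics... characteristics) {
        supplierCalls.set(0);
        combinerCalls.set(0);
        Map<Integer, Integer> result = IntStream.range(0, 10_000).boxed().parallel()
                .collect(counting(characteristics));
        return new int[] { supplierCalls.get(), combinerCalls.get(), result.size() };
    }

    public static void main(String[] args) {
        int[] plain = run();
        int[] concurrent = run(Collector.Characteristics.CONCURRENT,
                Collector.Characteristics.UNORDERED);
        System.out.println("non-concurrent: suppliers=" + plain[0] + ", combines=" + plain[1]);
        System.out.println("concurrent:     suppliers=" + concurrent[0] + ", combines=" + concurrent[1]);
    }
}
```

UNORDERED is passed along with CONCURRENT because the source here is ordered, and Stream.collect only performs a concurrent reduction when either the stream or the collector is unordered. The counts printed for the non-concurrent run depend on how the framework happens to split the work on your machine.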

This has already been covered here.

+3

Because of this: "Memory consistency effects: As with other concurrent collections, actions in a thread prior to placing an object into a ConcurrentMap as a key or value happen-before actions subsequent to the access or removal of that object from the ConcurrentMap in another thread."

0

Source: https://habr.com/ru/post/1260994/

