Avoiding race conditions with Scala parallel collections

Are parallel collections meant to be used for operations with side effects? If so, how can you avoid race conditions? For instance:

    var sum = 0
    (1 to 10000).foreach(n => sum += n)
    println(sum) // 50005000

No problem with that. But if you try to parallelize it, a race condition occurs:

    var sum = 0
    (1 to 10000).par.foreach(n => sum += n)
    println(sum) // 49980037 (wrong, and it varies from run to run)
2 answers

Quick answer: don't do that. Parallel code should be parallel, not concurrent.

Better answer:

 val sum = (1 to 10000).par.reduce(_ + _) // requires the operator to be associative

See also aggregate.
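As a rough sketch of what aggregate adds over reduce: it takes a start value plus two functions, one that folds elements into a partial result and one that merges partial results from different chunks, so the result type may differ from the element type. The object and method names below are hypothetical, and the code assumes Scala 2.12 or earlier, where .par is part of the standard library.

```scala
// Assumption: Scala 2.12 or earlier, where .par ships with the standard library.
// On Scala 2.13+ you would need the scala-parallel-collections module and
// import scala.collection.parallel.CollectionConverters._
object AggregateDemo {
  // Same sum as the reduce version: the first function folds an element into
  // a partial result, the second merges two partial results.
  def sum(n: Int): Int =
    (1 to n).par.aggregate(0)(_ + _, _ + _)

  // The accumulator type (Long, Int) differs from the element type Int,
  // which reduce cannot express: sum and element count in one pass.
  def sumAndCount(n: Int): (Long, Int) =
    (1 to n).par.aggregate((0L, 0))(
      (acc, x) => (acc._1 + x, acc._2 + 1),
      (a, b) => (a._1 + b._1, a._2 + b._2)
    )
}
```

Both functions are pure, so there is no shared mutable state for threads to race on.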


The parallel case does not work for two reasons: sum is not a volatile variable, so the visibility of your writes to other threads is not guaranteed, and several threads each perform the following steps:

  • read sum into a register
  • add to the register holding the value of sum
  • write the updated value back to memory

If two threads perform the first step one after the other, and then complete the remaining steps in any order, one of the updates ends up overwritten.
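To make that interleaving concrete, here is a small sketch that replays it by hand on a single thread, so the result is deterministic. The names LostUpdateDemo, regA and regB are made up for illustration; the two vals stand in for each thread's register.

```scala
object LostUpdateDemo {
  // Replays one unlucky interleaving of two threads step by step.
  def interleaved(): Int = {
    var sum = 0
    val regA = sum   // thread A: read sum (0) into its register
    val regB = sum   // thread B: read sum (0) before A has written back
    sum = regA + 1   // thread A: add 1 and write back, sum is now 1
    sum = regB + 2   // thread B: add 2 to its stale copy and write back
    sum              // 2 rather than 3: A's update was overwritten
  }
}
```

With real threads the same thing happens only occasionally, which is why the parallel foreach gives a total that is slightly too low and differs between runs.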

  • Use the @volatile annotation to make writes to sum visible to other threads when doing something like this.
  • Even with @volatile, you will still lose some updates because the read-add-write sequence is not atomic. You should use an AtomicInteger and its atomic update methods, such as addAndGet (or incrementAndGet for a plain counter).
  • Although atomic counters ensure correctness, a shared variable hurts performance here: it becomes a bottleneck because every thread tries to write atomically to the same cache line. If you wrote to this variable infrequently, it would not be a problem, but since you do it on every iteration, there will be no speedup. In fact, because ownership of the cache line keeps moving between processors, it will probably be slower.
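The last two points can be sketched side by side. To stay within the standard library on any Scala version, this sketch uses Future rather than parallel collections; SharedVsLocal, chunks, atomicSum and localSum are hypothetical names, and the 10-second timeout is arbitrary.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object SharedVsLocal {
  // Splits 1..n into at most `parts` contiguous ranges (illustrative helper).
  def chunks(n: Int, parts: Int): Seq[Range] = {
    val size = (n + parts - 1) / parts
    (1 to n by size).map(start => start to math.min(start + size - 1, n))
  }

  // Correct but contended: every element triggers an atomic update
  // on the same shared counter.
  def atomicSum(n: Int, parts: Int): Int = {
    val total = new AtomicInteger(0)
    val work = chunks(n, parts).map(r => Future(r.foreach(i => total.addAndGet(i))))
    Await.result(Future.sequence(work), 10.seconds)
    total.get
  }

  // Each task sums its own chunk without touching shared state;
  // the partial sums are combined once at the end.
  def localSum(n: Int, parts: Int): Int = {
    val work = chunks(n, parts).map(r => Future(r.sum))
    Await.result(Future.sequence(work), 10.seconds).sum
  }
}
```

Both return the right total, but localSum touches shared state only when the partial results are combined, which is essentially the strategy reduce applies for you.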

So, as Daniel suggested, use reduce for this.


Source: https://habr.com/ru/post/912094/

