Memory barriers in Parallel.For

The Microsoft Parallel.For documentation contains the following method:

static void MultiplyMatricesParallel(double[,] matA, double[,] matB, double[,] result)
{
    int matACols = matA.GetLength(1);
    int matBCols = matB.GetLength(1);
    int matARows = matA.GetLength(0);

    // A basic matrix multiplication.
    // Parallelize the outer loop to partition the source array by rows.
    Parallel.For(0, matARows, i =>
    {
        for (int j = 0; j < matBCols; j++)
        {
            double temp = 0;
            for (int k = 0; k < matACols; k++)
            {
                temp += matA[i, k] * matB[k, j];
            }
            result[i, j] = temp;
        }
    }); // Parallel.For
}
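For readers who want to reproduce the scenario, here is a minimal self-contained harness (not part of the original documentation; the `MultiplySequential` reference implementation and the matrix sizes are my own) that initializes the arrays on the calling thread, runs the parallel multiply, and reads the result back, checking it against a sequential multiply:

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    // Sequential reference implementation used to check the parallel result.
    static double[,] MultiplySequential(double[,] a, double[,] b)
    {
        int rows = a.GetLength(0), inner = a.GetLength(1), cols = b.GetLength(1);
        var r = new double[rows, cols];
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                for (int k = 0; k < inner; k++)
                    r[i, j] += a[i, k] * b[k, j];
        return r;
    }

    // The method from the Microsoft documentation, unchanged in behavior.
    static void MultiplyMatricesParallel(double[,] matA, double[,] matB, double[,] result)
    {
        int matACols = matA.GetLength(1);
        int matBCols = matB.GetLength(1);
        int matARows = matA.GetLength(0);
        Parallel.For(0, matARows, i =>
        {
            for (int j = 0; j < matBCols; j++)
            {
                double temp = 0;
                for (int k = 0; k < matACols; k++)
                    temp += matA[i, k] * matB[k, j];
                result[i, j] = temp;
            }
        });
    }

    static void Main()
    {
        // Created and initialized in the calling thread.
        var rand = new Random(42);
        var a = new double[30, 40];
        var b = new double[40, 20];
        for (int i = 0; i < 30; i++) for (int k = 0; k < 40; k++) a[i, k] = rand.NextDouble();
        for (int k = 0; k < 40; k++) for (int j = 0; j < 20; j++) b[k, j] = rand.NextDouble();

        var result = new double[30, 20];
        MultiplyMatricesParallel(a, b, result);

        // Read back in the calling thread and compare.
        var expected = MultiplySequential(a, b);
        for (int i = 0; i < 30; i++)
            for (int j = 0; j < 20; j++)
                if (result[i, j] != expected[i, j])
                    throw new Exception("mismatch");
        Console.WriteLine("parallel result matches sequential");
    }
}
```

Both versions accumulate over k in the same order, so the floating-point results are bit-identical here.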

In this method, potentially multiple threads read values from matA and matB that were created and initialized in the calling thread, and potentially multiple threads write values to result that are later read by the calling thread. Within the lambda passed to Parallel.For, the arrays are read and written without any explicit lock. Since this example comes from Microsoft, I assume it is thread safe, but I am trying to figure out what goes on behind the scenes to make it thread safe.

As far as I understand from my reading and from other questions I have asked on SO (for example this one), several memory barriers are required for this to work, namely:

  • a memory barrier in the calling thread after creating and initializing matA and matB,
  • a memory barrier in each non-calling thread before reading values from matA and matB,
  • a memory barrier in each non-calling thread after writing values to result, and
  • a memory barrier in the calling thread before reading values from result.

Did I understand this correctly?

If so, does Parallel.For do all of this somehow? I tried to delve into the reference source, but got lost in the code. I have not seen any lock blocks or MemoryBarrier calls.

4 answers

Inside the worker threads (actually: Tasks), access to matA and matB is read-only, and access to result is write-only.

Parallel reading is inherently thread safe, and the writing is thread safe because the value of i is unique to each task, so no two tasks ever write to the same row of result.

There is no need for memory barriers inside this code fragment (as opposed to before/after the whole Parallel.For, but those can be assumed).

Using your item numbering:
1) and 4) are implied by Parallel.For(),
2) and 3) are simply not needed.
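The claim that i is unique per iteration, so the writes never overlap, can be sketched with a small demo (my own illustration, not from this answer; the names `RowPartitionDemo` and `writeCounts` are hypothetical):

```csharp
using System;
using System.Threading.Tasks;

class RowPartitionDemo
{
    static void Main()
    {
        const int rows = 1000;

        // Each parallel iteration receives a distinct value of i,
        // so each one writes to a different slot of writeCounts.
        var writeCounts = new int[rows];
        Parallel.For(0, rows, i =>
        {
            writeCounts[i]++; // safe: no other iteration touches index i
        });

        // Every slot was written exactly once: the writes never overlapped,
        // just like the rows of result in the matrix example.
        foreach (int c in writeCounts)
            if (c != 1) throw new Exception("overlapping writes detected");
        Console.WriteLine("each row written exactly once");
    }
}
```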


Since the arrays have already been created, writing to them or reading from them will not cause a resize. In addition, the code is structured so that no array position is ever written by one task and accessed by another.

The bottom line is that the code can always compute, from the loop indices alone, which positions it reads and which it writes, and those accesses never overlap. Therefore it is thread safe.


The memory fences you are looking for are inside the task scheduler.

Parallel.For breaks the work into tasks, and a work-stealing scheduler runs those tasks. The minimum memory barriers required by the work-stealing scheduler are:

  • A "release" fence after a task is created.
  • An "acquire" fence when a task is stolen.
  • A "release" fence when a stolen task completes.
  • An "acquire" fence in the thread that waits for the task's completion.

Look here, where 1 is implied by the atomic (interlocked) operations used to enqueue a task. Look here, where 2 is implied by the atomic operations, volatile reads, and/or locks used when a task is stolen.

I could not track down where 3 and 4 come from; they are most likely implied by some kind of atomic completion counter.
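The release/acquire hand-off pattern this answer describes can be sketched in miniature (my own illustration, not the actual TPL scheduler code; `FenceHandoffDemo`, `_payload`, and `_ready` are hypothetical names):

```csharp
using System;
using System.Threading;

class FenceHandoffDemo
{
    static int[] _payload;  // stands in for a queued task's data
    static bool _ready;     // flag published with release semantics

    static void Main()
    {
        var consumer = new Thread(() =>
        {
            // Acquire side: once Volatile.Read observes _ready == true,
            // the writes made before the corresponding Volatile.Write are
            // also guaranteed visible (the "acquire on steal" fence).
            while (!Volatile.Read(ref _ready)) Thread.SpinWait(1);
            Console.WriteLine(_payload[0] + _payload[1]);
        });
        consumer.Start();

        // Producer side ("enqueueing a task"): plain writes, then a
        // release-style publish of the flag (the "release on enqueue" fence).
        _payload = new[] { 40, 2 };
        Volatile.Write(ref _ready, true);

        consumer.Join();
    }
}
```

The real scheduler gets the same ordering guarantees from its interlocked queue operations rather than from an explicit Volatile flag, which is why no bare MemoryBarrier call appears in the source.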


I think you have become overly focused on the idea of memory barriers, and I cannot quite see what the actual problem is. Let me walk through the code you are investigating:

  • The three arrays are created and filled with values in the main thread. When you assign values and then call a method, the CLR ensures that the method sees the current values of its arguments. A problem could arise here only if the initialization were performed in the background, in parallel with some other thread. In that case you would be right, and you would need a synchronization construct: a memory barrier, a lock statement, or another technique.

  • The parallel code takes all values from 0 to matARows and creates tasks for them. There are two different ways to parallelize code: by operations and by data. Here we have data parallelism: the same lambda operation applied to different rows. The temp variable is not shared between tasks, so it is thread safe, and no memory barrier is required since there are no stale values to observe. Again, as in the first point, if some other thread updated the original matrices concurrently, synchronization constructs would be needed.

  • Parallel.For guarantees that all tasks are finished (completed, canceled, or faulted) before execution proceeds to the following statements, so the code after the loop runs as in a normal method. Why is no barrier needed inside the loop? Because all write operations target different rows, with no intersection between them; this is data parallelism. However, as in the other cases, if other threads needed to observe a new value partway through the loop's iterations, you would still need synchronization. This code is thread safe because it is data-parallel by construction and creates no race conditions.

This example is very simple; a real algorithm generally needs more complex logic. For such code you can explore lock-free techniques that make it thread safe without explicit synchronization constructs.
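To make the contrast concrete, here is a hypothetical variation (my own, not from this answer) where iterations do share state, so the race is real and synchronization is required. The class name `SharedStateDemo` is illustrative:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

class SharedStateDemo
{
    static void Main()
    {
        const int n = 100_000;

        // Unsafe: every iteration performs a read-modify-write on the same
        // variable, so iterations race and the total is frequently wrong.
        int unsafeTotal = 0;
        Parallel.For(0, n, i => { unsafeTotal++; });

        // Safe: Interlocked makes the increment atomic (and also acts as
        // a full fence), so the count is always exact.
        int safeTotal = 0;
        Parallel.For(0, n, i => Interlocked.Increment(ref safeTotal));

        Console.WriteLine($"safe={safeTotal}");
        if (safeTotal != n) throw new Exception("Interlocked count must be exact");
    }
}
```

Note that unsafeTotal may occasionally come out correct by luck, which is exactly what makes such races hard to catch in testing; only safeTotal is guaranteed to equal n.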


Source: https://habr.com/ru/post/1262806/

