D: simultaneous writes to a buffer

Say you have a buffer of size N that needs to be set to certain values (say, zero or something else). The work of setting these values is divided among M threads, each of which processes N / M buffer elements.

The buffer cannot be immutable, since we are changing values. Message passing will not work either, as it is forbidden to pass ref or array (= pointer) types. So should this happen via shared? No, because in my case the buffer elements are of type creal, and therefore arithmetic on them is not atomic.

In the end, the main program should wait for all threads to complete. It is given that each thread writes only to its own subset of the array, and that no thread overlaps with another thread in the array or depends on another thread in any way.

How can I write (or modify) a buffer in parallel?

PS: sometimes I can simply split the array into M consecutive fragments, but sometimes I traverse the array (a 1D array that holds 2D data) column by column. That makes the individual arrays used by the threads actually interleaved in the parent array. Argh.
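The column-wise case can be sketched as follows, assuming row-major storage and using std.parallelism (which the question itself does not mention). The function name `scaleColumns` and the per-column scaling are hypothetical, chosen only to give each column some work; `double` stands in for `creal`. Each column index is handled by exactly one work unit, so no two threads ever write the same element even though the columns are interleaved in memory.

```d
import std.parallelism : parallel;
import std.range : iota;

/// Multiplies every element of column c by factors[c].
/// `data` is a flat, row-major rows x cols matrix (assumed layout).
void scaleColumns(double[] data, size_t rows, size_t cols,
                  const double[] factors)
{
    assert(data.length == rows * cols);
    assert(factors.length == cols);

    // One loop iteration per column; iterations are spread over the
    // default task pool, and the foreach returns only when all are done.
    foreach (c; parallel(iota(cols)))
    {
        // Column c lives at indices c, c + cols, c + 2 * cols, ...
        for (size_t r = 0; r < rows; ++r)
            data[r * cols + c] *= factors[c];
    }
}

void main()
{
    // 2 rows x 3 columns, stored row by row.
    double[] data = [1, 2, 3,
                     4, 5, 6];
    scaleColumns(data, 2, 3, [10.0, 100.0, 1000.0]);
    assert(data == [10.0, 200.0, 3000.0, 40.0, 500.0, 6000.0]);
}
```

Neighbouring columns share cache lines in this layout, so the loop is correct but may not scale as well as the contiguous-fragment case.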


EDIT: I realized that shared(creal)[] would work, since now the elements are shared, not the array itself. That way you can parallelize interleaved arrays. However, there are some disadvantages:

The shared storage class is so strict that the allocation itself must be made with the keyword. This makes it hard to encapsulate: since the caller supplies the array, it must pass a shared array and cannot just pass a regular array and let the processing function worry about parallelism. Instead, the calling function must worry about parallelism too, so that the processing function receives a shared array and does not need to reallocate the array into shared space.

There is also a very strange bug: when I dynamically allocate a shared(creal)[] in certain places, the program simply hangs during the allocation. It seems very random and I cannot find the culprit. It works in a test example, but not in my project. This turned out to be a bug in DMD / OptLink.


EDIT2: I never mentioned it, but this is for an FFT (Fast Fourier Transform) implementation. Therefore I have no control over choosing exact, cache-aligned slices. All I know is that the elements are creal and that the number of elements is a power of 2 (for each row / column).

1 answer

You can use the std.parallelism module:

 import std.parallelism;
 T[] buff; foreach (ref elem; parallel(buff)) elem = 0;
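A fuller version of the same idea might look like the sketch below. The helper name `fillParallel` and the work unit size of 1024 are assumptions made for illustration, and `Complex!double` from std.complex is used in place of `creal`, which is deprecated in current compilers. The parallel foreach returns only after every work unit has finished, so the caller does not need an explicit join, and the buffer is a plain (non-shared) array.

```d
import std.algorithm.searching : all;
import std.complex : Complex;
import std.parallelism : parallel;

/// Sets every element of `buf` to `value`, spreading the work over the
/// default task pool in units of `workUnitSize` elements (assumed value).
void fillParallel(T)(T[] buf, T value, size_t workUnitSize = 1024)
{
    // Each element is visited by exactly one thread, so there are no
    // overlapping writes and no need for `shared` or atomics.
    foreach (ref elem; parallel(buf, workUnitSize))
        elem = value;
}

void main()
{
    // Elements start out as NaN + NaN*i (the default for floating point).
    auto buf = new Complex!double[](1 << 16);
    fillParallel(buf, Complex!double(0, 0));
    assert(buf.all!(c => c.re == 0 && c.im == 0));
}
```

Because the function accepts an ordinary slice, the caller stays unaware of the parallelism, which addresses the encapsulation concern raised in the question.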

But if you want to do it by hand, you can simply use shared: this is thread-safe as long as only one thread accesses a given element at a time, and better still if you enforce that with the appropriate join() or Task.*Force() call (yieldForce, spinForce, or workForce).
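The manual approach can be sketched with explicit tasks instead of a parallel foreach. This is only an illustration under assumptions: the names `fillSlice` and `fillWithTasks` are hypothetical, `double` stands in for `creal`, and the buffer is a plain array rather than a shared one, relying on the guarantee that the M slices are disjoint. Calling `yieldForce` on every task makes the caller wait until all of them are done.

```d
import std.algorithm.comparison : min;
import std.parallelism : task, taskPool;

/// Work done by one task: fill its own slice only.
void fillSlice(double[] slice, double value)
{
    slice[] = value;
}

/// Splits `buf` into at most `m` disjoint, contiguous slices, fills
/// each in its own task, and returns once all tasks have completed.
void fillWithTasks(double[] buf, double value, size_t m)
{
    assert(m > 0);
    immutable chunk = (buf.length + m - 1) / m; // ceiling division

    typeof(task!fillSlice(buf, value))[] tasks;
    foreach (i; 0 .. m)
    {
        immutable lo = min(i * chunk, buf.length);
        immutable hi = min(lo + chunk, buf.length);
        auto t = task!fillSlice(buf[lo .. hi], value);
        taskPool.put(t); // queue the task on the default pool
        tasks ~= t;
    }

    // Wait for every task; this plays the role of join().
    foreach (t; tasks)
        t.yieldForce;
}

void main()
{
    auto buf = new double[](10);
    fillWithTasks(buf, 1.5, 3);
    foreach (x; buf)
        assert(x == 1.5);
}
```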


Source: https://habr.com/ru/post/1384118/

