C++17 parallelism hardware implementation

As I understand it, C++17 will come with Parallelism. What I don't understand is whether this parallelism is tied to particular hardware (the default processor), or whether it can be extended to any hardware with multiple compute units.

In other words, could we see something like a “standard nVidia C++ compiler” that compiles the parallel parts so that they run on GPUs?

Would this be a more standardized alternative to OpenCL, for example?

Note: I am definitely not asking "Will nVidia do this?". I am asking whether the C++17 standard allows it, and whether it is theoretically possible.

1 answer

The question links to the paper that proposed this change; as far as the parallelism aspects are concerned, there were no substantive changes to that proposal. Yes, the implementation (compiler plus standard library) can do whatever makes sense for the target hardware to parallelize the execution of the various algorithms, as long as it produces the correct answer (with some caveats) and does not impose unnecessary overhead (again, with some caveats).
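To make the "correct answer" requirement concrete: in C++17 you request parallelism by passing an execution policy (std::execution::seq, std::execution::par or std::execution::par_unseq from <execution>) as the first argument of an algorithm. The policy grants permission to parallelize; it does not prescribe threads, SIMD or any particular hardware, so an implementation may even run everything sequentially. The sketch below assumes a standard library that ships <execution>; with GCC's libstdc++ the parallel policies are typically backed by Intel TBB when its headers are found, in which case you usually need to link with -ltbb. The helper name sum_of_squares is hypothetical.

```cpp
#include <cassert>
#include <execution>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical helper: the same algorithm call under any execution policy.
// Integer addition is associative and commutative, so every policy must
// produce the same result; only how the work is scheduled may differ.
template <class Policy>
long long sum_of_squares(const Policy& policy, const std::vector<int>& v) {
    return std::transform_reduce(policy, v.begin(), v.end(),
                                 0LL,             // initial value
                                 std::plus<>{},   // reduction step
                                 [](int x) {      // transform step
                                     return static_cast<long long>(x) * x;
                                 });
}
```

For the values 1 through 1000, all three policies return 333833500 (that is, 1000 · 1001 · 2001 / 6); whether par actually used several cores is up to the implementation.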

There are several important points to understand.

Firstly, C++17 parallelism is not a general-purpose parallel programming mechanism. It provides parallel versions of many STL algorithms, and nothing more. So it is not a replacement for more powerful mechanisms such as OpenCL, TBB, etc.
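As a rough illustration of how narrow this interface is, here is a sketch using two of the parallel overloads. The only parallel-specific element is the policy argument: there is no way to choose a device, size a thread pool or write a kernel, which is the kind of control that OpenCL or TBB provide. The function name sort_and_double is hypothetical, and the same <execution>/TBB caveat as above applies.

```cpp
#include <algorithm>
#include <cassert>
#include <execution>
#include <vector>

// Hypothetical helper: sort, then scale every element, using the C++17
// parallel overloads of existing STL algorithms.
void sort_and_double(std::vector<double>& data) {
    // Same std::sort as always, plus a policy that permits parallel execution.
    std::sort(std::execution::par, data.begin(), data.end());

    // par_unseq additionally permits vectorization (interleaving within a
    // thread), so the element function must not take locks or otherwise
    // synchronize with other invocations.
    std::for_each(std::execution::par_unseq, data.begin(), data.end(),
                  [](double& x) { x *= 2.0; });
}
```

Calling it on {3.0, 1.0, 2.0} leaves {2.0, 4.0, 6.0}. Everything beyond "this call may run in parallel" is left to the implementation.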

Secondly, there are inherent limitations when parallelizing algorithms, which is why I added those two parenthetical caveats. For example, the parallel counterpart of std::accumulate (std::reduce) is only guaranteed to give the same result as the sequential version if the binary operation applied to the input range is associative and commutative. The most obvious problem here is floating-point values: floating-point addition is not associative, so the result can vary.

Similarly, some algorithms actually incur more overhead when parallelized: you get a net speedup, but more total work is done, so the speedup for these algorithms will not be linear in the number of processors. std::partial_sum is an example: each output value depends on the previous one, so the algorithm is not easy to parallelize. There are ways to do it (the parallel form is std::inclusive_scan), but you end up applying the combiner function more times than the sequential algorithm does. In general, the complexity requirements for these algorithms have been relaxed to reflect this reality.
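Both caveats can be sketched without any threads. The first part shows that regrouping floating-point additions, which std::reduce is allowed to do, changes the result. The second part is a hypothetical blocked prefix sum in the usual shape of a parallel scan: blocks are scanned independently, block totals are combined, then offsets are added back. It runs sequentially here (in a real parallel implementation each block loop would go to a different core) and counts how often + is applied. All names (left_fold, right_fold, ScanResult, blocked_inclusive_scan) are illustrative assumptions, not library API, and this is not necessarily how any particular standard library implements std::inclusive_scan.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// (1) Floating-point addition is not associative: grouping matters.
double left_fold(double a, double b, double c) { return (a + b) + c; }
double right_fold(double a, double b, double c) { return a + (b + c); }

// (2) Hypothetical blocked inclusive scan that counts combiner applications.
struct ScanResult {
    std::vector<long long> values;
    std::size_t combines = 0;  // number of times '+' was applied
};

ScanResult blocked_inclusive_scan(const std::vector<long long>& in, std::size_t block) {
    assert(block > 0 && "block size must be positive");
    const std::size_t n = in.size();
    const std::size_t nblocks = (n + block - 1) / block;
    ScanResult r;
    r.values.resize(n);
    std::vector<long long> totals(nblocks);

    // Pass 1: local scan inside each block (blocks are independent).
    for (std::size_t b = 0; b < nblocks; ++b) {
        const std::size_t lo = b * block, hi = std::min(lo + block, n);
        r.values[lo] = in[lo];
        for (std::size_t i = lo + 1; i < hi; ++i) {
            r.values[i] = r.values[i - 1] + in[i];
            ++r.combines;
        }
        totals[b] = r.values[hi - 1];
    }
    // Pass 2: running total of block sums gives each block its offset.
    std::vector<long long> offsets(nblocks, 0);
    for (std::size_t b = 1; b < nblocks; ++b) {
        offsets[b] = offsets[b - 1] + totals[b - 1];
        ++r.combines;
    }
    // Pass 3: add the offset to every element of blocks 1..nblocks-1
    // (again independent per block).
    for (std::size_t b = 1; b < nblocks; ++b) {
        const std::size_t lo = b * block, hi = std::min(lo + block, n);
        for (std::size_t i = lo; i < hi; ++i) {
            r.values[i] += offsets[b];
            ++r.combines;
        }
    }
    return r;
}
```

With a = 1e20, b = -1e20, c = 1.0, left_fold gives 1.0 while right_fold gives 0.0, because 1.0 is lost when added to -1e20 first. For the eight values 1..8 in blocks of 4, the blocked scan matches std::partial_sum but applies + 11 times instead of 7: extra total work traded for independent blocks.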


Source: https://habr.com/ru/post/1259603/

