Threading Building Blocks (TBB) is a C++ template library for task parallelism. The library contains various algorithms and data structures specialized for task parallelism. I have had success using parallel_for as well as parallel_pipeline to speed up computations significantly. With a little extra coding, TBB's parallel_for can take a serial for loop that is suitable for parallel execution and make it run in parallel (see the example here). TBB's parallel_pipeline can execute a chain of dependent tasks, with the option of running each stage either in parallel or serially (see the example here). There are many more examples on the web, especially at software.intel.com, and also here on Stack Overflow (see here).
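To make the parallel_for idea concrete, here is a minimal sketch (not one of the linked examples, which are not reproduced here). The function names axpy_serial and axpy_parallel and the USE_TBB macro are hypothetical, chosen only for illustration. The TBB path is compiled only when USE_TBB is defined and the program is linked against TBB (typically -ltbb); otherwise the sketch falls back to the serial loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef USE_TBB  // hypothetical opt-in macro; build with -DUSE_TBB and link -ltbb
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#endif

// Serial baseline: y[i] += alpha * x[i]. Iterations are independent,
// which is what makes this loop a candidate for parallel_for.
void axpy_serial(double alpha, const std::vector<double>& x,
                 std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] += alpha * x[i];
    }
}

// Same computation; with USE_TBB the index range is split into chunks
// and the chunks are processed by TBB worker threads.
void axpy_parallel(double alpha, const std::vector<double>& x,
                   std::vector<double>& y) {
    assert(x.size() == y.size());
#ifdef USE_TBB
    tbb::parallel_for(
        tbb::blocked_range<std::size_t>(0, x.size()),
        [&](const tbb::blocked_range<std::size_t>& r) {
            // Each invocation handles one sub-range [r.begin(), r.end()).
            for (std::size_t i = r.begin(); i != r.end(); ++i) {
                y[i] += alpha * x[i];
            }
        });
#else
    axpy_serial(alpha, x, y);  // fallback when TBB is not enabled
#endif
}
```

The "little extra coding" is essentially the blocked_range plus the lambda: the loop body stays the same, but it now runs over a sub-range handed out by the scheduler instead of over the whole index space.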
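OpenMP is an API for multi-threaded parallelism that is accessed primarily through compiler directives. Although I prefer the richer feature set that TBB provides, OpenMP can be a quick way to try out parallel algorithms and code (just add a pragma and set a few build options). Once things are tested, I have found that converting certain uses of OpenMP to TBB can be done fairly easily. This is not to say that OpenMP is unsuited to serious programming. In fact, there are cases where one would prefer OpenMP over TBB (for one, because it relies mainly on pragmas, switching back to serial execution is easier than with TBB). There are a number of open source projects that use OpenMP, which you can learn from. There are many examples (e.g., on Wikipedia) and tutorials for OpenMP on the web, including plenty of questions here on Stack Overflow.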
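As a small illustration of the directive-based style, the sketch below parallelizes a dot product with a single pragma. The function name dot is hypothetical. When compiled with OpenMP enabled (for example -fopenmp on GCC or Clang) the loop is shared among threads; without that flag the pragma is ignored and the same code runs serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Dot product of two equally sized vectors.
// reduction(+ : sum) gives every thread a private partial sum and
// combines the partial sums when the loop finishes.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    // A signed loop index keeps the loop valid for older OpenMP versions.
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(a.size());
    #pragma omp parallel for reduction(+ : sum)
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        sum += a[static_cast<std::size_t>(i)] * b[static_cast<std::size_t>(i)];
    }
    return sum;
}
```

Because the only OpenMP-specific line is the pragma, dropping the compiler flag is enough to get the serial version back, which is the convenience mentioned above.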
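Also worth mentioning is SIMD (single instruction, multiple data), which provides data parallelism. OpenMP is one option for exploiting SIMD. Instruction set extensions such as SSE and AVX (both extensions to x86), and NEON (for ARM), are also worth exploring. I have had both good and bad experiences with SSE and AVX. The good: they can give a nice speedup for certain algorithms (in particular, I have used Intel intrinsics). The bad: the ability to use these instructions depends on specific CPU support, which can lead to unexpected failures at runtime.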
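For a feel of what intrinsics look like, here is a minimal sketch that adds two float arrays four elements at a time with SSE. The function name add_arrays and the HAVE_SSE_SKETCH macro are hypothetical. The compile-time check only reflects the build target, not the CPU the program eventually runs on, so it does not remove the runtime-support concern for newer extensions such as AVX.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>      // SSE intrinsics (__m128, _mm_add_ps, ...)
#define HAVE_SSE_SKETCH 1   // hypothetical helper macro for this sketch
#endif

// Element-wise out[i] = a[i] + b[i].
void add_arrays(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& out) {
    assert(a.size() == b.size());
    out.resize(a.size());
    const std::size_t n = a.size();
    std::size_t i = 0;
#ifdef HAVE_SSE_SKETCH
    // Process four floats per iteration; unaligned loads and stores are
    // used because std::vector does not guarantee 16-byte alignment.
    for (; i + 4 <= n; i += 4) {
        const __m128 va = _mm_loadu_ps(a.data() + i);
        const __m128 vb = _mm_loadu_ps(b.data() + i);
        _mm_storeu_ps(out.data() + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar loop for the remaining elements (or for everything when
    // SSE is not available on the build target).
    for (; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

The scalar tail loop handles lengths that are not a multiple of four and doubles as the portable fallback on non-x86 targets.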
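Regarding parallelism and math specifically, I have had good experiences with Intel MKL, as well as OpenBLAS. These libraries provide optimized, parallel, and/or vectorized implementations of common mathematical functions and routines (e.g., BLAS and LAPACK). There are many more math libraries out there that use parallelism to some degree in their optimizations. Even if you are not implementing parallelism directly yourself, it is very beneficial (and often much easier) to take advantage of these specialized libraries.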