There are two ways to handle nested loops in TBB.
Since TBB is designed to support nested parallelism, the first option is simply to nest parallel_for calls:
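    // Requires <cstdio> and <tbb/parallel_for.h>.
    // The inner lambdas capture the enclosing loop indices by value ([=]);
    // without a capture, i and j would not be visible inside them.
    tbb::parallel_for(0, 100, [](int i) {
        tbb::parallel_for(0, 100, [=](int j) {
            tbb::parallel_for(0, 100, [=](int k) {
                printf("Hello World %d/%d/%d\n", i, j, k);
            });
        });
    });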
This option works well when the loops belong to different modules or libraries.
Otherwise, collapse two or three nested loops into a single parallel_for over a blocked_range2d or blocked_range3d. Because these ranges are split into compact blocks rather than whole rows, this can also improve cache locality when accessing arrays, which may help performance even on a single thread:
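    // Requires <cstdio>, <tbb/parallel_for.h> and <tbb/blocked_range3d.h>.
    // matrix3d is assumed to be a 100x100x100 array of int defined elsewhere;
    // [&] lets the lambda see it whether it is a global or a local.
    tbb::parallel_for(
        tbb::blocked_range3d<int>(0, 100, 0, 100, 0, 100),
        [&](const tbb::blocked_range3d<int>& r) {
            // Each call receives one 3D block: pages x rows x cols.
            for (int i = r.pages().begin(), i_end = r.pages().end(); i < i_end; i++) {
                for (int j = r.rows().begin(), j_end = r.rows().end(); j < j_end; j++) {
                    for (int k = r.cols().begin(), k_end = r.cols().end(); k < k_end; k++) {
                        printf("Hello World %d\n", matrix3d[i][j][k]);
                    }
                }
            }
        });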