I am trying to use tbb for multithreading an existing recursive algorithm. The single-threaded version uses tail call recursion, structurally it looks something like this:
void my_func() { my_recusive_func (0); } bool doSomeWork (int i, int& a, int& b, int& c) {
I am new to tbb, so my first attempt used the parallel_invoke function:
void my_recusive_func (int i) { int a, b, c; bool notDone = doSomeWork (i, a, b, c); if (notDone) { tbb::parallel_invoke ( [a]{my_recusive_func (a);}, [b]{my_recusive_func (b);}, [c]{my_recusive_func (c);}); } }
This works, and it works faster than the single-threaded version, but it does not seem to scale with the number of cores. I am targeting 16 cores (32 hyperthreads), so scaling is very important for this project, but this version only speeds up work on this computer 8 times faster, and many cores seem to be inactive while the algorithm is running.
My theory is that tbb is waiting for child tasks to complete after parallel_invoke, so there can be many tasks that sit around waiting for a wait without extra wait? Will this explain idle cores? Is there a way to return the parental task without waiting for the children to appear? I thought maybe something like this, but I still don't know enough about the scheduler, but I know if everything is okay or not:
void my_func() { tbb::task_group g; my_recusive_func (0, g); g.wait(); } void my_recusive_func (int i, tbb::task_group& g) { int a, b, c; bool notDone = doSomeWork (i, a, b, c); if (notDone) { g.run([a,&g]{my_recusive_func(a, g);}); g.run([b,&g]{my_recusive_func(b, g);}); my_recusive_func (c, g); } }
My first question is: tbb::task_group::run() thread safe? I could not figure it out from the documentation. Also, is there a better way to do this? Perhaps I should use low-level scheduler calls?
(I typed this code without compilation, so please forgive the typos.)