I am asking this to understand a problem that I have spent 24 hours trying to fix.
My system: Ubuntu 12.04.2 and Matlab R2011a, both 64-bit, on an Intel Xeon processor based on Nehalem.
The problem is that Matlab lets OpenMP-based programs use all logical CPU cores when hyper-threading is enabled, but does not let TBB-based programs do the same.
With TBB, I can only start 4 threads, even when I set maxNumCompThreads to 8. With OpenMP, I can use as many threads as I want. Without Hyper-threading, both TBB and OpenMP use all 4 cores, of course.
I understand Hyper-threading and that the extra cores are virtual, but Matlab's restriction does in fact cause a performance penalty (additional link).
I tested this using two programs: a simple loop with
#pragma omp parallel for
and another very simple loop based on the TBB example code:
tbb::task_scheduler_init init(tbb::task_scheduler_init::deferred);
tbb::parallel_for_each(tasks.begin(),tasks.end(),invoker<mytask>());
and wrapped both of them in the Matlab mexFunction entry point.
Does anyone have an explanation? Is there an inherent difference in how the two libraries create or manage threads that lets Matlab throttle TBB but not OpenMP?
Code for reference:
OpenMP:
#include "mex.h"

void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[]) {
    int threadCount = 100000;  // number of outer iterations split across threads
    #pragma omp parallel for
    for (int globalId = 0; globalId < threadCount; globalId++)
    {
        // Busy loop to keep each thread occupied.
        for (long i = 0; i < 1000000000L; ++i) {}
    }
}
TBB:
#include "tbb/parallel_for_each.h"
#include "tbb/task_scheduler_init.h"
#include <iostream>
#include <vector>
#include "mex.h"

// One unit of work: spin in a busy loop, then print the task index.
struct mytask {
    mytask(size_t n)
        : _n(n)
    {}
    void operator()() {
        for (long i = 0; i < 1000000000L; ++i) {}
        std::cerr << "[" << _n << "]";
    }
    size_t _n;
};

// Adapter so parallel_for_each can call each task.
template <typename T> struct invoker {
    void operator()(T& it) const { it(); }
};

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]) {
    // Deferred: constructing init does not initialize the scheduler
    // or request a specific thread count.
    tbb::task_scheduler_init init(tbb::task_scheduler_init::deferred);
    std::vector<mytask> tasks;
    for (int i = 0; i < 10000; ++i)
        tasks.push_back(mytask(i));
    tbb::parallel_for_each(tasks.begin(), tasks.end(), invoker<mytask>());
}
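
A minimal diagnostic sketch that might help narrow this down, assuming Linux with glibc. It compares the number of online logical CPUs with the number of CPUs in the calling process's affinity mask. The helper names online_cpus, affinity_cpus and report_cpus are hypothetical and not part of Matlab, TBB or OpenMP, and whether either runtime actually derives its thread count from these values is only an assumption to be tested, not something established above.

```cpp
// Diagnostic sketch (Linux/glibc only); helper names are hypothetical.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for cpu_set_t and CPU_COUNT in C; g++ defines it already
#endif
#include <sched.h>
#include <unistd.h>
#include <cstdio>

// Logical CPUs currently online, hyper-threads included.
long online_cpus() {
    return sysconf(_SC_NPROCESSORS_ONLN);
}

// Logical CPUs the caller is allowed to run on according to its
// affinity mask. Returns -1 if the mask cannot be read.
int affinity_cpus() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0)
        return -1;
    return CPU_COUNT(&mask);
}

// Print both counts to stderr so they can be compared side by side.
void report_cpus() {
    std::fprintf(stderr, "online: %ld, affinity mask: %d\n",
                 online_cpus(), affinity_cpus());
}
```

Calling report_cpus() at the top of each mexFunction, and again from a standalone executable started outside Matlab, would show whether the two counts differ inside the Matlab process. If they do, that would at least be consistent with the 4-thread limit seen with TBB.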