Matlab restricts TBB but not OpenMP

I'm asking this to understand a problem I have spent 24 hours trying to fix.

My system: Ubuntu 12.04.2 and Matlab R2011a, both 64-bit, on an Intel Xeon processor based on Nehalem.

The problem is that Matlab lets OpenMP-based programs use all CPU cores, including the hyper-threaded ones, but does not allow TBB to do the same.

With TBB I can only start 4 threads, even when I change maxNumCompThreads to 8. With OpenMP I can use as many threads as I want. Without hyper-threading, both TBB and OpenMP use all 4 cores, of course.

I understand hyper-threading and that the extra cores are virtual, but Matlab's restriction does in fact cost performance (additional link).

I tested this problem using 2 programs, a simple loop with

#pragma omp parallel for

and another very simple loop based on the tbb example code.

tbb::task_scheduler_init init(tbb::task_scheduler_init::deferred);
tbb::parallel_for_each(tasks.begin(),tasks.end(),invoker<mytask>());

and wrapped both of them in Matlab's mexFunction.

Does anyone have an explanation? Is there an inherent difference in how the threads are created or structured that lets Matlab throttle TBB but not OpenMP?

Code for reference:

OpenMP:

#include "mex.h"

void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[] ){
    int threadCount = 100000;  // declared here; the original snippet omitted the type
#pragma omp parallel for
    for(int globalId = 0; globalId < threadCount ; globalId++)
    {
        for(long i=0;i<1000000000L;++i) {} // Deliberately run slow
    }
}

TBB:

#include "tbb/parallel_for_each.h"
#include "tbb/task_scheduler_init.h"
#include <iostream>
#include <vector>
#include "mex.h"

struct mytask {
  mytask(size_t n)
    :_n(n)
  {}
  void operator()() {
    for (long i=0;i<1000000000L;++i) {}  // Deliberately run slow
    std::cerr << "[" << _n << "]";
  }
  size_t _n;
};

template <typename T> struct invoker {
  void operator()(T& it) const {it();}
};

void mexFunction(int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]) {

  tbb::task_scheduler_init init(tbb::task_scheduler_init::deferred);  // Automatic number of threads

  std::vector<mytask> tasks;
  for (int i=0;i<10000;++i)
    tasks.push_back(mytask(i));

  tbb::parallel_for_each(tasks.begin(),tasks.end(),invoker<mytask>());

}
1 answer

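Specifying deferred only postpones creation of the scheduler until initialize() is called; it does not choose a thread count. If you do not give a number of threads, the scheduler uses automatic, and that value can be smaller than the number of logical cores, because TBB derives it from the CPUs available to the process (in src/tbb/tbb_misc_ex.cpp, see initialize_hardware_concurrency_info()).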
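One way to check whether the process itself is being restricted is to compare the CPUs in the process affinity mask with the CPUs online in the machine. The following is a minimal Linux-only sketch, not TBB's actual code: it assumes that initialize_hardware_concurrency_info() relies on the affinity mask on Linux, and the helper names cpus_in_affinity_mask and cpus_online are hypothetical.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for CPU_COUNT and sched_getaffinity on glibc
#endif
#include <sched.h>
#include <unistd.h>

// Hypothetical helper: number of logical CPUs the calling process may run on,
// or -1 if the affinity mask cannot be read.
inline int cpus_in_affinity_mask() {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        return -1;
    }
    return CPU_COUNT(&mask);
}

// Hypothetical helper: number of logical CPUs currently online in the machine.
inline long cpus_online() {
    return sysconf(_SC_NPROCESSORS_ONLN);
}
```

Printing both values from inside a mexFunction (for example with mexPrintf) and again from a standalone executable would show whether Matlab narrows the mask. If the first value were 4 and the second 8, that would be consistent with TBB starting only 4 threads by default.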

I modified your program as follows:

#include "tbb/parallel_for_each.h"
#include "tbb/task_scheduler_init.h"
#include "tbb/atomic.h"
#include "tbb/spin_mutex.h"
#include <iostream>
#include <vector>

// If LOW_THREAD == 0, run with task_scheduler_init(automatic), which is the number
// of cores available.  If 1, start with 1 thread.

#ifndef NTASKS
#define NTASKS 50
#endif
#ifndef MAXWORK
#define MAXWORK 400000000L
#endif
#ifndef LOW_THREAD
#define LOW_THREAD 0  // 0 == automatic
#endif

tbb::atomic<size_t> cur_par;
tbb::atomic<size_t> max_par;

#if PRINT_OUTPUT
tbb::spin_mutex print_mutex;
#endif

struct mytask {
  mytask(size_t n) :_n(n) {}
  void operator()() {
      size_t my_par = ++cur_par;
      size_t my_old = max_par;
      // Raise max_par to my_par; compare against my_par so a stale cur_par cannot lower the maximum.
      while( my_old < my_par) { my_old = max_par.compare_and_swap(my_par, my_old); }

      for (long i=0;i<MAXWORK;++i) {}  // Deliberately run slow
#if PRINT_OUTPUT
      {
          tbb::spin_mutex::scoped_lock s(print_mutex);
          std::cerr << "[" << _n << "]";
      }
#endif
      --cur_par;
  }
  size_t _n;
};

template <typename T> struct invoker {
  void operator()(T& it) const {it();}
};

void mexFunction(/*int nlhs, mxArray* plhs[], int nrhs, const mxArray* prhs[]*/) {

    for( size_t thr = LOW_THREAD; thr <= 128; thr = thr ? thr * 2: 1) {
        cur_par = max_par = 0;
        tbb::task_scheduler_init init(thr == 0 ? (unsigned int)tbb::task_scheduler_init::automatic : thr);

        std::vector<mytask> tasks;
        for (int i=0;i<NTASKS;++i) tasks.push_back(mytask(i));

        tbb::parallel_for_each(tasks.begin(),tasks.end(),invoker<mytask>());
        std::cout << " for thr == ";
        if(thr) std::cout << thr; else std::cout << "automatic";
        std::cout << ", maximum parallelism == " << (size_t)max_par << std::endl;
    }
}

int main() {
    mexFunction();
}

On a 16-core system the output is:

for thr == automatic, maximum parallelism == 16
for thr == 1, maximum parallelism == 1
for thr == 2, maximum parallelism == 2
for thr == 4, maximum parallelism == 4
for thr == 8, maximum parallelism == 8
for thr == 16, maximum parallelism == 16
for thr == 32, maximum parallelism == 32
for thr == 64, maximum parallelism == 50
for thr == 128, maximum parallelism == 50

50 is the number of tasks (NTASKS) available to the program, so the parallelism cannot go any higher.
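The cur_par / max_par bookkeeping in the program above can also be written with std::atomic from the standard library instead of tbb::atomic. This is only a sketch of the same "peak concurrency" technique; update_peak and ParallelismProbe are hypothetical names, not part of TBB.

```cpp
#include <atomic>
#include <cstddef>

// Raise `peak` to at least `candidate`; it is never lowered.
inline void update_peak(std::atomic<std::size_t>& peak, std::size_t candidate) {
    std::size_t seen = peak.load();
    // On failure compare_exchange_weak reloads `seen`, so the loop stops as soon
    // as our store succeeds or another thread has stored a value >= candidate.
    while (seen < candidate && !peak.compare_exchange_weak(seen, candidate)) {
    }
}

// Hypothetical RAII helper: counts the enclosing task as "running" for as long
// as the probe object is alive and records the highest count observed.
class ParallelismProbe {
public:
    ParallelismProbe(std::atomic<std::size_t>& current,
                     std::atomic<std::size_t>& peak)
        : current_(current) {
        update_peak(peak, ++current_);
    }
    ~ParallelismProbe() { --current_; }

    ParallelismProbe(const ParallelismProbe&) = delete;
    ParallelismProbe& operator=(const ParallelismProbe&) = delete;

private:
    std::atomic<std::size_t>& current_;
};
```

Creating a ParallelismProbe as the first statement of a task body (mytask::operator() in the program above) would give the same measurement, with the decrement handled automatically when the task returns.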

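As for the comparison itself: TBB is task-based, and for_each hands the items to the scheduler one at a time rather than splitting an index range; for_each is not the construct TBB would normally use for a counted loop. When comparing TBB with OpenMP, keep in mind that OpenMP's parallel for and TBB's parallel_for_each are not equivalent.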


Source: https://habr.com/ru/post/1544939/

