Matlab: -maxNumCompThreads, hyperthreading and parpool

I am running Matlab R2014a on a node in a Linux cluster with 20 cores and hyperthreading enabled. I know this has been discussed before, but I'm looking for some clarification. Here is my understanding of the threading issue in Matlab:

  • Matlab has built-in multithreading capabilities and will use additional cores on a multi-core machine.
  • Matlab schedules its threads in such a way that running multiple Matlab threads on the same core (i.e. hyperthreading) is not useful. Thus, by default, the maximum number of threads that Matlab will create is the number of cores on your system.
  • When using parpool(), regardless of the number of workers you create, each worker will use only one physical core, as indicated in this thread.
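
A minimal sketch for checking the default described above: it compares the current computational thread limit with the number of logical CPUs. It assumes MATLAB was started with the JVM (the default), because the logical CPU count is read through the standard Java Runtime API; the variable names are only illustrative. On a hyperthreaded machine one would expect the first number to be roughly half the second, but that is an expectation, not a measured result.

```matlab
% Compare MATLAB's computational thread limit with the logical CPU count.
% Assumption: the JVM is available (MATLAB not started with -nojvm).
nThreads = maxNumCompThreads;              % current computational thread limit
runtime  = java.lang.Runtime.getRuntime(); % standard Java Runtime object
nLogical = runtime.availableProcessors();  % logical CPUs visible to the JVM
fprintf('Computational threads: %d, logical CPUs: %d\n', nThreads, nLogical);
```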

However, I have also read that with the (deprecated) maxNumCompThreads() function you can either reduce or increase the number of threads that Matlab, or one of the workers, will create. This can be useful in several scenarios:

  • You want to use Matlab's implicit multithreading to run some code on a cluster node without occupying the entire node. It would be nice if there were some other way to do this in case maxNumCompThreads is ever removed.
  • You want to run a parameter sweep, but have fewer parameters than cores on your machine. In this case, you may want to increase the number of threads per worker so that all of your cores are used. This was suggested recently in this thread. However, in my experience, while the individual workers seem happy to accept maxNumCompThreads() to increase their thread count, checking actual CPU usage with the "top" command suggests that it has no effect, i.e. each worker still only gets to use a single core. It is possible that the individual Matlab processes spawned by parpool are started with the -singleCompThread argument. I confirmed that if the parent Matlab process is started with -singleCompThread, the command maxNumCompThreads(n) with n > 1 throws an error because Matlab is running in single-threaded mode. So the upshot seems to be that (at least in R2014a) you cannot increase the number of computational threads on parallel pool workers.

Relatedly, I cannot get the parent Matlab process to start more threads than there are cores, even though the machine itself has hyperthreading enabled. Again, it will happily run maxNumCompThreads(n) with n > number of physical cores, but the fact that top shows CPU utilization at 50% suggests otherwise. So what is happening, or what am I misunderstanding?

Edit: to state my questions more explicitly:

  • Inside a parfor loop, why doesn't setting maxNumCompThreads(n) with n > 1 seem to work? If this is because the worker processes are started with -singleCompThread, why doesn't maxNumCompThreads() return an error, as it does in a parent process started with -singleCompThread?
  • In the parent process, why doesn't maxNumCompThreads(n), with n > number of physical cores, do anything?

Note: I posted this earlier on MATLAB Answers, but didn't get any feedback.

Edit2: It looks like the problem in the first question was caused by the test code I used.

3 answers

I was mistaken in thinking that maxNumCompThreads does not work on parpool workers. It seems the problem was that the code I was using:

    parfor j = 1:2
        tic
        maxNumCompThreads(2);
        workersCompThreads(j) = maxNumCompThreads;
        i = 1;
        while toc < 200
            a = randn(10^i)*randn(10^i);
            i = i + 1;
        end
    end

used so much memory that, by the time I checked the CPU load, I/O was the bottleneck and the additional threads had already shut down. When I ran the following instead:

    parfor j = 1:2
        tic
        maxNumCompThreads(2);
        workersCompThreads(j) = maxNumCompThreads;
        i = 4;
        while toc < 200
            a = randn(10^i)*randn(10^i);
        end
    end

the additional threads started and stayed busy.

Regarding the second issue, I received confirmation from MathWorks that the parent Matlab process will not start more threads than the number of physical cores, even if you explicitly raise the limit above that. So in the documentation, the sentence:

"Currently, the maximum number of computational threads is equal to the number of computational cores on your computer."

should say:

"Currently, the maximum number of computational threads is equal to the number of physical cores on your computer."

+1
source

This is a rather long question, but I think the direct answer is that yes, as far as I understand, MATLAB workers are started with -singleCompThread.

+1
source

Firstly, some quick tests confirming our understanding:

> matlab.exe -singleCompThread

    >> warning('off', 'MATLAB:maxNumCompThreads:Deprecated')
    >> maxNumCompThreads
    ans =
         1
    >> maxNumCompThreads(2)
    Error using feature
    MATLAB has computational multithreading disabled. To enable multithreading please restart MATLAB without singleCompThread option.
    Error in maxNumCompThreadsHelper (line 37)
    Error in maxNumCompThreads (line 27)
        lastn = maxNumCompThreadsHelper(varargin{:});

As indicated, when MATLAB starts with the -singleCompThread option, we cannot override it with maxNumCompThreads .

> matlab.exe

    >> parpool(2);   % local pool
    >> spmd, n = maxNumCompThreads, end
    Lab 1:
      n =
           1
    Lab 2:
      n =
           1

We see that each worker is limited to one computational thread by default. This is a good thing, because we want to avoid oversubscription and the unnecessary context switches that occur when the number of threads trying to run exceeds the number of available physical/logical cores. So in theory, the best way to maximize CPU utilization is to start as many single-threaded workers as we have cores.

Now, looking at the local worker processes running in the background, we see that each of them is started as:

 matlab.exe -dmlworker -noFigureWindows [...] 

I believe the undocumented -dmlworker option does something similar to -singleCompThread, but is probably slightly different. For one thing, I was able to override it with maxNumCompThreads(2) without it throwing an error as before.

Remember that even if a MATLAB session runs in single-threaded computation mode, this does not mean that the computational thread is restricted to a single CPU core (the OS scheduler may move the thread between cores). You will need to set the affinity of the worker processes if you want to control this.
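
As a rough sketch of what setting affinity could look like on Linux, the snippet below pins each local pool worker to one core. Everything specific here is an assumption rather than something from the tests above: the util-linux `taskset` tool is on the PATH, the undocumented `feature('getpid')` call is available, cores 0 and 1 exist, and the worker-to-core mapping is purely illustrative. In newer releases `spmdIndex` would replace `labindex`.

```matlab
% Sketch: pin each local pool worker (and its existing threads) to one core.
% Assumptions: Linux with util-linux taskset, undocumented feature('getpid'),
% no pool currently open, and cores 0 and 1 available.
pool = parpool(2);
spmd
    pid  = feature('getpid');   % process ID of this worker
    core = labindex - 1;        % illustrative mapping: worker k -> core k-1
    cmd  = sprintf('taskset -a -cp %d %d', core, pid);
    [status, output] = system(cmd);   % status is 0 if taskset succeeded
end
pinned = [status{1}, status{2}] == 0; % collect per-worker results on the client
delete(pool);
```

Whether pinning helps at all depends on the workload and on how the cluster scheduler already assigns cores, so it is worth checking `pinned` and measuring before relying on it.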


So I did some profiling using Intel VTune Amplifier. Basically, I ran some linear algebra code and performed a "hotspots" analysis by attaching to the MATLAB process and filtering on the mkl.dll module (this is the Intel MKL library, which MATLAB uses as its optimized BLAS/LAPACK implementation). Here are my results:

- Sequential mode

I used the following code: eig(rand(500));

  • starting MATLAB normally, the computation spawns 4 threads (the automatic default, since I have a quad-core Intel i7 CPU).
  • starting MATLAB normally, but calling maxNumCompThreads(1) before the computation. As expected, the computation uses only 1 thread.
  • starting MATLAB with the -singleCompThread option, again only 1 thread is used.
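
For readers without VTune, a crude way to see the same effect is to time the computation under both settings. This is only a sketch: it assumes MATLAB was started normally (not with -singleCompThread), the variable names are illustrative, and timing a 500-by-500 eig once is noisy, so the numbers are indicative at best.

```matlab
% Sketch: time eig with one computational thread and with the default limit.
% Assumption: MATLAB was started without -singleCompThread.
A = rand(500);
defaultLimit = maxNumCompThreads(1);   % set 1 thread; returns the previous limit
tic; eig(A); tSingle = toc;
maxNumCompThreads(defaultLimit);       % restore the previous limit
tic; eig(A); tDefault = toc;
fprintf('1 thread: %.3f s, %d threads: %.3f s\n', tSingle, defaultLimit, tDefault);
```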

- Parallel mode ( parpool )

I used the following code: parpool(2); spmd, eig(rand(500)); end. In both cases below, MATLAB is started normally:

  • when running the code on the workers with default settings, each worker is limited to one computational thread
  • when I override the setting on the workers using maxNumCompThreads(2), each worker uses 2 threads
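
The second case can be sketched as follows. It assumes no pool is open yet and that a local profile with at least 2 workers is available; `threadsPerWorker` is just an illustrative name. The per-worker values are copied to the client before the pool is deleted, because the Composite values are no longer accessible afterwards.

```matlab
% Sketch: raise the thread limit on each worker, then run the computation.
% Assumptions: no pool is currently open; the local profile allows 2 workers.
pool = parpool(2);
spmd
    maxNumCompThreads(2);      % override the worker default of 1 thread
    n = maxNumCompThreads;     % read back the limit on this worker
    e = eig(rand(500));        % same test computation as above
end
threadsPerWorker = [n{1}, n{2}];   % copy Composite values to the client
delete(pool);
```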

Here is a screenshot of what VTune reports:

[Screenshot: vtune_hotspot_analysis]

Hope that answers your questions :)

+1
source

Source: https://habr.com/ru/post/1205039/
