Using mkl_set_num_threads with numpy

Question

I am trying to set the number of threads for numpy calculations using mkl_set_num_threads, like this:

    import numpy
    import ctypes

    mkl_rt = ctypes.CDLL('libmkl_rt.so')
    mkl_rt.mkl_set_num_threads(4)

but I keep getting a segmentation fault:

    Program received signal SIGSEGV, Segmentation fault.
    0x00002aaab34d7561 in mkl_set_num_threads__ () from /../libmkl_intel_lp64.so

Getting the number of threads does not cause problems:

    print(mkl_rt.mkl_get_max_threads())

How can I make my code work? Or is there another way to set the number of threads at runtime?

+13

python numpy intel-mkl

Daniel Feb 02 '15 at 17:17

4 answers

In short, use MKL_Set_Num_Threads and its CamelCased friends when calling MKL from Python. The same applies to C if you do not #include <mkl.h>.

The MKL documentation seems to suggest that the correct type signature in C is:

 void mkl_set_num_threads(int nt);

OK, let's try a minimal program:

    void mkl_set_num_threads(int);

    int main(void)
    {
        mkl_set_num_threads(1);
        return 0;
    }

Compile it with GCC and boom, segmentation fault again. So the problem is not limited to Python.

Running through the debugger (GDB) shows:

    Program received signal SIGSEGV, Segmentation fault.
    0x0000… in mkl_set_num_threads_ () from /…/mkl/lib/intel64/libmkl_intel_lp64.so

Wait a second, mkl_set_num_threads_? That is the Fortran version of MKL_Set_Num_Threads! How did we end up calling the Fortran version? (Keep in mind that the Fortran calling convention requires arguments to be passed as pointers, not by value.)
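
The pointer-versus-value mismatch can be imitated without MKL. The following is only an illustrative sketch: fake_set_num_threads_ is a hypothetical stand-in written in Python, not an MKL symbol, and the prototype is an assumption that mirrors the by-reference convention described above.

```python
import ctypes

# Hypothetical Fortran-style prototype: the single argument is a pointer to int.
FortranStyle = ctypes.CFUNCTYPE(None, ctypes.POINTER(ctypes.c_int))

seen = []


@FortranStyle
def fake_set_num_threads_(nt_ptr):
    # A by-reference callee dereferences its argument to read the value.
    seen.append(nt_ptr[0])


# Correct call: pass the address of a C int.
fake_set_num_threads_(ctypes.byref(ctypes.c_int(4)))

# With a declared prototype, ctypes refuses a bare integer instead of
# letting the callee dereference it as if it were an address.
try:
    fake_set_num_threads_(4)
    rejected = False
except ctypes.ArgumentError:
    rejected = True

print(seen, rejected)
```

In the question no prototype was declared, so ctypes passed the integer 4 by value and the Fortran entry point presumably treated it as an address, which would explain the crash.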

It turns out the documentation was a complete facade. If you actually look at the header files of recent MKL versions, you will find this small definition:

    void MKL_Set_Num_Threads(int nth);
    #define mkl_set_num_threads MKL_Set_Num_Threads

... and now everything makes sense! The correct function to call (from C code) is MKL_Set_Num_Threads, not mkl_set_num_threads. Checking the symbol table reveals that there are actually four different variants:

    nm -D /…/mkl/lib/intel64/libmkl_rt.so | grep -i mkl_set_num_threads
    00000000000e3060 T MKL_SET_NUM_THREADS
    …
    00000000000e30b0 T MKL_Set_Num_Threads
    …
    00000000000e3060 T mkl_set_num_threads
    00000000000e3060 T mkl_set_num_threads_
    …
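
To make the listing easier to read, the symbols can be grouped by address. This is a small sketch over the four lines quoted above (the elided entries are left out); group_symbols_by_address is a hypothetical helper name, not part of any tool.

```python
from collections import defaultdict

# The four symbol lines quoted from the nm listing (elided entries omitted).
NM_OUTPUT = """\
00000000000e3060 T MKL_SET_NUM_THREADS
00000000000e30b0 T MKL_Set_Num_Threads
00000000000e3060 T mkl_set_num_threads
00000000000e3060 T mkl_set_num_threads_
"""


def group_symbols_by_address(nm_text):
    """Map each address to the list of symbol names exported at it."""
    groups = defaultdict(list)
    for line in nm_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip anything that is not "address type name"
        address, _kind, name = parts
        groups[address].append(name)
    return dict(groups)


groups = group_symbols_by_address(NM_OUTPUT)
for address, names in sorted(groups.items()):
    print(address, names)
```

Three of the names share one address, and only MKL_Set_Num_Threads has an address of its own, which fits the explanation that the other three are aliases of the Fortran entry point.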

Why did Intel ship four different variants of one function when the documentation mentions only the C and Fortran variants? I do not know for sure, but I suspect it is for compatibility with different Fortran compilers. You see, the Fortran name-mangling convention is not standardized. Different compilers mangle function names differently:

some use uppercase,
some use lowercase with a trailing underscore, and
some use lowercase without any decoration.

There may be other schemes that I do not know about. This trick allows the MKL library to be used with most Fortran compilers without any changes; the downside is that the C functions must be given different ("mangled") names to make room for the three variants of the Fortran naming convention.
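
The three naming schemes listed above can be written out as a short sketch. fortran_name_variants is a hypothetical helper for illustration only, and it covers just the schemes named in this answer.

```python
def fortran_name_variants(name):
    """Return the three Fortran-style spellings discussed above."""
    lower = name.lower()
    return {
        "uppercase": name.upper(),
        "lowercase_underscore": lower + "_",
        "lowercase_plain": lower,
    }


variants = fortran_name_variants("MKL_Set_Num_Threads")
for scheme, symbol in variants.items():
    print(scheme, symbol)
```

Applied to MKL_Set_Num_Threads, this yields exactly the three non-CamelCase symbols seen in the nm listing.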

+5

Rufflewind Feb 03 '15 at 7:42

For people who are looking for a complete solution, you can use a context manager:

    import ctypes


    class MKLThreads(object):
        _mkl_rt = None

        @classmethod
        def _mkl(cls):
            if cls._mkl_rt is None:
                try:
                    cls._mkl_rt = ctypes.CDLL('libmkl_rt.so')
                except OSError:
                    cls._mkl_rt = ctypes.CDLL('mkl_rt.dll')
            return cls._mkl_rt

        @classmethod
        def get_max_threads(cls):
            return cls._mkl().mkl_get_max_threads()

        @classmethod
        def set_num_threads(cls, n):
            assert type(n) == int
            cls._mkl().mkl_set_num_threads(ctypes.byref(ctypes.c_int(n)))

        def __init__(self, num_threads):
            self._n = num_threads
            self._saved_n = self.get_max_threads()

        def __enter__(self):
            self.set_num_threads(self._n)
            return self

        def __exit__(self, type, value, traceback):
            self.set_num_threads(self._saved_n)

Then use it like this:

    with MKLThreads(2):
        # do some stuff on two cores
        pass

Or change the setting directly by calling the class methods:

    # Example
    MKLThreads.set_num_threads(3)
    print(MKLThreads.get_max_threads())

Code is also available at this point .

0

Alex Maystrenko Jan 24 '19 at 16:25

For those looking for a cross-platform, packaged solution, note that we recently released threadpoolctl, a module to limit the number of threads used in C-level thread pools called from Python (OpenBLAS, OpenMP and MKL). See this answer for more information.
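
A minimal sketch of how this can look, assuming threadpoolctl is installed. call_with_thread_limit is a hypothetical wrapper name; the fallback to an unrestricted call when the package is missing is an assumption made only so that the sketch stays runnable everywhere.

```python
def call_with_thread_limit(func, limit, user_api="blas"):
    """Run func() while BLAS thread pools are capped at `limit` threads."""
    try:
        from threadpoolctl import threadpool_limits
    except ImportError:
        # Assumption for this sketch: without threadpoolctl, run unrestricted.
        return func()
    # threadpool_limits restores the previous limits when the block exits.
    with threadpool_limits(limits=limit, user_api=user_api):
        return func()


result = call_with_thread_limit(lambda: 6 * 7, limit=2)
print(result)
```

In real use, func would be the numpy computation whose BLAS calls should be restricted.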

0

Thomas Moreau Jun 04 '19 at 11:46

Daniel · Accepted Answer · 2015-02-03T07:01:28+0000

Ophion pointed me in the right direction. Despite the documentation, the argument of mkl_set_num_threads has to be passed by reference.

I have now defined functions for getting and setting the number of threads

    import numpy
    import ctypes

    mkl_rt = ctypes.CDLL('libmkl_rt.so')
    mkl_get_max_threads = mkl_rt.mkl_get_max_threads

    def mkl_set_num_threads(cores):
        mkl_rt.mkl_set_num_threads(ctypes.byref(ctypes.c_int(cores)))

    mkl_set_num_threads(4)
    print(mkl_get_max_threads())  # says 4

and they work as expected.

Edit: according to Rufflewind, the names of the C functions are written in CamelCase, and those expect their parameters by value:

    import ctypes

    mkl_rt = ctypes.CDLL('libmkl_rt.so')
    mkl_set_num_threads = mkl_rt.MKL_Set_Num_Threads
    mkl_get_max_threads = mkl_rt.MKL_Get_Max_Threads
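
A slightly more defensive variant of the same idea is sketched below. load_mkl and set_mkl_threads are hypothetical helper names; the library file names are the ones used in the answers here and may differ on your system. Declaring argtypes makes ctypes check that an int is passed by value.

```python
import ctypes


def load_mkl(names=("libmkl_rt.so", "mkl_rt.dll")):
    """Try the library names from the answers above; return None if none load."""
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue
    return None


def set_mkl_threads(n, lib=None):
    """Call the C entry point MKL_Set_Num_Threads with n passed by value."""
    lib = lib if lib is not None else load_mkl()
    if lib is None:
        return False  # MKL runtime not found; nothing was changed
    func = lib.MKL_Set_Num_Threads
    func.argtypes = [ctypes.c_int]  # by value, per the header shown earlier
    func.restype = None             # the C function returns void
    func(n)
    return True
```

Calling set_mkl_threads(4) returns False when no MKL runtime can be loaded, rather than raising.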
