NumPy may, in some cases, rely on a library that uses multiple threads and thereby distributes the load across several cores. This, however, depends on the underlying library and has little to do with NumPy's Python code. So yes, NumPy and any other library can overcome these limitations if their heavy lifting is not written in Python. Some libraries even offer GPU-accelerated features.
NumExpr uses the same method to bypass the GIL. From its home page:
In addition, numexpr implements support for multi-threaded computation directly in its internal virtual machine, written in C. This allows it to bypass the GIL in Python.
However, there are some fundamental differences between NumPy and NumExpr. NumPy focuses on providing a good Pythonic interface for array operations, whereas NumExpr has a much narrower scope and its own little language. When NumPy evaluates c = 3*a + 4*b, where the operands are arrays, two full-size temporary arrays (3*a and 4*b) are created in the process. NumExpr can optimize the same calculation so that the multiplications and the addition are performed in blocks, without materializing any intermediate results.
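A minimal sketch of the difference, assuming numexpr is installed (array sizes are arbitrary; the point is only that both paths give the same result while NumExpr avoids the full-size temporaries):

```python
import numpy as np
import numexpr as ne

a = np.random.random(1_000_000)
b = np.random.random(1_000_000)

# NumPy evaluates this in steps: t1 = 3*a, t2 = 4*b, c = t1 + t2,
# allocating two full-size temporary arrays along the way.
c_np = 3*a + 4*b

# NumExpr compiles the whole expression for its virtual machine and
# streams over the operands in blocks, with no full-size temporaries.
c_ne = ne.evaluate("3*a + 4*b")

assert np.allclose(c_np, c_ne)
```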
This leads to some interesting results. The following benchmarks were run on a 4-core/8-thread i7 processor, with timings taken using IPython's %timeit:
import numpy as np
import numexpr as ne

def addtest_np(a, b): a + b
def addtest_ne(a, b): ne.evaluate("a+b")

def addtest_np_inplace(a, b): a += b
def addtest_ne_inplace(a, b): ne.evaluate("a+b", out=a)

def addtest_np_constant(a): a + 3
def addtest_ne_constant(a): ne.evaluate("a+3")

def addtest_np_constant_inplace(a): a += 3
def addtest_ne_constant_inplace(a): ne.evaluate("a+3", out=a)

a_small = np.random.random((100, 10))
b_small = np.random.random((100, 10))

a_large = np.random.random((100000, 1000))
b_large = np.random.random((100000, 1000))
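For readers who want to reproduce the comparison outside IPython, here is a sketch using the standard-library timeit module instead of %timeit (the array shape here is illustrative; absolute numbers will differ from the ones below):

```python
import timeit

import numpy as np
import numexpr as ne

a = np.random.random((1000, 1000))
b = np.random.random((1000, 1000))

# Time plain NumPy addition vs. the NumExpr equivalent.
t_np = timeit.timeit(lambda: a + b, number=20)
t_ne = timeit.timeit(lambda: ne.evaluate("a+b"), number=20)

print(f"numpy:   {t_np:.4f} s for 20 runs")
print(f"numexpr: {t_ne:.4f} s for 20 runs")
```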
Of course, this benchmarking method is not very accurate, but some general trends emerge:
- NumPy uses fewer clock cycles per operation (np < ne1, i.e. NumPy beats single-threaded NumExpr)
- parallelism helps a bit with very large arrays (10-20%)
- NumExpr is much slower with small arrays
- NumPy is very strong with in-place operations
NumPy does not parallelize simple arithmetic operations, but, as can be seen above, that does not really matter: the speed is mostly limited by memory bandwidth, not processing power.
If we do something more complex, everything changes.
np.sin(a_large)              # 19.4 ns/element
ne.evaluate("sin(a_large)")  #  5.5 ns/element
Speed is no longer limited by memory bandwidth. To make sure this is really due to threading (and not to NumExpr sometimes using a fast vector math library):
ne.set_num_threads(1)
ne.evaluate("sin(a_large)")
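The same check as a self-contained sketch (the array shape and thread count are illustrative; set_num_threads caps the pool at the machine's actual core count):

```python
import numpy as np
import numexpr as ne

a = np.random.random((2000, 2000))

# Pin NumExpr to a single thread: roughly NumPy-like speed.
ne.set_num_threads(1)
r1 = ne.evaluate("sin(a)")

# Let NumExpr use several threads: this is where the speedup comes from.
ne.set_num_threads(8)
r8 = ne.evaluate("sin(a)")

# Thread count changes speed, not the result.
assert np.allclose(r1, r8)
assert np.allclose(r1, np.sin(a))
```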
Here parallelism helps a lot.
NumPy can use parallel processing for more complex linear algebra operations, such as matrix inversion. NumExpr does not support these operations, so there is no meaningful comparison. The actual speed depends on the library used (BLAS/ATLAS/LAPACK). The same goes for complex operations such as the FFT, where performance depends on the backing library. (AFAIK, NumPy/SciPy does not support FFTW yet.)
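A small sketch of the kind of operation that NumPy hands off to BLAS/LAPACK (the matrix size is arbitrary; whether it actually runs multi-threaded depends on the BLAS build NumPy was compiled against):

```python
import numpy as np

rng = np.random.default_rng(0)
# A well-conditioned matrix: random entries plus a strong diagonal.
m = rng.random((500, 500)) + 500 * np.eye(500)

# np.linalg.inv delegates to a LAPACK routine under the hood,
# which may use multiple cores depending on the BLAS backend.
m_inv = np.linalg.inv(m)

# m @ m_inv should be (numerically) the identity.
assert np.allclose(m @ m_inv, np.eye(500))
```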
To sum up, there are cases where NumExpr is very fast and useful, and there are cases where NumPy is the fastest. If you have large arrays and element-wise operations, NumExpr is very powerful. However, it is worth noting that some parallelism (or even spreading the computation across machines) is often quite easy to add to the code with multiprocessing or something equivalent.
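A minimal sketch of that last point, using the standard-library multiprocessing module to spread an element-wise computation over worker processes (chunk and worker counts are arbitrary; this is not the author's code):

```python
import numpy as np
from multiprocessing import Pool

def work(chunk):
    # Any element-wise computation; each worker handles one chunk.
    return np.sin(chunk) + np.cos(chunk)

a = np.random.random(1_000_000)
chunks = np.array_split(a, 4)  # one chunk per worker

# On platforms using the "spawn" start method (e.g. Windows), the Pool
# call needs an `if __name__ == "__main__":` guard.
with Pool(4) as pool:
    parts = pool.map(work, chunks)

result = np.concatenate(parts)
assert np.allclose(result, np.sin(a) + np.cos(a))
```

Whether this beats a single process depends on how expensive the per-element work is compared to the cost of shipping the chunks between processes.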
The "multiprocessing" vs. "multithreading" question is somewhat more complicated, as the terminology is a little shaky. In Python, a "thread" is something that runs under the same GIL, but at the operating-system level threads and processes are closely related; in Linux, for example, a thread is essentially a process that shares its memory with its parent.