The most efficient way to map functions over a numpy array

What is the most efficient way to map a function over a numpy array? The way I did this in my current project is as follows:

    import numpy as np

    x = np.array([1, 2, 3, 4, 5])

    # Obtain array of square of each element in x
    squarer = lambda t: t ** 2
    squares = np.array([squarer(xi) for xi in x])

However, it looks like it is probably very inefficient, since I use list comprehension to create a new array as a Python list, before converting it back to a numpy array.

Can we do better?

+212
performance python numpy
Feb 05 '16 at 2:08
10 answers

I benchmarked all of the suggested methods, plus np.array(list(map(f, x))), with perfplot (a small project of mine).

Message #1: If you can use NumPy's native functions, do that.

If the function you are trying to vectorize already is vectorized (like the x**2 example in the original post), using it is much faster than anything else (note the log scale):

[plot: native NumPy call vs. all other methods, log-log scale]
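To make that takeaway concrete, here is a minimal timing sketch of my own (not part of the original perfplot benchmark; names are mine), comparing a native call against an element-by-element loop:

    import timeit
    import numpy as np

    x = np.random.rand(10000)

    # native, vectorized call: one C-level loop over the array
    t_native = timeit.timeit(lambda: np.sqrt(x), number=1000)
    # element-by-element loop through the Python interpreter
    t_looped = timeit.timeit(lambda: np.array([np.sqrt(xi) for xi in x]),
                             number=1000)

    print('native: {0:.3f}s, looped: {1:.3f}s'.format(t_native, t_looped))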

If you actually need vectorization, it doesn't really matter much which variant you use.

[plot: the non-native mapping methods compared, log-log scale]




Code to reproduce the plots:

    import numpy as np
    import perfplot
    import math

    def f(x):
        # return math.sqrt(x)
        return np.sqrt(x)

    vf = np.vectorize(f)

    def array_for(x):
        return np.array([f(xi) for xi in x])

    def array_map(x):
        return np.array(list(map(f, x)))

    def fromiter(x):
        return np.fromiter((f(xi) for xi in x), x.dtype)

    def vectorize(x):
        return np.vectorize(f)(x)

    def vectorize_without_init(x):
        return vf(x)

    perfplot.show(
        setup=lambda n: np.random.rand(n),
        n_range=[2**k for k in range(20)],
        kernels=[f, array_for, array_map, fromiter,
                 vectorize, vectorize_without_init],
        logx=True,
        logy=True,
        xlabel='len(x)',
    )
+180
Sep 28 '17 at 13:28

How about using numpy.vectorize?

    >>> import numpy as np
    >>> x = np.array([1, 2, 3, 4, 5])
    >>> squarer = lambda t: t ** 2
    >>> vfunc = np.vectorize(squarer)
    >>> vfunc(x)
    array([ 1,  4,  9, 16, 25])

https://docs.scipy.org/doc/numpy/reference/generated/numpy.vectorize.html

+100
Feb 05 '16 at 2:29

TL;DR

As @user2357112 notes, the "direct" method of applying the function is always the fastest and simplest way to map a function over NumPy arrays:

    import numpy as np

    x = np.array([1, 2, 3, 4, 5])
    f = lambda x: x ** 2
    squares = f(x)

Generally avoid np.vectorize, as it does not perform well and has (or had) a number of issues. If you are handling other data types, you may want to investigate the other methods shown below.

Method Comparison

Here are some simple tests to compare three methods of mapping a function; this example uses Python 3.6 and NumPy 1.15.4. First, the set-up functions for testing:

    import timeit
    import numpy as np

    f = lambda x: x ** 2
    vf = np.vectorize(f)

    def test_array(x, n):
        t = timeit.timeit(
            'np.array([f(xi) for xi in x])',
            'from __main__ import np, x, f', number=n)
        print('array: {0:.3f}'.format(t))

    def test_fromiter(x, n):
        t = timeit.timeit(
            'np.fromiter((f(xi) for xi in x), x.dtype, count=len(x))',
            'from __main__ import np, x, f', number=n)
        print('fromiter: {0:.3f}'.format(t))

    def test_direct(x, n):
        t = timeit.timeit(
            'f(x)',
            'from __main__ import x, f', number=n)
        print('direct: {0:.3f}'.format(t))

    def test_vectorized(x, n):
        t = timeit.timeit(
            'vf(x)',
            'from __main__ import x, vf', number=n)
        print('vectorized: {0:.3f}'.format(t))

Testing with five elements (sorted from fastest to slowest):

    x = np.array([1, 2, 3, 4, 5])
    n = 100000
    test_direct(x, n)      # 0.265
    test_fromiter(x, n)    # 0.479
    test_array(x, n)       # 0.865
    test_vectorized(x, n)  # 2.906

With hundreds of items:

    x = np.arange(100)
    n = 10000
    test_direct(x, n)      # 0.030
    test_array(x, n)       # 0.501
    test_vectorized(x, n)  # 0.670
    test_fromiter(x, n)    # 0.883

And with thousands of array elements or more:

    x = np.arange(1000)
    n = 1000
    test_direct(x, n)      # 0.007
    test_fromiter(x, n)    # 0.479
    test_array(x, n)       # 0.516
    test_vectorized(x, n)  # 0.945

Different versions of Python/NumPy and compiler optimizations will yield different results, so run a similar test for your environment.

+53
Feb 05 '16 at 4:36

Since this question was answered, a lot has happened: numexpr, numba and cython have come along. The aim of this answer is to take these possibilities into account.

But first let me state the obvious: no matter how you map a Python function onto a numpy array, it stays a Python function, which means that for every evaluation:

  • the numpy array element must be converted to a Python object (e.g. a Python float);
  • all calculations are done with Python objects, which means the overhead of the interpreter, dynamic dispatch and immutable objects.

Thus, which machinery is used to actually loop through the array doesn't play a big role because of the costs mentioned above - it stays much slower than using numpy's built-in vectorized functionality.
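A tiny illustration of that boxing cost (my own example, not from the original answer): every scalar access creates a fresh Python object:

    import numpy as np

    x = np.arange(3, dtype=np.float64)

    # indexing boxes the raw C double into a new numpy.float64 object
    a = x[0]
    b = x[0]
    print(type(a))  # <class 'numpy.float64'>
    print(a is b)   # False - a fresh object on every access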

Let's look at the following example:

    # numpy functionality
    def f(x):
        return x + 2*x*x + 4*x*x*x

    # python function as ufunc
    import numpy as np
    vf = np.vectorize(f)
    vf.__name__ = "vf"

np.vectorize is used as a representative of the pure-Python class of approaches. Using perfplot (see the code in the appendix of this answer) we get the following running times:

[plot: f vs. vf running times, log-log scale]

We can see that the numpy approach is 10-100 times faster than the pure-Python version. The drop in performance for larger array sizes is probably because the data no longer fits the cache.

You often hear that NumPy performance is as good as it gets because it is pure C under the hood. Yet there is a lot of room for improvement!

The vectorized numpy version uses a lot of additional memory and memory accesses. The numexpr library tries to tile the numpy arrays and thus get better cache utilization:

    # fewer cache misses than the numpy functionality
    import numexpr as ne

    def ne_f(x):
        return ne.evaluate("x + 2*x*x + 4*x*x*x")

It leads to the following comparison:

[plot: numexpr vs. numpy vs. pure Python, log-log scale]

I can't explain everything in the plot above: we can see a bigger overhead for the numexpr library at the beginning, but, because it utilizes the cache better, it is about 10 times faster for larger arrays!




Another approach is to jit-compile the function and thus get a real native UFunc. This is numba's approach:

    # runtime-generated C function as ufunc
    import numba as nb

    @nb.vectorize(target="cpu")
    def nb_vf(x):
        return x + 2*x*x + 4*x*x*x

It is 10 times faster than the original numpy approach:

[plot: numba's nb_vf vs. the earlier approaches, log-log scale]




However, the task is embarrassingly parallel, so we could also use prange to compute the loop in parallel:

    @nb.njit(parallel=True)
    def nb_par_jitf(x):
        y = np.empty(x.shape)
        for i in nb.prange(len(x)):
            y[i] = x[i] + 2*x[i]*x[i] + 4*x[i]*x[i]*x[i]
        return y

As expected, the parallel function is slower for smaller inputs, but faster (almost a factor of 2) for larger sizes:

[plot: parallel vs. serial numba, log-log scale]




While numba specializes in optimizing operations with numpy arrays, Cython is a more general tool. It is trickier to extract the same performance as with numba - often it comes down to llvm (numba) versus the local compiler (gcc/MSVC):

    %%cython -c=/openmp -a
    import numpy as np
    import cython

    # single core:
    @cython.boundscheck(False)
    @cython.wraparound(False)
    def cy_f(double[::1] x):
        y_out = np.empty(len(x))
        cdef Py_ssize_t i
        cdef double[::1] y = y_out
        for i in range(len(x)):
            y[i] = x[i] + 2*x[i]*x[i] + 4*x[i]*x[i]*x[i]
        return y_out

    # parallel:
    from cython.parallel import prange

    @cython.boundscheck(False)
    @cython.wraparound(False)
    def cy_par_f(double[::1] x):
        y_out = np.empty(len(x))
        cdef double[::1] y = y_out
        cdef Py_ssize_t i
        cdef Py_ssize_t n = len(x)
        for i in prange(n, nogil=True):
            y[i] = x[i] + 2*x[i]*x[i] + 4*x[i]*x[i]*x[i]
        return y_out

Cython results in somewhat slower functions:

[plot: Cython vs. numba and the other approaches, log-log scale]




Conclusion

Obviously, testing only one function doesn't prove anything. One should also keep in mind that for the chosen example function, memory bandwidth was the bottleneck for sizes larger than 10^5 elements - thus numba, numexpr and cython all had the same performance in this region.

However, from this investigation and from my experience so far, I would say that numba seems to be the easiest tool with the best performance.




Code for the plots, using the perfplot package:

    import perfplot

    perfplot.show(
        setup=lambda n: np.random.rand(n),
        n_range=[2**k for k in range(0, 24)],
        kernels=[f, vf, ne_f, nb_vf, nb_par_jitf, cy_f, cy_par_f],
        logx=True,
        logy=True,
        xlabel='len(x)',
    )
+19
Jan 22 '19 at 16:04
 squares = squarer(x) 

Arithmetic operations on arrays are automatically applied elementwise, with efficient C-level loops that avoid all the interpreter overhead that would apply to a Python-level loop or comprehension.

Most of the functions you'd want to apply to a NumPy array elementwise will just work, though some may need changes. For instance, if doesn't work elementwise. You'd want to convert those into constructs like numpy.where:

    def using_if(x):
        if x < 5:
            return x
        else:
            return x**2

becomes

    def using_where(x):
        return numpy.where(x < 5, x, x**2)
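A quick sanity check of the rewritten function (my own example, repeating the definition so it runs standalone):

    import numpy

    def using_where(x):
        return numpy.where(x < 5, x, x**2)

    x = numpy.array([1, 2, 3, 4, 5, 6, 7])
    print(using_where(x))  # [ 1  2  3  4 25 36 49]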
+18
Feb 05 '16 at 2:36

I believe that in newer versions of numpy (I use 1.13) you can simply call the function by passing the numpy array to the function that you wrote for scalar types; it will automatically apply the function call to each element of the numpy array and return another numpy array:

    >>> import numpy as np
    >>> squarer = lambda t: t ** 2
    >>> x = np.array([1, 2, 3, 4, 5])
    >>> squarer(x)
    array([ 1,  4,  9, 16, 25])
+10
Jun 23 '17 at 7:16

It seems no one has mentioned numpy's built-in factory method for producing a ufunc: np.frompyfunc. I tested it against np.vectorize, and it outperformed it by about 20-30%. Of course it will not perform as well as prescribed C code or even numba (which I have not tested), but it can be a better alternative than np.vectorize:

    import numpy as np

    f = lambda x, y: x * y
    f_arr = np.frompyfunc(f, 2, 1)
    vf = np.vectorize(f)
    arr = np.linspace(0, 1, 10000)

    %timeit f_arr(arr, arr)  # 307ms
    %timeit vf(arr, arr)     # 450ms

I have also tested larger samples, and the improvement is proportional. See also the documentation here.
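One caveat worth adding (my own note, not from the benchmark above): np.frompyfunc returns arrays of dtype object, so you may want to cast the result back to a numeric dtype:

    import numpy as np

    f_arr = np.frompyfunc(lambda x, y: x * y, 2, 1)
    arr = np.linspace(0, 1, 5)

    out = f_arr(arr, arr)
    print(out.dtype)                # object
    print(out.astype(float).dtype)  # float64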

+1
May 15 '19 at 21:41

As mentioned in this post, just use generator expressions like so:

    numpy.fromiter((<some_func>(x) for x in <something>), <dtype>, <size of something>)
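For instance, a concrete version of that pattern (my own example, filling in the placeholders):

    import numpy as np

    x = np.array([1, 2, 3, 4, 5])
    squares = np.fromiter((xi ** 2 for xi in x), dtype=x.dtype, count=len(x))
    print(squares)  # [ 1  4  9 16 25]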
0
Feb 05 '16 at 2:22

This may not directly answer the question, but I've heard that numba can compile existing Python code into parallel machine instructions. I will revisit and revise this post when I have a chance to use it.
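In the meantime, a minimal sketch of what that might look like (my own example, assuming numba is installed; see the numba answer above for a fuller treatment):

    import numpy as np
    import numba

    # parallel=True plus prange asks numba to compile the loop
    # into parallel machine code
    @numba.njit(parallel=True)
    def square_all(x):
        out = np.empty_like(x)
        for i in numba.prange(len(x)):
            out[i] = x[i] ** 2
        return out

    print(square_all(np.arange(5.0)))  # [ 0.  1.  4.  9. 16.]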

0
Apr 30 '18 at 23:39

Maybe it's better to use vectorize:

    import numpy as np

    def square(x):
        return x**2

    vfunc = np.vectorize(square)
    vfunc([1, 2, 3, 4, 5])
    # output: array([ 1,  4,  9, 16, 25])
-3
Feb 05 '16 at 3:20


