Since this question was asked (and answered), a lot has happened: numexpr, numba and cython have appeared. The aim of this answer is to take these possibilities into account.
But first let me state the obvious: no matter how you map a Python function to a numpy array, it remains a Python function, which means for each evaluation:
- the numpy-array element must be converted to a Python object (e.g. a `float`),
- all calculations are done with Python objects, which means the overhead of the interpreter, dynamic dispatch and immutable objects.
Thus, because of these costs, the machinery used to loop through the array doesn't play a big role; it stays much slower than plain numpy vectorization, as the small timing sketch below illustrates.
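A minimal sketch of that per-element cost (my own illustration, not part of the benchmarks in this answer; the squaring function and the array size are arbitrary choices):

```python
import numpy as np
import timeit

x = np.random.rand(10**5)

# each xx is unboxed into a Python float, multiplied as a Python object,
# and the result is boxed again; numpy does the same work in one C loop
loop = lambda: np.array([xx * xx for xx in x])
vect = lambda: x * x

print(timeit.timeit(loop, number=10))  # Python-object arithmetic per element
print(timeit.timeit(vect, number=10))  # single vectorized C loop
```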
Let's look at the following example:
```python
import numpy as np

# numpy-functionality
def f(x):
    return x + 2*x*x + 4*x*x*x
```
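The pure-Python competitor `vf` referenced in the appendix is just the np.vectorize wrapper of f (a short sketch; setting `__name__` is only so that perfplot, which labels kernels by function name, shows a readable legend):

```python
# np.vectorize is a convenience loop over Python-level calls, not a speed-up
vf = np.vectorize(f)
vf.__name__ = "vf"  # readable label in the perfplot legend
```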
This np.vectorize wrapper serves as the representative of the pure-Python class of approaches. Using perfplot (see the code in the appendix of this answer) we get the following running times:
[plot: running times of the numpy version f vs. the pure-Python vf]
We see that the numpy approach is 10x-100x faster than the pure-Python version. The drop in performance for larger array sizes is probably because the data no longer fits the cache.
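A back-of-the-envelope check of that cache claim (my arithmetic, not from the original benchmarks):

```python
# at n = 2**21 the input alone is 16 MiB of float64s, and the vectorized
# expression allocates several temporaries of the same size, so the
# working set quickly exceeds a typical L3 cache of a few MiB
n = 2**21
print(n * 8 / 2**20)  # 16.0 (MiB)
```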
It is often said that numpy's performance is as good as it gets, because it is pure C under the hood. Yet there is a lot of room for improvement!

The vectorized numpy version uses a lot of additional memory and memory accesses. The numexpr library tries to tile the numpy arrays and thus get a better cache utilization:
```python
# fewer cache misses than the numpy-functionality
import numexpr as ne

def ne_f(x):
    return ne.evaluate("x+2*x*x+4*x*x*x")
```
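One note on usage (hedged: `local_dict` is part of numexpr's public API, but the check below is my own): by default ne.evaluate looks the variables of the expression string up in the caller's frame, though they can also be passed explicitly:

```python
import numpy as np
import numexpr as ne

x = np.random.rand(1000)
# pass the variables explicitly instead of relying on frame lookup
y = ne.evaluate("x+2*x*x+4*x*x*x", local_dict={"x": x})
np.testing.assert_allclose(y, x + 2*x*x + 4*x*x*x)
```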
This leads to the following comparison:
[plot: numexpr added to the comparison]
I can't explain everything in the plot above: we see a bigger overhead for the numexpr library at the beginning, but, because it utilizes the cache better, it is about 10x faster for larger arrays!
Another approach is to jit-compile the function and thus get a real pure-C UFunc. This is numba's approach:
```python
# runtime-generated C-function as ufunc
import numba as nb

@nb.vectorize(target="cpu")
def nb_vf(x):
    return x + 2*x*x + 4*x*x*x
```
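Because @nb.vectorize produces a real ufunc, nb_vf broadcasts like any numpy ufunc and accepts scalars as well (a quick check of my own, not from the original answer):

```python
import numpy as np

x = np.random.rand(1000)
np.testing.assert_allclose(nb_vf(x), x + 2*x*x + 4*x*x*x)
print(nb_vf(2.0))  # scalar input works too: 2 + 8 + 32 = 42.0
```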
It is 10x faster than the original numpy approach:
[plot: numba's nb_vf added to the comparison]
However, the task is embarrassingly parallelizable, so we could also use prange to compute the loop in parallel:
```python
@nb.njit(parallel=True)
def nb_par_jitf(x):
    y = np.empty(x.shape)
    for i in nb.prange(len(x)):
        y[i] = x[i] + 2*x[i]*x[i] + 4*x[i]*x[i]*x[i]
    return y
```
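A benchmarking caveat (my addition): the first call of a jit-compiled function includes the compilation itself, so it should be warmed up before timing:

```python
import numpy as np

x = np.random.rand(1000)
nb_par_jitf(x)  # first call triggers compilation; later calls run the machine code
```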
As expected, the parallel version is slower for smaller inputs, but faster (almost a factor of 2) for larger sizes:
[plot: serial vs. parallel numba version]
While numba specializes in accelerating operations on numpy arrays, Cython is a more general tool. Extracting the same performance as with numba is more complicated; it often comes down to llvm (numba) versus the local compiler (gcc/MSVC):
```python
%%cython -c=/openmp -a
import numpy as np
import cython

# single core:
@cython.boundscheck(False)
@cython.wraparound(False)
def cy_f(double[::1] x):
    y_out = np.empty(len(x))
    cdef Py_ssize_t i
    cdef double[::1] y = y_out
    for i in range(len(x)):
        y[i] = x[i] + 2*x[i]*x[i] + 4*x[i]*x[i]*x[i]
    return y_out

# parallel:
from cython.parallel import prange

@cython.boundscheck(False)
@cython.wraparound(False)
def cy_par_f(double[::1] x):
    y_out = np.empty(len(x))
    cdef double[::1] y = y_out
    cdef Py_ssize_t i
    cdef Py_ssize_t n = len(x)
    for i in prange(n, nogil=True):
        y[i] = x[i] + 2*x[i]*x[i] + 4*x[i]*x[i]*x[i]
    return y_out
```
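Two practical notes on the cell above (my additions, not from the original answer): %%cython requires `%load_ext Cython` in a Jupyter session first, and `-c=/openmp` is the MSVC flag; with gcc/clang one would use `-c=-fopenmp --link-args=-fopenmp` instead. Once compiled, the typed-memoryview signature accepts any C-contiguous float64 array:

```python
import numpy as np

x = np.random.rand(1000)  # C-contiguous float64, as double[::1] requires
np.testing.assert_allclose(cy_f(x), f(x))
np.testing.assert_allclose(cy_par_f(x), f(x))
```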
Cython leads to slightly slower functions:
[plot: Cython versions added to the comparison]
Conclusion
Obviously, testing only one function doesn't prove anything. Also, one should keep in mind that for the chosen example function the memory bandwidth was the bottleneck for sizes larger than 10^5 elements, which is why numba, numexpr and cython showed the same performance in that region.
However, based on this investigation and my experience so far, I would say that numba seems to be the easiest tool with the best performance.
Code for creating the plots with the perfplot package:
```python
import perfplot

perfplot.show(
    setup=lambda n: np.random.rand(n),
    n_range=[2**k for k in range(0, 24)],
    kernels=[
        f,
        vf,
        ne_f,
        nb_vf,
        nb_par_jitf,
        cy_f,
        cy_par_f,
    ],
    logx=True,
    logy=True,
    xlabel='len(x)',
)
```
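Newer versions of perfplot also expose perfplot.bench, which returns the measurements for later display or saving (hedged: the exact API depends on the installed version; by default perfplot also verifies that all kernels return equal results before timing them):

```python
import numpy as np
import perfplot

out = perfplot.bench(
    setup=lambda n: np.random.rand(n),
    kernels=[f, vf, ne_f, nb_vf, nb_par_jitf, cy_f, cy_par_f],
    n_range=[2**k for k in range(0, 24)],
)
out.show()
```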