Understanding NumPy profiling of element access

Profiling a piece of NumPy code shows that I spend most of the time in these two functions:

    numpy/matrixlib/defmatrix.py.__getitem__:301
    numpy/matrixlib/defmatrix.py.__array_finalize__:279

Here's the Numpy source:

Question number 1:

__getitem__ seems to be called every time I index an array with something like my_array[arg], and it gets more expensive when arg is a slice rather than an integer. Is there a way to speed up array indexing?

e.g. in

    for i in range(idx):
        res[i] = my_array[i:i+10].mean()

Question number 2:

When exactly is __array_finalize__ called, and how can I speed things up by reducing the number of calls to this function?

Thanks!

2 answers

You shouldn't use matrices the same way you use 2d numpy arrays. I usually use matrices only briefly, for their multiplication syntax (and with the addition of the .dot method on arrays, I find I do this less and less).

But to your questions:

1) Overriding __getitem__ would actually not be an issue if defmatrix also overrode __getslice__, which it could do but has not done yet. There are .item and .itemset methods that are optimized for getting and setting single elements by integer index (and they return Python scalars, not NumPy arrays).
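As a rough sketch of the fast path mentioned above: plain indexing returns a 0-d NumPy scalar object, while .item() returns a plain Python float directly. (Note that the setter counterpart .itemset was removed in NumPy 2.0, so this example uses only .item; the array here is illustrative, not from the question.)

    import numpy as np

    a = np.arange(6, dtype=np.float64).reshape(2, 3)

    # Plain indexing goes through __getitem__ and returns a
    # 0-d NumPy scalar; .item() is a fast path that returns a
    # plain Python float directly.
    x = a[1, 2]        # numpy.float64 scalar
    y = a.item(1, 2)   # plain Python float, cheaper to produce

    print(type(x), type(y))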

2) __array_finalize__ is called whenever an array object (or subclass instance) is created. It is invoked from a C function through which every array allocation is routed: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L1003

For subclasses defined purely in Python, this call crosses from C back into the Python interpreter, which has overhead. If the matrix class were a built-in type (for example, a Cython-based cdef class), that call could avoid the Python-interpreter overhead.


Question 1:

Since slicing an array sometimes requires copying the underlying data structure (which holds pointers to the data in memory), slices can be quite expensive. If you really need the computation in the example above, you can compute the mean by iterating over elements i to i+10 yourself and accumulating the average manually. For some operations this won't give any performance improvement, but avoiding the creation of new array objects will generally speed things up.
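One way to avoid creating a new array object on every loop iteration is to replace the per-iteration slice with a single cumulative-sum pass. This is a sketch under the assumption that the window length is a fixed 10, as in the question; my_array and idx here are stand-ins for the asker's variables.

    import numpy as np

    my_array = np.arange(100, dtype=np.float64)
    idx = len(my_array) - 10

    # Naive version from the question: one slice object per iteration.
    res_loop = np.empty(idx)
    for i in range(idx):
        res_loop[i] = my_array[i:i+10].mean()

    # Cumulative-sum trick: a single pass, no per-iteration slicing.
    # csum[i+10] - csum[i] is the sum of my_array[i:i+10].
    csum = np.concatenate(([0.0], np.cumsum(my_array)))
    res_fast = (csum[10:10 + idx] - csum[:idx]) / 10.0

On recent NumPy versions, numpy.lib.stride_tricks.sliding_window_view offers a similar zero-copy alternative.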

One more note: if the types you feed into NumPy don't match the array's native type, you will pay a large performance penalty managing the array. Suppose your array has dtype=float64 but your data is natively float32: every operation costs NumPy extra conversion work and hurts overall performance. Sometimes that is fine, and you simply take the hit to preserve the data type. In other cases, which float or int width is stored internally is arbitrary; in those cases, try dtype=float instead of dtype=float64, so that NumPy uses its default native type. Making this change gave me a 3x+ speedup in computation-heavy algorithms.
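To illustrate the dtype point, here is a minimal sketch (arrays and sizes are made up): dtype=float maps to the platform's default float (float64 on mainstream platforms), and mixed-dtype arithmetic forces an up-cast on every operation, so converting once up front avoids repeated conversions.

    import numpy as np

    # dtype=float resolves to the platform default (float64 on
    # most systems); an explicit narrower dtype may differ from it.
    a = np.zeros(1000, dtype=float)
    b = np.zeros(1000, dtype=np.float32)

    # Mixed-dtype arithmetic up-casts b on every operation;
    # converting once avoids paying that cost repeatedly.
    b64 = b.astype(np.float64)
    c = a + b64   # same-dtype arithmetic, no per-operation casting

    print(a.dtype, b.dtype, c.dtype)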

Question 2:

__array_finalize__ "is called whenever the system internally allocates a new array from obj, where obj is a subclass (subtype) of the (big)ndarray", according to the SciPy docs. So this follows from what was described in the first question: when you slice and thereby create a new array, that array must be finalized, either by making structural copies or by wrapping the original structure. This operation takes time. Avoiding slices will save on this operation, although for multidimensional data it is impossible to avoid calls to __array_finalize__ entirely.
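You can observe this behavior directly with a small ndarray subclass that counts its own __array_finalize__ calls; this is an illustrative sketch (the class and counter are made up, not part of NumPy).

    import numpy as np

    class CountingArray(np.ndarray):
        """ndarray subclass that counts __array_finalize__ calls."""
        finalize_calls = 0

        def __array_finalize__(self, obj):
            CountingArray.finalize_calls += 1

    a = np.arange(100).view(CountingArray)   # view creation finalizes once

    before = CountingArray.finalize_calls
    for i in range(10):
        _ = a[i:i+10]                        # each slice triggers one call

    print(CountingArray.finalize_calls - before)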


Source: https://habr.com/ru/post/1434255/
