What is the fastest way to iterate through a numpy array

I noticed a significant difference between iterating through a numpy array "directly" and iterating through the list produced by its tolist method. See the chart below:

directly
[i for i in np.arange(10000000)]
via tolist
[i for i in np.arange(10000000).tolist()]

[Chart: timing comparison of the two approaches; image not available]

Given that I have found one way to speed things up, I wanted to ask: what else could make it faster?

What is the fastest way to iterate through a numpy array?

+6
2 answers

These are my timings on a slower machine

 In [1034]: timeit [i for i in np.arange(10000000)]
 1 loop, best of 3: 2.16 s per loop

If I iterate over a range directly (Python 3, so this is a lazy range object rather than a list), the times are much better. Take this as the baseline for a list comprehension of this size.

 In [1035]: timeit [i for i in range(10000000)]
 1 loop, best of 3: 1.26 s per loop

tolist first converts the array to a list; that takes a little longer, but the iteration itself still runs over a list:

 In [1036]: timeit [i for i in np.arange(10000000).tolist()]
 1 loop, best of 3: 1.6 s per loop

Using list() takes the same time as iterating directly over the array, which suggests that direct iteration does the same thing internally:

 In [1037]: timeit [i for i in list(np.arange(10000000))]
 1 loop, best of 3: 2.18 s per loop
 In [1038]: timeit np.arange(10000000).tolist()
 1 loop, best of 3: 927 ms per loop

list() on its own takes about the same time as the whole iteration over .tolist():

 In [1039]: timeit list(np.arange(10000000))
 1 loop, best of 3: 1.55 s per loop

In general, if you must loop, working on a list is faster. Accessing the items of a list is simpler.
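To reproduce this kind of comparison outside IPython, here is a minimal sketch using the standard timeit module. The array size N and the helper names (iterate_array, iterate_tolist, time_it) are my own choices for illustration, and the array is much smaller than the 10,000,000 elements timed above so that it runs quickly. Absolute numbers will differ from machine to machine.

```python
import timeit

import numpy as np

# Illustrative size only; the timings above used 10,000,000 elements.
N = 100_000
arr = np.arange(N)


def iterate_array():
    # Iterate over the ndarray directly; each item is a numpy scalar.
    return [i for i in arr]


def iterate_tolist():
    # Convert once in compiled code, then iterate over plain Python ints.
    return [i for i in arr.tolist()]


def time_it(func, repeat=3, number=5):
    # Best wall-clock time per call, in seconds, over `repeat` runs.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


if __name__ == "__main__":
    for func in (iterate_array, iterate_tolist):
        print(f"{func.__name__}: {time_it(func):.4f} s per call")
```

Both functions build a list with the same values; only the element types and the time spent differ.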

Look at the items returned by indexing.

a[0] is another numpy object; it is constructed from the values in a, but it is not simply a fetched value.

list(a)[0] is the same type; the list is just [a[0], a[1], a[2]]

 In [1043]: a = np.arange(3)
 In [1044]: type(a[0])
 Out[1044]: numpy.int32
 In [1045]: ll=list(a)
 In [1046]: type(ll[0])
 Out[1046]: numpy.int32

But tolist converts the array to a pure Python list, in this case a list of ints. It does more work than list(), but does it in compiled code.

 In [1047]: ll=a.tolist()
 In [1048]: type(ll[0])
 Out[1048]: int

In general, do not use list(anarray). It rarely does anything useful and is not as powerful as tolist().
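One concrete difference between list() and tolist() shows up with a multidimensional array. The short sketch below uses a 2x3 array of my own choosing: list() unpacks only the first axis, while tolist() converts recursively all the way down to Python scalars.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

as_list = list(a)       # only the outer axis is unpacked
as_tolist = a.tolist()  # recursively converted to nested Python lists

print(type(as_list[0]))       # each item is still a 1-D numpy array (one row)
print(type(as_tolist[0]))     # each item is a plain Python list
print(type(as_tolist[0][0]))  # and its elements are plain Python ints
```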

What is the fastest way to iterate through an array? There is none, at least not in Python; there are fast ways in C code.

a.tolist() is the fastest, vectorized way to create a list of Python integers from an array. It iterates, but does it in compiled code.

But what is your real goal?

+4

This is actually not surprising. Let's look at the methods one at a time, starting with the slowest.

 [i for i in np.arange(10000000)] 

This method asks Python to reach into the numpy array (stored in C memory) one element at a time, allocate a Python object in memory, and store a pointer to that object in the list. Every time you cross the boundary between the numpy array stored in the C backend and pure Python, there is an overhead cost. This method pays that cost 10,000,000 times.
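The per-element object creation described here can be observed directly. The following is a small sketch with a three-element array of my own choosing: indexing the array twice yields two distinct wrapper objects, whereas a list simply returns a reference to the object it already holds.

```python
import numpy as np

a = np.arange(3)
py_list = a.tolist()

# Each access to an array element builds a new numpy scalar object.
first, second = a[0], a[0]
print(first is second)   # False: two separate wrapper objects
print(first == second)   # True: they hold the same value

# A list hands back a reference to the object it already stores.
print(py_list[0] is py_list[0])  # True

# Direct iteration yields numpy scalars, not Python ints.
print({type(x).__name__ for x in a})        # platform-dependent numpy integer type
print({type(x).__name__ for x in py_list})  # {'int'}
```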

Further:

 [i for i in np.arange(10000000).tolist()] 

In this case, using .tolist(), you make a single call into the numpy C backend, which allocates all of the elements into a list in one go. You then use Python to iterate over that list.

Finally:

 list(np.arange(10000000)) 

This does basically the same as above, but creates a list of numpy scalar objects (e.g. np.int64) rather than Python ints. Using list(np.arange(10000000)) and np.arange(10000000).tolist() should take about the same time.


So, in terms of iteration, the main benefit of using numpy is that you don't need to iterate. Operations are applied in vectorized form over the whole array, and iterating just slows things down. If you find yourself iterating over the elements of an array, look for a way to restructure your algorithm so that it uses only numpy operations (it has many built in!). If you really need to, you can use np.apply_along_axis, np.apply_over_axes or np.vectorize.
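As a rough illustration of replacing a loop with array operations, here is a sketch that computes a sum of squares both ways. The task and the variable names are my own example, not something taken from the question.

```python
import numpy as np

a = np.arange(1_000, dtype=np.int64)

# Loop version: every element is pulled into Python one at a time.
loop_total = 0
for x in a:
    loop_total += int(x) * int(x)

# Vectorized version: the multiply and the sum both run in compiled code.
vec_total = int((a * a).sum())

print(loop_total == vec_total)  # both approaches give the same result
```

Note that the numpy documentation describes np.vectorize as provided primarily for convenience rather than performance, since its implementation is essentially a for loop, so a true array expression like the one above is usually the better choice when one exists.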

+7

Source: https://habr.com/ru/post/1012310/

