The best way to add numpy to an array

I have a numpy array and I can just add an element to it using append, for example:

numpy.append(myarray, 1) 

In this case, I just added the integer 1 .

But is this the fastest way to add to an array? I have a very long array of tens of thousands.

Or is it better to index the array and assign it directly? Like this:

 myarray[123] = 1 
+6
source share
2 answers

Adding numpy to arrays is very inefficient. This is due to the fact that the interpreter needs to find and assign memory for the entire array at each step. Depending on the application, there are much more effective strategies.

If you know the length in advance, it is best to pre-allocate the array using a function such as np.ones , np.zeros , or np.empty .

 desired_length = 500 results = np.empty(desired_length) for i in range(desired_length): results[i] = i**2 

If you don't know the length, it might be more efficient to store the results in a regular list and then convert them to an array.

 results = [] while condition: a = do_stuff() results.append(a) results = np.array(results) 

Below are some timings on my computer.

 def pre_allocate(): results = np.empty(5000) for i in range(5000): results[i] = i**2 return results def list_append(): results = [] for i in range(5000): results.append(i**2) return np.array(results) def numpy_append(): results = np.array([]) for i in range(5000): np.append(results, i**2) return results %timeit pre_allocate() # 100 loops, best of 3: 2.42 ms per loop %timeit list_append() # 100 loops, best of 3: 2.5 ms per loop %timeit numpy_append() # 10 loops, best of 3: 48.4 ms per loop 

So, you can see that both the preliminary selection and the use of the list, the conversion is much faster.

+9
source

If you know the size of the array at the end of the run, it will be much faster to pre-allocate an array of the appropriate size, and then set the values. If you need to add on the fly, it might be better to try not to do this one element at a time, instead adding as little time as possible to avoid generating many copies over and over again. You can also do some profiling of the differences in the timings np.append , np.hstack , np.concatenate . and etc.

+1
source

Source: https://habr.com/ru/post/974761/


All Articles