Why does searching in a sorted list in python take longer?

I did an experiment in which I tried to find the time needed to find a python list. I have a list of arr with random integers. arr_s contains only those items that have been sorted.

 arr = np.random.randint(low = 0, high = 1000, size = 500) arr_s = sorted(arr) 

Now I create a random array of find integers that has the elements I want to look for in arr and arr_s .

 >>> %%timeit ...:find = np.random.randint(0, 1000, 600) ...:for i in find: ...: if i in arr: ...: continue [OUT]:100 loops, best of 3: 2.18 ms per loop >>> %%timeit ...:find = np.random.randint(0, 1000, 600) ...:for i in find: ...: if i in arr_s: ...: continue [OUT]:100 loops, best of 3: 5.15 ms per loop 

Now I understand that I did not use any specific method to search in a sorted array (e.g. binary search). Thus, it can be a standard linear search, but why does it take much longer to search in a sorted array, which is in an unsorted array? I think it will take about the same time. I tried all kinds of find arrays. Arrays that have integers from (0, 1000), (-1000, -100) and (-10000, 10000) for a sorted array always take more cycles.

+6
source share
3 answers
 arr = np.random.randint(low = 0, high = 1000, size = 500) arr_s = sorted(arr) 

arr is an array. arr_s is a list. Array searches can be handled efficiently with numpy, but the following pointers and type checks are required to search for a list. This has nothing to do with sorting.

Note: in does strange things in numpy. Using in with numpy ndarrays might be a bad idea.

+7
source

Python lists are not like C arrays. It's not just a block of memory where element 1 always comes after element 0, etc. Instead, under the hood, Python stores things in a flexible way so that you can add and remove elements of arbitrary types and move things as you wish.

In this case, I assume that the list sorting action changes the underlying organization, which makes it somewhat less effective for accessing items.

0
source

I don't have an exact answer, but a possible starting point is to check on the iterators used by each object.

 In [9]: it = arr.__iter__() In [10]: its = arr_s.__iter__() In [11]: type(it) Out[11]: iterator In [12]: type(its) Out[12]: listiterator 

They apparently use two different iterators that can explain the difference in speed.

0
source

Source: https://habr.com/ru/post/953267/


All Articles