Speed difference between iterating over generators and lists

The following trivial example has two functions that sort a list of random numbers. The first passes a generator expression to sorted, the second builds a list first:

    import random

    l = [int(1000*random.random()) for i in xrange(10*6)]

    def sort_with_generator():
        return sorted(a for a in l)

    def sort_with_list():
        return sorted([a for a in l])

Benchmarking with line profiler shows that the second function (sort_with_list) is about twice as fast as the one using the generator expression.

Can someone explain what is happening and why the first method is much slower than the second?

+4
3 answers

Your first example is a generator expression that iterates over a list. Your second example is a list comprehension that iterates over a list. Indeed, the second example is a bit faster.

    >>> from timeit import timeit
    >>> timeit("sorted(a for a in l)", setup="import random;l = [int(1000*random.random()) for i in xrange(10*6)]")
    5.963912010192871
    >>> timeit("sorted([a for a in l])", setup="import random;l = [int(1000*random.random()) for i in xrange(10*6)]")
    5.021576881408691

The reason for this, of course, is that the list is built all at once, whereas iterating over the generator requires a function call to resume it for every item.
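The comparison above can be repeated with a small script. This is only a sketch, assuming Python 3 (so range replaces the xrange used in the question); the helper name benchmark and the repeat count are illustrative choices, not part of the original post, and the measured times will differ from machine to machine.

```python
import random
import timeit

# Python 3 sketch: range replaces the Python 2 xrange from the question.
data = [int(1000 * random.random()) for _ in range(10 * 6)]


def sort_with_generator():
    # sorted() has to pull every item out of the generator one by one.
    return sorted(a for a in data)


def sort_with_list():
    # The list comprehension builds the whole list before sorted() sees it.
    return sorted([a for a in data])


def benchmark(func, number=20000):
    """Return the total time in seconds for `number` calls of func (hypothetical helper)."""
    return timeit.timeit(func, number=number)


if __name__ == "__main__":
    print("generator:", benchmark(sort_with_generator))
    print("list:     ", benchmark(sort_with_list))
```

Both functions return the same sorted list; only the way the input reaches sorted differs.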

Generators do not speed things up for such small lists (you have 60 items in the list, which is very small). Their benefit is that they save memory by not creating long lists in the first place.
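The memory point can be illustrated roughly as follows. This is a sketch assuming Python 3; the size n is an arbitrary illustrative value, and sys.getsizeof reports only the container object itself (for the list, its array of references, not the integers it refers to).

```python
import sys

n = 100_000  # arbitrary illustrative size

as_list = [i for i in range(n)]  # every item exists in memory at once
as_gen = (i for i in range(n))   # items are produced one at a time on demand

# getsizeof measures only the container object, not the items it refers to.
list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

print("list object bytes:     ", list_size)
print("generator object bytes:", gen_size)
```

The generator object stays small regardless of n, while the list grows with the number of items.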

+6

If you look at the source for sorted, any sequence you pass in is first copied into a new list.

 newlist = PySequence_List(seq); 
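The effect of that copy is visible from Python. A minimal sketch, assuming Python 3; the variable names are illustrative only:

```python
original = [3, 1, 2]

# sorted() works on its own copy, so the input list is left untouched.
result = sorted(original)
print(result)    # [1, 2, 3]
print(original)  # [3, 1, 2]

# A generator is drained while that internal list is being built.
gen = (x for x in original)
from_gen = sorted(gen)
leftover = list(gen)  # nothing is left to iterate over
print(from_gen)  # [1, 2, 3]
print(leftover)  # []
```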

generator -> list is slower than list -> list.

    >>> import timeit
    >>> timeit.timeit('x = list(l)', setup = 'l = xrange(1000)')
    16.656711101531982
    >>> timeit.timeit('x = list(l)', setup = 'l = range(1000)')
    4.525658845901489

As for why a copy is made, think about how sorting works. Sorting is not a linear algorithm: it moves through the data several times, sometimes traversing it in both directions. A generator is designed to produce a sequence that is iterated once and only once, from the start to some later point. A list allows random access.
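The difference between single-pass iteration and random access can be seen in a short sketch (assuming Python 3; the names are illustrative only):

```python
numbers = [5, 3, 8]
gen = (n for n in numbers)

first_pass = list(gen)   # consumes the generator
second_pass = list(gen)  # a second pass finds nothing left
print(first_pass)   # [5, 3, 8]
print(second_pass)  # []

# A list supports indexing from either end; a generator does not support it at all.
print(numbers[0], numbers[-1])  # 5 8
try:
    gen[0]
    indexable = True
except TypeError:
    indexable = False
print(indexable)  # False
```

A sort that needs to revisit elements therefore has to work on a list, which is why the input is materialized first.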

On the other hand, creating a list from a generator means only one list in memory, while copying a list means two lists in memory. A reasonable space-time tradeoff.

Python uses Timsort, a hybrid of merge sort and insertion sort.

+2

List comprehensions first load all the data into memory, and any operations are then performed on the resulting list. Let the allocation time be T2 (for the second case). Generator expressions do not spend this time all at once; instead, advancing the iterator to each value takes time t1[i]. The sum of all t1[i] is T1, and T1 ≈ T2.

But when you call sorted(), in the first case T1 is increased by the time spent allocating memory for each pair compared during sorting (tx1[i]). As a result, the sum of all tx1[i] is added to T1.

Therefore, T2 < T1 + sum(tx1[i])

0

Source: https://habr.com/ru/post/1500838/

