Python: counting the items in a list that satisfy a condition

Given a list of integers, what is the most Pythonic / best way to count the number of elements in a certain range?

I researched and found 2 ways to do this:

    >>> x = [10, 60, 20, 66, 79, 5]
    >>> len([i for i in x if 60 < i < 70])
    1

or

    >>> x = [10, 60, 20, 66, 79, 5]
    >>> sum(1 for i in x if 60 < i < 70)
    1

Which method uses less time / memory (for larger lists) and why? Or maybe another way is better ...

3 answers

A generator expression is more efficient in terms of memory because you do not need to create an extra list.
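The memory claim can be checked with a rough sketch like the one below. It assumes Python 3.6 or later (for `tracemalloc` and the `100_000` literal); the helper name `peak_bytes` is made up for this illustration, and the exact byte counts will vary between interpreters.

```python
import tracemalloc


def peak_bytes(func):
    """Call func once and return (result, peak traced memory in bytes).

    Hypothetical helper for illustration only.
    """
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak


# Every element matches, so the list comprehension builds a large temporary list.
x = [65] * 100_000

count_len, peak_len = peak_bytes(lambda: len([i for i in x if 60 < i < 70]))
count_sum, peak_sum = peak_bytes(lambda: sum(1 for i in x if 60 < i < 70))

print("len version:", count_len, "items, peak bytes:", peak_len)
print("sum version:", count_sum, "items, peak bytes:", peak_sum)
```

Both versions return the same count; only the peak memory should differ, because the generator yields one value at a time instead of materialising all matches.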

However, for relatively small lists, creating a list and taking its length (the latter is a very fast O(1) operation) seems to be faster than creating a generator and performing n additions.

    In [13]: x = [1]

    In [14]: timeit len([i for i in x if 60 < i < 70])
    10000000 loops, best of 3: 141 ns per loop

    In [15]: timeit sum(1 for i in x if 60 < i < 70)
    1000000 loops, best of 3: 355 ns per loop

    In [16]: x = range(10)

    In [17]: timeit len([i for i in x if 60 < i < 70])
    1000000 loops, best of 3: 564 ns per loop

    In [18]: timeit sum(1 for i in x if 60 < i < 70)
    1000000 loops, best of 3: 781 ns per loop

    In [19]: x = range(50)

    In [20]: timeit len([i for i in x if 60 < i < 70])
    100000 loops, best of 3: 2.4 µs per loop

    In [21]: timeit sum(1 for i in x if 60 < i < 70)
    100000 loops, best of 3: 2.62 µs per loop

    In [22]: x = range(1000)

    In [23]: timeit len([i for i in x if 60 < i < 70])
    10000 loops, best of 3: 50.9 µs per loop

    In [24]: timeit sum(1 for i in x if 60 < i < 70)
    10000 loops, best of 3: 51.7 µs per loop

I tried various lists, for example [65]*n, and the trend does not change. For instance:

    In [1]: x = [65]*1000

    In [2]: timeit len([i for i in x if 60 < i < 70])
    10000 loops, best of 3: 67.3 µs per loop

    In [3]: timeit sum(1 for i in x if 60 < i < 70)
    10000 loops, best of 3: 82.3 µs per loop
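For anyone who wants to repeat these measurements outside IPython, here is a minimal sketch using the standard `timeit` module. It assumes Python 3.5 or later (for the `globals` argument); the function name `compare` and the chosen `number`/`repeat` values are arbitrary picks for this illustration, and absolute timings will differ by machine and interpreter version.

```python
import timeit


def compare(x, number=1000, repeat=3):
    """Return best-of-`repeat` seconds per call for both counting styles.

    Hypothetical helper for illustration only.
    """
    namespace = {"x": x}
    statements = [
        ("len", "len([i for i in x if 60 < i < 70])"),
        ("sum", "sum(1 for i in x if 60 < i < 70)"),
    ]
    results = {}
    for label, stmt in statements:
        # timeit.repeat returns the total time of `number` runs, `repeat` times.
        times = timeit.repeat(stmt, globals=namespace, number=number, repeat=repeat)
        results[label] = min(times) / number
    return results


# Same list shapes as in the session above.
for data in ([1], list(range(10)), list(range(50)), list(range(1000)), [65] * 1000):
    timings = compare(data)
    print(len(data), "elements:", timings)
```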

In the specific examples you presented,

    [i for i in x if 60 < i < 70]

actually builds a completely new list, of which you then take the len. In contrast,

    (1 for i in x if 60 < i < 70)

is a generator expression over which you take the sum.

When enough elements match the condition, the second version will be more efficient (especially in terms of memory).


Timings

    x = [65] * 9999999

    %%time
    len([i for i in x if 60 < i < 70])

    CPU times: user 724 ms, sys: 44 ms, total: 768 ms
    Wall time: 768 ms
    Out[7]: 9999999

    %%time
    sum(1 for i in x if 60 < i < 70)

    CPU times: user 592 ms, sys: 0 ns, total: 592 ms
    Wall time: 593 ms

You can easily verify this using the timeit module. For your specific example, the first, len-based solution looks faster:

    $ python --version
    Python 2.7.10

    $ python -m timeit -s "x = [10,60,20,66,79,5]" "len([i for i in x if 60 < i < 70])"
    1000000 loops, best of 3: 0.514 usec per loop

    $ python -m timeit -s "x = [10,60,20,66,79,5]" "sum(i for i in x if 60 < i < 70)"
    1000000 loops, best of 3: 0.693 usec per loop

Even for large lists - but with most elements not matching your predicate - the len version does not look slower:

    $ python -m timeit -s "x = [66] + [8] * 10000" "len([i for i in x if 60 < i < 70])"
    1000 loops, best of 3: 504 usec per loop

    $ python -m timeit -s "x = [66] + [8] * 10000" "sum(1 for i in x if 60 < i < 70)"
    1000 loops, best of 3: 501 usec per loop

In fact, even if most of the elements in the list match (so that a large result list has to be built and passed to len), the len version wins:

    $ python -m timeit -s "x = [66] + [65] * 10000" "len([i for i in x if 60 < i < 70])"
    1000 loops, best of 3: 762 usec per loop

    $ python -m timeit -s "x = [66] + [65] * 10000" "sum(1 for i in x if 60 < i < 70)"
    1000 loops, best of 3: 935 usec per loop

However, what seems to be much faster is not keeping a list in the first place, if possible, but maintaining, for example, a collections.Counter instead. For instance, for 100,000 elements, I get:

    $ python -m timeit -s "import collections; x = [66] + [65] * 100000" "len([i for i in x if 60 < i < 70])"
    100 loops, best of 3: 8.11 msec per loop

    $ python -m timeit -s "import collections; x = [66] + [65] * 100000; d = collections.Counter(x)" "sum(v for k,v in d.items() if 60 < k < 70)"
    1000000 loops, best of 3: 0.761 usec per loop
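As a small sketch of the Counter idea: the sample data and the variable names `counts` and `in_range` are chosen here purely for illustration.

```python
from collections import Counter

# Sample data for illustration; 66 appears twice and 60 lies outside the open range.
x = [10, 60, 20, 66, 79, 5, 66, 65]

# Build the counter once: it maps each distinct value to its number of occurrences.
counts = Counter(x)

# A range query now iterates over distinct values only, not over every element.
in_range = sum(v for k, v in counts.items() if 60 < k < 70)
print(in_range)  # 3 (66 twice, 65 once)

# New data is recorded by updating the counter instead of appending to a list.
counts[61] += 1
in_range_after = sum(v for k, v in counts.items() if 60 < k < 70)
print(in_range_after)  # 4
```

Note that the benchmark above builds the Counter in the setup step, so its construction cost (one pass over the list) is not part of the measured time. The approach presumably pays off mainly when the counter is maintained as data arrives and the number of distinct values is small compared to the number of elements.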

Source: https://habr.com/ru/post/1243760/

