Python: number of items in list for if condition

Question


Given a list of integers, what is the most Pythonic / best way to count the number of elements in a certain range?

I researched and found 2 ways to do this:

    >>> x = [10, 60, 20, 66, 79, 5]
    >>> len([i for i in x if 60 < i < 70])
    1

or

    >>> x = [10, 60, 20, 66, 79, 5]
    >>> sum(1 for i in x if 60 < i < 70)
    1

Which method uses less time / memory (for larger lists) and why? Or maybe another way is better ...

+5

python

K. Menyah Feb 24 '16 at 15:16


3 answers

In the specific examples you presented,

    [i for i in x if 60 < i < 70]

actually builds a completely new list and then takes its len. By contrast,

    (1 for i in x if 60 < i < 70)

is a generator expression, whose items are fed to sum one at a time without ever being stored.

When many elements match the condition, the second version will be more efficient (especially in terms of memory).
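To make the memory difference concrete, here is a minimal sketch that compares peak allocations of the two expressions with the standard tracemalloc module. It assumes Python 3; the helper name peak_bytes is made up for illustration, and the exact byte counts will vary by interpreter and platform.

```python
import tracemalloc

x = [65] * 1_000_000  # every element matches: the worst case for the list version


def peak_bytes(func):
    """Run func() and return (result, peak traced memory in bytes)."""
    tracemalloc.start()
    result = func()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, peak


# The list comprehension has to hold all matching items at once.
count_list, peak_list = peak_bytes(lambda: len([i for i in x if 60 < i < 70]))

# The generator expression yields one item at a time to sum().
count_gen, peak_gen = peak_bytes(lambda: sum(1 for i in x if 60 < i < 70))

print("list comprehension:", count_list, "matches, peak bytes:", peak_list)
print("generator:         ", count_gen, "matches, peak bytes:", peak_gen)
```

Both calls return the same count; only the peak memory should differ noticeably, and the gap grows with the number of matching elements.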

Timing

    x = [65] * 9999999

    %%time
    len([i for i in x if 60 < i < 70])

    CPU times: user 724 ms, sys: 44 ms, total: 768 ms
    Wall time: 768 ms
    Out[7]: 9999999

    %%time
    sum(1 for i in x if 60 < i < 70)

    CPU times: user 592 ms, sys: 0 ns, total: 592 ms
    Wall time: 593 ms

+4

Ami Tavory Feb 24 '16 at 15:19


You can easily verify this using the timeit module. In your specific example, the first, len-based solution appears to be faster:

    $ python --version
    Python 2.7.10
    $ python -m timeit -s "x = [10,60,20,66,79,5]" "len([i for i in x if 60 < i < 70])"
    1000000 loops, best of 3: 0.514 usec per loop
    $ python -m timeit -s "x = [10,60,20,66,79,5]" "sum(i for i in x if 60 < i < 70)"
    1000000 loops, best of 3: 0.693 usec per loop
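The same kind of measurement can be run from inside a script with timeit.repeat. The following is only a sketch, assuming Python 3; the labels and the results dictionary are illustrative names, and the numbers it prints depend on the machine. Note that it times sum(1 for ...), so that both statements count the matching elements.

```python
import timeit

setup = "x = [10, 60, 20, 66, 79, 5]"
statements = {
    "len + list comprehension": "len([i for i in x if 60 < i < 70])",
    "sum + generator expression": "sum(1 for i in x if 60 < i < 70)",
}

number = 100_000  # executions per timing run
results = {}

for label, stmt in statements.items():
    # repeat() returns one total time per run; the minimum is the least noisy.
    best = min(timeit.repeat(stmt, setup=setup, number=number, repeat=3))
    results[label] = best / number  # seconds per single execution
    print(f"{label}: {results[label] * 1e6:.3f} usec per loop")
```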

Even for larger lists in which most elements do not match your predicate, the len version does not appear to be slower:

    $ python -m timeit -s "x = [66] + [8] * 10000" "len([i for i in x if 60 < i < 70])"
    1000 loops, best of 3: 504 usec per loop
    $ python -m timeit -s "x = [66] + [8] * 10000" "sum(1 for i in x if 60 < i < 70)"
    1000 loops, best of 3: 501 usec per loop

In fact, even if most of the elements in the list match (so that a large result list is built just to be passed to len), the len version wins:

    $ python -m timeit -s "x = [66] + [65] * 10000" "len([i for i in x if 60 < i < 70])"
    1000 loops, best of 3: 762 usec per loop
    $ python -m timeit -s "x = [66] + [65] * 10000" "sum(1 for i in x if 60 < i < 70)"
    1000 loops, best of 3: 935 usec per loop

However, what seems to be much faster is not having a list in the first place, if possible, but rather keeping the data in, for example, a collections.Counter. For 100,000 elements, I get:

    $ python -m timeit -s "import collections; x = [66] + [65] * 100000" "len([i for i in x if 60 < i < 70])"
    100 loops, best of 3: 8.11 msec per loop
    $ python -m timeit -s "import collections; x = [66] + [65] * 100000; d = collections.Counter(x)" "sum(v for k,v in d.items() if 60 < k < 70)"
    1000000 loops, best of 3: 0.761 usec per loop
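As a rough sketch of the Counter idea (assuming Python 3): the counter is built once and every range query then iterates over the distinct values only. In the timing above the Counter is created in the -s setup string, so its construction cost is not part of the measured time. The approach probably pays off mainly when the data has few distinct values and is queried repeatedly. The values passed to update below are hypothetical.

```python
from collections import Counter

x = [66] + [65] * 100_000

# Built once: maps each distinct value to how often it occurs.
counts = Counter(x)

# The query touches only the distinct keys (two here), not all 100,001 items.
in_range = sum(v for k, v in counts.items() if 60 < k < 70)
print(in_range)  # 100001

# New data can be folded in without rebuilding the counter.
counts.update([61, 75])  # hypothetical later additions
in_range_after = sum(v for k, v in counts.items() if 60 < k < 70)
print(in_range_after)  # 100002
```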

+2

Frerich Raabe Feb 24 '16 at 15:30


timgeb · Accepted Answer · 2016-02-24T15:29:33+0000

A generator expression is more efficient in terms of memory because you do not need to create an extra list.

Creating a list and getting its length (the latter is a very fast O(1) operation) seems to be faster than creating a generator and performing n additions, at least for relatively small lists.

    In [13]: x = [1]

    In [14]: timeit len([i for i in x if 60 < i < 70])
    10000000 loops, best of 3: 141 ns per loop

    In [15]: timeit sum(1 for i in x if 60 < i < 70)
    1000000 loops, best of 3: 355 ns per loop

    In [16]: x = range(10)

    In [17]: timeit len([i for i in x if 60 < i < 70])
    1000000 loops, best of 3: 564 ns per loop

    In [18]: timeit sum(1 for i in x if 60 < i < 70)
    1000000 loops, best of 3: 781 ns per loop

    In [19]: x = range(50)

    In [20]: timeit len([i for i in x if 60 < i < 70])
    100000 loops, best of 3: 2.4 µs per loop

    In [21]: timeit sum(1 for i in x if 60 < i < 70)
    100000 loops, best of 3: 2.62 µs per loop

    In [22]: x = range(1000)

    In [23]: timeit len([i for i in x if 60 < i < 70])
    10000 loops, best of 3: 50.9 µs per loop

    In [24]: timeit sum(1 for i in x if 60 < i < 70)
    10000 loops, best of 3: 51.7 µs per loop

I tried various lists, for example [65]*n, and the trend does not change. For instance:

    In [1]: x = [65]*1000

    In [2]: timeit len([i for i in x if 60 < i < 70])
    10000 loops, best of 3: 67.3 µs per loop

    In [3]: timeit sum(1 for i in x if 60 < i < 70)
    10000 loops, best of 3: 82.3 µs per loop
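Since the two variants differ only in speed and memory use, either one can sit behind a small helper. A minimal sketch follows; the function names count_with_len and count_with_sum and the low/high parameters are made up for illustration and are not part of the question.

```python
def count_with_len(values, low, high):
    """Count items strictly between low and high by building a temporary list."""
    return len([v for v in values if low < v < high])


def count_with_sum(values, low, high):
    """Count items strictly between low and high without storing the matches."""
    return sum(1 for v in values if low < v < high)


x = [10, 60, 20, 66, 79, 5]
print(count_with_len(x, 60, 70))  # 1
print(count_with_sum(x, 60, 70))  # 1
```

Both helpers return the same result for any iterable of comparable values, so the choice between them can be based on measurements for the list sizes that actually occur.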
