Python: any () unexpected performance

I compare the performance of an inline function any()with the actual implementation of docs :

I am looking for an element greater than 0 in the following list:

lst = [0 for _ in range(1000000)] + [1]

This is a supposedly equivalent function:

def gt_0(lst):
    for elm in lst:
        if elm > 0:
            return True
    return False

And these are the results of performance tests:

>> %timeit any(elm > 0 for elm in lst)
>> 10 loops, best of 3: 35.9 ms per loop

>> %timeit gt_0(lst)
>> 100 loops, best of 3: 16 ms per loop

I would expect both of them to have the same performance, however any(), if twice as slow. Why?

+4
source share
4 answers

, any(). Python . __next__() any. , .

, , :

In [4]: %timeit any(elm > 0 for elm in lst)
10 loops, best of 3: 66.8 ms per loop

In [6]: test_list = [elm > 0 for elm in lst]

In [7]: %timeit any(test_list)
100 loops, best of 3: 4.93 ms per loop

, , next, - . , :

any(True for elm in lst if elm > 0)

, , ( - , .) Ashwini.

+4

, . , next() next().

, .

>>> lst = list(range(10**5))

>>> %%timeit
... sum(x for x in lst)
...
100 loops, best of 3: 6.39 ms per loop

>>> %%timeit
... c = 0
... for x in lst: c += x
...

100 loops, best of 3: 6.69 ms per loop

:

def gt_0(lst):
    for elm in lst:
        if elm > 0:
            return True
    return False


def any_with_ge(lst):
    return any(elm > 0 for elm in lst)

-:

>>> dis.dis(gt_0)
 10           0 SETUP_LOOP              30 (to 33)
              3 LOAD_FAST                0 (lst)
              6 GET_ITER
        >>    7 FOR_ITER                22 (to 32)
             10 STORE_FAST               1 (elm)

 11          13 LOAD_FAST                1 (elm)
             16 LOAD_CONST               1 (0)
             19 COMPARE_OP               4 (>)
             22 POP_JUMP_IF_FALSE        7

 12          25 LOAD_GLOBAL              0 (True)
             28 RETURN_VALUE
             29 JUMP_ABSOLUTE            7
        >>   32 POP_BLOCK

 13     >>   33 LOAD_GLOBAL              1 (False)
             36 RETURN_VALUE
>>> dis.dis(any_with_ge.func_code.co_consts[1])
 17           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                17 (to 23)
              6 STORE_FAST               1 (elm)
              9 LOAD_FAST                1 (elm)
             12 LOAD_CONST               0 (0)
             15 COMPARE_OP               4 (>)
             18 YIELD_VALUE
             19 POP_TOP
             20 JUMP_ABSOLUTE            3
        >>   23 LOAD_CONST               1 (None)
             26 RETURN_VALUE

, any(), >, , PyObject_IsTrue. , gt_0 True False.

any(), if-, for-loop.

def any_with_ge_and_condition(lst):
    return any(True for elm in lst if elm > 0)

Bytecode:

>>> dis.dis(any_with_ge_and_condition.func_code.co_consts[1])
 21           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                23 (to 29)
              6 STORE_FAST               1 (elm)
              9 LOAD_FAST                1 (elm)
             12 LOAD_CONST               0 (0)
             15 COMPARE_OP               4 (>)
             18 POP_JUMP_IF_FALSE        3
             21 LOAD_GLOBAL              0 (True)
             24 YIELD_VALUE
             25 POP_TOP
             26 JUMP_ABSOLUTE            3
        >>   29 LOAD_CONST               1 (None)
             32 RETURN_VALUE

, any(), ( ), , True, .


3:

>>> %timeit gt_0(lst)
10 loops, best of 3: 26.1 ms per loop
>>> %timeit any_with_ge(lst)
10 loops, best of 3: 57.7 ms per loop
>>> %timeit any_with_ge_and_condition(lst)
10 loops, best of 3: 26.8 ms per loop

gt_0, , any(), .

from operator import truth
# This calls `PyObject_IsTrue` internally
# https://github.com/python/cpython/blob/master/Modules/_operator.c#L30


def gt_0_truth(lst, truth=truth): # truth=truth to prevent global lookups
    for elm in lst:
        condition = elm > 0
        if truth(condition):
            return True
    return False

Timing:

>>> %timeit gt_0_truth(lst)
10 loops, best of 3: 56.6 ms per loop

, , operator.truth.

>> %%timeit t=truth
... [t(i) for i in xrange(10**5)]
...
100 loops, best of 3: 5.45 ms per loop
>>> %%timeit t=truth
[t(t(i)) for i in xrange(10**5)]
...
100 loops, best of 3: 9.06 ms per loop
>>> %%timeit t=truth
[t(i) for i in xrange(10**6)]
...
10 loops, best of 3: 58.8 ms per loop
>>> %%timeit t=truth
[t(t(i)) for i in xrange(10**6)]
...
10 loops, best of 3: 87.8 ms per loop

, truth() (.. PyObject_IsTrue) , any().


, if any() , , Py_True Py_False. POP_JUMP_IF_FALSE OP- PyObject_IsTrue.

+3

for.

any : for elm in lst for, any. , False, False, False, ..., True

gt_0 .

If you change it to check if the element is really believable, therefore both of them only repeat in cycles:

def _any(lst):
    for elm in lst:
        if elm:
            return True
    return False

_any(lst)
any(lst)

There is a clear winner:

$ python2 -m timeit "from test import lst, _any" "any(lst)"
100 loops, best of 3: 5.68 msec per loop

$ python2 -m timeit "from test import lst, _any" "_any(lst)"
10 loops, best of 3: 17 msec per loop
+1
source
print(timeit('any(True for elm in lst if elm > 0)',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))
print(timeit('any([elm > 0 for elm in lst])',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))
print(timeit('any(elm > 0 for elm in lst)',setup='lst = [0 for _ in range(1000000)] + [1]', number=10))

gives:

2.1382904349993623
3.1172365920028824
4.580027656000311

As Kasramvd explains, the latest version is the slowest because it uses a generator expression; List comprehension is a little faster, but surprisingly, using a generator expression with a conditional sentence suggested by Ashwini Chaudhary is even faster.

+1
source

Source: https://habr.com/ru/post/1680307/


All Articles