Remember that this is Python, not C, and sometimes the way it works is not necessarily intuitive. This means that you must confirm your assumptions. I did this using the built-in magic %%timeit built into IPython.
a = [1,2,2,3,4,5] %timeit b=[x for x in a if x > 2]\ 1000000 loops, best of 3: 362 ns per loop %timeit c=[x for x in a if not (x > 2)] 1000000 loops, best of 3: 371 ns per loop
So up to 800 ns for both, but we repeat the cycle two times. Do we really need this? How about a couple of the above methods? Start with a classification that is pretty simple and just iterates over the list:
%timeit b, a = classify(a, lambda x: x > 2) 1000000 loops, best of 3: 1.89 µs per loop
Wow, although it only goes through the cycle at a time, it takes more than twice as much as the simple solution above that goes through it twice. Let's try a slight deviation from the other solution proposed above:
%%timeit b, c = [], [] for x in a: b.append(x) if x > 2 else a.append(x) 1000000 loops, best of 3: 1.2 µs per loop
Better, but still slower than our "naive" / "ineffective" implementation. Maybe a slightly different wording would be better:
%%timeit b, c = [], [] for x in a: if x > 2: b.append(x) else: c.append(x) 1000000 loops, best of 3: 983 ns per loop
Hmm, it seems almost the same, but a little faster. Still not surpassing a naive implementation. Let really smart, perhaps make it even faster:
%%timeit b, c = [], [] for x in a: (b, c)[x > 2].append(x) 1000000 loops, best of 3: 1.28 µs per loop
So, we saw that, despite our desire not to repeat the cycle twice, we do not seem to be able to improve just by making two comprehensions of the list. In Python, lists are visible "under the hood", so for many things they will be faster than anything you come up with.
Now comparing x < 2 cheap - no function calls, no math. There will be a point where it starts to make sense. Let me create a more expensive comparison function - we will calculate the factorial (and make it inefficient):
def factorial(x): if x in (0,1): return 1 else: return x * factorial(x-1) %timeit b = [x for x in a if factorial(x) > 6] 100000 loops, best of 3: 3.47 µs per loop %timeit c = [x for x in a if not factorial(x) > 6] 100000 loops, best of 3: 3.53 µs per loop
Thus, it is obvious that the time went very well - now everything is about 7uS.
Let's try some of our other examples:
%timeit b, c = classify(a, lambda x: factorial(x) > 6) 100000 loops, best of 3: 5.05 µs per loop %%timeit b, c = [], [] for x in a: if factorial(x) > 6: b.append(x) else: c.append(x) 100000 loops, best of 3: 4.01 µs per loop %%timeit b, c = [], [] for x in a: (b, c)[factorial(x) > 6].append(x) 100000 loops, best of 3: 4.59 µs per loop
A lesson from all of this: when dealing with python and efficiency, it is usually recommended to just try it in the console, but very often naive implementations are quite efficient and easiest to read. And a quick test will tell you whether you should try to optimize it; you can often make it less readable and slower if you are not careful ....
Addendum: Someone commented that we need longer lists because we measure function overhead more than performance. They have a good point, but times show roughly the same relationship on my machine:
In [16]: a = range(100000) In [17]: random.shuffle(a) In [18]: %timeit b = [x for x in a if x > 50000] 100 loops, best of 3: 5.2 ms per loop In [19]: %timeit c = [x for x in m if not x > 50000] 100 loops, best of 3: 5.18 ms per loop In [20]: %timeit c = [x for x in m if x <= 50000] 100 loops, best of 3: 5.35 ms per loop In [21]: %%timeit ....: b, c = [], [] ....: for x in a: ....: if x > 50000: ....: b.append(x) ....: else: ....: c.append(x) ....: 100 loops, best of 3: 12.7 ms per loop
Please note that if I changed the comparison from x> 2 (instead of x> 50,000), the second cycle accelerated to about 11.4 ms. But still, this is hardly faster than a naive implementation. However, this is probably the one I would prefer - it is still easy to read, and it is not slower (or significantly so). And the code becomes more complex over time, so when you later add another comparison or change it to a function call, it’s easier to do, and the performance will be less in this version than the naive version.
But again: this does not mean using “using lists” (although they are often a good idea), but check your assumptions.