Understanding a Python list - the result of pop from the original list?

Let's say I have a list in Python 3.X. I use list comprehension to return a subset of this list --- is there a simple / Pythonic way to “pull” that subset from the original list (so the elements in this subset no longer return to the original list)? Thanks!

Example (I need help defining my_func):

a = [1, 2, 2, 3, 4, 5] a, b = my_func(a, *kwargs) 

I would like to:

a = [1, 2, 2]
b = [3, 4, 5]

Note. I do not want to set values ​​one at a time, but just a subset of them all. Pop may not be the best terminology, but I don’t know what it is.

The best way I can come up with is:

  temp = [idx for idx, val in enumerate(a) if val > 2] b = [a[i] for i in temp] a = [val for idx,val in enumerate(a) if idx not in temp] 

Obviously, I would prefer something more elegant and efficient. I want to avoid iterating over the list twice.

EDIT: I think this question is unique because I'm not just interested in splitting the list - I want to change the original list. Separating and assigning one of these sections to the original list variable is a possible solution, but I was hoping it could be a way to do this without explicitly assigning the original list variable (similar to b.append (a.pop ()))

+6
source share
4 answers

Just filter out unwanted elements using list comprehension:

 a = [1, 2, 2, 3, 4, 5] condition = lambda x: x > 2 b = [x for x in a if condition(x)] # b == [3, 4, 5] a = [x for x in a if not condition(x)] # a == [1, 2, 2] 

Update

If you are worried about efficiency, then here is another approach that scans the list only once:

 def classify(iterable, condition): """ Returns a list that matches the condition and a list that does not """ true_list = [] false_list = [] for item in iterable: if condition(item): true_list.append(item) else: false_list.append(item) return true_list, false_list a = [1, 2, 2, 3, 4, 5] b, a = classify(a, lambda x: x > 2) print(a) # [1, 2, 2] print(b) # [3, 4, 5] 

Update 2

I did not understand that my update looks almost the same as for user3467349, but believe it or not, I did not cheat :-)

+4
source

One line solution (don't really use this):

 a = [7,4,2, 1, 5 , 11, 2] [x for x in (a.pop() for i in range(len(a))) if (lambda x: True if x> 2 \ else a.insert(0, x))(x)] 

Smarter Solutions

In the general case - if you check what the value is before popping up - it is reading by index, and then pop (not to mention how to keep track of iteration over an array of varying lengths) which seems to bet on the absence of two new lists .

Therefore, I would simply add to one of the two new lists, i.e.

 a = [1, 2, 2, 3, 4, 5] b=[]; c=[]; for i in range(len(a)): if x > 2: b.append(x) else: c.append(x) 

Of course, you can technically jump out of a and paste back into what you would do a.insert(0, x) instead of c.append(x) - but it's generally a bad idea to manipulate the list that you iterate over.

Another alternative is a sorted list and bisect, i.e.

 a = sorted(a) ind = bisect.bisect(a, 2) #Now both lists are here a, b= a[:ind], a[ind:] 

Which preferred method really depends on what you plan to do with the lists afterwards.

+4
source

Remember that this is Python, not C, and sometimes the way it works is not necessarily intuitive. This means that you must confirm your assumptions. I did this using the built-in magic %%timeit built into IPython.

 a = [1,2,2,3,4,5] %timeit b=[x for x in a if x > 2]\ 1000000 loops, best of 3: 362 ns per loop %timeit c=[x for x in a if not (x > 2)] 1000000 loops, best of 3: 371 ns per loop 

So up to 800 ns for both, but we repeat the cycle two times. Do we really need this? How about a couple of the above methods? Start with a classification that is pretty simple and just iterates over the list:

 %timeit b, a = classify(a, lambda x: x > 2) 1000000 loops, best of 3: 1.89 µs per loop 

Wow, although it only goes through the cycle at a time, it takes more than twice as much as the simple solution above that goes through it twice. Let's try a slight deviation from the other solution proposed above:

 %%timeit b, c = [], [] for x in a: b.append(x) if x > 2 else a.append(x) 1000000 loops, best of 3: 1.2 µs per loop 

Better, but still slower than our "naive" / "ineffective" implementation. Maybe a slightly different wording would be better:

 %%timeit b, c = [], [] for x in a: if x > 2: b.append(x) else: c.append(x) 1000000 loops, best of 3: 983 ns per loop 

Hmm, it seems almost the same, but a little faster. Still not surpassing a naive implementation. Let really smart, perhaps make it even faster:

 %%timeit b, c = [], [] for x in a: (b, c)[x > 2].append(x) 1000000 loops, best of 3: 1.28 µs per loop 

So, we saw that, despite our desire not to repeat the cycle twice, we do not seem to be able to improve just by making two comprehensions of the list. In Python, lists are visible "under the hood", so for many things they will be faster than anything you come up with.

Now comparing x < 2 cheap - no function calls, no math. There will be a point where it starts to make sense. Let me create a more expensive comparison function - we will calculate the factorial (and make it inefficient):

 def factorial(x): if x in (0,1): return 1 else: return x * factorial(x-1) %timeit b = [x for x in a if factorial(x) > 6] 100000 loops, best of 3: 3.47 µs per loop %timeit c = [x for x in a if not factorial(x) > 6] 100000 loops, best of 3: 3.53 µs per loop 

Thus, it is obvious that the time went very well - now everything is about 7uS.

Let's try some of our other examples:

 %timeit b, c = classify(a, lambda x: factorial(x) > 6) 100000 loops, best of 3: 5.05 µs per loop %%timeit b, c = [], [] for x in a: if factorial(x) > 6: b.append(x) else: c.append(x) 100000 loops, best of 3: 4.01 µs per loop %%timeit b, c = [], [] for x in a: (b, c)[factorial(x) > 6].append(x) 100000 loops, best of 3: 4.59 µs per loop 

A lesson from all of this: when dealing with python and efficiency, it is usually recommended to just try it in the console, but very often naive implementations are quite efficient and easiest to read. And a quick test will tell you whether you should try to optimize it; you can often make it less readable and slower if you are not careful ....

Addendum: Someone commented that we need longer lists because we measure function overhead more than performance. They have a good point, but times show roughly the same relationship on my machine:

 In [16]: a = range(100000) In [17]: random.shuffle(a) In [18]: %timeit b = [x for x in a if x > 50000] 100 loops, best of 3: 5.2 ms per loop In [19]: %timeit c = [x for x in m if not x > 50000] 100 loops, best of 3: 5.18 ms per loop In [20]: %timeit c = [x for x in m if x <= 50000] 100 loops, best of 3: 5.35 ms per loop In [21]: %%timeit ....: b, c = [], [] ....: for x in a: ....: if x > 50000: ....: b.append(x) ....: else: ....: c.append(x) ....: 100 loops, best of 3: 12.7 ms per loop 

Please note that if I changed the comparison from x> 2 (instead of x> 50,000), the second cycle accelerated to about 11.4 ms. But still, this is hardly faster than a naive implementation. However, this is probably the one I would prefer - it is still easy to read, and it is not slower (or significantly so). And the code becomes more complex over time, so when you later add another comparison or change it to a function call, it’s easier to do, and the performance will be less in this version than the naive version.

But again: this does not mean using “using lists” (although they are often a good idea), but check your assumptions.

+4
source

If you want to “put” the values ​​from the list a and add these values ​​to b, I would take the following approach:

  a = [1,2,2,3,4,5] b = [] for i in a[:]: if i > 2: b.append(i) a.remove(i) #a = [1,2,2] #b = [3,4,5] 

This script is especially useful if list a not in a specific order.

0
source

Source: https://habr.com/ru/post/980879/


All Articles