Problem with Python3 built-in zip function

    Python 3.4.2 (default, Oct  8 2014, 13:44:52)
    [GCC 4.9.1 20140903 (prerelease)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> gen = (x for x in range(10)) ## Need to wrap range into () to create a generator, next(range(10)) is invalid
    >>> list(zip(gen, [1,2,3])) ## zip will "eat up" the number 3
    [(0, 1), (1, 2), (2, 3)]
    >>> next(gen) ## Here I need next to return 3
    4
    >>>

The problem is that I lose a value after the zip call. That would be a much bigger problem were it not for the fact that gen is pure code.

I don't know whether it is possible to write a function that behaves this way. It definitely is possible if only one of the arguments to zip is a generator and the rest are "normal" iterables whose values are all known and stored in memory. In that case you could simply advance the generator last.
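For that special case (one generator, everything else already in memory), a minimal sketch is to pass the generator to zip() last and reorder each tuple, so that zip() discovers an exhausted sequence before it ever advances the generator. The name zip_gen_first is hypothetical, and the `(g, *rest)` tuple syntax needs Python 3.5 or later:

```python
def zip_gen_first(gen, *sequences):
    """Hypothetical helper: yield (g, s1, s2, ...) tuples like zip(gen, *sequences),
    but only advance gen after every sequence has produced a value."""
    # zip() evaluates its arguments left to right, so with gen placed last an
    # exhausted sequence ends the iteration before gen is asked for another item.
    for *rest, g in zip(*sequences, gen):
        yield (g, *rest)

gen = (x for x in range(10))
print(list(zip_gen_first(gen, [1, 2, 3])))  # [(0, 1), (1, 2), (2, 3)]
print(next(gen))  # 3
```

Only the generator is protected here; items pulled from the in-memory sequences may still be skipped, which does not matter because those values are still stored in the sequences themselves.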

Basically, I am wondering whether there is a function in the Python standard library that behaves the way I need in this case.

Of course, in some cases, you can just do something like

 xs = list(gen) 

Then you only need to deal with the list.

I should also add that being able to get the last value that zip took from gen would also solve this problem.

4 answers

No, there are no built-in functions that avoid this behavior.

What happens is that zip() tries to get the next value from every input so that it can build the next tuple. It has to do this in some order, and it is logical for that order to match the order of the arguments. In fact, the order is guaranteed by the documentation:

The left-to-right evaluation order of the iterables is guaranteed.
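One way to watch that guarantee in action is to wrap each argument in a small logging generator and record every item at the moment zip() pulls it. The helper traced is hypothetical, written only for this illustration:

```python
def traced(name, iterable, log):
    """Hypothetical helper: yield items from iterable, logging (name, item)
    at the moment each one is pulled."""
    for item in iterable:
        log.append((name, item))
        yield item

log = []
pairs = list(zip(traced("gen", range(10), log), traced("list", [1, 2, 3], log)))
print(pairs)    # [(0, 1), (1, 2), (2, 3)]
print(log[-1])  # ('gen', 3): pulled from the generator, then dropped
```

The log alternates gen, list, gen, list, ... and ends with an extra entry from gen, because in the fourth round the generator is asked first and only then does the list turn out to be exhausted.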

Since the function must support arbitrary iterables, zip() makes no attempt to determine the lengths of its arguments. It does not know that your second argument has only 3 elements. It simply tries to get the next value from each argument, builds a tuple and returns it. As soon as any argument cannot produce a next value, the zip() iterator is exhausted. But that means your generator is asked for its next item before the list is.
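The Python 3.4 documentation describes zip() with a roughly equivalent pure-Python generator. The sketch below follows that shape (renamed zip_sketch here, a hypothetical name, so it does not shadow the built-in) and makes the loss visible: when the list runs out, the item already taken from the generator is sitting in result and is simply discarded.

```python
def zip_sketch(*iterables):
    """Roughly equivalent to zip(), after the pure-Python version in the docs."""
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:            # left to right
            elem = next(it, sentinel)
            if elem is sentinel:
                return                  # items already in `result` are dropped
            result.append(elem)
        yield tuple(result)

gen = (x for x in range(10))
print(list(zip_sketch(gen, [1, 2, 3])))  # [(0, 1), (1, 2), (2, 3)]
print(next(gen))  # 4: the 3 went into `result` and was thrown away
```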

Other than changing the order of your inputs, you could write your own zip() function that takes the lengths into account where they are available:

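    def limited_zip(*iterables):
        # Find the smallest length among the arguments that support len()
        minlength = float('inf')
        for it in iterables:
            try:
                minlength = min(minlength, len(it))
            except TypeError:
                pass  # generators and other plain iterators have no len()
        iterators = [iter(it) for it in iterables]
        count = 0
        while iterators and count < minlength:
            try:
                values = [next(it) for it in iterators]
            except StopIteration:
                return  # an iterator without a len() ran out first
            yield tuple(values)
            count += 1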

This version of zip() stops once it reaches the minimum length of any sequences you passed in. It does not protect you if a shorter iterator without a length is in the mix, but it works for your test case:

Demo:

    >>> gen = iter(range(10))
    >>> list(limited_zip(gen, [1, 2, 3]))
    [(0, 1), (1, 2), (2, 3)]
    >>> next(gen)
    3
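When the length of the in-memory argument is known up front, a standard-library alternative (not part of the answer above) is to cap the generator with itertools.islice. islice stops once it has produced the requested number of items without pulling one more from the underlying iterator, so zip() sees the islice run out and the generator keeps its next value:

```python
from itertools import islice

values = [1, 2, 3]
gen = (x for x in range(10))

# islice(gen, len(values)) yields at most len(values) items and never asks gen
# for an extra one, so nothing is consumed and then dropped.
pairs = list(zip(islice(gen, len(values)), values))
print(pairs)      # [(0, 1), (1, 2), (2, 3)]
print(next(gen))  # 3
```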

The problem is that zip(gen, [1,2,3]) pulls 0, 1, 2 and also 3 from gen, and only then finds that the second argument has just three elements. So if you swap the arguments, next(gen) will return 3:

    >>> gen = (x for x in range(10))
    >>> list(zip([1,2,3],gen))
    [(1, 0), (2, 1), (3, 2)]
    >>> next(gen)
    3

The problem is that when zip hits StopIteration on one of its iterables, it forgets the values already returned by the other iterables in that round.

Here is a solution that uses zip_longest and groupby from itertools to split the zipped sequence into the part before the shortest iterable runs out and the part after:

    >>> from itertools import zip_longest, groupby
    >>> sentinel = object()
    >>> gen = (x for x in range(10))
    >>> g = iter(groupby(zip_longest(gen, [1,2,3], fillvalue=sentinel),
    ...                  lambda t: sentinel not in t))
    >>> _, before = next(g)
    >>> list(before)
    [(0, 1), (1, 2), (2, 3)]
    >>> _, after = next(g)
    >>> next(after)
    (3, <object object at 0x7fad64cbf080>)
    >>> next(gen)
    4
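The same zip_longest idea can be packaged as a plain function without groupby. This is a minimal sketch; the name zip_with_leftover and its return shape (the zipped tuples plus a tuple of whatever was pulled in the final, incomplete round) are assumptions made for illustration. It compares with `is` rather than `in`, so arguments with unusual `==` behavior cannot confuse the sentinel check:

```python
from itertools import zip_longest

def zip_with_leftover(*iterables):
    """Hypothetical helper: return (pairs, leftover), where pairs is what zip()
    would produce and leftover holds the items zip() would have discarded."""
    sentinel = object()
    pairs = []
    leftover = ()
    for t in zip_longest(*iterables, fillvalue=sentinel):
        if any(x is sentinel for x in t):
            # First incomplete round: keep the real items and stop here, so the
            # longer iterables are not consumed any further.
            leftover = tuple(x for x in t if x is not sentinel)
            break
        pairs.append(t)
    return pairs, leftover

gen = (x for x in range(10))
pairs, leftover = zip_with_leftover(gen, [1, 2, 3])
print(pairs)      # [(0, 1), (1, 2), (2, 3)]
print(leftover)   # (3,)
print(next(gen))  # 4
```

Note that leftover does not record which argument each value came from; with a single generator in the mix, that is usually enough.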

You can wrap your generator in a class that gives you access to the last element it produced. I took most of this code from the Python Wiki (https://wiki.python.org/moin/Generators).

    class gen_wrap(object):
        def __init__(self, gen):
            self.gen = gen
            self.current = None

        def __iter__(self):
            return self

        # Python 3 compatibility
        def __next__(self):
            return self.next()

        def next(self):
            self.current = next(self.gen)
            return self.current

        def last(self):
            return self.current

    >>> gen = gen_wrap(x for x in range(10))
    >>> list(zip(gen, [1,2,3]))
    [(0, 1), (1, 2), (2, 3)]
    >>> gen.last()
    3
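Building on the same wrapper idea, a variant can also hand the last item back, so that next(gen) returns 3 as the question asks. This is a sketch, not code from the wiki page; the class name Rewindable and its push_back() method are hypothetical. Calling push_back() is only correct when you know zip() discarded the item, as it does when the generator comes before the shorter argument:

```python
class Rewindable:
    """Hypothetical wrapper: remembers the last item it yielded and can push
    it back so that the next call to next() returns it again."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._last = None
        self._has_last = False
        self._pushed_back = False

    def __iter__(self):
        return self

    def __next__(self):
        if self._pushed_back:
            # Re-deliver the item that was pushed back
            self._pushed_back = False
            return self._last
        self._last = next(self._it)
        self._has_last = True
        return self._last

    def push_back(self):
        # Make the most recently yielded item available again
        if not self._has_last:
            raise ValueError("nothing has been yielded yet")
        self._pushed_back = True

gen = Rewindable(x for x in range(10))
print(list(zip(gen, [1, 2, 3])))  # [(0, 1), (1, 2), (2, 3)]
gen.push_back()
print(next(gen))  # 3
print(next(gen))  # 4
```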

Source: https://habr.com/ru/post/1205374/
