Feeling stupid while trying to implement lazy partitioning in Python

I am trying to implement lazy partitioning of an iterator object that yields slices of the iterator whenever a function applied to the elements changes value. This would mimic Clojure's partition-by (although the semantics of the output differ, since Python really "consumes" the elements). My implementation is optimal in the number of operations it performs, but not in the memory it requires. I don't see why a good implementation should need more than O(1) memory, yet my implementation takes O(k) memory, where k is the size of a partition. I would like to be able to handle cases where k is large. Does anyone know of a good implementation?

The correct behavior should be something like

    >>> unagi = [-1, 3, 4, 7, -2, 1, -3, -5]
    >>> parts = partitionby(lambda x: x < 0, unagi)
    >>> print([[y for y in x] for x in parts])
    [[-1], [3, 4, 7], [-2], [1], [-3, -5]]

Here is my current version

    from itertools import chain

    def partitionby(f, iterable):
        seq = iter(iterable)
        current = next(seq)
        justseen = next(seq)  # assumes the input has at least two elements
        partition = iter([current])
        while True:
            if f(current) == f(justseen):
                # Same key: wrap another chain around the partition; this
                # nesting is what makes the memory usage O(k).
                partition = chain(partition, iter([justseen]))
                try:
                    justseen = next(seq)
                except StopIteration:
                    yield partition
                    break
            else:
                # Key changed: emit the finished partition and start a new
                # one (justseen is picked up on the next loop iteration).
                yield partition
                current = justseen
                partition = iter([])
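To make the memory issue concrete, here is a minimal sketch (mine, not part of the original code) of what the if branch does to partition: each pass wraps the previous iterator in a new chain, so a run of k equal keys leaves k nested chain objects behind.

    from itertools import chain

    partition = iter([0])
    for i in range(1, 5):
        # Each step adds one more layer of chain around the old partition,
        # so the structure grows with the length of the run.
        partition = chain(partition, iter([i]))
    print(list(partition))  # [0, 1, 2, 3, 4]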
2 answers

Why not reuse groupby? I think it is O(1) in memory.

    from itertools import groupby

    def partitionby(f, iterable):
        # groupby yields (key, group) pairs; keep just the group iterators
        return (g[1] for g in groupby(iterable, f))

The difference between groupby's implementation and yours is that each partition is a specialized iterator object instead of a chain of chain of chain ...
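As a quick illustration (my example, reusing the input from the question), the groupby-based version produces the expected partitions. One caveat from the itertools documentation: each group shares the underlying iterator with the groupby object, so a group must be consumed before the outer generator advances, or its elements are lost.

    from itertools import groupby

    def partitionby(f, iterable):
        return (g[1] for g in groupby(iterable, f))

    unagi = [-1, 3, 4, 7, -2, 1, -3, -5]
    # Consume each group before advancing to the next: groupby makes a
    # single pass over the shared underlying iterator.
    print([list(part) for part in partitionby(lambda x: x < 0, unagi)])
    # -> [[-1], [3, 4, 7], [-2], [1], [-3, -5]]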


Note that partition can be a regular list instead of an iterator, i.e.:

    partition = iter([current])
    partition = chain(partition, iter([justseen]))
    partition = iter([])

can become:

    partition = [current]
    partition.append(justseen)
    partition = []
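For completeness, here is a sketch (mine, not the answerer's) of how the whole generator might look with a list accumulator. It still buffers the current run in O(k) memory, but it avoids the nested chain objects and tolerates inputs with fewer than two elements:

    def partitionby(f, iterable):
        partition = []
        last_key = object()  # sentinel that never equals a real key
        for item in iterable:
            key = f(item)
            if partition and key != last_key:
                yield partition  # key changed: emit the finished run
                partition = []
            partition.append(item)
            last_key = key
        if partition:
            yield partition  # emit the trailing run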
