What do you call an iterator with two different "done" states?

When I query an API that returns a list of unknown length broken into pages, I find myself writing essentially this:

    def fetch_one(self, n):
        data = json.load(urlopen(url_template % n))
        if data is None:
            self.finished = True
            return
        for row in data:
            if row_is_weird(row):
                self.finished = True
                return
            yield prepare(row)

    def work(self):
        n = 1
        self.finished = False
        while not self.finished:
            consume(self.fetch_one(n))
            n += 1

The separation between work and fetch_one makes it very easy to test, but signaling through instance variables means that I cannot run more than one work() at a time, which sucks. I came up with what is, in my opinion, a cleaner solution, but it has an iterator with two "done" states, and I have no idea what to call it. I'm sure this pattern exists elsewhere, so I would appreciate pointers (or reasons why this is stupid):

    class Thing(object):
        def __init__(self, gen):
            self.gen = gen
            self.finished = False

        def __iter__(self):
            return self

        def __next__(self):
            try:
                v = next(self.gen)
            except StopThisThing:
                self.finished = True
                raise StopIteration
            else:
                return v

        next = __next__  # Python 2 compatibility

which I would then use as

    @thinged
    def fetch_one(self, n):
        data = json.load(urlopen(url_template % n))
        if data is None:
            raise StopThisThing()
        for row in data:
            if row_is_weird(row):
                raise StopThisThing()
            yield prepare(row)

    def work(self):
        n = 1
        while True:
            one = self.fetch_one(n)
            consume(one)
            if one.finished:
                break
            n += 1

So what is this thing that I created?

+6

4 answers

I think you can avoid this by yielding something special (a sentinel value).

I had to create my own executable example to show what I mean:

    def fetch_one(n):
        lst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]][n]
        for x in lst:
            if x == 6:
                yield 'StopAll'
                return
            yield x

    def work():
        n = 0
        in_progress = True
        while in_progress:
            numbers_iterator = fetch_one(n)
            for x in numbers_iterator:
                if x == 'StopAll':
                    in_progress = False
                    break
                print('x =', x)
            n += 1

    work()

Output:

    x = 1
    x = 2
    x = 3
    x = 4
    x = 5

I like this more than self.finished or a decorator like the one you created, but I think something better can be found. (Perhaps this answer may help you with that.)
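One refinement worth noting (my sketch, not part of the answer): using a unique object() as the sentinel removes any chance of colliding with legitimate data, unlike the 'StopAll' string above, and lets you compare by identity:

```python
# Sketch: a unique sentinel object cannot be confused with real values.
# The mock data mirrors the answer's example; names are illustrative.
STOP_ALL = object()

def fetch_one(n):
    lst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]][n]
    for x in lst:
        if x == 6:          # the "weird row" condition from the example
            yield STOP_ALL
            return
        yield x

def work():
    results = []
    n = 0
    in_progress = True
    while in_progress:
        for x in fetch_one(n):
            if x is STOP_ALL:   # identity check: only our sentinel matches
                in_progress = False
                break
            results.append(x)
        n += 1
    return results
```

With this, work() collects [1, 2, 3, 4, 5] just as in the answer's demo, but a page that happened to contain the string 'StopAll' would pass through untouched.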

Update: a simpler solution would be to convert fetch_one into a class that carries its own finished flag.
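That class version is not shown in the answer; a minimal sketch of what it could look like (the class name and mock data are my assumptions):

```python
# Sketch of the "class with its own finished flag" idea. Each page gets
# its own instance, so no state is shared across concurrent work() calls.
class FetchOne(object):
    def __init__(self, n):
        self.n = n
        self.finished = False

    def __iter__(self):
        lst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]][self.n]
        for x in lst:
            if x == 6:              # the "weird row" condition from the example
                self.finished = True
                return
            yield x

def work():
    results, n = [], 0
    while True:
        page = FetchOne(n)
        results.extend(page)        # consuming the iterator may set page.finished
        if page.finished:
            break
        n += 1
    return results
```

Because the flag lives on the per-page instance rather than on self, this also addresses the question's complaint about not being able to run two work() calls at once.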

A decorator approach to this solution might be:

    class stopper(object):
        def __init__(self, func):
            self.func = func
            self.finished = False

        def __call__(self, *args, **kwargs):
            for x in self.func(*args, **kwargs):
                if x == 6:
                    self.finished = True
                    return  # raising StopIteration here is a RuntimeError since PEP 479
                yield x
            else:
                self.finished = True

Basically, you don't care how fetch_one works, only whether what it yields is okay or not.
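Following that observation, the hard-coded x == 6 check can be lifted into a predicate parameter. A sketch of this generalization (mine, not the answer's):

```python
# Sketch: a decorator factory that takes the "bad value" predicate,
# so the stopper is reusable across different weirdness conditions.
def stopper(is_bad):
    class Stopper(object):
        def __init__(self, func):
            self.func = func
            self.finished = False

        def __call__(self, *args, **kwargs):
            for x in self.func(*args, **kwargs):
                if is_bad(x):
                    self.finished = True
                    return
                yield x
    return Stopper

@stopper(lambda x: x == 6)
def fetch_one(n):
    for x in [[1, 2, 3], [4, 5, 6], [7, 8, 9]][n]:
        yield x

def work():
    results, n = [], 0
    while not fetch_one.finished:
        results.extend(fetch_one(n))
        n += 1
    return results
```

Note that, as in the answer's version, the finished flag lives on the single decorated object, so this still has the same one-work()-at-a-time limitation the question complains about.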

Usage example:

    @stopper
    def fetch_one(n):
        lst = [[1, 2, 3], [4, 5, 6], [7, 8, 9]][n]
        # lst = [[1, 2, 3], [], [4, 5, 6], [7, 8, 9]][n]  # uncomment to test for/else
        for x in lst:
            yield x

    def work():
        n = 0
        while not fetch_one.finished:
            for x in fetch_one(n):
                print('x =', x)
            n += 1
+2

There is a cleaner way to handle your situation: you have a data source consisting of paged data, but the termination condition can be detected by examining individual rows. So I would use an iterator that retrieves the data row by row and stops when necessary. No special values (in-band or out-of-band), no two-way communication.

Edit: I just noticed that you don't actually care about page boundaries. In that case, you should simply use this:

    def linegetter(url_template):
        """Return the data row by row. Stop when end of input is detected."""
        n = 0
        while True:
            n += 1
            data = json.load(urlopen(url_template % n))
            if data is None:
                return
            for row in data:
                if row_is_weird(row):
                    return
                yield row

It yields the data row by row, and you can prepare and consume it any way you like. Done!
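To illustrate, here is the same linegetter shape with the network call stubbed out (the page data and stub are hypothetical stand-ins for the JSON responses):

```python
# Stand-in for successive json.load(urlopen(url_template % n)) results:
# two pages of rows, then a weird row, then data we never reach.
pages = [[1, 2, 3], [4, 5, 6, 'weird', 7], None]

def row_is_weird(row):
    return row == 'weird'

def linegetter():
    """Yield rows one by one; stop at the first weird row or None page."""
    n = 0
    while True:
        data = pages[n]   # instead of json.load(urlopen(url_template % n))
        n += 1
        if data is None:
            return
        for row in data:
            if row_is_weird(row):
                return
            yield row

def work():
    # The caller is now just a plain for loop: no flags, no sentinels.
    return [row for row in linegetter()]
```

Here work() sees the rows 1 through 6 and never learns (or needs to learn) that a page boundary or a weird row existed.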

That should be the whole answer, it seems. But suppose you need to process the data page by page (as your code does). Just group the output of the first iterator into sub-iterators, one per page. The code is more complicated because I turned it into a completely general solution; but using it is very simple.

    def linegetter(source, terminate=lambda x: False):
        """
        Return the data row by row, in a tuple with the page number.
        Stop when end of input is detected.
        """
        for n, data in enumerate(source):
            if data is None:
                return
            for row in data:
                if terminate(row):
                    return
                yield (n, row)

    def _giverow(source):
        "Yield page contents row by row, discarding the page number"
        for page, row in source:
            yield row

    def pagegetter(source):
        """Return an iterator for each page of incoming data."""
        import itertools
        for it in itertools.groupby(source, lambda x: x[0]):
            yield _giverow(it[1])

Demo: each "row" is a number, each page is a sublist. We stop when we see "b". There are no termination checks in your main loop:

    incoming = iter([[1, 2, 3], [4, 5, 6, "b", 7], [7, 8, 9]])

    def row_is_weird(r):
        return r == "b"

    for page in pagegetter(linegetter(incoming, row_is_weird)):
        print(list(page))

As you can see, the code is completely general. You can use it with an iterator that retrieves JSON pages, for example:

    from itertools import count

    # map() is lazy in Python 3, so pages are fetched on demand
    jsonsource = map(lambda n: json.load(urlopen(url_template % n)), count(1))
    for page in pagegetter(linegetter(jsonsource, row_is_weird)):
        consume(page)
+1

The name for the thing you invented is "a poor man's version of an iterator". Your work function spends effort re-implementing what Python already provides in a for loop. You have a sequence of values that can stop at any time, and that is exactly what Python iterators provide. You would be better off moving part of this logic into a separate function. Something like this:

    def fetch_all(self):
        for n in itertools.count():
            data = json.load(urlopen(url_template % n))
            if data is None:
                return
            for row in data:
                if row_is_weird(row):
                    return
            yield map(prepare, data)

Alternatively, you can use exceptions:

    def fetch_all(self):
        for n in itertools.count():
            data = json.load(urlopen(url_template % n))
            if data is None:
                return
            try:
                # eager, so a WeirdRowError from prepare() is raised here
                yield [prepare(row) for row in data]
            except WeirdRowError:
                return

Actually, I question the logic of the handling of such weird rows. What makes a row weird? Why do we stop there? Is it really some kind of error when a row is weird?

In any case, your work function becomes

    def work():
        for item in fetch_all():
            consume(item)

EDIT

With the additional information, I would do something like this:

    def fetch_rows():
        for n in itertools.count():
            data = json.load(urlopen(url_template % n))
            for row in data:
                if row_is_weird(row):
                    return
                yield row

This function produces the sequence of rows.

    def work():
        for row in fetch_rows():
            consume(row)

This function actually processes the rows.

Some or all of this can be replaced with iterator utilities from itertools.
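For example (a sketch of mine, not the answer's code), itertools.chain.from_iterable plus itertools.takewhile can express fetch_rows without explicit loops, assuming the pages are available as an iterable:

```python
from itertools import chain, takewhile

# Hypothetical stand-in for the lazily fetched JSON pages.
pages = [[1, 2, 3], [4, 5, 'weird', 6]]

def row_is_weird(row):
    return row == 'weird'

def fetch_rows():
    # Flatten the pages into one stream of rows, then cut the stream
    # at the first weird row. Both steps are lazy.
    all_rows = chain.from_iterable(pages)
    return takewhile(lambda row: not row_is_weird(row), all_rows)
```

In real code, pages would itself be a lazy generator over json.load calls, so nothing is fetched past the first weird row.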

0

I initially gave a wrong answer; this one is better.

You have several sequences (JSON pages) that can end either normally or abruptly (when row_is_weird). If a sequence ends normally, you move on to the next one. The sequence of sequences itself ends when you get None instead of a JSON page.

You use an instance variable to signal both abrupt and normal termination. This helps your code break out of deeply nested loops, but it also introduces unwanted non-local state.

The easiest way to remove the shared state is to pass it as part of the result or the parameters. Let each row's "weirdness" travel along with it. In fact, if a row is weird, we don't need to pass the row's value at all; we just pass a value saying "from now on, the results are invalid". This stops the iteration in the right place.

Essentially, this is similar to the accepted answer, but you can also view it as an application of the Maybe and List monads. An added benefit is that you cannot mistake the end-of-sequence marker for an ordinary value in the sequence.

    # preparations and mockups
    input = [                                   # imitates pages of parsed JSON
        ['apple', 'orange', 'peach'],           # entirely good rows
        ['meat', 'fowl', 'ROTTEN', 'unicorn'],  # some good rows, then a bad one
        ['unicorn2', 'unicorn3'],               # good rows we should never see
        None,                                   # sentinel imitating 'no data' from the JSON parser
    ]

    def prepare(x):
        print('%s is prepared' % x)
        return 'prepared %s' % x

    consume = lambda x: '%s is consumed' % x
    row_is_weird = lambda x: x == 'ROTTEN'

    # the solution
    def maybe_prepare(row):
        if row_is_weird(row):
            return (False, None)             # Nothing
        else:
            return (True, prepare(row))      # Just prepare(row)

    def fetch_one(n):
        data = input[n - 1]  # instead of json.load(template % n)
        if data is None:
            return iter([(False, None)])
        else:
            return (maybe_prepare(row) for row in data)

    # chain_all iterates over all items of all sequences in seqs
    chain_all = lambda seqs: (item for seq in seqs for item in seq)

    from itertools import count

    def work():
        for is_ok, prepared_row in chain_all(fetch_one(n) for n in count(1)):
            if not is_ok:
                break
            print(consume(prepared_row))

This code is still easy to test, but testing fetch_one() is a bit more complicated: you only need to iterate over the values before the first (False, None). This is easy to do with itertools.takewhile().
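A sketch of such a test, with fetch_one's output stubbed as a literal list of (ok, value) pairs:

```python
from itertools import takewhile

def fetch_one_stub():
    """Stand-in for fetch_one(n): a page with one bad row in the middle."""
    return iter([
        (True, 'prepared meat'),
        (True, 'prepared fowl'),
        (False, None),              # the weird row: everything after is invalid
        (True, 'prepared unicorn'), # should never be reached
    ])

# Keep only the leading (True, value) pairs, i.e. results before the first failure.
good = list(takewhile(lambda pair: pair[0], fetch_one_stub()))
values = [value for ok, value in good]
```

takewhile stops at the first pair whose flag is False, so the test never touches the invalid tail of the stream.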

The maybe_prepare() function could be a one-liner, but I left it multi-line for readability.

0

Source: https://habr.com/ru/post/910424/

