Using itertools.product and want to seed a value

I wrote a small script to download pictures from a website. It iterates over every 7-character alphanumeric value whose first character is always a digit. The problem is that if I stop the script and run it again, I have to start all over.

Can I seed itertools.product somehow with the last value I reached, so that I don't have to go through all of them again?

Thanks for any input.

Here is the relevant piece of code:

    # openaurl() and writeimage() are helpers defined elsewhere in the script
    numbers = '0123456789'
    alnum = numbers + 'abcdefghijklmnopqrstuvwxyz'

    len7 = itertools.product(numbers, alnum, alnum, alnum, alnum, alnum, alnum)  # length 7

    for p in itertools.chain(len7):
        currentid = ''.join(p)

        # semi-static vars
        url = 'http://mysite.com/images/'
        url += currentid

        # Need to get the real url because of the redirect
        print "Trying " + url
        req = urllib2.Request(url)
        res = openaurl(req)
        if res == "continue":
            continue
        finalurl = res.geturl()

        # OK, we have the full url; now check whether it is real
        try:
            file = urllib2.urlopen(finalurl)
        except urllib2.HTTPError, e:
            print e.code
            continue  # 'file' is undefined when the request fails, so skip this id

        im = cStringIO.StringIO(file.read())
        img = Image.open(im)
        writeimage(img)
+6
3 answers

Here's a solution based on the PyPy library code (thanks to agf's suggestion in the comments).

The state is available through the .state attribute and can be reset via .goto(state), where state is an index into the sequence (starting at 0). There is a demo at the end (you need to scroll down, I'm afraid).

It's faster than discarding values one by one.

    > cat prod.py

    class product(object):

        def __init__(self, *args, **kw):
            if len(kw) > 1:
                raise TypeError("product() takes at most 1 argument (%d given)" %
                                len(kw))
            self.repeat = kw.get('repeat', 1)
            self.gears = [x for x in args] * self.repeat
            self.num_gears = len(self.gears)
            self.reset()

        def reset(self):
            # initialization of indicies to loop over
            self.indicies = [(0, len(self.gears[x]))
                             for x in range(0, self.num_gears)]
            self.cont = True
            self.state = 0

        def goto(self, n):
            self.reset()
            self.state = n
            x = self.num_gears
            while n > 0 and x > 0:
                x -= 1
                n, m = divmod(n, len(self.gears[x]))
                self.indicies[x] = (m, self.indicies[x][1])
            if n > 0:
                self.reset()
                raise ValueError("state exceeded")

        def roll_gears(self):
            # Starting from the end of the gear indicies work to the front
            # incrementing the gear until the limit is reached. When the limit
            # is reached carry operation to the next gear
            self.state += 1
            should_carry = True
            for n in range(0, self.num_gears):
                nth_gear = self.num_gears - n - 1
                if should_carry:
                    count, lim = self.indicies[nth_gear]
                    count += 1
                    if count == lim and nth_gear == 0:
                        self.cont = False
                    if count == lim:
                        should_carry = True
                        count = 0
                    else:
                        should_carry = False
                    self.indicies[nth_gear] = (count, lim)
                else:
                    break

        def __iter__(self):
            return self

        def next(self):
            if not self.cont:
                raise StopIteration
            l = []
            for x in range(0, self.num_gears):
                index, limit = self.indicies[x]
                l.append(self.gears[x][index])
            self.roll_gears()
            return tuple(l)

    p = product('abc', '12')
    print list(p)
    p.reset()
    print list(p)
    p.goto(2)
    print list(p)
    p.goto(4)
    print list(p)

    > python prod.py
    [('a', '1'), ('a', '2'), ('b', '1'), ('b', '2'), ('c', '1'), ('c', '2')]
    [('a', '1'), ('a', '2'), ('b', '1'), ('b', '2'), ('c', '1'), ('c', '2')]
    [('b', '1'), ('b', '2'), ('c', '1'), ('c', '2')]
    [('c', '1'), ('c', '2')]

You should test it more; I may have made a dumb mistake, but the idea is quite simple, so you should be able to fix it :o) You are free to use my changes; I don't know what the original PyPy license is.

Also, state is not really the full state: it does not include the original arguments, it is just an index into the sequence. It might have been better to call it index, but there are already indices in the code...
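To make the "index into the sequence" idea concrete, here is a minimal sketch that converts between a position and the tuple at that position, in the same order itertools.product uses (the last sequence varies fastest). It is written for Python 3, unlike the Python 2 code in this thread, and the names index_to_item and item_to_index are made up for illustration. item_to_index assumes that no sequence contains duplicate values.

```python
from itertools import product

def index_to_item(n, *seqs):
    """Return the n-th tuple of product(*seqs) without iterating up to it."""
    if n < 0:
        raise IndexError("index must not be negative")
    item = []
    for seq in reversed(seqs):            # the last sequence varies fastest
        n, m = divmod(n, len(seq))
        item.append(seq[m])
    if n > 0:
        raise IndexError("index past the end of the product")
    return tuple(reversed(item))

def item_to_index(item, *seqs):
    """Inverse of index_to_item; assumes each sequence has no duplicates."""
    n = 0
    for value, seq in zip(item, seqs):
        n = n * len(seq) + seq.index(value)
    return n

# The id space from the question: one digit followed by six alphanumerics.
numbers = '0123456789'
alnum = numbers + 'abcdefghijklmnopqrstuvwxyz'
seqs = (numbers,) + (alnum,) * 6

pos = item_to_index('3zz0abc', *seqs)     # position of an example id
print(pos, ''.join(index_to_item(pos, *seqs)))
```

With such a pair of helpers, the last id that was tried can be turned into a position, and the run can continue from position + 1.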

Update

Here's a simpler version. It is the same idea, but it works by mapping a sequence of integers to tuples, so you just imap it over count(n) to get the sequence offset by n.

    > cat prod2.py

    from itertools import count, imap

    def make_product(*values):
        def fold((n, l), v):
            (n, m) = divmod(n, len(v))
            return (n, l + [v[m]])
        def product(n):
            (n, l) = reduce(fold, values, (n, []))
            if n > 0: raise StopIteration
            return tuple(l)
        return product

    print list(imap(make_product(['a','b','c'], [1,2,3]), count()))
    print list(imap(make_product(['a','b','c'], [1,2,3]), count(3)))

    def product_from(n, *values):
        return imap(make_product(*values), count(n))

    print list(product_from(4, ['a','b','c'], [1,2,3]))

    > python prod2.py
    [('a', 1), ('b', 1), ('c', 1), ('a', 2), ('b', 2), ('c', 2), ('a', 3), ('b', 3), ('c', 3)]
    [('a', 2), ('b', 2), ('c', 2), ('a', 3), ('b', 3), ('c', 3)]
    [('b', 2), ('c', 2), ('a', 3), ('b', 3), ('c', 3)]

(The disadvantage here is that if you want to stop and restart, you have to keep track of how many values you have consumed.)
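One way to handle that bookkeeping is to have the generator yield the position together with each tuple. The following is only a sketch in Python 3 (not the Python 2 used in prod2.py); it keeps the name product_from but changes what it yields, and like prod2.py it lets the first sequence vary fastest, which is not the order itertools.product uses.

```python
def product_from(start, *seqs):
    """Yield (index, tuple) pairs beginning at position `start`.

    As in prod2.py, the FIRST sequence varies fastest.
    """
    total = 1
    for seq in seqs:
        total *= len(seq)
    for index in range(start, total):
        n, item = index, []
        for seq in seqs:
            n, m = divmod(n, len(seq))
            item.append(seq[m])
        yield index, tuple(item)

last_done = None
for index, item in product_from(4, 'abc', [1, 2, 3]):
    # ... do the real work with `item` here ...
    last_done = index    # store this somewhere; resume later at last_done + 1
```

The caller always knows the index of the item it just handled, so nothing has to be counted separately.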

+3

Once you get a fair way through the iterator, it will take a while to find your place again using dropwhile.
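For reference, the dropwhile approach looks roughly like the sketch below (Python 3; the helper name resume_with_dropwhile is hypothetical). Every skipped tuple is still generated and compared with the last one seen, which is why it gets slow once the resume point is far into the sequence.

```python
from itertools import dropwhile, product

def resume_with_dropwhile(last, *seqs):
    """Skip everything up to and including `last`, then yield the rest."""
    remaining = dropwhile(lambda item: item != last, product(*seqs))
    next(remaining, None)    # discard `last` itself; it was already processed
    return remaining

rest = list(resume_with_dropwhile(('b', '1'), 'abc', '12'))
print(rest)
```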

You should probably adapt a recipe for product so that you can save its state with pickle between runs.
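As a simplified illustration of saving state with pickle, the sketch below stores only an integer position rather than a full iterator object, which is an assumption of this example and not necessarily what the recipe would do. It is Python 3; the file name position.pickle and the helper names are hypothetical, and islice is used here just to keep the example short.

```python
import os
import pickle
import tempfile
from itertools import islice, product

def load_position(path):
    """Return the saved position, or 0 if nothing has been saved yet."""
    try:
        with open(path, 'rb') as fh:
            return pickle.load(fh)
    except FileNotFoundError:
        return 0

def save_position(path, position):
    with open(path, 'wb') as fh:
        pickle.dump(position, fh)

def run(path, seqs, limit):
    """Process up to `limit` items, resuming where the previous run stopped."""
    start = load_position(path)
    done = []
    for offset, item in enumerate(islice(product(*seqs), start, start + limit)):
        done.append(item)                          # stand-in for the real work
        save_position(path, start + offset + 1)    # position of the next item
    return done

# Hypothetical state file in a scratch directory.
state_file = os.path.join(tempfile.mkdtemp(), 'position.pickle')
first = run(state_file, ('abc', '12'), 4)     # simulates one run of the script
second = run(state_file, ('abc', '12'), 4)    # simulates a later run
```

Writing the file after every item is the simplest option; saving every few hundred items would reduce disk writes at the cost of redoing a little work after a crash.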

Make sure only one instance of your script can run at a time; otherwise you will need something more elaborate, such as a server process that hands out ids to the scripts.
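One common way to enforce a single running instance is a lock file created with an exclusive-create flag. This is a minimal Python 3 sketch under that assumption; the class name single_instance and the lock file name are made up, and a crash would leave a stale lock file that has to be removed by hand.

```python
import os
import tempfile

class AlreadyRunning(Exception):
    """Raised when another instance already holds the lock file."""

class single_instance:
    """Context manager that holds an exclusive lock file while it is active."""

    def __init__(self, lock_path):
        self.lock_path = lock_path
        self.fd = None

    def __enter__(self):
        try:
            # O_EXCL makes the call fail if the file already exists.
            self.fd = os.open(self.lock_path,
                              os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            raise AlreadyRunning(self.lock_path)
        return self

    def __exit__(self, exc_type, exc, tb):
        os.close(self.fd)
        os.remove(self.lock_path)
        return False    # do not swallow exceptions from the body

# Hypothetical lock file in a scratch directory.
lock_file = os.path.join(tempfile.mkdtemp(), 'downloader.lock')
with single_instance(lock_file):
    pass    # the download loop would run here
```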

+2

If your input sequences don't contain duplicate values, this may be faster than dropwhile for advancing product, because it calculates the correct point at which to resume the iteration instead of comparing every dropped value.

    from itertools import product, islice
    from operator import mul

    def resume_product(state, *sequences):
        start = 0
        seqlens = map(len, sequences)
        if any(len(set(seq)) != seqlen for seq, seqlen in zip(sequences, seqlens)):
            raise ValueError("One of your sequences contains duplicate values")
        current = end = reduce(mul, seqlens)
        for i, seq, seqlen in zip(state, sequences, seqlens):
            current /= seqlen
            start += seq.index(i) * current
        return islice(product(*sequences), start + 1, end)

    seqs = '01', '23', '45', '678'

    # if I want to resume after '1247':
    for i in resume_product('1247', *seqs):
        # blah blah
        pass
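The code above is Python 2. A Python 3 port might look like the sketch below: reduce has to be imported from functools, map returns an iterator and so is replaced by a list, and floor division (//) keeps the offsets as integers. Note that islice still steps past the skipped items internally; what is saved is the comparison of each one in Python code.

```python
from functools import reduce
from itertools import islice, product
from operator import mul

def resume_product(state, *sequences):
    """Return an iterator over product(*sequences) starting after `state`."""
    seqlens = [len(seq) for seq in sequences]
    if any(len(set(seq)) != n for seq, n in zip(sequences, seqlens)):
        raise ValueError("One of your sequences contains duplicate values")
    current = end = reduce(mul, seqlens, 1)
    start = 0
    for value, seq, seqlen in zip(state, sequences, seqlens):
        current //= seqlen                 # weight of this position
        start += seq.index(value) * current
    return islice(product(*sequences), start + 1, end)

seqs = '01', '23', '45', '678'
# Everything that comes after '1247':
rest = [''.join(t) for t in resume_product('1247', *seqs)]
```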
+1

Source: https://habr.com/ru/post/911625/

