How to make the reverse "range", i.e. Create a compact range based on a set of numbers?

Question

How to make the reverse "range", i.e. Create a compact range based on a set of numbers?

Python has a range method that allows you to use things like:

 >>> range(1, 6) [1, 2, 3, 4, 5]

What I'm looking for is the opposite: take a list of numbers and return the beginning and end.

 >>> magic([1, 2, 3, 4, 5]) [1, 5] # note: 5, not 6; this differs from `range()`

This is easy enough to do for the above example, but can you also resolve spaces or multiple ranges by returning a range in a string format like PCRE? Something like that:

 >>> magic([1, 2, 4, 5]) ['1-2', '4-5'] >>> magic([1, 2, 3, 4, 5]) ['1-5']

Edit: I am looking for a Python solution, but welcome working examples in other languages as well. This is more about figuring out an elegant, efficient algorithm. Bonus question: is there any programming language that has a built-in method for this?

+6

python numbers pcre range

Mathias bynens Feb 27 '12 at 18:59

source share

6 answers

Sort numbers, search for consecutive ranges (remember RLE compression?).

Something like that:

 input = [5,7,9,8,6, 21,20, 3,2,1, 22,23, 50] output = [] first = last = None # first and last number of current consecutive range for item in sorted(input): if first is None: first = last = item # bootstrap elif item == last + 1: # consecutive last = item # extend the range else: # not consecutive output.append((first, last)) # pack up the range first = last = item # the last range ended by iteration end output.append((first, last)) print output

Result: [(1, 3), (5, 9), (20, 23), (50, 50)] . You find out the rest :)

+5

9000 Feb 27 '12 at 19:16

source share

I thought you might like my generic clojure solution.

 (def r [1 2 3 9 10]) (defn successive? [ab] (= a (dec b))) (defn break-on [pred s] (reduce (fn [memo n] (if (empty? memo) [[n]] (if (pred (last (last memo)) n) (conj (vec (butlast memo)) (conj (last memo) n)) (conj memo [n])))) [] s)) (break-on successive? r)

+4

gf3 Feb 27 '12 at 20:28

source share

It's kind of elegant, but also disgusting, depending on your point of view. :)

 import itertools def rangestr(iterable): end = start = iterable.next() for end in iterable: pass return "%s" % start if start == end else "%s-%s" % (start, end) class Rememberer(object): last = None class RangeFinder(object): def __init__(self): self.o = Rememberer() def __call__(self, x): if self.o.last is not None and x != self.o.last + 1: self.o = Rememberer() self.o.last = x return self.o def magic(iterable): return [rangestr(vals) for k, vals in itertools.groupby(sorted(iterable), RangeFinder())] >>> magic([5,7,9,8,6, 21,20, 3,2,1, 22,23, 50]) ['1-3', '5-9', '20-23', '50']

Explanation: it uses itertools.groupby to group the sorted items together using the key, where the key is a Rememberer object. The RangeFinder class saves a Rememberer object if the sequential heap of elements belongs to one range block. After you left this block, it replaces Rememberer so that the key does not compare with the equal, and groupby creates a new group. When groupby looks at the sorted list, it passes the elements one by one to rangestr , which builds the line by remembering the first and last element and ignoring everything in between.

Is there a practical reason to use this instead of the 9000 answer ? Probably no; this is basically the same algorithm.

+2

Dougal Feb 27 '12 at 19:28

source share

Since 9000 beat me, I just send the second part of the code, which prints ranges similar to pcre, from the previously calculated output plus an added type check:

 for i in output: if not isinstance(i, int) or i < 0: raise Exception("Only positive ints accepted in pcre_ranges") result = [ str(x[0]) if x[0] == x[1] else '%s-%s' % (x[0], x[1]) for x in output ] print result

Output: ['1-3', '5-9', '20-23', '50']

+2

Frg Feb 27 '12 at 19:33

source share

Let's try the generators!

 # ignore duplicate values l = sorted( set( [5,7,9,8,6, 21,20, 3,2,1, 22,23, 50] ) ) # get the value differences d = (i2-i1 for i1,i2 in zip(l,l[1:])) # get the gap indices gaps = (i for i,e in enumerate(d) if e != 1) # get the range boundaries def get_ranges(gaps, l): last_idx = -1 for i in gaps: yield (last_idx+1, i) last_idx = i yield (last_idx+1,len(l)-1) # make a list of strings in the requested format (thanks Frg!) ranges = [ "%s-%s" % (l[i1],l[i2]) if i1!=i2 else str(l[i1]) \ for i1,i2 in get_ranges(gaps, l) ]

It got pretty scary, I think :)

+2

moooeeeep Feb 27 '12 at 22:29

source share

Sven marnach · Accepted Answer · 2012-02-27T19:55:28+0000

A good trick to simplify the code is to look at the differences of each element of the sorted list and its index:

 a = [4, 2, 1, 5] a.sort() print [x - i for i, x in enumerate(a)]

prints

 [1, 1, 2, 2]

Each run of the same number corresponds to a run of consecutive numbers in a . Now we can use itertools.groupby() to retrieve these runs. Here is the full code:

 from itertools import groupby def sub(x): return x[1] - x[0] a = [5, 3, 7, 4, 1, 2, 9, 10] ranges = [] for k, iterable in groupby(enumerate(sorted(a)), sub): rng = list(iterable) if len(rng) == 1: s = str(rng[0][1]) else: s = "%s-%s" % (rng[0][1], rng[-1][1]) ranges.append(s) print ranges

Print

 ['1-5', '7', '9-10']

How to make the reverse "range", i.e. Create a compact range based on a set of numbers?

More articles: