Sorting a string in an array, making it sparsely populated

Question

Sorting a string in an array, making it sparsely populated

For example, let's say I have a line like:

duck duck duck duck goose goose goose dog

And I want it to be as sparsely populated as possible, let's say in this case

 duck goose duck goose dog duck goose duck

Which algorithm would you recommend? Code snippets or generic pointers would be useful, languages would welcome Python, C ++, and an extra class if you have a way to do this in bash.

+4

c ++ python algorithm bash

S1syphus May 24, '10 at 12:51

source share

6 answers

I would sort the array by the number of duplicates, starting from the duplicated element itself, spread these elements as far as possible from each other

in your example, the duck is duplicated 4 times, so the duck will be placed in position n * 8/4 for n from 0 to 3 inclusive.

Then place the next most repeated one (goose) at positions n * 8/3 + 1 for n from 0 to 2 inclusive. If something is already placed there, just put it in the next empty place. etc etc.

+5

Charles Ma May 24, '10 at 13:06

source share

I think this is what: general idea:

 L = "duck duck duck duck goose goose goose dog ".split() from itertools import cycle, islice, groupby # from: http://docs.python.org/library/itertools.html#recipes def roundrobin(*iterables): "roundrobin('ABC', 'D', 'EF') --> ADEBFC" # Recipe credited to George Sakkis pending = len(iterables) nexts = cycle(iter(it).next for it in iterables) while pending: try: for next in nexts: yield next() except StopIteration: pending -= 1 nexts = cycle(islice(nexts, pending)) groups = [list(it) for k,it in groupby(sorted(L))] # some extra print so you get the idea print L print groups print list(roundrobin(*groups))

Conclusion:

 ['dog', 'duck', 'duck', 'duck', 'duck', 'goose', 'goose', 'goose'] [['dog'], ['duck', 'duck', 'duck', 'duck'], ['goose', 'goose', 'goose']] ['dog', 'duck', 'goose', 'duck', 'goose', 'duck', 'goose', 'duck']

So you want some kind of round video :-)

Well, the whirl is not perfect.

Here is brute force (aka terribly ineffective) version of what you were thinking.

 # this is the function we want to maximize def space_sum( L ): """ return the sum of all spaces between all elements in L""" unique = set(L) def space(val): """ count how many elements are between two val """ c = 0 # start with the first occurrence of val, then count for x in L[1+L.index(val):]: if x==val: yield c c = 0 else: c += 1 return sum(sum(space(val)) for val in unique) print max((space_sum(v), v) for v in permutations(L)) # there are tons of equally good solutions print sorted(permutations(L), key=space_sum, reverse=True)[:100]

+3

Jochen ritzel May 24, '10 at 13:14

source share

How to measure sparseness? By the way, a simple random shuffle may work.

+2

Zellux May 24 '10 at 12:54

source share

Sort types by quantity.

Item type 1 is placed in a linked list. (Save the middle link).
Next element Type count = c total current list size = N. Distribute item 2 in c using the "bankers rounding" from the middle of the list.
Go to 2.

+2

Charles Beattie May 24, '10 at 13:17

source share

There are good answers above about sorting and splitting the most common lines. But if you have so much data that you cannot sort or don’t want to waste time, look at quasi-random numbers ( http://mathworld.wolfram.com/QuasirandomSequence.html ). There's a simple implementation of this in the book Numerical Recipes. These are numbers that “look” random, i.e. They fill the gap, but try to avoid each other as much as possible. He used a lot in applications where you want to "randomly" try something, but instead of being random, you want to effectively display all the space.

+1

miked May 24, '10 at 17:13

source share

tzot · Accepted Answer · 2010-06-22T23:20:55+0000

If I understand your definition of "sparse" correctly, this function should be exactly what you want:

 # python ≥ 2.5 import itertools, heapq def make_sparse(sequence): grouped= sorted(sequence) item_counts= [] for item, item_seq in itertools.groupby(grouped): count= max(enumerate(item_seq))[0] + 1 item_counts.append( (-count, item) ) # negative count for heapq purposes heapq.heapify(item_counts) count1, item1= heapq.heappop(item_counts) yield item1; count1+= 1 while True: try: count2, item2= heapq.heappop(item_counts) except IndexError: # no other item remains break yield item2; count2+= 1 if count1 < 0: heapq.heappush(item_counts, (count1, item1)) item1, count1= item2, count2 # loop is done, produce remaining item1 items while count1 < 0: yield item1; count1+= 1 if __name__ == "__main__": # initial example print list(make_sparse( "duck duck duck duck goose goose goose dog".split())) # updated example print list(make_sparse([ 'duck', 'duck', 'duck', 'duck', 'duck', 'duck', 'goose', 'goose', 'goose', 'goose', 'dog', 'dog'])) # now a hard case: item 'a' appears more than: # > total_len//2 times if total_len is even # > total_len//2+1 times if total_len is odd print list(make_sparse("aaaaaabbcc"))

These examples produce this conclusion:

 ['duck', 'goose', 'duck', 'goose', 'duck', 'dog', 'duck', 'goose'] ['duck', 'goose', 'duck', 'goose', 'duck', 'dog', 'duck', 'goose', 'duck', 'dog', 'duck', 'goose'] ['a', 'b', 'a', 'c', 'a', 'b', 'a', 'c', 'a', 'a']

A quick note: in the first and second examples, reverse the output order may look more optimal.

Sorting a string in an array, making it sparsely populated

More articles: