Sort Python list by group size

Question

I have a list of items, each marked with a group label:

    item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]

I want to sort them by group size. For example, in the list above label 3 has a group size of 3 and label 2 has a group size of 2.

I tried a combination of groupby and sorted, but it did not work:

    In [162]: sil = sorted(item_labels, key=op.itemgetter(1))
    In [163]: sil
    Out[163]: [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
    In [164]: g = itt.groupby(sil, key=op.itemgetter(1))
    In [165]: for k, v in g:
       .....:     print k, list(v)
       .....:
    1 [('c', 1)]
    2 [('b', 2), ('e', 2)]
    3 [('a', 3), ('d', 3), ('f', 3)]
    In [166]: sg = sorted(g, key=lambda x: len(list(x[1])))
    In [167]: sg
    Out[167]: []  # not sure why I got an empty list here

I can always write some tedious for-loops for this, but I would rather find something more elegant. Any suggestions? If there are libraries that would help (e.g. pandas, scipy), I will be happy to use them.

+4

python sorted python-2.6 itertools

clwen Jun 24 '13 at 21:35

5 answers

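    from collections import defaultdict
    import operator

    l = [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]

    # count how many items carry each label
    d = defaultdict(int)
    for p in l:
        d[p[1]] += 1

    # walk the labels from the smallest group to the largest and emit their items;
    # i is a (label, count) pair, so each item's label is matched against i[0]
    print [p for i in sorted(d.iteritems(), key=operator.itemgetter(1))
           for p in l if p[1] == i[0]]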

+3

perreal Jun 24 '13 at 9:51

itertools.groupby returns an iterator, so the for loop (for k, v in g:) actually consumes it. Nothing is left by the time sorted is called.

    >>> it = iter([1, 2, 3])
    >>> for x in it: pass
    ...
    >>> list(it)  # iterator already consumed by the for-loop
    []

code:

    >>> lis = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
    >>> from operator import itemgetter
    >>> from itertools import groupby
    >>> lis.sort(key=itemgetter(1))
    >>> new_lis = [list(v) for k, v in groupby(lis, key=itemgetter(1))]
    >>> new_lis.sort(key=len)
    >>> new_lis
    [[('c', 1)], [('b', 2), ('e', 2)], [('a', 3), ('d', 3), ('f', 3)]]

To get a flattened list, use itertools.chain:

    >>> from itertools import chain
    >>> list(chain.from_iterable(new_lis))
    [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]
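The steps above (sort by label, group, sort the groups by length, flatten) can be folded into one helper. This is only a minimal sketch, not part of the original answer: the name sort_by_group_size and its key parameter are made up for illustration, and it uses print as a function so that it also runs on Python 3.

```python
from itertools import chain, groupby
from operator import itemgetter


def sort_by_group_size(pairs, key=itemgetter(1)):
    """Return a new list ordered by ascending size of each item's group."""
    # groupby only merges adjacent items, so sort by the group key first
    by_label = sorted(pairs, key=key)
    # turn every group into a list right away; the group iterators
    # share their underlying iterator with groupby
    groups = [list(members) for _, members in groupby(by_label, key=key)]
    # list.sort is stable, so groups of equal size stay in label order
    groups.sort(key=len)
    return list(chain.from_iterable(groups))


item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
print(sort_by_group_size(item_labels))
```

Because the helper returns a new list, item_labels itself is left untouched.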

+2

Ashwini chaudhary Jun 24 '13 at 9:38

Same as the answers by @perreal and @Elazar, but with better names:

    from collections import defaultdict

    size = defaultdict(int)
    for _, group_id in item_labels:
        size[group_id] += 1

    item_labels.sort(key=lambda (_, group_id): size[group_id])
    print item_labels
    # -> [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]

+2

jfs Jun 24 '13 at 22:16

Here is another way:

    example = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
    out = {}
    for t in example:
        out.setdefault(t[1], []).append(t)
    print sorted(out.values(), key=len)

Prints:

 [[('c', 1)], [('b', 2), ('e', 2)], [('a', 3), ('d', 3), ('f', 3)]]

If you need a flat list:

    print [l for s in sorted(out.values(), key=len) for l in s]
    # -> [('c', 1), ('b', 2), ('e', 2), ('a', 3), ('d', 3), ('f', 3)]

+1

dawg Jun 24 '13 at 22:25

Elazar · Accepted Answer · 2013-06-24T21:40:08+0000

In Python 2.7 and above, use a Counter:

    from collections import Counter

    c = Counter(y for _, y in item_labels)
    item_labels.sort(key=lambda t: c[t[1]])
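One caveat worth checking against your own data: this key depends only on the group size. When two labels have the same count, the sort treats their items as equal and, being stable, leaves them interleaved in their original order. If each group should stay contiguous, one possible fix is to break ties with the label itself. The sketch below is an assumption about what is wanted, not part of the original answer; the sample items and the variable names are invented for the demonstration.

```python
from collections import Counter

# labels 1 and 2 both occur twice, label 7 occurs once
items = [('a', 1), ('b', 2), ('c', 1), ('d', 2), ('e', 7)]
counts = Counter(label for _, label in items)

# key on size only: the two equal-sized groups stay interleaved
by_size = sorted(items, key=lambda t: counts[t[1]])

# key on (size, label): equal-sized groups become contiguous
by_size_then_label = sorted(items, key=lambda t: (counts[t[1]], t[1]))

print(by_size)
print(by_size_then_label)
```

With the sample data the first result keeps a, b, c, d in their input order after e, while the second places both items of label 1 ahead of both items of label 2.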

In Python 2.6, the Counter constructor can be emulated for this purpose with a defaultdict (as suggested by @perreal):

    from collections import defaultdict

    def Counter(x):
        d = defaultdict(int)
        for v in x:
            d[v] += 1
        return d

Since the labels are all integers, and assuming they are small and non-negative as in your example, we can even use a plain list indexed by label (which also works on still older versions of Python):

    def Counter(x):
        lst = list(x)
        d = [0] * (max(lst) + 1)
        for v in lst:
            d[v] += 1
        return d

Without a counter, you can simply do this:

 item_labels.sort(key=lambda t : len([x[1] for x in item_labels if x[1]==t[1] ]))

This is slower, because the whole list is rescanned for every item (quadratic in the list length), but it is reasonable for short lists.

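The reason you got an empty list is that g is an iterator, and an iterator can only be consumed once. The for loop in In [165] had already exhausted it, so sorted(g, ...) had nothing left to sort.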
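To make this concrete, here is a small sketch (written for Python 3, with variable names chosen for illustration) that reproduces the empty result and then repairs the original approach. Each group is turned into a list while iterating, because the itertools documentation notes that a group shares its underlying iterator with groupby and is no longer visible once groupby advances.

```python
from itertools import groupby
from operator import itemgetter

item_labels = [('a', 3), ('b', 2), ('c', 1), ('d', 3), ('e', 2), ('f', 3)]
sil = sorted(item_labels, key=itemgetter(1))

g = groupby(sil, key=itemgetter(1))
for k, v in g:  # this loop exhausts g
    pass
leftover = sorted(g, key=lambda kv: kv[0])
print(leftover)  # g has nothing left to yield

# Fix: build a fresh groupby and materialize each group immediately
groups = [(k, list(v)) for k, v in groupby(sil, key=itemgetter(1))]
groups.sort(key=lambda kv: len(kv[1]))
print(groups)
```

Once the groups are plain lists they can be sorted, measured with len, and iterated over as often as needed.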
