Why does Python's itertools.permutations contain duplicates? (When the original list has duplicates)

It is generally agreed that a list of n distinct symbols has n! permutations. However, when the symbols are not distinct, the most common convention, in mathematics and elsewhere, seems to be to count only the distinct permutations. Thus the permutations of the list [1, 1, 2] are usually taken to be [1, 1, 2], [1, 2, 1], [2, 1, 1]. Indeed, the following C++ code prints exactly those three:

    int a[] = {1, 1, 2};
    do {
        cout << a[0] << " " << a[1] << " " << a[2] << endl;
    } while (next_permutation(a, a + 3));
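(As a quick sanity check on that convention, the number of distinct permutations of a multiset is n! divided by the product of the factorials of the multiplicities. A small Python sketch of my own, not part of the original code; the helper name is illustrative:)

```python
from math import factorial
from collections import Counter

def count_distinct_permutations(seq):
    # n! / (m1! * m2! * ...), where the m's are the value multiplicities
    total = factorial(len(seq))
    for multiplicity in Counter(seq).values():
        total //= factorial(multiplicity)
    return total

print(count_distinct_permutations([1, 1, 2]))  # 3, matching the three listed above
```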

On the other hand, Python's itertools.permutations seems to print something different:

    import itertools
    for a in itertools.permutations([1, 1, 2]):
        print a

This prints:

    (1, 1, 2)
    (1, 2, 1)
    (1, 1, 2)
    (1, 2, 1)
    (2, 1, 1)
    (2, 1, 1)
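(In other words, the six tuples are the 3! arrangements of positions; collapsing them by value leaves only three. A minimal check of my own:)

```python
import itertools

perms = list(itertools.permutations([1, 1, 2]))
print(len(perms))       # 6: one tuple per arrangement of positions (3!)
print(len(set(perms)))  # 3: only three distinct value-tuples remain
```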

As user Artiom Rudzenka pointed out in an answer, the Python documentation says:

Elements are treated as unique based on their position, not on their value.

My question is: why was this design decision made?

It seems to me that results following the usual convention would be more useful (and indeed that is exactly what I want)... or is there some application of Python's behaviour that I am missing?

[Or is it an implementation issue? The algorithm used in next_permutation, as explained on StackOverflow here (by me) and shown here to be O(1) amortised, seems efficient and implementable in Python, but is Python doing something even more efficient, since it does not guarantee lexicographic order based on value? And, if so, was the efficiency gain considered worth it?]

+47
python algorithm language-design permutation
Jun 30 '11 at 12:03
5 answers

I can't speak for the designer of itertools.permutations (Raymond Hettinger), but it seems to me that there are a couple of points in favour of the design:

First, if you used a next_permutation-style design, you would be restricted to passing in objects that support a linear ordering, whereas itertools.permutations provides permutations of any kind of object. Imagine how annoying this would be:

    >>> list(itertools.permutations([1+2j, 1-2j, 2+1j, 2-1j]))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: no ordering relation is defined for complex numbers
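In fact, because it never orders the elements, itertools.permutations handles unorderable objects such as complex numbers without complaint. A quick illustration of my own, not from the original answer:

```python
import itertools

# Complex numbers support == but not <, yet permutations works fine:
items = [1 + 2j, 1 - 2j, 2 + 1j, 2 - 1j]
print(len(list(itertools.permutations(items))))  # 24, i.e. 4!
```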

Second, by not testing objects for equality, itertools.permutations avoids paying the cost of calling the __eq__ method in the usual case where it isn't needed.
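One way to see this (my own sketch, not from the original answer; the class name is illustrative) is to instrument __eq__ and observe that itertools.permutations never calls it:

```python
import itertools

class Noisy(object):
    """Counts equality comparisons made against instances."""
    eq_calls = 0

    def __eq__(self, other):
        Noisy.eq_calls += 1
        return self is other

# permutations shuffles indices into the pool; it never compares the elements.
list(itertools.permutations([Noisy(), Noisy(), Noisy()]))
print(Noisy.eq_calls)  # 0: no equality tests were made
```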

Basically, itertools.permutations solves the common case reliably and cheaply. There is certainly an argument that itertools ought to provide a function that avoids duplicate permutations, but such a function should be in addition to itertools.permutations, not instead of it. Why not write such a function and submit a patch?

+27
Jun 30 '11

I am accepting Gareth Rees's answer as the most appealing explanation (short of an answer from the Python library designers), namely that Python's itertools.permutations does not compare the values of the elements. Thinking about it that way was what prompted the question, but now I see how it can be viewed as an advantage, depending on what itertools.permutations is typically used for.

Just for completeness, I compared three methods of generating all the distinct permutations. Method 1, which is wildly inefficient in memory and in time but requires the least new code, is to wrap Python's itertools.permutations, as in zeekay's answer. Method 2 is a generator-based version of C++'s next_permutation, from this blog post. Method 3 is something I wrote that is even closer to C++'s next_permutation algorithm; it modifies the list in place (I haven't made it overly general).

    def next_permutationS(l):
        n = len(l)
        # Step 1: find tail
        last = n - 1  # tail is from `last` to end
        while last > 0:
            if l[last-1] < l[last]:
                break
            last -= 1
        # Step 2: increase the number just before tail
        if last > 0:
            small = l[last-1]
            big = n - 1
            while l[big] <= small:
                big -= 1
            l[last-1], l[big] = l[big], small
        # Step 3: reverse tail
        i = last
        j = n - 1
        while i < j:
            l[i], l[j] = l[j], l[i]
            i += 1
            j -= 1
        return last > 0
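To turn the in-place step into a full generator of distinct permutations, a driver loop like the following can be used (my own sketch, not from the original answer; it restates next_permutationS so the snippet stands alone, and it assumes the input is sorted before iterating):

```python
def next_permutationS(l):
    n = len(l)
    last = n - 1                      # Step 1: find the tail
    while last > 0:
        if l[last - 1] < l[last]:
            break
        last -= 1
    if last > 0:                      # Step 2: bump the pivot just before the tail
        small = l[last - 1]
        big = n - 1
        while l[big] <= small:
            big -= 1
        l[last - 1], l[big] = l[big], small
    i, j = last, n - 1                # Step 3: reverse the tail
    while i < j:
        l[i], l[j] = l[j], l[i]
        i += 1
        j -= 1
    return last > 0                   # False once we have wrapped back to sorted order

def distinct_permutations(seq):
    l = sorted(seq)                   # must start from sorted order
    while True:
        yield tuple(l)
        if not next_permutationS(l):
            return

print(list(distinct_permutations([1, 1, 2])))
# [(1, 1, 2), (1, 2, 1), (2, 1, 1)]
```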

Here are some results. I now have even more respect for the Python built-in: it is about three to four times as fast as the other methods when all (or almost all) of the elements are distinct. Of course, when there are many repeated elements, using it is a terrible idea.

    Some results ("us" means microseconds):

    l                                     m_itertoolsp  m_nextperm_b  m_nextperm_s
    [1, 1, 2]                                  5.98 us       12.3 us       7.54 us
    [1, 2, 3, 4, 5, 6]                         0.63 ms       2.69 ms       1.77 ms
    [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]            6.93 s        13.68 s       8.75 s
    [1, 2, 3, 4, 6, 6, 6]                      3.12 ms       3.34 ms       2.19 ms
    [1, 2, 2, 2, 2, 3, 3, 3, 3, 3]             2400 ms       5.87 ms       3.63 ms
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 2]          2320000 us       89.9 us       51.5 us
    [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4, 4]     429000 ms        361 ms        228 ms

The code is here if anyone wants to explore.

+15
Jul 04 '11 at 13:10

It's pretty easy to get the behaviour you prefer by wrapping itertools.permutations, which may have influenced the decision. As described in the documentation, itertools is designed as a collection of building blocks/tools to use in constructing your own iterators.

    def unique(iterable):
        seen = set()
        for x in iterable:
            if x in seen:
                continue
            seen.add(x)
            yield x

    >>> for a in unique(permutations([1, 1, 2])):
    ...     print a
    (1, 1, 2)
    (1, 2, 1)
    (2, 1, 1)

However, as mentioned in the comments, this may not be as efficient as you'd like:

    >>> %timeit iterate(permutations([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2]))
    1 loops, best of 3: 4.27 s per loop
    >>> %timeit iterate(unique(permutations([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2])))
    1 loops, best of 3: 13.2 s per loop

Perhaps, if there is enough interest, a new function or an optional argument could be added to itertools.permutations to generate permutations without duplicates more efficiently.

+13
Jun 30 '11

I am also surprised that itertools does not have a function for the more intuitive notion of unique permutations. Generating repeated permutations only to select the unique ones among them is out of the question for any serious application.

I have written my own recursive generator function that behaves like itertools.permutations but does not return duplicates. It only considers permutations of the full original list; partial selections can be built with the standard itertools library.

    def unique_permutations(t):
        lt = list(t)
        lnt = len(lt)
        if lnt == 1:
            yield lt
        st = set(t)
        for d in st:
            lt.remove(d)
            for perm in unique_permutations(lt):
                yield [d] + perm
            lt.append(d)
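A quick check of the generator's output (a condensed restatement of the function above so the snippet runs on its own; note that it yields lists rather than tuples, and the yield order depends on set iteration order, hence the sort):

```python
def unique_permutations(t):
    lt = list(t)
    if len(lt) == 1:
        yield lt
    for d in set(t):
        lt.remove(d)                     # fix one leading value d...
        for perm in unique_permutations(lt):
            yield [d] + perm             # ...and permute the rest
        lt.append(d)                     # restore before trying the next d

print(sorted(unique_permutations([1, 1, 2])))
# [[1, 1, 2], [1, 2, 1], [2, 1, 1]]
```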
+3
Jan 16 '13 at 22:52

I may be mistaken, but it seems that the reason for this is the documented rule "Elements are treated as unique based on their position, not on their value." Therefore, if the input elements are unique, there will be no duplicates among the permutations. You supplied (1, 1, 2), and from your point of view the 1 at index 0 and the 1 at index 1 are the same, but they are not, because the Python implementation looks elements up by index rather than by value.

So, if we look at the pure-Python equivalent of permutations given in the documentation, we can see that it uses indices:

    def permutations(iterable, r=None):
        pool = tuple(iterable)
        n = len(pool)
        r = n if r is None else r
        for indices in product(range(n), repeat=r):
            if len(set(indices)) == r:
                yield tuple(pool[i] for i in indices)
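Because the filter above works purely on indices, the docs-equivalent reproduces the built-in's output exactly, duplicates included. A quick comparison of my own (restating the equivalent under the hypothetical name perms_from_docs so this runs standalone):

```python
import itertools
from itertools import product

def perms_from_docs(iterable, r=None):
    # index-based equivalent from the itertools documentation
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)

print(list(perms_from_docs([1, 1, 2])) == list(itertools.permutations([1, 1, 2])))  # True
```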

For example, if you change your input to [1, 2, 3], you get the correct permutations ([(1, 2, 3), (1, 3, 2), (2, 1, 3), (2, 3, 1), (3, 1, 2), (3, 2, 1)]), because the values are unique.

+1
Jun 30 '11 at 12:16


