Does this algorithm produce uniformly random permutations?

Question

Does this algorithm produce uniformly random permutations?

I am studying CLRS and found a problem with the shuffle algorithm. Does this produce uniformly random permutations?

1 PERMUTE-WITH-ALL-IDENTITY(A) 2 n = A.length 3 for i = 1 to n 4 swap A[i] with A[RANDOM(1,n)] 5 swap A[i] with A[RANDOM(i+1,n)]

My claim: No, it is not. Because line 4 there are n ^ n possible permutations. And it is not divisible by n! which represents the number of different permutations.

Can you confirm the correctness of my reasoning?

+6

math algorithm random shuffle clrs

Bek abdik May 11, '15 at 14:01

source share

2 answers

templatetypedef · Answer 1 · 2015-06-08T17:28:20+0000

Firstly, I think there is an error in the pseudo-code. In line 4, I believe that there is a boundary error with i = n, as this will require a random number between n + 1 and n. I fixed this later by assuming the code is as follows:

 1 PERMUTE-WITH-ALL-IDENTITY(A) 2 n = A.length 3 for i = 1 to n 4 swap A[i] with A[RANDOM(1,n)] 5 swap A[i] with A[RANDOM(i,n)]

If this is code, then I believe that the answer is no, it does not create uniformly random permutations. I have no mathematical proof that this is so, but I do have a piece of C ++ code that translates all possible paths through PERMUTE-WITH-ALL-IDENTITY and counts the number of times that each permutation is done. I ran this code and tested the permutation algorithm on sequences of lengths from 0 to 4, inclusive, and found that not all permutations are equally likely.

Here is the code:

 #include <iostream> #include <map> #include <vector> #include <algorithm> using namespace std; /* Maximum size of a sequence to permute. */ const size_t kMaxSize = 4; /** * Given a frequencies map associating permutations to the number of times * that we've seen them, displays a visual report of the permutations * and their frequencies. * * @param frequencies The frequencies map. */ void reportResults(const map<vector<int>, size_t>& frequencies, size_t size) { cout << "Report for size " << size << endl; cout << "===================================================" << endl; /* Print out the map. */ cout << " Map entries:" << endl; for (const auto& entry: frequencies) { cout << " "; for (const auto& num: entry.first) { cout << num; } cout << ": " << entry.second << endl; } cout << "===================================================" << endl; cout << endl << endl; } /** * Traces through all possible executions of the algorithm, recording * the number of times that each outcome occurs. This algorithm uses * exhaustive recursion to explore the full space, assuming that the * underlying random generator is uniform. * * @param elems The elements in the sequence. It assumed that initially * these are in sorted order, but as the algorithm progresses it will * become progressively more permuted. * @param frequencies A map from permutations to the number of times that * they appear. * @param index The index through the main loop that we are currently in. * This is the variable 'i' in the pseudocode. * @param size The length of the vector. This isn't strictly necessary, * but it makes the code a bit cleaner. */ void recPopulate(const vector<int>& elems, map<vector<int>, size_t>& frequencies, size_t index, size_t size) { /* If we've finished permuting everything, record in the frequency map * that we've seen this permutation one more time. */ if (index == size) { frequencies[elems]++; } else { /* For all possible pairs of a first swap and a second swap, try that * out and see what happens. */ for (size_t i = 0; i < size; i++) { // First swap index for (size_t j = index; j < size; j++) { // Second swap index /* Make a clean copy of the vector to permute. */ vector<int> newElems = elems; /* Perform the swaps. */ swap(newElems[i], newElems[index]); swap(newElems[j], newElems[index]); /* Recursively explore all possible executions of the algorithm * from this point forward. */ recPopulate(newElems, frequencies, index + 1, size); } } } } /** * Traces through all possible executions of the proposed algorithm, * building a frequency map associating each permutation with the * number of possible executions that arrive there. * * @param size The number of elements in the initial sequence. * @return A frequency map from permutations to the number of times that * permutation can be generated. */ map<vector<int>, size_t> populateFrequencies(size_t size) { /* Create the sequence 0, 1, 2, ..., size - 1 */ vector<int> elems(size); iota(elems.begin(), elems.end(), 0); map<vector<int>, size_t> frequencies; recPopulate(elems, frequencies, 0, elems.size()); return frequencies; } int main() { for (size_t size = 0; size <= kMaxSize; size++) { map<vector<int>, size_t> frequencies = populateFrequencies(size); reportResults(frequencies, size); } }

This program generates reports that show for each permutation the number of possible execution paths through the algorithm that creates this permutation. The output for permuting the four elements is shown below. Given that each execution path is equally likely, since the numbers here do not match, we can conclude that the algorithm is not uniformly random:

 Report for size 4 =================================================== Map entries: 0123: 214 0132: 268 0213: 267 0231: 316 0312: 242 0321: 229 1023: 268 1032: 322 1203: 312 1230: 349 1302: 287 1320: 262 2013: 242 2031: 283 2103: 233 2130: 262 2301: 252 2310: 240 3012: 213 3021: 208 3102: 204 3120: 187 3201: 248 3210: 236 ===================================================

The above analysis depends on the fact that

My code is correct and
My interpretation of the pseudocode is correct.

If this is incorrect, I would be happy to cancel or edit this answer. Please let me know if I made a mistake here.

Hope this helps!

Paul uszak · Answer 2 · 2015-07-27T03:02:25+0000

The above analysis suggests chi 147 with 23 degrees of freedom, which means that the P-value is <0.00001. This indicates a very poor fit with the expected uniform distribution.

But.

It seems that a total of 6144 samples. If you look at chance, I would think that would be more appropriate. It may be that the P-Value moves towards a more favorable position after the 1000th move. Do not rearrange the random generator between runs.

Does this algorithm produce uniformly random permutations?

More articles: