Strategy for changing the permutation algorithm to prevent duplication of printouts

I looked at algorithms for practice, and now I look at a permutation algorithm that I really like:

void permute(char* set, int begin, int end) { int range = end - begin; if (range == 1) cout << set << endl; else { for(int i = 0; i < range; ++i) { swap(&set[begin], &set[begin+i]); permute(set, begin+1, end); swap(&set[begin], &set[begin+i]); } } } 

I really wanted to apply this to a situation where there would be many duplicate characters, so I need to be able to modify it to prevent duplicate permutations from printing.

How would I know that I am creating a duplicate? I know that I can store this in a hash or something similar, but this is not the optimal solution - I would prefer one that does not require additional storage. Can anyone give me a suggestion?

PS: I don’t want to use STL permutation mechanisms, and I don’t want to refer to another “unique permutation algorithm” somewhere. I would like to understand the mechanism used to prevent duplication, so I can build it in this, if possible, find out.

+4
source share
5 answers

There is no general way to prevent arbitrary functions from duplicating. Of course, you can always filter out duplicates, but you do not want this for very good reasons. Therefore, you need a special way to create only non-duplicates.

One way would be to generate permutations in increasing lexicographical order. Then you can simply compare if the “new” permutation is the same as the last, and then skip it. It gets even better: the algorithm for generating permutations to increase the lexicographical order specified in http://en.wikipedia.org/wiki/Permutations#Generation_in_lexicographic_order does not even create duplicates at all!

However , this is not the answer to your question, as it is a different algorithm (albeit based on the exchange).

So let's take a closer look at your algorithm. One of the key observations:

  • Once the character changes to begin , it will remain there for all nested permute calls.

We combine this with the following general observation of permutations:

  • If you rearrange the string s , but only in those places where there is the same character, s will remain unchanged. In fact, all duplicate permutations have a different order for the occurrence of some character c, where c occurs at the same positions.

OK, so all we need to do is make sure that the occurrences of each character are always in the same order as at the beginning. The code follows, but ... I really do not speak C ++, so I will use Python and hope to leave with a statement about its pseudocode.

Let's start with your original pseudo-code rewritten algorithm:

 def permute(s, begin, end): if end == begin + 1: print(s) else: for i in range(begin, end): s[begin], s[i] = s[i], s[begin] permute(s, begin + 1, end) s[begin], s[i] = s[i], s[begin] 

and an auxiliary function that makes it easier to call:

 def permutations_w_duplicates(s): permute(list(s), 0, len(s)) # use a list, as in Python strings are not mutable 

Now we extend the permutation function with some bookkeeping about how many times a certain character has been replaced by the begin position (i.e. it has been fixed), and we also remember the original order of occurrence of each character ( char_number ). Each character that we are trying to swap to the begin position must then be next higher in the original order, that is, the number of corrections for a symbol determines which original origin of this symbol can be corrected as follows: I call it next_fixable .

 def permute2(s, next_fixable, char_number, begin, end): if end == begin + 1: print(s) else: for i in range(begin, end): if next_fixable[s[i]] == char_number[i]: next_fixable[s[i]] += 1 char_number[begin], char_number[i] = char_number[i], char_number[begin] s[begin], s[i] = s[i], s[begin] permute2(s, next_fixable, char_number, begin + 1, end) s[begin], s[i] = s[i], s[begin] char_number[begin], char_number[i] = char_number[i], char_number[begin] next_fixable[s[i]] -= 1 

Again, we use a helper function:

 def permutations_wo_duplicates(s): alphabet = set(s) next_fixable = dict.fromkeys(alphabet, 0) count = dict.fromkeys(alphabet, 0) char_number = [0] * len(s) for i, c in enumerate(s): char_number[i] = count[c] count[c] += 1 permute2(list(s), next_fixable, char_number, 0, len(s)) 

What is it!

Nearly. You can stop here and rewrite to C ++ if you want, but if you are interested in some test data, read on.

I used slightly different code for testing because I did not want to print all permutations. In Python, you replace print with yield , turning the function into a generator function, the result of which can be iterated over with a for loop, and permutations will be calculated only as needed. This is the real code and test I used:

 def permute2(s, next_fixable, char_number, begin, end): if end == begin + 1: yield "".join(s) # join the characters to form a string else: for i in range(begin, end): if next_fixable[s[i]] == char_number[i]: next_fixable[s[i]] += 1 char_number[begin], char_number[i] = char_number[i], char_number[begin] s[begin], s[i] = s[i], s[begin] for p in permute2(s, next_fixable, char_number, begin + 1, end): yield p s[begin], s[i] = s[i], s[begin] char_number[begin], char_number[i] = char_number[i], char_number[begin] next_fixable[s[i]] -= 1 def permutations_wo_duplicates(s): alphabet = set(s) next_fixable = dict.fromkeys(alphabet, 0) count = dict.fromkeys(alphabet, 0) char_number = [0] * len(s) for i, c in enumerate(s): char_number[i] = count[c] count[c] += 1 for p in permute2(list(s), next_fixable, char_number, 0, len(s)): yield p s = "FOOQUUXFOO" A = list(permutations_w_duplicates(s)) print("%s has %s permutations (counting duplicates)" % (s, len(A))) print("permutations of these that are unique: %s" % len(set(A))) B = list(permutations_wo_duplicates(s)) print("%s has %s unique permutations (directly computed)" % (s, len(B))) print("The first 10 permutations :", A[:10]) print("The first 10 unique permutations:", B[:10]) 

And the result:

 FOOQUUXFOO has 3628800 permutations (counting duplicates) permutations of these that are unique: 37800 FOOQUUXFOO has 37800 unique permutations (directly computed) The first 10 permutations : ['FOOQUUXFOO', 'FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUXOOF', 'FOOQUUXOFO', 'FOOQUUFXOO', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX'] The first 10 unique permutations: ['FOOQUUXFOO', 'FOOQUUXOFO', 'FOOQUUXOOF', 'FOOQUUFXOO', 'FOOQUUFOXO', 'FOOQUUFOOX', 'FOOQUUOFXO', 'FOOQUUOFOX', 'FOOQUUOXFO', 'FOOQUUOXOF'] 

Note that permutations are calculated in the same order as the original algorithm, without duplicates. Please note that 37800 * 2! * 2! * 4! = 3628800, as you would expect.

+5
source

You can add an if statement to prevent swap code from executing if it replaces two identical characters. Then the for <

 for(int i = 0; i < range; ++i) { if(i==0 || set[begin] != set[begin+i]) { swap(&set[begin], &set[begin+i]); permute(set, begin+1, end); swap(&set[begin], &set[begin+i]); } } 

The reason for allowing case i==0 is that the recursive call occurs exactly once, even if all the characters in the set are the same.

+2
source

OPTION 1

One option is to use 256 bits of storage on the stack to store the bitmask, what characters you tried in the for loop, and only to resend new characters.

OPTION 2

The second option is to use the approach suggested in the comments ( http://n1b-algo.blogspot.com/2009/01/string-permutations.html ) and change the for loop to:

 else { char last=0; for(int i = 0; i < range; ++i) { if (last==set[begin+i]) continue; last = set[begin+i]; swap(&set[begin], &set[begin+i]); permute(set, begin+1, end); swap(&set[begin], &set[begin+i]); } } 

However, to use this approach, you will also have to sort the characters set [begin], set [begin + 1], ... set [end-1] when entering the function.

Note that you need to sort every time the function is called. (The blog post does not seem to mention this, but otherwise you will create too many results for the input string “aabbc.” The problem is that the string will not be sorted after using swap.)

It is still not very effective. For example, for a string containing 1 'a' and N 'b, this approach will eventually cause sorting N times for a total of N ^ 2logN

OPTION 3

A more efficient approach for long strings containing many repetitions will be to support both a “set” of strings and a dictionary of how many of each character type you have left to use. The for loop would change to a loop over the dictator’s keys, as these would be the unique characters that are allowed in this position.

This will have a complexity equal to the number of output lines, and only a very small amount of storage to store the dictionary.

+1
source

A simple solution is to randomly change duplicate characters to characters that are not yet present. Then, after rearranging, change the characters back. Accept only permutation if its characters are in order.

eg. if you have "a, b, b"

you would have the following:

 abb abb bab bab bba bba 

But, if we start with a, b, b and mark duplicate b, then we can change the second b to c

now we have bc

 abc - accept because b is before c. change c back to b to get abb acb - reject because c is before b bac - accept as bab bca - accept as bba cba - reject as c comes before b. cab - reject as c comes before b. 
+1
source

The key is not to change the same character twice. That way you can use unordered_set to remember which characters were replaced.

 void permute(string& input, int begin, vector<string>& output) { if (begin == input.size()){ output.push_back(input); } else { unordered_set<char> swapped; for(int i = begin; i < input.size(); i++) { // Do not swap a character that has been swapped if(swapped.find(input[i]) == swapped.end()){ swapped.insert(input[i]); swap(input[begin], input[i]); permute(input, begin+1, output); swap(input[begin], input[i]); } } } } 

You can go through your source code manually, and you will find those cases when duplicates occur: " replacement with a changed character. "

Ex: input = "BAA"

index = 0, i = 0, input = "BAA"

----> index = 1, i = 1, input = "BAA"

----> index = 1, i = 2, input = "BAA" (duplicate)

index = 0, i = 1, input = "ABA"

----> index = 1, i = 1, input = "ABA"

----> index = 1, i = 2, input = "AAB"

index = 0, i = 2, input = "AAB"

----> index = 1, i = 1, input = "AAB" (duplicate)

----> index = 1, i = 2, input = "ABA" (duplicate)

0
source

Source: https://habr.com/ru/post/1398950/


All Articles