Your problem is complex because there are some edge cases to think about:
- Strings with duplicate characters (i.e. how would you shuffle "aaaab"?)
- How do you measure character string bindings or rebuild blocks?
In any case, the metric defined to shuffle the rows to a certain percentage is likely to be the same that you use in your algorithm to see how close they are.
My code for shuffling n characters:
import random def shuffle_n(s, n): idx = range(len(s)) random.shuffle(idx) idx = idx[:n] mapping = dict((idx[i], idx[i-1]) for i in range(n)) return ''.join(s[mapping.get(x,x)] for x in range(len(s)))
Basically, he selects the positions n for swapping in a random order, and then exchanges each of them with the next one in the list ... Thus, he ensures that reverse swaps will not be created and only the characters n will be replaced (if the characters are repeated there, failure) .
Explained run with 'string', 3 as input:
idx is [0, 1, 2, 3, 4, 5] we shuffle it, now it is [5, 3, 1, 4, 0, 2] we take just the first 3 elements, now it is [5, 3, 1] those are the characters that we are going to swap string ^ ^ ^ t (1) will be i (3) i (3) will be g (5) g (5) will be t (1) the rest will remain unchanged so we get 'sirgnt'
The bad thing about this method is that it does not generate all possible options, for example, it cannot make "gnrits" from "string". This could be fixed by shuffling the index partitions as follows:
import random def randparts(l): n = len(l) s = random.randint(0, n-1) + 1 if s >= 2 and n - s >= 2: