Algorithm for creating permutations of a list of strings and their substrings

This algorithm has eluded me for some time. Suppose I was given the string "cccaatt". I am trying to create all possible variations of each substring of repeating letters. EG, "cccaatt" as the return value:

cat, CATT, CAAT, caatt, CCAT, ccatt, CCAAT, ccaatt, cccat, cccatt, cccaat, cccaatt

The order of the results does not matter if it returns all of them. Typically, an input is a string consisting of g groups of repeated letters, each group of length k_n.

My intuition is that it is a recursive algorithm, but its exact structure was difficult to understand.

+4
source share
3 answers

If you keep the alphabet and the maximum occurrences of each letter (as noted in the comments), you can do this:

function variations(letter_type, current string) { if (letter_type is in the alphabet) { while (string has fewer than the max amount of that letter) { add one of that letter to current string variations(next letter, current string) } } else { print current string // since there are no more letters to add } } 

In Java:

 public class Variations { static String[] alphabet = {"c","a","t"}; static int[] maximums = {3, 2, 2}; public static void main(String[] args) { variations(0, ""); } public static void variations(int letter_type, String curr) { if (letter_type < alphabet.length) { for (int i = 1; i <= maximums[letter_type]; i++) { curr += alphabet[letter_type]; variations(letter_type+1, curr); } } else { System.out.println(curr); } } } 
+2
source

Expand the line in a list of numbers and the number of repetitions, i.e. "cccaatt" => [(c, 3), (a, 2), (t, 2)]. then the problem can be defined recursively.

 Let xs = [(a_1, n_1), (a_2, n_2), (a_3, n_3), ... (a_k, n_k)] define Perm(xs): if len(xs) == 1: return all length variations of xs else: return every sequence in Perm(x[:-1]) appended with one or more from x[-1] 

I will have a python example soon.

 > perm("cccaatt") > ['cat', 'ccat', 'cccat', 'caat', 'ccaat', 'cccaat', 'catt', 'ccatt', 'cccatt', 'caatt', 'ccaatt', 'cccaatt'] 

The code is attached

 def perm(xs): if not xs: return [] # group them into the correct format, probably should have used groupby + zip l = [(xs[0],1)] for x in xs[1:]: last, num = l[-1] if last == x: l[-1] = (last, num+1) else: l.append((x, 1)) # print(l) print(recurse(l)) # this is where the real work is done. def recurse(xs): if len(xs) == 1: return [ xs[0][0] * x for x in range(1, xs[0][1] + 1) ] prev = recurse(xs[:-1]) char, num = xs[-1] return [ y + x * char for x in range(1,num + 1) for y in prev ] 
+1
source

The Python itertools module has powerful tools for grouping and then for iterating through group members leading to the next program.

I showed some intermediate results and used the pprint module to print the answer:

 Python 2.7.3 (default, Aug 1 2012, 05:16:07) [GCC 4.6.3] on linux2 Type "copyright", "credits" or "license()" for more information. >>> import itertools >>> instring = "cccaatt" >>> [(x[0], list(x[1])) for x in itertools.groupby(instring)] [('c', ['c', 'c', 'c']), ('a', ['a', 'a']), ('t', ['t', 't'])] >>> xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)] >>> xx [['c', 'cc', 'ccc'], ['a', 'aa'], ['t', 'tt']] >>> list(itertools.product(*xx)) [('c', 'a', 't'), ('c', 'a', 'tt'), ('c', 'aa', 't'), ('c', 'aa', 'tt'), ('cc', 'a', 't'), ('cc', 'a', 'tt'), ('cc', 'aa', 't'), ('cc', 'aa', 'tt'), ('ccc', 'a', 't'), ('ccc', 'a', 'tt'), ('ccc', 'aa', 't'), ('ccc', 'aa', 'tt')] >>> from pprint import pprint as pp >>> pp(list(itertools.product(*xx))) [('c', 'a', 't'), ('c', 'a', 'tt'), ('c', 'aa', 't'), ('c', 'aa', 'tt'), ('cc', 'a', 't'), ('cc', 'a', 'tt'), ('cc', 'aa', 't'), ('cc', 'aa', 'tt'), ('ccc', 'a', 't'), ('ccc', 'a', 'tt'), ('ccc', 'aa', 't'), ('ccc', 'aa', 'tt')] >>> 

Or as a function:

 >>> def stringexpand(instring): xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)] return list(itertools.product(*xx)) >>> pp(stringexpand("cccaatt")) [('c', 'a', 't'), ('c', 'a', 'tt'), ('c', 'aa', 't'), ('c', 'aa', 'tt'), ('cc', 'a', 't'), ('cc', 'a', 'tt'), ('cc', 'aa', 't'), ('cc', 'aa', 'tt'), ('ccc', 'a', 't'), ('ccc', 'a', 'tt'), ('ccc', 'aa', 't'), ('ccc', 'aa', 'tt')] >>> 

It seems you need strings connected to their parts. This can be done in this small mod:

 def stringexpand(instring): xx = [list(x[0]*n for n in range(1, len(list(x[1]))+1)) for x in itertools.groupby(instring)] return [''.join(parts) for parts in itertools.product(*xx)] 

What returns:

 ['cat', 'catt', 'caat', 'caatt', 'ccat', 'ccatt', 'ccaat', 'ccaatt', 'cccat', 'cccatt', 'cccaat', 'cccaatt'] 
+1
source

Source: https://habr.com/ru/post/1432396/


All Articles