Algorithm for extracting any possible combination of a sublist of two lists

Question

Algorithm for extracting any possible combination of a sublist of two lists

Suppose I have two lists, as I repeat every possible combination of each sublist, so that each item appears once and only once.

I think that an example can be, if you have employees and tasks, and you want to divide them into teams, where each employee can be in only one team, and each work can be in only one team. For instance,

List<string> employees = new List<string>() { "Adam", "Bob"} ; List<string> jobs = new List<string>() { "1", "2", "3"};

I want

 Adam : 1 Bob : 2 , 3 Adam : 1 , 2 Bob : 3 Adam : 1 , 3 Bob : 2 Adam : 2 Bob : 1 , 3 Adam : 2 , 3 Bob : 1 Adam : 3 Bob : 1 , 2 Adam, Bob : 1, 2, 3

I tried using the answer to this stackoverflow question to create a list of all possible combinations of employees and each possible combination of tasks, and then select one item from each of them, each list, but this is about the way I got it.

I do not know the maximum size of the lists, but it will certainly be less than 100, and there may be other limiting factors (for example, each team can have no more than 5 employees).

Update

I'm not sure that this can be removed more and / or simplified, but this is what I have so far turned out to be.

It uses the Group algorithm provided by Yorye (see its answer below), but I removed an order that I don't need and caused problems if the keys are not comparable.

 var employees = new List<string>() { "Adam", "Bob" } ; var jobs = new List<string>() { "1", "2", "3" }; int c= 0; foreach (int noOfTeams in Enumerable.Range(1, employees.Count)) { var hs = new HashSet<string>(); foreach( var grouping in Group(Enumerable.Range(1, noOfTeams).ToList(), employees)) { // Generate a unique key for each group to detect duplicates. var key = string.Join(":" , grouping.Select(sub => string.Join(",", sub))); if (!hs.Add(key)) continue; List<List<string>> teams = (from r in grouping select r.ToList()).ToList(); foreach (var group in Group(teams, jobs)) { foreach (var sub in group) { Console.WriteLine(String.Join(", " , sub.Key ) + " : " + string.Join(", ", sub)); } Console.WriteLine(); c++; } } } Console.WriteLine(String.Format("{0:n0} combinations for {1} employees and {2} jobs" , c , employees.Count, jobs.Count));

Since I don't care about the order of the results, this seems to give me what I need.

+6

c # algorithm

sgmoore Jun 04 '12 at 20:00

source share

5 answers

Good question.

First of all, before writing the code down, let's look at the basic combinatorics of your question.

Basically, you require that for any section of set A you need the same number of parts in set B.

eg. If you divide groups A into 3, you need to set B to be divided into 3 groups, if you do not have at least one element will not have the corresponding group in another set.
It is easier to imagine this with a partition of the set A into 1 group, we should have one group from the set B, as in your example (Adam, Bob: 1, 2, 3).

So now we know that the set A has n elements and that B has elements k .
Therefore, naturally, we cannot request that any set be more divided into Min(n,k) .

Let's say we divided both sets into 2 groups (sections), now we have 1*2=2! unique pairs between two sets.
Another example is 3 groups, each of which will give us 1*2*3=3! unique couples.

However, we have not finished yet, after any set is divided into several subsets (groups), we can still order the elements in many combinations.
So, for m sections of the set, we need to find how many combinations of placing elements n in sections m .
This can be found using the Stirling number of a formula of the second kind:

(Eq 1)

This formula gives you the number of ways to split a set of elements n into k nonempty sets.

Thus, the total number of combinations of pairs for all sections is from 1 to Min(n,k) :

(Eq 2)

In short, this is the sum of all combinations of sections of both sets, once all combinations of pairs.

So, now we understand how to separate and correlate our data, we can write the code down:

The code:

So, if we look at our final equation (2), we understand that we need four pieces of code for our solution.
1. Summation (or cycle)
2. A way to get our Stirling sets or sections from both sets
3. A method for obtaining the Cartesian product of both Stirling sets.
4. The way to rearrange the elements in the set. (P!)

In StackOverflow you can find many ways to permute elements and ways to search for Cartesian products, here is an example (as an extension method):

  public static IEnumerable<IEnumerable<T>> Permute<T>(this IEnumerable<T> list) { if (list.Count() == 1) return new List<IEnumerable<T>> { list }; return list.Select((a, i1) => Permute(list.Where((b, i2) => i2 != i1)).Select(b => (new List<T> { a }).Union(b))) .SelectMany(c => c); }

This was the easy part of the code.
The more complex part (imho) finds all possible n sections of the set.
Therefore, to solve this problem, I first solved the big problem of how to find all possible sections of the set (not only the size n).

I came up with this recursive function:

 public static List<List<List<T>>> AllPartitions<T>(this IEnumerable<T> set) { var retList = new List<List<List<T>>>(); if (set == null || !set.Any()) { retList.Add(new List<List<T>>()); return retList; } else { for (int i = 0; i < Math.Pow(2, set.Count()) / 2; i++) { var j = i; var parts = new [] { new List<T>(), new List<T>() }; foreach (var item in set) { parts[j & 1].Add(item); j >>= 1; } foreach (var b in AllPartitions(parts[1])) { var x = new List<List<T>>(); x.Add(parts[0]); if (b.Any()) x.AddRange(b); retList.Add(x); } } } return retList; }

Return value: List<List<List<T>>> means a list of all possible sections, where the section is a list of sets and the set is a list of elements.
We do not need to use the List type here, but it simplifies indexing.

So now let's get it all together:

Main code

 //Initialize our sets var employees = new [] { "Adam", "Bob" }; var jobs = new[] { "1", "2", "3" }; //iterate to max number of partitions (Sum) for (int i = 1; i <= Math.Min(employees.Length, jobs.Length); i++) { Debug.WriteLine("Partition to " + i + " parts:"); //Get all possible partitions of set "employees" (Stirling Set) var aparts = employees.AllPartitions().Where(y => y.Count == i); //Get all possible partitions of set "jobs" (Stirling Set) var bparts = jobs.AllPartitions().Where(y => y.Count == i); //Get cartesian product of all partitions var partsProduct = from employeesPartition in aparts from jobsPartition in bparts select new { employeesPartition, jobsPartition }; var idx = 0; //for every product of partitions foreach (var productItem in partsProduct) { //loop through the permutations of jobPartition (N!) foreach (var permutationOfJobs in productItem.jobsPartition.Permute()) { Debug.WriteLine("Combination: "+ ++idx); for (int j = 0; j < i; j++) { Debug.WriteLine(productItem.employeesPartition[j].ToArrayString() + " : " + permutationOfJobs.ToArray()[j].ToArrayString()); } } } }

Conclusion:

 Partition to 1 parts: Combination: 1 { Adam , Bob } : { 1 , 2 , 3 } Partition to 2 parts: Combination: 1 { Bob } : { 2 , 3 } { Adam } : { 1 } Combination: 2 { Bob } : { 1 } { Adam } : { 2 , 3 } Combination: 3 { Bob } : { 1 , 3 } { Adam } : { 2 } Combination: 4 { Bob } : { 2 } { Adam } : { 1 , 3 } Combination: 5 { Bob } : { 3 } { Adam } : { 1 , 2 } Combination: 6 { Bob } : { 1 , 2 } { Adam } : { 3 }

We can easily check our result by simply counting the results.
In this example, we have a set of 2 elements and a set of 3 elements, Equation 2 states that we need S (2,1) S (3,1) 1! + S (2,2) S (3,2) 2! = 1 + 6 = 7
That is exactly the number of combinations that we received.

For reference, examples of the Stirling number of the second kind are given:
S (1,1) = 1

S (2,1) = 1
S (2,2) = 1

S (3,1) = 1
S (3,2) = 3
S (3,3) = 1

S (4,1) = 1
S (4,2) = 7
S (4,3) = 6
S (4,4) = 1

Edit 06/19/2012

 public static String ToArrayString<T>(this IEnumerable<T> arr) { string str = "{ "; foreach (var item in arr) { str += item + " , "; } str = str.Trim().TrimEnd(','); str += "}"; return str; }

Change 24.6.2012

The main part of this algorithm is to find Stirling sets, I used the inneficient Permutation method, which is faster here, based on the QuickPerm algorithm:

 public static IEnumerable<IEnumerable<T>> QuickPerm<T>(this IEnumerable<T> set) { int N = set.Count(); int[] a = new int[N]; int[] p = new int[N]; var yieldRet = new T[N]; var list = set.ToList(); int i, j, tmp ;// Upper Index i; Lower Index j T tmp2; for (i = 0; i < N; i++) { // initialize arrays; a[N] can be any type a[i] = i + 1; // a[i] value is not revealed and can be arbitrary p[i] = 0; // p[i] == i controls iteration and index boundaries for i } yield return list; //display(a, 0, 0); // remove comment to display array a[] i = 1; // setup first swap points to be 1 and 0 respectively (i & j) while (i < N) { if (p[i] < i) { j = i%2*p[i]; // IF i is odd then j = p[i] otherwise j = 0 tmp2 = list[a[j]-1]; list[a[j]-1] = list[a[i]-1]; list[a[i]-1] = tmp2; tmp = a[j]; // swap(a[j], a[i]) a[j] = a[i]; a[i] = tmp; //MAIN! // for (int x = 0; x < N; x++) //{ // yieldRet[x] = list[a[x]-1]; //} yield return list; //display(a, j, i); // remove comment to display target array a[] // MAIN! p[i]++; // increase index "weight" for i by one i = 1; // reset index i to 1 (assumed) } else { // otherwise p[i] == i p[i] = 0; // reset p[i] to zero i++; // set new index value for i (increase by one) } // if (p[i] < i) } // while(i < N) }

This will cut the time in half.
However, most processor cycles go into line building, which is especially necessary for this example.
It will be a little faster:

  results.Add(productItem.employeesPartition[j].Aggregate((a, b) => a + "," + b) + " : " + permutationOfJobs.ToArray()[j].Aggregate((a, b) => a + "," + b));

Compiling on x64 would be better because these lines take up a lot of memory.

+4

Erez robinson Jun 13 '12 at 6:16

source share

Could you use another library? Here is a common library for combinations (you obviously want it not to be repeated). Then you will only need to do foreach over the list of your employees and start a combination with both.

I don’t think you are making any advantages for yourself from a great perspective. O, is efficiency important here?

This is from the hip, but it must be the code to get what you want (with this library):

 Combinations<string> combinations = new Combinations<string>(jobs, 2); foreach(IList<string> c in combinations) { Console.WriteLine(String.Format("{{{0} {1}}}", c[0], c[1])); }

And then it will be necessary to apply to each employee

+3

Rikon Jun 04 '12 at 20:09

source share

Say we have two sets: A and B.
A = {a1, a2, a3} ; B = {b1, b2, b3}

First we give a set consisting of sets containing a subset and its complement:

 {a1} {a2 a3} {a2} {a1 a3} {a3} {a1 a2} {a2, a3} {a1} ...

To do this, you can use the Rikon library above or write your own code. You will need to do this:

Get a set of all subsets (known as a power set)
For each subset, we get the complement or the remaining elements of this subset from the larger set.
Join them in a motorcade or pair; e.g. (subset, addition)

Here is the order of the orders; {a1} {a2 a3} and {a2 a3} {a1} are different tuples.

Then we get a similar set for B. Then we cross-connect between the two sets to get:

 {a1} {a2 a3} | {b1} {b2 b3} {a2} {a1 a3} | {b2} {b1 b3} ...

This pretty much matches your description above. Just look at {Bob, Adam} as one set and {1, 2, 3} as another set. You will have to discard some empty sets (since empty subsets are also included in the power kit), depending on your requirements. This is a common sense, though, as far as I can tell. Sorry for the lack of code; I have to go to sleep: P

0

Gregros Jun 04 '12 at 22:28

source share

Ignoring your other restrictions (for example, the maximum command size), you create a partition of the set P and partition the set Q with the same number of subsets, and then find all the combinations of one of the sets in compare the first section with the second.

Michael Orlov’s article has a simple algorithm for generating sections, where each iteration uses a constant space in this article . It also provides an algorithm for listing partitions of a given size.

We start with P = {A, B} and Q = {1, 2, 3}, then partitions of size 1 will be [P] and [Q], so the only pairing ([P], [Q])

For partitions of size 2, P has only two elements, so only one partition of size 2 [{A}, {B}], Q has three partitions of size 2 [{1}, {2, 3}], [{1, 2} , {3}], [{1, 3}, {2}].

Since each section of Q contains two subsets, there are 2 combinations of each section, which gives you six pairs:

 ( [ { A }, { B } ], [ { 1 }, { 2, 3 } ] ) ( [ { A }, { B } ], [ { 2, 3 }, { 1 } ] ) ( [ { A }, { B } ], [ { 1, 2 }, { 3 } ] ) ( [ { A }, { B } ], [ { 3 }, { 1, 2 } ] ) ( [ { A }, { B } ], [ { 1, 3 }, { 2 } ] ) ( [ { A }, { B } ], [ { 2 }, { 1, 3 } ] )

Since the size of one of the source sets was 2, it does not have partitions of size 3, so we stop.

0

Pete kirkham Jun 06 '12 at 10:14

source share

Simplevar · Accepted Answer · 2012-06-06T12:18:03+0000

In my answer, I will ignore the last result you had: Adam, Bob: 1, 2, 3 , because it is a logical exception. Go to the end to see my conclusion.

Explanation:

The idea would be to iterate over the combinations "which group will belong to the element".

Say that you have elements "a, b, c", and you have groups "1, 2", let's have an array of size 3, as the number of elements that will contain all possible combinations of groups "1, 2", with repetition:

 {1, 1, 1} {1, 1, 2} {1, 2, 1} {1, 2, 2} {2, 1, 1} {2, 1, 2} {2, 2, 1} {2, 2, 2}

Now we take each group and make it a key-value, with the following logic:

The group elements[i] will be the value of comb[i] .

 Example with {1, 2, 1}: a: group 1 b: group 2 c: group 1 And in a different view, the way you wanted it: group 1: a, c group 2: b

After all this, you just need to filter out all the combinations that do not contain all the groups, because you wanted all the groups to have at least one value.

So, you should check if all groups are displayed in a certain combination and filter those that do not match, so you will be left with:

 {1, 1, 2} {1, 2, 1} {1, 2, 2} {2, 1, 2} {2, 2, 1} {2, 1, 1}

This will lead to:

 1: a, b 2: c 1: a, c 2: b 1: a 2: b, c 1: b 2: a, c 1: c 2: a, b 1: b, c 2: a

This group combination logic will work for more elements and groups. Here is my implementation, which could probably be done better, because even I got a little lost there while I encoded it (this is really not an intuitive problem), but it works well.

Implementation:

 public static IEnumerable<ILookup<TValue, TKey>> Group<TKey, TValue> (List<TValue> keys, List<TKey> values, bool allowEmptyGroups = false) { var indices = new int[values.Count]; var maxIndex = values.Count - 1; var nextIndex = maxIndex; indices[maxIndex] = -1; while (nextIndex >= 0) { indices[nextIndex]++; if (indices[nextIndex] == keys.Count) { indices[nextIndex] = 0; nextIndex--; continue; } nextIndex = maxIndex; if (!allowEmptyGroups && indices.Distinct().Count() != keys.Count) { continue; } yield return indices.Select((keyIndex, valueIndex) => new { Key = keys[keyIndex], Value = values[valueIndex] }) .OrderBy(x => x.Key) .ToLookup(x => x.Key, x => x.Value); } }

Application:

 var employees = new List<string>() { "Adam", "Bob"}; var jobs = new List<string>() { "1", "2", "3"}; var c = 0; foreach (var group in CombinationsEx.Group(employees, jobs)) { foreach (var sub in group) { Console.WriteLine(sub.Key + ": " + string.Join(", ", sub)); } c++; Console.WriteLine(); } Console.WriteLine(c + " combinations.");

Conclusion:

 Adam: 1, 2 Bob: 3 Adam: 1, 3 Bob: 2 Adam: 1 Bob: 2, 3 Adam: 2, 3 Bob: 1 Adam: 2 Bob: 1, 3 Adam: 3 Bob: 1, 2 6 combinations.

UPDATE

Prototype Protocol of Combined Keys:

 public static IEnumerable<ILookup<TKey[], TValue>> GroupCombined<TKey, TValue> (List<TKey> keys, List<TValue> values) { // foreach (int i in Enumerable.Range(1, keys.Count)) for (var i = 1; i <= keys.Count; i++) { foreach (var lookup in Group(Enumerable.Range(0, i).ToList(), keys)) { foreach (var lookup1 in Group(lookup.Select(keysComb => keysComb.ToArray()).ToList(), values)) { yield return lookup1; } } } /* Same functionality: return from i in Enumerable.Range(1, keys.Count) from lookup in Group(Enumerable.Range(0, i).ToList(), keys) from lookup1 in Group(lookup.Select(keysComb => keysComb.ToArray()).ToList(), values) select lookup1; */ }

There is still a slight duplicate problem, but it gives all the results.

Here is what I would use to remove duplicates as a workaround :

 var c = 0; var d = 0; var employees = new List<string> { "Adam", "Bob", "James" }; var jobs = new List<string> {"1", "2"}; var prevStrs = new List<string>(); foreach (var group in CombinationsEx.GroupCombined(employees, jobs)) { var currStr = string.Join(Environment.NewLine, group.Select(sub => string.Format("{0}: {1}", string.Join(", ", sub.Key), string.Join(", ", sub)))); if (prevStrs.Contains(currStr)) { d++; continue; } prevStrs.Add(currStr); Console.WriteLine(currStr); Console.WriteLine(); c++; } Console.WriteLine(c + " combinations."); Console.WriteLine(d + " duplicates.");

Conclusion:

 Adam, Bob, James: 1, 2 Adam, Bob: 1 James: 2 James: 1 Adam, Bob: 2 Adam, James: 1 Bob: 2 Bob: 1 Adam, James: 2 Adam: 1 Bob, James: 2 Bob, James: 1 Adam: 2 7 combinations. 6 duplicates.

Note that it will also create non-combined groups (if possible, due to the lack of empty groups). To create ONLY combination keys, you need to replace this:

 for (var i = 1; i <= keys.Count; i++)

Wherein:

 for (var i = 1; i < keys.Count; i++)

At the beginning of the GroupCombined method. Test the method with three people and three tasks to see how it works.

OTHER CHANGE:

The best way to handle duplicates is to handle duplicate key combinations at the GroupCombined level:

 public static IEnumerable<ILookup<TKey[], TValue>> GroupCombined<TKey, TValue> (List<TKey> keys, List<TValue> values) { for (var i = 1; i <= keys.Count; i++) { var prevs = new List<TKey[][]>(); foreach (var lookup in Group(Enumerable.Range(0, i).ToList(), keys)) { var found = false; var curr = lookup.Select(sub => sub.OrderBy(k => k).ToArray()) .OrderBy(arr => arr.FirstOrDefault()).ToArray(); foreach (var prev in prevs.Where(prev => prev.Length == curr.Length)) { var isDuplicate = true; for (var x = 0; x < prev.Length; x++) { var currSub = curr[x]; var prevSub = prev[x]; if (currSub.Length != prevSub.Length || !currSub.SequenceEqual(prevSub)) { isDuplicate = false; break; } } if (isDuplicate) { found = true; break; } } if (found) { continue; } prevs.Add(curr); foreach (var group in Group(lookup.Select(keysComb => keysComb.ToArray()).ToList(), values)) { yield return group; } } } }

It seems to be wise to add restrictions to the method so that TKey ICompareable<TKey> and possibly also IEquatable<TKey> .

The result is 0 duplicates at the end.

Algorithm for extracting any possible combination of a sublist of two lists

More articles: