How to generate combinations in pieces

I have an algorithm that performs calculations for each combination of input elements, for example. for each 5-element subset of a 100-element set. I port it to the GPU and now I have the original version. To speed it up, I would like to load data from local memory, which is limited (for example, 32 KB), and may contain something like 20 input elements from 100. Therefore, I need to somehow separate my work and generate combinations in pieces. And now this is the hard part of how to do it. Most likely, I need to first load data for 20 elements and perform calculations for 5-element subsets of these 20 elements. After that, I would have to replace some (or all) of them with new ones and perform calculations for them, then rinse and repeat. Could you tell me how can I select replaced items in local memory to avoid duplication of work? So far, I have come to the conclusion that I will have to replace at least 16 of them at once in order to avoid duplication of work problems.

Edit: here is an example for creating two-element combinations of 5 elements. Here is a complete list of all possible cases:

1, 2 1, 3 1, 4 1, 5 2, 3 2, 4 2, 5 3, 4 3, 5 4, 5 

Local memory on the GPU is limited - suppose it can contain only 3 elements. Therefore, I need to somehow divide my problem into creating combinations of 2 elements from 3. I have to repeat this several times until I get all the combinations from the list above. As a first step, I can load elements 1, 2, 3 into local memory, so I get the following combinations:

 1, 2 1, 3 2, 3 

Now I need to load a different set of elements and calculate combinations for them. It can be 1, 4, 5. It will produce the following combinations:

 1, 4 1, 5 4, 5 

On the other hand, setting 1, 2, 4 is not valid - this will lead to a duplication of the combination:

 1, 2 // duplicate 1, 4 // ok, new 2, 4 // ok, new 

After this step, 4 more combinations are created (list below). The algorithm should be able to generate another 2-element combination of 3 elements and somehow process the last (10th) combination.

 2, 4 2, 5 3, 4 3, 5 

By breaking the work in this way, I can process all combinations of the original input data set using limited local memory, which can contain only part of this file.

+5
source share
2 answers

Say for example. you have 100 elements, you can store 20 in memory, and you need all 5-element combinations, of which there is C (100.5) = 75,287,520.

In order to be able to generate all combinations, each combination of 5 elements must be in memory at some point. This can be done by dividing the elements into groups of 20/5 = 4 elements; at input 25, and C (25.5) = 53.130 combinations of 5 groups.

For each combination of groups, we start by creating combinations with one element from each of the five groups; this gives us 53,130 x 4 5 = 54,405,120 unique combinations.

Now we have combinations where each element comes from another group, which is the section [1,1,1,1,1]. We still need to find combinations for the sections [2,1,1,1,1], [2,2,1], [3,1,1], [3,2] and [4,1]. The easiest way is to do this in separate stages, but the fastest way, of course, will be to include them in the first stage for [1,1,1,1,1,1], since all the combinations of groups we will ever need are loaded into memory at some point in the first stage.

For the [2,1,1,1] section, we will load each group in turn as a group with 2 elements, and then load each combination of 3 groups from the remaining 24 groups and take an element from each of them. This would require 25 x C (24.3) = 50,600 steps, each of which would receive C (4,2) x 4 3 = 384 combinations, or a total of 19,430,400.

A section such as [2,2,1] is a bit different, because we will load each group in turn to be the first group with two elements, but only the groups that come after it as a second group with 2 elements, so that avoid duplication. Then for each of them we will load each of 23 other groups to get the last item. This will require C (25.2) / 2 x 23 = 6900 steps, each of which leads to C (4.2) x C (4.2) x C (4.1) = 144 combinations, for a total of 993 600 .

Section [3,1,1] requires 25 x C (24.2) = 25 x 276 = 6900 steps, each of which leads to C (4.3) x 4 2 = 64 combinations, for a total of 441 600.

Section [3.2] requires 25 x 24 = 600 steps, each of which gives C (4.3) x C (4.2) = 24 combinations, for a total of 14,400.

Section [4.1] requires 25 x 24 = 600 steps, each of which gives C (4.4) x C (4.1) = 4 combinations, for a total of 2400.

So we have a total of:

 [1,1,1,1,1] -> 54,405,120 [2,1,1,1] -> 19,430,400 [2,2,1] -> 993,600 [3,1,1] -> 441,600 [3,2] -> 14,400 [4,1] -> 2,400 ---------- 75,287,520 combinations 

As you noticed, sections [3,2] and [4,1] require each combination of two groups, so they can be easily integrated in one step. Of course, if you integrate them in the first step for [1,1,1,1,1], you will only need to load 53,130 combinations of groups into memory, which is an absolute minimum.

(If it’s faster to load only one new group of elements in memory at each step, instead of running through the combinations of groups in lexicographical order, check this answer>.)


Integration of various stages

The easiest way to perform all group combinations for section [1,1,1,1,1] is to load groups 1 to 21 into group A, and then all groups after A to 22 in group B, all groups after B to 23 as group C, all groups after C to 24 as group D, and all groups after D to 25 in group E.

  ABCDE 1 2 3 4 5 <- ABCDE 1 2 3 4 6 <- ABCDE ... 1 2 3 4 25 <- ABCDE 1 2 3 5 6 <- ABCDE ... 1 2 3 24 25 <- ABCDE 1 2 4 5 6 <- ABCDE ... 1 2 23 24 25 <- ABCDE 1 3 4 5 6 <- ABCDE ... 1 22 23 24 25 <- ABCDE 2 3 4 5 6 <- ABCDE ... 21 22 23 24 25 <- ABCDE 

Four-part sections can be integrated by taking [2,1,1,1,1], [1,2,1,1], [1,1,2,1] and [1,1,1,2] of these combinations of four groups:

  ABCDE 1 2 3 4 5 <- ABCD ABCE ABDE ACDE BCDE 1 2 3 4 6 <- ABCE ABDE ACDE BCDE ... 1 2 3 4 25 <- ABCE ABDE ACDE BCDE 1 2 3 5 6 <- ABDE ACDE BCDE ... 1 2 3 24 25 <- ABDE ACDE BCDE 1 2 4 5 6 <- ACDE BCDE ... 1 2 23 24 25 <- ACDE BCDE 1 3 4 5 6 <- BCDE ... 1 22 23 24 25 <- BCDE 2 3 4 5 6 <- none ... 21 22 23 24 25 <- none 

Three-part sections can be integrated by taking [2,2,1], [2,1,2], [1,2,2], [3,1,1], [1,3, 1] and [ 1,1,3] of these combinations of three groups:

  ABCDE 1 2 3 4 5 <- ABC ABD ACD BCD ABE ACE BCE ADE BDE CDE 1 2 3 4 6 <- ABE ACE BCE ADE BDE CDE ... 1 2 3 4 25 <- ABE ACE BCE ADE BDE CDE 1 2 3 5 6 <- ADE BDE CDE ... 1 2 3 24 25 <- ADE BDE CDE 1 2 4 5 6 <- CDE ... 1 2 23 24 25 <- CDE 1 3 4 5 6 <- none ... 21 22 23 24 25 <- none 

Sections with two parts can be integrated by taking the elements [2,3], [3,2], [4,1] and [1,4] from these combinations of two groups:

  ABCDE 1 2 3 4 5 <- AB AC BC AD BD CD AE BE CE DE 1 2 3 4 6 <- AE BE CE DE ... 1 2 3 4 25 <- AE BE CE DE 1 2 3 5 6 <- DE ... 1 2 3 24 25 <- DE 1 2 4 5 6 <- none ... 21 22 23 24 25 <- none 

Usually

Elements e , elements m can be loaded into memory, and you need all combinations of elements k . Use k groups of size g = m / k .

Generate all sections of k with parts limited to size g :

 [1,1,1 ... 1] [2,1,1 ... 1] [2,2,1 ... 1] ... [k] (if k <= g) [1,1,1 ... 1] [2,1,1 ... 1] [2,2,1 ... 1] ... [g,kg] (if k > g) 

For each of them, all unique permutations are generated, for example:

 [3,2,2,1] -> [3,2,2,1] [3,2,1,2] [3,1,2,2] [2,3,2,1] [2,3,1,2] [1,3,2,2] [2,2,3,1] [2,1,3,2] [1,2,3,2] [2,2,1,3] [2,1,2,3] [1,2,2,3] 

Sort permutations by the number of parts, for example:

 k: [1,1,1 ... 1] k-1: [2,1 ... 1] [1,2 ... 1] ... [1,1 ... 2] ... 2: [g,kg] [kg,g] 

Load the first k groups into memory, for example:

 ABCDEF 1 2 3 4 5 6 

For each section length p , generate each set of groups of size p , for example:

 p=k: ABCDEF C(k,k) sets p=k-1: ABCDE ABCDF ABCEF ABDEF ACDEF BCDEF C(k,k-1) sets p=k-2: ABCD ABCE ABCF ABDE ABDF ... CDEF C(k,k-2) sets ... p=2: AB AC AD AE AF BC BD BE BF ... EF C(k,2) sets 

For each of these sets, create combinations for sections with the appropriate number of parts, for example:

 p=k-1: ABCDE [2,1,1,1,1] -> [a,a,b,c,d,e] C(g,2)*C(g,1)^4 combinations [1,2,1,1,1] -> [a,b,b,c,d,e] [1,1,2,1,1] -> [a,b,c,c,d,e] [1,1,1,2,1] -> [a,b,c,d,d,e] [1,1,1,1,2] -> [a,b,c,d,e,e] ABCDE [2,1,1,1,1] -> [a,a,b,c,d,f] [1,2,1,1,1] -> [a,b,b,c,d,f] [1,1,2,1,1] -> [a,b,c,c,d,f] [1,1,1,2,1] -> [a,b,c,d,d,f] [1,1,1,1,2] -> [a,b,c,d,f,f] ... BCDEF [2,1,1,1,1] -> [b,b,c,d,e,f] [1,2,1,1,1] -> [b,c,c,d,e,f] [1,1,2,1,1] -> [b,c,d,d,e,f] [1,1,1,2,1] -> [b,c,d,e,e,f] [1,1,1,1,2] -> [b,c,d,e,f,f] 

From the list of sets, delete sets that do not contain the last group (F):

 p=k: ABCDEF p=k-1: ABCDF ABCEF ABDEF ACDEF BCDEF p=k-2: ABCF ABDF ABEF ACDF ACEF ADEF BCDF BCEF BDEF CDEF ... p=2: AF BF CF DF EF 

Load the following groups to e / g into memory as group F, for example:

 ABCDEF 1 2 3 4 5 7 ... 1 2 3 4 5 e/g 

Again, for each of them and for each of the sets, create combinations for sections with the appropriate number of parts.

From the list of sets, delete sets that do not contain the last two groups (EF):

 p=k: ABCDEF p=k-1: ABCEF ABDEF ACDEF BCDEF p=k-2: ABEF ACEF ADEF BCEF BDEF CDEF ... p=2: EF 

Load the following groups before e / g-1 into memory as group E, and for each of them load the groups after E to e / g into memory as group F, for example:

 ABCDEF 1 2 3 4 6 7 ... 1 2 3 4 e/g-1 e/g 

Again, for each of them and for each of the sets, create combinations for sections with the appropriate number of parts.

From the list of sets, delete sets that do not contain the last three groups (DEF):

 p=k: ABCDEF p=k-1: ABDEF ACDEF BCDEF p=k-2: ADEF BDEF CDEF ... p=2: none 

Load the following groups before e / g-2 into memory as group D, and for each of them load groups after D to e / g-1 into memory as group E, and for each of them load groups after E to e / g in memory as a group F, for example:

 ABCDEF 1 2 3 5 6 7 ... 1 2 3 e/g-2 e/g-1 e/g 

Again, for each of them and for each of the sets, create combinations for sections with the appropriate number of parts.

And so on, until you reach:

  ABCDEF e/g-5 e/g-4 e/g-3 e/g-2 e/g-1 e/g 

only:

 p=k: ABCDEF 

Mileage for 21.9.3

number of elements: e = 21
elements: [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]
combination size: k = 3 elements
memory size: m = 9 elements

Preparations:

 number of groups in memory: 3 (k) group size: g = m/k = 3 elements number of groups: e/g = 7 groups: 1:[1,2,3] 2:[4,5,6] 3:[7,8,9] 4:[10,11,12] 5:[13,14,15] 6:[16,17,18] 7:[19,20,21] number of element sets loaded into memory: C(e/g,k) = C(7,3) = 35 partitions of k with max part g: [1,1,1] [2,1] [3] permutations: 3:{[1,1,1]} 2:{[1,2],[2,1]} 1:{[3]} group sets: 3:{[A,B,C]} 2:{[A,B],[A,C],[B,C]} 1:{[A],[B],[C]} 

Stage 1:

 group sets: 3:{[A,B,C]} 2:{[A,B],[A,C],[B,C]} 1:{[A],[B],[C]} (all) ABC 1 2 3 -> elements in memory: [1,2,3] [4,5,6] [7,8,9] -> 84 combinations 3: [1,1,1]:[A,B,C] -> [a,b,c] -> [1,4,7] [1,4,8] [1,4,9] [1,5,7] [1,5,8] [1,5,9] [1,6,7] [1,6,8] [1,6,9] [2,4,7] [2,4,8] [2,4,9] [2,5,7] [2,5,8] [2,5,9] [2,6,7] [2,6,8] [2,6,9] [3,4,7] [3,4,8] [3,4,9] [3,5,7] [3,5,8] [3,5,9] [3,6,7] [3,6,8] [3,6,9] 2: [1,2]:[A,B] -> [a,b,b] -> [1,4,5] [1,4,6] [1,5,6] [2,4,5] [2,4,6] [2,5,6] [3,4,5] [3,4,6] [3,5,6] [1,2]:[A,C] -> [a,c,c] -> [1,7,8] [1,7,9] [1,8,9] [2,7,8] [2,7,9] [2,8,9] [3,7,8] [3,7,9] [3,8,9] [1,2]:[B,C] -> [b,c,c] -> [4,7,8] [4,7,9] [4,8,9] [5,7,8] [5,7,9] [5,8,9] [6,7,8] [6,7,9] [6,8,9] [2,1]:[A,B] -> [a,a,b] -> [1,2,4] [1,3,4] [2,3,4] [1,2,5] [1,3,5] [2,3,5] [1,2,6] [1,3,6] [2,3,6] [2,1]:[A,C] -> [a,a,c] -> [1,2,7] [1,3,7] [2,3,7] [1,2,8] [1,3,8] [2,3,8] [1,2,9] [1,3,9] [2,3,9] [2,1]:[B,C] -> [b,b,c] -> [4,5,7] [4,6,7] [5,6,7] [4,5,8] [4,6,8] [5,6,8] [4,5,9] [4,6,9] [5,6,9] 1: [3]:[A] -> [a,a,a] -> [1,2,3] [3]:[B] -> [b,b,b] -> [4,5,6] [3]:[C] -> [c,c,c] -> [7,8,9] 

Stage 2:

 group sets: 3:{[A,B,C]} 2:{[A,C],[B,C]} 1:{[C]} (sets without C removed) ABC 1 2 4 -> elements in memory: [1,2,3] [4,5,6] [10,11,12] -> 64 combinations 3: [1,1,1]:[A,B,C] -> [a,b,c] -> [1,4,10] [1,4,11] [1,4,12] [1,5,10] [1,5,11] [1,5,12] [1,6,10] [1,6,11] [1,6,12] [2,4,10] [2,4,11] [2,4,12] [2,5,10] [2,5,11] [2,5,12] [2,6,10] [2,6,11] [2,6,12] [3,4,10] [3,4,11] [3,4,12] [3,5,10] [3,5,11] [3,5,12] [3,6,10] [3,6,11] [3,6,12] 2: [1,2]:[A,C] -> [a,c,c] -> [1,10,11] [1,10,12] [1,11,12] [2,10,11] [2,10,12] [2,11,12] [3,10,11] [3,10,12] [3,11,12] [1,2]:[B,C] -> [b,c,c] -> [4,10,11] [4,10,12] [4,11,12] [5,10,11] [5,10,12] [5,11,12] [6,10,11] [6,10,12] [6,11,12] [2,1]:[A,C] -> [a,a,c] -> [1,2,10] [1,3,10] [2,3,10] [1,2,11] [1,3,11] [2,3,11] [1,2,12] [1,3,12] [2,3,12] [2,1]:[B,C] -> [b,b,c] -> [4,5,10] [4,6,10] [5,6,10] [4,5,11] [4,6,11] [5,6,11] [4,5,12] [4,6,12] [5,6,12] 1: [3]:[C] -> [c,c,c] -> [10,11,12] ABC 1 2 5 -> elements in memory: [1,2,3] [4,5,6] [13,14,15] -> 64 combinations 1 2 6 -> elements in memory: [1,2,3] [4,5,6] [16,17,18] -> 64 combinations 1 2 7 -> elements in memory: [1,2,3] [4,5,6] [19,20,21] -> 64 combinations 

Stage 3:

 group sets: 3:{[A,B,C]} 2:{[B,C]} (sets without B removed) ABC 1 3 4 -> elements in memory: [1,2,3] [7,8,9] [10,11,12] -> 45 combinations 3: [1,1,1]:[A,B,C] -> [a,b,c] -> [1,7,10] [1,7,11] [1,7,12] [1,8,10] [1,8,11] [1,8,12] [1,9,10] [1,9,11] [1,9,12] [2,7,10] [2,7,11] [2,7,12] [2,8,10] [2,8,11] [2,8,12] [2,9,10] [2,9,11] [2,9,12] [3,7,10] [3,7,11] [3,7,12] [3,8,10] [3,8,11] [3,8,12] [3,9,10] [3,9,11] [3,9,12] 2: [1,2]:[B,C] -> [b,c,c] -> [7,10,11] [7,10,12] [7,11,12] [8,10,11] [8,10,12] [8,11,12] [9,10,11] [9,10,12] [9,11,12] [2,1]:[B,C] -> [b,b,c] -> [7,8,10] [7,9,10] [8,9,10] [7,8,11] [7,9,11] [8,9,11] [7,8,12] [7,9,12] [8,9,12] ABC 1 3 5 -> elements in memory: [1,2,3] [7,8,9] [13,14,15] -> 45 combinations 1 3 6 -> elements in memory: [1,2,3] [7,8,9] [16,17,18] -> 45 combinations 1 3 7 -> elements in memory: [1,2,3] [7,8,9] [19,20,21] -> 45 combinations 1 4 5 -> elements in memory: [1,2,3] [7,8,9] [13,14,15] -> 45 combinations 1 4 6 -> elements in memory: [1,2,3] [7,8,9] [16,17,18] -> 45 combinations 1 4 7 -> elements in memory: [1,2,3] [7,8,9] [19,20,21] -> 45 combinations 1 5 6 -> elements in memory: [1,2,3] [7,8,9] [16,17,18] -> 45 combinations 1 5 7 -> elements in memory: [1,2,3] [7,8,9] [19,20,21] -> 45 combinations 1 6 7 -> elements in memory: [1,2,3] [7,8,9] [19,20,21] -> 45 combinations 

Stage 4:

 group sets: 3:{[A,B,C]} (sets without A removed) ABC 2 3 4 -> elements in memory: [4,5,6] [7,8,9] [10,11,12] -> 27 combinations 3: [1,1,1]:[A,B,C] -> [a,b,c] -> [4,7,10] [4,7,11] [4,7,12] [4,8,10] [4,8,11] [4,8,12] [4,9,10] [4,9,11] [4,9,12] [5,7,10] [5,7,11] [5,7,12] [5,8,10] [5,8,11] [5,8,12] [5,9,10] [5,9,11] [5,9,12] [6,7,10] [6,7,11] [6,7,12] [6,8,10] [6,8,11] [6,8,12] [6,9,10] [6,9,11] [6,9,12] ABC 2 3 5 -> elements in memory: [4,5,6] [7,8,9] [13,14,15] -> 27 combinations 2 3 6 -> elements in memory: [4,5,6] [7,8,9] [16,17,18] -> 27 combinations 2 3 7 -> elements in memory: [4,5,6] [7,8,9] [19,20,21] -> 27 combinations 2 4 5 -> elements in memory: [4,5,6] [10,11,12] [13,14,15] -> 27 combinations 2 4 6 -> elements in memory: [4,5,6] [10,11,12] [16,17,18] -> 27 combinations 2 4 7 -> elements in memory: [4,5,6] [10,11,12] [19,20,21] -> 27 combinations 2 5 6 -> elements in memory: [4,5,6] [13,14,15] [16,17,18] -> 27 combinations 2 5 7 -> elements in memory: [4,5,6] [13,14,15] [19,20,21] -> 27 combinations 2 6 7 -> elements in memory: [4,5,6] [16,17,18] [19,20,21] -> 27 combinations 3 4 5 -> elements in memory: [7,8,9] [10,11,12] [13,14,15] -> 27 combinations 3 4 6 -> elements in memory: [7,8,9] [10,11,12] [16,17,18] -> 27 combinations 3 4 7 -> elements in memory: [7,8,9] [10,11,12] [19,20,21] -> 27 combinations 3 5 6 -> elements in memory: [7,8,9] [13,14,15] [16,17,18] -> 27 combinations 3 5 7 -> elements in memory: [7,8,9] [13,14,15] [19,20,21] -> 27 combinations 3 6 7 -> elements in memory: [7,8,9] [16,17,18] [19,20,21] -> 27 combinations 4 5 6 -> elements in memory: [10,11,12] [13,14,15] [16,17,18] -> 27 combinations 4 5 7 -> elements in memory: [10,11,12] [13,14,15] [19,20,21] -> 27 combinations 4 6 7 -> elements in memory: [10,11,12] [16,17,18] [19,20,21] -> 27 combinations 5 6 7 -> elements in memory: [13,14,15] [16,17,18] [19,20,21] -> 27 combinations 

Results:

 Phase 1: 84 = 84 combinations Phase 2: 4 x 64 = 256 combinations Phase 3: 10 x 45 = 450 combinations Phase 4: 20 x 27 = 540 combinations ---- 1330 combinations = C(21,3) 
+3
source

Depending on how difficult the calculation you make for each combination, the fastest way can be to split the range 0..C (n, k) into a bunch of subbands that will be processed in parallel, and generate the corresponding combinations directly in each subband on the GPU, using the unlock function to generate the first and classic algorithms to search for the next combination to generate subsequent ones.

0
source

Source: https://habr.com/ru/post/1272440/


All Articles