Choosing five numbers that add up to S

Given an array A of N non-negative numbers, I'm interested in finding the number of ways to select 5 numbers (from different positions in the array) so that their sum is S.

In O(N^3) there is a simple solution:

    Let H be a hash table of (sum, position of leftmost element of sum)

    for i = 0, N
        for j = i + 1, N
            H.add(A[i] + A[j], i)

    numPossibilities = 0

    for i = 0, N
        for j = i + 1, N
            for k = j + 1, N
                numPossibilities += H.get(S - (A[i] + A[j] + A[k]), k)

Where H.get(x, y) returns the number of entries in the hash table whose sum equals x and whose leftmost element position is greater than y (here y = k).
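A minimal Python sketch of the pseudocode above, for illustration only. The names `count_five_hash` and `brute_force` are made up here, and `H.get(x, k)` is modelled as a dict from pair sum to a sorted list of leftmost positions, searched with `bisect`. That is one possible way to realise the lookup, not necessarily the one intended in the question.

```python
from bisect import bisect_right
from collections import defaultdict
from itertools import combinations


def count_five_hash(A, S):
    """Count position sets i < j < k < l < m with A[i]+A[j]+A[k]+A[l]+A[m] == S."""
    n = len(A)
    # H maps a pair sum to the leftmost positions of all pairs with that sum.
    # The outer loop runs over i in increasing order, so every list stays sorted.
    H = defaultdict(list)
    for i in range(n):
        for j in range(i + 1, n):
            H[A[i] + A[j]].append(i)

    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                lefts = H.get(S - (A[i] + A[j] + A[k]))
                if lefts:
                    # Pairs whose leftmost position is strictly greater than k.
                    total += len(lefts) - bisect_right(lefts, k)
    return total


def brute_force(A, S):
    """Reference answer that tries every 5-element choice of positions."""
    return sum(1 for c in combinations(A, 5) if sum(c) == S)
```

With this representation each lookup costs O(log N) no matter how many pairs share a sum, at the price of O(N^2) memory for the lists.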

As an alternative, we can add sums of 3 elements to the hash table, and then continue with two nested loops. However, the complexity remains the same, and we just use more memory.

Assuming the inputs are fairly random (so no worst-case hashing), is there an algorithm that can solve this in O(N^2), or maybe O(N^2 log N), or even O(N^3) if it holds in all cases? I think binary search might help, but I don't see how to deal with overlapping indices.

The above solution is in practice much better than the naive 5-for-loop solution, but I have a feeling that we can do much better, hence this question.

If you can prove that such an algorithm does not exist, how can the above solution be optimized?

Clarification:

The above algorithm is actually O(N^5) in the worst case, for example when the given array contains nothing but the number 1 and S = 5. On average, however, the H.get method is much closer to O(1), hence my claim of cubic complexity on average.

If you implement this and run it on 1000 random numbers in a large interval (say 0 to Int32.MaxValue), you will see that it runs relatively quickly. However, it is not difficult to find inputs for which it takes a long time. Even if we cannot make it fast enough when all the numbers are equal, what kind of optimizations could we make?

Under the same assumptions, can we do better, asymptotically or at least in practice?

+19
optimization algorithm hash
Sep 10 '10 at 11:29
4 answers

I think that the fact that the numbers must be at different positions is a red herring. You can use the inclusion-exclusion principle to count the tuples (i, j, k, l, m) where x[i] + x[j] + x[k] + x[l] + x[m] = S and i, j, k, l, m are distinct:

    sums with i!=j, i!=k, i!=l, ..., l!=m
        = all sums
        - sums with i=j - ... - sums with l=m
        + sums with i=j and j=k + ... + sums with k=l and l=m
        - ...
        + sums with i=j=k=l=m

Each term on the right-hand side, except the first, can be computed in O(N^2 log N). For example, to count the tuples (i, i, k, l, m) such that x[i] + x[i] + x[k] + x[l] + x[m] = S, you can build sorted arrays of the sums {2a + b} and {c + d} and count the pairs of elements x, y, one from each array, such that x + y = S.

Basic algorithm

So it is enough to count the tuples (i, j, k, l, m) with x[i] + x[j] + x[k] + x[l] + x[m] = S, where i, j, k, l, m are not necessarily distinct. Basically, you can use Moron's solution in this way:

  • Create a sorted array of the sums {a + b : a, b are numbers from the array}, and group equal elements into one, remembering the count. For example, for the array [1,1,3] you get nine sums [2,2,2,2,4,4,4,4,6] of the form a + b. Then you group equal elements, remembering the counts: [(2,4), (4,4), (6,1)]. This step is O(N^2 log N).

  • For each element e of the original array, count how many pairs of elements of the sums array add up to S - e. As in Moron's solution, you have two pointers: one moving right and one moving left. If the sum is too low, move the first pointer to increase it; if the sum is too high, move the second pointer to decrease it.

    Suppose the sum is correct. This means that one pointer is at (a, x) and the other at (b, y), where a + b = S - e. Increase the counter by x * y and move both pointers (you could move only one, but then there would be no match at the next step, and the second pointer would be moved anyway).

For example, for the array [(2,4), (4,4), (6,1)] and S - e = 8, the first pointer is at (2,4) and the second at (6,1). Since 2 + 6 = 8, you add 4 * 1 = 4 and move both pointers. Now they both point to (4,4), so you increase the counter by 16. Do not stop! The pointers pass each other, so the first ends up at (6,1) and the second at (2,4), and you increase the counter by 4.

So, in the end, there are 4 + 16 + 4 = 24 ways to get 8 as the sum of 4 elements from [1,1,3]:

    >>> len([k for k in itertools.product([1,1,3],repeat=4) if sum(k) == 8])
    24

    Prelude Control.Monad> length [k | k <- replicateM 4 [1,1,3], sum k == 8]
    24

Repeating this for each e, you will get the number of ways to get S as the sum of 5 elements.

For [1,1,1,1,1] and S - e = 4, the array of sums will be [(2,25)], and you get that there are 625 ways to obtain a sum of 4.

For each e, this step is linear in the size of the sums array (so O(N^2)), so the whole loop takes O(N^3).
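The basic algorithm can be sketched in Python roughly as follows. This is only a sketch under assumptions: the function names are invented for this example, and the grouping uses a `Counter` followed by a sort of the distinct sums rather than sorting all N^2 sums first.

```python
from collections import Counter


def grouped_pair_sums(xs):
    """Sorted (sum, count) pairs over all ordered index pairs (i, j), i == j allowed."""
    counts = Counter(a + b for a in xs for b in xs)
    return sorted(counts.items())


def count_pairs_of_sums(groups, target):
    """Two-pointer scan: ordered pairs of pair sums that add up to target."""
    lo, hi = 0, len(groups) - 1
    total = 0
    # Keep going after the pointers cross; stop only when one runs off the array.
    while lo < len(groups) and hi >= 0:
        s = groups[lo][0] + groups[hi][0]
        if s < target:
            lo += 1
        elif s > target:
            hi -= 1
        else:
            total += groups[lo][1] * groups[hi][1]
            lo += 1
            hi -= 1
    return total


def count_tuples_with_repeats(xs, S):
    """Number of index 5-tuples (repeats allowed) whose values sum to S."""
    groups = grouped_pair_sums(xs)
    return sum(count_pairs_of_sums(groups, S - e) for e in xs)
```

For the example in the text, `count_pairs_of_sums(grouped_pair_sums([1, 1, 3]), 8)` gives 24.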

About inclusion-exclusion:

Call a 5-tuple (i, j, k, l, m) "proper" if x[i] + x[j] + x[k] + x[l] + x[m] = S. The goal is to count the proper 5-tuples (i, j, k, l, m) where i, j, k, l, m are pairwise distinct. The main algorithm can count in O(N^3) the proper 5-tuples whose components are not necessarily distinct. What remains is to count the "wrong" tuples, those with at least one repeated index.

Consider the following subsets of the proper 5-tuples:

A_xy = {(i, j, k, l, m) : the indices in the x-th and y-th places are the same}

For example, A_24 is the set of proper 5-tuples (i, j, k, l, m) where j = l.

The set of wrong 5-tuples is:

A_12 ∪ A_13 ∪ ... ∪ A_45

Computing its cardinality by inclusion-exclusion:

|A_12 ∪ A_13 ∪ ... ∪ A_45| = |A_12| + |A_13| + ... + |A_45| - |A_12 ∩ A_23| - ... - |A_34 ∩ A_45| + ... + |A_12 ∩ A_23 ∩ ... ∩ A_35 ∩ A_45|

There are 2^10 = 1024 terms here, but many of the cardinalities are the same.

The only ones you need to compute are:

  • X_1 = |A_12| - 5-tuples with i = j
  • X_2 = |A_12 ∩ A_23| - 5-tuples with i = j = k
  • X_3 = |A_12 ∩ A_23 ∩ A_34| - 5-tuples with i = j = k = l
  • X_4 = |A_12 ∩ A_23 ∩ A_34 ∩ A_45| - 5-tuples with i = j = k = l = m
  • X_5 = |A_12 ∩ A_34| - 5-tuples with i = j, k = l
  • X_6 = |A_12 ∩ A_23 ∩ A_45| - 5-tuples with i = j = k, l = m

You can see that, by permuting indices, every other set is represented here. For example, A_24 has the same cardinality as A_12.

Counting the cardinalities of these 6 sets is fairly easy. For the first, you build the arrays {2a + b} and {c + d} and count the pairs, one element from each array, that add up to S; for the others there are only 3 or fewer free variables, so even a simple loop gives O(N^3).

To simplify the sum, I wrote the following Haskell program:

    import Control.Monad
    import Data.List
    import qualified Data.Map as Map

    -- Take equivalence relation, like [(1,2),(2,3)] and return its partition, like [1,1,3]
    f xs = sort $ map length $ foldr f (map return [1..5]) xs
      where
        f (x,y) a =
          let [v1] = filter (x `elem`) a
              [v2] = filter (y `elem`) a
          in if v1 == v2 then a else (a \\ [v1,v2]) ++ [v1++v2]

    -- All 1024 subsets of [(1,2),(1,3), ..., (4,5)]
    subsets = filterM (const [False, True]) [(i,j) | i <- [1..5], j <- [i+1..5]]

    res = Map.fromListWith (+) $ map (\k -> (f k, (-1)^(length k))) subsets

    *Main> res
    Loading package array-0.3.0.1 ... linking ... done.
    Loading package containers-0.3.0.0 ... linking ... done.
    fromList [([1,1,1,1,1],1),([1,1,1,2],-10),([1,1,3],20),([1,2,2],15),([1,4],-30),([2,3],-20),([5],24)]

which means that the formula is

(all proper 5-tuples) - 10 X_1 + 20 X_2 - 30 X_3 + 24 X_4 + 15 X_5 - 20 X_6.

Check:

How many 5-tuples of distinct positions in [0,0,0,...,0] sum to 0? One way is to compute it directly; the other is to use the formula (which does not care about distinct positions):

    direct x = x*(x-1)*(x-2)*(x-3)*(x-4)
    indirect x = x^5 - 10 * x^4 + 20 * x^3 + 15 * x^3 - 30 * x^2 - 20*x^2 + 24*x

    *Main> direct 100
    9034502400
    *Main> indirect 100
    9034502400
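Putting the formula together, here is a hedged Python sketch that evaluates each term and combines them. The helper names (`weighted_sums`, `count_weighted`, `TERMS`) are hypothetical, the terms are computed with hash maps instead of sorted arrays, and the final division by 5! = 120 is an added assumption to turn ordered tuples of distinct positions into unordered selections, which is what the question asks for.

```python
from collections import Counter


def weighted_sums(xs, coeffs):
    """Counter of c1*x[i1] + ... + ck*x[ik] over all index tuples (repeats allowed)."""
    sums = Counter({0: 1})
    for c in coeffs:
        nxt = Counter()
        for s, cnt in sums.items():
            for x in xs:
                nxt[s + c * x] += cnt
        sums = nxt
    return sums


def count_weighted(xs, coeffs, S):
    """Number of index tuples with sum(c_t * x[i_t]) == S, by meeting in the middle."""
    half = len(coeffs) // 2
    left = weighted_sums(xs, coeffs[:half])
    right = weighted_sums(xs, coeffs[half:])
    return sum(cnt * right.get(S - s, 0) for s, cnt in left.items())


# (coefficient in the formula, multiplicities of the merged indices)
TERMS = [
    (1, (1, 1, 1, 1, 1)),   # all proper 5-tuples
    (-10, (2, 1, 1, 1)),    # X_1: i = j
    (20, (3, 1, 1)),        # X_2: i = j = k
    (-30, (4, 1)),          # X_3: i = j = k = l
    (24, (5,)),             # X_4: i = j = k = l = m
    (15, (2, 2, 1)),        # X_5: i = j, k = l
    (-20, (3, 2)),          # X_6: i = j = k, l = m
]


def count_distinct_ordered(xs, S):
    """Ordered 5-tuples of pairwise distinct positions whose values sum to S."""
    return sum(coef * count_weighted(xs, mult, S) for coef, mult in TERMS)


def count_selections(xs, S):
    """Unordered selections of 5 positions: each one is counted 5! times above."""
    return count_distinct_ordered(xs, S) // 120
```

On the all-zero array of length 100 this reproduces the check above: `count_distinct_ordered([0] * 100, 0)` is 9034502400. The sketch makes no attempt to match the memory use of the two-pointer version.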

Other notes:

There is also an O(a_n log a_n) solution: compute (x^(a_1) + ... + x^(a_n))^5 using FFT; the result is the coefficient of x^S. This lets some index i be used several times, but you can subtract polynomials such as (x^(2a_1) + ... + x^(2a_n)) * (x^(a_1) + ... + x^(a_n))^3 and so on, following the inclusion-exclusion principle.

In some restricted models of computation, it has been shown that solving this problem requires Ω(N^3) time.
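A small Python sketch of the polynomial idea, assuming the array holds non-negative integers. To stay self-contained it multiplies polynomials with a plain quadratic loop truncated at degree S; an FFT-based multiplication would take its place to get the running time mentioned above. The names `power_sum_poly`, `mul` and `count_selections_poly` are made up for the example, and the division by 120 (ordered tuples to unordered selections) is an added assumption.

```python
def power_sum_poly(xs, c, S):
    """Coefficient list of sum_i x^(c * a_i), truncated to degree S."""
    p = [0] * (S + 1)
    for a in xs:
        if c * a <= S:
            p[c * a] += 1
    return p


def mul(p, q, S):
    """Schoolbook product of two coefficient lists, truncated to degree S."""
    r = [0] * (S + 1)
    for i, pi in enumerate(p):
        if pi:
            for j, qj in enumerate(q):
                if i + j > S:
                    break
                r[i + j] += pi * qj
    return r


def count_selections_poly(xs, S):
    """Selections of 5 distinct positions summing to S, via inclusion-exclusion."""
    P = {c: power_sum_poly(xs, c, S) for c in range(1, 6)}

    def coeff(*cs):
        # Coefficient of x^S in the product P[c1] * P[c2] * ...
        r = [1] + [0] * S
        for c in cs:
            r = mul(r, P[c], S)
        return r[S]

    ordered = (coeff(1, 1, 1, 1, 1) - 10 * coeff(2, 1, 1, 1) + 20 * coeff(3, 1, 1)
               - 30 * coeff(4, 1) + 24 * coeff(5)
               + 15 * coeff(2, 2, 1) - 20 * coeff(3, 2))
    return ordered // 120
```

Only the coefficient at x^S is needed, so every intermediate product can be cut off at degree S.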

+11
Sep 10 '10 at 18:09

O(N^3) seems possible (though I haven't tried proving it).

Take all possible pairs and create a new array (say B) of size O(N^2) which contains the sums of all possible pairs. Also keep track of the indices of the two elements of the original array that gave each sum. - O(N^2)

Now sort the array - O(N^2 log N).

Now, for each element a of the original array, try to find two elements of B that add up to S - a. Since B is sorted, this can be done in O(|B|) time: start with two pointers, one at the max and one at the min.

If the sum of those two is > S - a, decrement the pointer near the max.

If the sum of those two is < S - a, increment the pointer near the min.

If the sum is equal, you have found one candidate pair and a new sorted sub-array in which to look for the next candidate pair. (You have to make sure that the two elements of B come from four distinct elements of A.) (There may be potential issues here.)

Thus you can count the number of times S - a occurs as the sum of two elements of B that come from four elements of the original array (not including a).

So O(N^2) time for each of O(N) elements gives O(N^3).

Hope this helps.

+4
Sep 10 2018-10-10

You can do this in O(N*S) with dynamic programming:

    static int count(int[] A, int S) {
        final int K = 5;
        // In count[n][s] we'll count the number of ways you can pick n numbers such that their sum is s
        int[][] count = new int[K+1][S+1];

        count[0][0] = 1;  // The base case

        for (int i = 0; i < A.length; i++)
            for (int n = K; n >= 1; n--)
                for (int s = A[i]; s <= S; s++)
                    count[n][s] += count[n-1][s - A[i]];

        return count[K][S];
    }
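For comparison, a Python sketch of the same dynamic programme, assuming non-negative integer inputs; `count_dp` is a name chosen for this example. One practical difference from the Java version is that Python integers do not overflow, whereas the result can exceed a 32-bit `int`: for 1000 equal elements it is C(1000, 5) = 8,250,291,250,200.

```python
def count_dp(A, S, K=5):
    """Number of ways to pick K positions of A whose values sum to S."""
    # count[n][s] = ways to pick n of the elements seen so far with sum s
    count = [[0] * (S + 1) for _ in range(K + 1)]
    count[0][0] = 1  # the empty selection
    for a in A:
        # Go through n downwards so that count[n-1] does not yet include a,
        # which keeps every element from being used twice.
        for n in range(K, 0, -1):
            for s in range(a, S + 1):
                count[n][s] += count[n - 1][s - a]
    return count[K][S]
```

The table has (K+1) * (S+1) entries, so this is only practical when S is reasonably small.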
+3
Sep 10 2018-10-10

It might be better to first build an array of only the distinct values and count their occurrences in the original array. Since only the number of solutions is required, not the solutions themselves, this could be faster if combinatorial calculations are used.

1) Sort array A: O(N log N)

2) Create a new array B in which all values are distinct. Also keep, for each element of B, the count of that value in the original array A: O(N)

3) Create a new array C with the sums of two elements of B, including sums of an element with itself if its count > 1. Also keep both indices of the elements from B: O(|B|^2)

4) Sort array C by sum: O(|B|^2 log(|B|^2))

5) For each element of B, find two elements of C such that the three values sum to S and the indices are in the same order. In pseudocode:

    num = 0
    for (i = 0; i < |B|; i++)
      j = i
      k = |C| - 1
      while (j <= k)
        if (c[j].sum + c[k].sum = S - b[i].value)
          for (m = 0; m < c[j].index.length; m++)
            for (n = 0; n < c[k].index.length; n++)
              if (i < c[j].index[m].left < c[j].index[m].right < c[k].index[n].left < c[k].index[n].right)
                num += b[i].count * b[c[j].index[m].left].count * b[c[j].index[m].right].count * b[c[k].index[n].left].count * b[c[k].index[n].right].count
              else if (b[i].count > 1 && i = c[j].index[m].left < c[j].index[m].right < c[k].index[n].left < c[k].index[n].right)
                num += binomialcoefficient(b[i].count, 2) * b[c[j].index[m].right].count * b[c[k].index[n].left].count * b[c[k].index[n].right].count
              else if (b[c[j].index[m].left].count > 1 && i < c[j].index[m].left = c[j].index[m].right < c[k].index[n].left < c[k].index[n].right)
                num += b[i].count * binomialcoefficient(b[c[j].index[m].left].count, 2) * b[c[k].index[n].left].count * b[c[k].index[n].right].count
              [..]
              else if (b[i].count > 2 && i = c[j].index[m].left = c[j].index[m].right < c[k].index[n].left < c[k].index[n].right)
                num += binomialcoefficient(b[i].count, 3) * b[c[k].index[n].left].count * b[c[k].index[n].right].count
              [..]
              else if (b[i].count > 1 && b[c[j].index[m].right].count > 1 && i = c[j].index[m].left < c[j].index[m].right = c[k].index[n].left < c[k].index[n].right)
                num += binomialcoefficient(b[i].count, 2) * binomialcoefficient(b[c[j].index[m].right].count, 2) * b[c[k].index[n].right].count
              [..]
              else if (b[i].count > 4 && i = c[j].index[m].left = c[j].index[m].right = c[k].index[n].left = c[k].index[n].right)
                num += binomialcoefficient(b[i].count, 5)
        if (c[j].sum + c[k].sum >= S - b[i].value)
          k--
        if (c[j].sum + c[k].sum <= S - b[i].value)
          j++

I'm not sure about its time complexity. The outer for loop is bounded by O(|B|), the while loop by O(|B|^2), and the inner loops by O(|B|), because B contains only distinct values. So it is obviously in O(|B|^5). But it is O(N) if all elements of A have the same value, and if all values are distinct and random enough, the number of index pairs per sum in C may be bounded by a constant, which would give O(N^3).

The worst case may be somewhere with half the values equal and the other half random, or with all numbers distinct but lots of duplicate sums. But that would also lead to a much shorter while loop. I have a feeling that the while loop and the two inner for loops together are bounded by O(N^2), giving O(N^3) in total for all cases, but I can't prove it.

Another interesting question is what the maximum number of ways is to pick 5 numbers summing to S from an array of N distinct numbers. If it is in O(N^5), then the worst case of this algorithm is also O(N^5).

Maybe give it a try ;).

+1
Sep 10 '10 at 17:39