Find the most common entry in the array

You are given a 32-bit unsigned integer array with a length of up to 2^32, with the property that more than half of the entries in the array are equal to N, for some 32-bit unsigned integer N. Find N, looking at each number in the array only once and using at most 2 kB of memory.

Your solution must be deterministic and guaranteed to find N.

+22
language-agnostic algorithm time-complexity
Nov 10 '08 at 17:02
8 answers

Keep one counter for each of the 32 bit positions, and for each integer in the array increment the counters corresponding to its set bits.

At the end, some of the counters will have a count greater than half the length of the array; those bits define N. (A counter may be greater than the number of occurrences of N, but that doesn't matter.) The key point is that any bit that is not part of N cannot occur more than half the time (because N accounts for more than half of the entries), and any bit that is part of N must occur more than half the time (because it occurs every time N occurs, and possibly at other times as well).

(No code at the moment: about to lose net access. Hopefully the above is clear enough, though.)
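A minimal sketch of this bit-counting approach in Python (my illustration, not the answerer's code; 32 word-sized counters occupy only a few hundred bytes, comfortably within the 2 kB limit):

    def majority_by_bits(numbers):
        # One counter per bit position: count how often each bit is set.
        bit_counts = [0] * 32
        length = 0
        for x in numbers:  # single pass over the (possibly huge) input
            length += 1
            for b in range(32):
                if x & (1 << b):
                    bit_counts[b] += 1
        # Bits set in more than half of the entries are the bits of N.
        n = 0
        for b in range(32):
            if bit_counts[b] > length // 2:
                n |= 1 << b
        return n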

+51
Nov 10 '08 at 17:13

Boyer and Moore's "Majority Vote Algorithm" - go down the array, maintaining your current guess at the answer.

+39
Nov 10 '08 at 18:22

You can do this with only two variables.

    public uint MostCommon(UInt32[] numberList)
    {
        uint suspect = 0;
        int suspicionStrength = -1;
        foreach (uint number in numberList)
        {
            // Matching the suspect strengthens the suspicion; any other number weakens it.
            if (number == suspect)
            {
                suspicionStrength++;
            }
            else
            {
                suspicionStrength--;
            }
            // When suspicion runs out, the current number becomes the new suspect.
            if (suspicionStrength <= 0)
            {
                suspect = number;
                suspicionStrength = 1;
            }
        }
        return suspect;
    }

Make the first number the suspect and continue looping through the list. If the number matches the suspect, increase the suspicion strength by one; if it does not match, decrease the suspicion strength by one. If the suspicion strength drops to zero, the current number becomes the new suspect and the suspicion strength restarts at one. This will not work to find the most common number in general, only a number that makes up more than 50% of the group. Resist the urge to add a check whether suspicionStrength is greater than half the list length: it would always result in more comparisons in total.

PS: I have not tested this code; use it at your own risk.

+7
Nov 11 '08 at 2:55

Pseudocode (notepad C++ :-)) for Jon's algorithm:

    int arrBits[32] = {0};
    int lNumbers = sizeof(arrNumbers) / sizeof(arrNumbers[0]);

    // Count, for each bit position, how many numbers have that bit set.
    for (int i = 0; i < lNumbers; i++)
        for (int bi = 0; bi < 32; bi++)
            arrBits[bi] += ((arrNumbers[i] & (1u << bi)) != 0) ? 1 : 0;

    // Bits set in more than half of the entries are the bits of N.
    int N = 0;
    for (int bc = 0; bc < 32; bc++)
        if (arrBits[bc] > lNumbers / 2)
            N |= (1 << bc);
+4
Nov 10 '08 at 17:37

Note that if the sequence a0, a1, ..., an-1 contains a leader (a value that occurs more than half the time), then after removing a pair of elements with different values, the remaining sequence still has the same leader. Indeed, if we remove two different elements, at most one of them can be the leader. The leader then occurs more than n/2 - 1 = (n - 2)/2 times in the remaining sequence, and is therefore still the leader of the new sequence of n - 2 elements.

Here is a Python implementation with O(n) time complexity:

    def goldenLeader(A):
        n = len(A)
        size = 0
        # First pass: cancel out pairs of different values.
        for k in range(n):
            if size == 0:
                size += 1
                value = A[k]
            else:
                if value != A[k]:
                    size -= 1
                else:
                    size += 1
        candidate = -1
        if size > 0:
            candidate = value
        # Second pass: verify that the candidate really is a leader.
        leader = -1
        count = 0
        for k in range(n):
            if A[k] == candidate:
                count += 1
        if count > n // 2:
            leader = candidate
        return leader
+2
May 29 '15 at 17:58

This is a standard problem in streaming algorithms, where you have a huge (potentially infinite) data stream and must compute some statistic from it in a single pass.

Clearly, you could approach it with hashing or sorting, but with a potentially infinite stream you would eventually run out of memory. So you need to do something smarter here.

The majority element is an element that occurs more than half the size of the array. This means the majority element occurs more often than all the other elements combined; that is, if you count the number of times the majority element appears and subtract the number of all other elements, you get a positive number.

So if you count the occurrences of some element, subtract the number of all other elements, and get 0, then your original element cannot be a majority element. This is the basis for a correct algorithm:

Keep two variables, a counter and a possible element. Iterate over the stream: if the counter is 0, overwrite the possible element and initialize the counter; if the number matches the possible element, increase the counter; otherwise decrease it. Python code:

    def majority_element(arr):
        counter, possible_element = 0, None
        for i in arr:
            if counter == 0:
                possible_element, counter = i, 1
            elif i == possible_element:
                counter += 1
            else:
                counter -= 1
        return possible_element

Clearly the algorithm runs in O(n) time with a very small constant factor (around 3). The space complexity also looks like O(1), since only three variables are initialized. The catch is that one of them is a counter that can potentially grow up to n (when the array consists of a single repeated number), and storing the number n takes O(log n) space. So from a theoretical point of view, it is O(n) time and O(log n) space. From a practical point of view, you can fit the number 2^128 in a long integer, and an array with that many elements is unimaginably large.

Also note that the algorithm works only if a majority element exists. If no such element exists, it will still return some number, which will certainly be wrong. (It is easy to modify the algorithm to detect whether a majority element exists.)
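One such modification, a sketch of my own rather than part of the original answer, adds a verification pass that re-counts the returned candidate (reusing majority_element from above):

    def majority_element_or_none(arr):
        # First pass: the voting algorithm yields the only possible candidate.
        candidate = majority_element(arr)
        # Second pass: confirm the candidate occurs more than half the time.
        if sum(1 for x in arr if x == candidate) * 2 > len(arr):
            return candidate
        return None  # no majority element exists

Note that the verification pass reads the data a second time, so this variant gives up the look-at-each-number-once property.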

Historical note: this algorithm was invented by Boyer and Moore in 1981 and is known as the Boyer-Moore majority vote algorithm.

+2
Mar 27 '16 at 4:02

I have a recollection of an algorithm for this, which may or may not meet the 2 kB rule. It might need to be rewritten with explicit stacks and such to avoid violating the memory limit through function calls, though that may be unnecessary, since it only makes a logarithmic number of such calls. In any case, I have vague memories from college of a recursive divide-and-conquer solution to this problem. The key insight is that when you divide the list in half, at least one of the halves still has more than half of its values equal to the majority value. The basic rule when dividing is that you return the top two candidate values, one of which is the top value and one of which is some other value (which may or may not be the second most frequent). I forget the algorithm itself.
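A possible reconstruction of that divide-and-conquer scheme (my sketch of the standard recursive majority algorithm, not necessarily the remembered one):

    def majority_candidate(a, lo, hi):
        # Returns a candidate for the majority value of a[lo:hi].
        if hi - lo == 1:
            return a[lo]
        mid = (lo + hi) // 2
        left = majority_candidate(a, lo, mid)
        right = majority_candidate(a, mid, hi)
        if left == right:
            return left
        # The halves disagree: whichever candidate occurs more often
        # in the whole range is the only possible majority value.
        left_count = sum(1 for k in range(lo, hi) if a[k] == left)
        right_count = sum(1 for k in range(lo, hi) if a[k] == right)
        return left if left_count >= right_count else right

The recursion depth is logarithmic in the array length, matching the "logarithmic number of calls" above, but each level rescans elements, so this runs in O(n log n) and is not single-pass.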

0
Nov 10 '08 at

Proof of correctness for the buti-oxa / Jason Hernandez answer, assuming Jason's answer is the same as buti-oxa's and both work the way the algorithm described should work:

We define the adjusted suspicion strength as equal to the suspicion strength if the top value is selected, or minus the suspicion strength if the top value is not selected. Every time you encounter the top value, the current adjusted suspicion strength increases by 1. Every time you encounter a wrong number, it either drops by 1 or increases by 1, depending on whether the wrong number is currently selected. So the smallest possible final adjusted suspicion strength is the number of [top values] minus the number of [other values], which is positive; and a positive adjusted suspicion strength means the top value is selected at the end.
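As a concrete illustration (my example, not part of the original proof), here is a trace of the counter-based formulation from the streaming answer above on a small array whose top value is 1:

    def trace(arr):
        suspect, strength = None, 0
        for x in arr:
            if strength == 0:
                suspect, strength = x, 1
            elif x == suspect:
                strength += 1
            else:
                strength -= 1
            # Adjusted strength: +strength if the suspect is the top value, else -strength.
            adjusted = strength if suspect == 1 else -strength
            print(x, suspect, strength, adjusted)

    trace([1, 2, 1, 3, 1])
    # Adjusted strength after each step: 1, 0, 1, 0, 1. It ends positive
    # (3 top values minus 2 others = 1), so the final suspect is the top value 1.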

0
Nov 12 '08 at 22:37
