This is a standard problem in stream algorithms (where you have a huge (potentially endless) data stream), and you have to calculate some statistics from this stream, passing through this stream once.
Clearly, you can approach it with hashing or sorting, but with a potentially infinite stream, you will obviously run out of memory. So you need to do something smart here.
The most element is an element that contains more than half the size of the array . This means that the majority element occurs more than all other elements combined, or if you count the number of times, the majority element appears and subtract the number of all other elements, you will get a positive number.
So, if you count the number of some elements and subtract the number of all other elements and get the number 0, then your original element cannot be a majority element. This, if the basis for the correct algorithm:
They have two variables, a counter and a possible element. Iterate the stream, if the counter is 0 - you overwrite the possible element and initialize the counter, if the number matches the possible element - increase the counter, otherwise decrease it. Python Code:
def majority_element(arr): counter, possible_element = 0, None for i in arr: if counter == 0: possible_element, counter = i, 1 elif i == possible_element: counter += 1 else: counter -= 1 return possible_element
It is clear that the algorithm is O(n) with a very small constant up to O(n) (for example, 3). It also seems that the complexity of the space is O(1) , because we have only three initialized variables. The problem is that one of these variables is a counter that can potentially grow to n (when the array consists of the same numbers). And to keep the number n , you need an O(log (n)) space. Thus, from a theoretical point of view, this time is O(n) and O(log(n)) . With practical, you can put the number 2 ^ 128 in longint, and this number of elements in the array is unimaginably large.
Also note that the algorithm only works if there is a majority element. If such an element does not exist, it will still return a certain number, which will certainly be incorrect. (easy to modify the algorithm to determine if a majority element exists)
History Channel: This algorithm was invented sometime in 1982 by Boyer Moore and named the voting algorithm of most Boyer-Moore .