Given a file containing 4.30 billion 32-bit integers, how can we find a number that appears at least twice?

Question

Given a file containing 4.30 billion 32-bit integers, how can we find a number that appears at least twice?

I came up with a separation and conquest algorithm for this. Just wanted to know if this would work or not?

The first average is calculated from the integer range, that is (0+ (1 <32-1)) → 1, and then this idea is applied: the range of the number from the beginning to the middle or from the middle to the end will always be less than the numbers that we will consider, because we are considering billions of numbers, and there will certainly be some numbers that will be repeated, since the range of a 32-bit integer is much less than a billion numbers.

def get_duplicate(input, start, end):  
  while True:
    mid = (start >> 1) + end - (end >> 1)
    less_to_mid = 0
    more_to_mid = 0
    equal_to_mid = 0
    for data in input:
        data = int(data, 16)
        if data < mid:
            less_to_mid += 1
        elif data == mid:
            equal_to_mid += 1
        else:
            more_to_mid += 1
    if equal_to_mid > 1:
        return mid
    elif mid-start < less_to_mid:
        end = mid-1
    elif end-mid < more_to_mid:
        start = mid+1

with open("codes\output.txt", 'r+') as f:
  content = f.read().split()
  print(get_duplicate(content, 0, 1<<32-1))

I know that we can use a bit array, but I just want to get your views on this solution and if the implementation is buggy.

+4

algorithm numbers numeric numerical-methods divide-and-conquer

noman pouigt 16 . '16 11:54

2

2 ^ 32- . , bitset, , . : , :

void print_twice_seen (Iterator &it)//iterates through all numbers
{
  std::bitset< (1L<<32) > seen;//bitset for 2^32 elements, assume 64bit system

  while(it.hasNext()){
       unsigned int val=it.next();//return current element and move the iterator
       if(seen[val])
           std::cout<<"Seen at least twice: "<<val<<std::endl;
       else
           seen.set(val, true);//remember as seen
  }
}

0

ead 16 . '16 12:56

WhatsUp · Accepted Answer · 2016-02-16T21:44:41+0000

. , , , .

, , .

A[65536] .
. , x, 1 A[x mod 65536].
, i , A[i] 65536. , 65536 * 63356 < 4.3 billion. , A[i0] , 65536.
A .
, x, x mod 65536 = i0. x 1 A[x / 65536].
, j , A[j] , 1. 65536 * j + i0.

Given a file containing 4.30 billion 32-bit integers, how can we find a number that appears at least twice?

More articles: