Find a duplicate element in an array of consecutive integers in time O (log n)

Question

Find a duplicate element in an array of consecutive integers in time O (log n)

A couple of days ago in an interview I was offered a program that will find the only duplicate element in a sequential integer array in O(log n) time.

The case is somewhat specific, since there are only 11 integers (from 1 to 10 in this order) plus one duplicate of any of these numbers inserted somewhere in the middle.

I was provided with an example similar to this:

{1, 2, 3, 4, 5, 6, 2 , 7, 8, 9, 10}

So now I came up with the following C code:

 #include <stdio.h> int main (void) { int a[11] = {1, 2, 3, 4, 5, 6, 2, 7, 8, 9, 10}, first=0, last=10, mid; while (1) { mid = (first + last)/2; if (mid+1 == a[mid]) first = mid+1; else if ((mid == 0) || (mid == 10) || (a[mid+1] - a[mid-1] == 1)) /* Order is Important, (a[mid+1] - a[mid-1] == 1) must be last */ break; else last = mid-1; } printf("Duplicate element is %d, located at position no. %d.", a[mid], mid+1); return 0; }

Would this satisfy O(log n) criteria correctly? And are there any alternatives / improvements to this?

+5

optimization c arrays algorithm duplicates

primekrieger Oct 31 '14 at 20:34

source share

2 answers

kraskevich · Answer 1 · 2014-10-31T20:45:11+0000

Yes, it has O(log n) time complexity.

You can make the code more understandable using the following fact: you need to find the smallest i such that a[i] != i + 1 , so it can be implemented in a more concise way:

 //invariant: the [0...low] prefix does not contain a duplicate // the [0...high] prefix contains a duplicate low = 0 //[0...0] prefix obviously does not contain a duplicate high = 10 //the entire array obviously contains a duplicate while high - low > 1: mid = (low + high) / 2 if a[mid] != mid + 1: high = mid else: low = mid print(a[high], high)

Amit sharma · Answer 2 · 2014-11-01T08:19:47+0000

We can modify the binary search algorithm to get a solution. In binary search , we have a key, and we used this key to find its position by dividing the size of the array. Here we do not have a key, instead we must find it. But the duplicate element behavior can be used to bisect size of the array. How? let's get a look:

A careful study of the data, we can easily see that after inserting a duplicate element in an arbitrary position (for example, index ) in a sequential array of elements, the property of the elements will change ( a[i] == i+1 → a[i] != i+1 ) from the position of index (including index ). Now this changed property can be used to divide the size of the array. Therefore, we can find a duplicate in O(log(n)) .

For example, consider this array: {1, 2, 3, 4, 5, 6, 2, 7, 8, 9, 10}

 {1, 2, 3, 4, 5, 6, 2, 7, 8, 9, 10} || from this position the the property of (a[i] == i+1) will no more satisfied.

This property can be a model for dividing the size of an array in a solution.

 void binary_duplictae_finder(int a[], int low, int high) { int mid=(low+high)/2; if(high - low > 1){ if(a[mid]!=mid+1) binary_duplictae_finder(a, low, mid); else binary_duplictae_finder(a, mid, high); } if(high==low+1) printf("%d ", a[high]); }

Find a duplicate element in an array of consecutive integers in time O (log n)

More articles: