Remove duplicates from an array without using a hash table

I have an array that can contain duplicate elements (an element may appear more than twice). I wonder if it is possible to find and remove the duplicates in the array:

  • without using a hash table (strict requirement)
  • without using a temporary secondary array. There are no restrictions on complexity.

PS: This is not a homework question.

A friend of mine was asked this in a technical interview at Yahoo.

+3
7 answers

Sort the source array, then find consecutive elements that are equal (i.e., what std::unique does in C++ land). The total complexity is O(N log N), or just O(N) if the input is already sorted.

To remove the duplicates, you can copy elements from later in the array over earlier elements, also in linear time: keep a pointer to the new logical end of the container and, at each step, copy the next distinct element to that new logical end. (Again, just like std::unique. Really, why not just look up an implementation of std::unique and do exactly what it does? :p)
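
A minimal sketch of that idiom in C++, assuming the data lives in a std::vector:

    #include <algorithm>
    #include <vector>

    // O(N log N): sort to group equal elements, let std::unique compact
    // the distinct ones to the front, then erase the leftover tail.
    void removeDuplicates(std::vector<int>& v) {
        std::sort(v.begin(), v.end());
        v.erase(std::unique(v.begin(), v.end()), v.end());
    }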

+8

O(N log N): sort, then replace each run of equal items with a single copy.

O(N²): run a nested loop comparing each element with the rest of the array; if a duplicate is found, overwrite it with the element at the end of the array and reduce the array size by 1.
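
A minimal C++ sketch of the O(N²) variant (the raw-array interface and function name are my own choices; note that this approach does not preserve element order):

    #include <cstddef>

    // O(N^2) in-place de-duplication: overwrite each duplicate with the
    // current last element and shrink the logical size.
    std::size_t removeDuplicates(int a[], std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = i + 1; j < n; ) {
                if (a[j] == a[i]) {
                    a[j] = a[n - 1];  // replace duplicate with the last element
                    --n;              // logically shrink the array
                } else {
                    ++j;              // only advance when nothing was removed
                }
            }
        }
        return n;  // new logical length
    }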

+5

There are no restrictions on complexity.

So this is a piece of cake.

    // A[1], A[2], A[3], ... A[i], ... A[n]  (1-indexed); O(n^2)
    for (i = 2; i <= n; i++) {
        duplicate = false;
        for (j = 1; j < i; j++)
            if (A[i] == A[j]) { duplicate = true; break; }
        if (duplicate) {
            // "remove" A[i] by shifting every element to its right one slot left
            for (j = i; j < n; j++)
                A[j] = A[j+1];
            n--;
            i--;  // re-check the element that just shifted into position i
        }
    }
+3

Removing duplicates in place, preserving the existing list order, in quadratic time:

    for (var i = 0; i < list.length; i++) {
        for (var j = i + 1; j < list.length;) {
            if (list[i] == list[j]) {
                list.splice(j, 1);
            } else {
                j++;
            }
        }
    }

The trick is to start the inner loop at i + 1 and not to increment the inner counter when deleting an element.

This is JavaScript; splice(x, 1) removes the element at index x.

If preserving the order is not a concern, you can do it faster:

    list.sort();
    for (var i = 1; i < list.length;) {
        if (list[i] == list[i - 1]) {
            list.splice(i, 1);
        } else {
            i++;
        }
    }

which is linear if you don't count the sort, so the whole thing costs as much as the sort: in most cases n × log(n).

+2

In functional languages you can combine sorting and uniquification (is that a real word?) in one go. Take the standard quicksort algorithm:

    - Take the first element of the input (x) and the remaining elements (xs)
    - Make two new lists
        - left: all elements in xs smaller than or equal to x
        - right: all elements in xs larger than x
    - Apply quicksort to the left and right lists
    - Return the concatenation of the sorted left list, x, and the sorted right list
    - PS: quicksort of an empty list is an empty list (don't forget the base case!)

If you only need unique entries, replace

left: all elements in xs smaller than or equal to x

with

left: all elements in xs smaller than x

This is a one-pass algorithm with quicksort's usual O(n log n) complexity.

An example implementation in F #:

    let rec qsort = function
        | [] -> []
        | x::xs ->
            let left, right = List.partition (fun el -> el <= x) xs
            qsort left @ [x] @ qsort right

    let rec qsortu = function
        | [] -> []
        | x::xs ->
            let left  = List.filter (fun el -> el < x) xs
            let right = List.filter (fun el -> el > x) xs
            qsortu left @ [x] @ qsortu right

And testing it in the interactive shell:

    > qsortu [42;42;42;42;42];;
    val it : int list = [42]
    > qsortu [5;4;4;3;3;3;2;2;2;2;1];;
    val it : int list = [1; 2; 3; 4; 5]
    > qsortu [3;1;4;1;5;9;2;6;5;3;5;8;9];;
    val it : int list = [1; 2; 3; 4; 5; 6; 8; 9]
+1

Since this is an interview question, it usually calls for asking the interviewer questions about the problem.

If auxiliary storage is not allowed (O(1) storage is still fair game, since you will likely use a few counters/pointers), then a destructive, in-place operation is clearly expected; it may be worth pointing that out to the interviewer.

Now the real question: do you need to keep the relative order of the elements? I.e., should the operation be stable?

Stability greatly affects the available algorithms (and therefore complexity).

The most obvious approach is to reach for sorting algorithms; after all, once the data is sorted, it is pretty easy to keep only the unique items.

But if you need stability, you cannot simply sort the data (since you could not recover the "correct" order afterwards), so I wonder whether it is solvable in less than O(N²) when stability is required.
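
For reference, the stable requirement is easy to meet in O(N²); here is a minimal C++ sketch of the obvious in-place approach (my own illustration, mirroring the splice-based JavaScript answer above):

    #include <cstddef>
    #include <vector>

    // Stable O(N^2) de-duplication: keep the first occurrence of each
    // value; erasing a duplicate shifts the tail left, preserving order.
    void removeDuplicatesStable(std::vector<int>& v) {
        for (std::size_t i = 0; i < v.size(); ++i) {
            for (std::size_t j = i + 1; j < v.size(); ) {
                if (v[j] == v[i]) {
                    v.erase(v.begin() + j);  // O(N) shift per erase
                } else {
                    ++j;
                }
            }
        }
    }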

0

This doesn't use a hash table as such, though I know one likely backs the implementation behind the scenes. Still, I thought I'd post it in case it helps. It's JavaScript and uses an associative array to record which values have already been seen:

    function removeDuplicates(arr) {
        var results = [], dups = [];
        for (var i = 0; i < arr.length; i++) {
            // check whether we have seen this value before
            if (dups[arr[i]] === undefined) {
                // mark it as seen for subsequent checks
                dups[arr[i]] = 1;
                // it is unique: append it to the output array
                results.push(arr[i]);
            }
        }
        return results;
    }
0

Source: https://habr.com/ru/post/899567/

