Replace duplicate numbers with unique numbers from 0- (N-1)

History:

I have an array of N positive random numbers that probably contain duplicates. e.g. 10,4,5,7,10,9,10,9,8,10,5
Edit : N will probably be 32, or some other force equal to two sizes.

Problem:

I am trying to find the fastest way to replace duplicates with the missing numbers from 0- (N-1). Using the above example, I want to get a result that looks like this:
10,4,5,7,0,9,1,2,8,3,6
The goal is to have one of each number from 0 to N-1 without replacing all the numbers 0- (N-1) (random order is important). Change It is also important that this replacement is deterministic, i.e. the same input will have the same output (not random).

My decision:

Currently, Java uses 2 Boolean arrays to track used / unused numbers (unique numbers / missing numbers in the range [0, N)) and has an approximate worst-case execution time N + N * sqrt (N).
Code follows:

public byte[] uniqueify(byte[] input) { boolean[] usedNumbers = new boolean[N]; boolean[] unusedIndices = new boolean[N]; byte[] result = new byte[N]; for(int i = 0; i < N; i++) // first pass through { int newIdx = (input[i] + 128) % N; // first make positive if(!usedNumbers[newIdx]) // if this number has not been used { usedNumbers[newIdx] = true; // mark as used result[i] = newIdx; // save it in the result } else // if the number is used { unusedIndices[i] = true; // add it to the list of duplicates } } // handle all the duplicates for(int idx = 0; idx < N; idx++) // iterate through all numbers { if(unusedIndices[idx]) // if unused for(int i = 0; i < N; i++) // go through all numbers again { if(!usedNumbers[i]) // if this number is still unused { usedNumbers[i] = true; // mark as used result[i] = idx; break; } } } return result; } 

It seems the fastest I can hope for, but I thought I would ask the Internet because there are people who are much smarter than me, who can have a better solution.

NB Suggestions / solutions do not have to be in Java.

Thanks.

Change I forgot to mention that I am converting this to C ++. I published my implementation of Java because it is more complete.

+6
source share
7 answers

The fastest way to do this is probably the easiest. I would go through a list of data containing a count of each individual value and a label where duplicates appeared. Then it’s just a matter of creating a list of unused values ​​and applying them one at a time in places where duplicates were found.

A mystery, since it could be using a C ++ List , if speed makes sense, a simple C array is the most efficient.

This program shows the principle.

 #include <iostream> #include <cstring> using namespace std; int main() { int data[] = { 10, 4, 5, 7, 10, 9, 10, 9, 8, 10, 5 }; int N = sizeof(data) / sizeof(data[0]); int tally[N]; memset(tally, 0, sizeof(tally)); int dup_indices[N]; int ndups = 0; // Build a count of each value and a list of indices of duplicate data for (int i = 0; i < N; i++) { if (tally[data[i]]++) { dup_indices[ndups++] = i; } } // Replace each duplicate with the next value having a zero count int t = 0; for (int i = 0; i < ndups; i++) { while (tally[t]) t++; data[dup_indices[i]] = t++; } for (int i = 0; i < N; i++) { cout << data[i] << " "; } return 0; } 

Exit

 10 4 5 7 0 9 1 2 8 3 6 
+1
source

Use a balanced binary search tree to keep track of used / unused numbers instead of a boolean array. Then you use time n log n .

The simplest solution would be the following:

  • Scroll through the list and create an "unused" BST
  • Go through the list a second time, keeping track of the numbers seen so far in the "used" BST
  • If a duplicate is found, replace it with a random β€œunused” BST element.
+5
source

This is how I wrote it.

 public static int[] uniqueify(int... input) { Set<Integer> unused = new HashSet<>(); for (int j = 0; j < input.length; j++) unused.add(j); for (int i : input) unused.remove(i); Iterator<Integer> iter = unused.iterator(); Set<Integer> unique = new LinkedHashSet<>(); for (int i : input) if (!unique.add(i)) unique.add(iter.next()); int[] result = new int[input.length]; int k = 0; for (int i : unique) result[k++] = i; return result; } public static void main(String... args) { System.out.println(Arrays.toString(uniqueify(10, 4, 5, 7, 10, 9, 10, 9, 8, 10, 5))); } 

prints

 [10, 4, 5, 7, 0, 9, 1, 2, 8, 3, 6] 
+2
source

My approach would be 1. Copy the array into Set in Java.

Set will automatically remove duplicates as much as possible (since Sun Micro implemented it, usually their approach is the fastest, like .. using TimSort for sorting, etc.)

  • Calculate the size () of the set.

  • size will not give you duplicates.

  • now copy the array 0-n-1 into the same set ... missing values ​​will be inserted.

+1
source

I think this is possible even when n running. The idea is to keep track of the elements used in the source list and the additional elements used during processing in a separate array. A possible implementation of Java is as follows:

 int[] list = { 10, 4, 5, 7, 10, 9, 10, 9, 8, 10, 5 }; boolean[] used = new boolean[list.length]; for (int i : list) { used[i] = true; } boolean[] done = new boolean[list.length]; int nextUnused = 0; Arrays.fill(done, false); for (int idx = 0; idx < list.length; idx++) { if (done[list[idx]]) { list[idx] = nextUnused; } done[list[idx]] = true; while (nextUnused < list.length && (done[nextUnused] || used[nextUnused])) { nextUnused++; } } System.out.println(Arrays.toString(list)); 
0
source
 List<Integer> needsReplaced = newLinkedList<Integer>(); boolean[] seen = new boolean[input.length]; for (int i = 0; i < input.length; ++i) { if (seen[input[i]]) { needsReplaced.add(i); } else { seen[input[i]] = true; } } int replaceWith = 0; for (int i : needsReplaced) { while (seen[replaceWith]) { ++replaceWith; } input[i] = replaceWith++; } 

This should behave at about 2n. List operations are constant time, and although this second loop looks nested, the outer loop works much less than n iterations, and the inner loop will only be executed n times.

0
source

C #, but it is easy to convert to java. O (n).

  int[] list = { 0, 0, 6, 0, 5, 0, 4, 0, 1, 2, 3 }; int N = list.length; boolean[] InList = new boolean[N]; boolean[] Used = new boolean[N]; int[] Unused = new int[N]; for (int i = 0; i < N; i++) InList[list[i]] = true; for (int i = 0, j = 0; i < N; i++) if (InList[i] == false) Unused[j++] = i; int UnusedIndex = 0; for (int i = 0; i < N; i++) { if (Used[list[i]] == true) list[i] = Unused[UnusedIndex++]; Used[list[i]] = true; } 

Edit: tried to convert it to java from C #. I do not have java, so it cannot compile, but it is easy to fix. Arrays may need to be initialized to false if java does not do this automatically.

0
source

Source: https://habr.com/ru/post/912574/


All Articles