Getting unique elements from a container [C ++]

I was looking to get only unique items from the container. Let's say srcContainer is the container from which I want unique elements. I considered three options:

  • Using std :: unique

      std::sort(srcContainer.begin(), srcContainer.end()); srcContainer.erase(std::unique(srcContainer.begin(), srcContainer.end()), srcContainer.end()); 
  • Using BOOST :: unique

     boost::erase(srcContainer, boost::unique<boost::return_found_end>(boost::sort(srcContainer))); 
  • My own method

     std::set<T> uniqueElems(srcContainer.begin(), srcContainer.end()); srcContainer.clear(); srcContainer.insert(srcContainer.end(), uniqueElems.begin(), uniqueElems.end()); 

The problem with 1. and 2. is that they change the order in which the members occurred in the original srcContainer. Since 3. there are no changes in the order, and in addition, it gives much better performance compared to 1. and 2 (this is because the explicit sort in 3. ??) is higher. Below is the elapsed wall clock time for the 3 methods above and the number of elements in srcContainer:

  • srcContainer size (contains integers) = 1e + 6
    - std :: unique = 1.04779 secs
    - BOOST :: unique = 1.04774 seconds
    - Native method = 0.481638 sec

  • srcContainer size (contains integers) = 1e + 8
    - std :: unique = 151.554 secs
    - BOOST :: unique = 151.474 seconds
    - Native method = 57.5693 sec

My question is:

  • Is there a better way to find a unique use of std :: unique or BOOST :: unique or any other code and store the original order in the container?
  • Any problem using method 3. above.

The srcContainer was created to profile the performance of srcContainer :

 std::vector<int> srcContainer; int halfWay = numElems/2; for (size_t k=0; k<numElems; ++k) { if (k < halfWay) srcContainer.push_back(k); else srcContainer.push_back(k - halfWay); } 

Editing:
Accepting the comments, Method 3. also reorders the elements. Is there a better way to get unique items without changing the order?

thanks

+4
source share
1 answer

EDIT based on information about the source data: The reason you see the insert installation completed faster than sorting the vector is because your input data is two ranges already sorted. For quicksort (commonly used by std::sort ), this is a degenerate case and one of the worst possible inputs you could give it. For input size 1e8 sorting changes from std::sort to std::stable_sort shorten the runtime from ~ 25 s to 9 s.

If you want to keep the original order of items, you can try something like the following, which stores a hash of all the elements. I have no idea what the effectiveness of this would be, but for example, you could use a hash and remove_if , as shown below:

 struct Remover { explicit Remover(hash& found_items) : found_items_(found_items) { } bool operator()(const Iter& item) { retval = <does exist in hash>; add to hash; return retval; } hash& found_items_; }; hash dup_finder; Remover remover(dup_finder); std::erase(std::remove_if(src.begin(), src.end(), remover), src.end()); 

Original components of my answer:

If the items in the source container are already mostly sorted, you can see better performance with stable_sort rather than sorting before calling unique . I cannot guess without additional information about the yoru dataset, which could lead to option 3 working better than 1 & 2.

Option 3 should remove uniques, but keep in mind that, despite what you claim, it will still reorder the elements in the same way as the first two options.

+1
source

Source: https://habr.com/ru/post/1480168/


All Articles