Keep duplicate items separately

I have a std::vector<std::string> textLines, which contains a large number of, for example, city names. I remove duplicates with:

    using namespace std;
    vector<string>::iterator iter;
    sort(textLines.begin(), textLines.end());
    iter = unique(textLines.begin(), textLines.end());

At this point, the duplicates have all become empty strings at the end of the vector, which still has the same size as before unique().

I delete them with

 textLines.resize(distance(textLines.begin(), iter)); 
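For reference, the steps above can be combined into one small function. This is only a sketch: dedupe is a hypothetical helper name, and erase() is used in place of resize(), which has the same effect here. Note that the standard leaves the elements after the returned iterator in a valid but unspecified state, so the empty strings observed above should not be relied on.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: sorts the lines and removes duplicates in place.
void dedupe(std::vector<std::string>& textLines) {
    std::sort(textLines.begin(), textLines.end());
    auto iter = std::unique(textLines.begin(), textLines.end());
    // Elements in [iter, end) are in a valid but unspecified state,
    // so they are erased without being read.
    textLines.erase(iter, textLines.end());
}
```

After calling dedupe(textLines), the vector holds each distinct line exactly once, in sorted order.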

This works fine, but is there a way to keep the removed duplicates? It would be better (for me) if the duplicates were simply moved to the end instead of being replaced by empty strings.

The new end is indicated by the iter returned from unique(), so there is no problem finding the new end of the vector.

In other words, I want to know which lines had duplicates and which did not.

+5
3 answers

You can do this very simply, without significantly changing your logic. You can store duplicates in another container that is captured by the comparison predicate passed to unique() :

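    vector<string> duplicates;
    // Generic lambda parameters (auto) require C++14 or later.
    auto iter = unique(textLines.begin(), textLines.end(),
        [&duplicates](const auto& first, const auto& second) -> bool {
            // Record every element that unique() is about to drop.
            if (first == second) {
                duplicates.push_back(second);
                return true;
            }
            return false;
        });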

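A self-contained sketch of this approach follows. The name extractDuplicates is hypothetical, and the code assumes what mainstream standard library implementations do: the predicate is invoked once per compared pair, and its second argument is an element that has not yet been moved from.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: removes duplicates from textLines and returns
// the removed copies in a separate vector.
std::vector<std::string> extractDuplicates(std::vector<std::string>& textLines) {
    std::sort(textLines.begin(), textLines.end());
    std::vector<std::string> duplicates;
    auto iter = std::unique(textLines.begin(), textLines.end(),
        [&duplicates](const std::string& first, const std::string& second) {
            if (first == second) {
                duplicates.push_back(second);  // copy the element being dropped
                return true;
            }
            return false;
        });
    textLines.erase(iter, textLines.end());
    return duplicates;
}
```

A line that occurs n times ends up once in textLines and n - 1 times in the returned vector, so any line present in the result is one that had duplicates.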

+6

You can always write your own function, which is advisable in cases like yours, where the requirement is specific. Something like:

    // Define "bool has(const vector<string>& v, const string& element)" beforehand;
    // it should return true if v already contains element.
    vector<string> nonDuplicates;
    vector<string> duplicates;
    for (const auto& line : textLines) {
        if (has(nonDuplicates, line)) {
            duplicates.push_back(line);
        } else {
            nonDuplicates.push_back(line);
        }
    }

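This is neither an elegant nor a fast way to do it (with a linear has() the loop is quadratic), so you may well find a better one. If you stick with it and textLines is sorted, use binary search for has(): nonDuplicates is then filled in sorted order as well.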
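As a rough sketch of the binary-search variant, has() can be written with std::binary_search. The wrapper name splitDuplicates and its output parameters are hypothetical; the input is copied and sorted first so that nonDuplicates stays sorted while it grows.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Binary-search lookup; v must already be sorted.
bool has(const std::vector<std::string>& v, const std::string& element) {
    return std::binary_search(v.begin(), v.end(), element);
}

// Hypothetical helper: splits the lines into first occurrences and repeats.
void splitDuplicates(std::vector<std::string> textLines,  // taken by value, sorted locally
                     std::vector<std::string>& nonDuplicates,
                     std::vector<std::string>& duplicates) {
    nonDuplicates.clear();
    duplicates.clear();
    std::sort(textLines.begin(), textLines.end());
    for (const auto& line : textLines) {
        if (has(nonDuplicates, line)) {
            duplicates.push_back(line);
        } else {
            nonDuplicates.push_back(line);  // appended in sorted order
        }
    }
}
```

Because the input is sorted, comparing each line only with nonDuplicates.back() would be enough; the binary search is kept here to stay close to the has() idea above.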

0

This solution needs additional memory to store the number of occurrences of each distinct item.

    vector<string>::iterator iter;
    vector<string> v{ "a", "b", "a", "t", "a", "g", "t" };
    sort(v.begin(), v.end());

    // Find number of distinct elements
    int count = 1;
    auto current = v.cbegin();
    for (auto i = v.cbegin() + 1; i < v.cend(); ++i) {
        if (*i != *current) {
            ++count;
            current = i;
        }
    }

    // Count every entry
    vector<int> vCount(count);
    auto currentCount = vCount.begin();
    ++*currentCount;
    for (size_t i = 1; i < v.size(); ++i) {
        if (v[i] == v[i-1])
            ++*currentCount;
        else
            *++currentCount = 1;
    }

    iter = unique(v.begin(), v.end());
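The same counts can also be gathered in a single pass by storing each distinct value next to its count. This is a variation on the code above, not the answer's exact method, and countOccurrences is a hypothetical name.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns each distinct string with its number of
// occurrences, in sorted order.
std::vector<std::pair<std::string, std::size_t>>
countOccurrences(std::vector<std::string> v) {  // taken by value, sorted locally
    std::sort(v.begin(), v.end());
    std::vector<std::pair<std::string, std::size_t>> counts;
    for (const auto& s : v) {
        if (!counts.empty() && counts.back().first == s) {
            ++counts.back().second;   // same value as the previous element
        } else {
            counts.emplace_back(s, 1);  // first occurrence of a new value
        }
    }
    return counts;
}
```

An entry whose count is greater than 1 had duplicates; an entry with a count of 1 did not.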
0

Source: https://habr.com/ru/post/1275336/

