Your method works, but you are mistaken with your complexity.
Firstly, testing if an element in std::map has complexity O(log(n) * f) , where n is the number of elements on the map and f is the upper bound on the time required to compare any two elements to insert / find on the map.
In your case, each line has a length m , so comparing any two elements in the map costs O(m) .
So the overall complexity of your method is this:
O(n * log(n) * m) to insert n lines into the map.
However, you can speed it up to O(n * m) in anticipation of what is asymptotically optimal (because you have to read all the data) using a hash table, not a map. The reason for this is that the hash table has an average complexity of O(1) for the insert operation, and the hash function for each input row is calculated only once.
In C++ you can use unordered_set for this.
source share