Efficient algorithm for checking for duplicate rows in a matrix

Given a matrix M of integers, check whether any two rows of the matrix are identical. Give an optimal approach.

Example: [{1, 2, 3}, {3, 4, 5}, {1, 2, 3}] 

In the above matrix, rows 1 and 3 are identical.

Possible Solution:

Given the matrix, we can convert each row to a string (for example, using to_string() in C++ and concatenating the elements of a row into one string). We do this for every row of the matrix and insert the result into a lookup table (something like a map<string, int> in C++). Duplicate rows can then be detected in O(mn) time for an m x n matrix.

Can I do better than this? Or does the method above have any flaw?

2 answers

Your method works, but your complexity analysis is mistaken.

First, testing whether an element is in a std::map has complexity O(log(n) * f), where n is the number of elements in the map and f is an upper bound on the time required to compare two elements during an insert or find.

In your case, each row has length m, so comparing any two keys in the map costs O(m).

So the overall complexity of your method is:

O(n * log(n) * m) to insert n rows into the map.

However, you can speed this up to O(n * m) in expectation, which is asymptotically optimal (since you have to read all the data anyway), by using a hash table instead of a map. The reason is that a hash table has average complexity O(1) per insertion, and the hash of each input row is computed only once.

In C++ you can use std::unordered_set for this.


Depending on the size of the matrix, converting everything to strings seems like a pretty big waste of time and space.

Why not compute a likely-unique hash for each row? For example, you could compute the bitwise OR of all entries, and then store this hash along with the row index in a multimap. As you go through each row, you compute its hash and check whether that hash already exists. If so, you compare the row with the other rows that have the same hash to make sure they are really equal.

It does not have the best big-O complexity, but it almost certainly has a smaller constant factor and uses less space.


Source: https://habr.com/ru/post/956209/
