There is some uncertainty as to whether you want to prevent duplicate inserts into the database. You can simply select unique pairs while keeping duplicates.
So, here is an alternative solution for the latter case, requesting unique pairs, even if duplicates exist:
SELECT r1.* FROM Relationships r1 LEFT OUTER JOIN Relationships r2 ON (r1.person_1 = r2.person_2 AND r1.person_2 = r2.person_1) WHERE r1.person_1 < r1.person_2 OR r2.person_1 IS NULL;
So, if there is a corresponding line with a changed id id, there is a rule for which a request is preferable (one that has an identifier in numerical order).
If there is no corresponding line, then r2 will be NULL (this is how the outer join works), so just use what is found in r1 in this case.
No need to use GROUP BY or DISTINCT , because there can only be zero or one matching row.
Trying this in MySQL, I get the following optimization plan:
+----+-------------+-------+--------+---------------+---------+---------+-----------------------------------+------+--------------------------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+--------+---------------+---------+---------+-----------------------------------+------+--------------------------+ | 1 | SIMPLE | r1 | ALL | NULL | NULL | NULL | NULL | 2 | | | 1 | SIMPLE | r2 | eq_ref | PRIMARY | PRIMARY | 8 | test.r1.person_2,test.r1.person_1 | 1 | Using where; Using index | +----+-------------+-------+--------+---------------+---------+---------+-----------------------------------+------+--------------------------+
This is apparently a pretty good use of indexes.
source share