I would like to know in which cases a single abstraction column with an index should be preferable to using a composite index. In my case, I have two tables with approximately 1 million data sets, one of which receives the updated value from the other table (this is the data import procedure). MySQL 5.1 and 5.5 are used in my environments.
Example:
CREATE TABLE permanent ( ref_id_1 INT(10), ref_id_2 INT(10), ref_id_3 INT(10), ref_id_4 INT(10), ref_date DATE, value INT(10)); CREATE TABLE import ( ref_id_1 INT(10), ref_id_2 INT(10), ref_id_3 INT(10), ref_id_4 INT(10), ref_date DATE, value INT(10)); //Option 1 ALTER TABLE import ADD UNIQUE INDEX idx_composite(ref_id_1,ref_id_2,ref_id_3,ref_id_4,ref_date); //Option 2 ALTER TABLE import ADD hash_col CHAR(32); UPDATE import SET hash_col = MD5(CONCAT(ref_id_1,ref_id_2,ref_id_3,ref_id_4,ref_date)); ALTER TABLE import ADD UNIQUE INDEX idx_hash_col(hash_col);
Of course, the constant table will also have hash_col and the necessary indexes. Now two possible updates / merges will be:
//Join via columns UPDATE permanent INNER JOIN import ON import.ref_id_1 = permanent.ref_id_2 AND import.ref_id_2 = permanent.ref_id_2 AND import.ref_id_3 = permanent.ref_id_3 AND import.ref_id_4 = permanent.ref_id_4 AND import.ref_date = permanent.ref_date SET permanent.value = import.value; //Join via Hash-col UPDATE permanent INNER JOIN import ON import.hash_col = permanent.hash_col SET permanent.value = import.value
So which approach should be preferred? Is there a thumb rule like "if you have more than X columns, use a hash instead." Thanks in advance!
ps this is my first question here, so please excuse me if something is missing.
source share