I have an item-to-item similarity matrix set up with these tables:
items (id, ...) (Primary key `id`)
similarities (item1_id, item2_id, similarity) (Index on `item1_id` and `item2_id`)
The `similarities` table contains pairs of ids with a similarity index, i.e.
item1_id item2_id similarity
1 2 0.3143
2 3 0.734
For efficient storage, reverse pairs are omitted, i.e. there is only one pair (1,2) and no redundant pair (2,1). This means that the foreign key for an item can be either `item1_id` or `item2_id`.
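To illustrate what this convention means in practice, here is a minimal sketch of looking up every neighbour of a single item. The id 2 and the alias `other_id` are chosen purely for illustration; the point is only that both columns have to be checked, because the item can sit on either side of a pair:

```sql
-- Sketch: all neighbours of item 2 when each pair is stored only once.
-- Item 2 may appear as item1_id or as item2_id, so both sides are queried.
SELECT `item2_id` AS `other_id`, `similarity`
FROM `similarities`
WHERE `item1_id` = 2
UNION ALL
SELECT `item1_id` AS `other_id`, `similarity`
FROM `similarities`
WHERE `item2_id` = 2;
```

With the two sample rows above, this would return item 3 (0.734) and item 1 (0.3143).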
Now I want to find items that are similar to a set of other items, sorted by descending similarity. I am using this query:
SELECT `Item`.*
FROM `items` AS `Item`
LEFT JOIN `similarities` AS `Similarity`
ON (`Item`.`id` = `Similarity`.`item1_id`
AND `Similarity`.`item2_id` IN (1, 2, 3, ...))
OR (`Item`.`id` = `Similarity`.`item2_id`
AND `Similarity`.`item1_id` IN (1, 2, 3, ...))
WHERE `Similarity`.`item1_id` IN (1, 2, 3, ...)
OR `Similarity`.`item2_id` IN (1, 2, 3, ...)
GROUP BY `Item`.`id`
ORDER BY `Similarity`.`similarity` desc
It is extremely slow, though: it takes 4-5 seconds for ~100,000 items and ~30,000 similarity pairs. The JOIN seems to be extremely costly. Here is the EXPLAINed query:
select_type  table       type         possible_keys      key                key_len  ref   rows    Extra
SIMPLE       Similarity  index_merge  item1_id,item2_id  item1_id,item2_id  110,110  NULL  31      Using sort_union(item1_id,...
SIMPLE       Item        ALL          PRIMARY            NULL               NULL     NULL  136600  Using where; Using join buffer
How can I speed this up? Ideally, I would like to keep it as a single JOIN query.