I have a simple question about the most efficient way to perform a specific connection.
Take these three tables, real names have been changed to protect the innocent:
Table: animal
animal_id name ...
========================
1 bunny
2 bear
3 cat
4 mouse
Table: tags
tag_id tag
===================
1 fluffy
2 brown
3 cute
4 small
Map Table: animal_tag
animal_id tag_id
===================
eleven
12
thirteen
2 2
3 4
4 2
I want to find all animals that are marked as “fluffy,” “brown,” and “cute.” That is, the animal must be labeled with all three. In reality, the number of tags required may vary, but this does not matter for discussion. This is the query I came up with:
SELECT * FROM animal JOIN ( SELECT at.animal_id FROM animal_tag at WHERE at.tag_id IN ( SELECT tg.tag_id FROM tag tg WHERE tg.tag='fluffy' OR tg.tag='brown' OR tg.tag='cute' ) GROUP BY at.animal_id HAVING COUNT(at.tag_id)=3 ) AS jt ON animal.animal_id=jt.animal_id
On a table with thousands of "animals" and "hundreds" of tags, this request is executed with respect ... 10 with milliseconds. However, when I look at the query plan (Apache Derby is the database), the estimated cost of the optimizer is quite high (9945.12), and the plan is quite extensive. For a query, this “simple” one, I usually try to get query plans with an approximate cost of one or two digits.
So my question is: is there a better way to fulfill this request? Sounds like a simple request, but I was at a dead end, coming up with something better.
source share