I thought a query like this would be easy given the nature of relational databases, but it is giving me trouble. I searched around and found nothing that really helped. Here's the situation:
Say I have a simple link table between products and product tags. This is a one-to-many relationship, so we can have the following:
    productid | tag
    ========================
    1         | Car
    1         | Black
    1         | Ford
    2         | Car
    2         | Red
    2         | Ford
    3         | Car
    3         | Black
    3         | Lexus
    4         | Motorcycle
    4         | Black
    5         | Skateboard
    5         | Black
    6         | Skateboard
    6         | Green
What is the most efficient way to query for all products matching (Ford OR Black OR Skateboard) AND NOT (Motorcycle OR Green)? Another query I need to run is something like (Car) OR (Skateboard) OR (Green AND Motorcycle) OR (Red AND Motorcycle).
The products table contains about 150 thousand records and the tags table about 600 thousand, so the query should be as efficient as possible. Here is one query I was messing around with (example #1), but it takes about 4 seconds. Any help would be greatly appreciated.
    SELECT p.productid
    FROM products p
    JOIN producttags tag1 USING (productid)
    WHERE p.active = 1
      AND tag1.tag IN ('Ford', 'Black', 'Skateboard')
      AND p.productid NOT IN (
          SELECT productid
          FROM producttags
          WHERE tag IN ('Motorcycle', 'Green')
      );
Update
The fastest query I've found so far is something like this. It takes 100-200 ms, but it seems pretty inflexible and ugly. I basically grab all the products that match Ford, Black or Skateboard. Then I concatenate all the tags for those matching products into a single colon-delimited string, and filter out every product whose string contains :Green: or :Motorcycle:. Any thoughts?
    SELECT p.productid,
           CONCAT(':', GROUP_CONCAT(alltags.tag SEPARATOR ':'), ':') AS taglist
    FROM products p
    JOIN producttags tag1 USING (productid)
    JOIN producttags alltags USING (productid)
    WHERE p.active = 1
      AND tag1.tag IN ('Ford', 'Black', 'Skateboard')
    GROUP BY tag1.productid
    HAVING taglist NOT LIKE '%:Motorcycle:%'
       AND taglist NOT LIKE '%:Green:%';
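
One alternative that might be worth trying is conditional aggregation: group the tag rows by product and express each boolean condition as a `SUM(...)` test in `HAVING`, which avoids the string matching. The sketch below is only an illustration under some assumptions: it runs the sample rows above through an in-memory SQLite database from Python so that it is self-contained, it leaves out the join to `products` and the `p.active = 1` filter, and the names `ROWS`, `QUERY_A`, `QUERY_B` and `run` are made up for the example. The `SUM(tag IN (...))` idiom relies on a comparison evaluating to 0 or 1, which holds in both SQLite and MySQL. Whether this is faster than the `GROUP_CONCAT` version at 600 thousand tag rows has not been measured here.

```python
import sqlite3

# Sample rows copied from the table in the question.
ROWS = [
    (1, "Car"), (1, "Black"), (1, "Ford"),
    (2, "Car"), (2, "Red"), (2, "Ford"),
    (3, "Car"), (3, "Black"), (3, "Lexus"),
    (4, "Motorcycle"), (4, "Black"),
    (5, "Skateboard"), (5, "Black"),
    (6, "Skateboard"), (6, "Green"),
]

# (Ford OR Black OR Skateboard) AND NOT (Motorcycle OR Green)
QUERY_A = """
SELECT productid
FROM producttags
GROUP BY productid
HAVING SUM(tag IN ('Ford', 'Black', 'Skateboard')) > 0
   AND SUM(tag IN ('Motorcycle', 'Green')) = 0
ORDER BY productid
"""

# (Car) OR (Skateboard) OR (Green AND Motorcycle) OR (Red AND Motorcycle)
QUERY_B = """
SELECT productid
FROM producttags
GROUP BY productid
HAVING SUM(tag = 'Car') > 0
    OR SUM(tag = 'Skateboard') > 0
    OR (SUM(tag = 'Green') > 0 AND SUM(tag = 'Motorcycle') > 0)
    OR (SUM(tag = 'Red') > 0 AND SUM(tag = 'Motorcycle') > 0)
ORDER BY productid
"""


def run(query):
    """Load the sample rows into a fresh in-memory table and return matching ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE producttags (productid INTEGER, tag TEXT)")
    conn.executemany("INSERT INTO producttags VALUES (?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


result_a = run(QUERY_A)
result_b = run(QUERY_B)
print(result_a)  # [1, 2, 3, 5]
print(result_b)  # [1, 2, 3, 5, 6]
```

Each additional AND/OR/NOT term becomes one more `SUM(...)` comparison in `HAVING`, so the query shape stays the same as the tag expression changes. The trade-off is that every product's tag rows are grouped before filtering, so on the real tables it would probably need an index on `(productid, tag)` and a timing comparison against the two queries above.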