I am currently writing an application that lets you store images and tag them. I use Python and the Peewee ORM (http://charlesleifer.com/docs/peewee/), which is very similar to Django's ORM.
My data model looks like this (simplified):
    class Image(BaseModel):
        key = CharField()

    class Tag(BaseModel):
        tag = CharField()

    class TagRelationship(BaseModel):
        relImage = ForeignKeyField(Image)
        relTag = ForeignKeyField(Tag)
Now I understand conceptually how to query for all images that have a given set of tags:
    SELECT Image.key
    FROM Image
    INNER JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    INNER JOIN Tag ON TagRelationship.TagID = Tag.ID
    WHERE Tag.tag IN ('A', 'B')
    GROUP BY Image.key
    HAVING COUNT(*) = 2
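To make the "all tags" query concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than Peewee. The table and column names (image, tag, tag_relationship, image_id, tag_id) and the helper images_with_all_tags are assumptions for illustration, not what Peewee generates. The key point is that the IN filter alone matches images with any of the tags; grouping by image and comparing the count of matched tags to the number of requested tags is what turns it into "all".

```python
import sqlite3


def images_with_all_tags(conn, tags):
    """Return keys of images that carry every tag in `tags` (hypothetical helper)."""
    tags = sorted(set(tags))  # duplicates would break the count comparison
    placeholders = ", ".join("?" for _ in tags)
    sql = f"""
        SELECT image.key
        FROM image
        INNER JOIN tag_relationship ON image.id = tag_relationship.image_id
        INNER JOIN tag ON tag_relationship.tag_id = tag.id
        WHERE tag.tag IN ({placeholders})
        GROUP BY image.key
        HAVING COUNT(DISTINCT tag.tag) = ?
        ORDER BY image.key
    """
    return [row[0] for row in conn.execute(sql, [*tags, len(tags)])]


# Assumed schema and sample data, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE image (id INTEGER PRIMARY KEY, key TEXT);
    CREATE TABLE tag (id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tag_relationship (image_id INTEGER, tag_id INTEGER);
    INSERT INTO image VALUES (1, 'img1'), (2, 'img2'), (3, 'img3');
    INSERT INTO tag VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO tag_relationship VALUES (1, 1), (1, 2), (2, 1), (3, 2), (3, 3);
""")

print(images_with_all_tags(conn, ["A", "B"]))
```

Here only img1 has both A and B; img2 has A alone and img3 has B and C, so they pass the IN filter but fail the count check.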
However, I also want to be able to perform more complex searches. In particular, I would like to specify an "all" list of tags (an image must have every one of them to be returned), an "any" list (it must have at least one of them), and a "none" list (it must have none of them).
EDIT: I would like to clarify this a bit. The query above is an "all tags" query: it returns images that have all the given tags. I want to be able to specify something like: "Give me all the images that have all of the tags (green, mountain), any of the tags (background, landscape), and none of the tags (digital, drawings)."
Ideally, I would like this to be a single SQL query, because paging is then very easy with LIMIT and OFFSET. I currently have an implementation that loads everything into Python sets and combines them with intersection and other set operators. What I am interested in is whether there is a way to do it all in one query.
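For reference, the set-based approach amounts to something like the sketch below. The names filter_images, page, and tags_by_image are hypothetical, and the sample data is made up; the real code loads the tag sets from the database first. It shows why paging is awkward this way: every image has to be loaded and filtered before a page can be sliced out.

```python
def filter_images(tags_by_image, all_of=(), any_of=(), none_of=()):
    """Return sorted image keys matching the all/any/none tag lists."""
    all_of, any_of, none_of = set(all_of), set(any_of), set(none_of)
    matches = []
    for key, tags in tags_by_image.items():
        if not all_of <= tags:
            continue  # missing at least one required tag
        if any_of and not (any_of & tags):
            continue  # has none of the optional tags
        if none_of & tags:
            continue  # has an excluded tag
        matches.append(key)
    return sorted(matches)


def page(keys, limit, offset):
    """Emulate LIMIT/OFFSET on an already filtered list."""
    return keys[offset:offset + limit]


# Made-up sample data: image key -> set of tags.
tags_by_image = {
    "img1": {"green", "mountain", "landscape"},
    "img2": {"green", "mountain", "background", "digital"},
    "img3": {"green", "landscape"},
}

result = filter_images(
    tags_by_image,
    all_of=["green", "mountain"],
    any_of=["background", "landscape"],
    none_of=["digital", "drawings"],
)
print(page(result, limit=10, offset=0))
```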
Also, for those who are interested, I emailed the Peewee author to ask how to express the above query using Peewee, and he responded with the following solution:
Image.select(['key']).group_by('key').join(TagRelationship).join(Tag).where(tag__in=['tag1', 'tag2']).having('count(*) = 2')
Or, alternatively, a shorter version:
Image.filter(tagrelationship_set__relTag__tag__in=['tag1', 'tag2']).group_by(Image).having('count(*) = 2')
Thanks in advance for your time.