Searching across a many-to-many relationship

I am writing an application that lets you store images and tag them. I use Python and the Peewee ORM (http://charlesleifer.com/docs/peewee/), which is very similar to the Django ORM.

My data model looks like this (simplified):

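    from peewee import CharField, ForeignKeyField

    # BaseModel: the app's base Model class, defined elsewhere
    class Image(BaseModel):
        key = CharField()

    class Tag(BaseModel):
        tag = CharField()

    class TagRelationship(BaseModel):
        relImage = ForeignKeyField(Image)
        relTag = ForeignKeyField(Tag)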
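The SQL in this thread refers to the columns Image.ID, TagRelationship.ImageID and TagRelationship.TagID rather than to the Peewee field names. To try those queries, a minimal sketch can mirror the schema in an in-memory SQLite database with Python's built-in sqlite3 module. The helper make_db and its sample data are hypothetical. The column names are an assumption taken from the SQL and may not match what Peewee actually creates for the model above.

```python
import sqlite3

def make_db(images):
    """Build an in-memory copy of the schema.

    images: dict mapping an image key to the list of its tag names
    (hypothetical sample data, not part of the original question).
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Image (ID INTEGER PRIMARY KEY, key TEXT);
        CREATE TABLE Tag (ID INTEGER PRIMARY KEY, tag TEXT UNIQUE);
        CREATE TABLE TagRelationship (ImageID INTEGER REFERENCES Image(ID),
                                      TagID INTEGER REFERENCES Tag(ID));
    """)
    for key, tags in images.items():
        image_id = db.execute("INSERT INTO Image (key) VALUES (?)", (key,)).lastrowid
        for tag in tags:
            # Reuse the existing Tag row if this tag name was already inserted
            db.execute("INSERT OR IGNORE INTO Tag (tag) VALUES (?)", (tag,))
            (tag_id,) = db.execute("SELECT ID FROM Tag WHERE tag = ?", (tag,)).fetchone()
            db.execute("INSERT INTO TagRelationship (ImageID, TagID) VALUES (?, ?)",
                       (image_id, tag_id))
    return db

db = make_db({"img1": ["green", "mountain"], "img2": ["green"]})
print(db.execute("SELECT COUNT(*) FROM TagRelationship").fetchone()[0])  # 3
```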

I understand conceptually how to query for all images that have a given set of tags:

    SELECT Image.key
    FROM Image
    INNER JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    INNER JOIN Tag ON TagRelationship.TagID = Tag.ID
    WHERE Tag.tag IN ('A', 'B')  -- list of multiple tags
    GROUP BY Image.key
    HAVING COUNT(*) = 2          -- where 2 == the number of tags specified above
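Here is a runnable sketch of this "all tags" query against SQLite. It binds the tag list as parameters and derives the HAVING count from the list length instead of hard-coding 2. The helpers make_db and images_with_all_tags and the sample data are hypothetical. COUNT(*) = N assumes each (image, tag) pair is stored at most once. If TagRelationship can hold duplicate rows, COUNT(DISTINCT Tag.tag) is the safer comparison.

```python
import sqlite3

def make_db(images):
    # Hypothetical in-memory schema using the column names from the SQL above
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Image (ID INTEGER PRIMARY KEY, key TEXT);
        CREATE TABLE Tag (ID INTEGER PRIMARY KEY, tag TEXT UNIQUE);
        CREATE TABLE TagRelationship (ImageID INTEGER, TagID INTEGER);
    """)
    for key, tags in images.items():
        image_id = db.execute("INSERT INTO Image (key) VALUES (?)", (key,)).lastrowid
        for tag in tags:
            db.execute("INSERT OR IGNORE INTO Tag (tag) VALUES (?)", (tag,))
            (tag_id,) = db.execute("SELECT ID FROM Tag WHERE tag = ?", (tag,)).fetchone()
            db.execute("INSERT INTO TagRelationship VALUES (?, ?)", (image_id, tag_id))
    return db

def images_with_all_tags(db, tags):
    tags = sorted(set(tags))  # de-duplicate so len(tags) is the number of distinct tags
    placeholders = ", ".join("?" for _ in tags)
    sql = f"""
        SELECT Image.key
        FROM Image
        INNER JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
        INNER JOIN Tag ON TagRelationship.TagID = Tag.ID
        WHERE Tag.tag IN ({placeholders})
        GROUP BY Image.key
        HAVING COUNT(*) = ?
    """
    return sorted(row[0] for row in db.execute(sql, [*tags, len(tags)]))

db = make_db({"img1": ["A", "B"], "img2": ["A"], "img3": ["A", "B", "C"], "img4": ["B", "C"]})
print(images_with_all_tags(db, ["A", "B"]))  # ['img1', 'img3']
```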

However, I also want to be able to perform more complex searches. In particular, I would like to specify three lists: "all" tags (the image must have every one of them to be returned), "any" tags (it must have at least one of them) and "none" tags (it must have none of them).

EDIT: Let me clarify. The query above is an "all tags" query: it returns images that have all of the given tags. I want to be able to ask for something like: "Give me all the images that have the tags (green, mountain), any of the tags (background, landscape), but none of the tags (digital, drawings)."
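The three lists can be stated precisely as a predicate over the tag set of a single image. This is handy as a reference when checking any SQL version. What follows is a minimal Python sketch. The name matches is hypothetical, and treating an empty "any" list as no constraint is an assumption.

```python
def matches(image_tags, all_tags=(), any_tags=(), none_tags=()):
    """Return True if one image's tags satisfy the all/any/none lists.

    An empty any_tags list is treated as "no constraint" (an assumption).
    """
    tags = set(image_tags)
    has_all = set(all_tags) <= tags                       # every required tag is present
    has_any = not any_tags or bool(tags & set(any_tags))  # at least one optional tag
    has_none = not (tags & set(none_tags))                # no prohibited tag
    return has_all and has_any and has_none

query = dict(all_tags=["green", "mountain"],
             any_tags=["background", "landscape"],
             none_tags=["digital", "drawings"])
print(matches({"green", "mountain", "landscape"}, **query))             # True
print(matches({"green", "mountain", "landscape", "digital"}, **query))  # False
print(matches({"green", "mountain"}, **query))                          # False
```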

Ideally, I would like this to be a single SQL query, because paging is then easy with LIMIT and OFFSET. I already have an implementation that loads everything into Python sets and combines them with intersection and the other set operators. What I would like to know is whether there is a way to do it all at once.
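A set-based version might look like the sketch below. It assumes a hypothetical in-memory index that maps each tag to the set of image keys carrying it, and only images with at least one tag are considered. The sketch also shows the cost the question mentions: every matching key is materialized and sorted in Python before a page is sliced out. With LIMIT and OFFSET, the database would do that work instead.

```python
def search(index, all_tags=(), any_tags=(), none_tags=(), limit=None, offset=0):
    """index: dict mapping tag name -> set of image keys (hypothetical layout)."""
    result = set().union(*index.values())  # start from every tagged image
    for tag in all_tags:
        result &= index.get(tag, set())    # intersection: must have every "all" tag
    if any_tags:
        # Union of the "any" tags, then intersect: must have at least one of them
        result &= set().union(*(index.get(tag, set()) for tag in any_tags))
    for tag in none_tags:
        result -= index.get(tag, set())    # difference: must have no "none" tag
    ordered = sorted(result)               # paging needs a stable order
    end = None if limit is None else offset + limit
    return ordered[offset:end]

index = {
    "green": {"i1", "i2", "i3"},
    "mountain": {"i1", "i2", "i3"},
    "background": {"i1", "i4"},
    "landscape": {"i2", "i3"},
    "digital": {"i3"},
}
print(search(index, ["green", "mountain"], ["background", "landscape"],
             ["digital", "drawings"]))  # ['i1', 'i2']
```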

Also, for anyone interested: I emailed the Peewee author asking how to write the above query with Peewee, and he replied with the following solution:

 Image.select(['key']).group_by('key').join(TagRelationship).join(Tag).where(tag__in=['tag1', 'tag2']).having('count(*) = 2') 

Or, alternatively, a shorter version:

 Image.filter(tagrelationship_set__relTag__tag__in=['tag1', 'tag2']).group_by(Image).having('count(*) = 2') 

Thanks in advance for your time.

3 answers
    SELECT Image.key
    FROM Image
    JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    JOIN Tag ON TagRelationship.TagID = Tag.ID
    GROUP BY Image.key
    HAVING SUM(Tag.tag IN ( mandatory tags )) = N  /* the number of mandatory tags */
       AND SUM(Tag.tag IN ( optional tags )) > 0
       AND SUM(Tag.tag IN ( prohibited tags )) = 0
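Here is a runnable check of this query in SQLite. Like MySQL, SQLite evaluates a predicate such as Tag.tag IN (...) to 1 or 0, so SUM counts the matching rows. PostgreSQL, by contrast, has no SUM over booleans. The placeholders are filled with the tags from the question and bound as parameters. The names make_db and tag_search_sum and the sample images are hypothetical, and all three lists are assumed non-empty.

```python
import sqlite3

def make_db(images):
    # Hypothetical in-memory schema with the column names used by the query
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Image (ID INTEGER PRIMARY KEY, key TEXT);
        CREATE TABLE Tag (ID INTEGER PRIMARY KEY, tag TEXT UNIQUE);
        CREATE TABLE TagRelationship (ImageID INTEGER, TagID INTEGER);
    """)
    for key, tags in images.items():
        image_id = db.execute("INSERT INTO Image (key) VALUES (?)", (key,)).lastrowid
        for tag in tags:
            db.execute("INSERT OR IGNORE INTO Tag (tag) VALUES (?)", (tag,))
            (tag_id,) = db.execute("SELECT ID FROM Tag WHERE tag = ?", (tag,)).fetchone()
            db.execute("INSERT INTO TagRelationship VALUES (?, ?)", (image_id, tag_id))
    return db

def tag_search_sum(db, mandatory, optional, prohibited):
    def marks(values):
        return ", ".join("?" for _ in values)  # one "?" per tag, bound below
    sql = f"""
        SELECT Image.key
        FROM Image
        JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
        JOIN Tag ON TagRelationship.TagID = Tag.ID
        GROUP BY Image.key
        HAVING SUM(Tag.tag IN ({marks(mandatory)})) = ?
           AND SUM(Tag.tag IN ({marks(optional)})) > 0
           AND SUM(Tag.tag IN ({marks(prohibited)})) = 0
    """
    # Parameter order follows the order of the "?" marks in the SQL text
    params = [*mandatory, len(mandatory), *optional, *prohibited]
    return sorted(row[0] for row in db.execute(sql, params))

db = make_db({
    "i1": ["green", "mountain", "background"],
    "i2": ["green", "mountain", "landscape", "digital"],
    "i3": ["green", "background"],
    "i4": ["green", "mountain"],
    "i5": ["green", "mountain", "background", "landscape"],
})
print(tag_search_sum(db, ["green", "mountain"], ["background", "landscape"],
                     ["digital", "drawings"]))  # ['i1', 'i5']
```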

UPDATE

A more portable version of the query above (it converts the boolean results of the IN predicates to integers with CASE expressions):

    SELECT Image.key
    FROM Image
    JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    JOIN Tag ON TagRelationship.TagID = Tag.ID
    GROUP BY Image.key
    HAVING SUM(CASE WHEN Tag.tag IN ( mandatory tags ) THEN 1 ELSE 0 END) = N  /* the number of mandatory tags */
       AND SUM(CASE WHEN Tag.tag IN ( optional tags ) THEN 1 ELSE 0 END) > 0
       AND SUM(CASE WHEN Tag.tag IN ( prohibited tags ) THEN 1 ELSE 0 END) = 0
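Paging was the reason for wanting a single query. The sketch below is a hypothetical builder for the CASE form. It binds the tag lists as parameters, drops a HAVING condition whose list is empty, and appends ORDER BY with LIMIT/OFFSET. The names build_tag_query and search_page, the ordering by Image.key and the sample data are assumptions. The tag lists are assumed to contain no duplicates.

```python
import sqlite3

def make_db(images):
    # Hypothetical in-memory schema with the column names used by the query
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Image (ID INTEGER PRIMARY KEY, key TEXT);
        CREATE TABLE Tag (ID INTEGER PRIMARY KEY, tag TEXT UNIQUE);
        CREATE TABLE TagRelationship (ImageID INTEGER, TagID INTEGER);
    """)
    for key, tags in images.items():
        image_id = db.execute("INSERT INTO Image (key) VALUES (?)", (key,)).lastrowid
        for tag in tags:
            db.execute("INSERT OR IGNORE INTO Tag (tag) VALUES (?)", (tag,))
            (tag_id,) = db.execute("SELECT ID FROM Tag WHERE tag = ?", (tag,)).fetchone()
            db.execute("INSERT INTO TagRelationship VALUES (?, ?)", (image_id, tag_id))
    return db

def build_tag_query(all_tags=(), any_tags=(), none_tags=(), limit=20, offset=0):
    def flag(values):
        # CASE turns the IN predicate into 1/0, so SUM does not rely on boolean arithmetic
        return f"SUM(CASE WHEN Tag.tag IN ({', '.join('?' for _ in values)}) THEN 1 ELSE 0 END)"
    conditions, params = [], []
    if all_tags:   # every "all" tag must match
        conditions.append(f"{flag(all_tags)} = ?")
        params += [*all_tags, len(all_tags)]
    if any_tags:   # at least one "any" tag
        conditions.append(f"{flag(any_tags)} > 0")
        params += list(any_tags)
    if none_tags:  # no "none" tag
        conditions.append(f"{flag(none_tags)} = 0")
        params += list(none_tags)
    having = "HAVING " + " AND ".join(conditions) if conditions else ""
    sql = f"""
        SELECT Image.key
        FROM Image
        JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
        JOIN Tag ON TagRelationship.TagID = Tag.ID
        GROUP BY Image.key
        {having}
        ORDER BY Image.key  -- a stable order is needed for LIMIT/OFFSET paging
        LIMIT ? OFFSET ?
    """
    return sql, params + [limit, offset]

def search_page(db, **query):
    sql, params = build_tag_query(**query)
    return [row[0] for row in db.execute(sql, params)]

db = make_db({
    "i1": ["green", "mountain", "background"],
    "i2": ["green", "mountain", "landscape", "digital"],
    "i3": ["green", "background"],
    "i4": ["green", "mountain"],
    "i5": ["green", "mountain", "background", "landscape"],
})
print(search_page(db, all_tags=["green", "mountain"], any_tags=["background", "landscape"],
                  none_tags=["digital", "drawings"], limit=1, offset=1))  # ['i5']
```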

or using COUNT instead of SUM:

    SELECT Image.key
    FROM Image
    JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    JOIN Tag ON TagRelationship.TagID = Tag.ID
    GROUP BY Image.key
    HAVING COUNT(CASE WHEN Tag.tag IN ( mandatory tags ) THEN 1 END) = N  /* the number of mandatory tags */
       AND COUNT(CASE WHEN Tag.tag IN ( optional tags ) THEN 1 END) > 0
       AND COUNT(CASE WHEN Tag.tag IN ( prohibited tags ) THEN 1 END) = 0
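The COUNT form works because a CASE without an ELSE branch yields NULL for non-matching rows, and COUNT(expression) skips NULLs. Here is a tiny SQLite check of that behaviour, using a made-up inline table of three tags:

```python
import sqlite3

db = sqlite3.connect(":memory:")
row = db.execute("""
    WITH t(tag) AS (VALUES ('green'), ('mountain'), ('digital'))
    SELECT COUNT(CASE WHEN tag IN ('green', 'mountain') THEN 1 END),        -- NULLs skipped
           COUNT(*),                                                        -- every row
           SUM(CASE WHEN tag IN ('green', 'mountain') THEN 1 ELSE 0 END)    -- SUM form
    FROM t
""").fetchone()
print(row)  # (2, 3, 2)
```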

The top half gets the images that have all of the required tags and none of the prohibited ones. The bottom half gets the images that have at least one of the optional tags. There is no GROUP BY in the bottom query because I want to know whether an image appears twice; if it does, it has both background and landscape. Ordering by COUNT(*) puts the images with BOTH the background and landscape tags at the top. So an image tagged green, mountain, background and landscape is the most relevant, followed by images tagged green, mountain, and either background or landscape.

    SELECT candidates.key, COUNT(*) AS relevance
    FROM (
        -- Top half: good image candidates (all required tags, no prohibited tags)
        SELECT Image.key
        FROM Image
        INNER JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
        INNER JOIN Tag ON TagRelationship.TagID = Tag.ID
        WHERE Tag.tag IN ('green', 'mountain')
          AND Image.key NOT IN (
              -- Bad images; DISTINCT removes duplicates and reduces the size of the set
              SELECT DISTINCT Image.key
              FROM Image
              INNER JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
              INNER JOIN Tag ON TagRelationship.TagID = Tag.ID
              WHERE Tag.tag IN ('digital', 'drawing'))
        GROUP BY Image.key
        HAVING COUNT(*) = 2  -- we need green AND mountain (2 = number of required tags)

        UNION ALL

        -- Bottom half: one row per optional tag (background, landscape) the image has;
        -- these rows are not filtered by the required or prohibited tags
        SELECT Image.key
        FROM Image
        INNER JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
        INNER JOIN Tag ON TagRelationship.TagID = Tag.ID
        WHERE Tag.tag IN ('background', 'landscape')
    ) AS candidates
    GROUP BY candidates.key
    ORDER BY relevance DESC
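Here is a runnable version of this approach in SQLite, with these changes:

* an alias is added for the derived table;
* the NOT IN filter is placed in the top half's WHERE clause;
* COUNT('green', 'mountain') is replaced by the literal 2;
* the image key is added as an ORDER BY tie-breaker.

The helper make_db and the sample images are hypothetical. The output shows that the bottom half is not filtered. An image tagged only background and landscape ties with one that has all the required tags, and an image tagged digital still appears through its landscape tag.

```python
import sqlite3

def make_db(images):
    # Hypothetical in-memory schema with the column names used by the query
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Image (ID INTEGER PRIMARY KEY, key TEXT);
        CREATE TABLE Tag (ID INTEGER PRIMARY KEY, tag TEXT UNIQUE);
        CREATE TABLE TagRelationship (ImageID INTEGER, TagID INTEGER);
    """)
    for key, tags in images.items():
        image_id = db.execute("INSERT INTO Image (key) VALUES (?)", (key,)).lastrowid
        for tag in tags:
            db.execute("INSERT OR IGNORE INTO Tag (tag) VALUES (?)", (tag,))
            (tag_id,) = db.execute("SELECT ID FROM Tag WHERE tag = ?", (tag,)).fetchone()
            db.execute("INSERT INTO TagRelationship VALUES (?, ?)", (image_id, tag_id))
    return db

RELEVANCE_SQL = """
    SELECT candidates.key, COUNT(*) AS relevance
    FROM (
        -- Top half: images with green AND mountain and without digital/drawing
        SELECT Image.key
        FROM Image
        JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
        JOIN Tag ON TagRelationship.TagID = Tag.ID
        WHERE Tag.tag IN ('green', 'mountain')
          AND Image.key NOT IN (
              SELECT DISTINCT Image.key
              FROM Image
              JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
              JOIN Tag ON TagRelationship.TagID = Tag.ID
              WHERE Tag.tag IN ('digital', 'drawing'))
        GROUP BY Image.key
        HAVING COUNT(*) = 2
        UNION ALL
        -- Bottom half: one row per optional tag, with no other filtering
        SELECT Image.key
        FROM Image
        JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
        JOIN Tag ON TagRelationship.TagID = Tag.ID
        WHERE Tag.tag IN ('background', 'landscape')
    ) AS candidates
    GROUP BY candidates.key
    ORDER BY relevance DESC, candidates.key  -- key added as a tie-breaker
"""

db = make_db({
    "i1": ["green", "mountain", "background", "landscape"],
    "i2": ["green", "mountain", "background"],
    "i3": ["green", "mountain"],
    "i4": ["background", "landscape"],
    "i5": ["green", "mountain", "landscape", "digital"],
})
ranked = db.execute(RELEVANCE_SQL).fetchall()
print(ranked)
# [('i1', 3), ('i2', 2), ('i4', 2), ('i3', 1), ('i5', 1)]
```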

The following query should return all images tagged with both 'A' and 'B' and with 'C' or 'D', but not with 'E' or 'F':

    SELECT Image.key
    FROM Image
    INNER JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    INNER JOIN Tag tag1 ON TagRelationship.TagID = tag1.ID
    INNER JOIN Tag tag2 ON TagRelationship.TagID = tag2.ID
    WHERE tag1.tag IN ('A', 'B')
      AND tag2.tag NOT IN ('E', 'F')
    GROUP BY Image.key
    HAVING COUNT(*) = 2
    UNION
    SELECT Image.key
    FROM Image
    INNER JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    INNER JOIN Tag tag1 ON TagRelationship.TagID = tag1.ID
    INNER JOIN Tag tag2 ON TagRelationship.TagID = tag2.ID
    WHERE tag1.tag IN ('C', 'D')
      AND tag2.tag NOT IN ('E', 'F')
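As written, tag1 and tag2 are both joined on the same TagRelationship.TagID, so they always refer to the same Tag row. As a result, tag2.tag NOT IN ('E', 'F') never looks at the image's other tags. In addition, UNION combines the two halves with OR rather than AND. The sketch below runs the query unchanged on hypothetical data (make_db is a hypothetical helper). It runs the COUNT/CASE form of the first answer alongside it for comparison.

```python
import sqlite3

def make_db(images):
    # Hypothetical in-memory schema with the column names used by the query
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE Image (ID INTEGER PRIMARY KEY, key TEXT);
        CREATE TABLE Tag (ID INTEGER PRIMARY KEY, tag TEXT UNIQUE);
        CREATE TABLE TagRelationship (ImageID INTEGER, TagID INTEGER);
    """)
    for key, tags in images.items():
        image_id = db.execute("INSERT INTO Image (key) VALUES (?)", (key,)).lastrowid
        for tag in tags:
            db.execute("INSERT OR IGNORE INTO Tag (tag) VALUES (?)", (tag,))
            (tag_id,) = db.execute("SELECT ID FROM Tag WHERE tag = ?", (tag,)).fetchone()
            db.execute("INSERT INTO TagRelationship VALUES (?, ?)", (image_id, tag_id))
    return db

AS_WRITTEN = """
    SELECT Image.key FROM Image
    INNER JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    INNER JOIN Tag tag1 ON TagRelationship.TagID = tag1.ID
    INNER JOIN Tag tag2 ON TagRelationship.TagID = tag2.ID  -- same row as tag1
    WHERE tag1.tag IN ('A', 'B') AND tag2.tag NOT IN ('E', 'F')
    GROUP BY Image.key
    HAVING COUNT(*) = 2
    UNION
    SELECT Image.key FROM Image
    INNER JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    INNER JOIN Tag tag1 ON TagRelationship.TagID = tag1.ID
    INNER JOIN Tag tag2 ON TagRelationship.TagID = tag2.ID
    WHERE tag1.tag IN ('C', 'D') AND tag2.tag NOT IN ('E', 'F')
"""

# The COUNT/CASE form from the first answer, for comparison
HAVING_FORM = """
    SELECT Image.key FROM Image
    JOIN TagRelationship ON Image.ID = TagRelationship.ImageID
    JOIN Tag ON TagRelationship.TagID = Tag.ID
    GROUP BY Image.key
    HAVING COUNT(CASE WHEN Tag.tag IN ('A', 'B') THEN 1 END) = 2
       AND COUNT(CASE WHEN Tag.tag IN ('C', 'D') THEN 1 END) > 0
       AND COUNT(CASE WHEN Tag.tag IN ('E', 'F') THEN 1 END) = 0
"""

db = make_db({"x1": ["A", "B", "E"], "x2": ["C"], "x3": ["A", "B", "C"], "x4": ["A"]})
as_written = sorted(row[0] for row in db.execute(AS_WRITTEN))
having_version = sorted(row[0] for row in db.execute(HAVING_FORM))
print(as_written)      # ['x1', 'x2', 'x3']
print(having_version)  # ['x3']
```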

Source: https://habr.com/ru/post/906077/

