I am trying to use Django ORM for a task that requires a JOIN in SQL. I already have a workaround that performs the same task with several queries and some processing outside the database, but I am not satisfied with the complexity of the execution.
First, I would like to give you a brief introduction to the relevant part of my model. After that, I will explain the task in English, SQL and (inefficient) Django ORM.
Model
In my CMS model, messages are multilingual: there can be one instance of message content for each message and each language. In addition, when editing messages, I do not UPDATE , but INSERT their new versions.
So, PostContent is unique on post , language and version . Here's the class:
class PostContent(models.Model): """ contains all versions of a post, in all languages. """ language = models.ForeignKey(Language) post = models.ForeignKey(Post)
Task in SQL
And this is the task: I would like to get a list of the latest versions of all messages in each language using ORM. In SQL, this means JOIN in the subquery that makes GROUP BY and MAX to get the maximum version for each unique resource and language pair. The ideal answer to this question is the number of ORM calls that invoke the following SQL statement:
SELECT id, post_id, version, v FROM cms_postcontent, (SELECT post_id as p, max(version) as v, language_id as l FROM cms_postcontent GROUP BY post_id, language_id ) as maxv WHERE post_id=p AND version=v AND language_id=l;
Solution in Django
My current solution using Django ORM does not create such a JOIN, but two separate SQL queries, and one of these queries can become very large. First I execute a subquery (inner SELECT on top):
maxv = PostContent.objects.values('post','language').annotate( max_version=Max('version'))
Now, instead of joining maxv , I explicitly request every message in maxv , filtering PostContent.objects.all() for each tuple post, language, max_version . SQL result looks like
SELECT * FROM PostContent WHERE post=P1 and language=L1 and version=V1 OR post=P2 and language=L2 and version=V2 OR ...;
In Django:
from django.db.models import Q conjunc = map(lambda pc: Q(version=pc['max_version']).__and__( Q(post=pc['post']).__and__( Q(language=pc['language']))), maxv) result = PostContent.objects.filter( reduce(lambda disjunc, x: disjunc.__or__(x), conjunc[1:], conjunc[0]))
If maxv is small enough, for example. when receiving one message this may be a good solution, but the size of the request and the time of its creation grow linearly with the number of messages. The complexity of query analysis is also at least linear.
Is there a better way to do this other than using raw SQL?