Django ORM: QuerySets Consolidation

I am trying to use Django ORM for a task that requires a JOIN in SQL. I already have a workaround that performs the same task with several queries and some processing outside the database, but I am not satisfied with the complexity of the execution.

First, I would like to give you a brief introduction to the relevant part of my model. After that, I will explain the task in English, SQL and (inefficient) Django ORM.

Model

In my CMS model, messages are multilingual: there can be one instance of message content for each message and each language. In addition, when editing messages, I do not UPDATE , but INSERT their new versions.

So, PostContent is unique on post , language and version . Here's the class:

 class PostContent(models.Model): """ contains all versions of a post, in all languages. """ language = models.ForeignKey(Language) post = models.ForeignKey(Post) # the Post object itself only version = models.IntegerField(default=0) # contains slug and id. # further metadata and content left out class Meta: unique_together = (("resource", "language", "version"),) 

Task in SQL

And this is the task: I would like to get a list of the latest versions of all messages in each language using ORM. In SQL, this means JOIN in the subquery that makes GROUP BY and MAX to get the maximum version for each unique resource and language pair. The ideal answer to this question is the number of ORM calls that invoke the following SQL statement:

 SELECT id, post_id, version, v FROM cms_postcontent, (SELECT post_id as p, max(version) as v, language_id as l FROM cms_postcontent GROUP BY post_id, language_id ) as maxv WHERE post_id=p AND version=v AND language_id=l; 

Solution in Django

My current solution using Django ORM does not create such a JOIN, but two separate SQL queries, and one of these queries can become very large. First I execute a subquery (inner SELECT on top):

 maxv = PostContent.objects.values('post','language').annotate( max_version=Max('version')) 

Now, instead of joining maxv , I explicitly request every message in maxv , filtering PostContent.objects.all() for each tuple post, language, max_version . SQL result looks like

 SELECT * FROM PostContent WHERE post=P1 and language=L1 and version=V1 OR post=P2 and language=L2 and version=V2 OR ...; 

In Django:

 from django.db.models import Q conjunc = map(lambda pc: Q(version=pc['max_version']).__and__( Q(post=pc['post']).__and__( Q(language=pc['language']))), maxv) result = PostContent.objects.filter( reduce(lambda disjunc, x: disjunc.__or__(x), conjunc[1:], conjunc[0])) 

If maxv is small enough, for example. when receiving one message this may be a good solution, but the size of the request and the time of its creation grow linearly with the number of messages. The complexity of query analysis is also at least linear.

Is there a better way to do this other than using raw SQL?

+4
source share
1 answer

You can join (in the sense of union) queries with the | if the queries request the same model.

However, it looks like you want something like PostContent.objects.order_by('version').distinct('language') ; as you cannot do this in 1.3.1, consider using values in combination with distinct() to get the desired effect.

0
source

Source: https://habr.com/ru/post/1400858/


All Articles