List of parent objects and their children with fewer queries

I have a Django view that I am trying to optimize. It displays a list of parent objects on the page, along with their children. The child model has a foreign key to the parent, so select_related does not seem to apply.

    from django.db import models

    class Parent(models.Model):
        name = models.CharField(max_length=31)

    class Child(models.Model):
        name = models.CharField(max_length=31)
        parent = models.ForeignKey(Parent, on_delete=models.CASCADE)

A naive implementation uses n + 1 queries, where n is the number of parent objects, i.e. one query to retrieve the parent list, then one query to retrieve the children of each parent.
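To make the n + 1 count concrete, here is a minimal sketch that uses Python's built-in sqlite3 module instead of the Django ORM. The `parent` and `child` tables, their columns and the sample rows are hypothetical stand-ins for the models above, and `set_trace_callback` is used only to count the SQL statements that actually run.

```python
import sqlite3

# Hypothetical stand-ins for the Parent and Child tables (not Django's
# real table names); 3 parents, one of them without children.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        name TEXT,
        parent_id INTEGER REFERENCES parent(id)
    );
    INSERT INTO parent (id, name) VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
    INSERT INTO child (name, parent_id) VALUES ('c1', 1), ('c2', 1), ('c3', 2);
""")

# Record every SQL statement executed from here on.
executed = []
conn.set_trace_callback(executed.append)

# Naive approach: one query for the parent list...
parents = conn.execute("SELECT id, name FROM parent ORDER BY id").fetchall()

result = {}
for parent_id, name in parents:
    # ...plus one extra query per parent for its children.
    rows = conn.execute(
        "SELECT name FROM child WHERE parent_id = ? ORDER BY id", (parent_id,)
    ).fetchall()
    result[name] = [child_name for (child_name,) in rows]

print(result)         # {'p1': ['c1', 'c2'], 'p2': ['c3'], 'p3': []}
print(len(executed))  # 4, i.e. n + 1 with n = 3 parents
```

The query count grows linearly with the number of parents, which is exactly what makes this pattern slow on longer lists.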

I wrote a view that does the job in two queries: one to retrieve the parent objects, the other to get the related children, then some Python (which I'm too embarrassed to post here) to put it all together again.

As soon as I found myself importing the standard library's collections module, I realized I was probably doing it wrong. There is probably a much simpler way, but I don't have enough Django experience to find it. Any pointers would be greatly appreciated!
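The asker's code was not posted, so the following is only a guess at what such a two-query approach could look like, again using sqlite3 with the same hypothetical tables instead of Django. The second query fetches every child at once with `IN (...)`, and a `collections.defaultdict` groups the rows by parent id. This Python-side grouping is essentially the kind of "join in Python" that `prefetch_related` performs for you.

```python
import sqlite3
from collections import defaultdict

# Same hypothetical tables and data, standing in for the Django models.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        name TEXT,
        parent_id INTEGER REFERENCES parent(id)
    );
    INSERT INTO parent (id, name) VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
    INSERT INTO child (name, parent_id) VALUES ('c1', 1), ('c2', 1), ('c3', 2);
""")

executed = []
conn.set_trace_callback(executed.append)

# Query 1: the parents.
parents = conn.execute("SELECT id, name FROM parent ORDER BY id").fetchall()
parent_ids = [parent_id for parent_id, _ in parents]

# Query 2: the children of all those parents at once.
placeholders = ", ".join("?" for _ in parent_ids)
children = conn.execute(
    f"SELECT parent_id, name FROM child WHERE parent_id IN ({placeholders}) ORDER BY id",
    parent_ids,
).fetchall()

# Put it back together in Python: bucket the children by parent id.
children_by_parent = defaultdict(list)
for parent_id, child_name in children:
    children_by_parent[parent_id].append(child_name)

# Parents without children simply get an empty list.
result = {name: children_by_parent[parent_id] for parent_id, name in parents}

print(result)         # {'p1': ['c1', 'c2'], 'p2': ['c3'], 'p3': []}
print(len(executed))  # 2 queries, regardless of how many parents there are
```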

Add a related_name to the foreign key, then use prefetch_related, which was added in Django 1.4:

Returns a QuerySet that will automatically retrieve, in a single batch, related objects for each of the specified lookups.

This has a similar purpose to select_related, in that both are designed to stop the deluge of database queries caused by accessing related objects, but the strategy is quite different:

  • select_related works by creating an SQL join and including the fields of the related object in the SELECT statement. For this reason, select_related gets the related objects in the same database query. However, to avoid the much larger result set that would result from joining across a "many" relationship, select_related is limited to single-valued relationships: foreign key and one-to-one.

  • prefetch_related, on the other hand, does a separate lookup for each relationship and does the "joining" in Python. This allows it to prefetch many-to-many and many-to-one objects, which cannot be done using select_related, in addition to the foreign key and one-to-one relationships that are supported by select_related. It also supports prefetching of GenericRelation and GenericForeignKey.
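To see why select_related stops at single-valued relationships, the sketch below (plain sqlite3 with hypothetical tables and data, not the SQL Django actually generates) joins parents to their children in one query. Each parent's columns come back once per child, so the result grows with the number of children rather than the number of parents; following several "many" relationships in one join would multiply the rows further.

```python
import sqlite3

# Hypothetical tables and sample data standing in for Parent/Child.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        name TEXT,
        parent_id INTEGER REFERENCES parent(id)
    );
    INSERT INTO parent (id, name) VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
    INSERT INTO child (name, parent_id) VALUES ('c1', 1), ('c2', 1), ('c3', 2);
""")

# Joining across the "many" side: one row per (parent, child) pair.
# LEFT JOIN keeps parents without children (their child column is NULL).
rows = conn.execute("""
    SELECT parent.id, parent.name, child.name
    FROM parent LEFT JOIN child ON child.parent_id = parent.id
    ORDER BY parent.id, child.id
""").fetchall()

distinct_parents = {parent_id for parent_id, _, _ in rows}
print(len(distinct_parents), "parents ->", len(rows), "rows")
for row in rows:
    # Parent 'p1' appears twice because it has two children.
    print(row)
```

The duplicated parent columns are the "much larger result set" the documentation warns about; prefetch_related sidesteps it by fetching the children in a separate query.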

    from django.db import models

    class Parent(models.Model):
        name = models.CharField(max_length=31)

    class Child(models.Model):
        name = models.CharField(max_length=31)
        parent = models.ForeignKey(Parent, on_delete=models.CASCADE, related_name='children')

    >>> Parent.objects.all().prefetch_related('children')

All the relevant children will be fetched in a single query and used to build QuerySets that have a pre-filled cache of the relevant results. These QuerySets are then used in the self.children.all() calls.

Note 1: as always with QuerySets, any subsequent chained methods that imply a different database query will ignore the previously cached results and retrieve the data with a fresh database query.

Note 2: if you use iterator() to run the query, prefetch_related() calls will be ignored, since these two optimizations do not make sense together.

If you need to work with more than two levels at once, you could consider a different approach to storing trees in the database: MPTT (Modified Preorder Tree Traversal).

In short, it adds extra fields to your model that are kept up to date whenever the tree changes, and in return makes retrieval much more efficient.
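As a rough standalone illustration of the idea behind MPTT (not the django-mptt package's actual code), each node gets a left and a right number from a depth-first walk, and all descendants of a node, at any depth, then come back from a single range query. The tree, the `node` table and the `lft`/`rght` column names are assumptions made for this example.

```python
import sqlite3

# Hypothetical tree: root -> a -> (a1, a2), and root -> b
tree = {"root": ["a", "b"], "a": ["a1", "a2"], "b": [], "a1": [], "a2": []}


def number_nodes(tree, root):
    """Assign (lft, rght) numbers with a depth-first (preorder) walk."""
    numbers = {}
    counter = 1

    def visit(node):
        nonlocal counter
        lft = counter
        counter += 1
        for child in tree[node]:
            visit(child)
        numbers[node] = (lft, counter)
        counter += 1

    visit(root)
    return numbers


numbers = number_nodes(tree, "root")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (name TEXT PRIMARY KEY, lft INTEGER, rght INTEGER)")
conn.executemany(
    "INSERT INTO node (name, lft, rght) VALUES (?, ?, ?)",
    [(name, lft, rght) for name, (lft, rght) in numbers.items()],
)


def descendants_of(name):
    # A node is a descendant iff its interval lies inside the ancestor's,
    # so the whole subtree comes back from one query.
    rows = conn.execute(
        """
        SELECT n.name FROM node AS n, node AS anc
        WHERE anc.name = ? AND n.lft > anc.lft AND n.rght < anc.rght
        ORDER BY n.lft
        """,
        (name,),
    )
    return [row[0] for row in rows]


print(descendants_of("a"))     # ['a1', 'a2']
print(descendants_of("root"))  # ['a', 'a1', 'a2', 'b']
```

The cost moves to writes: inserting or moving a node means renumbering the `lft`/`rght` values of other nodes, which is the update-time bookkeeping the answer refers to.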

Actually, select_related is what you are looking for. select_related creates a JOIN so that all the data you need is retrieved in a single statement. prefetch_related runs the extra queries up front and then caches the results.

The trick here is to "join" only what you absolutely need, to reduce the performance cost of the join. "What you absolutely need" is a long way of saying that you should preselect only the related objects that you will actually read later in your view or template. There is some good documentation here: https://docs.djangoproject.com/en/1.4/ref/models/querysets/#select-related
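The query-count difference this answer describes can be reproduced outside Django. The sketch below uses sqlite3 with hypothetical `parent`/`child` tables: looking up each child's parent separately costs one query per child, while a single JOIN, in the spirit of `Child.objects.select_related('parent')`, returns the same data in one statement. Note that this works from the child side, where each row has exactly one parent.

```python
import sqlite3

# Hypothetical tables and sample data standing in for Parent/Child.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE child (
        id INTEGER PRIMARY KEY,
        name TEXT,
        parent_id INTEGER REFERENCES parent(id)
    );
    INSERT INTO parent (id, name) VALUES (1, 'p1'), (2, 'p2'), (3, 'p3');
    INSERT INTO child (name, parent_id) VALUES ('c1', 1), ('c2', 1), ('c3', 2);
""")


def count_queries(fn):
    """Run fn() and return (its result, number of SQL statements executed)."""
    executed = []
    conn.set_trace_callback(executed.append)
    try:
        result = fn()
    finally:
        conn.set_trace_callback(None)
    return result, len(executed)


def without_join():
    # One query for the children, then one lookup per child for its parent.
    rows = conn.execute("SELECT name, parent_id FROM child ORDER BY id").fetchall()
    return [
        (name, conn.execute("SELECT name FROM parent WHERE id = ?", (pid,)).fetchone()[0])
        for name, pid in rows
    ]


def with_join():
    # A single JOIN brings each child's parent columns along in the same row.
    return conn.execute("""
        SELECT child.name, parent.name
        FROM child JOIN parent ON parent.id = child.parent_id
        ORDER BY child.id
    """).fetchall()


naive, naive_count = count_queries(without_join)
joined, join_count = count_queries(with_join)
print(naive == joined, naive_count, join_count)  # True 4 1
```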

This is a snippet from one of my models where I ran into a similar problem:

    return QuantitativeResult.objects.select_related(
        'enrollment__subscription__configuration__analyte',
        'enrollment__subscription__unit',
        'enrollment__subscription__configuration__analyte__unit',
        'enrollment__subscription__lab',
        'enrollment__subscription__instrument_model',  # comma was missing: Python would concatenate the two strings
        'enrollment__subscription__instrument',
        'enrollment__subscription__configuration__method',
        'enrollment__subscription__configuration__reagent',
        'enrollment__subscription__configuration__reagent__manufacturer',
        'enrollment__subscription__instrument_model__instrument__manufacturer'
    ).filter(<snip, snip - stuff edited out>)

In this pathological case, I went from 700+ queries down to one. The Django Debug Toolbar is your friend when it comes to this kind of problem.

Source: https://habr.com/ru/post/1438822/

