How to improve query performance in the Django admin when searching on related fields (MySQL)

In Django, I have this:

models.py

    class Book(models.Model):
        isbn = models.CharField(max_length=16, db_index=True)
        title = models.CharField(max_length=255, db_index=True)
        # ... other fields ...

    class Author(models.Model):
        first_name = models.CharField(max_length=128, db_index=True)
        last_name = models.CharField(max_length=128, db_index=True)
        books = models.ManyToManyField(Book, blank=True)
        # ... other fields ...

admin.py

    class AuthorAdmin(admin.ModelAdmin):
        search_fields = ('first_name', 'last_name', 'books__isbn', 'books__title')
        # ...

My problem is that when I search the admin list page with two or more short terms, MySQL starts to take a lot of time (at least 8 seconds for a query with three terms). I have about 5,000 authors and 2,500 books. The shortness of the terms matters a lot here. If I search for "a b c", that is, three really short terms, I'm not patient enough to wait for the result (I waited at least 2 minutes). If I search for "all bee hints" instead, I get a result in 2 seconds. So the problem really seems to be short terms on related fields.

The SQL query resulting from this search has a lot of JOIN, LIKE, AND, and OR, but there is no subquery.

I am using MySQL 5.1, but I tried with 5.5 without any success.

I also tried increasing the value of innodb_buffer_pool_size to a really big value. It does not change anything.

The only idea I have now to improve performance is to denormalize the isbn and title fields (i.e. copy them directly onto Author), but then I would have to add a bunch of machinery to keep these copies synchronized with the real fields on Book.

Any suggestions for improving this query?

+4
2 answers

After a lot of investigation, I found that the problem comes from how the search query is built for the admin search field (in the ChangeList class). In a multi-term search (words separated by spaces), each term is added to the QuerySet by chaining a new filter(). When search_fields contains one or more related fields, the generated SQL query has many JOINs chained one after another, with several JOINs for each related field (see my related question for some examples and more details). This chain of JOINs exists so that each term is searched only within the subset of data filtered by the preceding term AND, most importantly, so that a related field only needs to match one term (as opposed to ALL the terms) to produce a match. See "Spanning multi-valued relationships" in the Django docs for more information on this. I am pretty sure this is the behavior wanted most of the time for the admin search field.
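To make the growth in JOINs concrete, here is a small plain-Python sketch that prints the shape of the JOIN chain for a search going through the books many-to-many field. The table names (author, author_books, book) and the aliases (ab1, b1, ...) are made up for illustration and are not what Django actually emits; the only point is that every chained filter() adds its own pair of JOINs.

```python
def chained_join_clauses(num_terms):
    """Return illustrative JOIN clauses for num_terms chained filter() calls.

    Table names and aliases are hypothetical, not Django's real SQL output.
    """
    clauses = []
    for i in range(1, num_terms + 1):
        # Each chained filter() on a many-to-many relation gets a fresh
        # join to the link table and a fresh join to the related table.
        clauses.append(f"JOIN author_books ab{i} ON ab{i}.author_id = author.id")
        clauses.append(f"JOIN book b{i} ON b{i}.id = ab{i}.book_id")
    return clauses


# A three-term search such as "a b c" ends up with six JOINs.
for clause in chained_join_clauses(3):
    print(clause)
```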

The disadvantage of this kind of query (with related fields) is that its execution time can vary enormously. It depends on many factors: the number of search terms, the terms themselves, the type of the fields searched (VARCHAR, etc.), the number of fields searched, the data in the tables, the size of the tables, and so on. With the right combination, it is easy to end up with a query that effectively runs forever (for me, a query that takes more than 10 minutes is a query that runs forever in the context of this search field).

The reason it can take so long is that the database needs to create a temporary table for each term and scan it in full to look for the next term. The cost therefore adds up very quickly.

A possible change to improve performance is to AND all the terms together in the same filter(). That way there is only one JOIN per related field (or two for a many-to-many), instead of many more. The resulting query is much faster and its execution time varies very little. The disadvantage is that a single related row must contain ALL the terms to match, so in many cases you will get fewer matches.
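The difference in matching can be illustrated without a database. The following is a plain-Python model of the two behaviors, not Django code: the sample data and the helper names (hit, rows, search_chained, search_single) are invented for the example, and it assumes a case-insensitive substring match on each searched field.

```python
AUTHORS = [
    {"first_name": "Ann", "last_name": "Lee",
     "books": [{"isbn": "111", "title": "All"},
               {"isbn": "222", "title": "Bee"}]},
    {"first_name": "Bob", "last_name": "Ray",
     "books": [{"isbn": "333", "title": "All Bee"}]},
]


def hit(term, author, book):
    """True if term occurs (case-insensitively) in any searched field."""
    values = [author["first_name"], author["last_name"]]
    if book is not None:
        values += [book["isbn"], book["title"]]
    return any(term.lower() in value.lower() for value in values)


def rows(author):
    # One row per (author, book) pair, as a join would produce;
    # an author without books still yields one row.
    return [(author, book) for book in author["books"]] or [(author, None)]


def search_chained(terms):
    # One join per term: each term may be satisfied by a different book.
    return [a["first_name"] for a in AUTHORS
            if all(any(hit(t, a, b) for _, b in rows(a)) for t in terms)]


def search_single(terms):
    # One shared join: a single (author, book) row must satisfy every term.
    return [a["first_name"] for a in AUTHORS
            if any(all(hit(t, a, b) for t in terms) for _, b in rows(a))]


print(search_chained(["all", "bee"]))  # Ann and Bob
print(search_single(["all", "bee"]))   # Bob only
```

Ann is found by the chained search because one of her books matches "all" and another matches "bee", but she is dropped by the single-filter search because no single book of hers matches both terms.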

UPDATE

As trinchet asked, here is what is needed to change the search behavior (for Django 1.7). You must override get_search_results() in the admin classes where you want this kind of search. Copy the whole method from the base class (ModelAdmin) into your own class, then change these lines:

    for bit in search_term.split():
        or_queries = [models.Q(**{orm_lookup: bit})
                      for orm_lookup in orm_lookups]
        queryset = queryset.filter(reduce(operator.or_, or_queries))

To this:

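    # Needs "import operator" and, on Python 3, "from functools import reduce".
    and_queries = []
    for bit in search_term.split():
        or_queries = [models.Q(**{orm_lookup: bit})
                      for orm_lookup in orm_lookups]
        # OR across all searched fields for this one term.
        and_queries.append(reduce(operator.or_, or_queries))
    if and_queries:
        # AND all the terms together in a single filter() call.
        queryset = queryset.filter(reduce(operator.and_, and_queries))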

This code has not been tested. My original code was for Django 1.4, and I have only adapted it for 1.7 here.

+7

You can override get_changelist in a ModelAdmin subclass and try to optimize the query by hand there. For example, you can look up ISBNs by exact match instead of icontains, and you can add subqueries on Book to make it faster.

+1

Source: https://habr.com/ru/post/1401009/

