Django: duplicates when filtering across many-to-many fields

My Django application has the following models:

    from django.db import models


    class Book(models.Model):
        name = models.CharField(max_length=100)
        keywords = models.ManyToManyField('Keyword')


    class Keyword(models.Model):
        name = models.CharField(max_length=100)

I have the following saved keywords:

    science-fiction
    fiction
    history
    science
    astronomy

On my site, a user can filter books by keyword by visiting /keyword-slug/. The keyword_slug variable is passed to a function in my views that filters Books by keywords as follows:

    def get_books_by_keyword(keyword_slug):
        books = Book.objects.all()
        keywords = keyword_slug.split('-')
        for k in keywords:
            # Each pass narrows the QuerySet by one part of the slug.
            books = books.filter(keywords__name__icontains=k)
        return books

This works for the most part. However, whenever I filter on a keyword that contains a string appearing more than once in the keyword table (e.g. fiction, which appears in both science-fiction and fiction), I get the same book more than once in the resulting QuerySet.

I know that I can add distinct() to return only unique books, but I wonder why I get duplicates in the first place, and I really want to understand why it works the way it does. Since I only call filter() on successively filtered QuerySets, how does a duplicate book end up in the results?

2 answers

The two models in your example are represented by three tables: book, keyword, and the book_keyword relationship table that backs the M2M field.

When you use keywords__name in a filter call, Django uses SQL JOINs to join all three tables. This lets you filter objects in the first table according to values from another table.

SQL will look like this:

    SELECT `book`.`id`, `book`.`name`
    FROM `book`
    INNER JOIN `book_keyword` ON (`book`.`id` = `book_keyword`.`book_id`)
    INNER JOIN `keyword` ON (`book_keyword`.`keyword_id` = `keyword`.`id`)
    WHERE (`keyword`.`name` LIKE '%fiction%')

After the JOIN, your data looks like this:

    | Book table          | Relation table                     | Keyword table                |
    |---------------------|------------------------------------|------------------------------|
    | Book ID | Book name | relation_book_id | relation_key_id | Keyword ID | Keyword name    |
    |---------|-----------|------------------|-----------------|------------|-----------------|
    | 1       | Book 1    | 1                | 1               | 1          | Science-fiction |
    | 1       | Book 1    | 1                | 2               | 2          | Fiction         |
    | 2       | Book 2    | 2                | 2               | 2          | Fiction         |

Then, when the data is loaded from the database into Python, you only get the columns from the book table. As you can see, Book 1 appears twice there, once for each keyword row that matched.

That is simply how many-to-many relations and JOINs work.
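To see the duplication without a Django project, here is a minimal sketch that reproduces the JOIN above with Python's built-in sqlite3 module. The table and column names are simplified for illustration (Django generates its own names), and the rows are the ones from the table above. SQLite's LIKE is case-insensitive for ASCII, which roughly stands in for icontains here.

```python
import sqlite3

# In-memory database mirroring the three tables behind the two models.
# Names are illustrative, not the ones Django would generate.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_keyword (book_id INTEGER, keyword_id INTEGER);
    INSERT INTO book VALUES (1, 'Book 1'), (2, 'Book 2');
    INSERT INTO keyword VALUES (1, 'Science-fiction'), (2, 'Fiction');
    INSERT INTO book_keyword VALUES (1, 1), (1, 2), (2, 2);
""")

# Same shape as the query in the answer; {select} toggles DISTINCT.
JOIN_SQL = """
    SELECT {select} book.id, book.name
    FROM book
    INNER JOIN book_keyword ON book.id = book_keyword.book_id
    INNER JOIN keyword ON book_keyword.keyword_id = keyword.id
    WHERE keyword.name LIKE '%fiction%'
    ORDER BY book.id
"""

# One result row per matching (book, keyword) pair, so Book 1 shows up twice.
rows = conn.execute(JOIN_SQL.format(select="")).fetchall()

# DISTINCT collapses identical (id, name) rows, which is what distinct() asks for.
distinct_rows = conn.execute(JOIN_SQL.format(select="DISTINCT")).fetchall()

print(rows)
print(distinct_rows)
```

The first result has one row per matching keyword, so Book 1 is listed twice; the DISTINCT version lists each book once.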


Direct quote from Docs: https://docs.djangoproject.com/en/dev/topics/db/queries/#spanning-multi-valued-relationships

Successive filter() calls further restrict the set of objects, but for multi-valued relations, they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.

In your case, since keywords is a multi-valued relationship, each .filter() call in your chain is matched against all keywords of the book, not only against the keywords matched by the previous call. Every call joins the keyword table again, and each join can contribute more than one matching row per book.
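As a rough illustration of that behaviour for the slug science-fiction, the sketch below builds one fresh pair of JOINs per slug part, which is approximately what chained filter() calls on a multi-valued relation produce. It uses plain sqlite3 rather than the ORM; the helper chained_filter_sql, the table aliases, and the assignment of keywords to books are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE keyword (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book_keyword (book_id INTEGER, keyword_id INTEGER);
    INSERT INTO book VALUES (1, 'Book 1'), (2, 'Book 2');
    INSERT INTO keyword VALUES
        (1, 'science-fiction'), (2, 'fiction'), (3, 'history'),
        (4, 'science'), (5, 'astronomy');
    -- Assumed data: Book 1 has science-fiction and fiction, Book 2 has fiction.
    INSERT INTO book_keyword VALUES (1, 1), (1, 2), (2, 2);
""")


def chained_filter_sql(parts, distinct=False):
    """Hypothetical helper: one independent pair of JOINs per slug part."""
    joins, conditions = [], []
    for i in range(len(parts)):
        joins.append(
            f"INNER JOIN book_keyword bk{i} ON book.id = bk{i}.book_id "
            f"INNER JOIN keyword k{i} ON bk{i}.keyword_id = k{i}.id"
        )
        conditions.append(f"k{i}.name LIKE ?")
    select = "SELECT DISTINCT" if distinct else "SELECT"
    join_clause = " ".join(joins)
    where_clause = " AND ".join(conditions)
    return (
        f"{select} book.id, book.name FROM book {join_clause} "
        f"WHERE {where_clause} ORDER BY book.id"
    )


parts = "science-fiction".split("-")      # ['science', 'fiction']
params = [f"%{part}%" for part in parts]  # substring match, like icontains

# 'science' matches one keyword of Book 1, 'fiction' matches two of them,
# so the two independent joins yield 1 * 2 = 2 rows for Book 1.
rows = conn.execute(chained_filter_sql(parts), params).fetchall()
distinct_rows = conn.execute(chained_filter_sql(parts, distinct=True), params).fetchall()

print(rows)
print(distinct_rows)
```

Book 2 drops out because none of its keywords contains science, while Book 1 is returned once per combination of matching keywords. Calling distinct() on the final QuerySet removes those repeated rows.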


Source: https://habr.com/ru/post/1495445/

