Are Django QuerySets lazy enough to handle large datasets?

I think I read somewhere that the Django ORM lazily loads objects. Say I want to update a large set of objects (for example, 500,000) in a batch operation. Is it possible to simply iterate over a very large QuerySet, loading, updating, and saving objects as I go?

Similarly, if I wanted to allow pagination over all of these thousands of objects, could I use the built-in pagination object, or would I have to manually window over the dataset with a fresh query each time because of the size of the QuerySet?

+4
3 answers

If the batch update can be expressed as a single SQL query, then using raw SQL versus the Django ORM will not make much difference. But if the update genuinely requires loading each object, processing its data, and then saving it, you can either use the ORM or write your own SQL and run an UPDATE for each processed record; the overhead depends entirely on the logic of your code.
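To illustrate the two styles, here is a rough sketch assuming a hypothetical Django model `Entry` with a `status` field and a hypothetical helper `compute_new_status` (neither is from the original answer; this requires a configured Django project):

```python
# Case 1: the change is expressible in SQL, so a single UPDATE
# statement handles the whole batch without loading any objects.
Entry.objects.filter(status="draft").update(status="published")

# Case 2: each object needs Python-side processing, so we fall
# back to a loop, which issues one UPDATE per object.
for entry in Entry.objects.filter(status="draft"):
    entry.status = compute_new_status(entry)  # hypothetical helper
    entry.save(update_fields=["status"])
```

The first form is one round trip to the database; the second is one round trip per row, which is where the overhead of per-object processing shows up.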

The built-in pagination tool issues LIMIT/OFFSET queries (if you use it correctly), so I don't think pagination adds much overhead.
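As a sketch (again assuming a hypothetical `Entry` model and a configured Django project), Django's `Paginator` slices the QuerySet per page, which translates to LIMIT/OFFSET in SQL:

```python
from django.core.paginator import Paginator

# A stable ordering matters, otherwise pages can overlap or skip rows.
qs = Entry.objects.order_by("pk")
paginator = Paginator(qs, per_page=100)

page = paginator.page(1)  # roughly SELECT ... LIMIT 100 OFFSET 0
for entry in page.object_list:
    ...
```

One caveat: OFFSET gets slower the deeper the page, since the database still scans past the skipped rows. For very large datasets, keyset pagination (e.g. `filter(pk__gt=last_seen_pk)[:100]`) stays fast at any depth.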

+1

If you evaluate a QuerySet of 500,000 rows, which is large, the whole result will be cached in memory. Instead, you can use the iterator() method on the QuerySet, which streams results on demand without huge memory consumption.
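A minimal sketch of the difference, assuming a hypothetical `Entry` model and a hypothetical `process` function inside a configured Django project:

```python
# This caches all 500,000 objects on the QuerySet as it iterates:
for entry in Entry.objects.all():
    process(entry)

# iterator() streams rows in chunks and does not populate the
# QuerySet cache, keeping memory usage roughly constant:
for entry in Entry.objects.all().iterator(chunk_size=2000):
    process(entry)
```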

In addition, use update() with F() expressions to perform simple batch updates in a single query.
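For example, assuming a hypothetical `Entry` model with `views` and `published` fields, an F() expression lets the database compute the new value itself, so nothing is loaded into Python:

```python
from django.db.models import F

# One UPDATE statement executed entirely in the database:
# UPDATE ... SET views = views + 1 WHERE published = true
Entry.objects.filter(published=True).update(views=F("views") + 1)
```

Note that update() bypasses save() and model signals, which is part of why it is so much cheaper than a per-object loop.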

+3

Here is how I benchmarked this for my current project, with a dataset of 2.5M records in one table.

I only read and counted records; for example, I needed to find the IDs of records whose "name" field had been updated more than once within certain intervals. The Django test used the ORM to retrieve all the records and then iterate through them, storing the data in a list for further processing. No debug output other than printing the result at the end.

On the other hand, I used MySQLdb directly, executing the same queries (taken from Django) and building the same structure, using classes to hold the data and keeping the instances in a list for further processing. No debug output other than printing the result at the end.

I found that:

                       without Django   with Django
    execution time     x                10x
    memory consumption y                25y

And that was only reading and counting, without performing any UPDATE or INSERT queries.

Try researching this question for yourself; the benchmark is not hard to write and run.

-2

Source: https://habr.com/ru/post/1299453/
