I have a table containing about 30,000 records that I am trying to iterate over and process using the Django ORM. Each record stores several binary blobs, each of which can be several MB in size, that I need to process and write to a file.
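For context, here is roughly what the setup looks like. The model, field, and function names below are simplified placeholders, not my real schema:

```python
# Hypothetical sketch of my setup -- model, field, and function names are
# placeholders, not my real schema.
from django.db import models

class MyModel(models.Model):
    name = models.CharField(max_length=255)
    blob_a = models.BinaryField()  # each blob can be several MB
    blob_b = models.BinaryField()

# Naive loop: memory climbs steadily and the process gets killed somewhere
# around record ~5000.
def export_all(path):
    with open(path, "wb") as out:
        for record in MyModel.objects.all():
            # Stand-in for the real processing: each record is touched
            # exactly once and never needed again.
            out.write(record.blob_a)
            out.write(record.blob_b)
```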
However, I am running into memory problems using Django for this. My system has 8 GB of RAM, but after processing about 5,000 records the Python process has consumed all 8 GB and gets killed by the Linux kernel. I have tried various tricks to clear out Django's query cache, for example (condensed into the sketch after this list):
- periodically calling MyModel.objects.update()
- setting settings.DEBUG = False
- periodically calling the Python garbage collector via gc.collect()
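Condensed, the attempts look something like this. The periodic calls are tricks I found suggested elsewhere, so I am not sure they are even supposed to work this way:

```python
# Condensed version of what I've tried; MyModel and blob_a are the same
# placeholder names as in the sketch above.
import gc

from django.conf import settings
from myapp.models import MyModel  # hypothetical app path

settings.DEBUG = False  # with DEBUG=True Django keeps a log of every query

def export_all(path):
    with open(path, "wb") as out:
        for i, record in enumerate(MyModel.objects.all()):
            out.write(record.blob_a)      # stand-in for the real processing
            if i and i % 1000 == 0:
                MyModel.objects.update()  # suggested as a way to reset the queryset cache
                gc.collect()              # force a garbage-collection pass
```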
However, none of these seem to have any noticeable effect, and the process keeps leaking memory until it crashes.
Is there anything else I can do?
Since I only need to process each record once, and I never need to access the same record again later in the process, there is no need to keep model instances around or to load more than one instance at a time. How can I make sure that only one record is loaded at a time, that Django caches nothing, and that the memory is freed immediately after each record has been used?
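To make the question concrete, below is the kind of access pattern I imagine I need, either with QuerySet.iterator() or with explicit primary-key paging. I have not verified that either of these keeps memory flat with large blobs, so treat it as a sketch of what I am asking for, not something I know works (the chunk_size argument to iterator() also depends on the Django version):

```python
# Two sketches of the access pattern I'm after (unverified).
from myapp.models import MyModel  # hypothetical app path

def export_with_iterator(path, chunk_size=100):
    with open(path, "wb") as out:
        # iterator() bypasses the queryset result cache and fetches rows
        # in chunks instead of materialising all 30k records at once.
        for record in MyModel.objects.iterator(chunk_size=chunk_size):
            out.write(record.blob_a)

def export_with_pk_paging(path, chunk_size=100):
    with open(path, "wb") as out:
        # Explicit primary-key paging: at most one chunk of rows is ever
        # held in memory, and each queryset is discarded after use.
        last_pk = 0
        while True:
            chunk = list(
                MyModel.objects.filter(pk__gt=last_pk)
                               .order_by("pk")[:chunk_size]
            )
            if not chunk:
                break
            for record in chunk:
                out.write(record.blob_a)
            last_pk = chunk[-1].pk
```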