Iterating over ActiveRecord data, memory grows forever

I use ActiveRecord to bulk-transfer data from a table in one database to a table in another database — about 4 million rows.

I use find_each to fetch records in batches. For each record I run some logic and then write it to the other database. I tried both writing records directly one after the other and using the excellent activerecord-import gem for batch writing.
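
For reference, here is a minimal sketch of the kind of loop I mean (the model names SourceRecord and TargetRecord are made up for illustration, not my real schema):

    rows = []

    SourceRecord.find_each(batch_size: 1000) do |record|
      # per-record transformation logic goes here
      rows << TargetRecord.new(record.attributes.except("id"))

      if rows.size >= 1000
        TargetRecord.import(rows)   # bulk insert via activerecord-import
        rows = []
      end
    end

    TargetRecord.import(rows) unless rows.empty?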

However, in either case the Ruby process's memory use keeps growing for the entire duration of the export/import. I would have thought that with find_each I get batches of 1000, so only about 1000 records should be in memory at a time... but no, every record I fetch seems to consume memory forever, until the process ends.

Any ideas? Is ActiveRecord caching something that I can disable?

Update (Jan 17, 2012):

I think I'm going to give up on this. I tried:

  • Making sure everything is wrapped in ActiveRecord::Base.uncached do
  • Adding ActiveRecord::IdentityMap.enabled = false (I believe this should disable the identity map for the current thread, although it isn't clearly documented, and I think the identity map isn't enabled by default in current Rails anyway)
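
Roughly, the combination I tried looks like this (same made-up model name as above):

    ActiveRecord::IdentityMap.enabled = false   # should already be off by default

    ActiveRecord::Base.uncached do
      SourceRecord.find_each(batch_size: 1000) do |record|
        # ... transform and import as before ...
      end
    end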

Neither of these had much effect; memory is still leaking.

Then I added a periodic explicit call to:

  • GC.start
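
In practice that meant something like this inside the loop (the interval is just a number picked for illustration, nothing scientific):

    row_count = 0

    SourceRecord.find_each(batch_size: 1000) do |record|
      # ... per-record work ...
      row_count += 1
      GC.start if (row_count % 10_000).zero?   # force a full GC pass every 10k rows
    end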

This seems to slow the rate of the leak, but the memory still leaks (eventually exhausting all memory and crashing).

So I think I'm giving up, and concluding that it is currently not practical to use AR to read millions of rows from one database and insert them into another. Perhaps there is a memory leak in the MySQL adapter code being used (MySQL is my db), or somewhere else in AR, or who knows.

+4
2 answers

I suggest queuing each unit of work onto a Resque queue. I have found that Ruby has some quirks when iterating over large collections like this.

Have one master process that enqueues the work by ID, and several Resque workers that pick jobs off the queue and do the actual work.

I used this method for about 300 thousand records, so it will most likely scale to millions.
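
A rough sketch of that split; the queue name, job class, and models below are invented for illustration, not taken from the question:

    # Master process: push only IDs onto the queue, not full records.
    SourceRecord.select(:id).find_each(batch_size: 1000) do |record|
      Resque.enqueue(TransferJob, record.id)
    end

    # Worker side: each job loads one record, transforms it, and writes it out.
    class TransferJob
      @queue = :transfer

      def self.perform(source_id)
        record = SourceRecord.find(source_id)
        TargetRecord.create!(record.attributes.except("id"))
      end
    end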

+1

Change line #86 to bulk_queue = [], because bulk_queue.clear only sets the array's length to 0, which keeps its old contents from being reclaimed by the GC.
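
In other words, assuming bulk_queue is the array that accumulates rows between imports, the suggestion is:

    # instead of this, which (per this answer) keeps the old array's contents around:
    bulk_queue.clear

    # reassign, so the previous contents become unreachable and can be collected:
    bulk_queue = []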

0

Source: https://habr.com/ru/post/1388525/

