I cannot tell you the source of the memory leak, but I can point out some low-hanging fruit.
But first, two things:
Are you sure ActiveRecord is the right tool for copying data from one database to another? I am fairly sure it is not. Every major database product has reliable export and import tools, their performance will be many times better than anything you can do in Ruby, and you can always invoke those tools from your application. Think about that before continuing down this road.
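To make "call these tools from your application" concrete, here is a minimal sketch of shelling out to the database's own bulk-copy facility. It assumes PostgreSQL and CSV as the interchange format; the database names, table names, and the `copy_users_command` helper are all hypothetical, not from the original code.

```ruby
# Build the export and import commands for PostgreSQL's \copy meta-command.
# Database names, table names, and file path are illustrative assumptions.
def copy_users_command(from_db, to_db, path = "/tmp/users.csv")
  export = %(psql #{from_db} -c "\\copy users TO '#{path}' WITH CSV HEADER")
  import = %(psql #{to_db} -c "\\copy members FROM '#{path}' WITH CSV HEADER")
  [export, import]
end

export_cmd, import_cmd = copy_users_command("source_db", "target_db")
# system(export_cmd) && system(import_cmd) would run them from the app.
```

The bulk path moves rows without ever materializing a Ruby object per record, which is exactly the overhead the rest of this answer is trying to reduce.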
Where does the number 10,000 come from? Your code tells me you already know you should not take all the records at once, but 10,000 is still a lot of records. You may see some gains just by trying different numbers: say, 100 or 1,000.
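The batch size is just a parameter of the pagination loop, so it is cheap to experiment with. Here is a pure-Ruby sketch of the same offset-based batching, using a plain array to stand in for the users table; `each_batch` and `BATCH_SIZE` are illustrative names, not part of the original code.

```ruby
# Offset-based pagination with a tunable batch size. A plain array stands
# in for the table so the mechanics are visible without a database.
BATCH_SIZE = 1_000

def each_batch(records, batch_size = BATCH_SIZE)
  offset = 0
  loop do
    batch = records[offset, batch_size]  # like .limit(batch_size).offset(offset)
    break if batch.nil? || batch.empty?
    yield batch
    offset += batch_size
  end
end

sizes = []
each_batch((1..2_500).to_a, 1_000) { |b| sizes << b.size }
sizes # => [1000, 1000, 500]
```

Timing this loop with a few different sizes against the real database would show whether 10,000 is actually the sweet spot.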
Now, let's be clear about what this line does:
```ruby
users = User.limit(10000).offset(offset).as_json
```
The first part, User.limit(10000).offset(offset), builds an ActiveRecord::Relation object representing your query. Calling as_json on it executes the query, instantiates 10,000 User model objects, puts them in an array, and then builds a hash from each user object's attributes. (Take a look at the source for ActiveRecord::Relation#as_json.)
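A toy illustration of the cost, with a plain Ruby class standing in for the User model (this is not Rails code, just a sketch of the allocation pattern):

```ruby
# ToyUser mimics the wasteful pattern: build an object per row, read its
# attributes, and immediately discard the object itself.
class ToyUser
  @@instances = 0
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
    @@instances += 1
  end

  def self.instances
    @@instances
  end
end

rows = Array.new(10_000) { |i| { "id" => i } }
as_json = rows.map { |row| ToyUser.new(row).attributes }  # 10,000 throwaway objects
ToyUser.instances # => 10000
```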
In other words, you are instantiating 10,000 User objects just to throw them away once you have their attributes.
So, a quick win is to skip that part entirely. Just select the raw data:
```ruby
user_keys = User.attribute_names

until break_condition
  # ...
  users_values = User.limit(10000).offset(offset).pluck(*user_keys)
  users_values.each do |vals|
    user_attrs = user_keys.zip(vals).to_h
    member = Member.find_by(name: user_attrs["name"])
    member.update_attributes(user_attrs)
  end
end
```
ActiveRecord::Calculations#pluck returns an array of arrays containing the values from each record. Inside the users_values.each loop we turn each of those value arrays into a hash. No User objects are ever instantiated.
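The zip/to_h step is plain Ruby, so it can be demonstrated without Rails at all. The column names and row data below are made up for illustration:

```ruby
# Turn pluck-style rows (arrays of values) back into attribute hashes,
# pairing each value with its column name.
user_keys = ["id", "name", "email"]
users_values = [
  [1, "alice", "alice@example.com"],
  [2, "bob", "bob@example.com"]
]

user_attrs = users_values.map { |vals| user_keys.zip(vals).to_h }
user_attrs.first # => {"id"=>1, "name"=>"alice", "email"=>"alice@example.com"}
```

Each hash carries the same data an ActiveRecord object would, at a fraction of the allocation cost.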
Now take a look at this:
```ruby
member = Member.find_by(name: user_attrs["name"])
member.update_attributes(user_attrs)
```
This selects a record from the database, instantiates a Member object, and then updates the record in the database, 10,000 times on each iteration of the while loop. This is the right approach if you need validations and callbacks to run when the record is updated. If you don't, though, you can save time and memory by not instantiating any objects at all:
```ruby
Member.where(name: user_attrs["name"]).update_all(user_attrs)
```
The difference is that ActiveRecord::Relation#update_all does not select the record from the database or instantiate a Member object; it just issues an UPDATE. You said in a comment above that you have a unique constraint on the name column, so we know this will update only one record.
Even with these changes, you are still stuck executing 10,000 UPDATE queries on each iteration of the while loop. Again, consider using your database's built-in export and import tools instead of trying to make Rails do this job.