Ruby does not free memory

I have Ruby code that is more or less like this

    limit = 10_000
    offset = 0
    index = 1

    User.establish_connection(..)   # db1

    class Member < ActiveRecord::Base
      self.table_name = 'users'
    end
    Member.establish_connection(..) # db2

    while true
      users = User.limit(limit).offset(offset).as_json # from database 1
      offset = limit * index
      index += 1

      users.each do |u|
        member = Member.find_by(name: u[:name])
        if member.nil?
          Member.create(u)
        elsif member.updated_at < u[:updated_at]
          member.update_attributes(u)
        end
      end

      break if break_condition
    end

What I see is that the RSS (per htop) keeps growing and at some point reaches 10 GB. I'm not sure why this happens, but the memory is never released back to the OS.

I know there is a long list of questions related to this. I even tried changing the code to look like the following (note the last three lines of the loop), i.e. running GC.start manually, and it makes no difference:

    while true
      ....
      ...
      ...
      users = nil
      GC.start
      break if break_condition
    end
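To narrow down whether live Ruby objects are actually accumulating (a leak) or the heap simply isn't being handed back to the OS, it can help to log GC statistics and the process RSS on each iteration. A minimal diagnostic sketch, assuming Linux (it reads /proc):

    # Diagnostic sketch (assumes Linux, where /proc/<pid>/status exists).
    def log_memory(iteration)
      rss_kb = File.read("/proc/#{Process.pid}/status")[/VmRSS:\s+(\d+)/, 1]
      stats  = GC.stat
      puts "iter=#{iteration} rss_kb=#{rss_kb} " \
           "live_slots=#{stats[:heap_live_slots]} free_slots=#{stats[:heap_free_slots]}"
    end

If live_slots stays flat while RSS keeps climbing, objects are being collected and the growth is happening at the allocator level rather than in retained Ruby objects.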

Tested on Ruby versions 2.2.2 and 2.3.0.

EDIT: Other Details

1) OS.

    DISTRIB_ID=Ubuntu
    DISTRIB_RELEASE=15.04
    DISTRIB_CODENAME=vivid
    DISTRIB_DESCRIPTION="Ubuntu 15.04"

2) Ruby is installed and run via RVM.

3) ActiveRecord version 4.2.6

1 answer

I can't tell you the source of the memory leak, but I do spot some low-hanging fruit.

But first, two things:

  • Are you sure ActiveRecord is the right tool for copying data from one database to another? I'm fairly confident it isn't. Every major database product has robust export and import capabilities, and the performance you'll get from them will be many, many times better than anything you'll see in Ruby, and you can always invoke those tools from your application. Think hard about that before you continue down this road.

  • Where does the number 10,000 come from? Your code suggests you already know better than to fetch all of the records at once, but 10,000 is still a lot of records. You may see some benefit just by trying different batch sizes: 100 or 1,000, say. (A batching sketch follows this list.)
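If you do stay with ActiveRecord, note that Rails 4.2 also ships with find_in_batches, which pages by primary key instead of an ever-growing OFFSET, so you can tune the batch size without the OFFSET cost. A minimal sketch using the question's User class (it still instantiates User objects, which the rest of this answer argues against, so it only illustrates batch tuning):

    # Sketch: primary-key batching instead of OFFSET pagination.
    User.find_in_batches(batch_size: 1_000) do |batch|
      batch.each do |u|
        # ... create-or-update logic as in the question ...
      end
    end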

Now, let's be clear about what this line does:

 users = User.limit(10000).offset(offset).as_json 

The first part, User.limit(10000).offset(offset), creates an ActiveRecord::Relation object representing your query. When you call as_json on it, the query is executed, which instantiates 10,000 User model objects and puts them in an array, and then a hash is built from each of those objects' attributes. (Take a look at the source for ActiveRecord::Relation#as_json.)

In other words, you are creating 10,000 User objects only to throw them away once you have their attributes.

So a quick win is to skip that step entirely and select only the raw data:

    user_keys = User.attribute_names

    until break_condition
      # ... advance offset/index as before ...
      users_values = User.limit(10000).offset(offset).pluck(*user_keys)

      users_values.each do |vals|
        user_attrs = user_keys.zip(vals).to_h
        member = Member.find_by(name: user_attrs["name"])
        member.update_attributes(user_attrs)
      end
    end

ActiveRecord::Calculations#pluck returns an array of arrays containing the values from each record. Inside the users_values.each loop we turn each of those value arrays into a hash. No User objects are instantiated at all.
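To make the shape of that data concrete, here is roughly what pluck and the zip/to_h step produce (illustrative values, not taken from the question):

    User.limit(2).pluck(:id, :name)         # => [[1, "alice"], [2, "bob"]]
    ["id", "name"].zip([1, "alice"]).to_h   # => {"id" => 1, "name" => "alice"}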

Now take a look at this:

    member = Member.find_by(name: user_attrs["name"])
    member.update_attributes(user_attrs)

This selects the record from the database, instantiates a Member object, and then updates the record in the database, 10,000 times per iteration of the while loop. That's the right approach if you need validations and callbacks to run when the record is updated. If you don't need them, though, you can save time and memory by not instantiating any objects:

 Member.where(name: user_attrs["name"]).update_all(user_attrs) 

The difference is that ActiveRecord::Relation#update_all does not select the record from the database or instantiate a Member object; it just issues the UPDATE. You said in a comment above that you have a unique constraint on the name column, so we know this will update at most one record.
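One caveat: the original loop also creates a Member when none exists, which the update_all line by itself doesn't cover. A sketch of one way to keep that behavior without loading Member objects, using update_all's return value (the number of affected rows):

    # Sketch: create-or-update without instantiating Member objects.
    # Note: update_all skips validations, callbacks, and automatic timestamps.
    affected = Member.where(name: user_attrs["name"]).update_all(user_attrs)
    Member.create(user_attrs) if affected.zero?

Beware that MySQL reports zero affected rows when the new values match the old ones, so on MySQL you would want an existence check (or to rely on the unique constraint) rather than the return value.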

Having made those changes, you're still faced with making 10,000 UPDATE queries on every iteration of the while loop. Again, consider using your databases' built-in export and import features instead of trying to make Rails do this.
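For example, if both databases happened to be PostgreSQL (an assumption; the question doesn't say), the bulk of the data movement could be shelled out to psql; the database names and file path below are hypothetical:

    # Sketch assuming PostgreSQL on both ends; 'db1'/'db2' and the path are placeholders.
    system(%Q{psql db1 -c "\\copy (SELECT * FROM users) TO 'users.csv' CSV HEADER"})
    system(%Q{psql db2 -c "\\copy users FROM 'users.csv' CSV HEADER"})

A real migration would still need a staging table or upsert step to reproduce the update-only-if-newer semantics, but the raw data transfer is far faster than row-by-row ActiveRecord.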


Source: https://habr.com/ru/post/1247615/

