Running out of memory when performing a rake import task in Ruby

I have a task to import about 1 million orders. I iterate over the data to transform it into the values for the new database, and it works fine on my local machine with 8 GB of RAM.

However, when I run it on my AWS t2.medium instance, it works for the first 500 thousand rows, but towards the end, when it actually starts creating the orders that don't exist yet, it maxes out my memory. I am migrating the database from MySQL to Postgres.

Am I missing something obvious here?

require 'mysql2' # or require 'pg'

require 'active_record'

def legacy_database
  @client ||= Mysql2::Client.new(Rails.configuration.database_configuration['legacy_production'])
end

desc "import legacy orders"
task orders: :environment do
  orders = legacy_database.query("SELECT * FROM oc_order")

  # init progressbar
  progressbar = ProgressBar.create(:total => orders.count, :format => "%E, \e[0;34m%t: |%B|\e[0m")

  orders.each do |order|
    if [1, 2, 13, 14].include? order['order_status_id']
      payment_method = "wx"
      if order['paid_by'] == "Alipay"
        payment_method = "ap"
      elsif order['paid_by'] == "UnionPay"
        payment_method = "up"
      end

      user_id = User.where(import_id: order['customer_id']).first
      if user_id
        user_id = user_id.id
      end

      order = Order.create(
        # id: order['order_id'],
        import_id: order['order_id'],
        # user_id: order['customer_id'],
        user_id: user_id,
        receiver_name: order['payment_firstname'],
        receiver_address: order['payment_address_1'],
        created_at: order['date_added'],
        updated_at: order['date_modified'],
        paid_by: payment_method,
        order_num: order['order_id']
      )

      #increment progress bar on each save
      progressbar.increment
    end
  end
end
4 answers

The problem, as nattfodd already pointed out, is that you pull the entire table out of MySQL into memory in one go.

However, MySQL lets you page through the results:

SELECT * FROM oc_order LIMIT 5,10;
SELECT * FROM oc_order LIMIT 10 OFFSET 5;

Both forms return rows 6-15 of the table.

So you can fetch the results in batches instead of all at once.

For example, read 1000 rows at a time, process them, then read the next 1000, something like this:

batch_size = 1000
offset = 0
loop do
  orders = legacy_database.query("SELECT * FROM oc_order LIMIT #{batch_size} OFFSET #{offset}")

  break if orders.count == 0 # stop once a batch comes back empty

  offset += batch_size

  orders.each do |order|

    ... # your logic of creating new model objects
  end
end

It is also a good idea to wrap the main logic in a begin/rescue/ensure block so you can handle errors and run cleanup no matter what:

begin
  ... # main logic
rescue => e
  ... # handle error
ensure
  ... # ensure 
end
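
Putting the pieces together, here is a rough sketch of the batched version of the task. The order-status filter and column mapping are copied from the question, the legacy_database helper is the one defined there, and the batch size of 1000 is just an assumption you can tune:

desc "import legacy orders in batches"
task orders: :environment do
  batch_size = 1000 # assumed value, adjust for your instance
  offset = 0

  begin
    loop do
      orders = legacy_database.query(
        "SELECT * FROM oc_order LIMIT #{batch_size} OFFSET #{offset}"
      )
      break if orders.count == 0
      offset += batch_size

      orders.each do |order|
        next unless [1, 2, 13, 14].include?(order['order_status_id'])

        payment_method = case order['paid_by']
                         when "Alipay"   then "ap"
                         when "UnionPay" then "up"
                         else "wx"
                         end

        user = User.find_by(import_id: order['customer_id'])

        Order.create(
          import_id:        order['order_id'],
          user_id:          user && user.id,
          receiver_name:    order['payment_firstname'],
          receiver_address: order['payment_address_1'],
          created_at:       order['date_added'],
          updated_at:       order['date_modified'],
          paid_by:          payment_method,
          order_num:        order['order_id']
        )
      end
    end
  rescue => e
    # log where the import stopped so it can be resumed from that offset
    Rails.logger.error("Import failed at offset #{offset}: #{e.message}")
    raise
  end
end

This way only one batch of rows is ever held in memory, regardless of how large the legacy table is.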

I assume the problem is that orders = legacy_database.query("SELECT * FROM oc_order") loads the entire table into memory, which is very expensive.

For ActiveRecord models there is find_each for exactly this purpose. Since you are querying the legacy database directly rather than through ActiveRecord, you will have to implement the batching yourself with limit and offset.
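
For comparison, if you exposed the legacy table through an ActiveRecord model connected to the legacy database, find_each would do the batching for you. The LegacyOrder class below is hypothetical, not part of the question's code, and it assumes order_id is the table's primary key:

# Hypothetical model pointing at the legacy MySQL database
class LegacyOrder < ActiveRecord::Base
  establish_connection Rails.configuration.database_configuration['legacy_production']
  self.table_name  = 'oc_order'
  self.primary_key = 'order_id' # assumption based on the question's data
end

# find_each fetches rows in batches (1000 by default) instead of all at once
LegacyOrder.find_each(batch_size: 1000) do |order|
  # ... same per-row logic as in the question
end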


Disabling row caching while iterating over the collection of orders should reduce memory consumption:

orders.each(cache_rows: false) do |order|
  # ... same per-row logic as before
end
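
In the same vein, mysql2 can also stream the result set so rows are fetched from the server lazily instead of being buffered client-side. A minimal sketch, assuming the same legacy_database helper as in the question:

orders = legacy_database.query(
  "SELECT * FROM oc_order",
  stream: true,       # fetch rows from the server lazily
  cache_rows: false   # do not keep rows that have already been yielded
)

orders.each do |order|
  # ... same per-row logic as in the question
end

Note that a streamed result can only be iterated once, so the total for the progress bar would have to come from a separate SELECT COUNT(*) query.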

There is a gem that helps with this, called activerecord-import.

bulk_orders = []

orders.each do |order|
  # user_id and payment_method are derived exactly as in the question
  bulk_orders << Order.new(
    import_id: order['order_id'],
    user_id: user_id,
    receiver_name: order['payment_firstname'],
    receiver_address: order['payment_address_1'],
    created_at: order['date_added'],
    updated_at: order['date_modified'],
    paid_by: payment_method,
    order_num: order['order_id']
  )
end

Order.import bulk_orders, validate: false

so all the records are inserted with a single INSERT statement instead of one query per order.
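
One caveat: building a million Order objects in one array before calling import will itself use a lot of memory. A sketch that flushes the array periodically instead (the 10_000 slice size is an arbitrary assumption; user_id and payment_method are derived as in the question):

bulk_orders = []

orders.each do |order|
  bulk_orders << Order.new(
    import_id: order['order_id'],
    user_id: user_id,
    receiver_name: order['payment_firstname'],
    receiver_address: order['payment_address_1'],
    created_at: order['date_added'],
    updated_at: order['date_modified'],
    paid_by: payment_method,
    order_num: order['order_id']
  )

  # write out every 10_000 records so the array never holds all 1M orders
  if bulk_orders.size >= 10_000
    Order.import bulk_orders, validate: false
    bulk_orders = []
  end
end

# import whatever is left after the loop
Order.import bulk_orders, validate: false if bulk_orders.any?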
