How to save large amounts of data by reading from a CSV file

How can I save a large amount of data read from a CSV file (say, 20 million lines)? So far the job has been running for close to 1.5 days and has saved only 10 million lines. How can I make it faster, and is it possible to run it in parallel?

I am using the standard CSV reader code here; I would like to know if there is a better way to achieve this.

Refer to: large CSV files (20G) in Ruby


You can first split the file into several smaller files and then process those files in parallel.

It will probably be fastest to do the splitting with an existing tool such as split:

split -l 1000000 ./test.txt ./out-files- 
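For example, here is a minimal sketch of processing the resulting chunks in parallel, one forked worker per file. It assumes you are running inside an ActiveRecord/Rails environment; the glob pattern matches the chunks produced by the split command above, and the per-row handling is left as a placeholder.

    require 'csv'

    chunk_files = Dir.glob('./out-files-*')

    chunk_files.each do |path|
      fork do
        # Each worker process needs its own database connection after the fork.
        ActiveRecord::Base.establish_connection
        CSV.foreach(path) do |row|
          # insert `row` here, ideally in batches as described below
        end
      end
    end

    Process.waitall # wait for every worker process to finish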

Then, as you process each of those files, instead of inserting records one by one you can group them into batches and do bulk inserts. Something like:

 INSERT INTO some_table VALUES (1,'data1'), (2, 'data2') 

For better performance you can build the SQL statement yourself and execute it directly:

 ActiveRecord::Base.connection.execute('INSERT INTO <whatever you have built>') 
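Putting the two ideas together, a rough sketch of a batched insert could look like the following. The table name some_table, the column names and the batch size of 1,000 are illustrative assumptions, not part of the original answer.

    require 'csv'

    BATCH_SIZE = 1_000
    conn = ActiveRecord::Base.connection

    CSV.foreach('big_data.csv').each_slice(BATCH_SIZE) do |batch|
      # Quote each value through the adapter so the generated SQL stays valid and safe.
      values = batch.map { |row| "(#{conn.quote(row[0])}, #{conn.quote(row[1])})" }.join(', ')
      conn.execute("INSERT INTO some_table (col1, col2) VALUES #{values}")
    end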

Since you want to save the data in MySQL for further processing, MySQL's LOAD DATA INFILE will be faster. Something like the following, adapted to your schema:

sql = "LOAD DATA LOCAL INFILE 'big_data.csv' INTO TABLE FIELD TESTS TESTED ',' CONCLUDED '\" LINES INTERRUPTED' \ n '(foo, foo1) "

con = ActiveRecord :: Base.connection

con.execute (SQL)
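Note that LOAD DATA LOCAL INFILE usually has to be allowed on both the MySQL server and the client; with the mysql2 adapter that typically means adding local_infile: true to the connection settings in database.yml. This is an assumption about your setup, so check the documentation for your MySQL and driver versions.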



Source: https://habr.com/ru/post/978921/

