MySQL optimization for parallel import of a massive data file, one connection per table

I am doing the preparatory work for a large website migration.

The database is about 10 GB in size, and several tables contain more than 15 million records. Unfortunately, I only have it as a single large mysqldump file in SQL format, due to client relations outside my remit, but you know how that goes. My goal is to minimize downtime and therefore import the data as fast as possible.

I tried using the standard MySQL CLI, like this:

    $ mysql database_name < superhuge_sql_file -u username -p

It is, however, very slow.

To try to speed things up, I used awk to split the file into one chunk per table, and wrote a small shell script to import the tables in parallel, for example:

    #!/bin/bash

    # Split the dump into out_1, out_2, ... cutting at each DROP TABLE statement
    awk '/DROP TABLE/{f=0; n++; print >(file="out_" n); close("out_" n-1)} f{print > file}; /DROP TABLE/{f=1}' superhuge.sql

    # Import all chunks in parallel
    for (( i = 1; i <= 95; i++ ))
    do
        mysql -u admin --password=thepassword database_name < /path/to/out_$i &
    done

It is worth mentioning that this is a "use once then destroy" script, hence the password in the script and so on.

This works, but it still takes more than 3 hours to complete on a quad-core server that is currently doing nothing else. The tables do import in parallel, but not all of them at once, and getting MySQL server information through the CLI is very slow while the import is running. I am not sure why, but attempts to access the tables using the same mysql user account hang while this is in progress. max_user_connections is not limited.

I have set max_connections to 500 in my.cnf, but otherwise MySQL is unconfigured on this server.
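
A quick way to double-check what the running server actually uses for these limits (the SHOW VARIABLES statements are standard; the user and password are the ones from the script above):

    # Check the effective connection limits on the running server
    mysql -u admin --password=thepassword -e "SHOW VARIABLES LIKE 'max_connections'; SHOW VARIABLES LIKE 'max_user_connections';"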

I have had a good hunt around, but I was wondering whether there are any MySQL configuration options that would help speed this process up, or any other methods I have missed that would be faster.

2 answers

If you can use GNU parallel, check out this example from the wardbekker gist:

    # Split MySQL dump file
    zcat dump.sql.gz | awk '/DROP TABLE IF EXISTS/{n++}{print > "out" n ".sql"}'

    # Parallel import using GNU Parallel http://www.gnu.org/software/parallel/
    ls -rS *.sql | parallel --joblog joblog.txt mysql -uXXX -pYYY db_name "<"

which will split the large file into separate SQL files, one per table, and then run them through parallel for concurrent import.

So, to run 10 jobs at a time with GNU parallel, you can run:

 ls -rS data.*.sql | parallel -j10 --joblog joblog.txt mysql -uuser -ppass dbname "<" 

On OS X, it could be:

 gunzip -c wiebetaaltwat_stable.sql.gz | awk '/DROP TABLE IF EXISTS/{n++}{filename = "out" n ".sql"; print > filename}' 

Source: wardbekker/gist:964146


Related: Import SQL files using xargs on Unix.SE
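
Applied to the split files from the question (out_1 through out_95), the same pattern might look something like this; the paths and credentials are taken from the question, and -j4 is only a guess to match the quad-core box:

    ls -rS /path/to/out_* | parallel -j4 --joblog joblog.txt mysql -u admin --password=thepassword database_name "<"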


Does the SQL in the dump insert multiple rows at a time? Does the dump use multi-row inserts? (Or could you pre-process it so that it does?)
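
A rough way to check which kind of dump you have is to peek at the first INSERT statements: many value tuples per statement means multi-row (extended) inserts, which mysqldump produces by default via --extended-insert, while one tuple per statement means the much slower single-row form. The file name here is the one from the question, and the cut is only to keep the output readable:

    # Show the beginning of the first few INSERT statements in the dump
    grep -m 5 '^INSERT INTO' superhuge.sql | cut -c 1-120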

This guy covers a lot of the basics, for example:

  • Disabling indexes makes the import many times faster, so disable the MySQL indexes before starting the import:

     ALTER TABLE `table_name` DISABLE KEYS;

    then re-enable them once the import has finished:

     ALTER TABLE `table_name` ENABLE KEYS;
  • When using the MyISAM table type, use MySQL's INSERT DELAYED command instead, so that MySQL writes the data to disk when the database is idle.

  • For InnoDB tables, use these additional statements to avoid a lot of disk access (a sketch of applying them to the split files from the question follows this list):

     SET FOREIGN_KEY_CHECKS = 0;
     SET UNIQUE_CHECKS = 0;
     SET AUTOCOMMIT = 0;

    and these at the end:

     SET UNIQUE_CHECKS = 1;
     SET FOREIGN_KEY_CHECKS = 1;
     COMMIT;
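
A minimal sketch of how those InnoDB settings could be wrapped around each of the split files from the question; the out_N paths and credentials come from the original script, and the throttle to four concurrent imports is only an assumption to match the quad-core server:

    #!/bin/bash
    # Import each chunk with foreign key / unique checks and autocommit disabled,
    # committing once at the end of the chunk.
    for f in /path/to/out_*; do
        (
            echo "SET FOREIGN_KEY_CHECKS = 0; SET UNIQUE_CHECKS = 0; SET AUTOCOMMIT = 0;"
            cat "$f"
            echo "SET UNIQUE_CHECKS = 1; SET FOREIGN_KEY_CHECKS = 1; COMMIT;"
        ) | mysql -u admin --password=thepassword database_name &

        # Crude throttle: never run more than four imports at once
        while [ "$(jobs -rp | wc -l)" -ge 4 ]; do
            sleep 1
        done
    done
    wait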


