Best practice for migrating data from MySQL to BigQuery

I tried several CSV formats (different escape characters, quoting and other settings) to export data from MySQL and import it into BigQuery, but I could not find a solution that works in every case.

Google Cloud SQL recommends the following statement for importing/exporting CSV from/to MySQL. Although Cloud SQL is not BigQuery, it is a good starting point:

SELECT * INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8' 
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY '' FROM table

I am currently using the following command to load a compressed CSV into BigQuery:

bq --nosync load -F "," --null_marker "NULL" --source_format=CSV PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json
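
For completeness, the compressed file referenced above is produced by gzipping the MySQL export and copying it to Cloud Storage, for example (the bucket name is the placeholder from the command above):

# Compress the export and stage it in Cloud Storage for the bq load command above.
gzip filename.csv
gsutil cp filename.csv.gz gs://bucket/data.csv.gz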

On the one hand, the bq command does not allow an escape character to be set (a " is escaped by another ", which appears to be the well-defined CSV format). On the other hand, using " as the escape character for the MySQL export turns the null marker \N into "N, which does not work either:

CSV table references column position 34, but line starting at position:0 contains only 34 columns. (error code: invalid)
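
The underlying issue is that MySQL's OUTFILE escaping and BigQuery's CSV expectations do not line up. A tiny illustration with made-up data (the sample row and file names are hypothetical):

# One hypothetical row: a text value containing a double quote, followed by a NULL.
# MySQL with ENCLOSED BY '"' ESCAPED BY '\\' writes backslash escapes and \N for NULL:
printf '%s\n' '"say \"hi\"",\N' > mysql_style.csv

# BigQuery's CSV parser instead expects the quote to be doubled; the backslash above
# is read as an ordinary character, which throws off the field parsing:
printf '%s\n' '"say ""hi""",\N' > bigquery_style.csv

diff mysql_style.csv bigquery_style.csv    # the two conventions are incompatible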

So my question is: how do I write a (table-independent) export statement in SQL for MySQL such that the generated file can be loaded into BigQuery? Which escape character should be used, and how should null values be handled/set?

+12
7 answers

I ran into the same problem; here is my solution:

Export data from MySQL

Export your data from MySQL like this:

SELECT * INTO OUTFILE 'filename.csv' CHARACTER SET 'utf8' 
FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY '' 
FROM <yourtable>

This produces a TSV file (tab-separated values), but it can still be imported as CSV.

Import it into BigQuery with the following parameters:

bq load --field_delimiter="\t" --null_marker="\N" --quote="" \
PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json

  • If any text column in MySQL contains a tab character (\t), it will break your columns. To prevent that, wrap those columns in the SQL function REPLACE(<column>, '\t', ' ') so that tabs are converted to spaces (see the sketch after these notes).

  • If you set the table schema in BigQuery's web interface, you do not need to specify it every time you load a CSV.
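
For example, if text columns may contain tabs, the export can apply the REPLACE workaround directly. This is only a sketch; mydb, my_table and the column names are placeholders for your own schema:

# Hypothetical sketch: strip tab characters from text columns before the tab-separated export.
# Requires the FILE privilege and a writable path allowed by secure_file_priv.
mysql -u user -p mydb -e "
SELECT id,
       REPLACE(name, '\t', ' ')    AS name,
       REPLACE(comment, '\t', ' ') AS comment
  INTO OUTFILE '/tmp/my_table.tsv'
  CHARACTER SET 'utf8'
  FIELDS TERMINATED BY '\t' OPTIONALLY ENCLOSED BY ''
  FROM my_table;"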

I hope this works for you.

+8

The following SQL statement seems to work for me; it exports null values as \N:

SELECT * INTO OUTFILE '/tmp/foo.csv' CHARACTER SET 'utf8'  
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' ESCAPED BY "\\" 
FROM table;

The data should then import with --null_marker="\N". Could you try it and report back whether it works for you?
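
If that export works for you, the matching load step would look roughly like this (bucket, dataset and schema file names are placeholders; all flags are standard bq load options):

# Hedged sketch: load the comma-separated export and treat \N as NULL.
bq load --source_format=CSV --field_delimiter="," --null_marker="\N" \
PROJECT:DATASET.tableName gs://bucket/data.csv.gz table_schema.json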

+2

I had the same problem importing from MySQL into Big Query, and since my dataset contains several text columns, I could not use a standard separator such as , or ; or even \t without an encloser.

With an encloser I had either the escaping problem with the default \ escaper, or the null-value problem with the " escaper, which turns \N into "N.

I eventually got it working with the following workaround, which avoids enclosers and escape characters altogether.

Step 1: Export from MySQL

Use the following settings:

  • Separator: \001 (the ASCII control character 0x01)
  • Encloser: '' (none)

This works with a plain MySQL server as well. I run AWS RDS Aurora, which is MySQL-compatible and can export directly to S3:

SELECT * FROM my_table
INTO OUTFILE S3 's3://xxxxx/tmp/my_table/data'
CHARACTER SET UTF8MB4 
FIELDS TERMINATED BY x'01'
OPTIONALLY ENCLOSED BY ''
MANIFEST OFF 
OVERWRITE ON
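
If you are on a stock MySQL server rather than Aurora (no INTO OUTFILE S3 support), the same idea can be sketched as a local export followed by a copy to Cloud Storage. Paths and names below are placeholders, and the commands must run on the database host:

# Write the \001-delimited export locally (the path must be allowed by secure_file_priv).
mysql -u user -p mydb -e "
SELECT * FROM my_table
  INTO OUTFILE '/var/lib/mysql-files/my_table.csv'
  CHARACTER SET utf8mb4
  FIELDS TERMINATED BY x'01'
  OPTIONALLY ENCLOSED BY '';"

# Copy the file to Cloud Storage so the load in step 3 can read it.
gsutil cp /var/lib/mysql-files/my_table.csv gs://xxxxx/tmp/my_table/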

Step 2: Copy the files to Cloud Storage with gsutil

gsutil -m rsync s3://xxxxx/tmp/my_table/ gs://xxxxx/tmp/my_table/

Step 3: Load into Big Query with the CLI

bq load --source_format=CSV --field_delimiter=^A --null_marker="\N" --quote="" project:base.my_table gs://xxxxx/tmp/my_table/* ./schema.json

  • ^A stands for the literal \001 character, not the two characters ^ and A. On Windows you can type it with Alt+001; on Linux press Ctrl+V then Ctrl+A in the shell or editor (or let the shell generate it, see the sketch after these notes).
  • --quote="" disables the quote character, which matches the export above, where no encloser is used.
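
If typing the literal control character is inconvenient, the shell can produce it for you. This is a sketch of the same load command using bash's $'\001' quoting (project, dataset, bucket and schema are the placeholders from above):

# $'\001' expands to the literal 0x01 byte, so the delimiter never has to be typed by hand.
bq load --source_format=CSV --field_delimiter=$'\001' --null_marker="\N" --quote="" \
project:base.my_table gs://xxxxx/tmp/my_table/* ./schema.json
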
+1

Have a look at MySQL to BigQuery Import Script.md, a set of scripts that migrates MySQL tables to BigQuery.

mysql_table_to_big_query.sh exports a MySQL table to CSV and exports the schema to JSON and SQL files. The files are uploaded to a folder in a Cloud Storage bucket and then imported into BigQuery. A BigQuery dataset is created in the same project (if it does not exist) with the name {SCHEMA_NAME}_{DATE}. If the table has a column of type DATE, the table is partitioned in BigQuery.

mysql_schema_to_big_query.sh extracts the list of all tables from the MySQL schema and calls mysql_table_to_big_query.sh for each of them (essentially the loop sketched below). The CSV files are created with the escaping and null handling needed for the import into Google BigQuery.
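
The per-schema step boils down to a loop of the kind sketched here; the information_schema query is standard, while the argument order of mysql_table_to_big_query.sh is only illustrative:

# Hypothetical sketch of the "one export per table" loop described above.
SCHEMA_NAME="mydb"
TABLES=$(mysql -u user -p"$MYSQL_PASSWORD" -N -B -e \
  "SELECT table_name FROM information_schema.tables WHERE table_schema = '${SCHEMA_NAME}';")

for TABLE in $TABLES; do
  ./mysql_table_to_big_query.sh "$SCHEMA_NAME" "$TABLE"
done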

0

Moving data from MySQL to Google BigQuery can also be handed off to a third-party ETL tool. Google maintains a list of BigQuery partners here: https://cloud.google.com/bigquery/partners/

0

Take a look at sqldump-to. It reads a MySQL dump stream and writes newline-delimited JSON that can be loaded straight into BigQuery.

CSV and TSV are hard to get right when the data contains awkward values; JSON sidesteps that problem.

It can also emit a BigQuery schema for each table (the --schema flag in the command below), so you do not have to write one by hand.

To get started, pipe the output of mysqldump into sqldump-to:

mysqldump -u user -psecret dbname | sqldump-to --dir-output ./dbname --schema

You may need to modify the mysqldump command to fit your specific MySQL configuration (e.g. remote servers, etc.)

If you already have a dump file, the tool also supports multiple workers to make better use of your CPU.

Once sqldump-to has created your JSON files, just use the bq command-line tool to load them into BigQuery:

bq load --source_format=NEWLINE_DELIMITED_JSON datasetname.tablename tablename.json tablename_schema.json
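
bq load can also read newline-delimited JSON straight from Cloud Storage, so for larger dumps you may prefer to stage the files first; the bucket name and file layout below are assumptions:

# Hedged sketch: stage the generated JSON in Cloud Storage and load from there.
gsutil -m cp ./dbname/tablename.json gs://my-bucket/dbname/

bq load --source_format=NEWLINE_DELIMITED_JSON datasetname.tablename \
gs://my-bucket/dbname/tablename.json tablename_schema.json
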
0