Sync MySQL data with Amazon Redshift

We are doing some aggregation on huge datasets in Amazon Redshift, and we have a relatively small amount of data in MySQL. For some of the joins in Redshift we need the MySQL data. What is the best way to sync MySQL data into Redshift? Is there anything in Redshift like a remote view over a database link in Oracle? Or should I programmatically query MySQL and insert / update the rows in Redshift?

4 answers

When MySQL data is needed for joins in Redshift, we usually just ship it from one to the other.

That means:

  • Redshift: create a matching table schema (taking Redshift / PostgreSQL specifics into account)
  • MySQL: dump the table's data (in CSV format)
  • Zip the export and send it to S3
  • Redshift: truncate the table and import the data using COPY

Steps 2-4 can be scripted, so you can push fresh data to Redshift on demand or on a regular schedule.
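
For example, a minimal Python sketch of steps 2-4 using boto and psycopg2 (all host names, credentials, bucket and table names below are placeholders, and it assumes the mysql client plus the boto and psycopg2 modules are installed):

    import gzip
    import subprocess

    import boto
    from boto.s3.key import Key
    import psycopg2

    # 2. Dump the MySQL table to a tab-delimited file (SELECT ... INTO OUTFILE is an
    #    alternative if you have FILE privileges on the MySQL server).
    with open('mytable.tsv', 'wb') as out:
        subprocess.check_call(
            ['mysql', '-h', 'mysql-host', '-u', 'user', '-psecret', '-D', 'mydb',
             '--batch', '-N', '-e', 'SELECT * FROM mytable'],
            stdout=out)

    # 3. Compress the dump and upload it to S3.
    with open('mytable.tsv', 'rb') as src, gzip.open('mytable.tsv.gz', 'wb') as dst:
        dst.writelines(src)
    bucket = boto.connect_s3('AWS_KEY', 'AWS_SECRET').get_bucket('my-bucket')
    Key(bucket, 'mytable.tsv.gz').set_contents_from_filename('mytable.tsv.gz')

    # 4. Truncate the Redshift table and reload it with COPY.
    conn = psycopg2.connect(host='redshift-host', port=5439, dbname='mydb',
                            user='rs_user', password='rs_pwd')
    cur = conn.cursor()
    cur.execute('TRUNCATE mytable')
    cur.execute("COPY mytable FROM 's3://my-bucket/mytable.tsv.gz' "
                "CREDENTIALS 'aws_access_key_id=AWS_KEY;aws_secret_access_key=AWS_SECRET' "
                "DELIMITER '\\t' GZIP")
    conn.commit()
    conn.close()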


Redshift now supports loading data from remote hosts over SSH. This approach involves:

  • Adding the cluster's public key to the authorized_keys file on the remote host(s)
  • Allowing SSH access to the remote host(s) from the IP addresses of the cluster nodes
  • Uploading a JSON manifest to S3 that lists the remote hosts, public key(s), and the commands to execute on them
  • Running the COPY command with that manifest file and AWS credentials

The command specified in the manifest can be any command that prints text in a format the Redshift COPY command can ingest.
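
A rough sketch of what that can look like with Python (boto / psycopg2), assuming the SSH access from the list above is already configured; the endpoint, command, key, credentials and names are placeholders:

    import json

    import boto
    from boto.s3.key import Key
    import psycopg2

    # Manifest describing the remote host and the command Redshift should run over SSH.
    # Field names follow the layout documented for COPY from remote hosts; values are placeholders.
    manifest = {
        "entries": [{
            "endpoint": "mysql-host.example.com",
            "command": "mysql -u user -psecret -D mydb --batch -N -e 'SELECT * FROM mytable'",
            "mandatory": True,
            "publickey": "<the remote host's public key (optional)>",
            "username": "ec2-user"
        }]
    }

    # Upload the manifest to S3.
    bucket = boto.connect_s3('AWS_KEY', 'AWS_SECRET').get_bucket('my-bucket')
    Key(bucket, 'mytable.manifest').set_contents_from_string(json.dumps(manifest))

    # Point COPY at the manifest; the SSH keyword tells Redshift to pull the data over SSH.
    conn = psycopg2.connect(host='redshift-host', port=5439, dbname='mydb',
                            user='rs_user', password='rs_pwd')
    cur = conn.cursor()
    cur.execute("COPY mytable FROM 's3://my-bucket/mytable.manifest' "
                "CREDENTIALS 'aws_access_key_id=AWS_KEY;aws_secret_access_key=AWS_SECRET' "
                "DELIMITER '\\t' SSH")
    conn.commit()
    conn.close()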


What do you mean by a remote view in Oracle?

In any case, if you can extract the table's data into a CSV file, you have a scripting option: you can use a Python / boto / psycopg2 combination to script the load of your CSV into Amazon Redshift.

In my MySQL_To_Redshift_Loader, I do the following:

  • Extract data from MySQL into a temp file.

    loadConf = [db_client_dbshell, '-u', opt.mysql_user, '-p%s' % opt.mysql_pwd,
                '-D', opt.mysql_db_name, '-h', opt.mysql_db_server]
    ...
    q = """
    %s %s
    INTO OUTFILE '%s'
    FIELDS TERMINATED BY '%s'
    ENCLOSED BY '%s'
    LINES TERMINATED BY '\r\n';
    """ % (in_qry, limit, out_file, opt.mysql_col_delim, opt.mysql_quote)
    p1 = Popen(['echo', q], stdout=PIPE, stderr=PIPE, env=env)
    p2 = Popen(loadConf, stdin=p1.stdout, stdout=PIPE, stderr=PIPE)
    ...
  • Compress the data and upload it to S3 using the boto Python module and multipart upload.

     conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
     bucket = conn.get_bucket(bucket_name)
     k = Key(bucket)
     k.key = s3_key_name
     k.set_contents_from_file(file_handle, cb=progress, num_cb=20,
                              reduced_redundancy=use_rr)
  • Use psycopg2 to run the COPY command that appends the data to the Redshift table (executing it is sketched after this list).

     sql = """
     copy %s from '%s'
     CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
     DELIMITER '%s' FORMAT CSV %s %s %s %s;
     """ % (opt.to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
            opt.delim, quote, gzip, timeformat, ignoreheader)
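
For completeness, a small sketch of running that COPY string with psycopg2 (connection details are placeholders):

    import psycopg2

    conn = psycopg2.connect(host='redshift-host', port=5439, dbname='mydb',
                            user='rs_user', password='rs_pwd')
    cur = conn.cursor()
    cur.execute(sql)   # the COPY statement built in the step above
    conn.commit()
    conn.close()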

Check out this easy way to load MySQL data into Redshift. If all you need is to push snapshots of the source data into Redshift, try this free solution. In addition, you get schema migration, an external query console, and a statistics report (with charts) on the whole load process.


Source: https://habr.com/ru/post/956058/

