Sync MySQL data with Amazon Redshift

We are doing some aggregation on huge datasets in Amazon Redshift, and we have a relatively small amount of data in MySQL. For some of the joins in Redshift we need the MySQL data. What is the best way to sync MySQL data into Redshift? Is there anything in Redshift like a remote view over a database link in Oracle? Or should I programmatically query MySQL and insert / update the rows in Redshift?

4 answers

When MySQL data is needed for joins in Redshift, we usually just ship it from one to the other.

That means:

  • Redshift: create a matching table schema (taking Redshift / PostgreSQL specifics into account)
  • MySQL: dump the table's data (in CSV format)
  • Zip the export and send it to S3
  • Redshift: truncate the table and import the data using COPY

Steps 2-4 can be scripted, so you can push fresh data to Redshift on demand or on a regular schedule.
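
For example, a minimal Python sketch of steps 2-4 using boto and psycopg2 (all host names, credentials, bucket and table names below are placeholders, and it assumes the mysql client plus the boto and psycopg2 modules are installed):

    import gzip
    import subprocess

    import boto
    from boto.s3.key import Key
    import psycopg2

    # 2. Dump the MySQL table to a tab-delimited file (SELECT ... INTO OUTFILE is an
    #    alternative if you have FILE privileges on the MySQL server).
    with open('mytable.tsv', 'wb') as out:
        subprocess.check_call(
            ['mysql', '-h', 'mysql-host', '-u', 'user', '-psecret', '-D', 'mydb',
             '--batch', '-N', '-e', 'SELECT * FROM mytable'],
            stdout=out)

    # 3. Compress the dump and upload it to S3.
    with open('mytable.tsv', 'rb') as src, gzip.open('mytable.tsv.gz', 'wb') as dst:
        dst.writelines(src)
    bucket = boto.connect_s3('AWS_KEY', 'AWS_SECRET').get_bucket('my-bucket')
    Key(bucket, 'mytable.tsv.gz').set_contents_from_filename('mytable.tsv.gz')

    # 4. Truncate the Redshift table and reload it with COPY.
    conn = psycopg2.connect(host='redshift-host', port=5439, dbname='mydb',
                            user='rs_user', password='rs_pwd')
    cur = conn.cursor()
    cur.execute('TRUNCATE mytable')
    cur.execute("COPY mytable FROM 's3://my-bucket/mytable.tsv.gz' "
                "CREDENTIALS 'aws_access_key_id=AWS_KEY;aws_secret_access_key=AWS_SECRET' "
                "DELIMITER '\\t' GZIP")
    conn.commit()
    conn.close()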


Redshift now supports loading data from remote hosts over SSH. This approach involves:

  • Adding the cluster's public key to the authorized_keys file on the remote host(s)
  • Allowing SSH access to the remote host(s) from the IP addresses of the cluster nodes
  • Uploading a JSON manifest to S3 that lists the remote hosts, public key(s), and the commands to execute on them
  • Running the COPY command with that manifest file and AWS credentials

The command specified in the manifest can be any command that prints text in a format the Redshift COPY command can ingest.
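
A rough sketch of what that can look like with Python (boto / psycopg2), assuming the SSH access from the list above is already configured; the endpoint, command, key, credentials and names are placeholders:

    import json

    import boto
    from boto.s3.key import Key
    import psycopg2

    # Manifest describing the remote host and the command Redshift should run over SSH.
    # Field names follow the layout documented for COPY from remote hosts; values are placeholders.
    manifest = {
        "entries": [{
            "endpoint": "mysql-host.example.com",
            "command": "mysql -u user -psecret -D mydb --batch -N -e 'SELECT * FROM mytable'",
            "mandatory": True,
            "publickey": "<the remote host's public key (optional)>",
            "username": "ec2-user"
        }]
    }

    # Upload the manifest to S3.
    bucket = boto.connect_s3('AWS_KEY', 'AWS_SECRET').get_bucket('my-bucket')
    Key(bucket, 'mytable.manifest').set_contents_from_string(json.dumps(manifest))

    # Point COPY at the manifest; the SSH keyword tells Redshift to pull the data over SSH.
    conn = psycopg2.connect(host='redshift-host', port=5439, dbname='mydb',
                            user='rs_user', password='rs_pwd')
    cur = conn.cursor()
    cur.execute("COPY mytable FROM 's3://my-bucket/mytable.manifest' "
                "CREDENTIALS 'aws_access_key_id=AWS_KEY;aws_secret_access_key=AWS_SECRET' "
                "DELIMITER '\\t' SSH")
    conn.commit()
    conn.close()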


What do you mean by a remote view in Oracle?

In any case, if you can extract the table's data into a CSV file, you have a scripting option: you can use a Python / boto / psycopg2 combination to script the load of your CSV into Amazon Redshift.

In my MySQL_To_Redshift_Loader, I do the following:

  • Extract data from MySQL into a temp file.

    loadConf = [db_client_dbshell, '-u', opt.mysql_user, '-p%s' % opt.mysql_pwd,
                '-D', opt.mysql_db_name, '-h', opt.mysql_db_server]
    ...
    q = """
    %s %s
    INTO OUTFILE '%s'
    FIELDS TERMINATED BY '%s'
    ENCLOSED BY '%s'
    LINES TERMINATED BY '\r\n';
    """ % (in_qry, limit, out_file, opt.mysql_col_delim, opt.mysql_quote)
    p1 = Popen(['echo', q], stdout=PIPE, stderr=PIPE, env=env)
    p2 = Popen(loadConf, stdin=p1.stdout, stdout=PIPE, stderr=PIPE)
    ...
  • Compress the data and upload it to S3 using the boto Python module and multipart upload.

     conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
     bucket = conn.get_bucket(bucket_name)
     k = Key(bucket)
     k.key = s3_key_name
     k.set_contents_from_file(file_handle, cb=progress, num_cb=20,
                              reduced_redundancy=use_rr)
  • Use psycopg2 to run the COPY command that appends the data to the Redshift table (executing it is sketched after this list).

     sql = """
     copy %s from '%s'
     CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s'
     DELIMITER '%s' FORMAT CSV %s %s %s %s;
     """ % (opt.to_table, fn, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY,
            opt.delim, quote, gzip, timeformat, ignoreheader)
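
For completeness, a small sketch of running that COPY string with psycopg2 (connection details are placeholders):

    import psycopg2

    conn = psycopg2.connect(host='redshift-host', port=5439, dbname='mydb',
                            user='rs_user', password='rs_pwd')
    cur = conn.cursor()
    cur.execute(sql)   # the COPY statement built in the step above
    conn.commit()
    conn.close()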

Check out this easy way to load MySQL data into Redshift. If all you need is to push snapshots of the source data into Redshift, try this free solution. In addition, you get schema migration, an external query console, and a statistics report (with charts) on the whole load process.


Source: https://habr.com/ru/post/956058/

