I am trying to build a replication server from MySQL to Redshift, and for this I am parsing the MySQL binlog. For the initial replication I take a dump of the MySQL table, convert it to a CSV file, upload it to S3, and then run the Redshift COPY command. Performance for this step is fine.
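For context, the initial load boils down to roughly the sketch below (the cluster endpoint, bucket, table, and IAM role names are placeholders, not my real setup):

```python
import psycopg2

# Placeholder connection details -- substitute your own cluster and credentials.
conn = psycopg2.connect(
    host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="replication_user",
    password="...",
)

# COPY loads the whole CSV from S3 in one bulk operation, which is why this
# initial load performs well compared to row-by-row statements.
copy_sql = """
    COPY my_schema.my_table
    FROM 's3://my-bucket/initial-dump/my_table.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV;
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
```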
After the initial replication, for continuous synchronization I read the binlog and apply the inserts and updates to Redshift sequentially, which is very slow.
Is there anything that can be done to improve performance?
One possible solution I can think of is to wrap the statements in a transaction and send the whole transaction at once to avoid multiple network round trips. But that still doesn't solve the underlying problem that single UPDATE and INSERT statements in Redshift are very slow; one UPDATE statement takes about 6 seconds. Given Redshift's constraints (it is a columnar database, so inserting one row at a time is slow), what can be done to work around them?
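To make the transaction idea concrete, this is roughly what I mean (the statements below are placeholders for SQL generated from the parsed binlog events); it cuts the round trips down to one, but Redshift still executes each single-row statement individually:

```python
import psycopg2

# Placeholder statements -- in reality these are generated from binlog events.
statements = [
    "INSERT INTO my_schema.orders (id, status) VALUES (101, 'new')",
    "UPDATE my_schema.orders SET status = 'shipped' WHERE id = 42",
]

conn = psycopg2.connect(host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="replication_user", password="...")
conn.autocommit = True  # the transaction is managed explicitly in the batch below

# Send everything in one network call inside a single transaction.
batch_sql = "BEGIN;\n" + ";\n".join(statements) + ";\nCOMMIT;"

with conn.cursor() as cur:
    # Only one round trip and one commit, but Redshift still processes each
    # single-row INSERT/UPDATE one at a time -- which is where the ~6 second
    # updates come from.
    cur.execute(batch_sql)
```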
Edit 1, regarding DMS: I want to use Redshift as a warehousing solution that continuously replicates our MySQL database, and I do not want to denormalize the data, since I have 170+ tables in MySQL. During ongoing replication, DMS shows many errors several times a day and fails completely within a day or two, and its error logs are very difficult to decipher. In addition, when I drop and reload tables, DMS drops the existing tables in Redshift, creates new ones, and only then starts inserting data, which causes downtime in my case. What I wanted instead was to create a new table, switch the old one over to the new one, and then drop the old table.
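What I had in mind for that swap is roughly the following (table names are placeholders, and I am assuming the freshly reloaded copy is loaded into orders_new first); to my knowledge Redshift allows ALTER TABLE ... RENAME TO inside a transaction block, so the cutover is a quick rename rather than a drop-and-reload with downtime:

```python
import psycopg2

conn = psycopg2.connect(host="my-cluster.abc123xyz.us-east-1.redshift.amazonaws.com",
                        port=5439, dbname="analytics",
                        user="replication_user", password="...")
conn.autocommit = True  # transactions are managed explicitly in the script below

# Hypothetical cutover: readers keep hitting the old table until the renames
# commit, so the table never disappears, unlike the drop-and-reload behaviour.
swap_sql = """
    BEGIN;
    ALTER TABLE my_schema.orders RENAME TO orders_old;
    ALTER TABLE my_schema.orders_new RENAME TO orders;
    COMMIT;
    DROP TABLE my_schema.orders_old;
"""

with conn.cursor() as cur:
    cur.execute(swap_sql)
```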