Achieving Zero Downtime Migrating Cassandra to DataStax

I have a Cassandra cluster (3 nodes, all deployed in AWS) that I am trying to migrate to a DataStax cluster. It is simply time to stop managing these nodes myself.

I have several producers and consumers that read and write data in my Cassandra cluster all day long. I have no way to place an application/service/proxy in front of my Cassandra cluster and then just flip a switch so that all reads/writes go to DataStax instead of my Cassandra. Thus, there is no clean way to migrate tables one at a time. I am also trying to achieve zero (or near-zero) downtime for all data producers/consumers. One hard requirement: the migration cannot lose data. No data loss!

I think the best strategy is a four-step process:

  • Somehow set up DataStax as a replica of my Cassandra cluster, effectively getting live replication into DataStax
  • Once DataStax is completely caught up with my existing Cassandra nodes, keep writing to my current Cassandra cluster, but cut the consumers/readers over to DataStax (that is, reconfigure them to connect to DataStax and restart them). Not zero downtime, but I can probably live with a simple restart. (Again, zero-downtime solutions are strongly preferred.)
  • Cut the producers over to DataStax. Again, only near-zero downtime, since this involves reconfiguring the producers to point at DataStax and then restarting them to pick up the new configuration. A zero-downtime solution would be preferable.
  • As soon as replication traffic from the "old" Cassandra cluster drops to zero, there is no "new" data left that my non-DataStax nodes still need to write to DataStax. Kill those nodes with fire.

This is the least invasive, closest-to-zero-downtime solution I can come up with, but it leaves a few open questions:

  • Perhaps DataStax cannot be treated as just another set of nodes to replicate to (yes/no?)
  • Perhaps Cassandra and/or DataStax have some magic functions/features I don't know about that handle migrations better than this; or maybe there are third-party (ideally open-source) tools that could handle it better
  • I have no idea how to track the replication "traffic" arriving in DataStax from the "old" Cassandra nodes. I will need to know how to do this before I can safely shut down and kill the old nodes (again, I can't lose data).

So I suppose I'm asking whether this strategy is (1) feasible and (2) optimal, and whether the Cassandra/DataStax ecosystem has any features/tools I could use to do it better (faster and with zero downtime).

2 answers

The four steps you have outlined are definitely a viable option. There is also a straightforward in-place binary install route: https://docs.datastax.com/en/latest-upgrade/upgrade/datastax_enterprise/upgrdCstarToDSE.html

I will answer in the context of the steps described above. If you are interested in what the binary-install path looks like, we can talk about that too.

Note: the doc links below are for Cassandra 3.0 (DataStax 5.0) - make sure the document versions match your version of Cassandra.

If the current major Cassandra version == the major Cassandra version in DataStax, you can add the DataStax nodes as a new DC in the same cluster as your current Cassandra environment, as described here: http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsAddDCToCluster.html - this allows data to replicate from the existing Cassandra DC to the DataStax DC.
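A minimal sketch of the per-node configuration behind that add-a-DC step. All the values here (cluster name, seed IPs, DC/rack names) are made-up placeholders - substitute your own:

```shell
# On each new DataStax node, BEFORE its first start:

# cassandra.yaml -- join the existing cluster and use a gossip-based snitch
# (cluster_name and seeds are placeholders for your real values):
#   cluster_name: 'my_cluster'
#   endpoint_snitch: GossipingPropertyFileSnitch
#   seeds: "10.0.0.10,10.0.1.10"   # seed(s) from the old DC plus one new node

# cassandra-rackdc.properties -- advertise the new, separate data center:
#   dc=DataStaxDC
#   rack=rack1
```

The key point is that the snitch and rackdc properties make the new nodes announce themselves as a distinct DC, so replication between the DCs stays under your control.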

If the Cassandra versions are not compatible (your current Cassandra is older/newer than the Cassandra in DataStax), then reach out to DataStax via https://academy.datastax.com/slack , as the process will be more specific to your environment and can vary greatly.

As stated in the docs, you want to run

ALTER KEYSPACE "your-keyspace" WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'OldCassandraDC':3, 'DataStaxDC':3}; 

(obviously changing the DC names and replication factors to match your setup)
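To confirm the change took effect, you can read the schema back from any node. The keyspace name here is a placeholder; `system_schema` is the schema keyspace in Cassandra 3.0+:

```shell
# Ask any node for the keyspace's current replication settings:
cqlsh -e "SELECT keyspace_name, replication
          FROM system_schema.keyspaces
          WHERE keyspace_name = 'your_keyspace';"
# The replication map should now list both DCs with their replication factors.
```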

This will ensure that new data from your producers is replicated to the new DataStax nodes.

Then you can run nodetool rebuild -- name_of_existing_data_center on each of the new DataStax nodes to stream existing data from the old Cassandra nodes. Depending on how much data there is, this may take a while, but it is the simplest, most reliable way to do it.
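One way to kick off that rebuild across all the new nodes is to generate the commands from a node list. This is a dry-run sketch (the host IPs and DC name are placeholders) that only prints the commands it would run:

```shell
#!/bin/sh
# Print (dry run) the rebuild command to run on each new DataStax node.
# Swap the printf for real execution once the output looks right.
generate_rebuild_cmds() {
  old_dc="$1"; shift
  for host in "$@"; do
    # screen -dmS starts a detached session so the stream survives SSH drops
    printf 'ssh %s screen -dmS rebuild nodetool rebuild -- %s\n' "$host" "$old_dc"
  done
}

# Placeholder DC name and node IPs:
generate_rebuild_cmds OldCassandraDC 10.0.1.10 10.0.1.11 10.0.1.12
```

Running the rebuilds detached also lines up with the screen tip below: the streaming keeps going even if your SSH session drops.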

Then you can update the contact points in your producers/consumers one by one before decommissioning the old Cassandra DC.

Some tips in my experience:

  • Make sure your DataStax nodes use GossipingPropertyFileSnitch (endpoint_snitch in cassandra.yaml) before starting those nodes.
  • When running nodetool rebuild, do it inside a screen session so you can see when it completes (or errors out). Otherwise you will need to track progress with nodetool netstats and check for streaming activity.
  • Set up OpsCenter to watch what happens in the DataStax cluster during rebuilds. You can monitor stream throughput, pending compactions, and other Cassandra metrics.
  • When the time comes to decommission the old DC, make sure that you follow these steps: http://docs.datastax.com/en/cassandra/3.0/cassandra/operations/opsDecomissionDC.html
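The netstats tip above can be scripted: nodetool netstats prints "Not sending any streams." once a node is idle, so a small filter can gate the decommission. The sample output below is illustrative, not captured from a real cluster:

```shell
#!/bin/sh
# Return success (0) if the netstats output on stdin shows active streaming.
streams_active() {
  ! grep -q 'Not sending any streams'
}

# Illustrative sample of what an idle node reports:
sample_idle='Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0'

if printf '%s\n' "$sample_idle" | streams_active; then
  echo "still streaming - do NOT decommission yet"
else
  echo "no streams - safe to run further decommission checks"
fi
```

In practice you would pipe the real output through the filter, e.g. `nodetool netstats | streams_active && echo "still busy"`, and check every node before touching the old DC.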

Hope this helps!


I assume you mean the DataStax Managed product, where they manage Cassandra for you. If you just mean "run DSE on your own AWS instances", you can perform an in-place binary upgrade.

The questions you asked are best answered by DataStax themselves - if you are going to pay them, you might as well ask them questions (that's what customers do).

Your 4-step approach is mostly quite logical, but probably too complicated. Most Cassandra drivers will automatically detect new hosts and automatically drop old/retiring hosts, so once you have all the new DataStax-managed nodes in the cluster (assuming they allow this), you can run a repair to ensure consistency and then decommission your existing nodes - your application will keep working (isn't Cassandra great?). You will need to update your application's configuration to list the new DataStax-managed nodes as contact points, but this does not need to be done in advance.
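A minimal sketch of that repair-then-decommission sequence. Both are standard nodetool subcommands, but they act on a live cluster, so nothing here can be executed outside one:

```shell
# On each NEW node, once it has joined the cluster:
# run a full repair so it holds a consistent copy of its ranges.
nodetool repair -full

# Then, on each OLD node, one at a time:
# stream its ranges to the remaining replicas and leave the ring.
nodetool decommission

# Watch streaming progress from any node while this runs:
nodetool netstats
```

Decommissioning one old node at a time keeps the cluster above quorum throughout, which is what lets the application keep working during the swap.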

The only caveat is latency - moving from your environment to DataStax Managed may introduce additional latency. In that case there is an intermediate step you can consider, where you add the DataStax-managed nodes as another "data center" within Cassandra, expand the replication factor, and use the LOCAL_* consistency levels to control which DC serves requests (and then you CAN move your producers/consumers individually).
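For that intermediate step, the DC pinning looks roughly like this from cqlsh. The keyspace/table names are placeholders, and LOCAL_QUORUM is just one of the LOCAL_* levels; this is a sketch, not a tested session:

```shell
# CONSISTENCY is a cqlsh command; with a DC-aware driver policy, LOCAL_*
# levels only wait for replicas in the coordinator's own data center.
cqlsh 10.0.0.10 -e "
  CONSISTENCY LOCAL_QUORUM;
  SELECT * FROM my_keyspace.my_table LIMIT 1;
"
```

Application clients get the same effect by pinning their driver's DC-aware load-balancing policy to the DC they should talk to, so each producer/consumer can be moved by changing only its own configuration.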


Source: https://habr.com/ru/post/1263468/

