Cassandra - client-side load balancing

Consider the following Cassandra setup:

  • 6-node ring: A, B, D, E, F, G
  • replication factor: 3
  • partitioner: RandomPartitioner
  • placement strategy: SimpleStrategy

My test column is stored on node B and replicated to nodes D and E.

Now I have several Java processes that read my test column through the Hector (Thrift) API at consistency level ONE.

There are two possibilities:

  1) Hector forwards all calls to node B, since B is the master for the data.
  2) Hector load balances the reads across nodes B, D and E (the master and its replicas). In this case, my test column will be loaded into the cache on each of those Cassandra instances.

Which is it, 1) or 2)?

Thanks and regards, Maciej

2 answers

I believe the answer is: 3) Cassandra routes all requests to the closest node that is alive, where "closeness" is determined by the current snitch (set in cassandra.yaml).

  • SimpleSnitch selects the closest node on the token ring.
  • AbstractNetworkTopologySnitch and the snitches derived from it first try to select nodes in the same rack, and then nodes in the same data center.

If DynamicSnitch is turned on, it dynamically adjusts the node proximity returned by the underlying snitch according to the nodes' recent performance.
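
To make the ordering described above concrete, here is a toy model in Java. It is not Cassandra code: `ProximitySketch`, `Replica`, `readOrder` and the numeric "distance plus latency penalty" score are all names and assumptions invented for illustration. The real dynamic snitch uses its own scoring and thresholds. The sketch only shows the idea of dropping dead replicas, ranking the rest by static proximity, and letting recent performance shift that ranking.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names are Cassandra classes.
class ProximitySketch {
    static final class Replica {
        final String name;
        final int staticDistance; // assumed scale: 0 = same rack, 1 = same data center, 2 = remote
        final boolean alive;

        Replica(String name, int staticDistance, boolean alive) {
            this.name = name;
            this.staticDistance = staticDistance;
            this.alive = alive;
        }
    }

    // Returns the names of live replicas, closest first.
    // latencyPenalty is a hypothetical per-node adjustment standing in for
    // the dynamic snitch's view of recent performance (higher = slower).
    static List<String> readOrder(List<Replica> replicas, Map<String, Double> latencyPenalty) {
        List<Replica> live = new ArrayList<>();
        for (Replica r : replicas) {
            if (r.alive) {
                live.add(r);
            }
        }
        live.sort(Comparator.comparingDouble(
                (Replica r) -> r.staticDistance + latencyPenalty.getOrDefault(r.name, 0.0)));
        List<String> names = new ArrayList<>();
        for (Replica r : live) {
            names.add(r.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Replica> replicas = List.of(
                new Replica("B", 0, true),
                new Replica("D", 1, true),
                new Replica("E", 1, false)); // E is down and is skipped
        System.out.println(readOrder(replicas, Map.of()));         // [B, D]
        System.out.println(readOrder(replicas, Map.of("B", 5.0))); // [D, B]
    }
}
```

With no penalties the statically closest live replica comes first. Once B is marked as slow, D moves ahead of it, which is the kind of adjustment the dynamic snitch is described as making.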

For more information, see the Read Path section of the Cassandra ArchitectureInternals page.


(Upvoted Theodore's answer because it is correct.) Additional information:

We do nothing on the client side to route traffic to a particular node based on the key (for now). This was referred to as "client-mediated selection" in section 6.2 of Amazon's Dynamo paper. The research seems to show that it is only really useful for very large clusters, where it cuts out a network hop.

The downside would be duplicating the hash calculation and the partition lookup on the client.
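
As a rough sketch of what that duplicated work would look like, the Java below hashes a key the way RandomPartitioner does (MD5, interpreted as a non-negative BigInteger) and then walks the ring clockwise as SimpleStrategy does to find the replicas. Everything else is an assumption for illustration: `TokenAwareSketch`, `evenRing` and `replicasFor` are hypothetical names, not part of Hector or Cassandra, and the evenly spaced tokens stand in for whatever tokens the real nodes were assigned.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a client-side copy of the hashing and ring lookup.
class TokenAwareSketch {
    // RandomPartitioner-style token: MD5 of the key as a non-negative BigInteger.
    static BigInteger token(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(key.getBytes(StandardCharsets.UTF_8))).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Assumption: nodes own evenly spaced tokens in [0, 2^127).
    static TreeMap<BigInteger, String> evenRing(String... nodes) {
        TreeMap<BigInteger, String> ring = new TreeMap<>();
        BigInteger range = BigInteger.ONE.shiftLeft(127);
        for (int i = 0; i < nodes.length; i++) {
            BigInteger t = range.multiply(BigInteger.valueOf(i))
                    .divide(BigInteger.valueOf(nodes.length));
            ring.put(t, nodes[i]);
        }
        return ring;
    }

    // SimpleStrategy-style placement: the first node whose token is >= the
    // key's token (wrapping around), then the next rf - 1 nodes clockwise.
    static List<String> replicasFor(BigInteger token, TreeMap<BigInteger, String> ring, int rf) {
        List<String> result = new ArrayList<>();
        int wanted = Math.min(rf, ring.size());
        Map.Entry<BigInteger, String> e = ring.ceilingEntry(token);
        if (e == null) {
            e = ring.firstEntry();
        }
        while (result.size() < wanted) {
            result.add(e.getValue());
            e = ring.higherEntry(e.getKey());
            if (e == null) {
                e = ring.firstEntry();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<BigInteger, String> ring = evenRing("A", "B", "D", "E", "F", "G");
        // "test-column-key" is a made-up row key.
        System.out.println(replicasFor(token("test-column-key"), ring, 3));
    }
}
```

A token-aware client would send the read directly to one of the returned nodes instead of an arbitrary coordinator, which is the network hop mentioned above. The cost is that the client must keep its copy of the ring in sync with the cluster.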


Source: https://habr.com/ru/post/901614/

