Is Cassandra's num_tokens really num_token_partitions?

I am new to Cassandra. I am reading about the num_tokens parameter for virtual nodes in the cassandra.yaml file. I do not think I fully understand what it does or how tokens / partitions are assigned. What is going on here?

The default value of 256 makes no sense if we are really talking about the number of tokens per node. Is num_tokens really num_token_partitions per node?

To make this concrete, take 2 nodes A and B, then add a third node C, and try to explain how everything works. Each node is configured with num_tokens of 256. Now, when A and B come up:

  • How many tokens do A and B create when they join the cluster? Which partition ranges do A and B get, and how is this decided?
  • What metadata does Cassandra store to track which partition ranges are assigned to A and B?
  • What happens when C joins? How does Cassandra decide which portions of the range C gets? How many partitions should be placed on C?
  • How do the partition ranges for A and B change when C joins?

Would anyone be kind enough to explain this in detail?

2 answers

4) Token ranges are determined by giving each node the range from each of its tokens up to the next token in the ring.

2) Metadata is exchanged through gossip, detailing which nodes own which tokens. This metadata lets every node know which nodes are responsible for which ranges. Keyspace / replication options also affect where data is actually stored.

Example: 1) A receives 256 ranges and B receives 256 ranges. But to keep it simple, let's give each of them two tokens and pretend the token range runs from 0 to 30.

Given the tokens A: 10, 15 and B: 3, 11, the nodes are responsible for the following ranges:

(3-9:B)(10:A)(11-14:B)(15-30,0-2:A) 

3) If C also joins with 2 tokens, 20 and 5, the nodes will now be responsible for the following ranges:

 (3-4:B)(5-9:C)(10:A)(11-14:B)(15-19:A)(20-30,0-2:C) 
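The toy ring above can be reproduced with a short Python sketch. This is only an illustration of the simplified convention used in this answer (a node owns each of its tokens up to just before the next token, wrapping around); the function names are made up for the example, and a real cluster uses a much larger token space computed by the partitioner.

```python
from bisect import bisect_right

RING_SIZE = 31  # toy token space 0..30, as in the example


def build_ring(tokens_by_node):
    """Return the ring as a sorted list of (token, node) pairs."""
    return sorted((t, n) for n, ts in tokens_by_node.items() for t in ts)


def owner(ring, key_token):
    """Toy convention: the owner holds the largest token <= key_token.

    Keys below the smallest token wrap around to the last token in the ring.
    """
    tokens = [t for t, _ in ring]
    idx = bisect_right(tokens, key_token) - 1  # -1 selects the last entry (wrap)
    return ring[idx][1]


def ownership(ring):
    """Count how many positions of the toy token space each node owns."""
    counts = {}
    for k in range(RING_SIZE):
        node = owner(ring, k)
        counts[node] = counts.get(node, 0) + 1
    return counts


two_nodes = build_ring({"A": [10, 15], "B": [3, 11]})
three_nodes = build_ring({"A": [10, 15], "B": [3, 11], "C": [20, 5]})

# Positions whose owner changes when C joins, keyed by the previous owner.
moved = {}
for k in range(RING_SIZE):
    before, after = owner(two_nodes, k), owner(three_nodes, k)
    if before != after:
        moved.setdefault(before, []).append(k)

print(ownership(two_nodes))
print(ownership(three_nodes))
print(moved)
```

Note that in this sketch A and B never exchange ranges with each other when C joins: only the positions handed to C change owner.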

Vnodes are powerful because when C joins the cluster, it gets its data from several nodes (5-9 from B and 20-30,0-2 from A), sharing the load between those machines. In this toy example you can see that having only two tokens lets some nodes host most of the data while others receive almost nothing. As the number of vnodes increases, the balance between nodes improves, because the ranges are divided more and more randomly. With 256 vnodes, you are likely to end up with an even amount of data on each node in the cluster.
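To get a feel for how the number of tokens per node affects balance, here is a rough simulation sketch. It assumes purely random token placement on a ring of size 2**64 and measures only how much of the token range each node owns, not actual data or request load; the function names, trial count, and seed are arbitrary choices for illustration.

```python
import random

TOKEN_SPACE = 2**64  # assumed ring size for this illustration


def ownership_fractions(num_nodes, num_tokens, rng):
    """Fraction of the ring owned by each node with random tokens."""
    ring = sorted(
        (rng.randrange(TOKEN_SPACE), node)
        for node in range(num_nodes)
        for _ in range(num_tokens)
    )
    owned = [0] * num_nodes
    for i, (token, node) in enumerate(ring):
        # Same toy convention as above: a token owns up to the next token.
        next_token = ring[(i + 1) % len(ring)][0]
        owned[node] += (next_token - token) % TOKEN_SPACE
    return [o / TOKEN_SPACE for o in owned]


def mean_imbalance(num_nodes, num_tokens, trials=50, seed=1):
    """Average of (largest share / fair share) over several random rings.

    1.0 means perfectly even; larger values mean one node owns more.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        fractions = ownership_fractions(num_nodes, num_tokens, rng)
        total += max(fractions) * num_nodes
    return total / trials


for n in (2, 16, 256):
    print(n, round(mean_imbalance(3, n), 3))
```

In runs of this sketch the imbalance figure should move closer to 1.0 as the token count grows, which is the effect described above. It says nothing about hot partitions or operational costs.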

For more information on vnodes: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2


"With 256 vnodes, you are likely to end up with an even amount of data on each node in the cluster."

Except, of course, that you are not. Random vnode token allocation has nothing to do with load balancing. A balanced load comes from ENGINEERED token ranges, which should be calculated, not guessed.

Then there are the token range allocation bugs, CASSANDRA-6388 and CASSANDRA-7032, neither of which is fixed in any cluster running in production today. Then there are the major problems with 256-vnode clusters when you try to rebuild them or restore a backup, which is literally impossible.

Rebuilds and restores take WEEKS. And just try running Hadoop against vnodes in production. Throw away an engineered token range cluster for the vnode Hail Mary at your own risk.


Source: https://habr.com/ru/post/958129/

