1) Token ranges are determined by giving each node the range from each of its tokens up to the next token on the ring.
2) Data is exchanged through gossip, detailing which nodes own which tokens. This metadata lets every node know which nodes are responsible for which ranges. Keyspace / replication settings also change where data is actually stored.
Example: 1) A gets 256 ranges and B gets 256 ranges. But to keep it simple, give each of them two tokens and pretend the token range runs from 0 to 30.
Given the tokens A: 10, 15 and B: 3, 11, the nodes are responsible for the following ranges:
(3-9:B)(10:A)(11-14:B)(15-30,0-2:A)
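The mapping above can be reproduced with a few lines of Python. This is only a sketch of the toy example, not how Cassandra itself is implemented: the names `RING_SIZE`, `build_ring`, `ranges` and `owner_of` are made up for illustration, and it follows the convention used in this example (a token owns the values from itself up to just before the next token, wrapping around at 30).

```python
from bisect import bisect_right

RING_SIZE = 31  # toy token space 0..30 (assumption taken from the example)

def build_ring(tokens_by_node):
    """Return a list of (token, node) pairs sorted by token."""
    return sorted((t, n) for n, ts in tokens_by_node.items() for t in ts)

def ranges(ring):
    """List (start, end, node) with inclusive bounds; end < start means wrap-around."""
    out = []
    for i, (tok, node) in enumerate(ring):
        next_tok = ring[(i + 1) % len(ring)][0]
        out.append((tok, (next_tok - 1) % RING_SIZE, node))
    return out

def owner_of(ring, value):
    """Node holding the largest token <= value; values below the first token wrap."""
    tokens = [t for t, _ in ring]
    idx = bisect_right(tokens, value) - 1  # -1 selects the last token (wrap-around)
    return ring[idx][1]

ring = build_ring({"A": [10, 15], "B": [3, 11]})
for start, end, node in ranges(ring):
    print(f"{start}-{end}: {node}")
```

The last range printed is `15-2: A`, which is the wrap-around range written as (15-30,0-2:A) above.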
2) If C also joins with 2 tokens, 20 and 5, the nodes will now be responsible for the following ranges:
(3-4:B)(5-9:C)(10:A)(11-14:B)(15-19:A)(20-30,0-2:C)
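To see where C's data comes from, you can compare the owner of every value before and after the join. Again this is just a sketch of the toy example; the `owners` helper and the 0 to 30 token space are illustrative assumptions, not Cassandra code.

```python
def owners(tokens_by_node, ring_size=31):
    """Map every value in the toy token space to the node that owns it."""
    ring = sorted((t, n) for n, ts in tokens_by_node.items() for t in ts)
    result = {}
    for value in range(ring_size):
        # Largest token <= value; if there is none, wrap to the highest token.
        candidates = [(t, n) for t, n in ring if t <= value]
        result[value] = (candidates[-1] if candidates else ring[-1])[1]
    return result

before = owners({"A": [10, 15], "B": [3, 11]})
after = owners({"A": [10, 15], "B": [3, 11], "C": [20, 5]})

# Group the values that changed owner by the node that used to hold them.
moved = {}
for value in range(31):
    if before[value] != after[value]:
        moved.setdefault(before[value], []).append(value)

for source, values in sorted(moved.items()):
    print(f"C takes {len(values)} values from {source}: {values}")
```

Both A and B give up part of their ranges, so the streaming work for the new node is split between them.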
Vnodes are powerful because now, when C joins the cluster, it gets its data from several nodes (5-9 from B and 20-30,0-2 from A), sharing the load between those machines. In this toy example you can see that having only two tokens allows some nodes to host most of the data while others get almost nothing. As the number of vnodes increases, the balance between nodes improves, because the ranges are randomly subdivided more and more. With 256 vnodes, you are very likely to end up with an even amount of data on each node in the cluster.
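The balancing effect can be checked with a small simulation. This is a rough sketch under stated assumptions: tokens are drawn uniformly at random, the ring size of 2**32 is arbitrary, and `shares` and `spread` are hypothetical helper names. It only looks at how much of the token space each node owns, not at real data or replication.

```python
import random

def shares(num_nodes, tokens_per_node, rng, ring_size=2**32):
    """Fraction of the token space owned by each node when tokens are random."""
    total = num_nodes * tokens_per_node
    tokens = rng.sample(range(ring_size), total)  # distinct random tokens
    # Tokens are random, so handing them out round-robin is a random assignment.
    ring = sorted((tok, i % num_nodes) for i, tok in enumerate(tokens))
    owned = [0] * num_nodes
    for i, (tok, node) in enumerate(ring):
        next_tok = ring[(i + 1) % total][0]
        owned[node] += (next_tok - tok) % ring_size  # length of this token's range
    return [o / ring_size for o in owned]

def spread(values):
    """Gap between the most loaded and the least loaded node."""
    return max(values) - min(values)

rng = random.Random(42)  # fixed seed only so that runs are repeatable
for vnodes in (2, 16, 256):
    s = shares(3, vnodes, rng)
    print(vnodes, [round(x, 3) for x in s], "spread:", round(spread(s), 3))
```

Averaged over several runs, the spread between nodes shrinks as the number of tokens per node grows, which is the effect described above.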
For more information on vnodes: http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
RussS