Cassandra's scaling is better described by Gustafson's Law than by Amdahl's Law. Gustafson scaling asks how much more data you can process as the number of nodes increases. That is, if you have N times more nodes, you can process an N-times-larger dataset in the same amount of time.
This is possible because Cassandra uses very little cluster-wide coordination, with the exception of schema and ring changes. Most operations involve only a number of nodes equal to the replication factor, which stays constant as the dataset grows, so scaling is nearly linear.
In contrast, Amdahl scaling asks how much faster you can process a fixed dataset as the number of nodes increases. That is, if you have N times more nodes, can you process the same dataset N times faster?
Clearly, at some point you reach a limit where adding more nodes no longer makes requests faster, because there is a minimum time needed to service any single request. Cassandra is not linear here.
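The difference between the two laws can be sketched numerically. This is a minimal illustration, not a model of Cassandra itself; the parallel fraction `p = 0.95` is an assumed value chosen for demonstration.

```python
def gustafson_speedup(n_nodes, p=0.95):
    """Scaled speedup: an N-times-larger dataset in the same wall-clock time."""
    return (1 - p) + p * n_nodes

def amdahl_speedup(n_nodes, p=0.95):
    """Fixed-size speedup: the same dataset processed faster."""
    return 1 / ((1 - p) + p / n_nodes)

for n in (10, 100, 1000):
    print(n, round(gustafson_speedup(n), 1), round(amdahl_speedup(n), 1))
```

Note that Amdahl speedup is capped at `1 / (1 - p)` (20x here) no matter how many nodes you add, while Gustafson speedup keeps growing with the cluster, which matches the "process more data, not the same data faster" view of Cassandra's scaling.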
In your case, it sounds like you are asking whether it is better to have 1000 slow nodes or 200 fast nodes. How big is your dataset? It depends on your workload, but the usual recommendation is about 1 TB of data per node, with enough RAM and CPU to match (see cassandra node restrictions). 1000 nodes sounds like too many unless you have petabytes of data.
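That guideline turns cluster sizing into simple arithmetic. A rough sketch, assuming the ~1 TB-per-node figure above and a hypothetical replication factor of 3 (a common default, not stated in the answer):

```python
import math

def nodes_needed(dataset_tb, replication_factor=3, tb_per_node=1.0):
    """Estimate node count: total stored data (raw size x RF) / capacity per node."""
    return math.ceil(dataset_tb * replication_factor / tb_per_node)

print(nodes_needed(200))  # 200 TB of raw data at RF=3 -> 600 nodes
```

By this estimate you would only approach 1000 nodes with several hundred terabytes of raw data, i.e. around a petabyte of stored replicas.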