We are currently rewriting our peer-to-peer service bus directory (Zebus).
We had a Cassandra/Thrift implementation that needed to be improved to meet some new load requirements, so rewriting it using CQL seemed like the right approach.
We have two CFs, one for storing peers and one for storing subscriptions, the latter being the more complex.
We need to keep a list of routing keys (bindings) for each message type and a list of message types for each peer. We also need to be able to update the list of routing keys for each message type separately (we use Cassandra timestamps to handle potential race conditions, since we have several directories). And finally, we should be able to list all these subscriptions when someone requests the state of the peers.
The last point was the problem, because it means running SELECT * FROM "Subscriptions", which means enumerating rows across multiple nodes (by the way, how does CQL let you enumerate Cassandra rows?) and can be quite slow.
So we ended up with the following schema for our CF, which stores everything sequentially on disk in the same Cassandra row and gives excellent read performance (we know that this is pretty bad for balancing data between nodes).
CREATE TABLE IF NOT EXISTS "DynamicSubscriptions" (
    "UselessKey" boolean,
    "PeerId" text,
    "MessageTypeId" text,
    "SubscriptionBindings" blob,
    PRIMARY KEY("UselessKey", "PeerId", "MessageTypeId")
);
This is pretty ugly, but it does the trick: everything ends up in the same Thrift row, which makes reads fast.
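To make the access pattern concrete, here is a rough sketch of the statements this schema is meant to serve. The peer id, message type id, blob value, and timestamps are hypothetical placeholders, and the sketch assumes every row is written with "UselessKey" = true so that all rows share one partition.

```cql
-- Update the bindings of one message type for one peer.
-- The client-supplied timestamp (microseconds, placeholder value) lets
-- Cassandra keep the most recent write when several directories race.
INSERT INTO "DynamicSubscriptions"
    ("UselessKey", "PeerId", "MessageTypeId", "SubscriptionBindings")
VALUES (true, 'Hypothetical.Peer.1', 'Hypothetical.MessageType', 0x0102)
USING TIMESTAMP 1400000000000000;

-- Remove the subscriptions of one peer for one message type,
-- again with an explicit (placeholder) timestamp.
DELETE FROM "DynamicSubscriptions"
USING TIMESTAMP 1400000000000001
WHERE "UselessKey" = true
  AND "PeerId" = 'Hypothetical.Peer.1'
  AND "MessageTypeId" = 'Hypothetical.MessageType';

-- List every subscription: a single-partition read, because all rows
-- share the same "UselessKey" value.
SELECT "PeerId", "MessageTypeId", "SubscriptionBindings"
FROM "DynamicSubscriptions"
WHERE "UselessKey" = true;
```

The final SELECT is the query we care about most: it touches only one partition, at the cost of all the data living on the same replicas.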
So my question is this: is there a clean way to design a CF using CQL if I want my data to be returned very quickly by an unconditional SELECT?
(Or, if you think our design is completely flawed, feel free to say so.)