Regarding 1) Cassandra is designed for write-heavy workloads with large amounts of data spread over multiple nodes. Retrieving ALL data from this kind of setup is risky, because it can involve huge amounts of data that a single client must then process. A better approach is to use pagination. This is natively supported as of version 2.0.
Regarding 2) Partition keys only support EQ or IN queries. For LT or GT (< / >) you use the clustering columns (column keys). Therefore, if it makes sense to group records by some type identifier, you can use that identifier as your partition key and a timeuuid as a clustering column. This allows you to query, for example, all records newer than X:
create table test (type int, SCHEMA_ID int, RECORD_NAME text, SCHEMA_VALUE text, TIMESTAMP timeuuid, primary key (type, timestamp));
select * from test where type IN (0,1,2,3) and timestamp > 58e0a7d7-eebc-11d8-9669-0800200c9a66;
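To query "newer than X" you need a timeuuid that corresponds to X. CQL provides minTimeuuid() and maxTimeuuid() for this on the server side. If you would rather build the boundary value on the client, a minimal sketch could look like the following. The helper name timeuuid_lower_bound is hypothetical, and zeroing the clock sequence and node is an assumption that may not match the exact value Cassandra's own function produces:

```python
from datetime import datetime, timezone
import uuid

# Version 1 UUIDs count 100 ns ticks from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def timeuuid_lower_bound(moment):
    """Build a version 1 UUID for a timezone-aware datetime (hypothetical helper).

    Clock sequence and node are set to zero here, which is an assumption;
    only the timestamp part carries information.
    """
    delta = moment - GREGORIAN_EPOCH
    # Integer arithmetic only, to avoid floating point rounding.
    ticks = (delta.days * 86400 + delta.seconds) * 10**7 + delta.microseconds * 10
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000  # version 1
    # 0x80 sets the RFC 4122 variant bits with an all-zero clock sequence.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0))

boundary = timeuuid_lower_bound(
    datetime(2013, 10, 30, 8, 55, 55, 800000, tzinfo=timezone.utc)
)
print(boundary)  # 1688f180-4141-11e3-8000-000000000000
```

The resulting value can be placed in the WHERE clause in the same position as the literal uuid in the query above.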
Update:
You asked:
can anyone insert the same SCHEMA_ID twice? Am I right?
Yes, you can always insert a row with an existing primary key. Cassandra treats this as an update: the values stored under that primary key are overwritten. Therefore, in order to preserve uniqueness, a UUID is often used in the primary key, for example a timeuuid. This is a unique value that contains a timestamp and, typically, the MAC address of the client. There is excellent documentation on this.
General advice:
- Write down your queries first, and then create your model. (Use case!)
- Your queries determine your data model, which in turn is defined mainly by your primary keys.
So, in your case, I would just adapt my schema above, for example:
CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT, LAST_MODIFIED_DATE TIMEUUID, PRIMARY KEY (RECORD_NAME, LAST_MODIFIED_DATE));
This allows queries such as:
select * from test where RECORD_NAME IN ('componentA','componentB') and LAST_MODIFIED_DATE > 1688f180-4141-11e3-aa6e-0800200c9a66;
The uuid corresponds to Wednesday, October 30, 2013 8:55:55 AM GMT, so you would fetch everything after that.
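If you want to check which point in time a given timeuuid represents, you can decode it on the client. The following is a minimal sketch using only the Python standard library; the function name timeuuid_to_datetime is hypothetical:

```python
from datetime import datetime, timedelta, timezone
import uuid

# Number of 100 ns ticks between 1582-10-15 (UUID epoch) and 1970-01-01 (Unix epoch).
UUID_EPOCH_OFFSET = 0x01B21DD213814000

def timeuuid_to_datetime(value):
    """Return the UTC datetime embedded in a version 1 UUID string."""
    u = uuid.UUID(value)
    if u.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # u.time is the 60-bit timestamp in 100 ns ticks; convert to microseconds.
    micros = (u.time - UUID_EPOCH_OFFSET) // 10
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(microseconds=micros)

print(timeuuid_to_datetime("1688f180-4141-11e3-aa6e-0800200c9a66"))
# 2013-10-30 08:55:55.800000+00:00
```

The last six bytes of the uuid (0800200c9a66 here) are the node part, which is where the MAC address usually ends up.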