How to get only the information that has changed from Cassandra?

Question

I am working on designing a Cassandra column family schema for the use case below, and I'm not sure what the best design is. I will be using CQL with the Datastax Java driver.

Below is my use case and the sample schema that I have designed so far:

 SCHEMA_ID   RECORD_NAME   SCHEMA_VALUE         TIMESTAMP
 1           ABC           some value           t1
 2           ABC           some_other_value     t2
 3           DEF           some value again     t3
 4           DEF           some other value     t4
 5           GHI           some new value       t5
 6           IOP           some values again    t6

What I need from the table above is the following:

The first time my application runs, it will ask for everything in the table above, so the result is every row shown there.
Then, every 5 or 10 minutes, a background thread will check this table and ask for everything that has changed (the full row, if anything in that row has changed). That is why I use a timestamp as one of the columns.

But I'm not sure how to design the schema so that both of my use cases are satisfied easily. What would be the proper way to create a table for this? SCHEMA_ID will be the primary key that I am going to use.

I will use CQL and the Datastax Java driver for this.

Update:

If I use something like this, are there any problems with the approach?

 CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT, LAST_MODIFIED_DATE TIMESTAMP, PRIMARY KEY (SCHEMA_ID));
 INSERT INTO TEST (SCHEMA_ID, RECORD_NAME, SCHEMA_VALUE, LAST_MODIFIED_DATE) VALUES ('1', 't26', 'SOME_VALUE', 1382655211694);

In my use case, I don't want anyone to insert the same SCHEMA_ID more than once: SCHEMA_ID should be unique whenever a new row is inserted into this table. So, with your example (@omnibear), could someone insert the same SCHEMA_ID twice? Am I right?

Also, regarding the type column you added: in my example, that column could be RECORD_NAME.

java cassandra cql datastax-java-driver

AKIWEB Oct 26 '13 at 4:57

1 answer

omnibear · Accepted Answer · 2013-10-29 10:55

Regarding 1): Cassandra is designed for write-heavy workloads with large amounts of data spread over multiple nodes. Retrieving ALL data from this kind of setup is risky, because it can involve huge amounts of data that must be processed by a single client. A better approach is to use pagination, which is natively supported as of version 2.0.

Regarding 2): Partition keys only support EQ or IN queries. For LT or GT (< / >) you use column keys (clustering columns). Therefore, if it makes sense to group records by some type identifier, you can use that as your partition key and a timeuuid as a column key. This allows you to query all records newer than X, for example:

 create table test (type int, SCHEMA_ID int, RECORD_NAME text, SCHEMA_VALUE text, TIMESTAMP timeuuid, primary key (type, timestamp));
 select * from test where type IN (0,1,2,3) and timestamp > 58e0a7d7-eebc-11d8-9669-0800200c9a66;
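The distinction between partition keys and column keys can be pictured with a toy in-memory model. The sketch below is not Cassandra code and makes no claim about Cassandra's internals; the class `PartitionModel` and its methods are hypothetical names chosen for illustration. It assumes that a partition is looked up by exact key (hence EQ or IN only) and that rows inside a partition are kept sorted by the column key (hence range queries).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy model only: a hash map of partitions, each holding rows sorted by
// a numeric stand-in for the timeuuid column key.
class PartitionModel {
    private final Map<Integer, NavigableMap<Long, String>> partitions = new HashMap<>();

    // Writing to an existing (type, ts) pair overwrites the stored value.
    void insert(int type, long ts, String value) {
        partitions.computeIfAbsent(type, k -> new TreeMap<>()).put(ts, value);
    }

    // In spirit: partition key IN (types), column key strictly greater than ts.
    List<String> newerThan(List<Integer> types, long ts) {
        List<String> out = new ArrayList<>();
        for (int type : types) {
            NavigableMap<Long, String> rows = partitions.get(type);
            if (rows != null) {
                // tailMap(ts, false) is a range scan over the sorted rows.
                out.addAll(rows.tailMap(ts, false).values());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        PartitionModel model = new PartitionModel();
        model.insert(0, 10L, "a");
        model.insert(0, 20L, "b");
        model.insert(1, 30L, "c");
        System.out.println(model.newerThan(Arrays.asList(0, 1), 10L));
    }
}
```

The point of the model is only that exact lookups select the partition while ordering inside it makes "newer than X" cheap; a real cluster adds distribution and replication on top.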

Update:

You asked:

can anyone insert the same SCHEMA_ID twice? Am I right?

Yes, you can always insert with an existing primary key; the values stored under that primary key are then overwritten. Therefore, to preserve uniqueness, a UUID is often used in the primary key, for example a timeuuid. This is a unique value that contains a timestamp and the MAC address of the client. There is excellent documentation on this.
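To make the "contains a timestamp" part concrete, here is a minimal sketch that reads the timestamp back out of a time-based UUID using only `java.util.UUID` from the standard library. The class name `TimeUuidDecoder` is hypothetical, and the sample value is the UUID used in the query further down in this answer. It assumes a version 1 UUID, whose timestamp counts 100-nanosecond intervals since 1582-10-15.

```java
import java.time.Instant;
import java.util.UUID;

class TimeUuidDecoder {
    // Number of 100-ns intervals between the UUID epoch (1582-10-15)
    // and the Unix epoch (1970-01-01).
    static final long UUID_EPOCH_OFFSET = 0x01B21DD213814000L;

    // Converts a version 1 (time-based) UUID to an Instant, millisecond precision.
    static Instant toInstant(UUID timeUuid) {
        if (timeUuid.version() != 1) {
            throw new IllegalArgumentException("not a time-based (version 1) UUID");
        }
        long hundredNanosSinceUnixEpoch = timeUuid.timestamp() - UUID_EPOCH_OFFSET;
        return Instant.ofEpochMilli(hundredNanosSinceUnixEpoch / 10_000);
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("1688f180-4141-11e3-aa6e-0800200c9a66");
        // Prints the embedded time as an ISO-8601 instant in UTC.
        System.out.println(toInstant(id));
    }
}
```

A helper like this is mainly useful for checking which point in time a given timeuuid bound in a query actually represents.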

General advice:

Write down your queries first, and then create your model. (Use case!)
Your queries determine your data model, which in turn is determined mainly by your primary keys.

So, in your case, I would just adapt my schema above, for example:

 CREATE TABLE TEST (SCHEMA_ID TEXT, RECORD_NAME TEXT, SCHEMA_VALUE TEXT, LAST_MODIFIED_DATE TIMEUUID, PRIMARY KEY (RECORD_NAME, LAST_MODIFIED_DATE));

This allows queries such as:

 select * from test where RECORD_NAME IN ('componentA','componentB') and LAST_MODIFIED_DATE > 1688f180-4141-11e3-aa6e-0800200c9a66;

The UUID corresponds to Wednesday, October 30, 2013 8:55:55 AM GMT, so you would fetch everything after that.
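For the periodic check described in the question, one possible client-side pattern is to remember the newest modification time seen so far and ask only for rows beyond it on each run. The sketch below is an assumption-laden illustration rather than driver code: `ChangePoller`, `Row`, and the `fetchNewerThan` function are hypothetical names, a `long` stands in for the timeuuid, and the actual query against Cassandra is left to whatever is passed in as `fetchNewerThan`.

```java
import java.util.List;
import java.util.function.LongFunction;

class ChangePoller {
    // Hypothetical row type holding only the fields the sketch needs.
    static final class Row {
        final String schemaId;
        final long lastModified;

        Row(String schemaId, long lastModified) {
            this.schemaId = schemaId;
            this.lastModified = lastModified;
        }
    }

    // Assumed to return all rows whose lastModified is strictly greater
    // than the argument (for example by running a range query).
    private final LongFunction<List<Row>> fetchNewerThan;

    // Starts below every possible value so the first poll returns everything.
    private long highWaterMark = Long.MIN_VALUE;

    ChangePoller(LongFunction<List<Row>> fetchNewerThan) {
        this.fetchNewerThan = fetchNewerThan;
    }

    // Returns rows changed since the previous call and advances the mark.
    List<Row> poll() {
        List<Row> changed = fetchNewerThan.apply(highWaterMark);
        for (Row row : changed) {
            highWaterMark = Math.max(highWaterMark, row.lastModified);
        }
        return changed;
    }
}
```

The first call to `poll()` covers the "give me everything" case, and later calls cover the "only what changed" case. Running it every 5 or 10 minutes could be done with a `ScheduledExecutorService`; whether a single initial full read is acceptable depends on the table size, as noted under 1).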
