I currently have an application that saves event-driven streaming data into a column family modeled like this:
CREATE TABLE current_data (
    account_id text,
    value text,
    PRIMARY KEY (account_id)
);
Data is sent every X seconds per account, so we overwrite the existing row every time we receive an event. The data represents the current state in real time, and we only care about the latest event (older data is never used, which is why we upsert on the existing key). From the user application's side, the query is a simple SELECT by account_id.
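For concreteness, a minimal sketch of the current write/read pattern ('acc-123' and the value are made-up placeholders):

-- Every event upserts the single row for the account:
INSERT INTO current_data (account_id, value) VALUES ('acc-123', 'latest-state');

-- The application reads back only the current state:
SELECT value FROM current_data WHERE account_id = 'acc-123';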
I was wondering whether there is a better way to model this behavior, so I looked at Cassandra's best practices and similar questions (How to model Cassandra DB for Time Series, server metrics).
I'm thinking of something like this:
CREATE TABLE current_data_2 (
    account_id text,
    time timeuuid,
    value text,
    PRIMARY KEY (account_id, time)
) WITH CLUSTERING ORDER BY (time DESC);
No overwriting will occur, since each insert gets a unique timeuuid, and each insert will also carry a TTL (perhaps a TTL of a few minutes), as in the sketch below.
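A sketch of such an insert (the 300-second TTL and the value are assumptions for illustration):

-- now() generates a timeuuid for the clustering column;
-- the row expires automatically once the TTL elapses:
INSERT INTO current_data_2 (account_id, time, value)
VALUES ('acc-123', now(), 'latest-state')
USING TTL 300;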
The question is how much better, if at all, the second data model is compared to the first. As far as I understand, the main advantage would be in READS: since the data is ordered by time, all I need is a simple
SELECT * FROM current_data_2 WHERE account_id = <id> LIMIT 1;
while in the first data model, Cassandra actually reads ALL the overwritten versions of the same row (the cells scattered across SSTables until compaction merges them) and then selects the latest one by write timestamp (please correct me if I am wrong).
Thanks.