Strategy for clearing old data in Cassandra

We store events in several tables depending on their category. Each event has an identifier and contains several sub-elements. We have a lookup table for finding events by subelement_id. Each sub-element can participate in at most 7 events, so each partition will contain at most 7 rows. We expect 30-50 million rows in eventlookup over 5 years.

CREATE TABLE eventlookup (
    subelement_id text,
    recordtime timeuuid,
    event_id text,
    PRIMARY KEY ((subelement_id), recordtime)
);

Problem: how do we purge old data once it passes the X-year mark (or some other threshold)? We want to clean up the tail at regular intervals, say every week or month.

Approaches investigated so far:

  • TTL of X years (works well, but the TTL has to be known in advance, and it costs 8 extra bytes per column)
  • No deletion: just ignore the problem (it becomes someone else's problem)
  • Rate-limited single-row deletes (requires a full table scan and potentially billions of DELETE statements)
  • Split the table into several tables -> "CREATE TABLE eventlookupYYYY". Once a year is no longer needed, just drop its table. (The problem is that every read potentially has to query all the tables.)
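
As a minimal sketch of the TTL option in the list above, assuming a retention of 5 years (an illustrative value; the question only says X years) and a hypothetical event id: the TTL is given in seconds, either per write or as a table-wide default.

```cql
-- Assumed retention: 5 years in seconds = 5 * 365 * 86400 = 157680000.
-- Per-write TTL: this row expires 5 years after it is written.
INSERT INTO eventlookup (subelement_id, recordtime, event_id)
VALUES ('mysub_id', now(), 'myevent_id')   -- 'myevent_id' is a placeholder value
USING TTL 157680000;

-- Table-wide default, applied to writes that do not set their own TTL.
ALTER TABLE eventlookup WITH default_time_to_live = 157680000;
```

Changing the default later only affects rows written afterwards; rows already on disk keep the TTL they were written with, which is the "has to be known in advance" drawback noted in the list.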

Are there any other approaches we can consider?

Is there a design decision that we can make now (we are not yet in production) that will mitigate the future problem?

2 answers

If you can afford the extra space, keep track of the recordtime ranges for your subelement_id values in a separate table / column family.

, , a ttl .

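To keep that lookup efficient, you could partition it by date, or by something like (date, chunk) with chunk in the range 0-10. TimeWindowCompactionStrategy is also worth a look: http://thelastpickle.com/blog/2016/12/08/TWCS-part1.html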
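
For reference, a sketch of what enabling TimeWindowCompactionStrategy on the lookup table could look like. The 30-day window and the 5-year default TTL are assumptions for illustration, not values from the answer. TWCS is generally paired with TTL-based expiry, so that whole SSTables can be dropped once everything in them has expired.

```cql
-- Assumed values: 30-day compaction windows, 5-year TTL (157680000 seconds).
ALTER TABLE eventlookup
WITH compaction = {
    'class': 'TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': 30
}
AND default_time_to_live = 157680000;
```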

Your partition key is subelement_id, so each partition holds at most 7 rows.
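
One possible shape for such a tracking table, assuming a (date, chunk) partition key as mentioned above. The table name eventlookup_by_day and the column names day and chunk are hypothetical, chosen only for this sketch.

```cql
-- Hypothetical tracking table: one row per row written to eventlookup.
CREATE TABLE eventlookup_by_day (
    day date,
    chunk int,            -- 0..10, picked by the writer to spread one day over several partitions
    subelement_id text,
    recordtime timeuuid,
    PRIMARY KEY ((day, chunk), subelement_id, recordtime)
);

-- List what was written on a given day, one chunk at a time;
-- the returned keys identify the rows to delete from eventlookup.
SELECT subelement_id, recordtime
FROM eventlookup_by_day
WHERE day = '2013-01-01' AND chunk = 0;
```

A cleanup job would walk the days older than the cutoff and every chunk value, which avoids a full scan of eventlookup at the cost of a second write per event.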


With subelement_id as the partition key, I would cluster on recordtime DESC:

CREATE TABLE eventlookup (
    subelement_id text,
    recordtime timeuuid,
    eventtype int,
    parentid text,
    partition bigint,
    event_id text,
    PRIMARY KEY ((subelement_id), recordtime)
)
WITH CLUSTERING ORDER BY (recordtime DESC);

, .

Suppose the data spans, say, 2000 to 2018. If you only care about the last 5 years, you can query like this:

SELECT * FROM eventlookup WHERE subelement_id = 'mysub_id' AND recordtime >= minTimeuuid('2013-01-01');

, C * , : 5 . , , , . , "" ,

WHERE subelement_id = 'mysub_id' AND recordtime < minTimeuuid('2013-01-01');

, , , , .

, , , .
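
If the old tail of a partition does have to be removed explicitly, Cassandra 3.0 and later accept a range condition on the clustering column in DELETE. A sketch, reusing the example partition key and cutoff date from the query above:

```cql
-- Removes every row older than 2013-01-01 from one partition
-- with a single range tombstone instead of one delete per row.
DELETE FROM eventlookup
WHERE subelement_id = 'mysub_id'
  AND recordtime < minTimeuuid('2013-01-01');
```

This still needs the partition key, so it only helps once the affected subelement_id values are known.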


Source: https://habr.com/ru/post/1693307/

