Cassandra TTL gets set to 0 on the primary key if no TTL is specified on an update, but if one is, the TTL on the primary key does not change

This behavior in Cassandra seems counterintuitive, and I want to know why it happens and, if possible, how to work around it.


Imagine I have a table with three columns: pk, the primary key, of type text; foo, a bigint; and bar, another text.

 insert into keyspace.table (pk, foo, bar) values ('first', 1, 'test') using ttl 60; 

This creates a row in my table that has a time-to-live of 60 seconds. Selecting it shows:

 pk    | foo | bar
 ------+-----+------
 first | 1   | test

Now I do:

 update keyspace.table using ttl 10 set bar='change' where pk='first'; 

And then, watching the row, I see it go through the following changes:

 pk    | foo | bar
 ------+-----+----------
 first | 1   | change
 first | 1   | <<null>>    // after 10 seconds
 << deleted >>             // after the initial 60 seconds

All well and good. What I wanted was for bar to expire, but nothing else, and especially not the primary key. This behavior was expected.


However, if my update does not specify a TTL, or sets it to 0:

 update keyspace.table set bar='change' where pk='first'; 

Then I see this behavior over time.

 pk    | foo | bar
 ------+-----+--------
 first | 1   | change
 first | 0   | change    // after the initial 60 seconds

In other words, the row is never deleted. foo was not changed, so its TTL remained in effect, and once it had passed the value was deleted (set to 0). But pk did have its TTL changed. This is completely unexpected.

Why does the primary key's TTL change only if I do not specify a TTL in the update? And how can I work around this so that the primary key's TTL changes only if I explicitly say so?

Edit: I also found that if I use a TTL that is higher than the initial one, it also seems to change the TTL on the primary key.

 update keyspace.table using ttl 70 set bar='change' where pk='first';

 pk    | foo | bar
 ------+-----+--------
 first | 1   | change
 first | 0   | change    // after the initial 60 seconds
 << deleted >>           // after the 70 seconds
2 answers

The effect you experience is caused by the storage model used by Cassandra.

In your example, where the table has no clustering columns, each row in the table is mapped to one row in the data store (often called a "Thrift row", because this is the storage model exposed through the Thrift API). Each of the columns in the table that is not part of the primary key (so, in your example, the columns foo and bar) is mapped to a column in the Thrift row. In addition, an extra column that is not visible in the CQL row is created as a marker that the row exists.

TTL expiration happens at the level of Thrift columns, not CQL columns. When you INSERT a row, all the columns that you insert, as well as the special marker for the row itself, get the same TTL.

When you UPDATE a row, only the updated columns get the new TTL. The row marker is not touched.

When you run a SELECT query, it returns all rows for which at least one column or the special row marker exists. This means that the column with the highest TTL determines how long the CQL row stays visible, unless the row marker (which is only touched by INSERT) has a longer TTL.
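The rules above can be replayed with a small toy model. This is only a sketch of the behavior described in this answer, not Cassandra's actual storage code: the class `ToyRow`, the marker name `__row_marker__`, and the helper `replay` are hypothetical, time is passed in as plain seconds, and timestamps and tombstones are ignored.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

ROW_MARKER = "__row_marker__"  # hypothetical name for the hidden marker cell


@dataclass
class ToyRow:
    """One storage row: cell name -> (value, expiry time or None)."""
    cells: Dict[str, Tuple[object, Optional[int]]] = field(default_factory=dict)

    def _write(self, name, value, now, ttl):
        # A TTL of 0 (or no TTL) means the cell never expires.
        expires_at = now + ttl if ttl else None
        self.cells[name] = (value, expires_at)

    def insert(self, values, now, ttl=0):
        # INSERT writes the row marker and every given column with one TTL.
        self._write(ROW_MARKER, True, now, ttl)
        for name, value in values.items():
            self._write(name, value, now, ttl)

    def update(self, values, now, ttl=0):
        # UPDATE writes only the named columns; the marker is untouched.
        for name, value in values.items():
            self._write(name, value, now, ttl)

    def select(self, now):
        # The row is visible while any cell (or the marker) is still live.
        live = {name: value for name, (value, exp) in self.cells.items()
                if exp is None or now < exp}
        if not live:
            return None  # row no longer visible
        live.pop(ROW_MARKER, None)  # the marker never shows up as a column
        return live  # expired columns are simply absent (null)


def replay(update_ttl, times):
    """Insert with TTL 60 at t=0, update bar at t=1, then read at `times`."""
    row = ToyRow()
    row.insert({"foo": 1, "bar": "test"}, now=0, ttl=60)
    row.update({"bar": "change"}, now=1, ttl=update_ttl)
    return [row.select(t) for t in times]
```

In this model, `replay(10, [5, 20, 61])` walks through the first case from the question (bar disappears, then the whole row), while `replay(0, [61])` leaves a row that contains only bar, because that cell no longer has an expiry.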

If you want the row's primary key to get the same TTL as the new column values, the workaround is simple: use INSERT when updating the row. This has the same effect as using UPDATE, but it also updates the TTL of the row marker.

The only drawback of this workaround is that it does not work in combination with lightweight transactions (an IF clause in INSERT or UPDATE). If you need those in combination with a TTL, you have to use a more sophisticated workaround, but that would be a separate question, I suppose.

If you want to update some columns of a row, but still want the whole row to disappear once the TTL that you specified when inserting it has expired, this is not directly supported by Cassandra. The only way is to find out the TTL that is left for the row by first querying the TTL of one of its columns, and then to use that TTL in the UPDATE operation. For example, you can use SELECT TTL(foo) FROM table1 WHERE pk = 'first'; . However, this has performance implications, because it increases latency (you must wait for the SELECT result before you can run the UPDATE).
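A minimal sketch of this read-then-write approach follows. It assumes a `session` object that behaves like the `Session` of the DataStax Python driver (an `execute(query, parameters)` method whose result has `.one()`); the function name `update_keeping_ttl` is hypothetical, and the table and column names are the ones used in this thread.

```python
def update_keeping_ttl(session, pk, new_bar):
    """Update bar while reusing the TTL that is still left on foo.

    Returns the TTL that was applied, or None if no update was sent.
    """
    # Assumption: `session` works like a DataStax Python driver Session,
    # i.e. execute(query, parameters) returns a result with a .one() method.
    row = session.execute(
        "SELECT TTL(foo) FROM table1 WHERE pk = %s", (pk,)
    ).one()
    remaining = row[0] if row is not None else None
    if not remaining:
        # No such row, or foo has no TTL (or is null): nothing to carry
        # over, so leave the decision to the caller instead of upserting.
        return None
    session.execute(
        "UPDATE table1 USING TTL %s SET bar = %s WHERE pk = %s",
        (remaining, new_bar, pk),
    )
    return remaining
```

Note that the TTL read by the SELECT is already slightly stale by the time the UPDATE is applied, so bar may outlive foo by roughly the round-trip time.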

Alternatively, you can add a column that you use only as a "row exists" marker, and that you only write in INSERT and never in UPDATE. You could then simply ignore rows for which this column is null, but this filtering has to be implemented on the client side, and it will not help if you do not specify a TTL in the UPDATE, because the updated columns will then never be deleted.
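The client-side filtering could look roughly like this. It is only a sketch: the column name `row_exists` and the function `visible_rows` are hypothetical, and rows are assumed to arrive as dictionaries (for example from a dict-style row factory).

```python
def visible_rows(rows, marker="row_exists"):
    """Keep only rows whose marker column is still set.

    Once the marker's TTL has expired it reads back as None (null),
    so such rows are treated as already deleted.
    """
    return [row for row in rows if row.get(marker) is not None]
```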


After some testing, these are the expected results. TTLs have column granularity.

  • When updating, if no TTL is specified, the TTL of the updated column is set to 0. This operation does not affect the TTLs of other columns.
  • We cannot update a column's value and keep its old TTL in a single CQL command.
  • A row (or primary/partition key) is deleted when the TTLs of all its columns have expired. A row will not be deleted if any column has a TTL of 0.

As of today (Cassandra 2.1), here is how you can update a column's value and keep its TTL:

 SELECT TTL(col1) FROM table1 where pk=1;
 // read the ttl value fetched.
 UPDATE table1 USING TTL <the_ttl_value> set col1='change' where pk=1;

Source: https://habr.com/ru/post/979068/

