Logical delete support for existing feed table

I would like to implement logical (soft) deletion for feed records, so that deletes can be undone later.
The system is in production, so any solution must work with existing data.
Inserting records into a feed is idempotent, so re-inserting an already-deleted record (one with the same primary key) should not un-delete it.
Any solution should support queries that retrieve a page of either active or deleted records.

Feed table:

CREATE TABLE my_feed (
    tenant_id  int,
    item_id    int,
    created_at timestamp,
    feed_data  text,
    PRIMARY KEY (tenant_id, created_at, item_id)
) WITH compression = { 'sstable_compression' : 'LZ4Compressor' }
  AND CLUSTERING ORDER BY (created_at DESC);

There are two approaches that I thought of, but both have serious flaws:
1. Move deleted records to another table. Queries are trivial and no migration is required, but idempotent inserts seem difficult (read before every insert?).
2. Add an is_deleted column and create a secondary index on it to support queries. Idempotent inserts seem easier to maintain (lightweight transactions, or the update trick). The main disadvantage is that old records have a null value in this column, so a data migration would be required.
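To make option 2 concrete, here is a minimal sketch of the "lightweight transaction" idea, assuming an is_deleted column has been added to my_feed; the values are hypothetical:

```cql
-- IF NOT EXISTS makes the insert a compare-and-set: it is a no-op when a
-- row with the same primary key already exists, including a row that was
-- previously soft-deleted, so a re-insert cannot resurrect the record.
INSERT INTO my_feed (tenant_id, item_id, created_at, feed_data, is_deleted)
VALUES (42, 1001, '2016-01-01 00:00:00+0000', 'hello', false)
IF NOT EXISTS;
```

Note that lightweight transactions use Paxos under the hood and cost noticeably more than a plain INSERT, which matters on a hot write path.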

Is there a third, more elegant approach? Or would you endorse one of the above suggestions?

1 answer

If you keep a separate table for deleted records, you can use the CQL BATCH construct to perform the "move" operation, but since the only record of the delete lives in that table, you would have to check it before every insert to get the behavior you described, i.e. to avoid resurrecting already-deleted records. Reading before writing is generally considered an anti-pattern in Cassandra.
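A sketch of that "move", assuming a my_feed_deleted table with the same schema (the table name and values are illustrative):

```cql
-- The BATCH makes the two mutations atomic (both applied or neither),
-- but it does NOT stop a later plain INSERT into my_feed from
-- resurrecting the record -- that still requires checking
-- my_feed_deleted first, i.e. a read before write.
BEGIN BATCH
  INSERT INTO my_feed_deleted (tenant_id, item_id, created_at, feed_data)
  VALUES (42, 1001, '2016-01-01 00:00:00+0000', 'hello');
  DELETE FROM my_feed
  WHERE tenant_id = 42
    AND created_at = '2016-01-01 00:00:00+0000'
    AND item_id = 1001;
APPLY BATCH;
```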

Using the is_deleted column may require some migration, as you mentioned, but a potentially more serious problem is that a secondary index on a column with very low cardinality is usually extremely inefficient. With a boolean field, your index will contain only two rows. If you do not delete very often, that means your "false" row will be very wide, and therefore nearly useless.

If you avoid creating a secondary index on the is_deleted column, and treat both null and false as active records while only an explicit true marks a record as deleted, you may not need to migrate anything. (Do you actually know of any existing records that should be marked deleted during a migration?) You would then leave the filtering of deleted records to the client, which is probably already doing some post-processing of your queries anyway. The disadvantage of this design is that you may need to fetch more than N rows to get N that are not deleted!
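A sketch of this migration-free variant (values hypothetical):

```cql
-- Add the column without backfilling: existing rows read back as null,
-- which the client treats the same as false (active).
ALTER TABLE my_feed ADD is_deleted boolean;

-- Soft delete: only an explicit true marks a record as deleted.
UPDATE my_feed SET is_deleted = true
WHERE tenant_id = 42
  AND created_at = '2016-01-01 00:00:00+0000'
  AND item_id = 1001;

-- Page query: fetch rows and let the client skip is_deleted = true;
-- expect to read more than N rows to return N active ones.
SELECT item_id, created_at, feed_data, is_deleted
FROM my_feed
WHERE tenant_id = 42
LIMIT 20;
```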

I hope this helps and addresses the question as you stated it. I would be curious to know why you need to guard against already-deleted records coming back to life, though I can imagine a situation where several actors work on the same feed (and the CAS problems that can arise from that).

On a slightly unrelated note, you may want to use timeuuid instead of timestamp for your created_at field; CQL provides the dateOf() function to extract the date, if that is a stumbling block. (It may also be that collisions within a tenant_id partition are impossible in your case, in which case you can safely ignore this.)
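For illustration, a sketch of the timeuuid variant. Cassandra cannot change the type of an existing clustering column, so this would be a new table rather than an ALTER; the _v2 name is hypothetical:

```cql
CREATE TABLE my_feed_v2 (
    tenant_id  int,
    item_id    int,
    created_at timeuuid,   -- generate with now() at insert time
    feed_data  text,
    PRIMARY KEY (tenant_id, created_at, item_id)
) WITH CLUSTERING ORDER BY (created_at DESC);

-- dateOf() (toTimestamp() in newer Cassandra versions) recovers the
-- wall-clock timestamp embedded in the timeuuid.
SELECT item_id, dateOf(created_at) AS created, feed_data
FROM my_feed_v2
WHERE tenant_id = 42;
```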


Source: https://habr.com/ru/post/976866/

