Why not just keep a table of the things that have been deleted since the last rewrite?
This table could have the same structure as your main bucket, possibly with a Bloom filter for quick membership checks.
You can rewrite the main bucket data without the deleted elements either when you were going to rewrite it anyway for some other modification, or when the ratio of deleted elements to bucket size exceeds some threshold.
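To make the threshold idea concrete, here is a minimal sketch in Python. The `Bucket` class, its method names, and the `max_deleted_ratio` parameter are my own invention for illustration, and it uses plain in-memory sets and lists where you would presumably have on-disk structures.

```python
class Bucket:
    """A bucket of (doc_a, doc_b) pairs plus a side table of deletions.

    Hypothetical illustration: names and the default threshold are assumptions.
    """

    def __init__(self, pairs, max_deleted_ratio=0.25):
        self.pairs = list(pairs)   # main bucket data, expensive to rewrite
        self.deleted = set()       # pairs deleted since the last rewrite
        self.max_deleted_ratio = max_deleted_ratio

    def delete(self, pair):
        # Record the deletion in the side table instead of touching the bucket.
        self.deleted.add(pair)
        # Rewrite only once the deleted:bucket size ratio passes the threshold.
        if len(self.deleted) > self.max_deleted_ratio * len(self.pairs):
            self.rewrite()

    def rewrite(self):
        # Drop deleted pairs from the main data and reset the side table.
        self.pairs = [p for p in self.pairs if p not in self.deleted]
        self.deleted.clear()

    def live_pairs(self):
        # Readers filter through the side table between rewrites.
        return [p for p in self.pairs if p not in self.deleted]
```

If you rewrite the bucket for some other reason, calling `rewrite()` at that point folds the pending deletions in at no extra cost.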
This scheme could work either by storing the deleted pairs alongside each bucket, or by storing a single table of all deleted documents; I'm not sure which fits your requirements better.
With a single table, it's hard to know when you can remove an item from it unless you know how many buckets it affects; the alternative is to rewrite all the buckets whenever the deletion table gets too big. That could work, but it's a bit stop-the-world.
You also need to perform two checks for each pair you read in: for (3278, 15378), you would check whether either 3278 or 15378 has been deleted, rather than just checking whether the pair (3278, 15378) has been deleted.
Conversely, a per-bucket table of deleted pairs would take longer to build, but it would be slightly faster to check and easier to collapse when rewriting the bucket.
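A rough sketch of the difference between the two lookups, again in Python. All function names here are hypothetical, and I'm assuming a pair counts as deleted as soon as either of its documents is deleted.

```python
def is_deleted_global(pair, deleted_docs):
    """Single table of deleted documents: two membership checks per pair."""
    a, b = pair
    return a in deleted_docs or b in deleted_docs


def is_deleted_per_bucket(pair, deleted_pairs):
    """Per-bucket table of deleted pairs: one membership check per pair."""
    return pair in deleted_pairs


def build_per_bucket_tables(buckets, deleted_docs):
    """Expand the deleted documents into one deleted-pair set per bucket.

    This is the slower-to-build part: every bucket has to be scanned.
    `buckets` maps a bucket name to its list of pairs (an assumed layout).
    """
    return {
        name: {p for p in pairs if is_deleted_global(p, deleted_docs)}
        for name, pairs in buckets.items()
    }
```

With the per-bucket tables, collapsing on rewrite is just a matter of dropping that bucket's own set; nothing has to be coordinated with the other buckets.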