I experimented and calculated which approach is most effective in terms of read/write capacity units and cost, taking into account race conditions (updates arriving while a version is being written) and avoiding data duplication. I narrowed it down to a few possible solutions; consider which option suits you best.
The key concept is to treat version 0 as the latest version. In addition, we use a revisions attribute, which records how many revisions exist before this item and is also used to determine the item's current version ( version = revisions + 1 ). Being able to tell how many versions exist is a requirement, and in my opinion revisions satisfies that need while also being a value that can be shown to the user.
Thus, the first row is created with version: 0 and revisions: 0 . Although technically this is the first version (v1), we do not assign it a version number until it is archived. When this row changes, version stays at 0 , which still means "latest", and revisions is incremented to 1 . A new row is created with all of the previous values, except that it now carries version: 1 .
Summarizing:
When creating an item:
- Create the item with revisions: 0 and version: 0
When updating or overwriting an item:
- Increment revisions
- Insert the old row exactly as it was, but replace version: 0 with the new version number, which is easily calculated as version: revisions + 1 .
Here's what the conversion looks like for a table with only a partition key:
Primary Key: id
| id | color |
| --- | --- |
| 9501 | violet |
| 9502 | cyan |
| 9503 | magenta |
Primary Key: id + version
| id | version | revisions | color |
| --- | --- | --- | --- |
| 9501 | 0 | 6 | violet |
| 9501 | 1 | 0 | red |
| 9501 | 2 | 1 | orange |
| 9501 | 3 | 2 | yellow |
| 9501 | 4 | 3 | green |
| 9501 | 5 | 4 | blue |
| 9501 | 6 | 5 | indigo |
Here is the conversion for a table that already uses a sort key:
Primary Key: id + date
| id | date | color |
| --- | --- | --- |
| 9501 | 2018-01 | violet |
| 9501 | 2018-02 | cyan |
| 9501 | 2018-03 | black |
Primary Key: id + date_ver
| id | date_ver | revisions | color |
| --- | --- | --- | --- |
| 9501 | 2018-01__v0 | 6 | violet |
| 9501 | 2018-01__v1 | 0 | red |
| 9501 | 2018-01__v2 | 1 | orange |
| 9501 | 2018-01__v3 | 2 | yellow |
| 9501 | 2018-01__v4 | 3 | green |
| 9501 | 2018-01__v5 | 4 | blue |
| 9501 | 2018-01__v6 | 5 | indigo |
Alternative No. 2:
| id | date_ver | revisions | color |
| --- | --- | --- | --- |
| 9501 | 2018-01 | 6 | violet |
| 9501 | 2018-01__v1 | 0 | red |
| 9501 | 2018-01__v2 | 1 | orange |
| 9501 | 2018-01__v3 | 2 | yellow |
| 9501 | 2018-01__v4 | 3 | green |
| 9501 | 2018-01__v5 | 4 | blue |
| 9501 | 2018-01__v6 | 5 | indigo |
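For the composite-key variant, the sort key is simply the original sort key with a version suffix. Here is a minimal NodeJS sketch of building that key; the "__v" separator matches the tables above, and the helper name is purely illustrative:

// Compose the date_ver sort key from the original sort key and a version number
function dateVerKey(date, version) {
  return `${date}__v${version}`;
}

// Latest row (v0):        { id: 9501, date_ver: dateVerKey('2018-01', 0) }  // '2018-01__v0'
// Archived revision v3:   { id: 9501, date_ver: dateVerKey('2018-01', 3) }  // '2018-01__v3'
// Alternative No. 2 keeps the latest row under the plain sort key instead:
//                         { id: 9501, date_ver: '2018-01' }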
So we can either keep previous versions in the same table or split them out into a separate table. Both options have their advantages and disadvantages.
Using the same table:
- The primary key consists of a partition key and a sort key
- The version goes into the sort key, either on its own as a number or appended to the existing sort key as a string
Benefits:
- All data exists in one table.
Disadvantages:
- May restrict how you can use the table's sort key
- Version control consumes the same write capacity units as your main table
- Sort keys can only be set when the table is created
- You may need to adjust your code to request v0
- Previous versions are also included in the table's indexes
Using an additional table:
- Add the revisions key to both tables
- If you are not using a sort key, create a sort key named version on the secondary table. The primary table will always have version: 0 ; using this key in the primary table is optional.
- If you are already using a sort key, see "Alternative No. 2" above
Benefits:
- The primary table does not need any key changes or recreation; get requests do not change
- The primary table keeps its own sort key
- The secondary table can have independent read and write capacity units (see the sketch after this list)
- The secondary table has its own indexes
Disadvantages:
- A second table has to be managed
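As a sketch of this second option, here is how the secondary table might be created with the aws-sdk for NodeJS. The table name, key types, and capacity values are illustrative assumptions, not part of the original:

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

// Secondary table keyed by the same partition key plus a numeric version sort key,
// with read/write capacity provisioned independently of the primary table.
dynamodb.createTable({
  TableName: 'colors_revisions',
  AttributeDefinitions: [
    { AttributeName: 'id', AttributeType: 'N' },
    { AttributeName: 'version', AttributeType: 'N' },
  ],
  KeySchema: [
    { AttributeName: 'id', KeyType: 'HASH' },
    { AttributeName: 'version', KeyType: 'RANGE' },
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 1, WriteCapacityUnits: 1 },
}).promise().catch(console.error);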
However you decide to split the data, we now have to decide how to create the revision rows. Here are a few different methods:
On demand, synchronously overwrite / update the item and insert the revision
Summary: Get the current version of the item. Update the current row and insert the previous version in a single transaction.
To avoid race conditions, we need to perform the update and the insert in the same operation using TransactWriteItems . In addition, we need to make sure that the version we are updating is still the correct version by the time the request reaches the database. We achieve this with one of the following two checks, or both:
- The ConditionExpression of the Update command in TransactItems must verify that the revisions value of the row being updated matches the revisions value of the object we fetched with Get earlier.
- The ConditionExpression of the Put command in TransactItems checks that the row does not already exist.
Both checks appear in the sketch below.
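Here is a minimal sketch of that transaction with the aws-sdk DocumentClient for NodeJS, assuming the single-table layout above; the table name, the color attribute, and the helper name are illustrative, and current is the v0 item previously fetched with Get :

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

// current is the item previously fetched with Get on { id, version: 0 }
async function updateWithRevision(current, newColor) {
  await docClient.transactWrite({
    TransactItems: [
      {
        Update: {
          TableName: 'colors',
          Key: { id: current.id, version: 0 },
          UpdateExpression: 'SET #color = :color ADD revisions :one',
          // Fail if another process already bumped revisions since our Get
          ConditionExpression: 'revisions = :expectedRevisions',
          ExpressionAttributeNames: { '#color': 'color' },
          ExpressionAttributeValues: {
            ':color': newColor,
            ':one': 1,
            ':expectedRevisions': current.revisions,
          },
        },
      },
      {
        Put: {
          TableName: 'colors',
          // Archive the old values under version = revisions + 1
          Item: { ...current, version: current.revisions + 1 },
          // Fail if a row with this id + version already exists
          ConditionExpression: 'attribute_not_exists(id)',
        },
      },
    ],
  }).promise();
}

If either condition fails, the whole transaction is rejected and nothing is written, which is exactly the race-condition protection we want here.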
Cost
- 1 read capacity unit (4 KB) for the Get on v0
- 1 write capacity unit for preparing the TransactWriteItems
- 1 write capacity unit (1 KB) for the Put / Update on v0
- 1 write capacity unit (1 KB) for the Put of the revision version
- 1 write capacity unit for committing the TransactWriteItems
Notes:
On demand, asynchronously get the item, overwrite / update it, and insert the version
Summary: Get and keep a copy of the current row. When overwriting or updating the row, check the current revisions value and increment revisions . Then insert the previously saved row with its version number.
Run the update with:
{
  UpdateExpression: 'SET revisions = :newRevisionCount',
  ExpressionAttributeValues: {
    ':newRevisionCount': previousRow.revisions + 1,
    ':expectedRevisionCount': previousRow.revisions,
  },
  ConditionExpression: 'revisions = :expectedRevisionCount',
}
We can use the same ConditionExpression with put when overwriting a previously existing row.
Watch the response for a ConditionalCheckFailedException . If it is returned, the revisions value has already been changed by another process, and we must either retry from the beginning or abort entirely. If there is no exception, we can insert the previously saved row after setting its version attribute accordingly (number or string).
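Putting it together, here is a sketch of this flow with the aws-sdk DocumentClient, assuming a separate colors_revisions table keyed by id + version; table names, the color attribute, and the function name are illustrative:

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function updateColor(id, newColor) {
  // Get and keep a copy of the current row
  const { Item: previousRow } = await docClient
    .get({ TableName: 'colors', Key: { id } })
    .promise();

  try {
    // Update v0, guarded by the expected revisions value
    await docClient.update({
      TableName: 'colors',
      Key: { id },
      UpdateExpression: 'SET #color = :color, revisions = :newRevisionCount',
      ConditionExpression: 'revisions = :expectedRevisionCount',
      ExpressionAttributeNames: { '#color': 'color' },
      ExpressionAttributeValues: {
        ':color': newColor,
        ':newRevisionCount': previousRow.revisions + 1,
        ':expectedRevisionCount': previousRow.revisions,
      },
    }).promise();
  } catch (err) {
    if (err.code === 'ConditionalCheckFailedException') {
      // Another process changed the row first: retry from the Get, or abort
      return;
    }
    throw err;
  }

  // Insert the previously saved row with its version number
  await docClient.put({
    TableName: 'colors_revisions',
    Item: { ...previousRow, version: previousRow.revisions + 1 },
  }).promise();
}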
Cost
- 1 read capacity unit (4 KB) for the Get on v0
- 1 write capacity unit (1 KB) for the Put / UpdateItem on v0
- 1 write capacity unit (1 KB) for the Put of the version
On demand, asynchronously blind-update the item and insert the revision
Summary: Perform a "blind" update on v0, incrementing revisions and requesting the old attributes. Use the return values to create a new row with the version number.
Run update-item with:
{
  UpdateExpression: 'ADD revisions :revisionIncrement',
  ExpressionAttributeValues: {
    ':revisionIncrement': 1,
  },
  ReturnValues: 'ALL_OLD',
}
The ADD action will create revisions automatically if it does not exist, treating it as 0 . ReturnValues brings another nice benefit:
There is no additional cost for requesting the return values, apart from a little network and processing overhead for receiving the larger response. No read capacity units are consumed.
In the update response, the Attributes value contains the data from the old record. The version of that record is Attributes.revisions + 1 . Set its version attribute value (number or string) accordingly.
Now you can insert this record into your target table.
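A sketch of this method with the DocumentClient, again assuming a separate colors_revisions table; table names, the color attribute, and the function name are illustrative:

const AWS = require('aws-sdk');
const docClient = new AWS.DynamoDB.DocumentClient();

async function updateColorBlind(id, newColor) {
  // Blind update: no preliminary Get; increment revisions and ask for the old attributes
  const { Attributes: oldItem } = await docClient.update({
    TableName: 'colors',
    Key: { id },
    UpdateExpression: 'SET #color = :color ADD revisions :revisionIncrement',
    ExpressionAttributeNames: { '#color': 'color' },
    ExpressionAttributeValues: { ':color': newColor, ':revisionIncrement': 1 },
    ReturnValues: 'ALL_OLD',
  }).promise();

  // Nothing to archive if the item did not exist before
  if (!oldItem) return;

  // The returned attributes are the previous version; archive them under revisions + 1
  // (falling back to 0 for rows that predate the revisions attribute)
  await docClient.put({
    TableName: 'colors_revisions',
    Item: { ...oldItem, version: (oldItem.revisions || 0) + 1 },
  }).promise();
}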
Cost
- 1 write capacity unit (1 KB) for the Update on v0
- 1 write capacity unit (1 KB) for the Put of the version
Notes:
- The length of the returned Attributes object is limited to 65535.
- There is no solution here for overwriting rows with put .
Automatic Asynchronous Revision Insert
Summary: Perform "blind" updates and puts on the primary table while incrementing revisions . Use a Lambda trigger that watches for changes to revisions to insert the revision rows asynchronously.
Run the update with:
{
  UpdateExpression: 'ADD revisions :revisionIncrement',
  ExpressionAttributeValues: {
    ':revisionIncrement': 1,
  },
}
The ADD action will create revisions automatically if it does not exist, treating it as 0 .
To overwrite rows with put , set the incremented revisions value based on a previous get of the current row.
Configure the DynamoDB Stream view type to return both new and old images, and set up a Lambda trigger on the table. Here is sample NodeJS code that compares the old and new images and calls a function to write the revisions in a batch:
export function handler(event) {
  // Collect the old images of records whose revisions attribute actually changed
  const oldRevisions = event.Records
    .filter(record => record.dynamodb.OldImage
      && record.dynamodb.NewImage
      && record.dynamodb.OldImage.revisions.N !== record.dynamodb.NewImage.revisions.N)
    .map(record => record.dynamodb.OldImage);
  batchWriteRevisions(oldRevisions);
}
This is just an example, but working code will most likely include more checks.
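The batchWriteRevisions helper referenced above is not shown in the original; here is one possible sketch, assuming a colors_revisions table. Stream images arrive in the low-level attribute-value format, so the plain DynamoDB client is used rather than the DocumentClient:

const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB();

function batchWriteRevisions(oldImages) {
  if (oldImages.length === 0) return Promise.resolve();
  // BatchWriteItem accepts at most 25 items per call; real code would chunk
  // the array and retry any UnprocessedItems
  return dynamodb.batchWriteItem({
    RequestItems: {
      colors_revisions: oldImages.map(image => ({
        PutRequest: {
          // Archive the old image under version = revisions + 1
          Item: { ...image, version: { N: String(Number(image.revisions.N) + 1) } },
        },
      })),
    },
  }).promise();
}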
Cost
- 1 read capacity unit (4 KB) for access to v0 (only when overwriting)
- 1 write capacity unit (1 KB) for the Put / Update on v0
- 1 DynamoDB Streams read request unit per GetRecords call
- 1 write capacity unit (1 KB) for the Put of the revision
Notes:
- DynamoDB Streams shard data expires after 24 hours
- DynamoDB Streams read request units are independent of the table's read capacity units
- Using Lambda functions has its own cost
- Changing the stream view type requires disabling and re-enabling the stream
- Works with Write, Put, BatchWriteItems, TransactWriteItems
For my use cases, I already use DynamoDB Streams and do not expect users to request version rows very often. I can also let users wait a little while until the revisions are ready, since they are written asynchronously. This makes a second table with an automated Lambda process the best fit for me.
The asynchronous options do have several points of failure. However, failed revision writes are something you can retry, either immediately on demand or later via the DynamoDB Stream-based approach.
If anyone has any other solutions or criticisms, please comment. Thanks!