How can I implement version control without replacing the previous entry in DynamoDB?

Question

Currently, when I use versioning in DynamoDB, the version number changes, but the new record replaces the old one, i.e.:

old

 { object: one, name: "hey", version: 1 }

new

 { object: one, name: "ho", version: 2 }

I want to have both records in the database, i.e.:

old

 { object: one, name: "hey", version: 1 }

new

 { object: one, name: "hey", version: 1 }
 { object: one, name: "ho", version: 2 }

Can this be done?

+15

java amazon-dynamodb

iCodeLikeImDrunk Jun 17 '14 at 22:50


4 answers

I experimented and worked out what is most efficient in terms of read/write units and cost, taking into account race conditions when updates arrive while a version is being recorded, and avoiding data duplication. I narrowed it down to a couple of possible solutions. You should consider which option is best for you.

The key concept is to treat version 0 as the latest version. In addition, we use a revisions attribute, which records how many revisions exist before this item and is also used to determine the item's current version ( version = revisions + 1 ). Being able to calculate how many versions exist is a requirement, and in my opinion revisions satisfies this need while also being a value that can be presented to the user.

Thus, the first row is created with version: 0 and revisions: 0 . Although technically this is the first version (v1), we do not apply a version number until the row is archived. When this row changes, version remains 0 , which still denotes the latest, and revisions is incremented to 1 . A new row is created with all the previous values, except that it now carries version: 1 .

Summarizing:

When creating an item:

Create the item with revisions: 0 and version: 0

When updating or overwriting an item:

Increment revisions
Insert the old row exactly as it was, but replace version: 0 with the new version, which is easily calculated as version: revisions + 1 (using the revisions value from before the increment).
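The bookkeeping summarized above can be sketched in plain JavaScript, without any DynamoDB calls. This is only an illustration: the function names createItem and reviseItem are hypothetical, and rows are assumed to be plain objects with id, version, and revisions attributes.

```javascript
// Hypothetical helpers; rows are plain objects with id, version, and revisions.

// A new item always starts as the latest row: version 0, no revisions yet.
function createItem(id, attributes) {
  return { ...attributes, id, version: 0, revisions: 0 };
}

// Returns the two rows to write when the latest row (version 0) changes.
function reviseItem(latest, changes) {
  // The archived copy keeps the old values; its version is the old revisions + 1.
  const archived = { ...latest, version: latest.revisions + 1 };
  // The latest row stays at version 0 and counts one more revision.
  const updated = {
    ...latest,
    ...changes,
    version: 0,
    revisions: latest.revisions + 1,
  };
  return { archived, updated };
}

const first = createItem(9501, { color: 'red' });
const { archived, updated } = reviseItem(first, { color: 'orange' });
console.log(archived); // old red row, now stored under version 1
console.log(updated);  // latest orange row, still version 0, revisions 1
```

How the two rows are actually written (one transaction, two separate calls, or a stream trigger) is a separate decision.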

Here is what the conversion looks like for a table with only a partition key:

Primary Key: id

 id    color
 9501  violet
 9502  cyan
 9503  magenta

Primary Key: id + version

 id    version  revisions  color
 9501  0        6          violet
 9501  1        0          red
 9501  2        1          orange
 9501  3        2          yellow
 9501  4        3          green
 9501  5        4          blue
 9501  6        5          indigo

Here is the conversion of a table that already uses a sort key:

Primary Key: id + date

 id    date     color
 9501  2018-01  violet
 9501  2018-02  cyan
 9501  2018-03  black

Primary Key: id + date_ver

 id    date_ver     revisions  color
 9501  2018-01__v0  6          violet
 9501  2018-01__v1  0          red
 9501  2018-01__v2  1          orange
 9501  2018-01__v3  2          yellow
 9501  2018-01__v4  3          green
 9501  2018-01__v5  4          blue
 9501  2018-01__v6  5          indigo

Alternative No. 2:

 id    date_ver     revisions  color
 9501  2018-01      6          violet
 9501  2018-01__v1  0          red
 9501  2018-01__v2  1          orange
 9501  2018-01__v3  2          yellow
 9501  2018-01__v4  3          green
 9501  2018-01__v5  4          blue
 9501  2018-01__v6  5          indigo
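As a small illustration of the two date_ver layouts above, a helper along these lines could build the sort key. The function name, the bareLatest flag, and the separator constant are assumptions made for this sketch, not something the tables prescribe.

```javascript
// Separator and function name are illustrative assumptions.
const VERSION_SEPARATOR = '__v';

// bareLatest = false gives the first layout (latest row is "<date>__v0");
// bareLatest = true gives Alternative No. 2 (latest row keeps the plain date).
function versionedSortKey(date, version, bareLatest = false) {
  if (version === 0 && bareLatest) {
    return date;
  }
  return `${date}${VERSION_SEPARATOR}${version}`;
}

console.log(versionedSortKey('2018-01', 0));       // 2018-01__v0
console.log(versionedSortKey('2018-01', 0, true)); // 2018-01
console.log(versionedSortKey('2018-01', 6));       // 2018-01__v6
```

Keep in mind that string sort keys are compared lexicographically, so "2018-01__v10" sorts before "2018-01__v2". If the order of archived versions matters in queries, zero-padding the version number is one way to deal with that.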

In effect, we can either keep previous versions in the same table or split them off into their own table. Both options have their advantages and disadvantages.

Using the same table:

The primary key consists of a partition key and a sort key
The version must be part of the sort key, either on its own as a number or appended to the existing sort key as a string

Benefits:

All data lives in one table.

Disadvantages:

May restrict how you use the table's sort key
Versioning consumes the same write capacity units as your main table
Sort keys can only be set when the table is created
You may need to rework your code to request v0
Previous versions are also picked up by indexes

Using additional tables:

Add the revisions attribute to both tables
If no sort key is in use, create a sort key named version on the secondary table. The primary table will always have version: 0 . Using this attribute in the primary table is optional.
If you are already using a sort key, see "Alternative No. 2" above

Benefits:

The primary table needs no key changes and does not have to be recreated. get requests do not change.
The primary table keeps its sort key
The secondary table can have independent read and write capacity units
The secondary table has its own indexes

Disadvantages:

Requires managing a second table

Regardless of how you decide to split the data, we now have to decide how to create the revision rows. Here are a few different methods:

On-demand, synchronous item overwrite / update and revision insert

Summary: Get the current version of the row. Update the current row and insert the previous version in a single transaction.

To avoid race conditions, we need to perform the update and the insert in the same operation using TransactWriteItems . In addition, we need to make sure that the version we are updating is still the correct one by the time the request reaches the database server. We achieve this with either one of two checks, or both:

In the Update action in TransactItems , the ConditionExpression must verify that revisions in the row being updated matches revisions in the object we retrieved with Get earlier.
In the Put action in TransactItems , the ConditionExpression checks that the row does not exist yet.
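A minimal sketch of what the two checks might look like together in one TransactWriteItems request, assuming DocumentClient-style plain values and a single table whose primary key is id + version. The function name, the table name, and the shape of the changes object are hypothetical; the function only builds the request object and does not call DynamoDB.

```javascript
// Builds a TransactWriteItems request (DocumentClient-style plain values).
// Assumptions: primary key is id + version, and `changes` maps attribute
// names to their new values.
function buildVersionedTransaction(tableName, previousRow, changes) {
  // Placeholders avoid clashes with DynamoDB reserved words.
  const names = { '#rev': 'revisions' };
  const values = {
    ':newRevisionCount': previousRow.revisions + 1,
    ':expectedRevisionCount': previousRow.revisions,
  };
  const assignments = ['#rev = :newRevisionCount'];
  Object.keys(changes).forEach((attribute, index) => {
    names[`#a${index}`] = attribute;
    values[`:v${index}`] = changes[attribute];
    assignments.push(`#a${index} = :v${index}`);
  });

  return {
    TransactItems: [
      {
        // Check 1: only update v0 if nobody changed revisions since our Get.
        Update: {
          TableName: tableName,
          Key: { id: previousRow.id, version: 0 },
          UpdateExpression: `SET ${assignments.join(', ')}`,
          ConditionExpression: '#rev = :expectedRevisionCount',
          ExpressionAttributeNames: names,
          ExpressionAttributeValues: values,
        },
      },
      {
        // Check 2: only archive the old values if that version row is new.
        Put: {
          TableName: tableName,
          Item: { ...previousRow, version: previousRow.revisions + 1 },
          ConditionExpression: 'attribute_not_exists(id)',
        },
      },
    ],
  };
}

const previousRow = { id: 9501, version: 0, revisions: 0, color: 'red' };
const request = buildVersionedTransaction('colors', previousRow, { color: 'orange' });
console.log(JSON.stringify(request, null, 2));
```

If either condition fails, the whole transaction is rejected, so the caller would re-read v0 and try again or give up.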

Cost

1 read capacity unit per 4 KB for the Get on v0
1 write capacity unit to prepare the TransactWriteItems
1 write capacity unit per 1 KB for the Put / Update on v0
1 write capacity unit per 1 KB for the Put of the revision
1 write capacity unit to commit the TransactWriteItems

Notes:

Items are limited to 400 KB

On-demand, asynchronous item get, overwrite / update, and revision insert

Summary: Get and keep the current row. When overwriting or updating the row, check the current revisions value and increment revisions . Insert the previously saved row with its version number.

Run update with

 {
   UpdateExpression: 'SET revisions = :newRevisionCount',
   ExpressionAttributeValues: {
     ':newRevisionCount': previousRow.revisions + 1,
     ':expectedRevisionCount': previousRow.revisions,
   },
   ConditionExpression: 'revisions = :expectedRevisionCount',
 }

We can use the same ConditionExpression with put when overwriting an existing row.

Watch the response for a ConditionalCheckFailedException . If it is raised, the revision has already been changed by another process, and we must either repeat the process from the very beginning or abort it completely. If there is no exception, we can insert the previously saved row after setting its version attribute accordingly (number or string).

Cost

1 read capacity unit per 4 KB for the Get on v0
1 write capacity unit per 1 KB for the Put / Update on v0
1 write capacity unit per 1 KB for the Put of the revision

On-demand, asynchronous blind item update and revision insert

Summary: Perform a "blind" update to v0, incrementing revisions and requesting the old attributes. Use the return value to create a new row with the version number.

Run update-item with

 {
   UpdateExpression: 'ADD revisions :revisionIncrement',
   ExpressionAttributeValues: {
     ':revisionIncrement': 1,
   },
   ReturnValues: 'ALL_OLD',
 }

The ADD action automatically creates revisions if it does not exist, treating it as 0 . Another nice benefit, from the documentation for ReturnValues:

There is no additional cost associated with requesting a return value aside from the small network and processing overhead of receiving a larger response. No read capacity units are consumed.

In the update response, the Attributes value holds the data from the old record. The version of that record is Attributes.revisions + 1 . Set the value of the version attribute accordingly (number or string).

Now you can insert this record into your target table.
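Turning the returned Attributes into the row to archive might look like the following sketch. It assumes DocumentClient-style plain values and a numeric version attribute; the function name is hypothetical. It also covers two edge cases: the update created the item (so there are no old attributes to archive) and the old row had no revisions attribute yet.

```javascript
// `attributes` stands for the Attributes field of the update response
// (DocumentClient-style plain values). The function name is hypothetical.
function revisionFromOldAttributes(attributes) {
  // With ALL_OLD, nothing is returned when the update created the item,
  // so there is no previous version to archive.
  if (!attributes) {
    return null;
  }
  // ADD treats a missing revisions attribute as 0, so mirror that here.
  const previousRevisions = attributes.revisions || 0;
  return {
    ...attributes,
    revisions: previousRevisions,
    version: previousRevisions + 1,
  };
}

const oldRow = { id: 9501, version: 0, revisions: 2, color: 'orange' };
console.log(revisionFromOldAttributes(oldRow));    // same values, version 3
console.log(revisionFromOldAttributes(undefined)); // null
```

The returned object can then be passed to a put on the target table. For a string sort key, the version value would be formatted into the key instead.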

Cost

1 write capacity unit per 1 KB for the Update on v0
1 write capacity unit per 1 KB for the Put of the revision

Notes:

The length of the returned Attributes object is limited to 65535.
There is no solution for overwriting rows.

Automatic asynchronous revision insert

Summary: Perform "blind" updates and inserts on the primary table, incrementing revisions as you go. Use a Lambda trigger that watches for changes in revisions to insert the revision rows asynchronously.

Run update with

 {
   UpdateExpression: 'ADD revisions :revisionIncrement',
   ExpressionAttributeValues: {
     ':revisionIncrement': 1,
   },
 }

The ADD action automatically creates revisions if it does not exist, treating it as 0 .

When overwriting rows with put , set an incremented revisions value based on a previous get request.

Configure the DynamoDB Stream view type to return both new and old images. Set up a Lambda trigger for the table. Here is sample Node.js code that compares the old and new images and calls a function to write the revisions in a batch.

 /**
  * @param {AWSLambda.DynamoDBStreamEvent} event
  * @return {void}
  */
 export function handler(event) {
   // Keep only modifications where the revisions counter changed.
   const oldRevisions = event.Records
     .filter(record => record.dynamodb.OldImage
       && record.dynamodb.NewImage
       && record.dynamodb.OldImage.revisions.N !== record.dynamodb.NewImage.revisions.N)
     .map(record => record.dynamodb.OldImage);
   // batchWriteRevisions is not defined here; it stands for your own batch writer.
   batchWriteRevisions(oldRevisions);
 }

This is only an example; working code will most likely need more checks.
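The batchWriteRevisions function called by the handler is left undefined in the answer. One possible first half of it is sketched below: it turns old stream images into BatchWriteItem requests. This is an assumption about how such a helper could work, not the author's code. The function name buildRevisionBatches and the table name are hypothetical, the images are assumed to be in the DynamoDB JSON format that streams deliver (for example { N: "2" }), and version is assumed to be a numeric attribute. The sketch only builds the request objects and leaves sending them to your SDK client.

```javascript
// BatchWriteItem accepts at most 25 put/delete requests per call.
const MAX_BATCH_SIZE = 25;

// Builds BatchWriteItem request objects from old stream images.
// Images use DynamoDB JSON, so numbers arrive as { N: '<digits>' }.
function buildRevisionBatches(tableName, oldImages) {
  const requests = oldImages.map((image) => {
    // A missing revisions attribute is treated as 0, as the ADD action does.
    const previousRevisions = image.revisions ? Number(image.revisions.N) : 0;
    return {
      PutRequest: {
        // Archive the old values under version = old revisions + 1.
        Item: { ...image, version: { N: String(previousRevisions + 1) } },
      },
    };
  });

  // Split the requests into chunks that fit into one BatchWriteItem call.
  const batches = [];
  for (let start = 0; start < requests.length; start += MAX_BATCH_SIZE) {
    batches.push({
      RequestItems: {
        [tableName]: requests.slice(start, start + MAX_BATCH_SIZE),
      },
    });
  }
  return batches;
}

const images = Array.from({ length: 30 }, (_, index) => ({
  id: { N: String(9501 + index) },
  version: { N: '0' },
  revisions: { N: '2' },
  color: { S: 'orange' },
}));
console.log(buildRevisionBatches('color_revisions', images).length); // 2
```

A complete writer would also resend anything that comes back in the UnprocessedItems field of the BatchWriteItem response.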

Cost

1 read capacity unit per 4 KB for the Get on v0 (only when overwriting)
1 write capacity unit per 1 KB for the Put / Update on v0
1 DynamoDB Streams read request unit per GetRecords call
1 write capacity unit per 1 KB for the Put of the revision

Notes:

DynamoDB Stream shard data expires after 24 hours
DynamoDB Streams read request units are independent of table read capacity units
Using Lambda functions has its own cost
Changing the stream view type requires disabling and re-enabling the stream
Works with writes made through Put, Update, BatchWriteItem, and TransactWriteItems

For my use cases, I already use DynamoDB Streams and do not expect users to request versioned rows very often. I can also let users wait a little until the revisions are ready, since they are written asynchronously. This makes a second table with an automated Lambda process the better fit for me.

The asynchronous options have several points of failure. However, a failed step is something you can either retry immediately on demand or schedule for later with a DynamoDB Stream solution.

If anyone has any other solutions or criticisms, please comment. Thanks!

+14

Shortfuse Feb 08 '19 at 21:30


You can also achieve this by maintaining two separate tables. One for the most recent items, and another for their versions. I wrote a blog post with a detailed explanation https://www.efekarakus.com/2018/05/25/client-side-row-versioning-in-dynamo-db.html

A resource table, where the hash is the primary key.

 +----------+---------+-------------------+
 | hash     | version | attr1..attrN      |
 +----------+---------+-------------------+
 | 1c5815b2 | 2       | some values       |
 +----------+---------+-------------------+

A resource history table, where the hash is the partition key and the version is the sort key.

 +----------+---------+-------------------+
 | hash     | version | attr1..attrN      |
 +----------+---------+-------------------+
 | 1c5815b2 | 2       | some values       |
 +----------+---------+-------------------+
 | 1c5815b2 | 1       | some old values   |
 +----------+---------+-------------------+

The important part is that any action that modifies a record must increment its version number.

When you create or update a resource, first write to the resource history table, and then to the resource table.

I found this to be a little cleaner, because you will not run into possible data loss scenarios, as when working with immutable data in a single table.

+8

Efe karakus Mar 26 '18 at 15:34


Amazon made a recommendation on how to implement version control in DynamoDB: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-sort-keys.html#bp-sort-keys-version-control

Using the sort key as the version, you can make sure that the latest version always comes first (for example, under the "v0_" prefix), with the remaining keys arranged sequentially after it. They also suggest copying v0_latest to a numbered key such as "v00x_" so that it becomes the last key for queries that want to walk the version history in order.

See this link for full details.

+4

kos Aug 7 '18 at 3:17


Erben mo · Accepted Answer · 2014-06-17T23:50:34+0000

I do not think that the DynamoDB service currently supports row versioning. If you need a versioning feature, you will have to implement it on the client side.

In DynamoDB, a row is uniquely identified by its primary key. The primary key can be either HashKey only or HashKey + RangeKey. If you want to distinguish different versions of the same row, you need to put the version number somewhere in the primary key.

For example, you can append the version number to the hash key for all old versions of the row. The row with the latest version keeps the original hash.

 Hash    Attr  Version
 hey     a2    2
 hey_v1  a1    1

After updating the row to version 3, the table should look like this:

 Hash    Attr  Version
 hey     a3    3
 hey_v1  a1    1
 hey_v2  a2    2

Client-side versioning is never perfect. For example, with the above approach, if you run a scan, you will also get hey_v1 and hey_v2. Let me know whether this works for you. If you have a better way to do client-side versioning, please post it here.
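As an illustration of the hash-suffix scheme in this answer, the two rows to write on an update could be derived as in the sketch below. The function name is hypothetical, and Hash, Attr, and Version simply mirror the columns of the example table.

```javascript
// Hypothetical helper; Hash, Attr, and Version mirror the example table columns.
function rowsAfterUpdate(latest, newAttr) {
  // The old values move to a suffixed hash key such as "hey_v2".
  const archived = { ...latest, Hash: `${latest.Hash}_v${latest.Version}` };
  // The latest row keeps the original hash and gets the next version number.
  const updated = { ...latest, Attr: newAttr, Version: latest.Version + 1 };
  return { updated, archived };
}

const rows = rowsAfterUpdate({ Hash: 'hey', Attr: 'a2', Version: 2 }, 'a3');
console.log(rows.updated);  // latest row: Hash "hey", Attr "a3", Version 3
console.log(rows.archived); // archived row: Hash "hey_v2", Attr "a2", Version 2
```

Because the version lives in the hash key here, fetching the history of one row means a get per version or a scan, rather than a single query.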
