Using AWS Data Pipeline to back up DynamoDB data to S3

I need to back up DynamoDB table data to S3 using AWS Data Pipeline.

My question is: can I use a single Data Pipeline to back up several DynamoDB tables to S3, or do I need a separate pipeline for each of them?

Also, since my tables have a year_month prefix (e.g. 2014_3_tableName), I was thinking of using the Data Pipeline SDK to change the table name in the pipeline definition when the month changes. Will this work? Is there an alternative or better way?

Thanks!!

2 answers

If you create your pipeline through the Import/Export button in the DynamoDB console, you will end up with a separate pipeline for each table. If you use Data Pipeline directly (either through the Data Pipeline API or through the Data Pipeline console), you can export multiple tables in the same pipeline. For each table, simply add an additional DynamoDBDataNode and an EmrActivity that connects that data node to the output S3DataNode.
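
To make that layout concrete, here is a rough sketch (untested) using the Python SDK (boto3). Everything in it, including the pipeline id, object ids, table names, and the EMR step string, is a placeholder for illustration; the real values are easiest to copy from a pipeline generated by the console's export template.

```python
import boto3

# Rough, untested sketch of one pipeline that exports two DynamoDB tables.
# The pipeline id, object ids, table names, and the EMR step string are
# placeholders; the real step string and the S3DataNode / EmrCluster /
# Schedule objects are best copied from a console-generated export pipeline.
dp = boto3.client("datapipeline", region_name="us-east-1")

def ddb_node(obj_id, table_name):
    # One DynamoDBDataNode per table to export.
    return {
        "id": obj_id,
        "name": obj_id,
        "fields": [
            {"key": "type", "stringValue": "DynamoDBDataNode"},
            {"key": "tableName", "stringValue": table_name},
            {"key": "readThroughputPercent", "stringValue": "0.25"},
        ],
    }

def export_activity(obj_id, input_id, output_id):
    # One EmrActivity per table; all of them can run on the same EMR cluster
    # and reference the same output S3DataNode.
    return {
        "id": obj_id,
        "name": obj_id,
        "fields": [
            {"key": "type", "stringValue": "EmrActivity"},
            {"key": "input", "refValue": input_id},
            {"key": "output", "refValue": output_id},
            {"key": "runsOn", "refValue": "EmrClusterForBackup"},
            # Placeholder: paste the export step string from the console template.
            {"key": "step", "stringValue": "<emr-ddb-export-step-string>"},
        ],
    }

objects = [
    ddb_node("DDBTable1", "2014_3_tableOne"),
    ddb_node("DDBTable2", "2014_3_tableTwo"),
    export_activity("ExportTable1", "DDBTable1", "S3Output"),
    export_activity("ExportTable2", "DDBTable2", "S3Output"),
    # ...plus the S3DataNode ("S3Output"), EmrCluster ("EmrClusterForBackup"),
    # and Schedule objects from the standard export template.
]

dp.put_pipeline_definition(pipelineId="df-EXAMPLE", pipelineObjects=objects)
dp.activate_pipeline(pipelineId="df-EXAMPLE")
```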

As for your year_month prefix use case, using the SDK to change the table name periodically seems like the best approach. Another approach would be to make a copy of the export script that the EmrActivity runs (you can see the script's location in the activity's "step" field) and change the way that Hive script determines the table name so that it derives it from the current date. You would then place the modified script in your own S3 bucket and point the EmrActivity at that location instead of the default one. I have not tried either approach myself, but both are theoretically possible.
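
For the SDK route, a minimal sketch (untested, assuming boto3 and the year_month naming from the question) could fetch the current definition, rewrite the tableName field of every DynamoDBDataNode to the current prefix, and push the definition back before the next scheduled run:

```python
import datetime
import boto3

dp = boto3.client("datapipeline", region_name="us-east-1")
PIPELINE_ID = "df-EXAMPLE"  # hypothetical pipeline id

today = datetime.date.today()
prefix = f"{today.year}_{today.month}_"  # e.g. "2014_3_"

# Fetch the existing definition, rewrite tableName fields, and push it back.
# Note: if the pipeline is parameterized, the table name may live in
# parameterValues rather than in the pipeline objects themselves.
definition = dp.get_pipeline_definition(pipelineId=PIPELINE_ID)

for obj in definition["pipelineObjects"]:
    for field in obj["fields"]:
        if field.get("key") == "tableName" and "stringValue" in field:
            # Strip any existing "YYYY_M_" prefix, then apply the current one.
            base = field["stringValue"].split("_", 2)[-1]
            field["stringValue"] = prefix + base

dp.put_pipeline_definition(
    pipelineId=PIPELINE_ID,
    pipelineObjects=definition["pipelineObjects"],
)
```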

More general information about exporting DynamoDB tables can be found in the DynamoDB Developer Guide, and more information about pipelines can be found in the AWS Data Pipeline Developer Guide.


It's an old question, but I was looking for the answer recently. When adding multiple DynamoDBDataNodes, you can still use one single S3DataNode as the output. Just separate the exports into folders within the S3 bucket by specifying a different output.directoryPath at the end of the EmrActivity's "step" field.

Like this: #{output.directoryPath}/newFolder

Each new folder will be created automatically in the S3 bucket.
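
Put together with the first answer, the per-table EmrActivity step strings can share one S3DataNode but end in different subfolders, roughly like this (the jar/class portion is a placeholder to copy from a console-generated pipeline):

```python
# Sketch: per-table "step" values sharing one S3DataNode output but writing
# to separate subfolders. "<export-jar-and-class>" is a placeholder for the
# jar/class portion of a console-generated export step string.
step_table_one = "<export-jar-and-class>,#{output.directoryPath}/tableOne,#{input.tableName},0.25"
step_table_two = "<export-jar-and-class>,#{output.directoryPath}/tableTwo,#{input.tableName},0.25"
```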
