├── .github └── PULL_REQUEST_TEMPLATE.md ├── .gitignore ├── Architecture.png ├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── ImportExport.png ├── LICENSE ├── NOTICE ├── README.md ├── cost-calculator.xlsx ├── dist ├── dynamodb_continuous_backup-1.4.zip └── dynamodb_continuous_backup-1.5.zip └── src ├── .gitignore ├── build.sh ├── config.hjson ├── config.loc ├── deploy.py ├── deprovision_tables.py ├── dynamo_continuous_backup.py ├── index.py ├── provision_tables.py ├── provisioning_whitelist.hjson └── setup_existing_tables.py /.github/PULL_REQUEST_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | *Issue #, if available:* 2 | 3 | *Description of changes:* 4 | 5 | 6 | By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice. 7 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .project 2 | 3 | .pydevproject 4 | 5 | src/lib 6 | 7 | src/*.pyc 8 | -------------------------------------------------------------------------------- /Architecture.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awslabs/dynamodb-continuous-backup/62b4108af14eb797ad004e0a358f9b9b3e9d2dfe/Architecture.png -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check [existing open](https://github.com/awslabs/dynamodb-continuous-backup/issues), or [recently closed](https://github.com/awslabs/dynamodb-continuous-backup/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. 
You are working against the latest source on the *master* branch.
27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
29 | 
30 | To send us a pull request, please:
31 | 
32 | 1. Fork the repository.
33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
34 | 3. Ensure local tests pass.
35 | 4. Commit to your fork using clear commit messages.
36 | 5. Send us a pull request, answering any default questions in the pull request interface.
37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
38 | 
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 | 
42 | 
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/awslabs/dynamodb-continuous-backup/labels/help%20wanted) issues is a great place to start.
45 | 
46 | 
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 | 
52 | 
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
55 | 
56 | 
57 | ## Licensing
58 | 
59 | See the [LICENSE](https://github.com/awslabs/dynamodb-continuous-backup/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 | 
61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.
62 | 
--------------------------------------------------------------------------------
/ImportExport.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/awslabs/dynamodb-continuous-backup/62b4108af14eb797ad004e0a358f9b9b3e9d2dfe/ImportExport.png
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | 
2 |                                  Apache License
3 |                            Version 2.0, January 2004
4 |                         http://www.apache.org/licenses/
5 | 
6 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7 | 
8 |    1. Definitions.
9 | 
10 |       "License" shall mean the terms and conditions for use, reproduction,
11 |       and distribution as defined by Sections 1 through 9 of this document.
12 | 
13 |       "Licensor" shall mean the copyright owner or entity authorized by
14 |       the copyright owner that is granting the License.
15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. 
Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 
135 |       Notwithstanding the above, nothing herein shall supersede or modify
136 |       the terms of any separate license agreement you may have executed
137 |       with Licensor regarding such Contributions.
138 | 
139 |    6. Trademarks. This License does not grant permission to use the trade
140 |       names, trademarks, service marks, or product names of the Licensor,
141 |       except as required for reasonable and customary use in describing the
142 |       origin of the Work and reproducing the content of the NOTICE file.
143 | 
144 |    7. Disclaimer of Warranty. Unless required by applicable law or
145 |       agreed to in writing, Licensor provides the Work (and each
146 |       Contributor provides its Contributions) on an "AS IS" BASIS,
147 |       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148 |       implied, including, without limitation, any warranties or conditions
149 |       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150 |       PARTICULAR PURPOSE. You are solely responsible for determining the
151 |       appropriateness of using or redistributing the Work and assume any
152 |       risks associated with Your exercise of permissions under this License.
153 | 
154 |    8. Limitation of Liability. In no event and under no legal theory,
155 |       whether in tort (including negligence), contract, or otherwise,
156 |       unless required by applicable law (such as deliberate and grossly
157 |       negligent acts) or agreed to in writing, shall any Contributor be
158 |       liable to You for damages, including any direct, indirect, special,
159 |       incidental, or consequential damages of any character arising as a
160 |       result of this License or out of the use or inability to use the
161 |       Work (including but not limited to damages for loss of goodwill,
162 |       work stoppage, computer failure or malfunction, or any and all
163 |       other commercial damages or losses), even if such Contributor
164 |       has been advised of the possibility of such damages.
165 | 
166 |    9. Accepting Warranty or Additional Liability. While redistributing
167 |       the Work or Derivative Works thereof, You may choose to offer,
168 |       and charge a fee for, acceptance of support, warranty, indemnity,
169 |       or other liability obligations and/or rights consistent with this
170 |       License. However, in accepting such obligations, You may act only
171 |       on Your own behalf and on Your sole responsibility, not on behalf
172 |       of any other Contributor, and only if You agree to indemnify,
173 |       defend, and hold each Contributor harmless for any liability
174 |       incurred by, or claims asserted against, such Contributor by reason
175 |       of your accepting any such warranty or additional liability.
176 | 
--------------------------------------------------------------------------------
/NOTICE:
--------------------------------------------------------------------------------
1 | DynamoDB Continuous Backup Utility
2 | Copyright 2014-2017 Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # DynamoDB Continuous Backup Utility
2 | 
3 | Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit millisecond latency at any scale. For data durability, Tables are automatically distributed across 3 facilities in an AWS Region of your choice, ensuring continuous operation even in the case of an AZ-level interruption of service.
4 | 
5 | DynamoDB can be [backed up using Amazon Data Pipeline](http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part2.html), which creates full point-in-time copies of DynamoDB tables on Amazon S3. If you want to restore data from a point in time, you simply [reimport that data into a new table](http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb-part1.html).
6 | 
7 | __PLEASE NOTE THAT THIS UTILITY IS NOW DEPRECATED IN FAVOR OF [DYNAMODB POINT IN TIME RECOVERY](https://aws.amazon.com/dynamodb/pitr)__
8 | 
9 | ![Import Export Backup](ImportExport.png)
10 | 
11 | For some customers, this full backup and restore model works extremely well. Other customers need the ability to recover data at the item level, and at a much finer granularity than a periodic full backup and restore provides. For example, they may want to recover changes made to a single item within just a few minutes.
12 | 
13 | This module gives you the ability to configure continuous, streaming backup of all data in DynamoDB Tables to Amazon S3 via [AWS Lambda Streams to Firehose](https://github.com/awslabs/lambda-streams-to-firehose), which will propagate all changes to a DynamoDB Table to Amazon S3 in as little as 60 seconds. This module completely automates the provisioning process, to ensure that all Tables created in an Account over time are correctly configured for continuous backup. It does this by reacting to the table lifecycle API calls made in your Account: [Amazon CloudWatch Events](http://docs.aws.amazon.com/AmazonCloudWatch/latest/events/WhatIsCloudWatchEvents.html) subscribes to the DynamoDB `CreateTable` and `DeleteTable` events and then forwards them to an AWS Lambda function that automates the configuration of continuous backup. This includes:
14 | 
15 | * Configuring [DynamoDB Update Streams](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html) for the Table that's just been created
16 | * Creating an [Amazon Kinesis Firehose Delivery Stream](http://docs.aws.amazon.com/firehose/latest/dev/basic-create.html) for the required destination of the backups on Amazon S3
17 | * Deploying AWS Lambda Streams to Firehose as a Lambda function in the account
18 | * Routing the Table's `NEW_AND_OLD_IMAGES` update stream entries to the LambdaStreamToFirehose function
19 | 
20 | ![Architecture](Architecture.png)
21 | 
22 | By using this module, you can ensure that any Amazon DynamoDB Table that is created, whether as part of an application rollout or just by a developer during development, has continuous incremental backups enabled. Once this module is deployed, there are no ongoing operations needed to create these backups on S3.
23 | 
24 | When you delete DynamoDB Tables, this automation also ensures that the Kinesis Firehose Delivery Stream is deleted, but the backup data on Amazon S3 is retained.
25 | 
26 | # How much will it cost?
27 | 
28 | This solution adds AWS service components to achieve continuous backup of your DynamoDB data. The additional cost is made up of the following (prices may vary slightly by region; a rough estimation sketch follows the list):
29 | 
30 | * Adding an update stream to the DynamoDB table. This costs $0.02 per 100,000 reads after the first 2.5M reads per update stream
31 | * Adding a Kinesis Firehose Delivery Stream per table. This costs $0.035/GB ingested to the delivery stream
32 | * Backup data storage on S3. This costs ~$0.03/GB
33 | * CloudWatch Events. This costs $1.00 per million events.
34 | * AWS Lambda invocations to forward DynamoDB Update Stream data to Kinesis Firehose. This costs $0.20 per million invocations, after the first million.
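To get a feel for these numbers ahead of time, the sketch below estimates the incremental monthly cost of backing up one table from its write rate and average item size. The unit prices mirror the list above and the workload figures are illustrative assumptions; the repository also ships a `cost-calculator.xlsx` you can use instead, and you should verify current regional pricing before relying on either.

```
# Rough monthly cost estimate for continuous backup of a single table.
# Unit prices mirror the list above and are assumptions - verify against
# current regional pricing. The CloudWatch Events charge is omitted, since
# it is only incurred on CreateTable/DeleteTable calls, which are rare.

STREAM_READ_PRICE = 0.02 / 100000    # per stream read, after the free tier
STREAM_FREE_READS = 2500000          # free stream reads per month
FIREHOSE_PRICE_GB = 0.035            # per GB ingested by the delivery stream
S3_PRICE_GB = 0.03                   # per GB-month of backup storage
LAMBDA_PRICE = 0.20 / 1000000        # per invocation, after the free tier
LAMBDA_FREE_INVOCATIONS = 1000000
SECONDS_PER_MONTH = 30 * 24 * 3600


def estimate_monthly_cost(writes_per_second, avg_item_kb, batch_size=1000):
    writes = writes_per_second * SECONDS_PER_MONTH
    data_gb = writes * avg_item_kb / (1024.0 * 1024.0)
    invocations = writes / batch_size

    stream_cost = max(writes - STREAM_FREE_READS, 0) * STREAM_READ_PRICE
    firehose_cost = data_gb * FIREHOSE_PRICE_GB
    s3_cost = data_gb * S3_PRICE_GB
    lambda_cost = max(invocations - LAMBDA_FREE_INVOCATIONS, 0) * LAMBDA_PRICE

    return stream_cost + firehose_cost + s3_cost + lambda_cost


# for example: 100 writes/second of 2KB items is ~259M changes per month
print("Estimated monthly cost: $%.2f" % estimate_monthly_cost(100, 2))
```

With these example assumptions the estimate lands at roughly $80-85 per month, dominated by the update stream read charges.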
35 | 
36 | We believe that these costs are relatively low, but you should assess the cost implications of running this solution in your account, especially on tables with a very large number of write IOPS.
37 | 
38 | # Getting Started
39 | 
40 | ## Create the configuration
41 | 
42 | To get started with this function, simply clone the project, and then create a configuration file. This module uses [hjson](https://hjson.org) to make the configuration easy to maintain and read over time, and there's an example file in [config.hjson](src/config.hjson). You may need to work with your AWS Administrator to set up some of the IAM Roles with the correct permissions. You will only need one configuration for **all** tables in your account, and the following items must be configured:
43 | 
44 | * `region` - the AWS Region where you want the function deployed
45 | * `cloudWatchRoleArn` - IAM Role ARN which CloudWatch Events will use to invoke your Lambda function
46 | * `firehoseDeliveryBucket` - The S3 bucket where DynamoDB backup data should be stored
47 | * `firehoseDeliveryPrefix` - The prefix on S3 where DynamoDB backup data should be stored. The table name will be added automatically to this prefix, as will the date and time of the backup file
48 | * `firehoseDeliveryRoleArn` - the ARN of the IAM role that Kinesis Firehose will use to write to S3
49 | * `firehoseDeliverySizeMB` - size in MB of DynamoDB backup files to write to S3
50 | * `firehoseDeliveryIntervalSeconds` - output interval in seconds for backup files (minimum of 60)
51 | * `lambdaExecRoleArn` - IAM Role ARN that AWS Lambda uses to write to Kinesis Firehose
52 | * `streamsMaxRecordsBatch` - Number of update records to stream to the continuous backup function at one time. This number times your DDB record size must be < 128K
53 | * `tableNameMatchRegex` - Regular expression that is used to control which tables are provisioned for continuous backup. If omitted or invalid then it will not be used
54 | 
55 | An appendix with the structure of the required IAM role permissions is at the end of this document.
56 | 
57 | # Installing into your Account
58 | 
59 | In order to deploy this function into your AWS Account, you must first build the Lambda module with the provided configuration, and then deploy it to your account. The module is configured at account level, and runs for all DynamoDB tables created.
60 | 
61 | ## Build the Lambda Function
62 | 
63 | We now need to build the Lambda function so we can deploy it to your account. This means we need to add the configuration file into the archive that will be run as an AWS Lambda function. To do this, run:
64 | 
65 | ```
66 | cd src
67 | ./build.sh <config-file>
68 | ```
69 | 
70 | where `<config-file>` is the file you just created, ideally in the same directory as src. This will install the required modules into the `lib` folder if they aren't there, and then create a Zip Archive which is used to deploy the Lambda function.
71 | 
72 | 
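Because a missing or malformed configuration value only surfaces once the deployed function runs, it can be worth sanity-checking the file you compile into the archive. Below is a minimal sketch of such a check; the `check_config.py` name and its required-key list (which mirrors the configuration bullets above) are illustrative, not part of the module:

```
#!/usr/bin/env python
# Sanity-check a continuous backup configuration file before building.
# The required keys mirror the configuration bullets above.
import sys

import hjson

REQUIRED_KEYS = [
    'region', 'cloudWatchRoleArn', 'firehoseDeliveryBucket',
    'firehoseDeliveryPrefix', 'firehoseDeliveryRoleArn',
    'firehoseDeliverySizeMB', 'firehoseDeliveryIntervalSeconds',
    'lambdaExecRoleArn', 'streamsMaxRecordsBatch'
]

config = hjson.load(open(sys.argv[1], 'r'))
missing = [key for key in REQUIRED_KEYS if key not in config]

if missing:
    print("Missing configuration keys: %s" % ", ".join(missing))
    sys.exit(1)

if config['firehoseDeliveryIntervalSeconds'] < 60:
    print("firehoseDeliveryIntervalSeconds must be at least 60")
    sys.exit(1)

print("%s looks complete" % sys.argv[1])
```

You would run it as `python check_config.py config.hjson` before invoking `build.sh`.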
73 | ## Deploy to AWS Lambda
74 | 
75 | Now that the function is built, we need to prepare your account and deploy the Lambda function we've just built. By running:
76 | 
77 | ```
78 | cd src
79 | ./deploy.py --config-file <config-file>
80 | ```
81 | 
82 | We will:
83 | 
84 | * Subscribe CloudWatch Events to DynamoDB CreateTable and DeleteTable API Calls
85 | * Deploy the Lambda function
86 | * Subscribe the Lambda function to the CloudWatch Events Rule
87 | * Enable CloudWatch Events to invoke the Lambda function
88 | 
89 | ## Verify
90 | 
91 | Once deployed, you can verify the correct operation of this function by:
92 | 
93 | * Ensuring that there is a CloudWatch Events Rule for `dynamodb.amazonaws.com` for events `CreateTable` and `DeleteTable`
94 | * Checking that this CloudWatch Events Rule has the `EnsureDynamoBackup` Lambda function as its target
95 | * Creating a simple test DynamoDB table, and observing the CloudWatch Logs output from `EnsureDynamoBackup` indicating that it has provisioned the continuous backup. For example:
96 | 
97 | ```
98 | aws dynamodb create-table --region eu-west-1 --attribute-definitions AttributeName=MyHashKey,AttributeType=S --key-schema AttributeName=MyHashKey,KeyType=HASH --provisioned-throughput ReadCapacityUnits=1,WriteCapacityUnits=1 --table-name MyTestTable
99 | ```
100 | 
101 | This will result in log output such as:
102 | 
103 | ```
104 | START RequestId: 0afc60f9-7a6b-11e6-a7ee-c571d294a8c0 Version: $LATEST
105 | Loaded configuration from config.hjson
106 | Enabled Update Stream for MyDynamoTable
107 | Resolved DynamoDB Stream ARN: arn:aws:dynamodb:eu-west-1::table/MyDynamoTable/stream/2016-09-14T11:04:46.890
108 | Created new Firehose Delivery Stream arn:aws:firehose:eu-west-1::deliverystream/MyDynamoTable
109 | Resolved Firehose Delivery Stream ARN: arn:aws:firehose:eu-west-1::deliverystream/MyDynamoTable
110 | Processed 1 Events
111 | END RequestId: 0afc60f9-7a6b-11e6-a7ee-c571d294a8c0
112 | REPORT RequestId: 0afc60f9-7a6b-11e6-a7ee-c571d294a8c0 Duration: 3786.87 ms Billed Duration: 3800 ms Memory Size: 128 MB Max Memory Used: 51 MB
113 | ```
114 | 
115 | Please note that API Calls => CloudWatch Events => AWS Lambda propagation can take several minutes.
116 | 
117 | ## Activating continuous backup for existing tables
118 | 
119 | Once you have performed the above steps, continuous backup will be configured for all new Tables created in DynamoDB. If you would like to also provision continuous backup for the existing tables in your account, you can use the `provision_tables.py` script.
120 | 
121 | First, you need to indicate if you want all tables, or only a subset of tables, to be provisioned. You do this with a configuration file:
122 | 
123 | ```
124 | {
125 |   "provisionAll": (true|false),
126 |   "tableNames": [
127 |     "Table1",
128 |     "Table2",
129 |     "...",
130 |     "TableN"
131 |   ]
132 | }
133 | ```
134 | 
135 | This file, like others in the module, uses HJson. By setting `provisionAll` to `true`, the whitelist will be ignored and all Tables in your account will be configured for continuous backup. However, if you do not include the value, or set it to `false`, then only the tables listed in `tableNames` will be configured:
136 | 
137 | `python provision_tables.py my_table_whitelist.hjson`
138 | 
139 | You can use the `deprovision_tables.py` script in exactly the same way to tear down the continuous backup configuration.
140 | 
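Before running the provisioning script against a whitelist, it can be useful to see which of the listed tables actually exist in the target account, so that typos surface before any changes are attempted. A minimal preview sketch (the `preview_whitelist.py` script is illustrative and not part of the module):

```
#!/usr/bin/env python
# Preview which tables named in a whitelist file exist in the account.
import sys

import boto3
import hjson

whitelist = hjson.load(open(sys.argv[1], 'r'))
dynamo = boto3.client('dynamodb')

# list_tables is paginated, so walk every page
existing = set()
for page in dynamo.get_paginator('list_tables').paginate():
    existing.update(page['TableNames'])

if whitelist.get('provisionAll', False):
    targets = sorted(existing)
else:
    targets = whitelist.get('tableNames', [])

for name in targets:
    print("%-8s %s" % ('OK' if name in existing else 'MISSING', name))
```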
141 | # Limits
142 | 
143 | Please note that the default account limit is 20 Kinesis Firehose Delivery Streams, and this module will create one Firehose Delivery Stream per Table. If you require more, please file a [Limit Increase Request](https://aws.amazon.com/support/createCase?serviceLimitIncreaseType=kinesis-firehose-limits&type=service_limit_increase).
144 | 
145 | # Backup Data on S3
146 | 
147 | Data is backed up automatically via Amazon Kinesis Firehose. The Firehose Delivery Stream that is created as part of provisioning will have the same name as the DynamoDB table you create. The output path of the Firehose Delivery Stream will be the configured bucket and prefix, plus the table name, and then the date in format `YYYY/MM/DD/HH`. An example backup file containing two item modifications would look like:
148 | 
149 | ```
150 | {"Keys":{"MyHashKey":{"S":"abc"}},"NewImage":{"123":{"S":"asdfasdf"},"MyHashKey":{"S":"abc"}},"OldImage":{"123":{"S":"0921438-09"},"MyHashKey":{"S":"abc"}},"SequenceNumber":"19700000000011945700385","SizeBytes":45,"eventName":"MODIFY"}
151 | {"Keys":{"MyHashKey":{"S":"abc"}},"NewImage":{"123":{"S":"asdfasq223qdf"},"MyHashKey":{"S":"abc"}},"OldImage":{"123":{"S":"asdfasdf"},"MyHashKey":{"S":"abc"}},"SequenceNumber":"19800000000011945703002","SizeBytes":48,"eventName":"MODIFY"}
152 | ```
153 | 
154 | Every change made to the DynamoDB Item is stored sequentially in Amazon S3, using the date on which the Item was forwarded to Kinesis Firehose from the Update Stream. You may expect a propagation delay of a few seconds between the DynamoDB Item update/insert/delete time and the forwarding of the event to Kinesis Firehose. Firehose will then buffer data for the configured `firehoseDeliveryIntervalSeconds`.
155 | 
156 | # Filtering which tables are backed up
157 | 
158 | By default, all Tables in your account will be configured for backup. If you want to filter this down to a subset, you can supply a `tableNameMatchRegex`, which will check the name of the Table created in DynamoDB against the supplied regular expression. If it matches, then the Table will get backed up. If you supply an invalid regular expression, or there are other issues with the supplied configuration, __then the table will still be backed up__ - the filter fails open.
159 | 
160 | You can also go further by supplying your own code which validates whether a Table should be backed up. For instance, you might check a configuration file or database entry, or check other properties such as the number of IOPS. To implement your own function, simply add a new function to `dynamo_continuous_backup.py` that takes a single String argument (the DynamoDB table name) and returns a Boolean. Once done, you can register the function by setting its name as the implementation function [on line 53](src/dynamo_continuous_backup.py#L53):
161 | 
162 | ```
163 | def my_filter_function(dynamo_table_name):
164 |   # custom logic implementation to filter tables in/out of backup
165 |   ...
166 |   return (True|False)
167 | 
168 | optin_function = my_filter_function
169 | ```
170 | 
171 | # Prerequisites for setup & running
172 | 
173 | In order to use this module, you will need to have installed the following:
174 | 
175 | * Python
176 | * Boto3
177 | * HJson
178 | * ShortUUID
179 | * aws-cli
180 | 
181 | Installation of Python & Pip is beyond the scope of this document, but once installed, run:
182 | 
183 | ```
184 | pip install --upgrade boto3 awscli hjson shortuuid
185 | ```
186 | 
187 | On some systems you may need to run this with `sudo`, and on macOS you may need to add `--ignore-installed six`.
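A quick way to confirm that the modules imported by the scripts in `src/` are actually on your path (a throwaway check, not part of the module):

```
# Verify that each prerequisite module can be imported.
for module in ('boto3', 'hjson', 'shortuuid'):
    try:
        __import__(module)
        print("%s OK" % module)
    except ImportError:
        print("%s missing - try: pip install %s" % (module, module))
```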
188 | 
189 | When running the provision/deprovision scripts, you will need to provide access credentials that grant at least the following AWS service permissions:
190 | 
191 | ```
192 | "dynamodb:DescribeStream",
193 | "dynamodb:DescribeTable",
194 | "dynamodb:ListStreams",
195 | "dynamodb:ListTables",
196 | "dynamodb:UpdateTable",
197 | "firehose:CreateDeliveryStream",
198 | "firehose:DescribeDeliveryStream",
199 | "firehose:ListDeliveryStreams",
200 | "firehose:DeleteDeliveryStream",
201 | "lambda:AddPermission",
202 | "lambda:CreateEventSourceMapping",
203 | "lambda:GetEventSourceMapping",
204 | "lambda:GetFunction",
205 | "lambda:GetPolicy",
206 | "lambda:ListAliases",
207 | "lambda:ListEventSourceMappings",
208 | "lambda:ListFunctions",
209 | "lambda:UpdateEventSourceMapping",
210 | "lambda:DeleteEventSourceMapping"
211 | ```
212 | 
213 | 
214 | # Performing a Restore
215 | 
216 | ## Determining which data needs to be restored
217 | 
218 | The first step in performing a restoration operation is to determine which data must be restored. You can easily review all changes made to a Table Item over time by running queries via [Amazon EMR](https://aws.amazon.com/emr), using Presto or Hive integration with Amazon S3. First, provision an Amazon EMR Cluster, and ensure that it has Hive and Hue enabled as installed tools. Once up, [connect to Hue](http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/accessing-hue.html) and open a query editor.
219 | 
220 | The first thing we'll need is a [SerDe](https://cwiki.apache.org/confluence/display/Hive/SerDe) that allows us to read the complex, nested JSON structure that our DynamoDB Update Stream forwards to Kinesis Firehose. The OpenX JSON SerDe is excellent for this - to load it into your cluster, [follow the instructions for building the SerDe](https://github.com/rcongiu/Hive-JSON-Serde), upload the generated JAR to S3, and run:
221 | 
222 | `
223 | add jar s3://mybucket/prefix/json-serde-1.3.8-SNAPSHOT-jar-with-dependencies.jar;
224 | `
225 | 
226 | We'll now create a Hive Table on top of our backup location, which uses this SerDe to read the JSON and lets us query fine-grained details. This table is read-only over our backup data, and you can drop it at any time (`<backup_date>` and the bucket, prefix, and date parts of the location are placeholders for your own values):
227 | 
228 | ```
229 | create external table MyTable_<backup_date>_(
230 |   Keys map<string,map<string,string>>,
231 |   NewImage map<string,map<string,string>>,
232 |   OldImage map<string,map<string,string>>,
233 |   SequenceNumber string,
234 |   SizeBytes bigint,
235 |   eventName string)
236 | ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
237 | location 's3://backup-bucket/backup-prefix/MyTable/<year>/<month>/<day>/<hour>/';
238 | ```
239 | 
240 | Please note that with the above, we'd just be looking at a single hour's worth of backup changes. If you'd prefer to look at a whole day, you'd just remove the `<hour>` part of the table creation statement and location.
241 | 
242 | You can then query changes to your table, down to the item level, by using the following HQL:
243 | 
244 | ```
245 | select OldImage['attribute1']['s'],
246 |   NewImage['attribute1']['s'],
247 |   SequenceNumber,
248 |   SizeBytes,
249 |   EventName
250 | from MyTable_<backup_date>_
251 | where Keys['MyHashKey']['s'] = '<my hash key value>'
252 | order by SequenceNumber desc;
253 | ```
254 | 
255 | You can add as many different attributes from the item as needed, or use the `NewImage['attribute1']['s']` values in a where clause that matches items that indicate the need for restoration.
256 | 
257 | ## Restoring a DynamoDB Item
258 | 
259 | This module does not provide any direct function for updating an Item in DynamoDB, simply because there are many different ways you might want to do this, and because a manual change to an application table will likely need validation and approval. The queries above give you the ability to see how values changed over time and to make an educated decision about what the 'restored' values should be; it is highly likely that these changes should be introduced via the application itself, rather than by bypassing application logic and updating the database directly. However, every customer has different requirements, so please carefully consider the implications of updating your application's database before making any direct changes.
260 | 
261 | # Appendix 1: IAM Role Permissions
262 | 
263 | This module requires three roles in order to deliver data between CloudTrail, CloudWatch and Amazon Kinesis on your behalf. The following role policies are required, and the minimum set of permissions for each is shown.
264 | 
265 | ## cloudWatchRoleArn
266 | 
267 | IAM Role ARN which CloudWatch Events uses to invoke your AWS Lambda Function.
268 | 
269 | Trust Relationship: `events.amazonaws.com`
270 | 
271 | Predefined Policy: `CloudWatchEventsInvocationAccess`
272 | 
273 | ## firehoseDeliveryRoleArn
274 | 
275 | IAM Role ARN that Kinesis Firehose will use to write to S3. This role must have at least permissions to write to Amazon S3, find buckets, and create log events (`<bucket-name>` and `<account-number>` below are placeholders).
276 | 
277 | Trust Relationship: `firehose.amazonaws.com`
278 | 
279 | ```
280 | {
281 |     "Version": "2012-10-17",
282 |     "Statement": [
283 |         {
284 |             "Sid": "",
285 |             "Effect": "Allow",
286 |             "Action": [
287 |                 "s3:AbortMultipartUpload",
288 |                 "s3:GetBucketLocation",
289 |                 "s3:GetObject",
290 |                 "s3:ListBucket",
291 |                 "s3:ListBucketMultipartUploads",
292 |                 "s3:PutObject"
293 |             ],
294 |             "Resource": [
295 |                 "arn:aws:s3:::<bucket-name>",
296 |                 "arn:aws:s3:::<bucket-name>/*"
297 |             ]
298 |         },
299 |         {
300 |             "Sid": "",
301 |             "Effect": "Allow",
302 |             "Action": [
303 |                 "logs:PutLogEvents"
304 |             ],
305 |             "Resource": [
306 |                 "arn:aws:logs:eu-west-1:<account-number>:log-group:/aws/kinesisfirehose/*:log-stream:*"
307 |             ]
308 |         }
309 |     ]
310 | }
311 | ```
312 | 
313 | ## lambdaExecRoleArn
314 | 
315 | IAM Role ARN that AWS Lambda uses to write to Kinesis Firehose. This role must have rights to call PutRecord and PutRecordBatch on Kinesis Firehose.
316 | 317 | Trust Relationship: `lambda.amazonaws.com` 318 | 319 | ``` 320 | { 321 | "Version": "2012-10-17", 322 | "Statement": [ 323 | { 324 | "Sid": "Stmt1444729748000", 325 | "Effect": "Allow", 326 | "Action": [ 327 | "firehose:CreateDeliveryStream", 328 | "firehose:DescribeDeliveryStream", 329 | "firehose:ListDeliveryStreams", 330 | "firehose:PutRecord", 331 | "firehose:PutRecordBatch", 332 | "dynamodb:DescribeStream", 333 | "dynamodb:DescribeTable", 334 | "dynamodb:GetRecords", 335 | "dynamodb:GetShardIterator", 336 | "dynamodb:ListStreams", 337 | "dynamodb:ListTables", 338 | "dynamodb:UpdateTable", 339 | "logs:CreateLogGroup", 340 | "logs:CreateLogStream", 341 | "logs:PutLogEvents", 342 | "lambda:CreateFunction", 343 | "lambda:CreateEventSourceMapping", 344 | "lambda:ListEventSourceMappings", 345 | "iam:passrole", 346 | "s3:Get*", 347 | "s3:List*" 348 | ], 349 | "Resource": [ 350 | "*" 351 | ] 352 | } 353 | ] 354 | } 355 | ``` 356 | 357 | # License 358 | 359 | Licensed under the Apache License, 2.0. 360 | -------------------------------------------------------------------------------- /cost-calculator.xlsx: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awslabs/dynamodb-continuous-backup/62b4108af14eb797ad004e0a358f9b9b3e9d2dfe/cost-calculator.xlsx -------------------------------------------------------------------------------- /dist/dynamodb_continuous_backup-1.4.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awslabs/dynamodb-continuous-backup/62b4108af14eb797ad004e0a358f9b9b3e9d2dfe/dist/dynamodb_continuous_backup-1.4.zip -------------------------------------------------------------------------------- /dist/dynamodb_continuous_backup-1.5.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awslabs/dynamodb-continuous-backup/62b4108af14eb797ad004e0a358f9b9b3e9d2dfe/dist/dynamodb_continuous_backup-1.5.zip -------------------------------------------------------------------------------- /src/.gitignore: -------------------------------------------------------------------------------- 1 | /lib/ 2 | -------------------------------------------------------------------------------- /src/build.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | ver=1.5 4 | 5 | # Fail on any error: 6 | set -e 7 | 8 | # validate that the config file exists 9 | if [ ! -f $1 ]; then 10 | echo "$1 is not a valid file" 11 | usage 12 | fi 13 | 14 | # save the configuration file to the config.loc file 15 | if [ $# -ne 1 ]; then 16 | echo "Proceeding without a supplied configuration file. You must use the provided SAM or configure the backup Lambda function manually." 17 | else 18 | echo $1 | tr -d '\n' > config.loc 19 | fi 20 | 21 | if [ ! -d ../dist ]; then 22 | mkdir ../dist 23 | fi 24 | 25 | ARCHIVE=dynamodb_continuous_backup-$ver.zip 26 | 27 | # add required dependencies 28 | if [ ! -d lib/hjson ]; then 29 | pip install hjson -t lib 30 | fi 31 | 32 | if [ ! 
-d lib/shortuuid ]; then 33 | pip install shortuuid -t lib 34 | fi 35 | 36 | # bin the old zipfile 37 | if [ -f ../dist/$ARCHIVE ]; then 38 | echo "Removed existing Archive ../dist/$ARCHIVE" 39 | rm -Rf ../dist/$ARCHIVE 40 | fi 41 | 42 | cmd="zip -r ../dist/$ARCHIVE index.py dynamo_continuous_backup.py lib/" 43 | 44 | if [ $# -eq 1 ]; then 45 | cmd=`echo $cmd config.loc $1` 46 | fi 47 | 48 | echo $cmd 49 | 50 | eval $cmd 51 | 52 | echo "Generated new Lambda Archive ../dist/$ARCHIVE" 53 | -------------------------------------------------------------------------------- /src/config.hjson: -------------------------------------------------------------------------------- 1 | { 2 | // target region - for example 'us-east-1' or 'eu-west-1' 3 | "region" : "eu-west-1", 4 | 5 | // firehose destination information 6 | "firehoseDeliveryBucket" : "my-backup-bucket", 7 | "firehoseDeliveryPrefix" : "dynamodb/backup", 8 | 9 | // the ARN of the IAM role that Kinesis Firehose will use to write to S3 10 | "firehoseDeliveryRoleArn" : "arn:aws:iam:::role/firehose_delivery_role", 11 | 12 | // size in MB of dynamo DB backup files to write to S3 13 | "firehoseDeliverySizeMB" : 128, 14 | 15 | // output interval in seconds for backup files 16 | "firehoseDeliveryIntervalSeconds" : 60, 17 | 18 | // IAM Role ARN for which AWS Lambda uses to write to Kinesis Firehose 19 | "lambdaExecRoleArn" : "arn:aws:iam:::role/LambdaExecRole", 20 | 21 | // IAM Role used by CloudWatch Events to read from AWS CloudTrail and call AWS Lambda 22 | "cloudWatchRoleArn" : "arn:aws:iam:::role/CloudWatchEventsRole", 23 | 24 | // number of update records to stream to the continuous backup function at one time. This number times your DDB record size must be < 128K 25 | "streamsMaxRecordsBatch" : 1000, 26 | 27 | // regular expression to run against incoming CreateTable events, to implement filtering of which tables are configured 28 | "tableNameMatchRegex": ".*" 29 | } -------------------------------------------------------------------------------- /src/config.loc: -------------------------------------------------------------------------------- 1 | config.hjson -------------------------------------------------------------------------------- /src/deploy.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import sys 4 | 5 | # add the lib directory to the path 6 | sys.path.append('lib') 7 | 8 | import boto3 9 | import botocore 10 | import argparse 11 | import shortuuid 12 | import hjson 13 | import json 14 | 15 | cwe_client = None 16 | lambda_client = None 17 | version = '1.5' 18 | LAMBDA_FUNCTION_NAME = 'EnsureDynamoBackup' 19 | DDB_CREATE_DELETE_RULE_NAME = 'DynamoDBCreateDelete' 20 | 21 | 22 | def configure_cwe(region, cwe_role_arn): 23 | # connect to CloudWatch Logs 24 | global cwe_client 25 | cwe_client = boto3.client('events', region_name=region) 26 | 27 | # determine if there's an existing rule in place 28 | rule_query_response = {} 29 | try: 30 | rule_query_response = cwe_client.describe_rule( 31 | Name=DDB_CREATE_DELETE_RULE_NAME 32 | ) 33 | except botocore.exceptions.ClientError as e: 34 | code = e.response['Error']['Code'] 35 | if code == 'ResourceNotFoundException': 36 | pass 37 | else: 38 | raise e 39 | 40 | if 'Arn' in rule_query_response: 41 | print "Resolved existing DynamoDB CloudWatch Event Subscriber Rule %s" % (rule_query_response['Arn']); 42 | 43 | return rule_query_response['Arn'] 44 | else: 45 | # create a cloudwatch events rule 46 | rule_response = 
cwe_client.put_rule( 47 | Name=DDB_CREATE_DELETE_RULE_NAME, 48 | EventPattern='{"detail-type":["AWS API Call via CloudTrail"],"detail":{"eventSource":["dynamodb.amazonaws.com"],"eventName":["DeleteTable","CreateTable"]}}', 49 | State='ENABLED', 50 | Description='CloudWatch Events Rule to React to DynamoDB Create and DeleteTable events', 51 | RoleArn=cwe_role_arn 52 | ) 53 | 54 | print "Created new CloudWatch Events Rule %s" % (rule_response["RuleArn"]) 55 | return rule_response["RuleArn"] 56 | 57 | 58 | def deploy_lambda_function(region, lambda_role_arn, cwe_rule_arn, force): 59 | # connect to lambda 60 | global lambda_client 61 | lambda_client = boto3.client('lambda', region_name=region) 62 | 63 | deployment_zip = open('../dist/dynamodb_continuous_backup-%s.zip' % (version), 'rb') 64 | deployment_contents = deployment_zip.read() 65 | deployment_zip.close() 66 | 67 | response = None 68 | function_arn = None 69 | try: 70 | response = lambda_client.create_function( 71 | FunctionName=LAMBDA_FUNCTION_NAME, 72 | Runtime='python2.7', 73 | Role=lambda_role_arn, 74 | Handler='index.event_handler', 75 | Code={ 76 | 'ZipFile': deployment_contents, 77 | }, 78 | Description="Function to ensure DynamoDB tables are configured with continuous backup", 79 | Timeout=300, 80 | MemorySize=128, 81 | Publish=True 82 | ) 83 | function_arn = response['FunctionArn'] 84 | 85 | print "Deployed new DynamoDB Ensure Backup Module to %s" % (function_arn) 86 | except botocore.exceptions.ClientError as e: 87 | code = e.response['Error']['Code'] 88 | if code == 'ResourceAlreadyExistsException' or code == 'ResourceConflictException': 89 | if force: 90 | response = lambda_client.update_function_code( 91 | FunctionName=LAMBDA_FUNCTION_NAME, 92 | ZipFile=deployment_contents, 93 | Publish=True 94 | ) 95 | function_arn = response['FunctionArn'] 96 | # store the arn with the version number stripped off 97 | function_arn = ":".join(function_arn.split(":")[:7]) 98 | 99 | print "Redeployed DynamoDB Ensure Backup Module to %s" % (response['FunctionArn']) 100 | else: 101 | response = lambda_client.get_function( 102 | FunctionName=LAMBDA_FUNCTION_NAME 103 | ) 104 | function_arn = response['Configuration']['FunctionArn'] 105 | 106 | print "Using existing DynamoDB Ensure Backup Module at %s" % (function_arn) 107 | else: 108 | raise e 109 | 110 | # query for a permission being granted to the function 111 | policy = None 112 | response_doc = None 113 | events_grant_ok = False 114 | try: 115 | policy = lambda_client.get_policy(FunctionName=LAMBDA_FUNCTION_NAME)['Policy'] 116 | response_doc = json.loads(policy) 117 | 118 | if 'Statement' in response_doc: 119 | # spin through and determine if an Allow grant has been made to CW Events to InvokeFunction 120 | for x in response_doc['Statement']: 121 | if 'Action' in x and x['Action'] == 'lambda:InvokeFunction' and x['Effect'] == 'Allow' and x['Principal']['Service'] == 'events.amazonaws.com': 122 | events_grant_ok = True 123 | break; 124 | except botocore.exceptions.ClientError as e: 125 | code = e.response['Error']['Code'] 126 | if code == 'ResourceNotFoundException': 127 | pass 128 | else: 129 | print e 130 | 131 | if events_grant_ok: 132 | print "Permission to execute Lambda function already granted to CloudWatch Events" 133 | else: 134 | # add a permission for CW Events to invoke this function 135 | response = lambda_client.add_permission( 136 | FunctionName=LAMBDA_FUNCTION_NAME, 137 | StatementId=shortuuid.uuid(), 138 | Action='lambda:InvokeFunction', 139 | 
Principal='events.amazonaws.com', 140 | SourceArn=cwe_rule_arn 141 | ) 142 | 143 | print "Granted permission to execute Lambda function to CloudWatch Events" 144 | 145 | return function_arn 146 | 147 | 148 | def create_lambda_cwe_target(lambda_arn): 149 | existing_targets = cwe_client.list_targets_by_rule( 150 | Rule=DDB_CREATE_DELETE_RULE_NAME 151 | ) 152 | 153 | if 'Targets' not in existing_targets or len(existing_targets['Targets']) == 0: 154 | cwe_client.put_targets( 155 | Rule=DDB_CREATE_DELETE_RULE_NAME, 156 | Targets=[ 157 | { 158 | 'Id': shortuuid.uuid(), 159 | 'Arn': lambda_arn 160 | } 161 | ] 162 | ) 163 | 164 | print "Created CloudWatchEvents Target for Rule %s" % (DDB_CREATE_DELETE_RULE_NAME) 165 | else: 166 | print "Existing CloudWatchEvents Rule has correct Target Function" 167 | 168 | 169 | def configure_backup(region, cwe_role_arn, lambda_role_arn, redeploy_lambda): 170 | # setup a CloudWatchEvents Rule 171 | cwe_rule_arn = configure_cwe(region, cwe_role_arn) 172 | 173 | # deploy the lambda function 174 | lambda_arn = deploy_lambda_function(region, lambda_role_arn, cwe_rule_arn, redeploy_lambda) 175 | 176 | # create a target for our CloudWatch Events Rule that points to the Lambda function 177 | create_lambda_cwe_target(lambda_arn) 178 | 179 | 180 | 181 | if __name__ == "__main__": 182 | parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter) 183 | parser.add_argument("--config-file", dest='config_file', action='store', required=False, help="Enter the path to the JSON or HJSON configuration file") 184 | parser.add_argument("--region", dest='region', action='store', required=False, help="Enter the destination region") 185 | parser.add_argument("--cw_role_arn", dest='cw_role_arn', action='store', required=False, help="The CloudWatch Events Role ARN") 186 | parser.add_argument("--lambda_role_arn", dest='lambda_role_arn', action='store', required=False, help="The Lambda Execution Role ARN") 187 | parser.add_argument("--redeploy", dest='redeploy', action='store_true', required=False, help="Redeploy the Lambda function?") 188 | args = parser.parse_args() 189 | 190 | if args.config_file != None: 191 | # load the configuration file 192 | config = hjson.load(open(args.config_file, 'r')) 193 | 194 | configure_backup(config['region'], config['cloudWatchRoleArn'], config['lambdaExecRoleArn'], args.redeploy) 195 | else: 196 | # no configuration file provided so we need region, CW Role and Lambda Exec role args 197 | if args.region == None or args.cw_role_arn == None or args.lambda_role_arn == None: 198 | parser.print_help() 199 | else: 200 | configure_backup(args.region, args.cw_role_arn, args.lambda_role_arn, args.redeploy) 201 | -------------------------------------------------------------------------------- /src/deprovision_tables.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import sys 4 | 5 | # add the lib directory to the path 6 | sys.path.append('lib') 7 | 8 | import setup_existing_tables 9 | import argparse 10 | 11 | if __name__ == "__main__": 12 | parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter) 13 | parser.add_argument('whitelist_configuration', help='whitelist_configuration.hjson') 14 | args = parser.parse_args() 15 | 16 | setup_existing_tables.deprovision(args.whitelist_configuration) 17 | -------------------------------------------------------------------------------- /src/dynamo_continuous_backup.py: 
-------------------------------------------------------------------------------- 1 | ''' 2 | Module which configures an account for continuous backup of DynamoDB tables via LambdaStreamsToFirehose. 3 | 4 | Monitors a provided AWS CloudTrail which is forwarded to Amazon CloudWatch Logs, and then uses an AWS Lambda 5 | function to ensure DynamoDB tables have UpdateStreams configured, that LambdaStreamsToFirehose is deployed 6 | with the DynamoDB UpateStream as the trigger, and that a Kinesis Firehose Delivery Stream is provided for 7 | data archive to S3 8 | ''' 9 | 10 | import os 11 | import re 12 | import sys 13 | 14 | # add the lib directory to the path 15 | sys.path.append('lib') 16 | 17 | import time 18 | import boto3 19 | import botocore 20 | import hjson 21 | 22 | 23 | config = None 24 | regex_pattern = None 25 | 26 | version = "1.0.2" 27 | 28 | ''' 29 | Function that checks if a table should be opted into backups based on a regular expression provided 30 | in the configuration file 31 | ''' 32 | def table_regex_optin(dynamo_table_name): 33 | try: 34 | if "tableNameMatchRegex" in config: 35 | global regex_pattern 36 | if regex_pattern == None: 37 | regex_pattern = re.compile(get_config_value('tableNameMatchRegex')) 38 | 39 | # check the regular expression match 40 | if regex_pattern.match(dynamo_table_name): 41 | return True 42 | else: 43 | return False 44 | else: 45 | # no regular expression matching in the configuration 46 | return True 47 | except: 48 | return True 49 | 50 | ''' 51 | Function reference for how to check whether tables should be backed up - change this 52 | to the specific implementation that you've provided 53 | 54 | Spec: boolean = f(string) 55 | ''' 56 | optin_function = table_regex_optin 57 | 58 | # constants - don't change these! 59 | REGION_KEY = 'AWS_REGION' 60 | LAMBDA_STREAMS_TO_FIREHOSE = "LambdaStreamToFirehose" 61 | LAMBDA_STREAMS_TO_FIREHOSE_VERSION = "1.5.1" 62 | LAMBDA_STREAMS_TO_FIREHOSE_BUCKET = "awslabs-code" 63 | LAMBDA_STREAMS_TO_FIREHOSE_PREFIX = "LambdaStreamToFirehose" 64 | CONF_LOC = 'config.loc' 65 | dynamo_client = None 66 | dynamo_resource = None 67 | current_region = None 68 | firehose_client = None 69 | lambda_client = None 70 | 71 | 72 | ''' 73 | Configuration accessor. Rule is to access the provided configuration first, and then fall back to Environment Variables 74 | ''' 75 | def get_config_value(key): 76 | if config != None and key in config: 77 | return config[key] 78 | elif key in os.environ: 79 | return os.environ[key] 80 | else: 81 | raise Exception("Unable to establish location of Config. %s not found" % (key)) 82 | 83 | 84 | ''' 85 | Initialise the module with the provided or default configuration 86 | ''' 87 | def init(config_override): 88 | global config 89 | global current_region 90 | global dynamo_client 91 | global dynamo_resource 92 | global firehose_client 93 | global lambda_client 94 | 95 | config_file_name = None 96 | 97 | if config == None: 98 | # read the configuration file name from the config.loc file 99 | if config_override == None: 100 | if os.path.isfile(CONF_LOC): 101 | config_file_name = open(CONF_LOC, 'r').read() 102 | print "Using compiled configuration %s" % (config_file_name) 103 | else: 104 | # there's no configuration override, and no config pointer file, so we'll use environment variables for config only 105 | print "No Configuration File supplied. 
Using Environment Variables"
106 |         else:
107 |             print "Using Config Override %s" % (config_override)
108 |             config_file_name = config_override
109 | 
110 |         # only load from a file when one was resolved; otherwise configuration comes from environment variables
111 |         if config_file_name != None:
112 |             config = hjson.load(open(config_file_name, 'r'))
113 |             print "Loaded configuration from %s" % (config_file_name)
114 |     # load the region from the context
115 |     if current_region == None:
116 |         try:
117 |             current_region = os.environ.get('AWS_DEFAULT_REGION', os.environ[REGION_KEY])
118 |             if current_region == None or current_region == '':
119 |                 raise KeyError
120 |         except KeyError:
121 |             raise Exception("Unable to resolve what region to use. Please set AWS_DEFAULT_REGION.")
122 | 
123 |     # connect to the required services
124 |     if dynamo_client == None:
125 |         dynamo_client = boto3.client('dynamodb', region_name=current_region)
126 |         dynamo_resource = boto3.resource('dynamodb', region_name=current_region)
127 |         firehose_client = boto3.client('firehose', region_name=current_region)
128 |         lambda_client = boto3.client('lambda', region_name=current_region)
129 | 
130 | 
131 | '''
132 | Check if a DynamoDB table has update streams enabled, and if not then turn it on
133 | '''
134 | def ensure_stream(table_name):
135 |     table = dynamo_resource.Table(table_name)
136 | 
137 |     # determine if the table has an update stream
138 |     stream_arn = None
139 |     if table.stream_specification == None or table.stream_specification["StreamEnabled"] == False:
140 |         # enable update streams
141 |         dynamo_client.update_table(
142 |             TableName=table_name,
143 |             StreamSpecification={
144 |                 'StreamEnabled': True,
145 |                 'StreamViewType': 'NEW_AND_OLD_IMAGES'
146 |             }
147 |         )
148 | 
149 |         # wait for the table to come out of 'UPDATING' status
150 |         ok = False
151 |         while not ok:
152 |             result = dynamo_client.describe_table(
153 |                 TableName=table_name
154 |             )
155 |             if result["Table"]["TableStatus"] == 'ACTIVE':
156 |                 ok = True
157 |                 print "Enabled Update Stream for %s" % (table_name)
158 | 
159 |                 stream_arn = result["Table"]["LatestStreamArn"]
160 |             else:
161 |                 # sleep for 1 second
162 |                 time.sleep(1)
163 |     else:
164 |         stream_arn = table.latest_stream_arn
165 | 
166 |     return stream_arn
167 | 
168 | 
169 | '''
170 | Create a new Firehose Delivery Stream
171 | '''
172 | def create_delivery_stream(for_table_name):
173 |     try:
174 |         response = firehose_client.create_delivery_stream(
175 |             DeliveryStreamName=get_delivery_stream_name(for_table_name),
176 |             S3DestinationConfiguration={
177 |                 'RoleARN': get_config_value('firehoseDeliveryRoleArn'),
178 |                 'BucketARN': 'arn:aws:s3:::' + get_config_value('firehoseDeliveryBucket'),
179 |                 'Prefix': "%s/%s/" % (get_config_value('firehoseDeliveryPrefix'), for_table_name),
180 |                 'BufferingHints': {
181 |                     'SizeInMBs': get_config_value('firehoseDeliverySizeMB'),
182 |                     'IntervalInSeconds': get_config_value('firehoseDeliveryIntervalSeconds')
183 |                 },
184 |                 'CompressionFormat': 'GZIP'
185 |             }
186 |         )
187 | 
188 |         print "Created new Firehose Delivery Stream %s" % (response["DeliveryStreamARN"])
189 | 
190 |         return response["DeliveryStreamARN"]
191 |     except botocore.exceptions.ClientError as e:
192 |         print e
193 |         raise e
194 | 
195 | 
196 | 
197 | '''
198 | Kinesis Firehose Delivery Stream Names are limited to 64 characters
199 | '''
200 | def get_delivery_stream_name(dynamo_table_name):
201 |     return dynamo_table_name[:64]
202 | 
203 | 
204 | '''
205 | Check that we have a Firehose Delivery Stream of the same name as the provided DynamoDB Table. If not, then create it
'''
Check that we have a Firehose Delivery Stream of the same name as the provided DynamoDB Table.
If not, then create it
'''
def ensure_firehose_delivery_stream(dynamo_table_name):
    response = None

    delivery_stream_name = get_delivery_stream_name(dynamo_table_name)

    ok = False
    tries = 0
    try_count = 100
    while not ok and tries < try_count:
        try:
            response = firehose_client.describe_delivery_stream(DeliveryStreamName=delivery_stream_name)
            ok = True
        except botocore.exceptions.ClientError as e:
            if e.response['Error']['Code'] == 'ResourceNotFoundException':
                # the stream doesn't exist - stop checking and create it below
                ok = True
                break
            if e.response['Error']['Code'] == 'LimitExceededException' or e.response['Error']['Code'] == 'ThrottlingException':
                # exponential backoff with a base of 100ms, capped at 3 seconds
                interval = min(.1 * pow(2, tries), 3)
                print "Limit Exceeded: Backing off for %s seconds" % (interval)
                time.sleep(interval)
                tries += 1
            else:
                raise e

    if not ok:
        raise Exception("Unable to resolve Firehose Delivery Stream presence in %s attempts. Aborting" % (try_count))
    else:
        if response and response["DeliveryStreamDescription"]["DeliveryStreamARN"]:
            delivery_stream_arn = response["DeliveryStreamDescription"]["DeliveryStreamARN"]

            return delivery_stream_arn
        else:
            # delivery stream doesn't exist, so create it
            delivery_stream_arn = create_delivery_stream(delivery_stream_name)

            return delivery_stream_arn
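
# Sketch of the retry schedule above: with a 100ms base doubled per attempt and
# capped at 3 seconds, throttled calls back off for roughly
#   0.1s, 0.2s, 0.4s, 0.8s, 1.6s, 3s, 3s, ...
# i.e. min(0.1 * 2 ** tries, 3) seconds on attempt number `tries`.
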
'''
Wire the DynamoDB Update Stream to LambdaStreamsToFirehose, if it isn't already connected
'''
def ensure_update_stream_event_source(dynamo_stream_arn):
    # ensure that we have a lambda streams to firehose function
    function_arn = ensure_lambda_streams_to_firehose()

    # map the dynamo update stream as a source for this function
    try:
        lambda_client.create_event_source_mapping(
            EventSourceArn=dynamo_stream_arn,
            FunctionName=function_arn,
            Enabled=True,
            BatchSize=get_config_value('streamsMaxRecordsBatch'),
            StartingPosition='TRIM_HORIZON'
        )
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'ResourceConflictException':
            # the event source mapping already exists
            pass
        else:
            raise e


'''
Deploy the LambdaStreamsToFirehose module (https://github.com/awslabs/lambda-streams-to-firehose) if it is not deployed already
'''
def ensure_lambda_streams_to_firehose():
    # make sure we have the LambdaStreamsToFirehose function deployed
    response = None
    try:
        response = lambda_client.get_function(FunctionName=LAMBDA_STREAMS_TO_FIREHOSE)
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'ResourceNotFoundException':
            pass
        else:
            raise e

    if response and response["Configuration"]["FunctionArn"]:
        function_arn = response["Configuration"]["FunctionArn"]
    else:
        deployment_package = "%s/%s-%s.zip" % (LAMBDA_STREAMS_TO_FIREHOSE_PREFIX, LAMBDA_STREAMS_TO_FIREHOSE, LAMBDA_STREAMS_TO_FIREHOSE_VERSION)

        # resolve the bucket based on region
        region_suffix = current_region

        deploy_bucket = "%s-%s" % (LAMBDA_STREAMS_TO_FIREHOSE_BUCKET, region_suffix)

        print "Deploying %s from s3://%s" % (deployment_package, deploy_bucket)
        try:
            response = lambda_client.create_function(
                FunctionName=LAMBDA_STREAMS_TO_FIREHOSE,
                Runtime='nodejs4.3',
                Role=get_config_value('lambdaExecRoleArn'),
                Handler='index.handler',
                Code={
                    'S3Bucket': deploy_bucket,
                    'S3Key': deployment_package
                },
                Description="AWS Lambda Streams to Kinesis Firehose Replicator",
                Timeout=300,
                MemorySize=128,
                Publish=True
            )

            function_arn = response["FunctionArn"]
            print "Created New Function %s:%s" % (LAMBDA_STREAMS_TO_FIREHOSE, function_arn)
        except botocore.exceptions.ClientError as e:
            if e.response['Error']['Code'] == 'ResourceConflictException':
                # the function was created after our check, so look it up again
                response = lambda_client.get_function(FunctionName=LAMBDA_STREAMS_TO_FIREHOSE)
                function_arn = response["Configuration"]["FunctionArn"]
            else:
                raise e

    return function_arn


'''
Removes a Firehose Delivery Stream, without affecting S3 in any way
'''
def delete_fh_stream(for_table_name):
    delivery_stream_name = get_delivery_stream_name(for_table_name)

    try:
        firehose_client.delete_delivery_stream(
            DeliveryStreamName=delivery_stream_name
        )

        print "Deleted Firehose Delivery Stream %s" % (delivery_stream_name)
    except botocore.exceptions.ClientError as e:
        if e.response['Error']['Code'] == 'ResourceNotFoundException':
            print "No Firehose Delivery Stream %s Found - OK" % (delivery_stream_name)
        else:
            raise e


'''
Remove the routing of any DynamoDB Update Streams to LambdaStreamsToFirehose
'''
def remove_stream_trigger(dynamo_table_name):
    # find any update streams that route to Lambda Streams to Firehose and remove them
    event_source_mappings = lambda_client.list_event_source_mappings(FunctionName=LAMBDA_STREAMS_TO_FIREHOSE)
    removed_stream_trigger = False

    for mapping in event_source_mappings['EventSourceMappings']:
        event_source_tokens = mapping['EventSourceArn'].split(":")

        # check if this is a DynamoDB event source
        event_source_service = event_source_tokens[2]

        if event_source_service == 'dynamodb':
            # check if the table matches
            event_source_table = event_source_tokens[5].split("/")[1]

            if event_source_table == dynamo_table_name:
                lambda_client.delete_event_source_mapping(UUID=mapping["UUID"])
                removed_stream_trigger = True

                print "Removed Event Source Mapping for DynamoDB Update Stream %s" % (mapping["EventSourceArn"])

    if not removed_stream_trigger:
        print "No DynamoDB Update Stream Triggers found routing to %s for %s - OK" % (LAMBDA_STREAMS_TO_FIREHOSE, dynamo_table_name)
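
# Example of the ARN parsing above, with a hypothetical account and table: an
# event source ARN such as
#   arn:aws:dynamodb:eu-west-1:123456789012:table/MyTable/stream/2016-09-01T00:00:00.000
# splits on ":" so that tokens[2] == "dynamodb" identifies the service, and
# tokens[5] starts with "table/MyTable/stream/" (any colons in the stream label
# fall into later tokens), so splitting tokens[5] on "/" yields "MyTable" at index 1.
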
'''
Provision a single table for DynamoDB backup
'''
def configure_table(dynamo_table_name):
    proceed = optin_function(dynamo_table_name)

    # ensure that the table has an update stream
    if proceed:
        dynamo_stream_arn = ensure_stream(dynamo_table_name)
        print "Resolved DynamoDB Stream ARN: %s" % (dynamo_stream_arn)

        # now ensure that we have a firehose delivery stream that will route to the backup location
        delivery_stream_arn = ensure_firehose_delivery_stream(dynamo_table_name)
        print "Resolved Firehose Delivery Stream ARN: %s" % (delivery_stream_arn)

        # wire the dynamo update stream to the deployed instance of lambda-streams-to-firehose
        ensure_update_stream_event_source(dynamo_stream_arn)
    else:
        print "Not configuring continuous backup for %s as it has been suppressed by the configured Opt-In function" % (dynamo_table_name)


'''
Remove continuous backup via Update Streams, without affecting backup data on S3
'''
def deprovision_table(dynamo_table_name):
    # remove routing of the update stream to lambda-streams-to-firehose
    remove_stream_trigger(dynamo_table_name)

    # remove the firehose delivery stream
    delete_fh_stream(dynamo_table_name)
--------------------------------------------------------------------------------
/src/index.py:
--------------------------------------------------------------------------------
'''
AWS Lambda function which receives CloudTrail events via CloudWatch Logs and implements DynamoDB Continuous Backups
'''

import sys

# add the lib directory to the path
sys.path.append('lib')

import dynamo_continuous_backup as backup

config = None
debug = True

def event_handler(event, context):
    if 'detail' in event and 'errorCode' in event['detail']:
        # anything that comes in with errors is ignored
        if debug:
            print "Suppressing errored API Call - detail: %s:%s" % (event['detail']['errorCode'], event['detail'].get('errorMessage'))
        return

    # initialise the ddb continuous backup manager
    backup.init(None)

    # handle unknown event types
    if 'detail' not in event or 'requestParameters' not in event["detail"] or event['detail']['eventSource'] != 'dynamodb.amazonaws.com':
        print "Unknown input event type"
        print event
    else:
        if debug:
            print event

        if event['detail']['eventName'] == "CreateTable":
            # resolve the table name
            dynamo_table_name = event["detail"]["requestParameters"]["tableName"]

            # configure the table for continuous backup
            backup.configure_table(dynamo_table_name)
        elif event['detail']['eventName'] == "DeleteTable":
            # resolve the table name
            dynamo_table_name = event["detail"]["requestParameters"]["tableName"]

            # deprovision the table for continuous backup
            backup.deprovision_table(dynamo_table_name)
        else:
            print "Unknown Event %s" % (event['detail']['eventName'])
            print event
--------------------------------------------------------------------------------
/src/provision_tables.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

import sys

# add the lib directory to the path
sys.path.append('lib')

import setup_existing_tables as setup
import argparse

if __name__ == "__main__":
    parser = argparse.ArgumentParser(formatter_class=argparse.ArgumentDefaultsHelpFormatter)
    parser.add_argument('whitelist_configuration', help='path to the provisioning whitelist hjson file')
    args = parser.parse_args()

    setup.provision(args.whitelist_configuration)
--------------------------------------------------------------------------------
/src/provisioning_whitelist.hjson:
--------------------------------------------------------------------------------
{
  "provisionAll": false,
  "tableNames": [
    "Table1",
    "Table2",
    // however many tables you want to opt in to the bulk operation
    "TableN"
  ]
}
--------------------------------------------------------------------------------
/src/setup_existing_tables.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python

'''
Module which gives customers the ability to provision existing DynamoDB tables for continuous backup
'''
import sys

# add the lib directory to the path
sys.path.append('lib')

import dynamo_continuous_backup
import boto3
import os
import hjson

REGION_KEY = 'AWS_REGION'
dynamo_client = None

def init():
    try:
        current_region = os.environ[REGION_KEY]

        if current_region is None or current_region == '':
            raise KeyError
    except KeyError:
        raise Exception("Unable to resolve environment variable %s" % REGION_KEY)

    global dynamo_client
    dynamo_client = boto3.client('dynamodb', region_name=current_region)


def resolve_table_list(config_file):
    # determine if there was a config file with a whitelist, or if we are provisioning all existing tables
    config = None
    if config_file is not None:
        print "Building Table List for Processing from %s" % (config_file)
        config = hjson.load(open(config_file, 'r'))

    table_list = []
    if config is None or config == [] or config["provisionAll"] == True:
        last_table_evaluated = None
        while True:
            if last_table_evaluated is None:
                list_table_result = dynamo_client.list_tables()
            else:
                list_table_result = dynamo_client.list_tables(ExclusiveStartTableName=last_table_evaluated)

            for x in list_table_result['TableNames']:
                table_list.append(x)

            if "LastEvaluatedTableName" in list_table_result:
                last_table_evaluated = list_table_result['LastEvaluatedTableName']
            else:
                break
    else:
        table_list = config["tableNames"]

    return table_list
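
# Illustrative behaviour: with the sample provisioning_whitelist.hjson above
# ("provisionAll": false), resolve_table_list(...) returns ["Table1", "Table2",
# "TableN"] without calling DynamoDB at all. With "provisionAll": true, or with
# no whitelist file, it pages through list_tables - which returns at most 100
# table names per call - until LastEvaluatedTableName is no longer present.
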

def provision_tables(table_list):
    for x in table_list:
        try:
            dynamo_continuous_backup.configure_table(x)
        except Exception as e:
            print "Exception while provisioning table %s" % (x)
            print e
            print "Proceeding..."


def deprovision_tables(table_list):
    for x in table_list:
        try:
            dynamo_continuous_backup.deprovision_table(x)
        except Exception as e:
            print "Exception while deprovisioning table %s" % (x)
            print e
            print "Proceeding..."


def deprovision(table_whitelist):
    init()

    table_list = resolve_table_list(table_whitelist)

    dynamo_continuous_backup.init(None)

    deprovision_tables(table_list)


def provision(table_whitelist):
    init()

    table_list = resolve_table_list(table_whitelist)

    dynamo_continuous_backup.init(None)

    provision_tables(table_list)
--------------------------------------------------------------------------------