├── .github
    └── PULL_REQUEST_TEMPLATE.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── LoadMovieData.py
├── NOTICE.txt
├── README.md
├── ddb-to-firehose.py
├── ddbathenablog_cf.yaml
└── moviedata.json


/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | *Issue #, if available:*
2 | 
3 | *Description of changes:*
4 | 
5 | 
6 | By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
7 | 
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | ## Code of Conduct
2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
4 | opensource-codeofconduct@amazon.com with any additional questions or comments.
5 | 
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
1 | # Contributing Guidelines
2 | 
3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional
4 | documentation, we greatly value feedback and contributions from our community.
5 | 
6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary
7 | information to effectively respond to your bug report or contribution.
8 | 
9 | 
10 | ## Reporting Bugs/Feature Requests
11 | 
12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features.
13 | 
14 | When filing an issue, please check [existing open](https://github.com/aws-samples/aws-s3-nosql-offloading-blog/issues), or [recently closed](https://github.com/aws-samples/aws-s3-nosql-offloading-blog/issues?utf8=%E2%9C%93&q=is%3Aissue%20is%3Aclosed%20), issues to make sure somebody else hasn't already
15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful:
16 | 
17 | * A reproducible test case or series of steps
18 | * The version of our code being used
19 | * Any modifications you've made relevant to the bug
20 | * Anything unusual about your environment or deployment
21 | 
22 | 
23 | ## Contributing via Pull Requests
24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that:
25 | 
26 | 1. You are working against the latest source on the *master* branch.
27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already.
28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted.
29 | 
30 | To send us a pull request, please:
31 | 
32 | 1. Fork the repository.
33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change.
34 | 3. Ensure local tests pass.
35 | 4. Commit to your fork using clear commit messages.
36 | 5. Send us a pull request, answering any default questions in the pull request interface.
37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
38 | 
39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and
40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/).
41 | 
42 | 
43 | ## Finding contributions to work on
44 | Looking at the existing issues is a great way to find something to contribute to. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any ['help wanted'](https://github.com/aws-samples/aws-s3-nosql-offloading-blog/labels/help%20wanted) issues is a great place to start.
45 | 
46 | 
47 | ## Code of Conduct
48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct).
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact
50 | opensource-codeofconduct@amazon.com with any additional questions or comments.
51 | 
52 | 
53 | ## Security issue notifications
54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue.
55 | 
56 | 
57 | ## Licensing
58 | 
59 | See the [LICENSE](https://github.com/aws-samples/aws-s3-nosql-offloading-blog/blob/master/LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution.
60 | 
61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes.
62 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | 
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this
4 | software and associated documentation files (the "Software"), to deal in the Software
5 | without restriction, including without limitation the rights to use, copy, modify,
6 | merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
7 | permit persons to whom the Software is furnished to do so.
8 | 
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
10 | INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
11 | PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
12 | HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
13 | OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
14 | SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
15 | 
--------------------------------------------------------------------------------
/LoadMovieData.py:
--------------------------------------------------------------------------------
1 | #########################################################################################
2 | # Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
3 | #
4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this
5 | # software and associated documentation files (the "Software"), to deal in the Software
6 | # without restriction, including without limitation the rights to use, copy, modify,
7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
8 | # permit persons to whom the Software is furnished to do so.
9 | #
10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
16 | #########################################################################################
17 | 
18 | import os, sys, time, decimal
19 | from decimal import *
20 | import boto3
21 | import json
22 | dynamodb = boto3.resource('dynamodb', region_name='us-east-1')
23 | table = dynamodb.Table('Movies')  # table name is hardcoded; it must match the deployed table
24 | 
25 | 
26 | def loadfile(infile):
27 |     jsonobj = json.load(open(infile))
28 |     lc = 0  # number of rows inserted so far
29 |     for movie in jsonobj:
30 |         lc += 1
31 |         CreateTime = int(time.time())
32 |         ExpireTime = CreateTime + (1* 60* 60)  # TTL attribute: one hour after creation
33 |         response = table.put_item(
34 |             Item={
35 |                 'Year': decimal.Decimal(movie['year']),
36 |                 'Title': movie['title'],
37 |                 'info': json.dumps(movie['info']),
38 |                 'CreateTime': CreateTime,
39 |                 'ExpireTime': ExpireTime
40 |             }
41 |         )
42 |         if (lc % 10) == 0:
43 |             print ("%d rows inserted" % (lc))
44 | 
45 | if __name__ == '__main__':
46 |     filename = sys.argv[1]
47 |     if os.path.exists(filename):
48 |         # file exists, continue
49 |         loadfile(filename)
50 |     else:
51 |         print ('Please enter a valid filename')
--------------------------------------------------------------------------------
/NOTICE.txt:
--------------------------------------------------------------------------------
1 | aws-s3-nosql-offloading-blog
2 | Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | ## AWS S3 Nosql Offloading Blog
2 | 
3 | Code to support Databases blog post - How to offload data from your transactional NoSQL database to Amazon S3, perform advanced analytics, and build visualizations
4 | 
5 | ## License Summary
6 | 
7 | This sample code is made available under a modified MIT license. See the LICENSE file.
8 | 
--------------------------------------------------------------------------------
/ddb-to-firehose.py:
--------------------------------------------------------------------------------
1 | 
2 | 
3 | #########################################################################################
4 | # Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
5 | #
6 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this
7 | # software and associated documentation files (the "Software"), to deal in the Software
8 | # without restriction, including without limitation the rights to use, copy, modify,
9 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to
10 | # permit persons to whom the Software is furnished to do so.
11 | #
12 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
13 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
14 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
15 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
16 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
17 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
18 | #########################################################################################
19 | 
20 | import os, json, base64, boto3
21 | 
22 | firehose = boto3.client('firehose')
23 | 
24 | print('Loading function')
25 | 
26 | def recToFirehose(streamRecord):
27 |     ddbRecord = streamRecord['NewImage']
28 |     toFirehose = {}
29 |     for c in ddbRecord:
30 |         toFirehose[c] = next(iter(ddbRecord[c].values()))
31 |     jddbRecord = json.loads(ddbRecord['info']['S'])
32 |     # Transform the record a bit: pull rating and the first two actors, genres, and directors out of 'info'
33 |     try:
34 |         rating = jddbRecord['rating']
35 |     except KeyError:
36 |         rating = 0
37 |     try:
38 |         actors = jddbRecord['actors']
39 |     except KeyError:
40 |         actors = [' ',' ']
41 |     actor1 = actors[0]
42 |     try:
43 |         actor2 = actors[1]
44 |     except IndexError:
45 |         actor2 = ' '
46 |     try:
47 |         genres = jddbRecord['genres']
48 |     except KeyError:
49 |         genres = ['','']
50 |     genre1 = genres[0]
51 |     try:
52 |         genre2 = genres[1]
53 |     except IndexError:
54 |         genre2 = ' '
55 | 
56 |     try:
57 |         directors = jddbRecord['directors']
58 |     except KeyError:
59 |         directors = [' ',' ']
60 |     director1 = directors[0]
61 |     try:
62 |         director2 = directors[1]
63 |     except IndexError:
64 |         director2 = ' '
65 | 
66 |     toFirehose["actor1"] = actor1
67 |     toFirehose["actor2"] = actor2
68 |     toFirehose["director1"] = director1
69 |     toFirehose["director2"] = director2
70 |     toFirehose["genre1"] = genre1
71 |     toFirehose["genre2"] = genre2
72 |     toFirehose["rating"] = rating
73 |     jtoFirehose = json.dumps(toFirehose)
74 |     response = firehose.put_record(
75 |         DeliveryStreamName=os.environ['DeliveryStreamName'],
76 |         Record= {
77 |             'Data': jtoFirehose + '\n'
78 |         }
79 |     )
80 |     print(response)
81 | 
82 | def lambda_handler(event, context):
83 |     for record in event['Records']:
84 |         if (record['eventName']) != 'REMOVE':
85 |             recToFirehose(record['dynamodb'])
86 |     return 'Successfully processed {} records.'.format(len(event['Records']))
87 | 
--------------------------------------------------------------------------------
/ddbathenablog_cf.yaml:
--------------------------------------------------------------------------------
1 | AWSTemplateFormatVersion: 2010-09-09
2 | Description: >-
3 |   AWS CloudFormation:
4 | Parameters:
5 |   DynamoDBTableName:
6 |     Description: DynamoDB Table Name
7 |     Type: String
8 |     AllowedPattern: '[a-zA-Z0-9]*'
9 |     MinLength: '1'
10 |     MaxLength: '255'
11 |     ConstraintDescription: must contain only alphanumeric characters
12 |   LambdaCodeBucket:
13 |     Description: S3 bucket containing the Lambda function code
14 |     Type: String
15 | Resources:
16 |   myDynamoDBTable:
17 |     Type: 'AWS::DynamoDB::Table'
18 |     Properties:
19 |       TableName: !Ref DynamoDBTableName
20 |       StreamSpecification:
21 |         StreamViewType: NEW_IMAGE
22 |       AttributeDefinitions:
23 |         - AttributeName: Year
24 |           AttributeType: N
25 |         - AttributeName: Title
26 |           AttributeType: S
27 |       KeySchema:
28 |         - AttributeName: Year
29 |           KeyType: HASH
30 |         - AttributeName: Title
31 |           KeyType: RANGE
32 |       ProvisionedThroughput:
33 |         ReadCapacityUnits: 5
34 |         WriteCapacityUnits: 5
35 |       TimeToLiveSpecification:
36 |         Enabled: True
37 |         AttributeName: ExpireTime
38 |   myS3Bucket:
39 |     Type: 'AWS::S3::Bucket'
40 |     Properties:
41 |       PublicAccessBlockConfiguration:
42 |         BlockPublicAcls: true
43 |         BlockPublicPolicy: true
44 |         IgnorePublicAcls: true
45 |         RestrictPublicBuckets: true
46 |   firehoseDeliveryStream:
47 |     DependsOn:
48 |       - deliveryPolicy
49 |     Type: 'AWS::KinesisFirehose::DeliveryStream'
50 |     Properties:
51 |       DeliveryStreamName: !Ref DynamoDBTableName
52 |       ExtendedS3DestinationConfiguration:
53 |         BucketARN: !Join
54 |           - ''
55 |           - - 'arn:aws:s3:::'
56 |             - !Ref myS3Bucket
57 |         BufferingHints:
58 |           IntervalInSeconds: '60'
59 |           SizeInMBs: '1'
60 |         CompressionFormat: UNCOMPRESSED
61 |         Prefix: firehose/
62 |         RoleARN: !GetAtt deliveryRole.Arn
63 |   deliveryRole:
64 |     Type: 'AWS::IAM::Role'
65 |     Properties:
66 |       AssumeRolePolicyDocument:
67 |         Version: 2012-10-17
68 |         Statement:
69 |           - Sid: ''
70 |             Effect: Allow
71 |             Principal:
72 |               Service: firehose.amazonaws.com
73 |             Action: 'sts:AssumeRole'
74 |             Condition:
75 |               StringEquals:
76 |                 'sts:ExternalId': !Ref 'AWS::AccountId'
77 |   deliveryPolicy:
78 |     Type: 'AWS::IAM::ManagedPolicy'
79 |     Properties:
80 |       Description: Managed policy for firehose
81 |       Roles:
82 |         - !Ref deliveryRole
83 |       PolicyDocument:
84 |         Version: 2012-10-17
85 |         Statement:
86 |           - Effect: Allow
87 |             Action:
88 |               - 's3:AbortMultipartUpload'
89 |               - 's3:GetBucketLocation'
90 |               - 's3:GetObject'
91 |               - 's3:ListBucket'
92 |               - 's3:ListBucketMultipartUploads'
93 |               - 's3:PutObject'
94 |             Resource:
95 |               - !Join
96 |                 - ''
97 |                 - - 'arn:aws:s3:::'
98 |                   - !Ref myS3Bucket
99 |               - !Join
100 |                 - ''
101 |                 - - 'arn:aws:s3:::'
102 |                   - !Ref myS3Bucket
103 |                   - '*'
104 |   lambdaExecutionRole:
105 |     Type: AWS::IAM::Role
106 |     Properties:
107 |       AssumeRolePolicyDocument:
108 |         Version: '2012-10-17'
109 |         Statement:
110 |           - Sid: ''
111 |             Effect: Allow
112 |             Principal:
113 |               Service: lambda.amazonaws.com
114 |             Action: 'sts:AssumeRole'
115 |   ddbToFirehose:
116 |     Type: "AWS::Lambda::Function"
117 |     Properties:
118 |       Handler: "ddb-to-firehose.lambda_handler"
119 |       Role:
120 |         Fn::GetAtt:
121 |           - "lambdaExecutionRole"
122 |           - "Arn"
123 |       Code:
124 |         S3Bucket: !Ref LambdaCodeBucket
125 |         S3Key: "ddb-to-firehose.zip"
126 |       Runtime: "python3.6"
127 |       Timeout: "25"
128 |       Environment:
129 |         Variables:
130 |           DeliveryStreamName: !Ref DynamoDBTableName
131 |   logGroup:
132 |     Type: "AWS::Logs::LogGroup"
133 |     Properties:
134 |       LogGroupName: !Sub "/aws/lambda/${ddbToFirehose}"
135 |   lambdaExecutionPolicy:
136 |     Type: 'AWS::IAM::ManagedPolicy'
137 |     Properties:
138 |       Description: Managed policy for lambda function
139 |       Roles:
140 |         - !Ref lambdaExecutionRole
141 |       PolicyDocument:
142 |         Version: 2012-10-17
143 |         Statement:
144 |           - Effect: Allow
145 |             Action:
146 |               - 'firehose:PutRecord'
147 |               - 'firehose:PutRecordBatch'
148 |               - 'firehose:UpdateDestination'
149 |             Resource: !GetAtt
150 |               - firehoseDeliveryStream
151 |               - Arn
152 |           - Effect: Allow
153 |             Action:
154 |               - 'logs:CreateLogStream'
155 |               - 'logs:PutLogEvents'
156 |             Resource:
157 |               - !Sub "arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:${logGroup}:*"
158 |           - Effect: Allow
159 |             Action:
160 |               - 'dynamodb:DescribeStream'
161 |               - 'dynamodb:GetRecords'
162 |               - 'dynamodb:GetShardIterator'
163 |               - 'dynamodb:ListStreams'
164 |             Resource: !GetAtt
165 |               - myDynamoDBTable
166 |               - StreamArn
167 |   EventSourceMapping:
168 |     Type: "AWS::Lambda::EventSourceMapping"
169 |     DependsOn:
170 |       - lambdaExecutionPolicy
171 |     Properties:
172 |       EventSourceArn: !GetAtt
173 |         - myDynamoDBTable
174 |         - StreamArn
175 |       FunctionName: !GetAtt
176 |         - ddbToFirehose
177 |         - Arn
178 |       StartingPosition: "TRIM_HORIZON"
179 | Outputs:
180 |   TableName:
181 |     Value: !Ref myDynamoDBTable
182 |     Description: Table name of the newly created DynamoDB table
183 |   BucketName:
184 |     Value: !Ref myS3Bucket
185 |     Description: My s3 bucket
--------------------------------------------------------------------------------
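
The record transformation in ddb-to-firehose.py can be tried locally without AWS credentials. The following is a minimal sketch, not the repository's code: `first_two`, `flatten_stream_record`, and the `sample` record are hypothetical names and data introduced for illustration. It assumes the `dynamodb` part of a stream record carries a `NewImage` whose `info` attribute is a JSON string, as written by LoadMovieData.py. Unlike the Lambda function, it pads every missing value with a single space and tolerates empty lists.

```python
import json


def first_two(values, filler=" "):
    """Return the first two items of a list, padding with filler when short."""
    padded = list(values or []) + [filler, filler]
    return padded[0], padded[1]


def flatten_stream_record(stream_record):
    """Flatten the 'dynamodb' part of a stream record into a plain dict."""
    new_image = stream_record["NewImage"]
    # Unwrap DynamoDB-typed attribute values, e.g. {"S": "Rush"} -> "Rush".
    flat = {name: next(iter(typed.values())) for name, typed in new_image.items()}
    # 'info' is stored as a JSON string, so decode it before reading fields.
    info = json.loads(new_image["info"]["S"])
    flat["actor1"], flat["actor2"] = first_two(info.get("actors"))
    flat["director1"], flat["director2"] = first_two(info.get("directors"))
    flat["genre1"], flat["genre2"] = first_two(info.get("genres"))
    flat["rating"] = info.get("rating", 0)
    return flat


# Hypothetical sample record, shaped like record['dynamodb'] in a stream event.
sample = {
    "NewImage": {
        "Year": {"N": "2013"},
        "Title": {"S": "Rush"},
        "info": {"S": json.dumps({
            "rating": 8.3,
            "actors": ["Daniel Bruhl"],
            "genres": ["Action", "Biography"],
        })},
    }
}

row = flatten_stream_record(sample)
# One JSON document per line, which is the shape the Lambda sends to Firehose.
print(json.dumps(row))
```

Keeping the flattening separate from the `put_record` call makes the column mapping testable on its own; the Firehose write can then wrap the returned dict.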