├── .gitignore ├── LICENSE ├── NOTICE.txt ├── README.md ├── WELL-ARCHITECTED.md ├── buildspec-test.yml ├── buildspec.yml ├── cleanup.sh ├── img ├── lambda-refarch-fileprocessing-dashboard.png ├── lambda-refarch-fileprocessing-simple-pipeline.png ├── lambda-refarch-fileprocessing-simple.png └── lambda-refarch-fileprocessing-x-ray-error-trace.png ├── pipeline ├── README.md ├── cleanup.sh └── pipeline.yaml ├── src ├── conversion │ ├── conversion.py │ └── requirements.txt ├── notification │ ├── cfnresponse.py │ ├── notification.py │ └── requirements.txt └── sentiment │ ├── requirements.txt │ └── sentiment.py ├── template.yml ├── tests.sh └── tests ├── sample-01.md ├── sample-02.md ├── sample-03.md ├── sample-04.md ├── sample-05.md ├── sample-06.md ├── sample-07.md ├── sample-08.md ├── sample-09.md ├── sample-10.md ├── sample-11.md ├── sample-12.md ├── sample-13.md ├── sample-14.md ├── sample-15.md ├── sample-16.md ├── sample-17.md ├── sample-18.md ├── sample-19.md ├── sample-20.md ├── sample-21.md ├── sample-22.md ├── sample-23.md └── sample-24.md /.gitignore: -------------------------------------------------------------------------------- 1 | samconfig.toml 2 | .aws-sam 3 | .idea 4 | .history 5 | __pycache__ 6 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. 
For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright {yyyy} {name of copyright owner} 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | 203 | -------------------------------------------------------------------------------- /NOTICE.txt: -------------------------------------------------------------------------------- 1 | AWS Lambda Reference Architecture: Real-time File Processing 2 | lambda-refarch-fileprocessing 3 | Copyright 2015 Amazon.com, Inc. or its affiliates. 4 | All Rights Reserved. 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Serverless Reference Architecture: Real-time File Processing 2 | 3 | The Real-time File Processing reference architecture is a general-purpose, event-driven, parallel data processing architecture that uses [AWS Lambda](https://aws.amazon.com/lambda). This architecture is ideal for workloads that need more than one data derivative of an object. 4 | 5 | In this example application, we deliver notes from an interview in Markdown format to S3. S3 Events are used to trigger multiple processing flows - one to convert and persist Markdown files to HTML and another to detect and persist sentiment. 6 | 7 | ## Architectural Diagram 8 | 9 | ![Reference Architecture - Real-time File Processing](img/lambda-refarch-fileprocessing-simple.png) 10 | 11 | ## Application Components 12 | 13 | ### Event Trigger 14 | 15 | In this architecture, individual files are processed as they arrive. To achieve this, we utilize [AWS S3 Events](https://docs.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html) and [Amazon Simple Notification Service](https://docs.aws.amazon.com/sns/latest/dg/welcome.html). When an object is created in S3, an event is emitted to an SNS topic. We deliver our event to two separate [SQS Queues](https://aws.amazon.com/sqs/), representing two different workflows. Refer to [What is Amazon Simple Notification Service?](https://docs.aws.amazon.com/sns/latest/dg/welcome.html) for more information about eligible targets. 16 | 17 | ### Conversion Workflow 18 | 19 | Our function will take Markdown files stored in our **InputBucket**, convert them to HTML, and store them in our **OutputBucket**. The **ConversionQueue** SQS queue captures the S3 Event JSON payload, allowing for more control of our **ConversionFunction** and better error handling. Refer to [Using AWS Lambda with Amazon SQS](https://docs.aws.amazon.com/lambda/latest/dg/with-sqs.html) for more details. 20 | 21 | If our **ConversionFunction** cannot remove the messages from the **ConversionQueue**, they are sent to **ConversionDlq**, a dead-letter queue (DLQ), for inspection. A CloudWatch Alarm is configured to send a notification to an email address when there are any messages in the **ConversionDlq**. 22 | 23 | ### Sentiment Analysis Workflow 24 | 25 | Our function will take Markdown files stored in our **InputBucket**, detect the overall sentiment for each file, and store the result in our **SentimentTable**. 26 | 27 | We are using [Amazon Comprehend](https://aws.amazon.com/comprehend/) to detect overall interview sentiment. Amazon Comprehend is a machine learning powered service that makes it easy to find insights and relationships in text. We use the Sentiment Analysis API to understand whether interview responses are positive or negative. 28 | 29 | The Sentiment workflow uses the same SQS-to-Lambda Function pattern as the Conversion workflow. 30 | 31 | If our **SentimentFunction** cannot remove the messages from the **SentimentQueue**, they are sent to **SentimentDlq**, a dead-letter queue (DLQ), for inspection. A CloudWatch Alarm is configured to send a notification to an email address when there are any messages in the **SentimentDlq**.
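In both workflows, each SQS record that Lambda receives carries the SNS envelope, whose `Message` field is the original S3 event. The sketch below illustrates that unwrapping plus a Comprehend sentiment call; it is a simplified illustration only, not the actual code in `src/conversion` or `src/sentiment`, and names such as the `SENTIMENT_TABLE` environment variable and the `id` table key are assumptions.

```python
import json
import os

import boto3

# Assumed environment variable and key names; the real functions may differ.
TABLE_NAME = os.environ.get("SENTIMENT_TABLE", "SentimentTable")

s3 = boto3.client("s3")
comprehend = boto3.client("comprehend")
table = boto3.resource("dynamodb").Table(TABLE_NAME)


def handler(event, context):
    """Process S3 events delivered via S3 -> SNS -> SQS -> Lambda."""
    for sqs_record in event["Records"]:
        # The SQS body is the SNS envelope; its "Message" field is the S3 event.
        sns_envelope = json.loads(sqs_record["body"])
        s3_event = json.loads(sns_envelope["Message"])

        for s3_record in s3_event.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            key = s3_record["s3"]["object"]["key"]
            text = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")

            # Sentiment branch: Comprehend returns POSITIVE, NEGATIVE, NEUTRAL or MIXED.
            sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")["Sentiment"]
            table.put_item(Item={"id": key, "sentiment": sentiment})

    # If processing raises an exception, the message stays on the queue and
    # repeated failures redrive it to the DLQ described above.
```

The conversion branch follows the same unwrapping pattern but writes the converted HTML to the **ConversionTargetBucket** instead of DynamoDB.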
32 | 33 | ## Building and Deploying the Application with the AWS Serverless Application Model (AWS SAM) 34 | 35 | This application is deployed using the [AWS Serverless Application Model (AWS SAM)](https://aws.amazon.com/serverless/sam/). AWS SAM is an open-source framework that enables you to build serverless applications on AWS. It provides you with a template specification to define your serverless application, and a command line interface (CLI) tool. 36 | 37 | ### Prerequisites 38 | 39 | * [AWS CLI version 2](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html) 40 | 41 | * [AWS SAM CLI (0.41.0 or higher)](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) 42 | 43 | * [Docker](https://docs.docker.com/install/) 44 | 45 | ### Clone the Repository 46 | 47 | #### Clone with SSH 48 | 49 | ```bash 50 | git clone git@github.com:aws-samples/lambda-refarch-fileprocessing.git 51 | ``` 52 | 53 | #### Clone with HTTPS 54 | 55 | ```bash 56 | git clone https://github.com/aws-samples/lambda-refarch-fileprocessing.git 57 | ``` 58 | 59 | ### Build 60 | 61 | The AWS SAM CLI comes with abstractions for a number of Lambda runtimes to build your dependencies, and copies the source code into staging folders so that everything is ready to be packaged and deployed. The *sam build* command builds any dependencies that your application has, and copies your application source code to folders under *.aws-sam/build* to be zipped and uploaded to Lambda. 62 | 63 | ```bash 64 | sam build --use-container 65 | ``` 66 | 67 | **Note** 68 | 69 | Be sure to use v0.41.0 of the AWS SAM CLI or newer. Failure to use the proper version of the AWS SAM CLI will result in an `InvalidDocumentException`. The `EventInvokeConfig` property is not recognized in earlier versions of the AWS SAM CLI. To confirm your version of AWS SAM, run the command `sam --version`. 70 | 71 | ### Deploy 72 | 73 | For the first deployment, please run the following command and save the generated configuration file *samconfig.toml*. Please use **lambda-file-refarch** for the stack name. 74 | 75 | ```bash 76 | sam deploy --guided 77 | ``` 78 | 79 | You will be prompted to enter data for *ConversionLogLevel* and *SentimentLogLevel*. The default value for each is *INFO*, but you can also enter *DEBUG*. You will also be prompted for *AlarmRecipientEmailAddress*. 80 | 81 | Subsequent deployments can use the simplified `sam deploy`. The command will use the generated configuration file *samconfig.toml*. 82 | 83 | You will receive an email asking you to confirm subscription to the `lambda-file-refarch-AlarmTopic` SNS topic that will receive alerts should either the `ConversionDlq` SQS queue or `SentimentDlq` SQS queue receive messages. 84 | 85 | ## Testing the Example 86 | 87 | After you have created the stack using the CloudFormation template, you can manually test the system by uploading a Markdown file to the InputBucket that was created in the stack. 88 | 89 | Alternatively, you can test it by utilising the pipeline tests.sh script; however, the test script removes the resources it creates, so if you wish to explore the solution and see the output files 90 | and the DynamoDB table, manually uploading is the better option. 91 | 92 | ### Manually testing 93 | 94 | You can use any of the sample-xx.md files in the repository /**tests** directory as example files. After the files have been uploaded, you can see the resulting HTML file in the output bucket of your stack.
You can also view the CloudWatch logs for each of the functions in order to see the details of their execution. 96 | 97 | You can use the following commands to copy sample files from the repository's tests directory into the input bucket of your stack. 98 | 99 | ```bash 100 | INPUT_BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch --logical-resource-id InputBucket --query "StackResourceDetail.PhysicalResourceId" --output text) 101 | aws s3 cp ./tests/sample-01.md s3://${INPUT_BUCKET}/sample-01.md 102 | aws s3 cp ./tests/sample-02.md s3://${INPUT_BUCKET}/sample-02.md 103 | ``` 104 | 105 | Once the input files have been uploaded to the input bucket, a series of events is put into motion. 106 | 107 | 1. The input Markdown files are converted and stored in a separate S3 bucket. 108 | ``` 109 | OUTPUT_BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch --logical-resource-id ConversionTargetBucket --query "StackResourceDetail.PhysicalResourceId" --output text) 110 | aws s3 ls s3://${OUTPUT_BUCKET} 111 | ``` 112 | 113 | 2. The input Markdown files are analyzed and their sentiment published to a DynamoDB table. 114 | ``` 115 | DYNAMO_TABLE=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch --logical-resource-id SentimentTable --query "StackResourceDetail.PhysicalResourceId" --output text) 116 | aws dynamodb scan --table-name ${DYNAMO_TABLE} --query "Items[*]" 117 | ``` 118 | 119 | You can also view the CloudWatch logs generated by the Lambda functions. 120 | 121 | 122 | ### Using the test script 123 | 124 | The pipeline end-to-end test script can be executed manually. You will need to ensure you have adequate permissions to perform the test script actions: 125 | 126 | * Describing stack resources 127 | * Uploading and deleting files from the S3 input bucket 128 | * Deleting files from the S3 output bucket 129 | * Reading and deleting entries from the DynamoDB table 130 | 131 | ```bash 132 | bash ./tests.sh lambda-file-refarch 133 | ``` 134 | 135 | While the script is executing, you will see each stage's output on the command line. The samples are uploaded to the **InputBucket**; the script will then wait for files to appear in the **OutputBucket** before checking that they have all been processed and that the matching HTML file exists in the **OutputBucket**. It will also check that the sentiment for each of the files has been recorded in the **SentimentTable**. Once complete, the script will remove all the files it created and the entries from the **SentimentTable**. 136 | 137 | ### Extra credit testing 138 | 139 | Try uploading (or adding to ./tests if you are using the script) an oversized (>100MB) or invalid file type to the input bucket. 140 | You can then explore in X-Ray how these kinds of errors can be traced within the solution. 141 | 142 | * Linux command 143 | 144 | ```bash 145 | fallocate -l 110M ./tests/sample-oversize.md 146 | ``` 147 | 148 | * Mac OS X command 149 | 150 | ```bash 151 | mkfile 110m ./tests/sample-oversize.md 152 | ``` 153 | 154 | ![X-Ray Error Tracing - Real-time File Processing](img/lambda-refarch-fileprocessing-x-ray-error-trace.png) 155 | 156 | 157 | ## Viewing the CloudWatch dashboard 158 | 159 | A dashboard is created as a part of the stack creation process. Metrics are published for the conversion and sentiment analysis processes. In addition, the alarms and alarm states are published.
159 | 160 | ![CloudWatch Dashboard - Real-time File Processing](img/lambda-refarch-fileprocessing-dashboard.png) 161 | 162 | ## Cleaning Up the Example Resources 163 | 164 | To remove all resources created by this example, run the following command: 165 | 166 | ```bash 167 | bash cleanup.sh 168 | ``` 169 | 170 | ### What Is Happening in the Script? 171 | 172 | Objects are cleared out from the `InputBucket` and `ConversionTargetBucket`. 173 | 174 | ```bash 175 | for bucket in InputBucket ConversionTargetBucket; do 176 | echo "Clearing out ${bucket}..." 177 | BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch --logical-resource-id ${bucket} --query "StackResourceDetail.PhysicalResourceId" --output text) 178 | aws s3 rm s3://${BUCKET} --recursive 179 | echo 180 | done 181 | ``` 182 | 183 | The CloudFormation stack is deleted. 184 | 185 | ```bash 186 | aws cloudformation delete-stack \ 187 | --stack-name lambda-file-refarch 188 | ``` 189 | 190 | The CloudWatch log groups associated with the Lambda functions are deleted. 191 | 192 | ```bash 193 | for log_group in $(aws logs describe-log-groups --log-group-name-prefix '/aws/lambda/lambda-file-refarch-' --query "logGroups[*].logGroupName" --output text); do 194 | echo "Removing log group ${log_group}..." 195 | aws logs delete-log-group --log-group-name ${log_group} 196 | echo 197 | done 198 | ``` 199 | 200 | ## SAM Template Resources 201 | 202 | ### Resources 203 | 204 | [The provided template](https://s3.amazonaws.com/awslambda-reference-architectures/file-processing/packaged-template.yml) 205 | creates the following resources: 206 | 207 | - **InputBucket** - An S3 bucket that holds the raw Markdown files. Uploading a file to this bucket will trigger the processing functions. 208 | 209 | - **NotificationTopic** - An SNS topic that receives S3 events from the **InputBucket**. 210 | 211 | - **NotificationTopicPolicy** - An SNS topic policy that allows the **InputBucket** to publish events to the **NotificationTopic**. 212 | 213 | - **NotificationQueuePolicy** - An SQS queue policy that allows the **NotificationTopic** to publish events to the **ConversionQueue** and **SentimentQueue**. 214 | 215 | - **ApplyS3NotificationLambdaFunction** - A Lambda function that adds an S3 bucket notification when objects are created in the **InputBucket**. The function is called by **ApplyInputBucketTrigger**. 216 | 217 | - **ApplyInputBucketTrigger** - A CloudFormation Custom Resource that invokes the **ApplyS3NotificationLambdaFunction** when a CloudFormation stack is created. 218 | 219 | - **ConversionSubscription** - An SNS subscription that allows the **ConversionQueue** to receive messages from **NotificationTopic**. 220 | 221 | - **ConversionQueue** - An SQS queue that is used to store events for conversion from Markdown to HTML. 222 | 223 | - **ConversionDlq** - An SQS queue that is used to capture messages that cannot be processed by the **ConversionFunction**. The *RedrivePolicy* on the **ConversionQueue** is used to manage how traffic makes it to this queue. 224 | 225 | - **ConversionFunction** - A Lambda function that takes the input file, converts it to HTML, and stores the resulting file to **ConversionTargetBucket**. 226 | 227 | - **ConversionTargetBucket** - An S3 bucket that stores the converted HTML. 228 | 229 | - **SentimentSubscription** - An SNS subscription that allows the **SentimentQueue** to receive messages from **NotificationTopic**.
230 | 231 | - **SentimentQueue** - An SQS queue that is used to store events for sentiment analysis processing. 232 | 233 | - **SentimentDlq** - An SQS queue that is used to capture messages that cannot be processed by the **SentimentFunction**. The *RedrivePolicy* on the **SentimentQueue** is used to manage how traffic makes it to this queue. 234 | 235 | - **SentimentFunction** - A Lambda function that takes the input file, performs sentiment analysis, and stores the output to the **SentimentTable**. 236 | 237 | - **SentimentTable** - A DynamoDB table that stores the input file along with the sentiment. 238 | 239 | - **AlarmTopic** - An SNS topic that has an email as a subscriber. This topic is used to receive alarms from the **ConversionDlqAlarm**, **SentimentDlqAlarm**, **ConversionQueueAlarm**, **SentimentQueueAlarm**, **ConversionFunctionErrorRateAlarm**, **SentimentFunctionErrorRateAlarm**, **ConversionFunctionThrottleRateAlarm**, and **SentimentFunctionThrottleRateAlarm**. 240 | 241 | - **ConversionDlqAlarm** - A CloudWatch Alarm that detects when there are any messages sent to the **ConversionDlq** within a 1-minute period and sends a notification to the **AlarmTopic**. 242 | 243 | - **SentimentDlqAlarm** - A CloudWatch Alarm that detects when there are any messages sent to the **SentimentDlq** within a 1-minute period and sends a notification to the **AlarmTopic**. 244 | 245 | - **ConversionQueueAlarm** - A CloudWatch Alarm that detects when there are 20 or more messages in the **ConversionQueue** within a 1-minute period and sends a notification to the **AlarmTopic**. 246 | 247 | - **SentimentQueueAlarm** - A CloudWatch Alarm that detects when there are 20 or more messages in the **SentimentQueue** within a 1-minute period and sends a notification to the **AlarmTopic**. 248 | 249 | - **ConversionFunctionErrorRateAlarm** - A CloudWatch Alarm that detects when there is an error rate of 5% over a 5-minute period for the **ConversionFunction** and sends a notification to the **AlarmTopic**. 250 | 251 | - **SentimentFunctionErrorRateAlarm** - A CloudWatch Alarm that detects when there is an error rate of 5% over a 5-minute period for the **SentimentFunction** and sends a notification to the **AlarmTopic**. 252 | 253 | - **ConversionFunctionThrottleRateAlarm** - A CloudWatch Alarm that detects when there is a throttle rate of 1% over a 5-minute period for the **ConversionFunction** and sends a notification to the **AlarmTopic**. 254 | 255 | - **SentimentFunctionThrottleRateAlarm** - A CloudWatch Alarm that detects when there is a throttle rate of 1% over a 5-minute period for the **SentimentFunction** and sends a notification to the **AlarmTopic**. 256 | 257 | - **ApplicationDashboard** - A CloudWatch Dashboard that displays Conversion Function Invocations, Conversion Function Error Rate, Conversion Function Throttle Rate, Conversion DLQ Length, Sentiment Function Invocations, Sentiment Function Error Rate, Sentiment Function Throttle Rate, and Sentiment DLQ Length. 258 | 259 | ## License 260 | 261 | This reference architecture sample is licensed under Apache 2.0. 262 | -------------------------------------------------------------------------------- /WELL-ARCHITECTED.md: -------------------------------------------------------------------------------- 1 | ## Operational Excellence 2 | 3 | #### OPS 1. How do you evaluate your Serverless application’s health?
4 | 5 | * [ ] Question does not apply to this workload 6 | 7 | * [x] **[Required]** Understand, analyze and alert on metrics provided out of the box 8 | * [x] **[Best]** Use application, business, and operations metrics 9 | * [x] **[Good]** Use distributed tracing and code is instrumented with additional context 10 | * [ ] **[Good]** Use structured and centralized logging 11 | * [ ] None of these 12 | 13 | 14 | ##### Notes 15 | 16 | >* The example uses structured logging output to CloudWatch. For our example we only deploy to a single account, so we don't require cross-account centralised logging. 17 | > 18 | >* We have alarms configured with notifications should processing fail. 19 | > 20 | >* We do not have a defined KPI within the application. We could, however, use a metric such as the number of records processed within a given time frame and alert if this falls outside of the defined thresholds. 21 | 22 | --- 23 | 24 | #### OPS 2. How do you approach application lifecycle management? 25 | 26 | * [ ] Question does not apply to this workload 27 | 28 | * [x] **[Required]** Use infrastructure as code and stages isolated in separate environments 29 | * [x] **[Good]** Prototype new features using temporary environments 30 | * [ ] **[Good]** Use a rollout deployment mechanism 31 | * [ ] **[Good]** Use configuration management 32 | * [ ] **[Good]** Review the function runtime deprecation policy 33 | * [ ] **[Best]** Use CI/CD including automated testing across separate accounts 34 | 35 | * [ ] None of these 36 | 37 | 38 | ##### Notes 39 | 40 | >* Our example utilizes infrastructure as code and includes a simple pipeline that will build and deploy within an individual account and to an individual environment. However, the nature of this example means it can be deployed multiple times with different configurations. You can, for example, deploy a staging pipeline that would watch a development branch and deploy any changes to the Staging application stack. You could also deploy a production pipeline stack that watches the master branch, where merges will trigger a production release. 41 | > 42 | >* For this example a rollout mechanism would involve adopting a Blue/Green deployment strategy, with you controlling which input bucket a particular user hits. Alternatively, changes to application business logic only could be tested by having a notification invoke an alternate version of a Lambda function under specific conditions. 43 | 44 | --- 45 | 46 | ## Security 47 | 48 | #### SEC 1: How do you control access to your Serverless API? 49 | 50 | * [x] Question does not apply to this workload 51 | 52 | 53 | * [ ] **[Required]** Use appropriate endpoint type and mechanisms to secure access to your API 54 | * [ ] **[Good]** Use authentication and authorization mechanisms 55 | * [ ] **[Best]** Scope access based on identity’s metadata 56 | 57 | 58 | * [ ] None of these 59 | 60 | 61 | ##### Notes 62 | 63 | >This solution doesn't include an API frontend, so the question doesn't apply. 64 | 65 | --- 66 | 67 | #### SEC 2: How do you manage your Serverless application’s security boundaries?
68 | 69 | * [ ] Question does not apply to this workload 70 | 71 | 72 | * [x] **[Required]** Evaluate and define resource policies 73 | * [x] **[Good]** Control network traffic at all layers 74 | * [x] **[Best]** Smaller functions require fewer permissions 75 | * [x] **[Required]** Use temporary credentials between resources and components 76 | 77 | * [ ] None of these 78 | 79 | 80 | ##### Notes 81 | 82 | > * We use IAM policies to ensure that resources can only be called by other resources that should be calling them. 83 | > 84 | > * Each application component will assume a role with only the permissions it requires in order to perform its function. This will either be the ability to perform a specific action on multiple resources or any action on a particular resource. 85 | > 86 | > * This application does not use private networking. 87 | > 88 | > * We have individual functions for each different piece of business logic. 89 | 90 | --- 91 | 92 | #### SEC 3: How do you implement Application Security in your workload? 93 | 94 | * [ ] Question does not apply to this workload 95 | 96 | 97 | * [x] **[Required]** Review security awareness documents frequently 98 | * [x] **[Required]** Store secrets that are used in your code securely 99 | * [ ] **[Good]** Implement runtime protection to help prevent against malicious code execution 100 | * [ ] **[Best]** Automatically review workload’s code dependencies/libraries 101 | * [x] **[Best]** Validate inbound events 102 | 103 | 104 | * [ ] None of these 105 | 106 | 107 | ##### Notes 108 | 109 | > * This application doesn't have any stored secrets. The GitHub token is required by CodePipeline; it is passed as a string parameter to CloudFormation but is not visible within the CloudFormation console. This could be improved by manually creating a Secrets Manager entry for the token and replacing the CloudFormation parameter for the token with the Secrets Manager value by utilising Dynamic References. 110 | >https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/dynamic-references.html 111 | > 112 | > * For reviewing dependencies and libraries we could integrate an automatic check into the pipeline. There are many tools and providers which can check code. Currently this is done manually using PEP8 and Bandit checks. 113 | > 114 | > * We only check for particular events and check to make sure the object is valid. 115 | 116 | --- 117 | 118 | ## Reliability 119 | 120 | #### REL 1. How do you regulate inbound request rates? 121 | 122 | * [ ] Question does not apply to this workload 123 | 124 | 125 | * [x] **[Required]** Use throttling to control inbound request rates 126 | * [ ] **[Good]** Use, analyze and enforce API quotas 127 | * [X] **[Best]** Use mechanisms to protect non-scalable resources 128 | 129 | 130 | * [ ] None of these 131 | 132 | 133 | ##### Notes 134 | 135 | > * We are using SQS queues in front of our Lambda functions, which helps us throttle the rate at which our application processes requests. 136 | > 137 | > * We don't have APIs to set quotas for. 138 | > 139 | > * Our downstream resources are S3 and DynamoDB on-demand, which are more than capable of scaling to match our volumes. 140 | 141 | --- 142 | 143 | #### REL 2. How do you build resiliency into your Serverless application?
144 | 145 | * [ ] Question does not apply to this workload 146 | 147 | 148 | * [x] **[Required]** Manage transaction, partial, and intermittent failures 149 | * [x] **[Required]** Manage duplicate and unwanted events 150 | * [ ] **[Good]** Orchestrate long-running transactions 151 | * [x] **[Best]** Consider scaling patterns at burst rates 152 | 153 | 154 | * [ ] None of these 155 | 156 | ##### Notes 157 | 158 | > * We use SQS queues and DLQs to ensure any processing failure results in a notification. 159 | > 160 | > * The DynamoDB key and converted S3 object for each analysis are tied to the input object being analyzed. Pushing the same document will result in the same artifact. 161 | > 162 | > * Our example does not deal with duplicate files. Any duplicate will overwrite the previous output. This could be improved by inserting another layer of business logic that first checks the inbound file and renames it with a UUID; it could additionally check whether the file hash has already been processed. 163 | > 164 | > * The processing time of our transactions is fast and we can handle multiple files in a single invocation. Under a heavy load of inbound files, the SQS queue handles distributing the work to Lambda in up to 1,000 concurrent batches. 165 | 166 | --- 167 | 168 | 169 | ## Performance Efficiency 170 | 171 | #### PERF 1. How do you optimize your Serverless application’s performance? 172 | 173 | * [ ] Question does not apply to this workload 174 | 175 | 176 | * [x] **[Required]** Measure, evaluate, and select optimum capacity units 177 | * [x] **[Good]** Measure and optimize function startup time 178 | * [ ] **[Good]** Take advantage of concurrency via async and stream-based function invocations 179 | * [x] **[Good]** Optimize access patterns and apply caching where applicable 180 | * [x] **[Best]** Integrate with managed services directly over functions when possible 181 | 182 | * [ ] None of these 183 | 184 | 185 | ##### Notes 186 | 187 | > * We have looked at how our function performs with different batch sizes and memory configurations to find what we believe is optimal for cost/performance. 188 | > 189 | > * For our example there is no real advantage to async. If concurrency were an issue, it would be possible to chain the business logic, rather than perform it in parallel. 190 | > 191 | > * Data is pulled from S3 and cached locally for the execution; however, currently only a single task is performed per invocation, so there is no benefit. Caching outside of the function would offer no benefit over S3. 192 | > 193 | > * In our Sentiment function we are utilising Comprehend, which is a managed service. 194 | 195 | --- 196 | 197 | ## Cost Optimization 198 | 199 | #### COST 1. How do you optimize your Serverless application’s costs? 200 | 201 | * [ ] Question does not apply to this workload 202 | 203 | * [x] **[Required]** Minimize external calls and function code initialization 204 | * [x] **[Required]** Optimize logging output and its retention 205 | * [x] **[Good]** Optimize function configuration to reduce cost 206 | * [x] **[Best]** Use cost-aware usage patterns in code 207 | 208 | * [ ] None of these 209 | 210 | ##### Notes 211 | 212 | >We have configurable logging levels and benchmarked our function for optimal cost/performance.
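As a minimal sketch of the configurable logging levels mentioned above (the `LOG_LEVEL` environment variable name is an assumption; the stack exposes the levels as the *ConversionLogLevel* and *SentimentLogLevel* parameters), a function handler might configure its logger like this:

```python
import logging
import os

# Assumed variable name; the SAM template would pass the ConversionLogLevel /
# SentimentLogLevel parameter through to each function's environment.
logger = logging.getLogger()
logger.setLevel(os.environ.get("LOG_LEVEL", "INFO"))


def handler(event, context):
    logger.debug("Full event payload: %s", event)  # emitted only at DEBUG
    logger.info("Received %d record(s)", len(event.get("Records", [])))
```

Keeping verbose output behind the DEBUG level is what allows the log retention and ingestion costs noted above to stay low in normal operation.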
213 | -------------------------------------------------------------------------------- /buildspec-test.yml: -------------------------------------------------------------------------------- 1 | version: 0.2 2 | 3 | phases: 4 | install: 5 | runtime-versions: 6 | python: 3.7 7 | commands: 8 | - pip install --upgrade awscli 9 | build: 10 | commands: 11 | - chmod +x tests.sh 12 | - ./tests.sh $OUTPUT_STACK_NAME 13 | post_build: 14 | commands: 15 | - bash -c "if [ \"$CODEBUILD_BUILD_SUCCEEDING\" == \"0\" ]; then exit 1; fi" 16 | - echo Test stage successfully completed on `date` 17 | -------------------------------------------------------------------------------- /buildspec.yml: -------------------------------------------------------------------------------- 1 | version: 0.2 2 | 3 | phases: 4 | install: 5 | runtime-versions: 6 | python: 3.7 7 | commands: 8 | - pip install --upgrade aws-sam-cli 9 | build: 10 | commands: 11 | - sam build --use-container 12 | post_build: 13 | commands: 14 | - sam package --output-template-file $SAM_OUTPUT_TEMPLATE --s3-bucket $ARTIFACT_BUCKET 15 | - bash -c "if [ \"$CODEBUILD_BUILD_SUCCEEDING\" == \"0\" ]; then exit 1; fi" 16 | - echo Build stage successfully completed on `date` 17 | artifacts: 18 | files: 19 | - $SAM_OUTPUT_TEMPLATE 20 | -------------------------------------------------------------------------------- /cleanup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | 4 | echo "Clearing out resources of lambda-file-refarch stack..." 5 | echo 6 | echo "Cleaning up S3 buckets..." && for bucket in InputBucket ConversionTargetBucket; do 7 | echo "Clearing out ${bucket}..." 8 | BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch --logical-resource-id ${bucket} --query "StackResourceDetail.PhysicalResourceId" --output text) 9 | aws s3 rm s3://${BUCKET} --recursive 10 | echo 11 | done 12 | 13 | echo "Deleting CloudFormation stack..." && aws cloudformation delete-stack \ 14 | --stack-name lambda-file-refarch 15 | 16 | echo "Clearing out CloudWatch Log Groups..." && for log_group in $(aws logs describe-log-groups --log-group-name-prefix '/aws/lambda/lambda-file-refarch-' --query "logGroups[*].logGroupName" --output text); do 17 | echo "Removing log group ${log_group}..."
18 | aws logs delete-log-group --log-group-name ${log_group} 19 | echo 20 | done 21 | -------------------------------------------------------------------------------- /img/lambda-refarch-fileprocessing-dashboard.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/lambda-refarch-fileprocessing/38896310187d090e25adf08b57c5a252979de487/img/lambda-refarch-fileprocessing-dashboard.png -------------------------------------------------------------------------------- /img/lambda-refarch-fileprocessing-simple-pipeline.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/lambda-refarch-fileprocessing/38896310187d090e25adf08b57c5a252979de487/img/lambda-refarch-fileprocessing-simple-pipeline.png -------------------------------------------------------------------------------- /img/lambda-refarch-fileprocessing-simple.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/lambda-refarch-fileprocessing/38896310187d090e25adf08b57c5a252979de487/img/lambda-refarch-fileprocessing-simple.png -------------------------------------------------------------------------------- /img/lambda-refarch-fileprocessing-x-ray-error-trace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/lambda-refarch-fileprocessing/38896310187d090e25adf08b57c5a252979de487/img/lambda-refarch-fileprocessing-x-ray-error-trace.png -------------------------------------------------------------------------------- /pipeline/README.md: -------------------------------------------------------------------------------- 1 | # Serverless Reference Architecture: Real-time File Processing Deployment Pipeline 2 | 3 | The Real-time File Processing reference pipeline architecture is an example of a basic CI/CD pipeline that uses the AWS fully managed continuous delivery service [CodePipeline](https://aws.amazon.com/codepipeline/) to deploy a Serverless application. Our pipeline consists of source, build and deployment stages. 4 | We use exactly the same method as in the manual deployment; however, we utilise [CodeBuild](https://aws.amazon.com/codebuild/) to build and package our application and the native CodePipeline CloudFormation support to deploy our package. 5 | 6 | ## CI/CD Pipeline Diagram 7 | 8 | 9 | ![Reference Architecture - Real-time File Processing CI/CD Pipeline](../img/lambda-refarch-fileprocessing-simple-pipeline.png) 10 | 11 | 12 | ## Pipeline Components 13 | 14 | 15 | ### CloudFormation Template 16 | 17 | 18 | pipeline.yaml is a CloudFormation template that will deploy all the required pipeline components. Once the stack has deployed, the pipeline will automatically execute and deploy the Serverless Application. See Getting Started for information on how to deploy the template. 19 | 20 | 21 | #### Deployed Resources 22 | 23 | 24 | * Pipeline S3 bucket, used to store pipeline artefacts that are passed between stages. 25 | * CodePipeline 26 | * CodeBuild Build and Test Projects 27 | * Roles for CodePipeline, CodeBuild and the CloudFormation Deployment 28 | * SNS Topic for Pipeline notifications 29 | * CloudWatch Event for Pipeline Failures 30 | 31 | 32 | ### Source 33 | 34 | 35 | For this application we are hosting our source code in GitHub.
Other [Source Integrations](https://docs.aws.amazon.com/codepipeline/latest/userguide/integrations-action-type.html#integrations-source) are available; however, this template focuses on GitHub. Whenever an update is pushed to the GitHub branch being 36 | monitored (default: master), our pipeline will begin executing. The source stage will connect to GitHub using the credentials provided and clone the branch into our pipeline artefact bucket for use in the other stages. 37 | 38 | 39 | ### Build 40 | 41 | 42 | In order to run our SAM build and SAM package commands we are using [CodeBuild](https://aws.amazon.com/codebuild/), a fully managed continuous integration service. CodeBuild allows us to perform a sequence of commands that we define in the [buildspec.yml](https://docs.aws.amazon.com/codebuild/latest/userguide/build-spec-ref.html) 43 | file that will execute inside the [build environment](https://docs.aws.amazon.com/codebuild/latest/userguide/build-env-ref.html) we define using a Docker container. For this project we are using the Amazon Linux 2 version 1.0 container with Python 3.7. 44 | 45 | Within the buildspec.yml we are: 46 | 47 | * Updating SAM to the latest version 48 | * Running SAM build as per the manual deployment 49 | * Running SAM package again as per the manual deployment steps 50 | * Instructing CodeBuild to pass the output template back to the pipeline for use in the deployment stage. 51 | 52 | 53 | 54 | ### Deploy 55 | 56 | 57 | To deploy our application stack we are not using SAM deploy; CodePipeline doesn't support SAM natively, so instead we are opting to use the native CodePipeline support for CloudFormation to deploy the template that SAM creates. The pipeline has a role it uses, with appropriate permissions, to deploy the template created by the SAM package step, which will create a stack containing the resources defined in our SAM template. We are using [change sets](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-updating-stacks-changesets.html) and [approval actions](https://docs.aws.amazon.com/codepipeline/latest/userguide/approvals-action-add.html) to demonstrate a manual approval workflow. 58 | 59 | You will need to approve the deployment before the pipeline execution actually deploys any resources. Once approved, additional resources will be deployed as per the main architecture documentation. 60 | 61 | 62 | ### Test 63 | 64 | The test stage will execute a bash script to perform an end-to-end test of the application. It uploads 24 sample files from the tests directory and checks for outputs and sentiment DB entries. 65 | 66 | If it cannot locate either the output files or the DB entries, the pipeline will fail. Once the tests successfully complete, the script removes the test resources. 67 | 68 | 69 | ## Getting started 70 | 71 | 72 | To get started, use the template found in this repository under pipeline/pipeline.yaml. You will need to provide additional information to deploy the stack. 73 | 74 | * GitHubToken: A GitHub OAuth token with access to clone the repository. You can find more information in the [GitHub Documentation](https://github.com/settings/tokens) 75 | * AlarmRecipientEmailAddress: You will need to provide an email address that can be used for configuring notifications 76 | 77 | Optionally, if you are deploying from your own repository you will need to also provide: 78 | 79 | * GitHubRepoName: The name of the GitHub repository hosting your source code. By default it points to the aws-samples repo.
80 | * GitHubRepoBranch: The GitHub repo branch CodePipeline should watch for changes on. This defaults to master, but any branch can be used. 81 | * GitHubRepoOwner: The GitHub repository owner, e.g. aws-samples 82 | 83 | 84 | 85 | ### Deploying the template 86 | 87 | 88 | You can deploy the template using either the [AWS Console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-create-stack.html) or the [AWS CLI](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-cli-creating-stack.html). 89 | 90 | **[TODO]** Insert quick link to create CFN stack 91 | 92 | 93 | 94 | ##### Example CLI Deployment 95 | 96 | 97 | > aws cloudformation deploy --template-file pipeline/pipeline.yaml --stack-name "lambda-file-refarch-pipeline" --capabilities "CAPABILITY_IAM" "CAPABILITY_NAMED_IAM" --parameter-overrides GitHubToken="**{replace with your GitHub Token}**" AlarmRecipientEmailAddress="**{replace with your admin email}**" 98 | 99 | 100 | 101 | ### Deploying twice for a Development and Production example 102 | 103 | 104 | You can deploy the pipeline twice to give two separate environments, allowing you to create a simple dev-to-production workflow. 105 | 106 | This will allow you to build your application in your development branch, and any changes will automatically be picked up and deployed by the pipeline. Once you have tested and are happy, the changes can be merged to master, and they will be automatically built and deployed to production. 107 | 108 | Deploy the first stack using a stack name of "lambda-file-refarch-pipeline-dev", update the **AppName** parameter to be environment specific (e.g. "lambda-file-refarch-dev"), and make sure to update the branch to the development one. 109 | 110 | ##### Example CLI Deployment for development pipeline 111 | 112 | 113 | > aws cloudformation deploy --template-file pipeline/pipeline.yaml --stack-name "lambda-file-refarch-pipeline-dev" --capabilities "CAPABILITY_IAM" "CAPABILITY_NAMED_IAM" --parameter-overrides AppName="lambda-file-refarch-dev" GitHubToken="**{replace with your GitHub Token}**" AlarmRecipientEmailAddress="**{replace with your admin email}**" GitHubRepoBranch="develop" 114 | 115 | 116 | Once that has deployed, and the application stack has also successfully deployed, you can provision the production pipeline stack. 117 | 118 | 119 | > aws cloudformation deploy --template-file pipeline/pipeline.yaml --stack-name "lambda-file-refarch-pipeline-prod" --capabilities "CAPABILITY_IAM" "CAPABILITY_NAMED_IAM" --parameter-overrides AppName="lambda-file-refarch-prod" GitHubToken="**{replace with your GitHub Token}**" AlarmRecipientEmailAddress="**{replace with your admin email}**" GitHubRepoBranch="master" 120 | 121 | 122 | ##### Approval Actions 123 | 124 | Any deployments will require the approval of a change set before the deployment can proceed. An email will be sent to the admin email address, which will include a link to the approval request. You will need to ensure you have confirmed the subscription in order to receive the notification. 125 | 126 | Alternatively, you can do this by navigating the console, or you can use the CLI: [Approve or Reject an Approval Action in CodePipeline](https://docs.aws.amazon.com/codepipeline/latest/userguide/approvals-approve-or-reject.html) 127 | 128 | To use the CLI, you need to create a JSON document and know the token for the last execution. See the documentation above for details.
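If you prefer to script that lookup, the hedged sketch below uses boto3 to fetch the latest approval token and write the input document for the CLI command that follows. The stage and action names are hypothetical placeholders, and the pipeline name simply follows the `${AppName}-pipeline` convention from pipeline.yaml.

```python
import json

import boto3

codepipeline = boto3.client("codepipeline")

PIPELINE = "lambda-file-refarch-app-pipeline"  # "${AppName}-pipeline" with the default AppName
STAGE = "Deploy"                               # hypothetical stage name
ACTION = "DeploymentApproval"                  # hypothetical approval action name

# Find the token for the most recent execution of the approval action.
state = codepipeline.get_pipeline_state(name=PIPELINE)
stage = next(s for s in state["stageStates"] if s["stageName"] == STAGE)
action = next(a for a in stage["actionStates"] if a["actionName"] == ACTION)
token = action["latestExecution"]["token"]

# Write the input document expected by `put-approval-result --cli-input-json`.
doc = {
    "pipelineName": PIPELINE,
    "stageName": STAGE,
    "actionName": ACTION,
    "result": {"summary": "Reviewed change set", "status": "Approved"},
    "token": token,
}
with open("approvalstage-approved.json", "w") as f:
    json.dump(doc, f, indent=2)
```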
129 | 130 | > aws codepipeline put-approval-result --cli-input-json file://approvalstage-approved.json 131 | 132 | ## Clean-up 133 | 134 | In order to remove all resources created by this example, you will first need to make sure the three S3 buckets are empty: 135 | 136 | * Pipeline artefact bucket 137 | * Application input bucket 138 | * Application conversion bucket 139 | 140 | Once that is complete, you can remove both the Application Stack and the Pipeline Stack. 141 | Note that the pipeline stack should not be removed until the application stack has been successfully deleted, as the application stack is deployed using a role present in the pipeline stack. This role is also used to delete the application stack. 142 | 143 | Additionally, there will be some CodeBuild logs and log groups left over in CloudWatch; these can be deleted. 144 | 145 | Alternatively, you can use the script /pipeline/cleanup.sh. 146 | 147 | Things to note: 148 | 149 | * The script will remove only stacks deployed as described in the examples. 150 | 151 | * Both the application and the pipeline stacks will be removed. 152 | 153 | * jq needs to be installed in order to empty the pipeline bucket, as versioning is enabled. The command to delete versions and markers requires it. -------------------------------------------------------------------------------- /pipeline/cleanup.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | command -v jq >/dev/null 2>&1 || { echo >&2 "jq is required but it's not installed. Aborting."; exit 1; } 4 | 5 | echo "Clearing out resources of lambda-file-refarch and Pipeline stacks..." 6 | echo 7 | echo "Cleaning up Application S3 buckets..." && for bucket in InputBucket ConversionTargetBucket; do 8 | echo "Clearing out ${bucket}..." 9 | BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch-app --logical-resource-id ${bucket} --query "StackResourceDetail.PhysicalResourceId" --output text) 10 | aws s3 rm s3://${BUCKET} --recursive 11 | echo 12 | done 13 | 14 | echo "Cleaning up Pipeline S3 buckets..." 15 | BUCKET=$(aws cloudformation describe-stack-resource --stack-name lambda-file-refarch-pipeline --logical-resource-id "PipelineBucket" --query "StackResourceDetail.PhysicalResourceId" --output text) 16 | 17 | echo 18 | 19 | echo "Removing all versions from ${BUCKET}" 20 | 21 | VERSIONS=`aws s3api list-object-versions --bucket $BUCKET | jq '.Versions'` 22 | MARKERS=`aws s3api list-object-versions --bucket $BUCKET | jq '.DeleteMarkers'` 23 | let COUNT=`echo $VERSIONS | jq 'length'`-1 24 | 25 | if [ $COUNT -gt -1 ]; then 26 | echo "removing files from bucket" 27 | for i in $(seq 0 $COUNT); do 28 | KEY=`echo $VERSIONS | jq .[$i].Key | sed -e 's/\"//g'` 29 | VERSIONID=`echo $VERSIONS | jq .[$i].VersionId | sed -e 's/\"//g'` 30 | CMD="aws s3api delete-object --bucket $BUCKET --key $KEY --version-id $VERSIONID" 31 | echo ${CMD} 32 | $CMD 33 | done 34 | fi 35 | 36 | let COUNT=`echo $MARKERS |jq 'length'`-1 37 | 38 | if [ $COUNT -gt -1 ]; then 39 | echo "removing delete markers" 40 | 41 | for i in $(seq 0 $COUNT); do 42 | KEY=`echo $MARKERS | jq .[$i].Key | sed -e 's/\"//g'` 43 | VERSIONID=`echo $MARKERS | jq .[$i].VersionId | sed -e 's/\"//g'` 44 | CMD="aws s3api delete-object --bucket $BUCKET --key $KEY --version-id $VERSIONID" 45 | echo ${CMD} 46 | $CMD 47 | done 48 | fi 49 | 50 | echo "Deleting lambda-file-refarch-app CloudFormation stack..."
&& aws cloudformation delete-stack \ 51 | --stack-name lambda-file-refarch-app 52 | 53 | echo "Waiting for stack deletion..." && aws cloudformation wait stack-delete-complete \ 54 | --stack-name lambda-file-refarch-app 55 | 56 | echo "Deleting lambda-file-refarch-pipeline CloudFormation stack..." && aws cloudformation delete-stack \ 57 | --stack-name lambda-file-refarch-pipeline 58 | 59 | echo "Waiting for stack deletion..." && aws cloudformation wait stack-delete-complete \ 60 | --stack-name lambda-file-refarch-pipeline 61 | 62 | echo "Clearing out Application CloudWatch Log Groups..." && for log_group in $(aws logs describe-log-groups --log-group-name-prefix /aws/lambda/lambda-file-refarch-app- --query "logGroups[*].logGroupName" --output text); do 63 | echo "Removing log group ${log_group}..." 64 | aws logs delete-log-group --log-group-name ${log_group} 65 | echo 66 | done 67 | 68 | echo "Clearing out CodeBuild CloudWatch Log Groups..." && for log_group in $(aws logs describe-log-groups --log-group-name-prefix /aws/codebuild/lambda-file-refarch-app-build --query "logGroups[*].logGroupName" --output text); do 69 | echo "Removing log group ${log_group}..." 70 | aws logs delete-log-group --log-group-name ${log_group} 71 | echo 72 | done -------------------------------------------------------------------------------- /pipeline/pipeline.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: "2010-09-09" 2 | Description: "Template for full CI/CD serverless applications." 3 | Parameters: 4 | AppName: 5 | Type: String 6 | Default: lambda-file-refarch-app 7 | Description: Name used for application deployment 8 | SAMOutputFile: 9 | Type: String 10 | Default: packaged-template.yml 11 | Description: The filename for the output SAM file from the buildspec file 12 | CodeBuildImage: 13 | Type: String 14 | Default: "aws/codebuild/amazonlinux2-x86_64-standard:1.0" 15 | Description: Image used for CodeBuild project. 16 | GitHubRepoName: 17 | Type: String 18 | Default: "lambda-refarch-fileprocessing" 19 | Description: The GitHub repo name 20 | GitHubRepoBranch: 21 | Type: String 22 | Description: The GitHub repo branch code pipelines should watch for changes on 23 | Default: master 24 | GitHubRepoOwner: 25 | Type: String 26 | Default: "aws-samples" 27 | Description: GitHub Repository Owner. 28 | GitHubToken: 29 | NoEcho: true 30 | Type: String 31 | Description: "Secret. OAuthToken with access to Repo. Long string of characters and digits. Go to https://github.com/settings/tokens" 32 | AlarmRecipientEmailAddress: 33 | Type: String 34 | Description: Email address for any alerts. 35 | Resources: 36 | CodeBuildProject: 37 | DependsOn: [PipelineBucket] 38 | Description: AWS CodeBuild project 39 | Type: AWS::CodeBuild::Project 40 | Properties: 41 | Artifacts: 42 | Type: CODEPIPELINE 43 | Description: !Sub "Building stage for ${AppName}." 
44 | Environment: 45 | ComputeType: BUILD_GENERAL1_SMALL 46 | PrivilegedMode: True 47 | EnvironmentVariables: 48 | - Name: ARTIFACT_BUCKET 49 | Value: !Ref PipelineBucket 50 | - Name: SAM_OUTPUT_TEMPLATE 51 | Value: !Ref SAMOutputFile 52 | Image: !Ref CodeBuildImage 53 | Type: LINUX_CONTAINER 54 | Name: !Sub "${AppName}-build" 55 | ServiceRole: !GetAtt CodeBuildTrustRole.Arn 56 | Source: 57 | Type: CODEPIPELINE 58 | Tags: 59 | - Key: app-name 60 | Value: !Ref AppName 61 | TimeoutInMinutes: 5 62 | CodeBuildTestProject: 63 | DependsOn: [PipelineBucket] 64 | Description: AWS CodeBuild project 65 | Type: AWS::CodeBuild::Project 66 | Properties: 67 | Artifacts: 68 | Type: CODEPIPELINE 69 | Description: !Sub "Testing stage for ${AppName}." 70 | Environment: 71 | ComputeType: BUILD_GENERAL1_SMALL 72 | PrivilegedMode: True 73 | EnvironmentVariables: 74 | - Name: OUTPUT_STACK_NAME 75 | Value: !Sub "${AppName}" 76 | Image: !Ref CodeBuildImage 77 | Type: LINUX_CONTAINER 78 | Name: !Sub "${AppName}-test" 79 | ServiceRole: !GetAtt CodeBuildTrustRole.Arn 80 | Source: 81 | Type: CODEPIPELINE 82 | BuildSpec: "buildspec-test.yml" 83 | Tags: 84 | - Key: app-name 85 | Value: !Ref AppName 86 | TimeoutInMinutes: 5 87 | PipelineBucket: 88 | Description: S3 bucket for AWS CodePipeline artifacts 89 | Type: AWS::S3::Bucket 90 | Properties: 91 | BucketName: !Sub "pipeline-${AWS::AccountId}-${AWS::Region}-${AppName}" 92 | VersioningConfiguration: 93 | Status: Enabled 94 | PipelineNotificationTopic: 95 | Type: AWS::SNS::Topic 96 | Properties: 97 | Subscription: 98 | - Protocol: email 99 | Endpoint: !Ref AlarmRecipientEmailAddress 100 | PipelineSNSTopicPolicy: 101 | Type: AWS::SNS::TopicPolicy 102 | Properties: 103 | PolicyDocument: 104 | Id: PipelineTopicPolicy 105 | Version: '2012-10-17' 106 | Statement: 107 | - Sid: CwEventsPut 108 | Effect: Allow 109 | Principal: 110 | Service: 111 | - events.amazonaws.com 112 | Action: sns:Publish 113 | Resource: !Ref PipelineNotificationTopic 114 | - Sid: PipelinePut 115 | Effect: Allow 116 | Principal: 117 | Service: 118 | - codepipeline.amazonaws.com 119 | Action: sns:Publish 120 | Resource: !Ref PipelineNotificationTopic 121 | Topics: 122 | - !Ref PipelineNotificationTopic 123 | S3ArtifactBucketPolicy: 124 | DependsOn: [PipelineBucket] 125 | Description: S3 bucket policy for AWS CodePipeline access 126 | Type: AWS::S3::BucketPolicy 127 | Properties: 128 | Bucket: !Ref PipelineBucket 129 | PolicyDocument: 130 | Version: "2012-10-17" 131 | Id: SSEAndSSLPolicy 132 | Statement: 133 | - Sid: DenyInsecureConnections 134 | Effect: Deny 135 | Principal: "*" 136 | Action: s3:* 137 | Resource: !Sub "arn:aws:s3:::${PipelineBucket}/*" 138 | Condition: 139 | Bool: 140 | aws:SecureTransport: false 141 | ProjectPipeline: 142 | DependsOn: [PipelineBucket, CodeBuildProject] 143 | Description: AWS CodePipeline deployment pipeline for project 144 | Type: AWS::CodePipeline::Pipeline 145 | Properties: 146 | Name: !Sub "${AppName}-pipeline" 147 | RoleArn: !GetAtt CodePipelineTrustRole.Arn 148 | Stages: 149 | - Name: Source 150 | Actions: 151 | - Name: source 152 | InputArtifacts: [] 153 | ActionTypeId: 154 | Version: "1" 155 | Category: Source 156 | Owner: ThirdParty 157 | Provider: GitHub 158 | OutputArtifacts: 159 | - Name: !Sub "${AppName}-SourceArtifact" 160 | Configuration: 161 | Repo: !Ref GitHubRepoName 162 | Branch: !Ref GitHubRepoBranch 163 | OAuthToken: !Ref GitHubToken 164 | Owner: !Ref GitHubRepoOwner 165 | RunOrder: 1 166 | - Name: Build 167 | Actions: 168 | - Name: build-from-source 
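            # Editor's note: this action runs the "${AppName}-build" CodeBuild project on the
            # source artifact and emits the BuildArtifact (containing the packaged SAM template)
            # that the Deploy stage's create-changeset action references via TemplatePath below.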
169 | InputArtifacts: 170 | - Name: !Sub "${AppName}-SourceArtifact" 171 | ActionTypeId: 172 | Category: Build 173 | Owner: AWS 174 | Version: "1" 175 | Provider: CodeBuild 176 | OutputArtifacts: 177 | - Name: !Sub "${AppName}-BuildArtifact" 178 | Configuration: 179 | ProjectName: !Sub "${AppName}-build" 180 | RunOrder: 1 181 | - Name: Deploy 182 | Actions: 183 | - Name: create-changeset 184 | InputArtifacts: 185 | - Name: !Sub "${AppName}-BuildArtifact" 186 | ActionTypeId: 187 | Category: Deploy 188 | Owner: AWS 189 | Version: "1" 190 | Provider: CloudFormation 191 | OutputArtifacts: [] 192 | Configuration: 193 | StackName: !Sub "${AppName}" 194 | ActionMode: CHANGE_SET_REPLACE 195 | RoleArn: !GetAtt CloudFormationTrustRole.Arn 196 | ChangeSetName: pipeline-changeset 197 | Capabilities: CAPABILITY_NAMED_IAM 198 | TemplatePath: !Sub "${AppName}-BuildArtifact::${SAMOutputFile}" 199 | ParameterOverrides: !Sub '{"AlarmRecipientEmailAddress": "${AlarmRecipientEmailAddress}"}' 200 | RunOrder: 1 201 | - Name: approve-changeset 202 | InputArtifacts: [] 203 | ActionTypeId: 204 | Category: Approval 205 | Owner: AWS 206 | Provider: Manual 207 | Version: '1' 208 | Configuration: 209 | NotificationArn: !Ref PipelineNotificationTopic 210 | RunOrder: 2 211 | - Name: execute-changeset 212 | InputArtifacts: [] 213 | ActionTypeId: 214 | Category: Deploy 215 | Owner: AWS 216 | Version: "1" 217 | Provider: CloudFormation 218 | OutputArtifacts: [] 219 | Configuration: 220 | StackName: !Sub "${AppName}" 221 | ActionMode: CHANGE_SET_EXECUTE 222 | ChangeSetName: pipeline-changeset 223 | RunOrder: 3 224 | - Name: Test 225 | Actions: 226 | - Name: end-to-end 227 | InputArtifacts: 228 | - Name: !Sub "${AppName}-SourceArtifact" 229 | ActionTypeId: 230 | Category: Build 231 | Owner: AWS 232 | Version: "1" 233 | Provider: CodeBuild 234 | Configuration: 235 | ProjectName: !Sub "${AppName}-test" 236 | RunOrder: 1 237 | ArtifactStore: 238 | Type: S3 239 | Location: !Ref PipelineBucket 240 | PipelineEventRule: 241 | Type: "AWS::Events::Rule" 242 | Properties: 243 | Description: "Trigger notifications based on pipeline state change to Failure" 244 | EventPattern: 245 | source: 246 | - "aws.codepipeline" 247 | detail-type: 248 | - CodePipeline Pipeline Execution State Change 249 | detail: 250 | state: 251 | - "FAILED" 252 | State: "ENABLED" 253 | Targets: 254 | - Arn: !Ref PipelineNotificationTopic 255 | Id: "PipelineTopic" 256 | InputTransformer: 257 | InputTemplate: !Sub '"The ${AppName} Pipeline in account has at