├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── README.md ├── archive_document ├── __init__.py ├── app.py └── requirements.txt ├── copyright.txt ├── invoices ├── Test-Alpha-Credit-Card-2020-05-19.pdf ├── Test-Alpha-Credit-Card-2020-05-19.pdf.license ├── Test-Bravo-Electric-2020-05-14.pdf ├── Test-Bravo-Electric-2020-05-14.pdf.license ├── Test-Charlie-Water-2020-05-18.pdf ├── Test-Charlie-Water-2020-05-18.pdf.license ├── Test-Delta-Cable-2020-05-23.pdf └── Test-Delta-Cable-2020-05-23.pdf.license ├── process_document_analysis ├── __init__.py ├── app.py └── requirements.txt ├── save_document_analysis ├── __init__.py ├── app.py └── requirements.txt ├── start_document_analysis ├── __init__.py ├── app.py └── requirements.txt ├── start_process_scanned_invoice_workflow ├── __init__.py ├── app.py └── requirements.txt ├── state_machine └── process_scanned_invoice_workflow.asl.json └── template.yaml /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 
8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *main* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on.
As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project, we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public GitHub issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy of 4 | this software and associated documentation files (the "Software"), to deal in 5 | the Software without restriction, including without limitation the rights to 6 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 7 | the Software, and to permit persons to whom the Software is furnished to do so. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 10 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 11 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR 12 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 13 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 14 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 15 | 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Getting started with RPA using AWS Step Functions and Amazon Textract 2 | 3 | [AWS Step 4 | Functions](https://aws.amazon.com/step-functions/) is a serverless function 5 | orchestrator and workflow automation tool. [Amazon Textract](https://aws.amazon.com/textract/) 6 | is a fully managed machine learning service that automatically extracts text 7 | and data from scanned documents. Combining these services, you can create an RPA bot 8 | to automate the processing of documents. 9 | 10 | See [Getting started with RPA using AWS Step Functions and Amazon Textract blog post](https://aws.amazon.com/blogs/compute/getting-started-with-rpa-using-aws-step-functions-and-amazon-textract/). 11 | 12 | ## Security 13 | 14 | See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information. 15 | 16 | ## License 17 | 18 | This library is licensed under the MIT-0 License. See the LICENSE file. 19 | 20 | ## Prerequisites 21 | 22 | Before you get started with deploying the solution, you must install the 23 | following prerequisites: 24 | 25 | 1. [Python](https://www.python.org/) 26 | 27 | 2. [AWS Command Line Interface (AWS CLI)](https://aws.amazon.com/cli/) 28 | -- for instructions, see [Installing the AWS 29 | CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html) 30 | 31 | 3. 
[AWS Serverless Application Model Command Line Interface (AWS 32 | SAM CLI)](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-command-reference.html) 33 | -- for instructions, see [Installing the AWS SAM 34 | CLI](https://docs.aws.amazon.com/serverless-application-model/latest/developerguide/serverless-sam-cli-install.html) 35 | 36 | ## Deploying the solution 37 | 38 | The solution will create the following three Amazon Simple Storage 39 | Service (S3) buckets with names suffixed by your AWS Account ID to 40 | avoid collisions in the global S3 bucket namespace: 41 | 42 | - scanned-invoices-\<AWS Account ID\> 43 | 44 | - invoice-analyses-\<AWS Account ID\> 45 | 46 | - processed-invoices-\<AWS Account ID\> 47 | 48 | The following steps deploy the reference implementation in your AWS account. 49 | The solution deploys several components, including an AWS Step Functions 50 | state machine, AWS Lambda functions, Amazon Simple Storage Service (S3) 51 | buckets, an Amazon DynamoDB table for payment information, and Amazon 52 | Simple Notification Service (SNS) topics. You will need an Amazon S3 53 | bucket to be used by AWS CloudFormation for deploying the solution. You 54 | will also need a stack name, e.g., Getting-Started-with-RPA, for 55 | deploying the solution. To deploy, run the following commands from a 56 | terminal session: 57 | 58 | 1. Download the code from the GitHub repo 59 | (). 60 | 61 | 2. Run the following command to build the artifacts locally on your 62 | workstation: 63 | 64 | sam build 65 | 66 | 3. Run the following command to create a CloudFormation stack and 67 | deploy your resources: 68 | 69 | sam deploy --guided --capabilities CAPABILITY_NAMED_IAM 70 | 71 | Monitor the progress and wait for the completion of the stack creation 72 | process from the [AWS CloudFormation 73 | console](https://console.aws.amazon.com/cloudformation/home) before 74 | proceeding.
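If you prefer to script the wait instead of watching the console, the following is a minimal sketch that is not part of this repository. It assumes boto3 is installed and credentials for the deployment account are configured; the helper names `bucket_names` and `wait_for_stack` are hypothetical. It reads the account ID with the STS `get_caller_identity` call, derives the three bucket names listed above, and blocks on the CloudFormation `stack_create_complete` waiter. That waiter is meant for the initial deployment; for later updates, botocore also provides a `stack_update_complete` waiter.

```python
from __future__ import annotations

# Bucket prefixes from the README; each is suffixed with the AWS account ID.
BUCKET_PREFIXES = ("scanned-invoices", "invoice-analyses", "processed-invoices")


def bucket_names(account_id: str) -> list[str]:
    """Return the three bucket names the README describes for an account ID."""
    # AWS account IDs are 12-digit strings; reject anything else early.
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    return [f"{prefix}-{account_id}" for prefix in BUCKET_PREFIXES]


def wait_for_stack(stack_name: str) -> list[str]:
    """Block until the stack reaches CREATE_COMPLETE, then return bucket names.

    Needs boto3 and AWS credentials; boto3 is imported here so that
    bucket_names() can be used without it.
    """
    import boto3

    account_id = boto3.client("sts").get_caller_identity()["Account"]
    waiter = boto3.client("cloudformation").get_waiter("stack_create_complete")
    # Polls DescribeStacks; raises botocore.exceptions.WaiterError if the
    # stack fails or rolls back, or if the waiter exceeds its max attempts.
    waiter.wait(StackName=stack_name)
    return bucket_names(account_id)
```

For example, `wait_for_stack("Getting-Started-with-RPA")` would return the three bucket names once the example stack from step 3 has been created. Substitute whatever stack name you entered during `sam deploy --guided`.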
75 | 76 | ## Testing the solution 77 | 78 | To test the solution, upload the .PDF test invoices from the invoices 79 | folder of the downloaded solution to the S3 bucket named 80 | scanned-invoices-\<AWS Account ID\> created during deployment. 81 | 82 | An AWS Step Functions state machine with the name 83 | \<Stack Name\>-ProcessedScannedInvoiceWorkflow will execute the workflow. Amazon 84 | Textract document analyses will be stored in the S3 bucket named 85 | invoice-analyses-\<AWS Account ID\>, and processed invoices will be 86 | stored in the S3 bucket named processed-invoices-\<AWS Account ID\>. 87 | Processed payments will be stored in the DynamoDB table named 88 | \<Stack Name\>-invoices. 89 | 90 | You can monitor the execution status of the workflows from the [AWS Step 91 | Functions console](https://console.aws.amazon.com/states/home). 92 | 93 | Upon completion of the workflow executions, review the items added to 94 | DynamoDB from the [Amazon DynamoDB 95 | console](https://console.aws.amazon.com/dynamodb/home). 96 | 97 | ## Cleanup 98 | 99 | To avoid ongoing charges for the resources you created, 100 | follow the steps below to delete the deployed stack of 101 | resources: 102 | 103 | 1. Empty the three S3 buckets created during deployment using the 104 | [Amazon S3 console](https://s3.console.aws.amazon.com/s3/home): 105 | 106 | - scanned-invoices-\<AWS Account ID\> 107 | - invoice-analyses-\<AWS Account ID\> 108 | - processed-invoices-\<AWS Account ID\> 109 | 110 | 2. Delete the CloudFormation stack created during deployment using the 111 | [AWS CloudFormation 112 | console](https://console.aws.amazon.com/cloudformation/home). 113 | -------------------------------------------------------------------------------- /archive_document/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | -------------------------------------------------------------------------------- /archive_document/app.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | import boto3 17 | import json 18 | import os 19 | 20 | s3_client = boto3.client('s3') 21 | 22 | def lambda_handler(event, context): 23 | print("Processing Event:") 24 | print(json.dumps(event)) 25 | bucket_name = event["bucket_name"] 26 | key = event["key"] 27 | copy_source={ 28 | 'Bucket': bucket_name, 29 | 'Key': key 30 | } 31 | processed_invoices_bucket_name = os.environ["ARCHIVE_BUCKET_NAME"] 32 | s3_client.copy(copy_source, processed_invoices_bucket_name, key) 33 | s3_client.delete_object( 34 | Bucket=bucket_name, 35 | Key=key 36 | ) 37 | return event 38 | -------------------------------------------------------------------------------- /archive_document/requirements.txt: -------------------------------------------------------------------------------- 1 | boto3 2 | requests -------------------------------------------------------------------------------- /copyright.txt: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 
9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | -------------------------------------------------------------------------------- /invoices/Test-Alpha-Credit-Card-2020-05-19.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-step-functions-rpa/dc95d8fdabd429f4b7c41238e35dc80c86674977/invoices/Test-Alpha-Credit-Card-2020-05-19.pdf -------------------------------------------------------------------------------- /invoices/Test-Alpha-Credit-Card-2020-05-19.pdf.license: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | -------------------------------------------------------------------------------- /invoices/Test-Bravo-Electric-2020-05-14.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-step-functions-rpa/dc95d8fdabd429f4b7c41238e35dc80c86674977/invoices/Test-Bravo-Electric-2020-05-14.pdf -------------------------------------------------------------------------------- /invoices/Test-Bravo-Electric-2020-05-14.pdf.license: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
16 | -------------------------------------------------------------------------------- /invoices/Test-Charlie-Water-2020-05-18.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-step-functions-rpa/dc95d8fdabd429f4b7c41238e35dc80c86674977/invoices/Test-Charlie-Water-2020-05-18.pdf -------------------------------------------------------------------------------- /invoices/Test-Charlie-Water-2020-05-18.pdf.license: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
16 | -------------------------------------------------------------------------------- /invoices/Test-Delta-Cable-2020-05-23.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-step-functions-rpa/dc95d8fdabd429f4b7c41238e35dc80c86674977/invoices/Test-Delta-Cable-2020-05-23.pdf -------------------------------------------------------------------------------- /invoices/Test-Delta-Cable-2020-05-23.pdf.license: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | -------------------------------------------------------------------------------- /process_document_analysis/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | -------------------------------------------------------------------------------- /process_document_analysis/app.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | import boto3 17 | from datetime import datetime 18 | import json 19 | import os 20 | import re 21 | import uuid 22 | 23 | s3_client = boto3.client('s3') 24 | dynamodb_client = boto3.client('dynamodb') 25 | 26 | due_date_tags = ["pay on or before", "payment due date", "payment due"] 27 | amount_tags = ["total due", "new balance total", "total current charges", "please pay"] 28 | 29 | def lambda_handler(event, context): 30 | print("Processing Event:") 31 | print(json.dumps(event)) 32 | invoice_analyses_bucket_name = event["invoice_analyses_bucket_name"] 33 | invoice_analyses_bucket_key = event["invoice_analyses_bucket_key"] 34 | response = s3_client.get_object(Bucket=invoice_analyses_bucket_name, Key=invoice_analyses_bucket_key) 35 | body = json.loads(response['Body'].read().decode("utf-8")) 36 | key_map, value_map, block_map = get_kv_map(body['Blocks']) 37 | kvs = get_kv_relationship(key_map, value_map, block_map) 38 | lines = get_line_list(body['Blocks']) 39 | print("\n\n== FOUND KEY : VALUE pairs ===\n") 40 | print_kvs(kvs) 41 | print("\n\n== LINES ===\n") 42 | print_lines(lines) 43 | payment_info = {} 44 | payment_info['payee_name'] = get_payee_name(lines) 45 | payment_info['amount'] = get_amount(kvs, lines) 46 | payment_info['due_date'] = get_due_date(kvs) 47 | payment_info['memo'] = get_memo(kvs) 48 | payment_info['invoice_key'] = event['job_name'] 49 | process_payment_info(payment_info) 50 | event['payment_info'] = payment_info 51 | return event 52 | 53 | def get_kv_map(blocks): 54 | key_map = {} 55 | value_map = {} 56 | block_map = {} 57 | for block in blocks: 58 | block_id = block['Id'] 59 | block_map[block_id] = block 60 | if block['BlockType'] == "KEY_VALUE_SET": 61 | if 'KEY' 
in block['EntityTypes']: 62 | key_map[block_id] = block 63 | else: 64 | value_map[block_id] = block 65 | return key_map, value_map, block_map 66 | 67 | def get_kv_relationship(key_map, value_map, block_map): 68 | kvs = {} 69 | for block_id, key_block in key_map.items(): 70 | value_block = find_value_block(key_block, value_map) 71 | key = get_text(key_block, block_map) 72 | val = get_text(value_block, block_map) 73 | kvs[key] = val 74 | return kvs 75 | 76 | def find_value_block(key_block, value_map): 77 | for relationship in key_block.get('Relationships', []): 78 | if relationship['Type'] == 'VALUE': 79 | for value_id in relationship['Ids']: 80 | value_block = value_map[value_id] 81 | return value_block 82 | 83 | def get_text(result, blocks_map): 84 | text = '' 85 | if result is not None and 'Relationships' in result: 86 | for relationship in result['Relationships']: 87 | if relationship['Type'] == 'CHILD': 88 | for child_id in relationship['Ids']: 89 | word = blocks_map[child_id] 90 | if word['BlockType'] == 'WORD': 91 | text += word['Text'] + ' ' 92 | if word['BlockType'] == 'SELECTION_ELEMENT': 93 | if word['SelectionStatus'] == 'SELECTED': 94 | text += 'X ' 95 | return text 96 | 97 | def print_kvs(kvs): 98 | for key, value in kvs.items(): 99 | print(key, ":", value) 100 | 101 | def search_value(kvs, search_key): 102 | for key, value in kvs.items(): 103 | if re.search(search_key, key, re.IGNORECASE): 104 | return value 105 | 106 | def get_line_list(blocks): 107 | line_list = [] 108 | for block in blocks: 109 | if block['BlockType'] == "LINE": 110 | if 'Text' in block: 111 | line_list.append(block["Text"]) 112 | return line_list 113 | 114 | def print_lines(lines): 115 | for line in lines: 116 | print(line) 117 | 118 | def process_payment_info(payment_info): 119 | invoices_table_name = os.environ["INVOICES_TABLE_NAME"] 120 | payment_info['invoice_id'] = str(uuid.uuid4()) 121 | current_datetime = datetime.now() 122 | payment_info['created_at'] = current_datetime.isoformat() 123 | if "amount" not
in payment_info or \ 124 | payment_info.get("amount") is None or \ 125 | len(payment_info['payee_name']) == 0: 126 | payment_info['status'] = "Pending Review" 127 | else: 128 | payment_info['status'] = "Approved for Payment" 129 | attribute_values = {} 130 | attribute_values['invoice_id'] = {"S": payment_info['invoice_id']} 131 | attribute_values['created_at'] = {"S": payment_info['created_at']} 132 | attribute_values['status'] = {"S": payment_info['status']} 133 | attribute_values['payee_name'] = {"S": payment_info['payee_name'] if len(payment_info['payee_name']) > 0 else " "} 134 | if payment_info.get('amount'): 135 | attribute_values['amount'] = {"N": payment_info['amount']} 136 | attribute_values['due_date'] = {"S": payment_info['due_date']} 137 | if payment_info['memo'] is not None: 138 | attribute_values['memo'] = {"S": payment_info['memo']} 139 | attribute_values['invoice_key'] = {"S": payment_info['invoice_key']} 140 | dynamodb_client.put_item( 141 | TableName=invoices_table_name, 142 | Item=attribute_values 143 | ) 144 | 145 | def get_payee_name(lines): 146 | payee_name = "" 147 | payable_to = "payable to" 148 | payee_lines = [line for line in lines if payable_to in line.lower()] 149 | if len(payee_lines) > 0: 150 | payee_line = payee_lines[0] 151 | payee_line = payee_line.strip() 152 | pos = payee_line.lower().find(payable_to) 153 | if pos > -1: 154 | payee_line = payee_line[pos + len(payable_to):] 155 | if payee_line[0:1] == ':': 156 | payee_line = payee_line[1:] 157 | payee_name = payee_line.strip() 158 | return payee_name 159 | 160 | def get_amount(kvs, lines): 161 | amount = None 162 | amounts = [search_value(kvs, amount_tag) for amount_tag in amount_tags if search_value(kvs, amount_tag) is not None] 163 | if len(amounts) > 0: 164 | amount = amounts[0] 165 | else: 166 | for idx, line in enumerate(lines): 167 | if line.lower() in amount_tags and idx + 1 < len(lines): 168 | amount = lines[idx + 1] 169 | break 170 | if amount is not None: 171 | amount = amount.strip().replace(',', '') 172 | if
amount[0:1] == '$': 173 | amount = amount[1:] 174 | return amount 175 | 176 | def get_due_date(kvs): 177 | due_date = None 178 | due_dates = [search_value(kvs, due_date_tag) for due_date_tag in due_date_tags if search_value(kvs, due_date_tag) is not None] 179 | if len(due_dates) > 0: 180 | due_date = due_dates[0] 181 | if due_date is not None: 182 | date_parts = due_date.split('/') 183 | if len(date_parts) == 3: 184 | due_date = datetime(int(date_parts[2]), int(date_parts[0]), int(date_parts[1])).isoformat() 185 | else: 186 | date_parts = [date_part for date_part in re.split(r"\s+|,", due_date) if len(date_part) > 0] 187 | if len(date_parts) == 3: 188 | datetime_object = datetime.strptime(date_parts[0], "%b") 189 | month_number = datetime_object.month 190 | due_date = datetime(int(date_parts[2]), int(month_number), int(date_parts[1])).isoformat() 191 | else: 192 | due_date = datetime.now().isoformat() 193 | return due_date 194 | 195 | def get_memo(kvs): 196 | memo = None 197 | account_number = search_value(kvs, "account number") 198 | if account_number is not None: 199 | memo = " ".join(("Account Number:", account_number)) 200 | return memo -------------------------------------------------------------------------------- /process_document_analysis/requirements.txt: -------------------------------------------------------------------------------- 1 | boto3 2 | requests -------------------------------------------------------------------------------- /save_document_analysis/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | -------------------------------------------------------------------------------- /save_document_analysis/app.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | import boto3 17 | import copy 18 | import json 19 | import os 20 | 21 | textract_client = boto3.client('textract') 22 | s3_client = boto3.client('s3') 23 | 24 | def lambda_handler(event, context): 25 | print("Processing Event:") 26 | print(json.dumps(event)) 27 | blocks = [] 28 | analysis = {} 29 | response = textract_client.get_document_analysis( 30 | JobId=event["job_id"] 31 | ) 32 | analysis = copy.deepcopy(response) 33 | while True: 34 | for block in response["Blocks"]: 35 | blocks.append(block) 36 | if "NextToken" not in response: 37 | break 38 | next_token = response["NextToken"] 39 | response = textract_client.get_document_analysis( 40 | JobId=event["job_id"], 41 | NextToken=next_token 42 | ) 43 | analysis.pop("NextToken", None) 44 | analysis["Blocks"] = blocks 45 | invoice_analyses_bucket_name = os.environ["ANALYSES_BUCKET_NAME"] 46 | invoice_analyses_bucket_key = "{}.json".format(event["key"]) 47 | s3_client.put_object( 48 | Bucket=invoice_analyses_bucket_name, 49 | Key=invoice_analyses_bucket_key, 50 | Body=json.dumps(analysis).encode('utf-8') 51 | ) 52 | event["invoice_analyses_bucket_name"] = invoice_analyses_bucket_name 53 | event["invoice_analyses_bucket_key"] = invoice_analyses_bucket_key 54 | return event 55 | -------------------------------------------------------------------------------- /save_document_analysis/requirements.txt: -------------------------------------------------------------------------------- 1 | boto3 2 | requests -------------------------------------------------------------------------------- /start_document_analysis/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020
Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | -------------------------------------------------------------------------------- /start_document_analysis/app.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 
9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | import boto3 17 | import json 18 | import os 19 | import urllib.parse 20 | 21 | client = boto3.client('textract') 22 | 23 | def lambda_handler(event, context): 24 | print("Processing Event:") 25 | print(json.dumps(event)) 26 | s3 = event["Records"][0]["s3"] 27 | bucket_name = s3["bucket"]["name"] 28 | # S3 event notifications URL-encode object keys (a space arrives as "+"), so decode before passing the key to Textract 29 | key = urllib.parse.unquote_plus(s3["object"]["key"]) 30 | document_location={ 31 | 'S3Object': { 32 | 'Bucket': bucket_name, 33 | 'Name': key 34 | } 35 | } 36 | response = client.start_document_analysis( 37 | DocumentLocation=document_location, 38 | FeatureTypes=[ 39 | 'TABLES','FORMS' 40 | ], 41 | NotificationChannel={ 42 | 'SNSTopicArn': os.environ["DOCUMENT_ANALYIS_COMPLETED_SNS_TOPIC_ARN"], 43 | 'RoleArn': os.environ["TEXTRACT_PUBLISH_TO_SNS_ROLE_ARN"] 44 | } 45 | ) 46 | event["job_id"]=response["JobId"] 47 | return event 48 | -------------------------------------------------------------------------------- /start_document_analysis/requirements.txt: -------------------------------------------------------------------------------- 1 | boto3 2 | requests -------------------------------------------------------------------------------- /start_process_scanned_invoice_workflow/__init__.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved.
2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | -------------------------------------------------------------------------------- /start_process_scanned_invoice_workflow/app.py: -------------------------------------------------------------------------------- 1 | # Copyright 2020 Amazon.com, Inc. or its affiliates. All Rights Reserved. 2 | # SPDX-License-Identifier: MIT-0 3 | # 4 | # Permission is hereby granted, free of charge, to any person obtaining a copy of this 5 | # software and associated documentation files (the "Software"), to deal in the Software 6 | # without restriction, including without limitation the rights to use, copy, modify, 7 | # merge, publish, distribute, sublicense, and/or sell copies of the Software, and to 8 | # permit persons to whom the Software is furnished to do so. 9 | # 10 | # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, 11 | # INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A 12 | # PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT 13 | # HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION 14 | # OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE 15 | # SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 16 | import boto3 17 | import json 18 | import os 19 | 20 | client = boto3.client('stepfunctions') 21 | 22 | def lambda_handler(event, context): 23 | print("Processing Event:") 24 | print(json.dumps(event)) 25 | body = json.loads(event["Records"][0]["body"]) 26 | message = json.loads(body["Message"]) 27 | document_location = message["DocumentLocation"] 28 | bucket_name = document_location["S3Bucket"] 29 | key = document_location["S3ObjectName"] 30 | key_split_on_slash = key.split("/") 31 | join_with_dash = "-".join(key_split_on_slash) 32 | join_split_on_colon = join_with_dash.split(":") 33 | job_name = "_".join(join_split_on_colon) 34 | job_id = message["JobId"] 35 | status = message["Status"] 36 | response = client.start_execution( 37 | stateMachineArn=os.environ['STATE_MACHINE_ARN'], 38 | input=json.dumps({"bucket_name": bucket_name, 39 | "key": key, 40 | "job_name": job_name, 41 | "job_id": job_id, 42 | "status": status}) 43 | ) 44 | return response["executionArn"] 45 | -------------------------------------------------------------------------------- /start_process_scanned_invoice_workflow/requirements.txt: -------------------------------------------------------------------------------- 1 | boto3 2 | requests -------------------------------------------------------------------------------- /state_machine/process_scanned_invoice_workflow.asl.json: -------------------------------------------------------------------------------- 1 | { 2 | "Comment": "A state machine that starts a workflow and monitors the workflow until it completes.", 3 | "StartAt": "Did Analyze Document Job Complete Successfully?", 4 | "States": { 5 | "Did Analyze
Document Job Complete Successfully?": { 6 | "Type": "Choice", 7 | "Choices": [{ 8 | "Variable": "$.status", 9 | "StringEquals": "SUCCEEDED", 10 | "Next": "Save Document Analysis" 11 | }, { 12 | "Variable": "$.status", 13 | "StringEquals": "FAILED", 14 | "Next": "Analyze Document Job Failed" 15 | }], 16 | "Default": "Analyze Document Job Failed" 17 | }, 18 | "Analyze Document Job Failed": { 19 | "Type": "Fail", 20 | "Cause": "Textract Job Failed", 21 | "Error": "Analyze Document Job returned FAILED" 22 | }, 23 | "Save Document Analysis": { 24 | "Type": "Task", 25 | "Resource": "${SaveDocumentAnalysisLambdaArn}", 26 | "InputPath": "$", 27 | "Next": "Process Document Analysis", 28 | "Retry": [{ 29 | "ErrorEquals": ["States.ALL"], 30 | "IntervalSeconds": 1, 31 | "MaxAttempts": 3, 32 | "BackoffRate": 2 33 | }] 34 | }, 35 | "Process Document Analysis": { 36 | "Type": "Task", 37 | "Resource": "${ProcessDocumentAnalysisLambdaArn}", 38 | "InputPath": "$", 39 | "Next": "Is Approved for Payment?", 40 | "Retry": [{ 41 | "ErrorEquals": ["States.ALL"], 42 | "IntervalSeconds": 1, 43 | "MaxAttempts": 3, 44 | "BackoffRate": 2 45 | }] 46 | }, 47 | "Is Approved for Payment?": { 48 | "Type": "Choice", 49 | "Choices": [{ 50 | "Variable": "$.payment_info.status", 51 | "StringEquals": "Approved for Payment", 52 | "Next": "Archive Document" 53 | }, 54 | { 55 | "Variable": "$.payment_info.status", 56 | "StringEquals": "Pending Review", 57 | "Next": "Review Document" 58 | }], 59 | "Default": "Review Document" 60 | }, 61 | "Archive Document": { 62 | "Type": "Task", 63 | "Resource": "${ArchiveDocumentLambdaArn}", 64 | "InputPath": "$", 65 | "Next": "Document Processed", 66 | "Retry": [{ 67 | "ErrorEquals": ["States.ALL"], 68 | "IntervalSeconds": 1, 69 | "MaxAttempts": 3, 70 | "BackoffRate": 2 71 | }] 72 | }, 73 | "Review Document": { 74 | "Type": "Task", 75 | "InputPath": "$", 76 | "ResultPath": "$", 77 | "Resource": "arn:aws:states:::sns:publish", 78 | "Next": "Document Processed", 79 | 
"Parameters": { 80 | "TopicArn": "${PendingReviewTopicArn}", 81 | "Message.$": "$" 82 | } 83 | }, 84 | "Document Processed": { 85 | "Type": "Succeed" 86 | } 87 | } 88 | } 89 | 90 | -------------------------------------------------------------------------------- /template.yaml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: '2010-09-09' 2 | Transform: AWS::Serverless-2016-10-31 3 | Description: > 4 | getting-started-with-rpa 5 | 6 | This template creates a workflow that leverages AWS Step Functions, 7 | AWS Lambda, and Amazon Textract to get started with Robotic Process 8 | Automation (RPA), which is the use of software with artificial 9 | intelligence (AI) to handle high-volume, repeatable tasks that 10 | previously required humans to perform. 11 | 12 | # More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst 13 | Globals: 14 | Function: 15 | Timeout: 3 16 | 17 | Metadata: 18 | AWS::CloudFormation::Interface: 19 | ParameterGroups: 20 | - Label: 21 | default: "Amazon DynamoDB Configuration" 22 | Parameters: 23 | - InvoicesTableNameSuffix 24 | - ReadCapacityUnits 25 | - WriteCapacityUnits 26 | - Label: 27 | default: "Amazon S3 Bucket Name Prefixes" 28 | Parameters: 29 | - ScannedInvoicesBucketNamePrefix 30 | - InvoiceAnalysesBucketNamePrefix 31 | - ProcessedInvoicesBucketNamePrefix 32 | ParameterLabels: 33 | InvoicesTableNameSuffix: 34 | default: Invoices Table Name Suffix 35 | ReadCapacityUnits: 36 | default: Read Capacity Units 37 | WriteCapacityUnits: 38 | default: Write Capacity Units 39 | ScannedInvoicesBucketNamePrefix: 40 | default: Scanned Invoices Bucket Name Prefix 41 | InvoiceAnalysesBucketNamePrefix: 42 | default: Invoice Analyses Bucket Name Prefix 43 | ProcessedInvoicesBucketNamePrefix: 44 | default: Processed Invoices Bucket Name Prefix 45 | 46 | Parameters: 47 | InvoicesTableNameSuffix: 48 | Type: String 49 | Default: "invoices" 50 | 
Description: > 51 | The suffix of the name of the Amazon DynamoDB table where processed invoices 52 | will be stored. The table name will be prefixed by the Stack Name. 53 | ReadCapacityUnits: 54 | Description: Provisioned read throughput 55 | Type: Number 56 | Default: 5 57 | MinValue: 5 58 | MaxValue: 10000 59 | ConstraintDescription: must be between 5 and 10000 60 | WriteCapacityUnits: 61 | Description: Provisioned write throughput 62 | Type: Number 63 | Default: 5 64 | MinValue: 5 65 | MaxValue: 10000 66 | ConstraintDescription: must be between 5 and 10000 67 | ScannedInvoicesBucketNamePrefix: 68 | Type: String 69 | Default: "scanned-invoices" 70 | Description: > 71 | The prefix of the name of the S3 bucket where scanned invoices will be stored. 72 | The bucket name will be suffixed by Account ID to avoid global S3 bucket name 73 | collisions. 74 | InvoiceAnalysesBucketNamePrefix: 75 | Type: String 76 | Default: "invoice-analyses" 77 | Description: > 78 | The prefix of the name of the S3 bucket where invoice analyses will be stored. 79 | The bucket name will be suffixed by Account ID to avoid global S3 bucket name 80 | collisions. 81 | ProcessedInvoicesBucketNamePrefix: 82 | Type: String 83 | Default: "processed-invoices" 84 | Description: > 85 | The prefix of the name of the S3 bucket where processed invoices will be stored. 86 | The bucket name will be suffixed by Account ID to avoid global S3 bucket name 87 | collisions. 
88 | 89 | Resources: 90 | InvoicesTable: 91 | Type: AWS::DynamoDB::Table 92 | Properties: 93 | AttributeDefinitions: 94 | - AttributeName: invoice_id 95 | AttributeType: S 96 | - AttributeName: payee_name 97 | AttributeType: S 98 | - AttributeName: due_date 99 | AttributeType: S 100 | GlobalSecondaryIndexes: 101 | - IndexName: payee_name_index 102 | KeySchema: 103 | - AttributeName: payee_name 104 | KeyType: HASH 105 | Projection: 106 | ProjectionType: KEYS_ONLY 107 | ProvisionedThroughput: 108 | ReadCapacityUnits: !Ref ReadCapacityUnits 109 | WriteCapacityUnits: !Ref WriteCapacityUnits 110 | - IndexName: due_date_index 111 | KeySchema: 112 | - AttributeName: due_date 113 | KeyType: HASH 114 | Projection: 115 | ProjectionType: KEYS_ONLY 116 | ProvisionedThroughput: 117 | ReadCapacityUnits: !Ref ReadCapacityUnits 118 | WriteCapacityUnits: !Ref WriteCapacityUnits 119 | KeySchema: 120 | - AttributeName: invoice_id 121 | KeyType: HASH 122 | ProvisionedThroughput: 123 | ReadCapacityUnits: !Ref ReadCapacityUnits 124 | WriteCapacityUnits: !Ref WriteCapacityUnits 125 | TableName: 126 | !Sub 127 | - ${AWS::StackName}-${TableNameSuffix} 128 | - { TableNameSuffix: !Ref InvoicesTableNameSuffix } 129 | ScannedInvoicesBucket: 130 | Type: AWS::S3::Bucket 131 | Properties: 132 | BucketName: 133 | !Sub 134 | - ${BucketNamePrefix}-${AWS::AccountId} 135 | - { BucketNamePrefix: !Ref ScannedInvoicesBucketNamePrefix } 136 | BucketEncryption: 137 | ServerSideEncryptionConfiguration: 138 | - ServerSideEncryptionByDefault: 139 | SSEAlgorithm: 'aws:kms' 140 | KMSMasterKeyID: alias/aws/s3 141 | VersioningConfiguration: 142 | Status: Enabled 143 | InvoiceAnalysesBucket: 144 | Type: AWS::S3::Bucket 145 | Properties: 146 | BucketName: 147 | !Sub 148 | - ${BucketNamePrefix}-${AWS::AccountId} 149 | - { BucketNamePrefix: !Ref InvoiceAnalysesBucketNamePrefix } 150 | BucketEncryption: 151 | ServerSideEncryptionConfiguration: 152 | - ServerSideEncryptionByDefault: 153 | SSEAlgorithm: 'aws:kms' 154 | 
KMSMasterKeyID: alias/aws/s3 155 | VersioningConfiguration: 156 | Status: Enabled 157 | ProcessedInvoicesBucket: 158 | Type: AWS::S3::Bucket 159 | Properties: 160 | BucketName: 161 | !Sub 162 | - ${BucketNamePrefix}-${AWS::AccountId} 163 | - { BucketNamePrefix: !Ref ProcessedInvoicesBucketNamePrefix } 164 | BucketEncryption: 165 | ServerSideEncryptionConfiguration: 166 | - ServerSideEncryptionByDefault: 167 | SSEAlgorithm: 'aws:kms' 168 | KMSMasterKeyID: alias/aws/s3 169 | VersioningConfiguration: 170 | Status: Enabled 171 | StartProcessScannedInvoiceWorkflowFunctionLambdaExecutionRole: 172 | Type: AWS::IAM::Role 173 | Properties: 174 | AssumeRolePolicyDocument: 175 | Version: 2012-10-17 176 | Statement: 177 | - Effect: Allow 178 | Principal: 179 | Service: 180 | - lambda.amazonaws.com 181 | Action: 182 | - sts:AssumeRole 183 | Path: / 184 | Policies: 185 | - PolicyName: lambda_basic_execution 186 | PolicyDocument: 187 | Version: 2012-10-17 188 | Statement: 189 | - Effect: Allow 190 | Action: 191 | - logs:CreateLogGroup 192 | - logs:CreateLogStream 193 | - logs:PutLogEvents 194 | Resource: 195 | !Sub 196 | - arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}-${LambdaFunctionName}:* 197 | - { LambdaFunctionName: StartProcessScannedInvoiceWorkflow } 198 | - PolicyName: sqs 199 | PolicyDocument: 200 | Version: 2012-10-17 201 | Statement: 202 | - Effect: Allow 203 | Action: 204 | - sqs:DeleteMessage 205 | - sqs:GetQueueAttributes 206 | - sqs:ReceiveMessage 207 | Resource: 208 | !Sub 209 | - arn:aws:sqs:${AWS::Region}:${AWS::AccountId}:${AWS::StackName}-${QueueName} 210 | - { QueueName: DocumentAnalysisCompletedQueue } 211 | - PolicyName: step_functions 212 | PolicyDocument: 213 | Version: 2012-10-17 214 | Statement: 215 | - Effect: Allow 216 | Action: 217 | - states:StartExecution 218 | Resource: "*" 219 | - PolicyName: kms 220 | PolicyDocument: 221 | Version: 2012-10-17 222 | Statement: 223 | - Effect: Allow 224 | Action: 225 | - 
kms:Decrypt 226 | Resource: "*" 227 | RoleName: 228 | !Sub 229 | - ${AWS::StackName}-${ExecutionRoleName} 230 | - { ExecutionRoleName: StartProcessScannedInvoiceWorkflowRole } 231 | StartProcessScannedInvoiceWorkflowFunction: 232 | Type: AWS::Serverless::Function 233 | Properties: 234 | CodeUri: start_process_scanned_invoice_workflow/ 235 | Environment: 236 | Variables: 237 | STATE_MACHINE_ARN: 238 | !Sub 239 | - arn:aws:states:${AWS::Region}:${AWS::AccountId}:stateMachine:${AWS::StackName}-${StateMachineName} 240 | - { StateMachineName: ProcessScannedInvoiceWorkflow } 241 | Events: 242 | SQSEvent: 243 | Type: SQS 244 | Properties: 245 | Queue: 246 | !Sub ${DocumentAnalysisCompletedQueue.Arn} 247 | FunctionName: 248 | !Sub 249 | - ${AWS::StackName}-${LambdaFunctionName} 250 | - { LambdaFunctionName: StartProcessScannedInvoiceWorkflow } 251 | Handler: app.lambda_handler 252 | Role: 253 | !GetAtt 254 | - StartProcessScannedInvoiceWorkflowFunctionLambdaExecutionRole 255 | - Arn 256 | MemorySize: 128 257 | Runtime: python3.7 258 | Timeout: 30 259 | StartProcessScannedInvoiceWorkflowFunctionLogGroup: 260 | Type: AWS::Logs::LogGroup 261 | Properties: 262 | LogGroupName: 263 | !Sub 264 | - /aws/lambda/${AWS::StackName}-${LambdaFunctionName} 265 | - LambdaFunctionName: StartProcessScannedInvoiceWorkflow 266 | RetentionInDays: 7 267 | StartDocumentAnalysisFunctionLambdaExecutionRole: 268 | Type: AWS::IAM::Role 269 | Properties: 270 | AssumeRolePolicyDocument: 271 | Version: 2012-10-17 272 | Statement: 273 | - Effect: Allow 274 | Principal: 275 | Service: 276 | - lambda.amazonaws.com 277 | Action: 278 | - sts:AssumeRole 279 | Path: / 280 | Policies: 281 | - PolicyName: lambda_basic_execution 282 | PolicyDocument: 283 | Version: 2012-10-17 284 | Statement: 285 | - Effect: Allow 286 | Action: 287 | - logs:CreateLogGroup 288 | - logs:CreateLogStream 289 | - logs:PutLogEvents 290 | Resource: 291 | !Sub 292 | - 
arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}-${LambdaFunctionName}:* 293 | - { LambdaFunctionName: StartDocumentAnalysis } 294 | - PolicyName: s3 295 | PolicyDocument: 296 | Version: 2012-10-17 297 | Statement: 298 | - Effect: Allow 299 | Action: 300 | - s3:GetObject 301 | Resource: 302 | !Sub 303 | - arn:aws:s3:::${BucketNamePrefix}-${AWS::AccountId}/* 304 | - { BucketNamePrefix: !Ref ScannedInvoicesBucketNamePrefix } 305 | - PolicyName: textract 306 | PolicyDocument: 307 | Version: 2012-10-17 308 | Statement: 309 | - Effect: Allow 310 | Action: 311 | - textract:StartDocumentAnalysis 312 | Resource: "*" 313 | RoleName: 314 | !Sub 315 | - ${AWS::StackName}-${ExecutionRoleName} 316 | - { ExecutionRoleName: StartDocumentAnalysisRole } 317 | DocumentAnalysisCompletedTopic: 318 | Type: AWS::SNS::Topic 319 | Properties: 320 | KmsMasterKeyId: !Ref Key 321 | TopicName: 322 | !Sub 323 | - ${AWS::StackName}-${TopicName} 324 | - { TopicName: DocumentAnalysisCompleted } 325 | DocumentAnalysisCompletedQueue: 326 | Type: AWS::SQS::Queue 327 | Properties: 328 | DelaySeconds: 60 329 | KmsMasterKeyId: !Ref Key 330 | QueueName: 331 | !Sub 332 | - ${AWS::StackName}-${QueueName} 333 | - { QueueName: DocumentAnalysisCompletedQueue } 334 | ReceiveMessageWaitTimeSeconds: 20 335 | VisibilityTimeout: 60 336 | DocumentAnalysisCompletedQueuePolicy: 337 | Type: AWS::SQS::QueuePolicy 338 | Properties: 339 | PolicyDocument: 340 | Version: 2012-10-17 341 | Statement: 342 | - Effect: Allow 343 | Principal: 344 | Service: sns.amazonaws.com 345 | Action: 346 | - sqs:SendMessage 347 | Resource: 348 | !Sub 349 | - arn:aws:sqs:${AWS::Region}:${AWS::AccountId}:${AWS::StackName}-${QueueName} 350 | - { QueueName: DocumentAnalysisCompletedQueue } 351 | Queues: 352 | - !Ref DocumentAnalysisCompletedQueue 353 | DocumentAnalysisCompletedSNSSubscription: 354 | Type: AWS::SNS::Subscription 355 | Properties: 356 | Protocol: sqs 357 | Endpoint: !GetAtt 
DocumentAnalysisCompletedQueue.Arn 358 | TopicArn: !Ref DocumentAnalysisCompletedTopic 359 | TextractPublishToSNSTopicRole: 360 | Type: AWS::IAM::Role 361 | Properties: 362 | AssumeRolePolicyDocument: 363 | Version: 2012-10-17 364 | Statement: 365 | - Effect: Allow 366 | Principal: 367 | Service: 368 | - textract.amazonaws.com 369 | Action: 370 | - sts:AssumeRole 371 | Path: / 372 | Policies: 373 | - PolicyName: sns 374 | PolicyDocument: 375 | Version: 2012-10-17 376 | Statement: 377 | - Effect: Allow 378 | Action: 379 | - sns:Publish 380 | Resource: 381 | !Sub 382 | - arn:aws:sns:${AWS::Region}:${AWS::AccountId}:${AWS::StackName}-${TopicName} 383 | - { TopicName: DocumentAnalysisCompleted } 384 | - PolicyName: kms 385 | PolicyDocument: 386 | Version: 2012-10-17 387 | Statement: 388 | - Effect: Allow 389 | Action: 390 | - kms:GenerateDataKey* 391 | - kms:Decrypt 392 | Resource: "*" 393 | RoleName: 394 | !Sub 395 | - ${AWS::StackName}-${RoleName} 396 | - { RoleName: TextractPublishToSNSTopicRole } 397 | StartDocumentAnalysisFunction: 398 | Type: AWS::Serverless::Function 399 | Properties: 400 | CodeUri: start_document_analysis/ 401 | Environment: 402 | Variables: 403 | DOCUMENT_ANALYIS_COMPLETED_SNS_TOPIC_ARN: 404 | !Sub 405 | - arn:aws:sns:${AWS::Region}:${AWS::AccountId}:${AWS::StackName}-${TopicName} 406 | - { TopicName: DocumentAnalysisCompleted } 407 | TEXTRACT_PUBLISH_TO_SNS_ROLE_ARN: 408 | !Sub 409 | - arn:aws:iam::${AWS::AccountId}:role/${AWS::StackName}-${RoleName} 410 | - { RoleName: TextractPublishToSNSTopicRole } 411 | Events: 412 | S3Event: 413 | Type: S3 414 | Properties: 415 | Bucket: !Ref ScannedInvoicesBucket 416 | Events: s3:ObjectCreated:* 417 | FunctionName: 418 | !Sub 419 | - ${AWS::StackName}-${LambdaFunctionName} 420 | - { LambdaFunctionName: StartDocumentAnalysis } 421 | Handler: app.lambda_handler 422 | Role: 423 | !GetAtt 424 | - StartDocumentAnalysisFunctionLambdaExecutionRole 425 | - Arn 426 | MemorySize: 128 427 | Runtime: python3.7 428 
| Timeout: 30 429 | StartDocumentAnalysisFunctionLogGroup: 430 | Type: AWS::Logs::LogGroup 431 | Properties: 432 | LogGroupName: 433 | !Sub 434 | - /aws/lambda/${AWS::StackName}-${LambdaFunctionName} 435 | - LambdaFunctionName: StartDocumentAnalysis 436 | RetentionInDays: 7 437 | SaveDocumentAnalysisFunctionLambdaExecutionRole: 438 | Type: AWS::IAM::Role 439 | Properties: 440 | AssumeRolePolicyDocument: 441 | Version: 2012-10-17 442 | Statement: 443 | - Effect: Allow 444 | Principal: 445 | Service: 446 | - lambda.amazonaws.com 447 | Action: 448 | - sts:AssumeRole 449 | Path: / 450 | Policies: 451 | - PolicyName: lambda_basic_execution 452 | PolicyDocument: 453 | Version: 2012-10-17 454 | Statement: 455 | - Effect: Allow 456 | Action: 457 | - logs:CreateLogGroup 458 | - logs:CreateLogStream 459 | - logs:PutLogEvents 460 | Resource: 461 | !Sub 462 | - arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}-${LambdaFunctionName}:* 463 | - { LambdaFunctionName: SaveDocumentAnalysis } 464 | - PolicyName: textract 465 | PolicyDocument: 466 | Version: 2012-10-17 467 | Statement: 468 | - Effect: Allow 469 | Action: 470 | - textract:GetDocumentAnalysis 471 | Resource: "*" 472 | - PolicyName: s3 473 | PolicyDocument: 474 | Version: 2012-10-17 475 | Statement: 476 | - Effect: Allow 477 | Action: 478 | - s3:PutObject 479 | Resource: 480 | !Sub 481 | - arn:aws:s3:::${BucketNamePrefix}-${AWS::AccountId}/* 482 | - { BucketNamePrefix: !Ref InvoiceAnalysesBucketNamePrefix } 483 | RoleName: 484 | !Sub 485 | - ${AWS::StackName}-${ExecutionRoleName} 486 | - { ExecutionRoleName: SaveDocumentAnalysisRole } 487 | SaveDocumentAnalysisFunction: 488 | Type: AWS::Serverless::Function 489 | Properties: 490 | CodeUri: save_document_analysis/ 491 | Environment: 492 | Variables: 493 | ANALYSES_BUCKET_NAME: 494 | !Sub 495 | - ${BucketNamePrefix}-${AWS::AccountId} 496 | - { BucketNamePrefix: !Ref InvoiceAnalysesBucketNamePrefix } 497 | FunctionName: 498 | !Sub 499 | - 
${AWS::StackName}-${LambdaFunctionName}
          - { LambdaFunctionName: SaveDocumentAnalysis }
      Handler: app.lambda_handler
      Role:
        !GetAtt
          - SaveDocumentAnalysisFunctionLambdaExecutionRole
          - Arn
      MemorySize: 128
      Runtime: python3.7
      Timeout: 30
  SaveDocumentAnalysisFunctionLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName:
        !Sub
          - /aws/lambda/${AWS::StackName}-${LambdaFunctionName}
          - LambdaFunctionName: SaveDocumentAnalysis
      RetentionInDays: 7
  PendingReviewTopic:
    Type: AWS::SNS::Topic
    Properties:
      KmsMasterKeyId: !Ref Key
      TopicName:
        !Sub
          - ${AWS::StackName}-${TopicName}
          - { TopicName: PendingReview }
  ProcessDocumentAnalysisFunctionLambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: lambda_basic_execution
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource:
                  !Sub
                    - arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}-${LambdaFunctionName}:*
                    - { LambdaFunctionName: ProcessDocumentAnalysis }
        - PolicyName: dynamodb
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - dynamodb:PutItem
                Resource:
                  !Sub
                    - arn:aws:dynamodb:${AWS::Region}:${AWS::AccountId}:table/${AWS::StackName}-${TableNameSuffix}
                    - { TableNameSuffix: !Ref InvoicesTableNameSuffix }
        - PolicyName: s3
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObject
                Resource:
                  !Sub
                    - arn:aws:s3:::${BucketNamePrefix}-${AWS::AccountId}/*
                    - { BucketNamePrefix: !Ref InvoiceAnalysesBucketNamePrefix }
              - Effect: Allow
                Action:
                  - s3:PutObject
                Resource:
                  !Sub
                    - arn:aws:s3:::${BucketNamePrefix}-${AWS::AccountId}/*
                    - { BucketNamePrefix: !Ref ProcessedInvoicesBucketNamePrefix }
      RoleName:
        !Sub
          - ${AWS::StackName}-${ExecutionRoleName}
          - { ExecutionRoleName: ProcessDocumentAnalysisRole }
  ProcessDocumentAnalysisFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: process_document_analysis/
      Environment:
        Variables:
          INVOICES_TABLE_NAME:
            !Sub
              - ${AWS::StackName}-${TableNameSuffix}
              - { TableNameSuffix: !Ref InvoicesTableNameSuffix }
      FunctionName:
        !Sub
          - ${AWS::StackName}-${LambdaFunctionName}
          - { LambdaFunctionName: ProcessDocumentAnalysis }
      Handler: app.lambda_handler
      Role:
        !GetAtt
          - ProcessDocumentAnalysisFunctionLambdaExecutionRole
          - Arn
      MemorySize: 128
      Runtime: python3.7
      Timeout: 30
  ProcessDocumentAnalysisFunctionLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName:
        !Sub
          - /aws/lambda/${AWS::StackName}-${LambdaFunctionName}
          - LambdaFunctionName: ProcessDocumentAnalysis
      RetentionInDays: 7
  ArchiveDocumentFunctionLambdaExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - lambda.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: lambda_basic_execution
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - logs:CreateLogGroup
                  - logs:CreateLogStream
                  - logs:PutLogEvents
                Resource:
                  !Sub
                    - arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/lambda/${AWS::StackName}-${LambdaFunctionName}:*
                    - { LambdaFunctionName: ArchiveDocument }
        - PolicyName: s3
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - s3:GetObject
                  - s3:GetObjectTagging
                Resource:
                  !Sub
                    - arn:aws:s3:::${BucketNamePrefix}-${AWS::AccountId}/*
                    - { BucketNamePrefix: !Ref ScannedInvoicesBucketNamePrefix }
              - Effect: Allow
                Action:
                  - s3:PutObject
                  - s3:PutObjectTagging
                Resource:
                  !Sub
                    - arn:aws:s3:::${BucketNamePrefix}-${AWS::AccountId}/*
                    - { BucketNamePrefix: !Ref ProcessedInvoicesBucketNamePrefix }
              - Effect: Allow
                Action:
                  - s3:DeleteObject
                Resource:
                  !Sub
                    - arn:aws:s3:::${BucketNamePrefix}-${AWS::AccountId}/*
                    - { BucketNamePrefix: !Ref ScannedInvoicesBucketNamePrefix }
      RoleName:
        !Sub
          - ${AWS::StackName}-${ExecutionRoleName}
          - { ExecutionRoleName: ArchiveDocumentRole }
  ArchiveDocumentFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: archive_document/
      Environment:
        Variables:
          ARCHIVE_BUCKET_NAME:
            !Sub
              - ${BucketNamePrefix}-${AWS::AccountId}
              - { BucketNamePrefix: !Ref ProcessedInvoicesBucketNamePrefix }
      FunctionName:
        !Sub
          - ${AWS::StackName}-${LambdaFunctionName}
          - { LambdaFunctionName: ArchiveDocument }
      Handler: app.lambda_handler
      Role:
        !GetAtt
          - ArchiveDocumentFunctionLambdaExecutionRole
          - Arn
      MemorySize: 128
      Runtime: python3.7
      Timeout: 30
  ArchiveDocumentFunctionLogGroup:
    Type: AWS::Logs::LogGroup
    Properties:
      LogGroupName:
        !Sub
          - /aws/lambda/${AWS::StackName}-${LambdaFunctionName}
          - LambdaFunctionName: ArchiveDocument
      RetentionInDays: 7
  ProcessScannedInvoiceWorkflowStateMachineStatesExecutionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: 2012-10-17
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - states.amazonaws.com
            Action:
              - sts:AssumeRole
      Path: /
      Policies:
        - PolicyName: lambda_invoke
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - lambda:InvokeFunction
                Resource:
                  - !Sub
                    - arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:${AWS::StackName}-${LambdaFunctionName}
                    - { LambdaFunctionName: StartProcessScannedInvoiceWorkflow }
                  - !Sub
                    - arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:${AWS::StackName}-${LambdaFunctionName}
                    - { LambdaFunctionName: StartDocumentAnalysis }
                  - !Sub
                    - arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:${AWS::StackName}-${LambdaFunctionName}
                    - { LambdaFunctionName: GetDocumentAnalysisStatus }
                  - !Sub
                    - arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:${AWS::StackName}-${LambdaFunctionName}
                    - { LambdaFunctionName: SaveDocumentAnalysis }
                  - !Sub
                    - arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:${AWS::StackName}-${LambdaFunctionName}
                    - { LambdaFunctionName: ProcessDocumentAnalysis }
                  - !Sub
                    - arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:${AWS::StackName}-${LambdaFunctionName}
                    - { LambdaFunctionName: ArchiveDocument }
        - PolicyName: sns
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - sns:Publish
                Resource:
                  !Sub
                    - arn:aws:sns:${AWS::Region}:${AWS::AccountId}:${AWS::StackName}-${TopicName}
                    - { TopicName: PendingReview }
        - PolicyName: kms
          PolicyDocument:
            Version: 2012-10-17
            Statement:
              - Effect: Allow
                Action:
                  - kms:GenerateDataKey*
                  - kms:Decrypt
                Resource: "*"
      RoleName:
        !Sub
          - ${AWS::StackName}-${ExecutionRoleName}
          - { ExecutionRoleName: ProcessScannedInvoiceWorkflowRole }
  ProcessScannedInvoiceWorkflowStateMachine:
    Type: AWS::Serverless::StateMachine
    Properties:
      DefinitionSubstitutions:
        StartDocumentAnalysisLambdaArn: !GetAtt [ StartDocumentAnalysisFunction, Arn ]
        SaveDocumentAnalysisLambdaArn: !GetAtt [ SaveDocumentAnalysisFunction, Arn ]
        PendingReviewTopicArn: !Ref PendingReviewTopic
        ProcessDocumentAnalysisLambdaArn: !GetAtt [ ProcessDocumentAnalysisFunction, Arn ]
        ArchiveDocumentLambdaArn: !GetAtt [ ArchiveDocumentFunction, Arn ]
      DefinitionUri: state_machine/process_scanned_invoice_workflow.asl.json
      Role: !GetAtt [ ProcessScannedInvoiceWorkflowStateMachineStatesExecutionRole, Arn ]
      Name:
        !Sub
          - ${AWS::StackName}-${LambdaFunctionName}
          - { LambdaFunctionName: ProcessScannedInvoiceWorkflow }
  Key:
    Type: 'AWS::KMS::Key'
    Properties:
      KeyPolicy:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              AWS: !Sub 'arn:aws:iam::${AWS::AccountId}:root'
            Action: 'kms:*'
            Resource: '*'
          - Effect: Allow
            Principal:
              Service: !Sub 'textract.amazonaws.com'
            Action:
              - kms:GenerateDataKey*
              - kms:Decrypt
            Resource: '*'
          - Effect: Allow
            Principal:
              Service: !Sub 'sns.amazonaws.com'
            Action:
              - kms:GenerateDataKey*
              - kms:Decrypt
            Resource: '*'
  KeyAlias:
    Type: 'AWS::KMS::Alias'
    Properties:
      AliasName: !Sub 'alias/${AWS::StackName}-Key'
      TargetKeyId: !Ref Key

Outputs:
  ScannedInvoicesBucketName:
    Value:
      !Ref ScannedInvoicesBucket
  InvoiceAnalysesBucketName:
    Value:
      !Ref InvoiceAnalysesBucket
  ProcessedInvoicesBucketName:
    Value:
      !Ref ProcessedInvoicesBucket
--------------------------------------------------------------------------------
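In `template.yaml`, each function's name, its log group, and the Lambda ARNs granted in the state machine's `lambda_invoke` policy are all built with `!Sub` from `${AWS::StackName}` plus a per-resource variable. They are not linked with `!GetAtt`. The permissions therefore only cover the deployed functions if every pattern expands to the same `${AWS::StackName}-<Name>` string. Below is a minimal Python sketch of that expansion. It assumes a simplified `Fn::Sub` that handles plain `${Name}` placeholders only; the `${!Literal}` escape and `${Resource.Attribute}` forms are left unexpanded. The stack name, Region, and account ID are hypothetical example values, and `sub`/`derived_names` are illustrative helpers that are not part of this repository.

```python
import re

# Matches ${Name} and ${AWS::Pseudo} placeholders. Simplified: the ${!Literal}
# escape and ${Resource.Attribute} forms do not match and are left as-is.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z0-9:]+)\}")

# Hypothetical deployment values, for illustration only.
PSEUDO = {
    "AWS::StackName": "invoices",
    "AWS::Region": "us-east-1",
    "AWS::AccountId": "123456789012",
}


def sub(template: str, variables: dict) -> str:
    """Expand ${...} placeholders, roughly as Fn::Sub does for plain names."""
    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unresolved placeholder: {name}")
        return variables[name]
    return _PLACEHOLDER.sub(replace, template)


def derived_names(lambda_function_name: str) -> dict:
    """Expand the three !Sub patterns the template uses for one function."""
    variables = {**PSEUDO, "LambdaFunctionName": lambda_function_name}
    return {
        "FunctionName": sub("${AWS::StackName}-${LambdaFunctionName}", variables),
        "LogGroupName": sub(
            "/aws/lambda/${AWS::StackName}-${LambdaFunctionName}", variables
        ),
        "FunctionArn": sub(
            "arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:function:"
            "${AWS::StackName}-${LambdaFunctionName}",
            variables,
        ),
    }


if __name__ == "__main__":
    for name, value in derived_names("ArchiveDocument").items():
        print(f"{name}: {value}")
```

With these example values, `derived_names("ArchiveDocument")["FunctionArn"]` expands to `arn:aws:lambda:us-east-1:123456789012:function:invoices-ArchiveDocument`. That string matches the `FunctionName` that `ArchiveDocumentFunction` is deployed with. Renaming a function in only one of these places would leave the state machine role without `lambda:InvokeFunction` on the renamed function.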