├── CODE_OF_CONDUCT.md ├── CONTRIBUTING.md ├── LICENSE ├── NOTICE ├── README.md ├── arch.png ├── arch_latest.png ├── cfntemplate.yml ├── image.png ├── launchstack.png └── src └── lambda ├── InvoiceBot.zip ├── create_texttract_detect_text_async_job.py ├── extract_text_from_textract_async_job_output.py ├── lex-manager.py └── meaningful-conversations-lex-lambda.py /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | ## Code of Conduct 2 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 3 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 4 | opensource-codeofconduct@amazon.com with any additional questions or comments. 5 | -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing Guidelines 2 | 3 | Thank you for your interest in contributing to our project. Whether it's a bug report, new feature, correction, or additional 4 | documentation, we greatly value feedback and contributions from our community. 5 | 6 | Please read through this document before submitting any issues or pull requests to ensure we have all the necessary 7 | information to effectively respond to your bug report or contribution. 8 | 9 | 10 | ## Reporting Bugs/Feature Requests 11 | 12 | We welcome you to use the GitHub issue tracker to report bugs or suggest features. 13 | 14 | When filing an issue, please check existing open, or recently closed, issues to make sure somebody else hasn't already 15 | reported the issue. Please try to include as much information as you can. 
Details like these are incredibly useful: 16 | 17 | * A reproducible test case or series of steps 18 | * The version of our code being used 19 | * Any modifications you've made relevant to the bug 20 | * Anything unusual about your environment or deployment 21 | 22 | 23 | ## Contributing via Pull Requests 24 | Contributions via pull requests are much appreciated. Before sending us a pull request, please ensure that: 25 | 26 | 1. You are working against the latest source on the *master* branch. 27 | 2. You check existing open, and recently merged, pull requests to make sure someone else hasn't addressed the problem already. 28 | 3. You open an issue to discuss any significant work - we would hate for your time to be wasted. 29 | 30 | To send us a pull request, please: 31 | 32 | 1. Fork the repository. 33 | 2. Modify the source; please focus on the specific change you are contributing. If you also reformat all the code, it will be hard for us to focus on your change. 34 | 3. Ensure local tests pass. 35 | 4. Commit to your fork using clear commit messages. 36 | 5. Send us a pull request, answering any default questions in the pull request interface. 37 | 6. Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation. 38 | 39 | GitHub provides additional documentation on [forking a repository](https://help.github.com/articles/fork-a-repo/) and 40 | [creating a pull request](https://help.github.com/articles/creating-a-pull-request/). 41 | 42 | 43 | ## Finding contributions to work on 44 | Looking at the existing issues is a great way to find something to contribute on. As our projects, by default, use the default GitHub issue labels (enhancement/bug/duplicate/help wanted/invalid/question/wontfix), looking at any 'help wanted' issues is a great place to start. 45 | 46 | 47 | ## Code of Conduct 48 | This project has adopted the [Amazon Open Source Code of Conduct](https://aws.github.io/code-of-conduct). 
49 | For more information see the [Code of Conduct FAQ](https://aws.github.io/code-of-conduct-faq) or contact 50 | opensource-codeofconduct@amazon.com with any additional questions or comments. 51 | 52 | 53 | ## Security issue notifications 54 | If you discover a potential security issue in this project we ask that you notify AWS/Amazon Security via our [vulnerability reporting page](http://aws.amazon.com/security/vulnerability-reporting/). Please do **not** create a public github issue. 55 | 56 | 57 | ## Licensing 58 | 59 | See the [LICENSE](LICENSE) file for our project's licensing. We will ask you to confirm the licensing of your contribution. 60 | 61 | We may ask you to sign a [Contributor License Agreement (CLA)](http://en.wikipedia.org/wiki/Contributor_License_Agreement) for larger changes. 62 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 
23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. 
For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. 
If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. 
You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | -------------------------------------------------------------------------------- /NOTICE: -------------------------------------------------------------------------------- 1 | Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. 
2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Deriving conversational insights from invoices with Amazon Textract, Amazon Comprehend, and Amazon Lex 2 | 3 | This sample is based on the blog post (Link to be specified). It shows you how to use AWS AI services to automate text data processing and insight discovery. With AWS AI services such as Amazon Textract, Amazon Comprehend, and Amazon Lex, you can set up an automated serverless solution to address this requirement. This sample walks you through the following steps: 4 | 1) Extract text from receipts or invoices in PDF or image format with Amazon Textract. 5 | 2) Derive insights with Amazon Comprehend. 6 | 3) Interact with these insights in natural language using Amazon Lex. 7 | 8 | 9 | ## Services Used 10 | This solution uses AI services, serverless technologies, and managed services to 11 | implement a scalable and cost-effective architecture. 12 | * AWS CodeStar – Sets up the web UI for the chatbot and the continuous delivery pipeline. 13 | * Amazon Cognito – Lets you add user sign-up, sign-in, and access control to your web and mobile apps quickly and easily. 14 | * AWS Lambda – Executes code in response to triggers such as changes in data, shifts in system state, or user actions. Because Amazon S3 can directly trigger a Lambda function, you can build a variety of real-time serverless data-processing systems. 15 | * Amazon Lex – Provides an interface to create conversational chatbots. 16 | * Amazon Comprehend – NLP service that uses machine learning to find insights and relationships in text. 17 | * Amazon Textract – Uses ML to extract text and data from scanned documents in PDF, JPEG, or PNG formats. 18 | * Amazon Simple Storage Service (Amazon S3) – Serves as an object store for your documents and allows for central management with fine-tuned access controls. 
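As a minimal sketch of the S3-to-Lambda trigger mentioned in the list above, the function below pulls the bucket name and object key out of an S3 event notification. The helper name `extract_s3_objects` and the sample event are hypothetical and not part of this repository; the key is URL-decoded because S3 encodes object keys in event payloads.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return a list of (bucket, key) pairs, one per record in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded (spaces become '+'), so decode them.
        key = unquote_plus(s3_info["object"]["key"])
        objects.append((bucket, key))
    return objects


# Hypothetical event, trimmed to the fields the helper reads.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "invoices/march+2020.pdf"},
            }
        }
    ]
}

print(extract_s3_objects(sample_event))
```

A Lambda handler could call this helper first and then pass each pair on to Amazon Textract.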
19 | 20 | 21 | ## This sample includes: 22 | 23 | * README.md - this file 24 | 25 | * cfntemplate.yml - this file contains the AWS Serverless Application Model (AWS SAM) template used 26 | by AWS CloudFormation to deploy your application. 27 | 28 | * AWS Lambda functions written in Python, located in the src/lambda folder, that implement the calls to Amazon Textract and Amazon Comprehend and the fulfillment code for Amazon Lex 29 | 30 | 31 | ## Solution Overview 32 | 33 | The following diagram illustrates the architecture of the solution. 34 | 35 | ![](arch.png) 36 | 37 | The architecture contains the following steps: 38 | 39 | 1. The backend user or administrator uses the AWS Management Console or AWS Command Line Interface (AWS CLI) to upload the PDF documents or images to an S3 bucket. 40 | 2. The Amazon S3 upload triggers an AWS Lambda function. 41 | 3. The Lambda function invokes the Amazon Textract StartDocumentTextDetection API, which sets up an asynchronous job to detect text in the document you uploaded. 42 | 4. Amazon Textract notifies Amazon Simple Notification Service (Amazon SNS) when text processing is complete. 43 | 5. A second Lambda function receives the notification from the SNS topic when the text detection job completes. 44 | 6. Once the Lambda function is notified of job completion by Amazon SNS, it calls the Amazon Textract GetDocumentTextDetection API to retrieve the result of the asynchronous operation and loads the result into an S3 bucket. 45 | 7. A Lambda function is used for fulfillment of the Amazon Lex intents. For a more detailed sequence of interactions, please refer to the Building your chatbot step in the “Deploying the Architecture with Cloudformation” section. 46 | 8. Amazon Comprehend uses ML to find insights and relationships in text. The Lambda function uses the boto3 APIs that Amazon Comprehend provides for entity and key phrase detection. 47 | a. 
In response to the bot’s welcome message, the user types “Show me the invoice summary”. This invokes the GetInvoiceSummary Lex intent, and the Lambda function uses the Amazon Comprehend DetectEntities API for fulfillment. 48 | b. When the user types “Get me the invoice details”, this invokes the GetInvoiceDetails intent. Amazon Lex prompts the user to enter the invoice number, and the Lambda function uses the Amazon Comprehend DetectEntities API to return the invoice details message. 49 | c. When the user types “Can you show me the invoice notes for ”, this invokes the GetInvoiceNotes intent, and the Lambda function uses the Amazon Comprehend DetectKeyPhrases API to return comments associated with the invoice. 50 | 51 | 9. You deploy the Lex bot web UI in your AWS CloudFormation template by using an existing CloudFormation stack as a nested stack. To download the stack, see Deploy a Web UI for Your Chatbot. This nested stack deploys a Lex web UI; the webpage is served as a static website from an S3 bucket. The web UI uses Amazon Cognito to generate an access token for authentication and uses AWS CodeStar to set up a delivery pipeline. End users interact with this chatbot web UI. Please refer to this AWS GitHub repo if you need more details on how to set up a web UI for your Amazon Lex chatbots - https://github.com/aws-samples/aws-lex-web-ui. 52 | 53 | 54 | 55 | ## Deploy 1 click 56 | [![button](launchstack.png)](https://console.aws.amazon.com/cloudformation/home?region=us-east-1#/stacks/create/review?stackName=Invoicebot&templateURL=https://aws-ml-blog.s3.amazonaws.com/artifacts/textract-comprehend-lex/template-export.yml) 57 | ## License 58 | 59 | This project is licensed under the Apache-2.0 License. 
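Steps 4 to 6 of the Solution Overview can be sketched as two small helpers: one reads the job ID and status from the Amazon SNS notification that Amazon Textract publishes, and one joins the `LINE` blocks of a GetDocumentTextDetection response into plain text. This is an illustrative sketch only, not the repository's implementation; the helper names and the sample payloads are hypothetical, and the field names (`JobId`, `Status`, `Blocks`, `BlockType`, `Text`) are assumed to match the Textract notification and response formats.

```python
import json


def parse_textract_notification(event):
    """Return (job_id, status) from an SNS-triggered Lambda event."""
    # The SNS message body is a JSON string published by Amazon Textract.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    return message["JobId"], message["Status"]


def lines_from_blocks(blocks):
    """Join the text of every LINE block, one detected line per output line."""
    return "\n".join(
        block["Text"] for block in blocks if block.get("BlockType") == "LINE"
    )


# Hypothetical payloads, trimmed to the fields the helpers read.
sample_event = {
    "Records": [
        {"Sns": {"Message": json.dumps({"JobId": "job-123", "Status": "SUCCEEDED"})}}
    ]
}
sample_blocks = [
    {"BlockType": "PAGE"},
    {"BlockType": "LINE", "Text": "Invoice Number: 1001"},
    {"BlockType": "WORD", "Text": "Invoice"},
    {"BlockType": "LINE", "Text": "Total: $42.00"},
]

job_id, status = parse_textract_notification(sample_event)
print(job_id, status)
print(lines_from_blocks(sample_blocks))
```

In a deployed function, the blocks would come from the boto3 Textract client's `get_document_text_detection(JobId=job_id)` call, repeated with `NextToken` while the response contains one, before the joined text is written to the output S3 bucket.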
60 | 61 | -------------------------------------------------------------------------------- /arch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-textract-comprehend-lex-chatbot/5c8b069eae56314bc744ae85ed1ef5598296d811/arch.png -------------------------------------------------------------------------------- /arch_latest.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-textract-comprehend-lex-chatbot/5c8b069eae56314bc744ae85ed1ef5598296d811/arch_latest.png -------------------------------------------------------------------------------- /cfntemplate.yml: -------------------------------------------------------------------------------- 1 | AWSTemplateFormatVersion: 2010-09-09 2 | Transform: 3 | - AWS::Serverless-2016-10-31 4 | Parameters: 5 | BotName: 6 | Type: String 7 | Description: Prefix to add to Lex resource names 8 | Default: InvoiceBot 9 | MinLength: 3 10 | MaxLength: 32 11 | AllowedPattern: ^[a-zA-Z\._]+$ 12 | ConstraintDescription: "Must conform with the permitted Lex Bot name pattern.\ 13 | \ \n" 14 | Resources: 15 | DetectTextAsyncJob: 16 | Type: AWS::Serverless::Function 17 | Properties: 18 | Handler: lambda/create_texttract_detect_text_async_job.handler 19 | Runtime: python3.7 20 | CodeUri: s3://aws-bigdata-blog/artifacts/aws-textract-comprehend-lex-chatbot/lambda.zip 21 | Description: '' 22 | MemorySize: 512 23 | Timeout: 30 24 | Environment: 25 | Variables: 26 | SNS_TOPIC_ARN: 27 | Ref: TexttractAsyncJobSNSTopic 28 | SNS_ROLE_ARN: 29 | Fn::GetAtt: 30 | - TextractSNSTopicRole 31 | - Arn 32 | Role: 33 | Fn::GetAtt: 34 | - DetectTextAsyncJobRole 35 | - Arn 36 | Events: 37 | BucketEvent1: 38 | Type: S3 39 | Properties: 40 | Bucket: 41 | Ref: SourceImageBucket 42 | Events: 43 | - s3:ObjectCreated:* 44 | DetectTextAsyncJobRole: 45 | Type: AWS::IAM::Role 46 | Properties: 47 | 
AssumeRolePolicyDocument: 48 | Version: '2012-10-17' 49 | Statement: 50 | - Effect: Allow 51 | Principal: 52 | Service: 53 | - lambda.amazonaws.com 54 | Action: 55 | - sts:AssumeRole 56 | Path: / 57 | ManagedPolicyArns: 58 | - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole 59 | DetectTextAsyncJobRolePolicy: 60 | Type: AWS::IAM::ManagedPolicy 61 | Properties: 62 | Path: / 63 | PolicyDocument: 64 | Version: 2012-10-17 65 | Statement: 66 | - Effect: Allow 67 | Action: 68 | - s3:GetObject 69 | Resource: arn:aws:s3:::*meaningful-conve*/* 70 | - Effect: Allow 71 | Action: 72 | - textract:StartDocumentTextDetection 73 | Resource: '*' 74 | Roles: 75 | - Ref: DetectTextAsyncJobRole 76 | SourceImageBucket: 77 | Type: AWS::S3::Bucket 78 | Properties: 79 | BucketName: 80 | Fn::Sub: meaningful-conve-ip-bkt-${AWS::Region}-${AWS::AccountId} 81 | TextractSNSTopicRole: 82 | Type: AWS::IAM::Role 83 | Properties: 84 | AssumeRolePolicyDocument: 85 | Version: 2012-10-17 86 | Statement: 87 | - Effect: Allow 88 | Principal: 89 | Service: 90 | - textract.amazonaws.com 91 | Action: 92 | - sts:AssumeRole 93 | Path: / 94 | Policies: 95 | - PolicyName: root 96 | PolicyDocument: 97 | Version: 2012-10-17 98 | Statement: 99 | - Effect: Allow 100 | Action: sns:Publish 101 | Resource: arn:aws:sns:*:*:*Textract* 102 | TexttractAsyncJobSNSTopic: 103 | Type: AWS::SNS::Topic 104 | Properties: 105 | TopicName: TextractAsyncJobCompletionNotify 106 | KmsMasterKeyId: alias/aws/sns 107 | ExtractTextFromAsyncJobOutput: 108 | Type: AWS::Serverless::Function 109 | Properties: 110 | Handler: lambda/extract_text_from_textract_async_job_output.handler 111 | Runtime: python3.7 112 | CodeUri: s3://aws-bigdata-blog/artifacts/aws-textract-comprehend-lex-chatbot/lambda.zip 113 | Description: '' 114 | MemorySize: 512 115 | Timeout: 30 116 | Environment: 117 | Variables: 118 | OUT_PUT_S3_BUCKET: 119 | Ref: OutputTextBucket 120 | Role: 121 | Fn::GetAtt: 122 | - ExtractTextAsyncJobOutputRole 123 | - Arn 
124 | Events: 125 | SNSEvent: 126 | Type: SNS 127 | Properties: 128 | Topic: 129 | Ref: TexttractAsyncJobSNSTopic 130 | ExtractTextAsyncJobOutputRole: 131 | Type: AWS::IAM::Role 132 | Properties: 133 | AssumeRolePolicyDocument: 134 | Version: '2012-10-17' 135 | Statement: 136 | - Effect: Allow 137 | Principal: 138 | Service: 139 | - lambda.amazonaws.com 140 | Action: 141 | - sts:AssumeRole 142 | Path: / 143 | ManagedPolicyArns: 144 | - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole 145 | ExtractTextAsyncJobOutputRolePolicies: 146 | Type: AWS::IAM::ManagedPolicy 147 | Properties: 148 | Path: / 149 | PolicyDocument: 150 | Version: 2012-10-17 151 | Statement: 152 | - Effect: Allow 153 | Action: 154 | - textract:GetDocumentTextDetection 155 | Resource: '*' 156 | - Effect: Allow 157 | Action: 158 | - s3:PutObject 159 | Resource: arn:aws:s3:::*meaningful-conve*/* 160 | Roles: 161 | - Ref: ExtractTextAsyncJobOutputRole 162 | LexLambaFunction: 163 | Type: AWS::Serverless::Function 164 | Properties: 165 | Handler: lambda/meaningful-conversations-lex-lambda.lambda_handler 166 | Runtime: python3.7 167 | CodeUri: s3://aws-bigdata-blog/artifacts/aws-textract-comprehend-lex-chatbot/lambda.zip 168 | Description: '' 169 | MemorySize: 512 170 | Timeout: 30 171 | Environment: 172 | Variables: 173 | S3_BUCKET: 174 | Ref: OutputTextBucket 175 | Role: 176 | Fn::GetAtt: 177 | - LexLambaFunctionRole 178 | - Arn 179 | LexLambaFunctionRole: 180 | Type: AWS::IAM::Role 181 | Properties: 182 | AssumeRolePolicyDocument: 183 | Version: '2012-10-17' 184 | Statement: 185 | - Effect: Allow 186 | Principal: 187 | Service: 188 | - lambda.amazonaws.com 189 | Action: 190 | - sts:AssumeRole 191 | Path: / 192 | ManagedPolicyArns: 193 | - arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole 194 | LexLambaFunctionRolePolicy: 195 | Type: AWS::IAM::ManagedPolicy 196 | Properties: 197 | Path: / 198 | PolicyDocument: 199 | Version: 2012-10-17 200 | Statement: 201 | - Effect: 
Allow 202 | Action: 203 | - comprehend:DetectEntities 204 | - comprehend:DetectKeyPhrases 205 | Resource: '*' 206 | - Effect: Allow 207 | Action: 208 | - s3:* 209 | Resource: 210 | - arn:aws:s3:::*meaningful-conve*/* 211 | - arn:aws:s3:::*meaningful-conve* 212 | Roles: 213 | - Ref: LexLambaFunctionRole 214 | OutputTextBucket: 215 | Type: AWS::S3::Bucket 216 | Properties: 217 | BucketName: 218 | Fn::Sub: meaningful-conve-op-bkt-${AWS::Region}-${AWS::AccountId} 219 | LambdaToCreateLexBot: 220 | Type: AWS::Serverless::Function 221 | Properties: 222 | Handler: index.handler 223 | Runtime: python3.7 224 | InlineCode: | 225 | import logging 226 | import json 227 | import boto3 228 | import time 229 | import io 230 | 231 | lexclient = boto3.client('lex-models') 232 | s3 = boto3.resource('s3') 233 | DEFAULT_LOGGING_LEVEL = logging.INFO 234 | logging.basicConfig( 235 | format='[%(levelname)s] %(message)s', 236 | level=DEFAULT_LOGGING_LEVEL 237 | ) 238 | logger = logging.getLogger(__name__) 239 | logger.setLevel(DEFAULT_LOGGING_LEVEL) 240 | 241 | BOT_DEFINITION_FILENAME = '/tmp/InvoiceBot.zip' 242 | BOT_EXPORT_FILENAME = 'bot-definition-export.json' 243 | 244 | s3.Bucket('aws-bigdata-blog').download_file('artifacts/aws-textract-comprehend-lex-chatbot/InvoiceBot.zip', '/tmp/InvoiceBot.zip') 245 | 246 | def create_bot(): 247 | with open(BOT_DEFINITION_FILENAME, 'rb') as file_data: 248 | bytes_content = file_data.read() 249 | response = lexclient.start_import( 250 | payload=bytes_content, 251 | resourceType='BOT', 252 | mergeStrategy='OVERWRITE_LATEST') 253 | print("Import id is"+response['importId']) 254 | 255 | import_status = lexclient.get_import( 256 | importId=response['importId']) 257 | 258 | while import_status['importStatus'] =='IN_PROGRESS': 259 | import_status = lexclient.get_import(importId=response['importId']) 260 | print("Bot creation is in progress") 261 | if import_status['importStatus'] == 'COMPLETE': 262 | return "SUCCESS" 263 | else: 264 | return "FAILURE" 
265 | 266 | def delete_bot(bot_name=None): 267 | bot_aliases = lexclient.get_bot_aliases(botName=bot_name)['BotAliases'] 268 | for alias in bot_aliases: 269 | print("Deleting Alias " + alias['name']) # each alias is a dict; use its 'name' field 270 | response = lexclient.delete_bot_alias(name=alias['name'], botName=bot_name) 271 | time.sleep(5) 272 | response = lexclient.delete_bot(name=bot_name) 273 | return "SUCCESS" 274 | 275 | def handler(event, context): 276 | """ CloudFormation Custom Resource Lambda Handler 277 | """ 278 | import cfnresponse 279 | 280 | request_type = event.get('RequestType') 281 | resource_properties = event.get('ResourceProperties') 282 | bot_name= resource_properties.get('BotName') 283 | response_status = cfnresponse.SUCCESS 284 | response = {} 285 | response_id = event.get('RequestId') 286 | reason = request_type 287 | error = '' 288 | should_delete = resource_properties.get('ShouldDelete', True) 289 | 290 | 291 | if (request_type in ['Create', 'Update']): 292 | try: 293 | print("Starting bot import") 294 | response['status']=create_bot() 295 | if response['status'] =="SUCCESS": 296 | print("Job succeeded\n") 297 | response_status = cfnresponse.SUCCESS 298 | else: 299 | response_status = cfnresponse.FAILED 300 | print("Job Failed\n") 301 | except Exception as e: 302 | error = 'failed to {} bot: {}'.format(request_type, e) 303 | pass 304 | 305 | if (request_type == 'Delete' and should_delete != 'false'): 306 | try: 307 | response['status']=delete_bot(bot_name) 308 | if response['status'] =="SUCCESS": 309 | print("Job succeeded\n") 310 | response_status = cfnresponse.SUCCESS 311 | else: 312 | response_status = cfnresponse.FAILED 313 | print("Delete Failed\n") 314 | except Exception as e: 315 | error = 'failed to delete bot: {}'.format(e) 316 | pass 317 | 318 | if error: 319 | logger.error(error) 320 | response_status = cfnresponse.FAILED 321 | reason = error 322 | 323 | if bool(context): 324 | cfnresponse.send( 325 | event, 326 | context, 327 | response_status, 328 | response, 329 | response_id, 330 | reason 331 | ) 
332 | Description: '' 333 | MemorySize: 512 334 | Timeout: 30 335 | Policies: 336 | - AmazonLexFullAccess 337 | - AmazonS3FullAccess 338 | LexBot: 339 | Type: Custom::LexBot 340 | Properties: 341 | ServiceToken: 342 | Fn::GetAtt: 343 | - LambdaToCreateLexBot 344 | - Arn 345 | BotName: 346 | Ref: BotName 347 | ShouldDelete: 'true' 348 | CodeBuildDeploy: 349 | Type: AWS::CloudFormation::Stack 350 | Properties: 351 | TemplateURL: https://s3.amazonaws.com/aws-bigdata-blog/artifacts/aws-lex-web-ui/artifacts/templates/codebuild-deploy.yaml 352 | Parameters: 353 | CodeBuildName: lex-web-ui 354 | SourceBucket: aws-bigdata-blog 355 | SourceObject: artifacts/aws-lex-web-ui/artifacts/src.zip 356 | CustomResourceCodeObject: artifacts/aws-lex-web-ui/artifacts/custom-resources.zip 357 | CleanupBuckets: 'true' 358 | BotName: 359 | Ref: BotName 360 | BotAlias: $LATEST 361 | ParentOrigin: '' 362 | WebAppConfBotInitialText: Welcome to InvoiceBot. You can ask me to provide 363 | your invoice summary, or details of your invoices, or your invoice notes 364 | WebAppConfBotInitialSpeech: Welcome to InvoiceBot. 
You can ask me to provide 365 | your invoice summary, or details of your invoices, or your invoice notes 366 | WebAppConfNegativeFeedback: Thumbs down 367 | WebAppConfPositiveFeedback: Thumbs up 368 | WebAppConfHelp: Help 369 | WebAppConfToolbarTitle: InvoiceBot 370 | ShouldEnableCognitoLogin: 'false' 371 | ReInitSessionAttributesOnRestart: 'false' 372 | EnableMarkdownSupport: 'true' 373 | ShouldLoadIframeMinimized: 'false' 374 | ShowResponseCardTitle: 'true' 375 | ConnectContactFlowId: ' ' 376 | ConnectInstanceId: ' ' 377 | ConnectPromptForNameMessage: ' ' 378 | ConnectWaitForAgentMessage: ' ' 379 | ConnectWaitForAgentMessageIntervalInSeconds: 1 380 | ConnectAgentJoinedMessage: ' ' 381 | ConnectAgentLeftMessage: ' ' 382 | ConnectChatEndedMessage: ' ' 383 | ConnectLiveChatTerms: ' ' 384 | CognitoIdentityPoolId: 385 | Fn::If: 386 | - NeedsCognito 387 | - Fn::GetAtt: 388 | - CognitoIdentityPool 389 | - Outputs.CognitoIdentityPoolId 390 | - '' 391 | CognitoAppUserPoolClientId: 392 | Fn::If: 393 | - NeedsCognito 394 | - Fn::GetAtt: 395 | - CognitoIdentityPool 396 | - Outputs.CognitoUserPoolClientId 397 | - UserMustSupply 398 | CognitoUserPoolId: 399 | Fn::If: 400 | - NeedsCognito 401 | - Fn::GetAtt: 402 | - CognitoIdentityPool 403 | - Outputs.CognitoUserPoolId 404 | - UserMustSupply 405 | Timestamp: 600 406 | CognitoIdentityPool: 407 | Type: AWS::CloudFormation::Stack 408 | Condition: NeedsCognito 409 | Properties: 410 | TemplateURL: https://s3.amazonaws.com/aws-bigdata-blog/artifacts/aws-lex-web-ui/artifacts/templates/cognito.yaml 411 | Parameters: 412 | CognitoIdentityPoolName: Lex Web UI 413 | LexBotName: 414 | Ref: BotName 415 | CognitoIdentityPoolConfig: 416 | Type: AWS::CloudFormation::Stack 417 | Condition: NeedsCognito 418 | Properties: 419 | TemplateURL: https://s3.amazonaws.com/aws-bigdata-blog/artifacts/aws-lex-web-ui/artifacts/templates/cognitouserpoolconfig.yaml 420 | Parameters: 421 | Timestamp: 1655687000 422 | WebAppUrl: 423 | Fn::If: 424 | - 
NeedsParentOrigin 425 | - Fn::GetAtt: 426 | - CodeBuildDeploy 427 | - Outputs.WebAppBase 428 | - '' 429 | WebAppPath: /parent.html 430 | CodeBuildProjectName: 431 | Fn::GetAtt: 432 | - CodeBuildDeploy 433 | - Outputs.CodeBuildProject 434 | CognitoUserPool: 435 | Fn::GetAtt: 436 | - CognitoIdentityPool 437 | - Outputs.CognitoUserPoolId 438 | CognitoUserPoolClient: 439 | Fn::GetAtt: 440 | - CognitoIdentityPool 441 | - Outputs.CognitoUserPoolClientId 442 | Conditions: 443 | NeedsCognito: 444 | Fn::Equals: 445 | - '' 446 | - '' 447 | NeedsParentOrigin: 448 | Fn::Equals: 449 | - '' 450 | - '' 451 | Outputs: 452 | CognitoIdentityPoolId: 453 | Condition: NeedsCognito 454 | Description: Cognito Identity Pool Id 455 | Value: 456 | Fn::GetAtt: 457 | - CognitoIdentityPool 458 | - Outputs.CognitoIdentityPoolId 459 | AssetsUploadBucket: 460 | Value: 461 | Fn::Sub: https://console.aws.amazon.com/s3/home?region=${AWS::Region}&bucket=${SourceImageBucket} 462 | Description: Name of the S3 bucket where the pdfs or images are uploaded for text 463 | extraction 464 | ExtractedTextfilesBucket: 465 | Value: 466 | Fn::Sub: https://console.aws.amazon.com/s3/home?region=${AWS::Region}&bucket=${OutputTextBucket} 467 | Description: Name of the S3 bucket where the files with extracted text are uploaded 468 | LexLambaFunctionArn: 469 | Description: 'Use this Lambda function for the Lex Intent FullfillmentActivity 470 | 471 | ' 472 | Value: 473 | Fn::GetAtt: 474 | - LexLambaFunction 475 | - Arn 476 | LexUIWebAppUrl: 477 | Description: 'URL of the stand-alone sample web application. This page will be 478 | available after the pipeline/deployment completes. 479 | 480 | ' 481 | Value: 482 | Fn::GetAtt: 483 | - CodeBuildDeploy 484 | - Outputs.WebAppUrl 485 | CodeBuildUrl: 486 | Description: 'Monitor the pipeline URL to see when the application has been fully 487 | built and deployed. 
488 | 489 | ' 490 | Value: 491 | Fn::Sub: https://console.aws.amazon.com/codebuild/home?region=${AWS::Region}#/projects/${CodeBuildDeploy.Outputs.CodeBuildProject}/view 492 | -------------------------------------------------------------------------------- /image.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-textract-comprehend-lex-chatbot/5c8b069eae56314bc744ae85ed1ef5598296d811/image.png -------------------------------------------------------------------------------- /launchstack.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-textract-comprehend-lex-chatbot/5c8b069eae56314bc744ae85ed1ef5598296d811/launchstack.png -------------------------------------------------------------------------------- /src/lambda/InvoiceBot.zip: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/aws-samples/aws-textract-comprehend-lex-chatbot/5c8b069eae56314bc744ae85ed1ef5598296d811/src/lambda/InvoiceBot.zip -------------------------------------------------------------------------------- /src/lambda/create_texttract_detect_text_async_job.py: -------------------------------------------------------------------------------- 1 | import urllib.parse 2 | import boto3 3 | import os 4 | 5 | textract = boto3.client('textract') 6 | 7 | sns_topic_arn = os.environ["SNS_TOPIC_ARN"] 8 | sns_role_arn = os.environ["SNS_ROLE_ARN"] 9 | 10 | 11 | def handler(event, context): 12 | source_bucket = event['Records'][0]['s3']['bucket']['name'] 13 | object_key = urllib.parse.unquote_plus( 14 | event['Records'][0]['s3']['object']['key']) 15 | 16 | textract_result = textract.start_document_text_detection( 17 | DocumentLocation={ 18 | "S3Object": { 19 | "Bucket": source_bucket, 20 | "Name": object_key 21 | } 22 | }, 23 | NotificationChannel={ 24 | "SNSTopicArn": sns_topic_arn, 25 | 
"RoleArn": sns_role_arn 26 | } 27 | ) 28 | print(textract_result) -------------------------------------------------------------------------------- /src/lambda/extract_text_from_textract_async_job_output.py: -------------------------------------------------------------------------------- 1 | import json 2 | import time 3 | import boto3 4 | import os 5 | textract = boto3.client('textract') 6 | s3 = boto3.client('s3') 7 | 8 | def handler(event, context): 9 | message = json.loads(event['Records'][0]['Sns']['Message']) 10 | jobId = message['JobId'] 11 | print("JobId="+jobId) 12 | output_bucket = os.environ["OUT_PUT_S3_BUCKET"] 13 | 14 | status = message['Status'] 15 | print("Status="+status) 16 | 17 | if status != "SUCCEEDED": 18 | return { 19 | # TODO : handle error with Dead letter queue (not in this workshop) 20 | # https://docs.aws.amazon.com/lambda/latest/dg/dlq.html 21 | "status": status 22 | } 23 | text_extractor = TextExtractor() 24 | 25 | pages = text_extractor.extract_text(jobId) 26 | file=jobId 27 | content=pages[1]['Content'] 28 | f = open("/tmp/file.txt", "w") 29 | f.write(content) 30 | f.close() 31 | s3_response = s3.upload_file("/tmp/file.txt",output_bucket,file+".txt") 32 | print(list(pages.values())) 33 | 34 | class TextExtractor(): 35 | def extract_text(self, jobId): 36 | """ Extract text from document corresponding to jobId and 37 | generate a list of pages containing the text 38 | """ 39 | 40 | textract_result = self.__get_textract_result(jobId) 41 | pages = {} 42 | self.__extract_all_pages(jobId, textract_result, pages, []) 43 | return pages 44 | 45 | def __get_textract_result(self, jobId): 46 | """ retrieve textract result with job Id """ 47 | 48 | result = textract.get_document_text_detection( 49 | JobId=jobId 50 | ) 51 | return result 52 | 53 | def __extract_all_pages(self, jobId, textract_result, pages, page_numbers): 54 | """ extract page content: build the pages array, 55 | recurse if response is too big (when NextToken is provided by 
textract) 56 | """ 57 | 58 | blocks = [x for x in textract_result['Blocks'] 59 | if x['BlockType'] == "LINE"] 60 | for block in blocks: 61 | if block['Page'] not in page_numbers: 62 | page_numbers.append(block['Page']) 63 | pages[block['Page']] = { 64 | "Number": block['Page'], 65 | "Content": block['Text'] 66 | } 67 | else: 68 | pages[block['Page']]['Content'] += " " + block['Text'] 69 | 70 | nextToken = textract_result.get("NextToken", "") 71 | if nextToken != '': 72 | textract_result = textract.get_document_text_detection( 73 | JobId=jobId, 74 | NextToken=nextToken 75 | ) 76 | self.__extract_all_pages(jobId, 77 | textract_result, 78 | pages, 79 | page_numbers) -------------------------------------------------------------------------------- /src/lambda/lex-manager.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | ########################################################################## 4 | # Copyright 2017-2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. 5 | # 6 | # Licensed under the Amazon Software License (the "License"). You may not use this file 7 | # except in compliance with the License. A copy of the License is located at 8 | # 9 | # http://aws.amazon.com/asl/ 10 | # 11 | # or in the "license" file accompanying this file. This file is distributed on an "AS IS" 12 | # BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. See the 13 | # License for the specific language governing permissions and limitations under the License. 14 | ########################################################################## 15 | """Lex Model Building Service helper script 16 | 17 | Used to import/export/delete Lex bots and associated resources 18 | (i.e. intents, slot types). 19 | 20 | Can be run as a shell script or used as a Lambda Function for CloudFormation 21 | Custom Resources... 
22 | """ 23 | 24 | import logging 25 | import json 26 | import boto3 27 | import time 28 | 29 | lexclient = boto3.client('lex-models') 30 | DEFAULT_LOGGING_LEVEL = logging.INFO 31 | logging.basicConfig( 32 | format='[%(levelname)s] %(message)s', 33 | level=DEFAULT_LOGGING_LEVEL 34 | ) 35 | logger = logging.getLogger(__name__) 36 | logger.setLevel(DEFAULT_LOGGING_LEVEL) 37 | 38 | BOT_DEFINITION_FILENAME = 'lambda/InvoiceBot.zip' 39 | BOT_EXPORT_FILENAME = 'bot-definition-export.json' 40 | 41 | def create_bot(): 42 | with open(BOT_DEFINITION_FILENAME, 'rb') as file_data: 43 | bytes_content = file_data.read() 44 | response = lexclient.start_import( 45 | payload=bytes_content, 46 | resourceType='BOT', 47 | mergeStrategy='OVERWRITE_LATEST') 48 | print("Import id is " + response['importId']) 49 | 50 | import_status = lexclient.get_import(importId=response['importId']) 51 | 52 | while import_status['importStatus'] == 'IN_PROGRESS': 53 | time.sleep(2) # pause between polls instead of calling get_import in a tight loop 54 | import_status = lexclient.get_import(importId=response['importId']) 55 | print("Bot creation is in progress") 56 | if import_status['importStatus'] == 'COMPLETE': 57 | return "SUCCESS" 58 | else: 59 | return "FAILURE" 60 | 61 | def delete_bot(bot_name=None): 62 | bot_aliases = lexclient.get_bot_aliases(botName=bot_name)['BotAliases'] 63 | for alias in bot_aliases: 64 | print("Deleting alias " + alias['name']) # each alias is a dict; its name is under 'name' 65 | response = lexclient.delete_bot_alias(name=alias['name'], botName=bot_name) 66 | time.sleep(5) 67 | response = lexclient.delete_bot(name=bot_name) 68 | return "SUCCESS" 69 | 70 | def handler(event, context): 71 | """ CloudFormation Custom Resource Lambda Handler 72 | """ 73 | import cfnresponse 74 | 75 | logger.info('event: {}'.format(cfnresponse.json_dump_format(event))) 76 | request_type = event.get('RequestType') 77 | resource_properties = event.get('ResourceProperties') 78 | bot_name = resource_properties.get('BotName') 79 | response_status = cfnresponse.SUCCESS 80 | response = {} 81 | response_id = event.get('RequestId') 82 | reason = 
request_type 83 | error = '' 84 | should_delete = resource_properties.get('ShouldDelete', True) 85 | 86 | 87 | if (request_type in ['Create', 'Update']): 88 | try: 89 | logger.info('importing bot definition') 90 | response['status'] = create_bot() 91 | if response['status'] == "SUCCESS": 92 | print("Job succeeded\n") 93 | response_status = cfnresponse.SUCCESS 94 | else: 95 | response_status = cfnresponse.FAILED 96 | print("Job failed\n") 97 | except Exception as e: 98 | error = 'failed to {} bot: {}'.format(request_type, e) 99 | pass 100 | 101 | if (request_type == 'Delete' and should_delete != 'false'): 102 | try: 103 | response['status'] = delete_bot(bot_name) 104 | if response['status'] == "SUCCESS": 105 | print("Job succeeded\n") 106 | response_status = cfnresponse.SUCCESS 107 | else: 108 | response_status = cfnresponse.FAILED 109 | print("Delete failed\n") 110 | except Exception as e: 111 | error = 'failed to delete bot: {}'.format(e) 112 | pass 113 | 114 | if error: 115 | logger.error(error) 116 | response_status = cfnresponse.FAILED 117 | reason = error 118 | 119 | if bool(context): 120 | cfnresponse.send( 121 | event, 122 | context, 123 | response_status, 124 | response, 125 | response_id, 126 | reason 127 | ) 128 | -------------------------------------------------------------------------------- /src/lambda/meaningful-conversations-lex-lambda.py: -------------------------------------------------------------------------------- 1 | 2 | ########################################################################## 3 | # Copyright 2017-2017 Amazon.com, Inc. or its affiliates. All Rights Reserved. 4 | # 5 | # Licensed under the Amazon Software License (the "License"). You may not use this file 6 | # except in compliance with the License. A copy of the License is located at 7 | # 8 | # http://aws.amazon.com/asl/ 9 | # 10 | # or in the "license" file accompanying this file. This file is distributed on an "AS IS" 11 | # BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, express or implied. 
See the 12 | # License for the specific language governing permissions and limitations under the License. 13 | ########################################################################## 14 | import json 15 | import datetime 16 | import time 17 | import os 18 | import dateutil.parser 19 | import logging 20 | import boto3 21 | import tarfile 22 | import csv 23 | import re 24 | from io import StringIO 25 | from io import BytesIO 26 | 27 | 28 | logger = logging.getLogger() 29 | logger.setLevel(logging.INFO) 30 | 31 | s3client = boto3.client('s3') 32 | s3 = boto3.resource('s3') 33 | 34 | comprehend = boto3.client('comprehend') 35 | 36 | bucket=os.environ['S3_BUCKET'] 37 | input_bucket = s3.Bucket(bucket) 38 | 39 | 40 | # --- Helpers that build all of the responses --- 41 | 42 | 43 | def elicit_slot(session_attributes, intent_name, slots, slot_to_elicit, message): 44 | return { 45 | 'sessionAttributes': session_attributes, 46 | 'dialogAction': { 47 | 'type': 'ElicitSlot', 48 | 'intentName': intent_name, 49 | 'slots': slots, 50 | 'slotToElicit': slot_to_elicit, 51 | 'message': message 52 | } 53 | } 54 | 55 | 56 | def confirm_intent(session_attributes, intent_name, slots, message): 57 | return { 58 | 'sessionAttributes': session_attributes, 59 | 'dialogAction': { 60 | 'type': 'ConfirmIntent', 61 | 'intentName': intent_name, 62 | 'slots': slots, 63 | 'message': message 64 | } 65 | } 66 | 67 | 68 | def close(session_attributes, fulfillment_state, message): 69 | response = { 70 | 'sessionAttributes': session_attributes, 71 | 'dialogAction': { 72 | 'type':'Close', 73 | 'fulfillmentState':fulfillment_state, 74 | 'message':message 75 | } 76 | } 77 | return response 78 | 79 | 80 | def delegate(session_attributes, slots): 81 | return { 82 | 'sessionAttributes': session_attributes, 83 | 'dialogAction': { 84 | 'type': 'Delegate', 85 | 'slots': slots 86 | } 87 | } 88 | 89 | 90 | # --- Helper Functions --- 91 | 92 | 93 | def safe_int(n): 94 | """ 95 | Safely convert n value to int. 
96 | """ 97 | if n is not None: 98 | return int(n) 99 | return n 100 | 101 | 102 | def try_ex(func): 103 | """ 104 | Call passed in function in try block. If KeyError is encountered return None. 105 | This function is intended to be used to safely access dictionary. 106 | 107 | Note that this function would have negative impact on performance. 108 | """ 109 | 110 | try: 111 | return func() 112 | except KeyError: 113 | return None 114 | 115 | 116 | 117 | 118 | 119 | def build_validation_result(isvalid, violated_slot, message_content): 120 | return { 121 | 'isValid': isvalid, 122 | 'violatedSlot': violated_slot, 123 | 'message': {'contentType': 'PlainText', 'content': message_content} 124 | } 125 | 126 | 127 | 128 | """ --- Functions that control the bot's behavior --- """ 129 | def get_summary(intent_request): 130 | # Declare variables and get handle to the S3 bucket containing the Textract output 131 | session_attributes = intent_request['sessionAttributes'] if intent_request['sessionAttributes'] is not None else {} 132 | 133 | i = 0 134 | qty = 0 135 | 136 | for file in input_bucket.objects.all(): 137 | i += 1 138 | selected_phrases = "" 139 | input_bucket_text_file = s3.Object(bucket, file.key) 140 | text_file_contents = str(input_bucket_text_file.get()['Body'].read().decode('utf-8')) 141 | 142 | #Comprehend Entity Detection 143 | detected_entities = comprehend.detect_entities( 144 | Text=text_file_contents, 145 | LanguageCode="en" 146 | ) 147 | print(detected_entities) 148 | 149 | selected_entity_types = ["ORGANIZATION", "OTHER", "DATE", "QUANTITY", "LOCATION"] 150 | # Let's get the billing summary across invoices 151 | for x in detected_entities['Entities']: 152 | if x['Type'] == "OTHER" and x['EndOffset'] < 40: 153 | nr = x['Text'] 154 | if x['Type'] == "QUANTITY" and x['EndOffset'] > 350: 155 | qty = round((qty + float(x['Text'])), 2) 156 | return close( 157 | session_attributes, 158 | 'Fulfilled', 159 | { 160 | 'contentType': 'PlainText', 161 | 'content': 'I 
reviewed your input documents and found {} invoices with invoice numbers {} totaling ${}. I can get you invoice details or invoice notes. Simply type your request'.format(i, nr, str(qty)) 162 | } 163 | ) 164 | 165 | 166 | def get_details(intent_request): 167 | bill = "" 168 | billsum = [] 169 | result = "" 170 | y = True 171 | session_attributes = intent_request['sessionAttributes'] if intent_request['sessionAttributes'] is not None else {} 172 | inr = intent_request['currentIntent']['slots']['invoicenr'] 173 | 174 | r = 0 175 | i = 0 176 | for file in input_bucket.objects.all(): 177 | i += 1 178 | selected_phrases = "" 179 | input_bucket_text_file = s3.Object(bucket, file.key) 180 | text_file_contents = str(input_bucket_text_file.get()['Body'].read().decode('utf-8')) 181 | #Comprehend Entity Detection 182 | detected_entities = comprehend.detect_entities( 183 | Text=text_file_contents, 184 | LanguageCode="en" 185 | ) 186 | 187 | 188 | print(detected_entities) 189 | selected_entity_types = ["DATE", "QUANTITY"] 190 | for x in detected_entities['Entities']: 191 | if x['Type'] in "OTHER": 192 | detnr = x['Text'] 193 | if detnr == inr: 194 | htmlstring = "Invoice Details for " + detnr + ": " 195 | for x in detected_entities['Entities']: 196 | if x['Type'] in selected_entity_types and x['EndOffset'] > 40 and x['EndOffset'] <= 337: 197 | r += 1 198 | if r == 1: 199 | htmlstring += "On " + x['Text'] + " " 200 | elif r == 2: 201 | htmlstring += "for the item " + x['Text'] + " " 202 | else: 203 | htmlstring += " there is a charge of " + str(x['Text'].split()[0]) + ". " 204 | r = 0 205 | print("HTMLString is: " + htmlstring) 206 | 207 | result = htmlstring + " You can request me for invoice notes or simply close this chat." 208 | else: 209 | result = 'Sorry I could not find a match for that Invoice Number. Please request for invoice details with a valid Invoice Number.' 
210 | return close( 211 | session_attributes, 212 | 'Fulfilled', 213 | { 214 | 'contentType': 'PlainText', 215 | 'content': result 216 | } 217 | ) 218 | 219 | 220 | 221 | 222 | def get_notes(intent_request): 223 | 224 | session_attributes = intent_request['sessionAttributes'] if intent_request['sessionAttributes'] is not None else {} 225 | inr = intent_request['currentIntent']['slots']['invoicenr'] 226 | 227 | i = 0 228 | notes = "" 229 | phrases = [] 230 | 231 | for file in input_bucket.objects.all(): 232 | i += 1 233 | selected_phrases = "" 234 | input_bucket_text_file = s3.Object(bucket, file.key) 235 | text_file_contents = str(input_bucket_text_file.get()['Body'].read().decode('utf-8')) 236 | 237 | detected_entities = comprehend.detect_entities( 238 | Text=text_file_contents, 239 | LanguageCode="en" 240 | ) 241 | #print(detected_entities) 242 | #selected_entity_types = ["ORGANIZATION", "OTHER", "DATE", "QUANTITY", "LOCATION"] 243 | for x in detected_entities['Entities']: 244 | if x['Type'] in "OTHER": 245 | detnr = x['Text'] 246 | if detnr == inr: 247 | #Comprehend Key Phrases Detection 248 | detected_key_phrases = comprehend.detect_key_phrases( 249 | Text=text_file_contents, 250 | LanguageCode="en" 251 | ) 252 | print(detected_key_phrases) 253 | for y in detected_key_phrases['KeyPhrases']: 254 | if y['EndOffset'] > 185 and y['EndOffset'] <= 337: 255 | selected_phrases = " " + y['Text'] + selected_phrases + " " 256 | 257 | #phrases.append(selected_phrases) 258 | print("Selected Phrases are: " + selected_phrases) 259 | #notes = notes + ". Notes for Invoice " + str(i) + " are: " + str(phrases[i - 1]) 260 | result = "Invoice Notes for " + detnr + ": " + selected_phrases 261 | else: 262 | result = 'Sorry I could not find a match for that Invoice Number. Please request for invoice notes with a valid Invoice Number' 263 | return close( 264 | session_attributes, 265 | 'Fulfilled', 266 | { 267 | 'contentType': 'PlainText', 268 | 'content': result + '. 
Feel free to try the options again or you can simply close this chat' 269 | } 270 | ) 271 | 272 | def dispatch(intent_request): 273 | """ 274 | Called when the user specifies an intent for this bot. 275 | """ 276 | print("Intent Request is: " + str(intent_request)) 277 | logger.debug('dispatch userId={}, intentName={}'.format(intent_request['userId'], intent_request['currentIntent']['name'])) 278 | 279 | intent_name = intent_request['currentIntent']['name'] 280 | 281 | # Dispatch to your bot's intent handlers 282 | if intent_name == 'GetInvoiceSummary': 283 | return get_summary(intent_request) 284 | elif intent_name == 'GetInvoiceDetails': 285 | return get_details(intent_request) 286 | elif intent_name == 'GetInvoiceNotes': 287 | return get_notes(intent_request) 288 | 289 | raise Exception('Intent with name ' + intent_name + ' not supported') 290 | 291 | 292 | # --- Main handler --- 293 | 294 | 295 | def lambda_handler(event, context): 296 | """ 297 | Route the incoming request based on intent. 298 | The JSON body of the request is provided in the event slot. 299 | """ 300 | logger.debug('event.bot.name={}'.format(event['bot']['name'])) 301 | 302 | return dispatch(event) 303 | --------------------------------------------------------------------------------
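The intent handlers in `meaningful-conversations-lex-lambda.py` read from S3, call Comprehend, and filter the returned entities inside a single loop, which makes the filtering hard to exercise without AWS access. The following is a minimal sketch, not part of the repository, of how that filtering could be pulled into pure functions over the `Entities` list returned by `comprehend.detect_entities`. The function names `find_invoice_number` and `sum_quantities` are hypothetical; the offset thresholds 40 and 350 are copied from `get_summary`; the sample entities are invented for illustration. Unlike the original `float(x['Text'])`, the sketch assumes a quantity may carry a currency symbol, thousands separators, or a unit, and extracts the first number it finds.

```python
import re

# Hypothetical helpers (not in the repository). They operate on plain dicts
# shaped like the items of detect_entities()['Entities'], so no AWS call is made.

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def find_invoice_number(entities, max_end_offset=40):
    """Return the text of the last OTHER entity that ends before max_end_offset.

    Mirrors get_summary, where each matching entity overwrites the previous one.
    Returns None when nothing matches, instead of leaving a variable unset.
    """
    invoice_number = None
    for entity in entities:
        if entity["Type"] == "OTHER" and entity["EndOffset"] < max_end_offset:
            invoice_number = entity["Text"]
    return invoice_number


def sum_quantities(entities, min_end_offset=350):
    """Sum QUANTITY entities that end after min_end_offset, rounded to 2 places.

    Assumption: the first number in the entity text is the amount; entities
    without any number are skipped rather than raising ValueError.
    """
    total = 0.0
    for entity in entities:
        if entity["Type"] != "QUANTITY" or entity["EndOffset"] <= min_end_offset:
            continue
        match = _NUMBER.search(entity["Text"].replace(",", ""))
        if match:
            total += float(match.group())
    return round(total, 2)


# Invented sample data in the shape Comprehend returns.
sample_entities = [
    {"Type": "OTHER", "Text": "INV-001", "EndOffset": 20},
    {"Type": "QUANTITY", "Text": "3 units", "EndOffset": 100},
    {"Type": "QUANTITY", "Text": "$1,250.50", "EndOffset": 360},
    {"Type": "QUANTITY", "Text": "49.5", "EndOffset": 400},
    {"Type": "OTHER", "Text": "misc", "EndOffset": 500},
]

invoice_number = find_invoice_number(sample_entities)
total = sum_quantities(sample_entities)
```

With helpers like these, `get_summary` could call `find_invoice_number(detected_entities['Entities'])` per file and handle a `None` result explicitly, rather than failing when a document has no early `OTHER` entity.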