├── .gitignore
├── LICENSE
├── README.md
├── conda.yaml
├── devdata
│   └── env.json
├── images
│   ├── control-room-process-steps.png
│   ├── control-room-vault.png
│   ├── example-excel.png
│   ├── example-pdf-invoice.png
│   ├── example-run-artifacts-excel.png
│   └── example-run-artifacts.png
├── resources
│   ├── InvoiceGenerator.py
│   ├── InvoiceModeler.py
│   ├── common.resource
│   └── invoices.resource
├── robot.yaml
└── tasks.robot

/.gitignore:
--------------------------------------------------------------------------------
testrun/
output/
venv/
temp/
.rpa/
.idea/
.ipynb_checkpoints/
.virtual_documents/
*/.ipynb_checkpoints/*
.vscode
.DS_Store
*.pyc
*.zip
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability. In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

   APPENDIX: How to apply the Apache License to your work.

      To apply the Apache License to your work, attach the following
      boilerplate notice, with the fields enclosed by brackets "[]"
      replaced with your own identifying information. (Don't include
      the brackets!) The text should be enclosed in the appropriate
      comment syntax for the file format. We also recommend that a
      file or class name and description of purpose be included on the
      same "printed page" as the copyright notice for easier
      identification within third-party archives.

   Copyright [yyyy] [name of copyright owner]

   Licensed under the Apache License, Version 2.0 (the "License");
   you may not use this file except in compliance with the License.
   You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Process PDF invoices with Amazon Textract

[PDF](https://en.wikipedia.org/wiki/PDF) - The most machine-readable document format ever! Right? 🙈

> Extracting text from PDF files is not a simple operation. PDF was never meant to be a format to _read data from_: its purpose is to provide an accurate way of reproducing documents and make them portable to any system. - [How to read PDF files with RPA Framework](https://robocorp.com/docs/development-guide/pdf/how-to-read-pdf-files)!

Still, it **is** possible to automatically read and extract invoice data from PDF documents and save the data to an [Excel](https://en.wikipedia.org/wiki/Microsoft_Excel) file. No more manual copy & pasting!

This robot processes randomly generated PDF invoices with [Amazon Textract](https://aws.amazon.com/textract/) and saves the extracted invoice data in an Excel file.

## Example PDF invoice

## Example Excel

## Tasks

The robot provides three tasks:

- `Create Invoices`
- `Process PDF invoices with Amazon Textract`
- `Delete Files From Amazon S3 Bucket`

### Create Invoices

- Generates random PDF invoices and uploads them to an [Amazon S3](https://aws.amazon.com/s3/) bucket.
- Saves the generated PDF invoices to the output directory for debugging purposes.

### Process PDF invoices with Amazon Textract

- Lists the invoices in the Amazon S3 bucket.
- Starts an Amazon Textract document analysis job for each invoice and waits for the results.
- Saves the extracted invoice data to `invoices.xlsx` in the output directory.
- Finally, deletes the PDF invoices from the Amazon S3 bucket.

### Delete Files From Amazon S3 Bucket

- A utility task for deleting the PDF invoices from the Amazon S3 bucket.
- Can be run on its own whenever you want to empty the Amazon S3 bucket.
- Also runs as the teardown of the `Process PDF invoices with Amazon Textract` task.

## Prerequisites

### AWS credentials with access to Amazon S3 and Amazon Textract

The robot requires access to the Amazon S3 and Amazon Textract services. It needs an AWS access key ID (`AWS_KEY_ID`), the matching secret access key (`AWS_KEY`), and the AWS region (`AWS_REGION`). See the [Amazon Textract Developer Guide](https://docs.aws.amazon.com/textract/latest/dg/what-is.html) for details.

### Store the credentials and the AWS region in Robocorp Vault

Set up [Robocorp Vault](https://robocorp.com/docs/development-guide/variables-and-secrets/vault) either locally or in [Control Room](https://robocorp.com/docs/control-room).

For a local run, use the following configuration (replace `/Users/username` with your own home directory):

`/Users/username/vault.json`:

```json
{
  "aws": {
    "AWS_KEY": "aws-key",
    "AWS_KEY_ID": "aws-key-id",
    "AWS_REGION": "us-east-1"
  }
}
```

`devdata/env.json`:

```json
{
  "RPA_SECRET_MANAGER": "RPA.Robocorp.Vault.FileSecrets",
  "RPA_SECRET_FILE": "/Users/username/vault.json"
}
```

For a Control Room run, create a new vault entry in Control Room:

- Enter `aws` as the name.
- Provide values for the `AWS_KEY`, `AWS_KEY_ID`, and `AWS_REGION` keys.

## Running

1. Run the `Create Invoices` task to create the PDF invoices.

2. Run the `Process PDF invoices with Amazon Textract` task to process the PDF invoices and generate the Excel file with the data extracted from them.

Optional: Run the `Delete Files From Amazon S3 Bucket` task if you want to delete the PDF invoices from the Amazon S3 bucket (the `Process PDF invoices with Amazon Textract` task does this automatically in its teardown phase).
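
Step 2 relies on Amazon Textract's asynchronous document analysis: the robot starts one analysis job per PDF in the bucket, and the `Wait For PDF Processing Results` keyword in `resources/invoices.resource` then polls the pending jobs for at most 90 rounds, with a 3-second pause between rounds. The following plain-Python sketch illustrates that polling pattern; it is not the robot's own code. `wait_for_jobs` and `get_job_status` are hypothetical names: `get_job_status` stands in for whatever returns a job's `JobStatus` string (in the robot, that value comes from the `Get Document Analysis` keyword of `RPA.Cloud.AWS`). The sketch treats only `SUCCEEDED` and `FAILED` as final, so a failed job stops being polled instead of waiting out the round limit.

```python
import time
from typing import Callable, Iterable, List, Tuple

# Job statuses treated as final in this sketch (an assumption, see above).
FINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_jobs(
    job_ids: Iterable[str],
    get_job_status: Callable[[str], str],  # hypothetical: job ID -> JobStatus
    max_rounds: int = 90,
    delay_seconds: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List[str], List[str], List[str]]:
    """Poll jobs until none are pending or max_rounds rounds have run.

    Returns the (succeeded, failed, still_pending) job IDs.
    """
    pending = list(job_ids)
    succeeded: List[str] = []
    failed: List[str] = []
    for _ in range(max_rounds):
        # Iterate over a copy so finished jobs can be removed from `pending`.
        for job_id in list(pending):
            status = get_job_status(job_id)
            if status in FINAL_STATUSES:
                pending.remove(job_id)
                if status == "SUCCEEDED":
                    succeeded.append(job_id)
                else:
                    failed.append(job_id)
        if not pending:
            break
        sleep(delay_seconds)  # wait before the next polling round
    return succeeded, failed, pending


# Usage with a fake status source: job "a" succeeds on its second poll,
# job "b" fails on its first poll. The no-op sleep keeps the demo instant.
polls = {"a": ["IN_PROGRESS", "SUCCEEDED"], "b": ["FAILED"]}
done, failed, pending = wait_for_jobs(
    ["a", "b"], lambda job_id: polls[job_id].pop(0), sleep=lambda seconds: None
)
print(done, failed, pending)  # ['a'] ['b'] []
```

Iterating over a copy of the pending list plays the same role as the keyword's `Copy List` step: finished jobs can be removed while a round is still in progress. Jobs that are still pending after the last round are returned rather than silently dropped, so the caller can decide whether to log or retry them.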

When running in Control Room, add the `Create Invoices` and `Process PDF invoices with Amazon Textract` tasks as process steps, in that order.

## Further reading

- [Handling PDF files](https://robocorp.com/docs/development-guide/pdf)
- [How to read PDF files with RPA Framework](https://robocorp.com/docs/development-guide/pdf/how-to-read-pdf-files)
- [What is Amazon Textract?](https://docs.aws.amazon.com/textract/latest/dg/what-is.html)
- [Cloud machine learning (ML) APIs example robot](https://robocorp.com/docs/development-guide/ai-machine-learning/cloud-machine-learning-apis)
- [RPA.Cloud.AWS library](https://robocorp.com/docs/libraries/rpa-framework/rpa-cloud-aws)
--------------------------------------------------------------------------------
/conda.yaml:
--------------------------------------------------------------------------------
channels:
  - conda-forge

dependencies:
  - python=3.7.5
  - faker=13.11.1
  - pip=20.1
  - pip:
      - rpaframework==14.1.1 # https://rpaframework.org/releasenotes.html
      - rpaframework-aws==2.0.1
--------------------------------------------------------------------------------
/devdata/env.json:
--------------------------------------------------------------------------------
{
  "RPA_SECRET_MANAGER": "RPA.Robocorp.Vault.FileSecrets",
  "RPA_SECRET_FILE": "/Users/username/vault.json"
}
--------------------------------------------------------------------------------
/images/control-room-process-steps.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robocorp/example-process-invoices-with-amazon-textract/ee34feab52e0abc99e5e20917a4defeb6736009f/images/control-room-process-steps.png
--------------------------------------------------------------------------------
/images/control-room-vault.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robocorp/example-process-invoices-with-amazon-textract/ee34feab52e0abc99e5e20917a4defeb6736009f/images/control-room-vault.png
--------------------------------------------------------------------------------
/images/example-excel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robocorp/example-process-invoices-with-amazon-textract/ee34feab52e0abc99e5e20917a4defeb6736009f/images/example-excel.png
--------------------------------------------------------------------------------
/images/example-pdf-invoice.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robocorp/example-process-invoices-with-amazon-textract/ee34feab52e0abc99e5e20917a4defeb6736009f/images/example-pdf-invoice.png
--------------------------------------------------------------------------------
/images/example-run-artifacts-excel.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robocorp/example-process-invoices-with-amazon-textract/ee34feab52e0abc99e5e20917a4defeb6736009f/images/example-run-artifacts-excel.png
--------------------------------------------------------------------------------
/images/example-run-artifacts.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/robocorp/example-process-invoices-with-amazon-textract/ee34feab52e0abc99e5e20917a4defeb6736009f/images/example-run-artifacts.png
--------------------------------------------------------------------------------
/resources/InvoiceGenerator.py:
--------------------------------------------------------------------------------
from datetime import datetime, timedelta
from faker import Faker

import json
import pytz
import requests
import random


class InvoiceGeneratorClass:
    """ API Object for Invoice-Generator tool - https://invoice-generator.com/ """

    URL = "https://invoice-generator.com"
    DATE_FORMAT = "%d %b %Y"
    LOCALE = "en_US"
    TIMEZONE = "Europe/Paris"
    # Below are the default template parameters that can be changed (see https://github.com/Invoiced/invoice-generator-api/)
    TEMPLATE_PARAMETERS = [
        "header",
        "to_title",
        "ship_to_title",
        "invoice_number_title",
        "date_title",
        "payment_terms_title",
        "due_date_title",
        "purchase_order_title",
        "quantity_header",
        "item_header",
        "unit_cost_header",
        "amount_header",
        "subtotal_title",
        "discounts_title",
        "tax_title",
        "shipping_title",
        "total_title",
        "amount_paid_title",
        "balance_title",
        "terms_title",
        "notes_title",
    ]

    def __init__(
        self,
        sender,
        to,
        logo=None,
        ship_to=None,
        number=None,
        payments_terms=None,
        due_date=None,
        notes=None,
        terms=None,
        currency="USD",
        date=None,
        discounts=0,
        tax=0,
        shipping=0,
        amount_paid=0,
    ):
        """ Object constructor """
        self.logo = logo
        self.sender = sender
        self.to = to
        self.ship_to = ship_to
        self.number = number
        self.currency = currency
        self.custom_fields = []
        # Resolve "now" on every call: a datetime default argument would be
        # evaluated only once, when the class is defined.
        self.date = (
            date if date is not None else datetime.now(tz=pytz.timezone(self.TIMEZONE))
        )
        self.payment_terms = payments_terms
        self.due_date = due_date
        self.items = []
        self.fields = {"tax": "%", "discounts": False, "shipping": False}
        self.discounts = discounts
        self.tax = tax
        self.shipping = shipping
        self.amount_paid = amount_paid
        self.notes = notes
        self.terms = terms
        self.template = {}
        self.base_order_number = random.randint(30000, 90000)

    def _to_json(self):
        """
        Serialize the invoice as the JSON string expected by the API.
        The API expects the sender under the key "from", which is a reserved
        keyword in Python, so the "sender" attribute is renamed here.
        Dates are formatted with DATE_FORMAT, the item and custom field
        objects are converted to plain dicts so they are JSON serializable,
        and the template customizations are merged into the top level.
        A copy of the attributes is used, so the object itself is not
        modified and the method can be called more than once.
        """
        object_dict = dict(self.__dict__)
        object_dict.pop("base_order_number")
        object_dict["from"] = object_dict.pop("sender")
        object_dict["date"] = self.date.strftime(InvoiceGeneratorClass.DATE_FORMAT)
        if self.due_date is not None:
            object_dict["due_date"] = self.due_date.strftime(
                InvoiceGeneratorClass.DATE_FORMAT
            )
        object_dict["items"] = [item.__dict__ for item in self.items]
        # Custom fields may be CustomField objects (add_custom_field) or plain dicts.
        object_dict["custom_fields"] = [
            field.__dict__ if isinstance(field, CustomField) else field
            for field in self.custom_fields
        ]
        for template_parameter, value in self.template.items():
            object_dict[template_parameter] = value
        object_dict.pop("template")
        return json.dumps(object_dict)

    def add_custom_field(self, name=None, value=None):
        """ Add a custom field to the invoice """
        self.custom_fields.append(CustomField(name=name, value=value))

    def add_item(self, name=None, quantity=0, unit_cost=0.0, description=None):
        """ Add item to the invoice """
        self.items.append(
            Item(
                name=name,
                quantity=quantity,
                unit_cost=unit_cost,
                description=description,
            )
        )

    def download(self, file_path):
        """ Send the request and store the generated PDF at file_path """
        response = requests.post(
            InvoiceGeneratorClass.URL,
            json=json.loads(self._to_json()),
            stream=True,
            headers={"Accept-Language": InvoiceGeneratorClass.LOCALE},
        )
        # Fail loudly instead of silently skipping the file, so that the
        # S3 upload step never runs against a missing PDF.
        response.raise_for_status()
        with open(file_path, "wb") as pdf_file:
            pdf_file.write(response.content)

    def set_template_text(self, template_parameter, value):
        """ If you want to change a default value for customising your invoice template, call this method """
        if template_parameter in InvoiceGeneratorClass.TEMPLATE_PARAMETERS:
            self.template[template_parameter] = value
        else:
            raise ValueError(
                "The parameter {} is not a valid template parameter. See docs.".format(
                    template_parameter
                )
            )

    def toggle_subtotal(self, tax="%", discounts=False, shipping=False):
        """ Toggle lines of subtotal """
        self.fields = {"tax": tax, "discounts": discounts, "shipping": shipping}


class Item:
    """ Item object for an invoice """

    def __init__(self, name, quantity, unit_cost, description=""):
        """ Object constructor """
        self.name = name
        self.quantity = quantity
        self.unit_cost = unit_cost
        self.description = description


class CustomField:
    """ Custom Field object for an invoice """

    def __init__(self, name, value):
        """ Object constructor """
        self.name = name
        self.value = value


class InvoiceGenerator:
    def create_random_invoice(self, filename: str, order: int):
        faker = Faker()
        invoice_number = random.randint(1000, 9999)
        dynamic_date = random.randint(-7, 7)
        invoice_date = datetime.now() + timedelta(dynamic_date)
        due_date = datetime.now() + timedelta(dynamic_date + 7)
        invoice = InvoiceGeneratorClass(
            sender="Demo Company\nSuite 5A-1204\n123 Somewhere Street\nYour City AZ 12345\nadmin@invoices.com",
            to=f"To:\n{faker.company()} {faker.company_suffix()}\n{faker.address()}",
            logo="https://invoiced.com/img/logo-invoice.png",
            number=1,
            terms="Payment is due within 30 days from date of invoice. Late payment is subject to fees of 5% per month.",
            shipping=50,
        )
        invoice.add_item(
            name="Starter plan",
            quantity=random.randint(1, 50),
            unit_cost=99,
        )
        invoice.add_item(
            name="Fees",
            quantity=random.randint(1, 5),
            unit_cost=49,
        )
        order_number = str(invoice.base_order_number + order)
        invoice.custom_fields.append(
            {"name": "Invoice Number", "value": f"INV-{invoice_number}"}
        )
        invoice.custom_fields.append({"name": "Order Number", "value": order_number})
        invoice.custom_fields.append(
            {"name": "Invoice Date", "value": invoice_date.strftime("%B %d, %Y")}
        )
        invoice.custom_fields.append(
            {"name": "Due Date", "value": due_date.strftime("%B %d, %Y")}
        )
        invoice.toggle_subtotal(shipping=True)

        invoice.template["quantity_header"] = "Hrs/Qty"

        invoice.download(filename)
        return filename, order_number
--------------------------------------------------------------------------------
/resources/InvoiceModeler.py:
--------------------------------------------------------------------------------
class InvoiceModeler:
    def get_new_invoice(self):
        return {
            "invoice_number": "",
            "order_number": "",
            "invoice_date": "",
            "due_date": "",
            "balance_due": "",
            "from": "",
            "to": "",
        }

    def create_aws_invoice(self, fields):
        invoice = self.get_new_invoice()
        for field in fields:
            key = field.key.text.lower()
            val = field.value.text if field.value else ""

            if "invoice number" in key:
                invoice["invoice_number"] = val
            elif "order number" in key:
                invoice["order_number"] = val
            elif "invoice date" in key:
                invoice["invoice_date"] = val
            elif "due date" in key:
                invoice["due_date"] = val
            elif "balance due" in key or "total due" in key:
                invoice["balance_due"] = val.replace("$", "")
            elif "from:" in key:
                invoice["from"] = val
            elif key in ["bill to:", "to:"]:
                invoice["to"] = val
        return invoice
--------------------------------------------------------------------------------
/resources/common.resource:
--------------------------------------------------------------------------------
*** Settings ***
Library    Collections
Library    RPA.Cloud.AWS
Library    RPA.Excel.Files
Library    RPA.Robocorp.Vault
Library    InvoiceGenerator
Library    InvoiceModeler


*** Keywords ***
Initialize Amazon Clients
    ${secret}=    Get Secret    aws
    Init Textract Client
    ...    ${secret}[AWS_KEY_ID]
    ...    ${secret}[AWS_KEY]
    ...    ${secret}[AWS_REGION]
    Init S3 Client
    ...    ${secret}[AWS_KEY_ID]
    ...    ${secret}[AWS_KEY]
    ...    ${secret}[AWS_REGION]
--------------------------------------------------------------------------------
/resources/invoices.resource:
--------------------------------------------------------------------------------
*** Settings ***
Resource    common.resource


*** Variables ***
${S3_BUCKET_NAME}    example-amazon-textract-pdf-invoice-bucket
${NUMBER_OF_INVOICES}    %{NUMBER_OF_INVOICES=10}


*** Keywords ***
Process PDF with Amazon Textract
    [Arguments]    ${filename}
    ${job_id}=    Start Document Analysis    ${S3_BUCKET_NAME}    ${filename}
    RETURN    ${job_id}

Wait For PDF Processing Results
    [Arguments]    ${job_ids}
    ${invoices}=    Create List
    ${responses}=    Create List
    FOR    ${i}    IN RANGE    90
        ${jobs_length}=    Get Length    ${job_ids}
        IF    ${jobs_length} == 0    BREAK
        ${jobs_queue}=    Copy List    ${job_ids}
        FOR    ${job_id}    IN    @{jobs_queue}
            ${response}=    Get Document Analysis    ${job_id}
            IF    "${response}[JobStatus]" == "SUCCEEDED"
                Remove Values From List    ${job_ids}    ${job_id}
                Append To List    ${responses}    ${response}
            ELSE IF    "${response}[JobStatus]" == "FAILED"
                # A failed job never succeeds: stop polling it instead of waiting for the loop to end.
                Remove Values From List    ${job_ids}    ${job_id}
                Log    Amazon Textract job ${job_id} failed.    level=WARN
            END
        END
        Sleep    3s
    END
    FOR    ${response}    IN    @{responses}
        ${model}=    Convert Textract Response To Model    ${response}
        ${invoice}=    Set Variable    ${NONE}
        FOR    ${page}    IN    @{model.pages}
            # $page evaluates the actual object instead of its string representation.
            IF    $page.form.fields
                ${invoice}=    Create AWS Invoice    ${page.form.fields}
                Append To List    ${invoices}    ${invoice}
            END
        END
    END
    RETURN    ${invoices}

Save Invoices To Excel
    [Arguments]    ${invoices}
    Create Workbook    ${OUTPUT_DIR}${/}invoices.xlsx
    FOR    ${invoice}    IN    @{invoices}
        &{row}=    Create Dictionary
        ...    To    ${invoice}[to]
        ...    Invoice Number    ${invoice}[invoice_number]
        ...    Order Number    ${invoice}[order_number]
        ...    Invoice Date    ${invoice}[invoice_date]
        ...    Due Date    ${invoice}[due_date]
        ...    Balance Due    ${invoice}[balance_due]
        Append Rows to Worksheet    ${row}    header=True
    END
    Save Workbook

Create PDF Invoice
    [Arguments]    ${index}
    ${filename}=    Set Variable    invoice_${index}.pdf
    ${file_path}=    Set Variable    ${OUTPUT_DIR}${/}${filename}
    Create Random Invoice    ${file_path}    ${index}
    ${upload_status}=    Upload File
    ...    ${S3_BUCKET_NAME}
    ...    ${file_path}
    ...    ${filename}

Create Invoices
    ${bucket_created}=    Create Bucket    ${S3_BUCKET_NAME}
    FOR    ${index}    IN RANGE    1    ${NUMBER_OF_INVOICES}+1
        Create PDF Invoice    ${index}
    END

Get File Keys From Amazon S3 Bucket
    ${files}=    List Files    ${S3_BUCKET_NAME}
    @{keys}=    Create List
    FOR    ${file}    IN    @{files}
        Append To List    ${keys}    ${file}[Key]
    END
    RETURN    ${keys}

Delete Files From Amazon S3 Bucket
    ${keys}=    Get File Keys From Amazon S3 Bucket
    ${deleted}=    Delete Files    ${S3_BUCKET_NAME}    ${keys}
--------------------------------------------------------------------------------
/robot.yaml:
--------------------------------------------------------------------------------
tasks:
  Process PDF invoices with Amazon Textract:
    robotTaskName: Process PDF invoices with Amazon Textract
  Create Invoices:
    robotTaskName: Create Invoices
  Delete Files From Amazon S3 Bucket:
    robotTaskName: Delete Files From Amazon S3 Bucket

condaConfigFile: conda.yaml
artifactsDir: output
PATH:
  - .
PYTHONPATH:
  - .
  - resources
ignoreFiles:
  - .gitignore
--------------------------------------------------------------------------------
/tasks.robot:
--------------------------------------------------------------------------------
*** Settings ***
Documentation    Processes PDF invoices with Amazon Textract.
...    Saves the extracted invoice data in an Excel file.

Resource    invoices.resource


*** Tasks ***
Process PDF invoices with Amazon Textract
    [Setup]    Initialize Amazon Clients
    ${job_ids}=    Create List
    ${filenames}=    Get File Keys From Amazon S3 Bucket
    FOR    ${filename}    IN    @{filenames}
        ${job_id}=    Process PDF with Amazon Textract    ${filename}
        Append To List    ${job_ids}    ${job_id}
    END
    ${invoices}=    Wait For PDF Processing Results    ${job_ids}
    Save Invoices To Excel    ${invoices}
    [Teardown]    Delete Files From Amazon S3 Bucket

Create Invoices
    [Setup]    Initialize Amazon Clients
    Create Invoices

Delete Files From Amazon S3 Bucket
    [Setup]    Initialize Amazon Clients
    Delete Files From Amazon S3 Bucket
--------------------------------------------------------------------------------