├── .gitignore
├── LICENSE
├── README.md
├── main.py
├── requirements.txt
├── templates
│   └── index.html
└── uploads
    └── README.txt

/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
uploads/*
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Invoiceable

Watch the Demo Video

[Watch the Demo](https://user-images.githubusercontent.com/76186054/214171508-8ef2e3c1-f3fe-46f7-ad6d-ba3677c762f1.mp4)

## Introduction

Invoiceable is a free and open-source Flask application that uses AI, Tesseract OCR, and the open-source machine learning model [`impira/layoutlm-document-qa`](https://huggingface.co/impira/layoutlm-document-qa) to parse invoices, documents, résumés, and more.

Invoiceable is an educational project and should not be relied on. Do not expect updates, bugfixes, or major changes to this abandonware.

**Do not rely on Invoiceable because it is often inaccurate.**

If Invoiceable returns an incorrect response, please don't contact the authors/maintainers of Invoiceable. Invoiceable did not create or train the AI model used in this program.

## Requirements

There's not much that you need to run Invoiceable!

* [Python 3.9+](https://www.python.org/)

* A computer that can run AI models

* ~5 GB of disk space (you probably don't need that much, but it's better to be on the safe side)

* [Git](https://git-scm.com/)

* [Flask](https://flask.palletsprojects.com/)

* Tesseract OCR and Poppler installed on your system (the `pytesseract` and `pdf2image` packages in `requirements.txt` are only wrappers around them)

## Installation/Usage

Installation is very simple. Run the following commands in Terminal or PowerShell:

```bash
git clone https://github.com/fakerybakery/Invoiceable.git
cd Invoiceable
pip3 install -r requirements.txt
python3 main.py
```

On startup, you should see `Importing modules...`. Depending on the speed of your computer, this should be relatively fast. If you don't see `Starting pipeline...` within around 15 seconds, your computer is probably too slow to run Invoiceable. You can buy a VPS, rent cloud hosting, or try a different computer.

When you see the `Starting pipeline...` message, the modules have been imported. **If this is the first time you've used Invoiceable, the model will be downloaded to your disk.** This may take several minutes on slower connections.
Once the model is downloaded, you won't have to download it again.

When you see the `Server started!` message, the web server has started successfully. Navigate to [127.0.0.1:2717](http://127.0.0.1:2717/) in a web browser to access your Invoiceable instance.

To stop your Invoiceable instance, press `Ctrl+C`.

Congratulations, you've successfully started your own Invoiceable instance!

## Potential Uses

* Make accountants' lives easier

* Parse résumés (not great at that yet...)

* Experiment with the power of AI

## Important Warnings

* **This application is an educational experiment. It has not been tested extensively.**

* **This model is only designed to extract data from the text. It cannot perform advanced actions, such as calculations.** If the model were asked, "How much would I have to pay if the tax was doubled?", it would be unable to answer and would likely return an incorrect response.

* **This model is known to make mistakes.** Do not trust data extracted by this model without reviewing it first.

## Disclaimer

**Assume everything from Invoiceable is inaccurate.** Invoiceable uses an AI model, but this model is often mistaken, off, or completely incorrect.

## Credits

* [`impira/layoutlm-document-qa`](https://huggingface.co/impira/layoutlm-document-qa)
* [Multipage PDF to JPEG Image Conversion in Python](https://mtyurt.net/post/2019/multipage-pdf-to-jpeg-image-in-python.html)

## License

Please refer to the `LICENSE` file for licensing information.
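
## Scripted Usage (Sketch)

As a supplement to the Installation/Usage section, a running instance can probably also be queried from a script instead of the browser form. The sketch below is not part of Invoiceable. It assumes the `/invoice` route, the `question` and `file` form fields, and port 2717 as they appear in `main.py`; the helper names `build_multipart` and `ask` are hypothetical.

```python
import os
import uuid
import urllib.request


def build_multipart(question, filename, file_bytes, boundary=None):
    """Encode one text field and one file field as multipart/form-data.

    Assumes `question` and `filename` contain no double quotes or line breaks.
    """
    boundary = boundary or uuid.uuid4().hex
    head = (
        f'--{boundary}\r\n'
        'Content-Disposition: form-data; name="question"\r\n\r\n'
        f'{question}\r\n'
        f'--{boundary}\r\n'
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        'Content-Type: application/octet-stream\r\n\r\n'
    ).encode('utf-8')
    tail = f'\r\n--{boundary}--\r\n'.encode('utf-8')
    return head + file_bytes + tail, f'multipart/form-data; boundary={boundary}'


def ask(path, question, url='http://127.0.0.1:2717/invoice'):
    """POST a local document and a question to a running instance (assumed URL)."""
    with open(path, 'rb') as f:
        body, content_type = build_multipart(question, os.path.basename(path), f.read())
    req = urllib.request.Request(
        url, data=body, headers={'Content-Type': content_type}, method='POST'
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode('utf-8')
```

With the server running, something like `print(ask('invoice.png', 'What is the invoice number?'))` should print the same sentence the web page shows. The file name must end in one of the extensions listed in `ALLOWED_EXTENSIONS`, and the answer still needs manual review.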
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
print('Importing modules...')
import flask
import os
from flask import Flask, render_template, request, session, redirect, url_for
from waitress import serve
from werkzeug.utils import secure_filename
from pdf2image import convert_from_path
import uuid
import tempfile
from PIL import Image
from transformers import pipeline
print('Starting pipeline...')
nlp = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
app = Flask(__name__)
app.config['TEMPLATES_AUTO_RELOAD'] = True
ALLOWED_EXTENSIONS = {'pdf', 'png', 'jpg', 'jpeg', 'gif', 'tiff'}


def allowed_file(filename):
    return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS


# https://mtyurt.net/post/2019/multipage-pdf-to-jpeg-image-in-python.html
def convert_pdf(file_path, output_path):
    # save temp image files in temp dir, delete them after we are finished
    with tempfile.TemporaryDirectory() as temp_dir:

        # convert pdf to multiple image
        images = convert_from_path(file_path, output_folder=temp_dir)

        # save images to temporary directory
        temp_images = []
        for i, image in enumerate(images):
            image_path = f'{temp_dir}/{i}.jpg'
            image.save(image_path, 'JPEG')
            temp_images.append(image_path)

        # read images into pillow.Image
        imgs = list(map(Image.open, temp_images))

        # find minimum width of images
        min_img_width = min(i.width for i in imgs)

        # find total height of all images
        total_height = sum(img.height for img in imgs)

        # create new image object with width and total height
        merged_image = Image.new(imgs[0].mode, (min_img_width, total_height))

        # paste images together one by one
        y = 0
        for img in imgs:
            merged_image.paste(img, (0, y))
            y += img.height

        # save merged image
        merged_image.save(output_path)

        return output_path


###
@app.route('/')
def home():
    return render_template('index.html')


@app.route('/invoice', methods=['GET', 'POST'])
def upload():
    if request.method != 'POST':
        # a plain GET carries no upload, so send the visitor back to the form
        return redirect(url_for('home'))
    question = request.form['question']
    # check if the post request has the file part
    file = request.files['file']
    # If the user does not select a file, the browser submits an
    # empty file without a filename.
    if not (file and allowed_file(file.filename)):
        return 'Please upload a PDF, PNG, JPG, JPEG, GIF, or TIFF file.', 400
    fileid = str(uuid.uuid4())
    ext = file.filename.rsplit('.', 1)[1].lower()
    if ext == 'pdf':
        # merge all PDF pages into a single PNG, then drop the PDF
        pdf_path = os.path.join('uploads', fileid + '.pdf')
        filename = secure_filename(fileid + '.png')
        file.save(pdf_path)
        try:
            convert_pdf(pdf_path, os.path.join('uploads', filename))
        finally:
            os.remove(pdf_path)
    else:
        filename = secure_filename(fileid + '.' + ext)
        file.save(os.path.join('uploads', filename))
    image_path = os.path.join('uploads', filename)
    try:
        answers = nlp(image_path, question)
    finally:
        # always delete the upload, even if the model raises
        os.remove(image_path)
    if len(answers) == 0:
        return 'Sorry, I can\'t find the answer for that question. Please try again with a different document.'
    answer = answers[0]
    return f"I am {answer['score'] * 100:.1f}% sure that the answer is: {answer['answer']}."
# {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}

# return redirect(url_for('upload', name=filename))

print('Server started!')
serve(app, host='0.0.0.0', port=2717)
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
flask
waitress
werkzeug
pdf2image
pillow
pytesseract
transformers
torch
--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------
Invoiceable

Step 1: Upload your Invoice

The information may not be accurate. Do not rely upon Invoiceable as a reliable data source. Invoiceable takes no liability for any damage or mistakes caused through the use of this software. Invoiceable is open sourced! Check out the GitHub project!

--------------------------------------------------------------------------------
/uploads/README.txt:
--------------------------------------------------------------------------------
Uploads are stored here.
--------------------------------------------------------------------------------
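
The page-stacking step in `convert_pdf` (`main.py`) can be sketched without Pillow, using only the arithmetic it performs: the canvas takes the width of the narrowest page and the sum of all page heights, and each page is pasted at `(0, y)` where `y` is the total height of the pages before it. The function name `stack_layout` and the sample page sizes are hypothetical and only illustrate the calculation.

```python
from typing import List, Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def stack_layout(page_sizes: List[Size]) -> Tuple[Size, List[int]]:
    """Return the merged canvas size and the y offset of each page."""
    if not page_sizes:
        raise ValueError('at least one page is required')
    # convert_pdf uses the narrowest page as the canvas width
    width = min(w for w, _ in page_sizes)
    offsets = []
    y = 0
    for _, h in page_sizes:
        offsets.append(y)  # this page is pasted at (0, y)
        y += h
    # after the loop, y is the total height of all pages
    return (width, y), offsets


canvas, offsets = stack_layout([(1700, 2200), (1654, 2339)])
print(canvas, offsets)  # (1654, 4539) [0, 2200]
```

One consequence of taking the minimum width is that any page wider than the narrowest one is clipped on the right when it is pasted. Using the maximum width instead would avoid the clipping at the cost of blank margins.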