├── .gitignore
├── LICENSE
├── README.md
├── main.py
├── requirements.txt
├── templates
│   └── index.html
└── uploads
    └── README.txt
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | uploads/*
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
4 |
5 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
6 |
7 | THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
8 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Invoiceable
2 |
3 |
4 |
5 | Watch the Demo Video
6 |
7 | [Watch the Demo](https://user-images.githubusercontent.com/76186054/214171508-8ef2e3c1-f3fe-46f7-ad6d-ba3677c762f1.mp4)
8 |
9 |
10 |
11 |
12 | ## Introduction
13 |
14 | Invoiceable is a free and open-source Flask application that uses AI, Tesseract OCR, and the open-source machine learning model [`impira/layoutlm-document-qa`](https://huggingface.co/impira/layoutlm-document-qa) to parse invoices, documents, résumés, and more.
15 |
16 | Invoiceable is an educational project and should not be relied on. Do not expect updates, bugfixes, or major changes to this abandonware.
17 |
18 | **Do not rely on Invoiceable because it is often inaccurate.**
19 |
20 | If Invoiceable returns an incorrect response, please don't contact the authors/maintainers of Invoiceable. Invoiceable did not create or train the AI model used in this program.
21 |
22 | ## Requirements
23 |
24 | There's not much that you need to run Invoiceable!
25 |
26 | * [Python 3.9+](https://www.python.org/)
27 |
28 | * A computer that can run AI models
29 |
30 | * ~5 GB of disk space - you probably don't need that much, but it's better to be on the safe side.
31 |
32 | * [Git](https://git-scm.com/)
33 |
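34 | * [Flask](https://flask.palletsprojects.com/) and the other Python packages listed in `requirements.txt`, which the `pip3 install` step below installs. `pytesseract` and `pdf2image` also need the Tesseract OCR and Poppler system packages, respectively.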
35 |
36 | ## Installation/Usage
37 |
38 | Installation is very simple. Run the following commands in Terminal or PowerShell:
39 |
40 | ```bash
41 | git clone https://github.com/fakerybakery/Invoiceable.git
42 | cd Invoiceable
43 | pip3 install -r requirements.txt
44 | python3 main.py
45 | ```
46 |
47 | On startup, you should see `Importing modules...`. Depending on the speed of your computer, importing should be relatively fast. If you don't see `Starting pipeline...` within about 15 seconds, your computer is probably too slow to run Invoiceable; try a different computer, a VPS, or cloud hosting.
48 |
49 | When you see the `Starting pipeline...` message, modules have been imported. **If this is the first time you've used Invoiceable, the model will be downloaded to your disk.** This may take several minutes on slower connections. Once the model is downloaded, you won't have to download it again.
50 |
51 | When you see the `Server started!` message, the web server has started successfully. Navigate to [127.0.0.1:2717](http://127.0.0.1:2717/) in a web browser to access your Invoiceable instance.
52 |
53 | To stop your Invoiceable instance, press `Ctrl+C` in the terminal where it is running.
54 |
55 | Congratulations, you've successfully started your own Invoiceable instance!
56 |
57 | ## Potential Uses
58 |
59 | * Make accountants' lives easier
60 |
61 | * Parse résumés (not great at that yet...)
62 |
63 | * Experiment with the power of AI
64 |
65 | ## Important Warnings
66 |
67 | * **This application is an educational experiment. It has not been tested extensively.**
68 |
69 | * **This model is designed only to extract data from the text; it cannot perform advanced actions such as calculations.** If asked, "How much would I have to pay if the tax were doubled?", the model could not work out the answer and would likely return an incorrect response.
70 |
71 | * **This model is known to make mistakes.** Do not trust data extracted from this model without prior review.
72 |
73 | ## Disclaimer
74 |
75 | **Assume everything from Invoiceable is inaccurate.** Invoiceable uses an AI model, and this model is often mistaken, imprecise, or completely incorrect.
76 |
77 | ## Credits
78 |
79 | * [`impira/layoutlm-document-qa`](https://huggingface.co/impira/layoutlm-document-qa)
80 | * [Multipage PDF to JPEG Image Conversion in Python](https://mtyurt.net/post/2019/multipage-pdf-to-jpeg-image-in-python.html)
81 |
82 | ## License
83 |
84 | Please refer to the `LICENSE` file for licensing information.
85 |
--------------------------------------------------------------------------------
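The README's warnings recommend reviewing every extracted answer before trusting it. As a minimal sketch of that idea, the helper below takes a list of result dictionaries shaped like the example result in `main.py` (`score` and `answer` keys) and flags low-confidence answers for manual review. The name `summarize_answer` and the `min_score` cutoff of 0.5 are illustrative assumptions, not part of Invoiceable.

```python
def summarize_answer(answers, min_score=0.5):
    """Return a user-facing message for a list of QA results.

    `answers` is assumed to be a list of dicts with 'score' and 'answer'
    keys, like the example result in main.py. `min_score` is a hypothetical
    cutoff chosen only for illustration.
    """
    if not answers:
        return "No answer found."
    # Pick the highest-scoring candidate.
    best = max(answers, key=lambda a: a["score"])
    percent = f"{best['score']:.1%}"
    if best["score"] < min_score:
        return f"Low confidence ({percent}): {best['answer']!r} needs manual review."
    return f"{percent} confident: {best['answer']!r}"


example = [{"score": 0.9943977, "answer": "us-001", "start": 15, "end": 15}]
print(summarize_answer(example))  # prints: 99.4% confident: 'us-001'
```

A cutoff like this only filters out answers the model itself is unsure about; a high score does not guarantee that the answer is correct.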
/main.py:
--------------------------------------------------------------------------------
1 | print('Importing modules...')
2 | import flask
3 | import os
4 | from flask import Flask, render_template, request, session, redirect, url_for
5 | from waitress import serve
6 | from werkzeug.utils import secure_filename
7 | from pdf2image import convert_from_path
8 | import uuid
9 | import tempfile
10 | from PIL import Image
11 | from transformers import pipeline
12 | print('Starting pipeline...')
13 | nlp = pipeline("document-question-answering", model="impira/layoutlm-document-qa")
14 | app = Flask(__name__)
15 | app.config['TEMPLATES_AUTO_RELOAD'] = True
16 | ALLOWED_EXTENSIONS = {'pdf', 'png', 'jpg', 'jpeg', 'gif', 'tiff'}
17 | def allowed_file(filename):
18 |     return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
19 | # https://mtyurt.net/post/2019/multipage-pdf-to-jpeg-image-in-python.html
20 |
21 | def convert_pdf(file_path, output_path):
22 |     # save temp image files in temp dir, delete them after we are finished
23 |     with tempfile.TemporaryDirectory() as temp_dir:
24 |
25 |         # convert the PDF into one image per page
26 |         images = convert_from_path(file_path, output_folder=temp_dir)
27 |
28 |         # save images to temporary directory
29 |         temp_images = []
30 |         for i in range(len(images)):
31 |             image_path = f'{temp_dir}/{i}.jpg'
32 |             images[i].save(image_path, 'JPEG')
33 |             temp_images.append(image_path)
34 |
35 |         # read images into pillow.Image
36 |         imgs = list(map(Image.open, temp_images))
37 |
38 |         # find minimum width of images
39 |         min_img_width = min(i.width for i in imgs)
40 |
41 |         # find total height of all images
42 |         total_height = 0
43 |         for img in imgs:
44 |             total_height += img.height
45 |
46 |         # create new image object with width and total height
47 |         merged_image = Image.new(imgs[0].mode, (min_img_width, total_height))
48 |
49 |         # paste images together one by one
50 |         y = 0
51 |         for img in imgs:
52 |             merged_image.paste(img, (0, y))
53 |             y += img.height
54 |
55 |         # save merged image
56 |         merged_image.save(output_path)
57 |
58 |     return output_path
59 | ###
60 | @app.route('/')
61 | def home():
62 |     return render_template('index.html')
63 | @app.route('/invoice', methods=['GET', 'POST'])
64 | def upload():
65 |     if request.method == 'POST':
66 |         formdata = request.form
67 |     else:
68 |         formdata = request.args
69 |     if request.method == 'POST':
70 |         question = formdata['question']
71 |         # check if the post request has the file part
72 |         file = request.files['file']
73 |         # If the user does not select a file, the browser submits an
74 |         # empty file without a filename.
75 |         if file and allowed_file(file.filename):
76 |             fileid = str(uuid.uuid4())
77 |             ext = str(file.filename.rsplit('.', 1)[1].lower())
78 |             if (ext == 'pdf'):
79 |                 file.save(os.path.join('uploads', fileid + '.pdf'))
80 |                 filename = secure_filename(fileid + '.png')
81 |                 convert_pdf('uploads/' + fileid + '.pdf', 'uploads/' + filename)
82 |                 os.remove('uploads/' + fileid + '.pdf')
83 |             else:
84 |                 filename = fileid + '.' + ext
85 |                 filename = secure_filename(filename)
86 |                 file.save(os.path.join('uploads', filename))
87 |             answers = nlp('uploads/' + filename, question)
88 |             if len(answers) == 0:
89 |                 os.remove(os.path.join(os.getcwd(), 'uploads/' + filename))
90 |                 return 'Sorry, I can\'t find the answer for that question. Please try again with a different document.'
91 |             answer = answers[0]
92 |             os.remove(os.path.join(os.getcwd(), 'uploads/' + filename))
93 |             return 'I am ' + str(answer['score'] * 100) + '% sure that the answer is: ' + str(answer['answer']) + '.'
94 |             # Example result: {'score': 0.9943977, 'answer': 'us-001', 'start': 15, 'end': 15}
95 |     # GET requests and rejected uploads fall through to this response.
96 |     return 'Please upload a PDF or image file together with a question.', 400
97 |
98 | print('Server started!')
99 | serve(app, host='0.0.0.0', port=2717)
100 |
--------------------------------------------------------------------------------
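`convert_pdf` in `main.py` stacks every page of a PDF into one tall image. The layout arithmetic behind it can be sketched without Pillow, assuming each page is described by a `(width, height)` tuple. `stack_layout` is a hypothetical helper written for illustration, not a function in this repository.

```python
def stack_layout(page_sizes):
    """Compute the merged canvas size and the y offset of each page.

    Mirrors the arithmetic in convert_pdf: the canvas width is the
    narrowest page width, the canvas height is the sum of all page
    heights, and pages are placed top to bottom.
    """
    if not page_sizes:
        raise ValueError("at least one page is required")
    canvas_width = min(width for width, _ in page_sizes)
    offsets = []
    y = 0
    for _, height in page_sizes:
        offsets.append(y)  # this page starts where the previous one ended
        y += height
    return (canvas_width, y), offsets


size, offsets = stack_layout([(1700, 2200), (1654, 2339)])
print(size, offsets)  # prints: (1654, 4539) [0, 2200]
```

Because the canvas uses the narrowest width, any wider page is cropped on its right edge when it is pasted.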
/requirements.txt:
--------------------------------------------------------------------------------
1 | flask
2 | waitress
3 | werkzeug
4 | pdf2image
5 | pillow
6 | pytesseract
7 | transformers
8 | torch
--------------------------------------------------------------------------------
/templates/index.html:
--------------------------------------------------------------------------------
1 |
Invoiceable
Step 2: Ask Invoiceable
Question:
Hmm... Let me think...
The information may not be accurate. Do not rely upon Invoiceable as a reliable data source. Invoiceable takes no liability for any damage or mistakes caused through the use of this software. Invoiceable is open sourced! Check out the GitHub project!
2 |
--------------------------------------------------------------------------------
/uploads/README.txt:
--------------------------------------------------------------------------------
1 | Uploads are stored here.
2 |
--------------------------------------------------------------------------------