├── .gitignore
├── README.md
├── app
│   ├── index.htm
│   └── js
│       └── app.js
└── server
    ├── .gitignore
    ├── bank-statement.pdf
    ├── index.js
    ├── package-lock.json
    └── package.json


/.gitignore:
--------------------------------------------------------------------------------
.DS_Store


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Extract PDF Content

This repository contains examples that demonstrate how to use [PDF.js](https://mozilla.github.io/pdf.js/) together with [Lodash](https://lodash.com/) to extract data from a PDF.

There are two example applications: a web application to ease data exploration and a CLI application to ease data entry from a Node.js application.


_NOTE: These are prototypes for further exploration and will need to be customised to a specific use case._


## Usage

Choose between:

- a CLI implementation, ideally set up and used on a server (requires [Node.js](https://nodejs.org) to be installed)

  ```bash
  cd server/
  npm install
  node index.js
  ```
- a web application implementation (open the `app` folder, then open `index.htm` in a web browser)


## License
[Apache-2.0](https://choosealicense.com/licenses/apache-2.0/)


--------------------------------------------------------------------------------
/app/index.htm:
--------------------------------------------------------------------------------
Upload "bank-statement.pdf" provided in the repo

| id | Date | Amount | Description | Reconciled | Transaction Type |
|---|---|---|---|---|---|