/**"]
70 | }
71 | ]
72 | }
73 |
--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
1 | {
2 | "editor.codeActionsOnSave": {
3 | "source.fixAll.eslint": "explicit"
4 | },
5 | "conventionalCommits.scopes": [
6 | "Gathering",
7 | "debugging",
8 | "selectors",
9 | "renovate",
10 | "logging",
11 | "ci",
12 | "docs",
13 | "cli",
14 | "logs",
15 | "lint",
16 | "docker",
17 | "files",
18 | "build",
19 | "package",
20 | "deps"
21 | ],
22 | "js/ts.implicitProjectConfig.experimentalDecorators": true,
23 | "npm.exclude": "**/@(vendor|node_modules|bower_components|dist|static)/**"
24 | }
25 |
--------------------------------------------------------------------------------
/.vscode/tasks.json:
--------------------------------------------------------------------------------
1 | {
2 | "version": "2.0.0",
3 | "tasks": [
4 | {
5 | "label": "node-build",
6 | "type": "npm",
7 | "script": "build",
8 | },
9 | {
10 | "type": "docker-build",
11 | "label": "docker-build",
12 | "platform": "node",
13 | "dockerBuild": {
14 | "dockerfile": "${workspaceFolder}/dockerfile.debug",
15 | "context": "${workspaceFolder}",
16 | "pull": true
17 | },
18 | "dependsOn": [
19 | "node-build"
20 | ]
21 | },
22 | {
23 | "type": "docker-run",
24 | "label": "docker-run: release",
25 | "dependsOn": [
26 | "docker-build"
27 | ],
28 | "platform": "node",
29 | },
30 | {
31 | "type": "docker-run",
32 | "label": "docker-run: debug",
33 | "dependsOn": [
34 | "docker-build"
35 | ],
36 | "dockerRun": {
37 | "portsPublishAll": true,
38 | "env": {
39 | "NODE_ENV": "development"
40 | },
41 | "envFiles": [".env"],
42 | "command": "npm run start:debug",
43 | "ports": [{
44 | "containerPort": 9229,
45 | "hostPort": 9229,
46 | }]
47 | },
48 | "node": {
49 | "enableDebugging": true
50 | }
51 | }
52 | ]
53 | }
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 | # Contributor Covenant Code of Conduct
2 |
3 | ## Our Pledge
4 |
5 | We as members, contributors, and leaders pledge to make participation in our
6 | community a harassment-free experience for everyone, regardless of age, body
7 | size, visible or invisible disability, ethnicity, sex characteristics, gender
8 | identity and expression, level of experience, education, socio-economic status,
9 | nationality, personal appearance, race, religion, or sexual identity
10 | and orientation.
11 |
12 | We pledge to act and interact in ways that contribute to an open, welcoming,
13 | diverse, inclusive, and healthy community.
14 |
15 | ## Our Standards
16 |
17 | Examples of behavior that contributes to a positive environment for our
18 | community include:
19 |
20 | * Demonstrating empathy and kindness toward other people
21 | * Being respectful of differing opinions, viewpoints, and experiences
22 | * Giving and gracefully accepting constructive feedback
23 | * Accepting responsibility and apologizing to those affected by our mistakes,
24 | and learning from the experience
25 | * Focusing on what is best not just for us as individuals, but for the
26 | overall community
27 |
28 | Examples of unacceptable behavior include:
29 |
30 | * The use of sexualized language or imagery, and sexual attention or
31 | advances of any kind
32 | * Trolling, insulting or derogatory comments, and personal or political attacks
33 | * Public or private harassment
34 | * Publishing others' private information, such as a physical or email
35 | address, without their explicit permission
36 | * Other conduct which could reasonably be considered inappropriate in a
37 | professional setting
38 |
39 | ## Enforcement Responsibilities
40 |
41 | Community leaders are responsible for clarifying and enforcing our standards of
42 | acceptable behavior and will take appropriate and fair corrective action in
43 | response to any behavior that they deem inappropriate, threatening, offensive,
44 | or harmful.
45 |
46 | Community leaders have the right and responsibility to remove, edit, or reject
47 | comments, commits, code, wiki edits, issues, and other contributions that are
48 | not aligned to this Code of Conduct, and will communicate reasons for moderation
49 | decisions when appropriate.
50 |
51 | ## Scope
52 |
53 | This Code of Conduct applies within all community spaces, and also applies when
54 | an individual is officially representing the community in public spaces.
55 | Examples of representing our community include using an official e-mail address,
56 | posting via an official social media account, or acting as an appointed
57 | representative at an online or offline event.
58 |
59 | ## Enforcement
60 |
61 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
62 | reported to the community leaders responsible for enforcement at
63 | mfranke87@icloud.com.
64 | All complaints will be reviewed and investigated promptly and fairly.
65 |
66 | All community leaders are obligated to respect the privacy and security of the
67 | reporter of any incident.
68 |
69 | ## Enforcement Guidelines
70 |
71 | Community leaders will follow these Community Impact Guidelines in determining
72 | the consequences for any action they deem in violation of this Code of Conduct:
73 |
74 | ### 1. Correction
75 |
76 | **Community Impact**: Use of inappropriate language or other behavior deemed
77 | unprofessional or unwelcome in the community.
78 |
79 | **Consequence**: A private, written warning from community leaders, providing
80 | clarity around the nature of the violation and an explanation of why the
81 | behavior was inappropriate. A public apology may be requested.
82 |
83 | ### 2. Warning
84 |
85 | **Community Impact**: A violation through a single incident or series
86 | of actions.
87 |
88 | **Consequence**: A warning with consequences for continued behavior. No
89 | interaction with the people involved, including unsolicited interaction with
90 | those enforcing the Code of Conduct, for a specified period of time. This
91 | includes avoiding interactions in community spaces as well as external channels
92 | like social media. Violating these terms may lead to a temporary or
93 | permanent ban.
94 |
95 | ### 3. Temporary Ban
96 |
97 | **Community Impact**: A serious violation of community standards, including
98 | sustained inappropriate behavior.
99 |
100 | **Consequence**: A temporary ban from any sort of interaction or public
101 | communication with the community for a specified period of time. No public or
102 | private interaction with the people involved, including unsolicited interaction
103 | with those enforcing the Code of Conduct, is allowed during this period.
104 | Violating these terms may lead to a permanent ban.
105 |
106 | ### 4. Permanent Ban
107 |
108 | **Community Impact**: Demonstrating a pattern of violation of community
109 | standards, including sustained inappropriate behavior, harassment of an
110 | individual, or aggression toward or disparagement of classes of individuals.
111 |
112 | **Consequence**: A permanent ban from any sort of public interaction within
113 | the community.
114 |
115 | ## Attribution
116 |
117 | This Code of Conduct is adapted from the [Contributor Covenant][homepage],
118 | version 2.0, available at
119 | https://www.contributor-covenant.org/version/2/0/code_of_conduct.html.
120 |
121 | Community Impact Guidelines were inspired by [Mozilla's code of conduct
122 | enforcement ladder](https://github.com/mozilla/diversity).
123 |
124 | [homepage]: https://www.contributor-covenant.org
125 |
126 | For answers to common questions about this code of conduct, see the FAQ at
127 | https://www.contributor-covenant.org/faq. Translations are available at
128 | https://www.contributor-covenant.org/translations.
129 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2022 Marco Franke
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Welcome to docudigger 👋
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 | > Document scraper for getting invoices automagically as PDFs (useful for taxes or a DMS)
18 |
19 | ### 🏠 [Homepage](https://repo.disane.dev/Disane/docudigger#readme)
20 |
21 | ## Configuration
22 |
23 | All settings can be changed via CLI flags or environment variables (also when using Docker).
24 |
25 | | Setting | Description | Default value |
26 | | ----------------------- | -------------------------------------------------------------------------------------------------------------------------- | --------------- |
27 | | AMAZON_USERNAME | Your Amazon username | `null` |
28 | | AMAZON_PASSWORD | Your Amazon password | `null` |
29 | | AMAZON_TLD | Amazon top level domain | `de` |
30 | | AMAZON_YEAR_FILTER | Only extracts invoices from this year (e.g. 2023) | `2023` |
31 | | AMAZON_PAGE_FILTER | Only extracts invoices from this page (e.g. 2) | `null` |
32 | | ONLY_NEW | Tracks already scraped documents and starts a new run at the last scraped one | `true` |
33 | | FILE_DESTINATION_FOLDER | Destination path for all scraped documents | `./documents/` |
34 | | FILE_FALLBACK_EXTENSION | Fallback extension when no extension can be determined | `.pdf` |
35 | | DEBUG | Debug flag (sets the loglevel to DEBUG) | `false` |
36 | | SUBFOLDER_FOR_PAGES | Creates subfolders for every scraped page/plugin | `false` |
37 | | LOG_PATH | Sets the log path | `./logs/` |
38 | | LOG_LEVEL | Log level (see https://github.com/winstonjs/winston#logging-levels) | `info` |
39 | | RECURRING | Flag for executing the script periodically. Needs `RECURRING_PATTERN` to be set. Default `true` when using the Docker container | `false` |
40 | | RECURRING_PATTERN | Cron pattern to execute periodically. Needs `RECURRING` to be `true` | `*/30 * * * *` |
41 | | TZ | Timezone used for Docker environments | `Europe/Berlin` |
42 |
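The table above lists setting names and default values. As a rough illustration of how such environment variables could be resolved against their defaults, here is a minimal sketch. It is not docudigger's actual configuration code: the `Settings` interface and the `loadSettings`, `readString`, and `readBool` helpers are hypothetical names, and only a subset of the settings is covered.

```typescript
// Hypothetical sketch: NOT docudigger's actual configuration loader.
// Setting names and default values are taken from the table above.
interface Settings {
  amazonTld: string;
  onlyNew: boolean;
  fileDestinationFolder: string;
  fileFallbackExtension: string;
  debug: boolean;
  logLevel: string;
  recurring: boolean;
  recurringPattern: string;
}

// Shape of an environment map such as Node's process.env.
type Env = Record<string, string | undefined>;

// Return the value, or the fallback when the variable is unset or empty.
function readString(value: string | undefined, fallback: string): string {
  return value === undefined || value === "" ? fallback : value;
}

// Treat "true" (any casing) as true and any other non-empty value as false;
// fall back to the default when the variable is unset or empty.
function readBool(value: string | undefined, fallback: boolean): boolean {
  if (value === undefined || value === "") return fallback;
  return value.trim().toLowerCase() === "true";
}

function loadSettings(env: Env): Settings {
  return {
    amazonTld: readString(env["AMAZON_TLD"], "de"),
    onlyNew: readBool(env["ONLY_NEW"], true),
    fileDestinationFolder: readString(env["FILE_DESTINATION_FOLDER"], "./documents/"),
    fileFallbackExtension: readString(env["FILE_FALLBACK_EXTENSION"], ".pdf"),
    debug: readBool(env["DEBUG"], false),
    logLevel: readString(env["LOG_LEVEL"], "info"),
    recurring: readBool(env["RECURRING"], false),
    recurringPattern: readString(env["RECURRING_PATTERN"], "*/30 * * * *"),
  };
}
```

In a Node.js process this sketch could be called as `loadSettings(process.env)`. Passing the environment in as an argument, rather than reading it globally, keeps the function easy to test with a plain object.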
43 | ## Install
44 |
45 | ```sh
46 | npm install
47 | ```
48 |
49 | ## Usage
50 |
51 |
52 | ```sh-session
53 | $ npm install -g @disane-dev/docudigger
54 | $ docudigger COMMAND
55 | running command...
56 | $ docudigger (--version)
57 | @disane-dev/docudigger/2.0.7 linux-x64 node-v20.18.0
58 | $ docudigger --help [COMMAND]
59 | USAGE
60 | $ docudigger COMMAND
61 | ...
62 | ```
63 |
64 |
65 | > [!IMPORTANT]
66 | > Don't forget to include `--ignore-scripts` in your install command.
67 |
68 | ## `docudigger scrape all`
69 |
70 | Scrapes all websites periodically (default for docker environment)
71 |
72 | ```
73 | USAGE
74 | $ docudigger scrape all [--json] [--logLevel trace|debug|info|warn|error] [-d] [-l ] [-c -r]
75 |
76 | FLAGS
77 | -c, --recurringCron= [default: * * * * *] Cron pattern to execute periodically
78 | -d, --debug
79 | -l, --logPath= [default: ./logs/] Log path
80 | -r, --recurring
81 | --logLevel=