├── .gitignore
├── example.png
├── requirements.txt
├── LICENSE-MIT
├── README.md
├── LICENSE-APACHE
└── scan.py


/.gitignore:
--------------------------------------------------------------------------------
*.pyc
*.swp
profiles.toml
venv/

--------------------------------------------------------------------------------
/example.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/dbrgn/pydigitize/HEAD/example.png

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
docopt<=1.0
sh<2,>=1.09
ocrmypdf<15,>=14
awesome-slugify<2,>=1.6
toml<1,>=0.9

--------------------------------------------------------------------------------
/LICENSE-MIT:
--------------------------------------------------------------------------------
Copyright (C) 2014-2023 Danilo Bargen

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
of the Software, and to permit persons to whom the Software is furnished to do
so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# pydigitize

![Example screenshot](example.png)

## Features

pydigitize is a simple command-line tool to scan and archive documents.

It performs the following steps:

- Scan a document with any scanner that supports SANE (ADF supported)
- Straighten and clean the scanned document
- Run OCR on the PDF so that it becomes searchable
- Generate a [PDF/A](https://en.wikipedia.org/wiki/PDF/a) file for archival
- Add keywords to the PDF file

Because you don't want to type the same arguments for every piece of paper that
you scan, pydigitize supports profiles: a profile pre-configures settings like
the output directory, the resolution, whether to run OCR, additional PDF
keywords, etc. For example, you can create a profile for all your invoices.
Then, every time you get an invoice, you scan it with `./scan.py -p invoice`
and you are done.

## Requirements

- Python 3.x
- OCRmyPDF
- libtiff
- sane
- unpaper
- ghostscript

## Usage

See `./scan.py --help`.

## Profiles

If you want to use profiles, create a `profiles.toml` file in the current
directory.

For every profile you can specify the following parameters:

- `path`: The output directory
- `name`: A string that will be included in every filename in slugified form
- `ocr`: Whether to run OCR, straightening and cleanup on the scanned document
- `keywords`: List of keywords that will be added to the PDF metadata

You can also create sub-profiles. They inherit the settings from the parent.

Example:

```toml
[bill]
path = "/home/user/bills/"
name = "bill"
ocr = true
keywords = ["bill"]

[bill.dentist]
name = "dentist"
keywords = ["bill", "dentist"]

[drawing]
path = "/home/user/drawings/"
ocr = false
```

Then pass the name of the profile to the `scan.py` command using the `-p`
parameter.

    ./scan.py -p bill.dentist

You can of course override profile settings on the command line:

    ./scan.py -p bill -n amazon

## Interactive (Batch) Scanning

If you want to scan a specific number of pages, use the `-c` argument.

pydigitize will then prompt you to confirm before scanning each subsequent
page. This is very useful, for example, when scanning double-sided documents on
a scanner that does not have a duplex unit, or when scanning a document
partially in the ADF and partially on the flatbed.

If you don't want manual confirmation, but just want the scanner to scan as
fast as it can, use the `--nowait` argument.

## License

Licensed under either of

 * Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or
   http://www.apache.org/licenses/LICENSE-2.0)
 * MIT license ([LICENSE-MIT](LICENSE-MIT) or
   http://opensource.org/licenses/MIT)

at your option.

### Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted
for inclusion in the work by you, as defined in the Apache-2.0 license, shall
be dual licensed as above, without any additional terms or conditions.

--------------------------------------------------------------------------------
/LICENSE-APACHE:
--------------------------------------------------------------------------------
                                 Apache License
                           Version 2.0, January 2004
                        http://www.apache.org/licenses/

   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

   1. Definitions.

      "License" shall mean the terms and conditions for use, reproduction,
      and distribution as defined by Sections 1 through 9 of this document.

      "Licensor" shall mean the copyright owner or entity authorized by
      the copyright owner that is granting the License.

      "Legal Entity" shall mean the union of the acting entity and all
      other entities that control, are controlled by, or are under common
      control with that entity. For the purposes of this definition,
      "control" means (i) the power, direct or indirect, to cause the
      direction or management of such entity, whether by contract or
      otherwise, or (ii) ownership of fifty percent (50%) or more of the
      outstanding shares, or (iii) beneficial ownership of such entity.

      "You" (or "Your") shall mean an individual or Legal Entity
      exercising permissions granted by this License.

      "Source" form shall mean the preferred form for making modifications,
      including but not limited to software source code, documentation
      source, and configuration files.

      "Object" form shall mean any form resulting from mechanical
      transformation or translation of a Source form, including but
      not limited to compiled object code, generated documentation,
      and conversions to other media types.

      "Work" shall mean the work of authorship, whether in Source or
      Object form, made available under the License, as indicated by a
      copyright notice that is included in or attached to the work
      (an example is provided in the Appendix below).

      "Derivative Works" shall mean any work, whether in Source or Object
      form, that is based on (or derived from) the Work and for which the
      editorial revisions, annotations, elaborations, or other modifications
      represent, as a whole, an original work of authorship. For the purposes
      of this License, Derivative Works shall not include works that remain
      separable from, or merely link (or bind by name) to the interfaces of,
      the Work and Derivative Works thereof.

      "Contribution" shall mean any work of authorship, including
      the original version of the Work and any modifications or additions
      to that Work or Derivative Works thereof, that is intentionally
      submitted to Licensor for inclusion in the Work by the copyright owner
      or by an individual or Legal Entity authorized to submit on behalf of
      the copyright owner. For the purposes of this definition, "submitted"
      means any form of electronic, verbal, or written communication sent
      to the Licensor or its representatives, including but not limited to
      communication on electronic mailing lists, source code control systems,
      and issue tracking systems that are managed by, or on behalf of, the
      Licensor for the purpose of discussing and improving the Work, but
      excluding communication that is conspicuously marked or otherwise
      designated in writing by the copyright owner as "Not a Contribution."

      "Contributor" shall mean Licensor and any individual or Legal Entity
      on behalf of whom a Contribution has been received by Licensor and
      subsequently incorporated within the Work.

   2. Grant of Copyright License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      copyright license to reproduce, prepare Derivative Works of,
      publicly display, publicly perform, sublicense, and distribute the
      Work and such Derivative Works in Source or Object form.

   3. Grant of Patent License. Subject to the terms and conditions of
      this License, each Contributor hereby grants to You a perpetual,
      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
      (except as stated in this section) patent license to make, have made,
      use, offer to sell, sell, import, and otherwise transfer the Work,
      where such license applies only to those patent claims licensable
      by such Contributor that are necessarily infringed by their
      Contribution(s) alone or by combination of their Contribution(s)
      with the Work to which such Contribution(s) was submitted. If You
      institute patent litigation against any entity (including a
      cross-claim or counterclaim in a lawsuit) alleging that the Work
      or a Contribution incorporated within the Work constitutes direct
      or contributory patent infringement, then any patent licenses
      granted to You under this License for that Work shall terminate
      as of the date such litigation is filed.

   4. Redistribution. You may reproduce and distribute copies of the
      Work or Derivative Works thereof in any medium, with or without
      modifications, and in Source or Object form, provided that You
      meet the following conditions:

      (a) You must give any other recipients of the Work or
          Derivative Works a copy of this License; and

      (b) You must cause any modified files to carry prominent notices
          stating that You changed the files; and

      (c) You must retain, in the Source form of any Derivative Works
          that You distribute, all copyright, patent, trademark, and
          attribution notices from the Source form of the Work,
          excluding those notices that do not pertain to any part of
          the Derivative Works; and

      (d) If the Work includes a "NOTICE" text file as part of its
          distribution, then any Derivative Works that You distribute must
          include a readable copy of the attribution notices contained
          within such NOTICE file, excluding those notices that do not
          pertain to any part of the Derivative Works, in at least one
          of the following places: within a NOTICE text file distributed
          as part of the Derivative Works; within the Source form or
          documentation, if provided along with the Derivative Works; or,
          within a display generated by the Derivative Works, if and
          wherever such third-party notices normally appear. The contents
          of the NOTICE file are for informational purposes only and
          do not modify the License. You may add Your own attribution
          notices within Derivative Works that You distribute, alongside
          or as an addendum to the NOTICE text from the Work, provided
          that such additional attribution notices cannot be construed
          as modifying the License.

      You may add Your own copyright statement to Your modifications and
      may provide additional or different license terms and conditions
      for use, reproduction, or distribution of Your modifications, or
      for any such Derivative Works as a whole, provided Your use,
      reproduction, and distribution of the Work otherwise complies with
      the conditions stated in this License.

   5. Submission of Contributions. Unless You explicitly state otherwise,
      any Contribution intentionally submitted for inclusion in the Work
      by You to the Licensor shall be under the terms and conditions of
      this License, without any additional terms or conditions.
      Notwithstanding the above, nothing herein shall supersede or modify
      the terms of any separate license agreement you may have executed
      with Licensor regarding such Contributions.

   6. Trademarks. This License does not grant permission to use the trade
      names, trademarks, service marks, or product names of the Licensor,
      except as required for reasonable and customary use in describing the
      origin of the Work and reproducing the content of the NOTICE file.

   7. Disclaimer of Warranty. Unless required by applicable law or
      agreed to in writing, Licensor provides the Work (and each
      Contributor provides its Contributions) on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
      implied, including, without limitation, any warranties or conditions
      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
      PARTICULAR PURPOSE. You are solely responsible for determining the
      appropriateness of using or redistributing the Work and assume any
      risks associated with Your exercise of permissions under this License.

   8. Limitation of Liability.
      In no event and under no legal theory,
      whether in tort (including negligence), contract, or otherwise,
      unless required by applicable law (such as deliberate and grossly
      negligent acts) or agreed to in writing, shall any Contributor be
      liable to You for damages, including any direct, indirect, special,
      incidental, or consequential damages of any character arising as a
      result of this License or out of the use or inability to use the
      Work (including but not limited to damages for loss of goodwill,
      work stoppage, computer failure or malfunction, or any and all
      other commercial damages or losses), even if such Contributor
      has been advised of the possibility of such damages.

   9. Accepting Warranty or Additional Liability. While redistributing
      the Work or Derivative Works thereof, You may choose to offer,
      and charge a fee for, acceptance of support, warranty, indemnity,
      or other liability obligations and/or rights consistent with this
      License. However, in accepting such obligations, You may act only
      on Your own behalf and on Your sole responsibility, not on behalf
      of any other Contributor, and only if You agree to indemnify,
      defend, and hold each Contributor harmless for any liability
      incurred by, or claims asserted against, such Contributor by reason
      of your accepting any such warranty or additional liability.

   END OF TERMS AND CONDITIONS

--------------------------------------------------------------------------------
/scan.py:
--------------------------------------------------------------------------------
#!/usr/bin/env python3
"""pydigitize.

Usage:
    scan.py [options] [OUTPUT]

Examples:
    scan.py out/
    scan.py out/document.pdf
    scan.py out/ -n document -k foo,bar

Args:
    OUTPUT          This can either be a filename or a directory name.

Options:
    -h --help       Show this help.
    --version       Show version.

    -p PROFILE      The profile to use.

    -n NAME         Text that will be incorporated into the filename.
    -t DATE         Use the specified date string (format: YYYYMMDD)
    -k KEYWORDS     Comma separated keywords that will be added to the PDF metadata.

    -d DEVICE       Set the device.
    -r RESOLUTION   Set the resolution [default: 300].
    -c PAGES        Page count to scan [default: all pages from ADF]

    --no-adf        Don't use ADF. By default, ADF is used.
    --skip-ocr      Don't run OCR / straightening / cleanup step.
    --nowait        When scanning multiple pages (with the -c parameter), don't
                    wait for manual confirmation but scan as fast as the scanner
                    can process the pages.

    --verbose       Verbose output
    --debug         Debug output

"""
import datetime
import glob
import logging
import os.path
import re
import sys
import tempfile

import docopt
from sh import cd, mv
from slugify import slugify
import toml

try:
    from sh import scanimage
except ImportError:
    print('Error: scanimage command not found. Please install sane.')
    sys.exit(1)

try:
    from sh import tiffcp, tiff2pdf
except ImportError:
    print('Error: tiffcp / tiff2pdf commands not found. Please install libtiff.')
    sys.exit(1)

try:
    from sh import ocrmypdf
except ImportError:
    print('Error: ocrmypdf command not found. Please install ocrmypdf.')
    sys.exit(1)

try:
    from sh import tesseract  # noqa
except ImportError:
    print('Error: tesseract command not found. Please install tesseract.')
    sys.exit(1)

try:
    from sh import unpaper  # noqa
except ImportError:
    print('Error: unpaper command not found. Please install unpaper.')
    sys.exit(1)


logger = logging.getLogger('pydigitize')


VALID_RESOLUTIONS = (100, 200, 300, 400, 600)
START_TIME = datetime.datetime.now()


def prefix():
    duration = (datetime.datetime.now() - START_TIME).total_seconds()
    return '\033[92m\033[1m+\033[0m [{0:>5.2f}s] '.format(duration)


class Scan:

    def __init__(self, *,
                 resolution,
                 device,
                 output,
                 name: str = None,
                 datestring: str = None,
                 keywords: str = None,
                 count: int = None,
                 nowait: bool = False,
                 adf: bool = True,
                 ):
        """
        Initialize scan class.

        Added attributes:

        - resolution
        - device
        - output_path
        - keywords
        - count

        """
        # Validate and store resolution
        def _invalid_res():
            print('Invalid resolution. Please use one of {!r}.'.format(VALID_RESOLUTIONS))
            sys.exit(1)
        try:
            if int(resolution) not in VALID_RESOLUTIONS:
                _invalid_res()
        except ValueError:
            _invalid_res()
        else:
            self.resolution = resolution

        # Store device
        self.device = device

        # Store keywords
        self.keywords = set() if keywords is None else set(keywords)

        # Validate and set timestamp
        if datestring is None:
            timestamp = START_TIME.strftime('%Y%m%d-%H%M%S')
        else:
            datestring = re.sub(r'[^0-9]', '', datestring)
            timestamp = datestring or START_TIME.strftime('%Y%m%d-%H%M%S')

        # Validate and store output path
        if os.path.isdir(output):
            if name is None:
                filename = '{}.pdf'.format(timestamp)
            else:
                filename = '{}-{}.pdf'.format(timestamp, slugify(name, to_lower=True))
            output_path = os.path.join(output, filename)
        elif os.path.dirname(output) == '' or os.path.isdir(os.path.dirname(output)):
            output_path = output
        else:
            print('Output directory "{}" must already exist.'.format(output))
            sys.exit(1)
        self.output_path = os.path.abspath(output_path)
        logger.debug('Output path: %s', self.output_path)

        # ADF
        self.adf = adf

        # Store page count
        self.count = count
        self.nowait = nowait

    def prepare_directories(self):
        """
        Prepare the temporary output directories.

        Added attributes:

        - workdir

        """
        print(prefix() + 'Creating temporary directory...')
        self.workdir = tempfile.mkdtemp(prefix='pydigitize-')

    def scan_pages(self):
        """
        Scan pages using ``scanimage``.
        """
        def _scan_page(number: int = None):
            if number is None:
                print(prefix() + 'Scanning all pages...')
            else:
                print(prefix() + 'Scanning page %d/%d...' % (number + 1, self.count))
            scanimage_args = {
                'x': 210, 'y': 297,
                'batch': 'out%d.tif',
                'batch-start': '1000',  # Avoid issues with sorting (e.g. out10 < out2)
                'format': 'tiff',
                'resolution': self.resolution,
                '_ok_code': [0, 7],
            }
            scanimage_args['source'] = 'ADF' if self.adf else 'Flatbed'
            if self.device is not None:
                scanimage_args['device_name'] = self.device
            if number is not None:
                # Keep the 1000 offset so that the files still sort correctly
                # when more than ten pages are scanned one by one.
                scanimage_args['batch-start'] = 1000 + number
                scanimage_args['batch-count'] = 1
            logger.debug('Scanimage args: %r' % scanimage_args)

            scanimage(**scanimage_args)

        if self.count:
            for i in range(self.count):
                _scan_page(i)
                if not self.nowait and i < (self.count - 1):
                    try:
                        msg = 'Press Enter to scan page %d (or Ctrl+C to abort)'
                        input(prefix() + msg % (i + 2))
                    except KeyboardInterrupt:
                        print()
                        print(prefix() + 'Aborting.')
                        sys.exit(1)
        else:
            _scan_page(None)

    def combine_tiffs(self):
        """
        Combine tiffs into single multi-page tiff.
        """
        print(prefix() + 'Combining image files...')
        files = sorted(glob.glob('out*.tif'))
        logger.debug('Joining %r', files)
        tiffcp(files, 'output.tif', c='lzw')

    def convert_tiff_to_pdf(self):
        """
        Convert tiff to pdf.

        TODO: use convert instead?

        """
        print(prefix() + 'Converting to PDF...')
        tiff2pdf('output.tif', p='A4', o='output.pdf')

    def do_ocr(self):
        """
        Do character recognition (OCR) with ``ocrmypdf``.
        """
        print(prefix() + 'Running OCR...')
        args = ['-l', 'deu', '-d', '-c']
        if self.keywords:
            args.extend(['--keywords', ','.join(self.keywords)])
        args.extend(['output.pdf', 'clean.pdf'])
        ocrmypdf(*args)

    def process(self, *, skip_ocr=False):
        # Prepare directories
        self.prepare_directories()
        cd(self.workdir)

        # Scan pages
        self.scan_pages()

        # Combine tiffs into single multi-page tiff
        self.combine_tiffs()

        # Convert tiff to pdf
        self.convert_tiff_to_pdf()

        # Do OCR
        if skip_ocr is False:
            self.do_ocr()
            filename = 'clean.pdf'
        else:
            filename = 'output.pdf'

        # Move file
        print(prefix() + 'Moving resulting file...')
        cd('..')
        mv('{}/{}'.format(self.workdir, filename), self.output_path)

        print('\nDone: %s' % self.output_path)


if __name__ == '__main__':
    args = docopt.docopt(__doc__, version='pydigitize 0.1')
    if args['--debug']:
        logging.basicConfig(level=logging.DEBUG)
    elif args['--verbose']:
        logging.basicConfig(level=logging.INFO)
    else:
        logging.basicConfig(level=logging.WARNING)

    logger.debug('Command line args: %r' % args)

    default_output = tempfile.mkdtemp(prefix='pydigitize-', suffix='-out')

    # Default args
    kwargs = {
        'output': default_output,
        'keywords': {'pydigitize'},
    }
    skip_ocr = False

    # Process profile
    if args['-p'] is not None:
        # Load profiles
        with open('profiles.toml', 'r') as conffile:
            profiles = toml.loads(conffile.read())

        # Create list of all profiles
        all_profiles = []

        def _parse_profile(k, v, prefix=None):
            if isinstance(v, dict):
                if prefix is None:
                    new_prefix = k
                else:
                    new_prefix = '%s.%s' % (prefix, k)
                all_profiles.append(new_prefix)
                for kk, vv in v.items():
                    _parse_profile(kk, vv, new_prefix)

        for k, v in profiles.items():
            _parse_profile(k, v)

        # Find profile
        profile = profiles
        profile_name = args['-p']
        profile_parts = profile_name.split('.')
        for part in profile_parts:
            found = profile.get(part)
            if found is None:
                print('Profile not found: {}'.format(profile_name))
                print('\nAvailable profiles:')
                for name in sorted(all_profiles):
                    print(' - %s' % name)
                sys.exit(1)
            profile = found

        # Update args
        if 'path' in profile:
            kwargs['output'] = profile['path']
        if 'name' in profile:
            kwargs['name'] = profile['name']
        if 'ocr' in profile:
            skip_ocr = not bool(profile['ocr'])
        if 'keywords' in profile:
            kwargs['keywords'].update(profile['keywords'])

    # Argument overrides
    kwargs['resolution'] = args['-r']
    kwargs['device'] = args['-d']
    if args['OUTPUT']:
        kwargs['output'] = args['OUTPUT']
    if args['--no-adf'] is True:
        kwargs['adf'] = False
    if args['--skip-ocr'] is True:
        skip_ocr = True
    if args['-n']:
        kwargs['name'] = args['-n']
    if args['-t']:
        kwargs['datestring'] = args['-t']
    if args['-k']:
        keywords = [k.strip() for k in args.get('-k', '').split(',')]
        kwargs['keywords'].update(keywords)
    if args['-c']:
        if args['-c'] == 'all pages from ADF':
            kwargs['count'] = None
        else:
            try:
                kwargs['count'] = int(args['-c'])
            except ValueError:
                print('Invalid argument to "-c": %r -> must be numeric!' % args['-c'])
                sys.exit(1)
    kwargs['nowait'] = args['--nowait']

    # Backslashes are doubled to avoid invalid escape sequences; output is unchanged.
    print(' ____')
    print(' ________________________/ O \\___/')
    print(' <_/_\\_/_\\_/_\\_/_\\_/_\\_/_______/ \\\n')

    scan = Scan(**kwargs)
    scan.process(skip_ocr=skip_ocr)

--------------------------------------------------------------------------------
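
The README states that sub-profiles inherit the settings from their parent. As a minimal sketch of what such a lookup could look like, the snippet below walks a dotted profile name such as `bill.dentist` through nested tables and lets each level override the one above it. The function name `resolve_profile` and the constant `PROFILES` are hypothetical and not part of `scan.py`; the dictionary mirrors the README's example `profiles.toml` as a TOML parser would return it, and whether `scan.py` merges settings in exactly this way is not confirmed here.

```python
def resolve_profile(profiles: dict, dotted_name: str) -> dict:
    """Walk a dotted profile name and merge settings from parent to child.

    Hypothetical helper for illustration only, not part of scan.py.
    """
    settings = {}
    table = profiles
    for part in dotted_name.split('.'):
        table = table.get(part)
        if not isinstance(table, dict):
            raise KeyError('Profile not found: {}'.format(dotted_name))
        # Nested dicts are sub-profiles, so only plain values count as settings.
        # Values set at a deeper level override those inherited from the parent.
        for key, value in table.items():
            if not isinstance(value, dict):
                settings[key] = value
    return settings


# Same structure as the example profiles.toml in the README, already parsed.
PROFILES = {
    'bill': {
        'path': '/home/user/bills/',
        'name': 'bill',
        'ocr': True,
        'keywords': ['bill'],
        'dentist': {
            'name': 'dentist',
            'keywords': ['bill', 'dentist'],
        },
    },
    'drawing': {
        'path': '/home/user/drawings/',
        'ocr': False,
    },
}

resolved = resolve_profile(PROFILES, 'bill.dentist')
print(resolved)
```

In this sketch `bill.dentist` keeps `path` and `ocr` from `bill` and replaces `name` and `keywords` with its own values. Lists are overridden rather than concatenated, which is one possible design choice and matches the README example, where the sub-profile repeats the parent keyword.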