├── .gitignore ├── AUTHORS.rst ├── CHANGES.rst ├── CONTRIBUTING.rst ├── HISTORY.rst ├── LICENSE.txt ├── MANIFEST.in ├── README.rst ├── TODO.rst ├── dist └── README.txt ├── docs ├── Makefile ├── authors.rst ├── conf.py ├── contributing.rst ├── history.rst ├── index.rst ├── installation.rst ├── make.bat ├── readme.rst ├── todo.rst └── usage.rst ├── fabfile.py ├── first_setup.zsh ├── requirements.txt ├── scanpdf.egg-info ├── PKG-INFO ├── SOURCES.txt ├── dependency_links.txt ├── entry_points.txt ├── requires.txt ├── top_level.txt └── zip-safe ├── scanpdf ├── __init__.py ├── scanpdf.py └── version.py ├── setup.py ├── test ├── COVERAGE.rst └── test_scanpdf.py └── tox.ini /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | .* 3 | *~ 4 | -------------------------------------------------------------------------------- /AUTHORS.rst: -------------------------------------------------------------------------------- 1 | ======= 2 | Credits 3 | ======= 4 | 5 | Development Lead 6 | ---------------- 7 | 8 | * Virantha N. Ekanayake 9 | 10 | Contributors 11 | ------------ 12 | 13 | None yet. Why not be the first? -------------------------------------------------------------------------------- /CHANGES.rst: -------------------------------------------------------------------------------- 1 | ======= ======== ====== 2 | Version Date Changes 3 | ------- -------- ------ 4 | 5 | v0.3.0 8/25/14 Allow arbitrary page sizes and auto-crops 6 | v0.1.0 1/1/14 First release 7 | ======= ======== ====== 8 | -------------------------------------------------------------------------------- /CONTRIBUTING.rst: -------------------------------------------------------------------------------- 1 | ============ 2 | Contributing 3 | ============ 4 | 5 | Contributions are welcome, and they are greatly appreciated! Every 6 | little bit helps, and credit will always be given. 
7 | 8 | You can contribute in many ways: 9 | 10 | Types of Contributions 11 | ---------------------- 12 | 13 | Report Bugs 14 | ~~~~~~~~~~~ 15 | 16 | Report bugs at https://github.com/virantha/scanpdf/issues. 17 | 18 | If you are reporting a bug, please include: 19 | 20 | * Your operating system name and version. 21 | * Any details about your local setup that might be helpful in troubleshooting. 22 | * Detailed steps to reproduce the bug. 23 | 24 | Fix Bugs 25 | ~~~~~~~~ 26 | 27 | Look through the GitHub issues for bugs. Anything tagged with "bug" 28 | is open to whoever wants to implement it. 29 | 30 | Implement Features 31 | ~~~~~~~~~~~~~~~~~~ 32 | 33 | Look through the GitHub issues for features. Anything tagged with "feature" 34 | is open to whoever wants to implement it. 35 | 36 | Write Documentation 37 | ~~~~~~~~~~~~~~~~~~~ 38 | 39 | Scan PDF could always use more documentation, whether as part of 40 | the official Scan PDF docs, in docstrings, or even on the web in 41 | blog posts, articles, and such. 42 | 43 | Submit Feedback 44 | ~~~~~~~~~~~~~~~ 45 | 46 | The best way to send feedback is to file an issue at https://github.com/virantha/scanpdf/issues. 47 | 48 | If you are proposing a feature: 49 | 50 | * Explain in detail how it would work. 51 | * Keep the scope as narrow as possible, to make it easier to implement. 52 | * Remember that this is a volunteer-driven project, and that contributions 53 | are welcome :) 54 | 55 | Get Started! 56 | ------------ 57 | 58 | Ready to contribute? Here's how to set up `scanpdf` for local development. 59 | 60 | 1. Fork the `scanpdf` repo on GitHub. 61 | 2. Clone your fork locally:: 62 | 63 | $ git clone git@github.com:your_name_here/scanpdf.git 64 | 65 | 3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:: 66 | 67 | $ mkvirtualenv scanpdf 68 | $ cd scanpdf/ 69 | $ python setup.py develop 70 | 71 | 4. 
Create a branch for local development:: 72 | 73 | $ git checkout -b name-of-your-bugfix-or-feature 74 | 75 | Now you can make your changes locally. 76 | 77 | 5. When you're done making changes, check that your changes pass tests:: 78 | 79 | $ fab run_tests 80 | 81 | To get fabric and tox, just pip install them into your virtualenv. 82 | 83 | 6. Commit your changes and push your branch to GitHub:: 84 | 85 | $ git add . 86 | $ git commit -m "Your detailed description of your changes." 87 | $ git push origin name-of-your-bugfix-or-feature 88 | 89 | 7. Submit a pull request through the GitHub website. 90 | 91 | Pull Request Guidelines 92 | ----------------------- 93 | 94 | Before you submit a pull request, check that it meets these guidelines: 95 | 96 | 1. The pull request should include tests. 97 | 2. If the pull request adds functionality, the docs should be updated. Put 98 | your new functionality into a function with a docstring, and add the 99 | feature to the list in README.rst. 100 | 3. The pull request should work for Python 2.6, 2.7, and 3.3, and for PyPy. Check 101 | https://travis-ci.org/Virantha N. Ekanayake/scanpdf/pull_requests 102 | and make sure that the tests pass for all supported Python versions. 103 | 104 | Tips 105 | ---- 106 | 107 | Anything?:: 108 | -------------------------------------------------------------------------------- /HISTORY.rst: -------------------------------------------------------------------------------- 1 | .. :changelog: 2 | 3 | History 4 | ------- 5 | 6 | 0.1.0 (2013-08-11) 7 | ++++++++++++++++++ 8 | 9 | * First release on PyPI. 10 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 
9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. 
For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. 
Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 
123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. 
In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. 
We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [ 2014 ] [ Virantha N. Ekanayake ] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include *.txt 2 | include *.rst 3 | 4 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | Scan PDF - Easy scans in Linux with a document scanner like the Fujitsu ScanSnap 2 | ################################################################################ 3 | 4 | .. image:: http://badge.fury.io/py/scanpdf.png 5 | :target: http://badge.fury.io/py/scanpdf 6 | 7 | .. image:: http://pypip.in/d/scanpdf/badge.png 8 | :target: https://crate.io/packages/scanpdf?version=latest 9 | 10 | 11 | If you're looking for a simple way to use a multi-page scanner and get your 12 | document into a PDF in Linux without any proprietary or commercial software, 13 | then ScanPDF might be the solution. I wrote it to quickly take the Linux SANE 14 | scanner system output image files, and process them into usable PDFs. 
By 15 | usable, I mean PDFs that maintain their original scanned resolution, omit blank 16 | pages (if you're scanning in duplex mode, for example), and preserve color unless 17 | the original is greyscale/black and white, in which case they are intelligently 18 | down-converted to B/W PDFs to save space. 19 | 20 | * Free and open-source software: ASL2 license 21 | * Documentation: http://virantha.github.io/scanpdf/html 22 | * Source: https://github.com/virantha/scanpdf 23 | 24 | Features 25 | -------- 26 | * Uses SANE/scanadf to automatically scan to multi-page compressed PDFs 27 | * `Integrates with ScanBd `_ to respond to hardware button presses 28 | * Automatically removes blank pages. 29 | * Scans in color, and automatically down-converts into 1-bit B/W image for text/greyscale images 30 | * Auto-crops to the proper page size. 31 | 32 | Usage: 33 | ------ 34 | The simplest way to use this is: 35 | 36 | :: 37 | 38 | scanpdf scan pdf 39 | 40 | This will first perform the scan, and then the conversion to PDF. If you want 41 | to split up the scan and the PDF conversion into two separate invocations (for 42 | reasons clarified below), then you can do: 43 | 44 | :: 45 | 46 | scanpdf --tmpdir=tmp scan 47 | scanpdf --tmpdir=tmp pdf 48 | 49 | One reason for the separation might be if you want to keep scanning documents 50 | (very quick) while the post-processing (slower) for the PDF conversion is 51 | taking place in the background. For instance, if you're using the hardware 52 | button on the scanner to initiate scans (as detailed in this_ document), then 53 | you want to return immediately after the scan instead of waiting until the full 54 | conversion to PDF has taken place. 55 | 56 | .. 
_this: http://virantha.com/2014/03/17/one-touch-scanning-with-fujitsu-scansnap-in-linux/ 57 | 58 | You can optionally use the following switches to control if you're putting pages face up or face down in the auto 59 | document feeder, if you want to skip the blank page processing, adjust the blank page detection threshold, or add 60 | additional post-processing using unpaper_: 61 | 62 | .. _unpaper: http://unpaper.berlios.de 63 | 64 | :: 65 | 66 | --dpi= DPI to scan in [default: 300] 67 | --face-up= Face-up scanning [default: True] 68 | --keep-blanks Don't check for and remove blank pages 69 | --blank-threshold= Percentage of white to be marked as blank [default: 0.97] 70 | --post-process Run unpaper to deskew/clean up 71 | 72 | 73 | Right now, I'm assuming this is getting called via ScanBD, so I don't have the option to manually specify the 74 | scanner. If you really want to use this standalone, for now, please just set the ``SCANBD_DEVICE`` environment 75 | variable to your scanner device name before running this script. 76 | 77 | 78 | Installation 79 | ------------ 80 | :: 81 | 82 | $ pip install scanpdf 83 | 84 | Requires ImageMagick and SANE to be installed, for the command line tools: 85 | 86 | * ``convert`` 87 | * ``identify`` 88 | * ``ps2pdf`` 89 | * ``scanadf`` 90 | 91 | Also requires epstopdf. 92 | 93 | Disclaimer 94 | ---------- 95 | The software is distributed on an "AS IS" BASIS, WITHOUT 96 | WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
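The ``--blank-threshold`` option above is described as the percentage of white needed for a page to be marked as blank. As a rough illustration of that idea only, here is a minimal Python sketch. It is not scanpdf's actual implementation: the function names ``white_fraction`` and ``is_blank`` and the ``white_level`` cutoff are hypothetical, and the page is assumed to be a flat sequence of 0-255 greyscale values.

```python
def white_fraction(pixels, white_level=250):
    """Return the fraction of greyscale pixels (0-255) at or above white_level.

    white_level is an assumed cutoff for "white"; it is not a scanpdf option.
    """
    pixels = list(pixels)
    if not pixels:
        return 1.0  # treat an empty page as entirely white
    white = sum(1 for p in pixels if p >= white_level)
    return white / float(len(pixels))


def is_blank(pixels, blank_threshold=0.97, white_level=250):
    """Mark a page as blank when its white fraction reaches blank_threshold.

    The default of 0.97 mirrors the documented --blank-threshold default.
    """
    return white_fraction(pixels, white_level) >= blank_threshold
```

Under these assumptions, a page with 98 white pixels out of 100 counts as blank at the default threshold, while a page with only 90 white pixels out of 100 is kept.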
97 | -------------------------------------------------------------------------------- /TODO.rst: -------------------------------------------------------------------------------- 1 | Todo list 2 | ========= 3 | 4 | - Make it more generic in terms of stand-alone usage 5 | - Add docstrings 6 | 7 | -------------------------------------------------------------------------------- /dist/README.txt: -------------------------------------------------------------------------------- 1 | Any binary builds for various platforms go here 2 | -------------------------------------------------------------------------------- /docs/Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for Sphinx documentation 2 | # 3 | 4 | # You can set these variables from the command line. 5 | SPHINXOPTS = 6 | SPHINXBUILD = sphinx-build 7 | PAPER = 8 | BUILDDIR = /Users/virantha/dev/githubdocs/scanpdf 9 | 10 | # User-friendly check for sphinx-build 11 | ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) 12 | $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/) 13 | endif 14 | 15 | # Internal variables. 16 | PAPEROPT_a4 = -D latex_paper_size=a4 17 | PAPEROPT_letter = -D latex_paper_size=letter 18 | ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 19 | # the i18n builder cannot share the environment and doctrees with the others 20 | I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 
21 | 22 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext 23 | 24 | help: 25 | @echo "Please use \`make <target>' where <target> is one of" 26 | @echo " html to make standalone HTML files" 27 | @echo " dirhtml to make HTML files named index.html in directories" 28 | @echo " singlehtml to make a single large HTML file" 29 | @echo " pickle to make pickle files" 30 | @echo " json to make JSON files" 31 | @echo " htmlhelp to make HTML files and a HTML help project" 32 | @echo " qthelp to make HTML files and a qthelp project" 33 | @echo " devhelp to make HTML files and a Devhelp project" 34 | @echo " epub to make an epub" 35 | @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" 36 | @echo " latexpdf to make LaTeX files and run them through pdflatex" 37 | @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" 38 | @echo " text to make text files" 39 | @echo " man to make manual pages" 40 | @echo " texinfo to make Texinfo files" 41 | @echo " info to make Texinfo files and run them through makeinfo" 42 | @echo " gettext to make PO message catalogs" 43 | @echo " changes to make an overview of all changed/added/deprecated items" 44 | @echo " xml to make Docutils-native XML files" 45 | @echo " pseudoxml to make pseudoxml-XML files for display purposes" 46 | @echo " linkcheck to check all external links for integrity" 47 | @echo " doctest to run all doctests embedded in the documentation (if enabled)" 48 | 49 | clean: 50 | rm -rf $(BUILDDIR)/* 51 | 52 | html: 53 | $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html 54 | @echo 55 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." 56 | 57 | dirhtml: 58 | $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml 59 | @echo 60 | @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." 
61 | 62 | singlehtml: 63 | $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml 64 | @echo 65 | @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." 66 | 67 | pickle: 68 | $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle 69 | @echo 70 | @echo "Build finished; now you can process the pickle files." 71 | 72 | json: 73 | $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json 74 | @echo 75 | @echo "Build finished; now you can process the JSON files." 76 | 77 | htmlhelp: 78 | $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp 79 | @echo 80 | @echo "Build finished; now you can run HTML Help Workshop with the" \ 81 | ".hhp project file in $(BUILDDIR)/htmlhelp." 82 | 83 | qthelp: 84 | $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp 85 | @echo 86 | @echo "Build finished; now you can run "qcollectiongenerator" with the" \ 87 | ".qhcp project file in $(BUILDDIR)/qthelp, like this:" 88 | @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/pypdfocr.qhcp" 89 | @echo "To view the help file:" 90 | @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/pypdfocr.qhc" 91 | 92 | devhelp: 93 | $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp 94 | @echo 95 | @echo "Build finished." 96 | @echo "To view the help file:" 97 | @echo "# mkdir -p $$HOME/.local/share/devhelp/pypdfocr" 98 | @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/pypdfocr" 99 | @echo "# devhelp" 100 | 101 | epub: 102 | $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub 103 | @echo 104 | @echo "Build finished. The epub file is in $(BUILDDIR)/epub." 105 | 106 | latex: 107 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 108 | @echo 109 | @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." 110 | @echo "Run \`make' in that directory to run these through (pdf)latex" \ 111 | "(use \`make latexpdf' here to do that automatically)." 
112 | 113 | latexpdf: 114 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 115 | @echo "Running LaTeX files through pdflatex..." 116 | $(MAKE) -C $(BUILDDIR)/latex all-pdf 117 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 118 | 119 | latexpdfja: 120 | $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex 121 | @echo "Running LaTeX files through platex and dvipdfmx..." 122 | $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja 123 | @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." 124 | 125 | text: 126 | $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text 127 | @echo 128 | @echo "Build finished. The text files are in $(BUILDDIR)/text." 129 | 130 | man: 131 | $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man 132 | @echo 133 | @echo "Build finished. The manual pages are in $(BUILDDIR)/man." 134 | 135 | texinfo: 136 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 137 | @echo 138 | @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." 139 | @echo "Run \`make' in that directory to run these through makeinfo" \ 140 | "(use \`make info' here to do that automatically)." 141 | 142 | info: 143 | $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo 144 | @echo "Running Texinfo files through makeinfo..." 145 | make -C $(BUILDDIR)/texinfo info 146 | @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 147 | 148 | gettext: 149 | $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale 150 | @echo 151 | @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." 152 | 153 | changes: 154 | $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes 155 | @echo 156 | @echo "The overview file is in $(BUILDDIR)/changes." 157 | 158 | linkcheck: 159 | $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck 160 | @echo 161 | @echo "Link check complete; look for any errors in the above output " \ 162 | "or in $(BUILDDIR)/linkcheck/output.txt." 
163 | 164 | doctest: 165 | $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest 166 | @echo "Testing of doctests in the sources finished, look at the " \ 167 | "results in $(BUILDDIR)/doctest/output.txt." 168 | 169 | xml: 170 | $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml 171 | @echo 172 | @echo "Build finished. The XML files are in $(BUILDDIR)/xml." 173 | 174 | pseudoxml: 175 | $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml 176 | @echo 177 | @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." -------------------------------------------------------------------------------- /docs/authors.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../AUTHORS.rst -------------------------------------------------------------------------------- /docs/conf.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | # 3 | # scanpdf documentation build configuration file, created by 4 | # sphinx-quickstart on Wed Oct 23 13:43:29 2013. 5 | # 6 | # This file is execfile()d with the current directory set to its 7 | # containing dir. 8 | # 9 | # Note that not all possible configuration values are present in this 10 | # autogenerated file. 11 | # 12 | # All configuration values have a default; values that are commented out 13 | # serve to show the default. 14 | 15 | import sys 16 | import os 17 | import pkg_resources 18 | 19 | # If extensions (or modules to document with autodoc) are in another directory, 20 | # add these directories to sys.path here. If the directory is relative to the 21 | # documentation root, use os.path.abspath to make it absolute, like shown here. 22 | #sys.path.insert(0, os.path.abspath('.')) 23 | 24 | # -- General configuration ------------------------------------------------ 25 | 26 | # If your documentation needs a minimal Sphinx version, state it here. 
27 | #needs_sphinx = '1.0' 28 | 29 | # Add any Sphinx extension module names here, as strings. They can be 30 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom 31 | # ones. 32 | extensions = [ 33 | 'sphinx.ext.autodoc', 34 | 'sphinx.ext.viewcode', 35 | ] 36 | 37 | # Add any paths that contain templates here, relative to this directory. 38 | templates_path = ['_templates'] 39 | 40 | # The suffix of source filenames. 41 | source_suffix = '.rst' 42 | 43 | # The encoding of source files. 44 | #source_encoding = 'utf-8-sig' 45 | 46 | # The master toctree document. 47 | master_doc = 'index' 48 | 49 | # General information about the project. 50 | project = u'Scan PDF' 51 | copyright = u'2014, Virantha N. Ekanayake' 52 | 53 | # The version info for the project you're documenting, acts as replacement for 54 | # |version| and |release|, also used in various other places throughout the 55 | # built documents. 56 | # 57 | # The short X.Y version. 58 | version = '' 59 | try: 60 | release = pkg_resources.get_distribution('scanpdf').version 61 | except pkg_resources.DistributionNotFound: 62 | print('To build the documentation, the distribution information of scanpdf') 63 | print('has to be available. Either install the package into your') 64 | print('development environment or run "setup.py develop" to set up the') 65 | print('metadata. A virtualenv is recommended!') 66 | sys.exit(1) 67 | del pkg_resources 68 | 69 | version = '.'.join(release.split('.')[:2]) 70 | # The full version, including alpha/beta/rc tags. 71 | 72 | # The language for content autogenerated by Sphinx. Refer to documentation 73 | # for a list of supported languages. 74 | #language = None 75 | 76 | # There are two options for replacing |today|: either, you set today to some 77 | # non-false value, then it is used: 78 | #today = '' 79 | # Else, today_fmt is used as the format for a strftime call. 
80 | #today_fmt = '%B %d, %Y' 81 | 82 | # List of patterns, relative to source directory, that match files and 83 | # directories to ignore when looking for source files. 84 | exclude_patterns = ['_build'] 85 | 86 | # The reST default role (used for this markup: `text`) to use for all 87 | # documents. 88 | #default_role = None 89 | 90 | # If true, '()' will be appended to :func: etc. cross-reference text. 91 | #add_function_parentheses = True 92 | 93 | # If true, the current module name will be prepended to all description 94 | # unit titles (such as .. function::). 95 | #add_module_names = True 96 | 97 | # If true, sectionauthor and moduleauthor directives will be shown in the 98 | # output. They are ignored by default. 99 | #show_authors = False 100 | 101 | # The name of the Pygments (syntax highlighting) style to use. 102 | pygments_style = 'sphinx' 103 | 104 | # A list of ignored prefixes for module index sorting. 105 | #modindex_common_prefix = [] 106 | 107 | # If true, keep warnings as "system message" paragraphs in the built documents. 108 | #keep_warnings = False 109 | 110 | 111 | # -- Options for HTML output ---------------------------------------------- 112 | 113 | # The theme to use for HTML and HTML Help pages. See the documentation for 114 | # a list of builtin themes. 115 | html_theme = 'sphinxdoc' 116 | 117 | # Theme options are theme-specific and customize the look and feel of a theme 118 | # further. For a list of options available for each theme, see the 119 | # documentation. 120 | #html_theme_options = {} 121 | 122 | # Add any paths that contain custom themes here, relative to this directory. 123 | #html_theme_path = [] 124 | 125 | # The name for this set of Sphinx documents. If None, it defaults to 126 | # " v documentation". 127 | #html_title = None 128 | 129 | # A shorter title for the navigation bar. Default is the same as html_title. 
130 | #html_short_title = None 131 | 132 | # The name of an image file (relative to this directory) to place at the top 133 | # of the sidebar. 134 | #html_logo = None 135 | 136 | # The name of an image file (within the static path) to use as favicon of the 137 | # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 138 | # pixels large. 139 | #html_favicon = None 140 | 141 | # Add any paths that contain custom static files (such as style sheets) here, 142 | # relative to this directory. They are copied after the builtin static files, 143 | # so a file named "default.css" will overwrite the builtin "default.css". 144 | html_static_path = ['_static'] 145 | 146 | # Add any extra paths that contain custom files (such as robots.txt or 147 | # .htaccess) here, relative to this directory. These files are copied 148 | # directly to the root of the documentation. 149 | #html_extra_path = [] 150 | 151 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom, 152 | # using the given strftime format. 153 | #html_last_updated_fmt = '%b %d, %Y' 154 | 155 | # If true, SmartyPants will be used to convert quotes and dashes to 156 | # typographically correct entities. 157 | #html_use_smartypants = True 158 | 159 | # Custom sidebar templates, maps document names to template names. 160 | #html_sidebars = {} 161 | 162 | # Additional templates that should be rendered to pages, maps page names to 163 | # template names. 164 | #html_additional_pages = {} 165 | 166 | # If false, no module index is generated. 167 | #html_domain_indices = True 168 | 169 | # If false, no index is generated. 170 | #html_use_index = True 171 | 172 | # If true, the index is split into individual pages for each letter. 173 | #html_split_index = False 174 | 175 | # If true, links to the reST sources are added to the pages. 176 | #html_show_sourcelink = True 177 | 178 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. 
179 | #html_show_sphinx = True 180 | 181 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. 182 | #html_show_copyright = True 183 | 184 | # If true, an OpenSearch description file will be output, and all pages will 185 | # contain a tag referring to it. The value of this option must be the 186 | # base URL from which the finished HTML is served. 187 | #html_use_opensearch = '' 188 | 189 | # This is the file name suffix for HTML files (e.g. ".xhtml"). 190 | #html_file_suffix = None 191 | 192 | # Output file base name for HTML help builder. 193 | htmlhelp_basename = 'scanpdfdoc' 194 | 195 | 196 | # -- Options for LaTeX output --------------------------------------------- 197 | 198 | latex_elements = { 199 | # The paper size ('letterpaper' or 'a4paper'). 200 | #'papersize': 'letterpaper', 201 | 202 | # The font size ('10pt', '11pt' or '12pt'). 203 | #'pointsize': '10pt', 204 | 205 | # Additional stuff for the LaTeX preamble. 206 | #'preamble': '', 207 | } 208 | 209 | # Grouping the document tree into LaTeX files. List of tuples 210 | # (source start file, target name, title, 211 | # author, documentclass [howto, manual, or own class]). 212 | latex_documents = [ 213 | ('index', 'scanpdf.tex', u'Scan PDF Documentation', 214 | u'Virantha N. Ekanayake', 'manual'), 215 | ] 216 | 217 | # The name of an image file (relative to this directory) to place at the top of 218 | # the title page. 219 | #latex_logo = None 220 | 221 | # For "manual" documents, if this is true, then toplevel headings are parts, 222 | # not chapters. 223 | #latex_use_parts = False 224 | 225 | # If true, show page references after internal links. 226 | #latex_show_pagerefs = False 227 | 228 | # If true, show URL addresses after external links. 229 | #latex_show_urls = False 230 | 231 | # Documents to append as an appendix to all manuals. 232 | #latex_appendices = [] 233 | 234 | # If false, no module index is generated. 
235 | #latex_domain_indices = True 236 | 237 | 238 | # -- Options for manual page output --------------------------------------- 239 | 240 | # One entry per manual page. List of tuples 241 | # (source start file, name, description, authors, manual section). 242 | man_pages = [ 243 | ('index', 'scanpdf', u'Scan PDF Documentation', 244 | [u'Author'], 1) 245 | ] 246 | 247 | # If true, show URL addresses after external links. 248 | #man_show_urls = False 249 | 250 | 251 | # -- Options for Texinfo output ------------------------------------------- 252 | 253 | # Grouping the document tree into Texinfo files. List of tuples 254 | # (source start file, target name, title, author, 255 | # dir menu entry, description, category) 256 | texinfo_documents = [ 257 | ('index', 'scanpdf', u'Scan PDF Documentation', 258 | u'Author', 'scanpdf', 'One line description of project.', 259 | 'Miscellaneous'), 260 | ] 261 | 262 | # Documents to append as an appendix to all manuals. 263 | #texinfo_appendices = [] 264 | 265 | # If false, no module index is generated. 266 | #texinfo_domain_indices = True 267 | 268 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 269 | #texinfo_show_urls = 'footnote' 270 | 271 | # If true, do not generate a @detailmenu in the "Top" node's menu. 272 | #texinfo_no_detailmenu = False 273 | 274 | 275 | # -- Options for Epub output ---------------------------------------------- 276 | 277 | # Bibliographic Dublin Core info. 278 | epub_title = u'scanpdf' 279 | epub_author = u'Author' 280 | epub_publisher = u'Author' 281 | epub_copyright = u'2013, Author' 282 | 283 | # The basename for the epub file. It defaults to the project name. 284 | #epub_basename = u'scanpdf' 285 | 286 | # The HTML theme for the epub output. Since the default themes are not optimized 287 | # for small screen space, using the same theme for HTML and epub output is 288 | # usually not wise. This defaults to 'epub', a theme designed to save visual 289 | # space. 
290 | #epub_theme = 'epub' 291 | 292 | # The language of the text. It defaults to the language option 293 | # or en if the language is not set. 294 | #epub_language = '' 295 | 296 | # The scheme of the identifier. Typical schemes are ISBN or URL. 297 | #epub_scheme = '' 298 | 299 | # The unique identifier of the text. This can be an ISBN number 300 | # or the project homepage. 301 | #epub_identifier = '' 302 | 303 | # A unique identification for the text. 304 | #epub_uid = '' 305 | 306 | # A tuple containing the cover image and cover page html template filenames. 307 | #epub_cover = () 308 | 309 | # A sequence of (type, uri, title) tuples for the guide element of content.opf. 310 | #epub_guide = () 311 | 312 | # HTML files that should be inserted before the pages created by sphinx. 313 | # The format is a list of tuples containing the path and title. 314 | #epub_pre_files = [] 315 | 316 | # HTML files that should be inserted after the pages created by sphinx. 317 | # The format is a list of tuples containing the path and title. 318 | #epub_post_files = [] 319 | 320 | # A list of files that should not be packed into the epub file. 321 | #epub_exclude_files = [] 322 | 323 | # The depth of the table of contents in toc.ncx. 324 | #epub_tocdepth = 3 325 | 326 | # Allow duplicate toc entries. 327 | #epub_tocdup = True 328 | 329 | # Choose between 'default' and 'includehidden'. 330 | #epub_tocscope = 'default' 331 | 332 | # Fix unsupported image types using the PIL. 333 | #epub_fix_images = False 334 | 335 | # Scale large images. 336 | #epub_max_image_width = 0 337 | 338 | # How to display URL addresses: 'footnote', 'no', or 'inline'. 339 | #epub_show_urls = 'inline' 340 | 341 | # If false, no index is generated. 342 | #epub_use_index = True -------------------------------------------------------------------------------- /docs/contributing.rst: -------------------------------------------------------------------------------- 1 | ..
include:: ../CONTRIBUTING.rst -------------------------------------------------------------------------------- /docs/history.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../HISTORY.rst 2 | 3 | -------------------------------------------------------------------------------- /docs/index.rst: -------------------------------------------------------------------------------- 1 | .. documentation master file, created by 2 | sphinx-quickstart on Wed Oct 23 13:43:29 2013. 3 | You can adapt this file completely to your liking, but it should at least 4 | contain the root `toctree` directive. 5 | 6 | Scan PDF API Reference (version |release|) 7 | ========================================== 8 | 9 | Contents: 10 | 11 | .. toctree:: 12 | :maxdepth: 2 13 | 14 | readme 15 | contributing 16 | authors 17 | history 18 | todo 19 | 20 | 21 | Testing 22 | ================ 23 | `Coverage `_ 24 | 25 | 26 | Indices and tables 27 | ================== 28 | 29 | * :ref:`genindex` 30 | * :ref:`modindex` 31 | * :ref:`search` 32 | -------------------------------------------------------------------------------- /docs/installation.rst: -------------------------------------------------------------------------------- 1 | ============ 2 | Installation 3 | ============ 4 | 5 | At the command line:: 6 | 7 | $ pip install scanpdf 8 | -------------------------------------------------------------------------------- /docs/make.bat: -------------------------------------------------------------------------------- 1 | @ECHO OFF 2 | 3 | REM Command file for Sphinx documentation 4 | 5 | if "%SPHINXBUILD%" == "" ( 6 | set SPHINXBUILD=sphinx-build 7 | ) 8 | set BUILDDIR=_build 9 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% . 10 | set I18NSPHINXOPTS=%SPHINXOPTS% .
11 | if NOT "%PAPER%" == "" ( 12 | set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS% 13 | set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS% 14 | ) 15 | 16 | if "%1" == "" goto help 17 | 18 | if "%1" == "help" ( 19 | :help 20 | echo.Please use `make ^` where ^ is one of 21 | echo. html to make standalone HTML files 22 | echo. dirhtml to make HTML files named index.html in directories 23 | echo. singlehtml to make a single large HTML file 24 | echo. pickle to make pickle files 25 | echo. json to make JSON files 26 | echo. htmlhelp to make HTML files and a HTML help project 27 | echo. qthelp to make HTML files and a qthelp project 28 | echo. devhelp to make HTML files and a Devhelp project 29 | echo. epub to make an epub 30 | echo. latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter 31 | echo. text to make text files 32 | echo. man to make manual pages 33 | echo. texinfo to make Texinfo files 34 | echo. gettext to make PO message catalogs 35 | echo. changes to make an overview over all changed/added/deprecated items 36 | echo. xml to make Docutils-native XML files 37 | echo. pseudoxml to make pseudoxml-XML files for display purposes 38 | echo. linkcheck to check all external links for integrity 39 | echo. doctest to run all doctests embedded in the documentation if enabled 40 | goto end 41 | ) 42 | 43 | if "%1" == "clean" ( 44 | for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i 45 | del /q /s %BUILDDIR%\* 46 | goto end 47 | ) 48 | 49 | 50 | %SPHINXBUILD% 2> nul 51 | if errorlevel 9009 ( 52 | echo. 53 | echo.The 'sphinx-build' command was not found. Make sure you have Sphinx 54 | echo.installed, then set the SPHINXBUILD environment variable to point 55 | echo.to the full path of the 'sphinx-build' executable. Alternatively you 56 | echo.may add the Sphinx directory to PATH. 57 | echo. 
58 | echo.If you don't have Sphinx installed, grab it from 59 | echo.http://sphinx-doc.org/ 60 | exit /b 1 61 | ) 62 | 63 | if "%1" == "html" ( 64 | %SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html 65 | if errorlevel 1 exit /b 1 66 | echo. 67 | echo.Build finished. The HTML pages are in %BUILDDIR%/html. 68 | goto end 69 | ) 70 | 71 | if "%1" == "dirhtml" ( 72 | %SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml 73 | if errorlevel 1 exit /b 1 74 | echo. 75 | echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml. 76 | goto end 77 | ) 78 | 79 | if "%1" == "singlehtml" ( 80 | %SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml 81 | if errorlevel 1 exit /b 1 82 | echo. 83 | echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml. 84 | goto end 85 | ) 86 | 87 | if "%1" == "pickle" ( 88 | %SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle 89 | if errorlevel 1 exit /b 1 90 | echo. 91 | echo.Build finished; now you can process the pickle files. 92 | goto end 93 | ) 94 | 95 | if "%1" == "json" ( 96 | %SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json 97 | if errorlevel 1 exit /b 1 98 | echo. 99 | echo.Build finished; now you can process the JSON files. 100 | goto end 101 | ) 102 | 103 | if "%1" == "htmlhelp" ( 104 | %SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp 105 | if errorlevel 1 exit /b 1 106 | echo. 107 | echo.Build finished; now you can run HTML Help Workshop with the ^ 108 | .hhp project file in %BUILDDIR%/htmlhelp. 109 | goto end 110 | ) 111 | 112 | if "%1" == "qthelp" ( 113 | %SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp 114 | if errorlevel 1 exit /b 1 115 | echo. 
116 | echo.Build finished; now you can run "qcollectiongenerator" with the ^ 117 | .qhcp project file in %BUILDDIR%/qthelp, like this: 118 | echo.^> qcollectiongenerator %BUILDDIR%\qthelp\pypdfocr.qhcp 119 | echo.To view the help file: 120 | echo.^> assistant -collectionFile %BUILDDIR%\qthelp\pypdfocr.ghc 121 | goto end 122 | ) 123 | 124 | if "%1" == "devhelp" ( 125 | %SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp 126 | if errorlevel 1 exit /b 1 127 | echo. 128 | echo.Build finished. 129 | goto end 130 | ) 131 | 132 | if "%1" == "epub" ( 133 | %SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub 134 | if errorlevel 1 exit /b 1 135 | echo. 136 | echo.Build finished. The epub file is in %BUILDDIR%/epub. 137 | goto end 138 | ) 139 | 140 | if "%1" == "latex" ( 141 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 142 | if errorlevel 1 exit /b 1 143 | echo. 144 | echo.Build finished; the LaTeX files are in %BUILDDIR%/latex. 145 | goto end 146 | ) 147 | 148 | if "%1" == "latexpdf" ( 149 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 150 | cd %BUILDDIR%/latex 151 | make all-pdf 152 | cd %BUILDDIR%/.. 153 | echo. 154 | echo.Build finished; the PDF files are in %BUILDDIR%/latex. 155 | goto end 156 | ) 157 | 158 | if "%1" == "latexpdfja" ( 159 | %SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex 160 | cd %BUILDDIR%/latex 161 | make all-pdf-ja 162 | cd %BUILDDIR%/.. 163 | echo. 164 | echo.Build finished; the PDF files are in %BUILDDIR%/latex. 165 | goto end 166 | ) 167 | 168 | if "%1" == "text" ( 169 | %SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text 170 | if errorlevel 1 exit /b 1 171 | echo. 172 | echo.Build finished. The text files are in %BUILDDIR%/text. 173 | goto end 174 | ) 175 | 176 | if "%1" == "man" ( 177 | %SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man 178 | if errorlevel 1 exit /b 1 179 | echo. 180 | echo.Build finished. The manual pages are in %BUILDDIR%/man. 
181 | goto end 182 | ) 183 | 184 | if "%1" == "texinfo" ( 185 | %SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo 186 | if errorlevel 1 exit /b 1 187 | echo. 188 | echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo. 189 | goto end 190 | ) 191 | 192 | if "%1" == "gettext" ( 193 | %SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale 194 | if errorlevel 1 exit /b 1 195 | echo. 196 | echo.Build finished. The message catalogs are in %BUILDDIR%/locale. 197 | goto end 198 | ) 199 | 200 | if "%1" == "changes" ( 201 | %SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes 202 | if errorlevel 1 exit /b 1 203 | echo. 204 | echo.The overview file is in %BUILDDIR%/changes. 205 | goto end 206 | ) 207 | 208 | if "%1" == "linkcheck" ( 209 | %SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck 210 | if errorlevel 1 exit /b 1 211 | echo. 212 | echo.Link check complete; look for any errors in the above output ^ 213 | or in %BUILDDIR%/linkcheck/output.txt. 214 | goto end 215 | ) 216 | 217 | if "%1" == "doctest" ( 218 | %SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest 219 | if errorlevel 1 exit /b 1 220 | echo. 221 | echo.Testing of doctests in the sources finished, look at the ^ 222 | results in %BUILDDIR%/doctest/output.txt. 223 | goto end 224 | ) 225 | 226 | if "%1" == "xml" ( 227 | %SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml 228 | if errorlevel 1 exit /b 1 229 | echo. 230 | echo.Build finished. The XML files are in %BUILDDIR%/xml. 231 | goto end 232 | ) 233 | 234 | if "%1" == "pseudoxml" ( 235 | %SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml 236 | if errorlevel 1 exit /b 1 237 | echo. 238 | echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml. 239 | goto end 240 | ) 241 | 242 | :end 243 | -------------------------------------------------------------------------------- /docs/readme.rst: -------------------------------------------------------------------------------- 1 | .. 
include:: ../README.rst -------------------------------------------------------------------------------- /docs/todo.rst: -------------------------------------------------------------------------------- 1 | .. include:: ../TODO.rst 2 | 3 | -------------------------------------------------------------------------------- /docs/usage.rst: -------------------------------------------------------------------------------- 1 | ======== 2 | Usage 3 | ======== 4 | 5 | To use scanpdf in a project:: 6 | 7 | import scanpdf -------------------------------------------------------------------------------- /fabfile.py: -------------------------------------------------------------------------------- 1 | from fabric.api import * 2 | import os, sys 3 | 4 | project_dir = os.path.join(os.path.dirname(sys.argv[0])) 5 | 6 | def build_windows_dist(): 7 | if os.name == 'nt': 8 | # Call the pyinstaller 9 | local("python ../pyinstaller/pyinstaller.py scanpdf_windows.spec --onefile") 10 | 11 | 12 | def run_tests(): 13 | test_dir = "test" 14 | with lcd(test_dir): 15 | # Regenerate the test script 16 | local("py.test --genscript=runtests.py") 17 | t = local("py.test --cov-config .coveragerc --cov=scanpdf --cov-report=term --cov-report=html", capture=True) # capture the report so it can be written to COVERAGE.rst 18 | 19 | with open("test/COVERAGE.rst", "w") as f: 20 | f.write(t) 21 | 22 | 23 | def push_docs(): 24 | """ Build the sphinx docs from develop 25 | And push it to gh-pages 26 | """ 27 | githubpages = "/Users/virantha/dev/githubdocs/scanpdf" 28 | # Convert markdown readme to rst 29 | #local("pandoc README.md -f markdown -t rst -o README.rst") 30 | with lcd(githubpages): 31 | local("git checkout gh-pages") 32 | local("git pull origin gh-pages") 33 | with lcd("docs"): 34 | print("Running sphinx in docs/ and building to ~/dev/githubpages/scanpdf") 35 | local("make clean") 36 | local("make html") 37 | #local("cp -R ../test/htmlcov %s/html/testing" % githubpages) 38 | with lcd(githubpages): 39 | local("git add .") 40 | local('git commit -am "doc
update"') 41 | local('git push origin gh-pages') 42 | -------------------------------------------------------------------------------- /first_setup.zsh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env zsh -i 2 | 3 | # Auto-generated by Voodoo 4 | # First-time script for project setup (DELETE ME AFTER RUNNING!) 5 | 6 | DOCS_DIR=~/dev/githubdocs 7 | 8 | print Setting up your virtualenv 9 | 10 | rmvirtualenv scanpdf 11 | if [ $? -ne 0 ]; then 12 | print Removing old virtualenv failed 13 | exit -1 14 | fi 15 | mkvirtualenv scanpdf 16 | if [ $? -ne 0 ]; then 17 | print Making scanpdf Virtual env failed 18 | exit -1 19 | fi 20 | 21 | workon scanpdf 22 | if [ $? -ne 0 ]; then 23 | print Could not switch to scanpdf 24 | exit -1 25 | fi 26 | print Working in virtualenv scanpdf 27 | 28 | 29 | # Set up the pip packages 30 | #pip install pytest mock pytest-cov python-coveralls coverage sphinx tox 31 | pip install sphinx 32 | echo "cd ~/dev/scanpdf" >> ~/dev/envs/scanpdf/bin/postactivate 33 | 34 | # Start python develop 35 | python setup.py develop 36 | 37 | # Initialize the git repo 38 | github_remote='git@github.com:virantha/scanpdf.git' 39 | git init 40 | git remote add origin $github_remote 41 | git add . 42 | git commit -am "Setting up new project scanpdf" 43 | 44 | # Prompt if we want to push to remote git 45 | read -q "REPLY?Create remote repository at $github_remote [y/N]?" 46 | if [[ $REPLY == y ]]; then 47 | curl --data '{"name":"scanpdf", "description":""}' --user "virantha" https://api.github.com/user/repos 48 | fi 49 | 50 | read -q "REPLY?Push to remote repository $github_remote [y/N]?" 51 | if [[ $REPLY == y ]]; then 52 | git push -u origin master 53 | fi 54 | 55 | print 56 | # Create the docs repository 57 | current_dir=`pwd` 58 | read -q "REPLY?Create and push docs to $github_remote [y/N]?" 
59 | if [[ $REPLY == y ]]; then 60 | # Go to the docs build dir, and check out our repo 61 | cd $DOCS_DIR 62 | git clone https://github.com/virantha/scanpdf.git 63 | cd scanpdf 64 | git checkout --orphan gh-pages 65 | git rm -rf . 66 | 67 | cd $current_dir/docs 68 | pip install sphinx 69 | make html 70 | cd $DOCS_DIR 71 | cd scanpdf 72 | touch .nojekyll 73 | git add . 74 | git commit -m "docs" 75 | git push origin gh-pages 76 | 77 | fi -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | docopt>=0.6.1 2 | -------------------------------------------------------------------------------- /scanpdf.egg-info/PKG-INFO: -------------------------------------------------------------------------------- 1 | Metadata-Version: 1.0 2 | Name: scanpdf 3 | Version: 0.3.0 4 | Summary: Utility to use SANE/scanadf to scan to PDF 5 | Home-page: UNKNOWN 6 | Author: Virantha N. Ekanayake 7 | Author-email: virantha@gmail.com 8 | License: ASL 2.0 9 | Description: Scan PDF - Easy scans in Linux with a document scanner like the Fujitsu ScanSnap 10 | ################################################################################ 11 | 12 | .. image:: http://badge.fury.io/py/scanpdf.png 13 | :target: http://badge.fury.io/py/scanpdf 14 | 15 | .. image:: http://pypip.in/d/scanpdf/badge.png 16 | :target: https://crate.io/packages/scanpdf?version=latest 17 | 18 | 19 | If you're looking for a simple way to use a multi-page scanner and get your 20 | document into a PDF in Linux without any proprietary or commercial software, 21 | then ScanPDF might be the solution. I wrote it to quickly take the Linux SANE 22 | scanner system output image files, and process them into usable PDFs. 
By 23 | usable, I mean PDFs that maintain their original scanned resolution, omit blank 24 | pages (if you're scanning in duplex mode, for example), preserve color unless 25 | the original is greyscale/black and white, in which case they are intelligently 26 | down-converted to B/W PDFs to save space. 27 | 28 | * Free and open-source software: ASL2 license 29 | * Documentation: http://virantha.github.io/scanpdf/html 30 | * Source: https://github.com/virantha/scanpdf 31 | 32 | Features 33 | -------- 34 | * Uses SANE/scanadf to automatically scan to multi-page compressed PDFs 35 | * `Integrates with ScanBd `_ to respond to hardware button presses 36 | * Automatically removes blank pages. 37 | * Scans in color, and automatically down-converts into 1-bit B/W image for text/greyscale images 38 | 39 | Usage: 40 | ------ 41 | The simplest way to use this is: 42 | 43 | :: 44 | 45 | scanpdf scan pdf 46 | 47 | This will first perform the scan, and then the conversion to PDF. If you want 48 | to split up the scan and the PDF conversion into two separate invocations (for 49 | reasons clarified below), then you can do: 50 | 51 | :: 52 | 53 | scanpdf --tmpdir=tmp scan 54 | scanpdf --tmpdir=tmp pdf 55 | 56 | One reason for the separation might be if you want to keep scanning documents 57 | (very quick) while the post-processing (slower) for the PDF conversion is 58 | taking place in the background. For instance, if you're using the hardware 59 | button on the scanner to initiate scans (as detailed in this_ document), then 60 | you want to return immediately after the scan instead of waiting for the full 61 | conversion to PDF has taken place. 62 | 63 | .. 
_this: http://virantha.com/2014/03/17/one-touch-scanning-with-fujitsu-scansnap-in-linux/ 64 | 65 | You can optionally use the following switches to control if you're putting pages face up or face down in the auto 66 | document feeder, if you want to skip the blank page processing, adjust the blank page detection threshold, or add 67 | additional post-processing using unpaper_: 68 | 69 | .. _unpaper: http://unpaper.berlios.de 70 | 71 | :: 72 | 73 | --dpi= DPI to scan in [default: 300] 74 | --face-up= Face-up scanning [default: True] 75 | --keep-blanks Don't check for and remove blank pages 76 | --blank-threshold= Percentage of white to be marked as blank [default: 0.97] 77 | --post-process Run unpaper to deskew/clean up 78 | 79 | 80 | Right now, I'm assuming this is getting called via ScanBD, so I don't have the option to manually specify the 81 | scanner. If you really want to use this standalone, for now, please just set the ``SCANBD_DEVICE`` environment 82 | variable to your scanner device name before running this script. 83 | 84 | 85 | Installation 86 | ------------ 87 | :: 88 | 89 | $ pip install scanpdf 90 | 91 | Requires ImageMagick and SANE to be installed, for the command line tools: 92 | 93 | * ``convert`` 94 | * ``identify`` 95 | * ``ps2pdf`` 96 | * ``scanadf`` 97 | 98 | Also requires epstopdf. 99 | 100 | Disclaimer 101 | ---------- 102 | The software is distributed on an "AS IS" BASIS, WITHOUT 103 | WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
104 | 105 | Platform: UNKNOWN 106 | -------------------------------------------------------------------------------- /scanpdf.egg-info/SOURCES.txt: -------------------------------------------------------------------------------- 1 | AUTHORS.rst 2 | CHANGES.rst 3 | CONTRIBUTING.rst 4 | HISTORY.rst 5 | LICENSE.txt 6 | MANIFEST.in 7 | README.rst 8 | TODO.rst 9 | requirements.txt 10 | setup.py 11 | scanpdf/__init__.py 12 | scanpdf/scanpdf.py 13 | scanpdf/version.py 14 | scanpdf.egg-info/PKG-INFO 15 | scanpdf.egg-info/SOURCES.txt 16 | scanpdf.egg-info/dependency_links.txt 17 | scanpdf.egg-info/entry_points.txt 18 | scanpdf.egg-info/requires.txt 19 | scanpdf.egg-info/top_level.txt 20 | scanpdf.egg-info/zip-safe 21 | test/test_scanpdf.py -------------------------------------------------------------------------------- /scanpdf.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /scanpdf.egg-info/entry_points.txt: -------------------------------------------------------------------------------- 1 | [console_scripts] 2 | scanpdf = scanpdf.scanpdf:main 3 | 4 | -------------------------------------------------------------------------------- /scanpdf.egg-info/requires.txt: -------------------------------------------------------------------------------- 1 | docopt>=0.6.1 -------------------------------------------------------------------------------- /scanpdf.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | scanpdf 2 | -------------------------------------------------------------------------------- /scanpdf.egg-info/zip-safe: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /scanpdf/__init__.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/virantha/scanpdf/82eef134957b2eed5444b5b1cecd9e3a86b8ac0c/scanpdf/__init__.py -------------------------------------------------------------------------------- /scanpdf/scanpdf.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python2.7 2 | # Copyright 2014 Virantha Ekanayake All Rights Reserved. 3 | # 4 | # Licensed under the Apache License, Version 2.0 (the "License"); 5 | # you may not use this file except in compliance with the License. 6 | # You may obtain a copy of the License at 7 | # 8 | # http://www.apache.org/licenses/LICENSE-2.0 9 | # 10 | # Unless required by applicable law or agreed to in writing, software 11 | # distributed under the License is distributed on an "AS IS" BASIS, 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 13 | # See the License for the specific language governing permissions and 14 | # limitations under the License. 15 | """Scan to PDF. 16 | 17 | Usage: 18 | scanpdf [options] scan 19 | scanpdf [options] pdf 20 | scanpdf [options] scan pdf 21 | 22 | 23 | Options: 24 | -v --verbose Verbose logging 25 | -d --debug Debug logging 26 | --dpi= DPI to scan in [default: 300] 27 | --tmpdir= Temporary directory 28 | --keep-tmpdir Whether to keep the tmp dir after scanning or not [default: False] 29 | --face-up= Face-up scanning [default: True] 30 | --keep-blanks Don't check for and remove blank pages 31 | --blank-threshold= Percentage of white to be marked as blank [default: 0.97] 32 | --post-process Run unpaper to deskew/clean up 33 | 34 | """ 35 | 36 | import sys, os 37 | import logging 38 | import shutil 39 | import re 40 | 41 | from version import __version__ 42 | import docopt 43 | 44 | import subprocess 45 | import time 46 | import glob 47 | from itertools import combinations 48 | 49 | 50 | class ScanPdf(object): 51 | """ 52 | The main class.
Performs the following functions: 53 | 54 | """ 55 | 56 | def __init__ (self): 57 | """ 58 | """ 59 | self.config = None 60 | self.bw_pages = {} # Keep track of which pages were in B&W 61 | 62 | def cmd(self, cmd_list): 63 | if isinstance(cmd_list, list): 64 | cmd_list = ' '.join(cmd_list) 65 | logging.debug("Running cmd: %s" % cmd_list) 66 | try: 67 | out = subprocess.check_output(cmd_list, stderr=subprocess.STDOUT, shell=True) 68 | logging.debug(out) 69 | return out 70 | except subprocess.CalledProcessError as e: 71 | print (e.output) 72 | self._error("Could not run command %s" % cmd_list) 73 | 74 | 75 | 76 | def run_scan(self): 77 | device = os.environ['SCANBD_DEVICE'] 78 | self.cmd('logger -t "scanbd: " "Begin of scan "') 79 | c = ['SANE_CONFIG_DIR=/etc/scanbd', 80 | 'scanadf', 81 | '-d "%s"' % device, 82 | '--source "ADF Duplex"', 83 | '--mode Color', 84 | '--resolution %sdpi' % self.dpi, 85 | #'--y-resolution %sdpi' % self.dpi, 86 | '-o %s/page_%%04d' % self.tmp_dir, 87 | #'-y 876.695mm', 88 | #'--page-height 355.617mm', 89 | '--page-height 876.695', 90 | '-y 876.695', 91 | #'--buffermode On', 92 | '--brightness=25', 93 | '--emphasis=20', 94 | '--ald yes', 95 | ] 96 | self.cmd(c) 97 | self.cmd('logger -t "scanbd: " "End of scan "') 98 | 99 | def _error(self, msg): 100 | print("ERROR: %s" % msg) 101 | sys.exit(-1) 102 | 103 | def _atoi(self,text): 104 | return int(text) if text.isdigit() else text 105 | 106 | def _natural_keys(self, text): 107 | ''' 108 | alist.sort(key=natural_keys) sorts in human order 109 | http://nedbatchelder.com/blog/200712/human_sorting.html 110 | (See Toothy's implementation in the comments) 111 | ''' 112 | return [ self._atoi(c) for c in re.split('(\d+)', text) ] 113 | 114 | def get_pages(self): 115 | cwd = os.getcwd() 116 | os.chdir(self.tmp_dir) 117 | pages = glob.glob('./page_*') 118 | pages.sort(key = self._natural_keys) 119 | os.chdir(cwd) 120 | return pages 121 | 122 | def reorder_face_up(self, pages): 123 | reorder = [] 124 | 
assert len(pages) % 2 == 0, "Why is page count not even for duplexing??" 125 | logging.info("Reordering pages") 126 | pages.reverse() 127 | return pages 128 | 129 | def parse_dimensions(self, result): 130 | first_line = str(result.splitlines()[0].strip()) 131 | logging.debug(first_line) 132 | mCropDim = re.compile("""\s*(?P[\d\w\[_\/\\\.]+)\s+\w+\s+(?P<X>\d+)x(?P<Y>\d+)\s+""") 133 | # blank3.pnm PPM 1x1 1950x2716-1-1 8-bit sRGB 0.010u 0:00.009 134 | matchCropDim = mCropDim.search(first_line) 135 | if matchCropDim: 136 | x = int(matchCropDim.group('X')) 137 | y = int(matchCropDim.group('Y')) 138 | else: 139 | x = -1 140 | y = -1 141 | return x, y 142 | 143 | def get_dimensions(self, filename): 144 | c = 'identify %s' % filename 145 | result = self.cmd(c) 146 | return self.parse_dimensions(result) 147 | 148 | def is_blank(self, filename): 149 | """ 150 | Returns true if image in filename is blank 151 | 152 | - Shave off one inch around edges 153 | - Blur and crop down as much as possible 154 | - If remaining page has a dimension smaller than 0.3" conclude it's blank 155 | """ 156 | if not os.path.exists(filename): 157 | return True 158 | 159 | #c = 'convert %s -shave %sx%s -virtual-pixel White -blur 0x15 -fuzz 15%% -trim info:' % (filename, self.dpi, self.dpi) 160 | c = 'convert %s -shave %sx%s -density %s -adaptive-resize 65%% -virtual-pixel White -blur 0x15 -fuzz 15%% -trim info:' % (filename, self.dpi, self.dpi, int(self.dpi/2)) 161 | result = self.cmd(c) 162 | x, y = self.parse_dimensions(result) 163 | if x>0 and y>0: 164 | logging.debug('Finding threshold for blanks') 165 | threshold = int(self.dpi)/2*0.3 # Threshold is 0.3 inches 166 | logging.debug('x=%s, y=%s, threshold=%s' % (x, y, threshold)) 167 | if x < threshold or y < threshold: 168 | return True 169 | else: 170 | return False 171 | else: 172 | logging.debug('Could not find dimensions in output of imagemagick for cropping') 173 | return False 174 | 175 | # Old code, doesn't really work for pages with small
amounts of text 176 | # c = 'identify -verbose %s' % filename 177 | # result = self.cmd(c) 178 | # mStdDev = re.compile("""\s*standard deviation:\s*\d+\.\d+\s*\((?P<percent>\d+\.\d+)\).*""") 179 | # for line in result.splitlines(): 180 | # match = mStdDev.search(str(line)) 181 | # if match: 182 | # stdev = float(match.group('percent')) 183 | # if stdev > 0.1: 184 | # return False 185 | # return True 186 | 187 | 188 | def run_postprocess(self, page_files): 189 | cwd = os.getcwd() 190 | os.chdir(self.tmp_dir) 191 | 192 | processed_pages = [] 193 | self.bw_pages = {} 194 | for page in page_files: 195 | processed_page = '%s_unpaper' % page 196 | c = ['unpaper', page, processed_page] 197 | self.cmd(c) 198 | os.remove(page) 199 | processed_pages.append(processed_page) 200 | self.bw_pages[processed_page] = True 201 | os.chdir(cwd) 202 | return processed_pages 203 | 204 | def run_crop(self, page_files): 205 | cwd = os.getcwd() 206 | os.chdir(self.tmp_dir) 207 | crop_pages = [] 208 | for i, page in enumerate(page_files): 209 | logging.debug("Cropping page %d" % i) 210 | crop_page = '%s.crop' % page 211 | shave_amt = int(int(self.dpi)*0.1) 212 | c = ['convert', 213 | '-deskew 80%', 214 | '-shave %dx%d' % (shave_amt, shave_amt), 215 | '-fuzz 20%', 216 | '-trim', 217 | '+repage', 218 | ] 219 | 220 | # Get original dimensions 221 | x, y = self.get_dimensions(page) 222 | if x>0 and y>0: 223 | # If we know the original dimensions, then just pad back to that with white background 224 | c.extend([ '-gravity center', 225 | '-extent %sx%s' % (x, y), 226 | '-background white', 227 | ]) 228 | c.extend([ ' %s ' % page, 229 | crop_page, 230 | ]) 231 | self.cmd(c) 232 | crop_pages.append(crop_page) 233 | 234 | if not self.args['--keep-tmpdir']: 235 | os.remove(page) 236 | 237 | os.chdir(cwd) 238 | return crop_pages 239 | 240 | def run_convert(self, page_files): 241 | cwd = os.getcwd() 242 | os.chdir(self.tmp_dir) 243 | 244 | pdf_basename = os.path.basename(self.pdf_filename) 245 | ps_filename =
pdf_basename 246 | ps_filename = ps_filename.replace(".pdf", ".ps") 247 | 248 | 249 | # Convert each page to a ps 250 | for page in page_files: 251 | is_bw = self.bw_pages.get(page, False) 252 | if is_bw: 253 | c = ['convert', 254 | page, 255 | '-density %s' % self.dpi, 256 | '-depth 2', 257 | '-define png:compression-level=9', 258 | '-define png:format=8', 259 | '-define png:color-type=0', 260 | '-define png:bit-depth=2', 261 | 'PNG:- | convert - -rotate 180', 262 | '%s.pdf' % page, 263 | ] 264 | else: 265 | c = ['convert', 266 | '-density %s' % self.dpi, 267 | '+page', # Make sure it doesn't crop to letter size 268 | '-compress JPEG', 269 | '-sampling-factor 4:2:0', 270 | '-strip', 271 | '-quality 85', 272 | '-interlace JPEG', 273 | '-colorspace RGB', 274 | '-rotate 180', 275 | page, 276 | '%s.pdf' % page, 277 | ] 278 | self.cmd(c) 279 | 280 | # Create a single ps file using gs 281 | c = ['gs', 282 | '-sDEVICE=pdfwrite', 283 | '-r%s' % self.dpi, 284 | '-dNOPAUSE', 285 | '-dBATCH', 286 | '-dSAFER', 287 | '-sOutputFile=%s' % pdf_basename, 288 | ' '.join(['%s.pdf' % p for p in page_files]), 289 | ] 290 | self.cmd(c) 291 | c = ['epstopdf', 292 | ps_filename, 293 | ] 294 | 295 | #self.cmd(c) 296 | 297 | #c = ['convert', 298 | #'-density %s' % self.dpi, 299 | #'+page', # Make sure it doesn't crop to letter size 300 | #'-compress JPEG', 301 | #'-sampling-factor 4:2:0', 302 | #'-strip', 303 | #'-quality 85', 304 | #'-interlace JPEG', 305 | #'-colorspace RGB', 306 | #'-rotate 180', 307 | #' '.join(page_files), 308 | #'%s' % pdf_basename, 309 | #] 310 | #self.cmd(c) 311 | 312 | 313 | #c = ['ps2pdf', 314 | #'-DPDFSETTINGS=/prepress', 315 | #ps_filename, 316 | #pdf_basename, 317 | #] 318 | 319 | # unneeded since we're going directly to pdf using imagemagick now 320 | #c = ['epstopdf', 321 | #ps_filename, 322 | #] 323 | 324 | #self.cmd(c) 325 | shutil.move(pdf_basename, self.pdf_filename) 326 | if not self.args['--keep-tmpdir']: 327 | for filename in page_files: 328 | 
os.remove(filename) 329 | 330 | # If we did the scan, then remove the tmp dir too 331 | if self.args['scan'] and not self.args['--keep-tmpdir']: 332 | os.rmdir(self.tmp_dir) 333 | os.chdir(cwd) 334 | 335 | 336 | def convert_to_bw(self, pages): 337 | new_pages = [] 338 | for i, page in enumerate(pages): 339 | filename = os.path.join(self.tmp_dir, page) 340 | logging.info("Checking if %s is bw..." % filename) 341 | if self._is_color(filename): 342 | new_pages.append(page) 343 | logging.info("No, %s is color..." % filename) 344 | self.bw_pages[page] = False 345 | else: # Convert to BW 346 | bw_page = self._page_to_bw(filename) 347 | logging.info("Yes, %s converted to bw..." % filename) 348 | new_pages.append(bw_page) 349 | self.bw_pages[bw_page] = True 350 | return new_pages 351 | 352 | 353 | def _page_to_bw(self, page): 354 | out_page = "%s_bw" % page 355 | cwd = os.getcwd() 356 | os.chdir(self.tmp_dir) 357 | 358 | cmd = "convert %s +dither -density %s -colors 16 -colors 4 -colorspace gray -normalize %s_bw" % (page, self.dpi, page) 359 | out = self.cmd(cmd) 360 | # Remove the old file 361 | if not self.args['--keep-tmpdir']: 362 | os.remove(page) 363 | os.chdir(cwd) 364 | return out_page 365 | 366 | def _is_color(self, filename): 367 | """ 368 | Run the following command from ImageMagick: 369 | 370 | :: 371 | 372 | convert holi.pdf -colors 8 -depth 8 -format %c histogram:info:- 373 | 374 | This outputs something like the following: 375 | :: 376 | 377 | 10831: ( 24, 26, 26,255) #181A1A srgba(24,26,26,1) 378 | 4836: ( 55, 87, 79,255) #37574F srgba(55,87,79,1) 379 | 6564: ( 77,138,121,255) #4D8A79 srgba(77,138,121,1) 380 | 4997: ( 86, 96, 93,255) #56605D srgba(86,96,93,1) 381 | 7005: ( 92,153,139,255) #5C998B srgba(92,153,139,1) 382 | 2479: (143,118,123,255) #8F767B srgba(143,118,123,1) 383 | 8870: (169,176,170,255) #A9B0AA srgba(169,176,170,1) 384 | 442906: (254,254,254,255) #FEFEFE srgba(254,254,254,1) 385 | 1053: ( 0, 0, 0,255) #000000 black 386 | 484081:
(255,255,255,255) #FFFFFF white 387 | 388 | """ 389 | cmd = "convert %s -density %s -adaptive-resize 35%% -colors 8 -depth 8 -format %%c histogram:info:-" % (filename, int(self.dpi/3)) 390 | out = self.cmd(cmd) 391 | mLine = re.compile(r"""\s*(?P<count>\d+):\s*\(\s*(?P<R>\d+),\s*(?P<G>\d+),\s*(?P<B>\d+).+""") 392 | colors = [] 393 | for line in out.splitlines(): 394 | matchLine = mLine.search(str(line)) 395 | if matchLine: 396 | logging.debug("Found RGB values") 397 | color = [int(x) for x in (matchLine.group('count'), 398 | matchLine.group('R'), 399 | matchLine.group('G'), 400 | matchLine.group('B'), 401 | ) 402 | ] 403 | colors.append(color) 404 | # sort 405 | colors.sort(reverse=True, key = lambda x: x[0]) 406 | logging.debug(colors) 407 | is_color = False 408 | logging.debug(colors) 409 | for color in colors: 410 | # Calculate the mean differences between the RGB components 411 | # Shades of grey will be very close to zero in this metric... 412 | diff = float(sum([abs(color[2]-color[1]), 413 | abs(color[3]-color[1]), 414 | abs(color[3]-color[2]), 415 | ]))/3 416 | if diff > 30: 417 | is_color = True 418 | logging.debug("Found color, diff is %s" % diff) 419 | else: 420 | logging.debug("No color, diff is %s" % diff) 421 | return is_color 422 | 423 | 424 | 425 | def get_options(self, argv): 426 | """ 427 | Parse the command-line options and set the following object properties: 428 | 429 | :param argv: usually just sys.argv[1:] 430 | :returns: Nothing 431 | 432 | :ivar debug: Enable logging debug statements 433 | :ivar verbose: Enable verbose logging 434 | :ivar config: Dict of the config file 435 | 436 | """ 437 | self.args = argv 438 | 439 | if argv['--verbose']: 440 | logging.basicConfig(level=logging.INFO, format='%(message)s') 441 | if argv['--debug']: 442 | logging.basicConfig(level=logging.DEBUG, format='%(message)s') 443 | if self.args['pdf']: 444 | self.pdf_filename = os.path.abspath(self.args['']) 445 | 446 | self.dpi = int(self.args['--dpi']) 447 | 448 | output_dir =
time.strftime('%Y%m%d_%H%M%S', time.localtime()) 449 | if argv['--tmpdir']: 450 | self.tmp_dir = argv['--tmpdir'] 451 | else: 452 | self.tmp_dir = os.path.join('/tmp', output_dir) 453 | self.tmp_dir = os.path.abspath(self.tmp_dir) 454 | 455 | # Make the tmp dir only if we're scanning, o/w throw an error 456 | if argv['scan']: 457 | if os.path.exists(self.tmp_dir): 458 | self._error("Temporary output directory %s already exists!" % self.tmp_dir) 459 | else: 460 | os.makedirs(self.tmp_dir) 461 | else: 462 | if not os.path.exists(self.tmp_dir): 463 | self._error("Scan files directory %s does not exist!" % self.tmp_dir) 464 | 465 | # Blank checks 466 | self.keep_blanks = argv['--keep-blanks'] 467 | self.blank_threshold = float(argv['--blank-threshold']) 468 | assert(self.blank_threshold >= 0 and self.blank_threshold <= 1.0) 469 | self.post_process = argv['--post-process'] 470 | 471 | def go(self, argv): 472 | """ 473 | The main entry point into ScanPdf 474 | 475 | #. Get the options 476 | #. Create the temp dir 477 | #. Run scanadf 478 | """ 479 | # Read the command line options 480 | self.get_options(argv) 481 | logging.info("Temp dir: %s" % self.tmp_dir) 482 | if self.args['scan']: 483 | self.run_scan() 484 | 485 | if self.args['pdf']: 486 | # Now, convert the files to ps 487 | pages = self.get_pages() 488 | logging.debug( pages ) 489 | if self.args['--face-up']: 490 | pages = self.reorder_face_up(pages) 491 | 492 | logging.debug( pages ) 493 | 494 | # Crop the pages 495 | pages = self.run_crop(pages) 496 | 497 | # Now, check if color or bw 498 | pages = self.convert_to_bw(pages) 499 | logging.debug(pages) 500 | 501 | # Run blanks 502 | if not self.keep_blanks: 503 | no_blank_pages = [] 504 | for i,page in enumerate(pages): 505 | filename = os.path.join(self.tmp_dir, page) 506 | logging.info("Checking if %s is blank..." % filename) 507 | if not self.is_blank(filename): 508 | no_blank_pages.append(page) 509 | else: 510 | logging.info(" page %s is blank, removing..." 
% i) 511 | os.remove(filename) 512 | pages = no_blank_pages 513 | 514 | logging.debug( pages ) 515 | 516 | if self.post_process: 517 | pages = self.run_postprocess(pages) 518 | 519 | self.run_convert(pages) 520 | 521 | def main(): 522 | args = docopt.docopt(__doc__, version='Scan PDF %s' % __version__ ) 523 | script = ScanPdf() 524 | print(args) 525 | script.go(args) 526 | 527 | if __name__ == '__main__': 528 | main() 529 | 530 | -------------------------------------------------------------------------------- /scanpdf/version.py: -------------------------------------------------------------------------------- 1 | __version__ = "0.3.1" 2 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | from setuptools import setup, find_packages 3 | 4 | import io 5 | import os 6 | import scanpdf 7 | from scanpdf.version import __version__ 8 | from setuptools import Command 9 | 10 | class PyTest(Command): 11 | user_options = [] 12 | def initialize_options(self): 13 | pass 14 | def finalize_options(self): 15 | pass 16 | def run(self): 17 | import sys,subprocess 18 | cwd = os.getcwd() 19 | os.chdir('test') 20 | errno = subprocess.call([sys.executable, 'runtests.py']) 21 | os.chdir(cwd) 22 | raise SystemExit(errno) 23 | 24 | def read(*filenames, **kwargs): 25 | encoding = kwargs.get('encoding', 'utf-8') 26 | sep = kwargs.get('sep', '\n') 27 | buf = [] 28 | for filename in filenames: 29 | with io.open(filename, encoding=encoding) as f: 30 | buf.append(f.read()) 31 | return sep.join(buf) 32 | 33 | packages = find_packages(exclude="tests") 34 | 35 | long_description = read('README.rst') 36 | 37 | with open("requirements.txt") as f: 38 | required = f.read().splitlines() 39 | 40 | setup ( 41 | name = "scanpdf", 42 | version = __version__, 43 | description="Utility to use SANE/scanadf to scan to PDF", 44 | license = "ASL 
2.0", 45 | long_description = long_description, 46 | author="Virantha N. Ekanayake", 47 | author_email="virantha@gmail.com", # Removed. 48 | package_data = {'': ['*.xml']}, 49 | zip_safe = True, 50 | include_package_data = True, 51 | packages = packages, 52 | install_requires = required, 53 | entry_points = { 54 | 'console_scripts': [ 55 | 'scanpdf = scanpdf.scanpdf:main' 56 | ], 57 | }, 58 | options = { 59 | "pyinstaller": {"packages": packages} 60 | }, 61 | cmdclass = {'test':PyTest} 62 | 63 | ) -------------------------------------------------------------------------------- /test/COVERAGE.rst: -------------------------------------------------------------------------------- 1 | ============================= test session starts ============================== 2 | Nothing yet 3 | ========================= ========================== 4 | -------------------------------------------------------------------------------- /test/test_scanpdf.py: -------------------------------------------------------------------------------- 1 | import scanpdf.ScanPdf as P 2 | import pytest 3 | import os 4 | import logging 5 | 6 | import smtplib 7 | from mock import Mock 8 | from mock import patch, call 9 | from mock import MagicMock 10 | from mock import PropertyMock 11 | 12 | 13 | class Testscanpdf: 14 | 15 | def setup(self): 16 | self.p = P.ScanPdf() 17 | -------------------------------------------------------------------------------- /tox.ini: -------------------------------------------------------------------------------- 1 | [tox] 2 | envlist=py27,py33 3 | 4 | [testenv] 5 | changedir=test 6 | deps= 7 | pytest 8 | mock 9 | coverage 10 | commands=py.test 11 | --------------------------------------------------------------------------------
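The color check in `_is_color` (in /scanpdf/scanpdf.py above) parses ImageMagick's histogram output and flags a page as color when any histogram entry has a mean pairwise difference between its R, G and B values above 30. A minimal standalone sketch of that metric follows, with no ImageMagick call so it can be tried on canned histogram text. The names `HIST_LINE`, `parse_histogram`, `channel_spread` and `looks_like_color` are hypothetical and not part of the package; the cutoff of 30 is the value hard-coded in `_is_color`.

```python
import re

# Hypothetical helper names; only the regex shape and the cutoff of 30
# come from ScanPdf._is_color. Matches histogram lines such as
#   "  4836: ( 55, 87, 79,255) #37574F srgba(55,87,79,1)"
HIST_LINE = re.compile(
    r"\s*(?P<count>\d+):\s*\(\s*(?P<R>\d+),\s*(?P<G>\d+),\s*(?P<B>\d+)"
)


def parse_histogram(text):
    """Return (count, r, g, b) tuples, most frequent color first."""
    rows = []
    for line in text.splitlines():
        m = HIST_LINE.search(line)
        if m:
            rows.append(tuple(int(m.group(k)) for k in ("count", "R", "G", "B")))
    rows.sort(key=lambda row: row[0], reverse=True)
    return rows


def channel_spread(r, g, b):
    """Mean absolute difference over the three channel pairs.

    Shades of grey have r == g == b, so the spread is close to zero.
    """
    return (abs(g - r) + abs(b - r) + abs(b - g)) / 3.0


def looks_like_color(text, cutoff=30):
    """True if any histogram entry has a channel spread above the cutoff."""
    return any(
        channel_spread(r, g, b) > cutoff for _, r, g, b in parse_histogram(text)
    )
```

Because a single entry above the cutoff is enough, a page with a small colored logo and an otherwise white background would still be kept in color; weighting the spread by the pixel count in each histogram row would be one way to change that, at the cost of another tunable threshold.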