├── .gitignore
├── AUTHORS.rst
├── CHANGES.rst
├── CONTRIBUTING.rst
├── HISTORY.rst
├── LICENSE.txt
├── MANIFEST.in
├── README.rst
├── TODO.rst
├── dist
    └── README.txt
├── docs
    ├── Makefile
    ├── authors.rst
    ├── conf.py
    ├── contributing.rst
    ├── history.rst
    ├── index.rst
    ├── installation.rst
    ├── make.bat
    ├── readme.rst
    ├── todo.rst
    └── usage.rst
├── fabfile.py
├── first_setup.zsh
├── requirements.txt
├── scanpdf.egg-info
    ├── PKG-INFO
    ├── SOURCES.txt
    ├── dependency_links.txt
    ├── entry_points.txt
    ├── requires.txt
    ├── top_level.txt
    └── zip-safe
├── scanpdf
    ├── __init__.py
    ├── scanpdf.py
    └── version.py
├── setup.py
├── test
    ├── COVERAGE.rst
    └── test_scanpdf.py
└── tox.ini


/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | .*
3 | *~
4 | 


--------------------------------------------------------------------------------
/AUTHORS.rst:
--------------------------------------------------------------------------------
 1 | =======
 2 | Credits
 3 | =======
 4 | 
 5 | Development Lead
 6 | ----------------
 7 | 
 8 | * Virantha N. Ekanayake <virantha@gmail.com>
 9 | 
10 | Contributors
11 | ------------
12 | 
13 | None yet. Why not be the first?


--------------------------------------------------------------------------------
/CHANGES.rst:
--------------------------------------------------------------------------------
1 | =======  ========   ======
2 | Version  Date       Changes
3 | =======  ========   ======
4 | 
5 | v0.3.0   8/25/14    Allow arbitrary page sizes and auto-crops
6 | v0.1.0   1/1/14     First release
7 | =======  ========   ======
8 | 


--------------------------------------------------------------------------------
/CONTRIBUTING.rst:
--------------------------------------------------------------------------------
  1 | ============
  2 | Contributing
  3 | ============
  4 | 
  5 | Contributions are welcome, and they are greatly appreciated! Every
  6 | little bit helps, and credit will always be given. 
  7 | 
  8 | You can contribute in many ways:
  9 | 
 10 | Types of Contributions
 11 | ----------------------
 12 | 
 13 | Report Bugs
 14 | ~~~~~~~~~~~
 15 | 
 16 | Report bugs at https://github.com/virantha/scanpdf/issues.
 17 | 
 18 | If you are reporting a bug, please include:
 19 | 
 20 | * Your operating system name and version.
 21 | * Any details about your local setup that might be helpful in troubleshooting.
 22 | * Detailed steps to reproduce the bug.
 23 | 
 24 | Fix Bugs
 25 | ~~~~~~~~
 26 | 
 27 | Look through the GitHub issues for bugs. Anything tagged with "bug"
 28 | is open to whoever wants to implement it.
 29 | 
 30 | Implement Features
 31 | ~~~~~~~~~~~~~~~~~~
 32 | 
 33 | Look through the GitHub issues for features. Anything tagged with "feature"
 34 | is open to whoever wants to implement it.
 35 | 
 36 | Write Documentation
 37 | ~~~~~~~~~~~~~~~~~~~
 38 | 
 39 | Scan PDF could always use more documentation, whether as part of
 40 | the official Scan PDF docs, in docstrings, or even on the web in
 41 | blog posts, articles, and such.
 42 | 
 43 | Submit Feedback
 44 | ~~~~~~~~~~~~~~~
 45 | 
 46 | The best way to send feedback is to file an issue at https://github.com/virantha/scanpdf/issues.
 47 | 
 48 | If you are proposing a feature:
 49 | 
 50 | * Explain in detail how it would work.
 51 | * Keep the scope as narrow as possible, to make it easier to implement.
 52 | * Remember that this is a volunteer-driven project, and that contributions
 53 |   are welcome :)
 54 | 
 55 | Get Started!
 56 | ------------
 57 | 
 58 | Ready to contribute? Here's how to set up `scanpdf` for local development.
 59 | 
 60 | 1. Fork the `scanpdf` repo on GitHub.
 61 | 2. Clone your fork locally::
 62 | 
 63 |     $ git clone git@github.com:your_name_here/scanpdf.git
 64 | 
 65 | 3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development::
 66 | 
 67 |     $ mkvirtualenv scanpdf
 68 |     $ cd scanpdf/
 69 |     $ python setup.py develop
 70 | 
 71 | 4. Create a branch for local development::
 72 | 
 73 |     $ git checkout -b name-of-your-bugfix-or-feature
 74 |    
 75 |    Now you can make your changes locally.
 76 | 
 77 | 5. When you're done making changes, check that your changes pass tests::
 78 | 
 79 |     $ fab run_tests
 80 | 
 81 |    To get fabric and tox, just pip install them into your virtualenv. 
 82 | 
 83 | 6. Commit your changes and push your branch to GitHub::
 84 | 
 85 |     $ git add .
 86 |     $ git commit -m "Your detailed description of your changes."
 87 |     $ git push origin name-of-your-bugfix-or-feature
 88 | 
 89 | 7. Submit a pull request through the GitHub website.
 90 | 
 91 | Pull Request Guidelines
 92 | -----------------------
 93 | 
 94 | Before you submit a pull request, check that it meets these guidelines:
 95 | 
 96 | 1. The pull request should include tests.
 97 | 2. If the pull request adds functionality, the docs should be updated. Put
 98 |    your new functionality into a function with a docstring, and add the
 99 |    feature to the list in README.rst.
100 | 3. The pull request should work for Python 2.6, 2.7, and 3.3, and for PyPy. Check 
101 |    https://travis-ci.org/virantha/scanpdf/pull_requests
102 |    and make sure that the tests pass for all supported Python versions.
103 | 
104 | Tips
105 | ----
106 | 
107 | Anything?::
108 | 


--------------------------------------------------------------------------------
/HISTORY.rst:
--------------------------------------------------------------------------------
 1 | .. :changelog:
 2 | 
 3 | History
 4 | -------
 5 | 
 6 | 0.1.0 (2013-08-11)
 7 | ++++++++++++++++++
 8 | 
 9 | * First release on PyPI.
10 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
  1 | 
  2 |                                  Apache License
  3 |                            Version 2.0, January 2004
  4 |                         http://www.apache.org/licenses/
  5 | 
  6 |    TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
  7 | 
  8 |    1. Definitions.
  9 | 
 10 |       "License" shall mean the terms and conditions for use, reproduction,
 11 |       and distribution as defined by Sections 1 through 9 of this document.
 12 | 
 13 |       "Licensor" shall mean the copyright owner or entity authorized by
 14 |       the copyright owner that is granting the License.
 15 | 
 16 |       "Legal Entity" shall mean the union of the acting entity and all
 17 |       other entities that control, are controlled by, or are under common
 18 |       control with that entity. For the purposes of this definition,
 19 |       "control" means (i) the power, direct or indirect, to cause the
 20 |       direction or management of such entity, whether by contract or
 21 |       otherwise, or (ii) ownership of fifty percent (50%) or more of the
 22 |       outstanding shares, or (iii) beneficial ownership of such entity.
 23 | 
 24 |       "You" (or "Your") shall mean an individual or Legal Entity
 25 |       exercising permissions granted by this License.
 26 | 
 27 |       "Source" form shall mean the preferred form for making modifications,
 28 |       including but not limited to software source code, documentation
 29 |       source, and configuration files.
 30 | 
 31 |       "Object" form shall mean any form resulting from mechanical
 32 |       transformation or translation of a Source form, including but
 33 |       not limited to compiled object code, generated documentation,
 34 |       and conversions to other media types.
 35 | 
 36 |       "Work" shall mean the work of authorship, whether in Source or
 37 |       Object form, made available under the License, as indicated by a
 38 |       copyright notice that is included in or attached to the work
 39 |       (an example is provided in the Appendix below).
 40 | 
 41 |       "Derivative Works" shall mean any work, whether in Source or Object
 42 |       form, that is based on (or derived from) the Work and for which the
 43 |       editorial revisions, annotations, elaborations, or other modifications
 44 |       represent, as a whole, an original work of authorship. For the purposes
 45 |       of this License, Derivative Works shall not include works that remain
 46 |       separable from, or merely link (or bind by name) to the interfaces of,
 47 |       the Work and Derivative Works thereof.
 48 | 
 49 |       "Contribution" shall mean any work of authorship, including
 50 |       the original version of the Work and any modifications or additions
 51 |       to that Work or Derivative Works thereof, that is intentionally
 52 |       submitted to Licensor for inclusion in the Work by the copyright owner
 53 |       or by an individual or Legal Entity authorized to submit on behalf of
 54 |       the copyright owner. For the purposes of this definition, "submitted"
 55 |       means any form of electronic, verbal, or written communication sent
 56 |       to the Licensor or its representatives, including but not limited to
 57 |       communication on electronic mailing lists, source code control systems,
 58 |       and issue tracking systems that are managed by, or on behalf of, the
 59 |       Licensor for the purpose of discussing and improving the Work, but
 60 |       excluding communication that is conspicuously marked or otherwise
 61 |       designated in writing by the copyright owner as "Not a Contribution."
 62 | 
 63 |       "Contributor" shall mean Licensor and any individual or Legal Entity
 64 |       on behalf of whom a Contribution has been received by Licensor and
 65 |       subsequently incorporated within the Work.
 66 | 
 67 |    2. Grant of Copyright License. Subject to the terms and conditions of
 68 |       this License, each Contributor hereby grants to You a perpetual,
 69 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 70 |       copyright license to reproduce, prepare Derivative Works of,
 71 |       publicly display, publicly perform, sublicense, and distribute the
 72 |       Work and such Derivative Works in Source or Object form.
 73 | 
 74 |    3. Grant of Patent License. Subject to the terms and conditions of
 75 |       this License, each Contributor hereby grants to You a perpetual,
 76 |       worldwide, non-exclusive, no-charge, royalty-free, irrevocable
 77 |       (except as stated in this section) patent license to make, have made,
 78 |       use, offer to sell, sell, import, and otherwise transfer the Work,
 79 |       where such license applies only to those patent claims licensable
 80 |       by such Contributor that are necessarily infringed by their
 81 |       Contribution(s) alone or by combination of their Contribution(s)
 82 |       with the Work to which such Contribution(s) was submitted. If You
 83 |       institute patent litigation against any entity (including a
 84 |       cross-claim or counterclaim in a lawsuit) alleging that the Work
 85 |       or a Contribution incorporated within the Work constitutes direct
 86 |       or contributory patent infringement, then any patent licenses
 87 |       granted to You under this License for that Work shall terminate
 88 |       as of the date such litigation is filed.
 89 | 
 90 |    4. Redistribution. You may reproduce and distribute copies of the
 91 |       Work or Derivative Works thereof in any medium, with or without
 92 |       modifications, and in Source or Object form, provided that You
 93 |       meet the following conditions:
 94 | 
 95 |       (a) You must give any other recipients of the Work or
 96 |           Derivative Works a copy of this License; and
 97 | 
 98 |       (b) You must cause any modified files to carry prominent notices
 99 |           stating that You changed the files; and
100 | 
101 |       (c) You must retain, in the Source form of any Derivative Works
102 |           that You distribute, all copyright, patent, trademark, and
103 |           attribution notices from the Source form of the Work,
104 |           excluding those notices that do not pertain to any part of
105 |           the Derivative Works; and
106 | 
107 |       (d) If the Work includes a "NOTICE" text file as part of its
108 |           distribution, then any Derivative Works that You distribute must
109 |           include a readable copy of the attribution notices contained
110 |           within such NOTICE file, excluding those notices that do not
111 |           pertain to any part of the Derivative Works, in at least one
112 |           of the following places: within a NOTICE text file distributed
113 |           as part of the Derivative Works; within the Source form or
114 |           documentation, if provided along with the Derivative Works; or,
115 |           within a display generated by the Derivative Works, if and
116 |           wherever such third-party notices normally appear. The contents
117 |           of the NOTICE file are for informational purposes only and
118 |           do not modify the License. You may add Your own attribution
119 |           notices within Derivative Works that You distribute, alongside
120 |           or as an addendum to the NOTICE text from the Work, provided
121 |           that such additional attribution notices cannot be construed
122 |           as modifying the License.
123 | 
124 |       You may add Your own copyright statement to Your modifications and
125 |       may provide additional or different license terms and conditions
126 |       for use, reproduction, or distribution of Your modifications, or
127 |       for any such Derivative Works as a whole, provided Your use,
128 |       reproduction, and distribution of the Work otherwise complies with
129 |       the conditions stated in this License.
130 | 
131 |    5. Submission of Contributions. Unless You explicitly state otherwise,
132 |       any Contribution intentionally submitted for inclusion in the Work
133 |       by You to the Licensor shall be under the terms and conditions of
134 |       this License, without any additional terms or conditions.
135 |       Notwithstanding the above, nothing herein shall supersede or modify
136 |       the terms of any separate license agreement you may have executed
137 |       with Licensor regarding such Contributions.
138 | 
139 |    6. Trademarks. This License does not grant permission to use the trade
140 |       names, trademarks, service marks, or product names of the Licensor,
141 |       except as required for reasonable and customary use in describing the
142 |       origin of the Work and reproducing the content of the NOTICE file.
143 | 
144 |    7. Disclaimer of Warranty. Unless required by applicable law or
145 |       agreed to in writing, Licensor provides the Work (and each
146 |       Contributor provides its Contributions) on an "AS IS" BASIS,
147 |       WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148 |       implied, including, without limitation, any warranties or conditions
149 |       of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150 |       PARTICULAR PURPOSE. You are solely responsible for determining the
151 |       appropriateness of using or redistributing the Work and assume any
152 |       risks associated with Your exercise of permissions under this License.
153 | 
154 |    8. Limitation of Liability. In no event and under no legal theory,
155 |       whether in tort (including negligence), contract, or otherwise,
156 |       unless required by applicable law (such as deliberate and grossly
157 |       negligent acts) or agreed to in writing, shall any Contributor be
158 |       liable to You for damages, including any direct, indirect, special,
159 |       incidental, or consequential damages of any character arising as a
160 |       result of this License or out of the use or inability to use the
161 |       Work (including but not limited to damages for loss of goodwill,
162 |       work stoppage, computer failure or malfunction, or any and all
163 |       other commercial damages or losses), even if such Contributor
164 |       has been advised of the possibility of such damages.
165 | 
166 |    9. Accepting Warranty or Additional Liability. While redistributing
167 |       the Work or Derivative Works thereof, You may choose to offer,
168 |       and charge a fee for, acceptance of support, warranty, indemnity,
169 |       or other liability obligations and/or rights consistent with this
170 |       License. However, in accepting such obligations, You may act only
171 |       on Your own behalf and on Your sole responsibility, not on behalf
172 |       of any other Contributor, and only if You agree to indemnify,
173 |       defend, and hold each Contributor harmless for any liability
174 |       incurred by, or claims asserted against, such Contributor by reason
175 |       of your accepting any such warranty or additional liability.
176 | 
177 |    END OF TERMS AND CONDITIONS
178 | 
179 |    APPENDIX: How to apply the Apache License to your work.
180 | 
181 |       To apply the Apache License to your work, attach the following
182 |       boilerplate notice, with the fields enclosed by brackets "[]"
183 |       replaced with your own identifying information. (Don't include
184 |       the brackets!)  The text should be enclosed in the appropriate
185 |       comment syntax for the file format. We also recommend that a
186 |       file or class name and description of purpose be included on the
187 |       same "printed page" as the copyright notice for easier
188 |       identification within third-party archives.
189 | 
190 |    Copyright 2014 Virantha N. Ekanayake
191 | 
192 |    Licensed under the Apache License, Version 2.0 (the "License");
193 |    you may not use this file except in compliance with the License.
194 |    You may obtain a copy of the License at
195 | 
196 |        http://www.apache.org/licenses/LICENSE-2.0
197 | 
198 |    Unless required by applicable law or agreed to in writing, software
199 |    distributed under the License is distributed on an "AS IS" BASIS,
200 |    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 |    See the License for the specific language governing permissions and
202 |    limitations under the License.


--------------------------------------------------------------------------------
/MANIFEST.in:
--------------------------------------------------------------------------------
1 | include *.txt
2 | include *.rst
3 | 
4 | 


--------------------------------------------------------------------------------
/README.rst:
--------------------------------------------------------------------------------
 1 | Scan PDF - Easy scans in Linux with a document scanner like the Fujitsu ScanSnap
 2 | ################################################################################
 3 | 
 4 | .. image:: http://badge.fury.io/py/scanpdf.png
 5 |     :target: http://badge.fury.io/py/scanpdf
 6 | 
 7 | .. image:: http://pypip.in/d/scanpdf/badge.png
 8 |     :target: https://crate.io/packages/scanpdf?version=latest
 9 | 
10 | 
11 | If you're looking for a simple way to use a multi-page scanner and get your
12 | document into a PDF in Linux without any proprietary or commercial software,
13 | then ScanPDF might be the solution.  I wrote it to quickly take the Linux SANE
14 | scanner system output image files, and process them into usable PDFs.  By
15 | usable, I mean PDFs that maintain their original scanned resolution, omit blank
16 | pages (if you're scanning in duplex mode, for example), and preserve color unless
17 | the original is greyscale/black and white, in which case they are intelligently
18 | down-converted to B/W PDFs to save space.
19 | 
20 | * Free and open-source software: ASL2 license
21 | * Documentation: http://virantha.github.io/scanpdf/html
22 | * Source: https://github.com/virantha/scanpdf
23 | 
24 | Features
25 | --------
26 | * Uses SANE/scanadf to automatically scan to multi-page compressed PDFs
27 | * `Integrates with ScanBd <http://virantha.github.io/scanpdf/html>`_ to respond to hardware button presses
28 | * Automatically removes blank pages.
29 | * Scans in color, and automatically down-converts into 1-bit B/W image for text/greyscale images
30 | * Auto-crops to the proper page size.
31 | 
32 | Usage:
33 | ------
34 | The simplest way to use this is:
35 | 
36 | ::
37 | 
38 |     scanpdf scan pdf <pdffile>
39 | 
40 | This will first perform the scan, and then the conversion to PDF.  If you want
41 | to split up the scan and the PDF conversion into two separate invocations (for
42 | reasons clarified below), then you can do:
43 | 
44 | ::
45 | 
46 |     scanpdf --tmpdir=tmp scan
47 |     scanpdf --tmpdir=tmp pdf <pdffile>
48 |   
49 | One reason for the separation might be if you want to keep scanning documents
50 | (very quick) while the post-processing (slower) for the PDF conversion is
51 | taking place in the background.   For instance, if you're using the hardware
52 | button on the scanner to initiate scans (as detailed in this_ document), then
53 | you want to return immediately after the scan instead of waiting until the full
54 | conversion to PDF has taken place.
55 | 
56 | .. _this: http://virantha.com/2014/03/17/one-touch-scanning-with-fujitsu-scansnap-in-linux/
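
If you are driving the two-step flow from a script, something like the following rough sketch might help. It only builds the two command lines shown above and hands them to ``subprocess``; the helper names (``scan_command``, ``pdf_command``, ``scan_then_convert_in_background``) are made up for this example and are not part of scanpdf itself.

```python
import subprocess


def scan_command(tmpdir):
    """Command line for the scan-only step (same form as shown above)."""
    return ["scanpdf", "--tmpdir=" + tmpdir, "scan"]


def pdf_command(tmpdir, pdf_file):
    """Command line for the PDF conversion step."""
    return ["scanpdf", "--tmpdir=" + tmpdir, "pdf", pdf_file]


def scan_then_convert_in_background(tmpdir, pdf_file,
                                    run=subprocess.check_call,
                                    spawn=subprocess.Popen):
    """Hypothetical helper: scan in the foreground, convert in the background.

    `run` blocks until the (quick) scan has finished; `spawn` starts the
    (slower) conversion and returns right away with a process handle.
    """
    run(scan_command(tmpdir))
    return spawn(pdf_command(tmpdir, pdf_file))
```

Calling ``scan_then_convert_in_background("tmp", "out.pdf")`` would return as soon as the scan is done, leaving the conversion running; keep the returned handle if you want to ``wait()`` on it later.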
57 | 
58 | You can optionally use the following switches to set the scan resolution, say whether pages are face up or face
59 | down in the auto document feeder, skip the blank page processing, adjust the blank page detection threshold, or
60 | run additional post-processing using unpaper_:
61 | 
62 | .. _unpaper: http://unpaper.berlios.de
63 | 
64 | ::
65 | 
66 |         --dpi=<dpi>                 DPI to scan in [default: 300]
67 |         --face-up=<true/false>      Face-up scanning [default: True]
68 |         --keep-blanks               Don't check for and remove blank pages
69 |         --blank-threshold=<ths>     Percentage of white to be marked as blank [default: 0.97] 
70 |         --post-process              Run unpaper to deskew/clean up
71 | 
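
To give a feel for what ``--blank-threshold`` means, here is a minimal sketch of a white-fraction test on greyscale pixel values. This is only an illustration, not the code scanpdf actually runs: the ``white_level`` cut-off of 200 and both function names are assumptions made up for the example.

```python
def white_fraction(pixels, white_level=200):
    """Fraction of greyscale pixels (0-255) at or above `white_level`.

    `white_level` is an assumed cut-off for this sketch only.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("page has no pixels")
    white = sum(1 for p in pixels if p >= white_level)
    return white / float(len(pixels))


def is_blank(pixels, blank_threshold=0.97):
    """Treat a page as blank when enough of it is white.

    0.97 mirrors the documented default of --blank-threshold.
    """
    return white_fraction(pixels) >= blank_threshold
```

With the default of 0.97, a page that is 98% white would be dropped in this sketch, while a page that is only 90% white would be kept. Lowering the threshold removes more pages; raising it keeps more.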
72 | 
73 | Right now, I'm assuming this is getting called via ScanBD, so I don't have the option to manually specify the 
74 | scanner.  If you really want to use this standalone, for now, please just set the ``SCANBD_DEVICE`` environment 
75 | variable to your scanner device name before running this script.
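
If you would rather launch a standalone run from Python than export the variable in your shell, a sketch along these lines should work. The helper names are hypothetical, and the device string you pass in is whatever name SANE reports for your scanner; nothing here is part of scanpdf's own API.

```python
import os
import subprocess


def standalone_env(device, base_env=None):
    """Copy of the environment with SCANBD_DEVICE set to `device`."""
    env = dict(os.environ if base_env is None else base_env)
    env["SCANBD_DEVICE"] = device
    return env


def run_standalone(device, pdf_file):
    """Hypothetical wrapper: run `scanpdf scan pdf <pdffile>` for one device."""
    return subprocess.call(["scanpdf", "scan", "pdf", pdf_file],
                           env=standalone_env(device))
```

Working on a copy of the environment keeps the setting local to the one scanpdf invocation instead of changing it for the rest of your script.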
76 | 
77 | 
78 | Installation
79 | ------------
80 | ::
81 | 
82 |     $ pip install scanpdf
83 | 
84 | Requires ImageMagick, Ghostscript, and SANE to be installed, for the command line tools:
85 | 
86 | * ``convert``
87 | * ``identify``
88 | * ``ps2pdf``
89 | * ``scanadf``
90 | 
91 | Also requires epstopdf.
92 | 
93 | Disclaimer
94 | ----------
95 | The software is distributed on an "AS IS" BASIS, WITHOUT
96 | WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
97 | 


--------------------------------------------------------------------------------
/TODO.rst:
--------------------------------------------------------------------------------
1 | Todo list
2 | =========
3 | 
4 | - Make it more generic in terms of stand-alone usage
5 | - Add docstrings
6 | 
7 | 


--------------------------------------------------------------------------------
/dist/README.txt:
--------------------------------------------------------------------------------
1 | Any binary builds for various platforms go here
2 | 


--------------------------------------------------------------------------------
/docs/Makefile:
--------------------------------------------------------------------------------
  1 | # Makefile for Sphinx documentation
  2 | #
  3 | 
  4 | # You can set these variables from the command line.
  5 | SPHINXOPTS    =
  6 | SPHINXBUILD   = sphinx-build
  7 | PAPER         =
  8 | BUILDDIR      = /Users/virantha/dev/githubdocs/scanpdf
  9 | 
 10 | # User-friendly check for sphinx-build
 11 | ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1)
 12 | $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don't have Sphinx installed, grab it from http://sphinx-doc.org/)
 13 | endif
 14 | 
 15 | # Internal variables.
 16 | PAPEROPT_a4     = -D latex_paper_size=a4
 17 | PAPEROPT_letter = -D latex_paper_size=letter
 18 | ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
 19 | # the i18n builder cannot share the environment and doctrees with the others
 20 | I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
 21 | 
 22 | .PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest gettext
 23 | 
 24 | help:
 25 | 	@echo "Please use \`make <target>' where <target> is one of"
 26 | 	@echo "  html       to make standalone HTML files"
 27 | 	@echo "  dirhtml    to make HTML files named index.html in directories"
 28 | 	@echo "  singlehtml to make a single large HTML file"
 29 | 	@echo "  pickle     to make pickle files"
 30 | 	@echo "  json       to make JSON files"
 31 | 	@echo "  htmlhelp   to make HTML files and a HTML help project"
 32 | 	@echo "  qthelp     to make HTML files and a qthelp project"
 33 | 	@echo "  devhelp    to make HTML files and a Devhelp project"
 34 | 	@echo "  epub       to make an epub"
 35 | 	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
 36 | 	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
 37 | 	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
 38 | 	@echo "  text       to make text files"
 39 | 	@echo "  man        to make manual pages"
 40 | 	@echo "  texinfo    to make Texinfo files"
 41 | 	@echo "  info       to make Texinfo files and run them through makeinfo"
 42 | 	@echo "  gettext    to make PO message catalogs"
 43 | 	@echo "  changes    to make an overview of all changed/added/deprecated items"
 44 | 	@echo "  xml        to make Docutils-native XML files"
 45 | 	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
 46 | 	@echo "  linkcheck  to check all external links for integrity"
 47 | 	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
 48 | 
 49 | clean:
 50 | 	rm -rf $(BUILDDIR)/*
 51 | 
 52 | html:
 53 | 	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
 54 | 	@echo
 55 | 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
 56 | 
 57 | dirhtml:
 58 | 	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
 59 | 	@echo
 60 | 	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
 61 | 
 62 | singlehtml:
 63 | 	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
 64 | 	@echo
 65 | 	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
 66 | 
 67 | pickle:
 68 | 	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
 69 | 	@echo
 70 | 	@echo "Build finished; now you can process the pickle files."
 71 | 
 72 | json:
 73 | 	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
 74 | 	@echo
 75 | 	@echo "Build finished; now you can process the JSON files."
 76 | 
 77 | htmlhelp:
 78 | 	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
 79 | 	@echo
 80 | 	@echo "Build finished; now you can run HTML Help Workshop with the" \
 81 | 	      ".hhp project file in $(BUILDDIR)/htmlhelp."
 82 | 
 83 | qthelp:
 84 | 	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
 85 | 	@echo
 86 | 	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
 87 | 	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
 88 | 	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/pypdfocr.qhcp"
 89 | 	@echo "To view the help file:"
 90 | 	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/pypdfocr.qhc"
 91 | 
 92 | devhelp:
 93 | 	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
 94 | 	@echo
 95 | 	@echo "Build finished."
 96 | 	@echo "To view the help file:"
 97 | 	@echo "# mkdir -p $$HOME/.local/share/devhelp/pypdfocr"
 98 | 	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/pypdfocr"
 99 | 	@echo "# devhelp"
100 | 
101 | epub:
102 | 	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
103 | 	@echo
104 | 	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
105 | 
106 | latex:
107 | 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
108 | 	@echo
109 | 	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
110 | 	@echo "Run \`make' in that directory to run these through (pdf)latex" \
111 | 	      "(use \`make latexpdf' here to do that automatically)."
112 | 
113 | latexpdf:
114 | 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
115 | 	@echo "Running LaTeX files through pdflatex..."
116 | 	$(MAKE) -C $(BUILDDIR)/latex all-pdf
117 | 	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
118 | 
119 | latexpdfja:
120 | 	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
121 | 	@echo "Running LaTeX files through platex and dvipdfmx..."
122 | 	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
123 | 	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
124 | 
125 | text:
126 | 	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
127 | 	@echo
128 | 	@echo "Build finished. The text files are in $(BUILDDIR)/text."
129 | 
130 | man:
131 | 	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
132 | 	@echo
133 | 	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
134 | 
135 | texinfo:
136 | 	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
137 | 	@echo
138 | 	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
139 | 	@echo "Run \`make' in that directory to run these through makeinfo" \
140 | 	      "(use \`make info' here to do that automatically)."
141 | 
142 | info:
143 | 	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
144 | 	@echo "Running Texinfo files through makeinfo..."
145 | 	make -C $(BUILDDIR)/texinfo info
146 | 	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
147 | 
148 | gettext:
149 | 	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
150 | 	@echo
151 | 	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
152 | 
153 | changes:
154 | 	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
155 | 	@echo
156 | 	@echo "The overview file is in $(BUILDDIR)/changes."
157 | 
158 | linkcheck:
159 | 	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
160 | 	@echo
161 | 	@echo "Link check complete; look for any errors in the above output " \
162 | 	      "or in $(BUILDDIR)/linkcheck/output.txt."
163 | 
164 | doctest:
165 | 	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
166 | 	@echo "Testing of doctests in the sources finished, look at the " \
167 | 	      "results in $(BUILDDIR)/doctest/output.txt."
168 | 
169 | xml:
170 | 	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
171 | 	@echo
172 | 	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
173 | 
174 | pseudoxml:
175 | 	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
176 | 	@echo
177 | 	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."


--------------------------------------------------------------------------------
/docs/authors.rst:
--------------------------------------------------------------------------------
1 | .. include:: ../AUTHORS.rst


--------------------------------------------------------------------------------
/docs/conf.py:
--------------------------------------------------------------------------------
  1 | # -*- coding: utf-8 -*-
  2 | #
  3 | # scanpdf documentation build configuration file, created by
  4 | # sphinx-quickstart on Wed Oct 23 13:43:29 2013.
  5 | #
  6 | # This file is execfile()d with the current directory set to its
  7 | # containing dir.
  8 | #
  9 | # Note that not all possible configuration values are present in this
 10 | # autogenerated file.
 11 | #
 12 | # All configuration values have a default; values that are commented out
 13 | # serve to show the default.
 14 | 
 15 | import sys
 16 | import os
 17 | import pkg_resources
 18 | 
 19 | # If extensions (or modules to document with autodoc) are in another directory,
 20 | # add these directories to sys.path here. If the directory is relative to the
 21 | # documentation root, use os.path.abspath to make it absolute, like shown here.
 22 | #sys.path.insert(0, os.path.abspath('.'))
 23 | 
 24 | # -- General configuration ------------------------------------------------
 25 | 
 26 | # If your documentation needs a minimal Sphinx version, state it here.
 27 | #needs_sphinx = '1.0'
 28 | 
 29 | # Add any Sphinx extension module names here, as strings. They can be
 30 | # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 31 | # ones.
 32 | extensions = [
 33 |     'sphinx.ext.autodoc',
 34 |     'sphinx.ext.viewcode',
 35 | ]
 36 | 
 37 | # Add any paths that contain templates here, relative to this directory.
 38 | templates_path = ['_templates']
 39 | 
 40 | # The suffix of source filenames.
 41 | source_suffix = '.rst'
 42 | 
 43 | # The encoding of source files.
 44 | #source_encoding = 'utf-8-sig'
 45 | 
 46 | # The master toctree document.
 47 | master_doc = 'index'
 48 | 
 49 | # General information about the project.
 50 | project = u'Scan PDF'
 51 | copyright = u'2014, Virantha N. Ekanayake'
 52 | 
 53 | # The version info for the project you're documenting, acts as replacement for
 54 | # |version| and |release|, also used in various other places throughout the
 55 | # built documents.
 56 | #
 57 | # The short X.Y version.
 58 | version = ''
 59 | try:
 60 |     release = pkg_resources.get_distribution('scanpdf').version
 61 | except pkg_resources.DistributionNotFound:
 62 |     print('To build the documentation, the distribution information of scanpdf')
 63 |     print('has to be available.  Either install the package into your')
 64 |     print('development environment or run "setup.py develop" to set up the')
 65 |     print('metadata.  A virtualenv is recommended!')
 66 |     sys.exit(1)
 67 | del pkg_resources
 68 | 
 69 | version = '.'.join(release.split('.')[:2])
 70 | # The full version, including alpha/beta/rc tags.
 71 | 
 72 | # The language for content autogenerated by Sphinx. Refer to documentation
 73 | # for a list of supported languages.
 74 | #language = None
 75 | 
 76 | # There are two options for replacing |today|: either, you set today to some
 77 | # non-false value, then it is used:
 78 | #today = ''
 79 | # Else, today_fmt is used as the format for a strftime call.
 80 | #today_fmt = '%B %d, %Y'
 81 | 
 82 | # List of patterns, relative to source directory, that match files and
 83 | # directories to ignore when looking for source files.
 84 | exclude_patterns = ['_build']
 85 | 
 86 | # The reST default role (used for this markup: `text`) to use for all
 87 | # documents.
 88 | #default_role = None
 89 | 
 90 | # If true, '()' will be appended to :func: etc. cross-reference text.
 91 | #add_function_parentheses = True
 92 | 
 93 | # If true, the current module name will be prepended to all description
 94 | # unit titles (such as .. function::).
 95 | #add_module_names = True
 96 | 
 97 | # If true, sectionauthor and moduleauthor directives will be shown in the
 98 | # output. They are ignored by default.
 99 | #show_authors = False
100 | 
101 | # The name of the Pygments (syntax highlighting) style to use.
102 | pygments_style = 'sphinx'
103 | 
104 | # A list of ignored prefixes for module index sorting.
105 | #modindex_common_prefix = []
106 | 
107 | # If true, keep warnings as "system message" paragraphs in the built documents.
108 | #keep_warnings = False
109 | 
110 | 
111 | # -- Options for HTML output ----------------------------------------------
112 | 
113 | # The theme to use for HTML and HTML Help pages.  See the documentation for
114 | # a list of builtin themes.
115 | html_theme = 'sphinxdoc'
116 | 
117 | # Theme options are theme-specific and customize the look and feel of a theme
118 | # further.  For a list of options available for each theme, see the
119 | # documentation.
120 | #html_theme_options = {}
121 | 
122 | # Add any paths that contain custom themes here, relative to this directory.
123 | #html_theme_path = []
124 | 
125 | # The name for this set of Sphinx documents.  If None, it defaults to
126 | # "<project> v<release> documentation".
127 | #html_title = None
128 | 
129 | # A shorter title for the navigation bar.  Default is the same as html_title.
130 | #html_short_title = None
131 | 
132 | # The name of an image file (relative to this directory) to place at the top
133 | # of the sidebar.
134 | #html_logo = None
135 | 
136 | # The name of an image file (within the static path) to use as favicon of the
137 | # docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
138 | # pixels large.
139 | #html_favicon = None
140 | 
141 | # Add any paths that contain custom static files (such as style sheets) here,
142 | # relative to this directory. They are copied after the builtin static files,
143 | # so a file named "default.css" will overwrite the builtin "default.css".
144 | html_static_path = ['_static']
145 | 
146 | # Add any extra paths that contain custom files (such as robots.txt or
147 | # .htaccess) here, relative to this directory. These files are copied
148 | # directly to the root of the documentation.
149 | #html_extra_path = []
150 | 
151 | # If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
152 | # using the given strftime format.
153 | #html_last_updated_fmt = '%b %d, %Y'
154 | 
155 | # If true, SmartyPants will be used to convert quotes and dashes to
156 | # typographically correct entities.
157 | #html_use_smartypants = True
158 | 
159 | # Custom sidebar templates, maps document names to template names.
160 | #html_sidebars = {}
161 | 
162 | # Additional templates that should be rendered to pages, maps page names to
163 | # template names.
164 | #html_additional_pages = {}
165 | 
166 | # If false, no module index is generated.
167 | #html_domain_indices = True
168 | 
169 | # If false, no index is generated.
170 | #html_use_index = True
171 | 
172 | # If true, the index is split into individual pages for each letter.
173 | #html_split_index = False
174 | 
175 | # If true, links to the reST sources are added to the pages.
176 | #html_show_sourcelink = True
177 | 
178 | # If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
179 | #html_show_sphinx = True
180 | 
181 | # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
182 | #html_show_copyright = True
183 | 
184 | # If true, an OpenSearch description file will be output, and all pages will
185 | # contain a <link> tag referring to it.  The value of this option must be the
186 | # base URL from which the finished HTML is served.
187 | #html_use_opensearch = ''
188 | 
189 | # This is the file name suffix for HTML files (e.g. ".xhtml").
190 | #html_file_suffix = None
191 | 
192 | # Output file base name for HTML help builder.
193 | htmlhelp_basename = 'scanpdfdoc'
194 | 
195 | 
196 | # -- Options for LaTeX output ---------------------------------------------
197 | 
198 | latex_elements = {
199 | # The paper size ('letterpaper' or 'a4paper').
200 | #'papersize': 'letterpaper',
201 | 
202 | # The font size ('10pt', '11pt' or '12pt').
203 | #'pointsize': '10pt',
204 | 
205 | # Additional stuff for the LaTeX preamble.
206 | #'preamble': '',
207 | }
208 | 
209 | # Grouping the document tree into LaTeX files. List of tuples
210 | # (source start file, target name, title,
211 | #  author, documentclass [howto, manual, or own class]).
212 | latex_documents = [
213 |   ('index', 'scanpdf.tex', u'Scan PDF Documentation',
214 |    u'Virantha N. Ekanayake', 'manual'),
215 | ]
216 | 
217 | # The name of an image file (relative to this directory) to place at the top of
218 | # the title page.
219 | #latex_logo = None
220 | 
221 | # For "manual" documents, if this is true, then toplevel headings are parts,
222 | # not chapters.
223 | #latex_use_parts = False
224 | 
225 | # If true, show page references after internal links.
226 | #latex_show_pagerefs = False
227 | 
228 | # If true, show URL addresses after external links.
229 | #latex_show_urls = False
230 | 
231 | # Documents to append as an appendix to all manuals.
232 | #latex_appendices = []
233 | 
234 | # If false, no module index is generated.
235 | #latex_domain_indices = True
236 | 
237 | 
238 | # -- Options for manual page output ---------------------------------------
239 | 
240 | # One entry per manual page. List of tuples
241 | # (source start file, name, description, authors, manual section).
242 | man_pages = [
243 |     ('index', 'scanpdf', u'Scan PDF Documentation',
244 |      [u'Virantha N. Ekanayake'], 1)
245 | ]
246 | 
247 | # If true, show URL addresses after external links.
248 | #man_show_urls = False
249 | 
250 | 
251 | # -- Options for Texinfo output -------------------------------------------
252 | 
253 | # Grouping the document tree into Texinfo files. List of tuples
254 | # (source start file, target name, title, author,
255 | #  dir menu entry, description, category)
256 | texinfo_documents = [
257 |   ('index', 'scanpdf', u'Scan PDF Documentation',
258 |    u'Virantha N. Ekanayake', 'scanpdf', 'Utility to use SANE/scanadf to scan to PDF.',
259 |    'Miscellaneous'),
260 | ]
261 | 
262 | # Documents to append as an appendix to all manuals.
263 | #texinfo_appendices = []
264 | 
265 | # If false, no module index is generated.
266 | #texinfo_domain_indices = True
267 | 
268 | # How to display URL addresses: 'footnote', 'no', or 'inline'.
269 | #texinfo_show_urls = 'footnote'
270 | 
271 | # If true, do not generate a @detailmenu in the "Top" node's menu.
272 | #texinfo_no_detailmenu = False
273 | 
274 | 
275 | # -- Options for Epub output ----------------------------------------------
276 | 
277 | # Bibliographic Dublin Core info.
278 | epub_title = u'scanpdf'
279 | epub_author = u'Virantha N. Ekanayake'
280 | epub_publisher = u'Virantha N. Ekanayake'
281 | epub_copyright = u'2014, Virantha N. Ekanayake'
282 | 
283 | # The basename for the epub file. It defaults to the project name.
284 | #epub_basename = u'scanpdf'
285 | 
286 | # The HTML theme for the epub output. Since the default themes are not optimized
287 | # for small screen space, using the same theme for HTML and epub output is
288 | # usually not wise. This defaults to 'epub', a theme designed to save visual
289 | # space.
290 | #epub_theme = 'epub'
291 | 
292 | # The language of the text. It defaults to the language option
293 | # or en if the language is not set.
294 | #epub_language = ''
295 | 
296 | # The scheme of the identifier. Typical schemes are ISBN or URL.
297 | #epub_scheme = ''
298 | 
299 | # The unique identifier of the text. This can be an ISBN number
300 | # or the project homepage.
301 | #epub_identifier = ''
302 | 
303 | # A unique identification for the text.
304 | #epub_uid = ''
305 | 
306 | # A tuple containing the cover image and cover page html template filenames.
307 | #epub_cover = ()
308 | 
309 | # A sequence of (type, uri, title) tuples for the guide element of content.opf.
310 | #epub_guide = ()
311 | 
312 | # HTML files that should be inserted before the pages created by sphinx.
313 | # The format is a list of tuples containing the path and title.
314 | #epub_pre_files = []
315 | 
316 | # HTML files that should be inserted after the pages created by sphinx.
317 | # The format is a list of tuples containing the path and title.
318 | #epub_post_files = []
319 | 
320 | # A list of files that should not be packed into the epub file.
321 | #epub_exclude_files = []
322 | 
323 | # The depth of the table of contents in toc.ncx.
324 | #epub_tocdepth = 3
325 | 
326 | # Allow duplicate toc entries.
327 | #epub_tocdup = True
328 | 
329 | # Choose between 'default' and 'includehidden'.
330 | #epub_tocscope = 'default'
331 | 
332 | # Fix unsupported image types using the PIL.
333 | #epub_fix_images = False
334 | 
335 | # Scale large images.
336 | #epub_max_image_width = 0
337 | 
338 | # How to display URL addresses: 'footnote', 'no', or 'inline'.
339 | #epub_show_urls = 'inline'
340 | 
341 | # If false, no index is generated.
342 | #epub_use_index = True
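
The `conf.py` above derives the short `X.Y` version by keeping the first two dot-separated fields of the installed release string. A minimal sketch of that step in isolation follows; `short_version` is a hypothetical helper name used only for illustration and is not defined anywhere in the repository.

```python
def short_version(release):
    """Return the 'X.Y' prefix of a release string such as '0.3.0'."""
    # Same expression conf.py uses: keep the first two dot-separated fields.
    return '.'.join(release.split('.')[:2])


if __name__ == "__main__":
    print(short_version('0.3.0'))
```

A release string with fewer than two fields is returned unchanged, which is why `conf.py` can apply the expression without checking the length first.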


--------------------------------------------------------------------------------
/docs/contributing.rst:
--------------------------------------------------------------------------------
1 | .. include:: ../CONTRIBUTING.rst


--------------------------------------------------------------------------------
/docs/history.rst:
--------------------------------------------------------------------------------
1 | .. include:: ../HISTORY.rst
2 | 
3 | 


--------------------------------------------------------------------------------
/docs/index.rst:
--------------------------------------------------------------------------------
 1 | .. documentation master file, created by
 2 |    sphinx-quickstart on Wed Oct 23 13:43:29 2013.
 3 |    You can adapt this file completely to your liking, but it should at least
 4 |    contain the root `toctree` directive.
 5 | 
 6 | Scan PDF API Reference (version |release|)
 7 | ==========================================
 8 | 
 9 | Contents:
10 | 
11 | .. toctree::
12 |    :maxdepth: 2
13 | 
14 |    readme
15 |    contributing
16 |    authors
17 |    history
18 |    todo
19 | 
20 | 
21 | Testing
22 | ================
23 |     `Coverage <http://virantha.github.io/scanpdf/html/testing/index.html>`_
24 | 
25 | 
26 | Indices and tables
27 | ==================
28 | 
29 | * :ref:`genindex`
30 | * :ref:`modindex`
31 | * :ref:`search`
32 | 


--------------------------------------------------------------------------------
/docs/installation.rst:
--------------------------------------------------------------------------------
1 | ============
2 | Installation
3 | ============
4 | 
5 | At the command line::
6 | 
7 |     $ pip install scanpdf
8 | 


--------------------------------------------------------------------------------
/docs/make.bat:
--------------------------------------------------------------------------------
  1 | @ECHO OFF
  2 | 
  3 | REM Command file for Sphinx documentation
  4 | 
  5 | if "%SPHINXBUILD%" == "" (
  6 | 	set SPHINXBUILD=sphinx-build
  7 | )
  8 | set BUILDDIR=_build
  9 | set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .
 10 | set I18NSPHINXOPTS=%SPHINXOPTS% .
 11 | if NOT "%PAPER%" == "" (
 12 | 	set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%
 13 | 	set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%
 14 | )
 15 | 
 16 | if "%1" == "" goto help
 17 | 
 18 | if "%1" == "help" (
 19 | 	:help
 20 | 	echo.Please use `make ^<target^>` where ^<target^> is one of
 21 | 	echo.  html       to make standalone HTML files
 22 | 	echo.  dirhtml    to make HTML files named index.html in directories
 23 | 	echo.  singlehtml to make a single large HTML file
 24 | 	echo.  pickle     to make pickle files
 25 | 	echo.  json       to make JSON files
 26 | 	echo.  htmlhelp   to make HTML files and a HTML help project
 27 | 	echo.  qthelp     to make HTML files and a qthelp project
 28 | 	echo.  devhelp    to make HTML files and a Devhelp project
 29 | 	echo.  epub       to make an epub
 30 | 	echo.  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter
 31 | 	echo.  text       to make text files
 32 | 	echo.  man        to make manual pages
 33 | 	echo.  texinfo    to make Texinfo files
 34 | 	echo.  gettext    to make PO message catalogs
 35 | 	echo.  changes    to make an overview over all changed/added/deprecated items
 36 | 	echo.  xml        to make Docutils-native XML files
 37 | 	echo.  pseudoxml  to make pseudoxml-XML files for display purposes
 38 | 	echo.  linkcheck  to check all external links for integrity
 39 | 	echo.  doctest    to run all doctests embedded in the documentation if enabled
 40 | 	goto end
 41 | )
 42 | 
 43 | if "%1" == "clean" (
 44 | 	for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i
 45 | 	del /q /s %BUILDDIR%\*
 46 | 	goto end
 47 | )
 48 | 
 49 | 
 50 | %SPHINXBUILD% 2> nul
 51 | if errorlevel 9009 (
 52 | 	echo.
 53 | 	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
 54 | 	echo.installed, then set the SPHINXBUILD environment variable to point
 55 | 	echo.to the full path of the 'sphinx-build' executable. Alternatively you
 56 | 	echo.may add the Sphinx directory to PATH.
 57 | 	echo.
 58 | 	echo.If you don't have Sphinx installed, grab it from
 59 | 	echo.http://sphinx-doc.org/
 60 | 	exit /b 1
 61 | )
 62 | 
 63 | if "%1" == "html" (
 64 | 	%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html
 65 | 	if errorlevel 1 exit /b 1
 66 | 	echo.
 67 | 	echo.Build finished. The HTML pages are in %BUILDDIR%/html.
 68 | 	goto end
 69 | )
 70 | 
 71 | if "%1" == "dirhtml" (
 72 | 	%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml
 73 | 	if errorlevel 1 exit /b 1
 74 | 	echo.
 75 | 	echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.
 76 | 	goto end
 77 | )
 78 | 
 79 | if "%1" == "singlehtml" (
 80 | 	%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml
 81 | 	if errorlevel 1 exit /b 1
 82 | 	echo.
 83 | 	echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.
 84 | 	goto end
 85 | )
 86 | 
 87 | if "%1" == "pickle" (
 88 | 	%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle
 89 | 	if errorlevel 1 exit /b 1
 90 | 	echo.
 91 | 	echo.Build finished; now you can process the pickle files.
 92 | 	goto end
 93 | )
 94 | 
 95 | if "%1" == "json" (
 96 | 	%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json
 97 | 	if errorlevel 1 exit /b 1
 98 | 	echo.
 99 | 	echo.Build finished; now you can process the JSON files.
100 | 	goto end
101 | )
102 | 
103 | if "%1" == "htmlhelp" (
104 | 	%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp
105 | 	if errorlevel 1 exit /b 1
106 | 	echo.
107 | 	echo.Build finished; now you can run HTML Help Workshop with the ^
108 | .hhp project file in %BUILDDIR%/htmlhelp.
109 | 	goto end
110 | )
111 | 
112 | if "%1" == "qthelp" (
113 | 	%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp
114 | 	if errorlevel 1 exit /b 1
115 | 	echo.
116 | 	echo.Build finished; now you can run "qcollectiongenerator" with the ^
117 | .qhcp project file in %BUILDDIR%/qthelp, like this:
118 | 	echo.^> qcollectiongenerator %BUILDDIR%\qthelp\scanpdf.qhcp
119 | 	echo.To view the help file:
120 | 	echo.^> assistant -collectionFile %BUILDDIR%\qthelp\scanpdf.qhc
121 | 	goto end
122 | )
123 | 
124 | if "%1" == "devhelp" (
125 | 	%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp
126 | 	if errorlevel 1 exit /b 1
127 | 	echo.
128 | 	echo.Build finished.
129 | 	goto end
130 | )
131 | 
132 | if "%1" == "epub" (
133 | 	%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub
134 | 	if errorlevel 1 exit /b 1
135 | 	echo.
136 | 	echo.Build finished. The epub file is in %BUILDDIR%/epub.
137 | 	goto end
138 | )
139 | 
140 | if "%1" == "latex" (
141 | 	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
142 | 	if errorlevel 1 exit /b 1
143 | 	echo.
144 | 	echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.
145 | 	goto end
146 | )
147 | 
148 | if "%1" == "latexpdf" (
149 | 	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
150 | 	cd %BUILDDIR%/latex
151 | 	make all-pdf
152 | 	cd %BUILDDIR%/..
153 | 	echo.
154 | 	echo.Build finished; the PDF files are in %BUILDDIR%/latex.
155 | 	goto end
156 | )
157 | 
158 | if "%1" == "latexpdfja" (
159 | 	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex
160 | 	cd %BUILDDIR%/latex
161 | 	make all-pdf-ja
162 | 	cd %BUILDDIR%/..
163 | 	echo.
164 | 	echo.Build finished; the PDF files are in %BUILDDIR%/latex.
165 | 	goto end
166 | )
167 | 
168 | if "%1" == "text" (
169 | 	%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text
170 | 	if errorlevel 1 exit /b 1
171 | 	echo.
172 | 	echo.Build finished. The text files are in %BUILDDIR%/text.
173 | 	goto end
174 | )
175 | 
176 | if "%1" == "man" (
177 | 	%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man
178 | 	if errorlevel 1 exit /b 1
179 | 	echo.
180 | 	echo.Build finished. The manual pages are in %BUILDDIR%/man.
181 | 	goto end
182 | )
183 | 
184 | if "%1" == "texinfo" (
185 | 	%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo
186 | 	if errorlevel 1 exit /b 1
187 | 	echo.
188 | 	echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.
189 | 	goto end
190 | )
191 | 
192 | if "%1" == "gettext" (
193 | 	%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale
194 | 	if errorlevel 1 exit /b 1
195 | 	echo.
196 | 	echo.Build finished. The message catalogs are in %BUILDDIR%/locale.
197 | 	goto end
198 | )
199 | 
200 | if "%1" == "changes" (
201 | 	%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes
202 | 	if errorlevel 1 exit /b 1
203 | 	echo.
204 | 	echo.The overview file is in %BUILDDIR%/changes.
205 | 	goto end
206 | )
207 | 
208 | if "%1" == "linkcheck" (
209 | 	%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck
210 | 	if errorlevel 1 exit /b 1
211 | 	echo.
212 | 	echo.Link check complete; look for any errors in the above output ^
213 | or in %BUILDDIR%/linkcheck/output.txt.
214 | 	goto end
215 | )
216 | 
217 | if "%1" == "doctest" (
218 | 	%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest
219 | 	if errorlevel 1 exit /b 1
220 | 	echo.
221 | 	echo.Testing of doctests in the sources finished, look at the ^
222 | results in %BUILDDIR%/doctest/output.txt.
223 | 	goto end
224 | )
225 | 
226 | if "%1" == "xml" (
227 | 	%SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml
228 | 	if errorlevel 1 exit /b 1
229 | 	echo.
230 | 	echo.Build finished. The XML files are in %BUILDDIR%/xml.
231 | 	goto end
232 | )
233 | 
234 | if "%1" == "pseudoxml" (
235 | 	%SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml
236 | 	if errorlevel 1 exit /b 1
237 | 	echo.
238 | 	echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml.
239 | 	goto end
240 | )
241 | 
242 | :end
243 | 


--------------------------------------------------------------------------------
/docs/readme.rst:
--------------------------------------------------------------------------------
1 | .. include:: ../README.rst


--------------------------------------------------------------------------------
/docs/todo.rst:
--------------------------------------------------------------------------------
1 | .. include:: ../TODO.rst
2 | 
3 | 


--------------------------------------------------------------------------------
/docs/usage.rst:
--------------------------------------------------------------------------------
1 | ========
2 | Usage
3 | ========
4 | 
5 | To use scanpdf in a project::
6 | 
7 | 	import scanpdf


--------------------------------------------------------------------------------
/fabfile.py:
--------------------------------------------------------------------------------
 1 | from fabric.api import *
 2 | import os, sys
 3 |  
 4 | project_dir = os.path.join(os.path.dirname(sys.argv[0]))
 5 | 
 6 | def build_windows_dist():
 7 |     if os.name == 'nt':
 8 |         # Call the pyinstaller
 9 |         local("python ../pyinstaller/pyinstaller.py scanpdf_windows.spec --onefile")
10 | 
11 | 
12 | def run_tests():
13 |     test_dir = "test"
14 |     with lcd(test_dir):
15 |         # Regenerate the test script
16 |         local("py.test --genscript=runtests.py")
17 |         t = local("py.test --cov-config .coveragerc --cov=scanpdf --cov-report=term --cov-report=html", capture=True)
18 | 
19 |         with open("test/COVERAGE.rst", "w") as f:
20 |             f.write(t)
21 | 
22 | 
23 | def push_docs():
24 |     """ Build the sphinx docs from develop
25 |         And push it to gh-pages
26 |     """
27 |     githubpages = "/Users/virantha/dev/githubdocs/scanpdf"
28 |     # Convert markdown readme to rst
29 |     #local("pandoc README.md -f markdown -t rst -o README.rst")
30 |     with lcd(githubpages):
31 |         local("git checkout gh-pages")
32 |         local("git pull origin gh-pages")
33 |     with lcd("docs"):
34 |         print("Running sphinx in docs/ and building to ~/dev/githubdocs/scanpdf")
35 |         local("make clean")
36 |         local("make html")
37 |         #local("cp -R ../test/htmlcov %s/html/testing" % githubpages)
38 |     with lcd(githubpages):
39 |         local("git add .")
40 |         local('git commit -am "doc update"')
41 |         local('git push origin gh-pages')
42 | 
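
`run_tests` above needs the text of the coverage report so that it can be written to `test/COVERAGE.rst`. The same capture-then-write pattern can be sketched with only the standard library. This is not the fabfile's method: it swaps Fabric's `local` for `subprocess.run` (Python 3.7 or later), and `capture_to_file` and the demo command are hypothetical stand-ins for the real `py.test` call.

```python
import os
import subprocess
import sys
import tempfile


def capture_to_file(cmd, out_path):
    """Run cmd, write its standard output to out_path, and return it."""
    # check=True raises CalledProcessError if the command exits non-zero.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    with open(out_path, "w") as f:
        f.write(result.stdout)
    return result.stdout


if __name__ == "__main__":
    # Stand-in command; the fabfile runs py.test with coverage flags here.
    target = os.path.join(tempfile.mkdtemp(), "COVERAGE.rst")
    text = capture_to_file([sys.executable, "-c", "print('report')"], target)
    print(text, end="")
```

Capturing the output means it is no longer echoed to the terminal while the command runs, so the sketch returns the text for the caller to print if wanted.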


--------------------------------------------------------------------------------
/first_setup.zsh:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env zsh -i
 2 | 
 3 | # Auto-generated by Voodoo
 4 | # First-time script for project setup (DELETE ME AFTER RUNNING!)
 5 | 
 6 | DOCS_DIR=~/dev/githubdocs
 7 | 
 8 | print Setting up your virtualenv
 9 | 
10 | rmvirtualenv scanpdf
11 | if [ $? -ne 0 ]; then
12 |     print Removing old virtualenv failed
13 |     exit -1
14 | fi
15 | mkvirtualenv scanpdf
16 | if [ $? -ne 0 ]; then
17 |     print Making scanpdf Virtual env failed
18 |     exit -1
19 | fi
20 | 
21 | workon scanpdf
22 | if [ $? -ne 0 ]; then
23 |     print Could not switch to scanpdf
24 |     exit -1
25 | fi
26 | print Working in virtualenv scanpdf
27 | 
28 | 
29 | # Set up the pip packages
30 | #pip install pytest mock pytest-cov python-coveralls coverage sphinx tox
31 | pip install sphinx
32 | echo "cd ~/dev/scanpdf" >> ~/dev/envs/scanpdf/bin/postactivate
33 | 
34 | # Start python develop
35 | python setup.py develop
36 | 
37 | # Initialize the git repo
38 | github_remote='git@github.com:virantha/scanpdf.git'
39 | git init
40 | git remote add origin $github_remote
41 | git add .
42 | git commit -am "Setting up new project scanpdf"
43 | 
44 | # Prompt if we want to push to remote git
45 | read -q "REPLY?Create remote repository at $github_remote [y/N]?"
46 | if [[  $REPLY == y ]]; then
47 |     curl --data '{"name":"scanpdf", "description":""}' --user "virantha" https://api.github.com/user/repos
48 | fi
49 | 
50 | read -q "REPLY?Push to remote repository $github_remote [y/N]?"
51 | if [[  $REPLY == y ]]; then
52 |     git push -u origin master
53 | fi
54 | 
55 | print
56 | # Create the docs repository
57 | current_dir=`pwd`
58 | read -q "REPLY?Create and push docs to $github_remote [y/N]?"
59 | if [[  $REPLY == y ]]; then
60 |     # Go to the docs build dir, and check out our repo
61 |     cd $DOCS_DIR
62 |     git clone https://github.com/virantha/scanpdf.git
63 |     cd scanpdf
64 |     git checkout --orphan gh-pages
65 |     git rm -rf .
66 | 
67 |     cd $current_dir/docs
68 |     pip install sphinx
69 |     make html
70 |     cd $DOCS_DIR
71 |     cd scanpdf
72 |     touch .nojekyll
73 |     git add .
74 |     git commit -m "docs"
75 |     git push origin gh-pages
76 | 
77 | fi


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | docopt>=0.6.1
2 | 


--------------------------------------------------------------------------------
/scanpdf.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
  1 | Metadata-Version: 1.0
  2 | Name: scanpdf
  3 | Version: 0.3.0
  4 | Summary: Utility to use SANE/scanadf to scan to PDF
  5 | Home-page: UNKNOWN
  6 | Author: Virantha N. Ekanayake
  7 | Author-email: virantha@gmail.com
  8 | License: ASL 2.0
  9 | Description: Scan PDF - Easy scans in Linux with a document scanner like the Fujitsu ScanSnap
 10 |         ################################################################################
 11 |         
 12 |         .. image:: http://badge.fury.io/py/scanpdf.png
 13 |             :target: http://badge.fury.io/py/scanpdf
 14 |         
 15 |         .. image:: http://pypip.in/d/scanpdf/badge.png
 16 |             :target: https://crate.io/packages/scanpdf?version=latest
 17 |         
 18 |         
 19 |         If you're looking for a simple way to use a multi-page scanner and get your
 20 |         document into a PDF in Linux without any proprietary or commercial software,
 21 |         then ScanPDF might be the solution.  I wrote it to quickly take the Linux SANE
 22 |         scanner system output image files, and process them into usable PDFs.  By
 23 |         usable, I mean PDFs that maintain their original scanned resolution, omit blank
 24 |         pages (if you're scanning in duplex mode, for example), preserve color unless
 25 |         the original is greyscale/black and white, in which case they are intelligently
 26 |         down-converted to B/W PDFs to save space.
 27 |         
 28 |         * Free and open-source software: ASL2 license
 29 |         * Documentation: http://virantha.github.io/scanpdf/html
 30 |         * Source: https://github.com/virantha/scanpdf
 31 |         
 32 |         Features
 33 |         --------
 34 |         * Uses SANE/scanadf to automatically scan to multi-page compressed PDFs
 35 |         * `Integrates with ScanBd <http://virantha.github.io/scanpdf/html>`_ to respond to hardware button presses
 36 |         * Automatically removes blank pages.
 37 |         * Scans in color, and automatically down-converts into 1-bit B/W image for text/greyscale images
 38 |         
 39 |         Usage:
 40 |         ------
 41 |         The simplest way to use this is:
 42 |         
 43 |         ::
 44 |         
 45 |             scanpdf scan pdf <pdffile>
 46 |         
 47 |         This will first perform the scan, and then the conversion to PDF.  If you want
 48 |         to split up the scan and the PDF conversion into two separate invocations (for
 49 |         reasons clarified below), then you can do:
 50 |         
 51 |         ::
 52 |         
 53 |             scanpdf --tmpdir=tmp scan
 54 |             scanpdf --tmpdir=tmp pdf <pdffile>
 55 |           
 56 |         One reason for the separation might be if you want to keep scanning documents
 57 |         (very quick) while the post-processing (slower) for the PDF conversion is
 58 |         taking place in the background.   For instance, if you're using the hardware
 59 |         button on the scanner to initiate scans (as detailed in this_ document), then
 60 |         you want to return immediately after the scan instead of waiting for the full
 61 |         conversion to PDF to take place.
 62 |         
 63 |         .. _this: http://virantha.com/2014/03/17/one-touch-scanning-with-fujitsu-scansnap-in-linux/
 64 |         
 65 |         You can optionally use the following switches to control if you're putting pages face up or face down in the auto
 66 |         document feeder, if you want to skip the blank page processing, adjust the blank page detection threshold, or add 
 67 |         additional post-processing using unpaper_:
 68 |         
 69 |         .. _unpaper: http://unpaper.berlios.de
 70 |         
 71 |         ::
 72 |         
 73 |                 --dpi=<dpi>                 DPI to scan in [default: 300]
 74 |                 --face-up=<true/false>      Face-up scanning [default: True]
 75 |                 --keep-blanks               Don't check for and remove blank pages
 76 |                 --blank-threshold=<ths>     Percentage of white to be marked as blank [default: 0.97] 
 77 |                 --post-process              Run unpaper to deskew/clean up
 78 |         
 79 |         
 80 |         Right now, I'm assuming this is getting called via ScanBD, so I don't have the option to manually specify the 
 81 |         scanner.  If you really want to use this standalone, for now, please just set the ``SCANBD_DEVICE`` environment 
 82 |         variable to your scanner device name before running this script.
 83 |         
 84 |         
 85 |         Installation
 86 |         ------------
 87 |         ::
 88 |         
 89 |             $ pip install scanpdf
 90 |         
 91 |         Requires ImageMagick and SANE to be installed, for the command line tools:
 92 |         
 93 |         * ``convert``
 94 |         * ``identify``
 95 |         * ``ps2pdf``
 96 |         * ``scanadf``
 97 |         
 98 |         Also requires epstopdf.
 99 |         
100 |         Disclaimer
101 |         ----------
102 |         The software is distributed on an "AS IS" BASIS, WITHOUT
103 |         WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
104 |         
105 | Platform: UNKNOWN
106 | 
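
The split `scan` / `pdf` invocation described in the usage notes above can also be driven from a script. The following is a minimal sketch, assuming `scanpdf` is on the PATH and `SCANBD_DEVICE` is already set in the environment. The helper names `build_commands` and `scan_then_convert_in_background` are hypothetical and not part of the package.

```python
import subprocess


def build_commands(tmpdir, pdffile):
    """Return the argv lists for the separate scan and pdf steps."""
    scan_cmd = ["scanpdf", "--tmpdir=%s" % tmpdir, "scan"]
    pdf_cmd = ["scanpdf", "--tmpdir=%s" % tmpdir, "pdf", pdffile]
    return scan_cmd, pdf_cmd


def scan_then_convert_in_background(tmpdir, pdffile):
    """Run the scan and wait for it, then start the PDF step without waiting."""
    scan_cmd, pdf_cmd = build_commands(tmpdir, pdffile)
    subprocess.check_call(scan_cmd)   # blocks only while pages are being fed
    return subprocess.Popen(pdf_cmd)  # conversion continues in the background
```

Note that the `scan` step exits with an error if the temporary directory already exists, so each call would need a fresh `tmpdir`.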


--------------------------------------------------------------------------------
/scanpdf.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
 1 | AUTHORS.rst
 2 | CHANGES.rst
 3 | CONTRIBUTING.rst
 4 | HISTORY.rst
 5 | LICENSE.txt
 6 | MANIFEST.in
 7 | README.rst
 8 | TODO.rst
 9 | requirements.txt
10 | setup.py
11 | scanpdf/__init__.py
12 | scanpdf/scanpdf.py
13 | scanpdf/version.py
14 | scanpdf.egg-info/PKG-INFO
15 | scanpdf.egg-info/SOURCES.txt
16 | scanpdf.egg-info/dependency_links.txt
17 | scanpdf.egg-info/entry_points.txt
18 | scanpdf.egg-info/requires.txt
19 | scanpdf.egg-info/top_level.txt
20 | scanpdf.egg-info/zip-safe
21 | test/test_scanpdf.py


--------------------------------------------------------------------------------
/scanpdf.egg-info/dependency_links.txt:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/scanpdf.egg-info/entry_points.txt:
--------------------------------------------------------------------------------
1 | [console_scripts]
2 | scanpdf = scanpdf.scanpdf:main
3 | 
4 | 


--------------------------------------------------------------------------------
/scanpdf.egg-info/requires.txt:
--------------------------------------------------------------------------------
1 | docopt>=0.6.1


--------------------------------------------------------------------------------
/scanpdf.egg-info/top_level.txt:
--------------------------------------------------------------------------------
1 | scanpdf
2 | 


--------------------------------------------------------------------------------
/scanpdf.egg-info/zip-safe:
--------------------------------------------------------------------------------
1 | 
2 | 


--------------------------------------------------------------------------------
/scanpdf/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/virantha/scanpdf/82eef134957b2eed5444b5b1cecd9e3a86b8ac0c/scanpdf/__init__.py


--------------------------------------------------------------------------------
/scanpdf/scanpdf.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python2.7
  2 | # Copyright 2014 Virantha Ekanayake All Rights Reserved.
  3 | #
  4 | # Licensed under the Apache License, Version 2.0 (the "License");
  5 | # you may not use this file except in compliance with the License.
  6 | # You may obtain a copy of the License at
  7 | #
  8 | #    http://www.apache.org/licenses/LICENSE-2.0
  9 | #
 10 | # Unless required by applicable law or agreed to in writing, software
 11 | # distributed under the License is distributed on an "AS IS" BASIS,
 12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 13 | # See the License for the specific language governing permissions and
 14 | # limitations under the License.
 15 | """Scan to PDF.
 16 | 
 17 | Usage:
 18 |     scanpdf [options] scan 
 19 |     scanpdf [options] pdf <pdffile> 
 20 |     scanpdf [options] scan pdf <pdffile> 
 21 | 
 22 | 
 23 | Options:
 24 |     -v --verbose                Verbose logging
 25 |     -d --debug                  Debug logging
 26 |     --dpi=<dpi>                 DPI to scan in [default: 300]
 27 |     --tmpdir=<dir>              Temporary directory 
 28 |     --keep-tmpdir               Whether to keep the tmp dir after scanning or not [default: False]
 29 |     --face-up=<true/false>      Face-up scanning [default: True]
 30 |     --keep-blanks               Don't check for and remove blank pages
 31 |     --blank-threshold=<ths>     Percentage of white to be marked as blank [default: 0.97] 
 32 |     --post-process              Run unpaper to deskew/clean up
 33 |     
 34 | """
 35 | 
 36 | import sys, os
 37 | import logging
 38 | import shutil
 39 | import re
 40 | 
 41 | from version import __version__
 42 | import docopt
 43 | 
 44 | import subprocess
 45 | import time
 46 | import glob
 47 | from itertools import combinations
 48 | 
 49 | 
 50 | class ScanPdf(object):
 51 |     """
  52 |         The main class.  Scans pages, crops them, converts greyscale pages to B&W, drops blank pages, and assembles the PDF.
 53 | 
 54 |     """
 55 | 
 56 |     def __init__ (self):
 57 |         """ 
 58 |         """
 59 |         self.config = None
 60 |         self.bw_pages = {}  # Keep track of which pages were in B&W
 61 | 
 62 |     def cmd(self, cmd_list):
 63 |         if isinstance(cmd_list, list):
 64 |             cmd_list = ' '.join(cmd_list)
 65 |         logging.debug("Running cmd: %s" % cmd_list)
 66 |         try:
 67 |             out = subprocess.check_output(cmd_list, stderr=subprocess.STDOUT, shell=True)
 68 |             logging.debug(out)
 69 |             return out
 70 |         except subprocess.CalledProcessError as e:
 71 |             print (e.output)
 72 |             self._error("Could not run command %s" % cmd_list)
 73 |             
 74 | 
 75 | 
 76 |     def run_scan(self):
 77 |         device = os.environ['SCANBD_DEVICE']
 78 |         self.cmd('logger -t "scanbd: " "Begin of scan "')
 79 |         c = ['SANE_CONFIG_DIR=/etc/scanbd', 
 80 |                 'scanadf',
 81 |                 '-d "%s"' % device,
 82 |                 '--source "ADF Duplex"',
 83 |                 '--mode Color',
 84 |                 '--resolution %sdpi' % self.dpi,
 85 |                 #'--y-resolution %sdpi' % self.dpi,
 86 |                 '-o %s/page_%%04d' % self.tmp_dir,
 87 |                 #'-y 876.695mm',
 88 |                 #'--page-height 355.617mm',
 89 |                 '--page-height 876.695',
 90 |                 '-y 876.695',
 91 |                 #'--buffermode On',
 92 |                 '--brightness=25',
 93 |                 '--emphasis=20',
 94 |                 '--ald yes',
 95 |                 ]
 96 |         self.cmd(c)
 97 |         self.cmd('logger -t "scanbd: " "End of scan "')
 98 | 
 99 |     def _error(self, msg):
100 |         print("ERROR: %s" % msg)
101 |         sys.exit(-1)
102 | 
103 |     def _atoi(self,text):                                       
104 |          return int(text) if text.isdigit() else text    
105 | 
106 |     def _natural_keys(self, text):
107 |          '''                                                                                                                    
108 |          alist.sort(key=natural_keys) sorts in human order
109 |          http://nedbatchelder.com/blog/200712/human_sorting.html
110 |          (See Toothy's implementation in the comments)
111 |          ''' 
112 |          return [ self._atoi(c) for c in re.split('(\d+)', text) ]         
113 | 
114 |     def get_pages(self):
115 |         cwd = os.getcwd()
116 |         os.chdir(self.tmp_dir)
117 |         pages = glob.glob('./page_*')
118 |         pages.sort(key = self._natural_keys)
119 |         os.chdir(cwd)
120 |         return pages
121 | 
122 |     def reorder_face_up(self, pages):
123 |         reorder = []
124 |         assert len(pages) % 2 == 0, "Why is page count not even for duplexing??"
125 |         logging.info("Reordering pages")
126 |         pages.reverse()
127 |         return pages
128 |             
129 |     def parse_dimensions(self, result):
130 |         first_line = str(result.splitlines()[0].strip())
131 |         logging.debug(first_line)
132 |         mCropDim = re.compile("""\s*(?P<filename>[\d\w\[_\/\\\.]+)\s+\w+\s+(?P<X>\d+)x(?P<Y>\d+)\s+""")
133 |         # blank3.pnm PPM 1x1 1950x2716-1-1 8-bit sRGB 0.010u 0:00.009
134 |         matchCropDim = mCropDim.search(first_line)
135 |         if matchCropDim:
136 |             x = int(matchCropDim.group('X'))
137 |             y = int(matchCropDim.group('Y'))
138 |         else:
139 |             x = -1
140 |             y = -1
141 |         return x, y
142 | 
143 |     def get_dimensions(self, filename):
144 |         c = 'identify %s' % filename
145 |         result = self.cmd(c)
146 |         return self.parse_dimensions(result)
147 | 
148 |     def is_blank(self, filename):
149 |         """
150 |             Returns true if image in filename is blank
151 | 
152 |             - Shave off one inch around edges
153 |             - Blur and crop down as much as possible
154 |             - If remaining page has a dimension smaller than 0.3" conclude it's blank
155 |         """
156 |         if not os.path.exists(filename):
157 |             return True
158 | 
159 |         #c = 'convert %s -shave %sx%s -virtual-pixel White -blur 0x15 -fuzz 15%% -trim info:' % (filename, self.dpi, self.dpi)
160 |         c = 'convert %s -shave %sx%s -density %s -adaptive-resize 65%% -virtual-pixel White -blur 0x15 -fuzz 15%% -trim info:' % (filename, self.dpi, self.dpi, int(self.dpi/2))
161 |         result = self.cmd(c)
162 |         x, y = self.parse_dimensions(result)
163 |         if x>0 and y>0:
164 |             logging.debug('Finding threshold for blanks')
165 |             threshold = int(self.dpi)/2*0.3  # Threshold is 0.3 inches
166 |             logging.debug('x=%s, y=%s, threshold=%s' % (x, y, threshold))
167 |             if x < threshold or y < threshold:
168 |                 return True
169 |             else:
170 |                 return False
171 |         else:
172 |             logging.debug('Could not find dimensions in output of imagemagick for cropping')
173 |             return False
174 |         
175 |         # Old code, doesn't really work for pages with small amounts of text
176 |         # c = 'identify -verbose %s' % filename
177 |         # result = self.cmd(c)
178 |         # mStdDev = re.compile("""\s*standard deviation:\s*\d+\.\d+\s*\((?P<percent>\d+\.\d+)\).*""")
179 |         # for line in result.splitlines():
180 |         #     match = mStdDev.search(str(line))
181 |         #     if match:
182 |         #         stdev = float(match.group('percent'))
183 |         #         if stdev > 0.1:
184 |         #             return False
185 |         # return True
186 | 
187 | 
188 |     def run_postprocess(self, page_files):
189 |         cwd = os.getcwd()
190 |         os.chdir(self.tmp_dir)
191 |         
192 |         processed_pages = []
193 |         self.bw_pages = {}
194 |         for page in page_files:
195 |             processed_page = '%s_unpaper' % page
196 |             c = ['unpaper', page, processed_page]
197 |             self.cmd(c) 
198 |             os.remove(page)
199 |             processed_pages.append(processed_page)
200 |             self.bw_pages[processed_page] = True
201 |         os.chdir(cwd)
202 |         return processed_pages
203 | 
204 |     def run_crop(self, page_files):
205 |         cwd = os.getcwd()
206 |         os.chdir(self.tmp_dir)
207 |         crop_pages = []
208 |         for i, page in enumerate(page_files):
209 |             logging.debug("Cropping page %d" % i)
210 |             crop_page = '%s.crop' % page
211 |             shave_amt = int(int(self.dpi)*0.1)
212 |             c = ['convert',
213 |                     '-deskew 80%',
214 |                     '-shave %dx%d' % (shave_amt, shave_amt),
215 |                     '-fuzz 20%',
216 |                     '-trim',
217 |                     '+repage',
218 |                 ]
219 |             
220 |             # Get original dimensions
221 |             x, y = self.get_dimensions(page)
222 |             if x>0 and y>0:
223 |                 # IF we know the original dimensions, then just pad back to that with white background
224 |                 c.extend([  '-gravity center',
225 |                             '-extent %sx%s' % (x, y),
226 |                             '-background white',
227 |                 ])
228 |             c.extend([ ' %s ' % page,
229 |                        crop_page,
230 |                     ])
231 |             self.cmd(c)
232 |             crop_pages.append(crop_page)
233 | 
234 |             if not self.args['--keep-tmpdir']:
235 |                 os.remove(page)
236 | 
237 |         os.chdir(cwd)
238 |         return crop_pages
239 | 
240 |     def run_convert(self, page_files):
241 |         cwd = os.getcwd()
242 |         os.chdir(self.tmp_dir)
243 | 
244 |         pdf_basename = os.path.basename(self.pdf_filename)
245 |         ps_filename = pdf_basename
246 |         ps_filename = ps_filename.replace(".pdf", ".ps")
247 | 
248 | 
 249 |         # Convert each page to its own single-page PDF
250 |         for page in page_files:
251 |             is_bw = self.bw_pages.get(page, False)
252 |             if is_bw:
253 |                 c = ['convert',
254 |                         page,
255 |                         '-density %s' % self.dpi,
256 |                         '-depth 2', 
257 |                         '-define png:compression-level=9',
258 |                         '-define png:format=8',
259 |                         '-define png:color-type=0',
260 |                         '-define png:bit-depth=2',
261 |                         'PNG:- | convert - -rotate 180',
262 |                         '%s.pdf' % page,
263 |                         ]
264 |             else:
265 |                 c = ['convert',
266 |                         '-density %s' % self.dpi,
267 |                         '+page', # Make sure it doesn't crop to letter size
268 |                         '-compress JPEG',
269 |                         '-sampling-factor 4:2:0',
270 |                         '-strip',
271 |                         '-quality 85',
272 |                         '-interlace JPEG',
273 |                         '-colorspace RGB',
274 |                         '-rotate 180',
275 |                         page,
276 |                         '%s.pdf' % page,
277 |                     ]
278 |             self.cmd(c)
279 | 
 280 |         # Merge the per-page PDFs into a single PDF file using gs
281 |         c = ['gs', 
282 |                 '-sDEVICE=pdfwrite',
283 |                 '-r%s' % self.dpi,
284 |                 '-dNOPAUSE',
285 |                 '-dBATCH',
286 |                 '-dSAFER',
287 |                 '-sOutputFile=%s' % pdf_basename,
288 |                 ' '.join(['%s.pdf' % p for p in page_files]),
289 |                 ]
290 |         self.cmd(c)
291 |         c = ['epstopdf',
292 |                 ps_filename,
293 |                 ]
294 |         
295 |         #self.cmd(c)
296 | 
297 |         #c = ['convert',
298 |                 #'-density %s' % self.dpi,
299 |                 #'+page', # Make sure it doesn't crop to letter size
300 |                 #'-compress JPEG',
301 |                 #'-sampling-factor 4:2:0',
302 |                 #'-strip',
303 |                 #'-quality 85',
304 |                 #'-interlace JPEG',
305 |                 #'-colorspace RGB',
306 |                 #'-rotate 180',
307 |                 #' '.join(page_files),
308 |                 #'%s' % pdf_basename,
309 |             #]
310 |         #self.cmd(c)
311 | 
312 | 
313 |         #c = ['ps2pdf',
314 |                 #'-DPDFSETTINGS=/prepress',
315 |                 #ps_filename,
316 |                 #pdf_basename,
317 |             #]
318 | 
319 |         # unneeded since we're going directly to pdf using imagemagick now
320 |         #c = ['epstopdf',
321 |                 #ps_filename,
322 |                 #]
323 |         
324 |         #self.cmd(c)
325 |         shutil.move(pdf_basename, self.pdf_filename)
326 |         if not self.args['--keep-tmpdir']:
327 |             for filename in page_files:
328 |                 os.remove(filename)
329 |            
330 |         # IF we did the scan, then remove the tmp dir too
331 |         if self.args['scan'] and not self.args['--keep-tmpdir']:
332 |             os.rmdir(self.tmp_dir)
333 |         os.chdir(cwd)
334 |         
335 | 
336 |     def convert_to_bw(self, pages):
337 |         new_pages = []
338 |         for i, page in enumerate(pages):
339 |             filename = os.path.join(self.tmp_dir, page)
340 |             logging.info("Checking if %s is bw..." % filename)
341 |             if self._is_color(filename):
342 |                 new_pages.append(page)
343 |                 logging.info("No, %s is color..." % filename)
344 |                 self.bw_pages[page] = False
 345 |             else: # Convert to BW
346 |                 bw_page = self._page_to_bw(filename)
347 |                 logging.info("Yes, %s converted to bw..." % filename)
348 |                 new_pages.append(bw_page)
349 |                 self.bw_pages[bw_page] = True
350 |         return new_pages
351 | 
352 |             
353 |     def _page_to_bw(self, page):
354 |         out_page = "%s_bw" % page
355 |         cwd = os.getcwd()
356 |         os.chdir(self.tmp_dir)
357 | 
358 |         cmd = "convert %s +dither -density %s -colors 16 -colors 4 -colorspace gray -normalize %s_bw" % (page, self.dpi, page)
359 |         out = self.cmd(cmd)
360 |         # Remove the old file
361 |         if not self.args['--keep-tmpdir']:
362 |             os.remove(page)
363 |         os.chdir(cwd)
364 |         return out_page
365 | 
366 |     def _is_color(self, filename):
367 |         """
368 |             Run the following command from ImageMagick:
369 | 
370 |             ::
371 |                 
372 |                  convert holi.pdf -colors 8 -depth 8 -format %c histogram:info:- 
373 | 
374 |             This outputs something like the following:
375 |             ::
376 | 
377 |                   10831: ( 24, 26, 26,255) #181A1A srgba(24,26,26,1)
378 |                   4836: ( 55, 87, 79,255) #37574F srgba(55,87,79,1)
379 |                   6564: ( 77,138,121,255) #4D8A79 srgba(77,138,121,1)
380 |                   4997: ( 86, 96, 93,255) #56605D srgba(86,96,93,1)
381 |                   7005: ( 92,153,139,255) #5C998B srgba(92,153,139,1)
382 |                   2479: (143,118,123,255) #8F767B srgba(143,118,123,1)
383 |                   8870: (169,176,170,255) #A9B0AA srgba(169,176,170,1)
384 |                 442906: (254,254,254,255) #FEFEFE srgba(254,254,254,1)
385 |                   1053: (  0,  0,  0,255) #000000 black
386 |                 484081: (255,255,255,255) #FFFFFF white
387 |  
388 |         """
389 |         cmd = "convert %s -density %s -adaptive-resize 35%% -colors 8 -depth 8 -format %%c histogram:info:-" % (filename, int(self.dpi/3))
390 |         out = self.cmd(cmd)
391 |         mLine = re.compile(r"""\s*(?P<count>\d+):\s*\(\s*(?P<R>\d+),\s*(?P<G>\d+),\s*(?P<B>\d+).+""")
392 |         colors = []
393 |         for line in out.splitlines():
394 |             matchLine = mLine.search(str(line))
395 |             if matchLine:
396 |                 logging.debug("Found RGB values")
397 |                 color = [int(x) for x in (matchLine.group('count'),
398 |                              matchLine.group('R'),
399 |                              matchLine.group('G'),
400 |                              matchLine.group('B'),
401 |                              )
402 |                         ]
403 |                 colors.append(color)
404 |         # sort
405 |         colors.sort(reverse=True, key = lambda x: x[0])
406 |         logging.debug(colors)
407 |         is_color = False
408 |         logging.debug(colors)
409 |         for color in colors:
410 |             # Calculate the mean differences between the RGB components
411 |             # Shades of grey will be very close to zero in this metric...
412 |             diff = float(sum([abs(color[2]-color[1]),
413 |                          abs(color[3]-color[1]),
414 |                          abs(color[3]-color[2]),
415 |                          ]))/3
416 |             if diff > 30:
417 |                 is_color = True
418 |                 logging.debug("Found color, diff is %s" % diff)
419 |             else:
420 |                 logging.debug("No color, diff is %s" % diff)
421 |         return is_color
422 | 
423 | 
424 | 
425 |     def get_options(self, argv):
426 |         """
427 |             Parse the command-line options and set the following object properties:
428 | 
429 |             :param argv: usually just sys.argv[1:]
430 |             :returns: Nothing
431 | 
432 |             :ivar debug: Enable logging debug statements
433 |             :ivar verbose: Enable verbose logging
434 |             :ivar config: Dict of the config file
435 | 
436 |         """
437 |         self.args = argv
438 | 
439 |         if argv['--verbose']:
440 |             logging.basicConfig(level=logging.INFO, format='%(message)s')
441 |         if argv['--debug']:
442 |             logging.basicConfig(level=logging.DEBUG, format='%(message)s')                
443 |         if self.args['pdf']:
444 |             self.pdf_filename = os.path.abspath(self.args['<pdffile>'])
445 | 
446 |         self.dpi = int(self.args['--dpi'])
447 | 
448 |         output_dir = time.strftime('%Y%m%d_%H%M%S', time.localtime())
449 |         if argv['--tmpdir']:
450 |             self.tmp_dir = argv['--tmpdir']
451 |         else:
452 |             self.tmp_dir = os.path.join('/tmp', output_dir)
453 |         self.tmp_dir = os.path.abspath(self.tmp_dir)
454 | 
455 |         # Make the tmp dir only if we're scanning, o/w throw an error
456 |         if argv['scan']:
457 |             if os.path.exists(self.tmp_dir):
458 |                 self._error("Temporary output directory %s already exists!" % self.tmp_dir)
459 |             else:
460 |                 os.makedirs(self.tmp_dir)
461 |         else:
462 |             if not os.path.exists(self.tmp_dir):
463 |                 self._error("Scan files directory %s does not exist!" % self.tmp_dir)
464 |             
465 |         # Blank checks
466 |         self.keep_blanks =  argv['--keep-blanks']
467 |         self.blank_threshold = float(argv['--blank-threshold'])
468 |         assert(self.blank_threshold >= 0 and self.blank_threshold <= 1.0)
469 |         self.post_process = argv['--post-process']
470 | 
471 |     def go(self, argv):
472 |         """ 
473 |             The main entry point into ScanPdf
474 | 
475 |             #. Get the options
476 |             #. Create the temp dir
477 |             #. Run scanadf
478 |         """
479 |         # Read the command line options
480 |         self.get_options(argv)
481 |         logging.info("Temp dir: %s" % self.tmp_dir)
482 |         if self.args['scan']:
483 |             self.run_scan()
484 |         
485 |         if self.args['pdf']:
486 |             # Now, convert the files to ps
487 |             pages = self.get_pages()
488 |             logging.debug( pages )
 489 |             if str(self.args['--face-up']).lower() == 'true':  # docopt returns a string, so "false" would otherwise be truthy
490 |                 pages = self.reorder_face_up(pages)
491 |             
492 |             logging.debug( pages )
493 | 
494 |             # Crop the pages
495 |             pages = self.run_crop(pages)
496 | 
497 |             # Now, check if color or bw
498 |             pages = self.convert_to_bw(pages)
499 |             logging.debug(pages)
500 | 
501 |             # Run blanks
502 |             if not self.keep_blanks:
503 |                 no_blank_pages = []
504 |                 for i,page in enumerate(pages):
505 |                     filename = os.path.join(self.tmp_dir, page)
506 |                     logging.info("Checking if %s is blank..." % filename)
507 |                     if not self.is_blank(filename):
508 |                         no_blank_pages.append(page)
509 |                     else:
510 |                         logging.info("  page %s is blank, removing..." % i)
511 |                         os.remove(filename)
512 |                 pages = no_blank_pages
513 |                     
514 |             logging.debug( pages )
515 | 
516 |             if self.post_process:
517 |                 pages = self.run_postprocess(pages)
518 |                 
519 |             self.run_convert(pages)
520 |         
521 | def main():
522 |     args = docopt.docopt(__doc__, version='Scan PDF %s' % __version__ )
523 |     script = ScanPdf()
524 |     print(args)
525 |     script.go(args)
526 | 
527 | if __name__ == '__main__':
528 |     main()
529 | 
530 | 
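
The grey-versus-color decision in `_is_color` above reduces to a small metric on each histogram entry: the mean absolute difference between the R, G and B channels, compared against a cutoff of 30. The following is a standalone sketch of that metric, not the module's own code. The names `HIST_LINE`, `mean_channel_diff` and `has_color` are illustrative, and the sample lines are taken from the docstring.

```python
import re

# Matches lines such as "  6564: ( 77,138,121,255) #4D8A79 srgba(77,138,121,1)"
HIST_LINE = re.compile(
    r"\s*(?P<count>\d+):\s*\(\s*(?P<R>\d+),\s*(?P<G>\d+),\s*(?P<B>\d+)")


def mean_channel_diff(r, g, b):
    """Mean absolute difference between the three channel pairs."""
    return (abs(g - r) + abs(b - r) + abs(b - g)) / 3.0


def has_color(histogram_text, cutoff=30):
    """True if any histogram entry exceeds the cutoff (same 30 as _is_color)."""
    for line in histogram_text.splitlines():
        m = HIST_LINE.search(line)
        if m:
            rgb = [int(m.group(k)) for k in ("R", "G", "B")]
            if mean_channel_diff(*rgb) > cutoff:
                return True
    return False


grey_sample = """
  10831: ( 24, 26, 26,255) #181A1A srgba(24,26,26,1)
 484081: (255,255,255,255) #FFFFFF white
"""
color_sample = grey_sample + "   6564: ( 77,138,121,255) #4D8A79 srgba(77,138,121,1)\n"
```

Unlike `_is_color`, this sketch returns on the first colored entry instead of scanning the whole histogram; the result is the same.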


--------------------------------------------------------------------------------
/scanpdf/version.py:
--------------------------------------------------------------------------------
1 | __version__ = "0.3.1"
2 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | from __future__ import print_function
 2 | from setuptools import setup, find_packages
 3 | 
 4 | import io
 5 | import os
 6 | import scanpdf
 7 | from scanpdf.version import __version__
 8 | from setuptools import Command
 9 | 
10 | class PyTest(Command):
11 |     user_options = []
12 |     def initialize_options(self):
13 |         pass
14 |     def finalize_options(self):
15 |         pass
16 |     def run(self):
17 |         import sys,subprocess
18 |         cwd = os.getcwd()
19 |         os.chdir('test')
20 |         errno = subprocess.call([sys.executable, 'runtests.py'])
21 |         os.chdir(cwd)
22 |         raise SystemExit(errno)
23 | 
24 | def read(*filenames, **kwargs):
25 |     encoding = kwargs.get('encoding', 'utf-8')
26 |     sep = kwargs.get('sep', '\n')
27 |     buf = []
28 |     for filename in filenames:
29 |         with io.open(filename, encoding=encoding) as f:
30 |             buf.append(f.read())
31 |     return sep.join(buf)
32 | 
33 | packages = find_packages(exclude=["tests"])
34 | 
35 | long_description = read('README.rst')
36 | 
37 | with open("requirements.txt") as f:
38 |     required = f.read().splitlines()
39 | 
40 | setup (
41 |     name = "scanpdf",
42 |     version = __version__,
43 |     description="Utility to use SANE/scanadf to scan to PDF",
44 |     license = "ASL 2.0",
45 |     long_description = long_description,
46 |     author="Virantha N. Ekanayake",
47 |     author_email="virantha@gmail.com", # Removed.
48 |     package_data = {'': ['*.xml']},
49 |     zip_safe = True,
50 |     include_package_data = True,
51 |     packages = packages,
52 |     install_requires = required,
53 |     entry_points = {
54 |             'console_scripts': [
55 |                     'scanpdf = scanpdf.scanpdf:main'
56 |                 ],
57 |         },
58 |     options = {
59 | 	    "pyinstaller": {"packages": packages}
60 | 	    },
61 |     cmdclass = {'test':PyTest}
62 | 
63 | )


--------------------------------------------------------------------------------
/test/COVERAGE.rst:
--------------------------------------------------------------------------------
1 | ============================= test session starts ==============================
2 | Nothing yet
3 | =========================  ==========================
4 | 


--------------------------------------------------------------------------------
/test/test_scanpdf.py:
--------------------------------------------------------------------------------
 1 | import scanpdf.scanpdf as P
 2 | import pytest
 3 | import os
 4 | import logging
 5 | 
 6 | import smtplib
 7 | from mock import Mock
 8 | from mock import patch, call
 9 | from mock import MagicMock
10 | from mock import PropertyMock
11 | 
12 | 
13 | class Testscanpdf:
14 | 
15 |     def setup(self):
16 |         self.p = P.ScanPdf()
17 | 
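
The test module above only instantiates `ScanPdf`. As a possible starting point for real tests, here is a minimal sketch of the blank-page logic from `parse_dimensions` and `is_blank`, rewritten as standalone functions so it runs without a scanner or ImageMagick. The function names and the simplified (but equivalent) filename character class are assumptions for illustration. The sample line is the `info:` output quoted in the `parse_dimensions` comment.

```python
import re

# Equivalent to the pattern in parse_dimensions, written as a raw string.
DIMENSIONS = re.compile(
    r"\s*(?P<filename>[\w\[/\\.]+)\s+\w+\s+(?P<X>\d+)x(?P<Y>\d+)\s+")


def parse_dimensions(info_line):
    """Return (x, y) from an ImageMagick info line, or (-1, -1) if not found."""
    m = DIMENSIONS.search(info_line)
    if m:
        return int(m.group("X")), int(m.group("Y"))
    return -1, -1


def looks_blank(x, y, dpi):
    """Blank if either trimmed dimension is under 0.3 inch at half the scan DPI."""
    threshold = dpi / 2.0 * 0.3
    return x > 0 and y > 0 and (x < threshold or y < threshold)


def test_blank_sample():
    line = "blank3.pnm PPM 1x1 1950x2716-1-1 8-bit sRGB 0.010u 0:00.009"
    assert parse_dimensions(line) == (1, 1)
    assert looks_blank(1, 1, dpi=300)
    assert not looks_blank(1200, 1600, dpi=300)
```

The threshold uses half the DPI because `is_blank` passes `-density dpi/2` to `convert` before trimming.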


--------------------------------------------------------------------------------
/tox.ini:
--------------------------------------------------------------------------------
 1 | [tox]
 2 | envlist=py27,py33
 3 | 
 4 | [testenv]
 5 | changedir=test
 6 | deps=
 7 |     pytest
 8 |     mock
 9 |     coverage
10 | commands=py.test
11 | 


--------------------------------------------------------------------------------