├── .gitignore ├── bin ├── scan2jpg ├── scan2png ├── pdfcat ├── pdfresize ├── pdf2pdfa ├── img2pdf ├── pdfmeta ├── ocrpdf └── scan2pdf ├── .gitattributes ├── icons ├── gitlab-avatar.png └── gitlab-avatar.svg ├── tests ├── files │ └── lorem-ipsum.pdf ├── pdfcat.bats ├── pdfresize.bats ├── img2pdf.bats ├── scan2jpg.bats ├── scan2pdf.bats ├── scan2png.bats ├── ocrpdf.bats ├── pdf2pdfa.bats ├── pdfmeta.bats └── includes.bash ├── .editorconfig ├── .github └── workflows │ ├── gitlab-sync.yml │ ├── automerge.yml │ ├── create-docs.yml │ ├── bash-compatibility.yml │ └── os-compatibilty.yml ├── LICENSE ├── Makefile └── README.adoc /.gitignore: -------------------------------------------------------------------------------- 1 | docs 2 | -------------------------------------------------------------------------------- /bin/scan2jpg: -------------------------------------------------------------------------------- 1 | scan2pdf -------------------------------------------------------------------------------- /bin/scan2png: -------------------------------------------------------------------------------- 1 | scan2pdf -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | text eol=lf 2 | -------------------------------------------------------------------------------- /icons/gitlab-avatar.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/uroesch/pdftools/HEAD/icons/gitlab-avatar.png -------------------------------------------------------------------------------- /tests/files/lorem-ipsum.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/uroesch/pdftools/HEAD/tests/files/lorem-ipsum.pdf -------------------------------------------------------------------------------- /.editorconfig: 
-------------------------------------------------------------------------------- 1 | # EditorConfig is awesome: http://EditorConfig.org 2 | 3 | # top-most EditorConfig file 4 | root = true 5 | 6 | # Unix-style newlines with a newline ending every file 7 | [*] 8 | end_of_line = lf 9 | charset = utf-8 10 | trim_trailing_whitespace = true 11 | insert_final_newline = true 12 | indent_style = space 13 | indent_size = 2 14 | 15 | [Makefile] 16 | indent_style = tab 17 | -------------------------------------------------------------------------------- /.github/workflows/gitlab-sync.yml: -------------------------------------------------------------------------------- 1 | # ----------------------------------------------------------------------------- 2 | # Sync changes to gitlab.com 3 | # Author: Urs Roesch https://github.com/uroesch 4 | # Version: 0.1.1 5 | # ----------------------------------------------------------------------------- 6 | name: gitlab-sync 7 | 8 | on: 9 | - push 10 | - delete 11 | 12 | jobs: 13 | gitlab-sync: 14 | runs-on: ubuntu-latest 15 | steps: 16 | - uses: actions/checkout@v3 17 | with: 18 | fetch-depth: 0 19 | 20 | - uses: wangchucheng/git-repo-sync@v0.1.0 21 | with: 22 | target-url: https://gitlab.com/${{ github.repository }} 23 | target-username: ${{ github.repository_owner }} 24 | target-token: ${{ secrets.GITLAB_TOKEN }} 25 | -------------------------------------------------------------------------------- /.github/workflows/automerge.yml: -------------------------------------------------------------------------------- 1 | # ----------------------------------------------------------------------------- 2 | # Automerge pull requests for pdftools 3 | # Author: Urs Roesch https://github.com/uroesch 4 | # Version: 0.3.1 5 | # ----------------------------------------------------------------------------- 6 | name: automerge 7 | on: 8 | pull_request: 9 | branches: 10 | - master 11 | - main 12 | check_suite: 13 | types: 14 | - completed 15 | status: {} 16 | 
jobs: 17 | automerge: 18 | runs-on: ubuntu-latest 19 | steps: 20 | - name: automerge pull request 21 | uses: "pascalgn/automerge-action@v0.15.5" 22 | env: 23 | MERGE_FILTER_AUTHOR: uroesch 24 | MERGE_FORKS: false 25 | MERGE_RETRIES: 20 26 | MERGE_RETRY_SLEEP: 60000 27 | MERGE_DELETE_BRANCH: true 28 | MERGE_LABELS: "" 29 | GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}" 30 | -------------------------------------------------------------------------------- /tests/pdfcat.bats: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bats 2 | 3 | load includes 4 | 5 | @test "pdfcat: Common option --help" { 6 | output=$(pdfcat --help 2>&1) 7 | [[ ${output} =~ ' --help ' ]] 8 | [[ ${output} =~ ' --version ' ]] 9 | } 10 | 11 | @test "pdfcat: Common option -h" { 12 | output=$(pdfcat -h 2>&1) 13 | [[ ${output} =~ ' -h ' ]] 14 | [[ ${output} =~ ' -V ' ]] 15 | } 16 | 17 | @test "pdfcat: Common option --version" { 18 | pdfcat --version | grep -w ${PDFCAT_VERSION} 19 | } 20 | 21 | @test "pdfcat: Common option -V" { 22 | pdfcat -V | grep -w ${PDFCAT_VERSION} 23 | } 24 | 25 | @test "pdfcat: Concat two PDFs" { 26 | ::create-tempdir 27 | input_pdf=${FILES_DIR}/${SAMPLE_PDF} 28 | output_pdf=${TEMP_DIR}/output.pdf 29 | pdfcat "${input_pdf}" "${input_pdf}" > "${output_pdf}" 30 | ::pdf-info "${output_pdf}" 'Title' "Lorem ipsum" 31 | ::pdf-info "${output_pdf}" 'Producer' 'Wikisource' 32 | ::pdf-info "${output_pdf}" 'Pages' '10' 33 | ::pdf-info "${output_pdf}" 'Page size' 'A4' 34 | } 35 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Urs Roesch 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the 
rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /.github/workflows/create-docs.yml: -------------------------------------------------------------------------------- 1 | # ----------------------------------------------------------------------------- 2 | # Build documents for pdftools 3 | # Author: Urs Roesch https://github.com/uroesch 4 | # Version: 0.1.2 5 | # ----------------------------------------------------------------------------- 6 | name: create-docs 7 | 8 | on: 9 | push: 10 | branches: 11 | - workflow/* 12 | pull_request: 13 | branches: 14 | - master 15 | - main 16 | 17 | jobs: 18 | create-docs: 19 | timeout-minutes: 10 20 | runs-on: ubuntu-latest 21 | container: 22 | image: ${{ matrix.container }} 23 | strategy: 24 | matrix: 25 | container: 26 | - ubuntu:22.04 27 | - ubuntu:20.04 28 | - ubuntu:18.04 29 | - debian:12 30 | - debian:11 31 | - debian:10 32 | env: 33 | DEBIAN_FRONTEND: noninteractive 34 | 35 | steps: 36 | - name: Install dependencies 37 | shell: bash 38 | run: | 39 | apt update 40 | apt install -y make sudo 41 | 42 | - name: Checkout repository 43 | uses: 
actions/checkout@v3 44 | 45 | - name: Create documents 46 | shell: bash 47 | run: make docs 48 | -------------------------------------------------------------------------------- /tests/pdfresize.bats: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bats 2 | 3 | load includes 4 | 5 | @test "pdfresize: Common option --help" { 6 | output=$(pdfresize --help 2>&1) 7 | [[ ${output} =~ ' --help ' ]] 8 | [[ ${output} =~ ' --input ' ]] 9 | [[ ${output} =~ ' --output ' ]] 10 | [[ ${output} =~ ' --quality ' ]] 11 | [[ ${output} =~ ' --version ' ]] 12 | } 13 | 14 | @test "pdfresize: Common option -h" { 15 | output=$(pdfresize -h 2>&1) 16 | [[ ${output} =~ ' -h ' ]] 17 | [[ ${output} =~ ' -i ' ]] 18 | [[ ${output} =~ ' -o ' ]] 19 | [[ ${output} =~ ' -q ' ]] 20 | [[ ${output} =~ ' -V ' ]] 21 | } 22 | 23 | @test "pdfresize: Common option --version" { 24 | pdfresize --version | grep -w ${PDFRESIZE_VERSION} 25 | } 26 | 27 | @test "pdfresize: Common option -V" { 28 | pdfresize -V | grep -w ${PDFRESIZE_VERSION} 29 | } 30 | 31 | @test "pdfresize: Convert to default" { 32 | ::pdfresize default 33 | } 34 | 35 | @test "pdfresize: Convert to screen" { 36 | ::pdfresize screen 37 | } 38 | 39 | @test "pdfresize: Convert to ebook" { 40 | ::pdfresize ebook 41 | } 42 | 43 | @test "pdfresize: Convert to printer" { 44 | ::pdfresize printer 45 | } 46 | 47 | @test "pdfresize: Convert to prepress" { 48 | ::pdfresize prepress 49 | } 50 | -------------------------------------------------------------------------------- /tests/img2pdf.bats: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bats 2 | 3 | load includes 4 | 5 | @test "img2pdf: Common option --help" { 6 | output=$(img2pdf --help 2>&1) 7 | [[ ${output} =~ ' --delete ' ]] 8 | [[ ${output} =~ ' --help ' ]] 9 | [[ ${output} =~ ' --output ' ]] 10 | [[ ${output} =~ ' --rotate ' ]] 11 | [[ ${output} =~ ' --version ' ]] 12 | } 13 | 14 | @test 
"img2pdf: Common option -h" { 15 | output=$(img2pdf -h 2>&1) 16 | [[ ${output} =~ ' -d ' ]] 17 | [[ ${output} =~ ' -h ' ]] 18 | [[ ${output} =~ ' -o ' ]] 19 | [[ ${output} =~ ' -r ' ]] 20 | [[ ${output} =~ ' -V ' ]] 21 | } 22 | 23 | @test "img2pdf: Common option --version" { 24 | img2pdf --version | grep -w ${IMG2PDF_VERSION} 25 | } 26 | 27 | @test "img2pdf: Common option -V" { 28 | img2pdf -V | grep -w ${IMG2PDF_VERSION} 29 | } 30 | 31 | @test "img2pdf: Create pdf from png" { 32 | ::pdf-to-images png 33 | pdf=${TEMP_DIR}/img2pdf-from-png.pdf 34 | ::img2pdf "${pdf}" png 35 | ::is-pdf "${pdf}" 36 | ::cleanup-tempdir 37 | } 38 | 39 | @test "img2pdf: Create pdf from tiff" { 40 | ::pdf-to-images tiff 41 | pdf=${TEMP_DIR}/img2pdf-from-tiff.pdf 42 | ::img2pdf "${pdf}" tif 43 | ::is-pdf "${pdf}" 44 | ::cleanup-tempdir 45 | } 46 | 47 | @test "img2pdf: Create pdf from jpeg" { 48 | ::pdf-to-images jpeg 49 | pdf=${TEMP_DIR}/img2pdf-from-jpeg.pdf 50 | ::img2pdf "${pdf}" jpg 51 | ::is-pdf "${pdf}" 52 | ::cleanup-tempdir 53 | } 54 | -------------------------------------------------------------------------------- /tests/scan2jpg.bats: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bats 2 | 3 | load includes 4 | 5 | @test "scan2jpg: Common option --help" { 6 | output=$(scan2jpg --help 2>&1) 7 | [[ ${output} =~ ' --help ' ]] 8 | [[ ${output} =~ ' --interactive ' ]] 9 | [[ ${output} =~ ' --type ' ]] 10 | [[ ${output} =~ ' --resolution ' ]] 11 | [[ ${output} =~ ' --page ' ]] 12 | [[ ${output} =~ ' --depth ' ]] 13 | [[ ${output} =~ ' --format ' ]] 14 | [[ ${output} =~ ' --quality ' ]] 15 | [[ ${output} =~ ' --mode ' ]] 16 | [[ ${output} =~ ' --ocr ' ]] 17 | [[ ${output} =~ ' --ocr-lang ' ]] 18 | [[ ${output} =~ ' --output ' ]] 19 | [[ ${output} =~ ' --orientation ' ]] 20 | [[ ${output} =~ ' --scanner ' ]] 21 | [[ ${output} =~ ' --version ' ]] 22 | [[ ${output} =~ ' --help ' ]] 23 | } 24 | 25 | @test "scan2jpg: Common option -h" { 26 
| output=$(scan2jpg -h 2>&1) 27 | [[ ${output} =~ ' -h ' ]] 28 | [[ ${output} =~ ' -I ' ]] 29 | [[ ${output} =~ ' -t ' ]] 30 | [[ ${output} =~ ' -r ' ]] 31 | [[ ${output} =~ ' -p ' ]] 32 | [[ ${output} =~ ' -d ' ]] 33 | [[ ${output} =~ ' -f ' ]] 34 | [[ ${output} =~ ' -q ' ]] 35 | [[ ${output} =~ ' -m ' ]] 36 | [[ ${output} =~ ' -R ' ]] 37 | [[ ${output} =~ ' -L ' ]] 38 | [[ ${output} =~ ' -o ' ]] 39 | [[ ${output} =~ ' -O ' ]] 40 | [[ ${output} =~ ' -s ' ]] 41 | [[ ${output} =~ ' -V ' ]] 42 | } 43 | 44 | @test "scan2jpg: Common option --version" { 45 | scan2jpg --version | grep -w ${SCAN2JPG_VERSION} 46 | } 47 | 48 | @test "scan2jpg: Common option -V" { 49 | scan2jpg -V | grep -w ${SCAN2JPG_VERSION} 50 | } 51 | -------------------------------------------------------------------------------- /tests/scan2pdf.bats: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bats 2 | 3 | load includes 4 | 5 | @test "scan2pdf: Common option --help" { 6 | output=$(scan2pdf --help 2>&1) 7 | [[ ${output} =~ ' --help ' ]] 8 | [[ ${output} =~ ' --interactive ' ]] 9 | [[ ${output} =~ ' --type ' ]] 10 | [[ ${output} =~ ' --resolution ' ]] 11 | [[ ${output} =~ ' --page ' ]] 12 | [[ ${output} =~ ' --depth ' ]] 13 | [[ ${output} =~ ' --format ' ]] 14 | [[ ${output} =~ ' --quality ' ]] 15 | [[ ${output} =~ ' --mode ' ]] 16 | [[ ${output} =~ ' --ocr ' ]] 17 | [[ ${output} =~ ' --ocr-lang ' ]] 18 | [[ ${output} =~ ' --output ' ]] 19 | [[ ${output} =~ ' --orientation ' ]] 20 | [[ ${output} =~ ' --scanner ' ]] 21 | [[ ${output} =~ ' --version ' ]] 22 | [[ ${output} =~ ' --help ' ]] 23 | } 24 | 25 | @test "scan2pdf: Common option -h" { 26 | output=$(scan2pdf -h 2>&1) 27 | [[ ${output} =~ ' -h ' ]] 28 | [[ ${output} =~ ' -I ' ]] 29 | [[ ${output} =~ ' -t ' ]] 30 | [[ ${output} =~ ' -r ' ]] 31 | [[ ${output} =~ ' -p ' ]] 32 | [[ ${output} =~ ' -d ' ]] 33 | [[ ${output} =~ ' -f ' ]] 34 | [[ ${output} =~ ' -q ' ]] 35 | [[ ${output} =~ ' -m ' ]] 36 | 
[[ ${output} =~ ' -R ' ]] 37 | [[ ${output} =~ ' -L ' ]] 38 | [[ ${output} =~ ' -o ' ]] 39 | [[ ${output} =~ ' -O ' ]] 40 | [[ ${output} =~ ' -s ' ]] 41 | [[ ${output} =~ ' -V ' ]] 42 | } 43 | 44 | @test "scan2pdf: Common option --version" { 45 | scan2pdf --version | grep -w ${SCAN2PDF_VERSION} 46 | } 47 | 48 | @test "scan2pdf: Common option -V" { 49 | scan2pdf -V | grep -w ${SCAN2PDF_VERSION} 50 | } 51 | -------------------------------------------------------------------------------- /tests/scan2png.bats: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bats 2 | 3 | load includes 4 | 5 | @test "scan2png: Common option --help" { 6 | output=$(scan2png --help 2>&1) 7 | [[ ${output} =~ ' --help ' ]] 8 | [[ ${output} =~ ' --interactive ' ]] 9 | [[ ${output} =~ ' --type ' ]] 10 | [[ ${output} =~ ' --resolution ' ]] 11 | [[ ${output} =~ ' --page ' ]] 12 | [[ ${output} =~ ' --depth ' ]] 13 | [[ ${output} =~ ' --format ' ]] 14 | [[ ${output} =~ ' --quality ' ]] 15 | [[ ${output} =~ ' --mode ' ]] 16 | [[ ${output} =~ ' --ocr ' ]] 17 | [[ ${output} =~ ' --ocr-lang ' ]] 18 | [[ ${output} =~ ' --output ' ]] 19 | [[ ${output} =~ ' --orientation ' ]] 20 | [[ ${output} =~ ' --scanner ' ]] 21 | [[ ${output} =~ ' --version ' ]] 22 | [[ ${output} =~ ' --help ' ]] 23 | } 24 | 25 | @test "scan2png: Common option -h" { 26 | output=$(scan2png -h 2>&1) 27 | [[ ${output} =~ ' -h ' ]] 28 | [[ ${output} =~ ' -I ' ]] 29 | [[ ${output} =~ ' -t ' ]] 30 | [[ ${output} =~ ' -r ' ]] 31 | [[ ${output} =~ ' -p ' ]] 32 | [[ ${output} =~ ' -d ' ]] 33 | [[ ${output} =~ ' -f ' ]] 34 | [[ ${output} =~ ' -q ' ]] 35 | [[ ${output} =~ ' -m ' ]] 36 | [[ ${output} =~ ' -R ' ]] 37 | [[ ${output} =~ ' -L ' ]] 38 | [[ ${output} =~ ' -o ' ]] 39 | [[ ${output} =~ ' -O ' ]] 40 | [[ ${output} =~ ' -s ' ]] 41 | [[ ${output} =~ ' -V ' ]] 42 | } 43 | 44 | @test "scan2png: Common option --version" { 45 | scan2png --version | grep -w ${SCAN2PNG_VERSION} 46 | } 47 | 48 | 
@test "scan2png: Common option -V" { 49 | scan2png -V | grep -w ${SCAN2PNG_VERSION} 50 | } 51 | -------------------------------------------------------------------------------- /tests/ocrpdf.bats: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bats 2 | 3 | load includes 4 | 5 | @test "ocrpdf: Common option --help" { 6 | output=$(ocrpdf --help 2>&1) 7 | [[ ${output} =~ ' --help ' ]] 8 | [[ ${output} =~ ' --lang ' ]] 9 | [[ ${output} =~ ' --quiet ' ]] 10 | [[ ${output} =~ ' --recompress ' ]] 11 | [[ ${output} =~ ' --version ' ]] 12 | } 13 | 14 | @test "ocrpdf: Common option -h" { 15 | output=$(ocrpdf -h 2>&1) 16 | [[ ${output} =~ ' -h ' ]] 17 | [[ ${output} =~ ' -l ' ]] 18 | [[ ${output} =~ ' -q ' ]] 19 | [[ ${output} =~ ' -R ' ]] 20 | [[ ${output} =~ ' -V ' ]] 21 | } 22 | 23 | @test "ocrpdf: Common option --version" { 24 | ocrpdf --version | grep -w ${OCRPDF_VERSION} 25 | } 26 | 27 | @test "ocrpdf: Common option -V" { 28 | ocrpdf -V | grep -w ${OCRPDF_VERSION} 29 | } 30 | 31 | @test "ocrpdf: OCR pdf with png base" { 32 | ::pdf-to-images png 33 | pdf=${TEMP_DIR}/ocrpdf-png.pdf 34 | ::img2pdf "${pdf}" png 35 | ocrpdf -q "${pdf}" 36 | ::is-pdf "${pdf}" 37 | ::pdf-to-text "${pdf}" "Lorem ipsum" 10 38 | ::cleanup-tempdir 39 | } 40 | 41 | @test "ocrpdf: OCR pdf with tiff base" { 42 | ::pdf-to-images tiff 43 | pdf=${TEMP_DIR}/ocrpdf-tiff.pdf 44 | ::img2pdf "${pdf}" tif 45 | ocrpdf -q "${pdf}" 46 | ::is-pdf "${pdf}" 47 | ::pdf-to-text "${pdf}" "Lorem ipsum" 10 48 | ::cleanup-tempdir 49 | } 50 | 51 | @test "ocrpdf: OCR pdf with jpeg base" { 52 | ::pdf-to-images jpeg 53 | pdf=${TEMP_DIR}/ocrpdf-jpeg.pdf 54 | ::img2pdf "${pdf}" jpg 55 | ocrpdf -q "${pdf}" 56 | ::is-pdf "${pdf}" 57 | ::pdf-to-text "${pdf}" "Lorem ipsum" 10 58 | ::cleanup-tempdir 59 | } 60 | -------------------------------------------------------------------------------- /.github/workflows/bash-compatibility.yml: 
-------------------------------------------------------------------------------- 1 | # ----------------------------------------------------------------------------- 2 | # Verify bash compatibility 3 | # Author: Urs Roesch https://github.com/uroesch 4 | # Version: 0.2.0 5 | # ----------------------------------------------------------------------------- 6 | name: bash-compatibility 7 | 8 | on: 9 | push: 10 | branches: 11 | - workflow/* 12 | pull_request: 13 | branches: 14 | - master 15 | - main 16 | 17 | jobs: 18 | bash-compatibility: 19 | timeout-minutes: 15 20 | runs-on: ubuntu-latest 21 | container: 22 | image: bash:${{ matrix.bash }} 23 | strategy: 24 | fail-fast: false 25 | matrix: 26 | bash: 27 | - '4.0' 28 | - '4.1' 29 | - '4.2' 30 | - '4.3' 31 | - '4.4' 32 | - '5.0' 33 | - '5.1' 34 | - '5.2' 35 | env: 36 | PDFTK_URL: https://gitlab.com/pdftk-java/pdftk/-/package_files/53763921/download 37 | 38 | steps: 39 | - name: Install apk dependencies 40 | shell: bash 41 | run: | 42 | apk add \ 43 | bats \ 44 | curl \ 45 | file \ 46 | git \ 47 | git-lfs \ 48 | grep \ 49 | imagemagick \ 50 | make \ 51 | openjdk11-jre-headless \ 52 | poppler-utils \ 53 | sane-utils \ 54 | tesseract-ocr \ 55 | tesseract-ocr-data-deu \ 56 | tesseract-ocr-data-eng \ 57 | tesseract-ocr-data-fra \ 58 | tesseract-ocr-data-ita 59 | 60 | - name: Install pdftk 61 | shell: bash 62 | run: > 63 | curl -sJL -o /usr/lib/pdftk-all.jar "${PDFTK_URL}" && 64 | printf "#!/usr/bin/env bash\njava -jar /usr/lib/pdftk-all.jar \"\$@\"" > /usr/bin/pdftk && 65 | chmod 755 /usr/bin/pdftk 66 | 67 | - name: Checkout repository 68 | uses: actions/checkout@v3 69 | with: 70 | lfs: true 71 | 72 | - name: Check bash compatibility 73 | shell: bash 74 | run: make test 75 | -------------------------------------------------------------------------------- /tests/pdf2pdfa.bats: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bats 2 | 3 | load includes 4 | 5 | @test "pdf2pdfa: 
Common option --help" { 6 | output=$(pdf2pdfa --help 2>&1) 7 | [[ ${output} =~ ' --color-model ' ]] 8 | [[ ${output} =~ ' --help ' ]] 9 | [[ ${output} =~ ' --level ' ]] 10 | [[ ${output} =~ ' --strict ' ]] 11 | [[ ${output} =~ ' --suffix ' ]] 12 | [[ ${output} =~ ' --version ' ]] 13 | } 14 | 15 | @test "pdf2pdfa: Common option -h" { 16 | output=$(pdf2pdfa -h 2>&1) 17 | [[ ${output} =~ ' -c ' ]] 18 | [[ ${output} =~ ' -h ' ]] 19 | [[ ${output} =~ ' -l ' ]] 20 | [[ ${output} =~ ' -S ' ]] 21 | [[ ${output} =~ ' -s ' ]] 22 | [[ ${output} =~ ' -V ' ]] 23 | } 24 | 25 | @test "pdf2pdfa: Common option --version" { 26 | pdf2pdfa --version | grep -w ${PDF2PDFA_VERSION} 27 | } 28 | 29 | @test "pdf2pdfa: Common option -V" { 30 | pdf2pdfa -V | grep -w ${PDF2PDFA_VERSION} 31 | } 32 | 33 | @test "pdf2pdfa: Convert to PDF/A-1" { 34 | # gs 9.26 generates pdf version 1.7 instead of 1.4 35 | (( ${GS_VERSION} < 950 )) && skip 36 | ::copy-sample-pdf 37 | pdf="${TEMP_DIR}/${SAMPLE_PDF}" 38 | pdf_a="${pdf//.pdf/_a1.pdf}" 39 | pdf2pdfa --suffix _a1 --level 1 "${pdf}" 40 | ::pdf-info "${pdf_a}" 'Producer' 'GPL Ghostscript' 41 | ::pdf-info "${pdf_a}" 'PDF version' '1.4' 42 | ::pdf-info "${pdf_a}" 'Pages' '5' 43 | } 44 | 45 | @test "pdf2pdfa: Convert to PDF/A-2" { 46 | # gs 9.50 does throw errors in level 2 47 | (( ${GS_VERSION} < 955 )) && skip 48 | ::copy-sample-pdf 49 | pdf="${TEMP_DIR}/${SAMPLE_PDF}" 50 | pdf_a="${pdf//.pdf/_a2.pdf}" 51 | # strict doesn't work on gs 9.50 52 | pdf2pdfa --suffix _a2 --level 2 --strict "${pdf}" 53 | ::pdf-info "${pdf_a}" 'Producer' 'GPL Ghostscript' 54 | ::pdf-info "${pdf_a}" 'PDF version' '1.7' 55 | ::pdf-info "${pdf_a}" 'Pages' '5' 56 | } 57 | 58 | @test "pdf2pdfa: Convert to PDF/A-3" { 59 | ::copy-sample-pdf 60 | pdf="${TEMP_DIR}/${SAMPLE_PDF}" 61 | pdf_a="${pdf//.pdf/_a3.pdf}" 62 | pdf2pdfa --suffix _a3 --level 3 "${pdf}" 63 | ::pdf-info "${pdf_a}" 'Producer' 'GPL Ghostscript' 64 | ::pdf-info "${pdf_a}" 'PDF version' '1.7' 65 | ::pdf-info "${pdf_a}" 
'Pages' '5' 66 | } 67 | -------------------------------------------------------------------------------- /tests/pdfmeta.bats: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bats 2 | 3 | load includes 4 | 5 | @test "pdfmeta: Common option --help" { 6 | output=$(pdfmeta --help 2>&1) 7 | [[ ${output} =~ ' --help ' ]] 8 | [[ ${output} =~ ' --keywords ' ]] 9 | [[ ${output} =~ ' --subject ' ]] 10 | [[ ${output} =~ ' --title ' ]] 11 | [[ ${output} =~ ' --creator ' ]] 12 | [[ ${output} =~ ' --producer ' ]] 13 | [[ ${output} =~ ' --creation-date ' ]] 14 | [[ ${output} =~ ' --modification-date ' ]] 15 | [[ ${output} =~ ' --version ' ]] 16 | } 17 | 18 | @test "pdfmeta: Common option -h" { 19 | output=$(pdfmeta -h 2>&1) 20 | [[ ${output} =~ ' -h ' ]] 21 | [[ ${output} =~ ' -k ' ]] 22 | [[ ${output} =~ ' -s ' ]] 23 | [[ ${output} =~ ' -t ' ]] 24 | [[ ${output} =~ ' -c ' ]] 25 | [[ ${output} =~ ' -p ' ]] 26 | [[ ${output} =~ ' -C ' ]] 27 | [[ ${output} =~ ' -M ' ]] 28 | [[ ${output} =~ ' -V ' ]] 29 | } 30 | 31 | @test "pdfmeta: Common option --version" { 32 | pdfmeta --version | grep -w ${PDFMETA_VERSION} 33 | } 34 | 35 | @test "pdfmeta: Common option -V" { 36 | pdfmeta -V | grep -w ${PDFMETA_VERSION} 37 | } 38 | 39 | @test "pdfmeta: Change PDF creator" { 40 | ::copy-sample-pdf 41 | pdf="${TEMP_DIR}/${SAMPLE_PDF}" 42 | pdfmeta --creator 'Batman' "${pdf}" 43 | ::pdf-info "${pdf}" 'Creator' 'Batman' 44 | } 45 | 46 | @test "pdfmeta: Change PDF creation date" { 47 | ::copy-sample-pdf 48 | pdf="${TEMP_DIR}/${SAMPLE_PDF}" 49 | pdfmeta --creation-date '2022-12-06 01:02:03' "${pdf}" 50 | ::pdf-info "${pdf}" 'CreationDate' 'Tue Dec 6 01:02:03 2022 UTC' 51 | } 52 | 53 | @test "pdfmeta: Change PDF modification date" { 54 | ::copy-sample-pdf 55 | pdf="${TEMP_DIR}/${SAMPLE_PDF}" 56 | pdfmeta --modification-date '2022-12-06 11:22:33' "${pdf}" 57 | ::pdf-info "${pdf}" 'ModDate' 'Tue Dec 6 11:22:33 2022 UTC' 58 | } 59 | 60 | @test "pdfmeta: 
Change PDF keywords" { 61 | ::copy-sample-pdf 62 | pdf="${TEMP_DIR}/${SAMPLE_PDF}" 63 | pdfmeta --keywords 'bats, test' "${pdf}" 64 | ::pdf-info "${pdf}" 'Keywords' 'bats, test' 65 | } 66 | 67 | @test "pdfmeta: Change PDF producer" { 68 | ::copy-sample-pdf 69 | pdf="${TEMP_DIR}/${SAMPLE_PDF}" 70 | pdfmeta --producer 'Bats producer' "${pdf}" 71 | ::pdf-info "${pdf}" 'Producer' 'Bats producer' 72 | } 73 | 74 | @test "pdfmeta: Change PDF subject" { 75 | ::copy-sample-pdf 76 | pdf="${TEMP_DIR}/${SAMPLE_PDF}" 77 | pdfmeta --subject 'Bats subject' "${pdf}" 78 | ::pdf-info "${pdf}" 'Subject' 'Bats subject' 79 | } 80 | 81 | @test "pdfmeta: Change PDF title" { 82 | ::copy-sample-pdf 83 | pdf="${TEMP_DIR}/${SAMPLE_PDF}" 84 | pdfmeta --title 'Bats title' "${pdf}" 85 | ::pdf-info "${pdf}" 'Title' 'Bats title' 86 | } 87 | -------------------------------------------------------------------------------- /tests/includes.bash: -------------------------------------------------------------------------------- 1 | # include file for bats tests 2 | 3 | trap ::cleanup EXIT 4 | 5 | # ----------------------------------------------------------------------------- 6 | # Globals 7 | # ----------------------------------------------------------------------------- 8 | export PATH=${BATS_TEST_DIRNAME}/../bin:${PATH} 9 | export IMG2PDF_VERSION="v0.2.6" 10 | export OCRPDF_VERSION="v0.5.0" 11 | export PDFCAT_VERSION="v0.3.0" 12 | export PDF2PDFA_VERSION="v0.2.0" 13 | export PDFMETA_VERSION="v0.1.5" 14 | export PDFRESIZE_VERSION="v0.0.5" 15 | export SCAN2PDF_VERSION="v0.6.0" 16 | export SCAN2JPG_VERSION="v0.6.0" 17 | export SCAN2PNG_VERSION="v0.6.0" 18 | export GS_VERSION=$(gs --version | awk -F . 
'{ print $1$2 }') 19 | export FILES_DIR=${BATS_TEST_DIRNAME}/files 20 | export SAMPLE_PDF=lorem-ipsum.pdf 21 | export BASE_TEMP_DIR="${HOME}/tmp" 22 | export TEMP_PREFIX=pdftools-test 23 | export TZ=UTC 24 | 25 | # ----------------------------------------------------------------------------- 26 | # Functions 27 | # ----------------------------------------------------------------------------- 28 | function ::cleanup() { 29 | rm -rf ${BASE_TEMP_DIR}/${TEMP_PREFIX}-[0-9]* 30 | } 31 | 32 | function ::create-tempdir() { 33 | local prefix=${1:-pdftools-test}; 34 | local temp_dir=${BASE_TEMP_DIR}/${TEMP_PREFIX}-$$ 35 | [[ -d ${temp_dir} ]] || mkdir -p ${temp_dir} && : 36 | export TEMP_DIR="${temp_dir}" 37 | } 38 | 39 | function ::cleanup-tempdir() { 40 | [[ -d ${TEMP_DIR} ]] && rm -rf ${TEMP_DIR} || : 41 | } 42 | 43 | function ::is-pdf() { 44 | local pdf=${1} 45 | file ${pdf} | grep -q 'PDF document' 2>/dev/null 46 | } 47 | 48 | function ::copy-sample-pdf() { 49 | ::create-tempdir 50 | cp ${FILES_DIR}/${SAMPLE_PDF} ${TEMP_DIR} 51 | } 52 | 53 | function ::pdf-to-images() { 54 | local format=${1}; shift; 55 | local resolution=${1:-} 56 | ::create-tempdir 57 | pdftocairo \ 58 | -${format} \ 59 | ${resolution:+-r ${resolution}} \ 60 | ${FILES_DIR}/${SAMPLE_PDF} \ 61 | ${TEMP_DIR}/sample-image 62 | } 63 | 64 | function ::pdf-to-text() { 65 | local pdf=${1}; shift; 66 | local pattern=${1}; shift; 67 | local occurrence=${1}; shift; 68 | pdftotext "${pdf}" 69 | count=$(grep -c "${pattern}" ${pdf%%.*}.txt 2>/dev/null) 70 | (( count == occurrence )) 71 | } 72 | 73 | function ::img2pdf() { 74 | local output=${1}; shift; 75 | local source=${1}; shift; 76 | img2pdf \ 77 | --output ${output} \ 78 | $(find ${TEMP_DIR} -type f -name "*.${source}") 79 | } 80 | 81 | function ::pdf-size() { 82 | local original="${1}"; shift; 83 | local resized="${1}"; shift; 84 | ::is-pdf "${resized}" 85 | local size_before=$(du -b "${original}" | cut -f 1 ) 86 | local size_after=$(du -b "${resized}" | cut 
-f 1 ) 87 | (( size_before > size_after )) 88 | } 89 | 90 | function ::pdfresize() { 91 | ::create-tempdir 92 | local quality="${1}" 93 | local pdf="${TEMP_DIR}/pdfresize-${quality}.pdf" 94 | pdfresize \ 95 | --quality ${quality} \ 96 | --input "${FILES_DIR}/${SAMPLE_PDF}" \ 97 | --output "${pdf}" 98 | ::pdf-size "${FILES_DIR}/${SAMPLE_PDF}" "${pdf}" 99 | } 100 | 101 | function ::pdf-info() { 102 | local pdf="${1}"; shift; 103 | local key="${1}"; shift; 104 | local value="${1}"; shift; 105 | LC_ALL=C TZ=UTC pdfinfo "${pdf}" | grep "${key^}:.*${value}" 106 | } 107 | -------------------------------------------------------------------------------- /bin/pdfcat: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # ------------------------------------------------------------------------------ 4 | # A quick hack to replace pdfunite as it destroys too much of the original's 5 | # meta data. 6 | # ------------------------------------------------------------------------------ 7 | 8 | # ------------------------------------------------------------------------------ 9 | # Author: Urs Roesch 10 | # License: MIT 11 | # Requires: bash, pdftk >= 2.0 12 | # ------------------------------------------------------------------------------ 13 | 14 | # ------------------------------------------------------------------------------ 15 | # Globals 16 | # ------------------------------------------------------------------------------ 17 | declare -r PDFCAT_VERSION=0.3.0 18 | declare -r PDFCAT_SCRIPT=${0##*/} 19 | declare -r PDFCAT_AUTHOR="Urs Roesch " 20 | declare -r PDFCAT_LICENSE="MIT" 21 | declare -r PRODUCER="pdfcat ${PDFCAT_VERSION}" 22 | declare -r ARGUMENTS=$# 23 | declare -r REQUIRE_PDFTK=2 24 | 25 | # ------------------------------------------------------------------------------ 26 | # Functions 27 | # ------------------------------------------------------------------------------ 28 | 29 | function pdftk_major() { 30 | 
pdftk --version | \ 31 | awk '/^pdftk/ { print int(gensub("[^0-9.]", "", "g", $0)); exit }' 32 | } 33 | 34 | # ------------------------------------------------------------------------------ 35 | 36 | function pdfcat::check_compatibility() { 37 | if ! command -v pdftk &>/dev/null; then 38 | echo "Required binary 'pdftk' was not found on this system" 1>&2 39 | exit 127 40 | else 41 | if [[ $(pdftk_major) -lt ${REQUIRE_PDFTK} ]]; then 42 | echo "Pdftk version ${REQUIRE_PDFTK} or higher is required!" 1>&2 43 | exit 128 44 | fi 45 | fi 46 | } 47 | 48 | # ------------------------------------------------------------------------------ 49 | 50 | function pdfcat::usage() { 51 | local exit_code=$1; shift; 52 | { 53 | echo " Usage:"; 54 | echo " ${PDFCAT_SCRIPT} [] [..]"; 55 | echo ""; 56 | echo " Options:"; 57 | echo " -h | --help This message."; 58 | echo " -V | --version Print version and exit."; 59 | echo ""; 60 | exit ${exit_code}; 61 | } 1>&2 62 | } 63 | 64 | # ------------------------------------------------------------------------------ 65 | 66 | function pdfcat::version() { 67 | printf "%s v%s\nCopyright (c) %s\nLicense - %s\n" \ 68 | "${PDFCAT_SCRIPT}" \ 69 | "${PDFCAT_VERSION}" \ 70 | "${PDFCAT_AUTHOR}" \ 71 | "${PDFCAT_LICENSE}" 72 | exit 0 73 | } 74 | 75 | # ------------------------------------------------------------------------------ 76 | 77 | function pdfcat::parse_options() { 78 | while (( ${#} > 0 )); do 79 | case ${1} in 80 | -h|--help) pdfcat::usage 0;; 81 | -V|--version) pdfcat::version;; 82 | esac 83 | shift 84 | done 85 | } 86 | 87 | # ------------------------------------------------------------------------------ 88 | 89 | function pdfcat::fetch_meta() { 90 | local file="$1"; shift; 91 | pdftk "${file}" dump_data_utf8 output - | grep ^Info 2>/dev/null 92 | } 93 | 94 | # ------------------------------------------------------------------------------ 95 | 96 | function pdfcat::concat_pdf() { 97 | [[ $# -eq 0 ]] && return 98 | local meta=$( 
pdfcat::fetch_meta "$1" ) 99 | pdftk "$@" cat output - | pdfcat::write_meta "${meta}" 2>/dev/null 100 | } 101 | 102 | # ------------------------------------------------------------------------------ 103 | 104 | function pdfcat::write_meta() { 105 | local meta="$1"; shift; 106 | pdftk - update_info_utf8 <( echo "${meta}" ) output - 107 | } 108 | 109 | # ------------------------------------------------------------------------------ 110 | # Main 111 | # ------------------------------------------------------------------------------ 112 | pdfcat::check_compatibility 113 | 114 | if [[ $0 == ${BASH_SOURCE} && ${ARGUMENTS} -lt 2 ]]; then 115 | pdfcat::parse_options "${@}" 116 | pdfcat::usage 1; 117 | fi 118 | 119 | pdfcat::concat_pdf "$@" 120 | 121 | 122 | 123 | -------------------------------------------------------------------------------- /.github/workflows/os-compatibilty.yml: -------------------------------------------------------------------------------- 1 | # ----------------------------------------------------------------------------- 2 | # Test pdftools 3 | # Author: Urs Roesch https://github.com/uroesch 4 | # Version: 0.3.0 5 | # ----------------------------------------------------------------------------- 6 | name: os-compatibility 7 | 8 | on: 9 | push: 10 | branches: 11 | - workflow/* 12 | pull_request: 13 | branches: 14 | - master 15 | - main 16 | 17 | env: 18 | PDFTK_URL: https://gitlab.com/pdftk-java/pdftk/-/jobs/812582458/artifacts/raw/build/libs/pdftk-all.jar?inline=false 19 | 20 | jobs: 21 | debian: 22 | timeout-minutes: 10 23 | runs-on: ubuntu-latest 24 | container: 25 | image: ${{ matrix.name }}:${{ matrix.release }} 26 | strategy: 27 | matrix: 28 | include: 29 | - { name: ubuntu, release: 22.04 } 30 | - { name: ubuntu, release: 20.04 } 31 | - { name: ubuntu, release: 18.04 } 32 | # - { name: ubuntu, release: 16.04 } 33 | - { name: debian, release: 12 } 34 | - { name: debian, release: 11 } 35 | - { name: debian, release: 10 } 36 | 37 | env: 38 | 
DEBIAN_FRONTEND: noninteractive 39 | 40 | steps: 41 | - name: Install dependencies other than pdftk 42 | shell: bash 43 | run: > 44 | apt-get update; 45 | apt-get -y install 46 | bats 47 | curl 48 | file 49 | gawk 50 | ghostscript 51 | git 52 | imagemagick 53 | make 54 | poppler-utils 55 | sane-utils 56 | tesseract-ocr 57 | ; 58 | 59 | - name: Install pdftk on non Ubuntu 18.04 60 | if: matrix.release != '18.04' 61 | shell: bash 62 | run: apt-get -y install pdftk 63 | 64 | - name: Install pdftk on Ubuntu 18.04 65 | if: > 66 | matrix.name == 'ubuntu' && 67 | matrix.release == '18.04' 68 | shell: bash 69 | run: | 70 | curl -sJL -o /usr/lib/pdftk-all.jar "${PDFTK_URL}" && \ 71 | printf "%s\n" \ 72 | '#!/usr/bin/env sh' \ 73 | 'java -cp /usr/lib/pdftk-all.jar com.gitlab.pdftk_java.pdftk "$@"' \ 74 | > /usr/bin/pdftk 75 | chmod 755 /usr/bin/pdftk 76 | apt install -y openjdk-11-jre-headless 77 | 78 | - name: Adjust ImageMagick configuration 79 | shell: bash 80 | run: > 81 | test -f /etc/ImageMagick-6/policy.xml && 82 | sed -i '/pattern="PDF"/s/rights="none"/rights="read\|write"/' 83 | /etc/ImageMagick-6/policy.xml 84 | 85 | - name: Checkout repository 86 | uses: actions/checkout@v3 87 | 88 | - name: Test pdftools functionality 89 | shell: bash 90 | run: | 91 | export PATH=${GITHUB_WORKSPACE}/bin:${PATH} 92 | make test 93 | 94 | 95 | redhat: 96 | timeout-minutes: 10 97 | runs-on: ubuntu-latest 98 | container: 99 | image: ${{ matrix.name }}:${{ matrix.release }} 100 | strategy: 101 | matrix: 102 | include: 103 | #- { name: centos, release: 7 } 104 | - { name: rockylinux, release: 8 } 105 | #- { name: rockylinux, release: 9 } 106 | 107 | steps: 108 | - name: Install dependencies other than pdftk 109 | shell: bash 110 | run: > 111 | yum -y install epel-release; 112 | yum -y --allowerasing install 113 | ImageMagick 114 | bats 115 | coreutils 116 | curl 117 | file 118 | gawk 119 | ghostscript 120 | git 121 | java-11-openjdk-headless 122 | make 123 | poppler-utils 124 | 
sane-frontends 125 | tesseract 126 | ; 127 | 128 | - name: Install pdftk 129 | shell: bash 130 | run: | 131 | curl -sJL -o /usr/lib/pdftk-all.jar "${PDFTK_URL}" && \ 132 | printf "%s\n" \ 133 | '#!/usr/bin/env sh' \ 134 | 'java -cp /usr/lib/pdftk-all.jar com.gitlab.pdftk_java.pdftk "$@"' \ 135 | > /usr/bin/pdftk 136 | chmod 755 /usr/bin/pdftk 137 | 138 | - name: Adjust ImageMagick configuration 139 | shell: bash 140 | run: > 141 | test -f /etc/ImageMagick-6/policy.xml && 142 | sed -i '/pattern="PDF"/s/rights="none"/rights="read\|write"/' 143 | /etc/ImageMagick-6/policy.xml 144 | 145 | - name: Checkout repository 146 | uses: actions/checkout@v3 147 | 148 | - name: Test pdftools functionality 149 | shell: bash 150 | run: | 151 | export PATH=${GITHUB_WORKSPACE}/bin:${PATH} 152 | #bats tests/*bats 153 | make test 154 | -------------------------------------------------------------------------------- /bin/pdfresize: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | #------------------------------------------------------------------------------ 4 | # A wrapper around ghostscript to reduce the size of a scanned document 5 | #------------------------------------------------------------------------------ 6 | 7 | # ----------------------------------------------------------------------------- 8 | # Author: Urs Roesch 9 | # License: MIT 10 | # Requires: bash, ghostscript 11 | # ----------------------------------------------------------------------------- 12 | 13 | # ----------------------------------------------------------------------------- 14 | # Setup 15 | # ----------------------------------------------------------------------------- 16 | set -o errexit 17 | set -o nounset 18 | set -o pipefail 19 | 20 | # ----------------------------------------------------------------------------- 21 | # Globals 22 | # ----------------------------------------------------------------------------- 23 | declare -r VERSION=0.0.5
24 | declare -r AUTHOR="Urs Roesch " 25 | declare -r LICENSE="MIT" 26 | declare -r SCRIPT=${0##*/} 27 | declare SETTINGS=default 28 | declare INPUT="" 29 | declare OUTPUT="" 30 | declare -a DEPENDENCIES=( gs ) 31 | declare -a KEYWORDS=( 32 | default 33 | ebook 34 | prepress 35 | printer 36 | screen 37 | ) 38 | 39 | # ----------------------------------------------------------------------------- 40 | # Functions 41 | # ----------------------------------------------------------------------------- 42 | 43 | function check_dependencies() { 44 | for dependency in ${DEPENDENCIES[*]}; do 45 | if ! command -v ${dependency} >/dev/null 2>&1; then 46 | echo "Missing dependency '${dependency}'" 1>&2 47 | echo "Please install first!" 1>&2 48 | exit 64 49 | fi 50 | done 51 | } 52 | 53 | # ----------------------------------------------------------------------------- 54 | 55 | function version() { 56 | printf "%s v%s\nCopyright (c) %s\nLicense - %s\n" \ 57 | "${SCRIPT}" "${VERSION}" "${AUTHOR}" "${LICENSE}" 58 | exit 0 59 | } 60 | 61 | # ----------------------------------------------------------------------------- 62 | 63 | function usage() { 64 | local exit_code=${1:-1} 65 | cat < -o 69 | 70 | Options: 71 | -h | --help This message 72 | -i | --input A PDF file preferably of high resolution 73 | -o | --output Name of the PDF file to save the result to 74 | -q | --quality Quality settings for output PDF. 75 | See quality keywords for acceptable input. 76 | -V | --version Print version and exit. 
77 | 78 | Quality keywords: 79 | screen - low-resolution; comparable to "Screen Optimized" in Acrobat Distiller 80 | ebook - medium-resolution; comparable to "eBook" in Acrobat Distiller 81 | printer - comparable to "Print Optimized" in Acrobat Distiller 82 | prepress - comparable to "Prepress Optimized" in Acrobat Distiller 83 | default - intended to be useful across a wide variety of uses 84 | 85 | USAGE 86 | exit ${exit_code} 87 | } 88 | 89 | # ----------------------------------------------------------------------------- 90 | 91 | function resample_pdf() { 92 | gs \ 93 | -q \ 94 | -dNOPAUSE \ 95 | -dBATCH \ 96 | -dPrinted=false \ 97 | -dCompatibilityLevel=1.4 \ 98 | -dPDFSETTINGS=/${SETTINGS} \ 99 | -sDEVICE=pdfwrite \ 100 | -sOutputFile="${OUTPUT}" \ 101 | "${INPUT}" \ 102 | 1>/dev/null 103 | } 104 | 105 | # ----------------------------------------------------------------------------- 106 | 107 | function parse_options() { 108 | [[ ${#} -lt 1 ]] && usage 1 109 | while [[ ${#} -gt 0 ]]; do 110 | case ${1} in 111 | -i|--input) shift; INPUT=${1};; 112 | -o|--output) shift; OUTPUT=${1};; 113 | -q|--quality) shift; SETTINGS=${1};; 114 | -h|--help) usage 0;; 115 | -V|--version) version;; 116 | *) usage 1;; 117 | esac 118 | shift 119 | done 120 | } 121 | 122 | # ----------------------------------------------------------------------------- 123 | 124 | function evaluate_options() { 125 | local message="" 126 | local keywords=${KEYWORDS[@]} 127 | if [[ -z ${INPUT} ]]; then 128 | message+="Missing input PDF!\n" 129 | fi 130 | 131 | if [[ -z ${OUTPUT} ]]; then 132 | message+="Missing output PDF!\n" 133 | fi 134 | 135 | if [[ !
${SETTINGS} =~ ^(${keywords// /|})$ ]]; then 136 | message+="Invalid quality keyword!\n" 137 | fi 138 | 139 | if [[ -n ${message} ]]; then 140 | printf "\n\n${message}" 141 | usage 1 142 | fi 143 | 144 | } 145 | 146 | # ----------------------------------------------------------------------------- 147 | # Main 148 | # ----------------------------------------------------------------------------- 149 | check_dependencies 150 | parse_options "$@" 151 | evaluate_options 152 | resample_pdf 153 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | # vim: shiftwidth=2 tabstop=2 noexpandtab : 2 | # ----------------------------------------------------------------------------- 3 | # Setup 4 | # ----------------------------------------------------------------------------- 5 | SHELL := bash 6 | .ONESHELL: 7 | .SHELLFLAGS := -o errexit -o nounset -o pipefail -c 8 | .DELETE_ON_ERROR: 9 | MAKEFLAGS += --warn-undefined-variables 10 | MAKEFLAGS += --no-builtin-rules 11 | 12 | # ----------------------------------------------------------------------------- 13 | # Globals 14 | # ----------------------------------------------------------------------------- 15 | DIVIDER := $$(printf "%0.1s" -{1..80}) 16 | OS_REL := /etc/os-release 17 | OS_FAMILY := $(shell sed -n '/^ID=/{s/"//g;s/.*=//;p}' $(OS_REL)) 18 | OS_RELEASE := $(shell sed -n '/^VERSION_ID=/{s/"//g;s/.*=//;p}' $(OS_REL)) 19 | OS_NAME := $(OS_FAMILY)_$(OS_RELEASE) 20 | USER_BIN := $(HOME)/bin 21 | REPO_NAME := $(shell basename $(CURDIR)) 22 | BASH_VERSIONS := 4.0 4.1 4.2 4.3 4.4 5.0 5.1 5.2-rc 23 | 24 | # ----------------------------------------------------------------------------- 25 | # Conditionally assigned globals 26 | # ----------------------------------------------------------------------------- 27 | ifeq ($(OS_NAME), ubuntu_18.04) 28 | ASCIIDOCTOR_PREFIX := bundler exec 29 | ASCIIDOCTOR_DEPENDENCIES :=
deb:ruby-bundler gem:asciidoctor-pdf:1.6.2 30 | else 31 | ASCIIDOCTOR_PREFIX := 32 | ASCIIDOCTOR_DEPENDENCIES := deb:asciidoctor deb:ruby-asciidoctor-pdf 33 | endif 34 | 35 | .PHONY: test 36 | 37 | # ----------------------------------------------------------------------------- 38 | # Document creation 39 | # ----------------------------------------------------------------------------- 40 | asciidoctor_dependencies: 41 | @echo "Install asciidoctor dependencies $(ASCIIDOCTOR_DEPENDENCIES)" 42 | for pkg in $(ASCIIDOCTOR_DEPENDENCIES); do 43 | IFS=: read method name version <<< $${pkg} 44 | case $${method} in 45 | deb) 46 | if dpkg -l $${name} &>/dev/null; then 47 | echo "Package $${name} already installed" 48 | else 49 | sudo apt -y install $${name} 50 | fi 51 | ;; 52 | gem) 53 | [[ -f Gemfile ]] || bundler init && : 54 | bundler add "$${name}" --version "$${version}" 55 | ;; 56 | esac 57 | done 58 | @echo $(DIVIDER) 59 | 60 | docs/README.html: README.adoc 61 | @echo "Build HTML doc $<" 62 | $(ASCIIDOCTOR_PREFIX) asciidoctor -D docs $< 63 | @echo $(DIVIDER) 64 | 65 | docs/README.pdf: README.adoc 66 | @echo "Build PDF doc $<" 67 | $(ASCIIDOCTOR_PREFIX) asciidoctor-pdf --trace -D docs $< 68 | @echo $(DIVIDER) 69 | 70 | docs: asciidoctor_dependencies docs/README.html docs/README.pdf 71 | 72 | # ----------------------------------------------------------------------------- 73 | # User install 74 | # ----------------------------------------------------------------------------- 75 | user_install: 76 | @echo "Install $(REPO_NAME) under $(USER_BIN)" 77 | mkdir -p $(USER_BIN) || : 78 | for script in bin/*; do 79 | basename=$${script##*/} 80 | install $${script} $(USER_BIN)/$${basename} && 81 | echo "-> Installing $${script} to $(USER_BIN)/$${basename}" 82 | done 83 | 84 | user_uninstall: 85 | @echo "Uninstall $(REPO_NAME) from $(USER_BIN)" 86 | for script in bin/*; do 87 | basename=$${script##*/} 88 | if [[ -f $(USER_BIN)/$${basename} ]]; then 89 | rm $(USER_BIN)/$${basename} 
&& 90 | echo "-> Uninstalling $(USER_BIN)/$${basename}" 91 | fi 92 | done 93 | 94 | # ----------------------------------------------------------------------------- 95 | # Janitor tasks 96 | # ----------------------------------------------------------------------------- 97 | clean: 98 | test -d docs && \ 99 | find docs \( -name "*.pdf" -or -name "*.html" \) -print -delete 100 | test -f Gemfile && rm Gemfile 101 | test -f Gemfile.lock && rm Gemfile.lock || : 102 | # ----------------------------------------------------------------------------- 103 | # Run tests 104 | # ----------------------------------------------------------------------------- 105 | test: 106 | bats tests/*bats 107 | 108 | test-bash: 109 | @echo "Bash tests" 110 | declare -a VERSIONS=( $(BASH_VERSIONS) ); 111 | function setup() { 112 | apk add \ 113 | bats \ 114 | curl \ 115 | file \ 116 | grep \ 117 | imagemagick \ 118 | make \ 119 | openjdk11-jre-headless \ 120 | poppler-utils \ 121 | sane-utils \ 122 | tesseract-ocr \ 123 | tesseract-ocr-data-deu \ 124 | tesseract-ocr-data-eng \ 125 | tesseract-ocr-data-fra \ 126 | tesseract-ocr-data-ita; \ 127 | curl -sJLO 'https://gitlab.com/pdftk-java/pdftk/-/jobs/812582458/artifacts/raw/build/libs/pdftk-all.jar?inline=false' && \ 128 | mv pdftk-all.jar /usr/lib/ && \ 129 | echo -e "#!/usr/bin/env bash\njava -jar /usr/lib/pdftk-all.jar \"\$$@\"" > /usr/bin/pdftk && \ 130 | chmod 755 /usr/bin/pdftk 131 | }; 132 | for version in $${VERSIONS[@]}; do 133 | @echo "Test bash version $${version}" 134 | docker pull bash:$${version}; \ 135 | docker run \ 136 | --rm \ 137 | --tty \ 138 | --volume $$(pwd):/pdftools \ 139 | --workdir /pdftools \ 140 | bash:$${version} \ 141 | bash -c "$$(declare -f setup); setup &>/dev/null && make test"; 142 | done 143 | -------------------------------------------------------------------------------- /bin/pdf2pdfa: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | #
----------------------------------------------------------------------------- 4 | # Script to convert a PDF to PDF/A via conversion to postscript and back 5 | # Based on the answer from: 6 | # https://unix.stackexchange.com/questions/79516/converting-pdf-to-pdf-a 7 | # ----------------------------------------------------------------------------- 8 | 9 | # ------------------------------------------------------------------------------ 10 | # Author: Urs Roesch 11 | # License: MIT 12 | # Requires: bash, ghostscript, pdftops 13 | # ------------------------------------------------------------------------------ 14 | 15 | # ----------------------------------------------------------------------------- 16 | # Setup 17 | # ----------------------------------------------------------------------------- 18 | set -o errexit 19 | set -o nounset 20 | set -o pipefail 21 | 22 | # ----------------------------------------------------------------------------- 23 | # Globals 24 | # ----------------------------------------------------------------------------- 25 | declare -r SCRIPT=${0##*/} 26 | declare -r VERSION=0.2.0 27 | declare -r AUTHOR="Urs Roesch " 28 | declare -r LICENSE="MIT" 29 | declare SUFFIX=_a 30 | declare -a PDF_FILES=() 31 | declare -i ARCHIVE_LEVEL=2 32 | declare -i COMPATIBILITY_POLICY=1 33 | declare -u COLOR_MODEL=rgb 34 | declare -a DEPENDENCIES=( 35 | gs 36 | pdftops 37 | ) 38 | 39 | # ----------------------------------------------------------------------------- 40 | # Functions 41 | # ----------------------------------------------------------------------------- 42 | function check_dependencies() { 43 | for dependency in ${DEPENDENCIES[@]}; do 44 | if ! command -v ${dependency} >/dev/null 2>&1; then 45 | echo "Missing dependency '${dependency}'" 1>&2 46 | echo "Please install first!" 
1>&2 47 | exit 64 48 | fi 49 | done 50 | } 51 | 52 | # ----------------------------------------------------------------------------- 53 | 54 | function version() { 55 | printf "%s v%s\nCopyright (c) %s\nLicense - %s\n" \ 56 | "${SCRIPT}" "${VERSION}" "${AUTHOR}" "${LICENSE}" 57 | exit 0 58 | } 59 | 60 | # ----------------------------------------------------------------------------- 61 | 62 | function convert_pdf() { 63 | for pdf_original in "${PDF_FILES[@]}"; do 64 | file "${pdf_original}" | grep -q -w PDF || continue 65 | to_pdfa "${pdf_original}" 66 | done 67 | } 68 | 69 | # ----------------------------------------------------------------------------- 70 | 71 | function to_pdfa() { 72 | local pdf_document="${1}"; shift; 73 | local ps_document=$(to_postscript "${pdf_document}") 74 | local pdfa_document=${pdf_document%%.*}${SUFFIX}.pdf 75 | 76 | gs \ 77 | -dPDFA=${ARCHIVE_LEVEL} \ 78 | -dBATCH \ 79 | -dNOPAUSE \ 80 | -dNOOUTERSAVE \ 81 | -sProcessColorModel=Device${COLOR_MODEL} \ 82 | -sDEVICE=pdfwrite \ 83 | -dPDFACompatibilityPolicy=${COMPATIBILITY_POLICY} \ 84 | -sOutputFile="${pdfa_document}" \ 85 | "${ps_document}" &>/dev/null 86 | 87 | rm "${ps_document}" 88 | } 89 | 90 | # ----------------------------------------------------------------------------- 91 | 92 | function to_postscript() { 93 | local pdf_document="${1}"; shift; 94 | local ps_document=$(mktemp) 95 | pdftops "${pdf_document}" "${ps_document}" 96 | echo ${ps_document} 97 | } 98 | 99 | # ----------------------------------------------------------------------------- 100 | 101 | function usage() { 102 | local exit_code=${1:-}; shift; 103 | cat <] [ [..]] 106 | 107 | Options: 108 | -c | --color-model Color model to use for the conversion. 109 | Valid input is RGB or CMYK. 110 | Default: ${COLOR_MODEL} 111 | -h | --help This message 112 | -l | --level PDF-A specification level to use. 113 | Valid input is 1 (A-1), 2 (A-2) and 3 (A-3). 
114 | Default: ${ARCHIVE_LEVEL} 115 | -S | --strict Exit if errors are encountered during conversion. 116 | -s | --suffix Append to filename 117 | Default '${SUFFIX}' 118 | -V | --version Display version and exit. 119 | 120 | USAGE 121 | exit ${exit_code} 122 | } 123 | 124 | # ----------------------------------------------------------------------------- 125 | 126 | function parse_options() { 127 | [[ $# -eq 0 ]] && usage 1 128 | while [[ $# -gt 0 ]]; do 129 | case $1 in 130 | -c|--color-model) shift; COLOR_MODEL=${1};; 131 | -h|--help) usage 0;; 132 | -l|--level) shift; ARCHIVE_LEVEL=${1};; 133 | -S|--strict) COMPATIBILITY_POLICY=2;; 134 | -s|--suffix) shift; SUFFIX=${1};; 135 | -V|--version) version;; 136 | -*) usage 1;; 137 | *) PDF_FILES+=( "${1}" );; 138 | esac 139 | shift 140 | done 141 | } 142 | 143 | # ----------------------------------------------------------------------------- 144 | # Main 145 | # ----------------------------------------------------------------------------- 146 | check_dependencies 147 | parse_options "${@}" 148 | convert_pdf 149 | -------------------------------------------------------------------------------- /bin/img2pdf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # ------------------------------------------------------------------------------ 4 | # A script to convert PNG, TIFF or JPEG to PDF files. 
5 | # ------------------------------------------------------------------------------ 6 | 7 | # ------------------------------------------------------------------------------ 8 | # Author: Urs Roesch 9 | # License: MIT 10 | # Requires: bash, ImageMagick, pdfcat, pdftk 11 | # ------------------------------------------------------------------------------ 12 | 13 | # ------------------------------------------------------------------------------ 14 | # Setup 15 | # ------------------------------------------------------------------------------ 16 | set -o nounset 17 | set -o errexit 18 | set -o pipefail 19 | 20 | # ------------------------------------------------------------------------------ 21 | # Globals 22 | # ------------------------------------------------------------------------------ 23 | declare -r SCRIPT=${0##*/} 24 | declare -r VERSION=0.2.6 25 | declare -r AUTHOR="Urs Roesch " 26 | declare -r LICENSE="MIT" 27 | declare -- IMAGE_COMPRESSION=${PDF_IMAGE:-jpeg} 28 | declare -- IMAGE_QUALITY=${IMAGE_QUALITY:-90} 29 | declare -- IMAGE_ROTATE="" 30 | declare -- PDF_PREFIX="" 31 | declare -- OUTPUT="" 32 | declare -- DELETE_IMAGES=false 33 | declare -a IMAGES=() 34 | declare -a DEPENDENCIES=( 35 | identify 36 | convert 37 | pdfcat 38 | ) 39 | declare -a INCLUDES=( 40 | pdfcat 41 | ) 42 | 43 | # ------------------------------------------------------------------------------ 44 | # Functions 45 | # ------------------------------------------------------------------------------ 46 | 47 | function check_dependencies() { 48 | for dependency in ${DEPENDENCIES[*]}; do 49 | if ! command -v ${dependency} >/dev/null 2>&1; then 50 | echo "Missing dependency '${dependency}'" 1>&2 51 | echo "Please install first!" 
1>&2 52 | exit 64 53 | fi 54 | done 55 | } 56 | 57 | # ------------------------------------------------------------------------------ 58 | 59 | function version() { 60 | printf "%s v%s\nCopyright (c) %s\nLicense - %s\n" \ 61 | "${SCRIPT}" "${VERSION}" "${AUTHOR}" "${LICENSE}" 62 | exit 0 63 | } 64 | 65 | # ------------------------------------------------------------------------------ 66 | 67 | function source_includes() { 68 | for include in ${INCLUDES[*]}; do 69 | local path=$(command -v ${include} 2>/dev/null) 70 | [[ -n ${path} ]] && source ${path} 71 | done 72 | } 73 | 74 | # ------------------------------------------------------------------------------ 75 | 76 | function usage() { 77 | local exit_code=$1 78 | cat < [ ... ] 82 | 83 | Options: 84 | -d | --delete Delete the images after creating the PDF file. 85 | -h | --help This message 86 | -o | --output Write the output to specified file . 87 | -r | --rotate Rotate the image by 88 | Where value can be a positive or negative integer 89 | between 0 and 360. 
90 | -V | --version Display version and exit 91 | 92 | USAGE 93 | exit ${exit_code} 94 | } 95 | 96 | # ------------------------------------------------------------------------------ 97 | 98 | function parse_options() { 99 | if (( $# < 1 )); then 100 | usage 1 101 | fi 102 | while (( $# > 0 )); do 103 | case $1 in 104 | -V|--version) version ;; 105 | -d|--delete) DELETE_IMAGES=true ;; 106 | -o|--output) shift; OUTPUT=$1; PDF_PREFIX="$$-${RANDOM}-" ;; 107 | -r|--rotate) shift; IMAGE_ROTATE=$1 ;; 108 | -h|--help) usage 0 ;; 109 | -*) usage 1 ;; 110 | *) IMAGES+=( "$1" );; 111 | esac 112 | shift 113 | done 114 | } 115 | 116 | # ------------------------------------------------------------------------------ 117 | 118 | function check_image() { 119 | local image="$1" 120 | ftype=$( identify -format %m ${image} 2>/dev/null ) 121 | case ${ftype} in 122 | PNG|TIF*) 123 | IMAGE_COMPRESSION=lzw 124 | IMAGE_QUALITY=9 125 | return 0 126 | ;; 127 | JPEG) 128 | IMAGE_COMPRESSION=jpeg 129 | IMAGE_QUALITY=90 130 | return 0 131 | ;; 132 | PDF|*) 133 | return 1 134 | ;; 135 | esac 136 | } 137 | 138 | # ------------------------------------------------------------------------------ 139 | 140 | function temporary_pdf() { 141 | local image=${1}; shift 142 | local path=$([[ $image != ${image%/*} ]] && echo ${image%/*}) 143 | local basename=${image##*/} 144 | echo "${path:-.}/${PDF_PREFIX}${basename%.*}.pdf" 145 | } 146 | 147 | # ------------------------------------------------------------------------------ 148 | 149 | function create_pdf() { 150 | local dest="$( [[ -n ${OUTPUT} ]] && echo ${OUTPUT} || echo ${IMAGES[0]} )" 151 | local -a pdfs=() 152 | for image in "${IMAGES[@]}"; do 153 | check_image "${image}" || continue 154 | pdf=$(temporary_pdf "${image}") 155 | convert "${image}" \ 156 | ${IMAGE_ROTATE:+-rotate ${IMAGE_ROTATE}} \ 157 | -compress ${IMAGE_COMPRESSION} \ 158 | -quality ${IMAGE_QUALITY} \ 159 | "${pdf}" 160 | pdfs=( "${pdfs[@]:-}" "${pdf}" ) 161 | done 162 | [[ -z 
${OUTPUT} ]] && return 0 163 | pdfcat::concat_pdf "${pdfs[@]:1}" > "${dest}-$$" 164 | rm "${pdfs[@]:1}" 2>/dev/null 165 | mv "${dest}-$$" "${dest%.*}.pdf" 166 | } 167 | 168 | # ------------------------------------------------------------------------------ 169 | 170 | function delete_images() { 171 | ## delete the original images 172 | if [ ${DELETE_IMAGES} = true ]; then 173 | rm "${IMAGES[@]}" 174 | fi 175 | } 176 | 177 | # ------------------------------------------------------------------------------ 178 | # Main 179 | # ------------------------------------------------------------------------------ 180 | check_dependencies 181 | source_includes 182 | parse_options "$@" 183 | create_pdf 184 | delete_images 185 | -------------------------------------------------------------------------------- /bin/pdfmeta: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # ----------------------------------------------------------------------------- 4 | # A wrapper script around `pdftk` to manipulate a PDFs meta data 5 | # ----------------------------------------------------------------------------- 6 | 7 | # ----------------------------------------------------------------------------- 8 | # Author: Urs Roesch 9 | # License: MIT 10 | # Requires: bash >= 4.0, pdftk >= 2.0 11 | # ----------------------------------------------------------------------------- 12 | 13 | # ----------------------------------------------------------------------------- 14 | # Setup 15 | # ----------------------------------------------------------------------------- 16 | set -o errexit 17 | set -o nounset 18 | set -o pipefail 19 | 20 | # ----------------------------------------------------------------------------- 21 | # Globals 22 | # ----------------------------------------------------------------------------- 23 | declare -r SCRIPT=${0##*/} 24 | declare -r VERSION=0.1.5 25 | declare -r AUTHOR="Urs Roesch " 26 | declare -r LICENSE="MIT" 27 | 
declare -r PRODUCER="${SCRIPT} ${VERSION}" 28 | declare -r REQUIRE_BASH=4 29 | declare -r REQUIRE_PDFTK=2 30 | declare -a FILES=() 31 | declare -A META 32 | 33 | # ----------------------------------------------------------------------------- 34 | # Functions 35 | # ----------------------------------------------------------------------------- 36 | function usage() { 37 | local exit_code=${1:-1}; shift; 38 | cat << USAGE 39 | Usage: 40 | ${SCRIPT} [[] ..] 41 | 42 | Options: 43 | -h | --help This message 44 | -k | --keywords Comma separated list of keywords 45 | -s | --subject Define the PDFs subject 46 | -t | --title Define the PDFs title 47 | -c | --creator Define the PDFs creator program or library 48 | -p | --producer Define the PDFs producing program 49 | -C | --creation-date Set the creation date of the PDF 50 | -M | --modification-date Set the modification date of the PDF 51 | -V | --version Display version and exit 52 | 53 | Examples: 54 | 55 | Modify keywords 56 | ${SCRIPT} --keywords "rainbow, magical, unicorn" unicorn.pdf rainbow.pdf 57 | 58 | Modify creation date 59 | ${SCRIPT} --creation-date "2017-01-01 22:30:45" unicorn.pdf 60 | 61 | USAGE 62 | 63 | exit ${exit_code} 64 | } 65 | 66 | # ----------------------------------------------------------------------------- 67 | 68 | function version() { 69 | printf "%s v%s\nCopyright (c) %s\nLicense - %s\n" \ 70 | "${SCRIPT}" "${VERSION}" "${AUTHOR}" "${LICENSE}" 71 | exit 0 72 | } 73 | 74 | # ----------------------------------------------------------------------------- 75 | 76 | function pdftk_major() { 77 | pdftk --version | \ 78 | awk '/^pdftk/ { print int(gensub("[^0-9.]", "", "g", $0)); exit }' 79 | } 80 | 81 | # ----------------------------------------------------------------------------- 82 | 83 | function check_compatibility() { 84 | if [[ ${BASH_VERSINFO[0]} -lt ${REQUIRE_BASH} ]]; then 85 | echo "Bash version ${REQUIRE_BASH} or higher is required!" 1>&2 86 | exit 128 87 | fi 88 | if !
command -v pdftk &>/dev/null; then 89 | echo "Required binary 'pdftk' was not found on this system" 1>&2 90 | exit 127 91 | else 92 | if [[ $(pdftk_major) -lt ${REQUIRE_PDFTK} ]]; then 93 | echo "Pdftk version ${REQUIRE_PDFTK} or higher is required!" 1>&2 94 | exit 128 95 | fi 96 | fi 97 | } 98 | 99 | # ----------------------------------------------------------------------------- 100 | 101 | function convert_date() { 102 | local datetime=$1; shift 103 | date -d "${datetime}" +D:%Y%m%d%H%M%S%z 104 | } 105 | 106 | # ----------------------------------------------------------------------------- 107 | 108 | function parse_options() { 109 | [[ $# -eq 0 ]] && usage 0; 110 | while [[ $# -gt 0 ]]; do 111 | case $1 in 112 | -a|--author) shift; META[Author]="$1";; 113 | -k|--keywords) shift; META[Keywords]="$1";; 114 | -t|--title) shift; META[Title]="$1";; 115 | -s|--subject) shift; META[Subject]="$1";; 116 | -c|--creator) shift; META[Creator]="$1";; 117 | -p|--producer) shift; META[Producer]="$1";; 118 | -C|--creation-date) shift; META[CreationDate]="$1";; 119 | -M|--modification-date) shift; META[ModDate]="$1";; 120 | -h|--help) usage 0;; 121 | -V|--version) version;; 122 | *) FILES+=( "$1") 123 | esac 124 | shift 125 | done 126 | } 127 | 128 | # ----------------------------------------------------------------------------- 129 | 130 | function check_input() { 131 | if [[ ${#META[@]} -lt 1 ]]; then 132 | usage 1; 133 | fi 134 | } 135 | 136 | # ----------------------------------------------------------------------------- 137 | 138 | function assemble_meta_data() { 139 | for key in "${!META[@]}"; do 140 | case ${key} in 141 | *Date) META[${key}]="$(convert_date "${META[${key}]}")";; 142 | esac 143 | echo InfoBegin 144 | echo InfoKey: ${key} 145 | echo InfoValue: ${META[${key}]} 146 | done 147 | } 148 | 149 | # ----------------------------------------------------------------------------- 150 | 151 | function file_is_pdf() { 152 | local file="$1" 153 | 154 | if [[ !
-e "${file}" ]]; then 155 | echo "File '${file}' does not exist; skipping!" 156 | return 1 157 | fi 158 | if ! file "${file}" | grep -q 'PDF document' 2>/dev/null; then 159 | echo "File '${file}' is not a PDF; skipping!" 160 | return 1 161 | fi 162 | } 163 | 164 | # ----------------------------------------------------------------------------- 165 | 166 | function update_meta() { 167 | local file="$1"; shift; 168 | local meta="$( assemble_meta_data )" 169 | local work_file="${file}-${RANDOM}-$$" 170 | 171 | file_is_pdf "${file}" || return 172 | 173 | pdftk "${file}" update_info_utf8 <( echo "${meta}" ) output "${work_file}" 174 | if [[ $? -eq 0 ]]; then 175 | mv "${work_file}" "${file}" 176 | else 177 | echo "There was an issue updating meta information in ${file}" 1>&2 178 | fi 179 | } 180 | 181 | # ----------------------------------------------------------------------------- 182 | # Main 183 | # ----------------------------------------------------------------------------- 184 | check_compatibility 185 | parse_options "$@" 186 | check_input 187 | for file in "${FILES[@]}"; do 188 | update_meta "${file}" 189 | done 190 | -------------------------------------------------------------------------------- /bin/ocrpdf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # ----------------------------------------------------------------------------- 4 | # A utility to run a scanned PDF through tesseract's OCR engine and make it 5 | # searchable. 
6 | # ----------------------------------------------------------------------------- 7 | 8 | # ----------------------------------------------------------------------------- 9 | # Author: Urs Roesch 10 | # License: MIT 11 | # Requires: bash, pdfcat, pdfimages (poppler-utils), pdftk, tesseract 12 | # ----------------------------------------------------------------------------- 13 | 14 | # ------------------------------------------------------------------------------ 15 | # Settings 16 | # ------------------------------------------------------------------------------ 17 | set -o nounset 18 | set -o errexit 19 | set -o pipefail 20 | 21 | trap cleanup EXIT 22 | 23 | # ------------------------------------------------------------------------------ 24 | # Globals 25 | # ------------------------------------------------------------------------------ 26 | declare -r SCRIPT=${0##*/} 27 | declare -r VERSION=0.5.0 28 | declare -r AUTHOR="Urs Roesch " 29 | declare -r LICENSE="MIT" 30 | declare -- QUIET= 31 | declare -a PDF_FILES=() 32 | declare -- EXTENSION=pdf 33 | declare -- RECOMPRESS=false 34 | declare -- JPEG_RECOMPRESS=$(command -v jpeg-recompress 2>/dev/null) 35 | declare -a INCLUDES=( 36 | pdfcat 37 | ) 38 | declare -a DEPENDENCIES=( 39 | pdfcat 40 | pdfimages 41 | pdftk 42 | tesseract 43 | ) 44 | 45 | # ------------------------------------------------------------------------------ 46 | # Functions 47 | # ------------------------------------------------------------------------------ 48 | 49 | function check_dependencies() { 50 | for dependency in ${DEPENDENCIES[@]}; do 51 | if ! command -v ${dependency} >/dev/null 2>&1; then 52 | echo "Missing dependency '${dependency}'" 1>&2 53 | echo "Please install first!" 
1>&2 54 | exit 64 55 | fi 56 | done 57 | } 58 | 59 | # ------------------------------------------------------------------------------ 60 | 61 | function source_includes() { 62 | for include in ${INCLUDES[@]}; do 63 | local path=$(command -v ${include} 2>/dev/null) 64 | [[ -n ${path} ]] && source ${path} 65 | done 66 | } 67 | 68 | # ------------------------------------------------------------------------------ 69 | 70 | function determine_ocr_languages() { 71 | OCR_LANG=$( 72 | sed 's/ /+/g' <<< $( echo $( tesseract --list-langs 2>&1 | grep "^...$" ) ) 73 | ) 74 | } 75 | 76 | # ------------------------------------------------------------------------------ 77 | 78 | function recompress() { 79 | local image="${1}"; shift; 80 | [[ ${image} =~ \.jpg$ ]] || return 0 && : 81 | [[ ${RECOMPRESS} == false ]] && return 0 || : 82 | [[ -z ${JPEG_RECOMPRESS} ]] && return 0 || : 83 | ${JPEG_RECOMPRESS} "${image}" "${image}.$$" |& grep 'New size' || : 84 | mv "${image}.$$" "${image}" 85 | } 86 | 87 | # ------------------------------------------------------------------------------ 88 | 89 | function run_ocr() { 90 | local filename="${1}"; shift; 91 | if tesseract -l ${OCR_LANG} "${filename}" "${filename%%.*}" pdf \ 92 | >/dev/null 2>&1; then 93 | rm "${filename}" 94 | else 95 | echo "An error occurred during character recognition." 1>&2 96 | echo "Please investigate!"
1>&2 97 | echo "Leaving original image '${filename}' intact" 1>&2 98 | exit 128 99 | fi 100 | } 101 | 102 | # ------------------------------------------------------------------------------ 103 | 104 | function version() { 105 | printf "%s v%s\nCopyright (c) %s\nLicense - %s\n" \ 106 | "${SCRIPT}" "${VERSION}" "${AUTHOR}" "${LICENSE}" 107 | exit 0 108 | } 109 | 110 | # ------------------------------------------------------------------------------ 111 | 112 | function usage() { 113 | local exit_code=${1:-} 114 | cat < [ [,,]] 118 | 119 | Options: 120 | -h | --help This message 121 | -l | --lang Set the OCR languages to use. 122 | For multiple languages concatenate with a '+' 123 | E.g. eng+deu for English and German 124 | Default: ${OCR_LANG} 125 | -q | --quiet Don't display processed file names 126 | -r | --recompress Recompress JPEG images if 'jpeg-recompress' is present. 127 | Default: ${RECOMPRESS} 128 | -V | --version Print version information and exit 129 | 130 | Description: 131 | Runs PDFs through OCR and saves the output as a text searchable PDF 132 | with the same name. 133 | 134 | Disclaimer: 135 | Only works with PDFs comprised of a single JPEG, LZW or ZIP compressed 136 | image per page. 137 | LZW compressed images will be converted to ZIP compressed ones during 138 | the OCR process. 139 | 140 | USAGE 141 | exit ${exit_code} 142 | } 143 | 144 | # ------------------------------------------------------------------------------ 145 | 146 | function check_compatibility() { 147 | local file="${1}"; shift; 148 | ## check if all the images are single jpegs 149 | pdfimages -list "${file}" | \ 150 | awk \ 151 | '/^ +[0-9]/{p=$1; c++; $9 ~ /^(jpeg|image)$/ && j++ } 152 | END { if (p < c && p == j) {exit 1} }' 153 | } 154 | 155 | # ------------------------------------------------------------------------------ 156 | 157 | function ocr_pdfs() { 158 | for file in "${PDF_FILES[@]}"; do 159 | if !
check_compatibility "${file}"; then 160 | [[ -z ${QUIET} ]] && echo "${file} is not compatible; skipping!" || : 161 | continue 162 | else 163 | [[ -z ${QUIET} ]] && echo "Processing file ${file}" || : 164 | fi 165 | workname="$$-${RANDOM}" 166 | meta="$(pdfcat::fetch_meta "${file}")" 167 | pdfimages -all "${file}" "${workname}" 2>/dev/null && 168 | for image in ${workname}*; do 169 | recompress "${image}" 170 | run_ocr ${image} 171 | done 172 | post_process_pdf "${file}" "${workname}" "${meta}" 173 | done 174 | } 175 | 176 | # ------------------------------------------------------------------------------ 177 | 178 | function post_process_pdf() { 179 | local file="${1}"; shift; 180 | local workname="${1}"; shift; 181 | local meta="${1}"; shift; 182 | merge_pdf "${workname}.${EXTENSION}" ${workname}* 183 | pdftk "${workname}.pdf" \ 184 | update_info_utf8 <( echo "${meta}" ) \ 185 | output "${file}" 2>/dev/null 186 | rm "${workname}.pdf" 187 | } 188 | 189 | # ------------------------------------------------------------------------------ 190 | 191 | function merge_pdf() { 192 | local filename="${1}"; shift; 193 | local documents=( "${@}" ); shift; 194 | if [[ ${#documents[@]} -eq 1 ]]; then 195 | mv "${documents[0]}" "${filename}" 2>/dev/null 196 | else 197 | pdfcat::concat_pdf "${documents[@]}" > "${filename}" && 198 | rm "${documents[@]}" 2>/dev/null 199 | fi 200 | } 201 | 202 | # ------------------------------------------------------------------------------ 203 | 204 | function parse_options() { 205 | [[ ${#} -lt 1 ]] && usage 1 206 | while [[ ${#} -gt 0 ]]; do 207 | case ${1} in 208 | -h|--help) usage 0;; 209 | -l|--lang) shift; OCR_LANG=${1};; 210 | -q|--quiet) QUIET=true;; 211 | -r|--recompress) RECOMPRESS=true;; 212 | -V|--version) version;; 213 | *) PDF_FILES+=( "${1}" );; 214 | esac 215 | shift 216 | done 217 | } 218 | 219 | # ------------------------------------------------------------------------------ 220 | 221 | function cleanup() { 222 | local 
exit_code=$? 223 | rm $$-[0-9]*.{jpg,pdf} 2>/dev/null || : 224 | exit ${exit_code} 225 | } 226 | 227 | # ------------------------------------------------------------------------------ 228 | # Main 229 | # ------------------------------------------------------------------------------ 230 | check_dependencies 231 | source_includes 232 | determine_ocr_languages 233 | parse_options "${@}" 234 | ocr_pdfs 235 | -------------------------------------------------------------------------------- /icons/gitlab-avatar.svg: --------------------------------------------------------------------------------