├── .dockerignore ├── .envrc ├── .gitignore ├── .travis.yml ├── HACKING.md ├── LICENSE.txt ├── MANIFEST.in ├── Makefile ├── PKGBUILD_template ├── README.md ├── __init__.py ├── brew ├── .gitignore └── email2pdf_template.rb ├── debian └── DEBIAN │ ├── .gitignore │ └── control_template ├── docker └── email2pdf │ └── getmail ├── email2pdf ├── email2pdf.py ├── getmailrc.sample ├── performance └── printstats.py ├── requirements.txt ├── requirements_hacking.txt ├── setup.py └── tests ├── BaseTestClasses.py ├── Direct ├── __init__.py ├── test_Direct_Arguments.py ├── test_Direct_AttachmentDetection.py ├── test_Direct_Basic.py ├── test_Direct_BasicPlain.py ├── test_Direct_CID.py ├── test_Direct_Errors.py ├── test_Direct_FrozenTime.py ├── test_Direct_Metadata.py └── test_Direct_Module.py ├── Subprocess ├── __init__.py ├── test_Subprocess_Basic.py └── test_Subprocess_MIME.py ├── UPPERCASE.png ├── __init__.py ├── basi2c16.png └── jpeg444.jpg /.dockerignore: -------------------------------------------------------------------------------- 1 | .git 2 | *.deb 3 | -------------------------------------------------------------------------------- /.envrc: -------------------------------------------------------------------------------- 1 | layout python3 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .direnv 2 | PKGBUILD 3 | *.deb 4 | cover/ 5 | pkg/ 6 | src/ 7 | *.pkg.tar.xz 8 | 9 | # Byte-compiled / optimized / DLL files 10 | __pycache__/ 11 | *.py[cod] 12 | 13 | # C extensions 14 | *.so 15 | 16 | # Distribution / packaging 17 | .Python 18 | env/ 19 | build/ 20 | develop-eggs/ 21 | dist/ 22 | downloads/ 23 | eggs/ 24 | lib/ 25 | lib64/ 26 | parts/ 27 | sdist/ 28 | var/ 29 | *.egg-info/ 30 | .installed.cfg 31 | *.egg 32 | 33 | # PyInstaller 34 | # Usually these files are written by a python script from a template 35 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 36 | *.manifest 37 | *.spec 38 | 39 | # Installer logs 40 | pip-log.txt 41 | pip-delete-this-directory.txt 42 | 43 | # Unit test / coverage reports 44 | htmlcov/ 45 | .tox/ 46 | .coverage 47 | .cache 48 | nosetests.xml 49 | coverage.xml 50 | 51 | # Translations 52 | *.mo 53 | *.pot 54 | 55 | # Django stuff: 56 | *.log 57 | 58 | # Sphinx documentation 59 | docs/_build/ 60 | 61 | # PyBuilder 62 | target/ 63 | 64 | .email2pdf.profile 65 | -------------------------------------------------------------------------------- /.travis.yml: -------------------------------------------------------------------------------- 1 | sudo: required 2 | services: 3 | - docker 4 | language: python 5 | install: 6 | - "echo 'Skip'" 7 | script: make rundocker_testing 8 | notifications: 9 | email: 10 | recipients: 11 | - secure: "Tmt2vtBW60X9digOMKdKM8NigEZX/X1wp8mbffbHaiSUkI32Y873b9ILmqF7roDhnMxJsiGQuQ/QMHsK6D5cTOs4pYwrshRySkU3TmbuHB6n5RlV5RDElFTFjs+uc0iHesQYgcIrUSqzinr6toAStAJ00Aa2RuN+137r6Et1QkY=" 12 | -------------------------------------------------------------------------------- /HACKING.md: -------------------------------------------------------------------------------- 1 | # email2pdf - Hacking 2 | 3 | This document talks about hacking/developing on email2pdf - for more 4 | information on email2pdf and how to use it, please see 5 | [README.md](https://github.com/andrewferrier/email2pdf/blob/master/README.md). 6 | 7 | In general, [bug reports/enhancement 8 | requests](https://github.com/andrewferrier/email2pdf/issues) as well as [pull 9 | requests](https://github.com/andrewferrier/email2pdf/pulls) are welcome; 10 | please note the [license 11 | conditions](https://github.com/andrewferrier/email2pdf/blob/master/LICENSE.txt). 12 | If you are trying to report an issue, please try running email2pdf with the 13 | `-vv` option to maximise the debugging output first. 14 | 15 | ## Building & Packaging 16 | 17 | All the supplied build and packaging is based on a 18 | [Makefile](https://github.com/andrewferrier/email2pdf/blob/master/Makefile). 19 | You'll need `make` if you don't have it (`sudo apt-get install make` on 20 | Ubuntu/Debian, `brew install make` on OS X). 21 | 22 | ## Design & Coding Principles 23 | 24 | * Follow [PEP-8](https://www.python.org/dev/peps/pep-0008/). Running `make 25 | analysis` will check against this and run other static code analysis checks 26 | also. 27 | 28 | * Try to keep `email2pdf` as "safe" as possible by default. Without supplying 29 | any potentially harmful command-line options, `email2pdf` will not ignore 30 | parts of the email it shouldn't, and will fail in the standard UNIX way with 31 | an error code if it has any significant doubts about the integrity of the 32 | email it's reading, or any other serious error occurs. 33 | 34 | ## Unit Tests 35 | 36 | All the unit tests are in the `tests/` directory. You can run them from the 37 | Makefile using the `unittest` or `unittest_test` targets (the second is more 38 | verbose, and stops on failing tests). 39 | 40 | All new code should be covered by a test. There is a code coverage checker 41 | target in the Makefile - run `make coverage`. You'll need to have the 42 | `coverage` and `nose` Python modules installed (`pip3 install coverage nose`) 43 | to run them. 44 | 45 | In addition to the standard dependencies from the [standard install 46 | documentation](https://github.com/andrewferrier/email2pdf/blob/master/README.md), 47 | there are some additional dependencies which will be needed to make the tests 48 | work: 49 | 50 | ### OS X 51 | 52 | Just run `pip3 install -r requirements_hacking.txt`. 53 | 54 | ### Debian/Ubuntu 55 | 56 | * `python3-freezegun` - only available in Ubuntu 14.10 onwards - see 57 | . If you are on 58 | an earlier version, you can download the `.deb` manually and install with 59 | `dpkg -i`. 60 | 61 | * `python3-reportlab` - install with `apt-get install python3-reportlab`. 62 | 63 | * `python3-pdfminer3k` (not a standard Debian/Ubuntu package, but there is a 64 | supplied Makefile target which will create it for you using a Docker 65 | container - run `make rundocker_getdebs`, then `dpkg -i` the package when 66 | you are done). 67 | 68 | ## Docker 69 | 70 | There is some experimental packaging for [Docker](https://www.docker.com/) 71 | also. Of course, you need to have Docker installed for this to work, which is 72 | outside the scope of this document. You can run the following `make` targets: 73 | 74 | * `rundocker_interactive` - build and start a Docker image, at the `bash` 75 | prompt. Can be used to interactively test email2pdf. 76 | 77 | * `rundocker_testing` - build and start the Docker image, run the entire unit 78 | testing and style testing suites, and exit. 79 | 80 | * `rundocker_getdebs` - build and start the Docker image, and copy out various 81 | `.debs`, including the `.deb` for email2pdf itself, and various dependencies 82 | that are harder to come by or need to be built manually. 83 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2014-2016 Andrew Ferrier 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of 6 | this software and associated documentation files (the "Software"), to deal in 7 | the Software without restriction, including without limitation the rights to 8 | use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of 9 | the Software, and to permit persons to whom the Software is furnished to do so, 10 | subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS 17 | FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR 18 | COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER 19 | IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN 20 | CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 21 | -------------------------------------------------------------------------------- /MANIFEST.in: -------------------------------------------------------------------------------- 1 | include email2pdf 2 | include *.py 3 | include *.md 4 | include .travis.yml -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | ROOTDIR :=$(shell dirname $(realpath $(lastword $(MAKEFILE_LIST)))) 2 | TEMPDIR := $(shell mktemp -t tmp.XXXXXX -d) 3 | FLAKE8 := $(shell which flake8) 4 | 5 | .PHONY: all builddeb clean test 6 | 7 | clean: 8 | git clean -x -f 9 | 10 | determineversion: 11 | $(eval GITDESCRIBE := $(shell git describe --dirty)) 12 | sed 's/Version: .*/Version: $(GITDESCRIBE)/' debian/DEBIAN/control_template > debian/DEBIAN/control 13 | $(eval GITDESCRIBE_ABBREV := $(shell git describe --abbrev=0)) 14 | sed 's/X\.Y/$(GITDESCRIBE_ABBREV)/' brew/email2pdf_template.rb > brew/email2pdf.rb 15 | sed 's/pkgver=X/pkgver=$(GITDESCRIBE_ABBREV)/' PKGBUILD_template > PKGBUILD 16 | 17 | builddeb: determineversion builddeb_real 18 | 19 | builddeb_real: 20 | dpkg -s build-essential || sudo apt-get install build-essential 21 | cp -R debian/DEBIAN/ $(TEMPDIR) 22 | mkdir -p $(TEMPDIR)/usr/bin 23 | mkdir -p $(TEMPDIR)/usr/share/doc/email2pdf 24 | cp email2pdf $(TEMPDIR)/usr/bin 25 | cp README* $(TEMPDIR)/usr/share/doc/email2pdf 26 | cp LICENSE* $(TEMPDIR)/usr/share/doc/email2pdf 27 | cp getmailrc.sample $(TEMPDIR)/usr/share/doc/email2pdf 28 | fakeroot chmod -R u=rwX,go=rX $(TEMPDIR) 29 | fakeroot chmod -R u+x $(TEMPDIR)/usr/bin 30 | fakeroot dpkg-deb --build $(TEMPDIR) . 31 | 32 | buildarch: determineversion 33 | makepkg --skipinteg 34 | 35 | unittest: 36 | python3 -m unittest discover 37 | 38 | unittest_verbose: 39 | python3 -m unittest discover -f -v 40 | 41 | install_osx_brew: determineversion 42 | brew install -f file://$(ROOTDIR)/brew/email2pdf.rb 43 | 44 | reinstall_osx_brew: determineversion 45 | brew reinstall file://$(ROOTDIR)/brew/email2pdf.rb 46 | 47 | analysis: 48 | # Debian version is badly packaged, make sure we are using Python 3. 49 | -/usr/bin/env python3 $(FLAKE8) --max-line-length=132 email2pdf tests/ 50 | pylint -r n --disable=line-too-long --disable=missing-docstring --disable=locally-disabled email2pdf tests/ 51 | 52 | coverage: 53 | rm -rf cover/ 54 | nosetests tests/Direct/*.py --with-coverage --cover-package=email2pdf,tests --cover-erase --cover-html --cover-branches 55 | open cover/index.html 56 | 57 | .email2pdf.profile: email2pdf 58 | python3 -m cProfile -o .email2pdf.profile `which nosetests` . 59 | 60 | profile: .email2pdf.profile 61 | python3 performance/printstats.py | less 62 | 63 | test: unittest analysis coverage 64 | -------------------------------------------------------------------------------- /PKGBUILD_template: -------------------------------------------------------------------------------- 1 | pkgbase='email2pdf' 2 | pkgname=(email2pdf) 3 | pkgver=X 4 | pkgrel=0 5 | pkgdesc="email2pdf" 6 | arch=(any) 7 | url="" 8 | license=(MIT) 9 | groups=(ajf) 10 | 11 | source=( 12 | email2pdf 13 | ) 14 | 15 | package_email2pdf() { 16 | depends=( 17 | python-beautifulsoup4 18 | python-coloredlogs 19 | python-html5lib 20 | python-magic 21 | python-pypdf2 22 | wkhtmltopdf 23 | ) 24 | 25 | install -Dm 755 email2pdf "$pkgdir"/usr/bin/email2pdf 26 | } 27 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # email2pdf 2 | 3 | **⚠️ DEPRECATED: This repository is deprecated, as I don't use email2pdf any more or have the time to maintain it. For now, it will remain here in case anyone wishes to fork and maintain it.** 4 | 5 | email2pdf is a Python script to convert emails to PDF from the command-line. 6 | It is not interactive (it doesn't run from a browser or have a GUI), but is 7 | intended to be run as a [mail delivery 8 | agent](http://en.wikipedia.org/wiki/Mail_delivery_agent) - it won't retrieve 9 | emails for you, but it will take them from standard input as an MDA will and 10 | 'deliver' them to PDF files. It is well-placed to use together with 11 | [getmail](http://pyropus.ca/software/getmail/), perhaps run on a schedule 12 | using [cron](https://en.wikipedia.org/wiki/Cron) or similar. You can also just 13 | use it as a standalone utility to convert a raw email (normally an 14 | [.eml](https://en.wikipedia.org/wiki/Email#Filename_extensions) file) to a 15 | PDF. Type `email2pdf --help` for more information on usage and options 16 | available. 17 | 18 | For more information on hacking/developing email2pdf, please see 19 | [HACKING.md](https://github.com/andrewferrier/email2pdf/blob/master/HACKING.md). 20 | Note that use is subject to the [license 21 | conditions](https://github.com/andrewferrier/email2pdf/blob/master/LICENSE.txt). 22 | 23 | ## Installing Dependencies 24 | 25 | Before you can use email2pdf, you need to install some dependencies. The 26 | instructions here are split out by platform: 27 | 28 | ### Debian/Ubuntu 29 | 30 | * [wkhtmltopdf](http://wkhtmltopdf.org/) - Install the `.deb` from 31 | http://wkhtmltopdf.org/ rather than using apt-get to minimise the 32 | dependencies you need to install (in particular, to avoid needing a package 33 | manager). 34 | 35 | * [getmail](http://pyropus.ca/software/getmail/) - getmail is optional, but it 36 | works well as a companion to email2pdf. Install using `apt-get install 37 | getmail`. 38 | 39 | * Others - there are some other Python library dependencies. Run `make 40 | builddeb` to create a `.deb` package, then install it with `dpkg -i 41 | mydeb.deb`. This will prompt you regarding any missing dependencies. 42 | 43 | ### OS X 44 | 45 | * [wkhtmltopdf](http://wkhtmltopdf.org/) - Install the package from 46 | http://wkhtmltopdf.org/downloads.html. 47 | 48 | * [getmail](http://pyropus.ca/software/getmail/) - TODO: This hasn't been 49 | tested, so there are no instructions here yet! Note that getmail is 50 | optional. 51 | 52 | * Install [Homebrew](http://brew.sh/) 53 | 54 | * `xcode-select --install` (for lxml, because of 55 | [this](http://stackoverflow.com/questions/19548011/cannot-install-lxml-on-mac-os-x-10-9)) 56 | 57 | * `brew install python3` (or otherwise make sure you have Python 3 and `pip3` 58 | available). 59 | 60 | * `brew install libmagic` 61 | 62 | * `pip3 install -r requirements.txt` 63 | 64 | ## Configuring getmail 65 | 66 | getmail is not strictly a dependency, but when it is combined with email2pdf, 67 | it can be used to retrieve new emails from a remote IMAP server and 68 | automatically convert them to PDFs locally. The 69 | [`getmailrc.sample`](https://github.com/andrewferrier/email2pdf/blob/master/getmailrc.sample) 70 | file in the repository can be used as a starting point for your own getmailrc 71 | to do this. Note that the sample will need editing, of course - see the 72 | getmail documentation for more information on that. Also, it is configured by 73 | default to *delete* remote emails from the server once they are converted - be 74 | careful with that. You might want to consider setting up your crontab 75 | something like this: 76 | 77 | ``` 78 | @hourly getmail --verbose | logger 79 | ``` 80 | 81 | This will ensure that getmail is invoked hourly to fetch email, and log its 82 | output to syslog. 83 | 84 | If your mailserver is unreliable, you might want to consider wrapping the getmail 85 | cron job with [cromer](https://github.com/andrewferrier/cromer). 86 | 87 | ## Configuring procmail 88 | 89 | I don't have any direct experience using procmail with email2pdf, so don't have any 90 | specific setup steps, although I understand it can be made to work. You should be 91 | aware that currently there is an outstanding issue with I/O encodings with procmail 92 | that you may need to work around - see [issue #76](https://github.com/andrewferrier/email2pdf/issues/76) for more information. 93 | 94 | ## Related Projects 95 | 96 | Harvinderpal Ghotra has refactored email2pdf into a 97 | [library](https://github.com/hghotra/eml2pdflib), which may be helpful if you 98 | need to embed email2pdf-like functionality in a Python program (although there 99 | is no specific effort to keep these two projects in sync). 100 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewferrier/email2pdf/c3b20226bc255a75f52c762aece66c58fb76b2c4/__init__.py -------------------------------------------------------------------------------- /brew/.gitignore: -------------------------------------------------------------------------------- 1 | email2pdf.rb 2 | -------------------------------------------------------------------------------- /brew/email2pdf_template.rb: -------------------------------------------------------------------------------- 1 | class Email2pdf < Formula 2 | desc "Email2PDF" 3 | homepage "http://github.com/andrewferrier/email2pdf" 4 | url "https://github.com/andrewferrier/email2pdf/archive/X.Y.zip" 5 | version "X.Y" 6 | 7 | depends_on "python@3" 8 | depends_on "libmagic" 9 | 10 | def install 11 | bin.install "email2pdf" 12 | doc.install "README.md", "LICENSE.txt" 13 | end 14 | end 15 | -------------------------------------------------------------------------------- /debian/DEBIAN/.gitignore: -------------------------------------------------------------------------------- 1 | control 2 | -------------------------------------------------------------------------------- /debian/DEBIAN/control_template: -------------------------------------------------------------------------------- 1 | Package: email2pdf 2 | Version: 3 | Section: base 4 | Priority: optional 5 | Architecture: all 6 | Depends: python3, python3-bs4 (>=4.6.3), python3-html5lib, python3-pypdf2, python3-magic, wkhtmltox 7 | Recommends: getmail 8 | Maintainer: Andrew Ferrier 9 | Description: MDA which converts email to PDF 10 | A mail delivery agent which converts emails to PDF. 11 | -------------------------------------------------------------------------------- /docker/email2pdf/getmail: -------------------------------------------------------------------------------- 1 | */5 * * * * getmail 2 | -------------------------------------------------------------------------------- /email2pdf: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python3 2 | 3 | from datetime import datetime 4 | from email.header import decode_header 5 | from itertools import chain 6 | from subprocess import Popen, PIPE 7 | from sys import platform as _platform 8 | from urllib.error import URLError, HTTPError 9 | from urllib.request import Request, urlopen 10 | import argparse 11 | import chardet 12 | import email 13 | import functools 14 | import html 15 | import io 16 | import locale 17 | import logging 18 | import logging.handlers 19 | import mimetypes 20 | import os 21 | import os.path 22 | import pprint 23 | import re 24 | import shutil 25 | import sys 26 | import tempfile 27 | import textwrap 28 | import traceback 29 | 30 | from PyPDF2 import PdfFileReader, PdfFileWriter 31 | from PyPDF2.generic import NameObject, createStringObject 32 | from bs4 import BeautifulSoup 33 | import magic 34 | 35 | assert sys.version_info >= (3, 4) 36 | 37 | mimetypes.init() 38 | 39 | HEADER_MAPPING = {'Author': 'From', 40 | 'Title': 'Subject', 41 | 'X-email2pdf-To': 'To'} 42 | 43 | FORMATTED_HEADERS_TO_INCLUDE = ['Subject', 'From', 'To', 'Date'] 44 | 45 | MIME_TYPES_BLACKLIST = frozenset(['text/html', 'text/plain']) 46 | 47 | AUTOCALCULATED_FILENAME_EXTENSION_BLACKLIST = frozenset(['.jpe', '.jpeg']) 48 | 49 | AUTOGENERATED_ATTACHMENT_PREFIX = 'floating_attachment' 50 | 51 | IMAGE_LOAD_BLACKLIST = frozenset(['emltrk.com', 'trk.email', 'shim.gif']) 52 | 53 | WKHTMLTOPDF_ERRORS_IGNORE = frozenset([r'QFont::setPixelSize: Pixel size <= 0 \(0\)', 54 | r'Invalid SOS parameters for sequential JPEG', 55 | r'libpng warning: Out of place sRGB chunk', 56 | r'Exit with code 1 due to network error: ContentNotFoundError', 57 | r'Exit with code 1 due to network error: ProtocolUnknownError', 58 | r'Exit with code 1 due to network error: UnknownContentError', 59 | r'libpng warning: iCCP: known incorrect sRGB profile']) 60 | 61 | WKHTMLTOPDF_EXTERNAL_COMMAND = 'wkhtmltopdf' 62 | 63 | 64 | def main(argv, syslog_handler, syserr_handler): 65 | logger = logging.getLogger('email2pdf') 66 | warning_count_filter = WarningCountFilter() 67 | logger.addFilter(warning_count_filter) 68 | 69 | proceed, args = handle_args(argv) 70 | 71 | if not proceed: 72 | return (False, False) 73 | 74 | if args.enforce_syslog and not syslog_handler: 75 | raise FatalException("Required syslog socket was not found.") 76 | 77 | if syslog_handler: 78 | if args.verbose > 0: 79 | syslog_handler.setLevel(logging.DEBUG) 80 | else: 81 | syslog_handler.setLevel(logging.INFO) 82 | 83 | if syserr_handler: 84 | if args.verbose > 1: 85 | syserr_handler.setLevel(logging.DEBUG) 86 | elif args.verbose == 1: 87 | syserr_handler.setLevel(logging.INFO) 88 | elif not args.mostly_hide_warnings: 89 | syserr_handler.setLevel(logging.WARNING) 90 | else: 91 | syserr_handler.setLevel(logging.ERROR) 92 | 93 | logger.info("Options used are: " + str(args)) 94 | 95 | if not shutil.which(WKHTMLTOPDF_EXTERNAL_COMMAND): 96 | raise FatalException("email2pdf requires wkhtmltopdf to be installed - please see " 97 | "https://github.com/andrewferrier/email2pdf/blob/master/README.md#installing-dependencies " 98 | "for more information.") 99 | 100 | output_directory = os.path.normpath(args.output_directory) 101 | 102 | if not os.path.exists(output_directory): 103 | raise FatalException("output-directory does not exist.") 104 | 105 | output_file_name = get_output_file_name(args, output_directory) 106 | logger.info("Output file name is: " + output_file_name) 107 | 108 | set_up_warning_logger(logger, output_file_name) 109 | 110 | input_data = get_input_data(args) 111 | logger.debug("Email input data is: " + input_data) 112 | 113 | input_email = get_input_email(input_data) 114 | (payload, parts_already_used) = handle_message_body(args, input_email) 115 | logger.debug("Payload after handle_message_body: " + str(payload)) 116 | 117 | if args.body: 118 | payload = remove_invalid_urls(payload) 119 | 120 | if args.headers: 121 | header_info = get_formatted_header_info(input_email) 122 | logger.info("Header info is: " + header_info) 123 | payload = header_info + payload 124 | 125 | logger.debug("Final payload before output_body_pdf: " + payload) 126 | output_body_pdf(input_email, bytes(payload, 'UTF-8'), output_file_name) 127 | 128 | if args.attachments: 129 | number_of_attachments = handle_attachments(input_email, 130 | output_directory, 131 | args.add_prefix_date, 132 | args.ignore_floating_attachments, 133 | parts_already_used) 134 | 135 | if (not args.body) and number_of_attachments == 0: 136 | logger.info("First try: didn't print body (on request) or extract any attachments. Retrying with filenamed parts.") 137 | parts_with_a_filename = filter_filenamed_parts(parts_already_used) 138 | if len(parts_with_a_filename) > 0: 139 | number_of_attachments = handle_attachments(input_email, 140 | output_directory, 141 | args.add_prefix_date, 142 | args.ignore_floating_attachments, 143 | set(parts_already_used - parts_with_a_filename)) 144 | 145 | if number_of_attachments == 0: 146 | logger.warning("Second try: didn't print body (on request) and still didn't find any attachments even when looked for " 147 | "referenced ones with a filename. Giving up.") 148 | 149 | if warning_count_filter.warning_pending: 150 | with open(get_modified_output_file_name(output_file_name, "_original.eml"), 'w') as original_copy_file: 151 | original_copy_file.write(input_data) 152 | 153 | return (warning_count_filter.warning_pending, args.mostly_hide_warnings) 154 | 155 | 156 | def handle_args(argv): 157 | class ArgumentParser(argparse.ArgumentParser): 158 | 159 | def error(self, message): 160 | raise FatalException(message) 161 | 162 | parser = ArgumentParser(description="Converts emails to PDFs. " 163 | "See https://github.com/andrewferrier/email2pdf for more information.", add_help=False) 164 | 165 | parser.add_argument("-i", "--input-file", default="-", 166 | help="File containing input email you wish to read in raw form " 167 | "delivered from a MTA. If set to '-' (which is the default), it " 168 | "reads from stdin.") 169 | 170 | parser.add_argument("--input-encoding", 171 | default=locale.getpreferredencoding(), help="Set the " 172 | "expected encoding of the input email (whether on stdin " 173 | "or specified with the --input-file option). If not set, " 174 | "defaults to this system's preferred encoding, which " 175 | "is " + locale.getpreferredencoding() + ".") 176 | 177 | parser.add_argument("-o", "--output-file", 178 | help="Output file you wish to print the body of the email to as PDF. Should " 179 | "include the complete path, otherwise it defaults to the current directory. If " 180 | "this option is not specified, email2pdf picks a date & time-based filename and puts " 181 | "the file in the directory specified by --output-directory.") 182 | 183 | parser.add_argument("-d", "--output-directory", default=os.getcwd(), 184 | help="If --output-file is not specified, the value of this parameter is used as " 185 | "the output directory for the body PDF, with a date-and-time based filename attached. " 186 | "In either case, this parameter also specifies the directory in which attachments are " 187 | "stored. Defaults to the current directory (i.e. " + os.getcwd() + ").") 188 | 189 | body_attachment_options = parser.add_mutually_exclusive_group() 190 | 191 | body_attachment_options.add_argument("--no-body", dest='body', action='store_false', default=True, 192 | help="Don't parse the body of the email and print it to PDF, just detach " 193 | "attachments. The default is to parse both the body and detach attachments.") 194 | 195 | body_attachment_options.add_argument("--no-attachments", dest='attachments', action='store_false', default=True, 196 | help="Don't detach attachments, just print the body of the email to PDF.") 197 | 198 | parser.add_argument("--headers", action='store_true', 199 | help="Add basic email headers (" + ", ".join(FORMATTED_HEADERS_TO_INCLUDE) + 200 | ") to the first PDF page. The default is not to do this.") 201 | 202 | parser.add_argument("--add-prefix-date", action="store_true", 203 | help="Prepend an ISO-8601 prefix date (e.g. YYYY-MM-DD-) to any attachment filename " 204 | "that doesn't have one. Will search through the whole filename for an existing " 205 | "date in that format - if not found, it prepends one.") 206 | 207 | parser.add_argument("--ignore-floating-attachments", action="store_true", 208 | help="Emails sometimes contain attachments that don't have a filename and aren't " 209 | "embedded in the main HTML body of the email using a Content-ID either. By " 210 | "default, email2pdf will detach these and use their Content-ID as a filename, " 211 | "or autogenerate a filename. If this option is specified, it will instead ignore " 212 | "them.") 213 | 214 | parser.add_argument("--enforce-syslog", action="store_true", 215 | help="By default email2pdf will use syslog if available and just log to stderr " 216 | "if not. If this option is specified, email2pdf will exit with an error if the syslog socket " 217 | "can not be located.") 218 | 219 | verbose_options = parser.add_mutually_exclusive_group() 220 | 221 | verbose_options.add_argument("--mostly-hide-warnings", action="store_true", 222 | help="By default email2pdf will output warnings about handling emails to stderr and " 223 | "exit with a non-zero return code if any are encountered, *as well as* outputting a " 224 | "summary file entitled _warnings_and_errors.txt and the original " 225 | "email as _original.eml. Specifying this option disables the first " 226 | "two, so only the additional files are produced - this makes it easier to use email2pdf " 227 | "if it is run on a schedule, as warnings won't cause the same email to be repeatedly " 228 | "retried.") 229 | 230 | verbose_options.add_argument('-v', '--verbose', action='count', default=0, 231 | help="Make the output more verbose. This affects both the output logged to " 232 | "syslog, as well as output to the console. Using this twice makes it doubly verbose.") 233 | 234 | parser.add_argument('-h', '--help', action='store_true', 235 | help="Show some basic help information about how to use email2pdf.") 236 | 237 | args = parser.parse_args(argv[1:]) 238 | 239 | assert args.body or args.attachments 240 | 241 | if args.help: 242 | parser.print_help() 243 | return (False, None) 244 | else: 245 | return (True, args) 246 | 247 | 248 | def get_input_data(args): 249 | logger = logging.getLogger("email2pdf") 250 | 251 | logger.debug("System preferred encoding is: " + locale.getpreferredencoding()) 252 | logger.debug("System encoding is: " + str(locale.getlocale())) 253 | logger.debug("Input encoding that will be used is " + args.input_encoding) 254 | 255 | if args.input_file.strip() == "-": 256 | data = "" 257 | input_stream = io.TextIOWrapper(sys.stdin.buffer, encoding=args.input_encoding) 258 | for line in input_stream: 259 | data += line 260 | else: 261 | with open(args.input_file, "r", encoding=args.input_encoding) as input_handle: 262 | data = input_handle.read() 263 | 264 | return data 265 | 266 | 267 | def get_input_email(input_data): 268 | input_email = email.message_from_string(input_data) 269 | 270 | defects = input_email.defects 271 | for part in input_email.walk(): 272 | defects.extend(part.defects) 273 | 274 | if len(defects) > 0: 275 | raise FatalException("Defects parsing email: " + pprint.pformat(defects)) 276 | 277 | return input_email 278 | 279 | 280 | def get_output_file_name(args, output_directory): 281 | if args.output_file: 282 | output_file_name = args.output_file 283 | if os.path.isfile(output_file_name): 284 | raise FatalException("Output file " + output_file_name + " already exists.") 285 | else: 286 | output_file_name = get_unique_version(os.path.join(output_directory, 287 | datetime.now().strftime("%Y-%m-%dT%H-%M-%S") + ".pdf")) 288 | 289 | return output_file_name 290 | 291 | 292 | def set_up_warning_logger(logger, output_file_name): 293 | warning_logger_name = get_modified_output_file_name(output_file_name, "_warnings_and_errors.txt") 294 | warning_logger = logging.FileHandler(warning_logger_name, delay=True) 295 | warning_logger.setLevel(logging.WARNING) 296 | warning_logger.setFormatter(logging.Formatter('%(levelname)s: %(message)s')) 297 | logger.addHandler(warning_logger) 298 | 299 | 300 | def get_modified_output_file_name(output_file_name, append): 301 | (partial_name, _) = os.path.splitext(output_file_name) 302 | partial_name = os.path.join(os.path.dirname(partial_name), 303 | os.path.basename(partial_name) + append) 304 | return partial_name 305 | 306 | 307 | def handle_message_body(args, input_email): 308 | logger = logging.getLogger("email2pdf") 309 | 310 | cid_parts_used = set() 311 | 312 | part = find_part_by_content_type(input_email, "text/html") 313 | if part is None: 314 | part = find_part_by_content_type(input_email, "text/plain") 315 | if part is None: 316 | if not args.body: 317 | logger.debug("No body parts found, but using --no-body; proceeding.") 318 | return (None, cid_parts_used) 319 | else: 320 | raise FatalException("No body parts found; aborting.") 321 | else: 322 | payload = handle_plain_message_body(part) 323 | else: 324 | (payload, cid_parts_used) = handle_html_message_body(input_email, part) 325 | 326 | return (payload, cid_parts_used) 327 | 328 | 329 | def handle_plain_message_body(part): 330 | logger = logging.getLogger("email2pdf") 331 | 332 | if part['Content-Transfer-Encoding'] == '8bit': 333 | payload = part.get_payload(decode=False) 334 | assert isinstance(payload, str) 335 | logger.info("Email is pre-decoded because Content-Transfer-Encoding is 8bit") 336 | else: 337 | payload = part.get_payload(decode=True) 338 | assert isinstance(payload, bytes) 339 | charset = part.get_content_charset() 340 | if not charset: 341 | charset = 'utf-8' 342 | logger.info("Determined email is plain text, defaulting to charset utf-8") 343 | else: 344 | logger.info("Determined email is plain text with charset " + str(charset)) 345 | 346 | if isinstance(payload, bytes): 347 | try: 348 | payload = str(payload, charset) 349 | except UnicodeDecodeError: 350 | logger.warning("UnicodeDecodeErrors in plain message body, using 'replace'") 351 | payload = str(payload, charset, errors='replace') 352 | 353 | payload = "\n".join( # Wrap long lines, individually 354 | [ textwrap.fill(line, width=80) for line in payload.splitlines() ] 355 | ) 356 | payload = html.escape(payload) 357 | payload = "
\n" + payload + "\n
" 358 | 359 | return payload 360 | 361 | 362 | def handle_html_message_body(input_email, part): 363 | logger = logging.getLogger("email2pdf") 364 | 365 | cid_parts_used = set() 366 | 367 | payload = part.get_payload(decode=True) 368 | charset = part.get_content_charset() 369 | if not charset: 370 | charset = 'utf-8' 371 | logger.info("Determined email is HTML with charset " + str(charset)) 372 | 373 | try: 374 | payload_unicode = str(payload, charset) 375 | except UnicodeDecodeError: 376 | detection = chardet.detect(payload) 377 | charset = detection["encoding"] 378 | logger.info("Detected charset can't decode body; trying again with charset " + charset) 379 | payload_unicode = str(payload, charset) 380 | 381 | def cid_replace(cid_parts_used, matchobj): 382 | cid = matchobj.group(1) 383 | 384 | logger.debug("Looking for image for cid " + cid) 385 | image_part = find_part_by_content_id(input_email, cid) 386 | 387 | if image_part is None: 388 | image_part = find_part_by_content_type_name(input_email, cid) 389 | 390 | if image_part is not None: 391 | assert image_part['Content-Transfer-Encoding'] == 'base64' 392 | image_base64 = image_part.get_payload(decode=False) 393 | image_base64 = re.sub("[\r\n\t]", "", image_base64) 394 | image_decoded = image_part.get_payload(decode=True) 395 | mime_type = get_mime_type(image_decoded) 396 | cid_parts_used.add(image_part) 397 | return "data:" + mime_type + ";base64," + image_base64 398 | else: 399 | logger.warning("Could not find image cid " + cid + " in email content.") 400 | return "broken" 401 | 402 | payload = re.sub(r'cid:([\w_@.-]+)', functools.partial(cid_replace, cid_parts_used), 403 | payload_unicode) 404 | 405 | return (payload, cid_parts_used) 406 | 407 | 408 | def output_body_pdf(input_email, payload, output_file_name): 409 | logger = logging.getLogger("email2pdf") 410 | 411 | wkh2p_process = Popen([WKHTMLTOPDF_EXTERNAL_COMMAND, '-q', '--load-error-handling', 'ignore', 412 | '--load-media-error-handling', 'ignore', '--encoding', 'utf-8', '-', 413 | output_file_name], stdin=PIPE, stdout=PIPE, stderr=PIPE) 414 | output, error = wkh2p_process.communicate(input=payload) 415 | assert output == b'' 416 | 417 | stripped_error = str(error, 'utf-8') 418 | if os.environ['XDG_SESSION_TYPE'] == 'wayland': 419 | w_err = r'Warning: Ignoring XDG_SESSION_TYPE=wayland on Gnome. Use QT_QPA_PLATFORM=wayland to run on ' \ 420 | r'Wayland anyway.' 421 | global WKHTMLTOPDF_ERRORS_IGNORE 422 | WKHTMLTOPDF_ERRORS_IGNORE = WKHTMLTOPDF_ERRORS_IGNORE.union({w_err}) 423 | 424 | for error_pattern in WKHTMLTOPDF_ERRORS_IGNORE: 425 | (stripped_error, number_of_subs_made) = re.subn(error_pattern, '', stripped_error) 426 | if number_of_subs_made > 0: 427 | logger.debug("Made " + str(number_of_subs_made) + " subs with pattern " + error_pattern) 428 | 429 | original_error = str(error, 'utf-8').rstrip() 430 | stripped_error = stripped_error.rstrip() 431 | 432 | if wkh2p_process.returncode > 0 and original_error == '': 433 | raise FatalException("wkhtmltopdf failed with exit code " + str(wkh2p_process.returncode) + ", no error output.") 434 | elif wkh2p_process.returncode > 0 and stripped_error != '': 435 | raise FatalException("wkhtmltopdf failed with exit code " + str(wkh2p_process.returncode) + ", stripped error: " + 436 | stripped_error) 437 | elif stripped_error != '': 438 | raise FatalException("wkhtmltopdf exited with rc = 0 but produced unknown stripped error output " + stripped_error) 439 | 440 | add_metadata_obj = {} 441 | 442 | for key in HEADER_MAPPING: 443 | if HEADER_MAPPING[key] in input_email: 444 | add_metadata_obj[key] = get_utf8_header(input_email[HEADER_MAPPING[key]]) 445 | 446 | add_metadata_obj['Producer'] = 'email2pdf' 447 | 448 | add_update_pdf_metadata(output_file_name, add_metadata_obj) 449 | 450 | 451 | def remove_invalid_urls(payload): 452 | logger = logging.getLogger("email2pdf") 453 | 454 | soup = BeautifulSoup(payload, "html5lib") 455 | 456 | for img in soup.find_all('img'): 457 | if img.has_attr('src'): 458 | src = img['src'] 459 | lower_src = src.lower() 460 | if lower_src == 'broken': 461 | del img['src'] 462 | elif not lower_src.startswith('data'): 463 | found_blacklist = False 464 | 465 | for image_load_blacklist_item in IMAGE_LOAD_BLACKLIST: 466 | if image_load_blacklist_item in lower_src: 467 | found_blacklist = True 468 | 469 | if not found_blacklist: 470 | logger.debug("Getting img URL " + src) 471 | 472 | if not can_url_fetch(src): 473 | logger.warning("Could not retrieve img URL " + src + ", replacing with blank.") 474 | del img['src'] 475 | else: 476 | logger.debug("Removing URL that was found in blacklist " + src) 477 | del img['src'] 478 | else: 479 | logger.debug("Ignoring URL " + src) 480 | 481 | return str(soup) 482 | 483 | 484 | def can_url_fetch(src): 485 | try: 486 | encoded_src = src.replace(" ", "%20") 487 | req = Request(encoded_src) 488 | urlopen(req) 489 | except HTTPError: 490 | return False 491 | except URLError: 492 | return False 493 | except ValueError: 494 | return False 495 | else: 496 | return True 497 | 498 | 499 | def handle_attachments(input_email, output_directory, add_prefix_date, ignore_floating_attachments, parts_to_ignore): 500 | logger = logging.getLogger("email2pdf") 501 | 502 | parts = find_all_attachments(input_email, parts_to_ignore) 503 | logger.debug("Attachments found by handle_attachments: " + str(len(parts))) 504 | 505 | for part in parts: 506 | filename = extract_part_filename(part) 507 | if not filename: 508 | if ignore_floating_attachments: 509 | continue 510 | 511 | filename = get_content_id(part) 512 | if not filename: 513 | filename = AUTOGENERATED_ATTACHMENT_PREFIX 514 | 515 | extension = get_type_extension(part.get_content_type()) 516 | if extension: 517 | filename = filename + extension 518 | 519 | assert filename is not None 520 | 521 | if add_prefix_date: 522 | if not re.search(r"\d\d\d\d[-_]\d\d[-_]\d\d", filename): 523 | filename = datetime.now().strftime("%Y-%m-%d-") + filename 524 | 525 | logger.info("Extracting attachment " + filename) 526 | 527 | full_filename = os.path.join(output_directory, filename) 528 | full_filename = get_unique_version(full_filename) 529 | 530 | payload = part.get_payload(decode=True) 531 | with open(full_filename, 'wb') as output_file: 532 | output_file.write(payload) 533 | 534 | return len(parts) 535 | 536 | 537 | def add_update_pdf_metadata(filename, update_dictionary): 538 | # This seems to be the only way to modify the existing PDF metadata. 539 | # 540 | # pylint: disable=protected-access, no-member 541 | 542 | def add_prefix(value): 543 | return '/' + value 544 | 545 | full_update_dictionary = {add_prefix(k): v for k, v in update_dictionary.items()} 546 | 547 | with open(filename, 'rb') as input_file: 548 | pdf_input = PdfFileReader(input_file) 549 | pdf_output = PdfFileWriter() 550 | 551 | for page in range(pdf_input.getNumPages()): 552 | pdf_output.addPage(pdf_input.getPage(page)) 553 | 554 | info_dict = pdf_output._info.getObject() 555 | 556 | info = pdf_input.documentInfo 557 | 558 | full_update_dictionary = dict(chain(info.items(), full_update_dictionary.items())) 559 | 560 | for key in full_update_dictionary: 561 | assert full_update_dictionary[key] is not None 562 | info_dict.update({NameObject(key): createStringObject(full_update_dictionary[key])}) 563 | 564 | os_file_out, temp_file_name = tempfile.mkstemp(prefix="email2pdf_add_update_pdf_metadata", suffix=".pdf") 565 | # Immediately close the file as created to work around issue on 566 | # Windows where file cannot be opened twice. 567 | os.close(os_file_out) 568 | 569 | with open(temp_file_name, 'wb') as file_out: 570 | pdf_output.write(file_out) 571 | 572 | shutil.move(temp_file_name, filename) 573 | 574 | 575 | def extract_part_filename(part): 576 | logger = logging.getLogger("email2pdf") 577 | filename = part.get_filename() 578 | if filename is not None: 579 | logger.debug("Pre-decoded filename: " + filename) 580 | if decode_header(filename)[0][1] is not None: 581 | logger.debug("Encoding: " + str(decode_header(filename)[0][1])) 582 | logger.debug("Filename in bytes: " + str(decode_header(filename)[0][0])) 583 | filename = str(decode_header(filename)[0][0], (decode_header(filename)[0][1])) 584 | logger.debug("Post-decoded filename: " + filename) 585 | return filename 586 | else: 587 | return None 588 | 589 | 590 | def get_unique_version(filename): 591 | # From here: http://stackoverflow.com/q/183480/27641 592 | counter = 1 593 | file_name_parts = os.path.splitext(filename) 594 | while os.path.isfile(filename): 595 | filename = file_name_parts[0] + '_' + str(counter) + file_name_parts[1] 596 | counter += 1 597 | return filename 598 | 599 | 600 | def find_part_by_content_type_name(message, content_type_name): 601 | for part in message.walk(): 602 | if part.get_param('name', header="Content-Type") == content_type_name: 603 | return part 604 | return None 605 | 606 | 607 | def find_part_by_content_type(message, content_type): 608 | for part in message.walk(): 609 | if part.get_content_type() == content_type: 610 | return part 611 | return None 612 | 613 | 614 | def find_part_by_content_id(message, content_id): 615 | for part in message.walk(): 616 | if part['Content-ID'] in (content_id, '<' + content_id + '>'): 617 | return part 618 | return None 619 | 620 | 621 | def get_content_id(part): 622 | content_id = part['Content-ID'] 623 | if content_id: 624 | content_id = content_id.lstrip('<').rstrip('>') 625 | 626 | return content_id 627 | 628 | # part.get_content_disposition() is only available in Python 3.5+, so this is effectively a backport so we can continue to support 629 | # earlier versions of Python 3. It uses an internal API so is a bit unstable and should be replaced with something stable when we 630 | # upgrade to a minimum of Python 3.5. See http://bit.ly/2bHzXtz. 631 | 632 | 633 | def get_content_disposition(part): 634 | value = part.get('content-disposition') 635 | if value is None: 636 | return None 637 | c_d = email.message._splitparam(value)[0].lower() 638 | return c_d 639 | 640 | 641 | def get_type_extension(content_type): 642 | filetypes = set(mimetypes.guess_all_extensions(content_type)) - AUTOCALCULATED_FILENAME_EXTENSION_BLACKLIST 643 | 644 | if len(filetypes) > 0: 645 | return sorted(list(filetypes))[0] 646 | else: 647 | return None 648 | 649 | 650 | def find_all_attachments(message, parts_to_ignore): 651 | parts = set() 652 | 653 | for part in message.walk(): 654 | if part not in parts_to_ignore and not part.is_multipart(): 655 | if part.get_content_type() not in MIME_TYPES_BLACKLIST: 656 | parts.add(part) 657 | 658 | return parts 659 | 660 | 661 | def filter_filenamed_parts(parts): 662 | new_parts = set() 663 | 664 | for part in parts: 665 | if part.get_filename() is not None: 666 | new_parts.add(part) 667 | 668 | return new_parts 669 | 670 | 671 | def get_formatted_header_info(input_email): 672 | header_info = "" 673 | 674 | for header in FORMATTED_HEADERS_TO_INCLUDE: 675 | if input_email[header]: 676 | decoded_string = get_utf8_header(input_email[header]) 677 | header_info = header_info + '' + header + ': ' + \ 678 | html.escape(decoded_string) + '
' 679 | 680 | return header_info + '
' 681 | 682 | # There are various different magic libraries floating around for Python, and 683 | # this function abstracts that out. The first clause is for `pip3 install 684 | # python-magic`, and the second is for the Ubuntu package python3-magic. 685 | 686 | 687 | def get_mime_type(buffer_data): 688 | # pylint: disable=no-member 689 | if 'from_buffer' in dir(magic): 690 | mime_type = magic.from_buffer(buffer_data, mime=True) 691 | if type(mime_type) is not str: 692 | # Older versions of python-magic seem to output bytes for the 693 | # mime_type name. As of Python 3.6+, it seems to be outputting 694 | # strings directly. 695 | mime_type = str(magic.from_buffer(buffer_data, mime=True), 'utf-8') 696 | else: 697 | m_handle = magic.open(magic.MAGIC_MIME_TYPE) 698 | m_handle.load() 699 | mime_type = m_handle.buffer(buffer_data) 700 | 701 | return mime_type 702 | 703 | 704 | def get_utf8_header(header): 705 | # There is a simpler way of doing this here: 706 | # http://stackoverflow.com/a/21715870/27641. However, it doesn't seem to 707 | # work, as it inserts a space between certain elements in the string 708 | # that's not warranted/correct. 709 | 710 | logger = logging.getLogger("email2pdf") 711 | 712 | decoded_header = decode_header(header) 713 | logger.debug("Decoded header: " + str(decoded_header)) 714 | hdr = "" 715 | for element in decoded_header: 716 | if isinstance(element[0], bytes): 717 | hdr += str(element[0], element[1] or 'ASCII') 718 | else: 719 | hdr += element[0] 720 | return hdr 721 | 722 | 723 | class WarningCountFilter(logging.Filter): 724 | # pylint: disable=too-few-public-methods 725 | warning_pending = False 726 | 727 | def filter(self, record): 728 | if record.levelno == logging.WARNING: 729 | self.warning_pending = True 730 | return True 731 | 732 | 733 | class FatalException(Exception): 734 | 735 | def __init__(self, value): 736 | Exception.__init__(self, value) 737 | self.value = value 738 | 739 | def __str__(self): 740 | return repr(self.value) 741 | 742 | 743 | def call_main(argv, syslog_handler, syserr_handler): 744 | # pylint: disable=bare-except 745 | logger = logging.getLogger("email2pdf") 746 | 747 | try: 748 | (warning_pending, mostly_hide_warnings) = main(argv, syslog_handler, syserr_handler) 749 | except FatalException as exception: 750 | logger.error(exception.value) 751 | sys.exit(2) 752 | except: 753 | traceback.print_exc() 754 | sys.exit(3) 755 | 756 | if warning_pending and not mostly_hide_warnings: 757 | sys.exit(1) 758 | 759 | 760 | if __name__ == "__main__": 761 | logger_setup = logging.getLogger("email2pdf") 762 | logger_setup.propagate = False 763 | logger_setup.setLevel(logging.DEBUG) 764 | 765 | syserr_handler_setup = logging.StreamHandler(stream=sys.stderr) 766 | syserr_handler_setup.setLevel(logging.WARNING) 767 | syserr_formatter = logging.Formatter('%(levelname)s: %(message)s') 768 | syserr_handler_setup.setFormatter(syserr_formatter) 769 | logger_setup.addHandler(syserr_handler_setup) 770 | 771 | if _platform == "linux" or _platform == "linux2": 772 | SYSLOG_ADDRESS = '/dev/log' 773 | elif _platform == "darwin": 774 | SYSLOG_ADDRESS = '/var/run/syslog' 775 | else: 776 | logger_setup.warning("I don't know this platform (" + _platform + "); cannot log to syslog.") 777 | SYSLOG_ADDRESS = None 778 | 779 | if SYSLOG_ADDRESS and os.path.exists(SYSLOG_ADDRESS): 780 | syslog_handler_setup = logging.handlers.SysLogHandler(address=SYSLOG_ADDRESS) 781 | syslog_handler_setup.setLevel(logging.INFO) 782 | SYSLOG_FORMATTER = logging.Formatter('%(pathname)s[%(process)d] %(levelname)s %(lineno)d %(message)s') 783 | syslog_handler_setup.setFormatter(SYSLOG_FORMATTER) 784 | logger_setup.addHandler(syslog_handler_setup) 785 | else: 786 | syslog_handler_setup = None 787 | 788 | call_main(sys.argv, syslog_handler_setup, syserr_handler_setup) 789 | -------------------------------------------------------------------------------- /email2pdf.py: -------------------------------------------------------------------------------- 1 | email2pdf -------------------------------------------------------------------------------- /getmailrc.sample: -------------------------------------------------------------------------------- 1 | [retriever] 2 | type = SimpleIMAPSSLRetriever 3 | server = mail.example.com 4 | username = pdf@example.com 5 | password = mypassword 6 | 7 | [options] 8 | delete = true 9 | 10 | [destination] 11 | type = MDA_external 12 | path = /where/I/installed/email2pdf 13 | arguments = ("-d", "/where/I/want/PDFs/to/go", "--enforce-syslog", "--mostly-hide-warnings") 14 | -------------------------------------------------------------------------------- /performance/printstats.py: -------------------------------------------------------------------------------- 1 | import pstats 2 | 3 | p = pstats.Stats('.email2pdf.profile') 4 | p.strip_dirs().sort_stats('time').print_callers(30) 5 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | beautifulsoup4>=4.6.3 2 | html5lib 3 | lxml 4 | pypdf2 5 | python-magic 6 | reportlab 7 | -------------------------------------------------------------------------------- /requirements_hacking.txt: -------------------------------------------------------------------------------- 1 | flake8 2 | freezegun 3 | nose 4 | pdfminer.six 5 | reportlab 6 | requests 7 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup 2 | 3 | setup( 4 | name='email2pdf', 5 | version='', 6 | packages=['tests', 'tests.Direct', 'tests.Subprocess'], 7 | url='https://github.com/andrewferrier/email2pdf', 8 | license='MIT', 9 | author='Andrew Ferrier', 10 | description='email2pdf is a Python script to convert emails to PDF.', 11 | install_requires=[ 12 | 'beautifulsoup4>=4.6.3', 13 | 'html5lib', 14 | 'lxml', 15 | 'pypdf2', 16 | 'python-magic', 17 | 'reportlab', 18 | ], 19 | ) 20 | -------------------------------------------------------------------------------- /tests/BaseTestClasses.py: -------------------------------------------------------------------------------- 1 | from PyPDF2 import PdfFileReader 2 | from datetime import datetime 3 | from datetime import timedelta 4 | from email import encoders 5 | from email.header import Header 6 | from email.mime.base import MIMEBase 7 | from email.mime.image import MIMEImage 8 | from email.mime.multipart import MIMEMultipart 9 | from email.mime.text import MIMEText 10 | from email.utils import formatdate 11 | from reportlab.pdfgen import canvas 12 | from requests.exceptions import RequestException 13 | from subprocess import Popen, PIPE 14 | 15 | import io 16 | import imghdr 17 | import logging 18 | import inspect 19 | import os 20 | import os.path 21 | import pdfminer.high_level 22 | import requests 23 | import shutil 24 | import sys 25 | import tempfile 26 | import unittest 27 | 28 | 29 | class Email2PDFTestCase(unittest.TestCase): 30 | isOnline = None 31 | examineDir = None 32 | 33 | time_invoked = None 34 | time_completed = None 35 | 36 | NONEXIST_IMG = 'http://www.andrewferrier.com/nonexist.jpg' 37 | NONEXIST_IMG_BLACKLIST = 'http://www.emltrk.com/nonexist.jpg' 38 | EXIST_IMG = 'https://raw.githubusercontent.com/andrewferrier/email2pdf/master/tests/basi2c16.png' 39 | EXIST_IMG_UPPERCASE = 'https://raw.githubusercontent.com/andrewferrier/email2pdf/master/tests/UPPERCASE.png' 40 | COMMAND = os.path.normpath(os.path.join(os.getcwd(), 'email2pdf')) 41 | 42 | DEFAULT_FROM = "from@example.org" 43 | DEFAULT_TO = "to@example.org" 44 | DEFAULT_SUBJECT = "Subject of the email" 45 | 46 | JPG_FILENAME = 'tests/jpeg444.jpg' 47 | PNG_FILENAME = 'tests/basi2c16.png' 48 | 49 | JPG_SIZE = os.path.getsize(JPG_FILENAME) 50 | PNG_SIZE = os.path.getsize(PNG_FILENAME) 51 | 52 | WARNINGS_AND_ERRORS_POSTFIX = "_warnings_and_errors.txt" 53 | ORIGINAL_EMAIL_POSTFIX = "_original.eml" 54 | 55 | def setUp(self): 56 | self.workingDir = tempfile.mkdtemp(dir='/tmp') 57 | self._check_online() 58 | self._check_examine_dir() 59 | 60 | def getTimeStamp(self, my_time): 61 | return my_time.strftime("%Y-%m-%dT%H-%M-%S") 62 | 63 | def existsByTime(self, path=None): 64 | if self.getTimedFilename(path): 65 | return True 66 | else: 67 | return False 68 | 69 | def existsByTimeWarning(self): 70 | if self.getTimedFilename(postfix=self.WARNINGS_AND_ERRORS_POSTFIX): 71 | return True 72 | else: 73 | return False 74 | 75 | def existsByTimeOriginal(self): 76 | if self.getTimedFilename(postfix=self.ORIGINAL_EMAIL_POSTFIX): 77 | return True 78 | else: 79 | return False 80 | 81 | def getWarningFileContents(self): 82 | filename = self.getTimedFilename(postfix=self.WARNINGS_AND_ERRORS_POSTFIX) 83 | with open(filename) as f: 84 | return f.read() 85 | 86 | def assertValidOriginalFileContents(self, filename=None): 87 | try: 88 | if not filename: 89 | filename = self.getTimedFilename(postfix=self.ORIGINAL_EMAIL_POSTFIX) 90 | with open(filename, 'rb') as f: 91 | contents = f.read() 92 | 93 | assert(contents == self.msg.as_bytes()) 94 | except: 95 | raise AssertionError("General error validating email, contents=" + contents + 96 | "\n, self.msg.as_string=" + self.msg.as_string()) 97 | 98 | def getTimedFilename(self, path=None, postfix=".pdf"): 99 | if path is None: 100 | path = self.workingDir 101 | 102 | for single_time in self._timerange(self.time_invoked, self.time_completed): 103 | filename = os.path.join(path, self.getTimeStamp(single_time) + postfix) 104 | if os.path.exists(filename): 105 | return filename 106 | 107 | return None 108 | 109 | def addHeaders(self, frm=DEFAULT_FROM, to=DEFAULT_TO, subject=DEFAULT_SUBJECT, subject_encoding=None): 110 | if subject: 111 | if subject_encoding: 112 | assert isinstance(subject, bytes) 113 | header = Header(subject, subject_encoding) 114 | self.msg['Subject'] = header 115 | else: 116 | assert isinstance(subject, str) 117 | self.msg['Subject'] = subject 118 | 119 | if frm: 120 | self.msg['From'] = frm 121 | 122 | if to: 123 | self.msg['To'] = to 124 | 125 | self.msg['Date'] = formatdate() 126 | 127 | def invokeAsSubprocess(self, inputFile=False, outputDirectory=None, outputFile=None, extraParams=None, 128 | expectOutput=False, okToExist=False): 129 | if type(inputFile) is str: 130 | input_content = bytes(inputFile, 'utf-8') 131 | else: 132 | input_content = self.msg.as_bytes() 133 | 134 | options = [Email2PDFTestCase.COMMAND] 135 | 136 | if inputFile: 137 | input_file_handle = tempfile.NamedTemporaryFile() 138 | options.extend(['-i', input_file_handle.name]) 139 | input_file_handle.write(input_content) 140 | input_file_handle.flush() 141 | my_stdin = None 142 | my_input = None 143 | else: 144 | my_stdin = PIPE 145 | my_input = input_content 146 | 147 | if outputDirectory: 148 | options.extend(['-d', outputDirectory]) 149 | 150 | if outputFile: 151 | options.extend(['-o', outputFile]) 152 | if not okToExist: 153 | assert not os.path.exists(outputFile) 154 | 155 | if extraParams is None: 156 | extraParams = [] 157 | 158 | options.extend(extraParams) 159 | 160 | self.time_invoked = datetime.now() 161 | if outputDirectory is None: 162 | my_cwd = self.workingDir 163 | else: 164 | my_cwd = None 165 | 166 | email2pdf_process = Popen(options, stdin=my_stdin, stdout=PIPE, stderr=PIPE, cwd=my_cwd) 167 | 168 | output, error = email2pdf_process.communicate(my_input) 169 | email2pdf_process.wait() 170 | self.time_completed = datetime.now() 171 | 172 | output = str(output, "utf-8") 173 | error = str(error, "utf-8") 174 | 175 | if expectOutput: 176 | self.assertNotEqual("", output) 177 | else: 178 | self.assertEqual("", output) 179 | 180 | if inputFile: 181 | input_file_handle.close() 182 | 183 | return (email2pdf_process.returncode, output, error) 184 | 185 | def invokeDirectly(self, outputDirectory=None, outputFile=None, extraParams=None, completeMessage=None, okToExist=False): 186 | module_path = self._get_original_script_path() 187 | email2pdf = self._get_email2pdf_object(module_path) 188 | 189 | if completeMessage: 190 | bytes_message = bytes(completeMessage, 'utf-8') 191 | else: 192 | bytes_message = self.msg.as_bytes() 193 | 194 | with tempfile.NamedTemporaryFile() as input_file_handle: 195 | options = [module_path, '-i', input_file_handle.name] 196 | input_file_handle.write(bytes_message) 197 | input_file_handle.flush() 198 | 199 | options.extend(['-d', outputDirectory if outputDirectory else self.workingDir]) 200 | 201 | if outputFile: 202 | options.extend(['-o', outputFile]) 203 | if not okToExist: 204 | assert not os.path.exists(outputFile) 205 | 206 | if extraParams is None: 207 | extraParams = [] 208 | 209 | options.extend(extraParams) 210 | 211 | stream = io.StringIO() 212 | stream_handler = logging.StreamHandler(stream) 213 | log = logging.getLogger('email2pdf') 214 | log.propagate = False 215 | log.setLevel(logging.DEBUG) 216 | log.addHandler(stream_handler) 217 | 218 | self.time_invoked = datetime.now() 219 | 220 | try: 221 | email2pdf.main(options, None, stream_handler) 222 | finally: 223 | self.time_completed = datetime.now() 224 | for handler in log.handlers: 225 | handler.close() 226 | log.removeHandler(handler) 227 | for log_filter in log.filters: 228 | log.removeFilter(log_filter) 229 | stream_handler.close() 230 | 231 | error = stream.getvalue() 232 | 233 | return error 234 | 235 | def setPlainContent(self, content, charset='UTF-8'): 236 | if isinstance(self.msg, MIMEMultipart): 237 | raise Exception("Cannot call setPlainContent() on a MIME-based message.") 238 | else: 239 | self.msg.set_default_type("text/plain") 240 | self.msg.set_payload(content) 241 | self.msg.set_charset(charset) 242 | 243 | def attachHTML(self, content, charset=None): 244 | assert isinstance(self.msg, MIMEMultipart) 245 | 246 | # According to the docs 247 | # (https://docs.python.org/3.3/library/email.mime.html), setting 248 | # charset explicitly to None is different from not setting it. Not 249 | # sure how that works. But for the moment, sticking with this 250 | # style of invocation to be safe. 251 | if charset: 252 | self.msg.attach(MIMEText(content, 'html', charset)) 253 | else: 254 | self.msg.attach(MIMEText(content, 'html')) 255 | 256 | def attachText(self, content, charset=None): 257 | assert isinstance(self.msg, MIMEMultipart) 258 | 259 | if charset: 260 | self.msg.attach(MIMEText(content, 'plain', charset)) 261 | else: 262 | self.msg.attach(MIMEText(content, 'plain')) 263 | 264 | def attachPDF(self, string, filePrefix="email2pdf_unittest_file", 265 | extension="pdf", mainContentType="application", subContentType="pdf", no_filename=False): 266 | _, file_name = tempfile.mkstemp(prefix=filePrefix, suffix="." + extension) 267 | 268 | try: 269 | pdf_canvas = canvas.Canvas(file_name) 270 | pdf_canvas.drawString(0, 500, string) 271 | pdf_canvas.save() 272 | 273 | with open(file_name, "rb") as open_handle: 274 | if no_filename: 275 | self.attachAttachment(mainContentType, subContentType, open_handle.read(), None) 276 | else: 277 | self.attachAttachment(mainContentType, subContentType, open_handle.read(), file_name) 278 | 279 | return os.path.basename(file_name) 280 | finally: 281 | os.unlink(file_name) 282 | 283 | def attachImage(self, content_id=None, jpeg=True, content_type=None, content_type_add_filename=False, inline=False, force_filename=False, extension=None): 284 | if jpeg: 285 | real_filename = self.JPG_FILENAME 286 | file_suffix = 'jpg' if extension is None else extension 287 | else: 288 | real_filename = self.PNG_FILENAME 289 | file_suffix = 'png' if extension is None else extension 290 | 291 | if file_suffix != '': 292 | suffix = "." + file_suffix 293 | else: 294 | suffix = file_suffix 295 | 296 | with tempfile.NamedTemporaryFile(prefix="email2pdf_unittest_image", suffix=suffix) as temp_file: 297 | _, basic_file_name = os.path.split(temp_file.name) 298 | 299 | with open(real_filename, 'rb') as image_file: 300 | image = MIMEImage(image_file.read()) 301 | if content_id: 302 | image.add_header('Content-ID', content_id) 303 | 304 | if content_type: 305 | self._replace_header(image, 'Content-Type', content_type) 306 | 307 | if content_type_add_filename: 308 | image.set_param('name', basic_file_name, header='Content-Type') 309 | 310 | if inline: 311 | if force_filename: 312 | self._replace_header(image, 'Content-Disposition', 'inline; filename="%s"' % basic_file_name) 313 | else: 314 | self._replace_header(image, 'Content-Disposition', 'inline') 315 | else: 316 | self._replace_header(image, 'Content-Disposition', 'attachment; filename="%s"' % basic_file_name) 317 | self.msg.attach(image) 318 | 319 | if inline and not force_filename: 320 | return None 321 | else: 322 | return basic_file_name 323 | 324 | def attachAttachment(self, mainContentType, subContentType, data, file_name=None, file_name_encoding=None): 325 | assert isinstance(self.msg, MIMEMultipart) 326 | 327 | part = MIMEBase(mainContentType, subContentType) 328 | part.set_payload(data) 329 | encoders.encode_base64(part) 330 | 331 | if file_name: 332 | if file_name_encoding: 333 | # I would like to use a more simple implementation here based 334 | # on part.add_header, but the encoding mechanism provided for 335 | # that gives a different output, placing the filename in 336 | # Content-Disposition, with it subtly differently encoded. 337 | # This doesn't match a real-world problematic email which was 338 | # observed like this: 339 | # 340 | # Content-Type: APPLICATION/pdf; NAME="=?UTF-8?Q?123.pdf?=" 341 | # Content-Transfer-Encoding: QUOTED-PRINTABLE 342 | # Content-Disposition: attachment 343 | 344 | header = mainContentType + '/' + subContentType 345 | header += '; name="' + Header(os.path.basename(file_name), file_name_encoding).encode() + '"' 346 | del part['Content-Type'] 347 | part['Content-Type'] = header 348 | part.add_header('Content-Disposition', 'attachment') 349 | else: 350 | part.add_header('Content-Disposition', 'attachment', filename=os.path.basename(file_name)) 351 | else: 352 | part.add_header('Content-Disposition', 'inline') 353 | 354 | self.msg.attach(part) 355 | 356 | def assertIsJPG(self, filename): 357 | self.assertEqual(imghdr.what(filename), 'jpeg') 358 | 359 | def getMetadataField(self, pdf_filename, field_name): 360 | with open(pdf_filename, 'rb') as file_input: 361 | input_f = PdfFileReader(file_input) 362 | document_info = input_f.getDocumentInfo() 363 | key = '/' + field_name 364 | if key in document_info.keys(): 365 | return document_info[key] 366 | else: 367 | return None 368 | 369 | def getPDFText(self, filename): 370 | if os.path.exists(filename): 371 | try: 372 | text = pdfminer.high_level.extract_text(filename) 373 | except pdfminer.pdfparser.PDFSyntaxError: 374 | return None 375 | 376 | text = text.replace("\t", " ") 377 | return text 378 | else: 379 | return None 380 | 381 | def touch(self, fname): 382 | open(fname, 'w').close() 383 | 384 | def find_mount_point(self, path): 385 | while not os.path.ismount(path): 386 | path = os.path.dirname(path) 387 | return path 388 | 389 | def _timerange(self, start_time, end_time): 390 | start_time = start_time.replace(microsecond=0) 391 | end_time = end_time.replace(microsecond=0) 392 | for step in range(int((end_time - start_time).seconds) + 1): 393 | yield start_time + timedelta(0, step) 394 | 395 | def _replace_header(self, mime_base, header, value): 396 | mime_base.__delitem__(header) 397 | mime_base.add_header(header, value) 398 | 399 | @classmethod 400 | def _get_original_script_path(cls): 401 | module_path = inspect.getfile(inspect.currentframe()) 402 | module_path = os.path.join(os.path.dirname(os.path.dirname(module_path)), 'email2pdf') 403 | 404 | return module_path 405 | 406 | @classmethod 407 | def _get_email2pdf_object(cls, module_path): 408 | import importlib.machinery 409 | loader = importlib.machinery.SourceFileLoader('email2pdf', module_path) 410 | return loader.load_module() 411 | 412 | @classmethod 413 | def _check_examine_dir(cls): 414 | if Email2PDFTestCase.examineDir is None: 415 | Email2PDFTestCase.examineDir = '/tmp' 416 | Email2PDFTestCase.examineDir = tempfile.mkdtemp(dir=Email2PDFTestCase.examineDir) 417 | print("Output examination directory: " + Email2PDFTestCase.examineDir) 418 | 419 | @classmethod 420 | def _check_online(cls): 421 | if Email2PDFTestCase.isOnline is None: 422 | print("Checking if online... ", end="") 423 | sys.stdout.flush() 424 | try: 425 | request = requests.get(Email2PDFTestCase.EXIST_IMG, headers={'Connection': 'close'}) 426 | request.raise_for_status() 427 | Email2PDFTestCase.isOnline = True 428 | print("Yes.") 429 | except RequestException as exception: 430 | Email2PDFTestCase.isOnline = False 431 | print("No (" + str(exception) + ")") 432 | 433 | def tearDown(self): 434 | shutil.rmtree(self.workingDir) 435 | -------------------------------------------------------------------------------- /tests/Direct/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewferrier/email2pdf/c3b20226bc255a75f52c762aece66c58fb76b2c4/tests/Direct/__init__.py -------------------------------------------------------------------------------- /tests/Direct/test_Direct_Arguments.py: -------------------------------------------------------------------------------- 1 | from datetime import datetime 2 | from email.mime.multipart import MIMEMultipart 3 | 4 | import os 5 | 6 | from tests import BaseTestClasses 7 | 8 | 9 | class Direct_Arguments(BaseTestClasses.Email2PDFTestCase): 10 | def setUp(self): 11 | super(Direct_Arguments, self).setUp() 12 | self.msg = MIMEMultipart() 13 | 14 | def test_no_body(self): 15 | error = self.invokeDirectly(extraParams=['--no-body']) 16 | self.assertFalse(self.existsByTime()) 17 | self.assertRegex(error, "body.*any.*attachments") 18 | self.assertTrue(self.existsByTimeWarning()) 19 | self.assertTrue(self.existsByTimeOriginal()) 20 | self.assertRegex(self.getWarningFileContents(), "body.*any.*attachments") 21 | self.assertValidOriginalFileContents() 22 | 23 | def test_no_body_but_some_attachments(self): 24 | filename = self.attachPDF("Some PDF content", mainContentType="application", subContentType="octet-stream") 25 | self.invokeDirectly(extraParams=['--no-body']) 26 | self.assertFalse(self.existsByTime()) 27 | self.assertFalse(self.existsByTimeWarning()) 28 | self.assertFalse(self.existsByTimeOriginal()) 29 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 30 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename)), "Some PDF content") 31 | 32 | def test_no_body_mostly_hide_warnings(self): 33 | error = self.invokeDirectly(extraParams=['--no-body', '--mostly-hide-warnings']) 34 | self.assertFalse(self.existsByTime()) 35 | self.assertEqual("", error) 36 | self.assertTrue(self.existsByTimeWarning()) 37 | self.assertTrue(self.existsByTimeOriginal()) 38 | self.assertRegex(self.getWarningFileContents(), "body.*any.*attachments") 39 | self.assertValidOriginalFileContents() 40 | 41 | def test_no_attachments(self): 42 | self.addHeaders() 43 | self.attachText("Some basic textual content") 44 | filename = self.attachPDF("Some PDF content", mainContentType="application", subContentType="octet-stream") 45 | filename2 = self.attachPDF("Some PDF content") 46 | filename3 = self.attachImage() 47 | error = self.invokeDirectly(extraParams=['--no-attachments']) 48 | self.assertEqual('', error) 49 | self.assertTrue(self.existsByTime()) 50 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename))) 51 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename2))) 52 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename3))) 53 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 54 | self.assertFalse(self.existsByTimeWarning()) 55 | self.assertFalse(self.existsByTimeOriginal()) 56 | 57 | def test_no_attachments_mostly_hide_warnings(self): 58 | self.addHeaders() 59 | self.attachText("Some basic textual content") 60 | filename = self.attachPDF("Some PDF content", mainContentType="application", subContentType="octet-stream") 61 | filename2 = self.attachPDF("Some PDF content") 62 | filename3 = self.attachImage() 63 | error = self.invokeDirectly(extraParams=['--no-attachments', '--mostly-hide-warnings']) 64 | self.assertEqual('', error) 65 | self.assertTrue(self.existsByTime()) 66 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename))) 67 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename2))) 68 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename3))) 69 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 70 | self.assertFalse(self.existsByTimeWarning()) 71 | self.assertFalse(self.existsByTimeOriginal()) 72 | 73 | def test_no_body_and_no_attachments(self): 74 | self.addHeaders() 75 | self.attachText("Some basic textual content") 76 | self.attachPDF("Some PDF content", mainContentType="application", subContentType="octet-stream") 77 | self.attachPDF("Some PDF content") 78 | self.attachImage() 79 | with self.assertRaisesRegex(Exception, "attachments.*not allowed with.*body"): 80 | self.invokeDirectly(extraParams=['--no-body', '--no-attachments']) 81 | self.assertFalse(self.existsByTime()) 82 | self.assertFalse(self.existsByTimeWarning()) 83 | self.assertFalse(self.existsByTimeOriginal()) 84 | 85 | def test_verbose_and_mostly_hide_warnings(self): 86 | with self.assertRaisesRegex(Exception, "mostly-hide.*not allowed with.*verbose"): 87 | self.invokeDirectly(extraParams=['--verbose', '--mostly-hide-warnings']) 88 | self.assertFalse(self.existsByTime()) 89 | self.assertFalse(self.existsByTimeWarning()) 90 | self.assertFalse(self.existsByTimeOriginal()) 91 | 92 | def test_headers(self): 93 | path = os.path.join(self.examineDir, "headers.pdf") 94 | self.addHeaders() 95 | self.attachText("Hello!") 96 | error = self.invokeDirectly(outputFile=path, extraParams=['--headers']) 97 | self.assertEqual('', error) 98 | self.assertTrue(os.path.exists(path)) 99 | pdf_text = self.getPDFText(path) 100 | self.assertRegex(pdf_text, "Subject") 101 | self.assertRegex(pdf_text, "From") 102 | self.assertRegex(pdf_text, "To") 103 | self.assertRegex(pdf_text, "Hello") 104 | self.assertFalse(self.existsByTimeWarning()) 105 | self.assertFalse(self.existsByTimeOriginal()) 106 | 107 | def test_add_prefix_date(self): 108 | self.addHeaders() 109 | self.attachText("Some basic textual content") 110 | filename = self.attachPDF("Some PDF content") 111 | filename2 = self.attachPDF("Some PDF content", filePrefix="unittest_file_2014-01-01") 112 | filename3 = self.attachPDF("Some PDF content", filePrefix="unittest_2014-01-01_file") 113 | filename4 = self.attachPDF("Some PDF content", filePrefix="2014-01-01_unittest_file") 114 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename))) 115 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename2))) 116 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename3))) 117 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename4))) 118 | error = self.invokeDirectly(extraParams=['--add-prefix-date']) 119 | self.assertEqual('', error) 120 | self.assertTrue(self.existsByTime()) 121 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename))) 122 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename2))) 123 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename3))) 124 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename4))) 125 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, datetime.now().strftime("%Y-%m-%d-") + filename))) 126 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename2)), "Some PDF content") 127 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename3)), "Some PDF content") 128 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename4)), "Some PDF content") 129 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, 130 | datetime.now().strftime("%Y-%m-%d-") + filename)), "Some PDF content") 131 | self.assertFalse(self.existsByTimeWarning()) 132 | self.assertFalse(self.existsByTimeOriginal()) 133 | 134 | def test_verbose(self): 135 | self.attachText("Hello!") 136 | error = self.invokeDirectly(extraParams=['-v']) 137 | self.assertNotEqual('', error) 138 | self.assertTrue(self.existsByTime()) 139 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Hello!") 140 | self.assertFalse(self.existsByTimeWarning()) 141 | self.assertFalse(self.existsByTimeOriginal()) 142 | 143 | def test_veryverbose(self): 144 | self.attachText("Hello!") 145 | error = self.invokeDirectly(extraParams=['-vv']) 146 | self.assertNotEqual('', error) 147 | self.assertTrue(self.existsByTime()) 148 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Hello!") 149 | self.assertFalse(self.existsByTimeWarning()) 150 | self.assertFalse(self.existsByTimeOriginal()) 151 | -------------------------------------------------------------------------------- /tests/Direct/test_Direct_AttachmentDetection.py: -------------------------------------------------------------------------------- 1 | from email.mime.multipart import MIMEMultipart 2 | 3 | import os 4 | 5 | from tests.BaseTestClasses import Email2PDFTestCase 6 | 7 | 8 | class AttachmentDetection(Email2PDFTestCase): 9 | def setUp(self): 10 | super(AttachmentDetection, self).setUp() 11 | self.msg = MIMEMultipart() 12 | 13 | def test_pdf_as_octet_stream(self): 14 | self.addHeaders() 15 | self.attachText("Some basic textual content") 16 | filename = self.attachPDF("Some PDF content", mainContentType="application", subContentType="octet-stream") 17 | error = self.invokeDirectly() 18 | self.assertEqual('', error) 19 | self.assertTrue(self.existsByTime()) 20 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 21 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 22 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename)), "Some PDF content") 23 | self.assertFalse(self.existsByTimeWarning()) 24 | self.assertFalse(self.existsByTimeOriginal()) 25 | 26 | def test_pdf_with_invalid_extension(self): 27 | self.addHeaders() 28 | self.attachText("Some basic textual content") 29 | filename = self.attachPDF("Some PDF content", extension="pdf") 30 | error = self.invokeDirectly() 31 | self.assertEqual('', error) 32 | self.assertTrue(self.existsByTime()) 33 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 34 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 35 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename)), "Some PDF content") 36 | self.assertFalse(self.existsByTimeWarning()) 37 | self.assertFalse(self.existsByTimeOriginal()) 38 | 39 | def test_pdf_as_octet_stream_with_invalid_extension(self): 40 | self.addHeaders() 41 | self.attachText("Some basic textual content") 42 | filename = self.attachPDF("Some PDF content", extension="xyz", mainContentType="application", subContentType="octet-stream") 43 | error = self.invokeDirectly() 44 | self.assertEqual('', error) 45 | self.assertTrue(self.existsByTime()) 46 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 47 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 48 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename)), "Some PDF content") 49 | self.assertFalse(self.existsByTimeWarning()) 50 | self.assertFalse(self.existsByTimeOriginal()) 51 | 52 | def test_pdf_as_octet_stream_no_body(self): 53 | self.addHeaders() 54 | self.attachText("Some basic textual content") 55 | filename = self.attachPDF("Some PDF content", mainContentType="application", subContentType="octet-stream") 56 | error = self.invokeDirectly(extraParams=['--no-body']) 57 | self.assertEqual('', error) 58 | self.assertFalse(self.existsByTime()) 59 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 60 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename)), "Some PDF content") 61 | self.assertFalse(self.existsByTimeWarning()) 62 | self.assertFalse(self.existsByTimeOriginal()) 63 | 64 | def test_jpeg_as_octet_stream(self): 65 | self.addHeaders() 66 | self.attachText("Some basic textual content") 67 | image_filename = self.attachImage(jpeg=True, content_type="application/octet-stream") 68 | error = self.invokeDirectly() 69 | self.assertEqual('', error) 70 | self.assertTrue(self.existsByTime()) 71 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, image_filename))) 72 | self.assertIsJPG(os.path.join(self.workingDir, image_filename)) 73 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 74 | self.assertFalse(self.existsByTimeWarning()) 75 | self.assertFalse(self.existsByTimeOriginal()) 76 | 77 | def test_jpeg_with_invalid_extension(self): 78 | self.addHeaders() 79 | self.attachText("Some basic textual content") 80 | image_filename = self.attachImage(jpeg=True, extension="blah") 81 | error = self.invokeDirectly() 82 | self.assertEqual('', error) 83 | self.assertTrue(self.existsByTime()) 84 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, image_filename))) 85 | self.assertIsJPG(os.path.join(self.workingDir, image_filename)) 86 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 87 | self.assertFalse(self.existsByTimeWarning()) 88 | self.assertFalse(self.existsByTimeOriginal()) 89 | 90 | def test_jpeg_as_octet_stream_with_invalid_extension(self): 91 | self.addHeaders() 92 | self.attachText("Some basic textual content") 93 | image_filename = self.attachImage(jpeg=True, content_type="application/octet-stream", extension="xyz") 94 | error = self.invokeDirectly() 95 | self.assertEqual('', error) 96 | self.assertTrue(self.existsByTime()) 97 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, image_filename))) 98 | self.assertIsJPG(os.path.join(self.workingDir, image_filename)) 99 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 100 | self.assertFalse(self.existsByTimeWarning()) 101 | self.assertFalse(self.existsByTimeOriginal()) 102 | 103 | def test_word_document(self): 104 | self.addHeaders() 105 | self.attachText("Some basic textual content") 106 | self.attachAttachment("application", "vnd.openxmlformats-officedocument.wordprocessingml.document", 107 | "Word document content", "somefile.docx") 108 | error = self.invokeDirectly() 109 | self.assertEqual('', error) 110 | self.assertTrue(self.existsByTime()) 111 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, "somefile.docx"))) 112 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 113 | self.assertFalse(self.existsByTimeWarning()) 114 | self.assertFalse(self.existsByTimeOriginal()) 115 | 116 | def test_unidentified_file(self): 117 | self.addHeaders() 118 | self.attachText("Some basic textual content") 119 | self.attachAttachment("application", "data", "some data in some format", "somefile.xyz") 120 | error = self.invokeDirectly() 121 | self.assertEqual('', error) 122 | self.assertTrue(self.existsByTime()) 123 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, "somefile.xyz"))) 124 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 125 | self.assertFalse(self.existsByTimeWarning()) 126 | self.assertFalse(self.existsByTimeOriginal()) 127 | 128 | def test_attachment_filename_has_encoding(self): 129 | path = os.path.join(self.workingDir, "somefile.xyz") 130 | self.attachAttachment("application", "data", "some data in some format", "somefile.xyz", file_name_encoding="utf-8") 131 | (rc, output, error) = self.invokeAsSubprocess(extraParams=['--no-body']) 132 | self.assertTrue(os.path.exists(path)) 133 | self.assertEqual('', error) 134 | self.assertFalse(self.existsByTimeWarning()) 135 | self.assertFalse(self.existsByTimeOriginal()) 136 | -------------------------------------------------------------------------------- /tests/Direct/test_Direct_Basic.py: -------------------------------------------------------------------------------- 1 | from email.header import Header 2 | from email.mime.multipart import MIMEMultipart 3 | 4 | import os 5 | 6 | from tests import BaseTestClasses 7 | 8 | 9 | class Direct_Basic(BaseTestClasses.Email2PDFTestCase): 10 | def setUp(self): 11 | super(Direct_Basic, self).setUp() 12 | self.msg = MIMEMultipart() 13 | 14 | def test_simple(self): 15 | self.addHeaders() 16 | error = self.invokeDirectly() 17 | self.assertTrue(self.existsByTime()) 18 | self.assertEqual('', error) 19 | self.assertFalse(self.existsByTimeWarning()) 20 | self.assertFalse(self.existsByTimeOriginal()) 21 | 22 | def test_missing_from_to(self): 23 | path = os.path.join(self.examineDir, "missing_from_to.pdf") 24 | self.addHeaders(frm=None, to=None) 25 | error = self.invokeDirectly(outputFile=path, extraParams=['--headers']) 26 | self.assertTrue(os.path.exists(path)) 27 | self.assertEqual('', error) 28 | self.assertFalse(self.existsByTimeWarning()) 29 | self.assertFalse(self.existsByTimeOriginal()) 30 | 31 | def test_internationalised_subject(self): 32 | path = os.path.join(self.examineDir, "internationalised_subject.pdf") 33 | self.addHeaders(subject=bytes("Hello!", 'iso-8859-1'), subject_encoding='iso-8859-1') 34 | error = self.invokeDirectly(outputFile=path, extraParams=['--headers']) 35 | self.assertTrue(os.path.exists(path)) 36 | self.assertEqual('', error) 37 | self.assertFalse(self.existsByTimeWarning()) 38 | self.assertFalse(self.existsByTimeOriginal()) 39 | 40 | def test_internationalised_subject2(self): 41 | path = os.path.join(self.examineDir, "internationalised_subject_jp.pdf") 42 | self.addHeaders(subject='=?iso-2022-jp?B?GyRCOiNHLyRiSSwkOiRkJGo/ayQyJGsbKEIhIRskQkcvS3ZBMCRO?=') 43 | error = self.invokeDirectly(outputFile=path, extraParams=['--headers']) 44 | self.assertTrue(os.path.exists(path)) 45 | self.assertEqual('', error) 46 | self.assertFalse(self.existsByTimeWarning()) 47 | self.assertFalse(self.existsByTimeOriginal()) 48 | 49 | def test_internationalised_subject3(self): 50 | path = os.path.join(self.examineDir, "internationalised_subject_de.pdf") 51 | self.addHeaders(subject='Ihre Anfrage, Giesestra=?utf-8?B?w58=?=e 5') 52 | error = self.invokeDirectly(outputFile=path, extraParams=['--headers']) 53 | self.assertTrue(os.path.exists(path)) 54 | self.assertEqual('', error) 55 | self.assertFalse(self.existsByTimeWarning()) 56 | self.assertFalse(self.existsByTimeOriginal()) 57 | 58 | def test_internationalised_subject4(self): 59 | path = os.path.join(self.examineDir, "internationalised_subject_complex.pdf") 60 | header = Header() 61 | header.append(bytes('£100', 'iso-8859-1'), 'iso-8859-1') 62 | header.append(bytes(' is != how much ', 'utf-8'), 'utf-8') 63 | header.append(bytes('I have to spend!', 'iso-8859-15'), 'iso-8859-15') 64 | self.addHeaders(subject=header.encode()) 65 | error = self.invokeDirectly(outputFile=path, extraParams=['--headers']) 66 | self.assertTrue(os.path.exists(path)) 67 | self.assertEqual('', error) 68 | self.assertFalse(self.existsByTimeWarning()) 69 | self.assertFalse(self.existsByTimeOriginal()) 70 | 71 | def test_contains_left_angle_bracket_mime(self): 72 | path = os.path.join(self.examineDir, "left_angle_bracket_mime.pdf") 73 | self.attachText("") 74 | error = self.invokeDirectly(outputFile=path) 75 | self.assertTrue(os.path.exists(path)) 76 | self.assertEqual('', error) 77 | self.assertRegex(self.getPDFText(path), "") 78 | self.assertFalse(self.existsByTimeWarning()) 79 | self.assertFalse(self.existsByTimeOriginal()) 80 | -------------------------------------------------------------------------------- /tests/Direct/test_Direct_BasicPlain.py: -------------------------------------------------------------------------------- 1 | from email.message import Message 2 | 3 | import os 4 | 5 | from tests.BaseTestClasses import Email2PDFTestCase 6 | 7 | 8 | class Direct_BasicPlain(Email2PDFTestCase): 9 | def setUp(self): 10 | super(Direct_BasicPlain, self).setUp() 11 | self.msg = Message() 12 | 13 | def test_contains_left_angle_bracket(self): 14 | path = os.path.join(self.examineDir, "left_angle_bracket_plain.pdf") 15 | self.setPlainContent("") 16 | error = self.invokeDirectly(outputFile=path) 17 | self.assertTrue(os.path.exists(path)) 18 | self.assertEqual('', error) 19 | self.assertRegex(self.getPDFText(path), "") 20 | self.assertFalse(self.existsByTimeWarning()) 21 | self.assertFalse(self.existsByTimeOriginal()) 22 | -------------------------------------------------------------------------------- /tests/Direct/test_Direct_CID.py: -------------------------------------------------------------------------------- 1 | from email.mime.multipart import MIMEMultipart 2 | 3 | import os 4 | import glob 5 | 6 | from tests.BaseTestClasses import Email2PDFTestCase 7 | 8 | 9 | class Direct_CID(Email2PDFTestCase): 10 | def setUp(self): 11 | super(Direct_CID, self).setUp() 12 | self.msg = MIMEMultipart() 13 | 14 | def test_inline_image_no_body(self): 15 | self.addHeaders() 16 | self.attachImage('myid', inline=True) 17 | self.attachHTML('') 18 | error = self.invokeDirectly(extraParams=['--no-body']) 19 | self.assertFalse(self.existsByTime()) 20 | self.assertRegex(error, "body.*any.*attachments") 21 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, 'myid.jpg'))) 22 | self.assertTrue(self.existsByTimeWarning()) 23 | self.assertRegex(self.getWarningFileContents(), "body.*any.*attachments") 24 | self.assertTrue(self.existsByTimeOriginal()) 25 | self.assertValidOriginalFileContents() 26 | 27 | def test_inline_image_with_filename_no_body(self): 28 | self.addHeaders() 29 | image_filename = self.attachImage('myid', inline=True, force_filename=True) 30 | self.attachHTML('') 31 | error = self.invokeDirectly(extraParams=['--no-body']) 32 | self.assertEqual('', error) 33 | self.assertFalse(self.existsByTime()) 34 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, image_filename))) 35 | self.assertFalse(self.existsByTimeWarning()) 36 | self.assertFalse(self.existsByTimeOriginal()) 37 | 38 | def test_inline_image_and_pdf(self): 39 | self.addHeaders() 40 | self.attachImage('myid', inline=True) 41 | self.attachHTML('') 42 | pdf_file_name = self.attachPDF("Some PDF content") 43 | error = self.invokeDirectly(extraParams=['--no-body']) 44 | self.assertEqual('', error) 45 | self.assertFalse(self.existsByTime()) 46 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, pdf_file_name))) 47 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, pdf_file_name)), "Some PDF content") 48 | self.assertFalse(self.existsByTimeWarning()) 49 | self.assertFalse(self.existsByTimeOriginal()) 50 | 51 | def test_embedded_image(self): 52 | path = os.path.join(self.examineDir, "embeddedImage.pdf") 53 | self.addHeaders() 54 | image_filename = self.attachImage('myid') 55 | self.attachHTML('') 56 | error = self.invokeDirectly(outputFile=path) 57 | self.assertEqual('', error) 58 | self.assertTrue(os.path.exists(path)) 59 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(path)) 60 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 61 | self.assertFalse(self.existsByTimeWarning()) 62 | self.assertFalse(self.existsByTimeOriginal()) 63 | 64 | # This test is an attempt to recreate a real-world failing email where the image attachment looked like: 65 | # 66 | # Content-Type: image/png; name=map_8dff3523-1a2d-4fc8-926f-d18e93964f3d 67 | # Content-Disposition: inline; filename=map_8dff3523-1a2d-4fc8-926f-d18e93964f3d 68 | # Content-Transfer-Encoding: base64 69 | # Content-ID: <> 70 | # 71 | # And the HTML looked like: 72 | # 73 | # 74 | 75 | def test_embedded_image_cid_empty(self): 76 | path = os.path.join(self.examineDir, "embeddedImageCIDEmpty.pdf") 77 | self.addHeaders() 78 | image_filename = self.attachImage('<>', jpeg=False, inline=True, force_filename=True, content_type_add_filename=True, extension="") 79 | self.attachHTML('') 80 | error = self.invokeDirectly(outputFile=path) 81 | self.assertEqual('', error) 82 | self.assertTrue(os.path.exists(path)) 83 | self.assertLess(Email2PDFTestCase.PNG_SIZE, os.path.getsize(path)) 84 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 85 | self.assertFalse(self.existsByTimeWarning()) 86 | self.assertFalse(self.existsByTimeOriginal()) 87 | 88 | def test_embedded_image_with_complex_name(self): 89 | path = os.path.join(self.examineDir, "embeddedImageWithComplexName.pdf") 90 | self.addHeaders() 91 | image_filename = self.attachImage('myid@A34A.1A23E', jpeg=False) 92 | self.attachHTML('') 93 | error = self.invokeDirectly(outputFile=path) 94 | self.assertEqual('', error) 95 | self.assertTrue(os.path.exists(path)) 96 | self.assertLess(Email2PDFTestCase.PNG_SIZE, os.path.getsize(path)) 97 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 98 | self.assertFalse(self.existsByTimeWarning()) 99 | self.assertFalse(self.existsByTimeOriginal()) 100 | 101 | def test_embedded_image_invalid_cid(self): 102 | self.addHeaders() 103 | image_filename = self.attachImage('myid') 104 | self.attachHTML('') 105 | error = self.invokeDirectly() 106 | self.assertRegex(error, "(?i)could not find image") 107 | self.assertTrue(self.existsByTime()) 108 | self.assertGreater(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 109 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, image_filename))) 110 | self.assertTrue(self.existsByTimeWarning()) 111 | self.assertRegex(self.getWarningFileContents(), "(?i)could not find image") 112 | self.assertTrue(self.existsByTimeOriginal()) 113 | self.assertValidOriginalFileContents() 114 | 115 | def test_embedded_image_invalid_cid_output_file(self): 116 | path = os.path.join(self.workingDir, "test_embedded_image_invalid_cid_output_file.pdf") 117 | self.addHeaders() 118 | image_filename = self.attachImage('myid') 119 | self.attachHTML('') 120 | error = self.invokeDirectly(outputFile=path) 121 | self.assertRegex(error, "(?i)could not find image") 122 | self.assertTrue(os.path.exists(path)) 123 | self.assertGreater(Email2PDFTestCase.JPG_SIZE, os.path.getsize(path)) 124 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, image_filename))) 125 | warning_filename = os.path.join(self.workingDir, "test_embedded_image_invalid_cid_output_file_warnings_and_errors.txt") 126 | self.assertTrue(os.path.exists(warning_filename)) 127 | with open(warning_filename) as f: 128 | warning_file_contents = f.read() 129 | self.assertRegex(warning_file_contents, "(?i)could not find image") 130 | original_email_filename = os.path.join(self.workingDir, "test_embedded_image_invalid_cid_output_file_original.eml") 131 | self.assertTrue(os.path.exists(original_email_filename)) 132 | self.assertValidOriginalFileContents(filename=original_email_filename) 133 | 134 | def test_embedded_image_png(self): 135 | path = os.path.join(self.examineDir, "embeddedImagePNG.pdf") 136 | self.addHeaders() 137 | image_filename = self.attachImage('myid', jpeg=False) 138 | self.attachHTML('') 139 | error = self.invokeDirectly(outputFile=path) 140 | self.assertEqual('', error) 141 | self.assertTrue(os.path.exists(path)) 142 | self.assertLess(Email2PDFTestCase.PNG_SIZE, os.path.getsize(path)) 143 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 144 | self.assertFalse(self.existsByTimeWarning()) 145 | self.assertFalse(self.existsByTimeOriginal()) 146 | 147 | def test_embedded_image_cid_underscore(self): 148 | self.addHeaders() 149 | image_filename = self.attachImage('') 150 | self.attachHTML('') 151 | error = self.invokeDirectly() 152 | self.assertEqual('', error) 153 | self.assertTrue(self.existsByTime()) 154 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 155 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 156 | self.assertFalse(self.existsByTimeWarning()) 157 | self.assertFalse(self.existsByTimeOriginal()) 158 | 159 | def test_embedded_image_extra_html_content(self): 160 | if self.isOnline: 161 | self.addHeaders() 162 | image_filename = self.attachImage('myid') 163 | self.attachHTML('

' + 164 | '

  • ') 165 | error = self.invokeDirectly() 166 | self.assertEqual('', error) 167 | self.assertTrue(self.existsByTime()) 168 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 169 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 170 | self.assertFalse(self.existsByTimeWarning()) 171 | self.assertFalse(self.existsByTimeOriginal()) 172 | else: 173 | self.skipTest("Not online.") 174 | 175 | def test_embedded_image_upper_case_html_content(self): 176 | self.addHeaders() 177 | image_filename = self.attachImage('myid') 178 | self.attachHTML('') 179 | error = self.invokeDirectly() 180 | self.assertEqual('', error) 181 | self.assertTrue(self.existsByTime()) 182 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 183 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 184 | self.assertFalse(self.existsByTimeWarning()) 185 | self.assertFalse(self.existsByTimeOriginal()) 186 | 187 | def test_embedded_image_no_attachments(self): 188 | self.addHeaders() 189 | image_filename = self.attachImage('myid') 190 | self.attachHTML('') 191 | error = self.invokeDirectly(extraParams=['--no-attachments']) 192 | self.assertEqual('', error) 193 | self.assertTrue(self.existsByTime()) 194 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 195 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 196 | self.assertFalse(self.existsByTimeWarning()) 197 | self.assertFalse(self.existsByTimeOriginal()) 198 | 199 | def test_embedded_image_as_octet_stream(self): 200 | self.addHeaders() 201 | image_filename = self.attachImage('myid', content_type="application/octet-stream") 202 | self.attachHTML('') 203 | error = self.invokeDirectly() 204 | self.assertEqual('', error) 205 | self.assertTrue(self.existsByTime()) 206 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 207 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 208 | self.assertFalse(self.existsByTimeWarning()) 209 | self.assertFalse(self.existsByTimeOriginal()) 210 | 211 | def test_one_embedded_one_not_image(self): 212 | self.addHeaders() 213 | image_filename = self.attachImage('myid') 214 | image_filename2 = self.attachImage() 215 | self.attachHTML('') 216 | error = self.invokeDirectly() 217 | self.assertEqual('', error) 218 | self.assertTrue(self.existsByTime()) 219 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 220 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 221 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, image_filename2))) 222 | self.assertFalse(self.existsByTimeWarning()) 223 | self.assertFalse(self.existsByTimeOriginal()) 224 | 225 | def test_two_embedded(self): 226 | path = os.path.join(self.examineDir, "twoEmbeddedImages.pdf") 227 | self.addHeaders() 228 | image_filename = self.attachImage('myid') 229 | self.attachHTML('') 230 | error = self.invokeDirectly(outputFile=path) 231 | self.assertEqual('', error) 232 | self.assertTrue(os.path.exists(path)) 233 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(path)) 234 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 235 | self.assertFalse(self.existsByTimeWarning()) 236 | self.assertFalse(self.existsByTimeOriginal()) 237 | 238 | def test_two_different_embedded(self): 239 | path = os.path.join(self.examineDir, "twoDifferentEmbeddedImages.pdf") 240 | self.addHeaders() 241 | image_filename = self.attachImage('myid') 242 | image_filename2 = self.attachImage('myid2', jpeg=False) 243 | self.attachHTML('') 244 | error = self.invokeDirectly(outputFile=path) 245 | self.assertEqual('', error) 246 | self.assertTrue(os.path.exists(path)) 247 | self.assertLess(Email2PDFTestCase.JPG_SIZE + Email2PDFTestCase.PNG_SIZE, os.path.getsize(path)) 248 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename))) 249 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, image_filename2))) 250 | self.assertFalse(self.existsByTimeWarning()) 251 | self.assertFalse(self.existsByTimeOriginal()) 252 | 253 | def test_some_cids_not_referenced(self): 254 | self.addHeaders() 255 | self.attachImage('myid', inline=True) 256 | self.attachImage('myid2', inline=True) 257 | self.attachImage('myid3', inline=True) 258 | self.attachImage(inline=True) 259 | self.attachImage(inline=True) 260 | self.attachHTML('') 261 | error = self.invokeDirectly() 262 | self.assertEqual('', error) 263 | self.assertTrue(self.existsByTime()) 264 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 265 | # These use globs because they might generate .jpg or they might generate .jfif 266 | self.assertFalse(glob.glob(os.path.join(self.workingDir, 'myid.*'))) 267 | self.assertTrue(glob.glob(os.path.join(self.workingDir, 'myid2.*'))) 268 | self.assertTrue(glob.glob(os.path.join(self.workingDir, 'myid3.*'))) 269 | self.assertTrue(glob.glob(os.path.join(self.workingDir, 'floating_attachment.*'))) 270 | self.assertTrue(glob.glob(os.path.join(self.workingDir, 'floating_attachment_1.*'))) 271 | self.assertFalse(self.existsByTimeWarning()) 272 | self.assertFalse(self.existsByTimeOriginal()) 273 | 274 | def test_some_cids_not_referenced_ignore_floating_attachments(self): 275 | self.addHeaders() 276 | self.attachImage('myid', inline=True) 277 | self.attachImage('myid2', inline=True) 278 | self.attachImage('myid3', inline=True) 279 | self.attachImage(inline=True) 280 | self.attachImage(inline=True) 281 | self.attachHTML('') 282 | error = self.invokeDirectly(extraParams=['--ignore-floating-attachments']) 283 | self.assertEqual('', error) 284 | self.assertTrue(self.existsByTime()) 285 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 286 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, 'myid.jpg'))) 287 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, 'myid2.jpg'))) 288 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, 'myid3.jpg'))) 289 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, 'floating_attachment.jpg'))) 290 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, 'floating_attachment_1.jpg'))) 291 | self.assertFalse(self.existsByTimeWarning()) 292 | self.assertFalse(self.existsByTimeOriginal()) 293 | 294 | def test_some_cids_not_referenced_png(self): 295 | self.addHeaders() 296 | self.attachImage('myid', jpeg=False, inline=True) 297 | self.attachImage('myid2', jpeg=False, inline=True) 298 | self.attachImage(jpeg=False, inline=True) 299 | self.attachHTML('') 300 | error = self.invokeDirectly() 301 | self.assertEqual('', error) 302 | self.assertTrue(self.existsByTime()) 303 | self.assertLess(Email2PDFTestCase.PNG_SIZE, os.path.getsize(self.getTimedFilename())) 304 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, 'myid.png'))) 305 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, 'myid2.png'))) 306 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, 'floating_attachment.png'))) 307 | self.assertFalse(self.existsByTimeWarning()) 308 | self.assertFalse(self.existsByTimeOriginal()) 309 | 310 | def test_some_cids_not_referenced_pdf(self): 311 | self.addHeaders() 312 | self.attachPDF('Some PDF content', no_filename=True) 313 | self.attachImage('myid', inline=True) 314 | self.attachHTML('') 315 | error = self.invokeDirectly() 316 | self.assertEqual('', error) 317 | self.assertTrue(self.existsByTime()) 318 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 319 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, 'myid.png'))) 320 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, 'floating_attachment.pdf'))) 321 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, 'floating_attachment.pdf')), "Some PDF content") 322 | self.assertFalse(self.existsByTimeWarning()) 323 | self.assertFalse(self.existsByTimeOriginal()) 324 | 325 | def test_some_cids_not_referenced_docx(self): 326 | self.addHeaders() 327 | self.attachAttachment('application', 328 | 'vnd.openxmlformats-officedocument.wordprocessingml.document', 329 | 'Word document content', None) 330 | self.attachImage('myid', inline=True) 331 | self.attachHTML('') 332 | error = self.invokeDirectly() 333 | self.assertEqual('', error) 334 | self.assertTrue(self.existsByTime()) 335 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 336 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, 'myid.png'))) 337 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, 'floating_attachment.docx'))) 338 | self.assertFalse(self.existsByTimeWarning()) 339 | self.assertFalse(self.existsByTimeOriginal()) 340 | 341 | def test_some_cids_not_referenced_misc(self): 342 | self.addHeaders() 343 | self.attachAttachment('application', 344 | 'some-random-format', 345 | 'Document content', None) 346 | self.attachImage('myid', inline=True) 347 | self.attachHTML('') 348 | error = self.invokeDirectly() 349 | self.assertEqual('', error) 350 | self.assertTrue(self.existsByTime()) 351 | self.assertLess(Email2PDFTestCase.JPG_SIZE, os.path.getsize(self.getTimedFilename())) 352 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, 'myid.png'))) 353 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, 'floating_attachment'))) 354 | self.assertFalse(self.existsByTimeWarning()) 355 | self.assertFalse(self.existsByTimeOriginal()) 356 | -------------------------------------------------------------------------------- /tests/Direct/test_Direct_Errors.py: -------------------------------------------------------------------------------- 1 | from email.mime.multipart import MIMEMultipart 2 | 3 | import os 4 | import tempfile 5 | import unittest.mock 6 | 7 | from tests import BaseTestClasses 8 | 9 | 10 | class Direct_Errors(BaseTestClasses.Email2PDFTestCase): 11 | def setUp(self): 12 | super(Direct_Errors, self).setUp() 13 | self.msg = MIMEMultipart() 14 | 15 | def test_plaincontent_fileexist(self): 16 | self.attachText("Hello!") 17 | with tempfile.NamedTemporaryFile() as tmpfile: 18 | with self.assertRaisesRegex(Exception, "file.*exist"): 19 | self.invokeDirectly(outputFile=tmpfile.name, okToExist=True) 20 | self.assertFalse(self.existsByTimeWarning()) 21 | self.assertFalse(self.existsByTimeOriginal()) 22 | 23 | def test_plaincontent_dirnotexist(self): 24 | self.attachText("Hello!") 25 | with self.assertRaisesRegex(Exception, "(?i)directory.*not.*exist"): 26 | self.invokeDirectly(outputDirectory="/notexist/") 27 | self.assertFalse(self.existsByTimeWarning()) 28 | self.assertFalse(self.existsByTimeOriginal()) 29 | 30 | def test_image_doesnt_exist(self): 31 | if self.isOnline: 32 | path = os.path.join(self.examineDir, "remoteImageDoesntExist.pdf") 33 | self.addHeaders() 34 | self.attachHTML('') 35 | error = self.invokeDirectly(outputFile=path) 36 | self.assertTrue(os.path.exists(path)) 37 | self.assertRegex(error, "(?i)could not retrieve") 38 | self.assertFalse(self.existsByTimeWarning()) 39 | self.assertFalse(self.existsByTimeOriginal()) 40 | else: 41 | self.skipTest("Not online.") 42 | 43 | def test_image_doesnt_exist_blacklist(self): 44 | path = os.path.join(self.examineDir, "remoteImageDoesntExistBlacklist.pdf") 45 | self.addHeaders() 46 | self.attachHTML('') 47 | error = self.invokeDirectly(outputFile=path) 48 | self.assertTrue(os.path.exists(path)) 49 | self.assertEqual('', error) 50 | self.assertFalse(self.existsByTimeWarning()) 51 | self.assertFalse(self.existsByTimeOriginal()) 52 | 53 | def test_image_doesnt_exist_blacklist_upper(self): 54 | path = os.path.join(self.examineDir, "remoteImageDoesntExistBlacklistUpper.pdf") 55 | self.addHeaders() 56 | self.attachHTML('') 57 | error = self.invokeDirectly(outputFile=path) 58 | self.assertTrue(os.path.exists(path)) 59 | self.assertEqual('', error) 60 | self.assertFalse(self.existsByTimeWarning()) 61 | self.assertFalse(self.existsByTimeOriginal()) 62 | 63 | def test_image_doesnt_exist_with_pdf(self): 64 | if self.isOnline: 65 | self.addHeaders() 66 | self.attachHTML('') 67 | filename = self.attachPDF("Some PDF content") 68 | error = self.invokeDirectly() 69 | self.assertTrue(self.existsByTime()) 70 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 71 | self.assertRegex(error, "(?i)could not retrieve") 72 | self.assertTrue(self.existsByTimeWarning()) 73 | self.assertRegex(self.getWarningFileContents(), "(?i)could not retrieve") 74 | self.assertTrue(self.existsByTimeOriginal()) 75 | self.assertValidOriginalFileContents() 76 | else: 77 | self.skipTest("Not online.") 78 | 79 | def test_local_image_doesnt_exist(self): 80 | path = os.path.join(self.examineDir, "localImageDoesntExist.pdf") 81 | self.addHeaders() 82 | self.attachHTML('') 83 | error = self.invokeDirectly(outputFile=path) 84 | self.assertTrue(os.path.exists(path)) 85 | self.assertRegex(error, "(?i)could not retrieve") 86 | self.assertFalse(self.existsByTimeWarning()) 87 | self.assertFalse(self.existsByTimeOriginal()) 88 | 89 | def test_local_image_with_query_doesnt_exist(self): 90 | path = os.path.join(self.examineDir, "localImageWithQueryDoesntExist.pdf") 91 | self.addHeaders() 92 | self.attachHTML('') 93 | error = self.invokeDirectly(outputFile=path) 94 | self.assertTrue(os.path.exists(path)) 95 | self.assertRegex(error, "(?i)could not retrieve") 96 | self.assertFalse(self.existsByTimeWarning()) 97 | self.assertFalse(self.existsByTimeOriginal()) 98 | 99 | def test_local_script_doesnt_exist(self): 100 | path = os.path.join(self.examineDir, "localScriptDoesntExist.pdf") 101 | self.addHeaders() 102 | self.attachHTML("") 103 | error = self.invokeDirectly(outputFile=path) 104 | self.assertTrue(os.path.exists(path)) 105 | self.assertEqual('', error) 106 | self.assertFalse(self.existsByTimeWarning()) 107 | self.assertFalse(self.existsByTimeOriginal()) 108 | 109 | def test_local_script_with_query_doesnt_exist(self): 110 | path = os.path.join(self.examineDir, "localScriptWithQueryDoesntExist.pdf") 111 | self.addHeaders() 112 | self.attachHTML("") 113 | error = self.invokeDirectly(outputFile=path) 114 | self.assertTrue(os.path.exists(path)) 115 | self.assertEqual('', error) 116 | self.assertFalse(self.existsByTimeWarning()) 117 | self.assertFalse(self.existsByTimeOriginal()) 118 | 119 | def test_local_stylesheet_doesnt_exist(self): 120 | path = os.path.join(self.examineDir, "localStylesheetDoesntExist.pdf") 121 | self.addHeaders() 122 | self.attachHTML("") 123 | error = self.invokeDirectly(outputFile=path) 124 | self.assertTrue(os.path.exists(path)) 125 | self.assertEqual('', error) 126 | self.assertFalse(self.existsByTimeWarning()) 127 | self.assertFalse(self.existsByTimeOriginal()) 128 | 129 | def test_local_stylesheet_with_query_doesnt_exist(self): 130 | path = os.path.join(self.examineDir, "localStylesheetWithQueryDoesntExist.pdf") 131 | self.addHeaders() 132 | self.attachHTML("") 133 | error = self.invokeDirectly(outputFile=path) 134 | self.assertTrue(os.path.exists(path)) 135 | self.assertEqual('', error) 136 | self.assertFalse(self.existsByTimeWarning()) 137 | self.assertFalse(self.existsByTimeOriginal()) 138 | 139 | def test_no_explicit_parts(self): 140 | # If we don't add any parts explicitly, email2pdf should find a 141 | # plain-text part 142 | error = self.invokeDirectly() 143 | self.assertEqual('', error) 144 | self.assertTrue(self.existsByTime()) 145 | self.assertFalse(self.existsByTimeWarning()) 146 | self.assertFalse(self.existsByTimeOriginal()) 147 | 148 | def test_fuzz(self): 149 | with self.assertRaisesRegex(Exception, "(?i)defects parsing email"): 150 | self.invokeDirectly(completeMessage="This is total junk") 151 | self.assertFalse(self.existsByTime()) 152 | self.assertFalse(self.existsByTimeWarning()) 153 | self.assertFalse(self.existsByTimeOriginal()) 154 | 155 | def test_broken_html(self): 156 | self.addHeaders() 157 | self.attachHTML('Some basic textual content

    ") 44 | (rc, output, error) = self.invokeAsSubprocess() 45 | self.assertEqual(0, rc) 46 | self.assertEqual('', error) 47 | self.assertTrue(self.existsByTime()) 48 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some\sbasic\stextual\scontent") 49 | self.assertFalse(self.existsByTimeWarning()) 50 | self.assertFalse(self.existsByTimeOriginal()) 51 | 52 | def test_attachtext_upsidedown(self): 53 | self.addHeaders() 54 | self.attachText("ɯɐɹƃoɹd ɟpdᄅlᴉɐɯǝ ǝɥʇ ɟo ʇsǝʇ ɐ sᴉ sᴉɥʇ ollǝH") 55 | (rc, output, error) = self.invokeAsSubprocess() 56 | self.assertEqual(0, rc) 57 | self.assertTrue(self.existsByTime()) 58 | self.assertEqual('', error) 59 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "ɯɐɹƃoɹd\sɟpdᄅlᴉɐɯǝ\sǝɥʇ\sɟo\sʇsǝʇ\sɐ\ssᴉ\ssᴉɥʇ\sollǝH") 60 | self.assertFalse(self.existsByTimeWarning()) 61 | self.assertFalse(self.existsByTimeOriginal()) 62 | 63 | def test_attachhtml_upsidedown(self): 64 | self.addHeaders() 65 | self.attachHTML("

    ɯɐɹƃoɹd ɟpdᄅlᴉɐɯǝ ǝɥʇ ɟo ʇsǝʇ ɐ sᴉ sᴉɥʇ ollǝH

    ") 66 | (rc, output, error) = self.invokeAsSubprocess() 67 | self.assertEqual(0, rc) 68 | self.assertTrue(self.existsByTime()) 69 | self.assertEqual('', error) 70 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "ɯɐɹƃoɹd\sɟpdᄅlᴉɐɯǝ\sǝɥʇ\sɟo\sʇsǝʇ\sɐ\ssᴉ\ssᴉɥʇ\sollǝH") 71 | self.assertFalse(self.existsByTimeWarning()) 72 | self.assertFalse(self.existsByTimeOriginal()) 73 | 74 | def test_html_entities_currency(self): 75 | path = os.path.join(self.examineDir, "htmlEntitiesCurrency.pdf") 76 | self.addHeaders() 77 | self.attachHTML(b'Pounds: \xc2\xa37.14, Another Pounds: £7.14'.decode('utf-8')) 78 | (rc, output, error) = self.invokeAsSubprocess(outputFile=path) 79 | self.assertEqual(0, rc) 80 | self.assertEqual('', error) 81 | self.assertTrue(os.path.exists(path)) 82 | self.assertRegex(self.getPDFText(path), "Pounds:\s£7.14,\sAnother\sPounds:\s£7.14") 83 | self.assertFalse(self.existsByTimeWarning()) 84 | self.assertFalse(self.existsByTimeOriginal()) 85 | 86 | def test_html_poundsign_iso88591(self): 87 | self.addHeaders() 88 | path = os.path.join(self.examineDir, "html_poundsign_iso88591.pdf") 89 | self.attachHTML("Hello - this email costs \xa35!", charset="ISO-8859-1") 90 | (rc, output, error) = self.invokeAsSubprocess(outputFile=path) 91 | self.assertEqual(0, rc) 92 | self.assertEqual('', error) 93 | self.assertTrue(os.path.exists(path)) 94 | self.assertRegex(self.getPDFText(path), "Hello\s-\sthis\semail\scosts\s\xa35!") 95 | self.assertFalse(self.existsByTimeWarning()) 96 | self.assertFalse(self.existsByTimeOriginal()) 97 | 98 | def test_text_poundsign_iso88591(self): 99 | self.addHeaders() 100 | path = os.path.join(self.examineDir, "text_poundsign_iso88591.pdf") 101 | self.attachText("Hello - this email costs \xa35!", charset="ISO-8859-1") 102 | (rc, output, error) = self.invokeAsSubprocess(outputFile=path) 103 | self.assertEqual(0, rc) 104 | self.assertEqual('', error) 105 | self.assertTrue(os.path.exists(path)) 106 | self.assertRegex(self.getPDFText(path), "Hello\s-\sthis\semail\scosts\s\xa35!") 107 | self.assertFalse(self.existsByTimeWarning()) 108 | self.assertFalse(self.existsByTimeOriginal()) 109 | 110 | def test_plaincontent_poundsign_utf8_8bit(self): 111 | input_email = ("From: \"XYZ\" \n" 112 | "To: \"XYZ\" \n" 113 | "Subject: Blah\n" 114 | "Content-Type: multipart/mixed; boundary=\"CUT-HERE--\"\n" 115 | "\n" 116 | "--CUT-HERE--\n" 117 | "Content-Type: text/plain; charset=UTF-8\n" 118 | "Content-Transfer-Encoding: 8bit\n" 119 | "\n" 120 | "Price is £45.00\n" 121 | "--CUT-HERE----\n") 122 | path = os.path.join(self.examineDir, "plaincontent_poundsign_utf8_8bit.pdf") 123 | (rc, output, error) = self.invokeAsSubprocess(inputFile=input_email, outputFile=path, 124 | extraParams=['--input-encoding=utf-8']) 125 | self.assertEqual(0, rc) 126 | self.assertEqual('', error) 127 | self.assertTrue(os.path.exists(path)) 128 | self.assertRegex(self.getPDFText(path), "Price\sis\s£45.00") 129 | self.assertFalse(self.existsByTimeWarning()) 130 | self.assertFalse(self.existsByTimeOriginal()) 131 | 132 | def test_plainandhtml(self): 133 | self.addHeaders() 134 | self.attachText("Some basic textual content") 135 | self.attachHTML("

    Some basic HTML content

    ") 136 | (rc, output, error) = self.invokeAsSubprocess() 137 | self.assertEqual(0, rc) 138 | self.assertEqual('', error) 139 | self.assertTrue(self.existsByTime()) 140 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some\sbasic\sHTML\scontent") 141 | self.assertFalse(self.existsByTimeWarning()) 142 | self.assertFalse(self.existsByTimeOriginal()) 143 | 144 | def test_wrong_charset_html(self): 145 | self.addHeaders() 146 | broken_body = b"

    Something with raw accents: \xe9

    " 147 | self.attachHTML(broken_body, charset="utf-8") 148 | (rc, output, error) = self.invokeAsSubprocess() 149 | self.assertEqual(0, rc) 150 | self.assertEqual('', error) 151 | self.assertTrue(self.existsByTime()) 152 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Something\swith\sraw\saccents:\s\é") 153 | self.assertFalse(self.existsByTimeWarning()) 154 | self.assertFalse(self.existsByTimeOriginal()) 155 | 156 | def test_pdf(self): 157 | self.addHeaders() 158 | self.attachText("Some basic textual content") 159 | filename = self.attachPDF("Some PDF content") 160 | (rc, output, error) = self.invokeAsSubprocess() 161 | self.assertEqual(0, rc) 162 | self.assertEqual('', error) 163 | self.assertTrue(self.existsByTime()) 164 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 165 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some\sbasic\stextual\scontent") 166 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename)), "Some\sPDF\scontent") 167 | self.assertFalse(self.existsByTimeWarning()) 168 | self.assertFalse(self.existsByTimeOriginal()) 169 | 170 | def test_plaincontent_outputfileoverrides_with_attachments(self): 171 | mainFilename = os.path.join(self.examineDir, "outputFileOverridesWithAttachments.pdf") 172 | self.attachText("Hello!") 173 | attachmentFilename = self.attachPDF("Some PDF content") 174 | with tempfile.TemporaryDirectory() as tempdir: 175 | (rc, output, error) = self.invokeAsSubprocess(outputDirectory=tempdir, outputFile=mainFilename) 176 | self.assertEqual(0, rc) 177 | self.assertEqual('', error) 178 | self.assertFalse(self.existsByTime()) 179 | self.assertFalse(self.existsByTime(tempdir)) 180 | self.assertFalse(os.path.exists(os.path.join(tempdir, "outputFileOverrides.pdf"))) 181 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, "outputFileOverrides.pdf"))) 182 | self.assertTrue(os.path.exists(mainFilename)) 183 | self.assertFalse(os.path.exists(os.path.join(self.examineDir, attachmentFilename))) 184 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, attachmentFilename))) 185 | self.assertTrue(os.path.exists(os.path.join(tempdir, attachmentFilename))) 186 | self.assertRegex(self.getPDFText(mainFilename), "Hello!") 187 | self.assertRegex(self.getPDFText(os.path.join(tempdir, attachmentFilename)), "Some\sPDF\scontent") 188 | self.assertFalse(self.existsByTimeWarning()) 189 | self.assertFalse(self.existsByTimeOriginal()) 190 | 191 | def test_remote_image_does_exist(self): 192 | if self.isOnline: 193 | path = os.path.join(self.examineDir, "remoteImageDoesExist.pdf") 194 | self.addHeaders() 195 | self.attachHTML('') 196 | (rc, output, error) = self.invokeAsSubprocess(outputFile=path) 197 | self.assertEqual(0, rc) 198 | self.assertEqual('', error) 199 | self.assertTrue(os.path.exists(path)) 200 | self.assertFalse(self.existsByTimeWarning()) 201 | self.assertFalse(self.existsByTimeOriginal()) 202 | else: 203 | self.skipTest("Not online.") 204 | 205 | def test_remote_image_does_exist_uppercase(self): 206 | if self.isOnline: 207 | path = os.path.join(self.examineDir, "remoteImageDoesExistUppercase.pdf") 208 | self.addHeaders() 209 | self.attachHTML('') 210 | (rc, output, error) = self.invokeAsSubprocess(outputFile=path) 211 | self.assertEqual(0, rc) 212 | self.assertEqual('', error) 213 | self.assertTrue(os.path.exists(path)) 214 | self.assertFalse(self.existsByTimeWarning()) 215 | self.assertFalse(self.existsByTimeOriginal()) 216 | else: 217 | self.skipTest("Not online.") 218 | 219 | def test_non_embedded_image_jpeg(self): 220 | self.addHeaders() 221 | self.attachText("Hello!") 222 | imageFilename = self.attachImage(jpeg=True) 223 | (rc, output, error) = self.invokeAsSubprocess() 224 | self.assertEqual(0, rc) 225 | self.assertEqual('', error) 226 | self.assertTrue(self.existsByTime()) 227 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, imageFilename))) 228 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Hello!") 229 | self.assertFalse(self.existsByTimeWarning()) 230 | self.assertFalse(self.existsByTimeOriginal()) 231 | 232 | def test_non_embedded_image_jpeg_add_prefix_date(self): 233 | self.addHeaders() 234 | self.attachText("Hello!") 235 | imageFilename = self.attachImage(jpeg=True) 236 | (rc, output, error) = self.invokeAsSubprocess(extraParams=['--add-prefix-date']) 237 | self.assertEqual(0, rc) 238 | self.assertEqual('', error) 239 | self.assertTrue(self.existsByTime()) 240 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, datetime.now().strftime("%Y-%m-%d-") + imageFilename))) 241 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Hello!") 242 | self.assertFalse(self.existsByTimeWarning()) 243 | self.assertFalse(self.existsByTimeOriginal()) 244 | 245 | def test_non_embedded_image_png(self): 246 | self.addHeaders() 247 | self.attachText("Hello!") 248 | imageFilename = self.attachImage(jpeg=False) 249 | (rc, output, error) = self.invokeAsSubprocess() 250 | self.assertEqual(0, rc) 251 | self.assertEqual('', error) 252 | self.assertTrue(self.existsByTime()) 253 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, imageFilename))) 254 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Hello!") 255 | self.assertFalse(self.existsByTimeWarning()) 256 | self.assertFalse(self.existsByTimeOriginal()) 257 | 258 | def test_non_embedded_image_and_pdf(self): 259 | self.addHeaders() 260 | self.attachText("Hello!") 261 | imageFilename = self.attachImage() 262 | filename = self.attachPDF("Some PDF content") 263 | (rc, output, error) = self.invokeAsSubprocess() 264 | self.assertEqual(0, rc) 265 | self.assertEqual('', error) 266 | self.assertTrue(self.existsByTime()) 267 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 268 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, imageFilename))) 269 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Hello!") 270 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename)), "Some\sPDF\scontent") 271 | self.assertFalse(self.existsByTimeWarning()) 272 | self.assertFalse(self.existsByTimeOriginal()) 273 | 274 | def test_2pdfs(self): 275 | self.addHeaders() 276 | self.attachText("Some basic textual content") 277 | filename = self.attachPDF("Some PDF content") 278 | filename2 = self.attachPDF("Some More PDF content") 279 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename))) 280 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename2))) 281 | (rc, output, error) = self.invokeAsSubprocess() 282 | self.assertEqual(0, rc) 283 | self.assertEqual('', error) 284 | self.assertTrue(self.existsByTime()) 285 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 286 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename2))) 287 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 288 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename)), "Some PDF content") 289 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename2)), "Some More PDF content") 290 | self.assertFalse(self.existsByTimeWarning()) 291 | self.assertFalse(self.existsByTimeOriginal()) 292 | 293 | def test_pdf_exists(self): 294 | self.addHeaders() 295 | self.attachText("Some basic textual content") 296 | filename = self.attachPDF("Some PDF content") 297 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename))) 298 | 299 | self.touch(os.path.join(self.workingDir, filename)) 300 | (rc, output, error) = self.invokeAsSubprocess() 301 | self.assertEqual(0, rc) 302 | self.assertEqual('', error) 303 | self.assertTrue(self.existsByTime()) 304 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 305 | 306 | rootName, unused_extension = os.path.splitext(filename) 307 | uniqueName = rootName + "_1.pdf" 308 | 309 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, uniqueName))) 310 | 311 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 312 | self.assertIsNone(self.getPDFText(os.path.join(self.workingDir, filename))) 313 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, uniqueName)), "Some PDF content") 314 | self.assertFalse(self.existsByTimeWarning()) 315 | self.assertFalse(self.existsByTimeOriginal()) 316 | 317 | def test_2pdfs_oneexists(self): 318 | self.addHeaders() 319 | self.attachText("Some basic textual content") 320 | filename = self.attachPDF("Some PDF content") 321 | filename2 = self.attachPDF("Some More PDF content") 322 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename))) 323 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, filename2))) 324 | 325 | self.touch(os.path.join(self.workingDir, filename)) 326 | (rc, output, error) = self.invokeAsSubprocess() 327 | self.assertEqual(0, rc) 328 | self.assertEqual('', error) 329 | self.assertTrue(self.existsByTime()) 330 | 331 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename))) 332 | rootName, unused_extension = os.path.splitext(filename) 333 | uniqueName = rootName + "_1.pdf" 334 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, uniqueName))) 335 | 336 | self.assertTrue(os.path.exists(os.path.join(self.workingDir, filename2))) 337 | rootName2, unused_extension2 = os.path.splitext(filename2) 338 | uniqueName2 = rootName2 + "_1.pdf" 339 | self.assertFalse(os.path.exists(os.path.join(self.workingDir, uniqueName2))) 340 | 341 | self.assertRegex(self.getPDFText(self.getTimedFilename()), "Some basic textual content") 342 | self.assertIsNone(self.getPDFText(os.path.join(self.workingDir, filename))) 343 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, uniqueName)), "Some PDF content") 344 | self.assertRegex(self.getPDFText(os.path.join(self.workingDir, filename2)), "Some More PDF content") 345 | self.assertFalse(self.existsByTimeWarning()) 346 | self.assertFalse(self.existsByTimeOriginal()) 347 | -------------------------------------------------------------------------------- /tests/UPPERCASE.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewferrier/email2pdf/c3b20226bc255a75f52c762aece66c58fb76b2c4/tests/UPPERCASE.png -------------------------------------------------------------------------------- /tests/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewferrier/email2pdf/c3b20226bc255a75f52c762aece66c58fb76b2c4/tests/__init__.py -------------------------------------------------------------------------------- /tests/basi2c16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewferrier/email2pdf/c3b20226bc255a75f52c762aece66c58fb76b2c4/tests/basi2c16.png -------------------------------------------------------------------------------- /tests/jpeg444.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/andrewferrier/email2pdf/c3b20226bc255a75f52c762aece66c58fb76b2c4/tests/jpeg444.jpg --------------------------------------------------------------------------------