├── .gitignore ├── LICENSE ├── Makefile ├── README.md ├── emlx ├── __init__.py └── emlx.py ├── setup.py └── tests ├── plaintext.emlx ├── richtext.emlx ├── test_plaintext.py └── test_richtext.py /.gitignore: -------------------------------------------------------------------------------- 1 | .coverage 2 | .DS_Store 3 | .pytest_cache 4 | __pycache__ 5 | build/ 6 | dist/ 7 | emlx.egg-info -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2020 Michael Belfrage 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
-------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | .PHONY : test 2 | 3 | test: 4 | python -m pytest 5 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Emlx 2 | ===== 3 | 4 | Emlx is the lightweight parser for `.emlx` files as used by Mail.app. 5 | 6 | 7 | Install 8 | ------- 9 | 10 | Install and update using `pip`: 11 | 12 | ``` 13 | pip install emlx 14 | ``` 15 | 16 | 17 | Basic usage 18 | ----------- 19 | 20 | ```python 21 | >>> import emlx 22 | >>> m = emlx.read("12345.emlx") 23 | >>> m.headers 24 | {'Subject': 'Re: Emlx library ✉️', 25 | 'From': 'Michael ', 26 | 'Date': 'Thu, 30 Jan 2020 20:25:43 +0100', 27 | 'Content-Type': 'text/plain; charset=utf-8', 28 | ...} 29 | >>> m.text 30 | "you're welcome :) ..." 31 | >>> m.html is None 32 | True 33 | >>> m.plist 34 | {'color': '000000', 35 | 'conversation-id': 12345, 36 | 'date-last-viewed': 1580423184, 37 | 'flags': {...} 38 | ...} 39 | >>> m.flags 40 | {'read': True, 'answered': True, 'attachment_count': 2} 41 | ``` 42 | 43 | 44 | Troubleshooting 45 | --------------- 46 | 47 | Make sure the terminal or IDE you are using has access to the Mail folders. For example, if you are using PyCharm, you need to grant it "Full Disk Access": go to `System Settings > Privacy & Security` and turn it on for PyCharm. This resolves errors such as `Operation not permitted`. 48 | 49 | 50 | Architecture 51 | ------------ 52 | 53 | An `.emlx` file consists of three parts: 54 | 55 | 1. bytecount on first line; 56 | 2. email content in MIME format (headers, body, attachments); 57 | 3. Apple property list (plist) with metadata. 58 | 59 | The second part (2.) is parsed by the `email` library. It is included in the Python standard library.
Message objects generated by `emlx` extend `email.message.Message` and thus give access to its handy features. Additionally, `emlx` message objects provide the attributes `bytecount` (1.) as an integer and `plist` (3.) as a Python dictionary. For convenience, they also offer the attributes `headers`, `text`, `html`, `url`, `id`, and `flags`. 60 | 61 | 62 | History 63 | ------- 64 | 65 | The `emlx` file format was introduced by Apple in 2005. It is similar to `eml`-files popular with other email clients; the difference is the added bytecount (start) and plist (end). For more, see [Wikipedia](https://en.wikipedia.org/wiki/Email#Filename_extensions). 66 | -------------------------------------------------------------------------------- /emlx/__init__.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | """ 3 | 4 | emlx - the lightweight parser for emlx files. 5 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 6 | 7 | Basic usage: 8 | 9 | >>> import emlx 10 | >>> m = emlx.read("12345.emlx") 11 | >>> m.headers 12 | {'Subject': 'Re: Emlx library ✉️', 13 | 'From': 'Michael ', 14 | 'Date': 'Thu, 30 Jan 2020 20:25:43 +0100', 15 | 'Content-Type': 'text/plain; charset=utf-8', 16 | ...} 17 | >>> m.headers['Subject'] 18 | 'Re: Emlx library ✉️' 19 | >>> m.plist 20 | {'color': '000000', 21 | 'conversation-id': 12345, 22 | 'date-last-viewed': 1580423184, 23 | 'flags': {...} 24 | ...} 25 | >>> m.flags 26 | {'read': True, 'answered': True, 'attachment_count': 2} 27 | 28 | 29 | Architecture 30 | ------------ 31 | 32 | An `.emlx` file consists of three parts: 33 | 34 | 1. bytecount on first line; 35 | 2. email content in MIME format (headers, body, attachments); 36 | 3. Apple property list (plist) with metadata. 37 | 38 | The second part (2.) is parsed by the `email` library. It is included in 39 | the Python standard library.
Message objects generated by `emlx` extend 40 | `email.message.Message` and thus give access to its handy features. 41 | 42 | Additionally, `emlx` message objects provide the attributes `bytecount` 43 | (1.) as an integer and `plist` (3.) as a Python dict. For convenience, they 44 | also offer the attributes `headers`, `text`, `html`, `url`, `id`, and `flags`. 45 | 46 | 47 | History 48 | ------- 49 | 50 | The `emlx` file format was introduced by Apple in 2005. It is similar to 51 | `eml`-files popular with other email clients; the difference is the 52 | added bytecount (start) and plist (end). For more, see 53 | https://en.wikipedia.org/wiki/Email#Filename_extensions. 54 | 55 | 56 | Inspired by 57 | ----------- 58 | 59 | Karl Dubost - https://gist.github.com/karlcow/5276813 60 | Rui Carmo - https://the.taoofmac.com/space/blog/2008/03/03/2211 61 | Jamie Zawinski - https://www.jwz.org/blog/2005/07/emlx-flags/ 62 | """ 63 | 64 | __version__ = "1.0.0" 65 | 66 | from .emlx import read 67 | -------------------------------------------------------------------------------- /emlx/emlx.py: -------------------------------------------------------------------------------- 1 | # -*- coding: utf-8 -*- 2 | 3 | """ 4 | emlx.emlx 5 | ~~~~~~~~~~~~ 6 | 7 | This module implements the Emlx object and helper methods.
8 | """ 9 | 10 | import datetime 11 | import email 12 | from email.header import decode_header, make_header 13 | from email.iterators import typed_subpart_iterator 14 | import email.message 15 | import plistlib 16 | 17 | 18 | APPLE_MESSAGE_FLAGS = [ 19 | "read", 20 | "deleted", 21 | "answered", 22 | "encrypted", 23 | "flagged", 24 | "recent", 25 | "draft", 26 | "initial", 27 | "forwarded", 28 | "redirected", 29 | ("attachment_count", 6), 30 | ("priority_level", 7), 31 | "signed", 32 | "is_junk", 33 | "is_not_junk", 34 | ("font_size_delta", 3), 35 | "junk_mail_level_recorded", 36 | "highlight_text_in_toc", 37 | ] 38 | 39 | 40 | class Emlx(email.message.Message): 41 | """This class represents an emlx object. 42 | 43 | It's implemented as an extension of `email.message.Message` 44 | from the Python standard library. 45 | """ 46 | 47 | def __init__(self): 48 | super().__init__() 49 | 50 | # MIME message byte length. 51 | self.bytecount = 0 52 | 53 | # Dictionary of MIME message headers, e.g., 'From' and 'Date'. 54 | # See: https://en.wikipedia.org/wiki/MIME#MIME_header_fields 55 | self.headers = None 56 | 57 | # String of 'Message-ID' header if present. Case-insensitive match. 58 | # See: https://tools.ietf.org/html/rfc2392 59 | self.id = None 60 | 61 | # Message URL if available. 62 | # See: https://daringfireball.net/2007/12/message_urls_leopard_mail 63 | self.url = None 64 | 65 | # Dictionary parsed from the plist included by Mail.app. 66 | self.plist = {} 67 | 68 | # Dictionary of flags set by Mail.app, e.g., 'read' or 'answered'. 69 | # This is part of `self.plist`.
70 | self.flags = {} 71 | 72 | @classmethod 73 | def from_filebuffer(cls, filebuffer, **kwargs): 74 | """Return an emlx object from an emlx filebuffer.""" 75 | 76 | bytecount = int(filebuffer.readline().strip()) 77 | 78 | if kwargs.get("plist_only"): 79 | self = Emlx() 80 | filebuffer.seek(bytecount, 1) 81 | else: 82 | mime_bytes = filebuffer.read(bytecount) 83 | self = email.message_from_bytes(mime_bytes, _class=Emlx) 84 | self.headers = decode_headers(self) 85 | self.id = get_case_insensitive(self.headers, "Message-ID") 86 | self.url = self.id and f"message:{self.id}" 87 | self.text = find_next_payload_of_type(self, "text", "plain") 88 | self.html = find_next_payload_of_type(self, "text", "html") 89 | 90 | self.bytecount = bytecount 91 | 92 | self.plist = plistlib.loads(filebuffer.read()) 93 | self.flags_raw = self.plist.get("flags", 0) 94 | self.flags = decode_plist_flags(self.flags_raw) 95 | self.plist["flags"] = self.flags 96 | 97 | return self 98 | 99 | 100 | def read(filepath_or_buffer, plist_only=False, **kwargs): 101 | """ 102 | Read an emlx file into an `Emlx` object. 103 | 104 | Parameters 105 | ---------- 106 | filepath_or_buffer : str or file-like object 107 | For example "12345.emlx" or objects with a `.read()` and `.seek()` 108 | method -- such as created by `open` or `io.BytesIO`. 109 | plist_only : bool, default False 110 | If `plist_only` is True, then the message MIME content will 111 | be skipped. The result will only include the `plist` attribute 112 | including `flags` and `bytecount`. This can speed things up. 
113 | """ 114 | if isinstance(filepath_or_buffer, str): 115 | filebuffer = open(filepath_or_buffer, "rb") 116 | elif is_file_like(filepath_or_buffer): 117 | filebuffer = filepath_or_buffer 118 | else: 119 | raise ValueError("`filepath_or_buffer` is not a str or file-like object.") 120 | 121 | try: 122 | kwargs["plist_only"] = plist_only 123 | result = Emlx.from_filebuffer(filebuffer, **kwargs) 124 | finally: 125 | filebuffer.close() 126 | 127 | return result 128 | 129 | 130 | # Helper methods 131 | 132 | 133 | def is_file_like(obj): 134 | """Is `obj` file-like?""" 135 | return hasattr(obj, "read") and hasattr(obj, "seek") 136 | 137 | 138 | def decode_headers(message): 139 | """Decode headers in `email.message.Message`.""" 140 | return {key: str(safe_decode_header(header)) for key, header in message.items()} 141 | 142 | 143 | def safe_decode_header(header): 144 | """Decode `header`; return it unchanged if it is not RFC-compliant.""" 145 | try: 146 | return make_header(decode_header(header)) 147 | except Exception: 148 | return header 149 | 150 | 151 | def find_next_payload_of_type(message, maintype="text", subtype=None): 152 | return next( 153 | ( 154 | part.get_payload() 155 | for part in typed_subpart_iterator(message, maintype, subtype) 156 | ), 157 | None, 158 | ) 159 | 160 | 161 | def decode_plist_flags(integer): 162 | """Decode flags of emlx plist given by `integer`. 163 | 164 | See: https://www.jwz.org/blog/2005/07/emlx-flags/ 165 | """ 166 | result = {} 167 | integer |= 1 << 33  # ensures `bin()` has enough digits to slice every field 168 | bits = bin(integer) 169 | index = 0 170 | for key in APPLE_MESSAGE_FLAGS: 171 | if isinstance(key, tuple): 172 | key, count = key 173 | value = int(bits[index - count : index], 2) 174 | else: 175 | value = bool(int(bits[index - 1])) 176 | count = 1 177 | if value: 178 | result[key] = value 179 | index -= count 180 | # Drop "attachment_count" when it is 63 (all six bits set); the cause is unknown.
181 | attachment_count = result.get("attachment_count") 182 | if attachment_count == 63: 183 | del result["attachment_count"] 184 | return result 185 | 186 | 187 | def get_case_insensitive(dictionary, key): 188 | """Get `key` in `dictionary` case-insensitively.""" 189 | key = key.lower() 190 | return next( 191 | (value for other_key, value in dictionary.items() if other_key.lower() == key), 192 | None, 193 | ) 194 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import find_packages 2 | from setuptools import setup 3 | 4 | with open("README.md", "r", encoding="utf-8") as f: 5 | long_description = f.read() 6 | 7 | setup( 8 | name="emlx", 9 | version="1.0.4", 10 | url="https://github.com/mikez/emlx", 11 | project_urls={ 12 | "Code": "https://github.com/mikez/emlx", 13 | "Issue tracker": "https://github.com/mikez/emlx/issues", 14 | }, 15 | license="MIT", 16 | author="Michael Belfrage", 17 | author_email="consulting@belfrage.net", 18 | description="The lightweight parser for emlx files.", 19 | long_description=long_description, 20 | long_description_content_type="text/markdown", 21 | packages=find_packages(), 22 | classifiers=[ 23 | "License :: OSI Approved :: MIT License", 24 | "Operating System :: OS Independent", 25 | "Programming Language :: Python :: 3", 26 | ], 27 | python_requires=">=3.6", 28 | ) 29 | -------------------------------------------------------------------------------- /tests/plaintext.emlx: -------------------------------------------------------------------------------- 1 | 497 2 | Subject: Re: Emlx library 3 | Mime-Version: 1.0 4 | Content-Type: text/plain; charset=utf-8 5 | From: Michael 6 | In-Reply-To: <6FA2F219-2B7C-4962-9724-E9399409CCBE@example.com> 7 | Date: Thu, 6 Feb 2020 15:39:55 +0200 8 | Content-Transfer-Encoding: quoted-printable 9 | Message-Id: <7A129E26-2C1F-4517-B6B5-39460ED50E12@example.com> 10 |
References: <6FA2F219-2B7C-4962-9724-E9399409CCBE@example.com> 11 | To: Python 12 | 13 | You're welcome. :) 14 | 15 | > Python wrote: 16 | > 17 | > Thank you! 18 | 19 | 20 | 21 | 22 | 23 | conversation-id 24 | 123456 25 | date-last-viewed 26 | 1581111111 27 | date-received 28 | 1581000000 29 | flags 30 | 8623489089 31 | remote-id 32 | 789 33 | 34 | 35 | -------------------------------------------------------------------------------- /tests/richtext.emlx: -------------------------------------------------------------------------------- 1 | 639 2 | Subject: Re: Emlx library 3 | Mime-Version: 1.0 4 | Content-Type: text/html; charset=utf-8 5 | From: Michael 6 | Date: Mon, 29 Mar 2021 18:52:23 +0200 7 | Content-Transfer-Encoding: 7bit 8 | Message-Id: <83E25460-8C9A-45AD-87BA-B3D70DFBC2E7@example.com> 9 | References: <6FA2F219-2B7C-4962-9724-E9399409CCBE@example.com> 10 | To: Python 11 | 12 |
You're welcome. :)

> Python <python@example.com> wrote:
>
> Thank you!
13 | 14 | 15 | 16 | 17 | conversation-id 18 | 123457 19 | date-last-viewed 20 | 1617036766 21 | date-received 22 | 1617036743 23 | flags 24 | 8623489089 25 | remote-id 26 | 790 27 | 28 | 29 | -------------------------------------------------------------------------------- /tests/test_plaintext.py: -------------------------------------------------------------------------------- 1 | import io 2 | import os.path 3 | 4 | import pytest 5 | 6 | import emlx 7 | 8 | 9 | TEST_FILE = "plaintext.emlx" 10 | TEST_BYTECOUNT = 497 11 | TEST_MESSAGE_ID = "<7A129E26-2C1F-4517-B6B5-39460ED50E12@example.com>" 12 | TEST_TEXT_LENGTH = TEST_PAYLOAD_LENGTH = 73 13 | # raw_flags = 8623489089 14 | # = ( 15 | # (1 << 0) # read 16 | # + (1 << 6) # draft 17 | # + (1 << 25) # is not junk 18 | # + (1 << 33) # (something undocumented Apple adds) 19 | # ) 20 | TEST_PLIST = { 21 | "conversation-id": 123456, 22 | "date-last-viewed": 1581111111, 23 | "date-received": 1581000000, 24 | "flags": {"read": True, "draft": True, "is_not_junk": True}, 25 | "remote-id": "789", 26 | } 27 | 28 | # Fixtures 29 | 30 | 31 | @pytest.fixture 32 | def message_filepath(request): 33 | dirname = os.path.dirname(request.module.__file__) 34 | return f"{dirname}/{TEST_FILE}" 35 | 36 | 37 | @pytest.fixture 38 | def message_content(message_filepath): 39 | with open(message_filepath, "rb") as filebuffer: 40 | return filebuffer.read() 41 | 42 | 43 | @pytest.fixture 44 | def message(message_filepath): 45 | return emlx.read(message_filepath) 46 | 47 | 48 | # Tests 49 | 50 | 51 | def test_bytecount(message): 52 | assert message.bytecount == TEST_BYTECOUNT 53 | 54 | 55 | def test_message_mime_content(message): 56 | assert message.headers["Message-Id"] == TEST_MESSAGE_ID 57 | assert message.id == TEST_MESSAGE_ID 58 | assert message.url == "message:" + TEST_MESSAGE_ID 59 | assert message.html is None 60 | assert len(message.text) == TEST_TEXT_LENGTH 61 | assert len(message.get_payload()) == TEST_PAYLOAD_LENGTH 62 | 63 | 64 | def 
test_plist_and_flags(message): 65 | assert message.plist == TEST_PLIST 66 | assert message.flags == TEST_PLIST["flags"] 67 | 68 | 69 | def test_message_buffered_as_bytesio(message_content): 70 | filebuffer = io.BytesIO(message_content) 71 | message = emlx.read(filebuffer) 72 | assert message.bytecount == TEST_BYTECOUNT 73 | assert message.id == TEST_MESSAGE_ID 74 | assert len(message.get_payload()) == TEST_PAYLOAD_LENGTH 75 | assert message.plist["flags"] == TEST_PLIST["flags"] 76 | 77 | 78 | def test_message_buffered_via_open(message_filepath): 79 | with open(message_filepath, "rb") as filebuffer: 80 | message = emlx.read(filebuffer) 81 | assert message.bytecount == TEST_BYTECOUNT 82 | assert message.id == TEST_MESSAGE_ID 83 | assert len(message.get_payload()) == TEST_PAYLOAD_LENGTH 84 | assert message.plist["flags"] == TEST_PLIST["flags"] 85 | 86 | 87 | def test_message_plist_only(message_filepath): 88 | message = emlx.read(message_filepath, plist_only=True) 89 | assert message.bytecount == TEST_BYTECOUNT 90 | assert message.url is None 91 | assert message.get_payload() is None 92 | assert message.plist == TEST_PLIST 93 | assert message.flags == TEST_PLIST["flags"] 94 | 95 | 96 | def test_message_plist_only_as_bytesio(message_content): 97 | filebuffer = io.BytesIO(message_content) 98 | message = emlx.read(filebuffer, plist_only=True) 99 | assert message.bytecount == TEST_BYTECOUNT 100 | assert message.url is None 101 | assert message.get_payload() is None 102 | assert message.plist == TEST_PLIST 103 | assert message.flags == TEST_PLIST["flags"] 104 | 105 | 106 | def test_message_plist_only_via_open(message_filepath): 107 | with open(message_filepath, "rb") as filebuffer: 108 | message = emlx.read(filebuffer, plist_only=True) 109 | assert message.bytecount == TEST_BYTECOUNT 110 | assert message.url is None 111 | assert message.get_payload() is None 112 | assert message.plist == TEST_PLIST 113 | assert message.flags == TEST_PLIST["flags"] 114 | 
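The comment at the top of `tests/test_plaintext.py` breaks `raw_flags = 8623489089` into bits 0 (read), 6 (draft), 25 (is not junk), and 33. As a minimal sketch, assuming the bit layout encoded in `APPLE_MESSAGE_FLAGS` (which follows the jwz write-up), the same value can be decoded with shifts and masks instead of the string slicing used by `decode_plist_flags`. The names `FLAG_LAYOUT` and `decode_flags_by_shift` are hypothetical and not part of the `emlx` API.

```python
# Hypothetical layout table mirroring APPLE_MESSAGE_FLAGS:
# (name, width in bits), starting at the least significant bit.
FLAG_LAYOUT = [
    ("read", 1), ("deleted", 1), ("answered", 1), ("encrypted", 1),
    ("flagged", 1), ("recent", 1), ("draft", 1), ("initial", 1),
    ("forwarded", 1), ("redirected", 1), ("attachment_count", 6),
    ("priority_level", 7), ("signed", 1), ("is_junk", 1), ("is_not_junk", 1),
    ("font_size_delta", 3), ("junk_mail_level_recorded", 1),
    ("highlight_text_in_toc", 1),
]


def decode_flags_by_shift(raw):
    """Hypothetical shift-and-mask variant of emlx's decode_plist_flags."""
    result = {}
    offset = 0
    for name, width in FLAG_LAYOUT:
        # Extract `width` bits starting at `offset`.
        value = (raw >> offset) & ((1 << width) - 1)
        if value:
            # Single-bit fields become booleans; wider fields stay integers.
            result[name] = bool(value) if width == 1 else value
        offset += width
    # Same special case as the library: 63 attachments is treated as bogus.
    if result.get("attachment_count") == 63:
        del result["attachment_count"]
    return result


raw = (1 << 0) + (1 << 6) + (1 << 25) + (1 << 33)
assert raw == 8623489089
print(decode_flags_by_shift(raw))
# {'read': True, 'draft': True, 'is_not_junk': True}
```

Bit 33 lies beyond the last field (bit 30), so the mask-based version simply ignores it. Shifting and masking also removes any dependence on the length of the `bin()` string, which the `integer |= 1 << 33` line in `decode_plist_flags` appears to guarantee.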
-------------------------------------------------------------------------------- /tests/test_richtext.py: -------------------------------------------------------------------------------- 1 | import os.path 2 | 3 | import pytest 4 | 5 | import emlx 6 | 7 | 8 | TEST_FILE = "richtext.emlx" 9 | TEST_HTML_LENGTH = 291 10 | TEST_PAYLOAD_LENGTH = 291 11 | 12 | 13 | # Fixtures 14 | 15 | 16 | @pytest.fixture 17 | def message_filepath(request): 18 | dirname = os.path.dirname(request.module.__file__) 19 | return f"{dirname}/{TEST_FILE}" 20 | 21 | 22 | @pytest.fixture 23 | def message(message_filepath): 24 | return emlx.read(message_filepath) 25 | 26 | 27 | # Tests 28 | 29 | 30 | def test_message_mime_content(message): 31 | assert message.text is None 32 | assert len(message.html) == TEST_HTML_LENGTH 33 | assert len(message.get_payload()) == TEST_PAYLOAD_LENGTH 34 | --------------------------------------------------------------------------------
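The Architecture section of the README describes an `.emlx` file as a bytecount line, a MIME message of exactly that many bytes, and a trailing Apple plist. `Emlx.from_filebuffer` reads the file in the same three steps. The stdlib-only sketch below shows that layout in isolation. It is not the library's code: `split_emlx` and the in-memory sample message are hypothetical, and the sketch does no error handling.

```python
import email
import io
import plistlib


def split_emlx(filebuffer):
    """Hypothetical helper: split an .emlx byte stream into its three parts."""
    # Part 1: the first line holds the byte length of the MIME message.
    bytecount = int(filebuffer.readline().strip())
    # Part 2: exactly `bytecount` bytes of MIME content.
    mime_bytes = filebuffer.read(bytecount)
    # Part 3: everything after that is the Apple property list.
    plist = plistlib.loads(filebuffer.read())
    return bytecount, mime_bytes, plist


# Build a tiny hypothetical .emlx file in memory.
mime = b"Subject: Hello\nContent-Type: text/plain; charset=utf-8\n\nHi there.\n"
metadata = plistlib.dumps({"flags": 1, "remote-id": "42"})
sample = str(len(mime)).encode() + b"\n" + mime + metadata

bytecount, mime_bytes, plist = split_emlx(io.BytesIO(sample))
message = email.message_from_bytes(mime_bytes)
print(bytecount, message["Subject"], plist["remote-id"])
```

Because the bytecount says exactly where the MIME part ends, no delimiter search is needed. This is also why `plist_only=True` can skip the message body with a single `seek(bytecount, 1)`.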