├── CHANGELOG.md ├── mputil ├── __init__.py └── map.py ├── LICENSE.txt ├── README.md ├── .gitignore ├── setup.py └── examples └── lazy_map-lazy_imap.ipynb /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Release Notes 2 | 3 | 4 | ### Version 0.1.1 5 | 6 | - Version bump to include the license file in the PyPI distribution 7 | 8 | 9 | ### Version 0.1.0 10 | 11 | - Initial release 12 | -------------------------------------------------------------------------------- /mputil/__init__.py: -------------------------------------------------------------------------------- 1 | # mputil -- Utility functions for 2 | # Python's multiprocessing standard library module 3 | # 4 | # Author: Sebastian Raschka 5 | # License: MIT 6 | # Code Repository: https://github.com/rasbt/mputil 7 | 8 | from .map import lazy_map 9 | from .map import lazy_imap 10 | 11 | __all__ = ['lazy_map', 'lazy_imap'] 12 | 13 | __version__ = '0.1.1' 14 | __author__ = "Sebastian Raschka " 15 | -------------------------------------------------------------------------------- /LICENSE.txt: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017 Sebastian Raschka 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 
14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | [![PyPI version](https://badge.fury.io/py/mputil.svg)](http://badge.fury.io/py/mputil) 2 | ![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg) 3 | ![License](https://img.shields.io/badge/license-MIT-blue.svg) 4 | 5 | # mputil 6 | 7 | Utility functions for Python's multiprocessing standard library module 8 | 9 | ## Documentation 10 | 11 | Mputil is (currently) a rather small package that provides functions for memory-efficient multi-processing, based on Python's `multiprocessing` standard library module. Mputil doesn't have full-blown documentation yet. However, you can find explanations and usage examples in the Jupyter Notebook that is referenced in the "Examples" section below. 
12 | 13 | ## Examples 14 | 15 | - [`lazy_map` and `lazy_imap`](https://github.com/rasbt/mputil/blob/master/examples/lazy_map-lazy_imap.ipynb) 16 | 17 | ## Installation 18 | 19 | The `mputil` package can be installed via `pip`: 20 | 21 | 22 | pip3 install mputil 23 | 24 | Alternatively, if you are using Anaconda/Miniconda, you can install `mputil` via the conda package manager from the conda-forge channel as follows: 25 | 26 | conda install mputil -c conda-forge 27 | 28 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # OS related temporary files 2 | .DS_Store 3 | 4 | # Byte-compiled / optimized / DLL files 5 | __pycache__/ 6 | *.py[cod] 7 | *$py.class 8 | 9 | # C extensions 10 | *.so 11 | 12 | # Distribution / packaging 13 | .Python 14 | env/ 15 | build/ 16 | develop-eggs/ 17 | dist/ 18 | downloads/ 19 | eggs/ 20 | .eggs/ 21 | lib/ 22 | lib64/ 23 | parts/ 24 | sdist/ 25 | var/ 26 | *.egg-info/ 27 | .installed.cfg 28 | *.egg 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .coverage 44 | .coverage.* 45 | .cache 46 | nosetests.xml 47 | coverage.xml 48 | *,cover 49 | .hypothesis/ 50 | 51 | # Translations 52 | *.mo 53 | *.pot 54 | 55 | # Django stuff: 56 | *.log 57 | local_settings.py 58 | 59 | # Flask stuff: 60 | instance/ 61 | .webassets-cache 62 | 63 | # Scrapy stuff: 64 | .scrapy 65 | 66 | # Sphinx documentation 67 | docs/_build/ 68 | 69 | # PyBuilder 70 | target/ 71 | 72 | # IPython Notebook 73 | .ipynb_checkpoints 74 | 75 | # pyenv 76 | .python-version 77 | 78 | # celery beat schedule file 79 | celerybeat-schedule 80 | 81 | # dotenv 82 | .env 83 | 84 | # virtualenv 85 | venv/ 86 | ENV/ 87 | 88 | # Spyder project settings 89 | .spyderproject 90 | 91 | # Rope project settings 92 | .ropeproject 93 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | # mputil -- Utility functions for 2 | # Python's multiprocessing standard library module 3 | # 4 | # Author: Sebastian Raschka 5 | # License: MIT 6 | # Code Repository: https://github.com/rasbt/mputil 7 | 8 | from setuptools import setup, find_packages 9 | 10 | 11 | def calculate_version(): 12 | initpy = open('mputil/__init__.py').read().split('\n') 13 | version = list(filter(lambda x: 14 | '__version__' in x, initpy))[0].split('\'')[1] 15 | return version 16 | 17 | 18 | package_version = calculate_version() 19 | 20 | setup(name='mputil', 21 | version=package_version, 22 | description="Utility functions for Python's multiprocessing module", 23 | author='Sebastian Raschka', 24 | author_email='mail@sebastianraschka.com', 25 | url='https://github.com/rasbt/mputil', 26 | license='MIT', 27 | zip_safe=True, 28 | packages=find_packages(), 29 | platforms='any', 30 | keywords=['multiprocessing'], 31 
| data_files=[("", ["LICENSE.txt"]), ("", ["CHANGELOG.md"])], 32 | classifiers=[ 33 | 'License :: OSI Approved :: MIT License', 34 | 'Development Status :: 5 - Production/Stable', 35 | 'Operating System :: Microsoft :: Windows', 36 | 'Operating System :: POSIX', 37 | 'Operating System :: Unix', 38 | 'Operating System :: MacOS', 39 | 'Programming Language :: Python :: 3.6', 40 | 'Topic :: Scientific/Engineering', 41 | ], 42 | long_description=""" 43 | mputil is a package that provides utility functions for 44 | Python's multiprocessing standard library module 45 | 46 | 47 | Contact 48 | ============= 49 | 50 | If you have any questions or comments about mputil, please feel 51 | free to contact me via 52 | eMail: mail@sebastianraschka.com 53 | or Twitter: https://twitter.com/rasbt 54 | 55 | This project is hosted at https://github.com/rasbt/mputil 56 | 57 | """) 58 | -------------------------------------------------------------------------------- /mputil/map.py: -------------------------------------------------------------------------------- 1 | # mputil -- Utility functions for 2 | # Python's multiprocessing standard library module 3 | # 4 | # Author: Sebastian Raschka 5 | # License: MIT 6 | # Code Repository: https://github.com/rasbt/mputil 7 | 8 | import multiprocessing as mp 9 | from itertools import islice 10 | 11 | 12 | def lazy_map(data_processor, data_generator, n_cpus=1, stepsize=None): 13 | """A variant of multiprocessing.Pool.map that supports lazy evaluation 14 | 15 | As with the regular multiprocessing.Pool.map, the processes are spawned off 16 | asynchronously while the results are returned in order. In contrast to 17 | multiprocessing.Pool.map, the iterator (here: data_generator) is not 18 | consumed at once but evaluated lazily which is useful if the iterator 19 | (for example, a generator) contains objects with a large memory footprint. 
20 | 21 | Parameters 22 | ========== 23 | data_processor : func 24 | A processing function that is applied to objects in `data_generator` 25 | 26 | data_generator : iterator or generator 27 | A python iterator or generator that yields objects to be fed into the 28 | `data_processor` function for processing. 29 | 30 | n_cpus=1 : int (default: 1) 31 | Number of processes to run in parallel. 32 | - If `n_cpus` > 0, the specified number of CPUs will be used. 33 | - If `n_cpus=0`, all available CPUs will be used. 34 | - If `n_cpus` < 0, all available CPUs minus `abs(n_cpus)` will be used. 35 | 36 | stepsize : int or None (default: None) 37 | The number of items to fetch from the iterator to pass on to the 38 | workers at a time. 39 | If `stepsize=None` (default), the stepsize will 40 | be set equal to `n_cpus`. 41 | 42 | Returns 43 | ========= 44 | list : A Python list containing the results returned 45 | by the `data_processor` function when called on 46 | all elements yielded by the `data_generator`, in 47 | the order in which they were yielded. Note that the last batch 48 | fetched from `data_generator` may contain fewer items if the 49 | number of elements is not evenly divisible by `stepsize`. 50 | """ 51 | if not n_cpus: 52 | n_cpus = mp.cpu_count() 53 | elif n_cpus < 0: 54 | n_cpus = mp.cpu_count() + n_cpus  # n_cpus is negative here 55 | 56 | if stepsize is None: 57 | stepsize = n_cpus 58 | 59 | results = [] 60 | 61 | with mp.Pool(processes=n_cpus) as p: 62 | while True: 63 | r = p.map(data_processor, islice(data_generator, stepsize)) 64 | if r: 65 | results.extend(r) 66 | else: 67 | break 68 | return results 69 | 70 | 71 | def lazy_imap(data_processor, data_generator, n_cpus=1, stepsize=None): 72 | """A variant of multiprocessing.Pool.imap that supports lazy evaluation 73 | 74 | As with the regular multiprocessing.Pool.imap, the processes are spawned 75 | off asynchronously while the results are returned in order. 
In contrast to 76 | multiprocessing.Pool.imap, the iterator (here: data_generator) is not 77 | consumed at once but evaluated lazily, which is useful if the iterator 78 | (for example, a generator) contains objects with a large memory footprint. 79 | 80 | Parameters 81 | ========== 82 | data_processor : func 83 | A processing function that is applied to objects in `data_generator` 84 | 85 | data_generator : iterator or generator 86 | A python iterator or generator that yields objects to be fed into the 87 | `data_processor` function for processing. 88 | 89 | n_cpus=1 : int (default: 1) 90 | Number of processes to run in parallel. 91 | - If `n_cpus` > 0, the specified number of CPUs will be used. 92 | - If `n_cpus=0`, all available CPUs will be used. 93 | - If `n_cpus` < 0, all available CPUs - `n_cpus` will be used. 94 | 95 | stepsize : int or None (default: None) 96 | The number of items to fetch from the iterator to pass on to the 97 | workers at a time. 98 | If `stepsize=None` (default), the stepsize will 99 | be set equal to `n_cpus`. 100 | 101 | Yields 102 | ====== 103 | list : A Python list containing the *n* results returned 104 | by the `data_processor` function when called on 105 | the next *n* elements yielded by the `data_generator`, in 106 | input order; *n* is equal to `stepsize` (or to `n_cpus` if `stepsize` 107 | is None). The last list may contain fewer than *n* items. 
108 | """ 109 | if not n_cpus: 110 | n_cpus = mp.cpu_count() 111 | elif n_cpus < 0: 112 | n_cpus = mp.cpu_count() + n_cpus  # n_cpus is negative here 113 | 114 | if stepsize is None: 115 | stepsize = n_cpus 116 | 117 | with mp.Pool(processes=n_cpus) as p: 118 | while True: 119 | r = p.map(data_processor, islice(data_generator, stepsize)) 120 | if r: 121 | yield r 122 | else: 123 | break 124 | -------------------------------------------------------------------------------- /examples/lazy_map-lazy_imap.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "`mputil` -- Utility functions for Python's multiprocessing standard library module\n", 8 | "\n", 9 | "- Author: Sebastian Raschka \n", 10 | "- License: MIT\n", 11 | "- Code Repository: https://github.com/rasbt/mputil" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "# `lazy_map` and `lazy_imap` examples" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "`lazy_map` and `lazy_imap` are wrappers of the `map` function in Python's [`multiprocessing`](https://docs.python.org/3.6/library/multiprocessing.html) module. These wrappers evaluate the \"iterator\" lazily (in contrast to `map` and `imap`), which can be desirable if the iterator or generator yields objects with large memory footprints. Note that the syntax and use of `lazy_map` and `lazy_imap` do not exactly mimic their respective `map` and `imap` counterparts." 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## `lazy_map`" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "The `lazy_map` function requires a `data_processor` function as input as well as a `data_generator`. 
The `data_processor` is a function that performs a desired computation on each of the elements of an iterator (`data_generator`). This iterator is typically a Python generator that yields arbitrary objects." 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 1, 45 | "metadata": { 46 | "collapsed": true 47 | }, 48 | "outputs": [], 49 | "source": [ 50 | "def my_data_processor(x):\n", 51 | " # some expensive computation\n", 52 | " return x\n", 53 | "\n", 54 | "def my_data_generator():\n", 55 | " for i in range(20):\n", 56 | " yield i\n", 57 | " \n", 58 | "# think of `list(my_data_generator())`\n", 59 | "# as too large to fit into memory, which is why\n", 60 | "# we don't want to use map or imap" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "The `lazy_map` function then applies the `data_processor` function to a generator and returns a list containing the values returned by the `data_processor` in sorted order as shown in the example below:" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 2, 73 | "metadata": {}, 74 | "outputs": [ 75 | { 76 | "name": "stdout", 77 | "output_type": "stream", 78 | "text": [ 79 | "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "from mputil import lazy_map\n", 85 | "\n", 86 | "gen = my_data_generator()\n", 87 | "print(lazy_map(data_processor=my_data_processor, \n", 88 | " data_generator=gen, \n", 89 | " n_cpus=0))" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "In the example above, `n_cpus` specifies the number of CPUs being used.\n", 97 | "\n", 98 | " - If `n_cpus` > 0, the specified number of CPUs will be used.\n", 99 | " - If `n_cpus=0`, all available CPUs will be used.\n", 100 | " - If `n_cpus` < 0, all available CPUs - `n_cpus` will be used." 
101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "## `lazy_imap`" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "The `lazy_imap` generator is similar to the `lazy_map` function, but the results are returned in \"chunks\" (in sorted order), which can be useful if the result list itself is too large to fit into memory. As in `lazy_map`, the \"iterator\" (here: `data_generator`) is also evaluated lazily. The example below demonstrates the use of `lazy_imap`:" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 1, 120 | "metadata": {}, 121 | "outputs": [ 122 | { 123 | "name": "stdout", 124 | "output_type": "stream", 125 | "text": [ 126 | "[0, 1, 2, 3]\n", 127 | "[4, 5, 6, 7]\n", 128 | "[8, 9, 10, 11]\n", 129 | "[12, 13, 14, 15]\n", 130 | "[16, 17, 18, 19]\n", 131 | "[20, 21]\n" 132 | ] 133 | } 134 | ], 135 | "source": [ 136 | "from mputil import lazy_imap\n", 137 | "\n", 138 | "def my_data_processor(x):\n", 139 | " # some expensive computation\n", 140 | " return x\n", 141 | "\n", 142 | "def my_data_generator():\n", 143 | " for i in range(22):\n", 144 | " yield i\n", 145 | "\n", 146 | "gen = my_data_generator()\n", 147 | "\n", 148 | "for chunk in lazy_imap(data_processor=my_data_processor, \n", 149 | " data_generator=gen, \n", 150 | " n_cpus=0):\n", 151 | " print(chunk)" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "Note that the number of elements in each return-list is by default equal to the number of CPUs being used. (The example above was run on a machine with 4 CPUs, so each full list consists of 4 elements and the last list holds the remaining 2.)\n", 159 | "\n", 160 | "We can increase or decrease the number of elements in each return-list using the `stepsize` parameter; the `stepsize` determines how many values from the `data_generator` are fetched in one `lazy_imap` iteration. 
If the number of objects that can be fetched from `data_generator` is not evenly divisible by `stepsize`, the number of elements in the last result-list is smaller than `stepsize` as shown in the example below:" 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 2, 166 | "metadata": {}, 167 | "outputs": [ 168 | { 169 | "name": "stdout", 170 | "output_type": "stream", 171 | "text": [ 172 | "[0, 1, 2, 3, 4, 5]\n", 173 | "[6, 7, 8, 9, 10, 11]\n", 174 | "[12, 13, 14, 15, 16, 17]\n", 175 | "[18, 19, 20, 21]\n" 176 | ] 177 | } 178 | ], 179 | "source": [ 180 | "gen = my_data_generator()\n", 181 | "\n", 182 | "for chunk in lazy_imap(data_processor=my_data_processor, \n", 183 | " data_generator=gen,\n", 184 | " stepsize=6,\n", 185 | " n_cpus=0):\n", 186 | " print(chunk)" 187 | ] 188 | } 189 | ], 190 | "metadata": { 191 | "kernelspec": { 192 | "display_name": "Python 3", 193 | "language": "python", 194 | "name": "python3" 195 | }, 196 | "language_info": { 197 | "codemirror_mode": { 198 | "name": "ipython", 199 | "version": 3 200 | }, 201 | "file_extension": ".py", 202 | "mimetype": "text/x-python", 203 | "name": "python", 204 | "nbconvert_exporter": "python", 205 | "pygments_lexer": "ipython3", 206 | "version": "3.6.1" 207 | } 208 | }, 209 | "nbformat": 4, 210 | "nbformat_minor": 2 211 | } 212 | --------------------------------------------------------------------------------
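The batching behaviour that `lazy_map` and `lazy_imap` rely on can be sketched in isolation. The block below is a minimal illustration under stated assumptions, not the package's implementation: the names `lazy_chunked_map`, `tracked_numbers`, and `pulled` are hypothetical, and it uses `multiprocessing.pool.ThreadPool` in place of `mp.Pool` so that it runs anywhere without pickling or `__main__`-guard concerns. It shows that only `stepsize` items are drawn from the generator per iteration.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool


def lazy_chunked_map(func, iterable, n_workers=2, stepsize=None):
    """Yield lists of results, pulling only `stepsize` items at a time.

    Hypothetical helper for illustration; it swaps mp.Pool for ThreadPool.
    """
    if stepsize is None:
        stepsize = n_workers
    # islice must advance one shared iterator; calling iter() also makes
    # plain sequences (lists, ranges) safe to pass in.
    iterator = iter(iterable)
    with ThreadPool(processes=n_workers) as pool:
        while True:
            chunk = list(islice(iterator, stepsize))
            if not chunk:
                break
            yield pool.map(func, chunk)


pulled = []  # records which items have been drawn from the generator


def tracked_numbers(n):
    for i in range(n):
        pulled.append(i)
        yield i


chunks = lazy_chunked_map(lambda x: x * x, tracked_numbers(22),
                          n_workers=2, stepsize=6)
first = next(chunks)
print(first)        # [0, 1, 4, 9, 16, 25]
print(len(pulled))  # 6: only the first chunk was drawn from the generator
chunks.close()      # shuts the pool down without consuming the rest
```

Note the `iter(iterable)` call: if `islice` is applied repeatedly to a list instead of an iterator, it restarts from the first element every time, so the loop would never terminate. This is presumably why the docstrings above ask for an iterator or generator.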