├── CHANGELOG.md
├── mputil
│   ├── __init__.py
│   └── map.py
├── LICENSE.txt
├── README.md
├── .gitignore
├── setup.py
└── examples
    └── lazy_map-lazy_imap.ipynb


/CHANGELOG.md:
--------------------------------------------------------------------------------
 1 | # Release Notes
 2 | 
 3 | 
 4 | ### Version 0.1.1
 5 | 
 6 | - Version bump to include the license file in the PyPI distribution
 7 | 
 8 | 
 9 | ### Version 0.1.0
10 | 
11 | - Initial release
12 | 


--------------------------------------------------------------------------------
/mputil/__init__.py:
--------------------------------------------------------------------------------
 1 | # mputil -- Utility functions for
 2 | # Python's multiprocessing standard library module
 3 | #
 4 | # Author: Sebastian Raschka <mail@sebastianraschka.com>
 5 | # License: MIT
 6 | # Code Repository: https://github.com/rasbt/mputil
 7 | 
 8 | from .map import lazy_map
 9 | from .map import lazy_imap
10 | 
11 | __all__ = ['lazy_map', 'lazy_imap']
12 | 
13 | __version__ = '0.1.1'
14 | __author__ = "Sebastian Raschka <mail@sebastianraschka.com>"
15 | 


--------------------------------------------------------------------------------
/LICENSE.txt:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2017 Sebastian Raschka
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | [![PyPI version](https://badge.fury.io/py/mputil.svg)](http://badge.fury.io/py/mputil)
 2 | ![Python 3.6](https://img.shields.io/badge/python-3.6-blue.svg)
 3 | ![License](https://img.shields.io/badge/license-MIT-blue.svg)
 4 | 
 5 | # mputil
 6 | 
 7 | Utility functions for Python's multiprocessing standard library module
 8 | 
 9 | ## Documentation
10 | 
11 | Mputil is (currently) a rather small package that provides functions for memory-efficient multiprocessing, based on Python's `multiprocessing` standard library module. Mputil doesn't have full-blown documentation yet. However, you can find explanations and usage examples in the Jupyter Notebook that is referenced in the "Examples" section below.
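A minimal sketch of the idea behind these functions (illustrative only, not mputil's source): `multiprocessing.Pool.map` converts a generator into a list before dispatching any work, whereas feeding it through `itertools.islice` pulls only `stepsize` items at a time. To keep the sketch portable, it uses `multiprocessing.pool.ThreadPool`, which offers the same `map` interface but runs threads, so no pickling or process start-up is involved. The helpers `counting_generator` and `double` are hypothetical names chosen for this example.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool


def counting_generator(n, pulled):
    """Yield 0..n-1, appending each item to `pulled` as it is consumed."""
    for i in range(n):
        pulled.append(i)
        yield i


def double(x):
    # stand-in for an expensive per-item computation
    return 2 * x


if __name__ == "__main__":
    with ThreadPool(processes=2) as pool:
        # Pool.map materializes its input first: all 1000 items are pulled.
        pulled = []
        pool.map(double, counting_generator(1000, pulled))
        print(len(pulled))  # 1000

        # islice hands the pool only `stepsize` (here 4) items at a time.
        pulled = []
        gen = counting_generator(1000, pulled)
        print(pool.map(double, islice(gen, 4)), len(pulled))  # [0, 2, 4, 6] 4
```

With `multiprocessing.Pool` in place of `ThreadPool`, the `if __name__ == "__main__":` guard becomes necessary on platforms that start worker processes with `spawn`, such as Windows and macOS.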
12 | 
13 | ## Examples
14 | 
15 | - [`lazy_map` and `lazy_imap`](https://github.com/rasbt/mputil/blob/master/examples/lazy_map-lazy_imap.ipynb)
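The chunk-by-chunk behavior of `lazy_imap` can likewise be sketched as a small generator. `chunked_imap` below is a hypothetical stand-in written for illustration, not mputil's implementation. It uses `multiprocessing.pool.ThreadPool` for portability; for CPU-bound work, one would swap in `multiprocessing.Pool` and guard the calling code with `if __name__ == "__main__":`.

```python
from itertools import islice
from multiprocessing.pool import ThreadPool


def chunked_imap(func, iterable, n_workers=2, stepsize=None):
    """Yield lists of func(item) results, at most `stepsize` items per list.

    Only one chunk of the input is pulled from `iterable` at a time, and
    the results come back in input order.
    """
    if stepsize is None:
        stepsize = n_workers
    iterator = iter(iterable)  # shared iterator so each islice resumes
    with ThreadPool(processes=n_workers) as pool:
        while True:
            chunk = pool.map(func, islice(iterator, stepsize))
            if not chunk:  # input exhausted
                break
            yield chunk


def square(x):
    return x * x


if __name__ == "__main__":
    for chunk in chunked_imap(square, range(10), n_workers=2, stepsize=4):
        print(chunk)
    # [0, 1, 4, 9]
    # [16, 25, 36, 49]
    # [64, 81]
```

The `iter()` call matters: `islice` applied directly to a list would restart from the first element on every pass, so the loop would never terminate.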
16 | 
17 | ## Installation
18 | 
19 | The `mputil` package can be installed via `pip`:
20 | 
21 |     
22 |     pip3 install mputil
23 | 
24 | Alternatively, if you are using Anaconda/Miniconda, you can install `mputil` via the conda package manager from the conda-forge channel as follows:
25 | 
26 |     conda install mputil -c conda-forge
27 | 
28 | 


--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
 1 | # OS related temporary files
 2 | .DS_Store
 3 | 
 4 | # Byte-compiled / optimized / DLL files
 5 | __pycache__/
 6 | *.py[cod]
 7 | *$py.class
 8 | 
 9 | # C extensions
10 | *.so
11 | 
12 | # Distribution / packaging
13 | .Python
14 | env/
15 | build/
16 | develop-eggs/
17 | dist/
18 | downloads/
19 | eggs/
20 | .eggs/
21 | lib/
22 | lib64/
23 | parts/
24 | sdist/
25 | var/
26 | *.egg-info/
27 | .installed.cfg
28 | *.egg
29 | 
30 | # PyInstaller
31 | #  Usually these files are written by a python script from a template
32 | #  before PyInstaller builds the exe, so as to inject date/other infos into it.
33 | *.manifest
34 | *.spec
35 | 
36 | # Installer logs
37 | pip-log.txt
38 | pip-delete-this-directory.txt
39 | 
40 | # Unit test / coverage reports
41 | htmlcov/
42 | .tox/
43 | .coverage
44 | .coverage.*
45 | .cache
46 | nosetests.xml
47 | coverage.xml
48 | *,cover
49 | .hypothesis/
50 | 
51 | # Translations
52 | *.mo
53 | *.pot
54 | 
55 | # Django stuff:
56 | *.log
57 | local_settings.py
58 | 
59 | # Flask stuff:
60 | instance/
61 | .webassets-cache
62 | 
63 | # Scrapy stuff:
64 | .scrapy
65 | 
66 | # Sphinx documentation
67 | docs/_build/
68 | 
69 | # PyBuilder
70 | target/
71 | 
72 | # IPython Notebook
73 | .ipynb_checkpoints
74 | 
75 | # pyenv
76 | .python-version
77 | 
78 | # celery beat schedule file
79 | celerybeat-schedule
80 | 
81 | # dotenv
82 | .env
83 | 
84 | # virtualenv
85 | venv/
86 | ENV/
87 | 
88 | # Spyder project settings
89 | .spyderproject
90 | 
91 | # Rope project settings
92 | .ropeproject
93 | 


--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
 1 | # mputil -- Utility functions for
 2 | # Python's multiprocessing standard library module
 3 | #
 4 | # Author: Sebastian Raschka <mail@sebastianraschka.com>
 5 | # License: MIT
 6 | # Code Repository: https://github.com/rasbt/mputil
 7 | 
 8 | from setuptools import setup, find_packages
 9 | 
10 | 
11 | def calculate_version():
12 |     with open('mputil/__init__.py') as f:
13 |         initpy = f.read().split('\n')
14 |     version = [x for x in initpy if '__version__' in x][0].split('\'')[1]
15 |     return version
16 | 
17 | 
18 | package_version = calculate_version()
19 | 
20 | setup(name='mputil',
21 |       version=package_version,
22 |       description="Utility functions for Python's multiprocessing module",
23 |       author='Sebastian Raschka',
24 |       author_email='mail@sebastianraschka.com',
25 |       url='https://github.com/rasbt/mputil',
26 |       license='MIT',
27 |       zip_safe=True,
28 |       packages=find_packages(),
29 |       platforms='any',
30 |       keywords=['multiprocessing'],
31 |       data_files=[("", ["LICENSE.txt"]), ("", ["CHANGELOG.md"])],
32 |       classifiers=[
33 |              'License :: OSI Approved :: MIT License',
34 |              'Development Status :: 5 - Production/Stable',
35 |              'Operating System :: Microsoft :: Windows',
36 |              'Operating System :: POSIX',
37 |              'Operating System :: Unix',
38 |              'Operating System :: MacOS',
39 |              'Programming Language :: Python :: 3.6',
40 |              'Topic :: Scientific/Engineering',
41 |       ],
42 |       long_description="""
43 | mputil is a package that provides utility functions for
44 | Python's multiprocessing standard library module
45 | 
46 | 
47 | Contact
48 | =============
49 | 
50 | If you have any questions or comments about mputil, please feel
51 | free to contact me via
52 | eMail: mail@sebastianraschka.com
53 | or Twitter: https://twitter.com/rasbt
54 | 
55 | This project is hosted at https://github.com/rasbt/mputil
56 | 
57 | """)
58 | 


--------------------------------------------------------------------------------
/mputil/map.py:
--------------------------------------------------------------------------------
  1 | # mputil -- Utility functions for
  2 | # Python's multiprocessing standard library module
  3 | #
  4 | # Author: Sebastian Raschka <mail@sebastianraschka.com>
  5 | # License: MIT
  6 | # Code Repository: https://github.com/rasbt/mputil
  7 | 
  8 | import multiprocessing as mp
  9 | from itertools import islice
 10 | 
 11 | 
 12 | def lazy_map(data_processor, data_generator, n_cpus=1, stepsize=None):
 13 |     """A variant of multiprocessing.Pool.map that supports lazy evaluation
 14 | 
 15 |     As with the regular multiprocessing.Pool.map, the processes are spawned off
 16 |     asynchronously while the results are returned in order. In contrast to
 17 |     multiprocessing.Pool.map, the iterator (here: data_generator) is not
 18 |     consumed at once but evaluated lazily, which is useful if the iterator
 19 |     (for example, a generator) contains objects with a large memory footprint.
 20 | 
 21 |     Parameters
 22 |     ==========
 23 |     data_processor : func
 24 |         A processing function that is applied to objects in `data_generator`
 25 | 
 26 |     data_generator : iterator or generator
 27 |         A Python iterator or generator that yields objects to be fed into the
 28 |         `data_processor` function for processing.
 29 | 
 30 |     n_cpus : int (default: 1)
 31 |         Number of processes to run in parallel.
 32 |         - If `n_cpus` > 0, the specified number of CPUs will be used.
 33 |         - If `n_cpus=0`, all available CPUs will be used.
 34 |         - If `n_cpus` < 0, all CPUs minus `abs(n_cpus)` (min. 1) will be used.
 35 | 
 36 |     stepsize : int or None (default: None)
 37 |         The number of items to fetch from the iterator to pass on to the
 38 |         workers at a time.
 39 |         If `stepsize=None` (default), the step size will
 40 |         be set equal to `n_cpus`.
 41 | 
 42 |     Returns
 43 |     =========
 44 |     list : A Python list containing the results returned by the
 45 |         `data_processor` function when called on all elements
 46 |         yielded by the `data_generator`, in the same order in which
 47 |         `data_generator` yields them.
 48 |     """
 49 |     if not n_cpus:
 50 |         n_cpus = mp.cpu_count()
 51 |     elif n_cpus < 0:
 52 |         # e.g., n_cpus=-1 on a machine with 4 CPUs -> 3 processes
 53 |         n_cpus = max(1, mp.cpu_count() + n_cpus)
 54 | 
 55 |     if stepsize is None:
 56 |         stepsize = n_cpus
 57 | 
 58 |     # iter() lets islice advance through lists/tuples instead of restarting
 59 |     data_generator = iter(data_generator)
 60 |     results = []
 61 |     with mp.Pool(processes=n_cpus) as p:
 62 |         while True:
 63 |             r = p.map(data_processor, islice(data_generator, stepsize))
 64 |             if r:
 65 |                 results.extend(r)
 66 |             else:
 67 |                 break
 68 |     return results
 69 | 
 70 | 
 71 | def lazy_imap(data_processor, data_generator, n_cpus=1, stepsize=None):
 72 |     """A variant of multiprocessing.Pool.imap that supports lazy evaluation
 73 | 
 74 |     As with the regular multiprocessing.Pool.imap, the processes are spawned
 75 |     off asynchronously while the results are returned in order. In contrast to
 76 |     multiprocessing.Pool.imap, the iterator (here: data_generator) is not
 77 |     consumed at once but evaluated lazily, which is useful if the iterator
 78 |     (for example, a generator) contains objects with a large memory footprint.
 79 | 
 80 |     Parameters
 81 |     ==========
 82 |     data_processor : func
 83 |         A processing function that is applied to objects in `data_generator`
 84 | 
 85 |     data_generator : iterator or generator
 86 |         A Python iterator or generator that yields objects to be fed into the
 87 |         `data_processor` function for processing.
 88 | 
 89 |     n_cpus : int (default: 1)
 90 |         Number of processes to run in parallel.
 91 |         - If `n_cpus` > 0, the specified number of CPUs will be used.
 92 |         - If `n_cpus=0`, all available CPUs will be used.
 93 |         - If `n_cpus` < 0, all CPUs minus `abs(n_cpus)` (min. 1) will be used.
 94 | 
 95 |     stepsize : int or None (default: None)
 96 |         The number of items to fetch from the iterator to pass on to the
 97 |         workers at a time.
 98 |         If `stepsize=None` (default), the step size will
 99 |         be set equal to `n_cpus`.
100 | 
101 |     Yields
102 |     =========
103 |     list : Up to *n* results returned by `data_processor`, in input
104 |         order, where *n* equals `stepsize` (or `n_cpus` if `stepsize` is
105 |         None); the last list may be shorter if the input does not divide evenly.
106 |     """
107 |     if not n_cpus:
108 |         n_cpus = mp.cpu_count()
109 |     elif n_cpus < 0:
110 |         n_cpus = max(1, mp.cpu_count() + n_cpus)
111 | 
112 |     if stepsize is None:
113 |         stepsize = n_cpus
114 | 
115 |     # iter() lets islice advance through lists/tuples instead of restarting
116 |     data_generator = iter(data_generator)
117 |     with mp.Pool(processes=n_cpus) as p:
118 |         while True:
119 |             r = p.map(data_processor, islice(data_generator, stepsize))
120 |             if r:
121 |                 yield r
122 |             else:
123 |                 break
124 | 


--------------------------------------------------------------------------------
/examples/lazy_map-lazy_imap.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "`mputil` -- Utility functions for Python's multiprocessing standard library module\n",
  8 |     "\n",
  9 |     "- Author: Sebastian Raschka <mail@sebastianraschka.com>\n",
 10 |     "- License: MIT\n",
 11 |     "- Code Repository: https://github.com/rasbt/mputil"
 12 |    ]
 13 |   },
 14 |   {
 15 |    "cell_type": "markdown",
 16 |    "metadata": {},
 17 |    "source": [
 18 |     "# `lazy_map` and `lazy_imap` examples"
 19 |    ]
 20 |   },
 21 |   {
 22 |    "cell_type": "markdown",
 23 |    "metadata": {},
 24 |    "source": [
 25 |     "`lazy_map` and `lazy_imap` are wrappers around the `Pool.map` method of Python's [`multiprocessing`](https://docs.python.org/3.6/library/multiprocessing.html) module. These wrappers evaluate the \"iterator\" lazily (in contrast to `Pool.map` and `Pool.imap`), which can be desirable if the iterator or generator yields objects with large memory footprints. Note that the syntax and use of `lazy_map` and `lazy_imap` do not exactly mimic their respective `map` and `imap` counterparts."
 26 |    ]
 27 |   },
 28 |   {
 29 |    "cell_type": "markdown",
 30 |    "metadata": {},
 31 |    "source": [
 32 |     "## `lazy_map`"
 33 |    ]
 34 |   },
 35 |   {
 36 |    "cell_type": "markdown",
 37 |    "metadata": {},
 38 |    "source": [
 39 |     "The `lazy_map` function requires a `data_processor` function as input as well as a `data_generator`. The `data_processor` is a function that performs a desired computation on each of the elements of an iterator (`data_generator`). This iterator is typically a Python generator that yields arbitrary objects."
 40 |    ]
 41 |   },
 42 |   {
 43 |    "cell_type": "code",
 44 |    "execution_count": 1,
 45 |    "metadata": {
 46 |     "collapsed": true
 47 |    },
 48 |    "outputs": [],
 49 |    "source": [
 50 |     "def my_data_processor(x):\n",
 51 |     "    # some expensive computation\n",
 52 |     "    return x\n",
 53 |     "\n",
 54 |     "def my_data_generator():\n",
 55 |     "    for i in range(20):\n",
 56 |     "        yield i\n",
 57 |     "    \n",
 58 |     "# think of `list(my_data_generator())`\n",
 59 |     "# as too large to fit into memory, which is why\n",
 60 |     "# we don't want to use map or imap"
 61 |    ]
 62 |   },
 63 |   {
 64 |    "cell_type": "markdown",
 65 |    "metadata": {},
 66 |    "source": [
 67 |     "The `lazy_map` function then applies the `data_processor` function to each element yielded by the generator and returns a list containing the values returned by the `data_processor`, in the same order as the input, as shown in the example below:"
 68 |    ]
 69 |   },
 70 |   {
 71 |    "cell_type": "code",
 72 |    "execution_count": 2,
 73 |    "metadata": {},
 74 |    "outputs": [
 75 |     {
 76 |      "name": "stdout",
 77 |      "output_type": "stream",
 78 |      "text": [
 79 |       "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]\n"
 80 |      ]
 81 |     }
 82 |    ],
 83 |    "source": [
 84 |     "from mputil import lazy_map\n",
 85 |     "\n",
 86 |     "gen = my_data_generator()\n",
 87 |     "print(lazy_map(data_processor=my_data_processor, \n",
 88 |     "               data_generator=gen, \n",
 89 |     "               n_cpus=0))"
 90 |    ]
 91 |   },
 92 |   {
 93 |    "cell_type": "markdown",
 94 |    "metadata": {},
 95 |    "source": [
 96 |     "In the example above, `n_cpus` specifies the number of CPUs being used.\n",
 97 |     "\n",
 98 |     "    - If `n_cpus` > 0, the specified number of CPUs will be used.\n",
 99 |     "    - If `n_cpus=0`, all available CPUs will be used.\n",
100 |     "    - If `n_cpus` < 0, all available CPUs - `n_cpus` will be used."
101 |    ]
102 |   },
103 |   {
104 |    "cell_type": "markdown",
105 |    "metadata": {},
106 |    "source": [
107 |     "## `lazy_imap`"
108 |    ]
109 |   },
110 |   {
111 |    "cell_type": "markdown",
112 |    "metadata": {},
113 |    "source": [
114 |     "The `lazy_imap` generator is similar to the `lazy_map` function, but the results are returned in \"chunks\" (in input order), which can be useful if the result list itself is too large to fit into memory. As in `lazy_map`, the \"iterator\" (here: `data_generator`) is evaluated lazily. The example below demonstrates the use of `lazy_imap`:"
115 |    ]
116 |   },
117 |   {
118 |    "cell_type": "code",
119 |    "execution_count": 1,
120 |    "metadata": {},
121 |    "outputs": [
122 |     {
123 |      "name": "stdout",
124 |      "output_type": "stream",
125 |      "text": [
126 |       "[0, 1, 2, 3]\n",
127 |       "[4, 5, 6, 7]\n",
128 |       "[8, 9, 10, 11]\n",
129 |       "[12, 13, 14, 15]\n",
130 |       "[16, 17, 18, 19]\n",
131 |       "[20, 21]\n"
132 |      ]
133 |     }
134 |    ],
135 |    "source": [
136 |     "from mputil import lazy_imap\n",
137 |     "\n",
138 |     "def my_data_processor(x):\n",
139 |     "    # some expensive computation\n",
140 |     "    return x\n",
141 |     "\n",
142 |     "def my_data_generator():\n",
143 |     "    for i in range(22):\n",
144 |     "        yield i\n",
145 |     "\n",
146 |     "gen = my_data_generator()\n",
147 |     "\n",
148 |     "for chunk in lazy_imap(data_processor=my_data_processor, \n",
149 |     "                       data_generator=gen, \n",
150 |     "                       n_cpus=0):\n",
151 |     "    print(chunk)"
152 |    ]
153 |   },
154 |   {
155 |    "cell_type": "markdown",
156 |    "metadata": {},
157 |    "source": [
158 |     "Note that, by default, the number of elements in each returned list is equal to the number of CPUs being used. (The example above was run on a machine with 4 CPUs; thus, each list consists of 4 elements, except for the last one.)\n",
159 |     "\n",
160 |     "We can increase or decrease the number of elements in each returned list using the `stepsize` parameter, which determines how many values are fetched from the `data_generator` in one `lazy_imap` iteration. If the number of objects that can be fetched from `data_generator` is not evenly divisible by `stepsize`, the last returned list contains fewer than `stepsize` elements, as shown in the example below:"
161 |    ]
162 |   },
163 |   {
164 |    "cell_type": "code",
165 |    "execution_count": 2,
166 |    "metadata": {},
167 |    "outputs": [
168 |     {
169 |      "name": "stdout",
170 |      "output_type": "stream",
171 |      "text": [
172 |       "[0, 1, 2, 3, 4, 5]\n",
173 |       "[6, 7, 8, 9, 10, 11]\n",
174 |       "[12, 13, 14, 15, 16, 17]\n",
175 |       "[18, 19, 20, 21]\n"
176 |      ]
177 |     }
178 |    ],
179 |    "source": [
180 |     "gen = my_data_generator()\n",
181 |     "\n",
182 |     "for chunk in lazy_imap(data_processor=my_data_processor, \n",
183 |     "                       data_generator=gen,\n",
184 |     "                       stepsize=6,\n",
185 |     "                       n_cpus=0):\n",
186 |     "    print(chunk)"
187 |    ]
188 |   }
189 |  ],
190 |  "metadata": {
191 |   "kernelspec": {
192 |    "display_name": "Python 3",
193 |    "language": "python",
194 |    "name": "python3"
195 |   },
196 |   "language_info": {
197 |    "codemirror_mode": {
198 |     "name": "ipython",
199 |     "version": 3
200 |    },
201 |    "file_extension": ".py",
202 |    "mimetype": "text/x-python",
203 |    "name": "python",
204 |    "nbconvert_exporter": "python",
205 |    "pygments_lexer": "ipython3",
206 |    "version": "3.6.1"
207 |   }
208 |  },
209 |  "nbformat": 4,
210 |  "nbformat_minor": 2
211 | }
212 | 


--------------------------------------------------------------------------------