├── Resources
│   └── Cover
│       ├── CoverArt.jpg
│       └── ConcurrentPythonCover.jpg
├── Xperimental
│   ├── getUrls.py
│   ├── AsyncioQueue.py
│   └── TkinterDemo.py
├── README.md
├── .gitignore
├── Chapters
│   ├── 03_Communicating_Sequential_Processes.md
│   ├── 01_Preface.md
│   ├── 00_Notes.md
│   └── 02_Introduction.md
└── events
    └── ConcurrentPythonDeveloperRetreat.html
/Resources/Cover/CoverArt.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BruceEckel/ConcurrentPython/HEAD/Resources/Cover/CoverArt.jpg
--------------------------------------------------------------------------------
/Resources/Cover/ConcurrentPythonCover.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BruceEckel/ConcurrentPython/HEAD/Resources/Cover/ConcurrentPythonCover.jpg
--------------------------------------------------------------------------------
/Xperimental/getUrls.py:
--------------------------------------------------------------------------------
1 | import asyncio
2 | import aiohttp
3 |
4 | async def fetch_page(url):
5 |     # Recent aiohttp requires a ClientSession; one session per fetch
6 |     # keeps this example minimal (real code would share a session).
7 |     async with aiohttp.ClientSession() as session:
8 |         async with session.get("http://" + url) as response:
9 |             assert response.status == 200
10 |             content = await response.read()
11 |             print('URL: {0}: Content: {1}'.format(url, content))
12 |
13 |
14 | loop = asyncio.get_event_loop()
15 | loop.run_until_complete(asyncio.wait([
16 |     asyncio.ensure_future(fetch_page(url))
17 |     for url in ['google.com', 'cnn.com', 'twitter.com']
18 | ]))
19 | loop.close()
20 |
--------------------------------------------------------------------------------
/Xperimental/AsyncioQueue.py:
--------------------------------------------------------------------------------
1 | import asyncio
2 | import random
3 |
4 | async def do_work(task_name, work_queue):
5 |     while not work_queue.empty():
6 |         queue_item = await work_queue.get()
7 |         print('{0} got: {1}'.format(task_name, queue_item))
8 |         await asyncio.sleep(random.random())
9 |
10 | def execute(tasks):
11 |     loop = asyncio.get_event_loop()
12 |     loop.run_until_complete(asyncio.wait(tasks))
13 |     loop.close()
14 |
15 | if __name__ == "__main__":
16 |     q = asyncio.Queue()
17 |
18 |     for i in range(20):
19 |         q.put_nowait(i)
20 |
21 |     print(q)
22 |
23 |     execute([asyncio.ensure_future(do_work('task' + str(t), q))
24 |              for t in range(3)])
25 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # ConcurrentPython
2 | An intermediate-to-advanced book on Python concurrency
3 |
4 | > This is a work in progress. It is being created in the open in order to gather feedback and
5 | > ultimately make it as accurate as possible.
6 |
7 | * The ebook version will remain free, although I haven't yet decided on the specific license.
8 |
9 | * This book is written using [Pandoc](http://pandoc.org/)-flavored Markdown. Thus, although you can read the chapters
10 | directly in the `Chapters` subdirectory of this repository, you will see various artifacts from Pandoc's Markdown
11 | extensions that don't render in GitHub Markdown. These are typically minor issues that don't greatly impact readability.
12 |
13 | * The epub build system is not yet in place.
14 |
--------------------------------------------------------------------------------
/Xperimental/TkinterDemo.py:
--------------------------------------------------------------------------------
1 | import tkinter as tk
2 |
3 | class Application(tk.Frame):
4 |     def __init__(self, master=None):
5 |         super().__init__(master)
6 |         self.pack()
7 |         self.create_widgets()
8 |
9 |     def create_widgets(self):
10 |         self.hi_there = tk.Button(self)
11 |         self.hi_there["text"] = "Hello World\n(click me)"
12 |         self.hi_there["command"] = self.say_hi
13 |         self.hi_there.pack(side="top")
14 |
15 |         self.quit = tk.Button(self, text="QUIT", fg="red",
16 |                               command=self.master.destroy)
17 |         self.quit.pack(side="bottom")
18 |
19 |     def say_hi(self):
20 |         print("hi there, everyone!")
21 |
22 | root = tk.Tk()
23 | app = Application(root)
24 | app.mainloop()
25 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Byte-compiled / optimized / DLL files
2 | __pycache__/
3 | *.py[cod]
4 | *$py.class
5 |
6 | # C extensions
7 | *.so
8 |
9 | # Distribution / packaging
10 | .Python
11 | env/
12 | build/
13 | develop-eggs/
14 | dist/
15 | downloads/
16 | eggs/
17 | .eggs/
18 | lib/
19 | lib64/
20 | parts/
21 | sdist/
22 | var/
23 | *.egg-info/
24 | .installed.cfg
25 | *.egg
26 |
27 | # PyInstaller
28 | # Usually these files are written by a python script from a template
29 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
30 | *.manifest
31 | *.spec
32 |
33 | # Installer logs
34 | pip-log.txt
35 | pip-delete-this-directory.txt
36 |
37 | # Unit test / coverage reports
38 | htmlcov/
39 | .tox/
40 | .coverage
41 | .coverage.*
42 | .cache
43 | nosetests.xml
44 | coverage.xml
45 | *,cover
46 | .hypothesis/
47 |
48 | # Translations
49 | *.mo
50 | *.pot
51 |
52 | # Django stuff:
53 | *.log
54 | local_settings.py
55 |
56 | # Flask stuff:
57 | instance/
58 | .webassets-cache
59 |
60 | # Scrapy stuff:
61 | .scrapy
62 |
63 | # Sphinx documentation
64 | docs/_build/
65 |
66 | # PyBuilder
67 | target/
68 |
69 | # IPython Notebook
70 | .ipynb_checkpoints
71 |
72 | # pyenv
73 | .python-version
74 |
75 | # celery beat schedule file
76 | celerybeat-schedule
77 |
78 | # dotenv
79 | .env
80 |
81 | # virtualenv
82 | venv/
83 | ENV/
84 |
85 | # Spyder project settings
86 | .spyderproject
87 |
88 | # Rope project settings
89 | .ropeproject
90 |
--------------------------------------------------------------------------------
/Chapters/03_Communicating_Sequential_Processes.md:
--------------------------------------------------------------------------------
1 | Communicating Sequential Processes
2 | ==================================
3 |
4 | The biggest problem in concurrency is that tasks can interfere with each other.
5 | There are certainly other problems, but this is the biggest. This interference
6 | generally appears in the form of two tasks attempting to read and write the
7 | same data storage. Because the tasks run independently, you can't know which
8 | one has modified the storage, so the data is effectively corrupt. This is the
9 | problem of *shared-memory concurrency*.
10 |
11 | You will see later in this book that there are concurrency strategies which
12 | attempt to solve the problem by locking the storage while one task is using
13 | it so other tasks are unable to read or write that storage. Although there is
14 | not yet a conclusive proof, some people believe that this dance is so tricky and
15 | complicated that it's impossible to write a correct program of any complexity
16 | using shared-memory concurrency.
17 |
18 | One solution to this problem is to altogether eliminate the possibility of
19 | shared storage. Each task is isolated, and the only way to communicate with
20 | other tasks is through controlled channels that safely pass data from one task
21 | to another. This is the general description of *communicating sequential
22 | processes* (CSP). The *sequential* term means that, within any process, you can
23 | effectively ignore the fact that you are working within a concurrent world and
24 | program as you normally do, sequentially from beginning to end. By defending you
25 | from shared-memory pitfalls, CSP allows you to think more simply about the
26 | problem you're solving.
27 |
28 | This book explores a number of strategies that implement CSP, but the easiest
29 | place to start is probably Python's built-in `multiprocessing` module. It turns
30 | out that `multiprocessing` is not a seamless implementation of CSP, but we
31 | shall start by pretending that it is and ignoring the "leaks" in the abstraction.
32 | This produces a fairly nice introduction to CSP, and later in the book we can
33 | address the leaks as necessary.
34 |
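Here's a minimal sketch of that style using `multiprocessing` (illustrative only; the book's worked examples come later): each worker runs in its own isolated process and communicates only through a `Queue`, which plays the role of the channel.

```python
# Minimal CSP-style sketch: isolated processes, communicating only
# through a Queue that serves as the channel.
from multiprocessing import Process, Queue

def worker(task_id, channel):
    # Each worker is written as ordinary sequential code.
    channel.put('result from task {0}'.format(task_id))

if __name__ == '__main__':
    channel = Queue()
    processes = [Process(target=worker, args=(n, channel)) for n in range(4)]
    for p in processes:
        p.start()
    for _ in processes:
        print(channel.get())   # receive one result per worker
    for p in processes:
        p.join()
```
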
--------------------------------------------------------------------------------
/events/ConcurrentPythonDeveloperRetreat.html:
--------------------------------------------------------------------------------
1 |
2 |
3 | Concurrent Python Developer Retreat — July 16-19 2017
4 |
5 |
6 | The Concurrent Python Developer Retreat
7 | =======================================
8 | ## Featuring Luciano Ramalho
9 | ### Author of Fluent Python
10 | #### Crested Butte, Colorado, July 16-19 2017
11 |
12 |
13 | ### The Retreat
14 |
15 | We'll be exploring topics and tools around my book project
16 | Concurrent Python.
17 | Although I've added material based on the *Concurrency* chapter from
18 | On Java 8,
19 | this book is still in its research and organization phase; I expect it to be a couple of
20 | years before I consider it finished. The electronic version of the book is meant to
21 | be publicly available during and after development. The book is written in (Pandoc-flavored)
22 | Markdown so you can read it directly in the Github repository.
23 |
24 | **Prerequisites**: The book is intended for readers who know Python but don't know anything about
25 | concurrency. Thus, for the retreat you should know Python, but I don't expect any concurrency knowledge.
26 |
27 | - Miguel Grinberg's [Asynchronous Python for the Complete Beginner](
28 | https://www.youtube.com/watch?v=iG6fr81xHKA) is a nice
29 | overview of some of the core concepts.
30 |
31 |
32 | ### Activities
33 |
34 | The following activities may occur as desired:
35 |
36 | - Hikes and other outdoor adventures
37 | - Meals at local restaurants
38 | - Barbeques, other evening activities
39 |
40 |
41 | ### Price
42 |
43 | - Minimum: $75
44 |
45 | - **Suggested (pay what you can):**
46 |
47 | - You’re paying: $250
48 |
49 | - Your Employer is paying: $500
50 |
51 |
52 | ### Location
53 |
54 | - Bruce Eckel’s living room, Crested Butte, CO (WiFi, TV Monitor w/ Chromecast, Printer/Scanner and
55 | Kitchen). We can comfortably fit 12 people.
56 |
57 | - For variety, we go to coffee shops and group houses.
58 |
59 | - If people are staying in the hostel, that has a nice community space as well.
60 |
61 | - “Townie” bikes can be rented during the event to more easily get around town.
62 |
63 | - **Note:** Although you can find lodging — even nice hotels — up on the mountain
64 | (called *Mount* Crested Butte, 3 miles away with a free bus running to town), it's much nicer
65 | to stay in town (just called "Crested Butte," no "Mount"), so try to do that if you can.
66 |
67 |
68 | ### Getting to Crested Butte
69 |
70 | Details here
71 |
72 |
73 | ### Lodging
74 |
75 | Details here
76 |
77 |
78 | ### Registration
79 |
80 | Add yourself to the mailing list
81 | (click on "subscribe to this group").
82 |
83 |
84 | ### Questions?
85 | Email
86 |
87 |
88 |
89 |
90 |
--------------------------------------------------------------------------------
/Chapters/01_Preface.md:
--------------------------------------------------------------------------------
1 | Preface
2 | =======
3 |
4 | > This is an intermediate-to-advanced book on concurrency in Python.
5 |
6 | The reader is assumed to already be fluent in the Python language. There are a
7 | large number and variety of resources available to achieve this fluency, many of
8 | them free.
9 |
10 | In addition, I use the latest version of Python available at this writing,
11 | version 3.6, which includes important concurrency features. If you are wedded to
12 | an earlier version of Python, this book might not be for you.
13 |
14 | That said, this book *is* an introduction to the concepts and applications of
15 | concurrency in Python, for a beginner *to those topics*. I start from scratch,
16 | assuming you've never heard the term before or any of the surrounding ideas.
17 | This is a very reasonable approach because concurrency is relatively orthogonal
18 | to the rest of programming and it's entirely possible to be an expert in
19 | everything about a language for years before exploring concurrency for the first
20 | time.
21 |
22 | A Strange but Pragmatic Approach
23 | --------------------------------
24 |
25 | Virtually every discussion of concurrency starts with the complex, low-level
26 | nuts and bolts and then slowly works up to the more elegant high-level
27 | solutions. This is understandable because it's how we teach everything else.
28 |
29 | Concurrency is a strange topic in just about every conceivable way. As you shall
30 | learn in this book, the *only* justification for wading into the complexity and
31 | tribulations of concurrency is to make your program run faster. And if that's
32 | truly what you need, you probably *don't* need to learn everything about
33 | concurrency. What you'd really like is the shortest, easiest path to a faster
34 | program.
35 |
36 | So that's how I organize the book. After explaining what concurrency is, we'll
37 | start with the simplest and easiest---and typically, the highest
38 | level---approaches to speeding up your program with concurrency. As the book
39 | progresses, the techniques will become lower-level, messier, and more
40 | complicated. The goal is that you'll go only as far as you need in order to
41 | solve your speed problem, and after that you'll only go further if you're
42 | interested, or when you come back later with a new problem.
43 |
44 | eBook First
45 | -----------
46 |
47 | This book is developed in the open, as an eBook. Although I haven't yet settled
48 | on a license, my intent is for the eBook to remain free.
49 |
50 | At the 2016 Pycon in Portland, I held an open-spaces session around this book,
51 | and also attended a dinner where we talked about it. In both cases people argued
52 | that, while they really liked the idea of a free eBook, eventually they want a
53 | print book. Once the eBook version has been out long enough to evolve and settle
54 | down, and also to get adequate feedback to eliminate errors, I will revisit the
55 | issue and consider creating a print version of the book.
56 |
57 | Business Model
58 | --------------
59 |
60 | My two most recent books, [Atomic Scala](http://www.AtomicScala.com) and [On Java
61 | 8](http://www.OnJava8.com), were not free. In the process of developing those books I
62 | became aware that a big part of that choice was that---while I enjoyed learning
63 | about those languages and writing those books---I ultimately didn't want to do
64 | further work in those areas, so there was no additional financial support
65 | (seminars, consulting, etc.) created from those books. They needed to be their
66 | own endpoints.
67 |
68 | The Python community has called to me ever since I discovered it. I have
69 | postponed that call for a couple of decades, with the excuse "let me just finish
70 | this one thing, then I'll come join you" (an all-too-familiar refrain heard by
71 | the partners of computer programmers). I evolve slowly, but I know now that the
72 | Python community is where I'm happiest.
73 |
74 | While the thought of struggling with the arbitrary limitations and roadblocks
75 | of, say, the Java language (for details, see [On Java 8](http://www.OnJava8.com)) in a
76 | training or consulting context fills me with angst, the vision of working with
77 | people who already know the productivity and delights of Python fills me with
78 | joy. That's where I want to live, and I want this book to lay the groundwork for
79 | conferences, developer retreats, consulting, training, and other experiences. I
80 | believe it will satisfy my need for fun and fulfillment.
81 |
82 | I could certainly have taken the approach I've used in my other books and
83 | created an introductory Python book, but there are a vast number of those which
84 | do an outstanding job. I don't see myself making a big contribution there.
85 | Instead, I chose something I think most folks experience as quite challenging.
86 | Over the decades, I have had periodic, intense bouts with concurrency. I've come
87 | away from each of these believing that, finally, I understand the topic. And
88 | each time I've discovered later there was some gaping hole in my knowledge. You
89 | will discover that concurrency consistently produces excellent examples of the
90 | [Dunning-Kruger
91 | Effect](https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect), where the
92 | less you know, the more you think you know. I know enough about concurrency to
93 | know that a surprising hole can *always* appear in my knowledge.
94 |
95 | Source Code
96 | -----------
97 |
98 | All the source code for this book is available as copyrighted freeware,
99 | distributed along with the sources for the book's text, via
100 | [Github](https://github.com/BruceEckel/ConcurrentPython). This is the official
101 | code distribution site; check there to make sure you have the most current version.
102 |
103 | Coding Standards
104 | ----------------
105 |
106 | In the text of this book, identifiers (keywords, methods, variables, and class
107 | names) are set in bold, fixed-width `code font`. Some keywords, such as `class`,
108 | are used so much that the bolding can become tedious. Those which are
109 | distinctive enough are left in normal font.
110 |
111 | The code files in this book are tested with an automated system, and should work
112 | without compiler errors (except those specifically tagged) in the latest version
113 | of Python.
114 |
115 | Bug Reports
116 | -----------
117 |
118 | No matter how many tools a writer uses to detect errors, some always creep in
119 | and these often leap off the page for a fresh reader. If you discover anything
120 | you believe to be an error, please submit the error along with your suggested
121 | correction, for either the book's prose or examples,
122 | [here](https://github.com/BruceEckel/ConcurrentPython/issues). Your help is
123 | appreciated.
124 |
125 | Mailing List
126 | ------------
127 |
128 | For news and notifications, you can subscribe to the low-volume email list at
129 | [www.ConcurrentPython.com](http://www.ConcurrentPython.com). I don't use ads and
130 | strive to make the content as appropriate as possible.
131 |
132 | Colophon
133 | --------
134 |
135 | This book was written with Pandoc-flavored Markdown, and produced into ePub
136 | version 3 format using [Pandoc](http://pandoc.org/).
137 |
138 | The body font is Georgia and the headline font is Verdana. The code font is
139 | Ubuntu Mono, because it is especially compact and allows more characters on a
140 | line without wrapping. I chose to place the code inline (rather than make
141 | listings into images, as I've seen some books do) because it is important to me
142 | that the reader be able to resize the font of the code listings when they resize
143 | the body font (otherwise, really, what's the point?).
144 |
145 | The build process for the book is automated, as well as the process to extract,
146 | compile and test the code examples. All automation is achieved through fairly
147 | extensive programs I wrote in Python 3.
148 |
149 | ### Cover Design
150 |
151 |
152 | Thanks
153 | ------
154 |
155 |
156 | Dedication
157 | ----------
158 |
159 |
--------------------------------------------------------------------------------
/Chapters/00_Notes.md:
--------------------------------------------------------------------------------
1 | General
2 | =======
3 |
4 | Concurrency: Taking a program that isn't running fast enough, breaking it into
5 | pieces, and "running those pieces separately." The what and how of "running separately"
6 | is where all the details and complexity lie for the various concurrency strategies.
7 |
8 | - On top of this, a small fraction of problems use some concurrency solutions as a structuring mechanism.
9 | Usually the driving force is "not fast enough" but sometimes (ironically) it can be
10 | "too complicated."
11 |
12 | The concept of whether something is synchronous refers to when a function
13 | finishes vs. when a function returns. In the vast majority of Python code a
14 | function returns when it finishes, so these two points are identical, that is,
15 | *synchronous*. But with an asynchronous call, the function returns control to
16 | the caller *before* the function finishes---typically much sooner. So the two
17 | events, returning and finishing, now happen at different points in time: they
18 | are *asynchronous*.
19 |
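Here's a minimal `asyncio` sketch of that distinction (illustrative only; not one of the book's tested examples):

```python
import asyncio

async def slow_task():
    # Control returns to the event loop here, long before the
    # function finishes.
    await asyncio.sleep(1)
    return "finished"

async def caller():
    task = asyncio.ensure_future(slow_task())  # schedules it; returns at once
    print("slow_task is running but has not finished")
    print(await task)                          # now wait for it to finish

loop = asyncio.get_event_loop()
loop.run_until_complete(caller())
loop.close()
```
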
20 | If tasks don't wait on each other, they are compute-intensive.
21 |
22 | Ideally, make tasks that don't block on other tasks (blocking on other tasks is deadlock-prone).
23 |
24 |
25 | ## For Study and Exploration
26 | > Feel free to pull-request something you think might be helpful here
27 |
28 | - Trio
29 | - [docs](https://trio.readthedocs.io/en/latest/)
30 | - [Difference between asyncio and curio](https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/)
31 | - [Docs](http://curio.readthedocs.io/en/latest/)
32 | - [Repo](https://github.com/dabeaz/curio)
33 |
34 | - General Concurrency
35 | - [Ted Leung](http://www.slideshare.net/twleung/a-survey-of-concurrency-constructs)
36 | - [The Art of Multiprocessor Programming, Revised Reprint](http://amzn.to/2j1oneL)
37 | - [A Helpful Reading List of some selected topics](https://github.com/python-trio/trio/wiki/Reading-list)
38 | - [Grok the GIL: Write Fast and Thread-Safe Python](https://www.youtube.com/watch?v=7SSYhuk5hmc&t=861s)
39 |
40 | - Actors:
41 | - [Thespian Docs](http://godaddy.github.io/Thespian/doc/)
42 | - [Thespian Motivation](https://engineering.godaddy.com/why-godaddy-built-an-actor-system-library/)
43 | - Can Ponylang interface with Python?
44 | - [dramatiq: Simple distributed task processing for Python 3](
45 | https://github.com/Bogdanp/dramatiq)
46 |
47 | - Async:
48 | - [Christian Medina](https://hackernoon.com/threaded-asynchronous-magic-and-how-to-wield-it-bba9ed602c32#.l8tws7nkv)
49 | - [PyMotw overview](https://pymotw.com/3/asyncio/index.html)
50 | - [Web Crawler, Guido et al.](http://aosabook.org/en/500L/a-web-crawler-with-asyncio-coroutines.html)
51 | - [Mike Bayer on Async](http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/)
52 | - [YouTube: async/await and asyncio in Python 3.6 and beyond](https://www.youtube.com/watch?v=2ZFFv-wZ8_g)
53 | - uvloop: faster replacement for the built-in asyncio event loop.
54 |
55 | - Communicating Sequential Processes (CSP)
56 | - [PyCSP](https://github.com/runefriborg/pycsp/wiki)
57 | - [PyCSP Slides](http://arild.github.io/csp-presentation/#1)
58 |
59 | - [Join Calculus](https://en.wikipedia.org/wiki/Join-calculus)
60 | - [Chymyst — declarative concurrency in Scala](https://github.com/Chymyst/chymyst-core/) -- good resource for
61 | ideas and examples.
62 | - [Join-calculus for Python](https://github.com/maandree/join-python)
63 |
64 | - AioHTTP:
65 | - [Docs](http://aiohttp.readthedocs.io/en/stable/)
66 | - [An Intro](http://stackabuse.com/python-async-await-tutorial/)
67 |
68 | - Remote Objects
69 | - [Pyro4](https://pythonhosted.org/Pyro4/) Mature Python remote object implementation.
70 |
71 | - Calling Go from Python (example of taking advantage of another language's concurrency model)
72 | - [Example](https://github.com/jbuberel/buildmodeshared/tree/master/gofrompython)
73 | (For windows it would have to be done in the bash shell but that might be OK)
74 | - RabbitMQ allows cross-language calls.
75 |
76 | - An example: Each task controls a pixel/block on the screen. Produces a visual,
77 | graphic idea of what's going on.
78 | - Which graphics library to use?
79 | - [Processing](http://py.processing.org/)
80 | - [Tkinter](http://stackoverflow.com/questions/4842156/manipulating-individual-pixel-colors-in-the-tkinter-canvas-widget)
81 |
82 | - Asyncio example: Webcomic Reader
83 | - Text-file database keeps track of last comic read
84 | - Each comic is simply opened in a browser tab
85 |
86 | - [DASK parallel computing library for analytic computing](https://dask.pydata.org)
87 |
88 | - [Pachyderm Pipeline System](http://docs.pachyderm.io/en/latest/reference/pachyderm_pipeline_system.html)
89 | - [Slack channel]( http://slack.pachyderm.io/)
90 | - "Our official release for the CLI are for OSX and Linux as of now. However, we do have windows users that work with Pachyderm via the new Linux subsystem in Windows 10. Also, the CLI is only one choice for interacting with Pachyderm. You can also use the Python, Go, or other client, which should work just fine on Windows."
91 |
92 | - Queue-based Concurrency
93 | - Celery on Rabbit MQ
94 |
95 | - Misc
96 | - Bridge between Python and Java: https://www.py4j.org/
97 | - Cython for speed, making it easier: https://github.com/AlanCristhian/statically
98 | - Compiler: https://nuitka.net/
99 |
100 | What Confuses You About Concurrency?
101 | ====================================
102 | (Notes from an open-spaces session at Pycon 2017)
103 |
104 | - I don't want to think about it.
105 | - What about testing? New different kinds of failure modes introduced by concurrency.
106 | - Making sure an event is handled.
107 | - Martin Fowler's recent Youtube presentation on event-driven programming.
108 | - Twelve-factor application (Heroku came up with this term and has the list) to make things easily scalable
109 | - How to write things that can easily be scaled without being a hassle
110 | - Does async and await preclude gevent, twisted, etc.
111 | - How do I write code/libraries compliant with async and await?
112 |
113 | Tools
114 | =====
115 |
116 | - Pipenv
117 | - Codeclimate
118 |
119 |
120 | Miscellaneous
121 | =============
122 |
123 | > Many of these were collected for a general Python book, not necessarily for concurrency
124 |
125 | - [Nice intro to Python debugging](https://blog.sentry.io/2017/06/22/debugging-python-errors)
126 |
127 | - Code style checkers: flake8 and hacking
128 |
129 | - ptpython: better python REPL
130 |
131 | - logging instead of print()
132 |
133 | - tabulate and texttable for generating text tables.
134 |
135 | - Peewee ORM as database example, data storage
136 |
137 | - Debuggers:
138 | PuDB
139 | WDB web debugger
140 |
141 | - Testing:
142 | Behave, Gherkin, assertpy, hypothesis, robotframework
143 | https://testrun.org/tox/latest/
144 |
145 | - matplotlib
146 |
147 | - tqdm progress bar
148 |
149 | - Simple object serialization: http://marshmallow.readthedocs.org/en/latest/
150 |
151 | - Auto-reformatters:
152 | https://github.com/google/yapf
153 |
154 | - List example:
155 | things_i_love = ["meaningful examples", "irony", "lists"]
156 |
157 | - "Fight with the problem, not with the language"
158 |
159 | - http://www.pythontutor.com/
160 |
161 | - Python 3 special but basic features:
162 | https://asmeurer.github.io/python3-presentation/slides.html
163 |
164 | - PUDB debugger (screen text based)
165 |
166 | - Formatting: https://pypi.python.org/pypi/autopep8/
167 |
168 | - Design by contract: http://andreacensi.github.io/contracts/
169 |
170 | - dbm for databases ("Storing data")
171 |
172 | - Interacting with C? (cffi)
173 |
174 | - Requests library for HTTP http://docs.python-requests.org/en/latest/
175 |
176 | - https://testrun.org/tox/latest/
177 |
178 | - Beck Design Rules: http://martinfowler.com/bliki/BeckDesignRules.html
179 |
180 | - Bound Inner Classes: http://code.activestate.com/recipes/577070-bound-inner-classes/
181 |
182 | - Message broker:
183 | http://www.rabbitmq.com/tutorials/tutorial-one-python.html
184 |
185 | - Download, install, configure:
186 | https://github.com/princebot/pythonize
187 |
188 | - Functional programming:
189 | http://toolz.readthedocs.org/en/latest/api.html
190 |
191 | - Things that should be builtins?
192 | https://boltons.readthedocs.org/en/latest/
193 |
194 | - Best Python Quotes (for text examples):
195 | http://www.telegraph.co.uk/comedy/comedians/monty-python-s-25-funniest-quotes/life-of-brian-tedious-prophet/
196 |
197 | - Single interface to environment variables’, configuration files’, and command line arguments’ provided values:
198 | https://pypi.python.org/pypi/crumbs
199 |
200 | - "Hidden" features:
201 | http://stackoverflow.com/questions/101268/hidden-features-of-python
202 |
--------------------------------------------------------------------------------
/Chapters/02_Introduction.md:
--------------------------------------------------------------------------------
1 | Introduction
2 | ============
3 |
4 | > "Double, double toil and trouble; Fire burn and caldron bubble."---William
5 | > Shakespeare, *Macbeth*
6 |
7 | The only justification for concurrency is if your program doesn't run fast
8 | enough. There are a few languages designed to make concurrency relatively
9 | effortless---at least, their particular flavor of concurrency, which might or
10 | might not fit your needs---but these are not yet the most popular programming
11 | languages. Python does what it can to make concurrency "Pythonic," but you must
12 | still work within the limitations of a language that wasn't designed around
13 | concurrency.
14 |
15 | You can be thoroughly fluent in Python and know little to nothing about
16 | concurrency. Indeed, for this book I expect those exact credentials. This means,
17 | however, that diving into concurrency is a test of patience. You, a competent
18 | programmer, must suffer the indignity of being thrown back to "beginner" status,
19 | to learn new fundamental concepts (when, by this time, you thought you were
20 | an accomplished programmer).
21 |
22 | I say all this to give you one last chance to rethink your strategy and consider
23 | whether there might be some other way to make your program run faster. There are
24 | many simpler approaches:
25 |
26 | * Faster hardware is probably much cheaper than programmer time.
27 |
28 | * Have you upgraded to the latest version of Python? That often produces
29 | speed improvements.
30 |
31 | * Use a profiler to discover your speed bottleneck, and see if you can change
32 | an algorithm. For example, sometimes the choice of data structure makes all
33 | the difference. (A minimal profiling sketch follows this list.)
34 |
35 | * Try using [Cython](http://cython.org/) on functions with performance bottlenecks.
36 |
37 | * See if your program will run on [PyPy](http://pypy.org/).
38 |
39 | * If your bottlenecks are math-intensive, consider [Numba](http://numba.pydata.org/).
40 |
41 | * Search the Internet for other performance approaches.
42 |
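Here's a minimal sketch of the profiler suggestion above, using the standard library's `cProfile` and `pstats` (the function names are placeholders, not code from this book):

```python
# Profile a run and show the ten most expensive calls by cumulative time.
import cProfile
import pstats

def suspected_bottleneck():
    return sum(i * i for i in range(5000000))

def main():
    for _ in range(3):
        suspected_bottleneck()

cProfile.run('main()', 'profile.stats')
pstats.Stats('profile.stats').sort_stats('cumulative').print_stats(10)
```
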
43 | If you jump right to concurrency without exploring these other approaches first,
44 | you might still discover that your problem is dominated by one of these issues
45 | and you must use them anyway, after a complicated struggle with concurrency. You
46 | might also need to use them in addition to a concurrency approach.
47 |
48 | What Does the Term *Concurrency* Mean?
49 | --------------------------------------
50 |
51 | As a non-concurrent programmer, you think in linear terms: A program runs from
52 | beginning to end, performing all its intermediate steps in sequence. This is the
53 | easiest way to think about programming.
54 |
55 | Concurrency breaks a program into pieces, typically called *tasks*. As much as
56 | possible, these tasks run independently of each other, with the hope that the
57 | whole program runs faster.
58 |
59 | That's concurrency in a nutshell: Independently-running tasks.
60 |
61 | At this point your mind should be filled with questions:
62 |
63 | * How do I start a task?
64 |
65 | * How do I stop a task?
66 |
67 | * How do I get information into a task?
68 |
69 | * How do I get results from a task?
70 |
71 | * What mechanism drives the tasks?
72 |
73 | * How do tasks fail?
74 |
75 | * Can tasks communicate with each other?
76 |
77 | * What happens if two tasks try to use the same piece of storage?
78 |
79 | These are not naive---they are precisely the questions you must ask to
80 | understand concurrency. There is no one answer to any of them. In fact, these
81 | questions distinguish the different concurrency strategies. In addition, each
82 | strategy produces a different set of rules governing how you write concurrent
83 | code. Different strategies also have domains of application where they shine,
84 | and other domains where they don't provide much benefit, or can even produce
85 | slower results.
86 |
87 | The term "concurrency" is often defined inconsistently in the literature. One
88 | of the more common distinctions declares concurrency to be when all tasks are
89 | driven by a single processor, vs *parallelism* where tasks are distributed
90 | among multiple processors. There are (mostly historical) reasons for this
91 | difference, but in this book I treat "the number of processors driving the
92 | tasks" as just one of the many variables involved in the general problem of
93 | concurrency. Concurrency doesn't increase the number of CPU cycles you have
94 | available---it tries to use them better.
95 |
96 | Concurrency is initially overwhelming precisely because it is a general goal
97 | ("make a program faster using tasks") with myriad strategies to achieve that
98 | goal (and more strategies regularly appear). The overwhelm diminishes when you
99 | understand it from the perspective of different competing strategies for the
100 | same problem.
101 |
102 | This book takes the pragmatic approach of only giving you what you need to solve
103 | your problem, presenting the simplest strategies first whenever possible. It's
104 | exceptionally difficult to understand *everything* about concurrency, so
105 | requiring that you do so in order to implement the simplest approach necessary
106 | to solve your problem is unreasonable and impractical.
107 |
108 | Each strategy has strengths and weaknesses. Typically, one strategy might solve
109 | some classes of problems quite well, while being relatively ineffective for
110 | other types of problems. Much of the art and craft of concurrency comes from
111 | understanding the different strategies and knowing which ones give better
112 | results for a particular set of constraints.
113 |
114 | After this introductory chapter, we'll start by exploring the concept of
115 | *Communicating Sequential Processes* (CSP), which confines each task inside its
116 | own world where it can't accidentally cause trouble for other tasks. Information
117 | is passed in and out through concurrency-safe *channels*.
118 |
119 | The CSP model is implemented within a number of concurrency strategies. We'll start
120 | by looking at the `multiprocessing` module that's part of the standard Python
121 | distribution.
122 |
123 | Next, we'll look at the idea of a *message queue*, which solves a specific but
124 | common type of problem ... We'll explore `RabbitMQ` and the `Celery` library
125 | which wraps it to make it more Pythonic.
126 |
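For a taste of what that looks like, here's a minimal Celery sketch (the module name, broker URL, and task are illustrative assumptions, not examples from this book):

```python
# tasks.py -- a minimal Celery sketch. Assumes a RabbitMQ broker is
# running locally; the names and broker URL are illustrative.
from celery import Celery

app = Celery('tasks', broker='amqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y
```

Once a worker is started with `celery -A tasks worker`, a client process can call `add.delay(2, 2)` and the call is routed through RabbitMQ to whichever worker is free.
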
127 | The next chapter explores how adding further constraints to CSP can produce
128 | significant benefits by learning about the *Actor Model*. ...
129 |
130 | The typical incentive for using multiple processors is when you have data that
131 | can be broken up into chunks and processed separately. There's a second common
132 | type of concurrency problem, which is when parts of your program spend time
133 | waiting on external operations---for example, Internet requests. In this
134 | situation it's not so much the number of processors you have but that they are
135 | stuck waiting on one thing when they might be working on something else. In
136 | fact, you can make much better use of a single processor by allowing it to
137 | jump from a place where it is waiting (*blocked*) to somewhere it can do some
138 | useful work. Python 3.6's `asyncio` module and coroutines target this exact
139 | problem, and we'll spend a chapter exploring this strategy.
140 |
141 | A *foreign function call interface* allows you to make calls to code written in
142 | other languages. We can take advantage of this by calling into languages that
143 | are specifically designed to make concurrency easy. ...
144 |
145 | Finally we'll look at one of the more primitive and early constructs, the
146 | *thread*, along with the rather heavy constraint of Python's *global interpreter
147 | lock* (GIL). With all the other, better strategies available it's not clear that
148 | you actually need to understand things at this level, but it's something you
149 | might need to know if someone asks the question "what about threads?"
150 |
151 | [Add other topics here as they arise]
152 |
153 | *******************************************************************************
154 |
155 | - What does "asynchronous" mean?
156 | * When a function finishes vs. when it returns
157 | * Again, not something you're used to thinking about
158 | * Because the answer has always been: "at the same time"
159 |
160 | > [NOTE] The following material is still in rough form.
161 |
162 | Concurrency Superpowers
163 | -----------------------
164 |
165 | Imagine you're inside a science-fiction movie. You must search a tall building
166 | for a single item that is carefully and cleverly hidden in one of the ten
167 | million rooms of the building. You enter the building and move down a corridor.
168 | The corridor divides.
169 |
170 | By yourself it will take a hundred lifetimes to accomplish this task.
171 |
172 | Now suppose you have a strange superpower. You can split yourself in two, and
173 | send one of yourself down one corridor while you continue down the other. Every
174 | time you encounter a divide in a corridor or a staircase to the next level, you
175 | repeat this splitting-in-two trick. Eventually there is one of you for every
176 | terminating corridor in the entire building.
177 |
178 | Each corridor contains a thousand rooms. Your superpower is getting stretched a
179 | little thin, so you only make 50 of yourself to search the rooms in parallel.
180 |
181 | I'd love to be able to say, "Your superpower in the science-fiction movie?
182 | That's what concurrency is." That it's as simple as splitting yourself in two
183 | every time you have more tasks to solve. The problem is that any model we use
184 | to describe this phenomenon ends up being a leaky abstraction.
185 |
186 | Here's one of those leaks: In an ideal world, every time you cloned yourself,
187 | you would also duplicate a hardware processor to run that clone. But of course
188 | that isn't what happens---you actually might have four or eight processors on
189 | your machine (typical when this was written). You might also have more, and
190 | there are still lots of situations where you have only one processor. In the
191 | abstraction under discussion, the way physical processors are allocated not only
192 | leaks through but can even dominate your decisions.
193 |
194 | Let's change something in our science-fiction movie. Now when each clone
195 | searcher eventually reaches a door they must knock on it and wait until someone
196 | answers. If we have one processor per searcher, this is no problem---the
197 | processor just idles until the door is answered. But if we only have eight
198 | processors and thousands of searchers, we don't want a processor to be idle
199 | just because a searcher happens to be blocked, waiting for a door to be
200 | answered. Instead, we want that processor applied to a searcher where it can do
201 | some real work, so we need mechanisms to switch processors from one task to
202 | another.
203 |
204 | Many models are able to effectively hide the number of processors and allow you
205 | to pretend you have a very large number. But there are situations where this
206 | breaks down, when you must know the number of processors so you can work
207 | around that number.
208 |
209 | One of the biggest impacts depends on whether you have a single processor or
210 | more than one. If you only have one processor, then the cost of task-switching
211 | is also borne by that processor, and applying concurrency techniques to your
212 | system can make it run *slower*.
213 |
214 | This might make you decide that, in the case of a single processor, it never
215 | makes sense to write concurrent code. However, there are situations where the
216 | *model* of concurrency produces much simpler code and it's actually worth having
217 | it run slower to achieve that.
218 |
219 | In the case of the clones knocking on doors and waiting, even the
220 | single-processor system benefits from concurrency because it can switch from a
221 | task that is waiting (*blocked*) to one that is ready to go. But if all the
222 | tasks can run all the time, then the cost of switching slows everything down,
223 | and in that case concurrency usually only makes sense if you *do* have multiple
224 | processors.
225 |
226 | Suppose you are trying to crack some kind of encryption. The more workers trying
227 | to crack it at the same time, the better chance you have of finding the answer
228 | sooner. Here, each worker can constantly use as much processor time as you can
229 | give it, and the best situation is when each worker has their own processor---in
230 | this case (called a *compute-bound* problem), you should write the code so you
231 | *only* have as many workers as you have processors.
232 |
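Here's a minimal sketch of that rule of thumb using the standard library's `multiprocessing.Pool` (the work function is a stand-in, not one of the book's tested examples):

```python
# One worker per processor for compute-bound work.
import os
from multiprocessing import Pool

def attempt(start):
    # Stand-in for CPU-heavy work, such as trying a block of keys.
    return sum(i * i for i in range(start, start + 1000000))

if __name__ == '__main__':
    workers = os.cpu_count()          # as many workers as processors
    with Pool(processes=workers) as pool:
        print(pool.map(attempt, range(workers)))
```
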
233 | In a customer-service department that takes phone calls, you only have a certain
234 | number of people, but you can have lots of phone calls. Those people (the
235 | processors) must work on one phone call at a time until it is complete, and
236 | extra calls must be queued.
237 |
238 | In the fairy tale of "The Shoemaker and the Elves," the shoemaker had too much
239 | work to do and when he was asleep, a group of elves came and made shoes for him.
240 | Here the work is distributed but even with a large number of physical processors
241 | the bottleneck is in the limitation of building certain parts of the shoe---if,
242 | for example, the sole takes the longest to make, that limits the rate of
243 | shoe creation and changes the way you design your solution.
244 |
245 | Thus, the problem you're trying to solve drives the design of the solution.
246 | There's the lovely abstraction of breaking a problem into subtasks that "run
247 | independently," then there's the reality of how it's actually going to happen.
248 | The physical reality keeps intruding upon, and shaking up, that abstraction.
249 |
250 | That's only part of the problem. Consider a factory that makes cakes. We've
251 | somehow distributed cake-making among workers, but now it's time for a
252 | worker to put their cake in a box. There's a box sitting there, ready to receive
253 | a cake. But before the worker can put the cake into the box, another worker
254 | darts in and puts *their* cake in the box instead! Our worker is already putting
255 | the cake in, and bam! The two cakes are smashed together and ruined. This is the
256 | common "shared memory" problem that produces what we call a *race condition*,
257 | where the result depends on which worker can get their cake in the box first
258 | (you typically solve the problem using a locking mechanism so one worker can
259 | grab the box first and prevent cake-smashing).
260 |
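Here's a tiny sketch of that locking idea using threads (illustrative only; not one of the book's tested examples):

```python
# Two workers race to fill one shared box; the Lock ensures only the
# first one in actually places a cake.
import threading

box = []                         # the shared resource
box_lock = threading.Lock()

def put_cake_in_box(worker):
    with box_lock:               # without the lock, the cakes can collide
        if not box:
            box.append('cake from worker {0}'.format(worker))

threads = [threading.Thread(target=put_cake_in_box, args=(n,))
           for n in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(box)                       # exactly one cake ends up in the box
```
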
261 | This problem occurs when tasks that execute "at the same time" interfere with
262 | each other. It can happen in such a subtle and occasional manner that it's
263 | probably fair to say that concurrency is "arguably deterministic but effectively
264 | nondeterministic." That is, you can hypothetically write concurrent programs
265 | that, through care and code inspection, work correctly. In practice, however,
266 | it's much more common to write concurrent programs that only appear to work, but
267 | given the right conditions, will fail. These conditions might never actually
268 | occur, or occur so infrequently you never see them during testing. In fact, it's
269 | often impossible to write test code to generate failure conditions for your
270 | concurrent program. The resulting failures might only occur occasionally, and as
271 | a result they appear in the form of customer complaints. This is one of the
272 | strongest arguments for studying concurrency: If you ignore it, you're likely to
273 | get bitten.
274 |
275 | Concurrency thus seems fraught with peril, and if that makes you a bit fearful,
276 | this is probably a good thing. Although Python makes large improvements in
277 | concurrency, there are still no safety nets to tell you when you make a mistake.
278 | With concurrency, you're on your own, and only by being knowledgeable,
279 | suspicious and aggressive can you write reliable concurrent code.
280 |
281 |
282 | Concurrency is for Speed
283 | ------------------------
284 |
285 | After hearing about the pitfalls of concurrent programming, you may rightly be
286 | wondering if it's worth the trouble. The answer is "no, unless your program
287 | isn't running fast enough." And you'll want to think carefully before deciding
288 | it isn't. Do not casually jump into the well of grief that is concurrent
289 | programming. If there's a way to run your program on a faster machine or if you
290 | can profile it and discover the bottleneck and swap in a faster algorithm, do
291 | that instead. Only if there's clearly no other choice should you begin using
292 | concurrency, and then only in isolated places.
293 |
294 | The speed issue sounds simple at first: If you want a program to run faster,
295 | break it into pieces and run each piece on a separate processor. With our
296 | ability to increase clock speeds running out of steam (at least for conventional
297 | chips), speed improvements are appearing in the form of multicore processors
298 | rather than faster chips. To make your programs run faster, you'll have to learn
299 | to take advantage of those extra processors, and that's one thing that
300 | concurrency gives you.
301 |
302 | If you have a multiprocessor machine, multiple tasks can be distributed across
303 | those processors, which can dramatically improve throughput. This is often the
304 | case with powerful multiprocessor Web servers, which can distribute large
305 | numbers of user requests across CPUs in a program that allocates one thread per
306 | request.
307 |
308 | However, concurrency can often improve the performance of programs running on a
309 | *single* processor. This can sound a bit counterintuitive. If you think about
310 | it, a concurrent program running on a single processor should actually have
311 | *more* overhead than if all the parts of the program ran sequentially, because
312 | of the added cost of the *context switch* (changing from one task to another).
313 | On the surface, it would appear cheaper to run all the parts of the program as a
314 | single task and save the cost of context switching.
315 |
316 | The issue that can make a difference is *blocking*. If one task in your program
317 | is unable to continue because of some condition outside of the control of the
318 | program (typically I/O), we say that the task or the thread *blocks* (in our
319 | science-fiction story, the clone has knocked on the door and is waiting for it
320 | to open). Without concurrency, the whole program comes to a stop until the
321 | external condition changes. If the program is written using concurrency,
322 | however, the other tasks in the program can continue to execute when one task is
323 | blocked, so the program continues to move forward. In fact, from a performance
324 | standpoint, it makes no sense to use concurrency on a single-processor machine
325 | unless one of the tasks might block.
326 |
327 | A common example of performance improvements in single-processor systems is
328 | *event-driven programming*, in particular user-interface programming. Consider a
329 | program that performs some long-running operation and thus ends up ignoring user
330 | input and being unresponsive. If you have a "quit" button, you don't want to
331 | poll it in every piece of code you write. This produces awkward code, without
332 | any guarantee that a programmer won't forget to perform the check. Without
333 | concurrency, the only way to produce a responsive user interface is for all
334 | tasks to periodically check for user input. By creating a separate task that
335 | responds to user input, the program guarantees a certain level of
336 | responsiveness.
337 |
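Here's a minimal sketch of that idea: a worker thread carries the long-running operation while the `tkinter` event loop keeps a "Quit" button responsive (the slow work is simulated with `sleep`; this is an illustration, not one of the book's tested examples):

```python
# The worker thread does the long-running operation while the tkinter
# event loop stays responsive to the Quit button.
import threading
import time
import tkinter as tk

def long_running_operation():
    for step in range(10):
        time.sleep(1)                  # simulated slow work
        print('finished step', step)

root = tk.Tk()
tk.Button(root, text="Quit", command=root.destroy).pack()
threading.Thread(target=long_running_operation, daemon=True).start()
root.mainloop()
```
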
338 | A straightforward way to implement concurrency is at the operating system level,
339 | using *processes*, which are different from threads. A process is a
340 | self-contained program running within its own address space. Processes are
341 | attractive because the operating system usually isolates one process from
342 | another so they cannot interfere with each other, which makes programming with
343 | processes relatively easy (this is supported by Python's `multiprocessing`
344 | module). In contrast, threads share resources like memory and I/O, so a
345 | fundamental difficulty in writing multithreaded programs is coordinating these
346 | resources between different thread-driven tasks, so they cannot be accessed by
347 | more than one task at a time.
348 |
349 | Some people go so far as to advocate processes as the only reasonable approach
350 | to concurrency,^[Eric Raymond, for example, makes a strong case in *The Art of
351 | UNIX Programming* (Addison-Wesley, 2004).] but unfortunately there are generally
352 | quantity and overhead limitations to processes that prevent their applicability
353 | across the concurrency spectrum. (Eventually you get used to the standard
354 | concurrency refrain, "That approach works in some cases but not in other cases").
355 |
356 | Some programming languages are designed to isolate concurrent tasks from each
357 | other. These are generally called *functional languages*, where each function
358 | call produces no side effects (and so cannot interfere with other functions) and
359 | can thus be driven as an independent task. *Erlang* (and its more modern
360 | variant, *Elixir*) is one such language, and it includes safe mechanisms for one
361 | task to communicate with another. If you find that a portion of your program
362 | must make heavy use of concurrency and you are running into excessive problems
363 | trying to build that portion, you might consider creating that part of your
364 | program in a dedicated concurrency language.
365 |
366 | Initially, Python took the more traditional approach of adding support for
367 | threading on top of a sequential language.^[It could be argued that trying to
368 | bolt concurrency onto a sequential language is a doomed approach, but you'll
369 | have to draw your own conclusions.] Instead of forking external processes in a
370 | multitasking operating system, threading creates tasks *within* the single
371 | process represented by the executing program. As we shall see, more modern
372 | approaches have since been added to Python.
373 |
374 | Concurrency imposes costs, including complexity costs, but can be outweighed by
375 | improvements in program design, resource balancing, and user convenience. In
376 | general, concurrency enables you to create a more loosely coupled design;
377 | otherwise, parts of your code would be forced to pay explicit attention to
378 | operations that would normally be handled by concurrency.
379 |
380 | The Four Maxims of Concurrency
381 | ------------------------------
382 |
383 | After grappling with concurrency over many years, I developed these four
384 | maxims:
385 |
386 | > 1. Don't do it
387 | > 2. Nothing is true and everything matters
388 | > 3. Just because it works doesn't mean it's not broken
389 | > 4. You must still understand it
390 |
391 | These apply to a number of languages. However, there do exist languages designed
392 | to prevent these issues.
393 |
394 | ### 1. Don't do it
395 |
396 | (And don't do it yourself).
397 |
398 | The easiest way to avoid entangling yourself in the profound problems produced
399 | by concurrency is not to do it. Although it can be seductive and seem safe
400 | enough to try something simple, the pitfalls are myriad and subtle. If you can
401 | avoid it, your life will be much easier.
402 |
403 | The *only* thing that justifies concurrency is speed. If your program isn't
404 | running fast enough---and be careful here, because just *wanting* it to run
405 | faster isn't justification---first apply a profiler to discover whether there's
406 | some other optimization you can perform.
407 |
408 | If you're compelled into concurrency, take the simplest, safest approach to the
409 | problem. Use well-known libraries and write as little of your own code as
410 | possible. With concurrency, there's no such thing as "too simple." Cleverness is
411 | your enemy.
412 |
413 | ### 2. Nothing is true and everything matters
414 |
415 | Programming without concurrency, you've come to expect a certain order and
416 | consistency in your world. With something as simple as setting a variable to a
417 | value, it's obvious it should always work properly.
418 |
419 | In concurrency-land, some things might be true and others are not, to the point
420 | where you must assume that nothing is true. You must question everything. Even
421 | setting a variable to a value might or might not work the way you expect, and it
422 | goes downhill from there. I've become familiar with the feeling of discovering
423 | that something I thought should obviously work, actually doesn't.
424 |
425 | All kinds of things you can ignore in non-concurrent programming suddenly become
426 | important with concurrency. For example, you must now know about the processor
427 | cache and the problems of keeping the local cache consistent with main memory.
428 | You must understand the deep complexities of object construction so that your
429 | constructor doesn't accidentally expose data to change by other threads. The
430 | list goes on.
431 |
432 | ### 3. Just because it works doesn't mean it's not broken
433 |
434 | You can easily write a concurrent program that appears to work but is actually
435 | broken, and the problem only reveals itself under the rarest of
436 | conditions---inevitably as a user problem after you've deployed the program.
437 |
438 | + You can't prove a concurrent program is correct; you can only (sometimes)
439 | prove it is incorrect.
440 |
441 | + Most of the time you can't even do that: If it's broken you probably won't
442 | be able to detect it.
443 |
444 | + You can't usually write useful tests, so you must rely on code inspection
445 | combined with deep knowledge of concurrency in order to discover bugs.
446 |
447 | + Even working programs only work under their design parameters. Most
448 | concurrent programs fail in some way when those design parameters are
449 | exceeded.
450 |
451 | In other areas of programming, we develop a sense of determinism. Everything
452 | happens as promised (or implied) by the language, which is comforting and
453 | expected---after all, the point of a programming language is to get the machine
454 | to do what we want. Moving from the world of deterministic programming into the
455 | realm of concurrent programming, we encounter a cognitive bias called the
456 | [Dunning-Kruger
457 | Effect](https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect) which can
458 | be summed up as "the less you know, the more you think you know." It means
459 | "...relatively unskilled persons suffer illusory superiority, mistakenly
460 | assessing their ability to be much higher than it really is."
461 |
462 | My own experience is that, no matter how certain you are that your code is
463 | thread-safe, it's probably broken. It's all too easy to be very sure you
464 | understand all the issues, then months or years later you discover some concept
465 | that makes you realize that most everything you've written is actually
466 | vulnerable to concurrency bugs. The compiler doesn't tell you when something is
467 | incorrect. To get it right you must hold all the issues of concurrency in your
468 | forebrain as you study your code.
469 |
470 | In all the non-concurrent areas of Python, "no obvious bugs" seems to mean that
471 | everything is OK. With concurrency, it means nothing. The very worst thing you
472 | can be in this situation is "confident."
473 |
474 | ### 4. You must still understand it.
475 |
476 | After maxims 1-3 you might be properly frightened of concurrency, and think,
477 | "I've avoided it up until now, perhaps I can just continue avoiding it."
478 |
479 | This is a rational reaction. You might know about other programming languages
480 | that are better designed to build concurrent programs---even ones that can easily
481 | communicate with Python. Why not write the concurrent parts in those
482 | languages and use Python for everything else?
483 |
484 | Alas, you cannot escape so easily:
485 |
486 | + Even if you never explicitly create a thread, frameworks you use
487 | might.
488 |
489 | + Here's the worst thing: when you create components, you must assume those
490 | components might be reused in a multithreading environment. Even if your
491 | solution is to give up and declare that your components are "not
492 | thread-safe," you must still know enough to realize that such a statement is
493 | important and what it means.
494 |
495 | Unfortunately, you don't get to choose when threads appear in your programs.
496 | Just because you never start a thread yourself doesn't mean you can avoid
497 | writing threaded code. For example, Web systems are one of the most common
498 | applications, and are inherently multithreaded---Web servers typically contain
499 | multiple processors, and parallelism is an ideal way to utilize these
500 | processors. As simple as such a system might seem, you must understand
501 | concurrency to write it properly.
502 |
503 | Python supports concurrency, so concurrency issues are present whether you are
504 | aware of them or not. As a result, programs might just work by accident, or work
505 | most of the time and mysteriously break every now and again because of
506 | undiscovered flaws. Sometimes this breakage is relatively benign, but sometimes
507 | it means the loss of valuable data, and if you aren't at least aware of
508 | concurrency issues, you can end up assuming the problem is somewhere else rather
509 | than in your code. These kinds of issues can also be exposed or amplified if a
510 | program is moved to a multiprocessor system. Basically, knowing about
511 | concurrency makes you aware that apparently correct programs can exhibit
512 | incorrect behavior.
513 |
514 | Summary
515 | -------
516 |
517 | If concurrency were easy, there would be no reason to avoid it. Because it is
518 | hard, you should consider carefully whether it's worth the effort. Can you
519 | speed things up some other way? For example, move to faster hardware (which can
520 | be a lot less expensive than lost programmer time) or break your program into
521 | pieces and run those pieces on different machines?
522 |
523 | Occam's (or Ockham's) razor is an oft-misunderstood principle. I've seen at
524 | least one movie where they define it as "the simplest solution is the correct
525 | one," as if it's some kind of law. It's actually a guideline: When faced with a
526 | number of approaches, first try the one that requires the fewest assumptions. In
527 | the programming world, this has evolved into "try the simplest thing that could
528 | possibly work." When you know something about a particular tool, it can be quite
529 | tempting to use it, or to specify ahead of time that your solution must "run
530 | fast," to justify designing in concurrency from the beginning. But our
531 | programming version of Occam's razor says that you should try the simplest
532 | approach first (which will also be cheaper to develop) and see if it's good
533 | enough.
534 |
535 | As I came from a low-level background (physics and computer engineering), I was
536 | prone to imagining the cost of all the little wheels turning. I can't count the
537 | number of times I was certain the simplest approach could never be fast enough,
538 | only to discover upon trying that it was more than adequate. This was especially
539 | true with Python, where I imagined that the cost of the interpreter couldn't
540 | possibly keep up with the speed of C++ or Java, only to discover that the Python
541 | solution often ran as fast or even faster.
542 |
543 | ### Drawbacks
544 |
545 | The main drawbacks to concurrency are:
546 |
547 | 1. Slowdown while threads wait for shared resources.
548 |
549 | 2. Additional CPU overhead for thread management.
550 |
551 | 3. Unrewarded complexity from poor design decisions.
552 |
553 | 4. Pathologies such as starvation, races, deadlock, and livelock (multiple
554 | threads working individual tasks that the ensemble can't finish).
555 |
556 | 5. Inconsistencies across platforms. With some examples, I discovered race
557 | conditions that quickly appeared on some computers but not
558 | on others. If you develop a program on the latter, you might get badly
559 | surprised when you distribute it.
560 |
561 | In addition, there's an art to the application of concurrency. Python is
562 | designed to allow you to create as many objects as necessary to solve your
563 | problem---at least in theory.^[Creating millions of objects for finite-element
564 | analysis in engineering, for example, might not be practical in Python without
565 | the *Flyweight* design pattern.] However, because of overhead it's only possible
566 | to create a few thousand threads before running out of memory. You normally only
567 | need a handful of threads to solve a problem, so this is typically not much of a
568 | limit, but for some designs it becomes a constraint that might force you to use
569 | an entirely different scheme.
570 |
571 | #### The Shared-Memory Pitfall
572 |
573 | One of the main difficulties with concurrency occurs because more than one task
574 | might be sharing a resource---such as the memory in an object---and you must
575 | ensure that multiple tasks don't simultaneously read and change that resource.
576 |
577 | I have spent years studying and struggling with concurrency. I've learned you
578 | can never believe that a program using shared-memory concurrency is working
579 | correctly. You can discover it's wrong, but you can never prove it's right.
580 | This is one of the well-known maxims of concurrency.^[In science, a theory is
581 | never proved, but to be valid it must be *falsifiable*. With concurrency,
582 | we can't even get falsifiability most of the time.]
583 |
584 | I've met numerous people who have an impressive amount of confidence in their
585 | ability to write correct threaded programs. I occasionally start thinking I can
586 | get it right, too. For one particular program, I initially wrote it when we only
587 | had single-CPU machines. I was able to convince myself that, because of the
588 | promises I thought I understood, the program was correct. And it didn't fail on
589 | my single-CPU machine.
590 |
591 | Fast forward to machines with multiple CPUs. I was surprised when the program
592 | broke, but that's a fundamental problem with concurrency. You *can* actually
593 | discover some concurrency problems on a single-CPU machine, but there are other
594 | problems that won't appear until you try it on a multi-CPU machine, where your
595 | threads are actually running in parallel.
596 |
597 | You can never let yourself become too confident about your programming
598 | abilities when it comes to shared-memory concurrency.
599 |
--------------------------------------------------------------------------------