├── Resources └── Cover │ ├── CoverArt.jpg │ └── ConcurrentPythonCover.jpg ├── Xperimental ├── getUrls.py ├── AsyncioQueue.py └── TkinterDemo.py ├── README.md ├── .gitignore ├── Chapters ├── 03_Communicating_Sequential_Processes.md ├── 01_Preface.md ├── 00_Notes.md └── 02_Introduction.md └── events └── ConcurrentPythonDeveloperRetreat.html /Resources/Cover/CoverArt.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BruceEckel/ConcurrentPython/HEAD/Resources/Cover/CoverArt.jpg -------------------------------------------------------------------------------- /Resources/Cover/ConcurrentPythonCover.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BruceEckel/ConcurrentPython/HEAD/Resources/Cover/ConcurrentPythonCover.jpg -------------------------------------------------------------------------------- /Xperimental/getUrls.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import aiohttp 3 | 4 | async def fetch_page(url): 5 | response = await aiohttp.request('GET', "http://" + url) 6 | assert response.status == 200 7 | content = await response.read() 8 | print('URL: {0}: Content: {1}'.format(url, content)) 9 | 10 | 11 | loop = asyncio.get_event_loop() 12 | loop.run_until_complete(asyncio.wait(map(fetch_page, [ 13 | 'google.com', 'cnn.com', 'twitter.com' 14 | ]))) 15 | loop.close() 16 | -------------------------------------------------------------------------------- /Xperimental/AsyncioQueue.py: -------------------------------------------------------------------------------- 1 | import asyncio 2 | import random 3 | 4 | async def do_work(task_name, work_queue): 5 | while not work_queue.empty(): 6 | queue_item = await work_queue.get() 7 | print('{0} got: {1}'.format(task_name, queue_item)) 8 | await asyncio.sleep(random.random()) 9 | 10 | def execute(tasks): 11 | loop = asyncio.get_event_loop() 12 | loop.run_until_complete(asyncio.wait(tasks)) 13 | loop.close() 14 | 15 | if __name__ == "__main__": 16 | q = asyncio.Queue() 17 | 18 | [q.put_nowait(i) for i in range(20)] 19 | 20 | print(q) 21 | 22 | execute([asyncio.async(do_work('task' + str(t), q)) 23 | for t in range(3)]) 24 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # ConcurrentPython 2 | An intermediate-to-advanced book on Python concurrency 3 | 4 | > This is a work in progress. It is being created in the open in order to produce feedback and 5 | > ultimately make it as accurate as possible. 6 | 7 | * The ebook version will remain free, although I haven't yet decided on the specific license. 8 | 9 | * This book is written using [Pandoc](http://pandoc.org/)-flavored Markdown. Thus, although you can read the chapters 10 | directly in the `Chapters` subdirectory of this repository, you will see various artifacts from Pandoc's Markdown 11 | extensions that don't render via Github Markdown. These are typically minor issues and don't greatly impact readability. 12 | 13 | * The epub build system is not yet in place. 
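* A note on the `Xperimental/` scripts above: they were written against the pre-3.7 asyncio style (explicit event loops, `asyncio.async()`, session-less `aiohttp.request()`), which current Python no longer accepts. As a hedged sketch only (not part of the book's build, and assuming `aiohttp` is installed), the URL-fetching idea looks roughly like this in the modern style:

```python
# Hedged sketch: Xperimental/getUrls.py reworked for Python 3.7+.
# asyncio.run() replaces the explicit event loop, asyncio.async() is now
# asyncio.ensure_future()/asyncio.create_task(), and aiohttp requests go
# through a ClientSession. The URLs are the ones from the original script.
import asyncio
import aiohttp

async def fetch_page(session, url):
    async with session.get("http://" + url) as response:
        content = await response.read()
        print(f"URL: {url}: {len(content)} bytes")

async def main():
    urls = ["google.com", "cnn.com", "twitter.com"]
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch_page(session, u) for u in urls))

asyncio.run(main())
```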
14 | -------------------------------------------------------------------------------- /Xperimental/TkinterDemo.py: -------------------------------------------------------------------------------- 1 | import tkinter as tk 2 | 3 | class Application(tk.Frame): 4 | def __init__(self, master=None): 5 | super().__init__(master) 6 | self.pack() 7 | self.create_widgets() 8 | 9 | def create_widgets(self): 10 | self.hi_there = tk.Button(self) 11 | self.hi_there["text"] = "Hello World\n(click me)" 12 | self.hi_there["command"] = self.say_hi 13 | self.hi_there.pack(side="top") 14 | 15 | self.quit = tk.Button(self, text="QUIT", fg="red", 16 | command=root.destroy) 17 | self.quit.pack(side="bottom") 18 | 19 | def say_hi(self): 20 | print("hi there, everyone!") 21 | 22 | root = tk.Tk() 23 | app = Application(root) 24 | app.mainloop() 25 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | env/ 12 | build/ 13 | develop-eggs/ 14 | dist/ 15 | downloads/ 16 | eggs/ 17 | .eggs/ 18 | lib/ 19 | lib64/ 20 | parts/ 21 | sdist/ 22 | var/ 23 | *.egg-info/ 24 | .installed.cfg 25 | *.egg 26 | 27 | # PyInstaller 28 | # Usually these files are written by a python script from a template 29 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 30 | *.manifest 31 | *.spec 32 | 33 | # Installer logs 34 | pip-log.txt 35 | pip-delete-this-directory.txt 36 | 37 | # Unit test / coverage reports 38 | htmlcov/ 39 | .tox/ 40 | .coverage 41 | .coverage.* 42 | .cache 43 | nosetests.xml 44 | coverage.xml 45 | *,cover 46 | .hypothesis/ 47 | 48 | # Translations 49 | *.mo 50 | *.pot 51 | 52 | # Django stuff: 53 | *.log 54 | local_settings.py 55 | 56 | # Flask stuff: 57 | instance/ 58 | .webassets-cache 59 | 60 | # Scrapy stuff: 61 | .scrapy 62 | 63 | # Sphinx documentation 64 | docs/_build/ 65 | 66 | # PyBuilder 67 | target/ 68 | 69 | # IPython Notebook 70 | .ipynb_checkpoints 71 | 72 | # pyenv 73 | .python-version 74 | 75 | # celery beat schedule file 76 | celerybeat-schedule 77 | 78 | # dotenv 79 | .env 80 | 81 | # virtualenv 82 | venv/ 83 | ENV/ 84 | 85 | # Spyder project settings 86 | .spyderproject 87 | 88 | # Rope project settings 89 | .ropeproject 90 | -------------------------------------------------------------------------------- /Chapters/03_Communicating_Sequential_Processes.md: -------------------------------------------------------------------------------- 1 | Communicating Sequential Processes 2 | ================================== 3 | 4 | The biggest problem in concurrency is that tasks can interfere with each other. 5 | There are certainly other problems, but this is the biggest. This interference 6 | generally appears in the form of two tasks attempting to read and write the 7 | same data storage. Because the tasks run independently, you can't know which 8 | one has modified the storage, so the data is effectively corrupt. This is the 9 | problem of *shared-memory concurrency*. 10 | 11 | You will see later in this book that there are concurrency strategies which 12 | attempt to solve the problem by locking the storage while one task is using 13 | it so other tasks are unable to read or write that storage. 
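
As a preview of what that looks like in Python, here is a minimal sketch using the standard `threading` module (the names are only illustrative; this is the approach the rest of this chapter argues *against*):

```python
# Two tasks share one piece of storage; the lock ensures only one of them
# can read-modify-write it at a time, so no updates are lost.
import threading

counter = 0
lock = threading.Lock()

def task():
    global counter
    for _ in range(100_000):
        with lock:          # acquire exclusive access to the shared storage
            counter += 1

threads = [threading.Thread(target=task) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # Reliably 200000, because the lock serializes every update
```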
Although there is 14 | not yet a conclusive proof, some people believe that this dance is so tricky and 15 | complicated that it's impossible to write a correct program of any complexity 16 | using shared-memory concurrency. 17 | 18 | One solution to this problem is to altogether eliminate the possibility of 19 | shared storage. Each task is isolated, and the only way to communicate with 20 | other tasks is through controlled channels that safely pass data from one task 21 | to another. This is the general description of *communicating sequential 22 | processes* (CSP). The *sequential* term means that, within any process, you can 23 | effectively ignore the fact that you are working within a concurrent world and 24 | program as you normally do, sequentially from beginning to end. By defending you 25 | from shared-memory pitfalls, CSP allows you to think more simply about the 26 | problem you're solving. 27 | 28 | This book explores a number of strategies that implement CSP, but the easiest 29 | place to start is probably Python's built-in `multiprocessing` module. It turns 30 | out that `multiprocessing` is not a seamless implementation of CSP, but we 31 | shall start by pretending that it is and ignoring the "leaks" in the abstraction. 32 | This produces a fairly nice introduction to CSP, and later in the book we can 33 | address the leaks as necessary. 34 | -------------------------------------------------------------------------------- /events/ConcurrentPythonDeveloperRetreat.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | Concurrent Python Developer Retreat — July 16-19 2017 4 | 5 | 6 | The Concurrent Python Developer Retreat 7 | ======================================= 8 | ## Featuring <a href="https://us.pycon.org/2016/speaker/profile/32/" target="_blank">Luciano Ramalho</a> 9 | ### Author of <a href="http://shop.oreilly.com/product/0636920032519.do">Fluent Python</a> 10 | #### Crested Butte, Colorado, July 16-19 2017 11 | 12 | <hr style="height:10px;border-width:0;color:LightSkyBlue;background-color:LightSkyBlue"> 13 | ### The Retreat 14 | 15 | We'll be exploring topics and tools around my book project 16 | <a href="http://www.ConcurrentPython.com">Concurrent Python</a>. 17 | Although I've added material based on the *Concurrency* chapter from 18 | <a href="www.OnJava8.com">On Java 8</a>, 19 | this book is still in its research and organization phase; I expect it to be a couple of 20 | years before I consider it finished. The electronic version of the book is meant to 21 | be publicly available during and after development. The book is written in (Pandoc-flavored) 22 | Markdown so you can read it directly in the Github repository. 23 | 24 | **Prerequisites**: The book is intended for readers who know Python but don't know anything about 25 | concurrency. Thus, for the retreat you should know Python, but I don't expect any concurrency knowledge. 26 | 27 | - Miguel Grinberg's [Asynchronous Python for the Complete Beginner]( 28 | https://www.youtube.com/watch?v=iG6fr81xHKA) is a nice 29 | overview of some of the core concepts. 
30 | 31 | <hr style="height:10px;border-width:0;color:LightSkyBlue;background-color:LightSkyBlue"> 32 | ### Activities 33 | 34 | The following activities may occur as desired: 35 | 36 | - Hikes and other outdoor adventures 37 | - Meals at local restaurants 38 | - Barbeques, other evening activities 39 | 40 | <hr style="height:10px;border-width:0;color:LightSkyBlue;background-color:LightSkyBlue"> 41 | ### Price 42 | 43 | - Minimum: $75 44 | 45 | - **Suggested (pay what you can):** 46 | 47 | - You&rsquo;re paying: $250 48 | 49 | - Your Employer is paying: $500 50 | 51 | <hr style="height:10px;border-width:0;color:LightSkyBlue;background-color:LightSkyBlue"> 52 | ### Location 53 | 54 | - Bruce Eckel&rsquo;s living room, Crested Butte, CO (WiFi, TV Monitor w/ Chromecast, Printer/Scanner and 55 | Kitchen). We can comfortably fit 12 people. 56 | 57 | - For variety, we go to coffee shops and group houses. 58 | 59 | - If people are staying in the hostel, that has a nice community space as well. 60 | 61 | - &ldquo;Townie&rdquo; bikes can be rented during the event to more easily get around town. 62 | 63 | - **Note:** Although you can find lodging &mdash; even nice hotels &mdash; up on the mountain 64 | (called *Mount* Crested Butte, 3 miles away with a free bus running to town), it's much nicer 65 | to stay in town (just called "Crested Butte," no "Mount"), so try to do that if you can. 66 | 67 | <hr style="height:10px;border-width:0;color:LightSkyBlue;background-color:LightSkyBlue"> 68 | ### Getting to Crested Butte 69 | 70 | Details <a href="https://wintertechforum.github.io/travel/" target="_blank">here</a> 71 | 72 | <hr style="height:10px;border-width:0;color:LightSkyBlue;background-color:LightSkyBlue"> 73 | ### Lodging 74 | 75 | Details <a href="https://wintertechforum.github.io/lodging/" target="_blank">here</a> 76 | 77 | <hr style="height:10px;border-width:0;color:LightSkyBlue;background-color:LightSkyBlue"> 78 | ### Registration 79 | 80 | Add yourself to the <a href="https://groups.google.com/d/forum/concurrentpythondeveloperretreat">mailing list</a> 81 | (click on "subscribe to this group"). 82 | 83 | <hr style="height:10px;border-width:0;color:LightSkyBlue;background-color:LightSkyBlue"> 84 | ### Questions? 85 | Email <mindviewinc@gmail.com> 86 | 87 | 88 | 89 | 90 | -------------------------------------------------------------------------------- /Chapters/01_Preface.md: -------------------------------------------------------------------------------- 1 | Preface 2 | ======= 3 | 4 | > This is an intermediate-to-advanced book on concurrency in Python. 5 | 6 | The reader is assumed to already be fluent in the Python language. There are a 7 | large number and variety of resources available to achieve this fluency, many of 8 | them free. 9 | 10 | In addition, I use the latest version of Python available at this writing, 11 | version 3.6, which includes important concurrency features. If you are wedded to 12 | an earlier version of Python, this book might not be for you. 13 | 14 | That said, this book *is* an introduction to the concepts and applications of 15 | concurrency in Python, for a beginner *to those topics*. I start from scratch, 16 | assuming you've never heard the term before or any of the surrounding ideas. 17 | This is a very reasonable approach because concurrency is relatively orthogonal 18 | to the rest of programming and it's entirely possible to be an expert in 19 | everything about a language for years before exploring concurrency for the first 20 | time. 
21 | 22 | A Strange but Pragmatic Approach 23 | -------------------------------- 24 | 25 | Virtually every discussion of concurrency starts with the complex, low-level 26 | nuts and bolts and then slowly works up to the more elegant high-level 27 | solutions. This is understandable because it's how we teach everything else. 28 | 29 | Concurrency is a strange topic in just about every conceivable way. As you shall 30 | learn in this book, the *only* justification for wading into the complexity and 31 | tribulations of concurrency is to make your program run faster. And if that's 32 | truly what you need, you probably *don't* need to learn everything about 33 | concurrency. What you'd really like is the shortest, easiest path to a faster 34 | program. 35 | 36 | So that's how I organize the book. After explaining what concurrency is, we'll 37 | start with the simplest and easiest---and typically, the highest 38 | level---approaches to speeding up your program with concurrency. As the book 39 | progresses, the techniques will become lower-level, messier, and more 40 | complicated. The goal is that you'll go only as far as you need in order to 41 | solve your speed problem, and after that you'll only go further if you're 42 | interested, or when you come back later with a new problem. 43 | 44 | eBook First 45 | ----------- 46 | 47 | This book is developed in the open, as an eBook. Although I haven't yet settled 48 | on a license, my intent is for the eBook to remain free. 49 | 50 | At the 2016 Pycon in Portland, I held an open-spaces session around this book, 51 | and also attended a dinner where we talked about it. In both cases people argued 52 | that, while they really liked the idea of a free eBook, eventually they want a 53 | print book. Once the eBook version has been out long enough to evolve and settle 54 | down, and also to get adequate feedback to eliminate errors, I will revisit the 55 | issue and consider creating a print version of the book. 56 | 57 | Business Model 58 | -------------- 59 | 60 | My two most recent books, [Atomic Scala](www.AtomicScala.com) and [On Java 61 | 8](www.OnJava8.com), were not free. In the process of developing those books I 62 | became aware that a big part of that choice was that---while I enjoyed learning 63 | about those languages and writing those books---I ultimately didn't want to do 64 | further work in those areas, so there was no additional financial support 65 | (seminars, consulting, etc.) created from those books. They needed to be their 66 | own endpoints. 67 | 68 | The Python community has called to me ever since I discovered it. I have 69 | postponed that call for a couple of decades, with the excuse "let me just finish 70 | this one thing, then I'll come join you" (an all-too-familiar refrain heard by 71 | the partners of computer programmers). I evolve slowly, but I know now that the 72 | Python community is where I'm happiest. 73 | 74 | While the thought of struggling with the arbitrary limitations and roadblocks 75 | of, say, the Java language (for details, see [On Java 8](www.OnJava8.com)) in a 76 | training or consulting context fills me with angst, the vision of working with 77 | people who already know the productivity and delights of Python fills me with 78 | joy. That's where I want to live, and I want this book to lay the groundwork for 79 | conferences, developer retreats, consulting, training, and other experiences. I 80 | believe it will satisfy my need for fun and fulfillment. 
81 | 82 | I could certainly have taken the approach I've used in my other books and 83 | created an introductory Python book, but there are a vast number of those which 84 | do an outstanding job. I don't see myself making a big contribution there. 85 | Instead, I chose something I think most folks experience as quite challenging. 86 | Over the decades, I have had periodic, intense bouts with concurrency. I've come 87 | away from each of these believing that, finally, I understand the topic. And 88 | each time I've discovered later there was some gaping hole in my knowledge. You 89 | will discover that concurrency consistently produces excellent examples of the 90 | [Dunning-Kruger 91 | Effect](https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect), where the 92 | less you know, the more you think you know. I know enough about concurrency to 93 | know there can *always* appear some surprising hole in my knowlege. 94 | 95 | Source Code 96 | ----------- 97 | 98 | All the source code for this book is available as copyrighted freeware, 99 | distributed along with the sources for the book's text, via 100 | [Github](https://github.com/BruceEckel/ConcurrentPython). To make sure you get 101 | the most current version, this is the official code distribution site. 102 | 103 | Coding Standards 104 | ---------------- 105 | 106 | In the text of this book, identifiers (keywords, methods, variables, and class 107 | names) are set in bold, fixed-width `code font`. Some keywords, such as `class`, 108 | are used so much that the bolding can become tedious. Those which are 109 | distinctive enough are left in normal font. 110 | 111 | The code files in this book are tested with an automated system, and should work 112 | without compiler errors (except those specifically tagged) in the latest version 113 | of Python. 114 | 115 | Bug Reports 116 | ----------- 117 | 118 | No matter how many tools a writer uses to detect errors, some always creep in 119 | and these often leap off the page for a fresh reader. If you discover anything 120 | you believe to be an error, please submit the error along with your suggested 121 | correction, for either the book's prose or examples, 122 | [here](https://github.com/BruceEckel/ConcurrentPython/issues). Your help is 123 | appreciated. 124 | 125 | Mailing List 126 | ------------ 127 | 128 | For news and notifications, you can subscribe to the low-volume email list at 129 | [www.ConcurrentPython.com](http://www.ConcurrentPython.com). I don't use ads and 130 | strive to make the content as appropriate as possible. 131 | 132 | Colophon 133 | -------- 134 | 135 | This book was written with Pandoc-flavored Markdown, and produced into ePub 136 | version 3 format using [Pandoc](http://pandoc.org/). 137 | 138 | The body font is Georgia and the headline font is Verdana. The code font is 139 | Ubuntu Mono, because it is especially compact and allows more characters on a 140 | line without wrapping. I chose to place the code inline (rather than make 141 | listings into images, as I've seen some books do) because it is important to me 142 | that the reader be able to resize the font of the code listings when they resize 143 | the body font (otherwise, really, what's the point?). 144 | 145 | The build process for the book is automated, as well as the process to extract, 146 | compile and test the code examples. All automation is achieved through fairly 147 | extensive programs I wrote in Python 3. 
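
To give a flavor of that automation (this is a toy sketch, not the book's actual build code), extracting the fenced listings from a chapter might look something like:

```python
# Toy sketch: pull fenced Python listings out of a Markdown chapter so they
# can be written to files and tested. The chapter path is just an example.
import re
from pathlib import Path

FENCE = re.compile(r"`{3}python\n(.*?)`{3}", re.DOTALL)

def extract_listings(chapter: Path):
    return FENCE.findall(chapter.read_text(encoding="utf-8"))

for n, listing in enumerate(extract_listings(Path("Chapters/02_Introduction.md"))):
    print(f"--- listing {n} ---")
    print(listing)
```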
148 | 149 | ### Cover Design 150 | 151 | 152 | Thanks 153 | ------ 154 | 155 | 156 | Dedication 157 | ---------- 158 | 159 | -------------------------------------------------------------------------------- /Chapters/00_Notes.md: -------------------------------------------------------------------------------- 1 | General 2 | ======= 3 | 4 | Concurrency: Taking a program that isn't running fast enough, breaking it into 5 | pieces, and "running those pieces separately." The what and how of "running separately" 6 | is where all the details and complexity lie for the various concurrency strategies. 7 | 8 | - On top of this, a small fraction of problems use some concurrency solutions as a structuring mechanism. 9 | Usually the driving force is "not fast enough" but sometimes (ironically) it can be 10 | "too complicated." 11 | 12 | The concept of whether something is synchronous refers to when a function 13 | finishes vs. when a function returns. In the vast majority of Python code a 14 | function returns when it finishes, so these two points are identical, that is, 15 | *synchronous*. But with an asynchronous call, the function returns control to 16 | the caller *before* the function finishes---typically much sooner. So the two 17 | events, returning and finishing, now happen at different points in time: they 18 | are *asynchronous*. 19 | 20 | If tasks don’t wait on each other then they are compute intensive 21 | 22 | Ideally, make tasks that don’t block on other tasks (deadlock prone) 23 | 24 | 25 | ## For Study and Exploration 26 | > Feel free to pull-request something you think might be helpful here 27 | 28 | - Trio 29 | - [docs](https://trio.readthedocs.io/en/latest/) 30 | - [Difference between asyncio and curio](https://vorpus.org/blog/some-thoughts-on-asynchronous-api-design-in-a-post-asyncawait-world/) 31 | - [Docs](http://curio.readthedocs.io/en/latest/) 32 | - [Repo](https://github.com/dabeaz/curio) 33 | 34 | - General Concurrency 35 | - [Ted Leung](http://www.slideshare.net/twleung/a-survey-of-concurrency-constructs) 36 | - [The Art of Multiprocessor Programming, Revised Reprint](http://amzn.to/2j1oneL) 37 | - [A Helpful Reading List of some selected topics](https://github.com/python-trio/trio/wiki/Reading-list) 38 | - [Grok the GIL Write Fast And Thread Safe Python ](https://www.youtube.com/watch?v=7SSYhuk5hmc&t=861s) 39 | 40 | - Actors: 41 | - [Thespian Docs](http://godaddy.github.io/Thespian/doc/) 42 | - [Thespian Motivation](https://engineering.godaddy.com/why-godaddy-built-an-actor-system-library/) 43 | - Can Ponylang interface with Python? 44 | - [dramatiq: Simple distributed task processing for Python 3]( 45 | https://github.com/Bogdanp/dramatiq) 46 | 47 | - Async: 48 | - [Christian Medina](https://hackernoon.com/threaded-asynchronous-magic-and-how-to-wield-it-bba9ed602c32#.l8tws7nkv) 49 | - [PyMotw overview](https://pymotw.com/3/asyncio/index.html) 50 | - [Web Crawler, Guido et. al.](http://aosabook.org/en/500L/a-web-crawler-with-asyncio-coroutines.html) 51 | - [Mike Bayer on Async](http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/) 52 | - [Youtube: asyncawait and asyncio in Python 3 6 and beyond](https://www.youtube.com/watch?v=2ZFFv-wZ8_g) 53 | - uvloop: faster replacement for the built-in asyncio event loop. 
54 | 55 | - Communicating Sequential Processes (CSP) 56 | - [PyCSP](https://github.com/runefriborg/pycsp/wiki) 57 | - [PyCSP Slides](http://arild.github.io/csp-presentation/#1) 58 | 59 | - [Join Calculus](https://en.wikipedia.org/wiki/Join-calculus) 60 | - [Chymyst — declarative concurrency in Scala](https://github.com/Chymyst/chymyst-core/) -- good resource for 61 | ideas and examples. 62 | - [Join-calculus for Python](https://github.com/maandree/join-python) 63 | 64 | - AioHTTP: 65 | - [Docs](http://aiohttp.readthedocs.io/en/stable/) 66 | - [An Intro](http://stackabuse.com/python-async-await-tutorial/) 67 | 68 | - Remote Objects 69 | - [Pyro4](https://pythonhosted.org/Pyro4/) Mature Python remote object implementation. 70 | 71 | - Calling Go from Python (example of taking advantage of another language's concurrency model) 72 | - [Example](https://github.com/jbuberel/buildmodeshared/tree/master/gofrompython) 73 | (For windows it would have to be done in the bash shell but that might be OK) 74 | - RabbitMQ allows cross-language calls. 75 | 76 | - An example: Each task controls a pixel/block on the screen. Produces a visual, 77 | graphic idea of what's going on. 78 | - Which graphics library to use? 79 | - [Processing](http://py.processing.org/) 80 | - [Tkinter](http://stackoverflow.com/questions/4842156/manipulating-individual-pixel-colors-in-the-tkinter-canvas-widget) 81 | 82 | - Asyncio example: Webcomic Reader 83 | - Text-file database keeps track of last comic read 84 | - Each comic is simply opened in a browser tab 85 | 86 | - [DASK parallel computing library for analytic computing](https://dask.pydata.org) 87 | 88 | - [Pachyderm Pipeline System](http://docs.pachyderm.io/en/latest/reference/pachyderm_pipeline_system.html) 89 | - [Slack channel]( http://slack.pachyderm.io/) 90 | - "Our official release for the CLI are for OSX and Linux as of now. However, we do have windows users that work with Pachyderm via the new Linux subsystem in Windows 10. Also, the CLI is only one choice for interacting with Pachyderm. You can also use the Python, Go, or other client, which should work just fine on Windows." 91 | 92 | - Queue-based Concurrency 93 | - Celery on Rabbit MQ 94 | 95 | - Misc 96 | - Bridge between Python and Java: https://www.py4j.org/ 97 | - Cython for speed, making it easier: https://github.com/AlanCristhian/statically 98 | - Compiler: https://nuitka.net/ 99 | 100 | What Confuses You About Concurrency? 101 | ==================================== 102 | (Notes from an open-spaces session at Pycon 2017) 103 | 104 | - I don't want to think about it. 105 | - What about testing? New different kinds of failure modes introduced by concurrency. 106 | - Making sure an event is handled. 107 | - Martin Fowler's recent Youtube presentation on event-driven programming. 108 | - Twelve-factor application (Heroku came up with this term and has the list) to make things easily scalable 109 | - How to write things that can easily be scaled without being a hassle 110 | - Does async and await preclude gevent, twisted, etc. 111 | - How do I write code/libraries compliant with async and await? 
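
One common (but certainly not the only) answer to that last question, sketched with illustrative names: implement the core as a coroutine, then offer a thin synchronous wrapper for callers that aren't running an event loop.

```python
# Sketch: an "async-first" library function plus a sync convenience wrapper.
import asyncio

async def fetch_data(source):
    await asyncio.sleep(0.1)        # stand-in for real asynchronous I/O
    return f"data from {source}"

def fetch_data_sync(source):
    """For callers with no event loop; don't call this from inside a coroutine."""
    return asyncio.run(fetch_data(source))

if __name__ == "__main__":
    print(fetch_data_sync("example"))           # plain synchronous call
    print(asyncio.run(fetch_data("example")))   # async callers: await fetch_data(...)
```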
112 | 113 | Tools 114 | ===== 115 | 116 | - Pipenv 117 | - Codeclimate 118 | 119 | 120 | Miscellaneous 121 | ============= 122 | 123 | > Many of these were collected for a general Python book, not necessarily for concurrency 124 | 125 | - [Nice intro to Python debugging](https://blog.sentry.io/2017/06/22/debugging-python-errors) 126 | 127 | - Code style checkers: flake8 and hacking 128 | 129 | - ptpython: better python REPL 130 | 131 | - logging instead of print() 132 | 133 | - tabulate and texttable for generating text tables. 134 | 135 | - Peewee ORM as database example, data storage 136 | 137 | - Debuggers: 138 | PuDB 139 | WDB web debugger 140 | 141 | - Testing: 142 | Behave, Gherkin, assertpy, hypothesis, robotframework 143 | https://testrun.org/tox/latest/ 144 | 145 | - matplotlib 146 | 147 | - tqdm progress bar 148 | 149 | - Simple object serialization: http://marshmallow.readthedocs.org/en/latest/ 150 | 151 | - Auto-reformatters: 152 | https://github.com/google/yapf 153 | 154 | - List example: 155 | things_i_love = ["meaningful examples", "irony", "lists"] 156 | 157 | - "Fight with the problem, not with the language" 158 | 159 | - http://www.pythontutor.com/ 160 | 161 | - Python 3 special but basic features: 162 | https://asmeurer.github.io/python3-presentation/slides.html 163 | 164 | - PUDB debugger (screen text based) 165 | 166 | - Formatting: https://pypi.python.org/pypi/autopep8/ 167 | 168 | - Design by contract: http://andreacensi.github.io/contracts/ 169 | 170 | - dbm for databases ("Storing data") 171 | 172 | - Interacting with C? (cffi) 173 | 174 | - Requests library for HTTP http://docs.python-requests.org/en/latest/ 175 | 176 | - https://testrun.org/tox/latest/ 177 | 178 | - Beck Design Rules: http://martinfowler.com/bliki/BeckDesignRules.html 179 | 180 | - Bound Inner Classes: http://code.activestate.com/recipes/577070-bound-inner-classes/ 181 | 182 | - Message broker: 183 | http://www.rabbitmq.com/tutorials/tutorial-one-python.html 184 | 185 | - Download, install, configure: 186 | https://github.com/princebot/pythonize 187 | 188 | - Functional programming: 189 | http://toolz.readthedocs.org/en/latest/api.html 190 | 191 | - Things that should be builtins? 192 | https://boltons.readthedocs.org/en/latest/ 193 | 194 | - Best Python Quotes (for text examples): 195 | http://www.telegraph.co.uk/comedy/comedians/monty-python-s-25-funniest-quotes/life-of-brian-tedious-prophet/ 196 | 197 | - Single interface to environment variables’, configuration files’, and command line arguments’ provided values: 198 | https://pypi.python.org/pypi/crumbs 199 | 200 | - "Hidden" features: 201 | http://stackoverflow.com/questions/101268/hidden-features-of-python 202 | -------------------------------------------------------------------------------- /Chapters/02_Introduction.md: -------------------------------------------------------------------------------- 1 | Introduction 2 | ============ 3 | 4 | > "Double, double toil and trouble; Fire burn and caldron bubble."---William 5 | > Shakespeare, *MacBeth* 6 | 7 | The only justification for concurrency is if your program doesn't run fast 8 | enough. There are a few languages designed to make concurrency relatively 9 | effortless---at least, their particular flavor of concurrency, which might or 10 | might not fit your needs---but these are not yet the most popular programming 11 | languages. 
Python does what it can to make concurrency "Pythonic," but you must 12 | still work within the limitations of a language that wasn't designed around 13 | concurrency. 14 | 15 | You can be thoroughly fluent in Python and know little to nothing about 16 | concurrency. Indeed, for this book I expect those exact credentials. This means, 17 | however, that diving into concurrency is a test of patience. You, a competent 18 | programmer, must suffer the indignity of being thrown back to "beginner" status, 19 | to learn new fundamental concepts (when, by this time, you thought you were 20 | an accomplished programmer). 21 | 22 | I say all this to give you one last chance to rethink your strategy and consider 23 | whether there might be some other way to make your program run faster. There are 24 | many simpler approaches: 25 | 26 | * Faster hardware is probably much cheaper than programmer time. 27 | 28 | * Have you upgraded to the latest version of Python? That often produces 29 | speed improvements. 30 | 31 | * Use a profiler to discover your speed bottleneck, and see if you can change 32 | an algorithm. For example, sometimes the choice of data structure makes all 33 | the difference. 34 | 35 | * Try using [Cython](http://cython.org/) on functions with performance bottlenecks. 36 | 37 | * See if your program will run on [PyPy](http://pypy.org/). 38 | 39 | * If your bottlenecks are math-intensive, consider [Numba](http://numba.pydata.org/). 40 | 41 | * Search the Internet for other performance approaches. 42 | 43 | If you jump right to concurrency without exploring these other approaches first, 44 | you might still discover that your problem is dominated by one of these issues 45 | and you must use them anyway, after a complicated struggle with concurrency. You 46 | might also need to use them in addition to a concurrency approach. 47 | 48 | What Does the Term *Concurrency* Mean? 49 | -------------------------------------- 50 | 51 | As a non-concurrent programmer, you think in linear terms: A program runs from 52 | beginning to end, performing all its intermediate steps in sequence. This is the 53 | easiest way to think about programming. 54 | 55 | Concurrency breaks a program into pieces, typically called *tasks*. As much as 56 | possible, these tasks run independently of each other, with the hope that the 57 | whole program runs faster. 58 | 59 | That's concurrency in a nutshell: Independently-running tasks. 60 | 61 | At this point your mind should be filled with questions: 62 | 63 | * How do I start a task? 64 | 65 | * How do I stop a task? 66 | 67 | * How do I get information into a task? 68 | 69 | * How do I get results from a task? 70 | 71 | * What mechanism drives the tasks? 72 | 73 | * How do tasks fail? 74 | 75 | * Can tasks communicate with each other? 76 | 77 | * What happens if two tasks try to use the same piece of storage? 78 | 79 | These are not naive---they are precisely the questions you must ask to 80 | understand concurrency. There is no one answer to any of them. In fact, these 81 | questions distinguish the different concurrency strategies. In addition, each 82 | strategy produces a different set of rules governing how you write concurrent 83 | code. Different strategies also have domains of application where they shine, 84 | and other domains where they don't provide much benefit, or can even produce 85 | slower results. 86 | 87 | The term "concurrency" is often defined inconsistently in the literature. 
One 88 | of the more common distinctions declares concurrency to be when all tasks are 89 | driven by a single processor, vs *parallelism* where tasks are distributed 90 | among multiple processors. There are (mostly historical) reasons for this 91 | difference, but in this book I relegate "the number of processors driving the 92 | tasks" as one of the many variables involved with the general problem of 93 | concurrency. Concurrency doesn't increase the number of CPU cycles you have 94 | available---it tries to use them better. 95 | 96 | Concurrency is initially overwhelming precisely because it is a general goal 97 | ("make a program faster using tasks") with myriad strategies to achieve that 98 | goal (and more strategies regularly appear). The overwhelm diminishes when you 99 | understand it from the perspective of different competing strategies for the 100 | same problem. 101 | 102 | This book takes the pragmatic approach of only giving you what you need to solve 103 | your problem, presenting the simplest strategies first whenever possible. It's 104 | exceptionally difficult to understand *everything* about concurrency, so 105 | requiring that you do so in order to implement the simplest approach necessary 106 | to solve your problem is unreasonable and impractical. 107 | 108 | Each strategy has strengths and weaknesses. Typically, one strategy might solve 109 | some classes of problems quite well, while being relatively ineffective for 110 | other types of problems. Much of the art and craft of concurrency comes from 111 | understanding the different strategies and knowing which ones give better 112 | results for a particular set of constraints. 113 | 114 | After this introductory chapter, we'll start by exploring the concept of 115 | *Communicating Sequential Processes* (CSP), which confines each task inside its 116 | own world where it can't accidentally cause trouble for other tasks. Information 117 | is passed in and out through concurrency-safe *channels*. 118 | 119 | The CSP model is implemented within a number of concurrency strategies. We'll start 120 | by looking at the `multiprocessing` module that's part of the standard Python 121 | distribution. 122 | 123 | Next, we'll look at the idea of a *message queue*, which solves a specific but 124 | common type of problem ... We'll explore `RabbitMQ` and the `Celery` library 125 | which wraps it to make it more Pythonic. 126 | 127 | The next chapter explores how adding further constraints to CSP can produce 128 | significant benefits by learning about the *Actor Model*. ... 129 | 130 | The typical incentive for using multiple processors is when you have data that 131 | can be broken up into chunks and processed separately. There's a second common 132 | type of concurrency problem, which is when parts of your program spend time 133 | waiting on external operations---for example, Internet requests. In this 134 | situation it's not so much the number of processors you have but that they are 135 | stuck waiting on one thing when they might be working on something else. In 136 | fact, you can make much better use of a single processor by allowing it to 137 | jump from a place where it is waiting (*blocked*) to somewhere it can do some 138 | useful work. The Python 3.6 Asyncio and coroutines are targeted to this exact 139 | problem, and we'll spend the chapter exploring this strategy. 140 | 141 | A *foreign function call interface* allows you to make calls to code written in 142 | other languages. 
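
Python's standard `ctypes` module is the most direct such interface. A small sketch (assuming a typical Linux or macOS system where the C math library can be located):

```python
# Sketch: calling a C function from Python through the foreign function
# interface in the standard library.
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))   # the C math library
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))   # 1.4142135623730951
```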
We can take advantage of this by calling into languages that 143 | are specifically designed to make concurrency easy. ... 144 | 145 | Finally we'll look at one of the more primitive and early constructs, the 146 | *thread*, along with the rather heavy constraint of Python's *global interpreter 147 | lock* (GIL). With all the other, better strategies available it's not clear that 148 | you actually need to understand things at this level, but it's something you 149 | might need to know if someone asks the question "what about threads?" 150 | 151 | [Add other topics here as they arise] 152 | 153 | ******************************************************************************* 154 | 155 | - What does "asynchronous" mean? 156 | * When a function finishes vs. when it returns 157 | * Again, not something you're used to thinking about 158 | * Because the answer has always been: "at the same time" 159 | 160 | > [NOTE] The following material is still in rough form. 161 | 162 | Concurrency Superpowers 163 | ----------------------- 164 | 165 | Imagine you're inside a science-fiction movie. You must search a tall building 166 | for a single item that is carefully and cleverly hidden in one of the ten 167 | million rooms of the building. You enter the building and move down a corridor. 168 | The corridor divides. 169 | 170 | By yourself it will take a hundred lifetimes to accomplish this task. 171 | 172 | Now suppose you have a strange superpower. You can split yourself in two, and 173 | send one of yourself down one corridor while you continue down the other. Every 174 | time you encounter a divide in a corridor or a staircase to the next level, you 175 | repeat this splitting-in-two trick. Eventually there is one of you for every 176 | terminating corridor in the entire building. 177 | 178 | Each corridor contains a thousand rooms. Your superpower is getting stretched a 179 | little thin, so you only make 50 of yourself to search the rooms in parallel. 180 | 181 | I'd love to be able to say, "Your superpower in the science-fiction movie? 182 | That's what concurrency is." That it's as simple as splitting yourself in two 183 | every time you have more tasks to solve. The problem is that any model we use 184 | to describe this phenomenon ends up being a leaky abstraction. 185 | 186 | Here's one of those leaks: In an ideal world, every time you cloned yourself, 187 | you would also duplicate a hardware processor to run that clone. But of course 188 | that isn't what happens---you actually might have four or eight processors on 189 | your machine (typical when this was written). You might also have more, and 190 | there are still lots of situations where you have only one processor. In the 191 | abstraction under discussion, the way physical processors are allocated not only 192 | leaks through but can even dominate your decisions. 193 | 194 | Let's change something in our science-fiction movie. Now when each clone 195 | searcher eventually reaches a door they must knock on it and wait until someone 196 | answers. If we have one processor per searcher, this is no problem---the 197 | processor just idles until the door is answered. But if we only have eight 198 | processors and thousands of searchers, we don't want a processor to be idle 199 | just because a searcher happens to be blocked, waiting for a door to be 200 | answered. Instead, we want that processor applied to a searcher where it can do 201 | some real work, so we need mechanisms to switch processors from one task to 202 | another. 
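
Python's coroutines are one such mechanism: a task that is waiting suspends itself so the processor can be applied elsewhere. Here is a hedged sketch of the searchers-at-the-doors idea, with the waiting simulated by `asyncio.sleep`:

```python
# Sketch: 1000 searchers each wait one second at a door. Because a waiting
# coroutine yields control back to the event loop, one processor services
# all of them in roughly one second total rather than one thousand.
import asyncio
import time

async def searcher(door):
    await asyncio.sleep(1)      # knock, then wait for the door to be answered
    return f"door {door}: nothing here"

async def main():
    start = time.perf_counter()
    results = await asyncio.gather(*(searcher(d) for d in range(1000)))
    print(f"searched {len(results)} doors in {time.perf_counter() - start:.2f} seconds")

asyncio.run(main())
```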
203 | 204 | Many models are able to effectively hide the number of processors and allow you 205 | to pretend you have a very large number. But there are situations where this 206 | breaks down, when you must know the number of processors so you can work 207 | around that number. 208 | 209 | One of the biggest impacts depends on whether you have a single processor or 210 | more than one. If you only have one processor, then the cost of task-switching 211 | is also borne by that processor, and applying concurrency techniques to your 212 | system can make it run *slower*. 213 | 214 | This might make you decide that, in the case of a single processor, it never 215 | makes sense to write concurrent code. However, there are situations where the 216 | *model* of concurrency produces much simpler code and it's actually worth having 217 | it run slower to achieve that. 218 | 219 | In the case of the clones knocking on doors and waiting, even the 220 | single-processor system benefits from concurrency because it can switch from a 221 | task that is waiting (*blocked*) to one that is ready to go. But if all the 222 | tasks can run all the time, then the cost of switching slows everything down, 223 | and in that case concurrency usually only makes sense if you *do* have multiple 224 | processors. 225 | 226 | Suppose you are trying to crack some kind of encryption. The more workers trying 227 | to crack it at the same time, the better chance you have of finding the answer 228 | sooner. Here, each worker can constantly use as much processor time as you can 229 | give it, and the best situation is when each worker has their own processor---in 230 | this case (called a *compute-bound* problem), you should write the code so you 231 | *only* have as many workers as you have processors. 232 | 233 | In a customer-service department that takes phone calls, you only have a certain 234 | number of people, but you can have lots of phone calls. Those people (the 235 | processors) must work on one phone call at a time until it is complete, and 236 | extra calls must be queued. 237 | 238 | In the fairy tale of "The Shoemaker and the Elves," the shoemaker had too much 239 | work to do and when he was asleep, a group of elves came and made shoes for him. 240 | Here the work is distributed but even with a large number of physical processors 241 | the bottleneck is in the limitation of building certain parts of the shoe---if, 242 | for example, the sole takes the longest to make, that limits the rate of 243 | shoe creation and changes the way you design your solution. 244 | 245 | Thus, the problem you're trying to solve drives the design of the solution. 246 | There's the lovely abstraction of breaking a problem into subtasks that "run 247 | independently," then there's the reality of how it's actually going to happen. 248 | The physical reality keeps intruding upon, and shaking up, that abstraction. 249 | 250 | That's only part of the problem. Consider a factory that makes cakes. We've 251 | somehow distributed cake-making among workers, but now it's time for a 252 | worker to put their cake in a box. There's a box sitting there, ready to receive 253 | a cake. But before the worker can put the cake into the box, another worker 254 | darts in and puts *their* cake in the box instead! Our worker is already putting 255 | the cake in, and bam! The two cakes are smashed together and ruined. 
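
In code, the smashed cakes look something like this sketch, where two threads update one shared count without any coordination (whether you actually see lost updates depends on the interpreter version and on timing):

```python
# Sketch: a shared-memory race. The increment is a separate read, add, and
# write, so two threads can interleave and overwrite each other's work.
import threading

cakes_boxed = 0

def worker():
    global cakes_boxed
    for _ in range(500_000):
        cakes_boxed += 1    # not atomic: another worker can dart in between

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cakes_boxed)          # often less than the expected 1,000,000
```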
This is the 256 | common "shared memory" problem that produces what we call a *race condition*, 257 | where the result depends on which worker can get their cake in the box first 258 | (you typically solve the problem using a locking mechanism so one worker can 259 | grab the box first and prevent cake-smashing). 260 | 261 | This problem occurs when tasks that execute "at the same time" interfere with 262 | each other. It can happen in such a subtle and occasional manner that it's 263 | probably fair to say that concurrency is "arguably deterministic but effectively 264 | nondeterministic." That is, you can hypothetically write concurrent programs 265 | that, through care and code inspection, work correctly. In practice, however, 266 | it's much more common to write concurrent programs that only appear to work, but 267 | given the right conditions, will fail. These conditions might never actually 268 | occur, or occur so infrequently you never see them during testing. In fact, it's 269 | often impossible to write test code to generate failure conditions for your 270 | concurrent program. The resulting failures might only occur occasionally, and as 271 | a result they appear in the form of customer complaints. This is one of the 272 | strongest arguments for studying concurrency: If you ignore it, you're likely to 273 | get bitten. 274 | 275 | Concurrency thus seems fraught with peril, and if that makes you a bit fearful, 276 | this is probably a good thing. Although Python makes large improvements in 277 | concurrency, there are still no safety nets to tell you when you make a mistake. 278 | With concurrency, you're on your own, and only by being knowledgeable, 279 | suspicious and aggressive can you write reliable concurrent code. 280 | 281 | 282 | Concurrency is for Speed 283 | ------------------------ 284 | 285 | After hearing about the pitfalls of concurrent programming, you may rightly be 286 | wondering if it's worth the trouble. The answer is "no, unless your program 287 | isn't running fast enough." And you'll want to think carefully before deciding 288 | it isn't. Do not casually jump into the well of grief that is concurrent 289 | programming. If there's a way to run your program on a faster machine or if you 290 | can profile it and discover the bottleneck and swap in a faster algorithm, do 291 | that instead. Only if there's clearly no other choice should you begin using 292 | concurrency, and then only in isolated places. 293 | 294 | The speed issue sounds simple at first: If you want a program to run faster, 295 | break it into pieces and run each piece on a separate processor. With our 296 | ability to increase clock speeds running out of steam (at least for conventional 297 | chips), speed improvements are appearing in the form of multicore processors 298 | rather than faster chips. To make your programs run faster, you'll have to learn 299 | to take advantage of those extra processors, and that's one thing that 300 | concurrency gives you. 301 | 302 | If you have a multiprocessor machine, multiple tasks can be distributed across 303 | those processors, which can dramatically improve throughput. This is often the 304 | case with powerful multiprocessor Web servers, which can distribute large 305 | numbers of user requests across CPUs in a program that allocates one thread per 306 | request. 307 | 308 | However, concurrency can often improve the performance of programs running on a 309 | *single* processor. This can sound a bit counterintuitive. 
If you think about 310 | it, a concurrent program running on a single processor should actually have 311 | *more* overhead than if all the parts of the program ran sequentially, because 312 | of the added cost of the *context switch* (changing from one task to another). 313 | On the surface, it would appear cheaper to run all the parts of the program as a 314 | single task and save the cost of context switching. 315 | 316 | The issue that can make a difference is *blocking*. If one task in your program 317 | is unable to continue because of some condition outside of the control of the 318 | program (typically I/O), we say that the task or the thread *blocks* (in our 319 | science-fiction story, the clone has knocked on the door and is waiting for it 320 | to open). Without concurrency, the whole program comes to a stop until the 321 | external condition changes. If the program is written using concurrency, 322 | however, the other tasks in the program can continue to execute when one task is 323 | blocked, so the program continues to move forward. In fact, from a performance 324 | standpoint, it makes no sense to use concurrency on a single-processor machine 325 | unless one of the tasks might block. 326 | 327 | A common example of performance improvements in single-processor systems is 328 | *event-driven programming*, in particular user-interface programming. Consider a 329 | program that performs some long-running operation and thus ends up ignoring user 330 | input and being unresponsive. If you have a "quit" button, you don't want to 331 | poll it in every piece of code you write. This produces awkward code, without 332 | any guarantee that a programmer won't forget to perform the check. Without 333 | concurrency, the only way to produce a responsive user interface is for all 334 | tasks to periodically check for user input. By creating a separate task that 335 | responds to user input, the program guarantees a certain level of 336 | responsiveness. 337 | 338 | A straightforward way to implement concurrency is at the operating system level, 339 | using *processes*, which are different from threads. A process is a 340 | self-contained program running within its own address space. Processes are 341 | attractive because the operating system usually isolates one process from 342 | another so they cannot interfere with each other, which makes programming with 343 | processes relatively easy (this is supported by Python's `multiprocessing` 344 | module). In contrast, threads share resources like memory and I/O, so a 345 | fundamental difficulty in writing multithreaded programs is coordinating these 346 | resources between different thread-driven tasks, so they cannot be accessed by 347 | more than one task at a time. 348 | 349 | Some people go so far as to advocate processes as the only reasonable approach 350 | to concurrency,^[Eric Raymond, for example, makes a strong case in *The Art of 351 | UNIX Programming* (Addison-Wesley, 2004).] but unfortunately there are generally 352 | quantity and overhead limitations to processes that prevent their applicability 353 | across the concurrency spectrum. (Eventually you get used to the standard 354 | concurrency refrain, "That approach works in some cases but not in other cases"). 355 | 356 | Some programming languages are designed to isolate concurrent tasks from each 357 | other. 
These are generally called *functional languages*, where each function 358 | call produces no side effects (and so cannot interfere with other functions) and 359 | can thus be driven as an independent task. *Erlang* (and it's more modern 360 | variant, *Elixir*) is one such language, and it includes safe mechanisms for one 361 | task to communicate with another. If you find that a portion of your program 362 | must make heavy use of concurrency and you are running into excessive problems 363 | trying to build that portion, you might consider creating that part of your 364 | program in a dedicated concurrency language. 365 | 366 | Initially, Python took the more traditional approach of adding support for 367 | threading on top of a sequential language.^[It could be argued that trying to 368 | bolt concurrency onto a sequential language is a doomed approach, but you'll 369 | have to draw your own conclusions.] Instead of forking external processes in a 370 | multitasking operating system, threading creates tasks *within* the single 371 | process represented by the executing program. As we shall see, more modern 372 | approaches have since been added to Python. 373 | 374 | Concurrency imposes costs, including complexity costs, but can be outweighed by 375 | improvements in program design, resource balancing, and user convenience. In 376 | general, concurrency enables you to create a more loosely coupled design; 377 | otherwise, parts of your code would be forced to pay explicit attention to 378 | operations that would normally be handled by concurrency. 379 | 380 | The Four Maxims of Concurrency 381 | ------------------------------ 382 | 383 | After grappling with concurrency over many years, I developed these four 384 | maxims: 385 | 386 | > 1. Don't do it 387 | > 2. Nothing is true and everything matters 388 | > 3. Just because it works doesn't mean it's not broken 389 | > 4. You must still understand it 390 | 391 | These apply to a number of languages. However, there do exist languages designed 392 | to prevent these issues. 393 | 394 | ### 1. Don't do it 395 | 396 | (And don't do it yourself). 397 | 398 | The easiest way to avoid entangling yourself in the profound problems produced 399 | by concurrency is not to do it. Although it can be seductive and seem safe 400 | enough to try something simple, the pitfalls are myriad and subtle. If you can 401 | avoid it, your life will be much easier. 402 | 403 | The *only* thing that justifies concurrency is speed. If your program isn't 404 | running fast enough---and be careful here, because just *wanting* it to run 405 | faster isn't justification---first apply a profiler to discover whether there's 406 | some other optimization you can perform. 407 | 408 | If you're compelled into concurrency, take the simplest, safest approach to the 409 | problem. Use well-known libraries and write as little of your own code as 410 | possible. With concurrency, there's no such thing as "too simple." Cleverness is 411 | your enemy. 412 | 413 | ### 2. Nothing is true and everything matters 414 | 415 | Programming without concurrency, you've come to expect a certain order and 416 | consistency in your world. With something as simple as setting a variable to a 417 | value, it's obvious it should always work properly. 418 | 419 | In concurrency-land, some things might be true and others are not, to the point 420 | where you must assume that nothing is true. You must question everything. 
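
A small demonstration using the standard `dis` module: even a one-line increment turns into several separate interpreter steps, and another task can run between any of them.

```python
# Sketch: disassemble an innocent-looking increment to see that it is not
# a single operation.
import dis

def update(counter):
    counter += 1
    return counter

dis.dis(update)   # shows separate load, add, and store instructions
```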
Even 421 | setting a variable to a value might or might not work the way you expect, and it 422 | goes downhill from there. I've become familiar with the feeling of discovering 423 | that something I thought should obviously work, actually doesn't. 424 | 425 | All kinds of things you can ignore in non-concurrent programming suddenly become 426 | important with concurrency. For example, you must now know about the processor 427 | cache and the problems of keeping the local cache consistent with main memory. 428 | You must understand the deep complexities of object construction so that your 429 | constructor doesn't accidentally expose data to change by other threads. The 430 | list goes on. 431 | 432 | ### 3. Just because it works doesn't mean it's not broken 433 | 434 | You can easily write a concurrent program that appears to work but is actually 435 | broken, and the problem only reveals itself under the rarest of 436 | conditions---inevitably as a user problem after you've deployed the program. 437 | 438 | + You can't prove a concurrent program is correct, you can only (sometimes) 439 | prove it is incorrect. 440 | 441 | + Most of the time you can't even do that: If it's broken you probably won't 442 | be able to detect it. 443 | 444 | + You can't usually write useful tests, so you must rely on code inspection 445 | combined with deep knowledge of concurrency in order to discover bugs. 446 | 447 | + Even working programs only work under their design parameters. Most 448 | concurrent programs fail in some way when those design parameters are 449 | exceeded. 450 | 451 | In other areas of programming, we develop a sense of determinism. Everything 452 | happens as promised (or implied) by the language, which is comforting and 453 | expected---after all, the point of a programming language is to get the machine 454 | to do what we want. Moving from the world of deterministic programming into the 455 | realm of concurrent programming, we encounter a cognitive bias called the 456 | [Dunning-Kruger 457 | Effect](https://en.wikipedia.org/wiki/Dunning%E2%80%93Kruger_effect) which can 458 | be summed up as "the less you know, the more you think you know." It means 459 | "...relatively unskilled persons suffer illusory superiority, mistakenly 460 | assessing their ability to be much higher than it really is." 461 | 462 | My own experience is that, no matter how certain you are that your code is 463 | thread-safe, it's probably broken. It's all too easy to be very sure you 464 | understand all the issues, then months or years later you discover some concept 465 | that makes you realize that most everything you've written is actually 466 | vulnerable to concurrency bugs. The compiler doesn't tell you when something is 467 | incorrect. To get it right you must hold all the issues of concurrency in your 468 | forebrain as you study your code. 469 | 470 | In all the non-concurrent areas of Python, "no obvious bugs" seems to mean that 471 | everything is OK. With concurrency, it means nothing. The very worst thing you 472 | can be in this situation is "confident." 473 | 474 | ### 4. You must still understand it. 475 | 476 | After maxims 1-3 you might be properly frightened of concurrency, and think, 477 | "I've avoided it up until now, perhaps I can just continue avoiding it." 478 | 479 | This is a rational reaction. You might know about other programming languages 480 | that are better designed to build concurrent programs---even ones that easily 481 | communication with Python). 
### 4. You must still understand it

After maxims 1-3 you might be properly frightened of concurrency, and think,
"I've avoided it up until now, perhaps I can just continue avoiding it."

This is a rational reaction. You might know about other programming languages
that are better designed to build concurrent programs---even ones that
communicate easily with Python. Why not write the concurrent parts in those
languages and use Python for everything else?

Alas, you cannot escape so easily:

+ Even if you never explicitly create a thread, frameworks you use
  might.

+ Here's the worst thing: when you create components, you must assume those
  components might be reused in a multithreading environment. Even if your
  solution is to give up and declare that your components are "not
  thread-safe," you must still know enough to realize that such a statement is
  important and what it means.

Unfortunately, you don't get to choose when threads appear in your programs.
Just because you never start a thread yourself doesn't mean you can avoid
writing threaded code. For example, Web systems are among the most common
applications, and they are inherently multithreaded---Web servers typically
contain multiple processors, and parallelism is an ideal way to utilize those
processors. As simple as such a system might seem, you must understand
concurrency to write it properly.

Python supports concurrency, so concurrency issues are present whether you are
aware of them or not. As a result, programs might just work by accident, or work
most of the time and mysteriously break every now and again because of
undiscovered flaws. Sometimes this breakage is relatively benign, but sometimes
it means the loss of valuable data, and if you aren't at least aware of
concurrency issues, you can end up assuming the problem is somewhere else rather
than in your code. These kinds of issues can also be exposed or amplified if a
program is moved to a multiprocessor system. Basically, knowing about
concurrency makes you aware that apparently correct programs can exhibit
incorrect behavior.

Summary
-------

If concurrency were easy, there would be no reason to avoid it. Because it is
hard, you should consider carefully whether it's worth the effort. Can you
speed things up some other way? For example, could you move to faster hardware
(which can be a lot less expensive than lost programmer time), or break your
program into pieces and run those pieces on different machines?

Occam's (or Ockham's) razor is an oft-misunderstood principle. I've seen at
least one movie where they define it as "the simplest solution is the correct
one," as if it's some kind of law. It's actually a guideline: when faced with a
number of approaches, first try the one that requires the fewest assumptions. In
the programming world, this has evolved into "try the simplest thing that could
possibly work." When you know something about a particular tool, it can be quite
tempting to use it, or to specify ahead of time that your solution must "run
fast," to justify designing in concurrency from the beginning. But our
programming version of Occam's razor says that you should try the simplest
approach first (which will also be cheaper to develop) and see if it's good
enough.

As I came from a low-level background (physics and computer engineering), I was
prone to imagining the cost of all the little wheels turning. I can't count the
number of times I was certain the simplest approach could never be fast enough,
only to discover upon trying that it was more than adequate. This was especially
true with Python, where I imagined that the cost of the interpreter couldn't
possibly keep up with the speed of C++ or Java, only to discover that the Python
solution often ran as fast or even faster.

### Drawbacks

The main drawbacks to concurrency are:

1. Slowdown while threads wait for shared resources.

2. Additional CPU overhead for thread management.

3. Unrewarded complexity from poor design decisions.

4. Pathologies such as starvation, race conditions, deadlock (sketched just
   after this list), and livelock (multiple threads working on individual
   tasks that, as an ensemble, can never finish).

5. Inconsistencies across platforms. With some examples, I discovered race
   conditions that quickly appeared on some computers but not
   on others. If you develop a program on the latter, you might get badly
   surprised when you distribute it.
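Deadlock, in particular, is easy to produce. In this sketch (illustrative only;
the lock and task names are invented for the example), two threads each acquire
one lock and then wait forever for the lock the other is holding; the `sleep`
calls simply make the unlucky timing dependable:

```python
# A minimal deadlock: two locks acquired in opposite orders.
import threading
import time

lock_a = threading.Lock()
lock_b = threading.Lock()

def task_one():
    with lock_a:
        time.sleep(0.1)          # give task_two time to grab lock_b
        with lock_b:             # blocks forever: task_two holds lock_b
            pass

def task_two():
    with lock_b:
        time.sleep(0.1)          # give task_one time to grab lock_a
        with lock_a:             # blocks forever: task_one holds lock_a
            pass

t1 = threading.Thread(target=task_one, daemon=True)
t2 = threading.Thread(target=task_two, daemon=True)
t1.start()
t2.start()
t1.join(timeout=2)
t2.join(timeout=2)
print("still deadlocked:", t1.is_alive() and t2.is_alive())
```

This particular deadlock disappears if both tasks acquire the locks in the same
order, but in a large program the ordering is rarely that easy to see.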
In addition, there's an art to the application of concurrency. Python is
designed to allow you to create as many objects as necessary to solve your
problem---at least in theory.^[Creating millions of objects for finite-element
analysis in engineering, for example, might not be practical in Python without
the *Flyweight* design pattern.] However, because of overhead it's only possible
to create a few thousand threads before running out of memory. You normally only
need a handful of threads to solve a problem, so this is typically not much of a
limit, but for some designs it becomes a constraint that might force you to use
an entirely different scheme.

#### The Shared-Memory Pitfall

One of the main difficulties with concurrency occurs because more than one task
might be sharing a resource---such as the memory in an object---and you must
ensure that multiple tasks don't simultaneously read and change that resource.

I have spent years studying and struggling with concurrency. I've learned you
can never believe that a program using shared-memory concurrency is working
correctly. You can discover it's wrong, but you can never prove it's right.
This is one of the well-known maxims of concurrency.^[In science, a theory is
never proved, but to be valid it must be *falsifiable*. With concurrency,
we can't even get falsifiability most of the time.]

I've met numerous people who have an impressive amount of confidence in their
ability to write correct threaded programs. I occasionally start thinking I can
get it right, too. I initially wrote one particular program when we only had
single-CPU machines. I was able to convince myself that, because of the
promises I thought I understood, the program was correct. And it didn't fail on
my single-CPU machine.

Fast forward to machines with multiple CPUs. I was surprised when the program
broke, but that's a fundamental problem with concurrency. You *can* discover
some concurrency problems on a single-CPU machine, but there are other problems
that won't appear until you run the program on a multi-CPU machine, where your
threads are actually running in parallel.

You can never let yourself become too confident about your programming
abilities when it comes to shared-memory concurrency.

--------------------------------------------------------------------------------