├── .gitignore ├── Chapter 01 - The Python Data Model.ipynb ├── Chapter 02 - An Array of Sequences.ipynb ├── Chapter 03 - Dictionaries and sets.ipynb ├── Chapter 04 - Text versus Bytes.ipynb ├── Chapter 05 - First-class functions.ipynb ├── Chapter 06 - Design patterns with first-class functions.ipynb ├── Chapter 07 - Function decorators and closures.ipynb ├── Chapter 08 - Object references, mutability and recycling.ipynb ├── cafe.txt ├── python.gif ├── rare_chars.txt ├── readme.md └── zen.txt /.gitignore: -------------------------------------------------------------------------------- 1 | /.ipynb_checkpoints 2 | *.bin 3 | 4 | *.sublime-project 5 | 6 | *.sublime-workspace 7 | -------------------------------------------------------------------------------- /Chapter 01 - The Python Data Model.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "raw_mimetype": "text/markdown" 7 | }, 8 | "source": [ 9 | "# The Python Data Model\n", 10 | "\n", 11 | "The Python data model formalizes the interfaces of the building blocks of the language. The Python interpreter invokes special methods to perform basic object operations. The special method names are always spelled with leading and trailing double underscores, e.g. 
`__getitem__`.\n", 12 | "\n", 13 | "The special method names allow your objects to interact with basic language constructs such as:\n", 14 | "- iteration\n", 15 | " - **for**\n", 16 | " - **while**\n", 17 | "- collections: the `collections` module implements specialized container datatypes providing alternatives to Python's general purpose built-in containers, dict, list, set, and tuple.\n", 18 | " - **namedtuple()** factory function for creating tuple subclasses with named fields \n", 19 | " - **deque** list-like container with fast appends and pops on either end \n", 20 | " - **Counter** dict subclass for counting hashable objects \n", 21 | " - **OrderedDict** dict subclass that remembers the order entries were added \n", 22 | " - **defaultdict** dict subclass that calls a factory function to supply missing values\n", 23 | "- attribute access\n", 24 | " - `object.attribute`\n", 25 | " - `getattr(obj, 'attribute')`\n", 26 | "- operator overloading\n", 27 | " - operator overloading is done by redefining certain special methods in any class, e.g.:\n", 28 | " ```class MyClass(object):\n", 29 | " def __add__(self, x):\n", 30 | " return '%s plus %s' % (self, x)```\n", 31 | "- function and method invocation. Methods are associated with object instances or classes; functions aren't. \n", 32 | " - method invocation\n", 33 | " - instance = Foo()\n", 34 | " - instance.foo(arg1,arg2)\n", 35 | " - function invocation:\n", 36 | " - Bar.foo(1,2)\n", 37 | "- object creation and destruction\n", 38 | " - x = MyClass()\n", 39 | " - object destruction via the `__del__` method\n", 40 | "- string representation and formatting\n", 41 | " - name = \"John\"\n", 42 | " - print(\"Hello, %s!\" % name)\n", 43 | "- managed contexts (i.e. with blocks)\n", 44 | " - automatically manage resources encapsulated within context manager types, or more generally perform startup and cleanup actions around a block of code. 
eg:\n", 45 | " ```with open('what_are_context_managers.txt', 'r') as infile:\n", 46 | " for line in infile:\n", 47 | " print('> {}'.format(line))```\n", 48 | "\n", 49 | "Nice [presentation of the Python data model](https://delapuente.github.io/presentations/python-datamodel/index.html#/4/1)" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 67, 55 | "metadata": { 56 | "collapsed": false 57 | }, 58 | "outputs": [], 59 | "source": [ 60 | "#Using collections here\n", 61 | "import collections\n", 62 | "from random import choice\n", 63 | "\n", 64 | "#named tuple in use\n", 65 | "Card = collections.namedtuple('Card', ['rank', 'suit'])\n", 66 | "\n", 67 | "class FrenchDeck:\n", 68 | " ranks = [str(n) for n in range(2, 11)] + list('JQKA') \n", 69 | " suits = 'spades diamonds clubs hearts'.split()\n", 70 | " \n", 71 | " def __init__(self):\n", 72 | " self._cards = [Card(rank, suit) for suit in self.suits for rank in self.ranks]\n", 73 | " \n", 74 | " def __len__(self):\n", 75 | " return len(self._cards)\n", 76 | " \n", 77 | " def __getitem__(self, position): \n", 78 | " return self._cards[position]" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 68, 84 | "metadata": { 85 | "collapsed": false 86 | }, 87 | "outputs": [ 88 | { 89 | "name": "stdout", 90 | "output_type": "stream", 91 | "text": [ 92 | "52\n", 93 | "Card(rank='2', suit='spades')\n", 94 | "Card(rank='A', suit='hearts')\n", 95 | "Card(rank='J', suit='hearts')\n" 96 | ] 97 | } 98 | ], 99 | "source": [ 100 | "deck = FrenchDeck()\n", 101 | "print(len(deck))\n", 102 | "print(deck[0])\n", 103 | "print(deck[-1])\n", 104 | "print(choice(deck))" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "By implementing the special methods `__len__` and `__getitem__` our FrenchDeck behaves like a standard Python sequence (i.e. 
str, bytes, list, tuple, range), allowing it to benefit from core language features, like iteration and slicing, and from the standard library.\n", 112 | "\n", 113 | "Three advantages of using special methods to leverage the Python Data Model:\n", 114 | "1. Standard operations have standard names, across objects\n", 115 | "1. Once standard operations have been implemented, the rest of the standard library is available\n", 116 | "1. Because `__getitem__` delegates to the [] operator of self._cards, our deck automatically supports slicing. " 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 69, 122 | "metadata": { 123 | "collapsed": false 124 | }, 125 | "outputs": [ 126 | { 127 | "name": "stdout", 128 | "output_type": "stream", 129 | "text": [ 130 | "[Card(rank='2', suit='spades'), Card(rank='3', suit='spades'), Card(rank='4', suit='spades')]\n", 131 | "[Card(rank='A', suit='spades'), Card(rank='A', suit='diamonds'), Card(rank='A', suit='clubs'), Card(rank='A', suit='hearts')]\n", 132 | "True\n", 133 | "False\n" 134 | ] 135 | } 136 | ], 137 | "source": [ 138 | "print(deck[:3])\n", 139 | "print(deck[12::13])\n", 140 | "print(Card('Q', 'hearts') in deck)\n", 141 | "print(Card('7', 'beasts') in deck)" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "- Special methods are meant to be called by the Python interpreter, and not by you.\n", 149 | "- The only special method that is frequently called by user code directly is `__init__`, to invoke the initializer of the superclass in your own `__init__` implementation.\n", 150 | "- If you need to invoke a special method, it is usually better to call the related built-in function, such as len, iter, str etc.\n", 151 | "\n", 152 | "## Emulating numeric types" 153 | ] 154 | }, 155 | { 156 | "cell_type": "code", 157 | "execution_count": 4, 158 | "metadata": { 159 | "collapsed": false, 160 | "scrolled": true 161 | }, 162 | "outputs": [ 163 | { 164 | 
"name": "stdout", 165 | "output_type": "stream", 166 | "text": [ 167 | "TestResults(failed=0, attempted=3)\n", 168 | "Vector(4, 5)\n", 169 | "5.0\n", 170 | "Vector(9, 12)\n", 171 | "15.0\n" 172 | ] 173 | } 174 | ], 175 | "source": [ 176 | "from math import hypot \n", 177 | "\n", 178 | "class Vector:\n", 179 | " def __init__(self, x=0, y=0): \n", 180 | " self.x = x\n", 181 | " self.y = y\n", 182 | " def __repr__(self):\n", 183 | " #note use of string representation, to get raw value, not str as %s\n", 184 | " return 'Vector(%r, %r)' % (self.x, self.y) \n", 185 | " def __abs__(self):\n", 186 | " return hypot(self.x, self.y) \n", 187 | " def __bool__(self):\n", 188 | " #Convert the magnitude to a boolean using bool(abs(self)) because __bool__ is expected to return a boolean.\n", 189 | " return bool(abs(self))\n", 190 | " def __add__(self, other):\n", 191 | " '''\n", 192 | " >>> v1 = Vector(2, 4) \n", 193 | " >>> v2 = Vector(2, 1) \n", 194 | " >>> v1+v2 \n", 195 | " Vector(4, 5)\n", 196 | " '''\n", 197 | " x = self.x + other.x \n", 198 | " y = self.y + other.y \n", 199 | " return Vector(x, y)\n", 200 | " def __mul__(self, scalar):\n", 201 | " return Vector(self.x * scalar, self.y * scalar)\n", 202 | "\n", 203 | "import doctest\n", 204 | "print(doctest.testmod())\n", 205 | "v1 = Vector(2, 4)\n", 206 | "v2 = Vector(2, 1) \n", 207 | "print(v1+v2)\n", 208 | "v = Vector(3, 4)\n", 209 | "print(abs(v))\n", 210 | "print(v*3)\n", 211 | "print(abs(v * 3))" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "Note that although we implemented four special methods (apart from __init__), none of them is directly called\n", 219 | "\n", 220 | "## String representation\n", 221 | "\n", 222 | "The __repr__ special method is called by the repr built-in to get string representation of the object for inspection. Note that in our __repr__ implementation we used %r to obtain the standard repre‐ sentation of the attributes to be displayed. 
This is good practice, as it shows the crucial difference between Vector(1, 2) and Vector('1', '2'); the latter would not work in the context of this example, because the constructor's arguments must be numbers, not str.\n", 223 | "\n", 224 | "## Arithmetic Operators\n", 225 | "\n", 226 | "There are three kinds of operator notation: prefix (+ 3 5), infix (3 + 5), and postfix (3 5 +). The `__add__` and `__mul__` methods create and return a new instance of Vector, and do not modify either operand: self and other are merely read. This is the expected behavior of infix operators: to create new objects and not touch their operands.\n", 227 | "\n", 228 | "## Boolean value of a custom type\n", 229 | "\n", 230 | "To determine whether a value x is truthy or falsy, Python applies bool(x), which always returns True or False. bool(x) calls `x.__bool__()` if it is defined; otherwise it falls back to `x.__len__()` and treats a zero length as False. If neither is implemented, instances are considered truthy.\n", 231 | "\n", 232 | "## Why len is not a method\n", 233 | "\n", 234 | "len is not called as a method because it gets special treatment as part of the Python Data Model, just like abs. But thanks to the special method `__len__` you can also make len work with your own custom objects. This is a fair compromise between the need for efficient built-in objects and the consistency of the language. Also from the Zen of Python: “Special cases aren’t special enough to break the rules.”\n", 235 | "\n", 236 | "# Summary\n", 237 | "\n", 238 | "By implementing special methods, custom objects can behave like the built-in types, enabling Pythonic 'expressive coding'. Emulating sequences, as shown with the FrenchDeck example, is one of the most widely used applications of the special methods.\n", 239 | "\n", 240 | "A basic requirement for a Python object is to provide usable string representations of itself, one used for debugging and logging, another for presentation to end users. That is why the special methods `__repr__` and `__str__` exist in the Data Model." 
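
A minimal sketch of the two string representations summarized above. The `Temperature` class is hypothetical and not part of the chapter's examples; it only illustrates when `__repr__` and `__str__` are used:

```python
class Temperature:
    """Hypothetical class used only to illustrate __repr__ and __str__."""

    def __init__(self, degrees):
        self.degrees = degrees

    def __repr__(self):
        # Unambiguous form for debugging and logging; %r keeps the
        # difference between 21.5 and '21.5' visible.
        return 'Temperature(%r)' % self.degrees

    def __str__(self):
        # Friendlier form meant for end users.
        return '%s degrees' % self.degrees


t = Temperature(21.5)
print(repr(t))  # Temperature(21.5)
print(str(t))   # 21.5 degrees
print([t])      # containers show their items with repr()
```

If a class defines only `__repr__`, `str()` falls back to it, which is one reason to implement `__repr__` first.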
241 | ] 242 | } 243 | ], 244 | "metadata": { 245 | "kernelspec": { 246 | "display_name": "Python 3", 247 | "language": "python", 248 | "name": "python3" 249 | }, 250 | "language_info": { 251 | "codemirror_mode": { 252 | "name": "ipython", 253 | "version": 3 254 | }, 255 | "file_extension": ".py", 256 | "mimetype": "text/x-python", 257 | "name": "python", 258 | "nbconvert_exporter": "python", 259 | "pygments_lexer": "ipython3", 260 | "version": "3.6.0" 261 | } 262 | }, 263 | "nbformat": 4, 264 | "nbformat_minor": 2 265 | } 266 | -------------------------------------------------------------------------------- /Chapter 02 - An Array of Sequences.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# An array of sequences\n", 8 | "\n", 9 | "ABC introduced many ideas we now consider “Pythonic”:\n", 10 | "- generic operations on sequences\n", 11 | "- built-in tuple and mapping types\n", 12 | "- structure by indentation\n", 13 | "- strong typing without variable declarations\n", 14 | "\n", 15 | "Objects whose value can change are said to be mutable (M); objects whose value is unchangeable once they are created are called immutable (IM). Mutable objects have methods such as pop, append, extend, remove, __setitem__, __delitem__. Immutable objects have __getitem__, __contains__, index and count.\n", 16 | "\n", 17 | "The standard library offers a rich selection of sequence types implemented in C. \n", 18 | "\n", 19 | "- Container sequences hold references to the objects they contain, which may be of any type\n", 20 | " - list (M)\n", 21 | " - tuple (IM)\n", 22 | " - collections.deque (M)\n", 23 | "- Flat sequences physically store the value of each item within its own memory space, and not as distinct objects. 
Flat sequences are more compact, but they are limited to holding primitive values like characters, bytes and numbers.\n", 24 | " - str (IM)\n", 25 | " - bytes (IM)\n", 26 | " - bytearray (M)\n", 27 | " - memoryview (M)\n", 28 | " - array.array (M)\n", 29 | " \n", 30 | "## List comprehensions and generator expressions\n", 31 | "\n", 32 | "A list comprehension is an elegant way to define and create lists in Python, borrowing the set-builder notation used by mathematicians.\n", 33 | "\n", 34 | "### For loop vs listcomps\n", 35 | "\n", 36 | "A for loop may be used to do lots of different things: scanning a sequence to count or pick items, computing aggregates (sums, averages), or any number of other processing tasks. The for loop in the cell below builds up a list. In contrast, a listcomp is meant to do one thing only: to build a new list." 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 26, 42 | "metadata": { 43 | "collapsed": false 44 | }, 45 | "outputs": [ 46 | { 47 | "name": "stdout", 48 | "output_type": "stream", 49 | "text": [ 50 | "[5, 5, 3, 5, 4, 4, 3]\n", 51 | "[5, 5, 3, 5, 4, 4, 3]\n" 52 | ] 53 | } 54 | ], 55 | "source": [ 56 | "sentence = \"the quick brown fox jumps over the lazy dog\"\n", 57 | "words = sentence.split()\n", 58 | "word_lengths = []\n", 59 | "\n", 60 | "#using a for loop\n", 61 | "for word in words:\n", 62 | " if word != \"the\":\n", 63 | " word_lengths.append(len(word))\n", 64 | "#note: the leading 3 (for 'the') is missing from the list\n", 65 | "print(word_lengths)\n", 66 | "\n", 67 | "#using list comprehension\n", 68 | "word_lengths_comp = []\n", 69 | "word_lengths_comp = [len(word) for word in words if word != \"the\"]\n", 70 | "print(word_lengths_comp)" 71 | ] 72 | }, 73 | { 74 | "cell_type": "markdown", 75 | "metadata": {}, 76 | "source": [ 77 | "### Listcomps versus map and filter\n", 78 | "\n", 79 | "Use map() or filter() for expressions that are too long or complicated to express with a list comprehension. 
If you already have a function defined, it is often reasonable to use map, though some consider it 'unpythonic'. " 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "execution_count": 27, 85 | "metadata": { 86 | "collapsed": false 87 | }, 88 | "outputs": [ 89 | { 90 | "name": "stdout", 91 | "output_type": "stream", 92 | "text": [ 93 | "[162, 163, 165, 8364, 164]\n", 94 | "[162, 163, 165, 8364, 164]\n" 95 | ] 96 | } 97 | ], 98 | "source": [ 99 | "symbols = '$¢£¥€¤'\n", 100 | "beyond_ascii_listcomp = [ord(s) for s in symbols if ord(s) > 127]\n", 101 | "print(beyond_ascii_listcomp)\n", 102 | "beyond_ascii_map = list(filter(lambda c: c > 127, map(ord, symbols)))\n", 103 | "print(beyond_ascii_map)" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "### Cartesian products\n", 111 | "\n", 112 | "Listcomps can generate lists from the cartesian product of two or more iterables. The items that make up the cartesian product are tuples made from items from every input iterable. The resulting list has a length equal to the lengths of the input iterables multiplied." 
113 | ] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "colors = ['black', 'white']\n", 120 | "\n", 121 | "sizes = ['S', 'M', 'L']\n", 122 | "\n", 123 | "tshirts = [(color, size) for color in colors for size in sizes] \n", 124 | "tshirts\n", 125 | "for color in colors:\n", 126 | " for size in sizes:\n", 127 | " print((color, size))\n", 128 | "tshirts = [(color, size) for size in sizes for color in colors]\n", 129 | "print(tshirts)" 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "### Generator expressions\n", 137 | "\n", 138 | "To initialize tuples, arrays and other types of sequences, you could also start from a listcomp but a genexp saves memory because it yields items one by one using the iterator protocol instead of building a whole list just to feed another constructor. If two lists used in the cartesian product had a thousand items each, using a generator expression would save the expense of building a list with a million items just to feed the for loop." 
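
As a rough sketch of the cartesian-product case described above, using a small hypothetical helper `tshirt_labels` (not from the original notebook):

```python
def tshirt_labels(colors, sizes):
    # Returns a generator: combinations are produced lazily, one per
    # iteration, instead of being stored in a list first.
    return ('%s %s' % (color, size) for color in colors for size in sizes)


for label in tshirt_labels(['black', 'white'], ['S', 'M', 'L']):
    print(label)
```

With two inputs of a thousand items each, the loop would still see a million labels, but only one would exist in memory at a time.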
139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 28, 144 | "metadata": { 145 | "collapsed": false 146 | }, 147 | "outputs": [ 148 | { 149 | "name": "stdout", 150 | "output_type": "stream", 151 | "text": [ 152 | "(36, 162, 163, 165, 8364, 164)\n" 153 | ] 154 | }, 155 | { 156 | "data": { 157 | "text/plain": [ 158 | "array('I', [36, 162, 163, 165, 8364, 164])" 159 | ] 160 | }, 161 | "execution_count": 28, 162 | "metadata": {}, 163 | "output_type": "execute_result" 164 | } 165 | ], 166 | "source": [ 167 | "symbols = '$¢£¥€¤'\n", 168 | "print(tuple(ord(symbol) for symbol in symbols))\n", 169 | "import array\n", 170 | "array.array('I', (ord(symbol) for symbol in symbols))" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": {}, 176 | "source": [ 177 | "## Tuples are not just immutable lists\n", 178 | "\n", 179 | "Tuples do double-duty: they can be used as immutable lists and also as records with no field names.\n", 180 | "\n", 181 | "### Tuples as records" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": 29, 187 | "metadata": { 188 | "collapsed": false 189 | }, 190 | "outputs": [ 191 | { 192 | "name": "stdout", 193 | "output_type": "stream", 194 | "text": [ 195 | "BRA/CE342567\n", 196 | "ESP/XDA205856\n", 197 | "USA/31195855\n", 198 | "USA\n", 199 | "BRA\n", 200 | "ESP\n", 201 | "33.9425\n", 202 | "-118.408056\n", 203 | "idrsa.pub\n" 204 | ] 205 | } 206 | ], 207 | "source": [ 208 | "city, year, pop, chg, area = ('Tokyo', 2003, 32450, 0.66, 8014)\n", 209 | "traveler_ids = [('USA', '31195855'), ('BRA', 'CE342567'), ('ESP', 'XDA205856')]\n", 210 | "for passport in sorted(traveler_ids):\n", 211 | " print('%s/%s' % passport)\n", 212 | "for country, _ in traveler_ids:\n", 213 | " print(country)\n", 214 | "lax_coordinates = (33.9425, -118.408056)\n", 215 | "# tuple unpacking\n", 216 | "latitude, longitude = lax_coordinates \n", 217 | "print(latitude)\n", 218 | "print(longitude)\n", 219 | "import 
os\n", 220 | "#os.path.split() function builds a tuple (path, last_part) from a filesystem path\n", 221 | "_, filename = os.path.split('/home/luciano/.ssh/idrsa.pub')\n", 222 | "print(filename)" 223 | ] 224 | }, 225 | { 226 | "cell_type": "markdown", 227 | "metadata": {}, 228 | "source": [ 229 | "### Using * to grab excess items\n" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 30, 235 | "metadata": { 236 | "collapsed": false 237 | }, 238 | "outputs": [ 239 | { 240 | "name": "stdout", 241 | "output_type": "stream", 242 | "text": [ 243 | "0 1 [2, 3, 4]\n", 244 | "0 1 [2]\n", 245 | "0 1 []\n" 246 | ] 247 | } 248 | ], 249 | "source": [ 250 | "a, b, *rest = range(5)\n", 251 | "print(a, b, rest)\n", 252 | "a, b, *rest = range(3)\n", 253 | "print(a, b, rest)\n", 254 | "a, b, *rest = range(2)\n", 255 | "print(a, b, rest)" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "### Nested tuple unpacking\n", 263 | "\n", 264 | "The tuple to receive an expression to unpack can have nested tuples, like (a, b, (c, d)) and Python will do the right thing if the expression matches the nesting structure." 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": 31, 270 | "metadata": { 271 | "collapsed": false 272 | }, 273 | "outputs": [ 274 | { 275 | "name": "stdout", 276 | "output_type": "stream", 277 | "text": [ 278 | " | lat. | long. 
\n", 279 | "Mexico City | 19.4333 | -99.1333\n", 280 | "New York-Newark | 40.8086 | -74.0204\n", 281 | "Sao Paulo | -23.5478 | -46.6358\n" 282 | ] 283 | } 284 | ], 285 | "source": [ 286 | "metro_areas = [('Tokyo','JP',36.933,(35.689722,139.691667)), \n", 287 | " ('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)), \n", 288 | " ('Mexico City', 'MX', 20.142, (19.433333, -99.133333)), \n", 289 | " ('New York-Newark', 'US', 20.104, (40.808611, -74.020386)), \n", 290 | " ('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)),]\n", 291 | "#defining column widths\n", 292 | "print('{:15} | {:^9} | {:^9}'.format('', 'lat.', 'long.'))\n", 293 | "#setting the number of digits\n", 294 | "fmt = '{:15} | {:9.4f} | {:9.4f}'\n", 295 | "for name, cc, pop, (latitude, longitude) in metro_areas: #unpack the nested coordinates tuple\n", 296 | " if longitude <= 0: #keep only the Western hemisphere\n", 297 | " print(fmt.format(name, latitude, longitude))" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "### Named tuples" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": 35, 310 | "metadata": { 311 | "collapsed": false 312 | }, 313 | "outputs": [ 314 | { 315 | "name": "stdout", 316 | "output_type": "stream", 317 | "text": [ 318 | "City(name='Tokyo', country='JP', population=36.933, coordinates=(35.689722, 139.691667))\n", 319 | "(35.689722, 139.691667)\n", 320 | "JP\n", 321 | "name: Delhi NCR\n", 322 | "country: IN\n", 323 | "population: 21.935\n", 324 | "coordinates: LatLong(lat=28.613889, long=77.208889)\n" 325 | ] 326 | } 327 | ], 328 | "source": [ 329 | "from collections import namedtuple\n", 330 | "City = namedtuple('City', 'name country population coordinates')\n", 331 | "tokyo = City('Tokyo', 'JP', 36.933, (35.689722, 139.691667))\n", 332 | "print(tokyo)\n", 333 | "print(tokyo.coordinates)\n", 334 | "print(tokyo[1])\n", 335 | "\n", 336 | "#A named tuple type has a few attributes in addition to those inherited from tuple. 
\n", 337 | "#the _fields class attribute, \n", 338 | "#the class method _make(iterable)\n", 339 | "#the _asdict() instance method.\n", 340 | "\n", 341 | "City._fields #_fields\n", 342 | "LatLong = namedtuple('LatLong', 'lat long')\n", 343 | "delhi_data = ('Delhi NCR', 'IN', 21.935, LatLong(28.613889, 77.208889)) \n", 344 | "delhi = City._make(delhi_data) #_make()\n", 345 | "delhi._asdict() #_asdict()\n", 346 | "for key, value in delhi._asdict().items():\n", 347 | " print(key + ':', value)" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": {}, 353 | "source": [ 354 | "It is also possible to use tuples as immutable lists\n", 355 | "\n", 356 | "## Slicing" 357 | ] 358 | }, 359 | { 360 | "cell_type": "code", 361 | "execution_count": 39, 362 | "metadata": { 363 | "collapsed": false 364 | }, 365 | "outputs": [ 366 | { 367 | "name": "stdout", 368 | "output_type": "stream", 369 | "text": [ 370 | "[10, 20]\n", 371 | "[30, 40, 50, 60]\n", 372 | "[10, 20, 30]\n", 373 | "[40, 50, 60]\n", 374 | "bye\n", 375 | "elcycib\n", 376 | "eccb\n" 377 | ] 378 | } 379 | ], 380 | "source": [ 381 | "l=[10,20,30,40,50,60]\n", 382 | "print(l[:2])\n", 383 | "print(l[2:])\n", 384 | "print(l[:3])\n", 385 | "print(l[3:])\n", 386 | "s = 'bicycle'\n", 387 | "#slice every 3rd letter\n", 388 | "print(s[::3])\n", 389 | "#slice every -1 letter (i.e. from the end backwards - reversing the word)\n", 390 | "print(s[::-1])\n", 391 | "#slice every other letter from end to beginning\n", 392 | "print(s[::-2])" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": {}, 398 | "source": [ 399 | "Slicing flat-file data:" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 42, 405 | "metadata": { 406 | "collapsed": false 407 | }, 408 | "outputs": [ 409 | { 410 | "name": "stdout", 411 | "output_type": "stream", 412 | "text": [ 413 | " $17.50 Pimoroni PiBrella \n", 414 | " $4.95 6mm Tactile Switch x20 \n", 415 | " $28.00 Panavise Jr. 
- PV-201 \n", 416 | " $34.95 PiTFT Mini Kit 320x240 \n", 417 | " \n" 418 | ] 419 | } 420 | ], 421 | "source": [ 422 | "invoice =\"\"\"\n", 423 | "0.....6.................................40........52...55........\n", 424 | "1909 Pimoroni PiBrella $17.50 3 $52.50\n", 425 | "1489 6mm Tactile Switch x20 $4.95 2 $9.90\n", 426 | "1510 Panavise Jr. - PV-201 $28.00 1 $28.00\n", 427 | "1601 PiTFT Mini Kit 320x240 $34.95 1 $34.95\n", 428 | "\"\"\"\n", 429 | "SKU = slice(0, 6)\n", 430 | "DESCRIPTION = slice(6, 40)\n", 431 | "UNIT_PRICE = slice(40, 52)\n", 432 | "QUANTITY = slice(52, 55)\n", 433 | "ITEM_TOTAL = slice(55, None)\n", 434 | "line_items = invoice.split('\\n')[2:]\n", 435 | "for item in line_items:\n", 436 | " print(item[UNIT_PRICE], item[DESCRIPTION])" 437 | ] 438 | }, 439 | { 440 | "cell_type": "markdown", 441 | "metadata": {}, 442 | "source": [ 443 | "### Multi-dimensional slicing and ellipsis\n", 444 | "\n", 445 | "The [] operator can also take multiple indexes or slices separated by commas.\n", 446 | "\n", 447 | "The ellipsis — written with three full stops ... and not Unicode U+2026 — is recognized as a token by the Python parser." 
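
To see what `[]` actually passes along, here is a small sketch with a hypothetical `Grid` class (an assumption for illustration, not from the notebook) whose `__getitem__` simply returns its argument:

```python
class Grid:
    """Hypothetical class that only reports what [] hands to __getitem__."""

    def __getitem__(self, index):
        return index


g = Grid()
print(g[1])            # 1
print(g[1:3])          # slice(1, 3, None)
print(g[1, 2:4])       # (1, slice(2, 4, None))
print(g[0, ...])       # (0, Ellipsis)
print(... is Ellipsis) # True
```

Multiple indexes arrive as a single tuple, and `...` is just an alias for the `Ellipsis` singleton. The built-in sequence types accept only a single index or slice, so this form is mainly useful for user-defined types and external libraries.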
448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": 43, 453 | "metadata": { 454 | "collapsed": false 455 | }, 456 | "outputs": [ 457 | { 458 | "name": "stdout", 459 | "output_type": "stream", 460 | "text": [ 461 | "[0, 1, 20, 30, 5, 6, 7, 8, 9]\n", 462 | "[0, 1, 20, 30, 5, 8, 9]\n", 463 | "[0, 1, 20, 11, 5, 22, 9]\n", 464 | "[1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]\n", 465 | "abcdabcdabcdabcdabcd\n" 466 | ] 467 | } 468 | ], 469 | "source": [ 470 | "l = list(range(10))\n", 471 | "l[2:5] = [20, 30]\n", 472 | "print(l)\n", 473 | "del l[5:7]\n", 474 | "print(l)\n", 475 | "l[3::2] = [11, 22]\n", 476 | "print(l)\n", 477 | "#Using + and * with sequences\n", 478 | "l=[1,2,3]\n", 479 | "print(l*5)\n", 480 | "print(5 * 'abcd')" 481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "metadata": {}, 486 | "source": [ 487 | "### Building lists of lists\n", 488 | "Sometimes we need to initialize a list with a certain number of nested lists, for example, to distribute students in a list of teams or to represent squares on a game board. 
The best way of doing so is with a list comprehension, like this:" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 49, 494 | "metadata": { 495 | "collapsed": false 496 | }, 497 | "outputs": [ 498 | { 499 | "name": "stdout", 500 | "output_type": "stream", 501 | "text": [ 502 | "[['_', '_', '_'], ['_', '_', 'X'], ['_', '_', '_']]\n", 503 | "[['_', '_', 'O'], ['_', '_', 'O'], ['_', '_', 'O']]\n", 504 | "[['_', '_', '_'], ['_', '_', '_'], ['X', '_', '_']]\n" 505 | ] 506 | } 507 | ], 508 | "source": [ 509 | "#using listcomp\n", 510 | "board = [['_'] * 3 for i in range(3)]\n", 511 | "board[1][2] = 'X'\n", 512 | "print(board)\n", 513 | "# Board that doesn't work\n", 514 | "weird_board = [['_'] * 3] * 3\n", 515 | "weird_board[1][2] = 'O'\n", 516 | "print(weird_board)\n", 517 | "# Using for\n", 518 | "board = []\n", 519 | "for i in range(3):\n", 520 | " row=['_']*3 #\n", 521 | " board.append(row)\n", 522 | "board[2][0] = 'X'\n", 523 | "print(board)" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "## Augmented assignment with sequences\n", 531 | "\n", 532 | "The augmented assignment operators += and *= behave very differently depending on the first operand. For example, the special method that makes += work is __iadd__ (for “in-place addition”). However, if __iadd__ is not implemented, Python falls back to calling __add__. 
Consider this simple expression:" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": null, 538 | "metadata": { 539 | "collapsed": true 540 | }, 541 | "outputs": [], 542 | "source": [ 543 | "a+=b\n", 544 | "#for mutable equivalent to:\n", 545 | "a.extend(b) # original object modified\n", 546 | "#for immutable equivalent to:\n", 547 | "a = a + b # new object created" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 51, 553 | "metadata": { 554 | "collapsed": false 555 | }, 556 | "outputs": [ 557 | { 558 | "name": "stdout", 559 | "output_type": "stream", 560 | "text": [ 561 | "[1, 2, 3]\n", 562 | "4409327496\n", 563 | "[1, 2, 3, 1, 2, 3]\n", 564 | "4409327496\n", 565 | "4408207184\n", 566 | "4408247304\n" 567 | ] 568 | } 569 | ], 570 | "source": [ 571 | "#list i.e. mutable\n", 572 | "l=[1,2,3]\n", 573 | "print(l)\n", 574 | "#original ID\n", 575 | "print(id(l))\n", 576 | "l*=2\n", 577 | "# original is modified - i.e. inplace operand\n", 578 | "print(l)\n", 579 | "print(id(l))\n", 580 | "#using a tuple - i.e. immutable\n", 581 | "t=(1,2,3)\n", 582 | "#original ID\n", 583 | "print(id(t))\n", 584 | "#operating\n", 585 | "t *=2\n", 586 | "#note new ID\n", 587 | "print(id(t))" 588 | ] 589 | }, 590 | { 591 | "cell_type": "markdown", 592 | "metadata": {}, 593 | "source": [ 594 | "## list.sort and the sorted built-in function\n", 595 | "\n", 596 | "The list.sort method sorts a list in-place, that is, without making a copy. It returns None to remind us that it changes the target object, and does not create a new list. In contrast, the built-in function sorted creates a new list and returns it." 
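
A short sketch of that difference (the variable names are only illustrative):

```python
numbers = [3, 1, 2]

result = numbers.sort()    # sorts the list in place
print(result)              # None
print(numbers)             # [1, 2, 3]

letters = ('b', 'c', 'a')  # sorted() accepts any iterable, even an immutable one
new_list = sorted(letters)
print(new_list)            # ['a', 'b', 'c']
print(letters)             # ('b', 'c', 'a'), unchanged
```

Because `list.sort` returns None, an assignment such as `numbers = numbers.sort()` would replace the list with None.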
597 | ] 598 | }, 599 | { 600 | "cell_type": "code", 601 | "execution_count": 54, 602 | "metadata": { 603 | "collapsed": false 604 | }, 605 | "outputs": [ 606 | { 607 | "name": "stdout", 608 | "output_type": "stream", 609 | "text": [ 610 | "['apple', 'banana', 'grape', 'raspberry']\n", 611 | "['grape', 'raspberry', 'apple', 'banana']\n", 612 | "['raspberry', 'grape', 'banana', 'apple']\n", 613 | "['grape', 'apple', 'banana', 'raspberry']\n", 614 | "['raspberry', 'banana', 'grape', 'apple']\n", 615 | "['grape', 'raspberry', 'apple', 'banana']\n", 616 | "['apple', 'banana', 'grape', 'raspberry']\n" 617 | ] 618 | } 619 | ], 620 | "source": [ 621 | "fruits = ['grape', 'raspberry', 'apple', 'banana']\n", 622 | "#alphabetised new list\n", 623 | "print(sorted(fruits))\n", 624 | "#original\n", 625 | "print(fruits)\n", 626 | "#backwards new list\n", 627 | "print(sorted(fruits, reverse=True))\n", 628 | "#sorted by length new list\n", 629 | "print(sorted(fruits, key=len))\n", 630 | "#backwards by length new list\n", 631 | "print(sorted(fruits, key=len, reverse=True))\n", 632 | "#original again\n", 633 | "print(fruits)\n", 634 | "# sorting in place\n", 635 | "fruits.sort()\n", 636 | "#original is now sorted\n", 637 | "print(fruits)" 638 | ] 639 | }, 640 | { 641 | "cell_type": "markdown", 642 | "metadata": {}, 643 | "source": [ 644 | "### Searching with bisect\n", 645 | "\n" 646 | ] 647 | }, 648 | { 649 | "cell_type": "code", 650 | "execution_count": 61, 651 | "metadata": { 652 | "collapsed": false 653 | }, 654 | "outputs": [ 655 | { 656 | "name": "stdout", 657 | "output_type": "stream", 658 | "text": [ 659 | "DEMO: bisect\n", 660 | "haystack -> 1 4 5 6 8 12 15 20 21 23 23 26 29 30\n", 661 | "31 @ 14 | | | | | | | | | | | | | |31\n", 662 | "30 @ 14 | | | | | | | | | | | | | |30\n", 663 | "29 @ 13 | | | | | | | | | | | | |29\n", 664 | "23 @ 11 | | | | | | | | | | |23\n", 665 | "22 @ 9 | | | | | | | | |22\n", 666 | "10 @ 5 | | | | |10\n", 667 | " 8 @ 5 | | | | |8 \n", 668 | " 5 @ 
3 | | |5 \n", 669 | " 2 @ 1 |2 \n", 670 | " 1 @ 1 |1 \n", 671 | " 0 @ 0 0 \n" 672 | ] 673 | }, 674 | { 675 | "data": { 676 | "text/plain": [ 677 | "['F', 'A', 'C', 'C', 'B', 'A', 'A']" 678 | ] 679 | }, 680 | "execution_count": 61, 681 | "metadata": {}, 682 | "output_type": "execute_result" 683 | } 684 | ], 685 | "source": [ 686 | "import bisect\n", 687 | "import sys\n", 688 | "\n", 689 | "HAYSTACK = [1, 4, 5, 6, 8, 12, 15, 20, 21, 23, 23, 26, 29, 30]\n", 690 | "NEEDLES = [0, 1, 2, 5, 8, 10, 22, 23, 29, 30, 31]\n", 691 | "\n", 692 | "ROW_FMT = '{0:2d} @ {1:2d} {2}{0:<2d}'\n", 693 | "\n", 694 | "def demo(bisect_fn):\n", 695 | " for needle in reversed(NEEDLES):\n", 696 | " #Use the chosen bisect function to get the insertion point.\n", 697 | " position = bisect_fn(HAYSTACK, needle)\n", 698 | " #Build a pattern of vertical bars proportional to the offset.\n", 699 | " offset = position * ' |' \n", 700 | " #Print formatted row showing needle and insertion point.\n", 701 | " print(ROW_FMT.format(needle, position, offset))\n", 702 | " \n", 703 | "if __name__ == '__main__':\n", 704 | " \n", 705 | " #Choose the bisect function to use according to the last command line argument.\n", 706 | " if sys.argv[-1] == 'left':\n", 707 | " bisect_fn = bisect.bisect_left \n", 708 | " else:\n", 709 | " bisect_fn = bisect.bisect\n", 710 | " \n", 711 | " #Print header with name of function selected.\n", 712 | " print('DEMO:', bisect_fn.__name__)\n", 713 | " print('haystack ->', ' '.join('%2d' % n for n in HAYSTACK)) \n", 714 | " demo(bisect_fn)\n", 715 | "\n", 716 | "# Given a test score, grade returns the corresponding letter grade.\n", 717 | "def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'):\n", 718 | " i = bisect.bisect(breakpoints, score)\n", 719 | " return grades[i]\n", 720 | "[grade(score) for score in [33, 99, 77, 70, 89, 90, 100]]" 721 | ] 722 | }, 723 | { 724 | "cell_type": "code", 725 | "execution_count": 65, 726 | "metadata": { 727 | "collapsed": false, 728 | 
"scrolled": true 729 | }, 730 | "outputs": [ 731 | { 732 | "name": "stdout", 733 | "output_type": "stream", 734 | "text": [ 735 | "20 -> [20]\n", 736 | " 1 -> [1, 20]\n", 737 | "13 -> [1, 13, 20]\n", 738 | "16 -> [1, 13, 16, 20]\n", 739 | "14 -> [1, 13, 14, 16, 20]\n", 740 | " 5 -> [1, 5, 13, 14, 16, 20]\n", 741 | "21 -> [1, 5, 13, 14, 16, 20, 21]\n", 742 | " 1 -> [1, 1, 5, 13, 14, 16, 20, 21]\n", 743 | " 5 -> [1, 1, 5, 5, 13, 14, 16, 20, 21]\n", 744 | "27 -> [1, 1, 5, 5, 13, 14, 16, 20, 21, 27]\n", 745 | " 9 -> [1, 1, 5, 5, 9, 13, 14, 16, 20, 21, 27]\n", 746 | "16 -> [1, 1, 5, 5, 9, 13, 14, 16, 16, 20, 21, 27]\n", 747 | "11 -> [1, 1, 5, 5, 9, 11, 13, 14, 16, 16, 20, 21, 27]\n", 748 | "27 -> [1, 1, 5, 5, 9, 11, 13, 14, 16, 16, 20, 21, 27, 27]\n" 749 | ] 750 | } 751 | ], 752 | "source": [ 753 | "import bisect\n", 754 | "import random\n", 755 | "SIZE=14\n", 756 | "\n", 757 | "random.seed(1729)\n", 758 | "my_list = []\n", 759 | "for i in range(SIZE):\n", 760 | " #create a new number from range SIZE * 2\n", 761 | " new_item = random.randrange(SIZE*2)\n", 762 | " # insert into list, in correct order\n", 763 | " bisect.insort(my_list, new_item)\n", 764 | " print('%2d ->' % new_item, my_list)" 765 | ] 766 | }, 767 | { 768 | "cell_type": "markdown", 769 | "metadata": {}, 770 | "source": [ 771 | "## When a list is not the answer\n", 772 | "The list type is flexible and easy to use, but depending on specific requirements there are better options. \n", 773 | "\n", 774 | "For example, if you need to store 10 million of floating point values an **array** is much more efficient, because an array does not actually hold full-fledged float objects, but only the packed bytes representing their machine values — just like an array in the C language. 
\n", 775 | "\n", 776 | "On the other hand, if you are constantly adding and removing items from the ends of a list as a FIFO or LIFO data structure, a **deque** (double-ended queue) works faster.\n", 777 | "\n", 778 | "### Arrays\n", 779 | "\n", 780 | "If all you want to put in the list are numbers, an array.array is more efficient than a list: it supports all mutable sequence operations (including .pop, .insert and .extend), and additional methods for fast loading and saving such as .frombytes and .tofile.\n", 781 | "\n", 782 | "Saving with array.tofile is about 7 times faster than writing one float per line in a text file. In addition, the size of the binary file with 10 million doubles is 80,000,000 bytes (8 bytes per double, zero overhead), while the text file has 181,515,739 bytes, for the same data." 783 | ] 784 | }, 785 | { 786 | "cell_type": "code", 787 | "execution_count": 66, 788 | "metadata": { 789 | "collapsed": false 790 | }, 791 | "outputs": [ 792 | { 793 | "name": "stdout", 794 | "output_type": "stream", 795 | "text": [ 796 | "0.5243459839714878\n", 797 | "0.5243459839714878\n", 798 | "True\n" 799 | ] 800 | } 801 | ], 802 | "source": [ 803 | "from array import array\n", 804 | "from random import random\n", 805 | "# create an array of double-precision floats (typecode 'd') from any iterable\n", 806 | "# object, in this case a generator expression;\n", 807 | "floats = array('d', (random() for i in range(10**7)))\n", 808 | "#inspect the last number in the array;\n", 809 | "print(floats[-1])\n", 810 | "fp = open('floats.bin', 'wb')\n", 811 | "#save the array to a binary file;\n", 812 | "floats.tofile(fp)\n", 813 | "fp.close()\n", 814 | "#create an empty array of doubles;\n", 815 | "floats2 = array('d')\n", 816 | "fp = open('floats.bin', 'rb')\n", 817 | "#read 10 million numbers from the binary file;\n", 818 | "floats2.fromfile(fp, 10**7)\n", 819 | "fp.close()\n", 820 | "#inspect the last number in the array;\n", 821 | "print(floats2[-1])\n", 822 | "#check floats match\n", 823 | "print(floats2 == floats)" 823 | ] 
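The size figures quoted above (8 bytes per double, zero overhead) can be checked in memory without writing a file. This is a minimal sketch, not part of the original notebook: it uses a much smaller array than the 10 million items in the cell, and the names `packed` and `restored` are illustrative.

```python
from array import array

# A small array of doubles (typecode 'd'); the notebook cell uses 10 million items.
floats = array('d', (i / 10 for i in range(1000)))

# itemsize is the number of bytes used per item (8 for 'd' on common platforms).
print(floats.itemsize)

# tobytes() returns the packed machine values with no per-item overhead.
packed = floats.tobytes()
print(len(packed) == len(floats) * floats.itemsize)

# frombytes() rebuilds an equal array from the packed bytes.
restored = array('d')
restored.frombytes(packed)
print(restored == floats)
```

The same packed bytes are what `tofile` writes, which is why the binary file size is simply the item count times the item size.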
 824 | }, 825 | { 826 | "cell_type": "markdown", 827 | "metadata": {}, 828 | "source": [ 829 | "## Memory views\n", 830 | "\n", 831 | "The built-in memoryview class is a shared-memory sequence type that lets you handle slices of arrays without copying bytes. A memoryview is essentially a generalized NumPy array structure in Python itself (without the math). It allows you to share memory between data structures (things like PIL images, SQLite databases, NumPy arrays, etc.) without first copying. This is very important for large data sets." 832 | ] 833 | }, 834 | { 835 | "cell_type": "code", 836 | "execution_count": 76, 837 | "metadata": { 838 | "collapsed": false 839 | }, 840 | "outputs": [ 841 | { 842 | "name": "stdout", 843 | "output_type": "stream", 844 | "text": [ 845 | "5\n", 846 | "-2\n", 847 | "[254, 255, 255, 255, 0, 0, 1, 0, 2, 0]\n", 848 | "0\n", 849 | "\n", 850 | "array('h', [-2, -1, 1024, 1, 2])\n" 851 | ] 852 | } 853 | ], 854 | "source": [ 855 | "from array import array\n", 856 | "\n", 857 | "numbers = array('h', [-2, -1, 0, 1, 2])\n", 858 | "memv = memoryview(numbers)\n", 859 | "print(len(memv))\n", 860 | "print(memv[0])\n", 861 | "memv_oct = memv.cast('B')\n", 862 | "print(memv_oct.tolist())\n", 863 | "print(memv_oct[5])\n", 864 | "memv_oct[5] = 4\n", 865 | "print(numbers)" 866 | ] 867 | }, 868 | { 869 | "cell_type": "code", 870 | "execution_count": 77, 871 | "metadata": { 872 | "collapsed": false 873 | }, 874 | "outputs": [ 875 | { 876 | "name": "stdout", 877 | "output_type": "stream", 878 | "text": [ 879 | "[ 0 1 2 3 4 5 6 7 8 9 10 11]\n", 880 | "(12,)\n", 881 | "[[ 0 1 2 3]\n", 882 | " [ 4 5 6 7]\n", 883 | " [ 8 9 10 11]]\n", 884 | "[ 8 9 10 11]\n", 885 | "9\n", 886 | "[1 5 9]\n", 887 | "[[ 0 4 8]\n", 888 | " [ 1 5 9]\n", 889 | " [ 2 6 10]\n", 890 | " [ 3 7 11]]\n" 891 | ] 892 | } 893 | ], 894 | "source": [ 895 | "import numpy\n", 896 | "a = numpy.arange(12)\n", 897 | "print(a) \n", 898 | "type(a)\n", 899 | "print(a.shape)\n", 900 | "a.shape = 3, 
4\n", 901 | "print(a)\n", 902 | "print(a[2])\n", 903 | "print(a[2,1])\n", 904 | "print(a[:, 1])\n", 905 | "print(a.transpose())" 906 | ] 907 | }, 908 | { 909 | "cell_type": "markdown", 910 | "metadata": {}, 911 | "source": [ 912 | "## Deques and other queues\n", 913 | "\n", 914 | "The .append and .pop methods make a list usable as a stack or a queue (if you use .append and .pop(0), you get FIFO behavior). But inserting and removing from the left of a list (the 0-index end) is costly because every remaining item must be shifted.\n", 915 | "\n", 916 | "The class collections.deque is a thread-safe double-ended queue designed for fast inserting and removing from both ends. It is also the way to go if you need to keep a list of “last seen items” or something like that, because a deque can be bounded, i.e. created with a maximum length; once it is full, appending a new item discards an item from the opposite end." 917 | ] 918 | }, 919 | { 920 | "cell_type": "code", 921 | "execution_count": 78, 922 | "metadata": { 923 | "collapsed": false 924 | }, 925 | "outputs": [ 926 | { 927 | "name": "stdout", 928 | "output_type": "stream", 929 | "text": [ 930 | "deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)\n", 931 | "deque([7, 8, 9, 0, 1, 2, 3, 4, 5, 6], maxlen=10)\n", 932 | "deque([1, 2, 3, 4, 5, 6, 7, 8, 9, 0], maxlen=10)\n", 933 | "deque([-1, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=10)\n", 934 | "deque([3, 4, 5, 6, 7, 8, 9, 11, 22, 33], maxlen=10)\n", 935 | "deque([40, 30, 20, 10, 3, 4, 5, 6, 7, 8], maxlen=10)\n" 936 | ] 937 | } 938 | ], 939 | "source": [ 940 | "from collections import deque\n", 941 | "dq = deque(range(10), maxlen=10)\n", 942 | "print(dq)\n", 943 | "# rotate right by 3: the last 3 items move to the front\n", 944 | "dq.rotate(3)\n", 945 | "print(dq)\n", 946 | "# rotate left by 4: the first 4 items move to the end\n", 947 | "dq.rotate(-4)\n", 948 | "print(dq)\n", 949 | "# add -1 on the left; the deque is full, so 0 is discarded from the right\n", 950 | "dq.appendleft(-1)\n", 951 | "print(dq)\n", 952 | "# add 3 values on the right, discarding -1, 1 and 2 from the left\n", 953 | "dq.extend([11, 22, 
33])\n", 954 | "print(dq)\n", 955 | "# extendleft adds the items one by one on the left, so they end up in reverse order\n", 956 | "dq.extendleft([10, 20, 30, 40])\n", 957 | "print(dq)" 958 | ] 959 | }, 960 | { 961 | "cell_type": "markdown", 962 | "metadata": {}, 963 | "source": [ 964 | "Besides deque, other Python Standard Library packages implement queues:\n", 965 | "\n", 966 | "- **queue**: Provides the synchronized (i.e. thread-safe) classes Queue, LifoQueue and PriorityQueue. These are used for safe communication between threads. All three classes can be bounded by providing a maxsize argument greater than 0 to the constructor. However, they don’t discard items to make room as deque does. Instead, when the queue is full the insertion of a new item blocks, i.e. it waits until some other thread makes room by taking an item from the queue, which is useful to throttle the number of live threads.\n", 967 | "- **multiprocessing**: Implements its own bounded Queue, very similar to queue.Queue but designed for inter-process communication. 
There is also a specialized multiprocessing.JoinableQueue for easier task management.\n", 968 | "- **asyncio**: Newly added to Python 3.4, asyncio provides Queue, LifoQueue and PriorityQueue with APIs inspired by the classes in queue and multiprocessing, but adapted for managing tasks in asynchronous programming. (Python 3.4 also had a separate JoinableQueue; from Python 3.5 its join and task_done methods are part of Queue.)\n", 969 | "- **heapq**: In contrast to the previous three modules, heapq does not implement a queue class, but provides functions like heappush and heappop that let you use a mutable sequence as a heap queue or priority queue." 970 | ] 971 | }, 972 | { 973 | "cell_type": "code", 974 | "execution_count": null, 975 | "metadata": { 976 | "collapsed": true 977 | }, 978 | "outputs": [], 979 | "source": [] 980 | } 981 | ], 982 | "metadata": { 983 | "kernelspec": { 984 | "display_name": "Python 3", 985 | "language": "python", 986 | "name": "python3" 987 | }, 988 | "language_info": { 989 | "codemirror_mode": { 990 | "name": "ipython", 991 | "version": 3 992 | }, 993 | "file_extension": ".py", 994 | "mimetype": "text/x-python", 995 | "name": "python", 996 | "nbconvert_exporter": "python", 997 | "pygments_lexer": "ipython3", 998 | "version": "3.6.0" 999 | } 1000 | }, 1001 | "nbformat": 4, 1002 | "nbformat_minor": 2 1003 | } 1004 | -------------------------------------------------------------------------------- /Chapter 03 - Dictionaries and sets.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Dictionaries and sets\n", 8 | "\n", 9 | "Python dicts are highly optimized.\n", 10 | "\n", 11 | "Dictionaries are used in:\n" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Module namespaces" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": 2, 24 | "metadata": { 25 | "collapsed": false 26 | }, 27 | "outputs": [ 28 | { 29 | "name": "stdout", 30 | 
"output_type": "stream", 31 | "text": [ 32 | "{'foo': 1, 'bar': [1, 2, 3]}\n" 33 | ] 34 | } 35 | ], 36 | "source": [ 37 | "#import a module\n", 38 | "import argparse\n", 39 | "#access the modules namespace\n", 40 | "args = argparse.Namespace()\n", 41 | "#insert some dict elements\n", 42 | "args.foo = 1\n", 43 | "args.bar = [1,2,3]\n", 44 | "#print the dictionary, using the vars() method\n", 45 | "print(vars(args))" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": {}, 51 | "source": [ 52 | "## Class and instance attributes" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 8, 58 | "metadata": { 59 | "collapsed": false 60 | }, 61 | "outputs": [ 62 | { 63 | "name": "stdout", 64 | "output_type": "stream", 65 | "text": [ 66 | "{'multi': 4, 'str': '2'}\n" 67 | ] 68 | } 69 | ], 70 | "source": [ 71 | "#define a new class\n", 72 | "class new_class():\n", 73 | " def __init__(self, number):\n", 74 | " self.multi = int(number) * 2\n", 75 | " self.str = str(number)\n", 76 | "#instantiate class\n", 77 | "a = new_class(2)\n", 78 | "#print object as dict\n", 79 | "print(a.__dict__)" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "## Function keyword arguments" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": null, 92 | "metadata": { 93 | "collapsed": true 94 | }, 95 | "outputs": [], 96 | "source": [ 97 | "#define a function that takes both positional and keyword arguments\n", 98 | "def my_function(*positional,**kwargs):\n", 99 | " print(\"Positional:\", positional, \"as a tuple\")\n", 100 | " print(\"Keywords:\", kwargs, \"as a dictionary\")\n", 101 | "#pass positional arguments, and keyword args.\n", 102 | "my_function('one', 'two', 'three', a=12, b=\"abc\")" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "There are lots of ways to make dictionaries" 110 | ] 111 | }, 112 | { 113 | "cell_type": "code", 114 | 
"execution_count": 16, 115 | "metadata": { 116 | "collapsed": false 117 | }, 118 | "outputs": [ 119 | { 120 | "data": { 121 | "text/plain": [ 122 | "True" 123 | ] 124 | }, 125 | "execution_count": 16, 126 | "metadata": {}, 127 | "output_type": "execute_result" 128 | } 129 | ], 130 | "source": [ 131 | "a = dict(one=1, two=2, three=3)\n", 132 | "b = {'one': 1, 'two': 2, 'three': 3}\n", 133 | "c = dict(zip(['one', 'two', 'three'], [1, 2, 3])) \n", 134 | "d = dict([('two', 2), ('one', 1), ('three', 3)]) \n", 135 | "e = dict({'three': 3, 'one': 1, 'two': 2}) \n", 136 | "a==b==c==d==e" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "## Dict comprehension in Python >2.7 (like listcomp)\n" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 18, 149 | "metadata": { 150 | "collapsed": false 151 | }, 152 | "outputs": [ 153 | { 154 | "name": "stdout", 155 | "output_type": "stream", 156 | "text": [ 157 | "{'China': 86, 'India': 91, 'United States': 1, 'Indonesia': 62, 'Brazil': 55}\n" 158 | ] 159 | } 160 | ], 161 | "source": [ 162 | "DIAL_CODES = [(86, 'China'),(91, 'India'),(1, 'United States'),(62, 'Indonesia'),(55, 'Brazil')]\n", 163 | "#Use the dict comp syntax - i.e. 
with {} instead of []\n", 164 | "country_code = {country: code for code, country in DIAL_CODES}\n", 165 | "print(country_code)" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 44, 171 | "metadata": { 172 | "collapsed": false 173 | }, 174 | "outputs": [ 175 | { 176 | "name": "stdout", 177 | "output_type": "stream", 178 | "text": [ 179 | "ambiguity [(12, 16), (12, 16)]\n", 180 | "Beautiful [(1, 1), (1, 1)]\n", 181 | "complicated [(4, 24), (4, 24)]\n", 182 | "explicitly [(11, 8), (11, 8)]\n", 183 | "implementation [(17, 8), (17, 8), (18, 8), (18, 8)]\n", 184 | "Namespaces [(19, 1), (19, 1)]\n", 185 | "practicality [(9, 10), (9, 10)]\n", 186 | "preferably [(13, 25), (13, 25)]\n", 187 | "Readability [(7, 1), (7, 1)]\n", 188 | "temptation [(12, 38), (12, 38)]\n" 189 | ] 190 | } 191 | ], 192 | "source": [ 193 | "import re\n", 194 | "\n", 195 | "#regex that matches each run of word characters\n", 196 | "WORD_RE = re.compile(r'\\w+')\n", 197 | "#index is a plain dict mapping each word to a list of locations\n", 198 | "index = {}\n", 199 | "\n", 200 | "#open the text file\n", 201 | "with open('zen.txt', encoding='utf-8') as fp:\n", 202 | "    #for each line in the file, numbering lines from 1\n", 203 | "    for line_no, line in enumerate(fp, 1):\n", 204 | "        #use the regex to find each word in the line\n", 205 | "        for match in WORD_RE.finditer(line):\n", 206 | "            #the matched text\n", 207 | "            word = match.group()\n", 208 | "            #skip words shorter than 9 chars\n", 209 | "            if len(word) < 9:\n", 210 | "                continue\n", 211 | "            #match.start() is 0-based, so add one to get a 1-based column number\n", 212 | "            column_no = match.start()+1\n", 213 | "            #combine line & column into a location tuple\n", 214 | "            location = (line_no, column_no)\n", 215 | "            #this is UGLY; coded like this to make a point\n", 216 | "            #Lookup 1: get the list of occurrences for word, or a new empty [] if word is not in the index yet.\n", 217 | "            occurrences = index.get(word, [])\n", 218 | "            #Append the new location tuple to the occurrence list. 
The list is empty before this if it is the first occurrence.\n", 219 | "            occurrences.append(location)\n", 220 | "            #Lookup 2: store the list back in the index dict; this entails a second search through the index.\n", 221 | "            index[word] = occurrences\n", 222 | "            #BETTER: get the list of occurrences, inserting an empty list if the key is missing, then append the tuple\n", 223 | "            #Single lookup using setdefault, rather than get with default. Both versions run in this cell, so every location is recorded twice in the output.\n", 224 | "            index.setdefault(word, []).append(location)\n", 225 | " \n", 226 | "# print in alphabetical order\n", 227 | "for word in sorted(index, key=str.upper): print(word, index[word])" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "## Mappings with flexible key lookup\n", 235 | "\n", 236 | "A defaultdict is configured to create items on demand whenever a missing key is looked up." 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 46, 242 | "metadata": { 243 | "collapsed": false 244 | }, 245 | "outputs": [ 246 | { 247 | "name": "stdout", 248 | "output_type": "stream", 249 | "text": [ 250 | "ambiguity [(12, 16)]\n", 251 | "Beautiful [(1, 1)]\n", 252 | "complicated [(4, 24)]\n", 253 | "explicitly [(11, 8)]\n", 254 | "implementation [(17, 8), (18, 8)]\n", 255 | "Namespaces [(19, 1)]\n", 256 | "practicality [(9, 10)]\n", 257 | "preferably [(13, 25)]\n", 258 | "Readability [(7, 1)]\n", 259 | "temptation [(12, 38)]\n" 260 | ] 261 | } 262 | ], 263 | "source": [ 264 | "import re\n", 265 | "import collections\n", 266 | "\n", 267 | "WORD_RE = re.compile(r'\\w+')\n", 268 | "#this time, index is a defaultdict. 
\n", 269 | "#If key is not present, a list is created, the key inserted, and list reference returned\n", 270 | "index = collections.defaultdict(list) #list is the 'list constructor', used as the default_factory\n", 271 | "\n", 272 | "with open('zen.txt', encoding='utf-8') as fp:\n", 273 | " for line_no, line in enumerate(fp, 1): \n", 274 | " for match in WORD_RE.finditer(line):\n", 275 | " word = match.group()\n", 276 | " if len(word) < 9:\n", 277 | " continue\n", 278 | " column_no = match.start()+1\n", 279 | " location = (line_no, column_no)\n", 280 | " index[word].append(location)\n", 281 | " \n", 282 | "# print in alphabetical order\n", 283 | "for word in sorted(index, key=str.upper): print(word, index[word])" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "metadata": {}, 289 | "source": [ 290 | "## The __missing__ method\n", 291 | "\n", 292 | "Underlying the way mappings deal with missing keys is the aptly named __missing__ method. This method is not defined in the base dict class, but dict is aware of it: if you subclass dict and provide a __missing__ method, the standard dict.__getitem__ will call it whenever a key is not found, instead of raising KeyError." 
 293 | ] 294 | }, 295 | { 296 | "cell_type": "code", 297 | "execution_count": 52, 298 | "metadata": { 299 | "collapsed": false 300 | }, 301 | "outputs": [ 302 | { 303 | "name": "stdout", 304 | "output_type": "stream", 305 | "text": [ 306 | "two\n", 307 | "four\n", 308 | "two\n", 309 | "four\n", 310 | "N/A\n", 311 | "{'2': 'two', '4': 'four'}\n" 312 | ] 313 | } 314 | ], 315 | "source": [ 316 | "class StrKeyDict0(dict):\n", 317 | "    def __missing__(self, key):\n", 318 | "        if isinstance(key, str):\n", 319 | "            raise KeyError(key)\n", 320 | "        return self[str(key)]\n", 321 | "\n", 322 | "    def get(self, key, default=None):\n", 323 | "        try:\n", 324 | "            return self[key]\n", 325 | "        except KeyError:\n", 326 | "            return default\n", 327 | "\n", 328 | "    def __contains__(self, key):\n", 329 | "        return key in self.keys() or str(key) in self.keys()\n", 330 | "\n", 331 | "\n", 332 | "d = StrKeyDict0([('2', 'two'), ('4', 'four')])\n", 333 | "#look up a str key - found directly\n", 334 | "print(d['2'])\n", 335 | "#look up an int - not found, so __missing__ converts it to str and retries\n", 336 | "print(d[4])\n", 337 | "#print(d[1]) would raise KeyError: '1', because neither 1 nor '1' is a key\n", 338 | "#get with a str key, no problem\n", 339 | "print(d.get('2'))\n", 340 | "#get with an int key - converted & found\n", 341 | "print(d.get(4))\n", 342 | "#get with an unknown key: the default is returned, and nothing is added to the dict\n", 343 | "print(d.get(1,'N/A'))\n", 344 | "print(d)\n" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "## Variations of dict\n", 352 | "- collections.OrderedDict\n", 353 | "  - maintains keys in insertion order, allowing iteration over items in a predictable order.\n", 354 | "- collections.ChainMap\n", 355 | "  - holds a list of mappings which can be searched as one. A lookup tries each mapping in order and succeeds as soon as the key is found in one of them.\n", 356 | "- collections.Counter\n", 357 | "  - a mapping that holds an integer count for each key. Updating an existing key adds to its count. 
\n", 358 | "- collections.UserDict\n", 359 | "  - a pure Python implementation of a mapping that works like a standard dict.\n", 360 | " \n", 361 | "## Subclassing UserDict\n", 362 | "\n", 363 | "It’s almost always easier to create a new mapping type by extending UserDict than dict. The main reason to subclass UserDict rather than dict is that the built-in has some implementation shortcuts that end up forcing us to override methods that we can just inherit from UserDict with no problems." 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": 61, 369 | "metadata": { 370 | "collapsed": false 371 | }, 372 | "outputs": [ 373 | { 374 | "name": "stdout", 375 | "output_type": "stream", 376 | "text": [ 377 | "two\n", 378 | "four\n", 379 | "two\n", 380 | "four\n", 381 | "two\n", 382 | "True\n", 383 | "N/A\n", 384 | "None\n", 385 | "{'2': 'two', '4': 'four', '1': 'N/A'}\n" 386 | ] 387 | } 388 | ], 389 | "source": [ 390 | "import collections\n", 391 | "\n", 392 | "class StrKeyDict(collections.UserDict):\n", 393 | "    def __missing__(self, key):\n", 394 | "        if isinstance(key, str):\n", 395 | "            raise KeyError(key)\n", 396 | "        return self[str(key)]\n", 397 | "\n", 398 | "    def __contains__(self, key):\n", 399 | "        return str(key) in self.data\n", 400 | "\n", 401 | "    def __setitem__(self, key, item):\n", 402 | "        print(item)  #debug print: echoes every stored value, including the two set by the constructor\n", 403 | "        self.data[str(key)] = item\n", 404 | " \n", 405 | "e = StrKeyDict([('2', 'two'), ('4', 'four')])\n", 406 | "#look up a str key - found directly\n", 407 | "print(e['2'])\n", 408 | "#look up an int - __missing__ converts it to str and retries\n", 409 | "print(e[4])\n", 410 | "#print(e[1]) would raise KeyError: '1', because neither 1 nor '1' is a key yet\n", 411 | "#get with a str key, no problem\n", 412 | "print(e.get('2'))\n", 413 | "#membership test with an int - converted to str & found (same as: 4 in e)\n", 414 | "print(e.__contains__(4))\n", 415 | "#store under an int key: the key is converted to '1'; __setitem__ returns None (same as: e[1] = 'N/A')\n", 416 | "print(e.__setitem__(1,'N/A'))\n", 417 | "print(e)\n" 418 | ] 419 | }, 420 | { 421 | 
"cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "types.MappingProxyType wraps a mapping and returns a mappingproxy instance: a read-only but dynamic view of the original mapping." 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": 64, 430 | "metadata": { 431 | "collapsed": false 432 | }, 433 | "outputs": [ 434 | { 435 | "name": "stdout", 436 | "output_type": "stream", 437 | "text": [ 438 | "{1: 'A'}\n", 439 | "A\n", 440 | "{1: 'A', 2: 'B'}\n", 441 | "B\n" 442 | ] 443 | } 444 | ], 445 | "source": [ 446 | "from types import MappingProxyType\n", 447 | "\n", 448 | "#make a dict\n", 449 | "d = {1:'A'}\n", 450 | "#make a proxy of it\n", 451 | "d_proxy = MappingProxyType(d)\n", 452 | "#print it - no problems\n", 453 | "print(d_proxy)\n", 454 | "#reading through the proxy works; writing through it does not\n", 455 | "print(d_proxy[1])\n", 456 | "#d_proxy[2] = 'x' # throws TypeError: 'mappingproxy' object does not support item assignment\n", 457 | "d[2] = 'B'\n", 458 | "print(d_proxy)\n", 459 | "print(d_proxy[2])" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "## Set theory\n", 467 | "\n", 468 | "A set is a collection of unique objects. A basic use case is removing duplication. Set elements must be hashable. 
The set type is not hashable, but frozenset is, so you can have frozenset elements inside a set.\n", 469 | "\n", 470 | "## Set operations\n", 471 | "\n", 472 | "There are lots of manipulation methods, and operators, based on mathematical set theory" 473 | ] 474 | }, 475 | { 476 | "cell_type": "code", 477 | "execution_count": 8, 478 | "metadata": { 479 | "collapsed": false 480 | }, 481 | "outputs": [ 482 | { 483 | "name": "stdout", 484 | "output_type": "stream", 485 | "text": [ 486 | "{1}\n", 487 | "<class 'set'>\n", 488 | "set()\n", 489 | "frozenset({0, 1, 2, 3, 4, 5, 6, 7, 8, 9})\n", 490 | "{'®', '÷', '$', '£', '×', '¥', '¶', '+', '±', '<', '%', '¤', '=', '°', '¬', '>', 'µ', '§', '©', '#', '¢'}\n" 491 | ] 492 | } 493 | ], 494 | "source": [ 495 | "needles = {1}\n", 496 | "haystack = {2}\n", 497 | "#Count needles in haystack, assuming both are sets\n", 498 | "found = len(needles & haystack)\n", 499 | "\n", 500 | "#Same as above, using for\n", 501 | "found = 0\n", 502 | "for n in needles:\n", 503 | "    if n in haystack: found += 1\n", 504 | "\n", 505 | "#Count needles by building two sets - works for any iterables, but costs extra if one is already a set\n", 506 | "found = len(set(needles) & set(haystack))\n", 507 | "#alternative syntax: intersection() accepts any iterable, so only needles is converted\n", 508 | "found = len(set(needles).intersection(haystack))\n", 509 | "\n", 510 | "#Set literals\n", 511 | "s = {1}\n", 512 | "print(s)\n", 513 | "print(type(s))\n", 514 | "s.pop()\n", 515 | "#an empty set is displayed as set(), because {} is an empty dict\n", 516 | "print(s)\n", 517 | "\n", 518 | "#Frozen sets must use the constructor\n", 519 | "print(frozenset(range(10)))\n", 520 | "\n", 521 | "#set comprehension, same syntax as list comp but with {}\n", 522 | "from unicodedata import name\n", 523 | "print({chr(i) for i in range(32, 256) if 'SIGN' in name(chr(i),'')})" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "## Use of Dicts and Sets\n", 531 | "\n", 532 | "These are fast data structures because they use hash tables for lookup, which is why dict keys and set elements must be hashable (in practice, 
immutable). CPython keeps the tables sparse: for dicts it aims to keep the load factor below roughly two thirds, and when a table gets too full a larger one is allocated and the items are reinserted. Collisions are resolved by open addressing: CPython probes further slots in a sequence derived from the hash value, not by purely linear probing. Because hash tables are used:\n", 533 | "\n", 534 | "- Dict keys must be hashable\n", 535 | "- dicts have significant memory overhead (the table must be kept sparse, and in a collection of dict records the field names are stored again in every record).\n", 536 | "- Key search is FAST\n", 537 | "- Key ordering follows insertion order on CPython 3.6 (the kernel used here) and is a language guarantee from Python 3.7; in older versions the order depended on the table layout and on collisions during insertion\n", 538 | "- Dicts holding the same key:value pairs compare equal, whatever their key order" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 11, 544 | "metadata": { 545 | "collapsed": false 546 | }, 547 | "outputs": [ 548 | { 549 | "name": "stdout", 550 | "output_type": "stream", 551 | "text": [ 552 | "d1: dict_keys([86, 91, 1, 62, 55, 92, 880, 234, 7, 81])\n", 553 | "d2: dict_keys([1, 7, 55, 62, 81, 86, 91, 92, 234, 880])\n", 554 | "d3: dict_keys([880, 55, 86, 91, 62, 81, 234, 92, 7, 1])\n", 555 | "True\n" 556 | ] 557 | } 558 | ], 559 | "source": [ 560 | "# dial codes of the top 10 most populous countries\n", 561 | "DIAL_CODES = [\n", 562 | "    (86, 'China'),\n", 563 | "    (91, 'India'),\n", 564 | "    (1, 'United States'),\n", 565 | "    (62, 'Indonesia'),\n", 566 | "    (55, 'Brazil'),\n", 567 | "    (92, 'Pakistan'),\n", 568 | "    (880, 'Bangladesh'),\n", 569 | "    (234, 'Nigeria'),\n", 570 | "    (7, 'Russia'),\n", 571 | "    (81, 'Japan'),\n", 572 | "]\n", 573 | "d1 = dict(DIAL_CODES) \n", 574 | "print('d1:', d1.keys())\n", 575 | "d2 = dict(sorted(DIAL_CODES))\n", 576 | "print('d2:', d2.keys())\n", 577 | "d3 = dict(sorted(DIAL_CODES, key=lambda x:x[1])) \n", 578 | "print('d3:', d3.keys())\n", 579 | "#The dictionaries compare equal, because they hold the same key:value pairs.\n", 580 | "print(d1==d2 and d2==d3)" 581 | ] 582 | }, 583 | { 584 | "cell_type": "markdown", 585 | "metadata": {}, 586 | "source": [ 587 | "## How sets work: practical consequences\n", 588 | "The set and frozenset types are also 
implemented with a hash table, except that each bucket holds only a reference to the element (as if it were a key in a dict, but without a value to go with it). Therefore:\n", 589 | "\n", 590 | "- Set elements must be hashable objects.\n", 591 | "- Sets have a significant memory overhead.\n", 592 | "- Membership testing is very efficient.\n", 593 | "- Element ordering depends on insertion order.\n", 594 | "- Adding elements to a set may change the order of other elements.\n", 595 | "\n", 596 | "## Summary\n", 597 | "\n", 598 | "Dictionaries are a keystone of Python. Beyond the basic dict, the standard library offers handy, ready-to-use specialized mappings like defaultdict, OrderedDict, ChainMap and Counter, all defined in the collections module.\n", 599 | "\n", 600 | "Two powerful methods available in most mappings are setdefault and update. The setdefault method is used to update items holding mutable values, for example, in a dict of list values, to avoid redundant searches for the same key. The update method allows bulk insertion or overwriting of items from any other mapping, from iterables providing (key, value) pairs and from keyword arguments.\n", 601 | "\n", 602 | "The hash table implementation underlying dict and set is extremely fast. 
There is a price to pay for all this speed, and the price is in memory.\n" 603 | ] 604 | } 605 | ], 606 | "metadata": { 607 | "kernelspec": { 608 | "display_name": "Python 3", 609 | "language": "python", 610 | "name": "python3" 611 | }, 612 | "language_info": { 613 | "codemirror_mode": { 614 | "name": "ipython", 615 | "version": 3 616 | }, 617 | "file_extension": ".py", 618 | "mimetype": "text/x-python", 619 | "name": "python", 620 | "nbconvert_exporter": "python", 621 | "pygments_lexer": "ipython3", 622 | "version": "3.6.0" 623 | } 624 | }, 625 | "nbformat": 4, 626 | "nbformat_minor": 2 627 | } 628 | -------------------------------------------------------------------------------- /Chapter 04 - Text versus Bytes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Text versus Bytes\n", 8 | "\n", 9 | "Python 3 draws a sharp distinction between strings of human text and sequences of raw bytes.\n", 10 | "\n", 11 | "## What is a character?\n", 12 | "\n", 13 | "Python 3 => str = Unicode characters, similar to the Python 2 unicode object\n", 14 | "\n", 15 | "Python 2 => unicode object = Unicode characters\n", 16 | "\n", 17 | "Python 2 => str = raw bytes\n", 18 | "\n", 19 | "The Unicode standard explicitly separates the identity of characters from specific byte representations.\n", 20 | "\n", 21 | "- A Unicode code point is a number from 0 to 1,114,111 (base 10)\n", 22 | "- Code points are written in the Unicode standard as 4 to 6 hexadecimal digits with a “U+” prefix.\n", 23 | "- About 10% of the valid code points have characters assigned to them\n", 24 | "- The actual bytes that represent a character depend on the encoding in use, where an encoding is an algorithm that converts code points to byte sequences and vice versa. \n", 25 | "  - The code point for A (U+0041) is encoded as \\x41 in UTF-8 and as \\x41\\x00 in UTF-16LE." 
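The last bullet can be checked directly. A minimal sketch, with illustrative variable names that are not part of the original notebook: it encodes one code point with two encodings and decodes it back.

```python
# Encode one code point (U+0041, 'A') with two different encodings.
char = 'A'
code_point = ord(char)                   # the integer identity of the character
utf8_bytes = char.encode('utf_8')        # one byte in UTF-8
utf16le_bytes = char.encode('utf_16le')  # two bytes in UTF-16LE

print('U+%04X' % code_point)   # U+0041
print(utf8_bytes)              # b'A'
print(utf16le_bytes)           # b'A\x00'

# Decoding with the matching encoding recovers the original character.
print(utf16le_bytes.decode('utf_16le') == char)   # True
```

The character is the same in both cases; only the byte representation changes with the encoding.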
 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 14, 31 | "metadata": { 32 | "collapsed": false 33 | }, 34 | "outputs": [ 35 | { 36 | "name": "stdout", 37 | "output_type": "stream", 38 | "text": [ 39 | "4\n", 40 | "b'caf\\xc3\\xa9'\n", 41 | "5\n", 42 | "café\n" 43 | ] 44 | } 45 | ], 46 | "source": [ 47 | "#café: the é is outside the ASCII range\n", 48 | "s = 'café'\n", 49 | "#has 4 unicode characters\n", 50 | "print(len(s))\n", 51 | "#encode the str to bytes using UTF-8\n", 52 | "b = s.encode('utf8')\n", 53 | "print(b)\n", 54 | "#now é is represented by 2 bytes, so len = 5\n", 55 | "print(len(b))\n", 56 | "print(b.decode('utf8'))" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "## New binary sequences in Py3\n", 64 | "\n", 65 | "- bytes (Immutable - items are int 0-255)\n", 66 | "- bytearray (Mutable - items are int 0-255)" 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 28, 72 | "metadata": { 73 | "collapsed": false 74 | }, 75 | "outputs": [ 76 | { 77 | "name": "stdout", 78 | "output_type": "stream", 79 | "text": [ 80 | "b'caf\\xc3\\xa9'\n", 81 | "99\n", 82 | "b'c'\n", 83 | "bytearray(b'caf\\xc3\\xa9')\n", 84 | "5\n", 85 | "bytearray(b'\\xa9')\n" 86 | ] 87 | } 88 | ], 89 | "source": [ 90 | "#build bytes from a str: the precomposed é (U+00E9) encodes to \\xc3\\xa9 (e + combining accent U+0301 would give e\\xcc\\x81)\n", 91 | "cafe = bytes('café', encoding='utf_8')\n", 92 | "#displayed as a bytes literal with hex escapes - NOT as code points, which are written U+....\n", 93 | "print(cafe)\n", 94 | "#indexing returns an int: the first byte is 99, the ASCII code of c\n", 95 | "print(cafe[0])\n", 96 | "# a slice produces a sequence of the same type, even for a single item\n", 97 | "print(cafe[:1])\n", 98 | "cafe_arr = bytearray(cafe)\n", 99 | "#byte array displays as bytearray(b....). \"caf\" are in the ASCII range, so printed\n", 100 | "print(cafe_arr)\n", 101 | "#cafe_arr has 5 bytes - 2 for é\n", 102 | "print(len(cafe_arr))\n", 103 | "#... 
, so the last item in a bytestring is the last of the 2 é bytes - i.e. \\xa9\n", 104 | "print(cafe_arr[-1:])" 105 | ] 106 | }, 107 | { 108 | "cell_type": "markdown", 109 | "metadata": {}, 110 | "source": [ 111 | "- For bytes in the printable ASCII range (from space to ~), the ASCII character itself is used.\n", 112 | "- For bytes corresponding to tab, newline, carriage return and \\, the escape sequences \\t, \\n, \\r and \\\\ are used.\n", 113 | "- For every other byte value, a hexadecimal escape sequence is used, e.g. \\x00 is the null byte.\n", 114 | "\n", 115 | "Both bytes and bytearray support every str method except those that do formatting (format, format_map) and a few that depend on Unicode data (casefold, isdecimal, isidentifier, isnumeric, isprintable, encode). This means that you can use familiar string methods like endswith, replace, strip, translate, upper etc.\n", 116 | "\n", 117 | "Other ways of building bytes or bytearray instances are calling their constructors with:\n", 118 | "- a str and an encoding keyword argument.\n", 119 | "- an iterable providing items with values from 0 to 255.\n", 120 | "- a single integer, to create a binary sequence of that size initialized with null bytes.\n", 121 | "- an object that implements the buffer protocol (e.g. bytes, bytearray, memoryview, array.array); this copies the bytes from the source object to the newly created binary sequence." 
122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 30, 127 | "metadata": { 128 | "collapsed": false 129 | }, 130 | "outputs": [ 131 | { 132 | "name": "stdout", 133 | "output_type": "stream", 134 | "text": [ 135 | "b'1K\\xce\\xa9'\n" 136 | ] 137 | }, 138 | { 139 | "data": { 140 | "text/plain": [ 141 | "b'\\xfe\\xff\\xff\\xff\\x00\\x00\\x01\\x00\\x02\\x00'" 142 | ] 143 | }, 144 | "execution_count": 30, 145 | "metadata": {}, 146 | "output_type": "execute_result" 147 | } 148 | ], 149 | "source": [ 150 | "#printing from Hex to UTF-8\n", 151 | "print(bytes.fromhex('31 4B CE A9'))\n", 152 | "\n", 153 | "# Initializing bytes from the raw data of an array.\n", 154 | "import array\n", 155 | "numbers = array.array('h', [-2, -1, 0, 1, 2]) \n", 156 | "octets = bytes(numbers)\n", 157 | "octets" 158 | ] 159 | }, 160 | { 161 | "cell_type": "markdown", 162 | "metadata": {}, 163 | "source": [ 164 | "## Structs and memory views\n", 165 | "\n", 166 | "The struct module provides functions to parse packed bytes into a tuple of fields of different types and to perform the opposite conversion, from a tuple into packed bytes. \n", 167 | "\n", 168 | "Memoryview class does not let you create or store byte sequences, but provides shared memory access to slices of data from other binary sequences, packed arrays and buffers such as PIL images, without copying the bytes." 
169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 32, 174 | "metadata": { 175 | "collapsed": false 176 | }, 177 | "outputs": [ 178 | { 179 | "name": "stdout", 180 | "output_type": "stream", 181 | "text": [ 182 | "b'GIF87aY\\x02\\xcb\\x00'\n", 183 | "(b'GIF', b'87a', 601, 203)\n" 184 | ] 185 | } 186 | ], 187 | "source": [ 188 | "import struct\n", 189 | "\n", 190 | "#struct format: < little-endian; 3s3s two sequences of 3 bytes; HH two 16-bit integers.\n", 191 | "fmt = '<3s3sHH'\n", 192 | "\n", 193 | "#python.gif = 601x203px\n", 194 | "with open('python.gif', 'rb') as fp:\n", 195 | " img = memoryview(fp.read())\n", 196 | "\n", 197 | "header = img[:10]\n", 198 | "\n", 199 | "print(bytes(header))\n", 200 | "#type, version, width height\n", 201 | "print(struct.unpack(fmt, header))\n", 202 | "#release memory associated with memory view instances\n", 203 | "del header\n", 204 | "del img" 205 | ] 206 | }, 207 | { 208 | "cell_type": "markdown", 209 | "metadata": {}, 210 | "source": [ 211 | "## Basic encoders/decoders\n", 212 | "\n" 213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "for codec in ['latin_1','utf_8', 'utf_16']:\n", 220 | " # Make sure n = ñ from Latin1, not ñ from OSX\n", 221 | " print(codec, 'El Niño'.encode(codec), sep='\\t')" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 45, 227 | "metadata": { 228 | "collapsed": false 229 | }, 230 | "outputs": [ 231 | { 232 | "name": "stdout", 233 | "output_type": "stream", 234 | "text": [ 235 | "b'Sa\\xcc\\x83o Paulo'\n", 236 | "b'\\xff\\xfeS\\x00a\\x00\\x03\\x03o\\x00 \\x00P\\x00a\\x00u\\x00l\\x00o\\x00'\n", 237 | "b'Sao Paulo'\n", 238 | "b'Sa?o Paulo'\n", 239 | "b'São Paulo'\n", 240 | "Montréal\n", 241 | "Montrιal\n", 242 | "MontrИal\n", 243 | "Montr�al\n" 244 | ] 245 | } 246 | ], 247 | "source": [ 248 | "city = 'São Paulo'\n", 249 | "print(city.encode('utf_8'))\n", 250 | "print(city.encode('utf_16'))\n", 251 | 
"# print(city.encode('iso8859_1'))\n", 252 | "#silently skip unknown chars\n", 253 | "print(city.encode('cp437', errors='ignore'))\n", 254 | "#replace with ?\n", 255 | "print(city.encode('cp437', errors='replace'))\n", 256 | "#replace with XML\n", 257 | "print(city.encode('cp437', errors='xmlcharrefreplace'))\n", 258 | "\n", 259 | "##Coping with UnicodeDecodeError\n", 260 | "\n", 261 | "octets = b'Montr\\xe9al' \n", 262 | "print(octets.decode('cp1252'))\n", 263 | "print(octets.decode('iso8859_7'))\n", 264 | "print(octets.decode('koi8_r'))\n", 265 | "#print(octets.decode('utf_8')) #'utf-8' codec can't decode byte 0xe9 in position 5: invalid continuation byte\n", 266 | "print(octets.decode('utf_8', errors='replace'))" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "Chardet can be used to detect encoding, based on common bytes\n", 274 | "\n", 275 | "## Endianness, and byte order\n", 276 | "\n", 277 | "One big advantage of UTF-8 is that it produces the same byte sequence regardless of machine endianness, so no BOM is needed. \n", 278 | "\n", 279 | "## Handling text files\n", 280 | "\n", 281 | "- Bytes should be decoded to str as early as possible on input, e.g. when opening a file for reading. B\n", 282 | "- Business logic of your program, where text handling is done exclusively on str objects. You should never be encoding or de‐ coding in the middle of other processing. \n", 283 | "- On output, the str are encoded to bytes as late as possible.\n", 284 | "\n", 285 | "Python 3 makes it easier to follow the advice of the Unicode sandwich, because the open built-in does the necessary decoding when reading and encoding when writing files in text mode, so all you get from my_file.read() and pass to my_file.write(text) are str objects." 
286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 58, 291 | "metadata": { 292 | "collapsed": false 293 | }, 294 | "outputs": [ 295 | { 296 | "name": "stdout", 297 | "output_type": "stream", 298 | "text": [ 299 | "<_io.TextIOWrapper name='cafe.txt' mode='w' encoding='utf_8'>\n", 300 | "4\n", 301 | "<_io.TextIOWrapper name='cafe.txt' mode='r' encoding='US-ASCII'>\n", 302 | "US-ASCII\n", 303 | "<_io.BufferedReader name='cafe.txt'>\n", 304 | "b'caf\\xc3\\xa9'\n" 305 | ] 306 | } 307 | ], 308 | "source": [ 309 | "fp = open('cafe.txt', 'w', encoding='utf_8')\n", 310 | "#returns TextIOWrapper object\n", 311 | "print(fp)\n", 312 | "print(fp.write('café'))\n", 313 | "fp.close() #flush and close the file; without the parentheses the method is never called\n", 314 | "import os\n", 315 | "os.stat('cafe.txt').st_size\n", 316 | "#opens with locale default encoding (ASCII for me)\n", 317 | "fp2 = open('cafe.txt')\n", 318 | "print(fp2)\n", 319 | "print(fp2.encoding)\n", 320 | "fp3 = open('cafe.txt', 'rb')\n", 321 | "print(fp3)\n", 322 | "#read the raw bytes\n", 323 | "print(fp3.read())" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": 64, 329 | "metadata": { 330 | "collapsed": false 331 | }, 332 | "outputs": [ 333 | { 334 | "name": "stdout", 335 | "output_type": "stream", 336 | "text": [ 337 | " locale.getpreferredencoding() -> 'US-ASCII'\n", 338 | " type(my_file) -> <class '_io.TextIOWrapper'>\n", 339 | " my_file.encoding -> 'US-ASCII'\n", 340 | " sys.stdout.isatty() -> False\n", 341 | " sys.stdout.encoding -> 'UTF-8'\n", 342 | " sys.stdin.isatty() -> False\n", 343 | " sys.stdin.encoding -> 'US-ASCII'\n", 344 | " sys.stderr.isatty() -> False\n", 345 | " sys.stderr.encoding -> 'UTF-8'\n", 346 | " sys.getdefaultencoding() -> 'utf-8'\n", 347 | " sys.getfilesystemencoding() -> 'utf-8'\n" 348 | ] 349 | } 350 | ], 351 | "source": [ 352 | "import sys, locale\n", 353 | "expressions = \"\"\"\n", 354 | " locale.getpreferredencoding()\n", 355 | " type(my_file)\n", 356 | " my_file.encoding\n", 357 | " sys.stdout.isatty()\n", 358 | " 
sys.stdout.encoding\n", 359 | " sys.stdin.isatty()\n", 360 | " sys.stdin.encoding\n", 361 | " sys.stderr.isatty()\n", 362 | " sys.stderr.encoding\n", 363 | " sys.getdefaultencoding()\n", 364 | " sys.getfilesystemencoding()\n", 365 | "\"\"\"\n", 366 | "\n", 367 | "#locale.getpreferredencoding() = default from locale\n", 368 | "#my_file.encoding = the file takes its encoding from the default locale\n", 369 | "#sys.stdout.isatty() = output is not going to console\n", 370 | "#sys.stdout.encoding = therefore console output is UTF-8\n", 371 | "#sys.stdin.isatty()\n", 372 | "#sys.stdin.encoding\n", 373 | "#sys.stderr.isatty()\n", 374 | "#sys.stderr.encoding\n", 375 | "#sys.getdefaultencoding() # default from internal Python setting\n", 376 | "#sys.getfilesystemencoding() # is mbcs on Windows. On GNU/Linux and OSX all of these encodings... \n", 377 | " #are set to UTF-8 by default, and have been for several years, so I/O handles all Unicode characters. \n", 378 | "\n", 379 | "my_file = open('dummy', 'w')\n", 380 | "\n", 381 | "for expression in expressions.split():\n", 382 | " value = eval(expression) \n", 383 | " print(expression.rjust(30), '->', repr(value))" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": {}, 389 | "source": [ 390 | "The best advice about encoding defaults is: do not rely on them!\n", 391 | "\n", 392 | "## Normalizing Unicode for saner comparisons\n", 393 | "\n", 394 | "NFC (Normalization Form C) composes the code points to produce the shortest equivalent string, while NFD decomposes, expanding composed characters into base characters and separate combining characters. 
Both of these normalizations make compar‐ isons work as expected:" 395 | ] 396 | }, 397 | { 398 | "cell_type": "code", 399 | "execution_count": 69, 400 | "metadata": { 401 | "collapsed": false 402 | }, 403 | "outputs": [ 404 | { 405 | "name": "stdout", 406 | "output_type": "stream", 407 | "text": [ 408 | "5 4\n", 409 | "4 4\n", 410 | "True\n", 411 | "5 5\n", 412 | "True\n" 413 | ] 414 | } 415 | ], 416 | "source": [ 417 | "from unicodedata import normalize\n", 418 | "s1 = 'café' # composed \"e\" with acute accent\n", 419 | "s2 = 'café' # decomposed \"e\" and acute accent\n", 420 | "print(len(s1), len(s2))\n", 421 | "#using NFC\n", 422 | "print(len(normalize('NFC', s1)), len(normalize('NFC', s2)))\n", 423 | "print(normalize('NFC', s1) == normalize('NFC', s2))\n", 424 | "#Using NFD\n", 425 | "print(len(normalize('NFD', s1)), len(normalize('NFD', s2)))\n", 426 | "print(normalize('NFD', s1) == normalize('NFD', s2))" 427 | ] 428 | }, 429 | { 430 | "cell_type": "code", 431 | "execution_count": 70, 432 | "metadata": { 433 | "collapsed": false 434 | }, 435 | "outputs": [ 436 | { 437 | "name": "stdout", 438 | "output_type": "stream", 439 | "text": [ 440 | "OHM SIGN\n", 441 | "GREEK CAPITAL LETTER OMEGA\n", 442 | "False\n", 443 | "True\n" 444 | ] 445 | } 446 | ], 447 | "source": [ 448 | "from unicodedata import normalize, name\n", 449 | "#ohn the unit\n", 450 | "ohm = '\\u2126'\n", 451 | "print(name(ohm))\n", 452 | "#normalise to greek char\n", 453 | "ohm_c = normalize('NFC', ohm)\n", 454 | "print(name(ohm_c))\n", 455 | "#originals don't match\n", 456 | "print(ohm == ohm_c)\n", 457 | "#normalised do\n", 458 | "print(normalize('NFC', ohm) == normalize('NFC', ohm_c))" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": 74, 464 | "metadata": { 465 | "collapsed": false 466 | }, 467 | "outputs": [ 468 | { 469 | "name": "stdout", 470 | "output_type": "stream", 471 | "text": [ 472 | "1⁄2\n", 473 | "42\n", 474 | "µ μ\n", 475 | "181 956\n", 476 | "MICRO SIGN 
GREEK SMALL LETTER MU\n" 477 | ] 478 | } 479 | ], 480 | "source": [ 481 | "from unicodedata import normalize, name \n", 482 | "\n", 483 | "half = '½'\n", 484 | "print(normalize('NFKC', half))\n", 485 | "four_squared = '4²'\n", 486 | "print(normalize('NFKC', four_squared))\n", 487 | "#the micro sign is considered a “compatibility character”.\n", 488 | "micro = 'µ'\n", 489 | "micro_kc = normalize('NFKC', micro) \n", 490 | "print(micro, micro_kc)\n", 491 | "print(ord(micro), ord(micro_kc))\n", 492 | "print(name(micro), name(micro_kc))\n" 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": {}, 498 | "source": [ 499 | "In the NFKC and NFKD forms, each compatibility character is replaced by a “compatibility decomposition” of one or more characters that are considered a “preferred” representation, even if there is some formatting loss.\n", 500 | "\n", 501 | "## Case folding\n", 502 | "\n", 503 | "Case folding is essentially converting all text to lowercase, with some additional transformations. 
For any string s containing only latin-1 characters, s.casefold() produces the same result as s.lower(), with only two exceptions: the micro sign 'μ' is changed to the Greek lower case mu (which looks the same in most fonts) and the German Eszett or “sharp s” (ß) becomes “ss”.\n" 504 | ] 505 | }, 506 | { 507 | "cell_type": "code", 508 | "execution_count": 79, 509 | "metadata": { 510 | "collapsed": false 511 | }, 512 | "outputs": [ 513 | { 514 | "name": "stdout", 515 | "output_type": "stream", 516 | "text": [ 517 | "GREEK SMALL LETTER MU\n", 518 | "µ μ\n", 519 | "LATIN SMALL LETTER SHARP S\n", 520 | "ß ss\n" 521 | ] 522 | } 523 | ], 524 | "source": [ 525 | "micro = 'µ'\n", 526 | "micro_cf = micro.casefold() \n", 527 | "print(name(micro_cf))\n", 528 | "print(micro, micro_cf)\n", 529 | "eszett = 'ß'\n", 530 | "print(name(eszett))\n", 531 | "eszett_cf = eszett.casefold()\n", 532 | "print(eszett, eszett_cf)" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": 83, 538 | "metadata": { 539 | "collapsed": false 540 | }, 541 | "outputs": [ 542 | { 543 | "name": "stdout", 544 | "output_type": "stream", 545 | "text": [ 546 | "False\n", 547 | "True\n", 548 | "False\n", 549 | "False\n", 550 | "False\n", 551 | "True\n", 552 | "True\n", 553 | "True\n" 554 | ] 555 | } 556 | ], 557 | "source": [ 558 | "from unicodedata import normalize \n", 559 | "\n", 560 | "def nfc_equal(str1, str2):\n", 561 | " return normalize('NFC', str1) == normalize('NFC', str2)\n", 562 | "\n", 563 | "def fold_equal(str1, str2):\n", 564 | " return (normalize('NFC', str1).casefold() ==\n", 565 | " normalize('NFC', str2).casefold())\n", 566 | "\n", 567 | "s1 = 'café'\n", 568 | "s2 = 'cafe\\u0301'\n", 569 | "#different é\n", 570 | "print(s1 == s2)\n", 571 | "#normalise them - ok\n", 572 | "print(nfc_equal(s1, s2))\n", 573 | "#normalising doesn't work, because both have valid, but different code points\n", 574 | "print(nfc_equal('A', 'a'))\n", 575 | "\n", 576 | "s3 = 'Straße'\n", 577 | 
"s4 = 'strasse'\n", 578 | "#not equal\n", 579 | "print(s3 == s4)\n", 580 | "#NFC normalisation alone does not make them equal: ß is not decomposed\n", 581 | "print(nfc_equal(s3, s4))\n", 582 | "#case folding transforms the Eszett (ß) into ss, so they match\n", 583 | "print(fold_equal(s3, s4))\n", 584 | "# é is normalised\n", 585 | "print(fold_equal(s1, s2))\n", 586 | "# cases are matched during casefold\n", 587 | "print(fold_equal('A', 'a'))" 588 | ] 589 | }, 590 | { 591 | "cell_type": "markdown", 592 | "metadata": {}, 593 | "source": [ 594 | "## Extreme “normalization”: taking out diacritics" 595 | ] 596 | }, 597 | { 598 | "cell_type": "code", 599 | "execution_count": 100, 600 | "metadata": { 601 | "collapsed": false 602 | }, 603 | "outputs": [ 604 | { 605 | "name": "stdout", 606 | "output_type": "stream", 607 | "text": [ 608 | "“Herr Voß: • ½ cup of ŒtkerTM caffe latte • bowl of acai.”\n", 609 | "\"Herr Voß: - ½ cup of OEtkerTM caffè latte - bowl of açaí.\"\n", 610 | "\"Herr Voss: - 1⁄2 cup of OEtkerTM caffe latte - bowl of acai.\"\n" 611 | ] 612 | } 613 | ], 614 | "source": [ 615 | "import unicodedata\n", 616 | "import string\n", 617 | "\n", 618 | "def shave_marks(txt):\n", 619 | " \"\"\"Remove all diacritic marks\"\"\"\n", 620 | " norm_txt = unicodedata.normalize('NFD', txt) \n", 621 | " shaved = ''.join(c for c in norm_txt\n", 622 | " if not unicodedata.combining(c)) \n", 623 | " return unicodedata.normalize('NFC', shaved)\n", 624 | "\n", 625 | "def shave_marks_latin(txt):\n", 626 | " \"\"\"Remove all diacritic marks from Latin base characters\"\"\" \n", 627 | " norm_txt = unicodedata.normalize('NFD', txt)\n", 628 | " latin_base = False\n", 629 | " keepers = []\n", 630 | " for c in norm_txt:\n", 631 | " if unicodedata.combining(c) and latin_base: \n", 632 | " continue # ignore diacritic on Latin base char\n", 633 | " keepers.append(c)\n", 634 | " # if it isn't combining char, it's a new base char \n", 635 | " if not unicodedata.combining(c):\n", 636 | " latin_base = c in string.ascii_letters \n", 637 | " shaved = 
''.join(keepers)\n", 638 | " return unicodedata.normalize('NFC', shaved)\n", 639 | "\n", 640 | "single_map = str.maketrans(\"\"\"‚ƒ„†ˆ‹‘’“”•–—~›\"\"\",\n", 641 | " \"\"\"'f\"*^<''\"\"---~>\"\"\")\n", 642 | "\n", 643 | "multi_map = str.maketrans({\n", 644 | " '€': '',\n", 645 | " 'Œ': 'OE',\n", 646 | " '‰': '',\n", 647 | "})\n", 648 | "\n", 649 | "multi_map.update(single_map)\n", 650 | "\n", 651 | "def dewinize(txt):\n", 652 | " \"\"\"Replace Win1252 symbols with ASCII chars or sequences\"\"\" \n", 653 | " return txt.translate(multi_map)\n", 654 | "\n", 655 | "def asciize(txt):\n", 656 | " no_marks = shave_marks_latin(dewinize(txt)) \n", 657 | " no_marks = no_marks.replace('ß', 'ss')\n", 658 | " return unicodedata.normalize('NFKC', no_marks)\n", 659 | "\n", 660 | "order = '“Herr Voß: • ½ cup of ŒtkerTM caffè latte • bowl of açaí.”'\n", 661 | "\n", 662 | "print(shave_marks(order))\n", 663 | "print(dewinize(order))\n", 664 | "print(asciize(order))" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": 106, 670 | "metadata": { 671 | "collapsed": false 672 | }, 673 | "outputs": [ 674 | { 675 | "name": "stdout", 676 | "output_type": "stream", 677 | "text": [ 678 | "['acerola', 'açaí', 'atemoia', 'cajá', 'caju']\n", 679 | "['acerola', 'açaí', 'atemoia', 'cajá', 'caju']\n" 680 | ] 681 | } 682 | ], 683 | "source": [ 684 | "## Doesn't work on OSX?\n", 685 | "fruits = ['caju', 'atemoia', 'cajá', 'açaí', 'acerola']\n", 686 | "print(sorted(fruits))\n", 687 | "\n", 688 | "import locale\n", 689 | "locale.setlocale(locale.LC_COLLATE, 'pt_BR.UTF-8')\n", 690 | "fruits2 = ['caju', 'atemoia', 'cajá', 'açaí', 'acerola']\n", 691 | "sorted_fruits = sorted(fruits2, key=locale.strxfrm)\n", 692 | "print(sorted_fruits)\n", 693 | "\n", 694 | "#Module not found - need to install Django?\n", 695 | "#import pyuca\n", 696 | "#coll = pyuca.Collator()\n", 697 | "#fruits = ['caju', 'atemoia', 'cajá', 'açaí', 'acerola']\n", 698 | "#sorted_fruits = sorted(fruits, 
key=coll.sort_key)\n", 699 | "#sorted_fruits" 700 | ] 701 | }, 702 | { 703 | "cell_type": "markdown", 704 | "metadata": {}, 705 | "source": [ 706 | "## The Unicode database\n", 707 | "\n", 708 | "The Unicode standard provides an entire database that includes not only the table mapping code points to character names, but also lot of metadata about the individual characters and how they are related. For example:\n", 709 | "\n", 710 | "- the Unicode database records whether a character is printable, is a letter, is a decimal digit or is some other numeric symbol. " 711 | ] 712 | }, 713 | { 714 | "cell_type": "code", 715 | "execution_count": 107, 716 | "metadata": { 717 | "collapsed": false 718 | }, 719 | "outputs": [ 720 | { 721 | "name": "stdout", 722 | "output_type": "stream", 723 | "text": [ 724 | "U+0031\t 1 \tre_dig\tisdig\tisnum\t 1.00\tDIGIT ONE\n", 725 | "U+00bc\t ¼ \t-\t-\tisnum\t 0.25\tVULGAR FRACTION ONE QUARTER\n", 726 | "U+00b2\t ² \t-\tisdig\tisnum\t 2.00\tSUPERSCRIPT TWO\n", 727 | "U+0969\t ३ \tre_dig\tisdig\tisnum\t 3.00\tDEVANAGARI DIGIT THREE\n", 728 | "U+136b\t ፫ \t-\tisdig\tisnum\t 3.00\tETHIOPIC DIGIT THREE\n", 729 | "U+216b\t Ⅻ \t-\t-\tisnum\t12.00\tROMAN NUMERAL TWELVE\n", 730 | "U+2466\t ⑦ \t-\tisdig\tisnum\t 7.00\tCIRCLED DIGIT SEVEN\n", 731 | "U+2480\t ⒀ \t-\t-\tisnum\t13.00\tPARENTHESIZED NUMBER THIRTEEN\n", 732 | "U+3285\t ㊅ \t-\t-\tisnum\t 6.00\tCIRCLED IDEOGRAPH SIX\n" 733 | ] 734 | } 735 | ], 736 | "source": [ 737 | "import unicodedata\n", 738 | "import re\n", 739 | "\n", 740 | "re_digit = re.compile(r'\\d')\n", 741 | "sample = '1\\xbc\\xb2\\u0969\\u136b\\u216b\\u2466\\u2480\\u3285'\n", 742 | "\n", 743 | "#print the unicode codepoint\n", 744 | "for char in sample: print('U+%04x' % ord(char),\n", 745 | " #print the char\n", 746 | " char.center(6),\n", 747 | " #Show re_dig if character matches the r'\\d' regex.\n", 748 | " 're_dig' if re_digit.match(char) else '-',\n", 749 | " #Show isdig if char.isdigit() is True.\n", 750 | " 'isdig' if 
char.isdigit() else '-',\n", 751 | " #Show isnum if char.isnumeric() is True.\n", 752 | " 'isnum' if char.isnumeric() else '-', \n", 753 | " format(unicodedata.numeric(char), '5.2f'), \n", 754 | " unicodedata.name(char),\n", 755 | " sep='\\t')\n" 756 | ] 757 | }, 758 | { 759 | "cell_type": "markdown", 760 | "metadata": {}, 761 | "source": [ 762 | "## str versus bytes in regular expressions\n", 763 | "\n", 764 | "you can use regular expressions on str and bytes but in the second case bytes outside of the ASCII range are treated as non-digits and non-word characters." 765 | ] 766 | }, 767 | { 768 | "cell_type": "code", 769 | "execution_count": 109, 770 | "metadata": { 771 | "collapsed": false 772 | }, 773 | "outputs": [ 774 | { 775 | "name": "stdout", 776 | "output_type": "stream", 777 | "text": [ 778 | "Text\n", 779 | " 'Ramanujan saw ௧௭௨௯ as 1729 = 1³ + 12³ = 9³ + 10³.'\n", 780 | "Numbers\n", 781 | " str : ['௧௭௨௯', '1729', '1', '12', '9', '10']\n", 782 | " bytes: [b'1729', b'1', b'12', b'9', b'10']\n", 783 | "Words\n", 784 | " str : ['Ramanujan', 'saw', '௧௭௨௯', 'as', '1729', '1³', '12³', '9³', '10³']\n", 785 | " bytes: [b'Ramanujan', b'saw', b'as', b'1729', b'1', b'12', b'9', b'10']\n" 786 | ] 787 | } 788 | ], 789 | "source": [ 790 | "import re\n", 791 | "re_numbers_str = re.compile(r'\\d+')\n", 792 | "re_words_str = re.compile(r'\\w+')\n", 793 | "re_numbers_bytes = re.compile(rb'\\d+')\n", 794 | "re_words_bytes = re.compile(rb'\\w+')\n", 795 | "text_str = (\"Ramanujan saw \\u0be7\\u0bed\\u0be8\\u0bef\" \n", 796 | " \" as 1729 = 1³ + 12³ = 9³ + 10³.\")\n", 797 | "text_bytes = text_str.encode('utf_8')\n", 798 | "print('Text', repr(text_str), sep='\\n ') \n", 799 | "print('Numbers')\n", 800 | "print(' str :', re_numbers_str.findall(text_str)) \n", 801 | "print(' bytes:', re_numbers_bytes.findall(text_bytes)) \n", 802 | "print('Words')\n", 803 | "print(' str :', re_words_str.findall(text_str)) \n", 804 | "print(' bytes:', re_words_bytes.findall(text_bytes))" 805 | ] 
806 | } 807 | ], 808 | "metadata": { 809 | "kernelspec": { 810 | "display_name": "Python 3", 811 | "language": "python", 812 | "name": "python3" 813 | }, 814 | "language_info": { 815 | "codemirror_mode": { 816 | "name": "ipython", 817 | "version": 3 818 | }, 819 | "file_extension": ".py", 820 | "mimetype": "text/x-python", 821 | "name": "python", 822 | "nbconvert_exporter": "python", 823 | "pygments_lexer": "ipython3", 824 | "version": "3.6.0" 825 | } 826 | }, 827 | "nbformat": 4, 828 | "nbformat_minor": 2 829 | } 830 | -------------------------------------------------------------------------------- /Chapter 05 - First-class functions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# First-class functions\n", 8 | "\n", 9 | "First-class objects are those that can be:\n", 10 | "\n", 11 | "- created at runtime;\n", 12 | "- assigned to a variable or element in a data structure;\n", 13 | "- passed as an argument to a function;\n", 14 | "- returned as the result of a function.\n", 15 | "\n" 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 4, 21 | "metadata": { 22 | "collapsed": false 23 | }, 24 | "outputs": [ 25 | { 26 | "name": "stdout", 27 | "output_type": "stream", 28 | "text": [ 29 | "1405006117752879898543142606244511569936384000000000\n", 30 | "returns n!\n", 31 | "<class 'function'>\n", 32 | "\n", 33 | "120\n", 34 | "\n", 35 | "[1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800]\n" 36 | ] 37 | } 38 | ], 39 | "source": [ 40 | "#function is created at runtime\n", 41 | "def factorial(n):\n", 42 | " '''returns n!'''\n", 43 | " return 1 if n < 2 else n * factorial(n-1)\n", 44 | "\n", 45 | "print(factorial(42))\n", 46 | "print(factorial.__doc__)\n", 47 | "print(type(factorial))\n", 48 | "#is assigned to a variable\n", 49 | "fact = factorial\n", 50 | "print(fact)\n", 51 | "print(fact(5))\n", 52 | "#is passed as an argument\n", 
53 | "print(map(factorial,range(11)))\n", 54 | "#factorial returned from a higher-order function\n", 55 | "print(list(map(factorial,range(11))))" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "## Higher-order functions\n", 63 | "\n", 64 | "A function that takes a function as argument or returns a function as result is a higher-order function. One example is map." 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 8, 70 | "metadata": { 71 | "collapsed": false 72 | }, 73 | "outputs": [ 74 | { 75 | "name": "stdout", 76 | "output_type": "stream", 77 | "text": [ 78 | "['fig', 'apple', 'cherry', 'banana', 'raspberry', 'strawberry']\n", 79 | "['banana', 'apple', 'fig', 'raspberry', 'strawberry', 'cherry']\n", 80 | "[1, 1, 2, 6, 24, 120]\n", 81 | "[1, 1, 2, 6, 24, 120]\n", 82 | "[1, 6, 120]\n", 83 | "[1, 6, 120]\n" 84 | ] 85 | } 86 | ], 87 | "source": [ 88 | "fruits = ['strawberry', 'fig', 'apple', 'cherry', 'raspberry', 'banana']\n", 89 | "print(sorted(fruits, key=len))\n", 90 | "\n", 91 | "def reverse(word):\n", 92 | " return word[::-1]\n", 93 | " reverse('testing')\n", 94 | "\n", 95 | "print(sorted(fruits, key=reverse))\n", 96 | "\n", 97 | "#Functional languages commonly offer the map, filter and reduce higher-order functions\n", 98 | "#A listcomp or a genexp does the job of map and filter combined, but is more readable\n", 99 | "print(list(map(fact, range(6))))\n", 100 | "print([fact(n) for n in range(6)])\n", 101 | "print(list(map(factorial, filter(lambda n: n % 2, range(6)))))\n", 102 | "print([factorial(n) for n in range(6) if n % 2])" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "metadata": {}, 108 | "source": [ 109 | "## Anonymous functions\n", 110 | "\n", 111 | "The lambda keyword creates an anonymous function within a Python expression." 
112 | ] 113 | }, 114 | { 115 | "cell_type": "code", 116 | "execution_count": 9, 117 | "metadata": { 118 | "collapsed": false 119 | }, 120 | "outputs": [ 121 | { 122 | "data": { 123 | "text/plain": [ 124 | "['banana', 'apple', 'fig', 'raspberry', 'strawberry', 'cherry']" 125 | ] 126 | }, 127 | "execution_count": 9, 128 | "metadata": {}, 129 | "output_type": "execute_result" 130 | } 131 | ], 132 | "source": [ 133 | "fruits = ['strawberry', 'fig', 'apple', 'cherry', 'raspberry', 'banana']\n", 134 | "sorted(fruits, key=lambda word: word[::-1])" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "## The seven flavors of callable objects\n", 142 | "The call operator, i.e. (), may be applied to other objects beyond user-defined functions.\n", 143 | "\n", 144 | "- User-defined functions: created with def statements or lambda expressions. \n", 145 | "- Built-in functions: functions implemented in C (for CPython), like len. \n", 146 | "- Built-in methods: methods implemented in C, like dict.get.\n", 147 | "- Methods: functions defined in the body of a class.\n", 148 | "- Classes: when invoked, a class runs its __new__ method to create an instance, then __init__ to initialize it, and finally the instance is returned to the caller.\n", 149 | "- Class instances: if a class defines a __call__ method, then its instances may be invoked as functions.\n", 150 | "- Generator functions: functions or methods that use the yield keyword." 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 38, 156 | "metadata": { 157 | "collapsed": false 158 | }, 159 | "outputs": [ 160 | { 161 | "name": "stdout", 162 | "output_type": "stream", 163 | "text": [ 164 | "68\n", 165 | "86\n" 166 | ] 167 | } 168 | ], 169 | "source": [ 170 | "#A user-defined class whose instances are callable because it implements __call__\n", 171 | "import random \n", 172 | "\n", 173 | "class BingoCage:\n", 174 | " \n", 175 | " def __init__(self, items): \n", 176 | " self._items = list(items) 
\n", 177 | " random.shuffle(self._items)\n", 178 | " \n", 179 | " def pick(self): \n", 180 | " try:\n", 181 | " return self._items.pop() \n", 182 | " except IndexError:\n", 183 | " raise LookupError('pick from empty BingoCage') \n", 184 | " \n", 185 | " def __call__(self):\n", 186 | " return self.pick()\n", 187 | "\n", 188 | "bingo = BingoCage(range(100))\n", 189 | "#call via pick\n", 190 | "print(bingo.pick())\n", 191 | "#call using call()\n", 192 | "print(bingo())" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 3, 198 | "metadata": { 199 | "collapsed": false 200 | }, 201 | "outputs": [ 202 | { 203 | "name": "stdout", 204 | "output_type": "stream", 205 | "text": [ 206 | "['__annotations__', '__call__', '__class__', '__closure__', '__code__', '__defaults__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__get__', '__getattribute__', '__globals__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__kwdefaults__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__']\n", 207 | "['__annotations__', '__call__', '__closure__', '__code__', '__defaults__', '__get__', '__globals__', '__kwdefaults__', '__name__', '__qualname__']\n" 208 | ] 209 | } 210 | ], 211 | "source": [ 212 | "def factorial(n):\n", 213 | " '''returns n!'''\n", 214 | " return 1 if n < 2 else n * factorial(n-1)\n", 215 | "\n", 216 | "print(dir(factorial))\n", 217 | "\n", 218 | "# empty user-defined class\n", 219 | "class C: pass\n", 220 | "# instantiated\n", 221 | "obj = C()\n", 222 | "# empty function\n", 223 | "def func():pass\n", 224 | "# attributes that exist in a function, but not instance\n", 225 | "print(sorted(set(dir(func)) - set(dir(obj))))" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 32, 231 | "metadata": { 232 | "collapsed": false 233 | }, 234 | "outputs": [ 
235 | { 236 | "name": "stdout", 237 | "output_type": "stream", 238 | "text": [ 239 | "NAME TYPE DESCRIPTION \n", 240 | "__annotations__ dict parameter and return annotations\n", 241 | "__call__ method-wrapper implementation of the () operator; a.k.a. the callable object protocol\n", 242 | "__closure__ tuple the function closure, i.e. bindings for free variables (often isNone)\n", 243 | "__code__ code function metadata and function body compiled into bytecode\n", 244 | "__defaults__ tuple default values for the formal parameters\n", 245 | "__get__ method-wrapper implementation of the read-only descriptor protocol (see XREF)\n", 246 | "__globals__ dict global variables of the module where the function is defined\n", 247 | "__kwdefaults__ dict default values for the keyword-only formal parameters\n", 248 | "__name__ str the function name \n", 249 | "__qualname__ str the qualified function name, ex.:Random.choice(see PEP-3155)\n" 250 | ] 251 | } 252 | ], 253 | "source": [ 254 | "data = [['NAME', 'TYPE', 'DESCRIPTION'],\n", 255 | "['__annotations__', 'dict', 'parameter and return annotations'],\n", 256 | "['__call__', 'method-wrapper', 'implementation of the () operator; a.k.a. the callable object protocol'],\n", 257 | "['__closure__', 'tuple', 'the function closure, i.e. 
bindings for free variables (often isNone)'],\n", 258 | "['__code__', 'code', 'function metadata and function body compiled into bytecode'],\n", 259 | "['__defaults__', 'tuple', 'default values for the formal parameters'],\n", 260 | "['__get__', 'method-wrapper', 'implementation of the read-only descriptor protocol (see XREF)'],\n", 261 | "['__globals__', 'dict', 'global variables of the module where the function is defined'],\n", 262 | "['__kwdefaults__', 'dict','default values for the keyword-only formal parameters'],\n", 263 | "['__name__', 'str', 'the function name',],\n", 264 | "['__qualname__', 'str','the qualified function name, ex.:Random.choice(see PEP-3155)']]\n", 265 | "\n", 266 | "\n", 267 | "col_width = max(len(word) for row in data for word in row) - 50 # padding\n", 268 | "for row in data:\n", 269 | " print(\"\".join(word.ljust(col_width) for word in row))\n", 270 | "\n", 271 | " " 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "## From positional to keyword-only parameters" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": 38, 284 | "metadata": { 285 | "collapsed": false 286 | }, 287 | "outputs": [ 288 | { 289 | "name": "stdout", 290 | "output_type": "stream", 291 | "text": [ 292 | "
<br />\n", 293 | "<p>hello</p>\n", 294 | "<p>hello</p>\n", 295 | "<p>world</p>\n", 296 | "<p id=\"33\">hello</p>\n", 297 | "<p class=\"sidebar\">hello</p>\n", 298 | "<p class=\"sidebar\">world</p>\n", 299 | "<img content=\"testing\" />\n", 300 | "<img class=\"framed\" src=\"sunset.jpg\" title=\"Sunset Boulevard\" />\n" 301 | ] 302 | } 303 | ], 304 | "source": [ 305 | "def tag(name, *content, cls=None, **attrs): \n", 306 | " \"\"\"Generate one or more HTML tags\"\"\" \n", 307 | " if cls is not None:\n", 308 | " attrs['class'] = cls \n", 309 | " if attrs:\n", 310 | " attr_str = ''.join(' %s=\"%s\"' % (attr, value) \n", 311 | " for attr, value in sorted(attrs.items()))\n", 312 | " else:\n", 313 | " attr_str = ''\n", 314 | " if content:\n", 315 | " return '\\n'.join('<%s%s>%s</%s>' % (name, attr_str, c, name) for c in content)\n", 316 | " else:\n", 317 | " return '<%s%s />' % (name, attr_str)\n", 318 | "\n", 319 | "#a single positional argument is bound to name and produces an empty tag\n", 320 | "print(tag('br'))\n", 321 | "#positional arguments after the first are captured by *content\n", 322 | "print(tag('p', 'hello'))\n", 323 | "print(tag('p', 'hello', 'world'))\n", 324 | "#keyword arguments not named in the signature are captured by **attrs\n", 325 | "print(tag('p', 'hello', id=33))\n", 326 | "#The cls parameter can only be passed as a keyword argument, never positionally\n", 327 | "print(tag('p', 'hello', 'world', cls='sidebar'))\n", 328 | "#Even the first positional argument can be passed as a keyword; here content ends up in **attrs\n", 329 | "print(tag(content='testing', name=\"img\"))\n", 330 | "#Prefixing the my_tag dict with ** passes all its items as separate keyword arguments\n", 331 | "my_tag = {'name': 'img', 'title': 'Sunset Boulevard','src': 'sunset.jpg', 'cls': 'framed'}\n", 332 | "print(tag(**my_tag))" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": {}, 338 | "source": [ 339 | "## Retrieving information about parameters\n", 340 | "\n", 341 | "How does Bobo (an HTTP micro-framework) know which parameter names a function requires, and whether they have default values?\n", 342 | "\n", 343 | "Within a function object, the `__defaults__` attribute holds a tuple with the default values of positional and keyword arguments. The defaults for keyword-only arguments appear in `__kwdefaults__`. 
The names of the arguments, however, are found within the `__code__` attribute, which is a reference to a code object with many attributes of its own." 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": null, 349 | "metadata": { 350 | "collapsed": true 351 | }, 352 | "outputs": [], 353 | "source": [ 354 | "import bobo\n", 355 | "\n", 356 | "@bobo.query('/') \n", 357 | "def hello(person):\n", 358 | " return 'Hello %s!' % person\n", 359 | "\n", 360 | "#curl -i http://localhost:8080/\n", 361 | "#HTTP/1.0 403 Forbidden\n", 362 | "\n", 363 | "#curl -i http://localhost:8080/?person=Jim\n", 364 | "#Hello Jim!" 365 | ] 366 | }, 367 | { 368 | "cell_type": "code", 369 | "execution_count": 43, 370 | "metadata": { 371 | "collapsed": false 372 | }, 373 | "outputs": [ 374 | { 375 | "name": "stdout", 376 | "output_type": "stream", 377 | "text": [ 378 | "(80,)\n", 379 | "('text', 'max_len', 'end', 'space_before', 'space_after')\n", 380 | "2\n", 381 | "(text, max_len=80)\n", 382 | "POSITIONAL_OR_KEYWORD : text = <class 'inspect._empty'>\n", 383 | "POSITIONAL_OR_KEYWORD : max_len = 80\n" 384 | ] 385 | } 386 | ], 387 | "source": [ 388 | "from inspect import signature\n", 389 | "\n", 390 | "def clip(text, max_len=80):\n", 391 | " \"\"\"Return text clipped at the last space before or after max_len \"\"\"\n", 392 | " end = None\n", 393 | " if len(text) > max_len:\n", 394 | " space_before = text.rfind(' ', 0, max_len) \n", 395 | " if space_before >= 0:\n", 396 | " end = space_before \n", 397 | " else:\n", 398 | " space_after = text.rfind(' ', max_len) \n", 399 | " if space_after >= 0:\n", 400 | " end = space_after\n", 401 | " if end is None: # no spaces were found\n", 402 | " end = len(text) \n", 403 | " return text[:end].rstrip()\n", 404 | "\n", 405 | "print(clip.__defaults__)\n", 406 | "print(clip.__code__.co_varnames)  # parameter names first, then local variables\n", 407 | "print(clip.__code__.co_argcount)  # how many of those names are positional parameters\n", 408 | "\n", 409 | "#inspect.signature returns an inspect.Signature object, \n", 410 | "#which has a parameters attribute 
that lets you read an ordered mapping of names to inspect\n", 411 | "sig = signature(clip)\n", 412 | "print(str(sig))\n", 413 | "\n", 414 | "for name, param in sig.parameters.items():\n", 415 | " print(param.kind, ':', name, '=', param.default)" 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": 50, 421 | "metadata": { 422 | "collapsed": false 423 | }, 424 | "outputs": [ 425 | { 426 | "name": "stdout", 427 | "output_type": "stream", 428 | "text": [ 429 | "\n", 430 | "name = img\n", 431 | "cls = framed\n", 432 | "attrs = {'title': 'Sunset Boulevard', 'src': 'sunset.jpg'}\n" 433 | ] 434 | }, 435 | { 436 | "ename": "TypeError", 437 | "evalue": "missing a required argument: 'name'", 438 | "output_type": "error", 439 | "traceback": [ 440 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 441 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 442 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 28\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 29\u001b[0m \u001b[0;32mdel\u001b[0m \u001b[0mmy_tag\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;34m'name'\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 30\u001b[0;31m \u001b[0mbound_args\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0msig\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbind\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mmy_tag\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 443 | "\u001b[0;32m/anaconda/lib/python3.6/inspect.py\u001b[0m in \u001b[0;36mbind\u001b[0;34m(*args, **kwargs)\u001b[0m\n\u001b[1;32m 2932\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mthe\u001b[0m \u001b[0mpassed\u001b[0m \u001b[0marguments\u001b[0m \u001b[0mcan\u001b[0m \u001b[0;32mnot\u001b[0m \u001b[0mbe\u001b[0m \u001b[0mbound\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2933\u001b[0m \"\"\"\n\u001b[0;32m-> 2934\u001b[0;31m 
\u001b[0;32mreturn\u001b[0m \u001b[0margs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m0\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_bind\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2935\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2936\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mbind_partial\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 444 | "\u001b[0;32m/anaconda/lib/python3.6/inspect.py\u001b[0m in \u001b[0;36m_bind\u001b[0;34m(self, args, kwargs, partial)\u001b[0m\n\u001b[1;32m 2847\u001b[0m \u001b[0mmsg\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m'missing a required argument: {arg!r}'\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2848\u001b[0m \u001b[0mmsg\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mmsg\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0marg\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mparam\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mname\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2849\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mTypeError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mmsg\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;32mfrom\u001b[0m \u001b[0;32mNone\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2850\u001b[0m \u001b[0;32melse\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 2851\u001b[0m \u001b[0;31m# We have a positional argument to process\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 445 | "\u001b[0;31mTypeError\u001b[0m: missing a required argument: 'name'" 446 | ] 447 | } 448 | ], 449 | "source": [ 
450 | "import inspect\n", 451 | "\n", 452 | "def tag(name, *content, cls=None, **attrs): \n", 453 | " \"\"\"Generate one or more HTML tags\"\"\" \n", 454 | " if cls is not None:\n", 455 | " attrs['class'] = cls \n", 456 | " if attrs:\n", 457 | " attr_str = ''.join(' %s=\"%s\"' % (attr, value) \n", 458 | " for attr, value in sorted(attrs.items()))\n", 459 | " else:\n", 460 | " attr_str = ''\n", 461 | " if content:\n", 462 | " return '\\n'.join('<%s%s>%s</%s>' % (name, attr_str, c, name) for c in content)\n", 463 | " else:\n", 464 | " return '<%s%s />' % (name, attr_str)\n", 465 | "\n", 466 | "sig = inspect.signature(tag)\n", 467 | "my_tag = {'name': 'img', 'title': 'Sunset Boulevard', 'src': 'sunset.jpg', 'cls': 'framed'}\n", 468 | "bound_args = sig.bind(**my_tag)\n", 469 | "print(bound_args)\n", 470 | "\n", 471 | "for name, value in bound_args.arguments.items():\n", 472 | " print(name, '=', value)\n", 473 | "\n", 474 | "del my_tag['name']\n", 475 | "#the required name argument is now missing, so bind raises TypeError\n", 476 | "bound_args = sig.bind(**my_tag)\n" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": {}, 482 | "source": [ 483 | "## Function annotations\n", 484 | "\n", 485 | "Python 3 provides syntax to attach metadata to the parameters of a function declaration and to its return value (for example, `-> str`). Annotations have no meaning to the Python interpreter. They are just metadata that may be used by tools, such as IDEs, frameworks and decorators." 
486 | ] 487 | }, 488 | { 489 | "cell_type": "code", 490 | "execution_count": 51, 491 | "metadata": { 492 | "collapsed": false 493 | }, 494 | "outputs": [ 495 | { 496 | "name": "stdout", 497 | "output_type": "stream", 498 | "text": [ 499 | "{'text': <class 'str'>, 'max_len': 'int > 0', 'return': <class 'str'>}\n" 500 | ] 501 | } 502 | ], 503 | "source": [ 504 | "def clip(text:str, max_len:'int > 0'=80) -> str:\n", 505 | " \"\"\"Return text clipped at the last space before or after max_len \"\"\"\n", 506 | " end = None\n", 507 | " if len(text) > max_len:\n", 508 | " space_before = text.rfind(' ', 0, max_len) \n", 509 | " if space_before >= 0:\n", 510 | " end = space_before \n", 511 | " else:\n", 512 | " space_after = text.rfind(' ', max_len) \n", 513 | " if space_after >= 0:\n", 514 | " end = space_after\n", 515 | " if end is None: # no spaces were found\n", 516 | " end = len(text) \n", 517 | " return text[:end].rstrip()\n", 518 | "\n", 519 | "print(clip.__annotations__)" 520 | ] 521 | }, 522 | { 523 | "cell_type": "markdown", 524 | "metadata": {}, 525 | "source": [ 526 | "Although Guido makes it clear that Python does not aim to be a functional programming language, a functional coding style can be used to good extent, thanks to the support of modules such as `operator` and `functools`.\n", 527 | "\n", 528 | "## The operator module" 529 | ] 530 | }, 531 | { 532 | "cell_type": "code", 533 | "execution_count": 54, 534 | "metadata": { 535 | "collapsed": false 536 | }, 537 | "outputs": [ 538 | { 539 | "name": "stdout", 540 | "output_type": "stream", 541 | "text": [ 542 | "('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833))\n", 543 | "('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889))\n", 544 | "('Tokyo', 'JP', 36.933, (35.689722, 139.691667))\n", 545 | "('Mexico City', 'MX', 20.142, (19.433333, -99.133333))\n", 546 | "('New York-Newark', 'US', 20.104, (40.808611, -74.020386))\n", 547 | "('JP', 'Tokyo')\n", 548 | "('IN', 'Delhi NCR')\n", 549 | "('MX', 'Mexico City')\n", 550 | "('US', 'New 
York-Newark')\n", 551 | "('BR', 'Sao Paulo')\n" 552 | ] 553 | } 554 | ], 555 | "source": [ 556 | "from functools import reduce\n", 557 | "from operator import mul \n", 558 | "\n", 559 | "#Factorial implemented with reduce and operator.mul.\n", 560 | "def fact(n):\n", 561 | " return reduce(mul, range(1, n+1))\n", 562 | "\n", 563 | "metro_data = [\n", 564 | "('Tokyo', 'JP', 36.933, (35.689722, 139.691667)),\n", 565 | "('Delhi NCR', 'IN', 21.935, (28.613889, 77.208889)),\n", 566 | "('Mexico City', 'MX', 20.142, (19.433333, -99.133333)),\n", 567 | "('New York-Newark', 'US', 20.104, (40.808611, -74.020386)),\n", 568 | "('Sao Paulo', 'BR', 19.649, (-23.547778, -46.635833)), ]\n", 569 | "\n", 570 | "from operator import itemgetter\n", 571 | "\n", 572 | "#sorting a list of tuples by the value of one field\n", 573 | "for city in sorted(metro_data, key=itemgetter(1)):\n", 574 | " print(city)\n", 575 | "\n", 576 | "cc_name = itemgetter(1, 0)\n", 577 | "\n", 578 | "#If you pass multiple index arguments to itemgetter, \n", 579 | "#the function it builds will return tuples with the extracted values:\n", 580 | "for city in metro_data: \n", 581 | " print(cc_name(city))" 582 | ] 583 | }, 584 | { 585 | "cell_type": "code", 586 | "execution_count": 58, 587 | "metadata": { 588 | "collapsed": false 589 | }, 590 | "outputs": [ 591 | { 592 | "name": "stdout", 593 | "output_type": "stream", 594 | "text": [ 595 | "Metropolis(name='Tokyo', cc='JP', pop=36.933, coord=LatLong(lat=35.689722, long=139.691667))\n", 596 | "35.689722\n", 597 | "('Sao Paulo', -23.547778)\n", 598 | "('Mexico City', 19.433333)\n", 599 | "('Delhi NCR', 28.613889)\n", 600 | "('Tokyo', 35.689722)\n", 601 | "('New York-Newark', 40.808611)\n" 602 | ] 603 | } 604 | ], 605 | "source": [ 606 | "from collections import namedtuple\n", 607 | "\n", 608 | "#using named tuples\n", 609 | "LatLong = namedtuple('LatLong', 'lat long')\n", 610 | "Metropolis = namedtuple('Metropolis', 'name cc pop coord')\n", 611 | "#build nested list 
from metro_data, using a list comprehension with nested tuple unpacking\n", 612 | "metro_areas = [Metropolis(name, cc, pop, LatLong(lat, long))\n", 613 | " for name, cc, pop, (lat, long) in metro_data]\n", 614 | "\n", 615 | "print(metro_areas[0])\n", 616 | "print(metro_areas[0].coord.lat)\n", 617 | "\n", 618 | "from operator import attrgetter\n", 619 | "\n", 620 | "#attrgetter builds a function that extracts the named attributes; dotted names reach nested attributes\n", 621 | "name_lat = attrgetter('name', 'coord.lat')\n", 622 | "\n", 623 | "for city in sorted(metro_areas, key=attrgetter('coord.lat')):\n", 624 | " print(name_lat(city))" 625 | ] 626 | }, 627 | { 628 | "cell_type": "code", 629 | "execution_count": 59, 630 | "metadata": { 631 | "collapsed": false 632 | }, 633 | "outputs": [ 634 | { 635 | "name": "stdout", 636 | "output_type": "stream", 637 | "text": [ 638 | "THE TIME HAS COME\n", 639 | "The-time-has-come\n" 640 | ] 641 | } 642 | ], 643 | "source": [ 644 | "#methodcaller creates a function on the fly that calls the named method on its argument\n", 645 | "from operator import methodcaller\n", 646 | "\n", 647 | "s = 'The time has come'\n", 648 | "\n", 649 | "upcase = methodcaller('upper') \n", 650 | "print(upcase(s))\n", 651 | "\n", 652 | "hyphenate = methodcaller('replace', ' ', '-') \n", 653 | "print(hyphenate(s))" 654 | ] 655 | }, 656 | { 657 | "cell_type": "markdown", 658 | "metadata": {}, 659 | "source": [ 660 | "## Freezing arguments with functools.partial\n", 661 | "\n", 662 | "The `functools` module brings together a handful of higher-order functions. The best known of them is probably `reduce`. `functools.partial` is a higher-order function that allows partial application of a function. Given a function, a partial application produces a new callable with some of the arguments of the original function fixed." 
663 | ] 664 | }, 665 | { 666 | "cell_type": "code", 667 | "execution_count": 60, 668 | "metadata": { 669 | "collapsed": false 670 | }, 671 | "outputs": [ 672 | { 673 | "name": "stdout", 674 | "output_type": "stream", 675 | "text": [ 676 | "21\n", 677 | "[3, 6, 9, 12, 15, 18, 21, 24, 27]\n" 678 | ] 679 | } 680 | ], 681 | "source": [ 682 | "from operator import mul\n", 683 | "from functools import partial\n", 684 | "#partial application of a function - the first positional argument of mul is bound to 3\n", 685 | "triple = partial(mul, 3)\n", 686 | "print(triple(7))\n", 687 | "print(list(map(triple, range(1, 10))))" 688 | ] 689 | } 690 | ], 691 | "metadata": { 692 | "kernelspec": { 693 | "display_name": "Python 3", 694 | "language": "python", 695 | "name": "python3" 696 | }, 697 | "language_info": { 698 | "codemirror_mode": { 699 | "name": "ipython", 700 | "version": 3 701 | }, 702 | "file_extension": ".py", 703 | "mimetype": "text/x-python", 704 | "name": "python", 705 | "nbconvert_exporter": "python", 706 | "pygments_lexer": "ipython3", 707 | "version": "3.6.0" 708 | } 709 | }, 710 | "nbformat": 4, 711 | "nbformat_minor": 2 712 | } 713 | -------------------------------------------------------------------------------- /Chapter 06 - Design patterns with first-class functions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Design patterns with first-class functions\n", 8 | "\n", 9 | "## Case study: refactoring Strategy\n", 10 | "\n", 11 | "The strategy pattern is described as follows:\n", 12 | "\n", 13 | "Define a family of algorithms, encapsulate each one, and make them interchangeable. 
Strategy lets the algorithm vary independently from clients that use it.\n", 14 | "\n", 15 | "A clear example of Strategy applied in the e-commerce domain is computing discounts on orders according to attributes of the customer or of the ordered items.\n", 16 | "\n", 17 | "- Context: The entity that delegates to subordinate and interchangeable components.\n", 18 | "- Strategy: The interface common to the interchangeable components.\n", 19 | "- Concrete Strategy: One of the concrete subclasses of Strategy.\n", 20 | "\n", 21 | "## Implementation: Order class with pluggable discount strategies" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 9, 27 | "metadata": { 28 | "collapsed": false 29 | }, 30 | "outputs": [ 31 | { 32 | "name": "stdout", 33 | "output_type": "stream", 34 | "text": [ 35 | "<Order total: 42.00 due: 42.00>\n", 36 | "<Order total: 42.00 due: 39.90>\n", 37 | "<Order total: 30.00 due: 28.50>\n" 38 | ] 39 | } 40 | ], 41 | "source": [ 42 | "from abc import ABC, abstractmethod\n", 43 | "from collections import namedtuple\n", 44 | "\n", 45 | "Customer = namedtuple('Customer', 'name fidelity')\n", 46 | "\n", 47 | "\n", 48 | "class LineItem:\n", 49 | "\n", 50 | " def __init__(self, product, quantity, price):\n", 51 | " self.product = product\n", 52 | " self.quantity = quantity\n", 53 | " self.price = price\n", 54 | "\n", 55 | " def total(self):\n", 56 | " return self.price * self.quantity\n", 57 | "\n", 58 | "\n", 59 | "class Order: # the Context\n", 60 | "\n", 61 | " def __init__(self, customer, cart, promotion=None):\n", 62 | " self.customer = customer\n", 63 | " self.cart = list(cart)\n", 64 | " self.promotion = promotion\n", 65 | "\n", 66 | " def total(self):\n", 67 | " if not hasattr(self, '__total'):\n", 68 | " self.__total = sum(item.total() for item in self.cart)\n", 69 | " return self.__total\n", 70 | "\n", 71 | " def due(self):\n", 72 | " if self.promotion is None:\n", 73 | " discount = 0\n", 74 | " else:\n", 75 | " discount = self.promotion.discount(self)\n", 76 | " return self.total() - discount\n", 77 | "\n", 78 | " def __repr__(self):\n", 79 | " fmt = '<Order total: {:.2f} due: {:.2f}>'\n", 80 | " return fmt.format(self.total(), self.due())\n", 81 | "\n", 82 | "#the Strategy itself holds no state; it only declares the discount interface\n", 83 | "class Promotion(ABC): # the Strategy: an Abstract Base Class\n", 84 | "\n", 85 | " @abstractmethod\n", 86 | " def discount(self, order):\n", 87 | " \"\"\"Return discount as a positive dollar amount\"\"\"\n", 88 | "\n", 89 | "#each concrete strategy is a class\n", 90 | "class FidelityPromo(Promotion): # first Concrete Strategy\n", 91 | " \"\"\"5% discount for customers with 1000 or more fidelity points\"\"\"\n", 92 | "\n", 93 | " def discount(self, order):\n", 94 | " return order.total() * .05 if order.customer.fidelity >= 1000 else 0\n", 95 | "\n", 96 | "\n", 97 | "class BulkItemPromo(Promotion): # second Concrete Strategy\n", 98 | " \"\"\"10% discount for each LineItem with 20 or more units\"\"\"\n", 99 | "\n", 100 | " def discount(self, order):\n", 101 | " discount = 0\n", 102 | " for item in order.cart:\n", 103 | " if item.quantity >= 20:\n", 104 | " discount += item.total() * .1\n", 105 | " return discount\n", 106 | "\n", 107 | "\n", 108 | "class LargeOrderPromo(Promotion): # third Concrete Strategy\n", 109 | " \"\"\"7% discount for orders with 10 or more distinct items\"\"\"\n", 110 | "\n", 111 | " def discount(self, order):\n", 112 | " distinct_items = {item.product for item in order.cart} \n", 113 | " if len(distinct_items) >= 10:\n", 114 | " return order.total() * .07 \n", 115 | " return 0\n", 116 | "\n", 117 | "joe = Customer('John Doe', 0)\n", 118 | "ann = Customer('Ann Smith', 1100)\n", 119 | "cart = [LineItem('banana', 4, .5), LineItem('apple', 10, 1.5), LineItem('watermelon', 5, 5.0)]\n", 120 | "banana_cart = [LineItem('banana', 30, .5), LineItem('apple', 10, 1.5)]\n", 121 | "\n", 122 | "print(Order(joe, cart, FidelityPromo()))\n", 123 | "print(Order(ann, cart, FidelityPromo()))\n", 124 | "print(Order(joe, banana_cart, BulkItemPromo()))" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": {}, 130 | "source": [ 131 | "## Order class with discount strategies implemented as functions\n", 132 | "\n", 133 | "Each concrete strategy becomes a plain function that is passed to Order as an argument, which simplifies the Strategy pattern." 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 21, 139 | "metadata": { 140 | "collapsed": false 141 | }, 142 | "outputs": [ 143 | { 144 | "name": "stdout", 145 | "output_type": "stream", 146 | "text": [ 147 | "<Order total: 42.00 due: 42.00>\n", 148 | "<Order total: 42.00 due: 39.90>\n", 149 | "<Order total: 30.00 due: 28.50>\n", 150 | "<Order total: 10.00 due: 9.30>\n", 151 | "<Order total: 42.00 due: 42.00>\n", 152 | "<Order total: 10.00 due: 9.30>\n" 153 | ] 154 | } 155 | ], 156 | "source": [ 157 | "from collections import namedtuple\n", 158 | "import inspect\n", 159 | "\n", 160 | "#Option 1: hard-coded promo names\n", 161 | "# promos = [fidelity_promo, bulk_item_promo, large_order_promo]\n", 162 | "\n", 163 | "#Option 2: the promos list is built by introspection of the module global namespace.\n", 164 | "promos = [globals()[name] for name in globals() if name.endswith('_promo') and name != 'best_promo']\n", 165 | "\n", 166 | "#Option 3: the promos list is built by introspection of a separate promotions module (assumed to live in another file)\n", 167 | "# promos = [func for name, func in inspect.getmembers(promotions, inspect.isfunction)]\n", 168 | "\n", 169 | "Customer = namedtuple('Customer', 'name fidelity')\n", 170 | "\n", 171 | "class LineItem:\n", 172 | "\n", 173 | " def __init__(self, product, quantity, price):\n", 174 | " self.product = product\n", 175 | " self.quantity = quantity\n", 176 | " self.price = price\n", 177 | "\n", 178 | " def total(self):\n", 179 | " return self.price * self.quantity\n", 180 | "\n", 181 | "\n", 182 | "class Order: # the Context\n", 183 | "\n", 184 | " def __init__(self, customer, cart, promotion=None):\n", 185 | " self.customer = customer\n", 186 | " self.cart = list(cart)\n", 187 | " self.promotion = promotion\n", 188 | "\n", 189 | " def total(self):\n", 190 | " if not hasattr(self, '__total'):\n", 191 | " self.__total = sum(item.total() for item in self.cart)\n", 192 | " return self.__total\n", 193 | "\n", 194 | " def due(self):\n", 195 | " if self.promotion is None:\n", 196 | " discount = 0\n", 197 | " else:\n", 198 | " discount = self.promotion(self)\n", 199 | " return self.total() - discount\n", 200 | "\n", 201 | " def __repr__(self):\n", 202 | " fmt = '<Order total: {:.2f} due: {:.2f}>'\n", 203 | " return fmt.format(self.total(), self.due())\n", 204 | "\n", 205 | "\n", 206 | "def fidelity_promo(order):\n", 207 | " \"\"\"5% discount for customers with 1000 or more fidelity points\"\"\"\n", 208 | " return order.total() * .05 if order.customer.fidelity >= 1000 else 0\n", 209 | "\n", 210 | "\n", 211 | "def bulk_item_promo(order):\n", 212 | " \"\"\"10% discount for each LineItem with 20 or more units\"\"\"\n", 213 | " discount = 0\n", 214 | " for item in order.cart:\n", 215 | " if item.quantity >= 20:\n", 216 | " discount += item.total() * .1\n", 217 | " return discount\n", 218 | "\n", 219 | "\n", 220 | "def large_order_promo(order):\n", 221 | " \"\"\"7% discount for orders with 10 or more distinct items\"\"\"\n", 222 | " distinct_items = 
{item.product for item in order.cart}\n", 223 | " if len(distinct_items) >= 10:\n", 224 | " return order.total() * .07\n", 225 | " return 0\n", 226 | "promos = [globals()[name] for name in globals() if name.endswith('_promo') and name != 'best_promo']  # rebuilt here: on a fresh kernel the *_promo functions do not exist yet when the list at the top of the cell is built\n", 227 | "def best_promo(order):\n", 228 | " \"\"\"Select best discount available\"\"\"\n", 229 | " return max(promo(order) for promo in promos)\n", 230 | "\n", 231 | "#There is no need to instantiate a new promotion object with each new order: \n", 232 | "#the functions are ready to use and are passed to Order directly.\n", 233 | "\n", 234 | "joe = Customer('John Doe', 0)\n", 235 | "ann = Customer('Ann Smith', 1100)\n", 236 | "cart = [LineItem('banana', 4, .5), LineItem(\n", 237 | " 'apple', 10, 1.5), LineItem('watermelon', 5, 5.0)]\n", 238 | "\n", 239 | "print(Order(joe, cart, fidelity_promo))\n", 240 | "print(Order(ann, cart, fidelity_promo))\n", 241 | "banana_cart = [LineItem('banana', 30, .5), LineItem('apple', 10, 1.5)]\n", 242 | "print(Order(joe, banana_cart, bulk_item_promo))\n", 243 | "long_order = [LineItem(str(item_code), 1, 1.0) for item_code in range(10)]\n", 244 | "print(Order(joe, long_order, large_order_promo))\n", 245 | "print(Order(joe, cart, large_order_promo))\n", 246 | "\n", 247 | "#best_promo tries every function in promos and applies the largest discount\n", 248 | "print(Order(joe, long_order, best_promo))\n" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "## Command Pattern\n", 256 | "\n", 257 | "Command is another design pattern that can be simplified by the use of functions passed as arguments. The goal of Command is to decouple an object that invokes an operation (the Invoker) from the provider object that implements it (the Receiver).\n", 258 | "\n", 259 | "Instead of giving the Invoker a Command instance, we can simply give it a function. Instead of calling command.execute(), the Invoker can just call command(). 
" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": 24, 265 | "metadata": { 266 | "collapsed": false 267 | }, 268 | "outputs": [], 269 | "source": [ 270 | "class MacroCommand:\n", 271 | " \"\"\"A command that executes a list of commands\"\"\"\n", 272 | " def __init__(self, commands): \n", 273 | " self.commands = list(commands) # keep a local copy of the commands\n", 274 | " \n", 275 | " def __call__(self):\n", 276 | " for command in self.commands: # each command is itself a callable\n", 277 | " command()" 278 | ] 279 | } 280 | ], 281 | "metadata": { 282 | "kernelspec": { 283 | "display_name": "Python 3", 284 | "language": "python", 285 | "name": "python3" 286 | }, 287 | "language_info": { 288 | "codemirror_mode": { 289 | "name": "ipython", 290 | "version": 3 291 | }, 292 | "file_extension": ".py", 293 | "mimetype": "text/x-python", 294 | "name": "python", 295 | "nbconvert_exporter": "python", 296 | "pygments_lexer": "ipython3", 297 | "version": "3.6.0" 298 | } 299 | }, 300 | "nbformat": 4, 301 | "nbformat_minor": 2 302 | } 303 | -------------------------------------------------------------------------------- /Chapter 07 - Function decorators and closures.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Function decorators and closures\n", 8 | "\n", 9 | "Function decorators let us “mark” functions in the source code to enhance their behavior in some way. This is powerful stuff, but mastering it requires understanding closures.\n", 10 | "\n", 11 | "Aside from their application in decorators, closures are also essential for effective asynchronous programming with callbacks, and for coding in a functional style whenever it makes sense.\n", 12 | "\n", 13 | "## Decorators\n", 14 | "\n", 15 | "A decorator is a callable that takes another function as argument. Strictly speaking, decorators are just syntactic sugar. 
Two facts are crucial: a decorator can replace the decorated function with a different one, and decorators are executed immediately when a module is loaded." 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": 39, 21 | "metadata": { 22 | "collapsed": false 23 | }, 24 | "outputs": [ 25 | { 26 | "name": "stdout", 27 | "output_type": "stream", 28 | "text": [ 29 | "running inner()\n", 30 | "None\n", 31 | "<function deco.<locals>.inner at 0x11174f620>\n" 32 | ] 33 | } 34 | ], 35 | "source": [ 36 | "#@decorate\n", 37 | "#def target():\n", 38 | "# print('running target()')\n", 39 | " \n", 40 | "#Has the same effect as writing this:\n", 41 | "\n", 42 | "#def target():\n", 43 | "# print('running target()')\n", 44 | " \n", 45 | "#target = decorate(target)\n", 46 | "\n", 47 | "#live example\n", 48 | "\n", 49 | "def deco(func):\n", 50 | " def inner():\n", 51 | " print('running inner()')\n", 52 | " #deco returns the inner function object, replacing func\n", 53 | " return inner\n", 54 | "\n", 55 | "#deco used as a decorator\n", 56 | "@deco\n", 57 | "def target():\n", 58 | " print('running target()')\n", 59 | "\n", 60 | "#calling target() actually runs inner(), which prints and returns None\n", 61 | "print(target())\n", 62 | "#target is now a reference to inner\n", 63 | "print(target)" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": null, 69 | "metadata": { 70 | "collapsed": true 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "## Closure example - How to create a closure?\n", 75 | "\n", 76 | "# - create a nested function\n", 77 | "# - the nested function has to refer to a variable defined inside the enclosing function\n", 78 | "# - the enclosing function has to return the nested function" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 40, 84 | "metadata": { 85 | "collapsed": false 86 | }, 87 | "outputs": [ 88 | { 89 | "name": "stdout", 90 | "output_type": "stream", 91 | "text": [ 92 | "x = 5\n", 93 | "y = 5\n", 94 | "10\n", 95 | "x = 10\n", 96 | "y = 5\n", 97 | "15\n" 98 | ] 99 | } 100 | ], 101 | "source": [ 102 | "def makeInc(x):\n", 103 | " \n", 104 | " def 
inc(y):\n", 105 | " # x is \"closed\" in the definition of inc\n", 106 | " print(\"y = %d\" %y)\n", 107 | " return y + x\n", 108 | "\n", 109 | " print(\"x = %d\" %x)\n", 110 | " return inc\n", 111 | "\n", 112 | "#instantiate makeInc, with x as 5\n", 113 | "inc5 = makeInc(5)\n", 114 | "print(inc5(5)) # returns 10\n", 115 | "\n", 116 | "#instantiate makeInc, with x as 10\n", 117 | "inc10 = makeInc(10)\n", 118 | "print(inc10(5)) # returns 15" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 45, 124 | "metadata": { 125 | "collapsed": false 126 | }, 127 | "outputs": [ 128 | { 129 | "name": "stdout", 130 | "output_type": "stream", 131 | "text": [ 132 | "running register function ()\n", 133 | "running register function ()\n", 134 | "running main()\n", 135 | "registry -> [, ]\n", 136 | "running f1()\n", 137 | "running f2()\n", 138 | "running f3()\n" 139 | ] 140 | } 141 | ], 142 | "source": [ 143 | "#registry will hold references to functions decorated by @register\n", 144 | "registry = []\n", 145 | "\n", 146 | "#register takes a function as argument.\n", 147 | "def register(func):\n", 148 | " #Display what function is being decorated, for demonstration\n", 149 | " print('running register function (%s)' % func)\n", 150 | " #Append func to registry list\n", 151 | " registry.append(func)\n", 152 | " #Return func: we must return a function, here we return the same received as argument.\n", 153 | " return func\n", 154 | "\n", 155 | "#f1 and f2 are decorated by @register\n", 156 | "@register\n", 157 | "def f1():\n", 158 | " print('running f1()')\n", 159 | " \n", 160 | "@register\n", 161 | "def f2():\n", 162 | " print('running f2()')\n", 163 | "\n", 164 | "#f3 is not decorated.\n", 165 | "def f3():\n", 166 | " print('running f3()')\n", 167 | "\n", 168 | "#main displays the registry, then calls f1(), f2() and f3().\n", 169 | "def main():\n", 170 | " print('running main()') \n", 171 | " print('registry ->', registry) \n", 172 | " f1()\n", 173 | " 
f2()\n", 174 | " f3()\n", 175 | " \n", 176 | "if __name__=='__main__': main()\n", 177 | " \n", 178 | "#If registration.py is imported (and not run as a script), the output is this:\n", 179 | "#>>> import registration\n", 180 | "#running register() \n", 181 | "#running register()" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "## Decorator-enhanced Strategy pattern\n", 189 | "\n" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": 53, 195 | "metadata": { 196 | "collapsed": false 197 | }, 198 | "outputs": [ 199 | { 200 | "name": "stdout", 201 | "output_type": "stream", 202 | "text": [ 203 | "3\n", 204 | "6\n" 205 | ] 206 | } 207 | ], 208 | "source": [ 209 | "promos = []\n", 210 | "\n", 211 | "\n", 212 | "def promotion(promo_func):\n", 213 | " promos.append(promo_func)\n", 214 | " return promo_func\n", 215 | "\n", 216 | "\n", 217 | "@promotion\n", 218 | "def fidelity(order):\n", 219 | " \"\"\"5% discount for customers with 1000 or more fidelity points\"\"\"\n", 220 | " return order.total() * .05 if order.customer.fidelity >= 1000 else 0\n", 221 | "\n", 222 | "\n", 223 | "@promotion\n", 224 | "def bulk_item(order):\n", 225 | " \"\"\"10% discount for each LineItem with 20 or more units\"\"\" \n", 226 | " discount = 0\n", 227 | " for item in order.cart:\n", 228 | " if item.quantity >= 20:\n", 229 | " discount += item.total() * .1\n", 230 | " return discount\n", 231 | "\n", 232 | "\n", 233 | "@promotion\n", 234 | "def large_order(order):\n", 235 | " \"\"\"7% discount for orders with 10 or more distinct items\"\"\"\n", 236 | " distinct_items = {item.product for item in order.cart}\n", 237 | " if len(distinct_items) >= 10:\n", 238 | " return order.total() * .07 \n", 239 | " return 0\n", 240 | "\n", 241 | "\n", 242 | "def best_promo(order):\n", 243 | " \"\"\"Select best discount available\"\"\"\n", 244 | " return max(promo(order) for promo in promos)\n", 245 | "\n", 246 | "b=6\n", 247 | "def 
f2(a):\n", 248 | " print(a)\n", 249 | " print(b)\n", 250 | " #with the b=9 assignment below commented out, b is fetched from the global scope\n", 251 | "# b=9\n", 252 | "\n", 253 | "# If b=9 is uncommented, Python decides when compiling the function body that b is a local variable \n", 254 | "# because it is assigned within the function. \n", 255 | "# The generated bytecode reflects this decision and will try to fetch b from the local environment, \n", 256 | "# so print(b) raises UnboundLocalError: local variable 'b' referenced before assignment\n", 257 | "f2(3)\n" 258 | ] 259 | }, 260 | { 261 | "cell_type": "markdown", 262 | "metadata": {}, 263 | "source": [ 264 | "## Closures\n", 265 | "\n", 266 | "A closure is a function with an extended scope that encompasses non-global variables referenced in the body of the function but not defined there. It does not matter whether the function is anonymous or not; what matters is that it can access non-global variables that are defined outside of its body." 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 71, 272 | "metadata": { 273 | "collapsed": false 274 | }, 275 | "outputs": [ 276 | { 277 | "name": "stdout", 278 | "output_type": "stream", 279 | "text": [ 280 | "10.0\n", 281 | "30.0\n", 282 | "33.333333333333336\n", 283 | "10.0\n", 284 | "30.0\n", 285 | "33.333333333333336\n", 286 | "('new_value', 'total')\n", 287 | "('series',)\n", 288 | "10.0\n" 289 | ] 290 | } 291 | ], 292 | "source": [ 293 | "class Averager():\n", 294 | " def __init__(self):\n", 295 | " self.series = []\n", 296 | " \n", 297 | " def __call__(self, new_value): \n", 298 | " self.series.append(new_value) \n", 299 | " total = sum(self.series) \n", 300 | " return total/len(self.series)\n", 301 | "\n", 302 | "avg = Averager()\n", 303 | "print(avg(10))\n", 304 | "print(avg(50))\n", 305 | "print(avg(40))\n", 306 | "\n", 307 | "def make_averager():\n", 308 | " series = []\n", 309 | " \n", 310 | " def averager(new_value):\n", 311 | " #Within averager, series is a free variable. 
\n", 312 | " series.append(new_value) \n", 313 | " total = sum(series) \n", 314 | " return total/len(series)\n", 315 | " \n", 316 | " return averager\n", 317 | "\n", 318 | "avg2 = make_averager()\n", 319 | "print(avg2(10))\n", 320 | "print(avg2(50))\n", 321 | "print(avg2(40))\n", 322 | "print(avg2.__code__.co_varnames)\n", 323 | "print(avg2.__code__.co_freevars)\n", 324 | "\n", 325 | "def make_averager2(): \n", 326 | " count = 0\n", 327 | " total = 0\n", 328 | " \n", 329 | " def averager(new_value):\n", 330 | " #count is being called, but it is only in the local scope, so has not yet been assigned\n", 331 | " #if nonlocal is used, Python knows to look outside the local scope\n", 332 | " nonlocal count, total\n", 333 | " count += 1\n", 334 | " total += new_value \n", 335 | " return total / count\n", 336 | " \n", 337 | " return averager\n", 338 | "\n", 339 | "avg3 = make_averager2()\n", 340 | "#throws UnboundLocalError: local variable 'count' referenced before assignment, when nonlocal is missing\n", 341 | "print(avg3(10))" 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": 77, 347 | "metadata": { 348 | "collapsed": false 349 | }, 350 | "outputs": [ 351 | { 352 | "name": "stdout", 353 | "output_type": "stream", 354 | "text": [ 355 | "**************************************** Calling snooze(.123)\n", 356 | "[0.12658882s] snooze(0.123) -> None \n", 357 | "**************************************** Calling factorial(6)\n", 358 | "[0.00000119s] factorial(1) -> 1 \n", 359 | "[0.00015378s] factorial(2) -> 2 \n", 360 | "[0.00022388s] factorial(3) -> 6 \n", 361 | "[0.00028610s] factorial(4) -> 24 \n", 362 | "[0.00036502s] factorial(5) -> 120 \n", 363 | "[0.00059915s] factorial(6) -> 720 \n", 364 | "6! 
= 720\n" 365 | ] 366 | } 367 | ], 368 | "source": [ 369 | "import time\n", 370 | "import functools\n", 371 | "\n", 372 | "def clock(func):\n", 373 | " @functools.wraps(func)\n", 374 | " def clocked(*args, **kwargs):\n", 375 | " t0 = time.time()\n", 376 | " result = func(*args, **kwargs) \n", 377 | " elapsed = time.time() - t0 \n", 378 | " name = func.__name__\n", 379 | " arg_lst = []\n", 380 | " if args:\n", 381 | " arg_lst.append(', '.join(repr(arg) for arg in args)) \n", 382 | " if kwargs:\n", 383 | " pairs = ['%s=%r' % (k, w) for k, w in sorted(kwargs.items())]\n", 384 | " arg_lst.append(', '.join(pairs))\n", 385 | " arg_str = ', '.join(arg_lst)\n", 386 | " print('[%0.8fs] %s(%s) -> %r ' % (elapsed, name, arg_str, result)) \n", 387 | " return result\n", 388 | " return clocked\n", 389 | "\n", 390 | "@clock\n", 391 | "def snooze(seconds): time.sleep(seconds)\n", 392 | "\n", 393 | "#From now on, each time factorial(n) is called, clocked(n) gets executed\n", 394 | "@clock\n", 395 | "def factorial(n):\n", 396 | " return 1 if n < 2 else n*factorial(n-1)\n", 397 | "\n", 398 | "if __name__=='__main__':\n", 399 | " print('*' * 40, 'Calling snooze(.123)') \n", 400 | " snooze(.123)\n", 401 | " print('*' * 40, 'Calling factorial(6)') \n", 402 | " print('6! =', factorial(6))\n" 403 | ] 404 | }, 405 | { 406 | "cell_type": "markdown", 407 | "metadata": {}, 408 | "source": [ 409 | "## Decorators in the standard library\n", 410 | "\n", 411 | "Python has three built-in functions that are designed to decorate methods: property, classmethod and staticmethod. \n", 412 | "\n", 413 | "### Memoization with functools.lru_cache\n", 414 | "\n", 415 | "A very practical decorator is functools.lru_cache. It implements memoization: an optimization technique which works by saving the results of previous invocations of an expensive function, avoiding repeat computations on previously used arguments. 
The letters LRU stand for Least Recently Used, meaning that the growth of the cache is limited by discarding the entries that have not been read for a while." 416 | ] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "execution_count": 82, 421 | "metadata": { 422 | "collapsed": false 423 | }, 424 | "outputs": [ 425 | { 426 | "name": "stdout", 427 | "output_type": "stream", 428 | "text": [ 429 | "[0.00000119s] fibonacci(0) -> 0 \n", 430 | "[0.00000095s] fibonacci(1) -> 1 \n", 431 | "[0.00046587s] fibonacci(2) -> 1 \n", 432 | "[0.00000000s] fibonacci(1) -> 1 \n", 433 | "[0.00000072s] fibonacci(0) -> 0 \n", 434 | "[0.00000119s] fibonacci(1) -> 1 \n", 435 | "[0.00008774s] fibonacci(2) -> 1 \n", 436 | "[0.00020289s] fibonacci(3) -> 2 \n", 437 | "[0.00088596s] fibonacci(4) -> 3 \n", 438 | "[0.00000095s] fibonacci(1) -> 1 \n", 439 | "[0.00000000s] fibonacci(0) -> 0 \n", 440 | "[0.00000119s] fibonacci(1) -> 1 \n", 441 | "[0.00016093s] fibonacci(2) -> 1 \n", 442 | "[0.00043392s] fibonacci(3) -> 2 \n", 443 | "[0.00000000s] fibonacci(0) -> 0 \n", 444 | "[0.00000095s] fibonacci(1) -> 1 \n", 445 | "[0.00006676s] fibonacci(2) -> 1 \n", 446 | "[0.00000119s] fibonacci(1) -> 1 \n", 447 | "[0.00000000s] fibonacci(0) -> 0 \n", 448 | "[0.00000000s] fibonacci(1) -> 1 \n", 449 | "[0.00006104s] fibonacci(2) -> 1 \n", 450 | "[0.00012112s] fibonacci(3) -> 2 \n", 451 | "[0.00024986s] fibonacci(4) -> 3 \n", 452 | "[0.00076199s] fibonacci(5) -> 5 \n", 453 | "[0.00172329s] fibonacci(6) -> 8 \n", 454 | "8\n", 455 | "[0.00000000s] fibonacci2(0) -> 0 \n", 456 | "[0.00000095s] fibonacci2(1) -> 1 \n", 457 | "[0.00027204s] fibonacci2(2) -> 1 \n", 458 | "[0.00000215s] fibonacci2(3) -> 2 \n", 459 | "[0.00058532s] fibonacci2(4) -> 3 \n", 460 | "[0.00000215s] fibonacci2(5) -> 5 \n", 461 | "[0.00085998s] fibonacci2(6) -> 8 \n", 462 | "8\n" 463 | ] 464 | } 465 | ], 466 | "source": [ 467 | "def clock(func):\n", 468 | " @functools.wraps(func)\n", 469 | " def clocked(*args, **kwargs):\n", 470 | " t0 = 
time.time()\n", 471 | " result = func(*args, **kwargs) \n", 472 | " elapsed = time.time() - t0 \n", 473 | " name = func.__name__\n", 474 | " arg_lst = []\n", 475 | " if args:\n", 476 | " arg_lst.append(', '.join(repr(arg) for arg in args)) \n", 477 | " if kwargs:\n", 478 | " pairs = ['%s=%r' % (k, w) for k, w in sorted(kwargs.items())]\n", 479 | " arg_lst.append(', '.join(pairs))\n", 480 | " arg_str = ', '.join(arg_lst)\n", 481 | " print('[%0.8fs] %s(%s) -> %r ' % (elapsed, name, arg_str, result)) \n", 482 | " return result\n", 483 | " return clocked\n", 484 | "\n", 485 | "@clock\n", 486 | "def fibonacci(n):\n", 487 | " #fibonacci(1) is called 8 times, fibonacci(2) 5 times\n", 488 | " if n < 2:\n", 489 | " return n\n", 490 | " return fibonacci(n-2) + fibonacci(n-1)\n", 491 | "\n", 492 | "if __name__=='__main__': \n", 493 | " print(fibonacci(6))\n", 494 | " \n", 495 | "import functools\n", 496 | "\n", 497 | "\n", 498 | "@functools.lru_cache() # \n", 499 | "@clock #\n", 500 | "def fibonacci2(n):\n", 501 | " if n < 2: \n", 502 | " return n\n", 503 | " return fibonacci2(n-2) + fibonacci2(n-1) \n", 504 | "\n", 505 | "if __name__=='__main__':\n", 506 | " print(fibonacci2(6))" 507 | ] 508 | }, 509 | { 510 | "cell_type": "markdown", 511 | "metadata": {}, 512 | "source": [ 513 | "## Generic functions with single dispatch\n", 514 | "\n", 515 | "The new functools.singledispatch decorator in Python 3.4 allows each module to contribute to the overall solution, and lets you easily provide a specialized function even for classes that you can’t edit. If you decorate a plain function with @singledispatch it becomes a generic function: a group of functions to perform the same operation in different ways, depending on the type of the first argument.\n", 516 | "\n", 517 | "A notable quality of the singledispatch mechanism is that you can register specialized functions anywhere in the system, in any module. 
If you later add a module with a new user-defined type, you can easily provide a new custom function to handle that type. And you can write custom functions for classes that you did not write and can’t change." 518 | ] 519 | }, 520 | { 521 | "cell_type": "code", 522 | "execution_count": 87, 523 | "metadata": { 524 | "collapsed": false 525 | }, 526 | "outputs": [ 527 | { 528 | "name": "stdout", 529 | "output_type": "stream", 530 | "text": [ 531 | "
<pre>{1, 2, 3}</pre>\n", 532 | "<pre>&lt;built-in function abs&gt;</pre>\n", 533 | "<ul>\n", 534 | "<li><p>alpha</p></li>\n", 535 | "<li><pre>66 (0x42)</pre></li>\n", 536 | "<li><pre>{1, 2, 3}</pre></li>\n", 537 | "</ul>\n" 538 | ] 539 | } 540 | ], 541 | "source": [ 542 | "from functools import singledispatch \n", 543 | "from collections import abc\n", 544 | "import numbers\n", 545 | "import html\n", 546 | "\n", 547 | "#@singledispatch marks the base function which handles the object type.\n", 548 | "@singledispatch\n", 549 | "def htmlize(obj):\n", 550 | " content = html.escape(repr(obj)) \n", 551 | " return '<pre>{}</pre>'.format(content)\n", 552 | "\n", 553 | "#Each specialized function is decorated with @«base_function».register(«type»).\n", 554 | "@htmlize.register(str)\n", 555 | "#The name of the specialized functions is irrelevant; _ is a good choice to make this clear.\n", 556 | "def _(text):\n", 557 | " content = html.escape(text).replace('\\n', '<br>\\n') \n", 558 | " return '<p>{0}</p>'.format(content)\n", 559 | "\n", 560 | "#For each additional type to receive special treatment, register a new function.\n", 561 | "#numbers.Integral is a virtual superclass of int (see below).\n", 562 | "@htmlize.register(numbers.Integral) \n", 563 | "def _(n):\n", 564 | " return '<pre>{0} (0x{0:x})</pre>'.format(n)\n", 565 | "\n", 566 | "#You can stack several register decorators to support different types with the\n", 567 | "#same function.\n", 568 | "@htmlize.register(tuple) \n", 569 | "@htmlize.register(abc.MutableSequence) \n", 570 | "def _(seq):\n", 571 | " inner = '</li>\\n<li>'.join(htmlize(item) for item in seq) \n", 572 | " return '<ul>\\n<li>' + inner + '</li>\\n</ul>'\n", 573 | "\n", 574 | "print(htmlize({1, 2, 3}))\n", 575 | "print(htmlize(abs))\n", 576 | "print(htmlize(['alpha', 66, {3, 2, 1}]))" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "## Parametrized Decorators\n", 584 | "\n", 585 | "When parsing a decorator in source code, Python takes the decorated function and passes it as the first argument to the decorator function." 586 | ] 587 | }, 588 | { 589 | "cell_type": "code", 590 | "execution_count": 94, 591 | "metadata": { 592 | "collapsed": false 593 | }, 594 | "outputs": [ 595 | { 596 | "name": "stdout", 597 | "output_type": "stream", 598 | "text": [ 599 | "running register(active=False)->decorate()\n", 600 | "FALSE\n", 601 | "running register(active=True)->decorate()\n", 602 | "TRUE\n", 603 | "{}\n", 604 | "running register(active=True)->decorate()\n", 605 | "TRUE\n", 606 | "{, }\n", 607 | "running register(active=False)->decorate()\n", 608 | "FALSE\n", 609 | "{}\n" 610 | ] 611 | } 612 | ], 613 | "source": [ 614 | "#registry is now a set, so adding and removing functions is faster.\n", 615 | "registry = set()\n", 616 | "\n", 617 | "#register takes an optional keyword argument.\n", 618 | "def register(active=True):\n", 619 | " #The decorate inner function is the actual decorator; note how it takes a function as argument.\n", 620 | " def decorate(func):\n", 621 | " print('running register(active=%s)->decorate(%s)' % (active, func))\n", 622 | " #Register func only if the active argument (retrieved from the closure) is True.\n", 623 | " if active:\n", 624 | " print(\"TRUE\")\n", 625 | " registry.add(func)\n", 626 | " else:\n", 627 | " print(\"FALSE\")\n", 628 | " #If not active and func in registry, remove it.\n", 629 | " registry.discard(func)\n", 630 | " \n", 631 | " #Because decorate is a decorator, it must return a function.\n", 632 | " return func\n", 633 | " #register is our decorator factory, so it returns decorate.\n", 634 | " return decorate\n", 635 |
" \n", 636 | "@register(active=False) \n", 637 | "def f1():\n", 638 | " print('running f1()')\n", 639 | " \n", 640 | "@register() \n", 641 | "def f2():\n", 642 | " print('running f2()')\n", 643 | " \n", 644 | "def f3():\n", 645 | " print('running f3()')\n", 646 | "\n", 647 | "#only f2 is added to the registry\n", 648 | "print(registry)\n", 649 | "#f3 is added\n", 650 | "register()(f3)\n", 651 | "print(registry)\n", 652 | "#f2 is removed\n", 653 | "register(active=False)(f2)\n", 654 | "print(registry)" 655 | ] 656 | }, 657 | { 658 | "cell_type": "code", 659 | "execution_count": 97, 660 | "metadata": { 661 | "collapsed": false 662 | }, 663 | "outputs": [ 664 | { 665 | "name": "stdout", 666 | "output_type": "stream", 667 | "text": [ 668 | "[0.12328005s] snooze(0.123) -> None\n", 669 | "[0.12703919s] snooze(0.123) -> None\n", 670 | "[0.12393713s] snooze(0.123) -> None\n" 671 | ] 672 | } 673 | ], 674 | "source": [ 675 | "import time\n", 676 | "\n", 677 | "DEFAULT_FMT = '[{elapsed:0.8f}s] {name}({args}) -> {result}'\n", 678 | "\n", 679 | "#clock is our parametrized decorator factory.\n", 680 | "def clock(fmt=DEFAULT_FMT):\n", 681 | " #decorate is the actual decorator.\n", 682 | " def decorate(func):\n", 683 | " #clocked wraps the decorated function.\n", 684 | " def clocked(*_args):\n", 685 | " t0 = time.time()\n", 686 | " _result = func(*_args) \n", 687 | " elapsed = time.time() - t0 \n", 688 | " name = func.__name__\n", 689 | " args = ', '.join(repr(arg) for arg in _args) \n", 690 | " result = repr(_result) \n", 691 | " print(fmt.format(**locals()))\n", 692 | " return _result\n", 693 | " return clocked \n", 694 | " return decorate\n", 695 | " \n", 696 | "if __name__ == '__main__':\n", 697 | " \n", 698 | " @clock()\n", 699 | " def snooze(seconds):\n", 700 | " time.sleep(seconds) \n", 701 | " \n", 702 | " for i in range(3):\n", 703 | " snooze(.123)" 704 | ] 705 | } 706 | ], 707 | "metadata": { 708 | "kernelspec": { 709 | "display_name": "Python 3", 710 | "language": 
"python", 711 | "name": "python3" 712 | }, 713 | "language_info": { 714 | "codemirror_mode": { 715 | "name": "ipython", 716 | "version": 3 717 | }, 718 | "file_extension": ".py", 719 | "mimetype": "text/x-python", 720 | "name": "python", 721 | "nbconvert_exporter": "python", 722 | "pygments_lexer": "ipython3", 723 | "version": "3.6.0" 724 | } 725 | }, 726 | "nbformat": 4, 727 | "nbformat_minor": 2 728 | } 729 | -------------------------------------------------------------------------------- /Chapter 08 - Object references, mutability and recycling.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Object references, mutability and recycling\n", 8 | "\n", 9 | "A name is not the object; a name is a separate thing.\n", 10 | "\n", 11 | "If you imagine variables are like boxes, you can’t make sense of assignment in Python. Think of variables as Post-it notes. 
" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 5, 17 | "metadata": { 18 | "collapsed": false 19 | }, 20 | "outputs": [ 21 | { 22 | "name": "stdout", 23 | "output_type": "stream", 24 | "text": [ 25 | "Gizmo id: 4404047096\n", 26 | "<__main__.Gizmo object at 0x106806cf8>\n", 27 | "Gizmo id: 4404044128\n" 28 | ] 29 | }, 30 | { 31 | "ename": "TypeError", 32 | "evalue": "unsupported operand type(s) for *: 'Gizmo' and 'int'", 33 | "output_type": "error", 34 | "traceback": [ 35 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 36 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 37 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mGizmo\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mGizmo\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m10\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0mdir\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 38 | "\u001b[0;31mTypeError\u001b[0m: unsupported operand type(s) for *: 'Gizmo' and 'int'" 39 | ] 40 | } 41 | ], 42 | "source": [ 43 | "#Variables are assigned to objects only after the objects are created.\n", 44 | "\n", 45 | "class Gizmo:\n", 46 | " def __init__(self):\n", 47 | " print('Gizmo id: %d' % id(self))\n", 48 | " \n", 49 | "x = Gizmo()\n", 50 | "#y doesn't get created, because the multiplication step takes place first\n", 51 | "y = Gizmo() * 10\n", 52 | "dir()" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 8, 58 | "metadata": { 59 | "collapsed": 
false 60 | }, 61 | "outputs": [ 62 | { 63 | "name": "stdout", 64 | "output_type": "stream", 65 | "text": [ 66 | "True\n", 67 | "4403896112 4403896112\n", 68 | "True\n" 69 | ] 70 | } 71 | ], 72 | "source": [ 73 | "#an example of aliasing.\n", 74 | "charles = {'name': 'Charles L. Dodgson', 'born': 1832}\n", 75 | "lewis = charles\n", 76 | "print(lewis is charles)\n", 77 | "print(id(charles), id(lewis))\n", 78 | "\n", 79 | "#alex is not an alias for charles: these variables are bound to distinct objects. \n", 80 | "#== compares the values of the objects and is compares their identities; here alex \n", 81 | "#has an extra 'balance' key, so the two are neither equal nor identical.\n", 82 | "alex = {'name': 'Charles L. Dodgson', 'born': 1832, 'balance': 950}\n", 83 | "alex == charles\n", 84 | "print(alex is not charles)" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "The `==` operator compares the values of objects (the data they hold), while `is` compares their identities. The `is` operator is faster than `==`, because it cannot be overloaded, so Python does not have to find and invoke special methods to evaluate it, and computing is as simple as comparing two integer ids. \n", 92 | "\n", 93 | "## The relative immutability of tuples\n", 94 | "\n", 95 | "The immutability of tuples really refers to the physical contents of the tuple data structure (i.e. the references it holds), and does not extend to the referenced objects."
96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 13, 101 | "metadata": { 102 | "collapsed": false 103 | }, 104 | "outputs": [ 105 | { 106 | "name": "stdout", 107 | "output_type": "stream", 108 | "text": [ 109 | "True\n", 110 | "4403120072\n", 111 | "(1, 2, [30, 40, 99])\n", 112 | "4403120072\n", 113 | "False\n", 114 | "[3, [55, 44], (7, 8, 9)]\n", 115 | "True\n", 116 | "False\n" 117 | ] 118 | } 119 | ], 120 | "source": [ 121 | "t1 = (1, 2, [30, 40])\n", 122 | "t2 = (1, 2, [30, 40])\n", 123 | "print(t1==t2)\n", 124 | "print(id(t1[-1]))\n", 125 | "t1[-1].append(99)\n", 126 | "print(t1)\n", 127 | "#The identity of list at t1[-1] has not changed, only its value.\n", 128 | "print(id(t1[-1]))\n", 129 | "#but t1 and t2 are now different, because t1[-1] has changed\n", 130 | "print(t1==t2)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "markdown", 135 | "metadata": {}, 136 | "source": [ 137 | "Using the constructor or [:] produces a shallow copy, i.e. the outermost container is duplicated, but the copy is filled with references to the same items held by the original container." 
138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 16, 143 | "metadata": { 144 | "collapsed": false 145 | }, 146 | "outputs": [ 147 | { 148 | "name": "stdout", 149 | "output_type": "stream", 150 | "text": [ 151 | "[3, [55, 44], (7, 8, 9)]\n", 152 | "True\n", 153 | "False\n", 154 | "l1: [3, [66, 44], (7, 8, 9), 100]\n", 155 | "l2: [3, [66, 44], (7, 8, 9)]\n", 156 | "l1: [3, [66, 44, 33, 22], (7, 8, 9), 100]\n", 157 | "l2: [3, [66, 44, 33, 22], (7, 8, 9, 10, 11)]\n" 158 | ] 159 | } 160 | ], 161 | "source": [ 162 | "l1 = [3, [55, 44], (7, 8, 9)]\n", 163 | "l2 = list(l1)\n", 164 | "print(l2)\n", 165 | "print(l1 == l2)\n", 166 | "print(l1 is l2)\n", 167 | "\n", 168 | "l1 = [3, [66, 55, 44], (7, 8, 9)] \n", 169 | "l2 = list(l1) # make a shallow copy - clone the top list, but not the contained list / tuple\n", 170 | "l1.append(100) # append 100 to L1 - doesn't affect l2\n", 171 | "l1[1].remove(55) # remove 55 from L1 list at pos. 2\n", 172 | "print('l1:', l1)\n", 173 | "print('l2:', l2) \n", 174 | "l2[1] += [33, 22] # For a mutable object like the list referred by l2[1], the operator += changes the list in-place.\n", 175 | "l2[2] += (10, 11) # += on a tuple creates a new tuple and rebinds the variable l2[2]\n", 176 | "print('l1:', l1) \n", 177 | "print('l2:', l2)" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "## Deep copies" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 22, 190 | "metadata": { 191 | "collapsed": false 192 | }, 193 | "outputs": [ 194 | { 195 | "name": "stdout", 196 | "output_type": "stream", 197 | "text": [ 198 | "4404044240 4404196128 4404196632\n", 199 | "None\n", 200 | "['Steve', 'Claire', 'David']\n", 201 | "4403172680 4403172680 4404200008\n", 202 | "['Steve', 'Bill', 'Claire', 'David']\n" 203 | ] 204 | } 205 | ], 206 | "source": [ 207 | "import copy\n", 208 | "\n", 209 | "class Bus:\n", 210 | " def __init__(self, passengers=None): \n", 
211 | " if passengers is None:\n", 212 | " self.passengers = [] \n", 213 | " else:\n", 214 | " self.passengers = list(passengers) \n", 215 | "\n", 216 | " def pick(self, name):\n", 217 | " self.passengers.append(name) \n", 218 | "\n", 219 | " def drop(self, name):\n", 220 | " self.passengers.remove(name)\n", 221 | " \n", 222 | "bus1 = Bus(['Steve', 'Bill', 'Claire', 'David'])\n", 223 | "bus2 = copy.copy(bus1)\n", 224 | "bus3 = copy.deepcopy(bus1)\n", 225 | "print(id(bus1), id(bus2), id(bus3))\n", 226 | "print(bus1.drop('Bill'))\n", 227 | "print(bus2.passengers)\n", 228 | "print(id(bus1.passengers), id(bus2.passengers), id(bus3.passengers))\n", 229 | "print(bus3.passengers)" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "## Function parameters as references\n", 237 | "\n", 238 | "The only mode of parameter passing in Python is \"call by sharing\". Call by sharing means that each formal parameter of the function gets a copy of each reference in the arguments. In other words, the parameters inside the function become aliases of the actual arguments." 
239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 27, 244 | "metadata": { 245 | "collapsed": false 246 | }, 247 | "outputs": [ 248 | { 249 | "name": "stdout", 250 | "output_type": "stream", 251 | "text": [ 252 | "3\n", 253 | "1 2\n", 254 | "[1, 2, 3, 4]\n", 255 | "[1, 2, 3, 4] [3, 4]\n", 256 | "(10, 20) (30, 40)\n" 257 | ] 258 | } 259 | ], 260 | "source": [ 261 | "def f(a, b):\n", 262 | " a+=b\n", 263 | " return a\n", 264 | "\n", 265 | "x = 1\n", 266 | "y = 2\n", 267 | "print(f(x,y))\n", 268 | "#int x is unchanged\n", 269 | "print(x,y)\n", 270 | "\n", 271 | "a = [1,2]\n", 272 | "b = [3,4]\n", 273 | "print(f(a,b))\n", 274 | "#list a is changed\n", 275 | "print(a,b)\n", 276 | "\n", 277 | "t = (10,20)\n", 278 | "u = (30,40)\n", 279 | "f(t,u)\n", 280 | "#tuple t is unchanged\n", 281 | "print(t,u)" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 31, 287 | "metadata": { 288 | "collapsed": false 289 | }, 290 | "outputs": [ 291 | { 292 | "name": "stdout", 293 | "output_type": "stream", 294 | "text": [ 295 | "['Alice', 'Bill']\n", 296 | "['Bill', 'Charlie']\n", 297 | "['Carrie']\n", 298 | "['Carrie']\n" 299 | ] 300 | } 301 | ], 302 | "source": [ 303 | "class HauntedBus:\n", 304 | " \"\"\"A bus model haunted by ghost passengers\"\"\"\n", 305 | " def __init__(self, passengers=[]): \n", 306 | " self.passengers = passengers\n", 307 | " def pick(self, name): \n", 308 | " self.passengers.append(name)\n", 309 | " def drop(self, name): \n", 310 | " self.passengers.remove(name)\n", 311 | "\n", 312 | "bus1 = HauntedBus(['Alice', 'Bill'])\n", 313 | "print(bus1.passengers)\n", 314 | "bus1.pick('Charlie')\n", 315 | "bus1.drop('Alice')\n", 316 | "print(bus1.passengers)\n", 317 | "bus2 = HauntedBus()\n", 318 | "bus2.pick('Carrie')\n", 319 | "print(bus2.passengers)\n", 320 | "bus3 = HauntedBus()\n", 321 | "#bus3.passengers is not empty!\n", 322 | "#The problem: bus2.passengers and bus3.passengers refer to the same list. 
\n", 323 | "#They were both instantiated with the same list\n", 324 | "print(bus3.passengers)" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "The issue with mutable defaults explains why None is often used as the default value for parameters that may receive mutable values. \n", 332 | "\n", 333 | "### Defensive programming with mutable parameters" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 34, 339 | "metadata": { 340 | "collapsed": false 341 | }, 342 | "outputs": [ 343 | { 344 | "name": "stdout", 345 | "output_type": "stream", 346 | "text": [ 347 | "['Sue', 'Tina', 'Maya', 'Diana', 'Pat']\n" 348 | ] 349 | } 350 | ], 351 | "source": [ 352 | "class TwilightBus:\n", 353 | " \"\"\"A bus model that makes passengers vanish\"\"\"\n", 354 | " def __init__(self, passengers=None): \n", 355 | " if passengers is None:\n", 356 | " self.passengers = [] \n", 357 | " else:\n", 358 | "# self.passengers = passengers\n", 359 | " #make a copy of the passengers list, rather than an alias\n", 360 | " self.passengers = list(passengers)\n", 361 | " \n", 362 | " def pick(self, name):\n", 363 | " self.passengers.append(name) \n", 364 | " def drop(self, name):\n", 365 | " self.passengers.remove(name)\n", 366 | "\n", 367 | "basketball_team = ['Sue', 'Tina', 'Maya', 'Diana', 'Pat']\n", 368 | "\n", 369 | "bus = TwilightBus(basketball_team)\n", 370 | "\n", 371 | "bus.drop('Tina')\n", 372 | "bus.drop('Pat')\n", 373 | "#When the methods .remove() and .append() are used \n", 374 | "#with self.passengers we are actually mutating the \n", 375 | "#original list received as argument to the constructor.\n", 376 | "print(basketball_team)" 377 | ] 378 | }, 379 | { 380 | "cell_type": "markdown", 381 | "metadata": {}, 382 | "source": [ 383 | "## del and garbage collection\n", 384 | "\n", 385 | "Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. 
The del statement deletes names, not objects. An object may be garbage collected as the result of a del statement, but only if the variable deleted holds the last reference to the object, or if the object becomes unreachable. Rebinding a variable may also cause the number of references to an object to reach zero, causing its destruction." 386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": 37, 391 | "metadata": { 392 | "collapsed": false 393 | }, 394 | "outputs": [ 395 | { 396 | "name": "stdout", 397 | "output_type": "stream", 398 | "text": [ 399 | "Gone with the wind...\n", 400 | "True\n", 401 | "True\n", 402 | "Gone with the wind...\n", 403 | "False\n" 404 | ] 405 | } 406 | ], 407 | "source": [ 408 | "#del does not delete objects, but objects may be deleted as a \n", 409 | "#consequence of being unreachable after del is used.\n", 410 | "\n", 411 | "import weakref\n", 412 | "\n", 413 | "s1={1,2,3}\n", 414 | "#s1 and s2 are aliases referring to the same set, {1, 2, 3}.\n", 415 | "s2=s1\n", 416 | "\n", 417 | "#This function must not be a bound method of the object about to be destroyed, or otherwise hold a reference to it\n", 418 | "def bye():\n", 419 | " print('Gone with the wind...')\n", 420 | "\n", 421 | "#Register the bye callback on the object referred to by s1.\n", 422 | "ender = weakref.finalize(s1, bye) \n", 423 | "print(ender.alive)\n", 424 | "\n", 425 | "del s1\n", 426 | "\n", 427 | "print(ender.alive)\n", 428 | "\n", 429 | "#Rebinding the last reference, s2, makes {1, 2, 3} unreachable. \n", 430 | "#It is destroyed, the bye callback is invoked and ender.alive becomes False.\n", 431 | "s2 = 'spam'\n", 432 | "\n", 433 | "print(ender.alive)" 434 | ] 435 | }, 436 | { 437 | "cell_type": "markdown", 438 | "metadata": {}, 439 | "source": [ 440 | "## Weak references\n", 441 | "\n", 442 | "The presence of references is what keeps an object alive in memory. When the reference count of an object reaches zero, the garbage collector disposes of it. 
But sometimes it is useful to have a reference to an object that does not keep it around longer than necessary. A common use case is a cache.\n", 443 | "\n", 444 | "- Weak references to an object do not increase its reference count. The object that is the target of a reference is called the referent. Therefore, we say that a weak reference does not prevent the referent from being garbage collected.\n", 445 | "- Weak references are useful in caching applications because you don’t want the cached objects to be kept alive just because they are referenced by the cache." 446 | ] 447 | }, 448 | { 449 | "cell_type": "code", 450 | "execution_count": 39, 451 | "metadata": { 452 | "collapsed": false 453 | }, 454 | "outputs": [ 455 | { 456 | "name": "stdout", 457 | "output_type": "stream", 458 | "text": [ 459 | "\n", 460 | "{0, 1}\n", 461 | "None\n", 462 | "True\n", 463 | "True\n" 464 | ] 465 | } 466 | ], 467 | "source": [ 468 | "import weakref\n", 469 | "\n", 470 | "a_set = {0, 1}\n", 471 | "\n", 472 | "wref = weakref.ref(a_set)\n", 473 | "\n", 474 | "print(wref)\n", 475 | "print(wref())\n", 476 | "\n", 477 | "a_set = {2, 3, 4}\n", 478 | "\n", 479 | "#{0, 1} has no strong references left, so wref() now returns None\n", 480 | "print(wref())\n", 481 | "print(wref() is None)\n", 482 | "print(wref() is None)" 483 | ] 484 | }, 485 | { 486 | "cell_type": "code", 487 | "execution_count": 44, 488 | "metadata": { 489 | "collapsed": false 490 | }, 491 | "outputs": [ 492 | { 493 | "name": "stdout", 494 | "output_type": "stream", 495 | "text": [ 496 | "['Brie', 'Parmesan', 'Red Leicester', 'Tilsit']\n", 497 | "['Parmesan']\n", 498 | "[]\n" 499 | ] 500 | } 501 | ], 502 | "source": [ 503 | "import weakref\n", 504 | "\n", 505 | "class Cheese:\n", 506 | " def __init__(self, kind):\n", 507 | " self.kind = kind \n", 508 | " def __repr__(self):\n", 509 | " return 'Cheese(%r)' % self.kind\n", 510 | "\n", 511 | "#The stock maps the name of the cheese to a weak reference to the cheese instance\n", 512 | "#in the catalog.\n", 513 | "#stock is a 
WeakValueDictionary\n", 514 | "stock = weakref.WeakValueDictionary()\n", 515 | "catalog = [Cheese('Red Leicester'), Cheese('Tilsit'), Cheese('Brie'), Cheese('Parmesan')]\n", 516 | "for cheese in catalog:\n", 517 | " stock[cheese.kind] = cheese\n", 518 | "\n", 519 | "print(sorted(stock.keys()))\n", 520 | "del catalog\n", 521 | "print(sorted(stock.keys()))\n", 522 | "del cheese\n", 523 | "print(sorted(stock.keys()))" 524 | ] 525 | }, 526 | { 527 | "cell_type": "markdown", 528 | "metadata": {}, 529 | "source": [ 530 | "### Limitations of weak references\n", 531 | "Not every Python object may be the target, or referent, of a weak reference. Basic list and dict instances may not be referents, but a plain subclass of either can solve this problem easily:" 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": 45, 537 | "metadata": { 538 | "collapsed": false 539 | }, 540 | "outputs": [ 541 | { 542 | "name": "stdout", 543 | "output_type": "stream", 544 | "text": [ 545 | "True\n", 546 | "True\n" 547 | ] 548 | } 549 | ], 550 | "source": [ 551 | "t1=(1,2,3) \n", 552 | "t2 = tuple(t1)\n", 553 | "#t1 and t2 are bound to the same object.\n", 554 | "print(t2 is t1)\n", 555 | "#And so is t3.\n", 556 | "t3 = t1[:]\n", 557 | "print(t3 is t1)\n" 558 | ] 559 | } 560 | ], 561 | "metadata": { 562 | "kernelspec": { 563 | "display_name": "Python 3", 564 | "language": "python", 565 | "name": "python3" 566 | }, 567 | "language_info": { 568 | "codemirror_mode": { 569 | "name": "ipython", 570 | "version": 3 571 | }, 572 | "file_extension": ".py", 573 | "mimetype": "text/x-python", 574 | "name": "python", 575 | "nbconvert_exporter": "python", 576 | "pygments_lexer": "ipython3", 577 | "version": "3.6.0" 578 | } 579 | }, 580 | "nbformat": 4, 581 | "nbformat_minor": 2 582 | } 583 | -------------------------------------------------------------------------------- /cafe.txt: -------------------------------------------------------------------------------- 1 | café 
-------------------------------------------------------------------------------- /python.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/geonaut/Fluent-Python/3edd7e5eef2a166e6c17a2d26a77b074ad153e1c/python.gif -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | Jupyter notebooks for "Fluent Python", by Luciano Ramalho 2 | ========================================================= 3 | 4 | ![cover](http://akamaicovers.oreilly.com/images/0636920032519/cat.gif "Cover") 5 | 6 | ## Chapter 01 - The Python Data Model 7 | 8 | The book starts with a chapter on the Python data model, highlighting the advantages of using special methods (aka dunder methods), as they give access to core language features such as iteration and slicing, and to the standard library tools that build on them. A small class is created that implements some special methods. From there, slicing and `choice` work on its instances without having been programmed explicitly. It is clear from the example that one of the biggest advantages of using Python's magic methods is that they provide a simple way to make objects behave like built-in types. That means you can avoid ugly, counter-intuitive, and nonstandard ways of performing basic operations. For example, the `__add__` dunder method is implemented to enable the adding of two dictionaries, because the + operator is not supported by the 'dict' type. 9 | 10 | ## Chapter 02 - An Array of Sequences 11 | 12 | Chapter 2 dives into the Python sequence types, based on the C primitives. Firstly, the list type is discussed, including list comprehensions and generator expressions, for creating lists and Cartesian products. These are compared with for loops, and the map and filter functions. Secondly, tuples are covered, both as records without field names and as immutable lists. 
This includes tuple unpacking, named tuples and the associated special attributes, such as `_make` and `_asdict`. Thirdly, there is a good section on slicing, nested lists, augmented assignment and sorting. There are a few exercises that cover list searching using `bisect` and large lists in arrays. Fourthly, there is a section on memory views, which allow direct manipulation of data objects through byte poking. The last section introduces NumPy as a means of handling 2D arrays, and queues as a means of handling fixed length sequences. 13 | 14 | ## Chapter 03 - Dictionaries and sets 15 | 16 | The third chapter focuses on dictionaries and sets, which are widely used and fast data structures. It outlines some real world uses, such as module namespaces, and discusses dict comprehension syntax. There is then a discussion of some of the standard dict types, such as `defaultdict` and `OrderedDict`, and the methods for handling missing keys, like `setdefault` and `__missing__`. There is a section dedicated to the UserDict type, and the relevant built-ins. `MappingProxyType` is highlighted as a read-only, but dynamic, view of a dict. Finally, there is a summary of set theory, and hash tables, which explains why dict keys must be hashable, why lookup is so fast, and why there is a memory overhead. 17 | 18 | ## Chapter 04 - Text vs Bytes 19 | 20 | The author works through the Unicode standard, and the various encodings which implement it. A couple of examples are used to illustrate encoding and decoding a non-ASCII character, and the Python byte sequences are introduced, followed by struct and memory views. A few different encoders are discussed, including file encoding and the default system encoding. The chapter goes on to show how mixed characters can be normalised, casefolded and sanitised, and stripped of diacritics. A short function displays the Unicode database, including some metadata. 
21 | 22 | ## Chapter 05 - First-class functions 23 | 24 | This chapter deals with one of the key features of Python: functions being first-class objects. This means they can be created at runtime, assigned to a variable or to an element in a data structure, passed as an argument or returned as a result. The idea of a higher-order function is introduced, being a function that takes another function as an argument or returns one as a result. Then anonymous functions are introduced, as well as the different types of callable object. Some nice examples of positional and keyword arguments are introduced, as well as introspection using passive attributes like `co_varnames` and `__annotations__`, and more advanced modules such as `inspect`. There is also a brief introduction to functional programming with the `operator` and `functools` modules. 25 | 26 | ## Chapter 06 - Design patterns with first-class functions 27 | 28 | In this relatively brief chapter, the advantages of functions as first-class objects are reinforced through the refactoring of a couple of design patterns, as listed in the famous 'Gang of Four' design patterns book. The Strategy design pattern is discussed particularly clearly, with functions being passed as objects to dissociate them from a particular class. 29 | 30 | ## Chapter 07 - Function decorators and closures 31 | 32 | This chapter covers a lot of ground, and does a really solid job of explaining decorators, closures, nonlocal variables and nested functions. Each term is given a good definition, and then the interplay and interdependencies of the different entities are explained. The chapter closes with some standard library decorators, and then goes on to establish a custom decorator, which is passed arguments, and customised even further. The final examples are quite complex, but should help to clarify a lot of the 'framework magic' used in e.g. Flask and Django. 33 | 34 | ## Chapter 08 - Object references, mutability and recycling 35 | 36 | Chapter 8 thoroughly explains variables as labels, and the relative immutability of tuple values. 
Examples of aliasing, shallow copying and deep copying are shown. These are illustrated with an example of a haunted bus, where non-copied lists are shared between different instances of the class, and a twilight bus, where list items disappear from a double-referenced list external to the class. This problem is solved by defining a default value of `None`, and copying any list passed in to a new list. There is also a discussion of garbage collection, which relies on reference counts reaching 0. The idea of weak references, especially for caches, is demonstrated. -------------------------------------------------------------------------------- /zen.txt: -------------------------------------------------------------------------------- 1 | Beautiful is better than ugly. 2 | Explicit is better than implicit. 3 | Simple is better than complex. 4 | Complex is better than complicated. 5 | Flat is better than nested. 6 | Sparse is better than dense. 7 | Readability counts. 8 | Special cases aren't special enough to break the rules. 9 | Although practicality beats purity. 10 | Errors should never pass silently. 11 | Unless explicitly silenced. 12 | In the face of ambiguity, refuse the temptation to guess. 13 | There should be one—and preferably only one—obvious way to do it. 14 | Although that way may not be obvious at first unless you're Dutch. 15 | Now is better than never. 16 | Although never is often better than right now. 17 | If the implementation is hard to explain, it's a bad idea. 18 | If the implementation is easy to explain, it may be a good idea. 19 | Namespaces are one honking great idea—let's do more of those! --------------------------------------------------------------------------------