├── .gitignore ├── binder ├── start └── environment.yml ├── environment.yml ├── README.md ├── 05-bag.ipynb ├── 06-schedulers.ipynb ├── 07-ML.ipynb ├── 03-array.ipynb └── 04-delayed.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | .DS_Store 3 | dask-worker-space 4 | -------------------------------------------------------------------------------- /binder/start: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Replace DASK_DASHBOARD_URL with the proxy location 4 | sed -i -e "s|DASK_DASHBOARD_URL|${JUPYTERHUB_BASE_URL}user/${JUPYTERHUB_USER}/proxy/8787|g" binder/jupyterlab-workspace.json 5 | 6 | exec "$@" 7 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: talkpython-dask 2 | channels: 3 | - conda-forge 4 | dependencies: 5 | - python=3.8 6 | - nodejs 7 | - dask>=2021.3.0 8 | - dask-ml>=1.7.0 9 | - distributed>=2021.3.0 10 | - jupyterlab>=3.0 11 | - notebook 12 | - pandas>=1.0.1 13 | - numpy>=1.19.2 14 | - scipy>=1.4.1 15 | - scikit-learn>=0.22.1 16 | - scikit-image>=0.15.0 17 | - ipywidgets>=7.5 18 | - bokeh>=2.3.0 19 | - pip>=20.3.0 20 | - pip: 21 | - dask-labextension>=3.0.0 22 | - coiled 23 | - python-graphviz 24 | - h5py 25 | - mimesis 26 | -------------------------------------------------------------------------------- /binder/environment.yml: -------------------------------------------------------------------------------- 1 | name: talkpython-dask 2 | channels: 3 | - conda-forge 4 | dependencies: 5 | - python=3.8 6 | - nodejs 7 | - dask>=2021.3.0 8 | - dask-ml>=1.7.0 9 | - distributed>=2021.3.0 10 | - jupyterlab>=3.0 11 | - notebook 12 | - pandas>=1.0.1 13 | - numpy>=1.19.2 14 | - scipy>=1.4.1 15 | - scikit-learn>=0.22.1 16 | - scikit-image>=0.15.0 17 | - ipywidgets>=7.5 18 | - bokeh>=2.3.0 
19 | - pip>=20.3.0 20 | - pip: 21 | - dask-labextension>=3.0.0 22 | - coiled 23 | - python-graphviz 24 | - h5py 25 | - mimesis 26 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Fundamentals of Dask 2 | 3 | This repository contains the material for **Talk Python Training course** on Fundamentals of Dask. 4 | 5 | The Python data science stack, consisting of tools like pandas, NumPy, scikit-learn, and many more is extremely powerful, but it rarely leverages the parallel computing potential of modern hardware. Dask can help bridge this gap. This course will teach you how to parallelize everything from array computations to general Python code with Dask and even perform distributed machine learning to train models at scale. 6 | 7 | Take the course at: [training.talkpython.fm](https://training.talkpython.fm/courses/fundamentals-of-dask-getting-up-to-speed) 8 | 9 | In this course, you will learn to: 10 | 11 | * Scale array computations using a parallel alternative to NumPy 12 | * Parallelize general Python code including for-loops 13 | * Work with unstructured data in parallel 14 | * Train machine learning models faster using distributed computing 15 | * And lots more! 16 | 17 | ## Prerequisites 18 | 19 | - Basic Python 20 | 21 | Not required, but nice to have: 22 | - pandas 23 | - NumPy 24 | - scikit-learn 25 | - Machine Learning 26 | - JupyterLab 27 | - conda (for local setup) 28 | - Terminal (for local setup) 29 | 30 | ## Setup 31 | 32 | You get up and running in two ways: 33 | 34 | ### Launch Binder 35 | 36 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/coiled/talkpython-fundamentals-of-dask/master?urlpath=lab/tree/03-array.ipynb) 37 | 38 | The binder project allows you to open Jupyter notebooks in this repository in an online executable environment. 
Click the "launch binder" badge above to get started. The environment might take a few minutes to start. 39 | 40 | *Note: Binder notebooks time out if inactive for more than 10 minutes.* 41 | 42 | ### Local setup (recommended) 43 | 44 | * [Fork this repository](https://docs.github.com/en/free-pro-team@latest/github/getting-started-with-github/fork-a-repo) 45 | 46 | * Clone your forked repository: 47 | 48 | ```git clone http://github.com//talkpython-fundamentals-of-dask``` 49 | 50 | * Move into the repository's root directory: 51 | 52 | ```cd talkpython-fundamentals-of-dask``` 53 | 54 | * Create a new conda environment: 55 | 56 | ```conda env create -f environment.yml``` 57 | 58 | * Activate the conda environment: 59 | 60 | ```conda activate talkpython-dask``` 61 | 62 | * Start JupyterLab: 63 | 64 | ```jupyter lab``` 65 | -------------------------------------------------------------------------------- /05-bag.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "concrete-certification", 6 | "metadata": {}, 7 | "source": [ 8 | "# Dask Bag\n", 9 | "\n", 10 | "## Notebook Objectives\n", 11 | "* **Read and Manipulate data with Dask Bag**, a high-level interface to parallelize generic Python objects.\n", 12 | "* **Convert Dask Bag to Dask DataFrame**.\n", 13 | "* **Limitations of Dask Bag**.\n", 14 | "* **References** for further reading." 
15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "id": "experimental-thirty", 20 | "metadata": {}, 21 | "source": [ 22 | "## Read data with Dask Bag\n", 23 | "\n", 24 | "We can create a Dask Bag from any Python sequence: lists, dict, set, from files (json, xml), S3, etc.\n", 25 | "\n", 26 | "Before that, let's start a Cluster:" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 2, 32 | "id": "knowing-graduation", 33 | "metadata": {}, 34 | "outputs": [ 35 | { 36 | "data": { 37 | "text/html": [ 38 | "\n", 39 | "\n", 40 | "\n", 47 | "\n", 55 | "\n", 56 | "
Client / Cluster: Workers: 4, Cores: 12, Memory: 16.00 GiB
" 57 | ], 58 | "text/plain": [ 59 | "" 60 | ] 61 | }, 62 | "execution_count": 2, 63 | "metadata": {}, 64 | "output_type": "execute_result" 65 | } 66 | ], 67 | "source": [ 68 | "from dask.distributed import Client\n", 69 | "\n", 70 | "client = Client(n_workers=4)\n", 71 | "client" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "id": "decent-overall", 77 | "metadata": {}, 78 | "source": [ 79 | "Open the dashboards!" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "id": "liable-montgomery", 85 | "metadata": {}, 86 | "source": [ 87 | "### Reading from Python sequence\n", 88 | "\n", 89 | "Here we create a Dask Bag from a Python list. You can create Bags similarly from sets and dictionaries.\n", 90 | "\n", 91 | "Data is partitioned into blocks. In the following example, there are two partitions with 5 elements each." 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 3, 97 | "id": "christian-nancy", 98 | "metadata": {}, 99 | "outputs": [ 100 | { 101 | "data": { 102 | "text/plain": [ 103 | "dask.bag" 104 | ] 105 | }, 106 | "execution_count": 3, 107 | "metadata": {}, 108 | "output_type": "execute_result" 109 | } 110 | ], 111 | "source": [ 112 | "import dask.bag as db\n", 113 | "\n", 114 | "b = db.from_sequence(['Alaska', 'Minnesota', 'Georgia', 'Maine', 'West Virginia', 'California', 'South Dakota', 'Indiana', 'New York', 'Nebraska'], npartitions=2)\n", 115 | "b" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "id": "handmade-wilson", 121 | "metadata": {}, 122 | "source": [ 123 | "Bag object are also evaluated lazily by default, so we need to call `compute` to get the result." 
124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 4, 129 | "id": "classified-split", 130 | "metadata": { 131 | "tags": [] 132 | }, 133 | "outputs": [ 134 | { 135 | "data": { 136 | "text/plain": [ 137 | "['Alaska',\n", 138 | " 'Minnesota',\n", 139 | " 'Georgia',\n", 140 | " 'Maine',\n", 141 | " 'West Virginia',\n", 142 | " 'California',\n", 143 | " 'South Dakota',\n", 144 | " 'Indiana',\n", 145 | " 'New York',\n", 146 | " 'Nebraska']" 147 | ] 148 | }, 149 | "execution_count": 4, 150 | "metadata": {}, 151 | "output_type": "execute_result" 152 | } 153 | ], 154 | "source": [ 155 | "b.compute()" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "id": "organized-tsunami", 161 | "metadata": {}, 162 | "source": [ 163 | "`take()` can be used to show elements of the data." 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": 5, 169 | "id": "resistant-competition", 170 | "metadata": {}, 171 | "outputs": [ 172 | { 173 | "data": { 174 | "text/plain": [ 175 | "('Alaska', 'Minnesota', 'Georgia')" 176 | ] 177 | }, 178 | "execution_count": 5, 179 | "metadata": {}, 180 | "output_type": "execute_result" 181 | } 182 | ], 183 | "source": [ 184 | "b.take(3)" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "id": "consolidated-cutting", 190 | "metadata": {}, 191 | "source": [ 192 | "### Reading from JSON file\n", 193 | "\n", 194 | "Here we create a Dask Bag from the JSON files." 
195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 6, 200 | "id": "animated-buyer", 201 | "metadata": {}, 202 | "outputs": [ 203 | { 204 | "data": { 205 | "text/plain": [ 206 | "['/Users/pavithra-coiled/Developer/talkpython-dask-course/2-dask-fundamentals/data/0.json',\n", 207 | " '/Users/pavithra-coiled/Developer/talkpython-dask-course/2-dask-fundamentals/data/1.json',\n", 208 | " '/Users/pavithra-coiled/Developer/talkpython-dask-course/2-dask-fundamentals/data/2.json',\n", 209 | " '/Users/pavithra-coiled/Developer/talkpython-dask-course/2-dask-fundamentals/data/3.json',\n", 210 | " '/Users/pavithra-coiled/Developer/talkpython-dask-course/2-dask-fundamentals/data/4.json',\n", 211 | " '/Users/pavithra-coiled/Developer/talkpython-dask-course/2-dask-fundamentals/data/5.json',\n", 212 | " '/Users/pavithra-coiled/Developer/talkpython-dask-course/2-dask-fundamentals/data/6.json',\n", 213 | " '/Users/pavithra-coiled/Developer/talkpython-dask-course/2-dask-fundamentals/data/7.json',\n", 214 | " '/Users/pavithra-coiled/Developer/talkpython-dask-course/2-dask-fundamentals/data/8.json',\n", 215 | " '/Users/pavithra-coiled/Developer/talkpython-dask-course/2-dask-fundamentals/data/9.json']" 216 | ] 217 | }, 218 | "execution_count": 6, 219 | "metadata": {}, 220 | "output_type": "execute_result" 221 | } 222 | ], 223 | "source": [ 224 | "# Create random data and store as JSON files \n", 225 | "\n", 226 | "import dask\n", 227 | "import json\n", 228 | "import os\n", 229 | "\n", 230 | "b = dask.datasets.make_people()\n", 231 | "b.map(json.dumps).to_textfiles('data/*.json')" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "id": "4533ad2c-47fa-45e5-b1dc-91d29147e861", 237 | "metadata": {}, 238 | "source": [ 239 | "Then, read the data using `read_text`." 
240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": 7, 245 | "id": "postal-newcastle", 246 | "metadata": {}, 247 | "outputs": [ 248 | { 249 | "data": { 250 | "text/plain": [ 251 | "dask.bag" 252 | ] 253 | }, 254 | "execution_count": 7, 255 | "metadata": {}, 256 | "output_type": "execute_result" 257 | } 258 | ], 259 | "source": [ 260 | "b = db.read_text('data/*.json')\n", 261 | "b" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": 7, 267 | "id": "first-lover", 268 | "metadata": {}, 269 | "outputs": [ 270 | { 271 | "data": { 272 | "text/plain": [ 273 | "('{\"age\": 30, \"name\": [\"Darrel\", \"Soto\"], \"occupation\": \"Audiologist\", \"telephone\": \"527.475.4983\", \"address\": {\"address\": \"460 Rivas Drung\", \"city\": \"Winston-Salem\"}, \"credit-card\": {\"number\": \"2446 9077 9141 7987\", \"expiration-date\": \"09/22\"}}\\n',\n", 274 | " '{\"age\": 38, \"name\": [\"Sindy\", \"Campbell\"], \"occupation\": \"Foreman\", \"telephone\": \"946.885.3965\", \"address\": {\"address\": \"1185 Bass Spur\", \"city\": \"Millville\"}, \"credit-card\": {\"number\": \"4956 2525 9272 9241\", \"expiration-date\": \"08/20\"}}\\n')" 275 | ] 276 | }, 277 | "execution_count": 7, 278 | "metadata": {}, 279 | "output_type": "execute_result" 280 | } 281 | ], 282 | "source": [ 283 | "b.take(2)" 284 | ] 285 | }, 286 | { 287 | "cell_type": "markdown", 288 | "id": "marine-telling", 289 | "metadata": {}, 290 | "source": [ 291 | "Note the partitions for the 10 files in our data.\n", 292 | "\n", 293 | "The data comes out as lines of text, we can make this data more readable using `json.loads`." 
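To make the step from text to records concrete, here is a minimal standard-library sketch of what `json.loads` does to a single line. The record below is made up for illustration (it is not taken from the generated data), and the snippet deliberately avoids Dask so it can run anywhere.

```python
import json

# One line as it might come back from reading a JSON-lines file:
# a JSON document followed by a newline. This record is hypothetical.
line = '{"age": 30, "name": ["Ada", "Lovelace"], "address": {"city": "London"}}\n'

# json.loads turns the text into nested Python objects (dict, list, str, int).
# Surrounding whitespace, including the trailing newline, is ignored.
record = json.loads(line)

print(type(line).__name__, "->", type(record).__name__)
print(record["name"][0], record["address"]["city"])
```

Mapping this function over a bag applies the same conversion to every line, which is why the elements can afterwards be indexed like dictionaries.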
294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": 8, 299 | "id": "catholic-retention", 300 | "metadata": {}, 301 | "outputs": [ 302 | { 303 | "data": { 304 | "text/plain": [ 305 | "({'age': 60,\n", 306 | " 'name': ['Jeffery', 'Garcia'],\n", 307 | " 'occupation': 'Training Consultant',\n", 308 | " 'telephone': '1-702-673-7969',\n", 309 | " 'address': {'address': '744 Langton Parade', 'city': 'Sugar Hill'},\n", 310 | " 'credit-card': {'number': '3745 852410 45994', 'expiration-date': '06/23'}},\n", 311 | " {'age': 54,\n", 312 | " 'name': ['Parker', 'Reed'],\n", 313 | " 'occupation': 'Window Dresser',\n", 314 | " 'telephone': '223-543-9697',\n", 315 | " 'address': {'address': '1065 Mill Field', 'city': 'South Portland'},\n", 316 | " 'credit-card': {'number': '3789 947854 38464', 'expiration-date': '09/23'}})" 317 | ] 318 | }, 319 | "execution_count": 8, 320 | "metadata": {}, 321 | "output_type": "execute_result" 322 | } 323 | ], 324 | "source": [ 325 | "b = b.map(json.loads)\n", 326 | "b.take(2)" 327 | ] 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "id": "civilian-booth", 332 | "metadata": {}, 333 | "source": [ 334 | "## Manipulate data with Dask Bag\n", 335 | "\n", 336 | "Bag objects have the standard functional API found in projects like the Python standard library, toolz, or pyspark, including map, filter, groupby, etc.\n", 337 | "\n", 338 | "Operations on Bag objects create new bags. " 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "id": "wicked-cholesterol", 344 | "metadata": {}, 345 | "source": [ 346 | "### Filter operation\n", 347 | "\n", 348 | "Filter the file for all records having age over 25." 
349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": 9, 354 | "id": "important-render", 355 | "metadata": {}, 356 | "outputs": [ 357 | { 358 | "data": { 359 | "text/plain": [ 360 | "({'age': 60,\n", 361 | " 'name': ['Jeffery', 'Garcia'],\n", 362 | " 'occupation': 'Training Consultant',\n", 363 | " 'telephone': '1-702-673-7969',\n", 364 | " 'address': {'address': '744 Langton Parade', 'city': 'Sugar Hill'},\n", 365 | " 'credit-card': {'number': '3745 852410 45994', 'expiration-date': '06/23'}},\n", 366 | " {'age': 54,\n", 367 | " 'name': ['Parker', 'Reed'],\n", 368 | " 'occupation': 'Window Dresser',\n", 369 | " 'telephone': '223-543-9697',\n", 370 | " 'address': {'address': '1065 Mill Field', 'city': 'South Portland'},\n", 371 | " 'credit-card': {'number': '3789 947854 38464', 'expiration-date': '09/23'}},\n", 372 | " {'age': 44,\n", 373 | " 'name': ['Nicolas', 'Duncan'],\n", 374 | " 'occupation': 'Forest Ranger',\n", 375 | " 'telephone': '064.491.6735',\n", 376 | " 'address': {'address': '529 Cameron Alley', 'city': 'Garner'},\n", 377 | " 'credit-card': {'number': '4165 7976 6426 7113',\n", 378 | " 'expiration-date': '11/22'}},\n", 379 | " {'age': 42,\n", 380 | " 'name': ['Patrick', 'Rasmussen'],\n", 381 | " 'occupation': 'Technical Clerk',\n", 382 | " 'telephone': '530-726-3639',\n", 383 | " 'address': {'address': '988 Western Shore Line', 'city': 'Yorba Linda'},\n", 384 | " 'credit-card': {'number': '4075 4659 6389 2457',\n", 385 | " 'expiration-date': '08/22'}},\n", 386 | " {'age': 28,\n", 387 | " 'name': ['Caleb', 'Allison'],\n", 388 | " 'occupation': 'Hospital Manager',\n", 389 | " 'telephone': '1-197-089-3998',\n", 390 | " 'address': {'address': '908 White Place', 'city': 'Salinas'},\n", 391 | " 'credit-card': {'number': '3766 677448 55505', 'expiration-date': '05/22'}})" 392 | ] 393 | }, 394 | "execution_count": 9, 395 | "metadata": {}, 396 | "output_type": "execute_result" 397 | } 398 | ], 399 | "source": [ 400 | "b.filter(lambda 
record: record['age'] > 25).take(5)" 401 | ] 402 | }, 403 | { 404 | "cell_type": "markdown", 405 | "id": "heard-solid", 406 | "metadata": {}, 407 | "source": [ 408 | "### Map operation\n", 409 | "\n", 410 | "Get only the first name." 411 | ] 412 | }, 413 | { 414 | "cell_type": "code", 415 | "execution_count": 10, 416 | "id": "touched-emperor", 417 | "metadata": {}, 418 | "outputs": [ 419 | { 420 | "data": { 421 | "text/plain": [ 422 | "('Jeffery',\n", 423 | " 'Parker',\n", 424 | " 'Nicolas',\n", 425 | " 'Rickie',\n", 426 | " 'Patrick',\n", 427 | " 'Caleb',\n", 428 | " 'Cruz',\n", 429 | " 'Jeanene',\n", 430 | " 'Wade',\n", 431 | " 'Jarrett')" 432 | ] 433 | }, 434 | "execution_count": 10, 435 | "metadata": {}, 436 | "output_type": "execute_result" 437 | } 438 | ], 439 | "source": [ 440 | "x = b.map(lambda record: record['name'][0]).take(10)\n", 441 | "x" 442 | ] 443 | }, 444 | { 445 | "cell_type": "markdown", 446 | "id": "seventh-security", 447 | "metadata": { 448 | "tags": [] 449 | }, 450 | "source": [ 451 | "### Groupby Operation\n", 452 | "\n", 453 | "Group data by some function or key." 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": 11, 459 | "id": "mobile-collect", 460 | "metadata": { 461 | "tags": [] 462 | }, 463 | "outputs": [ 464 | { 465 | "data": { 466 | "text/plain": [ 467 | "[(6, ['Parker', 'Rickie']),\n", 468 | " (4, ['Cruz', 'Wade']),\n", 469 | " (7, ['Jeffery', 'Nicolas', 'Patrick', 'Jeanene', 'Jarrett']),\n", 470 | " (5, ['Caleb'])]" 471 | ] 472 | }, 473 | "execution_count": 11, 474 | "metadata": {}, 475 | "output_type": "execute_result" 476 | } 477 | ], 478 | "source": [ 479 | "b = db.from_sequence(x, npartitions=2)\n", 480 | "b.groupby(len).compute()" 481 | ] 482 | }, 483 | { 484 | "cell_type": "markdown", 485 | "id": "suitable-announcement", 486 | "metadata": {}, 487 | "source": [ 488 | "**Note:**\n", 489 | "\n", 490 | "Often we want to group data by some function or key. 
We can do this either with the `.groupby` method, which is straightforward but forces a full shuffle of the data (expensive) or with the harder-to-use but faster `.foldby` method, which does a streaming combined groupby and reduction.\n", 491 | "\n", 492 | "* `groupby`: Shuffles data so that all items with the same key are in the same key-value pair\n", 493 | "* `foldby`: Walks through the data accumulating a result per key\n", 494 | "\n", 495 | "_~ Source: [tutorial.dask.org](https://tutorial.dask.org/02_bag.html#Groupby-and-Foldby)_" 496 | ] 497 | }, 498 | { 499 | "cell_type": "markdown", 500 | "id": "northern-interaction", 501 | "metadata": {}, 502 | "source": [ 503 | "## Checkpoint\n", 504 | "\n", 505 | "**Question:** Find all cities from the JSON data we created earlier." 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": null, 511 | "id": "subject-shoulder", 512 | "metadata": {}, 513 | "outputs": [], 514 | "source": [ 515 | "# Your answer here" 516 | ] 517 | }, 518 | { 519 | "cell_type": "code", 520 | "execution_count": null, 521 | "id": "partial-closer", 522 | "metadata": { 523 | "jupyter": { 524 | "source_hidden": true 525 | }, 526 | "tags": [] 527 | }, 528 | "outputs": [], 529 | "source": [ 530 | "b = db.read_text('data/*.json').map(json.loads)\n", 531 | "x = b.map(lambda record: record['address']['city']).take(10)\n", 532 | "x" 533 | ] 534 | }, 535 | { 536 | "cell_type": "markdown", 537 | "id": "hollow-permit", 538 | "metadata": {}, 539 | "source": [ 540 | "## Convert Dask Bag to Dask DataFrame\n", 541 | "\n", 542 | "Dask Bag can be used for simple analysis but for more complex computations, Dask DataFrame or Dask Array might be a better choice. They are faster for the same reason pandas and numpy are faster than Python. They also have more functionality suited for data analysis.\n", 543 | "\n", 544 | "`to_dataframe` method can be used to transform Dask Bag to Dask DataFrame." 
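Before converting, the `groupby` versus `foldby` note above can be made concrete. The following is a plain-Python sketch of the idea behind `foldby`, not Dask's implementation: each partition is walked once while keeping one running total per key, and only those small per-partition totals are merged afterwards. The helper names `fold_partition` and `combine_totals` are hypothetical and exist only for this illustration.

```python
from functools import reduce

# Two "partitions" of first names, mirroring the groupby example above.
partitions = [
    ["Jeffery", "Parker", "Nicolas", "Rickie", "Patrick"],
    ["Caleb", "Cruz", "Jeanene", "Wade", "Jarrett"],
]


def fold_partition(items, key, binop, initial):
    """Walk one partition once, keeping a single running total per key."""
    totals = {}
    for item in items:
        k = key(item)
        totals[k] = binop(totals.get(k, initial), item)
    return totals


def combine_totals(left, right, combine):
    """Merge two per-partition dictionaries; no raw items are moved around."""
    merged = dict(left)
    for k, v in right.items():
        merged[k] = combine(merged[k], v) if k in merged else v
    return merged


def count(total, _name):
    """binop: fold one more item into the running total."""
    return total + 1


def add(a, b):
    """combine: merge the totals of two partitions."""
    return a + b


partials = [fold_partition(p, len, count, 0) for p in partitions]
names_per_length = reduce(lambda l, r: combine_totals(l, r, add), partials)
print(sorted(names_per_length.items()))
```

With a real bag, the analogous call would be along the lines of `b.foldby(len, count, 0, add, 0)`, where the arguments are the key function, the per-item reduction, its initial value, the combine function, and its initial value. Because only one small value per key leaves each partition, this avoids the full shuffle that `groupby` needs.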
545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 19, 550 | "id": "subjective-biotechnology", 551 | "metadata": { 552 | "tags": [] 553 | }, 554 | "outputs": [ 555 | { 556 | "data": { 557 | "text/html": [ 558 | "
\n", 559 | "\n", 572 | "\n", 573 | " \n", 574 | " \n", 575 | " \n", 576 | " \n", 577 | " \n", 578 | " \n", 579 | " \n", 580 | " \n", 581 | " \n", 582 | " \n", 583 | " \n", 584 | " \n", 585 | " \n", 586 | " \n", 587 | " \n", 588 | " \n", 589 | " \n", 590 | " \n", 591 | " \n", 592 | " \n", 593 | " \n", 594 | " \n", 595 | " \n", 596 | " \n", 597 | " \n", 598 | " \n", 599 | " \n", 600 | " \n", 601 | " \n", 602 | " \n", 603 | " \n", 604 | " \n", 605 | " \n", 606 | " \n", 607 | " \n", 608 | " \n", 609 | " \n", 610 | " \n", 611 | " \n", 612 | " \n", 613 | " \n", 614 | " \n", 615 | " \n", 616 | " \n", 617 | " \n", 618 | " \n", 619 | " \n", 620 | " \n", 621 | " \n", 622 | " \n", 623 | " \n", 624 | " \n", 625 | " \n", 626 | " \n", 627 | " \n", 628 | " \n", 629 | " \n", 630 | " \n", 631 | "
| | age | name | occupation | telephone | address | credit-card |
| 0 | 60 | [Jeffery, Garcia] | Training Consultant | 1-702-673-7969 | {'address': '744 Langton Parade', 'city': 'Sug... | {'number': '3745 852410 45994', 'expiration-da... |
| 1 | 54 | [Parker, Reed] | Window Dresser | 223-543-9697 | {'address': '1065 Mill Field', 'city': 'South ... | {'number': '3789 947854 38464', 'expiration-da... |
| 2 | 44 | [Nicolas, Duncan] | Forest Ranger | 064.491.6735 | {'address': '529 Cameron Alley', 'city': 'Garn... | {'number': '4165 7976 6426 7113', 'expiration-... |
| 3 | 23 | [Rickie, Dickerson] | Chiropodist | (393) 425-7342 | {'address': '733 Miramar Run', 'city': 'Shakop... | {'number': '4904 7032 6941 1961', 'expiration-... |
| 4 | 42 | [Patrick, Rasmussen] | Technical Clerk | 530-726-3639 | {'address': '988 Western Shore Line', 'city': ... | {'number': '4075 4659 6389 2457', 'expiration-... |
\n", 632 | "
" 633 | ], 634 | "text/plain": [ 635 | " age name occupation telephone \\\n", 636 | "0 60 [Jeffery, Garcia] Training Consultant 1-702-673-7969 \n", 637 | "1 54 [Parker, Reed] Window Dresser 223-543-9697 \n", 638 | "2 44 [Nicolas, Duncan] Forest Ranger 064.491.6735 \n", 639 | "3 23 [Rickie, Dickerson] Chiropodist (393) 425-7342 \n", 640 | "4 42 [Patrick, Rasmussen] Technical Clerk 530-726-3639 \n", 641 | "\n", 642 | " address \\\n", 643 | "0 {'address': '744 Langton Parade', 'city': 'Sug... \n", 644 | "1 {'address': '1065 Mill Field', 'city': 'South ... \n", 645 | "2 {'address': '529 Cameron Alley', 'city': 'Garn... \n", 646 | "3 {'address': '733 Miramar Run', 'city': 'Shakop... \n", 647 | "4 {'address': '988 Western Shore Line', 'city': ... \n", 648 | "\n", 649 | " credit-card \n", 650 | "0 {'number': '3745 852410 45994', 'expiration-da... \n", 651 | "1 {'number': '3789 947854 38464', 'expiration-da... \n", 652 | "2 {'number': '4165 7976 6426 7113', 'expiration-... \n", 653 | "3 {'number': '4904 7032 6941 1961', 'expiration-... \n", 654 | "4 {'number': '4075 4659 6389 2457', 'expiration-... " 655 | ] 656 | }, 657 | "execution_count": 19, 658 | "metadata": {}, 659 | "output_type": "execute_result" 660 | } 661 | ], 662 | "source": [ 663 | "b = db.read_text('data/*.json').map(json.loads)\n", 664 | "df = b.to_dataframe()\n", 665 | "df.head()" 666 | ] 667 | }, 668 | { 669 | "cell_type": "markdown", 670 | "id": "nominated-nothing", 671 | "metadata": {}, 672 | "source": [ 673 | "Remember to close the Cluster. 
:)" 674 | ] 675 | }, 676 | { 677 | "cell_type": "code", 678 | "execution_count": 20, 679 | "id": "eight-fruit", 680 | "metadata": {}, 681 | "outputs": [], 682 | "source": [ 683 | "client.close()" 684 | ] 685 | }, 686 | { 687 | "cell_type": "markdown", 688 | "id": "spanish-acceptance", 689 | "metadata": {}, 690 | "source": [ 691 | "## Limitations\n", 692 | "\n", 693 | "* Dask Bag does not perform well on computations that include a great deal of inter-worker communication.\n", 694 | "* Bag operations are slower than array/DataFrame computations (Python is slower than NumPy/pandas).\n", 695 | "* `Bag.groupby` is slow. Use `Bag.foldby` instead where possible.\n", 696 | "* Bags are immutable, so you cannot change individual elements." 697 | ] 698 | }, 699 | { 700 | "cell_type": "markdown", 701 | "id": "loose-sampling", 702 | "metadata": {}, 703 | "source": [ 704 | "## References\n", 705 | "* [Dask Bag documentation](https://docs.dask.org/en/latest/bag.html)\n", 706 | "* [Dask Bag API](https://docs.dask.org/en/latest/bag-api.html)\n", 707 | "* [Dask Bag examples](https://docs.dask.org/en/latest/bag-api.html)\n", 708 | "* [Dask Tutorial - Bag](https://tutorial.dask.org/02_bag.html)" 709 | ] 710 | }, 711 | { 712 | "cell_type": "code", 713 | "execution_count": null, 714 | "id": "cardiovascular-subdivision", 715 | "metadata": {}, 716 | "outputs": [], 717 | "source": [] 718 | } 719 | ], 720 | "metadata": { 721 | "kernelspec": { 722 | "display_name": "Python 3", 723 | "language": "python", 724 | "name": "python3" 725 | }, 726 | "language_info": { 727 | "codemirror_mode": { 728 | "name": "ipython", 729 | "version": 3 730 | }, 731 | "file_extension": ".py", 732 | "mimetype": "text/x-python", 733 | "name": "python", 734 | "nbconvert_exporter": "python", 735 | "pygments_lexer": "ipython3", 736 | "version": "3.8.10" 737 | } 738 | }, 739 | "nbformat": 4, 740 | "nbformat_minor": 5 741 | } 742 | -------------------------------------------------------------------------------- 
/06-schedulers.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "yellow-slovak", 6 | "metadata": {}, 7 | "source": [ 8 | "# Dask Schedulers\n", 9 | "\n", 10 | "## Notebook Objectives\n", 11 | "* **Performance comparison** of different Dask schedulers.\n", 12 | "* **References** for further reading." 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "id": "aboriginal-release", 18 | "metadata": {}, 19 | "source": [ 20 | "## Performance comparison of different Dask schedulers" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "id": "metric-small", 26 | "metadata": {}, 27 | "source": [ 28 | "To compare the different schedulers, let's go back to the DataFrame example where we read the NYC Taxi Trips dataset and compute the highest mean tip amount across passenger counts." 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 5, 34 | "id": "modular-button", 35 | "metadata": {}, 36 | "outputs": [ 37 | { 38 | "data": { 39 | "text/html": [ 40 | "\n", 41 | "
Client: Client-ad7e6a04-d48f-11eb-b4de-acde48001122 (connection method: Cluster object; cluster type: LocalCluster; dashboard: http://127.0.0.1:8787/status)
LocalCluster 537e1587: status running; using processes: True; workers: 4; total threads: 12; total memory: 16.00 GiB
Scheduler-dd1a7531-946d-479c-97cf-bf50918e22b0: comm tcp://127.0.0.1:49654; started just now
Workers 0-3: 3 threads and 4.00 GiB memory each
\n", 342 | " " 343 | ], 344 | "text/plain": [ 345 | "" 346 | ] 347 | }, 348 | "execution_count": 5, 349 | "metadata": {}, 350 | "output_type": "execute_result" 351 | } 352 | ], 353 | "source": [ 354 | "from dask.distributed import Client\n", 355 | "\n", 356 | "client = Client(n_workers=4)\n", 357 | "client" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": 6, 363 | "id": "processed-apparel", 364 | "metadata": {}, 365 | "outputs": [ 366 | { 367 | "data": { 368 | "text/plain": [ 369 | "dd.Scalar" 370 | ] 371 | }, 372 | "execution_count": 6, 373 | "metadata": {}, 374 | "output_type": "execute_result" 375 | } 376 | ], 377 | "source": [ 378 | "import dask.dataframe as dd\n", 379 | "\n", 380 | "df = dd.read_csv(\"data/yellow_tripdata_2019-*.csv\",\n", 381 | " dtype={'RatecodeID': 'float64',\n", 382 | " 'VendorID': 'float64',\n", 383 | " 'passenger_count': 'float64',\n", 384 | " 'payment_type': 'float64'\n", 385 | " })\n", 386 | "\n", 387 | "max_tip_amount = df.groupby(\"passenger_count\").tip_amount.mean().max()\n", 388 | "max_tip_amount" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": 7, 394 | "id": "62c9bfdd-0fa9-4a1c-b2d9-9cca37d00947", 395 | "metadata": {}, 396 | "outputs": [ 397 | { 398 | "name": "stdout", 399 | "output_type": "stream", 400 | "text": [ 401 | "CPU times: user 1min 14s, sys: 4.03 s, total: 1min 18s\n", 402 | "Wall time: 2min 12s\n" 403 | ] 404 | }, 405 | { 406 | "data": { 407 | "text/plain": [ 408 | "7.377822222222222" 409 | ] 410 | }, 411 | "execution_count": 7, 412 | "metadata": {}, 413 | "output_type": "execute_result" 414 | } 415 | ], 416 | "source": [ 417 | "%%time\n", 418 | "\n", 419 | "max_tip_amount.compute()" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "id": "82a447a2-0232-488e-bc03-a0a5d571d2e4", 425 | "metadata": {}, 426 | "source": [ 427 | "Let's try this computation using different schedulers and look at the results. 
We are selecting the scheduler _inline_ while calling `compute`." 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 4, 433 | "id": "continental-puppy", 434 | "metadata": {}, 435 | "outputs": [ 436 | { 437 | "name": "stderr", 438 | "output_type": "stream", 439 | "text": [ 440 | "/Users/pavithra-coiled/.conda/envs/talkpython-dask/lib/python3.8/site-packages/dask/core.py:151: DtypeWarning: Columns (6) have mixed types.Specify dtype option on import or set low_memory=False.\n", 441 | " result = _execute_task(task, cache)\n", 442 | "/Users/pavithra-coiled/.conda/envs/talkpython-dask/lib/python3.8/site-packages/pandas/core/indexes/base.py:3080: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison\n", 443 | " return self._engine.get_loc(casted_key)\n", 444 | "/Users/pavithra-coiled/.conda/envs/talkpython-dask/lib/python3.8/site-packages/dask/core.py:151: DtypeWarning: Columns (6) have mixed types.Specify dtype option on import or set low_memory=False.\n", 445 | " result = _execute_task(task, cache)\n" 446 | ] 447 | }, 448 | { 449 | "name": "stdout", 450 | "output_type": "stream", 451 | "text": [ 452 | "Scheduler: threading , Compute time: 105.8745608329773 , Result: 7.377822222222222\n", 453 | "Scheduler: processes , Compute time: 384.7675268650055 , Result: 7.377822222222222\n" 454 | ] 455 | }, 456 | { 457 | "name": "stderr", 458 | "output_type": "stream", 459 | "text": [ 460 | "/Users/pavithra-coiled/.conda/envs/talkpython-dask/lib/python3.8/site-packages/dask/core.py:151: DtypeWarning: Columns (6) have mixed types.Specify dtype option on import or set low_memory=False.\n", 461 | " result = _execute_task(task, cache)\n" 462 | ] 463 | }, 464 | { 465 | "name": "stdout", 466 | "output_type": "stream", 467 | "text": [ 468 | "Scheduler: sync , Compute time: 176.0339961051941 , Result: 7.377822222222222\n" 469 | ] 470 | } 471 | ], 472 | "source": [ 473 | "import time\n", 474 | 
"\n", 475 | "for sch in ['threading', 'processes', 'synchronous']:\n", 476 | " t0 = time.time()\n", 477 | " amount = max_tip_amount.compute(scheduler=sch)\n", 478 | " t1 = time.time()\n", 479 | " print(\"Scheduler:\", sch, \", Compute time:\", t1 - t0, \", Result:\", amount)" 480 | ] 481 | }, 482 | { 483 | "cell_type": "markdown", 484 | "id": "7f329ce0-78a6-4840-a1ec-25f3915d8343", 485 | "metadata": {}, 486 | "source": [ 487 | "We can see that the results are the same, but the time to compute varies. This is because each scheduler works differently and is best-suited for specific situations." 488 | ] 489 | }, 490 | { 491 | "cell_type": "markdown", 492 | "id": "97f6323a-54f7-423e-b0b1-432a8a4f6318", 493 | "metadata": {}, 494 | "source": [ 495 | "For most cases, we recommend using the distributed scheduler:\n", 496 | "\n", 497 | "```\n", 498 | "from dask.distributed import Client\n", 499 | "client = Client()\n", 500 | "```" 501 | ] 502 | }, 503 | { 504 | "cell_type": "markdown", 505 | "id": "1c30de77-6bf1-4138-8441-8921996965fe", 506 | "metadata": {}, 507 | "source": [ 508 | "Note that only the distributed scheduler supports all the dashboards, modern scheduling improvements, and other features.\n", 509 | "\n", 510 | "The distributed scheduler:\n", 511 | "\n", 512 | " * will also work well for these workloads on a single machine\n", 513 | " * recommended for workloads that do hold the GIL, (`dask.bag` and custom code wrapped with `dask.delayed`), even on single machine\n", 514 | " * more intelligent and provides better diagnostics than the processes scheduler\n", 515 | " * required for scaling out work across a cluster\n", 516 | " " 517 | ] 518 | }, 519 | { 520 | "cell_type": "markdown", 521 | "id": "fa58aedf-2919-4678-bf2d-5d8445e7a261", 522 | "metadata": {}, 523 | "source": [ 524 | "Finally, let's close the cluster!" 
525 | ] 526 | }, 527 | { 528 | "cell_type": "code", 529 | "execution_count": 8, 530 | "id": "394e7bd1-2baf-42a3-bfe5-616c9d632a5f", 531 | "metadata": {}, 532 | "outputs": [], 533 | "source": [ 534 | "client.close()" 535 | ] 536 | }, 537 | { 538 | "cell_type": "markdown", 539 | "id": "worst-controversy", 540 | "metadata": {}, 541 | "source": [ 542 | "## References\n", 543 | "\n", 544 | "* [Scheduling Documentation](https://docs.dask.org/en/latest/scheduling.html)\n", 545 | "* [Dask Tutorial - Distributed](https://tutorial.dask.org/05_distributed.html)" 546 | ] 547 | }, 548 | { 549 | "cell_type": "code", 550 | "execution_count": null, 551 | "id": "single-landing", 552 | "metadata": {}, 553 | "outputs": [], 554 | "source": [] 555 | } 556 | ], 557 | "metadata": { 558 | "kernelspec": { 559 | "display_name": "Python 3", 560 | "language": "python", 561 | "name": "python3" 562 | }, 563 | "language_info": { 564 | "codemirror_mode": { 565 | "name": "ipython", 566 | "version": 3 567 | }, 568 | "file_extension": ".py", 569 | "mimetype": "text/x-python", 570 | "name": "python", 571 | "nbconvert_exporter": "python", 572 | "pygments_lexer": "ipython3", 573 | "version": "3.8.10" 574 | } 575 | }, 576 | "nbformat": 4, 577 | "nbformat_minor": 5 578 | } 579 | -------------------------------------------------------------------------------- /07-ML.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "bf3d47cc-d76e-4fad-97c4-117f64c7d857", 6 | "metadata": {}, 7 | "source": [ 8 | "# Dask-ML\n", 9 | "\n", 10 | "## Notebook Objectives\n", 11 | "* **Demonstrate scikit-learn**, a library for machine learning in Python.\n", 12 | "* Use **Joblib and Dask to leverage parallelism** in case of compute-bound challenges.\n", 13 | "* Use **Dask-ML for distributed machine learning** in case of memory-bound challenges.\n", 14 | "* A brief look at **machine learning in the cloud** for additional 
computing resources. (Optional)\n", 15 | "* **References** for further reading." 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "id": "24da6bf0-474e-4323-b2ca-6d507dbd1bd9", 21 | "metadata": {}, 22 | "source": [ 23 | "## scikit-learn for machine learning\n", 24 | "\n", 25 | "scikit-learn is a powerful library for machine learning in Python. It provides tools for pre-processing, model training, evaluation, and more.\n", 26 | "\n", 27 | "If your model and data fit on your computer, we recommend using scikit-learn as usual with no parallelism.\n", 28 | "\n", 29 | "Let's take a look at how you can train machine learning models in scikit-learn." 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "id": "862ae9be-26be-4163-8724-e1888fc4fd9c", 35 | "metadata": {}, 36 | "source": [ 37 | "#### Creating Datasets\n", 38 | "\n", 39 | "We start by generating some synthetic data using scikit-learn's [`make_classification`](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html) function. `make_classification` creates a random classification problem; we create one with 100k samples and 10 features." 40 | ] 41 | }, 42 | { 43 | "cell_type": "code", 44 | "execution_count": 6, 45 | "id": "eba26578-71f6-4a90-b599-0c010c9be050", 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "from sklearn.datasets import make_classification\n", 50 | "\n", 51 | "X, y = make_classification(n_samples=100_000, n_features=10, random_state=0)" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "id": "44748d91-b9b8-4a6c-8a63-49765ec98c22", 57 | "metadata": {}, 58 | "source": [ 59 | "Let's examine the X and y variables. Note that X represents the set of input variables and y the output/target variables."
60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 7, 65 | "id": "d8e0df36-cb38-4528-b70e-bf0b3eb69965", 66 | "metadata": {}, 67 | "outputs": [ 68 | { 69 | "data": { 70 | "text/plain": [ 71 | "array([[-0.7462974 , 0.19602952, 0.11141229, 0.59340009, 1.32627975,\n", 72 | " -1.10504115, -0.63411817, 1.19223806, -0.32277383, -0.03057938],\n", 73 | " [-0.74584283, -0.24857446, 0.50831426, -0.6628635 , 1.24896798,\n", 74 | " 0.95601408, -2.28687281, 1.12441665, -1.53928374, 0.78151558],\n", 75 | " [-0.62459237, -0.02605275, -0.18403411, -0.94905415, 1.07726998,\n", 76 | " 1.18669218, 0.30910096, 0.8074069 , -0.79054371, 0.059631 ],\n", 77 | " [-0.99690131, -0.09017488, 0.67867704, 0.28108283, 1.71104871,\n", 78 | " 1.01523959, 0.78247076, 1.26565066, -1.39478782, 1.37608239],\n", 79 | " [ 0.40153919, 0.29434464, -1.76744682, 1.20321684, -0.64477815,\n", 80 | " -0.36214576, 0.61815685, 0.93696374, 1.26810107, 0.2989785 ]])" 81 | ] 82 | }, 83 | "execution_count": 7, 84 | "metadata": {}, 85 | "output_type": "execute_result" 86 | } 87 | ], 88 | "source": [ 89 | "X[:5]" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 8, 95 | "id": "2da5faf7-b29b-47e2-a5b4-98182b2cdada", 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "data": { 100 | "text/plain": [ 101 | "array([0, 0, 0, 0, 1])" 102 | ] 103 | }, 104 | "execution_count": 8, 105 | "metadata": {}, 106 | "output_type": "execute_result" 107 | } 108 | ], 109 | "source": [ 110 | "y[:5]" 111 | ] 112 | }, 113 | { 114 | "cell_type": "markdown", 115 | "id": "54418017-3d4b-45fc-a052-e81cb2f71adc", 116 | "metadata": {}, 117 | "source": [ 118 | "#### k-nearest neighbors Classification\n", 119 | "\n", 120 | "Next, we will implement a [k-NN classifier](https://scikit-learn.org/stable/modules/neighbors.html#classification) that creates a model based on the 'k' nearest neighbors of the query points.\n", 121 | "\n", 122 | "Scikit-learn makes it very easy to train this model. 
All we need to do is call the `fit` method; the `score` method then computes the accuracy (the fraction of samples the model classifies correctly)." 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "id": "e63474a1-d763-4522-b299-64cad8c5b2ee", 129 | "metadata": {}, 130 | "outputs": [], 131 | "source": [ 132 | "from sklearn.neighbors import KNeighborsClassifier" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "id": "46d770e9-76a5-4c7e-ac56-76a1057ab488", 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "%%time\n", 143 | "\n", 144 | "neigh = KNeighborsClassifier(n_neighbors=3)\n", 145 | "clf = neigh.fit(X, y)" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "id": "1021e289-7b34-4fb1-8100-8b34fba9c250", 152 | "metadata": {}, 153 | "outputs": [], 154 | "source": [ 155 | "clf.score(X,y)" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "id": "d5ce56d2-3597-4c19-a5dc-858c8b3bf574", 161 | "metadata": {}, 162 | "source": [ 163 | "#### Hyperparameter Tuning\n", 164 | "\n", 165 | "Hyperparameters are settings chosen before training that affect how your model performs. For example, in the above k-NN example, the value of k is defined ahead of time. We might want to check how the model performs with different values of k, and select the best value of k. This process of selecting the best hyperparameters is called hyperparameter tuning.\n", 166 | "\n", 167 | "There are many ways to tune hyperparameters; we will look at GridSearchCV in this notebook."
168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 10, 173 | "id": "80b75c4a-34c8-4238-a494-9bdf918077cf", 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "from sklearn.model_selection import GridSearchCV" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "id": "2c49df4f-9194-4289-82dc-bdf53ac0c811", 183 | "metadata": {}, 184 | "source": [ 185 | "We can specify the parameters to be explored as shown below, and then run `fit` on all the sets of parameters." 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 11, 191 | "id": "10630bc8-f757-493c-bbe6-911db7b27cba", 192 | "metadata": {}, 193 | "outputs": [], 194 | "source": [ 195 | "param_grid = {\n", 196 | " 'n_neighbors': [3, 5, 8],\n", 197 | " 'weights': ['uniform', 'distance'],\n", 198 | "}" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "id": "f29856c5-5183-4333-be96-a27d81774bb4", 204 | "metadata": {}, 205 | "source": [ 206 | "`verbose` gives us a detailed output for each fit and `cv` is used to define the number of folds during cross-validation." 
207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": 12, 212 | "id": "fd0ecee7-c80f-4d8a-a94b-b11688ff0e91", 213 | "metadata": {}, 214 | "outputs": [ 215 | { 216 | "name": "stdout", 217 | "output_type": "stream", 218 | "text": [ 219 | "Fitting 2 folds for each of 6 candidates, totalling 12 fits\n", 220 | "[CV] END .....................n_neighbors=3, weights=uniform; total time= 5.6s\n", 221 | "[CV] END .....................n_neighbors=3, weights=uniform; total time= 5.7s\n", 222 | "[CV] END ....................n_neighbors=3, weights=distance; total time= 4.3s\n", 223 | "[CV] END ....................n_neighbors=3, weights=distance; total time= 4.3s\n", 224 | "[CV] END .....................n_neighbors=5, weights=uniform; total time= 7.1s\n", 225 | "[CV] END .....................n_neighbors=5, weights=uniform; total time= 9.9s\n", 226 | "[CV] END ....................n_neighbors=5, weights=distance; total time= 7.7s\n", 227 | "[CV] END ....................n_neighbors=5, weights=distance; total time= 5.8s\n", 228 | "[CV] END .....................n_neighbors=8, weights=uniform; total time= 7.3s\n", 229 | "[CV] END .....................n_neighbors=8, weights=uniform; total time= 6.5s\n", 230 | "[CV] END ....................n_neighbors=8, weights=distance; total time= 5.3s\n", 231 | "[CV] END ....................n_neighbors=8, weights=distance; total time= 5.3s\n", 232 | "CPU times: user 1min 14s, sys: 283 ms, total: 1min 14s\n", 233 | "Wall time: 1min 14s\n" 234 | ] 235 | }, 236 | { 237 | "data": { 238 | "text/plain": [ 239 | "GridSearchCV(cv=2, estimator=KNeighborsClassifier(n_neighbors=3),\n", 240 | " param_grid={'n_neighbors': [3, 5, 8],\n", 241 | " 'weights': ['uniform', 'distance']},\n", 242 | " verbose=2)" 243 | ] 244 | }, 245 | "execution_count": 12, 246 | "metadata": {}, 247 | "output_type": "execute_result" 248 | } 249 | ], 250 | "source": [ 251 | "%%time\n", 252 | "\n", 253 | "grid_search = GridSearchCV(clf, param_grid, verbose=2, 
cv=2)\n", 254 | "grid_search.fit(X, y)" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "id": "78d5ad72-e391-4b6d-a4d6-4f36fff04623", 260 | "metadata": {}, 261 | "source": [ 262 | "Note the time taken!\n", 263 | "\n", 264 | "Now, we can check what the best parameters were and the best score they produced." 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": 13, 270 | "id": "ca3974c6-59c8-43e8-9a48-ce7773c7172c", 271 | "metadata": {}, 272 | "outputs": [ 273 | { 274 | "data": { 275 | "text/plain": [ 276 | "{'n_neighbors': 8, 'weights': 'distance'}" 277 | ] 278 | }, 279 | "execution_count": 13, 280 | "metadata": {}, 281 | "output_type": "execute_result" 282 | } 283 | ], 284 | "source": [ 285 | "grid_search.best_params_" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": 14, 291 | "id": "f58bfcec-6518-4d9c-b0bd-5cfd81716b2d", 292 | "metadata": {}, 293 | "outputs": [ 294 | { 295 | "data": { 296 | "text/plain": [ 297 | "0.8952100000000001" 298 | ] 299 | }, 300 | "execution_count": 14, 301 | "metadata": {}, 302 | "output_type": "execute_result" 303 | } 304 | ], 305 | "source": [ 306 | "grid_search.best_score_" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "id": "8862e077-c508-42e9-894f-d910c4d33d71", 312 | "metadata": {}, 313 | "source": [ 314 | "## joblib and Dask for compute-bound problems\n", 315 | "\n", 316 | "If your data fits in memory but your model is computationally expensive to train, a general solution is to leverage parallel computing." 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "id": "eb83f047-fdcf-4793-8e26-c082dddedc9e", 322 | "metadata": {}, 323 | "source": [ 324 | "### Single machine parallelism: scikit-learn + joblib\n", 325 | "\n", 326 | "scikit-learn offers **single-machine parallelism** using a tool called Joblib. 
We can parallelize some algorithms by passing the number of cores to use via the `n_jobs` parameter.\n", 327 | "\n", 328 | "Let's look at GridSearchCV again, but this time we will use all available CPU cores. To do this, we can define `n_jobs=-1`. Note that you can also define the exact number of cores to use; for example, `n_jobs=4` will use 4 cores." 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 15, 334 | "id": "a8772a9e-613d-4676-82c1-af6f7494e1de", 335 | "metadata": { 336 | "tags": [] 337 | }, 338 | "outputs": [ 339 | { 340 | "name": "stdout", 341 | "output_type": "stream", 342 | "text": [ 343 | "CPU times: user 176 ms, sys: 118 ms, total: 294 ms\n", 344 | "Wall time: 32.4 s\n" 345 | ] 346 | }, 347 | { 348 | "data": { 349 | "text/plain": [ 350 | "GridSearchCV(cv=2, estimator=KNeighborsClassifier(n_neighbors=3), n_jobs=-1,\n", 351 | " param_grid={'n_neighbors': [3, 5, 8],\n", 352 | " 'weights': ['uniform', 'distance']})" 353 | ] 354 | }, 355 | "execution_count": 15, 356 | "metadata": {}, 357 | "output_type": "execute_result" 358 | } 359 | ], 360 | "source": [ 361 | "%%time\n", 362 | "\n", 363 | "grid_search = GridSearchCV(clf, param_grid, cv=2, n_jobs=-1)\n", 364 | "grid_search.fit(X, y)" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "id": "118dd976-5ef9-4679-8154-d5e648a6606f", 370 | "metadata": {}, 371 | "source": [ 372 | "Notice how the wall time drops from 1min 14s to 32.4 s, less than half of the original!" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "id": "79c483fc-a862-4280-a6a8-e941f60d8c3f", 378 | "metadata": {}, 379 | "source": [ 380 | "### Multi-machine parallelism: scikit-learn + joblib + Dask\n", 381 | "\n", 382 | "Dask offers a *parallel backend* for Joblib that can scale this computation to a cluster. \n", 383 | "\n", 384 | "First, let's spin up a cluster and open the dashboard plots!"
385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": 24, 390 | "id": "5fbe6de6-726f-47e0-9d94-44b127cd182a", 391 | "metadata": {}, 392 | "outputs": [ 393 | { 394 | "name": "stderr", 395 | "output_type": "stream", 396 | "text": [ 397 | "/opt/anaconda3/envs/talkpython-dask/lib/python3.8/site-packages/distributed/node.py:151: UserWarning: Port 8787 is already in use.\n", 398 | "Perhaps you already have a cluster running?\n", 399 | "Hosting the HTTP server on port 55138 instead\n", 400 | " warnings.warn(\n" 401 | ] 402 | }, 403 | { 404 | "data": { 405 | "text/html": [ 406 | "\n", 407 | "\n", 408 | "\n", 415 | "\n", 423 | "\n", 424 | "
\n", 409 | "

Client

\n", 410 | "\n", 414 | "
\n", 416 | "

Cluster

\n", 417 | "
    \n", 418 | "
  • Workers: 4
  • \n", 419 | "
  • Cores: 12
  • \n", 420 | "
  • Memory: 16.00 GiB
  • \n", 421 | "
\n", 422 | "
" 425 | ], 426 | "text/plain": [ 427 | "" 428 | ] 429 | }, 430 | "execution_count": 24, 431 | "metadata": {}, 432 | "output_type": "execute_result" 433 | } 434 | ], 435 | "source": [ 436 | "import joblib\n", 437 | "from dask.distributed import Client\n", 438 | "\n", 439 | "client = Client(n_workers=4)\n", 440 | "client" 441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "id": "bba07264-4902-46ca-80e2-4a9cc091101f", 446 | "metadata": {}, 447 | "source": [ 448 | "Continuing with the previous GridSearchCV Example, we can use Dask as shown below:" 449 | ] 450 | }, 451 | { 452 | "cell_type": "code", 453 | "execution_count": 25, 454 | "id": "8bd919f3-bb0b-40db-9f54-fcb97ec47016", 455 | "metadata": {}, 456 | "outputs": [ 457 | { 458 | "name": "stdout", 459 | "output_type": "stream", 460 | "text": [ 461 | "CPU times: user 5.3 s, sys: 2.57 s, total: 7.87 s\n", 462 | "Wall time: 36 s\n" 463 | ] 464 | } 465 | ], 466 | "source": [ 467 | "%%time\n", 468 | "\n", 469 | "with joblib.parallel_backend(\"dask\", scatter=[X, y]):\n", 470 | " grid_search.fit(X, y)" 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "id": "c11895e1-8c77-4680-b7e9-4f1a00d2d34b", 476 | "metadata": {}, 477 | "source": [ 478 | "## Checkpoint\n", 479 | "\n", 480 | "**Question:** Fit a LogisticRegresstionCV model on the given data. Implement it with and without parallelism and note the time. 
Reference: [scikit-learn documentation](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegressionCV.html?highlight=logistic%20regression#sklearn.linear_model.LogisticRegressionCV)" 481 | ] 482 | }, 483 | { 484 | "cell_type": "code", 485 | "execution_count": 25, 486 | "id": "b29d1e44-50a4-484d-b1e9-99d0721564ca", 487 | "metadata": { 488 | "tags": [] 489 | }, 490 | "outputs": [], 491 | "source": [ 492 | "from sklearn.linear_model import LogisticRegressionCV" 493 | ] 494 | }, 495 | { 496 | "cell_type": "code", 497 | "execution_count": null, 498 | "id": "a21da37a-9fd0-4c94-a60b-54b3902ef798", 499 | "metadata": {}, 500 | "outputs": [], 501 | "source": [ 502 | "# Your answer here" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": 38, 508 | "id": "d45b2213-801e-49b8-910b-01e963b70c97", 509 | "metadata": { 510 | "collapsed": true, 511 | "jupyter": { 512 | "outputs_hidden": true, 513 | "source_hidden": true 514 | }, 515 | "tags": [] 516 | }, 517 | "outputs": [ 518 | { 519 | "name": "stdout", 520 | "output_type": "stream", 521 | "text": [ 522 | "CPU times: user 11.3 s, sys: 206 ms, total: 11.5 s\n", 523 | "Wall time: 1.01 s\n" 524 | ] 525 | } 526 | ], 527 | "source": [ 528 | "%%time\n", 529 | "\n", 530 | "# Without parallelism\n", 531 | "clf = LogisticRegressionCV(cv=4, random_state=0).fit(X, y)" 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": 40, 537 | "id": "3b0e9d0f-9552-47ff-9e0a-0b7709eff961", 538 | "metadata": { 539 | "collapsed": true, 540 | "jupyter": { 541 | "outputs_hidden": true, 542 | "source_hidden": true 543 | }, 544 | "tags": [] 545 | }, 546 | "outputs": [ 547 | { 548 | "name": "stdout", 549 | "output_type": "stream", 550 | "text": [ 551 | "CPU times: user 265 ms, sys: 46.1 ms, total: 311 ms\n", 552 | "Wall time: 657 ms\n" 553 | ] 554 | } 555 | ], 556 | "source": [ 557 | "%%time\n", 558 | "\n", 559 | "# With parallelism (We can have all 4 folds execute in parallel!)\n", 560 
| "clf = LogisticRegressionCV(cv=4, random_state=0, n_jobs=4).fit(X, y)" 561 | ] 562 | }, 563 | { 564 | "cell_type": "markdown", 565 | "id": "1d8e6216-ccc0-449a-ae1c-22723e6559ea", 566 | "metadata": {}, 567 | "source": [ 568 | "## Dask-ML for memory-bound problems\n", 569 | "\n", 570 | "Memory-bound problems arise when your dataset is too large to fit in memory, so you cannot even load it. This is where Dask can help. In the previous course, we saw how Dask DataFrame can be used to perform pandas-like operations on larger-than-memory data. Similarly, we can use Dask-ML to perform scikit-learn-like operations on our large datasets." 571 | ] 572 | }, 573 | { 574 | "cell_type": "markdown", 575 | "id": "73899ada-2d9e-4d56-9be0-79239d55233d", 576 | "metadata": {}, 577 | "source": [ 578 | "We can use Dask-ML on the previous GridSearchCV example, but this time with more parameters." 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": 26, 584 | "id": "f54a1552-7fca-4fdb-b50f-2ab7d729f53c", 585 | "metadata": {}, 586 | "outputs": [], 587 | "source": [ 588 | "import dask_ml.model_selection as dcv" 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": 27, 594 | "id": "64c30e9e-22d4-4eb6-bac1-afb08abd2620", 595 | "metadata": {}, 596 | "outputs": [], 597 | "source": [ 598 | "param_grid = {\n", 599 | " 'n_neighbors': [3, 5, 8],\n", 600 | " 'weights': ['uniform', 'distance'],\n", 601 | " 'algorithm': ['auto', 'ball_tree'],\n", 602 | "}" 603 | ] 604 | }, 605 | { 606 | "cell_type": "code", 607 | "execution_count": 28, 608 | "id": "549ccf50-9946-4c80-bb6d-15225a0c42e5", 609 | "metadata": {}, 610 | "outputs": [ 611 | { 612 | "name": "stdout", 613 | "output_type": "stream", 614 | "text": [ 615 | "CPU times: user 34.9 s, sys: 5.52 s, total: 40.5 s\n", 616 | "Wall time: 4min 14s\n" 617 | ] 618 | }, 619 | { 620 | "data": { 621 | "text/plain": [ 622 | "GridSearchCV(cv=2, estimator=KNeighborsClassifier(n_neighbors=3),\n", 623 | " param_grid={'algorithm': ['auto', 
'ball_tree'],\n", 624 | " 'n_neighbors': [3, 5, 8],\n", 625 | " 'weights': ['uniform', 'distance']})" 626 | ] 627 | }, 628 | "execution_count": 28, 629 | "metadata": {}, 630 | "output_type": "execute_result" 631 | } 632 | ], 633 | "source": [ 634 | "%%time\n", 635 | "\n", 636 | "grid_search = dcv.GridSearchCV(clf, param_grid, cv=2)\n", 637 | "grid_search.fit(X, y)" 638 | ] 639 | }, 640 | { 641 | "cell_type": "markdown", 642 | "id": "d20f0266-25c3-41cf-803c-61c95f10f92e", 643 | "metadata": {}, 644 | "source": [ 645 | "Let's look at another algorithm: Logistic Regression using Dask-ML. As Dask-ML implements the scikit-learn API, the code is similar." 646 | ] 647 | }, 648 | { 649 | "cell_type": "code", 650 | "execution_count": 29, 651 | "id": "f518e6a3-8f49-40e9-ae68-0d98473d9e8b", 652 | "metadata": {}, 653 | "outputs": [], 654 | "source": [ 655 | "from dask_ml.linear_model import LogisticRegression" 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": 30, 661 | "id": "f8ee9f60-9067-477e-95d1-3f3dcc97d543", 662 | "metadata": {}, 663 | "outputs": [ 664 | { 665 | "name": "stdout", 666 | "output_type": "stream", 667 | "text": [ 668 | "CPU times: user 609 ms, sys: 136 ms, total: 744 ms\n", 669 | "Wall time: 1.82 s\n" 670 | ] 671 | }, 672 | { 673 | "data": { 674 | "text/plain": [ 675 | "0.88207" 676 | ] 677 | }, 678 | "execution_count": 30, 679 | "metadata": {}, 680 | "output_type": "execute_result" 681 | } 682 | ], 683 | "source": [ 684 | "%%time\n", 685 | "\n", 686 | "clf = LogisticRegression().fit(X,y)\n", 687 | "clf.score(X,y)" 688 | ] 689 | }, 690 | { 691 | "cell_type": "code", 692 | "execution_count": 31, 693 | "id": "135108f9-7d7b-49b5-850b-45e96bfd119b", 694 | "metadata": {}, 695 | "outputs": [ 696 | { 697 | "data": { 698 | "text/plain": [ 699 | "array([False, False, False, False, True])" 700 | ] 701 | }, 702 | "execution_count": 31, 703 | "metadata": {}, 704 | "output_type": "execute_result" 705 | } 706 | ], 707 | "source": [ 708 | 
"clf.predict(X)[:5]" 709 | ] 710 | }, 711 | { 712 | "cell_type": "markdown", 713 | "id": "84341c54-d462-4aff-b7e7-10bb92039d0f", 714 | "metadata": {}, 715 | "source": [ 716 | "That's it!" 717 | ] 718 | }, 719 | { 720 | "cell_type": "markdown", 721 | "id": "4f24a1fe-7e2e-41ea-ace6-6b9b446d6bee", 722 | "metadata": {}, 723 | "source": [ 724 | "## Checkpoint\n", 725 | "\n", 726 | "**Question:** Use Dask-ML to implement a [Naive Bayes classifier](https://ml.dask.org/naive-bayes.html) on the given dataset." 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": null, 732 | "id": "0c8b8dfe-8d84-4426-9271-70982dfa707e", 733 | "metadata": {}, 734 | "outputs": [], 735 | "source": [ 736 | "# Your answer here" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": 45, 742 | "id": "569b2337-e35b-42d8-8d3b-5a34e94c7c00", 743 | "metadata": { 744 | "collapsed": true, 745 | "jupyter": { 746 | "outputs_hidden": true 747 | }, 748 | "tags": [] 749 | }, 750 | "outputs": [ 751 | { 752 | "data": { 753 | "text/plain": [ 754 | "array([0, 0, 0, 0, 1])" 755 | ] 756 | }, 757 | "execution_count": 45, 758 | "metadata": {}, 759 | "output_type": "execute_result" 760 | } 761 | ], 762 | "source": [ 763 | "from dask_ml.naive_bayes import GaussianNB\n", 764 | "\n", 765 | "clf = GaussianNB().fit(X,y)\n", 766 | "clf.predict(X)[:5].compute()" 767 | ] 768 | }, 769 | { 770 | "cell_type": "markdown", 771 | "id": "4b548ff7-77d2-4289-b07e-79828f9b3e0a", 772 | "metadata": {}, 773 | "source": [ 774 | "Finally, let's close the cluster." 
775 | ] 776 | }, 777 | { 778 | "cell_type": "code", 779 | "execution_count": null, 780 | "id": "fddf8955-eb9a-43d1-8f7a-f3a72fff84bf", 781 | "metadata": {}, 782 | "outputs": [], 783 | "source": [ 784 | "client.close()" 785 | ] 786 | }, 787 | { 788 | "cell_type": "markdown", 789 | "id": "90d8a6cb-c38b-427b-8e6f-92c91a53630e", 790 | "metadata": {}, 791 | "source": [ 792 | "## Machine Learning in the Cloud (Optional)" 793 | ] 794 | }, 795 | { 796 | "cell_type": "markdown", 797 | "id": "8bf3a52b-e1ab-4282-a490-4224a854ab80", 798 | "metadata": {}, 799 | "source": [ 800 | "As we saw in the first course, Dask can also scale this computation to the cloud! There are many ways to do this, but here, we will be using Coiled. Coiled provides cluster-as-a-service functionality to provision hosted Dask clusters. It manages software environments, networking, etc. so that we can connect to the cloud quickly.\n", 801 | "\n", 802 | "To get started, sign-up on [cloud.coiled.io](https://cloud.coiled.io) and get your coiled login token. Then in terminal (or command prompt), execute `coiled login` and share your token when prompted." 803 | ] 804 | }, 805 | { 806 | "cell_type": "markdown", 807 | "id": "1845e4f1-595a-4f07-86e9-10c4b662ec69", 808 | "metadata": {}, 809 | "source": [ 810 | "That's it! We can work from this same notebook now. 
We can import coiled and create a cluster as shown below:" 811 | ] 812 | }, 813 | { 814 | "cell_type": "code", 815 | "execution_count": 1, 816 | "id": "6826d61e-5b62-4149-87e1-4d4a7a17edd6", 817 | "metadata": {}, 818 | "outputs": [ 819 | { 820 | "data": { 821 | "application/vnd.jupyter.widget-view+json": { 822 | "model_id": "", 823 | "version_major": 2, 824 | "version_minor": 0 825 | }, 826 | "text/plain": [ 827 | "Output()" 828 | ] 829 | }, 830 | "metadata": {}, 831 | "output_type": "display_data" 832 | }, 833 | { 834 | "name": "stdout", 835 | "output_type": "stream", 836 | "text": [ 837 | "Found software environment build\n" 838 | ] 839 | }, 840 | { 841 | "data": { 842 | "text/html": [ 843 | "
\n"
 844 |       ],
 845 |       "text/plain": []
 846 |      },
 847 |      "metadata": {},
 848 |      "output_type": "display_data"
 849 |     }
 850 |    ],
 851 |    "source": [
 852 |     "import coiled\n",
 853 |     "\n",
 854 |     "cluster = coiled.Cluster(n_workers=10)"
 855 |    ]
 856 |   },
 857 |   {
 858 |    "cell_type": "code",
 859 |    "execution_count": 2,
 860 |    "id": "f154acd2-3ad9-44ee-b492-bed74e1fc445",
 861 |    "metadata": {},
 862 |    "outputs": [
 863 |     {
 864 |      "name": "stdout",
 865 |      "output_type": "stream",
 866 |      "text": [
 867 |       "Dashboard: http://ec2-54-158-32-172.compute-1.amazonaws.com:8787\n"
 868 |      ]
 869 |     },
 870 |     {
 871 |      "name": "stderr",
 872 |      "output_type": "stream",
 873 |      "text": [
 874 |       "/Users/pavithra-coiled/.conda/envs/talkpython-dask/lib/python3.8/site-packages/distributed/client.py:1186: VersionMismatchWarning: Mismatched versions found\n",
 875 |       "\n",
 876 |       "+---------+--------+-----------+---------+\n",
 877 |       "| Package | client | scheduler | workers |\n",
 878 |       "+---------+--------+-----------+---------+\n",
 879 |       "| blosc   | None   | 1.10.2    | 1.10.2  |\n",
 880 |       "| lz4     | None   | 3.1.3     | 3.1.3   |\n",
 881 |       "| numpy   | 1.20.3 | 1.21.0    | 1.21.0  |\n",
 882 |       "+---------+--------+-----------+---------+\n",
 883 |       "  warnings.warn(version_module.VersionMismatchWarning(msg[0][\"warning\"]))\n"
 884 |      ]
 885 |     }
 886 |    ],
 887 |    "source": [
 888 |     "from dask.distributed import Client\n",
 889 |     "\n",
 890 |     "client = Client(cluster)\n",
 891 |     "\n",
 892 |     "print('Dashboard:', client.dashboard_link)"
 893 |    ]
 894 |   },
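The `VersionMismatchWarning` in the output above lists packages whose versions differ between the local client and the remote scheduler and workers. As a rough illustration of that comparison, here is a minimal sketch in plain Python. The helper `find_version_mismatches` and the layout of the `reported` dictionary are hypothetical and are not part of `distributed`; the version strings are copied from the warning table. On a live cluster, `client.get_versions(check=True)` is the usual way to run the real check.

```python
def find_version_mismatches(versions):
    """Return {package: {role: version}} for packages whose version differs across roles.

    `versions` maps a role name (e.g. "client") to a {package: version} dict.
    A package missing for a role is treated as version None.
    """
    packages = set()
    for role_versions in versions.values():
        packages.update(role_versions)

    mismatches = {}
    for pkg in sorted(packages):
        per_role = {role: role_versions.get(pkg) for role, role_versions in versions.items()}
        # More than one distinct value means at least one role disagrees.
        if len(set(per_role.values())) > 1:
            mismatches[pkg] = per_role
    return mismatches


# Values copied from the warning table above (hypothetical dict layout).
reported = {
    "client": {"blosc": None, "lz4": None, "numpy": "1.20.3"},
    "scheduler": {"blosc": "1.10.2", "lz4": "3.1.3", "numpy": "1.21.0"},
    "workers": {"blosc": "1.10.2", "lz4": "3.1.3", "numpy": "1.21.0"},
}

mismatches = find_version_mismatches(reported)
for pkg, per_role in mismatches.items():
    print(pkg, per_role)
```

A mismatch such as the `numpy` one here is usually resolved by aligning the local environment with the cluster's software environment (or the other way round).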
 895 |   {
 896 |    "cell_type": "markdown",
 897 |    "id": "054569ea-4e3b-4dc3-8609-c0b7bf08a050",
 898 |    "metadata": {},
 899 |    "source": [
 900 |     "Note that the dashboard link points to AWS."
 901 |    ]
 902 |   },
 903 |   {
 904 |    "cell_type": "markdown",
 905 |    "id": "f4c7bb77-fef1-42cd-aa53-1ba2c499cab9",
 906 |    "metadata": {},
 907 |    "source": [
 908 |     "Now, let's implement KMeans on some generated data using sklearn and Dask-ML."
 909 |    ]
 910 |   },
 911 |   {
 912 |    "cell_type": "code",
 913 |    "execution_count": 3,
 914 |    "id": "ed46562c-5df0-4a9a-b79a-77e9c250e5e6",
 915 |    "metadata": {},
 916 |    "outputs": [],
 917 |    "source": [
 918 |     "from sklearn.datasets import make_blobs\n",
 919 |     "\n",
 920 |     "X, y = make_blobs(n_samples=100, n_features=5, random_state=0)"
 921 |    ]
 922 |   },
 923 |   {
 924 |    "cell_type": "code",
 925 |    "execution_count": 4,
 926 |    "id": "60f63e8f-7bd3-4ed9-adf9-044af675451d",
 927 |    "metadata": {},
 928 |    "outputs": [
 929 |     {
 930 |      "data": {
 931 |       "text/plain": [
 932 |        "(100, 5)"
 933 |       ]
 934 |      },
 935 |      "execution_count": 4,
 936 |      "metadata": {},
 937 |      "output_type": "execute_result"
 938 |     }
 939 |    ],
 940 |    "source": [
 941 |     "X.shape"
 942 |    ]
 943 |   },
 944 |   {
 945 |    "cell_type": "code",
 946 |    "execution_count": 5,
 947 |    "id": "df13bced-8aa3-4ade-bc2b-b376f466c9b6",
 948 |    "metadata": {},
 949 |    "outputs": [],
 950 |    "source": [
 951 |     "from dask_ml.cluster import KMeans"
 952 |    ]
 953 |   },
 954 |   {
 955 |    "cell_type": "code",
 956 |    "execution_count": 6,
 957 |    "id": "f6394c05-a31a-48f7-b2b1-a3bcc6a707d8",
 958 |    "metadata": {},
 959 |    "outputs": [
 960 |     {
 961 |      "name": "stdout",
 962 |      "output_type": "stream",
 963 |      "text": [
 964 |       "CPU times: user 883 ms, sys: 39.8 ms, total: 923 ms\n",
 965 |       "Wall time: 26 s\n"
 966 |      ]
 967 |     }
 968 |    ],
 969 |    "source": [
 970 |     "%%time\n",
 971 |     "\n",
 972 |     "clf = KMeans().fit(X)"
 973 |    ]
 974 |   },
 975 |   {
 976 |    "cell_type": "code",
 977 |    "execution_count": 7,
 978 |    "id": "81a4aaad-3d18-4fd2-8474-ac2e1c7eea53",
 979 |    "metadata": {},
 980 |    "outputs": [
 981 |     {
 982 |      "data": {
 983 |       "text/html": [
 984 |        "\n",
 985 |        "\n",
 986 |        "\n",
 999 |        "\n",
1030 |        "\n",
1031 |        "
<table>
  <thead>
    <tr><td></td><th>Array</th><th>Chunk</th></tr>
  </thead>
  <tbody>
    <tr><th>Bytes</th><td>400 B</td><td>32 B</td></tr>
    <tr><th>Shape</th><td>(100,)</td><td>(8,)</td></tr>
    <tr><th>Count</th><td>78 Tasks</td><td>13 Chunks</td></tr>
    <tr><th>Type</th><td>int32</td><td>numpy.ndarray</td></tr>
  </tbody>
</table>
" 1032 | ], 1033 | "text/plain": [ 1034 | "dask.array" 1035 | ] 1036 | }, 1037 | "execution_count": 7, 1038 | "metadata": {}, 1039 | "output_type": "execute_result" 1040 | } 1041 | ], 1042 | "source": [ 1043 | "clf.labels_" 1044 | ] 1045 | }, 1046 | { 1047 | "cell_type": "code", 1048 | "execution_count": 8, 1049 | "id": "d4f2c8ba-2ebc-4b19-9ef1-c05375bc1c36", 1050 | "metadata": {}, 1051 | "outputs": [ 1052 | { 1053 | "data": { 1054 | "text/plain": [ 1055 | "array([2, 4, 0, 6, 0, 7, 7, 5, 1, 2], dtype=int32)" 1056 | ] 1057 | }, 1058 | "execution_count": 8, 1059 | "metadata": {}, 1060 | "output_type": "execute_result" 1061 | } 1062 | ], 1063 | "source": [ 1064 | "clf.labels_[:10].compute()" 1065 | ] 1066 | }, 1067 | { 1068 | "cell_type": "code", 1069 | "execution_count": 9, 1070 | "id": "0b789ba3-a867-4797-b053-bd341b2a07b9", 1071 | "metadata": {}, 1072 | "outputs": [], 1073 | "source": [ 1074 | "client.close()" 1075 | ] 1076 | }, 1077 | { 1078 | "cell_type": "markdown", 1079 | "id": "6df48786-524b-4bd2-913f-72683176aa56", 1080 | "metadata": {}, 1081 | "source": [ 1082 | "## References\n", 1083 | "\n", 1084 | "* [Dask-ML documentation](https://ml.dask.org/)\n", 1085 | "* [Dask Examples - Machine Learning](https://examples.dask.org/machine-learning.html)\n", 1086 | "* [Dask Tutorial - Machine Learning](https://tutorial.dask.org/08_machine_learning.html)" 1087 | ] 1088 | }, 1089 | { 1090 | "cell_type": "code", 1091 | "execution_count": null, 1092 | "id": "ab8e02a4-6b79-448d-a090-3c17768b0b74", 1093 | "metadata": {}, 1094 | "outputs": [], 1095 | "source": [] 1096 | } 1097 | ], 1098 | "metadata": { 1099 | "kernelspec": { 1100 | "display_name": "Python 3", 1101 | "language": "python", 1102 | "name": "python3" 1103 | }, 1104 | "language_info": { 1105 | "codemirror_mode": { 1106 | "name": "ipython", 1107 | "version": 3 1108 | }, 1109 | "file_extension": ".py", 1110 | "mimetype": "text/x-python", 1111 | "name": "python", 1112 | "nbconvert_exporter": "python", 1113 | 
"pygments_lexer": "ipython3", 1114 | "version": "3.8.10" 1115 | } 1116 | }, 1117 | "nbformat": 4, 1118 | "nbformat_minor": 5 1119 | } 1120 | -------------------------------------------------------------------------------- /03-array.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "id": "continent-fifty", 6 | "metadata": {}, 7 | "source": [ 8 | "# Dask Array\n", 9 | "\n", 10 | "## Notebook Objectives\n", 11 | "* **Demonstrate NumPy**, a library for working with multidimensional arrays.\n", 12 | "* Using **blocked algorithms** to work on a large dataset in small chunks.\n", 13 | "* **Introducing Dask Array**, interface for parallel NumPy.\n", 14 | "* **Limitations of Dask Array**.\n", 15 | "* **References** for further reading." 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "id": "metallic-spray", 21 | "metadata": {}, 22 | "source": [ 23 | "## Demostrate NumPy for array operations\n" 24 | ] 25 | }, 26 | { 27 | "cell_type": "markdown", 28 | "id": "limiting-navigator", 29 | "metadata": {}, 30 | "source": [ 31 | "NumPy has a `ones()` function to create unit arrays, or arrays of all ones. 
We use it to create a 10x10 matrix of ones:" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 1, 37 | "id": "handed-round", 38 | "metadata": {}, 39 | "outputs": [ 40 | { 41 | "data": { 42 | "text/plain": [ 43 | "array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n", 44 | " [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n", 45 | " [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n", 46 | " [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n", 47 | " [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n", 48 | " [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n", 49 | " [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n", 50 | " [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n", 51 | " [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],\n", 52 | " [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])" 53 | ] 54 | }, 55 | "execution_count": 1, 56 | "metadata": {}, 57 | "output_type": "execute_result" 58 | } 59 | ], 60 | "source": [ 61 | "import numpy as np\n", 62 | "\n", 63 | "x = np.ones((10, 10), dtype=int)\n", 64 | "x" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "id": "literary-encounter", 70 | "metadata": {}, 71 | "source": [ 72 | "The `sum()` method computes the sum of all elements in the array." 73 | ] 74 | }, 75 | { 76 | "cell_type": "code", 77 | "execution_count": 2, 78 | "id": "checked-offer", 79 | "metadata": {}, 80 | "outputs": [ 81 | { 82 | "name": "stdout", 83 | "output_type": "stream", 84 | "text": [ 85 | "CPU times: user 30 µs, sys: 6 µs, total: 36 µs\n", 86 | "Wall time: 38.9 µs\n" 87 | ] 88 | }, 89 | { 90 | "data": { 91 | "text/plain": [ 92 | "100" 93 | ] 94 | }, 95 | "execution_count": 2, 96 | "metadata": {}, 97 | "output_type": "execute_result" 98 | } 99 | ], 100 | "source": [ 101 | "%%time\n", 102 | "x.sum()" 103 | ] 104 | }, 105 | { 106 | "cell_type": "markdown", 107 | "id": "twenty-operations", 108 | "metadata": {}, 109 | "source": [ 110 | "The `random` module can be used to create arrays of random data. 
Let's create a larger matrix of dimension 1000x1000:" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 28, 116 | "id": "subtle-outdoors", 117 | "metadata": {}, 118 | "outputs": [ 119 | { 120 | "data": { 121 | "text/plain": [ 122 | "array([[0.45568271, 0.1431577 , 0.37642501, ..., 0.15796487, 0.45990731,\n", 123 | " 0.27984186],\n", 124 | " [0.81437462, 0.62504866, 0.51884817, ..., 0.66213603, 0.76877329,\n", 125 | " 0.27941481],\n", 126 | " [0.67832935, 0.47963707, 0.86939886, ..., 0.8534172 , 0.84040989,\n", 127 | " 0.56960117],\n", 128 | " ...,\n", 129 | " [0.01973544, 0.35057438, 0.05273421, ..., 0.60723429, 0.2137585 ,\n", 130 | " 0.99921152],\n", 131 | " [0.89358418, 0.53268602, 0.69352128, ..., 0.06789243, 0.84053498,\n", 132 | " 0.38334184],\n", 133 | " [0.05749119, 0.42748649, 0.72071472, ..., 0.44029739, 0.43499474,\n", 134 | " 0.46421326]])" 135 | ] 136 | }, 137 | "execution_count": 28, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "x = np.random.random(size=(1000, 1000))\n", 144 | "x" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 3, 150 | "id": "expressed-immigration", 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "name": "stdout", 155 | "output_type": "stream", 156 | "text": [ 157 | "CPU times: user 33 µs, sys: 7 µs, total: 40 µs\n", 158 | "Wall time: 42.9 µs\n" 159 | ] 160 | }, 161 | { 162 | "data": { 163 | "text/plain": [ 164 | "100" 165 | ] 166 | }, 167 | "execution_count": 3, 168 | "metadata": {}, 169 | "output_type": "execute_result" 170 | } 171 | ], 172 | "source": [ 173 | "%%time\n", 174 | "x.sum()" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "id": "integrated-cooking", 180 | "metadata": {}, 181 | "source": [ 182 | "NumPy has many helpful operations, including matrix transpose, matrix addition, and mean as shown below:" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 4, 188 | 
"id": "compound-functionality", 189 | "metadata": {}, 190 | "outputs": [ 191 | { 192 | "name": "stdout", 193 | "output_type": "stream", 194 | "text": [ 195 | "CPU times: user 36 µs, sys: 0 ns, total: 36 µs\n", 196 | "Wall time: 39.1 µs\n" 197 | ] 198 | }, 199 | { 200 | "data": { 201 | "text/plain": [ 202 | "array([[2, 2, 2, 2, 2, 2, 2, 2, 2, 2],\n", 203 | " [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],\n", 204 | " [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],\n", 205 | " [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],\n", 206 | " [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],\n", 207 | " [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],\n", 208 | " [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],\n", 209 | " [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],\n", 210 | " [2, 2, 2, 2, 2, 2, 2, 2, 2, 2],\n", 211 | " [2, 2, 2, 2, 2, 2, 2, 2, 2, 2]])" 212 | ] 213 | }, 214 | "execution_count": 4, 215 | "metadata": {}, 216 | "output_type": "execute_result" 217 | } 218 | ], 219 | "source": [ 220 | "%%time\n", 221 | "y = x + x.T\n", 222 | "y" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 5, 228 | "id": "novel-billion", 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "name": "stdout", 233 | "output_type": "stream", 234 | "text": [ 235 | "CPU times: user 76 µs, sys: 8 µs, total: 84 µs\n", 236 | "Wall time: 86.1 µs\n" 237 | ] 238 | }, 239 | { 240 | "data": { 241 | "text/plain": [ 242 | "2.0" 243 | ] 244 | }, 245 | "execution_count": 5, 246 | "metadata": {}, 247 | "output_type": "execute_result" 248 | } 249 | ], 250 | "source": [ 251 | "%%time\n", 252 | "np.mean(y)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "id": "acoustic-feedback", 258 | "metadata": {}, 259 | "source": [ 260 | "Let's now create an even larger matrix of 20,000x20,000 normally distributed random values and compute it's mean." 
261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 6, 266 | "id": "integrated-metallic", 267 | "metadata": {}, 268 | "outputs": [ 269 | { 270 | "name": "stdout", 271 | "output_type": "stream", 272 | "text": [ 273 | "CPU times: user 9.15 s, sys: 771 ms, total: 9.92 s\n", 274 | "Wall time: 9.95 s\n" 275 | ] 276 | }, 277 | { 278 | "data": { 279 | "text/plain": [ 280 | "array([ 9.99981611, 9.99998234, 10.00035018, ..., 10.0003316 ,\n", 281 | " 10.00153614, 9.99937434])" 282 | ] 283 | }, 284 | "execution_count": 6, 285 | "metadata": {}, 286 | "output_type": "execute_result" 287 | } 288 | ], 289 | "source": [ 290 | "%%time \n", 291 | "x = np.random.normal(10, 0.1, size=(20000, 20000)) \n", 292 | "y = x.mean(axis=0) \n", 293 | "y" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "id": "split-original", 299 | "metadata": {}, 300 | "source": [ 301 | "Note that this computation takes some time. We will run this same example using Dask in a few minutes!" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "id": "heavy-samoa", 307 | "metadata": {}, 308 | "source": [ 309 | "Now, let's try to create an even larger matrix with a billion values along each axis!" 
310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": 33, 315 | "id": "generic-dimension", 316 | "metadata": {}, 317 | "outputs": [ 318 | { 319 | "ename": "MemoryError", 320 | "evalue": "Unable to allocate 6.94 EiB for an array with shape (1000000000, 1000000000) and data type int64", 321 | "output_type": "error", 322 | "traceback": [ 323 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 324 | "\u001b[0;31mMemoryError\u001b[0m Traceback (most recent call last)", 325 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mx\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mnp\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mones\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;36m1_000_000_000\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1_000_000_000\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mint\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 326 | "\u001b[0;32m~/.conda/envs/talkpython-dask/lib/python3.8/site-packages/numpy/core/numeric.py\u001b[0m in \u001b[0;36mones\u001b[0;34m(shape, dtype, order, like)\u001b[0m\n\u001b[1;32m 201\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0m_ones_with_like\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdtype\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0morder\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0morder\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlike\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mlike\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 202\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 203\u001b[0;31m \u001b[0ma\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mempty\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0mdtype\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0morder\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 204\u001b[0m \u001b[0mmultiarray\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mcopyto\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ma\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcasting\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'unsafe'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 205\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0ma\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 327 | "\u001b[0;31mMemoryError\u001b[0m: Unable to allocate 6.94 EiB for an array with shape (1000000000, 1000000000) and data type int64" 328 | ] 329 | } 330 | ], 331 | "source": [ 332 | "x = np.ones((1_000_000_000, 1_000_000_000), dtype=int)" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "id": "streaming-midnight", 338 | "metadata": {}, 339 | "source": [ 340 | "This throws a `MemoryError`, meaning NumPy isn't able to handle data at this size. We can work around this limitation using blocked algorithms as shown in the following section." 341 | ] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "id": "recreational-upgrade", 346 | "metadata": {}, 347 | "source": [ 348 | "## Blocked Algorithms" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "id": "angry-projection", 354 | "metadata": {}, 355 | "source": [ 356 | "*\"A blocked algorithm executes on a large dataset by breaking it up into many small blocks.\"* ~ [tutorial.dask.org](https://tutorial.dask.org/03_array.html#Blocked-Algorithms)\n", 357 | "\n", 358 | "For example, in the above example with a billion+ numbers, consider taking the sum of all numbers. 
We might instead break up the array into 1,000 chunks, each of size 1,000,000, take the sum of each chunk, and then take the sum of the intermediate sums.\n", 359 | "\n", 360 | "Let's do this:" 361 | ] 362 | }, 363 | { 364 | "cell_type": "code", 365 | "execution_count": 34, 366 | "id": "funky-awareness", 367 | "metadata": {}, 368 | "outputs": [], 369 | "source": [ 370 | "# Load data with h5py\n", 371 | "import h5py\n", 372 | "import os\n", 373 | "f = h5py.File(os.path.join('data', 'random.hdf5'), mode='r')\n", 374 | "dset = f['/x']" 375 | ] 376 | }, 377 | { 378 | "cell_type": "code", 379 | "execution_count": 35, 380 | "id": "ahead-discussion", 381 | "metadata": {}, 382 | "outputs": [ 383 | { 384 | "name": "stdout", 385 | "output_type": "stream", 386 | "text": [ 387 | "5005765.3125\n" 388 | ] 389 | } 390 | ], 391 | "source": [ 392 | "# Compute sum of large array, one million numbers at a time\n", 393 | "sums = []\n", 394 | "for i in range(0, 1_000_000_000, 1_000_000):\n", 395 | " chunk = dset[i: i + 1_000_000]\n", 396 | " sums.append(chunk.sum())\n", 397 | "\n", 398 | "total = sum(sums)\n", 399 | "print(total)" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "id": "bearing-document", 405 | "metadata": {}, 406 | "source": [ 407 | "Note that this is a sequential process in the notebook kernel, both the loading and summing." 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "id": "interpreted-triumph", 413 | "metadata": {}, 414 | "source": [ 415 | "## Checkpoint\n", 416 | "\n", 417 | "Question: Create a random matrix of size 1000x1000 and compute standard deviation." 
418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": null, 423 | "id": "honey-awareness", 424 | "metadata": {}, 425 | "outputs": [], 426 | "source": [ 427 | "# Your answer goes here" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": null, 433 | "id": "lucky-playing", 434 | "metadata": { 435 | "jupyter": { 436 | "source_hidden": true 437 | }, 438 | "tags": [] 439 | }, 440 | "outputs": [], 441 | "source": [ 442 | "# Answer\n", 443 | "\n", 444 | "x = np.random.random(size=(1000, 1000))\n", 445 | "y = x.std(axis=0)\n", 446 | "y" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "id": "obvious-bubble", 452 | "metadata": {}, 453 | "source": [ 454 | "## Dask Array for parallel NumPy" 455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "id": "illegal-causing", 460 | "metadata": {}, 461 | "source": [ 462 | "Dask Array is a high-level interface that scales NumPy code to large datasets by using the chunking techniques seen in the previous section." 463 | ] 464 | }, 465 | { 466 | "cell_type": "markdown", 467 | "id": "specific-abraham", 468 | "metadata": {}, 469 | "source": [ 470 | "Let's create a new cluster:" 471 | ] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "execution_count": 7, 476 | "id": "valuable-seventh", 477 | "metadata": {}, 478 | "outputs": [ 479 | { 480 | "data": { 481 | "text/html": [ 482 | "\n", 483 | "
Client: Client-751baf30-d815-11eb-b4c1-3e22fb7564f2 (connection method: Cluster object; cluster type: LocalCluster)
Dashboard: http://127.0.0.1:8787/status
LocalCluster 2614b8c1: status running, using processes, 4 workers, 12 total threads, 16.00 GiB total memory
Scheduler: Scheduler-44f1121d-1f22-45d0-bc92-11e25d854ecf at tcp://127.0.0.1:63861
Workers 0 to 3: 3 threads and 4.00 GiB of memory each
\n", 784 | " " 785 | ], 786 | "text/plain": [ 787 | "" 788 | ] 789 | }, 790 | "execution_count": 7, 791 | "metadata": {}, 792 | "output_type": "execute_result" 793 | } 794 | ], 795 | "source": [ 796 | "from dask.distributed import Client\n", 797 | "\n", 798 | "client = Client(n_workers=4)\n", 799 | "client" 800 | ] 801 | }, 802 | { 803 | "cell_type": "markdown", 804 | "id": "requested-consistency", 805 | "metadata": {}, 806 | "source": [ 807 | "Don't forget to open the dashboards!" 808 | ] 809 | }, 810 | { 811 | "cell_type": "markdown", 812 | "id": "related-defensive", 813 | "metadata": {}, 814 | "source": [ 815 | "The following Dask Array code creates a 10,000x10,000 array with 100x100 chunks. The visualization below shows the resulting structure created." 816 | ] 817 | }, 818 | { 819 | "cell_type": "code", 820 | "execution_count": 8, 821 | "id": "usual-certificate", 822 | "metadata": {}, 823 | "outputs": [ 824 | { 825 | "data": { 826 | "text/html": [ 827 | "\n", 828 | "\n", 829 | "\n", 842 | "\n", 897 | "\n", 898 | "
<table>
  <thead>
    <tr><td></td><th>Array</th><th>Chunk</th></tr>
  </thead>
  <tbody>
    <tr><th>Bytes</th><td>762.94 MiB</td><td>78.12 kiB</td></tr>
    <tr><th>Shape</th><td>(10000, 10000)</td><td>(100, 100)</td></tr>
    <tr><th>Count</th><td>10000 Tasks</td><td>10000 Chunks</td></tr>
    <tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>
  </tbody>
</table>
" 899 | ], 900 | "text/plain": [ 901 | "dask.array" 902 | ] 903 | }, 904 | "execution_count": 8, 905 | "metadata": {}, 906 | "output_type": "execute_result" 907 | } 908 | ], 909 | "source": [ 910 | "import dask.array as da\n", 911 | "\n", 912 | "x = da.ones((10_000, 10_000), chunks=(100, 100))\n", 913 | "x" 914 | ] 915 | }, 916 | { 917 | "cell_type": "markdown", 918 | "id": "photographic-george", 919 | "metadata": {}, 920 | "source": [ 921 | "Let's compute the sum of this array. Dask Array also evaluates lazily, so we need to call `compute()` to get the result." 922 | ] 923 | }, 924 | { 925 | "cell_type": "code", 926 | "execution_count": 9, 927 | "id": "attended-original", 928 | "metadata": {}, 929 | "outputs": [ 930 | { 931 | "name": "stdout", 932 | "output_type": "stream", 933 | "text": [ 934 | "CPU times: user 55.8 ms, sys: 3.4 ms, total: 59.2 ms\n", 935 | "Wall time: 58.4 ms\n" 936 | ] 937 | } 938 | ], 939 | "source": [ 940 | "%%time\n", 941 | "result = x.sum()" 942 | ] 943 | }, 944 | { 945 | "cell_type": "code", 946 | "execution_count": 10, 947 | "id": "appropriate-sunday", 948 | "metadata": {}, 949 | "outputs": [ 950 | { 951 | "data": { 952 | "text/html": [ 953 | "\n", 954 | "\n", 955 | "\n", 968 | "\n", 971 | "\n", 972 | "
<table>
  <thead>
    <tr><td></td><th>Array</th><th>Chunk</th></tr>
  </thead>
  <tbody>
    <tr><th>Bytes</th><td>8 B</td><td>8.0 B</td></tr>
    <tr><th>Shape</th><td>()</td><td>()</td></tr>
    <tr><th>Count</th><td>23364 Tasks</td><td>1 Chunks</td></tr>
    <tr><th>Type</th><td>float64</td><td>numpy.ndarray</td></tr>
  </tbody>
</table>
" 973 | ], 974 | "text/plain": [ 975 | "dask.array" 976 | ] 977 | }, 978 | "execution_count": 10, 979 | "metadata": {}, 980 | "output_type": "execute_result" 981 | } 982 | ], 983 | "source": [ 984 | "result" 985 | ] 986 | }, 987 | { 988 | "cell_type": "code", 989 | "execution_count": 11, 990 | "id": "vanilla-variable", 991 | "metadata": {}, 992 | "outputs": [ 993 | { 994 | "name": "stdout", 995 | "output_type": "stream", 996 | "text": [ 997 | "CPU times: user 14 s, sys: 587 ms, total: 14.6 s\n", 998 | "Wall time: 14.8 s\n" 999 | ] 1000 | }, 1001 | { 1002 | "data": { 1003 | "text/plain": [ 1004 | "100000000.0" 1005 | ] 1006 | }, 1007 | "execution_count": 11, 1008 | "metadata": {}, 1009 | "output_type": "execute_result" 1010 | } 1011 | ], 1012 | "source": [ 1013 | "%%time\n", 1014 | "result.compute()" 1015 | ] 1016 | }, 1017 | { 1018 | "cell_type": "markdown", 1019 | "id": "visible-candle", 1020 | "metadata": {}, 1021 | "source": [ 1022 | "Now, let's do the same NumPy operations as earlier and compare the compute time!" 
1023 | ] 1024 | }, 1025 | { 1026 | "cell_type": "code", 1027 | "execution_count": 12, 1028 | "id": "certain-twins", 1029 | "metadata": {}, 1030 | "outputs": [ 1031 | { 1032 | "name": "stdout", 1033 | "output_type": "stream", 1034 | "text": [ 1035 | "CPU times: user 2.09 s, sys: 77.9 ms, total: 2.17 s\n", 1036 | "Wall time: 2.66 s\n" 1037 | ] 1038 | }, 1039 | { 1040 | "data": { 1041 | "text/plain": [ 1042 | "array([ 9.99984054, 9.99810732, 10.0011439 , 10.00006243, 10.00067303,\n", 1043 | " 10.00061939, 9.99967573, 10.00082095, 10.00005874, 10.00077991,\n", 1044 | " 10.00026624, 9.9997531 , 9.9998511 , 10.00016311, 10.00117175,\n", 1045 | " 10.00104788, 9.99950712, 10.00058845, 10.0008487 , 9.99974336,\n", 1046 | " 9.99990533, 10.00005898, 9.9997313 , 9.99963978, 9.99963639,\n", 1047 | " 10.0004849 , 10.00105947, 10.0013202 , 10.0003486 , 9.99945365,\n", 1048 | " 9.9993116 , 10.00008687, 10.00020631, 10.00188321, 9.99979062,\n", 1049 | " 9.99948907, 9.99998619, 9.99977491, 9.99940876, 9.99954004,\n", 1050 | " 9.99940587, 10.00030509, 9.99879323, 10.00005036, 9.99899518,\n", 1051 | " 9.99996586, 10.00165206, 10.00018339, 9.99941188, 9.9996999 ,\n", 1052 | " 10.00090938, 9.9999311 , 9.99904588, 10.00009871, 10.00080389,\n", 1053 | " 9.9996124 , 9.99973756, 10.0006106 , 9.99932609, 9.99976075,\n", 1054 | " 10.00024083, 9.99707116, 9.99981883, 9.99889187, 9.99995958,\n", 1055 | " 9.99922913, 10.00073525, 9.9999535 , 10.00043148, 9.99911152,\n", 1056 | " 10.00125334, 10.00005256, 9.99983756, 10.00075923, 9.99979802,\n", 1057 | " 9.99972084, 10.00023872, 10.00063686, 9.99954617, 9.99973034,\n", 1058 | " 10.00006705, 10.00021521, 9.99963444, 10.00034974, 9.99978419,\n", 1059 | " 10.00112573, 10.00122812, 10.000071 , 9.99896755, 9.99921739,\n", 1060 | " 10.00022158, 9.99956024, 10.00018029, 9.99947774, 10.00028779,\n", 1061 | " 10.00035899, 9.99952996, 9.99992577, 10.0007089 , 9.99882874,\n", 1062 | " 10.00140435, 9.99983057, 10.00084932, 10.00110383, 9.99861585,\n", 1063 | 
" 10.00057231, 9.99946645, 10.00006694, 9.99990636, 10.00075132,\n", 1064 | " 9.99948178, 10.00017333, 10.0002826 , 9.99890721, 10.00102143,\n", 1065 | " 10.00052894, 9.99955435, 9.99998 , 10.00069429, 10.00137973,\n", 1066 | " 9.99904271, 9.99989111, 9.99978139, 10.00059649, 9.99930181,\n", 1067 | " 10.00061705, 9.9993816 , 9.99939831, 10.00018856, 10.00056412,\n", 1068 | " 9.99937629, 10.0004428 , 10.00040308, 10.00143014, 10.00078526,\n", 1069 | " 10.00026439, 9.99907207, 10.00159326, 10.00129855, 10.00078995,\n", 1070 | " 9.99974592, 9.99964061, 10.00055318, 9.99948493, 9.99856299,\n", 1071 | " 9.99908679, 10.00014546, 9.99976545, 9.99945505, 9.99907417,\n", 1072 | " 9.99914494, 9.99994726, 9.99908853, 9.99928258, 10.00063795,\n", 1073 | " 10.00022527, 9.99882283, 9.99902949, 9.99992081, 9.99939082,\n", 1074 | " 9.99922374, 10.00056829, 10.00163323, 10.00045605, 10.00005765,\n", 1075 | " 10.00038722, 9.99962068, 9.99960248, 9.99976618, 10.00107423,\n", 1076 | " 9.99968588, 9.99877324, 10.00105333, 9.99946677, 10.00037982,\n", 1077 | " 9.99980408, 9.99860037, 10.00151025, 9.99957228, 9.99947328,\n", 1078 | " 9.99925242, 10.00099767, 9.99971488, 9.9996304 , 9.99957526,\n", 1079 | " 9.99962409, 10.00078201, 10.0005748 , 10.00027635, 10.00094458,\n", 1080 | " 9.99915061, 10.00042229, 10.00008712, 9.99955296, 9.99935311,\n", 1081 | " 10.00076805, 10.00027261, 10.0005381 , 10.00007226, 10.0008371 ])" 1082 | ] 1083 | }, 1084 | "execution_count": 12, 1085 | "metadata": {}, 1086 | "output_type": "execute_result" 1087 | } 1088 | ], 1089 | "source": [ 1090 | "%%time\n", 1091 | "x = da.random.normal(10, 0.1, size=(20000, 20000), chunks=(1000, 1000))\n", 1092 | "y = x.mean(axis=0)[::100] \n", 1093 | "y.compute()" 1094 | ] 1095 | }, 1096 | { 1097 | "cell_type": "code", 1098 | "execution_count": 13, 1099 | "id": "understood-mechanics", 1100 | "metadata": {}, 1101 | "outputs": [], 1102 | "source": [ 1103 | "client.close()" 1104 | ] 1105 | }, 1106 | { 1107 | "cell_type": 
"markdown", 1108 | "id": "blind-growing", 1109 | "metadata": {}, 1110 | "source": [ 1111 | "## Checkpoint" 1112 | ] 1113 | }, 1114 | { 1115 | "cell_type": "markdown", 1116 | "id": "interim-silicon", 1117 | "metadata": {}, 1118 | "source": [ 1119 | "**Question**: Using Dask Array, create a random matrix of size 1 million x 1 million and compute the standard deviation." 1120 | ] 1121 | }, 1122 | { 1123 | "cell_type": "code", 1124 | "execution_count": null, 1125 | "id": "floppy-gender", 1126 | "metadata": {}, 1127 | "outputs": [], 1128 | "source": [ 1129 | "# your answer here" 1130 | ] 1131 | }, 1132 | { 1133 | "cell_type": "code", 1134 | "execution_count": null, 1135 | "id": "thick-marshall", 1136 | "metadata": { 1137 | "jupyter": { 1138 | "source_hidden": true 1139 | }, 1140 | "tags": [] 1141 | }, 1142 | "outputs": [], 1143 | "source": [ 1144 | "# Answer: da.random is a module, so call da.random.random to build the array\n", 1145 | "\n", 1146 | "x = da.random.random((1_000_000, 1_000_000), chunks=(10_000, 10_000))\n", 1147 | "y = x.std(axis=0)  # per-column standard deviation, still lazy\n", 1148 | "y  # displays the lazy array; y.compute() would evaluate it (x is about 8 TB of float64)" 1149 | ] 1150 | }, 1151 | { 1152 | "cell_type": "markdown", 1153 | "id": "suspected-sociology", 1154 | "metadata": {}, 1155 | "source": [ 1156 | "## Limitations of Dask Array\n", 1157 | "\n", 1158 | "* Dask Array does not implement the entire NumPy interface. For example, much of `np.linalg` and some lesser-used functions such as `np.sometrue` are not implemented.\n", 1159 | "* Dask Array does not support some operations where the resulting shape depends on the values of the array.\n", 1160 | "* Dask Array does not attempt operations like `sort`, which are difficult to do in parallel." 
1161 | ] 1162 | }, 1163 | { 1164 | "cell_type": "markdown", 1165 | "id": "retained-saturn", 1166 | "metadata": {}, 1167 | "source": [ 1168 | "## References\n", 1169 | "\n", 1170 | "* [Dask Array documentation](https://docs.dask.org/en/latest/array.html)\n", 1171 | "* [Dask Array API](https://docs.dask.org/en/latest/array-api.html)\n", 1172 | "* [Dask Array examples](https://examples.dask.org/array.html)\n", 1173 | "* [Dask Tutorial - Array](https://tutorial.dask.org/03_array.html)" 1174 | ] 1175 | }, 1176 | { 1177 | "cell_type": "code", 1178 | "execution_count": null, 1179 | "id": "polyphonic-parks", 1180 | "metadata": {}, 1181 | "outputs": [], 1182 | "source": [] 1183 | } 1184 | ], 1185 | "metadata": { 1186 | "kernelspec": { 1187 | "display_name": "Python 3", 1188 | "language": "python", 1189 | "name": "python3" 1190 | }, 1191 | "language_info": { 1192 | "codemirror_mode": { 1193 | "name": "ipython", 1194 | "version": 3 1195 | }, 1196 | "file_extension": ".py", 1197 | "mimetype": "text/x-python", 1198 | "name": "python", 1199 | "nbconvert_exporter": "python", 1200 | "pygments_lexer": "ipython3", 1201 | "version": "3.8.10" 1202 | } 1203 | }, 1204 | "nbformat": 4, 1205 | "nbformat_minor": 5 1206 | } 1207 | -------------------------------------------------------------------------------- /04-delayed.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Dask Delayed\n", 8 | "\n", 9 | "## Notebook Objectives\n", 10 | "* **Recap Delayed API** from first course.\n", 11 | "* **Parallelize Python code with Delayed API**.\n", 12 | "* **Best Practices** for using Dask Delayed.\n", 13 | "* **References** for further reading." 
14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "## Recap basics of Delayed API" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "We introduced the Dask Delayed API in the first course:\n", 28 | "* It can be used to parallelize regular Python code.\n", 29 | "* It is evaluated lazily: wrapping a call only records a task, and nothing runs until we call `compute()`.\n", 30 | "* The generated task graph can be visualized using `visualize()`." 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": 1, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "from time import sleep\n", 40 | "\n", 41 | "def inc(x):\n", 42 | "    sleep(1)\n", 43 | "    return x + 1\n", 44 | "\n", 45 | "def add(x, y):\n", 46 | "    sleep(1)\n", 47 | "    return x + y" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 2, 53 | "metadata": {}, 54 | "outputs": [ 55 | { 56 | "data": { 57 | "text/plain": [ 58 | "Delayed('add-6aae8eb9-1fa2-4af7-af1e-0b71f22ff8a6')" 59 | ] 60 | }, 61 | "execution_count": 2, 62 | "metadata": {}, 63 | "output_type": "execute_result" 64 | } 65 | ], 66 | "source": [ 67 | "from dask import delayed\n", 68 | "\n", 69 | "x = delayed(inc)(10)\n", 70 | "y = delayed(inc)(10)\n", 71 | "\n", 72 | "z = delayed(add)(x, y)\n", 73 | "z" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 3, 79 | "metadata": {}, 80 | "outputs": [ 81 | { 82 | "data": { 83 | "text/plain": [ 84 | "22" 85 | ] 86 | }, 87 | "execution_count": 3, 88 | "metadata": {}, 89 | "output_type": "execute_result" 90 | } 91 | ], 92 | "source": [ 93 | "z.compute()" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 4, 99 | "metadata": {}, 100 | "outputs": [ 101 | { 102 | "data": { 103 | "image/png": 
"(base64-encoded PNG of the task graph omitted)", 104 | "text/plain": [ 105 | "" 106 | ] 107 | }, 108 | "execution_count": 4, 109 | "metadata": {}, 110 | "output_type": "execute_result" 111 | } 112 | ], 113 | "source": [ 114 | "z.visualize()" 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "## Parallelize Python code with Delayed\n", 122 | "\n", 123 | "We will look at more examples of using Delayed in this section." 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "### Parallel for-loop\n", 131 | "\n", 132 | "Loops are among the most common parts of a program that can be parallelized. Let's take a look at the following sequential for-loop." 
133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 5, 138 | "metadata": {}, 139 | "outputs": [], 140 | "source": [ 141 | "data = [1, 2, 3, 4, 5, 6, 7, 8]" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 6, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "name": "stdout", 151 | "output_type": "stream", 152 | "text": [ 153 | "CPU times: user 948 µs, sys: 1.19 ms, total: 2.14 ms\n", 154 | "Wall time: 8.03 s\n" 155 | ] 156 | }, 157 | { 158 | "data": { 159 | "text/plain": [ 160 | "44" 161 | ] 162 | }, 163 | "execution_count": 6, 164 | "metadata": {}, 165 | "output_type": "execute_result" 166 | } 167 | ], 168 | "source": [ 169 | "%%time\n", 170 | "\n", 171 | "results = []\n", 172 | "for x in data:\n", 173 | "    y = inc(x)\n", 174 | "    results.append(y)\n", 175 | "\n", 176 | "total = sum(results)\n", 177 | "total" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "Each call to `inc` sleeps for one second, so the eight sequential calls take about eight seconds. The loop can be parallelized by wrapping the `inc` and `sum` calls with `delayed`." 
185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 7, 190 | "metadata": {}, 191 | "outputs": [ 192 | { 193 | "name": "stdout", 194 | "output_type": "stream", 195 | "text": [ 196 | "CPU times: user 2.09 ms, sys: 944 µs, total: 3.03 ms\n", 197 | "Wall time: 2.43 ms\n" 198 | ] 199 | }, 200 | { 201 | "data": { 202 | "text/plain": [ 203 | "Delayed('sum-fec9e99c-c860-4e82-8ac7-1707b89370c2')" 204 | ] 205 | }, 206 | "execution_count": 7, 207 | "metadata": {}, 208 | "output_type": "execute_result" 209 | } 210 | ], 211 | "source": [ 212 | "%%time\n", 213 | "\n", 214 | "results = []\n", 215 | "for x in data:\n", 216 | "    y = delayed(inc)(x)  # lazy: records a task instead of calling inc\n", 217 | "    results.append(y)\n", 218 | "\n", 219 | "total = delayed(sum)(results)  # lazy sum over the delayed results\n", 220 | "total" 221 | ] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "execution_count": 8, 226 | "metadata": {}, 227 | "outputs": [ 228 | { 229 | "name": "stdout", 230 | "output_type": "stream", 231 | "text": [ 232 | "CPU times: user 2.94 ms, sys: 1.87 ms, total: 4.82 ms\n", 233 | "Wall time: 1.01 s\n" 234 | ] 235 | }, 236 | { 237 | "data": { 238 | "text/plain": [ 239 | "44" 240 | ] 241 | }, 242 | "execution_count": 8, 243 | "metadata": {}, 244 | "output_type": "execute_result" 245 | } 246 | ], 247 | "source": [ 248 | "%%time\n", 249 | "total.compute()" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 9, 255 | "metadata": {}, 256 | "outputs": [ 257 | { 258 | "data": { 259 | "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAvMAAAF5CAYAAAASxqX0AAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nOzdeVhU9f4H8PcMMyyyiooiKIYb5JJLFoq7uHOtrpJlYZvizYzUFtqpW96Lt230aoWlCd4yccfUCjMV1yKX65KikoCKu6CICDPz+f3Rj7mg4DozZ87wfj0PT0/AzHnPQeDNOd9FIyICIiIiIiJSm4VapRMQEREREdHtYZknIiIiIlIplnkiIiIiIpXSKR2AiMhZHD16FJs3b1Y6hsN7+OGHlY5AROQ0NJwAS0RkHWlpaRg5cqTSMRwef+0QEVkNJ8ASEVmbiPCtmrcFCxYo/aUhInI6LPNERERERCrFMk9EREREpFIs80REREREKsUyT0RERESkUizzREREREQqxTJPRERERKRSLPNERERERCrFMk9EREREpFIs80REREREKsUyT0RERESkUizzREREREQqxTJPRERERKRSLPNERERERCrFMk9EREREpFIs80REREREKsUyT0RERESkUizzREREREQqxTJPRERERKRSLPNERERERCrFMk9EREREpFIs80REREREKsUyT0RERESkUizzREREREQqxTJPRERERKRSLPNERERERCrFMk9EREREpFIs80REREREKsUyT0RERESkUizzREREREQqxTJPRERERKRSLPNERERERCrFMk9EREREpFIs80REREREKsUyT0RERESkUizzREREREQqxTJPRERERKRSLPNERERERCrFMk9EREREpFIs80REREREKsUyT0RERESkUizzREREREQqxTJPRERERKRSLPNERERERCrFMk9EREREpFIs80RETsxsNisdgYiIbEindAAiImeTlpamdAQAgIjg+++/x+DBg5WOAgDYsmWL0hGIiJwOyzwRkZWNHDlS6QhVzJ07V+kIRERkIxxmQ0RkJQ8//DBExGHenn32WQDA2rVrFc9S+Y2IiKyHZZ6IyAmVl5fjm2++AQDLf4mIyPmwzBMROaEff/wRRUVFAIAFCxbgypUrCiciIiJbYJknInJC33zzDfR6PQCguLgYP/zwg8KJiIjIFljmiYicTElJCZYuXYry8nIAgIuLC77++muFUxERkS2wzBMROZkVK1agtLTU8v9GoxHLly9HcXGxgqmIiMgWWOaJiJzM119/DRcXlyrvKy8vR3p6ukKJiIjIVljmiYicSGFhIb7//nsYjcYq79doNJg3b55CqYiIyFZY5omInMiiRYtgNpuveb/JZEJGRgbOnDmjQCoiIrIVlnkiIidyo6vvixcvtlMSIiKyB5Z5IiInUVBQgI0bN8JkMlX7cRFBamqqnVMREZEtscwTETmJBQsWQKut+ce62WzGli1bkJuba8dURERkSyzzREROYt68eddMfL2aiGDhwoV2SkRERLamUzoAERHduePHj+P8+fMICgqyvM9oNKK0tBReXl5VPnfLli32jkdERDaiERFROgQREVlfWloaRo4cCf6YJyJyWgs5zIaIiIiISKVY5omIiIiIVIplnoiIiIhIpVjmiYiIiIhUimWeiIiIiEilWOaJiIiIiFSKZZ6IiIiISKVY5omIiIiIVIplnoiIiIhIpVjmiYiIiIhUimWeiIiIiEilWOaJiIiIiFSKZZ6IiIiISKVY5omIiIiIVIplnoiIiIhIpVjmiYiIiIhUimWeiIiIiEilWOaJiIiIiFSKZZ6IiIiISKVY5omIiIiIVIplnoiIiIhIpVjmiYiIiIhUimWeiIiIiEilWOaJiIiIiFSKZZ6IiIiISKVY5omIiIiIVIplnoiIiIhIpVjmiYiIiIhUimWeiIiIiEilWOaJiIiIiFSKZZ6IiIiISKV0SgcgIiLrEBEcO3YMJ0+exKVLl7Bz504AwJo1a+Dl5QVPT08EBwejbt26CiclIiJr0YiIKB2CiIhujYhg7969WLt2LTZv3ozs7GwcOHAAJSUlN3x
sQEAAwsLC0KZNG/Ts2RN9+vRBw4YN7ZCaiIisbCHLPBGRSpjNZvz888+YN28eVq9ejVOnTqFu3bro3r077r77brRq1QqtWrVCYGAgPD09LVfjCwsLUVxcjOLiYuTm5uLAgQM4cOAAdu3ahV9++QUmkwlt2rTB8OHDERsbi+bNmyv9UomI6OawzBMRObrTp0/j3//+N+bOnYv8/Hzcd999iImJQd++fdGhQwdotbc//am4uBiZmZn48ccfsWDBApw4cQLdunXDs88+i5EjR0Kn42hMIiIHxjJPROSojh07hg8++ABffPEFvLy8MHbsWDz++OMICwuzyfFMJhMyMjKQkpKCRYsWISQkBAkJCXjiiSfg6upqk2MSEdEdYZknInI05eXl+PTTT/Hmm2/C09MTkyZNwvPPP486derYLcORI0fwySefYNasWQgODsaMGTMwcOBAux2fiIhuCss8EZEj2bBhA8aNG4e8vDy8+eabePHFFxW9Kp6bm4uJEydi2bJleOSRRzBt2jQEBAQoloeIiKpYyHXmiYgcgMlkwrvvvou+ffuiZcuW2LdvH1577TXFh7eEhIRg6dKl+O6777B161Z06NABP//8s6KZiIjof1jmiYgUdurUKfTv3x9JSUmYPn060tPTERISonSsKoYOHYqdO3eiW7du6N+/P/7+97+DN3aJiJTHZQqIiBSUk5ODgQMHQkSwefNmdOzYUelINfL19cWiRYswY8YMTJ48GYcPH8aXX34JvV6vdDQiolqLY+aJiBSyd+9eDBw4EI0aNcKqVatUNRb9p59+wkMPPYT7778fS5Ysgbe3t9KRiIhqI46ZJyJSwt69e9GzZ0+EhYXh559/VlWRB4B+/fphzZo12LlzJ4YPH46ysjKlIxER1Uq8Mk9EZGdHjx5FZGQkGjdujJ9++smuS05a2+7du9GzZ08MGjQIX3/99R1tYEVERLeMV+aJiOypqKgI/fv3h5+fH1avXq3qIg8A7dq1w+LFi7F06VK88sorSschIqp1WOaJiOxozJgxKCwsxOrVq+Hn56d0HKvo27cv5syZg48//hhLlixROg4RUa3C1WyIiOxk5syZWLJkCb7//ns0btxY6ThWNWrUKKxfvx7PPPMMOnbsiLvuukvpSEREtQLHzBMR2cGBAwfQoUMHJCQk4J133lE6jk2UlpYiIiICXl5eyMzMhEajUToSEZGzW8gyT0RkB1FRUTh16hS2b98Onc55b4ru3r0bnTp1QnJyMp5++mml4xAROTtOgCUisrX58+fj559/RnJyslMXeeDPCbHjx4/HK6+8gjNnzigdh4jI6fHKPBGRDZWVlaFFixYYOHAgvvjiC6Xj2EVRURHCwsIwatQofPTRR0rHISJyZrwyT0RkS6mpqThx4gTefPNNpaPYja+vL1555RV8/vnnOH36tNJxiIicGss8EZGNmEwmfPDBB3jiiScQEhKidBy7GjduHLy8vDBt2jSloxAROTWWeSIiG1mxYgUOHTqEhIQEpaPYXZ06dTBx4kTMnDkTpaWlSschInJaLPNERDaSmpqKfv36oUWLFkpHUcQzzzyD4uJipKenKx2FiMhpscwTEdnAuXPnsGrVKsTGxiodRTEBAQGIiorCvHnzlI5CROS0WOaJiGxg0aJF0Ol0eOihh5SOoqjRo0fjhx9+wLlz55SOQkTklFjmiYhsICMjA3369IGXl5fSURQ1ePBgmM1mrFu3TukoREROiWWeiMjKRATr1q1Dnz59lI6iOD8/P3Ts2BE///yz0lGIiJwSyzwRkZXt3r0bZ86cYZn/f3379sXatWuVjkFE5JRY5omIrGzHjh3w8PDAPffco3QUhxAREYH9+/fj8uXLSkchInI6LPNERFa2f/9+tGzZElotf8QCQOvWrWE2m3Hw4EGloxAROR3+piEisrIDBw6gdevWSsdwGC1atIBOp8P+/fuVjkJE5HR0SgcgInI2OTk5GDJkiNWeT0Swfv167Ny
5Ey4uLggLC0P//v2xYsUKHD58GF5eXhgzZgwuXryI1NRUlJeXIzAwECNHjgQAXL58GcuXL8ewYcNw6tQprFq1Co0bN8Zf/vIXuLi44OTJk0hPT4dWq0VMTAx8fHyslh0AXF1d0aRJE+Tk5Fj1eYmIiGWeiMjqioqK4OfnZ7Xne/PNN3HXXXdh4sSJyMrKwnPPPYf+/fvjL3/5C9q2bYuioiKMGTMG3t7eGD16NIKDg9GmTRuMHDkS69evx9ixY3Hw4EF89NFHOHDgAPz8/PDyyy9j8ODBGDRoENatWweTyYQFCxZg+fLlNtmx1c/PD0VFRVZ/XiKi2o5lnojIyi5evAhvb2+rPJeIYNasWVi4cCEA4N5778WwYcMsHw8PD8fWrVst/+/t7Y0WLVpY/r9Xr1549tlnMXnyZDRt2hSTJ08GAGi1WiQlJWHUqFH4z3/+AwBo3rw5PvzwQ5jNZquP9/f29sbFixet+pxERMQx80REVmfNMq/RaNC6dWuMHDkSy5cvBwC89NJLt/Qcvr6+AIB27dpZ3lcxpr/yijthYWG4cuUKjh8/fqexr+Hj48MyT0RkAyzzRERWptFoICJWe74ZM2bAx8cHDz74IKKiolBYWHjHz+nu7n7N+/R6PQDg0qVLd/z8V7Pm+SAiov9hmScisjIvLy8UFxdb7fk6dOiA7du3Y/z48Vi3bh06deqEc+fOWe357cGadyuIiOh/WOaJiKzMmuPDr1y5gnnz5sHb2xszZ87EypUrUVBQgCVLlgAAdDodSktLrXIsW2KZJyKyDZZ5IiIr8/Pzw/nz563yXCKCzz//3DJMZcCAAahfvz7q169v+f8zZ87gq6++wqVLl/DVV1/h7NmzyMnJsWSo+MPiypUrluetuHNQ+Qp/xfCayp9nLYWFhZax+0REZD0s80REVhYaGorDhw9b7fn++OMPjBo1CosWLcLHH3+MZ599Fg8++CAAICYmBhEREXj66afRpUsX+Pn5oXPnzujQoQMWL16MLVu24KuvvgIAfPzxx/jjjz+wbt06fPbZZwCAd999F/v27cOWLVvwxRdfAACmTJli1d1ay8rKkJ+fj+bNm1vtOYmI6E8a4awkIiKrev3117Fy5Urs2rXLKs9nNBphNptx4sQJNG3atNrPOX36NBo0aAAAKC0trXaCq1L27t2Ltm3bYteuXWjfvr3ScYiInMlCXpknIqrB8ePHsWPHDpw8efKWHte6dWscPHgQJpPJKjl0Oh1cXV1rLPIALEUeqH6lGiUdOHAAWq0WLVu2vOnHlJWVIS8vD5s2bbLaeSQickbcNIqIqAZ//PEHunfvDuDPQt2gQQMEBwcjJCQEQUFBaNKkCQIDAy3/DQoKgoeHBzp16oTLly9j165d6NSpk8KvQnlbt25FeHg4PDw8APw5Tv/48eM4evQoCgoKkJ+fj+PHjyM/Px9HjhzBiRMnLGP5GzVqhIKCAiXjExE5NA6zISKqwZUrV+Dt7Y3y8vIq79doNJY12cvLy6usoe7t7Y1GjRohNzcXXbp0wbJlyyyTVWujzMxMPPjgg3Bzc4Ner8eJEydQVlZm+biLiwt0Oh3MZvM151mr1eKvf/2rZfdbIiK6BofZEBHVxM3NDZ07d77m/SKCsrIylJWVXbMZ0sWLF3Hw4EH4+vrC1dW1Vhd54M9dZwsLC1FSUoK8vLwqRR4ATCYTrly5ck2RB/4s8z169LBXVCIiVWKZJyK6jj59+sDV1fWmPlen08HLywvJycl4//33sW3bNqutN69WK1euhIuLC3bt2oXnn38eGo0GOt3NjfA0Go0s80REN8AyT0RUDbPZjF27duHcuXPXXE2+mlb754/SAQMGYP/+/YiLi0NMTAxMJpNlc6faat68eRg8eDBCQkIwffp0ZGZmIjQ0FC4uLjd8rKurK/bt24fjx4/bISkRkTpxzDwREf68CpyVlYXMzExs2LABGzdutGx0dOHChWuG01TQ6/WoW7cuPv/8czz
00ENVPjZ8+HAUFRVhzZo19ngJDufkyZMIDg7G/PnzMWLECMv7y8vL8fHHH+Ott96CiMBoNF7zWI1GA39/fxQVFcFoNKJFixbo0aMHevbsiZ49eyI0NNSeL4WIyFFxzDwR1U5GoxG//fYbpk6dir/85S+oX78+unbtig8//BAA8OqrryIrKwvnzp1D69atr3m8i4sLNBoNnnzySRw6dAgPPvggzp8/jyNHjmD37t3YvHkz2rVrh7Vr12LKlCk4c+aMvV+ion799VeMHz8ebm5uqFOnDn799VccOHAAx48fR2lpKRISErB3715ERERY7mxUptfrMXHiRBQVFSEzMxNjxoxBfn4+xo8fj+bNmyMwMBAPP/wwpk2bht9++63GP7aIiJwdr8wTUa1w6dIl7NixA5s2bcKaNWuwceNGlJaWIjAwEN27d0dkZCS6d++OTp06QaPRVHnsiBEjsGzZMst65xqNBq6urvD390dZWRlKSkpw+fLlao+r0WgQGRmJzMxMm79GR3L27FkEBQXhypUr1X5co9HA09MTXl5eMJvNOHPmDESkSilfvXo1Bg0aVOVxRqMRu3btsnwNK+6g+Pj44L777kNUVBQiIyNx//33W1YcIiJyYgtZ5onIKV28eBHbtm2zlL5ff/0VZWVlCA0NtRT3yMhItGnT5obPNXPmTEyYMOGWjq/VauHq6oq//e1vmDlzJg4ePIiQkJDbfTmq8+GHH+Kdd97BI488gtmzZ9/y4zUaDYqLi1GnTp3rfp7JZML+/fstf6StXbsWZ8+ehZeXFyIiIixf6x49esDNze12Xw4RkaNimSci53Dy5En88ssvllK3Y8cOmM1mhIaGWq7W9u7d+7q7qNYkPz8fTZs2hUajuanhHDqdDp6enli1ahW6dOmCli1bom/fvpgzZ87tvDTVKSwsROvWrTF69Gh88MEHmDZtGiZNmgQANz0cJjQ0FIcPH76t4+fk5Fj+iFu/fj3y8vJQp04ddOzY0fJHXK9eveDj43Nbz09E5EBY5olInQoKCrBu3TrLW3Z2NnQ6HTp27GiZJNm9e3f4+/tb5XhvvvkmpkyZcsPP0+v1aNiwIX766Se0atUKAJCWloZHHnkEP//8M3r16mWVPI5swoQJ+Pbbb3HgwAHUq1cPAPCf//wHTz31FETEMlypJn5+fpg3bx6io6Otkic7O9sysXn9+vXIzc2Fq6sr7rvvPvTu3Ru9e/dGt27dLDvUEhGpCMs8EanDyZMnq5T3/fv3Q6fToUuXLujduzd69eqFyMhIeHl52eT4IoLQ0FDk5ubWeHVZp9OhTZs2+OGHH9CwYcMqHxs6dCjy8vKwfft2px7LvX37dtx3332YPXs2nnjiiSofW7t2LYYNG4YrV65Uu4IN8OcfQy+99BL+8Y9/2CxjXl4eNmzYYPm3dPjwYbi5ueH+++9Hnz590Lt3b0RERMDd3d1mGYiIrIRlnogc0+nTp7F161bLsJnt27dDq9WiQ4cOlnHQ/fv3h5+fn82zHDhwAF999RVmzJiB0tLSaq8su7i4oEePHli+fHm1wzcOHjyIe+65By+++CLee+89m2dWwuXLl3H//ffDz88P69evv2YiMQDs3r0b/fv3x7lz56rd9VWj0aBp06Z49tln8cwzz9hlB92CggJs3LjRMjRn37590Ol0uOeeexAVFYWoqCh0796d5Z6IHBHLPBE5hsoTVivGvGs0GkXKO/DnWuhLly7Fp59+ivXr16NVq1Z44okn8Pe///2aFVq0Wi1GjRqFOXPmXPeq+yeffIIXX3wR0dHRSE1NtdtrsYd169YhPj4eR44cwfbt29GiRYsaP/fIkSOIiopCXl5elUKv0+kQGRmJtm3bYt68eSgvL8ejjz6K8ePHo3PnzvZ4GQCqlvuMjAz88ccf15R7TqglIgfBMk9EyqipvLdu3Rrdu3e3lKa6devaNVdBQQFSU1Mxc+ZMHDt2DH379kV8fDyio6Oh0Wjw9NNP4z/
/+U+VEhofHw+DwVDtlej9+/fju+++w7Jly7B161aYTCb4+vpiz549CA4OtudLs6nU1FTLsBpvb28MGTIEw4YNw6BBg6qdt3Du3DkMHToUWVlZVYbcVCxHefHiRcyfPx+ffvopdu3ahc6dOyMuLg6PP/74DVe4sbbjx49b7hD9+OOPOHLkSJUJtSz3RKQglnkiso/i4mJs3bq1SnkHgLCwMEsh6tevn9UmrN6qjRs3Yvr06Vi2bBnq1q2Lp556Cn/729/QrFmzKp+XlZWFLl26QKPRQKPRYMaMGXj22WctHy8rK8OGDRvw3XffYenSpcjLy4Ner4fRaISIYNiwYTh06BA0Gg02bNig2Ou1pjVr1mDo0KEYN24cvv76a5w/fx5arRZmsxkajQZdunTBQw89hOjo6CpLgZaUlGDkyJFYvXo1TCYTmjRpgiNHjlyzidRvv/2GadOm4dtvv4WnpydGjx6NiRMn4q677rL3SwVQtdz/8MMPyM3NRZ06ddCtWzfLXaSePXvC1dVVkXxEVKsshBAR2UBRUZGsWLFCXnzxRenUqZO4uLiIVquVe+65R1544QVZtmyZnDt3TtGMFy5ckOTkZGnXrp0AkM6dO0tycrKUlJRc93EdOnQQV1dXWbp0qYiInD59WtLS0uTxxx8XT09PASB6vV4AWN60Wq00aNBAzpw5I0ePHpWQkBC5//77pbi42B4v1WZ27dolvr6+MmrUKDGZTLJ8+fIqrxuAaDQay/lo3LixxMXFSXp6upSWlorRaJSxY8cKAJk6dep1j3XixAlJSkqSkJAQ0Wq1EhUVJWlpaWI0Gu30aquXnZ0ts2bNklGjRklgYKAAEC8vLxk8eLBMnTpVtm3bpnhGInJaabwyT0RWUXlnzjVr1mDDhg2WTZoqhsz07dvXslShkvbv34/PPvsMc+bMgdFoRExMDCZPnowOHTrc1OMXL16M4uJiHD16FEuXLrVMztVoNDWu0qLRaLB69WoMHDgQwJ8TYrt3747w8HAsX74cvr6+Vnt99vLLL78gOjoa7du3x6pVqyxXoseMGYPU1NRqJ7gCgKurK8rKyuDu7o4BAwZg2LBhyMnJwaRJk25qwqvZbMbatWsxbdo0rFy5EqGhoRg7dqzdJszeSE5ODjZu3IhNmzZh9erVyM/Pt2xiVfG9UN1Ow0REt4HDbIjo9phMJuzYsQM//fQTfvrpJ2zatAklJSVo0aIF+vXrh379+qFPnz4OUa6AP4e/LF++HLNmzcJPP/2EFi1a4JlnnsHYsWNva6jLq6++iqlTp97URlI6nQ7jxo3DjBkzqrx/7969GDRoEPz8/PD9998jKCjolnMoJSMjA8OHD0fv3r3x7bffVhnHXlxcjDZt2uDYsWM3XFNeo9FAq9Vi48aNiIiIuOUcBw8exOzZs/HFF1/g0qVLGDZsGF544QVERkbe8nPZyr59+7B27Vr89NNPWLduHQoLCxEYGGj5PunXrx+aNGmidEwiUicOsyGim3f48GFJTk6WmJgY8ff3FwASEBAgMTExkpycLDk5OUpHvMbx48clKSlJgoODLUMz0tPTxWw239HzlpeXS+fOna8ZTnP1m06nk+bNm9c4dOePP/6Q1q1bS7NmzeSXX365o0z2YDab5eOPPxadTidPP/20lJeXV/t5mzdvFq1We91zA0BcXFxkypQpd5zr8uXLkpKSIvfcc0+VIVOXLl264+e2JqPRKFlZWZKUlCRRUVHi7u4uACQ0NFTi4uIkLS1Nzpw5o3RMIlKPNJZ5IqrRiRMnJC0tTeLi4iQkJEQAiKenp0RFRUlSUpJkZWXdcSm2lczMTImJiRGdTicNGzaUhIQEOXLkiFWPcejQIalTp45oNJrrltVt27Zd93lOnz4tAwYMEFdXVzEYDA57Ts+ePSvDhg0TnU4n//znP2+Y88033xQXF5caz41er5du3bpZfTx5VlaWxMXFibu7u/j6+kp8fLwcPnzYqsewlvLy8irlXq/
\n", 260 | "text/plain": [ 261 | "" 262 | ] 263 | }, 264 | "execution_count": 9, 265 | "metadata": {}, 266 | "output_type": "execute_result" 267 | } 268 | ], 269 | "source": [ 270 | "total.visualize()"
271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": {}, 276 | "source": [ 277 | "Notice that the computation is significantly faster!" 278 | ] 279 | }, 280 | { 281 | "cell_type": "markdown", 282 | "metadata": {}, 283 | "source": [ 284 | "### Parallel Pandas `groupby()`" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": {}, 290 | "source": [ 291 | "For another example, let's go back to the NYC taxi cab dataset. Again, we assume all CSV files are in the `data/` subdirectory." 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": null, 297 | "metadata": {}, 298 | "outputs": [], 299 | "source": [ 300 | "# Uncomment to download data again\n", 301 | "# !wget https://s3.amazonaws.com/nyc-tlc/trip+data/yellow_tripdata_2019-{01..12}.csv" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "Similar to the computation in the DataFrame notebook, we read the data for one month and find the mean tip amount as a function of passenger count." 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 13, 314 | "metadata": {}, 315 | "outputs": [ 316 | { 317 | "data": { 318 | "text/html": [ 319 | "
" 466 | ], 467 | "text/plain": [ 468 | " VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count \\\n", 469 | "0 1 2019-01-01 00:46:40 2019-01-01 00:53:20 1 \n", 470 | "1 1 2019-01-01 00:59:47 2019-01-01 01:18:59 1 \n", 471 | "2 2 2018-12-21 13:48:30 2018-12-21 13:52:40 3 \n", 472 | "3 2 2018-11-28 15:52:25 2018-11-28 15:55:45 5 \n", 473 | "4 2 2018-11-28 15:56:57 2018-11-28 15:58:33 5 \n", 474 | "\n", 475 | " trip_distance RatecodeID store_and_fwd_flag PULocationID DOLocationID \\\n", 476 | "0 1.5 1 N 151 239 \n", 477 | "1 2.6 1 N 239 246 \n", 478 | "2 0.0 1 N 236 236 \n", 479 | "3 0.0 1 N 193 193 \n", 480 | "4 0.0 2 N 193 193 \n", 481 | "\n", 482 | " payment_type fare_amount extra mta_tax tip_amount tolls_amount \\\n", 483 | "0 1 7.0 0.5 0.5 1.65 0.0 \n", 484 | "1 1 14.0 0.5 0.5 1.00 0.0 \n", 485 | "2 1 4.5 0.5 0.5 0.00 0.0 \n", 486 | "3 2 3.5 0.5 0.5 0.00 0.0 \n", 487 | "4 2 52.0 0.0 0.5 0.00 0.0 \n", 488 | "\n", 489 | " improvement_surcharge total_amount congestion_surcharge \n", 490 | "0 0.3 9.95 NaN \n", 491 | "1 0.3 16.30 NaN \n", 492 | "2 0.3 5.80 NaN \n", 493 | "3 0.3 7.55 NaN \n", 494 | "4 0.3 55.55 NaN " 495 | ] 496 | }, 497 | "execution_count": 13, 498 | "metadata": {}, 499 | "output_type": "execute_result" 500 | } 501 | ], 502 | "source": [ 503 | "import pandas as pd\n", 504 | "\n", 505 | "df = pd.read_csv(\"data/yellow_tripdata_2019-01.csv\")\n", 506 | "df.head()" 507 | ] 508 | }, 509 | { 510 | "cell_type": "code", 511 | "execution_count": 14, 512 | "metadata": {}, 513 | "outputs": [ 514 | { 515 | "data": { 516 | "text/plain": [ 517 | "passenger_count\n", 518 | "0 1.786901\n", 519 | "1 1.828308\n", 520 | "2 1.833877\n", 521 | "3 1.795579\n", 522 | "4 1.702710\n", 523 | "5 1.869868\n", 524 | "6 1.856830\n", 525 | "7 6.542632\n", 526 | "8 6.480690\n", 527 | "9 3.116667\n", 528 | "Name: tip_amount, dtype: float64" 529 | ] 530 | }, 531 | "execution_count": 14, 532 | "metadata": {}, 533 | "output_type": "execute_result" 534 | } 535 | ], 536 | 
"source": [ 537 | "df.groupby(\"passenger_count\").tip_amount.mean()" 538 | ] 539 | }, 540 | { 541 | "cell_type": "markdown", 542 | "metadata": {}, 543 | "source": [ 544 | "Now, to compute this value across the entire dataset, we can use the following sequential code:" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": 15, 550 | "metadata": {}, 551 | "outputs": [], 552 | "source": [ 553 | "import os\n", 554 | "from glob import glob\n", 555 | "\n", 556 | "filenames = sorted(glob(os.path.join('data', '*.csv')))" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 16, 562 | "metadata": {}, 563 | "outputs": [ 564 | { 565 | "name": "stderr", 566 | "output_type": "stream", 567 | "text": [ 568 | ":2: DtypeWarning: Columns (6) have mixed types.Specify dtype option on import or set low_memory=False.\n" 569 | ] 570 | }, 571 | { 572 | "name": "stdout", 573 | "output_type": "stream", 574 | "text": [ 575 | "CPU times: user 2min 20s, sys: 17.4 s, total: 2min 37s\n", 576 | "Wall time: 2min 38s\n" 577 | ] 578 | }, 579 | { 580 | "data": { 581 | "text/plain": [ 582 | "passenger_count\n", 583 | "0 2.122789\n", 584 | "1 2.206790\n", 585 | "2 2.214306\n", 586 | "3 2.137775\n", 587 | "4 2.023804\n", 588 | "5 2.235441\n", 589 | "6 2.221105\n", 590 | "7 6.675962\n", 591 | "8 7.111625\n", 592 | "9 7.377822\n", 593 | "Name: tip_amount, dtype: float64" 594 | ] 595 | }, 596 | "execution_count": 16, 597 | "metadata": {}, 598 | "output_type": "execute_result" 599 | } 600 | ], 601 | "source": [ 602 | "%%time\n", 603 | "\n", 604 | "sums = []\n", 605 | "counts = []\n", 606 | "\n", 607 | "for fn in filenames:\n", 608 | " # Read file\n", 609 | " df = pd.read_csv(fn)\n", 610 | "\n", 611 | " # Groupby passenger_count\n", 612 | " by_passenger_count = df.groupby('passenger_count')\n", 613 | "\n", 614 | " # Sum of (all) tip_amount as function of passenger_count\n", 615 | " amount = by_passenger_count.tip_amount.sum()\n", 616 | "\n", 617 | " # Number of 
total data points\n", 618 | " total = by_passenger_count.tip_amount.count()\n", 619 | "\n", 620 | " # Save the intermediates\n", 621 | " sums.append(amount)\n", 622 | " counts.append(total)\n", 623 | "\n", 624 | "# Combine intermediates to get total mean\n", 625 | "sum_tip_amount = sum(sums)\n", 626 | "n_passengers = sum(counts)\n", 627 | "mean = sum_tip_amount / n_passengers\n", 628 | "mean" 629 | ] 630 | }, 631 | { 632 | "cell_type": "markdown", 633 | "metadata": {}, 634 | "source": [ 635 | "Parallelize using delayed:" 636 | ] 637 | }, 638 | { 639 | "cell_type": "code", 640 | "execution_count": 17, 641 | "metadata": {}, 642 | "outputs": [], 643 | "source": [ 644 | "from dask import compute" 645 | ] 646 | }, 647 | { 648 | "cell_type": "code", 649 | "execution_count": 18, 650 | "metadata": {}, 651 | "outputs": [ 652 | { 653 | "name": "stderr", 654 | "output_type": "stream", 655 | "text": [ 656 | "/Users/pavithra-coiled/.conda/envs/talkpython-dask/lib/python3.8/site-packages/dask/local.py:237: DtypeWarning: Columns (6) have mixed types.Specify dtype option on import or set low_memory=False.\n", 657 | " return [execute_task(*a) for a in it]\n" 658 | ] 659 | }, 660 | { 661 | "name": "stdout", 662 | "output_type": "stream", 663 | "text": [ 664 | "CPU times: user 3min 42s, sys: 1min 36s, total: 5min 18s\n", 665 | "Wall time: 2min\n" 666 | ] 667 | }, 668 | { 669 | "data": { 670 | "text/plain": [ 671 | "passenger_count\n", 672 | "0 2.122789\n", 673 | "1 2.206790\n", 674 | "2 2.214306\n", 675 | "3 2.137775\n", 676 | "4 2.023804\n", 677 | "5 2.235441\n", 678 | "6 2.221105\n", 679 | "7 6.675962\n", 680 | "8 7.111625\n", 681 | "9 7.377822\n", 682 | "Name: tip_amount, dtype: float64" 683 | ] 684 | }, 685 | "execution_count": 18, 686 | "metadata": {}, 687 | "output_type": "execute_result" 688 | } 689 | ], 690 | "source": [ 691 | "%%time\n", 692 | "\n", 693 | "sums = []\n", 694 | "counts = []\n", 695 | "\n", 696 | "for fn in filenames:\n", 697 | " \n", 698 | " df = 
delayed(pd.read_csv)(fn) # Delayed!\n", 699 | "\n", 700 | " by_passenger_count = df.groupby('passenger_count')\n", 701 | " \n", 702 | " amount = by_passenger_count.tip_amount.sum()\n", 703 | "\n", 704 | " total = by_passenger_count.tip_amount.count()\n", 705 | "\n", 706 | " sums.append(amount)\n", 707 | " counts.append(total)\n", 708 | "\n", 709 | " \n", 710 | "sums, counts = compute(sums, counts) # Compute the intermediates!\n", 711 | " \n", 712 | "sum_tip_amount = sum(sums)\n", 713 | "n_passengers = sum(counts)\n", 714 | "mean = sum_tip_amount / n_passengers\n", 715 | "mean" 716 | ] 717 | }, 718 | { 719 | "cell_type": "markdown", 720 | "metadata": {}, 721 | "source": [ 722 | "## Checkpoint\n", 723 | "\n", 724 | "**Question:** Using the Delayed API, create a NumPy array `x` of any size and compute the sum of all array entries." 725 | ] 726 | }, 727 | { 728 | "cell_type": "code", 729 | "execution_count": null, 730 | "metadata": {}, 731 | "outputs": [], 732 | "source": [ 733 | "# Your answer here" 734 | ] 735 | }, 736 | { 737 | "cell_type": "code", 738 | "execution_count": null, 739 | "metadata": { 740 | "jupyter": { 741 | "source_hidden": true 742 | }, 743 | "tags": [] 744 | }, 745 | "outputs": [], 746 | "source": [ 747 | "import numpy as np\n", 748 | "\n", 749 | "x = delayed(np.ones)((1000, 1000), dtype=int)\n", 750 | "y = x.sum()\n", 751 | "y.compute()" 752 | ] 753 | }, 754 | { 755 | "cell_type": "markdown", 756 | "metadata": {}, 757 | "source": [ 758 | "## Best Practices" 759 | ] 760 | }, 761 | { 762 | "cell_type": "markdown", 763 | "metadata": {}, 764 | "source": [ 765 | "1. 
Call `delayed` on functions, not on their results" 766 | ] 767 | }, 768 | { 769 | "cell_type": "code", 770 | "execution_count": null, 771 | "metadata": {}, 772 | "outputs": [], 773 | "source": [ 774 | "# [DON'T] Call delayed on the result, because f(x, y) executes immediately\n", 775 | "\n", 776 | "dask.delayed(f(x, y))" 777 | ] 778 | }, 779 | { 780 | "cell_type": "code", 781 | "execution_count": null, 782 | "metadata": {}, 783 | "outputs": [], 784 | "source": [ 785 | "# [DO] Call delayed on the function, then pass the arguments\n", 786 | "\n", 787 | "dask.delayed(f)(x, y)" 788 | ] 789 | }, 790 | { 791 | "cell_type": "markdown", 792 | "metadata": {}, 793 | "source": [ 794 | "2. Compute many results at once, instead of repeatedly" 795 | ] 796 | }, 797 | { 798 | "cell_type": "code", 799 | "execution_count": null, 800 | "metadata": {}, 801 | "outputs": [], 802 | "source": [ 803 | "# [DON'T] Call compute repeatedly\n", 804 | "\n", 805 | "results = []\n", 806 | "for x in L:\n", 807 | "    y = dask.delayed(f)(x)\n", 808 | "    results.append(y.compute())\n", 809 | "\n", 810 | "results" 811 | ] 812 | }, 813 | { 814 | "cell_type": "code", 815 | "execution_count": null, 816 | "metadata": {}, 817 | "outputs": [], 818 | "source": [ 819 | "# [DO] Collect many calls for one compute\n", 820 | "\n", 821 | "results = []\n", 822 | "for x in L:\n", 823 | "    y = dask.delayed(f)(x)\n", 824 | "    results.append(y)\n", 825 | "\n", 826 | "results = dask.compute(*results)" 827 | ] 828 | }, 829 | { 830 | "cell_type": "markdown", 831 | "metadata": {}, 832 | "source": [ 833 | "3. 
Do not change (mutate) inputs" 834 | ] 835 | }, 836 | { 837 | "cell_type": "code", 838 | "execution_count": null, 839 | "metadata": {}, 840 | "outputs": [], 841 | "source": [ 842 | "# [DON'T] Mutate inputs in functions\n", 843 | "\n", 844 | "@dask.delayed\n", 845 | "def f(x):\n", 846 | "    x += 1\n", 847 | "    return x" 848 | ] 849 | }, 850 | { 851 | "cell_type": "code", 852 | "execution_count": null, 853 | "metadata": {}, 854 | "outputs": [], 855 | "source": [ 856 | "# [DO] Return new values or copies\n", 857 | "\n", 858 | "@dask.delayed\n", 859 | "def f(x):\n", 860 | "    x = x + 1\n", 861 | "    return x" 862 | ] 863 | }, 864 | { 865 | "cell_type": "markdown", 866 | "metadata": {}, 867 | "source": [ 868 | "For more best practices, refer to the [Dask documentation](https://docs.dask.org/en/latest/delayed-best-practices.html)." 869 | ] 870 | }, 871 | { 872 | "cell_type": "markdown", 873 | "metadata": {}, 874 | "source": [ 875 | "## References" 876 | ] 877 | }, 878 | { 879 | "cell_type": "markdown", 880 | "metadata": {}, 881 | "source": [ 882 | "* [Dask Delayed documentation](https://docs.dask.org/en/latest/delayed.html)\n", 883 | "* [Dask Delayed best practices](https://docs.dask.org/en/latest/delayed-best-practices.html)\n", 884 | "* [Dask Tutorial - Delayed](https://tutorial.dask.org/01_dask.delayed.html)" 885 | ] 886 | }, 887 | { 888 | "cell_type": "code", 889 | "execution_count": null, 890 | "metadata": {}, 891 | "outputs": [], 892 | "source": [] 893 | } 894 | ], 895 | "metadata": { 896 | "kernelspec": { 897 | "display_name": "Python 3", 898 | "language": "python", 899 | "name": "python3" 900 | }, 901 | "language_info": { 902 | "codemirror_mode": { 903 | "name": "ipython", 904 | "version": 3 905 | }, 906 | "file_extension": ".py", 907 | "mimetype": "text/x-python", 908 | "name": "python", 909 | "nbconvert_exporter": "python", 910 | "pygments_lexer": "ipython3", 911 | "version": "3.8.10" 912 | } 913 | }, 914 | "nbformat": 4, 915 | "nbformat_minor": 4 916 | } 917 | 
--------------------------------------------------------------------------------