├── 01-DA_Numpy_arrays_creation.ipynb
├── 02-DA_Numpy_array_maths.ipynb
├── 03-DA_Numpy_matplotlib.ipynb
├── 04-DA_Numpy_indexing.ipynb
├── 05-DA_Numpy_combining_arrays.ipynb
├── 06-DA_Pandas_introduction.ipynb
├── 07-DA_Pandas_structures.ipynb
├── 08-DA_Pandas_import_plotting.ipynb
├── 09-DA_Pandas_operations.ipynb
├── 10-DA_Pandas_combine.ipynb
├── 11-DA_Pandas_splitting.ipynb
├── 12-DA_Pandas_realworld.ipynb
├── 98-DA_Numpy_Exercises.ipynb
├── 98-DA_Numpy_Solutions.ipynb
├── 99-DA_Pandas_Exercises.ipynb
├── 99-DA_Pandas_Solutions.ipynb
├── Data
    ├── AB_NYC_2019.csv
    ├── P3_GrantExport.csv
    ├── P3_PersonExport.csv
    ├── composers.xlsx
    └── ny_boroughs.xlsx
├── LICENSE
├── README.md
├── binder
    ├── environment.yml
    └── postBuild
├── colab
    ├── automate_colab_editing.ipynb
    └── colab_data.sh
└── svg.py


/01-DA_Numpy_arrays_creation.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {},
   6 |    "source": [
   7 |     "# 1. Creating Numpy arrays"
   8 |    ]
   9 |   },
  10 |   {
  11 |    "cell_type": "markdown",
  12 |    "metadata": {},
  13 |    "source": [
  14 |     "Numpy has many different types of data \"containers\": lists, dictionaries, tuples etc. However none of them allows for efficient numerical calculation, in particular not in multi-dimensional cases (think e.g. of operations on images). Numpy has been developed exactly to fill this gap. It provides a new data structure, the **numpy array**, and a large library of operations that allow to: \n",
  15 |     "- generate such arrays\n",
  16 |     "- combine arrays in different ways (concatenation, stacking etc.)\n",
  17 |     "- modify such arrays (projection, extraction of sub-arrays etc.)\n",
  18 |     "- apply mathematical operations on them\n",
  19 |     "\n",
  20 |     "Numpy is the base of almost the entire Python scientific programming stack. Many libraries build on top of Numpy, either by providing specialized functions to operate on them (e.g. scikit-image for image processing) or by creating more complex data containers on top of it. The data science library Pandas that will also be presented in this course is a good example of this with its dataframe structures.\n"
  21 |    ]
  22 |   },
  23 |   {
  24 |    "cell_type": "code",
  25 |    "execution_count": null,
  26 |    "metadata": {},
  27 |    "outputs": [],
  28 |    "source": [
  29 |     "import numpy as np\n",
  30 |     "from svg import numpy_to_svg"
  31 |    ]
  32 |   },
  33 |   {
  34 |    "cell_type": "markdown",
  35 |    "metadata": {},
  36 |    "source": [
  37 |     "## 1.1 What is an array ?"
  38 |    ]
  39 |   },
  40 |   {
  41 |    "cell_type": "markdown",
  42 |    "metadata": {},
  43 |    "source": [
  44 |     "Let us create the simplest example of an array by transforming a regular Python list into an array (we will see more advanced ways of creating arrays in the next chapters):"
  45 |    ]
  46 |   },
  47 |   {
  48 |    "cell_type": "code",
  49 |    "execution_count": null,
  50 |    "metadata": {},
  51 |    "outputs": [],
  52 |    "source": [
  53 |     "mylist = [2,5,3,9,5,2]"
  54 |    ]
  55 |   },
  56 |   {
  57 |    "cell_type": "code",
  58 |    "execution_count": 3,
  59 |    "metadata": {},
  60 |    "outputs": [
  61 |     {
  62 |      "data": {
  63 |       "text/plain": [
  64 |        "[2, 5, 3, 9, 5, 2]"
  65 |       ]
  66 |      },
  67 |      "execution_count": 3,
  68 |      "metadata": {},
  69 |      "output_type": "execute_result"
  70 |     }
  71 |    ],
  72 |    "source": [
  73 |     "mylist"
  74 |    ]
  75 |   },
  76 |   {
  77 |    "cell_type": "code",
  78 |    "execution_count": 4,
  79 |    "metadata": {},
  80 |    "outputs": [],
  81 |    "source": [
  82 |     "myarray = np.array(mylist)"
  83 |    ]
  84 |   },
  85 |   {
  86 |    "cell_type": "code",
  87 |    "execution_count": 5,
  88 |    "metadata": {},
  89 |    "outputs": [
  90 |     {
  91 |      "data": {
  92 |       "text/plain": [
  93 |        "array([2, 5, 3, 9, 5, 2])"
  94 |       ]
  95 |      },
  96 |      "execution_count": 5,
  97 |      "metadata": {},
  98 |      "output_type": "execute_result"
  99 |     }
 100 |    ],
 101 |    "source": [
 102 |     "myarray"
 103 |    ]
 104 |   },
 105 |   {
 106 |    "cell_type": "code",
 107 |    "execution_count": 6,
 108 |    "metadata": {},
 109 |    "outputs": [
 110 |     {
 111 |      "data": {
 112 |       "text/plain": [
 113 |        "numpy.ndarray"
 114 |       ]
 115 |      },
 116 |      "execution_count": 6,
 117 |      "metadata": {},
 118 |      "output_type": "execute_result"
 119 |     }
 120 |    ],
 121 |    "source": [
 122 |     "type(myarray)"
 123 |    ]
 124 |   },
 125 |   {
 126 |    "cell_type": "markdown",
 127 |    "metadata": {},
 128 |    "source": [
 129 |     "We see that ```myarray``` is a Numpy array thanks to the ```array``` specification in the output. The type also says that we have a numpy ndarray (n-dimensional). At this point we don't see a big difference with regular lists, but we'll see in the following sections all the operations we can do with these objects.\n",
 130 |     "\n",
 131 |     "We can already see a difference with two basic attributes of arrays: their type and shape."
 132 |    ]
 133 |   },
 134 |   {
 135 |    "cell_type": "markdown",
 136 |    "metadata": {},
 137 |    "source": [
 138 |     "### 1.1.1 Array Type"
 139 |    ]
 140 |   },
 141 |   {
 142 |    "cell_type": "markdown",
 143 |    "metadata": {},
 144 |    "source": [
 145 |     "Just like when we create regular variables in Python, arrays receive a type when created. Unlike regular list, **all** elements of an array always have the same type. The type of an array can be recovered through the ```.dtype``` method:"
 146 |    ]
 147 |   },
 148 |   {
 149 |    "cell_type": "code",
 150 |    "execution_count": 7,
 151 |    "metadata": {},
 152 |    "outputs": [
 153 |     {
 154 |      "data": {
 155 |       "text/plain": [
 156 |        "dtype('int64')"
 157 |       ]
 158 |      },
 159 |      "execution_count": 7,
 160 |      "metadata": {},
 161 |      "output_type": "execute_result"
 162 |     }
 163 |    ],
 164 |    "source": [
 165 |     "myarray.dtype"
 166 |    ]
 167 |   },
 168 |   {
 169 |    "cell_type": "markdown",
 170 |    "metadata": {},
 171 |    "source": [
 172 |     "Depending on the content of the list, the array will have different types. But the logic of \"maximal complexity\" is kept. For example if we mix integers and floats, we get a float array:"
 173 |    ]
 174 |   },
 175 |   {
 176 |    "cell_type": "code",
 177 |    "execution_count": 8,
 178 |    "metadata": {},
 179 |    "outputs": [
 180 |     {
 181 |      "data": {
 182 |       "text/plain": [
 183 |        "array([1.2, 6. , 7.6, 5. ])"
 184 |       ]
 185 |      },
 186 |      "execution_count": 8,
 187 |      "metadata": {},
 188 |      "output_type": "execute_result"
 189 |     }
 190 |    ],
 191 |    "source": [
 192 |     "myarray2 = np.array([1.2, 6, 7.6, 5])\n",
 193 |     "myarray2"
 194 |    ]
 195 |   },
 196 |   {
 197 |    "cell_type": "code",
 198 |    "execution_count": 9,
 199 |    "metadata": {},
 200 |    "outputs": [
 201 |     {
 202 |      "data": {
 203 |       "text/plain": [
 204 |        "dtype('float64')"
 205 |       ]
 206 |      },
 207 |      "execution_count": 9,
 208 |      "metadata": {},
 209 |      "output_type": "execute_result"
 210 |     }
 211 |    ],
 212 |    "source": [
 213 |     "myarray2.dtype"
 214 |    ]
 215 |   },
 216 |   {
 217 |    "cell_type": "markdown",
 218 |    "metadata": {},
 219 |    "source": [
 220 |     "In general, we have the possibility to assign a type to an array. This is true here, as well as later when we'll create more complex arrays, and is done via the ```dtype``` option: "
 221 |    ]
 222 |   },
 223 |   {
 224 |    "cell_type": "code",
 225 |    "execution_count": 10,
 226 |    "metadata": {},
 227 |    "outputs": [
 228 |     {
 229 |      "data": {
 230 |       "text/plain": [
 231 |        "array([  1,   6,   7, 244], dtype=uint8)"
 232 |       ]
 233 |      },
 234 |      "execution_count": 10,
 235 |      "metadata": {},
 236 |      "output_type": "execute_result"
 237 |     }
 238 |    ],
 239 |    "source": [
 240 |     "myarray2 = np.array([1.2, 6, 7.6, 500], dtype=np.uint8)\n",
 241 |     "myarray2"
 242 |    ]
 243 |   },
 244 |   {
 245 |    "cell_type": "markdown",
 246 |    "metadata": {},
 247 |    "source": [
 248 |     "The type of the array can also be changed after creation using the ```.astype()``` method:"
 249 |    ]
 250 |   },
 251 |   {
 252 |    "cell_type": "code",
 253 |    "execution_count": 11,
 254 |    "metadata": {},
 255 |    "outputs": [
 256 |     {
 257 |      "data": {
 258 |       "text/plain": [
 259 |        "dtype('float64')"
 260 |       ]
 261 |      },
 262 |      "execution_count": 11,
 263 |      "metadata": {},
 264 |      "output_type": "execute_result"
 265 |     }
 266 |    ],
 267 |    "source": [
 268 |     "myfloat_array = np.array([1.2, 6, 7.6, 500], dtype=np.float)\n",
 269 |     "myfloat_array.dtype"
 270 |    ]
 271 |   },
 272 |   {
 273 |    "cell_type": "code",
 274 |    "execution_count": 12,
 275 |    "metadata": {},
 276 |    "outputs": [
 277 |     {
 278 |      "data": {
 279 |       "text/plain": [
 280 |        "dtype('int8')"
 281 |       ]
 282 |      },
 283 |      "execution_count": 12,
 284 |      "metadata": {},
 285 |      "output_type": "execute_result"
 286 |     }
 287 |    ],
 288 |    "source": [
 289 |     "myint_array = myfloat_array.astype(np.int8)\n",
 290 |     "myint_array.dtype"
 291 |    ]
 292 |   },
 293 |   {
 294 |    "cell_type": "markdown",
 295 |    "metadata": {},
 296 |    "source": [
 297 |     "### 1.1.2 Array shape\n",
 298 |     "\n",
 299 |     "A very important property of an array is its **shape** or in other words the dimensions of each axis. That property can be accessed via the ```.shape``` property:"
 300 |    ]
 301 |   },
 302 |   {
 303 |    "cell_type": "code",
 304 |    "execution_count": 13,
 305 |    "metadata": {},
 306 |    "outputs": [
 307 |     {
 308 |      "data": {
 309 |       "text/plain": [
 310 |        "array([2, 5, 3, 9, 5, 2])"
 311 |       ]
 312 |      },
 313 |      "execution_count": 13,
 314 |      "metadata": {},
 315 |      "output_type": "execute_result"
 316 |     }
 317 |    ],
 318 |    "source": [
 319 |     "myarray"
 320 |    ]
 321 |   },
 322 |   {
 323 |    "cell_type": "code",
 324 |    "execution_count": 14,
 325 |    "metadata": {},
 326 |    "outputs": [
 327 |     {
 328 |      "data": {
 329 |       "text/plain": [
 330 |        "(6,)"
 331 |       ]
 332 |      },
 333 |      "execution_count": 14,
 334 |      "metadata": {},
 335 |      "output_type": "execute_result"
 336 |     }
 337 |    ],
 338 |    "source": [
 339 |     "myarray.shape"
 340 |    ]
 341 |   },
 342 |   {
 343 |    "cell_type": "markdown",
 344 |    "metadata": {},
 345 |    "source": [
 346 |     "We see that our simple array has only one dimension of length 6. Now of course we can create more complex arrays. Let's create for example a *list of two lists*:"
 347 |    ]
 348 |   },
 349 |   {
 350 |    "cell_type": "code",
 351 |    "execution_count": 15,
 352 |    "metadata": {},
 353 |    "outputs": [
 354 |     {
 355 |      "data": {
 356 |       "text/plain": [
 357 |        "array([[1, 2, 3],\n",
 358 |        "       [4, 5, 6]])"
 359 |       ]
 360 |      },
 361 |      "execution_count": 15,
 362 |      "metadata": {},
 363 |      "output_type": "execute_result"
 364 |     }
 365 |    ],
 366 |    "source": [
 367 |     "my2d_list = [[1,2,3], [4,5,6]]\n",
 368 |     "\n",
 369 |     "my2d_array = np.array(my2d_list)\n",
 370 |     "my2d_array"
 371 |    ]
 372 |   },
 373 |   {
 374 |    "cell_type": "code",
 375 |    "execution_count": 16,
 376 |    "metadata": {},
 377 |    "outputs": [
 378 |     {
 379 |      "data": {
 380 |       "text/plain": [
 381 |        "(2, 3)"
 382 |       ]
 383 |      },
 384 |      "execution_count": 16,
 385 |      "metadata": {},
 386 |      "output_type": "execute_result"
 387 |     }
 388 |    ],
 389 |    "source": [
 390 |     "my2d_array.shape"
 391 |    ]
 392 |   },
 393 |   {
 394 |    "cell_type": "markdown",
 395 |    "metadata": {},
 396 |    "source": [
 397 |     "We see now that the shape of this array is *two-dimensional*. We also see that we have 2 lists of 3 elements. In fact at this point we should forget that we have a list of lists and simply consider this object as a *matrix* with *two rows and three columns*. We'll use the follwing graphical representation to clarify some concepts:"
 398 |    ]
 399 |   },
 400 |   {
 401 |    "cell_type": "code",
 402 |    "execution_count": 17,
 403 |    "metadata": {},
 404 |    "outputs": [
 405 |     {
 406 |      "data": {
 407 |       "text/html": [
 408 |        "<svg width=\"250\" height=\"183\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
 409 |        "\n",
 410 |        "  <!-- Horizontal lines -->\n",
 411 |        "  <line x1=\"0\" y1=\"0\" x2=\"200\" y2=\"0\" style=\"stroke-width:2\" />\n",
 412 |        "  <line x1=\"0\" y1=\"66\" x2=\"200\" y2=\"66\" />\n",
 413 |        "  <line x1=\"0\" y1=\"133\" x2=\"200\" y2=\"133\" style=\"stroke-width:2\" />\n",
 414 |        "\n",
 415 |        "  <!-- Vertical lines -->\n",
 416 |        "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"133\" style=\"stroke-width:2\" />\n",
 417 |        "  <line x1=\"66\" y1=\"0\" x2=\"66\" y2=\"133\" />\n",
 418 |        "  <line x1=\"133\" y1=\"0\" x2=\"133\" y2=\"133\" />\n",
 419 |        "  <line x1=\"200\" y1=\"0\" x2=\"200\" y2=\"133\" style=\"stroke-width:2\" />\n",
 420 |        "\n",
 421 |        "  <!-- Colored Rectangle -->\n",
 422 |        "  <polygon points=\"0.000000,0.000000 200.000000,0.000000 200.000000,133.333333 0.000000,133.333333\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
 423 |        "\n",
 424 |        "  <!-- Text -->\n",
 425 |        "  <text x=\"100.000000\" y=\"153.333333\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >3</text>\n",
 426 |        "  <text x=\"220.000000\" y=\"66.666667\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(0,220.000000,66.666667)\">2</text>\n",
 427 |        "</svg>"
 428 |       ],
 429 |       "text/plain": [
 430 |        "<IPython.core.display.HTML object>"
 431 |       ]
 432 |      },
 433 |      "execution_count": 17,
 434 |      "metadata": {},
 435 |      "output_type": "execute_result"
 436 |     }
 437 |    ],
 438 |    "source": [
 439 |     "numpy_to_svg(my2d_array)"
 440 |    ]
 441 |   },
 442 |   {
 443 |    "cell_type": "markdown",
 444 |    "metadata": {},
 445 |    "source": [
 446 |     "## 1.2 Creating arrays\n",
 447 |     "\n",
 448 |     "We have seen that we can turn regular lists into arrays. However this becomes quickly impractical for larger arrays. Numpy offers several functions to create particular arrays. "
 449 |    ]
 450 |   },
 451 |   {
 452 |    "cell_type": "markdown",
 453 |    "metadata": {},
 454 |    "source": [
 455 |     "### 1.2.1 Common simple arrays\n",
 456 |     "For example an array full of zeros or ones:"
 457 |    ]
 458 |   },
 459 |   {
 460 |    "cell_type": "code",
 461 |    "execution_count": 18,
 462 |    "metadata": {},
 463 |    "outputs": [
 464 |     {
 465 |      "data": {
 466 |       "text/plain": [
 467 |        "array([[1., 1., 1.],\n",
 468 |        "       [1., 1., 1.]])"
 469 |       ]
 470 |      },
 471 |      "execution_count": 18,
 472 |      "metadata": {},
 473 |      "output_type": "execute_result"
 474 |     }
 475 |    ],
 476 |    "source": [
 477 |     "one_array = np.ones((2,3))\n",
 478 |     "one_array"
 479 |    ]
 480 |   },
 481 |   {
 482 |    "cell_type": "code",
 483 |    "execution_count": 19,
 484 |    "metadata": {},
 485 |    "outputs": [
 486 |     {
 487 |      "data": {
 488 |       "text/plain": [
 489 |        "array([[0., 0., 0.],\n",
 490 |        "       [0., 0., 0.]])"
 491 |       ]
 492 |      },
 493 |      "execution_count": 19,
 494 |      "metadata": {},
 495 |      "output_type": "execute_result"
 496 |     }
 497 |    ],
 498 |    "source": [
 499 |     "zero_array = np.zeros((2,3))\n",
 500 |     "zero_array"
 501 |    ]
 502 |   },
 503 |   {
 504 |    "cell_type": "markdown",
 505 |    "metadata": {},
 506 |    "source": [
 507 |     "One can also create diagonal matrix:"
 508 |    ]
 509 |   },
 510 |   {
 511 |    "cell_type": "code",
 512 |    "execution_count": 20,
 513 |    "metadata": {},
 514 |    "outputs": [
 515 |     {
 516 |      "data": {
 517 |       "text/plain": [
 518 |        "array([[1., 0., 0.],\n",
 519 |        "       [0., 1., 0.],\n",
 520 |        "       [0., 0., 1.]])"
 521 |       ]
 522 |      },
 523 |      "execution_count": 20,
 524 |      "metadata": {},
 525 |      "output_type": "execute_result"
 526 |     }
 527 |    ],
 528 |    "source": [
 529 |     "np.eye(3)"
 530 |    ]
 531 |   },
 532 |   {
 533 |    "cell_type": "markdown",
 534 |    "metadata": {},
 535 |    "source": [
 536 |     "By default Numpy creates float arrays:"
 537 |    ]
 538 |   },
 539 |   {
 540 |    "cell_type": "code",
 541 |    "execution_count": 21,
 542 |    "metadata": {},
 543 |    "outputs": [
 544 |     {
 545 |      "data": {
 546 |       "text/plain": [
 547 |        "dtype('float64')"
 548 |       ]
 549 |      },
 550 |      "execution_count": 21,
 551 |      "metadata": {},
 552 |      "output_type": "execute_result"
 553 |     }
 554 |    ],
 555 |    "source": [
 556 |     "one_array.dtype"
 557 |    ]
 558 |   },
 559 |   {
 560 |    "cell_type": "markdown",
 561 |    "metadata": {},
 562 |    "source": [
 563 |     "However as mentioned before, one can impose a type usine the ```dtype``` option:"
 564 |    ]
 565 |   },
 566 |   {
 567 |    "cell_type": "code",
 568 |    "execution_count": 22,
 569 |    "metadata": {},
 570 |    "outputs": [
 571 |     {
 572 |      "data": {
 573 |       "text/plain": [
 574 |        "array([[1, 1, 1],\n",
 575 |        "       [1, 1, 1]], dtype=int8)"
 576 |       ]
 577 |      },
 578 |      "execution_count": 22,
 579 |      "metadata": {},
 580 |      "output_type": "execute_result"
 581 |     }
 582 |    ],
 583 |    "source": [
 584 |     "one_array_int = np.ones((2,3), dtype=np.int8)\n",
 585 |     "one_array_int"
 586 |    ]
 587 |   },
 588 |   {
 589 |    "cell_type": "code",
 590 |    "execution_count": 23,
 591 |    "metadata": {},
 592 |    "outputs": [
 593 |     {
 594 |      "data": {
 595 |       "text/plain": [
 596 |        "dtype('int8')"
 597 |       ]
 598 |      },
 599 |      "execution_count": 23,
 600 |      "metadata": {},
 601 |      "output_type": "execute_result"
 602 |     }
 603 |    ],
 604 |    "source": [
 605 |     "one_array_int.dtype"
 606 |    ]
 607 |   },
 608 |   {
 609 |    "cell_type": "markdown",
 610 |    "metadata": {},
 611 |    "source": [
 612 |     "### 1.2.2 Copying the shape\n",
 613 |     "Often one needs to create arrays of same shape. This can be done with \"like-functions\":"
 614 |    ]
 615 |   },
 616 |   {
 617 |    "cell_type": "code",
 618 |    "execution_count": 24,
 619 |    "metadata": {},
 620 |    "outputs": [
 621 |     {
 622 |      "data": {
 623 |       "text/plain": [
 624 |        "array([[0., 0., 0.],\n",
 625 |        "       [0., 0., 0.]])"
 626 |       ]
 627 |      },
 628 |      "execution_count": 24,
 629 |      "metadata": {},
 630 |      "output_type": "execute_result"
 631 |     }
 632 |    ],
 633 |    "source": [
 634 |     "same_shape_array = np.zeros_like(one_array)\n",
 635 |     "same_shape_array"
 636 |    ]
 637 |   },
 638 |   {
 639 |    "cell_type": "code",
 640 |    "execution_count": 25,
 641 |    "metadata": {},
 642 |    "outputs": [
 643 |     {
 644 |      "data": {
 645 |       "text/plain": [
 646 |        "(2, 3)"
 647 |       ]
 648 |      },
 649 |      "execution_count": 25,
 650 |      "metadata": {},
 651 |      "output_type": "execute_result"
 652 |     }
 653 |    ],
 654 |    "source": [
 655 |     "one_array.shape"
 656 |    ]
 657 |   },
 658 |   {
 659 |    "cell_type": "code",
 660 |    "execution_count": 26,
 661 |    "metadata": {},
 662 |    "outputs": [
 663 |     {
 664 |      "data": {
 665 |       "text/plain": [
 666 |        "(2, 3)"
 667 |       ]
 668 |      },
 669 |      "execution_count": 26,
 670 |      "metadata": {},
 671 |      "output_type": "execute_result"
 672 |     }
 673 |    ],
 674 |    "source": [
 675 |     "same_shape_array.shape"
 676 |    ]
 677 |   },
 678 |   {
 679 |    "cell_type": "code",
 680 |    "execution_count": 27,
 681 |    "metadata": {},
 682 |    "outputs": [
 683 |     {
 684 |      "data": {
 685 |       "text/plain": [
 686 |        "array([[1., 1., 1.],\n",
 687 |        "       [1., 1., 1.]])"
 688 |       ]
 689 |      },
 690 |      "execution_count": 27,
 691 |      "metadata": {},
 692 |      "output_type": "execute_result"
 693 |     }
 694 |    ],
 695 |    "source": [
 696 |     "np.ones_like(one_array)"
 697 |    ]
 698 |   },
 699 |   {
 700 |    "cell_type": "markdown",
 701 |    "metadata": {},
 702 |    "source": [
 703 |     "### 1.2.3 Complex arrays\n",
 704 |     "\n",
 705 |     "We are not limited to create arrays containing ones or zeros. Very common operations involve e.g. the creation of arrays containing regularly arrange numbers. For example a \"from-to-by-step\" list:"
 706 |    ]
 707 |   },
 708 |   {
 709 |    "cell_type": "code",
 710 |    "execution_count": 28,
 711 |    "metadata": {},
 712 |    "outputs": [
 713 |     {
 714 |      "data": {
 715 |       "text/plain": [
 716 |        "array([0, 2, 4, 6, 8])"
 717 |       ]
 718 |      },
 719 |      "execution_count": 28,
 720 |      "metadata": {},
 721 |      "output_type": "execute_result"
 722 |     }
 723 |    ],
 724 |    "source": [
 725 |     "np.arange(0, 10, 2)"
 726 |    ]
 727 |   },
 728 |   {
 729 |    "cell_type": "markdown",
 730 |    "metadata": {},
 731 |    "source": [
 732 |     "Or equidistant numbers between boundaries:"
 733 |    ]
 734 |   },
 735 |   {
 736 |    "cell_type": "code",
 737 |    "execution_count": 29,
 738 |    "metadata": {},
 739 |    "outputs": [
 740 |     {
 741 |      "data": {
 742 |       "text/plain": [
 743 |        "array([0.        , 0.11111111, 0.22222222, 0.33333333, 0.44444444,\n",
 744 |        "       0.55555556, 0.66666667, 0.77777778, 0.88888889, 1.        ])"
 745 |       ]
 746 |      },
 747 |      "execution_count": 29,
 748 |      "metadata": {},
 749 |      "output_type": "execute_result"
 750 |     }
 751 |    ],
 752 |    "source": [
 753 |     "np.linspace(0,1, 10)"
 754 |    ]
 755 |   },
 756 |   {
 757 |    "cell_type": "markdown",
 758 |    "metadata": {},
 759 |    "source": [
 760 |     "Numpy offers in particular a ```random``` submodules that allows one to create arrays containing values from a wide array of distributions. For example, normally distributed:"
 761 |    ]
 762 |   },
 763 |   {
 764 |    "cell_type": "code",
 765 |    "execution_count": 30,
 766 |    "metadata": {},
 767 |    "outputs": [
 768 |     {
 769 |      "data": {
 770 |       "text/plain": [
 771 |        "array([[16.64156121, 13.38970093, 11.32772287,  7.93713055],\n",
 772 |        "       [ 8.33365707, 11.27817138,  9.81766403, 11.11541451],\n",
 773 |        "       [12.97743479,  7.1622948 , 12.02417108,  8.64402656]])"
 774 |       ]
 775 |      },
 776 |      "execution_count": 30,
 777 |      "metadata": {},
 778 |      "output_type": "execute_result"
 779 |     }
 780 |    ],
 781 |    "source": [
 782 |     "normal_array = np.random.normal(loc=10, scale=2, size=(3,4))\n",
 783 |     "normal_array"
 784 |    ]
 785 |   },
 786 |   {
 787 |    "cell_type": "code",
 788 |    "execution_count": 31,
 789 |    "metadata": {},
 790 |    "outputs": [
 791 |     {
 792 |      "data": {
 793 |       "text/plain": [
 794 |        "array([[4, 4, 2, 4],\n",
 795 |        "       [3, 7, 6, 3],\n",
 796 |        "       [6, 5, 5, 4]])"
 797 |       ]
 798 |      },
 799 |      "execution_count": 31,
 800 |      "metadata": {},
 801 |      "output_type": "execute_result"
 802 |     }
 803 |    ],
 804 |    "source": [
 805 |     "np.random.poisson(lam=5, size=(3,4))"
 806 |    ]
 807 |   },
 808 |   {
 809 |    "cell_type": "markdown",
 810 |    "metadata": {},
 811 |    "source": [
 812 |     "### 1.2.4 Higher dimensions"
 813 |    ]
 814 |   },
 815 |   {
 816 |    "cell_type": "markdown",
 817 |    "metadata": {},
 818 |    "source": [
 819 |     "Until now we have almost only dealt with 1D or 2D arrays that look like a simple grid:"
 820 |    ]
 821 |   },
 822 |   {
 823 |    "cell_type": "code",
 824 |    "execution_count": 32,
 825 |    "metadata": {},
 826 |    "outputs": [
 827 |     {
 828 |      "data": {
 829 |       "text/html": [
 830 |        "<svg width=\"250\" height=\"150\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
 831 |        "\n",
 832 |        "  <!-- Horizontal lines -->\n",
 833 |        "  <line x1=\"0\" y1=\"0\" x2=\"200\" y2=\"0\" style=\"stroke-width:2\" />\n",
 834 |        "  <line x1=\"0\" y1=\"20\" x2=\"200\" y2=\"20\" />\n",
 835 |        "  <line x1=\"0\" y1=\"40\" x2=\"200\" y2=\"40\" />\n",
 836 |        "  <line x1=\"0\" y1=\"60\" x2=\"200\" y2=\"60\" />\n",
 837 |        "  <line x1=\"0\" y1=\"80\" x2=\"200\" y2=\"80\" />\n",
 838 |        "  <line x1=\"0\" y1=\"100\" x2=\"200\" y2=\"100\" style=\"stroke-width:2\" />\n",
 839 |        "\n",
 840 |        "  <!-- Vertical lines -->\n",
 841 |        "  <line x1=\"0\" y1=\"0\" x2=\"0\" y2=\"100\" style=\"stroke-width:2\" />\n",
 842 |        "  <line x1=\"20\" y1=\"0\" x2=\"20\" y2=\"100\" />\n",
 843 |        "  <line x1=\"40\" y1=\"0\" x2=\"40\" y2=\"100\" />\n",
 844 |        "  <line x1=\"60\" y1=\"0\" x2=\"60\" y2=\"100\" />\n",
 845 |        "  <line x1=\"80\" y1=\"0\" x2=\"80\" y2=\"100\" />\n",
 846 |        "  <line x1=\"100\" y1=\"0\" x2=\"100\" y2=\"100\" />\n",
 847 |        "  <line x1=\"120\" y1=\"0\" x2=\"120\" y2=\"100\" />\n",
 848 |        "  <line x1=\"140\" y1=\"0\" x2=\"140\" y2=\"100\" />\n",
 849 |        "  <line x1=\"160\" y1=\"0\" x2=\"160\" y2=\"100\" />\n",
 850 |        "  <line x1=\"180\" y1=\"0\" x2=\"180\" y2=\"100\" />\n",
 851 |        "  <line x1=\"200\" y1=\"0\" x2=\"200\" y2=\"100\" style=\"stroke-width:2\" />\n",
 852 |        "\n",
 853 |        "  <!-- Colored Rectangle -->\n",
 854 |        "  <polygon points=\"0.000000,0.000000 200.000000,0.000000 200.000000,100.000000 0.000000,100.000000\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
 855 |        "\n",
 856 |        "  <!-- Text -->\n",
 857 |        "  <text x=\"100.000000\" y=\"120.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >10</text>\n",
 858 |        "  <text x=\"220.000000\" y=\"50.000000\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(0,220.000000,50.000000)\">5</text>\n",
 859 |        "</svg>"
 860 |       ],
 861 |       "text/plain": [
 862 |        "<IPython.core.display.HTML object>"
 863 |       ]
 864 |      },
 865 |      "execution_count": 32,
 866 |      "metadata": {},
 867 |      "output_type": "execute_result"
 868 |     }
 869 |    ],
 870 |    "source": [
 871 |     "myarray = np.ones((5,10))\n",
 872 |     "numpy_to_svg(myarray)"
 873 |    ]
 874 |   },
 875 |   {
 876 |    "cell_type": "markdown",
 877 |    "metadata": {},
 878 |    "source": [
 879 |     "We are not limited to create 1 or 2 dimensional arrays. We can basically create any-dimension array. For example in microscopy, images can be volumetric and thus they are 3D arrays in Numpy. For example if we acquired 5 planes of a 10px by 10px image, we would have something like:"
 880 |    ]
 881 |   },
 882 |   {
 883 |    "cell_type": "code",
 884 |    "execution_count": 33,
 885 |    "metadata": {},
 886 |    "outputs": [],
 887 |    "source": [
 888 |     "array3D = np.ones((10,10,5))"
 889 |    ]
 890 |   },
 891 |   {
 892 |    "cell_type": "code",
 893 |    "execution_count": 34,
 894 |    "metadata": {},
 895 |    "outputs": [
 896 |     {
 897 |      "data": {
 898 |       "text/html": [
 899 |        "<svg width=\"277\" height=\"367\" style=\"stroke:rgb(0,0,0);stroke-width:1\" >\n",
 900 |        "\n",
 901 |        "  <!-- Horizontal lines -->\n",
 902 |        "  <line x1=\"10\" y1=\"0\" x2=\"127\" y2=\"117\" style=\"stroke-width:2\" />\n",
 903 |        "  <line x1=\"10\" y1=\"20\" x2=\"127\" y2=\"137\" />\n",
 904 |        "  <line x1=\"10\" y1=\"40\" x2=\"127\" y2=\"157\" />\n",
 905 |        "  <line x1=\"10\" y1=\"60\" x2=\"127\" y2=\"177\" />\n",
 906 |        "  <line x1=\"10\" y1=\"80\" x2=\"127\" y2=\"197\" />\n",
 907 |        "  <line x1=\"10\" y1=\"100\" x2=\"127\" y2=\"217\" />\n",
 908 |        "  <line x1=\"10\" y1=\"120\" x2=\"127\" y2=\"237\" />\n",
 909 |        "  <line x1=\"10\" y1=\"140\" x2=\"127\" y2=\"257\" />\n",
 910 |        "  <line x1=\"10\" y1=\"160\" x2=\"127\" y2=\"277\" />\n",
 911 |        "  <line x1=\"10\" y1=\"180\" x2=\"127\" y2=\"297\" />\n",
 912 |        "  <line x1=\"10\" y1=\"200\" x2=\"127\" y2=\"317\" style=\"stroke-width:2\" />\n",
 913 |        "\n",
 914 |        "  <!-- Vertical lines -->\n",
 915 |        "  <line x1=\"10\" y1=\"0\" x2=\"10\" y2=\"200\" style=\"stroke-width:2\" />\n",
 916 |        "  <line x1=\"21\" y1=\"11\" x2=\"21\" y2=\"211\" />\n",
 917 |        "  <line x1=\"33\" y1=\"23\" x2=\"33\" y2=\"223\" />\n",
 918 |        "  <line x1=\"45\" y1=\"35\" x2=\"45\" y2=\"235\" />\n",
 919 |        "  <line x1=\"57\" y1=\"47\" x2=\"57\" y2=\"247\" />\n",
 920 |        "  <line x1=\"68\" y1=\"58\" x2=\"68\" y2=\"258\" />\n",
 921 |        "  <line x1=\"80\" y1=\"70\" x2=\"80\" y2=\"270\" />\n",
 922 |        "  <line x1=\"92\" y1=\"82\" x2=\"92\" y2=\"282\" />\n",
 923 |        "  <line x1=\"104\" y1=\"94\" x2=\"104\" y2=\"294\" />\n",
 924 |        "  <line x1=\"115\" y1=\"105\" x2=\"115\" y2=\"305\" />\n",
 925 |        "  <line x1=\"127\" y1=\"117\" x2=\"127\" y2=\"317\" style=\"stroke-width:2\" />\n",
 926 |        "\n",
 927 |        "  <!-- Colored Rectangle -->\n",
 928 |        "  <polygon points=\"10.000000,0.000000 127.647059,117.647059 127.647059,317.647059 10.000000,200.000000\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
 929 |        "\n",
 930 |        "  <!-- Horizontal lines -->\n",
 931 |        "  <line x1=\"10\" y1=\"0\" x2=\"110\" y2=\"0\" style=\"stroke-width:2\" />\n",
 932 |        "  <line x1=\"21\" y1=\"11\" x2=\"121\" y2=\"11\" />\n",
 933 |        "  <line x1=\"33\" y1=\"23\" x2=\"133\" y2=\"23\" />\n",
 934 |        "  <line x1=\"45\" y1=\"35\" x2=\"145\" y2=\"35\" />\n",
 935 |        "  <line x1=\"57\" y1=\"47\" x2=\"157\" y2=\"47\" />\n",
 936 |        "  <line x1=\"68\" y1=\"58\" x2=\"168\" y2=\"58\" />\n",
 937 |        "  <line x1=\"80\" y1=\"70\" x2=\"180\" y2=\"70\" />\n",
 938 |        "  <line x1=\"92\" y1=\"82\" x2=\"192\" y2=\"82\" />\n",
 939 |        "  <line x1=\"104\" y1=\"94\" x2=\"204\" y2=\"94\" />\n",
 940 |        "  <line x1=\"115\" y1=\"105\" x2=\"215\" y2=\"105\" />\n",
 941 |        "  <line x1=\"127\" y1=\"117\" x2=\"227\" y2=\"117\" style=\"stroke-width:2\" />\n",
 942 |        "\n",
 943 |        "  <!-- Vertical lines -->\n",
 944 |        "  <line x1=\"10\" y1=\"0\" x2=\"127\" y2=\"117\" style=\"stroke-width:2\" />\n",
 945 |        "  <line x1=\"30\" y1=\"0\" x2=\"147\" y2=\"117\" />\n",
 946 |        "  <line x1=\"50\" y1=\"0\" x2=\"167\" y2=\"117\" />\n",
 947 |        "  <line x1=\"70\" y1=\"0\" x2=\"187\" y2=\"117\" />\n",
 948 |        "  <line x1=\"90\" y1=\"0\" x2=\"207\" y2=\"117\" />\n",
 949 |        "  <line x1=\"110\" y1=\"0\" x2=\"227\" y2=\"117\" style=\"stroke-width:2\" />\n",
 950 |        "\n",
 951 |        "  <!-- Colored Rectangle -->\n",
 952 |        "  <polygon points=\"10.000000,0.000000 110.000000,0.000000 227.647059,117.647059 127.647059,117.647059\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
 953 |        "\n",
 954 |        "  <!-- Horizontal lines -->\n",
 955 |        "  <line x1=\"127\" y1=\"117\" x2=\"227\" y2=\"117\" style=\"stroke-width:2\" />\n",
 956 |        "  <line x1=\"127\" y1=\"137\" x2=\"227\" y2=\"137\" />\n",
 957 |        "  <line x1=\"127\" y1=\"157\" x2=\"227\" y2=\"157\" />\n",
 958 |        "  <line x1=\"127\" y1=\"177\" x2=\"227\" y2=\"177\" />\n",
 959 |        "  <line x1=\"127\" y1=\"197\" x2=\"227\" y2=\"197\" />\n",
 960 |        "  <line x1=\"127\" y1=\"217\" x2=\"227\" y2=\"217\" />\n",
 961 |        "  <line x1=\"127\" y1=\"237\" x2=\"227\" y2=\"237\" />\n",
 962 |        "  <line x1=\"127\" y1=\"257\" x2=\"227\" y2=\"257\" />\n",
 963 |        "  <line x1=\"127\" y1=\"277\" x2=\"227\" y2=\"277\" />\n",
 964 |        "  <line x1=\"127\" y1=\"297\" x2=\"227\" y2=\"297\" />\n",
 965 |        "  <line x1=\"127\" y1=\"317\" x2=\"227\" y2=\"317\" style=\"stroke-width:2\" />\n",
 966 |        "\n",
 967 |        "  <!-- Vertical lines -->\n",
 968 |        "  <line x1=\"127\" y1=\"117\" x2=\"127\" y2=\"317\" style=\"stroke-width:2\" />\n",
 969 |        "  <line x1=\"147\" y1=\"117\" x2=\"147\" y2=\"317\" />\n",
 970 |        "  <line x1=\"167\" y1=\"117\" x2=\"167\" y2=\"317\" />\n",
 971 |        "  <line x1=\"187\" y1=\"117\" x2=\"187\" y2=\"317\" />\n",
 972 |        "  <line x1=\"207\" y1=\"117\" x2=\"207\" y2=\"317\" />\n",
 973 |        "  <line x1=\"227\" y1=\"117\" x2=\"227\" y2=\"317\" style=\"stroke-width:2\" />\n",
 974 |        "\n",
 975 |        "  <!-- Colored Rectangle -->\n",
 976 |        "  <polygon points=\"127.647059,117.647059 227.647059,117.647059 227.647059,317.647059 127.647059,317.647059\" style=\"fill:#ECB172A0;stroke-width:0\"/>\n",
 977 |        "\n",
 978 |        "  <!-- Text -->\n",
 979 |        "  <text x=\"177.647059\" y=\"337.647059\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" >5</text>\n",
 980 |        "  <text x=\"247.647059\" y=\"217.647059\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(0,247.647059,217.647059)\">10</text>\n",
 981 |        "  <text x=\"58.823529\" y=\"278.823529\" font-size=\"1.0rem\" font-weight=\"100\" text-anchor=\"middle\" transform=\"rotate(45,58.823529,278.823529)\">10</text>\n",
 982 |        "</svg>"
 983 |       ],
 984 |       "text/plain": [
 985 |        "<IPython.core.display.HTML object>"
 986 |       ]
 987 |      },
 988 |      "execution_count": 34,
 989 |      "metadata": {},
 990 |      "output_type": "execute_result"
 991 |     }
 992 |    ],
 993 |    "source": [
 994 |     "numpy_to_svg(array3D)"
 995 |    ]
 996 |   },
 997 |   {
 998 |    "cell_type": "markdown",
 999 |    "metadata": {},
1000 |    "source": [
1001 |     "All the functions and properties that we have seen until now are N-dimensional, i.e. they work in the same way irrespective of the array size."
1002 |    ]
1003 |   },
1004 |   {
1005 |    "cell_type": "markdown",
1006 |    "metadata": {},
1007 |    "source": [
1008 |     "## 1.3 Importing arrays\n",
1009 |     "\n",
1010 |     "We have seen until now multiple ways to create arrays. However, most of the time, you will *import* data from some source, either directly as arrays or as lists, and use these data in your analysis."
1011 |    ]
1012 |   },
1013 |   {
1014 |    "cell_type": "markdown",
1015 |    "metadata": {},
1016 |    "source": [
1017 |     "### 1.3.1 Loading and saving arrays\n",
1018 |     "\n",
1019 |     "Numpy can efficiently save and load arrays in its own format ```.npy```. Let's create an array and save it:"
1020 |    ]
1021 |   },
1022 |   {
1023 |    "cell_type": "code",
1024 |    "execution_count": 35,
1025 |    "metadata": {},
1026 |    "outputs": [
1027 |     {
1028 |      "data": {
1029 |       "text/plain": [
1030 |        "array([[ 5.41052227, 11.78370736,  9.22402365,  9.91645679,  9.48495895],\n",
1031 |        "       [10.10853493,  8.75839699,  8.26026504, 12.51736441,  9.80407577],\n",
1032 |        "       [10.09084097,  7.27962072, 11.05963249, 14.37978527,  9.00654627],\n",
1033 |        "       [ 6.01521954, 10.25115807, 10.28647927, 10.12389832,  8.91184397]])"
1034 |       ]
1035 |      },
1036 |      "execution_count": 35,
1037 |      "metadata": {},
1038 |      "output_type": "execute_result"
1039 |     }
1040 |    ],
1041 |    "source": [
1042 |     "array_to_save = np.random.normal(10, 2, (4,5))\n",
1043 |     "array_to_save"
1044 |    ]
1045 |   },
1046 |   {
1047 |    "cell_type": "code",
1048 |    "execution_count": 36,
1049 |    "metadata": {},
1050 |    "outputs": [],
1051 |    "source": [
1052 |     "np.save('my_saved_array.npy', array_to_save)"
1053 |    ]
1054 |   },
1055 |   {
1056 |    "cell_type": "code",
1057 |    "execution_count": 37,
1058 |    "metadata": {},
1059 |    "outputs": [
1060 |     {
1061 |      "name": "stdout",
1062 |      "output_type": "stream",
1063 |      "text": [
1064 |       "01-DA_Numpy_arrays_creation.ipynb   98-DA_Numpy_Solutions.ipynb\n",
1065 |       "02-DA_Numpy_array_maths.ipynb       99-DA_Pandas_Exercises.ipynb\n",
1066 |       "03-DA_Numpy_matplotlib.ipynb        99-DA_Pandas_Solutions.ipynb\n",
1067 |       "04-DA_Numpy_indexing.ipynb          My_first_plot.png\n",
1068 |       "05-DA_Numpy_combining_arrays.ipynb  SNSF_data.ipynb\n",
1069 |       "06-DA_Pandas_introduction.ipynb     Untitled.ipynb\n",
1070 |       "07-DA_Pandas_structures.ipynb       \u001b[34m__pycache__\u001b[m\u001b[m/\n",
1071 |       "08-DA_Pandas_import.ipynb           ipyleaflet.ipynb\n",
1072 |       "09-DA_Pandas_operations.ipynb       multiple_arrays.npz\n",
1073 |       "10-DA_Pandas_combine.ipynb          my_saved_array.npy\n",
1074 |       "11-DA_Pandas_splitting.ipynb        \u001b[34mraw.githubusercontent.com\u001b[m\u001b[m/\n",
1075 |       "12-DA_Pandas_plotting.ipynb         svg.py\n",
1076 |       "13-DA_Pandas_ML.ipynb               \u001b[34munused\u001b[m\u001b[m/\n",
1077 |       "98-DA_Numpy_Exercises.ipynb\n"
1078 |      ]
1079 |     }
1080 |    ],
1081 |    "source": [
1082 |     "ls"
1083 |    ]
1084 |   },
1085 |   {
1086 |    "cell_type": "markdown",
1087 |    "metadata": {},
1088 |    "source": [
1089 |     "Now that this array is saved on disk, we can load it again using ```np.load```:"
1090 |    ]
1091 |   },
1092 |   {
1093 |    "cell_type": "code",
1094 |    "execution_count": 38,
1095 |    "metadata": {},
1096 |    "outputs": [
1097 |     {
1098 |      "data": {
1099 |       "text/plain": [
1100 |        "array([[ 5.41052227, 11.78370736,  9.22402365,  9.91645679,  9.48495895],\n",
1101 |        "       [10.10853493,  8.75839699,  8.26026504, 12.51736441,  9.80407577],\n",
1102 |        "       [10.09084097,  7.27962072, 11.05963249, 14.37978527,  9.00654627],\n",
1103 |        "       [ 6.01521954, 10.25115807, 10.28647927, 10.12389832,  8.91184397]])"
1104 |       ]
1105 |      },
1106 |      "execution_count": 38,
1107 |      "metadata": {},
1108 |      "output_type": "execute_result"
1109 |     }
1110 |    ],
1111 |    "source": [
1112 |     "new_array = np.load('my_saved_array.npy')\n",
1113 |     "new_array"
1114 |    ]
1115 |   },
1116 |   {
1117 |    "cell_type": "markdown",
1118 |    "metadata": {},
1119 |    "source": [
1120 |     "If you have several arrays that belong together, you can also save them in a single file using ```np.savez``` in ```npz``` format. Let's create a second array:"
1121 |    ]
1122 |   },
1123 |   {
1124 |    "cell_type": "code",
1125 |    "execution_count": 39,
1126 |    "metadata": {},
1127 |    "outputs": [
1128 |     {
1129 |      "data": {
1130 |       "text/plain": [
1131 |        "array([[14.57759687,  7.62340049]])"
1132 |       ]
1133 |      },
1134 |      "execution_count": 39,
1135 |      "metadata": {},
1136 |      "output_type": "execute_result"
1137 |     }
1138 |    ],
1139 |    "source": [
1140 |     "array_to_save2 = np.random.normal(10, 2, (1,2))\n",
1141 |     "array_to_save2"
1142 |    ]
1143 |   },
1144 |   {
1145 |    "cell_type": "code",
1146 |    "execution_count": 40,
1147 |    "metadata": {},
1148 |    "outputs": [],
1149 |    "source": [
1150 |     "np.savez('multiple_arrays.npz', array_to_save=array_to_save, array_to_save2=array_to_save2)"
1151 |    ]
1152 |   },
1153 |   {
1154 |    "cell_type": "code",
1155 |    "execution_count": 41,
1156 |    "metadata": {},
1157 |    "outputs": [
1158 |     {
1159 |      "name": "stdout",
1160 |      "output_type": "stream",
1161 |      "text": [
1162 |       "01-DA_Numpy_arrays_creation.ipynb   98-DA_Numpy_Solutions.ipynb\n",
1163 |       "02-DA_Numpy_array_maths.ipynb       99-DA_Pandas_Exercises.ipynb\n",
1164 |       "03-DA_Numpy_matplotlib.ipynb        99-DA_Pandas_Solutions.ipynb\n",
1165 |       "04-DA_Numpy_indexing.ipynb          My_first_plot.png\n",
1166 |       "05-DA_Numpy_combining_arrays.ipynb  SNSF_data.ipynb\n",
1167 |       "06-DA_Pandas_introduction.ipynb     Untitled.ipynb\n",
1168 |       "07-DA_Pandas_structures.ipynb       \u001b[34m__pycache__\u001b[m\u001b[m/\n",
1169 |       "08-DA_Pandas_import.ipynb           ipyleaflet.ipynb\n",
1170 |       "09-DA_Pandas_operations.ipynb       multiple_arrays.npz\n",
1171 |       "10-DA_Pandas_combine.ipynb          my_saved_array.npy\n",
1172 |       "11-DA_Pandas_splitting.ipynb        \u001b[34mraw.githubusercontent.com\u001b[m\u001b[m/\n",
1173 |       "12-DA_Pandas_plotting.ipynb         svg.py\n",
1174 |       "13-DA_Pandas_ML.ipynb               \u001b[34munused\u001b[m\u001b[m/\n",
1175 |       "98-DA_Numpy_Exercises.ipynb\n"
1176 |      ]
1177 |     }
1178 |    ],
1179 |    "source": [
1180 |     "ls"
1181 |    ]
1182 |   },
1183 |   {
1184 |    "cell_type": "markdown",
1185 |    "metadata": {},
1186 |    "source": [
1187 |     "And when we load it again:"
1188 |    ]
1189 |   },
1190 |   {
1191 |    "cell_type": "code",
1192 |    "execution_count": 42,
1193 |    "metadata": {},
1194 |    "outputs": [
1195 |     {
1196 |      "data": {
1197 |       "text/plain": [
1198 |        "numpy.lib.npyio.NpzFile"
1199 |       ]
1200 |      },
1201 |      "execution_count": 42,
1202 |      "metadata": {},
1203 |      "output_type": "execute_result"
1204 |     }
1205 |    ],
1206 |    "source": [
1207 |     "load_multiple = np.load('multiple_arrays.npz')\n",
1208 |     "type(load_multiple)"
1209 |    ]
1210 |   },
1211 |   {
1212 |    "cell_type": "markdown",
1213 |    "metadata": {},
1214 |    "source": [
1215 |     "We get here an ```NpzFile``` *object* from which we can read our data. Note that when we load an ```npz``` file, it is only loaded *lazily*, i.e. data are not actually read, but the content is parsed. This is very useful if you need to store large amounts of data but don't always need to re-load all of them. We can use methods to actually access the data:"
1216 |    ]
1217 |   },
1218 |   {
1219 |    "cell_type": "code",
1220 |    "execution_count": 43,
1221 |    "metadata": {},
1222 |    "outputs": [
1223 |     {
1224 |      "data": {
1225 |       "text/plain": [
1226 |        "['array_to_save', 'array_to_save2']"
1227 |       ]
1228 |      },
1229 |      "execution_count": 43,
1230 |      "metadata": {},
1231 |      "output_type": "execute_result"
1232 |     }
1233 |    ],
1234 |    "source": [
1235 |     "load_multiple.files"
1236 |    ]
1237 |   },
1238 |   {
1239 |    "cell_type": "code",
1240 |    "execution_count": 44,
1241 |    "metadata": {},
1242 |    "outputs": [
1243 |     {
1244 |      "data": {
1245 |       "text/plain": [
1246 |        "array([[14.57759687,  7.62340049]])"
1247 |       ]
1248 |      },
1249 |      "execution_count": 44,
1250 |      "metadata": {},
1251 |      "output_type": "execute_result"
1252 |     }
1253 |    ],
1254 |    "source": [
1255 |     "load_multiple.get('array_to_save2')"
1256 |    ]
1257 |   },
1258 |   {
1259 |    "cell_type": "markdown",
1260 |    "metadata": {},
1261 |    "source": [
1262 |     "### 1.3.2 Importing data as arrays\n",
1263 |     "\n",
1264 |     "Images are a typical example of data that are array-like (matrix of pixels) and that can be imported directly as arrays. Of course, each domain will have it's own *importing libraries*. For example in the area of imaging, the scikit-image package is one of the main libraries, and it offers and importer of images as arrays which works both with local files and web addresses:"
1265 |    ]
1266 |   },
1267 |   {
1268 |    "cell_type": "code",
1269 |    "execution_count": 45,
1270 |    "metadata": {},
1271 |    "outputs": [],
1272 |    "source": [
1273 |     "import skimage.io\n",
1274 |     "\n",
1275 |     "image = skimage.io.imread('https://upload.wikimedia.org/wikipedia/commons/f/fd/%27%C3%9Cbermut_Exub%C3%A9rance%27_by_Paul_Klee%2C_1939.jpg')"
1276 |    ]
1277 |   },
1278 |   {
1279 |    "cell_type": "markdown",
1280 |    "metadata": {},
1281 |    "source": [
1282 |     "We can briefly explore that image:"
1283 |    ]
1284 |   },
1285 |   {
1286 |    "cell_type": "code",
1287 |    "execution_count": 46,
1288 |    "metadata": {},
1289 |    "outputs": [
1290 |     {
1291 |      "data": {
1292 |       "text/plain": [
1293 |        "numpy.ndarray"
1294 |       ]
1295 |      },
1296 |      "execution_count": 46,
1297 |      "metadata": {},
1298 |      "output_type": "execute_result"
1299 |     }
1300 |    ],
1301 |    "source": [
1302 |     "type(image)"
1303 |    ]
1304 |   },
1305 |   {
1306 |    "cell_type": "code",
1307 |    "execution_count": 47,
1308 |    "metadata": {},
1309 |    "outputs": [
1310 |     {
1311 |      "data": {
1312 |       "text/plain": [
1313 |        "dtype('uint8')"
1314 |       ]
1315 |      },
1316 |      "execution_count": 47,
1317 |      "metadata": {},
1318 |      "output_type": "execute_result"
1319 |     }
1320 |    ],
1321 |    "source": [
1322 |     "image.dtype"
1323 |    ]
1324 |   },
1325 |   {
1326 |    "cell_type": "code",
1327 |    "execution_count": 48,
1328 |    "metadata": {},
1329 |    "outputs": [
1330 |     {
1331 |      "data": {
1332 |       "text/plain": [
1333 |        "(584, 756, 3)"
1334 |       ]
1335 |      },
1336 |      "execution_count": 48,
1337 |      "metadata": {},
1338 |      "output_type": "execute_result"
1339 |     }
1340 |    ],
1341 |    "source": [
1342 |     "image.shape"
1343 |    ]
1344 |   },
1345 |   {
1346 |    "cell_type": "markdown",
1347 |    "metadata": {},
1348 |    "source": [
1349 |     "We see that we have an array of integeres with 3 dimensions. Since we imported a jpg image, we know that the thrid dimension corresponds to three color channels Red, Green, Blue (RGB)."
1350 |    ]
1351 |   },
1352 |   {
1353 |    "cell_type": "markdown",
1354 |    "metadata": {},
1355 |    "source": [
1356 |     "You can also read regular CSV files directly as Numpy arrays. This is more commonly done using Pandas, so we don't spend much time on this, but here is an example on importing data from the web:"
1357 |    ]
1358 |   },
1359 |   {
1360 |    "cell_type": "code",
1361 |    "execution_count": 49,
1362 |    "metadata": {},
1363 |    "outputs": [],
1364 |    "source": [
1365 |     "oilprice = np.loadtxt('https://raw.githubusercontent.com/guiwitz/Rdatasets/master/csv/quantreg/gasprice.csv',\n",
1366 |     "          delimiter=',', usecols=range(2,3), skiprows=1)"
1367 |    ]
1368 |   },
1369 |   {
1370 |    "cell_type": "code",
1371 |    "execution_count": 50,
1372 |    "metadata": {},
1373 |    "outputs": [
1374 |     {
1375 |      "data": {
1376 |       "text/plain": [
1377 |        "array([126.6, 127.2, 132.1, 133.3, 133.9, 134.5, 133.9, 133.4, 132.8,\n",
1378 |        "       132.3, 131.1, 134.1, 119.2, 116.8, 113.9, 110.6, 107.8, 105.4,\n",
1379 |        "       102.5, 104.5, 104.3, 104.7, 105.2, 106.6, 106.9, 109. , 110.4,\n",
1380 |        "       111.3, 112.1, 112.9, 114. , 113.8, 113.5, 112.6, 111.4, 110.4,\n",
1381 |        "       109.8, 109.4, 109.1, 109.1, 109.9, 111.2, 112.4, 112.4, 112.7,\n",
1382 |        "       112. , 111. , 109.7, 109.2, 108.9, 108.4, 108.8, 109.1, 109.1,\n",
1383 |        "       110.2, 110.4, 109.9, 109.9, 109.1, 107.5, 106.3, 105.3, 104.2,\n",
1384 |        "       102.6, 101.4, 100.6,  99.5, 100.4, 101.1, 101.4, 101.2, 101.3,\n",
1385 |        "       101. , 101.5, 101.3, 102.6, 105.1, 105.8, 107.2, 108.9, 110.2,\n",
1386 |        "       111.8, 112. , 112.8, 114.3, 115.1, 115.3, 114.9, 114.7, 113.9,\n",
1387 |        "       113.2, 112.8, 112.6, 112.3, 111.6, 112.3, 112.1, 112.1, 112.4,\n",
1388 |        "       112.3, 111.8, 111.5, 111.5, 111.3, 111.3, 112. , 112. , 111.2,\n",
1389 |        "       110.6, 109.8, 108.9, 107.8, 107.4, 106.9, 106.5, 106.6, 106.1,\n",
1390 |        "       105.5, 105.5, 106.2, 105.3, 104.7, 104.2, 104.8, 105.8, 105.6,\n",
1391 |        "       105.7, 106.8, 107.9, 107.9, 108.6, 108.6, 109.7, 110.6, 110.6,\n",
1392 |        "       110.7, 110.4, 110.1, 109.5, 108.9, 108.6, 108.1, 107.5, 106.9,\n",
1393 |        "       106.2, 106. , 105.9, 106.5, 106.2, 105.5, 105.1, 104.5, 104.7,\n",
1394 |        "       109.2, 109. , 109.3, 109.2, 108.4, 107.5, 106.4, 105.8, 105.1,\n",
1395 |        "       103.6, 101.8, 100.3,  99.9,  99.2,  99.5, 100.1,  99.9, 100.5,\n",
1396 |        "       100.7, 101.6, 100.9, 100.4, 100.7, 100.5, 100.7, 101.2, 101.1,\n",
1397 |        "       102.8, 103.3, 103.7, 104. , 104.5, 104.6, 105. , 105.6, 106.5,\n",
1398 |        "       107.3, 107.9, 109.5, 109.7, 110.3, 110.9, 111.4, 113. , 115.7,\n",
1399 |        "       116.1, 116.5, 116.1, 115.6, 115. , 114. , 112.9, 112. , 111.4,\n",
1400 |        "       110.6, 110.7, 112.1, 112.3, 112.2, 111.3, 108.2, 107.5, 106.4,\n",
1401 |        "       105.6, 104.4, 106.3, 107. , 106.2, 106.8, 106.8, 106.2, 105.8,\n",
1402 |        "       105.2, 106. , 106.3, 105.6, 105.5, 106.3, 107.7, 109.4, 111. ,\n",
1403 |        "       113.3, 114.1, 116.4, 117.3, 119.1, 119.3, 119.4, 119. , 118.3,\n",
1404 |        "       117.7, 116.9, 115.9, 114.8, 113.8, 112.6, 112.4, 112.1, 112.2,\n",
1405 |        "       111.3, 111.1, 110.7, 110.6, 110.6, 110. , 109.2, 108.1, 107.3,\n",
1406 |        "       106.2, 106. , 105.9, 105.6, 105.7, 105.8, 105.7, 107.2, 107.5,\n",
1407 |        "       107.7, 108.6, 109.2, 108.4, 107.9, 107.6, 107.3, 107.8, 109.9,\n",
1408 |        "       111.5, 111.6, 112.8, 115.8, 117.2, 119.5, 123.4, 124.3, 125.7,\n",
1409 |        "       125.9, 126.2, 126.9, 126. , 125.2, 124.7, 124.1, 123. , 121.9,\n",
1410 |        "       121.7, 121.5, 121.5, 120.9, 119.9, 119.6, 119.9, 120.1, 119.3,\n",
1411 |        "       120.1, 120.3, 120.3, 119.9, 119.1, 120.3, 120.5, 121.7, 122.5,\n",
1412 |        "       122.9, 123.8, 124.6, 124.2, 124.1, 123.3, 122.7, 122.4, 122. ,\n",
1413 |        "       123.5, 123.6, 123.2, 123. , 122.7, 122. , 121.7, 120.8, 119.9,\n",
1414 |        "       119.1, 119.6, 119.1, 119.2, 118.7, 118.8, 118.5, 118.2, 118.2,\n",
1415 |        "       119.5, 120.4, 120.6, 119.8, 118.9, 117.9, 117.1, 116.9, 116.5,\n",
1416 |        "       117. , 116.4, 118.5, 121.9, 121.8, 123. , 122.9, 122.7, 121.9,\n",
1417 |        "       120.8, 119.5, 119.5, 118.7, 117.8, 116.8, 116.3, 116.4, 115.6,\n",
1418 |        "       115. , 114. , 112.8, 111.8, 110.8, 109.9, 108.9, 108.3, 107.2,\n",
1419 |        "       105.5, 105.1, 104.5, 103.2, 103.8, 102.5, 101.7, 100.6,  99.8,\n",
1420 |        "       102.6, 102.3, 101.8, 102.1, 103.2, 103.8, 105.2, 105.5, 105.2,\n",
1421 |        "       104.7, 106. , 104.9, 104.1, 104.2, 104.1, 103.7, 104.4, 103.5,\n",
1422 |        "       102.3, 101.8, 101.1, 100.4,  99.8,  99.1,  98.7,  99.9,  99.9,\n",
1423 |        "       100.6, 101. , 100.7, 100.1,  99.7,  99.4,  98.1,  97.1,  95.4,\n",
1424 |        "        93.3,  92.3,  92.1,  91.4,  91.3,  92. ,  92.1,  91.3,  90.8,\n",
1425 |        "        90.7,  89.9,  88.5,  89.1,  90. ,  95.8,  99.9, 105.5, 108.7,\n",
1426 |        "       110.7, 110.3, 109.9, 110.7, 110.9, 111.2, 110.1, 108.8, 109.2,\n",
1427 |        "       108.8, 110.5, 109.5, 111. , 112.3, 114.8, 117.2, 117.2, 118.3,\n",
1428 |        "       121.4, 121.2, 121.4, 122.3, 123.4, 125.2, 124.8, 124.2, 123.4,\n",
1429 |        "       122. , 122.5, 121.8, 122.2, 124. , 125.8, 126.2, 126. , 126.3,\n",
1430 |        "       125.7, 126.3, 126. , 125.2, 126.8, 130.7, 130.7, 131.9, 135. ,\n",
1431 |        "       140. , 141.3, 149. , 151.1, 150.8, 148.4, 147.8, 144.7, 141.5,\n",
1432 |        "       140.6, 138.6, 142.7, 146.6, 149.4, 150.9, 153.5, 160.7, 166.4,\n",
1433 |        "       164.1, 160.6, 157.1, 152.1, 149.9, 144.7, 143.7, 142. , 144.4,\n",
1434 |        "       145.6, 150.2, 153.5, 153.9, 152.5, 149.8, 147.3, 151.6, 153.2,\n",
1435 |        "       152.3, 150.2, 150.1, 148.7, 148.9, 146.4, 142.5, 139.6, 138.8,\n",
1436 |        "       137.7, 140. , 145.8, 145.6, 144.6, 142.6, 146. , 142.9, 141. ,\n",
1437 |        "       139.3, 138.7, 137.7, 137.9, 141.1, 146.9, 153.5, 158.6, 158.5,\n",
1438 |        "       165.9, 166.3, 163.7, 165.6, 163. , 158. , 152.6, 145.4, 138.4,\n",
1439 |        "       135. , 133. , 131.8, 131.9, 131.9, 134.7, 139.9, 148. , 153.8,\n",
1440 |        "       151.1, 151.6, 146. , 138.1, 131. , 126.4, 122.1, 119.3, 117. ,\n",
1441 |        "       114.7, 114. , 109.7, 108.4, 107.5, 104.2, 106.3, 109.6, 110.9,\n",
1442 |        "       109.9, 108.7, 108.1, 109.8, 108.5, 108.9, 108.7, 111.8, 119.4,\n",
1443 |        "       126.2, 130.8, 133.9, 138.2, 136.8, 136.7, 135.3, 135.6, 134.9,\n",
1444 |        "       136. , 134.8, 135.3, 133.2, 133.5, 134.2, 135.7, 134.5, 136.1,\n",
1445 |        "       138.1, 137.6, 135.5, 135.5, 135.7, 136.5, 135.3, 135.5, 136.7,\n",
1446 |        "       135.7, 138.5, 141.6, 142.2, 144.3, 142.7, 142.7, 140.6, 137. ,\n",
1447 |        "       133.6, 131.6, 131.6, 132.2, 137.1, 141.7, 141.2, 142.3, 142.2,\n",
1448 |        "       143.7, 149.9, 158.2, 163. , 161.7, 164.1, 166.3, 167.3, 162.6,\n",
1449 |        "       157.7, 155.7, 152.1, 150.4, 148.6, 144.1, 142.7, 144.4, 143.9,\n",
1450 |        "       142.8, 145.6, 148. , 145.1, 144.3, 144.8, 148.9, 149.6, 148.8,\n",
1451 |        "       151.6, 155. , 159.4, 169.3, 168.8, 165.3, 163.6, 158. , 152.4,\n",
1452 |        "       151.1, 151.5, 152.7, 149.9, 149.4, 146.4, 145.9, 147.8, 145.4,\n",
1453 |        "       144.1, 143.3, 145.9, 145.4, 149.2, 154.4, 157.9, 160.4, 159.1,\n",
1454 |        "       160.9, 161.7])"
1455 |       ]
1456 |      },
1457 |      "execution_count": 50,
1458 |      "metadata": {},
1459 |      "output_type": "execute_result"
1460 |     }
1461 |    ],
1462 |    "source": [
1463 |     "oilprice"
1464 |    ]
1465 |   },
1466 |   {
1467 |    "cell_type": "code",
1468 |    "execution_count": null,
1469 |    "metadata": {},
1470 |    "outputs": [],
1471 |    "source": []
1472 |   }
1473 |  ],
1474 |  "metadata": {
1475 |   "kernelspec": {
1476 |    "display_name": "Python 3",
1477 |    "language": "python",
1478 |    "name": "python3"
1479 |   },
1480 |   "language_info": {
1481 |    "codemirror_mode": {
1482 |     "name": "ipython",
1483 |     "version": 3
1484 |    },
1485 |    "file_extension": ".py",
1486 |    "mimetype": "text/x-python",
1487 |    "name": "python",
1488 |    "nbconvert_exporter": "python",
1489 |    "pygments_lexer": "ipython3",
1490 |    "version": "3.8.2"
1491 |   },
1492 |   "nteract": {
1493 |    "version": "0.23.1"
1494 |   },
1495 |   "toc": {
1496 |    "base_numbering": 1,
1497 |    "nav_menu": {},
1498 |    "number_sections": false,
1499 |    "sideBar": true,
1500 |    "skip_h1_title": false,
1501 |    "title_cell": "Table of Contents",
1502 |    "title_sidebar": "Contents",
1503 |    "toc_cell": false,
1504 |    "toc_position": {},
1505 |    "toc_section_display": true,
1506 |    "toc_window_display": true
1507 |   }
1508 |  },
1509 |  "nbformat": 4,
1510 |  "nbformat_minor": 4
1511 | }
1512 | 


--------------------------------------------------------------------------------
/02-DA_Numpy_array_maths.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "markdown",
  5 |    "metadata": {},
  6 |    "source": [
  7 |     "# 2. Mathematics with arrays\n",
  8 |     "\n",
  9 |     "One of the great advantages of Numpy arrays is that they allow one to very easily apply mathematical operations to entire arrays effortlessly. We are presenting here 3 ways in which this can be done."
 10 |    ]
 11 |   },
 12 |   {
 13 |    "cell_type": "code",
 14 |    "execution_count": 1,
 15 |    "metadata": {},
 16 |    "outputs": [],
 17 |    "source": [
 18 |     "import numpy as np"
 19 |    ]
 20 |   },
 21 |   {
 22 |    "cell_type": "markdown",
 23 |    "metadata": {},
 24 |    "source": [
 25 |     "## 2.1 Simple calculus\n",
 26 |     "\n",
 27 |     "To illustrate how arrays are useful, let's first consider the following problem. You have a list:"
 28 |    ]
 29 |   },
 30 |   {
 31 |    "cell_type": "code",
 32 |    "execution_count": 2,
 33 |    "metadata": {},
 34 |    "outputs": [],
 35 |    "source": [
 36 |     "mylist = [1,2,3,4,5]"
 37 |    ]
 38 |   },
 39 |   {
 40 |    "cell_type": "markdown",
 41 |    "metadata": {},
 42 |    "source": [
 43 |     "And now you wish to add to each element of that list the value 3. If we write:"
 44 |    ]
 45 |   },
 46 |   {
 47 |    "cell_type": "code",
 48 |    "execution_count": 3,
 49 |    "metadata": {},
 50 |    "outputs": [
 51 |     {
 52 |      "ename": "TypeError",
 53 |      "evalue": "can only concatenate list (not \"int\") to list",
 54 |      "output_type": "error",
 55 |      "traceback": [
 56 |       "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
 57 |       "\u001b[0;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
 58 |       "\u001b[0;32m<ipython-input-3-ecae2962d7b1>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmylist\u001b[0m \u001b[0;34m+\u001b[0m \u001b[0;36m3\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
 59 |       "\u001b[0;31mTypeError\u001b[0m: can only concatenate list (not \"int\") to list"
 60 |      ]
 61 |     }
 62 |    ],
 63 |    "source": [
 64 |     "mylist + 3"
 65 |    ]
 66 |   },
 67 |   {
 68 |    "cell_type": "markdown",
 69 |    "metadata": {},
 70 |    "source": [
 71 |     "We receive an error because Python doesn't know how to combine a list with a simple integer. In this case we would have to use a for loop or a comprehension list, which is cumbersome."
 72 |    ]
 73 |   },
 74 |   {
 75 |    "cell_type": "code",
 76 |    "execution_count": 4,
 77 |    "metadata": {},
 78 |    "outputs": [
 79 |     {
 80 |      "data": {
 81 |       "text/plain": [
 82 |        "[4, 5, 6, 7, 8]"
 83 |       ]
 84 |      },
 85 |      "execution_count": 4,
 86 |      "metadata": {},
 87 |      "output_type": "execute_result"
 88 |     }
 89 |    ],
 90 |    "source": [
 91 |     "[x + 3 for x in mylist]"
 92 |    ]
 93 |   },
 94 |   {
 95 |    "cell_type": "markdown",
 96 |    "metadata": {},
 97 |    "source": [
 98 |     "Let's see now how this works for an array:"
 99 |    ]
100 |   },
101 |   {
102 |    "cell_type": "code",
103 |    "execution_count": 5,
104 |    "metadata": {},
105 |    "outputs": [],
106 |    "source": [
107 |     "myarray = np.array(mylist)"
108 |    ]
109 |   },
110 |   {
111 |    "cell_type": "code",
112 |    "execution_count": 6,
113 |    "metadata": {},
114 |    "outputs": [
115 |     {
116 |      "data": {
117 |       "text/plain": [
118 |        "array([4, 5, 6, 7, 8])"
119 |       ]
120 |      },
121 |      "execution_count": 6,
122 |      "metadata": {},
123 |      "output_type": "execute_result"
124 |     }
125 |    ],
126 |    "source": [
127 |     "myarray + 3"
128 |    ]
129 |   },
130 |   {
131 |    "cell_type": "markdown",
132 |    "metadata": {},
133 |    "source": [
134 |     "Numpy understands without trouble that our goal is to add the value 3 to *each element* in our list. Naturally this is dimension independent e.g.:"
135 |    ]
136 |   },
137 |   {
138 |    "cell_type": "code",
139 |    "execution_count": 7,
140 |    "metadata": {},
141 |    "outputs": [
142 |     {
143 |      "data": {
144 |       "text/plain": [
145 |        "array([[1., 1., 1., 1., 1., 1.],\n",
146 |        "       [1., 1., 1., 1., 1., 1.],\n",
147 |        "       [1., 1., 1., 1., 1., 1.]])"
148 |       ]
149 |      },
150 |      "execution_count": 7,
151 |      "metadata": {},
152 |      "output_type": "execute_result"
153 |     }
154 |    ],
155 |    "source": [
156 |     "my2d_array = np.ones((3,6))\n",
157 |     "my2d_array"
158 |    ]
159 |   },
160 |   {
161 |    "cell_type": "code",
162 |    "execution_count": 8,
163 |    "metadata": {},
164 |    "outputs": [
165 |     {
166 |      "data": {
167 |       "text/plain": [
168 |        "array([[4., 4., 4., 4., 4., 4.],\n",
169 |        "       [4., 4., 4., 4., 4., 4.],\n",
170 |        "       [4., 4., 4., 4., 4., 4.]])"
171 |       ]
172 |      },
173 |      "execution_count": 8,
174 |      "metadata": {},
175 |      "output_type": "execute_result"
176 |     }
177 |    ],
178 |    "source": [
179 |     "my2d_array + 3"
180 |    ]
181 |   },
182 |   {
183 |    "cell_type": "markdown",
184 |    "metadata": {},
185 |    "source": [
186 |     "Of course as long as we don't reassign this new state to our variable it remains unchanged:"
187 |    ]
188 |   },
189 |   {
190 |    "cell_type": "code",
191 |    "execution_count": 9,
192 |    "metadata": {},
193 |    "outputs": [
194 |     {
195 |      "data": {
196 |       "text/plain": [
197 |        "array([[1., 1., 1., 1., 1., 1.],\n",
198 |        "       [1., 1., 1., 1., 1., 1.],\n",
199 |        "       [1., 1., 1., 1., 1., 1.]])"
200 |       ]
201 |      },
202 |      "execution_count": 9,
203 |      "metadata": {},
204 |      "output_type": "execute_result"
205 |     }
206 |    ],
207 |    "source": [
208 |     "my2d_array"
209 |    ]
210 |   },
211 |   {
212 |    "cell_type": "markdown",
213 |    "metadata": {},
214 |    "source": [
215 |     "We have to write:"
216 |    ]
217 |   },
218 |   {
219 |    "cell_type": "code",
220 |    "execution_count": 10,
221 |    "metadata": {},
222 |    "outputs": [],
223 |    "source": [
224 |     "my2d_array = my2d_array + 3"
225 |    ]
226 |   },
227 |   {
228 |    "cell_type": "code",
229 |    "execution_count": 11,
230 |    "metadata": {},
231 |    "outputs": [
232 |     {
233 |      "data": {
234 |       "text/plain": [
235 |        "array([[4., 4., 4., 4., 4., 4.],\n",
236 |        "       [4., 4., 4., 4., 4., 4.],\n",
237 |        "       [4., 4., 4., 4., 4., 4.]])"
238 |       ]
239 |      },
240 |      "execution_count": 11,
241 |      "metadata": {},
242 |      "output_type": "execute_result"
243 |     }
244 |    ],
245 |    "source": [
246 |     "my2d_array"
247 |    ]
248 |   },
249 |   {
250 |    "cell_type": "markdown",
251 |    "metadata": {},
252 |    "source": [
253 |     "Naturally all basic operations work:"
254 |    ]
255 |   },
256 |   {
257 |    "cell_type": "code",
258 |    "execution_count": 12,
259 |    "metadata": {},
260 |    "outputs": [
261 |     {
262 |      "data": {
263 |       "text/plain": [
264 |        "array([[16., 16., 16., 16., 16., 16.],\n",
265 |        "       [16., 16., 16., 16., 16., 16.],\n",
266 |        "       [16., 16., 16., 16., 16., 16.]])"
267 |       ]
268 |      },
269 |      "execution_count": 12,
270 |      "metadata": {},
271 |      "output_type": "execute_result"
272 |     }
273 |    ],
274 |    "source": [
275 |     "my2d_array * 4"
276 |    ]
277 |   },
278 |   {
279 |    "cell_type": "code",
280 |    "execution_count": 13,
281 |    "metadata": {},
282 |    "outputs": [
283 |     {
284 |      "data": {
285 |       "text/plain": [
286 |        "array([[0.8, 0.8, 0.8, 0.8, 0.8, 0.8],\n",
287 |        "       [0.8, 0.8, 0.8, 0.8, 0.8, 0.8],\n",
288 |        "       [0.8, 0.8, 0.8, 0.8, 0.8, 0.8]])"
289 |       ]
290 |      },
291 |      "execution_count": 13,
292 |      "metadata": {},
293 |      "output_type": "execute_result"
294 |     }
295 |    ],
296 |    "source": [
297 |     "my2d_array / 5"
298 |    ]
299 |   },
300 |   {
301 |    "cell_type": "code",
302 |    "execution_count": 14,
303 |    "metadata": {},
304 |    "outputs": [
305 |     {
306 |      "data": {
307 |       "text/plain": [
308 |        "array([[1024., 1024., 1024., 1024., 1024., 1024.],\n",
309 |        "       [1024., 1024., 1024., 1024., 1024., 1024.],\n",
310 |        "       [1024., 1024., 1024., 1024., 1024., 1024.]])"
311 |       ]
312 |      },
313 |      "execution_count": 14,
314 |      "metadata": {},
315 |      "output_type": "execute_result"
316 |     }
317 |    ],
318 |    "source": [
319 |     "my2d_array ** 5"
320 |    ]
321 |   },
322 |   {
323 |    "cell_type": "markdown",
324 |    "metadata": {},
325 |    "source": [
326 |     "## 2.2 Mathematical functions\n",
327 |     "\n",
328 |     "In addition to simple arithmetic, Numpy offers a vast choice of functions that can be directly applied to arrays. For example trigonometry:"
329 |    ]
330 |   },
331 |   {
332 |    "cell_type": "code",
333 |    "execution_count": 15,
334 |    "metadata": {},
335 |    "outputs": [
336 |     {
337 |      "data": {
338 |       "text/plain": [
339 |        "array([ 0.54030231, -0.41614684, -0.9899925 , -0.65364362,  0.28366219])"
340 |       ]
341 |      },
342 |      "execution_count": 15,
343 |      "metadata": {},
344 |      "output_type": "execute_result"
345 |     }
346 |    ],
347 |    "source": [
348 |     "np.cos(myarray)"
349 |    ]
350 |   },
351 |   {
352 |    "cell_type": "markdown",
353 |    "metadata": {},
354 |    "source": [
355 |     "Exponentials and logs:"
356 |    ]
357 |   },
358 |   {
359 |    "cell_type": "code",
360 |    "execution_count": 16,
361 |    "metadata": {},
362 |    "outputs": [
363 |     {
364 |      "data": {
365 |       "text/plain": [
366 |        "array([  2.71828183,   7.3890561 ,  20.08553692,  54.59815003,\n",
367 |        "       148.4131591 ])"
368 |       ]
369 |      },
370 |      "execution_count": 16,
371 |      "metadata": {},
372 |      "output_type": "execute_result"
373 |     }
374 |    ],
375 |    "source": [
376 |     "np.exp(myarray)"
377 |    ]
378 |   },
379 |   {
380 |    "cell_type": "code",
381 |    "execution_count": 17,
382 |    "metadata": {},
383 |    "outputs": [
384 |     {
385 |      "data": {
386 |       "text/plain": [
387 |        "array([0.        , 0.30103   , 0.47712125, 0.60205999, 0.69897   ])"
388 |       ]
389 |      },
390 |      "execution_count": 17,
391 |      "metadata": {},
392 |      "output_type": "execute_result"
393 |     }
394 |    ],
395 |    "source": [
396 |     "np.log10(myarray)"
397 |    ]
398 |   },
399 |   {
400 |    "cell_type": "markdown",
401 |    "metadata": {},
402 |    "source": [
403 |     "## 2.3 Logical operations"
404 |    ]
405 |   },
406 |   {
407 |    "cell_type": "markdown",
408 |    "metadata": {},
409 |    "source": [
410 |     "If we use a logical comparison on a regular variable, the output is a *boolean* (True or False) that describes the outcome of the comparison:"
411 |    ]
412 |   },
413 |   {
414 |    "cell_type": "code",
415 |    "execution_count": 18,
416 |    "metadata": {},
417 |    "outputs": [
418 |     {
419 |      "data": {
420 |       "text/plain": [
421 |        "False"
422 |       ]
423 |      },
424 |      "execution_count": 18,
425 |      "metadata": {},
426 |      "output_type": "execute_result"
427 |     }
428 |    ],
429 |    "source": [
430 |     "a = 3\n",
431 |     "b = 2\n",
432 |     "a > 3"
433 |    ]
434 |   },
435 |   {
436 |    "cell_type": "markdown",
437 |    "metadata": {},
438 |    "source": [
439 |     "We can do exactly the same thing with arrays. When we added 3 to an array, that value was automatically added to each element of the array. With logical operations, the comparison is also done for each element in the array resulting in a boolean array:"
440 |    ]
441 |   },
442 |   {
443 |    "cell_type": "code",
444 |    "execution_count": 19,
445 |    "metadata": {},
446 |    "outputs": [
447 |     {
448 |      "data": {
449 |       "text/plain": [
450 |        "array([[0., 0., 0., 0.],\n",
451 |        "       [0., 0., 0., 0.],\n",
452 |        "       [0., 0., 0., 1.],\n",
453 |        "       [0., 0., 0., 0.]])"
454 |       ]
455 |      },
456 |      "execution_count": 19,
457 |      "metadata": {},
458 |      "output_type": "execute_result"
459 |     }
460 |    ],
461 |    "source": [
462 |     "myarray = np.zeros((4,4))\n",
463 |     "myarray[2,3] = 1\n",
464 |     "myarray"
465 |    ]
466 |   },
467 |   {
468 |    "cell_type": "code",
469 |    "execution_count": 20,
470 |    "metadata": {},
471 |    "outputs": [
472 |     {
473 |      "data": {
474 |       "text/plain": [
475 |        "array([[False, False, False, False],\n",
476 |        "       [False, False, False, False],\n",
477 |        "       [False, False, False,  True],\n",
478 |        "       [False, False, False, False]])"
479 |       ]
480 |      },
481 |      "execution_count": 20,
482 |      "metadata": {},
483 |      "output_type": "execute_result"
484 |     }
485 |    ],
486 |    "source": [
487 |     "myarray > 0"
488 |    ]
489 |   },
490 |   {
491 |    "cell_type": "markdown",
492 |    "metadata": {},
493 |    "source": [
494 |     "Exactly as for simple variables, we can assign this boolean array to a new variable directly:"
495 |    ]
496 |   },
497 |   {
498 |    "cell_type": "code",
499 |    "execution_count": 21,
500 |    "metadata": {},
501 |    "outputs": [],
502 |    "source": [
503 |     "myboolean = myarray > 0"
504 |    ]
505 |   },
506 |   {
507 |    "cell_type": "code",
508 |    "execution_count": 22,
509 |    "metadata": {},
510 |    "outputs": [
511 |     {
512 |      "data": {
513 |       "text/plain": [
514 |        "array([[False, False, False, False],\n",
515 |        "       [False, False, False, False],\n",
516 |        "       [False, False, False,  True],\n",
517 |        "       [False, False, False, False]])"
518 |       ]
519 |      },
520 |      "execution_count": 22,
521 |      "metadata": {},
522 |      "output_type": "execute_result"
523 |     }
524 |    ],
525 |    "source": [
526 |     "myboolean"
527 |    ]
528 |   },
529 |   {
530 |    "cell_type": "markdown",
531 |    "metadata": {},
532 |    "source": [
533 |     "## 2.4 Methods modifying array dimensions"
534 |    ]
535 |   },
536 |   {
537 |    "cell_type": "markdown",
538 |    "metadata": {},
539 |    "source": [
540 |     "The operations described above were applied *element-wise*. However sometimes we need to do operations either at the array level or some of its axes. For example, we need very commonly statistics on an array (mean, sum etc.)"
541 |    ]
542 |   },
543 |   {
544 |    "cell_type": "code",
545 |    "execution_count": 23,
546 |    "metadata": {},
547 |    "outputs": [
548 |     {
549 |      "data": {
550 |       "text/plain": [
551 |        "array([[ 8.22235922, 10.86316749,  8.97190654, 12.16211971],\n",
552 |        "       [11.31745909,  9.80774793, 11.2873836 ,  6.77945745],\n",
553 |        "       [10.20776894,  8.78011512,  6.96723135, 11.77819806]])"
554 |       ]
555 |      },
556 |      "execution_count": 23,
557 |      "metadata": {},
558 |      "output_type": "execute_result"
559 |     }
560 |    ],
561 |    "source": [
562 |     "nd_array = np.random.normal(10, 2, (3,4))\n",
563 |     "nd_array"
564 |    ]
565 |   },
566 |   {
567 |    "cell_type": "code",
568 |    "execution_count": 24,
569 |    "metadata": {},
570 |    "outputs": [
571 |     {
572 |      "data": {
573 |       "text/plain": [
574 |        "9.762076209457817"
575 |       ]
576 |      },
577 |      "execution_count": 24,
578 |      "metadata": {},
579 |      "output_type": "execute_result"
580 |     }
581 |    ],
582 |    "source": [
583 |     "np.mean(nd_array)"
584 |    ]
585 |   },
586 |   {
587 |    "cell_type": "code",
588 |    "execution_count": 25,
589 |    "metadata": {},
590 |    "outputs": [
591 |     {
592 |      "data": {
593 |       "text/plain": [
594 |        "1.747626512794281"
595 |       ]
596 |      },
597 |      "execution_count": 25,
598 |      "metadata": {},
599 |      "output_type": "execute_result"
600 |     }
601 |    ],
602 |    "source": [
603 |     "np.std(nd_array)"
604 |    ]
605 |   },
606 |   {
607 |    "cell_type": "markdown",
608 |    "metadata": {},
609 |    "source": [
610 |     "Or the maximum value:"
611 |    ]
612 |   },
613 |   {
614 |    "cell_type": "code",
615 |    "execution_count": 26,
616 |    "metadata": {},
617 |    "outputs": [
618 |     {
619 |      "data": {
620 |       "text/plain": [
621 |        "12.162119714449235"
622 |       ]
623 |      },
624 |      "execution_count": 26,
625 |      "metadata": {},
626 |      "output_type": "execute_result"
627 |     }
628 |    ],
629 |    "source": [
630 |     "np.max(nd_array)"
631 |    ]
632 |   },
633 |   {
634 |    "cell_type": "markdown",
635 |    "metadata": {},
636 |    "source": [
637 |     "Note that several of these functions can be called as array methods instead of numpy functions:"
638 |    ]
639 |   },
640 |   {
641 |    "cell_type": "code",
642 |    "execution_count": 27,
643 |    "metadata": {},
644 |    "outputs": [
645 |     {
646 |      "data": {
647 |       "text/plain": [
648 |        "9.762076209457817"
649 |       ]
650 |      },
651 |      "execution_count": 27,
652 |      "metadata": {},
653 |      "output_type": "execute_result"
654 |     }
655 |    ],
656 |    "source": [
657 |     "nd_array.mean()"
658 |    ]
659 |   },
660 |   {
661 |    "cell_type": "code",
662 |    "execution_count": 28,
663 |    "metadata": {},
664 |    "outputs": [
665 |     {
666 |      "data": {
667 |       "text/plain": [
668 |        "12.162119714449235"
669 |       ]
670 |      },
671 |      "execution_count": 28,
672 |      "metadata": {},
673 |      "output_type": "execute_result"
674 |     }
675 |    ],
676 |    "source": [
677 |     "nd_array.max()"
678 |    ]
679 |   },
680 |   {
681 |    "cell_type": "markdown",
682 |    "metadata": {},
683 |    "source": [
684 |     "Note that most functions can be applied to specific axes. Let's remember that our arrays is:"
685 |    ]
686 |   },
687 |   {
688 |    "cell_type": "code",
689 |    "execution_count": 29,
690 |    "metadata": {},
691 |    "outputs": [
692 |     {
693 |      "data": {
694 |       "text/plain": [
695 |        "array([[ 8.22235922, 10.86316749,  8.97190654, 12.16211971],\n",
696 |        "       [11.31745909,  9.80774793, 11.2873836 ,  6.77945745],\n",
697 |        "       [10.20776894,  8.78011512,  6.96723135, 11.77819806]])"
698 |       ]
699 |      },
700 |      "execution_count": 29,
701 |      "metadata": {},
702 |      "output_type": "execute_result"
703 |     }
704 |    ],
705 |    "source": [
706 |     "nd_array"
707 |    ]
708 |   },
709 |   {
710 |    "cell_type": "markdown",
711 |    "metadata": {},
712 |    "source": [
713 |     "We can for example do a maximum projection along the first axis (rows): the maximum value of eadch column is kept:"
714 |    ]
715 |   },
716 |   {
717 |    "cell_type": "code",
718 |    "execution_count": 30,
719 |    "metadata": {},
720 |    "outputs": [
721 |     {
722 |      "data": {
723 |       "text/plain": [
724 |        "array([11.31745909, 10.86316749, 11.2873836 , 12.16211971])"
725 |       ]
726 |      },
727 |      "execution_count": 30,
728 |      "metadata": {},
729 |      "output_type": "execute_result"
730 |     }
731 |    ],
732 |    "source": [
733 |     "proj0 = nd_array.max(axis=0)\n",
734 |     "proj0"
735 |    ]
736 |   },
737 |   {
738 |    "cell_type": "code",
739 |    "execution_count": 31,
740 |    "metadata": {},
741 |    "outputs": [
742 |     {
743 |      "data": {
744 |       "text/plain": [
745 |        "(4,)"
746 |       ]
747 |      },
748 |      "execution_count": 31,
749 |      "metadata": {},
750 |      "output_type": "execute_result"
751 |     }
752 |    ],
753 |    "source": [
754 |     "proj0.shape"
755 |    ]
756 |   },
757 |   {
758 |    "cell_type": "markdown",
759 |    "metadata": {},
760 |    "source": [
761 |     "We can of course do the same operation for the second axis:"
762 |    ]
763 |   },
764 |   {
765 |    "cell_type": "code",
766 |    "execution_count": 32,
767 |    "metadata": {},
768 |    "outputs": [
769 |     {
770 |      "data": {
771 |       "text/plain": [
772 |        "array([12.16211971, 11.31745909, 11.77819806])"
773 |       ]
774 |      },
775 |      "execution_count": 32,
776 |      "metadata": {},
777 |      "output_type": "execute_result"
778 |     }
779 |    ],
780 |    "source": [
781 |     "proj1 = nd_array.max(axis=1)\n",
782 |     "proj1"
783 |    ]
784 |   },
785 |   {
786 |    "cell_type": "code",
787 |    "execution_count": 33,
788 |    "metadata": {},
789 |    "outputs": [
790 |     {
791 |      "data": {
792 |       "text/plain": [
793 |        "(3,)"
794 |       ]
795 |      },
796 |      "execution_count": 33,
797 |      "metadata": {},
798 |      "output_type": "execute_result"
799 |     }
800 |    ],
801 |    "source": [
802 |     "proj1.shape"
803 |    ]
804 |   },
805 |   {
806 |    "cell_type": "markdown",
807 |    "metadata": {},
808 |    "source": [
809 |     "There are of course more advanced functions. For example a cumulative sum:"
810 |    ]
811 |   },
812 |   {
813 |    "cell_type": "code",
814 |    "execution_count": 34,
815 |    "metadata": {},
816 |    "outputs": [
817 |     {
818 |      "data": {
819 |       "text/plain": [
820 |        "array([  8.22235922,  19.08552671,  28.05743325,  40.21955296,\n",
821 |        "        51.53701205,  61.34475998,  72.63214358,  79.41160103,\n",
822 |        "        89.61936998,  98.3994851 , 105.36671645, 117.14491451])"
823 |       ]
824 |      },
825 |      "execution_count": 34,
826 |      "metadata": {},
827 |      "output_type": "execute_result"
828 |     }
829 |    ],
830 |    "source": [
831 |     "np.cumsum(nd_array)"
832 |    ]
833 |   }
834 |  ],
835 |  "metadata": {
836 |   "kernelspec": {
837 |    "display_name": "Python 3",
838 |    "language": "python",
839 |    "name": "python3"
840 |   },
841 |   "language_info": {
842 |    "codemirror_mode": {
843 |     "name": "ipython",
844 |     "version": 3
845 |    },
846 |    "file_extension": ".py",
847 |    "mimetype": "text/x-python",
848 |    "name": "python",
849 |    "nbconvert_exporter": "python",
850 |    "pygments_lexer": "ipython3",
851 |    "version": "3.8.2"
852 |   }
853 |  },
854 |  "nbformat": 4,
855 |  "nbformat_minor": 4
856 | }
857 | 


--------------------------------------------------------------------------------
/07-DA_Pandas_structures.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {},
   6 |    "source": [
   7 |     "# 7. Pandas objects"
   8 |    ]
   9 |   },
  10 |   {
  11 |    "cell_type": "code",
  12 |    "execution_count": 1,
  13 |    "metadata": {},
  14 |    "outputs": [],
  15 |    "source": [
  16 |     "import numpy as np\n",
  17 |     "import pandas as pd\n",
  18 |     "import matplotlib.pyplot as plt"
  19 |    ]
  20 |   },
  21 |   {
  22 |    "cell_type": "markdown",
  23 |    "metadata": {},
  24 |    "source": [
  25 |     "Python has a series of data containers (list, dicts etc.) and Numpy offers multi-dimensional arrays, however none of these structures offers a simple way neither to handle tabular data, nor to easily do standard database operations. This is why Pandas exists: it offers a complete ecosystem of structures and functions dedicated to handle large tables with inhomogeneous contents.\n",
  26 |     "\n",
  27 |     "In this first chapter, we are going to learn about the two main structures of Pandas: Series and Dataframes."
  28 |    ]
  29 |   },
  30 |   {
  31 |    "cell_type": "markdown",
  32 |    "metadata": {},
  33 |    "source": [
  34 |     "## 7.1 Series"
  35 |    ]
  36 |   },
  37 |   {
  38 |    "cell_type": "markdown",
  39 |    "metadata": {},
  40 |    "source": [
  41 |     "### 7.1.1 Simple series"
  42 |    ]
  43 |   },
  44 |   {
  45 |    "cell_type": "markdown",
  46 |    "metadata": {},
  47 |    "source": [
  48 |     "Series are a the Pandas version of 1-D Numpy arrays. We are rarely going to use them directly, but they often appear implicitly when handling data from the more general Dataframe structure. We therefore only give here basics. \n",
  49 |     "\n",
  50 |     "To understand Series' specificities, let's create one. Usually Pandas structures (Series and Dataframes) are created from other simpler structures like Numpy arrays or dictionaries:"
  51 |    ]
  52 |   },
  53 |   {
  54 |    "cell_type": "code",
  55 |    "execution_count": 2,
  56 |    "metadata": {},
  57 |    "outputs": [],
  58 |    "source": [
  59 |     "numpy_array = np.array([4,8,38,1,6])\n"
  60 |    ]
  61 |   },
  62 |   {
  63 |    "cell_type": "markdown",
  64 |    "metadata": {},
  65 |    "source": [
  66 |     "The function ```pd.Series()``` allows us to convert objects into Series:"
  67 |    ]
  68 |   },
  69 |   {
  70 |    "cell_type": "code",
  71 |    "execution_count": 3,
  72 |    "metadata": {},
  73 |    "outputs": [
  74 |     {
  75 |      "data": {
  76 |       "text/plain": [
  77 |        "0     4\n",
  78 |        "1     8\n",
  79 |        "2    38\n",
  80 |        "3     1\n",
  81 |        "4     6\n",
  82 |        "dtype: int64"
  83 |       ]
  84 |      },
  85 |      "execution_count": 3,
  86 |      "metadata": {},
  87 |      "output_type": "execute_result"
  88 |     }
  89 |    ],
  90 |    "source": [
  91 |     "pd_series = pd.Series(numpy_array)\n",
  92 |     "pd_series"
  93 |    ]
  94 |   },
  95 |   {
  96 |    "cell_type": "markdown",
  97 |    "metadata": {},
  98 |    "source": [
  99 |     "The underlying structure can be recovered with the ```.values``` attribute: "
 100 |    ]
 101 |   },
 102 |   {
 103 |    "cell_type": "code",
 104 |    "execution_count": 4,
 105 |    "metadata": {},
 106 |    "outputs": [
 107 |     {
 108 |      "data": {
 109 |       "text/plain": [
 110 |        "array([ 4,  8, 38,  1,  6])"
 111 |       ]
 112 |      },
 113 |      "execution_count": 4,
 114 |      "metadata": {},
 115 |      "output_type": "execute_result"
 116 |     }
 117 |    ],
 118 |    "source": [
 119 |     "pd_series.values"
 120 |    ]
 121 |   },
 122 |   {
 123 |    "cell_type": "markdown",
 124 |    "metadata": {},
 125 |    "source": [
 126 |     "Otherwise, indexing works as for regular arrays:"
 127 |    ]
 128 |   },
 129 |   {
 130 |    "cell_type": "code",
 131 |    "execution_count": 5,
 132 |    "metadata": {},
 133 |    "outputs": [
 134 |     {
 135 |      "data": {
 136 |       "text/plain": [
 137 |        "8"
 138 |       ]
 139 |      },
 140 |      "execution_count": 5,
 141 |      "metadata": {},
 142 |      "output_type": "execute_result"
 143 |     }
 144 |    ],
 145 |    "source": [
 146 |     "pd_series[1]"
 147 |    ]
 148 |   },
 149 |   {
 150 |    "cell_type": "markdown",
 151 |    "metadata": {},
 152 |    "source": [
 153 |     "### 7.1.2 Indexing"
 154 |    ]
 155 |   },
 156 |   {
 157 |    "cell_type": "markdown",
 158 |    "metadata": {},
 159 |    "source": [
 160 |     "On top of accessing values in a series by regular indexing, one can create custom indices for each element in the series:"
 161 |    ]
 162 |   },
 163 |   {
 164 |    "cell_type": "code",
 165 |    "execution_count": 6,
 166 |    "metadata": {},
 167 |    "outputs": [],
 168 |    "source": [
 169 |     "pd_series2 = pd.Series(numpy_array, index=['a', 'b', 'c', 'd','e'])"
 170 |    ]
 171 |   },
 172 |   {
 173 |    "cell_type": "code",
 174 |    "execution_count": 7,
 175 |    "metadata": {},
 176 |    "outputs": [
 177 |     {
 178 |      "data": {
 179 |       "text/plain": [
 180 |        "a     4\n",
 181 |        "b     8\n",
 182 |        "c    38\n",
 183 |        "d     1\n",
 184 |        "e     6\n",
 185 |        "dtype: int64"
 186 |       ]
 187 |      },
 188 |      "execution_count": 7,
 189 |      "metadata": {},
 190 |      "output_type": "execute_result"
 191 |     }
 192 |    ],
 193 |    "source": [
 194 |     "pd_series2"
 195 |    ]
 196 |   },
 197 |   {
 198 |    "cell_type": "markdown",
 199 |    "metadata": {},
 200 |    "source": [
 201 |     "Now a given element can be accessed either by using its regular index:"
 202 |    ]
 203 |   },
 204 |   {
 205 |    "cell_type": "code",
 206 |    "execution_count": 8,
 207 |    "metadata": {},
 208 |    "outputs": [
 209 |     {
 210 |      "data": {
 211 |       "text/plain": [
 212 |        "8"
 213 |       ]
 214 |      },
 215 |      "execution_count": 8,
 216 |      "metadata": {},
 217 |      "output_type": "execute_result"
 218 |     }
 219 |    ],
 220 |    "source": [
 221 |     "pd_series2[1]"
 222 |    ]
 223 |   },
 224 |   {
 225 |    "cell_type": "markdown",
 226 |    "metadata": {},
 227 |    "source": [
 228 |     "or its chosen index:"
 229 |    ]
 230 |   },
 231 |   {
 232 |    "cell_type": "code",
 233 |    "execution_count": 9,
 234 |    "metadata": {},
 235 |    "outputs": [
 236 |     {
 237 |      "data": {
 238 |       "text/plain": [
 239 |        "8"
 240 |       ]
 241 |      },
 242 |      "execution_count": 9,
 243 |      "metadata": {},
 244 |      "output_type": "execute_result"
 245 |     }
 246 |    ],
 247 |    "source": [
 248 |     "pd_series2['b']"
 249 |    ]
 250 |   },
 251 |   {
 252 |    "cell_type": "markdown",
 253 |    "metadata": {},
 254 |    "source": [
 255 |     "A more direct way to create specific indexes is to transform as dictionary into a Series:"
 256 |    ]
 257 |   },
 258 |   {
 259 |    "cell_type": "code",
 260 |    "execution_count": 10,
 261 |    "metadata": {},
 262 |    "outputs": [],
 263 |    "source": [
 264 |     "composer_birth = {'Mahler': 1860, 'Beethoven': 1770, 'Puccini': 1858, 'Shostakovich': 1906}"
 265 |    ]
 266 |   },
 267 |   {
 268 |    "cell_type": "code",
 269 |    "execution_count": 11,
 270 |    "metadata": {},
 271 |    "outputs": [
 272 |     {
 273 |      "data": {
 274 |       "text/plain": [
 275 |        "Mahler          1860\n",
 276 |        "Beethoven       1770\n",
 277 |        "Puccini         1858\n",
 278 |        "Shostakovich    1906\n",
 279 |        "dtype: int64"
 280 |       ]
 281 |      },
 282 |      "execution_count": 11,
 283 |      "metadata": {},
 284 |      "output_type": "execute_result"
 285 |     }
 286 |    ],
 287 |    "source": [
 288 |     "pd_composer_birth = pd.Series(composer_birth)\n",
 289 |     "pd_composer_birth"
 290 |    ]
 291 |   },
 292 |   {
 293 |    "cell_type": "code",
 294 |    "execution_count": 12,
 295 |    "metadata": {},
 296 |    "outputs": [
 297 |     {
 298 |      "data": {
 299 |       "text/plain": [
 300 |        "1858"
 301 |       ]
 302 |      },
 303 |      "execution_count": 12,
 304 |      "metadata": {},
 305 |      "output_type": "execute_result"
 306 |     }
 307 |    ],
 308 |    "source": [
 309 |     "pd_composer_birth['Puccini']"
 310 |    ]
 311 |   },
 312 |   {
 313 |    "cell_type": "markdown",
 314 |    "metadata": {},
 315 |    "source": [
 316 |     "## 7.2 Dataframes"
 317 |    ]
 318 |   },
 319 |   {
 320 |    "cell_type": "markdown",
 321 |    "metadata": {},
 322 |    "source": [
 323 |     "In most cases, one has to deal with more than just one variable, e.g. one has the birth year and the death year of a list of composers. Also one might have different types of information, e.g. in addition to numerical variables (year) one might have string variables like the city of birth. The Pandas structure that allow one to deal with such complex data is called a Dataframe, which can somehow be seen as an aggregation of Series with a common index."
 324 |    ]
 325 |   },
 326 |   {
 327 |    "cell_type": "markdown",
 328 |    "metadata": {},
 329 |    "source": [
 330 |     "### 7.2.1 Creating a Dataframe"
 331 |    ]
 332 |   },
 333 |   {
 334 |    "cell_type": "markdown",
 335 |    "metadata": {},
 336 |    "source": [
 337 |     "To see how to construct such a Dataframe, let's create some more information about composers:"
 338 |    ]
 339 |   },
 340 |   {
 341 |    "cell_type": "code",
 342 |    "execution_count": 13,
 343 |    "metadata": {},
 344 |    "outputs": [],
 345 |    "source": [
 346 |     "composer_death = pd.Series({'Mahler': 1911, 'Beethoven': 1827, 'Puccini': 1924, 'Shostakovich': 1975})\n",
 347 |     "composer_city_birth = pd.Series({'Mahler': 'Kaliste', 'Beethoven': 'Bonn', 'Puccini': 'Lucques', 'Shostakovich': 'Saint-Petersburg'})\n"
 348 |    ]
 349 |   },
 350 |   {
 351 |    "cell_type": "markdown",
 352 |    "metadata": {},
 353 |    "source": [
 354 |     "Now we can combine multiple series into a Dataframe by precising a variable name for each series. Note that all our series need to have the same indices (here the composers' name):"
 355 |    ]
 356 |   },
 357 |   {
 358 |    "cell_type": "code",
 359 |    "execution_count": 14,
 360 |    "metadata": {},
 361 |    "outputs": [
 362 |     {
 363 |      "data": {
 364 |       "text/html": [
 365 |        "<div>\n",
 366 |        "<style scoped>\n",
 367 |        "    .dataframe tbody tr th:only-of-type {\n",
 368 |        "        vertical-align: middle;\n",
 369 |        "    }\n",
 370 |        "\n",
 371 |        "    .dataframe tbody tr th {\n",
 372 |        "        vertical-align: top;\n",
 373 |        "    }\n",
 374 |        "\n",
 375 |        "    .dataframe thead th {\n",
 376 |        "        text-align: right;\n",
 377 |        "    }\n",
 378 |        "</style>\n",
 379 |        "<table border=\"1\" class=\"dataframe\">\n",
 380 |        "  <thead>\n",
 381 |        "    <tr style=\"text-align: right;\">\n",
 382 |        "      <th></th>\n",
 383 |        "      <th>birth</th>\n",
 384 |        "      <th>death</th>\n",
 385 |        "      <th>city</th>\n",
 386 |        "    </tr>\n",
 387 |        "  </thead>\n",
 388 |        "  <tbody>\n",
 389 |        "    <tr>\n",
 390 |        "      <th>Mahler</th>\n",
 391 |        "      <td>1860</td>\n",
 392 |        "      <td>1911</td>\n",
 393 |        "      <td>Kaliste</td>\n",
 394 |        "    </tr>\n",
 395 |        "    <tr>\n",
 396 |        "      <th>Beethoven</th>\n",
 397 |        "      <td>1770</td>\n",
 398 |        "      <td>1827</td>\n",
 399 |        "      <td>Bonn</td>\n",
 400 |        "    </tr>\n",
 401 |        "    <tr>\n",
 402 |        "      <th>Puccini</th>\n",
 403 |        "      <td>1858</td>\n",
 404 |        "      <td>1924</td>\n",
 405 |        "      <td>Lucques</td>\n",
 406 |        "    </tr>\n",
 407 |        "    <tr>\n",
 408 |        "      <th>Shostakovich</th>\n",
 409 |        "      <td>1906</td>\n",
 410 |        "      <td>1975</td>\n",
 411 |        "      <td>Saint-Petersburg</td>\n",
 412 |        "    </tr>\n",
 413 |        "  </tbody>\n",
 414 |        "</table>\n",
 415 |        "</div>"
 416 |       ],
 417 |       "text/plain": [
 418 |        "              birth  death              city\n",
 419 |        "Mahler         1860   1911           Kaliste\n",
 420 |        "Beethoven      1770   1827              Bonn\n",
 421 |        "Puccini        1858   1924           Lucques\n",
 422 |        "Shostakovich   1906   1975  Saint-Petersburg"
 423 |       ]
 424 |      },
 425 |      "execution_count": 14,
 426 |      "metadata": {},
 427 |      "output_type": "execute_result"
 428 |     }
 429 |    ],
 430 |    "source": [
 431 |     "composers_df = pd.DataFrame({'birth': pd_composer_birth, 'death': composer_death, 'city': composer_city_birth})\n",
 432 |     "composers_df"
 433 |    ]
 434 |   },
 435 |   {
 436 |    "cell_type": "markdown",
 437 |    "metadata": {},
 438 |    "source": [
 439 |     "A more common way of creating a Dataframe is to construct it directly from a dictionary of lists where each element of the dictionary turns into a column:"
 440 |    ]
 441 |   },
 442 |   {
 443 |    "cell_type": "code",
 444 |    "execution_count": 15,
 445 |    "metadata": {},
 446 |    "outputs": [],
 447 |    "source": [
 448 |     "dict_of_list = {'birth': [1860, 1770, 1858, 1906], 'death':[1911, 1827, 1924, 1975], \n",
 449 |     " 'city':['Kaliste', 'Bonn', 'Lucques', 'Saint-Petersburg']}"
 450 |    ]
 451 |   },
 452 |   {
 453 |    "cell_type": "code",
 454 |    "execution_count": 16,
 455 |    "metadata": {},
 456 |    "outputs": [
 457 |     {
 458 |      "data": {
 459 |       "text/html": [
 460 |        "<div>\n",
 461 |        "<style scoped>\n",
 462 |        "    .dataframe tbody tr th:only-of-type {\n",
 463 |        "        vertical-align: middle;\n",
 464 |        "    }\n",
 465 |        "\n",
 466 |        "    .dataframe tbody tr th {\n",
 467 |        "        vertical-align: top;\n",
 468 |        "    }\n",
 469 |        "\n",
 470 |        "    .dataframe thead th {\n",
 471 |        "        text-align: right;\n",
 472 |        "    }\n",
 473 |        "</style>\n",
 474 |        "<table border=\"1\" class=\"dataframe\">\n",
 475 |        "  <thead>\n",
 476 |        "    <tr style=\"text-align: right;\">\n",
 477 |        "      <th></th>\n",
 478 |        "      <th>birth</th>\n",
 479 |        "      <th>death</th>\n",
 480 |        "      <th>city</th>\n",
 481 |        "    </tr>\n",
 482 |        "  </thead>\n",
 483 |        "  <tbody>\n",
 484 |        "    <tr>\n",
 485 |        "      <th>0</th>\n",
 486 |        "      <td>1860</td>\n",
 487 |        "      <td>1911</td>\n",
 488 |        "      <td>Kaliste</td>\n",
 489 |        "    </tr>\n",
 490 |        "    <tr>\n",
 491 |        "      <th>1</th>\n",
 492 |        "      <td>1770</td>\n",
 493 |        "      <td>1827</td>\n",
 494 |        "      <td>Bonn</td>\n",
 495 |        "    </tr>\n",
 496 |        "    <tr>\n",
 497 |        "      <th>2</th>\n",
 498 |        "      <td>1858</td>\n",
 499 |        "      <td>1924</td>\n",
 500 |        "      <td>Lucques</td>\n",
 501 |        "    </tr>\n",
 502 |        "    <tr>\n",
 503 |        "      <th>3</th>\n",
 504 |        "      <td>1906</td>\n",
 505 |        "      <td>1975</td>\n",
 506 |        "      <td>Saint-Petersburg</td>\n",
 507 |        "    </tr>\n",
 508 |        "  </tbody>\n",
 509 |        "</table>\n",
 510 |        "</div>"
 511 |       ],
 512 |       "text/plain": [
 513 |        "   birth  death              city\n",
 514 |        "0   1860   1911           Kaliste\n",
 515 |        "1   1770   1827              Bonn\n",
 516 |        "2   1858   1924           Lucques\n",
 517 |        "3   1906   1975  Saint-Petersburg"
 518 |       ]
 519 |      },
 520 |      "execution_count": 16,
 521 |      "metadata": {},
 522 |      "output_type": "execute_result"
 523 |     }
 524 |    ],
 525 |    "source": [
 526 |     "pd.DataFrame(dict_of_list)"
 527 |    ]
 528 |   },
 529 |   {
 530 |    "cell_type": "markdown",
 531 |    "metadata": {},
 532 |    "source": [
 533 |     "However we now lost the composers name. We can enforce it by providing, as we did before for the Series, a list of indices:"
 534 |    ]
 535 |   },
 536 |   {
 537 |    "cell_type": "code",
 538 |    "execution_count": 17,
 539 |    "metadata": {},
 540 |    "outputs": [
 541 |     {
 542 |      "data": {
 543 |       "text/html": [
 544 |        "<div>\n",
 545 |        "<style scoped>\n",
 546 |        "    .dataframe tbody tr th:only-of-type {\n",
 547 |        "        vertical-align: middle;\n",
 548 |        "    }\n",
 549 |        "\n",
 550 |        "    .dataframe tbody tr th {\n",
 551 |        "        vertical-align: top;\n",
 552 |        "    }\n",
 553 |        "\n",
 554 |        "    .dataframe thead th {\n",
 555 |        "        text-align: right;\n",
 556 |        "    }\n",
 557 |        "</style>\n",
 558 |        "<table border=\"1\" class=\"dataframe\">\n",
 559 |        "  <thead>\n",
 560 |        "    <tr style=\"text-align: right;\">\n",
 561 |        "      <th></th>\n",
 562 |        "      <th>birth</th>\n",
 563 |        "      <th>death</th>\n",
 564 |        "      <th>city</th>\n",
 565 |        "    </tr>\n",
 566 |        "  </thead>\n",
 567 |        "  <tbody>\n",
 568 |        "    <tr>\n",
 569 |        "      <th>Mahler</th>\n",
 570 |        "      <td>1860</td>\n",
 571 |        "      <td>1911</td>\n",
 572 |        "      <td>Kaliste</td>\n",
 573 |        "    </tr>\n",
 574 |        "    <tr>\n",
 575 |        "      <th>Beethoven</th>\n",
 576 |        "      <td>1770</td>\n",
 577 |        "      <td>1827</td>\n",
 578 |        "      <td>Bonn</td>\n",
 579 |        "    </tr>\n",
 580 |        "    <tr>\n",
 581 |        "      <th>Puccini</th>\n",
 582 |        "      <td>1858</td>\n",
 583 |        "      <td>1924</td>\n",
 584 |        "      <td>Lucques</td>\n",
 585 |        "    </tr>\n",
 586 |        "    <tr>\n",
 587 |        "      <th>Shostakovich</th>\n",
 588 |        "      <td>1906</td>\n",
 589 |        "      <td>1975</td>\n",
 590 |        "      <td>Saint-Petersburg</td>\n",
 591 |        "    </tr>\n",
 592 |        "  </tbody>\n",
 593 |        "</table>\n",
 594 |        "</div>"
 595 |       ],
 596 |       "text/plain": [
 597 |        "              birth  death              city\n",
 598 |        "Mahler         1860   1911           Kaliste\n",
 599 |        "Beethoven      1770   1827              Bonn\n",
 600 |        "Puccini        1858   1924           Lucques\n",
 601 |        "Shostakovich   1906   1975  Saint-Petersburg"
 602 |       ]
 603 |      },
 604 |      "execution_count": 17,
 605 |      "metadata": {},
 606 |      "output_type": "execute_result"
 607 |     }
 608 |    ],
 609 |    "source": [
 610 |     "pd.DataFrame(dict_of_list, index=['Mahler', 'Beethoven', 'Puccini', 'Shostakovich'])"
 611 |    ]
 612 |   },
 613 |   {
 614 |    "cell_type": "markdown",
 615 |    "metadata": {},
 616 |    "source": [
 617 |     "### 7.2.2 Accessing values"
 618 |    ]
 619 |   },
 620 |   {
 621 |    "cell_type": "markdown",
 622 |    "metadata": {},
 623 |    "source": [
 624 |     "There are multiple ways of accessing values or series of values in a Dataframe. Unlike in Series, a simple bracket gives access to a column and not an index, for example:"
 625 |    ]
 626 |   },
 627 |   {
 628 |    "cell_type": "code",
 629 |    "execution_count": 18,
 630 |    "metadata": {},
 631 |    "outputs": [
 632 |     {
 633 |      "data": {
 634 |       "text/plain": [
 635 |        "Mahler                   Kaliste\n",
 636 |        "Beethoven                   Bonn\n",
 637 |        "Puccini                  Lucques\n",
 638 |        "Shostakovich    Saint-Petersburg\n",
 639 |        "Name: city, dtype: object"
 640 |       ]
 641 |      },
 642 |      "execution_count": 18,
 643 |      "metadata": {},
 644 |      "output_type": "execute_result"
 645 |     }
 646 |    ],
 647 |    "source": [
 648 |     "composers_df['city']"
 649 |    ]
 650 |   },
 651 |   {
 652 |    "cell_type": "markdown",
 653 |    "metadata": {},
 654 |    "source": [
 655 |     "returns a Series. Alternatively one can also use the *attributes* synthax and access columns by using:"
 656 |    ]
 657 |   },
 658 |   {
 659 |    "cell_type": "code",
 660 |    "execution_count": 19,
 661 |    "metadata": {},
 662 |    "outputs": [
 663 |     {
 664 |      "data": {
 665 |       "text/plain": [
 666 |        "Mahler                   Kaliste\n",
 667 |        "Beethoven                   Bonn\n",
 668 |        "Puccini                  Lucques\n",
 669 |        "Shostakovich    Saint-Petersburg\n",
 670 |        "Name: city, dtype: object"
 671 |       ]
 672 |      },
 673 |      "execution_count": 19,
 674 |      "metadata": {},
 675 |      "output_type": "execute_result"
 676 |     }
 677 |    ],
 678 |    "source": [
 679 |     "composers_df.city"
 680 |    ]
 681 |   },
 682 |   {
 683 |    "cell_type": "markdown",
 684 |    "metadata": {},
 685 |    "source": [
 686 |     "The attributes synthax has some limitations, so in case something does not work as expected, revert to the brackets notation.\n",
 687 |     "\n",
 688 |     "When specifiying multiple columns, a DataFrame is returned:"
 689 |    ]
 690 |   },
 691 |   {
 692 |    "cell_type": "code",
 693 |    "execution_count": 20,
 694 |    "metadata": {},
 695 |    "outputs": [
 696 |     {
 697 |      "data": {
 698 |       "text/html": [
 699 |        "<div>\n",
 700 |        "<style scoped>\n",
 701 |        "    .dataframe tbody tr th:only-of-type {\n",
 702 |        "        vertical-align: middle;\n",
 703 |        "    }\n",
 704 |        "\n",
 705 |        "    .dataframe tbody tr th {\n",
 706 |        "        vertical-align: top;\n",
 707 |        "    }\n",
 708 |        "\n",
 709 |        "    .dataframe thead th {\n",
 710 |        "        text-align: right;\n",
 711 |        "    }\n",
 712 |        "</style>\n",
 713 |        "<table border=\"1\" class=\"dataframe\">\n",
 714 |        "  <thead>\n",
 715 |        "    <tr style=\"text-align: right;\">\n",
 716 |        "      <th></th>\n",
 717 |        "      <th>city</th>\n",
 718 |        "      <th>birth</th>\n",
 719 |        "    </tr>\n",
 720 |        "  </thead>\n",
 721 |        "  <tbody>\n",
 722 |        "    <tr>\n",
 723 |        "      <th>Mahler</th>\n",
 724 |        "      <td>Kaliste</td>\n",
 725 |        "      <td>1860</td>\n",
 726 |        "    </tr>\n",
 727 |        "    <tr>\n",
 728 |        "      <th>Beethoven</th>\n",
 729 |        "      <td>Bonn</td>\n",
 730 |        "      <td>1770</td>\n",
 731 |        "    </tr>\n",
 732 |        "    <tr>\n",
 733 |        "      <th>Puccini</th>\n",
 734 |        "      <td>Lucques</td>\n",
 735 |        "      <td>1858</td>\n",
 736 |        "    </tr>\n",
 737 |        "    <tr>\n",
 738 |        "      <th>Shostakovich</th>\n",
 739 |        "      <td>Saint-Petersburg</td>\n",
 740 |        "      <td>1906</td>\n",
 741 |        "    </tr>\n",
 742 |        "  </tbody>\n",
 743 |        "</table>\n",
 744 |        "</div>"
 745 |       ],
 746 |       "text/plain": [
 747 |        "                          city  birth\n",
 748 |        "Mahler                 Kaliste   1860\n",
 749 |        "Beethoven                 Bonn   1770\n",
 750 |        "Puccini                Lucques   1858\n",
 751 |        "Shostakovich  Saint-Petersburg   1906"
 752 |       ]
 753 |      },
 754 |      "execution_count": 20,
 755 |      "metadata": {},
 756 |      "output_type": "execute_result"
 757 |     }
 758 |    ],
 759 |    "source": [
 760 |     "composers_df[['city', 'birth']]"
 761 |    ]
 762 |   },
 763 |   {
 764 |    "cell_type": "markdown",
 765 |    "metadata": {},
 766 |    "source": [
 767 |     "One of the important differences with a regular Numpy array is that here, regular indexing doesn't work:"
 768 |    ]
 769 |   },
 770 |   {
 771 |    "cell_type": "code",
 772 |    "execution_count": 21,
 773 |    "metadata": {},
 774 |    "outputs": [],
 775 |    "source": [
 776 |     "#composers_df[0,0]"
 777 |    ]
 778 |   },
 779 |   {
 780 |    "cell_type": "markdown",
 781 |    "metadata": {},
 782 |    "source": [
 783 |     "Instead one has to use either the ```.iloc[]``` or the ```.loc[]``` method. ```.ìloc[]``` can be used to recover the regular indexing:"
 784 |    ]
 785 |   },
 786 |   {
 787 |    "cell_type": "code",
 788 |    "execution_count": 22,
 789 |    "metadata": {},
 790 |    "outputs": [
 791 |     {
 792 |      "data": {
 793 |       "text/plain": [
 794 |        "1911"
 795 |       ]
 796 |      },
 797 |      "execution_count": 22,
 798 |      "metadata": {},
 799 |      "output_type": "execute_result"
 800 |     }
 801 |    ],
 802 |    "source": [
 803 |     " composers_df.iloc[0,1]"
 804 |    ]
 805 |   },
 806 |   {
 807 |    "cell_type": "markdown",
 808 |    "metadata": {},
 809 |    "source": [
 810 |     "While ```.loc[]``` allows one to recover elements by using the **explicit** index, on our case the composers name:"
 811 |    ]
 812 |   },
 813 |   {
 814 |    "cell_type": "code",
 815 |    "execution_count": 23,
 816 |    "metadata": {},
 817 |    "outputs": [
 818 |     {
 819 |      "data": {
 820 |       "text/plain": [
 821 |        "1911"
 822 |       ]
 823 |      },
 824 |      "execution_count": 23,
 825 |      "metadata": {},
 826 |      "output_type": "execute_result"
 827 |     }
 828 |    ],
 829 |    "source": [
 830 |     "composers_df.loc['Mahler','death']"
 831 |    ]
 832 |   },
 833 |   {
 834 |    "cell_type": "markdown",
 835 |    "metadata": {},
 836 |    "source": [
 837 |     "**Remember that ```loc``` and ``ìloc``` use brackets [] and not parenthesis ().**\n",
 838 |     "\n",
 839 |     "Numpy style indexing works here too"
 840 |    ]
 841 |   },
 842 |   {
 843 |    "cell_type": "code",
 844 |    "execution_count": 24,
 845 |    "metadata": {},
 846 |    "outputs": [
 847 |     {
 848 |      "data": {
 849 |       "text/html": [
 850 |        "<div>\n",
 851 |        "<style scoped>\n",
 852 |        "    .dataframe tbody tr th:only-of-type {\n",
 853 |        "        vertical-align: middle;\n",
 854 |        "    }\n",
 855 |        "\n",
 856 |        "    .dataframe tbody tr th {\n",
 857 |        "        vertical-align: top;\n",
 858 |        "    }\n",
 859 |        "\n",
 860 |        "    .dataframe thead th {\n",
 861 |        "        text-align: right;\n",
 862 |        "    }\n",
 863 |        "</style>\n",
 864 |        "<table border=\"1\" class=\"dataframe\">\n",
 865 |        "  <thead>\n",
 866 |        "    <tr style=\"text-align: right;\">\n",
 867 |        "      <th></th>\n",
 868 |        "      <th>birth</th>\n",
 869 |        "      <th>death</th>\n",
 870 |        "      <th>city</th>\n",
 871 |        "    </tr>\n",
 872 |        "  </thead>\n",
 873 |        "  <tbody>\n",
 874 |        "    <tr>\n",
 875 |        "      <th>Beethoven</th>\n",
 876 |        "      <td>1770</td>\n",
 877 |        "      <td>1827</td>\n",
 878 |        "      <td>Bonn</td>\n",
 879 |        "    </tr>\n",
 880 |        "    <tr>\n",
 881 |        "      <th>Puccini</th>\n",
 882 |        "      <td>1858</td>\n",
 883 |        "      <td>1924</td>\n",
 884 |        "      <td>Lucques</td>\n",
 885 |        "    </tr>\n",
 886 |        "  </tbody>\n",
 887 |        "</table>\n",
 888 |        "</div>"
 889 |       ],
 890 |       "text/plain": [
 891 |        "           birth  death     city\n",
 892 |        "Beethoven   1770   1827     Bonn\n",
 893 |        "Puccini     1858   1924  Lucques"
 894 |       ]
 895 |      },
 896 |      "execution_count": 24,
 897 |      "metadata": {},
 898 |      "output_type": "execute_result"
 899 |     }
 900 |    ],
 901 |    "source": [
 902 |     "composers_df.iloc[1:3,:]"
 903 |    ]
 904 |   },
 905 |   {
 906 |    "cell_type": "markdown",
 907 |    "metadata": {},
 908 |    "source": [
 909 |     "If you are working with a large table, it might be useful to sometimes have a list of all the columns. This is given by the ```.keys()``` attribute:"
 910 |    ]
 911 |   },
 912 |   {
 913 |    "cell_type": "code",
 914 |    "execution_count": 25,
 915 |    "metadata": {},
 916 |    "outputs": [
 917 |     {
 918 |      "data": {
 919 |       "text/plain": [
 920 |        "Index(['birth', 'death', 'city'], dtype='object')"
 921 |       ]
 922 |      },
 923 |      "execution_count": 25,
 924 |      "metadata": {},
 925 |      "output_type": "execute_result"
 926 |     }
 927 |    ],
 928 |    "source": [
 929 |     "composers_df.keys()"
 930 |    ]
 931 |   },
 932 |   {
 933 |    "cell_type": "markdown",
 934 |    "metadata": {},
 935 |    "source": [
 936 |     "### 7.2.3 Adding columns"
 937 |    ]
 938 |   },
 939 |   {
 940 |    "cell_type": "markdown",
 941 |    "metadata": {},
 942 |    "source": [
 943 |     "It is very simple to add a column to a Dataframe. One can e.g. just create a column a give it a default value that we can change later:"
 944 |    ]
 945 |   },
 946 |   {
 947 |    "cell_type": "code",
 948 |    "execution_count": 26,
 949 |    "metadata": {},
 950 |    "outputs": [],
 951 |    "source": [
 952 |     "composers_df['country'] = 'default'"
 953 |    ]
 954 |   },
 955 |   {
 956 |    "cell_type": "code",
 957 |    "execution_count": 27,
 958 |    "metadata": {},
 959 |    "outputs": [
 960 |     {
 961 |      "data": {
 962 |       "text/html": [
 963 |        "<div>\n",
 964 |        "<style scoped>\n",
 965 |        "    .dataframe tbody tr th:only-of-type {\n",
 966 |        "        vertical-align: middle;\n",
 967 |        "    }\n",
 968 |        "\n",
 969 |        "    .dataframe tbody tr th {\n",
 970 |        "        vertical-align: top;\n",
 971 |        "    }\n",
 972 |        "\n",
 973 |        "    .dataframe thead th {\n",
 974 |        "        text-align: right;\n",
 975 |        "    }\n",
 976 |        "</style>\n",
 977 |        "<table border=\"1\" class=\"dataframe\">\n",
 978 |        "  <thead>\n",
 979 |        "    <tr style=\"text-align: right;\">\n",
 980 |        "      <th></th>\n",
 981 |        "      <th>birth</th>\n",
 982 |        "      <th>death</th>\n",
 983 |        "      <th>city</th>\n",
 984 |        "      <th>country</th>\n",
 985 |        "    </tr>\n",
 986 |        "  </thead>\n",
 987 |        "  <tbody>\n",
 988 |        "    <tr>\n",
 989 |        "      <th>Mahler</th>\n",
 990 |        "      <td>1860</td>\n",
 991 |        "      <td>1911</td>\n",
 992 |        "      <td>Kaliste</td>\n",
 993 |        "      <td>default</td>\n",
 994 |        "    </tr>\n",
 995 |        "    <tr>\n",
 996 |        "      <th>Beethoven</th>\n",
 997 |        "      <td>1770</td>\n",
 998 |        "      <td>1827</td>\n",
 999 |        "      <td>Bonn</td>\n",
1000 |        "      <td>default</td>\n",
1001 |        "    </tr>\n",
1002 |        "    <tr>\n",
1003 |        "      <th>Puccini</th>\n",
1004 |        "      <td>1858</td>\n",
1005 |        "      <td>1924</td>\n",
1006 |        "      <td>Lucques</td>\n",
1007 |        "      <td>default</td>\n",
1008 |        "    </tr>\n",
1009 |        "    <tr>\n",
1010 |        "      <th>Shostakovich</th>\n",
1011 |        "      <td>1906</td>\n",
1012 |        "      <td>1975</td>\n",
1013 |        "      <td>Saint-Petersburg</td>\n",
1014 |        "      <td>default</td>\n",
1015 |        "    </tr>\n",
1016 |        "  </tbody>\n",
1017 |        "</table>\n",
1018 |        "</div>"
1019 |       ],
1020 |       "text/plain": [
1021 |        "              birth  death              city  country\n",
1022 |        "Mahler         1860   1911           Kaliste  default\n",
1023 |        "Beethoven      1770   1827              Bonn  default\n",
1024 |        "Puccini        1858   1924           Lucques  default\n",
1025 |        "Shostakovich   1906   1975  Saint-Petersburg  default"
1026 |       ]
1027 |      },
1028 |      "execution_count": 27,
1029 |      "metadata": {},
1030 |      "output_type": "execute_result"
1031 |     }
1032 |    ],
1033 |    "source": [
1034 |     "composers_df"
1035 |    ]
1036 |   },
1037 |   {
1038 |    "cell_type": "markdown",
1039 |    "metadata": {},
1040 |    "source": [
1041 |     "Or one can use an existing list:"
1042 |    ]
1043 |   },
1044 |   {
1045 |    "cell_type": "code",
1046 |    "execution_count": 28,
1047 |    "metadata": {},
1048 |    "outputs": [],
1049 |    "source": [
1050 |     "country = ['Austria','Germany','Italy','Russia']"
1051 |    ]
1052 |   },
1053 |   {
1054 |    "cell_type": "code",
1055 |    "execution_count": 29,
1056 |    "metadata": {},
1057 |    "outputs": [],
1058 |    "source": [
1059 |     "composers_df['country2'] = country"
1060 |    ]
1061 |   },
1062 |   {
1063 |    "cell_type": "code",
1064 |    "execution_count": 30,
1065 |    "metadata": {},
1066 |    "outputs": [
1067 |     {
1068 |      "data": {
1069 |       "text/html": [
1070 |        "<div>\n",
1071 |        "<style scoped>\n",
1072 |        "    .dataframe tbody tr th:only-of-type {\n",
1073 |        "        vertical-align: middle;\n",
1074 |        "    }\n",
1075 |        "\n",
1076 |        "    .dataframe tbody tr th {\n",
1077 |        "        vertical-align: top;\n",
1078 |        "    }\n",
1079 |        "\n",
1080 |        "    .dataframe thead th {\n",
1081 |        "        text-align: right;\n",
1082 |        "    }\n",
1083 |        "</style>\n",
1084 |        "<table border=\"1\" class=\"dataframe\">\n",
1085 |        "  <thead>\n",
1086 |        "    <tr style=\"text-align: right;\">\n",
1087 |        "      <th></th>\n",
1088 |        "      <th>birth</th>\n",
1089 |        "      <th>death</th>\n",
1090 |        "      <th>city</th>\n",
1091 |        "      <th>country</th>\n",
1092 |        "      <th>country2</th>\n",
1093 |        "    </tr>\n",
1094 |        "  </thead>\n",
1095 |        "  <tbody>\n",
1096 |        "    <tr>\n",
1097 |        "      <th>Mahler</th>\n",
1098 |        "      <td>1860</td>\n",
1099 |        "      <td>1911</td>\n",
1100 |        "      <td>Kaliste</td>\n",
1101 |        "      <td>default</td>\n",
1102 |        "      <td>Austria</td>\n",
1103 |        "    </tr>\n",
1104 |        "    <tr>\n",
1105 |        "      <th>Beethoven</th>\n",
1106 |        "      <td>1770</td>\n",
1107 |        "      <td>1827</td>\n",
1108 |        "      <td>Bonn</td>\n",
1109 |        "      <td>default</td>\n",
1110 |        "      <td>Germany</td>\n",
1111 |        "    </tr>\n",
1112 |        "    <tr>\n",
1113 |        "      <th>Puccini</th>\n",
1114 |        "      <td>1858</td>\n",
1115 |        "      <td>1924</td>\n",
1116 |        "      <td>Lucques</td>\n",
1117 |        "      <td>default</td>\n",
1118 |        "      <td>Italy</td>\n",
1119 |        "    </tr>\n",
1120 |        "    <tr>\n",
1121 |        "      <th>Shostakovich</th>\n",
1122 |        "      <td>1906</td>\n",
1123 |        "      <td>1975</td>\n",
1124 |        "      <td>Saint-Petersburg</td>\n",
1125 |        "      <td>default</td>\n",
1126 |        "      <td>Russia</td>\n",
1127 |        "    </tr>\n",
1128 |        "  </tbody>\n",
1129 |        "</table>\n",
1130 |        "</div>"
1131 |       ],
1132 |       "text/plain": [
1133 |        "              birth  death              city  country country2\n",
1134 |        "Mahler         1860   1911           Kaliste  default  Austria\n",
1135 |        "Beethoven      1770   1827              Bonn  default  Germany\n",
1136 |        "Puccini        1858   1924           Lucques  default    Italy\n",
1137 |        "Shostakovich   1906   1975  Saint-Petersburg  default   Russia"
1138 |       ]
1139 |      },
1140 |      "execution_count": 30,
1141 |      "metadata": {},
1142 |      "output_type": "execute_result"
1143 |     }
1144 |    ],
1145 |    "source": [
1146 |     "composers_df"
1147 |    ]
1148 |   }
1149 |  ],
1150 |  "metadata": {
1151 |   "kernelspec": {
1152 |    "display_name": "Python 3",
1153 |    "language": "python",
1154 |    "name": "python3"
1155 |   },
1156 |   "language_info": {
1157 |    "codemirror_mode": {
1158 |     "name": "ipython",
1159 |     "version": 3
1160 |    },
1161 |    "file_extension": ".py",
1162 |    "mimetype": "text/x-python",
1163 |    "name": "python",
1164 |    "nbconvert_exporter": "python",
1165 |    "pygments_lexer": "ipython3",
1166 |    "version": "3.8.2"
1167 |   }
1168 |  },
1169 |  "nbformat": 4,
1170 |  "nbformat_minor": 4
1171 | }
1172 | 


--------------------------------------------------------------------------------
/09-DA_Pandas_operations.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |  "cells": [
   3 |   {
   4 |    "cell_type": "markdown",
   5 |    "metadata": {},
   6 |    "source": [
   7 |     "# 9. Operations with Pandas objects"
   8 |    ]
   9 |   },
  10 |   {
  11 |    "cell_type": "code",
  12 |    "execution_count": 1,
  13 |    "metadata": {},
  14 |    "outputs": [],
  15 |    "source": [
  16 |     "import pandas as pd\n",
  17 |     "import numpy as np"
  18 |    ]
  19 |   },
  20 |   {
  21 |    "cell_type": "markdown",
  22 |    "metadata": {},
  23 |    "source": [
  24 |     "One of the great advantages of using Pandas to handle tabular data is how simple it is to extract valuable information from them. Here we are going to see various types of operations that are available for this."
  25 |    ]
  26 |   },
  27 |   {
  28 |    "cell_type": "markdown",
  29 |    "metadata": {},
  30 |    "source": [
  31 |     "## 9.1 Matrix types of operations"
  32 |    ]
  33 |   },
  34 |   {
  35 |    "cell_type": "markdown",
  36 |    "metadata": {},
  37 |    "source": [
  38 |     "The strength of Numpy is its natural way of handling matrix operations, and Pandas reuses a lot of these features. For example one can use simple mathematical operations to operate at the cell level: "
  39 |    ]
  40 |   },
  41 |   {
  42 |    "cell_type": "code",
  43 |    "execution_count": 2,
  44 |    "metadata": {},
  45 |    "outputs": [
  46 |     {
  47 |      "data": {
  48 |       "text/html": [
  49 |        "<div>\n",
  50 |        "<style scoped>\n",
  51 |        "    .dataframe tbody tr th:only-of-type {\n",
  52 |        "        vertical-align: middle;\n",
  53 |        "    }\n",
  54 |        "\n",
  55 |        "    .dataframe tbody tr th {\n",
  56 |        "        vertical-align: top;\n",
  57 |        "    }\n",
  58 |        "\n",
  59 |        "    .dataframe thead th {\n",
  60 |        "        text-align: right;\n",
  61 |        "    }\n",
  62 |        "</style>\n",
  63 |        "<table border=\"1\" class=\"dataframe\">\n",
  64 |        "  <thead>\n",
  65 |        "    <tr style=\"text-align: right;\">\n",
  66 |        "      <th></th>\n",
  67 |        "      <th>composer</th>\n",
  68 |        "      <th>birth</th>\n",
  69 |        "      <th>death</th>\n",
  70 |        "      <th>city</th>\n",
  71 |        "    </tr>\n",
  72 |        "  </thead>\n",
  73 |        "  <tbody>\n",
  74 |        "    <tr>\n",
  75 |        "      <th>0</th>\n",
  76 |        "      <td>Mahler</td>\n",
  77 |        "      <td>1860</td>\n",
  78 |        "      <td>1911</td>\n",
  79 |        "      <td>Kaliste</td>\n",
  80 |        "    </tr>\n",
  81 |        "    <tr>\n",
  82 |        "      <th>1</th>\n",
  83 |        "      <td>Beethoven</td>\n",
  84 |        "      <td>1770</td>\n",
  85 |        "      <td>1827</td>\n",
  86 |        "      <td>Bonn</td>\n",
  87 |        "    </tr>\n",
  88 |        "    <tr>\n",
  89 |        "      <th>2</th>\n",
  90 |        "      <td>Puccini</td>\n",
  91 |        "      <td>1858</td>\n",
  92 |        "      <td>1924</td>\n",
  93 |        "      <td>Lucques</td>\n",
  94 |        "    </tr>\n",
  95 |        "    <tr>\n",
  96 |        "      <th>3</th>\n",
  97 |        "      <td>Shostakovich</td>\n",
  98 |        "      <td>1906</td>\n",
  99 |        "      <td>1975</td>\n",
 100 |        "      <td>Saint-Petersburg</td>\n",
 101 |        "    </tr>\n",
 102 |        "  </tbody>\n",
 103 |        "</table>\n",
 104 |        "</div>"
 105 |       ],
 106 |       "text/plain": [
 107 |        "       composer  birth  death              city\n",
 108 |        "0        Mahler   1860   1911           Kaliste\n",
 109 |        "1     Beethoven   1770   1827              Bonn\n",
 110 |        "2       Puccini   1858   1924           Lucques\n",
 111 |        "3  Shostakovich   1906   1975  Saint-Petersburg"
 112 |       ]
 113 |      },
 114 |      "execution_count": 2,
 115 |      "metadata": {},
 116 |      "output_type": "execute_result"
 117 |     }
 118 |    ],
 119 |    "source": [
 120 |     "compo_pd = pd.read_excel('Data/composers.xlsx')\n",
 121 |     "compo_pd"
 122 |    ]
 123 |   },
 124 |   {
 125 |    "cell_type": "code",
 126 |    "execution_count": 3,
 127 |    "metadata": {},
 128 |    "outputs": [
 129 |     {
 130 |      "data": {
 131 |       "text/plain": [
 132 |        "0    3720\n",
 133 |        "1    3540\n",
 134 |        "2    3716\n",
 135 |        "3    3812\n",
 136 |        "Name: birth, dtype: int64"
 137 |       ]
 138 |      },
 139 |      "execution_count": 3,
 140 |      "metadata": {},
 141 |      "output_type": "execute_result"
 142 |     }
 143 |    ],
 144 |    "source": [
 145 |     "compo_pd['birth']*2"
 146 |    ]
 147 |   },
 148 |   {
 149 |    "cell_type": "code",
 150 |    "execution_count": 4,
 151 |    "metadata": {},
 152 |    "outputs": [
 153 |     {
 154 |      "data": {
 155 |       "text/plain": [
 156 |        "0    7.528332\n",
 157 |        "1    7.478735\n",
 158 |        "2    7.527256\n",
 159 |        "3    7.552762\n",
 160 |        "Name: birth, dtype: float64"
 161 |       ]
 162 |      },
 163 |      "execution_count": 4,
 164 |      "metadata": {},
 165 |      "output_type": "execute_result"
 166 |     }
 167 |    ],
 168 |    "source": [
 169 |     "np.log(compo_pd['birth'])"
 170 |    ]
 171 |   },
 172 |   {
 173 |    "cell_type": "markdown",
 174 |    "metadata": {},
 175 |    "source": [
 176 |     "Here we applied functions only to series. Indeed, since our Dataframe contains e.g. strings, no operation can be done on it:"
 177 |    ]
 178 |   },
 179 |   {
 180 |    "cell_type": "code",
 181 |    "execution_count": 5,
 182 |    "metadata": {},
 183 |    "outputs": [],
 184 |    "source": [
 185 |     "#compo_pd+1"
 186 |    ]
 187 |   },
 188 |   {
 189 |    "cell_type": "markdown",
 190 |    "metadata": {},
 191 |    "source": [
 192 |     "If however we have a homogenous Dataframe, this is possible:"
 193 |    ]
 194 |   },
 195 |   {
 196 |    "cell_type": "code",
 197 |    "execution_count": 6,
 198 |    "metadata": {},
 199 |    "outputs": [
 200 |     {
 201 |      "data": {
 202 |       "text/html": [
 203 |        "<div>\n",
 204 |        "<style scoped>\n",
 205 |        "    .dataframe tbody tr th:only-of-type {\n",
 206 |        "        vertical-align: middle;\n",
 207 |        "    }\n",
 208 |        "\n",
 209 |        "    .dataframe tbody tr th {\n",
 210 |        "        vertical-align: top;\n",
 211 |        "    }\n",
 212 |        "\n",
 213 |        "    .dataframe thead th {\n",
 214 |        "        text-align: right;\n",
 215 |        "    }\n",
 216 |        "</style>\n",
 217 |        "<table border=\"1\" class=\"dataframe\">\n",
 218 |        "  <thead>\n",
 219 |        "    <tr style=\"text-align: right;\">\n",
 220 |        "      <th></th>\n",
 221 |        "      <th>birth</th>\n",
 222 |        "      <th>death</th>\n",
 223 |        "    </tr>\n",
 224 |        "  </thead>\n",
 225 |        "  <tbody>\n",
 226 |        "    <tr>\n",
 227 |        "      <th>0</th>\n",
 228 |        "      <td>1860</td>\n",
 229 |        "      <td>1911</td>\n",
 230 |        "    </tr>\n",
 231 |        "    <tr>\n",
 232 |        "      <th>1</th>\n",
 233 |        "      <td>1770</td>\n",
 234 |        "      <td>1827</td>\n",
 235 |        "    </tr>\n",
 236 |        "    <tr>\n",
 237 |        "      <th>2</th>\n",
 238 |        "      <td>1858</td>\n",
 239 |        "      <td>1924</td>\n",
 240 |        "    </tr>\n",
 241 |        "    <tr>\n",
 242 |        "      <th>3</th>\n",
 243 |        "      <td>1906</td>\n",
 244 |        "      <td>1975</td>\n",
 245 |        "    </tr>\n",
 246 |        "  </tbody>\n",
 247 |        "</table>\n",
 248 |        "</div>"
 249 |       ],
 250 |       "text/plain": [
 251 |        "   birth  death\n",
 252 |        "0   1860   1911\n",
 253 |        "1   1770   1827\n",
 254 |        "2   1858   1924\n",
 255 |        "3   1906   1975"
 256 |       ]
 257 |      },
 258 |      "execution_count": 6,
 259 |      "metadata": {},
 260 |      "output_type": "execute_result"
 261 |     }
 262 |    ],
 263 |    "source": [
 264 |     "compo_pd[['birth','death']]"
 265 |    ]
 266 |   },
 267 |   {
 268 |    "cell_type": "code",
 269 |    "execution_count": 7,
 270 |    "metadata": {},
 271 |    "outputs": [
 272 |     {
 273 |      "data": {
 274 |       "text/html": [
 275 |        "<div>\n",
 276 |        "<style scoped>\n",
 277 |        "    .dataframe tbody tr th:only-of-type {\n",
 278 |        "        vertical-align: middle;\n",
 279 |        "    }\n",
 280 |        "\n",
 281 |        "    .dataframe tbody tr th {\n",
 282 |        "        vertical-align: top;\n",
 283 |        "    }\n",
 284 |        "\n",
 285 |        "    .dataframe thead th {\n",
 286 |        "        text-align: right;\n",
 287 |        "    }\n",
 288 |        "</style>\n",
 289 |        "<table border=\"1\" class=\"dataframe\">\n",
 290 |        "  <thead>\n",
 291 |        "    <tr style=\"text-align: right;\">\n",
 292 |        "      <th></th>\n",
 293 |        "      <th>birth</th>\n",
 294 |        "      <th>death</th>\n",
 295 |        "    </tr>\n",
 296 |        "  </thead>\n",
 297 |        "  <tbody>\n",
 298 |        "    <tr>\n",
 299 |        "      <th>0</th>\n",
 300 |        "      <td>3720</td>\n",
 301 |        "      <td>3822</td>\n",
 302 |        "    </tr>\n",
 303 |        "    <tr>\n",
 304 |        "      <th>1</th>\n",
 305 |        "      <td>3540</td>\n",
 306 |        "      <td>3654</td>\n",
 307 |        "    </tr>\n",
 308 |        "    <tr>\n",
 309 |        "      <th>2</th>\n",
 310 |        "      <td>3716</td>\n",
 311 |        "      <td>3848</td>\n",
 312 |        "    </tr>\n",
 313 |        "    <tr>\n",
 314 |        "      <th>3</th>\n",
 315 |        "      <td>3812</td>\n",
 316 |        "      <td>3950</td>\n",
 317 |        "    </tr>\n",
 318 |        "  </tbody>\n",
 319 |        "</table>\n",
 320 |        "</div>"
 321 |       ],
 322 |       "text/plain": [
 323 |        "   birth  death\n",
 324 |        "0   3720   3822\n",
 325 |        "1   3540   3654\n",
 326 |        "2   3716   3848\n",
 327 |        "3   3812   3950"
 328 |       ]
 329 |      },
 330 |      "execution_count": 7,
 331 |      "metadata": {},
 332 |      "output_type": "execute_result"
 333 |     }
 334 |    ],
 335 |    "source": [
 336 |     "compo_pd[['birth','death']]*2"
 337 |    ]
 338 |   },
 339 |   {
 340 |    "cell_type": "markdown",
 341 |    "metadata": {},
 342 |    "source": [
 343 |     "## 9.2 Column operations"
 344 |    ]
 345 |   },
 346 |   {
 347 |    "cell_type": "markdown",
 348 |    "metadata": {},
 349 |    "source": [
 350 |     "There are other types of functions whose purpose is to summarize the data. For example the mean or standard deviation. Pandas by default applies such functions column-wise and returns a series containing e.g. the mean of each column:"
 351 |    ]
 352 |   },
 353 |   {
 354 |    "cell_type": "code",
 355 |    "execution_count": 8,
 356 |    "metadata": {},
 357 |    "outputs": [
 358 |     {
 359 |      "data": {
 360 |       "text/plain": [
 361 |        "birth    1848.50\n",
 362 |        "death    1909.25\n",
 363 |        "dtype: float64"
 364 |       ]
 365 |      },
 366 |      "execution_count": 8,
 367 |      "metadata": {},
 368 |      "output_type": "execute_result"
 369 |     }
 370 |    ],
 371 |    "source": [
 372 |     "np.mean(compo_pd)"
 373 |    ]
 374 |   },
 375 |   {
 376 |    "cell_type": "markdown",
 377 |    "metadata": {},
 378 |    "source": [
 379 |     "Note that columns for which a mean does not make sense, like the city are discarded.\n",
 380 |     "A series of common functions like mean or standard deviation are directly implemented as methods and can be accessed in the alternative form:"
 381 |    ]
 382 |   },
 383 |   {
 384 |    "cell_type": "code",
 385 |    "execution_count": 9,
 386 |    "metadata": {},
 387 |    "outputs": [
 388 |     {
 389 |      "data": {
 390 |       "text/html": [
 391 |        "<div>\n",
 392 |        "<style scoped>\n",
 393 |        "    .dataframe tbody tr th:only-of-type {\n",
 394 |        "        vertical-align: middle;\n",
 395 |        "    }\n",
 396 |        "\n",
 397 |        "    .dataframe tbody tr th {\n",
 398 |        "        vertical-align: top;\n",
 399 |        "    }\n",
 400 |        "\n",
 401 |        "    .dataframe thead th {\n",
 402 |        "        text-align: right;\n",
 403 |        "    }\n",
 404 |        "</style>\n",
 405 |        "<table border=\"1\" class=\"dataframe\">\n",
 406 |        "  <thead>\n",
 407 |        "    <tr style=\"text-align: right;\">\n",
 408 |        "      <th></th>\n",
 409 |        "      <th>birth</th>\n",
 410 |        "      <th>death</th>\n",
 411 |        "    </tr>\n",
 412 |        "  </thead>\n",
 413 |        "  <tbody>\n",
 414 |        "    <tr>\n",
 415 |        "      <th>count</th>\n",
 416 |        "      <td>4.000000</td>\n",
 417 |        "      <td>4.000000</td>\n",
 418 |        "    </tr>\n",
 419 |        "    <tr>\n",
 420 |        "      <th>mean</th>\n",
 421 |        "      <td>1848.500000</td>\n",
 422 |        "      <td>1909.250000</td>\n",
 423 |        "    </tr>\n",
 424 |        "    <tr>\n",
 425 |        "      <th>std</th>\n",
 426 |        "      <td>56.836021</td>\n",
 427 |        "      <td>61.396933</td>\n",
 428 |        "    </tr>\n",
 429 |        "    <tr>\n",
 430 |        "      <th>min</th>\n",
 431 |        "      <td>1770.000000</td>\n",
 432 |        "      <td>1827.000000</td>\n",
 433 |        "    </tr>\n",
 434 |        "    <tr>\n",
 435 |        "      <th>25%</th>\n",
 436 |        "      <td>1836.000000</td>\n",
 437 |        "      <td>1890.000000</td>\n",
 438 |        "    </tr>\n",
 439 |        "    <tr>\n",
 440 |        "      <th>50%</th>\n",
 441 |        "      <td>1859.000000</td>\n",
 442 |        "      <td>1917.500000</td>\n",
 443 |        "    </tr>\n",
 444 |        "    <tr>\n",
 445 |        "      <th>75%</th>\n",
 446 |        "      <td>1871.500000</td>\n",
 447 |        "      <td>1936.750000</td>\n",
 448 |        "    </tr>\n",
 449 |        "    <tr>\n",
 450 |        "      <th>max</th>\n",
 451 |        "      <td>1906.000000</td>\n",
 452 |        "      <td>1975.000000</td>\n",
 453 |        "    </tr>\n",
 454 |        "  </tbody>\n",
 455 |        "</table>\n",
 456 |        "</div>"
 457 |       ],
 458 |       "text/plain": [
 459 |        "             birth        death\n",
 460 |        "count     4.000000     4.000000\n",
 461 |        "mean   1848.500000  1909.250000\n",
 462 |        "std      56.836021    61.396933\n",
 463 |        "min    1770.000000  1827.000000\n",
 464 |        "25%    1836.000000  1890.000000\n",
 465 |        "50%    1859.000000  1917.500000\n",
 466 |        "75%    1871.500000  1936.750000\n",
 467 |        "max    1906.000000  1975.000000"
 468 |       ]
 469 |      },
 470 |      "execution_count": 9,
 471 |      "metadata": {},
 472 |      "output_type": "execute_result"
 473 |     }
 474 |    ],
 475 |    "source": [
 476 |     "compo_pd.describe()"
 477 |    ]
 478 |   },
 479 |   {
 480 |    "cell_type": "code",
 481 |    "execution_count": 10,
 482 |    "metadata": {},
 483 |    "outputs": [
 484 |     {
 485 |      "data": {
 486 |       "text/plain": [
 487 |        "birth    56.836021\n",
 488 |        "death    61.396933\n",
 489 |        "dtype: float64"
 490 |       ]
 491 |      },
 492 |      "execution_count": 10,
 493 |      "metadata": {},
 494 |      "output_type": "execute_result"
 495 |     }
 496 |    ],
 497 |    "source": [
 498 |     "compo_pd.std()"
 499 |    ]
 500 |   },
 501 |   {
 502 |    "cell_type": "markdown",
 503 |    "metadata": {},
 504 |    "source": [
 505 |     "If you need the mean of only a single column you can of course chains operations:"
 506 |    ]
 507 |   },
 508 |   {
 509 |    "cell_type": "code",
 510 |    "execution_count": 11,
 511 |    "metadata": {},
 512 |    "outputs": [
 513 |     {
 514 |      "data": {
 515 |       "text/plain": [
 516 |        "1848.5"
 517 |       ]
 518 |      },
 519 |      "execution_count": 11,
 520 |      "metadata": {},
 521 |      "output_type": "execute_result"
 522 |     }
 523 |    ],
 524 |    "source": [
 525 |     "compo_pd.birth.mean()"
 526 |    ]
 527 |   },
 528 |   {
 529 |    "cell_type": "markdown",
 530 |    "metadata": {},
 531 |    "source": [
 532 |     "## 9.3 Operations between Series"
 533 |    ]
 534 |   },
 535 |   {
 536 |    "cell_type": "markdown",
 537 |    "metadata": {},
 538 |    "source": [
 539 |     "We can also do computations with multiple series as we would do with Numpy arrays:"
 540 |    ]
 541 |   },
 542 |   {
 543 |    "cell_type": "code",
 544 |    "execution_count": 12,
 545 |    "metadata": {},
 546 |    "outputs": [
 547 |     {
 548 |      "data": {
 549 |       "text/plain": [
 550 |        "0    51\n",
 551 |        "1    57\n",
 552 |        "2    66\n",
 553 |        "3    69\n",
 554 |        "dtype: int64"
 555 |       ]
 556 |      },
 557 |      "execution_count": 12,
 558 |      "metadata": {},
 559 |      "output_type": "execute_result"
 560 |     }
 561 |    ],
 562 |    "source": [
 563 |     "compo_pd['death']-compo_pd['birth']"
 564 |    ]
 565 |   },
 566 |   {
 567 |    "cell_type": "markdown",
 568 |    "metadata": {},
 569 |    "source": [
 570 |     "We can even use the result of this computation to create a new column in our Dataframe:"
 571 |    ]
 572 |   },
 573 |   {
 574 |    "cell_type": "code",
 575 |    "execution_count": 13,
 576 |    "metadata": {},
 577 |    "outputs": [
 578 |     {
 579 |      "data": {
 580 |       "text/html": [
 581 |        "<div>\n",
 582 |        "<style scoped>\n",
 583 |        "    .dataframe tbody tr th:only-of-type {\n",
 584 |        "        vertical-align: middle;\n",
 585 |        "    }\n",
 586 |        "\n",
 587 |        "    .dataframe tbody tr th {\n",
 588 |        "        vertical-align: top;\n",
 589 |        "    }\n",
 590 |        "\n",
 591 |        "    .dataframe thead th {\n",
 592 |        "        text-align: right;\n",
 593 |        "    }\n",
 594 |        "</style>\n",
 595 |        "<table border=\"1\" class=\"dataframe\">\n",
 596 |        "  <thead>\n",
 597 |        "    <tr style=\"text-align: right;\">\n",
 598 |        "      <th></th>\n",
 599 |        "      <th>composer</th>\n",
 600 |        "      <th>birth</th>\n",
 601 |        "      <th>death</th>\n",
 602 |        "      <th>city</th>\n",
 603 |        "    </tr>\n",
 604 |        "  </thead>\n",
 605 |        "  <tbody>\n",
 606 |        "    <tr>\n",
 607 |        "      <th>0</th>\n",
 608 |        "      <td>Mahler</td>\n",
 609 |        "      <td>1860</td>\n",
 610 |        "      <td>1911</td>\n",
 611 |        "      <td>Kaliste</td>\n",
 612 |        "    </tr>\n",
 613 |        "    <tr>\n",
 614 |        "      <th>1</th>\n",
 615 |        "      <td>Beethoven</td>\n",
 616 |        "      <td>1770</td>\n",
 617 |        "      <td>1827</td>\n",
 618 |        "      <td>Bonn</td>\n",
 619 |        "    </tr>\n",
 620 |        "    <tr>\n",
 621 |        "      <th>2</th>\n",
 622 |        "      <td>Puccini</td>\n",
 623 |        "      <td>1858</td>\n",
 624 |        "      <td>1924</td>\n",
 625 |        "      <td>Lucques</td>\n",
 626 |        "    </tr>\n",
 627 |        "    <tr>\n",
 628 |        "      <th>3</th>\n",
 629 |        "      <td>Shostakovich</td>\n",
 630 |        "      <td>1906</td>\n",
 631 |        "      <td>1975</td>\n",
 632 |        "      <td>Saint-Petersburg</td>\n",
 633 |        "    </tr>\n",
 634 |        "  </tbody>\n",
 635 |        "</table>\n",
 636 |        "</div>"
 637 |       ],
 638 |       "text/plain": [
 639 |        "       composer  birth  death              city\n",
 640 |        "0        Mahler   1860   1911           Kaliste\n",
 641 |        "1     Beethoven   1770   1827              Bonn\n",
 642 |        "2       Puccini   1858   1924           Lucques\n",
 643 |        "3  Shostakovich   1906   1975  Saint-Petersburg"
 644 |       ]
 645 |      },
 646 |      "execution_count": 13,
 647 |      "metadata": {},
 648 |      "output_type": "execute_result"
 649 |     }
 650 |    ],
 651 |    "source": [
 652 |     "compo_pd"
 653 |    ]
 654 |   },
 655 |   {
 656 |    "cell_type": "code",
 657 |    "execution_count": 14,
 658 |    "metadata": {},
 659 |    "outputs": [],
 660 |    "source": [
 661 |     "compo_pd['age'] = compo_pd['death']-compo_pd['birth']"
 662 |    ]
 663 |   },
 664 |   {
 665 |    "cell_type": "code",
 666 |    "execution_count": 15,
 667 |    "metadata": {},
 668 |    "outputs": [
 669 |     {
 670 |      "data": {
 671 |       "text/html": [
 672 |        "<div>\n",
 673 |        "<style scoped>\n",
 674 |        "    .dataframe tbody tr th:only-of-type {\n",
 675 |        "        vertical-align: middle;\n",
 676 |        "    }\n",
 677 |        "\n",
 678 |        "    .dataframe tbody tr th {\n",
 679 |        "        vertical-align: top;\n",
 680 |        "    }\n",
 681 |        "\n",
 682 |        "    .dataframe thead th {\n",
 683 |        "        text-align: right;\n",
 684 |        "    }\n",
 685 |        "</style>\n",
 686 |        "<table border=\"1\" class=\"dataframe\">\n",
 687 |        "  <thead>\n",
 688 |        "    <tr style=\"text-align: right;\">\n",
 689 |        "      <th></th>\n",
 690 |        "      <th>composer</th>\n",
 691 |        "      <th>birth</th>\n",
 692 |        "      <th>death</th>\n",
 693 |        "      <th>city</th>\n",
 694 |        "      <th>age</th>\n",
 695 |        "    </tr>\n",
 696 |        "  </thead>\n",
 697 |        "  <tbody>\n",
 698 |        "    <tr>\n",
 699 |        "      <th>0</th>\n",
 700 |        "      <td>Mahler</td>\n",
 701 |        "      <td>1860</td>\n",
 702 |        "      <td>1911</td>\n",
 703 |        "      <td>Kaliste</td>\n",
 704 |        "      <td>51</td>\n",
 705 |        "    </tr>\n",
 706 |        "    <tr>\n",
 707 |        "      <th>1</th>\n",
 708 |        "      <td>Beethoven</td>\n",
 709 |        "      <td>1770</td>\n",
 710 |        "      <td>1827</td>\n",
 711 |        "      <td>Bonn</td>\n",
 712 |        "      <td>57</td>\n",
 713 |        "    </tr>\n",
 714 |        "    <tr>\n",
 715 |        "      <th>2</th>\n",
 716 |        "      <td>Puccini</td>\n",
 717 |        "      <td>1858</td>\n",
 718 |        "      <td>1924</td>\n",
 719 |        "      <td>Lucques</td>\n",
 720 |        "      <td>66</td>\n",
 721 |        "    </tr>\n",
 722 |        "    <tr>\n",
 723 |        "      <th>3</th>\n",
 724 |        "      <td>Shostakovich</td>\n",
 725 |        "      <td>1906</td>\n",
 726 |        "      <td>1975</td>\n",
 727 |        "      <td>Saint-Petersburg</td>\n",
 728 |        "      <td>69</td>\n",
 729 |        "    </tr>\n",
 730 |        "  </tbody>\n",
 731 |        "</table>\n",
 732 |        "</div>"
 733 |       ],
 734 |       "text/plain": [
 735 |        "       composer  birth  death              city  age\n",
 736 |        "0        Mahler   1860   1911           Kaliste   51\n",
 737 |        "1     Beethoven   1770   1827              Bonn   57\n",
 738 |        "2       Puccini   1858   1924           Lucques   66\n",
 739 |        "3  Shostakovich   1906   1975  Saint-Petersburg   69"
 740 |       ]
 741 |      },
 742 |      "execution_count": 15,
 743 |      "metadata": {},
 744 |      "output_type": "execute_result"
 745 |     }
 746 |    ],
 747 |    "source": [
 748 |     "compo_pd"
 749 |    ]
 750 |   },
 751 |   {
 752 |    "cell_type": "markdown",
 753 |    "metadata": {},
 754 |    "source": [
 755 |     "## 9.4 Other functions"
 756 |    ]
 757 |   },
 758 |   {
 759 |    "cell_type": "markdown",
 760 |    "metadata": {},
 761 |    "source": [
 762 |     "Sometimes one needs to apply to a column a very specific function that is not provided by default. In that case we can use one of the different ```apply``` methods of Pandas.\n",
 763 |     "\n",
 764 |     "The simplest case is to apply a function to a column, or Series of a DataFrame. Let's say for example that we want to define the the age >60 as 'old' and <60 as 'young'. We can define the following general function:"
 765 |    ]
 766 |   },
 767 |   {
 768 |    "cell_type": "code",
 769 |    "execution_count": 16,
 770 |    "metadata": {},
 771 |    "outputs": [],
 772 |    "source": [
 773 |     "def define_age(x):\n",
 774 |     "    if x>60:\n",
 775 |     "        return 'old'\n",
 776 |     "    else:\n",
 777 |     "        return 'young'"
 778 |    ]
 779 |   },
 780 |   {
 781 |    "cell_type": "code",
 782 |    "execution_count": 17,
 783 |    "metadata": {},
 784 |    "outputs": [
 785 |     {
 786 |      "data": {
 787 |       "text/plain": [
 788 |        "'young'"
 789 |       ]
 790 |      },
 791 |      "execution_count": 17,
 792 |      "metadata": {},
 793 |      "output_type": "execute_result"
 794 |     }
 795 |    ],
 796 |    "source": [
 797 |     "define_age(30)"
 798 |    ]
 799 |   },
 800 |   {
 801 |    "cell_type": "code",
 802 |    "execution_count": 18,
 803 |    "metadata": {},
 804 |    "outputs": [
 805 |     {
 806 |      "data": {
 807 |       "text/plain": [
 808 |        "'old'"
 809 |       ]
 810 |      },
 811 |      "execution_count": 18,
 812 |      "metadata": {},
 813 |      "output_type": "execute_result"
 814 |     }
 815 |    ],
 816 |    "source": [
 817 |     "define_age(70)"
 818 |    ]
 819 |   },
 820 |   {
 821 |    "cell_type": "markdown",
 822 |    "metadata": {},
 823 |    "source": [
 824 |     "We can now apply this function on an entire Series:"
 825 |    ]
 826 |   },
 827 |   {
 828 |    "cell_type": "code",
 829 |    "execution_count": 19,
 830 |    "metadata": {},
 831 |    "outputs": [
 832 |     {
 833 |      "data": {
 834 |       "text/plain": [
 835 |        "0    young\n",
 836 |        "1    young\n",
 837 |        "2      old\n",
 838 |        "3      old\n",
 839 |        "Name: age, dtype: object"
 840 |       ]
 841 |      },
 842 |      "execution_count": 19,
 843 |      "metadata": {},
 844 |      "output_type": "execute_result"
 845 |     }
 846 |    ],
 847 |    "source": [
 848 |     "compo_pd.age.apply(define_age)"
 849 |    ]
 850 |   },
 851 |   {
 852 |    "cell_type": "code",
 853 |    "execution_count": 20,
 854 |    "metadata": {},
 855 |    "outputs": [
 856 |     {
 857 |      "data": {
 858 |       "text/plain": [
 859 |        "0    2601\n",
 860 |        "1    3249\n",
 861 |        "2    4356\n",
 862 |        "3    4761\n",
 863 |        "Name: age, dtype: int64"
 864 |       ]
 865 |      },
 866 |      "execution_count": 20,
 867 |      "metadata": {},
 868 |      "output_type": "execute_result"
 869 |     }
 870 |    ],
 871 |    "source": [
 872 |     "compo_pd.age.apply(lambda x: x**2)"
 873 |    ]
 874 |   },
 875 |   {
 876 |    "cell_type": "markdown",
 877 |    "metadata": {},
 878 |    "source": [
 879 |     "And again, if we want, we can directly use this output to create a new column:"
 880 |    ]
 881 |   },
 882 |   {
 883 |    "cell_type": "code",
 884 |    "execution_count": 21,
 885 |    "metadata": {},
 886 |    "outputs": [
 887 |     {
 888 |      "data": {
 889 |       "text/html": [
 890 |        "<div>\n",
 891 |        "<style scoped>\n",
 892 |        "    .dataframe tbody tr th:only-of-type {\n",
 893 |        "        vertical-align: middle;\n",
 894 |        "    }\n",
 895 |        "\n",
 896 |        "    .dataframe tbody tr th {\n",
 897 |        "        vertical-align: top;\n",
 898 |        "    }\n",
 899 |        "\n",
 900 |        "    .dataframe thead th {\n",
 901 |        "        text-align: right;\n",
 902 |        "    }\n",
 903 |        "</style>\n",
 904 |        "<table border=\"1\" class=\"dataframe\">\n",
 905 |        "  <thead>\n",
 906 |        "    <tr style=\"text-align: right;\">\n",
 907 |        "      <th></th>\n",
 908 |        "      <th>composer</th>\n",
 909 |        "      <th>birth</th>\n",
 910 |        "      <th>death</th>\n",
 911 |        "      <th>city</th>\n",
 912 |        "      <th>age</th>\n",
 913 |        "      <th>age_def</th>\n",
 914 |        "    </tr>\n",
 915 |        "  </thead>\n",
 916 |        "  <tbody>\n",
 917 |        "    <tr>\n",
 918 |        "      <th>0</th>\n",
 919 |        "      <td>Mahler</td>\n",
 920 |        "      <td>1860</td>\n",
 921 |        "      <td>1911</td>\n",
 922 |        "      <td>Kaliste</td>\n",
 923 |        "      <td>51</td>\n",
 924 |        "      <td>young</td>\n",
 925 |        "    </tr>\n",
 926 |        "    <tr>\n",
 927 |        "      <th>1</th>\n",
 928 |        "      <td>Beethoven</td>\n",
 929 |        "      <td>1770</td>\n",
 930 |        "      <td>1827</td>\n",
 931 |        "      <td>Bonn</td>\n",
 932 |        "      <td>57</td>\n",
 933 |        "      <td>young</td>\n",
 934 |        "    </tr>\n",
 935 |        "    <tr>\n",
 936 |        "      <th>2</th>\n",
 937 |        "      <td>Puccini</td>\n",
 938 |        "      <td>1858</td>\n",
 939 |        "      <td>1924</td>\n",
 940 |        "      <td>Lucques</td>\n",
 941 |        "      <td>66</td>\n",
 942 |        "      <td>old</td>\n",
 943 |        "    </tr>\n",
 944 |        "    <tr>\n",
 945 |        "      <th>3</th>\n",
 946 |        "      <td>Shostakovich</td>\n",
 947 |        "      <td>1906</td>\n",
 948 |        "      <td>1975</td>\n",
 949 |        "      <td>Saint-Petersburg</td>\n",
 950 |        "      <td>69</td>\n",
 951 |        "      <td>old</td>\n",
 952 |        "    </tr>\n",
 953 |        "  </tbody>\n",
 954 |        "</table>\n",
 955 |        "</div>"
 956 |       ],
 957 |       "text/plain": [
 958 |        "       composer  birth  death              city  age age_def\n",
 959 |        "0        Mahler   1860   1911           Kaliste   51   young\n",
 960 |        "1     Beethoven   1770   1827              Bonn   57   young\n",
 961 |        "2       Puccini   1858   1924           Lucques   66     old\n",
 962 |        "3  Shostakovich   1906   1975  Saint-Petersburg   69     old"
 963 |       ]
 964 |      },
 965 |      "execution_count": 21,
 966 |      "metadata": {},
 967 |      "output_type": "execute_result"
 968 |     }
 969 |    ],
 970 |    "source": [
 971 |     "compo_pd['age_def'] = compo_pd.age.apply(define_age)\n",
 972 |     "compo_pd"
 973 |    ]
 974 |   },
 975 |   {
 976 |    "cell_type": "markdown",
 977 |    "metadata": {},
 978 |    "source": [
 979 |     "We can also apply a function to an entire DataFrame. For example we can ask how many composers have birth and death dates within the XIXth century:"
 980 |    ]
 981 |   },
 982 |   {
 983 |    "cell_type": "code",
 984 |    "execution_count": 22,
 985 |    "metadata": {},
 986 |    "outputs": [],
 987 |    "source": [
 988 |     "def nineteen_century_count(x):\n",
 989 |     "    return np.sum((x>=1800)&(x<1900))\n"
 990 |    ]
 991 |   },
 992 |   {
 993 |    "cell_type": "code",
 994 |    "execution_count": 23,
 995 |    "metadata": {},
 996 |    "outputs": [
 997 |     {
 998 |      "data": {
 999 |       "text/plain": [
1000 |        "birth    2\n",
1001 |        "death    1\n",
1002 |        "dtype: int64"
1003 |       ]
1004 |      },
1005 |      "execution_count": 23,
1006 |      "metadata": {},
1007 |      "output_type": "execute_result"
1008 |     }
1009 |    ],
1010 |    "source": [
1011 |     "compo_pd[['birth','death']].apply(nineteen_century_count)"
1012 |    ]
1013 |   },
1014 |   {
1015 |    "cell_type": "markdown",
1016 |    "metadata": {},
1017 |    "source": [
1018 |     "The function is applied column-wise and returns a single number for each in the form of a series."
1019 |    ]
1020 |   },
1021 |   {
1022 |    "cell_type": "code",
1023 |    "execution_count": 24,
1024 |    "metadata": {},
1025 |    "outputs": [],
1026 |    "source": [
1027 |     "def nineteen_century_true(x):\n",
1028 |     "    return (x>=1800)&(x<1900)\n"
1029 |    ]
1030 |   },
1031 |   {
1032 |    "cell_type": "code",
1033 |    "execution_count": 25,
1034 |    "metadata": {},
1035 |    "outputs": [
1036 |     {
1037 |      "data": {
1038 |       "text/html": [
1039 |        "<div>\n",
1040 |        "<style scoped>\n",
1041 |        "    .dataframe tbody tr th:only-of-type {\n",
1042 |        "        vertical-align: middle;\n",
1043 |        "    }\n",
1044 |        "\n",
1045 |        "    .dataframe tbody tr th {\n",
1046 |        "        vertical-align: top;\n",
1047 |        "    }\n",
1048 |        "\n",
1049 |        "    .dataframe thead th {\n",
1050 |        "        text-align: right;\n",
1051 |        "    }\n",
1052 |        "</style>\n",
1053 |        "<table border=\"1\" class=\"dataframe\">\n",
1054 |        "  <thead>\n",
1055 |        "    <tr style=\"text-align: right;\">\n",
1056 |        "      <th></th>\n",
1057 |        "      <th>birth</th>\n",
1058 |        "      <th>death</th>\n",
1059 |        "    </tr>\n",
1060 |        "  </thead>\n",
1061 |        "  <tbody>\n",
1062 |        "    <tr>\n",
1063 |        "      <th>0</th>\n",
1064 |        "      <td>True</td>\n",
1065 |        "      <td>False</td>\n",
1066 |        "    </tr>\n",
1067 |        "    <tr>\n",
1068 |        "      <th>1</th>\n",
1069 |        "      <td>False</td>\n",
1070 |        "      <td>True</td>\n",
1071 |        "    </tr>\n",
1072 |        "    <tr>\n",
1073 |        "      <th>2</th>\n",
1074 |        "      <td>True</td>\n",
1075 |        "      <td>False</td>\n",
1076 |        "    </tr>\n",
1077 |        "    <tr>\n",
1078 |        "      <th>3</th>\n",
1079 |        "      <td>False</td>\n",
1080 |        "      <td>False</td>\n",
1081 |        "    </tr>\n",
1082 |        "  </tbody>\n",
1083 |        "</table>\n",
1084 |        "</div>"
1085 |       ],
1086 |       "text/plain": [
1087 |        "   birth  death\n",
1088 |        "0   True  False\n",
1089 |        "1  False   True\n",
1090 |        "2   True  False\n",
1091 |        "3  False  False"
1092 |       ]
1093 |      },
1094 |      "execution_count": 25,
1095 |      "metadata": {},
1096 |      "output_type": "execute_result"
1097 |     }
1098 |    ],
1099 |    "source": [
1100 |     "compo_pd[['birth','death']].apply(nineteen_century_true)"
1101 |    ]
1102 |   },
1103 |   {
1104 |    "cell_type": "markdown",
1105 |    "metadata": {},
1106 |    "source": [
1107 |     "Here the operation is again applied column-wise but the output is a Series.\n",
1108 |     "\n",
1109 |     "There are more combinations of what can be the in- and output of the apply function and in what order (column- or row-wise) they are applied that cannot be covered here."
1110 |    ]
1111 |   },
1112 |   {
1113 |    "cell_type": "markdown",
1114 |    "metadata": {},
1115 |    "source": [
1116 |     "## 9.5 Logical indexing"
1117 |    ]
1118 |   },
1119 |   {
1120 |    "cell_type": "markdown",
1121 |    "metadata": {},
1122 |    "source": [
1123 |     "Just like with Numpy, it is possible to subselect parts of a Dataframe using logical indexing. Let's have a look again at an example:"
1124 |    ]
1125 |   },
1126 |   {
1127 |    "cell_type": "code",
1128 |    "execution_count": 26,
1129 |    "metadata": {},
1130 |    "outputs": [
1131 |     {
1132 |      "data": {
1133 |       "text/html": [
1134 |        "<div>\n",
1135 |        "<style scoped>\n",
1136 |        "    .dataframe tbody tr th:only-of-type {\n",
1137 |        "        vertical-align: middle;\n",
1138 |        "    }\n",
1139 |        "\n",
1140 |        "    .dataframe tbody tr th {\n",
1141 |        "        vertical-align: top;\n",
1142 |        "    }\n",
1143 |        "\n",
1144 |        "    .dataframe thead th {\n",
1145 |        "        text-align: right;\n",
1146 |        "    }\n",
1147 |        "</style>\n",
1148 |        "<table border=\"1\" class=\"dataframe\">\n",
1149 |        "  <thead>\n",
1150 |        "    <tr style=\"text-align: right;\">\n",
1151 |        "      <th></th>\n",
1152 |        "      <th>composer</th>\n",
1153 |        "      <th>birth</th>\n",
1154 |        "      <th>death</th>\n",
1155 |        "      <th>city</th>\n",
1156 |        "      <th>age</th>\n",
1157 |        "      <th>age_def</th>\n",
1158 |        "    </tr>\n",
1159 |        "  </thead>\n",
1160 |        "  <tbody>\n",
1161 |        "    <tr>\n",
1162 |        "      <th>0</th>\n",
1163 |        "      <td>Mahler</td>\n",
1164 |        "      <td>1860</td>\n",
1165 |        "      <td>1911</td>\n",
1166 |        "      <td>Kaliste</td>\n",
1167 |        "      <td>51</td>\n",
1168 |        "      <td>young</td>\n",
1169 |        "    </tr>\n",
1170 |        "    <tr>\n",
1171 |        "      <th>1</th>\n",
1172 |        "      <td>Beethoven</td>\n",
1173 |        "      <td>1770</td>\n",
1174 |        "      <td>1827</td>\n",
1175 |        "      <td>Bonn</td>\n",
1176 |        "      <td>57</td>\n",
1177 |        "      <td>young</td>\n",
1178 |        "    </tr>\n",
1179 |        "    <tr>\n",
1180 |        "      <th>2</th>\n",
1181 |        "      <td>Puccini</td>\n",
1182 |        "      <td>1858</td>\n",
1183 |        "      <td>1924</td>\n",
1184 |        "      <td>Lucques</td>\n",
1185 |        "      <td>66</td>\n",
1186 |        "      <td>old</td>\n",
1187 |        "    </tr>\n",
1188 |        "    <tr>\n",
1189 |        "      <th>3</th>\n",
1190 |        "      <td>Shostakovich</td>\n",
1191 |        "      <td>1906</td>\n",
1192 |        "      <td>1975</td>\n",
1193 |        "      <td>Saint-Petersburg</td>\n",
1194 |        "      <td>69</td>\n",
1195 |        "      <td>old</td>\n",
1196 |        "    </tr>\n",
1197 |        "  </tbody>\n",
1198 |        "</table>\n",
1199 |        "</div>"
1200 |       ],
1201 |       "text/plain": [
1202 |        "       composer  birth  death              city  age age_def\n",
1203 |        "0        Mahler   1860   1911           Kaliste   51   young\n",
1204 |        "1     Beethoven   1770   1827              Bonn   57   young\n",
1205 |        "2       Puccini   1858   1924           Lucques   66     old\n",
1206 |        "3  Shostakovich   1906   1975  Saint-Petersburg   69     old"
1207 |       ]
1208 |      },
1209 |      "execution_count": 26,
1210 |      "metadata": {},
1211 |      "output_type": "execute_result"
1212 |     }
1213 |    ],
1214 |    "source": [
1215 |     "compo_pd"
1216 |    ]
1217 |   },
1218 |   {
1219 |    "cell_type": "markdown",
1220 |    "metadata": {},
1221 |    "source": [
1222 |     "If we use a logical comparison on a series, this yields a **logical Series**:"
1223 |    ]
1224 |   },
1225 |   {
1226 |    "cell_type": "code",
1227 |    "execution_count": 27,
1228 |    "metadata": {},
1229 |    "outputs": [
1230 |     {
1231 |      "data": {
1232 |       "text/plain": [
1233 |        "0    1860\n",
1234 |        "1    1770\n",
1235 |        "2    1858\n",
1236 |        "3    1906\n",
1237 |        "Name: birth, dtype: int64"
1238 |       ]
1239 |      },
1240 |      "execution_count": 27,
1241 |      "metadata": {},
1242 |      "output_type": "execute_result"
1243 |     }
1244 |    ],
1245 |    "source": [
1246 |     "compo_pd['birth']"
1247 |    ]
1248 |   },
1249 |   {
1250 |    "cell_type": "code",
1251 |    "execution_count": 28,
1252 |    "metadata": {},
1253 |    "outputs": [
1254 |     {
1255 |      "data": {
1256 |       "text/plain": [
1257 |        "0     True\n",
1258 |        "1    False\n",
1259 |        "2    False\n",
1260 |        "3     True\n",
1261 |        "Name: birth, dtype: bool"
1262 |       ]
1263 |      },
1264 |      "execution_count": 28,
1265 |      "metadata": {},
1266 |      "output_type": "execute_result"
1267 |     }
1268 |    ],
1269 |    "source": [
1270 |     "compo_pd['birth'] > 1859"
1271 |    ]
1272 |   },
1273 |   {
1274 |    "cell_type": "markdown",
1275 |    "metadata": {},
1276 |    "source": [
1277 |     "Just like in Numpy we can use this logical Series as an index to select elements in the Dataframe:"
1278 |    ]
1279 |   },
1280 |   {
1281 |    "cell_type": "code",
1282 |    "execution_count": 29,
1283 |    "metadata": {},
1284 |    "outputs": [
1285 |     {
1286 |      "data": {
1287 |       "text/plain": [
1288 |        "0     True\n",
1289 |        "1    False\n",
1290 |        "2    False\n",
1291 |        "3     True\n",
1292 |        "Name: birth, dtype: bool"
1293 |       ]
1294 |      },
1295 |      "execution_count": 29,
1296 |      "metadata": {},
1297 |      "output_type": "execute_result"
1298 |     }
1299 |    ],
1300 |    "source": [
1301 |     "log_indexer = compo_pd['birth'] > 1859\n",
1302 |     "log_indexer"
1303 |    ]
1304 |   },
1305 |   {
1306 |    "cell_type": "code",
1307 |    "execution_count": 30,
1308 |    "metadata": {},
1309 |    "outputs": [
1310 |     {
1311 |      "data": {
1312 |       "text/html": [
1313 |        "<div>\n",
1314 |        "<style scoped>\n",
1315 |        "    .dataframe tbody tr th:only-of-type {\n",
1316 |        "        vertical-align: middle;\n",
1317 |        "    }\n",
1318 |        "\n",
1319 |        "    .dataframe tbody tr th {\n",
1320 |        "        vertical-align: top;\n",
1321 |        "    }\n",
1322 |        "\n",
1323 |        "    .dataframe thead th {\n",
1324 |        "        text-align: right;\n",
1325 |        "    }\n",
1326 |        "</style>\n",
1327 |        "<table border=\"1\" class=\"dataframe\">\n",
1328 |        "  <thead>\n",
1329 |        "    <tr style=\"text-align: right;\">\n",
1330 |        "      <th></th>\n",
1331 |        "      <th>composer</th>\n",
1332 |        "      <th>birth</th>\n",
1333 |        "      <th>death</th>\n",
1334 |        "      <th>city</th>\n",
1335 |        "      <th>age</th>\n",
1336 |        "      <th>age_def</th>\n",
1337 |        "    </tr>\n",
1338 |        "  </thead>\n",
1339 |        "  <tbody>\n",
1340 |        "    <tr>\n",
1341 |        "      <th>0</th>\n",
1342 |        "      <td>Mahler</td>\n",
1343 |        "      <td>1860</td>\n",
1344 |        "      <td>1911</td>\n",
1345 |        "      <td>Kaliste</td>\n",
1346 |        "      <td>51</td>\n",
1347 |        "      <td>young</td>\n",
1348 |        "    </tr>\n",
1349 |        "    <tr>\n",
1350 |        "      <th>1</th>\n",
1351 |        "      <td>Beethoven</td>\n",
1352 |        "      <td>1770</td>\n",
1353 |        "      <td>1827</td>\n",
1354 |        "      <td>Bonn</td>\n",
1355 |        "      <td>57</td>\n",
1356 |        "      <td>young</td>\n",
1357 |        "    </tr>\n",
1358 |        "    <tr>\n",
1359 |        "      <th>2</th>\n",
1360 |        "      <td>Puccini</td>\n",
1361 |        "      <td>1858</td>\n",
1362 |        "      <td>1924</td>\n",
1363 |        "      <td>Lucques</td>\n",
1364 |        "      <td>66</td>\n",
1365 |        "      <td>old</td>\n",
1366 |        "    </tr>\n",
1367 |        "    <tr>\n",
1368 |        "      <th>3</th>\n",
1369 |        "      <td>Shostakovich</td>\n",
1370 |        "      <td>1906</td>\n",
1371 |        "      <td>1975</td>\n",
1372 |        "      <td>Saint-Petersburg</td>\n",
1373 |        "      <td>69</td>\n",
1374 |        "      <td>old</td>\n",
1375 |        "    </tr>\n",
1376 |        "  </tbody>\n",
1377 |        "</table>\n",
1378 |        "</div>"
1379 |       ],
1380 |       "text/plain": [
1381 |        "       composer  birth  death              city  age age_def\n",
1382 |        "0        Mahler   1860   1911           Kaliste   51   young\n",
1383 |        "1     Beethoven   1770   1827              Bonn   57   young\n",
1384 |        "2       Puccini   1858   1924           Lucques   66     old\n",
1385 |        "3  Shostakovich   1906   1975  Saint-Petersburg   69     old"
1386 |       ]
1387 |      },
1388 |      "execution_count": 30,
1389 |      "metadata": {},
1390 |      "output_type": "execute_result"
1391 |     }
1392 |    ],
1393 |    "source": [
1394 |     "compo_pd"
1395 |    ]
1396 |   },
1397 |   {
1398 |    "cell_type": "code",
1399 |    "execution_count": 31,
1400 |    "metadata": {},
1401 |    "outputs": [
1402 |     {
1403 |      "data": {
1404 |       "text/plain": [
1405 |        "0    False\n",
1406 |        "1     True\n",
1407 |        "2     True\n",
1408 |        "3    False\n",
1409 |        "Name: birth, dtype: bool"
1410 |       ]
1411 |      },
1412 |      "execution_count": 31,
1413 |      "metadata": {},
1414 |      "output_type": "execute_result"
1415 |     }
1416 |    ],
1417 |    "source": [
1418 |     "~log_indexer"
1419 |    ]
1420 |   },
1421 |   {
1422 |    "cell_type": "code",
1423 |    "execution_count": 32,
1424 |    "metadata": {},
1425 |    "outputs": [
1426 |     {
1427 |      "data": {
1428 |       "text/html": [
1429 |        "<div>\n",
1430 |        "<style scoped>\n",
1431 |        "    .dataframe tbody tr th:only-of-type {\n",
1432 |        "        vertical-align: middle;\n",
1433 |        "    }\n",
1434 |        "\n",
1435 |        "    .dataframe tbody tr th {\n",
1436 |        "        vertical-align: top;\n",
1437 |        "    }\n",
1438 |        "\n",
1439 |        "    .dataframe thead th {\n",
1440 |        "        text-align: right;\n",
1441 |        "    }\n",
1442 |        "</style>\n",
1443 |        "<table border=\"1\" class=\"dataframe\">\n",
1444 |        "  <thead>\n",
1445 |        "    <tr style=\"text-align: right;\">\n",
1446 |        "      <th></th>\n",
1447 |        "      <th>composer</th>\n",
1448 |        "      <th>birth</th>\n",
1449 |        "      <th>death</th>\n",
1450 |        "      <th>city</th>\n",
1451 |        "      <th>age</th>\n",
1452 |        "      <th>age_def</th>\n",
1453 |        "    </tr>\n",
1454 |        "  </thead>\n",
1455 |        "  <tbody>\n",
1456 |        "    <tr>\n",
1457 |        "      <th>1</th>\n",
1458 |        "      <td>Beethoven</td>\n",
1459 |        "      <td>1770</td>\n",
1460 |        "      <td>1827</td>\n",
1461 |        "      <td>Bonn</td>\n",
1462 |        "      <td>57</td>\n",
1463 |        "      <td>young</td>\n",
1464 |        "    </tr>\n",
1465 |        "    <tr>\n",
1466 |        "      <th>2</th>\n",
1467 |        "      <td>Puccini</td>\n",
1468 |        "      <td>1858</td>\n",
1469 |        "      <td>1924</td>\n",
1470 |        "      <td>Lucques</td>\n",
1471 |        "      <td>66</td>\n",
1472 |        "      <td>old</td>\n",
1473 |        "    </tr>\n",
1474 |        "  </tbody>\n",
1475 |        "</table>\n",
1476 |        "</div>"
1477 |       ],
1478 |       "text/plain": [
1479 |        "    composer  birth  death     city  age age_def\n",
1480 |        "1  Beethoven   1770   1827     Bonn   57   young\n",
1481 |        "2    Puccini   1858   1924  Lucques   66     old"
1482 |       ]
1483 |      },
1484 |      "execution_count": 32,
1485 |      "metadata": {},
1486 |      "output_type": "execute_result"
1487 |     }
1488 |    ],
1489 |    "source": [
1490 |     "compo_pd[~log_indexer]"
1491 |    ]
1492 |   },
1493 |   {
1494 |    "cell_type": "markdown",
1495 |    "metadata": {},
1496 |    "source": [
1497 |     "We can also create more complex logical indexings: "
1498 |    ]
1499 |   },
1500 |   {
1501 |    "cell_type": "code",
1502 |    "execution_count": 33,
1503 |    "metadata": {},
1504 |    "outputs": [
1505 |     {
1506 |      "data": {
1507 |       "text/plain": [
1508 |        "0    False\n",
1509 |        "1    False\n",
1510 |        "2    False\n",
1511 |        "3     True\n",
1512 |        "dtype: bool"
1513 |       ]
1514 |      },
1515 |      "execution_count": 33,
1516 |      "metadata": {},
1517 |      "output_type": "execute_result"
1518 |     }
1519 |    ],
1520 |    "source": [
1521 |     "(compo_pd['birth'] > 1859)&(compo_pd['age']>60)"
1522 |    ]
1523 |   },
1524 |   {
1525 |    "cell_type": "code",
1526 |    "execution_count": 34,
1527 |    "metadata": {},
1528 |    "outputs": [
1529 |     {
1530 |      "data": {
1531 |       "text/html": [
1532 |        "<div>\n",
1533 |        "<style scoped>\n",
1534 |        "    .dataframe tbody tr th:only-of-type {\n",
1535 |        "        vertical-align: middle;\n",
1536 |        "    }\n",
1537 |        "\n",
1538 |        "    .dataframe tbody tr th {\n",
1539 |        "        vertical-align: top;\n",
1540 |        "    }\n",
1541 |        "\n",
1542 |        "    .dataframe thead th {\n",
1543 |        "        text-align: right;\n",
1544 |        "    }\n",
1545 |        "</style>\n",
1546 |        "<table border=\"1\" class=\"dataframe\">\n",
1547 |        "  <thead>\n",
1548 |        "    <tr style=\"text-align: right;\">\n",
1549 |        "      <th></th>\n",
1550 |        "      <th>composer</th>\n",
1551 |        "      <th>birth</th>\n",
1552 |        "      <th>death</th>\n",
1553 |        "      <th>city</th>\n",
1554 |        "      <th>age</th>\n",
1555 |        "      <th>age_def</th>\n",
1556 |        "    </tr>\n",
1557 |        "  </thead>\n",
1558 |        "  <tbody>\n",
1559 |        "    <tr>\n",
1560 |        "      <th>3</th>\n",
1561 |        "      <td>Shostakovich</td>\n",
1562 |        "      <td>1906</td>\n",
1563 |        "      <td>1975</td>\n",
1564 |        "      <td>Saint-Petersburg</td>\n",
1565 |        "      <td>69</td>\n",
1566 |        "      <td>old</td>\n",
1567 |        "    </tr>\n",
1568 |        "  </tbody>\n",
1569 |        "</table>\n",
1570 |        "</div>"
1571 |       ],
1572 |       "text/plain": [
1573 |        "       composer  birth  death              city  age age_def\n",
1574 |        "3  Shostakovich   1906   1975  Saint-Petersburg   69     old"
1575 |       ]
1576 |      },
1577 |      "execution_count": 34,
1578 |      "metadata": {},
1579 |      "output_type": "execute_result"
1580 |     }
1581 |    ],
1582 |    "source": [
1583 |     "compo_pd[(compo_pd['birth'] > 1859)&(compo_pd['age']>60)]"
1584 |    ]
1585 |   },
1586 |   {
1587 |    "cell_type": "markdown",
1588 |    "metadata": {},
1589 |    "source": [
1590 |     "And we can create new arrays containing only these subselections:"
1591 |    ]
1592 |   },
1593 |   {
1594 |    "cell_type": "code",
1595 |    "execution_count": 35,
1596 |    "metadata": {},
1597 |    "outputs": [],
1598 |    "source": [
1599 |     "compos_sub = compo_pd[compo_pd['birth'] > 1859]"
1600 |    ]
1601 |   },
1602 |   {
1603 |    "cell_type": "code",
1604 |    "execution_count": 36,
1605 |    "metadata": {},
1606 |    "outputs": [
1607 |     {
1608 |      "data": {
1609 |       "text/html": [
1610 |        "<div>\n",
1611 |        "<style scoped>\n",
1612 |        "    .dataframe tbody tr th:only-of-type {\n",
1613 |        "        vertical-align: middle;\n",
1614 |        "    }\n",
1615 |        "\n",
1616 |        "    .dataframe tbody tr th {\n",
1617 |        "        vertical-align: top;\n",
1618 |        "    }\n",
1619 |        "\n",
1620 |        "    .dataframe thead th {\n",
1621 |        "        text-align: right;\n",
1622 |        "    }\n",
1623 |        "</style>\n",
1624 |        "<table border=\"1\" class=\"dataframe\">\n",
1625 |        "  <thead>\n",
1626 |        "    <tr style=\"text-align: right;\">\n",
1627 |        "      <th></th>\n",
1628 |        "      <th>composer</th>\n",
1629 |        "      <th>birth</th>\n",
1630 |        "      <th>death</th>\n",
1631 |        "      <th>city</th>\n",
1632 |        "      <th>age</th>\n",
1633 |        "      <th>age_def</th>\n",
1634 |        "    </tr>\n",
1635 |        "  </thead>\n",
1636 |        "  <tbody>\n",
1637 |        "    <tr>\n",
1638 |        "      <th>0</th>\n",
1639 |        "      <td>Mahler</td>\n",
1640 |        "      <td>1860</td>\n",
1641 |        "      <td>1911</td>\n",
1642 |        "      <td>Kaliste</td>\n",
1643 |        "      <td>51</td>\n",
1644 |        "      <td>young</td>\n",
1645 |        "    </tr>\n",
1646 |        "    <tr>\n",
1647 |        "      <th>3</th>\n",
1648 |        "      <td>Shostakovich</td>\n",
1649 |        "      <td>1906</td>\n",
1650 |        "      <td>1975</td>\n",
1651 |        "      <td>Saint-Petersburg</td>\n",
1652 |        "      <td>69</td>\n",
1653 |        "      <td>old</td>\n",
1654 |        "    </tr>\n",
1655 |        "  </tbody>\n",
1656 |        "</table>\n",
1657 |        "</div>"
1658 |       ],
1659 |       "text/plain": [
1660 |        "       composer  birth  death              city  age age_def\n",
1661 |        "0        Mahler   1860   1911           Kaliste   51   young\n",
1662 |        "3  Shostakovich   1906   1975  Saint-Petersburg   69     old"
1663 |       ]
1664 |      },
1665 |      "execution_count": 36,
1666 |      "metadata": {},
1667 |      "output_type": "execute_result"
1668 |     }
1669 |    ],
1670 |    "source": [
1671 |     "compos_sub"
1672 |    ]
1673 |   },
1674 |   {
1675 |    "cell_type": "markdown",
1676 |    "metadata": {},
1677 |    "source": [
1678 |     "We can then modify the new array:"
1679 |    ]
1680 |   },
1681 |   {
1682 |    "cell_type": "code",
1683 |    "execution_count": 37,
1684 |    "metadata": {},
1685 |    "outputs": [
1686 |     {
1687 |      "name": "stderr",
1688 |      "output_type": "stream",
1689 |      "text": [
1690 |       "/Users/gw18g940/miniconda3/envs/danalytics/lib/python3.8/site-packages/pandas/core/indexing.py:966: SettingWithCopyWarning: \n",
1691 |       "A value is trying to be set on a copy of a slice from a DataFrame.\n",
1692 |       "Try using .loc[row_indexer,col_indexer] = value instead\n",
1693 |       "\n",
1694 |       "See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy\n",
1695 |       "  self.obj[item] = s\n"
1696 |      ]
1697 |     }
1698 |    ],
1699 |    "source": [
1700 |     "compos_sub.loc[0,'birth'] = 3000"
1701 |    ]
1702 |   },
1703 |   {
1704 |    "cell_type": "markdown",
1705 |    "metadata": {},
1706 |    "source": [
1707 |     "Note that we get this SettingWithCopyWarning warning. This is a very common problem hand has to do with how new arrays are created when making subselections. Simply stated, did we create an entirely new array or a \"view\" of the old one? This will be very case-dependent and to avoid this, if we want to create a new array we can just enforce it using the ```copy()``` method (for more information on the topic see for example this [explanation](https://www.dataquest.io/blog/settingwithcopywarning/):"
1708 |    ]
1709 |   },
1710 |   {
1711 |    "cell_type": "code",
1712 |    "execution_count": 38,
1713 |    "metadata": {},
1714 |    "outputs": [],
1715 |    "source": [
1716 |     "compos_sub2 = compo_pd[compo_pd['birth'] > 1859].copy()\n",
1717 |     "compos_sub2.loc[0,'birth'] = 3000"
1718 |    ]
1719 |   },
1720 |   {
1721 |    "cell_type": "code",
1722 |    "execution_count": 39,
1723 |    "metadata": {},
1724 |    "outputs": [
1725 |     {
1726 |      "data": {
1727 |       "text/html": [
1728 |        "<div>\n",
1729 |        "<style scoped>\n",
1730 |        "    .dataframe tbody tr th:only-of-type {\n",
1731 |        "        vertical-align: middle;\n",
1732 |        "    }\n",
1733 |        "\n",
1734 |        "    .dataframe tbody tr th {\n",
1735 |        "        vertical-align: top;\n",
1736 |        "    }\n",
1737 |        "\n",
1738 |        "    .dataframe thead th {\n",
1739 |        "        text-align: right;\n",
1740 |        "    }\n",
1741 |        "</style>\n",
1742 |        "<table border=\"1\" class=\"dataframe\">\n",
1743 |        "  <thead>\n",
1744 |        "    <tr style=\"text-align: right;\">\n",
1745 |        "      <th></th>\n",
1746 |        "      <th>composer</th>\n",
1747 |        "      <th>birth</th>\n",
1748 |        "      <th>death</th>\n",
1749 |        "      <th>city</th>\n",
1750 |        "      <th>age</th>\n",
1751 |        "      <th>age_def</th>\n",
1752 |        "    </tr>\n",
1753 |        "  </thead>\n",
1754 |        "  <tbody>\n",
1755 |        "    <tr>\n",
1756 |        "      <th>0</th>\n",
1757 |        "      <td>Mahler</td>\n",
1758 |        "      <td>3000</td>\n",
1759 |        "      <td>1911</td>\n",
1760 |        "      <td>Kaliste</td>\n",
1761 |        "      <td>51</td>\n",
1762 |        "      <td>young</td>\n",
1763 |        "    </tr>\n",
1764 |        "    <tr>\n",
1765 |        "      <th>3</th>\n",
1766 |        "      <td>Shostakovich</td>\n",
1767 |        "      <td>1906</td>\n",
1768 |        "      <td>1975</td>\n",
1769 |        "      <td>Saint-Petersburg</td>\n",
1770 |        "      <td>69</td>\n",
1771 |        "      <td>old</td>\n",
1772 |        "    </tr>\n",
1773 |        "  </tbody>\n",
1774 |        "</table>\n",
1775 |        "</div>"
1776 |       ],
1777 |       "text/plain": [
1778 |        "       composer  birth  death              city  age age_def\n",
1779 |        "0        Mahler   3000   1911           Kaliste   51   young\n",
1780 |        "3  Shostakovich   1906   1975  Saint-Petersburg   69     old"
1781 |       ]
1782 |      },
1783 |      "execution_count": 39,
1784 |      "metadata": {},
1785 |      "output_type": "execute_result"
1786 |     }
1787 |    ],
1788 |    "source": [
1789 |     "compos_sub2"
1790 |    ]
1791 |   }
1792 |  ],
1793 |  "metadata": {
1794 |   "kernelspec": {
1795 |    "display_name": "Python 3",
1796 |    "language": "python",
1797 |    "name": "python3"
1798 |   },
1799 |   "language_info": {
1800 |    "codemirror_mode": {
1801 |     "name": "ipython",
1802 |     "version": 3
1803 |    },
1804 |    "file_extension": ".py",
1805 |    "mimetype": "text/x-python",
1806 |    "name": "python",
1807 |    "nbconvert_exporter": "python",
1808 |    "pygments_lexer": "ipython3",
1809 |    "version": "3.8.2"
1810 |   }
1811 |  },
1812 |  "nbformat": 4,
1813 |  "nbformat_minor": 4
1814 | }
1815 | 


--------------------------------------------------------------------------------
/98-DA_Numpy_Exercises.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 2,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import numpy as np\n",
 10 |     "import matplotlib.pyplot as plt"
 11 |    ]
 12 |   },
 13 |   {
 14 |    "cell_type": "markdown",
 15 |    "metadata": {},
 16 |    "source": [
 17 |     "# Exercice Numpy"
 18 |    ]
 19 |   },
 20 |   {
 21 |    "cell_type": "markdown",
 22 |    "metadata": {},
 23 |    "source": [
 24 |     "## 1. Array creation"
 25 |    ]
 26 |   },
 27 |   {
 28 |    "cell_type": "markdown",
 29 |    "metadata": {},
 30 |    "source": [
 31 |     "- Create a 1D array called ```xarray``` with values from 0 to 10 and in steps of 0.1. Check the shape of the array:"
 32 |    ]
 33 |   },
 34 |   {
 35 |    "cell_type": "code",
 36 |    "execution_count": null,
 37 |    "metadata": {},
 38 |    "outputs": [],
 39 |    "source": []
 40 |   },
 41 |   {
 42 |    "cell_type": "markdown",
 43 |    "metadata": {},
 44 |    "source": [
 45 |     "- Create an array of normally distributed numbers with mean $\\mu=0$ and standard deviation $\\sigma=0.5$. It should have 20 rows and as many columns as there are elements in ```xarray```. Call it ```normal_array```:"
 46 |    ]
 47 |   },
 48 |   {
 49 |    "cell_type": "code",
 50 |    "execution_count": null,
 51 |    "metadata": {},
 52 |    "outputs": [],
 53 |    "source": []
 54 |   },
 55 |   {
 56 |    "cell_type": "markdown",
 57 |    "metadata": {},
 58 |    "source": [
 59 |     "- Check the type of ```normal_array```:"
 60 |    ]
 61 |   },
 62 |   {
 63 |    "cell_type": "code",
 64 |    "execution_count": null,
 65 |    "metadata": {},
 66 |    "outputs": [],
 67 |    "source": []
 68 |   },
 69 |   {
 70 |    "cell_type": "markdown",
 71 |    "metadata": {},
 72 |    "source": [
 73 |     "## 2. Array mathematics"
 74 |    ]
 75 |   },
 76 |   {
 77 |    "cell_type": "markdown",
 78 |    "metadata": {},
 79 |    "source": [
 80 |     "- Using ```xarray``` as x-variable, create a new array ```yarray``` as y-variable using the function $y = 10* cos(x) * e^{-0.1x}$:"
 81 |    ]
 82 |   },
 83 |   {
 84 |    "cell_type": "code",
 85 |    "execution_count": null,
 86 |    "metadata": {},
 87 |    "outputs": [],
 88 |    "source": []
 89 |   },
 90 |   {
 91 |    "cell_type": "markdown",
 92 |    "metadata": {},
 93 |    "source": [
 94 |     "- Create ```array_abs``` by taking the absolute value of ```yarray```:"
 95 |    ]
 96 |   },
 97 |   {
 98 |    "cell_type": "code",
 99 |    "execution_count": null,
100 |    "metadata": {},
101 |    "outputs": [],
102 |    "source": []
103 |   },
104 |   {
105 |    "cell_type": "markdown",
106 |    "metadata": {},
107 |    "source": [
108 |     "- Create a boolan array (logical array) where all positions $>0.3$ in ```array_abs``` are ```True``` and the others ```False```"
109 |    ]
110 |   },
111 |   {
112 |    "cell_type": "code",
113 |    "execution_count": null,
114 |    "metadata": {},
115 |    "outputs": [],
116 |    "source": []
117 |   },
118 |   {
119 |    "cell_type": "markdown",
120 |    "metadata": {},
121 |    "source": [
122 |     "- Create a standard deviation projection along the second dimension (columns) of ```normal_array```. Check that the dimensions are the ones you expected. Also are the values around the value you expect?"
123 |    ]
124 |   },
125 |   {
126 |    "cell_type": "code",
127 |    "execution_count": null,
128 |    "metadata": {},
129 |    "outputs": [],
130 |    "source": []
131 |   },
132 |   {
133 |    "cell_type": "markdown",
134 |    "metadata": {},
135 |    "source": [
136 |     "## 3. Plotting\n",
137 |     "\n",
138 |     "- Use a line plot to plot ```yarray``` vs ```xarray```:"
139 |    ]
140 |   },
141 |   {
142 |    "cell_type": "code",
143 |    "execution_count": null,
144 |    "metadata": {},
145 |    "outputs": [],
146 |    "source": []
147 |   },
148 |   {
149 |    "cell_type": "markdown",
150 |    "metadata": {},
151 |    "source": [
152 |     "- Try to change the color of the plot to red and to have markers on top of the line as squares:"
153 |    ]
154 |   },
155 |   {
156 |    "cell_type": "code",
157 |    "execution_count": null,
158 |    "metadata": {},
159 |    "outputs": [],
160 |    "source": []
161 |   },
162 |   {
163 |    "cell_type": "markdown",
164 |    "metadata": {},
165 |    "source": [
166 |     "- Plot the ```normal_array``` as an imagage and change the colormap to 'gray':"
167 |    ]
168 |   },
169 |   {
170 |    "cell_type": "code",
171 |    "execution_count": null,
172 |    "metadata": {},
173 |    "outputs": [],
174 |    "source": []
175 |   },
176 |   {
177 |    "cell_type": "markdown",
178 |    "metadata": {},
179 |    "source": [
180 |     "- Assemble the two above plots in a figure with one row and two columns grid:"
181 |    ]
182 |   },
183 |   {
184 |    "cell_type": "code",
185 |    "execution_count": null,
186 |    "metadata": {},
187 |    "outputs": [],
188 |    "source": []
189 |   },
190 |   {
191 |    "cell_type": "markdown",
192 |    "metadata": {},
193 |    "source": [
194 |     "## 4. Indexing\n",
195 |     "\n",
196 |     "- Create new arrays where you select every second element from xarray and yarray. Plot them on top of ```xarray``` and ```yarray```."
197 |    ]
198 |   },
199 |   {
200 |    "cell_type": "code",
201 |    "execution_count": null,
202 |    "metadata": {},
203 |    "outputs": [],
204 |    "source": []
205 |   },
206 |   {
207 |    "cell_type": "markdown",
208 |    "metadata": {},
209 |    "source": [
210 |     "- Select all values of ```yarray``` that are larger than 0. Plot those on top of the regular ```xarray``` and ```yarray```plot."
211 |    ]
212 |   },
213 |   {
214 |    "cell_type": "code",
215 |    "execution_count": null,
216 |    "metadata": {},
217 |    "outputs": [],
218 |    "source": []
219 |   },
220 |   {
221 |    "cell_type": "markdown",
222 |    "metadata": {},
223 |    "source": [
224 |     "- Flip the order of ```xarray``` use it to plot ```yarray```:"
225 |    ]
226 |   },
227 |   {
228 |    "cell_type": "code",
229 |    "execution_count": null,
230 |    "metadata": {},
231 |    "outputs": [],
232 |    "source": []
233 |   },
234 |   {
235 |    "cell_type": "markdown",
236 |    "metadata": {},
237 |    "source": [
238 |     "## 5. Combining arrays\n",
239 |     "\n",
240 |     "- Create an array filled with ones with the same shape as ```normal_array```. Concatenate it to ```normal_array``` along the first dimensions and plot the result:"
241 |    ]
242 |   },
243 |   {
244 |    "cell_type": "code",
245 |    "execution_count": null,
246 |    "metadata": {},
247 |    "outputs": [],
248 |    "source": []
249 |   },
250 |   {
251 |    "cell_type": "markdown",
252 |    "metadata": {},
253 |    "source": [
254 |     "- ```yarray``` represents a signal. Each line of ```normal_array``` represents a possible random noise for that signal. Using broadcasting, try to create an array of noisy versions of ```yarray``` using ```normal_array```. Finally, plot it:"
255 |    ]
256 |   },
257 |   {
258 |    "cell_type": "code",
259 |    "execution_count": null,
260 |    "metadata": {},
261 |    "outputs": [],
262 |    "source": []
263 |   }
264 |  ],
265 |  "metadata": {
266 |   "kernelspec": {
267 |    "display_name": "Python 3",
268 |    "language": "python",
269 |    "name": "python3"
270 |   },
271 |   "language_info": {
272 |    "codemirror_mode": {
273 |     "name": "ipython",
274 |     "version": 3
275 |    },
276 |    "file_extension": ".py",
277 |    "mimetype": "text/x-python",
278 |    "name": "python",
279 |    "nbconvert_exporter": "python",
280 |    "pygments_lexer": "ipython3",
281 |    "version": "3.8.5"
282 |   },
283 |   "toc": {
284 |    "base_numbering": 1,
285 |    "nav_menu": {},
286 |    "number_sections": false,
287 |    "sideBar": true,
288 |    "skip_h1_title": false,
289 |    "title_cell": "Table of Contents",
290 |    "title_sidebar": "Contents",
291 |    "toc_cell": false,
292 |    "toc_position": {},
293 |    "toc_section_display": true,
294 |    "toc_window_display": true
295 |   },
296 |   "varInspector": {
297 |    "cols": {
298 |     "lenName": 16,
299 |     "lenType": 16,
300 |     "lenVar": 40
301 |    },
302 |    "kernels_config": {
303 |     "python": {
304 |      "delete_cmd_postfix": "",
305 |      "delete_cmd_prefix": "del ",
306 |      "library": "var_list.py",
307 |      "varRefreshCmd": "print(var_dic_list())"
308 |     },
309 |     "r": {
310 |      "delete_cmd_postfix": ") ",
311 |      "delete_cmd_prefix": "rm(",
312 |      "library": "var_list.r",
313 |      "varRefreshCmd": "cat(var_dic_list()) "
314 |     }
315 |    },
316 |    "types_to_exclude": [
317 |     "module",
318 |     "function",
319 |     "builtin_function_or_method",
320 |     "instance",
321 |     "_Feature"
322 |    ],
323 |    "window_display": false
324 |   }
325 |  },
326 |  "nbformat": 4,
327 |  "nbformat_minor": 4
328 | }
329 | 


--------------------------------------------------------------------------------
/99-DA_Pandas_Exercises.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 21,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import pandas as pd\n",
 10 |     "import numpy as np\n",
 11 |     "import matplotlib.pyplot as plt\n",
 12 |     "import seaborn as sns"
 13 |    ]
 14 |   },
 15 |   {
 16 |    "cell_type": "markdown",
 17 |    "metadata": {},
 18 |    "source": [
 19 |     "# Exercise Pandas"
 20 |    ]
 21 |   },
 22 |   {
 23 |    "cell_type": "markdown",
 24 |    "metadata": {},
 25 |    "source": [
 26 |     "For these exercices we are using a [dataset](https://www.kaggle.com/dgomonov/new-york-city-airbnb-open-data/kernels) provided by Airbnb for a Kaggle competition. It describes its offer for New York City in 2019, including types of appartments, price, location etc."
 27 |    ]
 28 |   },
 29 |   {
 30 |    "cell_type": "markdown",
 31 |    "metadata": {},
 32 |    "source": [
 33 |     "## 1. Create a dataframe \n",
 34 |     "Create a dataframe of a few lines with objects and their poperties (e.g fruits, their weight and colour).\n",
 35 |     "Calculate the mean of your Dataframe."
 36 |    ]
 37 |   },
 38 |   {
 39 |    "cell_type": "markdown",
 40 |    "metadata": {},
 41 |    "source": [
 42 |     "## 2. Import\n",
 43 |     "- Import the table called ```AB_NYC_2019.csv``` as a dataframe. It is located in the Datasets folder. Have a look at the beginning of the table (head).\n",
 44 |     "\n",
 45 |     "- Create a histogram of prices"
 46 |    ]
 47 |   },
 48 |   {
 49 |    "cell_type": "markdown",
 50 |    "metadata": {},
 51 |    "source": [
 52 |     "## 3. Operations"
 53 |    ]
 54 |   },
 55 |   {
 56 |    "cell_type": "markdown",
 57 |    "metadata": {},
 58 |    "source": [
 59 |     "Create a new column in the dataframe by multiplying the \"price\" and \"availability_365\" columns to get an estimate of the maximum yearly income."
 60 |    ]
 61 |   },
 62 |   {
 63 |    "cell_type": "markdown",
 64 |    "metadata": {},
 65 |    "source": [
 66 |     "## 3b. Subselection and plotting\n",
 67 |     "Create a new Dataframe by first subselecting yearly incomes between 1 and 100'000. Then make a scatter plot of yearly income versus number of reviews "
 68 |    ]
 69 |   },
 70 |   {
 71 |    "cell_type": "markdown",
 72 |    "metadata": {},
 73 |    "source": [
 74 |     "## 4. Combine"
 75 |    ]
 76 |   },
 77 |   {
 78 |    "cell_type": "markdown",
 79 |    "metadata": {},
 80 |    "source": [
 81 |     "We provide below and additional table that contains the number of inhabitants of each of New York's boroughs (\"neighbourhood_group\" in the table). Use ```merge``` to add this population information to each element in the original dataframe."
 82 |    ]
 83 |   },
 84 |   {
 85 |    "cell_type": "markdown",
 86 |    "metadata": {},
 87 |    "source": [
 88 |     "## 5. Groups"
 89 |    ]
 90 |   },
 91 |   {
 92 |    "cell_type": "markdown",
 93 |    "metadata": {},
 94 |    "source": [
 95 |     "- Using ```groupby``` calculate the average price for each type of room (room_type) in each neighbourhood_group. What is the average price for an entire home in Brooklyn ?\n",
 96 |     "- Unstack the multi-level Dataframe into a regular Dataframe with ```unstack()``` and create a bar plot with the resulting table\n"
 97 |    ]
 98 |   },
 99 |   {
100 |    "cell_type": "markdown",
101 |    "metadata": {},
102 |    "source": [
103 |     "## 6. Advanced plotting"
104 |    ]
105 |   },
106 |   {
107 |    "cell_type": "markdown",
108 |    "metadata": {},
109 |    "source": [
110 |     "Using Seaborn, create a scatter plot where x and y positions are longitude and lattitude, the color reflects price and the shape of the marker the borough (neighbourhood_group). Can you recognize parts of new york ? Does the map make sense ?"
111 |    ]
112 |   }
113 |  ],
114 |  "metadata": {
115 |   "kernelspec": {
116 |    "display_name": "Python 3",
117 |    "language": "python",
118 |    "name": "python3"
119 |   },
120 |   "language_info": {
121 |    "codemirror_mode": {
122 |     "name": "ipython",
123 |     "version": 3
124 |    },
125 |    "file_extension": ".py",
126 |    "mimetype": "text/x-python",
127 |    "name": "python",
128 |    "nbconvert_exporter": "python",
129 |    "pygments_lexer": "ipython3",
130 |    "version": "3.8.2"
131 |   }
132 |  },
133 |  "nbformat": 4,
134 |  "nbformat_minor": 4
135 | }
136 | 


--------------------------------------------------------------------------------
/Data/composers.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/guiwitz/NumpyPandas_course/63506c8e1229483512786323539fbcf853ae8495/Data/composers.xlsx


--------------------------------------------------------------------------------
/Data/ny_boroughs.xlsx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/guiwitz/NumpyPandas_course/63506c8e1229483512786323539fbcf853ae8495/Data/ny_boroughs.xlsx


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
 1 | BSD 3-Clause License
 2 | 
 3 | Copyright (c) 2020, Guillaume Witz
 4 | All rights reserved.
 5 | 
 6 | Redistribution and use in source and binary forms, with or without
 7 | modification, are permitted provided that the following conditions are met:
 8 | 
 9 | 1. Redistributions of source code must retain the above copyright notice, this
10 |    list of conditions and the following disclaimer.
11 | 
12 | 2. Redistributions in binary form must reproduce the above copyright notice,
13 |    this list of conditions and the following disclaimer in the documentation
14 |    and/or other materials provided with the distribution.
15 | 
16 | 3. Neither the name of the copyright holder nor the names of its
17 |    contributors may be used to endorse or promote products derived from
18 |    this software without specific prior written permission.
19 | 
20 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
21 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
22 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
23 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
24 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
25 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
26 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
27 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
28 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
29 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
30 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/guiwitz/NumpyPandas_course/54488164b462644baf601875be69cc911eda9615?urlpath=lab)
 2 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/guiwitz/NumpyPandas_course/blob/colab)
 3 | 
 4 | 
 5 | # Introduction to Numpy and Pandas
 6 | 
 7 | This repository contains Jupyter notebooks introducing beginners to the Python packages Numpy and Pandas. The material has been designed for people already familiar with Python but not with its "scientific stack".
 8 | 
 9 | This material has been created by Guillaume Witz (Science IT Support, Microscopy Imaging Center, Bern University) in the frame of the [courses offered by ScITS](https://www.scits.unibe.ch/).
10 | 
11 | ## Content
12 | The course has the following content:
13 | 
14 | ### Numpy
15 | - [Numpy arrays:](01-DA_Numpy_arrays_creation.ipynb): what they are and how to create, import and save them
16 | - [Maths with Numpy arrays](02-DA_Numpy_array_maths.ipynb): applying functions to arrays, doing basic statistics with arrays
17 | - [Numpy and Matplotlib](03-DA_Numpy_matplotlib.ipynb): Basics of plotting Numpy arrays with Matplotlib
18 | - [Recovering parts of arrays](04-DA_Numpy_indexing.ipynb): Using array coordinates to extract information (indexing, slicing)
19 | - [Combining arrays](05-DA_Numpy_combining_arrays.ipynb): Assembling arrays by concatenation, stacking etc. Combining arrays of different sizes (broadcasting)
20 |   
21 | ### Pandas
22 | - [Introduction to Pandas](06-DA_Pandas_introduction.ipynb): What does Pandas offer?
23 | - [Pandas data structures](07-DA_Pandas_structures.ipynb): Series and dataframes
24 | - [Importing data to Pandas](08-DA_Pandas_import_plotting.ipynb): Importing data tables into Pandas (from Excel, CSV) and plotting them
25 | - [Pandas operations](09-DA_Pandas_operations.ipynb): Applying functions to the contents of Pandas dataframes (classical statistics, ```apply``` function etc.)
26 | - [Combining Pandas dataframes](10-DA_Pandas_combine.ipynb): Using concatenation or join operations to combine dataframes
27 | - [Analyzing Pandas dataframes](11-DA_Pandas_splitting.ipynb): Split dataframes into groups (```groupy```) for category-based analysis
28 | - [A real-world example](12-DA_Pandas_realworld.ipynb): Complete pipeline including data import, cleaning, analysis and plotting and showing the nitty-gritty issues one often faces with real data
29 | 
30 | ## Running the course
31 | 
32 | ### Live sessions
33 | 
34 | During live sessions of the course, you are given access to a private Jupyter session and don't need to install anything no your computer.
35 | 
36 | ### Without installation
37 | Outside live-sessions, this entire course can still be run interactively without any local installation thanks to the [mybinder](mybinder.org) service. For that just click on the mybinder tag at the top of this Readme. This will open a Jupyter session for you with all packages, notebooks and data available to run.
38 | 
39 | Alternatively you can also run the course on Google Colab. For that just click on the Colab badge at the top of this file.
40 | 
41 | ### Local installation
42 | For a local installation, we recommend using conda to create a specific environment to run the code. If you don't yet have conda, you can e.g. install miniconda, see [here](https://docs.conda.io/en/latest/miniconda.html) for instructions. Then:
43 | 
44 | 1. Clone the repository to your computer using [this link](https://github.com/guiwitz/NumpyPandas_course/archive/master.zip) and unzip it
45 | 2. Open a terminal and move to the ```NumpyPandas_course-master/binder``` folder
46 | 3. Here you find an ```environment.yml``` file that you can use to create a conda environment. Choose an environment name e.g. ```numpypandas``` and type:
47 |    ```
48 |    conda env create -n numpypandas -f environment.yml
49 |    ```
50 | 4. When you want to run the material, activate the environment and start jupyter:
51 |    ```
52 |    conda activate numpypandas
53 |    jupyter lab
54 |    ```
55 |    Note that the top folder of your directory in Jupyter is the folder from where you started Jupyter. So if you are e.g. in the ```binder``` folder, move one level up to have access to the notebooks
56 | 
57 | ## Note on the data used
58 | 
59 | In the Pandas part, we use some data provided publicly by the Swiss National Science foundation at this link: http://p3.snf.ch/Pages/DataAndDocumentation.aspx#DataDownload. The examples of analysis on these data **are in no way confirmed or validated by the SNSF and are entirely the work of Guillaume Witz, Science IT Support, Bern University**.
60 | 
61 | 


--------------------------------------------------------------------------------
/binder/environment.yml:
--------------------------------------------------------------------------------
 1 | channels:
 2 |   - conda-forge
 3 | dependencies:
 4 |   - numpy
 5 |   - matplotlib
 6 |   - scikit-learn
 7 |   - scikit-image
 8 |   - pandas
 9 |   - jupyter
10 |   - jupyterlab=1.2.*
11 |   - jupyter_contrib_nbextensions
12 |   - tqdm
13 |   - seaborn
14 |   - pip
15 |   - nodejs
16 |   - ipywidgets
17 |   - pip:
18 |     - plotnine
19 |     - xlrd


--------------------------------------------------------------------------------
/binder/postBuild:
--------------------------------------------------------------------------------
1 | jupyter labextension install @jupyterlab/toc --no-build
2 | jupyter labextension install @jupyter-widgets/jupyterlab-manager --no-build
3 | jupyter labextension install @lckr/jupyterlab_variableinspector --no-build
4 | 
5 | jupyter lab build


--------------------------------------------------------------------------------
/colab/automate_colab_editing.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |  "cells": [
  3 |   {
  4 |    "cell_type": "code",
  5 |    "execution_count": 1,
  6 |    "metadata": {},
  7 |    "outputs": [],
  8 |    "source": [
  9 |     "import os, re, glob"
 10 |    ]
 11 |   },
 12 |   {
 13 |    "cell_type": "markdown",
 14 |    "metadata": {},
 15 |    "source": [
 16 |     "## Collect notebooks from regular branch"
 17 |    ]
 18 |   },
 19 |   {
 20 |    "cell_type": "code",
 21 |    "execution_count": 4,
 22 |    "metadata": {},
 23 |    "outputs": [],
 24 |    "source": [
 25 |     "notebooks_or = glob.glob('/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/*.ipynb')\n"
 26 |    ]
 27 |   },
 28 |   {
 29 |    "cell_type": "markdown",
 30 |    "metadata": {},
 31 |    "source": [
 32 |     "## Find which packages to add in each notebook by looking for \"special\" packages"
 33 |    ]
 34 |   },
 35 |   {
 36 |    "cell_type": "code",
 37 |    "execution_count": 6,
 38 |    "metadata": {},
 39 |    "outputs": [],
 40 |    "source": [
 41 |     "external_packages = ['aicsimageio','ipyvolume','mrc','trackpy','stardist','cellpose']\n",
 42 |     "new_packages = []\n",
 43 |     "for noteb in notebooks_or:\n",
 44 |     "    with open(noteb) as n:\n",
 45 |     "        all_lines = n.readlines()\n",
 46 |     "        to_add = []\n",
 47 |     "        for a in all_lines:\n",
 48 |     "            if len(a) < 1000:\n",
 49 |     "                for e in external_packages:\n",
 50 |     "                    if a.find(e) > 0:\n",
 51 |     "                        if e not in to_add:\n",
 52 |     "                            to_add.append(e)\n",
 53 |     "    new_packages.append(to_add)"
 54 |    ]
 55 |   },
 56 |   {
 57 |    "cell_type": "code",
 58 |    "execution_count": 7,
 59 |    "metadata": {},
 60 |    "outputs": [
 61 |     {
 62 |      "data": {
 63 |       "text/plain": [
 64 |        "[[], [], [], [], [], [], [], [], [], [], [], [], [], [], [], []]"
 65 |       ]
 66 |      },
 67 |      "execution_count": 7,
 68 |      "metadata": {},
 69 |      "output_type": "execute_result"
 70 |     }
 71 |    ],
 72 |    "source": [
 73 |     "new_packages"
 74 |    ]
 75 |   },
 76 |   {
 77 |    "cell_type": "markdown",
 78 |    "metadata": {},
 79 |    "source": [
 80 |     "## Define basic cells to add to notebook"
 81 |    ]
 82 |   },
 83 |   {
 84 |    "cell_type": "code",
 85 |    "execution_count": 18,
 86 |    "metadata": {},
 87 |    "outputs": [],
 88 |    "source": [
 89 |     "data_import = \"\"\"  {\n",
 90 |     "   \"cell_type\": \"code\",\n",
 91 |     "   \"execution_count\": null,\n",
 92 |     "   \"metadata\": {},\n",
 93 |     "   \"outputs\": [],\n",
 94 |     "   \"source\": [\n",
 95 |     "    \"import sys, os\\\\n\",\n",
 96 |     "    \"if 'google.colab' in sys.modules:\\\\n\",\n",
 97 |     "    \"    if not os.path.isdir('Data'):\\\\n\",\n",
 98 |     "    \"        !curl https://raw.githubusercontent.com/guiwitz/NumpyPandas_course/master/colab/colab_data.sh -o colab_data.sh\\\\n\",\n",
 99 |     "    \"        !curl https://raw.githubusercontent.com/guiwitz/NumpyPandas_course/master/svg.py -o svg.py\\\\n\",\n",
100 |     "    \"        !sh colab_data.sh\"\n",
101 |     "   ]\n",
102 |     "  },\\n\"\"\""
103 |    ]
104 |   },
105 |   {
106 |    "cell_type": "markdown",
107 |    "metadata": {},
108 |    "source": [
109 |     "## Define where to save new notebooks (colab branch)"
110 |    ]
111 |   },
112 |   {
113 |    "cell_type": "code",
114 |    "execution_count": 19,
115 |    "metadata": {},
116 |    "outputs": [],
117 |    "source": [
118 |     "newpath = '/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course_colab\\\n",
119 |     "/NumpyPandas_course/'\n"
120 |    ]
121 |   },
122 |   {
123 |    "cell_type": "markdown",
124 |    "metadata": {},
125 |    "source": [
126 |     "## Add Google drive import and package installation to each notebook"
127 |    ]
128 |   },
129 |   {
130 |    "cell_type": "code",
131 |    "execution_count": 23,
132 |    "metadata": {},
133 |    "outputs": [
134 |     {
135 |      "name": "stdout",
136 |      "output_type": "stream",
137 |      "text": [
138 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/05-DA_Numpy_combining_arrays.ipynb\n",
139 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/99-DA_Pandas_Exercises.ipynb\n",
140 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/98-DA_Numpy_Exercises.ipynb\n",
141 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/01-DA_Numpy_arrays_creation.ipynb\n",
142 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/09-DA_Pandas_operations.ipynb\n",
143 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/11-DA_Pandas_splitting.ipynb\n",
144 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/08-DA_Pandas_import_plotting.ipynb\n",
145 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/06-DA_Pandas_introduction.ipynb\n",
146 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/02-DA_Numpy_array_maths.ipynb\n",
147 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/03-DA_Numpy_matplotlib.ipynb\n",
148 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/10-DA_Pandas_combine.ipynb\n",
149 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/12-DA_Pandas_realworld.ipynb\n",
150 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/99-DA_Pandas_Solutions.ipynb\n",
151 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/04-DA_Numpy_indexing.ipynb\n",
152 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/07-DA_Pandas_structures.ipynb\n",
153 |       "/Users/gw18g940/OneDrive - Universitaet Bern/Courses/DataAnalytics_course/DataAnalytics_course/98-DA_Numpy_Solutions.ipynb\n"
154 |      ]
155 |     }
156 |    ],
157 |    "source": [
158 |     "for ind, n in enumerate(notebooks_or):\n",
159 |     "    print(n)\n",
160 |     "    fh = newpath + n.split('/')[-1]\n",
161 |     "    counter = 0\n",
162 |     "\n",
163 |     "\n",
164 |     "    with open(fh,'w') as new_file:\n",
165 |     "        with open(n) as old_file:\n",
166 |     "            for line in old_file:\n",
167 |     "                if counter == 2:\n",
168 |     "                    new_file.write(data_import)\n",
169 |     "                    new_file.write(line)\n",
170 |     "                else:\n",
171 |     "                    new_file.write(line)\n",
172 |     "                counter +=1\n"
173 |    ]
174 |   },
175 |   {
176 |    "cell_type": "code",
177 |    "execution_count": null,
178 |    "metadata": {},
179 |    "outputs": [],
180 |    "source": []
181 |   }
182 |  ],
183 |  "metadata": {
184 |   "kernelspec": {
185 |    "display_name": "Python 3",
186 |    "language": "python",
187 |    "name": "python3"
188 |   },
189 |   "language_info": {
190 |    "codemirror_mode": {
191 |     "name": "ipython",
192 |     "version": 3
193 |    },
194 |    "file_extension": ".py",
195 |    "mimetype": "text/x-python",
196 |    "name": "python",
197 |    "nbconvert_exporter": "python",
198 |    "pygments_lexer": "ipython3",
199 |    "version": "3.8.2"
200 |   },
201 |   "toc": {
202 |    "base_numbering": 1,
203 |    "nav_menu": {},
204 |    "number_sections": false,
205 |    "sideBar": true,
206 |    "skip_h1_title": false,
207 |    "title_cell": "Table of Contents",
208 |    "title_sidebar": "Contents",
209 |    "toc_cell": false,
210 |    "toc_position": {},
211 |    "toc_section_display": true,
212 |    "toc_window_display": true
213 |   },
214 |   "varInspector": {
215 |    "cols": {
216 |     "lenName": 16,
217 |     "lenType": 16,
218 |     "lenVar": 40
219 |    },
220 |    "kernels_config": {
221 |     "python": {
222 |      "delete_cmd_postfix": "",
223 |      "delete_cmd_prefix": "del ",
224 |      "library": "var_list.py",
225 |      "varRefreshCmd": "print(var_dic_list())"
226 |     },
227 |     "r": {
228 |      "delete_cmd_postfix": ") ",
229 |      "delete_cmd_prefix": "rm(",
230 |      "library": "var_list.r",
231 |      "varRefreshCmd": "cat(var_dic_list()) "
232 |     }
233 |    },
234 |    "types_to_exclude": [
235 |     "module",
236 |     "function",
237 |     "builtin_function_or_method",
238 |     "instance",
239 |     "_Feature"
240 |    ],
241 |    "window_display": false
242 |   }
243 |  },
244 |  "nbformat": 4,
245 |  "nbformat_minor": 4
246 | }
247 | 


--------------------------------------------------------------------------------
/colab/colab_data.sh:
--------------------------------------------------------------------------------
1 | git clone https://github.com/guiwitz/NumpyPandas_course.git
2 | cp -r NumpyPandas_course/Data /content
3 | rm -r NumpyPandas_course/


--------------------------------------------------------------------------------
/svg.py:
--------------------------------------------------------------------------------
  1 | #This module is taken from the Dask project and can be found here:
  2 | #https://github.com/dask/dask/blob/master/dask/array/svg.py
  3 | #It has been slightly modified to allow for the representation of numpy arrays.
  4 | #Here is the accompanying license:
  5 | 
  6 | '''
  7 | Copyright (c) 2014-2018, Anaconda, Inc. and contributors
  8 | All rights reserved.
  9 | 
 10 | Redistribution and use in source and binary forms, with or without modification,
 11 | are permitted provided that the following conditions are met:
 12 | 
 13 | Redistributions of source code must retain the above copyright notice,
 14 | this list of conditions and the following disclaimer.
 15 | 
 16 | Redistributions in binary form must reproduce the above copyright notice,
 17 | this list of conditions and the following disclaimer in the documentation
 18 | and/or other materials provided with the distribution.
 19 | 
 20 | Neither the name of Anaconda nor the names of any contributors may be used to
 21 | endorse or promote products derived from this software without specific prior
 22 | written permission.
 23 | 
 24 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 25 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 26 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 27 | ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 28 | LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 29 | CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 30 | SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 31 | INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 32 | CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 33 | ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 34 | THE POSSIBILITY OF SUCH DAMAGE.
 35 | '''
 36 | 
 37 | import math
 38 | import re
 39 | 
 40 | import numpy as np
 41 | from IPython.display import HTML
 42 | 
 43 | def svg(chunks, size=200, **kwargs):
 44 |     """ Convert chunks from Dask Array into an SVG Image
 45 | 
 46 |     Parameters
 47 |     ----------
 48 |     chunks: tuple
 49 |     size: int
 50 |         Rough size of the image
 51 | 
 52 |     Returns
 53 |     -------
 54 |     text: An svg string depicting the array as a grid of chunks
 55 |     """
 56 |     shape = tuple(map(sum, chunks))
 57 |     if np.isnan(shape).any():  # don't support unknown sizes
 58 |         raise NotImplementedError(
 59 |             "Can't generate SVG with unknown chunk sizes.\n\n"
 60 |             " A possible solution is with x.compute_chunk_sizes()"
 61 |         )
 62 |     if not all(shape):
 63 |         raise NotImplementedError("Can't generate SVG with 0-length dimensions")
 64 |     if len(chunks) == 0:
 65 |         raise NotImplementedError("Can't generate SVG with 0 dimensions")
 66 |     if len(chunks) == 1:
 67 |         return svg_1d(chunks, size=size, **kwargs)
 68 |     elif len(chunks) == 2:
 69 |         return svg_2d(chunks, size=size, **kwargs)
 70 |     elif len(chunks) == 3:
 71 |         return svg_3d(chunks, size=size, **kwargs)
 72 |     else:
 73 |         return svg_nd(chunks, size=size, **kwargs)
 74 | 
 75 | 
 76 | text_style = 'font-size="1.0rem" font-weight="100" text-anchor="middle"'
 77 | 
 78 | 
 79 | def svg_2d(chunks, offset=(0, 0), skew=(0, 0), size=200, sizes=None):
 80 |     shape = tuple(map(sum, chunks))
 81 |     sizes = sizes or draw_sizes(shape, size=size)
 82 |     y, x = grid_points(chunks, sizes)
 83 | 
 84 |     lines, (min_x, max_x, min_y, max_y) = svg_grid(x, y, offset=offset, skew=skew)
 85 | 
 86 |     header = (
 87 |         '<svg width="%d" height="%d" style="stroke:rgb(0,0,0);stroke-width:1" >\n'
 88 |         % (max_x + 50, max_y + 50)
 89 |     )
 90 |     footer = "\n</svg>"
 91 | 
 92 |     if shape[0] >= 100:
 93 |         rotate = -90
 94 |     else:
 95 |         rotate = 0
 96 | 
 97 |     text = [
 98 |         "",
 99 |         "  <!-- Text -->",
100 |         '  <text x="%f" y="%f" %s >%d</text>'
101 |         % (max_x / 2, max_y + 20, text_style, shape[1]),
102 |         '  <text x="%f" y="%f" %s transform="rotate(%d,%f,%f)">%d</text>'
103 |         % (max_x + 20, max_y / 2, text_style, rotate, max_x + 20, max_y / 2, shape[0]),
104 |     ]
105 | 
106 |     return header + "\n".join(lines + text) + footer
107 | 
108 | 
109 | def svg_3d(chunks, size=200, sizes=None, offset=(0, 0)):
110 |     shape = tuple(map(sum, chunks))
111 |     sizes = sizes or draw_sizes(shape, size=size)
112 |     x, y, z = grid_points(chunks, sizes)
113 |     ox, oy = offset
114 | 
115 |     xy, (mnx, mxx, mny, mxy) = svg_grid(
116 |         x / 1.7, y, offset=(ox + 10, oy + 0), skew=(1, 0)
117 |     )
118 | 
119 |     zx, (_, _, _, max_x) = svg_grid(z, x / 1.7, offset=(ox + 10, oy + 0), skew=(0, 1))
120 |     zy, (min_z, max_z, min_y, max_y) = svg_grid(
121 |         z, y, offset=(ox + max_x + 10, oy + max_x), skew=(0, 0)
122 |     )
123 | 
124 |     header = (
125 |         '<svg width="%d" height="%d" style="stroke:rgb(0,0,0);stroke-width:1" >\n'
126 |         % (max_z + 50, max_y + 50)
127 |     )
128 |     footer = "\n</svg>"
129 | 
130 |     if shape[1] >= 100:
131 |         rotate = -90
132 |     else:
133 |         rotate = 0
134 | 
135 |     text = [
136 |         "",
137 |         "  <!-- Text -->",
138 |         '  <text x="%f" y="%f" %s >%d</text>'
139 |         % ((min_z + max_z) / 2, max_y + 20, text_style, shape[2]),
140 |         '  <text x="%f" y="%f" %s transform="rotate(%d,%f,%f)">%d</text>'
141 |         % (
142 |             max_z + 20,
143 |             (min_y + max_y) / 2,
144 |             text_style,
145 |             rotate,
146 |             max_z + 20,
147 |             (min_y + max_y) / 2,
148 |             shape[1],
149 |         ),
150 |         '  <text x="%f" y="%f" %s transform="rotate(45,%f,%f)">%d</text>'
151 |         % (
152 |             (mnx + mxx) / 2 - 10,
153 |             mxy - (mxx - mnx) / 2 + 20,
154 |             text_style,
155 |             (mnx + mxx) / 2 - 10,
156 |             mxy - (mxx - mnx) / 2 + 20,
157 |             shape[0],
158 |         ),
159 |     ]
160 | 
161 |     return header + "\n".join(xy + zx + zy + text) + footer
162 | 
163 | 
164 | def svg_nd(chunks, size=200):
165 |     if len(chunks) % 3 == 1:
166 |         chunks = ((1,),) + chunks
167 |     shape = tuple(map(sum, chunks))
168 |     sizes = draw_sizes(shape, size=size)
169 | 
170 |     chunks2 = chunks
171 |     sizes2 = sizes
172 |     out = []
173 |     left = 0
174 |     total_height = 0
175 |     while chunks2:
176 |         n = len(chunks2) % 3 or 3
177 |         o = svg(chunks2[:n], sizes=sizes2[:n], offset=(left, 0))
178 |         chunks2 = chunks2[n:]
179 |         sizes2 = sizes2[n:]
180 | 
181 |         lines = o.split("\n")
182 |         header = lines[0]
183 |         height = float(re.search(r'height="(\d*\.?\d*)"', header).groups()[0])
184 |         total_height = max(total_height, height)
185 |         width = float(re.search(r'width="(\d*\.?\d*)"', header).groups()[0])
186 |         left += width + 10
187 |         o = "\n".join(lines[1:-1])  # remove header and footer
188 | 
189 |         out.append(o)
190 | 
191 |     header = (
192 |         '<svg width="%d" height="%d" style="stroke:rgb(0,0,0);stroke-width:1" >\n'
193 |         % (left, total_height)
194 |     )
195 |     footer = "\n</svg>"
196 |     return header + "\n\n".join(out) + footer
197 | 
198 | 
199 | def svg_lines(x1, y1, x2, y2):
200 |     """ Convert points into lines of text for an SVG plot
201 | 
202 |     Examples
203 |     --------
204 |     >>> svg_lines([0, 1], [0, 0], [10, 11], [1, 1])  # doctest: +NORMALIZE_WHITESPACE
205 |     ['  <line x1="0" y1="0" x2="10" y2="1" style="stroke-width:2" />',
206 |      '  <line x1="1" y1="0" x2="11" y2="1" style="stroke-width:2" />']
207 |     """
208 |     n = len(x1)
209 |     lines = [
210 |         '  <line x1="%d" y1="%d" x2="%d" y2="%d" />' % (x1[i], y1[i], x2[i], y2[i])
211 |         for i in range(n)
212 |     ]
213 | 
214 |     lines[0] = lines[0].replace(" /", ' style="stroke-width:2" /')
215 |     lines[-1] = lines[-1].replace(" /", ' style="stroke-width:2" /')
216 |     return lines
217 | 
218 | 
219 | def svg_grid(x, y, offset=(0, 0), skew=(0, 0)):
220 |     """ Create lines of SVG text that show a grid
221 | 
222 |     Parameters
223 |     ----------
224 |     x: numpy.ndarray
225 |     y: numpy.ndarray
226 |     offset: tuple
227 |         translational displacement of the grid in SVG coordinates
228 |     skew: tuple
229 |     """
230 |     # Horizontal lines
231 |     x1 = np.zeros_like(y) + offset[0]
232 |     y1 = y + offset[1]
233 |     x2 = np.full_like(y, x[-1]) + offset[0]
234 |     y2 = y + offset[1]
235 | 
236 |     if skew[0]:
237 |         y2 += x.max() * skew[0]
238 |     if skew[1]:
239 |         x1 += skew[1] * y
240 |         x2 += skew[1] * y
241 | 
242 |     min_x = min(x1.min(), x2.min())
243 |     min_y = min(y1.min(), y2.min())
244 |     max_x = max(x1.max(), x2.max())
245 |     max_y = max(y1.max(), y2.max())
246 | 
247 |     h_lines = ["", "  <!-- Horizontal lines -->"] + svg_lines(x1, y1, x2, y2)
248 | 
249 |     # Vertical lines
250 |     x1 = x + offset[0]
251 |     y1 = np.zeros_like(x) + offset[1]
252 |     x2 = x + offset[0]
253 |     y2 = np.full_like(x, y[-1]) + offset[1]
254 | 
255 |     if skew[0]:
256 |         y1 += skew[0] * x
257 |         y2 += skew[0] * x
258 |     if skew[1]:
259 |         x2 += skew[1] * y.max()
260 | 
261 |     v_lines = ["", "  <!-- Vertical lines -->"] + svg_lines(x1, y1, x2, y2)
262 | 
263 |     rect = [
264 |         "",
265 |         "  <!-- Colored Rectangle -->",
266 |         '  <polygon points="%f,%f %f,%f %f,%f %f,%f" style="fill:#ECB172A0;stroke-width:0"/>'
267 |         % (x1[0], y1[0], x1[-1], y1[-1], x2[-1], y2[-1], x2[0], y2[0]),
268 |     ]
269 | 
270 |     return h_lines + v_lines + rect, (min_x, max_x, min_y, max_y)
271 | 
272 | 
273 | def svg_1d(chunks, sizes=None, **kwargs):
274 |     return svg_2d(((1,),) + chunks, **kwargs)
275 | 
276 | 
277 | def grid_points(chunks, sizes):
278 |     cumchunks = [np.cumsum((0,) + c) for c in chunks]
279 |     points = [x * size / x[-1] for x, size in zip(cumchunks, sizes)]
280 |     return points
281 | 
282 | 
283 | def draw_sizes(shape, size=200):
284 |     """ Get size in pixels for all dimensions """
285 |     mx = max(shape)
286 |     ratios = [mx / max(0.1, d) for d in shape]
287 |     ratios = [ratio_response(r) for r in ratios]
288 |     return tuple(size / r for r in ratios)
289 | 
290 | 
291 | def ratio_response(x):
292 |     """ How we display actual size ratios
293 | 
294 |     Common ratios in sizes span several orders of magnitude,
295 |     which is hard for us to perceive.
296 | 
297 |     We keep ratios in the 1-3 range accurate, and then apply a logarithm to
298 |     values up until about 100 or so, at which point we stop scaling.
299 |     """
300 |     if x < math.e:
301 |         return x
302 |     elif x <= 100:
303 |         return math.log(x + 12.4)  # f(e) == e
304 |     else:
305 |         return math.log(100 + 12.4)
306 |     
307 | def numpy_to_svg(array):
308 |     
309 |     return HTML(svg(tuple((tuple(np.ones(x)) for x in array.shape))))
310 |     
311 |     
312 |     
313 | 


--------------------------------------------------------------------------------