├── 00 Jupyter.ipynb
├── 01 Basics.ipynb
├── 02 Primitive Data Types.ipynb
├── 03 Lists and Tuples.ipynb
├── 04 Sets and Maps.ipynb
├── 05 Control Logics.ipynb
├── 06 Functions and External Libraries.ipynb
├── 07 Comprehensions and Functional Programming.ipynb
├── 08 Classes and Inheritance.ipynb
├── 09 Unittests.ipynb
├── README.md
└── requirements.txt

/01 Basics.ipynb:
--------------------------------------------------------------------------------
1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**What you learn:**\n", 8 | "\n", 9 | "In this notebook you will learn the absolute basics about Python. This includes code vs comments, variables and assignment, printing, arithmetic operations, naming conventions, and very simple string operations.\n", 10 | "\n", 11 | "Assumes that you already set up Jupyter successfully.\n", 12 | "\n", 13 | "Assumes that you installed the [Jupyter variable inspector](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/varInspector/README.html?highlight=varinspector).\n", 14 | "\n", 15 | "Based on a [tutorial by Zhiya Zuo](https://github.com/zhiyzuo/python-tutorial) and extended where appropriate.\n", 16 | "\n", 17 | "Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)\n", 18 | "\n", 19 | "This notebook is available on https://github.com/BigDataAnalyticsGroup/python." 
20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "### Code vs Comment" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 1, 32 | "metadata": {}, 33 | "outputs": [ 34 | { 35 | "name": "stdout", 36 | "output_type": "stream", 37 | "text": [ 38 | "hello world\n" 39 | ] 40 | } 41 | ], 42 | "source": [ 43 | "# this is a single-line comment\n", 44 | "print('hello world') # this is also a comment starting from the \"#\" symbol: it is ignored by the Python interpreter" 45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 2, 50 | "metadata": {}, 51 | "outputs": [ 52 | { 53 | "name": "stdout", 54 | "output_type": "stream", 55 | "text": [ 56 | "hello world\n" 57 | ] 58 | } 59 | ], 60 | "source": [ 61 | "\"\"\"\n", 62 | "This is a comment spanning\n", 63 | "multiple\n", 64 | "lines.\n", 65 | "\"\"\"\n", 66 | "print(\"hello world\")" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "We write one command per line:" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 3, 79 | "metadata": {}, 80 | "outputs": [ 81 | { 82 | "name": "stdout", 83 | "output_type": "stream", 84 | "text": [ 85 | "1\n", 86 | "2\n" 87 | ] 88 | } 89 | ], 90 | "source": [ 91 | "print('1')\n", 92 | "print('2')" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "and not:" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 4, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "name": "stdout", 109 | "output_type": "stream", 110 | "text": [ 111 | "1\n" 112 | ] 113 | } 114 | ], 115 | "source": [ 116 | "print('1') #print('2')" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "unless you separate them by a semicolon:" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 5, 129 | "metadata": {}, 130 | "outputs": [ 131 
| { 132 | "name": "stdout", 133 | "output_type": "stream", 134 | "text": [ 135 | "1\n", 136 | "2\n" 137 | ] 138 | } 139 | ], 140 | "source": [ 141 | "print('1'); print('2')" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": {}, 147 | "source": [ 148 | "*recommendation:* use only one statement per line (increases readability)" 149 | ] 150 | }, 151 | { 152 | "cell_type": "markdown", 153 | "metadata": {}, 154 | "source": [ 155 | "### Variables" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": {}, 161 | "source": [ 162 | "Variables can be considered __containers__. You can put anything inside a container, __without specifying the size or type__, which would be needed in Java or C. Note that Python is case-sensitive, so names that differ only in upper and lower case refer to different variables." 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": {}, 168 | "source": [ 169 | "When assigning a value, we put the variable to be assigned to on the left-hand side (LHS) and the value to assign on the right-hand side (RHS). LHS and RHS are connected by an equals sign (`=`), which means assignment." 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 6, 175 | "metadata": { 176 | "ExecuteTime": { 177 | "end_time": "2017-10-04T18:38:12.552928Z", 178 | "start_time": "2017-10-04T18:38:12.543725Z" 179 | } 180 | }, 181 | "outputs": [ 182 | { 183 | "name": "stdout", 184 | "output_type": "stream", 185 | "text": [ 186 | "3 <class 'int'>\n", 187 | "3.0 <class 'float'>\n", 188 | "Hello <class 'str'>\n", 189 | "Wonderful! <class 'str'>\n" 190 | ] 191 | } 192 | ], 193 | "source": [ 194 | "x = 3 # integer\n", 195 | "y = 3. 
# floating point number\n", 196 | "z = \"Hello\" # string\n", 197 | "# another string, stored in a variable capital z.\n", 198 | "Z = \"Wonderful!\"\n", 199 | "print(x, type(x))\n", 200 | "print(y, type(y))\n", 201 | "print(z, type(z))\n", 202 | "print(Z, type(Z))" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": {}, 208 | "source": [ 209 | "You can do operations on numeric values as well as strings." 210 | ] 211 | }, 212 | { 213 | "cell_type": "code", 214 | "execution_count": 7, 215 | "metadata": { 216 | "ExecuteTime": { 217 | "end_time": "2018-10-09T01:38:17.443096Z", 218 | "start_time": "2018-10-09T01:38:17.439036Z" 219 | } 220 | }, 221 | "outputs": [ 222 | { 223 | "name": "stdout", 224 | "output_type": "stream", 225 | "text": [ 226 | "6.0\n" 227 | ] 228 | } 229 | ], 230 | "source": [ 231 | "sum_ = x + y # int + float = float\n", 232 | "print(sum_)" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 8, 238 | "metadata": { 239 | "ExecuteTime": { 240 | "end_time": "2017-10-04T18:38:13.972732Z", 241 | "start_time": "2017-10-04T18:38:13.967236Z" 242 | } 243 | }, 244 | "outputs": [ 245 | { 246 | "name": "stdout", 247 | "output_type": "stream", 248 | "text": [ 249 | "Hello World!\n" 250 | ] 251 | } 252 | ], 253 | "source": [ 254 | "v = \"World!\"\n", 255 | "sum_string = z + \" \" + v # concatenate strings\n", 256 | "print(sum_string)" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "Formatted printing with `%`:" 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": 9, 269 | "metadata": { 270 | "ExecuteTime": { 271 | "end_time": "2017-10-04T18:38:15.178277Z", 272 | "start_time": "2017-10-04T18:38:15.173368Z" 273 | } 274 | }, 275 | "outputs": [ 276 | { 277 | "name": "stdout", 278 | "output_type": "stream", 279 | "text": [ 280 | "The sum of x and y is 6.0\n" 281 | ] 282 | } 283 | ], 284 | "source": [ 285 | "# %f for floating point number, <.x> 
specifies x decimal places (Nachkommastellen)\n", 286 | "print(\"The sum of x and y is %.1f\"%sum_) " 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 10, 292 | "metadata": { 293 | "ExecuteTime": { 294 | "end_time": "2017-10-04T18:38:15.472965Z", 295 | "start_time": "2017-10-04T18:38:15.467306Z" 296 | } 297 | }, 298 | "outputs": [ 299 | { 300 | "name": "stdout", 301 | "output_type": "stream", 302 | "text": [ 303 | "The string `sum_string` is 'Hello World!'\n" 304 | ] 305 | } 306 | ], 307 | "source": [ 308 | "# %s for string\n", 309 | "print(\"The string `sum_string` is '%s'\"%sum_string)" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": {}, 315 | "source": [ 316 | "### Arithmetic operations:" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 11, 322 | "metadata": {}, 323 | "outputs": [ 324 | { 325 | "name": "stdout", 326 | "output_type": "stream", 327 | "text": [ 328 | "Sum : 8\n", 329 | "Diff : 2\n", 330 | "Product : 15\n", 331 | "Modulo : 2\n", 332 | "Floor Division : 1\n", 333 | "Float Division : 1.6666666666666667\n" 334 | ] 335 | }, 336 | { 337 | "data": { 338 | "text/plain": [ 339 | "float" 340 | ] 341 | }, 342 | "execution_count": 11, 343 | "metadata": {}, 344 | "output_type": "execute_result" 345 | } 346 | ], 347 | "source": [ 348 | "i = 5\n", 349 | "j = 3\n", 350 | "print(\"Sum : \", i+j)\n", 351 | "print(\"Diff : \" , i-j)\n", 352 | "print(\"Product : \" , i*j)\n", 353 | "print(\"Modulo : \" , i%j)\n", 354 | "print(\"Floor Division : \" , i//j)\n", 355 | "print(\"Float Division : \" , i/j)\n", 356 | "result = i/j\n", 357 | "type(result)" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": {}, 363 | "source": [ 364 | "#### Naming conventions" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": {}, 370 | "source": [ 371 | "There are two commonly used naming conventions in programming:\n", 372 | "\n", 373 | "1. 
__camelCase__\n", 374 | "2. __snake_case__ or __lower_case_with_underscore__" 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "All variable (function and class) names must start with a letter or underscore (\\_). Digits are allowed after the first character." 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 12, 387 | "metadata": { 388 | "ExecuteTime": { 389 | "end_time": "2018-10-09T00:16:28.497486Z", 390 | "start_time": "2018-10-09T00:16:28.491966Z" 391 | } 392 | }, 393 | "outputs": [ 394 | { 395 | "data": { 396 | "text/plain": [ 397 | "'my string'" 398 | ] 399 | }, 400 | "execution_count": 12, 401 | "metadata": {}, 402 | "output_type": "execute_result" 403 | } 404 | ], 405 | "source": [ 406 | "myStringHere = 'my string'\n", 407 | "myStringHere" 408 | ] 409 | }, 410 | { 411 | "cell_type": "code", 412 | "execution_count": 13, 413 | "metadata": { 414 | "ExecuteTime": { 415 | "end_time": "2018-10-09T00:16:28.879472Z", 416 | "start_time": "2018-10-09T00:16:28.875812Z" 417 | } 418 | }, 419 | "outputs": [], 420 | "source": [ 421 | "x = 3 # valid\n", 422 | "x_3 = \"xyz\" # valid" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 14, 428 | "metadata": { 429 | "ExecuteTime": { 430 | "end_time": "2018-10-09T00:16:29.096887Z", 431 | "start_time": "2018-10-09T00:16:29.091476Z" 432 | } 433 | }, 434 | "outputs": [], 435 | "source": [ 436 | "#3_x = \"456\" # invalid. Numbers cannot be in the first position." 437 | ] 438 | }, 439 | { 440 | "cell_type": "markdown", 441 | "metadata": {}, 442 | "source": [ 443 | "You can choose either camel case or snake case. Always make sure you use one convention consistently across one notebook/project." 
444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "See more here:\n", 451 | "\n", 452 | "[1] https://www.python.org/dev/peps/pep-0008/#descriptive-naming-styles\n", 453 | "\n", 454 | "[2] https://en.wikipedia.org/wiki/Naming_convention_(programming)" 455 | ] 456 | }, 457 | { 458 | "cell_type": "markdown", 459 | "metadata": {}, 460 | "source": [ 461 | "#### Some notes on Strings" 462 | ] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": {}, 467 | "source": [ 468 | "To initialize a string variable, you can use either double or single quotes." 469 | ] 470 | }, 471 | { 472 | "cell_type": "code", 473 | "execution_count": 15, 474 | "metadata": { 475 | "ExecuteTime": { 476 | "end_time": "2018-10-09T00:16:32.520585Z", 477 | "start_time": "2018-10-09T00:16:32.516838Z" 478 | } 479 | }, 480 | "outputs": [ 481 | { 482 | "data": { 483 | "text/plain": [ 484 | "'Data Science and Artificial Intelligence'" 485 | ] 486 | }, 487 | "execution_count": 15, 488 | "metadata": {}, 489 | "output_type": "execute_result" 490 | } 491 | ], 492 | "source": [ 493 | "dsai = \"Data Science and Artificial Intelligence\"\n", 494 | "dsai" 495 | ] 496 | }, 497 | { 498 | "cell_type": "markdown", 499 | "metadata": {}, 500 | "source": [ 501 | "You can think of strings as a sequence of characters (or a __list__ of characters, see the next section). In this case, indices and bracket notations can be used to access specific ranges of characters." 
502 | ] 503 | }, 504 | { 505 | "cell_type": "code", 506 | "execution_count": 16, 507 | "metadata": { 508 | "ExecuteTime": { 509 | "end_time": "2018-10-09T00:16:32.814921Z", 510 | "start_time": "2018-10-09T00:16:32.811324Z" 511 | } 512 | }, 513 | "outputs": [ 514 | { 515 | "data": { 516 | "text/plain": [ 517 | "'Art'" 518 | ] 519 | }, 520 | "execution_count": 16, 521 | "metadata": {}, 522 | "output_type": "execute_result" 523 | } 524 | ], 525 | "source": [ 526 | "mySubstring = dsai[17:20] # [start, end), end is exclusive; indexing in Python starts at 0 and NOT 1\n", 527 | "mySubstring" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 17, 533 | "metadata": { 534 | "ExecuteTime": { 535 | "end_time": "2018-10-09T00:16:33.080003Z", 536 | "start_time": "2018-10-09T00:16:33.075926Z" 537 | } 538 | }, 539 | "outputs": [ 540 | { 541 | "data": { 542 | "text/plain": [ 543 | "'e'" 544 | ] 545 | }, 546 | "execution_count": 17, 547 | "metadata": {}, 548 | "output_type": "execute_result" 549 | } 550 | ], 551 | "source": [ 552 | "lastLetter = dsai[-1] # -1 means the last element\n", 553 | "lastLetter" 554 | ] 555 | } 556 | ], 557 | "metadata": { 558 | "hide_input": false, 559 | "kernelspec": { 560 | "display_name": "Python 3 (ipykernel)", 561 | "language": "python", 562 | "name": "python3" 563 | }, 564 | "language_info": { 565 | "codemirror_mode": { 566 | "name": "ipython", 567 | "version": 3 568 | }, 569 | "file_extension": ".py", 570 | "mimetype": "text/x-python", 571 | "name": "python", 572 | "nbconvert_exporter": "python", 573 | "pygments_lexer": "ipython3", 574 | "version": "3.9.12" 575 | }, 576 | "toc": { 577 | "base_numbering": 1, 578 | "nav_menu": {}, 579 | "number_sections": true, 580 | "sideBar": true, 581 | "skip_h1_title": false, 582 | "title_cell": "Table of Contents", 583 | "title_sidebar": "Contents", 584 | "toc_cell": false, 585 | "toc_position": { 586 | "height": "677px", 587 | "left": "0px", 588 | "right": "1111px", 589 | "top": "43px", 590 | 
"width": "340px" 591 | }, 592 | "toc_section_display": "block", 593 | "toc_window_display": true 594 | }, 595 | "varInspector": { 596 | "cols": { 597 | "lenName": 16, 598 | "lenType": 16, 599 | "lenVar": 40 600 | }, 601 | "kernels_config": { 602 | "python": { 603 | "delete_cmd_postfix": "", 604 | "delete_cmd_prefix": "del ", 605 | "library": "var_list.py", 606 | "varRefreshCmd": "print(var_dic_list())" 607 | }, 608 | "r": { 609 | "delete_cmd_postfix": ") ", 610 | "delete_cmd_prefix": "rm(", 611 | "library": "var_list.r", 612 | "varRefreshCmd": "cat(var_dic_list()) " 613 | } 614 | }, 615 | "types_to_exclude": [ 616 | "module", 617 | "function", 618 | "builtin_function_or_method", 619 | "instance", 620 | "_Feature" 621 | ], 622 | "window_display": false 623 | } 624 | }, 625 | "nbformat": 4, 626 | "nbformat_minor": 2 627 | } 628 | -------------------------------------------------------------------------------- /02 Primitive Data Types.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**What you learn:**\n", 8 | "\n", 9 | "In this notebook you will learn about basic (aka primitive) data types in Python. This includes numbers, text/string, boolean, and type conversion.\n", 10 | "\n", 11 | "Assumes that you already set up Jupyter successfully.\n", 12 | "\n", 13 | "Assumes that you installed the [Jupyter variable inspector](https://jupyter-contrib-nbextensions.readthedocs.io/en/latest/nbextensions/varInspector/README.html?highlight=varinspector).\n", 14 | "\n", 15 | "Based on a [tutorial by Zhiya Zuo](https://github.com/zhiyzuo/python-tutorial) and extended where appropriate.\n", 16 | "\n", 17 | "Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)\n", 18 | "\n", 19 | "This notebook is available on https://github.com/BigDataAnalyticsGroup/python." 
20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "#### Numbers" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "Numbers without a fractional part are ___integers___. In Python, their type is called `int`." 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": 1, 39 | "metadata": { 40 | "ExecuteTime": { 41 | "end_time": "2018-10-08T23:51:10.632494Z", 42 | "start_time": "2018-10-08T23:51:10.598421Z" 43 | } 44 | }, 45 | "outputs": [ 46 | { 47 | "data": { 48 | "text/plain": [ 49 | "int" 50 | ] 51 | }, 52 | "execution_count": 1, 53 | "metadata": {}, 54 | "output_type": "execute_result" 55 | } 56 | ], 57 | "source": [ 58 | "x = 3\n", 59 | "type(x)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "Numbers with a fractional part are floating point numbers. Their type is named `float` in Python." 67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": 2, 72 | "metadata": { 73 | "ExecuteTime": { 74 | "end_time": "2018-10-08T23:51:54.013974Z", 75 | "start_time": "2018-10-08T23:51:54.008407Z" 76 | } 77 | }, 78 | "outputs": [ 79 | { 80 | "data": { 81 | "text/plain": [ 82 | "float" 83 | ] 84 | }, 85 | "execution_count": 2, 86 | "metadata": {}, 87 | "output_type": "execute_result" 88 | } 89 | ], 90 | "source": [ 91 | "y = 3.0\n", 92 | "type(y)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": { 98 | "ExecuteTime": { 99 | "end_time": "2018-10-08T23:55:03.134594Z", 100 | "start_time": "2018-10-08T23:55:03.118103Z" 101 | } 102 | }, 103 | "source": [ 104 | "We can apply arithmetic to these numbers. However, one thing we need to be careful about is ___type conversion___. See the example below." 
105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 3, 110 | "metadata": { 111 | "ExecuteTime": { 112 | "end_time": "2018-10-08T23:55:31.855870Z", 113 | "start_time": "2018-10-08T23:55:31.847879Z" 114 | } 115 | }, 116 | "outputs": [ 117 | { 118 | "data": { 119 | "text/plain": [ 120 | "int" 121 | ] 122 | }, 123 | "execution_count": 3, 124 | "metadata": {}, 125 | "output_type": "execute_result" 126 | } 127 | ], 128 | "source": [ 129 | "z = 2 * x\n", 130 | "type(z)" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 4, 136 | "metadata": { 137 | "ExecuteTime": { 138 | "end_time": "2018-10-08T23:55:36.207707Z", 139 | "start_time": "2018-10-08T23:55:36.202449Z" 140 | } 141 | }, 142 | "outputs": [ 143 | { 144 | "data": { 145 | "text/plain": [ 146 | "float" 147 | ] 148 | }, 149 | "execution_count": 4, 150 | "metadata": {}, 151 | "output_type": "execute_result" 152 | } 153 | ], 154 | "source": [ 155 | "z = y + x # float + int -> float\n", 156 | "type(z)" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": {}, 162 | "source": [ 163 | "#### Text/Characters/Strings" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "In Python, we use `str` type for storing letters, words, and any other characters, as mentioned previously." 
171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 5, 176 | "metadata": { 177 | "ExecuteTime": { 178 | "end_time": "2018-10-08T23:57:26.831211Z", 179 | "start_time": "2018-10-08T23:57:26.826020Z" 180 | } 181 | }, 182 | "outputs": [ 183 | { 184 | "data": { 185 | "text/plain": [ 186 | "str" 187 | ] 188 | }, 189 | "execution_count": 5, 190 | "metadata": {}, 191 | "output_type": "execute_result" 192 | } 193 | ], 194 | "source": [ 195 | "my_word = \"see you\"\n", 196 | "type(my_word)" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "Unlike numbers, `str` is an iterable object, meaning that we can iterate through each individual character:" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": 6, 209 | "metadata": { 210 | "ExecuteTime": { 211 | "end_time": "2018-10-08T23:58:44.058368Z", 212 | "start_time": "2018-10-08T23:58:44.052694Z" 213 | } 214 | }, 215 | "outputs": [ 216 | { 217 | "name": "stdout", 218 | "output_type": "stream", 219 | "text": [ 220 | "s\n", 221 | "e yo\n" 222 | ] 223 | } 224 | ], 225 | "source": [ 226 | "print(my_word[0])\n", 227 | "print(my_word[2:6])" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "We can also use `+` to _concatenate_ different strings " 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 7, 240 | "metadata": { 241 | "ExecuteTime": { 242 | "end_time": "2018-10-08T23:59:16.399980Z", 243 | "start_time": "2018-10-08T23:59:16.395030Z" 244 | } 245 | }, 246 | "outputs": [ 247 | { 248 | "data": { 249 | "text/plain": [ 250 | "str" 251 | ] 252 | }, 253 | "execution_count": 7, 254 | "metadata": {}, 255 | "output_type": "execute_result" 256 | } 257 | ], 258 | "source": [ 259 | "result = my_word + ' tomorrow'\n", 260 | "type(result)" 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "#### Boolean" 268 | ] 269 | 
}, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": {}, 273 | "source": [ 274 | "Boolean type comes in handy when we need to check conditions. For example:" 275 | ] 276 | }, 277 | { 278 | "cell_type": "code", 279 | "execution_count": 8, 280 | "metadata": { 281 | "ExecuteTime": { 282 | "end_time": "2018-10-09T00:00:33.751846Z", 283 | "start_time": "2018-10-09T00:00:33.746658Z" 284 | } 285 | }, 286 | "outputs": [ 287 | { 288 | "data": { 289 | "text/plain": [ 290 | "(False, bool, 42, 43, 'izg')" 291 | ] 292 | }, 293 | "execution_count": 8, 294 | "metadata": {}, 295 | "output_type": "execute_result" 296 | } 297 | ], 298 | "source": [ 299 | "my_error = 1.6\n", 300 | "compare_result = my_error < 0.1\n", 301 | "compare_result, type(compare_result), 42, 43, \"izg\"" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "There are two and only two valid Boolean values: `True` and `False`. We can also think of them as `1` and `0`, respectively." 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 9, 314 | "metadata": { 315 | "ExecuteTime": { 316 | "end_time": "2018-10-09T00:02:30.834630Z", 317 | "start_time": "2018-10-09T00:02:30.830339Z" 318 | } 319 | }, 320 | "outputs": [ 321 | { 322 | "data": { 323 | "text/plain": [ 324 | "True" 325 | ] 326 | }, 327 | "execution_count": 9, 328 | "metadata": {}, 329 | "output_type": "execute_result" 330 | } 331 | ], 332 | "source": [ 333 | "my_error > 0" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "When using Boolean values for arithmetic operations, they will be converted to `1/0` automatically." 
341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": 10, 346 | "metadata": { 347 | "ExecuteTime": { 348 | "end_time": "2018-10-09T00:02:52.791120Z", 349 | "start_time": "2018-10-09T00:02:52.783315Z" 350 | } 351 | }, 352 | "outputs": [ 353 | { 354 | "data": { 355 | "text/plain": [ 356 | "3" 357 | ] 358 | }, 359 | "execution_count": 10, 360 | "metadata": {}, 361 | "output_type": "execute_result" 362 | } 363 | ], 364 | "source": [ 365 | "(my_error>0) + 2" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": {}, 371 | "source": [ 372 | "Sometimes we want to check whether a particular object is of a particular type. This can be done with `isinstance()`:" 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 11, 378 | "metadata": {}, 379 | "outputs": [ 380 | { 381 | "data": { 382 | "text/plain": [ 383 | "True" 384 | ] 385 | }, 386 | "execution_count": 11, 387 | "metadata": {}, 388 | "output_type": "execute_result" 389 | } 390 | ], 391 | "source": [ 392 | "isinstance(my_error,float)" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 12, 398 | "metadata": {}, 399 | "outputs": [ 400 | { 401 | "data": { 402 | "text/plain": [ 403 | "False" 404 | ] 405 | }, 406 | "execution_count": 12, 407 | "metadata": {}, 408 | "output_type": "execute_result" 409 | } 410 | ], 411 | "source": [ 412 | "isinstance(my_error,int)" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "#### Type Conversion" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "Since variables in Python are dynamically typed, we need to be careful about type conversion." 
427 | ] 428 | }, 429 | { 430 | "cell_type": "markdown", 431 | "metadata": {}, 432 | "source": [ 433 | "When two variables share the same data type, there is not much to be worried about:" 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": 13, 439 | "metadata": { 440 | "ExecuteTime": { 441 | "end_time": "2018-10-09T00:04:59.077234Z", 442 | "start_time": "2018-10-09T00:04:59.072252Z" 443 | } 444 | }, 445 | "outputs": [ 446 | { 447 | "data": { 448 | "text/plain": [ 449 | "'no problem. talk to you later'" 450 | ] 451 | }, 452 | "execution_count": 13, 453 | "metadata": {}, 454 | "output_type": "execute_result" 455 | } 456 | ], 457 | "source": [ 458 | "s1 = \"no problem. \"\n", 459 | "s2 = \"talk to you later\"\n", 460 | "s1 + s2" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": {}, 466 | "source": [ 467 | "But be careful when we are mixing variables up:" 468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": 14, 473 | "metadata": { 474 | "ExecuteTime": { 475 | "end_time": "2018-10-09T00:06:03.221806Z", 476 | "start_time": "2018-10-09T00:06:03.217855Z" 477 | } 478 | }, 479 | "outputs": [ 480 | { 481 | "data": { 482 | "text/plain": [ 483 | "(5.0, float, 2.0, float)" 484 | ] 485 | }, 486 | "execution_count": 14, 487 | "metadata": {}, 488 | "output_type": "execute_result" 489 | } 490 | ], 491 | "source": [ 492 | "a = 3 # recall that this is an int\n", 493 | "b = 2.0 # float\n", 494 | "c = a + b # float!\n", 495 | "c, type(c), b, type(b)" 496 | ] 497 | }, 498 | { 499 | "cell_type": "markdown", 500 | "metadata": {}, 501 | "source": [ 502 | "To make things work between string and numbers, we can explicitly convert numbers into `str`:" 503 | ] 504 | }, 505 | { 506 | "cell_type": "code", 507 | "execution_count": 15, 508 | "metadata": { 509 | "ExecuteTime": { 510 | "end_time": "2018-10-09T00:07:59.974061Z", 511 | "start_time": "2018-10-09T00:07:59.964373Z" 512 | } 513 | }, 514 | "outputs": [], 515 | 
"source": [ 516 | "# s1 + 3" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": 16, 522 | "metadata": { 523 | "ExecuteTime": { 524 | "end_time": "2018-10-09T00:08:04.034457Z", 525 | "start_time": "2018-10-09T00:08:04.027842Z" 526 | } 527 | }, 528 | "outputs": [ 529 | { 530 | "data": { 531 | "text/plain": [ 532 | "'no problem. 3'" 533 | ] 534 | }, 535 | "execution_count": 16, 536 | "metadata": {}, 537 | "output_type": "execute_result" 538 | } 539 | ], 540 | "source": [ 541 | "s1 + str(3)" 542 | ] 543 | } 544 | ], 545 | "metadata": { 546 | "hide_input": false, 547 | "kernelspec": { 548 | "display_name": "Python 3 (ipykernel)", 549 | "language": "python", 550 | "name": "python3" 551 | }, 552 | "language_info": { 553 | "codemirror_mode": { 554 | "name": "ipython", 555 | "version": 3 556 | }, 557 | "file_extension": ".py", 558 | "mimetype": "text/x-python", 559 | "name": "python", 560 | "nbconvert_exporter": "python", 561 | "pygments_lexer": "ipython3", 562 | "version": "3.9.12" 563 | }, 564 | "toc": { 565 | "base_numbering": 1, 566 | "nav_menu": {}, 567 | "number_sections": true, 568 | "sideBar": true, 569 | "skip_h1_title": false, 570 | "title_cell": "Table of Contents", 571 | "title_sidebar": "Contents", 572 | "toc_cell": false, 573 | "toc_position": { 574 | "height": "677px", 575 | "left": "0px", 576 | "right": "1111px", 577 | "top": "43px", 578 | "width": "340px" 579 | }, 580 | "toc_section_display": "block", 581 | "toc_window_display": true 582 | }, 583 | "varInspector": { 584 | "cols": { 585 | "lenName": 16, 586 | "lenType": 16, 587 | "lenVar": 40 588 | }, 589 | "kernels_config": { 590 | "python": { 591 | "delete_cmd_postfix": "", 592 | "delete_cmd_prefix": "del ", 593 | "library": "var_list.py", 594 | "varRefreshCmd": "print(var_dic_list())" 595 | }, 596 | "r": { 597 | "delete_cmd_postfix": ") ", 598 | "delete_cmd_prefix": "rm(", 599 | "library": "var_list.r", 600 | "varRefreshCmd": "cat(var_dic_list()) " 601 | } 602 | }, 603 | 
"types_to_exclude": [ 604 | "module", 605 | "function", 606 | "builtin_function_or_method", 607 | "instance", 608 | "_Feature" 609 | ], 610 | "window_display": false 611 | } 612 | }, 613 | "nbformat": 4, 614 | "nbformat_minor": 2 615 | } 616 | -------------------------------------------------------------------------------- /03 Lists and Tuples.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**What you learn:**\n", 8 | "\n", 9 | "In this notebook you will learn about basic data structures in Python. This includes lists and tuples.\n", 10 | "\n", 11 | "Based on a [tutorial by Zhiya Zuo](https://github.com/zhiyzuo/python-tutorial) and extended where appropriate.\n", 12 | "\n", 13 | "Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)\n", 14 | "\n", 15 | "This notebook is available on https://github.com/BigDataAnalyticsGroup/python." 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "## List" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "A list is an ordered collection of items that may contain duplicates." 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "Initialize a list with brackets. You can store anything in a list, even if the individual elements have different types. A list may have duplicates." 
37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 1, 42 | "metadata": { 43 | "ExecuteTime": { 44 | "end_time": "2018-10-09T00:10:24.538230Z", 45 | "start_time": "2018-10-09T00:10:24.528793Z" 46 | } 47 | }, 48 | "outputs": [ 49 | { 50 | "data": { 51 | "text/plain": [ 52 | "(list, [42, 9, 53, 7, 9])" 53 | ] 54 | }, 55 | "execution_count": 1, 56 | "metadata": {}, 57 | "output_type": "execute_result" 58 | } 59 | ], 60 | "source": [ 61 | "# define a list:\n", 62 | "# variable = [el0, el1, ..., eln ]\n", 63 | "a_list = [42, 9, 53, 7, 9] # commas separate the elements of a list\n", 64 | "type(a_list), a_list" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 2, 70 | "metadata": {}, 71 | "outputs": [ 72 | { 73 | "name": "stdout", 74 | "output_type": "stream", 75 | "text": [ 76 | "Length of a_list is: 5\n", 77 | "The 3rd element of a_list is: 53\n", 78 | "The last element of a_list is: 9\n", 79 | "The sum of a_list is: 120\n" 80 | ] 81 | } 82 | ], 83 | "source": [ 84 | "print(\"Length of a_list is: %i\"%(len(a_list)))\n", 85 | "print(\"The 3rd element of a_list is: %s\" %(a_list[2])) # Remember Python starts with 0\n", 86 | "print(\"The last element of a_list is: %s\" %(a_list[-1])) # -1 means the end\n", 87 | "print(\"The sum of a_list is: %i\"%(sum(a_list)))" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": { 93 | "ExecuteTime": { 94 | "end_time": "2018-10-09T00:10:32.820098Z", 95 | "start_time": "2018-10-09T00:10:32.815341Z" 96 | } 97 | }, 98 | "source": [ 99 | "We can put elements of different types into a list:" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 3, 105 | "metadata": { 106 | "ExecuteTime": { 107 | "end_time": "2018-10-09T00:10:35.486742Z", 108 | "start_time": "2018-10-09T00:10:35.479281Z" 109 | } 110 | }, 111 | "outputs": [ 112 | { 113 | "data": { 114 | "text/plain": [ 115 | "str" 116 | ] 117 | }, 118 | "execution_count": 3, 119 | 
"metadata": {}, 120 | "output_type": "execute_result" 121 | } 122 | ], 123 | "source": [ 124 | "b_list = [20, True, \"good\", \"good\"] \n", 125 | "b_list\n", 126 | "type(b_list[2])" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "Modify and update a list using __pop__, __remove__, __append__, __extend__:" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 4, 139 | "metadata": { 140 | "ExecuteTime": { 141 | "end_time": "2018-10-09T00:10:41.107486Z", 142 | "start_time": "2018-10-09T00:10:41.102493Z" 143 | } 144 | }, 145 | "outputs": [ 146 | { 147 | "data": { 148 | "text/plain": [ 149 | "[42, 9, 53, 7, 8, 2, 3, 1]" 150 | ] 151 | }, 152 | "execution_count": 4, 153 | "metadata": {}, 154 | "output_type": "execute_result" 155 | } 156 | ], 157 | "source": [ 158 | "a_list = [42, 9, 53, 7, 8, 2, 3, 1]\n", 159 | "a_list" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 5, 165 | "metadata": {}, 166 | "outputs": [ 167 | { 168 | "name": "stdout", 169 | "output_type": "stream", 170 | "text": [ 171 | "Pop 9 out of a_list\n", 172 | "[42, 53, 7, 8, 2, 3, 1]\n" 173 | ] 174 | } 175 | ], 176 | "source": [ 177 | "print(\"Pop %i out of a_list\"%a_list.pop(1)) # pop (i.e. remove) the value at an index position\n", 178 | "print(a_list)" 179 | ] 180 | }, 181 | { 182 | "cell_type": "code", 183 | "execution_count": 6, 184 | "metadata": {}, 185 | "outputs": [ 186 | { 187 | "name": "stdout", 188 | "output_type": "stream", 189 | "text": [ 190 | "Pop 7 out of a_list\n", 191 | "[42, 53, 8, 2, 3, 1]\n" 192 | ] 193 | } 194 | ], 195 | "source": [ 196 | "print(\"Pop %i out of a_list\"%a_list.pop(2)) # pop (i.e.
remove) the value at an index position\n", 197 | "print(a_list)" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 7, 203 | "metadata": { 204 | "ExecuteTime": { 205 | "end_time": "2018-10-09T00:10:43.906572Z", 206 | "start_time": "2018-10-09T00:10:43.899877Z" 207 | } 208 | }, 209 | "outputs": [ 210 | { 211 | "data": { 212 | "text/plain": [ 213 | "[20, True, 'good', 42, 'good', 11]" 214 | ] 215 | }, 216 | "execution_count": 7, 217 | "metadata": {}, 218 | "output_type": "execute_result" 219 | } 220 | ], 221 | "source": [ 222 | "b_list = [20, True, \"good\", 42, \"good\", 11] \n", 223 | "b_list" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 8, 229 | "metadata": {}, 230 | "outputs": [ 231 | { 232 | "name": "stdout", 233 | "output_type": "stream", 234 | "text": [ 235 | "Remove the string good from b_list:\n", 236 | "[20, True, 42, 'good', 11]\n" 237 | ] 238 | } 239 | ], 240 | "source": [ 241 | "print(\"Remove the string good from b_list:\")\n", 242 | "b_list.remove(\"good\") # remove first occurrence(!)
of a specific value\n", 243 | "print(b_list)" 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": 9, 249 | "metadata": {}, 250 | "outputs": [ 251 | { 252 | "data": { 253 | "text/plain": [ 254 | "[42, 53, 8, 2, 3, 1]" 255 | ] 256 | }, 257 | "execution_count": 9, 258 | "metadata": {}, 259 | "output_type": "execute_result" 260 | } 261 | ], 262 | "source": [ 263 | "a_list" 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": 10, 269 | "metadata": { 270 | "ExecuteTime": { 271 | "end_time": "2018-10-09T00:10:45.307563Z", 272 | "start_time": "2018-10-09T00:10:45.302951Z" 273 | } 274 | }, 275 | "outputs": [ 276 | { 277 | "name": "stdout", 278 | "output_type": "stream", 279 | "text": [ 280 | "[42, 53, 8, 2, 3, 1, 10]\n" 281 | ] 282 | } 283 | ], 284 | "source": [ 285 | "a_list.append(10) # append integer 10 to the end of the list\n", 286 | "print(a_list)" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 11, 292 | "metadata": {}, 293 | "outputs": [ 294 | { 295 | "data": { 296 | "text/plain": [ 297 | "([42, 53, 8, 2, 3, 1, 10], [20, True, 42, 'good', 11])" 298 | ] 299 | }, 300 | "execution_count": 11, 301 | "metadata": {}, 302 | "output_type": "execute_result" 303 | } 304 | ], 305 | "source": [ 306 | "a_list, b_list" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": { 312 | "ExecuteTime": { 313 | "end_time": "2018-10-09T00:10:45.952388Z", 314 | "start_time": "2018-10-09T00:10:45.943473Z" 315 | } 316 | }, 317 | "source": [ 318 | "merge `a_list` and `b_list`, i.e. 
append all elements of `b_list` to the end of `a_list`: " 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 12, 324 | "metadata": { 325 | "ExecuteTime": { 326 | "end_time": "2018-10-09T00:12:02.249064Z", 327 | "start_time": "2018-10-09T00:12:02.241901Z" 328 | } 329 | }, 330 | "outputs": [ 331 | { 332 | "name": "stdout", 333 | "output_type": "stream", 334 | "text": [ 335 | "Merging a_list and b_list: [42, 53, 8, 2, 3, 1, 10, 20, True, 42, 'good', 11]\n" 336 | ] 337 | } 338 | ], 339 | "source": [ 340 | "a_list.extend(b_list)\n", 341 | "print(\"Merging a_list and b_list: %s\"%(str(a_list)))" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": { 347 | "ExecuteTime": { 348 | "end_time": "2018-10-09T00:10:45.952388Z", 349 | "start_time": "2018-10-09T00:10:45.943473Z" 350 | } 351 | }, 352 | "source": [ 353 | "We can also use `+` as a shorthand to concatenate two lists:" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": 13, 359 | "metadata": { 360 | "ExecuteTime": { 361 | "end_time": "2018-10-09T00:12:03.301969Z", 362 | "start_time": "2018-10-09T00:12:03.295626Z" 363 | } 364 | }, 365 | "outputs": [ 366 | { 367 | "data": { 368 | "text/plain": [ 369 | "[42, 53, 8, 2, 3, 1, 10, 20, True, 42, 'good', 11, 20, True, 42, 'good', 11]" 370 | ] 371 | }, 372 | "execution_count": 13, 373 | "metadata": {}, 374 | "output_type": "execute_result" 375 | } 376 | ], 377 | "source": [ 378 | "a_list + b_list " 379 | ] 380 | }, 381 | { 382 | "cell_type": "code", 383 | "execution_count": 14, 384 | "metadata": {}, 385 | "outputs": [ 386 | { 387 | "data": { 388 | "text/plain": [ 389 | "[]" 390 | ] 391 | }, 392 | "execution_count": 14, 393 | "metadata": {}, 394 | "output_type": "execute_result" 395 | } 396 | ], 397 | "source": [ 398 | "foo = [] # create an empty list\n", 399 | "foo" 400 | ] 401 | }, 402 | { 403 | "cell_type": "code", 404 | "execution_count": 15, 405 | "metadata": {}, 406 | "outputs": [ 407 | { 408 | "data": 
{ 409 | "text/plain": [ 410 | "['A', 'B', 'C']" 411 | ] 412 | }, 413 | "execution_count": 15, 414 | "metadata": {}, 415 | "output_type": "execute_result" 416 | } 417 | ], 418 | "source": [ 419 | "foo.append(\"A\") # add to the end\n", 420 | "foo.append('B') # add to the end, single or double quote makes no difference\n", 421 | "foo.append(\"C\") # add to the end\n", 422 | "foo" 423 | ] 424 | }, 425 | { 426 | "cell_type": "code", 427 | "execution_count": 16, 428 | "metadata": {}, 429 | "outputs": [ 430 | { 431 | "data": { 432 | "text/plain": [ 433 | "['A', 'B', 'C', 'F']" 434 | ] 435 | }, 436 | "execution_count": 16, 437 | "metadata": {}, 438 | "output_type": "execute_result" 439 | } 440 | ], 441 | "source": [ 442 | "foo.insert(5,\"F\") # insert at an index\n", 443 | "foo" 444 | ] 445 | }, 446 | { 447 | "cell_type": "code", 448 | "execution_count": 17, 449 | "metadata": {}, 450 | "outputs": [ 451 | { 452 | "data": { 453 | "text/plain": [ 454 | "(['A', 'B', 'C', 'F'], ['A', 'B', 'C', 'F'])" 455 | ] 456 | }, 457 | "execution_count": 17, 458 | "metadata": {}, 459 | "output_type": "execute_result" 460 | } 461 | ], 462 | "source": [ 463 | "sorted_list = sorted(foo) # returns a new, sorted list\n", 464 | "sorted_list, foo" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 18, 470 | "metadata": {}, 471 | "outputs": [ 472 | { 473 | "data": { 474 | "text/plain": [ 475 | "['A', 'B', 'C', 'F']" 476 | ] 477 | }, 478 | "execution_count": 18, 479 | "metadata": {}, 480 | "output_type": "execute_result" 481 | } 482 | ], 483 | "source": [ 484 | "foo.sort() # in place sort\n", 485 | "foo" 486 | ] 487 | }, 488 | { 489 | "cell_type": "code", 490 | "execution_count": 19, 491 | "metadata": {}, 492 | "outputs": [ 493 | { 494 | "data": { 495 | "text/plain": [ 496 | "['A', 'C', 'F']" 497 | ] 498 | }, 499 | "execution_count": 19, 500 | "metadata": {}, 501 | "output_type": "execute_result" 502 | } 503 | ], 504 | "source": [ 505 | "foo.pop(1) # return and remove the 
item at index 1\n", 506 | "foo" 507 | ] 508 | }, 509 | { 510 | "cell_type": "code", 511 | "execution_count": 20, 512 | "metadata": {}, 513 | "outputs": [ 514 | { 515 | "data": { 516 | "text/plain": [ 517 | "['F', 'C', 'A']" 518 | ] 519 | }, 520 | "execution_count": 20, 521 | "metadata": {}, 522 | "output_type": "execute_result" 523 | } 524 | ], 525 | "source": [ 526 | "foo.reverse() # reverse the list in place\n", 527 | "foo" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 21, 533 | "metadata": {}, 534 | "outputs": [], 535 | "source": [ 536 | "foo.append(\"A\")" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 22, 542 | "metadata": {}, 543 | "outputs": [ 544 | { 545 | "data": { 546 | "text/plain": [ 547 | "['F', 'C', 'A', 'A']" 548 | ] 549 | }, 550 | "execution_count": 22, 551 | "metadata": {}, 552 | "output_type": "execute_result" 553 | } 554 | ], 555 | "source": [ 556 | "foo" 557 | ] 558 | }, 559 | { 560 | "cell_type": "code", 561 | "execution_count": 23, 562 | "metadata": {}, 563 | "outputs": [ 564 | { 565 | "data": { 566 | "text/plain": [ 567 | "2" 568 | ] 569 | }, 570 | "execution_count": 23, 571 | "metadata": {}, 572 | "output_type": "execute_result" 573 | } 574 | ], 575 | "source": [ 576 | "a = foo.index(\"A\") # return index of the first occurrence of \"A\"\n", 577 | "a" 578 | ] 579 | }, 580 | { 581 | "cell_type": "code", 582 | "execution_count": 24, 583 | "metadata": {}, 584 | "outputs": [ 585 | { 586 | "data": { 587 | "text/plain": [ 588 | "'C'" 589 | ] 590 | }, 591 | "execution_count": 24, 592 | "metadata": {}, 593 | "output_type": "execute_result" 594 | } 595 | ], 596 | "source": [ 597 | "foo.pop(foo.index(\"C\"))" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": {}, 603 | "source": [ 604 | "## Tuple" 605 | ] 606 | }, 607 | { 608 | "cell_type": "markdown", 609 | "metadata": { 610 | "ExecuteTime": { 611 | "end_time": "2017-10-03T15:58:48.469747Z", 612 | "start_time": "2017-10-03T15:58:48.461327Z" 613 | }
614 | }, 615 | "source": [ 616 | "A tuple is an immutable sequence: it behaves like a list whose elements cannot be changed." 617 | ] 618 | }, 619 | { 620 | "cell_type": "markdown", 621 | "metadata": {}, 622 | "source": [ 623 | "Initialize a tuple with parentheses. " 624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": 25, 629 | "metadata": { 630 | "ExecuteTime": { 631 | "end_time": "2018-10-09T00:12:16.790034Z", 632 | "start_time": "2018-10-09T00:12:16.784149Z" 633 | } 634 | }, 635 | "outputs": [ 636 | { 637 | "name": "stdout", 638 | "output_type": "stream", 639 | "text": [ 640 | "(1, 2, 3, 10)\n", 641 | "First element of a_tuple: 1\n" 642 | ] 643 | } 644 | ], 645 | "source": [ 646 | "a_tuple = (1, 2, 3, 10)\n", 647 | "print(a_tuple)\n", 648 | "print(\"First element of a_tuple: %i\"%a_tuple[0])" 649 | ] 650 | }, 651 | { 652 | "cell_type": "code", 653 | "execution_count": 26, 654 | "metadata": {}, 655 | "outputs": [], 656 | "source": [ 657 | "#a_tuple.remove(2) # would raise an AttributeError: tuples have no remove method" 658 | ] 659 | }, 660 | { 661 | "cell_type": "markdown", 662 | "metadata": { 663 | "ExecuteTime": { 664 | "end_time": "2018-10-09T00:12:17.683012Z", 665 | "start_time": "2018-10-09T00:12:17.672651Z" 666 | } 667 | }, 668 | "source": [ 669 | "You cannot change the values of `a_tuple`:" 670 | ] 671 | }, 672 | { 673 | "cell_type": "code", 674 | "execution_count": 27, 675 | "metadata": { 676 | "ExecuteTime": { 677 | "end_time": "2018-10-09T00:12:24.288097Z", 678 | "start_time": "2018-10-09T00:12:24.278444Z" 679 | } 680 | }, 681 | "outputs": [], 682 | "source": [ 683 | "#a_tuple[0] = 5 # would raise a TypeError: tuples do not support item assignment" 684 | ] 685 | }, 686 | { 687 | "cell_type": "markdown", 688 | "metadata": {}, 689 | "source": [ 690 | "In order to create a single-value tuple, you need to add a trailing ',':" 691 | ] 692 | }, 693 | { 694 | "cell_type": "code", 695 | "execution_count": 28, 696 | "metadata": {}, 697 | "outputs": [ 698 | { 699 | "data": { 700 | "text/plain": [ 701 | "int" 702 | ] 703 | }, 704 | "execution_count": 28, 705 | "metadata": {}, 706 |
"output_type": "execute_result" 707 | } 708 | ], 709 | "source": [ 710 | "a_tuple = (1) # this would create an int type\n", 711 | "type(a_tuple)" 712 | ] 713 | }, 714 | { 715 | "cell_type": "code", 716 | "execution_count": 29, 717 | "metadata": {}, 718 | "outputs": [ 719 | { 720 | "data": { 721 | "text/plain": [ 722 | "tuple" 723 | ] 724 | }, 725 | "execution_count": 29, 726 | "metadata": {}, 727 | "output_type": "execute_result" 728 | } 729 | ], 730 | "source": [ 731 | "b_tuple = (42,) # this would create a tuple type, take note of the comma.\n", 732 | "type(b_tuple)" 733 | ] 734 | } 735 | ], 736 | "metadata": { 737 | "hide_input": false, 738 | "kernelspec": { 739 | "display_name": "Python 3 (ipykernel)", 740 | "language": "python", 741 | "name": "python3" 742 | }, 743 | "language_info": { 744 | "codemirror_mode": { 745 | "name": "ipython", 746 | "version": 3 747 | }, 748 | "file_extension": ".py", 749 | "mimetype": "text/x-python", 750 | "name": "python", 751 | "nbconvert_exporter": "python", 752 | "pygments_lexer": "ipython3", 753 | "version": "3.9.12" 754 | }, 755 | "toc": { 756 | "base_numbering": 1, 757 | "nav_menu": {}, 758 | "number_sections": true, 759 | "sideBar": true, 760 | "skip_h1_title": false, 761 | "title_cell": "Table of Contents", 762 | "title_sidebar": "Contents", 763 | "toc_cell": false, 764 | "toc_position": { 765 | "height": "677px", 766 | "left": "0px", 767 | "right": "1111px", 768 | "top": "43px", 769 | "width": "340px" 770 | }, 771 | "toc_section_display": "block", 772 | "toc_window_display": true 773 | }, 774 | "varInspector": { 775 | "cols": { 776 | "lenName": 16, 777 | "lenType": 16, 778 | "lenVar": 40 779 | }, 780 | "kernels_config": { 781 | "python": { 782 | "delete_cmd_postfix": "", 783 | "delete_cmd_prefix": "del ", 784 | "library": "var_list.py", 785 | "varRefreshCmd": "print(var_dic_list())" 786 | }, 787 | "r": { 788 | "delete_cmd_postfix": ") ", 789 | "delete_cmd_prefix": "rm(", 790 | "library": "var_list.r", 791 | 
"varRefreshCmd": "cat(var_dic_list()) " 792 | } 793 | }, 794 | "types_to_exclude": [ 795 | "module", 796 | "function", 797 | "builtin_function_or_method", 798 | "instance", 799 | "_Feature" 800 | ], 801 | "window_display": false 802 | } 803 | }, 804 | "nbformat": 4, 805 | "nbformat_minor": 2 806 | } 807 | -------------------------------------------------------------------------------- /04 Sets and Maps.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**What you learn:**\n", 8 | "\n", 9 | "In this notebook you will learn about basic data structures in Python. This includes sets and maps (dictionaries).\n", 10 | "\n", 11 | "Based on a [tutorial by Zhiya Zuo](https://github.com/zhiyzuo/python-tutorial) and extended where appropriate.\n", 12 | "\n", 13 | "Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)\n", 14 | "\n", 15 | "This notebook is available on https://github.com/BigDataAnalyticsGroup/python." 
16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "## Set" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "a set is an unordered, duplicate-free collection of items" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 1, 35 | "metadata": {}, 36 | "outputs": [ 37 | { 38 | "data": { 39 | "text/plain": [ 40 | "([42, 9, 53, 7, 9], list)" 41 | ] 42 | }, 43 | "execution_count": 1, 44 | "metadata": {}, 45 | "output_type": "execute_result" 46 | } 47 | ], 48 | "source": [ 49 | "a_list = [42, 9, 53, 7, 9] \n", 50 | "a_list, type(a_list)" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 2, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "data": { 60 | "text/plain": [ 61 | "({7, 9, 42, 53}, set)" 62 | ] 63 | }, 64 | "execution_count": 2, 65 | "metadata": {}, 66 | "output_type": "execute_result" 67 | } 68 | ], 69 | "source": [ 70 | "a_set = {42, 9, 53, 7, 9}\n", 71 | "a_set, type(a_set)" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 3, 77 | "metadata": {}, 78 | "outputs": [ 79 | { 80 | "data": { 81 | "text/plain": [ 82 | "['__and__',\n", 83 | " '__class__',\n", 84 | " '__class_getitem__',\n", 85 | " '__contains__',\n", 86 | " '__delattr__',\n", 87 | " '__dir__',\n", 88 | " '__doc__',\n", 89 | " '__eq__',\n", 90 | " '__format__',\n", 91 | " '__ge__',\n", 92 | " '__getattribute__',\n", 93 | " '__gt__',\n", 94 | " '__hash__',\n", 95 | " '__iand__',\n", 96 | " '__init__',\n", 97 | " '__init_subclass__',\n", 98 | " '__ior__',\n", 99 | " '__isub__',\n", 100 | " '__iter__',\n", 101 | " '__ixor__',\n", 102 | " '__le__',\n", 103 | " '__len__',\n", 104 | " '__lt__',\n", 105 | " '__ne__',\n", 106 | " '__new__',\n", 107 | " '__or__',\n", 108 | " '__rand__',\n", 109 | " '__reduce__',\n", 110 | " '__reduce_ex__',\n", 111 | " '__repr__',\n", 112 | " '__ror__',\n", 113 | " '__rsub__',\n", 114 | " '__rxor__',\n", 115 | " 
'__setattr__',\n", 116 | " '__sizeof__',\n", 117 | " '__str__',\n", 118 | " '__sub__',\n", 119 | " '__subclasshook__',\n", 120 | " '__xor__',\n", 121 | " 'add',\n", 122 | " 'clear',\n", 123 | " 'copy',\n", 124 | " 'difference',\n", 125 | " 'difference_update',\n", 126 | " 'discard',\n", 127 | " 'intersection',\n", 128 | " 'intersection_update',\n", 129 | " 'isdisjoint',\n", 130 | " 'issubset',\n", 131 | " 'issuperset',\n", 132 | " 'pop',\n", 133 | " 'remove',\n", 134 | " 'symmetric_difference',\n", 135 | " 'symmetric_difference_update',\n", 136 | " 'union',\n", 137 | " 'update']" 138 | ] 139 | }, 140 | "execution_count": 3, 141 | "metadata": {}, 142 | "output_type": "execute_result" 143 | } 144 | ], 145 | "source": [ 146 | "dir(a_set)" 147 | ] 148 | }, 149 | { 150 | "cell_type": "code", 151 | "execution_count": 4, 152 | "metadata": {}, 153 | "outputs": [ 154 | { 155 | "data": { 156 | "text/plain": [ 157 | "{7, 9, 42, 43, 53}" 158 | ] 159 | }, 160 | "execution_count": 4, 161 | "metadata": {}, 162 | "output_type": "execute_result" 163 | } 164 | ], 165 | "source": [ 166 | "a_set.add(43)\n", 167 | "a_set" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 5, 173 | "metadata": {}, 174 | "outputs": [ 175 | { 176 | "data": { 177 | "text/plain": [ 178 | "{7, 9, 42, 53}" 179 | ] 180 | }, 181 | "execution_count": 5, 182 | "metadata": {}, 183 | "output_type": "execute_result" 184 | } 185 | ], 186 | "source": [ 187 | "a_set.remove(43)\n", 188 | "a_set" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 6, 194 | "metadata": {}, 195 | "outputs": [ 196 | { 197 | "data": { 198 | "text/plain": [ 199 | "{7, 9, 42, 53}" 200 | ] 201 | }, 202 | "execution_count": 6, 203 | "metadata": {}, 204 | "output_type": "execute_result" 205 | } 206 | ], 207 | "source": [ 208 | "a_set" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 7, 214 | "metadata": {}, 215 | "outputs": [ 216 | { 217 | "data": { 218 | "text/plain": 
[ 219 | "7" 220 | ] 221 | }, 222 | "execution_count": 7, 223 | "metadata": {}, 224 | "output_type": "execute_result" 225 | } 226 | ], 227 | "source": [ 228 | "a_set.pop() # remove and return an arbitrary element from the set" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": 8, 234 | "metadata": {}, 235 | "outputs": [ 236 | { 237 | "data": { 238 | "text/plain": [ 239 | "[42, 9, 53, 7, 9]" 240 | ] 241 | }, 242 | "execution_count": 8, 243 | "metadata": {}, 244 | "output_type": "execute_result" 245 | } 246 | ], 247 | "source": [ 248 | "a_list" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 9, 254 | "metadata": {}, 255 | "outputs": [ 256 | { 257 | "data": { 258 | "text/plain": [ 259 | "({7, 9, 42, 53}, set)" 260 | ] 261 | }, 262 | "execution_count": 9, 263 | "metadata": {}, 264 | "output_type": "execute_result" 265 | } 266 | ], 267 | "source": [ 268 | "# you can convert a list to a set:\n", 269 | "conv = set(a_list)\n", 270 | "conv, type(conv)" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": 10, 276 | "metadata": {}, 277 | "outputs": [ 278 | { 279 | "data": { 280 | "text/plain": [ 281 | "{9, 42, 53}" 282 | ] 283 | }, 284 | "execution_count": 10, 285 | "metadata": {}, 286 | "output_type": "execute_result" 287 | } 288 | ], 289 | "source": [ 290 | "a_set" 291 | ] 292 | }, 293 | { 294 | "cell_type": "code", 295 | "execution_count": 11, 296 | "metadata": {}, 297 | "outputs": [ 298 | { 299 | "data": { 300 | "text/plain": [ 301 | "([9, 42, 53], list)" 302 | ] 303 | }, 304 | "execution_count": 11, 305 | "metadata": {}, 306 | "output_type": "execute_result" 307 | } 308 | ], 309 | "source": [ 310 | "# and vice versa:\n", 311 | "# you can convert a set to a list:\n", 312 | "conv = list(a_set)\n", 313 | "conv, type(conv)" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": {}, 319 | "source": [ 320 | "## Map (aka Dictionaries): key-value pairs" 321 | ] 322 | }, 323 | { 324 |
"cell_type": "markdown", 325 | "metadata": {}, 326 | "source": [ 327 | "A dictionary (aka map) is a collection of keys that are mapped to values. Keys are unique; the values may contain duplicates. Since Python 3.7, a dictionary preserves the insertion order of its keys, but entries are looked up by key, not by position." 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": {}, 333 | "source": [ 334 | "Initialize a dictionary using curly brackets `{}`:" 335 | ] 336 | }, 337 | { 338 | "cell_type": "code", 339 | "execution_count": 12, 340 | "metadata": { 341 | "ExecuteTime": { 342 | "end_time": "2018-10-09T00:12:58.821876Z", 343 | "start_time": "2018-10-09T00:12:58.817697Z" 344 | } 345 | }, 346 | "outputs": [ 347 | { 348 | "data": { 349 | "text/plain": [ 350 | "{1: 'foo', 7: 'bar', 3: 'blubb'}" 351 | ] 352 | }, 353 | "execution_count": 12, 354 | "metadata": {}, 355 | "output_type": "execute_result" 356 | } 357 | ], 358 | "source": [ 359 | "d = {} # empty dictionary\n", 360 | "d[1] = \"foo\" # add a key-value mapping: the key goes inside the square brackets\n", 361 | "d[7] = \"bar\"\n", 362 | "d[3] = \"blubb\"\n", 363 | "d" 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": 13, 369 | "metadata": {}, 370 | "outputs": [ 371 | { 372 | "data": { 373 | "text/plain": [ 374 | "{1: 'foo', 7: 'bar', 3: 'blubb', 'KI': 'AI'}" 375 | ] 376 | }, 377 | "execution_count": 13, 378 | "metadata": {}, 379 | "output_type": "execute_result" 380 | } 381 | ], 382 | "source": [ 383 | "d['KI'] = 'AI'\n", 384 | "d" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": 14, 390 | "metadata": {}, 391 | "outputs": [ 392 | { 393 | "data": { 394 | "text/plain": [ 395 | "'AI'" 396 | ] 397 | }, 398 | "execution_count": 14, 399 | "metadata": {}, 400 | "output_type": "execute_result" 401 | } 402 | ], 403 | "source": [ 404 | "d['KI'] " 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 15, 410 | "metadata": {}, 411 | "outputs": [ 412 | { 413 | "data": { 414 | "text/plain": [ 415 | "set" 416 | ] 417 | }, 418 | "execution_count": 15, 419 |
"metadata": {}, 420 | "output_type": "execute_result" 421 | } 422 | ], 423 | "source": [ 424 | "# notice that the type of {} is dict and NOT set (this is for historic reasons); an empty set is created with set():\n", 425 | "type(set())" 426 | ] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": {}, 431 | "source": [ 432 | "List all keys present in the dictionary:" 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": 16, 438 | "metadata": {}, 439 | "outputs": [ 440 | { 441 | "data": { 442 | "text/plain": [ 443 | "[1, 7, 3, 'KI']" 444 | ] 445 | }, 446 | "execution_count": 16, 447 | "metadata": {}, 448 | "output_type": "execute_result" 449 | } 450 | ], 451 | "source": [ 452 | "list(d.keys())" 453 | ] 454 | }, 455 | { 456 | "cell_type": "code", 457 | "execution_count": 17, 458 | "metadata": {}, 459 | "outputs": [ 460 | { 461 | "data": { 462 | "text/plain": [ 463 | "{'anchor': 2, 'dock': 3}" 464 | ] 465 | }, 466 | "execution_count": 17, 467 | "metadata": {}, 468 | "output_type": "execute_result" 469 | } 470 | ], 471 | "source": [ 472 | "#wordcount_map = {} # create a new, empty dict\n", 473 | "wordcount_map = {\"anchor\":2, \"dock\":3} # create a new dict and add key-value pairs\n", 474 | "wordcount_map" 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": 18, 480 | "metadata": {}, 481 | "outputs": [ 482 | { 483 | "data": { 484 | "text/plain": [ 485 | "{'anchor': 2, 'dock': 3, 'the': 10, 'a': 8, 'boat': 1}" 486 | ] 487 | }, 488 | "execution_count": 18, 489 | "metadata": {}, 490 | "output_type": "execute_result" 491 | } 492 | ], 493 | "source": [ 494 | "# add keys and values:\n", 495 | "wordcount_map[\"the\"] = 10\n", 496 | "wordcount_map[\"a\"] = 8\n", 497 | "wordcount_map[\"boat\"] = 1\n", 498 | "wordcount_map" 499 | ] 500 | }, 501 | { 502 | "cell_type": "code", 503 | "execution_count": 19, 504 | "metadata": {}, 505 | "outputs": [ 506 | { 507 | "name": "stdout", 508 | "output_type": "stream", 509 | "text": [ 510 | "10\n", 511 | "['anchor',
'dock', 'the', 'a', 'boat']\n", 512 | "[2, 3, 10, 8, 1]\n" 513 | ] 514 | } 515 | ], 516 | "source": [ 517 | "print(wordcount_map[\"the\"]) # value of a key\n", 518 | "print(list(wordcount_map.keys())) # List of keys\n", 519 | "print(list(wordcount_map.values())) # List of values" 520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": 20, 525 | "metadata": {}, 526 | "outputs": [ 527 | { 528 | "name": "stdout", 529 | "output_type": "stream", 530 | "text": [ 531 | "True\n" 532 | ] 533 | } 534 | ], 535 | "source": [ 536 | "print(\"a\" in wordcount_map) # True" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 21, 542 | "metadata": {}, 543 | "outputs": [], 544 | "source": [ 545 | "mySet = {3,7,2,5}\n" 546 | ] 547 | }, 548 | { 549 | "cell_type": "code", 550 | "execution_count": 22, 551 | "metadata": {}, 552 | "outputs": [ 553 | { 554 | "data": { 555 | "text/plain": [ 556 | "False" 557 | ] 558 | }, 559 | "execution_count": 22, 560 | "metadata": {}, 561 | "output_type": "execute_result" 562 | } 563 | ], 564 | "source": [ 565 | "42 in mySet" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": 23, 571 | "metadata": {}, 572 | "outputs": [ 573 | { 574 | "name": "stdout", 575 | "output_type": "stream", 576 | "text": [ 577 | "[('anchor', 2), ('dock', 3), ('the', 10), ('a', 8), ('boat', 1)]\n" 578 | ] 579 | } 580 | ], 581 | "source": [ 582 | "print(list(wordcount_map.items())) # prints a list of (key, value) tuples" 583 | ] 584 | }, 585 | { 586 | "cell_type": "code", 587 | "execution_count": 24, 588 | "metadata": {}, 589 | "outputs": [ 590 | { 591 | "name": "stdout", 592 | "output_type": "stream", 593 | "text": [ 594 | "3\n" 595 | ] 596 | } 597 | ], 598 | "source": [ 599 | "print(wordcount_map[\"dock\"]) # value of an existing key; a missing key would throw a KeyError" 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": 25, 605 | "metadata": {}, 606 | "outputs": [ 607 | { 608 | "data": { 609 | "text/plain": [ 610 | "['__and__',\n",
611 | " '__class__',\n", 612 | " '__class_getitem__',\n", 613 | " '__contains__',\n", 614 | " '__delattr__',\n", 615 | " '__dir__',\n", 616 | " '__doc__',\n", 617 | " '__eq__',\n", 618 | " '__format__',\n", 619 | " '__ge__',\n", 620 | " '__getattribute__',\n", 621 | " '__gt__',\n", 622 | " '__hash__',\n", 623 | " '__iand__',\n", 624 | " '__init__',\n", 625 | " '__init_subclass__',\n", 626 | " '__ior__',\n", 627 | " '__isub__',\n", 628 | " '__iter__',\n", 629 | " '__ixor__',\n", 630 | " '__le__',\n", 631 | " '__len__',\n", 632 | " '__lt__',\n", 633 | " '__ne__',\n", 634 | " '__new__',\n", 635 | " '__or__',\n", 636 | " '__rand__',\n", 637 | " '__reduce__',\n", 638 | " '__reduce_ex__',\n", 639 | " '__repr__',\n", 640 | " '__ror__',\n", 641 | " '__rsub__',\n", 642 | " '__rxor__',\n", 643 | " '__setattr__',\n", 644 | " '__sizeof__',\n", 645 | " '__str__',\n", 646 | " '__sub__',\n", 647 | " '__subclasshook__',\n", 648 | " '__xor__',\n", 649 | " 'add',\n", 650 | " 'clear',\n", 651 | " 'copy',\n", 652 | " 'difference',\n", 653 | " 'difference_update',\n", 654 | " 'discard',\n", 655 | " 'intersection',\n", 656 | " 'intersection_update',\n", 657 | " 'isdisjoint',\n", 658 | " 'issubset',\n", 659 | " 'issuperset',\n", 660 | " 'pop',\n", 661 | " 'remove',\n", 662 | " 'symmetric_difference',\n", 663 | " 'symmetric_difference_update',\n", 664 | " 'union',\n", 665 | " 'update']" 666 | ] 667 | }, 668 | "execution_count": 25, 669 | "metadata": {}, 670 | "output_type": "execute_result" 671 | } 672 | ], 673 | "source": [ 674 | "dir(mySet)" 675 | ] 676 | } 677 | ], 678 | "metadata": { 679 | "hide_input": false, 680 | "kernelspec": { 681 | "display_name": "Python 3 (ipykernel)", 682 | "language": "python", 683 | "name": "python3" 684 | }, 685 | "language_info": { 686 | "codemirror_mode": { 687 | "name": "ipython", 688 | "version": 3 689 | }, 690 | "file_extension": ".py", 691 | "mimetype": "text/x-python", 692 | "name": "python", 693 | "nbconvert_exporter": "python", 694 | 
"pygments_lexer": "ipython3", 695 | "version": "3.9.12" 696 | }, 697 | "toc": { 698 | "base_numbering": 1, 699 | "nav_menu": {}, 700 | "number_sections": true, 701 | "sideBar": true, 702 | "skip_h1_title": false, 703 | "title_cell": "Table of Contents", 704 | "title_sidebar": "Contents", 705 | "toc_cell": false, 706 | "toc_position": { 707 | "height": "677px", 708 | "left": "0px", 709 | "right": "1111px", 710 | "top": "43px", 711 | "width": "340px" 712 | }, 713 | "toc_section_display": "block", 714 | "toc_window_display": true 715 | }, 716 | "varInspector": { 717 | "cols": { 718 | "lenName": 16, 719 | "lenType": 16, 720 | "lenVar": 40 721 | }, 722 | "kernels_config": { 723 | "python": { 724 | "delete_cmd_postfix": "", 725 | "delete_cmd_prefix": "del ", 726 | "library": "var_list.py", 727 | "varRefreshCmd": "print(var_dic_list())" 728 | }, 729 | "r": { 730 | "delete_cmd_postfix": ") ", 731 | "delete_cmd_prefix": "rm(", 732 | "library": "var_list.r", 733 | "varRefreshCmd": "cat(var_dic_list()) " 734 | } 735 | }, 736 | "types_to_exclude": [ 737 | "module", 738 | "function", 739 | "builtin_function_or_method", 740 | "instance", 741 | "_Feature" 742 | ], 743 | "window_display": false 744 | } 745 | }, 746 | "nbformat": 4, 747 | "nbformat_minor": 2 748 | } 749 | -------------------------------------------------------------------------------- /05 Control Logics.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**What you learn:**\n", 8 | "\n", 9 | "In this notebook you will learn about control logics in Python. 
This includes comparisons, if-then-else, for-loops, while-loops, break and continue.\n", 10 | "\n", 11 | "Based on a [tutorial by Zhiya Zuo](https://github.com/zhiyzuo/python-tutorial) and extended where appropriate.\n", 12 | "\n", 13 | "Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)\n", 14 | "\n", 15 | "This notebook is available on https://github.com/BigDataAnalyticsGroup/python." 16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "#### Comparison" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "Python's comparison syntax is close to the usual hand-written notation: \n", 30 | "\n", 31 | "1. Larger (or equal): `>` (`>=`)\n", 32 | "2. Smaller (or equal): `<` (`<=`)\n", 33 | "3. Equal to: `==` (__Note that there are two equal signs here__)\n", 34 | "4. Not equal to: `!=`" 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 1, 40 | "metadata": { 41 | "ExecuteTime": { 42 | "end_time": "2018-10-09T00:16:20.683160Z", 43 | "start_time": "2018-10-09T00:16:20.676171Z" 44 | } 45 | }, 46 | "outputs": [ 47 | { 48 | "data": { 49 | "text/plain": [ 50 | "False" 51 | ] 52 | }, 53 | "execution_count": 1, 54 | "metadata": {}, 55 | "output_type": "execute_result" 56 | } 57 | ], 58 | "source": [ 59 | "# the following is a comparison; it evaluates to a boolean value\n", 60 | "3 == 5 " 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 2, 66 | "metadata": {}, 67 | "outputs": [ 68 | { 69 | "data": { 70 | "text/plain": [ 71 | "767" 72 | ] 73 | }, 74 | "execution_count": 2, 75 | "metadata": {}, 76 | "output_type": "execute_result" 77 | } 78 | ], 79 | "source": [ 80 | "a = 767\n", 81 | "a" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 3, 87 | "metadata": {}, 88 | "outputs": [ 89 | { 90 | "data": { 91 | "text/plain": [ 92 | "True"
93 | ] 94 | }, 95 | "execution_count": 3, 96 | "metadata": {}, 97 | "output_type": "execute_result" 98 | } 99 | ], 100 | "source": [ 101 | "a == 767" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 4, 107 | "metadata": {}, 108 | "outputs": [ 109 | { 110 | "data": { 111 | "text/plain": [ 112 | "767" 113 | ] 114 | }, 115 | "execution_count": 4, 116 | "metadata": {}, 117 | "output_type": "execute_result" 118 | } 119 | ], 120 | "source": [ 121 | "a" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 5, 127 | "metadata": { 128 | "ExecuteTime": { 129 | "end_time": "2018-10-09T00:16:21.107023Z", 130 | "start_time": "2018-10-09T00:16:21.102607Z" 131 | } 132 | }, 133 | "outputs": [ 134 | { 135 | "data": { 136 | "text/plain": [ 137 | "True" 138 | ] 139 | }, 140 | "execution_count": 5, 141 | "metadata": {}, 142 | "output_type": "execute_result" 143 | } 144 | ], 145 | "source": [ 146 | "72 >= 2" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "IMPORTANT: It is worth noting that comparisons between floating point numbers are tricky." 
154 | ] 155 | }, 156 | { 157 | "cell_type": "code", 158 | "execution_count": 6, 159 | "metadata": { 160 | "ExecuteTime": { 161 | "end_time": "2018-10-09T00:17:16.577984Z", 162 | "start_time": "2018-10-09T00:17:16.571556Z" 163 | } 164 | }, 165 | "outputs": [ 166 | { 167 | "name": "stdout", 168 | "output_type": "stream", 169 | "text": [ 170 | "6.6000000000000005\n" 171 | ] 172 | }, 173 | { 174 | "data": { 175 | "text/plain": [ 176 | "False" 177 | ] 178 | }, 179 | "execution_count": 6, 180 | "metadata": {}, 181 | "output_type": "execute_result" 182 | } 183 | ], 184 | "source": [ 185 | "print(2.2 * 3.0)\n", 186 | "(2.2 * 3.0) == 6.6" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 7, 192 | "metadata": {}, 193 | "outputs": [ 194 | { 195 | "data": { 196 | "text/plain": [ 197 | "True" 198 | ] 199 | }, 200 | "execution_count": 7, 201 | "metadata": {}, 202 | "output_type": "execute_result" 203 | } 204 | ], 205 | "source": [ 206 | "2.2 * 10.0 == 22.0" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "See https://docs.python.org/3/tutorial/floatingpoint.html for the explanation; you will get back to this in the Programming 2 lecture." 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "Therefore, be really careful when you compare floating point numbers for equality. Comparing within a tolerance, for example with `math.isclose`, is usually the safer choice." 
221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": {}, 226 | "source": [ 227 | "#### If-(then)-Else" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": 8, 233 | "metadata": { 234 | "ExecuteTime": { 235 | "end_time": "2018-10-09T01:38:46.027739Z", 236 | "start_time": "2018-10-09T01:38:46.023274Z" 237 | } 238 | }, 239 | "outputs": [], 240 | "source": [ 241 | "sum = 42" 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 9, 247 | "metadata": { 248 | "ExecuteTime": { 249 | "end_time": "2018-10-09T01:38:49.218876Z", 250 | "start_time": "2018-10-09T01:38:49.214871Z" 251 | } 252 | }, 253 | "outputs": [ 254 | { 255 | "name": "stdout", 256 | "output_type": "stream", 257 | "text": [ 258 | "sum is above 5\n", 259 | "dsfsd\n" 260 | ] 261 | } 262 | ], 263 | "source": [ 264 | "if sum > 5:\n", 265 | " print('sum is above 5') # this statement MUST be indented\n", 266 | " print('dsfsd')" 267 | ] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "execution_count": 10, 272 | "metadata": {}, 273 | "outputs": [ 274 | { 275 | "name": "stdout", 276 | "output_type": "stream", 277 | "text": [ 278 | "sum is above 5\n", 279 | "sum is above 15\n" 280 | ] 281 | } 282 | ], 283 | "source": [ 284 | "if sum > 5:\n", 285 | " print('sum is above 5') # this statement MUST be indented\n", 286 | " if sum > 15:\n", 287 | " print('sum is above 15')" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": {}, 293 | "source": [ 294 | "In Python, indentation (a tab or, by convention, four spaces per level) marks blocks. In Java and C++, blocks are marked using curly brackets {}.\n", 295 | "\n", 296 | "```Java\n", 297 | "if (sum>5) {\n", 298 | " System.out.println(\"sum is above 5\");\n", 299 | "}\n", 300 | "```\n", 301 | "\n", 302 | "We do not have this in Python!" 
303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": 11, 308 | "metadata": { 309 | "ExecuteTime": { 310 | "end_time": "2018-10-09T01:38:47.709640Z", 311 | "start_time": "2018-10-09T01:38:47.704307Z" 312 | } 313 | }, 314 | "outputs": [ 315 | { 316 | "name": "stdout", 317 | "output_type": "stream", 318 | "text": [ 319 | "sum is above 0 and its value is 1\n" 320 | ] 321 | } 322 | ], 323 | "source": [ 324 | "sum = 1\n", 325 | "if sum == 0:\n", 326 | " print(\"sum is 0\") \n", 327 | "elif sum < 0:\n", 328 | " print(\"sum is less than 0\")\n", 329 | "else:\n", 330 | " print(\"sum is above 0 and its value is \" + str(sum)) # Cast sum into string type." 331 | ] 332 | }, 333 | { 334 | "cell_type": "markdown", 335 | "metadata": {}, 336 | "source": [ 337 | "Comparing to check if strings are similar" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": 12, 343 | "metadata": { 344 | "ExecuteTime": { 345 | "end_time": "2018-10-09T01:38:50.755605Z", 346 | "start_time": "2018-10-09T01:38:50.751523Z" 347 | } 348 | }, 349 | "outputs": [], 350 | "source": [ 351 | "store_name = 'Walmart'" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": 13, 357 | "metadata": {}, 358 | "outputs": [ 359 | { 360 | "data": { 361 | "text/plain": [ 362 | "True" 363 | ] 364 | }, 365 | "execution_count": 13, 366 | "metadata": {}, 367 | "output_type": "execute_result" 368 | } 369 | ], 370 | "source": [ 371 | "store_name == 'Walmart'" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 14, 377 | "metadata": {}, 378 | "outputs": [ 379 | { 380 | "data": { 381 | "text/plain": [ 382 | "False" 383 | ] 384 | }, 385 | "execution_count": 14, 386 | "metadata": {}, 387 | "output_type": "execute_result" 388 | } 389 | ], 390 | "source": [ 391 | "store_name == 'walmart'" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": 15, 397 | "metadata": { 398 | "ExecuteTime": { 399 | "end_time": 
"2018-10-09T01:38:57.568568Z", 400 | "start_time": "2018-10-09T01:38:57.563726Z" 401 | } 402 | }, 403 | "outputs": [ 404 | { 405 | "name": "stdout", 406 | "output_type": "stream", 407 | "text": [ 408 | "yep.\n" 409 | ] 410 | } 411 | ], 412 | "source": [ 413 | "# check whether a substring is contained in a string:\n", 414 | "if 'alm' in store_name:\n", 415 | " print(\"yep.\")\n", 416 | "else:\n", 417 | " print(\"nope.\")" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 16, 423 | "metadata": {}, 424 | "outputs": [ 425 | { 426 | "name": "stdout", 427 | "output_type": "stream", 428 | "text": [ 429 | "Really Not Weird\n" 430 | ] 431 | } 432 | ], 433 | "source": [ 434 | "# another example:\n", 435 | "n = 42\n", 436 | "if n%2 == 1:\n", 437 | " print(\"Weird\")\n", 438 | "else:\n", 439 | " if n >= 2 and n <= 5:\n", 440 | " print(\"Not Weird\")\n", 441 | " elif n >= 6 and n <= 20:\n", 442 | " print(\"Weird\")\n", 443 | " elif n > 20: \n", 444 | " print(\"Really Not Weird\")" 445 | ] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "metadata": {}, 450 | "source": [ 451 | "#### For loop: Iterating through a sequence" 452 | ] 453 | }, 454 | { 455 | "cell_type": "markdown", 456 | "metadata": { 457 | "ExecuteTime": { 458 | "end_time": "2017-10-04T18:38:51.792543Z", 459 | "start_time": "2017-10-04T18:38:51.786228Z" 460 | } 461 | }, 462 | "source": [ 463 | "`range()` is a function to create integer sequences:" 464 | ] 465 | }, 466 | { 467 | "cell_type": "code", 468 | "execution_count": 17, 469 | "metadata": {}, 470 | "outputs": [ 471 | { 472 | "data": { 473 | "text/plain": [ 474 | "[0, 1, 2, 3, 4, 5, 6, 7, 8]" 475 | ] 476 | }, 477 | "execution_count": 17, 478 | "metadata": {}, 479 | "output_type": "execute_result" 480 | } 481 | ], 482 | "source": [ 483 | "# create an int range [0;9[\n", 484 | "list(range(9))" 485 | ] 486 | }, 487 | { 488 | "cell_type": "code", 489 | "execution_count": 18, 490 | "metadata": {}, 491 | "outputs": [ 492 | { 493 | "data": 
{ 494 | "text/plain": [ 495 | "[4, 5, 6]" 496 | ] 497 | }, 498 | "execution_count": 18, 499 | "metadata": {}, 500 | "output_type": "execute_result" 501 | } 502 | ], 503 | "source": [ 504 | "# create an int range [4;7[\n", 505 | "list(range(4,7))" 506 | ] 507 | }, 508 | { 509 | "cell_type": "code", 510 | "execution_count": 19, 511 | "metadata": {}, 512 | "outputs": [ 513 | { 514 | "data": { 515 | "text/plain": [ 516 | "[7, 10, 13, 16, 19, 22]" 517 | ] 518 | }, 519 | "execution_count": 19, 520 | "metadata": {}, 521 | "output_type": "execute_result" 522 | } 523 | ], 524 | "source": [ 525 | "# create an int range [7;25[, but starting from 7 include only every third integer\n", 526 | "list(range(7,25,3))" 527 | ] 528 | }, 529 | { 530 | "cell_type": "code", 531 | "execution_count": 20, 532 | "metadata": { 533 | "ExecuteTime": { 534 | "end_time": "2018-10-09T01:41:05.935416Z", 535 | "start_time": "2018-10-09T01:41:05.930030Z" 536 | } 537 | }, 538 | "outputs": [ 539 | { 540 | "name": "stdout", 541 | "output_type": "stream", 542 | "text": [ 543 | "7\n", 544 | "10\n", 545 | "13\n", 546 | "16\n", 547 | "19\n", 548 | "22\n" 549 | ] 550 | } 551 | ], 552 | "source": [ 553 | "# range() is very useful in combination with for-loops:\n", 554 | "for index in range(7,25,3): # start at 7, stop before 25, step 3\n", 555 | " print(index)\n", 556 | " \n", 557 | "# equivalent Java/C++ syntax:\n", 558 | "# for (int i=7; i<25; i+=3){\n", 559 | "# // actual code\n", 560 | "# }" 561 | ] 562 | }, 563 | { 564 | "cell_type": "code", 565 | "execution_count": 21, 566 | "metadata": {}, 567 | "outputs": [ 568 | { 569 | "data": { 570 | "text/plain": [ 571 | "[0, 1, 2, 3, 4, 5, 6]" 572 | ] 573 | }, 574 | "execution_count": 21, 575 | "metadata": {}, 576 | "output_type": "execute_result" 577 | } 578 | ], 579 | "source": [ 580 | "list(range(len(store_name)))" 581 | ] 582 | }, 583 | { 584 | "cell_type": "code", 585 | "execution_count": 22, 586 | "metadata": {}, 587 | "outputs": [ 588 | { 589 | "name": "stdout", 590 | 
"output_type": "stream", 591 | "text": [ 592 | "The 0th letter in store_name is: W\n", 593 | "The 1th letter in store_name is: a\n", 594 | "The 2th letter in store_name is: l\n", 595 | "The 3th letter in store_name is: m\n", 596 | "The 4th letter in store_name is: a\n", 597 | "The 5th letter in store_name is: r\n", 598 | "The 6th letter in store_name is: t\n" 599 | ] 600 | } 601 | ], 602 | "source": [ 603 | "# range() is very useful in combination with for-loops:\n", 604 | "for index in range(len(store_name)): # length of a sequence\n", 605 | " print(\"The %ith letter in store_name is: %s\"%(index, store_name[index]))" 606 | ] 607 | }, 608 | { 609 | "cell_type": "code", 610 | "execution_count": 23, 611 | "metadata": { 612 | "ExecuteTime": { 613 | "end_time": "2018-10-09T01:39:07.309307Z", 614 | "start_time": "2018-10-09T01:39:07.305696Z" 615 | } 616 | }, 617 | "outputs": [ 618 | { 619 | "name": "stdout", 620 | "output_type": "stream", 621 | "text": [ 622 | "W\n", 623 | "a\n", 624 | "l\n", 625 | "m\n", 626 | "a\n", 627 | "r\n", 628 | "t\n" 629 | ] 630 | } 631 | ], 632 | "source": [ 633 | "for letter in store_name:\n", 634 | " print(letter)" 635 | ] 636 | }, 637 | { 638 | "cell_type": "markdown", 639 | "metadata": {}, 640 | "source": [ 641 | "#### While loop: Keep doing until condition no longer holds." 642 | ] 643 | }, 644 | { 645 | "cell_type": "markdown", 646 | "metadata": {}, 647 | "source": [ 648 | "Use `for` when you know __the exact number of iterations__; use `while` when you __do not (e.g., checking convergence)__." 
649 | ] 650 | }, 651 | { 652 | "cell_type": "code", 653 | "execution_count": 24, 654 | "metadata": { 655 | "ExecuteTime": { 656 | "end_time": "2018-10-09T01:41:12.053207Z", 657 | "start_time": "2018-10-09T01:41:12.049347Z" 658 | } 659 | }, 660 | "outputs": [], 661 | "source": [ 662 | "x = 2" 663 | ] 664 | }, 665 | { 666 | "cell_type": "code", 667 | "execution_count": 25, 668 | "metadata": {}, 669 | "outputs": [ 670 | { 671 | "data": { 672 | "text/plain": [ 673 | "2" 674 | ] 675 | }, 676 | "execution_count": 25, 677 | "metadata": {}, 678 | "output_type": "execute_result" 679 | } 680 | ], 681 | "source": [ 682 | "x" 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "execution_count": 26, 688 | "metadata": { 689 | "ExecuteTime": { 690 | "end_time": "2018-10-09T01:41:12.294825Z", 691 | "start_time": "2018-10-09T01:41:12.290846Z" 692 | } 693 | }, 694 | "outputs": [ 695 | { 696 | "name": "stdout", 697 | "output_type": "stream", 698 | "text": [ 699 | "2\n", 700 | "3\n", 701 | "5\n", 702 | "9\n" 703 | ] 704 | } 705 | ], 706 | "source": [ 707 | "while x < 10:\n", 708 | " print(x)\n", 709 | " x = x + (x-1)\n", 710 | " # x += x-1" 711 | ] 712 | }, 713 | { 714 | "cell_type": "markdown", 715 | "metadata": {}, 716 | "source": [ 717 | "#### Notes on `break` and `continue`" 718 | ] 719 | }, 720 | { 721 | "cell_type": "markdown", 722 | "metadata": {}, 723 | "source": [ 724 | "`break` means get out of the loop immediately. Any code after the `break` will NOT be executed." 
725 | ] 726 | }, 727 | { 728 | "cell_type": "code", 729 | "execution_count": 27, 730 | "metadata": { 731 | "ExecuteTime": { 732 | "end_time": "2018-10-09T01:41:21.493159Z", 733 | "start_time": "2018-10-09T01:41:21.489747Z" 734 | } 735 | }, 736 | "outputs": [], 737 | "source": [ 738 | "store_name = 'Walmart'" 739 | ] 740 | }, 741 | { 742 | "cell_type": "code", 743 | "execution_count": 28, 744 | "metadata": { 745 | "ExecuteTime": { 746 | "end_time": "2018-10-09T01:41:22.084338Z", 747 | "start_time": "2018-10-09T01:41:22.076857Z" 748 | } 749 | }, 750 | "outputs": [ 751 | { 752 | "name": "stdout", 753 | "output_type": "stream", 754 | "text": [ 755 | "W\n", 756 | "-> End at a, position: 1\n" 757 | ] 758 | } 759 | ], 760 | "source": [ 761 | "index = 0\n", 762 | "while True:\n", 763 | " print(store_name[index])\n", 764 | " index += 1 # a += b means a = a + b\n", 765 | " if store_name[index] == \"a\":\n", 766 | " print(\"-> End at a, position: \", index)\n", 767 | " break # instead of setting flag to False, we can directly break out of the loop\n", 768 | " print(\"Hello!\") # This will NOT be run" 769 | ] 770 | }, 771 | { 772 | "cell_type": "markdown", 773 | "metadata": {}, 774 | "source": [ 775 | "`continue` means get to the next iteration of loop. It will __break__ the current iteration and __continue__ to the next." 
776 | ] 777 | }, 778 | { 779 | "cell_type": "code", 780 | "execution_count": 29, 781 | "metadata": { 782 | "ExecuteTime": { 783 | "end_time": "2018-10-09T01:41:23.822603Z", 784 | "start_time": "2018-10-09T01:41:23.818257Z" 785 | } 786 | }, 787 | "outputs": [ 788 | { 789 | "name": "stdout", 790 | "output_type": "stream", 791 | "text": [ 792 | "W\n", 793 | "l\n", 794 | "m\n", 795 | "r\n", 796 | "t\n" 797 | ] 798 | } 799 | ], 800 | "source": [ 801 | "for letter in store_name:\n", 802 | " if letter == \"a\":\n", 803 | " continue # Not printing 'a'\n", 804 | " else:\n", 805 | " print(letter)" 806 | ] 807 | }, 808 | { 809 | "cell_type": "code", 810 | "execution_count": 30, 811 | "metadata": {}, 812 | "outputs": [ 813 | { 814 | "name": "stdout", 815 | "output_type": "stream", 816 | "text": [ 817 | "W\n", 818 | "l\n", 819 | "m\n", 820 | "r\n", 821 | "t\n" 822 | ] 823 | } 824 | ], 825 | "source": [ 826 | "for letter in store_name:\n", 827 | " if letter != \"a\":\n", 828 | " print(letter)" 829 | ] 830 | } 831 | ], 832 | "metadata": { 833 | "hide_input": false, 834 | "kernelspec": { 835 | "display_name": "Python 3 (ipykernel)", 836 | "language": "python", 837 | "name": "python3" 838 | }, 839 | "language_info": { 840 | "codemirror_mode": { 841 | "name": "ipython", 842 | "version": 3 843 | }, 844 | "file_extension": ".py", 845 | "mimetype": "text/x-python", 846 | "name": "python", 847 | "nbconvert_exporter": "python", 848 | "pygments_lexer": "ipython3", 849 | "version": "3.9.12" 850 | }, 851 | "toc": { 852 | "base_numbering": 1, 853 | "nav_menu": {}, 854 | "number_sections": true, 855 | "sideBar": true, 856 | "skip_h1_title": false, 857 | "title_cell": "Table of Contents", 858 | "title_sidebar": "Contents", 859 | "toc_cell": false, 860 | "toc_position": { 861 | "height": "677px", 862 | "left": "0px", 863 | "right": "1111px", 864 | "top": "43px", 865 | "width": "340px" 866 | }, 867 | "toc_section_display": "block", 868 | "toc_window_display": true 869 | }, 870 | "varInspector": 
{ 871 | "cols": { 872 | "lenName": 16, 873 | "lenType": 16, 874 | "lenVar": 40 875 | }, 876 | "kernels_config": { 877 | "python": { 878 | "delete_cmd_postfix": "", 879 | "delete_cmd_prefix": "del ", 880 | "library": "var_list.py", 881 | "varRefreshCmd": "print(var_dic_list())" 882 | }, 883 | "r": { 884 | "delete_cmd_postfix": ") ", 885 | "delete_cmd_prefix": "rm(", 886 | "library": "var_list.r", 887 | "varRefreshCmd": "cat(var_dic_list()) " 888 | } 889 | }, 890 | "types_to_exclude": [ 891 | "module", 892 | "function", 893 | "builtin_function_or_method", 894 | "instance", 895 | "_Feature" 896 | ], 897 | "window_display": false 898 | } 899 | }, 900 | "nbformat": 4, 901 | "nbformat_minor": 2 902 | } 903 | -------------------------------------------------------------------------------- /06 Functions and External Libraries.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**What you learn:**\n", 8 | "\n", 9 | "In this notebook you will learn about functions in Python. This includes named functions, lambda (anonymous) functions, generators, and function libraries. \n", 10 | "\n", 11 | "Originally based on a [tutorial by Zhiya Zuo](https://github.com/zhiyzuo/python-tutorial) and extended where appropriate.\n", 12 | "\n", 13 | "Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)\n", 14 | "\n", 15 | "This notebook is available on https://github.com/BigDataAnalyticsGroup/python." 
16 | ] 17 | }, 18 | { 19 | "cell_type": "markdown", 20 | "metadata": {}, 21 | "source": [ 22 | "### Functions" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "#### Calling functions" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "Previously, we have already made use of many built-in functions to facilitate programming. A function is a block of code with (optional) input arguments and (optional) return values. In Python (and many other languages), a function can be called as follows:\n", 37 | "\n", 38 | "```python\n", 39 | ">>> output = foo(input_argument1, input_argument2)\n", 40 | "```" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "We already called several functions in the [loops](https://github.com/BigDataAnalyticsGroup/python/blob/master/05%20Control%20Logics.ipynb) part of this tutorial, for example:" 48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": 1, 53 | "metadata": { 54 | "ExecuteTime": { 55 | "end_time": "2018-10-25T17:47:37.407545Z", 56 | "start_time": "2018-10-25T17:47:37.389644Z" 57 | } 58 | }, 59 | "outputs": [ 60 | { 61 | "data": { 62 | "text/plain": [ 63 | "range(0, 5)" 64 | ] 65 | }, 66 | "execution_count": 1, 67 | "metadata": {}, 68 | "output_type": "execute_result" 69 | } 70 | ], 71 | "source": [ 72 | "a = range(5)\n", 73 | "a" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "We can nest function calls; here the output of `range(5)` is the input to `list(..)`:" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": 2, 86 | "metadata": { 87 | "ExecuteTime": { 88 | "end_time": "2018-10-25T17:48:25.160955Z", 89 | "start_time": "2018-10-25T17:48:25.155505Z" 90 | } 91 | }, 92 | "outputs": [ 93 | { 94 | "data": { 95 | "text/plain": [ 96 | "[0, 1, 2, 3, 4]" 97 | ] 98 | }, 99 | "execution_count": 2, 100 | "metadata": {}, 101 | 
"output_type": "execute_result" 102 | } 103 | ], 104 | "source": [ 105 | "list(range(5))" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "Another example:" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 3, 118 | "metadata": { 119 | "ExecuteTime": { 120 | "end_time": "2018-10-25T17:48:45.482719Z", 121 | "start_time": "2018-10-25T17:48:45.476899Z" 122 | } 123 | }, 124 | "outputs": [ 125 | { 126 | "data": { 127 | "text/plain": [ 128 | "3.5" 129 | ] 130 | }, 131 | "execution_count": 3, 132 | "metadata": {}, 133 | "output_type": "execute_result" 134 | } 135 | ], 136 | "source": [ 137 | "abs(-3.5)" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": {}, 143 | "source": [ 144 | "Often we need more than one input argument. For example:" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 4, 150 | "metadata": { 151 | "ExecuteTime": { 152 | "end_time": "2018-10-25T17:20:29.734990Z", 153 | "start_time": "2018-10-25T17:20:29.729907Z" 154 | } 155 | }, 156 | "outputs": [ 157 | { 158 | "data": { 159 | "text/plain": [ 160 | "[5, 4, 3, 2, 1]" 161 | ] 162 | }, 163 | "execution_count": 4, 164 | "metadata": {}, 165 | "output_type": "execute_result" 166 | } 167 | ], 168 | "source": [ 169 | "list(range(5, 0, -1))" 170 | ] 171 | }, 172 | { 173 | "cell_type": "markdown", 174 | "metadata": {}, 175 | "source": [ 176 | "A second example, given a dictionary produce a list of the keys sorted by their associated values in the dictionary (not the keys themselves!):" 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 5, 182 | "metadata": { 183 | "ExecuteTime": { 184 | "end_time": "2018-10-25T17:50:16.189583Z", 185 | "start_time": "2018-10-25T17:50:16.183753Z" 186 | } 187 | }, 188 | "outputs": [ 189 | { 190 | "data": { 191 | "text/plain": [ 192 | "['a', 'c', 'b']" 193 | ] 194 | }, 195 | "execution_count": 5, 196 | "metadata": {}, 197 | 
"output_type": "execute_result" 198 | } 199 | ], 200 | "source": [ 201 | "d = {'a': 100, 'c': 50, 'b': 70}\n", 202 | "# output keys available in this dictionary:\n", 203 | "list(d.keys())" 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": 6, 209 | "metadata": { 210 | "ExecuteTime": { 211 | "end_time": "2018-10-25T17:53:27.808548Z", 212 | "start_time": "2018-10-25T17:53:27.798150Z" 213 | } 214 | }, 215 | "outputs": [ 216 | { 217 | "data": { 218 | "text/plain": [ 219 | "(dict, {'a': 100, 'c': 50, 'b': 70})" 220 | ] 221 | }, 222 | "execution_count": 6, 223 | "metadata": {}, 224 | "output_type": "execute_result" 225 | } 226 | ], 227 | "source": [ 228 | "type(d), d" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "execution_count": 7, 234 | "metadata": {}, 235 | "outputs": [ 236 | { 237 | "data": { 238 | "text/plain": [ 239 | "(list, ['a', 'b', 'c'])" 240 | ] 241 | }, 242 | "execution_count": 7, 243 | "metadata": {}, 244 | "output_type": "execute_result" 245 | } 246 | ], 247 | "source": [ 248 | "# sort dictionary (this function will return a list of the keys)\n", 249 | "l = sorted(d)\n", 250 | "type(l), l" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": 8, 256 | "metadata": { 257 | "ExecuteTime": { 258 | "end_time": "2018-10-25T17:53:33.866813Z", 259 | "start_time": "2018-10-25T17:53:33.861638Z" 260 | } 261 | }, 262 | "outputs": [ 263 | { 264 | "data": { 265 | "text/plain": [ 266 | "100" 267 | ] 268 | }, 269 | "execution_count": 8, 270 | "metadata": {}, 271 | "output_type": "execute_result" 272 | } 273 | ], 274 | "source": [ 275 | "# show the value of key 'a':\n", 276 | "d['a']" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 9, 282 | "metadata": {}, 283 | "outputs": [], 284 | "source": [ 285 | "def values(key):\n", 286 | " return d[key]" 287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 10, 292 | "metadata": {}, 293 | "outputs": [ 294 | { 295 | 
"data": { 296 | "text/plain": [ 297 | "['c', 'b', 'a']" 298 | ] 299 | }, 300 | "execution_count": 10, 301 | "metadata": {}, 302 | "output_type": "execute_result" 303 | } 304 | ], 305 | "source": [ 306 | "# sort the keys of the dictionary by their associated values:\n", 307 | "sorted(d, key=values)" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": 11, 313 | "metadata": { 314 | "ExecuteTime": { 315 | "end_time": "2018-10-25T17:51:39.294194Z", 316 | "start_time": "2018-10-25T17:51:39.286761Z" 317 | } 318 | }, 319 | "outputs": [ 320 | { 321 | "data": { 322 | "text/plain": [ 323 | "['c', 'b', 'a']" 324 | ] 325 | }, 326 | "execution_count": 11, 327 | "metadata": {}, 328 | "output_type": "execute_result" 329 | } 330 | ], 331 | "source": [ 332 | "# sort the keys of the dictionary by their associated values:\n", 333 | "sorted(d, key=lambda k: d[k])" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 12, 339 | "metadata": {}, 340 | "outputs": [ 341 | { 342 | "data": { 343 | "text/plain": [ 344 | "['a', 'b', 'c']" 345 | ] 346 | }, 347 | "execution_count": 12, 348 | "metadata": {}, 349 | "output_type": "execute_result" 350 | } 351 | ], 352 | "source": [ 353 | "# sort the keys of the dictionary by their associated values in reverse (aka descending) order:\n", 354 | "sorted(d, key=lambda k: d[k], reverse=True)" 355 | ] 356 | }, 357 | { 358 | "cell_type": "markdown", 359 | "metadata": {}, 360 | "source": [ 361 | "#### Lambda functions" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "Aha, we just saw something different: `lambda`!" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": {}, 374 | "source": [ 375 | "Lambda functions are just functions, except that they are anonymous (literally). See [here](https://stackoverflow.com/questions/890128/why-are-python-lambdas-useful) for many good discussions. 
In short, anything you can do with `lambda` you can also do with a regular function. Still, `lambda` is handy because it is lightweight and anonymous." 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": {}, 381 | "source": [ 382 | "The example above is actually a good example of when to use `lambda`:" 383 | ] 384 | }, 385 | { 386 | "cell_type": "code", 387 | "execution_count": 13, 388 | "metadata": { 389 | "ExecuteTime": { 390 | "end_time": "2018-10-25T17:20:29.761643Z", 391 | "start_time": "2018-10-25T17:20:29.757012Z" 392 | } 393 | }, 394 | "outputs": [ 395 | { 396 | "data": { 397 | "text/plain": [ 398 | "['c', 'b', 'a']" 399 | ] 400 | }, 401 | "execution_count": 13, 402 | "metadata": {}, 403 | "output_type": "execute_result" 404 | } 405 | ], 406 | "source": [ 407 | "sorted(d, key=lambda k: d[k])" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": {}, 413 | "source": [ 414 | "A `lambda` function consists of exactly one expression. In this case, the input parameter is `k`, which is expected to be an existing key of the dictionary `d`. The output of the lambda function is `d[k]`. Therefore we are sorting the keys of our dictionary by their associated values rather than by the keys themselves." 415 | ] 416 | }, 417 | { 418 | "cell_type": "markdown", 419 | "metadata": {}, 420 | "source": [ 421 | "#### Define our own functions" 422 | ] 423 | }, 424 | { 425 | "cell_type": "markdown", 426 | "metadata": {}, 427 | "source": [ 428 | "Note that we are not limited to built-in or lambda functions only. Let's now try to make our own functions. 
Before that, we need to be clear on the structure of a function:\n", 429 | "```python\n", 430 | "def func_name(arg1, arg2, arg3, ...):\n", 431 | " # start of the actual code block: must be indented\n", 432 | " # Do something here # <-- whatever number of code lines, must be indented\n", 433 | " # end of the actual code block: must be indented\n", 434 | " return output\n", 435 | "```\n", 436 | "\n", 437 | "\\* *`return output` and all arguments are optional*\n", 438 | "\n", 439 | "So again: each line in the code block must be indented (a tab or, by convention, four spaces), in contrast to Java/C++, which use {}-syntax for this." 440 | ] 441 | }, 442 | { 443 | "cell_type": "markdown", 444 | "metadata": {}, 445 | "source": [ 446 | "In the following example, we make use of `sum`, a built-in function to sum up numeric iterables:" 447 | ] 448 | }, 449 | { 450 | "cell_type": "code", 451 | "execution_count": 14, 452 | "metadata": { 453 | "ExecuteTime": { 454 | "end_time": "2018-10-25T18:00:12.954116Z", 455 | "start_time": "2018-10-25T18:00:12.950910Z" 456 | } 457 | }, 458 | "outputs": [], 459 | "source": [ 460 | "def mySum(list_to_sum):\n", 461 | " print('mySum was called.')\n", 462 | " return sum(list_to_sum)" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": 15, 468 | "metadata": { 469 | "ExecuteTime": { 470 | "end_time": "2018-10-25T18:00:13.219092Z", 471 | "start_time": "2018-10-25T18:00:13.214135Z" 472 | } 473 | }, 474 | "outputs": [ 475 | { 476 | "name": "stdout", 477 | "output_type": "stream", 478 | "text": [ 479 | "mySum was called.\n" 480 | ] 481 | }, 482 | { 483 | "data": { 484 | "text/plain": [ 485 | "10" 486 | ] 487 | }, 488 | "execution_count": 15, 489 | "metadata": {}, 490 | "output_type": "execute_result" 491 | } 492 | ], 493 | "source": [ 494 | "mySum(range(5))" 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": 16, 500 | "metadata": {}, 501 | "outputs": [ 502 | { 503 | "data": { 504 | "text/plain": [ 505 | "10" 506 | ] 
507 | }, 508 | "execution_count": 16, 509 | "metadata": {}, 510 | "output_type": "execute_result" 511 | } 512 | ], 513 | "source": [ 514 | "# in this case the output is no different from calling sum() directly (other than the print statement above):\n", 515 | "sum(range(5))" 516 | ] 517 | }, 518 | { 519 | "cell_type": "markdown", 520 | "metadata": {}, 521 | "source": [ 522 | "The same sum function using a for loop to add up the values in the input list:" 523 | ] 524 | }, 525 | { 526 | "cell_type": "code", 527 | "execution_count": 17, 528 | "metadata": { 529 | "ExecuteTime": { 530 | "end_time": "2018-10-25T18:00:55.632800Z", 531 | "start_time": "2018-10-25T18:00:55.628985Z" 532 | } 533 | }, 534 | "outputs": [], 535 | "source": [ 536 | "def mySumUsingLoop(list_to_sum):\n", 537 | " runningSum = 0\n", 538 | " for item in list_to_sum:\n", 539 | " print(item)\n", 540 | " runningSum += item\n", 541 | " print('current runningSum:', runningSum)\n", 542 | " return runningSum" 543 | ] 544 | }, 545 | { 546 | "cell_type": "code", 547 | "execution_count": 18, 548 | "metadata": { 549 | "ExecuteTime": { 550 | "end_time": "2018-10-25T18:00:55.894819Z", 551 | "start_time": "2018-10-25T18:00:55.887776Z" 552 | } 553 | }, 554 | "outputs": [ 555 | { 556 | "name": "stdout", 557 | "output_type": "stream", 558 | "text": [ 559 | "0\n", 560 | "current runningSum: 0\n", 561 | "1\n", 562 | "current runningSum: 1\n", 563 | "2\n", 564 | "current runningSum: 3\n", 565 | "3\n", 566 | "current runningSum: 6\n", 567 | "4\n", 568 | "current runningSum: 10\n" 569 | ] 570 | }, 571 | { 572 | "data": { 573 | "text/plain": [ 574 | "10" 575 | ] 576 | }, 577 | "execution_count": 18, 578 | "metadata": {}, 579 | "output_type": "execute_result" 580 | } 581 | ], 582 | "source": [ 583 | "mySumUsingLoop(range(5))" 584 | ] 585 | }, 586 | { 587 | "cell_type": "markdown", 588 | "metadata": {}, 589 | "source": [ 590 | "*The two example functions are not doing anything interesting but just serve as illustrations of how to build 
customized functions.*" 591 | ] 592 | }, 593 | { 594 | "cell_type": "markdown", 595 | "metadata": {}, 596 | "source": [ 597 | "#### Functions without side effects:" 598 | ] 599 | }, 600 | { 601 | "cell_type": "markdown", 602 | "metadata": {}, 603 | "source": [ 604 | "Sometimes functions may have surprising side-effects.\n", 605 | "\n", 606 | "Actually, a function should **never** have a side-effect." 607 | ] 608 | }, 609 | { 610 | "cell_type": "code", 611 | "execution_count": 19, 612 | "metadata": {}, 613 | "outputs": [ 614 | { 615 | "name": "stdout", 616 | "output_type": "stream", 617 | "text": [ 618 | "outer1 [0, 1, 1, 2, 3, 5, 8]\n", 619 | " inner1 [0, 1, 1, 2, 3, 5, 8]\n", 620 | " inner2 [47, 11]\n", 621 | "outer2 [0, 1, 1, 2, 3, 5, 8]\n" 622 | ] 623 | } 624 | ], 625 | "source": [ 626 | "# no side effect:\n", 627 | "def func1(mylist):\n", 628 | " print (\" inner1 \", mylist)\n", 629 | " mylist = [47,11] # this creates a new list object and assigns it to the local variable mylist!\n", 630 | " print (\" inner2 \", mylist)\n", 631 | "\n", 632 | "fib = [0,1,1,2,3,5,8]\n", 633 | "print(\"outer1 \", fib)\n", 634 | "func1(fib)\n", 635 | "print(\"outer2 \", fib)" 636 | ] 637 | }, 638 | { 639 | "cell_type": "code", 640 | "execution_count": 20, 641 | "metadata": {}, 642 | "outputs": [ 643 | { 644 | "name": "stdout", 645 | "output_type": "stream", 646 | "text": [ 647 | "4465607168\n" 648 | ] 649 | }, 650 | { 651 | "data": { 652 | "text/plain": [ 653 | "4415594064" 654 | ] 655 | }, 656 | "execution_count": 20, 657 | "metadata": {}, 658 | "output_type": "execute_result" 659 | } 660 | ], 661 | "source": [ 662 | "# show the Python-internal id of object fib:\n", 663 | "print(id(fib))\n", 664 | "a = 42\n", 665 | "id(a)" 666 | ] 667 | }, 668 | { 669 | "cell_type": "markdown", 670 | "metadata": {}, 671 | "source": [ 672 | "From the [Python documentation](https://docs.python.org/3/library/functions.html#id):\n", 673 | "\n", 674 | "**id(object)**\n", 675 | "> Return the “identity” of an object. 
This is an integer which is guaranteed\n", 676 | "> to be unique and constant for this object during its lifetime. Two objects\n", 677 | "> with non-overlapping lifetimes may have the same id() value." 678 | ] 679 | }, 680 | { 681 | "cell_type": "code", 682 | "execution_count": 21, 683 | "metadata": {}, 684 | "outputs": [ 685 | { 686 | "name": "stdout", 687 | "output_type": "stream", 688 | "text": [ 689 | "outer1 4465608768 [0, 1, 1, 2, 3, 5, 8]\n", 690 | " inner1 4465608768 [0, 1, 1, 2, 3, 5, 8]\n", 691 | " inner2 4465608704 [47, 11]\n", 692 | "outer2 4465608768 [0, 1, 1, 2, 3, 5, 8]\n" 693 | ] 694 | } 695 | ], 696 | "source": [ 697 | "# same as above, just additionally printing the id of the list\n", 698 | "def func1(mylist):\n", 699 | "    print (\" inner1 \", id(mylist), mylist)\n", 700 | "    mylist = [47,11] # this creates a new list object and binds it to the local variable mylist\n", 701 | "    print (\" inner2 \", id(mylist), mylist)\n", 702 | "\n", 703 | "fib = [0,1,1,2,3,5,8]\n", 704 | "print(\"outer1 \", id(fib), fib)\n", 705 | "func1(fib)\n", 706 | "print(\"outer2 \", id(fib), fib)" 707 | ] 708 | }, 709 | { 710 | "cell_type": "markdown", 711 | "metadata": {}, 712 | "source": [ 713 | "#### Functions **with** side effects:" 714 | ] 715 | }, 716 | { 717 | "cell_type": "code", 718 | "execution_count": 22, 719 | "metadata": {}, 720 | "outputs": [ 721 | { 722 | "name": "stdout", 723 | "output_type": "stream", 724 | "text": [ 725 | "outer1 4465574976 [0, 1, 1, 2, 3, 5, 8]\n", 726 | " inner1 4465574976 [0, 1, 1, 2, 3, 5, 8]\n", 727 | " inner2 4465574976 [0, 1, 1, 2, 3, 5, 8, 47, 11]\n", 728 | "outer2 4465574976 [0, 1, 1, 2, 3, 5, 8, 47, 11]\n" 729 | ] 730 | } 731 | ], 732 | "source": [ 733 | "# mylist and fib refer to the same object, as shown by calling id():\n", 734 | "def func2(mylist):\n", 735 | "    print (\" inner1 \", id(mylist), mylist)\n", 736 | "    mylist += [47,11] # extends the existing list in place, so the caller's list changes too\n", 737 | "    print (\" inner2 \", id(mylist), mylist)\n", 
738 | " \n", 739 | "fib = [0,1,1,2,3,5,8]\n", 740 | "print(\"outer1 \", id(fib), fib)\n", 741 | "func2(fib)\n", 742 | "print(\"outer2 \", id(fib), fib)" 743 | ] 744 | }, 745 | { 746 | "cell_type": "markdown", 747 | "metadata": {}, 748 | "source": [ 749 | "#### Functions **without** side effects:" 750 | ] 751 | }, 752 | { 753 | "cell_type": "code", 754 | "execution_count": 23, 755 | "metadata": {}, 756 | "outputs": [ 757 | { 758 | "name": "stdout", 759 | "output_type": "stream", 760 | "text": [ 761 | "outer1 4465765552 bla\n", 762 | " inner1 4465765552 bla\n", 763 | " inner2 4465765680 blablub\n", 764 | "outer2 4465765552 bla\n" 765 | ] 766 | } 767 | ], 768 | "source": [ 769 | "# no side effect:\n", 770 | "def func3(stlocal):\n", 771 | "    print (\" inner1 \", id(stlocal), stlocal)\n", 772 | "    stlocal += 'blub' # strings are immutable: this creates a new string object and binds it to stlocal\n", 773 | "    print (\" inner2 \", id(stlocal), stlocal)\n", 774 | " \n", 775 | "st = 'bla'\n", 776 | "print(\"outer1 \", id(st), st)\n", 777 | "func3(st)\n", 778 | "print(\"outer2 \", id(st), st)" 779 | ] 780 | }, 781 | { 782 | "cell_type": "markdown", 783 | "metadata": {}, 784 | "source": [ 785 | "#### Generator function:" 786 | ] 787 | }, 788 | { 789 | "cell_type": "markdown", 790 | "metadata": {}, 791 | "source": [ 792 | "Generator functions are useful in situations where we need to iterate over a sequence of items; a prominent example is a for-loop:" 793 | ] 794 | }, 795 | { 796 | "cell_type": "code", 797 | "execution_count": 24, 798 | "metadata": {}, 799 | "outputs": [ 800 | { 801 | "name": "stdout", 802 | "output_type": "stream", 803 | "text": [ 804 | "yield:, 0\n", 805 | "loop: 0\n", 806 | "yield:, 1\n", 807 | "loop: 1\n", 808 | "yield:, 4\n", 809 | "loop: 4\n", 810 | "yield:, 9\n", 811 | "loop: 9\n", 812 | "yield:, 16\n", 813 | "loop: 16\n", 814 | "yield:, 25\n", 815 | "loop: 25\n", 816 | "yield:, 36\n", 817 | "loop: 36\n", 818 | "yield:, 49\n", 819 | "loop: 49\n", 820 | "yield:, 
64\n", 821 | "loop: 64\n", 822 | "yield:, 81\n", 823 | "loop: 81\n" 824 | ] 825 | } 826 | ], 827 | "source": [ 828 | "# generator function:\n", 829 | "def square(n):\n", 830 | "    for i in range(n):\n", 831 | "        print (\"yield:, \", i**2)\n", 832 | "        yield i**2\n", 833 | "\n", 834 | "for i in square(10):\n", 835 | "    print(\"loop: \",i)" 836 | ] 837 | }, 838 | { 839 | "cell_type": "code", 840 | "execution_count": 25, 841 | "metadata": {}, 842 | "outputs": [ 843 | { 844 | "name": "stdout", 845 | "output_type": "stream", 846 | "text": [ 847 | "lhjiqgs, ajeoisu, abveyxi, aebjwut, vqfyaun, oxvbsnx, pbreoqa, rxmfgmn, yxtlmnc, tajonch, " 848 | ] 849 | } 850 | ], 851 | "source": [ 852 | "import random\n", 853 | "import string\n", 854 | "\n", 855 | "# generator function for words consisting of random characters:\n", 856 | "def blabla(n, length):\n", 857 | "    for i in range(n):\n", 858 | "        ret = ''\n", 859 | "        for y in range(length):\n", 860 | "            ret += random.choice(string.ascii_letters)\n", 861 | "        yield ret.lower()\n", 862 | "\n", 863 | "for bla in blabla(10,7):\n", 864 | "    print(bla+\", \", end=\"\")\n", 865 | " " 866 | ] 867 | }, 868 | { 869 | "cell_type": "markdown", 870 | "metadata": {}, 871 | "source": [ 872 | "Here, a function is easier (and more elegant) than defining a lambda function." 873 | ] 874 | }, 875 | { 876 | "cell_type": "markdown", 877 | "metadata": {}, 878 | "source": [ 879 | "### Libraries" 880 | ] 881 | }, 882 | { 883 | "cell_type": "markdown", 884 | "metadata": {}, 885 | "source": [ 886 | "Often we need either internal or external help for complicated computation tasks. On these occasions, we need to _import libraries_, which are basically collections of existing functions.\n", 887 | "\n", 888 | "One strength of Python is the **incredible universe of available libraries**. 
You can find libraries for almost anything.\n", 889 | "\n", 890 | "Good starting points:\n", 891 | "\n", 892 | "[Python Standard Library](https://docs.python.org/3/library/)\n", 893 | "\n", 894 | "[Python Package Index](https://pypi.org/search/)" 895 | ] 896 | }, 897 | { 898 | "cell_type": "markdown", 899 | "metadata": {}, 900 | "source": [ 901 | "#### Built-in libraries" 902 | ] 903 | }, 904 | { 905 | "cell_type": "markdown", 906 | "metadata": {}, 907 | "source": [ 908 | "We will use the __math__ library as an example." 909 | ] 910 | }, 911 | { 912 | "cell_type": "code", 913 | "execution_count": 26, 914 | "metadata": { 915 | "ExecuteTime": { 916 | "end_time": "2018-10-25T18:11:30.413987Z", 917 | "start_time": "2018-10-25T18:11:30.411351Z" 918 | } 919 | }, 920 | "outputs": [], 921 | "source": [ 922 | "import math # use import to load a library" 923 | ] 924 | }, 925 | { 926 | "cell_type": "markdown", 927 | "metadata": {}, 928 | "source": [ 929 | "To use a function from the library, write `library_name.function_name`. 
For example, to calculate the natural logarithm with the `math` library, we can call `math.log`." 930 | ] 931 | }, 932 | { 933 | "cell_type": "code", 934 | "execution_count": 27, 935 | "metadata": { 936 | "ExecuteTime": { 937 | "end_time": "2018-10-25T18:12:36.793378Z", 938 | "start_time": "2018-10-25T18:12:36.788201Z" 939 | } 940 | }, 941 | "outputs": [ 942 | { 943 | "name": "stdout", 944 | "output_type": "stream", 945 | "text": [ 946 | "e^5 = 148.413159\n", 947 | "log(5) = 1.609438\n" 948 | ] 949 | } 950 | ], 951 | "source": [ 952 | "x = 5\n", 953 | "print(\"e^%i\"%x,\"= %f\"%math.exp(x))\n", 954 | "print(\"log(%i)\"%x,\"= %f\"%math.log(x))" 955 | ] 956 | }, 957 | { 958 | "cell_type": "markdown", 959 | "metadata": {}, 960 | "source": [ 961 | "You can also import one specific function only:" 962 | ] 963 | }, 964 | { 965 | "cell_type": "code", 966 | "execution_count": 28, 967 | "metadata": { 968 | "ExecuteTime": { 969 | "end_time": "2018-10-25T18:12:37.162731Z", 970 | "start_time": "2018-10-25T18:12:37.157103Z" 971 | } 972 | }, 973 | "outputs": [ 974 | { 975 | "name": "stdout", 976 | "output_type": "stream", 977 | "text": [ 978 | "148.4131591025766\n" 979 | ] 980 | } 981 | ], 982 | "source": [ 983 | "from math import exp # You can import a specific function\n", 984 | "print(exp(x)) # This way, you don't need to use math.exp but just exp" 985 | ] 986 | }, 987 | { 988 | "cell_type": "markdown", 989 | "metadata": {}, 990 | "source": [ 991 | "Or import all functions from a library:" 992 | ] 993 | }, 994 | { 995 | "cell_type": "code", 996 | "execution_count": 29, 997 | "metadata": { 998 | "ExecuteTime": { 999 | "end_time": "2018-10-25T17:20:30.324507Z", 1000 | "start_time": "2018-10-25T17:20:30.320936Z" 1001 | } 1002 | }, 1003 | "outputs": [], 1004 | "source": [ 1005 | "from math import * # Import all public names from math" 1006 | ] 1007 | }, 1008 | { 1009 | "cell_type": "code", 1010 | "execution_count": 30, 1011 | "metadata": { 1012 | "ExecuteTime": { 1013 | 
"end_time": "2018-10-25T17:20:30.331795Z", 1014 | "start_time": "2018-10-25T17:20:30.327483Z" 1015 | } 1016 | }, 1017 | "outputs": [ 1018 | { 1019 | "name": "stdout", 1020 | "output_type": "stream", 1021 | "text": [ 1022 | "148.4131591025766\n", 1023 | "1.6094379124341003\n" 1024 | ] 1025 | } 1026 | ], 1027 | "source": [ 1028 | "print(exp(x))\n", 1029 | "print(log(x)) # Without the import, calling `exp` or `log` raises a NameError" 1030 | ] 1031 | }, 1032 | { 1033 | "cell_type": "markdown", 1034 | "metadata": {}, 1035 | "source": [ 1036 | "Depending on what you want to achieve, you may want to choose between importing a few or all (by `*`) functions within a package." 1037 | ] 1038 | }, 1039 | { 1040 | "cell_type": "markdown", 1041 | "metadata": {}, 1042 | "source": [ 1043 | "#### External libraries" 1044 | ] 1045 | }, 1046 | { 1047 | "cell_type": "markdown", 1048 | "metadata": { 1049 | "ExecuteTime": { 1050 | "end_time": "2017-10-21T16:45:05.266743Z", 1051 | "start_time": "2017-10-21T16:45:05.260803Z" 1052 | } 1053 | }, 1054 | "source": [ 1055 | "There are times you'll want some advanced utility functions not provided by the Python standard library. Many useful packages are provided by other developers.\n", 1056 | "\n", 1057 | "We'll use __numpy__ as an example. (__numpy__, __scipy__, __matplotlib__, and probably __pandas__ will be the most important ones for your data analyses.)\n", 1058 | "\n", 1059 | "Installing Python packages from the command line is easy with pip:" 1060 | ] 1061 | }, 1062 | { 1063 | "cell_type": "markdown", 1064 | "metadata": {}, 1065 | "source": [ 1066 | "```bash\n", 1067 | "~$ pip install numpy scipy pandas\n", 1068 | "```\n", 1069 | "This assumes that pip executes pip3. 
On my machine I have to call:\n", 1070 | "\n", 1071 | "```bash\n", 1072 | "~$ pip3 install numpy scipy pandas\n", 1073 | "```" 1074 | ] 1075 | }, 1076 | { 1077 | "cell_type": "markdown", 1078 | "metadata": {}, 1079 | "source": [ 1080 | "For this lecture, you do not have to install libraries yourself. All necessary libraries are preinstalled through vagrant. If we need more, we will update [the vagrant file](https://github.com/BigDataAnalyticsGroup/python/blob/master/Vagrantfile). Also make sure you do not miss [our instructions](https://github.com/BigDataAnalyticsGroup/python/blob/master/Instructions.md) on how to use vagrant." 1081 | ] 1082 | } 1083 | ], 1084 | "metadata": { 1085 | "hide_input": false, 1086 | "kernelspec": { 1087 | "display_name": "Python 3 (ipykernel)", 1088 | "language": "python", 1089 | "name": "python3" 1090 | }, 1091 | "language_info": { 1092 | "codemirror_mode": { 1093 | "name": "ipython", 1094 | "version": 3 1095 | }, 1096 | "file_extension": ".py", 1097 | "mimetype": "text/x-python", 1098 | "name": "python", 1099 | "nbconvert_exporter": "python", 1100 | "pygments_lexer": "ipython3", 1101 | "version": "3.9.12" 1102 | }, 1103 | "toc": { 1104 | "base_numbering": 1, 1105 | "nav_menu": {}, 1106 | "number_sections": true, 1107 | "sideBar": true, 1108 | "skip_h1_title": false, 1109 | "title_cell": "Table of Contents", 1110 | "title_sidebar": "Contents", 1111 | "toc_cell": false, 1112 | "toc_position": { 1113 | "height": "840px", 1114 | "left": "0px", 1115 | "right": "1111px", 1116 | "top": "113px", 1117 | "width": "253px" 1118 | }, 1119 | "toc_section_display": "block", 1120 | "toc_window_display": true 1121 | }, 1122 | "varInspector": { 1123 | "cols": { 1124 | "lenName": 16, 1125 | "lenType": 16, 1126 | "lenVar": 40 1127 | }, 1128 | "kernels_config": { 1129 | "python": { 1130 | "delete_cmd_postfix": "", 1131 | "delete_cmd_prefix": "del ", 1132 | "library": "var_list.py", 1133 | "varRefreshCmd": "print(var_dic_list())" 1134 | }, 1135 | "r": { 
1136 | "delete_cmd_postfix": ") ", 1137 | "delete_cmd_prefix": "rm(", 1138 | "library": "var_list.r", 1139 | "varRefreshCmd": "cat(var_dic_list()) " 1140 | } 1141 | }, 1142 | "position": { 1143 | "height": "275px", 1144 | "left": "918px", 1145 | "right": "20px", 1146 | "top": "120px", 1147 | "width": "338px" 1148 | }, 1149 | "types_to_exclude": [ 1150 | "module", 1151 | "function", 1152 | "builtin_function_or_method", 1153 | "instance", 1154 | "_Feature" 1155 | ], 1156 | "window_display": false 1157 | } 1158 | }, 1159 | "nbformat": 4, 1160 | "nbformat_minor": 2 1161 | } 1162 | -------------------------------------------------------------------------------- /07 Comprehensions and Functional Programming.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**What you learn:**\n", 8 | "\n", 9 | "In this notebook you will learn about functional programming in Python. This includes list comprehensions, map(), unpacking (*-operator), zip(). \n", 10 | "\n", 11 | "Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)\n", 12 | "\n", 13 | "This notebook is available on https://github.com/BigDataAnalyticsGroup/python." 
14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "#### List comprehensions:" 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "A standard for-loop:" 28 | ] 29 | }, 30 | { 31 | "cell_type": "code", 32 | "execution_count": 1, 33 | "metadata": {}, 34 | "outputs": [ 35 | { 36 | "name": "stdout", 37 | "output_type": "stream", 38 | "text": [ 39 | "0\n", 40 | "1\n", 41 | "2\n", 42 | "3\n", 43 | "4\n" 44 | ] 45 | } 46 | ], 47 | "source": [ 48 | "# Syntax:\n", 49 | "# for item in list:\n", 50 | "# expression\n", 51 | "for n in range(5):\n", 52 | " print(n)" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 2, 58 | "metadata": {}, 59 | "outputs": [ 60 | { 61 | "data": { 62 | "text/plain": [ 63 | "[0, 1, 2, 3, 4]" 64 | ] 65 | }, 66 | "execution_count": 2, 67 | "metadata": {}, 68 | "output_type": "execute_result" 69 | } 70 | ], 71 | "source": [ 72 | "list(range(5))" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "**vs**\n", 80 | "\n", 81 | "a list comprehension:" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 3, 87 | "metadata": {}, 88 | "outputs": [ 89 | { 90 | "data": { 91 | "text/plain": [ 92 | "['0 hallo welt!',\n", 93 | " '1 hallo welt!',\n", 94 | " '2 hallo welt!',\n", 95 | " '3 hallo welt!',\n", 96 | " '4 hallo welt!']" 97 | ] 98 | }, 99 | "execution_count": 3, 100 | "metadata": {}, 101 | "output_type": "execute_result" 102 | } 103 | ], 104 | "source": [ 105 | "# Syntax:\n", 106 | "# [expression for item in list]\n", 107 | "# \n", 108 | "[str(n)+\" hallo welt!\" for n in range(5)]" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "A standard for-loop with an if-clause:" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": 4, 121 | "metadata": {}, 122 | "outputs": [ 123 | { 124 | "name": "stdout", 125 | 
"output_type": "stream", 126 | "text": [ 127 | "0 is even\n", 128 | "2 is even\n", 129 | "4 is even\n" 130 | ] 131 | } 132 | ], 133 | "source": [ 134 | "# Syntax:\n", 135 | "# for item in list:\n", 136 | "#     if condition:\n", 137 | "#         expression\n", 138 | "for n in range(5):\n", 139 | "    if n%2 == 0:\n", 140 | "        print(n, \" is even\")" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "**vs**\n", 148 | "\n", 149 | "a list comprehension with an if-clause:" 150 | ] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "execution_count": 5, 155 | "metadata": {}, 156 | "outputs": [ 157 | { 158 | "data": { 159 | "text/plain": [ 160 | "[0, 2, 4]" 161 | ] 162 | }, 163 | "execution_count": 5, 164 | "metadata": {}, 165 | "output_type": "execute_result" 166 | } 167 | ], 168 | "source": [ 169 | "# Syntax:\n", 170 | "# [expression for item in list if condition]\n", 171 | "# \n", 172 | "ar = [n for n in range(5) if n%2 == 0]\n", 173 | "ar" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 6, 179 | "metadata": {}, 180 | "outputs": [ 181 | { 182 | "name": "stdout", 183 | "output_type": "stream", 184 | "text": [ 185 | "output 4\n" 186 | ] 187 | }, 188 | { 189 | "data": { 190 | "text/plain": [ 191 | "[4]" 192 | ] 193 | }, 194 | "execution_count": 6, 195 | "metadata": {}, 196 | "output_type": "execute_result" 197 | } 198 | ], 199 | "source": [ 200 | "def r(f):\n", 201 | "    print(\"output\",f)\n", 202 | "    return f\n", 203 | "# Syntax:\n", 204 | "# [expression for item in list if condition]\n", 205 | "# \n", 206 | "[r(t) for t in ar if t == 4]" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "PS: obviously, if the if-clause is just about odd/even numbers or iterating over integers with a fixed offset, we could do this much more easily:" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 7, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 
| "data": { 223 | "text/plain": [ 224 | "[0, 2, 4]" 225 | ] 226 | }, 227 | "execution_count": 7, 228 | "metadata": {}, 229 | "output_type": "execute_result" 230 | } 231 | ], 232 | "source": [ 233 | "[i for i in range(0,5,2)]" 234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": 8, 239 | "metadata": {}, 240 | "outputs": [ 241 | { 242 | "data": { 243 | "text/plain": [ 244 | "[0, 1, 2, 3, 4]" 245 | ] 246 | }, 247 | "execution_count": 8, 248 | "metadata": {}, 249 | "output_type": "execute_result" 250 | } 251 | ], 252 | "source": [ 253 | "# Syntax:\n", 254 | "# range(start_including [, end_excluding [, stepsize]])\n", 255 | "list(range(0,5))" 256 | ] 257 | }, 258 | { 259 | "cell_type": "markdown", 260 | "metadata": {}, 261 | "source": [ 262 | "#### map()\n", 263 | "\n", 264 | "`map()` applies a function to each item of a sequence. Do not confuse `map()` with the data structure `map` (aka dictionary) we introduced [in a previous notebook](https://github.com/BigDataAnalyticsGroup/python/blob/master/04%20Sets%20and%20Maps.ipynb).\n", 265 | "\n", 266 | "The difference is: map(key) -> value is a function mapping input keys to values.\n", 267 | "\n", 268 | "The data structure `map` (aka dictionary, created through foo={}) does something similar, however, only **for a predefined and materialized set of key/value-pairs**.\n", 269 | "\n", 270 | "In other words, a dictionary can return a value for a given key only if that key is present in the dictionary. In contrast, the more general `map()` is simply a function signature that is typically implemented by using code to map keys to values.\n", 271 | "\n", 272 | "Of course you could use a dictionary to implement a map()-function." 
273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 9, 278 | "metadata": {}, 279 | "outputs": [ 280 | { 281 | "data": { 282 | "text/plain": [ 283 | "[0, 1, 2, 3, 4]" 284 | ] 285 | }, 286 | "execution_count": 9, 287 | "metadata": {}, 288 | "output_type": "execute_result" 289 | } 290 | ], 291 | "source": [ 292 | "ar = [0, 1, 2, 3, 4]\n", 293 | "ar" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "map() using a lambda-function to map keys to their squares:" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 10, 306 | "metadata": {}, 307 | "outputs": [ 308 | { 309 | "data": { 310 | "text/plain": [ 311 | "[0, 1, 4, 9, 16]" 312 | ] 313 | }, 314 | "execution_count": 10, 315 | "metadata": {}, 316 | "output_type": "execute_result" 317 | } 318 | ], 319 | "source": [ 320 | "squared2 = list(map(lambda x: x**2, ar))\n", 321 | "squared2" 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": {}, 327 | "source": [ 328 | "map() using a named function to map keys to their squares:" 329 | ] 330 | }, 331 | { 332 | "cell_type": "code", 333 | "execution_count": 11, 334 | "metadata": {}, 335 | "outputs": [ 336 | { 337 | "data": { 338 | "text/plain": [ 339 | "[0, 1, 4, 9, 16]" 340 | ] 341 | }, 342 | "execution_count": 11, 343 | "metadata": {}, 344 | "output_type": "execute_result" 345 | } 346 | ], 347 | "source": [ 348 | "def pow2(x):\n", 349 | " return x**2\n", 350 | "\n", 351 | "squared = list(map(pow2, ar))\n", 352 | "squared" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": 12, 358 | "metadata": {}, 359 | "outputs": [ 360 | { 361 | "data": { 362 | "text/plain": [ 363 | "[0, 1, 4, 9, 16]" 364 | ] 365 | }, 366 | "execution_count": 12, 367 | "metadata": {}, 368 | "output_type": "execute_result" 369 | } 370 | ], 371 | "source": [ 372 | "squared" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "metadata": {}, 378 | "source": 
[ 379 | "#### Unpacking and the *-operator" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 13, 385 | "metadata": {}, 386 | "outputs": [ 387 | { 388 | "name": "stdout", 389 | "output_type": "stream", 390 | "text": [ 391 | "my_list: [1, 2, 3, 4]\n", 392 | "*my_list: 1 2 3 4\n", 393 | "1 2 3 4\n" 394 | ] 395 | } 396 | ], 397 | "source": [ 398 | "my_list = [1, 2, 3, 4] \n", 399 | "\n", 400 | "def foo(a, b, c, d): \n", 401 | "    print(a, b, c, d) \n", 402 | " \n", 403 | "print(\"my_list:\", my_list)\n", 404 | "print(\"*my_list:\", *my_list)\n", 405 | "\n", 406 | "# unpack my_list into four arguments:\n", 407 | "foo(*my_list) # equivalent to: foo(my_list[0], my_list[1], my_list[2], my_list[3] )\n", 408 | "\n", 409 | "# foo(my_list) # without the star this raises a TypeError: foo() expects four arguments" 410 | ] 411 | }, 412 | { 413 | "cell_type": "markdown", 414 | "metadata": {}, 415 | "source": [ 416 | "**Example usage:**" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": 14, 422 | "metadata": {}, 423 | "outputs": [], 424 | "source": [ 425 | "# define two ranges:\n", 426 | "l1 = range(5)\n", 427 | "l2 = range(6,11)" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": 15, 433 | "metadata": {}, 434 | "outputs": [ 435 | { 436 | "data": { 437 | "text/plain": [ 438 | "range(0, 5)" 439 | ] 440 | }, 441 | "execution_count": 15, 442 | "metadata": {}, 443 | "output_type": "execute_result" 444 | } 445 | ], 446 | "source": [ 447 | "l1" 448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": 16, 453 | "metadata": {}, 454 | "outputs": [ 455 | { 456 | "data": { 457 | "text/plain": [ 458 | "range" 459 | ] 460 | }, 461 | "execution_count": 16, 462 | "metadata": {}, 463 | "output_type": "execute_result" 464 | } 465 | ], 466 | "source": [ 467 | "type(l1)" 468 | ] 469 | }, 470 | { 471 | "cell_type": "markdown", 472 | "metadata": {}, 473 | "source": [ 474 | "We obtain an object of type range, but we want a list!\n", 475 | "\n", 476 | "How do we obtain a 
list?" 477 | ] 478 | }, 479 | { 480 | "cell_type": "code", 481 | "execution_count": 17, 482 | "metadata": {}, 483 | "outputs": [ 484 | { 485 | "data": { 486 | "text/plain": [ 487 | "[0, 1, 2, 3, 4]" 488 | ] 489 | }, 490 | "execution_count": 17, 491 | "metadata": {}, 492 | "output_type": "execute_result" 493 | } 494 | ], 495 | "source": [ 496 | "# using a comprehension:\n", 497 | "[i for i in l1]" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 18, 503 | "metadata": {}, 504 | "outputs": [ 505 | { 506 | "data": { 507 | "text/plain": [ 508 | "([0, 1, 2, 3, 4], [6, 7, 8, 9, 10])" 509 | ] 510 | }, 511 | "execution_count": 18, 512 | "metadata": {}, 513 | "output_type": "execute_result" 514 | } 515 | ], 516 | "source": [ 517 | "# using the star-operator and list brackets []\n", 518 | "[*l1], [*l2]" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": {}, 524 | "source": [ 525 | "#### zip():" 526 | ] 527 | }, 528 | { 529 | "cell_type": "code", 530 | "execution_count": 19, 531 | "metadata": {}, 532 | "outputs": [ 533 | { 534 | "data": { 535 | "text/plain": [ 536 | "[(0, 6), (1, 7), (2, 8), (3, 9), (4, 10)]" 537 | ] 538 | }, 539 | "execution_count": 19, 540 | "metadata": {}, 541 | "output_type": "execute_result" 542 | } 543 | ], 544 | "source": [ 545 | "# zip the two lists together:\n", 546 | "zipped = zip(l1,l2)\n", 547 | "[i for i in zipped]" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": 20, 553 | "metadata": {}, 554 | "outputs": [ 555 | { 556 | "data": { 557 | "text/plain": [ 558 | "[(0, 6), (1, 7), (2, 8), (3, 9), (4, 10)]" 559 | ] 560 | }, 561 | "execution_count": 20, 562 | "metadata": {}, 563 | "output_type": "execute_result" 564 | } 565 | ], 566 | "source": [ 567 | "# or using star:\n", 568 | "zipped = zip(l1,l2)\n", 569 | "[*zipped]" 570 | ] 571 | }, 572 | { 573 | "cell_type": "markdown", 574 | "metadata": {}, 575 | "source": [ 576 | "Notice that the star-operator draws the elements from 
its input: `zip()` returns a one-shot iterator, so if you use the star on the same object again, the resulting list will be empty!\n" 577 | ] 578 | }, 579 | { 580 | "cell_type": "code", 581 | "execution_count": 21, 582 | "metadata": {}, 583 | "outputs": [ 584 | { 585 | "data": { 586 | "text/plain": [ 587 | "[]" 588 | ] 589 | }, 590 | "execution_count": 21, 591 | "metadata": {}, 592 | "output_type": "execute_result" 593 | } 594 | ], 595 | "source": [ 596 | "[*zipped]" 597 | ] 598 | } 599 | ], 600 | "metadata": { 601 | "kernelspec": { 602 | "display_name": "Python 3 (ipykernel)", 603 | "language": "python", 604 | "name": "python3" 605 | }, 606 | "language_info": { 607 | "codemirror_mode": { 608 | "name": "ipython", 609 | "version": 3 610 | }, 611 | "file_extension": ".py", 612 | "mimetype": "text/x-python", 613 | "name": "python", 614 | "nbconvert_exporter": "python", 615 | "pygments_lexer": "ipython3", 616 | "version": "3.9.12" 617 | }, 618 | "varInspector": { 619 | "cols": { 620 | "lenName": 16, 621 | "lenType": 16, 622 | "lenVar": 40 623 | }, 624 | "kernels_config": { 625 | "python": { 626 | "delete_cmd_postfix": "", 627 | "delete_cmd_prefix": "del ", 628 | "library": "var_list.py", 629 | "varRefreshCmd": "print(var_dic_list())" 630 | }, 631 | "r": { 632 | "delete_cmd_postfix": ") ", 633 | "delete_cmd_prefix": "rm(", 634 | "library": "var_list.r", 635 | "varRefreshCmd": "cat(var_dic_list()) " 636 | } 637 | }, 638 | "position": { 639 | "height": "339px", 640 | "left": "918px", 641 | "right": "20px", 642 | "top": "120px", 643 | "width": "347px" 644 | }, 645 | "types_to_exclude": [ 646 | "module", 647 | "function", 648 | "builtin_function_or_method", 649 | "instance", 650 | "_Feature" 651 | ], 652 | "window_display": false 653 | } 654 | }, 655 | "nbformat": 4, 656 | "nbformat_minor": 2 657 | } 658 | -------------------------------------------------------------------------------- /08 Classes and Inheritance.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | 
"cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**What you learn:**\n", 8 | "\n", 9 | "In this notebook you will learn about object-oriented programming in Python. This includes basic class definitions, constructors, destructors (finalizers), inheritance, instance vs class attributes. \n", 10 | "\n", 11 | "Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)\n", 12 | "\n", 13 | "This notebook is available on https://github.com/BigDataAnalyticsGroup/python." 14 | ] 15 | }, 16 | { 17 | "cell_type": "code", 18 | "execution_count": 1, 19 | "metadata": {}, 20 | "outputs": [], 21 | "source": [ 22 | "class A:\n", 23 | "    # the constructor\n", 24 | "    def __init__(self, counter=0, data=['what','ever']): \n", 25 | "        self.counter = counter\n", 26 | "        self.words = data\n", 27 | "        print('constructor of class A was called, counter:', self.counter)\n", 28 | "\n", 29 | "    # the \"kind of\"-destructor/finalizer\n", 30 | "    # Note that in Python (as in Java) and in contrast to languages like C++ there is no guarantee\n", 31 | "    # if and when this method will be executed!\n", 32 | "    # If you want to make sure that certain cleanup routines are executed, define a separate close()-method.\n", 33 | "    def __del__(self): \n", 34 | "        print('destructor of an instance of A was called, counter:', self.counter)\n", 35 | "        if self.words is not None:\n", 36 | "            del(self.words)\n", 37 | "\n", 38 | "    def __len__(self): # think of \"self\" as \"this\" in Java, instance methods must have self as their first parameter\n", 39 | "        return len(self.words)\n", 40 | " \n", 41 | "    def close(self):\n", 42 | "        # my own cleanup method\n", 43 | "        pass" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 2, 49 | "metadata": {}, 50 | "outputs": [], 51 | "source": [ 52 | "counter = 0" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 3, 58 | 
"metadata": {}, 59 | "outputs": [ 60 | { 61 | "name": "stdout", 62 | "output_type": "stream", 63 | "text": [ 64 | "constructor of class A was called, counter: 1\n" 65 | ] 66 | } 67 | ], 68 | "source": [ 69 | "# Usage\n", 70 | "counter +=1\n", 71 | "a = A(counter) # create an instance of class A" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 4, 77 | "metadata": {}, 78 | "outputs": [ 79 | { 80 | "name": "stdout", 81 | "output_type": "stream", 82 | "text": [ 83 | "2\n" 84 | ] 85 | } 86 | ], 87 | "source": [ 88 | "print(len(a)) # will print 2" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 5, 94 | "metadata": {}, 95 | "outputs": [ 96 | { 97 | "name": "stdout", 98 | "output_type": "stream", 99 | "text": [ 100 | "['what', 'ever']\n" 101 | ] 102 | } 103 | ], 104 | "source": [ 105 | "print(a.words) # will print ['what','ever']" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 6, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [ 114 | "a2 = a" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 7, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "a = None" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 8, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "name": "stdout", 133 | "output_type": "stream", 134 | "text": [ 135 | "destructor of an instance of A was called, counter: 1\n" 136 | ] 137 | } 138 | ], 139 | "source": [ 140 | "# setting the last reference to the object to None triggers the destructor (reference counting):\n", 141 | "a2 = None" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 9, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "name": "stdout", 151 | "output_type": "stream", 152 | "text": [ 153 | "constructor of class A was called, counter: 0\n", 154 | "2\n", 155 | "['foo', 'bar']\n" 156 | ] 157 | } 158 | ], 159 | "source": [ 160 | "a3 = 
A(data=[\"foo\",\"bar\"])\n", 161 | "print(len(a3)) # will print 2\n", 162 | "print(a3.words) # will print ['foo','bar']" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 10, 168 | "metadata": {}, 169 | "outputs": [], 170 | "source": [ 171 | "# B is a subclass of A:\n", 172 | "class B(A): # parentheses are used to extend a base class\n", 173 | "    def __init__(self, counter=0, data=None): # the constructor\n", 174 | "        print('constructor of class B was called')\n", 175 | "        super().__init__(counter, data) # call the constructor of the superclass\n", 176 | "        print('counter: ',self.counter)\n", 177 | "        # do something here...\n", 178 | " \n", 179 | "    def __del__(self): \n", 180 | "        print('destructor of an instance of B was called', self.counter)\n", 181 | "        # do something here...\n", 182 | "\n", 183 | "        super().__del__() # call the destructor of the superclass" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 11, 189 | "metadata": {}, 190 | "outputs": [ 191 | { 192 | "name": "stdout", 193 | "output_type": "stream", 194 | "text": [ 195 | "constructor of class B was called\n", 196 | "constructor of class A was called, counter: 0\n", 197 | "counter: 0\n" 198 | ] 199 | } 200 | ], 201 | "source": [ 202 | "b = B()" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 12, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "data": { 212 | "text/plain": [ 213 | "0" 214 | ] 215 | }, 216 | "execution_count": 12, 217 | "metadata": {}, 218 | "output_type": "execute_result" 219 | } 220 | ], 221 | "source": [ 222 | "b.counter" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 13, 228 | "metadata": {}, 229 | "outputs": [ 230 | { 231 | "name": "stdout", 232 | "output_type": "stream", 233 | "text": [ 234 | "destructor of an instance of B was called 0\n", 235 | "destructor of an instance of A was called, counter: 0\n" 236 | ] 237 | } 238 | ], 239 | "source": [ 240 | "b = None" 241 | ] 
242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "**Pitfall**: Instance vs class attributes" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 14, 253 | "metadata": {}, 254 | "outputs": [], 255 | "source": [ 256 | "# in a class declaration an instance attribute may have the same name as the class attribute!\n", 257 | "class A: \n", 258 | " counter = 0 # this is a class(!) attribute used for counting the number of instances of this class\n", 259 | " \n", 260 | " def __init__(self): \n", 261 | " print('constructor of class A was called')\n", 262 | " # increase the class attribute:\n", 263 | " type(self).counter += 1\n", 264 | " #alternatively:\n", 265 | " #A.counter += 1\n", 266 | " # set the instance attribute:\n", 267 | " self.counter = 0\n", 268 | "\n", 269 | " def __del__(self): \n", 270 | " print('destructor of an instance of A was called')\n", 271 | "\n", 272 | " # decrease the class attribute:\n", 273 | " type(self).counter -= 1\n", 274 | "\n", 275 | " def incInstanceCounter(self):\n", 276 | " # increase the instance attribute:\n", 277 | " self.counter += 1\n", 278 | " \n", 279 | " def printCounters(self):\n", 280 | " print(\"instance: \", self.counter, \" class: \", type(self).counter)" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 15, 286 | "metadata": {}, 287 | "outputs": [ 288 | { 289 | "name": "stdout", 290 | "output_type": "stream", 291 | "text": [ 292 | "constructor of class A was called\n", 293 | "instance: 0 class: 1\n", 294 | "constructor of class A was called\n", 295 | "instance: 0 class: 2\n", 296 | "instance: 0 class: 2\n" 297 | ] 298 | } 299 | ], 300 | "source": [ 301 | "# let's instantiate some objects and thereby increase the instance counter:\n", 302 | "x = A()\n", 303 | "x.printCounters()\n", 304 | "y = A()\n", 305 | "x.printCounters()\n", 306 | "y.printCounters()" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 
312 | "source": [ 313 | "now let's increase the individual instance attributes:" 314 | ] 315 | }, 316 | { 317 | "cell_type": "code", 318 | "execution_count": 16, 319 | "metadata": {}, 320 | "outputs": [ 321 | { 322 | "name": "stdout", 323 | "output_type": "stream", 324 | "text": [ 325 | "instance: 1 class: 2\n" 326 | ] 327 | } 328 | ], 329 | "source": [ 330 | "y.incInstanceCounter()\n", 331 | "y.printCounters()" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": 17, 337 | "metadata": {}, 338 | "outputs": [ 339 | { 340 | "name": "stdout", 341 | "output_type": "stream", 342 | "text": [ 343 | "destructor of an instance of A was called\n", 344 | "destructor of an instance of A was called\n" 345 | ] 346 | }, 347 | { 348 | "data": { 349 | "text/plain": [ 350 | "(None, None)" 351 | ] 352 | }, 353 | "execution_count": 17, 354 | "metadata": {}, 355 | "output_type": "execute_result" 356 | } 357 | ], 358 | "source": [ 359 | "x = None\n", 360 | "y = None\n", 361 | "x,y" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 18, 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "data": { 371 | "text/plain": [ 372 | "0" 373 | ] 374 | }, 375 | "execution_count": 18, 376 | "metadata": {}, 377 | "output_type": "execute_result" 378 | } 379 | ], 380 | "source": [ 381 | "A.counter" 382 | ] 383 | }, 384 | { 385 | "cell_type": "code", 386 | "execution_count": 19, 387 | "metadata": {}, 388 | "outputs": [], 389 | "source": [ 390 | "class Foo: \n", 391 | " counter = 42" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": 20, 397 | "metadata": {}, 398 | "outputs": [ 399 | { 400 | "data": { 401 | "text/plain": [ 402 | "42" 403 | ] 404 | }, 405 | "execution_count": 20, 406 | "metadata": {}, 407 | "output_type": "execute_result" 408 | } 409 | ], 410 | "source": [ 411 | "Foo.counter" 412 | ] 413 | } 414 | ], 415 | "metadata": { 416 | "kernelspec": { 417 | "display_name": "Python 3 (ipykernel)", 418 | "language": 
"python", 419 | "name": "python3" 420 | }, 421 | "language_info": { 422 | "codemirror_mode": { 423 | "name": "ipython", 424 | "version": 3 425 | }, 426 | "file_extension": ".py", 427 | "mimetype": "text/x-python", 428 | "name": "python", 429 | "nbconvert_exporter": "python", 430 | "pygments_lexer": "ipython3", 431 | "version": "3.9.12" 432 | }, 433 | "varInspector": { 434 | "cols": { 435 | "lenName": 16, 436 | "lenType": 16, 437 | "lenVar": 40 438 | }, 439 | "kernels_config": { 440 | "python": { 441 | "delete_cmd_postfix": "", 442 | "delete_cmd_prefix": "del ", 443 | "library": "var_list.py", 444 | "varRefreshCmd": "print(var_dic_list())" 445 | }, 446 | "r": { 447 | "delete_cmd_postfix": ") ", 448 | "delete_cmd_prefix": "rm(", 449 | "library": "var_list.r", 450 | "varRefreshCmd": "cat(var_dic_list()) " 451 | } 452 | }, 453 | "position": { 454 | "height": "240px", 455 | "left": "868px", 456 | "right": "20px", 457 | "top": "122px", 458 | "width": "378px" 459 | }, 460 | "types_to_exclude": [ 461 | "module", 462 | "function", 463 | "builtin_function_or_method", 464 | "instance", 465 | "_Feature" 466 | ], 467 | "window_display": false 468 | } 469 | }, 470 | "nbformat": 4, 471 | "nbformat_minor": 2 472 | } 473 | -------------------------------------------------------------------------------- /09 Unittests.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "**What you learn:**\n", 8 | "\n", 9 | "In this notebook you will learn about automatic unit tests in Python. \n", 10 | "\n", 11 | "Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)\n", 12 | "\n", 13 | "This notebook is available on https://github.com/BigDataAnalyticsGroup/python." 
14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "## unittests\n", 21 | "\n", 22 | "How do we make sure that a certain function does what we expect it to do?\n", 23 | "\n", 24 | "The answer is **unittests**. A unittest checks whether a given function produces an expected output for a given input.\n", 25 | "\n", 26 | "So, for any Python function out = f(in) that we implement, a unittest typically defines multiple (in, out)-tuples.\n", 27 | "\n", 28 | "In any **real** software development, unittests are the state-of-the-art method for checking the correctness of your software. Unittests are typically executed automatically (every night, every time you change anything, etc.).\n", 29 | "\n", 30 | "Do not confuse the terms `try out` and `testing`.\n", 31 | "\n", 32 | "`try out`: let's run my program with a couple of inputs and see what happens; if it looks good, we are done. Really?\n", 33 | "\n", 34 | "`testing`: write unittests for each and every situation we expect our function to handle. Run these tests systematically every time you change anything in your code. This is also called `automatic testing`.
If we pass all tests, we are done.\n", 35 | "\n" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "#### Basic structure of a test" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 1, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "import unittest\n", 52 | "\n", 53 | "# 'class' wraps a number of tests into a single unit\n", 54 | "# this comes from a programming paradigm called \"object-oriented programming\" (OOP),\n", 55 | "# which you do not really need for this lecture\n", 56 | "# simply read it as \"a class bundles a couple of test-functions into one unit\"\n", 57 | "#\n", 58 | "# in OOP-terms:\n", 59 | "# \"this is a class definition extending/inheriting from unittest.TestCase\" \n", 60 | "class MyTest(unittest.TestCase):\n", 61 | "\n", 62 | " # the following function will be called before every test-function:\n", 63 | " def setUp(self):\n", 64 | " print('setUp MyTest')\n", 65 | " \n", 66 | " # each test-function must have a name starting with 'test':\n", 67 | " def test_1(self):\n", 68 | " print('test_1')\n", 69 | "\n", 70 | " def test_2(self):\n", 71 | " print('test_2')\n", 72 | "\n", 73 | " def test_3(self):\n", 74 | " print('test_3')\n", 75 | "\n", 76 | " # the following function will be called after every test-function:\n", 77 | " def tearDown(self):\n", 78 | " print('tearDown')" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 2, 84 | "metadata": {}, 85 | "outputs": [ 86 | { 87 | "name": "stderr", 88 | "output_type": "stream", 89 | "text": [ 90 | "test_1 (__main__.MyTest) ... ok\n", 91 | "test_2 (__main__.MyTest) ... ok\n", 92 | "test_3 (__main__.MyTest) ...
ok\n", 93 | "\n", 94 | "----------------------------------------------------------------------\n", 95 | "Ran 3 tests in 0.007s\n", 96 | "\n", 97 | "OK\n" 98 | ] 99 | }, 100 | { 101 | "name": "stdout", 102 | "output_type": "stream", 103 | "text": [ 104 | "setUp MyTest\n", 105 | "test_1\n", 106 | "tearDown\n", 107 | "setUp MyTest\n", 108 | "test_2\n", 109 | "tearDown\n", 110 | "setUp MyTest\n", 111 | "test_3\n", 112 | "tearDown\n" 113 | ] 114 | }, 115 | { 116 | "data": { 117 | "text/plain": [ 118 | "" 119 | ] 120 | }, 121 | "execution_count": 2, 122 | "metadata": {}, 123 | "output_type": "execute_result" 124 | } 125 | ], 126 | "source": [ 127 | "# Run the unit test without shutting down the jupyter kernel\n", 128 | "# notice that this call will look for all test classes, i.e.\n", 129 | "# all classes inheriting from unittest.TestCase\n", 130 | "# each of these functions will be executed once\n", 131 | "# unittest.main(argv=['ignored', '-v'], verbosity=2, exit=False)\n", 132 | "\n", 133 | "# only execute a specific Test-class:\n", 134 | "unittest.main(argv=['ignored', '-v'], defaultTest='MyTest', verbosity=2, exit=False)\n" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "#### Example Testing Scenario" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 3, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "# some class we implemented:\n", 151 | "class Bar:\n", 152 | " def __init__(self):\n", 153 | " self.a = 42\n", 154 | "\n", 155 | " def whatever(self, par):\n", 156 | " if par == 0:\n", 157 | " raise ValueError(\"Division by zero is not defined.\")\n", 158 | " self.a /= par\n", 159 | " return self.a" 160 | ] 161 | }, 162 | { 163 | "cell_type": "code", 164 | "execution_count": 4, 165 | "metadata": {}, 166 | "outputs": [], 167 | "source": [ 168 | "b = Bar()" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 5, 174 | "metadata": {}, 175 | "outputs": 
[ 176 | { 177 | "data": { 178 | "text/plain": [ 179 | "42" 180 | ] 181 | }, 182 | "execution_count": 5, 183 | "metadata": {}, 184 | "output_type": "execute_result" 185 | } 186 | ], 187 | "source": [ 188 | "b.a" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 6, 194 | "metadata": {}, 195 | "outputs": [ 196 | { 197 | "data": { 198 | "text/plain": [ 199 | "float" 200 | ] 201 | }, 202 | "execution_count": 6, 203 | "metadata": {}, 204 | "output_type": "execute_result" 205 | } 206 | ], 207 | "source": [ 208 | "type(b.whatever(2))" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 7, 214 | "metadata": {}, 215 | "outputs": [], 216 | "source": [ 217 | "# unittests in Python are very similar to unittests in Java:\n", 218 | "import unittest\n", 219 | "\n", 220 | "class BarTest(unittest.TestCase):\n", 221 | " # the following method will be called before every test-Method\n", 222 | " def setUp(self):\n", 223 | " print(\"setup BarTest\")\n", 224 | " self.bar = Bar()\n", 225 | " \n", 226 | " def test_div(self):\n", 227 | " print('test_div')\n", 228 | " self.assertEqual(self.bar.a, 42)\n", 229 | " local_a = 42\n", 230 | " par = 77\n", 231 | " ret = self.bar.whatever(par)\n", 232 | " self.assertEqual(ret, local_a/par)\n", 233 | " self.assertEqual(self.bar.a, local_a/par)\n", 234 | " \n", 235 | " def test_init(self):\n", 236 | " print('test_init')\n", 237 | " self.assertEqual(self.bar.a, 42)\n", 238 | "\n", 239 | " def test_divbyzero(self):\n", 240 | " print('test_divbyzero')\n", 241 | " with self.assertRaises(ValueError):\n", 242 | " self.bar.whatever(0)" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": 8, 248 | "metadata": {}, 249 | "outputs": [ 250 | { 251 | "name": "stderr", 252 | "output_type": "stream", 253 | "text": [ 254 | "test_div (__main__.BarTest) ... ok\n", 255 | "test_divbyzero (__main__.BarTest) ... ok\n", 256 | "test_init (__main__.BarTest) ... 
ok\n", 257 | "\n", 258 | "----------------------------------------------------------------------\n", 259 | "Ran 3 tests in 0.005s\n", 260 | "\n", 261 | "OK\n" 262 | ] 263 | }, 264 | { 265 | "name": "stdout", 266 | "output_type": "stream", 267 | "text": [ 268 | "setup BarTest\n", 269 | "test_div\n", 270 | "setup BarTest\n", 271 | "test_divbyzero\n", 272 | "setup BarTest\n", 273 | "test_init\n" 274 | ] 275 | }, 276 | { 277 | "data": { 278 | "text/plain": [ 279 | "" 280 | ] 281 | }, 282 | "execution_count": 8, 283 | "metadata": {}, 284 | "output_type": "execute_result" 285 | } 286 | ], 287 | "source": [ 288 | "# only execute a specific Test-class:\n", 289 | "unittest.main(argv=['ignored', '-v'], defaultTest='BarTest', verbosity=2, exit=False)" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "#### Example: sum of squares" 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": {}, 302 | "source": [ 303 | "We want to compute $$ \\sum_{i=low}^{high} i^2$$" 304 | ] 305 | }, 306 | { 307 | "cell_type": "code", 308 | "execution_count": 9, 309 | "metadata": {}, 310 | "outputs": [], 311 | "source": [ 312 | "# returns the sum of squares in the int interval [low;high]\n", 313 | "# a straightforward implementation of a squaredSum\n", 314 | "def mySquaredSum(low, high):\n", 315 | " _sum = 0\n", 316 | " for i in range(low,high+1): # note the 'high+1' (rather than 'high')\n", 317 | " _sum += i*i\n", 318 | "\n", 319 | " return _sum" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 10, 325 | "metadata": {}, 326 | "outputs": [ 327 | { 328 | "data": { 329 | "text/plain": [ 330 | "385" 331 | ] 332 | }, 333 | "execution_count": 10, 334 | "metadata": {}, 335 | "output_type": "execute_result" 336 | } 337 | ], 338 | "source": [ 339 | "# let's \"try it out\":\n", 340 | "mySquaredSum(1,10)" 341 | ] 342 | }, 343 | { 344 | "cell_type": "code", 345 | "execution_count": 11, 346 | "metadata": {}, 347 
| "outputs": [ 348 | { 349 | "data": { 350 | "text/plain": [ 351 | "0" 352 | ] 353 | }, 354 | "execution_count": 11, 355 | "metadata": {}, 356 | "output_type": "execute_result" 357 | } 358 | ], 359 | "source": [ 360 | "# let's \"try it out\" even more:\n", 361 | "mySquaredSum(12,11)" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 12, 367 | "metadata": {}, 368 | "outputs": [ 369 | { 370 | "data": { 371 | "text/plain": [ 372 | "0" 373 | ] 374 | }, 375 | "execution_count": 12, 376 | "metadata": {}, 377 | "output_type": "execute_result" 378 | } 379 | ], 380 | "source": [ 381 | "# let's \"try it out\" even more:\n", 382 | "mySquaredSum(0,-12)" 383 | ] 384 | }, 385 | { 386 | "cell_type": "markdown", 387 | "metadata": {}, 388 | "source": [ 389 | "Looks good. But are we done here? No, we need to write proper unittests for this." 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": {}, 395 | "source": [ 396 | "#### Unittest for sum of squares" 397 | ] 398 | }, 399 | { 400 | "cell_type": "markdown", 401 | "metadata": {}, 402 | "source": [ 403 | "OK, let's write a real unittest for our squaredSum-function:" 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": 13, 409 | "metadata": {}, 410 | "outputs": [], 411 | "source": [ 412 | "# returns the sum of squares in the int interval [low;high]\n", 413 | "def squaredSumRecursive(low, high, count=0, indent=''):\n", 414 | " #print(indent, 'sqsr', low, high)\n", 415 | " if count > 15: # recursion emergency brake...\n", 416 | " return 0\n", 417 | " _sum = 0\n", 418 | " if high>low:\n", 419 | " middleLeft = low + (high-low)//2\n", 420 | " middleRight = middleLeft + 1\n", 421 | " _sum += squaredSumRecursive(low, middleLeft, count+1, indent+' ')\n", 422 | " _sum += squaredSumRecursive(middleRight, high, count+1, indent+' ')\n", 423 | " return _sum\n", 424 | " elif high" 535 | ] 536 | }, 537 | "execution_count": 16, 538 | "metadata": {}, 539 | "output_type": 
"execute_result" 540 | } 541 | ], 542 | "source": [ 543 | "# Run the unit test without shutting down the jupyter kernel\n", 544 | "# here we are running only SumTest!\n", 545 | "unittest.main(argv=['ignored', '-v'], defaultTest='SumTest', verbosity=2, exit=False)" 546 | ] 547 | } 548 | ], 549 | "metadata": { 550 | "hide_input": false, 551 | "kernelspec": { 552 | "display_name": "Python 3 (ipykernel)", 553 | "language": "python", 554 | "name": "python3" 555 | }, 556 | "language_info": { 557 | "codemirror_mode": { 558 | "name": "ipython", 559 | "version": 3 560 | }, 561 | "file_extension": ".py", 562 | "mimetype": "text/x-python", 563 | "name": "python", 564 | "nbconvert_exporter": "python", 565 | "pygments_lexer": "ipython3", 566 | "version": "3.9.12" 567 | }, 568 | "toc": { 569 | "base_numbering": 1, 570 | "nav_menu": {}, 571 | "number_sections": true, 572 | "sideBar": true, 573 | "skip_h1_title": false, 574 | "title_cell": "Table of Contents", 575 | "title_sidebar": "Contents", 576 | "toc_cell": false, 577 | "toc_position": { 578 | "height": "840px", 579 | "left": "0px", 580 | "right": "1111px", 581 | "top": "113px", 582 | "width": "253px" 583 | }, 584 | "toc_section_display": "block", 585 | "toc_window_display": true 586 | }, 587 | "varInspector": { 588 | "cols": { 589 | "lenName": 16, 590 | "lenType": 16, 591 | "lenVar": 40 592 | }, 593 | "kernels_config": { 594 | "python": { 595 | "delete_cmd_postfix": "", 596 | "delete_cmd_prefix": "del ", 597 | "library": "var_list.py", 598 | "varRefreshCmd": "print(var_dic_list())" 599 | }, 600 | "r": { 601 | "delete_cmd_postfix": ") ", 602 | "delete_cmd_prefix": "rm(", 603 | "library": "var_list.r", 604 | "varRefreshCmd": "cat(var_dic_list()) " 605 | } 606 | }, 607 | "types_to_exclude": [ 608 | "module", 609 | "function", 610 | "builtin_function_or_method", 611 | "instance", 612 | "_Feature" 613 | ], 614 | "window_display": false 615 | } 616 | }, 617 | "nbformat": 4, 618 | "nbformat_minor": 2 619 | } 620 | 
-------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Python Introduction 2 | 3 | A short intro to Python for Java/C++ developers. 4 | 5 | Videos (in German) are available on [YouTube](https://www.youtube.com/watch?v=1S4Cgtkxqhs&list=PLC4UZxBVGKte4XagApdryLsnIXpjZWSAn). 6 | 7 | To run the notebooks, install the required Python packages: 8 | ```sh 9 | pip install -r requirements.txt 10 | ``` 11 | 12 | Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode) 13 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | altair==4.2.0 2 | matplotlib==3.5.2 3 | numpy==1.22.3 4 | pandas==1.4.2 5 | vega-datasets==0.9.0 6 | jupyter==1.0.0 7 | --------------------------------------------------------------------------------