├── Banner_MATH2319.png ├── PB1_nb_intro.ipynb ├── PB2_nb_markdown.ipynb ├── PB3_intro_to_python.ipynb ├── PB4_numpy.ipynb ├── PB5_pandas.ipynb ├── PB6_matplotlib.ipynb ├── PB7_seaborn.ipynb ├── PB8_python_vs_r.ipynb └── README.md /Banner_MATH2319.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/akmand/python_data_science_tutorials/2c5ced93bf3c8b853d516c40c6abea012234b3bd/Banner_MATH2319.png -------------------------------------------------------------------------------- /PB1_nb_intro.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# INTRODUCTION TO JUPYTER NOTEBOOKS\n", 8 | "\n", 9 | "In this tutorial, we discuss some basic tasks to get your Jupyter notebooks up and running on your computer." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "collapsed": true, 16 | "jupyter": { 17 | "outputs_hidden": true 18 | } 19 | }, 20 | "source": [ 21 | "## Spellchecker: LanguageTool Browser Extension" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "The last thing you want in your Jupyter notebooks is typos. Jupyter has some spellchecker extensions, but installing them can be problematic across different software environments. For spellchecking, we recommend a free browser extension called **LanguageTool** [here](https://languagetool.org/). As a bonus, this extension checks not only your notebooks for typos, but also anything else you type in your browser. Sweet!" 
29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "## How to check for Python and module versions" 36 | ] 37 | }, 38 | { 39 | "cell_type": "markdown", 40 | "metadata": {}, 41 | "source": [ 42 | "Within your shell, you can issue the command:\n", 43 | "```HTML\n", 44 | "> python --version\n", 45 | "```\n", 46 | "Sometimes the Python executable is named \"python3\", so you might need:\n", 47 | "```HTML\n", 48 | "> python3 --version\n", 49 | "```\n", 50 | "Within a Jupyter notebook, you can issue a system command by putting an exclamation mark (\"!\") in front of it, as shown below, which has the same effect:" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 1, 56 | "metadata": {}, 57 | "outputs": [ 58 | { 59 | "name": "stdout", 60 | "output_type": "stream", 61 | "text": [ 62 | "Python 3.11.9\n" 63 | ] 64 | } 65 | ], 66 | "source": [ 67 | "!python3 --version" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "To check the version number of a Python module, you can usually view its `__version__` attribute as below." 
75 | ] 76 | }, 77 | { 78 | "cell_type": "code", 79 | "execution_count": 4, 80 | "metadata": {}, 81 | "outputs": [], 82 | "source": [ 83 | "import numpy as np" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 5, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "data": { 93 | "text/plain": [ 94 | "'2.0.0'" 95 | ] 96 | }, 97 | "execution_count": 5, 98 | "metadata": {}, 99 | "output_type": "execute_result" 100 | } 101 | ], 102 | "source": [ 103 | "np.__version__" 104 | ] 105 | }, 106 | { 107 | "cell_type": "markdown", 108 | "metadata": {}, 109 | "source": [ 110 | "## How to read CSV files" 111 | ] 112 | }, 113 | { 114 | "cell_type": "code", 115 | "execution_count": 6, 116 | "metadata": {}, 117 | "outputs": [], 118 | "source": [ 119 | "import pandas as pd" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "Assuming that your file is under a directory called `data`:" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 7, 132 | "metadata": {}, 133 | "outputs": [ 134 | { 135 | "data": { 136 | "text/html": [ 137 | "
" 303 | ], 304 | "text/plain": [ 305 | " mean_radius mean_texture mean_perimeter mean_area mean_smoothness \\\n", 306 | "0 17.99 10.38 122.80 1001.0 0.11840 \n", 307 | "1 20.57 17.77 132.90 1326.0 0.08474 \n", 308 | "2 19.69 21.25 130.00 1203.0 0.10960 \n", 309 | "3 11.42 20.38 77.58 386.1 0.14250 \n", 310 | "4 20.29 14.34 135.10 1297.0 0.10030 \n", 311 | "\n", 312 | " mean_compactness mean_concavity mean_concave_points mean_symmetry \\\n", 313 | "0 0.27760 0.3001 0.14710 0.2419 \n", 314 | "1 0.07864 0.0869 0.07017 0.1812 \n", 315 | "2 0.15990 0.1974 0.12790 0.2069 \n", 316 | "3 0.28390 0.2414 0.10520 0.2597 \n", 317 | "4 0.13280 0.1980 0.10430 0.1809 \n", 318 | "\n", 319 | " mean_fractal_dimension ... worst_texture worst_perimeter worst_area \\\n", 320 | "0 0.07871 ... 17.33 184.60 2019.0 \n", 321 | "1 0.05667 ... 23.41 158.80 1956.0 \n", 322 | "2 0.05999 ... 25.53 152.50 1709.0 \n", 323 | "3 0.09744 ... 26.50 98.87 567.7 \n", 324 | "4 0.05883 ... 16.67 152.20 1575.0 \n", 325 | "\n", 326 | " worst_smoothness worst_compactness worst_concavity worst_concave_points \\\n", 327 | "0 0.1622 0.6656 0.7119 0.2654 \n", 328 | "1 0.1238 0.1866 0.2416 0.1860 \n", 329 | "2 0.1444 0.4245 0.4504 0.2430 \n", 330 | "3 0.2098 0.8663 0.6869 0.2575 \n", 331 | "4 0.1374 0.2050 0.4000 0.1625 \n", 332 | "\n", 333 | " worst_symmetry worst_fractal_dimension diagnosis \n", 334 | "0 0.4601 0.11890 M \n", 335 | "1 0.2750 0.08902 M \n", 336 | "2 0.3613 0.08758 M \n", 337 | "3 0.6638 0.17300 M \n", 338 | "4 0.2364 0.07678 M \n", 339 | "\n", 340 | "[5 rows x 31 columns]" 341 | ] 342 | }, 343 | "execution_count": 7, 344 | "metadata": {}, 345 | "output_type": "execute_result" 346 | } 347 | ], 348 | "source": [ 349 | "df = pd.read_csv('./data/breast_cancer_wisconsin.csv')\n", 350 | "df.head()" 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": 8, 356 | "metadata": {}, 357 | "outputs": [ 358 | { 359 | "data": { 360 | "text/plain": [ 361 | "(569, 31)" 362 | ] 363 | }, 364 | 
"execution_count": 8, 365 | "metadata": {}, 366 | "output_type": "execute_result" 367 | } 368 | ], 369 | "source": [ 370 | "df.shape" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "## Python Package Management" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "We strongly recommend installing a virtual environment to avoid module version clashes. For Mac:\n", 385 | "\n", 386 | "The command below will create a folder called `.venv` which will host your virtual environment.\n", 387 | "```HTML\n", 388 | "> python3 -m venv .venv\n", 389 | "```\n", 390 | "Activate it so that you can use it:\n", 391 | "```HTML\n", 392 | "> source .venv/bin/activate\n", 393 | "```\n", 394 | "When done, simply deactivate your virtual environment:\n", 395 | "```HTML\n", 396 | "> deactivate\n", 397 | "```\n", 398 | "Please look this up on Google if you're on Windows.\n", 399 | "\n", 400 | "To get a list of all the Python modules on your current environment, try pip list:" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": null, 406 | "metadata": {}, 407 | "outputs": [], 408 | "source": [ 409 | "!pip list " 410 | ] 411 | }, 412 | { 413 | "cell_type": "code", 414 | "execution_count": null, 415 | "metadata": { 416 | "scrolled": true 417 | }, 418 | "outputs": [], 419 | "source": [ 420 | "!pip install --upgrade pip" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": {}, 426 | "source": [ 427 | "How to install multiple packages at once:" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": null, 433 | "metadata": {}, 434 | "outputs": [], 435 | "source": [ 436 | "!pip install pandas matplotlib" 437 | ] 438 | }, 439 | { 440 | "cell_type": "markdown", 441 | "metadata": {}, 442 | "source": [ 443 | "## The pipreqs Module" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | 
"In many cases, you will need to compile a list of all the modules you installed in your virtual environment for documentation, usually in a text file called `requirements.txt`. \n", 451 | "\n", 452 | "We recommend the `pipreqs` module for this purpose, which is usually better than the common practice of `pip freeze > requirements.txt`. In particular, `pipreqs` avoids listing Jupyter notebook modules, which you won't need if you just want to run the code elsewhere without Jupyter notebooks. \n", 453 | "\n", 454 | "Simply install it via \n", 455 | "```HTML\n", 456 | "> pip install pipreqs \n", 457 | "```\n", 458 | "Next, save the list of the modules your code imports to `requirements.txt` via \n", 459 | "```HTML\n", 460 | "> pipreqs . --force\n", 461 | "```\n", 462 | "The `--force` option above overwrites any existing `requirements.txt` file. The dot means \"this folder\". That is, you will need to run this command where your virtual environment folder `.venv` is, so that pipreqs picks up the correct modules.\n", 463 | "\n", 464 | "When you need to replicate your Python environment on a different machine, create a new virtual environment and install all the modules in your `requirements.txt` file as below:\n", 465 | "```HTML\n", 466 | "> pip install -r requirements.txt\n", 467 | "```\n", 468 | "This way, all the modules you need for your project will be installed with the correct version numbers." 
469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "---" 476 | ] 477 | } 478 | ], 479 | "metadata": { 480 | "kernelspec": { 481 | "display_name": "Python 3 (ipykernel)", 482 | "language": "python", 483 | "name": "python3" 484 | }, 485 | "language_info": { 486 | "codemirror_mode": { 487 | "name": "ipython", 488 | "version": 3 489 | }, 490 | "file_extension": ".py", 491 | "mimetype": "text/x-python", 492 | "name": "python", 493 | "nbconvert_exporter": "python", 494 | "pygments_lexer": "ipython3", 495 | "version": "3.11.9" 496 | } 497 | }, 498 | "nbformat": 4, 499 | "nbformat_minor": 4 500 | } 501 | -------------------------------------------------------------------------------- /PB3_intro_to_python.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Introduction to Python Programming" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "This tutorial provides a basic overview of Python (>= 3.6) specifically for data analytics. In particular, we do not cover object-oriented programming (OOP) aspects of Python, how to manage errors and exceptions, etc. \n", 15 | "\n", 16 | "Python is a mature general-purpose programming language and has lots of bells and whistles accordingly. However, for introductory data analytics, you only need to be proficient in a certain subset of all the features that Python has to offer and our goal here is to focus on these features only. \n", 17 | "\n", 18 | "This tutorial does not assume any prior programming experience of any kind, though some background in functional programming would certainly be beneficial. The reader is referred to this [book](https://github.com/jakevdp/WhirlwindTourOfPython) for a solid introduction to the Python programming language. 
\n", 19 | "\n", 20 | "If you would like a cheat sheet for Python basics, this one [here](https://perso.limsi.fr/pointal/_media/python:cours:mementopython3-english.pdf) is quite useful. In addition, the link [here](https://jobtensor.com/Python-Introduction) has an excellent range of interactive learning materials on Python programming." 21 | ] 22 | }, 23 | { 24 | "cell_type": "markdown", 25 | "metadata": {}, 26 | "source": [ 27 | "## Table of Contents\n", 28 | " * [Variables](#Variables)\n", 29 | " * [Base Data Types in Python](#Base-Data-Types-in-Python)\n", 30 | " + [Integers](#Integers)\n", 31 | " + [Floats](#Floats)\n", 32 | " + [Strings](#Strings)\n", 33 | " + [Booleans](#Booleans)\n", 34 | " * [Comparison and Logical Operators](#Comparison-and-Logical-Operators)\n", 35 | " + [Comparison Operators](#Comparison-Operators)\n", 36 | " + [Logical Operators](#Logical-Operators)\n", 37 | " * [Basic Mathematical Operations](#Basic-Mathematical-Operations)\n", 38 | " * [Containers](#Containers)\n", 39 | " + [Lists](#List)\n", 40 | " + [Dictionaries](#Dictionaries)\n", 41 | " + [Common Operations on Containers](#Common-Operations-on-Containers)\n", 42 | " * [Conditional Statements and Loops](#Conditional-Statements-and-Loops)\n", 43 | " + [If](#If)\n", 44 | " + [If-else](#If-else)\n", 45 | " + [If-elif](#If-elif)\n", 46 | " + [Nested if](#Nested-if)\n", 47 | " + [While](#While)\n", 48 | " + [For](#For)\n", 49 | " + [Break](#Break)\n", 50 | " + [Continue](#Continue)\n", 51 | " + [List Comprehension](#List-Comprehension)\n", 52 | " * [Functions](#Functions)\n", 53 | " * [Object Introspection](#Object-Introspection)\n", 54 | " * [Modules](#Modules)\n", 55 | " * [Exercises](#Exercises)\n", 56 | " + [Solutions](#Solutions)" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "Let's first suppress warnings as they can get annoying sometimes." 
64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 1, 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [ 72 | "import warnings\n", 73 | "warnings.filterwarnings('ignore')" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "In Python, comments are indicated with a \"#\" (pound) sign. Anything that comes after the # sign is ignored by the Python interpreter.\n", 81 | "\n", 82 | "If you want to execute a line but suppress its output, you can end the line with a semi-colon. " 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": 2, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "# we would like to run the line below, \n", 92 | "# but we would like to hide its output:\n", 93 | "2+2;" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "## Variables" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "In Python, you can assign a name to a value and this name is called a \"variable\". To create a variable, you use the `=` sign. " 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 3, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "x = 'Hello' # here, x is of \"string\" base type, which we discuss further below" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "Once you define the variable x, you can use that variable instead of its actual value. Let's verify this using the `print` **function** (sometimes we will refer to a function as a \"**command**\" or \"**method**\" - practically they all mean the same thing)." 
124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 4, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "name": "stdout", 133 | "output_type": "stream", 134 | "text": [ 135 | "Hello\n" 136 | ] 137 | } 138 | ], 139 | "source": [ 140 | "print(x)" 141 | ] 142 | }, 143 | { 144 | "cell_type": "markdown", 145 | "metadata": {}, 146 | "source": [ 147 | "You can assign the same value to more than one variable." 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 5, 153 | "metadata": {}, 154 | "outputs": [ 155 | { 156 | "name": "stdout", 157 | "output_type": "stream", 158 | "text": [ 159 | "23\n", 160 | "23\n" 161 | ] 162 | } 163 | ], 164 | "source": [ 165 | "y = z = 23\n", 166 | "print(y)\n", 167 | "print(z)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "Python allows you to assign multiple values to multiple variables simultaneously. This is called **multiple assignment**." 175 | ] 176 | }, 177 | { 178 | "cell_type": "code", 179 | "execution_count": 6, 180 | "metadata": {}, 181 | "outputs": [ 182 | { 183 | "name": "stdout", 184 | "output_type": "stream", 185 | "text": [ 186 | "3.14 Hello 1887\n" 187 | ] 188 | } 189 | ], 190 | "source": [ 191 | "a, b, c = 3.14, 'Hello', 1887\n", 192 | "print(a, b, c)" 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "In Python, you can easily swap values between multiple variables as below." 
200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": 7, 205 | "metadata": {}, 206 | "outputs": [ 207 | { 208 | "name": "stdout", 209 | "output_type": "stream", 210 | "text": [ 211 | "1887 Hello 3.14\n" 212 | ] 213 | } 214 | ], 215 | "source": [ 216 | "a, b, c = c, b, a\n", 217 | "print(a, b, c)" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "If you no longer need a variable, you can delete it with the `del` statement. This removes the name; Python frees the underlying object once nothing else refers to it." 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 8, 230 | "metadata": {}, 231 | "outputs": [], 232 | "source": [ 233 | "del x" 234 | ] 235 | }, 236 | { 237 | "cell_type": "markdown", 238 | "metadata": {}, 239 | "source": [ 240 | "## Base Data Types in Python\n", 241 | "\n", 242 | "Python has the following base data types (there is also a `bytes` base type, but we do not cover this)." 243 | ] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": {}, 248 | "source": [ 249 | "| Type | Description |\n", 250 | "|----|---|\n", 251 | "| int | Integer |\n", 252 | "| float | Floating-point number |\n", 253 | "| str | String |\n", 254 | "| bool | Boolean (True or False) |" 255 | ] 256 | }, 257 | { 258 | "cell_type": "markdown", 259 | "metadata": {}, 260 | "source": [ 261 | "To see the type of an object, you can use the `type()` function." 
262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": 9, 267 | "metadata": {}, 268 | "outputs": [ 269 | { 270 | "data": { 271 | "text/plain": [ 272 | "int" 273 | ] 274 | }, 275 | "execution_count": 9, 276 | "metadata": {}, 277 | "output_type": "execute_result" 278 | } 279 | ], 280 | "source": [ 281 | "type(10)" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 10, 287 | "metadata": {}, 288 | "outputs": [ 289 | { 290 | "data": { 291 | "text/plain": [ 292 | "float" 293 | ] 294 | }, 295 | "execution_count": 10, 296 | "metadata": {}, 297 | "output_type": "execute_result" 298 | } 299 | ], 300 | "source": [ 301 | "type(10.5)" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": 11, 307 | "metadata": {}, 308 | "outputs": [ 309 | { 310 | "data": { 311 | "text/plain": [ 312 | "str" 313 | ] 314 | }, 315 | "execution_count": 11, 316 | "metadata": {}, 317 | "output_type": "execute_result" 318 | } 319 | ], 320 | "source": [ 321 | "type('Python')" 322 | ] 323 | }, 324 | { 325 | "cell_type": "code", 326 | "execution_count": 12, 327 | "metadata": {}, 328 | "outputs": [ 329 | { 330 | "data": { 331 | "text/plain": [ 332 | "bool" 333 | ] 334 | }, 335 | "execution_count": 12, 336 | "metadata": {}, 337 | "output_type": "execute_result" 338 | } 339 | ], 340 | "source": [ 341 | "type(True)" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "### Integers" 349 | ] 350 | }, 351 | { 352 | "cell_type": "markdown", 353 | "metadata": {}, 354 | "source": [ 355 | "In Python, you can construct integer values with no limits on the number of digits. Below, ** is the power operator." 
356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": 13, 361 | "metadata": {}, 362 | "outputs": [ 363 | { 364 | "name": "stdout", 365 | "output_type": "stream", 366 | "text": [ 367 | "4922235242952026704037113243122008064\n" 368 | ] 369 | } 370 | ], 371 | "source": [ 372 | "long_integer = 12**34\n", 373 | "print(long_integer)" 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "The function `int()` lets you construct an integer number from a (compatible) string." 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": 14, 386 | "metadata": {}, 387 | "outputs": [ 388 | { 389 | "name": "stdout", 390 | "output_type": "stream", 391 | "text": [ 392 | "10\n" 393 | ] 394 | } 395 | ], 396 | "source": [ 397 | "x = int('10')\n", 398 | "print(x)" 399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 15, 404 | "metadata": {}, 405 | "outputs": [ 406 | { 407 | "data": { 408 | "text/plain": [ 409 | "int" 410 | ] 411 | }, 412 | "execution_count": 15, 413 | "metadata": {}, 414 | "output_type": "execute_result" 415 | } 416 | ], 417 | "source": [ 418 | "type(x)" 419 | ] 420 | }, 421 | { 422 | "cell_type": "markdown", 423 | "metadata": {}, 424 | "source": [ 425 | "When converting from `str` to `int`, you will get a `ValueError` if the string you are trying to convert does not represent a valid integer." 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": 16, 431 | "metadata": {}, 432 | "outputs": [], 433 | "source": [ 434 | "# this will not work (raises ValueError): x = int('Python')" 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": {}, 440 | "source": [ 441 | "### Floats " 442 | ] 443 | }, 444 | { 445 | "cell_type": "markdown", 446 | "metadata": {}, 447 | "source": [ 448 | "Real numbers are represented as [floating-point numbers](https://en.wikipedia.org/wiki/Floating-point_arithmetic) in Python." 
449 | ] 450 | }, 451 | { 452 | "cell_type": "code", 453 | "execution_count": 17, 454 | "metadata": {}, 455 | "outputs": [ 456 | { 457 | "name": "stdout", 458 | "output_type": "stream", 459 | "text": [ 460 | "18.87\n" 461 | ] 462 | } 463 | ], 464 | "source": [ 465 | "f = 18.87\n", 466 | "print(f)" 467 | ] 468 | }, 469 | { 470 | "cell_type": "markdown", 471 | "metadata": {}, 472 | "source": [ 473 | "If you use `int()` with a floating-point number, you will only get the integer part of that number." 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": 18, 479 | "metadata": {}, 480 | "outputs": [ 481 | { 482 | "name": "stdout", 483 | "output_type": "stream", 484 | "text": [ 485 | "18\n" 486 | ] 487 | } 488 | ], 489 | "source": [ 490 | "f = int(18.87)\n", 491 | "print(f)" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": 19, 497 | "metadata": {}, 498 | "outputs": [ 499 | { 500 | "data": { 501 | "text/plain": [ 502 | "int" 503 | ] 504 | }, 505 | "execution_count": 19, 506 | "metadata": {}, 507 | "output_type": "execute_result" 508 | } 509 | ], 510 | "source": [ 511 | "type(f)" 512 | ] 513 | }, 514 | { 515 | "cell_type": "markdown", 516 | "metadata": {}, 517 | "source": [ 518 | "We can use the `float()` function to define a floating-point number from a (compatible) string." 
519 | ] 520 | }, 521 | { 522 | "cell_type": "code", 523 | "execution_count": 20, 524 | "metadata": {}, 525 | "outputs": [ 526 | { 527 | "name": "stdout", 528 | "output_type": "stream", 529 | "text": [ 530 | "18.87\n" 531 | ] 532 | } 533 | ], 534 | "source": [ 535 | "f = float('18.87')\n", 536 | "print(f)" 537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 21, 542 | "metadata": {}, 543 | "outputs": [ 544 | { 545 | "data": { 546 | "text/plain": [ 547 | "float" 548 | ] 549 | }, 550 | "execution_count": 21, 551 | "metadata": {}, 552 | "output_type": "execute_result" 553 | } 554 | ], 555 | "source": [ 556 | "type(f)" 557 | ] 558 | }, 559 | { 560 | "cell_type": "markdown", 561 | "metadata": {}, 562 | "source": [ 563 | "### Strings" 564 | ] 565 | }, 566 | { 567 | "cell_type": "markdown", 568 | "metadata": {}, 569 | "source": [ 570 | "A *string* is a sequence of characters. In Python, strings are enclosed in either single or double quotes - it doesn't matter which one you use. However, we recommend using **single quotes** over double quotes since you can create them with one less key press!" 571 | ] 572 | }, 573 | { 574 | "cell_type": "code", 575 | "execution_count": 22, 576 | "metadata": {}, 577 | "outputs": [ 578 | { 579 | "name": "stdout", 580 | "output_type": "stream", 581 | "text": [ 582 | "I love chocolate.\n", 583 | "I love chocolate.\n" 584 | ] 585 | } 586 | ], 587 | "source": [ 588 | "s1 = \"I love chocolate.\"\n", 589 | "s2 = 'I love chocolate.'\n", 590 | "print(s1)\n", 591 | "print(s2)" 592 | ] 593 | }, 594 | { 595 | "cell_type": "markdown", 596 | "metadata": {}, 597 | "source": [ 598 | "The exception here is that if you need to put a single quote in your string, you need to put the entire string inside double quotes and vice versa." 
599 | ] 600 | }, 601 | { 602 | "cell_type": "code", 603 | "execution_count": 23, 604 | "metadata": {}, 605 | "outputs": [ 606 | { 607 | "name": "stdout", 608 | "output_type": "stream", 609 | "text": [ 610 | "Let's learn some Python.\n", 611 | "The most popular languages for data analysis are apparently \"R\" and \"Python\".\n" 612 | ] 613 | } 614 | ], 615 | "source": [ 616 | "s3 = \"Let's learn some Python.\"\n", 617 | "s4 = 'The most popular languages for data analysis are apparently \"R\" and \"Python\".'\n", 618 | "print(s3)\n", 619 | "print(s4)" 620 | ] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "metadata": {}, 625 | "source": [ 626 | "The `str()` function constructs a string from other compatible data types." 627 | ] 628 | }, 629 | { 630 | "cell_type": "code", 631 | "execution_count": 24, 632 | "metadata": {}, 633 | "outputs": [ 634 | { 635 | "name": "stdout", 636 | "output_type": "stream", 637 | "text": [ 638 | "Python\n" 639 | ] 640 | } 641 | ], 642 | "source": [ 643 | "s5 = str('Python')\n", 644 | "print(s5)" 645 | ] 646 | }, 647 | { 648 | "cell_type": "code", 649 | "execution_count": 25, 650 | "metadata": {}, 651 | "outputs": [ 652 | { 653 | "data": { 654 | "text/plain": [ 655 | "str" 656 | ] 657 | }, 658 | "execution_count": 25, 659 | "metadata": {}, 660 | "output_type": "execute_result" 661 | } 662 | ], 663 | "source": [ 664 | "type(s5)" 665 | ] 666 | }, 667 | { 668 | "cell_type": "code", 669 | "execution_count": 26, 670 | "metadata": {}, 671 | "outputs": [ 672 | { 673 | "name": "stdout", 674 | "output_type": "stream", 675 | "text": [ 676 | "18.87\n" 677 | ] 678 | } 679 | ], 680 | "source": [ 681 | "s6 = str(18.87)\n", 682 | "print(s6)" 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "execution_count": 27, 688 | "metadata": {}, 689 | "outputs": [ 690 | { 691 | "data": { 692 | "text/plain": [ 693 | "str" 694 | ] 695 | }, 696 | "execution_count": 27, 697 | "metadata": {}, 698 | "output_type": "execute_result" 699 | } 700 | ], 701 | 
"source": [ 702 | "type(s6)" 703 | ] 704 | }, 705 | { 706 | "cell_type": "markdown", 707 | "metadata": {}, 708 | "source": [ 709 | "Starting Python *3.6*, you can use `f-strings` to put other variables inside strings. This is extremely handy." 710 | ] 711 | }, 712 | { 713 | "cell_type": "code", 714 | "execution_count": 28, 715 | "metadata": {}, 716 | "outputs": [ 717 | { 718 | "name": "stdout", 719 | "output_type": "stream", 720 | "text": [ 721 | "The value of pi is 3.14.\n" 722 | ] 723 | } 724 | ], 725 | "source": [ 726 | "name = 'pi'\n", 727 | "value = 3.14\n", 728 | "print(f'The value of {name} is {value}.')" 729 | ] 730 | }, 731 | { 732 | "cell_type": "markdown", 733 | "metadata": {}, 734 | "source": [ 735 | "The function `find()` returns the starting index of a given sequence of characters in the string. If not found, it returns -1." 736 | ] 737 | }, 738 | { 739 | "cell_type": "code", 740 | "execution_count": 29, 741 | "metadata": {}, 742 | "outputs": [ 743 | { 744 | "name": "stdout", 745 | "output_type": "stream", 746 | "text": [ 747 | "0\n", 748 | "-1\n" 749 | ] 750 | } 751 | ], 752 | "source": [ 753 | "print(s1.find('I'))\n", 754 | "print(s1.find('we'))" 755 | ] 756 | }, 757 | { 758 | "cell_type": "markdown", 759 | "metadata": {}, 760 | "source": [ 761 | "`startswith()` checks if a string starts with a particular sequence of characters." 762 | ] 763 | }, 764 | { 765 | "cell_type": "code", 766 | "execution_count": 30, 767 | "metadata": {}, 768 | "outputs": [ 769 | { 770 | "name": "stdout", 771 | "output_type": "stream", 772 | "text": [ 773 | "True\n" 774 | ] 775 | } 776 | ], 777 | "source": [ 778 | "print(s1.startswith('I love'))" 779 | ] 780 | }, 781 | { 782 | "cell_type": "markdown", 783 | "metadata": {}, 784 | "source": [ 785 | "`endswith()` checks if a string ends with a particular sequence of characters." 
786 | ] 787 | }, 788 | { 789 | "cell_type": "code", 790 | "execution_count": 31, 791 | "metadata": {}, 792 | "outputs": [ 793 | { 794 | "name": "stdout", 795 | "output_type": "stream", 796 | "text": [ 797 | "True\n" 798 | ] 799 | } 800 | ], 801 | "source": [ 802 | "print(s1.endswith('.'))" 803 | ] 804 | }, 805 | { 806 | "cell_type": "markdown", 807 | "metadata": {}, 808 | "source": [ 809 | "`count()` counts the number of non-overlapping occurrences of a sequence of characters in the given string." 810 | ] 811 | }, 812 | { 813 | "cell_type": "code", 814 | "execution_count": 32, 815 | "metadata": {}, 816 | "outputs": [ 817 | { 818 | "name": "stdout", 819 | "output_type": "stream", 820 | "text": [ 821 | "2\n", 822 | "0\n" 823 | ] 824 | } 825 | ], 826 | "source": [ 827 | "print(s1.count('e'))\n", 828 | "print(s1.count('ee'))" 829 | ] 830 | }, 831 | { 832 | "cell_type": "markdown", 833 | "metadata": {}, 834 | "source": [ 835 | "`lower()` converts all uppercase letters to lowercase, and `upper()` does the reverse." 836 | ] 837 | }, 838 | { 839 | "cell_type": "code", 840 | "execution_count": 33, 841 | "metadata": {}, 842 | "outputs": [ 843 | { 844 | "name": "stdout", 845 | "output_type": "stream", 846 | "text": [ 847 | "I love chocolate.\n", 848 | "i love chocolate.\n", 849 | "I LOVE CHOCOLATE.\n" 850 | ] 851 | } 852 | ], 853 | "source": [ 854 | "print(s1)\n", 855 | "print(s1.lower())\n", 856 | "print(s1.upper())" 857 | ] 858 | }, 859 | { 860 | "cell_type": "markdown", 861 | "metadata": {}, 862 | "source": [ 863 | "The function `replace()` replaces one substring with another. Note that Python strings are **immutable**: you cannot change a string once it is defined. However, `replace()` returns a new string, which you can assign to a new variable (or back to the original one), as shown below." 
864 | ] 865 | }, 866 | { 867 | "cell_type": "code", 868 | "execution_count": 34, 869 | "metadata": {}, 870 | "outputs": [ 871 | { 872 | "name": "stdout", 873 | "output_type": "stream", 874 | "text": [ 875 | "I love chocolate.\n", 876 | "We all love chocolate.\n" 877 | ] 878 | } 879 | ], 880 | "source": [ 881 | "s1.replace('I','We all')\n", 882 | "\n", 883 | "# Python strings are immutable!\n", 884 | "# You cannot change them in place\n", 885 | "print(s1)\n", 886 | "\n", 887 | "# But the following rebinds s1 to the new string\n", 888 | "s1 = s1.replace('I','We all')\n", 889 | "print(s1)" 890 | ] 891 | }, 892 | { 893 | "cell_type": "markdown", 894 | "metadata": {}, 895 | "source": [ 896 | "For other methods that are available for a string (or any other data structure), you can use the **tab completion feature** of Jupyter Notebook. Just define a string, put a dot, and then press the `Tab` key." 897 | ] 898 | }, 899 | { 900 | "cell_type": "code", 901 | "execution_count": 35, 902 | "metadata": {}, 903 | "outputs": [ 904 | { 905 | "data": { 906 | "text/plain": [ 907 | "'AbC'" 908 | ] 909 | }, 910 | "execution_count": 35, 911 | "metadata": {}, 912 | "output_type": "execute_result" 913 | } 914 | ], 915 | "source": [ 916 | "st = 'aBc'\n", 917 | "st.swapcase()" 918 | ] 919 | }, 920 | { 921 | "cell_type": "markdown", 922 | "metadata": {}, 923 | "source": [ 924 | "A string is actually a **container type**, so you can combine two strings using the \"+\" sign." 
925 | ] 926 | }, 927 | { 928 | "cell_type": "code", 929 | "execution_count": 36, 930 | "metadata": {}, 931 | "outputs": [ 932 | { 933 | "name": "stdout", 934 | "output_type": "stream", 935 | "text": [ 936 | "I love chocolate.\n" 937 | ] 938 | } 939 | ], 940 | "source": [ 941 | "first_string = 'I love '\n", 942 | "second_string = 'chocolate.'\n", 943 | "print(first_string + second_string)" 944 | ] 945 | }, 946 | { 947 | "cell_type": "markdown", 948 | "metadata": {}, 949 | "source": [ 950 | "### Booleans" 951 | ] 952 | }, 953 | { 954 | "cell_type": "markdown", 955 | "metadata": {}, 956 | "source": [ 957 | "A Boolean represents a logical value: `True` or `False`. The function `bool()` returns either `True` or `False` based on its input parameter. For built-in types, `bool()` returns `True` unless its input parameter is one of the following:\n", 958 | "1. Empty (such as `''` (empty string), `[]` (empty list), `()` (empty tuple), `{}` (empty dictionary))\n", 959 | "2. `False`\n", 960 | "3. `None` (a Python keyword that represents the absence of a value)\n", 961 | "4. Zero (such as `0` or `0.0`)\n", 962 | " \n", 963 | "**NOTE:** Python is a *case-sensitive* programming language. Therefore, valid boolean values are `True` and `False`, not ~~`true`~~ or ~~`false`~~." 
964 | ] 965 | }, 966 | { 967 | "cell_type": "code", 968 | "execution_count": 37, 969 | "metadata": {}, 970 | "outputs": [ 971 | { 972 | "data": { 973 | "text/plain": [ 974 | "True" 975 | ] 976 | }, 977 | "execution_count": 37, 978 | "metadata": {}, 979 | "output_type": "execute_result" 980 | } 981 | ], 982 | "source": [ 983 | "bool(100)" 984 | ] 985 | }, 986 | { 987 | "cell_type": "code", 988 | "execution_count": 38, 989 | "metadata": {}, 990 | "outputs": [ 991 | { 992 | "data": { 993 | "text/plain": [ 994 | "True" 995 | ] 996 | }, 997 | "execution_count": 38, 998 | "metadata": {}, 999 | "output_type": "execute_result" 1000 | } 1001 | ], 1002 | "source": [ 1003 | "bool(True)" 1004 | ] 1005 | }, 1006 | { 1007 | "cell_type": "code", 1008 | "execution_count": 39, 1009 | "metadata": {}, 1010 | "outputs": [ 1011 | { 1012 | "data": { 1013 | "text/plain": [ 1014 | "False" 1015 | ] 1016 | }, 1017 | "execution_count": 39, 1018 | "metadata": {}, 1019 | "output_type": "execute_result" 1020 | } 1021 | ], 1022 | "source": [ 1023 | "bool([])" 1024 | ] 1025 | }, 1026 | { 1027 | "cell_type": "code", 1028 | "execution_count": 40, 1029 | "metadata": {}, 1030 | "outputs": [ 1031 | { 1032 | "data": { 1033 | "text/plain": [ 1034 | "False" 1035 | ] 1036 | }, 1037 | "execution_count": 40, 1038 | "metadata": {}, 1039 | "output_type": "execute_result" 1040 | } 1041 | ], 1042 | "source": [ 1043 | "bool(False)" 1044 | ] 1045 | }, 1046 | { 1047 | "cell_type": "code", 1048 | "execution_count": 41, 1049 | "metadata": {}, 1050 | "outputs": [ 1051 | { 1052 | "data": { 1053 | "text/plain": [ 1054 | "False" 1055 | ] 1056 | }, 1057 | "execution_count": 41, 1058 | "metadata": {}, 1059 | "output_type": "execute_result" 1060 | } 1061 | ], 1062 | "source": [ 1063 | "bool(None)" 1064 | ] 1065 | }, 1066 | { 1067 | "cell_type": "code", 1068 | "execution_count": 42, 1069 | "metadata": {}, 1070 | "outputs": [ 1071 | { 1072 | "data": { 1073 | "text/plain": [ 1074 | "False" 1075 | ] 1076 | }, 1077 | 
"execution_count": 42, 1078 | "metadata": {}, 1079 | "output_type": "execute_result" 1080 | } 1081 | ], 1082 | "source": [ 1083 | "bool(0)" 1084 | ] 1085 | }, 1086 | { 1087 | "cell_type": "markdown", 1088 | "metadata": {}, 1089 | "source": [ 1090 | "## Comparison and Logical Operators" 1091 | ] 1092 | }, 1093 | { 1094 | "cell_type": "markdown", 1095 | "metadata": {}, 1096 | "source": [ 1097 | "In Python, comparison and logical operators allows you to evaluate a condition to a single *boolean* value, `True` or `False`." 1098 | ] 1099 | }, 1100 | { 1101 | "cell_type": "markdown", 1102 | "metadata": {}, 1103 | "source": [ 1104 | "### Comparison Operators" 1105 | ] 1106 | }, 1107 | { 1108 | "cell_type": "markdown", 1109 | "metadata": {}, 1110 | "source": [ 1111 | "| Operator | Meaning |\n", 1112 | "|:----:|:----:|\n", 1113 | "| **==** | True, if equal |\n", 1114 | "| **!=** | True, if not equal to |\n", 1115 | "| **<** | less than |\n", 1116 | "| **>** | greater than |\n", 1117 | "| **<=** | less than or equal to |\n", 1118 | "| **>=** | greater than or equal to |" 1119 | ] 1120 | }, 1121 | { 1122 | "cell_type": "code", 1123 | "execution_count": 43, 1124 | "metadata": {}, 1125 | "outputs": [], 1126 | "source": [ 1127 | "x = 1000\n", 1128 | "y = 2000" 1129 | ] 1130 | }, 1131 | { 1132 | "cell_type": "code", 1133 | "execution_count": 44, 1134 | "metadata": {}, 1135 | "outputs": [ 1136 | { 1137 | "name": "stdout", 1138 | "output_type": "stream", 1139 | "text": [ 1140 | "x == y: False\n", 1141 | "x != y: True\n", 1142 | "x < y: True\n", 1143 | "x > y: False\n", 1144 | "x <= y: True\n", 1145 | "x >= y: False\n" 1146 | ] 1147 | } 1148 | ], 1149 | "source": [ 1150 | "print('x == y:', x == y)\n", 1151 | "print('x != y:', x != y)\n", 1152 | "print('x < y:', x < y)\n", 1153 | "print('x > y:', x > y)\n", 1154 | "print('x <= y:', x <= y)\n", 1155 | "print('x >= y:', x >= y)" 1156 | ] 1157 | }, 1158 | { 1159 | "cell_type": "markdown", 1160 | "metadata": {}, 1161 | "source": [ 1162 | 
"You can also compare *strings* with comparison operators. " 1163 | ] 1164 | }, 1165 | { 1166 | "cell_type": "code", 1167 | "execution_count": 45, 1168 | "metadata": {}, 1169 | "outputs": [], 1170 | "source": [ 1171 | "string_1 = 'Python'\n", 1172 | "string_2 = 'PYTHON'" 1173 | ] 1174 | }, 1175 | { 1176 | "cell_type": "code", 1177 | "execution_count": 46, 1178 | "metadata": {}, 1179 | "outputs": [ 1180 | { 1181 | "name": "stdout", 1182 | "output_type": "stream", 1183 | "text": [ 1184 | "Python == PYTHON: False\n" 1185 | ] 1186 | } 1187 | ], 1188 | "source": [ 1189 | "print(string_1 + ' == ' + string_2 + ':', string_1 == string_2)" 1190 | ] 1191 | }, 1192 | { 1193 | "cell_type": "markdown", 1194 | "metadata": {}, 1195 | "source": [ 1196 | "Alternatively, for base types, instead `==` or `!=`, you can use `is` or `is not` respectively." 1197 | ] 1198 | }, 1199 | { 1200 | "cell_type": "code", 1201 | "execution_count": 47, 1202 | "metadata": {}, 1203 | "outputs": [ 1204 | { 1205 | "name": "stdout", 1206 | "output_type": "stream", 1207 | "text": [ 1208 | "x is y: False\n", 1209 | "x is not y: True\n" 1210 | ] 1211 | } 1212 | ], 1213 | "source": [ 1214 | "print('x is y:', x is y)\n", 1215 | "print('x is not y:', x is not y)" 1216 | ] 1217 | }, 1218 | { 1219 | "cell_type": "markdown", 1220 | "metadata": {}, 1221 | "source": [ 1222 | "### Logical Operators\n", 1223 | "\n", 1224 | "You can use logical operators to compare boolean values." 
1225 | ] 1226 | }, 1227 | { 1228 | "cell_type": "markdown", 1229 | "metadata": {}, 1230 | "source": [ 1231 | "| Operator | Meaning |\n", 1232 | "|:----:|:----:|\n", 1233 | "| **and** | True, if both statements are true |\n", 1234 | "| **or** | True, if at least one of the statements is true |\n", 1235 | "| **not** | Reverses the result: False, if the statement is true |" 1236 | ] 1237 | }, 1238 | { 1239 | "cell_type": "code", 1240 | "execution_count": 48, 1241 | "metadata": {}, 1242 | "outputs": [ 1243 | { 1244 | "data": { 1245 | "text/plain": [ 1246 | "True" 1247 | ] 1248 | }, 1249 | "execution_count": 48, 1250 | "metadata": {}, 1251 | "output_type": "execute_result" 1252 | } 1253 | ], 1254 | "source": [ 1255 | "(18 == 18) and (18 != -1)" 1256 | ] 1257 | }, 1258 | { 1259 | "cell_type": "code", 1260 | "execution_count": 49, 1261 | "metadata": {}, 1262 | "outputs": [ 1263 | { 1264 | "data": { 1265 | "text/plain": [ 1266 | "True" 1267 | ] 1268 | }, 1269 | "execution_count": 49, 1270 | "metadata": {}, 1271 | "output_type": "execute_result" 1272 | } 1273 | ], 1274 | "source": [ 1275 | "(10 < 15) or (19 > 20)" 1276 | ] 1277 | }, 1278 | { 1279 | "cell_type": "code", 1280 | "execution_count": 50, 1281 | "metadata": {}, 1282 | "outputs": [ 1283 | { 1284 | "data": { 1285 | "text/plain": [ 1286 | "True" 1287 | ] 1288 | }, 1289 | "execution_count": 50, 1290 | "metadata": {}, 1291 | "output_type": "execute_result" 1292 | } 1293 | ], 1294 | "source": [ 1295 | "not(1900 >= 2000)" 1296 | ] 1297 | }, 1298 | { 1299 | "cell_type": "markdown", 1300 | "metadata": {}, 1301 | "source": [ 1302 | "## Basic Mathematical Operations" 1303 | ] 1304 | }, 1305 | { 1306 | "cell_type": "markdown", 1307 | "metadata": {}, 1308 | "source": [ 1309 | "| Operator | Task Performed |\n", 1310 | "|----|---|\n", 1311 | "| + | Addition |\n", 1312 | "| - | Subtraction |\n", 1313 | "| / | Division |\n", 1314 | "| // | Floor division |\n", 1315 | "| * | Multiplication |\n", 1316 | "| % | Modulo |\n", 1317 | "| ** | Exponent |\n" 1318 | ] 1319 | }, 
1320 | { 1321 | "cell_type": "code", 1322 | "execution_count": 51, 1323 | "metadata": {}, 1324 | "outputs": [ 1325 | { 1326 | "data": { 1327 | "text/plain": [ 1328 | "5" 1329 | ] 1330 | }, 1331 | "execution_count": 51, 1332 | "metadata": {}, 1333 | "output_type": "execute_result" 1334 | } 1335 | ], 1336 | "source": [ 1337 | "2 + 3" 1338 | ] 1339 | }, 1340 | { 1341 | "cell_type": "code", 1342 | "execution_count": 52, 1343 | "metadata": {}, 1344 | "outputs": [ 1345 | { 1346 | "data": { 1347 | "text/plain": [ 1348 | "7" 1349 | ] 1350 | }, 1351 | "execution_count": 52, 1352 | "metadata": {}, 1353 | "output_type": "execute_result" 1354 | } 1355 | ], 1356 | "source": [ 1357 | "20 - 13" 1358 | ] 1359 | }, 1360 | { 1361 | "cell_type": "code", 1362 | "execution_count": 53, 1363 | "metadata": {}, 1364 | "outputs": [ 1365 | { 1366 | "data": { 1367 | "text/plain": [ 1368 | "2.0" 1369 | ] 1370 | }, 1371 | "execution_count": 53, 1372 | "metadata": {}, 1373 | "output_type": "execute_result" 1374 | } 1375 | ], 1376 | "source": [ 1377 | "4/2" 1378 | ] 1379 | }, 1380 | { 1381 | "cell_type": "markdown", 1382 | "metadata": {}, 1383 | "source": [ 1384 | "The `/` operator always results in a float, even if the result is actually an integer." 
1385 | ] 1386 | }, 1387 | { 1388 | "cell_type": "code", 1389 | "execution_count": 54, 1390 | "metadata": {}, 1391 | "outputs": [ 1392 | { 1393 | "name": "stdout", 1394 | "output_type": "stream", 1395 | "text": [ 1396 | "<class 'float'>\n" 1397 | ] 1398 | } 1399 | ], 1400 | "source": [ 1401 | "print(type(4/2))" 1402 | ] 1403 | }, 1404 | { 1405 | "cell_type": "code", 1406 | "execution_count": 55, 1407 | "metadata": {}, 1408 | "outputs": [ 1409 | { 1410 | "data": { 1411 | "text/plain": [ 1412 | "1.3333333333333333" 1413 | ] 1414 | }, 1415 | "execution_count": 55, 1416 | "metadata": {}, 1417 | "output_type": "execute_result" 1418 | } 1419 | ], 1420 | "source": [ 1421 | "4/3" 1422 | ] 1423 | }, 1424 | { 1425 | "cell_type": "markdown", 1426 | "metadata": {}, 1427 | "source": [ 1428 | "The operator `//` performs floor division: the quotient is rounded down to the nearest integer. When both operands are integers, the result is of integer type." 1429 | ] 1430 | }, 1431 | { 1432 | "cell_type": "code", 1433 | "execution_count": 56, 1434 | "metadata": {}, 1435 | "outputs": [ 1436 | { 1437 | "data": { 1438 | "text/plain": [ 1439 | "1" 1440 | ] 1441 | }, 1442 | "execution_count": 56, 1443 | "metadata": {}, 1444 | "output_type": "execute_result" 1445 | } 1446 | ], 1447 | "source": [ 1448 | "4//3" 1449 | ] 1450 | }, 1451 | { 1452 | "cell_type": "code", 1453 | "execution_count": 57, 1454 | "metadata": {}, 1455 | "outputs": [ 1456 | { 1457 | "name": "stdout", 1458 | "output_type": "stream", 1459 | "text": [ 1460 | "<class 'int'>\n" 1461 | ] 1462 | } 1463 | ], 1464 | "source": [ 1465 | "print(type(4//3))" 1466 | ] 1467 | }, 1468 | { 1469 | "cell_type": "code", 1470 | "execution_count": 58, 1471 | "metadata": {}, 1472 | "outputs": [ 1473 | { 1474 | "data": { 1475 | "text/plain": [ 1476 | "230" 1477 | ] 1478 | }, 1479 | "execution_count": 58, 1480 | "metadata": {}, 1481 | "output_type": "execute_result" 1482 | } 1483 | ], 1484 | "source": [ 1485 | "23*10" 1486 | ] 1487 | }, 1488 | { 1489 | "cell_type": "code", 1490 | "execution_count": 59, 1491 | 
"metadata": {}, 1492 | "outputs": [ 1493 | { 1494 | "data": { 1495 | "text/plain": [ 1496 | "3" 1497 | ] 1498 | }, 1499 | "execution_count": 59, 1500 | "metadata": {}, 1501 | "output_type": "execute_result" 1502 | } 1503 | ], 1504 | "source": [ 1505 | "23%10" 1506 | ] 1507 | }, 1508 | { 1509 | "cell_type": "code", 1510 | "execution_count": 60, 1511 | "metadata": {}, 1512 | "outputs": [ 1513 | { 1514 | "data": { 1515 | "text/plain": [ 1516 | "1024" 1517 | ] 1518 | }, 1519 | "execution_count": 60, 1520 | "metadata": {}, 1521 | "output_type": "execute_result" 1522 | } 1523 | ], 1524 | "source": [ 1525 | "2**10" 1526 | ] 1527 | }, 1528 | { 1529 | "cell_type": "markdown", 1530 | "metadata": {}, 1531 | "source": [ 1532 | "You can also use the `pow()` function to compute \"x to the power y\"." 1533 | ] 1534 | }, 1535 | { 1536 | "cell_type": "code", 1537 | "execution_count": 61, 1538 | "metadata": { 1539 | "scrolled": true 1540 | }, 1541 | "outputs": [ 1542 | { 1543 | "data": { 1544 | "text/plain": [ 1545 | "1024" 1546 | ] 1547 | }, 1548 | "execution_count": 61, 1549 | "metadata": {}, 1550 | "output_type": "execute_result" 1551 | } 1552 | ], 1553 | "source": [ 1554 | "pow(2, 10)" 1555 | ] 1556 | }, 1557 | { 1558 | "cell_type": "markdown", 1559 | "metadata": {}, 1560 | "source": [ 1561 | "`round()` simply rounds a number based on the specified number of decimals." 
1562 | ] 1563 | }, 1564 | { 1565 | "cell_type": "code", 1566 | "execution_count": 62, 1567 | "metadata": {}, 1568 | "outputs": [ 1569 | { 1570 | "data": { 1571 | "text/plain": [ 1572 | "3.0" 1573 | ] 1574 | }, 1575 | "execution_count": 62, 1576 | "metadata": {}, 1577 | "output_type": "execute_result" 1578 | } 1579 | ], 1580 | "source": [ 1581 | "round(3.14159265359, 0)" 1582 | ] 1583 | }, 1584 | { 1585 | "cell_type": "code", 1586 | "execution_count": 63, 1587 | "metadata": {}, 1588 | "outputs": [ 1589 | { 1590 | "data": { 1591 | "text/plain": [ 1592 | "3.1" 1593 | ] 1594 | }, 1595 | "execution_count": 63, 1596 | "metadata": {}, 1597 | "output_type": "execute_result" 1598 | } 1599 | ], 1600 | "source": [ 1601 | "round(3.14159265359, 1)" 1602 | ] 1603 | }, 1604 | { 1605 | "cell_type": "code", 1606 | "execution_count": 64, 1607 | "metadata": {}, 1608 | "outputs": [ 1609 | { 1610 | "data": { 1611 | "text/plain": [ 1612 | "3.14" 1613 | ] 1614 | }, 1615 | "execution_count": 64, 1616 | "metadata": {}, 1617 | "output_type": "execute_result" 1618 | } 1619 | ], 1620 | "source": [ 1621 | "round(3.14159265359, 2)" 1622 | ] 1623 | }, 1624 | { 1625 | "cell_type": "markdown", 1626 | "metadata": {}, 1627 | "source": [ 1628 | "Expect some surprising behavior with `round()`: for instance, a value exactly halfway between two integers is rounded to the nearest even integer. Rounding is a trickier business than you might expect, and round-off errors have even caused [fatalities](https://en.wikipedia.org/wiki/Round-off_error). 
For a detailed explanation for rounding in Python, please see [this](https://realpython.com/python-rounding/).\n", 1629 | " " 1630 | ] 1631 | }, 1632 | { 1633 | "cell_type": "code", 1634 | "execution_count": 65, 1635 | "metadata": {}, 1636 | "outputs": [ 1637 | { 1638 | "data": { 1639 | "text/plain": [ 1640 | "4.0" 1641 | ] 1642 | }, 1643 | "execution_count": 65, 1644 | "metadata": {}, 1645 | "output_type": "execute_result" 1646 | } 1647 | ], 1648 | "source": [ 1649 | "round(3.5, 0)" 1650 | ] 1651 | }, 1652 | { 1653 | "cell_type": "code", 1654 | "execution_count": 66, 1655 | "metadata": {}, 1656 | "outputs": [ 1657 | { 1658 | "data": { 1659 | "text/plain": [ 1660 | "4.0" 1661 | ] 1662 | }, 1663 | "execution_count": 66, 1664 | "metadata": {}, 1665 | "output_type": "execute_result" 1666 | } 1667 | ], 1668 | "source": [ 1669 | "round(4.5, 0)" 1670 | ] 1671 | }, 1672 | { 1673 | "cell_type": "markdown", 1674 | "metadata": {}, 1675 | "source": [ 1676 | "`abs()` returns the absolute value of its input parameter." 1677 | ] 1678 | }, 1679 | { 1680 | "cell_type": "code", 1681 | "execution_count": 67, 1682 | "metadata": {}, 1683 | "outputs": [ 1684 | { 1685 | "data": { 1686 | "text/plain": [ 1687 | "3.4" 1688 | ] 1689 | }, 1690 | "execution_count": 67, 1691 | "metadata": {}, 1692 | "output_type": "execute_result" 1693 | } 1694 | ], 1695 | "source": [ 1696 | "abs(-3.4)" 1697 | ] 1698 | }, 1699 | { 1700 | "cell_type": "markdown", 1701 | "metadata": {}, 1702 | "source": [ 1703 | "## Containers\n", 1704 | "\n", 1705 | "Python defines two types of containers: \n", 1706 | "- Ordered sequences (lists, tuples, and strings)\n", 1707 | "- Key containers (dictionaries and sets)." 
1708 | ] 1709 | }, 1710 | { 1711 | "cell_type": "markdown", 1712 | "metadata": {}, 1713 | "source": [ 1714 | "### Lists" 1715 | ] 1716 | }, 1717 | { 1718 | "cell_type": "markdown", 1719 | "metadata": {}, 1720 | "source": [ 1721 | "A Python *list* is an ordered sequence of elements enclosed in square brackets and separated by commas. You can access any of these elements by referring to its index value.\n", 1722 | "\n", 1723 | "You can put any combination of data types into a list. Lists are declared by `[]` or `list()`." 1724 | ] 1725 | }, 1726 | { 1727 | "cell_type": "code", 1728 | "execution_count": 68, 1729 | "metadata": {}, 1730 | "outputs": [ 1731 | { 1732 | "name": "stdout", 1733 | "output_type": "stream", 1734 | "text": [ 1735 | "<class 'list'> <class 'list'>\n" 1736 | ] 1737 | } 1738 | ], 1739 | "source": [ 1740 | "# create an empty list\n", 1741 | "list0 = []\n", 1742 | "list1 = list()\n", 1743 | "print(type(list0), type(list1))" 1744 | ] 1745 | }, 1746 | { 1747 | "cell_type": "code", 1748 | "execution_count": 69, 1749 | "metadata": {}, 1750 | "outputs": [ 1751 | { 1752 | "name": "stdout", 1753 | "output_type": "stream", 1754 | "text": [ 1755 | "[1, 1.23, True, 'hello', None]\n" 1756 | ] 1757 | } 1758 | ], 1759 | "source": [ 1760 | "lst = [1, 1.23, True, 'hello', None]\n", 1761 | "print(lst)" 1762 | ] 1763 | }, 1764 | { 1765 | "cell_type": "code", 1766 | "execution_count": 70, 1767 | "metadata": {}, 1768 | "outputs": [ 1769 | { 1770 | "name": "stdout", 1771 | "output_type": "stream", 1772 | "text": [ 1773 | "[1, 1.23, True, 'hello', None]\n" 1774 | ] 1775 | } 1776 | ], 1777 | "source": [ 1778 | "lst = list((1, 1.23, True, 'hello', None)) # notice the double round brackets: list() takes a single iterable (here, a tuple)\n", 1779 | "print(lst)" 1780 | ] 1781 | }, 1782 | { 1783 | "cell_type": "code", 1784 | "execution_count": 71, 1785 | "metadata": {}, 1786 | "outputs": [], 1787 | "source": [ 1788 | "cars = ['Toyota', 'Mercedes', 'Ford']" 1789 | ] 1790 | }, 1791 | { 1792 | "cell_type": "code", 1793 | "execution_count": 
72, 1794 | "metadata": {}, 1795 | "outputs": [ 1796 | { 1797 | "name": "stdout", 1798 | "output_type": "stream", 1799 | "text": [ 1800 | "['Toyota', 'Mercedes', 'Ford']\n" 1801 | ] 1802 | } 1803 | ], 1804 | "source": [ 1805 | "print(cars)" 1806 | ] 1807 | }, 1808 | { 1809 | "cell_type": "markdown", 1810 | "metadata": {}, 1811 | "source": [ 1812 | "**NOTE:** In Python, indexing starts from 0. Thus, for instance, the list `cars` has *Toyota* at index 0, *Mercedes* at index 1, and *Ford* at index 2." 1813 | ] 1814 | }, 1815 | { 1816 | "cell_type": "code", 1817 | "execution_count": 73, 1818 | "metadata": { 1819 | "scrolled": true 1820 | }, 1821 | "outputs": [ 1822 | { 1823 | "data": { 1824 | "text/plain": [ 1825 | "'Toyota'" 1826 | ] 1827 | }, 1828 | "execution_count": 73, 1829 | "metadata": {}, 1830 | "output_type": "execute_result" 1831 | } 1832 | ], 1833 | "source": [ 1834 | "cars[0]" 1835 | ] 1836 | }, 1837 | { 1838 | "cell_type": "markdown", 1839 | "metadata": {}, 1840 | "source": [ 1841 | "Indexing in reverse order is also possible. For instance, to access Ford, the last element in `cars`, use index -1. Index -2 gives Mercedes, and index -3 gives Toyota." 1842 | ] 1843 | }, 1844 | { 1845 | "cell_type": "code", 1846 | "execution_count": 74, 1847 | "metadata": {}, 1848 | "outputs": [ 1849 | { 1850 | "data": { 1851 | "text/plain": [ 1852 | "'Ford'" 1853 | ] 1854 | }, 1855 | "execution_count": 74, 1856 | "metadata": {}, 1857 | "output_type": "execute_result" 1858 | } 1859 | ], 1860 | "source": [ 1861 | "cars[-1]" 1862 | ] 1863 | }, 1864 | { 1865 | "cell_type": "markdown", 1866 | "metadata": {}, 1867 | "source": [ 1868 | "Indexing is limited to accessing a single element. Slicing, on the other hand, accesses a sequence of elements inside the list. 
" 1869 | ] 1870 | }, 1871 | { 1872 | "cell_type": "code", 1873 | "execution_count": 75, 1874 | "metadata": {}, 1875 | "outputs": [ 1876 | { 1877 | "name": "stdout", 1878 | "output_type": "stream", 1879 | "text": [ 1880 | "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n" 1881 | ] 1882 | } 1883 | ], 1884 | "source": [ 1885 | "# pay attention to the range() function\n", 1886 | "# and how we use list() to convert the output to a proper list\n", 1887 | "num = list(range(0,10))\n", 1888 | "print(num)" 1889 | ] 1890 | }, 1891 | { 1892 | "cell_type": "code", 1893 | "execution_count": 76, 1894 | "metadata": {}, 1895 | "outputs": [ 1896 | { 1897 | "name": "stdout", 1898 | "output_type": "stream", 1899 | "text": [ 1900 | "[0, 1, 2, 3]\n", 1901 | "[4, 5, 6, 7, 8, 9]\n" 1902 | ] 1903 | } 1904 | ], 1905 | "source": [ 1906 | "print(num[0:4])\n", 1907 | "print(num[4:])" 1908 | ] 1909 | }, 1910 | { 1911 | "cell_type": "code", 1912 | "execution_count": 77, 1913 | "metadata": {}, 1914 | "outputs": [ 1915 | { 1916 | "name": "stdout", 1917 | "output_type": "stream", 1918 | "text": [ 1919 | "[7, 8, 9]\n" 1920 | ] 1921 | } 1922 | ], 1923 | "source": [ 1924 | "print(num[-3:]) # get the 3 elements from the end" 1925 | ] 1926 | }, 1927 | { 1928 | "cell_type": "markdown", 1929 | "metadata": {}, 1930 | "source": [ 1931 | "It is also possible to slice a parent list with a step length." 
1932 | ] 1933 | }, 1934 | { 1935 | "cell_type": "code", 1936 | "execution_count": 78, 1937 | "metadata": {}, 1938 | "outputs": [ 1939 | { 1940 | "data": { 1941 | "text/plain": [ 1942 | "[0, 3, 6]" 1943 | ] 1944 | }, 1945 | "execution_count": 78, 1946 | "metadata": {}, 1947 | "output_type": "execute_result" 1948 | } 1949 | ], 1950 | "source": [ 1951 | "num[0:9:3]" 1952 | ] 1953 | }, 1954 | { 1955 | "cell_type": "code", 1956 | "execution_count": 79, 1957 | "metadata": {}, 1958 | "outputs": [ 1959 | { 1960 | "data": { 1961 | "text/plain": [ 1962 | "[0, 5]" 1963 | ] 1964 | }, 1965 | "execution_count": 79, 1966 | "metadata": {}, 1967 | "output_type": "execute_result" 1968 | } 1969 | ], 1970 | "source": [ 1971 | "num[0:9:5]" 1972 | ] 1973 | }, 1974 | { 1975 | "cell_type": "markdown", 1976 | "metadata": {}, 1977 | "source": [ 1978 | "As in lists, you can also access any character in a string through indexing and slicing." 1979 | ] 1980 | }, 1981 | { 1982 | "cell_type": "code", 1983 | "execution_count": 80, 1984 | "metadata": {}, 1985 | "outputs": [ 1986 | { 1987 | "name": "stdout", 1988 | "output_type": "stream", 1989 | "text": [ 1990 | "P\n", 1991 | "ython\n" 1992 | ] 1993 | } 1994 | ], 1995 | "source": [ 1996 | "name = 'Python'\n", 1997 | "print(name[0])\n", 1998 | "print(name[1:])" 1999 | ] 2000 | }, 2001 | { 2002 | "cell_type": "markdown", 2003 | "metadata": {}, 2004 | "source": [ 2005 | "However, you cannot modify a string as strings are immutable." 2006 | ] 2007 | }, 2008 | { 2009 | "cell_type": "code", 2010 | "execution_count": 81, 2011 | "metadata": {}, 2012 | "outputs": [], 2013 | "source": [ 2014 | "# This will not work: name[3] = 'y'" 2015 | ] 2016 | }, 2017 | { 2018 | "cell_type": "markdown", 2019 | "metadata": {}, 2020 | "source": [ 2021 | "#### Equal vs. Identical Lists\n", 2022 | "\n", 2023 | "Pay attention to *equal* vs. *identical* for two lists (or variables of other container types in general). 
The equality operator \"==\" checks whether two variables have the same contents. The \"is\" operator checks whether two variables are identical, that is, if they point to the same address in the memory. As the example below shows, two variables might have the same content, but they might be pointing to different addresses in the memory. (Do not rely on \"is\" to compare values, even for base types such as integers and strings: whether two equal values are also identical is an implementation detail of the Python interpreter. Use \"==\" to compare values, and reserve \"is\" for checks such as `x is None`.)" 2024 | ] 2025 | }, 2026 | { 2027 | "cell_type": "code", 2028 | "execution_count": 82, 2029 | "metadata": {}, 2030 | "outputs": [ 2031 | { 2032 | "name": "stdout", 2033 | "output_type": "stream", 2034 | "text": [ 2035 | "x == y: True\n", 2036 | "z == y: True\n", 2037 | "x is y: True\n", 2038 | "z is y: False\n" 2039 | ] 2040 | } 2041 | ], 2042 | "source": [ 2043 | "x = y = [1, 2, 3] # both x and y point to the same address in the memory, so they are identical\n", 2044 | "z = [1, 2, 3] # z has the same content as x and y, but it points to a different address\n", 2045 | "\n", 2046 | "print('x == y:', x == y)\n", 2047 | "print('z == y:', z == y)\n", 2048 | "print('x is y:', x is y)\n", 2049 | "\n", 2050 | "# this is False as z and y point to different addresses \n", 2051 | "# in the memory even though they have the same contents\n", 2052 | "print('z is y:', z is y) " 2053 | ] 2054 | }, 2055 | { 2056 | "cell_type": "markdown", 2057 | "metadata": {}, 2058 | "source": [ 2059 | "#### Operations on Lists" 2060 | ] 2061 | }, 2062 | { 2063 | "cell_type": "markdown", 2064 | "metadata": {}, 2065 | "source": [ 2066 | "`append()` adds an element to the end of the list." 
2067 | ] 2068 | }, 2069 | { 2070 | "cell_type": "code", 2071 | "execution_count": 83, 2072 | "metadata": {}, 2073 | "outputs": [ 2074 | { 2075 | "name": "stdout", 2076 | "output_type": "stream", 2077 | "text": [ 2078 | "[1, 2, 3, 4, 5]\n", 2079 | "[1, 2, 3, 4, 5, 6]\n" 2080 | ] 2081 | } 2082 | ], 2083 | "source": [ 2084 | "lst = [1, 2, 3, 4, 5]\n", 2085 | "print(lst)\n", 2086 | "\n", 2087 | "lst.append(6)\n", 2088 | "print(lst)" 2089 | ] 2090 | }, 2091 | { 2092 | "cell_type": "markdown", 2093 | "metadata": {}, 2094 | "source": [ 2095 | "`extend()` appends all the elements of another list to the end." 2096 | ] 2097 | }, 2098 | { 2099 | "cell_type": "code", 2100 | "execution_count": 84, 2101 | "metadata": {}, 2102 | "outputs": [ 2103 | { 2104 | "name": "stdout", 2105 | "output_type": "stream", 2106 | "text": [ 2107 | "[1, 2, 3, 4, 5, 6, 7, 8]\n" 2108 | ] 2109 | } 2110 | ], 2111 | "source": [ 2112 | "lst.extend([7, 8])\n", 2113 | "print(lst)" 2114 | ] 2115 | }, 2116 | { 2117 | "cell_type": "markdown", 2118 | "metadata": {}, 2119 | "source": [ 2120 | "Alternatively, you can use `+` to combine multiple lists (or multiple strings)." 2121 | ] 2122 | }, 2123 | { 2124 | "cell_type": "code", 2125 | "execution_count": 85, 2126 | "metadata": {}, 2127 | "outputs": [ 2128 | { 2129 | "name": "stdout", 2130 | "output_type": "stream", 2131 | "text": [ 2132 | "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n" 2133 | ] 2134 | } 2135 | ], 2136 | "source": [ 2137 | "lst = lst + [9, 10]\n", 2138 | "print(lst)" 2139 | ] 2140 | }, 2141 | { 2142 | "cell_type": "markdown", 2143 | "metadata": {}, 2144 | "source": [ 2145 | "If you want to insert an item at a position you specify, use `insert(x, y)`, which inserts the element `y` at index `x`. Remember, `append()` can add an element only at the end." 
2146 | ] 2147 | }, 2148 | { 2149 | "cell_type": "code", 2150 | "execution_count": 86, 2151 | "metadata": {}, 2152 | "outputs": [ 2153 | { 2154 | "name": "stdout", 2155 | "output_type": "stream", 2156 | "text": [ 2157 | "[1, 2, 3, 4, 5, 'Python', 6, 7, 8, 9, 10]\n" 2158 | ] 2159 | } 2160 | ], 2161 | "source": [ 2162 | "lst.insert(5, 'Python')\n", 2163 | "print(lst)" 2164 | ] 2165 | }, 2166 | { 2167 | "cell_type": "markdown", 2168 | "metadata": {}, 2169 | "source": [ 2170 | "You can use `remove()` to remove the first occurrence of an element by passing the element itself to the method." 2171 | ] 2172 | }, 2173 | { 2174 | "cell_type": "code", 2175 | "execution_count": 87, 2176 | "metadata": {}, 2177 | "outputs": [ 2178 | { 2179 | "name": "stdout", 2180 | "output_type": "stream", 2181 | "text": [ 2182 | "[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n" 2183 | ] 2184 | } 2185 | ], 2186 | "source": [ 2187 | "lst.remove('Python')\n", 2188 | "print(lst)" 2189 | ] 2190 | }, 2191 | { 2192 | "cell_type": "markdown", 2193 | "metadata": {}, 2194 | "source": [ 2195 | "The `sort()` method arranges the elements in ascending order **in place**. That is, the original list is updated with the new order. You can sort a list of all numbers or all strings, but not a mix of the two." 
2196 | ] 2197 | }, 2198 | { 2199 | "cell_type": "code", 2200 | "execution_count": 88, 2201 | "metadata": {}, 2202 | "outputs": [ 2203 | { 2204 | "name": "stdout", 2205 | "output_type": "stream", 2206 | "text": [ 2207 | "[1.23, 3, 5]\n" 2208 | ] 2209 | } 2210 | ], 2211 | "source": [ 2212 | "lst_num = [3, 5, 1.23]\n", 2213 | "lst_num.sort()\n", 2214 | "print(lst_num)" 2215 | ] 2216 | }, 2217 | { 2218 | "cell_type": "code", 2219 | "execution_count": 89, 2220 | "metadata": {}, 2221 | "outputs": [ 2222 | { 2223 | "name": "stdout", 2224 | "output_type": "stream", 2225 | "text": [ 2226 | "['hello', 'world']\n", 2227 | "['world', 'hello']\n" 2228 | ] 2229 | } 2230 | ], 2231 | "source": [ 2232 | "lst_str = ['hello', 'world']\n", 2233 | "print(lst_str)\n", 2234 | "lst_str.sort(reverse=True)\n", 2235 | "print(lst_str)" 2236 | ] 2237 | }, 2238 | { 2239 | "cell_type": "code", 2240 | "execution_count": 90, 2241 | "metadata": {}, 2242 | "outputs": [], 2243 | "source": [ 2244 | "lst_mix = [3, 5, 1.23, 'hello']\n", 2245 | "# this will not work: lst_mix.sort()" 2246 | ] 2247 | }, 2248 | { 2249 | "cell_type": "markdown", 2250 | "metadata": {}, 2251 | "source": [ 2252 | "For reversing a list in place, use `reverse()`." 2253 | ] 2254 | }, 2255 | { 2256 | "cell_type": "code", 2257 | "execution_count": 91, 2258 | "metadata": {}, 2259 | "outputs": [ 2260 | { 2261 | "name": "stdout", 2262 | "output_type": "stream", 2263 | "text": [ 2264 | "['hello', 1.23, 5, 3]\n" 2265 | ] 2266 | } 2267 | ], 2268 | "source": [ 2269 | "lst_mix.reverse()\n", 2270 | "print(lst_mix)" 2271 | ] 2272 | }, 2273 | { 2274 | "cell_type": "markdown", 2275 | "metadata": {}, 2276 | "source": [ 2277 | "If you do not want to modify the original list, use `sorted()` and `reversed()` and assign the result to a new variable." 
2278 | ] 2279 | }, 2280 | { 2281 | "cell_type": "code", 2282 | "execution_count": 92, 2283 | "metadata": {}, 2284 | "outputs": [ 2285 | { 2286 | "name": "stdout", 2287 | "output_type": "stream", 2288 | "text": [ 2289 | "original: [3, 5, 1]\n", 2290 | "sorted: [1, 3, 5]\n", 2291 | "just reversed: \n", 2292 | "reversed and re-listed: [1, 5, 3]\n" 2293 | ] 2294 | } 2295 | ], 2296 | "source": [ 2297 | "lst = [3, 5, 1]\n", 2298 | "lst_new = sorted(lst)\n", 2299 | "print('original:', lst)\n", 2300 | "print('sorted:', lst_new)\n", 2301 | "lst_reversed = reversed(lst) # this returns an iterator, not a list!\n", 2302 | "print('just reversed:', lst_reversed)\n", 2303 | "lst_reversed_list = list(lst_reversed)\n", 2304 | "print('reversed and re-listed:', lst_reversed_list)\n" 2305 | ] 2306 | }, 2307 | { 2308 | "cell_type": "markdown", 2309 | "metadata": {}, 2310 | "source": [ 2311 | "`count()` counts how many times a particular element appears in the list. If there is none, it returns 0." 2312 | ] 2313 | }, 2314 | { 2315 | "cell_type": "code", 2316 | "execution_count": 93, 2317 | "metadata": {}, 2318 | "outputs": [ 2319 | { 2320 | "data": { 2321 | "text/plain": [ 2322 | "1" 2323 | ] 2324 | }, 2325 | "execution_count": 93, 2326 | "metadata": {}, 2327 | "output_type": "execute_result" 2328 | } 2329 | ], 2330 | "source": [ 2331 | "lst.count(1)" 2332 | ] 2333 | }, 2334 | { 2335 | "cell_type": "markdown", 2336 | "metadata": {}, 2337 | "source": [ 2338 | "`index()` finds the index of a particular element. Note that if there are multiple elements of the same value, it returns the index of the first one. If there is none, it raises a `ValueError`." 
2339 | ] 2340 | }, 2341 | { 2342 | "cell_type": "code", 2343 | "execution_count": 94, 2344 | "metadata": {}, 2345 | "outputs": [ 2346 | { 2347 | "data": { 2348 | "text/plain": [ 2349 | "[3, 5, 1]" 2350 | ] 2351 | }, 2352 | "execution_count": 94, 2353 | "metadata": {}, 2354 | "output_type": "execute_result" 2355 | } 2356 | ], 2357 | "source": [ 2358 | "lst" 2359 | ] 2360 | }, 2361 | { 2362 | "cell_type": "code", 2363 | "execution_count": 95, 2364 | "metadata": {}, 2365 | "outputs": [ 2366 | { 2367 | "data": { 2368 | "text/plain": [ 2369 | "2" 2370 | ] 2371 | }, 2372 | "execution_count": 95, 2373 | "metadata": {}, 2374 | "output_type": "execute_result" 2375 | } 2376 | ], 2377 | "source": [ 2378 | "lst.index(1)" 2379 | ] 2380 | }, 2381 | { 2382 | "cell_type": "markdown", 2383 | "metadata": {}, 2384 | "source": [ 2385 | "For other methods that are available for a list (or any other object), you can use the **tab completion feature** of Jupyter Notebook. Just define a list, type a dot after its name, and then press the `Tab` key." 2386 | ] 2387 | }, 2388 | { 2389 | "cell_type": "code", 2390 | "execution_count": 96, 2391 | "metadata": {}, 2392 | "outputs": [], 2393 | "source": [ 2394 | "lst.clear()" 2395 | ] 2396 | }, 2397 | { 2398 | "cell_type": "markdown", 2399 | "metadata": {}, 2400 | "source": [ 2401 | "If you want your list to be immutable, that is, unchangeable, use the **tuple** container. You can define a tuple by `()` or `tuple()`." 2402 | ] 2403 | }, 2404 | { 2405 | "cell_type": "code", 2406 | "execution_count": 97, 2407 | "metadata": {}, 2408 | "outputs": [], 2409 | "source": [ 2410 | "tpl = (1, 2, 3)\n", 2411 | "# You cannot change a tuple. For instance, try tpl[0] = 3.14" 2412 | ] 2413 | }, 2414 | { 2415 | "cell_type": "markdown", 2416 | "metadata": {}, 2417 | "source": [ 2418 | "If you want a set in a mathematical sense, use the **set** container. You can define a set by `set()`. 
Python has a rich collection of methods for sets such as union, intersection, set difference, etc." 2419 | ] 2420 | }, 2421 | { 2422 | "cell_type": "code", 2423 | "execution_count": 98, 2424 | "metadata": {}, 2425 | "outputs": [ 2426 | { 2427 | "name": "stdout", 2428 | "output_type": "stream", 2429 | "text": [ 2430 | "{1, 2}\n" 2431 | ] 2432 | } 2433 | ], 2434 | "source": [ 2435 | "st = set([1, 1, 1, 2, 2, 2, 2])\n", 2436 | "print(st)" 2437 | ] 2438 | }, 2439 | { 2440 | "cell_type": "markdown", 2441 | "metadata": {}, 2442 | "source": [ 2443 | "### Dictionaries" 2444 | ] 2445 | }, 2446 | { 2447 | "cell_type": "markdown", 2448 | "metadata": {}, 2449 | "source": [ 2450 | "Dictionaries are like a lookup table. A dictionary consists of \"key: value\" pairs. To define a dictionary, you can use either `{}` or `dict()`. " 2451 | ] 2452 | }, 2453 | { 2454 | "cell_type": "code", 2455 | "execution_count": 99, 2456 | "metadata": {}, 2457 | "outputs": [ 2458 | { 2459 | "name": "stdout", 2460 | "output_type": "stream", 2461 | "text": [ 2462 | "<class 'dict'> <class 'dict'>\n" 2463 | ] 2464 | } 2465 | ], 2466 | "source": [ 2467 | "# create an empty dictionary\n", 2468 | "dict0 = {}\n", 2469 | "dict1 = dict()\n", 2470 | "print(type(dict0), type(dict1))" 2471 | ] 2472 | }, 2473 | { 2474 | "cell_type": "code", 2475 | "execution_count": 100, 2476 | "metadata": {}, 2477 | "outputs": [ 2478 | { 2479 | "name": "stdout", 2480 | "output_type": "stream", 2481 | "text": [ 2482 | "{'One': 1, 'Two': 2, 'Three': 3}\n" 2483 | ] 2484 | } 2485 | ], 2486 | "source": [ 2487 | "dict0 = {}\n", 2488 | "dict0['One'] = 1\n", 2489 | "dict0['Two'] = 2 \n", 2490 | "dict0['Three'] = 3\n", 2491 | "print(dict0)" 2492 | ] 2493 | }, 2494 | { 2495 | "cell_type": "markdown", 2496 | "metadata": {}, 2497 | "source": [ 2498 | "An alternative way to define a dictionary is below." 
2499 | ] 2500 | }, 2501 | { 2502 | "cell_type": "code", 2503 | "execution_count": 101, 2504 | "metadata": {}, 2505 | "outputs": [ 2506 | { 2507 | "name": "stdout", 2508 | "output_type": "stream", 2509 | "text": [ 2510 | "{'One': 1, 'Two': 2, 'Three': 3}\n" 2511 | ] 2512 | } 2513 | ], 2514 | "source": [ 2515 | "dict1 = {'One': 1, 'Two': 2, 'Three': 3}\n", 2516 | "print(dict1)" 2517 | ] 2518 | }, 2519 | { 2520 | "cell_type": "markdown", 2521 | "metadata": {}, 2522 | "source": [ 2523 | "You can access the value `3` via the key `'Three'`." 2524 | ] 2525 | }, 2526 | { 2527 | "cell_type": "code", 2528 | "execution_count": 102, 2529 | "metadata": { 2530 | "scrolled": true 2531 | }, 2532 | "outputs": [ 2533 | { 2534 | "name": "stdout", 2535 | "output_type": "stream", 2536 | "text": [ 2537 | "3\n" 2538 | ] 2539 | } 2540 | ], 2541 | "source": [ 2542 | "print(dict0['Three'])" 2543 | ] 2544 | }, 2545 | { 2546 | "cell_type": "markdown", 2547 | "metadata": {}, 2548 | "source": [ 2549 | "#### Operations on Dictionaries" 2550 | ] 2551 | }, 2552 | { 2553 | "cell_type": "markdown", 2554 | "metadata": {}, 2555 | "source": [ 2556 | "`values()` returns a view of the values in a dictionary (a `dict_values` object rather than a list)." 2557 | ] 2558 | }, 2559 | { 2560 | "cell_type": "code", 2561 | "execution_count": 103, 2562 | "metadata": {}, 2563 | "outputs": [ 2564 | { 2565 | "data": { 2566 | "text/plain": [ 2567 | "dict_values([1, 2, 3])" 2568 | ] 2569 | }, 2570 | "execution_count": 103, 2571 | "metadata": {}, 2572 | "output_type": "execute_result" 2573 | } 2574 | ], 2575 | "source": [ 2576 | "dict0.values()" 2577 | ] 2578 | }, 2579 | { 2580 | "cell_type": "markdown", 2581 | "metadata": {}, 2582 | "source": [ 2583 | "`keys()` returns a view of all the keys in a dictionary." 
2584 | ] 2585 | }, 2586 | { 2587 | "cell_type": "code", 2588 | "execution_count": 104, 2589 | "metadata": {}, 2590 | "outputs": [ 2591 | { 2592 | "data": { 2593 | "text/plain": [ 2594 | "dict_keys(['One', 'Two', 'Three'])" 2595 | ] 2596 | }, 2597 | "execution_count": 104, 2598 | "metadata": {}, 2599 | "output_type": "execute_result" 2600 | } 2601 | ], 2602 | "source": [ 2603 | "dict0.keys()" 2604 | ] 2605 | }, 2606 | { 2607 | "cell_type": "markdown", 2608 | "metadata": {}, 2609 | "source": [ 2610 | "`items()` returns a view of all the (key, value) pairs in a dictionary." 2611 | ] 2612 | }, 2613 | { 2614 | "cell_type": "code", 2615 | "execution_count": 105, 2616 | "metadata": {}, 2617 | "outputs": [ 2618 | { 2619 | "data": { 2620 | "text/plain": [ 2621 | "dict_items([('One', 1), ('Two', 2), ('Three', 3)])" 2622 | ] 2623 | }, 2624 | "execution_count": 105, 2625 | "metadata": {}, 2626 | "output_type": "execute_result" 2627 | } 2628 | ], 2629 | "source": [ 2630 | "dict0.items()" 2631 | ] 2632 | }, 2633 | { 2634 | "cell_type": "markdown", 2635 | "metadata": {}, 2636 | "source": [ 2637 | "`update()` adds the items of another dictionary to a dictionary, overwriting the values of any keys that already exist." 2638 | ] 2639 | }, 2640 | { 2641 | "cell_type": "code", 2642 | "execution_count": 106, 2643 | "metadata": {}, 2644 | "outputs": [ 2645 | { 2646 | "data": { 2647 | "text/plain": [ 2648 | "{'One': 1, 'Two': 2, 'Three': 3, 'Four': 4}" 2649 | ] 2650 | }, 2651 | "execution_count": 106, 2652 | "metadata": {}, 2653 | "output_type": "execute_result" 2654 | } 2655 | ], 2656 | "source": [ 2657 | "dict1 = {'Four': 4}\n", 2658 | "dict0.update(dict1)\n", 2659 | "dict0" 2660 | ] 2661 | }, 2662 | { 2663 | "cell_type": "markdown", 2664 | "metadata": {}, 2665 | "source": [ 2666 | "`clear()` clears the entire dictionary." 
2667 | ] 2668 | }, 2669 | { 2670 | "cell_type": "code", 2671 | "execution_count": 107, 2672 | "metadata": {}, 2673 | "outputs": [ 2674 | { 2675 | "name": "stdout", 2676 | "output_type": "stream", 2677 | "text": [ 2678 | "{}\n" 2679 | ] 2680 | } 2681 | ], 2682 | "source": [ 2683 | "dict0.clear()\n", 2684 | "print(dict0)" 2685 | ] 2686 | }, 2687 | { 2688 | "cell_type": "markdown", 2689 | "metadata": {}, 2690 | "source": [ 2691 | "### Common Operations on Containers" 2692 | ] 2693 | }, 2694 | { 2695 | "cell_type": "code", 2696 | "execution_count": 108, 2697 | "metadata": {}, 2698 | "outputs": [ 2699 | { 2700 | "name": "stdout", 2701 | "output_type": "stream", 2702 | "text": [ 2703 | "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]\n" 2704 | ] 2705 | } 2706 | ], 2707 | "source": [ 2708 | "num = list(range(10))\n", 2709 | "print(num)" 2710 | ] 2711 | }, 2712 | { 2713 | "cell_type": "markdown", 2714 | "metadata": {}, 2715 | "source": [ 2716 | "To find the length of a list, that is, the number of elements it contains, use the built-in `len()` function. 
For dictionaries, `len()` returns the number of key-value pairs.\n" 2717 | ] 2718 | }, 2719 | { 2720 | "cell_type": "code", 2721 | "execution_count": 109, 2722 | "metadata": {}, 2723 | "outputs": [ 2724 | { 2725 | "data": { 2726 | "text/plain": [ 2727 | "10" 2728 | ] 2729 | }, 2730 | "execution_count": 109, 2731 | "metadata": {}, 2732 | "output_type": "execute_result" 2733 | } 2734 | ], 2735 | "source": [ 2736 | "len(num)" 2737 | ] 2738 | }, 2739 | { 2740 | "cell_type": "code", 2741 | "execution_count": 110, 2742 | "metadata": {}, 2743 | "outputs": [ 2744 | { 2745 | "data": { 2746 | "text/plain": [ 2747 | "0" 2748 | ] 2749 | }, 2750 | "execution_count": 110, 2751 | "metadata": {}, 2752 | "output_type": "execute_result" 2753 | } 2754 | ], 2755 | "source": [ 2756 | "len(dict0)" 2757 | ] 2758 | }, 2759 | { 2760 | "cell_type": "markdown", 2761 | "metadata": {}, 2762 | "source": [ 2763 | "If a list consists of all numeric or all string elements, then `min()` and `max()` give the minimum and maximum values in the list." 
2764 | ] 2765 | }, 2766 | { 2767 | "cell_type": "code", 2768 | "execution_count": 111, 2769 | "metadata": {}, 2770 | "outputs": [ 2771 | { 2772 | "data": { 2773 | "text/plain": [ 2774 | "0" 2775 | ] 2776 | }, 2777 | "execution_count": 111, 2778 | "metadata": {}, 2779 | "output_type": "execute_result" 2780 | } 2781 | ], 2782 | "source": [ 2783 | "min(num)" 2784 | ] 2785 | }, 2786 | { 2787 | "cell_type": "code", 2788 | "execution_count": 112, 2789 | "metadata": {}, 2790 | "outputs": [ 2791 | { 2792 | "data": { 2793 | "text/plain": [ 2794 | "9" 2795 | ] 2796 | }, 2797 | "execution_count": 112, 2798 | "metadata": {}, 2799 | "output_type": "execute_result" 2800 | } 2801 | ], 2802 | "source": [ 2803 | "max(num)" 2804 | ] 2805 | }, 2806 | { 2807 | "cell_type": "code", 2808 | "execution_count": 113, 2809 | "metadata": {}, 2810 | "outputs": [ 2811 | { 2812 | "data": { 2813 | "text/plain": [ 2814 | "'two'" 2815 | ] 2816 | }, 2817 | "execution_count": 113, 2818 | "metadata": {}, 2819 | "output_type": "execute_result" 2820 | } 2821 | ], 2822 | "source": [ 2823 | "num2 = num + ['hello']\n", 2824 | "# this won't work because not all elements are numeric: min(num2)\n", 2825 | "# min() and max() also work with strings:\n", 2826 | "st = ['one','two', 'three']\n", 2827 | "max(st)" 2828 | ] 2829 | }, 2830 | { 2831 | "cell_type": "markdown", 2832 | "metadata": {}, 2833 | "source": [ 2834 | "How to check if a particular element is in a predefined list or dictionary:" 2835 | ] 2836 | }, 2837 | { 2838 | "cell_type": "code", 2839 | "execution_count": 114, 2840 | "metadata": {}, 2841 | "outputs": [], 2842 | "source": [ 2843 | "names = ['Earth','Air','Fire']" 2844 | ] 2845 | }, 2846 | { 2847 | "cell_type": "code", 2848 | "execution_count": 115, 2849 | "metadata": {}, 2850 | "outputs": [ 2851 | { 2852 | "data": { 2853 | "text/plain": [ 2854 | "False" 2855 | ] 2856 | }, 2857 | "execution_count": 115, 2858 | "metadata": {}, 2859 | "output_type": "execute_result" 2860 | } 2861 | ], 2862 | 
"source": [ 2863 | "'Tree' in names" 2864 | ] 2865 | }, 2866 | { 2867 | "cell_type": "code", 2868 | "execution_count": 116, 2869 | "metadata": { 2870 | "scrolled": true 2871 | }, 2872 | "outputs": [ 2873 | { 2874 | "data": { 2875 | "text/plain": [ 2876 | "True" 2877 | ] 2878 | }, 2879 | "execution_count": 116, 2880 | "metadata": {}, 2881 | "output_type": "execute_result" 2882 | } 2883 | ], 2884 | "source": [ 2885 | "'Air' in names" 2886 | ] 2887 | }, 2888 | { 2889 | "cell_type": "markdown", 2890 | "metadata": {}, 2891 | "source": [ 2892 | "For a dictionary, `in` checks the keys, not values." 2893 | ] 2894 | }, 2895 | { 2896 | "cell_type": "code", 2897 | "execution_count": 117, 2898 | "metadata": {}, 2899 | "outputs": [], 2900 | "source": [ 2901 | "dict0 = {'One': 1, 'Two': 2, 'Three': 3}" 2902 | ] 2903 | }, 2904 | { 2905 | "cell_type": "code", 2906 | "execution_count": 118, 2907 | "metadata": {}, 2908 | "outputs": [ 2909 | { 2910 | "data": { 2911 | "text/plain": [ 2912 | "True" 2913 | ] 2914 | }, 2915 | "execution_count": 118, 2916 | "metadata": {}, 2917 | "output_type": "execute_result" 2918 | } 2919 | ], 2920 | "source": [ 2921 | "'One' in dict0 " 2922 | ] 2923 | }, 2924 | { 2925 | "cell_type": "code", 2926 | "execution_count": 119, 2927 | "metadata": {}, 2928 | "outputs": [ 2929 | { 2930 | "data": { 2931 | "text/plain": [ 2932 | "False" 2933 | ] 2934 | }, 2935 | "execution_count": 119, 2936 | "metadata": {}, 2937 | "output_type": "execute_result" 2938 | } 2939 | ], 2940 | "source": [ 2941 | "'Four' in dict0 " 2942 | ] 2943 | }, 2944 | { 2945 | "cell_type": "markdown", 2946 | "metadata": {}, 2947 | "source": [ 2948 | "## Conditional Statements and Loops" 2949 | ] 2950 | }, 2951 | { 2952 | "cell_type": "markdown", 2953 | "metadata": {}, 2954 | "source": [ 2955 | "### If" 2956 | ] 2957 | }, 2958 | { 2959 | "cell_type": "markdown", 2960 | "metadata": {}, 2961 | "source": [ 2962 | "Statement block is executed only if a condition is true." 
2963 | ] 2964 | }, 2965 | { 2966 | "cell_type": "markdown", 2967 | "metadata": {}, 2968 | "source": [ 2969 | "```python\n", 2970 | "if logical_condition:\n", 2971 | " statement(s)\n", 2972 | "```" 2973 | ] 2974 | }, 2975 | { 2976 | "cell_type": "markdown", 2977 | "metadata": {}, 2978 | "source": [ 2979 | "Make sure you put a *colon* `:` at the end **and** indent the next line (preferably by 4 spaces, not a tab)." 2980 | ] 2981 | }, 2982 | { 2983 | "cell_type": "code", 2984 | "execution_count": 120, 2985 | "metadata": {}, 2986 | "outputs": [ 2987 | { 2988 | "name": "stdout", 2989 | "output_type": "stream", 2990 | "text": [ 2991 | "a is greater than b\n" 2992 | ] 2993 | } 2994 | ], 2995 | "source": [ 2996 | "a = 2000\n", 2997 | "b = 1999\n", 2998 | "\n", 2999 | "if a > b:\n", 3000 | " print('a is greater than b')" 3001 | ] 3002 | }, 3003 | { 3004 | "cell_type": "markdown", 3005 | "metadata": {}, 3006 | "source": [ 3007 | "### If-else" 3008 | ] 3009 | }, 3010 | { 3011 | "cell_type": "markdown", 3012 | "metadata": {}, 3013 | "source": [ 3014 | "```python\n", 3015 | "if logical_condition: \n", 3016 | " statement(s)\n", 3017 | "else:\n", 3018 | " statement(s)\n", 3019 | "```" 3020 | ] 3021 | }, 3022 | { 3023 | "cell_type": "code", 3024 | "execution_count": 121, 3025 | "metadata": {}, 3026 | "outputs": [ 3027 | { 3028 | "name": "stdout", 3029 | "output_type": "stream", 3030 | "text": [ 3031 | "a is greater than b\n" 3032 | ] 3033 | } 3034 | ], 3035 | "source": [ 3036 | "a = 2000\n", 3037 | "b = 1999\n", 3038 | "\n", 3039 | "if a < b:\n", 3040 | " print('b is greater than a')\n", 3041 | "else:\n", 3042 | " print('a is greater than b')" 3043 | ] 3044 | }, 3045 | { 3046 | "cell_type": "markdown", 3047 | "metadata": {}, 3048 | "source": [ 3049 | "### If-elif" 3050 | ] 3051 | }, 3052 | { 3053 | "cell_type": "markdown", 3054 | "metadata": {}, 3055 | "source": [ 3056 | "```python\n", 3057 | "if logical_condition: \n", 3058 | " statement(s)\n", 3059 | "elif another_logical_condition:\n", 3060 | " 
statement(s)\n", 3061 | "else:\n", 3062 | " statement(s)\n", 3063 | "```" 3064 | ] 3065 | }, 3066 | { 3067 | "cell_type": "code", 3068 | "execution_count": 122, 3069 | "metadata": {}, 3070 | "outputs": [ 3071 | { 3072 | "name": "stdout", 3073 | "output_type": "stream", 3074 | "text": [ 3075 | "a and b are equal\n" 3076 | ] 3077 | } 3078 | ], 3079 | "source": [ 3080 | "a = 2000\n", 3081 | "b = 2000\n", 3082 | "\n", 3083 | "if b > a:\n", 3084 | "    print('b is greater than a')\n", 3085 | "elif a == b:\n", 3086 | "    print('a and b are equal')\n", 3087 | "else:\n", 3088 | "    print('a is greater than b')" 3089 | ] 3090 | }, 3091 | { 3092 | "cell_type": "markdown", 3093 | "metadata": {}, 3094 | "source": [ 3095 | "### Nested if" 3096 | ] 3097 | }, 3098 | { 3099 | "cell_type": "markdown", 3100 | "metadata": {}, 3101 | "source": [ 3102 | "You can also write if statements inside an if statement." 3103 | ] 3104 | }, 3105 | { 3106 | "cell_type": "code", 3107 | "execution_count": 123, 3108 | "metadata": {}, 3109 | "outputs": [ 3110 | { 3111 | "name": "stdout", 3112 | "output_type": "stream", 3113 | "text": [ 3114 | "a < b\n", 3115 | "a = 1999\n" 3116 | ] 3117 | } 3118 | ], 3119 | "source": [ 3120 | "a = 1999\n", 3121 | "b = 2000\n", 3122 | "\n", 3123 | "if a > b:\n", 3124 | "    print('a > b')\n", 3125 | "elif a < b:\n", 3126 | "    print('a < b')\n", 3127 | "    if a == 1999:\n", 3128 | "        print('a = 1999')\n", 3129 | "    else:\n", 3130 | "        print('a is not equal to 1999')\n", 3131 | "else:\n", 3132 | "    print('a = b')" 3133 | ] 3134 | }, 3135 | { 3136 | "cell_type": "markdown", 3137 | "metadata": {}, 3138 | "source": [ 3139 | "### While" 3140 | ] 3141 | }, 3142 | { 3143 | "cell_type": "markdown", 3144 | "metadata": {}, 3145 | "source": [ 3146 | "```python\n", 3147 | "while logical_condition:\n", 3148 | "    statement(s)\n", 3149 | "```" 3150 | ] 3151 | }, 3152 | { 3153 | "cell_type": "code", 3154 | "execution_count": 124, 3155 | "metadata": {}, 3156 | "outputs": [ 3157 | { 3158 | "name": 
"stdout", 3159 | "output_type": "stream", 3160 | "text": [ 3161 | "2\n", 3162 | "4\n", 3163 | "6\n", 3164 | "8\n", 3165 | "10\n", 3166 | "12\n", 3167 | "14\n", 3168 | "16\n", 3169 | "18\n", 3170 | "Mission accomplished!\n" 3171 | ] 3172 | } 3173 | ], 3174 | "source": [ 3175 | "i = 1\n", 3176 | "while i < 10:\n", 3177 | " print(i*2)\n", 3178 | " i = i + 1\n", 3179 | "print('Mission accomplished!')" 3180 | ] 3181 | }, 3182 | { 3183 | "cell_type": "markdown", 3184 | "metadata": {}, 3185 | "source": [ 3186 | "### For" 3187 | ] 3188 | }, 3189 | { 3190 | "cell_type": "markdown", 3191 | "metadata": {}, 3192 | "source": [ 3193 | "For each item of a sequence, statements are executed. " 3194 | ] 3195 | }, 3196 | { 3197 | "cell_type": "markdown", 3198 | "metadata": {}, 3199 | "source": [ 3200 | "```python\n", 3201 | "for variable in sequence:\n", 3202 | " statement(s)\n", 3203 | "```" 3204 | ] 3205 | }, 3206 | { 3207 | "cell_type": "markdown", 3208 | "metadata": {}, 3209 | "source": [ 3210 | "**NOTE:** Keep in mind that the *colon* `:` at the end **and** indentation of the next line are **mandatory**." 
3211 | ] 3212 | }, 3213 | { 3214 | "cell_type": "code", 3215 | "execution_count": 125, 3216 | "metadata": {}, 3217 | "outputs": [ 3218 | { 3219 | "name": "stdout", 3220 | "output_type": "stream", 3221 | "text": [ 3222 | "0\n", 3223 | "1\n", 3224 | "2\n", 3225 | "3\n", 3226 | "4\n", 3227 | "5\n", 3228 | "6\n", 3229 | "7\n", 3230 | "8\n", 3231 | "9\n" 3232 | ] 3233 | } 3234 | ], 3235 | "source": [ 3236 | "for i in range(10):\n", 3237 | " print(i)" 3238 | ] 3239 | }, 3240 | { 3241 | "cell_type": "code", 3242 | "execution_count": 126, 3243 | "metadata": {}, 3244 | "outputs": [ 3245 | { 3246 | "name": "stdout", 3247 | "output_type": "stream", 3248 | "text": [ 3249 | "5\n", 3250 | "25\n", 3251 | "50\n", 3252 | "75\n" 3253 | ] 3254 | } 3255 | ], 3256 | "source": [ 3257 | "for i in [1, 5, 10, 15]:\n", 3258 | " print(i*5)" 3259 | ] 3260 | }, 3261 | { 3262 | "cell_type": "code", 3263 | "execution_count": 127, 3264 | "metadata": {}, 3265 | "outputs": [ 3266 | { 3267 | "name": "stdout", 3268 | "output_type": "stream", 3269 | "text": [ 3270 | "key is One, value is 1.\n", 3271 | "key is Two, value is 2.\n", 3272 | "key is Three, value is 3.\n" 3273 | ] 3274 | } 3275 | ], 3276 | "source": [ 3277 | "dict0 = {'One': 1, 'Two': 2, 'Three': 3}\n", 3278 | "for key, value in dict0.items():\n", 3279 | " print(f'key is {key}, value is {value}.')" 3280 | ] 3281 | }, 3282 | { 3283 | "cell_type": "markdown", 3284 | "metadata": {}, 3285 | "source": [ 3286 | "### Break" 3287 | ] 3288 | }, 3289 | { 3290 | "cell_type": "markdown", 3291 | "metadata": {}, 3292 | "source": [ 3293 | "`break` terminates the loop when a condition becomes true." 
3294 | ] 3295 | }, 3296 | { 3297 | "cell_type": "code", 3298 | "execution_count": 128, 3299 | "metadata": {}, 3300 | "outputs": [ 3301 | { 3302 | "name": "stdout", 3303 | "output_type": "stream", 3304 | "text": [ 3305 | "0\n", 3306 | "1\n", 3307 | "2\n", 3308 | "3\n", 3309 | "4\n" 3310 | ] 3311 | } 3312 | ], 3313 | "source": [ 3314 | "for i in range(10):\n", 3315 | "    print(i)\n", 3316 | "    if i >= 4:\n", 3317 | "        break" 3318 | ] 3319 | }, 3320 | { 3321 | "cell_type": "markdown", 3322 | "metadata": {}, 3323 | "source": [ 3324 | "### Continue" 3325 | ] 3326 | }, 3327 | { 3328 | "cell_type": "markdown", 3329 | "metadata": {}, 3330 | "source": [ 3331 | "Unlike `break`, which ends the loop, `continue` skips the rest of the code inside the loop for the current iteration only, when a condition becomes true, and moves on to the next iteration." 3332 | ] 3333 | }, 3334 | { 3335 | "cell_type": "code", 3336 | "execution_count": 129, 3337 | "metadata": {}, 3338 | "outputs": [ 3339 | { 3340 | "name": "stdout", 3341 | "output_type": "stream", 3342 | "text": [ 3343 | "0\n", 3344 | "1\n", 3345 | "Ignoring 2\n", 3346 | "3\n", 3347 | "4\n", 3348 | "5\n", 3349 | "6\n", 3350 | "7\n", 3351 | "8\n", 3352 | "9\n" 3353 | ] 3354 | } 3355 | ], 3356 | "source": [ 3357 | "for i in range(10):\n", 3358 | "    if i == 2:\n", 3359 | "        print('Ignoring 2')\n", 3360 | "        continue\n", 3361 | "    else:\n", 3362 | "        print(i)" 3363 | ] 3364 | }, 3365 | { 3366 | "cell_type": "markdown", 3367 | "metadata": {}, 3368 | "source": [ 3369 | "### List Comprehension" 3370 | ] 3371 | }, 3372 | { 3373 | "cell_type": "markdown", 3374 | "metadata": {}, 3375 | "source": [ 3376 | "You can create a list with a for-loop as below. 
This is called a list comprehension, and it is a very commonly used Python feature.\n", 3377 | "\n", 3378 | "```python\n", 3379 | "[do_something_with_x for x in sequence]\n", 3380 | "```" 3381 | ] 3382 | }, 3383 | { 3384 | "cell_type": "markdown", 3385 | "metadata": {}, 3386 | "source": [ 3387 | "For example, say we would like to create a list of numbers ranging from 0 to 9." 3388 | ] 3389 | }, 3390 | { 3391 | "cell_type": "code", 3392 | "execution_count": 130, 3393 | "metadata": {}, 3394 | "outputs": [ 3395 | { 3396 | "data": { 3397 | "text/plain": [ 3398 | "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]" 3399 | ] 3400 | }, 3401 | "execution_count": 130, 3402 | "metadata": {}, 3403 | "output_type": "execute_result" 3404 | } 3405 | ], 3406 | "source": [ 3407 | "[z for z in range(10)]" 3408 | ] 3409 | }, 3410 | { 3411 | "cell_type": "markdown", 3412 | "metadata": {}, 3413 | "source": [ 3414 | "To convert the above numbers to strings and then join them with a dash in between:" 3415 | ] 3416 | }, 3417 | { 3418 | "cell_type": "code", 3419 | "execution_count": 131, 3420 | "metadata": {}, 3421 | "outputs": [ 3422 | { 3423 | "name": "stdout", 3424 | "output_type": "stream", 3425 | "text": [ 3426 | "0-1-2-3-4-5-6-7-8-9\n" 3427 | ] 3428 | } 3429 | ], 3430 | "source": [ 3431 | "st_lst = [str(z) for z in range(10)]\n", 3432 | "print('-'.join(st_lst))" 3433 | ] 3434 | }, 3435 | { 3436 | "cell_type": "markdown", 3437 | "metadata": {}, 3438 | "source": [ 3439 | "* List comprehensions are very flexible, as you can add a condition that filters the elements.\n", 3440 | " ```python\n", 3441 | " [do_something_with_x for x in sequence if condition_on_x]\n", 3442 | " ```" 3443 | ] 3444 | }, 3445 | { 3446 | "cell_type": "markdown", 3447 | "metadata": {}, 3448 | "source": [ 3449 | "For example, to return a list of even numbers ranging from 0 to 10:" 3450 | ] 3451 | }, 3452 | { 3453 | "cell_type": "code", 3454 | "execution_count": 132, 3455 | "metadata": {}, 3456 | "outputs": [ 3457 | { 3458 | "data": 
{ 3459 | "text/plain": [ 3460 | "[0, 2, 4, 6, 8, 10]" 3461 | ] 3462 | }, 3463 | "execution_count": 132, 3464 | "metadata": {}, 3465 | "output_type": "execute_result" 3466 | } 3467 | ], 3468 | "source": [ 3469 | "[z for z in range(11) if z % 2 == 0]" 3470 | ] 3471 | }, 3472 | { 3473 | "cell_type": "markdown", 3474 | "metadata": {}, 3475 | "source": [ 3476 | "## Functions" 3477 | ] 3478 | }, 3479 | { 3480 | "cell_type": "markdown", 3481 | "metadata": {}, 3482 | "source": [ 3483 | "You can define your own functions that perform a particular task. You can also pass input parameters to your functions. You define a function using the keyword `def` followed by the function name and any input parameters you might have. If you like, you can define default values for your parameters. Functions can return a value using the `return` statement. If there is no `return` statement, the function will implicitly return `None`. A typical syntax for a function is as follows." 3484 | ] 3485 | }, 3486 | { 3487 | "cell_type": "markdown", 3488 | "metadata": {}, 3489 | "source": [ 3490 | "```python\n", 3491 | "def function_name(parameter):\n", 3492 | " \"\"\" documentation \"\"\"\n", 3493 | " statement(s)\n", 3494 | " return value\n", 3495 | "```" 3496 | ] 3497 | }, 3498 | { 3499 | "cell_type": "code", 3500 | "execution_count": 133, 3501 | "metadata": {}, 3502 | "outputs": [], 3503 | "source": [ 3504 | "def test_function(): \n", 3505 | " print('test')" 3506 | ] 3507 | }, 3508 | { 3509 | "cell_type": "code", 3510 | "execution_count": 134, 3511 | "metadata": {}, 3512 | "outputs": [ 3513 | { 3514 | "name": "stdout", 3515 | "output_type": "stream", 3516 | "text": [ 3517 | "test\n" 3518 | ] 3519 | } 3520 | ], 3521 | "source": [ 3522 | "test_function()" 3523 | ] 3524 | }, 3525 | { 3526 | "cell_type": "code", 3527 | "execution_count": 135, 3528 | "metadata": {}, 3529 | "outputs": [], 3530 | "source": [ 3531 | "def iLove(food):\n", 3532 | " print('I love ' + food)" 3533 | ] 3534 | }, 3535 | { 
3536 | "cell_type": "code", 3537 | "execution_count": 136, 3538 | "metadata": {}, 3539 | "outputs": [ 3540 | { 3541 | "name": "stdout", 3542 | "output_type": "stream", 3543 | "text": [ 3544 | "I love chocolate\n" 3545 | ] 3546 | } 3547 | ], 3548 | "source": [ 3549 | "iLove('chocolate')" 3550 | ] 3551 | }, 3552 | { 3553 | "cell_type": "markdown", 3554 | "metadata": {}, 3555 | "source": [ 3556 | "Here is another version of the above function that uses a default value for its input parameter." 3557 | ] 3558 | }, 3559 | { 3560 | "cell_type": "code", 3561 | "execution_count": 137, 3562 | "metadata": {}, 3563 | "outputs": [ 3564 | { 3565 | "name": "stdout", 3566 | "output_type": "stream", 3567 | "text": [ 3568 | "I love junk food\n" 3569 | ] 3570 | } 3571 | ], 3572 | "source": [ 3573 | "def iLove(food='junk food'):\n", 3574 | " print('I love ' + food)\n", 3575 | "iLove()" 3576 | ] 3577 | }, 3578 | { 3579 | "cell_type": "markdown", 3580 | "metadata": {}, 3581 | "source": [ 3582 | "Another example of function that performs a mathematical operation based on the input value." 3583 | ] 3584 | }, 3585 | { 3586 | "cell_type": "code", 3587 | "execution_count": 138, 3588 | "metadata": {}, 3589 | "outputs": [], 3590 | "source": [ 3591 | "def five_times(x):\n", 3592 | " return(5 * x)" 3593 | ] 3594 | }, 3595 | { 3596 | "cell_type": "code", 3597 | "execution_count": 139, 3598 | "metadata": {}, 3599 | "outputs": [ 3600 | { 3601 | "data": { 3602 | "text/plain": [ 3603 | "15" 3604 | ] 3605 | }, 3606 | "execution_count": 139, 3607 | "metadata": {}, 3608 | "output_type": "execute_result" 3609 | } 3610 | ], 3611 | "source": [ 3612 | "five_times(3)" 3613 | ] 3614 | }, 3615 | { 3616 | "cell_type": "markdown", 3617 | "metadata": {}, 3618 | "source": [ 3619 | "It is always a good practice to document your functions. You should write the documentation right after declaring the function. 
For example, suppose you would like to create a `square` function to return the squared value of its input parameter." 3620 | ] 3621 | }, 3622 | { 3623 | "cell_type": "code", 3624 | "execution_count": 140, 3625 | "metadata": {}, 3626 | "outputs": [], 3627 | "source": [ 3628 | "def square(x):\n", 3629 | " \"\"\"\n", 3630 | " Returns the square of the input.\n", 3631 | " \"\"\"\n", 3632 | " return x ** 2" 3633 | ] 3634 | }, 3635 | { 3636 | "cell_type": "code", 3637 | "execution_count": 141, 3638 | "metadata": {}, 3639 | "outputs": [ 3640 | { 3641 | "data": { 3642 | "text/plain": [ 3643 | "16" 3644 | ] 3645 | }, 3646 | "execution_count": 141, 3647 | "metadata": {}, 3648 | "output_type": "execute_result" 3649 | } 3650 | ], 3651 | "source": [ 3652 | "square(4)" 3653 | ] 3654 | }, 3655 | { 3656 | "cell_type": "markdown", 3657 | "metadata": {}, 3658 | "source": [ 3659 | "To access the documentation (docstring) of a function, use its `__doc__` attribute." 3660 | ] 3661 | }, 3662 | { 3663 | "cell_type": "code", 3664 | "execution_count": 142, 3665 | "metadata": {}, 3666 | "outputs": [ 3667 | { 3668 | "data": { 3669 | "text/plain": [ 3670 | "'\\n Returns the square of the input.\\n '" 3671 | ] 3672 | }, 3673 | "execution_count": 142, 3674 | "metadata": {}, 3675 | "output_type": "execute_result" 3676 | } 3677 | ], 3678 | "source": [ 3679 | "square.__doc__" 3680 | ] 3681 | }, 3682 | { 3683 | "cell_type": "markdown", 3684 | "metadata": {}, 3685 | "source": [ 3686 | "## Object Introspection" 3687 | ] 3688 | }, 3689 | { 3690 | "cell_type": "markdown", 3691 | "metadata": {}, 3692 | "source": [ 3693 | "For help with variables and functions, add `?` before the name." 
3694 | ] 3695 | }, 3696 | { 3697 | "cell_type": "code", 3698 | "execution_count": 143, 3699 | "metadata": {}, 3700 | "outputs": [ 3701 | { 3702 | "data": { 3703 | "text/plain": [ 3704 | "\u001b[0;31mSignature:\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m*\u001b[0m\u001b[0margs\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0msep\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m' '\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mend\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'\\n'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mfile\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mNone\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mflush\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;32mFalse\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 3705 | "\u001b[0;31mDocstring:\u001b[0m\n", 3706 | "Prints the values to a stream, or to sys.stdout by default.\n", 3707 | "\n", 3708 | "sep\n", 3709 | " string inserted between values, default a space.\n", 3710 | "end\n", 3711 | " string appended after the last value, default a newline.\n", 3712 | "file\n", 3713 | " a file-like object (stream); defaults to the current sys.stdout.\n", 3714 | "flush\n", 3715 | " whether to forcibly flush the stream.\n", 3716 | "\u001b[0;31mType:\u001b[0m builtin_function_or_method" 3717 | ] 3718 | }, 3719 | "metadata": {}, 3720 | "output_type": "display_data" 3721 | } 3722 | ], 3723 | "source": [ 3724 | "# the output will appear in a box at the bottom of your browser.\n", 3725 | "?print" 3726 | ] 3727 | }, 3728 | { 3729 | "cell_type": "code", 3730 | "execution_count": 144, 3731 | "metadata": {}, 3732 | "outputs": [ 3733 | { 3734 | "data": { 3735 | "text/plain": [ 3736 | "\u001b[0;31mSignature:\u001b[0m \u001b[0msquare\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 3737 | "\u001b[0;31mDocstring:\u001b[0m Returns the square of the input.\n", 3738 | "\u001b[0;31mFile:\u001b[0m 
/var/folders/2p/_97_wp4j3vq9k61zw0xjljs40000gn/T/ipykernel_20504/3098292944.py\n", 3739 | "\u001b[0;31mType:\u001b[0m function" 3740 | ] 3741 | }, 3742 | "metadata": {}, 3743 | "output_type": "display_data" 3744 | } 3745 | ], 3746 | "source": [ 3747 | "?square" 3748 | ] 3749 | }, 3750 | { 3751 | "cell_type": "markdown", 3752 | "metadata": {}, 3753 | "source": [ 3754 | "With functions, add `??` at the beginning to see the source code, if available." 3755 | ] 3756 | }, 3757 | { 3758 | "cell_type": "code", 3759 | "execution_count": 145, 3760 | "metadata": {}, 3761 | "outputs": [ 3762 | { 3763 | "data": { 3764 | "text/plain": [ 3765 | "\u001b[0;31mSignature:\u001b[0m \u001b[0msquare\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 3766 | "\u001b[0;31mSource:\u001b[0m \n", 3767 | "\u001b[0;32mdef\u001b[0m \u001b[0msquare\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mx\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\n", 3768 | "\u001b[0;34m\u001b[0m \u001b[0;34m\"\"\"\u001b[0m\n", 3769 | "\u001b[0;34m Returns the square of the input.\u001b[0m\n", 3770 | "\u001b[0;34m \"\"\"\u001b[0m\u001b[0;34m\u001b[0m\n", 3771 | "\u001b[0;34m\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mx\u001b[0m \u001b[0;34m**\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 3772 | "\u001b[0;31mFile:\u001b[0m /var/folders/2p/_97_wp4j3vq9k61zw0xjljs40000gn/T/ipykernel_20504/3098292944.py\n", 3773 | "\u001b[0;31mType:\u001b[0m function" 3774 | ] 3775 | }, 3776 | "metadata": {}, 3777 | "output_type": "display_data" 3778 | } 3779 | ], 3780 | "source": [ 3781 | "??square" 3782 | ] 3783 | }, 3784 | { 3785 | "cell_type": "markdown", 3786 | "metadata": {}, 3787 | "source": [ 3788 | "## Modules" 3789 | ] 3790 | }, 3791 | { 3792 | "cell_type": "markdown", 3793 | "metadata": {}, 3794 | "source": [ 3795 | "A module is just a code library. 
It's a file containing a collection of functions (also known as methods/commands) and variables (also known as attributes) that you can include in your code. If someone has already written a module that you need, you can just use it instead of reinventing the wheel! The most commonly used modules in Python for data analytics are `NumPy`, `Pandas`, `StatsModels`, and `Scikit-Learn`.\n", 3796 | "\n", 3797 | "You can list the variables and functions available in a module using the `dir()` function." 3798 | ] 3799 | }, 3800 | { 3801 | "cell_type": "code", 3802 | "execution_count": 146, 3803 | "metadata": {}, 3804 | "outputs": [ 3805 | { 3806 | "data": { 3807 | "text/plain": [ 3808 | "['bitwise_not',\n", 3809 | " 'bitwise_or',\n", 3810 | " 'bitwise_right_shift',\n", 3811 | " 'bitwise_xor',\n", 3812 | " 'blackman',\n", 3813 | " 'block',\n", 3814 | " 'bmat',\n", 3815 | " 'bool',\n", 3816 | " 'bool_',\n", 3817 | " 'broadcast']" 3818 | ] 3819 | }, 3820 | "execution_count": 146, 3821 | "metadata": {}, 3822 | "output_type": "execute_result" 3823 | } 3824 | ], 3825 | "source": [ 3826 | "import numpy as np\n", 3827 | "dir(np)[100:110]" 3828 | ] 3829 | }, 3830 | { 3831 | "cell_type": "markdown", 3832 | "metadata": {}, 3833 | "source": [ 3834 | "As a simple illustration, let's create a matrix using `NumPy`." 
3835 | ] 3836 | }, 3837 | { 3838 | "cell_type": "code", 3839 | "execution_count": 147, 3840 | "metadata": {}, 3841 | "outputs": [ 3842 | { 3843 | "data": { 3844 | "text/plain": [ 3845 | "array([[ 0, 1, 2, 3],\n", 3846 | " [ 4, 5, 6, 7],\n", 3847 | " [ 8, 9, 10, 11]])" 3848 | ] 3849 | }, 3850 | "execution_count": 147, 3851 | "metadata": {}, 3852 | "output_type": "execute_result" 3853 | } 3854 | ], 3855 | "source": [ 3856 | "data = np.arange(12)\n", 3857 | "data.reshape(3,4)" 3858 | ] 3859 | }, 3860 | { 3861 | "cell_type": "markdown", 3862 | "metadata": {}, 3863 | "source": [ 3864 | "## Exercises" 3865 | ] 3866 | }, 3867 | { 3868 | "cell_type": "markdown", 3869 | "metadata": {}, 3870 | "source": [ 3871 | "1. Return a list of the numbers from 0 to 50 (inclusive) that are divisible by both 3 and 5." 3872 | ] 3873 | }, 3874 | { 3875 | "cell_type": "markdown", 3876 | "metadata": {}, 3877 | "source": [ 3878 | "2. Suppose we have the following dictionary. Add a new course named 'Introduction to Analytics' with course code MATH2350 to this dictionary.\n", 3879 | "\n", 3880 | "```python\n", 3881 | "course_names = {'MATH2319': 'Machine learning', 'MATH1298': 'Categorical Data Analysis'}\n", 3882 | "```" 3883 | ] 3884 | }, 3885 | { 3886 | "cell_type": "markdown", 3887 | "metadata": {}, 3888 | "source": [ 3889 | "3. Given a value `x`, write a function that checks whether `x` is a number and, if so, returns its square. Otherwise, the function should return `None`." 3890 | ] 3891 | }, 3892 | { 3893 | "cell_type": "markdown", 3894 | "metadata": {}, 3895 | "source": [ 3896 | "### Solutions\n", 3897 | "\n", 3898 | "
    \n", 3899 | "
  1. List comprehension
  2. \n", 3900 | " \n", 3901 | "```python \n", 3902 | "[z for z in range(51) if z%3 == 0 and z%5 == 0] \n", 3903 | "```\n", 3904 | "
  3. Dictionaries
  4. \n", 3905 | " \n", 3906 | "```python \n", 3907 | "course_names['MATH2350'] = 'Introduction to Analytics'\n", 3908 | "```\n", 3909 | "
  5. Conditional statements and functions
  6. \n", 3910 | "\n", 3911 | "```python\n", 3912 | "def square(x):\n", 3913 | " \"\"\"\n", 3914 | " Return the square of x if number, None otherwise.\n", 3915 | " \"\"\"\n", 3916 | " if isinstance(x, (int, float)):\n", 3917 | " return x ** 2\n", 3918 | " else:\n", 3919 | " return None\n", 3920 | "```" 3921 | ] 3922 | }, 3923 | { 3924 | "cell_type": "markdown", 3925 | "metadata": {}, 3926 | "source": [ 3927 | "---" 3928 | ] 3929 | } 3930 | ], 3931 | "metadata": { 3932 | "kernelspec": { 3933 | "display_name": "Python 3 (ipykernel)", 3934 | "language": "python", 3935 | "name": "python3" 3936 | }, 3937 | "language_info": { 3938 | "codemirror_mode": { 3939 | "name": "ipython", 3940 | "version": 3 3941 | }, 3942 | "file_extension": ".py", 3943 | "mimetype": "text/x-python", 3944 | "name": "python", 3945 | "nbconvert_exporter": "python", 3946 | "pygments_lexer": "ipython3", 3947 | "version": "3.11.9" 3948 | } 3949 | }, 3950 | "nbformat": 4, 3951 | "nbformat_minor": 4 3952 | } 3953 | -------------------------------------------------------------------------------- /PB4_numpy.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# NumPy" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "NumPy (Numerical Python) is the core module for numerical computation in Python. NumPy contains a fast and memory-efficient implementation of a list-like *array* data structure and it contains useful linear algebra and random number functions. A large portion of NumPy is actually written in the `C` programming language. \n", 15 | "\n", 16 | "A NumPy array is similar to Python's `list` data structure. A Python list can contain any combination of element types: integers, floats, strings, functions, objects, etc. A NumPy array, on the other hand, must contain only one element type at a time. 
This way, NumPy arrays can be much faster and more memory efficient.\n", 17 | "\n", 18 | "Both the `Pandas` module (for data analysis) and the `Scikit-Learn` module (for machine learning) are built upon the NumPy module. The `Matplotlib` module (for plotting) also plays nicely with NumPy. These four modules plus base Python are practically all you need for basic to intermediate machine learning.\n", 19 | "\n", 20 | "Two other fundamental Python modules closely related to machine learning are listed below, though we will not cover them in these tutorials:\n", 21 | "* `SciPy`: This module is for numerical computing, including integration, differentiation, optimization, and probability distributions.\n", 22 | "* `StatsModels`: This module provides classes and functions for estimating many different statistical models, as well as for conducting statistical tests and statistical data exploration." 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "## Table of Contents\n", 30 | " * [Creating arrays with NumPy](#Creating-arrays-with-NumPy)\n", 31 | " * [Data types for arrays](#Data-types-for-arrays)\n", 32 | " * [Arithmetic operations on arrays](#Arithmetic-operations-on-arrays)\n", 33 | " * [Reshaping arrays](#Reshaping-arrays)\n", 34 | " * [Adding and removing elements](#Adding-and-removing-elements)\n", 35 | " * [Copying arrays](#Copying-arrays)\n", 36 | " * [Broadcasting](#Broadcasting)\n", 37 | " * [Conditional expressions with arrays](#Conditional-expressions-with-arrays)\n", 38 | " * [Mathematical and statistical functions](#Mathematical-and-statistical-functions)\n", 39 | " * [Universal functions](#Universal-functions)\n", 40 | " + [Binary universal functions](#Binary-universal-functions)\n", 41 | " * [Array indexing and slicing](#Array-indexing-and-slicing)\n", 42 | " + [One-dimensional arrays](#One-dimensional-arrays)\n", 43 | " + [Multi-dimensional arrays](#Multi-dimensional-arrays)\n", 
44 | " * [Transposing arrays](#Transposing-arrays)\n", 45 | " * [Combining arrays](#Combining-arrays)\n", 46 | " * [Sorting arrays](#Sorting-arrays)\n", 47 | " * [Exercises](#Exercises)\n", 48 | " + [Possible solutions](#Possible-solutions)" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "Let's import `numpy` using its conventional alias, `np`." 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 1, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "import numpy as np" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "## Creating arrays with NumPy" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "NumPy’s array class is called **ndarray** (the n-dimensional array). It is also known by the name **array**. \n", 79 | "\n", 80 | "* In a NumPy array, each dimension is called an **axis** and the number of axes is called the **rank**. \n", 81 | " * For example, a 3x4 matrix is an array of rank 2 (it is 2-dimensional).\n", 82 | " * The first axis has length 3, the second has length 4.\n", 83 | "* An array's list of axis lengths is called the **shape** of the array.\n", 84 | " * For example, a 3x4 matrix's shape is `(3, 4)`.\n", 85 | " * The rank is equal to the shape's length.\n", 86 | "* The **size** of an array is the total number of elements, which is the product of all axis lengths (e.g., 3*4=12)." 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": {}, 92 | "source": [ 93 | "### `np.array`\n", 94 | "The easiest way to create an array is to use the `array` function. This accepts any sequence-like object (including other arrays) and produces a new NumPy array containing the passed data." 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "execution_count": 2, 100 | "metadata": {}, 101 | "outputs": [ 102 | { 103 | "data": { 104 | "text/plain": [ 105 | "array([ 2. , 10.2, 5.4, 80. , 0. 
])" 106 | ] 107 | }, 108 | "execution_count": 2, 109 | "metadata": {}, 110 | "output_type": "execute_result" 111 | } 112 | ], 113 | "source": [ 114 | "arr1 = np.array([2, 10.2, 5.4, 80, 0])\n", 115 | "arr1" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "Nested sequences, like a list of equal-length lists, will be converted into a multi-dimensional array:" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 3, 128 | "metadata": {}, 129 | "outputs": [ 130 | { 131 | "data": { 132 | "text/plain": [ 133 | "array([[1, 2, 3, 4],\n", 134 | " [5, 6, 7, 8]])" 135 | ] 136 | }, 137 | "execution_count": 3, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "data = [[1, 2, 3, 4], [5, 6, 7, 8]]\n", 144 | "arr2 = np.array(data)\n", 145 | "arr2" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 4, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "data": { 155 | "text/plain": [ 156 | "(2, 4)" 157 | ] 158 | }, 159 | "execution_count": 4, 160 | "metadata": {}, 161 | "output_type": "execute_result" 162 | } 163 | ], 164 | "source": [ 165 | "arr2.shape" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 5, 171 | "metadata": {}, 172 | "outputs": [ 173 | { 174 | "data": { 175 | "text/plain": [ 176 | "2" 177 | ] 178 | }, 179 | "execution_count": 5, 180 | "metadata": {}, 181 | "output_type": "execute_result" 182 | } 183 | ], 184 | "source": [ 185 | "arr2.ndim # equal to len(arr2.shape)" 186 | ] 187 | }, 188 | { 189 | "cell_type": "code", 190 | "execution_count": 6, 191 | "metadata": {}, 192 | "outputs": [ 193 | { 194 | "data": { 195 | "text/plain": [ 196 | "8" 197 | ] 198 | }, 199 | "execution_count": 6, 200 | "metadata": {}, 201 | "output_type": "execute_result" 202 | } 203 | ], 204 | "source": [ 205 | "arr2.size" 206 | ] 207 | }, 208 | { 209 | "cell_type": "markdown", 210 | "metadata": {}, 211 | "source": 
[ 212 | "### Other functions to create arrays\n", 213 | "There are several other NumPy convenience functions for creating arrays." 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "### `np.zeros`\n", 221 | "Creates an array of the given shape filled with zeros." 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 7, 227 | "metadata": {}, 228 | "outputs": [ 229 | { 230 | "data": { 231 | "text/plain": [ 232 | "array([0., 0., 0., 0., 0.])" 233 | ] 234 | }, 235 | "execution_count": 7, 236 | "metadata": {}, 237 | "output_type": "execute_result" 238 | } 239 | ], 240 | "source": [ 241 | "np.zeros(5)" 242 | ] 243 | }, 244 | { 245 | "cell_type": "markdown", 246 | "metadata": {}, 247 | "source": [ 248 | "It's just as easy to create a 2-D array (i.e., a matrix) by providing a tuple with the desired number of rows and columns. For example, here's a 2x3 matrix:" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": 8, 254 | "metadata": {}, 255 | "outputs": [ 256 | { 257 | "data": { 258 | "text/plain": [ 259 | "array([[0., 0., 0.],\n", 260 | " [0., 0., 0.]])" 261 | ] 262 | }, 263 | "execution_count": 8, 264 | "metadata": {}, 265 | "output_type": "execute_result" 266 | } 267 | ], 268 | "source": [ 269 | "np.zeros((2, 3)) # notice the double parentheses" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "### `np.ones`\n", 277 | "Produces an array of all ones." 
278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 9, 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "data": { 287 | "text/plain": [ 288 | "array([[1., 1., 1.],\n", 289 | " [1., 1., 1.]])" 290 | ] 291 | }, 292 | "execution_count": 9, 293 | "metadata": {}, 294 | "output_type": "execute_result" 295 | } 296 | ], 297 | "source": [ 298 | "np.ones((2, 3))" 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": {}, 304 | "source": [ 305 | "How to create an array with the same values:" 306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": 10, 311 | "metadata": {}, 312 | "outputs": [ 313 | { 314 | "data": { 315 | "text/plain": [ 316 | "array([[3.14, 3.14, 3.14, 3.14],\n", 317 | " [3.14, 3.14, 3.14, 3.14],\n", 318 | " [3.14, 3.14, 3.14, 3.14]])" 319 | ] 320 | }, 321 | "execution_count": 10, 322 | "metadata": {}, 323 | "output_type": "execute_result" 324 | } 325 | ], 326 | "source": [ 327 | "(np.pi * np.ones((3,4))).round(2)" 328 | ] 329 | }, 330 | { 331 | "cell_type": "markdown", 332 | "metadata": {}, 333 | "source": [ 334 | "### `np.arange`\n", 335 | "This is similar to Python's built-in `range` function, but much faster." 
336 | ] 337 | }, 338 | { 339 | "cell_type": "code", 340 | "execution_count": 11, 341 | "metadata": {}, 342 | "outputs": [ 343 | { 344 | "data": { 345 | "text/plain": [ 346 | "array([0, 1, 2, 3, 4])" 347 | ] 348 | }, 349 | "execution_count": 11, 350 | "metadata": {}, 351 | "output_type": "execute_result" 352 | } 353 | ], 354 | "source": [ 355 | "np.arange(5)" 356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": 12, 361 | "metadata": { 362 | "scrolled": true 363 | }, 364 | "outputs": [ 365 | { 366 | "data": { 367 | "text/plain": [ 368 | "array([1, 2, 3, 4])" 369 | ] 370 | }, 371 | "execution_count": 12, 372 | "metadata": {}, 373 | "output_type": "execute_result" 374 | } 375 | ], 376 | "source": [ 377 | "np.arange(1, 5)" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "It also works with floats:" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": 13, 390 | "metadata": {}, 391 | "outputs": [ 392 | { 393 | "data": { 394 | "text/plain": [ 395 | "array([1., 2., 3., 4.])" 396 | ] 397 | }, 398 | "execution_count": 13, 399 | "metadata": {}, 400 | "output_type": "execute_result" 401 | } 402 | ], 403 | "source": [ 404 | "np.arange(1.0, 5.0)" 405 | ] 406 | }, 407 | { 408 | "cell_type": "markdown", 409 | "metadata": {}, 410 | "source": [ 411 | "Of course, you can provide a step parameter:" 412 | ] 413 | }, 414 | { 415 | "cell_type": "code", 416 | "execution_count": 14, 417 | "metadata": {}, 418 | "outputs": [ 419 | { 420 | "data": { 421 | "text/plain": [ 422 | "array([1. , 1.5, 2. , 2.5, 3. , 3.5, 4. , 4.5])" 423 | ] 424 | }, 425 | "execution_count": 14, 426 | "metadata": {}, 427 | "output_type": "execute_result" 428 | } 429 | ], 430 | "source": [ 431 | "np.arange(1, 5, step = 0.5)" 432 | ] 433 | }, 434 | { 435 | "cell_type": "markdown", 436 | "metadata": {}, 437 | "source": [ 438 | "### `np.linspace`\n", 439 | "This is similar to `seq()` in R. 
Its inputs are (start, stop, number of elements) and it returns evenly-spaced numbers over a specified interval. By default, the stop value is **included**." 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "execution_count": 15, 445 | "metadata": {}, 446 | "outputs": [ 447 | { 448 | "data": { 449 | "text/plain": [ 450 | "array([ 0., 2., 4., 6., 8., 10.])" 451 | ] 452 | }, 453 | "execution_count": 15, 454 | "metadata": {}, 455 | "output_type": "execute_result" 456 | } 457 | ], 458 | "source": [ 459 | "np.linspace(0, 10, 6)" 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": {}, 465 | "source": [ 466 | "### `np.quantile`\n", 467 | "Computes the q-th quantile of its input. It plays nicely with `np.linspace`. " 468 | ] 469 | }, 470 | { 471 | "cell_type": "code", 472 | "execution_count": 16, 473 | "metadata": {}, 474 | "outputs": [ 475 | { 476 | "name": "stdout", 477 | "output_type": "stream", 478 | "text": [ 479 | "a = [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20]\n", 480 | "quartiles = [0. 0.25 0.5 0.75 1. ]\n" 481 | ] 482 | } 483 | ], 484 | "source": [ 485 | "a = np.arange(1, 21)\n", 486 | "print('a =', a)\n", 487 | "quartiles = np.linspace(0, 1, 5)\n", 488 | "print('quartiles =', quartiles)" 489 | ] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "execution_count": 17, 494 | "metadata": {}, 495 | "outputs": [ 496 | { 497 | "data": { 498 | "text/plain": [ 499 | "np.float64(10.5)" 500 | ] 501 | }, 502 | "execution_count": 17, 503 | "metadata": {}, 504 | "output_type": "execute_result" 505 | } 506 | ], 507 | "source": [ 508 | "np.quantile(a, 0.5) # how to compute the median" 509 | ] 510 | }, 511 | { 512 | "cell_type": "code", 513 | "execution_count": 18, 514 | "metadata": {}, 515 | "outputs": [ 516 | { 517 | "data": { 518 | "text/plain": [ 519 | "array([ 1. , 5.75, 10.5 , 15.25, 20. 
])" 520 | ] 521 | }, 522 | "execution_count": 18, 523 | "metadata": {}, 524 | "output_type": "execute_result" 525 | } 526 | ], 527 | "source": [ 528 | "np.quantile(a, quartiles)" 529 | ] 530 | }, 531 | { 532 | "cell_type": "markdown", 533 | "metadata": {}, 534 | "source": [ 535 | "### `np.random.rand` and `np.random.randn`\n", 536 | "A number of functions are available in NumPy's `random` module to create arrays initialized with random values.\n", 537 | "For example, here is a matrix initialized with random floats between 0 and 1 (uniform distribution):" 538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": 19, 543 | "metadata": {}, 544 | "outputs": [ 545 | { 546 | "data": { 547 | "text/plain": [ 548 | "array([[0.867, 0.295, 0.074],\n", 549 | " [0.419, 0.619, 0.623]])" 550 | ] 551 | }, 552 | "execution_count": 19, 553 | "metadata": {}, 554 | "output_type": "execute_result" 555 | } 556 | ], 557 | "source": [ 558 | "np.random.rand(2,3).round(3)" 559 | ] 560 | }, 561 | { 562 | "cell_type": "markdown", 563 | "metadata": {}, 564 | "source": [ 565 | "Here's a matrix containing random floats sampled from a univariate [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) (Gaussian distribution) with mean 0 and variance 1:" 566 | ] 567 | }, 568 | { 569 | "cell_type": "code", 570 | "execution_count": 20, 571 | "metadata": {}, 572 | "outputs": [ 573 | { 574 | "data": { 575 | "text/plain": [ 576 | "array([[-0.102, -1.267, 0.367],\n", 577 | " [ 0.564, 1.524, 0.681]])" 578 | ] 579 | }, 580 | "execution_count": 20, 581 | "metadata": {}, 582 | "output_type": "execute_result" 583 | } 584 | ], 585 | "source": [ 586 | "np.random.randn(2,3).round(3)" 587 | ] 588 | }, 589 | { 590 | "cell_type": "markdown", 591 | "metadata": {}, 592 | "source": [ 593 | "## Data types for arrays" 594 | ] 595 | }, 596 | { 597 | "cell_type": "markdown", 598 | "metadata": {}, 599 | "source": [ 600 | "| Type | Description |\n", 601 | "|----|---|\n", 602 | "| int16 | 16-bit integer 
types |\n", 603 | "| int32 | 32-bit integer types |\n", 604 | "| int64 | 64-bit integer types |\n", 605 | "| float16 | Half-precision floating point |\n", 606 | "| float32 | Standard single-precision floating point |\n", 607 | "| float64 | Standard double-precision floating point |\n", 608 | "| bool | Boolean (True or False) |\n", 609 | "| str_ | Unicode string |\n", 610 | "| object | A value can be any Python object |" 611 | ] 612 | }, 613 | { 614 | "cell_type": "markdown", 615 | "metadata": {}, 616 | "source": [ 617 | "### `np.array.dtype`\n", 618 | "NumPy arrays are efficient in part because all their elements must have the same type (usually numbers).\n", 619 | "You can check what the data type is by looking at the `dtype` attribute." 620 | ] 621 | }, 622 | { 623 | "cell_type": "code", 624 | "execution_count": 21, 625 | "metadata": {}, 626 | "outputs": [], 627 | "source": [ 628 | "arr1 = np.array([1, 2, 3], dtype = np.float64)" 629 | ] 630 | }, 631 | { 632 | "cell_type": "code", 633 | "execution_count": 22, 634 | "metadata": { 635 | "scrolled": true 636 | }, 637 | "outputs": [ 638 | { 639 | "name": "stdout", 640 | "output_type": "stream", 641 | "text": [ 642 | "Data type name: float64\n" 643 | ] 644 | } 645 | ], 646 | "source": [ 647 | "print(\"Data type name:\", arr1.dtype.name)" 648 | ] 649 | }, 650 | { 651 | "cell_type": "code", 652 | "execution_count": 23, 653 | "metadata": {}, 654 | "outputs": [], 655 | "source": [ 656 | "arr2 = np.array([1, 2, 3], dtype = np.int32)" 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": 24, 662 | "metadata": {}, 663 | "outputs": [ 664 | { 665 | "name": "stdout", 666 | "output_type": "stream", 667 | "text": [ 668 | "int32 [1 2 3]\n" 669 | ] 670 | } 671 | ], 672 | "source": [ 673 | "print(arr2.dtype, arr2)" 674 | ] 675 | }, 676 | { 677 | "cell_type": "markdown", 678 | "metadata": {}, 679 | "source": [ 680 | "### `np.array.astype`\n", 681 | "You can explicitly convert or cast an array from one 
`dtype` to another using the `astype` method." 682 | ] 683 | }, 684 | { 685 | "cell_type": "code", 686 | "execution_count": 25, 687 | "metadata": {}, 688 | "outputs": [ 689 | { 690 | "data": { 691 | "text/plain": [ 692 | "dtype('int32')" 693 | ] 694 | }, 695 | "execution_count": 25, 696 | "metadata": {}, 697 | "output_type": "execute_result" 698 | } 699 | ], 700 | "source": [ 701 | "arr2.dtype" 702 | ] 703 | }, 704 | { 705 | "cell_type": "code", 706 | "execution_count": 26, 707 | "metadata": {}, 708 | "outputs": [], 709 | "source": [ 710 | "arr2 = arr2.astype(np.float64)" 711 | ] 712 | }, 713 | { 714 | "cell_type": "code", 715 | "execution_count": 27, 716 | "metadata": {}, 717 | "outputs": [ 718 | { 719 | "data": { 720 | "text/plain": [ 721 | "dtype('float64')" 722 | ] 723 | }, 724 | "execution_count": 27, 725 | "metadata": {}, 726 | "output_type": "execute_result" 727 | } 728 | ], 729 | "source": [ 730 | "arr2.dtype # integers are now cast to floating point" 731 | ] 732 | }, 733 | { 734 | "cell_type": "markdown", 735 | "metadata": {}, 736 | "source": [ 737 | "If you have an array of strings representing numbers, you can use a list comprehension to convert them to numeric form (calling `astype(float)` on the array also works)." 738 | ] 739 | }, 740 | { 741 | "cell_type": "code", 742 | "execution_count": 28, 743 | "metadata": {}, 744 | "outputs": [ 745 | { 746 | "data": { 747 | "text/plain": [ 748 | "array([ 1.25, -9.6 , 42. ])" 749 | ] 750 | }, 751 | "execution_count": 28, 752 | "metadata": {}, 753 | "output_type": "execute_result" 754 | } 755 | ], 756 | "source": [ 757 | "arr3 = np.array(['1.25', '-9.6', '42'])\n", 758 | "\n", 759 | "numeric_strings = np.array([float(x) for x in arr3])\n", 760 | "\n", 761 | "numeric_strings" 762 | ] 763 | }, 764 | { 765 | "cell_type": "code", 766 | "execution_count": 29, 767 | "metadata": {}, 768 | "outputs": [ 769 | { 770 | "data": { 771 | "text/plain": [ 772 | "array([ 1.25, -9.6 , 42. 
])" 773 | ] 774 | }, 775 | "execution_count": 29, 776 | "metadata": {}, 777 | "output_type": "execute_result" 778 | } 779 | ], 780 | "source": [ 781 | "numeric_strings.astype(float) # astype returns a new array; assign the result to a variable to keep it" 782 | ] 783 | }, 784 | { 785 | "cell_type": "code", 786 | "execution_count": 30, 787 | "metadata": {}, 788 | "outputs": [ 789 | { 790 | "data": { 791 | "text/plain": [ 792 | "dtype('float64')" 793 | ] 794 | }, 795 | "execution_count": 30, 796 | "metadata": {}, 797 | "output_type": "execute_result" 798 | } 799 | ], 800 | "source": [ 801 | "numeric_strings.dtype" 802 | ] 803 | }, 804 | { 805 | "cell_type": "markdown", 806 | "metadata": {}, 807 | "source": [ 808 | "## Arithmetic operations on arrays" 809 | ] 810 | }, 811 | { 812 | "cell_type": "markdown", 813 | "metadata": {}, 814 | "source": [ 815 | "All the usual arithmetic operators (`+`, `-`, `*`, `/`, `//`, `%`, `**`, etc.) can be used with arrays. They apply *element-wise*." 816 | ] 817 | }, 818 | { 819 | "cell_type": "code", 820 | "execution_count": 31, 821 | "metadata": {}, 822 | "outputs": [ 823 | { 824 | "name": "stdout", 825 | "output_type": "stream", 826 | "text": [ 827 | "a + b = [19 27 35 43]\n", 828 | "a - b = [ 9 19 29 39]\n", 829 | "a * b = [70 92 96 82]\n", 830 | "a / b = [ 2.8 5.75 10.66666667 20.5 ]\n", 831 | "a // b = [ 2 5 10 20]\n", 832 | "a % b = [4 3 2 1]\n", 833 | "a ** b = [537824 279841 32768 1681]\n" 834 | ] 835 | } 836 | ], 837 | "source": [ 838 | "a = np.array([14, 23, 32, 41])\n", 839 | "b = np.array([5, 4, 3, 2])\n", 840 | "print(\"a + b =\", a + b)\n", 841 | "print(\"a - b =\", a - b)\n", 842 | "print(\"a * b =\", a * b)\n", 843 | "print(\"a / b =\", a / b)\n", 844 | "print(\"a // b =\", a // b)\n", 845 | "print(\"a % b =\", a % b)\n", 846 | "print(\"a ** b =\", a ** b)" 847 | ] 848 | }, 849 | { 850 | "cell_type": "markdown", 851 | "metadata": {}, 852 | "source": [ 853 | "Note that `*` is element-wise multiplication, **not** matrix multiplication.\n", 854 
| "\n", 855 | "The arrays should normally have the same shape. If they do not, NumPy tries to apply its *broadcasting rules*, which are discussed further below." 856 | ] 857 | }, 858 | { 859 | "cell_type": "markdown", 860 | "metadata": {}, 861 | "source": [ 862 | "## Reshaping arrays" 863 | ] 864 | }, 865 | { 866 | "cell_type": "markdown", 867 | "metadata": {}, 868 | "source": [ 869 | "In many cases, you can convert an array from one shape to another without copying any data." 870 | ] 871 | }, 872 | { 873 | "cell_type": "markdown", 874 | "metadata": {}, 875 | "source": [ 876 | "### `np.array.shape`\n", 877 | "Changing the shape of an array is as simple as setting its `shape` attribute. However, the array's size must remain the same." 878 | ] 879 | }, 880 | { 881 | "cell_type": "code", 882 | "execution_count": 32, 883 | "metadata": {}, 884 | "outputs": [ 885 | { 886 | "name": "stdout", 887 | "output_type": "stream", 888 | "text": [ 889 | "[ 0 1 2 3 4 5 6 7 8 9 10 11]\n", 890 | "Rank: 1\n" 891 | ] 892 | } 893 | ], 894 | "source": [ 895 | "g = np.arange(12)\n", 896 | "print(g)\n", 897 | "print(\"Rank:\", g.ndim)" 898 | ] 899 | }, 900 | { 901 | "cell_type": "code", 902 | "execution_count": 33, 903 | "metadata": {}, 904 | "outputs": [ 905 | { 906 | "name": "stdout", 907 | "output_type": "stream", 908 | "text": [ 909 | "[[ 0 1]\n", 910 | " [ 2 3]\n", 911 | " [ 4 5]\n", 912 | " [ 6 7]\n", 913 | " [ 8 9]\n", 914 | " [10 11]]\n", 915 | "Rank: 2\n" 916 | ] 917 | } 918 | ], 919 | "source": [ 920 | "g.shape = (6, 2)\n", 921 | "print(g)\n", 922 | "print(\"Rank:\", g.ndim)" 923 | ] 924 | }, 925 | { 926 | "cell_type": "markdown", 927 | "metadata": {}, 928 | "source": [ 929 | "### `np.array.reshape`\n", 930 | "Another way to change an array's shape is to use the `reshape()` method, which returns a new array object. 
" 931 | ] 932 | }, 933 | { 934 | "cell_type": "code", 935 | "execution_count": 34, 936 | "metadata": { 937 | "scrolled": true 938 | }, 939 | "outputs": [ 940 | { 941 | "name": "stdout", 942 | "output_type": "stream", 943 | "text": [ 944 | "[[ 0 1 2]\n", 945 | " [ 3 4 5]\n", 946 | " [ 6 7 8]\n", 947 | " [ 9 10 11]]\n", 948 | "Rank: 2\n" 949 | ] 950 | } 951 | ], 952 | "source": [ 953 | "g2 = g.reshape(4,3) # you need to set this to a new variable to take effect!\n", 954 | "print(g2)\n", 955 | "print(\"Rank:\", g2.ndim)" 956 | ] 957 | }, 958 | { 959 | "cell_type": "markdown", 960 | "metadata": {}, 961 | "source": [ 962 | "How about we get lazy and let NumPy figure out the details?" 963 | ] 964 | }, 965 | { 966 | "cell_type": "code", 967 | "execution_count": 35, 968 | "metadata": {}, 969 | "outputs": [ 970 | { 971 | "name": "stdout", 972 | "output_type": "stream", 973 | "text": [ 974 | "[[ 0 1 2]\n", 975 | " [ 3 4 5]\n", 976 | " [ 6 7 8]\n", 977 | " [ 9 10 11]]\n" 978 | ] 979 | } 980 | ], 981 | "source": [ 982 | "g2 = g.reshape(4, -1) \n", 983 | "print(g2)" 984 | ] 985 | }, 986 | { 987 | "cell_type": "markdown", 988 | "metadata": {}, 989 | "source": [ 990 | "How to convert a multi-dimensional array back to 1-dimensional (a.k.a ***array flattening***): you can use the `flatten` method." 
991 | ] 992 | }, 993 | { 994 | "cell_type": "code", 995 | "execution_count": 36, 996 | "metadata": {}, 997 | "outputs": [ 998 | { 999 | "name": "stdout", 1000 | "output_type": "stream", 1001 | "text": [ 1002 | "[[0 1]\n", 1003 | " [2 3]\n", 1004 | " [4 5]]\n", 1005 | "[0 1 2 3 4 5]\n", 1006 | "(6,)\n" 1007 | ] 1008 | } 1009 | ], 1010 | "source": [ 1011 | "f = np.arange(6).reshape(3,2)\n", 1012 | "print(f)\n", 1013 | "f = f.flatten() # you need to set this to a new variable to take effect!\n", 1014 | "print(f)\n", 1015 | "print(f.shape)" 1016 | ] 1017 | }, 1018 | { 1019 | "cell_type": "markdown", 1020 | "metadata": {}, 1021 | "source": [ 1022 | "## Adding and removing elements" 1023 | ] 1024 | }, 1025 | { 1026 | "cell_type": "markdown", 1027 | "metadata": {}, 1028 | "source": [ 1029 | "### `np.append` and `np.insert`" 1030 | ] 1031 | }, 1032 | { 1033 | "cell_type": "code", 1034 | "execution_count": 37, 1035 | "metadata": {}, 1036 | "outputs": [ 1037 | { 1038 | "name": "stdout", 1039 | "output_type": "stream", 1040 | "text": [ 1041 | "original array:\n", 1042 | " [0 1 2 3 4 5]\n", 1043 | "appending an element to the end:\n", 1044 | " [ 0 1 2 3 4 5 111]\n", 1045 | "inserting an element at a specific position:\n", 1046 | " [111 0 1 2 3 4 5]\n" 1047 | ] 1048 | } 1049 | ], 1050 | "source": [ 1051 | "a = np.arange(6)\n", 1052 | "print('original array:\\n', a)\n", 1053 | "\n", 1054 | "b = np.append(a, 111)\n", 1055 | "print('appending an element to the end:\\n', b)\n", 1056 | "\n", 1057 | "c = np.insert(a, 0, 111) \n", 1058 | "print('inserting an element at a specific position:\\n', c)\n", 1059 | "\n", 1060 | "# watch out: these will NOT work: a.append(111), a.insert(0, 111)\n" 1061 | ] 1062 | }, 1063 | { 1064 | "cell_type": "markdown", 1065 | "metadata": {}, 1066 | "source": [ 1067 | "### `np.delete`" 1068 | ] 1069 | }, 1070 | { 1071 | "cell_type": "code", 1072 | "execution_count": 38, 1073 | "metadata": {}, 1074 | "outputs": [ 1075 | { 1076 | "name": "stdout", 1077 | 
"output_type": "stream", 1078 | "text": [ 1079 | "deleting the first two elements:\n", 1080 | " [2 3 4 5]\n", 1081 | "a after resize():\n", 1082 | " [[0 1 2]\n", 1083 | " [3 4 5]]\n", 1084 | "first column deleted:\n", 1085 | " [[1 2]\n", 1086 | " [4 5]]\n", 1087 | "first row deleted:\n", 1088 | " [[3 4 5]]\n" 1089 | ] 1090 | } 1091 | ], 1092 | "source": [ 1093 | "a = np.arange(6)\n", 1094 | "a\n", 1095 | "c = np.delete(a, [0,1])\n", 1096 | "print('deleting the first two elements:\\n', c)\n", 1097 | "\n", 1098 | "a.resize(2,3)\n", 1099 | "print('a after resize():\\n', a)\n", 1100 | "\n", 1101 | "e = np.delete(a, 0, axis=1) # you can delete an entire column by specifying axis=1\n", 1102 | "print('first column deleted:\\n', e)\n", 1103 | "\n", 1104 | "f = np.delete(a, 0, axis=0) # or you can delete an entire row by specifying axis=0\n", 1105 | "print('first row deleted:\\n', f)" 1106 | ] 1107 | }, 1108 | { 1109 | "cell_type": "markdown", 1110 | "metadata": {}, 1111 | "source": [ 1112 | "## Copying arrays\n", 1113 | "\n", 1114 | "NumPy usually does not make copies for efficiency. Most assignments are just views, not copies. If you want a copy, you need to say so.\n", 1115 | "\n", 1116 | "You can use either `np.array.copy` or `np.copy`." 
1117 | ] 1118 | }, 1119 | { 1120 | "cell_type": "code", 1121 | "execution_count": 39, 1122 | "metadata": {}, 1123 | "outputs": [ 1124 | { 1125 | "name": "stdout", 1126 | "output_type": "stream", 1127 | "text": [ 1128 | "[ True True True True True True]\n", 1129 | "False\n", 1130 | "True\n" 1131 | ] 1132 | }, 1133 | { 1134 | "data": { 1135 | "text/plain": [ 1136 | "array([0, 1, 2, 3, 4, 5])" 1137 | ] 1138 | }, 1139 | "execution_count": 39, 1140 | "metadata": {}, 1141 | "output_type": "execute_result" 1142 | } 1143 | ], 1144 | "source": [ 1145 | "b = a = np.arange(6)\n", 1146 | "a_copy = a.copy()\n", 1147 | "# alternatively,\n", 1148 | "a_copy = np.copy(a)\n", 1149 | "a\n", 1150 | "b\n", 1151 | "a_copy\n", 1152 | "print(a == a_copy) # element-wise comparison\n", 1153 | "print(a is a_copy) # this is False\n", 1154 | "print(a is b) # this is True\n", 1155 | "a[0] = -111 # changing a has no effect on a_copy\n", 1156 | "a\n", 1157 | "a_copy" 1158 | ] 1159 | }, 1160 | { 1161 | "cell_type": "markdown", 1162 | "metadata": {}, 1163 | "source": [ 1164 | "## Broadcasting" 1165 | ] 1166 | }, 1167 | { 1168 | "cell_type": "markdown", 1169 | "metadata": {}, 1170 | "source": [ 1171 | "Broadcasting describes how NumPy treats arrays with different shapes during arithmetic operations. 
Broadcasting can get complicated, so we recommend that you avoid its more involved forms where you can and stick to one of the two patterns below:\n", 1172 | "* Broadcast only a scalar with an array\n", 1173 | "* Broadcast arrays of the same shape" 1174 | ] 1175 | }, 1176 | { 1177 | "cell_type": "code", 1178 | "execution_count": 40, 1179 | "metadata": {}, 1180 | "outputs": [ 1181 | { 1182 | "data": { 1183 | "text/plain": [ 1184 | "array([[ 6, 7],\n", 1185 | " [ 8, 9],\n", 1186 | " [10, 11]])" 1187 | ] 1188 | }, 1189 | "execution_count": 40, 1190 | "metadata": {}, 1191 | "output_type": "execute_result" 1192 | } 1193 | ], 1194 | "source": [ 1195 | "A = np.arange(6).reshape(3,2)\n", 1196 | "B = np.arange(6, 12).reshape(3,2)\n", 1197 | "A\n", 1198 | "B" 1199 | ] 1200 | }, 1201 | { 1202 | "cell_type": "code", 1203 | "execution_count": 41, 1204 | "metadata": {}, 1205 | "outputs": [ 1206 | { 1207 | "data": { 1208 | "text/plain": [ 1209 | "array([[ 6, 8],\n", 1210 | " [10, 12],\n", 1211 | " [14, 16]])" 1212 | ] 1213 | }, 1214 | "execution_count": 41, 1215 | "metadata": {}, 1216 | "output_type": "execute_result" 1217 | } 1218 | ], 1219 | "source": [ 1220 | "A + B" 1221 | ] 1222 | }, 1223 | { 1224 | "cell_type": "code", 1225 | "execution_count": 42, 1226 | "metadata": {}, 1227 | "outputs": [ 1228 | { 1229 | "data": { 1230 | "text/plain": [ 1231 | "array([[ 0, 3],\n", 1232 | " [ 6, 9],\n", 1233 | " [12, 15]])" 1234 | ] 1235 | }, 1236 | "execution_count": 42, 1237 | "metadata": {}, 1238 | "output_type": "execute_result" 1239 | } 1240 | ], 1241 | "source": [ 1242 | "3 * A" 1243 | ] 1244 | }, 1245 | { 1246 | "cell_type": "code", 1247 | "execution_count": 43, 1248 | "metadata": {}, 1249 | "outputs": [ 1250 | { 1251 | "data": { 1252 | "text/plain": [ 1253 | "array([[0. , 0.33],\n", 1254 | " [0.67, 1. 
],\n", 1255 | " [1.33, 1.67]])" 1256 | ] 1257 | }, 1258 | "execution_count": 43, 1259 | "metadata": {}, 1260 | "output_type": "execute_result" 1261 | } 1262 | ], 1263 | "source": [ 1264 | "(A / 3).round(2) # float division" 1265 | ] 1266 | }, 1267 | { 1268 | "cell_type": "code", 1269 | "execution_count": 44, 1270 | "metadata": {}, 1271 | "outputs": [ 1272 | { 1273 | "data": { 1274 | "text/plain": [ 1275 | "array([[0, 0],\n", 1276 | " [0, 1],\n", 1277 | " [1, 1]])" 1278 | ] 1279 | }, 1280 | "execution_count": 44, 1281 | "metadata": {}, 1282 | "output_type": "execute_result" 1283 | } 1284 | ], 1285 | "source": [ 1286 | "A // 3 # integer (floor) division" 1287 | ] 1288 | }, 1289 | { 1290 | "cell_type": "code", 1291 | "execution_count": 45, 1292 | "metadata": {}, 1293 | "outputs": [ 1294 | { 1295 | "data": { 1296 | "text/plain": [ 1297 | "array([[11, 12],\n", 1298 | " [13, 14],\n", 1299 | " [15, 16]])" 1300 | ] 1301 | }, 1302 | "execution_count": 45, 1303 | "metadata": {}, 1304 | "output_type": "execute_result" 1305 | } 1306 | ], 1307 | "source": [ 1308 | "11 + A" 1309 | ] 1310 | }, 1311 | { 1312 | "cell_type": "markdown", 1313 | "metadata": {}, 1314 | "source": [ 1315 | "The `*` operator multiplies arrays element-wise." 1316 | ] 1317 | }, 1318 | { 1319 | "cell_type": "code", 1320 | "execution_count": 46, 1321 | "metadata": {}, 1322 | "outputs": [ 1323 | { 1324 | "data": { 1325 | "text/plain": [ 1326 | "array([[ 0, 7],\n", 1327 | " [16, 27],\n", 1328 | " [40, 55]])" 1329 | ] 1330 | }, 1331 | "execution_count": 46, 1332 | "metadata": {}, 1333 | "output_type": "execute_result" 1334 | } 1335 | ], 1336 | "source": [ 1337 | "A * B" 1338 | ] 1339 | }, 1340 | { 1341 | "cell_type": "markdown", 1342 | "metadata": {}, 1343 | "source": [ 1344 | "For the usual matrix product, use `np.dot` or, for 2D arrays, the equivalent `@` operator." 
1345 | ] 1346 | }, 1347 | { 1348 | "cell_type": "code", 1349 | "execution_count": 47, 1350 | "metadata": {}, 1351 | "outputs": [ 1352 | { 1353 | "data": { 1354 | "text/plain": [ 1355 | "array([[ 9, 10, 11],\n", 1356 | " [39, 44, 49],\n", 1357 | " [69, 78, 87]])" 1358 | ] 1359 | }, 1360 | "execution_count": 47, 1361 | "metadata": {}, 1362 | "output_type": "execute_result" 1363 | } 1364 | ], 1365 | "source": [ 1366 | "B_new = B.reshape(2,-1)\n", 1367 | "B_new\n", 1368 | "np.dot(A, B_new)" 1369 | ] 1370 | }, 1371 | { 1372 | "cell_type": "markdown", 1373 | "metadata": {}, 1374 | "source": [ 1375 | "## Conditional expressions with arrays" 1376 | ] 1377 | }, 1378 | { 1379 | "cell_type": "code", 1380 | "execution_count": 48, 1381 | "metadata": {}, 1382 | "outputs": [ 1383 | { 1384 | "data": { 1385 | "text/plain": [ 1386 | "array([False, False, True, True, True])" 1387 | ] 1388 | }, 1389 | "execution_count": 48, 1390 | "metadata": {}, 1391 | "output_type": "execute_result" 1392 | } 1393 | ], 1394 | "source": [ 1395 | "x = np.array([10,20,30,40,50])\n", 1396 | "x >= 30" 1397 | ] 1398 | }, 1399 | { 1400 | "cell_type": "code", 1401 | "execution_count": 49, 1402 | "metadata": {}, 1403 | "outputs": [ 1404 | { 1405 | "data": { 1406 | "text/plain": [ 1407 | "array([30, 40, 50])" 1408 | ] 1409 | }, 1410 | "execution_count": 49, 1411 | "metadata": {}, 1412 | "output_type": "execute_result" 1413 | } 1414 | ], 1415 | "source": [ 1416 | "x[x >= 30]" 1417 | ] 1418 | }, 1419 | { 1420 | "cell_type": "markdown", 1421 | "metadata": {}, 1422 | "source": [ 1423 | "### `np.where`\n", 1424 | "Returns the indices of elements in an input array where the given condition is satisfied." 
1425 | ] 1426 | }, 1427 | { 1428 | "cell_type": "code", 1429 | "execution_count": 50, 1430 | "metadata": {}, 1431 | "outputs": [ 1432 | { 1433 | "name": "stdout", 1434 | "output_type": "stream", 1435 | "text": [ 1436 | "[0 1 2 3 4 5 6 7 8 9]\n" 1437 | ] 1438 | }, 1439 | { 1440 | "data": { 1441 | "text/plain": [ 1442 | "(array([0, 1, 2, 3, 4]),)" 1443 | ] 1444 | }, 1445 | "execution_count": 50, 1446 | "metadata": {}, 1447 | "output_type": "execute_result" 1448 | } 1449 | ], 1450 | "source": [ 1451 | "y = np.arange(10)\n", 1452 | "print(y)\n", 1453 | "np.where(y < 5)" 1454 | ] 1455 | }, 1456 | { 1457 | "cell_type": "markdown", 1458 | "metadata": {}, 1459 | "source": [ 1460 | "**Extremely useful:** You can use *`where`* for vectorised if-else statements." 1461 | ] 1462 | }, 1463 | { 1464 | "cell_type": "code", 1465 | "execution_count": 51, 1466 | "metadata": {}, 1467 | "outputs": [ 1468 | { 1469 | "name": "stdout", 1470 | "output_type": "stream", 1471 | "text": [ 1472 | "[np.str_('smaller'), np.str_('smaller'), np.str_('smaller'), np.str_('smaller'), np.str_('smaller'), np.str_('bigger'), np.str_('bigger'), np.str_('bigger'), np.str_('bigger'), np.str_('bigger')]\n" 1473 | ] 1474 | } 1475 | ], 1476 | "source": [ 1477 | "compared_to_5 = list(np.where(y < 5, 'smaller', 'bigger'))\n", 1478 | "print(compared_to_5)" 1479 | ] 1480 | }, 1481 | { 1482 | "cell_type": "markdown", 1483 | "metadata": {}, 1484 | "source": [ 1485 | "## Mathematical and statistical functions" 1486 | ] 1487 | }, 1488 | { 1489 | "cell_type": "markdown", 1490 | "metadata": {}, 1491 | "source": [ 1492 | "A set of mathematical functions that compute statistics about an entire array or about the data along an axis are accessible as methods of the array class." 1493 | ] 1494 | }, 1495 | { 1496 | "cell_type": "code", 1497 | "execution_count": 52, 1498 | "metadata": {}, 1499 | "outputs": [ 1500 | { 1501 | "name": "stdout", 1502 | "output_type": "stream", 1503 | "text": [ 1504 | "[[-2.5 3.1 7. 
]\n", 1505 | " [10. 11. 12. ]]\n" 1506 | ] 1507 | } 1508 | ], 1509 | "source": [ 1510 | "a = np.array([[-2.5, 3.1, 7], [10, 11, 12]])\n", 1511 | "print(a)" 1512 | ] 1513 | }, 1514 | { 1515 | "cell_type": "code", 1516 | "execution_count": 53, 1517 | "metadata": {}, 1518 | "outputs": [ 1519 | { 1520 | "data": { 1521 | "text/plain": [ 1522 | "np.float64(12.0)" 1523 | ] 1524 | }, 1525 | "execution_count": 53, 1526 | "metadata": {}, 1527 | "output_type": "execute_result" 1528 | } 1529 | ], 1530 | "source": [ 1531 | "np.max(a)" 1532 | ] 1533 | }, 1534 | { 1535 | "cell_type": "code", 1536 | "execution_count": 54, 1537 | "metadata": {}, 1538 | "outputs": [ 1539 | { 1540 | "data": { 1541 | "text/plain": [ 1542 | "np.float64(-2.5)" 1543 | ] 1544 | }, 1545 | "execution_count": 54, 1546 | "metadata": {}, 1547 | "output_type": "execute_result" 1548 | } 1549 | ], 1550 | "source": [ 1551 | "np.min(a)" 1552 | ] 1553 | }, 1554 | { 1555 | "cell_type": "code", 1556 | "execution_count": 55, 1557 | "metadata": {}, 1558 | "outputs": [ 1559 | { 1560 | "data": { 1561 | "text/plain": [ 1562 | "np.float64(6.767)" 1563 | ] 1564 | }, 1565 | "execution_count": 55, 1566 | "metadata": {}, 1567 | "output_type": "execute_result" 1568 | } 1569 | ], 1570 | "source": [ 1571 | "np.mean(a).round(3)" 1572 | ] 1573 | }, 1574 | { 1575 | "cell_type": "code", 1576 | "execution_count": 56, 1577 | "metadata": {}, 1578 | "outputs": [ 1579 | { 1580 | "data": { 1581 | "text/plain": [ 1582 | "np.float64(-71610.0)" 1583 | ] 1584 | }, 1585 | "execution_count": 56, 1586 | "metadata": {}, 1587 | "output_type": "execute_result" 1588 | } 1589 | ], 1590 | "source": [ 1591 | "np.prod(a)" 1592 | ] 1593 | }, 1594 | { 1595 | "cell_type": "code", 1596 | "execution_count": 57, 1597 | "metadata": {}, 1598 | "outputs": [ 1599 | { 1600 | "data": { 1601 | "text/plain": [ 1602 | "np.float64(5.085)" 1603 | ] 1604 | }, 1605 | "execution_count": 57, 1606 | "metadata": {}, 1607 | "output_type": "execute_result" 1608 | } 1609 | ], 1610 
| "source": [ 1611 | "np.std(a).round(3)" 1612 | ] 1613 | }, 1614 | { 1615 | "cell_type": "code", 1616 | "execution_count": 58, 1617 | "metadata": {}, 1618 | "outputs": [ 1619 | { 1620 | "data": { 1621 | "text/plain": [ 1622 | "np.float64(25.856)" 1623 | ] 1624 | }, 1625 | "execution_count": 58, 1626 | "metadata": {}, 1627 | "output_type": "execute_result" 1628 | } 1629 | ], 1630 | "source": [ 1631 | "np.var(a).round(3)" 1632 | ] 1633 | }, 1634 | { 1635 | "cell_type": "code", 1636 | "execution_count": 59, 1637 | "metadata": {}, 1638 | "outputs": [ 1639 | { 1640 | "data": { 1641 | "text/plain": [ 1642 | "np.float64(40.6)" 1643 | ] 1644 | }, 1645 | "execution_count": 59, 1646 | "metadata": {}, 1647 | "output_type": "execute_result" 1648 | } 1649 | ], 1650 | "source": [ 1651 | "np.sum(a)" 1652 | ] 1653 | }, 1654 | { 1655 | "cell_type": "markdown", 1656 | "metadata": {}, 1657 | "source": [ 1658 | "These functions accept an optional argument `axis` which lets you ask for the operation to be performed on elements along the given axis. 
For example:" 1659 | ] 1660 | }, 1661 | { 1662 | "cell_type": "code", 1663 | "execution_count": 60, 1664 | "metadata": {}, 1665 | "outputs": [ 1666 | { 1667 | "data": { 1668 | "text/plain": [ 1669 | "array([[ 0, 1, 2, 3, 4, 5],\n", 1670 | " [ 6, 7, 8, 9, 10, 11]])" 1671 | ] 1672 | }, 1673 | "execution_count": 60, 1674 | "metadata": {}, 1675 | "output_type": "execute_result" 1676 | } 1677 | ], 1678 | "source": [ 1679 | "b = np.arange(12).reshape(2,-1)\n", 1680 | "b" 1681 | ] 1682 | }, 1683 | { 1684 | "cell_type": "code", 1685 | "execution_count": 61, 1686 | "metadata": {}, 1687 | "outputs": [ 1688 | { 1689 | "data": { 1690 | "text/plain": [ 1691 | "array([ 6, 8, 10, 12, 14, 16])" 1692 | ] 1693 | }, 1694 | "execution_count": 61, 1695 | "metadata": {}, 1696 | "output_type": "execute_result" 1697 | } 1698 | ], 1699 | "source": [ 1700 | "b.sum(axis=0) # column sums: adds down the rows, one result per column" 1701 | ] 1702 | }, 1703 | { 1704 | "cell_type": "code", 1705 | "execution_count": 62, 1706 | "metadata": {}, 1707 | "outputs": [ 1708 | { 1709 | "data": { 1710 | "text/plain": [ 1711 | "array([15, 51])" 1712 | ] 1713 | }, 1714 | "execution_count": 62, 1715 | "metadata": {}, 1716 | "output_type": "execute_result" 1717 | } 1718 | ], 1719 | "source": [ 1720 | "b.sum(axis=1) # row sums: adds across the columns, one result per row" 1721 | ] 1722 | }, 1723 | { 1724 | "cell_type": "markdown", 1725 | "metadata": {}, 1726 | "source": [ 1727 | "## Universal functions" 1728 | ] 1729 | }, 1730 | { 1731 | "cell_type": "markdown", 1732 | "metadata": {}, 1733 | "source": [ 1734 | "A universal function, or **ufunc**, is a function that performs element-wise operations on data in ndarrays. You can think of them as fast vectorized wrappers for simple functions that take one or more scalar values and produce one or more scalar results.\n", 1735 | "\n", 1736 | "Many ufuncs are simple element-wise transformations, like `np.sqrt` or `np.exp`. These are referred to as **unary ufuncs**. 
" 1737 | ] 1738 | }, 1739 | { 1740 | "cell_type": "code", 1741 | "execution_count": 63, 1742 | "metadata": {}, 1743 | "outputs": [], 1744 | "source": [ 1745 | "z = np.array([[-2.5, 3.1, 7], [10, 11, 12]])" 1746 | ] 1747 | }, 1748 | { 1749 | "cell_type": "markdown", 1750 | "metadata": {}, 1751 | "source": [ 1752 | "### `np.square`\n", 1753 | "Element-wise square of the input." 1754 | ] 1755 | }, 1756 | { 1757 | "cell_type": "code", 1758 | "execution_count": 64, 1759 | "metadata": {}, 1760 | "outputs": [ 1761 | { 1762 | "data": { 1763 | "text/plain": [ 1764 | "array([[ 6.25, 9.61, 49. ],\n", 1765 | " [100. , 121. , 144. ]])" 1766 | ] 1767 | }, 1768 | "execution_count": 64, 1769 | "metadata": {}, 1770 | "output_type": "execute_result" 1771 | } 1772 | ], 1773 | "source": [ 1774 | "np.square(z)" 1775 | ] 1776 | }, 1777 | { 1778 | "cell_type": "markdown", 1779 | "metadata": {}, 1780 | "source": [ 1781 | "### `np.exp`\n", 1782 | "Calculate the exponential of all elements in the input array." 1783 | ] 1784 | }, 1785 | { 1786 | "cell_type": "code", 1787 | "execution_count": 65, 1788 | "metadata": {}, 1789 | "outputs": [ 1790 | { 1791 | "data": { 1792 | "text/plain": [ 1793 | "array([[8.20849986e-02, 2.21979513e+01, 1.09663316e+03],\n", 1794 | " [2.20264658e+04, 5.98741417e+04, 1.62754791e+05]])" 1795 | ] 1796 | }, 1797 | "execution_count": 65, 1798 | "metadata": {}, 1799 | "output_type": "execute_result" 1800 | } 1801 | ], 1802 | "source": [ 1803 | "np.exp(z)" 1804 | ] 1805 | }, 1806 | { 1807 | "cell_type": "markdown", 1808 | "metadata": {}, 1809 | "source": [ 1810 | "### Binary universal functions\n", 1811 | "Others, such as add or maximum, take two arrays (thus, **binary ufuncs**) and return a single array as the result:" 1812 | ] 1813 | }, 1814 | { 1815 | "cell_type": "code", 1816 | "execution_count": 66, 1817 | "metadata": {}, 1818 | "outputs": [ 1819 | { 1820 | "name": "stdout", 1821 | "output_type": "stream", 1822 | "text": [ 1823 | "[3 6 1]\n", 1824 | "[4 2 9]\n" 
1825 | ] 1826 | } 1827 | ], 1828 | "source": [ 1829 | "x = np.array([3, 6, 1])\n", 1830 | "y = np.array([4, 2, 9])\n", 1831 | "print(x)\n", 1832 | "print(y)" 1833 | ] 1834 | }, 1835 | { 1836 | "cell_type": "markdown", 1837 | "metadata": {}, 1838 | "source": [ 1839 | "### `np.maximum`\n", 1840 | "Element-wise maximum of array elements - do not confuse with `np.max` which finds the max element in the array." 1841 | ] 1842 | }, 1843 | { 1844 | "cell_type": "code", 1845 | "execution_count": 67, 1846 | "metadata": {}, 1847 | "outputs": [ 1848 | { 1849 | "data": { 1850 | "text/plain": [ 1851 | "array([4, 6, 9])" 1852 | ] 1853 | }, 1854 | "execution_count": 67, 1855 | "metadata": {}, 1856 | "output_type": "execute_result" 1857 | } 1858 | ], 1859 | "source": [ 1860 | "np.maximum(x,y)" 1861 | ] 1862 | }, 1863 | { 1864 | "cell_type": "markdown", 1865 | "metadata": {}, 1866 | "source": [ 1867 | "### `np.minimum`\n", 1868 | "Element-wise minimum of array elements - do not confuse with `np.min` which finds the min element in the array." 1869 | ] 1870 | }, 1871 | { 1872 | "cell_type": "code", 1873 | "execution_count": 68, 1874 | "metadata": {}, 1875 | "outputs": [ 1876 | { 1877 | "data": { 1878 | "text/plain": [ 1879 | "array([3, 2, 1])" 1880 | ] 1881 | }, 1882 | "execution_count": 68, 1883 | "metadata": {}, 1884 | "output_type": "execute_result" 1885 | } 1886 | ], 1887 | "source": [ 1888 | "np.minimum(x,y)" 1889 | ] 1890 | }, 1891 | { 1892 | "cell_type": "markdown", 1893 | "metadata": {}, 1894 | "source": [ 1895 | "### `np.power`\n", 1896 | "First array elements raised to powers from second array, element-wise." 
1897 | ] 1898 | }, 1899 | { 1900 | "cell_type": "code", 1901 | "execution_count": 69, 1902 | "metadata": {}, 1903 | "outputs": [ 1904 | { 1905 | "data": { 1906 | "text/plain": [ 1907 | "array([81, 36, 1])" 1908 | ] 1909 | }, 1910 | "execution_count": 69, 1911 | "metadata": {}, 1912 | "output_type": "execute_result" 1913 | } 1914 | ], 1915 | "source": [ 1916 | "np.power(x,y)" 1917 | ] 1918 | }, 1919 | { 1920 | "cell_type": "markdown", 1921 | "metadata": {}, 1922 | "source": [ 1923 | "## Array indexing and slicing" 1924 | ] 1925 | }, 1926 | { 1927 | "cell_type": "markdown", 1928 | "metadata": {}, 1929 | "source": [ 1930 | "### One-dimensional arrays\n", 1931 | "One-dimensional NumPy arrays can be indexed and sliced more or less like regular Python lists:" 1932 | ] 1933 | }, 1934 | { 1935 | "cell_type": "code", 1936 | "execution_count": 70, 1937 | "metadata": {}, 1938 | "outputs": [ 1939 | { 1940 | "data": { 1941 | "text/plain": [ 1942 | "np.int64(19)" 1943 | ] 1944 | }, 1945 | "execution_count": 70, 1946 | "metadata": {}, 1947 | "output_type": "execute_result" 1948 | } 1949 | ], 1950 | "source": [ 1951 | "a = np.array([1, 5, 3, 19, 13, 7, 3])\n", 1952 | "a[3]" 1953 | ] 1954 | }, 1955 | { 1956 | "cell_type": "code", 1957 | "execution_count": 71, 1958 | "metadata": {}, 1959 | "outputs": [ 1960 | { 1961 | "data": { 1962 | "text/plain": [ 1963 | "array([ 3, 19, 13])" 1964 | ] 1965 | }, 1966 | "execution_count": 71, 1967 | "metadata": {}, 1968 | "output_type": "execute_result" 1969 | } 1970 | ], 1971 | "source": [ 1972 | "a[2:5]" 1973 | ] 1974 | }, 1975 | { 1976 | "cell_type": "code", 1977 | "execution_count": 72, 1978 | "metadata": {}, 1979 | "outputs": [ 1980 | { 1981 | "data": { 1982 | "text/plain": [ 1983 | "array([ 3, 19, 13, 7])" 1984 | ] 1985 | }, 1986 | "execution_count": 72, 1987 | "metadata": {}, 1988 | "output_type": "execute_result" 1989 | } 1990 | ], 1991 | "source": [ 1992 | "a[2:-1]" 1993 | ] 1994 | }, 1995 | { 1996 | "cell_type": "code", 1997 | "execution_count": 
73, 1998 | "metadata": {}, 1999 | "outputs": [ 2000 | { 2001 | "data": { 2002 | "text/plain": [ 2003 | "array([1, 5])" 2004 | ] 2005 | }, 2006 | "execution_count": 73, 2007 | "metadata": {}, 2008 | "output_type": "execute_result" 2009 | } 2010 | ], 2011 | "source": [ 2012 | "a[:2]" 2013 | ] 2014 | }, 2015 | { 2016 | "cell_type": "code", 2017 | "execution_count": 74, 2018 | "metadata": {}, 2019 | "outputs": [ 2020 | { 2021 | "data": { 2022 | "text/plain": [ 2023 | "array([ 3, 13, 3])" 2024 | ] 2025 | }, 2026 | "execution_count": 74, 2027 | "metadata": {}, 2028 | "output_type": "execute_result" 2029 | } 2030 | ], 2031 | "source": [ 2032 | "a[2::2]" 2033 | ] 2034 | }, 2035 | { 2036 | "cell_type": "code", 2037 | "execution_count": 75, 2038 | "metadata": {}, 2039 | "outputs": [ 2040 | { 2041 | "data": { 2042 | "text/plain": [ 2043 | "array([ 3, 7, 13, 19, 3, 5, 1])" 2044 | ] 2045 | }, 2046 | "execution_count": 75, 2047 | "metadata": {}, 2048 | "output_type": "execute_result" 2049 | } 2050 | ], 2051 | "source": [ 2052 | "a[::-1]" 2053 | ] 2054 | }, 2055 | { 2056 | "cell_type": "markdown", 2057 | "metadata": {}, 2058 | "source": [ 2059 | "Of course, you can modify elements:" 2060 | ] 2061 | }, 2062 | { 2063 | "cell_type": "code", 2064 | "execution_count": 76, 2065 | "metadata": {}, 2066 | "outputs": [ 2067 | { 2068 | "data": { 2069 | "text/plain": [ 2070 | "array([ 1, 5, 3, 999, 13, 7, 3])" 2071 | ] 2072 | }, 2073 | "execution_count": 76, 2074 | "metadata": {}, 2075 | "output_type": "execute_result" 2076 | } 2077 | ], 2078 | "source": [ 2079 | "a[3]=999\n", 2080 | "a" 2081 | ] 2082 | }, 2083 | { 2084 | "cell_type": "markdown", 2085 | "metadata": {}, 2086 | "source": [ 2087 | "You can also modify an array slice:" 2088 | ] 2089 | }, 2090 | { 2091 | "cell_type": "code", 2092 | "execution_count": 77, 2093 | "metadata": {}, 2094 | "outputs": [ 2095 | { 2096 | "data": { 2097 | "text/plain": [ 2098 | "array([ 1, 5, 997, 998, 999, 7, 3])" 2099 | ] 2100 | }, 2101 | 
"execution_count": 77, 2102 | "metadata": {}, 2103 | "output_type": "execute_result" 2104 | } 2105 | ], 2106 | "source": [ 2107 | "a[2:5] = [997, 998, 999]\n", 2108 | "a" 2109 | ] 2110 | }, 2111 | { 2112 | "cell_type": "markdown", 2113 | "metadata": {}, 2114 | "source": [ 2115 | "### Multi-dimensional arrays\n", 2116 | "Multi-dimensional arrays can be accessed in a similar way by providing an index or slice for each axis, separated by commas:" 2117 | ] 2118 | }, 2119 | { 2120 | "cell_type": "code", 2121 | "execution_count": 78, 2122 | "metadata": {}, 2123 | "outputs": [ 2124 | { 2125 | "data": { 2126 | "text/plain": [ 2127 | "array([[ 0, 1, 2],\n", 2128 | " [ 3, 4, 5],\n", 2129 | " [ 6, 7, 8],\n", 2130 | " [ 9, 10, 11]])" 2131 | ] 2132 | }, 2133 | "execution_count": 78, 2134 | "metadata": {}, 2135 | "output_type": "execute_result" 2136 | } 2137 | ], 2138 | "source": [ 2139 | "b = np.arange(12).reshape(4, 3)\n", 2140 | "b" 2141 | ] 2142 | }, 2143 | { 2144 | "cell_type": "code", 2145 | "execution_count": 79, 2146 | "metadata": {}, 2147 | "outputs": [ 2148 | { 2149 | "data": { 2150 | "text/plain": [ 2151 | "np.int64(4)" 2152 | ] 2153 | }, 2154 | "execution_count": 79, 2155 | "metadata": {}, 2156 | "output_type": "execute_result" 2157 | } 2158 | ], 2159 | "source": [ 2160 | "b[1, 1] # row 2, col 2 (recall that Python indexing starts at 0)" 2161 | ] 2162 | }, 2163 | { 2164 | "cell_type": "code", 2165 | "execution_count": 80, 2166 | "metadata": {}, 2167 | "outputs": [ 2168 | { 2169 | "data": { 2170 | "text/plain": [ 2171 | "array([0, 1, 2])" 2172 | ] 2173 | }, 2174 | "execution_count": 80, 2175 | "metadata": {}, 2176 | "output_type": "execute_result" 2177 | } 2178 | ], 2179 | "source": [ 2180 | "b[0, :] # row 1, all columns" 2181 | ] 2182 | }, 2183 | { 2184 | "cell_type": "code", 2185 | "execution_count": 81, 2186 | "metadata": {}, 2187 | "outputs": [ 2188 | { 2189 | "data": { 2190 | "text/plain": [ 2191 | "array([0, 3, 6, 9])" 2192 | ] 2193 | }, 2194 | 
"execution_count": 81, 2195 | "metadata": {}, 2196 | "output_type": "execute_result" 2197 | } 2198 | ], 2199 | "source": [ 2200 | "b[:, 0] # all rows, column 1" 2201 | ] 2202 | }, 2203 | { 2204 | "cell_type": "markdown", 2205 | "metadata": {}, 2206 | "source": [ 2207 | "**Caution**: Note the subtle difference between these two expressions: " 2208 | ] 2209 | }, 2210 | { 2211 | "cell_type": "code", 2212 | "execution_count": 82, 2213 | "metadata": { 2214 | "scrolled": true 2215 | }, 2216 | "outputs": [ 2217 | { 2218 | "name": "stdout", 2219 | "output_type": "stream", 2220 | "text": [ 2221 | "[0 1 2]\n", 2222 | "(3,)\n" 2223 | ] 2224 | } 2225 | ], 2226 | "source": [ 2227 | "c = b[0, :]\n", 2228 | "print(c)\n", 2229 | "print(c.shape)" 2230 | ] 2231 | }, 2232 | { 2233 | "cell_type": "code", 2234 | "execution_count": 83, 2235 | "metadata": {}, 2236 | "outputs": [ 2237 | { 2238 | "name": "stdout", 2239 | "output_type": "stream", 2240 | "text": [ 2241 | "[[0 1 2]]\n", 2242 | "(1, 3)\n" 2243 | ] 2244 | } 2245 | ], 2246 | "source": [ 2247 | "d = b[0:1, :]\n", 2248 | "print(d)\n", 2249 | "print(d.shape)" 2250 | ] 2251 | }, 2252 | { 2253 | "cell_type": "markdown", 2254 | "metadata": {}, 2255 | "source": [ 2256 | "The first expression returns row 1 as a 1D array of shape `(3,)`, while the second returns that same row as a 2D array of shape `(1, 3)`." 2257 | ] 2258 | }, 2259 | { 2260 | "cell_type": "markdown", 2261 | "metadata": {}, 2262 | "source": [ 2263 | "## Transposing arrays" 2264 | ] 2265 | }, 2266 | { 2267 | "cell_type": "markdown", 2268 | "metadata": {}, 2269 | "source": [ 2270 | "An array's `transpose()` method transposes the array." 
2271 | ] 2272 | }, 2273 | { 2274 | "cell_type": "code", 2275 | "execution_count": 84, 2276 | "metadata": {}, 2277 | "outputs": [ 2278 | { 2279 | "data": { 2280 | "text/plain": [ 2281 | "array([[0, 1],\n", 2282 | " [2, 3],\n", 2283 | " [4, 5],\n", 2284 | " [6, 7],\n", 2285 | " [8, 9]])" 2286 | ] 2287 | }, 2288 | "execution_count": 84, 2289 | "metadata": {}, 2290 | "output_type": "execute_result" 2291 | } 2292 | ], 2293 | "source": [ 2294 | "a = np.arange(10).reshape(5,-1)\n", 2295 | "a" 2296 | ] 2297 | }, 2298 | { 2299 | "cell_type": "code", 2300 | "execution_count": 85, 2301 | "metadata": {}, 2302 | "outputs": [ 2303 | { 2304 | "data": { 2305 | "text/plain": [ 2306 | "array([[0, 2, 4, 6, 8],\n", 2307 | " [1, 3, 5, 7, 9]])" 2308 | ] 2309 | }, 2310 | "execution_count": 85, 2311 | "metadata": {}, 2312 | "output_type": "execute_result" 2313 | } 2314 | ], 2315 | "source": [ 2316 | "a = a.transpose() # transpose() returns a new array object, so assign the result to keep it\n", 2317 | "a" 2318 | ] 2319 | }, 2320 | { 2321 | "cell_type": "markdown", 2322 | "metadata": {}, 2323 | "source": [ 2324 | "## Combining arrays" 2325 | ] 2326 | }, 2327 | { 2328 | "cell_type": "markdown", 2329 | "metadata": {}, 2330 | "source": [ 2331 | "### `np.vstack`: stack arrays vertically" 2332 | ] 2333 | }, 2334 | { 2335 | "cell_type": "code", 2336 | "execution_count": 86, 2337 | "metadata": {}, 2338 | "outputs": [ 2339 | { 2340 | "name": "stdout", 2341 | "output_type": "stream", 2342 | "text": [ 2343 | "[1 2 3]\n", 2344 | "[-1 -2 -3]\n", 2345 | "[11 12 13]\n", 2346 | "stack vertically:\n", 2347 | " [[ 1 2 3]\n", 2348 | " [-1 -2 -3]\n", 2349 | " [11 12 13]]\n" 2350 | ] 2351 | } 2352 | ], 2353 | "source": [ 2354 | "a = 1 + np.arange(3)\n", 2355 | "b = -1 * a\n", 2356 | "c = 10 + a\n", 2357 | "print(a)\n", 2358 | "print(b)\n", 2359 | "print(c)\n", 2360 | "d = np.vstack((a, b, c)) # the arrays are passed as a single tuple, hence the double parentheses\n", 2361 | "print('stack vertically:\\n', d)" 2362 | ] 2363 | }, 2364 | { 2365 | "cell_type": "markdown", 2366 
| "metadata": {}, 2367 | "source": [ 2368 | "### `np.hstack`: stack arrays horizontally" 2369 | ] 2370 | }, 2371 | { 2372 | "cell_type": "code", 2373 | "execution_count": 87, 2374 | "metadata": {}, 2375 | "outputs": [ 2376 | { 2377 | "name": "stdout", 2378 | "output_type": "stream", 2379 | "text": [ 2380 | "stack horizontally:\n", 2381 | " [ 1 2 3 -1 -2 -3 11 12 13]\n" 2382 | ] 2383 | } 2384 | ], 2385 | "source": [ 2386 | "d = np.hstack((a, b, c)) # the arrays are passed as a single tuple, hence the double parentheses\n", 2387 | "print('stack horizontally:\\n', d)" 2388 | ] 2389 | }, 2390 | { 2391 | "cell_type": "markdown", 2392 | "metadata": {}, 2393 | "source": [ 2394 | "## Sorting arrays\n", 2395 | "You can use an array's `sort` method, but note that it sorts **in-place** and returns `None`!" 2396 | ] 2397 | }, 2398 | { 2399 | "cell_type": "code", 2400 | "execution_count": 88, 2401 | "metadata": {}, 2402 | "outputs": [ 2403 | { 2404 | "name": "stdout", 2405 | "output_type": "stream", 2406 | "text": [ 2407 | "[ 3 5 -1 0 11]\n", 2408 | "a has been sorted in place:\n", 2409 | " [-1 0 3 5 11]\n", 2410 | "None\n" 2411 | ] 2412 | } 2413 | ], 2414 | "source": [ 2415 | "a = np.array([3, 5, -1, 0, 11])\n", 2416 | "print(a)\n", 2417 | "sort_output = a.sort()\n", 2418 | "print('a has been sorted in place:\\n', a)\n", 2419 | "print(sort_output) # tricky: this will print None!" 2420 | ] 2421 | }, 2422 | { 2423 | "cell_type": "markdown", 2424 | "metadata": {}, 2425 | "source": [ 2426 | "If you do not want to sort in place, use **`np.sort`**, which returns a sorted copy." 
2427 | ] 2428 | }, 2429 | { 2430 | "cell_type": "code", 2431 | "execution_count": 89, 2432 | "metadata": {}, 2433 | "outputs": [ 2434 | { 2435 | "name": "stdout", 2436 | "output_type": "stream", 2437 | "text": [ 2438 | "[ 3 5 -1 0 11]\n", 2439 | "[-1 0 3 5 11]\n", 2440 | "Notice a is not changed:\n", 2441 | " [ 3 5 -1 0 11]\n" 2442 | ] 2443 | } 2444 | ], 2445 | "source": [ 2446 | "a = np.array([3, 5, -1, 0, 11])\n", 2447 | "print(a)\n", 2448 | "b = np.sort(a)\n", 2449 | "print(b)\n", 2450 | "print('Notice a is not changed:\\n', a)" 2451 | ] 2452 | }, 2453 | { 2454 | "cell_type": "markdown", 2455 | "metadata": {}, 2456 | "source": [ 2457 | "To sort in descending order, you need to do it indirectly, since neither `np.sort` nor the `sort` method has an option for it. One way is to sort in ascending order and then reverse the result." 2458 | ] 2459 | }, 2460 | { 2461 | "cell_type": "code", 2462 | "execution_count": 90, 2463 | "metadata": {}, 2464 | "outputs": [ 2465 | { 2466 | "name": "stdout", 2467 | "output_type": "stream", 2468 | "text": [ 2469 | "[11 5 3 0 -1]\n" 2470 | ] 2471 | } 2472 | ], 2473 | "source": [ 2474 | "a_reverse_sorted = np.sort(a)[::-1]\n", 2475 | "print(a_reverse_sorted)" 2476 | ] 2477 | }, 2478 | { 2479 | "cell_type": "markdown", 2480 | "metadata": {}, 2481 | "source": [ 2482 | "## Exercises" 2483 | ] 2484 | }, 2485 | { 2486 | "cell_type": "markdown", 2487 | "metadata": {}, 2488 | "source": [ 2489 | "1- Initialize a 5 $\\times$ 3 2D array with all the multiples of 3 from 3 up to, but not including, 48. **HINT**: Use the `step` argument of `np.arange`. For example, you can create an array of 0, 2, 4, 6, 8 by calling `np.arange(0, 10, step = 2)`. Then slice the last column of the array." 2490 | ] 2491 | }, 2492 | { 2493 | "cell_type": "markdown", 2494 | "metadata": {}, 2495 | "source": [ 2496 | "2- Create an array, say `a = np.random.uniform(1, 10, 10)`. Find the index of the maximum value in `a`. How about the index of the minimum value? 
**HINT**: Use the `argmax` and `argmin` methods." 2497 | ] 2498 | }, 2499 | { 2500 | "cell_type": "markdown", 2501 | "metadata": {}, 2502 | "source": [ 2503 | "3- Create the following array and find the maximum value in each row. How about the column-wise maximum values? **HINT**: Use `np.amax`.\n", 2504 | "\n", 2505 | "$$A = \\begin{bmatrix} 1 & 3 & 4 \\\\ 2 & 7 & -1 \\end{bmatrix}$$" 2506 | ] 2507 | }, 2508 | { 2509 | "cell_type": "markdown", 2510 | "metadata": {}, 2511 | "source": [ 2512 | "4- Missing values such as `NA` and `nan` are not uncommon in data science (technically, `nan` is not a missing value; it stands for not-a-number). Create the following matrix, which contains one `nan`, using `np.nan`.\n", 2513 | "\n", 2514 | "$$B = \\begin{bmatrix} 1 & 3 & \\text{nan} \\\\ 2 & 7 & -1 \\end{bmatrix}$$" 2515 | ] 2516 | }, 2517 | { 2518 | "cell_type": "markdown", 2519 | "metadata": {}, 2520 | "source": [ 2521 | "5- Find the column-wise and the row-wise maximum values in `B` created in the previous question. Does `np.amax` return a usable value? **HINT**: Try the `np.nanmax` function." 
2522 | ] 2523 | }, 2524 | { 2525 | "cell_type": "markdown", 2526 | "metadata": {}, 2527 | "source": [ 2528 | "### Possible solutions\n", 2529 | "\n", 2530 | "1- Initializing and slicing arrays\n", 2531 | "\n", 2532 | "```python\n", 2533 | "import numpy as np\n", 2534 | "# Create and reshape the array\n", 2535 | "myarray = np.arange(3, 48, step = 3)\n", 2536 | "myarray = myarray.reshape(5, 3)\n", 2537 | "\n", 2538 | "# Slice the last column\n", 2539 | "myarray[:, -1]\n", 2540 | "```\n", 2541 | "\n", 2542 | "2- Indexing the maximum and minimum\n", 2543 | "\n", 2544 | "```python\n", 2545 | "import numpy as np\n", 2546 | "a = np.random.uniform(1, 10, 10)\n", 2547 | "a\n", 2548 | "a.argmax() # Index of the maximum value\n", 2549 | "a.argmin() # Index of the minimum value\n", 2550 | "```\n", 2551 | "\n", 2552 | "3- Column-wise and row-wise maximum values.\n", 2553 | "\n", 2554 | "```python\n", 2555 | "import numpy as np\n", 2556 | "A = np.array([[1, 3, 4],[2, 7, -1]])\n", 2557 | "np.amax(A, axis = 0) # Column-wise\n", 2558 | "np.amax(A, axis = 1) # Row-wise\n", 2559 | "```\n", 2560 | "\n", 2561 | "4- Creating `nan` with `numpy`.\n", 2562 | "\n", 2563 | "```python\n", 2564 | "import numpy as np\n", 2565 | "B = np.array([[1, 3, np.nan],[2, 7, -1]])\n", 2566 | "```\n", 2567 | "\n", 2568 | "5- Column-wise and row-wise maximum values in the presence of `nan` values.\n", 2569 | "\n", 2570 | "```python\n", 2571 | "import numpy as np\n", 2572 | "B = np.array([[1, 3, np.nan],[2, 7, -1]])\n", 2573 | "np.nanmax(B, axis = 0) # Column-wise\n", 2574 | "np.nanmax(B, axis = 1) # Row-wise\n", 2575 | "```" 2576 | ] 2577 | }, 2578 | { 2579 | "cell_type": "markdown", 2580 | "metadata": {}, 2581 | "source": [ 2582 | "---" 2583 | ] 2584 | } 2585 | ], 2586 | "metadata": { 2587 | "kernelspec": { 2588 | "display_name": "Python 3 (ipykernel)", 2589 | "language": "python", 2590 | "name": "python3" 2591 | }, 2592 | "language_info": { 2593 | "codemirror_mode": { 2594 | "name": "ipython", 2595 | 
"version": 3 2596 | }, 2597 | "file_extension": ".py", 2598 | "mimetype": "text/x-python", 2599 | "name": "python", 2600 | "nbconvert_exporter": "python", 2601 | "pygments_lexer": "ipython3", 2602 | "version": "3.11.9" 2603 | }, 2604 | "toc": { 2605 | "toc_cell": false, 2606 | "toc_number_sections": true, 2607 | "toc_section_display": "block", 2608 | "toc_threshold": 6, 2609 | "toc_window_display": false 2610 | }, 2611 | "toc_position": { 2612 | "height": "677px", 2613 | "left": "1195.02px", 2614 | "right": "20px", 2615 | "top": "78px", 2616 | "width": "238px" 2617 | } 2618 | }, 2619 | "nbformat": 4, 2620 | "nbformat_minor": 4 2621 | } 2622 | -------------------------------------------------------------------------------- /PB8_python_vs_r.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Python vs. R\n", 8 | "\n", 9 | "This tutorial summarizes some of the main differences between R and Python. It is meant to help you avoid some of the potential pitfalls if you are coming from an R programming background." 
10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "**In Python, indexing starts at 0, so the first element of a list is selected by the 0-th index.**" 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "data": { 26 | "text/plain": [ 27 | "'A'" 28 | ] 29 | }, 30 | "execution_count": 1, 31 | "metadata": {}, 32 | "output_type": "execute_result" 33 | } 34 | ], 35 | "source": [ 36 | "lst = [\"A\", \"B\", 3.45]\n", 37 | "lst[0]" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "**R**:\n", 45 | "```R\n", 46 | "lst <- list(\"A\",\"B\", 3)\n", 47 | "\n", 48 | "lst[1]\n", 49 | "\n", 50 | "Output: \"A\"\n", 51 | "```" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "**Unlike R, the ending index is excluded in Python.**" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 2, 64 | "metadata": {}, 65 | "outputs": [ 66 | { 67 | "data": { 68 | "text/plain": [ 69 | "['A']" 70 | ] 71 | }, 72 | "execution_count": 2, 73 | "metadata": {}, 74 | "output_type": "execute_result" 75 | } 76 | ], 77 | "source": [ 78 | "lst[0:1]" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 3, 84 | "metadata": {}, 85 | "outputs": [ 86 | { 87 | "data": { 88 | "text/plain": [ 89 | "['A', 'B']" 90 | ] 91 | }, 92 | "execution_count": 3, 93 | "metadata": {}, 94 | "output_type": "execute_result" 95 | } 96 | ], 97 | "source": [ 98 | "lst[0:2]" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "**In R, you use {} to define scope. 
In Python, there are no curly braces, and you use indentation to define scope.**" 106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "**R**:\n", 113 | "```R\n", 114 | "printString <- function(x,y) {\n", 115 | "\n", 116 | "print(\"Hello!\") # indenting this line is not necessary\n", 117 | "\n", 118 | "}\n", 119 | "```" 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "In Python, the line that introduces an indented block must end with a colon. We suggest that you use 4 spaces for indentation; tabs also work, as long as you do not mix tabs and spaces in the same block." 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 4, 132 | "metadata": { 133 | "scrolled": true 134 | }, 135 | "outputs": [ 136 | { 137 | "name": "stdout", 138 | "output_type": "stream", 139 | "text": [ 140 | "String: Hello world!\n", 141 | "Numeric: Hello 123!\n", 142 | "Numeric: Hello 1.45!\n", 143 | "We don't greet strangers!\n" 144 | ] 145 | } 146 | ], 147 | "source": [ 148 | "def printInput(name):\n", 149 | " if type(name) is str:\n", 150 | " print(\"String: Hello \" + name + '!')\n", 151 | " elif type(name) is int or type(name) is float:\n", 152 | " print(\"Numeric: Hello \" + str(name) + '!')\n", 153 | " else:\n", 154 | " print(\"We don't greet strangers!\")\n", 155 | "\n", 156 | "printInput(\"world\")\n", 157 | "printInput(123)\n", 158 | "printInput(1.45)\n", 159 | "printInput(None)\n" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "**In Python, variables are passed as ``object references`` to functions. In R, they are passed as values.**\n", 167 | "\n", 168 | "Python uses static scoping. That is, you can figure out the scope of a variable by just looking at the code. Also, variables inside a function in Python are local to that function and they cannot be accessed from outside. However, passing a variable to a function in Python is a bit tricky. 
When you pass a variable to a function, Python creates a new local name that refers to the same object. For this reason, if you modify a mutable object (like a list) in place inside a Python function, the change is also visible to the caller. This is called ``unintended aliasing`` and it can result in hard-to-find bugs if you don't pay attention. Here is an example." 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 5, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "x = [1, 2, 3]" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 6, 183 | "metadata": {}, 184 | "outputs": [], 185 | "source": [ 186 | "def lst(x):\n", 187 | " x.append(4)\n", 188 | " return x" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 7, 194 | "metadata": {}, 195 | "outputs": [ 196 | { 197 | "data": { 198 | "text/plain": [ 199 | "[1, 2, 3, 4]" 200 | ] 201 | }, 202 | "execution_count": 7, 203 | "metadata": {}, 204 | "output_type": "execute_result" 205 | } 206 | ], 207 | "source": [ 208 | "lst(x)" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 8, 214 | "metadata": {}, 215 | "outputs": [ 216 | { 217 | "data": { 218 | "text/plain": [ 219 | "[1, 2, 3, 4]" 220 | ] 221 | }, 222 | "execution_count": 8, 223 | "metadata": {}, 224 | "output_type": "execute_result" 225 | } 226 | ], 227 | "source": [ 228 | "# Variable x globally changed too!\n", 229 | "x" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "If this is not what you want, you need to explicitly tell Python to create a copy of x and call it y. This way, both variables will be independent of each other." 
237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 9, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "x = [1, 2, 3]\n", 246 | "def lst2(x):\n", 247 | " y = list(x) # create a copy of x and not a reference\n", 248 | " y.append(4) # change the copy\n", 249 | " return y" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": 10, 255 | "metadata": {}, 256 | "outputs": [ 257 | { 258 | "data": { 259 | "text/plain": [ 260 | "[1, 2, 3, 4]" 261 | ] 262 | }, 263 | "execution_count": 10, 264 | "metadata": {}, 265 | "output_type": "execute_result" 266 | } 267 | ], 268 | "source": [ 269 | "lst2(x)" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": 11, 275 | "metadata": {}, 276 | "outputs": [ 277 | { 278 | "data": { 279 | "text/plain": [ 280 | "[1, 2, 3]" 281 | ] 282 | }, 283 | "execution_count": 11, 284 | "metadata": {}, 285 | "output_type": "execute_result" 286 | } 287 | ], 288 | "source": [ 289 | "# Variable x is still [1, 2, 3]\n", 290 | "x" 291 | ] 292 | }, 293 | { 294 | "cell_type": "markdown", 295 | "metadata": {}, 296 | "source": [ 297 | "On the other hand, if you assign a different value to the reference pointing to an outside variable, that change will NOT be reflected outside, because remember, this reference is local to the function. Here is a simple example." 
298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": 12, 303 | "metadata": {}, 304 | "outputs": [], 305 | "source": [ 306 | "x = [1, 2, 3]\n", 307 | "def lst3(x):\n", 308 | " x = [4, 5, 6]\n", 309 | " x.append(7)\n", 310 | " return x" 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": 13, 316 | "metadata": {}, 317 | "outputs": [ 318 | { 319 | "data": { 320 | "text/plain": [ 321 | "[4, 5, 6, 7]" 322 | ] 323 | }, 324 | "execution_count": 13, 325 | "metadata": {}, 326 | "output_type": "execute_result" 327 | } 328 | ], 329 | "source": [ 330 | "lst3(x)" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 14, 336 | "metadata": {}, 337 | "outputs": [ 338 | { 339 | "data": { 340 | "text/plain": [ 341 | "[1, 2, 3]" 342 | ] 343 | }, 344 | "execution_count": 14, 345 | "metadata": {}, 346 | "output_type": "execute_result" 347 | } 348 | ], 349 | "source": [ 350 | "# Variable x is still [1, 2, 3]\n", 351 | "x" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "**Assignment is not always what you think.**" 359 | ] 360 | }, 361 | { 362 | "cell_type": "code", 363 | "execution_count": 15, 364 | "metadata": {}, 365 | "outputs": [], 366 | "source": [ 367 | "a1 = [1,1]\n", 368 | "a2 = [1,1]" 369 | ] 370 | }, 371 | { 372 | "cell_type": "code", 373 | "execution_count": 16, 374 | "metadata": {}, 375 | "outputs": [], 376 | "source": [ 377 | "# This simply creates an alias: both a1 and b point to the \n", 378 | "# same object in the computer memory\n", 379 | "b = a1" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 17, 385 | "metadata": {}, 386 | "outputs": [], 387 | "source": [ 388 | "b[0] = 'boo!'" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": 18, 394 | "metadata": {}, 395 | "outputs": [ 396 | { 397 | "name": "stdout", 398 | "output_type": "stream", 399 | "text": [ 400 | "['boo!', 1]\n" 401 | ] 402 | } 403 
| ], 404 | "source": [ 405 | "print(a1)" 406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "If you want a real copy, do either one of the below:" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 19, 418 | "metadata": {}, 419 | "outputs": [ 420 | { 421 | "data": { 422 | "text/plain": [ 423 | "[1, 1]" 424 | ] 425 | }, 426 | "execution_count": 19, 427 | "metadata": {}, 428 | "output_type": "execute_result" 429 | } 430 | ], 431 | "source": [ 432 | "c = list(a2)\n", 433 | "# OR\n", 434 | "c = a2[:]\n", 435 | "c" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": {}, 441 | "source": [ 442 | "How to check if two variables point to the same address in the memory:" 443 | ] 444 | }, 445 | { 446 | "cell_type": "code", 447 | "execution_count": 20, 448 | "metadata": {}, 449 | "outputs": [ 450 | { 451 | "data": { 452 | "text/plain": [ 453 | "True" 454 | ] 455 | }, 456 | "execution_count": 20, 457 | "metadata": {}, 458 | "output_type": "execute_result" 459 | } 460 | ], 461 | "source": [ 462 | "b is a1" 463 | ] 464 | }, 465 | { 466 | "cell_type": "code", 467 | "execution_count": 21, 468 | "metadata": {}, 469 | "outputs": [ 470 | { 471 | "data": { 472 | "text/plain": [ 473 | "False" 474 | ] 475 | }, 476 | "execution_count": 21, 477 | "metadata": {}, 478 | "output_type": "execute_result" 479 | } 480 | ], 481 | "source": [ 482 | "c is a2" 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": {}, 488 | "source": [ 489 | "Tricky! 
How to check if two variables have the same value:" 490 | ] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "execution_count": 22, 495 | "metadata": {}, 496 | "outputs": [ 497 | { 498 | "data": { 499 | "text/plain": [ 500 | "True" 501 | ] 502 | }, 503 | "execution_count": 22, 504 | "metadata": {}, 505 | "output_type": "execute_result" 506 | } 507 | ], 508 | "source": [ 509 | "c == a2" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": {}, 515 | "source": [ 516 | "Python shares references instead of copying objects for memory efficiency. With immutable base types such as integers and strings, this causes no surprises, because rebinding one name leaves the other untouched:" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": 23, 522 | "metadata": {}, 523 | "outputs": [ 524 | { 525 | "name": "stdout", 526 | "output_type": "stream", 527 | "text": [ 528 | "1\n" 529 | ] 530 | } 531 | ], 532 | "source": [ 533 | "a = 1\n", 534 | "b = a\n", 535 | "b = 'boo!'\n", 536 | "print(a)" 537 | ] 538 | }, 539 | { 540 | "cell_type": "markdown", 541 | "metadata": {}, 542 | "source": [ 543 | "**In R, to perform exponentiation, you can use either the caret symbol or double asterisk. 
In Python, you can only use the double asterisk because ^ is bitwise XOR in Python.**" 544 | ] 545 | }, 546 | { 547 | "cell_type": "markdown", 548 | "metadata": {}, 549 | "source": [ 550 | "So, here is $2^3$ (notice how you can embed LaTeX code inside a notebook):" 551 | ] 552 | }, 553 | { 554 | "cell_type": "markdown", 555 | "metadata": {}, 556 | "source": [ 557 | "**R**:\n", 558 | "```R\n", 559 | "Input: 2**3\n", 560 | "\n", 561 | "Output: 8\n", 562 | "\n", 563 | "Input: 2^3\n", 564 | "\n", 565 | "Output: 8\n", 566 | "```" 567 | ] 568 | }, 569 | { 570 | "cell_type": "code", 571 | "execution_count": 24, 572 | "metadata": {}, 573 | "outputs": [ 574 | { 575 | "data": { 576 | "text/plain": [ 577 | "8" 578 | ] 579 | }, 580 | "execution_count": 24, 581 | "metadata": {}, 582 | "output_type": "execute_result" 583 | } 584 | ], 585 | "source": [ 586 | "2**3" 587 | ] 588 | }, 589 | { 590 | "cell_type": "code", 591 | "execution_count": 25, 592 | "metadata": {}, 593 | "outputs": [ 594 | { 595 | "data": { 596 | "text/plain": [ 597 | "1" 598 | ] 599 | }, 600 | "execution_count": 25, 601 | "metadata": {}, 602 | "output_type": "execute_result" 603 | } 604 | ], 605 | "source": [ 606 | "2^3" 607 | ] 608 | }, 609 | { 610 | "cell_type": "markdown", 611 | "metadata": {}, 612 | "source": [ 613 | "**In R, you can usually use dot when naming variables and functions. In Python, you use dot to access methods and attributes of classes and objects. 
In Python, you cannot use dot when naming anything.**" 614 | ] 615 | }, 616 | { 617 | "cell_type": "markdown", 618 | "metadata": {}, 619 | "source": [ 620 | "**R**:\n", 621 | "```R\n", 622 | "my.integer.variable <- 5\n", 623 | "```" 624 | ] 625 | }, 626 | { 627 | "cell_type": "code", 628 | "execution_count": 26, 629 | "metadata": {}, 630 | "outputs": [ 631 | { 632 | "name": "stdout", 633 | "output_type": "stream", 634 | "text": [ 635 | "[1, 2, 3]\n", 636 | "[1, 2, 3, 4]\n" 637 | ] 638 | } 639 | ], 640 | "source": [ 641 | "a = [1,2,3]\n", 642 | "print(a)\n", 643 | "\n", 644 | "a.append(4)\n", 645 | "print(a)" 646 | ] 647 | }, 648 | { 649 | "cell_type": "markdown", 650 | "metadata": {}, 651 | "source": [ 652 | "**In R, by default, reshaping of data happens column-wise. The default behaviour in Python is to reshape row-wise. This can cause subtle bugs that are hard to catch.**" 653 | ] 654 | }, 655 | { 656 | "cell_type": "markdown", 657 | "metadata": {}, 658 | "source": [ 659 | "**R**:\n", 660 | "```R\n", 661 | "matrix(0:9, nrow=2, ncol=5)\n", 662 | " [,1] [,2] [,3] [,4] [,5]\n", 663 | "[1,] 0 2 4 6 8\n", 664 | "[2,] 1 3 5 7 9\n", 665 | "```" 666 | ] 667 | }, 668 | { 669 | "cell_type": "code", 670 | "execution_count": 27, 671 | "metadata": {}, 672 | "outputs": [ 673 | { 674 | "data": { 675 | "text/plain": [ 676 | "array([[0, 1, 2, 3, 4],\n", 677 | " [5, 6, 7, 8, 9]])" 678 | ] 679 | }, 680 | "execution_count": 27, 681 | "metadata": {}, 682 | "output_type": "execute_result" 683 | } 684 | ], 685 | "source": [ 686 | "import numpy as np\n", 687 | "np.arange(10).reshape(2, 5)" 688 | ] 689 | }, 690 | { 691 | "cell_type": "markdown", 692 | "metadata": {}, 693 | "source": [ 694 | "However, you can force Python to do column-wise reshaping by setting the `order` parameter to 'F' inside the `reshape` function." 
695 | ] 696 | }, 697 | { 698 | "cell_type": "code", 699 | "execution_count": 28, 700 | "metadata": {}, 701 | "outputs": [ 702 | { 703 | "data": { 704 | "text/plain": [ 705 | "array([[0, 2, 4, 6, 8],\n", 706 | " [1, 3, 5, 7, 9]])" 707 | ] 708 | }, 709 | "execution_count": 28, 710 | "metadata": {}, 711 | "output_type": "execute_result" 712 | } 713 | ], 714 | "source": [ 715 | "import numpy as np\n", 716 | "np.arange(10).reshape(2, 5, order='F')" 717 | ] 718 | }, 719 | { 720 | "cell_type": "markdown", 721 | "metadata": {}, 722 | "source": [ 723 | "***" 724 | ] 725 | } 726 | ], 727 | "metadata": { 728 | "kernelspec": { 729 | "display_name": "Python 3 (ipykernel)", 730 | "language": "python", 731 | "name": "python3" 732 | }, 733 | "language_info": { 734 | "codemirror_mode": { 735 | "name": "ipython", 736 | "version": 3 737 | }, 738 | "file_extension": ".py", 739 | "mimetype": "text/x-python", 740 | "name": "python", 741 | "nbconvert_exporter": "python", 742 | "pygments_lexer": "ipython3", 743 | "version": "3.11.9" 744 | } 745 | }, 746 | "nbformat": 4, 747 | "nbformat_minor": 4 748 | } 749 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ![Python](https://img.shields.io/badge/Python-3.x-blue.svg) 2 | ![License](https://img.shields.io/badge/License-MIT-brightgreen.svg) 3 | 4 | # Python Basics (PB) Tutorial Series 5 | 6 | Welcome to the Python Basics Tutorial Series! This repository contains a collection of Jupyter notebooks designed to help you learn the fundamental concepts and modules of Python programming together with Python's Data Science stack. These notebooks were written by yours truly, David Akman, and are my own work for the most part. They have been tested with Python 3.11. 7 | 8 | Each notebook focuses on a specific topic and provides a hands-on approach to learning through code examples and exercises. 
9 | 10 | ## Notebooks Overview 11 | 12 | 1. **PB1_nb_intro.ipynb** 13 | - **Introduction to Jupyter Notebooks**: Learn the basics of using Jupyter Notebooks, including how to check for Python and module versions, spellchecking, and reading in CSV files. Understand the notebook's essential features and package management in Python. 14 | 15 | 2. **PB2_nb_markdown.ipynb** 16 | - **Markdown in Jupyter Notebooks**: Explore how to use Markdown to format text in Jupyter Notebooks. Learn how to create headers, lists, links, images, and more to document your code effectively. 17 | 18 | 3. **PB3_intro_to_python.ipynb** 19 | - **Introduction to Python**: Get started with Python programming. This notebook covers the basic syntax, variables, data types, and control structures such as loops and conditionals. 20 | 21 | 4. **PB4_numpy.ipynb** 22 | - **NumPy Basics**: Dive into NumPy, the fundamental package for numerical computing in Python. Learn about arrays, array operations, and essential functions for scientific computing. 23 | 24 | 5. **PB5_pandas.ipynb** 25 | - **Pandas Basics**: Discover the power of Pandas for data manipulation and analysis. This notebook covers DataFrames, Series, data cleaning, and data transformation techniques. 26 | 27 | 6. **PB6_matplotlib.ipynb** 28 | - **Matplotlib for Data Visualisation**: Learn how to create visualisations using Matplotlib. This notebook covers basic plots, customisation options, and advanced plotting techniques. 29 | 30 | 7. **PB7_seaborn.ipynb** 31 | - **Seaborn for Statistical Plots**: Explore Seaborn, a Python visualisation library based on Matplotlib. Learn how to create attractive and informative statistical graphics. 32 | 33 | 8. **PB8_python_vs_r.ipynb** 34 | - **Python vs. R**: Compare Python and R, two popular languages for data analysis. If you are coming from an R programming background, you should definitely have a look at this tutorial for some key differences between these two languages at a basic level. 
35 | 36 | ## Getting Started 37 | 38 | To get started with these notebooks, you need to have Jupyter Notebook installed on your machine. If you haven't installed Jupyter Notebook yet, you can do so by following the instructions on the [Jupyter website](https://jupyter.org/install). 39 | 40 | Once Jupyter Notebook is installed, you can clone this repository and open the notebooks in Jupyter Notebook or JupyterLab: 41 | 42 | ```sh 43 | git clone 44 | cd 45 | jupyter notebook 46 | ``` 47 | 48 | 49 | --------------------------------------------------------------------------------