├── README.md ├── lecture_learning_objectives.md └── lectures ├── circle.py ├── lecture1.ipynb ├── lecture2.ipynb ├── lecture3.ipynb ├── lecture4.ipynb └── listCopySmall.jpg /README.md: -------------------------------------------------------------------------------- 1 | # DSCI 511: Python Programming for Data Science 2 | 3 | Program design and data manipulation with Python. Overview of data structures, iteration, flow control, program design, and using libraries for data exploration and analysis. 4 | 5 | ## Course Learning Outcomes 6 | 7 |

10 | 11 | By the end of the course, students are expected to: 12 | 13 | 1. Translate fundamental programming concepts such as loops, conditionals, etc. into Python code. 14 | 2. Understand the key data structures in Python. 15 | 3. Understand how to write functions in Python and assess if they are correct via unit testing. 16 | 4. Know when and how to abstract code (e.g., into functions, or classes) to make it more modular and robust. 17 | 5. Produce human-readable code that incorporates best practices of programming, documentation, and coding style. 18 | 6. Use NumPy to perform common data wrangling and computational tasks in Python. 19 | 7. Use Pandas to create and manipulate data structures like Series and DataFrames. 20 | 8. Wrangle different types of data in Pandas including numeric data, strings, and datetimes. 21 | 22 | Specific learning objectives can be found in the [Lecture Learning Objectives](lecture_learning_objectives.md) document. 23 | 24 |

25 |
26 | 27 | 28 | ### Lectures 29 | 30 | The table below shows the general lecture outline; see the [Lecture Learning Objectives](lecture_learning_objectives.md) document for lecture-specific learning objectives. 31 | 32 | | Lecture | Topic | Optional Pre-readings | Practice exercises | 33 | | :-----: | ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------ | 34 | | 1 | Basics | [WTP: Section 3 - Section 7](https://jakevdp.github.io/WhirlwindTourOfPython/index.html) | 35 | | 2 | Loops & Functions | [WTP: Section 8 - Section 13](https://jakevdp.github.io/WhirlwindTourOfPython/index.html)
[PEP 257: Docstrings](https://www.python.org/dev/peps/pep-0257/)
[NumPy docstring examples](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html) | 36 | | 3 | Unit Tests & Classes | [Python documentation: 9. Classes](https://docs.python.org/3/tutorial/classes.html)
[Think Python](http://greenteapress.com/thinkpython/html/index.html): "Classes and objects", "Classes and functions", "Classes and methods" | 37 | | 4 | Style Guides, Scripts, Imports | [PEP 8: Style Guide](https://www.python.org/dev/peps/pep-0008/)
[Getting Started with Python in VS Code](https://code.visualstudio.com/docs/python/python-tutorial) up to "Run Hello World"
[Python documentation: 5. The import system](https://docs.python.org/3/reference/import.html) | 38 | | 5 | Introduction to NumPy | [PDSH: Introduction to NumPy](https://jakevdp.github.io/PythonDataScienceHandbook/02.00-introduction-to-numpy.html)
[NumPy documentation: Quickstart tutorial](https://numpy.org/doc/1.19/) | 39 | | 6 | Introduction to Pandas | [PDSH: Data Manipulation with Pandas](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html) up to "Operating on Data in Pandas"
[Pandas documentation: 10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html), up to "Selection" | 40 | | 7 | Basic Data Wrangling with Pandas | [PDSH: Data Manipulation with Pandas](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html)
[Pandas documentation: 10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html) | 41 | | 8 | Advanced Data Wrangling with Pandas | [PDSH: Data Manipulation with Pandas](https://jakevdp.github.io/PythonDataScienceHandbook/03.00-introduction-to-pandas.html)
[Pandas documentation: 10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html) | 42 | 43 | ### Labs and Quizzes 44 | 45 | You are responsible for the following deliverables, which will determine your course grade: 46 | 47 | | Assessment | Weight | Due Date | Location | 48 | |------------------|--------|----------------------------|---------------------------| 49 | | Lab Assignment 1 | 15% | Sunday, Sept 13 at 18:00 | Submit to Github & Canvas | 50 | | Lab Assignment 2 | 15% | Saturday, Sept 19 at 18:00 | Submit to Github & Canvas | 51 | | Quiz 1 | 20% | Tuesday, Sept 22 at 14:00 | Online | 52 | | Lab Assignment 3 | 15% | Saturday, Sept 26 at 18:00 | Submit to Github & Canvas | 53 | | Lab Assignment 4 | 15% | Saturday, Oct 3 at 18:00 | Submit to Github & Canvas | 54 | | Quiz 2 | 20% | Tuesday, Oct 6 at 10:00 | Online | 55 | 56 | 57 | Quizzes will be held in week 3 and week 5, are open book and are typically 30 mins long with a focus on short-answer questions. More information on quizzes will be provided closer to their dates. 58 | 59 | 60 | ## Optional Additional Reference/Learning Materials 61 | 62 | * [Python documentation](https://docs.python.org/3/index.html) 63 | * [Think Python: How to Think Like a Computer Scientist](https://greenteapress.com/wp/think-python/) 64 | * [A Whirlwind Tour of Python (WTP)](https://jakevdp.github.io/WhirlwindTourOfPython/index.html), Jake VanderPlas (O’Reilly). Copyright 2016 O’Reilly Media, Inc., 978-1-491-96465-1. 65 | * [Python Data Science Handbook (PDSH)](https://github.com/jakevdp/PythonDataScienceHandbook), Jake VanderPlas (O’Reilly). Copyright 2016 O’Reilly Media, Inc., 978-1-491-91205-8. 66 | * [Python for Data Analysis](http://webcat1.library.ubc.ca/vwebv/holdingsInfo?searchId=1382036&recCount=10&recPointer=0&bibId=7430458), Wes McKinney (O'Reilly). Copyright 2013 O’Reilly Media, Inc, *you can download chapters from the book for free from the UBC library*. 
67 | * [Kaggle Learn Python Tutorials](https://www.kaggle.com/learn/python) 68 | -------------------------------------------------------------------------------- /lecture_learning_objectives.md: -------------------------------------------------------------------------------- 1 | # Lecture Learning Objectives 2 | 3 | ## Lecture 1: Python Basics 4 | 5 | - Create, describe and differentiate standard Python datatypes such as `int`, `float`, `str`, `list`, `dict`, `tuple`, etc. 6 | - Perform arithmetic operations like `+`, `-`, `*`, `**` on numeric values. 7 | - Perform basic string operations like `.lower()`, `.split()` to manipulate strings. 8 | - Compute boolean values using comparison operators (`==`, `!=`, `>`, etc.) and boolean operators (`and`, `or`, `not`). 9 | - Assign, index, slice and subset values to and from tuples, lists, strings and dictionaries. 10 | - Write a conditional statement with `if`, `elif` and `else`. 11 | - Identify code blocks by levels of indentation. 12 | - Explain the difference between mutable objects like a `list` and immutable objects like a `tuple`. 13 | 14 | ## Lecture 2: Loops & Functions 15 | 16 | - Write `for` and `while` loops in Python. 17 | - Identify iterable datatypes which can be used in `for` loops. 18 | - Create a `list`, `dictionary`, or `set` using comprehension. 19 | - Write a `try`/`except` statement. 20 | - Define a function and an anonymous function in Python. 21 | - Describe the difference between positional and keyword arguments. 22 | - Describe the difference between local and global variables. 23 | - Apply the `DRY principle` to write modular code. 24 | - Assess whether a function has side effects. 25 | - Write a docstring for a function that describes parameters, return values, behaviour and usage. 26 | 27 | ## Lecture 3: Unit Tests & Classes 28 | 29 | - Formulate a test case to prove a function design specification. 30 | - Use an `assert` statement to validate a test case. 
31 | - Debug Python code with the `pdb` module, or by using `%debug` in a Jupyter code cell. 32 | - Describe the difference between a `class` and a `function` in Python. 33 | - Be able to create a `class`. 34 | - Differentiate between `instance attributes` and `class attributes`. 35 | - Differentiate between `methods`, `class methods` and `static methods`. 36 | - Understand and implement `subclassing`/`inheritance` with Python classes. 37 | 38 | ## Lecture 4: Style Guides, Scripts & Imports 39 | 40 | - Describe why code style is important. 41 | - Differentiate between the role of a linter like `flake8` and an autoformatter like `black`. 42 | - Implement linting and formatting from the command line or within Jupyter or another IDE. 43 | - Write a Python module (`.py` file) in VS Code or another IDE of your choice. 44 | - Import installed or custom packages using the `import` syntax. 45 | - Explain the notion of a reference in Python. 46 | - Explain the notion of scoping in Python. 47 | - Anticipate whether changing one variable will change another in Python. 48 | - Anticipate whether a function changes the caller's version of an argument variable in Python. 49 | - Select the appropriate choice between `==` and `is` in Python. 50 | 51 | ## Lecture 5: Introduction to NumPy 52 | 53 | - Use NumPy to create arrays with built-in functions including `np.array()`, `np.arange()`, `np.linspace()`, `np.full()`, `np.zeros()` and `np.ones()`. 54 | - Be able to access values from a NumPy array by numeric indexing and slicing and boolean indexing. 55 | - Perform mathematical operations on and with arrays. 56 | - Explain what broadcasting is and how to use it. 57 | - Reshape arrays by adding/removing/reshaping axes with `.reshape()`, `np.newaxis`, `.ravel()`, `.flatten()`. 58 | - Understand how to use built-in NumPy functions like `np.sum()`, `np.mean()`, `np.log()` as standalone functions or as methods of NumPy arrays (when available). 
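As a rough illustration of the Lecture 5 objectives above, the sketch below exercises array creation, reshaping, broadcasting and boolean indexing. The variable names (`a`, `grid`, `col`, `offsets`) are made up for this example and are not taken from the lecture notebooks.

```python
import numpy as np

# Array creation and reshaping (illustrative names, not from the lectures).
a = np.arange(6)                 # array([0, 1, 2, 3, 4, 5])
grid = a.reshape(2, 3)           # 2 rows, 3 columns
col = a[:, np.newaxis]           # np.newaxis is used inside an index, not called

# Broadcasting: the (3,) array is added to each row of the (2, 3) array.
offsets = np.array([10, 20, 30])
shifted = grid + offsets         # [[10, 21, 32], [13, 24, 35]]

# Boolean indexing keeps the elements where the mask is True.
evens = a[a % 2 == 0]            # array([0, 2, 4])

# Many reductions exist both as functions and as array methods.
total_fn = np.sum(grid)
total_method = grid.sum()
```

Note that `np.newaxis` is a constant used while indexing, which is why it appears without parentheses here.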
59 | 60 | ## Lecture 6: Introduction to Pandas 61 | 62 | - Create Pandas series with `pd.Series()` and Pandas dataframes with `pd.DataFrame()`. 63 | - Be able to access values from a Series/DataFrame by indexing, slicing and boolean indexing using notation such as `df[]`, `df.loc[]`, `df.iloc[]`, `df.query()`. 64 | - Perform basic arithmetic operations between two series and anticipate the result. 65 | - Describe how Pandas assigns dtypes to Series and what the `object` dtype is. 66 | - Read a standard .csv file from a local path or url using Pandas `pd.read_csv()`. 67 | - Apply basic operations to a dataframe like `.min()`, `.mean()`, `.sort_values()`, etc. 68 | - Explain the relationship and differences between `np.ndarray`, `pd.Series` and `pd.DataFrame` objects in Python. 69 | 70 | ## Lecture 7: Basic Data Wrangling with Pandas 71 | 72 | - Inspect a dataframe with `df.head()`, `df.tail()`, `df.info()`, `df.describe()`. 73 | - Obtain dataframe summaries with `df.info()` and `df.describe()`. 74 | - Manipulate how a dataframe displays in Jupyter by modifying Pandas configuration options such as `pd.set_option("display.max_rows", n)`. 75 | - Rename columns of a dataframe using the `df.rename()` function or by accessing the `df.columns` attribute. 76 | - Modify the index name and index values of a dataframe using `.set_index()`, `.reset_index()`, `df.index.name`, `.index`. 77 | - Use `df.melt()` and `df.pivot()` to reshape dataframes, specifically to make tidy dataframes. 78 | - Combine dataframes using `df.merge()` and `pd.concat()` and know when to use these different methods. 79 | - Apply functions to a dataframe with `df.apply()` and `df.applymap()`. 80 | - Perform grouping and aggregating operations using `df.groupby()` and `df.agg()`. 81 | - Perform aggregating methods on grouped or ungrouped objects such as finding the minimum, maximum and sum of values in a dataframe using `df.agg()`. 
82 | - Remove or fill missing values in a dataframe with `df.dropna()` and `df.fillna()`. 83 | - Understand what the `SettingWithCopyWarning` means in Pandas. 84 | 85 | ## Lecture 8: Advanced Data Wrangling with Pandas 86 | 87 | - Manipulate strings in Pandas by accessing methods from the `Series.str` attribute. 88 | - Understand how to use regular expressions in Pandas for wrangling strings. 89 | - Differentiate between datetime objects in Pandas such as `Timestamp`, `Timedelta`, `Period`, `DateOffset`. 90 | - Create these datetime objects with functions like `pd.Timestamp()`, `pd.Period()`, `pd.date_range()`, `pd.period_range()`. 91 | - Index a datetime index with partial string indexing. 92 | - Perform basic datetime operations like splitting a datetime into constituent parts (e.g., `year`, `weekday`, `second`, etc.), apply offsets, change timezones, and resample with `.resample()`. 93 | - Make basic plots in Pandas by accessing the `.plot` attribute or importing functions from `pandas.plotting`. 94 | -------------------------------------------------------------------------------- /lectures/circle.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | class Circle: 4 | """A circle with a centre (x,y) and radius r.""" 5 | 6 | def __init__(self, x, y, r): 7 | self.x = x 8 | self.y = y 9 | self.r = r 10 | 11 | def area(self): 12 | return np.pi * self.r**2 13 | 14 | def circumference(self): 15 | return 2.0 * np.pi * self.r 16 | 17 | def dist(self): 18 | """Compute the distance from the circle's edge to the origin.""" 19 | return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r) 20 | 21 | def dist_between(self, other): 22 | """Compute the distance between this circle and another circle.""" 23 | return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r) 24 | 25 | def translate(self, Δx, Δy): 26 | """Move the circle by (Δx, Δy)""" 27 | self.x += Δx 28 | self.y += Δy 29 | return self # This is not needed, but is sometimes convenient. 
30 | 31 | def __str__(self): 32 | return "A Circle at (%.1f, %.1f) with radius %.1f." % (self.x, self.y, self.r) 33 | 34 | 35 | MY_CONSTANT = 5 36 | 37 | def my_function(): 38 | pass 39 | -------------------------------------------------------------------------------- /lectures/lecture2.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# DSCI 511 Lecture 2" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "- Comments (0 min)\n", 15 | "- Why Python? (0 min)\n", 16 | "- Loops (15 min)\n", 17 | "- Comprehensions (5 min)\n", 18 | "- Functions intro (10 min)\n", 19 | "- DRY principle (15 min)\n", 20 | "- Break (5 min)\n", 21 | "- Keyword arguments (5 min)\n", 22 | "- Docstrings (10 min)\n", 23 | "- Unit tests, corner cases (10 min)\n", 24 | "- Multiple return values (5 min)" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "## Announcements\n", 32 | "\n", 33 | "- There are a few corrections to the lab, see the [Issues](https://github.ubc.ca/MDS-2019-20/DSCI_511_prog-dsci_students/issues) in the course repo (you should have received an email about these Issues)." 
34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "## Comments (0 min)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "code", 45 | "execution_count": null, 46 | "metadata": {}, 47 | "outputs": [], 48 | "source": [ 49 | "x = 1 # this is a comment" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": null, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "\"\"\"\n", 59 | "this is a string, which does nothing\n", 60 | "and can be used as a comment\n", 61 | "\"\"\"\n", 62 | "\n", 63 | "7\n", 64 | "\n", 65 | "\n", 66 | "x = 1" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": {}, 72 | "source": [ 73 | "## Why Python? (0 min)\n", 74 | "\n", 75 | "- Why did we choose Python in the MDS program?\n", 76 | " - Extremely popular in DS (and beyond!)\n", 77 | " - Relatively easy to learn\n", 78 | " - Good documentation\n", 79 | " - **Huge user community**\n", 80 | " - Lots of Stack Overflow and other forums\n", 81 | " - Lots of useful packages (more on this next week)" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "## Loops (10 min)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "markdown", 93 | "metadata": {}, 94 | "source": [ 95 | "- Loops allow us to execute a block of code multiple times. 
\n", 96 | "- We will focus on [`for` loops](https://docs.python.org/3/tutorial/controlflow.html#for-statements)" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": null, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "for n in [2, 7, -1, 5]:\n", 106 | " print(\"The number is\", n, \"its square is\", n**2)\n", 107 | " # this is inside the loop\n", 108 | "# this is outside the loop" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "The main points to notice:\n", 116 | "\n", 117 | "* Keyword `for` begins the loop\n", 118 | "* Colon `:` ends the first line of the loop\n", 119 | "* We can iterate over any kind of iterable: list, tuple, range, string. In this case, we are iterating over the values in a list\n", 120 | "* Block of code indented is executed for each value in the list (hence the name \"for\" loops, sometimes also called \"for each\" loops)\n", 121 | "* The loop ends after the variable `n` has taken all the values in the list" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 3, 127 | "metadata": {}, 128 | "outputs": [ 129 | { 130 | "data": { 131 | "text/plain": [ 132 | "'abcdef'" 133 | ] 134 | }, 135 | "execution_count": 3, 136 | "metadata": {}, 137 | "output_type": "execute_result" 138 | } 139 | ], 140 | "source": [ 141 | "\"abc\" + \"def\"" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": 2, 147 | "metadata": {}, 148 | "outputs": [ 149 | { 150 | "name": "stdout", 151 | "output_type": "stream", 152 | "text": [ 153 | "Gimme a P!\n", 154 | "Gimme a y!\n", 155 | "Gimme a t!\n", 156 | "Gimme a h!\n", 157 | "Gimme a o!\n", 158 | "Gimme a n!\n", 159 | "What's that spell?!! Python!\n" 160 | ] 161 | } 162 | ], 163 | "source": [ 164 | "word = \"Python\"\n", 165 | "for letter in word:\n", 166 | " print(\"Gimme a \" + letter + \"!\")\n", 167 | "\n", 168 | "print(\"What's that spell?!! 
\" + word + \"!\")" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": {}, 174 | "source": [ 175 | "- A very common pattern is to use `for` with `range`. \n", 176 | "- `range` gives you a sequence of integers up to some value." 177 | ] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "execution_count": 5, 182 | "metadata": {}, 183 | "outputs": [ 184 | { 185 | "name": "stdout", 186 | "output_type": "stream", 187 | "text": [ 188 | "0\n", 189 | "1\n", 190 | "2\n", 191 | "3\n", 192 | "4\n", 193 | "5\n", 194 | "6\n", 195 | "7\n", 196 | "8\n", 197 | "9\n" 198 | ] 199 | } 200 | ], 201 | "source": [ 202 | "for i in range(10):\n", 203 | " print(i)" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "We can also specify a start value and a skip-by value with `range`:" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": 6, 216 | "metadata": {}, 217 | "outputs": [ 218 | { 219 | "name": "stdout", 220 | "output_type": "stream", 221 | "text": [ 222 | "1\n", 223 | "11\n", 224 | "21\n", 225 | "31\n", 226 | "41\n", 227 | "51\n", 228 | "61\n", 229 | "71\n", 230 | "81\n", 231 | "91\n" 232 | ] 233 | } 234 | ], 235 | "source": [ 236 | "for i in range(1,101,10):\n", 237 | " print(i)" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "We can write a loop inside another loop to iterate over multiple dimensions of data. Consider the following loop as enumerating the coordinates in a 3 by 3 grid of points." 
245 | ] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "execution_count": 8, 250 | "metadata": {}, 251 | "outputs": [ 252 | { 253 | "name": "stdout", 254 | "output_type": "stream", 255 | "text": [ 256 | "(1, 'a')\n", 257 | "(1, 'b')\n", 258 | "(1, 'c')\n", 259 | "(2, 'a')\n", 260 | "(2, 'b')\n", 261 | "(2, 'c')\n", 262 | "(3, 'a')\n", 263 | "(3, 'b')\n", 264 | "(3, 'c')\n" 265 | ] 266 | } 267 | ], 268 | "source": [ 269 | "for x in [1,2,3]:\n", 270 | " for y in [\"a\",\"b\",\"c\"]:\n", 271 | " print((x,y))" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "execution_count": 9, 277 | "metadata": {}, 278 | "outputs": [ 279 | { 280 | "name": "stdout", 281 | "output_type": "stream", 282 | "text": [ 283 | "1 a\n", 284 | "2 b\n", 285 | "3 c\n" 286 | ] 287 | } 288 | ], 289 | "source": [ 290 | "list_1 = [1,2,3]\n", 291 | "list_2 = [\"a\",\"b\",\"c\"]\n", 292 | "for i in range(3):\n", 293 | " print(list_1[i], list_2[i])" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "We can loop through key-value pairs of a dictionary using `.items()`:" 301 | ] 302 | }, 303 | { 304 | "cell_type": "code", 305 | "execution_count": 10, 306 | "metadata": {}, 307 | "outputs": [ 308 | { 309 | "name": "stdout", 310 | "output_type": "stream", 311 | "text": [ 312 | "DSCI 521 is awesome\n", 313 | "DSCI 551 is riveting\n", 314 | "DSCI 511 is naptime!\n" 315 | ] 316 | } 317 | ], 318 | "source": [ 319 | "courses = {521 : \"awesome\",\n", 320 | " 551 : \"riveting\",\n", 321 | " 511 : \"naptime!\"}\n", 322 | "\n", 323 | "for course_num, description in courses.items():\n", 324 | " print(\"DSCI\", course_num, \"is\", description)" 325 | ] 326 | }, 327 | { 328 | "cell_type": "code", 329 | "execution_count": 14, 330 | "metadata": {}, 331 | "outputs": [ 332 | { 333 | "name": "stdout", 334 | "output_type": "stream", 335 | "text": [ 336 | "521 awesome\n", 337 | "551 riveting\n", 338 | "511 naptime!\n" 339 | ] 340 | } 341 | ], 342 | "source": [ 
343 | "for course_num in courses:\n", 344 | " print(course_num, courses[course_num])" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": {}, 350 | "source": [ 351 | "Above: the general syntax is `for key, value in dictionary.items():`" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "#### `while` loops\n", 359 | "\n", 360 | "- We can also use a [`while` loop](https://docs.python.org/3/reference/compound_stmts.html#while) to execute a block of code several times. \n", 361 | "- In reality, I rarely use these.\n", 362 | "- Beware! If the conditional expression is always `True`, then you've got an infinite loop! \n", 363 | " - (Use the \"Stop\" button in the toolbar above, or Ctrl-C in the terminal, to kill the program if you get an infinite loop.)" 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": 12, 369 | "metadata": {}, 370 | "outputs": [ 371 | { 372 | "name": "stdout", 373 | "output_type": "stream", 374 | "text": [ 375 | "10\n", 376 | "9\n", 377 | "8\n", 378 | "7\n", 379 | "6\n", 380 | "5\n", 381 | "4\n", 382 | "3\n", 383 | "2\n", 384 | "1\n", 385 | "Blast off!\n" 386 | ] 387 | } 388 | ], 389 | "source": [ 390 | "n = 10\n", 391 | "while n > 0:\n", 392 | " print(n)\n", 393 | " n = n - 1\n", 394 | "\n", 395 | "print(\"Blast off!\")" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "## Comprehensions (5 min)\n", 403 | "\n", 404 | "Comprehensions allow us to build lists/tuples/sets/dictionaries in one convenient, compact line of code." 
405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": 18, 410 | "metadata": { 411 | "collapsed": false, 412 | "jupyter": { 413 | "outputs_hidden": false 414 | } 415 | }, 416 | "outputs": [ 417 | { 418 | "data": { 419 | "text/plain": [ 420 | "['o', 'e', 'e', 'm']" 421 | ] 422 | }, 423 | "execution_count": 18, 424 | "metadata": {}, 425 | "output_type": "execute_result" 426 | } 427 | ], 428 | "source": [ 429 | "words = [\"hello\", \"goodbye\", \"the\", \"antidisestablishmentarianism\"]\n", 430 | "\n", 431 | "y = [word[-1] for word in words] # list comprehension\n", 432 | "y" 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": 17, 438 | "metadata": {}, 439 | "outputs": [ 440 | { 441 | "data": { 442 | "text/plain": [ 443 | "['o', 'e', 'e', 'm']" 444 | ] 445 | }, 446 | "execution_count": 17, 447 | "metadata": {}, 448 | "output_type": "execute_result" 449 | } 450 | ], 451 | "source": [ 452 | "y = list()\n", 453 | "for word in words:\n", 454 | " y.append(word[-1])\n", 455 | "y" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 21, 461 | "metadata": {}, 462 | "outputs": [ 463 | { 464 | "name": "stdout", 465 | "output_type": "stream", 466 | "text": [ 467 | "<generator object <genexpr> at 0x106730228>\n" 468 | ] 469 | } 470 | ], 471 | "source": [ 472 | "y = (word[-1] for word in words) # this is NOT a tuple comprehension - more on generators later\n", 473 | "print(y)" 474 | ] 475 | }, 476 | { 477 | "cell_type": "code", 478 | "execution_count": 19, 479 | "metadata": {}, 480 | "outputs": [ 481 | { 482 | "name": "stdout", 483 | "output_type": "stream", 484 | "text": [ 485 | "{'m', 'o', 'e'}\n" 486 | ] 487 | } 488 | ], 489 | "source": [ 490 | "y = {word[-1] for word in words} # set comprehension\n", 491 | "print(y)" 492 | ] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "execution_count": 20, 497 | "metadata": { 498 | "collapsed": false, 499 | "jupyter": { 500 | "outputs_hidden": false 501 | } 502 | }, 503 | "outputs": [ 504 | { 
505 | "data": { 506 | "text/plain": [ 507 | "{'hello': 5, 'goodbye': 7, 'the': 3, 'antidisestablishmentarianism': 28}" 508 | ] 509 | }, 510 | "execution_count": 20, 511 | "metadata": {}, 512 | "output_type": "execute_result" 513 | } 514 | ], 515 | "source": [ 516 | "word_lengths = {word : len(word) for word in words} # dictionary comprehension\n", 517 | "word_lengths" 518 | ] 519 | }, 520 | { 521 | "cell_type": "markdown", 522 | "metadata": {}, 523 | "source": [ 524 | "## Functions intro (5 min)\n", 525 | "\n", 526 | "- Define a [**function**](https://docs.python.org/3/tutorial/controlflow.html#defining-functions) to re-use a block of code with different input parameters, also known as **arguments**. \n", 527 | "- For example, define a function called `square` which takes one input parameter `n` and returns the square `n**2`." 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 26, 533 | "metadata": {}, 534 | "outputs": [], 535 | "source": [ 536 | "def square(n):\n", 537 | " n_squared = n**2\n", 538 | " return n_squared" 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 27, 544 | "metadata": {}, 545 | "outputs": [ 546 | { 547 | "data": { 548 | "text/plain": [ 549 | "4" 550 | ] 551 | }, 552 | "execution_count": 27, 553 | "metadata": {}, 554 | "output_type": "execute_result" 555 | } 556 | ], 557 | "source": [ 558 | "square(2)" 559 | ] 560 | }, 561 | { 562 | "cell_type": "code", 563 | "execution_count": 28, 564 | "metadata": {}, 565 | "outputs": [ 566 | { 567 | "data": { 568 | "text/plain": [ 569 | "10000" 570 | ] 571 | }, 572 | "execution_count": 28, 573 | "metadata": {}, 574 | "output_type": "execute_result" 575 | } 576 | ], 577 | "source": [ 578 | "square(100)" 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": 29, 584 | "metadata": {}, 585 | "outputs": [ 586 | { 587 | "data": { 588 | "text/plain": [ 589 | "152399025" 590 | ] 591 | }, 592 | "execution_count": 29, 593 | "metadata": {}, 594 
| "output_type": "execute_result" 595 | } 596 | ], 597 | "source": [ 598 | "square(12345)" 599 | ] 600 | }, 601 | { 602 | "cell_type": "markdown", 603 | "metadata": {}, 604 | "source": [ 605 | "* Begins with `def` keyword, function name, input parameters and then colon (`:`)\n", 606 | "* Function block defined by indentation\n", 607 | "* Output or \"return\" value of the function is given by the `return` keyword" 608 | ] 609 | }, 610 | { 611 | "cell_type": "markdown", 612 | "metadata": {}, 613 | "source": [ 614 | "#### Side effects\n", 615 | "\n", 616 | "- If a function changes the variables passed into it, then it is said to have **side effects**\n", 617 | "- Example:" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": 40, 623 | "metadata": {}, 624 | "outputs": [], 625 | "source": [ 626 | "def silly_sum(sri):\n", 627 | " sri.append(0)\n", 628 | " return sum(sri)\n", 629 | " " 630 | ] 631 | }, 632 | { 633 | "cell_type": "code", 634 | "execution_count": 41, 635 | "metadata": {}, 636 | "outputs": [ 637 | { 638 | "data": { 639 | "text/plain": [ 640 | "10" 641 | ] 642 | }, 643 | "execution_count": 41, 644 | "metadata": {}, 645 | "output_type": "execute_result" 646 | } 647 | ], 648 | "source": [ 649 | "silly_sum([1,2,3,4])" 650 | ] 651 | }, 652 | { 653 | "cell_type": "markdown", 654 | "metadata": {}, 655 | "source": [ 656 | "Looks good, like it sums the numbers? 
But wait...\n" 657 | ] 658 | }, 659 | { 660 | "cell_type": "code", 661 | "execution_count": 42, 662 | "metadata": {}, 663 | "outputs": [ 664 | { 665 | "data": { 666 | "text/plain": [ 667 | "10" 668 | ] 669 | }, 670 | "execution_count": 42, 671 | "metadata": {}, 672 | "output_type": "execute_result" 673 | } 674 | ], 675 | "source": [ 676 | "lst = [1,2,3,4]\n", 677 | "silly_sum(lst)" 678 | ] 679 | }, 680 | { 681 | "cell_type": "code", 682 | "execution_count": 43, 683 | "metadata": {}, 684 | "outputs": [ 685 | { 686 | "data": { 687 | "text/plain": [ 688 | "[1, 2, 3, 4, 0]" 689 | ] 690 | }, 691 | "execution_count": 43, 692 | "metadata": {}, 693 | "output_type": "execute_result" 694 | } 695 | ], 696 | "source": [ 697 | "lst" 698 | ] 699 | }, 700 | { 701 | "cell_type": "markdown", 702 | "metadata": {}, 703 | "source": [ 704 | "- If your function has side effects like this, you must mention it in the documentation (later today).\n", 705 | "- More on how this works in Tuesday's class." 706 | ] 707 | }, 708 | { 709 | "cell_type": "markdown", 710 | "metadata": {}, 711 | "source": [ 712 | "#### Null return type\n", 713 | "\n", 714 | "If you do not specify a return value, the function returns `None` when it terminates:" 715 | ] 716 | }, 717 | { 718 | "cell_type": "code", 719 | "execution_count": 44, 720 | "metadata": {}, 721 | "outputs": [ 722 | { 723 | "name": "stdout", 724 | "output_type": "stream", 725 | "text": [ 726 | "None\n" 727 | ] 728 | } 729 | ], 730 | "source": [ 731 | "def f(x):\n", 732 | " x + 1 # no return!\n", 733 | " if x == 999:\n", 734 | " return\n", 735 | "print(f(0))" 736 | ] 737 | }, 738 | { 739 | "cell_type": "markdown", 740 | "metadata": {}, 741 | "source": [ 742 | "## DRY principle, designing good functions (15 min)\n", 743 | "\n", 744 | "- DRY: **Don't Repeat Yourself**\n", 745 | "- See [Wikipedia article](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself)\n", 746 | "- Consider the task of, for each element of a list, turning it into a 
palindrome\n", 747 | " - e.g. \"mike\" --> \"mikeekim\"" 748 | ] 749 | }, 750 | { 751 | "cell_type": "code", 752 | "execution_count": 45, 753 | "metadata": {}, 754 | "outputs": [], 755 | "source": [ 756 | "names = [\"milad\", \"rodolfo\", \"tiffany\"]" 757 | ] 758 | }, 759 | { 760 | "cell_type": "code", 761 | "execution_count": 46, 762 | "metadata": {}, 763 | "outputs": [ 764 | { 765 | "data": { 766 | "text/plain": [ 767 | "'ekim'" 768 | ] 769 | }, 770 | "execution_count": 46, 771 | "metadata": {}, 772 | "output_type": "execute_result" 773 | } 774 | ], 775 | "source": [ 776 | "name = \"mike\"\n", 777 | "name[::-1]" 778 | ] 779 | }, 780 | { 781 | "cell_type": "code", 782 | "execution_count": 47, 783 | "metadata": {}, 784 | "outputs": [ 785 | { 786 | "data": { 787 | "text/plain": [ 788 | "['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']" 789 | ] 790 | }, 791 | "execution_count": 47, 792 | "metadata": {}, 793 | "output_type": "execute_result" 794 | } 795 | ], 796 | "source": [ 797 | "names_backwards = list()\n", 798 | "\n", 799 | "names_backwards.append(names[0] + names[0][::-1])\n", 800 | "names_backwards.append(names[1] + names[1][::-1])\n", 801 | "names_backwards.append(names[2] + names[2][::-1])\n", 802 | "names_backwards" 803 | ] 804 | }, 805 | { 806 | "cell_type": "markdown", 807 | "metadata": {}, 808 | "source": [ 809 | "- Above: this is gross, terrible, yucky code\n", 810 | " 1. It only works for a list with 3 elements\n", 811 | " 2. It only works for a list named `names`\n", 812 | " 3. If we want to change its functionality, we need to change 3 similar lines of code (Don't Repeat Yourself!!)\n", 813 | " 4. 
It is hard to understand what it does just by looking at it" 814 | ] 815 | }, 816 | { 817 | "cell_type": "code", 818 | "execution_count": 48, 819 | "metadata": {}, 820 | "outputs": [ 821 | { 822 | "data": { 823 | "text/plain": [ 824 | "['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']" 825 | ] 826 | }, 827 | "execution_count": 48, 828 | "metadata": {}, 829 | "output_type": "execute_result" 830 | } 831 | ], 832 | "source": [ 833 | "names_backwards = list()\n", 834 | "\n", 835 | "for name in names:\n", 836 | " names_backwards.append(name + name[::-1])\n", 837 | " \n", 838 | "names_backwards" 839 | ] 840 | }, 841 | { 842 | "cell_type": "markdown", 843 | "metadata": {}, 844 | "source": [ 845 | "Above: this is slightly better. We have solved problems (1) and (3)." 846 | ] 847 | }, 848 | { 849 | "cell_type": "code", 850 | "execution_count": 49, 851 | "metadata": {}, 852 | "outputs": [ 853 | { 854 | "data": { 855 | "text/plain": [ 856 | "['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']" 857 | ] 858 | }, 859 | "execution_count": 49, 860 | "metadata": {}, 861 | "output_type": "execute_result" 862 | } 863 | ], 864 | "source": [ 865 | "def make_palindromes(names):\n", 866 | " names_backwards = list()\n", 867 | " \n", 868 | " for name in names:\n", 869 | " names_backwards.append(name + name[::-1])\n", 870 | " \n", 871 | " return names_backwards\n", 872 | "\n", 873 | "make_palindromes(names)" 874 | ] 875 | }, 876 | { 877 | "cell_type": "markdown", 878 | "metadata": {}, 879 | "source": [ 880 | "- Above: this is even better. We have now also solved problem (2), because you can call the function with any list, not just `names`. 
\n", 881 | "- For example, what if we had multiple _lists_:" 882 | ] 883 | }, 884 | { 885 | "cell_type": "code", 886 | "execution_count": 50, 887 | "metadata": {}, 888 | "outputs": [], 889 | "source": [ 890 | "names1 = [\"milad\", \"rodolfo\", \"tiffany\"]\n", 891 | "names2 = [\"Trudeau\", \"Scheer\", \"Singh\", \"Blanchet\", \"May\"]\n", 892 | "names3 = [\"apple\", \"orange\", \"banana\"]" 893 | ] 894 | }, 895 | { 896 | "cell_type": "code", 897 | "execution_count": 51, 898 | "metadata": {}, 899 | "outputs": [ 900 | { 901 | "data": { 902 | "text/plain": [ 903 | "['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']" 904 | ] 905 | }, 906 | "execution_count": 51, 907 | "metadata": {}, 908 | "output_type": "execute_result" 909 | } 910 | ], 911 | "source": [ 912 | "names_backwards_1 = list()\n", 913 | "\n", 914 | "for name in names1:\n", 915 | " names_backwards_1.append(name + name[::-1])\n", 916 | " \n", 917 | "names_backwards_1" 918 | ] 919 | }, 920 | { 921 | "cell_type": "code", 922 | "execution_count": 52, 923 | "metadata": {}, 924 | "outputs": [ 925 | { 926 | "data": { 927 | "text/plain": [ 928 | "['TrudeauuaedurT', 'ScheerreehcS', 'SinghhgniS', 'BlanchettehcnalB', 'MayyaM']" 929 | ] 930 | }, 931 | "execution_count": 52, 932 | "metadata": {}, 933 | "output_type": "execute_result" 934 | } 935 | ], 936 | "source": [ 937 | "names_backwards_2 = list()\n", 938 | "\n", 939 | "for name in names2:\n", 940 | " names_backwards_2.append(name + name[::-1])\n", 941 | " \n", 942 | "names_backwards_2" 943 | ] 944 | }, 945 | { 946 | "cell_type": "code", 947 | "execution_count": 53, 948 | "metadata": {}, 949 | "outputs": [ 950 | { 951 | "data": { 952 | "text/plain": [ 953 | "['appleelppa', 'orangeegnaro', 'bananaananab']" 954 | ] 955 | }, 956 | "execution_count": 53, 957 | "metadata": {}, 958 | "output_type": "execute_result" 959 | } 960 | ], 961 | "source": [ 962 | "names_backwards_3 = list()\n", 963 | "\n", 964 | "for name in names3:\n", 965 | " names_backwards_3.append(name + 
name[::-1])\n", 966 | " \n", 967 | "names_backwards_3" 968 | ] 969 | }, 970 | { 971 | "cell_type": "markdown", 972 | "metadata": {}, 973 | "source": [ 974 | "Above: this is very bad also (and imagine if it was 20 lines of code instead of 2). This was problem (2). Our function makes it much better:" 975 | ] 976 | }, 977 | { 978 | "cell_type": "code", 979 | "execution_count": 54, 980 | "metadata": {}, 981 | "outputs": [ 982 | { 983 | "data": { 984 | "text/plain": [ 985 | "['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']" 986 | ] 987 | }, 988 | "execution_count": 54, 989 | "metadata": {}, 990 | "output_type": "execute_result" 991 | } 992 | ], 993 | "source": [ 994 | "make_palindromes(names1)" 995 | ] 996 | }, 997 | { 998 | "cell_type": "code", 999 | "execution_count": 55, 1000 | "metadata": {}, 1001 | "outputs": [ 1002 | { 1003 | "data": { 1004 | "text/plain": [ 1005 | "['TrudeauuaedurT', 'ScheerreehcS', 'SinghhgniS', 'BlanchettehcnalB', 'MayyaM']" 1006 | ] 1007 | }, 1008 | "execution_count": 55, 1009 | "metadata": {}, 1010 | "output_type": "execute_result" 1011 | } 1012 | ], 1013 | "source": [ 1014 | "make_palindromes(names2)" 1015 | ] 1016 | }, 1017 | { 1018 | "cell_type": "code", 1019 | "execution_count": 56, 1020 | "metadata": {}, 1021 | "outputs": [ 1022 | { 1023 | "data": { 1024 | "text/plain": [ 1025 | "['appleelppa', 'orangeegnaro', 'bananaananab']" 1026 | ] 1027 | }, 1028 | "execution_count": 56, 1029 | "metadata": {}, 1030 | "output_type": "execute_result" 1031 | } 1032 | ], 1033 | "source": [ 1034 | "make_palindromes(names3)" 1035 | ] 1036 | }, 1037 | { 1038 | "cell_type": "markdown", 1039 | "metadata": {}, 1040 | "source": [ 1041 | "- You could get even more fancy, and put the lists of names into a list (so you have a list of lists). 
\n", 1042 | "- Then you could loop over the list and call the function each time:" 1043 | ] 1044 | }, 1045 | { 1046 | "cell_type": "code", 1047 | "execution_count": 57, 1048 | "metadata": {}, 1049 | "outputs": [ 1050 | { 1051 | "name": "stdout", 1052 | "output_type": "stream", 1053 | "text": [ 1054 | "['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']\n", 1055 | "['TrudeauuaedurT', 'ScheerreehcS', 'SinghhgniS', 'BlanchettehcnalB', 'MayyaM']\n", 1056 | "['appleelppa', 'orangeegnaro', 'bananaananab']\n" 1057 | ] 1058 | } 1059 | ], 1060 | "source": [ 1061 | "for list_of_names in [names1, names2, names3]:\n", 1062 | " print(make_palindromes(list_of_names))" 1063 | ] 1064 | }, 1065 | { 1066 | "cell_type": "markdown", 1067 | "metadata": {}, 1068 | "source": [ 1069 | "#### Designing good functions" 1070 | ] 1071 | }, 1072 | { 1073 | "cell_type": "markdown", 1074 | "metadata": {}, 1075 | "source": [ 1076 | "- How far you go with this is sort of a matter of personal style, and how you choose to apply the DRY principle: DON'T REPEAT YOURSELF!\n", 1077 | "- These decisions are often ambiguous. For example: \n", 1078 | " - Should `make_palindromes` be a function if I'm only ever doing it once? Twice?\n", 1079 | " - Should the loop be inside the function, or outside?\n", 1080 | " - Or should there be TWO functions, one that loops over the other??" 
1081 | ] 1082 | }, 1083 | { 1084 | "cell_type": "markdown", 1085 | "metadata": {}, 1086 | "source": [ 1087 | "- In my personal opinion, `make_palindromes` does a bit too much to be understandable.\n", 1088 | "- I prefer this:" 1089 | ] 1090 | }, 1091 | { 1092 | "cell_type": "code", 1093 | "execution_count": 59, 1094 | "metadata": {}, 1095 | "outputs": [ 1096 | { 1097 | "data": { 1098 | "text/plain": [ 1099 | "'miladdalim'" 1100 | ] 1101 | }, 1102 | "execution_count": 59, 1103 | "metadata": {}, 1104 | "output_type": "execute_result" 1105 | } 1106 | ], 1107 | "source": [ 1108 | "def make_palindrome(name):\n", 1109 | " return name + name[::-1]\n", 1110 | "\n", 1111 | "make_palindrome(\"milad\")" 1112 | ] 1113 | }, 1114 | { 1115 | "cell_type": "markdown", 1116 | "metadata": {}, 1117 | "source": [ 1118 | "- From here, we want to \"apply `make_palindrome` to every element of a list\"\n", 1119 | "- It turns out this is an extremely common desire, so Python has built-in functions.\n", 1120 | "- One of these is `map`, which we'll cover later. But for now, just a comprehension will do:" 1121 | ] 1122 | }, 1123 | { 1124 | "cell_type": "code", 1125 | "execution_count": 60, 1126 | "metadata": {}, 1127 | "outputs": [ 1128 | { 1129 | "data": { 1130 | "text/plain": [ 1131 | "['miladdalim', 'rodolfooflodor', 'tiffanyynaffit']" 1132 | ] 1133 | }, 1134 | "execution_count": 60, 1135 | "metadata": {}, 1136 | "output_type": "execute_result" 1137 | } 1138 | ], 1139 | "source": [ 1140 | "[make_palindrome(name) for name in names]" 1141 | ] 1142 | }, 1143 | { 1144 | "cell_type": "markdown", 1145 | "metadata": {}, 1146 | "source": [ 1147 | "Other function design considerations:\n", 1148 | "\n", 1149 | "- Should we print output or produce plots inside or outside functions? 
\n", 1150 | " - I would usually say outside, because this is a \"side effect\" of sorts\n", 1151 | "- Should the function do one thing or many things?\n", 1152 | " - This is a tough one, hard to answer in general" 1153 | ] 1154 | }, 1155 | { 1156 | "cell_type": "markdown", 1157 | "metadata": {}, 1158 | "source": [ 1159 | "## Break (5 min)" 1160 | ] 1161 | }, 1162 | { 1163 | "cell_type": "markdown", 1164 | "metadata": {}, 1165 | "source": [ 1166 | "## Optional & keyword arguments (5 min)\n", 1167 | "\n", 1168 | "- Sometimes it is convenient to have _default values_ for some arguments in a function. \n", 1169 | "- Because they have default values, these arguments are optional, hence \"optional arguments\"\n", 1170 | "- Example:" 1171 | ] 1172 | }, 1173 | { 1174 | "cell_type": "code", 1175 | "execution_count": 61, 1176 | "metadata": {}, 1177 | "outputs": [], 1178 | "source": [ 1179 | "def repeat_string(s, n=2):\n", 1180 | " return s*n" 1181 | ] 1182 | }, 1183 | { 1184 | "cell_type": "code", 1185 | "execution_count": 62, 1186 | "metadata": {}, 1187 | "outputs": [ 1188 | { 1189 | "data": { 1190 | "text/plain": [ 1191 | "'mdsmds'" 1192 | ] 1193 | }, 1194 | "execution_count": 62, 1195 | "metadata": {}, 1196 | "output_type": "execute_result" 1197 | } 1198 | ], 1199 | "source": [ 1200 | "repeat_string(\"mds\", 2)" 1201 | ] 1202 | }, 1203 | { 1204 | "cell_type": "code", 1205 | "execution_count": 63, 1206 | "metadata": {}, 1207 | "outputs": [ 1208 | { 1209 | "data": { 1210 | "text/plain": [ 1211 | "'mdsmdsmdsmdsmds'" 1212 | ] 1213 | }, 1214 | "execution_count": 63, 1215 | "metadata": {}, 1216 | "output_type": "execute_result" 1217 | } 1218 | ], 1219 | "source": [ 1220 | "repeat_string(\"mds\", 5)" 1221 | ] 1222 | }, 1223 | { 1224 | "cell_type": "code", 1225 | "execution_count": 65, 1226 | "metadata": {}, 1227 | "outputs": [ 1228 | { 1229 | "data": { 1230 | "text/plain": [ 1231 | "'mdsmds'" 1232 | ] 1233 | }, 1234 | "execution_count": 65, 1235 | "metadata": {}, 1236 | 
"output_type": "execute_result" 1237 | } 1238 | ], 1239 | "source": [ 1240 | "repeat_string(\"mds\") # do not specify `n`; it is optional" 1241 | ] 1242 | }, 1243 | { 1244 | "cell_type": "markdown", 1245 | "metadata": {}, 1246 | "source": [ 1247 | "Sane defaults:\n", 1248 | "\n", 1249 | "- Ideally, the default should be carefully chosen. \n", 1250 | "- Here, the idea of \"repeating\" something makes me think of having 2 copies, so `n=2` feels like a sane default." 1251 | ] 1252 | }, 1253 | { 1254 | "cell_type": "markdown", 1255 | "metadata": {}, 1256 | "source": [ 1257 | "Syntax:\n", 1258 | "\n", 1259 | "- You can have any number of arguments and any number of optional arguments\n", 1260 | "- All the optional arguments must come after the regular arguments\n", 1261 | "- The regular arguments are mapped by the order they appear\n", 1262 | "- The optional arguments can be specified out of order" 1263 | ] 1264 | }, 1265 | { 1266 | "cell_type": "code", 1267 | "execution_count": 1, 1268 | "metadata": {}, 1269 | "outputs": [ 1270 | { 1271 | "name": "stdout", 1272 | "output_type": "stream", 1273 | "text": [ 1274 | "1 2 3 4\n" 1275 | ] 1276 | } 1277 | ], 1278 | "source": [ 1279 | "def example(a, b, c=\"DEFAULT\", d=\"DEFAULT\"):\n", 1280 | " print(a,b,c,d)\n", 1281 | " \n", 1282 | "example(1,2,3,4)" 1283 | ] 1284 | }, 1285 | { 1286 | "cell_type": "markdown", 1287 | "metadata": {}, 1288 | "source": [ 1289 | "Using the defaults for `c` and `d`:" 1290 | ] 1291 | }, 1292 | { 1293 | "cell_type": "code", 1294 | "execution_count": 2, 1295 | "metadata": {}, 1296 | "outputs": [ 1297 | { 1298 | "name": "stdout", 1299 | "output_type": "stream", 1300 | "text": [ 1301 | "1 2 DEFAULT DEFAULT\n" 1302 | ] 1303 | } 1304 | ], 1305 | "source": [ 1306 | "example(1,2)" 1307 | ] 1308 | }, 1309 | { 1310 | "cell_type": "markdown", 1311 | "metadata": {}, 1312 | "source": [ 1313 | "Specifying `c` and `d` as **keyword arguments** (i.e. 
by name):" 1314 | ] 1315 | }, 1316 | { 1317 | "cell_type": "code", 1318 | "execution_count": 3, 1319 | "metadata": {}, 1320 | "outputs": [ 1321 | { 1322 | "name": "stdout", 1323 | "output_type": "stream", 1324 | "text": [ 1325 | "1 2 3 4\n" 1326 | ] 1327 | } 1328 | ], 1329 | "source": [ 1330 | "example(1,2,c=3,d=4)" 1331 | ] 1332 | }, 1333 | { 1334 | "cell_type": "markdown", 1335 | "metadata": {}, 1336 | "source": [ 1337 | "Specifying only one of the optional arguments, by keyword:" 1338 | ] 1339 | }, 1340 | { 1341 | "cell_type": "code", 1342 | "execution_count": 9, 1343 | "metadata": {}, 1344 | "outputs": [ 1345 | { 1346 | "name": "stdout", 1347 | "output_type": "stream", 1348 | "text": [ 1349 | "1 2 3 DEFAULT\n" 1350 | ] 1351 | } 1352 | ], 1353 | "source": [ 1354 | "example(1,2,c=3)" 1355 | ] 1356 | }, 1357 | { 1358 | "cell_type": "markdown", 1359 | "metadata": {}, 1360 | "source": [ 1361 | "Or the other:" 1362 | ] 1363 | }, 1364 | { 1365 | "cell_type": "code", 1366 | "execution_count": 10, 1367 | "metadata": {}, 1368 | "outputs": [ 1369 | { 1370 | "name": "stdout", 1371 | "output_type": "stream", 1372 | "text": [ 1373 | "1 2 DEFAULT 4\n" 1374 | ] 1375 | } 1376 | ], 1377 | "source": [ 1378 | "example(1,2,d=4)" 1379 | ] 1380 | }, 1381 | { 1382 | "cell_type": "markdown", 1383 | "metadata": {}, 1384 | "source": [ 1385 | "Specifying all the arguments as keyword arguments, even though only `c` and `d` are optional:" 1386 | ] 1387 | }, 1388 | { 1389 | "cell_type": "code", 1390 | "execution_count": 5, 1391 | "metadata": {}, 1392 | "outputs": [ 1393 | { 1394 | "name": "stdout", 1395 | "output_type": "stream", 1396 | "text": [ 1397 | "1 2 3 4\n" 1398 | ] 1399 | } 1400 | ], 1401 | "source": [ 1402 | "example(a=1,b=2,c=3,d=4)" 1403 | ] 1404 | }, 1405 | { 1406 | "cell_type": "markdown", 1407 | "metadata": {}, 1408 | "source": [ 1409 | "Specifying `c` by the fact that it comes 3rd (I do not recommend this because I find it is confusing):" 1410 | ] 1411 | }, 1412 | { 1413 | 
"cell_type": "code", 1414 | "execution_count": 6, 1415 | "metadata": {}, 1416 | "outputs": [ 1417 | { 1418 | "name": "stdout", 1419 | "output_type": "stream", 1420 | "text": [ 1421 | "1 2 3 DEFAULT\n" 1422 | ] 1423 | } 1424 | ], 1425 | "source": [ 1426 | "example(1,2,3) " 1427 | ] 1428 | }, 1429 | { 1430 | "cell_type": "markdown", 1431 | "metadata": {}, 1432 | "source": [ 1433 | "Specifying the optional arguments by keyword, but in the wrong order (this is also somewhat confusing, but not so terrible - I am OK with it):" 1434 | ] 1435 | }, 1436 | { 1437 | "cell_type": "code", 1438 | "execution_count": 74, 1439 | "metadata": {}, 1440 | "outputs": [ 1441 | { 1442 | "name": "stdout", 1443 | "output_type": "stream", 1444 | "text": [ 1445 | "1 2 3 4\n" 1446 | ] 1447 | } 1448 | ], 1449 | "source": [ 1450 | "example(1,2,d=4,c=3) " 1451 | ] 1452 | }, 1453 | { 1454 | "cell_type": "markdown", 1455 | "metadata": {}, 1456 | "source": [ 1457 | "Specifying the non-optional arguments by keyword (I am fine with this):" 1458 | ] 1459 | }, 1460 | { 1461 | "cell_type": "code", 1462 | "execution_count": 8, 1463 | "metadata": {}, 1464 | "outputs": [ 1465 | { 1466 | "name": "stdout", 1467 | "output_type": "stream", 1468 | "text": [ 1469 | "1 2 DEFAULT DEFAULT\n" 1470 | ] 1471 | } 1472 | ], 1473 | "source": [ 1474 | "example(a=1,b=2)" 1475 | ] 1476 | }, 1477 | { 1478 | "cell_type": "markdown", 1479 | "metadata": {}, 1480 | "source": [ 1481 | "Specifying the non-optional arguments by keyword, but in the wrong order (not recommended, I find it confusing):" 1482 | ] 1483 | }, 1484 | { 1485 | "cell_type": "code", 1486 | "execution_count": 7, 1487 | "metadata": {}, 1488 | "outputs": [ 1489 | { 1490 | "name": "stdout", 1491 | "output_type": "stream", 1492 | "text": [ 1493 | "1 2 DEFAULT DEFAULT\n" 1494 | ] 1495 | } 1496 | ], 1497 | "source": [ 1498 | "example(b=2,a=1)" 1499 | ] 1500 | }, 1501 | { 1502 | "cell_type": "markdown", 1503 | "metadata": {}, 1504 | "source": [ 1505 | "Specifying 
keyword arguments before non-keyword arguments (this throws an error):" 1506 | ] 1507 | }, 1508 | { 1509 | "cell_type": "code", 1510 | "execution_count": 12, 1511 | "metadata": {}, 1512 | "outputs": [ 1513 | { 1514 | "ename": "SyntaxError", 1515 | "evalue": "positional argument follows keyword argument (, line 1)", 1516 | "output_type": "error", 1517 | "traceback": [ 1518 | "\u001b[0;36m File \u001b[0;32m\"\"\u001b[0;36m, line \u001b[0;32m1\u001b[0m\n\u001b[0;31m example(a=2,1)\u001b[0m\n\u001b[0m ^\u001b[0m\n\u001b[0;31mSyntaxError\u001b[0m\u001b[0;31m:\u001b[0m positional argument follows keyword argument\n" 1519 | ] 1520 | } 1521 | ], 1522 | "source": [ 1523 | "example(a=2,1)" 1524 | ] 1525 | }, 1526 | { 1527 | "cell_type": "markdown", 1528 | "metadata": {}, 1529 | "source": [ 1530 | "- In general, I am used to calling non-optional arguments by order, and optional arguments by keyword.\n", 1531 | "- The language allows us to deviate from this, but it can be unnecessarily confusing sometimes." 1532 | ] 1533 | }, 1534 | { 1535 | "cell_type": "markdown", 1536 | "metadata": {}, 1537 | "source": [ 1538 | "#### Advanced stuff (optional):\n", 1539 | "\n", 1540 | "- You can also call/define functions with `*args` and `**kwargs`; see, e.g. 
[here](https://realpython.com/python-kwargs-and-args/)\n", 1541 | "- Do not use mutable objects (such as lists) as default argument values: the default is created once, when the function is defined, and is then shared across calls - see [here](https://docs.python-guide.org/writing/gotchas/) under \"Mutable Default Arguments\"" 1542 | ] 1543 | }, 1544 | { 1545 | "cell_type": "code", 1546 | "execution_count": null, 1547 | "metadata": {}, 1548 | "outputs": [], 1549 | "source": [ 1550 | "def example(a, b=[]): # don't do this!\n", 1551 | "    return 0" 1552 | ] 1553 | }, 1554 | { 1555 | "cell_type": "code", 1556 | "execution_count": null, 1557 | "metadata": {}, 1558 | "outputs": [], 1559 | "source": [ 1560 | "def example(a, b=None): # instead, do this\n", 1561 | "    if b is None:\n", 1562 | "        b = []\n", 1563 | "    return 0" 1564 | ] 1565 | }, 1566 | { 1567 | "cell_type": "markdown", 1568 | "metadata": {}, 1569 | "source": [ 1570 | "## Docstrings (10 min)" 1571 | ] 1572 | }, 1573 | { 1574 | "cell_type": "markdown", 1575 | "metadata": {}, 1576 | "source": [ 1577 | "- We got pretty far above, but we never solved problem (4): It is hard to understand what it does just by looking at it\n", 1578 | "- Enter the idea of function documentation (and in particular docstrings)\n", 1579 | "- The [docstring](https://www.python.org/dev/peps/pep-0257/) goes right after the `def` line." 1580 | ] 1581 | }, 1582 | { 1583 | "cell_type": "code", 1584 | "execution_count": 82, 1585 | "metadata": {}, 1586 | "outputs": [], 1587 | "source": [ 1588 | "def make_palindrome(string):\n", 1589 | "    \"\"\"Turns the string into a palindrome by concatenating itself with a reversed version of itself.\"\"\"\n", 1590 | "\n", 1591 | "    return string + string[::-1]" 1592 | ] 1593 | }, 1594 | { 1595 | "cell_type": "markdown", 1596 | "metadata": {}, 1597 | "source": [ 1598 | "In IPython/Jupyter, we can use `?` to view the documentation string of any function in our environment." 
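,
"\n",
"\n",
"Outside IPython/Jupyter, plain Python gives access to the same text. A minimal sketch, assuming the one-line `make_palindrome` defined above (`help` and the `__doc__` attribute are standard Python, not lecture-specific helpers):\n",
"\n",
"```python\n",
"def make_palindrome(string):\n",
"    \"\"\"Turns the string into a palindrome by concatenating itself with a reversed version of itself.\"\"\"\n",
"    return string + string[::-1]\n",
"\n",
"# The docstring is stored on the function object itself.\n",
"print(make_palindrome.__doc__)\n",
"\n",
"# help() prints the signature together with the docstring.\n",
"help(make_palindrome)\n",
"```"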
1599 | ] 1600 | }, 1601 | { 1602 | "cell_type": "code", 1603 | "execution_count": 83, 1604 | "metadata": {}, 1605 | "outputs": [], 1606 | "source": [ 1607 | "make_palindrome?" 1608 | ] 1609 | }, 1610 | { 1611 | "cell_type": "code", 1612 | "execution_count": 79, 1613 | "metadata": {}, 1614 | "outputs": [], 1615 | "source": [ 1616 | "print?" 1617 | ] 1618 | }, 1619 | { 1620 | "cell_type": "markdown", 1621 | "metadata": {}, 1622 | "source": [ 1623 | "#### Docstring structure\n", 1624 | "\n", 1625 | "1. **Single-line**: If it's short, then just a single line describing the function will do (as above).\n", 1626 | "2. **PEP-8 style**: Multi-line description + a list of arguments (despite the name used here, these docstring conventions are described in PEP 257); see [here](https://www.python.org/dev/peps/pep-0257/).\n", 1627 | "3. **Scipy style**: The most elaborate & informative; see [here](https://numpydoc.readthedocs.io/en/latest/format.html) and [here](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_numpy.html).\n" 1628 | ] 1629 | }, 1630 | { 1631 | "cell_type": "markdown", 1632 | "metadata": {}, 1633 | "source": [ 1634 | "The PEP-8 style:" 1635 | ] 1636 | }, 1637 | { 1638 | "cell_type": "code", 1639 | "execution_count": 84, 1640 | "metadata": {}, 1641 | "outputs": [], 1642 | "source": [ 1643 | "def make_palindrome(string):\n", 1644 | "    \"\"\"\n", 1645 | "    Turns the string into a palindrome by concatenating itself\n", 1646 | "    with a reversed version of itself.\n", 1647 | "\n", 1648 | "    Arguments:\n", 1649 | "    string - (str) the string to turn into a palindrome\n", 1650 | "    \"\"\"\n", 1651 | "    return string + string[::-1]" 1652 | ] 1653 | }, 1654 | { 1655 | "cell_type": "code", 1656 | "execution_count": 85, 1657 | "metadata": {}, 1658 | "outputs": [], 1659 | "source": [ 1660 | "make_palindrome?" 
1661 | ] 1662 | }, 1663 | { 1664 | "cell_type": "markdown", 1665 | "metadata": {}, 1666 | "source": [ 1667 | "The scipy style:" 1668 | ] 1669 | }, 1670 | { 1671 | "cell_type": "code", 1672 | "execution_count": null, 1673 | "metadata": {}, 1674 | "outputs": [], 1675 | "source": [ 1676 | "def make_palindrome(string):\n", 1677 | "    \"\"\"\n", 1678 | "    Turn a string into a palindrome.\n", 1679 | "\n", 1680 | "    Turns the string into a palindrome by concatenating itself\n", 1681 | "    with a reversed version of itself, so that the returned\n", 1682 | "    string is twice as long as the original.\n", 1683 | "\n", 1684 | "    Parameters\n", 1685 | "    ----------\n", 1686 | "    string : str\n", 1687 | "        The string to turn into a palindrome.\n", 1688 | "\n", 1689 | "    Returns\n", 1690 | "    -------\n", 1691 | "    str\n", 1692 | "        The new palindrome string.\n", 1693 | "\n", 1694 | "    Examples\n", 1695 | "    --------\n", 1696 | "    >>> make_palindrome(\"abc\")\n", 1697 | "    'abccba'\n", 1698 | "    \"\"\"\n", 1699 | "    return string + string[::-1]" 1700 | ] 1701 | }, 1702 | { 1703 | "cell_type": "code", 1704 | "execution_count": 86, 1705 | "metadata": {}, 1706 | "outputs": [], 1707 | "source": [ 1708 | "make_palindrome(# press shift-tab HERE to get docstring!!" 
1709 | ] 1710 | }, 1711 | { 1712 | "cell_type": "markdown", 1713 | "metadata": {}, 1714 | "source": [ 1715 | "Below is the general form of the scipy docstring (reproduced from the scipy/numpy docs):" 1716 | ] 1717 | }, 1718 | { 1719 | "cell_type": "code", 1720 | "execution_count": null, 1721 | "metadata": {}, 1722 | "outputs": [], 1723 | "source": [ 1724 | "def function_name(param1,param2,param3):\n", 1725 | " \"\"\"First line is a short description of the function.\n", 1726 | " \n", 1727 | " A paragraph describing in a bit more detail what the\n", 1728 | " function does and what algorithms it uses and common\n", 1729 | " use cases.\n", 1730 | " \n", 1731 | " Parameters\n", 1732 | " ----------\n", 1733 | " param1 : datatype\n", 1734 | " A description of param1.\n", 1735 | " param2 : datatype\n", 1736 | " A description of param2.\n", 1737 | " param3 : datatype\n", 1738 | " A longer description because maybe this requires\n", 1739 | " more explanation and we can use several lines.\n", 1740 | " \n", 1741 | " Returns\n", 1742 | " -------\n", 1743 | " datatype\n", 1744 | " A description of the output, datatypes and behaviours.\n", 1745 | " Describe special cases and anything the user needs to\n", 1746 | " know to use the function.\n", 1747 | " \n", 1748 | " Examples\n", 1749 | " --------\n", 1750 | " >>> function_name(3,8,-5)\n", 1751 | " 2.0\n", 1752 | " \"\"\"" 1753 | ] 1754 | }, 1755 | { 1756 | "cell_type": "markdown", 1757 | "metadata": {}, 1758 | "source": [ 1759 | "#### Docstrings in your labs\n", 1760 | "\n", 1761 | "In MDS we will accept:\n", 1762 | "\n", 1763 | "- One-line docstrings for very simple functions.\n", 1764 | "- Either the PEP-8 or scipy style for bigger functions.\n", 1765 | " - But we think the scipy style is more common in the wild so you may want to get into the habit of using it.\n", 1766 | " - Personally, I like that it explicitly gives the datatype of the return value." 
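,
"\n",
"\n",
"One practical benefit of the `Examples` section in the scipy style is that the standard-library `doctest` module can execute it. A minimal sketch (not part of the lecture material; the docstring is shortened for illustration, and note that the expected output must match the interpreter's `repr`, hence the single quotes):\n",
"\n",
"```python\n",
"import doctest\n",
"\n",
"def make_palindrome(string):\n",
"    \"\"\"\n",
"    Turn a string into a palindrome.\n",
"\n",
"    Examples\n",
"    --------\n",
"    >>> make_palindrome(\"abc\")\n",
"    'abccba'\n",
"    \"\"\"\n",
"    return string + string[::-1]\n",
"\n",
"# Find the examples embedded in the docstring and run them.\n",
"finder = doctest.DocTestFinder()\n",
"runner = doctest.DocTestRunner(verbose=False)\n",
"for test in finder.find(make_palindrome, \"make_palindrome\",\n",
"                        globs={\"make_palindrome\": make_palindrome}):\n",
"    runner.run(test)\n",
"\n",
"# summarize() reports how many examples were attempted and how many failed.\n",
"results = runner.summarize(verbose=False)\n",
"print(results)\n",
"```"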
1767 | ] 1768 | }, 1769 | { 1770 | "cell_type": "markdown", 1771 | "metadata": {}, 1772 | "source": [ 1773 | "#### Docstrings with optional arguments\n", 1774 | "\n", 1775 | "When specifying the parameters, we specify the defaults for optional arguments:" 1776 | ] 1777 | }, 1778 | { 1779 | "cell_type": "code", 1780 | "execution_count": null, 1781 | "metadata": {}, 1782 | "outputs": [], 1783 | "source": [ 1784 | "# PEP-8 style\n", 1785 | "def repeat_string(s, n=2):\n", 1786 | "    \"\"\"\n", 1787 | "    Repeat the string s, n times.\n", 1788 | "\n", 1789 | "    Arguments:\n", 1790 | "    s -- (str) the string\n", 1791 | "    n -- (int) the number of times (default 2)\n", 1792 | "    \"\"\"\n", 1793 | "    return s*n" 1794 | ] 1795 | }, 1796 | { 1797 | "cell_type": "code", 1798 | "execution_count": null, 1799 | "metadata": {}, 1800 | "outputs": [], 1801 | "source": [ 1802 | "# scipy style\n", 1803 | "def repeat_string(s, n=2):\n", 1804 | "    \"\"\"\n", 1805 | "    Repeat the string s, n times.\n", 1806 | "\n", 1807 | "    Parameters\n", 1808 | "    ----------\n", 1809 | "    s : str\n", 1810 | "        the string\n", 1811 | "    n : int, optional (default = 2)\n", 1812 | "        the number of times\n", 1813 | "\n", 1814 | "    Returns\n", 1815 | "    -------\n", 1816 | "    str\n", 1817 | "        the repeated string\n", 1818 | "\n", 1819 | "    Examples\n", 1820 | "    --------\n", 1821 | "    >>> repeat_string(\"Blah\", 3)\n", 1822 | "    'BlahBlahBlah'\n", 1823 | "    \"\"\"\n", 1824 | "    return s*n" 1825 | ] 1826 | }, 1827 | { 1828 | "cell_type": "markdown", 1829 | "metadata": {}, 1830 | "source": [ 1831 | "#### Automatically generated documentation\n", 1832 | "\n", 1833 | "- By following the docstring conventions, we can _automatically generate documentation_ using libraries like [sphinx](http://www.sphinx-doc.org/en/master/), [pydoc](https://docs.python.org/3.7/library/pydoc.html) or [Doxygen](http://www.doxygen.nl/).\n", 1834 | " - For example: compare this 
[documentation](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html) with this [code](https://github.com/scikit-learn/scikit-learn/blob/1495f6924/sklearn/neighbors/classification.py#L23).\n", 1835 | " - Notice the similarities? The webpage was automatically generated because the authors used standard conventions for docstrings!" 1836 | ] 1837 | }, 1838 | { 1839 | "cell_type": "markdown", 1840 | "metadata": {}, 1841 | "source": [ 1842 | "#### What makes good documentation?\n", 1843 | "\n", 1844 | "- What do you think about this?" 1845 | ] 1846 | }, 1847 | { 1848 | "cell_type": "code", 1849 | "execution_count": null, 1850 | "metadata": {}, 1851 | "outputs": [], 1852 | "source": [ 1853 | "################################\n", 1854 | "#\n", 1855 | "# NOT RECOMMENDED TO DO THIS!!!\n", 1856 | "#\n", 1857 | "################################\n", 1858 | "\n", 1859 | "def make_palindrome(string):\n", 1860 | " \"\"\"\n", 1861 | " Turns the string into a palindrome by concatenating itself \n", 1862 | " with a reversed version of itself. To do this, it uses the\n", 1863 | " Python syntax of `[::-1]` to flip the string, and stores\n", 1864 | " this in a variable called string_reversed. It then uses `+`\n", 1865 | " to concatenate the two strings and return them to the caller.\n", 1866 | " \n", 1867 | " Arguments:\n", 1868 | " string - (str) the string to turn into a palindrome\n", 1869 | " \n", 1870 | " Other variables:\n", 1871 | " string_reversed - (str) the reversed string\n", 1872 | " \"\"\"\n", 1873 | " \n", 1874 | " string_reversed = string[::-1]\n", 1875 | " return string + string_reversed" 1876 | ] 1877 | }, 1878 | { 1879 | "cell_type": "markdown", 1880 | "metadata": {}, 1881 | "source": [ 1882 | "

\n", 1883 | "\n", 1884 | "- This is poor documentation! More is not necessarily better!\n", 1885 | "- Why?\n", 1886 | " - " 1887 | ] 1888 | }, 1889 | { 1890 | "cell_type": "markdown", 1891 | "metadata": {}, 1892 | "source": [ 1893 | "## Unit tests, corner cases (10 min)" 1894 | ] 1895 | }, 1896 | { 1897 | "cell_type": "markdown", 1898 | "metadata": {}, 1899 | "source": [ 1900 | "#### `assert` statements\n", 1901 | "\n", 1902 | "- `assert` statements cause your program to fail if the condition is `False`.\n", 1903 | "- They can be used as sanity checks for your program.\n", 1904 | "- There are more sophisticated ways to \"test\" your programs, which we'll discuss in DSCI 524.\n", 1905 | "- The syntax is:\n", 1906 | "\n", 1907 | "```python\n", 1908 | "assert expression , \"Error message shown if expression is False.\"\n", 1909 | "```" 1910 | ] 1911 | }, 1912 | { 1913 | "cell_type": "code", 1914 | "execution_count": 87, 1915 | "metadata": {}, 1916 | "outputs": [ 1917 | { 1918 | "ename": "AssertionError", 1919 | "evalue": "1 is not equal to 2.", 1920 | "output_type": "error", 1921 | "traceback": [ 1922 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 1923 | "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", 1924 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0;36m1\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m2\u001b[0m \u001b[0;34m,\u001b[0m \u001b[0;34m\"1 is not equal to 2.\"\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 1925 | "\u001b[0;31mAssertionError\u001b[0m: 1 is not equal to 2." 
1926 | ] 1927 | } 1928 | ], 1929 | "source": [ 1930 | "assert 1 == 2 , \"1 is not equal to 2.\"" 1931 | ] 1932 | }, 1933 | { 1934 | "cell_type": "markdown", 1935 | "metadata": {}, 1936 | "source": [ 1937 | "#### Systematic Program Design" 1938 | ] 1939 | }, 1940 | { 1941 | "cell_type": "markdown", 1942 | "metadata": {}, 1943 | "source": [ 1944 | "A systematic approach to program design is a general set of steps to follow when writing programs. Our approach includes:\n", 1945 | "\n", 1946 | "1. Write a stub: a function that does nothing but accept all input parameters and return the correct datatype.\n", 1947 | "2. Write tests to satisfy the design specifications.\n", 1948 | "3. Outline the program with pseudo-code.\n", 1949 | "4. Write code and test frequently.\n", 1950 | "5. Write documentation.\n", 1951 | "\n", 1952 | "The key point: write tests BEFORE you write code.\n", 1953 | "\n", 1954 | "- You do not have to do this in MDS, but you may find it surprisingly helpful.\n", 1955 | "- Often writing tests helps you think through what you are trying to accomplish.\n", 1956 | "- It's best to have that clear before you write the actual code." 1957 | ] 1958 | }, 1959 | { 1960 | "cell_type": "markdown", 1961 | "metadata": {}, 1962 | "source": [ 1963 | "#### Testing woes - false positives\n", 1964 | "\n", 1965 | "- **Just because all your tests pass, this does not mean your program is correct!!**\n", 1966 | "- This happens all the time. How to deal with it?\n", 1967 | " - Write a lot of tests!\n", 1968 | " - Don't be overconfident, even after writing a lot of tests!" 
1969 | ] 1970 | }, 1971 | { 1972 | "cell_type": "code", 1973 | "execution_count": 88, 1974 | "metadata": {}, 1975 | "outputs": [], 1976 | "source": [ 1977 | "def sample_median(x):\n", 1978 | " \"\"\"Finds the median of a list of numbers.\"\"\"\n", 1979 | " x_sorted = sorted(x)\n", 1980 | " return x_sorted[len(x_sorted)//2]\n", 1981 | "\n", 1982 | "assert sample_median([1,2,3,4,5]) == 3\n", 1983 | "assert sample_median([0,0,0,0]) == 0" 1984 | ] 1985 | }, 1986 | { 1987 | "cell_type": "markdown", 1988 | "metadata": {}, 1989 | "source": [ 1990 | "Looks good? ... ?\n", 1991 | "\n", 1992 | "
" 1993 | ] 1994 | }, 1995 | { 1996 | "cell_type": "code", 1997 | "execution_count": 89, 1998 | "metadata": {}, 1999 | "outputs": [ 2000 | { 2001 | "ename": "AssertionError", 2002 | "evalue": "", 2003 | "output_type": "error", 2004 | "traceback": [ 2005 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 2006 | "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", 2007 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0msample_median\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m4\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m2.5\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 2008 | "\u001b[0;31mAssertionError\u001b[0m: " 2009 | ] 2010 | } 2011 | ], 2012 | "source": [ 2013 | "assert sample_median([1,2,3,4]) == 2.5" 2014 | ] 2015 | }, 2016 | { 2017 | "cell_type": "markdown", 2018 | "metadata": {}, 2019 | "source": [ 2020 | "
" 2021 | ] 2022 | }, 2023 | { 2024 | "cell_type": "code", 2025 | "execution_count": 90, 2026 | "metadata": {}, 2027 | "outputs": [ 2028 | { 2029 | "ename": "AssertionError", 2030 | "evalue": "", 2031 | "output_type": "error", 2032 | "traceback": [ 2033 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 2034 | "\u001b[0;31mAssertionError\u001b[0m Traceback (most recent call last)", 2035 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0;32massert\u001b[0m \u001b[0msample_median\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m[\u001b[0m\u001b[0;36m1\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m3\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;36m2\u001b[0m\u001b[0;34m]\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m==\u001b[0m \u001b[0;36m2\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 2036 | "\u001b[0;31mAssertionError\u001b[0m: " 2037 | ] 2038 | } 2039 | ], 2040 | "source": [ 2041 | "assert sample_median([1,3,2]) == 2" 2042 | ] 2043 | }, 2044 | { 2045 | "cell_type": "markdown", 2046 | "metadata": {}, 2047 | "source": [ 2048 | "#### Testing woes - false negatives\n", 2049 | "\n", 2050 | "- It can also happen, though more rarely, that your tests fail but your program is correct.\n", 2051 | "- This means there is something wrong with your test.\n", 2052 | "- For example, in the autograding for lab1 this happened to some people, because of tiny roundoff errors." 2053 | ] 2054 | }, 2055 | { 2056 | "cell_type": "markdown", 2057 | "metadata": {}, 2058 | "source": [ 2059 | "#### Corner cases\n", 2060 | "\n", 2061 | "- A **corner case** is an input that is reasonable but a bit unusual, and may trip up your code.\n", 2062 | "- For example, taking the median of an empty list, or a list with only one element. \n", 2063 | "- Often it is desirable to add test cases to address corner cases." 
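A minimal sketch tying together the corner-case and roundoff points above. `median_checked` is a hypothetical helper written for this illustration (it is not the lecture's `sample_median`), and raising `ValueError` for an empty list is an assumed design choice. `math.isclose` is one way to avoid test failures caused by tiny floating-point differences.

```python
import math


def median_checked(x):
    """Median of a non-empty list of numbers (hypothetical helper)."""
    if len(x) == 0:
        # Corner case: assumed behaviour is to raise rather than return a value.
        raise ValueError("median of an empty list is undefined")
    x_sorted = sorted(x)
    mid = len(x_sorted) // 2
    if len(x_sorted) % 2 == 1:
        return x_sorted[mid]  # odd length: the middle element
    return (x_sorted[mid - 1] + x_sorted[mid]) / 2  # even length: average the two middle elements


# Ordinary cases and corner cases
assert median_checked([1, 3, 2]) == 2
assert median_checked([1, 2, 3, 4]) == 2.5
assert median_checked([1]) == 1

# Corner case: the empty list should raise
try:
    median_checked([])
    raised = False
except ValueError:
    raised = True
assert raised

# Roundoff: an exact == comparison on floats can fail even when the code is right
assert 0.1 + 0.2 != 0.3
assert math.isclose(0.1 + 0.2, 0.3)
assert math.isclose(median_checked([0.1, 0.2]) * 2, 0.3)
```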
2064 | ] 2065 | }, 2066 | { 2067 | "cell_type": "code", 2068 | "execution_count": null, 2069 | "metadata": {}, 2070 | "outputs": [], 2071 | "source": [ 2072 | "assert sample_median([1]) == 1" 2073 | ] 2074 | }, 2075 | { 2076 | "cell_type": "markdown", 2077 | "metadata": {}, 2078 | "source": [ 2079 | "- In this case the code worked with no extra effort, but sometimes we need `if` statements to handle the weird cases.\n", 2080 | "- Sometimes we want the code to throw an error (e.g. median of an empty list); more on this later." 2081 | ] 2082 | }, 2083 | { 2084 | "cell_type": "markdown", 2085 | "metadata": {}, 2086 | "source": [ 2087 | "## Multiple return values (0 min)\n", 2088 | "\n", 2089 | "- In most (all?) programming languages I've seen, functions can only return one thing.\n", 2090 | "- That is technically true in Python, but there is a \"workaround\", which is to return a tuple." 2091 | ] 2092 | }, 2093 | { 2094 | "cell_type": "code", 2095 | "execution_count": 91, 2096 | "metadata": {}, 2097 | "outputs": [], 2098 | "source": [ 2099 | "# not good from a design perspective!\n", 2100 | "def sum_and_product(x, y):\n", 2101 | " return (x+y, x*y)" 2102 | ] 2103 | }, 2104 | { 2105 | "cell_type": "code", 2106 | "execution_count": 92, 2107 | "metadata": {}, 2108 | "outputs": [ 2109 | { 2110 | "data": { 2111 | "text/plain": [ 2112 | "(11, 30)" 2113 | ] 2114 | }, 2115 | "execution_count": 92, 2116 | "metadata": {}, 2117 | "output_type": "execute_result" 2118 | } 2119 | ], 2120 | "source": [ 2121 | "sum_and_product(5,6)" 2122 | ] 2123 | }, 2124 | { 2125 | "cell_type": "markdown", 2126 | "metadata": {}, 2127 | "source": [ 2128 | "In some cases in Python, the parentheses can be omitted: " 2129 | ] 2130 | }, 2131 | { 2132 | "cell_type": "code", 2133 | "execution_count": 93, 2134 | "metadata": {}, 2135 | "outputs": [], 2136 | "source": [ 2137 | "def sum_and_product(x, y):\n", 2138 | " return x+y, x*y" 2139 | ] 2140 | }, 2141 | { 2142 | "cell_type": "code", 2143 | 
"execution_count": 94, 2144 | "metadata": {}, 2145 | "outputs": [ 2146 | { 2147 | "data": { 2148 | "text/plain": [ 2149 | "(11, 30)" 2150 | ] 2151 | }, 2152 | "execution_count": 94, 2153 | "metadata": {}, 2154 | "output_type": "execute_result" 2155 | } 2156 | ], 2157 | "source": [ 2158 | "sum_and_product(5,6)" 2159 | ] 2160 | }, 2161 | { 2162 | "cell_type": "markdown", 2163 | "metadata": {}, 2164 | "source": [ 2165 | "It is common to store these in separate variables, so it really feels like the function is returning multiple values:" 2166 | ] 2167 | }, 2168 | { 2169 | "cell_type": "code", 2170 | "execution_count": 95, 2171 | "metadata": {}, 2172 | "outputs": [], 2173 | "source": [ 2174 | "s, p = sum_and_product(5, 6)" 2175 | ] 2176 | }, 2177 | { 2178 | "cell_type": "code", 2179 | "execution_count": 96, 2180 | "metadata": {}, 2181 | "outputs": [ 2182 | { 2183 | "data": { 2184 | "text/plain": [ 2185 | "11" 2186 | ] 2187 | }, 2188 | "execution_count": 96, 2189 | "metadata": {}, 2190 | "output_type": "execute_result" 2191 | } 2192 | ], 2193 | "source": [ 2194 | "s" 2195 | ] 2196 | }, 2197 | { 2198 | "cell_type": "code", 2199 | "execution_count": 97, 2200 | "metadata": {}, 2201 | "outputs": [ 2202 | { 2203 | "data": { 2204 | "text/plain": [ 2205 | "30" 2206 | ] 2207 | }, 2208 | "execution_count": 97, 2209 | "metadata": {}, 2210 | "output_type": "execute_result" 2211 | } 2212 | ], 2213 | "source": [ 2214 | "p" 2215 | ] 2216 | }, 2217 | { 2218 | "cell_type": "markdown", 2219 | "metadata": {}, 2220 | "source": [ 2221 | "- Question: is this good function design?\n", 2222 | "- Answer: usually not, but sometimes.\n", 2223 | "- You will encounter this in some Python packages." 
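Two small illustrations of the bullets above, offered as a sketch only. The built-in `divmod` is a standard example of a function that returns a tuple. `typing.NamedTuple` is one possible way to make a multi-value return more readable; `SumAndProduct` and `sum_and_product_named` are hypothetical names invented for this example.

```python
from typing import NamedTuple

# A built-in that returns two values as a tuple
quotient, remainder = divmod(17, 5)


# One possible design: give the returned values names (hypothetical class)
class SumAndProduct(NamedTuple):
    total: int
    product: int


def sum_and_product_named(x, y):
    """Return the sum and product of x and y as a named tuple."""
    return SumAndProduct(total=x + y, product=x * y)


result = sum_and_product_named(5, 6)
s, p = result  # still unpacks like an ordinary tuple
```

The caller can then write `result.total` instead of remembering which position holds which value.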
2224 | ] 2225 | }, 2226 | { 2227 | "cell_type": "markdown", 2228 | "metadata": {}, 2229 | "source": [ 2230 | "Advanced / optional: you can ignore return values you don't need with `_`:" 2231 | ] 2232 | }, 2233 | { 2234 | "cell_type": "code", 2235 | "execution_count": 98, 2236 | "metadata": {}, 2237 | "outputs": [], 2238 | "source": [ 2239 | "s, _ = sum_and_product(5, 6)" 2240 | ] 2241 | }, 2242 | { 2243 | "cell_type": "code", 2244 | "execution_count": 99, 2245 | "metadata": {}, 2246 | "outputs": [ 2247 | { 2248 | "data": { 2249 | "text/plain": [ 2250 | "11" 2251 | ] 2252 | }, 2253 | "execution_count": 99, 2254 | "metadata": {}, 2255 | "output_type": "execute_result" 2256 | } 2257 | ], 2258 | "source": [ 2259 | "s" 2260 | ] 2261 | }, 2262 | { 2263 | "cell_type": "markdown", 2264 | "metadata": {}, 2265 | "source": [ 2266 | "#### Fun with tuples\n", 2267 | "\n", 2268 | "In general, you can do some weird stuff with tuples:" 2269 | ] 2270 | }, 2271 | { 2272 | "cell_type": "code", 2273 | "execution_count": 100, 2274 | "metadata": {}, 2275 | "outputs": [], 2276 | "source": [ 2277 | "a, b = 5, 6" 2278 | ] 2279 | }, 2280 | { 2281 | "cell_type": "code", 2282 | "execution_count": 101, 2283 | "metadata": {}, 2284 | "outputs": [], 2285 | "source": [ 2286 | "a, b = (5, 6)" 2287 | ] 2288 | }, 2289 | { 2290 | "cell_type": "code", 2291 | "execution_count": 102, 2292 | "metadata": {}, 2293 | "outputs": [], 2294 | "source": [ 2295 | "a, b = b, a # in other languages this requires a \"temp\" variable" 2296 | ] 2297 | }, 2298 | { 2299 | "cell_type": "code", 2300 | "execution_count": 103, 2301 | "metadata": {}, 2302 | "outputs": [ 2303 | { 2304 | "data": { 2305 | "text/plain": [ 2306 | "6" 2307 | ] 2308 | }, 2309 | "execution_count": 103, 2310 | "metadata": {}, 2311 | "output_type": "execute_result" 2312 | } 2313 | ], 2314 | "source": [ 2315 | "a" 2316 | ] 2317 | }, 2318 | { 2319 | "cell_type": "code", 2320 | "execution_count": 104, 2321 | "metadata": {}, 2322 | "outputs": [ 2323 | { 
2324 | "data": { 2325 | "text/plain": [ 2326 | "5" 2327 | ] 2328 | }, 2329 | "execution_count": 104, 2330 | "metadata": {}, 2331 | "output_type": "execute_result" 2332 | } 2333 | ], 2334 | "source": [ 2335 | "b" 2336 | ] 2337 | } 2338 | ], 2339 | "metadata": { 2340 | "kernelspec": { 2341 | "display_name": "Python 3", 2342 | "language": "python", 2343 | "name": "python3" 2344 | }, 2345 | "language_info": { 2346 | "codemirror_mode": { 2347 | "name": "ipython", 2348 | "version": 3 2349 | }, 2350 | "file_extension": ".py", 2351 | "mimetype": "text/x-python", 2352 | "name": "python", 2353 | "nbconvert_exporter": "python", 2354 | "pygments_lexer": "ipython3", 2355 | "version": "3.7.3" 2356 | } 2357 | }, 2358 | "nbformat": 4, 2359 | "nbformat_minor": 4 2360 | } 2361 | -------------------------------------------------------------------------------- /lectures/lecture4.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# DSCI 511 Lecture 4" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Outline:\n", 15 | "\n", 16 | "- Python classes (20 min)\n", 17 | "- Python `import`(10 min)\n", 18 | "- Importing your own functions (5 min)\n", 19 | "- Break (5 min)\n", 20 | "- Intriguing behaviour in Python (5 min)\n", 21 | "- References (10 min)\n", 22 | "- Function calls and references (5 min)\n", 23 | "- `copy` and `deepcopy` (10 min)\n", 24 | "- Scoping (10 min)" 25 | ] 26 | }, 27 | { 28 | "cell_type": "code", 29 | "execution_count": 1, 30 | "metadata": {}, 31 | "outputs": [], 32 | "source": [ 33 | "import numpy as np" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "## Python Classes (20 min)" 41 | ] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": {}, 46 | "source": [ 47 | "- We've seen data types like `dict` (built in to Python) and `np.ndarray` 
(3rd party library). \n", 48 | "- Today we'll see how to create our own data types. \n", 49 | "- These are called **classes** and an instance is called an **object**. (Classes documentation [here](https://docs.python.org/3/tutorial/classes.html).)\n", 50 | "- For our purposes, a type and a class are the same thing. Some discussion of the differences [here](https://stackoverflow.com/questions/468145/what-is-the-difference-between-type-and-class).\n", 51 | "- The general approach to programming using classes and objects is called [object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming)." 52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": 2, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "d = dict()" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "Here, `d` is an object, whereas `dict` is a type. " 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": 3, 73 | "metadata": {}, 74 | "outputs": [ 75 | { 76 | "data": { 77 | "text/plain": [ 78 | "dict" 79 | ] 80 | }, 81 | "execution_count": 3, 82 | "metadata": {}, 83 | "output_type": "execute_result" 84 | } 85 | ], 86 | "source": [ 87 | "type(d)" 88 | ] 89 | }, 90 | { 91 | "cell_type": "code", 92 | "execution_count": 4, 93 | "metadata": {}, 94 | "outputs": [ 95 | { 96 | "data": { 97 | "text/plain": [ 98 | "type" 99 | ] 100 | }, 101 | "execution_count": 4, 102 | "metadata": {}, 103 | "output_type": "execute_result" 104 | } 105 | ], 106 | "source": [ 107 | "type(dict)" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "We say `d` is an **instance** of type `dict`. 
Hence" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": 5, 120 | "metadata": {}, 121 | "outputs": [ 122 | { 123 | "data": { 124 | "text/plain": [ 125 | "True" 126 | ] 127 | }, 128 | "execution_count": 5, 129 | "metadata": {}, 130 | "output_type": "execute_result" 131 | } 132 | ], 133 | "source": [ 134 | "isinstance(d, dict)" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "#### Why create your own types/classes?\n", 142 | "\n", 143 | "- Example: a circle in 2D space\n", 144 | "- You want to be able to _change_ the circle in several ways: move it or make it bigger or smaller.\n", 145 | "- You want to be able to compute properties of the circle: its area, circumference, and its distance to the origin." 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 6, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "x = 2.0\n", 155 | "y = 3.0\n", 156 | "r = 1.0 # radius\n", 157 | "\n", 158 | "def area(r):\n", 159 | " \"\"\"Compute the area of a circle with radius r.\"\"\"\n", 160 | " return np.pi * r**2\n", 161 | "\n", 162 | "def circumference(r):\n", 163 | " \"\"\"Compute the circumference of a circle with radius r.\"\"\"\n", 164 | " return 2.0 * np.pi * r\n", 165 | "\n", 166 | "def dist(x, y, r):\n", 167 | " \"\"\"Compute the distance to the origin from a circle with centre x, y and radius r.\"\"\"\n", 168 | " return np.abs(np.sqrt(x**2 + y**2) - r)" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 7, 174 | "metadata": {}, 175 | "outputs": [ 176 | { 177 | "data": { 178 | "text/plain": [ 179 | "2.605551275463989" 180 | ] 181 | }, 182 | "execution_count": 7, 183 | "metadata": {}, 184 | "output_type": "execute_result" 185 | } 186 | ], 187 | "source": [ 188 | "dist(x, y, r)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": 8, 194 | "metadata": {}, 195 | "outputs": [ 196 | { 197 | "data": { 198 | 
"text/plain": [ 199 | "3.141592653589793" 200 | ] 201 | }, 202 | "execution_count": 8, 203 | "metadata": {}, 204 | "output_type": "execute_result" 205 | } 206 | ], 207 | "source": [ 208 | "area(r)" 209 | ] 210 | }, 211 | { 212 | "cell_type": "markdown", 213 | "metadata": {}, 214 | "source": [ 215 | "Now let's say you want two circles..." 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": 9, 221 | "metadata": {}, 222 | "outputs": [ 223 | { 224 | "data": { 225 | "text/plain": [ 226 | "4.5" 227 | ] 228 | }, 229 | "execution_count": 9, 230 | "metadata": {}, 231 | "output_type": "execute_result" 232 | } 233 | ], 234 | "source": [ 235 | "x2 = -3\n", 236 | "y2 = 4\n", 237 | "r2 = 0.5\n", 238 | "\n", 239 | "dist(x2, y2, r2)" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "This approach is very clunky. What if you accidentally call" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 10, 252 | "metadata": {}, 253 | "outputs": [ 254 | { 255 | "data": { 256 | "text/plain": [ 257 | "4.0" 258 | ] 259 | }, 260 | "execution_count": 10, 261 | "metadata": {}, 262 | "output_type": "execute_result" 263 | } 264 | ], 265 | "source": [ 266 | "dist(x2, y2, r) # use the radius of the other circle by accident" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": {}, 272 | "source": [ 273 | "Ok, so maybe you can wrap everything in dictionaries:" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": 11, 279 | "metadata": {}, 280 | "outputs": [ 281 | { 282 | "data": { 283 | "text/plain": [ 284 | "2.605551275463989" 285 | ] 286 | }, 287 | "execution_count": 11, 288 | "metadata": {}, 289 | "output_type": "execute_result" 290 | } 291 | ], 292 | "source": [ 293 | "circle1 = {\"x\" : x,\n", 294 | " \"y\" : y,\n", 295 | " \"r\" : r}\n", 296 | "\n", 297 | "circle2 = {\"x\" : x2,\n", 298 | " \"y\" : y2,\n", 299 | " \"r\" : r2}\n", 300 | "\n", 301 | 
"dist(**circle1) # fancy syntax to \"unpack\" a dictionary into the arguments of a function, assuming the keys of the dictionary match the expected argument names" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "The above is slightly better, but still awkward. For example, you might accidentally do" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 12, 314 | "metadata": {}, 315 | "outputs": [], 316 | "source": [ 317 | "circle3 = {\"x\" : 5,\n", 318 | " \"z\" : 2, # now circle3 has different property names by accident\n", 319 | " \"r\" : 3}" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 13, 325 | "metadata": {}, 326 | "outputs": [ 327 | { 328 | "ename": "TypeError", 329 | "evalue": "dist() got an unexpected keyword argument 'z'", 330 | "output_type": "error", 331 | "traceback": [ 332 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 333 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 334 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mdist\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m**\u001b[0m\u001b[0mcircle3\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 335 | "\u001b[0;31mTypeError\u001b[0m: dist() got an unexpected keyword argument 'z'" 336 | ] 337 | } 338 | ], 339 | "source": [ 340 | "dist(**circle3)" 341 | ] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "- Classes allow us to enforce the _structure of our data_.\n", 348 | " - That is, a circle contains a $x$, $y$, and $r$.\n", 349 | "- It also helps writing functions, as you'll see.\n", 350 | " - Above, all our functions had to take in the same data and re-explain the arguments." 
351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "#### Making a class" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": {}, 363 | "source": [ 364 | "- The syntax below creates a class, or type, called `Circle`. \n", 365 | "- The functions defined inside a class are called **methods**.\n", 366 | "- The `__init__` method is run when you create a new instance of the class (i.e. a new `Circle` object)." 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 371 | "execution_count": 14, 372 | "metadata": {}, 373 | "outputs": [], 374 | "source": [ 375 | "class Circle:\n", 376 | " \"\"\"A circle with a centre (x,y) and radius r.\"\"\"\n", 377 | " \n", 378 | " def __init__(self, x, y, r):\n", 379 | " self.x = x\n", 380 | " self.y = y\n", 381 | " self.r = r" 382 | ] 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "metadata": {}, 387 | "source": [ 388 | "Let's re-create `circle1`:" 389 | ] 390 | }, 391 | { 392 | "cell_type": "code", 393 | "execution_count": 15, 394 | "metadata": {}, 395 | "outputs": [], 396 | "source": [ 397 | "circle1 = Circle(2.0, 3.0, 1.0)" 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": 16, 403 | "metadata": {}, 404 | "outputs": [ 405 | { 406 | "data": { 407 | "text/plain": [ 408 | "__main__.Circle" 409 | ] 410 | }, 411 | "execution_count": 16, 412 | "metadata": {}, 413 | "output_type": "execute_result" 414 | } 415 | ], 416 | "source": [ 417 | "type(circle1)" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": 17, 423 | "metadata": {}, 424 | "outputs": [ 425 | { 426 | "data": { 427 | "text/plain": [ 428 | "2.0" 429 | ] 430 | }, 431 | "execution_count": 17, 432 | "metadata": {}, 433 | "output_type": "execute_result" 434 | } 435 | ], 436 | "source": [ 437 | "circle1.x # retrieve one of the fields" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": {}, 443 | "source": [ 444 | "Let's now implement the 
methods:" 445 | ] 446 | }, 447 | { 448 | "cell_type": "code", 449 | "execution_count": 18, 450 | "metadata": {}, 451 | "outputs": [], 452 | "source": [ 453 | "class Circle:\n", 454 | " \"\"\"A circle with a centre (x,y) and radius r.\"\"\"\n", 455 | " \n", 456 | " def __init__(self, x, y, r=1.0):\n", 457 | " # For those familiar with a \"constructor\" - this is it!\n", 458 | " self.x = x\n", 459 | " self.y = y\n", 460 | " self.r = r\n", 461 | " \n", 462 | " def area(self):\n", 463 | " return np.pi * self.r**2\n", 464 | "\n", 465 | " def circumference(self):\n", 466 | " return 2.0 * np.pi * self.r\n", 467 | "\n", 468 | " def dist(self):\n", 469 | " \"\"\"Compute the distance to the origin.\"\"\"\n", 470 | " return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)" 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": {}, 476 | "source": [ 477 | "Some things to note:\n", 478 | "\n", 479 | "- The inputs to the methods are just `self`. \n", 480 | "- This `self` object is literally itself; thus, it gives you access to all the data inside the class using `self.x`, etc. 
\n", 481 | "- No need to re-explain the arguments each time, just explain the data at the start of the class.\n", 482 | " - This makes the code cleaner, more reusable and more modular.\n", 483 | "- We call the functions with the `.`" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": 19, 489 | "metadata": {}, 490 | "outputs": [], 491 | "source": [ 492 | "circle1 = Circle(2.0, 3.0, 1.0)" 493 | ] 494 | }, 495 | { 496 | "cell_type": "code", 497 | "execution_count": 20, 498 | "metadata": {}, 499 | "outputs": [ 500 | { 501 | "data": { 502 | "text/plain": [ 503 | "3.141592653589793" 504 | ] 505 | }, 506 | "execution_count": 20, 507 | "metadata": {}, 508 | "output_type": "execute_result" 509 | } 510 | ], 511 | "source": [ 512 | "circle1.area()" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": 21, 518 | "metadata": {}, 519 | "outputs": [ 520 | { 521 | "data": { 522 | "text/plain": [ 523 | "2.605551275463989" 524 | ] 525 | }, 526 | "execution_count": 21, 527 | "metadata": {}, 528 | "output_type": "execute_result" 529 | } 530 | ], 531 | "source": [ 532 | "circle1.dist()" 533 | ] 534 | }, 535 | { 536 | "cell_type": "markdown", 537 | "metadata": {}, 538 | "source": [ 539 | "In fact, we've seen this before:" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": 22, 545 | "metadata": {}, 546 | "outputs": [], 547 | "source": [ 548 | "d = dict()\n", 549 | "\n", 550 | "for key, val in d.items():\n", 551 | " pass" 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "This is the same `.` because `items` is a method of the `dict` class." 
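To make the role of `self` and the `.` a little more concrete, here is a small sketch using a hypothetical `Square` class (not part of the lecture code). Calling a method on an instance is equivalent to calling the function stored on the class and passing the instance in as `self`.

```python
class Square:
    """A square with a given side length (hypothetical example class)."""

    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side ** 2


sq = Square(3.0)

via_instance = sq.area()     # the usual way: Python fills in self for us
via_class = Square.area(sq)  # the same call, with self passed explicitly

# The same holds for built-in types such as dict
d = {"a": 1}
same_items = list(d.items()) == list(dict.items(d))
```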
559 | ] 560 | }, 561 | { 562 | "cell_type": "code", 563 | "execution_count": 23, 564 | "metadata": {}, 565 | "outputs": [ 566 | { 567 | "data": { 568 | "text/plain": [ 569 | "array([5, 3, 4, 5, 2, 7, 0, 5])" 570 | ] 571 | }, 572 | "execution_count": 23, 573 | "metadata": {}, 574 | "output_type": "execute_result" 575 | } 576 | ], 577 | "source": [ 578 | "a = np.random.randint(10, size=8) # make a numpy array\n", 579 | "a" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": 24, 585 | "metadata": {}, 586 | "outputs": [ 587 | { 588 | "data": { 589 | "text/plain": [ 590 | "(8,)" 591 | ] 592 | }, 593 | "execution_count": 24, 594 | "metadata": {}, 595 | "output_type": "execute_result" 596 | } 597 | ], 598 | "source": [ 599 | "a.shape" 600 | ] 601 | }, 602 | { 603 | "cell_type": "code", 604 | "execution_count": 25, 605 | "metadata": {}, 606 | "outputs": [ 607 | { 608 | "data": { 609 | "text/plain": [ 610 | "8" 611 | ] 612 | }, 613 | "execution_count": 25, 614 | "metadata": {}, 615 | "output_type": "execute_result" 616 | } 617 | ], 618 | "source": [ 619 | "a.size" 620 | ] 621 | }, 622 | { 623 | "cell_type": "markdown", 624 | "metadata": {}, 625 | "source": [ 626 | "These are fields of the `ndarray` object. 
Here is a method:" 627 | ] 628 | }, 629 | { 630 | "cell_type": "code", 631 | "execution_count": 26, 632 | "metadata": {}, 633 | "outputs": [ 634 | { 635 | "data": { 636 | "text/plain": [ 637 | "array([0, 2, 3, 4, 5, 5, 5, 7])" 638 | ] 639 | }, 640 | "execution_count": 26, 641 | "metadata": {}, 642 | "output_type": "execute_result" 643 | } 644 | ], 645 | "source": [ 646 | "a.sort()\n", 647 | "a" 648 | ] 649 | }, 650 | { 651 | "cell_type": "markdown", 652 | "metadata": {}, 653 | "source": [ 654 | "- Now imagine we also wanted a function to compute the distance between two circles.\n", 655 | "- This would have been a pain before:" 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": 27, 661 | "metadata": {}, 662 | "outputs": [ 663 | { 664 | "data": { 665 | "text/plain": [ 666 | "3.5990195135927845" 667 | ] 668 | }, 669 | "execution_count": 27, 670 | "metadata": {}, 671 | "output_type": "execute_result" 672 | } 673 | ], 674 | "source": [ 675 | "def dist_between(x1, y1, r1, x2, y2, r2):\n", 676 | " \"\"\"\n", 677 | " Compute the distance between one circle and another circle.\n", 678 | " \n", 679 | " Arguments:\n", 680 | " x1 -- (float) x-coordinate of the centre of the first circle\n", 681 | " y1 -- (float) y-coordinate of the centre of the first circle\n", 682 | " r1 -- (float) radius of the first circle\n", 683 | " x2 -- (float) x-coordinate of the centre of the second circle\n", 684 | " y2 -- (float) y-coordinate of the centre of the second circle\n", 685 | " r2 -- (float) radius of the second circle\n", 686 | " \"\"\"\n", 687 | " return np.sqrt((x1 - x2)**2 + (y1 - y2)**2) - (r1 + r2)\n", 688 | "\n", 689 | "dist_between(x, y, r, x2, y2, r2)" 690 | ] 691 | }, 692 | { 693 | "cell_type": "markdown", 694 | "metadata": {}, 695 | "source": [ 696 | "- What a mess!\n", 697 | "- Now it's much cleaner (and yes I'm violating DRY, but just for teaching purposes!): " 698 | ] 699 | }, 700 | { 701 | "cell_type": "code", 702 | "execution_count": 36, 703 | 
"metadata": {}, 704 | "outputs": [], 705 | "source": [ 706 | "class Circle:\n", 707 | " \"\"\"A circle with a centre (x,y) and radius r.\"\"\"\n", 708 | " \n", 709 | " def __init__(self, x, y, r):\n", 710 | " self.x = x\n", 711 | " self.y = y\n", 712 | " self.r = r\n", 713 | " \n", 714 | " def area(self):\n", 715 | " return np.pi * self.r**2\n", 716 | "\n", 717 | " def circumference(self):\n", 718 | " return 2.0 * np.pi * self.r\n", 719 | "\n", 720 | " def dist(self):\n", 721 | " \"\"\"Compute the distance to the origin.\"\"\"\n", 722 | " return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)\n", 723 | " \n", 724 | " def dist_between(self, other):\n", 725 | " \"\"\"\n", 726 | " Compute the distance between this circle and another circle.\n", 727 | " \n", 728 | " Parameters\n", 729 | " ----------\n", 730 | " other : Circle\n", 731 | " the other circle.\n", 732 | " \"\"\"\n", 733 | " if not isinstance(other, Circle):\n", 734 | " raise Exception(\"other must be a Circle!!!\")\n", 735 | " \n", 736 | " return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r)" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": 41, 742 | "metadata": {}, 743 | "outputs": [], 744 | "source": [ 745 | "circle1 = Circle(2.0, 3.0, 1.0)" 746 | ] 747 | }, 748 | { 749 | "cell_type": "code", 750 | "execution_count": 42, 751 | "metadata": {}, 752 | "outputs": [], 753 | "source": [ 754 | "circle2 = Circle(8,9,0.1)" 755 | ] 756 | }, 757 | { 758 | "cell_type": "code", 759 | "execution_count": 43, 760 | "metadata": {}, 761 | "outputs": [ 762 | { 763 | "data": { 764 | "text/plain": [ 765 | "7.38528137423857" 766 | ] 767 | }, 768 | "execution_count": 43, 769 | "metadata": {}, 770 | "output_type": "execute_result" 771 | } 772 | ], 773 | "source": [ 774 | "circle2.dist_between(circle1)" 775 | ] 776 | }, 777 | { 778 | "cell_type": "markdown", 779 | "metadata": {}, 780 | "source": [ 781 | "#### Changing data in a class\n", 782 | "\n", 783 | "- Classes you 
create are generally mutable.\n", 784 | "- You can directly change the data like this:" 785 | ] 786 | }, 787 | { 788 | "cell_type": "code", 789 | "execution_count": 44, 790 | "metadata": {}, 791 | "outputs": [ 792 | { 793 | "data": { 794 | "text/plain": [ 795 | "6.283185307179586" 796 | ] 797 | }, 798 | "execution_count": 44, 799 | "metadata": {}, 800 | "output_type": "execute_result" 801 | } 802 | ], 803 | "source": [ 804 | "circle1.circumference()" 805 | ] 806 | }, 807 | { 808 | "cell_type": "code", 809 | "execution_count": 45, 810 | "metadata": {}, 811 | "outputs": [ 812 | { 813 | "data": { 814 | "text/plain": [ 815 | "62.83185307179586" 816 | ] 817 | }, 818 | "execution_count": 45, 819 | "metadata": {}, 820 | "output_type": "execute_result" 821 | } 822 | ], 823 | "source": [ 824 | "circle1.r = 10\n", 825 | "circle1.circumference()" 826 | ] 827 | }, 828 | { 829 | "cell_type": "markdown", 830 | "metadata": {}, 831 | "source": [ 832 | "You can also create methods that allow the user to change the object:" 833 | ] 834 | }, 835 | { 836 | "cell_type": "code", 837 | "execution_count": 46, 838 | "metadata": {}, 839 | "outputs": [], 840 | "source": [ 841 | "class Circle:\n", 842 | " \"\"\"A circle with a centre (x,y) and radius r.\"\"\"\n", 843 | " \n", 844 | " def __init__(self, x, y, r):\n", 845 | " self.x = x\n", 846 | " self.y = y\n", 847 | " self.r = r\n", 848 | " \n", 849 | " def area(self):\n", 850 | " return np.pi * self.r**2\n", 851 | "\n", 852 | " def circumference(self):\n", 853 | " return 2.0 * np.pi * self.r\n", 854 | "\n", 855 | " def dist(self):\n", 856 | " \"\"\"Compute the distance to the origin.\"\"\"\n", 857 | " return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)\n", 858 | " \n", 859 | " def dist_between(self, other):\n", 860 | " \"\"\"Compute the distance between this circle and another circle.\"\"\"\n", 861 | " return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r)\n", 862 | " \n", 863 | " def translate(self, Δx, 
Δy):\n", 864 | " \"\"\"Move the circle by (Δx, Δy)\"\"\"\n", 865 | " self.x += Δx\n", 866 | " self.y += Δy\n", 867 | " return self # This is not needed, but is sometimes convenient." 868 | ] 869 | }, 870 | { 871 | "cell_type": "code", 872 | "execution_count": 47, 873 | "metadata": {}, 874 | "outputs": [], 875 | "source": [ 876 | "circle1 = Circle(2.0, 3.0, 1.0)" 877 | ] 878 | }, 879 | { 880 | "cell_type": "code", 881 | "execution_count": 48, 882 | "metadata": {}, 883 | "outputs": [ 884 | { 885 | "data": { 886 | "text/plain": [ 887 | "2.605551275463989" 888 | ] 889 | }, 890 | "execution_count": 48, 891 | "metadata": {}, 892 | "output_type": "execute_result" 893 | } 894 | ], 895 | "source": [ 896 | "circle1.dist()" 897 | ] 898 | }, 899 | { 900 | "cell_type": "code", 901 | "execution_count": 49, 902 | "metadata": {}, 903 | "outputs": [ 904 | { 905 | "data": { 906 | "text/plain": [ 907 | "16.69180601295413" 908 | ] 909 | }, 910 | "execution_count": 49, 911 | "metadata": {}, 912 | "output_type": "execute_result" 913 | } 914 | ], 915 | "source": [ 916 | "circle1.translate(10, 10)\n", 917 | "circle1.dist()" 918 | ] 919 | }, 920 | { 921 | "cell_type": "markdown", 922 | "metadata": {}, 923 | "source": [ 924 | "#### Other special methods" 925 | ] 926 | }, 927 | { 928 | "cell_type": "markdown", 929 | "metadata": {}, 930 | "source": [ 931 | "- Aside from `__init__`, there are other special methods you might find useful.\n", 932 | "- For example, what if we want to print our object?" 
933 | ] 934 | }, 935 | { 936 | "cell_type": "code", 937 | "execution_count": 50, 938 | "metadata": {}, 939 | "outputs": [ 940 | { 941 | "name": "stdout", 942 | "output_type": "stream", 943 | "text": [ 944 | "<__main__.Circle object at 0x106ce4dd8>\n" 945 | ] 946 | } 947 | ], 948 | "source": [ 949 | "print(circle1)" 950 | ] 951 | }, 952 | { 953 | "cell_type": "markdown", 954 | "metadata": {}, 955 | "source": [ 956 | "- This doesn't look very good.\n", 957 | "- But other objects, like numpy arrays, print out nicely:" 958 | ] 959 | }, 960 | { 961 | "cell_type": "code", 962 | "execution_count": 51, 963 | "metadata": {}, 964 | "outputs": [ 965 | { 966 | "name": "stdout", 967 | "output_type": "stream", 968 | "text": [ 969 | "[0 2 3 4 5 5 5 7]\n" 970 | ] 971 | } 972 | ], 973 | "source": [ 974 | "print(a)" 975 | ] 976 | }, 977 | { 978 | "cell_type": "markdown", 979 | "metadata": {}, 980 | "source": [ 981 | "- To specify how our object is printed, we can define a method called `__str__` ([Python documentation](https://docs.python.org/3/reference/datamodel.html#object.__str__))." 
982 | ] 983 | }, 984 | { 985 | "cell_type": "code", 986 | "execution_count": 114, 987 | "metadata": {}, 988 | "outputs": [], 989 | "source": [ 990 | "class Circle:\n", 991 | "    \"\"\"A circle with a centre (x,y) and radius r.\"\"\"\n", 992 | "\n", 993 | "    def __init__(self, x, y, r):\n", 994 | "        self.x = x\n", 995 | "        self.y = y\n", 996 | "        self.r = r\n", 997 | "        # No self.area attribute here: it would shadow the area() method below.\n", 998 | "\n", 999 | "    def area(self):\n", 1000 | "        return np.pi * self.r**2\n", 1001 | "\n", 1002 | "    def circumference(self):\n", 1003 | "        return 2.0 * np.pi * self.r\n", 1004 | "\n", 1005 | "    def dist(self):\n", 1006 | "        \"\"\"Compute the distance to the origin.\"\"\"\n", 1007 | "        return np.abs(np.sqrt(self.x**2 + self.y**2) - self.r)\n", 1008 | "\n", 1009 | "    def dist_between(self, other):\n", 1010 | "        \"\"\"Compute the distance between this circle and another circle.\"\"\"\n", 1011 | "        return np.sqrt((self.x - other.x)**2 + (self.y - other.y)**2) - (self.r + other.r)\n", 1012 | "\n", 1013 | "    def translate(self, Δx, Δy):\n", 1014 | "        \"\"\"Move the circle by (Δx, Δy)\"\"\"\n", 1015 | "        self.x += Δx\n", 1016 | "        self.y += Δy\n", 1017 | "        return self # This is not needed, but is sometimes convenient.\n", 1018 | "\n", 1019 | "    def __str__(self):\n", 1020 | "        return \"A Circle at (%.1f, %.1f) with radius %.1f.\" % (self.x, self.y, self.r)" 1021 | ] 1022 | }, 1023 | { 1024 | "cell_type": "code", 1025 | "execution_count": 115, 1026 | "metadata": {}, 1027 | "outputs": [], 1028 | "source": [ 1029 | "circle1 = Circle(2.0, 3.0, 1.0)" 1030 | ] 1031 | }, 1032 | { 1033 | "cell_type": "code", 1034 | "execution_count": 116, 1035 | "metadata": {}, 1036 | "outputs": [ 1037 | { 1038 | "name": "stdout", 1039 | "output_type": "stream", 1040 | "text": [ 1041 | "A Circle at (2.0, 3.0) with radius 1.0.\n" 1042 | ] 1043 | } 1044 | ], 1045 | "source": [ 1046 | "print(circle1)" 1047 | ] 1048 | }, 1049 | { 1050 | "cell_type": "markdown", 1051 | "metadata": {}, 1052 | "source": [ 1053 | "## 
Python `import` (10 min)" 1054 | ] 1055 | }, 1056 | { 1057 | "cell_type": "markdown", 1058 | "metadata": {}, 1059 | "source": [ 1060 | "- It is often useful to collect a bunch of classes and functions into **modules** or **packages** ([Python package documentation](https://docs.python.org/3/tutorial/modules.html#packages)).\n", 1061 | " - For example, numpy is a package that contains both classes (e.g. `np.ndarray`) and functions (e.g. `np.sqrt`) and even constants (e.g. `np.pi`).\n", 1062 | "- We will discuss packages in depth in DSCI 524.\n", 1063 | "- For now, we'll just discuss importing packages.\n", 1064 | "- Unfortunately, this is a bit confusing." 1065 | ] 1066 | }, 1067 | { 1068 | "cell_type": "markdown", 1069 | "metadata": {}, 1070 | "source": [ 1071 | "#### Ways of importing things\n", 1072 | "\n", 1073 | "Let's use `numpy` as an example, and import it in various ways.\n" 1074 | ] 1075 | }, 1076 | { 1077 | "cell_type": "markdown", 1078 | "metadata": {}, 1079 | "source": [ 1080 | "Import a package:" 1081 | ] 1082 | }, 1083 | { 1084 | "cell_type": "code", 1085 | "execution_count": 55, 1086 | "metadata": {}, 1087 | "outputs": [], 1088 | "source": [ 1089 | "import numpy" 1090 | ] 1091 | }, 1092 | { 1093 | "cell_type": "code", 1094 | "execution_count": 56, 1095 | "metadata": {}, 1096 | "outputs": [ 1097 | { 1098 | "data": { 1099 | "text/plain": [ 1100 | "2.23606797749979" 1101 | ] 1102 | }, 1103 | "execution_count": 56, 1104 | "metadata": {}, 1105 | "output_type": "execute_result" 1106 | } 1107 | ], 1108 | "source": [ 1109 | "numpy.sqrt(5)" 1110 | ] 1111 | }, 1112 | { 1113 | "cell_type": "markdown", 1114 | "metadata": {}, 1115 | "source": [ 1116 | "Import a package, but refer to it by a different name:" 1117 | ] 1118 | }, 1119 | { 1120 | "cell_type": "code", 1121 | "execution_count": 57, 1122 | "metadata": {}, 1123 | "outputs": [], 1124 | "source": [ 1125 | "import numpy as np" 1126 | ] 1127 | }, 1128 | { 1129 | "cell_type": "code", 1130 | "execution_count": 
58, 1131 | "metadata": {}, 1132 | "outputs": [ 1133 | { 1134 | "data": { 1135 | "text/plain": [ 1136 | "2.23606797749979" 1137 | ] 1138 | }, 1139 | "execution_count": 58, 1140 | "metadata": {}, 1141 | "output_type": "execute_result" 1142 | } 1143 | ], 1144 | "source": [ 1145 | "np.sqrt(5)" 1146 | ] 1147 | }, 1148 | { 1149 | "cell_type": "code", 1150 | "execution_count": 59, 1151 | "metadata": {}, 1152 | "outputs": [ 1153 | { 1154 | "data": { 1155 | "text/plain": [ 1156 | "-0.26086894926921717" 1157 | ] 1158 | }, 1159 | "execution_count": 59, 1160 | "metadata": {}, 1161 | "output_type": "execute_result" 1162 | } 1163 | ], 1164 | "source": [ 1165 | "np.random.randn()" 1166 | ] 1167 | }, 1168 | { 1169 | "cell_type": "markdown", 1170 | "metadata": {}, 1171 | "source": [ 1172 | "Import a particular function from a package:" 1173 | ] 1174 | }, 1175 | { 1176 | "cell_type": "code", 1177 | "execution_count": 60, 1178 | "metadata": {}, 1179 | "outputs": [], 1180 | "source": [ 1181 | "from numpy.random import randn" 1182 | ] 1183 | }, 1184 | { 1185 | "cell_type": "code", 1186 | "execution_count": 61, 1187 | "metadata": {}, 1188 | "outputs": [ 1189 | { 1190 | "data": { 1191 | "text/plain": [ 1192 | "-0.44897876253709507" 1193 | ] 1194 | }, 1195 | "execution_count": 61, 1196 | "metadata": {}, 1197 | "output_type": "execute_result" 1198 | } 1199 | ], 1200 | "source": [ 1201 | "randn() # now I can refer to it without the package/module names" 1202 | ] 1203 | }, 1204 | { 1205 | "cell_type": "code", 1206 | "execution_count": 62, 1207 | "metadata": {}, 1208 | "outputs": [], 1209 | "source": [ 1210 | "from numpy.random import randn as random_gaussian" 1211 | ] 1212 | }, 1213 | { 1214 | "cell_type": "code", 1215 | "execution_count": 63, 1216 | "metadata": {}, 1217 | "outputs": [ 1218 | { 1219 | "data": { 1220 | "text/plain": [ 1221 | "0.898816560829361" 1222 | ] 1223 | }, 1224 | "execution_count": 63, 1225 | "metadata": {}, 1226 | "output_type": "execute_result" 1227 | } 1228 | ], 
1229 | "source": [ 1230 | "random_gaussian()" 1231 | ] 1232 | }, 1233 | { 1234 | "cell_type": "code", 1235 | "execution_count": 64, 1236 | "metadata": {}, 1237 | "outputs": [ 1238 | { 1239 | "data": { 1240 | "text/plain": [ 1241 | "0.6725552471462851" 1242 | ] 1243 | }, 1244 | "execution_count": 64, 1245 | "metadata": {}, 1246 | "output_type": "execute_result" 1247 | } 1248 | ], 1249 | "source": [ 1250 | "np.random.rand()" 1251 | ] 1252 | }, 1253 | { 1254 | "cell_type": "markdown", 1255 | "metadata": {}, 1256 | "source": [ 1257 | "It's also possible to import everything in a module, though this is generally not recommended:" 1258 | ] 1259 | }, 1260 | { 1261 | "cell_type": "code", 1262 | "execution_count": 65, 1263 | "metadata": {}, 1264 | "outputs": [], 1265 | "source": [ 1266 | "from numpy.random import *" 1267 | ] 1268 | }, 1269 | { 1270 | "cell_type": "code", 1271 | "execution_count": 66, 1272 | "metadata": {}, 1273 | "outputs": [ 1274 | { 1275 | "data": { 1276 | "text/plain": [ 1277 | "1" 1278 | ] 1279 | }, 1280 | "execution_count": 66, 1281 | "metadata": {}, 1282 | "output_type": "execute_result" 1283 | } 1284 | ], 1285 | "source": [ 1286 | "binomial(10, 0.1)" 1287 | ] 1288 | }, 1289 | { 1290 | "cell_type": "markdown", 1291 | "metadata": {}, 1292 | "source": [ 1293 | "#### Some annoying facts of life" 1294 | ] 1295 | }, 1296 | { 1297 | "cell_type": "markdown", 1298 | "metadata": {}, 1299 | "source": [ 1300 | "The module and the function might have the same name:" 1301 | ] 1302 | }, 1303 | { 1304 | "cell_type": "code", 1305 | "execution_count": 67, 1306 | "metadata": {}, 1307 | "outputs": [], 1308 | "source": [ 1309 | "import random" 1310 | ] 1311 | }, 1312 | { 1313 | "cell_type": "code", 1314 | "execution_count": 68, 1315 | "metadata": {}, 1316 | "outputs": [ 1317 | { 1318 | "data": { 1319 | "text/plain": [ 1320 | "0.31015211304267387" 1321 | ] 1322 | }, 1323 | "execution_count": 68, 1324 | "metadata": {}, 1325 | "output_type": "execute_result" 1326 | } 1327 | 
], 1328 | "source": [ 1329 | "random.random()" 1330 | ] 1331 | }, 1332 | { 1333 | "cell_type": "code", 1334 | "execution_count": 69, 1335 | "metadata": {}, 1336 | "outputs": [], 1337 | "source": [ 1338 | "from random import random" 1339 | ] 1340 | }, 1341 | { 1342 | "cell_type": "code", 1343 | "execution_count": 70, 1344 | "metadata": {}, 1345 | "outputs": [ 1346 | { 1347 | "data": { 1348 | "text/plain": [ 1349 | "0.04886840168047635" 1350 | ] 1351 | }, 1352 | "execution_count": 70, 1353 | "metadata": {}, 1354 | "output_type": "execute_result" 1355 | } 1356 | ], 1357 | "source": [ 1358 | "random()" 1359 | ] 1360 | }, 1361 | { 1362 | "cell_type": "markdown", 1363 | "metadata": {}, 1364 | "source": [ 1365 | "Sometimes you may need to explicitly import submodules to use them:" 1366 | ] 1367 | }, 1368 | { 1369 | "cell_type": "code", 1370 | "execution_count": 71, 1371 | "metadata": {}, 1372 | "outputs": [], 1373 | "source": [ 1374 | "import scipy" 1375 | ] 1376 | }, 1377 | { 1378 | "cell_type": "code", 1379 | "execution_count": 72, 1380 | "metadata": {}, 1381 | "outputs": [ 1382 | { 1383 | "ename": "AttributeError", 1384 | "evalue": "module 'scipy' has no attribute 'stats'", 1385 | "output_type": "error", 1386 | "traceback": [ 1387 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 1388 | "\u001b[0;31mAttributeError\u001b[0m Traceback (most recent call last)", 1389 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mscipy\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mstats\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 1390 | "\u001b[0;31mAttributeError\u001b[0m: module 'scipy' has no attribute 'stats'" 1391 | ] 1392 | } 1393 | ], 1394 | "source": [ 1395 | "scipy.stats" 1396 | ] 1397 | }, 1398 | { 1399 | "cell_type": "code", 1400 | "execution_count": 73, 1401 | "metadata": {}, 1402 | "outputs": [], 1403 | "source": [ 1404 | "import scipy.stats" 1405 
| ] 1406 | }, 1407 | { 1408 | "cell_type": "code", 1409 | "execution_count": 74, 1410 | "metadata": {}, 1411 | "outputs": [ 1412 | { 1413 | "data": { 1414 | "text/plain": [ 1415 | "" 1416 | ] 1417 | }, 1418 | "execution_count": 74, 1419 | "metadata": {}, 1420 | "output_type": "execute_result" 1421 | } 1422 | ], 1423 | "source": [ 1424 | "scipy.stats" 1425 | ] 1426 | }, 1427 | { 1428 | "cell_type": "markdown", 1429 | "metadata": {}, 1430 | "source": [ 1431 | "In Python, the import name and the install name do not necessarily match:" 1432 | ] 1433 | }, 1434 | { 1435 | "cell_type": "code", 1436 | "execution_count": 75, 1437 | "metadata": {}, 1438 | "outputs": [], 1439 | "source": [ 1440 | "import sklearn" 1441 | ] 1442 | }, 1443 | { 1444 | "cell_type": "markdown", 1445 | "metadata": {}, 1446 | "source": [ 1447 | "To install, run `pip install scikit-learn`." 1448 | ] 1449 | }, 1450 | { 1451 | "cell_type": "markdown", 1452 | "metadata": {}, 1453 | "source": [ 1454 | "#### `dir`\n", 1455 | "\n", 1456 | "You can use `dir` to look up what can be done with an object:" 1457 | ] 1458 | }, 1459 | { 1460 | "cell_type": "code", 1461 | "execution_count": 76, 1462 | "metadata": {}, 1463 | "outputs": [ 1464 | { 1465 | "data": { 1466 | "text/plain": [ 1467 | "['__class__',\n", 1468 | " '__delattr__',\n", 1469 | " '__dict__',\n", 1470 | " '__dir__',\n", 1471 | " '__doc__',\n", 1472 | " '__eq__',\n", 1473 | " '__format__',\n", 1474 | " '__ge__',\n", 1475 | " '__getattribute__',\n", 1476 | " '__gt__',\n", 1477 | " '__hash__',\n", 1478 | " '__init__',\n", 1479 | " '__init_subclass__',\n", 1480 | " '__le__',\n", 1481 | " '__lt__',\n", 1482 | " '__module__',\n", 1483 | " '__ne__',\n", 1484 | " '__new__',\n", 1485 | " '__reduce__',\n", 1486 | " '__reduce_ex__',\n", 1487 | " '__repr__',\n", 1488 | " '__setattr__',\n", 1489 | " '__sizeof__',\n", 1490 | " '__str__',\n", 1491 | " '__subclasshook__',\n", 1492 | " '__weakref__',\n", 1493 | " 'area',\n", 1494 | " 'circumference',\n", 1495 | " 
'dist',\n", 1496 | " 'dist_between',\n", 1497 | " 'r',\n", 1498 | " 'translate',\n", 1499 | " 'x',\n", 1500 | " 'y']" 1501 | ] 1502 | }, 1503 | "execution_count": 76, 1504 | "metadata": {}, 1505 | "output_type": "execute_result" 1506 | } 1507 | ], 1508 | "source": [ 1509 | "dir(circle1)" 1510 | ] 1511 | }, 1512 | { 1513 | "cell_type": "markdown", 1514 | "metadata": {}, 1515 | "source": [ 1516 | "## Importing your own functions (5 min)\n", 1517 | "\n", 1518 | "- In many MDS courses we only work in Jupyter - it is a great teaching & learning environment.\n", 1519 | "- However, when we write larger pieces of code we will need to move to `.py` files. \n", 1520 | "- Let's restart the kernel so that `Circle` is no longer in the environment." 1521 | ] 1522 | }, 1523 | { 1524 | "cell_type": "code", 1525 | "execution_count": 77, 1526 | "metadata": {}, 1527 | "outputs": [], 1528 | "source": [ 1529 | "circle = Circle(1,2,3)" 1530 | ] 1531 | }, 1532 | { 1533 | "cell_type": "markdown", 1534 | "metadata": {}, 1535 | "source": [ 1536 | "- Luckily, I have a file in this directory named `circle.py` - let's take a look." 
1537 | ] 1538 | }, 1539 | { 1540 | "cell_type": "code", 1541 | "execution_count": 78, 1542 | "metadata": {}, 1543 | "outputs": [], 1544 | "source": [ 1545 | "from circle import Circle" 1546 | ] 1547 | }, 1548 | { 1549 | "cell_type": "code", 1550 | "execution_count": 79, 1551 | "metadata": {}, 1552 | "outputs": [], 1553 | "source": [ 1554 | "c = Circle(1,2,3)" 1555 | ] 1556 | }, 1557 | { 1558 | "cell_type": "code", 1559 | "execution_count": 80, 1560 | "metadata": {}, 1561 | "outputs": [ 1562 | { 1563 | "ename": "NameError", 1564 | "evalue": "name 'my_function' is not defined", 1565 | "output_type": "error", 1566 | "traceback": [ 1567 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 1568 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 1569 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mmy_function\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 1570 | "\u001b[0;31mNameError\u001b[0m: name 'my_function' is not defined" 1571 | ] 1572 | } 1573 | ], 1574 | "source": [ 1575 | "my_function()" 1576 | ] 1577 | }, 1578 | { 1579 | "cell_type": "code", 1580 | "execution_count": 81, 1581 | "metadata": {}, 1582 | "outputs": [], 1583 | "source": [ 1584 | "from circle import *" 1585 | ] 1586 | }, 1587 | { 1588 | "cell_type": "code", 1589 | "execution_count": 82, 1590 | "metadata": {}, 1591 | "outputs": [], 1592 | "source": [ 1593 | "my_function()" 1594 | ] 1595 | }, 1596 | { 1597 | "cell_type": "code", 1598 | "execution_count": 83, 1599 | "metadata": {}, 1600 | "outputs": [ 1601 | { 1602 | "data": { 1603 | "text/plain": [ 1604 | "5" 1605 | ] 1606 | }, 1607 | "execution_count": 83, 1608 | "metadata": {}, 1609 | "output_type": "execute_result" 1610 | } 1611 | ], 1612 | "source": [ 1613 | "MY_CONSTANT" 1614 | ] 1615 | }, 1616 | { 1617 | "cell_type": "markdown", 1618 | "metadata": {}, 
1619 | "source": [ 1620 | "- We imported not only a class, but also a function and a single variable.\n", 1621 | "- It makes sense that we can import all of these, because they are all objects in Python, just with different types:" 1622 | ] 1623 | }, 1624 | { 1625 | "cell_type": "code", 1626 | "execution_count": 84, 1627 | "metadata": {}, 1628 | "outputs": [ 1629 | { 1630 | "data": { 1631 | "text/plain": [ 1632 | "type" 1633 | ] 1634 | }, 1635 | "execution_count": 84, 1636 | "metadata": {}, 1637 | "output_type": "execute_result" 1638 | } 1639 | ], 1640 | "source": [ 1641 | "type(Circle)" 1642 | ] 1643 | }, 1644 | { 1645 | "cell_type": "code", 1646 | "execution_count": 85, 1647 | "metadata": {}, 1648 | "outputs": [ 1649 | { 1650 | "data": { 1651 | "text/plain": [ 1652 | "function" 1653 | ] 1654 | }, 1655 | "execution_count": 85, 1656 | "metadata": {}, 1657 | "output_type": "execute_result" 1658 | } 1659 | ], 1660 | "source": [ 1661 | "type(my_function)" 1662 | ] 1663 | }, 1664 | { 1665 | "cell_type": "code", 1666 | "execution_count": 86, 1667 | "metadata": {}, 1668 | "outputs": [ 1669 | { 1670 | "data": { 1671 | "text/plain": [ 1672 | "int" 1673 | ] 1674 | }, 1675 | "execution_count": 86, 1676 | "metadata": {}, 1677 | "output_type": "execute_result" 1678 | } 1679 | ], 1680 | "source": [ 1681 | "type(MY_CONSTANT)" 1682 | ] 1683 | }, 1684 | { 1685 | "cell_type": "markdown", 1686 | "metadata": {}, 1687 | "source": [ 1688 | "And `c` itself has a type that we defined:" 1689 | ] 1690 | }, 1691 | { 1692 | "cell_type": "code", 1693 | "execution_count": 87, 1694 | "metadata": {}, 1695 | "outputs": [ 1696 | { 1697 | "data": { 1698 | "text/plain": [ 1699 | "circle.Circle" 1700 | ] 1701 | }, 1702 | "execution_count": 87, 1703 | "metadata": {}, 1704 | "output_type": "execute_result" 1705 | } 1706 | ], 1707 | "source": [ 1708 | "type(c)" 1709 | ] 1710 | }, 1711 | { 1712 | "cell_type": "code", 1713 | "execution_count": 88, 1714 | "metadata": {}, 1715 | "outputs": [ 1716 | { 1717 | 
"data": { 1718 | "text/plain": [ 1719 | "3.141592653589793" 1720 | ] 1721 | }, 1722 | "execution_count": 88, 1723 | "metadata": {}, 1724 | "output_type": "execute_result" 1725 | } 1726 | ], 1727 | "source": [ 1728 | "np.pi" 1729 | ] 1730 | }, 1731 | { 1732 | "cell_type": "markdown", 1733 | "metadata": {}, 1734 | "source": [ 1735 | "## Break (5 min)" 1736 | ] 1737 | }, 1738 | { 1739 | "cell_type": "code", 1740 | "execution_count": 89, 1741 | "metadata": {}, 1742 | "outputs": [], 1743 | "source": [ 1744 | "import numpy as np" 1745 | ] 1746 | }, 1747 | { 1748 | "cell_type": "markdown", 1749 | "metadata": {}, 1750 | "source": [ 1751 | "## Intriguing behaviour in Python (5 min)" 1752 | ] 1753 | }, 1754 | { 1755 | "cell_type": "markdown", 1756 | "metadata": {}, 1757 | "source": [ 1758 | "What do you think the code below will print?" 1759 | ] 1760 | }, 1761 | { 1762 | "cell_type": "code", 1763 | "execution_count": 90, 1764 | "metadata": { 1765 | "jupyter": { 1766 | "outputs_hidden": false 1767 | } 1768 | }, 1769 | "outputs": [ 1770 | { 1771 | "data": { 1772 | "text/plain": [ 1773 | "1" 1774 | ] 1775 | }, 1776 | "execution_count": 90, 1777 | "metadata": {}, 1778 | "output_type": "execute_result" 1779 | } 1780 | ], 1781 | "source": [ 1782 | "x = 1\n", 1783 | "y = x\n", 1784 | "x = 2\n", 1785 | "y" 1786 | ] 1787 | }, 1788 | { 1789 | "cell_type": "markdown", 1790 | "metadata": {}, 1791 | "source": [ 1792 | "And how about the next one?" 
1793 | ] 1794 | }, 1795 | { 1796 | "cell_type": "code", 1797 | "execution_count": 91, 1798 | "metadata": {}, 1799 | "outputs": [ 1800 | { 1801 | "data": { 1802 | "text/plain": [ 1803 | "[2]" 1804 | ] 1805 | }, 1806 | "execution_count": 91, 1807 | "metadata": {}, 1808 | "output_type": "execute_result" 1809 | } 1810 | ], 1811 | "source": [ 1812 | "x = [1]\n", 1813 | "y = x\n", 1814 | "x[0] = 2\n", 1815 | "y" 1816 | ] 1817 | }, 1818 | { 1819 | "cell_type": "markdown", 1820 | "metadata": {}, 1821 | "source": [ 1822 | "## References (10 min)\n", 1823 | "\n", 1824 | "- In Python, the list `x` is a **reference** to some location in the computer's memory.\n", 1825 | "- When you set `y = x`, these two variables now refer to the same location in memory - the one that `x` referred to.\n", 1826 | "- Setting `x[0] = 2` goes and modifies that memory, so the change is visible through both `x` and `y`.\n", 1827 | "  - It makes no difference whether you set `x[0] = 2` or `y[0] = 2`; both modify the same memory.\n" 1828 | ] 1829 | }, 1830 | { 1831 | "cell_type": "markdown", 1832 | "metadata": {}, 1833 | "source": [ 1834 | "- However, immutable built-in types such as `int`, `float`, `bool` and `str` _appear_ to be exceptions to this logic:\n", 1835 | "  - `y = x` still makes both names refer to the same object, but that object cannot be modified in place. `x = 2` simply points `x` at a different object, so `y` keeps the value `1` and behaves as if it were a copy.\n", 1836 | "  - Thus, the list example is actually the typical case; the integer example is the \"special\" case.\n", 1837 | "\n", 1838 | "- Analogy:\n", 1839 | "  - I share a Dropbox folder (or git repo) with you, and you modify it -- I sent you _the location of the stuff_ (this is like the list case)\n", 1840 | "  - I send you an email with a file attached, you download it and modify the file -- I sent you _the stuff itself_ (this is like the integer case)\n", 1841 | "\n" 1842 | ] 1843 | }, 1844 | { 1845 | "cell_type": "markdown", 1846 | "metadata": {}, 1847 | "source": [ 1848 | "And this?" 
1849 | ] 1850 | }, 1851 | { 1852 | "cell_type": "code", 1853 | "execution_count": 92, 1854 | "metadata": {}, 1855 | "outputs": [ 1856 | { 1857 | "data": { 1858 | "text/plain": [ 1859 | "[1]" 1860 | ] 1861 | }, 1862 | "execution_count": 92, 1863 | "metadata": {}, 1864 | "output_type": "execute_result" 1865 | } 1866 | ], 1867 | "source": [ 1868 | "x = [1]\n", 1869 | "y = x\n", 1870 | "x = [2] # before we had x[0] = 2\n", 1871 | "y" 1872 | ] 1873 | }, 1874 | { 1875 | "cell_type": "markdown", 1876 | "metadata": {}, 1877 | "source": [ 1878 | "


\n", 1879 | "No, here we are not modifying the contents of `x`, we are setting `x` to refer to a new list `[2]`." 1880 | ] 1881 | }, 1882 | { 1883 | "cell_type": "markdown", 1884 | "metadata": {}, 1885 | "source": [ 1886 | "#### Additional weirdness" 1887 | ] 1888 | }, 1889 | { 1890 | "cell_type": "code", 1891 | "execution_count": 93, 1892 | "metadata": {}, 1893 | "outputs": [ 1894 | { 1895 | "data": { 1896 | "text/plain": [ 1897 | "array([1, 2, 3, 4, 5])" 1898 | ] 1899 | }, 1900 | "execution_count": 93, 1901 | "metadata": {}, 1902 | "output_type": "execute_result" 1903 | } 1904 | ], 1905 | "source": [ 1906 | "x = np.array([1,2,3,4,5])\n", 1907 | "y = x\n", 1908 | "x = x + 5\n", 1909 | "y" 1910 | ] 1911 | }, 1912 | { 1913 | "cell_type": "code", 1914 | "execution_count": 94, 1915 | "metadata": {}, 1916 | "outputs": [ 1917 | { 1918 | "data": { 1919 | "text/plain": [ 1920 | "array([ 6,  7,  8,  9, 10])" 1921 | ] 1922 | }, 1923 | "execution_count": 94, 1924 | "metadata": {}, 1925 | "output_type": "execute_result" 1926 | } 1927 | ], 1928 | "source": [ 1929 | "x = np.array([1,2,3,4,5])\n", 1930 | "y = x\n", 1931 | "x += 5\n", 1932 | "y" 1933 | ] 1934 | }, 1935 | { 1936 | "cell_type": "markdown", 1937 | "metadata": {}, 1938 | "source": [ 1939 | "So, it turns out that, for a NumPy array, `x += 5` is not identical to `x = x + 5`.\n", 1940 | "\n", 1941 | "- The former modifies the contents of `x` in place, so `y` sees the change.\n", 1942 | "- The latter first evaluates `x + 5` to a new array of the same size, and then overwrites the name `x` with a reference to this new array, leaving `y` untouched." 1943 | ] 1944 | }, 1945 | { 1946 | "cell_type": "markdown", 1947 | "metadata": {}, 1948 | "source": [ 1949 | "## Function calls and references (5 min)" 1950 | ] 1951 | }, 1952 | { 1953 | "cell_type": "markdown", 1954 | "metadata": {}, 1955 | "source": [ 1956 | "How about these?" 
1957 | ] 1958 | }, 1959 | { 1960 | "cell_type": "code", 1961 | "execution_count": 95, 1962 | "metadata": { 1963 | "jupyter": { 1964 | "outputs_hidden": false 1965 | } 1966 | }, 1967 | "outputs": [ 1968 | { 1969 | "data": { 1970 | "text/plain": [ 1971 | "\"I'm outside.\"" 1972 | ] 1973 | }, 1974 | "execution_count": 95, 1975 | "metadata": {}, 1976 | "output_type": "execute_result" 1977 | } 1978 | ], 1979 | "source": [ 1980 | "def foo(y):\n", 1981 | "    y = \"Hello from inside foo!\"\n", 1982 | "    return y\n", 1983 | "\n", 1984 | "x = \"I'm outside.\"\n", 1985 | "foo(x)\n", 1986 | "x" 1987 | ] 1988 | }, 1989 | { 1990 | "cell_type": "code", 1991 | "execution_count": 96, 1992 | "metadata": {}, 1993 | "outputs": [ 1994 | { 1995 | "data": { 1996 | "text/plain": [ 1997 | "['Hello from inside foo!']" 1998 | ] 1999 | }, 2000 | "execution_count": 96, 2001 | "metadata": {}, 2002 | "output_type": "execute_result" 2003 | } 2004 | ], 2005 | "source": [ 2006 | "def bar(y):\n", 2007 | "    y[0] = \"Hello from inside foo!\"\n", 2008 | "x = [\"I'm outside.\"]\n", 2009 | "bar(x)\n", 2010 | "x" 2011 | ] 2012 | }, 2013 | { 2014 | "cell_type": "markdown", 2015 | "metadata": {}, 2016 | "source": [ 2017 | "- Above: the fact that you called a function is not relevant.\n", 2018 | "- When you pass the value of `x` into the function and it becomes `y` inside the function, that is basically like the `y = x` we had above.\n", 2019 | "- In the second example (`bar`), we say the function has a [side effect](https://en.wikipedia.org/wiki/Side_effect_(computer_science))." 
2020 | ] 2021 | }, 2022 | { 2023 | "cell_type": "code", 2024 | "execution_count": 97, 2025 | "metadata": { 2026 | "jupyter": { 2027 | "outputs_hidden": false 2028 | } 2029 | }, 2030 | "outputs": [ 2031 | { 2032 | "data": { 2033 | "text/plain": [ 2034 | "'Hello from inside foo!'" 2035 | ] 2036 | }, 2037 | "execution_count": 97, 2038 | "metadata": {}, 2039 | "output_type": "execute_result" 2040 | } 2041 | ], 2042 | "source": [ 2043 | "x = \"I'm outside.\"\n", 2044 | "x = foo(x)\n", 2045 | "x" 2046 | ] 2047 | }, 2048 | { 2049 | "cell_type": "markdown", 2050 | "metadata": {}, 2051 | "source": [ 2052 | "- Above: in this case, `x` is not getting modified inside `foo`.\n", 2053 | "- Rather, it's getting overwritten after the function call." 2054 | ] 2055 | }, 2056 | { 2057 | "cell_type": "markdown", 2058 | "metadata": {}, 2059 | "source": [ 2060 | "- (Optional) If you're interested, there is a bunch of terminology you can look up:\n", 2061 | "  - pass by value (call by value)\n", 2062 | "  - pass by reference (call by reference)\n", 2063 | "  - copy-on-modify\n", 2064 | "  - lazy copying\n", 2065 | "  - ...\n" 2066 | ] 2067 | }, 2068 | { 2069 | "cell_type": "markdown", 2070 | "metadata": {}, 2071 | "source": [ 2072 | "- Good news: we don't need to memorize special rules for calling functions.\n", 2073 | "- Immutable types such as `int`, `float`, `bool`, `str` and `tuple` behave as if they were copied, because they cannot be modified in place; everything else is effectively passed \"by reference\".\n", 2074 | "- Now you see why we care whether objects are mutable or immutable... passing around a reference to a mutable object can be dangerous!\n", 2075 | "- **General rule**: if you do `x = ...` then you're not modifying the original, but if you do `x.SOMETHING = y` or `x[SOMETHING] = y` or `x *= y` then you probably are." 2076 | ] 2077 | }, 2078 | { 2079 | "cell_type": "markdown", 2080 | "metadata": {}, 2081 | "source": [ 2082 | "Note: In R, life is simpler: you're never \"modifying the original\" inside a function." 
2083 | ] 2084 | }, 2085 | { 2086 | "cell_type": "markdown", 2087 | "metadata": {}, 2088 | "source": [ 2089 | "## `copy` and `deepcopy` (10 min)" 2090 | ] 2091 | }, 2092 | { 2093 | "cell_type": "code", 2094 | "execution_count": 98, 2095 | "metadata": { 2096 | "jupyter": { 2097 | "outputs_hidden": false 2098 | } 2099 | }, 2100 | "outputs": [ 2101 | { 2102 | "data": { 2103 | "text/plain": [ 2104 | "[2]" 2105 | ] 2106 | }, 2107 | "execution_count": 98, 2108 | "metadata": {}, 2109 | "output_type": "execute_result" 2110 | } 2111 | ], 2112 | "source": [ 2113 | "import copy\n", 2114 | "\n", 2115 | "x = [1]\n", 2116 | "y = x\n", 2117 | "x[0] = 2\n", 2118 | "y" 2119 | ] 2120 | }, 2121 | { 2122 | "cell_type": "code", 2123 | "execution_count": 99, 2124 | "metadata": {}, 2125 | "outputs": [ 2126 | { 2127 | "data": { 2128 | "text/plain": [ 2129 | "[1]" 2130 | ] 2131 | }, 2132 | "execution_count": 99, 2133 | "metadata": {}, 2134 | "output_type": "execute_result" 2135 | } 2136 | ], 2137 | "source": [ 2138 | "x = [1]\n", 2139 | "y = copy.copy(x)\n", 2140 | "x[0] = 2\n", 2141 | "y" 2142 | ] 2143 | }, 2144 | { 2145 | "cell_type": "markdown", 2146 | "metadata": {}, 2147 | "source": [ 2148 | "Ok, so what do you think will happen here?" 2149 | ] 2150 | }, 2151 | { 2152 | "cell_type": "code", 2153 | "execution_count": 100, 2154 | "metadata": { 2155 | "jupyter": { 2156 | "outputs_hidden": false 2157 | } 2158 | }, 2159 | "outputs": [ 2160 | { 2161 | "name": "stdout", 2162 | "output_type": "stream", 2163 | "text": [ 2164 | "[['pikachu'], [2, 99], [3, 'hi']]\n", 2165 | "[['pikachu'], [2, 99], [3, 'hi']]\n" 2166 | ] 2167 | } 2168 | ], 2169 | "source": [ 2170 | "x = [[1], [2,99], [3, \"hi\"]] # a list of lists\n", 2171 | "\n", 2172 | "y = copy.copy(x) \n", 2173 | "\n", 2174 | "x[0][0] = \"pikachu\"\n", 2175 | "print(x)\n", 2176 | "print(y)" 2177 | ] 2178 | }, 2179 | { 2180 | "cell_type": "markdown", 2181 | "metadata": {}, 2182 | "source": [ 2183 | "


\n", 2184 | "What happened? \n", 2185 | "\n", 2186 | "- `copy` makes the _containers_ different, i.e. the outer list. \n", 2187 | "- But the outer lists both point to the same data.\n", 2188 | "- This is what happens after `y = copy.copy(x)`:\n", 2189 | "\n", 2190 | "![](listCopySmall.jpg)" 2191 | ] 2192 | }, 2193 | { 2194 | "cell_type": "markdown", 2195 | "metadata": {}, 2196 | "source": [ 2197 | "We can use `is` to tell apart these scenarios." 2198 | ] 2199 | }, 2200 | { 2201 | "cell_type": "code", 2202 | "execution_count": 101, 2203 | "metadata": {}, 2204 | "outputs": [ 2205 | { 2206 | "data": { 2207 | "text/plain": [ 2208 | "True" 2209 | ] 2210 | }, 2211 | "execution_count": 101, 2212 | "metadata": {}, 2213 | "output_type": "execute_result" 2214 | } 2215 | ], 2216 | "source": [ 2217 | "x == y # they are both lists of the same lists" 2218 | ] 2219 | }, 2220 | { 2221 | "cell_type": "code", 2222 | "execution_count": 102, 2223 | "metadata": {}, 2224 | "outputs": [ 2225 | { 2226 | "data": { 2227 | "text/plain": [ 2228 | "False" 2229 | ] 2230 | }, 2231 | "execution_count": 102, 2232 | "metadata": {}, 2233 | "output_type": "execute_result" 2234 | } 2235 | ], 2236 | "source": [ 2237 | "x is y # but they are not the *same* lists of that stuff" 2238 | ] 2239 | }, 2240 | { 2241 | "cell_type": "markdown", 2242 | "metadata": {}, 2243 | "source": [ 2244 | "So, by that logic..." 
2245 | ] 2246 | }, 2247 | { 2248 | "cell_type": "code", 2249 | "execution_count": 103, 2250 | "metadata": {}, 2251 | "outputs": [ 2252 | { 2253 | "name": "stdout", 2254 | "output_type": "stream", 2255 | "text": [ 2256 | "[['pikachu'], [2, 99], [3, 'hi']]\n", 2257 | "[['pikachu'], [2, 99], [3, 'hi'], 5]\n" 2258 | ] 2259 | } 2260 | ], 2261 | "source": [ 2262 | "y.append(5)\n", 2263 | "print(x)\n", 2264 | "print(y)" 2265 | ] 2266 | }, 2267 | { 2268 | "cell_type": "code", 2269 | "execution_count": 104, 2270 | "metadata": {}, 2271 | "outputs": [ 2272 | { 2273 | "data": { 2274 | "text/plain": [ 2275 | "False" 2276 | ] 2277 | }, 2278 | "execution_count": 104, 2279 | "metadata": {}, 2280 | "output_type": "execute_result" 2281 | } 2282 | ], 2283 | "source": [ 2284 | "x == y" 2285 | ] 2286 | }, 2287 | { 2288 | "cell_type": "markdown", 2289 | "metadata": {}, 2290 | "source": [ 2291 | "


\n", 2292 | "That makes sense, as weird as it seems. " 2293 | ] 2294 | }, 2295 | { 2296 | "cell_type": "markdown", 2297 | "metadata": {}, 2298 | "source": [ 2299 | "- In short, `copy` copies one level down.\n", 2300 | "- What if we want to copy everything?\n", 2301 | "- Enter our friend `deepcopy`:" 2302 | ] 2303 | }, 2304 | { 2305 | "cell_type": "code", 2306 | "execution_count": 105, 2307 | "metadata": { 2308 | "jupyter": { 2309 | "outputs_hidden": false 2310 | } 2311 | }, 2312 | "outputs": [ 2313 | { 2314 | "name": "stdout", 2315 | "output_type": "stream", 2316 | "text": [ 2317 | "[['pikachu'], [2, 99], [3, 'hi']]\n", 2318 | "[[1], [2, 99], [3, 'hi']]\n" 2319 | ] 2320 | } 2321 | ], 2322 | "source": [ 2323 | "x = [[1], [2,99], [3, \"hi\"]] \n", 2324 | "\n", 2325 | "y = copy.deepcopy(x)\n", 2326 | "\n", 2327 | "x[0][0] = \"pikachu\"\n", 2328 | "print(x)\n", 2329 | "print(y)" 2330 | ] 2331 | }, 2332 | { 2333 | "cell_type": "markdown", 2334 | "metadata": {}, 2335 | "source": [ 2336 | "## Scoping (10 min)" 2337 | ] 2338 | }, 2339 | { 2340 | "cell_type": "code", 2341 | "execution_count": 106, 2342 | "metadata": {}, 2343 | "outputs": [ 2344 | { 2345 | "data": { 2346 | "text/plain": [ 2347 | "5" 2348 | ] 2349 | }, 2350 | "execution_count": 106, 2351 | "metadata": {}, 2352 | "output_type": "execute_result" 2353 | } 2354 | ], 2355 | "source": [ 2356 | "def f():\n", 2357 | " x = 10\n", 2358 | "\n", 2359 | "x = 5\n", 2360 | "f()\n", 2361 | "x" 2362 | ] 2363 | }, 2364 | { 2365 | "cell_type": "code", 2366 | "execution_count": 107, 2367 | "metadata": { 2368 | "jupyter": { 2369 | "outputs_hidden": false 2370 | } 2371 | }, 2372 | "outputs": [ 2373 | { 2374 | "ename": "NameError", 2375 | "evalue": "name 'new_variable' is not defined", 2376 | "output_type": "error", 2377 | "traceback": [ 2378 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 2379 | "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", 2380 | 
"\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mf\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 5\u001b[0;31m \u001b[0mnew_variable\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 2381 | "\u001b[0;31mNameError\u001b[0m: name 'new_variable' is not defined" 2382 | ] 2383 | } 2384 | ], 2385 | "source": [ 2386 | "def f():\n", 2387 | " new_variable = 10\n", 2388 | "\n", 2389 | "f()\n", 2390 | "new_variable" 2391 | ] 2392 | }, 2393 | { 2394 | "cell_type": "markdown", 2395 | "metadata": {}, 2396 | "source": [ 2397 | "- It looks like the `x` inside and outside the function are different.\n", 2398 | "- It looks like `new_variable` is defined only for use inside the function.\n", 2399 | "- That is generally a good way of thinking, and is more true in other languages.\n", 2400 | "- This is called **scope** (see [Wikipedia article](https://en.wikipedia.org/wiki/Scope_(computer_science))).\n", 2401 | "- However, in Python things are dangerously loose and permissive, so **be careful**." 
2402 | ] 2403 | }, 2404 | { 2405 | "cell_type": "code", 2406 | "execution_count": 108, 2407 | "metadata": { 2408 | "jupyter": { 2409 | "outputs_hidden": false 2410 | } 2411 | }, 2412 | "outputs": [ 2413 | { 2414 | "name": "stdout", 2415 | "output_type": "stream", 2416 | "text": [ 2417 | "hello world\n" 2418 | ] 2419 | } 2420 | ], 2421 | "source": [ 2422 | "def bat():\n", 2423 | " print(s)\n", 2424 | " \n", 2425 | "s = \"hello world\"\n", 2426 | "bat()" 2427 | ] 2428 | }, 2429 | { 2430 | "cell_type": "code", 2431 | "execution_count": 109, 2432 | "metadata": { 2433 | "jupyter": { 2434 | "outputs_hidden": false 2435 | } 2436 | }, 2437 | "outputs": [ 2438 | { 2439 | "name": "stdout", 2440 | "output_type": "stream", 2441 | "text": [ 2442 | "another string\n" 2443 | ] 2444 | } 2445 | ], 2446 | "source": [ 2447 | "def bat(s):\n", 2448 | " print(s)\n", 2449 | " \n", 2450 | "s = \"hello world\" \n", 2451 | "bat(\"another string\")" 2452 | ] 2453 | }, 2454 | { 2455 | "cell_type": "markdown", 2456 | "metadata": {}, 2457 | "source": [ 2458 | "What happened? 
\n", 2459 | "\n", 2460 | "- In the first case, `s` was not defined inside the function, so Python looked it up in the scope outside the function.\n", 2461 | "- In the second case, `s` is a parameter, so the value passed in was used instead of the outer `s`.\n", 2462 | "- This is worrying, because a function can also modify mutable objects from the outer scope:" 2463 | ] 2464 | }, 2465 | { 2466 | "cell_type": "code", 2467 | "execution_count": 110, 2468 | "metadata": {}, 2469 | "outputs": [ 2470 | { 2471 | "data": { 2472 | "text/plain": [ 2473 | "[99999, 2, 3]" 2474 | ] 2475 | }, 2476 | "execution_count": 110, 2477 | "metadata": {}, 2478 | "output_type": "execute_result" 2479 | } 2480 | ], 2481 | "source": [ 2482 | "def modify_the_stuff():\n", 2483 | " the_stuff[0] = 99999\n", 2484 | " \n", 2485 | "the_stuff = [1,2,3]\n", 2486 | "modify_the_stuff()\n", 2487 | "the_stuff" 2488 | ] 2489 | }, 2490 | { 2491 | "cell_type": "markdown", 2492 | "metadata": {}, 2493 | "source": [ 2494 | "- Above: `modify_the_stuff` modified a list that was not even passed in as an argument!\n", 2495 | "- So functions can really mess with your stuff without you knowing. \n", 2496 | "- Please do not write code like this!\n", 2497 | " - Safest: functions with no side effects.\n", 2498 | " - Acceptable: functions with side effects, clearly documented.\n", 2499 | " - Disaster: functions with undocumented side effects on their arguments.\n", 2500 | " - Complete disaster: functions modifying stuff that you didn't even pass into the function." 
2501 | ] 2502 | }, 2503 | { 2504 | "cell_type": "markdown", 2505 | "metadata": {}, 2506 | "source": [ 2507 | "Some other things to avoid:" 2508 | ] 2509 | }, 2510 | { 2511 | "cell_type": "code", 2512 | "execution_count": 111, 2513 | "metadata": {}, 2514 | "outputs": [ 2515 | { 2516 | "ename": "TypeError", 2517 | "evalue": "'int' object is not callable", 2518 | "output_type": "error", 2519 | "traceback": [ 2520 | "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", 2521 | "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)", 2522 | "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"hello\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m", 2523 | "\u001b[0;32m\u001b[0m in \u001b[0;36mfunc\u001b[0;34m(s, len)\u001b[0m\n\u001b[1;32m 1\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mlen\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 2\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mlen\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0ms\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 3\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mfunc\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"hello\"\u001b[0m\u001b[0;34m,\u001b[0m 
\u001b[0;36m5\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", 2524 | "\u001b[0;31mTypeError\u001b[0m: 'int' object is not callable" 2525 | ] 2526 | } 2527 | ], 2528 | "source": [ 2529 | "def func(s, len):\n", 2530 | " print(len(s))\n", 2531 | " \n", 2532 | "func(\"hello\", 5)" 2533 | ] 2534 | }, 2535 | { 2536 | "cell_type": "markdown", 2537 | "metadata": {}, 2538 | "source": [ 2539 | "- Above: don't do this. The parameter named `len` shadows the built-in `len` function inside `func`, so `len(s)` tries to call the integer `5`.\n", 2540 | "- Below: reassigning a parameter inside a function does not change the caller's variable, and functions can call other functions defined in the global scope:" 2541 | ] 2542 | }, 2543 | { 2544 | "cell_type": "code", 2545 | "execution_count": 112, 2546 | "metadata": {}, 2547 | "outputs": [ 2548 | { 2549 | "data": { 2550 | "text/plain": [ 2551 | "6" 2552 | ] 2553 | }, 2554 | "execution_count": 112, 2555 | "metadata": {}, 2556 | "output_type": "execute_result" 2557 | } 2558 | ], 2559 | "source": [ 2560 | "def hello(a):\n", 2561 | " a = a + 5 \n", 2562 | " return a\n", 2563 | "\n", 2564 | "a = 1\n", 2565 | "hello(a) # hello(1)" 2566 | ] 2567 | }, 2568 | { 2569 | "cell_type": "code", 2570 | "execution_count": 113, 2571 | "metadata": {}, 2572 | "outputs": [ 2573 | { 2574 | "name": "stdout", 2575 | "output_type": "stream", 2576 | "text": [ 2577 | "Hello from f!\n" 2578 | ] 2579 | } 2580 | ], 2581 | "source": [ 2582 | "def f():\n", 2583 | " print(\"Hello from f!\")\n", 2584 | " \n", 2585 | "def g():\n", 2586 | " f()\n", 2587 | " \n", 2588 | "g()" 2589 | ] 2590 | }, 2591 | { 2592 | "cell_type": "markdown", 2593 | "metadata": {}, 2594 | "source": [ 2595 | "That is, there's no need to pass the function `f` into `g` to call it, because `f` is \"global\"." 
2596 | ] 2597 | }, 2598 | { 2599 | "cell_type": "markdown", 2600 | "metadata": {}, 2601 | "source": [ 2602 | "#### That's all, folks\n", 2603 | "\n", 2604 | "- This is my last lecture of DSCI 511.\n", 2605 | "- MDS-V students, I will see you in DSCI 512, 572, 553.\n", 2606 | "- MDS-CL students, I will see you in lab tomorrow.\n", 2607 | "- I have office hours right now (12:30-1:30)\n", 2608 | "- Good luck with the lab!" 2609 | ] 2610 | } 2611 | ], 2612 | "metadata": { 2613 | "kernelspec": { 2614 | "display_name": "Python 3", 2615 | "language": "python", 2616 | "name": "python3" 2617 | }, 2618 | "language_info": { 2619 | "codemirror_mode": { 2620 | "name": "ipython", 2621 | "version": 3 2622 | }, 2623 | "file_extension": ".py", 2624 | "mimetype": "text/x-python", 2625 | "name": "python", 2626 | "nbconvert_exporter": "python", 2627 | "pygments_lexer": "ipython3", 2628 | "version": "3.7.3" 2629 | } 2630 | }, 2631 | "nbformat": 4, 2632 | "nbformat_minor": 4 2633 | } 2634 | -------------------------------------------------------------------------------- /lectures/listCopySmall.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/UBC-MDS/DSCI_511_prog-dsci/b7fad58c131b6f8f1715b435c642048eef9c7b87/lectures/listCopySmall.jpg --------------------------------------------------------------------------------