19 | ```
20 |
21 | # OOP I: Objects and Methods
22 |
23 | ## Overview
24 |
25 | The traditional programming paradigm (think Fortran, C, MATLAB, etc.) is called [procedural](https://en.wikipedia.org/wiki/Procedural_programming).
26 |
27 | It works as follows
28 |
29 | * The program has a state corresponding to the values of its variables.
30 | * Functions are called to act on and transform the state.
31 | * Final outputs are produced via a sequence of function calls.
32 |
33 | Two other important paradigms are [object-oriented programming](https://en.wikipedia.org/wiki/Object-oriented_programming) (OOP) and [functional programming](https://en.wikipedia.org/wiki/Functional_programming).
34 |
35 |
36 | In the OOP paradigm, data and functions are bundled together into "objects" --- and functions in this context are referred to as **methods**.
37 |
38 | Methods are called to transform the data contained in the object.
39 |
40 | * Think of a Python list that contains data and has methods such as `append()` and `pop()` that transform the data.
41 |
42 | Functional programming languages are built on the idea of composing functions.
43 |
44 | * Influential examples include [Lisp](https://en.wikipedia.org/wiki/Common_Lisp), [Haskell](https://en.wikipedia.org/wiki/Haskell) and [Elixir](https://en.wikipedia.org/wiki/Elixir_(programming_language)).
45 |
46 | So which of these categories does Python fit into?
47 |
48 | Actually Python is a pragmatic language that blends object-oriented, functional and procedural styles, rather than taking a purist approach.
49 |
50 | On one hand, this allows Python and its users to cherry pick nice aspects of different paradigms.
51 |
52 | On the other hand, the lack of purity might at times lead to some confusion.
53 |
54 | Fortunately this confusion is minimized if you understand that, at a foundational level, Python *is* object-oriented.
55 |
56 | By this we mean that, in Python, *everything is an object*.
57 |
58 | In this lecture, we explain what that statement means and why it matters.
59 |
60 | We'll make use of the following third-party library
61 |
62 |
63 | ```{code-cell} python3
64 | :tags: [hide-output]
65 | !pip install rich
66 | ```
67 |
68 |
69 | ## Objects
70 |
71 | ```{index} single: Python; Objects
72 | ```
73 |
74 | In Python, an *object* is a collection of data and instructions held in computer memory that consists of
75 |
76 | 1. a type
77 | 1. a unique identity
78 | 1. data (i.e., content)
79 | 1. methods
80 |
81 | These concepts are defined and discussed sequentially below.
82 |
83 | (type)=
84 | ### Type
85 |
86 | ```{index} single: Python; Type
87 | ```
88 |
89 | Python provides for different types of objects, to accommodate different categories of data.
90 |
91 | For example
92 |
93 | ```{code-cell} python3
94 | s = 'This is a string'
95 | type(s)
96 | ```
97 |
98 | ```{code-cell} python3
99 | x = 42 # Now let's create an integer
100 | type(x)
101 | ```
102 |
103 | The type of an object matters for many expressions.
104 |
105 | For example, the addition operator between two strings means concatenation
106 |
107 | ```{code-cell} python3
108 | '300' + 'cc'
109 | ```
110 |
111 | On the other hand, between two numbers it means ordinary addition
112 |
113 | ```{code-cell} python3
114 | 300 + 400
115 | ```
116 |
117 | Consider the following expression
118 |
119 | ```{code-cell} python3
120 | ---
121 | tags: [raises-exception]
122 | ---
123 | '300' + 400
124 | ```
125 |
126 | Here we are mixing types, and it's unclear to Python whether the user wants to
127 |
128 | * convert `'300'` to an integer and then add it to `400`, or
129 | * convert `400` to string and then concatenate it with `'300'`
130 |
131 | Some languages might try to guess but Python is *strongly typed*
132 |
133 | * Type is important, and implicit type conversion is rare.
134 | * Python will respond instead by raising a `TypeError`.
135 |
136 | To avoid the error, you need to clarify by changing the relevant type.
137 |
138 | For example,
139 |
140 | ```{code-cell} python3
141 | int('300') + 400 # To add as numbers, change the string to an integer
142 | ```
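
As a minimal sketch, both ways of resolving the ambiguity can be written out explicitly. The variable names `as_number` and `as_text` are illustrative only.

```python
# Two explicit ways to resolve the ambiguity in '300' + 400
as_number = int('300') + 400   # convert the string, then add numerically
as_text = '300' + str(400)     # convert the integer, then concatenate

print(as_number)  # 700
print(as_text)    # 300400

# Mixing the types without converting raises a TypeError
try:
    '300' + 400
except TypeError as err:
    print(type(err).__name__)  # TypeError
```

Either conversion works; the point is that the programmer, not Python, decides which one is intended.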
143 |
144 | (identity)=
145 | ### Identity
146 |
147 | ```{index} single: Python; Identity
148 | ```
149 |
150 | In Python, each object has a unique identifier, which helps Python (and us) keep track of the object.
151 |
152 | The identity of an object can be obtained via the `id()` function
153 |
154 | ```{code-cell} python3
155 | y = 2.5
156 | z = 2.5
157 | id(y)
158 | ```
159 |
160 | ```{code-cell} python3
161 | id(z)
162 | ```
163 |
164 | In this example, `y` and `z` happen to have the same value (i.e., `2.5`), but they are not the same object.
165 |
166 | In CPython, the standard Python implementation, the identity of an object is just the address of the object in memory.
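
The difference between equal values and identical objects can also be checked with the `is` operator, which compares identities rather than values. Here is a small sketch using lists, with illustrative names `a`, `b` and `c`.

```python
a = [1, 2, 3]
b = [1, 2, 3]   # a separate list with equal contents
c = a           # a second name bound to the same list object

print(a == b)          # True: the values are equal
print(a is b)          # False: two distinct objects in memory
print(a is c)          # True: both names refer to one object
print(id(a) == id(c))  # True: same identity
```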
167 |
168 | ### Object Content: Data and Attributes
169 |
170 | ```{index} single: Python; Content
171 | ```
172 |
173 | If we set `x = 42` then we create an object of type `int` that contains
174 | the data `42`.
175 |
176 | In fact, it contains more, as the following example shows
177 |
178 | ```{code-cell} python3
179 | x = 42
180 | x
181 | ```
182 |
183 | ```{code-cell} python3
184 | x.imag
185 | ```
186 |
187 | ```{code-cell} python3
188 | x.__class__
189 | ```
190 |
191 | When Python creates this integer object, it stores with it various auxiliary information, such as the imaginary part, and the type.
192 |
193 | Any name following a dot is called an *attribute* of the object to the left of the dot.
194 |
195 | * e.g., `imag` and `__class__` are attributes of `x`.
196 |
197 | We see from this example that objects have attributes that contain auxiliary information.
198 |
199 | They also have attributes that act like functions, called *methods*.
200 |
201 | These attributes are important, so let's discuss them in-depth.
202 |
203 | (methods)=
204 | ### Methods
205 |
206 | ```{index} single: Python; Methods
207 | ```
208 |
209 | Methods are *functions that are bundled with objects*.
210 |
211 | Formally, methods are attributes of objects that are **callable** -- i.e., attributes that can be called as functions
212 |
213 | ```{code-cell} python3
214 | x = ['foo', 'bar']
215 | callable(x.append)
216 | ```
217 |
218 | ```{code-cell} python3
219 | callable(x.__doc__)
220 | ```
221 |
222 | Methods typically act on the data contained in the object they belong to, or combine that data with other data
223 |
224 | ```{code-cell} python3
225 | x = ['a', 'b']
226 | x.append('c')
227 | s = 'This is a string'
228 | s.upper()
229 | ```
230 |
231 | ```{code-cell} python3
232 | s.lower()
233 | ```
234 |
235 | ```{code-cell} python3
236 | s.replace('This', 'That')
237 | ```
238 |
239 | A great deal of Python functionality is organized around method calls.
240 |
241 | For example, consider the following piece of code
242 |
243 | ```{code-cell} python3
244 | x = ['a', 'b']
245 | x[0] = 'aa' # Item assignment using square bracket notation
246 | x
247 | ```
248 |
249 | It doesn't look like there are any methods used here, but in fact the square bracket assignment notation is just a convenient interface to a method call.
250 |
251 | What actually happens is that Python calls the `__setitem__` method, as follows
252 |
253 | ```{code-cell} python3
254 | x = ['a', 'b']
255 | x.__setitem__(0, 'aa') # Equivalent to x[0] = 'aa'
256 | x
257 | ```
258 |
259 | (If you wanted to you could modify the `__setitem__` method, so that square bracket assignment does something totally different)
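
To illustrate the remark above, here is a minimal sketch of a list subclass that overrides `__setitem__`. The class name `LoudList` and its upper-casing behavior are hypothetical, chosen only to make the effect visible.

```python
class LoudList(list):
    """A hypothetical list whose item assignment upper-cases strings."""

    def __setitem__(self, index, value):
        # Transform string values before storing them
        if isinstance(value, str):
            value = value.upper()
        # Delegate the actual storage to the built-in list behavior
        super().__setitem__(index, value)


x = LoudList(['a', 'b'])
x[0] = 'aa'   # Square bracket assignment now calls our __setitem__
print(x)      # ['AA', 'b']
```

Subclassing is used here because the methods of the built-in `list` type itself cannot be reassigned.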
260 |
261 | ## Inspection Using Rich
262 |
263 | There's a nice package called [rich](https://github.com/Textualize/rich) that
264 | helps us view the contents of an object.
265 |
266 | For example,
267 |
268 | ```{code-cell} python3
269 | from rich import inspect
270 | x = 10
271 | inspect(x)
272 | ```
273 | If we want to see the methods as well, we can use
274 |
275 | ```{code-cell} python3
276 | inspect(10, methods=True)
277 | ```
278 |
279 | In fact there are still more methods, as you can see if you execute `inspect(10, all=True)`.
280 |
281 |
282 |
283 | ## A Little Mystery
284 |
285 | In this lecture we claimed that Python is, at heart, an object-oriented language.
286 |
287 | But here's an example that looks more procedural.
288 |
289 | ```{code-cell} python3
290 | x = ['a', 'b']
291 | m = len(x)
292 | m
293 | ```
294 |
295 | If Python is object-oriented, why don't we use `x.len()`?
296 |
297 | The answer is related to the fact that Python aims for readability and consistent style.
298 |
299 | In Python, it is common for users to build custom objects --- we discuss how to
300 | do this {doc}`later `.
301 |
302 | It's quite common for users to add methods to their objects that measure the length of
303 | the object, suitably defined.
304 |
305 | When naming such a method, natural choices are `len()` and `length()`.
306 |
307 | If some users choose `len()` and others choose `length()`, then the style will
308 | be inconsistent and harder to remember.
309 |
310 | To avoid this, the creator of Python chose to add
311 | `len()` as a built-in function, to help emphasize that `len()` is the convention.
312 |
313 | Now, having said all of this, Python *is* still object-oriented under the hood.
314 |
315 | In fact, the list `x` discussed above has a method called `__len__()`.
316 |
317 | All that the function `len()` does is call this method.
318 |
319 | In other words, the following code is equivalent:
320 |
321 | ```{code-cell} python3
322 | x = ['a', 'b']
323 | len(x)
324 | ```
325 | and
326 |
327 | ```{code-cell} python3
328 | x = ['a', 'b']
329 | x.__len__()
330 | ```
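
The same convention extends to user-defined objects: if a class defines `__len__()`, the built-in `len()` works on its instances. The sketch below uses a hypothetical `Portfolio` class whose length is taken to be the number of assets it holds.

```python
class Portfolio:
    """A hypothetical container; its length is the number of assets held."""

    def __init__(self, assets):
        self.assets = list(assets)

    def __len__(self):
        # Called by the built-in len() function
        return len(self.assets)


p = Portfolio(['bonds', 'stocks', 'cash'])
print(len(p))       # 3
print(p.__len__())  # 3
```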
331 |
332 |
333 | ## Summary
334 |
335 | The message in this lecture is clear:
336 |
337 | * In Python, *everything in memory is treated as an object*.
338 |
339 | This includes not just lists, strings, etc., but also less obvious things, such as
340 |
341 | * functions (once they have been read into memory)
342 | * modules (ditto)
343 | * files opened for reading or writing
344 | * integers, etc.
345 |
346 | Remembering that everything is an object will help you interact with your programs
347 | and write clear Pythonic code.
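
The claim that functions and modules are objects can be checked directly. In this short sketch, the function `f` is illustrative; `math` is the standard library module.

```python
import math


def f(x):
    """Return x squared."""
    return x**2


print(type(f))     # <class 'function'>
print(type(math))  # <class 'module'>

# Like any object, a function has an identity and attributes
print(f.__name__)  # f
print(f.__doc__)   # Return x squared.

g = f              # Bind a second name to the same function object
print(g is f)      # True
print(g(3))        # 9
```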
348 |
349 | ## Exercises
350 |
351 | ```{exercise-start}
352 | :label: oop_intro_ex1
353 | ```
354 |
355 | We have met the {any}`boolean data type ` previously.
356 |
357 | Using what we have learnt in this lecture, print a list of methods of the
358 | boolean object `True`.
359 |
360 | ```{hint}
361 | :class: dropdown
362 |
363 | You can use `callable()` to test whether an attribute of an object can be called as a function
364 | ```
365 |
366 | ```{exercise-end}
367 | ```
368 |
369 | ```{solution-start} oop_intro_ex1
370 | :class: dropdown
371 | ```
372 |
373 | Firstly, we need to find all attributes of `True`, which can be done via
374 |
375 | ```{code-cell} python3
376 | print(sorted(True.__dir__()))
377 | ```
378 |
379 | or
380 |
381 | ```{code-cell} python3
382 | print(sorted(dir(True)))
383 | ```
384 |
385 | Since the boolean data type is a primitive type, you can also find it in the built-in namespace
386 |
387 | ```{code-cell} python3
388 | print(dir(__builtins__.bool))
389 | ```
390 |
391 | Here we use a `for` loop to select the attributes that are callable
392 |
393 | ```{code-cell} python3
394 | attributes = dir(__builtins__.bool)
395 | callablels = []
396 |
397 | for attribute in attributes:
398 |     # Look up each attribute by name and keep it if it can be called
399 |     if callable(getattr(True, attribute)):
400 |         callablels.append(attribute)
401 | print(callablels)
402 | ```
403 |
404 |
405 | ```{solution-end}
406 | ```
407 |
--------------------------------------------------------------------------------
/lectures/workspace.md:
--------------------------------------------------------------------------------
1 | ---
2 | jupytext:
3 | text_representation:
4 | extension: .md
5 | format_name: myst
6 | format_version: 0.13
7 | jupytext_version: 1.17.2
8 | kernelspec:
9 | name: python3
10 | display_name: Python 3 (ipykernel)
11 | language: python
12 | ---
13 |
14 | (workspace)=
15 | ```{raw} jupyter
16 |
21 | ```
22 |
23 | # Writing Longer Programs
24 |
25 | ## Overview
26 |
27 | So far, we have explored the use of Jupyter Notebooks in writing and executing Python code.
28 |
29 | While they are efficient and adaptable when working with short pieces of code, Notebooks are not the best choice for longer programs and scripts.
30 |
31 | Jupyter Notebooks are well suited to interactive computing (e.g., data science workflows) and can help execute chunks of code one at a time.
32 |
33 | Text files and scripts allow for long pieces of code to be written and executed in a single go.
34 |
35 | We will explore the use of Python scripts as an alternative.
36 |
37 | The Jupyter Lab and Visual Studio Code (VS Code) development environments are then introduced along with a primer on version control (Git).
38 |
39 | In this lecture, you will learn to
40 | - work with Python scripts
41 | - set up various development environments
42 | - get started with GitHub
43 |
44 | ```{note}
45 | Going forward, it is assumed that you have an Anaconda environment up and running.
46 |
47 | You may want to [create a new conda environment](https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-with-commands) if you haven't done so already.
48 | ```
49 |
50 | ## Working with Python files
51 |
52 | Python files are used when writing long, reusable blocks of code - by convention, they have a `.py` suffix.
53 |
54 | Let us begin by working with the following example.
55 |
56 | ```{code-cell} ipython3
57 | :caption: sine_wave.py
58 | :lineno-start: 1
59 |
60 | import matplotlib.pyplot as plt
61 | import numpy as np
62 |
63 | x = np.linspace(0, 10, 100)
64 | y = np.sin(x)
65 |
66 | plt.plot(x, y)
67 | plt.xlabel('x')
68 | plt.ylabel('y')
69 | plt.title('Sine Wave')
70 | plt.show()
71 | ```
72 |
73 | As there are various ways to execute the code, we will explore them in the context of different development environments.
74 |
75 | One major advantage of using Python scripts lies in the fact that you can "import" functionality from other scripts into your current script or Jupyter Notebook.
76 |
77 | Let's rewrite the earlier code into a function and write it to a file called `sine_wave.py`.
78 |
79 | ```{code-cell} ipython3
80 | :caption: sine_wave.py
81 | :lineno-start: 1
82 |
83 | %%writefile sine_wave.py
84 |
85 | import matplotlib.pyplot as plt
86 | import numpy as np
87 |
88 | # Define the plot_wave function.
89 | def plot_wave(title: str = 'Sine Wave'):
90 |     x = np.linspace(0, 10, 100)
91 |     y = np.sin(x)
92 |
93 |     plt.plot(x, y)
94 |     plt.xlabel('x')
95 |     plt.ylabel('y')
96 |     plt.title(title)
97 |     plt.show()
98 | ```
99 |
100 | ```{code-cell} ipython3
101 | :caption: second_script.py
102 | :lineno-start: 1
103 |
104 | import sine_wave # Import the sine_wave script
105 |
106 | # Call the plot_wave function.
107 | sine_wave.plot_wave("Sine Wave - Called from the Second Script")
108 | ```
109 |
110 | This allows you to split your code into chunks and structure your codebase better.
111 |
112 | Look into the use of [modules](https://docs.python.org/3/tutorial/modules.html) and [packages](https://docs.python.org/3/tutorial/modules.html#packages) for more information on importing functionality.
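
To see the import mechanism in isolation, here is a self-contained sketch that avoids plotting. It writes a tiny module to a temporary directory and imports it. The module name `demo_tools` and its `double` function are hypothetical stand-ins for a script saved next to your notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a tiny module to a temporary directory
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_tools.py").write_text(
    "def double(x):\n"
    "    return 2 * x\n"
)

# Python searches the directories listed in sys.path when importing
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()

# Has the same effect as the statement `import demo_tools`
demo_tools = importlib.import_module("demo_tools")

print(demo_tools.double(21))  # 42
print(type(demo_tools))       # <class 'module'>
```

In everyday use the script usually sits in the working directory, which is already on the search path, so a plain `import` statement is enough.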
113 |
114 | ## Development environments
115 |
116 | A development environment is a one stop workspace where you can
117 | - edit and run your code
118 | - test and debug
119 | - manage project files
120 |
121 | This lecture takes you through the workings of two development environments.
122 |
123 | ## A step forward from Jupyter Notebooks: JupyterLab
124 |
125 | JupyterLab is a browser based development environment for Jupyter Notebooks, code scripts, and data files.
126 |
127 | You can [try JupyterLab in the browser](https://jupyter.org/try-jupyter/lab/) if you want to test it out before installing it locally.
128 |
129 | You can install JupyterLab using pip
130 |
131 | ```
132 | > pip install jupyterlab
133 | ```
134 |
135 | and launch it in the browser, similar to Jupyter Notebooks.
136 |
137 | ```
138 | > jupyter-lab
139 | ```
140 |
141 | ```{figure} /_static/lecture_specific/workspace/jupyter_lab_cmd.png
142 | :figclass: auto
143 | ```
144 |
145 | You can see that the Jupyter Server is running on port 8888 on the localhost.
146 |
147 | The following interface should open up on your default browser automatically - if not, CTRL + Click the server URL.
148 |
149 | ```{figure} /_static/lecture_specific/workspace/jupyter_lab.png
150 | :figclass: auto
151 | ```
152 |
153 | Click on
154 |
155 | - the Python 3 (ipykernel) button under Notebooks to open a new Jupyter Notebook
156 | - the Python File button to open a new Python script (.py)
157 |
158 | You can always open this launcher tab by clicking the '+' button on the top.
159 |
160 | All the files and folders in your working directory can be found in the File Browser (tab on the left).
161 |
162 | You can create new files and folders using the buttons available at the top of the File Browser tab.
163 |
164 | ```{figure} /_static/lecture_specific/workspace/file_browser.png
165 | :figclass: auto
166 | ```
167 | You can install extensions that increase the functionality of JupyterLab by visiting the Extensions tab.
168 |
169 | ```{figure} /_static/lecture_specific/workspace/extensions.png
170 | :figclass: auto
171 | ```
172 | Coming back to the example scripts from earlier, there are two ways to work with them in JupyterLab.
173 |
174 | - Using magic commands
175 | - Using the terminal
176 |
177 | ### Using magic commands
178 |
179 | Jupyter Notebooks and JupyterLab support the use of [magic commands](https://ipython.readthedocs.io/en/stable/interactive/magics.html) - commands that extend the capabilities of a standard Jupyter Notebook.
180 |
181 | The `%run` magic command allows you to run a Python script from within a Notebook.
182 |
183 | This is a convenient way to run scripts that you are working on in the same directory as your Notebook and present the outputs within the Notebook.
184 |
185 | ```{figure} /_static/lecture_specific/workspace/jupyter_lab_py_run.png
186 | :figclass: auto
187 | ```
188 |
189 | ### Using the terminal
190 |
191 | However, if you are looking into just running the `.py` file, it is sometimes easier to use the terminal.
192 |
193 | Open a terminal from the launcher and run the following command.
194 |
195 | ```
196 | > python
197 | ```
198 |
199 | ```{figure} /_static/lecture_specific/workspace/jupyter_lab_py_run_term.png
200 | :figclass: auto
201 | ```
202 |
203 | ```{note}
204 | You can also run the script line by line by opening an ipykernel console either
205 | - from the launcher
206 | - by right clicking within the Notebook and selecting Create Console for Editor
207 |
208 | Use Shift + Enter to run a line of code.
209 | ```
210 |
211 | ## A walk through Visual Studio Code
212 |
213 | Visual Studio Code (VS Code) is a code editor and development workspace that can run
214 | - in the [browser](https://vscode.dev/).
215 | - as a local [installation](https://code.visualstudio.com/docs/?dv=win).
216 |
217 | Both interfaces are identical.
218 |
219 | When you launch VS Code, you will see the following interface.
220 |
221 | ```{figure} /_static/lecture_specific/workspace/vs_code_home.png
222 | :figclass: auto
223 | ```
224 |
225 | Explore how to customize VS Code to your liking through the guided walkthroughs.
226 |
227 | ```{figure} /_static/lecture_specific/workspace/vs_code_walkthrough.png
228 | :figclass: auto
229 | ```
230 | When presented with the following prompt, go ahead and install all recommended extensions.
231 |
232 | ```{figure} /_static/lecture_specific/workspace/vs_code_install_ext.png
233 | :figclass: auto
234 | ```
235 | You can also install extensions from the Extensions tab.
236 |
237 | ```{figure} /_static/lecture_specific/workspace/vs_code_extensions.png
238 | :figclass: auto
239 | ```
240 | Jupyter Notebooks (`.ipynb` files) can be worked on in VS Code.
241 |
242 | Make sure to install the Jupyter extension from the Extensions tab before you try to open a Jupyter Notebook.
243 |
244 | Create a new file (in the file Explorer tab) and save it with the `.ipynb` extension.
245 |
246 | Choose a kernel/environment to run the Notebook in by clicking on the Select Kernel button on the top right corner of the editor.
247 |
248 | ```{figure} /_static/lecture_specific/workspace/vs_code_kernels.png
249 | :figclass: auto
250 | ```
251 |
252 | VS Code also has excellent version control functionality through the Source Control tab.
253 |
254 | ```{figure} /_static/lecture_specific/workspace/vs_code_git.png
255 | :figclass: auto
256 | ```
257 | Link your GitHub account to VS Code to push and pull changes to and from your repositories.
258 |
259 | Further discussions about version control can be found in the next section.
260 |
261 | To open a new Terminal in VS Code, click on the Terminal tab and select New Terminal.
262 |
263 | VS Code opens a new terminal in the directory you are working in - typically PowerShell on Windows and Bash on Linux.
264 |
265 | You can change the shell or open a new instance through the dropdown menu on the right end of the terminal tab.
266 |
267 | ```{figure} /_static/lecture_specific/workspace/vs_code_terminal_opts.png
268 | :figclass: auto
269 | ```
270 |
271 | VS Code helps you manage conda environments without using the command line.
272 |
273 | Open the Command Palette (CTRL + SHIFT + P or from the dropdown menu under View tab) and search for ```Python: Select Interpreter```.
274 |
275 | This loads existing environments.
276 |
277 | You can also create new environments using ```Python: Create Environment``` in the Command Palette.
278 |
279 | A new environment (.conda folder) is created in the current working directory.
280 |
281 | Coming back to the example scripts from earlier, there are again two ways to work with them in VS Code.
282 |
283 | - Using the run button
284 | - Using the terminal
285 |
286 | ### Using the run button
287 |
288 | You can run the script by clicking on the run button on the top right corner of the editor.
289 |
290 | ```{figure} /_static/lecture_specific/workspace/vs_code_run.png
291 | :figclass: auto
292 | ```
293 |
294 | You can also run the script interactively by selecting the **Run Current File in Interactive Window** option from the dropdown.
295 |
296 | ```{figure} /_static/lecture_specific/workspace/vs_code_run_button.png
297 | :figclass: auto
298 | ```
299 | This creates an ipykernel console and runs the script.
300 |
301 | ### Using the terminal
302 |
303 | The command `python ` is executed on the console of your choice.
304 |
305 | If you are using a Windows machine, you can use either the Anaconda Prompt or the Command Prompt, but generally not PowerShell.
306 |
307 | Here's an execution of the earlier code.
308 |
309 | ```{figure} /_static/lecture_specific/workspace/sine_wave_import.png
310 | :figclass: auto
311 | ```
312 |
313 | ```{note}
314 | If you would like to develop packages and build tools using Python, you may want to look into [the use of Docker containers and VS Code](https://github.com/RamiKrispin/vscode-python).
315 |
316 | However, this is outside the focus of these lectures.
317 | ```
318 |
319 | ## Git your hands dirty
320 |
321 | This section will familiarize you with git and GitHub.
322 |
323 | [Git](https://git-scm.com/) is a *version control system* --- a piece of software used to manage digital projects such as code libraries.
324 |
325 | In many cases, the associated collections of files --- called *repositories* --- are stored on [GitHub](https://github.com/).
326 |
327 | GitHub is a wonderland of collaborative coding projects.
328 |
329 | For example, it hosts many of the scientific libraries we'll be using later
330 | on, such as [this one](https://github.com/pandas-dev/pandas).
331 |
332 | Git is the underlying software used to manage these projects.
333 |
334 | Git is an extremely powerful tool for distributed collaboration --- for
335 | example, we use it to share and synchronize all the source files for these
336 | lectures.
337 |
338 | There are two main flavors of Git
339 |
340 | 1. the plain vanilla [command line Git](https://git-scm.com/downloads) version
341 | 2. the various point-and-click GUI versions
342 |    * See, for example, the [GitHub version](https://github.com/apps/desktop) or Git GUI integrated into your IDE.
343 |
344 | If you haven't already, try
345 |
346 | 1. Installing Git.
347 | 1. Getting a copy of [QuantEcon.py](https://github.com/QuantEcon/QuantEcon.py) using Git.
348 |
349 | For example, if you've installed the command line version, open up a terminal and enter
350 |
351 | ```bash
352 | git clone https://github.com/QuantEcon/QuantEcon.py
353 | ```
354 | (This is just `git clone` in front of the URL for the repository)
355 |
356 | This command downloads a complete copy of the QuantEcon.py repository, including its commit history, into a new local directory.
357 |
358 | As a second task,
359 |
360 | 1. Sign up to [GitHub](https://github.com/).
361 | 1. Look into 'forking' GitHub repositories (forking means making your own copy of a GitHub repository, stored on GitHub).
362 | 1. Fork [QuantEcon.py](https://github.com/QuantEcon/QuantEcon.py).
363 | 1. Clone your fork to some local directory, make edits, commit them, and push them back up to your forked GitHub repo.
364 | 1. If you made a valuable improvement, send us a [pull request](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/about-pull-requests)!
365 |
366 | For reading on these and other topics, try
367 |
368 | * [The official Git documentation](https://git-scm.com/doc).
369 | * Reading through the docs on [GitHub](https://docs.github.com/en).
370 | * [Pro Git Book](https://git-scm.com/book) by Scott Chacon and Ben Straub.
371 | * One of the thousands of Git tutorials on the Net.
372 |
--------------------------------------------------------------------------------
/lectures/matplotlib.md:
--------------------------------------------------------------------------------
1 | ---
2 | jupytext:
3 | text_representation:
4 | extension: .md
5 | format_name: myst
6 | kernelspec:
7 | display_name: Python 3
8 | language: python
9 | name: python3
10 | ---
11 |
12 | (matplotlib)=
13 | ```{raw} jupyter
14 |
19 | ```
20 |
21 | # {index}`Matplotlib `
22 |
23 | ```{index} single: Python; Matplotlib
24 | ```
25 |
26 | ## Overview
27 |
28 | We've already generated quite a few figures in these lectures using [Matplotlib](https://matplotlib.org/).
29 |
30 | Matplotlib is an outstanding graphics library, designed for scientific computing, with
31 |
32 | * high-quality 2D and 3D plots
33 | * output in all the usual formats (PDF, PNG, etc.)
34 | * LaTeX integration
35 | * fine-grained control over all aspects of presentation
36 | * animation, etc.
37 |
38 | ### Matplotlib's Split Personality
39 |
40 | Matplotlib is unusual in that it offers two different interfaces to plotting.
41 |
42 | One is a simple MATLAB-style API (Application Programming Interface) that was written to help MATLAB refugees find a ready home.
43 |
44 | The other is a more "Pythonic" object-oriented API.
45 |
46 | For reasons described below, we recommend that you use the second API.
47 |
48 | But first, let's discuss the difference.
49 |
50 | ## The APIs
51 |
52 | ```{index} single: Matplotlib; Simple API
53 | ```
54 |
55 | ### The MATLAB-style API
56 |
57 | Here's the kind of easy example you might find in introductory treatments
58 |
59 | ```{code-cell} ipython
60 | import matplotlib.pyplot as plt
61 | import numpy as np
62 |
63 | x = np.linspace(0, 10, 200)
64 | y = np.sin(x)
65 |
66 | plt.plot(x, y, 'b-', linewidth=2)
67 | plt.show()
68 | ```
69 |
70 | This is simple and convenient, but also somewhat limited and un-Pythonic.
71 |
72 | For example, in the function calls, a lot of objects get created and passed around without making themselves known to the programmer.
73 |
74 | Python programmers tend to prefer a more explicit style of programming (run `import this` in a code block and look at the second line).
75 |
76 | This leads us to the alternative, object-oriented Matplotlib API.
77 |
78 | ### The Object-Oriented API
79 |
80 | Here's the code corresponding to the preceding figure using the object-oriented API
81 |
82 | ```{code-cell} python3
83 | fig, ax = plt.subplots()
84 | ax.plot(x, y, 'b-', linewidth=2)
85 | plt.show()
86 | ```
87 |
88 | Here the call `fig, ax = plt.subplots()` returns a pair, where
89 |
90 | * `fig` is a `Figure` instance---like a blank canvas.
91 | * `ax` is an `Axes` instance---think of a frame for plotting in.
92 |
93 | The `plot()` function is actually a method of `ax`.
94 |
95 | While there's a bit more typing, the more explicit use of objects gives us better control.
96 |
97 | This will become more clear as we go along.
98 |
99 | ### Tweaks
100 |
101 | Here we've changed the line to red and added a legend
102 |
103 | ```{code-cell} python3
104 | fig, ax = plt.subplots()
105 | ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
106 | ax.legend()
107 | plt.show()
108 | ```
109 |
110 | We've also used `alpha` to make the line slightly transparent---which makes it look smoother.
111 |
112 | The location of the legend can be changed by replacing `ax.legend()` with `ax.legend(loc='upper center')`.
113 |
114 | ```{code-cell} python3
115 | fig, ax = plt.subplots()
116 | ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
117 | ax.legend(loc='upper center')
118 | plt.show()
119 | ```
120 |
121 | If everything is properly configured, then adding LaTeX is trivial
122 |
123 | ```{code-cell} python3
124 | fig, ax = plt.subplots()
125 | ax.plot(x, y, 'r-', linewidth=2, label=r'$y=\sin(x)$', alpha=0.6)
126 | ax.legend(loc='upper center')
127 | plt.show()
128 | ```
129 |
130 | Controlling the ticks, adding titles and so on is also straightforward
131 |
132 | ```{code-cell} python3
133 | fig, ax = plt.subplots()
134 | ax.plot(x, y, 'r-', linewidth=2, label=r'$y=\sin(x)$', alpha=0.6)
135 | ax.legend(loc='upper center')
136 | ax.set_yticks([-1, 0, 1])
137 | ax.set_title('Test plot')
138 | plt.show()
139 | ```
140 |
141 | ## More Features
142 |
143 | Matplotlib has a huge array of functions and features, which you can discover
144 | over time as you have need for them.
145 |
146 | We mention just a few.
147 |
148 | ### Multiple Plots on One Axis
149 |
150 | ```{index} single: Matplotlib; Multiple Plots on One Axis
151 | ```
152 |
153 | It's straightforward to generate multiple plots on the same axes.
154 |
155 | Here's an example that randomly generates three normal densities and adds a label with their mean
156 |
157 | ```{code-cell} python3
158 | from scipy.stats import norm
159 | from random import uniform
160 |
161 | fig, ax = plt.subplots()
162 | x = np.linspace(-4, 4, 150)
163 | for i in range(3):
164 |     m, s = uniform(-1, 1), uniform(1, 2)
165 |     y = norm.pdf(x, loc=m, scale=s)
166 |     current_label = rf'$\mu = {m:.2}$'
167 |     ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
168 | ax.legend()
169 | plt.show()
170 | ```
171 |
172 | ### Multiple Subplots
173 |
174 | ```{index} single: Matplotlib; Subplots
175 | ```
176 |
177 | Sometimes we want multiple subplots in one figure.
178 |
179 | Here's an example that generates 6 histograms
180 |
181 | ```{code-cell} python3
182 | num_rows, num_cols = 3, 2
183 | fig, axes = plt.subplots(num_rows, num_cols, figsize=(10, 12))
184 | for i in range(num_rows):
185 |     for j in range(num_cols):
186 |         m, s = uniform(-1, 1), uniform(1, 2)
187 |         x = norm.rvs(loc=m, scale=s, size=100)
188 |         axes[i, j].hist(x, alpha=0.6, bins=20)
189 |         t = rf'$\mu = {m:.2}, \quad \sigma = {s:.2}$'
190 |         axes[i, j].set(title=t, xticks=[-4, 0, 4], yticks=[])
191 | plt.show()
192 | ```
193 |
194 | ### 3D Plots
195 |
196 | ```{index} single: Matplotlib; 3D Plots
197 | ```
198 |
199 | Matplotlib does a nice job of 3D plots --- here is one example
200 |
201 | ```{code-cell} python3
202 | from mpl_toolkits.mplot3d.axes3d import Axes3D
203 | from matplotlib import cm
204 |
205 |
206 | def f(x, y):
207 |     return np.cos(x**2 + y**2) / (1 + x**2 + y**2)
208 |
209 | xgrid = np.linspace(-3, 3, 50)
210 | ygrid = xgrid
211 | x, y = np.meshgrid(xgrid, ygrid)
212 |
213 | fig = plt.figure(figsize=(10, 6))
214 | ax = fig.add_subplot(111, projection='3d')
215 | ax.plot_surface(x,
216 |                 y,
217 |                 f(x, y),
218 |                 rstride=2, cstride=2,
219 |                 cmap=cm.jet,
220 |                 alpha=0.7,
221 |                 linewidth=0.25)
222 | ax.set_zlim(-0.5, 1.0)
223 | plt.show()
224 | ```
225 |
226 | ### A Customizing Function
227 |
228 | Perhaps you will find a set of customizations that you regularly use.
229 |
230 | Suppose we usually prefer our axes to go through the origin, and to have a grid.
231 |
232 | Here's a nice example from [Matthew Doty](https://github.com/xcthulhu) of how the object-oriented API can be used to build a custom `subplots` function that implements these changes.
233 |
234 | Read carefully through the code and see if you can follow what's going on
235 |
236 | ```{code-cell} python3
237 | def subplots():
238 |     "Custom subplots with axes through the origin"
239 |     fig, ax = plt.subplots()
240 |
241 |     # Set the axes through the origin
242 |     for spine in ['left', 'bottom']:
243 |         ax.spines[spine].set_position('zero')
244 |     for spine in ['right', 'top']:
245 |         ax.spines[spine].set_color('none')
246 |
247 |     ax.grid()
248 |     return fig, ax
249 |
250 |
251 | fig, ax = subplots() # Call the local version, not plt.subplots()
252 | x = np.linspace(-2, 10, 200)
253 | y = np.sin(x)
254 | ax.plot(x, y, 'r-', linewidth=2, label='sine function', alpha=0.6)
255 | ax.legend(loc='lower right')
256 | plt.show()
257 | ```
258 |
259 | The custom `subplots` function
260 |
261 | 1. calls the standard `plt.subplots` function internally to generate the `fig, ax` pair,
262 | 1. makes the desired customizations to `ax`, and
263 | 1. passes the `fig, ax` pair back to the calling code.
264 |
265 | ### Style Sheets
266 |
267 | Another useful feature in Matplotlib is [style sheets](https://matplotlib.org/stable/gallery/style_sheets/style_sheets_reference.html).
268 |
269 | We can use style sheets to create plots with uniform styles.
270 |
271 | We can find a list of available styles by printing the attribute `plt.style.available`
272 |
273 |
274 | ```{code-cell} python3
275 | print(plt.style.available)
276 | ```
277 |
278 | We can now use the `plt.style.use()` function to set the style sheet.
279 |
280 | Let's write a function that takes the name of a style sheet and draws different plots with the style
281 |
282 | ```{code-cell} python3
283 |
284 | def draw_graphs(style='default'):
285 |
286 |     # Setting a style sheet
287 |     plt.style.use(style)
288 |
289 |     fig, axes = plt.subplots(nrows=1, ncols=4, figsize=(10, 3))
290 |     x = np.linspace(-13, 13, 150)
291 |
292 |     # Set seed values to replicate results of random draws
293 |     np.random.seed(9)
294 |
295 |     for i in range(3):
296 |
297 |         # Draw mean and standard deviation from uniform distributions
298 |         m, s = np.random.uniform(-8, 8), np.random.uniform(2, 2.5)
299 |
300 |         # Generate a normal density plot
301 |         y = norm.pdf(x, loc=m, scale=s)
302 |         axes[0].plot(x, y, linewidth=3, alpha=0.7)
303 |
304 |         # Create a scatter plot with random X and Y values
305 |         # from normal distributions
306 |         rnormX = norm.rvs(loc=m, scale=s, size=150)
307 |         rnormY = norm.rvs(loc=m, scale=s, size=150)
308 |         axes[1].plot(rnormX, rnormY, ls='none', marker='o', alpha=0.7)
309 |
310 |         # Create a histogram with random X values
311 |         axes[2].hist(rnormX, alpha=0.7)
312 |
313 |         # and a line graph with random Y values
314 |         axes[3].plot(x, rnormY, linewidth=2, alpha=0.7)
315 |
316 |     style_name = style.split('-')[0]
317 |     plt.suptitle(f'Style: {style_name}', fontsize=13)
318 |     plt.show()
319 |
320 | ```
321 |
322 | Let's see what some of the styles look like.
323 |
324 | First, we draw graphs with the style sheet `seaborn-v0_8`
325 |
326 | ```{code-cell} python3
327 | draw_graphs(style='seaborn-v0_8')
328 | ```
329 |
330 | We can use `grayscale` to remove colors in plots
331 |
332 | ```{code-cell} python3
333 | draw_graphs(style='grayscale')
334 | ```
335 |
336 | Here is what `ggplot` looks like
337 |
338 | ```{code-cell} python3
339 | draw_graphs(style='ggplot')
340 | ```
341 |
342 | We can also use the style `dark_background`
343 |
344 | ```{code-cell} python3
345 | draw_graphs(style='dark_background')
346 | ```
347 |
348 | You can use the function to experiment with other styles in the list.
349 |
350 | If you are interested, you can even create your own style sheets.
351 |
352 | Parameters for your style sheets are stored in a dictionary-like variable `plt.rcParams`
353 |
354 | ```{code-cell} python3
355 | ---
356 | tags: [hide-output]
357 | ---
358 |
359 | print(plt.rcParams.keys())
360 |
361 | ```
362 |
363 | There are many parameters you could set for your style sheets.
364 |
365 | Set parameters for your style sheet by:
366 |
367 | 1. creating your own [`matplotlibrc` file](https://matplotlib.org/stable/users/explain/customizing.html), or
368 | 2. updating values stored in the dictionary-like variable `plt.rcParams`
369 |
370 | Let's change the style of our overlaid density lines using the second method
371 |
372 | ```{code-cell} python3
373 | from cycler import cycler
374 |
375 | # set to the default style sheet
376 | plt.style.use('default')
377 |
378 | # You can update single values using keys:
379 |
380 | # Set the font style to italic
381 | plt.rcParams['font.style'] = 'italic'
382 |
383 | # Update linewidth
384 | plt.rcParams['lines.linewidth'] = 2
385 |
386 |
387 | # You can also update many values at once using the update() method:
388 |
389 | parameters = {
390 |
391 |     # Change default figure size
392 |     'figure.figsize': (5, 4),
393 |
394 |     # Add horizontal grid lines
395 |     'axes.grid': True,
396 |     'axes.grid.axis': 'y',
397 |
398 |     # Update colors for density lines
399 |     'axes.prop_cycle': cycler('color',
400 |                               ['dimgray', 'slategrey', 'darkgray'])
401 | }
402 |
403 | plt.rcParams.update(parameters)
404 |
405 |
406 | ```
407 |
408 | ```{note}
409 |
410 | These settings are `global`.
411 |
412 | Any plot generated after changing parameters in `.rcParams` will be affected by the setting.
413 |
414 | ```
415 |
416 | ```{code-cell} python3
417 | fig, ax = plt.subplots()
418 | x = np.linspace(-4, 4, 150)
419 | for i in range(3):
420 |     m, s = uniform(-1, 1), uniform(1, 2)
421 |     y = norm.pdf(x, loc=m, scale=s)
422 |     current_label = rf'$\mu = {m:.2}$'
423 |     ax.plot(x, y, linewidth=2, alpha=0.6, label=current_label)
424 | ax.legend()
425 | plt.show()
426 | ```
427 |
428 | Apply the `default` style sheet again to change your style back to default
429 |
430 | ```{code-cell} python3
431 |
432 | plt.style.use('default')
433 |
434 | # Reset default figure size
435 | plt.rcParams['figure.figsize'] = (10, 6)
436 |
437 | ```
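
If you would rather not change global state at all, Matplotlib also provides the context manager `plt.rc_context`, which applies settings only inside a `with` block. The following is a minimal sketch; the variable names are ours and serve only to show that the setting is restored afterwards.

```python
import matplotlib.pyplot as plt

width_before = plt.rcParams['lines.linewidth']

with plt.rc_context({'lines.linewidth': 5}):
    # Any plot created inside this block would use the temporary setting
    width_inside = plt.rcParams['lines.linewidth']

# On leaving the block, the previous value is restored
width_after = plt.rcParams['lines.linewidth']
print(width_inside, width_after == width_before)
```

This can be a convenient alternative when only one or two figures need a different look.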
438 |
439 | ## Further Reading
440 |
441 | * The [Matplotlib gallery](https://matplotlib.org/stable/gallery/index.html) provides many examples.
442 | * A nice [Matplotlib tutorial](https://scipy-lectures.org/intro/matplotlib/index.html) by Nicolas Rougier, Mike Muller and Gael Varoquaux.
443 | * [mpltools](https://tonysyu.github.io/mpltools/index.html) allows easy
444 | switching between plot styles.
445 | * [Seaborn](https://github.com/mwaskom/seaborn) facilitates common statistics plots in Matplotlib.
446 |
447 | ## Exercises
448 |
449 | ```{exercise-start}
450 | :label: mpl_ex1
451 | ```
452 |
453 | Plot the function
454 |
455 | $$
456 | f(x) = \cos(\pi \theta x) \exp(-x)
457 | $$
458 |
459 | over the interval $[0, 5]$ for each $\theta$ in `np.linspace(0, 2, 10)`.
460 |
461 | Place all the curves in the same figure.
462 |
463 | The output should look like this
464 |
465 | ```{image} /_static/lecture_specific/matplotlib/matplotlib_ex1.png
466 | :scale: 130
467 | :align: center
468 | ```
469 |
470 | ```{exercise-end}
471 | ```
472 |
473 | ```{solution-start} mpl_ex1
474 | :class: dropdown
475 | ```
476 |
477 | Here's one solution
478 |
479 | ```{code-cell} ipython3
480 | def f(x, θ):
481 |     return np.cos(np.pi * θ * x) * np.exp(-x)
482 |
483 | θ_vals = np.linspace(0, 2, 10)
484 | x = np.linspace(0, 5, 200)
485 | fig, ax = plt.subplots()
486 |
487 | for θ in θ_vals:
488 |     ax.plot(x, f(x, θ))
489 |
490 | plt.show()
491 | ```
492 |
493 | ```{solution-end}
494 | ```
--------------------------------------------------------------------------------
/lectures/debugging.md:
--------------------------------------------------------------------------------
1 | ---
2 | jupytext:
3 | text_representation:
4 | extension: .md
5 | format_name: myst
6 | kernelspec:
7 | display_name: Python 3
8 | language: python
9 | name: python3
10 | ---
11 |
12 | (debugging)=
13 | ```{raw} jupyter
14 |
19 | ```
20 |
21 | # Debugging and Handling Errors
22 |
23 | ```{index} single: Debugging
24 | ```
25 |
26 | ```{epigraph}
27 | "Debugging is twice as hard as writing the code in the first place.
28 | Therefore, if you write the code as cleverly as possible, you are, by definition,
29 | not smart enough to debug it." -- Brian Kernighan
30 | ```
31 |
32 | ## Overview
33 |
34 | Are you one of those programmers who fills their code with `print` statements when trying to debug their programs?
35 |
36 | Hey, we all used to do that.
37 |
38 | (OK, sometimes we still do that...)
39 |
40 | But once you start writing larger programs you'll need a better system.
41 |
42 | You may also want to handle potential errors in your code as they occur.
43 |
44 | In this lecture, we will discuss how to debug our programs and improve error handling.
45 |
46 | ## Debugging
47 |
48 | ```{index} single: Debugging
49 | ```
50 |
51 | Debugging tools for Python vary across platforms, IDEs and editors.
52 |
53 | For example, a [visual debugger](https://jupyterlab.readthedocs.io/en/stable/user/debugger.html) is available in JupyterLab.
54 |
55 | Here we'll focus on Jupyter Notebook and leave you to explore other settings.
56 |
57 | We'll need the following imports
58 |
59 | ```{code-cell} ipython
60 | import numpy as np
61 | import matplotlib.pyplot as plt
62 | ```
63 |
64 | (debug_magic)=
65 | ### The `debug` Magic
66 |
67 | Let's consider a simple (and rather contrived) example
68 |
69 | ```{code-cell} ipython
70 | ---
71 | tags: [raises-exception]
72 | ---
73 | def plot_log():
74 |     fig, ax = plt.subplots(2, 1)
75 |     x = np.linspace(1, 2, 10)
76 |     ax.plot(x, np.log(x))
77 |     plt.show()
78 |
79 | plot_log() # Call the function, generate plot
80 | ```
81 |
82 | This code is intended to plot the `log` function over the interval $[1, 2]$.
83 |
84 | But there's an error here: `plt.subplots(2, 1)` should be just `plt.subplots()`.
85 |
86 | (The call `plt.subplots(2, 1)` returns a NumPy array containing two axes objects, suitable for having two subplots on the same figure)
87 |
88 | The traceback shows that the error occurs at the method call `ax.plot(x, np.log(x))`.
89 |
90 | The error occurs because we have mistakenly made `ax` a NumPy array, and a NumPy array has no `plot` method.
91 |
92 | But let's pretend that we don't understand this for the moment.
93 |
94 | We might suspect there's something wrong with `ax` but when we try to investigate this object, we get the following exception:
95 |
96 | ```{code-cell} python3
97 | ---
98 | tags: [raises-exception]
99 | ---
100 | ax
101 | ```
102 |
103 | The problem is that `ax` was defined inside `plot_log()`, and the name is
104 | lost once that function terminates.
105 |
106 | Let's try doing it a different way.
107 |
108 | We run the first cell block again, generating the same error
109 |
110 | ```{code-cell} python3
111 | ---
112 | tags: [raises-exception]
113 | ---
114 | def plot_log():
115 |     fig, ax = plt.subplots(2, 1)
116 |     x = np.linspace(1, 2, 10)
117 |     ax.plot(x, np.log(x))
118 |     plt.show()
119 |
120 | plot_log() # Call the function, generate plot
121 | ```
122 |
123 | But this time we type in the following cell block
124 |
125 | ```{code-block} ipython
126 | :class: no-execute
127 | %debug
128 | ```
129 |
130 | You should be dropped into a new prompt that looks something like this
131 |
132 | ```{code-block} ipython
133 | :class: no-execute
134 | ipdb>
135 | ```
136 |
137 | (You might see `pdb>` instead)
138 |
139 | Now we can investigate the value of our variables at this point in the program, step forward through the code, etc.
140 |
141 | For example, here we simply type the name `ax` to see what's happening with
142 | this object:
143 |
144 | ```{code-block} ipython
145 | :class: no-execute
146 | ipdb> ax
147 | array([<Axes: >,
148 |        <Axes: >], dtype=object)
149 | ```
150 |
151 | It's now very clear that `ax` is an array, which clarifies the source of the
152 | problem.
153 |
154 | To find out what else you can do from inside `ipdb` (or `pdb`), use the
155 | online help
156 |
157 | ```{code-block} ipython
158 | :class: no-execute
159 | ipdb> h
160 |
161 | Documented commands (type help <topic>):
162 | ========================================
163 | EOF bt cont enable jump pdef r tbreak w
164 | a c continue exit l pdoc restart u whatis
165 | alias cl d h list pinfo return unalias where
166 | args clear debug help n pp run unt
167 | b commands disable ignore next q s until
168 | break condition down j p quit step up
169 |
170 | Miscellaneous help topics:
171 | ==========================
172 | exec pdb
173 |
174 | Undocumented commands:
175 | ======================
176 | retval rv
177 |
178 | ipdb> h c
179 | c(ont(inue))
180 | Continue execution, only stop when a breakpoint is encountered.
181 | ```
182 |
183 | ### Setting a Break Point
184 |
185 | The preceding approach is handy but sometimes insufficient.
186 |
187 | Consider the following modified version of our function above
188 |
189 | ```{code-cell} python3
190 | ---
191 | tags: [raises-exception]
192 | ---
193 | def plot_log():
194 |     fig, ax = plt.subplots()
195 |     x = np.logspace(1, 2, 10)
196 |     ax.plot(x, np.log(x))
197 |     plt.show()
198 |
199 | plot_log()
200 | ```
201 |
202 | Here the original problem is fixed, but we've accidentally written
203 | `np.logspace(1, 2, 10)` instead of `np.linspace(1, 2, 10)`.
204 |
205 | Now there won't be any exception, but the plot won't look right.
206 |
207 | To investigate, it would be helpful if we could inspect variables like `x` during execution of the function.
208 |
209 | To this end, we add a "break point" by inserting `breakpoint()` inside the function code block
210 |
211 | ```{code-block} python3
212 | :class: no-execute
213 | def plot_log():
214 |     breakpoint()
215 |     fig, ax = plt.subplots()
216 |     x = np.logspace(1, 2, 10)
217 |     ax.plot(x, np.log(x))
218 |     plt.show()
219 |
220 | plot_log()
221 | ```
222 |
223 | Now let's run the script, and investigate via the debugger
224 |
225 | ```{code-block} ipython
226 | :class: no-execute
227 | > (6)plot_log()
228 | -> fig, ax = plt.subplots()
229 | (Pdb) n
230 | > (7)plot_log()
231 | -> x = np.logspace(1, 2, 10)
232 | (Pdb) n
233 | > (8)plot_log()
234 | -> ax.plot(x, np.log(x))
235 | (Pdb) x
236 | array([ 10. , 12.91549665, 16.68100537, 21.5443469 ,
237 | 27.82559402, 35.93813664, 46.41588834, 59.94842503,
238 | 77.42636827, 100. ])
239 | ```
240 |
241 | We used `n` twice to step forward through the code (one line at a time).
242 |
243 | Then we printed the value of `x` to see what was happening with that variable.
244 |
245 | To exit from the debugger, use `q`.
246 |
247 | ### Other Useful Magics
248 |
249 | In this lecture, we used the `%debug` IPython magic.
250 |
251 | There are many other useful magics:
252 |
253 | * `%precision 4` sets printed precision for floats to 4 decimal places
254 | * `%whos` gives a list of variables and their values
255 | * `%quickref` gives a list of magics
256 |
257 | The full list of magics is [here](https://ipython.readthedocs.io/en/stable/interactive/magics.html).
258 |
259 |
260 | ## Handling Errors
261 |
262 | ```{index} single: Python; Handling Errors
263 | ```
264 |
265 | Sometimes it's possible to anticipate bugs and errors as we're writing code.
266 |
267 | For example, the unbiased sample variance of sample $y_1, \ldots, y_n$
268 | is defined as
269 |
270 | $$
271 | s^2 := \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar y)^2
272 | \qquad \bar y = \text{ sample mean}
273 | $$
274 |
275 | This can be calculated in NumPy using `np.var(y, ddof=1)`; note that the default, `ddof=0`, divides by $n$ rather than $n-1$.
276 |
277 | But if you were writing a function to handle such a calculation, you might
278 | anticipate a divide-by-zero error when the sample size is one.
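
To see the problem concretely, here is a minimal pure-Python sketch of the formula above. The function name `sample_variance` is ours, chosen for illustration; it is not a NumPy function.

```python
def sample_variance(ys):
    "Unbiased sample variance, computed directly from the formula."
    n = len(ys)
    y_bar = sum(ys) / n
    return sum((y - y_bar)**2 for y in ys) / (n - 1)

print(sample_variance([1.0, 2.0, 3.0, 4.0]))

try:
    sample_variance([1.0])      # here n - 1 == 0
except ZeroDivisionError as e:
    print(f'Caught: {e}')
```

With a sample of size one, the denominator $n - 1$ is zero and Python raises `ZeroDivisionError`.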
279 |
280 | One possible action is to do nothing --- the program will just crash, and spit out an error message.
281 |
282 | But sometimes it's worth writing your code in a way that anticipates and deals with runtime errors that you think might arise.
283 |
284 | Why?
285 |
286 | * Because the debugging information provided by the interpreter is often less useful than what can be provided by a well written error message.
287 | * Because errors that cause execution to stop interrupt workflows.
288 | * Because unexpected crashes reduce your users' confidence in your code (if you are writing for others).
289 |
290 |
291 | In this section, we'll discuss different types of errors in Python and techniques to handle potential errors in our programs.
292 |
293 | ### Errors in Python
294 |
295 | We have seen `AttributeError` and `NameError` in {any}`our previous examples <debug_magic>`.
296 |
297 | In Python, there are two types of errors -- syntax errors and exceptions.
298 |
299 | ```{index} single: Python; Exceptions
300 | ```
301 |
302 | Here's an example of a syntax error
303 |
304 | ```{code-cell} python3
305 | ---
306 | tags: [raises-exception]
307 | ---
308 | def f:
309 | ```
310 |
311 | Since illegal syntax cannot be executed, a syntax error terminates execution of the program.
312 |
313 | Here's a different kind of error, unrelated to syntax
314 |
315 | ```{code-cell} python3
316 | ---
317 | tags: [raises-exception]
318 | ---
319 | 1 / 0
320 | ```
321 |
322 | Here's another
323 |
324 | ```{code-cell} python3
325 | ---
326 | tags: [raises-exception]
327 | ---
328 | x1 = y1
329 | ```
330 |
331 | And another
332 |
333 | ```{code-cell} python3
334 | ---
335 | tags: [raises-exception]
336 | ---
337 | 'foo' + 6
338 | ```
339 |
340 | And another
341 |
342 | ```{code-cell} python3
343 | ---
344 | tags: [raises-exception]
345 | ---
346 | X = []
347 | x = X[0]
348 | ```
349 |
350 | On each occasion, the interpreter informs us of the error type
351 |
352 | * `NameError`, `TypeError`, `IndexError`, `ZeroDivisionError`, etc.
353 |
354 | In Python, these errors are called *exceptions*.
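
Exceptions are themselves classes, arranged in a hierarchy. As a small sketch (the names `errors` and `parents` are ours), we can look up the direct parent class of each error type listed above.

```python
# Each built-in exception is a class; __bases__ gives its direct parent class
errors = (NameError, TypeError, IndexError, ZeroDivisionError)
parents = {exc.__name__: exc.__bases__[0].__name__ for exc in errors}

for name, parent in parents.items():
    print(f'{name} inherits from {parent}')

# All of them ultimately derive from Exception
print(all(issubclass(exc, Exception) for exc in errors))
```

This hierarchy matters later on, because catching a parent class also catches all of its subclasses.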
355 |
356 | ### Assertions
357 |
358 | ```{index} single: Python; Assertions
359 | ```
360 |
361 | Sometimes errors can be avoided by checking whether your program runs as expected.
362 |
363 | A relatively easy way to handle checks is with the `assert` keyword.
364 |
365 | For example, pretend for a moment that the `np.var` function doesn't
366 | exist and we need to write our own
367 |
368 | ```{code-cell} python3
369 | def var(y):
370 |     n = len(y)
371 |     assert n > 1, 'Sample size must be greater than one.'
372 |     return np.sum((y - y.mean())**2) / float(n-1)
373 | ```
374 |
375 | If we run this with an array of length one, the program will terminate and
376 | print our error message
377 |
378 | ```{code-cell} python3
379 | ---
380 | tags: [raises-exception]
381 | ---
382 | var([1])
383 | ```
384 |
385 | The advantage is that we can
386 |
387 | * fail early, as soon as we know there will be a problem
388 | * supply specific information on why a program is failing
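
One caveat: assertions are skipped entirely when Python is run with the `-O` flag, so they are best suited to internal sanity checks. For validating input supplied by users, a common alternative is to raise an exception explicitly. Here is a sketch of that variant in plain Python; the name `var_checked` is ours, for illustration only.

```python
def var_checked(y):
    "Unbiased sample variance that raises ValueError for short samples."
    n = len(y)
    if n <= 1:
        # Unlike assert, this check is never optimized away
        raise ValueError('Sample size must be greater than one.')
    y_bar = sum(y) / n
    return sum((y_i - y_bar)**2 for y_i in y) / (n - 1)

print(var_checked([2.0, 4.0, 6.0]))
```

Calling `var_checked([1])` raises a `ValueError` carrying the same message as the assertion above.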
389 |
390 | ### Handling Errors During Runtime
391 |
392 | ```{index} single: Python; Runtime Errors
393 | ```
394 |
395 | The approach used above is a bit limited, because it always leads to
396 | termination.
397 |
398 | Sometimes we can handle errors more gracefully, by treating special cases.
399 |
400 | Let's look at how this is done.
401 |
402 | #### Catching Exceptions
403 |
404 | We can catch and deal with exceptions using `try` -- `except` blocks.
405 |
406 | Here's a simple example
407 |
408 | ```{code-cell} python3
409 | def f(x):
410 |     try:
411 |         return 1.0 / x
412 |     except ZeroDivisionError:
413 |         print('Error: division by zero. Returned None')
414 |     return None
415 | ```
416 |
417 | When we call `f` we get the following output
418 |
419 | ```{code-cell} python3
420 | f(2)
421 | ```
422 |
423 | ```{code-cell} python3
424 | f(0)
425 | ```
426 |
427 | ```{code-cell} python3
428 | f(0.0)
429 | ```
430 |
431 | The error is caught and execution of the program is not terminated.
432 |
433 | Note that other error types are not caught.
434 |
435 | If we are worried the user might pass in a string, we can catch that error too
436 |
437 | ```{code-cell} python3
438 | def f(x):
439 |     try:
440 |         return 1.0 / x
441 |     except ZeroDivisionError:
442 |         print('Error: Division by zero. Returned None')
443 |     except TypeError:
444 |         print(f'Error: x cannot be of type {type(x)}. Returned None')
445 |     return None
446 | ```
447 |
448 | Here's what happens
449 |
450 | ```{code-cell} python3
451 | f(2)
452 | ```
453 |
454 | ```{code-cell} python3
455 | f(0)
456 | ```
457 |
458 | ```{code-cell} python3
459 | f('foo')
460 | ```
461 |
462 | If we feel lazy we can catch these errors together
463 |
464 | ```{code-cell} python3
465 | def f(x):
466 |     try:
467 |         return 1.0 / x
468 |     except:
469 |         print(f'Error. An issue has occurred with x = {x} of type: {type(x)}')
470 |     return None
471 | ```
472 |
473 | Here's what happens
474 |
475 | ```{code-cell} python3
476 | f(2)
477 | ```
478 |
479 | ```{code-cell} python3
480 | f(0)
481 | ```
482 |
483 | ```{code-cell} python3
484 | f('foo')
485 | ```
486 |
487 | In general it's better to be specific.
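
One way to stay specific without writing a separate clause for each error is to list the exception types in a tuple and bind the exception to a name. A minimal sketch (`safe_reciprocal` is a name we made up for this example):

```python
def safe_reciprocal(x):
    "Return 1/x, or None if x is zero or not a number."
    try:
        return 1.0 / x
    except (ZeroDivisionError, TypeError) as e:
        # e is the exception object; its type tells us what went wrong
        print(f'Error ({type(e).__name__}): {e}. Returned None')
        return None

safe_reciprocal(0)
safe_reciprocal('foo')
```

Unlike a bare `except:`, this lets unrelated problems such as `KeyboardInterrupt` propagate as usual.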
488 |
489 |
490 | ## Exercises
491 |
492 | ```{exercise-start}
493 | :label: debug_ex1
494 | ```
495 |
496 | Suppose we have a text file `numbers.txt` containing the following lines
497 |
498 | ```{code-block} none
499 | :class: no-execute
500 |
501 | prices
502 | 3
503 | 8
504 |
505 | 7
506 | 21
507 | ```
508 |
509 | Using `try` -- `except`, write a program to read in the contents of the file and sum the numbers, ignoring lines without numbers.
510 |
511 | You can use the `open()` function we learnt {any}`before` to open `numbers.txt`.
512 | ```{exercise-end}
513 | ```
514 |
515 |
516 | ```{solution-start} debug_ex1
517 | :class: dropdown
518 | ```
519 |
520 | Let's save the data first
521 |
522 | ```{code-cell} python3
523 | %%file numbers.txt
524 | prices
525 | 3
526 | 8
527 |
528 | 7
529 | 21
530 | ```
531 |
532 | ```{code-cell} python3
533 | f = open('numbers.txt')
534 |
535 | total = 0.0
536 | for line in f:
537 |     try:
538 |         total += float(line)
539 |     except ValueError:
540 |         pass
541 |
542 | f.close()
543 |
544 | print(total)
545 | ```
546 |
547 | ```{solution-end}
548 | ```
--------------------------------------------------------------------------------
/lectures/writing_good_code.md:
--------------------------------------------------------------------------------
1 | ---
2 | jupytext:
3 | text_representation:
4 | extension: .md
5 | format_name: myst
6 | kernelspec:
7 | display_name: Python 3
8 | language: python
9 | name: python3
10 | ---
11 |
12 | (writing_good_code)=
13 | ```{raw} jupyter
14 |
19 | ```
20 |
21 | # Writing Good Code
22 |
23 | ```{index} single: Models; Code style
24 | ```
25 |
26 | ```{epigraph}
27 | "Any fool can write code that a computer can understand. Good programmers write code that humans can understand." -- Martin Fowler
28 | ```
29 |
30 |
31 | ## Overview
32 |
33 | When computer programs are small, poorly written code is not overly costly.
34 |
35 | But more data, more sophisticated models, and more computer power are enabling us to take on more challenging problems that involve writing longer programs.
36 |
37 | For such programs, investment in good coding practices will pay high returns.
38 |
39 | The main payoffs are higher productivity and faster code.
40 |
41 | In this lecture, we review some elements of good coding practice.
42 |
43 | We also touch on modern developments in scientific computing --- such as just-in-time compilation --- and how they affect good program design.
44 |
45 | ## An Example of Poor Code
46 |
47 | Let's have a look at some poorly written code.
48 |
49 | The job of the code is to generate and plot time series of the simplified Solow model
50 |
51 | ```{math}
52 | :label: gc_solmod
53 |
54 | k_{t+1} = s k_t^{\alpha} + (1 - \delta) k_t,
55 | \quad t = 0, 1, 2, \ldots
56 | ```
57 |
58 | Here
59 |
60 | * $k_t$ is capital at time $t$ and
61 | * $s, \alpha, \delta$ are parameters (savings, a productivity parameter and depreciation)
62 |
63 | For each parameterization, the code
64 |
65 | 1. sets $k_0 = 1$
66 | 1. iterates using {eq}`gc_solmod` to produce a sequence $k_0, k_1, k_2, \ldots, k_T$
67 | 1. plots the sequence
68 |
69 | The plots will be grouped into three subfigures.
70 |
71 | In each subfigure, two parameters are held fixed while another varies
72 |
73 | ```{code-cell} ipython
74 | import numpy as np
75 | import matplotlib.pyplot as plt
76 |
77 | # Allocate memory for time series
78 | k = np.empty(50)
79 |
80 | fig, axes = plt.subplots(3, 1, figsize=(8, 16))
81 |
82 | # Trajectories with different α
83 | δ = 0.1
84 | s = 0.4
85 | α = (0.25, 0.33, 0.45)
86 |
87 | for j in range(3):
88 |     k[0] = 1
89 |     for t in range(49):
90 |         k[t+1] = s * k[t]**α[j] + (1 - δ) * k[t]
91 |     axes[0].plot(k, 'o-', label=rf"$\alpha = {α[j]},\; s = {s},\; \delta={δ}$")
92 |
93 | axes[0].grid(lw=0.2)
94 | axes[0].set_ylim(0, 18)
95 | axes[0].set_xlabel('time')
96 | axes[0].set_ylabel('capital')
97 | axes[0].legend(loc='upper left', frameon=True)
98 |
99 | # Trajectories with different s
100 | δ = 0.1
101 | α = 0.33
102 | s = (0.3, 0.4, 0.5)
103 |
104 | for j in range(3):
105 |     k[0] = 1
106 |     for t in range(49):
107 |         k[t+1] = s[j] * k[t]**α + (1 - δ) * k[t]
108 |     axes[1].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s[j]},\; \delta={δ}$")
109 |
110 | axes[1].grid(lw=0.2)
111 | axes[1].set_xlabel('time')
112 | axes[1].set_ylabel('capital')
113 | axes[1].set_ylim(0, 18)
114 | axes[1].legend(loc='upper left', frameon=True)
115 |
116 | # Trajectories with different δ
117 | δ = (0.05, 0.1, 0.15)
118 | α = 0.33
119 | s = 0.4
120 |
121 | for j in range(3):
122 |     k[0] = 1
123 |     for t in range(49):
124 |         k[t+1] = s * k[t]**α + (1 - δ[j]) * k[t]
125 |     axes[2].plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta={δ[j]}$")
126 |
127 | axes[2].set_ylim(0, 18)
128 | axes[2].set_xlabel('time')
129 | axes[2].set_ylabel('capital')
130 | axes[2].grid(lw=0.2)
131 | axes[2].legend(loc='upper left', frameon=True)
132 |
133 | plt.show()
134 | ```
135 |
136 | True, the code more or less follows [PEP8](https://peps.python.org/pep-0008/).
137 |
138 | At the same time, it's very poorly structured.
139 |
140 | Let's talk about why that's the case, and what we can do about it.
141 |
142 | ## Good Coding Practice
143 |
144 | There are usually many different ways to write a program that accomplishes a given task.
145 |
146 | For small programs, like the one above, the way you write code doesn't matter too much.
147 |
148 | But if you are ambitious and want to produce useful things, you'll write medium to large programs too.
149 |
150 | In those settings, coding style matters **a great deal**.
151 |
152 | Fortunately, lots of smart people have thought about the best way to write code.
153 |
154 | Here are some basic precepts.
155 |
156 | ### Don't Use Magic Numbers
157 |
158 | If you look at the code above, you'll see numbers like `50` and `49` and `3` scattered through the code.
159 |
160 | These kinds of numeric literals in the body of your code are sometimes called "magic numbers".
161 |
162 | This is not a compliment.
163 |
164 | While numeric literals are not all evil, the numbers shown in the program above
165 | should certainly be replaced by named constants.
166 |
167 | For example, the code above could declare the variable `time_series_length = 50`.
168 |
169 | Then in the loops, `49` should be replaced by `time_series_length - 1`.
170 |
171 | The advantages are:
172 |
173 | * the meaning is much clearer throughout
174 | * to alter the time series length, you only need to change one value
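
As a rough sketch of what this looks like, here is the inner loop for a single parameterization, written in plain Python with named constants. The names `initial_capital` and `solow_path` are ours and do not appear in the code above.

```python
time_series_length = 50
initial_capital = 1.0

def solow_path(s, α, δ, n=time_series_length, k0=initial_capital):
    "Iterate the Solow law of motion n - 1 times, starting from k0."
    k = [k0]
    for t in range(n - 1):
        k.append(s * k[t]**α + (1 - δ) * k[t])
    return k

path = solow_path(s=0.4, α=0.33, δ=0.1)
print(len(path))
```

Changing the length of every series now means editing `time_series_length` in one place.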
175 |
176 | ### Don't Repeat Yourself
177 |
178 | The other mortal sin in the code snippet above is repetition.
179 |
180 | Blocks of logic (such as the loop to generate time series) are repeated with only minor changes.
181 |
182 | This violates a fundamental tenet of programming: Don't repeat yourself (DRY).
183 |
184 | * Also called DIE (duplication is evil).
185 |
186 | Yes, we realize that you can just cut and paste and change a few symbols.
187 |
188 | But as a programmer, your aim should be to **automate** repetition, **not** do it yourself.
189 |
190 | More importantly, repeating the same logic in different places means that eventually one of them will likely be wrong.
191 |
192 | If you want to know more, read the excellent summary found on [this page](https://code.tutsplus.com/3-key-software-principles-you-must-understand--net-25161t).
193 |
194 | We'll talk about how to avoid repetition below.
195 |
196 | ### Minimize Global Variables
197 |
198 | Sure, global variables (i.e., names assigned to values outside of any function or class) are convenient.
199 |
200 | Rookie programmers typically use global variables with abandon --- as we once did ourselves.
201 |
202 | But global variables are dangerous, especially in medium to large size programs, since
203 |
204 | * they can affect what happens in any part of your program
205 | * they can be changed by any function
206 |
207 | This makes it much harder to be certain about what some small part of a given piece of code actually does.
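
Here is a small illustration of the problem, using made-up names (`discount_rate` and `present_value`): the same function call gives different answers depending on what happened to a global elsewhere in the program.

```python
discount_rate = 0.05          # global variable

def present_value(payment, periods):
    "Discount a single payment using the *global* discount_rate."
    return payment / (1 + discount_rate)**periods

before = present_value(100, 1)

discount_rate = 0.10          # changed somewhere else in the program

after = present_value(100, 1)
print(before, after)          # same call, different answers
```

Passing `discount_rate` as an argument instead would make the dependence explicit.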
208 |
209 | Here's a [useful discussion on the topic](https://wiki.c2.com/?GlobalVariablesAreBad).
210 |
211 | While the odd global in small scripts is no big deal, we recommend that you teach yourself to avoid them.
212 |
213 | (We'll discuss how just below).
214 |
215 | #### JIT Compilation
216 |
217 | For scientific computing, there is another good reason to avoid global variables.
218 |
219 | As {doc}`we've seen in previous lectures `, JIT compilation can generate excellent performance for scripting languages like Python.
220 |
221 | But the task of the compiler used for JIT compilation becomes harder when global variables are present.
222 |
223 | Put differently, the type inference required for JIT compilation is safer and
224 | more effective when variables are sandboxed inside a function.
225 |
226 | ### Use Functions or Classes
227 |
228 | Fortunately, we can easily avoid the evils of global variables and WET code.
229 |
230 | * WET stands for "we enjoy typing" and is the opposite of DRY.
231 |
232 | We can do this by making frequent use of functions or classes.
233 |
234 | In fact, functions and classes are designed specifically to help us avoid shaming ourselves by repeating code or excessive use of global variables.
235 |
236 | #### Which One, Functions or Classes?
237 |
238 | Both can be useful, and in fact they work well with each other.
239 |
240 | We'll learn more about these topics over time.
241 |
242 | (Personal preference is part of the story too)
243 |
244 | What's really important is that you use one or the other or both.
245 |
246 | ## Revisiting the Example
247 |
248 | Here's some code that reproduces the plot above with better coding style.
249 |
250 | ```{code-cell} python3
251 | from itertools import product
252 |
253 | def plot_path(ax, αs, s_vals, δs, time_series_length=50):
254 |     """
255 |     Add a time series plot to the axes ax for all given parameters.
256 |     """
257 |     k = np.empty(time_series_length)
258 |
259 |     for (α, s, δ) in product(αs, s_vals, δs):
260 |         k[0] = 1
261 |         for t in range(time_series_length-1):
262 |             k[t+1] = s * k[t]**α + (1 - δ) * k[t]
263 |         ax.plot(k, 'o-', label=rf"$\alpha = {α},\; s = {s},\; \delta = {δ}$")
264 |
265 |     ax.set_xlabel('time')
266 |     ax.set_ylabel('capital')
267 |     ax.set_ylim(0, 18)
268 |     ax.legend(loc='upper left', frameon=True)
269 |
270 | fig, axes = plt.subplots(3, 1, figsize=(8, 16))
271 |
272 | # Parameters (αs, s_vals, δs)
273 | set_one = ([0.25, 0.33, 0.45], [0.4], [0.1])
274 | set_two = ([0.33], [0.3, 0.4, 0.5], [0.1])
275 | set_three = ([0.33], [0.4], [0.05, 0.1, 0.15])
276 |
277 | for (ax, params) in zip(axes, (set_one, set_two, set_three)):
278 |     αs, s_vals, δs = params
279 |     plot_path(ax, αs, s_vals, δs)
280 |
281 | plt.show()
282 | ```
283 |
284 | If you inspect this code, you will see that
285 |
286 | * It uses a function to avoid repetition.
287 | * Global variables are quarantined by collecting them together at the end, not the start of the program.
288 | * Magic numbers are avoided.
289 | * The loop at the end where the actual work is done is short and relatively simple.
290 |
291 | ## Exercises
292 |
293 | ```{exercise-start}
294 | :label: wgc-exercise-1
295 | ```
296 |
297 | Here is some code that needs improving.
298 |
299 | It involves a basic supply and demand problem.
300 |
301 | Supply is given by
302 |
303 | $$
304 | q_s(p) = \exp(\alpha p) - \beta.
305 | $$
306 |
307 | The demand curve is
308 |
309 | $$
310 | q_d(p) = \gamma p^{-\delta}.
311 | $$
312 |
313 | The values $\alpha$, $\beta$, $\gamma$ and
314 | $\delta$ are **parameters**.
315 |
316 | The equilibrium $p^*$ is the price such that
317 | $q_d(p^*) = q_s(p^*)$.
318 |
319 | We can solve for this equilibrium using a root finding algorithm.
320 | Specifically, we will find the $p$ such that $h(p) = 0$,
321 | where
322 |
323 | $$
324 | h(p) := q_d(p) - q_s(p)
325 | $$
326 |
327 | This yields the equilibrium price $p^*$. From this we get the
328 | equilibrium quantity via $q^* = q_s(p^*)$.
329 |
330 | The parameter values will be
331 |
332 | - $\alpha = 0.1$
333 | - $\beta = 1$
334 | - $\gamma = 1$
335 | - $\delta = 1$
336 |
337 | ```{code-cell} ipython3
338 | from scipy.optimize import brentq
339 |
340 | # Compute equilibrium
341 | def h(p):
342 |     return p**(-1) - (np.exp(0.1 * p) - 1)  # demand - supply
343 |
344 | p_star = brentq(h, 2, 4)
345 | q_star = np.exp(0.1 * p_star) - 1
346 |
347 | print(f'Equilibrium price is {p_star: .2f}')
348 | print(f'Equilibrium quantity is {q_star: .2f}')
349 | ```
350 |
351 | Let's also plot our results.
352 |
353 | ```{code-cell} ipython3
354 | # Now plot
355 | grid = np.linspace(2, 4, 100)
356 | fig, ax = plt.subplots()
357 |
358 | qs = np.exp(0.1 * grid) - 1
359 | qd = grid**(-1)
360 |
361 |
362 | ax.plot(grid, qd, 'b-', lw=2, label='demand')
363 | ax.plot(grid, qs, 'g-', lw=2, label='supply')
364 |
365 | ax.set_xlabel('price')
366 | ax.set_ylabel('quantity')
367 | ax.legend(loc='upper center')
368 |
369 | plt.show()
370 | ```
371 |
372 | We also want to consider supply and demand shifts.
373 |
374 | For example, let's see what happens when demand shifts up, with $\gamma$ increasing to $1.25$:
375 |
376 | ```{code-cell} ipython3
377 | # Compute equilibrium
378 | def h(p):
379 |     return 1.25 * p**(-1) - (np.exp(0.1 * p) - 1)
380 |
381 | p_star = brentq(h, 2, 4)
382 | q_star = np.exp(0.1 * p_star) - 1
383 |
384 | print(f'Equilibrium price is {p_star: .2f}')
385 | print(f'Equilibrium quantity is {q_star: .2f}')
386 | ```
387 |
388 | ```{code-cell} ipython3
389 | # Now plot
390 | p_grid = np.linspace(2, 4, 100)
391 | fig, ax = plt.subplots()
392 |
393 | qs = np.exp(0.1 * p_grid) - 1
394 | qd = 1.25 * p_grid**(-1)
395 |
396 |
397 | ax.plot(p_grid, qd, 'b-', lw=2, label='demand')
398 | ax.plot(p_grid, qs, 'g-', lw=2, label='supply')
399 |
400 | ax.set_xlabel('price')
401 | ax.set_ylabel('quantity')
402 | ax.legend(loc='upper center')
403 |
404 | plt.show()
405 | ```
406 |
407 | Now we might consider supply shifts, but you already get the idea that there's
408 | a lot of repeated code here.
409 |
410 | Refactor and improve clarity in the code above using the principles discussed
411 | in this lecture.
412 |
413 | ```{exercise-end}
414 | ```
415 |
416 | ```{solution-start} wgc-exercise-1
417 | :class: dropdown
418 | ```
419 |
420 | Here's one solution that uses a class:
421 |
422 | ```{code-cell} ipython3
423 | class Equilibrium:
424 |
425 |     def __init__(self, α=0.1, β=1, γ=1, δ=1):
426 |         self.α, self.β, self.γ, self.δ = α, β, γ, δ
427 |
428 |     def qs(self, p):
429 |         return np.exp(self.α * p) - self.β
430 |
431 |     def qd(self, p):
432 |         return self.γ * p**(-self.δ)
433 |
434 |     def compute_equilibrium(self):
435 |         def h(p):
436 |             return self.qd(p) - self.qs(p)
437 |         p_star = brentq(h, 2, 4)
438 |         q_star = self.qs(p_star)
439 |
440 |         print(f'Equilibrium price is {p_star: .2f}')
441 |         print(f'Equilibrium quantity is {q_star: .2f}')
442 |
443 |     def plot_equilibrium(self):
444 |         # Now plot
445 |         grid = np.linspace(2, 4, 100)
446 |         fig, ax = plt.subplots()
447 |
448 |         ax.plot(grid, self.qd(grid), 'b-', lw=2, label='demand')
449 |         ax.plot(grid, self.qs(grid), 'g-', lw=2, label='supply')
450 |
451 |         ax.set_xlabel('price')
452 |         ax.set_ylabel('quantity')
453 |         ax.legend(loc='upper center')
454 |
455 |         plt.show()
456 | ```
457 |
458 | Let's create an instance at the default parameter values.
459 |
460 | ```{code-cell} ipython3
461 | eq = Equilibrium()
462 | ```
463 |
464 | Now we'll compute the equilibrium and plot it.
465 |
466 | ```{code-cell} ipython3
467 | eq.compute_equilibrium()
468 | ```
469 |
470 | ```{code-cell} ipython3
471 | eq.plot_equilibrium()
472 | ```
473 |
474 | One of the nice things about our refactored code is that, when we change
475 | parameters, we don't need to repeat ourselves:
476 |
477 | ```{code-cell} ipython3
478 | eq.γ = 1.25
479 | ```
480 |
481 | ```{code-cell} ipython3
482 | eq.compute_equilibrium()
483 | ```
484 |
485 | ```{code-cell} ipython3
486 | eq.plot_equilibrium()
487 | ```
488 |
489 | ```{solution-end}
490 | ```
--------------------------------------------------------------------------------
/lectures/names.md:
--------------------------------------------------------------------------------
1 | ---
2 | jupytext:
3 |   text_representation:
4 |     extension: .md
5 |     format_name: myst
6 | kernelspec:
7 |   display_name: Python 3
8 |   language: python
9 |   name: python3
10 | ---
11 |
12 | (oop_names)=
13 | ```{raw} jupyter
14 |
19 | ```
20 |
21 | # Names and Namespaces
22 |
23 | ## Overview
24 |
25 | This lecture is all about variable names, how they can be used and how they are
26 | understood by the Python interpreter.
27 |
28 | This might sound a little dull but the model that Python has adopted for
29 | handling names is elegant and interesting.
30 |
31 | In addition, you will save yourself many hours of debugging if you have a good
32 | understanding of how names work in Python.
33 |
34 | (var_names)=
35 | ## Variable Names in Python
36 |
37 | ```{index} single: Python; Variable Names
38 | ```
39 |
40 | Consider the Python statement
41 |
42 | ```{code-cell} python3
43 | x = 42
44 | ```
45 |
46 | We now know that when this statement is executed, Python creates an object of
47 | type `int` in your computer's memory, containing
48 |
49 | * the value `42`
50 | * some associated attributes
51 |
52 | But what is `x` itself?
53 |
54 | In Python, `x` is called a **name**, and the statement `x = 42` **binds** the name `x` to the integer object we have just discussed.
55 |
56 | Under the hood, this process of binding names to objects is implemented as a dictionary---more about this in a moment.
57 |
58 | There is no problem binding two or more names to the one object, regardless of what that object is
59 |
60 | ```{code-cell} python3
61 | def f(string): # Create a function called f
62 |     print(string)     # that prints any string it's passed
63 |
64 | g = f
65 | id(g) == id(f)
66 | ```
67 |
68 | ```{code-cell} python3
69 | g('test')
70 | ```
71 |
72 | In the first step, a function object is created, and the name `f` is bound to it.
73 |
74 | After binding the name `g` to the same object, we can use it anywhere we would use `f`.
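
A small sketch of the same idea, using the `is` operator instead of comparing `id` values. The function body here is changed to return a value so the result can be inspected. It also shows that deleting one name leaves the object reachable through the other.

```python
def f(string):
    return string.upper()

g = f                     # bind a second name to the same function object
same_object = g is f      # both names refer to one object

del f                     # remove the name f; the object survives via g
result = g('test')

print(same_object, result)
```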
75 |
76 | What happens when the number of names bound to an object goes to zero?
77 |
78 | Here's an example of this situation, where the name `x` is first bound to one object and then **rebound** to another
79 |
80 | ```{code-cell} python3
81 | x = 'foo'
82 | id(x)
83 | x = 'bar'
84 | id(x)
85 | ```
86 |
87 | In this case, after we rebind `x` to `'bar'`, no names are bound to the first object `'foo'`.
88 |
89 | This is a trigger for `'foo'` to be garbage collected.
90 |
91 | In other words, the memory slot that stores that object is deallocated and made available for reuse.
92 |
93 | Garbage collection is actually an active research area in computer science.
94 |
95 | You can [read more on garbage collection](https://rushter.com/blog/python-garbage-collector/) if you are interested.
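
If you would like to watch this happen, the following sketch may help. It assumes CPython, where an object is reclaimed as soon as its reference count drops to zero, and it uses a small hypothetical class `Payload` because strings cannot be weakly referenced.

```python
import weakref

class Payload:
    "A trivial class whose instances can be weakly referenced."

x = Payload()                 # the name x is bound to a new object
watcher = weakref.ref(x)      # a weak reference does not keep the object alive
alive_before = watcher() is not None

x = Payload()                 # rebind x; the first object now has no names
alive_after = watcher() is not None

print(alive_before, alive_after)
```

Other Python implementations may reclaim the object later, so the second value is not guaranteed outside CPython.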
96 |
97 | ## Namespaces
98 |
99 | ```{index} single: Python; Namespaces
100 | ```
101 |
102 | Recall from the preceding discussion that the statement
103 |
104 | ```{code-cell} python3
105 | x = 42
106 | ```
107 |
108 | binds the name `x` to the integer object on the right-hand side.
109 |
110 | We also mentioned that this process of binding `x` to the correct object is implemented as a dictionary.
111 |
112 | This dictionary is called a namespace.
113 |
114 | ```{admonition} Definition
115 | A **namespace** is a symbol table that maps names to objects in memory.
116 | ```
117 |
118 |
119 | Python uses multiple namespaces, creating them on the fly as necessary.
120 |
121 | For example, every time we import a module, Python creates a namespace for that module.
122 |
123 | To see this in action, suppose we write a script `mathfoo.py` with a single line
124 |
125 | ```{code-cell} python3
126 | %%file mathfoo.py
127 | pi = 'foobar'
128 | ```
129 |
130 | Now we start the Python interpreter and import it
131 |
132 | ```{code-cell} python3
133 | import mathfoo
134 | ```
135 |
136 | Next let's import the `math` module from the standard library
137 |
138 | ```{code-cell} python3
139 | import math
140 | ```
141 |
142 | Both of these modules have an attribute called `pi`
143 |
144 | ```{code-cell} python3
145 | math.pi
146 | ```
147 |
148 | ```{code-cell} python3
149 | mathfoo.pi
150 | ```
151 |
152 | These two different bindings of `pi` exist in different namespaces, each one implemented as a dictionary.
153 |
154 | If you wish, you can look at the dictionary directly, using `module_name.__dict__`.
155 |
156 | ```{code-cell} python3
157 | import math
158 |
159 | math.__dict__.items()
160 | ```
161 |
162 | ```{code-cell} python3
163 | import mathfoo
164 |
165 | mathfoo.__dict__
166 | ```
167 |
168 | As you know, we access elements of the namespace using the dotted attribute notation
169 |
170 | ```{code-cell} python3
171 | math.pi
172 | ```
173 |
174 | This is entirely equivalent to `math.__dict__['pi']`
175 |
176 | ```{code-cell} python3
177 | math.__dict__['pi']
178 | ```
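
The correspondence also runs in the other direction. In the sketch below, the module object `demo` is built by hand with `types.ModuleType` (a stand-in for importing a file), and writing to its dictionary creates an attribute.

```python
import math
import types

# Build an empty module object by hand instead of importing a file
demo = types.ModuleType('demo')
demo.pi = 'foobar'                   # attribute assignment writes to the namespace

vars(demo)['e'] = 'baz'              # writing to the dictionary creates an attribute

print(demo.__dict__['pi'], demo.e)
print(math.pi == math.__dict__['pi'])
```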
179 |
180 | ## Viewing Namespaces
181 |
182 | As we saw above, the `math` namespace can be printed by typing `math.__dict__`.
183 |
184 | Another way to see its contents is to type `vars(math)`
185 |
186 | ```{code-cell} python3
187 | vars(math).items()
188 | ```
189 |
190 | If you just want to see the names, you can type
191 |
192 | ```{code-cell} python3
193 | # Show the first 10 names
194 | dir(math)[0:10]
195 | ```
196 |
197 | Notice the special names `__doc__` and `__name__`.
198 |
199 | These are initialized in the namespace when any module is imported
200 |
201 | * `__doc__` is the doc string of the module
202 | * `__name__` is the name of the module
203 |
204 | ```{code-cell} python3
205 | print(math.__doc__)
206 | ```
207 |
208 | ```{code-cell} python3
209 | math.__name__
210 | ```
211 |
212 | ## Interactive Sessions
213 |
214 | ```{index} single: Python; Interpreter
215 | ```
216 |
217 | In Python, **all** code executed by the interpreter runs in some module.
218 |
219 | What about commands typed at the prompt?
220 |
221 | These are also regarded as being executed within a module --- in this case, a module called `__main__`.
222 |
223 | To check this, we can look at the current module name via the value of `__name__` given at the prompt
224 |
225 | ```{code-cell} python3
226 | print(__name__)
227 | ```
228 |
229 | When we run a script using IPython's `run` command, the contents of the file are executed as part of `__main__` too.
230 |
231 | To see this, let's create a file `mod.py` that prints its own `__name__` attribute
232 |
233 | ```{code-cell} ipython
234 | %%file mod.py
235 | print(__name__)
236 | ```
237 |
238 | Now let's look at two different ways of running it in IPython
239 |
240 | ```{code-cell} python3
241 | import mod # Standard import
242 | ```
243 |
244 | ```{code-cell} ipython
245 | %run mod.py # Run interactively
246 | ```
247 |
248 | In the second case, the code is executed as part of `__main__`, so `__name__` is equal to `__main__`.
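
This behavior is the basis of a common idiom, sketched below with a hypothetical `main` function. Code under the `if` statement runs when the file is executed as a script, but not when it is imported.

```python
def main():
    # Hypothetical entry point for the script
    return "running as a script"

if __name__ == "__main__":
    # True when the file is run directly, False when it is imported
    print(main())
```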
249 |
250 | To see the contents of the namespace of `__main__` we use `vars()` rather than `vars(__main__)`.
251 |
252 | If you do this in IPython, you will see a whole lot of variables that IPython
253 | needs, and has initialized when you started up your session.
254 |
255 | If you prefer to see only the variables you have initialized, use `%whos`
256 |
257 | ```{code-cell} ipython
258 | x = 2
259 | y = 3
260 |
261 | import numpy as np
262 |
263 | %whos
264 | ```
265 |
266 | ## The Global Namespace
267 |
268 | ```{index} single: Python; Namespace (Global)
269 | ```
270 |
271 | Python documentation often makes reference to the "global namespace".
272 |
273 | The global namespace is *the namespace of the module currently being executed*.
274 |
275 | For example, suppose that we start the interpreter and begin making assignments.
276 |
277 | We are now working in the module `__main__`, and hence the namespace for `__main__` is the global namespace.
278 |
279 | Next, we import a module called `amodule`
280 |
281 | ```{code-block} python3
282 | :class: no-execute
283 |
284 | import amodule
285 | ```
286 |
287 | At this point, the interpreter creates a namespace for the module `amodule` and starts executing commands in the module.
288 |
289 | While this occurs, the namespace `amodule.__dict__` is the global namespace.
290 |
291 | Once execution of the module finishes, the interpreter returns to the module from where the import statement was made.
292 |
293 | In this case it's `__main__`, so the namespace of `__main__` again becomes the global namespace.
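
One way to check that "global" always means "global to some module" is to inspect a function's `__globals__` attribute, which refers to the namespace of the module where the function was defined. The sketch below uses `json.dumps` from the standard library purely as a convenient example.

```python
import json

# A function keeps a reference to the global namespace of its defining module
same_namespace = json.dumps.__globals__ is vars(json)

print(json.dumps.__module__, same_namespace)
```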
294 |
295 | ## Local Namespaces
296 |
297 | ```{index} single: Python; Namespace (Local)
298 | ```
299 |
300 | Important fact: When we call a function, the interpreter creates a *local namespace* for that function, and registers the variables in that namespace.
301 |
302 | The reason for this will be explained in just a moment.
303 |
304 | Variables in the local namespace are called *local variables*.
305 |
306 | After the function returns, the namespace is deallocated and lost.
307 |
308 | While the function is executing, we can view the contents of the local namespace with `locals()`.
309 |
310 | For example, consider
311 |
312 | ```{code-cell} python3
313 | def f(x):
314 |     a = 2
315 |     print(locals())
316 |     return a * x
317 | ```
318 |
319 | Now let's call the function
320 |
321 | ```{code-cell} python3
322 | f(1)
323 | ```
324 |
325 | You can see the local namespace of `f` before it is destroyed.
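
As a complementary sketch, we can copy the local namespace before the function returns and then confirm that the local name is not visible afterwards. The name `tmp_local` is chosen only to avoid clashing with anything defined elsewhere.

```python
def f(x):
    tmp_local = 2
    snapshot = dict(locals())     # copy the local namespace at this point
    return tmp_local * x, snapshot

result, snapshot = f(1)

try:
    tmp_local                     # the local name is not visible out here
    leaked = True
except NameError:
    leaked = False

print(result, snapshot, leaked)
```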
326 |
327 | ## The `__builtins__` Namespace
328 |
329 | ```{index} single: Python; Namespace (__builtins__)
330 | ```
331 |
332 | We have been using various built-in functions, such as `max(), dir(), str(), list(), len(), range(), type()`, etc.
333 |
334 | How does access to these names work?
335 |
336 | * These definitions are stored in a module called `builtins` (it was called `__builtin__` in Python 2).
337 | * They have their own namespace, which is accessible under the name `__builtins__`.
338 |
339 | ```{code-cell} python3
340 | # Show the first 10 names in `__main__`
341 | dir()[0:10]
342 | ```
343 |
344 | ```{code-cell} python3
345 | # Show the first 10 names in `__builtins__`
346 | dir(__builtins__)[0:10]
347 | ```
348 |
349 | We can access elements of the namespace as follows
350 |
351 | ```{code-cell} python3
352 | __builtins__.max
353 | ```
354 |
355 | But `__builtins__` is special, because we can always access its names directly as well
356 |
357 | ```{code-cell} python3
358 | max
359 | ```
360 |
361 | ```{code-cell} python3
362 | __builtins__.max == max
363 | ```
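
A more portable way to reach the same namespace is to import the `builtins` module explicitly. The sketch below also hints at what can go wrong: a name defined closer to hand hides the builtin of the same name. The function `shadowed` is illustrative only.

```python
import builtins

same_object = builtins.max is max      # the bare name resolves to the builtin

def shadowed():
    max = min                          # a local name that hides the builtin
    return max(1, 2)

print(same_object, shadowed(), max(1, 2))
```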
364 |
365 | The next section explains how this works ...
366 |
367 | ## Name Resolution
368 |
369 | ```{index} single: Python; Namespace (Resolution)
370 | ```
371 |
372 | Namespaces are great because they help us organize variable names.
373 |
374 | (Type `import this` at the prompt and look at the last item that's printed)
375 |
376 | However, we do need to understand how the Python interpreter works with multiple namespaces.
377 |
378 | Understanding the flow of execution will help us to check which variables are in scope and how to operate on them when writing and debugging programs.
379 |
380 |
381 | At any point of execution, there are in fact at least two namespaces that can be accessed directly.
382 |
383 | ("Accessed directly" means without using a dot, as in `pi` rather than `math.pi`)
384 |
385 | These namespaces are
386 |
387 | * The global namespace (of the module being executed)
388 | * The builtin namespace
389 |
390 | If the interpreter is executing a function, then the directly accessible namespaces are
391 |
392 | * The local namespace of the function
393 | * The global namespace (of the module being executed)
394 | * The builtin namespace
395 |
396 | Sometimes functions are defined within other functions, like so
397 |
398 | ```{code-cell} python3
399 | def f():
400 |     a = 2
401 |     def g():
402 |         b = 4
403 |         print(a * b)
404 |     g()
405 | ```
406 |
407 | Here `f` is the *enclosing function* for `g`, and each function gets its
408 | own namespace.
409 |
410 | Now we can give the rule for how namespace resolution works:
411 |
412 | The order in which the interpreter searches for names is
413 |
414 | 1. the local namespace (if it exists)
415 | 1. the hierarchy of enclosing namespaces (if they exist)
416 | 1. the global namespace
417 | 1. the builtin namespace
418 |
419 | If the name is not in any of these namespaces, the interpreter raises a `NameError`.
420 |
421 | This is called the **LEGB rule** (local, enclosing, global, builtin).
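
The following sketch exercises each step of the search order in turn. All function names are made up for the illustration, and `name_defined_nowhere` is deliberately left undefined.

```python
x = "global"

def outer():
    x = "enclosing"

    def uses_local():
        x = "local"
        return x              # found in the local namespace

    def uses_enclosing():
        return x              # not local, so found in outer's namespace

    return uses_local(), uses_enclosing()

def uses_global():
    return x                  # not local, no enclosing function

def uses_builtin():
    return len("abc")         # len lives in the builtin namespace

def uses_missing():
    return name_defined_nowhere

try:
    uses_missing()
    outcome = "found"
except NameError:
    outcome = "NameError"

print(outer(), uses_global(), uses_builtin(), outcome)
```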
422 |
423 | Here's an example that helps to illustrate.
424 |
425 | Visualizations here are created by [nbtutor](https://github.com/lgpage/nbtutor) in a Jupyter notebook.
426 |
427 | They can help you better understand your program when you are learning a new language.
428 |
429 | Consider a script `test.py` that looks as follows
430 |
431 | ```{code-cell} python3
432 | %%file test.py
433 | def g(x):
434 |     a = 1
435 |     x = x + a
436 |     return x
437 |
438 | a = 0
439 | y = g(10)
440 | print("a = ", a, "y = ", y)
441 | ```
442 |
443 | What happens when we run this script?
444 |
445 | ```{code-cell} ipython
446 | %run test.py
447 | ```
448 |
449 | First,
450 |
451 | * The global namespace `{}` is created.
452 |
453 | ```{figure} /_static/lecture_specific/oop_intro/global.png
454 | ```
455 |
456 | * The function object is created, and `g` is bound to it within the global namespace.
457 | * The name `a` is bound to `0`, again in the global namespace.
458 |
459 | ```{figure} /_static/lecture_specific/oop_intro/global2.png
460 | ```
461 |
462 | Next `g` is called via `y = g(10)`, leading to the following sequence of actions
463 |
464 | * The local namespace for the function is created.
465 | * Local names `x` and `a` are bound, so that the local namespace becomes `{'x': 10, 'a': 1}`.
466 |
467 | Note that the global `a` was not affected by the local `a`.
468 |
469 | ```{figure} /_static/lecture_specific/oop_intro/local1.png
470 | ```
471 |
472 |
473 | * Statement `x = x + a` uses the local `a` and local `x` to compute `x + a`, and binds local name `x` to the result.
474 | * This value is returned, and `y` is bound to it in the global namespace.
475 | * Local `x` and `a` are discarded (and the local namespace is deallocated).
476 |
477 | ```{figure} /_static/lecture_specific/oop_intro/local_return.png
478 | ```
479 |
480 |
481 | (mutable_vs_immutable)=
482 | ### {index}`Mutable ` Versus {index}`Immutable ` Parameters
483 |
484 | This is a good time to say a little more about mutable vs immutable objects.
485 |
486 | Consider the code segment
487 |
488 | ```{code-cell} python3
489 | def f(x):
490 |     x = x + 1
491 |     return x
492 |
493 | x = 1
494 | print(f(x), x)
495 | ```
496 |
497 | We now understand what will happen here: The code prints `2` as the value of `f(x)` and `1` as the value of `x`.
498 |
499 | First `f` and `x` are registered in the global namespace.
500 |
501 | The call `f(x)` creates a local namespace and adds `x` to it, bound to `1`.
502 |
503 | Next, this local `x` is rebound to the new integer object `2`, and this value is returned.
504 |
505 | None of this affects the global `x`.
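
We can check this account with `id`. In the sketch below, a variant of `f` also returns the identity of the local `x` on entry and after the rebinding. The extra return values are added only for illustration.

```python
def f(x):
    id_on_entry = id(x)
    x = x + 1                      # rebinds the local name to a new object
    return x, id_on_entry, id(x)

x = 1
value, id_on_entry, id_after = f(x)

print(id_on_entry == id(x))        # local x started out as the global object
print(id_after == id(x))           # after rebinding it refers to another one
```

The local `x` initially refers to the same integer object as the global `x`, and the assignment inside `f` binds it to a different object rather than changing the original.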
506 |
507 | However, it's a different story when we use a **mutable** data type such as a list
508 |
509 | ```{code-cell} python3
510 | def f(x):
511 |     x[0] = x[0] + 1
512 |     return x
513 |
514 | x = [1]
515 | print(f(x), x)
516 | ```
517 |
518 | This prints `[2]` as the value of `f(x)` and the *same* for `x`.
519 |
520 | Here's what happens
521 |
522 | * `f` is registered as a function in the global namespace
523 |
524 | ```{figure} /_static/lecture_specific/oop_intro/mutable1.png
525 | ```
526 |
527 | * `x` is bound to `[1]` in the global namespace
528 |
529 | ```{figure} /_static/lecture_specific/oop_intro/mutable2.png
530 | ```
531 |
532 | * The call `f(x)`
533 |   * Creates a local namespace
534 |   * Adds `x` to the local namespace, bound to `[1]`
535 |
536 | ```{figure} /_static/lecture_specific/oop_intro/mutable3.png
537 | ```
538 |
539 | ```{note}
540 | The global `x` and the local `x` refer to the same `[1]`
541 | ```
542 |
543 | We can see the identity of local `x` and the identity of global `x` are the same
544 |
545 | ```{code-cell} python3
546 | def f(x):
547 |     x[0] = x[0] + 1
548 |     print(f'the identity of local x is {id(x)}')
549 |     return x
550 |
551 | x = [1]
552 | print(f'the identity of global x is {id(x)}')
553 | print(f(x), x)
554 | ```
555 |
556 | * Within `f(x)`
557 |   * The list `[1]` is modified to `[2]`
558 |   * Returns the list `[2]`
559 |
560 | ```{figure} /_static/lecture_specific/oop_intro/mutable4.png
561 | ```
562 | * The local namespace is deallocated, and the local `x` is lost
563 |
564 | ```{figure} /_static/lecture_specific/oop_intro/mutable5.png
565 | ```
566 |
567 | If you want to modify the local `x` and the global `x` separately, you can create a [*copy*](https://docs.python.org/3/library/copy.html) of the list and assign the copy to the local `x`.
568 |
569 | We will leave this for you to explore.
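
As a possible starting point for that exploration, here is a minimal sketch that uses the list method `copy`. Note that this makes a shallow copy. For nested lists, `copy.deepcopy` from the standard library may be needed instead.

```python
def f(x):
    x = x.copy()          # rebind the local name to a shallow copy
    x[0] = x[0] + 1       # modify the copy, not the caller's list
    return x

x = [1]
y = f(x)

print(y, x)
```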
570 |
571 |
572 |
573 |
--------------------------------------------------------------------------------
/lectures/numpy_vs_numba_vs_jax.md:
--------------------------------------------------------------------------------
1 | ---
2 | jupytext:
3 |   text_representation:
4 |     extension: .md
5 |     format_name: myst
6 | kernelspec:
7 |   display_name: Python 3
8 |   language: python
9 |   name: python3
10 | ---
11 |
12 | (parallel)=
13 | ```{raw} jupyter
14 |
19 | ```
20 |
21 | # NumPy vs Numba vs JAX
22 |
23 | In the preceding lectures, we've discussed three core libraries for scientific
24 | and numerical computing:
25 |
26 | * [NumPy](numpy)
27 | * [Numba](numba)
28 | * [JAX](jax_intro)
29 |
30 | Which one should we use in any given situation?
31 |
32 | This lecture addresses that question, at least partially, by discussing some use cases.
33 |
34 | Before getting started, we note that the first two are a natural pair: NumPy and
35 | Numba play well together.
36 |
37 | JAX, on the other hand, stands alone.
38 |
39 | When considering each approach, we will consider not just efficiency and memory
40 | footprint but also clarity and ease of use.
41 |
42 | In addition to what's in Anaconda, this lecture will need the following libraries:
43 |
44 | ```{code-cell} ipython3
45 | ---
46 | tags: [hide-output]
47 | ---
48 | !pip install quantecon jax
49 | ```
50 |
51 | ```{include} _admonition/gpu.md
52 | ```
53 |
54 | We will use the following imports.
55 |
56 | ```{code-cell} ipython3
57 | import random
58 | import numpy as np
59 | import quantecon as qe
60 | import matplotlib.pyplot as plt
61 | from mpl_toolkits.mplot3d.axes3d import Axes3D
62 | from matplotlib import cm
63 | import jax
64 | import jax.numpy as jnp
65 | ```
66 |
67 | ## Vectorized operations
68 |
69 | Some operations can be perfectly vectorized --- all loops are easily eliminated
70 | and numerical operations are reduced to calculations on arrays.
71 |
72 | In this case, which approach is best?
73 |
74 | ### Problem Statement
75 |
76 | Consider the problem of maximizing a function $f$ of two variables $(x,y)$ over
77 | the square $[-a, a] \times [-a, a]$.
78 |
79 | For $f$ and $a$ let's choose
80 |
81 | $$
82 | f(x,y) = \frac{\cos(x^2 + y^2)}{1 + x^2 + y^2}
83 | \quad \text{and} \quad
84 | a = 3
85 | $$
86 |
87 | Here's a plot of $f$
88 |
89 | ```{code-cell} ipython3
90 |
91 | def f(x, y):
92 |     return np.cos(x**2 + y**2) / (1 + x**2 + y**2)
93 |
94 | xgrid = np.linspace(-3, 3, 50)
95 | ygrid = xgrid
96 | x, y = np.meshgrid(xgrid, ygrid)
97 |
98 | fig = plt.figure(figsize=(10, 8))
99 | ax = fig.add_subplot(111, projection='3d')
100 | ax.plot_surface(x,
101 |                 y,
102 |                 f(x, y),
103 |                 rstride=2, cstride=2,
104 |                 cmap=cm.jet,
105 |                 alpha=0.7,
106 |                 linewidth=0.25)
107 | ax.set_zlim(-0.5, 1.0)
108 | ax.set_xlabel('$x$', fontsize=14)
109 | ax.set_ylabel('$y$', fontsize=14)
110 | plt.show()
111 | ```
112 |
113 | For the sake of this exercise, we're going to use brute force for the
114 | maximization.
115 |
116 | 1. Evaluate $f$ for all $(x,y)$ in a grid on the square.
117 | 1. Return the maximum of observed values.
118 |
119 | Just to illustrate the idea, here's a non-vectorized version that uses Python loops.
120 |
121 | ```{code-cell} ipython3
122 | grid = np.linspace(-3, 3, 50)
123 | m = -np.inf
124 | for x in grid:
125 |     for y in grid:
126 |         z = f(x, y)
127 |         if z > m:
128 |             m = z
129 | ```
130 |
131 |
132 | ### NumPy vectorization
133 |
134 | If we switch to NumPy-style vectorization we can use a much larger grid and the
135 | code executes relatively quickly.
136 |
137 | Here we use `np.meshgrid` to create two-dimensional input grids `x` and `y` such
138 | that `f(x, y)` generates all evaluations on the product grid.
139 |
140 | (This strategy dates back to MATLAB.)
141 |
142 | ```{code-cell} ipython3
143 | grid = np.linspace(-3, 3, 3_000)
144 | x, y = np.meshgrid(grid, grid)
145 |
146 | with qe.Timer(precision=8):
147 |     z_max_numpy = np.max(f(x, y))
148 |
149 | print(f"NumPy result: {z_max_numpy:.6f}")
150 | ```
151 |
152 | In the vectorized version, all the looping takes place in compiled code.
153 |
154 | Moreover, NumPy uses implicit multithreading, so that at least some parallelization occurs.
155 |
156 | (The parallelization cannot be highly efficient because the binary is compiled
157 | before it sees the size of the arrays `x` and `y`.)
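
As an aside, the input grids need not be stored in full. The sketch below uses the `sparse=True` option of `np.meshgrid` on a smaller grid (200 points, chosen only to keep the example quick) and relies on broadcasting to produce the same evaluations. The output array `f(x_s, y_s)` is still full size, so this reduces the memory used by the inputs only.

```python
import numpy as np

def f(x, y):
    return np.cos(x**2 + y**2) / (1 + x**2 + y**2)

grid = np.linspace(-3, 3, 200)

# Dense inputs: two full (200, 200) arrays
x, y = np.meshgrid(grid, grid)
# Sparse inputs: shapes (1, 200) and (200, 1), expanded by broadcasting
x_s, y_s = np.meshgrid(grid, grid, sparse=True)

z_dense = np.max(f(x, y))
z_sparse = np.max(f(x_s, y_s))

print(x.shape, x_s.shape, y_s.shape)
print(np.isclose(z_dense, z_sparse))
```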
158 |
159 |
160 | ### A Comparison with Numba
161 |
162 | Now let's see if we can achieve better performance using Numba with a simple loop.
163 |
164 | ```{code-cell} ipython3
165 | import numba
166 |
167 | @numba.jit
168 | def compute_max_numba(grid):
169 |     m = -np.inf
170 |     for x in grid:
171 |         for y in grid:
172 |             z = np.cos(x**2 + y**2) / (1 + x**2 + y**2)
173 |             if z > m:
174 |                 m = z
175 |     return m
176 |
177 | grid = np.linspace(-3, 3, 3_000)
178 |
179 | with qe.Timer(precision=8):
180 |     z_max_numba = compute_max_numba(grid)
181 |
182 | print(f"Numba result: {z_max_numba:.6f}")
183 | ```
184 |
185 | Let's run again to eliminate compile time.
186 |
187 | ```{code-cell} ipython3
188 | with qe.Timer(precision=8):
189 |     compute_max_numba(grid)
190 | ```
191 |
192 | Depending on your machine, the Numba version can be a bit slower or a bit faster
193 | than NumPy.
194 |
195 | On one hand, NumPy combines efficient arithmetic (like Numba) with some
196 | multithreading (unlike this Numba code), which provides an advantage.
197 |
198 | On the other hand, the Numba routine uses much less memory, since we are only
199 | working with a single one-dimensional grid.
200 |
201 |
202 | ### Parallelized Numba
203 |
204 | Now let's try parallelization with Numba using `prange`.
205 |
206 | Here's a naive and *incorrect* attempt.
207 |
208 | ```{code-cell} ipython3
209 | @numba.jit(parallel=True)
210 | def compute_max_numba_parallel(grid):
211 |     n = len(grid)
212 |     m = -np.inf
213 |     for i in numba.prange(n):
214 |         for j in range(n):
215 |             x = grid[i]
216 |             y = grid[j]
217 |             z = np.cos(x**2 + y**2) / (1 + x**2 + y**2)
218 |             if z > m:
219 |                 m = z
220 |     return m
221 |
222 | ```
223 |
224 | Usually this returns an incorrect result:
225 |
226 | ```{code-cell} ipython3
227 | z_max_parallel_incorrect = compute_max_numba_parallel(grid)
228 | print(f"Numba result: {z_max_parallel_incorrect} 😱")
229 | ```
230 |
231 | The reason is that the variable `m` is shared across threads and not properly controlled.
232 |
233 | When multiple threads try to read and write `m` simultaneously, they interfere with each other.
234 |
235 | Threads read stale values of `m` or overwrite each other's updates, or `m` never gets updated from its initial value.
236 |
237 | Here's a more carefully written version.
238 |
239 | ```{code-cell} ipython3
240 | @numba.jit(parallel=True)
241 | def compute_max_numba_parallel(grid):
242 |     n = len(grid)
243 |     row_maxes = np.empty(n)
244 |     for i in numba.prange(n):
245 |         row_max = -np.inf
246 |         for j in range(n):
247 |             x = grid[i]
248 |             y = grid[j]
249 |             z = np.cos(x**2 + y**2) / (1 + x**2 + y**2)
250 |             if z > row_max:
251 |                 row_max = z
252 |         row_maxes[i] = row_max
253 |     return np.max(row_maxes)
254 | ```
255 |
256 | Now the code block that `for i in numba.prange(n)` acts over is independent
257 | across `i`.
258 |
259 | Each thread writes to a separate element of the array `row_maxes` and
260 | the parallelization is safe.
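
The same pattern (independent work per row, followed by a final reduction) can be sketched in plain Python with the standard library's `ThreadPoolExecutor`. This is not Numba and is not expected to be fast, since ordinary Python threads do not run this arithmetic in parallel. It is only meant to show the structure of the computation. The names `row_max` and `compute_max_by_rows`, and the choice of 4 workers, are assumptions made for the illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def f(x, y):
    return math.cos(x**2 + y**2) / (1 + x**2 + y**2)

def row_max(x, grid):
    # Each task computes the maximum over one row, touching no shared state
    return max(f(x, y) for y in grid)

def compute_max_by_rows(grid, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One independent result per row, collected in order
        row_maxes = list(pool.map(lambda x: row_max(x, grid), grid))
    # The reduction happens once, after all rows are finished
    return max(row_maxes)

n = 101
grid = [-3 + 6 * i / (n - 1) for i in range(n)]
z_max = compute_max_by_rows(grid)

print(f"Row-wise result: {z_max:.6f}")
```

Because no task writes to a variable that another task reads, the result does not depend on the order in which the rows are processed.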
261 |
262 | ```{code-cell} ipython3
263 | z_max_parallel = compute_max_numba_parallel(grid)
264 | print(f"Numba result: {z_max_parallel:.6f}")
265 | ```
266 |
267 | Here's the timing.
268 |
269 | ```{code-cell} ipython3
270 | with qe.Timer(precision=8):
271 |     compute_max_numba_parallel(grid)
272 | ```
273 |
274 | If you have multiple cores, you should see at least some benefits from
275 | parallelization here.
276 |
277 | For more powerful machines and larger grid sizes, parallelization can generate
278 | major speed gains, even on the CPU.
279 |
280 |
281 | ### Vectorized code with JAX
282 |
283 | On the surface, vectorized code in JAX is similar to NumPy code.
284 |
285 | But there are also some differences, which we highlight here.
286 |
287 | Let's start with the function.
288 |
289 |
290 | ```{code-cell} ipython3
291 | @jax.jit
292 | def f(x, y):
293 |     return jnp.cos(x**2 + y**2) / (1 + x**2 + y**2)
294 |
295 | ```
296 |
297 | As with NumPy, to get the right shape and the correct nested `for` loop
298 | calculation, we can use a `meshgrid` operation designed for this purpose:
299 |
300 | ```{code-cell} ipython3
301 | grid = jnp.linspace(-3, 3, 3_000)
302 | x_mesh, y_mesh = jnp.meshgrid(grid, grid)
303 |
304 | with qe.Timer(precision=8):
305 |     z_max = jnp.max(f(x_mesh, y_mesh))
306 |     z_max.block_until_ready()
307 |
308 | print(f"Plain vanilla JAX result: {z_max:.6f}")
309 | ```
310 |
311 | Let's run again to eliminate compile time.
312 |
313 | ```{code-cell} ipython3
314 | with qe.Timer(precision=8):
315 |     z_max = jnp.max(f(x_mesh, y_mesh))
316 |     z_max.block_until_ready()
317 | ```
318 |
319 | Once compiled, JAX is significantly faster than NumPy due to GPU acceleration.
320 |
321 | The compilation overhead is a one-time cost that pays off when the function is called repeatedly.
322 |
323 |
324 | ### JAX plus vmap
325 |
326 | There is one problem with both the NumPy code and the JAX code:
327 |
328 | While the flat arrays are low-memory
329 |
330 | ```{code-cell} ipython3
331 | grid.nbytes
332 | ```
333 |
334 | the mesh grids are memory intensive
335 |
336 | ```{code-cell} ipython3
337 | x_mesh.nbytes + y_mesh.nbytes
338 | ```
339 |
340 | This extra memory usage can be a big problem in actual research calculations.
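
The gap between the two figures follows from simple arithmetic. The sketch below assumes JAX's default 32-bit floats (4 bytes per element); with 64-bit floats enabled, every number doubles.

```python
n = 3_000
bytes_per_float = 4                          # assumption: default float32

flat_bytes = n * bytes_per_float             # one flat grid of length n
mesh_bytes = 2 * n * n * bytes_per_float     # x_mesh plus y_mesh, each n x n

print(flat_bytes)                  # 12000
print(mesh_bytes)                  # 72000000
print(mesh_bytes // flat_bytes)    # 6000
```

The ratio is $2n$, so the cost of the mesh grids grows quadratically in the grid size while the flat array grows only linearly.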
341 |
342 | Fortunately, JAX admits a different approach
343 | using [jax.vmap](https://docs.jax.dev/en/latest/_autosummary/jax.vmap.html).
344 |
345 | #### Version 1
346 |
347 | Here's one way we can apply `vmap`.
348 |
349 | ```{code-cell} ipython3
350 | # Set up f to compute f(x, y) at every x for any given y
351 | f_vec_x = lambda y: f(grid, y)
352 | # Create a second function that vectorizes this operation over all y
353 | f_vec = jax.vmap(f_vec_x)
354 | ```
355 |
356 | Now `f_vec` will compute `f(x,y)` at every `x,y` when called with the flat array `grid`.
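
To see what this construction does, here is a plain-Python sketch of the semantics. It is not JAX: `vmap_sketch` is a hypothetical stand-in that applies a function to each entry of its input and collects the results, whereas `jax.vmap` produces a batched array computation rather than a Python loop.

```python
import math

def f(x, y):
    return math.cos(x**2 + y**2) / (1 + x**2 + y**2)

def vmap_sketch(func):
    """Plain-Python stand-in for mapping func over a one-dimensional input."""
    def mapped(ys):
        return [func(y) for y in ys]     # one result per entry of ys
    return mapped

grid = [-3 + 6 * k / 4 for k in range(5)]       # tiny grid: -3, -1.5, 0, 1.5, 3

f_vec_x = lambda y: [f(x, y) for x in grid]     # f at every x for a given y
f_vec = vmap_sketch(f_vec_x)                    # then over every y

z = f_vec(grid)    # nested list: row i holds f(x, grid[i]) for every x
print(len(z), len(z[0]))
```

The only input is the flat `grid`; the two-dimensional structure appears in the output alone.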
357 |
358 | Let's see the timing:
359 |
360 | ```{code-cell} ipython3
361 | with qe.Timer(precision=8):
362 |     z_max = jnp.max(f_vec(grid))
363 |     z_max.block_until_ready()
364 |
365 | print(f"JAX vmap v1 result: {z_max:.6f}")
366 | ```
367 |
368 | ```{code-cell} ipython3
369 | with qe.Timer(precision=8):
370 |     z_max = jnp.max(f_vec(grid))
371 |     z_max.block_until_ready()
372 | ```
373 |
374 | By avoiding the large input arrays `x_mesh` and `y_mesh`, this `vmap` version uses far less memory.
375 |
376 | When run on a CPU, its runtime is similar to that of the meshgrid version.
377 |
378 | When run on a GPU, it is usually significantly faster.
379 |
380 | In fact, using `vmap` has another advantage: It allows us to break vectorization up into stages.
381 |
382 | This leads to code that is often easier to comprehend than traditional vectorized code.
383 |
384 | We will investigate these ideas more when we tackle larger problems.
385 |
386 |
387 | #### Version 2
388 |
389 | We can be still more memory efficient using vmap.
390 |
391 | While we avoid large input arrays in the preceding version,
392 | we still create the large output array `f(x,y)` before we compute the max.
393 |
394 | Let's try a slightly different approach that takes the max to the inside.
395 |
396 | Because of this change, we never compute the two-dimensional array `f(x,y)`.
397 |
398 | ```{code-cell} ipython3
399 | @jax.jit
400 | def compute_max_vmap_v2(grid):
401 |     # Construct a function that takes the max along each row
402 |     f_vec_x_max = lambda y: jnp.max(f(grid, y))
403 |     # Vectorize the function so we can call on all rows simultaneously
404 |     f_vec_max = jax.vmap(f_vec_x_max)
405 |     # Call the vectorized function and take the max
406 |     return jnp.max(f_vec_max(grid))
407 | ```
408 |
409 | Here
410 |
411 | * `f_vec_x_max` computes the max along any given row
412 | * `f_vec_max` is a vectorized version that can compute the max of all rows in parallel.
413 |
414 | We apply this function to all rows and then take the max of the row maxes.
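
The idea rests on the identity $\max_{x, y} f(x, y) = \max_y \max_x f(x, y)$. The following plain-Python sketch (the names `row_max` and `compute_max_rowwise` are ours) checks the identity on a small grid, using generators so that no two-dimensional collection is ever stored. It illustrates the logic only and says nothing about how JAX lays out memory internally.

```python
import math

def f(x, y):
    return math.cos(x**2 + y**2) / (1 + x**2 + y**2)

def compute_max_rowwise(grid):
    # Max along one row, computed lazily without storing the row
    row_max = lambda y: max(f(x, y) for x in grid)
    # Max over the row maxes
    return max(row_max(y) for y in grid)

grid = [-3 + 6 * k / 50 for k in range(51)]
z_max_rowwise = compute_max_rowwise(grid)

# Reference value from the full two-dimensional table
z_table = [[f(x, y) for x in grid] for y in grid]
z_max_table = max(max(row) for row in z_table)
print(z_max_rowwise == z_max_table)
```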
415 |
416 | Let's try it.
417 |
418 | ```{code-cell} ipython3
419 | with qe.Timer(precision=8):
420 |     z_max = compute_max_vmap_v2(grid).block_until_ready()
421 |
422 | print(f"JAX vmap v2 result: {z_max:.6f}")
423 | ```
424 |
425 | Let's run it again to eliminate compilation time:
426 |
427 | ```{code-cell} ipython3
428 | with qe.Timer(precision=8):
429 |     z_max = compute_max_vmap_v2(grid).block_until_ready()
430 | ```
431 |
432 | If you are running this on a GPU, as we are, you should see another nontrivial speed gain.
433 |
434 |
435 | ### Summary
436 |
437 | In our view, JAX is the winner for vectorized operations.
438 |
439 | It dominates NumPy both in terms of speed (via JIT-compilation and
440 | parallelization) and memory efficiency (via vmap).
441 |
442 | Moreover, the `vmap` approach can sometimes lead to significantly clearer code.
443 |
444 | While Numba is impressive, the beauty of JAX is that, with fully vectorized
445 | operations, we can run exactly the same code on machines with hardware
446 | accelerators and reap all the benefits without extra effort.
447 |
448 | Moreover, JAX already knows how to effectively parallelize many common array
449 | operations, which is key to fast execution.
450 |
451 | For most cases encountered in economics, econometrics, and finance, it is
452 | far better to hand over to the JAX compiler for efficient parallelization than to
453 | try to hand code these routines ourselves.
454 |
455 |
456 | ## Sequential operations
457 |
458 | Some operations are inherently sequential -- and hence difficult or impossible
459 | to vectorize.
460 |
461 | In this case NumPy is a poor option and we are left with the choice of Numba or
462 | JAX.
463 |
464 | To compare these choices, we will revisit the problem of iterating on the
465 | quadratic map that we saw in our {doc}`Numba lecture `.
466 |
467 |
468 | ### Numba Version
469 |
470 | Here's the Numba version.
471 |
472 | ```{code-cell} ipython3
473 | @numba.jit
474 | def qm(x0, n, α=4.0):
475 |     x = np.empty(n+1)
476 |     x[0] = x0
477 |     for t in range(n):
478 |         x[t+1] = α * x[t] * (1 - x[t])
479 |     return x
480 | ```
481 |
482 | Let's generate a time series of length 10,000,000 and time the execution:
483 |
484 | ```{code-cell} ipython3
485 | n = 10_000_000
486 |
487 | with qe.Timer(precision=8):
488 |     x = qm(0.1, n)
489 | ```
490 |
491 | Let's run it again to eliminate compilation time:
492 |
493 | ```{code-cell} ipython3
494 | with qe.Timer(precision=8):
495 |     x = qm(0.1, n)
496 | ```
497 |
498 | Numba handles this sequential operation very efficiently.
499 |
500 | Notice that the second run is significantly faster after JIT compilation completes.
501 |
502 | Numba's compilation is typically quite fast, and the resulting code performance is excellent for sequential operations like this one.
503 |
504 | ### JAX Version
505 |
506 | Now let's create a JAX version using `lax.scan`:
507 |
508 | (We'll hold `n` static because it affects array size and hence JAX wants to specialize on its value in the compiled code.)
509 |
510 | ```{code-cell} ipython3
511 | from jax import lax
512 | from functools import partial
513 |
514 | cpu = jax.devices("cpu")[0]
515 |
516 | @partial(jax.jit, static_argnums=(1,), device=cpu)
517 | def qm_jax(x0, n, α=4.0):
518 |     def update(x, t):
519 |         x_new = α * x * (1 - x)
520 |         return x_new, x_new
521 |
522 |     _, x = lax.scan(update, x0, jnp.arange(n))
523 |     return jnp.concatenate([jnp.array([x0]), x])
524 | ```
525 |
526 | This code is not easy to read but, in essence, `lax.scan` repeatedly calls `update` and accumulates the returns `x_new` into an array.
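
For intuition, the behavior of `lax.scan` in this example can be mimicked by an ordinary loop. The sketch below is a plain-Python stand-in, not the JAX implementation: `scan_sketch` and `qm_sketch` are hypothetical names, and the real `lax.scan` compiles the loop and returns arrays rather than lists.

```python
def scan_sketch(update, init, xs):
    """Plain-Python stand-in for the carry-and-collect pattern of lax.scan."""
    carry, outputs = init, []
    for x in xs:
        carry, out = update(carry, x)   # thread the carry through each step
        outputs.append(out)             # collect the per-step outputs
    return carry, outputs

def qm_sketch(x0, n, α=4.0):
    def update(x, t):
        x_new = α * x * (1 - x)
        return x_new, x_new             # (next carry, value to record)

    _, path = scan_sketch(update, x0, range(n))
    return [x0] + path                  # prepend the initial condition

path = qm_sketch(0.1, 3)
print(path)
```

The first element of the pair returned by `update` is passed to the next step, while the second is stored, which is why `update` returns `x_new` twice.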
527 |
528 | ```{note}
529 | Sharp readers will notice that we specify `device=cpu` in the `jax.jit` decorator.
530 |
531 | The computation consists of many small sequential operations, leaving little
532 | opportunity for the GPU to exploit parallelism.
533 |
534 | As a result, kernel-launch overhead tends to dominate on the GPU, making the CPU
535 | a better fit for this workload.
536 |
537 | Curious readers can try removing this option to see how performance changes.
538 | ```
539 |
540 | Let's time it with the same parameters:
541 |
542 | ```{code-cell} ipython3
543 | with qe.Timer(precision=8):
544 |     x_jax = qm_jax(0.1, n).block_until_ready()
545 | ```
546 |
547 | Let's run it again to eliminate compilation overhead:
548 |
549 | ```{code-cell} ipython3
550 | with qe.Timer(precision=8):
551 |     x_jax = qm_jax(0.1, n).block_until_ready()
552 | ```
553 |
554 | JAX is also quite efficient for this sequential operation.
555 |
556 | Both JAX and Numba deliver strong performance after compilation, with Numba
557 | typically (but not always) offering slightly better speeds on purely sequential
558 | operations.
559 |
560 |
561 | ### Summary
562 |
563 | While both Numba and JAX deliver strong performance for sequential operations,
564 | *there are significant differences in code readability and ease of use*.
565 |
566 | The Numba version is straightforward and natural to read: we simply allocate an
567 | array and fill it element by element using a standard Python loop.
568 |
569 | This is exactly how most programmers think about the algorithm.
570 |
571 | The JAX version, on the other hand, requires using `lax.scan`, which is significantly less intuitive.
572 |
573 | Additionally, JAX's immutable arrays mean we cannot simply update array elements in place, making it hard to directly replicate the algorithm used by Numba.
574 |
575 | For this type of sequential operation, Numba is the clear winner in terms of
576 | code clarity and ease of implementation, as well as high performance.
577 |
578 |
--------------------------------------------------------------------------------
/lectures/about_py.md:
--------------------------------------------------------------------------------
1 | ---
2 | jupytext:
3 | text_representation:
4 | extension: .md
5 | format_name: myst
6 | kernelspec:
7 | display_name: Python 3
8 | language: python
9 | name: python3
10 | ---
11 |
12 | (about_py)=
13 | ```{raw} jupyter
14 |
19 | ```
20 |
21 | ```{index} single: python
22 | ```
23 |
24 | # About These Lectures
25 |
26 | ```{epigraph}
27 | "Python has gotten sufficiently weapons grade that we don’t descend into R
28 | anymore. Sorry, R people. I used to be one of you but we no longer descend
29 | into R." -- Chris Wiggins
30 | ```
31 |
32 | ## Overview
33 |
34 | This lecture series will teach you to use Python for scientific computing, with
35 | a focus on economics and finance.
36 |
37 | The series is aimed at Python novices, although experienced users will also find
38 | useful content in later lectures.
39 |
40 | In this lecture we will
41 |
42 | * introduce Python,
43 | * showcase some of its abilities,
44 | * explain why Python is our favorite language for scientific computing, and
45 | * point you to the next steps.
46 |
47 | You do **not** need to understand everything you see in this lecture -- we will work through the details slowly later in the lecture series.
48 |
49 |
50 | ### Can't I Just Use LLMs?
51 |
52 | No!
53 |
54 | Of course it's tempting to think that in the age of AI we don't need to learn how to code.
55 |
56 | And yes, we like to be lazy too sometimes.
57 |
58 | In addition, we agree that AIs are outstanding productivity tools for coders.
59 |
60 | But AIs cannot reliably solve new problems that they haven't seen before.
61 |
62 | You will need to be the architect and the supervisor -- and for these tasks you need to
63 | be able to read, write, and understand computer code.
64 |
65 | Having said that, a good LLM is a useful companion for these lectures -- try copy-pasting some
66 | code from this series and asking for an explanation.
67 |
68 |
69 | ### Isn't MATLAB Better?
70 |
71 | No, no, and one hundred times no.
72 |
73 | Nirvana was great (and Soundgarden [was better](https://www.youtube.com/watch?v=3mbBbFH9fAg&list=RD3mbBbFH9fAg)) but
74 | it's time to move on from the '90s.
75 |
76 | For most modern problems, Python's scientific libraries are now far in advance of MATLAB's capabilities.
77 |
78 | This is particularly the case in fast-growing fields such as deep learning and reinforcement learning.
79 |
80 | Moreover, all major LLMs are more proficient at writing Python code than MATLAB
81 | code.
82 |
83 | We will discuss relative merits of Python's libraries throughout this lecture
84 | series, as well as in our later series on [JAX](https://jax.quantecon.org/intro.html).
85 |
86 |
87 |
88 | ## Introducing Python
89 |
90 | [Python](https://www.python.org) is a general-purpose programming language conceived in 1989 by [Guido van Rossum](https://en.wikipedia.org/wiki/Guido_van_Rossum).
91 |
92 | Python is free and [open source](https://en.wikipedia.org/wiki/Open_source), with development coordinated through the [Python Software Foundation](https://www.python.org/psf-landing/).
93 |
94 | This is important because it
95 |
96 | * saves us money,
97 | * means that Python is controlled by the community of users rather than a for-profit corporation, and
98 | * encourages reproducibility and [open science](https://en.wikipedia.org/wiki/Open_science).
99 |
100 |
101 | ### Common Uses
102 |
103 | {index}`Python ` is a general-purpose language used
104 | in almost all application domains, including
105 |
106 | * AI and computer science
107 | * other scientific computing
108 | * communication
109 | * web development
110 | * CGI and graphical user interfaces
111 | * game development
112 | * resource planning
113 | * multimedia
114 | * etc.
115 |
116 | It is used and supported extensively by large tech firms including
117 |
118 | * [Google](https://www.google.com/)
119 | * [OpenAI](https://openai.com/)
120 | * [Netflix](https://www.netflix.com/)
121 | * [Meta](https://opensource.fb.com/)
122 | * [Amazon](https://www.amazon.com/)
123 | * [Reddit](https://www.reddit.com/)
124 | * etc.
125 |
126 |
127 | ### Relative Popularity
128 |
129 | Python is one of the most -- if not the most -- [popular programming languages](https://www.tiobe.com/tiobe-index/).
130 |
131 | Python libraries like [pandas](https://pandas.pydata.org/) and [Polars](https://pola.rs/) are replacing familiar tools like Excel and VBA as an essential skill in the fields of finance and banking.
132 |
133 | Moreover, Python is extremely popular within the scientific community -- especially among those working on AI.
134 |
135 | For example, the following chart from Stack Overflow Trends shows how the
136 | popularity of a single Python deep learning library
137 | ([PyTorch](https://pytorch.org/)) has grown over the last few years.
138 |
139 |
140 | ```{figure} /_static/lecture_specific/about_py/pytorch_vs_matlab.png
141 | ```
142 | PyTorch is just one of several Python libraries for deep learning and AI.
143 |
144 |
145 |
146 | ### Features
147 |
148 | Python is a [high-level
149 | language](https://en.wikipedia.org/wiki/High-level_programming_language), which
150 | means it is relatively easy to read, write and debug.
151 |
152 | It has a relatively small core language that is easy to learn.
153 |
154 | This core is supported by many libraries, which can be studied as required.
155 |
156 | Python is flexible and pragmatic, supporting multiple programming styles (procedural, object-oriented, functional, etc.).
157 |
158 |
159 | ### Syntax and Design
160 |
161 | ```{index} single: Python; syntax and design
162 | ```
163 |
164 | One reason for Python's popularity is its simple and elegant design.
165 |
166 | To get a feeling for this, let's look at an example.
167 |
168 | The code below is written in [Java](https://en.wikipedia.org/wiki/Java_(programming_language)) rather than Python.
169 |
170 | You do **not** need to read and understand this code!
171 |
172 |
173 | ```{code-block} java
174 |
175 | import java.io.BufferedReader;
176 | import java.io.FileReader;
177 | import java.io.IOException;
178 |
179 | public class CSVReader {
180 |     public static void main(String[] args) {
181 |         String filePath = "data.csv";
182 |         String line;
183 |         String splitBy = ",";
184 |         int columnIndex = 1;
185 |         double sum = 0;
186 |         int count = 0;
187 |
188 |         try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
189 |             while ((line = br.readLine()) != null) {
190 |                 String[] values = line.split(splitBy);
191 |                 if (values.length > columnIndex) {
192 |                     try {
193 |                         double value = Double.parseDouble(
194 |                             values[columnIndex]
195 |                         );
196 |                         sum += value;
197 |                         count++;
198 |                     } catch (NumberFormatException e) {
199 |                         System.out.println(
200 |                             "Skipping non-numeric value: " +
201 |                             values[columnIndex]
202 |                         );
203 |                     }
204 |                 }
205 |             }
206 |         } catch (IOException e) {
207 |             e.printStackTrace();
208 |         }
209 |
210 |         if (count > 0) {
211 |             double average = sum / count;
212 |             System.out.println(
213 |                 "Average of the second column: " + average
214 |             );
215 |         } else {
216 |             System.out.println(
217 |                 "No valid numeric data found in the second column."
218 |             );
219 |         }
220 |     }
221 | }
222 |
223 | ```
224 |
225 | This Java code opens an imaginary file called `data.csv` and computes the mean
226 | of the values in the second column.
227 |
228 | Here's Python code that does the same thing.
229 |
230 | Even if you don't yet know Python, you can see that the code is far simpler and easier to read.
231 |
232 | ```{code-cell} python3
233 | :tags: [skip-execution]
234 |
235 | import csv
236 |
237 | total, count = 0, 0
238 | with open('data.csv', mode='r') as file:
239 |     reader = csv.reader(file)
240 |     for row in reader:
241 |         try:
242 |             total += float(row[1])
243 |             count += 1
244 |         except (ValueError, IndexError):
245 |             pass
246 | print(f"Average: {total / count if count else 'No valid data'}")
247 |
248 | ```
249 |
250 |
251 |
252 | ### The AI Connection
253 |
254 | AI is in the process of taking over many tasks currently performed by humans,
255 | just as other forms of machinery have done over the past few centuries.
256 |
257 | Moreover, Python is playing a huge role in the advance of AI and machine learning.
258 |
259 | This means that tech firms are pouring money into development of extremely
260 | powerful Python libraries.
261 |
262 | Even if you don't plan to work on AI and machine learning, you can benefit from
263 | learning to use some of these libraries for your own projects in economics,
264 | finance and other fields of science.
265 |
266 | These lectures will explain how.
267 |
268 |
269 | ## Scientific Programming with Python
270 |
271 | ```{index} single: scientific programming
272 | ```
273 |
274 | We have already discussed the importance of Python for AI, machine learning and data science.
275 |
276 | Python is also one of the dominant players in
277 |
278 | * astronomy
279 | * chemistry
280 | * computational biology
281 | * meteorology
282 | * natural language processing
283 | * etc.
284 |
285 | Use of Python is also rising in economics, finance, and adjacent fields like
286 | operations research -- which were previously dominated by MATLAB / Excel / Stata / C / Fortran.
287 |
288 | This section briefly showcases some examples of Python for general scientific programming.
289 |
290 |
291 | ### NumPy
292 |
293 | ```{index} single: scientific programming; numeric
294 | ```
295 |
296 | One of the most important parts of scientific computing is working with data.
297 |
298 | Data is often stored in matrices, vectors and arrays.
299 |
300 | We can create a simple array of numbers with pure Python as follows:
301 |
302 | ```{code-cell} python3
303 | a = [-3.14, 0, 3.14] # A Python list
304 | a
305 | ```
306 |
307 | This array is very small so it's fine to work with pure Python.
308 |
309 | But when we want to work with larger arrays in real programs we need more efficiency and more tools.
310 |
311 | For this we need to use libraries for working with arrays.
312 |
313 | For Python, the most important matrix and array processing library is
314 | [NumPy](https://numpy.org/).
315 |
316 | For example, let's build a NumPy array with 100 elements
317 |
318 | ```{code-cell} python3
319 | import numpy as np # Load the library
320 |
321 | a = np.linspace(-np.pi, np.pi, 100) # Create even grid from -π to π
322 | a
323 | ```
324 |
325 | Now let's transform this array by applying functions to it.
326 |
327 | ```{code-cell} python3
328 | b = np.cos(a) # Apply cosine to each element of a
329 | c = np.sin(a) # Apply sin to each element of a
330 | ```
331 |
332 | Now we can easily take the inner product of `b` and `c`.
333 |
334 | ```{code-cell} python3
335 | b @ c
336 | ```
337 |
338 | We can also do many other tasks, like
339 |
340 | * compute the mean and variance of arrays
341 | * build matrices and solve linear systems
342 | * generate random arrays for simulation, etc.
343 |
344 | We will discuss the details later in the lecture series, where we cover NumPy in depth.
345 |
346 |
347 | ### NumPy Alternatives
348 |
349 | While NumPy is still the king of array processing in Python, there are now
350 | important competitors.
351 |
352 | Libraries such as [JAX](https://github.com/jax-ml/jax), [PyTorch](https://pytorch.org/), and [CuPy](https://cupy.dev/) also have
353 | built-in array types and array operations that can be very fast and efficient.
354 |
355 | In fact these libraries are better at exploiting parallelization and fast hardware, as
356 | we'll explain later in this series.
357 |
358 | However, you should still learn NumPy first because
359 |
360 | * NumPy is simpler and provides a strong foundation, and
361 | * libraries like JAX directly extend NumPy functionality and hence are easier to
362 | learn when you already know NumPy.
363 |
364 | This lecture series will provide you with extensive background in NumPy.
365 |
366 | ### SciPy
367 |
368 | The [SciPy](https://scipy.org/) library is built on top of NumPy and provides additional functionality.
369 |
370 | (tuple_unpacking_example)=
371 | For example, let's calculate $\int_{-2}^2 \phi(z) dz$ where $\phi$ is the standard normal density.
372 |
373 | ```{code-cell} python3
374 | from scipy.stats import norm
375 | from scipy.integrate import quad
376 |
377 | ϕ = norm()
378 | value, error = quad(ϕ.pdf, -2, 2) # Integrate using Gaussian quadrature
379 | value
380 | ```
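
As a cross-check on the numerical result, this particular integral has a closed form. Writing $\Phi$ for the standard normal CDF and using $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z / \sqrt{2})\right)$, we get $\int_{-2}^{2} \phi(z) dz = \Phi(2) - \Phi(-2) = \operatorname{erf}(\sqrt{2})$. The sketch below evaluates this with the standard library only; the variable name `closed_form` is ours.

```python
import math

# Φ(2) - Φ(-2) for the standard normal equals erf(2 / sqrt(2)) = erf(sqrt(2))
closed_form = math.erf(math.sqrt(2))
print(f"{closed_form:.6f}")    # 0.954500
```

This is the familiar statement that roughly 95.45% of the mass of a normal distribution lies within two standard deviations of the mean.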
381 |
382 | SciPy includes many of the standard routines used in
383 |
384 | * [linear algebra](https://docs.scipy.org/doc/scipy/reference/linalg.html)
385 | * [integration](https://docs.scipy.org/doc/scipy/reference/integrate.html)
386 | * [interpolation](https://docs.scipy.org/doc/scipy/reference/interpolate.html)
387 | * [optimization](https://docs.scipy.org/doc/scipy/reference/optimize.html)
388 | * [distributions and statistical techniques](https://docs.scipy.org/doc/scipy/reference/stats.html)
389 | * [signal processing](https://docs.scipy.org/doc/scipy/reference/signal.html)
390 |
391 | See them all [here](https://docs.scipy.org/doc/scipy/reference/index.html).
392 |
393 | Later we'll discuss SciPy in more detail.
394 |
395 |
396 | ### Graphics
397 |
398 | ```{index} single: Matplotlib
399 | ```
400 |
401 | A major strength of Python is data visualization.
402 |
403 | The most popular and comprehensive Python library for creating figures and graphs is [Matplotlib](https://matplotlib.org/), with functionality including
404 |
405 | * plots, histograms, contour images, 3D graphs, bar charts etc.
406 | * output in many formats (PDF, PNG, EPS, etc.)
407 | * LaTeX integration
408 |
409 | Example 2D plot with embedded LaTeX annotations
410 |
411 | ```{figure} /_static/lecture_specific/about_py/qs.png
412 | :scale: 75
413 | ```
414 |
415 | Example contour plot
416 |
417 | ```{figure} /_static/lecture_specific/about_py/bn_density1.png
418 | :scale: 70
419 | ```
420 |
421 | Example 3D plot
422 |
423 | ```{figure} /_static/lecture_specific/about_py/career_vf.png
424 | ```
425 |
426 | More examples can be found in the [Matplotlib thumbnail gallery](https://matplotlib.org/stable/gallery/index.html).
427 |
428 | Other graphics libraries include
429 |
430 | * [Plotly](https://plotly.com/python/)
431 | * [seaborn](https://seaborn.pydata.org/) --- a high-level interface for matplotlib
432 | * [Altair](https://altair-viz.github.io/)
433 | * [Bokeh](https://docs.bokeh.org/en/latest/)
434 |
435 | You can visit the [Python Graph Gallery](https://python-graph-gallery.com/) for more example plots drawn using a variety of libraries.
436 |
437 |
438 | ### Networks and Graphs
439 |
440 | The study of [networks](https://networks.quantecon.org/) is becoming an important part of scientific work
441 | in economics, finance and other fields.
442 |
443 | For example, we are interested in studying
444 |
445 | * production networks
446 | * networks of banks and financial institutions
447 | * friendship and social networks
448 | * etc.
449 |
450 | Python has many libraries for studying networks and graphs.
451 |
452 | ```{index} single: NetworkX
453 | ```
454 |
455 | One well-known example is [NetworkX](https://networkx.org/).
456 |
457 | Its features include, among many other things:
458 |
459 | * standard graph algorithms for analyzing networks
460 | * plotting routines
461 |
462 | Here's some example code that generates and plots a random graph, with node color determined by the shortest path length from a central node.
463 |
464 | ```{code-cell} ipython
465 | import networkx as nx
466 | import matplotlib.pyplot as plt
467 | np.random.seed(1234)
468 |
469 | # Generate a random graph
470 | p = dict((i, (np.random.uniform(0, 1), np.random.uniform(0, 1)))
471 |          for i in range(200))
472 | g = nx.random_geometric_graph(200, 0.12, pos=p)
473 | pos = nx.get_node_attributes(g, 'pos')
474 |
475 | # Find node nearest the center point (0.5, 0.5)
476 | dists = [(x - 0.5)**2 + (y - 0.5)**2 for x, y in list(pos.values())]
477 | ncenter = np.argmin(dists)
478 |
479 | # Plot graph, coloring by path length from central node
480 | p = nx.single_source_shortest_path_length(g, ncenter)
481 | plt.figure()
482 | nx.draw_networkx_edges(g, pos, alpha=0.4)
483 | nx.draw_networkx_nodes(g,
484 |                        pos,
485 |                        nodelist=list(p.keys()),
486 |                        node_size=120, alpha=0.5,
487 |                        node_color=list(p.values()),
488 |                        cmap=plt.cm.jet_r)
489 | plt.show()
490 | ```
491 |
492 |
493 | ### Other Scientific Libraries
494 |
495 | As discussed above, there are literally thousands of scientific libraries for
496 | Python.
497 |
498 | Some are small and do very specific tasks.
499 |
500 | Others are huge in terms of lines of code and investment from coders and tech
501 | firms.
502 |
503 | Here's a short list of some important scientific libraries for Python not
504 | mentioned above.
505 |
506 | * [SymPy](https://www.sympy.org/) for symbolic algebra, including limits, derivatives and integrals
507 | * [statsmodels](https://www.statsmodels.org/) for statistical routines
508 | * [scikit-learn](https://scikit-learn.org/) for machine learning
509 | * [Keras](https://keras.io/) for machine learning
510 | * [Pyro](https://pyro.ai/) and [PyStan](https://pystan.readthedocs.io/en/latest/) for Bayesian data analysis
511 | * [GeoPandas](https://geopandas.org/en/stable/) for spatial data analysis
512 | * [Dask](https://docs.dask.org/en/stable/) for parallelization
513 | * [Numba](https://numba.pydata.org/) for making Python run at the same speed as native machine code
514 | * [CVXPY](https://www.cvxpy.org/) for convex optimization
515 | * [scikit-image](https://scikit-image.org/) and [OpenCV](https://opencv.org/) for processing and analyzing image data
516 | * [BeautifulSoup](https://www.crummy.com/software/BeautifulSoup/bs4/doc/) for extracting data from HTML and XML files
517 |
518 |
519 | In this lecture series we will learn how to use many of these libraries for
520 | scientific computing tasks in economics and finance.
521 |
522 |
523 |
--------------------------------------------------------------------------------
/lectures/functions.md:
--------------------------------------------------------------------------------
1 | ---
2 | jupytext:
3 | text_representation:
4 | extension: .md
5 | format_name: myst
6 | kernelspec:
7 | display_name: Python 3
8 | language: python
9 | name: python3
10 | ---
11 |
12 | (functions)=
13 | ```{raw} jupyter
14 |
19 | ```
20 |
21 | # Functions
22 |
23 | ```{index} single: Python; User-defined functions
24 | ```
25 |
26 | ## Overview
27 |
28 | Functions are an extremely useful construct provided by almost all programming languages.
29 |
30 | We have already met several functions, such as
31 |
32 | * the `sqrt()` function from NumPy and
33 | * the built-in `print()` function
34 |
35 | In this lecture we'll
36 |
37 | 1. treat functions systematically and cover syntax and use-cases, and
38 | 2. learn to build our own user-defined functions.
39 |
40 | We will use the following imports.
41 |
42 | ```{code-cell} ipython
43 | import numpy as np
44 | import matplotlib.pyplot as plt
45 | ```
46 |
47 | ## Function Basics
48 |
49 | A function is a named section of a program that implements a specific task.
50 |
51 | Many functions exist already and we can use them as is.
52 |
53 | First we review these functions and then discuss how we can build our own.
54 |
55 | ### Built-In Functions
56 |
57 | Python has a number of **built-in** functions that are available without `import`.
58 |
59 | We have already met some
60 |
61 | ```{code-cell} python3
62 | max(19, 20)
63 | ```
64 |
65 | ```{code-cell} python3
66 | print('foobar')
67 | ```
68 |
69 | ```{code-cell} python3
70 | str(22)
71 | ```
72 |
73 | ```{code-cell} python3
74 | type(22)
75 | ```
76 |
77 | The full list of Python built-ins is [here](https://docs.python.org/3/library/functions.html).
78 |
79 |
80 | ### Third Party Functions
81 |
82 | If the built-in functions don't cover what we need, we either need to import
83 | functions or create our own.
84 |
85 | Examples of importing and using functions were given in the {doc}`previous lecture `.
86 |
87 | Here's another one, which tests whether a given year is a leap year:
88 |
89 | ```{code-cell} python3
90 | import calendar
91 | calendar.isleap(2024)
92 | ```
93 |
94 | ## Defining Functions
95 |
96 | In many instances it's useful to be able to define our own functions.
97 |
98 | Let's start by discussing how it's done.
99 |
100 | ### Basic Syntax
101 |
102 | Here's a very simple Python function that implements the mathematical function $f(x) = 2 x + 1$:
103 |
104 | ```{code-cell} python3
105 | def f(x):
106 |     return 2 * x + 1
107 | ```
108 |
109 | Now that we've defined this function, let's *call* it and check whether it does what we expect:
110 |
111 | ```{code-cell} python3
112 | f(1)
113 | ```
114 |
115 | ```{code-cell} python3
116 | f(10)
117 | ```
118 |
119 | Here's a longer function that computes the absolute value of a given number.
120 |
121 | (Such a function already exists as a built-in, but let's write our own for the
122 | exercise.)
123 |
124 | ```{code-cell} python3
125 | def new_abs_function(x):
126 |     if x < 0:
127 |         abs_value = -x
128 |     else:
129 |         abs_value = x
130 |     return abs_value
131 | ```
132 |
133 | Let's review the syntax here.
134 |
135 | * `def` is a Python keyword used to start function definitions.
136 | * `def new_abs_function(x):` indicates that the function is called `new_abs_function` and that it has a single argument `x`.
137 | * The indented code is a code block called the *function body*.
138 | * The `return` keyword indicates that `abs_value` is the object that should be returned to the calling code.
139 |
140 | This whole function definition is read by the Python interpreter and stored in memory.
141 |
142 | Let's call it to check that it works:
143 |
144 | ```{code-cell} python3
145 | print(new_abs_function(3))
146 | print(new_abs_function(-3))
147 | ```
148 |
149 |
150 | Note that a function can have arbitrarily many `return` statements (including zero).
151 |
152 | Execution of the function terminates when the first return is hit, allowing
153 | code like the following example
154 |
155 | ```{code-cell} python3
156 | def f(x):
157 |     if x < 0:
158 |         return 'negative'
159 |     return 'nonnegative'
160 | ```
161 |
162 | (Writing functions with multiple return statements is typically discouraged, as
163 | it can make logic hard to follow.)
164 |
165 | Functions without a return statement automatically return the special Python object `None`.
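
A short illustration of this last point, using a hypothetical function `greet` that prints a message but has no `return` statement:

```python
def greet(name):
    print(f"Hello, {name}")    # side effect only, no return statement

result = greet("world")        # prints the greeting
print(result is None)          # True
```

Binding the result of such a call to a name is legal, but the name simply refers to `None`.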
166 |
167 | (pos_args)=
168 | ### Keyword Arguments
169 |
170 | ```{index} single: Python; keyword arguments
171 | ```
172 |
173 | In a {ref}`previous lecture `, you came across the statement
174 |
175 | ```{code-block} python3
176 | :class: no-execute
177 |
178 | plt.plot(x, 'b-', label="white noise")
179 | ```
180 |
181 | In this call to Matplotlib's `plot` function, notice that the last argument is passed in `name=argument` syntax.
182 |
183 | This is called a *keyword argument*, with `label` being the keyword.
184 |
185 | Non-keyword arguments are called *positional arguments*, since their meaning
186 | is determined by order
187 |
188 | * `plot(x, 'b-')` differs from `plot('b-', x)`
189 |
190 | Keyword arguments are particularly useful when a function has a lot of arguments, in which case it's hard to remember the right order.
191 |
192 | You can adopt keyword arguments in user-defined functions with no difficulty.
193 |
194 | The next example illustrates the syntax
195 |
196 | ```{code-cell} python3
197 | def f(x, a=1, b=1):
198 |     return a + b * x
199 | ```
200 |
201 | The keyword argument values we supplied in the definition of `f` become the default values
202 |
203 | ```{code-cell} python3
204 | f(2)
205 | ```
206 |
207 | They can be modified as follows
208 |
209 | ```{code-cell} python3
210 | f(2, a=4, b=5)
211 | ```
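
Two further points about the call syntax may be worth sketching: keyword arguments can be supplied in any order, and any that are omitted keep their default values. The function `f` is repeated here only so that the block runs on its own.

```python
def f(x, a=1, b=1):
    return a + b * x

# Keyword arguments can appear in any order
print(f(2, b=5, a=4))   # same call as f(2, a=4, b=5)

# Omitted keywords fall back to their defaults
print(f(2, b=5))        # a stays at 1

# Positional arguments must come before keyword arguments;
# writing f(a=4, 2) would be a SyntaxError
```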
212 |
213 | ### The Flexibility of Python Functions
214 |
215 | As we discussed in the {ref}`previous lecture `, Python functions are very flexible.
216 |
217 | In particular
218 |
219 | * Any number of functions can be defined in a given file.
220 | * Functions can be (and often are) defined inside other functions.
221 | * Any object can be passed to a function as an argument, including other functions.
222 | * A function can return any kind of object, including functions.
223 |
224 | We will give examples of how straightforward it is to pass a function to
225 | a function in the following sections.
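
The second and fourth bullets above (defining a function inside another function, and returning a function) can be illustrated with a short sketch. The names `make_linear` and `g` are hypothetical and chosen only for this example.

```python
def make_linear(a, b):
    # Define a function inside another function...
    def g(x):
        return a + b * x
    # ...and return it like any other object
    return g

g = make_linear(1, 2)   # g is now the function x -> 1 + 2x
print(g(3))
```

Each call to `make_linear` builds a new function that remembers the values of `a` and `b` it was created with.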
226 |
227 | ### One-Line Functions: `lambda`
228 |
229 | ```{index} single: Python; lambda functions
230 | ```
231 |
232 | The `lambda` keyword is used to create simple functions on one line.
233 |
234 | For example, the definitions
235 |
236 | ```{code-cell} python3
237 | def f(x):
238 |     return x**3
239 | ```
240 |
241 | and
242 |
243 | ```{code-cell} python3
244 | f = lambda x: x**3
245 | ```
246 |
247 | are entirely equivalent.
248 |
249 | To see why `lambda` is useful, suppose that we want to calculate $\int_0^2 x^3 dx$ (and have forgotten our high-school calculus).
250 |
251 | The SciPy library has a function called `quad` that will do this calculation for us.
252 |
253 | The syntax of the `quad` function is `quad(f, a, b)` where `f` is a function and `a` and `b` are numbers.
254 |
255 | To create the function $f(x) = x^3$ we can use `lambda` as follows
256 |
257 | ```{code-cell} python3
258 | from scipy.integrate import quad
259 |
260 | quad(lambda x: x**3, 0, 2)
261 | ```
262 |
263 | Here the function created by `lambda` is said to be *anonymous* because it was never given a name.
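
To make the role of the function argument more concrete, here is a rough stand-in for `quad` written in plain Python. The helper `midpoint_integral` and its parameter `n` are hypothetical names introduced for illustration. It uses a simple midpoint rule, which is not the algorithm SciPy uses.

```python
def midpoint_integral(f, a, b, n=1_000):
    # Approximate the integral of f over [a, b] with n midpoint rectangles
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        total += f(a + (i + 0.5) * h)
    return total * h

# The integrand is passed as an anonymous function, just as with quad
approx = midpoint_integral(lambda x: x**3, 0, 2)
print(approx)   # close to the exact value 4
```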
264 |
265 |
266 | ### Why Write Functions?
267 |
268 | User-defined functions are important for improving the clarity of your code by
269 |
270 | * separating different strands of logic
271 | * facilitating code reuse
272 |
273 | (Writing the same thing twice is [almost always a bad idea](https://en.wikipedia.org/wiki/Don%27t_repeat_yourself).)
274 |
275 | We will say more about this {doc}`later `.
276 |
277 | ## Applications
278 |
279 | ### Random Draws
280 |
281 | Consider again this code from the {doc}`previous lecture `
282 |
283 | ```{code-cell} python3
284 | ts_length = 100
285 | ϵ_values = [] # empty list
286 |
287 | for i in range(ts_length):
288 |     e = np.random.randn()
289 |     ϵ_values.append(e)
290 |
291 | plt.plot(ϵ_values)
292 | plt.show()
293 | ```
294 |
295 | We will break this program into two parts:
296 |
297 | 1. A user-defined function that generates a list of random variables.
298 | 1. The main part of the program that
299 |     1. calls this function to get data
300 |     1. plots the data
301 |
302 | This is accomplished in the next program
303 |
304 | (funcloopprog)=
305 | ```{code-cell} python3
306 | def generate_data(n):
307 |     ϵ_values = []
308 |     for i in range(n):
309 |         e = np.random.randn()
310 |         ϵ_values.append(e)
311 |     return ϵ_values
312 |
313 | data = generate_data(100)
314 | plt.plot(data)
315 | plt.show()
316 | ```
317 |
318 | When the interpreter gets to the expression `generate_data(100)`, it executes the function body with `n` set equal to 100.
319 |
320 | The net result is that the name `data` is *bound* to the list `ϵ_values` returned by the function.
321 |
322 | ### Adding Conditions
323 |
324 | ```{index} single: Python; Conditions
325 | ```
326 |
327 | Our function `generate_data()` is rather limited.
328 |
329 | Let's make it slightly more useful by giving it the ability to return either standard normals or uniform random variables on $(0, 1)$ as required.
330 |
331 | This is achieved in the next piece of code.
332 |
333 | (funcloopprog2)=
334 | ```{code-cell} python3
335 | def generate_data(n, generator_type):
336 |     ϵ_values = []
337 |     for i in range(n):
338 |         if generator_type == 'U':
339 |             e = np.random.uniform(0, 1)
340 |         else:
341 |             e = np.random.randn()
342 |         ϵ_values.append(e)
343 |     return ϵ_values
344 |
345 | data = generate_data(100, 'U')
346 | plt.plot(data)
347 | plt.show()
348 | ```
349 |
350 | Hopefully, the syntax of the if/else clause is self-explanatory, with indentation again delimiting the extent of the code blocks.
351 |
352 | Notes
353 |
354 | * We are passing the argument `U` as a string, which is why we write it as `'U'`.
355 | * Notice that equality is tested with the `==` syntax, not `=`.
356 |     * For example, the statement `a = 10` assigns the name `a` to the value `10`.
357 |     * The expression `a == 10` evaluates to either `True` or `False`, depending on the value of `a`.
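
The difference between `=` and `==` noted above can be checked directly. This is a minimal sketch, and the name `is_ten` is an illustrative choice.

```python
a = 10           # assignment: binds the name a to the value 10
print(a == 10)   # comparison: evaluates to True
print(a == 3)    # comparison: evaluates to False

# Because a comparison is an expression, its result can itself be stored
is_ten = (a == 10)
print(is_ten)
```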
358 |
359 | Now, there are several ways that we can simplify the code above.
360 |
361 | For example, we can get rid of the conditionals altogether by just passing the desired generator type *as a function*.
362 |
363 | To understand this, consider the following version.
364 |
365 | (test_program_6)=
366 | ```{code-cell} python3
367 | def generate_data(n, generator_type):
368 |     ϵ_values = []
369 |     for i in range(n):
370 |         e = generator_type()
371 |         ϵ_values.append(e)
372 |     return ϵ_values
373 |
374 | data = generate_data(100, np.random.uniform)
375 | plt.plot(data)
376 | plt.show()
377 | ```
378 |
379 | Now, when we call the function `generate_data()`, we pass `np.random.uniform`
380 | as the second argument.
381 |
382 | This object is a *function*.
383 |
384 | When the function call `generate_data(100, np.random.uniform)` is executed, Python runs the function code block with `n` equal to 100 and the name `generator_type` "bound" to the function `np.random.uniform`.
385 |
386 | * While these lines are executed, the names `generator_type` and `np.random.uniform` are "synonyms", and can be used in identical ways.
387 |
388 | This principle works more generally---for example, consider the following piece of code
389 |
390 | ```{code-cell} python3
391 | max(7, 2, 4) # max() is a built-in Python function
392 | ```
393 |
394 | ```{code-cell} python3
395 | m = max
396 | m(7, 2, 4)
397 | ```
398 |
399 | Here we created another name for the built-in function `max()`, which could
400 | then be used in identical ways.
401 |
402 | In the context of our program, the ability to bind new names to functions
403 | means that there is no problem *passing a function as an argument to another
404 | function*---as we did above.
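
One practical consequence, sketched here under some assumptions: if the generator needs parameters of its own, a zero-argument `lambda` can wrap the call before it is passed in. To keep the block self-contained, it uses the standard library's `random` module in place of `np.random`, and the names `values` and `uniform_0_10` are illustrative.

```python
import random

def generate_data(n, generator_type):
    values = []
    for i in range(n):
        values.append(generator_type())   # call whatever function was passed in
    return values

# Wrap a call that needs arguments in a zero-argument function
uniform_0_10 = lambda: random.uniform(0, 10)

data = generate_data(5, uniform_0_10)
print(data)

# A function that needs no arguments can be passed directly
data2 = generate_data(5, random.random)
print(data2)
```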
405 |
406 |
407 | (recursive_functions)=
408 | ## Recursive Function Calls (Advanced)
409 |
410 | ```{index} single: Python; Recursion
411 | ```
412 |
413 | This is an advanced topic that you should feel free to skip.
414 |
415 | At the same time, it's a neat idea that you should learn at some stage of
416 | your programming career.
417 |
418 | Basically, a recursive function is a function that calls itself.
419 |
420 | For example, consider the problem of computing $x_t$ for some $t$ when
421 |
422 | ```{math}
423 | :label: xseqdoub
424 |
425 | x_{t+1} = 2 x_t, \quad x_0 = 1
426 | ```
427 |
428 | Obviously the answer is $2^t$.
429 |
430 | We can compute this easily enough with a loop
431 |
432 | ```{code-cell} python3
433 | def x_loop(t):
434 |     x = 1
435 |     for i in range(t):
436 |         x = 2 * x
437 |     return x
438 | ```
439 |
440 | We can also use a recursive solution, as follows
441 |
442 | ```{code-cell} python3
443 | def x(t):
444 |     if t == 0:
445 |         return 1
446 |     else:
447 |         return 2 * x(t-1)
448 | ```
449 |
450 | What happens here is that each successive call uses its own *frame* in the *stack*
451 |
452 | * a frame is where the local variables of a given function call are held
453 | * the stack is the memory used to process function calls
454 |     * it works as a First In Last Out (FILO) structure
455 |
456 | This example is somewhat contrived, since the first (iterative) solution would usually be preferred to the recursive solution.
457 |
458 | We'll meet less contrived applications of recursion later on.
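
To see the stack behaviour described above, the following sketch adds logging to the recursive function. The name `x_traced` and the `log` and `depth` parameters are additions for illustration only. Each call records when it starts and when it finishes.

```python
def x_traced(t, log, depth=0):
    indent = "  " * depth
    log.append(f"{indent}enter x({t})")             # a new frame is pushed
    if t == 0:
        result = 1
    else:
        result = 2 * x_traced(t - 1, log, depth + 1)
    log.append(f"{indent}exit x({t}) -> {result}")  # the frame is popped
    return result

log = []
value = x_traced(3, log)
print("\n".join(log))
print(value)
```

The first call to be entered, `x(3)`, is the last one to finish, which is the First In Last Out pattern.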
459 |
460 |
461 | (factorial_exercise)=
462 | ## Exercises
463 |
464 | ```{exercise-start}
465 | :label: func_ex1
466 | ```
467 |
468 | Recall that $n!$ is read as "$n$ factorial" and defined as
469 | $n! = n \times (n - 1) \times \cdots \times 2 \times 1$.
470 |
471 | We will only consider $n$ as a positive integer here.
472 |
473 | There are functions to compute this in various modules, but let's
474 | write our own version as an exercise.
475 |
476 | In particular, write a function `factorial` such that `factorial(n)` returns $n!$
477 | for any positive integer $n$.
478 |
479 | ```{exercise-end}
480 | ```
481 |
482 |
483 | ```{solution-start} func_ex1
484 | :class: dropdown
485 | ```
486 |
487 | Here's one solution:
488 |
489 | ```{code-cell} python3
490 | def factorial(n):
491 |     k = 1
492 |     for i in range(n):
493 |         k = k * (i + 1)
494 |     return k
495 |
496 | factorial(4)
497 | ```
498 |
499 |
500 | ```{solution-end}
501 | ```
502 |
503 |
504 | ```{exercise-start}
505 | :label: func_ex2
506 | ```
507 |
508 | The [binomial random variable](https://en.wikipedia.org/wiki/Binomial_distribution) $Y \sim Bin(n, p)$ represents the number of successes in $n$ binary trials, where each trial succeeds with probability $p$.
509 |
510 | Without any import besides `from numpy.random import uniform`, write a function
511 | `binomial_rv` such that `binomial_rv(n, p)` generates one draw of $Y$.
512 |
513 | ```{hint}
514 | :class: dropdown
515 |
516 | If $U$ is uniform on $(0, 1)$ and $p \in (0,1)$, then the expression `U < p` evaluates to `True` with probability $p$.
517 | ```
518 |
519 | ```{exercise-end}
520 | ```
521 |
522 |
523 | ```{solution-start} func_ex2
524 | :class: dropdown
525 | ```
526 |
527 | Here is one solution:
528 |
529 | ```{code-cell} python3
530 | from numpy.random import uniform
531 |
532 | def binomial_rv(n, p):
533 |     count = 0
534 |     for i in range(n):
535 |         U = uniform()
536 |         if U < p:
537 |             count = count + 1    # Or count += 1
538 |     return count
539 |
540 | binomial_rv(10, 0.5)
541 | ```
542 |
543 | ```{solution-end}
544 | ```
545 |
546 |
547 | ```{exercise-start}
548 | :label: func_ex3
549 | ```
550 |
551 | First, write a function that returns one realization of the following random device
552 |
553 | 1. Flip an unbiased coin 10 times.
554 | 1. If a head occurs `k` or more times consecutively within this sequence at least once, pay one dollar.
555 | 1. If not, pay nothing.
556 |
557 | Second, write another function that does the same task except that the second rule of the above random device becomes
558 |
559 | - If a head occurs `k` or more times within this sequence, pay one dollar.
560 |
561 | Use no import besides `from numpy.random import uniform`.
562 |
563 | ```{exercise-end}
564 | ```
565 |
566 | ```{solution-start} func_ex3
567 | :class: dropdown
568 | ```
569 |
570 | Here's a function for the first random device.
571 |
572 |
573 |
574 |
575 | ```{code-cell} python3
576 | from numpy.random import uniform
577 |
578 | def draw(k):  # pays if k consecutive successes in a sequence
579 |
580 |     payoff = 0
581 |     count = 0
582 |
583 |     for i in range(10):
584 |         U = uniform()
585 |         count = count + 1 if U < 0.5 else 0
586 |         print(count)    # print counts for clarity
587 |         if count == k:
588 |             payoff = 1
589 |
590 |     return payoff
591 |
592 | draw(3)
593 | ```
594 |
595 | Here's another function for the second random device.
596 |
597 | ```{code-cell} python3
598 | def draw_new(k):  # pays if k successes in a sequence
599 |
600 |     payoff = 0
601 |     count = 0
602 |
603 |     for i in range(10):
604 |         U = uniform()
605 |         count = count + (1 if U < 0.5 else 0)
606 |         print(count)
607 |         if count == k:
608 |             payoff = 1
609 |
610 |     return payoff
611 |
612 | draw_new(3)
613 | ```
614 |
615 | ```{solution-end}
616 | ```
617 |
618 |
619 | ## Advanced Exercises
620 |
621 | In the following exercises, we will write recursive functions together.
622 |
623 |
624 | ```{exercise-start}
625 | :label: func_ex4
626 | ```
627 |
628 | The Fibonacci numbers are defined by
629 |
630 | ```{math}
631 | :label: fib
632 |
633 | x_{t+1} = x_t + x_{t-1}, \quad x_0 = 0, \; x_1 = 1
634 | ```
635 |
636 | The first few numbers in the sequence are $0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55$.
637 |
638 | Write a function to recursively compute the $t$-th Fibonacci number for any $t$.
639 |
640 | ```{exercise-end}
641 | ```
642 |
643 | ```{solution-start} func_ex4
644 | :class: dropdown
645 | ```
646 |
647 | Here's the standard solution
648 |
649 | ```{code-cell} python3
650 | def x(t):
651 |     if t == 0:
652 |         return 0
653 |     if t == 1:
654 |         return 1
655 |     else:
656 |         return x(t-1) + x(t-2)
657 | ```
658 |
659 | Let's test it
660 |
661 | ```{code-cell} python3
662 | print([x(i) for i in range(10)])
663 | ```
664 |
665 | ```{solution-end}
666 | ```
667 |
668 | ```{exercise-start}
669 | :label: func_ex5
670 | ```
671 |
672 | Rewrite the function `factorial()` from [Exercise 1](factorial_exercise) using recursion.
673 |
674 | ```{exercise-end}
675 | ```
676 |
677 | ```{solution-start} func_ex5
678 | :class: dropdown
679 | ```
680 |
681 | Here's the standard solution
682 |
683 | ```{code-cell} python3
684 | def recursion_factorial(n):
685 |     if n == 1:
686 |         return n
687 |     else:
688 |         return n * recursion_factorial(n-1)
689 | ```
690 |
691 | Let's test it
692 |
693 | ```{code-cell} python3
694 | print([recursion_factorial(i) for i in range(1, 10)])
695 | ```
696 |
697 | ```{solution-end}
698 | ```
699 |
--------------------------------------------------------------------------------