├── .gitignore ├── LICENSE ├── README.md ├── data ├── test.csv └── train.csv ├── environment.yml ├── img ├── decision-tree-titanic.png ├── fitting.png ├── george.jpg ├── gradient-descent.png ├── mlp.png ├── must-read-books.png ├── perceptron.jpg ├── rasbt-backprop.png └── tikz16.png └── notebooks ├── 1-Instructor-deep-learning-from-scratch-pytorch.ipynb ├── 1-Student-deep-learning-from-scratch-pytorch.ipynb ├── 2-Instructor-deep-learning-from-scratch-pytorch.ipynb ├── 2-Student-deep-learning-from-scratch-pytorch.ipynb ├── 3-Instructor-deep-learning-from-scratch-pytorch.ipynb └── 3-Student-deep-learning-from-scratch-pytorch.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | 6 | # C extensions 7 | *.so 8 | 9 | # Distribution / packaging 10 | .Python 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | wheels/ 23 | pip-wheel-metadata/ 24 | share/python-wheels/ 25 | *.egg-info/ 26 | .installed.cfg 27 | *.egg 28 | MANIFEST 29 | 30 | # PyInstaller 31 | # Usually these files are written by a python script from a template 32 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 33 | *.manifest 34 | *.spec 35 | 36 | # Installer logs 37 | pip-log.txt 38 | pip-delete-this-directory.txt 39 | 40 | # Unit test / coverage reports 41 | htmlcov/ 42 | .tox/ 43 | .nox/ 44 | .coverage 45 | .coverage.* 46 | .cache 47 | nosetests.xml 48 | coverage.xml 49 | *.cover 50 | *.py,cover 51 | .hypothesis/ 52 | .pytest_cache/ 53 | 54 | # Translations 55 | *.mo 56 | *.pot 57 | 58 | # Django stuff: 59 | *.log 60 | local_settings.py 61 | db.sqlite3 62 | db.sqlite3-journal 63 | 64 | # Flask stuff: 65 | instance/ 66 | .webassets-cache 67 | 68 | # Scrapy stuff: 69 | .scrapy 70 | 71 | # Sphinx documentation 72 | docs/_build/ 73 | 74 | # PyBuilder 75 | target/ 76 | 77 | # Jupyter Notebook 78 | .ipynb_checkpoints 79 | 80 | # IPython 81 | profile_default/ 82 | ipython_config.py 83 | 84 | # pyenv 85 | .python-version 86 | 87 | # pipenv 88 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 89 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 90 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 91 | # install all needed dependencies. 92 | #Pipfile.lock 93 | 94 | # PEP 582; used by e.g. 
github.com/David-OConnor/pyflow 95 | __pypackages__/ 96 | 97 | # Celery stuff 98 | celerybeat-schedule 99 | celerybeat.pid 100 | 101 | # SageMath parsed files 102 | *.sage.py 103 | 104 | # Environments 105 | .env 106 | .venv 107 | env/ 108 | venv/ 109 | ENV/ 110 | env.bak/ 111 | venv.bak/ 112 | 113 | # Spyder project settings 114 | .spyderproject 115 | .spyproject 116 | 117 | # Rope project settings 118 | .ropeproject 119 | 120 | # mkdocs documentation 121 | /site 122 | 123 | # mypy 124 | .mypy_cache/ 125 | .dmypy.json 126 | dmypy.json 127 | 128 | # Pyre type checker 129 | .pyre/ 130 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2020 Hugo Bowne-Anderson 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | **After the tutorial, please leave feedback for us [here](https://hugobowne.typeform.com/to/NYClbcaE)! We'll use this information to help improve the content and delivery of the material.** 2 | 3 | # deep-learning-from-scratch-pytorch 4 | Deep Learning from Scratch with PyTorch 5 | 6 | 7 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hugobowne/deep-learning-from-scratch-pytorch/f61063c3ec3aca124fd90b6af604e8e4c7313604?urlpath=lab) 8 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hugobowne/deep-learning-from-scratch-pytorch/blob/master/notebooks/1-Student-deep-learning-from-scratch-pytorch.ipynb) 9 | 10 | # description 11 | 12 | This tutorial introduces deep learning (also called neural networks) to intermediate-level Pythonistas. The goal is for participants to develop a sound conceptual foundation for deep learning and to obtain some hands-on experience using an industry-ready toolkit. They will do this in two parts: (1) implementing a neural network classifier from scratch (following a quick review of NumPy array-based computing & supervised learning with Scikit-Learn); and (2) a tour of the PyTorch library building more sophisticated, industry-grade neural networks of varying depth & complexity. 
13 | 14 | 15 | 16 | ## Prerequisites 17 | 18 | Participants will be expected to be comfortable using Python with some prior exposure to NumPy & Scikit-Learn. It would help if you knew 19 | 20 | * programming fundamentals and the basics of the Python programming language (e.g., variables, for loops); 21 | * a bit about `pandas` and DataFrames; 22 | * a bit about Jupyter Notebooks; 23 | * your way around the terminal/shell. 24 | 25 | 26 | **However, we have always found that the most important and beneficial prerequisite is a will to learn new things, so if you have this quality, you'll definitely get something out of this tutorial.** 27 | 28 | Also, if you'd like to watch and **not** code along, you'll still have a great time, and these notebooks will be downloadable afterwards. 29 | 30 | If you are going to code along and use the [Anaconda distribution](https://www.anaconda.com/download/) of Python 3 (see below), we ask that you install it before the session. 31 | 32 | 33 | ## Getting set up computationally 34 | 35 | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/hugobowne/deep-learning-from-scratch-pytorch/master?urlpath=lab) 36 | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hugobowne/deep-learning-from-scratch-pytorch/blob/master/notebooks/1-Instructor-deep-learning-from-scratch-pytorch.ipynb) 37 | 38 | The first option is to click on either the [Binder](https://mybinder.readthedocs.io/en/latest/) or [Colab](https://research.google.com/colaboratory/faq.html) badge above. These will spin up the necessary computational environment for you so you can write and execute Python code from the comfort of your browser. They are free services, so the resources are not guaranteed, though they usually work well. If you want as close to a guarantee as possible, follow the instructions below to set up your computational environment locally (that is, on your own computer). 39 | 40 | 41 | 42 | ### 1. Clone the repository 43 | 44 | To get set up for this live coding session, clone this repository. You can do so by executing the following in your terminal: 45 | 46 | ``` 47 | git clone https://github.com/hugobowne/deep-learning-from-scratch-pytorch 48 | ``` 49 | 50 | Alternatively, you can download the zip file of the repository at the top of the main page of the repository. If you prefer not to use git or don't have experience with it, this is a good option. 51 | 52 | ### 2. Download Anaconda (if you haven't already) 53 | 54 | If you do not already have the [Anaconda distribution](https://www.anaconda.com/download/) of Python 3, go get it (n.b., you can also do this without Anaconda by using `pip` to install the required packages; however, Anaconda is great for data science and I encourage you to use it). 55 | 56 | ### 3. Create your conda environment for this session 57 | 58 | Navigate to the relevant directory `deep-learning-from-scratch-pytorch` and install the required packages in a new conda environment: 59 | 60 | ``` 61 | conda env create -f environment.yml 62 | ``` 63 | 64 | This will create a new environment called `deep-learning-from-scratch-pytorch`. To activate the environment on OSX/Linux, execute 65 | 66 | ``` 67 | source activate deep-learning-from-scratch-pytorch 68 | ``` 69 | On Windows, execute 70 | 71 | ``` 72 | activate deep-learning-from-scratch-pytorch 73 | ``` 74 | 75 | 76 | ### 4. Open your Jupyter notebook 77 | 78 | In the terminal, execute `jupyter notebook`. 
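If you'd like to sanity-check the environment at any point (an optional step, not part of the original setup instructions), the following one-liner should print the pinned package versions without raising an import error:

```
python -c "import torch, sklearn, pandas; print(torch.__version__, sklearn.__version__, pandas.__version__)"
```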
79 | 80 | Then open the notebook `1-deep-learning-from-scratch-pytorch.ipynb` and we're ready to get coding. Enjoy. 81 | 82 | 83 | ### Code 84 | The code in this repository is released under the [MIT license](LICENSE). Read more at the [Open Source Initiative](https://opensource.org/licenses/MIT). All text remains the Intellectual Property of DataCamp. If you wish to reuse, adapt or remix, get in touch with me at hugo at datacamp com to request permission. 85 | -------------------------------------------------------------------------------- /data/test.csv: -------------------------------------------------------------------------------- 1 | PassengerId,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked 2 | 892,3,"Kelly, Mr. James",male,34.5,0,0,330911,7.8292,,Q 3 | 893,3,"Wilkes, Mrs. James (Ellen Needs)",female,47,1,0,363272,7,,S 4 | 894,2,"Myles, Mr. Thomas Francis",male,62,0,0,240276,9.6875,,Q 5 | 895,3,"Wirz, Mr. Albert",male,27,0,0,315154,8.6625,,S 6 | 896,3,"Hirvonen, Mrs. Alexander (Helga E Lindqvist)",female,22,1,1,3101298,12.2875,,S 7 | 897,3,"Svensson, Mr. Johan Cervin",male,14,0,0,7538,9.225,,S 8 | 898,3,"Connolly, Miss. Kate",female,30,0,0,330972,7.6292,,Q 9 | 899,2,"Caldwell, Mr. Albert Francis",male,26,1,1,248738,29,,S 10 | 900,3,"Abrahim, Mrs. Joseph (Sophie Halaut Easu)",female,18,0,0,2657,7.2292,,C 11 | 901,3,"Davies, Mr. John Samuel",male,21,2,0,A/4 48871,24.15,,S 12 | 902,3,"Ilieff, Mr. Ylio",male,,0,0,349220,7.8958,,S 13 | 903,1,"Jones, Mr. Charles Cresson",male,46,0,0,694,26,,S 14 | 904,1,"Snyder, Mrs. John Pillsbury (Nelle Stevenson)",female,23,1,0,21228,82.2667,B45,S 15 | 905,2,"Howard, Mr. Benjamin",male,63,1,0,24065,26,,S 16 | 906,1,"Chaffee, Mrs. Herbert Fuller (Carrie Constance Toogood)",female,47,1,0,W.E.P. 5734,61.175,E31,S 17 | 907,2,"del Carlo, Mrs. Sebastiano (Argenia Genovesi)",female,24,1,0,SC/PARIS 2167,27.7208,,C 18 | 908,2,"Keane, Mr. Daniel",male,35,0,0,233734,12.35,,Q 19 | 909,3,"Assaf, Mr. Gerios",male,21,0,0,2692,7.225,,C 20 | 910,3,"Ilmakangas, Miss. Ida Livija",female,27,1,0,STON/O2. 3101270,7.925,,S 21 | 911,3,"Assaf Khalil, Mrs. Mariana (Miriam"")""",female,45,0,0,2696,7.225,,C 22 | 912,1,"Rothschild, Mr. Martin",male,55,1,0,PC 17603,59.4,,C 23 | 913,3,"Olsen, Master. Artur Karl",male,9,0,1,C 17368,3.1708,,S 24 | 914,1,"Flegenheim, Mrs. Alfred (Antoinette)",female,,0,0,PC 17598,31.6833,,S 25 | 915,1,"Williams, Mr. Richard Norris II",male,21,0,1,PC 17597,61.3792,,C 26 | 916,1,"Ryerson, Mrs. Arthur Larned (Emily Maria Borie)",female,48,1,3,PC 17608,262.375,B57 B59 B63 B66,C 27 | 917,3,"Robins, Mr. Alexander A",male,50,1,0,A/5. 3337,14.5,,S 28 | 918,1,"Ostby, Miss. Helene Ragnhild",female,22,0,1,113509,61.9792,B36,C 29 | 919,3,"Daher, Mr. Shedid",male,22.5,0,0,2698,7.225,,C 30 | 920,1,"Brady, Mr. John Bertram",male,41,0,0,113054,30.5,A21,S 31 | 921,3,"Samaan, Mr. Elias",male,,2,0,2662,21.6792,,C 32 | 922,2,"Louch, Mr. Charles Alexander",male,50,1,0,SC/AH 3085,26,,S 33 | 923,2,"Jefferys, Mr. Clifford Thomas",male,24,2,0,C.A. 31029,31.5,,S 34 | 924,3,"Dean, Mrs. Bertram (Eva Georgetta Light)",female,33,1,2,C.A. 2315,20.575,,S 35 | 925,3,"Johnston, Mrs. Andrew G (Elizabeth Lily"" Watson)""",female,,1,2,W./C. 6607,23.45,,S 36 | 926,1,"Mock, Mr. Philipp Edmund",male,30,1,0,13236,57.75,C78,C 37 | 927,3,"Katavelas, Mr. Vassilios (Catavelas Vassilios"")""",male,18.5,0,0,2682,7.2292,,C 38 | 928,3,"Roth, Miss. Sarah A",female,,0,0,342712,8.05,,S 39 | 929,3,"Cacic, Miss. Manda",female,21,0,0,315087,8.6625,,S 40 | 930,3,"Sap, Mr. 
Julius",male,25,0,0,345768,9.5,,S 41 | 931,3,"Hee, Mr. Ling",male,,0,0,1601,56.4958,,S 42 | 932,3,"Karun, Mr. Franz",male,39,0,1,349256,13.4167,,C 43 | 933,1,"Franklin, Mr. Thomas Parham",male,,0,0,113778,26.55,D34,S 44 | 934,3,"Goldsmith, Mr. Nathan",male,41,0,0,SOTON/O.Q. 3101263,7.85,,S 45 | 935,2,"Corbett, Mrs. Walter H (Irene Colvin)",female,30,0,0,237249,13,,S 46 | 936,1,"Kimball, Mrs. Edwin Nelson Jr (Gertrude Parsons)",female,45,1,0,11753,52.5542,D19,S 47 | 937,3,"Peltomaki, Mr. Nikolai Johannes",male,25,0,0,STON/O 2. 3101291,7.925,,S 48 | 938,1,"Chevre, Mr. Paul Romaine",male,45,0,0,PC 17594,29.7,A9,C 49 | 939,3,"Shaughnessy, Mr. Patrick",male,,0,0,370374,7.75,,Q 50 | 940,1,"Bucknell, Mrs. William Robert (Emma Eliza Ward)",female,60,0,0,11813,76.2917,D15,C 51 | 941,3,"Coutts, Mrs. William (Winnie Minnie"" Treanor)""",female,36,0,2,C.A. 37671,15.9,,S 52 | 942,1,"Smith, Mr. Lucien Philip",male,24,1,0,13695,60,C31,S 53 | 943,2,"Pulbaum, Mr. Franz",male,27,0,0,SC/PARIS 2168,15.0333,,C 54 | 944,2,"Hocking, Miss. Ellen Nellie""""",female,20,2,1,29105,23,,S 55 | 945,1,"Fortune, Miss. Ethel Flora",female,28,3,2,19950,263,C23 C25 C27,S 56 | 946,2,"Mangiavacchi, Mr. Serafino Emilio",male,,0,0,SC/A.3 2861,15.5792,,C 57 | 947,3,"Rice, Master. Albert",male,10,4,1,382652,29.125,,Q 58 | 948,3,"Cor, Mr. Bartol",male,35,0,0,349230,7.8958,,S 59 | 949,3,"Abelseth, Mr. Olaus Jorgensen",male,25,0,0,348122,7.65,F G63,S 60 | 950,3,"Davison, Mr. Thomas Henry",male,,1,0,386525,16.1,,S 61 | 951,1,"Chaudanson, Miss. Victorine",female,36,0,0,PC 17608,262.375,B61,C 62 | 952,3,"Dika, Mr. Mirko",male,17,0,0,349232,7.8958,,S 63 | 953,2,"McCrae, Mr. Arthur Gordon",male,32,0,0,237216,13.5,,S 64 | 954,3,"Bjorklund, Mr. Ernst Herbert",male,18,0,0,347090,7.75,,S 65 | 955,3,"Bradley, Miss. Bridget Delia",female,22,0,0,334914,7.725,,Q 66 | 956,1,"Ryerson, Master. John Borie",male,13,2,2,PC 17608,262.375,B57 B59 B63 B66,C 67 | 957,2,"Corey, Mrs. Percy C (Mary Phyllis Elizabeth Miller)",female,,0,0,F.C.C. 13534,21,,S 68 | 958,3,"Burns, Miss. Mary Delia",female,18,0,0,330963,7.8792,,Q 69 | 959,1,"Moore, Mr. Clarence Bloomfield",male,47,0,0,113796,42.4,,S 70 | 960,1,"Tucker, Mr. Gilbert Milligan Jr",male,31,0,0,2543,28.5375,C53,C 71 | 961,1,"Fortune, Mrs. Mark (Mary McDougald)",female,60,1,4,19950,263,C23 C25 C27,S 72 | 962,3,"Mulvihill, Miss. Bertha E",female,24,0,0,382653,7.75,,Q 73 | 963,3,"Minkoff, Mr. Lazar",male,21,0,0,349211,7.8958,,S 74 | 964,3,"Nieminen, Miss. Manta Josefina",female,29,0,0,3101297,7.925,,S 75 | 965,1,"Ovies y Rodriguez, Mr. Servando",male,28.5,0,0,PC 17562,27.7208,D43,C 76 | 966,1,"Geiger, Miss. Amalie",female,35,0,0,113503,211.5,C130,C 77 | 967,1,"Keeping, Mr. Edwin",male,32.5,0,0,113503,211.5,C132,C 78 | 968,3,"Miles, Mr. Frank",male,,0,0,359306,8.05,,S 79 | 969,1,"Cornell, Mrs. Robert Clifford (Malvina Helen Lamson)",female,55,2,0,11770,25.7,C101,S 80 | 970,2,"Aldworth, Mr. Charles Augustus",male,30,0,0,248744,13,,S 81 | 971,3,"Doyle, Miss. Elizabeth",female,24,0,0,368702,7.75,,Q 82 | 972,3,"Boulos, Master. Akar",male,6,1,1,2678,15.2458,,C 83 | 973,1,"Straus, Mr. Isidor",male,67,1,0,PC 17483,221.7792,C55 C57,S 84 | 974,1,"Case, Mr. Howard Brown",male,49,0,0,19924,26,,S 85 | 975,3,"Demetri, Mr. Marinko",male,,0,0,349238,7.8958,,S 86 | 976,2,"Lamb, Mr. John Joseph",male,,0,0,240261,10.7083,,Q 87 | 977,3,"Khalil, Mr. Betros",male,,1,0,2660,14.4542,,C 88 | 978,3,"Barry, Miss. Julia",female,27,0,0,330844,7.8792,,Q 89 | 979,3,"Badman, Miss. 
Emily Louisa",female,18,0,0,A/4 31416,8.05,,S 90 | 980,3,"O'Donoghue, Ms. Bridget",female,,0,0,364856,7.75,,Q 91 | 981,2,"Wells, Master. Ralph Lester",male,2,1,1,29103,23,,S 92 | 982,3,"Dyker, Mrs. Adolf Fredrik (Anna Elisabeth Judith Andersson)",female,22,1,0,347072,13.9,,S 93 | 983,3,"Pedersen, Mr. Olaf",male,,0,0,345498,7.775,,S 94 | 984,1,"Davidson, Mrs. Thornton (Orian Hays)",female,27,1,2,F.C. 12750,52,B71,S 95 | 985,3,"Guest, Mr. Robert",male,,0,0,376563,8.05,,S 96 | 986,1,"Birnbaum, Mr. Jakob",male,25,0,0,13905,26,,C 97 | 987,3,"Tenglin, Mr. Gunnar Isidor",male,25,0,0,350033,7.7958,,S 98 | 988,1,"Cavendish, Mrs. Tyrell William (Julia Florence Siegel)",female,76,1,0,19877,78.85,C46,S 99 | 989,3,"Makinen, Mr. Kalle Edvard",male,29,0,0,STON/O 2. 3101268,7.925,,S 100 | 990,3,"Braf, Miss. Elin Ester Maria",female,20,0,0,347471,7.8542,,S 101 | 991,3,"Nancarrow, Mr. William Henry",male,33,0,0,A./5. 3338,8.05,,S 102 | 992,1,"Stengel, Mrs. Charles Emil Henry (Annie May Morris)",female,43,1,0,11778,55.4417,C116,C 103 | 993,2,"Weisz, Mr. Leopold",male,27,1,0,228414,26,,S 104 | 994,3,"Foley, Mr. William",male,,0,0,365235,7.75,,Q 105 | 995,3,"Johansson Palmquist, Mr. Oskar Leander",male,26,0,0,347070,7.775,,S 106 | 996,3,"Thomas, Mrs. Alexander (Thamine Thelma"")""",female,16,1,1,2625,8.5167,,C 107 | 997,3,"Holthen, Mr. Johan Martin",male,28,0,0,C 4001,22.525,,S 108 | 998,3,"Buckley, Mr. Daniel",male,21,0,0,330920,7.8208,,Q 109 | 999,3,"Ryan, Mr. Edward",male,,0,0,383162,7.75,,Q 110 | 1000,3,"Willer, Mr. Aaron (Abi Weller"")""",male,,0,0,3410,8.7125,,S 111 | 1001,2,"Swane, Mr. George",male,18.5,0,0,248734,13,F,S 112 | 1002,2,"Stanton, Mr. Samuel Ward",male,41,0,0,237734,15.0458,,C 113 | 1003,3,"Shine, Miss. Ellen Natalia",female,,0,0,330968,7.7792,,Q 114 | 1004,1,"Evans, Miss. Edith Corse",female,36,0,0,PC 17531,31.6792,A29,C 115 | 1005,3,"Buckley, Miss. Katherine",female,18.5,0,0,329944,7.2833,,Q 116 | 1006,1,"Straus, Mrs. Isidor (Rosalie Ida Blun)",female,63,1,0,PC 17483,221.7792,C55 C57,S 117 | 1007,3,"Chronopoulos, Mr. Demetrios",male,18,1,0,2680,14.4542,,C 118 | 1008,3,"Thomas, Mr. John",male,,0,0,2681,6.4375,,C 119 | 1009,3,"Sandstrom, Miss. Beatrice Irene",female,1,1,1,PP 9549,16.7,G6,S 120 | 1010,1,"Beattie, Mr. Thomson",male,36,0,0,13050,75.2417,C6,C 121 | 1011,2,"Chapman, Mrs. John Henry (Sara Elizabeth Lawry)",female,29,1,0,SC/AH 29037,26,,S 122 | 1012,2,"Watt, Miss. Bertha J",female,12,0,0,C.A. 33595,15.75,,S 123 | 1013,3,"Kiernan, Mr. John",male,,1,0,367227,7.75,,Q 124 | 1014,1,"Schabert, Mrs. Paul (Emma Mock)",female,35,1,0,13236,57.75,C28,C 125 | 1015,3,"Carver, Mr. Alfred John",male,28,0,0,392095,7.25,,S 126 | 1016,3,"Kennedy, Mr. John",male,,0,0,368783,7.75,,Q 127 | 1017,3,"Cribb, Miss. Laura Alice",female,17,0,1,371362,16.1,,S 128 | 1018,3,"Brobeck, Mr. Karl Rudolf",male,22,0,0,350045,7.7958,,S 129 | 1019,3,"McCoy, Miss. Alicia",female,,2,0,367226,23.25,,Q 130 | 1020,2,"Bowenur, Mr. Solomon",male,42,0,0,211535,13,,S 131 | 1021,3,"Petersen, Mr. Marius",male,24,0,0,342441,8.05,,S 132 | 1022,3,"Spinner, Mr. Henry John",male,32,0,0,STON/OQ. 369943,8.05,,S 133 | 1023,1,"Gracie, Col. Archibald IV",male,53,0,0,113780,28.5,C51,C 134 | 1024,3,"Lefebre, Mrs. Frank (Frances)",female,,0,4,4133,25.4667,,S 135 | 1025,3,"Thomas, Mr. Charles P",male,,1,0,2621,6.4375,,C 136 | 1026,3,"Dintcheff, Mr. Valtcho",male,43,0,0,349226,7.8958,,S 137 | 1027,3,"Carlsson, Mr. Carl Robert",male,24,0,0,350409,7.8542,,S 138 | 1028,3,"Zakarian, Mr. 
Mapriededer",male,26.5,0,0,2656,7.225,,C 139 | 1029,2,"Schmidt, Mr. August",male,26,0,0,248659,13,,S 140 | 1030,3,"Drapkin, Miss. Jennie",female,23,0,0,SOTON/OQ 392083,8.05,,S 141 | 1031,3,"Goodwin, Mr. Charles Frederick",male,40,1,6,CA 2144,46.9,,S 142 | 1032,3,"Goodwin, Miss. Jessie Allis",female,10,5,2,CA 2144,46.9,,S 143 | 1033,1,"Daniels, Miss. Sarah",female,33,0,0,113781,151.55,,S 144 | 1034,1,"Ryerson, Mr. Arthur Larned",male,61,1,3,PC 17608,262.375,B57 B59 B63 B66,C 145 | 1035,2,"Beauchamp, Mr. Henry James",male,28,0,0,244358,26,,S 146 | 1036,1,"Lindeberg-Lind, Mr. Erik Gustaf (Mr Edward Lingrey"")""",male,42,0,0,17475,26.55,,S 147 | 1037,3,"Vander Planke, Mr. Julius",male,31,3,0,345763,18,,S 148 | 1038,1,"Hilliard, Mr. Herbert Henry",male,,0,0,17463,51.8625,E46,S 149 | 1039,3,"Davies, Mr. Evan",male,22,0,0,SC/A4 23568,8.05,,S 150 | 1040,1,"Crafton, Mr. John Bertram",male,,0,0,113791,26.55,,S 151 | 1041,2,"Lahtinen, Rev. William",male,30,1,1,250651,26,,S 152 | 1042,1,"Earnshaw, Mrs. Boulton (Olive Potter)",female,23,0,1,11767,83.1583,C54,C 153 | 1043,3,"Matinoff, Mr. Nicola",male,,0,0,349255,7.8958,,C 154 | 1044,3,"Storey, Mr. Thomas",male,60.5,0,0,3701,,,S 155 | 1045,3,"Klasen, Mrs. (Hulda Kristina Eugenia Lofqvist)",female,36,0,2,350405,12.1833,,S 156 | 1046,3,"Asplund, Master. Filip Oscar",male,13,4,2,347077,31.3875,,S 157 | 1047,3,"Duquemin, Mr. Joseph",male,24,0,0,S.O./P.P. 752,7.55,,S 158 | 1048,1,"Bird, Miss. Ellen",female,29,0,0,PC 17483,221.7792,C97,S 159 | 1049,3,"Lundin, Miss. Olga Elida",female,23,0,0,347469,7.8542,,S 160 | 1050,1,"Borebank, Mr. John James",male,42,0,0,110489,26.55,D22,S 161 | 1051,3,"Peacock, Mrs. Benjamin (Edith Nile)",female,26,0,2,SOTON/O.Q. 3101315,13.775,,S 162 | 1052,3,"Smyth, Miss. Julia",female,,0,0,335432,7.7333,,Q 163 | 1053,3,"Touma, Master. Georges Youssef",male,7,1,1,2650,15.2458,,C 164 | 1054,2,"Wright, Miss. Marion",female,26,0,0,220844,13.5,,S 165 | 1055,3,"Pearce, Mr. Ernest",male,,0,0,343271,7,,S 166 | 1056,2,"Peruschitz, Rev. Joseph Maria",male,41,0,0,237393,13,,S 167 | 1057,3,"Kink-Heilmann, Mrs. Anton (Luise Heilmann)",female,26,1,1,315153,22.025,,S 168 | 1058,1,"Brandeis, Mr. Emil",male,48,0,0,PC 17591,50.4958,B10,C 169 | 1059,3,"Ford, Mr. Edward Watson",male,18,2,2,W./C. 6608,34.375,,S 170 | 1060,1,"Cassebeer, Mrs. Henry Arthur Jr (Eleanor Genevieve Fosdick)",female,,0,0,17770,27.7208,,C 171 | 1061,3,"Hellstrom, Miss. Hilda Maria",female,22,0,0,7548,8.9625,,S 172 | 1062,3,"Lithman, Mr. Simon",male,,0,0,S.O./P.P. 251,7.55,,S 173 | 1063,3,"Zakarian, Mr. Ortin",male,27,0,0,2670,7.225,,C 174 | 1064,3,"Dyker, Mr. Adolf Fredrik",male,23,1,0,347072,13.9,,S 175 | 1065,3,"Torfa, Mr. Assad",male,,0,0,2673,7.2292,,C 176 | 1066,3,"Asplund, Mr. Carl Oscar Vilhelm Gustafsson",male,40,1,5,347077,31.3875,,S 177 | 1067,2,"Brown, Miss. Edith Eileen",female,15,0,2,29750,39,,S 178 | 1068,2,"Sincock, Miss. Maude",female,20,0,0,C.A. 33112,36.75,,S 179 | 1069,1,"Stengel, Mr. Charles Emil Henry",male,54,1,0,11778,55.4417,C116,C 180 | 1070,2,"Becker, Mrs. Allen Oliver (Nellie E Baumgardner)",female,36,0,3,230136,39,F4,S 181 | 1071,1,"Compton, Mrs. Alexander Taylor (Mary Eliza Ingersoll)",female,64,0,2,PC 17756,83.1583,E45,C 182 | 1072,2,"McCrie, Mr. James Matthew",male,30,0,0,233478,13,,S 183 | 1073,1,"Compton, Mr. Alexander Taylor Jr",male,37,1,1,PC 17756,83.1583,E52,C 184 | 1074,1,"Marvin, Mrs. Daniel Warner (Mary Graham Carmichael Farquarson)",female,18,1,0,113773,53.1,D30,S 185 | 1075,3,"Lane, Mr. 
Patrick",male,,0,0,7935,7.75,,Q 186 | 1076,1,"Douglas, Mrs. Frederick Charles (Mary Helene Baxter)",female,27,1,1,PC 17558,247.5208,B58 B60,C 187 | 1077,2,"Maybery, Mr. Frank Hubert",male,40,0,0,239059,16,,S 188 | 1078,2,"Phillips, Miss. Alice Frances Louisa",female,21,0,1,S.O./P.P. 2,21,,S 189 | 1079,3,"Davies, Mr. Joseph",male,17,2,0,A/4 48873,8.05,,S 190 | 1080,3,"Sage, Miss. Ada",female,,8,2,CA. 2343,69.55,,S 191 | 1081,2,"Veal, Mr. James",male,40,0,0,28221,13,,S 192 | 1082,2,"Angle, Mr. William A",male,34,1,0,226875,26,,S 193 | 1083,1,"Salomon, Mr. Abraham L",male,,0,0,111163,26,,S 194 | 1084,3,"van Billiard, Master. Walter John",male,11.5,1,1,A/5. 851,14.5,,S 195 | 1085,2,"Lingane, Mr. John",male,61,0,0,235509,12.35,,Q 196 | 1086,2,"Drew, Master. Marshall Brines",male,8,0,2,28220,32.5,,S 197 | 1087,3,"Karlsson, Mr. Julius Konrad Eugen",male,33,0,0,347465,7.8542,,S 198 | 1088,1,"Spedden, Master. Robert Douglas",male,6,0,2,16966,134.5,E34,C 199 | 1089,3,"Nilsson, Miss. Berta Olivia",female,18,0,0,347066,7.775,,S 200 | 1090,2,"Baimbrigge, Mr. Charles Robert",male,23,0,0,C.A. 31030,10.5,,S 201 | 1091,3,"Rasmussen, Mrs. (Lena Jacobsen Solvang)",female,,0,0,65305,8.1125,,S 202 | 1092,3,"Murphy, Miss. Nora",female,,0,0,36568,15.5,,Q 203 | 1093,3,"Danbom, Master. Gilbert Sigvard Emanuel",male,0.33,0,2,347080,14.4,,S 204 | 1094,1,"Astor, Col. John Jacob",male,47,1,0,PC 17757,227.525,C62 C64,C 205 | 1095,2,"Quick, Miss. Winifred Vera",female,8,1,1,26360,26,,S 206 | 1096,2,"Andrew, Mr. Frank Thomas",male,25,0,0,C.A. 34050,10.5,,S 207 | 1097,1,"Omont, Mr. Alfred Fernand",male,,0,0,F.C. 12998,25.7417,,C 208 | 1098,3,"McGowan, Miss. Katherine",female,35,0,0,9232,7.75,,Q 209 | 1099,2,"Collett, Mr. Sidney C Stuart",male,24,0,0,28034,10.5,,S 210 | 1100,1,"Rosenbaum, Miss. Edith Louise",female,33,0,0,PC 17613,27.7208,A11,C 211 | 1101,3,"Delalic, Mr. Redjo",male,25,0,0,349250,7.8958,,S 212 | 1102,3,"Andersen, Mr. Albert Karvin",male,32,0,0,C 4001,22.525,,S 213 | 1103,3,"Finoli, Mr. Luigi",male,,0,0,SOTON/O.Q. 3101308,7.05,,S 214 | 1104,2,"Deacon, Mr. Percy William",male,17,0,0,S.O.C. 14879,73.5,,S 215 | 1105,2,"Howard, Mrs. Benjamin (Ellen Truelove Arman)",female,60,1,0,24065,26,,S 216 | 1106,3,"Andersson, Miss. Ida Augusta Margareta",female,38,4,2,347091,7.775,,S 217 | 1107,1,"Head, Mr. Christopher",male,42,0,0,113038,42.5,B11,S 218 | 1108,3,"Mahon, Miss. Bridget Delia",female,,0,0,330924,7.8792,,Q 219 | 1109,1,"Wick, Mr. George Dennick",male,57,1,1,36928,164.8667,,S 220 | 1110,1,"Widener, Mrs. George Dunton (Eleanor Elkins)",female,50,1,1,113503,211.5,C80,C 221 | 1111,3,"Thomson, Mr. Alexander Morrison",male,,0,0,32302,8.05,,S 222 | 1112,2,"Duran y More, Miss. Florentina",female,30,1,0,SC/PARIS 2148,13.8583,,C 223 | 1113,3,"Reynolds, Mr. Harold J",male,21,0,0,342684,8.05,,S 224 | 1114,2,"Cook, Mrs. (Selena Rogers)",female,22,0,0,W./C. 14266,10.5,F33,S 225 | 1115,3,"Karlsson, Mr. Einar Gervasius",male,21,0,0,350053,7.7958,,S 226 | 1116,1,"Candee, Mrs. Edward (Helen Churchill Hungerford)",female,53,0,0,PC 17606,27.4458,,C 227 | 1117,3,"Moubarek, Mrs. George (Omine Amenia"" Alexander)""",female,,0,2,2661,15.2458,,C 228 | 1118,3,"Asplund, Mr. Johan Charles",male,23,0,0,350054,7.7958,,S 229 | 1119,3,"McNeill, Miss. Bridget",female,,0,0,370368,7.75,,Q 230 | 1120,3,"Everett, Mr. Thomas James",male,40.5,0,0,C.A. 6212,15.1,,S 231 | 1121,2,"Hocking, Mr. Samuel James Metcalfe",male,36,0,0,242963,13,,S 232 | 1122,2,"Sweet, Mr. George Frederick",male,14,0,0,220845,65,,S 233 | 1123,1,"Willard, Miss. 
Constance",female,21,0,0,113795,26.55,,S 234 | 1124,3,"Wiklund, Mr. Karl Johan",male,21,1,0,3101266,6.4958,,S 235 | 1125,3,"Linehan, Mr. Michael",male,,0,0,330971,7.8792,,Q 236 | 1126,1,"Cumings, Mr. John Bradley",male,39,1,0,PC 17599,71.2833,C85,C 237 | 1127,3,"Vendel, Mr. Olof Edvin",male,20,0,0,350416,7.8542,,S 238 | 1128,1,"Warren, Mr. Frank Manley",male,64,1,0,110813,75.25,D37,C 239 | 1129,3,"Baccos, Mr. Raffull",male,20,0,0,2679,7.225,,C 240 | 1130,2,"Hiltunen, Miss. Marta",female,18,1,1,250650,13,,S 241 | 1131,1,"Douglas, Mrs. Walter Donald (Mahala Dutton)",female,48,1,0,PC 17761,106.425,C86,C 242 | 1132,1,"Lindstrom, Mrs. Carl Johan (Sigrid Posse)",female,55,0,0,112377,27.7208,,C 243 | 1133,2,"Christy, Mrs. (Alice Frances)",female,45,0,2,237789,30,,S 244 | 1134,1,"Spedden, Mr. Frederic Oakley",male,45,1,1,16966,134.5,E34,C 245 | 1135,3,"Hyman, Mr. Abraham",male,,0,0,3470,7.8875,,S 246 | 1136,3,"Johnston, Master. William Arthur Willie""""",male,,1,2,W./C. 6607,23.45,,S 247 | 1137,1,"Kenyon, Mr. Frederick R",male,41,1,0,17464,51.8625,D21,S 248 | 1138,2,"Karnes, Mrs. J Frank (Claire Bennett)",female,22,0,0,F.C.C. 13534,21,,S 249 | 1139,2,"Drew, Mr. James Vivian",male,42,1,1,28220,32.5,,S 250 | 1140,2,"Hold, Mrs. Stephen (Annie Margaret Hill)",female,29,1,0,26707,26,,S 251 | 1141,3,"Khalil, Mrs. Betros (Zahie Maria"" Elias)""",female,,1,0,2660,14.4542,,C 252 | 1142,2,"West, Miss. Barbara J",female,0.92,1,2,C.A. 34651,27.75,,S 253 | 1143,3,"Abrahamsson, Mr. Abraham August Johannes",male,20,0,0,SOTON/O2 3101284,7.925,,S 254 | 1144,1,"Clark, Mr. Walter Miller",male,27,1,0,13508,136.7792,C89,C 255 | 1145,3,"Salander, Mr. Karl Johan",male,24,0,0,7266,9.325,,S 256 | 1146,3,"Wenzel, Mr. Linhart",male,32.5,0,0,345775,9.5,,S 257 | 1147,3,"MacKay, Mr. George William",male,,0,0,C.A. 42795,7.55,,S 258 | 1148,3,"Mahon, Mr. John",male,,0,0,AQ/4 3130,7.75,,Q 259 | 1149,3,"Niklasson, Mr. Samuel",male,28,0,0,363611,8.05,,S 260 | 1150,2,"Bentham, Miss. Lilian W",female,19,0,0,28404,13,,S 261 | 1151,3,"Midtsjo, Mr. Karl Albert",male,21,0,0,345501,7.775,,S 262 | 1152,3,"de Messemaeker, Mr. Guillaume Joseph",male,36.5,1,0,345572,17.4,,S 263 | 1153,3,"Nilsson, Mr. August Ferdinand",male,21,0,0,350410,7.8542,,S 264 | 1154,2,"Wells, Mrs. Arthur Henry (Addie"" Dart Trevaskis)""",female,29,0,2,29103,23,,S 265 | 1155,3,"Klasen, Miss. Gertrud Emilia",female,1,1,1,350405,12.1833,,S 266 | 1156,2,"Portaluppi, Mr. Emilio Ilario Giuseppe",male,30,0,0,C.A. 34644,12.7375,,C 267 | 1157,3,"Lyntakoff, Mr. Stanko",male,,0,0,349235,7.8958,,S 268 | 1158,1,"Chisholm, Mr. Roderick Robert Crispin",male,,0,0,112051,0,,S 269 | 1159,3,"Warren, Mr. Charles William",male,,0,0,C.A. 49867,7.55,,S 270 | 1160,3,"Howard, Miss. May Elizabeth",female,,0,0,A. 2. 39186,8.05,,S 271 | 1161,3,"Pokrnic, Mr. Mate",male,17,0,0,315095,8.6625,,S 272 | 1162,1,"McCaffry, Mr. Thomas Francis",male,46,0,0,13050,75.2417,C6,C 273 | 1163,3,"Fox, Mr. Patrick",male,,0,0,368573,7.75,,Q 274 | 1164,1,"Clark, Mrs. Walter Miller (Virginia McDowell)",female,26,1,0,13508,136.7792,C89,C 275 | 1165,3,"Lennon, Miss. Mary",female,,1,0,370371,15.5,,Q 276 | 1166,3,"Saade, Mr. Jean Nassr",male,,0,0,2676,7.225,,C 277 | 1167,2,"Bryhl, Miss. Dagmar Jenny Ingeborg ",female,20,1,0,236853,26,,S 278 | 1168,2,"Parker, Mr. Clifford Richard",male,28,0,0,SC 14888,10.5,,S 279 | 1169,2,"Faunthorpe, Mr. Harry",male,40,1,0,2926,26,,S 280 | 1170,2,"Ware, Mr. John James",male,30,1,0,CA 31352,21,,S 281 | 1171,2,"Oxenham, Mr. Percy Thomas",male,22,0,0,W./C. 
14260,10.5,,S 282 | 1172,3,"Oreskovic, Miss. Jelka",female,23,0,0,315085,8.6625,,S 283 | 1173,3,"Peacock, Master. Alfred Edward",male,0.75,1,1,SOTON/O.Q. 3101315,13.775,,S 284 | 1174,3,"Fleming, Miss. Honora",female,,0,0,364859,7.75,,Q 285 | 1175,3,"Touma, Miss. Maria Youssef",female,9,1,1,2650,15.2458,,C 286 | 1176,3,"Rosblom, Miss. Salli Helena",female,2,1,1,370129,20.2125,,S 287 | 1177,3,"Dennis, Mr. William",male,36,0,0,A/5 21175,7.25,,S 288 | 1178,3,"Franklin, Mr. Charles (Charles Fardon)",male,,0,0,SOTON/O.Q. 3101314,7.25,,S 289 | 1179,1,"Snyder, Mr. John Pillsbury",male,24,1,0,21228,82.2667,B45,S 290 | 1180,3,"Mardirosian, Mr. Sarkis",male,,0,0,2655,7.2292,F E46,C 291 | 1181,3,"Ford, Mr. Arthur",male,,0,0,A/5 1478,8.05,,S 292 | 1182,1,"Rheims, Mr. George Alexander Lucien",male,,0,0,PC 17607,39.6,,S 293 | 1183,3,"Daly, Miss. Margaret Marcella Maggie""""",female,30,0,0,382650,6.95,,Q 294 | 1184,3,"Nasr, Mr. Mustafa",male,,0,0,2652,7.2292,,C 295 | 1185,1,"Dodge, Dr. Washington",male,53,1,1,33638,81.8583,A34,S 296 | 1186,3,"Wittevrongel, Mr. Camille",male,36,0,0,345771,9.5,,S 297 | 1187,3,"Angheloff, Mr. Minko",male,26,0,0,349202,7.8958,,S 298 | 1188,2,"Laroche, Miss. Louise",female,1,1,2,SC/Paris 2123,41.5792,,C 299 | 1189,3,"Samaan, Mr. Hanna",male,,2,0,2662,21.6792,,C 300 | 1190,1,"Loring, Mr. Joseph Holland",male,30,0,0,113801,45.5,,S 301 | 1191,3,"Johansson, Mr. Nils",male,29,0,0,347467,7.8542,,S 302 | 1192,3,"Olsson, Mr. Oscar Wilhelm",male,32,0,0,347079,7.775,,S 303 | 1193,2,"Malachard, Mr. Noel",male,,0,0,237735,15.0458,D,C 304 | 1194,2,"Phillips, Mr. Escott Robert",male,43,0,1,S.O./P.P. 2,21,,S 305 | 1195,3,"Pokrnic, Mr. Tome",male,24,0,0,315092,8.6625,,S 306 | 1196,3,"McCarthy, Miss. Catherine Katie""""",female,,0,0,383123,7.75,,Q 307 | 1197,1,"Crosby, Mrs. Edward Gifford (Catherine Elizabeth Halstead)",female,64,1,1,112901,26.55,B26,S 308 | 1198,1,"Allison, Mr. Hudson Joshua Creighton",male,30,1,2,113781,151.55,C22 C26,S 309 | 1199,3,"Aks, Master. Philip Frank",male,0.83,0,1,392091,9.35,,S 310 | 1200,1,"Hays, Mr. Charles Melville",male,55,1,1,12749,93.5,B69,S 311 | 1201,3,"Hansen, Mrs. Claus Peter (Jennie L Howard)",female,45,1,0,350026,14.1083,,S 312 | 1202,3,"Cacic, Mr. Jego Grga",male,18,0,0,315091,8.6625,,S 313 | 1203,3,"Vartanian, Mr. David",male,22,0,0,2658,7.225,,C 314 | 1204,3,"Sadowitz, Mr. Harry",male,,0,0,LP 1588,7.575,,S 315 | 1205,3,"Carr, Miss. Jeannie",female,37,0,0,368364,7.75,,Q 316 | 1206,1,"White, Mrs. John Stuart (Ella Holmes)",female,55,0,0,PC 17760,135.6333,C32,C 317 | 1207,3,"Hagardon, Miss. Kate",female,17,0,0,AQ/3. 30631,7.7333,,Q 318 | 1208,1,"Spencer, Mr. William Augustus",male,57,1,0,PC 17569,146.5208,B78,C 319 | 1209,2,"Rogers, Mr. Reginald Harry",male,19,0,0,28004,10.5,,S 320 | 1210,3,"Jonsson, Mr. Nils Hilding",male,27,0,0,350408,7.8542,,S 321 | 1211,2,"Jefferys, Mr. Ernest Wilfred",male,22,2,0,C.A. 31029,31.5,,S 322 | 1212,3,"Andersson, Mr. Johan Samuel",male,26,0,0,347075,7.775,,S 323 | 1213,3,"Krekorian, Mr. Neshan",male,25,0,0,2654,7.2292,F E57,C 324 | 1214,2,"Nesson, Mr. Israel",male,26,0,0,244368,13,F2,S 325 | 1215,1,"Rowe, Mr. Alfred G",male,33,0,0,113790,26.55,,S 326 | 1216,1,"Kreuchen, Miss. Emilie",female,39,0,0,24160,211.3375,,S 327 | 1217,3,"Assam, Mr. Ali",male,23,0,0,SOTON/O.Q. 3101309,7.05,,S 328 | 1218,2,"Becker, Miss. Ruth Elizabeth",female,12,2,1,230136,39,F4,S 329 | 1219,1,"Rosenshine, Mr. George (Mr George Thorne"")""",male,46,0,0,PC 17585,79.2,,C 330 | 1220,2,"Clarke, Mr. 
Charles Valentine",male,29,1,0,2003,26,,S 331 | 1221,2,"Enander, Mr. Ingvar",male,21,0,0,236854,13,,S 332 | 1222,2,"Davies, Mrs. John Morgan (Elizabeth Agnes Mary White) ",female,48,0,2,C.A. 33112,36.75,,S 333 | 1223,1,"Dulles, Mr. William Crothers",male,39,0,0,PC 17580,29.7,A18,C 334 | 1224,3,"Thomas, Mr. Tannous",male,,0,0,2684,7.225,,C 335 | 1225,3,"Nakid, Mrs. Said (Waika Mary"" Mowad)""",female,19,1,1,2653,15.7417,,C 336 | 1226,3,"Cor, Mr. Ivan",male,27,0,0,349229,7.8958,,S 337 | 1227,1,"Maguire, Mr. John Edward",male,30,0,0,110469,26,C106,S 338 | 1228,2,"de Brito, Mr. Jose Joaquim",male,32,0,0,244360,13,,S 339 | 1229,3,"Elias, Mr. Joseph",male,39,0,2,2675,7.2292,,C 340 | 1230,2,"Denbury, Mr. Herbert",male,25,0,0,C.A. 31029,31.5,,S 341 | 1231,3,"Betros, Master. Seman",male,,0,0,2622,7.2292,,C 342 | 1232,2,"Fillbrook, Mr. Joseph Charles",male,18,0,0,C.A. 15185,10.5,,S 343 | 1233,3,"Lundstrom, Mr. Thure Edvin",male,32,0,0,350403,7.5792,,S 344 | 1234,3,"Sage, Mr. John George",male,,1,9,CA. 2343,69.55,,S 345 | 1235,1,"Cardeza, Mrs. James Warburton Martinez (Charlotte Wardle Drake)",female,58,0,1,PC 17755,512.3292,B51 B53 B55,C 346 | 1236,3,"van Billiard, Master. James William",male,,1,1,A/5. 851,14.5,,S 347 | 1237,3,"Abelseth, Miss. Karen Marie",female,16,0,0,348125,7.65,,S 348 | 1238,2,"Botsford, Mr. William Hull",male,26,0,0,237670,13,,S 349 | 1239,3,"Whabee, Mrs. George Joseph (Shawneene Abi-Saab)",female,38,0,0,2688,7.2292,,C 350 | 1240,2,"Giles, Mr. Ralph",male,24,0,0,248726,13.5,,S 351 | 1241,2,"Walcroft, Miss. Nellie",female,31,0,0,F.C.C. 13528,21,,S 352 | 1242,1,"Greenfield, Mrs. Leo David (Blanche Strouse)",female,45,0,1,PC 17759,63.3583,D10 D12,C 353 | 1243,2,"Stokes, Mr. Philip Joseph",male,25,0,0,F.C.C. 13540,10.5,,S 354 | 1244,2,"Dibden, Mr. William",male,18,0,0,S.O.C. 14879,73.5,,S 355 | 1245,2,"Herman, Mr. Samuel",male,49,1,2,220845,65,,S 356 | 1246,3,"Dean, Miss. Elizabeth Gladys Millvina""""",female,0.17,1,2,C.A. 2315,20.575,,S 357 | 1247,1,"Julian, Mr. Henry Forbes",male,50,0,0,113044,26,E60,S 358 | 1248,1,"Brown, Mrs. John Murray (Caroline Lane Lamson)",female,59,2,0,11769,51.4792,C101,S 359 | 1249,3,"Lockyer, Mr. Edward",male,,0,0,1222,7.8792,,S 360 | 1250,3,"O'Keefe, Mr. Patrick",male,,0,0,368402,7.75,,Q 361 | 1251,3,"Lindell, Mrs. Edvard Bengtsson (Elin Gerda Persson)",female,30,1,0,349910,15.55,,S 362 | 1252,3,"Sage, Master. William Henry",male,14.5,8,2,CA. 2343,69.55,,S 363 | 1253,2,"Mallet, Mrs. Albert (Antoinette Magnin)",female,24,1,1,S.C./PARIS 2079,37.0042,,C 364 | 1254,2,"Ware, Mrs. John James (Florence Louise Long)",female,31,0,0,CA 31352,21,,S 365 | 1255,3,"Strilic, Mr. Ivan",male,27,0,0,315083,8.6625,,S 366 | 1256,1,"Harder, Mrs. George Achilles (Dorothy Annan)",female,25,1,0,11765,55.4417,E50,C 367 | 1257,3,"Sage, Mrs. John (Annie Bullen)",female,,1,9,CA. 2343,69.55,,S 368 | 1258,3,"Caram, Mr. Joseph",male,,1,0,2689,14.4583,,C 369 | 1259,3,"Riihivouri, Miss. Susanna Juhantytar Sanni""""",female,22,0,0,3101295,39.6875,,S 370 | 1260,1,"Gibson, Mrs. Leonard (Pauline C Boeson)",female,45,0,1,112378,59.4,,C 371 | 1261,2,"Pallas y Castello, Mr. Emilio",male,29,0,0,SC/PARIS 2147,13.8583,,C 372 | 1262,2,"Giles, Mr. Edgar",male,21,1,0,28133,11.5,,S 373 | 1263,1,"Wilson, Miss. Helen Alice",female,31,0,0,16966,134.5,E39 E41,C 374 | 1264,1,"Ismay, Mr. Joseph Bruce",male,49,0,0,112058,0,B52 B54 B56,S 375 | 1265,2,"Harbeck, Mr. William H",male,44,0,0,248746,13,,S 376 | 1266,1,"Dodge, Mrs. 
Washington (Ruth Vidaver)",female,54,1,1,33638,81.8583,A34,S 377 | 1267,1,"Bowen, Miss. Grace Scott",female,45,0,0,PC 17608,262.375,,C 378 | 1268,3,"Kink, Miss. Maria",female,22,2,0,315152,8.6625,,S 379 | 1269,2,"Cotterill, Mr. Henry Harry""""",male,21,0,0,29107,11.5,,S 380 | 1270,1,"Hipkins, Mr. William Edward",male,55,0,0,680,50,C39,S 381 | 1271,3,"Asplund, Master. Carl Edgar",male,5,4,2,347077,31.3875,,S 382 | 1272,3,"O'Connor, Mr. Patrick",male,,0,0,366713,7.75,,Q 383 | 1273,3,"Foley, Mr. Joseph",male,26,0,0,330910,7.8792,,Q 384 | 1274,3,"Risien, Mrs. Samuel (Emma)",female,,0,0,364498,14.5,,S 385 | 1275,3,"McNamee, Mrs. Neal (Eileen O'Leary)",female,19,1,0,376566,16.1,,S 386 | 1276,2,"Wheeler, Mr. Edwin Frederick""""",male,,0,0,SC/PARIS 2159,12.875,,S 387 | 1277,2,"Herman, Miss. Kate",female,24,1,2,220845,65,,S 388 | 1278,3,"Aronsson, Mr. Ernst Axel Algot",male,24,0,0,349911,7.775,,S 389 | 1279,2,"Ashby, Mr. John",male,57,0,0,244346,13,,S 390 | 1280,3,"Canavan, Mr. Patrick",male,21,0,0,364858,7.75,,Q 391 | 1281,3,"Palsson, Master. Paul Folke",male,6,3,1,349909,21.075,,S 392 | 1282,1,"Payne, Mr. Vivian Ponsonby",male,23,0,0,12749,93.5,B24,S 393 | 1283,1,"Lines, Mrs. Ernest H (Elizabeth Lindsey James)",female,51,0,1,PC 17592,39.4,D28,S 394 | 1284,3,"Abbott, Master. Eugene Joseph",male,13,0,2,C.A. 2673,20.25,,S 395 | 1285,2,"Gilbert, Mr. William",male,47,0,0,C.A. 30769,10.5,,S 396 | 1286,3,"Kink-Heilmann, Mr. Anton",male,29,3,1,315153,22.025,,S 397 | 1287,1,"Smith, Mrs. Lucien Philip (Mary Eloise Hughes)",female,18,1,0,13695,60,C31,S 398 | 1288,3,"Colbert, Mr. Patrick",male,24,0,0,371109,7.25,,Q 399 | 1289,1,"Frolicher-Stehli, Mrs. Maxmillian (Margaretha Emerentia Stehli)",female,48,1,1,13567,79.2,B41,C 400 | 1290,3,"Larsson-Rondberg, Mr. Edvard A",male,22,0,0,347065,7.775,,S 401 | 1291,3,"Conlon, Mr. Thomas Henry",male,31,0,0,21332,7.7333,,Q 402 | 1292,1,"Bonnell, Miss. Caroline",female,30,0,0,36928,164.8667,C7,S 403 | 1293,2,"Gale, Mr. Harry",male,38,1,0,28664,21,,S 404 | 1294,1,"Gibson, Miss. Dorothy Winifred",female,22,0,1,112378,59.4,,C 405 | 1295,1,"Carrau, Mr. Jose Pedro",male,17,0,0,113059,47.1,,S 406 | 1296,1,"Frauenthal, Mr. Isaac Gerald",male,43,1,0,17765,27.7208,D40,C 407 | 1297,2,"Nourney, Mr. Alfred (Baron von Drachstedt"")""",male,20,0,0,SC/PARIS 2166,13.8625,D38,C 408 | 1298,2,"Ware, Mr. William Jeffery",male,23,1,0,28666,10.5,,S 409 | 1299,1,"Widener, Mr. George Dunton",male,50,1,1,113503,211.5,C80,C 410 | 1300,3,"Riordan, Miss. Johanna Hannah""""",female,,0,0,334915,7.7208,,Q 411 | 1301,3,"Peacock, Miss. Treasteall",female,3,1,1,SOTON/O.Q. 3101315,13.775,,S 412 | 1302,3,"Naughton, Miss. Hannah",female,,0,0,365237,7.75,,Q 413 | 1303,1,"Minahan, Mrs. William Edward (Lillian E Thorpe)",female,37,1,0,19928,90,C78,Q 414 | 1304,3,"Henriksson, Miss. Jenny Lovisa",female,28,0,0,347086,7.775,,S 415 | 1305,3,"Spector, Mr. Woolf",male,,0,0,A.5. 3236,8.05,,S 416 | 1306,1,"Oliva y Ocana, Dona. Fermina",female,39,0,0,PC 17758,108.9,C105,C 417 | 1307,3,"Saether, Mr. Simon Sivertsen",male,38.5,0,0,SOTON/O.Q. 3101262,7.25,,S 418 | 1308,3,"Ware, Mr. Frederick",male,,0,0,359309,8.05,,S 419 | 1309,3,"Peter, Master. 
Michael J",male,,1,1,2668,22.3583,,C 420 | -------------------------------------------------------------------------------- /environment.yml: -------------------------------------------------------------------------------- 1 | name: deep-learning-from-scratch-pytorch 2 | channels: 3 | - conda-forge 4 | - defaults 5 | - anaconda 6 | dependencies: 7 | - python=3.6 8 | - jupyter=1.0.0 9 | - matplotlib=3.1.1 10 | - pandas=0.25.1 11 | - scipy=1.3.1 12 | - scikit-learn=0.23 13 | - seaborn=0.9.0 14 | - pytorch=1.4.0 15 | -------------------------------------------------------------------------------- /img/decision-tree-titanic.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/33ca80dd9a083f127127a0784079d4c608d7ce9b/img/decision-tree-titanic.png -------------------------------------------------------------------------------- /img/fitting.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/33ca80dd9a083f127127a0784079d4c608d7ce9b/img/fitting.png -------------------------------------------------------------------------------- /img/george.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/33ca80dd9a083f127127a0784079d4c608d7ce9b/img/george.jpg -------------------------------------------------------------------------------- /img/gradient-descent.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/33ca80dd9a083f127127a0784079d4c608d7ce9b/img/gradient-descent.png -------------------------------------------------------------------------------- /img/mlp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/33ca80dd9a083f127127a0784079d4c608d7ce9b/img/mlp.png -------------------------------------------------------------------------------- /img/must-read-books.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/33ca80dd9a083f127127a0784079d4c608d7ce9b/img/must-read-books.png -------------------------------------------------------------------------------- /img/perceptron.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/33ca80dd9a083f127127a0784079d4c608d7ce9b/img/perceptron.jpg -------------------------------------------------------------------------------- /img/rasbt-backprop.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/33ca80dd9a083f127127a0784079d4c608d7ce9b/img/rasbt-backprop.png -------------------------------------------------------------------------------- /img/tikz16.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/33ca80dd9a083f127127a0784079d4c608d7ce9b/img/tikz16.png -------------------------------------------------------------------------------- 
/notebooks/1-Student-deep-learning-from-scratch-pytorch.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Deep learning from scratch" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Learning objectives of the notebook" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "- Appreciate that machine learning is a technical, cultural, economic, and social discipline that has the ability to consolidate and re-arrange power structures;\n", 22 | "- Build simple ML models for classification and regression using `scikit-learn`;\n", 23 | "- Hand-code forward propagation for single and multilayer perceptrons using `numpy`;\n", 24 | "- Incorporate non-linearities into neural networks using activation functions;\n", 25 | "- Hand-code gradient descent using `numpy`;\n", 26 | "- Understand the basics of backpropagation." 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "# 1. An Introduction to Machine Learning" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "Machine learning is the science and art of teaching computers to \"learn\" patterns from data. In some ways, we can consider it a subdiscipline of data science, which is often sliced into\n", 41 | "\n", 42 | "* Descriptive analytics (BI, classic analytics, dashboards),\n", 43 | "* Predictive analytics (machine learning), and\n", 44 | "* Prescriptive analytics (decision science).\n", 45 | "\n", 46 | "Machine learning itself is often sliced into\n", 47 | "\n", 48 | "* Supervised learning (predicting a label: classification, or a continuous variable),\n", 49 | "* Unsupervised learning (pattern recognition for unlabelled data, a paradigm being clustering),\n", 50 | "* Reinforcement learning, in which software agents are placed in constrained environments and given “rewards” and “punishments” based on their activity (AlphaGo Zero, self-driving cars). \n", 51 | "\n", 52 | "\n", 53 | "This workshop is an introduction to deep learning, a powerful form of machine learning that has garnered much attention for its successes in computer vision (e.g. image recognition) and natural language processing.\n", 54 | "\n", 55 | "At the outset, we'd like to make clear that data science and machine learning are powerful technologies that can do both harm and good. As [Cathy O'Neil has said](https://www.datacamp.com/community/podcast/weapons-math-destruction), \n", 56 | "\n", 57 | "> \"data science doesn't just predict the future. It creates the future.\"\n", 58 | "\n", 59 | "For example,\n", 60 | "\n", 61 | "* [There are runaway feedback loops in “predictive policing”](https://www.smithsonianmag.com/innovation/artificial-intelligence-is-now-used-predict-crime-is-it-biased-180968337/), whereby more police are sent to neighborhoods with higher “reported & predicted crime,” resulting in more police being sent there and more reports of crime and so on.\n", 62 | "* Google search encodes all types of cultural and societal biases, such as racial discrimination, as investigated in Safiya Noble’s [Algorithms of Oppression](https://nyupress.org/9781479837243/algorithms-of-oppression/). An example of this is that, for many years, when using Google image search with the keyword “beautiful,” the results would be dominated by photos of white women. 
In the words of Ruha Benjamin, Associate Professor of African American Studies at Princeton University, [“race and technology are co-produced.”](https://www.ruhabenjamin.com/race-after-technology) \n", 63 | "* There are also interaction effects between many models deployed in society that mean they feed back into each other: those most likely to be treated unfairly by [healthcare algorithms](https://www.technologyreview.com/2019/10/25/132184/a-biased-medical-algorithm-favored-white-people-for-healthcare-programs/) are more likely to be discriminated against by models used in employment hiring flows and more likely to be targeted by predatory payday loan ads online, as detailed by Cathy O’Neil in [Weapons of Math Destruction](https://weaponsofmathdestructionbook.com/).\n", 64 | "\n", 65 | "![Title](../img/must-read-books.png)" 66 | ] 67 | }, 68 | { 69 | "cell_type": "markdown", 70 | "metadata": {}, 71 | "source": [ 72 | "\n", 73 | "Moreover, data collection and data reporting are political acts and processes embedded in societies with asymmetric power relations, and they are most often processes controlled by those in positions of power. In the words of Catherine D’Ignazio and Lauren F. Klein in [Data Feminism](https://mitpress.mit.edu/books/data-feminism), “governments and corporations have long employed data and statistics as management techniques to preserve an unequal status quo.” It is a revelation to realize that the etymology of the word statistics comes from the term statecraft (we discovered this fact from Chris Wiggins’ & Matt Jones’ course [data: past, present, and future](https://data-ppf.github.io/) at Columbia University), and it speaks to the ability of states and governments to wield power through the control of data collection and data reporting (they decide what is collected and reported, how it is reported, and what decisions are made).\n", 74 | "\n", 75 | "Data science, ML, and AI consolidate and re-arrange power structures: they're cultural, economic, and social tools, as well as technical tools. Also: who chooses the classification scheme, the columns, the rows? Most often, it's those in positions of power. Be careful with the algorithms you build, how they're deployed, and the features that you use:\n", 76 | "\n", 77 | "* If you think race should not be a feature in your data (which it more than likely should not), then you should throw out zip code also, as it is highly correlated with race and very often encodes it;\n", 78 | "* If you are given a dataset with gender or biological sex as a feature, you should question why it was even collected in the first place and whether including it in your model could discriminate against any gender or sex (hint: to my knowledge, it's always discriminatory against non-males, such as [here](https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G) and [here](https://www.wired.com/story/the-apple-card-didnt-see-genderand-thats-the-problem/))." 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": {}, 84 | "source": [ 85 | "## Machine Learning: Classification" 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": {}, 91 | "source": [ 92 | "So we're now going to jump in and build our first machine learning model. We'll use the (now) famous Titanic dataset, where each row is a passenger on the Titanic and the target variable (the one you're trying to predict) is whether they survived or not. 
The features (the variables you use to make the prediction) include their name, the fare they paid, where they embarked, **and** their **Sex**. It is an important question whether we want to use this feature. As we're interested in building the best predictive model and *not* putting it into production to make decisions and take actions that impact lives, it may be OK, but I encourage you all to interrogate this question further (it is credible that we could build a more accurate model by keeping 'Sex' as a feature since, [on the Titanic](https://www.newscientist.com/article/dn22119-sinking-the-titanic-women-and-children-first-myth/), \"the captain explicitly issued an order for women and children to be saved first\").\n", 93 | "\n", 94 | "**On terminology:**\n", 95 | "\n", 96 | "- The **target variable** is the variable you are trying to predict;\n", 97 | "- Other variables are known as **features** (or **predictor variables**): the variables that you use to predict the target variable.\n", 98 | "\n", 99 | "**On practice and procedure:**\n", 100 | "\n", 101 | "To build machine learning models, you require two things:\n", 102 | "\n", 103 | "- **Training data** (which the algorithms learn from) and\n", 104 | "- An **evaluation metric**, such as accuracy.\n", 105 | "\n", 106 | "For more on these, check out Cassie Kozyrkov's wonderful articles [Forget the robots! Here’s how AI will get you](https://towardsdatascience.com/forget-the-robots-heres-how-ai-will-get-you-b674c28d6a34) and [Machine learning — Is the emperor wearing clothes?](https://medium.com/@kozyrkov/machine-learning-is-the-emperor-wearing-clothes-928fe406fe09).\n", 107 | "\n", 108 | "Also note that the ML ingredients of *training data* and *evaluation metric* can introduce all types of biases and other problems into your ML algorithms, for example:\n", 109 | "\n", 110 | "* If your training data is biased, your model more than likely will be;\n", 111 | "* If you optimize solely for accuracy, what happens to groups that are under-represented in your training data?\n", 112 | "\n", 113 | "The latter challenge follows from the broader class of problems we face when optimizing anything, as detailed by Rachel Thomas in [\"The problem with metrics is a big problem for AI\"](https://www.fast.ai/2019/09/24/metrics/):\n", 114 | "\n", 115 | "

The problem with metrics is a big problem for AI:
- Most AI approaches optimize metrics
- Any metric is just a proxy
- Metrics can, and will, be gamed
- Metrics overemphasize short-term concerns
- Online metrics are gathered in highly addictive environments
(https://t.co/k0J5ksw91Q)

— Rachel Thomas (@math_rachel), September 24, 2019
" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "Let's now import our dataset and begin looking at it:" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "metadata": {}, 129 | "outputs": [], 130 | "source": [ 131 | "# Import modules\n", 132 | "import numpy as np\n", 133 | "import pandas as pd\n", 134 | "import matplotlib.pyplot as plt\n", 135 | "import seaborn as sns\n", 136 | "from sklearn import tree\n", 137 | "from sklearn.metrics import accuracy_score\n", 138 | "\n", 139 | "# Figures inline and set visualization style\n", 140 | "%matplotlib inline\n", 141 | "sns.set()" 142 | ] 143 | }, 144 | { 145 | "cell_type": "code", 146 | "execution_count": null, 147 | "metadata": {}, 148 | "outputs": [], 149 | "source": [ 150 | "# Import data\n", 151 | "df = pd.read_csv('https://raw.githubusercontent.com/hugobowne/deep-learning-from-scratch-pytorch/master/data/train.csv')\n", 152 | "\n", 153 | "# View first lines of training data\n", 154 | "____" 155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "# check out data types\n", 164 | "____" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": null, 170 | "metadata": {}, 171 | "outputs": [], 172 | "source": [ 173 | "# check out summary statistics\n", 174 | "____" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "## EDA and first models" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "Note: a huuuuuuuuge part of model building is making sure that our models generalize to new data. Another way to think of this is that we want our models to capture the signal, not the noise or fluctuations in the training data. If it's capturing a lot of the noise, we call this _overfitting_.\n", 189 | "Image from [here](https://stats.stackexchange.com/questions/192007/what-measures-you-look-at-the-determine-over-fitting-in-linear-regression/192021).\n", 190 | "![Title](../img/fitting.png)" 191 | ] 192 | }, 193 | { 194 | "cell_type": "markdown", 195 | "metadata": {}, 196 | "source": [ 197 | "To this end, we don't want to look at all the data at the start! We want to *hold out* some of the data into a *test* or *hold-out* set so that we can test how well any model we build performs on it. The data remaining is called the _training_ data as that's the data we use to _train_ the model.\n", 198 | "\n", 199 | "**Key terminology:**\n", 200 | "\n", 201 | "- **Training data** is what we train our ML models on;\n", 202 | "- **Test data** or a **hold-out set** is what we use to gauge how well our model performs, after we train it.\n", 203 | "\n", 204 | "**Note:** there is a slightly more sophisticated alternative to a single hold-out set called *cross validation*, which we won't cover here. Feel free to check it out [here](https://scikit-learn.org/stable/modules/cross_validation.html). 
\n", 205 | "\n", 206 | "To split our data into train and test sets, scikit-learn has a pretty cool utility function:" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": {}, 213 | "outputs": [], 214 | "source": [ 215 | "# split your data\n", 216 | "from sklearn.model_selection import ____\n", 217 | "df_train, df_test, y_train, y_test = ____" 218 | ] 219 | }, 220 | { 221 | "cell_type": "code", 222 | "execution_count": null, 223 | "metadata": {}, 224 | "outputs": [], 225 | "source": [ 226 | "# make bar plot of target variable\n", 227 | "df_train['Survived'] = ____\n", 228 | "____" 229 | ] 230 | }, 231 | { 232 | "cell_type": "markdown", 233 | "metadata": {}, 234 | "source": [ 235 | "More people died than survived so let's make a first baseline and very naive prediction that everybody died. \n", 236 | "Although this is clearly a bad model, it will give us a baseline, against which to compare any future model that we build:" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "metadata": {}, 243 | "outputs": [], 244 | "source": [ 245 | "df_test['Survived'] = 0\n", 246 | "# Compute accuracy of this model\n", 247 | "pred_diff = ____\n", 248 | "accuracy = ____\n", 249 | "print(accuracy)" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": {}, 255 | "source": [ 256 | "OK! So our incredibly naive, baseline model was 61.7% accurate. This means that if we build more sophisticated models, they definitely need to perform better than this." 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "## Decision Tree" 264 | ] 265 | }, 266 | { 267 | "cell_type": "markdown", 268 | "metadata": {}, 269 | "source": [ 270 | "We're now going to build a model called a decision tree. Before doing that, we need to do a bit of data preparation and cleaning. We'll do all of this on the original dataset before the train-test split, to make sure we treat all rows the same:" 271 | ] 272 | }, 273 | { 274 | "cell_type": "code", 275 | "execution_count": null, 276 | "metadata": {}, 277 | "outputs": [], 278 | "source": [ 279 | "# Impute missing numerical variables\n", 280 | "df['Age'] = df.Age.fillna(df.Age.median())\n", 281 | "df['Fare'] = df.Fare.fillna(df.Fare.median())\n", 282 | "\n", 283 | "# Check out info of data\n", 284 | "____" 285 | ] 286 | }, 287 | { 288 | "cell_type": "code", 289 | "execution_count": null, 290 | "metadata": {}, 291 | "outputs": [], 292 | "source": [ 293 | "# Convert Sex into a numerical feature\n", 294 | "df = pd.get_dummies(df, columns=['Sex'], drop_first=True)\n", 295 | "\n", 296 | "# Select columns and view head\n", 297 | "df = df[['Sex_male', 'Fare', 'Age','Pclass', 'SibSp','Survived']]\n", 298 | "____" 299 | ] 300 | }, 301 | { 302 | "cell_type": "code", 303 | "execution_count": null, 304 | "metadata": {}, 305 | "outputs": [], 306 | "source": [ 307 | "# train test split\n", 308 | "df_train, df_test, y_train, y_test = train_test_split(\n", 309 | " df.drop('Survived', axis=1), df[['Survived']], test_size=0.33, random_state=41, stratify=df[['Survived']])" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": {}, 315 | "source": [ 316 | "## Training your model" 317 | ] 318 | }, 319 | { 320 | "cell_type": "markdown", 321 | "metadata": {}, 322 | "source": [ 323 | "Now it's time to train your model. 
We're going to build a decision tree and what the training process actually does is figures out the optimal ways to split the tree:" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "![title](../img/decision-tree-titanic.png)" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": null, 336 | "metadata": {}, 337 | "outputs": [], 338 | "source": [ 339 | "# Instantiate model and fit to data\n", 340 | "clf = ____\n", 341 | "____" 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": null, 347 | "metadata": {}, 348 | "outputs": [], 349 | "source": [ 350 | "# Make predictions and store in 'Survived' column of df_test\n", 351 | "Y_pred = ____" 352 | ] 353 | }, 354 | { 355 | "cell_type": "code", 356 | "execution_count": null, 357 | "metadata": {}, 358 | "outputs": [], 359 | "source": [ 360 | "# Compute accuracy of this model\n", 361 | "____" 362 | ] 363 | }, 364 | { 365 | "cell_type": "markdown", 366 | "metadata": {}, 367 | "source": [ 368 | "### HANDS-ON: fit, predict, and ML learning curves" 369 | ] 370 | }, 371 | { 372 | "cell_type": "markdown", 373 | "metadata": {}, 374 | "source": [ 375 | "Plot the learning curve as we increase the depth of the decision tree -- this is a plot which has the accuracy of the model on both the training and the test set. " 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": null, 381 | "metadata": {}, 382 | "outputs": [], 383 | "source": [ 384 | "# Setup arrays to store train and test accuracies\n", 385 | "dep = np.arange(1, 9)\n", 386 | "train_accuracy = np.empty(len(dep))\n", 387 | "test_accuracy = np.empty(len(dep))\n", 388 | "\n", 389 | "# Loop over different values of k\n", 390 | "for i, k in enumerate(dep):\n", 391 | " # Setup a Decision Tree Classifier\n", 392 | " clf = ____\n", 393 | "\n", 394 | " # Fit the classifier to the training data\n", 395 | " ____\n", 396 | "\n", 397 | " #Compute accuracy on the training set\n", 398 | " train_accuracy[i] = ____\n", 399 | "\n", 400 | " #Compute accuracy on the testing set\n", 401 | " test_accuracy[i] = ____\n", 402 | "\n", 403 | "# Generate plot\n", 404 | "plt.title('clf: Varying depth of tree')\n", 405 | "plt.plot(dep, test_accuracy, label = 'Testing Accuracy')\n", 406 | "plt.plot(dep, train_accuracy, label = 'Training Accuracy')\n", 407 | "plt.legend()\n", 408 | "plt.xlabel('Depth of tree')\n", 409 | "plt.ylabel('Accuracy')\n", 410 | "plt.show()" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "**KEY NOTE:** You can see when the decision trees begin to overfit to the training set!" 418 | ] 419 | }, 420 | { 421 | "cell_type": "markdown", 422 | "metadata": {}, 423 | "source": [ 424 | "## Machine Learning: regression" 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": {}, 430 | "source": [ 431 | "The other common form of supervised learning is **regression**, in which we're predicting a continuous variable, rather than classifying from a finite number of labels. \n", 432 | "\n", 433 | "One great aspect of the `scikit-learn` API is that the .fit/.predict paradigm generalizes to all forms of supervised learning. 
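As a minimal illustration of that shared pattern (the estimator and the tiny synthetic dataset here are just placeholders):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny synthetic regression problem, purely to show the pattern
X_demo = np.random.rand(50, 3)
y_demo = X_demo @ np.array([1.0, -2.0, 0.5])

reg = LinearRegression()           # 1. instantiate an estimator (any supervised model works the same way)
reg.fit(X_demo, y_demo)            # 2. fit on training data
y_demo_pred = reg.predict(X_demo)  # 3. predict
```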
You're going to perform regression on the `scikit` diabetes dataset, which we'll now import and check out together:" 434 | ] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "execution_count": null, 439 | "metadata": {}, 440 | "outputs": [], 441 | "source": [ 442 | "# https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py\n", 443 | "from sklearn import datasets, linear_model\n", 444 | "from sklearn.metrics import mean_squared_error, r2_score\n", 445 | "\n", 446 | "# Load the diabetes dataset\n", 447 | "diabetes_X, diabetes_y = datasets.load_diabetes(return_X_y=True)\n", 448 | "diabetes_data = datasets.load_diabetes()\n", 449 | "\n", 450 | "#\n", 451 | "print(diabetes_data.DESCR)" 452 | ] 453 | }, 454 | { 455 | "cell_type": "code", 456 | "execution_count": null, 457 | "metadata": {}, 458 | "outputs": [], 459 | "source": [ 460 | "# split data into predictors and target\n", 461 | "diabetes_X, diabetes_y = diabetes_data['data'], diabetes_data['target']" 462 | ] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": {}, 467 | "source": [ 468 | "### HANDS ON: Building a regression model" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "Now it's your turn to build a linear regression model for this dataset using `scikit-learn`:" 476 | ] 477 | }, 478 | { 479 | "cell_type": "code", 480 | "execution_count": null, 481 | "metadata": {}, 482 | "outputs": [], 483 | "source": [ 484 | "# Use only one feature\n", 485 | "diabetes_X = diabetes_X[:, np.newaxis, 2]\n", 486 | "\n", 487 | "# Split the data into training/testing sets\n", 488 | "diabetes_X_train = diabetes_X[:-20]\n", 489 | "diabetes_X_test = diabetes_X[-20:]\n", 490 | "\n", 491 | "# Split the targets into training/testing sets\n", 492 | "diabetes_y_train = diabetes_y[:-20]\n", 493 | "diabetes_y_test = diabetes_y[-20:]\n", 494 | "\n", 495 | "# Create linear regression object\n", 496 | "regr = ____\n", 497 | "\n", 498 | "# Train the model using the training sets\n", 499 | "____\n", 500 | "\n", 501 | "# Make predictions using the testing set\n", 502 | "diabetes_y_pred = ____\n", 503 | "\n", 504 | "# The coefficients\n", 505 | "print('Coefficients: \\n', regr.coef_)\n", 506 | "# The mean squared error\n", 507 | "print('Mean squared error: %.2f'\n", 508 | " % mean_squared_error(diabetes_y_test, diabetes_y_pred))\n", 509 | "# The coefficient of determination: 1 is perfect prediction\n", 510 | "print('Coefficient of determination: %.2f'\n", 511 | " % r2_score(diabetes_y_test, diabetes_y_pred))\n", 512 | "\n", 513 | "# Plot outputs\n", 514 | "plt.scatter(diabetes_X_test, diabetes_y_test, color='black')\n", 515 | "plt.plot(diabetes_X_test, diabetes_y_pred, color='blue', linewidth=3);" 516 | ] 517 | }, 518 | { 519 | "cell_type": "markdown", 520 | "metadata": {}, 521 | "source": [ 522 | "# 2. Neural networks" 523 | ] 524 | }, 525 | { 526 | "cell_type": "markdown", 527 | "metadata": {}, 528 | "source": [ 529 | "Now it's time for deep learning using neural networks. These are:" 530 | ] 531 | }, 532 | { 533 | "cell_type": "markdown", 534 | "metadata": {}, 535 | "source": [ 536 | "- ML models inspired by biological neural networks.\n", 537 | "- Good for image classification, NLP, and more. 
Say more here.\n" 538 | ] 539 | }, 540 | { 541 | "cell_type": "markdown", 542 | "metadata": {}, 543 | "source": [ 544 | "![title](../img/george.jpg)" 545 | ] 546 | }, 547 | { 548 | "cell_type": "markdown", 549 | "metadata": {}, 550 | "source": [ 551 | "Image from [here](https://www.pnas.org/content/116/4/1074/tab-figures-data)." 552 | ] 553 | }, 554 | { 555 | "cell_type": "markdown", 556 | "metadata": {}, 557 | "source": [ 558 | "When making predictions with neural networks, we use a procedure called **forward propagation**. When training neural networks (that is, finding the parameters, called weights), we use a procedure called **backpropogation**. To put it another way,\n", 559 | "\n", 560 | "- **forward propagation** is for prediction (.predict());\n", 561 | "- **backpropogation** is for training (.fit()).\n", 562 | "\n", 563 | "\n", 564 | "\n", 565 | "So let's first jump into forward propogation!" 566 | ] 567 | }, 568 | { 569 | "cell_type": "markdown", 570 | "metadata": {}, 571 | "source": [ 572 | "## 2.1 Forward propogation" 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": {}, 578 | "source": [ 579 | "### Single-layer perceptron" 580 | ] 581 | }, 582 | { 583 | "cell_type": "markdown", 584 | "metadata": {}, 585 | "source": [ 586 | "The first example is the single layer perceptron (SLP).\n", 587 | "The parameters that change when we train the model are the weights.\n", 588 | "Image is from [here](https://deepai.org/machine-learning-glossary-and-terms/perceptron)." 589 | ] 590 | }, 591 | { 592 | "cell_type": "markdown", 593 | "metadata": {}, 594 | "source": [ 595 | "![title](../img/perceptron.jpg)" 596 | ] 597 | }, 598 | { 599 | "cell_type": "markdown", 600 | "metadata": {}, 601 | "source": [ 602 | "To build the weighted sum, we take each input (feature) $x_i$, multiply it by the relevant weight $w_i$, and sum them all up. \n", 603 | "\n", 604 | "This is essentially a *weighted average*! Note that \n", 605 | "\n", 606 | "* if all the weights are the same $1/n$, the weighted sum _is_ the average of the features,\n", 607 | "* if a weight $w_i=0$, then the respective $x_i$ does not contribute at all to the weighted sum, and\n", 608 | "* if a weight $w_i$ is greater than a weight $w_j$, the corresponding $x_i$ contributed more to the weighted sum than $w_j$.\n", 609 | "\n", 610 | "In `numpy`, **x** and **w** will be 1D arrays. To compute the weighted sum, you can take element-wise products of these arrays and then take the sum.\n", 611 | "\n", 612 | "It's not necessary to know linear algebra here, but if you do know a little bit, you may recognize that **x** and **w** are vectors and that the weighted sum is the _dot product_ of these vectors so that the model is given by\n", 613 | "\n", 614 | "- $y = w\\cdot x + b $ (vectors).\n", 615 | "\n", 616 | "For ease of writing code, we'll use the `np.dot()` function and pass it the relevant arrays. If you'd like, I encourage you to confirm that this does produce the weighted sum, by working through several examples." 617 | ] 618 | }, 619 | { 620 | "cell_type": "markdown", 621 | "metadata": {}, 622 | "source": [ 623 | "**THE DATA:** We'll use a toy example of an e-commerce website. The features are \n", 624 | "* amount of time on website\n", 625 | "* number of interactions\n", 626 | "* number of customer support interactions\n", 627 | "\n", 628 | "The target is amount spent in a year (if it's negative, we can interpret it as refunds), and thus this is a regression challenge. 
Note that this type of question isn't necessarily a good use case for deep learning (as opposed to image classification), but it has the benefit of providing a simpler example for pedagogical purposes. Also note that we don't really care about the units of the features for the time being, but in a real-world case, you definitely would." 629 | ] 630 | }, 631 | { 632 | "cell_type": "code", 633 | "execution_count": null, 634 | "metadata": {}, 635 | "outputs": [], 636 | "source": [ 637 | "# One data point\n", 638 | "x = np.array([10, 29, 2])" 639 | ] 640 | }, 641 | { 642 | "cell_type": "markdown", 643 | "metadata": {}, 644 | "source": [ 645 | "Now it's time to build a single layer perceptron using NumPy:" 646 | ] 647 | }, 648 | { 649 | "cell_type": "code", 650 | "execution_count": null, 651 | "metadata": {}, 652 | "outputs": [], 653 | "source": [ 654 | "# Set weights, one for each feature\n", 655 | "w = ____\n", 656 | "# Set bias\n", 657 | "b = 0\n", 658 | "# Compute weighted sum + bias\n", 659 | "y = ____\n", 660 | "print(y)" 661 | ] 662 | }, 663 | { 664 | "cell_type": "markdown", 665 | "metadata": {}, 666 | "source": [ 667 | "### HANDS ON: Single Layer Perceptron for classification" 668 | ] 669 | }, 670 | { 671 | "cell_type": "markdown", 672 | "metadata": {}, 673 | "source": [ 674 | "As stated, this wass a regressor, in that it predicts a contiuous variable. Classically, single layer perceptrons were classifiers. For a classification challenge, you can threshold the output of the regressor by using a step function, for example:" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": null, 680 | "metadata": {}, 681 | "outputs": [], 682 | "source": [ 683 | "# One data point\n", 684 | "x = np.array([10, 29, 2])\n", 685 | "# Set weights, one for each feature\n", 686 | "w = ____\n", 687 | "# Set bias\n", 688 | "b = 0\n", 689 | "# Compute weighted sum + bias\n", 690 | "y = ____\n", 691 | "# Threshold the output of the regressor using a step function\n", 692 | "z = ____\n", 693 | "print(z)" 694 | ] 695 | }, 696 | { 697 | "cell_type": "markdown", 698 | "metadata": {}, 699 | "source": [ 700 | "For bonus points, you can also turn it into a logistic regression classifier:" 701 | ] 702 | }, 703 | { 704 | "cell_type": "code", 705 | "execution_count": null, 706 | "metadata": {}, 707 | "outputs": [], 708 | "source": [ 709 | "def sigmoid(Z):\n", 710 | " return 1/(1+np.exp(-Z))\n", 711 | "# One data point\n", 712 | "x = np.array([10, 29, 2])\n", 713 | "# Set weights, one for each feature\n", 714 | "w = ____\n", 715 | "# Set bias\n", 716 | "b = 0\n", 717 | "# Compute weighted sum + bias\n", 718 | "y = ____\n", 719 | "# Threshold the output of the regressor using a sigmoid function\n", 720 | "z = ____\n", 721 | "print(z)" 722 | ] 723 | }, 724 | { 725 | "cell_type": "markdown", 726 | "metadata": {}, 727 | "source": [ 728 | "### HANDS ON: SLP For many data points" 729 | ] 730 | }, 731 | { 732 | "cell_type": "markdown", 733 | "metadata": {}, 734 | "source": [ 735 | "This was using a SLP for a single data point. 
You'll now write code to generalize to multiple data points:" 736 | ] 737 | }, 738 | { 739 | "cell_type": "markdown", 740 | "metadata": {}, 741 | "source": [ 742 | "We'll stick with the same toy e-commerce example:" 743 | ] 744 | }, 745 | { 746 | "cell_type": "code", 747 | "execution_count": null, 748 | "metadata": {}, 749 | "outputs": [], 750 | "source": [ 751 | "# Create 5 data points\n", 752 | "x = np.array([[10, 29, 2], [23, 3, 9], [11, 4, 3], [6, 15, 2], [15, 3, 3]])\n", 753 | "print(x)" 754 | ] 755 | }, 756 | { 757 | "cell_type": "markdown", 758 | "metadata": {}, 759 | "source": [ 760 | "Now you're going to hand code an SLP regressor and classifier for these 5 data points:" 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": null, 766 | "metadata": {}, 767 | "outputs": [], 768 | "source": [ 769 | "# SLP for regression\n", 770 | "# Set weights, one for each feature\n", 771 | "w = np.random.normal(size=3)\n", 772 | "# Set bias\n", 773 | "b = -25\n", 774 | "# Compute weighted sum + bias\n", 775 | "y = ____\n", 776 | "y" 777 | ] 778 | }, 779 | { 780 | "cell_type": "code", 781 | "execution_count": null, 782 | "metadata": {}, 783 | "outputs": [], 784 | "source": [ 785 | "#SLP for classification\n", 786 | "# Set weights, one for each feature\n", 787 | "w = ____\n", 788 | "# Set bias\n", 789 | "b = 0\n", 790 | "# Compute weighted sum + bias\n", 791 | "y = ____\n", 792 | "print(y)\n", 793 | "# Threshold the output of the regressor using a step function\n", 794 | "z = ____\n", 795 | "z" 796 | ] 797 | }, 798 | { 799 | "cell_type": "markdown", 800 | "metadata": {}, 801 | "source": [ 802 | "For bonus points, you can also turn it into a logistic regression classifier:" 803 | ] 804 | }, 805 | { 806 | "cell_type": "code", 807 | "execution_count": null, 808 | "metadata": {}, 809 | "outputs": [], 810 | "source": [ 811 | "# Logreg\n", 812 | "# Set weights, one for each feature\n", 813 | "w = ____\n", 814 | "# Set bias\n", 815 | "b = 25\n", 816 | "# Compute weighted sum + bias\n", 817 | "y = ____\n", 818 | "# Threshold the output of the regressor using logreg\n", 819 | "z = ____\n", 820 | "z" 821 | ] 822 | }, 823 | { 824 | "cell_type": "markdown", 825 | "metadata": {}, 826 | "source": [ 827 | "### Multilayer perceptron" 828 | ] 829 | }, 830 | { 831 | "cell_type": "markdown", 832 | "metadata": {}, 833 | "source": [ 834 | "Neural networks generally have many layers between the input and output layers. These layers are called *hidden layers*. To see how these work, let's add one layer to get a multilayer perceptron, such as in the image below! Image from [here](https://www.researchgate.net/figure/A-hypothetical-example-of-Multilayer-Perceptron-Network_fig4_303875065). " 835 | ] 836 | }, 837 | { 838 | "cell_type": "markdown", 839 | "metadata": {}, 840 | "source": [ 841 | "![title](../img/mlp.png)" 842 | ] 843 | }, 844 | { 845 | "cell_type": "markdown", 846 | "metadata": {}, 847 | "source": [ 848 | "Notes:\n", 849 | "* Each of the 5 node in 1st hidden layer has 4 inputs so it will have a 4 x 5 array for weights;\n", 850 | "* The output layer has one node and 5 inputs so will have a 5 x 1 array of weights." 851 | ] 852 | }, 853 | { 854 | "cell_type": "markdown", 855 | "metadata": {}, 856 | "source": [ 857 | "Let's stick with the toy e-commerce example from above (which has 3 inputs, not 4, remember). 
We'll first define the data and the weights:" 858 | ] 859 | }, 860 | { 861 | "cell_type": "code", 862 | "execution_count": null, 863 | "metadata": {}, 864 | "outputs": [], 865 | "source": [ 866 | "x = np.array([[10, 29, 2]]) # generate data\n", 867 | "w1 = np.random.normal(size=(3, 5)) # weights for hidden layer\n", 868 | "w2 = np.random.normal(size=(5, 1)) # weights for output layer\n", 869 | "b1 = np.random.normal(size=(1, 5))\n", 870 | "b2 = np.random.normal(size=(1, 1))" 871 | ] 872 | }, 873 | { 874 | "cell_type": "markdown", 875 | "metadata": {}, 876 | "source": [ 877 | "And now we'll build our MLP classifier: for each layer, we'll perform the same computation as we did for the SLP above:" 878 | ] 879 | }, 880 | { 881 | "cell_type": "code", 882 | "execution_count": null, 883 | "metadata": {}, 884 | "outputs": [], 885 | "source": [ 886 | "# MLP classifier\n", 887 | "# First layer\n", 888 | "y1 = ____ # @ is matrix multiplication (generalization of dot product above)\n", 889 | "print(y1)\n", 890 | "# Second layer\n", 891 | "y2 = ____\n", 892 | "print(y2)\n", 893 | "# Output thresholding\n", 894 | "z = ____\n", 895 | "print(z)" 896 | ] 897 | }, 898 | { 899 | "cell_type": "markdown", 900 | "metadata": {}, 901 | "source": [ 902 | "Notes:\n", 903 | "* We've used a sigmoid function in the final layer. \"True perceptrons\" use a (Heaviside step) function but, generally speaking, if you use other functions, such as a sigmoid, it's still called a multilayer perceptron ([there's more about this here on wikipedia](https://en.wikipedia.org/wiki/Multilayer_perceptron));\n", 904 | "* Similarly, \"true perceptrons\" are classifiers *but* MLPs, in the more general sense, can also be regressors;\n", 905 | "* To build the MLP above, we've essentially just concatenated two linear operations so we still only have a linear regression! If the problem is non-linear, this won't be much use. To deal with non-linearities, we use activation functions. Let's do it!" 906 | ] 907 | }, 908 | { 909 | "cell_type": "markdown", 910 | "metadata": {}, 911 | "source": [ 912 | "### Activation functions" 913 | ] 914 | }, 915 | { 916 | "cell_type": "markdown", 917 | "metadata": {}, 918 | "source": [ 919 | "Historically, `tanh` has been a popular activation function (see below). We've also already seen sigmoid. 
A popular one these days is ReLU (Rectified Linear Unit), which is defined by:" 920 | ] 921 | }, 922 | { 923 | "cell_type": "code", 924 | "execution_count": null, 925 | "metadata": {}, 926 | "outputs": [], 927 | "source": [ 928 | "def relu(x):\n", 929 | " \"\"\"Computes ReLu function\"\"\"\n", 930 | " return np.maximum(0,x)" 931 | ] 932 | }, 933 | { 934 | "cell_type": "markdown", 935 | "metadata": {}, 936 | "source": [ 937 | "At this point, we won't dive into which ones to use when in great detail but did want to highlight some common ones.\n", 938 | "For rules of thumb around which to use, consider the output layer:\n", 939 | "\n", 940 | "- If the output needs to be squished between 0 and 1, use classic \"sigmoid\".\n", 941 | "- If the output needs to be positive-only, use ReLU.\n", 942 | "- If the output needs to be squished between -1 and 1, use tanh.\n", 943 | "\n", 944 | "\n", 945 | "Let's plot these functions together, to get a sense of what they look like:" 946 | ] 947 | }, 948 | { 949 | "cell_type": "code", 950 | "execution_count": null, 951 | "metadata": {}, 952 | "outputs": [], 953 | "source": [ 954 | "# Set range of x-axis\n", 955 | "x = np.arange(-5, 5, 0.1)\n", 956 | "# Figure size\n", 957 | "plt.figure(figsize=(10,8))\n", 958 | "# Plot the curves\n", 959 | "plt.plot(x, relu(x), linewidth=4, label=\"relu\");\n", 960 | "plt.plot(x, sigmoid(x), linewidth=4, label=\"sigmoid\")\n", 961 | "plt.plot(x, np.tanh(x), linewidth=4, label=\"tanh\")\n", 962 | "plt.legend(loc=\"upper left\");" 963 | ] 964 | }, 965 | { 966 | "cell_type": "markdown", 967 | "metadata": {}, 968 | "source": [ 969 | "**Note:** It's pretty cool that ReLU has been so powerful in introducing non-linearities into deep learning when it itself is piecewise linear with only two linear components!" 
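A quick way to convince yourself of this: the absolute-value function is clearly non-linear, yet it is just the sum of two ReLUs. A tiny check (using `np.maximum` directly, the same operation as the `relu` defined above):

```python
import numpy as np

xs = np.linspace(-3, 3, 13)
# |x| = relu(x) + relu(-x): a genuinely non-linear function built from two piecewise-linear ReLU terms
print(np.allclose(np.abs(xs), np.maximum(0, xs) + np.maximum(0, -xs)))  # True
```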
970 | ] 971 | }, 972 | { 973 | "cell_type": "markdown", 974 | "metadata": {}, 975 | "source": [ 976 | "### HANDS ON: Adding activation functions to your MLP" 977 | ] 978 | }, 979 | { 980 | "cell_type": "markdown", 981 | "metadata": {}, 982 | "source": [ 983 | "Use ReLU MLP now for 5 data points and one hidden layer, which has 8 nodes:" 984 | ] 985 | }, 986 | { 987 | "cell_type": "code", 988 | "execution_count": null, 989 | "metadata": {}, 990 | "outputs": [], 991 | "source": [ 992 | "# 5 data points\n", 993 | "x = np.array([[10, 29, 2], [23, 3, 9], [11, 4, 3], [6, 15, 2], [15, 3, 3]])" 994 | ] 995 | }, 996 | { 997 | "cell_type": "code", 998 | "execution_count": null, 999 | "metadata": {}, 1000 | "outputs": [], 1001 | "source": [ 1002 | "# generate weights and biases\n", 1003 | "w1 = np.random.normal(size=____)\n", 1004 | "w2 = np.random.normal(size=____)\n", 1005 | "b1 = np.random.normal(size=____)\n", 1006 | "b2 = np.random.normal(loc = 50, size=(1, 1))" 1007 | ] 1008 | }, 1009 | { 1010 | "cell_type": "code", 1011 | "execution_count": null, 1012 | "metadata": {}, 1013 | "outputs": [], 1014 | "source": [ 1015 | "# Compute 1st layer, including activation function\n", 1016 | "y1 = ____\n", 1017 | "z1 = ____\n", 1018 | "print(z1)" 1019 | ] 1020 | }, 1021 | { 1022 | "cell_type": "code", 1023 | "execution_count": null, 1024 | "metadata": {}, 1025 | "outputs": [], 1026 | "source": [ 1027 | "# second layer + activation\n", 1028 | "z2 = ____\n", 1029 | "y = ____\n", 1030 | "print(y)" 1031 | ] 1032 | }, 1033 | { 1034 | "cell_type": "markdown", 1035 | "metadata": {}, 1036 | "source": [ 1037 | "### Deeper Networks" 1038 | ] 1039 | }, 1040 | { 1041 | "cell_type": "markdown", 1042 | "metadata": {}, 1043 | "source": [ 1044 | "Now we'll build a regressor with 4 hidden layers, in which each layer has 3 nodes (setting $b=0$ throughout to simplify slightly):" 1045 | ] 1046 | }, 1047 | { 1048 | "cell_type": "code", 1049 | "execution_count": null, 1050 | "metadata": {}, 1051 | "outputs": [], 1052 | "source": [ 1053 | "# 5 data points\n", 1054 | "x = np.array([[10, 29, 2], [23, 3, 9], [11, 4, 3], [6, 15, 2], [15, 3, 3]])\n", 1055 | "n = 4 # number of hidden layers\n", 1056 | "# Initialize weights dictionary\n", 1057 | "weights = {}\n", 1058 | "for i in range(n):\n", 1059 | " #print(f\"weights_{i}\")\n", 1060 | " # Set weights for each layer\n", 1061 | " weights[i] = ____\n", 1062 | "weights[n] = ____\n", 1063 | "weights[0]" 1064 | ] 1065 | }, 1066 | { 1067 | "cell_type": "code", 1068 | "execution_count": null, 1069 | "metadata": {}, 1070 | "outputs": [], 1071 | "source": [ 1072 | "# forward propogation\n", 1073 | "y = ____ # first layer\n", 1074 | "for i in range(n-1):\n", 1075 | " y = ____ # hidden layers\n", 1076 | "y = ____ # final layer\n", 1077 | "print(y)" 1078 | ] 1079 | }, 1080 | { 1081 | "cell_type": "markdown", 1082 | "metadata": {}, 1083 | "source": [ 1084 | "**A note on representation learning:** one of the most important sub-tasks of machine learning is feature engineering. One interesting aspect of deep learning is that neural networks tend to learn features implicitly, as a result of their structure. For example, the more layers a neural network has, the more complex features it can recognise: early layers can identify edges, then then combinations of edges, then corners, the more complex features, and so on. 
For more on representation learning, check out [\"Representation Learning: A Review and New Perspectives\"](https://arxiv.org/abs/1206.5538) by Bengio et al.\n", 1085 | "\n", 1086 | "Now we've got a handle on forward prop., let's dive into thinking about fitting/training our neural networks, backprop, and gradient descent!" 1087 | ] 1088 | }, 1089 | { 1090 | "cell_type": "markdown", 1091 | "metadata": {}, 1092 | "source": [ 1093 | "## 2.2 Backpropagation" 1094 | ] 1095 | }, 1096 | { 1097 | "cell_type": "markdown", 1098 | "metadata": {}, 1099 | "source": [ 1100 | "Backpropagation is the algorithm used to optimize the weights of neural networks. Before jumping into backprop, let's first check out how gradient descent can be used to optimize the weights of perceptrons." 1101 | ] 1102 | }, 1103 | { 1104 | "cell_type": "markdown", 1105 | "metadata": {}, 1106 | "source": [ 1107 | "## Gradient descent" 1108 | ] 1109 | }, 1110 | { 1111 | "cell_type": "markdown", 1112 | "metadata": {}, 1113 | "source": [ 1114 | "Now we know how to use forward propogation to make predictions, it's time to think about how to train a neural network! That is, how we determine the best model parameters. Reminder: our NN model parameters are the weights and biases.\n", 1115 | "\n", 1116 | "We want to minimize the difference between the target variable $y$ and the prediction made by our forward propagation algorithm. So after a forward pass, we use *gradient descent* to change the weights and then do another forward pass and see if we have improved our predictions. Image below from [here](https://www.datasciencecentral.com/profiles/blogs/alternatives-to-the-gradient-descent-algorithm)." 1117 | ] 1118 | }, 1119 | { 1120 | "cell_type": "markdown", 1121 | "metadata": {}, 1122 | "source": [ 1123 | "![title](../img/gradient-descent.png)" 1124 | ] 1125 | }, 1126 | { 1127 | "cell_type": "markdown", 1128 | "metadata": {}, 1129 | "source": [ 1130 | "### Gradient Descent and the Single Layer Perceptron" 1131 | ] 1132 | }, 1133 | { 1134 | "cell_type": "markdown", 1135 | "metadata": {}, 1136 | "source": [ 1137 | "Gradient descent is about optimizing the weights after a round of forward propagation." 1138 | ] 1139 | }, 1140 | { 1141 | "cell_type": "markdown", 1142 | "metadata": {}, 1143 | "source": [ 1144 | "![title](../img/perceptron.jpg)" 1145 | ] 1146 | }, 1147 | { 1148 | "cell_type": "markdown", 1149 | "metadata": {}, 1150 | "source": [ 1151 | "Let's remind ourselves of SLP regressor forward propagation for a single data point. 
Let's write a function for our SLP, as we've written it out a few times already (if you do it 3+ times, write a function!):" 1152 | ] 1153 | }, 1154 | { 1155 | "cell_type": "code", 1156 | "execution_count": null, 1157 | "metadata": {}, 1158 | "outputs": [], 1159 | "source": [ 1160 | "# write SLP function\n", 1161 | "def slp(x, w, b):\n", 1162 | " \"\"\"Computes single layer perceptron\"\"\"\n", 1163 | " y = ____\n", 1164 | " return y" 1165 | ] 1166 | }, 1167 | { 1168 | "cell_type": "markdown", 1169 | "metadata": {}, 1170 | "source": [ 1171 | "Now let's use this function to do one forward pass for a single data point:" 1172 | ] 1173 | }, 1174 | { 1175 | "cell_type": "code", 1176 | "execution_count": null, 1177 | "metadata": {}, 1178 | "outputs": [], 1179 | "source": [ 1180 | "# for single data point\n", 1181 | "x = np.array([[10, 29, 2, 7]])/100\n", 1182 | "y = 10 # target variable\n", 1183 | "b = 0 # bias \n", 1184 | "w = ____ # initialize weights\n", 1185 | "# Compute SLP\n", 1186 | "y_hat = ____\n", 1187 | "print(y_hat)" 1188 | ] 1189 | }, 1190 | { 1191 | "cell_type": "markdown", 1192 | "metadata": {}, 1193 | "source": [ 1194 | "* Discuss how we want to shift each weight slightly in a direction that will improve the prediction *so* we look at how bad the prediction was (prediction minus actual value), take the dot product with the relevant xs, and multiply by the learning rate (which we set and this decides how drastic the changes to the weights will be). This is essentially calculating the slope and we then move down in that direction!\n", 1195 | "* also note that if the prediction is correct, then y_hat - y is zero and there's no change at all.\n", 1196 | "* note that we're updating all weights simultaneously also (in contrast to backpropagation, as we'll see)." 1197 | ] 1198 | }, 1199 | { 1200 | "cell_type": "markdown", 1201 | "metadata": {}, 1202 | "source": [ 1203 | "Now let's hand code one pass of gradient descent:" 1204 | ] 1205 | }, 1206 | { 1207 | "cell_type": "code", 1208 | "execution_count": null, 1209 | "metadata": {}, 1210 | "outputs": [], 1211 | "source": [ 1212 | "# Forward prop\n", 1213 | "y_hat = slp(x, w, b)\n", 1214 | "# Set learning rate\n", 1215 | "learning_rate = 0.1\n", 1216 | "# The change in w\n", 1217 | "delta_w = learning_rate*((y_hat - y) * x)\n", 1218 | "# The change in b\n", 1219 | "delta_b = learning_rate*(y_hat - y) \n", 1220 | "# Change weights and bias\n", 1221 | "w = (w + delta_w).reshape(4)\n", 1222 | "b = b + delta_b" 1223 | ] 1224 | }, 1225 | { 1226 | "cell_type": "markdown", 1227 | "metadata": {}, 1228 | "source": [ 1229 | "### HANDS ON: Plot model performance over epochs" 1230 | ] 1231 | }, 1232 | { 1233 | "cell_type": "markdown", 1234 | "metadata": {}, 1235 | "source": [ 1236 | "It's your turn to now plot the difference between y_hat and y as we alternate between forward prop and updating the weights using gradient descent. Note that a round of forward prop and gradient descent in this case is called an _epoch_. More generally, an _epoch_ is \"one pass over the dataset\" (not just one forward prop and gradient descent run. So for example, if you have 2000 data points, and a batch size of 50 -- how many data points you use in each round of forward prop & gradient descent -- then you have 40 iterations before you hit one epoch." 
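Before you write the loop, it may help to see where that update rule comes from. For a single example with the squared-error loss (a standard derivation, written in this notebook's notation):

$$
E = \tfrac{1}{2}\,(\hat{y} - y)^2, \qquad \hat{y} = w \cdot x + b,
$$

$$
\frac{\partial E}{\partial w_i} = (\hat{y} - y)\,x_i, \qquad \frac{\partial E}{\partial b} = \hat{y} - y,
$$

so one gradient-descent step nudges each parameter against its slope: $w \leftarrow w - \eta\,(\hat{y} - y)\,x$ and $b \leftarrow b - \eta\,(\hat{y} - y)$, where $\eta$ is the learning rate. Note that if the prediction is exactly right, $\hat{y} - y = 0$ and nothing changes.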
1237 | ] 1238 | }, 1239 | { 1240 | "cell_type": "code", 1241 | "execution_count": null, 1242 | "metadata": { 1243 | "scrolled": true 1244 | }, 1245 | "outputs": [], 1246 | "source": [ 1247 | "# define some lists to plot y_hat - y as we iterate\n", 1248 | "y = 0.1\n", 1249 | "n_epochs = 30\n", 1250 | "diff = list() # initialize list of errors\n", 1251 | "for _ in range(n_epochs):\n", 1252 | " # Forward prop (SLP)\n", 1253 | " y_hat = ____\n", 1254 | " # Append error to diff\n", 1255 | " ____\n", 1256 | " # Set learning rate\n", 1257 | " learning_rate = 0.1\n", 1258 | " # Change in w, b\n", 1259 | " delta_w = ____\n", 1260 | " delta_b = ____ \n", 1261 | " #print(delta_w)\n", 1262 | " # Perform gradient descent by making change to w, b\n", 1263 | " w = ____\n", 1264 | " b = ____\n", 1265 | "plt.plot(diff);" 1266 | ] 1267 | }, 1268 | { 1269 | "cell_type": "markdown", 1270 | "metadata": {}, 1271 | "source": [ 1272 | "### HANDS ON: For many data points" 1273 | ] 1274 | }, 1275 | { 1276 | "cell_type": "markdown", 1277 | "metadata": {}, 1278 | "source": [ 1279 | "It's now time to perform the same but for many data points! Instead of plotting the difference (which will be an array/vector), we'll plot the dot product of the difference with itself divided by the number of data points, which is a measure of distance (this is the mean squared error, if you know that term!)." 1280 | ] 1281 | }, 1282 | { 1283 | "cell_type": "code", 1284 | "execution_count": null, 1285 | "metadata": {}, 1286 | "outputs": [], 1287 | "source": [ 1288 | "# Create data array\n", 1289 | "x = np.array([[10, 29, 2], [23, 3, 9], [11, 4, 3], [6, 15, 2], [15, 3, 3]])/10\n", 1290 | "w = np.random.normal(size=3) # weights\n", 1291 | "b = 0.1 # bias\n", 1292 | "diff = list() # initialize list of MSE\n", 1293 | "for _ in range(10):\n", 1294 | " # Forward prop (SLP)\n", 1295 | " y_hat = ____\n", 1296 | " # Append MSE to diff\n", 1297 | " ____\n", 1298 | " # Set learning rate\n", 1299 | " learning_rate = 0.1\n", 1300 | " # Change in w\n", 1301 | " delta_w = ____\n", 1302 | " #print(delta_w)\n", 1303 | " # Perform gradient descent by making change to w\n", 1304 | " w = ____\n", 1305 | "plt.plot(diff);" 1306 | ] 1307 | }, 1308 | { 1309 | "cell_type": "markdown", 1310 | "metadata": {}, 1311 | "source": [ 1312 | "### Backpropagation" 1313 | ] 1314 | }, 1315 | { 1316 | "cell_type": "markdown", 1317 | "metadata": {}, 1318 | "source": [ 1319 | "In a similar manner to how forward prop takes the input data of features through your neural network and outputs a prediction in the output layer, backprop takes the error from your prediction and propogates it back through the network.\n", 1320 | "\n", 1321 | "Backprop calculates the slopes necessary to update the weights as it propagates back through the network. To calculate these slopes, we need use the chain rule from calculus, which is outside the scope of this notebook. You can find out more from Sebastian Raschka's great work [here](https://sebastianraschka.com/faq/docs/backprop-arbitrary.html).\n", 1322 | "\n", 1323 | "To say a little more in order to give a sense of things, what we're doing when performing backpropagation is we're approximating the slope of the error (loss) function with respect to each weight. 
What the chain rule tells us is that the slope (gradient) of the error with respect to a given weight is the product of:\n", 1324 | "\n", 1325 | "- the value of the node going into that weight,\n", 1326 | "- the slope of the activation function at the weight's output, and\n", 1327 | "- the slope of the error (loss) function with respect to the node it goes into." 1328 | ] 1329 | }, 1330 | { 1331 | "cell_type": "markdown", 1332 | "metadata": {}, 1333 | "source": [ 1334 | "Below you can see a schematic of backprop (many thanks to [Sebastian Raschka](https://sebastianraschka.com/) for this!). The mathematics and calculus are not really the important parts at the moment, but more to understand the _flow_ of backprop through the network, in the opposite direction to forward propagation. In the next notebook, you'll see how to hand-code backprop using NumPy." 1335 | ] 1336 | }, 1337 | { 1338 | "cell_type": "markdown", 1339 | "metadata": {}, 1340 | "source": [ 1341 | "![](https://github.com/rasbt/python-machine-learning-book-3rd-edition/blob/master/ch12/images/12_12.png)" 1342 | ] 1343 | }, 1344 | { 1345 | "cell_type": "markdown", 1346 | "metadata": {}, 1347 | "source": [ 1348 | "![alt text](../img/rasbt-backprop.png)" 1349 | ] 1350 | } 1351 | ], 1352 | "metadata": { 1353 | "kernelspec": { 1354 | "display_name": "Python 3", 1355 | "language": "python", 1356 | "name": "python3" 1357 | }, 1358 | "language_info": { 1359 | "codemirror_mode": { 1360 | "name": "ipython", 1361 | "version": 3 1362 | }, 1363 | "file_extension": ".py", 1364 | "mimetype": "text/x-python", 1365 | "name": "python", 1366 | "nbconvert_exporter": "python", 1367 | "pygments_lexer": "ipython3", 1368 | "version": "3.6.10" 1369 | } 1370 | }, 1371 | "nbformat": 4, 1372 | "nbformat_minor": 1 1373 | } 1374 | -------------------------------------------------------------------------------- /notebooks/2-Instructor-deep-learning-from-scratch-pytorch.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Deep learning from scratch" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Learning objectives of the notebook" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "- Extend the ideas developed in the previous notebook to build a neural network using functions;\n", 22 | "- Implement functions for the initialization of, forward propagation through, and updating of a neural network;\n", 23 | "- Apply the logic of backpropagation in more detail;\n", 24 | "- Hand-code gradient descent using `numpy` for a more realistic size of problem." 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "Having spent spent some time working on the ideas of supervised learning and getting familiar with the terminology of neural networks, let's write some code to implement a neural network from scratch. We're going to use a functional programming style to help build intuition. To make matters easier, we'll use a dictionary called `model` to store all data associated with the neural network (the weight matrices, the bias vectors, etc.) and pass that into functions as a single argument. We'll also assume that the activation functions are the same in all the layers (i.e., the *logistic* or *sigmoid* function) to simplify the implementation. 
Production codes usually use an object-oriented style to build networks and, of course, are optimized for efficiency (unlike what we'll develop here)." 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "We're going to borrow notation from Michael Neilsen's [*Neural Networks and Deep Learning*](http://neuralnetworksanddeeplearning.com) to make life easier. In particular, we'll let $W^\\ell$ and $b^\\ell$ denote the weight matrices & bias vectors respectively associated with the $\\ell$th layer of the network. The entry $W^{\\ell}_{jk}$ of $W^\\ell$ is the weight parameter associated with the link connecting the $k$th neuron in layer $\\ell-1$ to the $j$th neuron in layer $\\ell$:\n", 39 | "\n", 40 | "[![](../img/tikz16.png)](http://neuralnetworksanddeeplearning.com/chap2.html)\n", 41 | "\n", 42 | "Let's put this altogether now and construct a network from scratch. We start with some typical imports." 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": null, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "%matplotlib inline\n", 52 | "import numpy as np\n", 53 | "import matplotlib.pyplot as plt" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "## 1. Create an initialization function to set up model\n", 61 | "\n", 62 | "Rather than the fixed constants in `setup` from before, write a function `initialize_model` that accepts a list `dimensions` of positive integer inputs that constructs a `dict` with specific key-value pairs:\n", 63 | "+ `model['nlayers']` : number of layers in neural network\n", 64 | "+ `model['weights']` : list of NumPy matrices with appropriate dimensions\n", 65 | "+ `model['biases']` : list of NumPy (column) vectors of appropriate dimensions\n", 66 | "+ The matrices in `model['weights']` and the vectors in `model['biases']` should be initialized as randomly arrays of the appropriate shapes.\n", 67 | "\n", 68 | "If the input list `dimensions` has `L+1` entries, the number of layers is `L` (the first entry of `dimensions` is the input dimension, the next ones are the number of units/neurons in each subsequent layer going up to the output layer).\n", 69 | "Thus, for example:\n", 70 | "\n", 71 | "```python\n", 72 | ">>> dimensions = [784, 15, 10]\n", 73 | ">>> model = initialize_model(dimensions)\n", 74 | ">>> for k, (W, b) in enumerate(zip(model['weights'], model['biases'])):\n", 75 | ">>> print(f'Layer {k+1}:\\tShape of W{k+1}: {W.shape}\\tShape of b{k+1}: {b.shape}')\n", 76 | "```\n", 77 | "```\n", 78 | "Layer 1:\tShape of W1: (15, 784)\tShape of b1: (15, 1)\n", 79 | "Layer 2:\tShape of W2: (10, 15)\tShape of b2: (10, 1)\n", 80 | "```" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "def initialize_model(dimensions):\n", 90 | " '''Accepts a list of positive integers; returns a dict 'model' with key/values as follows:\n", 91 | " model['nlayers'] : number of layers in neural network\n", 92 | " model['weights'] : list of NumPy matrices with appropriate dimensions\n", 93 | " model['biases'] : list of NumPy (column) vectors of appropriate dimensions\n", 94 | " These correspond to the weight matrices & bias vectors associated with each layer of a neural network.'''\n", 95 | " weights, biases = [], []\n", 96 | " L = len(dimensions) - 1 # number of layers (i.e., excludes input layer)\n", 97 | " for l in range(L):\n", 98 | " W = np.random.randn(dimensions[l+1], 
dimensions[l])\n", 99 | " b = np.random.randn(dimensions[l+1], 1)\n", 100 | " weights.append(W)\n", 101 | " biases.append(b)\n", 102 | " return dict(weights=weights, biases=biases, nlayers=L)" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "# Use a test example to illustrate that the network is initialized as needed\n", 112 | "dimensions = [784, 15, 10]\n", 113 | "model = initialize_model(dimensions)\n", 114 | "for k, (W, b) in enumerate(zip(model['weights'], model['biases'])):\n", 115 | " print(f'Layer {k+1}:\\tShape of W{k+1}: {W.shape}\\tShape of b{k+1}: {b.shape}')" 116 | ] 117 | }, 118 | { 119 | "cell_type": "code", 120 | "execution_count": null, 121 | "metadata": {}, 122 | "outputs": [], 123 | "source": [ 124 | "# Let's examine the weight matrix & bias vector associated with the second layer.\n", 125 | "print(f'W2:\\n\\n{model[\"weights\"][1]}') # Expect a 10x15 matrix of random numbers\n", 126 | "print(f'b2:\\n\\n{model[\"biases\"][1]}') # Expect a 10x1 vector of random numbers" 127 | ] 128 | }, 129 | { 130 | "cell_type": "markdown", 131 | "metadata": {}, 132 | "source": [ 133 | "## 2. Implement activation function(s), loss functions, & their derivatives\n", 134 | "For today's purposes, we'll use only the *logistic* or *sigmoid* function as an activation function:\n", 135 | "$$ \\sigma(x) = \\frac{1}{1+\\exp(-x)} = \\frac{\\exp(x)}{1+\\exp(x)}.$$\n", 136 | "A bit of calculus shows that\n", 137 | "$$ \\sigma'(x) = \\sigma(x)(1-\\sigma(x)) .$$\n", 138 | "\n", 139 | "Actually, a more numerically robust formula for $\\sigma(x)$ (i.e., one that works for large positive or large negative input equally well) is\n", 140 | "$$\n", 141 | "\\sigma(x) = \\begin{cases} \\frac{1}{1+\\exp(-x)} & (x\\ge0) \\\\ 1 - \\frac{1}{1+\\exp(x)} & \\mathrm{otherwise} \\end{cases}.\n", 142 | "$$" 143 | ] 144 | }, 145 | { 146 | "cell_type": "markdown", 147 | "metadata": {}, 148 | "source": [ 149 | "For the loss function, we'll use the typical \"$L_2$-norm of the error\" (alternatively called *mean-square error (MSE)* when averaged over a batch of values:\n", 150 | "$$ \\mathcal{E}(\\hat{y},y) = \\frac{1}{2} \\|\\hat{y}-y\\|^{2} = \\frac{1}{2} \\sum_{k=1}^{d} \\left[ \\hat{y}_{k}-y_{k} \\right]^{2}.$$\n", 151 | "Again, using multivariable calculus, we can see that\n", 152 | "$$\\nabla_{\\hat{y}} \\mathcal{E}(\\hat{y},y) = \\hat{y} - y.$$\n", 153 | "\n", 154 | "Implement all four of these functions below." 
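(In case that \"bit of calculus\" isn't obvious, the derivative identity follows directly from the chain rule:)

$$
\sigma'(x) = \frac{d}{dx}\bigl(1 + e^{-x}\bigr)^{-1} = \frac{e^{-x}}{\bigl(1 + e^{-x}\bigr)^{2}} = \frac{1}{1 + e^{-x}}\cdot\frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr).
$$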
155 | ] 156 | }, 157 | { 158 | "cell_type": "code", 159 | "execution_count": null, 160 | "metadata": {}, 161 | "outputs": [], 162 | "source": [ 163 | "def sigma(x):\n", 164 | " '''The logistic function; accepts arbitrary arrays as input (vectorized)'''\n", 165 | " return np.where(x>=0, 1/(1+np.exp(-x)), 1 - 1/(1+np.exp(x))) # piecewise for numerical robustness\n", 166 | "def sigma_prime(x):\n", 167 | " '''The *derivative* of the logistic function; accepts arbitrary arrays as input (vectorized)'''\n", 168 | " return sigma(x)*(1-sigma(x)) # Derivative of logistic function" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": null, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "def loss(yhat, y):\n", 178 | " '''The loss as measured by the L2-norm squared of the error'''\n", 179 | " return 0.5 * np.square(yhat-y).sum()\n", 180 | "def loss_prime(yhat, y):\n", 181 | " '''Implementation of the gradient of the loss function'''\n", 182 | " return (yhat - y) # gradient w.r.t. yhat" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "## 3. Implement a function for forward propagation\n", 190 | "\n", 191 | "Write a function `forward` that uses the architecture described in a `dict` as created by `initialize_model` to evaluate the output of the neural network for a given input *column* vector `x`.\n", 192 | "+ Take $a^{0}=x$ from the input.\n", 193 | "+ For $\\ell=1,\\dotsc,L$, compute & store the intermediate computed vectors $z^{\\ell}=W^{\\ell}a^{\\ell-1}+b^{\\ell}$ (the *weighted inputs*) and $a^{\\ell}=\\sigma\\left(z^{\\ell}\\right)$ (the *activations*) in an updated dictionary `model`. That is, modify the input dictionary `model` so as to accumulate:\n", 194 | " + `model['activations']`: a list with entries $a^{\\ell}$ for $\\ell=0,\\dotsc,L$\n", 195 | " + `model['z_inputs']`: a list with entries $z^{\\ell}$ for $\\ell=1,\\dotsc,L$\n", 196 | "+ The function should return the computed output $a^{L}$ and the modified dictionary `model`.\n", 197 | "Notice that input `z` can be a matrix of dimension $n_{0} \\times N_{\\mathrm{batch}}$ corresponding to a batch of input vectors (here, $n_0$ is the dimension of the expected input vectors)." 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": null, 203 | "metadata": {}, 204 | "outputs": [], 205 | "source": [ 206 | "# Abstract process into function and run tests again.\n", 207 | "def forward(x, model):\n", 208 | " '''Implementation of forward propagation through a feed-forward neural network.\n", 209 | " x : input array oriented column-wise (i.e., features along the rows)\n", 210 | " model : dict with same keys as output of initialize_model & appropriate lists in 'weights' & 'biases'\n", 211 | " The output dict model is the same as the input with additional keys 'z_inputs' & 'activations';\n", 212 | " these are accumulated to be used later for backpropagation. 
Notice the lists model['z_inputs'] &\n", 213 | " model['activations'] both have the same number of entries as model['weights'] & model['biases']\n", 214 | " (one for each layer).\n", 215 | " '''\n", 216 | " a = x\n", 217 | " activations = [a]\n", 218 | " zs = []\n", 219 | " for W, b in zip(model['weights'], model['biases']):\n", 220 | " z = W @ a + b\n", 221 | " a = sigma(z)\n", 222 | " zs.append(z)\n", 223 | " activations.append(a)\n", 224 | " model['activations'], model['z_inputs'] = activations, zs\n", 225 | " return (a, model)" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": {}, 232 | "outputs": [], 233 | "source": [ 234 | "# Use a test example to illustrate that the network is initialized as needed\n", 235 | "dimensions = [784, 15, 10]\n", 236 | "model = initialize_model(dimensions)\n", 237 | "print(f'Before executing *forward*:\\nkeys == {model.keys()}')" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "N_batch = 3 # Let's use, say, 3 random inputs & their corresponding outputs\n", 247 | "x_input = np.random.rand(dimensions[0], N_batch)\n", 248 | "y = np.random.rand(dimensions[-1], N_batch)" 249 | ] 250 | }, 251 | { 252 | "cell_type": "code", 253 | "execution_count": null, 254 | "metadata": {}, 255 | "outputs": [], 256 | "source": [ 257 | "y_hat, model = forward(x_input, model) # the dict model is *updated* by forward propagation\n", 258 | "print(f'After executing *forward*:\\nkeys == {model.keys()}')\n", 259 | "# Observe additional dict keys: 'activations' & 'z_inputs'" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": {}, 265 | "source": [ 266 | "### Algorithm for backpropagation:\n", 267 | "\n", 268 | "#### (optional reading for the mathematically brave)\n", 269 | "\n", 270 | "The description here is based on the *wonderfully concise* description from Michael Neilsen's [*Neural Networks and Deep Learning*](http://neuralnetworksanddeeplearning.com/chap2.html). Neilsen has artfully crafted a summary using the bare minimum mathematical prerequisites. The notation elegantly summarises the important ideas in a way to make implementation easy in array-based frameworks like Matlab or NumPy. This is the best description I (Dhavide) know of that does this.\n", 271 | "\n", 272 | "In the following, $\\mathcal{E}$ is the loss function and the symbol $\\odot$ is the [*Hadamard product*](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) of two conforming arrays; this is simply a fancy way of writing the usual element-wise product of arrays as computed by NumPy & is sometimes called the *Schur product*. This can be reformulated in usual matrix algebra for analysis.\n", 273 | "\n", 274 | "Given a neural network with $L$ layers (not including the \"input layer\") described by an appropriate architecture:\n", 275 | "\n", 276 | "1. Input $x$: Set the corresponding activation $a^{0} \\leftarrow x$ for the input layer.\n", 277 | "2. Feedforward: For each $\\ell=1,2,\\dotsc,L$, compute *weighted inputs* $z^{\\ell}$ & *activations* $a^{\\ell}$ using the formulas\n", 278 | "$$\n", 279 | "\\begin{aligned}\n", 280 | "z^{\\ell} & \\leftarrow W^{\\ell} a^{\\ell-1} + b^{\\ell}, \\\\\n", 281 | "a^{\\ell} & \\leftarrow \\sigma\\left( z^{\\ell}\\right)\n", 282 | "\\end{aligned}.\n", 283 | "$$\n", 284 | "3. 
Starting from the end, compute the \"error\" in the output layer $\\delta^{L}$ according to the formula\n", 285 | "$$\n", 286 | "\\delta^{L} \\leftarrow \\nabla_{a^{L}} \\mathcal{E} \\odot \\sigma'\\left(z^{L}\\right)\n", 287 | "$$\n", 288 | "\n", 289 | "4. *Backpropagate* the \"error\" for $\\ell=L−1\\dotsc,1$ using the formula\n", 290 | "$$\n", 291 | "\\delta^{\\ell} \\leftarrow \\left[ W^{\\ell+1}\\right]^{T}\\delta^{\\ell+1} \\odot \\sigma'\\left(z^{\\ell}\\right).\n", 292 | "$$\n", 293 | "5. The required gradients of the loss function $\\mathcal{E}$ with respect to the parameters $W^{\\ell}_{p,q}$ and $b^{\\ell}_{r}$ can be computed directly from the \"errors\" $\\left\\{ \\delta^{\\ell} \\right\\}$ and the weighted inputs $\\left\\{ z^{\\ell} \\right\\}$ according to the relations\n", 294 | "$$\n", 295 | "\\begin{aligned}\n", 296 | " \\frac{\\partial \\mathcal{E}}{\\partial W^{\\ell}_{p,q}} &= a^{\\ell-1}_{q} \\delta^{\\ell}_{p} &&(\\ell=1,\\dotsc,L)\\\\\n", 297 | " \\frac{\\partial \\mathcal{E}}{\\partial b^{\\ell}_{r}} &= \\delta^{\\ell}_{r} &&\n", 298 | "\\end{aligned}\n", 299 | "$$" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "## 4. Implement a function for backward propagation\n", 307 | "\n", 308 | "**This one is a freebie!**\n", 309 | "\n", 310 | "Implement a function `backward` that implements the back-propagation algorithm to compute the gradients of the loss function $\\mathcal{E}$ with respect to the weight matrices $W^{\\ell}$ and the bias vectors $b^{\\ell}$.\n", 311 | "+ The function should accept a column vector `y` of output labels and an appropriate dictionary `model` as input.\n", 312 | "+ The dict `model` is assumed to have been generated *after* a call to `forward`; that is, `model` should have keys `'w_inputs'` and `'activations'` as computed by a call to `forward`.\n", 313 | "+ The result will be a modified dictionary `model` with two additional key-value pairs:\n", 314 | " + `model['grad_weights']`: a list with entries $\\nabla_{W^{\\ell}} \\mathcal{E}$ for $\\ell=1,\\dotsc,L$\n", 315 | " + `model['grad_biases']`: a list with entries $\\nabla_{b^{\\ell}} \\mathcal{E}$ for $\\ell=1,\\dotsc,L$\n", 316 | "+ Notice the dimensions of the matrices $\\nabla_{W^{\\ell}} \\mathcal{E}$ and the vectors $\\nabla_{b^{\\ell}} \\mathcal{E}$ will be identical to those of ${W^{\\ell}}$ and ${b^{\\ell}}$ respectively.\n", 317 | "+ The function's return value is the modified dictionary `model`.\n", 318 | "\n", 319 | "\n", 320 | "We've done this for you (in the interest of time). Notice that input `y` can be a matrix of dimension $n_{L} \\times N_{\\mathrm{batch}}$ corresponding to a batch of output vectors (here, $n_L$ is the number of units in the output layer)." 
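Once `backward` is defined (just below), a handy optional sanity check, not part of the original exercises, is to compare one analytically computed gradient entry against a finite-difference estimate. A rough sketch, assuming the `initialize_model`, `forward`, and `loss` functions from this notebook:

```python
eps = 1e-5
model = initialize_model([784, 15, 10])
x = np.random.rand(784, 3)
y = np.random.rand(10, 3)

_, model = forward(x, model)
model = backward(y, model)
analytic = model['grad_weights'][-1][0, 0]   # gradient w.r.t. one weight in the output layer

W = model['weights'][-1]
W[0, 0] += eps
loss_plus = loss(forward(x, model)[0], y)    # loss with that weight nudged up
W[0, 0] -= 2 * eps
loss_minus = loss(forward(x, model)[0], y)   # loss with that weight nudged down
W[0, 0] += eps                               # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)
print(analytic, numeric)                     # these two numbers should agree closely
```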
321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "metadata": {}, 327 | "outputs": [], 328 | "source": [ 329 | "def backward(y, model):\n", 330 | " '''Implementation of backward propagation of data through the network\n", 331 | " y : output array oriented column-wise (i.e., features along the rows) as output by forward\n", 332 | " model : dict with same keys as output by forward\n", 333 | " Note the input needs to have keys 'nlayers', 'weights', 'biases', 'z_inputs', and 'activations'\n", 334 | " '''\n", 335 | " Nbatch = y.shape[1] # Needed to extend for batches of vectors\n", 336 | " # Compute the \"error\" delta^L for the output layer\n", 337 | " yhat = model['activations'][-1]\n", 338 | " z, a = model['z_inputs'][-1], model['activations'][-2]\n", 339 | " delta = loss_prime(yhat, y) * sigma_prime(z)\n", 340 | " # Use delta^L to compute gradients w.r.t b & W in the output layer.\n", 341 | " grad_b, grad_W = delta @ np.ones((Nbatch, 1)), np.dot(delta, a.T)\n", 342 | " grad_weights, grad_biases = [grad_W], [grad_b]\n", 343 | " loop_iterates = zip(model['weights'][-1:0:-1],\n", 344 | " model['z_inputs'][-2::-1],\n", 345 | " model['activations'][-3::-1])\n", 346 | " for W, z, a in loop_iterates:\n", 347 | " delta = np.dot(W.T, delta) * sigma_prime(z)\n", 348 | " grad_b, grad_W = delta @ np.ones((Nbatch, 1)), np.dot(delta, a.T)\n", 349 | " grad_weights.append(grad_W)\n", 350 | " grad_biases.append(grad_b)\n", 351 | " # We built up lists of gradients backwards, so we reverse the lists\n", 352 | " model['grad_weights'], model['grad_biases'] = grad_weights[::-1], grad_biases[::-1]\n", 353 | " return model" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": null, 359 | "metadata": {}, 360 | "outputs": [], 361 | "source": [ 362 | "# Use the test example from above. Assume model, x_input have been initialized & *forward* has been executed already.\n", 363 | "print(f'Before executing *backward*:\\nkeys == {model.keys()}')" 364 | ] 365 | }, 366 | { 367 | "cell_type": "code", 368 | "execution_count": null, 369 | "metadata": {}, 370 | "outputs": [], 371 | "source": [ 372 | "model = backward(y, model) # the dict model is updated *again* by backward propagation\n", 373 | "print(f'After executing *backward*:\\nkeys == {model.keys()}')\n", 374 | "# Observe additional dict keys: 'grad_weights' & 'grad_biases'" 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "## 5. 
Implement a function to update the model parameters using computed gradients.\n", 382 | "\n", 383 | "Given some positive learning rate $\\eta>0$, we want to change all the weights and biases using their gradients.\n", 384 | "Write a function `update` to compute a single step of gradient descent assuming that the model gradients have been computed for a given input vector.\n", 385 | "+ The functions signature should be `update(eta, model)` where `eta` is a positive scalar value and `model` is a dictionary as output from `backward`.\n", 386 | "+ The result will be an updated model with the values updated for `model['weights']` and `model['biases']`.\n", 387 | "+ Written using array notations, these updates can be expressed as\n", 388 | " $$\n", 389 | " \\begin{aligned}\n", 390 | " W^{\\ell} &\\leftarrow W^{\\ell} - \\eta \\nabla_{W^{\\ell}} \\mathcal{E} &&(\\ell=1,\\dotsc,L)\\\\\n", 391 | " b^{\\ell} &\\leftarrow b^{\\ell} - \\eta \\nabla_{b^{\\ell}} \\mathcal{E} &&\n", 392 | " \\end{aligned}.\n", 393 | " $$\n", 394 | "+ Written out component-wise, the preceding array expressions would be written as\n", 395 | " $$\n", 396 | " \\begin{aligned}\n", 397 | " W^{\\ell}_{p,q} &\\leftarrow W^{\\ell}_{p,q} - \\eta \\frac{\\partial \\mathcal{E}}{\\partial W^{\\ell}_{p,q}}\n", 398 | " &&(\\ell=1,\\dotsc,L)\\\\\n", 399 | " b^{\\ell}_{r} &\\leftarrow b^{\\ell}_{r} - \\eta \\frac{\\partial \\mathcal{E}}{\\partial b^{\\ell}_{r}} &&\n", 400 | " \\end{aligned}\n", 401 | " $$.\n", 402 | "+ For safety, have the update step delete the keys added by calls to `forward` and `backward`, i.e., the keys `'z_inputs'`, `'activations'`, `'grad_weights'`, & `'grad_biases'`.\n", 403 | "+ The output should be a dict `model` like before." 404 | ] 405 | }, 406 | { 407 | "cell_type": "code", 408 | "execution_count": null, 409 | "metadata": {}, 410 | "outputs": [], 411 | "source": [ 412 | "def update(eta, model):\n", 413 | " '''Use learning rate and gradients to update model parameters\n", 414 | " eta : learning rate (positive scalar parameter)\n", 415 | " model : dict with same keys as output by backward\n", 416 | " Output result is a modified dict model\n", 417 | " '''\n", 418 | " new_weights, new_biases = [], []\n", 419 | " for W, b, dW, db in zip(model['weights'], model['biases'], model['grad_weights'], model['grad_biases']):\n", 420 | " new_weights.append(W - (eta * dW))\n", 421 | " new_biases.append(b- (eta * db))\n", 422 | " model['weights'] = new_weights\n", 423 | " model['biases'] = new_biases\n", 424 | " # Get rid of extraneous keys/values\n", 425 | " for key in ['z_inputs', 'activations', 'grad_weights', 'grad_biases']:\n", 426 | " del model[key]\n", 427 | " return model" 428 | ] 429 | }, 430 | { 431 | "cell_type": "code", 432 | "execution_count": null, 433 | "metadata": {}, 434 | "outputs": [], 435 | "source": [ 436 | "# Use the test example from above. Assume *forward* & *backward* have been executed already.\n", 437 | "print(f'Before executing *update*:\\nkeys == {model.keys()}')" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": {}, 444 | "outputs": [], 445 | "source": [ 446 | "eta = 0.5 # Choice of learning rate\n", 447 | "model = update(eta, model) # the dict model is updated *again* by calling *update*\n", 448 | "print(f'After executing *update*:\\nkeys == {model.keys()}')\n", 449 | "# Observe fewer dict keys: extraneous keys have been freed." 
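As an optional aside: once `forward`, `backward`, and `update` are all working, a quick way to build confidence in the backpropagation code is to compare one of its gradients against a centered finite-difference estimate. The following is a minimal sketch, assuming the `initialize_model`, `forward`, `backward`, and `loss` functions defined above; the tiny network size and the names `check`, `analytic`, and `numeric` are illustrative only.

```python
# Sanity-check sketch (assumes initialize_model, forward, backward, loss from above):
# compare one backprop gradient against a centered finite-difference estimate.
import numpy as np

check = initialize_model([5, 4, 3])        # tiny network keeps the check cheap
x = np.random.randn(5, 1)                  # one input column vector
y = np.random.randn(3, 1)                  # one target column vector

_, check = forward(x, check)               # populates 'z_inputs' & 'activations'
check = backward(y, check)                 # populates 'grad_weights' & 'grad_biases'
analytic = check['grad_weights'][0][0, 0]  # dE/dW^1_{0,0} from backpropagation

eps = 1e-6                                 # perturbation size for the estimate
W1 = check['weights'][0]
W1[0, 0] += eps
loss_plus = loss(forward(x, check)[0], y)
W1[0, 0] -= 2 * eps
loss_minus = loss(forward(x, check)[0], y)
W1[0, 0] += eps                            # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)

print(f'backprop: {analytic:.8f}\tcentered difference: {numeric:.8f}')
```

The two printed numbers should agree to several decimal places; a large discrepancy usually points to an indexing mistake in the backpropagation loop.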
450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": null, 455 | "metadata": {}, 456 | "outputs": [], 457 | "source": [ 458 | "# Observe the required sequence of executions: (forward -> backward -> update -> forward -> backward -> ...)\n", 459 | "# If done out of sequence, results in KeyError\n", 460 | "backward(y, model) # This should cause an exception (KeyError)" 461 | ] 462 | }, 463 | { 464 | "cell_type": "markdown", 465 | "metadata": {}, 466 | "source": [ 467 | "## 6. Implement steepest descent in a loop for random training data\n", 468 | "\n", 469 | "Let's now attempt to use our NumPy-based model to implement the steepest descent algorithm. We'll explain these numbers shortly in the context of the MNIST digit classification problem.\n", 470 | "\n", 471 | "+ Generate random arrays `X` and `y` of dimensions $28^2 \\times N_{\\mathrm{batch}}$ and $10\\times N_{\\mathrm{batch}}$ respectively where $N_{\\mathrm{batch}}=10$.\n", 472 | "+ Initialize the network architecture using `initialize_model` as above to require an input layer of $28^2$ units, a hidden layer of 15 units, and an output layer of 10 units.\n", 473 | "+ Choose a learning rate of, say, $\\eta=0.5$ and a number of epochs `n_epoch` of, say, $30$.\n", 474 | "+ Construct a for loop with `n_epochs` iterations in which:\n", 475 | " + The output `yhat` is computed from the input`X` using `forward`.\n", 476 | " + The function `backward` is called to compute the gradients of the loss function with respect to the weights and biases.\n", 477 | " + Update the network parameters using the function `update`.\n", 478 | " + Compute and print out the epoch (iteration counter) and the value of the loss function." 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": null, 484 | "metadata": {}, 485 | "outputs": [], 486 | "source": [ 487 | "N_batch = 10\n", 488 | "n_epochs = 30\n", 489 | "dimensions = [784, 15, 10]\n", 490 | "X = np.random.randn(dimensions[0], N_batch)\n", 491 | "y = np.random.randn(dimensions[-1], N_batch)\n", 492 | "eta = 0.5\n", 493 | "model = initialize_model(dimensions)\n", 494 | "\n", 495 | "for epoch in range(n_epochs):\n", 496 | " yhat, model = forward(X, model)\n", 497 | " err = loss(yhat, y)\n", 498 | " print(f'Epoch: {epoch}\\tLoss: {err}')\n", 499 | " model = backward(y, model)\n", 500 | " model = update(eta, model)\n", 501 | "\n", 502 | "# Expect to see loss values decreasing systematically in each iteration." 503 | ] 504 | }, 505 | { 506 | "cell_type": "markdown", 507 | "metadata": {}, 508 | "source": [ 509 | "## 7. Modify the steepest descent loop to make a plot\n", 510 | "\n", 511 | "Let's alter the preceding loop to accumulate selected epoch & loss values in lists for plotting.\n", 512 | "\n", 513 | "+ Set `N_batch` and `n_epochs` to be larger, say, $50$ and $30,000$ respectively.\n", 514 | "+ Change the preceding `for` loop so that:\n", 515 | " + The `epoch` counter and the loss value are accumulated into lists every, say, `SKIP` iterations where `SKIP==500`.\n", 516 | " + Eliminate the `print` statement(s) to save on output.\n", 517 | "+ After the `for` loop terminates, make a `semilogy` plot to verify that the loss function is actually decreasing with sucessive epochs.\n", 518 | " + Use the list `epochs` to accumulate the `epoch` every 500 epochs.\n", 519 | " + Use the list `losses` to accumulate the values of the loss function every 500 epochs." 
520 | ] 521 | }, 522 | { 523 | "cell_type": "code", 524 | "execution_count": null, 525 | "metadata": {}, 526 | "outputs": [], 527 | "source": [ 528 | "N_batch = 50\n", 529 | "n_epochs = 30000\n", 530 | "SKIP = 50\n", 531 | "dimensions = [784, 15, 10]\n", 532 | "X = np.random.randn(dimensions[0], N_batch)\n", 533 | "y = np.random.randn(dimensions[-1], N_batch)\n", 534 | "eta = 0.5\n", 535 | "model = initialize_model(dimensions)\n", 536 | "\n", 537 | "# accumulate the epoch and loss in these respective lists\n", 538 | "epochs, losses = [], []\n", 539 | "for epoch in range(n_epochs):\n", 540 | " yhat, model = forward(X, model)\n", 541 | " model = backward(y, model)\n", 542 | " model = update(eta, model)\n", 543 | " if (divmod(epoch, SKIP)[1]==0):\n", 544 | " err = loss(yhat, y)\n", 545 | " epochs.append(epoch)\n", 546 | " losses.append(err)" 547 | ] 548 | }, 549 | { 550 | "cell_type": "code", 551 | "execution_count": null, 552 | "metadata": {}, 553 | "outputs": [], 554 | "source": [ 555 | "# code for plotting once that the lists epochs and losses are accumulated\n", 556 | "fig = plt.figure(); ax = fig.add_subplot(111)\n", 557 | "ax.set_xlim([0,n_epochs]); ax.set_ylim([min(losses), max(losses)]);\n", 558 | "ax.set_xticks(epochs[::500]); ax.set_xlabel(\"Epochs\"); ax.grid(True);\n", 559 | "ax.set_ylabel(r'$\\mathcal{E}$'); \n", 560 | "h1 = ax.semilogy(epochs, losses, 'r-', label=r'$\\mathcal{E}$')\n", 561 | "plt.title('Loss vs. epochs');" 562 | ] 563 | } 564 | ], 565 | "metadata": { 566 | "kernelspec": { 567 | "display_name": "Python 3", 568 | "language": "python", 569 | "name": "python3" 570 | }, 571 | "language_info": { 572 | "codemirror_mode": { 573 | "name": "ipython", 574 | "version": 3 575 | }, 576 | "file_extension": ".py", 577 | "mimetype": "text/x-python", 578 | "name": "python", 579 | "nbconvert_exporter": "python", 580 | "pygments_lexer": "ipython3", 581 | "version": "3.6.10" 582 | } 583 | }, 584 | "nbformat": 4, 585 | "nbformat_minor": 4 586 | } 587 | -------------------------------------------------------------------------------- /notebooks/2-Student-deep-learning-from-scratch-pytorch.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Deep learning from scratch" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Learning objectives of the notebook" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "- Extend the ideas developed in the previous notebook to build a neural network using functions;\n", 22 | "- Implement functions for the initialization of, forward propagation through, and updating of a neural network;\n", 23 | "- Apply the logic of backpropagation in more detail;\n", 24 | "- Hand-code gradient descent using `numpy` for a more realistic size of problem." 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "Having spent spent some time working on the ideas of supervised learning and getting familiar with the terminology of neural networks, let's write some code to implement a neural network from scratch. We're going to use a functional programming style to help build intuition. To make matters easier, we'll use a dictionary called `model` to store all data associated with the neural network (the weight matrices, the bias vectors, etc.) and pass that into functions as a single argument. 
We'll also assume that the activation functions are the same in all the layers (i.e., the *logistic* or *sigmoid* function) to simplify the implementation. Production codes usually use an object-oriented style to build networks and, of course, are optimized for efficiency (unlike what we'll develop here)." 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "We're going to borrow notation from Michael Neilsen's [*Neural Networks and Deep Learning*](http://neuralnetworksanddeeplearning.com) to make life easier. In particular, we'll let $W^\\ell$ and $b^\\ell$ denote the weight matrices & bias vectors respectively associated with the $\\ell$th layer of the network. The entry $W^{\\ell}_{jk}$ of $W^\\ell$ is the weight parameter associated with the link connecting the $k$th neuron in layer $\\ell-1$ to the $j$th neuron in layer $\\ell$:\n", 39 | "\n", 40 | "[![](../img/tikz16.png)](http://neuralnetworksanddeeplearning.com/chap2.html)\n", 41 | "\n", 42 | "Let's put this altogether now and construct a network from scratch. We start with some typical imports." 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": null, 48 | "metadata": {}, 49 | "outputs": [], 50 | "source": [ 51 | "%matplotlib inline\n", 52 | "import numpy as np\n", 53 | "import matplotlib.pyplot as plt" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "## 1. Create an initialization function to set up model\n", 61 | "\n", 62 | "Rather than the fixed constants in `setup` from before, write a function `initialize_model` that accepts a list `dimensions` of positive integer inputs that constructs a `dict` with specific key-value pairs:\n", 63 | "+ `model['nlayers']` : number of layers in neural network\n", 64 | "+ `model['weights']` : list of NumPy matrices with appropriate dimensions\n", 65 | "+ `model['biases']` : list of NumPy (column) vectors of appropriate dimensions\n", 66 | "+ The matrices in `model['weights']` and the vectors in `model['biases']` should be initialized as randomly arrays of the appropriate shapes.\n", 67 | "\n", 68 | "If the input list `dimensions` has `L+1` entries, the number of layers is `L` (the first entry of `dimensions` is the input dimension, the next ones are the number of units/neurons in each subsequent layer going up to the output layer).\n", 69 | "Thus, for example:\n", 70 | "\n", 71 | "```python\n", 72 | ">>> dimensions = [784, 15, 10]\n", 73 | ">>> model = initialize_model(dimensions)\n", 74 | ">>> for k, (W, b) in enumerate(zip(model['weights'], model['biases'])):\n", 75 | ">>> print(f'Layer {k+1}:\\tShape of W{k+1}: {W.shape}\\tShape of b{k+1}: {b.shape}')\n", 76 | "```\n", 77 | "```\n", 78 | "Layer 1:\tShape of W1: (15, 784)\tShape of b1: (15, 1)\n", 79 | "Layer 2:\tShape of W2: (10, 15)\tShape of b2: (10, 1)\n", 80 | "```" 81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "def initialize_model(dimensions):\n", 90 | " '''Accepts a list of positive integers; returns a dict 'model' with key/values as follows:\n", 91 | " model['nlayers'] : number of layers in neural network\n", 92 | " model['weights'] : list of NumPy matrices with appropriate dimensions\n", 93 | " model['biases'] : list of NumPy (column) vectors of appropriate dimensions\n", 94 | " These correspond to the weight matrices & bias vectors associated with each layer of a neural network.'''\n", 95 | " #\n", 96 | " # Fill in your code 
here.\n", 97 | " #\n", 98 | " return # Should return a dict as described." 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": null, 104 | "metadata": {}, 105 | "outputs": [], 106 | "source": [] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": null, 111 | "metadata": {}, 112 | "outputs": [], 113 | "source": [] 114 | }, 115 | { 116 | "cell_type": "markdown", 117 | "metadata": {}, 118 | "source": [ 119 | "## 2. Implement activation function(s), loss functions, & their derivatives\n", 120 | "For today's purposes, we'll use only the *logistic* or *sigmoid* function as an activation function:\n", 121 | "$$ \\sigma(x) = \\frac{1}{1+\\exp(-x)} = \\frac{\\exp(x)}{1+\\exp(x)}.$$\n", 122 | "A bit of calculus shows that\n", 123 | "$$ \\sigma'(x) = \\sigma(x)(1-\\sigma(x)) .$$\n", 124 | "\n", 125 | "Actually, a more numerically robust formula for $\\sigma(x)$ (i.e., one that works for large positive or large negative input equally well) is\n", 126 | "$$\n", 127 | "\\sigma(x) = \\begin{cases} \\frac{1}{1+\\exp(-x)} & (x\\ge0) \\\\ 1 - \\frac{1}{1+\\exp(x)} & \\mathrm{otherwise} \\end{cases}.\n", 128 | "$$" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": {}, 134 | "source": [ 135 | "For the loss function, we'll use the typical \"$L_2$-norm of the error\" (alternatively called *mean-square error (MSE)* when averaged over a batch of values:\n", 136 | "$$ \\mathcal{E}(\\hat{y},y) = \\frac{1}{2} \\|\\hat{y}-y\\|^{2} = \\frac{1}{2} \\sum_{k=1}^{d} \\left[ \\hat{y}_{k}-y_{k} \\right]^{2}.$$\n", 137 | "Again, using multivariable calculus, we can see that\n", 138 | "$$\\nabla_{\\hat{y}} \\mathcal{E}(\\hat{y},y) = \\hat{y} - y.$$\n", 139 | "\n", 140 | "Implement all four of these functions below." 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": null, 146 | "metadata": {}, 147 | "outputs": [], 148 | "source": [ 149 | "def sigma(x):\n", 150 | " '''The logistic function; accepts arbitrary arrays as input (vectorized)'''\n", 151 | " return np.where(x>=0, 1/(1+np.exp(-x)), 1 - 1/(1+np.exp(x))) # piecewise for numerical robustness\n", 152 | "\n", 153 | "def sigma_prime(x):\n", 154 | " '''The *derivative* of the logistic function; accepts arbitrary arrays as input (vectorized)'''\n", 155 | " #\n", 156 | " # Fill in your code for the derivative of logistic function\n", 157 | " #\n", 158 | " return" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": {}, 165 | "outputs": [], 166 | "source": [ 167 | "def loss(yhat, y):\n", 168 | " '''The loss as measured by the L2-norm squared of the error'''\n", 169 | " #\n", 170 | " # Fill in your code for the loss function\n", 171 | " #\n", 172 | " return\n", 173 | "def loss_prime(yhat, y):\n", 174 | " '''Implementation of the gradient of the loss function'''\n", 175 | " #\n", 176 | " # Fill in your code for the derivative of loss function\n", 177 | " #\n", 178 | " return" 179 | ] 180 | }, 181 | { 182 | "cell_type": "markdown", 183 | "metadata": {}, 184 | "source": [ 185 | "## 3. 
Implement a function for forward propagation\n", 186 | "\n", 187 | "Write a function `forward` that uses the architecture described in a `dict` as created by `initialize_model` to evaluate the output of the neural network for a given input *column* vector `x`.\n", 188 | "+ Take $a^{0}=x$ from the input.\n", 189 | "+ For $\\ell=1,\\dotsc,L$, compute & store the intermediate computed vectors $z^{\\ell}=W^{\\ell}a^{\\ell-1}+b^{\\ell}$ (the *weighted inputs*) and $a^{\\ell}=\\sigma\\left(z^{\\ell}\\right)$ (the *activations*) in an updated dictionary `model`. That is, modify the input dictionary `model` so as to accumulate:\n", 190 | "  + `model['activations']`: a list with entries $a^{\\ell}$ for $\\ell=0,\\dotsc,L$\n", 191 | "  + `model['z_inputs']`: a list with entries $z^{\\ell}$ for $\\ell=1,\\dotsc,L$\n", 192 | "+ The function should return the computed output $a^{L}$ and the modified dictionary `model`.\n", 193 | "Notice that the input `x` can be a matrix of dimension $n_{0} \\times N_{\\mathrm{batch}}$ corresponding to a batch of input vectors (here, $n_0$ is the dimension of the expected input vectors)." 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": null, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "# Abstract process into function and run tests again.\n", 203 | "def forward(x, model):\n", 204 | "    '''Implementation of forward propagation through a feed-forward neural network.\n", 205 | "    x : input array oriented column-wise (i.e., features along the rows)\n", 206 | "    model : dict with same keys as output of initialize_model & appropriate lists in 'weights' & 'biases'\n", 207 | "    The output dict model is the same as the input with additional keys 'z_inputs' & 'activations';\n", 208 | "    these are accumulated to be used later for backpropagation. Notice model['z_inputs'] has the same\n", 209 | "    number of entries as model['weights'] & model['biases'] (one per layer), while model['activations']\n", 210 | "    has one extra entry for the input layer (a^0 = x).\n", 211 | "    '''\n", 212 | "    a = x\n", 213 | "    activations = [a]\n", 214 | "    zs = []\n", 215 | "    #\n", 216 | "    # Fill in the rest\n", 217 | "    #\n", 218 | "    return" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": null, 231 | "metadata": {}, 232 | "outputs": [], 233 | "source": [] 234 | }, 235 | { 236 | "cell_type": "markdown", 237 | "metadata": {}, 238 | "source": [ 239 | "### Algorithm for backpropagation:\n", 240 | "\n", 241 | "#### (optional reading for the mathematically brave)\n", 242 | "\n", 243 | "The description here is based on the *wonderfully concise* presentation in Michael Nielsen's [*Neural Networks and Deep Learning*](http://neuralnetworksanddeeplearning.com/chap2.html). Nielsen has artfully crafted a summary using the bare minimum mathematical prerequisites. The notation elegantly summarises the important ideas in a way that makes implementation easy in array-based frameworks like Matlab or NumPy. This is the best description I (Dhavide) know of that does this.\n", 244 | "\n", 245 | "In the following, $\\mathcal{E}$ is the loss function and the symbol $\\odot$ is the [*Hadamard product*](https://en.wikipedia.org/wiki/Hadamard_product_(matrices)) of two conforming arrays; this is simply a fancy way of writing the usual element-wise product of arrays as computed by NumPy & is sometimes called the *Schur product*. 
This can be reformulated using ordinary matrix algebra for analysis.\n", 246 | "\n", 247 | "Given a neural network with $L$ layers (not including the \"input layer\") described by an appropriate architecture:\n", 248 | "\n", 249 | "1. Input $x$: Set the corresponding activation $a^{0} \\leftarrow x$ for the input layer.\n", 250 | "2. Feedforward: For each $\\ell=1,2,\\dotsc,L$, compute *weighted inputs* $z^{\\ell}$ & *activations* $a^{\\ell}$ using the formulas\n", 251 | "$$\n", 252 | "\\begin{aligned}\n", 253 | "z^{\\ell} & \\leftarrow W^{\\ell} a^{\\ell-1} + b^{\\ell}, \\\\\n", 254 | "a^{\\ell} & \\leftarrow \\sigma\\left( z^{\\ell}\\right)\n", 255 | "\\end{aligned}.\n", 256 | "$$\n", 257 | "3. Starting from the end, compute the \"error\" in the output layer $\\delta^{L}$ according to the formula\n", 258 | "$$\n", 259 | "\\delta^{L} \\leftarrow \\nabla_{a^{L}} \\mathcal{E} \\odot \\sigma'\\left(z^{L}\\right)\n", 260 | "$$\n", 261 | "\n", 262 | "4. *Backpropagate* the \"error\" for $\\ell=L-1,\\dotsc,1$ using the formula\n", 263 | "$$\n", 264 | "\\delta^{\\ell} \\leftarrow \\left[ W^{\\ell+1}\\right]^{T}\\delta^{\\ell+1} \\odot \\sigma'\\left(z^{\\ell}\\right).\n", 265 | "$$\n", 266 | "5. The required gradients of the loss function $\\mathcal{E}$ with respect to the parameters $W^{\\ell}_{p,q}$ and $b^{\\ell}_{r}$ can be computed directly from the \"errors\" $\\left\\{ \\delta^{\\ell} \\right\\}$ and the weighted inputs $\\left\\{ z^{\\ell} \\right\\}$ according to the relations\n", 267 | "$$\n", 268 | "\\begin{aligned}\n", 269 | "    \\frac{\\partial \\mathcal{E}}{\\partial W^{\\ell}_{p,q}} &= a^{\\ell-1}_{q} \\delta^{\\ell}_{p} &&(\\ell=1,\\dotsc,L)\\\\\n", 270 | "    \\frac{\\partial \\mathcal{E}}{\\partial b^{\\ell}_{r}} &= \\delta^{\\ell}_{r} &&\n", 271 | "\\end{aligned}\n", 272 | "$$" 273 | ] 274 | }, 275 | { 276 | "cell_type": "markdown", 277 | "metadata": {}, 278 | "source": [ 279 | "## 4. Implement a function for backward propagation\n", 280 | "\n", 281 | "**This one is a freebie!**\n", 282 | "\n", 283 | "Implement a function `backward` that implements the back-propagation algorithm to compute the gradients of the loss function $\\mathcal{E}$ with respect to the weight matrices $W^{\\ell}$ and the bias vectors $b^{\\ell}$.\n", 284 | "+ The function should accept a column vector `y` of output labels and an appropriate dictionary `model` as input.\n", 285 | "+ The dict `model` is assumed to have been generated *after* a call to `forward`; that is, `model` should have keys `'z_inputs'` and `'activations'` as computed by a call to `forward`.\n", 286 | "+ The result will be a modified dictionary `model` with two additional key-value pairs:\n", 287 | "    + `model['grad_weights']`: a list with entries $\\nabla_{W^{\\ell}} \\mathcal{E}$ for $\\ell=1,\\dotsc,L$\n", 288 | "    + `model['grad_biases']`: a list with entries $\\nabla_{b^{\\ell}} \\mathcal{E}$ for $\\ell=1,\\dotsc,L$\n", 289 | "+ Notice the dimensions of the matrices $\\nabla_{W^{\\ell}} \\mathcal{E}$ and the vectors $\\nabla_{b^{\\ell}} \\mathcal{E}$ will be identical to those of ${W^{\\ell}}$ and ${b^{\\ell}}$ respectively.\n", 290 | "+ The function's return value is the modified dictionary `model`.\n", 291 | "\n", 292 | "\n", 293 | "We've done this for you (in the interest of time). Notice that input `y` can be a matrix of dimension $n_{L} \\times N_{\\mathrm{batch}}$ corresponding to a batch of output vectors (here, $n_L$ is the number of units in the output layer)."
294 | ] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": {}, 300 | "outputs": [], 301 | "source": [ 302 | "def backward(y, model):\n", 303 | " '''Implementation of backward propagation of data through the network\n", 304 | " y : output array oriented column-wise (i.e., features along the rows) as output by forward\n", 305 | " model : dict with same keys as output by forward\n", 306 | " Note the input needs to have keys 'nlayers', 'weights', 'biases', 'z_inputs', and 'activations'\n", 307 | " '''\n", 308 | " Nbatch = y.shape[1] # Needed to extend for batches of vectors\n", 309 | " # Compute the \"error\" delta^L for the output layer\n", 310 | " yhat = model['activations'][-1]\n", 311 | " z, a = model['z_inputs'][-1], model['activations'][-2]\n", 312 | " delta = loss_prime(yhat, y) * sigma_prime(z)\n", 313 | " # Use delta^L to compute gradients w.r.t b & W in the output layer.\n", 314 | " grad_b, grad_W = delta @ np.ones((Nbatch, 1)), np.dot(delta, a.T)\n", 315 | " grad_weights, grad_biases = [grad_W], [grad_b]\n", 316 | " loop_iterates = zip(model['weights'][-1:0:-1],\n", 317 | " model['z_inputs'][-2::-1],\n", 318 | " model['activations'][-3::-1])\n", 319 | " for W, z, a in loop_iterates:\n", 320 | " delta = np.dot(W.T, delta) * sigma_prime(z)\n", 321 | " grad_b, grad_W = delta @ np.ones((Nbatch, 1)), np.dot(delta, a.T)\n", 322 | " grad_weights.append(grad_W)\n", 323 | " grad_biases.append(grad_b)\n", 324 | " # We built up lists of gradients backwards, so we reverse the lists\n", 325 | " model['grad_weights'], model['grad_biases'] = grad_weights[::-1], grad_biases[::-1]\n", 326 | " return model" 327 | ] 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "metadata": {}, 332 | "source": [ 333 | "## 5. Implement a function to update the model parameters using computed gradients.\n", 334 | "\n", 335 | "Given some positive learning rate $\\eta>0$, we want to change all the weights and biases using their gradients.\n", 336 | "Write a function `update` to compute a single step of gradient descent assuming that the model gradients have been computed for a given input vector.\n", 337 | "+ The functions signature should be `update(eta, model)` where `eta` is a positive scalar value and `model` is a dictionary as output from `backward`.\n", 338 | "+ The result will be an updated model with the values updated for `model['weights']` and `model['biases']`.\n", 339 | "+ Written using array notations, these updates can be expressed as\n", 340 | " $$\n", 341 | " \\begin{aligned}\n", 342 | " W^{\\ell} &\\leftarrow W^{\\ell} - \\eta \\nabla_{W^{\\ell}} \\mathcal{E} &&(\\ell=1,\\dotsc,L)\\\\\n", 343 | " b^{\\ell} &\\leftarrow b^{\\ell} - \\eta \\nabla_{b^{\\ell}} \\mathcal{E} &&\n", 344 | " \\end{aligned}.\n", 345 | " $$\n", 346 | "+ Written out component-wise, the preceding array expressions would be written as\n", 347 | " $$\n", 348 | " \\begin{aligned}\n", 349 | " W^{\\ell}_{p,q} &\\leftarrow W^{\\ell}_{p,q} - \\eta \\frac{\\partial \\mathcal{E}}{\\partial W^{\\ell}_{p,q}}\n", 350 | " &&(\\ell=1,\\dotsc,L)\\\\\n", 351 | " b^{\\ell}_{r} &\\leftarrow b^{\\ell}_{r} - \\eta \\frac{\\partial \\mathcal{E}}{\\partial b^{\\ell}_{r}} &&\n", 352 | " \\end{aligned}\n", 353 | " $$.\n", 354 | "+ For safety, have the update step delete the keys added by calls to `forward` and `backward`, i.e., the keys `'z_inputs'`, `'activations'`, `'grad_weights'`, & `'grad_biases'`.\n", 355 | "+ The output should be a dict `model` like before." 
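As a small aside before filling in `update`: the rule above is ordinary gradient descent applied array by array. The sketch below illustrates the same rule on a made-up scalar problem $E(w)=(w-3)^2$ (nothing to do with the `model` dict; the function, learning rate, and step count are purely illustrative), so you can see what a single learning rate $\eta$ does before wiring it into the network.

```python
# Toy illustration of the update rule w <- w - eta * dE/dw on E(w) = (w - 3)^2.
# The gradient is dE/dw = 2*(w - 3), so each step moves w toward the minimizer 3.
w, eta = 0.0, 0.1
for step in range(25):
    grad = 2 * (w - 3)
    w = w - eta * grad
print(w)   # close to 3 after a few dozen steps; too large an eta would overshoot
```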
356 | ] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "execution_count": null, 361 | "metadata": {}, 362 | "outputs": [], 363 | "source": [ 364 | "def update(eta, model):\n", 365 | " '''Use learning rate and gradients to update model parameters\n", 366 | " eta : learning rate (positive scalar parameter)\n", 367 | " model : dict with same keys as output by backward\n", 368 | " Output result is a modified dict model\n", 369 | " '''\n", 370 | " new_weights, new_biases = [], []\n", 371 | " for W, b, dW, db in zip(model['weights'], model['biases'], model['grad_weights'], model['grad_biases']):\n", 372 | " new_weights.append(____)\n", 373 | " new_biases.append(____)\n", 374 | " model['weights'] = ____\n", 375 | " model['biases'] = ____\n", 376 | " # Get rid of extraneous keys/values\n", 377 | " for key in ['z_inputs', 'activations', 'grad_weights', 'grad_biases']:\n", 378 | " del model[key]\n", 379 | " return model" 380 | ] 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "metadata": {}, 385 | "source": [ 386 | "## 6. Implement steepest descent in a loop for random training data\n", 387 | "\n", 388 | "Let's now attempt to use our NumPy-based model to implement the steepest descent algorithm. We'll explain these numbers shortly in the context of the MNIST digit classification problem.\n", 389 | "\n", 390 | "+ Generate random arrays `X` and `y` of dimensions $28^2 \\times N_{\\mathrm{batch}}$ and $10\\times N_{\\mathrm{batch}}$ respectively where $N_{\\mathrm{batch}}=10$.\n", 391 | "+ Initialize the network architecture using `initialize_model` as above to require an input layer of $28^2$ units, a hidden layer of 15 units, and an output layer of 10 units.\n", 392 | "+ Choose a learning rate of, say, $\\eta=0.5$ and a number of epochs `n_epoch` of, say, $30$.\n", 393 | "+ Construct a for loop with `n_epochs` iterations in which:\n", 394 | " + The output `yhat` is computed from the input`X` using `forward`.\n", 395 | " + The function `backward` is called to compute the gradients of the loss function with respect to the weights and biases.\n", 396 | " + Update the network parameters using the function `update`.\n", 397 | " + Compute and print out the epoch (iteration counter) and the value of the loss function." 398 | ] 399 | }, 400 | { 401 | "cell_type": "code", 402 | "execution_count": null, 403 | "metadata": {}, 404 | "outputs": [], 405 | "source": [ 406 | "N_batch = 10\n", 407 | "n_epochs = 30\n", 408 | "dimensions = [784, 15, 10]\n", 409 | "X = ____\n", 410 | "y = ____\n", 411 | "eta = 0.5\n", 412 | "model = initialize_model(dimensions)\n", 413 | "\n", 414 | "for epoch in range(n_epochs):\n", 415 | " ___ = forward(X, model)\n", 416 | " err = loss(___)\n", 417 | " print(f'Epoch: {epoch}\\tLoss: {err}')\n", 418 | " # Fill in: compute the derivatives by backpropagation\n", 419 | " # Fill in: update the weights & biases" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": {}, 425 | "source": [ 426 | "## 7. 
Modify the steepest descent loop to make a plot\n", 427 | "\n", 428 | "Let's alter the preceding loop to accumulate selected epoch & loss values in lists for plotting.\n", 429 | "\n", 430 | "+ Set `N_batch` and `n_epochs` to be larger, say, $50$ and $30,000$ respectively.\n", 431 | "+ Change the preceding `for` loop so that:\n", 432 | " + The `epoch` counter and the loss value are accumulated into lists every, say, `SKIP` iterations where `SKIP==500`.\n", 433 | " + Eliminate the `print` statement(s) to save on output.\n", 434 | "+ After the `for` loop terminates, make a `semilogy` plot to verify that the loss function is actually decreasing with sucessive epochs.\n", 435 | " + Use the list `epochs` to accumulate the `epoch` every 500 epochs.\n", 436 | " + Use the list `losses` to accumulate the values of the loss function every 500 epochs." 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": null, 442 | "metadata": {}, 443 | "outputs": [], 444 | "source": [ 445 | "N_batch = 50\n", 446 | "n_epochs = 30000\n", 447 | "SKIP = 50\n", 448 | "dimensions = [784, 15, 10]\n", 449 | "X = ____\n", 450 | "y = ____\n", 451 | "eta = 0.5\n", 452 | "model = initialize_model(dimensions)\n", 453 | "\n", 454 | "# accumulate the epoch and loss in these respective lists every SKIP epochs\n", 455 | "epochs, losses = [], []\n", 456 | "for epoch in range(n_epochs):\n", 457 | " ___ = forward(X, model)\n", 458 | " err = loss(___)\n", 459 | " print(f'Epoch: {epoch}\\tLoss: {err}')\n", 460 | " # Fill in: compute the derivatives by backpropagation\n", 461 | " # Fill in: update the weights & biases" 462 | ] 463 | }, 464 | { 465 | "cell_type": "code", 466 | "execution_count": null, 467 | "metadata": {}, 468 | "outputs": [], 469 | "source": [ 470 | "# code for plotting once that the lists epochs and losses are accumulated\n", 471 | "fig = plt.figure(); ax = fig.add_subplot(111)\n", 472 | "ax.set_xlim([0,n_epochs]); ax.set_ylim([min(losses), max(losses)]);\n", 473 | "ax.set_xticks(epochs[::500]); ax.set_xlabel(\"Epochs\"); ax.grid(True);\n", 474 | "ax.set_ylabel(r'$\\mathcal{E}$'); \n", 475 | "h1 = ax.semilogy(epochs, losses, 'r-', label=r'$\\mathcal{E}$')\n", 476 | "plt.title('Loss vs. 
epochs');" 477 | ] 478 | } 479 | ], 480 | "metadata": { 481 | "kernelspec": { 482 | "display_name": "Python 3", 483 | "language": "python", 484 | "name": "python3" 485 | }, 486 | "language_info": { 487 | "codemirror_mode": { 488 | "name": "ipython", 489 | "version": 3 490 | }, 491 | "file_extension": ".py", 492 | "mimetype": "text/x-python", 493 | "name": "python", 494 | "nbconvert_exporter": "python", 495 | "pygments_lexer": "ipython3", 496 | "version": "3.6.10" 497 | } 498 | }, 499 | "nbformat": 4, 500 | "nbformat_minor": 4 501 | } 502 | -------------------------------------------------------------------------------- /notebooks/3-Instructor-deep-learning-from-scratch-pytorch.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Deep learning from scratch" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Learning objectives of the notebook" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "- Get used to working with *PyTorch Tensors*, the core data structure needed for working with neural networks;\n", 22 | "- Practice using the `autograd` capabilities of PyTorch Tensors to carry out backpropagation without all the pain;\n", 23 | "- Apply the useful PyTorch `torch.no_grad` context manager for managing memory consumption;\n", 24 | "- Convert a NumPy-based gradient descent algorithm into one relying on PyTorch Tensors!" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "# PyTorch Basics" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "[PyTorch](http://pytorch.org) is a Python-based scientific computing package to support deep learning research. It provides tensor support (a replacement of NumPy, of sorts) to provide a fast & flexible platform for experimenting with neural networks." 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "collapsed": false, 46 | "jupyter": { 47 | "outputs_hidden": false 48 | }, 49 | "slideshow": { 50 | "slide_type": "fragment" 51 | } 52 | }, 53 | "outputs": [], 54 | "source": [ 55 | "import torch\n", 56 | "import numpy as np\n", 57 | "print(f'PyTorch version: {torch.__version__}')\n", 58 | "print(f'NumPy version: {np.__version__}')" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "slideshow": { 65 | "slide_type": "subslide" 66 | } 67 | }, 68 | "source": [ 69 | "The principal data structures in PyTorch are *tensors*; these are pretty much the same as standard multidimensional NumPy arrays. To illustrate this, let's construct a matrix of zeros (of `long` or 64 bit integer `dtype`) in NumPy, and then in PyTorch." 
70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": { 76 | "collapsed": false, 77 | "jupyter": { 78 | "outputs_hidden": false 79 | }, 80 | "slideshow": { 81 | "slide_type": "fragment" 82 | } 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "# zeros construction in NumPy\n", 87 | "x_np = np.zeros((2,4), dtype=np.int64)\n", 88 | "print(x_np, x_np.dtype)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": { 95 | "collapsed": false, 96 | "jupyter": { 97 | "outputs_hidden": false 98 | }, 99 | "slideshow": { 100 | "slide_type": "fragment" 101 | } 102 | }, 103 | "outputs": [], 104 | "source": [ 105 | "# zeros construction in PyTorch\n", 106 | "x = torch.zeros(2, 4, dtype=torch.long)  # Observe difference in calling syntax!\n", 107 | "print(x, x.dtype)" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": { 113 | "slideshow": { 114 | "slide_type": "subslide" 115 | } 116 | }, 117 | "source": [ 118 | "You can query a tensor's size (dimensions) with the `size` method (contrast with NumPy array `shape` attribute)." 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": { 125 | "collapsed": false, 126 | "jupyter": { 127 | "outputs_hidden": false 128 | }, 129 | "slideshow": { 130 | "slide_type": "fragment" 131 | } 132 | }, 133 | "outputs": [], 134 | "source": [ 135 | "print(x)\n", 136 | "print(x.size())   # \"size\" is *method* for torch tensors\n", 137 | "print(x_np.shape) # 'shape' is *attribute* returning tuple\n", 138 | "print(x_np.size)  # \"size\" is *attribute* for np arrays" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": { 145 | "slideshow": { 146 | "slide_type": "fragment" 147 | } 148 | }, 149 | "outputs": [], 150 | "source": [ 151 | "# torch.Tensor.size() yields subclass of Python tuple\n", 152 | "print(type(x.size()))" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "As with NumPy, there are a variety of PyTorch data types for arrays:\n", 160 | "\n", 161 | "| NumPy dtype | PyTorch dtype | Alternative | Tensor class |\n", 162 | "|:-:|:-:|:-:|:-:|\n", 163 | "| `np.int16`  |`torch.int16`  |`torch.short` |`ShortTensor` |\n", 164 | "| `np.int32`  |`torch.int32`  |`torch.int`   |`IntTensor`   |\n", 165 | "| `np.int64`  |`torch.int64`  |`torch.long`  |`LongTensor`  |\n", 166 | "| `np.float16`|`torch.float16`|`torch.half`  |`HalfTensor`  |\n", 167 | "| `np.float32`|`torch.float32`|`torch.float` |`FloatTensor` |\n", 168 | "| `np.float64`|`torch.float64`|`torch.double`|`DoubleTensor`|\n", 169 | "\n", 170 | "\n", 171 | "Many functions and methods in PyTorch have similar names to NumPy functions & methods:" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [ 180 | "print(torch.empty(3, 4, dtype=torch.short), end='\\n\\n') # like numpy.empty\n", 181 | "print(torch.ones(3, 4, dtype=torch.short), end='\\n\\n')  # like numpy.ones\n", 182 | "print(torch.randn(3, 4, dtype=torch.float), end='\\n\\n') # like numpy.random.randn" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "You can also construct PyTorch tensors from lists of numerical data or NumPy arrays."
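The next cell demonstrates construction from lists; as a brief aside, a tensor can also be built directly from a NumPy array with `torch.from_numpy`, which shares memory with the source array rather than copying it. A minimal sketch (the array `arr` is made up for illustration):

```python
# Construct a tensor that shares memory with an existing NumPy array.
arr = np.arange(6, dtype=np.float64).reshape(2, 3)
t = torch.from_numpy(arr)   # no copy; dtype is inferred as torch.float64
print(t, t.dtype)
arr[0, 0] = 100.0           # mutating the array is visible through the tensor
print(t[0, 0])
```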
190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": {}, 196 | "outputs": [], 197 | "source": [ 198 | "# Constructing tensors from lists of data\n", 199 | "print(torch.tensor([1,2,3]).dtype) # inferred to be 64 bit integers\n", 200 | "print(torch.Tensor([1,2,3]).dtype) # specifically cast to 32 bit floats" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "Notice the factory function `torch.tensor` differs from the class constructor `torch.Tensor`. The former *infers* the data type of the tensor to construct from the numerical data input. By constrast, the latter is just an alias for `Torch.FloatTensor` (i.e., the data are cast to 32 bit floating point numbers)." 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": { 213 | "slideshow": { 214 | "slide_type": "fragment" 215 | } 216 | }, 217 | "source": [ 218 | "PyTorch Tensors can be converted to NumPy arrays using the method `torch.Tensor.numpy`:" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": { 225 | "collapsed": false, 226 | "jupyter": { 227 | "outputs_hidden": false 228 | }, 229 | "slideshow": { 230 | "slide_type": "fragment" 231 | } 232 | }, 233 | "outputs": [], 234 | "source": [ 235 | "a = torch.rand(2,3) # first, construct a random PyTorch tensor\n", 236 | "print(a)\n", 237 | "print(a.dtype)" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": { 244 | "collapsed": false, 245 | "jupyter": { 246 | "outputs_hidden": false 247 | }, 248 | "slideshow": { 249 | "slide_type": "fragment" 250 | } 251 | }, 252 | "outputs": [], 253 | "source": [ 254 | "b = a.numpy() # converts to NumPy array (shallow copy; use .copy() for deep copy)\n", 255 | "print(b)\n", 256 | "print(type(b))" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "[What is PyTorch?](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html) at [`pytorch.org`](https://pytorch.org) provides a quick tour through related topics (e.g., tensor indexing, arithmetic operations, elementwise functions, linear algebra, etc.). For the most part, these resemble (although not perfectly) the same corresponding tasks in NumPy.\n", 264 | "\n", 265 | "# Backpropagation with `autograd`\n", 266 | "\n", 267 | "Why PyTorch Tensors when all they seem to offer is the same functionality of NumPy arrays? Another related question is why go through the trouble to reimplement everything that's done in NumPy in PyTorch (with slightly different names & APIs)? There are two principle advantages that systems like PyTorch have over NumPy for numerical computing:\n", 268 | "\n", 269 | "1. **Automatic differentiation**: PyTorch includes a package called [`autograd`](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html) that computes the backpropagation algorithm for users. As such, the management of gradients (and the associated memory needed) is significantly simplified with the PyTorch framework. This is, of course, very important for implementing gradient descent.\n", 270 | "2. **GPU computation**: GPUs (graphical processing units) are widely available to speed up computation. However, GPU programming remains challenging for most developers with the memory management issues associated with moving data onto GPUs to speed up computation. 
With PyTorch, much of the work of moving tensors onto GPUs is handled for the user which makes programing with GPUs much easier... and this in turn speeds up a lot of neural network training.\n", 271 | "\n", 272 | "If we examine the object `a` created above, you can see it has an attribute `device` that can be set in various ways depending on the availability of GPU hardware." 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "metadata": {}, 279 | "outputs": [], 280 | "source": [ 281 | "a.device # PyTorch tensors have a `device` attribute\n", 282 | "# Common alternatives: device(type='cpu'), device(type='cuda'), etc." 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": { 288 | "slideshow": { 289 | "slide_type": "fragment" 290 | }, 291 | "toc-hr-collapsed": true 292 | }, 293 | "source": [ 294 | "We'll focus mostly on automatic differentiation today as supported by the [`autograd`](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html) module. Remember, our main reason for wanting to do this is to compute gradients as needed to train neural network parameters (weights & biases) with gradient descent. In PyTorch, automatic differentiation of tensors is achieved using through setting the `requires_grad` attribute to `True` for all relevant `torch.Tensor`s on construction (the default value is `False`). Alternatively, there is also a method `.requires_grad_( ... )` that modifies the `requires_grad` flag in-place (default value `False`).\n", 295 | "\n", 296 | "Once tensors are defined with the `requires_grad` attribute set correctly, additional space is allocated for intermediate computations (remember all the extra lists of arrays we had to maintain explicitly within the `forward` and `backward` functions?). These are used when calling `torch.Tensor.backward()` to compute all gradients recursively. The intermediate gradients computed can then be retrieved using the attribute `torch.Tensor.grad`." 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": { 302 | "slideshow": { 303 | "slide_type": "subslide" 304 | } 305 | }, 306 | "source": [ 307 | "### Backpropagation example\n", 308 | "\n", 309 | "Let's consider a simple polynomial function like below applied to a scalar value $x$:\n", 310 | "\n", 311 | "$\\begin{aligned} &\\mathrm{Function:} & f(x) &= 3x^4 -2x^3 + 4x^2 - x + 5 \\\\\n", 312 | "&\\mathrm{Derivative:} & f'(x) &= 12x^3 -6 x^2 + 8x -1\\end{aligned}$" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": { 318 | "slideshow": { 319 | "slide_type": "fragment" 320 | } 321 | }, 322 | "source": [ 323 | "1. Create tensor `x` with the attribute `requires_grad=True` set in the constructor." 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": { 330 | "slideshow": { 331 | "slide_type": "fragment" 332 | } 333 | }, 334 | "outputs": [], 335 | "source": [ 336 | "x = torch.tensor(2.0, requires_grad=True)" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": { 342 | "slideshow": { 343 | "slide_type": "subslide" 344 | } 345 | }, 346 | "source": [ 347 | "2. Map the polynomial function $f$ onto tensor the `x` and assign the result to `y`. 
You can verify explicitly that, when $x=2$, $f(x)=51$:\n", 348 | " $$f(2)=3(2)^4 - 2(2)^3 + 4(2)^2 -(2) +5 = 48-16+16-2+5 = 51$$" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": null, 354 | "metadata": { 355 | "slideshow": { 356 | "slide_type": "fragment" 357 | } 358 | }, 359 | "outputs": [], 360 | "source": [ 361 | "y = 3*x**4 - 2*x**3 + 4*x**2 - x + 5 # Write out computation of y explicitly.\n", 362 | "\n", 363 | "print(y) # Notice y has a new attribute: grad_fn\n", 364 | "print(y.grad_fn)" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": { 370 | "slideshow": { 371 | "slide_type": "subslide" 372 | } 373 | }, 374 | "source": [ 375 | "The object `y` has an associated gradient function accessible as `y.grad_fn`. When `y` is computed and stored, a set of algebraic operations is applied to the tensor `x`. If the derivatives of those operations are known, the `autograd` package provides support for computing those derivatives (that's what the `AddBackward0` object is). Invoking `y.backward()`, then, computes the value of *gradient* of `y` with respect to `x` evaluated at `x==2`:\n", 376 | "\n", 377 | "$$f'(2) = 12(2^3) - 6(2^2) + 8(2) - 1 = 96-24+16-1 = 87. $$\n", 378 | "\n", 379 | "Notice that the computed gradient value is stored in the attribute `x.grad` of the original tensor `x`." 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": { 386 | "lines_to_next_cell": 2, 387 | "slideshow": { 388 | "slide_type": "fragment" 389 | } 390 | }, 391 | "outputs": [], 392 | "source": [ 393 | "y.backward() # Compute derivatives and propagate values back through tensors on which y depends\n", 394 | "\n", 395 | "print(x.grad) # Expect the value 87 as a singleton tensor" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "Notice that invoking `y.backward()` a second time raises an exception. This is because the intermediate arrays required to execute the backpropagation have been released (i.e., the memory has been deallocated)." 
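As an aside: if you genuinely need to call `backward()` more than once on the same graph, you can ask PyTorch to keep those intermediate buffers by passing `retain_graph=True` to the first call. A minimal sketch (the names `w` and `v` are illustrative; note how repeated calls *accumulate* into `.grad`, which is why gradients get zeroed between update steps later on):

```python
# Keep the graph alive so backward() can be invoked twice on the same result.
w = torch.tensor(2.0, requires_grad=True)
v = 3*w**4 - 2*w**3 + 4*w**2 - w + 5
v.backward(retain_graph=True)    # intermediate buffers are retained
print(w.grad)                    # tensor(87.)
v.backward()                     # second call now succeeds
print(w.grad)                    # tensor(174.): gradients accumulate in .grad
```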
403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": null, 408 | "metadata": { 409 | "lines_to_next_cell": 2, 410 | "slideshow": { 411 | "slide_type": "fragment" 412 | } 413 | }, 414 | "outputs": [], 415 | "source": [ 416 | "y.backward() # Yields a RuntimeError (compare the KeyError raised earlier by calling backward out of sequence)" 417 | ] 418 | }, 419 | { 420 | "cell_type": "markdown", 421 | "metadata": { 422 | "slideshow": { 423 | "slide_type": "subslide" 424 | } 425 | }, 426 | "source": [ 427 | "### Another backpropagation example\n", 428 | "\n", 429 | "+ Use $z = \\cos(u)$ with $u=x^2$ at $x=\\sqrt{\\frac{\\pi}{3}}$\n", 430 | "+ Expect $z=\\frac{1}{2}$ when $x=\\sqrt{\\frac{\\pi}{3}}$" 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": null, 436 | "metadata": { 437 | "collapsed": false, 438 | "jupyter": { 439 | "outputs_hidden": false 440 | }, 441 | "slideshow": { 442 | "slide_type": "fragment" 443 | } 444 | }, 445 | "outputs": [], 446 | "source": [ 447 | "x = torch.tensor([np.sqrt(np.pi/3)], requires_grad=True)\n", 448 | "u = x ** 2\n", 449 | "z = torch.cos(u)\n", 450 | "print(f'x: {x}\\nu: {u}\\nz: {z}')" 451 | ] 452 | }, 453 | { 454 | "cell_type": "markdown", 455 | "metadata": { 456 | "slideshow": { 457 | "slide_type": "subslide" 458 | } 459 | }, 460 | "source": [ 461 | "+ Expect \n", 462 | "  $$\\frac{dz}{dx} = \\frac{dz}{du} \\frac{du}{dx} = (-\\sin u) (2 x) = -\\sqrt{\\pi}$$\n", 463 | "  when $x=\\sqrt{\\frac{\\pi}{3}}$" 464 | ] 465 | }, 466 | { 467 | "cell_type": "code", 468 | "execution_count": null, 469 | "metadata": { 470 | "collapsed": false, 471 | "jupyter": { 472 | "outputs_hidden": false 473 | }, 474 | "slideshow": { 475 | "slide_type": "fragment" 476 | } 477 | }, 478 | "outputs": [], 479 | "source": [ 480 | "# Now apply backward for backpropagation of derivative values\n", 481 | "z.backward()" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": null, 487 | "metadata": { 488 | "collapsed": false, 489 | "jupyter": { 490 | "outputs_hidden": false 491 | }, 492 | "slideshow": { 493 | "slide_type": "fragment" 494 | } 495 | }, 496 | "outputs": [], 497 | "source": [ 498 | "print(f'x.grad:\\t\\t\\t\\t\\t\\t{x.grad}')\n", 499 | "x, u = x.item(), u.item() # extract scalar values\n", 500 | "print(f'Computed derivative using analytic formula:\\t{-np.sin(u)*2*x}')" 501 | ] 502 | }, 503 | { 504 | "cell_type": "markdown", 505 | "metadata": {}, 506 | "source": [ 507 | "Notice that the tensors `x`, `u`, and `z` are all singleton tensors. The method `item` is used to extract a scalar entry out of a singleton tensor." 508 | ] 509 | }, 510 | { 511 | "cell_type": "markdown", 512 | "metadata": {}, 513 | "source": [ 514 | "# Building a Neural Network in PyTorch\n", 515 | "\n", 516 | "Let's now use an approach adapted from one by [Justin Johnson](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html) (BSD 3-Clause License). The goal is to convert a NumPy-constructed gradient descent process modelling a feed-forward neural network into a PyTorch neural network. 
The architecture is similar to the one constructed in the last notebook.\n", 517 | "\n", 518 | "+ The input vectors are assumed to have $784(=28^2)$ features.\n", 519 | "+ The first layer is a hidden layer with 100 units and a *rectified linear unit* activation function (often called $\\mathrm{ReLU}$):\n", 520 | "\n", 521 | "$$ \\mathrm{ReLU}(x) = \\begin{cases} x, & \\mathrm{if\\ }x>0 \\\\ 0 & \\mathrm{otherwise} \\end{cases} \\quad\\Rightarrow\\quad\n", 522 | "\\mathrm{ReLU}'(x) = \\begin{cases} 1, & \\mathrm{if\\ }x>0 \\\\ 0 & \\mathrm{otherwise} \\end{cases}.\n", 523 | "$$\n", 524 | "\n", 525 | "+ The final output layer has 10 units and the activation function assiciated with this layer is the identity map.\n", 526 | "\n", 527 | "The loop provided below does not use functions to represent the initialization, forward propagation, backpropagation, and update steps of the steepest descent process. You'll use this as a starting point to develop a PyTorch version of this gradient descent loop." 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": null, 533 | "metadata": {}, 534 | "outputs": [], 535 | "source": [ 536 | "N_batch, dimensions = 64, [784, 100, 10]\n", 537 | "\n", 538 | "# Create random input and output data\n", 539 | "X = np.random.randn(dimensions[0], N_batch)\n", 540 | "y = np.random.randn(dimensions[-1], N_batch)\n", 541 | "\n", 542 | "# Randomly initialize weights & biases\n", 543 | "W1 = np.random.randn(dimensions[1], dimensions[0])\n", 544 | "W2 = np.random.randn(dimensions[2], dimensions[1])\n", 545 | "b1 = np.random.randn(dimensions[1], 1)\n", 546 | "b2 = np.random.randn(dimensions[2], 1)\n", 547 | "\n", 548 | "eta, MAXITER, SKIP = 5e-6, 2500, 100\n", 549 | "for epoch in range(MAXITER):\n", 550 | " # Forward propagation: compute predicted y\n", 551 | " Z1 = np.dot(W1, X) + b1\n", 552 | " A1 = np.maximum(Z1, 0) # ReLU function\n", 553 | " Z2 = np.dot(W2, A1) + b2\n", 554 | " A2 = Z2 # identity function on output layer\n", 555 | " \n", 556 | " # Compute and print loss\n", 557 | " loss = 0.5 * np.power(A2 - y, 2).sum()\n", 558 | " if (divmod(epoch, SKIP)[1]==0):\n", 559 | " print(epoch, loss)\n", 560 | "\n", 561 | " # Backpropagation to compute gradients of loss with respect to W1, W2, b1, and b2\n", 562 | " delta2 = (A2 - y) # derivative of identity map == multiplying by ones\n", 563 | " grad_W2 = np.dot(delta2, A1.T)\n", 564 | " grad_b2 = np.dot(delta2, np.ones((N_batch, 1)))\n", 565 | " delta1 = np.dot(W2.T, delta2) * (Z1>0) # derivative of ReLU is a step function\n", 566 | " grad_W1 = np.dot(delta1, X.T)\n", 567 | " grad_b1 = np.dot(delta1, np.ones((N_batch, 1)))\n", 568 | "\n", 569 | " # Update weights & biases\n", 570 | " W1 = W1 - eta * grad_W1\n", 571 | " b1 = b1 - eta * grad_b1\n", 572 | " W2 = W2 - eta * grad_W2\n", 573 | " b2 = b2 - eta * grad_b2" 574 | ] 575 | }, 576 | { 577 | "cell_type": "markdown", 578 | "metadata": {}, 579 | "source": [ 580 | "## 1. 
Convert the preceding code to use PyTorch Tensors instead of NumPy arrays\n", 581 | "\n", 582 | "+ Replace use of `numpy.random.randn` with `torch.randn` to initialize the problem with PyTorch Tensors rather than NumPy arrays.\n", 583 | "+ Replace instances of `np.dot` with [`torch.mm`](https://pytorch.org/docs/stable/torch.html#torch.mm) (both of which implement standard matrix-vector products).\n", 584 | "+ Replace use of `np.ones` [`torch.ones`](https://pytorch.org/docs/stable/torch.html#torch.ones).\n", 585 | "+ Replace the computation of `A1 = np.maximum(Z1, 0)` with a call to the PyTorch builtin function `torch.relu`.\n", 586 | "+ Modify the computation of the `loss` to use PyTorch specific functions/methods (hint: there is a PyTorch `torch.Tensor.pow` method).\n", 587 | "+ When printing the loss every hundred epochs, use the `.item()` method to extract its singleton scalar entry.\n", 588 | "+ Make sure the loop executes in a similar fashion to the preceding loop." 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": null, 594 | "metadata": {}, 595 | "outputs": [], 596 | "source": [ 597 | "N_batch, dimensions = 64, [784, 100, 10]\n", 598 | "\n", 599 | "# Create random input and output data\n", 600 | "X = torch.randn(dimensions[0], N_batch)\n", 601 | "y = torch.randn(dimensions[-1], N_batch)\n", 602 | "\n", 603 | "# Randomly initialize weights & biases\n", 604 | "W1 = torch.randn(dimensions[1], dimensions[0])\n", 605 | "W2 = torch.randn(dimensions[2], dimensions[1])\n", 606 | "b1 = torch.randn(dimensions[1], 1)\n", 607 | "b2 = torch.randn(dimensions[2], 1)\n", 608 | "\n", 609 | "eta, MAXITER, SKIP = 5e-6, 2500, 100\n", 610 | "for epoch in range(MAXITER):\n", 611 | " # Forward propagation: compute predicted y\n", 612 | " Z1 = torch.mm(W1, X) + b1\n", 613 | " A1 = torch.relu(Z1) # Native PyTorch ReLU function\n", 614 | " Z2 = torch.mm(W2, A1) + b2\n", 615 | " A2 = Z2\n", 616 | "\n", 617 | " # Compute and print loss\n", 618 | " loss = 0.5 * (A2 - y).pow(2).sum()\n", 619 | " if (divmod(epoch, SKIP)[1]==0):\n", 620 | " print(epoch, loss.item())\n", 621 | "\n", 622 | " # Backpropagation to compute gradients of loss with respect to W1, W2, b1, and b2\n", 623 | " delta2 = (A2 - y) # derivative of identity map == multiplying by ones\n", 624 | " grad_W2 = torch.mm(delta2, A1.T)\n", 625 | " grad_b2 = torch.mm(delta2, torch.ones(N_batch, 1))\n", 626 | " delta1 = torch.mm(W2.T, delta2) * (Z1>0) # derivative of ReLU is a step function\n", 627 | " grad_W1 = torch.mm(delta1, X.T)\n", 628 | " grad_b1 = torch.mm(delta1, torch.ones(N_batch, 1))\n", 629 | "\n", 630 | " # Update weights & biases\n", 631 | " W1 = W1 - eta * grad_W1\n", 632 | " b1 = b1 - eta * grad_b1\n", 633 | " W2 = W2 - eta * grad_W2\n", 634 | " b2 = b2 - eta * grad_b2" 635 | ] 636 | }, 637 | { 638 | "cell_type": "markdown", 639 | "metadata": {}, 640 | "source": [ 641 | "## 2. 
Use `backward()` and `grad` to compute backpropagation and updates\n", 642 | "\n", 643 | "Having set up the main loop with PyTorch Tensors, now make use of `autograd` to eliminate the tedious work of having to write the code to compute the gradients of the loss function with respect to `W1`, `W2`, `b1`, and `b2` explicitly.\n", 644 | "+ Insert `requires_grad=True` as an argument in the construction of `W1`, `W2`, `b1`, and `b2`.\n", 645 | "+ After computing the loss function value `loss`, replace all the lines used to compute gradients explicitly by a single call to `loss.backward()`.\n", 646 | "+ Replace the update steps with gradients stored in `.grad` attributes of the weights & biases. For instance, you can now compute `W1 -= eta * W1.grad` *after* the call to `loss.backward()` rather than computing and explicitly storing `grad_W1` and later computing `W1 -= eta * grad_W1`.\n", 647 | "    + Do these update steps within a `with torch.no_grad():` block (as provided below). The purpose of the [`torch.no_grad`](https://pytorch.org/docs/stable/torch.html#torch.no_grad) context manager is to reduce memory consumption.\n", 648 | "    + After completing the updates, zero out the computed gradients before the next iteration by calling the method `.zero_()`. For instance, you would call `W1.grad.zero_()` to zero out the computed gradient in place. This call will be within the scope of the `torch.no_grad` context manager.\n", 649 | "\n", 650 | "Notice, in PyTorch, methods like `.zero_` that have a trailing underscore in their name operate in place, i.e., they overwrite the memory locations associated with the tensor." 651 | ] 652 | }, 653 | { 654 | "cell_type": "code", 655 | "execution_count": null, 656 | "metadata": {}, 657 | "outputs": [], 658 | "source": [ 659 | "# Create random input and output data\n", 660 | "X = torch.randn(dimensions[0], N_batch)\n", 661 | "y = torch.randn(dimensions[-1], N_batch)\n", 662 | "\n", 663 | "# Randomly initialize weights & biases\n", 664 | "W1 = torch.randn(dimensions[1], dimensions[0], requires_grad=True)\n", 665 | "W2 = torch.randn(dimensions[2], dimensions[1], requires_grad=True)\n", 666 | "b1 = torch.randn(dimensions[1], 1, requires_grad=True)\n", 667 | "b2 = torch.randn(dimensions[2], 1, requires_grad=True)\n", 668 | "\n", 669 | "eta, MAXITER, SKIP = 5e-6, 2500, 100\n", 670 | "for epoch in range(MAXITER):\n", 671 | "    # Forward propagation: compute predicted y\n", 672 | "    Z1 = torch.mm(W1, X) + b1\n", 673 | "    A1 = torch.relu(Z1) # Native PyTorch ReLU function\n", 674 | "    Z2 = torch.mm(W2, A1) + b2\n", 675 | "    A2 = Z2\n", 676 | "\n", 677 | "    # Compute and print loss\n", 678 | "    loss = 0.5 * (A2 - y).pow(2).sum()\n", 679 | "    if (divmod(epoch, SKIP)[1]==0):\n", 680 | "        print(epoch, loss.item())\n", 681 | "\n", 682 | "    # Backpropagation to compute gradients of loss with respect to W1, W2, b1, and b2\n", 683 | "    loss.backward()\n", 684 | "\n", 685 | "    # Update weights & biases\n", 686 | "    with torch.no_grad():\n", 687 | "        W1 -= eta * W1.grad\n", 688 | "        b1 -= eta * b1.grad\n", 689 | "        W2 -= eta * W2.grad\n", 690 | "        b2 -= eta * b2.grad\n", 691 | "        # Manually zero the gradients after updating weights\n", 692 | "        W1.grad.zero_()\n", 693 | "        W2.grad.zero_()\n", 694 | "        b1.grad.zero_()\n", 695 | "        b2.grad.zero_()" 696 | ] 697 | }, 698 | { 699 | "cell_type": "markdown", 700 | "metadata": {}, 701 | "source": [ 702 | "---\n", 703 | "\n", 704 | "# What next?\n", 705 | "\n", 706 | "PyTorch has a large ecosystem of utilities including packages like `torch.nn` (which is like 
Keras in spirit to simplify specifying a network architecture in an object-oriented way) and `torch.optim` (which makes managing different optimization schemes easier). We've covered a lot of ground in this tutorial so far, so this will be as far as we can get today. But you now should have enough of an understanding of backpropagation that you can pick up more at [`pytorch.org`](https://pytorch.org) independently." 707 | ] 708 | } 709 | ], 710 | "metadata": { 711 | "kernelspec": { 712 | "display_name": "Python 3", 713 | "language": "python", 714 | "name": "python3" 715 | }, 716 | "language_info": { 717 | "codemirror_mode": { 718 | "name": "ipython", 719 | "version": 3 720 | }, 721 | "file_extension": ".py", 722 | "mimetype": "text/x-python", 723 | "name": "python", 724 | "nbconvert_exporter": "python", 725 | "pygments_lexer": "ipython3", 726 | "version": "3.6.10" 727 | } 728 | }, 729 | "nbformat": 4, 730 | "nbformat_minor": 4 731 | } 732 | -------------------------------------------------------------------------------- /notebooks/3-Student-deep-learning-from-scratch-pytorch.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Deep learning from scratch" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "# Learning objectives of the notebook" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "- Get used to working with *PyTorch Tensors*, the core data structure needed for working with neural networks;\n", 22 | "- Practice using the `autograd` capabilities of PyTorch Tensors to carry out backpropagation without all the pain;\n", 23 | "- Apply the useful PyTorch `torch.no_grad` context manager for managing memory consumption;\n", 24 | "- Convert a NumPy-based gradient descent algorithm into one relying on PyTorch Tensors!" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "# PyTorch Basics" 32 | ] 33 | }, 34 | { 35 | "cell_type": "markdown", 36 | "metadata": {}, 37 | "source": [ 38 | "[PyTorch](http://pytorch.org) is a Python-based scientific computing package to support deep learning research. It provides tensor support (a replacement of NumPy, of sorts) to provide a fast & flexible platform for experimenting with neural networks." 39 | ] 40 | }, 41 | { 42 | "cell_type": "code", 43 | "execution_count": null, 44 | "metadata": { 45 | "collapsed": false, 46 | "jupyter": { 47 | "outputs_hidden": false 48 | }, 49 | "slideshow": { 50 | "slide_type": "fragment" 51 | } 52 | }, 53 | "outputs": [], 54 | "source": [ 55 | "import torch\n", 56 | "import numpy as np\n", 57 | "print(f'PyTorch version: {torch.__version__}')\n", 58 | "print(f'NumPy version: {np.__version__}')" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": { 64 | "slideshow": { 65 | "slide_type": "subslide" 66 | } 67 | }, 68 | "source": [ 69 | "The principal data structures in PyTorch are *tensors*; these are pretty much the same as standard multidimensional NumPy arrays. To illustrate this, let's construct a matrix of zeros (of `long` or 64 bit integer `dtype`) in NumPy, and then in PyTorch." 
70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": { 76 | "collapsed": false, 77 | "jupyter": { 78 | "outputs_hidden": false 79 | }, 80 | "slideshow": { 81 | "slide_type": "fragment" 82 | } 83 | }, 84 | "outputs": [], 85 | "source": [ 86 | "# zeros construction in NumPy\n", 87 | "x_np = np.zeros((2,4), dtype=np.int64)\n", 88 | "print(x_np, x_np.dtype)" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": { 95 | "collapsed": false, 96 | "jupyter": { 97 | "outputs_hidden": false 98 | }, 99 | "slideshow": { 100 | "slide_type": "fragment" 101 | } 102 | }, 103 | "outputs": [], 104 | "source": [ 105 | "# zeros construction in PyTorch\n", 106 | "x = torch.zeros(2, 4, dtype=torch.long) # Observe difference in calling syntax!\n", 107 | "print(x, x.dtype)" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": { 113 | "slideshow": { 114 | "slide_type": "subslide" 115 | } 116 | }, 117 | "source": [ 118 | "You can query a tensor's size (dimensions) with the `size` method (contrast with the NumPy array `shape` attribute)." 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": { 125 | "collapsed": false, 126 | "jupyter": { 127 | "outputs_hidden": false 128 | }, 129 | "slideshow": { 130 | "slide_type": "fragment" 131 | } 132 | }, 133 | "outputs": [], 134 | "source": [ 135 | "print(x)\n", 136 | "print(x.size()) # \"size\" is a *method* for torch tensors\n", 137 | "print(x_np.shape) # 'shape' is an *attribute* returning a tuple\n", 138 | "print(x_np.size) # \"size\" is an *attribute* (total element count) for np arrays" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": { 145 | "slideshow": { 146 | "slide_type": "fragment" 147 | } 148 | }, 149 | "outputs": [], 150 | "source": [ 151 | "# torch.Tensor.size() yields a subclass of Python tuple\n", 152 | "print(type(x.size()))" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "As with NumPy, there are a variety of PyTorch data types for arrays:\n", 160 | "\n", 161 | "| NumPy dtype | PyTorch dtype | Alternative | Tensor class |\n", 162 | "|:-:|:-:|:-:|:-:|\n", 163 | "| `np.int16` |`torch.int16` |`torch.short` |`ShortTensor` |\n", 164 | "| `np.int32` |`torch.int32` |`torch.int` |`IntTensor` |\n", 165 | "| `np.int64` |`torch.int64` |`torch.long` |`LongTensor` |\n", 166 | "| `np.float16`|`torch.float16`|`torch.half` |`HalfTensor` |\n", 167 | "| `np.float32`|`torch.float32`|`torch.float` |`FloatTensor` |\n", 168 | "| `np.float64`|`torch.float64`|`torch.double`|`DoubleTensor`|\n", 169 | "\n", 170 | "\n", 171 | "Many functions and methods in PyTorch have similar names to NumPy functions & methods:" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [ 180 | "print(torch.empty(3, 4, dtype=torch.short), end='\\n\\n') # like numpy.empty\n", 181 | "print(torch.ones(3, 4, dtype=torch.short), end='\\n\\n') # like numpy.ones\n", 182 | "print(torch.randn(3, 4, dtype=torch.float), end='\\n\\n') # like numpy.random.randn" 183 | ] 184 | }, 185 | { 186 | "cell_type": "markdown", 187 | "metadata": {}, 188 | "source": [ 189 | "You can also construct PyTorch tensors from lists of numerical data or NumPy arrays." 
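The code cell below demonstrates the list route. For the NumPy-array route, a minimal sketch along the following lines works (the array and variable names here are purely illustrative): `torch.from_numpy` builds a tensor that shares memory with the source array, whereas `torch.tensor` copies the data.

```python
# Hypothetical example: two ways to build a tensor from an existing NumPy array
arr = np.arange(6, dtype=np.float32).reshape(2, 3)

t_shared = torch.from_numpy(arr)  # shares memory with arr (no copy)
t_copied = torch.tensor(arr)      # copies the data into a fresh tensor

arr[0, 0] = 99.0                  # mutate the original array
print(t_shared[0, 0])             # reflects the change (shared memory)
print(t_copied[0, 0])             # unchanged (independent copy)
```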
190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": {}, 196 | "outputs": [], 197 | "source": [ 198 | "# Constructing tensors from lists of data\n", 199 | "print(torch.tensor([1,2,3]).dtype) # inferred to be 64 bit integers\n", 200 | "print(torch.Tensor([1,2,3]).dtype) # specifically cast to 32 bit floats" 201 | ] 202 | }, 203 | { 204 | "cell_type": "markdown", 205 | "metadata": {}, 206 | "source": [ 207 | "Notice the factory function `torch.tensor` differs from the class constructor `torch.Tensor`. The former *infers* the data type of the tensor to construct from the numerical data input. By contrast, the latter is just an alias for `torch.FloatTensor` (i.e., the data are cast to 32 bit floating point numbers)." 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": { 213 | "slideshow": { 214 | "slide_type": "fragment" 215 | } 216 | }, 217 | "source": [ 218 | "PyTorch Tensors can be converted to NumPy arrays using the method `torch.Tensor.numpy`:" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": { 225 | "collapsed": false, 226 | "jupyter": { 227 | "outputs_hidden": false 228 | }, 229 | "slideshow": { 230 | "slide_type": "fragment" 231 | } 232 | }, 233 | "outputs": [], 234 | "source": [ 235 | "a = torch.rand(2,3) # first, construct a random PyTorch tensor\n", 236 | "print(a)\n", 237 | "print(a.dtype)" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": null, 243 | "metadata": { 244 | "collapsed": false, 245 | "jupyter": { 246 | "outputs_hidden": false 247 | }, 248 | "slideshow": { 249 | "slide_type": "fragment" 250 | } 251 | }, 252 | "outputs": [], 253 | "source": [ 254 | "b = a.numpy() # converts to NumPy array (shares memory with a; use .copy() for an independent copy)\n", 255 | "print(b)\n", 256 | "print(type(b))" 257 | ] 258 | }, 259 | { 260 | "cell_type": "markdown", 261 | "metadata": {}, 262 | "source": [ 263 | "[What is PyTorch?](https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html) at [`pytorch.org`](https://pytorch.org) provides a quick tour through related topics (e.g., tensor indexing, arithmetic operations, elementwise functions, linear algebra, etc.). For the most part, these resemble (although not perfectly) the corresponding operations in NumPy.\n", 264 | "\n", 265 | "# Backpropagation with `autograd`\n", 266 | "\n", 267 | "Why PyTorch Tensors when all they seem to offer is the same functionality as NumPy arrays? A related question is why go to the trouble of reimplementing everything that's done in NumPy in PyTorch (with slightly different names & APIs)? There are two principal advantages that systems like PyTorch have over NumPy for numerical computing:\n", 268 | "\n", 269 | "1. **Automatic differentiation**: PyTorch includes a package called [`autograd`](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html) that carries out the backpropagation algorithm for users. As such, the management of gradients (and the associated memory needed) is significantly simplified with the PyTorch framework. This is, of course, very important for implementing gradient descent.\n", 270 | "2. **GPU computation**: GPUs (graphics processing units) are widely available to speed up computation. However, GPU programming remains challenging for most developers because of the memory-management issues involved in moving data onto and off of the GPU. 
With PyTorch, much of the work of moving tensors onto GPUs is handled for the user, which makes programming with GPUs much easier... and this in turn speeds up a lot of neural network training.\n", 271 | "\n", 272 | "If you examine the object `a` created above, you can see it has an attribute `device` that can be set in various ways depending on the availability of GPU hardware." 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": null, 278 | "metadata": {}, 279 | "outputs": [], 280 | "source": [ 281 | "a.device # PyTorch tensors have a `device` attribute\n", 282 | "# Common alternatives: device(type='cpu'), device(type='cuda'), etc." 283 | ] 284 | }, 285 | { 286 | "cell_type": "markdown", 287 | "metadata": { 288 | "slideshow": { 289 | "slide_type": "fragment" 290 | }, 291 | "toc-hr-collapsed": true 292 | }, 293 | "source": [ 294 | "We'll focus mostly on automatic differentiation today as supported by the [`autograd`](https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html) module. Remember, our main reason for wanting to do this is to compute gradients as needed to train neural network parameters (weights & biases) with gradient descent. In PyTorch, automatic differentiation of tensors is achieved by setting the `requires_grad` attribute to `True` for all relevant `torch.Tensor`s on construction (the default value is `False`). Alternatively, there is also a method `.requires_grad_( ... )` that modifies the `requires_grad` flag in-place (its argument defaults to `True`).\n", 295 | "\n", 296 | "Once tensors are defined with the `requires_grad` attribute set correctly, additional space is allocated for intermediate computations (remember all the extra lists of arrays we had to maintain explicitly within the `forward` and `backward` functions?). These are used when calling `torch.Tensor.backward()` to compute all gradients recursively. The intermediate gradients computed can then be retrieved using the attribute `torch.Tensor.grad`." 297 | ] 298 | }, 299 | { 300 | "cell_type": "markdown", 301 | "metadata": { 302 | "slideshow": { 303 | "slide_type": "subslide" 304 | } 305 | }, 306 | "source": [ 307 | "### Backpropagation example\n", 308 | "\n", 309 | "Let's consider a simple polynomial function like the one below, applied to a scalar value $x$:\n", 310 | "\n", 311 | "$\\begin{aligned} &\\mathrm{Function:} & f(x) &= 3x^4 -2x^3 + 4x^2 - x + 5 \\\\\n", 312 | "&\\mathrm{Derivative:} & f'(x) &= 12x^3 -6 x^2 + 8x -1\\end{aligned}$" 313 | ] 314 | }, 315 | { 316 | "cell_type": "markdown", 317 | "metadata": { 318 | "slideshow": { 319 | "slide_type": "fragment" 320 | } 321 | }, 322 | "source": [ 323 | "1. Create the tensor `x` with the attribute `requires_grad=True` set in the constructor." 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": { 330 | "slideshow": { 331 | "slide_type": "fragment" 332 | } 333 | }, 334 | "outputs": [], 335 | "source": [ 336 | "x = torch.tensor(2.0, requires_grad=True)" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": { 342 | "slideshow": { 343 | "slide_type": "subslide" 344 | } 345 | }, 346 | "source": [ 347 | "2. Map the polynomial function $f$ onto the tensor `x` and assign the result to `y`. 
You can verify explicitly that, when $x=2$, $f(x)=51$:\n", 348 | " $$f(2)=3(2)^4 - 2(2)^3 + 4(2)^2 -(2) +5 = 48-16+16-2+5 = 51$$" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": null, 354 | "metadata": { 355 | "slideshow": { 356 | "slide_type": "fragment" 357 | } 358 | }, 359 | "outputs": [], 360 | "source": [ 361 | "y = 3*x**4 - 2*x**3 + 4*x**2 - x + 5 # Write out computation of y explicitly.\n", 362 | "\n", 363 | "print(y) # Notice y has a new attribute: grad_fn\n", 364 | "print(y.grad_fn)" 365 | ] 366 | }, 367 | { 368 | "cell_type": "markdown", 369 | "metadata": { 370 | "slideshow": { 371 | "slide_type": "subslide" 372 | } 373 | }, 374 | "source": [ 375 | "The object `y` has an associated gradient function accessible as `y.grad_fn`. When `y` is computed and stored, a set of algebraic operations is applied to the tensor `x`. If the derivatives of those operations are known, the `autograd` package provides support for computing those derivatives (that's what the `AddBackward0` object is). Invoking `y.backward()`, then, computes the value of *gradient* of `y` with respect to `x` evaluated at `x==2`:\n", 376 | "\n", 377 | "$$f'(2) = 12(2^3) - 6(2^2) + 8(2) - 1 = 96-24+16-1 = 87. $$\n", 378 | "\n", 379 | "Notice that the computed gradient value is stored in the attribute `x.grad` of the original tensor `x`." 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": null, 385 | "metadata": { 386 | "lines_to_next_cell": 2, 387 | "slideshow": { 388 | "slide_type": "fragment" 389 | } 390 | }, 391 | "outputs": [], 392 | "source": [ 393 | "y.backward() # Compute derivatives and propagate values back through tensors on which y depends\n", 394 | "\n", 395 | "print(x.grad) # Expect the value 87 as a singleton tensor" 396 | ] 397 | }, 398 | { 399 | "cell_type": "markdown", 400 | "metadata": {}, 401 | "source": [ 402 | "Notice that invoking `y.backward()` a second time raises an exception. This is because the intermediate arrays required to execute the backpropagation have been released (i.e., the memory has been deallocated)." 
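If you genuinely need to call `backward()` more than once on the same graph (say, while experimenting interactively), you can ask PyTorch to keep those intermediate buffers alive by passing `retain_graph=True`. A minimal sketch reusing the polynomial above; note that gradients *accumulate* in `.grad`, which is exactly why the exercises later zero them out between iterations:

```python
x = torch.tensor(2.0, requires_grad=True)
y = 3*x**4 - 2*x**3 + 4*x**2 - x + 5

y.backward(retain_graph=True)  # keep the graph so backward() can run again
print(x.grad)                  # tensor(87.)

y.backward()                   # allowed now; a third call would fail again
print(x.grad)                  # tensor(174.) -- gradients accumulate (87 + 87)
```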
403 | ] 404 | }, 405 | { 406 | "cell_type": "code", 407 | "execution_count": null, 408 | "metadata": { 409 | "lines_to_next_cell": 2, 410 | "slideshow": { 411 | "slide_type": "fragment" 412 | } 413 | }, 414 | "outputs": [], 415 | "source": [ 416 | "y.backward() # Yields a RuntimeError" 417 | ] 418 | }, 419 | { 420 | "cell_type": "markdown", 421 | "metadata": { 422 | "slideshow": { 423 | "slide_type": "subslide" 424 | } 425 | }, 426 | "source": [ 427 | "### Another backpropagation example\n", 428 | "\n", 429 | "+ Use $z = \\cos(u)$ with $u=x^2$ at $x=\\sqrt{\\frac{\\pi}{3}}$\n", 430 | "+ Expect $z=\\frac{1}{2}$ when $x=\\sqrt{\\frac{\\pi}{3}}$" 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": null, 436 | "metadata": { 437 | "collapsed": false, 438 | "jupyter": { 439 | "outputs_hidden": false 440 | }, 441 | "slideshow": { 442 | "slide_type": "fragment" 443 | } 444 | }, 445 | "outputs": [], 446 | "source": [ 447 | "x = torch.tensor([np.sqrt(np.pi/3)], requires_grad=True)\n", 448 | "u = x ** 2\n", 449 | "z = torch.cos(u)\n", 450 | "print(f'x: {x}\\nu: {u}\\nz: {z}')" 451 | ] 452 | }, 453 | { 454 | "cell_type": "markdown", 455 | "metadata": { 456 | "slideshow": { 457 | "slide_type": "subslide" 458 | } 459 | }, 460 | "source": [ 461 | "+ Expect \n", 462 | " $$\\frac{dz}{dx} = \\frac{dz}{du} \\frac{du}{dx} = (-\\sin u) (2 x) = -\\sqrt{\\pi}$$\n", 463 | " when $x=\\sqrt{\\frac{\\pi}{3}}$" 464 | ] 465 | }, 466 | { 467 | "cell_type": "code", 468 | "execution_count": null, 469 | "metadata": { 470 | "collapsed": false, 471 | "jupyter": { 472 | "outputs_hidden": false 473 | }, 474 | "slideshow": { 475 | "slide_type": "fragment" 476 | } 477 | }, 478 | "outputs": [], 479 | "source": [ 480 | "# Now apply backward for backpropagation of derivative values\n", 481 | "z.backward()" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": null, 487 | "metadata": { 488 | "collapsed": false, 489 | "jupyter": { 490 | "outputs_hidden": false 491 | }, 492 | "slideshow": { 493 | "slide_type": "fragment" 494 | } 495 | }, 496 | "outputs": [], 497 | "source": [ 498 | "print(f'x.grad:\\t\\t\\t\\t\\t\\t{x.grad}')\n", 499 | "x, u = x.item(), u.item() # extract scalar values\n", 500 | "print(f'Computed derivative using analytic formula:\\t{-np.sin(u)*2*x}')" 501 | ] 502 | }, 503 | { 504 | "cell_type": "markdown", 505 | "metadata": {}, 506 | "source": [ 507 | "Notice that the tensors `x`, `u`, and `z` are all singleton tensors. The method `item` is used to extract a scalar entry out of a singleton tensor." 508 | ] 509 | }, 510 | { 511 | "cell_type": "markdown", 512 | "metadata": {}, 513 | "source": [ 514 | "# Building a Neural Network in PyTorch\n", 515 | "\n", 516 | "Let's now use an approach adapted from one by [Justin Johnson](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html) (BSD 3-Clause License). The goal is to convert a NumPy-constructed gradient descent process modelling a feed-forward neural network into a PyTorch neural network. 
The architecture is similar to the one constructed in the last notebook.\n", 517 | "\n", 518 | "+ The input vectors are assumed to have $784(=28^2)$ features.\n", 519 | "+ The first layer is a hidden layer with 100 units and a *rectified linear unit* activation function (often called $\\mathrm{ReLU}$):\n", 520 | "\n", 521 | "$$ \\mathrm{ReLU}(x) = \\begin{cases} x, & \\mathrm{if\\ }x>0 \\\\ 0, & \\mathrm{otherwise} \\end{cases} \\quad\\Rightarrow\\quad\n", 522 | "\\mathrm{ReLU}'(x) = \\begin{cases} 1, & \\mathrm{if\\ }x>0 \\\\ 0, & \\mathrm{otherwise} \\end{cases}.\n", 523 | "$$\n", 524 | "\n", 525 | "+ The final output layer has 10 units and the activation function associated with this layer is the identity map.\n", 526 | "\n", 527 | "The loop provided below does not use functions to represent the initialization, forward propagation, backpropagation, and update steps of the steepest descent process. You'll use this as a starting point to develop a PyTorch version of this gradient descent loop." 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": null, 533 | "metadata": {}, 534 | "outputs": [], 535 | "source": [ 536 | "N_batch, dimensions = 64, [784, 100, 10]\n", 537 | "\n", 538 | "# Create random input and output data\n", 539 | "X = np.random.randn(dimensions[0], N_batch)\n", 540 | "y = np.random.randn(dimensions[-1], N_batch)\n", 541 | "\n", 542 | "# Randomly initialize weights & biases\n", 543 | "W1 = np.random.randn(dimensions[1], dimensions[0])\n", 544 | "W2 = np.random.randn(dimensions[2], dimensions[1])\n", 545 | "b1 = np.random.randn(dimensions[1], 1)\n", 546 | "b2 = np.random.randn(dimensions[2], 1)\n", 547 | "\n", 548 | "eta, MAXITER, SKIP = 5e-6, 2500, 100\n", 549 | "for epoch in range(MAXITER):\n", 550 | " # Forward propagation: compute predicted y\n", 551 | " Z1 = np.dot(W1, X) + b1\n", 552 | " A1 = np.maximum(Z1, 0) # ReLU function\n", 553 | " Z2 = np.dot(W2, A1) + b2\n", 554 | " A2 = Z2 # identity function on output layer\n", 555 | " \n", 556 | " # Compute and print loss\n", 557 | " loss = 0.5 * np.power(A2 - y, 2).sum()\n", 558 | " if (divmod(epoch, SKIP)[1]==0):\n", 559 | " print(epoch, loss)\n", 560 | "\n", 561 | " # Backpropagation to compute gradients of loss with respect to W1, W2, b1, and b2\n", 562 | " delta2 = (A2 - y) # derivative of identity map == multiplying by ones\n", 563 | " grad_W2 = np.dot(delta2, A1.T)\n", 564 | " grad_b2 = np.dot(delta2, np.ones((N_batch, 1)))\n", 565 | " delta1 = np.dot(W2.T, delta2) * (Z1>0) # derivative of ReLU is a step function\n", 566 | " grad_W1 = np.dot(delta1, X.T)\n", 567 | " grad_b1 = np.dot(delta1, np.ones((N_batch, 1)))\n", 568 | "\n", 569 | " # Update weights & biases\n", 570 | " W1 = W1 - eta * grad_W1\n", 571 | " b1 = b1 - eta * grad_b1\n", 572 | " W2 = W2 - eta * grad_W2\n", 573 | " b2 = b2 - eta * grad_b2" 574 | ] 575 | }, 576 | { 577 | "cell_type": "markdown", 578 | "metadata": {}, 579 | "source": [ 580 | "## 1. 
Convert the preceding code to use PyTorch Tensors instead of NumPy arrays\n", 581 | "\n", 582 | "+ Replace use of `numpy.random.randn` with `torch.randn` to initialize the problem with PyTorch Tensors rather than NumPy arrays.\n", 583 | "+ Replace instances of `np.dot` with [`torch.mm`](https://pytorch.org/docs/stable/torch.html#torch.mm) (both of which implement standard matrix-matrix products here).\n", 584 | "+ Replace use of `np.ones` with [`torch.ones`](https://pytorch.org/docs/stable/torch.html#torch.ones).\n", 585 | "+ Replace the computation of `A1 = np.maximum(Z1, 0)` with a call to the PyTorch builtin function `torch.relu`.\n", 586 | "+ Modify the computation of the `loss` to use PyTorch-specific functions/methods (hint: there is a PyTorch `torch.Tensor.pow` method).\n", 587 | "+ When printing the loss every hundred epochs, use the `.item()` method to extract its singleton scalar entry.\n", 588 | "+ Make sure the loop executes in a similar fashion to the preceding loop." 589 | ] 590 | }, 591 | { 592 | "cell_type": "code", 593 | "execution_count": null, 594 | "metadata": {}, 595 | "outputs": [], 596 | "source": [ 597 | "N_batch, dimensions = 64, [784, 100, 10]\n", 598 | "\n", 599 | "# Create random input and output data\n", 600 | "X = torch.randn(dimensions[0], N_batch)\n", 601 | "y = torch.randn(dimensions[-1], N_batch)\n", 602 | "\n", 603 | "# Randomly initialize weights & biases\n", 604 | "W1 = torch.randn(dimensions[1], dimensions[0])\n", 605 | "W2 = torch.randn(dimensions[2], dimensions[1])\n", 606 | "b1 = torch.randn(dimensions[1], 1)\n", 607 | "b2 = torch.randn(dimensions[2], 1)\n", 608 | "\n", 609 | "eta, MAXITER, SKIP = 5e-6, 2500, 100\n", 610 | "for epoch in range(MAXITER):\n", 611 | " # Forward propagation: compute predicted y\n", 612 | " Z1 = ___\n", 613 | " A1 = ___ # Native PyTorch ReLU function\n", 614 | " Z2 = ____\n", 615 | " A2 = Z2\n", 616 | "\n", 617 | " # Compute and print loss\n", 618 | " loss = ____\n", 619 | " if (divmod(epoch, SKIP)[1]==0):\n", 620 | " print(epoch, loss.item())\n", 621 | "\n", 622 | " # Backpropagation to compute gradients of loss with respect to W1, W2, b1, and b2\n", 623 | " delta2 = (A2 - y) # derivative of identity map == multiplying by ones\n", 624 | " grad_W2 = ____\n", 625 | " grad_b2 = ____\n", 626 | " delta1 = ____ # derivative of ReLU is a step function\n", 627 | " grad_W1 = ____\n", 628 | " grad_b1 = ____\n", 629 | "\n", 630 | " # Update weights & biases\n", 631 | " W1 = W1 - eta * grad_W1\n", 632 | " b1 = b1 - eta * grad_b1\n", 633 | " W2 = W2 - eta * grad_W2\n", 634 | " b2 = b2 - eta * grad_b2" 635 | ] 636 | }, 637 | { 638 | "cell_type": "markdown", 639 | "metadata": {}, 640 | "source": [ 641 | "## 2. Use `backward()` and `grad` to compute backpropagation and updates\n", 642 | "\n", 643 | "Having set up the main loop with PyTorch Tensors, now make use of `autograd` to eliminate the tedious work of writing code to compute the gradients of the loss function with respect to `W1`, `W2`, `b1`, and `b2` explicitly.\n", 644 | "+ Insert `requires_grad=True` as an argument in the construction of `W1`, `W2`, `b1`, and `b2`.\n", 645 | "+ After computing the loss function value `loss`, replace all the lines used to compute gradients explicitly by a single call to `loss.backward()`.\n", 646 | "+ Replace the update steps with gradients stored in `.grad` attributes of the weights & biases. 
For instance, you can now compute `W1 -= eta * W1.grad` *after* the call to `loss.backward()` rather than computing and explicitly storing `grad_W1` and later computing `W1 -= eta * grad_W1`.\n", 647 | " + Do these update steps within a `with torch.no_grad():` block (as provided below). The purpose of the [`torch.no_grad`](https://pytorch.org/docs/stable/torch.html#torch.no_grad) context manager is to stop autograd from tracking these update operations (which also reduces memory consumption).\n", 648 | " + After completing the updates, zero out the computed gradients before the next iteration by calling the method `.zero_()`. For instance, you would call `W1.grad.zero_()` to zero out the computed gradient in place. This call will be within the scope of the `torch.no_grad` context manager.\n", 649 | "\n", 650 | "Notice, in PyTorch, methods like `.zero_` that have a trailing underscore in their name operate in place, i.e., they overwrite the memory locations associated with the tensor." 651 | ] 652 | }, 653 | { 654 | "cell_type": "code", 655 | "execution_count": null, 656 | "metadata": {}, 657 | "outputs": [], 658 | "source": [ 659 | "# Create random input and output data\n", 660 | "X = torch.randn(dimensions[0], N_batch)\n", 661 | "y = torch.randn(dimensions[-1], N_batch)\n", 662 | "\n", 663 | "# Randomly initialize weights & biases\n", 664 | "W1 = torch.randn(dimensions[1], dimensions[0], requires_grad=True)\n", 665 | "W2 = torch.randn(dimensions[2], dimensions[1], requires_grad=True)\n", 666 | "b1 = torch.randn(dimensions[1], 1, requires_grad=True)\n", 667 | "b2 = torch.randn(dimensions[2], 1, requires_grad=True)\n", 668 | "\n", 669 | "eta, MAXITER, SKIP = 5e-6, 2500, 100\n", 670 | "for epoch in range(MAXITER):\n", 671 | " # Forward propagation: compute predicted y\n", 672 | " Z1 = ____\n", 673 | " A1 = ____ # Native PyTorch ReLU function\n", 674 | " Z2 = ____\n", 675 | " A2 = Z2\n", 676 | "\n", 677 | " # Compute and print loss\n", 678 | " loss = ____\n", 679 | " if (divmod(epoch, SKIP)[1]==0):\n", 680 | " print(epoch, loss.item())\n", 681 | "\n", 682 | " # Backpropagation to compute gradients of loss with respect to W1, W2, b1, and b2\n", 683 | " loss.backward()\n", 684 | "\n", 685 | " # Update weights & biases\n", 686 | " with torch.no_grad():\n", 687 | " #\n", 688 | " # Fill in the code for the update steps\n", 689 | " #\n", 690 | " # Manually zero the gradients after updating weights\n", 691 | " #\n", 692 | " # Fill in the code to zero the gradients.\n", 693 | " #" 694 | ] 695 | }, 696 | { 697 | "cell_type": "markdown", 698 | "metadata": {}, 699 | "source": [ 700 | "---\n", 701 | "\n", 702 | "# What next?\n", 703 | "\n", 704 | "PyTorch has a large ecosystem of utilities including packages like `torch.nn` (which is similar in spirit to Keras, simplifying the specification of a network architecture in an object-oriented way) and `torch.optim` (which makes managing different optimization schemes easier). We've covered a lot of ground in this tutorial so far, so this will be as far as we can get today. But you should now have enough understanding of backpropagation to pick up more at [`pytorch.org`](https://pytorch.org) independently." 
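As a small taste of that ecosystem (and not part of the exercises above), here is a hedged sketch of the same 784-100-10 network written with `torch.nn` and `torch.optim`. The layer sizes and learning rate mirror the loop above, but the specific choices (`nn.Sequential`, `nn.MSELoss`, `optim.SGD`) are just one reasonable arrangement; note also that `nn.Linear` expects samples along the first dimension, so the data here are laid out as (batch, features) rather than (features, batch).

```python
import torch
from torch import nn, optim

N_batch, dims = 64, [784, 100, 10]
X = torch.randn(N_batch, dims[0])        # (batch, features) layout for nn.Linear
y = torch.randn(N_batch, dims[-1])

model = nn.Sequential(                   # weights & biases are created and tracked for you
    nn.Linear(dims[0], dims[1]),
    nn.ReLU(),
    nn.Linear(dims[1], dims[2]),
)
loss_fn = nn.MSELoss(reduction='sum')
optimizer = optim.SGD(model.parameters(), lr=5e-6)

for epoch in range(2500):
    optimizer.zero_grad()                # replaces the manual .grad.zero_() calls
    loss = 0.5 * loss_fn(model(X), y)
    if epoch % 100 == 0:
        print(epoch, loss.item())
    loss.backward()                      # autograd, exactly as in the exercise
    optimizer.step()                     # replaces the manual no_grad update block
```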
705 | ] 706 | } 707 | ], 708 | "metadata": { 709 | "kernelspec": { 710 | "display_name": "Python 3", 711 | "language": "python", 712 | "name": "python3" 713 | }, 714 | "language_info": { 715 | "codemirror_mode": { 716 | "name": "ipython", 717 | "version": 3 718 | }, 719 | "file_extension": ".py", 720 | "mimetype": "text/x-python", 721 | "name": "python", 722 | "nbconvert_exporter": "python", 723 | "pygments_lexer": "ipython3", 724 | "version": "3.6.10" 725 | } 726 | }, 727 | "nbformat": 4, 728 | "nbformat_minor": 4 729 | } 730 | --------------------------------------------------------------------------------