├── .gitignore
├── _config.yml
├── figs
│   ├── boundary.png
│   └── mlp.png
├── index.md
├── index_spring_2024.md
├── notebooks
│   ├── README.md
│   ├── notebook_1_perceptron.ipynb
│   ├── notebook_2_nn.ipynb
│   ├── notebook_3_image_classification.ipynb
│   ├── notebook_4_augmentation_logging.ipynb
│   ├── notebook_5_adversarial_examples.ipynb
│   ├── notebook_6_gan.ipynb
│   ├── notebook_7_autoencoder.ipynb
│   ├── notebook_8_rnn.ipynb
│   └── slides
│       ├── DL_adversarial_examples.pdf
│       ├── DL_attention_networks.pdf
│       ├── DL_convolutional_nets.pdf
│       ├── DL_deep_reinforcement_learning.pdf
│       ├── DL_gradient_descent.pdf
│       ├── DL_lightning_and_tensorboard.ipynb
│       ├── DL_multilayer_perceptrons.pdf
│       ├── DL_perceptrons.ipynb
│       ├── DL_perceptrons.pdf
│       ├── DL_recurrent_nets.pdf
│       ├── DL_regularization.pdf
│       ├── DL_transformers.pdf
│       └── DL_unsupervised_methods.pdf
├── readings
│   ├── RNN-tutorial-WildML.pdf
│   └── chapter4-ml.pdf
├── requirements.txt
├── slides
│   ├── DL_GANs.pdf
│   ├── DL_adversarial_examples.pdf
│   ├── DL_attention_networks.pdf
│   ├── DL_audio_adversarial.pdf
│   ├── DL_convolutional_nets.pdf
│   ├── DL_deep_reinforcement_learning.pdf
│   ├── DL_gradient_descent.pdf
│   ├── DL_lightning_and_tensorboard.ipynb
│   ├── DL_multilayer_perceptrons.pdf
│   ├── DL_perceptrons.ipynb
│   ├── DL_perceptrons.pdf
│   ├── DL_recurrent_nets.pdf
│   ├── DL_regularization.pdf
│   ├── DL_transformers.pdf
│   ├── DL_unsupervised_methods.pdf
│   └── GM_deep_RL_2_policy_gradients.pdf
└── utils
    ├── __init__.py
    ├── adversarial_examples
    │   ├── __init__.py
    │   ├── data.py
    │   ├── frequency_masking.py
    │   ├── models
    │   │   ├── __init__.py
    │   │   ├── audiomnist_classifier.py
    │   │   └── mnist_classifier.py
    │   ├── plotting.py
    │   ├── pretrained
    │   │   ├── audiomnist_classifier.pt
    │   │   └── mnist_classifier.pt
    │   └── training.py
    ├── data.py
    ├── gan
    │   ├── __init__.py
    │   ├── data.py
    │   └── plotting.py
    └── plotting.py
/.gitignore:
--------------------------------------------------------------------------------
1 | # miscellaneous
2 | **/.git/*
3 | __pycache__/
4 | .DS_Store
5 | ._.DS_Store
6 | .ipynb_checkpoints/
7 | .vscode/
8 | *.egg-info/
9 | .pytest_cache
10 | *.ipynb_checkpoints/
11 |
12 | # ignore large data
13 | data/*
14 | logs/*
15 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/_config.yml
--------------------------------------------------------------------------------
/figs/boundary.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/figs/boundary.png
--------------------------------------------------------------------------------
/figs/mlp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/figs/mlp.png
--------------------------------------------------------------------------------
/index.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | # DEEP LEARNING: Northwestern University CS 396/496 Winter 2025
4 |
5 | |[**Top**](#top)| [**Calendar**](#calendar)| [**Links**](#links)| [**Readings**](#readings)|
6 |
7 | #### Class Day/Time
8 | Tuesdays and Thursdays, 9:30am - 10:50am Central Time
9 |
10 | #### Location
11 | 2122 Sheridan Rd Classroom 250
12 |
13 | #### Instructors
14 | Professor: [Bryan Pardo](http://bryanpardo.com)
15 |
16 | TAs: Hugo Flores Garcia, Patrick O'Reilly
17 |
18 | Peer Mentors: Jerry Cao, Saumya Pailwan, Anant Poddar, Nathan Pruyne
19 |
20 | #### Office hours
21 |
22 | Anant Poddar M 3pm - 5pm Mudd 3rd floor front counter
23 |
24 | Nathan Pruyne M 5pm - 7pm Mudd 3207
25 |
26 | Bryan Pardo TU 11am - noon Mudd 3115
27 |
28 | Saumya Pailwan W 2pm - 4pm Mudd 3rd floor front counter
29 |
30 | Jerry Cao W 5pm - 7pm Mudd 3rd floor front counter
31 |
32 | Patrick O'Reilly TH 12pm - 1pm, 2pm-3pm Mudd 3207
33 |
34 | Hugo Flores Garcia TH 2pm - 4pm Mudd 3207
35 |
36 |
37 | ## Course Description
38 | This is a first course in Deep Learning. We will study deep learning architectures: perceptrons, multi-layer perceptrons, convolutional networks, recurrent neural networks (LSTMs, GRUs), attention networks, transformers, autoencoders, and the combination of reinforcement learning with deep learning. Other covered topics include regularization, loss functions and gradient descent.
39 |
40 | Learning will be in the practical context of implementing networks using these architectures in a modern programming environment: Pytorch. Homework consists of a mixture of programming assignments, review of research papers, running experiments with deep learning architectures, and theoretical questions about deep learning.
41 |
42 | Students completing this course should be able to reason about deep network architectures, build a deep network from scratch in Python, modify existing deep networks, train networks, and evaluate their performance. Students completing the course should also be able to understand current research in deep networks.
43 |
44 | ## Course Prerequisites
45 | This course presumes prior knowledge of machine learning equivalent to having taken CS 349 Machine Learning.
46 |
47 | ## Course textbook
48 | The primary text is the [Deep Learning book](http://www.deeplearningbook.org/). This reading will be supplemented by reading key papers in the field.
49 |
50 | ## Course Policies
51 | #### Questions outside of class
52 | Please use [CampusWire](https://campuswire.com) for class-related questions.
53 |
54 | #### Submitting assignments
55 | Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can't finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.
56 |
57 | #### Grading Policy
58 | You will be graded on a 100 point scale (e.g. 93 to 100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-...and so on).
59 |
60 | Homework and reading assignments are solo assignments and must be your own original work. Use of large language models for answer generation is not allowed.
61 |
62 | #### Extra Credit
63 | There is an extra credit assignment worth 10 points. More details soon.
64 |
65 |
66 | ## Course Calendar
67 | [**Back to top**](#top)
68 |
69 | | Week|Day and Date| Topic (tentative) |Due today | Points|
70 | |----:|------------|------------------------------------------|--------------------|------:|
71 | |1 | Tue Jan 7 | [Perceptrons](slides/DL_perceptrons.pdf) | | |
72 | |1 | - | [Notebook 1: perceptrons](notebooks/notebook_1_perceptron.ipynb) | | |
73 | |1 | Thu Jan 9 | [Gradient descent](slides/DL_gradient_descent.pdf) | | |
74 | |2 | Tue Jan 14 | [Backpropagation of error](slides/DL_multilayer_perceptrons.pdf) | | |
75 | |2 | - | [Notebook 2: MLP in Pytorch](notebooks/notebook_2_nn.ipynb) | | |
76 | |2 | Thu Jan 16 | [Multi-layer perceptrons](slides/DL_multilayer_perceptrons.pdf) | | |
77 | |3 | Tue Jan 21 | [Convolutional nets](slides/DL_convolutional_nets.pdf) | Homework 1 | 15 |
78 | |3 | - | [Notebook 3: Image Classification](notebooks/notebook_3_image_classification.ipynb) | | |
79 | |3 | Thu Jan 23 | [Regularization](slides/DL_regularization.pdf) | | |
80 | |4 | Tue Jan 28 | [Data augmentation & generalization](slides/DL_regularization.pdf) | | |
81 | |4 | - | [Notebook 4: CNNs & Logging](notebooks/notebook_4_augmentation_logging.ipynb) | | |
82 | |4 | Thu Jan 30 | [Adversarial examples](slides/DL_adversarial_examples.pdf) | | |
83 | |4 | - | [Notebook 5: adversarial examples](notebooks/notebook_5_adversarial_examples.ipynb) | | |
84 | |5 | Tue Feb 4 | [Generative adversarial networks (GANS)](slides/DL_GANs.pdf) | Homework 2| 15 |
85 | |5 | - | [Notebook 6: GANs](notebooks/notebook_6_gan.ipynb) | | |
86 | |5 | Thu Feb 6 | Catch up day | | |
87 | |6 | Tue Feb 11 | MIDTERM | Midterm | 20 |
88 | |6 | Thu Feb 13 | [Unsupervised methods](slides/DL_unsupervised_methods.pdf) | | |
89 | |6 | - | [Notebook 7: autoencoders](notebooks/notebook_7_autoencoder.ipynb) | | |
90 | |7 | Tue Feb 18 | [Recurrent nets](slides/DL_recurrent_nets.pdf) | | |
91 | |7 | Thu Feb 20 | [LSTMs](slides/DL_recurrent_nets.pdf) | Homework 3 | 15 |
92 | |7 | - | [Notebook 8: RNNs](notebooks/notebook_8_rnn.ipynb) | | |
93 | |8 | Tue Feb 25 | [Deep RL](slides/DL_deep_reinforcement_learning.pdf) | | |
94 | |8 | Thu Feb 27 | [Reinforcement learning (RL)](slides/DL_deep_reinforcement_learning.pdf) | | |
95 | |9 | Tue Mar 4 | [Pong with Reinforcement learning (RL)](slides/GM_deep_RL_2_policy_gradients.pdf) | | |
96 | |9 | Thu Mar 6 | [Attention networks](slides/DL_attention_networks.pdf) | Homework 4 | 15 |
97 | |10| Tue Mar 11 | [Transformers](slides/DL_transformers.pdf) | | |
98 | |10| Thu Mar 13 | FINAL EXAM | Final Exam | 20 |
99 | |11| Thu Mar 20 | Extra Credit Due | Extra Credit| 10 |
100 |
101 |
102 |
103 | ## Links
104 | [**Back to top**](#top)
105 |
106 | ### Helpful Programming Packages
107 |
108 | [Anaconda](https://www.anaconda.com) is the most popular python distro for machine learning.
109 |
110 | [Pytorch](http://pytorch.org/) Facebook's popular deep learning package. My lab uses this.
111 |
112 | [Tensorboard](https://www.tensorflow.org/tensorboard) is what my lab uses to visualize how experiments are going.
113 |
114 | [Tensorflow](https://www.tensorflow.org/) is Google's most popular python DNN package.
115 |
116 | [Keras](https://keras.io/) is a nice programming API that works with Tensorflow.
117 |
118 | [JAX](https://github.com/google/jax) is an alpha package from Google that allows differentiation of numpy code and includes an optimizing compiler for working on tensor processing units.
119 |
120 | [Trax](https://github.com/google/trax) is Google Brain's DNN package. It focuses on transformers and is implemented on top of [JAX](https://github.com/google/jax).
121 |
122 | [MXNET](https://mxnet.apache.org/versions/1.6/get_started?) is Apache's open source DL package.
123 |
124 | ### Helpful Books on Deep Learning
125 |
126 | [Deep Learning](http://www.deeplearningbook.org/) is THE book on Deep Learning. One of the authors won the Turing prize due to his work on deep learning.
127 |
128 | [Dive Into Deep Learning](http://d2l.ai/index.html) provides example code and instruction for how to write DL models in Pytorch, Tensorflow and MXNet.
129 |
130 | ### Computing Resources
131 |
132 | [Google's Colab](https://colab.research.google.com/notebooks/intro.ipynb) offers free GPU time and a nice environment for running Jupyter notebook-style projects.
133 | For $10 per month, you also get priority access to GPUs and TPUs.
134 |
135 | [Amazon's SageMaker](https://aws.amazon.com/sagemaker/pricing/) offers hundreds of free hours for newbies.
136 |
137 | [The CS Department Wilkinson Lab](http://it.eecs.northwestern.edu/info/2015/11/03/info-labs.html) just got 22 new machines that each have a graphics card suitable for deep learning, and should be remotely accessible and running Linux with all the python packages needed for deep learning.
138 |
139 |
140 |
141 | ## Course Reading
142 | [**Back to top**](#top)
143 |
144 |
145 | ### Book Chapter Readings
146 |
147 | 1. [Chapter 4 of Machine Learning ](readings/chapter4-ml.pdf): **READ THIS FIRST** This is Tom Mitchell's book. Historical overview + explanation of backprop of error. It's a good starting point for actually understanding deep nets. Read the whole chapter.
148 |
149 | 1. [What are Gradients, Jacobians, and Hessians?](https://najeebkhan.github.io/blog/VecCal.html): This isn't a book chapter, but if you don't know what a gradient, Jacobian or Hessian is, you should read this before you read Chapter 4 of the Deep Learning book.
150 |
151 | 1. [Chapter 4 of the Deep Learning Book](http://www.deeplearningbook.org/): This covers basics of gradient-based optimization. Read through Section 4.3.
152 |
153 | 1. [Chapter 6 of Deep Learning](http://www.deeplearningbook.org/): This covers the basics from a more modern perspective. To my mind, if you've read Tom Mitchell, it is mostly useful for covering different kinds of activation functions. Read through Section 6.4
154 |
155 | 1. [Chapter 7 of the Deep Learning Book](http://www.deeplearningbook.org/): Covers regularization. The minimal useful read is sections 7.1 and 7.4...but this assumes you'll read the papers some of the other sections are based on. Those papers are in the additional readings. If you don't read those, then I'd add 7.9, 7.12, 7.13.
156 |
157 | 1. [Chapter 8 of the Deep Learning Book](http://www.deeplearningbook.org/): This covers optimization. Read through Section 8.5. Beyond that, it is stuff outside the scope of the class.
158 |
159 | 1. [Chapter 9 of Deep Learning](http://www.deeplearningbook.org/): Convolutional networks. Read 9.1 through 9.4 and 9.10
160 |
161 | ------ no book chapter below this line will be expected for the midterm ------
162 |
163 | 1. [Understanding LSTMs](http://colah.github.io/posts/2015-08-Understanding-LSTMs/): A simple (maybe too simple?) walk-through of LSTMs. Good to read before trying the book chapter on this topic.
164 |
165 | 1. [Chapter 10 of Deep Learning](http://www.deeplearningbook.org/): RNNs and LSTMS
166 |
167 | 1. [Reinforcement Learning: An Introduction, Chapters 3 and 6](http://www.incompleteideas.net/book/RLbook2020.pdf): This gives you the basics of what reinforcement learning (RL) is about.
168 |
169 | ### Additional Readings
170 |
171 | 1. [Generalization and Network Design Strategies](https://masters.donntu.ru/2012/fknt/umiarov/library/lecun.pdf): The original 1989 paper where LeCun describes Convolutional networks.
172 |
173 | 1. [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167v3.pdf)
174 |
175 | 1. [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572) : This paper got the ball rolling by pointing out how to make images that look good but are consistently misclassified by trained deepnets.
176 |
177 | 1. [Generative Adversarial Nets](https://arxiv.org/pdf/1406.2661v1.pdf): The paper that introduced GANs.
178 |
179 | 1. [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf): Explains a widely-used regularizer
180 |
181 | ------ no additional reading below this line will be expected for the midterm ------
182 |
183 | 1. [DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434): This is an end-to-end model. Many papers build on this. The homework uses the discriminator approach from this paper
184 |
185 | 1. [Long Short-Term Memory](https://www.researchgate.net/publication/13853244_Long_Short-term_Memory): The original 1997 paper introducing the LSTM
186 |
187 | 1. [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602): A key paper that showed how reinforcement learning can be used with deep nets. This is discussed in class.
188 |
189 | 1. [Deep Reinforcement Learning: Pong from Pixels](http://karpathy.github.io/2016/05/31/rl/): This is the blog we base part of Homework 4 on.
190 |
191 | 1. [The Illustrated Transformer](http://jalammar.github.io/illustrated-transformer/): A good walkthrough that helps a lot with understanding transformers
192 |
193 | 1. [Attention is All You Need](https://arxiv.org/abs/1706.03762): The paper that introduced transformers, which are a popular and more complicated kind of attention network.
194 |
195 | 1. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/pdf/1810.04805.pdf): A widely-used language model based on Transformer encoder blocks.
196 |
197 | 1. [The Illustrated GPT-2](http://jalammar.github.io/illustrated-gpt2/): Not a paper, but a good overview of GPT-2 and its relation to Transformer decoder blocks.
198 |
199 |
200 | |[**Top**](#top)| [**Calendar**](#calendar)| [**Links**](#links)| [**Readings**](#readings)|
201 |
202 |
--------------------------------------------------------------------------------
/index_spring_2024.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | # DEEP LEARNING: Northwestern University CS 396/496 Spring 2024
4 |
5 | |[**Top**](#top)| [**Calendar**](#calendar)| [**Links**](#links)| [**Readings**](#readings)|
6 |
7 | #### Class Day/Time
8 | Tuesdays and Thursdays, 9:30am - 10:50am Central Time
9 |
10 | #### Location
11 | Tech Lecture Room 5
12 |
13 |
14 | #### Instructors
15 | Professor: [Bryan Pardo](http://bryanpardo.com)
16 |
17 | TAs: Hugo Flores Garcia, Weijan Li
18 |
19 | Peer Mentors: Conor Kotwasinski, Cameron Churchwell, Nathan Pruyne, Finn Wintz, Ben Ferreira
20 |
21 | #### Office hours
22 | Monday: Weijan Li 3-5pm on [Weijan's zoom link](https://northwestern.zoom.us/j/97477504815), Conor Kotwasinski 5-6pm in Mudd 3532
23 |
24 | Tuesday: Hugo Flores Garcia 1-2pm in Mudd 3532, Cameron Churchwell 1-2pm on [Cameron's zoom link](https://northwestern.zoom.us/j/3883785036?pwd=S0pxYWgzUGZ2d1p4ZEZGMnl0SG80dz09), Bryan Pardo 3-5pm in Mudd 3115
25 |
26 | Wednesday: Cameron Churchwell 9-10am in Mudd 3532, Ben Ferreira 1-3pm on [Ben's zoom link](https://northwestern.zoom.us/j/95885283343), Conor Kotwasinski 3 - 4pm in Mudd 3108, Finn Wintz 4-5pm on [Finn's zoom link](https://northwestern.zoom.us/j/2092202714)
27 |
28 | Thursday: Finn Wintz 4-5pm on [Finn's zoom link](https://northwestern.zoom.us/j/2092202714), Nathan Pruyne 6pm - 8pm in Mudd 3532, Hugo Flores Garcia 1-2pm in Mudd 3532
29 |
30 |
31 |
32 | ## Course Description
33 | This is a first course in Deep Learning. We will study deep learning architectures: perceptrons, multi-layer perceptrons, convolutional networks, recurrent neural networks (LSTMs, GRUs), attention networks, transformers, autoencoders, and the combination of reinforcement learning with deep learning. Other covered topics include regularization, loss functions and gradient descent.
34 |
35 | Learning will be in the practical context of implementing networks using these architectures in a modern programming environment: Pytorch. Homework consists of a mixture of programming assignments, review of research papers, running experiments with deep learning architectures, and theoretical questions about deep learning.
36 |
37 | Students completing this course should be able to reason about deep network architectures, build a deep network from scratch in Python, modify existing deep networks, train networks, and evaluate their performance. Students completing the course should also be able to understand current research in deep networks.
38 |
39 | ## Course Prerequisites
40 | This course presumes prior knowledge of machine learning equivalent to having taken CS 349 Machine Learning.
41 |
42 | ## Course textbook
43 | The primary text is the [Deep Learning book](http://www.deeplearningbook.org/). This reading will be supplemented by reading key papers in the field.
44 |
45 | ## Course Policies
46 | #### Questions outside of class
47 | Please use [CampusWire](https://campuswire.com) for class-related questions.
48 |
49 | #### Submitting assignments
50 | Assignments must be submitted on the due date by the time specified on Canvas. If you are worried you can't finish on time, upload a safety submission an hour early with what you have. I will grade the most recent item submitted before the deadline. Late submissions will not be graded.
51 |
52 | #### Grading Policy
53 | You will be graded on a 100 point scale (e.g. 93 to 100 = A, 90-92 = A-, 87-89 = B+, 83-86 = B, 80-82 = B-...and so on).
54 |
55 | Homework and reading assignments are solo assignments and must be original work.
56 |
57 | #### Extra Credit
58 | You can earn up to 8 points of extra credit in the final reading assignment.
59 |
60 |
61 |
62 | ## Course Calendar
63 | [**Back to top**](#top)
64 |
65 | | Week|Day and Date| Topic (tentative) |Due today | Points|
66 | |----:|------------|------------------------------------------|--------------------|------:|
67 | |1 | Tue Mar 26 | No class: Northwestern runs Monday classes on Tuesday | | |
68 | |1 | Thu Mar 28 | [Perceptrons](slides/DL_perceptrons.pdf) | | |
69 | |1 | - | [Notebook 1: perceptrons](notebooks/notebook_1_perceptron.ipynb) | | |
70 | |2 | Tue Apr 02 | [Gradient descent](slides/DL_gradient_descent.pdf) | Reading 1 | 8 |
71 | |2 | Thu Apr 04 | [Backpropagation of error](slides/DL_multilayer_perceptrons.pdf) | | |
72 | |2 | - | [Notebook 2: MLP in Pytorch](notebooks/notebook_2_nn.ipynb) | | |
73 | |3 | Tue Apr 9 | [Multi-layer perceptrons](slides/DL_multilayer_perceptrons.pdf) | Homework 1 | 15 |
74 | |3 | Thu Apr 11 | [Convolutional nets](slides/DL_convolutional_nets.pdf) | | |
75 | |3 | - | [Notebook 3: Image Classification](notebooks/notebook_3_image_classification.ipynb) | | |
76 | |4 | Tue Apr 16 | [Regularization](slides/DL_regularization.pdf) | Reading 2 | 8 |
77 | |4 | Thu Apr 18 | [Data augmentation & generalization](slides/DL_regularization.pdf) | | |
78 | |4 | - | [Notebook 4: CNNs & Logging](notebooks/notebook_4_augmentation_logging.ipynb) | | |
79 | |5 | Tue Apr 23 | [Visual adversarial examples](slides/DL_adversarial_examples.pdf) | | |
80 | |5 | Thu Apr 25 | [Auditory adversarial examples](slides/DL_audio_adversarial.pdf) | Homework 2| 15 |
81 | |5 | - | [Notebook 5: adversarial examples](notebooks/notebook_5_adversarial_examples.ipynb) | | |
82 | |6 | Tue Apr 30 | [Generative adversarial networks (GANS)](slides/DL_GANs.pdf) | | |
83 | |6 | Thu May 02 | [More GANS](slides/DL_GANs.pdf) | Reading 3 | 8 |
84 | |6 | - | [Notebook 6: GANs](notebooks/notebook_6_gan.ipynb) | | |
85 | |7 | Tue May 07 | [Unsupervised methods](slides/DL_unsupervised_methods.pdf) | | |
86 | |7 | Thu May 09 | [Recurrent nets](slides/DL_recurrent_nets.pdf) | Homework 3 | 15 |
87 | |7 | - | [Notebook 7: autoencoders](notebooks/notebook_7_autoencoder.ipynb) | | |
88 | |8 | Tue May 14 | [LSTMs](slides/DL_recurrent_nets.pdf) | | |
89 | |8 | Thu May 16 | [Deep RL](slides/DL_deep_reinforcement_learning.pdf) | Reading 4 | 8 |
90 | |8 | - | [Notebook 8: RNNs](notebooks/notebook_8_rnn.ipynb) | | |
91 | |9 | Tue May 21 | [Reinforcement learning (RL)](slides/DL_deep_reinforcement_learning.pdf) | | |
92 | |9 | Thu May 23 | [Pong with Reinforcement learning (RL)](slides/GM_deep_RL_2_policy_gradients.pdf) | Reading 5 | 8 |
93 | |9 | - | [Attention networks](slides/DL_attention_networks.pdf) | | |
94 | |10| Tue May 28 | [Transformers](slides/DL_transformers.pdf) | | |
95 | |10| Thu May 30 | Current research in DL | Homework 4 | 15 |
96 | |10| - | | | |
97 | |11| Tue Jun 04 | No final exam, just extra credit reading | Extra Credit Reading 6 | 8 |
98 |
99 |
100 |
101 |
102 |
103 | ## Links
104 | [**Back to top**](#top)
105 |
106 | ### Helpful Programming Packages
107 |
108 | [Anaconda](https://www.anaconda.com) is the most popular python distro for machine learning.
109 |
110 | [Pytorch](http://pytorch.org/) Facebook's popular deep learning package. My lab uses this.
111 | [Tensorboard](https://www.tensorflow.org/tensorboard) is what my lab uses to visualize how experiments are going.
112 |
113 | [Tensorflow](https://www.tensorflow.org/) is Google's most popular python DNN package.
114 |
115 | [Keras](https://keras.io/) is a nice programming API that works with Tensorflow.
116 |
117 | [JAX](https://github.com/google/jax) is an alpha package from Google that allows differentiation of numpy code and includes an optimizing compiler for working on tensor processing units.
118 |
119 | [Trax](https://github.com/google/trax) is Google Brain's DNN package. It focuses on transformers and is implemented on top of [JAX](https://github.com/google/jax).
120 |
121 | [MXNET](https://mxnet.apache.org/versions/1.6/get_started?) is Apache's open source DL package.
122 |
123 | ### Helpful Books on Deep Learning
124 |
125 | [Deep Learning](http://www.deeplearningbook.org/) is THE book on Deep Learning. One of the authors won the Turing prize due to his work on deep learning.
126 |
127 | [Dive Into Deep Learning](http://d2l.ai/index.html) provides example code and instruction for how to write DL models in Pytorch, Tensorflow and MXNet.
128 |
129 | ### Computing Resources
130 |
131 | [Google's Colab](https://colab.research.google.com/notebooks/intro.ipynb) offers free GPU time and a nice environment for running Jupyter notebook-style projects.
132 | For $10 per month, you also get priority access to GPUs and TPUs.
133 |
134 | [Amazon's SageMaker](https://aws.amazon.com/sagemaker/pricing/) offers hundreds of free hours for newbies.
135 |
136 | [The CS Department Wilkinson Lab](http://it.eecs.northwestern.edu/info/2015/11/03/info-labs.html) just got 22 new machines that each have a graphics card suitable for deep learning, and should be remotely accessible and running Linux with all the python packages needed for deep learning.
137 |
138 |
139 |
140 | ## Course Reading
141 | [**Back to top**](#top)
142 |
143 | #### The History
144 | 1. [The Organization of Behavior](https://pure.mpg.de/pubman/item/item_2346268_3/component/file_2346267/Hebb_1949_The_Organization_of_Behavior.pdf): Hebb's 1949 book that provides a general framework for relating behavior to synaptic organization through the dynamics of neural networks.
145 |
146 | 1. [The Perceptron](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.335.3398&rep=rep1&type=pdf): This is the 1st neural networks paper, published in 1958. The algorithm won't be obvious, but the thinking is interesting and the conclusions are worth reading.
147 |
148 | 1. [The Perceptron: A perceiving and recognizing automaton](https://bpb-us-e2.wpmucdn.com/websites.umass.edu/dist/a/27637/files/2016/03/rosenblatt-1957.pdf): This one is an earlier paper by Rosenblatt that is, perhaps, even more historical than the 1958 paper and a bit easier for an engineer to follow, I think.
149 |
150 | #### The basics (1st reading topic)
151 |
152 | 1. [* Chapter 4 of Machine Learning ](readings/chapter4-ml.pdf): This is Tom Mitchell's book. Historical overview + explanation of backprop of error. It's a good starting point for actually understanding deep nets. **START HERE. IT'S WORTH 2 READINGS. WHAT THAT MEANS IS...GIVE ME 2 PAGES OF REACTIONS FOR THIS READING AND GET CREDIT FOR 2 READINGS**
153 |
154 | 1. [Chapter 6 of Deep Learning](http://www.deeplearningbook.org/): Modern intro on deep nets. To me, this is harder to follow than Chapter 4 of Machine Learning, though. Certainly, it's longer.
155 |
156 | #### Optimization (2nd reading topic)
157 |
158 | 1. [This reading is **NOT** worth points, but...](https://najeebkhan.github.io/blog/VecCal.html)...if you don't know what a gradient, Jacobian or Hessian is, you should read this before you read Chapter 4 of the Deep Learning book.
159 |
160 | 1. [Chapter 4 of the Deep Learning Book](http://www.deeplearningbook.org/): This covers basics of gradient-based optimization. **Start here for optimization**
161 |
162 | 1. [Chapter 8 of the Deep Learning Book](http://www.deeplearningbook.org/): This covers optimization. **This should come 2nd in your optimization reading**
163 |
164 | 1. [Why Momentum Really Works](http://distill.pub/2017/momentum/): Reading this will help you understand the popular ADAM optimizer better.
165 |
166 | 1. [On the Difficulties of Training Recurrent Networks](http://proceedings.mlr.press/v28/pascanu13.pdf): A 2013 paper that explains vanishing and exploding gradients
167 |
168 | 1. [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/abs/1502.03167). This is one of the most common approaches to normalization.
169 |
170 | 1. [AutoClip: Adaptive Gradient Clipping for Source Separation Networks](https://arxiv.org/abs/2007.14469) is a recent paper out of Pardo's lab that helps deal with unruly gradients. There's also [a video](https://www.youtube.com/watch?v=Rc0AN_PzyE0&feature=youtu.be) for this one.
171 |
172 |
173 | #### Convolutional Networks (3rd reading topic)
174 | 1. [Generalization and Network Design Strategies](http://yann.lecun.com/exdb/publis/pdf/lecun-89.pdf): The original 1989 paper where LeCun describes Convolutional networks. **Start here.**
175 |
176 | 1. [Chapter 9 of Deep Learning: Convolutional Networks](http://www.deeplearningbook.org/).
177 |
178 | #### Regularization and overfitting (4th reading topic)
179 | 1. [Chapter 7 of the Deep Learning Book](http://www.deeplearningbook.org/): Covers regularization.
180 |
181 | 1. [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf): Explains a widely-used regularizer
182 |
183 | 1. [Understanding deep learning requires rethinking generalization](https://arxiv.org/abs/1611.03530): Thinks about the question "why aren't deep nets overfitting even more than they seem to be?"
184 |
185 | 1. [The Implicit Bias of Gradient Descent on Separable Data](http://www.jmlr.org/papers/volume19/18-188/18-188.pdf) : A study of bias that is actually based on the algorithm, rather than the dataset.
186 |
187 |
188 | #### Experimental Design
189 | 1. [The Extent and Consequences of P-Hacking in Science](https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002106)
190 |
191 | #### Visualizing and understanding network representations
192 | 1. [Visualizing and Understanding Convolutional Networks](https://arxiv.org/pdf/1311.2901v3.pdf): How do you see what the net is thinking? Here's one way.
193 |
194 | 1. [Local Interpretable Model-Agnostic Explanations (LIME): An Introduction](https://www.oreilly.com/content/introduction-to-local-interpretable-model-agnostic-explanations-lime/) A technique to explain the predictions of any machine learning classifier.
195 |
196 | #### Popular Architectures for Convolutional Networks
197 | If you already understand what convolutional networks are, then here are some popular architectures you can find out about.
198 |
199 | 1. [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385): The 2016 paper that introduces the popular ResNet architecture that can get 100 layers deep
200 |
201 | 1. [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556): The 2015 paper introducing the popular VGG architecture
202 |
203 | 1. [Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842): The 2015 paper describing the Inception network architecture.
204 |
205 |
206 | #### Adversarial examples
207 | 1. [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572) : This paper got the ball rolling by pointing out how to make images that look good but are consistently misclassified by trained deepnets.
208 |
209 | 1. [Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images](https://arxiv.org/pdf/1412.1897.pdf): This paper shows just how screwy you can make an image and still have it misclassified by a "well trained, highly accurate" image recognition deep net.
210 |
211 | 1. [Effective and Inconspicuous Over-the-air Adversarial Examples with Adaptive Filtering](https://interactiveaudiolab.github.io/assets/papers/oreilly_awasthi_vijayaraghavan_pardo_2021.pdf): Cutting edge research from our very own Patrick O.
212 |
213 | #### Creating GANs
214 | 1. [Generative Adversarial Nets](https://arxiv.org/pdf/1406.2661v1.pdf): The paper that introduced GANs. **If you read only one GAN paper, make it this one.**
215 |
216 | 1. [2016 Tutorial on Generative Adversarial Networks](https://arxiv.org/pdf/1701.00160.pdf) by one of the creators of the GAN. This one's long, but good.
217 |
218 | 1. [DCGAN: Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks](https://arxiv.org/abs/1511.06434): This is an end-to-end model. Many papers build on this. **The homework uses the discriminator approach from this paper**
219 |
220 | 1. [Generative Adversarial Text to Image Synthesis](http://proceedings.mlr.press/v48/reed16.pdf) This paper describes generating images conditioned on text descriptions. Pretty interesting...
221 |
222 | #### Recurrent Networks
223 | 1. [Chapter 10 of Deep Learning](http://www.deeplearningbook.org/): A decent starting point
224 |
225 | 1. [The Recurrent Neural Networks Tutorial](readings/RNN-tutorial-WildML.pdf): This is a 4-part tutorial that starts with an overview and then gets deep into coding up an RNN using Theano (not PyTorch) and has links to GitHub repositories with all the examples. If you just read this for the points, read Part 1. But go deep, if you're interested, and read all the parts. **NOTE** the links to the code repositories work. Many of the other hyperlinks don't.
226 |
227 | 1. [* Extensions of recurrent neural network language model](https://ieeexplore.ieee.org/abstract/document/5947611?casa_token=VaRzW-PbtiEAAAAA:BAXcc2Tb4HL-e2TrTSdao50lxoYMaSkGA0o0iZKC8ojYP-wPHfnWCjlOfj6-coIID8PrBqBE): This covers the RNN language model discussed in class.
228 |
229 | 1. [Backpropagation through time: what it does and how to do it](https://ieeexplore.ieee.org/abstract/document/58337?casa_token=61YezqH4E60AAAAA:Sp19xOIx2R3xt8XnnCy8Cb8vqNt6LLwamZmIr2G6iAAk4PrOYVgqdQyyQKQzXwcgm9bTo6px)
230 |
231 | 1. [Long Short-Term Memory](https://www.researchgate.net/publication/13853244_Long_Short-term_Memory): The original 1997 paper introducing the LSTM
232 |
233 | 1. [Understanding LSTMs](http://colah.github.io/posts/2015-08-Understanding-LSTMs/): A simple (maybe too simple?) walk-through of LSTMs
234 |
235 | 1. [Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling](https://arxiv.org/pdf/1412.3555.pdf): Compares a simplified LSTM (the GRU) to the original LSTM and also simple RNN units.
236 |
237 |
238 | #### Attention networks (read these before looking at Transformers)
239 |
240 | 1. [Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)](https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/) **This is a good starting point on attention models.**
241 |
242 | 1. [Sequence to Sequence Learning with Neural Networks](https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf): This is the paper that the link above was trying to explain.
243 |
244 | 1. [* Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation](https://arxiv.org/pdf/1406.1078.pdf): This paper introduces encoder-decoder networks for translation. Attention models were first built on this framework. Covered in class.
245 |
246 | 1. [* Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/pdf/1409.0473.pdf): This paper introduces additive attention to an encoder-decoder. Covered in class.
247 |
248 | 1. [* Effective Approaches to Attention-based Neural Machine Translation](https://arxiv.org/pdf/1508.04025.pdf): Introduced multiplicative attention. Covered in class.
249 |
250 | 1. [Massive Exploration of Neural Machine Translation Architectures](https://arxiv.org/pdf/1703.03906.pdf): A 2017 paper that settles the question of which architecture is best for doing translation...except that the Transformer model came out that same year and upended everything. Still, a good overview of the pre-transformer state-of-the-art.
251 |
252 | 1. [Show, Attend and Tell: Neural Image Caption Generation with Visual Attention](https://arxiv.org/abs/1502.03044): Attention started with text, but is now applied to images. Here's an example.
253 |
254 | 1. [Listen, Attend and Spell](https://arxiv.org/pdf/1508.01211.pdf): Attention is also applied to speech, as per this example.
255 |
256 | 1. [A Tutorial in TensorFlow](https://github.com/tensorflow/nmt): This walks through how to use Tensorflow 1.X to build a neural machine translation network with attention.
257 |
258 | #### Transformer networks (Don't read until you understand attention models)
259 | 1. [The Illustrated Transformer](http://jalammar.github.io/illustrated-transformer/): A good walkthrough that helps a lot with understanding transformers. **I'd start with this one to learn about transformers.**
260 |
261 | 1. [The Annotated Transformer](https://nlp.seas.harvard.edu/2018/04/03/attention.html): An annotated walk-through of the "Attention is All You Need" paper, complete with detailed python implementation of a transformer.
262 |
263 | 1. [Attention is All You Need](https://arxiv.org/abs/1706.03762): The paper that introduced transformers, which are a popular and more complicated kind of attention network.
264 |
265 | 1. [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/pdf/1810.04805.pdf): A widely-used language model based on Transformer encoder blocks.
266 |
267 | 1. [The Illustrated GPT-2](http://jalammar.github.io/illustrated-gpt2/): A good overview of GPT-2 and its relation to Transformer decoder blocks.
268 |
269 |
270 | #### Reinforcement Learning
271 | 1. [Reinforcement Learning: An Introduction, Chapters 3 and 6](http://www.incompleteideas.net/book/RLbook2020.pdf): This gives you the basics of what reinforcement learning (RL) is about.
272 |
273 | 1. [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602): A key paper that showed how reinforcement learning can be used with deep nets. This is discussed in class.
274 |
275 | 1. [Deep Reinforcement Learning: Pong from Pixels](http://karpathy.github.io/2016/05/31/rl/): This is the blog we base part of Homework 4 on. Reading this is a very efficient use of your time, since you can count it as a reading AND you have to read it for the homework, anyhow.
276 |
277 | 1. [Mastering the game of Go with deep neural networks and tree search](http://airesearch.com/wp-content/uploads/2016/01/deepmind-mastering-go.pdf): A famous paper that showed how RL + Deepnets = the best Go player in existence at the time.
278 |
279 | 1. [A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play](https://science.sciencemag.org/content/362/6419/1140/): This is the AlphaZero paper. AlphaZero is the best go player...and a great chess player.
280 |
281 |
282 | |[**Top**](#top)| [**Calendar**](#calendar)| [**Links**](#links)| [**Readings**](#readings)|
283 |
284 |
--------------------------------------------------------------------------------
/notebooks/README.md:
--------------------------------------------------------------------------------
1 | # COMP_SCI 396 Deep Learning SP2022
2 |
3 |
4 | ---
5 |
6 | This repository holds notebooks and code for the course COMP_SCI 396 Deep Learning, Spring 2022, Northwestern University.
7 |
8 |
9 | ### Getting Started
10 |
11 | If you want to run the notebooks locally rather than through Google Colab, the following steps are recommended:
12 |
13 | 1. Clone this repository from GitHub:
14 | ```bash
15 | git clone https://github.com/interactiveaudiolab/course-deep-learning.git && cd course-deep-learning
16 | ```
17 |
18 | 2. Download and install Miniconda, a Python virtual environment manager
19 |
20 | 3. Create a new virtual environment using the following command:
21 | ```bash
22 | conda create --name course-deep-learning python=3.8
23 | ```
24 |
25 | 4. Activate the new environment:
26 | ```bash
27 | conda activate course-deep-learning
28 | ```
29 |
30 | 5. Install the required packages (you can then skip the installation instructions at the beginning of each notebook):
31 | ```bash
32 | pip install -r requirements.txt
33 | ```
34 |
35 | 6. Run the following command to ensure your new virtual environment is visible to Jupyter:
36 | ```bash
37 | python -m ipykernel install --user --name=course-deep-learning
38 | ```
39 |
40 | 7. When PyTorch downloads a dataset and tries to create a progress bar, Jupyter can occasionally cause issues. To avoid this, run:
41 | ```bash
42 | conda install -c conda-forge ipywidgets
43 | jupyter nbextension enable --py widgetsnbextension
44 | ```
45 |
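If you'd like to verify the setup, a quick optional check (a minimal sketch, not part of the course materials) is to confirm that PyTorch imports inside the new environment and to see whether a GPU is visible:

```python
# optional sanity check: confirm the PyTorch install and GPU visibility
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# tiny end-to-end tensor computation to make sure everything works
x = torch.randn(3, 3)
print((x @ x.T).shape)  # expected: torch.Size([3, 3])
```
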
46 | ### Notebooks
47 |
48 | TODO: put Google Colab links here once notebooks are finalized
49 |
--------------------------------------------------------------------------------
/notebooks/notebook_3_image_classification.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "59b3745d-0dfd-4661-bece-dddb094c98e2",
6 | "metadata": {
7 | "id": "59b3745d-0dfd-4661-bece-dddb094c98e2"
8 | },
9 | "source": [
10 | "# Notebook 3: Training a Neural Network for Image Classification\n",
11 | "\n",
12 | "In this notebook, we'll train a neural network to classify images rather than two-dimensional synthetic data. We'll also take a look at the components of a typical training pipeline in PyTorch, including datasets, data loaders, and checkpointing. We'll use a new loss function to guide our network during training, and let one of PyTorch's optimizers automatically update our network's parameters using gradient descent.\n",
13 | "\n",
14 | "The notebook is broken up as follows:\n",
15 | "\n",
16 | " 1. [Setup](#setup) \n",
17 | " 2. [Data](#data) \n",
18 | " 2.1 [Datasets](#datasets) \n",
19 | " 2.2 [DataLoaders](#dataloaders) \n",
20 | " 3. [A Neural Network for Image Recognition](#nn) \n",
21 | " 3.1. [Defining the Network](#definition) \n",
22 | " 3.2 [Classification Loss](#loss) \n",
23 | " 3.3 [Picking an Optimizer: SGD](#sgd) \n",
24 | " 3.4. [Checkpointing](#checkpoint) \n",
25 | " 4. [Putting It All Together: Training Loop](#train) \n",
26 | " 5. [GPU Acceleration](#gpu)"
27 | ]
28 | },
29 | {
30 | "cell_type": "markdown",
31 | "id": "1d22747f-31e8-4263-9e74-c499f25416b7",
32 | "metadata": {
33 | "id": "1d22747f-31e8-4263-9e74-c499f25416b7",
34 | "tags": []
35 | },
36 | "source": [
37 | "## __1.__ Setup\n"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "id": "5784a0af-7266-4bcd-b87a-2558b15da621",
43 | "metadata": {
44 | "id": "5784a0af-7266-4bcd-b87a-2558b15da621"
45 | },
46 | "source": [
47 | "Make sure the needed packages are installed and utility code is in the right place."
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": null,
53 | "id": "27d077ec-0fd2-4c9e-9074-fd4524ead6c8",
54 | "metadata": {
55 | "id": "27d077ec-0fd2-4c9e-9074-fd4524ead6c8"
56 | },
57 | "outputs": [],
58 | "source": [
59 | "# helper code from the course repository\n",
60 | "!git clone https://github.com/interactiveaudiolab/course-deep-learning.git\n",
61 | "# install common pacakges used for deep learning\n",
62 | "!cd course-deep-learning/ && pip install -r requirements.txt"
63 | ]
64 | },
65 | {
66 | "cell_type": "code",
67 | "execution_count": null,
68 | "id": "1b554b46-be39-4431-8031-b4e41d2d15b2",
69 | "metadata": {
70 | "id": "1b554b46-be39-4431-8031-b4e41d2d15b2"
71 | },
72 | "outputs": [],
73 | "source": [
74 | "%matplotlib inline\n",
75 | "%cd course-deep-learning/\n",
76 | "\n",
77 | "import time\n",
78 | "import torch\n",
79 | "import torchvision\n",
80 | "import torchvision.datasets as datasets\n",
81 | "import matplotlib.pyplot as plt\n",
82 | "import numpy as np"
83 | ]
84 | },
85 | {
86 | "cell_type": "markdown",
87 | "id": "020178c4-31a3-49b8-a2e6-d61e205f9d67",
88 | "metadata": {
89 | "id": "020178c4-31a3-49b8-a2e6-d61e205f9d67",
90 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
91 | },
92 | "source": [
93 | "## __2.__ Data\n",
94 | "\n",
95 | "### __2.1__ Datasets\n",
96 | "\n",
97 | "In the previous two notebooks, we saw a variety of two-dimensional synthetic datasets. In this notebook, we'll be working with a pre-existing image dataset. Image data is inherently high-dimensional: each pixel corresponds to a single coordinate/dimension (grayscale), or holds three separate coordinates (Red,Green,Blue). For even small images, this means our inputs can have thousands of dimensions (e.g. 32 x 32 pixels x 3 colors = 3072). As a result, image datasets can be fairly large. Additionally, we may need to apply certain __transformations__ or __preprocessing__ steps to our image data before attempting to pass it to a neural network.\n",
98 | "\n",
99 | "PyTorch and its corresponding image library, TorchVision, offer a number of utilities to streamline dataset storage, loading, and preprocessing. We'll start by using TorchVision to download the well-known [MNIST dataset](http://yann.lecun.com/exdb/mnist/). This dataset contains 28x28-pixel images of handwritten digits, and our goal will be to predict the correct label given an image:"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": null,
105 | "id": "431fcf53-5a9b-468b-8287-e873c41d1fbb",
106 | "metadata": {
107 | "id": "431fcf53-5a9b-468b-8287-e873c41d1fbb"
108 | },
109 | "outputs": [],
110 | "source": [
111 | "# make a new directory in which to download the MNIST dataset\n",
112 | "data_dir = \"./data/\"\n",
113 | "\n",
114 | "# download MNIST \"test\" dataset\n",
115 | "mnist_test = torchvision.datasets.MNIST(data_dir, train=False, download=True)\n",
116 | "\n",
117 | "# download MNIST \"train\" dataset and set aside a portion for validation\n",
118 | "mnist_train_full = datasets.MNIST(data_dir, train=True, download=True)\n",
119 | "mnist_train, mnist_val = torch.utils.data.random_split(mnist_train_full, [55000, 5000])\n",
120 | "\n",
121 | "type(mnist_test), type(mnist_train), type(mnist_val)"
122 | ]
123 | },
124 | {
125 | "cell_type": "markdown",
126 | "id": "8cea4af4-c158-4ef1-a02a-bce0fba736d1",
127 | "metadata": {
128 | "id": "8cea4af4-c158-4ef1-a02a-bce0fba736d1",
129 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
130 | },
131 | "source": [
132 | "Our dataset is now held in three `torch.utils.data.Dataset` objects, each acting as an iterable container from which we can fetch input-label pairs. You should also now see a `data/` directory containing the MNIST dataset. Let's have a look at a random image from the test set."
133 | ]
134 | },
135 | {
136 | "cell_type": "code",
137 | "execution_count": null,
138 | "id": "87582752-ad1f-4c56-a5a2-a9d478a3ec17",
139 | "metadata": {
140 | "id": "87582752-ad1f-4c56-a5a2-a9d478a3ec17"
141 | },
142 | "outputs": [],
143 | "source": [
144 | "print(f\"There are {len(mnist_test)} images in mnist_test\")\n",
145 | "d = np.random.randint(0, len(mnist_test))\n",
146 | "print(f\"Image {d} is a {mnist_test[d][1]}\")\n",
147 | "\n",
148 | "# plot our image\n",
149 | "plt.imshow(mnist_test[d][0], cmap='gray')\n",
150 | "plt.show()"
151 | ]
152 | },
153 | {
154 | "cell_type": "markdown",
155 | "id": "80efbb56-c23f-43e7-9f99-01a28e376321",
156 | "metadata": {
157 | "id": "80efbb56-c23f-43e7-9f99-01a28e376321",
158 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
159 | },
160 | "source": [
161 | "Our \"test\" dataset contains 10,000 entries, each of which is a tuple holding a `PIL.Image.Image` object and an integer label. Unfortunately, the neural networks we trained in the previous notebook require `torch.Tensor` inputs. We therefore need to apply some preprocessing to these image datasets before we can train a network.\n",
162 | "\n",
163 | "TorchVision provides a `Transform` class for building and composing preprocessing stages that can be automatically applied to your image data. Here's an example:"
164 | ]
165 | },
166 | {
167 | "cell_type": "code",
168 | "execution_count": null,
169 | "id": "c1627880-672e-415a-a509-361722bd19e4",
170 | "metadata": {
171 | "id": "c1627880-672e-415a-a509-361722bd19e4"
172 | },
173 | "outputs": [],
174 | "source": [
175 | "# we'll stack multiple transformations in a single object that will apply them in sequence\n",
176 | "transform = torchvision.transforms.Compose([\n",
177 | " torchvision.transforms.ToTensor(), # this is a built-in Transform object to convert images to tensors\n",
178 | " lambda x: x>0, # this is our own transformation function for binarizing MNIST images\n",
179 | " lambda x: x.float(), # this is our own transformation function for converting inputs to floating-point\n",
180 | "])\n",
181 | "\n",
182 | "# grab the first image-label pair from our \"test\" dataset\n",
183 | "example_img, example_label = mnist_test[0]\n",
184 | "\n",
185 | "# apply our sequence of transformations\n",
186 | "transformed = transform(example_img)\n",
187 | "print(f\"Image label: {example_label}\")\n",
188 | "print(\"Transformed image shape:\", transformed.shape)\n",
189 | "print(f\"Transformed image data: {(', '.join(str(p.item()) for p in transformed.flatten()))[:100]} ...\")\n",
190 | "\n",
191 | "# plot our image\n",
192 | "plt.imshow(transformed.squeeze(), cmap='gray')\n",
193 | "plt.show()"
194 | ]
195 | },
196 | {
197 | "cell_type": "markdown",
198 | "id": "e35ee41e-b55e-4e05-a30e-27adb99b7fba",
199 | "metadata": {
200 | "id": "e35ee41e-b55e-4e05-a30e-27adb99b7fba",
201 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
202 | },
203 | "source": [
204 | "We can see that our transform converts MNIST images to floating-point tensors holding binary values -- which we can feed to a neural network! In fact, we can bake our transform directly into our datasets so that it is applied automatically when we go to fetch data. To demonstrate, we'll re-initialize our datasets, this time reading directly from our `data/` folder rather than re-downloading:"
205 | ]
206 | },
207 | {
208 | "cell_type": "code",
209 | "execution_count": null,
210 | "id": "bcf0ea06-2879-4ab3-80c7-0c5f05bdeaa2",
211 | "metadata": {
212 | "id": "bcf0ea06-2879-4ab3-80c7-0c5f05bdeaa2"
213 | },
214 | "outputs": [],
215 | "source": [
216 | "# load MNIST \"test\" dataset from disk. Note we're using the transform defined a few cells earlier, which\n",
217 | "# turns the data into the right format as we load from disk.\n",
218 | "mnist_test = torchvision.datasets.MNIST(data_dir, train=False, download=False, transform=transform)\n",
219 | "\n",
220 | "# load MNIST \"train\" dataset from disk and set aside a portion for validation\n",
221 | "mnist_train_full = datasets.MNIST(data_dir, train=True, download=False, transform=transform)\n",
222 | "mnist_train, mnist_val = torch.utils.data.random_split(mnist_train_full, [55000, 5000])\n",
223 | "\n",
224 | "example_img, example_label = mnist_test[0]\n",
225 | "print(f\"Each image in our dataset now has type {type(example_img)} and shape {example_img.shape}\")"
226 | ]
227 | },
228 | {
229 | "cell_type": "markdown",
230 | "id": "5e0bc892-84ee-4824-81a9-9144d9fe3825",
231 | "metadata": {
232 | "id": "5e0bc892-84ee-4824-81a9-9144d9fe3825",
233 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
234 | },
235 | "source": [
236 | "### __2.2__ DataLoaders\n",
237 | "\n",
238 | "Given that `torch.utils.data.Dataset` and its subclasses provide an iterable container from which we can fetch input-label pairs, we could go ahead and start traininng a network:\n",
239 | "\n",
240 | "```\n",
241 | "for x, y in myDataset:\n",
242 | "\n",
243 | " opt.zero_grad()\n",
244 | "\n",
245 | " outputs = myNetwork(x)\n",
246 | " \n",
247 | " loss = myLoss(outputs, y)\n",
248 | " loss.backward()\n",
249 | " \n",
250 | " opt.step()\n",
251 | " ...\n",
252 | "```\n",
253 | "\n",
254 | "However, we often want to load our data in __batches__ while training, typically in a random or __shuffled__ order. PyTorch provides a `DataLoader` class to handle the process of fetching data from a `Dataset` object, including shuffling, custom batch collation, and various random sampling schemes."
255 | ]
256 | },
257 | {
258 | "cell_type": "code",
259 | "execution_count": null,
260 | "id": "a9f0e984-9e5c-4c1d-9bb5-cd11a7218531",
261 | "metadata": {
262 | "id": "a9f0e984-9e5c-4c1d-9bb5-cd11a7218531"
263 | },
264 | "outputs": [],
265 | "source": [
266 | "# we'll use a batch size of 60 for training our network\n",
267 | "batch_size = 60\n",
268 | "\n",
269 | "# initialize a DataLoader object for each dataset\n",
270 | "train_dataloader = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)\n",
271 | "val_dataloader = torch.utils.data.DataLoader(mnist_val, batch_size=batch_size, shuffle=True)\n",
272 | "test_dataloader = torch.utils.data.DataLoader(mnist_test, batch_size=1, shuffle=False)\n",
273 | "\n",
274 | "# grab the first batch from one of our DataLoader objects\n",
275 | "example_batch_img, example_batch_label = next(iter(train_dataloader))\n",
276 | "\n",
277 | "#for batch in train_dataloader:#\n",
278 | "\n",
279 | "# print(batch[0], batch[1])\n",
280 | "# break\n",
281 | "\n",
282 | "# inputs and labels are batched together as tensor objects\n",
283 | "print(f\"Batch inputs shape: {example_batch_img.shape}, Batch labels shape: {example_batch_label.shape}\")"
284 | ]
285 | },
286 | {
287 | "cell_type": "markdown",
288 | "id": "aa5717ae-f257-4b1c-bcfe-766742a570f7",
289 | "metadata": {
290 | "id": "aa5717ae-f257-4b1c-bcfe-766742a570f7",
291 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
292 | },
293 | "source": [
294 | "## __3.__ A Neural Network for Image Recognition\n",
295 | "\n",
296 | "### __3.1__ Defining the Network\n",
297 | "\n",
298 | "Now that we've seen the data we'll be working with, it's time build a neural network capable of classifying handwritten digits. In the previous notebook, we created a neural network capable of turning two-dimensional inputs into one-dimensional (scalar) predictions. By contrast, our inputs will have 28x28 = 784 dimensions, and our network will have to predict one of ten possible labels (one for each digit 0-9). To accommodate these changes, we'll tweak our network as follows:\n",
299 | "\n",
300 | " 1. We'll modify our network's first layer to take 784-dimensional inputs\n",
301 | " 2. We'll use a larger intermediate layer to allow our network to learn complex decision functions\n",
302 | " 3. We'll try out the ReLU (rectified linear unit) activation function \n",
303 | " 4. We'll have our network produce a 10-dimensional vector as output; the index of the largest value in this vector will be our predicted label (e.g. if the first entry has the largest value, our predicted digit will be 0).\n",
304 | " \n",
305 | "
\n",
306 | "\n",
307 | "
\n",
308 | "\n",
309 | "
\n"
310 | ]
311 | },
312 | {
313 | "cell_type": "code",
314 | "execution_count": null,
315 | "id": "f316de00-6063-40d8-b048-a3bca6a0789e",
316 | "metadata": {
317 | "id": "f316de00-6063-40d8-b048-a3bca6a0789e"
318 | },
319 | "outputs": [],
320 | "source": [
321 | "class MNISTNetwork(torch.nn.Module):\n",
322 | "\n",
323 | " def __init__(self):\n",
324 | " super().__init__()\n",
325 | "\n",
326 | " # MNIST images are (1, 28, 28) (channels, width, height)\n",
327 | " self.layer_1 = torch.nn.Linear(28*28, 1024)\n",
328 | " self.layer_2 = torch.nn.Linear(1024, 10)\n",
329 | " self.relu = torch.nn.ReLU()\n",
330 | "\n",
331 | " def forward(self, x):\n",
332 | "\n",
333 | " batch_size, channels, width, height = x.size()\n",
334 | " x = x.view(batch_size, -1) # create an array of flattened images with dimension (batch_size, num_pixels)\n",
335 | " \n",
336 | " # this time, we'll use the ReLU nonlinearity at each layer \n",
337 | " x = self.relu(self.layer_1(x))\n",
338 | " x = self.layer_2(x) # we'll avoid \"squashing\" our final outputs by omitting the sigmoid\n",
339 | " \n",
340 | " return x\n",
341 | "\n",
342 | "model = MNISTNetwork()\n",
343 | "model"
344 | ]
345 | },
346 | {
347 | "cell_type": "markdown",
348 | "id": "a7f646c8-1f72-424e-a1c7-d6dd93ad3a00",
349 | "metadata": {
350 | "id": "a7f646c8-1f72-424e-a1c7-d6dd93ad3a00",
351 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
352 | },
353 | "source": [
354 | "### __3.2__ Classification Loss\n",
355 | "\n",
356 | "In the previous notebook, we used mean squared error loss to train our neural network. While mean squared error performs well in a number of tasks, it is more common to use __categorical cross-entropy loss__ for multiclass classification. We can think of our network's output as a vector of ten \"class scores,\" one per digit. In training our network, our goal is to make sure that given an input image, the correct class score \"comes out on top.\" We might try to minimize the mean squared error between our network's normalized output and a __one-hot__ vector indexing the correct label\n",
357 | "\n",
358 | "```\n",
359 | "prediction = [0.1, 0.1, 0.1, 0.0, 0.0, 0.0, 0.0, 0.5, 0.1, 0.1]\n",
360 | "target = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]\n",
361 | "```\n",
362 | "\n",
363 | "However, this objective does not necessarily correspond to our goal of maximizing the score of the target class while keeping all other scores low. Cross entropy loss generally does a better job of capturing this objective for multiclass classification, and its use can be considered equivalent to maximum-likelihood estimation under certain assumptions. We will use PyTorch's implementation, which provides an object capable of both computing the loss on pairs of tensors and computing gradients during the backward pass. We won't go into detail here, but for more info, check out the [official documentation](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html). Below, we show an example of calculating loss for a bit of made-up data.\n",
364 | "\n"
365 | ]
366 | },
367 | {
368 | "cell_type": "code",
369 | "execution_count": null,
370 | "id": "a60994ee-a190-485a-b245-9c5a8faa3bcc",
371 | "metadata": {
372 | "id": "a60994ee-a190-485a-b245-9c5a8faa3bcc"
373 | },
374 | "outputs": [],
375 | "source": [
376 | "# a PyTorch cross-entropy loss object\n",
377 | "loss_fn = torch.nn.CrossEntropyLoss()\n",
378 | "\n",
379 | "# the loss object takes in a vector of class scores and a vector of target class indices\n",
380 | "preds = torch.randn(batch_size, 10) # make a batch of random \"class score\" vectors, each with 10 scores corresponding to digits\n",
381 | "targets = torch.full((batch_size,), 7).long() # make a batch of target indices; here, we'll set 7 as the target for all predictions\n",
382 | "\n",
383 | "# compute the loss for this batch; by default, CrossEntropyLoss will average over a batch to return a scalar\n",
384 | "loss_fn(preds, targets)"
385 | ]
386 | },
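{
"cell_type": "markdown",
"id": "cross-entropy-equivalence-note",
"metadata": {},
"source": [
"Under the hood, `CrossEntropyLoss` applies a softmax to the raw class scores and then averages the negative log-probability of the target class. Here is a minimal sketch of that equivalence; the variable names are just for illustration:\n",
"\n",
"```\n",
"import torch.nn.functional as F\n",
"\n",
"scores = torch.randn(4, 10)            # a batch of 4 \"class score\" vectors\n",
"targets = torch.tensor([7, 7, 7, 7])   # target digit for each example\n",
"\n",
"# built-in cross-entropy loss\n",
"loss_a = torch.nn.CrossEntropyLoss()(scores, targets)\n",
"\n",
"# the same value computed by hand: log-softmax, pick the target entry, negate, average\n",
"log_probs = F.log_softmax(scores, dim=1)\n",
"loss_b = -log_probs[torch.arange(4), targets].mean()\n",
"\n",
"print(loss_a.item(), loss_b.item())    # these should match up to floating-point error\n",
"```"
]
},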
387 | {
388 | "cell_type": "markdown",
389 | "id": "fcbe872c-fd79-4a7d-9c39-7ccbc9d81517",
390 | "metadata": {
391 | "id": "fcbe872c-fd79-4a7d-9c39-7ccbc9d81517",
392 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
393 | },
394 | "source": [
395 | "### __3.3__ Picking an Optimizer: SGD\n",
396 | "\n",
397 | "Recall that each training iteration can be broken down as follows: \n",
398 | "* we pass inputs to our network and collect outputs\n",
399 | "* we compute a differentiable a scalar loss on our network's outputs\n",
400 | "* we use backpropagation to compute the gradients of the loss with respect to our network's weights\n",
401 | "* we perform a gradient-based update on our weights to reduce the loss \n",
402 | "\n",
403 | "In the previous notebook, we made use of a built-in __optimizer__ to automate the process of updating our network's weights. This optimizer object stores references to our network's weights. When our backpropagation step (`backward()`) computes and stores gradients for all network parameters, the optimizer fetches these gradients and performs an update determined by its optimization algorithm. When training neural networks with large numbers of parameters, this becomes much simpler than manually updating each weight.\n",
404 | "\n",
405 | "PyTorch offers a number of [optimization algorithms](https://pytorch.org/docs/stable/optim.html), all of which use the same basic interface:\n",
406 | "\n",
407 | "```\n",
408 | "optimizer = OptimizerName(my_model.parameters(), lr=my_learning_rate, *other_params)\n",
409 | "```\n",
410 | "\n",
411 | "Each optimizer requires an iterable containing our network's weights (which the `.parameters()` method of any `torch.nn.Module` object provides) and a __learning rate__. As in the last notebook, we'll use __Stochastic Gradient Descent (SGD)__ to determine our updates. This algorithm scales the computed gradients with its learning rate and subtracts them from their respective weights to \"descend\" the loss function."
412 | ]
413 | },
414 | {
415 | "cell_type": "code",
416 | "execution_count": null,
417 | "id": "c7a89209-9e7f-4b37-b64f-8efcae8fcece",
418 | "metadata": {
419 | "id": "c7a89209-9e7f-4b37-b64f-8efcae8fcece"
420 | },
421 | "outputs": [],
422 | "source": [
423 | "# a simple optimization problem: we want our \"weights\" to sum to 10\n",
424 | "weights = torch.zeros(10).requires_grad_(True)\n",
425 | "print(f\"Starting weights: {weights}, Sum: {weights.sum().item()}\")\n",
426 | "\n",
427 | "# create an optimizer object and pass it an Iterable containing our \"weights\".\n",
428 | "# In this example, we'll take steps of size 1.0, meaning that each weight will \n",
429 | "# change by an amount equal to the magnitude of its gradient\n",
430 | "opt = torch.optim.SGD([weights], lr = 1.0) \n",
431 | "\n",
432 | "# compute loss and perform backpropagation\n",
433 | "loss = 10 - weights.sum()\n",
434 | "loss.backward()\n",
435 | "\n",
436 | "# perform an optimization step, i.e. a gradient-based update of our weights\n",
437 | "opt.step()\n",
438 | "\n",
439 | "print(f\"Updated weights: {weights}, Sum: {weights.sum().item()}\")"
440 | ]
441 | },
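{
"cell_type": "markdown",
"id": "sgd-manual-step-note",
"metadata": {},
"source": [
"For intuition, a single plain SGD step (without momentum) is equivalent to subtracting the scaled gradient by hand. A minimal sketch using the same toy problem as above:\n",
"\n",
"```\n",
"weights = torch.zeros(10, requires_grad=True)\n",
"loss = 10 - weights.sum()\n",
"loss.backward()\n",
"\n",
"# manual equivalent of opt.step() with lr=1.0\n",
"with torch.no_grad():\n",
"    weights -= 1.0 * weights.grad\n",
"\n",
"print(weights.sum().item())  # 10.0, matching the optimizer-based update above\n",
"```"
]
},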
442 | {
443 | "cell_type": "markdown",
444 | "id": "e772d67a-67c8-4d88-970d-87fefadd65bb",
445 | "metadata": {
446 | "id": "e772d67a-67c8-4d88-970d-87fefadd65bb",
447 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
448 | },
449 | "source": [
450 | "### __3.4__ Checkpointing\n",
451 | "\n",
452 | "Before we begin training our model, we want to make sure we can save it in some format in case we experience a bug during training or want to use it again later. The process of saving snapshots of a model during training is often called __checkpointing__, and PyTorch offers utilities to make saving and loading models simple. For a neural network, saving a model really means saving its weights (parameters). All PyTorch models have a `.state_dict()` method that exposes their weights as named entries in a dictionary. Using this __state dictionary__, we can easily save weights or overwrite them with ones we load from elsewhere. For more info, feel free to check out the [official documentation](https://pytorch.org/tutorials/beginner/saving_loading_models.html)."
453 | ]
454 | },
455 | {
456 | "cell_type": "code",
457 | "execution_count": null,
458 | "id": "a55bb3e5-593f-491d-b804-362342febb3c",
459 | "metadata": {
460 | "id": "a55bb3e5-593f-491d-b804-362342febb3c"
461 | },
462 | "outputs": [],
463 | "source": [
464 | "# initialize a model\n",
465 | "model = MNISTNetwork()\n",
466 | "print(\"Names of network weights:\", list(model.state_dict().keys()))\n",
467 | "\n",
468 | "# save weights to disk\n",
469 | "torch.save(model.state_dict(), \"dummy_weights.pt\")\n",
470 | "\n",
471 | "# load weights from disk and overwrite network weights\n",
472 | "model.load_state_dict(torch.load(\"dummy_weights.pt\"))\n",
473 | "\n",
474 | "model"
475 | ]
476 | },
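{
"cell_type": "markdown",
"id": "checkpoint-optimizer-state-note",
"metadata": {},
"source": [
"If you want to be able to resume training later (rather than just reuse the final weights), it's common to checkpoint the optimizer state and training progress alongside the model. A minimal sketch, with an arbitrary filename and dictionary keys:\n",
"\n",
"```\n",
"optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)\n",
"\n",
"checkpoint = {\n",
"    \"model\": model.state_dict(),\n",
"    \"optimizer\": optimizer.state_dict(),\n",
"    \"epoch\": 0,\n",
"}\n",
"torch.save(checkpoint, \"dummy_checkpoint.pt\")\n",
"\n",
"# later: restore both the model and the optimizer before continuing training\n",
"state = torch.load(\"dummy_checkpoint.pt\")\n",
"model.load_state_dict(state[\"model\"])\n",
"optimizer.load_state_dict(state[\"optimizer\"])\n",
"```"
]
},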
477 | {
478 | "cell_type": "markdown",
479 | "id": "ccfc6805-b78c-4005-96f8-a192ba040769",
480 | "metadata": {
481 | "id": "ccfc6805-b78c-4005-96f8-a192ba040769",
482 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
483 | },
484 | "source": [
485 | "## __4.__ Putting It All Together: Training Loop\n",
486 | "We're now ready to train a neural network to recognize handwritten digits from the MNIST dataset."
487 | ]
488 | },
489 | {
490 | "cell_type": "code",
491 | "execution_count": null,
492 | "id": "44658ffa-8a06-401e-b8a2-7429c20da354",
493 | "metadata": {
494 | "id": "44658ffa-8a06-401e-b8a2-7429c20da354"
495 | },
496 | "outputs": [],
497 | "source": [
498 | "def training_loop(save_path, epochs, batch_size, device=\"cpu\"):\n",
499 | " \"\"\"\n",
500 | " Train a neural network model for digit recognition on the MNIST dataset.\n",
501 | " \n",
502 | " Parameters\n",
503 | " ----------\n",
504 | " save_path (str): path/filename for model checkpoint, e.g. 'my_model.pt'\n",
505 | " \n",
506 | " epochs (int): number of iterations through the whole dataset for training\n",
507 | " \n",
508 | " batch_size (int): size of a single batch of inputs\n",
509 | " \n",
510 | " device (str): device on which tensors are placed; should be 'cpu' or 'cuda'. \n",
511 | " More on this in the next section!\n",
512 | " \n",
513 | " Returns\n",
514 | " -------\n",
515 | " model (nn.Module): final trained model\n",
516 | " \n",
517 | " save_path (str): path/filename for model checkpoint, so that we can load our model\n",
518 | " later to test on unseen data\n",
519 | " \n",
520 | " device (str): the device on which we carried out training, so we can match it\n",
521 | " when we test the final model on unseen data later\n",
522 | " \"\"\"\n",
523 | "\n",
524 | " # initialize model\n",
525 | " model = MNISTNetwork()\n",
526 | " model.to(device) # we'll cover this in the next section!\n",
527 | "\n",
528 | " # initialize an optimizer to update our model's parameters during training\n",
529 | " optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)\n",
530 | "\n",
531 | " # make a new directory in which to download the MNIST dataset\n",
532 | " data_dir = \"./data/\"\n",
533 | " \n",
534 | " # initialize a Transform object to prepare our data\n",
535 | " transform = torchvision.transforms.Compose([\n",
536 | " torchvision.transforms.ToTensor(),\n",
537 | " lambda x: x>0,\n",
538 | " lambda x: x.float(),\n",
539 | " ])\n",
540 | "\n",
541 | " # load MNIST \"test\" dataset from disk\n",
542 | " mnist_test = torchvision.datasets.MNIST(data_dir, train=False, download=False, transform=transform)\n",
543 | "\n",
544 | " # load MNIST \"train\" dataset from disk and set aside a portion for validation\n",
545 | " mnist_train_full = datasets.MNIST(data_dir, train=True, download=False, transform=transform)\n",
546 | " mnist_train, mnist_val = torch.utils.data.random_split(mnist_train_full, [55000, 5000])\n",
547 | "\n",
548 | " # initialize a DataLoader object for each dataset\n",
549 | " train_dataloader = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)\n",
550 | " val_dataloader = torch.utils.data.DataLoader(mnist_val, batch_size=batch_size, shuffle=False)\n",
551 | " test_dataloader = torch.utils.data.DataLoader(mnist_test, batch_size=1, shuffle=False)\n",
552 | "\n",
553 | " # a PyTorch categorical cross-entropy loss object\n",
554 | " loss_fn = torch.nn.CrossEntropyLoss()\n",
555 | "\n",
556 | " # time training process\n",
557 | " st = time.time()\n",
558 | "\n",
559 | " # time to start training!\n",
560 | " for epoch_idx, epoch in enumerate(range(epochs)):\n",
561 | "\n",
562 | " # keep track of best validation accuracy; if improved upon, save checkpoint\n",
563 | " best_acc = 0.0\n",
564 | "\n",
565 | " # loop through the entire dataset once per epoch\n",
566 | " train_loss = 0.0\n",
567 | " train_acc = 0.0\n",
568 | " train_total = 0\n",
569 | " model.train()\n",
570 | " for batch_idx, batch in enumerate(train_dataloader):\n",
571 | "\n",
572 | " # clear gradients\n",
573 | " optimizer.zero_grad()\n",
574 | "\n",
575 | " # unpack data and labels\n",
576 | " x, y = batch\n",
577 | " x = x.to(device) # we'll cover this in the next section!\n",
578 | " y = y.to(device) # we'll cover this in the next section!\n",
579 | "\n",
580 | " # generate predictions and compute loss\n",
581 | " output = model(x) # (batch_size, 10)\n",
582 | " loss = loss_fn(output, y)\n",
583 | "\n",
584 | " # compute accuracy\n",
585 | " preds = output.argmax(dim=1)\n",
586 | " acc = preds.eq(y).sum().item()/len(y)\n",
587 | "\n",
588 | " # compute gradients and update model parameters\n",
589 | " loss.backward()\n",
590 | " optimizer.step()\n",
591 | "\n",
592 | " # update statistics\n",
593 | " train_loss += (loss * len(x))\n",
594 | " train_acc += (acc * len(x))\n",
595 | " train_total += len(x)\n",
596 | "\n",
597 | " train_loss /= train_total\n",
598 | " train_acc /= train_total\n",
599 | "\n",
600 | " # perform validation once per epoch\n",
601 | " val_loss = 0.0\n",
602 | " val_acc = 0.0\n",
603 | " val_total = 0\n",
604 | " model.eval()\n",
605 | " for batch_idx, batch in enumerate(val_dataloader):\n",
606 | "\n",
607 | " # don't compute gradients during validation\n",
608 | " with torch.no_grad():\n",
609 | "\n",
610 | " # unpack data and labels\n",
611 | " x, y = batch\n",
612 | " x = x.to(device) # we'll cover this in the next section!\n",
613 | " y = y.to(device) # we'll cover this in the next section!\n",
614 | "\n",
615 | " # generate predictions and compute loss\n",
616 | " output = model(x)\n",
617 | " loss = loss_fn(output, y)\n",
618 | "\n",
619 | " # compute accuracy\n",
620 | " preds = output.argmax(dim=1)\n",
621 | " acc = preds.eq(y).sum().item()/len(y)\n",
622 | "\n",
623 | " # update statistics\n",
624 | " val_loss += (loss * len(x))\n",
625 | " val_acc += (acc * len(x))\n",
626 | " val_total += len(x)\n",
627 | "\n",
628 | " val_loss /= val_total\n",
629 | " val_acc /= val_total\n",
630 | " print(f\"Epoch {epoch_idx + 1}: val loss {val_loss :0.3f}, val acc {val_acc :0.3f}, train loss {train_loss :0.3f}, train acc {train_acc :0.3f}\")\n",
631 | "\n",
632 | " if val_acc > best_acc:\n",
633 | "\n",
634 | " best_acc = val_acc\n",
635 | " print(f\"New best accuracy; saving model weights to {save_path}\")\n",
636 | " torch.save(model.state_dict(), save_path)\n",
637 | "\n",
638 | " print(f\"Total training time (s): {time.time() - st :0.3f}\")\n",
639 | " \n",
640 | " return model, save_path, device\n",
641 | "\n",
642 | " \n",
643 | "# run our training loop\n",
644 | "model, save_path, device = training_loop(\"mnist_basic.pt\", 10, 60, \"cpu\")"
645 | ]
646 | },
647 | {
648 | "cell_type": "markdown",
649 | "id": "bFe7Zw54DEyr",
650 | "metadata": {
651 | "id": "bFe7Zw54DEyr"
652 | },
653 | "source": [
654 | "Once we're done training, we now load the best saved version of the model weights (which may not be the one from the final epoch) and compute final performance on unseen test data. Typically, this is reserved for after the model development process, so we get an unbiased estimate of the model's generalized accuracy."
655 | ]
656 | },
657 | {
658 | "cell_type": "code",
659 | "execution_count": null,
660 | "id": "961472ee-b038-4ae0-bd93-2becc8f653dc",
661 | "metadata": {
662 | "id": "961472ee-b038-4ae0-bd93-2becc8f653dc"
663 | },
664 | "outputs": [],
665 | "source": [
666 | "# load best weights\n",
667 | "model.load_state_dict(torch.load(save_path, map_location=device))\n",
668 | "\n",
669 | "test_loss = 0.0\n",
670 | "test_acc = 0.0\n",
671 | "test_total = 0\n",
672 | "model.eval()\n",
673 | "for batch_idx, batch in enumerate(test_dataloader):\n",
674 | "\n",
675 | " # don't compute gradients during validation\n",
676 | " with torch.no_grad():\n",
677 | "\n",
678 | " # unpack data and labels\n",
679 | " x, y = batch\n",
680 | " x = x.to(device) # we'll cover this in the next section!\n",
681 | " y = y.to(device) # we'll cover this in the next section!\n",
682 | "\n",
683 | " # generate predictions and compute loss\n",
684 | " output = model(x)\n",
685 | " loss = loss_fn(output, y)\n",
686 | "\n",
687 | " # compute accuracy\n",
688 | " preds = output.argmax(dim=1)\n",
689 | " acc = preds.eq(y).sum().item()/len(y)\n",
690 | "\n",
691 | " # update statistics\n",
692 | " test_loss += (loss * len(x))\n",
693 | " test_acc += (acc * len(x))\n",
694 | " test_total += len(x)\n",
695 | "\n",
696 | "test_loss /= test_total\n",
697 | "test_acc /= test_total\n",
698 | "print(f\"test loss {test_loss :0.3f}, test acc {test_acc :0.3f}\")"
699 | ]
700 | },
701 | {
702 | "cell_type": "markdown",
703 | "id": "edd262fc-3bf6-4a57-8ad6-7372cda4bded",
704 | "metadata": {
705 | "id": "edd262fc-3bf6-4a57-8ad6-7372cda4bded",
706 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e",
707 | "tags": []
708 | },
709 | "source": [
710 | "## __5.__ GPU Acceleration\n",
711 | "\n",
712 | "You might have noticed all the mentions of a `device` in the cells above. It turns out that neural networks use many operations, such as matrix multiplication, that can be efficiently parallelized and run on modern GPUs (graphics processing units, sometimes called \"video cards\"). As a result, neural network training and inference can see drastic speedups when run on a suitable GPU. PyTorch offers this option for NVIDIA-manufactured GPUs through the [CUDA platform](https://pytorch.org/docs/stable/cuda.html), and provides a simple interface (`.to()`) for moving data and computation between the CPU and GPU devices. To move data to the CPU, we can call:\n",
713 | "\n",
714 | "```\n",
715 | "x = x.to(\"cpu\")\n",
716 | "```\n",
717 | "\n",
718 | "To move data to a compatible NVIDIA GPU, we can call:\n",
719 | "\n",
720 | "```\n",
721 | "x = x.to(\"cuda\")\n",
722 | "```\n",
723 | "\n",
724 | "In practice, running machine learning code on a GPU may require you to check your device's compatibility and install various drivers; this can be quite a hassle. Luckily, [Google Colab](https://colab.research.google.com/) provides free (albeit limited) access to GPUs in a Jupyter-like notebook environment. If you're already running this code in Colab, you can access a GPU by going to `Runtime` > `Change runtime type`, setting `Hardware accelerator` to `GPU`, and clicking `Save`. Note that this will restart the notebook, meaning you will have to run your code again.\n",
725 | "\n",
726 | "Below, we'll try our basic training loop again. This time, however, we'll move our network and data to the GPU, allowing for faster training and inference. While the difference between CPU and GPU may be relatively minor in this case, it can be massive for larger models and datasets."
727 | ]
728 | },
729 | {
730 | "cell_type": "code",
731 | "source": [
732 | "# run this terminal command to see the details of your Colab server's GPU\n",
733 | "!nvidia-smi"
734 | ],
735 | "metadata": {
736 | "id": "d12RFSDR6Oej"
737 | },
738 | "id": "d12RFSDR6Oej",
739 | "execution_count": null,
740 | "outputs": []
741 | },
742 | {
743 | "cell_type": "code",
744 | "execution_count": null,
745 | "id": "706c9c65-096d-4e30-8e3b-f9dc44b26cd4",
746 | "metadata": {
747 | "id": "706c9c65-096d-4e30-8e3b-f9dc44b26cd4"
748 | },
749 | "outputs": [],
750 | "source": [
751 | "# first, let's check if we can access a compatible GPU\n",
752 | "if torch.cuda.is_available():\n",
753 | " print(\"Found a CUDA-compatible GPU!\")\n",
754 | " device = torch.device('cuda')\n",
755 | "else:\n",
756 | " print(\"No compatible GPU found; your code will run on the CPU again\")\n",
757 | " device = torch.device('cpu')\n",
758 | "\n",
759 | "training_loop(\"mnist_gpu.pt\", 10, 60, device)"
760 | ]
761 | }
762 | ],
763 | "metadata": {
764 | "colab": {
765 | "name": "notebook_3_image_classification (1).ipynb",
766 | "provenance": [],
767 | "collapsed_sections": []
768 | },
769 | "kernelspec": {
770 | "display_name": "course-deep-learning",
771 | "language": "python",
772 | "name": "course-deep-learning"
773 | },
774 | "language_info": {
775 | "codemirror_mode": {
776 | "name": "ipython",
777 | "version": 3
778 | },
779 | "file_extension": ".py",
780 | "mimetype": "text/x-python",
781 | "name": "python",
782 | "nbconvert_exporter": "python",
783 | "pygments_lexer": "ipython3",
784 | "version": "3.8.12"
785 | },
786 | "accelerator": "GPU"
787 | },
788 | "nbformat": 4,
789 | "nbformat_minor": 5
790 | }
--------------------------------------------------------------------------------
/notebooks/notebook_4_augmentation_logging.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "id": "76d05231-77c8-4d94-a445-0748a6033e48",
6 | "metadata": {
7 | "id": "76d05231-77c8-4d94-a445-0748a6033e48"
8 | },
9 | "source": [
10 | "# Notebook 4: Data Augmentation and Logging\n",
11 | "\n",
12 | "In this notebook, we'll expand our training loop for image classification to include __data augmentation__. We'll also use PyTorch's built-in __logging__ tools to monitor our network's progress as it trains.\n",
13 | "\n",
14 | "The notebook is broken up as follows:\n",
15 | "\n",
16 | " 1. [Setup](#setup) \n",
17 | " 2. [Neural Networks for Image Recognition](#review)\n",
18 | " 3. [Data Augmentation](#augmentation) \n",
19 | " 4. [Logging](#logging) "
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "id": "2df0ea70-8b9e-4c8d-88f4-b8b947e24a47",
25 | "metadata": {
26 | "id": "2df0ea70-8b9e-4c8d-88f4-b8b947e24a47",
27 | "tags": []
28 | },
29 | "source": [
30 | "## __1.__ Setup\n"
31 | ]
32 | },
33 | {
34 | "cell_type": "markdown",
35 | "id": "08ffed19-bf0e-4b4f-86e5-c13456ded4fa",
36 | "metadata": {
37 | "id": "08ffed19-bf0e-4b4f-86e5-c13456ded4fa"
38 | },
39 | "source": [
40 | "Make sure the needed packages are installed and utility code is in the right place."
41 | ]
42 | },
43 | {
44 | "cell_type": "code",
45 | "execution_count": null,
46 | "id": "d7593208-6c33-463f-82d7-6e411696f76e",
47 | "metadata": {
48 | "id": "d7593208-6c33-463f-82d7-6e411696f76e"
49 | },
50 | "outputs": [],
51 | "source": [
52 | "# helper code from the course repository\n",
53 | "!git clone https://github.com/interactiveaudiolab/course-deep-learning.git\n",
54 | "# install common pacakges used for deep learning\n",
55 | "!cd course-deep-learning/ && pip install -r requirements.txt"
56 | ]
57 | },
58 | {
59 | "cell_type": "code",
60 | "execution_count": null,
61 | "id": "70c1cadb-a5d7-4cec-8ef7-04253a7036ba",
62 | "metadata": {
63 | "id": "70c1cadb-a5d7-4cec-8ef7-04253a7036ba"
64 | },
65 | "outputs": [],
66 | "source": [
67 | "import time\n",
68 | "import torch\n",
69 | "import torch.nn as nn\n",
70 | "import torch.nn.functional as F\n",
71 | "import torchvision\n",
72 | "import torchvision.datasets as datasets\n",
73 | "import matplotlib.pyplot as plt\n",
74 | "import numpy as np\n",
75 | "\n",
76 | "%matplotlib inline\n",
77 | "%cd course-deep-learning/"
78 | ]
79 | },
80 | {
81 | "cell_type": "markdown",
82 | "id": "d4bf3ea7-458d-46cb-811f-e3607796845e",
83 | "metadata": {
84 | "id": "d4bf3ea7-458d-46cb-811f-e3607796845e"
85 | },
86 | "source": [
87 | "## __2.__ Neural Networks for Image Recognition\n",
88 | "\n",
89 | "In the previous notebook, we designed and trained a neural network to perform digit recognition on the MNIST dataset. In this notebook, we'll also consider a __convolutional neural network__ for the same task. Recall that convolutional networks use weight __kernels__ to capture correlations between neighboring coordinates. We can wrap the application of these kernels into a \"layer\" in the same way we do for weight-input dot products in a multilayer perceptron.\n",
90 | "\n",
91 | "In PyTorch, we can define a two-dimensional convolutional layer as follows:\n",
92 | "\n",
93 | "```\n",
94 | "conv_layer = nn.Conv2d(\n",
95 | " in_channels,\n",
96 | " out_channels,\n",
97 | " kernel_size,\n",
98 | " stride\n",
99 | ")\n",
100 | "```\n",
101 | "Some things to keep in mind:\n",
102 | "* `in_channels` refers to the number of channels in the input. In our case, because MNIST images are grayscale (1 channel), this value will be 1 for our first layer. \n",
103 | "* `kernel_size` can be either a tuple specifying `(kernel_height, kernel_width)` or an integer, in which case both the kernel height and width will be set to this value. Each kernel in the layer will have dimension `(in_channels, kernel_height, kernel_width)`, and will produce a single-channel feature map when applied to the input. Thus, `out_channels` refers to both the number of channels (feature maps) in the output and the number of convolutional kernels applied in the layer. \n",
104 | "* `stride` refers to the hop size when applying kernels, and can be either a tuple (specifying vertical and horizontal hop sizes) or an integer (in which case the same value will be used for both). \n",
105 | "* For an overview of more options, see the official [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html).\n",
106 | "\n",
107 | "In addition to convolution, we'll experiment with two additional types of layers:\n",
108 | "* __Dropout__ randomly zeros elements of an input tensor with a given probability, ensuring that the network learns more robust and general features. In order to apply dropout at training time but _not_ at inference time, we can call `.train()` and `.eval()` on our network as usual; these will automatically set the behavior of any dropout layers in the model. For more details, see the [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html).\n",
109 | "* __Max-Pooling__ can be thought of as a convolutional layer with `out_channels=in_channels`, but with the kernel dot-product operation replaced by a maximum. This can be used to \"pool\" or compress the spatial (height/width) dimensions of tensors as they pass through the network. For more details, see the [PyTorch documentation](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html)."
110 | ]
111 | },
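{
"cell_type": "markdown",
"id": "conv-shape-sanity-check",
"metadata": {},
"source": [
"Before defining the full models, it can help to sanity-check how these layers change tensor shapes. A minimal sketch on a dummy MNIST-sized batch; the layer settings below match the `ConvNetwork` defined below:\n",
"\n",
"```\n",
"x = torch.randn(8, 1, 28, 28)  # (batch_size, channels, height, width)\n",
"conv1 = nn.Conv2d(1, 32, 3, 1)\n",
"conv2 = nn.Conv2d(32, 64, 3, 1)\n",
"pool = nn.MaxPool2d(4)\n",
"\n",
"print(conv1(x).shape)                # torch.Size([8, 32, 26, 26])\n",
"print(conv2(conv1(x)).shape)         # torch.Size([8, 64, 24, 24])\n",
"print(pool(conv2(conv1(x))).shape)   # torch.Size([8, 64, 6, 6])\n",
"```"
]
},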
112 | {
113 | "cell_type": "markdown",
114 | "id": "39b94f9f-d4a2-469b-a2c9-3c2bf9fb02ea",
115 | "metadata": {
116 | "id": "39b94f9f-d4a2-469b-a2c9-3c2bf9fb02ea"
117 | },
118 | "source": [
119 | "#### Model Definitions"
120 | ]
121 | },
122 | {
123 | "cell_type": "code",
124 | "execution_count": null,
125 | "id": "d3da2571-1c64-4d01-82b4-500a0bcb39f4",
126 | "metadata": {
127 | "id": "d3da2571-1c64-4d01-82b4-500a0bcb39f4"
128 | },
129 | "outputs": [],
130 | "source": [
131 | "class LinearNetwork(nn.Module):\n",
132 | "\n",
133 | " def __init__(self):\n",
134 | " \"\"\"The multi-layer perceptron from our previous notebook\"\"\"\n",
135 | " super().__init__()\n",
136 | "\n",
137 | " # MNIST images are (1, 28, 28) (channels, width, height)\n",
138 | " self.layer_1 = nn.Linear(28*28, 1024)\n",
139 | " self.layer_2 = nn.Linear(1024, 10)\n",
140 | " self.relu = nn.ReLU()\n",
141 | "\n",
142 | " def forward(self, x):\n",
143 | "\n",
144 | " batch_size, channels, width, height = x.size()\n",
145 | " x = x.view(batch_size, -1) # create an array of flattened images with dimension (batch_size, num_pixels)\n",
146 | "\n",
147 | " # this time, we'll use the ReLU nonlinearity at each layer \n",
148 | " x = self.relu(self.layer_1(x))\n",
149 | " x = self.layer_2(x) # we'll avoid \"squashing\" our final outputs by omitting the sigmoid\n",
150 | "\n",
151 | " return x\n",
152 | "\n",
153 | "\n",
154 | "class ConvNetwork(nn.Module):\n",
155 | " \"\"\"\n",
156 | " A simple convolutional neural network for image classification.\n",
157 | " From https://github.com/pytorch/examples/blob/master/mnist/main.py\n",
158 | " \"\"\"\n",
159 | "\n",
160 | " def __init__(self):\n",
161 | " super().__init__()\n",
162 | "\n",
163 | " # convolutional layers\n",
164 | " self.conv1 = nn.Conv2d(1, 32, 3, 1)\n",
165 | " self.conv2 = nn.Conv2d(32, 64, 3, 1)\n",
166 | "\n",
167 | " # just like in our fully-connected network, we'll use ReLU activations\n",
168 | " self.relu = nn.ReLU()\n",
169 | "\n",
170 | " # random dropout with two different \"strengths\"\n",
171 | " self.dropout1 = nn.Dropout(0.25) # we pass the dropout probability\n",
172 | " self.dropout2 = nn.Dropout(0.5)\n",
173 | "\n",
174 | " # max-pooling\n",
175 | " self.pool = nn.MaxPool2d(4)\n",
176 | "\n",
177 | " # a final fully-connected network to map our learned convolutional\n",
178 | " # features to class predictions\n",
179 | " self.fc1 = nn.Linear(64*6*6, 128)\n",
180 | " self.fc2 = nn.Linear(128, 10)\n",
181 | "\n",
182 | " def forward(self, x):\n",
183 | "\n",
184 | " # inputs are expected to have shape (batch_size, 1, 28, 28)\n",
185 | " x = self.conv1(x)\n",
186 | " x = self.relu(x)\n",
187 | "\n",
188 | " # out first convolutional layer reshapes inputs to (batch_size, 32, 26, 26)\n",
189 | " x = self.conv2(x)\n",
190 | " x = self.relu(x)\n",
191 | "\n",
192 | " # our second convolutional layer reshapes inputs to (batch_size, 64, 24, 24)\n",
193 | " x = self.pool(x)\n",
194 | " x = self.dropout1(x)\n",
195 | "\n",
196 | " # our pooling layer reduces inputs to (batch_size, 64, 6, 6)\n",
197 | " x = torch.flatten(x, 1)\n",
198 | "\n",
199 | " # we \"flatten\" inputs to (batch_size, 64 * 6 * 6) before passing to a \n",
200 | " # small fully-connected network\n",
201 | " x = self.fc1(x)\n",
202 | " x = self.relu(x)\n",
203 | " x = self.dropout2(x)\n",
204 | " x = self.fc2(x)\n",
205 | "\n",
206 | " # our final outputs are vectors of class scores, with shape (batch_size, 10)\n",
207 | " return x\n",
208 | "\n",
209 | "\n",
210 | "def param_count(m: nn.Module):\n",
211 | " \"\"\"Count the number of trainable parameters (weights) in a model\"\"\"\n",
212 | " return sum([p.shape.numel() for p in m.parameters() if p.requires_grad])\n",
213 | "\n",
214 | "\n",
215 | "model1 = LinearNetwork()\n",
216 | "model2 = ConvNetwork()\n",
217 | "\n",
218 | "params1 = param_count(model1)\n",
219 | "params2 = param_count(model2)\n",
220 | "\n",
221 | "print(f'Parameters in fully-connected network: {params1}')\n",
222 | "print(f'Parameters in convolutional network: {params2}')\n",
223 | "print(f'The convolutional network has {params2/params1 :0.2f}x as many parameters')"
224 | ]
225 | },
226 | {
227 | "cell_type": "markdown",
228 | "id": "801d2928-7e46-4fba-8940-1240da5bf879",
229 | "metadata": {
230 | "id": "801d2928-7e46-4fba-8940-1240da5bf879"
231 | },
232 | "source": [
233 | "#### Training Loop\n",
234 | "\n",
235 | "Next, we'll slightly modify our training loop to allow for different models."
236 | ]
237 | },
238 | {
239 | "cell_type": "code",
240 | "execution_count": null,
241 | "id": "c72e3a3a-4d41-4931-90ff-1ae9cfecbe91",
242 | "metadata": {
243 | "id": "c72e3a3a-4d41-4931-90ff-1ae9cfecbe91"
244 | },
245 | "outputs": [],
246 | "source": [
247 | "def training_loop(save_path, epochs, batch_size, device=\"cpu\", use_conv=False):\n",
248 | " \"\"\"\n",
249 | " Train a neural network model for digit recognition on the MNIST dataset.\n",
250 | " \n",
251 | " Parameters\n",
252 | " ----------\n",
253 | " save_path (str): path/filename for model checkpoint, e.g. 'my_model.pt'\n",
254 | " \n",
255 | " epochs (int): number of iterations through the whole dataset for training\n",
256 | " \n",
257 | " batch_size (int): size of a single batch of inputs\n",
258 | " \n",
259 | " device (str): device on which tensors are placed; should be 'cpu' or 'cuda'. \n",
260 | "\n",
261 | " use_conv (bool): if True, use ConvNetwork; else, use LinearNetwork.\n",
262 | " \n",
263 | " Returns\n",
264 | " -------\n",
265 | " model (nn.Module): final trained model\n",
266 | " \n",
267 | " save_path (str): path/filename for model checkpoint, so that we can load our model\n",
268 | " later to test on unseen data\n",
269 | " \n",
270 | " device (str): the device on which we carried out training, so we can match it\n",
271 | " when we test the final model on unseen data later\n",
272 | " \"\"\"\n",
273 | "\n",
274 | " # initialize model\n",
275 | " if use_conv:\n",
276 | " model = ConvNetwork()\n",
277 | " print('Training convolutional neural network...')\n",
278 | " else:\n",
279 | " model = LinearNetwork()\n",
280 | " print('Training fully-connected neural network...')\n",
281 | "\n",
282 | " print(f'Parameters in model: {param_count(model)}')\n",
283 | " model.to(device)\n",
284 | "\n",
285 | " # initialize an optimizer to update our model's parameters during training\n",
286 | " optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)\n",
287 | " # optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)\n",
288 | "\n",
289 | " # make a new directory in which to download the MNIST dataset\n",
290 | " data_dir = \"./data/\"\n",
291 | " \n",
292 | " # initialize a Transform object to prepare our data\n",
293 | " transform = torchvision.transforms.Compose([\n",
294 | " torchvision.transforms.ToTensor(),\n",
295 | " lambda x: x>0,\n",
296 | " lambda x: x.float(),\n",
297 | " ])\n",
298 | "\n",
299 | " # load MNIST \"test\" dataset from disk\n",
300 | " mnist_test = datasets.MNIST(data_dir, train=False, download=True, transform=transform)\n",
301 | "\n",
302 | " # load MNIST \"train\" dataset from disk and set aside a portion for validation\n",
303 | " mnist_train_full = datasets.MNIST(data_dir, train=True, download=True, transform=transform)\n",
304 | " mnist_train, mnist_val = torch.utils.data.random_split(mnist_train_full, [55000, 5000])\n",
305 | "\n",
306 | " # initialize a DataLoader object for each dataset\n",
307 | " train_dataloader = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)\n",
308 | " val_dataloader = torch.utils.data.DataLoader(mnist_val, batch_size=batch_size, shuffle=False)\n",
309 | " test_dataloader = torch.utils.data.DataLoader(mnist_test, batch_size=1, shuffle=False)\n",
310 | "\n",
311 | " # a PyTorch categorical cross-entropy loss object\n",
312 | " loss_fn = torch.nn.CrossEntropyLoss()\n",
313 | "\n",
314 | " # time training process\n",
315 | " st = time.time()\n",
316 | "\n",
317 | " # keep track of best validation accuracy; if improved upon, save checkpoint\n",
318 | " best_acc = 0.0\n",
319 | "\n",
320 | " # time to start training!\n",
321 | " for epoch_idx, epoch in enumerate(range(epochs)):\n",
322 | "\n",
323 | " # loop through the entire dataset once per epoch\n",
324 | " train_loss = 0.0\n",
325 | " train_acc = 0.0\n",
326 | " train_total = 0\n",
327 | " model.train()\n",
328 | " for batch_idx, batch in enumerate(train_dataloader):\n",
329 | "\n",
330 | " # clear gradients\n",
331 | " optimizer.zero_grad()\n",
332 | "\n",
333 | " # unpack data and labels\n",
334 | " x, y = batch\n",
335 | " x = x.to(device) # we'll cover this in the next section!\n",
336 | " y = y.to(device) # we'll cover this in the next section!\n",
337 | "\n",
338 | " # generate predictions and compute loss\n",
339 | " output = model(x) # (batch_size, 10)\n",
340 | " loss = loss_fn(output, y)\n",
341 | "\n",
342 | " # compute accuracy\n",
343 | " preds = output.argmax(dim=1)\n",
344 | " acc = preds.eq(y).sum().item()/len(y)\n",
345 | "\n",
346 | " # compute gradients and update model parameters\n",
347 | " loss.backward()\n",
348 | " optimizer.step()\n",
349 | "\n",
350 | " # update statistics\n",
351 | " train_loss += (loss * len(x))\n",
352 | " train_acc += (acc * len(x))\n",
353 | " train_total += len(x)\n",
354 | "\n",
355 | " train_loss /= train_total\n",
356 | " train_acc /= train_total\n",
357 | "\n",
358 | " # perform validation once per epoch\n",
359 | " val_loss = 0.0\n",
360 | " val_acc = 0.0\n",
361 | " val_total = 0\n",
362 | " model.eval()\n",
363 | " for batch_idx, batch in enumerate(val_dataloader):\n",
364 | "\n",
365 | " # don't compute gradients during validation\n",
366 | " with torch.no_grad():\n",
367 | "\n",
368 | " # unpack data and labels\n",
369 | " x, y = batch\n",
370 | " x = x.to(device) # we'll cover this in the next section!\n",
371 | " y = y.to(device) # we'll cover this in the next section!\n",
372 | "\n",
373 | " # generate predictions and compute loss\n",
374 | " output = model(x)\n",
375 | " loss = loss_fn(output, y)\n",
376 | "\n",
377 | " # compute accuracy\n",
378 | " preds = output.argmax(dim=1)\n",
379 | " acc = preds.eq(y).sum().item()/len(y)\n",
380 | "\n",
381 | " # update statistics\n",
382 | " val_loss += (loss * len(x))\n",
383 | " val_acc += (acc * len(x))\n",
384 | " val_total += len(x)\n",
385 | "\n",
386 | " val_loss /= val_total\n",
387 | " val_acc /= val_total\n",
388 | " print(f\"Epoch {epoch_idx + 1}: val loss {val_loss :0.3f}, val acc {val_acc :0.3f}, train loss {train_loss :0.3f}, train acc {train_acc :0.3f}\")\n",
389 | "\n",
390 | " if val_acc > best_acc:\n",
391 | " print(f\"New best accuracy {val_acc : 0.3f} (old {best_acc : 0.3f}); saving model weights to {save_path}\")\n",
392 | " best_acc = val_acc\n",
393 | " torch.save(model.state_dict(), save_path)\n",
394 | "\n",
395 | " print(f\"Total training time (s): {time.time() - st :0.3f}\")\n",
396 | " \n",
397 | " return model, save_path, device\n"
398 | ]
399 | },
400 | {
401 | "cell_type": "markdown",
402 | "id": "9367db63-7e64-4ac8-9216-be729f9c15bf",
403 | "metadata": {
404 | "id": "9367db63-7e64-4ac8-9216-be729f9c15bf"
405 | },
406 | "source": [
407 | "#### Run It!\n",
408 | "\n",
409 | "Finally, we can compare our convolutional and fully-connected models."
410 | ]
411 | },
412 | {
413 | "cell_type": "code",
414 | "execution_count": null,
415 | "id": "b9418055-6cf2-43e9-8d6f-5c6e50735f09",
416 | "metadata": {
417 | "id": "b9418055-6cf2-43e9-8d6f-5c6e50735f09"
418 | },
419 | "outputs": [],
420 | "source": [
421 | "# train a convolutional neural network\n",
422 | "conv_model, conv_path, device = training_loop(\n",
423 | " save_path=\"mnist_cnn.pt\", \n",
424 | " epochs=20, \n",
425 | " batch_size=60, \n",
426 | " device=\"cuda\" if torch.cuda.is_available() else \"cpu\",\n",
427 | " use_conv=True\n",
428 | ")\n",
429 | "\n",
430 | "# train a fully-connected neural network\n",
431 | "lin_model, lin_path, device = training_loop(\n",
432 | " save_path=\"mnist_fc.pt\", \n",
433 | " epochs=20, \n",
434 | " batch_size=60, \n",
435 | " device=\"cuda\" if torch.cuda.is_available() else \"cpu\",\n",
436 | " use_conv=False\n",
437 | ")"
438 | ]
439 | },
440 | {
441 | "cell_type": "markdown",
442 | "source": [
443 | "Our convolutional network is able to achieve a classification accuracy __~4%__ higher than our fully-connected network, with less than half the parameters!"
444 | ],
445 | "metadata": {
446 | "id": "ZMAkPGvztiBU"
447 | },
448 | "id": "ZMAkPGvztiBU"
449 | },
450 | {
451 | "cell_type": "markdown",
452 | "id": "cfbb51e2-bcdc-435c-aea8-f2efa94df648",
453 | "metadata": {
454 | "id": "cfbb51e2-bcdc-435c-aea8-f2efa94df648",
455 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e",
456 | "tags": []
457 | },
458 | "source": [
459 | "## __3.__ Data Augmentation\n",
460 | "\n",
461 | "We've got a pretty accurate model, but there are plenty of deep learning tricks we can use to squeeze some extra performance. One common practice is __data augmentation__, in which random transformations are applied to inputs during training. This helps in two ways:\n",
462 | "* Often, datasets are relatively small and imperfectly represent the popluation from which they are sampled. Data augmentation effectively expands the size of the dataset through sampling additional randomized variations of each instance.\n",
463 | "* We typically want to train a model that is __robust__ against common real-world transformations of its inputs -- that is, a model whose predictions are __invariant__ under these transformations. Data augmentation exposes our model to a chosen set of transformations during training so that it can learn to \"see past\" them.\n",
464 | "\n",
465 | "TorchVision provides a number of `Transform` objects designed to perform data augmentation, making it easy to apply transformations automatically when data is fetched from a `Dataset` object."
466 | ]
467 | },
468 | {
469 | "cell_type": "code",
470 | "execution_count": null,
471 | "id": "f3663caa-62a9-4ecd-803d-b2bd36c6a231",
472 | "metadata": {
473 | "id": "f3663caa-62a9-4ecd-803d-b2bd36c6a231"
474 | },
475 | "outputs": [],
476 | "source": [
477 | "# directory for MNIST dataset\n",
478 | "data_dir = \"./data/\"\n",
479 | "\n",
480 | "# initialize a Transform object to prepare our data\n",
481 | "transform = torchvision.transforms.Compose([\n",
482 | " torchvision.transforms.ToTensor(),\n",
483 | " lambda x: x>0,\n",
484 | " lambda x: x.float(),\n",
485 | "])\n",
486 | "\n",
487 | "# load MNIST \"train\" dataset from disk\n",
488 | "mnist_train = datasets.MNIST(data_dir, train=False, download=True, transform=transform)\n",
489 | "\n",
490 | "# fetch an image from the MNIST dataset\n",
491 | "example_img, example_label = mnist_train[300]\n",
492 | "plt.imshow(example_img.squeeze(), cmap='gray')\n",
493 | "plt.show()\n",
494 | "\n",
495 | "# perform a random affine transformation of an input (rotation, translation, shear)\n",
496 | "affine_aug = torchvision.transforms.RandomAffine(degrees=(-30, 30), translate=(0.25, 0.25), shear=(-45, 45))\n",
497 | "augmented = affine_aug(example_img)\n",
498 | "plt.imshow(augmented.squeeze(), cmap='gray')\n",
499 | "plt.show()"
500 | ]
501 | },
502 | {
503 | "cell_type": "markdown",
504 | "id": "d9650113-cdf9-4b1f-a39e-5863287d0ca2",
505 | "metadata": {
506 | "id": "d9650113-cdf9-4b1f-a39e-5863287d0ca2",
507 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
508 | },
509 | "source": [
510 | "Because we're effectively increasing the size of the dataset, and due to the computation required to perform each transformation, training with data augmentation may take more time (as measured in both walltime and iterations). It's also worth noting that augmentations are typically applied to the training data only. While we won't go into detail at the moment, feel free to try training with any of the [augmentations offered by TorchVision](https://pytorch.org/vision/stable/transforms.html). You can add augmentations to the training loop above by editing the `transfom` object:\n",
511 | "\n",
512 | "```\n",
513 | "# initialize a Transform object to prepare our data\n",
514 | "transform = torchvision.transforms.Compose([\n",
515 | " torchvision.transforms.ToTensor(),\n",
516 | " lambda x: x>0,\n",
517 | " lambda x: x.float(),\n",
518 | " torchvision.transforms.RandomAffine(degrees=(-30, 30), translate=(0.25, 0.25), shear=(-45, 45)) # just append transforms!\n",
519 | "])\n",
520 | "```"
521 | ]
522 | },
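{
"cell_type": "markdown",
"id": "separate-train-eval-transforms",
"metadata": {},
"source": [
"A minimal sketch of keeping augmentation on the training side only, using two separate transform objects (you would then pass each one to the corresponding dataset when modifying the training loop):\n",
"\n",
"```\n",
"train_transform = torchvision.transforms.Compose([\n",
"    torchvision.transforms.ToTensor(),\n",
"    lambda x: x>0,\n",
"    lambda x: x.float(),\n",
"    torchvision.transforms.RandomAffine(degrees=(-30, 30), translate=(0.25, 0.25), shear=(-45, 45)),\n",
"])\n",
"\n",
"eval_transform = torchvision.transforms.Compose([\n",
"    torchvision.transforms.ToTensor(),\n",
"    lambda x: x>0,\n",
"    lambda x: x.float(),\n",
"])\n",
"\n",
"mnist_train_full = datasets.MNIST(\"./data/\", train=True, download=True, transform=train_transform)\n",
"mnist_test = datasets.MNIST(\"./data/\", train=False, download=True, transform=eval_transform)\n",
"```"
]
},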
523 | {
524 | "cell_type": "markdown",
525 | "id": "6d616308-8574-43bb-a9b3-ea3b92f899e9",
526 | "metadata": {
527 | "id": "6d616308-8574-43bb-a9b3-ea3b92f899e9",
528 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e",
529 | "tags": []
530 | },
531 | "source": [
532 | "## __4.__ Logging"
533 | ]
534 | },
535 | {
536 | "cell_type": "markdown",
537 | "id": "7f9fe057-59da-47a9-9ccf-ca9d96070ca1",
538 | "metadata": {
539 | "id": "7f9fe057-59da-47a9-9ccf-ca9d96070ca1",
540 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
541 | },
542 | "source": [
543 | "In our training loop, we print running summaries of our model's training performance in order to monitor its progress. This is somewhat clunky and limited - what if we want to plot accuracy in real time, visualize challenging instances, dynamically change what information is displayed, or document and compare across multiple training runs? All these tasks fall under the umbrella of __logging__, and once again, PyTorch provides utilities to simplify the process. We can use PyTorch's built-in TensorBoard support to configure and view training logs without the need for any external database or visualization software. To launch TensorBoard within the notebook, run the cell below:"
544 | ]
545 | },
546 | {
547 | "cell_type": "code",
548 | "execution_count": null,
549 | "id": "4bfc97e7-86a5-41c7-941d-951dce68a654",
550 | "metadata": {
551 | "id": "4bfc97e7-86a5-41c7-941d-951dce68a654"
552 | },
553 | "outputs": [],
554 | "source": [
555 | "# here, we'll initialize TensorBoard. You should see an empty window in this cell, which will populate with\n",
556 | "# graphs as soon as we run our training code below.\n",
557 | "%load_ext tensorboard\n",
558 | "%tensorboard --logdir logs"
559 | ]
560 | },
561 | {
562 | "cell_type": "markdown",
563 | "id": "b0a8b591-8607-437f-9380-2a99c3cf1e2d",
564 | "metadata": {
565 | "id": "b0a8b591-8607-437f-9380-2a99c3cf1e2d",
566 | "outputId": "b47b6db9-4f27-42f0-bb93-f497018aa513"
567 | },
568 | "source": [
569 | "Next, we'll re-write out training loop to log loss and accuracy values to TensorBoard rather than printing."
570 | ]
571 | },
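{
"cell_type": "markdown",
"id": "summarywriter-add-scalar-note",
"metadata": {},
"source": [
"The only new ingredient is the `SummaryWriter` object: each call to `add_scalar(tag, value, step)` appends one point to the named curve in TensorBoard. A minimal standalone sketch (the log directory and tag name here are arbitrary):\n",
"\n",
"```\n",
"from torch.utils.tensorboard import SummaryWriter\n",
"\n",
"writer = SummaryWriter(log_dir=\"logs/demo\")\n",
"for step in range(100):\n",
"    writer.add_scalar(\"demo/value\", step ** 0.5, step)  # log one scalar per step\n",
"writer.close()\n",
"```"
]
},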
572 | {
573 | "cell_type": "code",
574 | "execution_count": null,
575 | "id": "7633848e-400c-4e8c-a73b-04b4244b2377",
576 | "metadata": {
577 | "id": "7633848e-400c-4e8c-a73b-04b4244b2377"
578 | },
579 | "outputs": [],
580 | "source": [
581 | "import datetime\n",
582 | "from pathlib import Path\n",
583 | "from torch.utils.tensorboard import SummaryWriter\n",
584 | "\n",
585 | "# save all log data to a local directory\n",
586 | "run_dir = \"logs\"\n",
587 | "\n",
588 | "# timestamp the logs for each run so we can sort through them \n",
589 | "run_time = datetime.datetime.now().strftime(\"%I:%M%p on %B %d, %Y\")\n",
590 | "\n",
591 | "# initialize a SummaryWriter object to handle all logging actions\n",
592 | "logger = SummaryWriter(log_dir=Path(run_dir) / run_time)\n",
593 | "\n",
594 | "def training_loop(save_path, \n",
595 | " epochs, \n",
596 | " batch_size, \n",
597 | " device=\"cpu\", \n",
598 | " use_conv=False,\n",
599 | " logger=None\n",
600 | " ):\n",
601 | " \"\"\"\n",
602 | " Train a neural network model for digit recognition on the MNIST dataset.\n",
603 | " \n",
604 | " Parameters\n",
605 | " ----------\n",
606 | " save_path (str): path/filename for model checkpoint, e.g. 'my_model.pt'\n",
607 | " \n",
608 | " epochs (int): number of iterations through the whole dataset for training\n",
609 | " \n",
610 | " batch_size (int): size of a single batch of inputs\n",
611 | " \n",
612 | " device (str): device on which tensors are placed; should be 'cpu' or 'cuda'. \n",
613 | "\n",
614 | " use_conv (bool): if True, use ConvNetwork; else, use LinearNetwork.\n",
615 | "\n",
616 | " logger (SummaryWriter): a TensorBoard logger\n",
617 | " \n",
618 | " Returns\n",
619 | " -------\n",
620 | " model (nn.Module): final trained model\n",
621 | " \n",
622 | " save_path (str): path/filename for model checkpoint, so that we can load our model\n",
623 | " later to test on unseen data\n",
624 | " \n",
625 | " device (str): the device on which we carried out training, so we can match it\n",
626 | " when we test the final model on unseen data later\n",
627 | " \"\"\"\n",
628 | "\n",
629 | " # initialize model\n",
630 | " if use_conv:\n",
631 | " model = ConvNetwork()\n",
632 | " print('Training convolutional neural network...')\n",
633 | " else:\n",
634 | " model = LinearNetwork()\n",
635 | " print('Training fully-connected neural network...')\n",
636 | "\n",
637 | " print(f'Parameters in model: {param_count(model)}')\n",
638 | " model.to(device)\n",
639 | "\n",
640 | " # initialize an optimizer to update our model's parameters during training\n",
641 | " if use_conv:\n",
642 | " optimizer = torch.optim.Adadelta(model.parameters(), lr=1.0)\n",
643 | " else:\n",
644 | " optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)\n",
645 | "\n",
646 | " # make a new directory in which to download the MNIST dataset\n",
647 | " data_dir = \"./data/\"\n",
648 | " \n",
649 | " # initialize a Transform object to prepare our data\n",
650 | " transform = torchvision.transforms.Compose([\n",
651 | " torchvision.transforms.ToTensor(),\n",
652 | " lambda x: x>0,\n",
653 | " lambda x: x.float(),\n",
654 | " ])\n",
655 | "\n",
656 | " # load MNIST \"test\" dataset from disk\n",
657 | " mnist_test = datasets.MNIST(data_dir, train=False, download=True, transform=transform)\n",
658 | "\n",
659 | " # load MNIST \"train\" dataset from disk and set aside a portion for validation\n",
660 | " mnist_train_full = datasets.MNIST(data_dir, train=True, download=True, transform=transform)\n",
661 | " mnist_train, mnist_val = torch.utils.data.random_split(mnist_train_full, [55000, 5000])\n",
662 | "\n",
663 | " # initialize a DataLoader object for each dataset\n",
664 | " train_dataloader = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True)\n",
665 | " val_dataloader = torch.utils.data.DataLoader(mnist_val, batch_size=batch_size, shuffle=False)\n",
666 | " test_dataloader = torch.utils.data.DataLoader(mnist_test, batch_size=1, shuffle=False)\n",
667 | "\n",
668 | " # a PyTorch categorical cross-entropy loss object\n",
669 | " loss_fn = torch.nn.CrossEntropyLoss()\n",
670 | "\n",
671 | " # time training process\n",
672 | " st = time.time()\n",
673 | "\n",
674 | " # keep track of best validation accuracy; if improved upon, save checkpoint\n",
675 | " best_acc = 0.0\n",
676 | "\n",
677 | " # time to start training!\n",
678 | " for epoch_idx, epoch in enumerate(range(epochs)):\n",
679 | "\n",
680 | " # loop through the entire dataset once per epoch\n",
681 | " train_loss = 0.0\n",
682 | " train_acc = 0.0\n",
683 | " train_total = 0\n",
684 | " model.train()\n",
685 | " for batch_idx, batch in enumerate(train_dataloader):\n",
686 | "\n",
687 | " # clear gradients\n",
688 | " optimizer.zero_grad()\n",
689 | "\n",
690 | " # unpack data and labels\n",
691 | " x, y = batch\n",
692 | " x = x.to(device) # we'll cover this in the next section!\n",
693 | " y = y.to(device) # we'll cover this in the next section!\n",
694 | "\n",
695 | " # generate predictions and compute loss\n",
696 | " output = model(x) # (batch_size, 10)\n",
697 | " loss = loss_fn(output, y)\n",
698 | "\n",
699 | " # compute accuracy\n",
700 | " preds = output.argmax(dim=1)\n",
701 | " acc = preds.eq(y).sum().item()/len(y)\n",
702 | "\n",
703 | " # compute gradients and update model parameters\n",
704 | " loss.backward()\n",
705 | " optimizer.step()\n",
706 | "\n",
707 | " # update statistics\n",
708 | " train_loss += (loss * len(x))\n",
709 | " train_acc += (acc * len(x))\n",
710 | " train_total += len(x)\n",
711 | "\n",
712 | " train_loss /= train_total\n",
713 | " train_acc /= train_total\n",
714 | "\n",
715 | " ########################################################################\n",
716 | " # NEW: log to TensorBoard\n",
717 | " ########################################################################\n",
718 | "\n",
719 | " if logger is not None:\n",
720 | " logger.add_scalar(\"train_loss\", train_loss, epoch_idx)\n",
721 | " logger.add_scalar(\"train_acc\", train_acc, epoch_idx)\n",
722 | "\n",
723 | " # perform validation once per epoch\n",
724 | " val_loss = 0.0\n",
725 | " val_acc = 0.0\n",
726 | " val_total = 0\n",
727 | " model.eval()\n",
728 | " for batch_idx, batch in enumerate(val_dataloader):\n",
729 | "\n",
730 | " # don't compute gradients during validation\n",
731 | " with torch.no_grad():\n",
732 | "\n",
733 | " # unpack data and labels\n",
734 | " x, y = batch\n",
735 | " x = x.to(device) # we'll cover this in the next section!\n",
736 | " y = y.to(device) # we'll cover this in the next section!\n",
737 | "\n",
738 | " # generate predictions and compute loss\n",
739 | " output = model(x)\n",
740 | " loss = loss_fn(output, y)\n",
741 | "\n",
742 | " # compute accuracy\n",
743 | " preds = output.argmax(dim=1)\n",
744 | " acc = preds.eq(y).sum().item()/len(y)\n",
745 | "\n",
746 | " # update statistics\n",
747 | " val_loss += (loss * len(x))\n",
748 | " val_acc += (acc * len(x))\n",
749 | " val_total += len(x)\n",
750 | "\n",
751 | " val_loss /= val_total\n",
752 | " val_acc /= val_total\n",
753 | "\n",
754 | " ########################################################################\n",
755 | " # NEW: log to TensorBoard\n",
756 | " ########################################################################\n",
757 | " \n",
758 | " if logger is not None:\n",
759 | " logger.add_scalar(\"val_loss\", val_loss, epoch_idx)\n",
760 | " logger.add_scalar(\"val_acc\", val_acc, epoch_idx)\n",
761 | " \n",
762 | " print(f\"Epoch {epoch_idx + 1}: val loss {val_loss :0.3f}, val acc {val_acc :0.3f}, train loss {train_loss :0.3f}, train acc {train_acc :0.3f}\")\n",
763 | "\n",
764 | " if val_acc > best_acc:\n",
765 | " print(f\"New best accuracy {val_acc : 0.3f} (old {best_acc : 0.3f}); saving model weights to {save_path}\")\n",
766 | " best_acc = val_acc\n",
767 | " torch.save(model.state_dict(), save_path)\n",
768 | "\n",
769 | " print(f\"Total training time (s): {time.time() - st :0.3f}\")\n",
770 | " \n",
771 | " return model, save_path, device\n",
772 | " "
773 | ]
774 | },
775 | {
776 | "cell_type": "code",
777 | "execution_count": null,
778 | "id": "bb3c2f18-ff04-420c-a47c-16c1387bcae3",
779 | "metadata": {
780 | "id": "bb3c2f18-ff04-420c-a47c-16c1387bcae3"
781 | },
782 | "outputs": [],
783 | "source": [
784 | "# run our training loop\n",
785 | "model, save_path, device = training_loop(\n",
786 | " save_path=\"mnist_review.pt\", \n",
787 | " epochs=10, \n",
788 | " batch_size=60, \n",
789 | " device=\"cuda\" if torch.cuda.is_available() else \"cpu\",\n",
790 | " use_conv=True,\n",
791 | " logger=logger\n",
792 | ")"
793 | ]
794 | },
795 | {
796 | "cell_type": "markdown",
797 | "id": "dcb29bfa-c022-460d-92b2-706c50061272",
798 | "metadata": {
799 | "id": "dcb29bfa-c022-460d-92b2-706c50061272",
800 | "outputId": "9348bd86-e439-48de-e0a3-f9935787cf8e"
801 | },
802 | "source": [
803 | "We can also run TensorBoard from the terminal, in which case we can view the logs in a browser by navigating to the correct port on our `localhost`. In the example below, after running the command we would need to point our browser to `localhost:9999`\n",
804 | "\n",
805 | "```\n",
806 | "$ tensorboard --logdir /path/to/logging/directory/ --port 9999\n",
807 | "```\n",
808 | "\n",
809 | "If no port is given, TensorBoard will default to 6006. In fact, the logs from your experiment above should already be visible at `localhost:6006`. TensorBoard will continue serving on this port until the notebook kernel shuts down or you halt the terminal command (e.g. using `ctrl` + `c`), at which point you will not be able to view your logs until you re-start TensorBoard."
810 | ]
811 | }
812 | ],
813 | "metadata": {
814 | "kernelspec": {
815 | "display_name": "course-deep-learning",
816 | "language": "python",
817 | "name": "course-deep-learning"
818 | },
819 | "language_info": {
820 | "codemirror_mode": {
821 | "name": "ipython",
822 | "version": 3
823 | },
824 | "file_extension": ".py",
825 | "mimetype": "text/x-python",
826 | "name": "python",
827 | "nbconvert_exporter": "python",
828 | "pygments_lexer": "ipython3",
829 | "version": "3.8.12"
830 | },
831 | "colab": {
832 | "name": "notebook_4_augmentation_logging (2).ipynb",
833 | "provenance": [],
834 | "collapsed_sections": [],
835 | "toc_visible": true
836 | },
837 | "accelerator": "GPU"
838 | },
839 | "nbformat": 4,
840 | "nbformat_minor": 5
841 | }
--------------------------------------------------------------------------------
/notebooks/slides/DL_adversarial_examples.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_adversarial_examples.pdf
--------------------------------------------------------------------------------
/notebooks/slides/DL_attention_networks.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_attention_networks.pdf
--------------------------------------------------------------------------------
/notebooks/slides/DL_convolutional_nets.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_convolutional_nets.pdf
--------------------------------------------------------------------------------
/notebooks/slides/DL_deep_reinforcement_learning.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_deep_reinforcement_learning.pdf
--------------------------------------------------------------------------------
/notebooks/slides/DL_gradient_descent.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_gradient_descent.pdf
--------------------------------------------------------------------------------
/notebooks/slides/DL_lightning_and_tensorboard.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"DL Lightning and TensorBoard.ipynb","provenance":[],"private_outputs":true,"collapsed_sections":["Exnwbnlyhk-o","3fMHyUaGhj-p"],"toc_visible":true,"authorship_tag":"ABX9TyOkNZDLi9ZC8C2jyL6SwjWI"},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU"},"cells":[{"cell_type":"markdown","metadata":{"id":"fXGJ_v2xEsfd"},"source":["# Basics of pytorch_lightning, dataloaders, and tensorboard\n","\n","This is a basic walkthrough of building, training, and using a simple neural network in [PyTorch](https://pytorch.org/) using the [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/latest/) package to let us make better-organized code, and displaying information about training in [TensorBoard](https://www.tensorflow.org/tensorboard). We'll load the classic [MNIST dataset](http://yann.lecun.com/exdb/mnist/) from the [torchvision.datasets](https://pytorch.org/docs/stable/torchvision/datasets.html) library of image datasets and feed the network data using a [dataloader](https://pytorch.org/docs/stable/data.html), which is a standard way of handling data for a PyTorch model.\n","\n"]},{"cell_type":"markdown","metadata":{"id":"Exnwbnlyhk-o"},"source":["## Install and import the needed packages"]},{"cell_type":"markdown","metadata":{"id":"wjbkm66hEngE"},"source":["### Install the needed packages."]},{"cell_type":"code","metadata":{"id":"SDsIUoGDKxNy"},"source":["!pip install torch \n","!pip install torchvision\n","!pip install pytorch_lightning\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"LAPyUo7jK22C"},"source":["### Now that the packages are installed, import them to this project."]},{"cell_type":"code","metadata":{"id":"9lEdGUS8EjGM"},"source":["# -------------------------------------------------\n","# Pytorch \n","# -------------------------------------------------\n","\n","# This is the main torch package\n","import torch \n","#Computer vision specific package \n","import torchvision\n","#There are a bunch of standard datasets in torchvision. \n","import torchvision.datasets as datasets\n","\n","# -------------------------------------------------\n","# Pytorch Lightning \n","# -------------------------------------------------\n","\n","import pytorch_lightning as pl\n","# this gives us the hooks to connect to TensorBoard\n","import pytorch_lightning.loggers as pl_loggers\n","\n","# -------------------------------------------------\n","# Stuff to show the data using matplotlib\n","# -------------------------------------------------\n","#import random\n","#import numpy as np\n","#import matplotlib\n","#import matplotlib.pyplot as plt\n","# this magic command lets me show plots in the notebook\n","#%matplotlib inline\n","\n","# -------------------------------------------------\n","# Stuff for timestamping \n","# -------------------------------------------------\n","from time import process_time \n","import datetime\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"3fMHyUaGhj-p"},"source":["## Define the [LightningDataModule](https://pytorch-lightning.readthedocs.io/en/latest/datamodules.html) to prepares the data for use by the network."]},{"cell_type":"markdown","metadata":{"id":"oQy1M_0Zy-eM"},"source":["A datamodule is a shareable, reusable class that encapsulates all the steps needed to process data. \n","\n","1. Download / tokenize / process.\n","1. Clean up data and (maybe) save to disk.\n","1. 
Load data into a [Dataset](https://pytorch.org/docs/stable/data.html).\n","1. Apply [Transforms](https://pytorch.org/docs/stable/torchvision/transforms.html) to the data (rotate, tokenize, etc…).\n","1. Wrap inside a [DataLoader](https://pytorch.org/docs/stable/data.html)."]},{"cell_type":"code","metadata":{"id":"_LTYf095fSq0"},"source":["class MyDataModule(pl.LightningDataModule):\n","\n"," def __init__(self, data_dir='./data/'):\n"," super().__init__()\n"," self.data_dir = data_dir\n","\n"," # These will be applied to every element in the dataset, to ensure they're \n"," # normalized and in the right format.\n"," self.transform = torchvision.transforms.Compose([\n"," torchvision.transforms.ToTensor(),\n"," torchvision.transforms.Normalize((0.5,), (0.5,))])\n"," \n"," # Our dataset is in the torchvision library of datasets. Here is where you'd\n"," # change the code to process a different dataset\n"," def setup(self, stage=None):\n"," self.mnist_test = datasets.MNIST(self.data_dir,train=False,download=True, transform=self.transform)\n"," mnist_full = datasets.MNIST(self.data_dir, train=True, download=True, transform=self.transform)\n"," self.mnist_train, self.mnist_val = torch.utils.data.random_split(mnist_full, [55000, 5000])\n","\n"," # Dataloaders are the things that handle creating batches of data and handing them\n"," # to the model. You determine whether to randomize data order and the size of the batch\n"," # when you declare the data loader\n"," def train_dataloader(self):\n"," return torch.utils.data.DataLoader(self.mnist_train, batch_size=64, shuffle=True)\n","\n"," def val_dataloader(self):\n"," return torch.utils.data.DataLoader(self.mnist_val, batch_size=64, shuffle=True)\n","\n"," def test_dataloader(self):\n"," return torch.utils.data.DataLoader(self.mnist_test, batch_size=1)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"NHEBDrqXAYw2"},"source":["## Define network architecture and test/train actions in a [PyTorchLightning](https://pytorch-lightning.readthedocs.io/en/latest/lightning_module.html) Module."]},{"cell_type":"markdown","metadata":{"id":"4vxVCyl317X9"},"source":["PyTorch Lightning builds on top of standard PyTorch. It is a way of organizing code to make it more modular and easier to handle. A LightningModule organizes PyTorch code into these sections:\n","\n","1. Network architecture (init)\n","1. Data-flow/computations (forward)\n","1. Train loop (training_step)\n","1. Validation loop (validation_step)\n","1. Test loop (test_step)\n","1. Optimizers (configure_optimizers)\n","\n","The first two of these (init and forward) are what you'd find in a typical [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). The LightningModule extends the torch Module to add the train/test/validation and optimizer definitions into the module.\n","\n","The code you'd normally write for a torch Module's training and testing loops are instead handled externally to your code by a [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html). 
The Trainer handles the event loop for training and testing the model, calling the methods in your LightningModule when it's time to train or test on a batch of data from a dataloader."]},{"cell_type":"code","metadata":{"id":"p_Qr7940ORwn"},"source":["\n","class MyLightningModule(pl.LightningModule):\n","\n"," # Define the model architecture\n"," def __init__(self):\n"," super(MyLightningModule, self).__init__()\n","\n"," # mnist images are (1, 28, 28) (channels, width, height) \n"," self.layer_1 = torch.nn.Linear(28 * 28, 64)\n"," self.layer_2 = torch.nn.Linear(64, 256)\n"," self.layer_3 = torch.nn.Linear(256, 10)\n","\n"," def forward(self, x):\n"," batch_size, channels, width, height = x.size()\n"," x = x.view(batch_size, -1)\n"," x = torch.relu(self.layer_1(x))\n"," x = torch.relu(self.layer_2(x))\n"," x = torch.log_softmax(self.layer_3(x), dim=1)\n"," return x\n","\n"," def configure_optimizers(self):\n"," optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)\n"," return optimizer\n","\n"," def my_loss(self, y_hat,y):\n"," return torch.nn.functional.nll_loss(y_hat,y)\n","\n"," def training_step(self, train_batch, batch_idx):\n"," x, y = train_batch # Here x = data, y = labels\n"," output = self.forward(x)\n"," loss = self.my_loss(output, y)\n"," \n"," # Calculate the accuracy of the model on the batch of data\n"," y_hat = output.argmax(dim=1)\n"," accuracy = y_hat.eq(y).sum().item()/len(y)\n","\n"," # these two lines write the accurcay and loss to TensorBoard\n"," self.logger.experiment.add_scalar(\"Accuracy/Train\", accuracy)\n"," self.logger.experiment.add_scalar(\"Loss/Train\", loss)\n","\n"," return {\"loss\": loss} \n","\n"," def validation_step(self, val_batch, batch_idx):\n"," x, y = val_batch\n"," output = self.forward(x)\n"," loss = self.my_loss(output, y)\n"," self.logger.experiment.add_scalar(\"Loss/Val\", loss)\n"," return {\"loss\":loss}\n","\n"," def test_step(self, test_batch, batch_idx):\n"," x, y = test_batch\n"," output = self.forward(x)\n"," loss = self.my_loss(output, y)\n"," return {\"loss\":loss}\n","\n"," \n","\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"je7DSC35eJk_"},"source":["## Declare a [TensorBoard](https://www.tensorflow.org/tensorboard) logger. "]},{"cell_type":"markdown","metadata":{"id":"Ke2qOKOi4l4J"},"source":["Here, we're going to set up logging to [TensorBoard](https://www.tensorflow.org/tensorboard) (the most popular way of displaying data about training your deep net), both before and after traning. 
Here is a nice [tutorial on using TensorBoard with PyTorch Lightning](https://www.learnopencv.com/tensorboard-with-pytorch-lightning/).\n"]},{"cell_type":"code","metadata":{"id":"M-Pzj_WudiEY"},"source":["# To clear out TensorBoard and start totally fresh, you need to\n","# remove old logs by deleting them from the directory\n","!rm -rf ./lightning_logs/\n","\n","# This will help me time-stamp my logs \n","mytime = datetime.datetime.now().strftime(\"%I:%M%p on %B %d, %Y\")\n","\n","# define how to log information about training to tensorboard\n","tb_logger = pl_loggers.TensorBoardLogger('lightning_logs/','CPU',mytime)\n","\n","\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"2CCwnGievHDa"},"source":["## Declare the model and the data module"]},{"cell_type":"code","metadata":{"id":"OWKni8ePvEjo"},"source":["# load and format our data\n","data_module = MyDataModule()\n","# define our model and how it will train\n","model = MyLightningModule()\n","\n","# saving the weights of the model for later comparison\n","untrained_model = MyLightningModule()\n","untrained_model.load_state_dict(model.state_dict())\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"3pBHCxA2SO8l"},"source":["## Actually train the model"]},{"cell_type":"markdown","metadata":{"id":"IsJWiJyd7TLf"},"source":["PyTorch Lightning saves you from having to write a training loop because there is a [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html?highlight=Trainer) that handles calling the right DataLoader to hand a batch of training data to the Module defining the network model. The trainer is where you define the number of epochs to train for, as an example.\n","\n","Now that we've definde the network structure and dataflow in the PyTorch LigningModule, and we've defined how to load and format data in the LightningDataModule, we're ready to run the main training loop. This is where we declare the [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html?highlight=Trainer), which calls these other modules at the appropriate time.\n","\n","Note that logging to TensorBoard is happening due to some calls in the LightningModule that declared what happens in a train step and a validation step."]},{"cell_type":"code","metadata":{"id":"SAU5uu_O1PH7"},"source":["\n","# declare the traininer, which runs the training and validation loops automatically\n","trainer = pl.Trainer(logger=tb_logger, max_epochs = 2)\n","\n","# just to measure how long training takes\n","start_time = process_time() \n","\n","# OK. Run the training loop\n","trainer.fit(model, data_module)\n","\n","print(\"Elapsed time in seconds:\", process_time() -start_time) \n","\n","\n","\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"kvo2D3Q0yboM"},"source":["## Now put some more stuff on TensorBoard"]},{"cell_type":"markdown","metadata":{"id":"XjNWgL1yzWBM"},"source":["We logged training and valdation loss to TensorBoard due to the calls made in the LightningModule that defined our model. There are other things you can put on TensorBoard, including dataset images, network structure and histograms of model weights. 
We'll do that here."]},{"cell_type":"code","metadata":{"id":"YekjbpNTycB5"},"source":["# put a histogram of layer 1's after-training weights on TensorBoard\n","tb_logger.experiment.add_histogram('histogram of model.layer_1.weight after training', model.layer_1.weight)\n","\n","# Let's also do the before-training weights\n","tb_logger.experiment.add_histogram('histogram of model.layer_1.weight before training', untrained_model.layer_1.weight)\n","\n","# put an example batch of data on TensorBoard\n","data, target = next(iter(model.val_dataloader()))\n","show_this = torchvision.utils.make_grid(data)\n","tb_logger.experiment.add_image('validation images', show_this)\n","\n","#Putting the network structure on tensorboard\n","tb_logger.experiment.add_graph(model,data)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"CkHT8s9CKlTc"},"source":["## View how training went using TensorBoard. \n"]},{"cell_type":"markdown","metadata":{"id":"lhHZYYiE8foE"},"source":["Now we can start up [TensorBoard](https://www.tensorflow.org/tensorboard).TensorBoard provides visualization and tooling for machine learning experimentation. Let's view what we've logged about training accuracy/loss, our data, and the model structure.\n","\n","Here is a good basic [tutorial on using TensorBoard with Pytorch Lightning](https://www.learnopencv.com/tensorboard-with-pytorch-lightning/)"]},{"cell_type":"code","metadata":{"id":"IqgiTM2-I1MD"},"source":["%load_ext tensorboard\n","%tensorboard --logdir lightning_logs/"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"bLWe-SRgJluo"},"source":["## Check that we can access the GPU"]},{"cell_type":"markdown","metadata":{"id":"xpE2vpmVJse6"},"source":["Starting here, we're going to care about the GPU. Let's just check to make sure things are OK on that front. When you run the code below, you should see that CUDA is available = TRUE....if you want to do anything on the GPU. This won't affect CPU processing at all.\n","\n","Note...if you're running CoLab then, for this test to show that CUDA is available, you'll need to go to the menu bar above and select \"Runtime\", then \"Chang Runtime Type\", then select \"GPU\". Once you've done that, this test may still fail. At that point, try \"Runtime\", then \"Factory Reset Runtime\". Once you've done that, execute this notebook again, starting from the first cell.\n"]},{"cell_type":"code","metadata":{"id":"h72byGpMHjQE"},"source":["# This will show us details about the GPU\n","!nvidia-smi\n","\n","# This will tell us the torch version, in case there's an issue there (the latest version\n","# as of this writing is 1.6)\n","print(\"My version of PyTorch is: \", torch.__version__)\n","\n","# If this turns out to be \"true\", we're probably in good shape\n","print(\"CUDA is available = \", torch.cuda.is_available())\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"445nHrSmQZTU"},"source":["## Train the model again, but this time on the GPU instead of the CPU."]},{"cell_type":"markdown","metadata":{"id":"lsPGIZeROcr7"},"source":["Now...we do train a new model from scratch, but this time with the trainer set to gpus=1. That will tell it there is 1 GPU to use...and it will use that GPU. 
Here's the ONLY difference between this block of code and the previous block where we declared and ran a Trainer.\n","\n","Previous:\n","\n","\n","```\n","trainer = pl.Trainer(logger=tb_logger, max_epochs = 1, callbacks=[MyCallback()])\n","```\n","\n","\n","Runs on GPU:\n","\n","\n","\n","```\n","trainer = pl.Trainer(logger=tb_logger, max_epochs = 1, callbacks=[MyCallback()], gpus=[0])\n","```\n","\n","\n"]},{"cell_type":"code","metadata":{"id":"zHaU4vSl_8ht"},"source":["# This will help me time-stamp my new log and keep is separate from the old one \n","mytime = datetime.datetime.now().strftime(\"%I:%M%p on %B %d, %Y\")\n","# define how to log information about training to tensorboard\n","tb_logger = pl_loggers.TensorBoardLogger('lightning_logs/','GPU',mytime)\n","\n","# declare the traininer, which runs the training and validation loops automatically\n","trainer = pl.Trainer(logger=tb_logger, max_epochs = 2, gpus=1)\n","\n","# OK. Run the training loop\n","start_time = process_time() \n","\n","trainer.fit(model, data_module)\n","\n","print(\"Elapsed time in seconds:\", process_time() -start_time) "],"execution_count":null,"outputs":[]}]}
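The `gpus=` argument used throughout this notebook is the pre-2.0 PyTorch Lightning API. As a rough equivalent on Lightning 2.x, where `gpus` was removed in favor of `accelerator`/`devices`, a sketch (assuming `tb_logger`, `model`, and `data_module` are the objects declared earlier in the notebook, and pytorch_lightning >= 2.0) would be:

```python
import pytorch_lightning as pl

# Sketch for pytorch_lightning >= 2.0: `gpus=1` becomes accelerator="gpu", devices=1.
trainer = pl.Trainer(
    logger=tb_logger,     # the TensorBoardLogger declared earlier in the notebook
    max_epochs=2,
    accelerator="gpu",    # select the GPU backend ("cpu" keeps training on the CPU)
    devices=1,            # number of GPUs to use
)
trainer.fit(model, data_module)
```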
--------------------------------------------------------------------------------
/notebooks/slides/DL_multilayer_perceptrons.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_multilayer_perceptrons.pdf
--------------------------------------------------------------------------------
/notebooks/slides/DL_perceptrons.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_perceptrons.pdf
--------------------------------------------------------------------------------
/notebooks/slides/DL_recurrent_nets.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_recurrent_nets.pdf
--------------------------------------------------------------------------------
/notebooks/slides/DL_regularization.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_regularization.pdf
--------------------------------------------------------------------------------
/notebooks/slides/DL_transformers.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_transformers.pdf
--------------------------------------------------------------------------------
/notebooks/slides/DL_unsupervised_methods.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/notebooks/slides/DL_unsupervised_methods.pdf
--------------------------------------------------------------------------------
/readings/RNN-tutorial-WildML.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/readings/RNN-tutorial-WildML.pdf
--------------------------------------------------------------------------------
/readings/chapter4-ml.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/readings/chapter4-ml.pdf
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | torch
2 | torchvision
3 | gdown >= 4.4.0
4 | torchaudio
5 | librosa
6 | matplotlib
7 | tensorboard
8 | ipython >= 7.0
9 | ipykernel
10 | tqdm
11 | numpy
12 | seaborn
13 | torchsummary
14 |
--------------------------------------------------------------------------------
/slides/DL_GANs.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_GANs.pdf
--------------------------------------------------------------------------------
/slides/DL_adversarial_examples.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_adversarial_examples.pdf
--------------------------------------------------------------------------------
/slides/DL_attention_networks.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_attention_networks.pdf
--------------------------------------------------------------------------------
/slides/DL_audio_adversarial.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_audio_adversarial.pdf
--------------------------------------------------------------------------------
/slides/DL_convolutional_nets.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_convolutional_nets.pdf
--------------------------------------------------------------------------------
/slides/DL_deep_reinforcement_learning.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_deep_reinforcement_learning.pdf
--------------------------------------------------------------------------------
/slides/DL_gradient_descent.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_gradient_descent.pdf
--------------------------------------------------------------------------------
/slides/DL_lightning_and_tensorboard.ipynb:
--------------------------------------------------------------------------------
1 | {"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"DL Lightning and TensorBoard.ipynb","provenance":[],"private_outputs":true,"collapsed_sections":["Exnwbnlyhk-o","3fMHyUaGhj-p"],"toc_visible":true,"authorship_tag":"ABX9TyOkNZDLi9ZC8C2jyL6SwjWI"},"kernelspec":{"name":"python3","display_name":"Python 3"},"accelerator":"GPU"},"cells":[{"cell_type":"markdown","metadata":{"id":"fXGJ_v2xEsfd"},"source":["# Basics of pytorch_lightning, dataloaders, and tensorboard\n","\n","This is a basic walkthrough of building, training, and using a simple neural network in [PyTorch](https://pytorch.org/) using the [PyTorch Lightning](https://pytorch-lightning.readthedocs.io/en/latest/) package to let us make better-organized code, and displaying information about training in [TensorBoard](https://www.tensorflow.org/tensorboard). We'll load the classic [MNIST dataset](http://yann.lecun.com/exdb/mnist/) from the [torchvision.datasets](https://pytorch.org/docs/stable/torchvision/datasets.html) library of image datasets and feed the network data using a [dataloader](https://pytorch.org/docs/stable/data.html), which is a standard way of handling data for a PyTorch model.\n","\n"]},{"cell_type":"markdown","metadata":{"id":"Exnwbnlyhk-o"},"source":["## Install and import the needed packages"]},{"cell_type":"markdown","metadata":{"id":"wjbkm66hEngE"},"source":["### Install the needed packages."]},{"cell_type":"code","metadata":{"id":"SDsIUoGDKxNy"},"source":["!pip install torch \n","!pip install torchvision\n","!pip install pytorch_lightning\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"LAPyUo7jK22C"},"source":["### Now that the packages are installed, import them to this project."]},{"cell_type":"code","metadata":{"id":"9lEdGUS8EjGM"},"source":["# -------------------------------------------------\n","# Pytorch \n","# -------------------------------------------------\n","\n","# This is the main torch package\n","import torch \n","#Computer vision specific package \n","import torchvision\n","#There are a bunch of standard datasets in torchvision. \n","import torchvision.datasets as datasets\n","\n","# -------------------------------------------------\n","# Pytorch Lightning \n","# -------------------------------------------------\n","\n","import pytorch_lightning as pl\n","# this gives us the hooks to connect to TensorBoard\n","import pytorch_lightning.loggers as pl_loggers\n","\n","# -------------------------------------------------\n","# Stuff to show the data using matplotlib\n","# -------------------------------------------------\n","#import random\n","#import numpy as np\n","#import matplotlib\n","#import matplotlib.pyplot as plt\n","# this magic command lets me show plots in the notebook\n","#%matplotlib inline\n","\n","# -------------------------------------------------\n","# Stuff for timestamping \n","# -------------------------------------------------\n","from time import process_time \n","import datetime\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"3fMHyUaGhj-p"},"source":["## Define the [LightningDataModule](https://pytorch-lightning.readthedocs.io/en/latest/datamodules.html) to prepares the data for use by the network."]},{"cell_type":"markdown","metadata":{"id":"oQy1M_0Zy-eM"},"source":["A datamodule is a shareable, reusable class that encapsulates all the steps needed to process data. \n","\n","1. Download / tokenize / process.\n","1. Clean up data and (maybe) save to disk.\n","1. 
Load data into a [Dataset](https://pytorch.org/docs/stable/data.html).\n","1. Apply [Transforms](https://pytorch.org/docs/stable/torchvision/transforms.html) to the data (rotate, tokenize, etc…).\n","1. Wrap inside a [DataLoader](https://pytorch.org/docs/stable/data.html)."]},{"cell_type":"code","metadata":{"id":"_LTYf095fSq0"},"source":["class MyDataModule(pl.LightningDataModule):\n","\n"," def __init__(self, data_dir='./data/'):\n"," super().__init__()\n"," self.data_dir = data_dir\n","\n"," # These will be applied to every element in the dataset, to ensure they're \n"," # normalized and in the right format.\n"," self.transform = torchvision.transforms.Compose([\n"," torchvision.transforms.ToTensor(),\n"," torchvision.transforms.Normalize((0.5,), (0.5,))])\n"," \n"," # Our dataset is in the torchvision library of datasets. Here is where you'd\n"," # change the code to process a different dataset\n"," def setup(self, stage=None):\n"," self.mnist_test = datasets.MNIST(self.data_dir,train=False,download=True, transform=self.transform)\n"," mnist_full = datasets.MNIST(self.data_dir, train=True, download=True, transform=self.transform)\n"," self.mnist_train, self.mnist_val = torch.utils.data.random_split(mnist_full, [55000, 5000])\n","\n"," # Dataloaders are the things that handle creating batches of data and handing them\n"," # to the model. You determine whether to randomize data order and the size of the batch\n"," # when you declare the data loader\n"," def train_dataloader(self):\n"," return torch.utils.data.DataLoader(self.mnist_train, batch_size=64, shuffle=True)\n","\n"," def val_dataloader(self):\n"," return torch.utils.data.DataLoader(self.mnist_val, batch_size=64, shuffle=True)\n","\n"," def test_dataloader(self):\n"," return torch.utils.data.DataLoader(self.mnist_test, batch_size=1)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"NHEBDrqXAYw2"},"source":["## Define network architecture and test/train actions in a [PyTorchLightning](https://pytorch-lightning.readthedocs.io/en/latest/lightning_module.html) Module."]},{"cell_type":"markdown","metadata":{"id":"4vxVCyl317X9"},"source":["PyTorch Lightning builds on top of standard PyTorch. It is a way of organizing code to make it more modular and easier to handle. A LightningModule organizes PyTorch code into these sections:\n","\n","1. Network architecture (init)\n","1. Data-flow/computations (forward)\n","1. Train loop (training_step)\n","1. Validation loop (validation_step)\n","1. Test loop (test_step)\n","1. Optimizers (configure_optimizers)\n","\n","The first two of these (init and forward) are what you'd find in a typical [torch.nn.Module](https://pytorch.org/docs/stable/generated/torch.nn.Module.html). The LightningModule extends the torch Module to add the train/test/validation and optimizer definitions into the module.\n","\n","The code you'd normally write for a torch Module's training and testing loops are instead handled externally to your code by a [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html). 
The Trainer handles the event loop for training and testing the model, calling the methods in your LightningModule when it's time to train or test on a batch of data from a dataloader."]},{"cell_type":"code","metadata":{"id":"p_Qr7940ORwn"},"source":["\n","class MyLightningModule(pl.LightningModule):\n","\n"," # Define the model architecture\n"," def __init__(self):\n"," super(MyLightningModule, self).__init__()\n","\n"," # mnist images are (1, 28, 28) (channels, width, height) \n"," self.layer_1 = torch.nn.Linear(28 * 28, 64)\n"," self.layer_2 = torch.nn.Linear(64, 256)\n"," self.layer_3 = torch.nn.Linear(256, 10)\n","\n"," def forward(self, x):\n"," batch_size, channels, width, height = x.size()\n"," x = x.view(batch_size, -1)\n"," x = torch.relu(self.layer_1(x))\n"," x = torch.relu(self.layer_2(x))\n"," x = torch.log_softmax(self.layer_3(x), dim=1)\n"," return x\n","\n"," def configure_optimizers(self):\n"," optimizer = torch.optim.Adam(self.parameters(), lr=1e-3)\n"," return optimizer\n","\n"," def my_loss(self, y_hat,y):\n"," return torch.nn.functional.nll_loss(y_hat,y)\n","\n"," def training_step(self, train_batch, batch_idx):\n"," x, y = train_batch # Here x = data, y = labels\n"," output = self.forward(x)\n"," loss = self.my_loss(output, y)\n"," \n"," # Calculate the accuracy of the model on the batch of data\n"," y_hat = output.argmax(dim=1)\n"," accuracy = y_hat.eq(y).sum().item()/len(y)\n","\n"," # these two lines write the accurcay and loss to TensorBoard\n"," self.logger.experiment.add_scalar(\"Accuracy/Train\", accuracy)\n"," self.logger.experiment.add_scalar(\"Loss/Train\", loss)\n","\n"," return {\"loss\": loss} \n","\n"," def validation_step(self, val_batch, batch_idx):\n"," x, y = val_batch\n"," output = self.forward(x)\n"," loss = self.my_loss(output, y)\n"," self.logger.experiment.add_scalar(\"Loss/Val\", loss)\n"," return {\"loss\":loss}\n","\n"," def test_step(self, test_batch, batch_idx):\n"," x, y = test_batch\n"," output = self.forward(x)\n"," loss = self.my_loss(output, y)\n"," return {\"loss\":loss}\n","\n"," \n","\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"je7DSC35eJk_"},"source":["## Declare a [TensorBoard](https://www.tensorflow.org/tensorboard) logger. "]},{"cell_type":"markdown","metadata":{"id":"Ke2qOKOi4l4J"},"source":["Here, we're going to set up logging to [TensorBoard](https://www.tensorflow.org/tensorboard) (the most popular way of displaying data about training your deep net), both before and after traning. 
Here is a nice [tutorial on using TensorBoard with PyTorch Lightning](https://www.learnopencv.com/tensorboard-with-pytorch-lightning/).\n"]},{"cell_type":"code","metadata":{"id":"M-Pzj_WudiEY"},"source":["# To clear out TensorBoard and start totally fresh, you need to\n","# remove old logs by deleting them from the directory\n","!rm -rf ./lightning_logs/\n","\n","# This will help me time-stamp my logs \n","mytime = datetime.datetime.now().strftime(\"%I:%M%p on %B %d, %Y\")\n","\n","# define how to log information about training to tensorboard\n","tb_logger = pl_loggers.TensorBoardLogger('lightning_logs/','CPU',mytime)\n","\n","\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"2CCwnGievHDa"},"source":["## Declare the model and the data module"]},{"cell_type":"code","metadata":{"id":"OWKni8ePvEjo"},"source":["# load and format our data\n","data_module = MyDataModule()\n","# define our model and how it will train\n","model = MyLightningModule()\n","\n","# saving the weights of the model for later comparison\n","untrained_model = MyLightningModule()\n","untrained_model.load_state_dict(model.state_dict())\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"3pBHCxA2SO8l"},"source":["## Actually train the model"]},{"cell_type":"markdown","metadata":{"id":"IsJWiJyd7TLf"},"source":["PyTorch Lightning saves you from having to write a training loop because there is a [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html?highlight=Trainer) that handles calling the right DataLoader to hand a batch of training data to the Module defining the network model. The trainer is where you define the number of epochs to train for, as an example.\n","\n","Now that we've definde the network structure and dataflow in the PyTorch LigningModule, and we've defined how to load and format data in the LightningDataModule, we're ready to run the main training loop. This is where we declare the [Trainer](https://pytorch-lightning.readthedocs.io/en/latest/trainer.html?highlight=Trainer), which calls these other modules at the appropriate time.\n","\n","Note that logging to TensorBoard is happening due to some calls in the LightningModule that declared what happens in a train step and a validation step."]},{"cell_type":"code","metadata":{"id":"SAU5uu_O1PH7"},"source":["\n","# declare the traininer, which runs the training and validation loops automatically\n","trainer = pl.Trainer(logger=tb_logger, max_epochs = 2)\n","\n","# just to measure how long training takes\n","start_time = process_time() \n","\n","# OK. Run the training loop\n","trainer.fit(model, data_module)\n","\n","print(\"Elapsed time in seconds:\", process_time() -start_time) \n","\n","\n","\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"kvo2D3Q0yboM"},"source":["## Now put some more stuff on TensorBoard"]},{"cell_type":"markdown","metadata":{"id":"XjNWgL1yzWBM"},"source":["We logged training and valdation loss to TensorBoard due to the calls made in the LightningModule that defined our model. There are other things you can put on TensorBoard, including dataset images, network structure and histograms of model weights. 
We'll do that here."]},{"cell_type":"code","metadata":{"id":"YekjbpNTycB5"},"source":["# put a histogram of layer 1's after-training weights on TensorBoard\n","tb_logger.experiment.add_histogram('histogram of model.layer_1.weight after training', model.layer_1.weight)\n","\n","# Let's also do the before-training weights\n","tb_logger.experiment.add_histogram('histogram of model.layer_1.weight before training', untrained_model.layer_1.weight)\n","\n","# put an example batch of data on TensorBoard\n","data, target = next(iter(model.val_dataloader()))\n","show_this = torchvision.utils.make_grid(data)\n","tb_logger.experiment.add_image('validation images', show_this)\n","\n","#Putting the network structure on tensorboard\n","tb_logger.experiment.add_graph(model,data)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"CkHT8s9CKlTc"},"source":["## View how training went using TensorBoard. \n"]},{"cell_type":"markdown","metadata":{"id":"lhHZYYiE8foE"},"source":["Now we can start up [TensorBoard](https://www.tensorflow.org/tensorboard).TensorBoard provides visualization and tooling for machine learning experimentation. Let's view what we've logged about training accuracy/loss, our data, and the model structure.\n","\n","Here is a good basic [tutorial on using TensorBoard with Pytorch Lightning](https://www.learnopencv.com/tensorboard-with-pytorch-lightning/)"]},{"cell_type":"code","metadata":{"id":"IqgiTM2-I1MD"},"source":["%load_ext tensorboard\n","%tensorboard --logdir lightning_logs/"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"bLWe-SRgJluo"},"source":["## Check that we can access the GPU"]},{"cell_type":"markdown","metadata":{"id":"xpE2vpmVJse6"},"source":["Starting here, we're going to care about the GPU. Let's just check to make sure things are OK on that front. When you run the code below, you should see that CUDA is available = TRUE....if you want to do anything on the GPU. This won't affect CPU processing at all.\n","\n","Note...if you're running CoLab then, for this test to show that CUDA is available, you'll need to go to the menu bar above and select \"Runtime\", then \"Chang Runtime Type\", then select \"GPU\". Once you've done that, this test may still fail. At that point, try \"Runtime\", then \"Factory Reset Runtime\". Once you've done that, execute this notebook again, starting from the first cell.\n"]},{"cell_type":"code","metadata":{"id":"h72byGpMHjQE"},"source":["# This will show us details about the GPU\n","!nvidia-smi\n","\n","# This will tell us the torch version, in case there's an issue there (the latest version\n","# as of this writing is 1.6)\n","print(\"My version of PyTorch is: \", torch.__version__)\n","\n","# If this turns out to be \"true\", we're probably in good shape\n","print(\"CUDA is available = \", torch.cuda.is_available())\n"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"445nHrSmQZTU"},"source":["## Train the model again, but this time on the GPU instead of the CPU."]},{"cell_type":"markdown","metadata":{"id":"lsPGIZeROcr7"},"source":["Now...we do train a new model from scratch, but this time with the trainer set to gpus=1. That will tell it there is 1 GPU to use...and it will use that GPU. 
Here's the ONLY difference between this block of code and the previous block where we declared and ran a Trainer.\n","\n","Previous:\n","\n","\n","```\n","trainer = pl.Trainer(logger=tb_logger, max_epochs = 1, callbacks=[MyCallback()])\n","```\n","\n","\n","Runs on GPU:\n","\n","\n","\n","```\n","trainer = pl.Trainer(logger=tb_logger, max_epochs = 1, callbacks=[MyCallback()], gpus=[0])\n","```\n","\n","\n"]},{"cell_type":"code","metadata":{"id":"zHaU4vSl_8ht"},"source":["# This will help me time-stamp my new log and keep is separate from the old one \n","mytime = datetime.datetime.now().strftime(\"%I:%M%p on %B %d, %Y\")\n","# define how to log information about training to tensorboard\n","tb_logger = pl_loggers.TensorBoardLogger('lightning_logs/','GPU',mytime)\n","\n","# declare the traininer, which runs the training and validation loops automatically\n","trainer = pl.Trainer(logger=tb_logger, max_epochs = 2, gpus=1)\n","\n","# OK. Run the training loop\n","start_time = process_time() \n","\n","trainer.fit(model, data_module)\n","\n","print(\"Elapsed time in seconds:\", process_time() -start_time) "],"execution_count":null,"outputs":[]}]}
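The notebook's `training_step` writes metrics with `self.logger.experiment.add_scalar(...)` without passing a global step. Lightning's built-in `self.log(...)` is a common alternative that handles the step and epoch bookkeeping automatically; a sketch of the same `training_step` rewritten that way (a drop-in replacement inside `MyLightningModule`, assuming the notebook's `my_loss` helper) is:

```python
    # Sketch: same computation as the notebook's training_step, logged via self.log,
    # which records the current global step (and epoch averages) automatically.
    def training_step(self, train_batch, batch_idx):
        x, y = train_batch
        output = self.forward(x)
        loss = self.my_loss(output, y)
        accuracy = output.argmax(dim=1).eq(y).float().mean()

        self.log("train_loss", loss, on_step=True, on_epoch=True)
        self.log("train_acc", accuracy, on_step=True, on_epoch=True, prog_bar=True)
        return loss
```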
--------------------------------------------------------------------------------
/slides/DL_multilayer_perceptrons.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_multilayer_perceptrons.pdf
--------------------------------------------------------------------------------
/slides/DL_perceptrons.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_perceptrons.pdf
--------------------------------------------------------------------------------
/slides/DL_recurrent_nets.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_recurrent_nets.pdf
--------------------------------------------------------------------------------
/slides/DL_regularization.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_regularization.pdf
--------------------------------------------------------------------------------
/slides/DL_transformers.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_transformers.pdf
--------------------------------------------------------------------------------
/slides/DL_unsupervised_methods.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/DL_unsupervised_methods.pdf
--------------------------------------------------------------------------------
/slides/GM_deep_RL_2_policy_gradients.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/slides/GM_deep_RL_2_policy_gradients.pdf
--------------------------------------------------------------------------------
/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/utils/__init__.py
--------------------------------------------------------------------------------
/utils/adversarial_examples/__init__.py:
--------------------------------------------------------------------------------
1 | from utils.adversarial_examples.data import *
2 | from utils.adversarial_examples.plotting import *
3 | from utils.adversarial_examples.training import *
4 | from utils.adversarial_examples.models import *
5 | from utils.adversarial_examples.frequency_masking import *
6 |
--------------------------------------------------------------------------------
/utils/adversarial_examples/data.py:
--------------------------------------------------------------------------------
1 | import os
2 | import copy
3 | import librosa as li
4 | import seaborn as sns
5 | import matplotlib.pyplot as plt
6 |
7 | from tqdm import tqdm
8 |
9 | from pathlib import Path
10 |
11 | import torch
12 | import torch.nn as nn
13 | import torch.nn.functional as F
14 |
15 | import torch.optim as optim
16 | from torch.optim.lr_scheduler import StepLR
17 |
18 | from torchvision import datasets, transforms
19 | import IPython.display as ipd
20 |
21 |
22 | def load_mnist(train_batch_size: int = 64, test_batch_size: int = 1000):
23 | """
24 | Load MNIST dataset. MNIST classification code adapted from
25 | https://github.com/pytorch/examples/blob/master/mnist/main.py
26 | :return: train and test DataLoader objects
27 | """
28 |
29 | cuda_kwargs = {
30 | 'num_workers': 1,
31 | 'pin_memory': True,
32 | 'shuffle': True
33 | } if torch.cuda.is_available() else {}
34 |
35 | # format image data
36 | transform = transforms.Compose([
37 | transforms.ToTensor(),
38 | transforms.Normalize((0.1307,), (0.3081,))
39 | ])
40 |
41 | # download MNIST data
42 | train_data = datasets.MNIST(
43 | './data',
44 | train=True,
45 | download=True,
46 | transform=transform
47 | )
48 | test_data = datasets.MNIST(
49 | './data',
50 | train=False,
51 | transform=transform,
52 | )
53 |
54 | # load MNIST data
55 | train_loader = torch.utils.data.DataLoader(
56 | train_data,
57 | batch_size=train_batch_size,
58 | **cuda_kwargs
59 | )
60 | test_loader = torch.utils.data.DataLoader(
61 | test_data,
62 | batch_size=test_batch_size,
63 | **cuda_kwargs
64 | )
65 |
66 | return train_loader, test_loader
67 |
68 |
69 | def load_audiomnist(data_dir, train_batch_size: int = 64, test_batch_size: int = 128):
70 |
71 | audio_list = sorted(list(Path(data_dir).rglob(f'*.wav')))
72 | cache_list = sorted(list(Path(data_dir).rglob('*.pt'))) # check for cached dataset
73 |
74 | if len(cache_list) > 0:
75 | tx = torch.load(os.path.join(data_dir, 'audiomnist_tx.pt'))
76 | ty = torch.load(os.path.join(data_dir, 'audiomnist_ty.pt'))
77 |
78 | else:
79 | tx = torch.zeros((len(audio_list), 1, 16000))
80 | ty = torch.zeros(len(audio_list), dtype=torch.long)
81 |
82 | pbar = tqdm(audio_list, total=len(audio_list))
83 |
84 | for i, audio_fn in enumerate(pbar):
85 | pbar.set_description(
86 | f'Loading AudioMNIST ({os.path.basename(audio_fn)})')
87 | waveform, _ = li.load(audio_fn,
88 | mono=True,
89 | sr=16000,
90 | duration=1.0)
91 | waveform = torch.from_numpy(waveform)
92 |
93 | tx[i, :, :waveform.shape[-1]] = waveform
94 | ty[i] = int(os.path.basename(audio_fn).split("_")[0])
95 |
96 | torch.save(tx, os.path.join(data_dir, 'audiomnist_tx.pt'))
97 | torch.save(ty, os.path.join(data_dir, 'audiomnist_ty.pt'))
98 |
99 | # partition data
100 | tx_train, ty_train, tx_test, ty_test = [], [], [], []
101 | for i in range(10):
102 |
103 | idx = ty == i
104 | tx_i = tx[idx]
105 | ty_i = ty[idx]
106 |
107 | split = int(0.8 * len(tx_i))
108 |
109 | tx_train.append(tx_i[:split]), ty_train.append(ty_i[:split])
110 | tx_test.append(tx_i[split:]), ty_test.append(ty_i[split:])
111 |
112 | tx_train = torch.cat(tx_train, dim=0)
113 | ty_train = torch.cat(ty_train, dim=0)
114 | tx_test = torch.cat(tx_test, dim=0)
115 | ty_test = torch.cat(ty_test, dim=0)
116 |
117 | # create datasets
118 | train_data = torch.utils.data.TensorDataset(tx_train, ty_train)
119 |
120 | test_data = torch.utils.data.TensorDataset(tx_test, ty_test)
121 |
122 | # load data
123 | cuda_kwargs = {
124 | 'num_workers': 1,
125 | 'pin_memory': True,
126 | 'shuffle': True
127 | } if torch.cuda.is_available() else {}
128 |
129 | train_loader = torch.utils.data.DataLoader(
130 | train_data,
131 | batch_size=train_batch_size,
132 | **cuda_kwargs
133 | )
134 | test_loader = torch.utils.data.DataLoader(
135 | test_data,
136 | batch_size=test_batch_size,
137 | **cuda_kwargs
138 | )
139 |
140 | return train_loader, test_loader
141 |
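A minimal usage sketch for the two loaders above. Assumptions: the script runs from the repository root so that `utils` is importable, and `./data/audiomnist` is a hypothetical path to a local AudioMNIST download.

```python
from utils.adversarial_examples.data import load_mnist, load_audiomnist

# MNIST: downloaded into ./data on first use; returns train/test DataLoaders.
train_loader, test_loader = load_mnist(train_batch_size=64, test_batch_size=1000)
images, labels = next(iter(train_loader))    # images: (64, 1, 28, 28), labels: (64,)

# AudioMNIST: loads 1-second, 16 kHz waveforms and caches them as .pt tensors.
audio_train, audio_test = load_audiomnist('./data/audiomnist', train_batch_size=64)
waveforms, digits = next(iter(audio_train))  # waveforms: (64, 1, 16000), digits: (64,)
```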
--------------------------------------------------------------------------------
/utils/adversarial_examples/frequency_masking.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 | from tqdm import tqdm
5 |
6 | from typing import Optional, List, Union, Tuple, TYPE_CHECKING
7 |
8 |
9 | class FrequencyMaskingLoss(nn.Module):
10 | """
11 | Adapted from Adversarial Robustness Toolkit (ART) implementation of Qin et al.
12 | frequency-masking attack (ICML, 2019). See: https://bit.ly/3lmmNXn
13 | """
14 | def __init__(self,
15 | alpha: Union[float, torch.Tensor] = 1e-6,
16 | window_size: int = 512,
17 | hop_size: int = 128,
18 | sample_rate: int = 16000,
19 | pad: bool = True,
20 | normalize: str = None):
21 |
22 | super().__init__()
23 |
24 | self.alpha = alpha
25 | self.window_size = window_size
26 | self.hop_size = hop_size
27 |
28 | # full-overlap: hop size must divide window size
29 | if self.window_size % self.hop_size:
30 | raise ValueError(f"Full-overlap: hop size {self.hop_size} must "
31 | f"divide window size {self.window_size}")
32 |
33 | self.masker = PsychoacousticMasker(window_size, hop_size, sample_rate)
34 |
35 | self.pad = pad # pad audio to avoid boundary artifacts due to framing
36 |
37 | # normalize incoming audio to deal with loss scale-dependence
38 | if normalize not in [None, 'none', 'peak']:
39 | raise ValueError(f'Invalid normalization {normalize}')
40 | self.normalize = normalize
41 | self.peak = None
42 |
43 | # store reference masking thresholds and PSD maxima to avoid recomputing
44 | self.ref_wav = None
45 | self.ref_thresh = None
46 | self.ref_psd = None
47 |
48 | def _normalize(self, x: torch.Tensor):
49 | if self.normalize == "peak":
50 | return (1.0 / self.peak) * x * 0.95
51 | else:
52 | return x
53 |
54 | def _pad(self, x: torch.Tensor):
55 | pad_frames = self.window_size // self.hop_size - 1
56 | pad_len = pad_frames * self.hop_size
57 | return nn.functional.pad(x, (pad_len, pad_len))
58 |
59 | def _stabilized_threshold_and_psd_maximum(self, x_ref: torch.Tensor):
60 | """
61 | Return batch of stabilized masking thresholds and PSD maxima.
62 | :param x_ref: waveform reference inputs of shape (n_batch, ...)
63 | :return: tuple consisting of stabilized masking thresholds and PSD maxima
64 | """
65 |
66 | masking_threshold = []
67 | psd_maximum = []
68 |
69 | assert x_ref.ndim >= 2 # inputs must have batch dimension
70 |
71 | if self.pad: # apply padding to avoid boundary artifacts
72 | x_ref = self._pad(x_ref)
73 |
74 | pbar = tqdm(enumerate(x_ref), total=len(x_ref), desc="Computing masking thresholds")
75 | for _, x_i in pbar:
76 | mt, pm = self.masker.calculate_threshold_and_psd_maximum(x_i)
77 | masking_threshold.append(mt)
78 | psd_maximum.append(pm)
79 |
80 | # stabilize imperceptible loss by canceling out the "10*log" term in power spectral density maximum and
81 | # masking threshold
82 | masking_threshold_stabilized = 10 ** (torch.cat(masking_threshold, dim=0) * 0.1)
83 | psd_maximum_stabilized = 10 ** (torch.cat(psd_maximum, dim=0) * 0.1)
84 |
85 | return masking_threshold_stabilized, psd_maximum_stabilized
86 |
87 | def _masking_hinge_loss(
88 | self,
89 | perturbation: torch.Tensor,
90 | psd_maximum_stabilized: torch.Tensor,
91 | masking_threshold_stabilized: torch.Tensor
92 | ):
93 |
94 | n_batch = perturbation.shape[0]
95 |
96 | # calculate approximate power spectral density
97 | psd_perturbation = self._approximate_power_spectral_density(
98 | perturbation, psd_maximum_stabilized
99 | )
100 |
101 | # calculate hinge loss per input, averaged over frames
102 | loss = nn.functional.relu(
103 | psd_perturbation - masking_threshold_stabilized
104 | ).view(n_batch, -1).mean(-1)
105 |
106 | return loss
107 |
108 | def _approximate_power_spectral_density(
109 | self, perturbation: torch.Tensor, psd_maximum_stabilized: torch.Tensor
110 | ):
111 | """
112 | Approximate the power spectral density for a perturbation
113 | """
114 |
115 | n_batch = perturbation.shape[0]
116 |
117 | if self.pad: # pad to avoid boundary artifacts
118 | perturbation = self._pad(perturbation)
119 |
120 | # compute short-time Fourier transform (STFT)
121 | stft_matrix = torch.stft(
122 | perturbation.reshape(n_batch, -1),
123 | n_fft=self.window_size,
124 | hop_length=self.hop_size,
125 | win_length=self.window_size,
126 | center=False,
127 | return_complex=False,
128 | window=torch.hann_window(self.window_size).to(perturbation),
129 | ).to(perturbation)
130 |
131 | # compute power spectral density (PSD)
132 | # note: fixes implementation of Qin et al. by also considering the square root of gain_factor
133 | gain_factor = torch.sqrt(torch.as_tensor(8.0 / 3.0))
134 | psd_matrix = torch.sum(torch.square(gain_factor * stft_matrix / self.window_size), dim=-1)
135 |
136 | # approximate normalized psd: psd_matrix_approximated = 10^((96.0 - psd_matrix_max + psd_matrix)/10)
137 | psd_matrix_approximated = pow(10.0, 9.6) / psd_maximum_stabilized.reshape(-1, 1, 1) * psd_matrix
138 |
139 | # return PSD matrix such that shape is (batch_size, window_size // 2 + 1, frame_length)
140 | return psd_matrix_approximated
141 |
142 | def forward(self, x_adv: torch.Tensor, x_ref: torch.Tensor = None):
143 |
144 |         if x_ref is not None:
145 |             self.peak = torch.abs(x_ref).amax(dim=-1, keepdim=True) + 1e-12
146 |
147 | x_adv = self._normalize(x_adv)
148 |
149 | # use precomputed references if available
150 | if self.ref_wav is None:
151 | x_ref = self._normalize(x_ref)
152 | perturbation = x_adv - x_ref # assume additive waveform perturbation
153 | masking_threshold, psd_maximum = self._stabilized_threshold_and_psd_maximum(x_ref)
154 | else:
155 | perturbation = x_adv - self.ref_wav
156 | masking_threshold, psd_maximum = self.ref_thresh, self.ref_psd
157 |
158 | loss = self._masking_hinge_loss( # do not reduce across batch dimension
159 | perturbation, psd_maximum, masking_threshold
160 | )
161 |
162 | # scale loss
163 | scaled_loss = self.alpha * loss
164 |
165 | return scaled_loss
166 |
167 | def set_reference(self, x_ref: torch.Tensor):
168 | """
169 | Compute and store masking thresholds and PSD maxima for reference inputs
170 | :param x_ref: waveform inputs of shape (n_batch, ...)
171 | """
172 |
173 | self.peak = torch.max(
174 | torch.abs(x_ref) + 1e-12, dim=-1, keepdim=True)[0]
175 |
176 | self.ref_wav = self._normalize(x_ref.clone().detach())
177 | self.ref_thresh, self.ref_psd = self._stabilized_threshold_and_psd_maximum(self.ref_wav)
178 |
179 | # do not track gradients for stored references
180 | self.ref_wav.requires_grad = False
181 | self.ref_thresh.requires_grad = False
182 | self.ref_psd.requires_grad = False
183 |
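# A minimal usage sketch of FrequencyMaskingLoss (illustrative only; relies on the
# `torch` import at the top of this file). The random tensors below are stand-ins
# for real 1-second, 16 kHz clean and adversarial waveform batches.
def _example_frequency_masking_usage():
    x_ref = 0.1 * torch.randn(2, 1, 16000)            # stand-in clean reference batch
    x_adv = x_ref + 1e-3 * torch.randn_like(x_ref)    # stand-in perturbed batch
    loss_fn = FrequencyMaskingLoss(alpha=1e-6, window_size=512, hop_size=128, sample_rate=16000)
    loss_fn.set_reference(x_ref)         # cache masking thresholds (forward can also take x_ref directly)
    per_example = loss_fn(x_adv, x_ref)  # hinge loss of perturbation PSD above the thresholds
    return per_example.mean()            # reduce over the batch before backpropagation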
184 |
185 | class PsychoacousticMasker:
186 | """
187 | Adapted from Adversarial Robustness Toolbox Imperceptible ASR attack. Implements
188 | psychoacoustic model of Lin and Abdulla (2015) following Qin et al. (2019) simplifications.
189 |
190 | | Repo link: https://github.com/Trusted-AI/adversarial-robustness-toolbox/
191 | | Paper link: Lin and Abdulla (2015), https://www.springer.com/gp/book/9783319079738
192 | | Paper link: Qin et al. (2019), http://proceedings.mlr.press/v97/qin19a.html
193 | """
194 |
195 | def __init__(self, window_size: int = 2048, hop_size: int = 512, sample_rate: int = 16000) -> None:
196 | """
197 | Initialization.
198 |
199 | :param window_size: Length of the window. The number of STFT rows is `(window_size // 2 + 1)`.
200 | :param hop_size: Number of audio samples between adjacent STFT columns.
201 | :param sample_rate: Sampling frequency of audio inputs.
202 | """
203 | self._window_size = window_size
204 | self._hop_size = hop_size
205 | self._sample_rate = sample_rate
206 |
207 | # init some private properties for lazy loading
208 | self._fft_frequencies = None
209 | self._bark = None
210 | self._absolute_threshold_hearing = None
211 |
212 | def calculate_threshold_and_psd_maximum(self,
213 | audio: torch.Tensor
214 | ) -> Tuple[torch.Tensor, torch.Tensor]:
215 | """
216 | Compute the global masking threshold for an audio input and also return
217 | its maximum power spectral density. This is the main method to call in
218 | order to obtain global masking thresholds for an audio input. It also
219 | returns the maximum power spectral density (PSD) for each frame. Given
220 | an audio input, the following steps are performed:
221 |
222 | 1. STFT analysis and sound pressure level normalization
223 | 2. Identification and filtering of maskers
224 | 3. Calculation of individual masking thresholds
225 | 4. Calculation of global masking thresholds
226 |
227 | :param audio: Audio samples of shape `(length,)`.
228 | :return: Global masking thresholds of shape
229 | `(window_size // 2 + 1, frame_length)` and the PSD maximum for
230 | each frame of shape `(frame_length)`.
231 | """
232 |
233 | assert audio.ndim <= 1 or audio.shape[0] == 1 # process a single waveform
234 |
235 | # compute normalized PSD estimate frame-by-frame for each input, as well
236 | # as maximum of each input's unnormalized PSD
237 | psd_matrix, psd_max = self.power_spectral_density(audio)
238 | threshold = torch.zeros_like(psd_matrix)
239 |
240 | # compute masking frequencies frame-by-frame for each input
241 | for frame in range(psd_matrix.shape[-1]):
242 | # apply methods for finding and filtering maskers
243 | maskers, masker_idx = self.filter_maskers(*self.find_maskers(psd_matrix[..., frame]))
244 |
245 | # apply methods for calculating global threshold
246 | threshold[..., frame] = self.calculate_global_threshold(
247 | self.calculate_individual_threshold(maskers, masker_idx)
248 | )
249 |
250 | return threshold, psd_max
251 |
252 | def power_spectral_density(self, audio: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
253 | """
254 | Compute the power spectral density matrix for an audio input.
255 |
256 | :param audio: audio inputs of shape `(signal_len,)`.
257 | :return: PSD matrix of shape `(window_size // 2 + 1, frame_length)` and maximum vector of shape
258 | `(n_batch, frame_length)`.
259 | """
260 |
261 | # compute short-time Fourier transform (STFT)
262 | stft_matrix = torch.stft(
263 | audio.reshape(1, -1),
264 | n_fft=self.window_size,
265 | hop_length=self.hop_size,
266 | win_length=self.window_size,
267 | center=False,
268 | return_complex=True,
269 | window=torch.hann_window(self.window_size).to(audio.device),
270 | ).to(audio.device)
271 |
272 | # compute power spectral density (PSD)
273 | # note: fixes implementation of Qin et al. by also considering the square root of gain_factor
274 | gain_factor = torch.sqrt(torch.as_tensor(8.0 / 3.0))
275 | psd_matrix = 20 * torch.log10(torch.abs(gain_factor * stft_matrix / self.window_size))
276 | psd_matrix = psd_matrix.clamp(min=-200)
277 |
278 | # normalize PSD at 96dB
279 | psd_matrix_max = torch.amax(psd_matrix, dim=[d for d in range(1, psd_matrix.ndim)], keepdim=True)
280 | psd_matrix_normalized = 96.0 - psd_matrix_max + psd_matrix
281 |
282 | return psd_matrix_normalized, psd_matrix_max
283 |
284 | @staticmethod
285 | def find_maskers(psd_vector: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
286 | """
287 | Identify maskers. Possible maskers are local PSD maxima. Following Qin et al.,
288 | all maskers are treated as tonal.
289 |
290 | :param psd_vector: PSD vector of shape `(window_size // 2 + 1)`.
291 | :return: Possible PSD maskers and indices.
292 | """
293 |
294 | # find all local maxima in single-frame PSD estimate
295 | flat = psd_vector.reshape(-1)
296 | left = flat[1:-1] - flat[:-2]
297 | right = flat[1:-1] - flat[2:]
298 |
299 | ind = torch.where((left > 0) * (right > 0),
300 | torch.ones_like(left),
301 | torch.zeros_like(left))
302 | ind = torch.nn.functional.pad(ind, (1, 1), "constant", 0)
303 | masker_idx = torch.nonzero(ind, out=None).cpu().reshape(-1)
304 |
305 | # smooth maskers with their direct neighbors
306 | psd_maskers = 10 * torch.log10(
307 | torch.sum(
308 | torch.cat(
309 | [10 ** (psd_vector[..., masker_idx + i] / 10) for i in range(-1, 2)]
310 | ),
311 | dim=0
312 | )
313 | )
314 |
315 | return psd_maskers, masker_idx.to(psd_maskers.device)
316 |
317 | def filter_maskers(self,
318 | maskers: torch.Tensor,
319 | masker_idx: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
320 | """
321 | Filter maskers. First, discard all maskers that are below the absolute threshold
322 | of hearing. Second, reduce pairs of maskers that are within 0.5 bark distance of
323 | each other by keeping the larger masker.
324 |
325 | :param maskers: Masker PSD values.
326 | :param masker_idx: Masker indices.
327 | :return: Filtered PSD maskers and indices.
328 | """
329 | # filter on the absolute threshold of hearing
330 | # note: deviates from Qin et al. implementation by filtering first on ATH and only then on bark distance
331 | ath_condition = maskers > self.absolute_threshold_hearing.to(maskers)[masker_idx]
332 | masker_idx = masker_idx[ath_condition]
333 | maskers = maskers[ath_condition]
334 |
335 | # filter on the bark distance
336 | bark_condition = torch.ones(masker_idx.shape, dtype=torch.bool)
337 | i_prev = 0
338 | for i in range(1, len(masker_idx)):
339 | # find pairs of maskers that are within 0.5 bark distance of each other
340 |             if self.bark[masker_idx[i]] - self.bark[masker_idx[i_prev]] < 0.5:
341 | # discard the smaller masker
342 | i_todelete, i_prev = (i_prev, i_prev + 1) if maskers[i_prev] < maskers[i] else (i, i_prev)
343 | bark_condition[i_todelete] = False
344 | else:
345 | i_prev = i
346 | masker_idx = masker_idx[bark_condition]
347 | maskers = maskers[bark_condition]
348 |
349 | return maskers, masker_idx
350 |
351 | @property
352 | def window_size(self) -> int:
353 | """
354 | :return: Window size of the masker.
355 | """
356 | return self._window_size
357 |
358 | @property
359 | def hop_size(self) -> int:
360 | """
361 | :return: Hop size of the masker.
362 | """
363 | return self._hop_size
364 |
365 | @property
366 | def sample_rate(self) -> int:
367 | """
368 | :return: Sample rate of the masker.
369 | """
370 | return self._sample_rate
371 |
372 | @property
373 | def fft_frequencies(self) -> torch.Tensor:
374 | """
375 |         :return: Discrete Fourier transform sample frequencies.
376 | """
377 | if self._fft_frequencies is None:
378 | self._fft_frequencies = torch.linspace(0, self.sample_rate / 2, self.window_size // 2 + 1)
379 | return self._fft_frequencies
380 |
381 | @property
382 | def bark(self) -> torch.Tensor:
383 | """
384 |         :return: Bark scale for discrete Fourier transform sample frequencies.
385 | """
386 | if self._bark is None:
387 | self._bark = 13 * torch.arctan(0.00076 * self.fft_frequencies) + 3.5 * torch.arctan(
388 | torch.square(self.fft_frequencies / 7500.0)
389 | )
390 | return self._bark
391 |
392 | @property
393 | def absolute_threshold_hearing(self) -> torch.Tensor:
394 | """
395 |         :return: Absolute threshold of hearing (ATH) for discrete Fourier transform sample frequencies.
396 | """
397 | if self._absolute_threshold_hearing is None:
398 | # ATH applies only to frequency range 20Hz<=f<=20kHz
399 | # note: deviates from Qin et al. implementation by using the Hz range as valid domain
400 | valid_domain = torch.logical_and(20 <= self.fft_frequencies, self.fft_frequencies <= 2e4)
401 | freq = self.fft_frequencies[valid_domain] * 0.001
402 |
403 |             # outside the valid ATH domain, set values to -infinity
404 |             # note: this ensures that every candidate masker in bins outside this range (e.g. below 20Hz) passes the
405 |             # ATH filter. As a consequence, the global masking threshold formula always returns a finite value
406 | self._absolute_threshold_hearing = torch.ones(valid_domain.shape) * -float('inf')
407 |
408 | self._absolute_threshold_hearing[valid_domain] = (
409 | 3.64 * pow(freq, -0.8) - 6.5 * torch.exp(-0.6 * torch.square(freq - 3.3)) + 0.001 * pow(freq, 4) - 12
410 | )
411 | return self._absolute_threshold_hearing
412 |
413 | def calculate_individual_threshold(self,
414 | maskers: torch.Tensor,
415 | masker_idx: torch.Tensor) -> torch.Tensor:
416 | """
417 | Calculate individual masking threshold with frequency denoted at bark scale.
418 |
419 | :param maskers: Masker PSD values.
420 | :param masker_idx: Masker indices.
421 | :return: Individual threshold vector of shape `(window_size // 2 + 1)`.
422 | """
423 | delta_shift = -6.025 - 0.275 * self.bark
424 | threshold = torch.zeros(masker_idx.shape + self.bark.shape).to(maskers)
425 | # TODO reduce for loop
426 | for k, (masker_j, masker) in enumerate(zip(masker_idx, maskers)):
427 |
428 | # critical band rate of the masker
429 | z_j = self.bark[masker_j].to(maskers)
430 | # distance maskees to masker in bark
431 | delta_z = self.bark.to(maskers) - z_j
432 |
433 | # define two-slope spread function:
434 | # if delta_z <= 0, spread_function = 27*delta_z
435 |             # if delta_z > 0, spread_function = [-27 + 0.37*max(PSD_masker - 40, 0)]*delta_z
436 | spread_function = 27 * delta_z
437 | spread_function[delta_z > 0] = (-27 + 0.37 * max(masker - 40, 0)) * delta_z[delta_z > 0]
438 |
439 | # calculate threshold
440 | threshold[k, :] = masker + delta_shift[masker_j] + spread_function
441 |
442 | return threshold
443 |
444 | def calculate_global_threshold(self, individual_threshold):
445 | """
446 | Calculate global masking threshold.
447 |
448 | :param individual_threshold: Individual masking threshold vector.
449 | :return: Global threshold vector of shape `(window_size // 2 + 1)`.
450 | """
451 |         # note: deviates from the Qin et al. implementation, which keeps the summation un-logged for numerical
452 |         # stability of the stage-2 optimization; here we take the log and stabilize the optimization in the loss itself.
453 |
454 | return 10 * torch.log10(
455 | torch.sum(10 ** (individual_threshold / 10), dim=0) + 10 ** (self.absolute_threshold_hearing.to(individual_threshold) / 10)
456 | )
457 |
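A quick numerical sanity check on how `calculate_global_threshold` combines energies (an illustrative sketch, not part of the repository; the dB values below are made up):

    import torch

    # hypothetical numbers: two individual masker thresholds and one ATH value
    # for a single frequency bin, all in dB
    individual = torch.tensor([[40.0], [35.0]])   # shape (n_maskers, n_bins)
    ath = torch.tensor([20.0])                    # shape (n_bins,)

    # combine energies in the power domain, then convert back to dB,
    # mirroring calculate_global_threshold above
    global_threshold = 10 * torch.log10(
        torch.sum(10 ** (individual / 10), dim=0) + 10 ** (ath / 10)
    )
    print(global_threshold)   # ~41.2 dB: dominated by the 40 dB masker

Because the combination happens in the power domain, the largest masker dominates the result and the ATH contributes only a small correction here.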
--------------------------------------------------------------------------------
/utils/adversarial_examples/models/__init__.py:
--------------------------------------------------------------------------------
1 | from utils.adversarial_examples.models.mnist_classifier import *
2 | from utils.adversarial_examples.models.audiomnist_classifier import *
3 |
--------------------------------------------------------------------------------
/utils/adversarial_examples/models/audiomnist_classifier.py:
--------------------------------------------------------------------------------
1 | import os
2 | import torch
3 |
4 | from torch import nn
5 |
6 |
7 | class CNN(nn.Module):
8 | """Adaptation of AudioNet (arXiv:1807.03418)."""
9 |
10 | def __init__(self,
11 | input_dim=16000, # 1s @ 16kHz
12 | n_classes=10 # 10 digits
13 | ):
14 | super().__init__()
15 |
16 | self.conv1 = nn.Sequential(
17 | nn.Conv1d(1, 100, kernel_size=3, stride=1, padding=2),
18 | nn.BatchNorm1d(100),
19 | nn.ReLU(),
20 | nn.MaxPool1d(3, stride=2))
21 |
22 | self.conv2 = nn.Sequential(
23 | nn.Conv1d(100, 64, kernel_size=3, stride=1, padding=1),
24 | nn.BatchNorm1d(64),
25 | nn.ReLU(),
26 | nn.MaxPool1d(2, stride=2))
27 |
28 | self.conv3 = nn.Sequential(
29 | nn.Conv1d(64, 128, kernel_size=3, stride=1, padding=1),
30 | nn.BatchNorm1d(128),
31 | nn.ReLU(),
32 | nn.MaxPool1d(2, stride=2))
33 |
34 | self.conv4 = nn.Sequential(
35 | nn.Conv1d(128, 128, kernel_size=3, stride=1, padding=1),
36 | nn.BatchNorm1d(128),
37 | nn.ReLU(),
38 | nn.MaxPool1d(2, stride=2))
39 |
40 | self.conv5 = nn.Sequential(
41 | nn.Conv1d(128, 128, kernel_size=3, stride=1, padding=1),
42 | nn.BatchNorm1d(128),
43 | nn.ReLU(),
44 | nn.MaxPool1d(2, stride=2))
45 |
46 | self.conv6 = nn.Sequential(
47 | nn.Conv1d(128, 128, kernel_size=3, stride=1, padding=1),
48 | nn.BatchNorm1d(128),
49 | nn.ReLU(),
50 | nn.MaxPool1d(2, stride=2))
51 |
52 | self.conv7 = nn.Sequential(
53 | nn.Conv1d(128, 64, kernel_size=3, stride=1, padding=1),
54 | nn.BatchNorm1d(64),
55 | nn.ReLU(),
56 | nn.MaxPool1d(2, stride=2))
57 |
58 | self.conv8 = nn.Sequential(
59 | nn.Conv1d(64, 32, kernel_size=3, stride=1, padding=0),
60 | nn.BatchNorm1d(32),
61 | nn.ReLU(),
62 | nn.MaxPool1d(2, stride=2))
63 |
64 | # compute necessary dimensions of final linear layer
65 | conv_shape = self._compute_output_size(input_dim)
66 | self.fc = nn.Linear(conv_shape, n_classes)
67 |
68 | def _compute_output_size(self, input_dim):
69 | x = torch.zeros((1, 1, input_dim))
70 | with torch.no_grad():
71 | x = self.conv1(x)
72 | x = self.conv2(x)
73 | x = self.conv3(x)
74 | x = self.conv4(x)
75 | x = self.conv5(x)
76 | x = self.conv6(x)
77 | x = self.conv7(x)
78 | x = self.conv8(x)
79 | return x.numel()
80 |
81 | def forward(self, x):
82 |
83 | x = self.conv1(x)
84 | x = self.conv2(x)
85 | x = self.conv3(x)
86 | x = self.conv4(x)
87 | x = self.conv5(x)
88 | x = self.conv6(x)
89 | x = self.conv7(x)
90 | x = self.conv8(x)
91 |
92 | x = x.view(x.shape[0], -1)
93 | x = self.fc(x)
94 | return x
95 |
96 |
97 | class AudioNet(nn.Module):
98 | """
99 | Wrapper for AudioNet waveform convolutional model proposed in Becker et al.
100 | (https://arxiv.org/abs/1807.03418), with normalization preprocessing. Code
101 | adapted from Adversarial Robustness Toolbox (https://tinyurl.com/54sdatn3)
102 | """
103 |
104 | def __init__(self,
105 | input_dim: int = 16000,
106 | n_classes: int = 10,
107 | normalize: bool = True,
108 | ):
109 | super().__init__()
110 |
111 | self.normalize = normalize
112 |
113 | self.cnn = CNN(
114 | input_dim=input_dim,
115 | n_classes=n_classes
116 | )
117 |
118 | def load_weights(self, path: str):
119 | """
120 | Load weights from checkpoint file
121 | """
122 |
123 | # check if file exists
124 | if not path or not os.path.isfile(path):
125 | return
126 |
127 | try:
128 | self.cnn.load_state_dict(torch.load(path))
129 | except RuntimeError:
130 | self.load_state_dict(torch.load(path))
131 |
132 | def forward(self, x: torch.Tensor):
133 |
134 | if self.normalize:
135 | x = (1.0 / torch.max(
136 | torch.abs(x) + 1e-8, dim=-1, keepdim=True
137 | )[0]) * x * 0.95
138 |
139 | return self.cnn(x)
140 |
141 | @staticmethod
142 | def match_predict(y_pred: torch.Tensor, y_true: torch.Tensor):
143 | """
144 | Determine whether target pairs are equivalent
145 | """
146 | if y_pred.ndim >= 2 and y_pred.shape[-1] >= 2:
147 | y_pred = torch.argmax(y_pred, dim=-1)
148 | else:
149 | y_pred = torch.round(y_pred.to(torch.float32))
150 |
151 | if y_true.ndim >= 2 and y_true.shape[-1] >= 2:
152 | y_true = torch.argmax(y_true, dim=-1)
153 | else:
154 | y_true = torch.round(y_true.to(torch.float32))
155 |
156 | return y_pred == y_true
157 |
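A minimal usage sketch for `AudioNet` (not part of this file). It assumes the repository root is on the Python path and uses the pretrained checkpoint listed in the repository tree:

    import torch
    from utils.adversarial_examples.models import AudioNet

    model = AudioNet(input_dim=16000, n_classes=10)
    # checkpoint path is relative to the repo root; load_weights is a no-op if the file is missing
    model.load_weights("utils/adversarial_examples/pretrained/audiomnist_classifier.pt")
    model.eval()

    waveform = torch.randn(4, 1, 16000)   # batch of four 1-second clips at 16 kHz
    with torch.no_grad():
        logits = model(waveform)          # shape (4, 10), one score per spoken digit
    print(logits.argmax(dim=-1))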
--------------------------------------------------------------------------------
/utils/adversarial_examples/models/mnist_classifier.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 | import torch.nn.functional as F
4 |
5 |
6 | class MNISTClassifier(nn.Module):
7 | """
8 | A simple convolutional neural network for image classification.
9 | From https://github.com/pytorch/examples/blob/master/mnist/main.py
10 | """
11 |
12 | def __init__(self):
13 | super().__init__()
14 | self.conv1 = nn.Conv2d(1, 32, 3, 1)
15 | self.conv2 = nn.Conv2d(32, 64, 3, 1)
16 | self.dropout1 = nn.Dropout(0.25)
17 | self.dropout2 = nn.Dropout(0.5)
18 | self.fc1 = nn.Linear(9216, 128)
19 | self.fc2 = nn.Linear(128, 10)
20 |
21 | def forward(self, x):
22 | x = self.conv1(x)
23 | x = F.relu(x)
24 | x = self.conv2(x)
25 | x = F.relu(x)
26 | x = F.max_pool2d(x, 2)
27 | x = self.dropout1(x)
28 | x = torch.flatten(x, 1)
29 | x = self.fc1(x)
30 | x = F.relu(x)
31 | x = self.dropout2(x)
32 | x = self.fc2(x)
33 | output = F.log_softmax(x, dim=1)
34 | return output
35 |
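A minimal usage sketch (not part of this file), assuming the repository root is on the Python path:

    import torch
    from utils.adversarial_examples.models import MNISTClassifier

    model = MNISTClassifier()
    model.eval()

    images = torch.randn(8, 1, 28, 28)    # stand-in batch of 28x28 grayscale images
    with torch.no_grad():
        log_probs = model(images)         # log-softmax scores, shape (8, 10)
    print(log_probs.exp().sum(dim=-1))    # each row of probabilities sums to ~1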
--------------------------------------------------------------------------------
/utils/adversarial_examples/plotting.py:
--------------------------------------------------------------------------------
1 | import os
2 | import copy
3 | import librosa as li
4 | import seaborn as sns
5 | import matplotlib.pyplot as plt
6 | from typing import Union
7 |
8 | from tqdm import tqdm
9 |
10 | from pathlib import Path
11 |
12 | import torch
13 | import torch.nn as nn
14 | import torch.nn.functional as F
15 |
16 | import torch.optim as optim
17 | from torch.optim.lr_scheduler import StepLR
18 |
19 | from torchvision import datasets, transforms
20 | import IPython.display as ipd
21 |
22 |
23 | def play_audiomnist(x):
24 | return ipd.Audio(x.detach().cpu().numpy().flatten(), rate=16000) # load a NumPy array
25 |
26 |
27 | def plot_audiomnist(x: torch.Tensor, y: Union[int, torch.Tensor], model: nn.Module):
28 | """
29 | Given an audio waveform, ground-truth label, and a classification model:
30 | * plot audio
31 | * plot model's prediction for audio (vector of class scores)
32 |
33 | :param x: a tensor holding an audio waveform to classify and plot
34 | :param y: an integer or integer tensor holding the ground-truth class label
35 | :param model: a classification model for generating predictions
36 | """
37 |
38 | device = x.device # hold onto original device
39 |
40 | x = x.clone().detach().cpu()
41 | if isinstance(y, torch.Tensor):
42 | y = y.clone().detach().cpu().item()
43 |
44 | # use model to compute class scores and predicted label
45 | y_scores = torch.nn.functional.softmax(
46 | model(x.reshape(1, 1, 16000).to(device)), dim=-1
47 | ).detach().cpu()
48 | y_pred = y_scores.argmax()
49 |
50 | # initialize plot
51 | fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(6, 2.5))
52 | width = 0.5
53 | linewidth = 2.0
54 |
55 |     # waveform plot
56 | axs[0].plot(x.squeeze().numpy(), 'k-')
57 | axs[0].set_xlabel('Sample idx')
58 | axs[0].set_ylabel('Amplitude')
59 |
60 | # class scores plot
61 | axs[1].bar(
62 | list(range(0, 10)),
63 | y_scores.flatten().detach().cpu().numpy(),
64 | width,
65 | color='black',
66 | label='class scores',
67 | edgecolor='black',
68 | linewidth=linewidth
69 | )
70 |
71 | # formatting
72 | fig.suptitle(f"True Label: {y}, Predicted Label: {y_pred}", y=1.1)
73 | axs[1].grid(False)
74 | axs[1].spines['left'].set_linewidth(linewidth)
75 | axs[1].set_xlim(-1, 10)
76 | axs[1].tick_params(bottom=True, left=True)
77 | axs[1].set_yscale('log')
78 | axs[1].set_xticks(list(range(0, 10)))
79 | sns.despine(bottom=True)
80 | plt.tight_layout()
81 | plt.show()
82 |
83 |
84 | def plot_mnist(x: torch.Tensor, y: Union[int, torch.Tensor], model: nn.Module):
85 | """
86 | Given a grayscale image, ground-truth label, and a classification model:
87 | * plot image
88 | * plot model's prediction for image (vector of class scores)
89 |
90 | :param x: a tensor holding an image to classify and plot
91 | :param y: an integer or integer tensor holding the ground-truth class label
92 | :param model: a classification model for generating predictions
93 | """
94 |
95 | device = x.device # hold onto original device
96 |
97 | x = x.clone().detach().cpu()
98 | if isinstance(y, torch.Tensor):
99 |         y = y.clone().detach().cpu().item()
100 |
101 | # use model to compute class scores and predicted label
102 | y_scores = torch.nn.functional.softmax(
103 | model(x.reshape(1, 1, 28, 28).to(device)), dim=-1
104 | ).detach().cpu()
105 | y_pred = y_scores.argmax()
106 |
107 | # initialize plot
108 | fig, axs = plt.subplots(nrows=1, ncols=2, figsize=(6, 2.5))
109 | width = 0.5
110 | margin = 0.0025
111 | linewidth = 2.0
112 |
113 | # image plot
114 | axs[0].imshow(x.squeeze().numpy(), cmap='gray')
115 |
116 | # class scores plot
117 | axs[1].bar(
118 | list(range(0, 10)),
119 | y_scores.flatten().detach().cpu().numpy(),
120 | width,
121 | color='black',
122 | label='class scores',
123 | edgecolor='black',
124 | linewidth=linewidth
125 | )
126 |
127 | # formatting
128 | fig.suptitle(f"True Label: {y}, Predicted Label: {y_pred}", y=1.1)
129 | axs[1].grid(False)
130 | axs[1].spines['left'].set_linewidth(linewidth)
131 | axs[1].set_xlim(-1, 10)
132 | axs[1].tick_params(bottom=True, left=True)
133 | axs[1].set_yscale('log')
134 | axs[1].set_xticks(list(range(0, 10)))
135 | sns.despine(bottom=True)
136 | plt.tight_layout()
137 | plt.show()
138 |
139 |
140 |
141 |
142 |
143 |
144 |
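A usage sketch for these plotting helpers, intended for a notebook (not part of this file; the random tensor is only a stand-in for a real MNIST image):

    import torch
    from utils.adversarial_examples.models import MNISTClassifier
    from utils.adversarial_examples.plotting import plot_mnist

    model = MNISTClassifier().eval()
    x = torch.rand(1, 1, 28, 28)      # stand-in for an MNIST image with values in [0, 1]
    plot_mnist(x, y=3, model=model)   # left: the image; right: class scores on a log scale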
--------------------------------------------------------------------------------
/utils/adversarial_examples/pretrained/audiomnist_classifier.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/utils/adversarial_examples/pretrained/audiomnist_classifier.pt
--------------------------------------------------------------------------------
/utils/adversarial_examples/pretrained/mnist_classifier.pt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/interactiveaudiolab/course-deep-learning/42213ca66f0255baae8482a28300b10954ad45d1/utils/adversarial_examples/pretrained/mnist_classifier.pt
--------------------------------------------------------------------------------
/utils/adversarial_examples/training.py:
--------------------------------------------------------------------------------
1 | import os
2 | import copy
3 | import librosa as li
4 | import seaborn as sns
5 | import matplotlib.pyplot as plt
6 |
7 | from tqdm import tqdm
8 |
9 | from pathlib import Path
10 |
11 | import torch
12 | import torch.nn as nn
13 | import torch.nn.functional as F
14 |
15 | import torch.optim as optim
16 | from torch.optim.lr_scheduler import StepLR
17 |
18 | from torchvision import datasets, transforms
19 | import IPython.display as ipd
20 |
21 |
22 | def train_audiomnist(model,
23 | device,
24 | train_loader,
25 | test_loader,
26 | epochs: int = 14,
27 | save_model: bool = True
28 | ):
29 | optimizer = torch.optim.SGD(
30 | model.parameters(), lr=0.001, momentum=0.9
31 | )
32 | scheduler = torch.optim.lr_scheduler.StepLR(
33 | optimizer,
34 | step_size=1,
35 | gamma=1.0
36 | )
37 |
38 | criterion = torch.nn.CrossEntropyLoss()
39 |
40 | best_acc = 0.0
41 |
42 | for epoch in range(epochs):
43 |
44 | # track loss
45 | training_loss = 0.0
46 | validation_loss = 0
47 |
48 | # track accuracy
49 | correct = 0
50 | total = 0
51 |
52 | pbar = tqdm(train_loader, total=len(train_loader))
53 |
54 | model.train()
55 | for batch_idx, batch_data in enumerate(pbar):
56 |
57 | pbar.set_description(
58 | f'Epoch {epoch + 1}, batch {batch_idx + 1}/{len(train_loader)}')
59 |
60 | inputs, labels = batch_data
61 |
62 | inputs = inputs.to(device)
63 | labels = labels.to(device)
64 |
65 | # forward + backward + optimize
66 | optimizer.zero_grad()
67 | outputs = model(inputs)
68 |
69 | loss = criterion(outputs, labels)
70 | loss.backward()
71 | optimizer.step()
72 |
73 | # sum training loss
74 | training_loss += loss.item()
75 |
76 | model.eval()
77 | with torch.no_grad():
78 |
79 | pbar = tqdm(test_loader, total=len(test_loader))
80 | for batch_idx, batch_data in enumerate(pbar):
81 |
82 | pbar.set_description(
83 | f'Validation, batch {batch_idx + 1}/{len(test_loader)}')
84 |
85 | inputs, labels = batch_data
86 |
87 | inputs = inputs.to(device)
88 | labels = labels.to(device)
89 |
90 | outputs = model(inputs)
91 |
92 | loss = criterion(outputs, labels)
93 |
94 | # sum validation loss
95 | validation_loss += loss.item()
96 |
97 | # calculate validation accuracy
98 | preds = torch.max(outputs.data, 1)[1]
99 |
100 | total += labels.size(0)
101 | correct += (preds == labels).sum().item()
102 |
103 | # calculate final metrics
104 | validation_loss /= len(test_loader)
105 | training_loss /= len(train_loader)
106 | accuracy = 100 * correct / total
107 |
108 | # if best model thus far, save
109 | if accuracy > best_acc and save_model:
110 | print(f"New best accuracy: {accuracy}; saving model")
111 | best_model = copy.deepcopy(model.state_dict())
112 | best_acc = accuracy
113 | torch.save(
114 | best_model,
115 | "audionet.pt"
116 | )
117 |
118 | # update step-size scheduler
119 | scheduler.step()
120 |
121 |
122 | def test_audiomnist(model, device, test_loader):
123 |
124 | model.to(device)
125 | model.eval()
126 |
127 | # track accuracy
128 | correct = 0
129 | total = 0
130 |
131 | with torch.no_grad():
132 |
133 | pbar = tqdm(test_loader, total=len(test_loader))
134 | for batch_idx, batch_data in enumerate(pbar):
135 |
136 | pbar.set_description(
137 | f'Validation, batch {batch_idx + 1}/{len(test_loader)}')
138 |
139 | inputs, labels = batch_data
140 |
141 | inputs = inputs.to(device)
142 | labels = labels.to(device)
143 |
144 | outputs = model(inputs)
145 |
146 | # calculate validation accuracy
147 | preds = torch.max(outputs.data, 1)[1]
148 |
149 | total += labels.size(0)
150 | correct += (preds == labels).sum().item()
151 |
152 | accuracy = 100 * correct / total
153 |
154 | print(f"\nModel accuracy: {accuracy}")
155 |
156 |
157 | def train_mnist(
158 | model,
159 | device,
160 | train_loader,
161 | test_loader,
162 | epochs: int = 14,
163 | log_interval: int = 50,
164 | save_model: bool = True
165 | ):
166 | """
167 | Train a simple MNIST classifier. MNIST classification code adapted from
168 | https://github.com/pytorch/examples/blob/master/mnist/main.py
169 |
170 | :param model:
171 | :param device:
172 | :param train_loader:
173 | :param test_loader:
174 | :param epochs:
175 | :param log_interval:
176 | :param save_model:
177 |
178 | :return:
179 | """
180 |
181 | # configure optimization
182 | optimizer = optim.Adadelta(model.parameters(), lr=1.0)
183 | scheduler = StepLR(optimizer, step_size=1, gamma=0.7)
184 |
185 | for epoch in range(1, epochs + 1):
186 |
187 | # training step
188 | model.train() # training mode
189 | for batch_idx, (data, target) in enumerate(train_loader):
190 | data, target = data.to(device), target.to(device)
191 | optimizer.zero_grad()
192 | output = model(data)
193 | loss = F.nll_loss(output, target)
194 | loss.backward()
195 | optimizer.step()
196 | if batch_idx % log_interval == 0:
197 | print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
198 | epoch, batch_idx * len(data), len(train_loader.dataset),
199 | 100. * batch_idx / len(train_loader), loss.item()))
200 |
201 | # validation step
202 | model.eval() # evaluation mode
203 | test_loss = 0
204 | correct = 0
205 | with torch.no_grad():
206 | for data, target in test_loader:
207 | data, target = data.to(device), target.to(device)
208 | output = model(data)
209 | test_loss += F.nll_loss(output, target, reduction='sum').item() # sum up batch loss
210 | pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
211 | correct += pred.eq(target.view_as(pred)).sum().item()
212 |
213 | test_loss /= len(test_loader.dataset)
214 |
215 | print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
216 | test_loss, correct, len(test_loader.dataset),
217 | 100. * correct / len(test_loader.dataset)))
218 |
219 | scheduler.step()
220 |
221 | if save_model:
222 | torch.save(model.state_dict(), "../models/mnist_cnn.pt")
223 |
224 |
225 | def test_mnist(model, device, test_loader):
226 | """
227 | Evaluate a simple MNIST classifier. MNIST classification code adapted from
228 | https://github.com/pytorch/examples/blob/master/mnist/main.py
229 | """
230 |
231 | model.eval() # evaluation mode
232 | test_loss = 0
233 | correct = 0
234 | with torch.no_grad():
235 | for data, target in test_loader:
236 | data, target = data.to(device), target.to(device)
237 | output = model(data)
238 | pred = output.argmax(dim=1, keepdim=True) # get the index of the max log-probability
239 | correct += pred.eq(target.view_as(pred)).sum().item()
240 |
241 | test_loss /= len(test_loader.dataset)
242 |
243 | print('\nTest Accuracy: {}/{} ({:.0f}%)\n'.format(
244 | correct, len(test_loader.dataset),
245 | 100. * correct / len(test_loader.dataset)))
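A sketch of how these training helpers might be wired together (not part of this file). The MNIST normalization constants are the commonly used dataset statistics, not values taken from this repository:

    import torch
    from torchvision import datasets, transforms
    from utils.adversarial_examples.models import MNISTClassifier
    from utils.adversarial_examples.training import train_mnist, test_mnist

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize((0.1307,), (0.3081,)),   # standard MNIST mean/std
    ])
    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST("./data", train=True, download=True, transform=transform),
        batch_size=64, shuffle=True)
    test_loader = torch.utils.data.DataLoader(
        datasets.MNIST("./data", train=False, download=True, transform=transform),
        batch_size=1000)

    model = MNISTClassifier().to(device)
    train_mnist(model, device, train_loader, test_loader, epochs=1, save_model=False)
    test_mnist(model, device, test_loader)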
--------------------------------------------------------------------------------
/utils/data.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | from numpy import pi
3 |
4 |
5 | ################################################################################
6 | # Datasets
7 | ################################################################################
8 |
9 |
10 | def make_two_gaussians_data(
11 | examples_per_class: int,
12 | distance_between_means: float
13 | ):
14 | """
15 | Create a 2-dimensional set of points, where half the points are drawn from
16 | one Gaussian distribution and the other half are drawn from a different Gaussian
17 |
18 | PARAMETERS
19 | ----------
20 | examples_per_class An integer determining how much data we'll generate
21 |
22 |     distance_between_means  Offset added to each coordinate of the positive class,
23 |                             so the two class means are separated by sqrt(2) times this value.
23 |
24 | RETURNS
25 | -------
26 | data A numpy array of 2 columns (dimensions) and 2*examples_per_class rows
27 |
28 | labels A numpy vector with 2*examples_per_class, with a +1 or -1 in each
29 | element. The jth element is the label of the jth example"""
30 |
31 | mean = [0, 0]
32 | cov = [[1, 0], [0, 1]]
33 |
34 | # negative class -1
35 | negData = np.random.multivariate_normal(mean, cov, examples_per_class)
36 |
37 | # positive class 1
38 | posData = np.random.multivariate_normal(mean, cov, examples_per_class)
39 | posData += distance_between_means
40 |
41 | # make the labels
42 | negL = np.ones(examples_per_class) * -1
43 | posL = np.ones(examples_per_class)
44 |
45 | # wrap it up and ship it out!
46 | data = np.concatenate([posData, negData])
47 | labels = np.concatenate([posL, negL])
48 |
49 | # shuffle the data
50 | perm = np.random.permutation(len(labels))
51 | data = data[perm]
52 | labels = labels[perm]
53 |
54 | return data, labels
55 |
56 |
57 | def make_XOR_data(examples_per_class: int):
58 | """
59 | Create a 2-dimensional set of points in the XOR pattern. Things in the
60 | upper right and lower left quadrant are class 1. Things in the other two
61 | quadrants are class -1.
62 |
63 | PARAMETERS
64 | ----------
65 | examples_per_class An integer determining how much data we'll generate
66 |
67 | RETURNS
68 | -------
69 | data A numpy array of 2 columns (dimensions) and 2*examples_per_class rows
70 |
71 | labels A numpy vector with 2*examples_per_class, with a +1 or -1 in each
72 | element. The jth element is the label of the jth example"""
73 |
74 | mean = [0, 0]
75 | cov = [[1, 0], [0, 1]]
76 |
77 | # make a circular unit Gaussian and sample from it
78 | data = np.random.multivariate_normal(mean, cov, examples_per_class*2)
79 |
80 | x = data.T[0]
81 | y = data.T[1]
82 |
83 | labels = np.sign(np.multiply(x, y))
84 |
85 | # shuffle the data
86 | perm = np.random.permutation(len(labels))
87 | data = data[perm]
88 | labels = labels[perm]
89 |
90 | return data, labels
91 |
92 |
93 | def make_center_surround_data(
94 | examples_per_class: int,
95 | distance_from_origin: float
96 | ):
97 | """
98 | Create a 2-dimensional set of points, where half the points are drawn from
99 | one Gaussian centered on the origin and the other half form a ring around
100 | the first class
101 |
102 | PARAMETERS
103 | ----------
104 | examples_per_class An integer determining how much data we'll generate
105 |
106 | distance_from_origin All points from one of the Gaussians will have their
107 | coordinates updated to have their distance from the
108 | origin increased by this ammount. Should be
109 | non-negative.
110 |
111 | RETURNS
112 | -------
113 | data A numpy array of 2 columns (dimensions) and 2*examples_per_class rows
114 |
115 | labels A numpy vector with 2*examples_per_class, with a +1 or -1 in each
116 | element. The jth element is the label of the jth example"""
117 |
118 | mean = [0, 0]
119 | cov = [[1, 0], [0, 1]]
120 |
121 | # negative class -1
122 | negData = np.random.multivariate_normal(mean, cov, examples_per_class)
123 |
124 | # positive class 1
125 | posData = np.random.multivariate_normal(mean, cov, examples_per_class)
126 |
127 |     # now treat the positive class as having been drawn in (magnitude, phase)
128 |     # coordinates and shift the magnitude so the mean distance of the points
129 |     # from the origin is distance_from_origin, spreading the phase all the
130 |     # way around the circle
131 | magnitude = posData.T[0, :] + distance_from_origin
132 | phase = posData.T[1, :] * 2
133 |
134 | # now go back to cartesian coordinates
135 | x = magnitude * np.cos(phase)
136 | y = magnitude * np.sin(phase)
137 |
138 | # and stick it back in the array
139 | posData.T[0, :] = x
140 | posData.T[1, :] = y
141 |
142 | # wrap it up and return it.
143 | negL = np.ones(examples_per_class) * -1
144 | posL = np.ones(examples_per_class)
145 | data = np.concatenate([posData, negData])
146 | labels = np.concatenate([posL, negL])
147 |
148 | # shuffle the data
149 | perm = np.random.permutation(len(labels))
150 | data = data[perm]
151 | labels = labels[perm]
152 |
153 | return data, labels
154 |
155 |
156 | def make_spiral_data(examples_per_class):
157 | """
158 | Create a 2-dimensional set of points in two interwoven spirals. All elements
159 | in a single spiral share a label (either +1 or -1, depending on the spiral)
160 |
161 | PARAMETERS
162 | ----------
163 | examples_per_class An integer determining how much data we'll generate
164 |
165 | RETURNS
166 | -------
167 | data A numpy array of 2 columns (dimensions) and 2*examples_per_class rows
168 |
169 | labels A numpy vector with 2*examples_per_class, with a +1 or -1 in each
170 | element. The jth element is the label of the jth example"""
171 |
172 | theta = np.sqrt(np.random.rand(examples_per_class))*2*pi
173 |
174 | # make points in a spiral that have some randomness
175 | r_a = 2*theta + pi
176 | temp = np.array([np.cos(theta)*r_a, np.sin(theta)*r_a]).T
177 | negData = temp + np.random.randn(examples_per_class, 2)
178 |
179 | # make points in a spiral offset from the first one that have some randomness
180 | r_b = -2*theta - pi
181 | temp = np.array([np.cos(theta)*r_b, np.sin(theta)*r_b]).T
182 | posData = temp + np.random.randn(examples_per_class, 2)
183 |
184 | # give labels to the data
185 | negL = np.ones(examples_per_class) * -1
186 | posL = np.ones(examples_per_class)
187 |
188 | # return the data
189 | data = np.concatenate([posData, negData])
190 | labels = np.concatenate([posL, negL])
191 |
192 | # shuffle the data
193 | perm = np.random.permutation(len(labels))
194 | data = data[perm]
195 | labels = labels[perm]
196 |
197 | return data, labels
198 |
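A usage sketch for these generators (not part of this file), combined with the plotting helper in utils/plotting.py:

    from utils.data import make_spiral_data
    from utils.plotting import plot_data

    data, labels = make_spiral_data(examples_per_class=200)
    print(data.shape, labels.shape)   # (400, 2) (400,)
    plot_data(data, labels)           # +1 class as red triangles, -1 class as blue circles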
--------------------------------------------------------------------------------
/utils/gan/__init__.py:
--------------------------------------------------------------------------------
1 | from utils.gan.data import *
2 | from utils.gan.plotting import *
3 |
--------------------------------------------------------------------------------
/utils/gan/data.py:
--------------------------------------------------------------------------------
1 | import os
2 |
3 | from pathlib import Path
4 |
5 | import torch
6 | import torch.nn as nn
7 | import torch.nn.functional as F
8 |
9 | from torchvision import datasets, transforms
10 |
11 |
12 | def load_mnist(batch_size: int = 128):
13 | """
14 | Load MNIST dataset.
15 | :return: DataLoader object
16 | """
17 |
18 | cuda_kwargs = {
19 | 'num_workers': 1,
20 | 'pin_memory': True,
21 | 'shuffle': True
22 | } if torch.cuda.is_available() else {}
23 |
24 | # format image data, but do not normalize
25 | transform = transforms.Compose([
26 | transforms.ToTensor(),
27 | ])
28 |
29 | # download MNIST data
30 | data = datasets.MNIST(
31 | './data',
32 | train=True,
33 | download=True,
34 | transform=transform
35 | )
36 |
37 | # load MNIST data
38 | loader = torch.utils.data.DataLoader(
39 | data,
40 | batch_size=batch_size,
41 | **cuda_kwargs
42 | )
43 |
44 | return loader
45 |
46 |
47 | def load_fashionmnist(batch_size: int = 128):
48 | """
49 |     Load FashionMNIST dataset.
50 | :return: DataLoader object
51 | """
52 |
53 | cuda_kwargs = {
54 | 'num_workers': 1,
55 | 'pin_memory': True,
56 | 'shuffle': True
57 | } if torch.cuda.is_available() else {}
58 |
59 | # format image data, but do not normalize
60 | transform = transforms.Compose([
61 | transforms.ToTensor(),
62 | ])
63 |
64 |     # download FashionMNIST data
65 | data = datasets.FashionMNIST(
66 | './data',
67 | train=True,
68 | download=True,
69 | transform=transform
70 | )
71 |
72 |     # load FashionMNIST data
73 | loader = torch.utils.data.DataLoader(
74 | data,
75 | batch_size=batch_size,
76 | **cuda_kwargs
77 | )
78 |
79 | return loader
80 |
81 |
82 | def load_celeba(batch_size: int = 128):
83 | """
84 | Load CelebA dataset.
85 | :return: DataLoader object
86 | """
87 |
88 | cuda_kwargs = {
89 | 'num_workers': 1,
90 | 'pin_memory': True,
91 | 'shuffle': True
92 | } if torch.cuda.is_available() else {}
93 |
94 | # crop data to 3x64x64 images (channels/width/height)
95 | image_size = 64
96 |
97 | # format image data
98 | transform = transforms.Compose([
99 | transforms.Resize(image_size),
100 | transforms.CenterCrop(image_size),
101 | transforms.ToTensor(),
102 | transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))
103 | ])
104 |
105 |     # download CelebA data
106 | data = datasets.CelebA(
107 | './data',
108 | split='all',
109 | download=True,
110 | transform=transform
111 | )
112 |
113 |     # load CelebA data
114 | loader = torch.utils.data.DataLoader(
115 | data,
116 | batch_size=batch_size,
117 | **cuda_kwargs
118 | )
119 |
120 | return loader
121 |
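A usage sketch (not part of this file), assuming the repository root is on the Python path; the first call downloads MNIST into ./data:

    from utils.gan.data import load_mnist
    from utils.gan.plotting import make_grid

    loader = load_mnist(batch_size=128)
    images, _ = next(iter(loader))     # images: (128, 1, 28, 28), values in [0, 1]
    grid = make_grid(images, size=8)   # renders an 8x8 grid and returns it as an image tensor
    print(grid.shape)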
--------------------------------------------------------------------------------
/utils/gan/plotting.py:
--------------------------------------------------------------------------------
1 | import io
2 | import torch
3 | import torchvision.utils as vutils
4 | import matplotlib.pyplot as plt
5 | import numpy as np
6 | from PIL import Image
7 | from torchvision.transforms import ToTensor
8 |
9 |
10 | def make_grid(batch: torch.Tensor, size: int, title: str = "Training Images"):
11 | """
12 | Plot images in a grid with the specified dimensions.
13 | """
14 |
15 | # check that a square grid of the given size can be plotted
16 | assert batch.shape[0] >= size * size
17 |
18 | images = batch[:size * size, ...].detach().cpu()
19 |
20 | fig = plt.figure(figsize=(size, size))
21 | plt.axis("off")
22 | plt.title(title)
23 | plt.imshow(
24 | np.transpose(
25 | vutils.make_grid(images, padding=2, normalize=True),
26 | (1, 2, 0)
27 | )
28 | )
29 |
30 | # save plot to buffer
31 | buf = io.BytesIO()
32 | plt.savefig(buf, format="png")
33 | plt.close(fig)
34 | buf.seek(0)
35 | img = Image.open(buf)
36 |
37 | # return plot as image
38 | return ToTensor()(np.array(img))
39 |
40 |
41 | def make_loss_plot(max_epochs: int, loss_d: np.ndarray, loss_g: np.ndarray):
42 | """
43 | Plot discriminator and generator losses by epoch
44 | """
45 | fig = plt.figure()
46 | plt.xlim((0, max_epochs))
47 | plt.plot(range(0, max_epochs), loss_d, label='discriminator')
48 | plt.plot(range(0, max_epochs), loss_g, label='generator')
49 | plt.legend()
50 |
51 | # save plot to buffer
52 | buf = io.BytesIO()
53 | plt.savefig(buf, format="png")
54 | plt.close(fig)
55 | buf.seek(0)
56 | img = Image.open(buf)
57 |
58 | # return plot as image
59 | return ToTensor()(np.array(img))
60 |
61 |
62 | def make_score_plot(max_epochs: int, scores_real: np.ndarray, scores_fake: np.ndarray):
63 | """
64 | Plot discriminator scores for real and generated inputs
65 | """
66 | fig = plt.figure()
67 | plt.xlim((0, max_epochs))
68 | plt.plot(range(0, max_epochs), scores_real, label='real')
69 | plt.plot(range(0, max_epochs), scores_fake, label='generated')
70 | plt.legend()
71 |
72 | # save plot to buffer
73 | buf = io.BytesIO()
74 | plt.savefig(buf, format="png")
75 | plt.close(fig)
76 | buf.seek(0)
77 | img = Image.open(buf)
78 |
79 | # return plot as image
80 | return ToTensor()(np.array(img))
81 |
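A small sketch of how `make_loss_plot` might be used (not part of this file); the loss curves are made up for illustration:

    import numpy as np
    from utils.gan.plotting import make_loss_plot

    epochs = 10
    loss_d = np.linspace(1.0, 0.6, epochs)         # fabricated per-epoch discriminator losses
    loss_g = np.linspace(2.0, 1.1, epochs)         # fabricated per-epoch generator losses
    img = make_loss_plot(epochs, loss_d, loss_g)   # image tensor, e.g. for TensorBoard logging
    print(img.shape)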
--------------------------------------------------------------------------------
/utils/plotting.py:
--------------------------------------------------------------------------------
1 | import torch
2 | import torch.nn as nn
3 |
4 | import matplotlib
5 | import matplotlib.pyplot as plt
6 | import numpy as np
7 |
8 | ################################################################################
9 | # Some simple plotting utilities
10 | ################################################################################
11 |
12 |
13 | def plot_data(data: np.ndarray,
14 | labels: np.ndarray,
15 | ax: matplotlib.axes.Axes = None):
16 | """
17 | A helper function to plot our data sets
18 |
19 | PARAMETERS
20 | ----------
21 | data A numpy array of 2 columns (dimensions) and 2*examples_per_class rows
22 |
23 | labels A numpy vector with 2*examples_per_class, with a +1 or -1 in each
24 | element. The jth element is the label of the jth example
25 |
26 | ax An optional matplotlib axis object to plot to
27 | """
28 |
29 | # require shape (n, 2)
30 | assert data.ndim == 2
31 | assert data.shape[-1] == 2
32 |
33 | # plot the data
34 | pos_idx = np.where(labels == 1)
35 | neg_idx = np.where(labels == -1)
36 |
37 |     show = ax is None  # no axis supplied: plot with pyplot and show at the end
38 |     ax = plt if show else ax
39 | ax.plot(
40 | data.T[0, pos_idx],
41 | data.T[1, pos_idx],
42 | 'r^',
43 | label='positive'
44 | )
45 | ax.plot(
46 | data.T[0, neg_idx],
47 | data.T[1, neg_idx],
48 | 'bo',
49 | label='negative'
50 | )
51 | ax.axis('equal')
52 | handles, labels = plt.gca().get_legend_handles_labels()
53 | by_label = dict(zip(labels, handles))
54 | plt.legend(by_label.values(), by_label.keys(), loc="upper right")
55 |
56 |     if show:
57 |         plt.show()
58 |
59 |
60 | def plot_decision_surface(model=None,
61 | axis_limits=(-5, 5, -5, 5),
62 | ax: matplotlib.axes.Axes = None
63 | ):
64 | """
65 | Creates a grid of points, measures what a model would label each
66 | point as, and uses this data to draw a region for class +1 and a region for
67 | class -1.
68 |
69 | PARAMETERS
70 | ----------
71 | model A callable model that can take 2-d real-valued input and produce
72 | a +1 or -1 label for each data point.
73 |
74 | axis_limits An array-like object with 4 floats [lowest_horizontal, highest_horizontal,
75 | lowest_vertical, highest_vertical]. This sets the limits over which
76 | the decision surface will be caluclated and plotted.
77 |
78 | ax An optional matplotlib axis object to plot to
79 |
80 | RETURNS
81 | -------
82 | my_contour a matplotlib.contour.QuadContourSet with the contour
83 | """
84 |
85 | # Create a grid of points spanning the entire space displayed in the axis.
86 | # This will let us draw the decision boundary later
87 | xx, yy = np.meshgrid(np.arange(axis_limits[0], axis_limits[1], .05),
88 | np.arange(axis_limits[2], axis_limits[3], .05))
89 | data = np.concatenate([xx.reshape([1, -1]), yy.reshape([1, -1])]).T
90 |
91 | # Predict the class of each point in XGrid, using the classifier.
92 | # This shows our regions determined by the classifier
93 | if isinstance(model, nn.Module):
94 | with torch.no_grad():
95 | pl = model(torch.tensor(data).to(dtype=torch.float32))
96 | predicted_labels = np.sign(pl.numpy())
97 | else:
98 | predicted_labels = model(data)
99 |
100 | predicted_labels = predicted_labels.reshape(xx.shape)
101 |
102 | # Put the result into a color plot
103 |     show = ax is None  # no axis supplied: plot with pyplot and show at the end
104 |     ax = plt if show else ax
105 |
106 |     my_contour = ax.contourf(xx, yy, predicted_labels, cmap=plt.cm.Paired)
107 | ax.axis('equal')
108 | ax.axis('tight')
109 |
110 |     if show:
111 |         plt.show()
112 |     return my_contour
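A usage sketch combining these helpers with the data generators in utils/data.py (not part of this file); `xor_model` is a hypothetical stand-in for a trained classifier:

    import numpy as np
    import matplotlib.pyplot as plt
    from utils.data import make_XOR_data
    from utils.plotting import plot_data, plot_decision_surface

    def xor_model(points):
        # the ideal XOR rule: label = sign(x * y)
        return np.sign(points[:, 0] * points[:, 1])

    data, labels = make_XOR_data(examples_per_class=200)
    fig, ax = plt.subplots()
    plot_decision_surface(model=xor_model, axis_limits=(-4, 4, -4, 4), ax=ax)
    plot_data(data, labels, ax=ax)
    plt.show()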
--------------------------------------------------------------------------------