├── .gitattributes ├── .gitignore ├── README.md └── notebooks ├── 1-Pytorch-Introduction.ipynb ├── 2-Pytorch-Autograd.ipynb ├── 3-Pytorch-Optimizers.ipynb ├── 4-Pytorch-Modules.ipynb ├── 5-Pytorch-Dataloader.ipynb ├── 6-Pytorch-Alexnet-Example.ipynb ├── autograd-graph.png ├── dataloader.png ├── how-to-read-pytorch.png ├── ipynb_drop_output.py └── setup_notebooks.sh /.gitattributes: -------------------------------------------------------------------------------- 1 | *.ipynb filter=clean_ipynb 2 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | *.pyc 2 | /datasets 3 | /models 4 | /checkpoints 5 | /notebooks/datasets 6 | /notebooks/models 7 | /notebooks/checkpoints 8 | __pycache__ 9 | .DS_Store 10 | .__* 11 | .ipynb* 12 | .nfs* 13 | .*swp 14 | .idea 15 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | David's Tips on How to Read Pytorch 2 | =================================== 3 | 4 | ![Figure thumbnails](notebooks/how-to-read-pytorch.png) 5 | 6 | These five python notebooks are an illustrated introduction to core pytorch idioms. Click below to run them on Colab. 7 | 8 | 1. [Tensor arithmetic](https://colab.research.google.com/github/davidbau/how-to-read-pytorch/blob/master/notebooks/1-Pytorch-Introduction.ipynb): the notation for manipulating n-dimensional arrays of numbers on CPU or GPU. 9 | 2. [Autograd](https://colab.research.google.com/github/davidbau/how-to-read-pytorch/blob/master/notebooks//2-Pytorch-Autograd.ipynb): how to get derivatives of any scalar with respect to any tensor input. 10 | 3. 
[Optimization](https://colab.research.google.com/github/davidbau/how-to-read-pytorch/blob/master/notebooks//3-Pytorch-Optimizers.ipynb): ways to update tensor parameters to reduce any computed objective, using autograd gradients. 11 | 4. [Network modules](https://colab.research.google.com/github/davidbau/how-to-read-pytorch/blob/master/notebooks//4-Pytorch-Modules.ipynb): how pytorch represents neural networks for convenient composition, training, and saving. 12 | 5. [Datasets and Dataloaders](https://colab.research.google.com/github/davidbau/how-to-read-pytorch/blob/master/notebooks//5-Pytorch-Dataloader.ipynb): for efficient multithreaded prefetching of large streams of data. 13 | 14 | Pytorch is a numerical library that makes it very convenient to train deep networks on GPU hardware. It introduces a new programming vocabulary that takes a few steps beyond regular numerical python code. Although pytorch code can look simple and concrete, much of the subtlety of what happens is invisible, so when working with pytorch code it helps to thoroughly understand the runtime model. 15 | 16 | For example, consider this code: 17 | 18 | ``` 19 | torch.nn.functional.cross_entropy(model(images.cuda()), labels.cuda()).backward() 20 | optimizer.step() 21 | ``` 22 | 23 | It looks like it computes some function of `images` and `labels` without storing the answer. But actually the purpose of this code is to update some hidden parameters that are not explicit in this formula. These two lines move batches of image and label data from CPU to the GPU; run a neural network to make a prediction; construct a computation graph describing how the loss depends on the network parameters; annotate every network parameter with a gradient; and finally run one step of optimization to adjust every parameter of the model. During all this, the CPU does not see any of the actual answers. That is intentional for speed reasons.
All the numerical computation is done on the GPU asynchronously and kept there. 24 | 25 | The brevity of the code is what makes pytorch code fun to write. But it also reflects why pytorch can be so fast even though the python interpreter is so slow. Although the main python logic slogs along sequentially in a single very slow CPU thread, just a few python instructions can load a huge amount of work into the GPU. That means the program can keep the GPU busy churning through massive numerical computations, for the most part, without waiting for the python interpreter. 26 | 27 | It is worth understanding five idioms that work together to make this possible. The five notebooks in this directory are a quick overview of these idioms. 28 | 29 | The key ideas are illustrated with small, runnable, tweakable examples, and there are links to other reference material and resources. 30 | 31 | All the notebooks can be run on Google Colab, where GPUs can be used for free, or on your own local Jupyter notebook server. The examples should all work with python 3.6 or newer (the notebooks use f-strings) and pytorch 1.0 or newer. 32 | 33 | [Start with the first notebook here!](https://colab.research.google.com/github/davidbau/how-to-read-pytorch/blob/master/notebooks/1-Pytorch-Introduction.ipynb) 34 | 35 | --- *David Bau, July 2020* 36 | 37 | ([David](https://people.csail.mit.edu/davidbau/home/) is a PhD student at MIT and former Google engineer. His research pursues transparency in deep networks.) 
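The two-line training step quoted earlier in this README can be expanded to one step per line. Below is a minimal sketch that makes some assumptions. The tiny classifier, the SGD optimizer, and the random batch are made-up stand-ins for the README's `model`, `optimizer`, `images`, and `labels`. The loss is computed with `torch.nn.functional.cross_entropy`, and the sketch falls back to the CPU when no GPU is available. It also calls `optimizer.zero_grad()` first, as typical training loops do, because pytorch accumulates gradients across `backward()` calls.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the README's `model`, `optimizer`, `images`, `labels`:
# a tiny linear classifier and a random batch (not part of the original tutorial).
device = 'cuda' if torch.cuda.is_available() else 'cpu'
torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 8 * 8, 10)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
images = torch.randn(4, 3, 8, 8)     # a CPU batch of 4 tiny 3-channel "images"
labels = torch.randint(0, 10, (4,))  # one class index per image

# Keep a copy of the parameters so we can check that the step changed them.
before = [p.detach().clone() for p in model.parameters()]

optimizer.zero_grad()                              # clear gradients left from any previous step
logits = model(images.to(device))                  # copy inputs to the device and run the network
loss = F.cross_entropy(logits, labels.to(device))  # autograd records how loss depends on parameters
loss.backward()                                    # store d(loss)/d(p) in p.grad for every parameter
optimizer.step()                                   # adjust every parameter in place using p.grad

has_grads = all(p.grad is not None for p in model.parameters())
changed = any(not torch.equal(b, p.detach()) for b, p in zip(before, model.parameters()))
print(has_grads, changed)
```

The training lines themselves never copy a result back to the CPU. Reading a value does: calling `loss.item()` or running the `torch.equal` checks at the end forces python to wait for the queued GPU work to finish.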
38 | -------------------------------------------------------------------------------- /notebooks/1-Pytorch-Introduction.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "How to Read Pytorch\n", 8 | "===================\n", 9 | "\n", 10 | "These five python notebooks are an introduction to core pytorch idioms.\n", 11 | "\n", 12 | "Pytorch is a numerical library that makes it very convenient to train deep networks on GPU hardware. It introduces a new programming vocabulary that takes a few steps beyond regular numerical python code. Although pytorch code can look simple and concrete, much of the subtlety of what happens is invisible, so when working with pytorch code it helps to thoroughly understand the runtime model.\n", 13 | "\n", 14 | "For example, consider this code:\n", 15 | "\n", 16 | "```\n", 17 | "torch.nn.functional.cross_entropy(model(images.cuda()), labels.cuda()).backward()\n", 18 | "optimizer.step()\n", 19 | "```\n", 20 | "\n", 21 | "It looks like it computes some function of `images` and `labels` without storing the answer. But actually the purpose of this code is to update some hidden parameters that are not explicit in this formula. These two lines move batches of image and label data from CPU to the GPU; run a neural network to make a prediction; construct a computation graph describing how the loss depends on the network parameters; annotate every network parameter with a gradient; and finally run one step of optimization to adjust every parameter of the model. During all this, the CPU does not see any of the actual answers. That is intentional for speed reasons. All the numerical computation is done on the GPU asynchronously and kept there.\n", 22 | "\n", 23 | "The brevity of the code is what makes pytorch code fun to write.
But it also reflects why pytorch can be so fast even though the python interpreter is so slow. Although the main python logic slogs along sequentially in a single very slow CPU thread, just a few python instructions can load a huge amount of work into the GPU. That means the program can keep the GPU busy churning through massive numerical computations, for the most part, without waiting for the python interpreter.\n", 24 | "\n", 25 | "It is worth understanding five core idioms that work together to make this possible. This tutorial has five Colab notebooks, one for each topic:\n", 26 | "\n", 27 | " 1. GPU Tensor arithmetic ([this notebook on colab](https://colab.research.google.com/github/davidbau/how-to-read-pytorch/blob/master/notebooks/1-Pytorch-Introduction.ipynb)): the notation for manipulating n-dimensional arrays of numbers on CPU or GPU.\n", 28 | " 2. [Autograd](./2-Pytorch-Autograd.ipynb): how to build a tensor computation graph and use it to get derivatives of any scalar with respect to any input.\n", 29 | " 3. [Optimization](./3-Pytorch-Optimizers.ipynb): ways to update tensor parameters to reduce any computed objective, using autograd gradients.\n", 30 | " 4. [Network modules](./4-Pytorch-Modules.ipynb): how pytorch represents neural networks for convenient composition, training, and saving.\n", 31 | " 5. 
[Datasets and Dataloaders](./5-Pytorch-Dataloader.ipynb): for efficient multithreaded prefetching of large streams of data.\n", 32 | "\n", 33 | "The key ideas are illustrated with small, runnable, hackable examples, and there are links to other reference material and resources.\n", 34 | "\n", 35 | "All the notebooks can be run on Google Colab, where some GPU computation can be used for free, or they can be run on your own local Jupyter notebook server.\n", 36 | "\n", 37 | "The examples should all work with python 3.6 or newer (the code uses f-strings) and pytorch 1.0 or newer.\n", 38 | "\n", 39 | "The original [code on github can be found here](https://github.com/davidbau/how-to-read-pytorch).\n", 40 | "\n", 41 | "--- [*David Bau, July 2020*](http://davidbau.com/archives/2020/07/05/davids_tips_on_how_to_read_pytorch.html)" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "metadata": {}, 47 | "source": [ 48 | "Topic 1: pytorch Tensors\n", 49 | "===============\n", 50 | "\n", 51 | "The first big trick for doing math fast on a modern computer is to do giant array operations all at once.\n", 52 | "\n", 53 | "To facilitate this, pytorch provides a [`torch.Tensor`](https://pytorch.org/docs/stable/tensors.html) class that is a lookalike of the older python numerical library [`numpy.ndarray`](https://numpy.org/doc/1.18/reference/arrays.ndarray.html). Just like a numpy `ndarray`, the pytorch `Tensor` stores a d-dimensional array of numbers, where d can be zero or more, and where the contained numbers can be any of the usual selection of float or integer types. Pytorch is designed to feel just like numpy: almost all the numpy operations are also available on torch tensors. But if something is missing, torch tensors can be directly converted to and from numpy using `x.numpy()` and `torch.from_numpy(a)`.
So what is different and why did the pytorch authors bother to reimplement this whole library?\n", 54 | "\n", 55 | "**There are two things that pytorch Tensors have that numpy arrays lack:**\n", 56 | "\n", 57 | " 1. pytorch Tensors can live on either **GPU or CPU** (numpy is cpu-only).\n", 58 | " 2. pytorch can automatically track tensor computations to enable **automatic differentiation**.\n", 59 | "\n", 60 | "In the following sections on this page we talk about the basics of the Tensor API as well as point (1): how to work with GPU and CPU tensors. A discussion of (2) can be found in the next notebook, [2. Autograd](https://colab.research.google.com/github/davidbau/pytorch-tutorial/blob/master/notebooks/2-Pytorch-Autograd.ipynb).\n" 61 | ] 62 | }, 63 | { 64 | "cell_type": "markdown", 65 | "metadata": {}, 66 | "source": [ 67 | "Basic operations in the Tensor API\n", 68 | "----------------------------------\n", 69 | "\n", 70 | "Pytorch is not very different from numpy, although the pytorch API has more convenience methods such as `x.clamp(0).pow(2)` (supporting a chained method style, as is popular in Javascript libraries, so you don't need to say the verbose `numpy.power(numpy.clip(x, 0, None), 2)`). So code is often shorter in pytorch. A brief overview:\n", 71 | "\n", 72 | "**Elementwise operations.** Most tensor operations are simple (embarrassingly parallelizable) elementwise operations, where the same math is done on every element of the array: `x+y`, `x*y`, `x.abs()`, `x.pow(3)`, etc. Unlike Matlab, `*` is for element-wise multiplication, not matrix multiplication.\n", 73 | "\n", 74 | "**Copy semantics by default.** Almost all operations, including things like `x.sort()`, return a new copy of the tensor without overwriting the input tensors.
The exceptions are functions that end in an underscore, such as `x.mul_(2)`, which doubles the contents of `x` in place.\n", 75 | "\n", 76 | "**Common reduction operations.** There are some common operations such as `max`, `min`, `mean`, `sum` that reduce the array by one or more dimensions. In pytorch, you can specify which dimension you want to reduce by passing the argument `dim=n`.\n", 77 | "\n", 78 | "**Why does min return two things?** Note that `[data, indexes] = x.sort(dim=0)` and `[vals, indexes] = x.min(dim=0)` return the pair of both the answer and the index values, so you do not need to separately recompute `argsort` or `argmin` when you need to know where the min came from.\n", 79 | "\n", 80 | "**What about linear algebra?** It's there. `torch.mm(a,b)` is matrix multiplication, `torch.inverse(a)` inverts, `torch.linalg.eig(a)` gets eigenvalues (older releases spelled this `torch.eig`), etc.\n", 81 | "\n", 82 | "The other thing to know is that pytorch tends to be very fast, often faster than numpy even on CPU, because its implementation is aggressively parallelized behind the scenes. Pytorch is willing to use multiple threads in situations where numpy just uses one.\n", 83 | "\n", 84 | "See the [reference for Tensor methods](https://pytorch.org/docs/stable/tensors.html#torch.Tensor) for what comes built-in.
A simple demo of some vectors:\n", 85 | "\n", 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 1, 91 | "metadata": {}, 92 | "outputs": [ 93 | { 94 | "name": "stdout", 95 | "output_type": "stream", 96 | "text": [ 97 | "tensor([0.0000, 0.0500, 0.1000, 0.1500, 0.2000])\n" 98 | ] 99 | } 100 | ], 101 | "source": [ 102 | "import math, numpy, torch\n", 103 | "from matplotlib import pyplot as plt\n", 104 | "\n", 105 | "# Make a vector of 101 equally spaced numbers from 0 to 5.\n", 106 | "x = torch.linspace(0, 5, 101)\n", 107 | "\n", 108 | "# Print the first five things in x.\n", 109 | "print(x[:5])" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "### Exercise\n", 117 | "\n", 118 | "Print the last five things in x." 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 6, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "name": "stdout", 128 | "output_type": "stream", 129 | "text": [ 130 | "TODO\n" 131 | ] 132 | } 133 | ], 134 | "source": [ 135 | "# TODO: Print the last five things in x (instead of the first five)\n", 136 | "\n", 137 | "print('TODO')" 138 | ] 139 | }, 140 | { 141 | "cell_type": "code", 142 | "execution_count": 3, 143 | "metadata": {}, 144 | "outputs": [ 145 | { 146 | "name": "stdout", 147 | "output_type": "stream", 148 | "text": [ 149 | "The shape of x is torch.Size([101])\n", 150 | "The shape of y1=x.sin() is torch.Size([101])\n", 151 | "The shape of y2=x ** x.cos() is torch.Size([101])\n", 152 | "The shape of y3=y2 - y1 is torch.Size([101])\n", 153 | "The shape of y4=y3.min() is torch.Size([]), a zero-d scalar\n" 154 | ] 155 | }, 156 | { 157 | "data": { 158 | "image/png": "\n", 159 | "text/plain": [ 160 | "
" 161 | ] 162 | }, 163 | "metadata": { 164 | "needs_background": "light" 165 | }, 166 | "output_type": "display_data" 167 | } 168 | ], 169 | "source": [ 170 | "# Do some vector computations.\n", 171 | "y1, y2 = x.sin(), x ** x.cos()\n", 172 | "y3 = y2 - y1\n", 173 | "y4 = y3.min()\n", 174 | "\n", 175 | "# Print and plot some answers.\n", 176 | "print(f'The shape of x is {x.shape}')\n", 177 | "print(f'The shape of y1=x.sin() is {y1.shape}')\n", 178 | "print(f'The shape of y2=x ** x.cos() is {y2.shape}')\n", 179 | "print(f'The shape of y3=y2 - y1 is {y3.shape}')\n", 180 | "print(f'The shape of y4=y3.min() is {y4.shape}, a zero-d scalar')\n", 181 | "\n", 182 | "plt.plot(x, y1, 'red', x, y2, 'blue', x, y3, 'green')\n", 183 | "plt.axhline(y4, color='green', linestyle='--')\n", 184 | "plt.show()" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "### Exercise\n", 192 | "\n", 193 | "Plot y3 clamped between 0.0 and 1.0." 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 62, 199 | "metadata": {}, 200 | "outputs": [ 201 | { 202 | "data": { 203 | "image/png": "\n", 204 | "text/plain": [ 205 | "
" 206 | ] 207 | }, 208 | "metadata": { 209 | "needs_background": "light" 210 | }, 211 | "output_type": "display_data" 212 | } 213 | ], 214 | "source": [ 215 | "# TODO: Plot y3 clamped between 0.0 and 1.0.\n", 216 | "\n", 217 | "plt.plot('TODO')\n", 218 | "plt.show()\n" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "Subscripts and multiple dimensions\n", 226 | "----------------------------------\n", 227 | "\n", 228 | "Pytorch code is full of multidimensional arrays. The key to reading this kind of code is stopping to think about the careful, sometimes tangled, use of multiple array subscripts.\n", 229 | "\n", 230 | "**Slicing.** As usual in python, you can use `[min:max:stride]` to slice ranges, and multidimensional subscripts like `x[2,0,1,9]` work as you would expect (selecting entry 9 of entry 1 of entry 0 of entry 2 of `x`, counting from zero), and they can be used with slices like `x[0:3,2,:,:]`. The special slice `:` selects the whole range in that dimension.\n", 231 | "\n", 232 | "**Unsqueezing to add a dimension, and broadcasting.** While a single integer subscript like `x[0]` eliminates a dimension, the special subscript `x[None]` does the reverse and adds an extra dimension of size one.\n", 233 | "\n", 234 | "An extra dimension of size one is more useful than you might imagine, because pytorch (similar to numpy) can combine different-shaped arrays as long as, in every dimension where their shapes differ, one of the sizes is one: the singleton dimension is then **broadcast** (virtually repeated) to match the other. An example that uses broadcasting to calculate an outer product is illustrated below.\n", 235 | "\n", 236 | "**Fancy indexing.** Lots more can be done by passing numerical arrays or boolean array masks as subscripts. The reshuffling possibilities can get quite intricate; the rules are modeled on the capabilities in numpy.
For details see [Numpy fancy indexing](https://numpy.org/doc/stable/user/basics.indexing.html).\n", 237 | "\n", 238 | "Here is a demonstration of simple tensor indexing and broadcasting." 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 12, 244 | "metadata": {}, 245 | "outputs": [ 246 | { 247 | "name": "stdout", 248 | "output_type": "stream", 249 | "text": [ 250 | "m is tensor([[0.1353, 1.2838, 0.2440, 0.5774, 1.3416],\n", 251 | " [0.9628, 0.1760, 0.4458, 0.9256, 1.6327]]), and m[1,2] is 0.445751816034317\n", 252 | "\n", 253 | "column zero, m[:,0] is tensor([0.1353, 0.9628])\n", 254 | "row zero m[0,:] is tensor([0.1353, 1.2838, 0.2440, 0.5774, 1.3416])\n", 255 | "\n", 256 | "The dot product of rows (m[0,:] * m[1,:]).sum() is 3.1897406578063965\n", 257 | "\n", 258 | "The outer product of rows m[0,:][None,:] * m[1,:][:,None] is:\n", 259 | "tensor([[0.1302, 1.2361, 0.2349, 0.5560, 1.2917],\n", 260 | " [0.0238, 0.2259, 0.0429, 0.1016, 0.2361],\n", 261 | " [0.0603, 0.5723, 0.1088, 0.2574, 0.5980],\n", 262 | " [0.1252, 1.1883, 0.2258, 0.5345, 1.2417],\n", 263 | " [0.2208, 2.0961, 0.3983, 0.9428, 2.1904]])\n" 264 | ] 265 | }, 266 | { 267 | "data": { 268 | "image/png": "\n", 269 | "text/plain": [ 270 | "
" 271 | ] 272 | }, 273 | "metadata": { 274 | "needs_background": "light" 275 | }, 276 | "output_type": "display_data" 277 | } 278 | ], 279 | "source": [ 280 | "import torch\n", 281 | "from matplotlib import pyplot as plt\n", 282 | "\n", 283 | "# Make a 2x5 array of absolute values of normally distributed random numbers.\n", 284 | "m = torch.randn(2, 5).abs()\n", 285 | "print(f'm is {m}, and m[1,2] is {m[1,2]}\\n')\n", 286 | "print(f'column zero, m[:,0] is {m[:,0]}')\n", 287 | "print(f'row zero m[0,:] is {m[0,:]}\\n')\n", 288 | "dot_product = (m[0,:] * m[1,:]).sum()\n", 289 | "print(f'The dot product of rows (m[0,:] * m[1,:]).sum() is {dot_product}\\n')\n", 290 | "outer_product = m[0,:][None,:] * m[1,:][:,None]\n", 291 | "print(f'The outer product of rows m[0,:][None,:] * m[1,:][:,None] is:\\n{outer_product}')\n", 292 | "\n", 293 | "fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(5, 5), dpi=100)\n", 294 | "def color_mat(ax, m, title):\n", 295 | " ax.set_title(title)\n", 296 | " ax.imshow(m, cmap='hot', vmax=1.5, interpolation='nearest')\n", 297 | " ax.get_xaxis().set_ticks(range(m.shape[1]))\n", 298 | " ax.get_yaxis().set_ticks(range(m.shape[0]))\n", 299 | "color_mat(ax1, m, 'm[:,:]')\n", 300 | "color_mat(ax2, m[0,:][None,:], 'm[0,:][None,:]')\n", 301 | "color_mat(ax3, m[1,:][:,None], 'm[1,:][:,None]')\n", 302 | "color_mat(ax4, outer_product, 'm[0,:][None,:] * m[1,:][:,None]')\n", 303 | "fig.tight_layout()\n", 304 | "fig.show()" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": {}, 310 | "source": [ 311 | "### Exercise\n", 312 | "\n", 313 | "Use `torch.mm` to compute `outer_product` and `dot_product`.\n", 314 | "\n", 315 | "Explain to yourself why order matters when using torch.mm but not when using `*`.
" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 32, 321 | "metadata": {}, 322 | "outputs": [ 323 | { 324 | "name": "stdout", 325 | "output_type": "stream", 326 | "text": [ 327 | "False\n", 328 | "False\n" 329 | ] 330 | } 331 | ], 332 | "source": [ 333 | "# TODO Use torch.mm to compute outer_product and dot_product.\n", 334 | "\n", 335 | "outer = 'TODO'\n", 336 | "print(outer == outer_product)\n", 337 | "dot = 'TODO'\n", 338 | "print(dot == dot_product)" 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": {}, 344 | "source": [ 345 | "Devices and types\n", 346 | "-----------------\n", 347 | "\n", 348 | "One of the big reasons to use pytorch instead of numpy is that pytorch can do computations on the GPU. But because moving data on and off of a GPU device is more expensive than keeping it within the device, pytorch treats a Tensor's **computing device** as a pseudo-type that requires explicit declaration and explicit conversion. Here are some things to know about pytorch devices and types:\n", 349 | "\n", 350 | "**Single precision CPU default.** By default a torch tensor will be stored on the CPU and will store single-precision 32-bit `torch.float` values.\n", 351 | "\n", 352 | "**Specifying data type.** To store a different data type such as integers, use the argument `dtype=torch.long` when you create the Tensor. For example, `z = torch.zeros(10, dtype=torch.long)`. This is similar to numpy with minor differences. See the [Tensor reference](https://pytorch.org/docs/stable/tensors.html) for all the types.\n", 353 | "\n", 354 | "**Specifying GPU.** To store the tensor on the GPU, specify `device='cuda'` when you make it, for example `identity_matrix = torch.eye(5, device='cuda')`. (`device='cpu'` instead indicates the default CPU storage.)\n", 355 | "\n", 356 | "Even on a multi-GPU machine it is fine to pretend there is only one GPU.
Setting the environment variable `CUDA_VISIBLE_DEVICES=3` before you start the program will set up the process to see GPU\\#3 as the only visible GPU when it runs.\n", 357 | "\n", 358 | "As an aside, in principle you could instead target one of many GPUs with `device='cuda:3'`, but if you want to use multiple GPUs for the same computation your best bet is to use a multiprocess utility class that manages data distribution between forked processes automatically, while each python process touches only one GPU. When this becomes an issue, read the [DistributedDataParallel docs](https://pytorch.org/docs/stable/distributed.html).\n", 359 | "\n", 360 | "**Copying a tensor to a different device or type.** You cannot directly combine tensors that are on different devices (e.g., GPU vs CPU or different GPUs). Some different-data-type combinations are similarly prohibited (for example, `torch.mm` of a `float` matrix and a `double` matrix), although ordinary elementwise arithmetic in recent pytorch versions automatically promotes mixed types. In those cases you will need to convert types and move devices explicitly to make tensors compatible before combining them. The `x.to(y.device)` or `x.to(y.dtype)` method can be used to do the conversion.\n", 361 | "\n", 362 | "There are also commonly used convenience synonyms `x.cpu()`, `x.cuda()`, `x.float()`, `x.long()`, etc. for making a copy of `x` with the specified device or type. There is a bit of cost, so move data only when needed.\n", 363 | "\n", 364 | "**GPU rounding is nondeterministic.** Computationally the GPU is **not** perfectly equivalent to the CPU. To speed parallelization, the GPU does not do associative operations such as summations in a deterministic sequential order. Since changing the order of summations can alter rounding behavior in finite-precision arithmetic, GPU results can differ from CPU results and can even be nondeterministic. When the numerical algorithm is well-behaved, the difference should be small enough that you do not care, but you should know it is different.
You can see this gap in the code example below.\n", 365 | "\n", 366 | "**float is fastest.** All commodity GPU hardware is fast at single-precision 32-bit floating-point math, about 20x CPU speed. Be aware that only expensive cards are fast at 64-bit double-precision math. If you change `torch.float` in the example below to `torch.double` on an Nvidia Titan or consumer card without hardware double-precision support, you will slow down to just-slightly-faster-than-CPU speeds. Similarly, 16-bit `torch.half` or `torch.bfloat16` or other cool options will only be faster on newer hardware, and with these data types you need to take care that the reduced precision is not damaging your results.\n", 367 | "\n", 368 | "So `float` is the default and usually the best.\n", 369 | "\n", 370 | "Also note that some operations (like linear algebra) are floating-point only and cannot be done on integers.\n", 371 | "\n", 372 | "A CPU versus GPU speed comparison is below." 373 | ] 374 | }, 375 | { 376 | "cell_type": "code", 377 | "execution_count": 35, 378 | "metadata": {}, 379 | "outputs": [ 380 | { 381 | "name": "stdout", 382 | "output_type": "stream", 383 | "text": [ 384 | "time using the CPU alone: 1.39 seconds\n", 385 | "time using GPU, moving data from CPU: 0.135 seconds\n", 386 | "time using GPU on pinned CPU memory: 0.0728 seconds\n", 387 | "time using the GPU alone: 0.0174 seconds\n" 388 | ] 389 | }, 390 | { 391 | "data": { 392 | "image/png": "\n", 393 | "text/plain": [ 394 | "
" 395 | ] 396 | }, 397 | "metadata": { 398 | "needs_background": "light" 399 | }, 400 | "output_type": "display_data" 401 | }, 402 | { 403 | "name": "stdout", 404 | "output_type": "stream", 405 | "text": [ 406 | "Your GPU is 80.1x faster than CPU but only 10.3x if data is repeatedly copied from the CPU\n", 407 | "When copying from pinned memory, speedup is 19.1x\n", 408 | "Numerical differences between GPU and CPU: 0.0002938236575573683\n" 409 | ] 410 | } 411 | ], 412 | "source": [ 413 | "import torch, time\n", 414 | "from matplotlib import pyplot as plt\n", 415 | "\n", 416 | "# Here is a demonstration of moving data between GPU and CPU.\n", 417 | "# We multiply a batch of vectors through a big linear operation 10 times\n", 418 | "r = torch.randn(1024, 1024, dtype=torch.float)\n", 419 | "x = torch.randn(32768, 1024, dtype=r.dtype)\n", 420 | "iterations = 10\n", 421 | "\n", 422 | "def time_iterated_mm(x, matrix):\n", 423 | " start = time.time()\n", 424 | " result = 0\n", 425 | " for i in range(iterations):\n", 426 | " result += torch.mm(matrix, x.to(matrix.device).t())\n", 427 | " torch.cuda.synchronize()  # wait for queued GPU work before stopping the clock\n", 428 | " elapsed = time.time() - start\n", 429 | " return elapsed, result.cpu()\n", 430 | "\n", 431 | "cpu_time, cpu_result = time_iterated_mm(x.cpu(), r.cpu())\n", 432 | "print(f'time using the CPU alone: {cpu_time:.3g} seconds')\n", 433 | "\n", 434 | "mixed_time, mixed_result = time_iterated_mm(x.cpu(), r.cuda())\n", 435 | "print(f'time using GPU, moving data from CPU: {mixed_time:.3g} seconds')\n", 436 | "\n", 437 | "pinned_time, pinned_result = time_iterated_mm(x.cpu().pin_memory(), r.cuda())\n", 438 | "print(f'time using GPU on pinned CPU memory: {pinned_time:.3g} seconds')\n", 439 | "\n", 440 | "gpu_time, gpu_result = time_iterated_mm(x.cuda(), r.cuda())\n", 441 | "print(f'time using the GPU alone: {gpu_time:.3g} seconds')\n", 442 | "\n", 443 | "plt.figure(figsize=(4,2), dpi=150)\n", 444 | "plt.ylabel('iterations per sec')\n", 445 | "plt.bar(['cpu', 
'mixed', 'pinned', 'gpu'],\n", 446 | " [iterations/cpu_time,\n", 447 | " iterations/mixed_time,\n", 448 | " iterations/pinned_time,\n", 449 | " iterations/gpu_time])\n", 450 | "plt.show()\n", 451 | "\n", 452 | "print(f'Your GPU is {cpu_time / gpu_time:.3g}x faster than CPU'\n", 453 | " f' but only {cpu_time / mixed_time:.3g}x if data is repeatedly copied from the CPU')\n", 454 | "print(f'When copying from pinned memory, speedup is {cpu_time / pinned_time:.3g}x')\n", 455 | "print(f'Numerical differences between GPU and CPU: {(cpu_result - gpu_result).norm() / cpu_result.norm()}')" 456 | ] 457 | }, 458 | { 459 | "cell_type": "markdown", 460 | "metadata": {}, 461 | "source": [ 462 | "### Exercise\n", 463 | "\n", 464 | "Repeat the benchmark using type `torch.double`. What does that tell you about your GPU hardware?" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 38, 470 | "metadata": {}, 471 | "outputs": [], 472 | "source": [ 473 | "# TODO: Repeat the benchmark using type torch.double.\n", 474 | "r = 'TODO'\n", 475 | "x = 'TODO'\n", 476 | "\n", 477 | "# Benchmark and plot" 478 | ] 479 | }, 480 | { 481 | "cell_type": "markdown", 482 | "metadata": {}, 483 | "source": [ 484 | "Performance tips\n", 485 | "----------------\n", 486 | "\n", 487 | "**GPU operations are async.** When pytorch operates on GPU tensors, the python code does not wait for computations to complete. So GPU calculations get queued up, and they will be done as quickly as possible in the background while your python is free to work on other things like loading the next batch of training data.\n", 488 | "\n", 489 | "**Moving data to CPU waits for computations.** You do not need to worry about the GPU asynchrony, because as soon as you actually ask to look at the data, e.g., when you move GPU data to CPU (or print it or save it), pytorch will block and wait for the GPU operations to finish computing what you need before proceeding.
The call seen above to `torch.cuda.synchronize()` waits for all queued GPU work to finish without requesting the data; you will not need it unless you are doing performance timing.\n", 490 | "\n", 491 | "**Pinned memory transfers are faster, and can be async.** Copying data from CPU to GPU can be sped up if the CPU data is put in pinned memory (i.e., a page-locked block of RAM that cannot be swapped out), and such copies can run asynchronously when requested with `non_blocking=True`. Therefore when data loaders gather together lots of CPU data that is destined for the GPU, they should be configured to stream their results into pinned memory. See the performance comparison above." 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": {}, 497 | "source": [ 498 | "pytorch Tensor dimension-ordering conventions\n", 499 | "---------------------------------------------\n", 500 | "\n", 501 | "**Multidimensional data convention.** As soon as you have more than one dimension, you need to decide how to order the axes. To reduce confusion, most data processing follows the same global convention. 
In particular, much image-related data in pytorch is four-dimensional, and the dimensions are ordered like this: `data[batch_index, channel_index, y_position, x_position]`, that is:\n", 502 | "\n", 503 | "* Dimension 0 is used to index separate images within a batch.\n", 504 | "* Dimension 1 indexes channels within an image representation (e.g., 0,1,2 = R,G,B, or more indices for more channels).\n", 505 | "* Dimension 2 (if present) indexes the row position (y-value, starting from the top)\n", 506 | "* Dimension 3 (if present) indexes the column position (x-value, starting from the left)\n", 507 | "\n", 508 | "There is a way to remember this ordering: adjacent entries that vary only in the last dimensions are stored physically closer in RAM; since such entries (e.g., neighboring pixels) are often combined with each other, this helps with locality, whereas the first (batch) dimension usually just groups separate independent data points which are not combined much, so they do not need to be physically close.\n", 509 | "\n", 510 | "Stream-oriented data without grid geometry will drop the last dimensions, and 3d grid data will be 5-dimensional, adding a depth z before y. This same 4d-axis ordering convention (often called NCHW) is also used in caffe; tensorflow supports it as an option, but defaults to a channels-last (NHWC) ordering.\n", 511 | "\n", 512 | "Separate tensors can be put together into a single batch tensor using `torch.cat([a, b, c])` or `torch.stack([a, b, c])`. (The difference: `cat` doesn't add any new dimensions but just concatenates along the existing 0th dimension. 
`stack` adds a new 0th dimension for the batch.)\n", 513 | "\n", 514 | "**Multidimensional linear operation convention.** When storing matrix weights or convolution weights, linear algebra conventions are followed:\n", 515 | "* Dimension 0 (number of rows) matches the output channel dimension\n", 516 | "* Dimension 1 (number of columns) matches the input channel dimension\n", 517 | "* Dimension 2 (if present) is the convolutional kernel y-dimension\n", 518 | "* Dimension 3 (if present) is the convolutional kernel x-dimension\n", 519 | "\n", 520 | "Since this convention assumes channels are arranged in different rows whereas the data convention puts different batch items in different rows, some axis transposition is often needed before applying linear algebra to the data.\n", 521 | "\n", 522 | "**Permute and view reshape an array without moving memory.** The `permute` and `view` methods are useful for rearranging, flattening, and unflattening axes. For example, `x.permute(1,0,2,3)` swaps the batch and channel axes, and `x.view(x.shape[0], -1)` flattens each batch item into a single vector. They just alter the view of the block of numbers in memory without moving any of the numbers around, so they are fast.\n", 523 | "\n", 524 | "**Reshaping sometimes needs copying.** Some sequences of axis permutations and flattenings cannot be done without copying the data into the new order in memory. For example, `x.permute(1,0,2,3).view(x.shape[1], -1)` will generally raise an error, because after the permutation the axes to be flattened are no longer laid out contiguously. The `x.contiguous()` method copies the data into the natural order given by the current view; also `x.reshape()` is similar to `view` but will make a copy if necessary, so `x.permute(1,0,2,3).reshape(x.shape[1], -1)` works without you needing to think about it. 
See [the Tensor.view method documentation](https://pytorch.org/docs/master/tensors.html#torch.Tensor.view).\n" 525 | ] 526 | }, 527 | { 528 | "cell_type": "markdown", 529 | "metadata": {}, 530 | "source": [ 531 | "### Exercise\n", 532 | "\n", 533 | "Use `torch.randn` to create a four-dimensional tensor `x` of size (2,3,4,5), which could store two RGB images, each 4 pixels high and 5 pixels wide.\n", 534 | "\n", 535 | "Then print three things:\n", 536 | " * print `x`.\n", 537 | " * Use `x.permute` to switch the horizontal and vertical (last two) dimensions.\n", 538 | " * Use `x.view` to see each image as a flat vector of 60 numbers." 539 | ] 540 | }, 541 | { 542 | "cell_type": "code", 543 | "execution_count": 60, 544 | "metadata": {}, 545 | "outputs": [ 546 | { 547 | "name": "stdout", 548 | "output_type": "stream", 549 | "text": [ 550 | "TODO\n", 551 | "TODO\n", 552 | "TODO\n" 553 | ] 554 | } 555 | ], 556 | "source": [ 557 | "# TODO make x of size (2,3,4,5), and print three rearrangements of x\n", 558 | "x = 'TODO'\n", 559 | "print(x)\n", 560 | "print('TODO')\n", 561 | "print('TODO')" 562 | ] 563 | }, 564 | { 565 | "cell_type": "markdown", 566 | "metadata": {}, 567 | "source": [ 568 | "## Special topic: einsum notation\n", 569 | "\n", 570 | "Matrix multiplication can be generalized to tensors with an arbitrary number of dimensions, but keeping tensor dimensions straight can be confusing. The solution to this is [Einstein notation](https://en.wikipedia.org/wiki/Einstein_notation): assign letter variables to each axis of the input tensors, and then explicitly write down which axes end up in the output tensor. 
For example, an outer product might be written as `i, j -> ij`, whereas matrix multiplication could be `ij, jk -> ik`.\n", 571 | "\n", 572 | "Einstein notation is a topic of active development and programming language design: [here is a recent paper on the history and future of Einstein APIs.](https://openreview.net/pdf?id=oapKSVM2bcj)\n", 573 | "\n", 574 | "\n", 575 | "In pytorch, Einstein notation is available as `einsum`. Here is how ordinary matrix multiplication looks as an einsum:" 576 | ] 577 | }, 578 | { 579 | "cell_type": "code", 580 | "execution_count": 58, 581 | "metadata": {}, 582 | "outputs": [ 583 | { 584 | "name": "stdout", 585 | "output_type": "stream", 586 | "text": [ 587 | "tensor([[ 3.2591, -0.9139, 3.3531],\n", 588 | " [ 4.6914, -1.4011, 5.6399]])\n" 589 | ] 590 | } 591 | ], 592 | "source": [ 593 | "A = torch.randn(2,5)\n", 594 | "B = torch.randn(5,3)\n", 595 | "\n", 596 | "# Uncomment to see ordinary matrix multiplication\n", 597 | "# print(torch.mm(A, B))\n", 598 | "\n", 599 | "# Ordinary matrix multiplication written as an einsum\n", 600 | "print(torch.einsum('ij, jk -> ik', A, B))" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": {}, 606 | "source": [ 607 | "### Exercise\n", 608 | "\n", 609 | "Make A in the shape (5, 6, 2) and B in the shape (5, 6, 3); we can think of A as a 5x6 grid of 2-dimensional vectors and B as a 5x6 grid of 3-dimensional vectors.\n", 610 | "\n", 611 | "Un-normalized covariances of vectors in A and B could be computed by flattening and transposing the tensors into (2,30) and (30,3) matrices and then multiplying these matrices as follows:\n", 612 | "\n", 613 | "```\n", 614 | "print(torch.mm(A.reshape(30, 2).t(), B.reshape(30, 3)))\n", 615 | "```\n", 616 | "\n", 617 | "Instead use einsum to compute the same thing.\n" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": 56, 623 | "metadata": {}, 624 | "outputs": [ 625 | { 626 | "name": "stdout", 627 | 
"output_type": "stream", 628 | "text": [ 629 | "TODO\n" 630 | ] 631 | } 632 | ], 633 | "source": [ 634 | "# TODO: use einsum to compute a covariance statistic over vectors in A and B.\n", 635 | "A = torch.randn(5,6,2)\n", 636 | "B = torch.randn(5,6,3)\n", 637 | "\n", 638 | "\n", 639 | "print('TODO')" 640 | ] 641 | }, 642 | { 643 | "cell_type": "markdown", 644 | "metadata": {}, 645 | "source": [ 646 | "### [On to topic 2: Autograd →](2-Pytorch-Autograd.ipynb)" 647 | ] 648 | } 649 | ], 650 | "metadata": { 651 | "accelerator": "GPU", 652 | "kernelspec": { 653 | "display_name": "Python 3 (ipykernel)", 654 | "language": "python", 655 | "name": "python3" 656 | }, 657 | "language_info": { 658 | "codemirror_mode": { 659 | "name": "ipython", 660 | "version": 3 661 | }, 662 | "file_extension": ".py", 663 | "mimetype": "text/x-python", 664 | "name": "python", 665 | "nbconvert_exporter": "python", 666 | "pygments_lexer": "ipython3", 667 | "version": "3.9.9" 668 | } 669 | }, 670 | "nbformat": 4, 671 | "nbformat_minor": 4 672 | } 673 | -------------------------------------------------------------------------------- /notebooks/2-Pytorch-Autograd.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Topic 2: Pytorch Autograd\n", 8 | "================\n", 9 | "\n", 10 | "If you flag a torch Tensor with the attribute `x.requires_grad=True`, then pytorch will automatically keep track of the computational history of all tensors that are derived from `x`. This allows pytorch to figure out derivatives of any scalar result with respect to changes in the components of x.\n", 11 | "\n", 12 | "\n", 13 | "\n", 14 | "The function `torch.autograd.grad(output_scalar, [list of input_tensors])` computes `d(output_scalar)/d(input_tensor)` for each input tensor component in the list. 
For it to work, the input tensors and output must be part of the same `requires_grad=True` computation.\n", 15 | "\n", 16 | "In the example here, `x` is explicitly marked `requires_grad=True`, so `y.sum()`, which is derived from `x`, automatically comes along with the computation history, and can be differentiated." 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": 37, 22 | "metadata": {}, 23 | "outputs": [ 24 | { 25 | "data": { 26 | "image/png": "\n", 27 | "text/plain": [ 28 | "
" 29 | ] 30 | }, 31 | "metadata": { 32 | "needs_background": "light" 33 | }, 34 | "output_type": "display_data" 35 | } 36 | ], 37 | "source": [ 38 | "import torch\n", 39 | "from matplotlib import pyplot as plt\n", 40 | "\n", 41 | "x = torch.linspace(0, 5, 100,\n", 42 | " requires_grad=True)\n", 43 | "y = (x**2).cos()\n", 44 | "s = y.sum()\n", 45 | "[dydx] = torch.autograd.grad(s, [x])\n", 46 | "\n", 47 | "plt.plot(x.detach(), y.detach(), label='y')\n", 48 | "plt.plot(x.detach(), dydx, label='dy/dx')\n", 49 | "plt.legend()\n", 50 | "plt.show()" 51 | ] 52 | }, 53 | { 54 | "cell_type": "markdown", 55 | "metadata": {}, 56 | "source": [ 57 | "(Note that in the example above, because each `y[i]` depends only on `x[i]`, we happen to have `dy[j] / dx[i] == 0` when `j != i`, so that `d(y.sum())/dx[i] = dy[i]/dx[i]`. That means computing a single gradient vector of the sum `s` is equivalent to computing elementwise derivatives `dy/dx`.)\n", 58 | "\n", 59 | "**Detaching tensors from the computation history.** Every tensor that depends on `x` will be `requires_grad=True` and connected to the complete computation history. But if you were to convert a tensor to a regular python number, pytorch would not be able to see the calculations and would not be able to compute gradients on it.\n", 60 | "\n", 61 | "To avoid programming mistakes where some computation invisibly goes through a non-pytorch number that cannot be tracked, pytorch refuses to convert requires-grad tensors into untracked numpy arrays. You need to call `x.detach()` or `y.detach()` first, to explicitly say that you want an untracked reference, before plotting the data or using it as non-pytorch numbers." 62 | ] 63 | }, 64 | { 65 | "cell_type": "markdown", 66 | "metadata": {}, 67 | "source": [ 68 | "### Exercise\n", 69 | "\n", 70 | "Plot the polynomial y = x³ - 6x² + 8x and its derivative, instead of cos(x²)." 
71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": 69, 76 | "metadata": {}, 77 | "outputs": [], 78 | "source": [ 79 | "# TODO: set y to the given polynomial of x\n", 80 | "y = 'TODO'\n", 81 | "\n", 82 | "# TODO: use autograd to compute the derivative\n", 83 | "\n", 84 | "# TODO: plot the results.\n" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": {}, 90 | "source": [ 91 | "Backprop and In-place gradients\n", 92 | "-------------------------------\n", 93 | "\n", 94 | "In a typical neural network we will not just be getting gradients with regard to one input like `x` above, but with regard to a list of dozens or hundreds of tensor parameters that have all been marked with `requires_grad=True`. It can be inconvenient to keep track of which gradient outputs go with which original tensor input. But since the gradients have exactly the same shape as the inputs, it is natural to store computed gradients in-place on the tensors themselves.\n", 95 | "\n", 96 | "**Using `backward()` to add `.grad` attributes.** To simplify this common operation, pytorch provides the `y.backward()` method, which computes the gradients of y with respect to every tracked dependency, and stores the results in the field `x.grad` for every original input vector `x` that was marked as `requires_grad=True`." 
97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 70, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "name": "stdout", 106 | "output_type": "stream", 107 | "text": [ 108 | "tensor([-0.0000e+00, -2.5765e-04, -2.0612e-03, -6.9560e-03, -1.6485e-02,\n", 109 | " -3.2185e-02, -5.5575e-02, -8.8145e-02, -1.3133e-01, -1.8650e-01,\n", 110 | " -2.5487e-01, -3.3752e-01, -4.3528e-01, -5.4869e-01, -6.7791e-01,\n", 111 | " -8.2262e-01, -9.8193e-01, -1.1543e+00, -1.3373e+00, -1.5279e+00,\n", 112 | " -1.7218e+00, -1.9138e+00, -2.0978e+00, -2.2665e+00, -2.4118e+00,\n", 113 | " -2.5246e+00, -2.5954e+00, -2.6144e+00, -2.5720e+00, -2.4592e+00,\n", 114 | " -2.2684e+00, -1.9940e+00, -1.6330e+00, -1.1861e+00, -6.5843e-01,\n", 115 | " -5.9786e-02, 5.9438e-01, 1.2829e+00, 1.9791e+00, 2.6508e+00,\n", 116 | " 3.2620e+00, 3.7737e+00, 4.1467e+00, 4.3434e+00, 4.3315e+00,\n", 117 | " 4.0872e+00, 3.5983e+00, 2.8676e+00, 1.9159e+00, 7.8273e-01,\n", 118 | " -4.7262e-01, -1.7729e+00, -3.0265e+00, -4.1327e+00, -4.9894e+00,\n", 119 | " -5.5028e+00, -5.5970e+00, -5.2252e+00, -4.3782e+00, -3.0925e+00,\n", 120 | " -1.4526e+00, 4.1007e-01, 2.3249e+00, 4.0956e+00, 5.5192e+00,\n", 121 | " 6.4094e+00, 6.6222e+00, 6.0798e+00, 4.7897e+00, 2.8560e+00,\n", 122 | " 4.7794e-01, -2.0646e+00, -4.4405e+00, -6.3087e+00, -7.3680e+00,\n", 123 | " -7.4080e+00, -6.3531e+00, -4.2917e+00, -1.4813e+00, 1.6739e+00,\n", 124 | " 4.6748e+00, 7.0040e+00, 8.2157e+00, 8.0255e+00, 6.3823e+00,\n", 125 | " 3.5034e+00, -1.3781e-01, -3.8789e+00, -6.9824e+00, -8.7814e+00,\n", 126 | " -8.8286e+00, -7.0156e+00, -3.6318e+00, 6.6059e-01, 4.9416e+00,\n", 127 | " 8.2239e+00, 9.6828e+00, 8.8724e+00, 5.8738e+00, 1.3235e+00])\n" 128 | ] 129 | }, 130 | { 131 | "data": { 132 | "image/png": "\n", 133 | "text/plain": [ 134 | "
" 135 | ] 136 | }, 137 | "metadata": { 138 | "needs_background": "light" 139 | }, 140 | "output_type": "display_data" 141 | } 142 | ], 143 | "source": [ 144 | "x = torch.linspace(0, 5, 100, requires_grad=True)\n", 145 | "y = (x**2).cos()\n", 146 | "y.sum().backward() # populates the grad attribute below.\n", 147 | "print(x.grad)\n", 148 | "\n", 149 | "plt.plot(x.detach(), y.detach(), label='y')\n", 150 | "plt.plot(x.detach(), x.grad, label='dy/dx')\n", 151 | "plt.legend()\n", 152 | "plt.show()" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "### Exercise\n", 160 | "\n", 161 | "1. Introduce a new vector x2 which also ranges from zero to five (same as x, but not cloned from x).\n", 162 | "2. Plot the polynomial y = x2³ - 6x² + 8x\n", 163 | "3. Plot dy/dx and dy/dx2.\n", 164 | "\n", 165 | "Which x contributes most to the gradient at zero? At five?" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 72, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "# TODO: define x2 just like x, but not cloned from x\n", 175 | "x2 = 'TODO'\n", 176 | "\n", 177 | "# TODO: Plot the given polynomial which depends on both x and x2\n", 178 | "\n", 179 | "# TODO: Plot both dy/dx and dy/dx2. Explain what you get." 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "Accumulating and Zeroing grad\n", 187 | "-----------------------------\n", 188 | "\n", 189 | "**Gradient accumulation.** If you find that your data batches are too large to compute gradients for\n", 190 | "a whole batch at once, then it is usually possible to split the batches into smaller pieces and add the\n", 191 | "gradients. Because gradient accumulation is a common pattern, if you call `.backward()` when a parameter's\n", 192 | "`x.grad` already exists, it is not an error. 
The new gradient will be *added* to the old one.\n", 193 | "\n", 194 | "**zero_grad().** That means that you need to set any previous value of `x.grad` to zero before\n", 195 | "running `backward()`, or else the new gradient will be added to the old one. Optimizers have a\n", 196 | "utility `optim.zero_grad()` to do this to all the optimized parameters at once." 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": {}, 202 | "source": [ 203 | "Saving memory during inference\n", 204 | "------------------------------\n", 205 | "\n", 206 | "**Avoid autograd when you don't need it.** Normally, all the parameters of a neural network are set to `requires_grad=True`, so they are ready to be trained. But that means that whenever you run a network, you will get output which is also requires-grad, and it will be attached to a long computation history that consumes a lot of precious GPU memory.\n", 207 | "\n", 208 | "To avoid all this expense when you have no intention of training the network, you could go through all the network parameters to set `requires_grad=False`.\n", 209 | "\n", 210 | "Another way to avoid the computation history is to enclose the entire computation within a `with torch.no_grad():` block. This will suppress all the autograd mechanics (which means, of course, `.backward()` will not function).\n", 211 | "\n", 212 | "Note that this is different from the role of `net.eval()`, which puts the network in inference mode computationally (batchnorm, dropout, and other operations behave differently in training and inference); `net.eval()` does not have any effect on `requires_grad`." 
213 | ] 214 | }, 215 | { 216 | "cell_type": "markdown", 217 | "metadata": {}, 218 | "source": [ 219 | "More tricks\n", 220 | "-----------\n", 221 | "\n", 222 | "**Gradients of intermediate values.** Normally gradients with respect to intermediate values are not stored in `.grad` (only those of leaf tensors such as the original inputs are), but you can ask for intermediate gradients to be stored using `v.retain_grad()`.\n", 223 | "\n", 224 | "**Second derivatives.** If you want higher-order derivatives, then you want pytorch to build the computation graph when it is computing the gradient itself, so this graph can be differentiated again. To do this, use the `create_graph=True` option on the `grad` or `backward` methods.\n", 225 | "\n", 226 | "**Gradients of more than one objective.** Usually you only need to call `y.backward()` once per computation tree, and pytorch will not let you call it again: to save memory, pytorch deallocates the computation graph after you have computed a single gradient. But if you need more than one gradient (e.g., if you have different objectives that you want to apply to different subsets of parameters, as sometimes happens with GANs), you can use `retain_graph=True`.\n" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": {}, 232 | "source": [ 233 | "### Exercise\n", 234 | "\n", 235 | "1. Plot the polynomial y = x³ - 6x² + 8x, just as in the first exercise.\n", 236 | "2. Use `y.sum().backward(create_graph=True)` to compute the gradient, and plot dy/dx. Why is `x.grad.detach()` needed now?\n", 237 | "3. Now set `dy = x.grad.clone()` and then `x.grad.zero_()`, before using `dy.sum().backward()` to compute a 2nd gradient. Plot d²y/dx²." 
238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 74, 243 | "metadata": {}, 244 | "outputs": [], 245 | "source": [ 246 | "# TODO: define the polynomial just like the first exercise\n", 247 | "y = 'TODO'\n", 248 | "\n", 249 | "# TODO: Use `backward(create_graph=True)` to compute dy/dx. Plot it.\n", 250 | "\n", 251 | "# TODO: Use `backward()` a second time to compute the second derivative. Plot it." 252 | ] 253 | }, 254 | { 255 | "cell_type": "markdown", 256 | "metadata": {}, 257 | "source": [ 258 | "### [On to topic 3: Optimizers →](3-Pytorch-Optimizers.ipynb)" 259 | ] 260 | } 261 | ], 262 | "metadata": { 263 | "accelerator": "GPU", 264 | "kernelspec": { 265 | "display_name": "Python 3 (ipykernel)", 266 | "language": "python", 267 | "name": "python3" 268 | }, 269 | "language_info": { 270 | "codemirror_mode": { 271 | "name": "ipython", 272 | "version": 3 273 | }, 274 | "file_extension": ".py", 275 | "mimetype": "text/x-python", 276 | "name": "python", 277 | "nbconvert_exporter": "python", 278 | "pygments_lexer": "ipython3", 279 | "version": "3.9.9" 280 | } 281 | }, 282 | "nbformat": 4, 283 | "nbformat_minor": 4 284 | } 285 | -------------------------------------------------------------------------------- /notebooks/5-Pytorch-Dataloader.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# To save time, start this download first, before reading through the examples.\n", 10 | "import torch, torchvision, os\n", 11 | "if not os.path.isfile('datasets/miniplaces/train/yard/00001000.jpg'):\n", 12 | " torchvision.datasets.utils.download_and_extract_archive(\n", 13 | " 'http://dissect.csail.mit.edu/datasets/miniplaces.zip',\n", 14 | " 'datasets', md5='bfabeb497c7eca01c74cd8441a9ac108')" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 
21 | "" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "Datasets and Dataloaders in pytorch\n", 29 | "===================================\n", 30 | "\n", 31 | "Data sets can be thought of as big arrays of data. If the data set is small enough (e.g., MNIST, which has 60,000 28x28 grayscale images), a dataset can be literally represented as an array - or more precisely, as a single pytorch tensor. Stored as one 32-bit float per pixel, MNIST takes about 200 megabytes of RAM, which fits comfortably into a modern computer.\n", 32 | "\n", 33 | "But larger-scale datasets like ImageNet or Places365 have more than a million higher-resolution full-color images. In these cases, an ordinary python array or pytorch tensor would require more than a terabyte of RAM, which is impractical on most computers.\n", 34 | "\n", 35 | "Instead, we need to load the data from disk (or SSD). Unfortunately, loading from disk has much higher latency than reading from RAM, so we need to do the loading cleverly if we want to load the data quickly.\n", 36 | "\n", 37 | "To solve the problem, pytorch provides two classes:\n", 38 | " * `torch.utils.data.Dataset` - This very simple base class represents an array where the actual data may be slow to fetch, typically because the data is in disk files that require some loading, decoding, or other preprocessing. Pytorch provides a variety of different `Dataset` subclasses. For example, there is a handy one called `ImageFolder` that treats a directory tree of image files as an array of classified images.\n", 39 | " * `torch.utils.data.DataLoader` - This fancy class wraps a `Dataset` as a stream of data batches. Behind the scenes it uses a few techniques to feed the data faster. You do not need to subclass `DataLoader` - its purpose is to make a `Dataset` speedy." 
40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "## Looking at an image data set using ImageFolder\n", 47 | "\n", 48 | "The most common `Dataset` used in computer vision is `ImageFolder`, which loads a set of image files from a directory tree. It treats every subdirectory of images as a classification category. To demonstrate it, we will use it to load images from the miniplaces dataset downloaded above.\n", 49 | "\n", 50 | "**Directory layout.** Notice that `datasets/miniplaces/val` contains a set of 100 directories with names like `golf_course`. Each of these directories contains 100 images, each stored as a jpeg file: 10000 images in total." 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "ls datasets/miniplaces/val/golf_course" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "**Constructing an ImageFolder.** Making an ImageFolder at the root directory of the data set creates an object that behaves like an array: it has a length, and each entry is a tuple of an image and a number. The image is stored as a `PIL` object, which is a standard python object for images, and the number is the class index, with one class for each folder, numbered in alphabetical order." 
67 | ] 68 | }, 69 | { 70 | "cell_type": "code", 71 | "execution_count": null, 72 | "metadata": {}, 73 | "outputs": [], 74 | "source": [ 75 | "val_set = torchvision.datasets.ImageFolder('datasets/miniplaces/val')\n", 76 | "print('Length is', len(val_set))\n", 77 | "item = val_set[5100]\n", 78 | "print('5100th item is a pair', item)\n", 79 | "\n", 80 | "# Display the PIL image and the class name directly.\n", 81 | "display(item[0])\n", 82 | "print('Class name is', val_set.classes[item[1]])" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "**Transforming the PIL image into a pytorch tensor.** A PIL image is not convenient for training: we would prefer our data set to return pytorch tensors. We can tell `ImageFolder` to do this by specifying the `transform` function on construction. Torchvision comes with a standard transform `torchvision.transforms.ToTensor()`, which converts a PIL image to a float pytorch tensor with values between 0 and 1, in channel-row-column order.\n", 90 | "\n", 91 | "Now when indexing into the data set, we will get a pytorch tensor instead of a PIL image." 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": null, 97 | "metadata": {}, 98 | "outputs": [], 99 | "source": [ 100 | "val_set = torchvision.datasets.ImageFolder(\n", 101 | " 'datasets/miniplaces/val',\n", 102 | " transform=torchvision.transforms.ToTensor())\n", 103 | "print(val_set[1000])\n", 104 | "\n", 105 | "# There is an inverse transform that can be used to convert it back to a PIL image,\n", 106 | "# handy if we want to see it.\n", 107 | "as_image = torchvision.transforms.ToPILImage()\n", 108 | "display(as_image(val_set[1000][0]))" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": {}, 114 | "source": [ 115 | "## Fast Dataset Access using DataLoader\n", 116 | "\n", 117 | "When we use a dataset for training, we will usually run through the whole dataset in batches. 
We could do this ourselves, as in lines 6-8 below, by just fetching the images one at a time and grouping them.\n", 118 | "\n", 119 | "But a faster way to iterate through the dataset is to wrap our `val_set` object in a `torch.utils.data.DataLoader` object, as shown on lines 14-18 below. The `val_loader` we get can magically pull data out of the Dataset much faster than doing it in the simple way; the `DataLoader` class does this by using several worker processes to load and prefetch the data in parallel.\n", 120 | "\n", 121 | "The speedup will depend on the system and the number of worker processes you use (set with `num_workers`). In practice using `DataLoader` will typically be 5-20 times faster than direct `Dataset` access." 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": null, 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "import time\n", 131 | "\n", 132 | "print('Going over the data set as an array.')\n", 133 | "start = time.time()\n", 134 | "summed_image_dataset = 0\n", 135 | "batch_size = 100\n", 136 | "for i in range(0, len(val_set), batch_size):\n", 137 | " image_batch = torch.stack([val_set[i+j][0] for j in range(batch_size)])\n", 138 | " summed_image_dataset += image_batch.sum(0)\n", 139 | "end = time.time()\n", 140 | "print(f'Took {end - start} seconds')\n", 141 | "\n", 142 | "print('Going over the same dataset using a dataloader.')\n", 143 | "start = time.time()\n", 144 | "val_loader = torch.utils.data.DataLoader(\n", 145 | " val_set, batch_size=batch_size, num_workers=10)\n", 146 | "summed_image_loader = 0\n", 147 | "for image_batch, label_batch in val_loader:\n", 148 | " summed_image_loader += image_batch.sum(0)\n", 149 | "end = time.time()\n", 150 | "print(f'Took {end - start} seconds')\n", 151 | "\n", 152 | "print('Numerical difference is exactly', (summed_image_loader - summed_image_dataset).norm().item())" 
"source": [ 159 | "### Exercise\n", 160 | "\n", 161 | "1. Try adjusting `num_workers` down to 1 and up to 100. How does this affect the speed?\n", 162 | "2. Try changing `batch_size` down to 1 or up to 1000.\n", 163 | "\n", 164 | "**Note**: the speed differences you see will depend on the specifics of your system setup.\n", 165 | "If you are running on Google Colab, you may not see much of a speedup from DataLoader.\n", 166 | "This is because Colab provides a very low-latency virtual disk (so direct Dataset access\n", 167 | "is faster than on a regular computer), and a virtual CPU with limited parallelism\n", 168 | "(so DataLoader's parallel workers help less than normal)." 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": null, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "# TODO: copy the code above and alter:\n", 178 | "# 1. num_workers and note the changes in speed\n", 179 | "# 2. batch_size and note the changes in speed." 180 | ] 181 | }, 182 | { 183 | "cell_type": "markdown", 184 | "metadata": {}, 185 | "source": [ 186 | "**Other common dataloader tricks.** `DataLoader` can do a few more useful things.\n", 187 | "\n", 188 | " * Although a DataLoader does not put batches on the GPU directly (because of limitations on using CUDA inside its worker processes), it *can* put the batch in pinned memory, which is faster to copy to the GPU after you get it out of the DataLoader. Make the DataLoader with `pin_memory=True` for this.\n", 189 | " * During training you usually do not want the batches in alphabetical order. The DataLoader can shuffle the data so that each batch contains randomly chosen items, instead of sequential ones. 
Make the DataLoader with `shuffle=True` for this.\n", 190 | "\n", 191 | " \n", 192 | " " 193 | ] 194 | }, 195 | { 196 | "cell_type": "markdown", 197 | "metadata": {}, 198 | "source": [ 199 | "## Using a DataLoader for Training\n", 200 | "\n", 201 | "We can put everything together by using the data from a data loader to train a classifier.\n", 202 | "\n", 203 | "The following is a simplistic example of training an image classifier. It uses the Adam optimizer and the ResNet-18 neural network architecture, and trains for a couple of minutes, just passing once over the training set." 204 | ] 205 | }, 206 | { 207 | "cell_type": "code", 208 | "execution_count": null, 209 | "metadata": {}, 210 | "outputs": [], 211 | "source": [ 212 | "from tqdm import tqdm\n", 213 | "\n", 214 | "# Create a Dataset of miniplaces training images.\n", 215 | "train_set = torchvision.datasets.ImageFolder(\n", 216 | " 'datasets/miniplaces/train',\n", 217 | " torchvision.transforms.ToTensor())\n", 218 | "\n", 219 | "# Wrap the Dataset in a high-speed DataLoader with batch_size 100.\n", 220 | "train_loader = torch.utils.data.DataLoader(\n", 221 | " train_set, batch_size=100, num_workers=10,\n", 222 | " shuffle=True,\n", 223 | " pin_memory=True)\n", 224 | "\n", 225 | "# Create an untrained neural network using the ResNet 18 architecture.\n", 226 | "model = torchvision.models.resnet18(num_classes=100).cuda()\n", 227 | "\n", 228 | "# Set up the model for training using the Adam optimizer.\n", 229 | "model.train()\n", 230 | "optimizer = torch.optim.Adam(model.parameters(), lr=0.01)\n", 231 | "\n", 232 | "# To train, optimize an objective on batches of training data.\n", 233 | "# Here we look at every training image once.\n", 234 | "for batch in tqdm(train_loader):\n", 235 | " images, labels = [d.cuda(non_blocking=True) for d in batch]\n", 236 | " optimizer.zero_grad()\n", 237 | " scores = model(images)\n", 238 | " loss = torch.nn.functional.cross_entropy(scores, labels)\n", 239 | " loss.backward()\n", 240 | " optimizer.step()" 
242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "## Checking Accuracy with a Held-Out Dataset\n", 248 | "\n", 249 | "To check whether the network has learned anything useful, we can test whether the model makes good predictions on unseen images. The easy way to do this is to create a second `ImageFolder` dataset (and `DataLoader`) from a separate set of images that was **not** used for training.\n", 250 | "\n", 251 | "While the accuracy achieved after a couple of minutes of training is not perfect, it is already much better than random." 252 | ] 253 | }, 254 | { 255 | "cell_type": "code", 256 | "execution_count": null, 257 | "metadata": {}, 258 | "outputs": [], 259 | "source": [ 260 | "# Create a validation dataset and data loader.\n", 261 | "val_set = torchvision.datasets.ImageFolder(\n", 262 | " 'datasets/miniplaces/val',\n", 263 | " torchvision.transforms.ToTensor())\n", 264 | "val_loader = torch.utils.data.DataLoader(\n", 265 | " val_set, batch_size=100, num_workers=10,\n", 266 | " pin_memory=True)\n", 267 | "\n", 268 | "# This function runs over the validation images and counts accurate predictions.\n", 269 | "def accuracy():\n", 270 | " model.eval()\n", 271 | " correct = 0\n", 272 | " for iter, batch in enumerate(val_loader):\n", 273 | " images, labels = [d.cuda() for d in batch]\n", 274 | " with torch.no_grad():\n", 275 | " scores = model(images.cuda())\n", 276 | " correct += (scores.max(1)[1] == labels).float().sum()\n", 277 | " return correct.item() / len(val_set)\n", 278 | "\n", 279 | "print(f'Accuracy on unseen images {accuracy() * 100}% (random guesses would be 1%)')" 280 | ] 281 | }, 282 | { 283 | "cell_type": "markdown", 284 | "metadata": {}, 285 | "source": [ 286 | "### Exercise\n", 287 | "\n", 288 | "1. For every 10th batch, display the first image in the batch.\n", 289 | "2.
Also print the predicted class name and the true class name for that image.\n", 290 | "\n", 291 | "Hints:\n", 292 | "* Use the `as_image` function defined in a previous cell.\n", 293 | "* Use `images[0].cpu()` to move the image to the CPU before displaying it.\n", 294 | "* The prediction of the network for the 0th item of the batch is `scores.max(1)[1][0]`\n", 295 | "* Use `val_set.classes[pred]` to convert the numerical prediction to a readable label." 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": {}, 301 | "source": [ 302 | "## Improving Training using Data Augmentation\n", 303 | "\n", 304 | "One of the main ways to stretch a data set to make it more effective for training is to randomly adjust the images. For example, if we randomly adjust the crop, color, or orientation of the image while loading, using the same image file multiple times will produce different training examples for the network. This is an easy way to increase the training diversity of the data set without requiring more actual images.\n", 305 | "\n", 306 | "To do data augmentation in a pytorch `Dataset`, you can pass more operations to `transform=` besides `ToTensor()`.\n", 307 | "\n", 308 | "In particular, there is a `Compose` transform that makes it easy to chain a series of data transformations, and `torchvision.transforms` includes a number of useful image transforms such as random crops, random resized crops, and image flips.\n", 309 | "\n", 310 | "Here is an example:" 311 | ] 312 | }, 313 | { 314 | "cell_type": "code", 315 | "execution_count": null, 316 | "metadata": {}, 317 | "outputs": [], 318 | "source": [ 319 | "# Create a Dataset of miniplaces training images.\n", 320 | "train_set = torchvision.datasets.ImageFolder(\n", 321 | " 'datasets/miniplaces/train',\n", 322 | " torchvision.transforms.Compose([\n", 323 | " torchvision.transforms.RandomCrop(112),\n", 324 | " torchvision.transforms.RandomHorizontalFlip(),\n", 325 | " torchvision.transforms.ToTensor(),\n",
326 | " ]))\n", 327 | "train_loader = torch.utils.data.DataLoader(\n", 328 | " train_set, batch_size=100, num_workers=10,\n", 329 | " shuffle=True,\n", 330 | " pin_memory=True)\n", 331 | "\n", 332 | "# Now let's train for one more epoch, and test the accuracy\n", 333 | "model.train()\n", 334 | "for batch in tqdm(train_loader):\n", 335 | " images, labels = [d.cuda() for d in batch]\n", 336 | " optimizer.zero_grad()\n", 337 | " scores = model(images.cuda())\n", 338 | " loss = torch.nn.functional.cross_entropy(scores, labels)\n", 339 | " loss.backward()\n", 340 | " optimizer.step()\n", 341 | "print(f'Accuracy on unseen images {accuracy() * 100}% (random guesses would be 1%)')" 342 | ] 343 | }, 344 | { 345 | "cell_type": "markdown", 346 | "metadata": {}, 347 | "source": [ 348 | "### Exercise\n", 349 | "\n", 350 | "1. Print out the same images as before, with updated predictions for the newly tuned network parameters.\n", 351 | "2. Repeat training for a few more epochs. How does the accuracy evolve?" 352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "### Epilog\n", 359 | "\n", 360 | "Almost all the pytorch code you will find will be variations and extensions of the patterns we have covered. 
You're ready to explore.\n", 361 | "\n", 362 | "Have fun!\n", 363 | "\n", 364 | "### [Back to the introduction →](1-Pytorch-Introduction.ipynb)" 365 | ] 366 | } 367 | ], 368 | "metadata": { 369 | "accelerator": "GPU", 370 | "kernelspec": { 371 | "display_name": "Python 3 (ipykernel)", 372 | "language": "python", 373 | "name": "python3" 374 | }, 375 | "language_info": { 376 | "codemirror_mode": { 377 | "name": "ipython", 378 | "version": 3 379 | }, 380 | "file_extension": ".py", 381 | "mimetype": "text/x-python", 382 | "name": "python", 383 | "nbconvert_exporter": "python", 384 | "pygments_lexer": "ipython3", 385 | "version": "3.9.9" 386 | } 387 | }, 388 | "nbformat": 4, 389 | "nbformat_minor": 4 390 | } 391 | -------------------------------------------------------------------------------- /notebooks/6-Pytorch-Alexnet-Example.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "Pytorch Alexnet Example\n", 8 | "=======================\n", 9 | "\n", 10 | "This is a complete example of training an alexnet in pytorch, entirely within a notebook, using nothing but widely-used library functions.\n", 11 | "\n", 12 | "Warning: this notebook downloads a full large-scale dataset (places365). It is too large to handle practically on Google Colab, so you need to host this notebook on your own server."
13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": null, 18 | "metadata": {}, 19 | "outputs": [], 20 | "source": [ 21 | "import torch, torchvision, os\n", 22 | "\n", 23 | "def train_alexnet_places(num_steps=100000):\n", 24 | " print(\"Making alexnet...\")\n", 25 | " alexnet = make_untrained_alexnet_places()\n", 26 | " alexnet.train()\n", 27 | " print(\"Loading datasets...\")\n", 28 | " train_loader, val_loader = get_train_and_val_data_loaders()\n", 29 | " print(\"Training classifier...\")\n", 30 | " checkpointer = make_checkpointing_function(val_loader, checkpoint_dir='checkpoints')\n", 31 | " train_classifier(alexnet, train_loader,\n", 32 | " max_iter=num_steps,\n", 33 | " momentum=0.9,\n", 34 | " init_lr=2e-2,\n", 35 | " weight_decay=5e-4,\n", 36 | " monitor=checkpointer)\n", 37 | " return alexnet" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "Untrained Alexnet\n", 45 | "-----------------\n", 46 | "\n", 47 | "This function creates an untrained alexnet with randomly initialized parameters."
48 | ] 49 | }, 50 | { 51 | "cell_type": "code", 52 | "execution_count": null, 53 | "metadata": {}, 54 | "outputs": [], 55 | "source": [ 56 | "from torch import nn\n", 57 | "from collections import OrderedDict\n", 58 | "def make_untrained_alexnet_places():\n", 59 | " # channel widths\n", 60 | " w = [3, 96, 256, 384, 384, 256, 4096, 4096, 365]\n", 61 | " # Alexnet splits channels into groups\n", 62 | " groups = [1, 2, 1, 2, 2]\n", 63 | " model = nn.Sequential(OrderedDict([\n", 64 | " ('conv1', nn.Conv2d(w[0], w[1], kernel_size=11,\n", 65 | " stride=4,\n", 66 | " groups=groups[0], bias=True)),\n", 67 | " ('relu1', nn.ReLU(inplace=True)),\n", 68 | " ('pool1', nn.MaxPool2d(kernel_size=3, stride=2)),\n", 69 | " ('conv2', nn.Conv2d(w[1], w[2], kernel_size=5, padding=2,\n", 70 | " groups=groups[1], bias=True)),\n", 71 | " ('relu2', nn.ReLU(inplace=True)),\n", 72 | " ('pool2', nn.MaxPool2d(kernel_size=3, stride=2)),\n", 73 | " ('conv3', nn.Conv2d(w[2], w[3], kernel_size=3, padding=1,\n", 74 | " groups=groups[2], bias=True)),\n", 75 | " ('relu3', nn.ReLU(inplace=True)),\n", 76 | " ('conv4', nn.Conv2d(w[3], w[4], kernel_size=3, padding=1,\n", 77 | " groups=groups[3], bias=True)),\n", 78 | " ('relu4', nn.ReLU(inplace=True)),\n", 79 | " ('conv5', nn.Conv2d(w[4], w[5], kernel_size=3, padding=1,\n", 80 | " groups=groups[4], bias=True)),\n", 81 | " ('relu5', nn.ReLU(inplace=True)),\n", 82 | " ('pool5', nn.MaxPool2d(kernel_size=3, stride=2)),\n", 83 | " ('flatten', nn.Flatten()),\n", 84 | " ('fc6', nn.Linear(w[5] * 6 * 6, w[6], bias=True)),\n", 85 | " ('relu6', nn.ReLU(inplace=True)),\n", 86 | " ('dropout6', nn.Dropout()),\n", 87 | " ('fc7', nn.Linear(w[6], w[7], bias=True)),\n", 88 | " ('relu7', nn.ReLU(inplace=True)),\n", 89 | " ('dropout7', nn.Dropout()),\n", 90 | " ('fc8', nn.Linear(w[7], w[8]))\n", 91 | " ]))\n", 92 | " # Set up the initial parameters randomly\n", 93 | " for n, p in model.named_parameters():\n", 94 | " if 'bias' in n:\n", 95 | " torch.nn.init.zeros_(p)\n", 96 | "
" else:\n", 97 | " torch.nn.init.kaiming_normal_(p, nonlinearity='relu')\n", 98 | " model.cuda()\n", 99 | " model.train()\n", 100 | " return model" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "We can call the function to make a network, and then list all the network's trainable parameters." 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "a = make_untrained_alexnet_places()\n", 117 | "for n, p in a.named_parameters():\n", 118 | " print(n, tuple(p.shape))" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "We can also save the untrained (randomly initialized) network." 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": null, 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [ 134 | "torch.save(a.state_dict(), 'checkpoints/uninitialized_alexnet.pth')" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": {}, 140 | "source": [ 141 | "Main Training Loop\n", 142 | "------------------\n", 143 | "\n", 144 | "This is a generic training loop for a classifier."
145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": null, 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [ 153 | "def train_classifier(model, train_data_loader, max_iter,\n", 154 | " momentum=0.9, init_lr=2e-2, weight_decay=5e-4,\n", 155 | " monitor=None):\n", 156 | " if monitor is not None:\n", 157 | " monitor(model, 0, 0.0, 0.0, 0)\n", 158 | " optimizer = torch.optim.SGD(\n", 159 | " model.parameters(),\n", 160 | " lr=init_lr, momentum=momentum, weight_decay=weight_decay)\n", 161 | " scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer, init_lr, max_iter)\n", 162 | " iter_num = 0\n", 163 | " while iter_num < max_iter:\n", 164 | " for t_input, t_target in train_data_loader:\n", 165 | " # Copy data to the GPU\n", 166 | " input_var, target_var = [d.cuda() for d in [t_input, t_target]]\n", 167 | " # Evaluate model\n", 168 | " output = model(input_var)\n", 169 | " loss = torch.nn.functional.cross_entropy(output, target_var)\n", 170 | " # Perform one step of SGD\n", 171 | " optimizer.zero_grad()\n", 172 | " loss.backward()\n", 173 | " optimizer.step()\n", 174 | " scheduler.step() # Learning rate schedule\n", 175 | " # Check training set accuracy\n", 176 | " _, pred = output.max(1)\n", 177 | " batch_size = len(t_input)\n", 178 | " accuracy = target_var.detach().eq(pred).float().sum().item() / batch_size\n", 179 | " # Advance, and report stats as plain floats (loss.item() drops the autograd graph)\n", 180 | " iter_num += 1\n", 181 | " if monitor is not None:\n", 182 | " monitor(model, iter_num, loss.item(), accuracy, batch_size)\n", 183 | " if iter_num >= max_iter:\n", 184 | " break" 185 | ] 186 | }, 187 | { 188 | "cell_type": "markdown", 189 | "metadata": {}, 190 | "source": [ 191 | "Data set\n", 192 | "--------\n", 193 | "\n", 194 | "This is the definition of the places data set used for training.\n", 195 | "If we do not have the files, we download them.
And then we make a\n", 196 | "Dataset object that defines how to resize, crop, and normalize the images.\n", 197 | "\n", 198 | "The DataLoader objects wrap the dataset in a parallel streaming\n", 199 | "object that batches the image data and loads it quickly." 200 | ] 201 | }, 202 | { 203 | "cell_type": "code", 204 | "execution_count": null, 205 | "metadata": {}, 206 | "outputs": [], 207 | "source": [ 208 | "def get_places_data_set(split, crop_size=227, download=True):\n", 209 | " dirname = f'datasets/places/{split}'\n", 210 | " nfs_source = '/data/vision/torralba/datasets/places/files'\n", 211 | " web_source = 'https://dissect.csail.mit.edu/datasets/'\n", 212 | " if not os.path.exists(dirname) and download:\n", 213 | " if os.path.exists(nfs_source):\n", 214 | " os.symlink(nfs_source, 'datasets/places')\n", 215 | " else:\n", 216 | " os.makedirs(dirname, exist_ok=True)\n", 217 | " torchvision.datasets.utils.download_and_extract_archive(\n", 218 | " web_source +\n", 219 | " 'places_%s.zip' % split,\n", 220 | " 'datasets',\n", 221 | " md5=dict(val='593bbc21590cf7c396faac2e600cd30c',\n", 222 | " train='d1db6ad3fc1d69b94da325ac08886a01')[split])\n", 223 | " if split == 'train':\n", 224 | " cropping_rule = [\n", 225 | " torchvision.transforms.RandomCrop(crop_size),\n", 226 | " torchvision.transforms.RandomHorizontalFlip() ]\n", 227 | " else:\n", 228 | " cropping_rule = [torchvision.transforms.CenterCrop(crop_size)]\n", 229 | " places_transform = torchvision.transforms.Compose([\n", 230 | " torchvision.transforms.Resize(256)\n", 231 | " ] + cropping_rule + [\n", 232 | " torchvision.transforms.ToTensor(),\n", 233 | " torchvision.transforms.Normalize(\n", 234 | " [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])\n", 235 | " ])\n", 236 | " return torchvision.datasets.ImageFolder(\n", 237 | " dirname, transform=places_transform)\n", 238 | "\n", 239 | "def get_train_and_val_data_loaders():\n", 240 | " return [\n", 241 | " torch.utils.data.DataLoader(\n", 242 | "
get_places_data_set(split),\n", 243 | " batch_size=256, shuffle=(split == 'train'),\n", 244 | " num_workers=48, pin_memory=True)\n", 245 | " for split in ['train', 'val']\n", 246 | " ]" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "Generic Evaluation and Checkpointing Utilities\n", 254 | "----------------------------------------------\n", 255 | "\n", 256 | " * **measure_val_accuracy_and_loss** evaluates the model on the holdout set and reports its performance.\n", 257 | " * **save_model_iteration** saves the current model parameters in a pytorch file.\n", 258 | " * **make_checkpointing_function** makes a callback function for periodically evaluating and saving a model during training.\n", 259 | " * **AverageMeter** tracks averages (e.g., average accuracy, average loss)." 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": null, 265 | "metadata": {}, 266 | "outputs": [], 267 | "source": [ 268 | "def measure_val_accuracy_and_loss(model, val_data_loader):\n", 269 | " '''\n", 270 | " Evaluates the model (in inference mode) on holdout data.\n", 271 | " '''\n", 272 | " model.eval()\n", 273 | " val_loss, val_acc = AverageMeter(), AverageMeter()\n", 274 | " for input, target in val_data_loader:\n", 275 | " input_var, target_var = [d.cuda() for d in [input, target]]\n", 276 | " with torch.no_grad():\n", 277 | " output = model(input_var)\n", 278 | " loss = torch.nn.functional.cross_entropy(output, target_var)\n", 279 | " _, pred = output.max(1)\n", 280 | " accuracy = (target_var.eq(pred)\n", 281 | " ).float().sum().item() / input.size(0)\n", 282 | " val_acc.update(accuracy, input.size(0))\n", 283 | " val_loss.update(loss.item(), input.size(0))\n", 284 | " return val_acc, val_loss\n", 285 | "\n", 286 | "def save_model_iteration(model, iter_num, checkpoint_dir):\n", 287 | " '''\n", 288 | " Saves the current parameters of the model to a file.\n", 289 | " '''\n", 290 | "
torch.save(model.state_dict(), os.path.join(checkpoint_dir, f'iter_{iter_num}.pth'))\n", 291 | " \n", 292 | "def make_checkpointing_function(val_data_loader, checkpoint_dir=None, checkpoint_freq=100):\n", 293 | " '''\n", 294 | " Makes a callback to monitor training and make checkpoints.\n", 295 | " '''\n", 296 | " avg_train_accuracy, avg_train_loss = AverageMeter(), AverageMeter()\n", 297 | " def monitor(model, iter_num, loss, accuracy, batch_size):\n", 298 | " avg_train_accuracy.update(accuracy, batch_size)\n", 299 | " avg_train_loss.update(loss, batch_size)\n", 300 | " if iter_num % checkpoint_freq == 0:\n", 301 | " val_accuracy, val_loss = measure_val_accuracy_and_loss(model, val_data_loader)\n", 302 | " if checkpoint_dir is not None:\n", 303 | " save_model_iteration(model, iter_num, checkpoint_dir)\n", 304 | " print(f'Iter {iter_num}, ' + \n", 305 | " f'train acc {avg_train_accuracy.avg:.3g} loss {avg_train_loss.avg:.3g}, ' +\n", 306 | " f'val acc {val_accuracy.avg:.3g}, loss {val_loss.avg:.3g}')\n", 307 | " model.train()\n", 308 | " return monitor \n", 309 | " \n", 310 | "class AverageMeter(object):\n", 311 | " '''\n", 312 | " To keep running averages.\n", 313 | " '''\n", 314 | " def __init__(self):\n", 315 | " self.reset()\n", 316 | " def reset(self):\n", 317 | " self.val = 0.\n", 318 | " self.avg = 0.\n", 319 | " self.sum = 0.\n", 320 | " self.count = 0\n", 321 | " def update(self, val, n=1):\n", 322 | " self.val = val\n", 323 | " self.sum += val * n\n", 324 | " self.count += n\n", 325 | " if self.count:\n", 326 | " self.avg = self.sum / self.count" 327 | ] 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "metadata": {}, 332 | "source": [ 333 | "Now do the work\n", 334 | "---------------\n", 335 | "\n", 336 | "Try loading alexnet from a checkpoint. If we have not yet saved a checkpoint snapshot with the number of iterations we want, then train it. 
" 337 | ] 338 | }, 339 | { 340 | "cell_type": "code", 341 | "execution_count": null, 342 | "metadata": {}, 343 | "outputs": [], 344 | "source": [ 345 | "num_iterations = 100\n", 346 | "try:\n", 347 | " a = make_untrained_alexnet_places()\n", 348 | " a.load_state_dict(torch.load(f'checkpoints/iter_{num_iterations}.pth'))\n", 349 | "except FileNotFoundError: # no saved checkpoint yet, so train one\n", 350 | " a = train_alexnet_places(num_iterations)" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "Now view one image, reversing the dataset normalization so that its colors display correctly." 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": null, 363 | "metadata": {}, 364 | "outputs": [], 365 | "source": [ 366 | "dsv = get_places_data_set('val')\n", 367 | "im, label = dsv[5000]\n", 368 | "im = im.cuda()\n", 369 | "# Reverse the normalization (x * std + mean) and clip to the displayable [0, 1] range\n", 370 | "unnormalized = (im.cpu().permute(1, 2, 0)\n", 371 | " * torch.tensor([0.229, 0.224, 0.225])\n", 372 | " + torch.tensor([0.485, 0.456, 0.406])).clamp(0, 1)\n", 373 | "\n", 374 | "from matplotlib import pyplot as plt\n", 375 | "plt.imshow(unnormalized)\n", 376 | "plt.axis('off')\n", 377 | "plt.show()" 378 | ] 379 | }, 380 | { 381 | "cell_type": "markdown", 382 | "metadata": {}, 383 | "source": [ 384 | "Finally, run the network on the image and print the prediction.\n", 385 | "\n", 386 | "Note that the network expects to work on batches, so `im[None]` forms an image batch of size one."
387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": null, 392 | "metadata": {}, 393 | "outputs": [], 394 | "source": [ 395 | "a.eval()\n", 396 | "output = a(im[None])\n", 397 | "pred = output.max(1)[1][0]\n", 398 | "\n", 399 | "print('prediction: ', dsv.classes[pred])\n", 400 | "print('groundtruth: ', dsv.classes[label])" 401 | ] 402 | } 403 | ], 404 | "metadata": { 405 | "accelerator": "GPU", 406 | "kernelspec": { 407 | "display_name": "Python 3", 408 | "language": "python", 409 | "name": "python3" 410 | }, 411 | "language_info": { 412 | "codemirror_mode": { 413 | "name": "ipython", 414 | "version": 3 415 | }, 416 | "file_extension": ".py", 417 | "mimetype": "text/x-python", 418 | "name": "python", 419 | "nbconvert_exporter": "python", 420 | "pygments_lexer": "ipython3", 421 | "version": "3.6.10" 422 | } 423 | }, 424 | "nbformat": 4, 425 | "nbformat_minor": 4 426 | } -------------------------------------------------------------------------------- /notebooks/autograd-graph.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidbau/how-to-read-pytorch/bbefbbd51834ac766f9d0a0ad09b69c1337521be/notebooks/autograd-graph.png -------------------------------------------------------------------------------- /notebooks/dataloader.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidbau/how-to-read-pytorch/bbefbbd51834ac766f9d0a0ad09b69c1337521be/notebooks/dataloader.png -------------------------------------------------------------------------------- /notebooks/how-to-read-pytorch.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/davidbau/how-to-read-pytorch/bbefbbd51834ac766f9d0a0ad09b69c1337521be/notebooks/how-to-read-pytorch.png -------------------------------------------------------------------------------- /notebooks/ipynb_drop_output.py: 
-------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | """ 4 | Suppress output and prompt numbers in git version control. 5 | 6 | This script will tell git to ignore prompt numbers and cell output 7 | when looking at ipynb files UNLESS their metadata contains: 8 | 9 | "git": { 10 | "keep_outputs": true 11 | }, 12 | 13 | The notebooks themselves are not changed. 14 | 15 | See also this blogpost: http://pascalbugnion.net/blog/ipython-notebooks-and-git.html. 16 | 17 | Usage instructions 18 | ================== 19 | 20 | 1. Put this script in a directory that is on the system's path. 21 | For future reference, I will assume you saved it in 22 | `~/scripts/ipynb_drop_output`. 23 | 2. Make sure it is executable by typing the command 24 | `chmod +x ~/scripts/ipynb_drop_output`. 25 | 3. Register a filter for ipython notebooks by 26 | putting the following line in `~/.config/git/attributes`: 27 | `*.ipynb filter=clean_ipynb` 28 | 4. Connect this script to the filter by running the following 29 | git commands: 30 | 31 | git config --global filter.clean_ipynb.clean ipynb_drop_output 32 | git config --global filter.clean_ipynb.smudge cat 33 | 34 | To tell git NOT to ignore the output and prompts for a notebook, 35 | open the notebook's metadata (Edit > Edit Notebook Metadata). A 36 | panel should open containing the lines: 37 | 38 | { 39 | "name" : "", 40 | "signature" : "some very long hash" 41 | } 42 | 43 | Add an extra line so that the metadata now looks like: 44 | 45 | { 46 | "name" : "", 47 | "signature" : "don't change the hash, but add a comma at the end of the line", 48 | "git" : { "keep_outputs" : true } 49 | } 50 | 51 | You may need to "touch" the notebooks for git to actually register a change, if 52 | your notebooks are already under version control. 53 | 54 | Notes 55 | ===== 56 | 57 | Changed by David Bau to make stripping output the default. 
58 | 59 | This script is inspired by http://stackoverflow.com/a/20844506/827862, but 60 | lets the user specify whether the output of a notebook should be kept 61 | in the notebook's metadata, and works for IPython v3.0. 62 | """ 63 | 64 | import sys 65 | import json 66 | 67 | nb = sys.stdin.read() 68 | 69 | json_in = json.loads(nb) 70 | nb_metadata = json_in["metadata"] 71 | keep_output = False 72 | if "git" in nb_metadata: 73 | if "keep_outputs" in nb_metadata["git"] and nb_metadata["git"]["keep_outputs"]: 74 | keep_output = True 75 | if keep_output: 76 | sys.stdout.write(nb) 77 | sys.exit() 78 | 79 | 80 | ipy_version = int(json_in["nbformat"]) - 1 # nbformat is 1 more than actual version. 81 | 82 | 83 | def strip_output_from_cell(cell): 84 | if "outputs" in cell: 85 | cell["outputs"] = [] 86 | if "prompt_number" in cell: 87 | del cell["prompt_number"] 88 | if "execution_count" in cell: 89 | cell["execution_count"] = None 90 | 91 | 92 | if ipy_version == 2: 93 | for sheet in json_in["worksheets"]: 94 | for cell in sheet["cells"]: 95 | strip_output_from_cell(cell) 96 | else: 97 | for cell in json_in["cells"]: 98 | strip_output_from_cell(cell) 99 | 100 | json.dump( 101 | json_in, 102 | sys.stdout, 103 | sort_keys=True, 104 | indent=1, 105 | separators=(",", ": "), 106 | ensure_ascii=False, 107 | ) 108 | # https://stackoverflow.com/questions/729692/why-should-text-files-end-with-a-newline 109 | sys.stdout.write("\n") 110 | -------------------------------------------------------------------------------- /notebooks/setup_notebooks.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # Start from directory of script 4 | cd "$(dirname "${BASH_SOURCE[0]}")" 5 | 6 | # Set up git config filters so huge output of notebooks is not committed.
7 | git config filter.clean_ipynb.clean "$(pwd)/ipynb_drop_output.py" 8 | git config filter.clean_ipynb.smudge cat 9 | git config filter.clean_ipynb.required true 10 | 11 | # Set up symlinks for the example notebooks 12 | for DIRNAME in datasets checkpoints 13 | do 14 | mkdir -p ../${DIRNAME} 15 | ln -sfn ../${DIRNAME} . 16 | done 17 | --------------------------------------------------------------------------------
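In notebook 5, `shuffle=True` makes the `DataLoader` visit the training samples in a new random order each epoch, and `batch_size` groups consecutive samples of that order into batches. The pure-Python sketch below imitates that behavior for a dataset of `n` items. It is only an illustration, not the `torch.utils.data` implementation; `shuffled_batches` is a hypothetical helper, and it assumes the `DataLoader` default `drop_last=False`, so the final batch may be smaller.

```python
import random


def shuffled_batches(n, batch_size, seed=None):
    """Yield lists of dataset indices, imitating DataLoader(shuffle=True).

    Hypothetical helper for illustration only: every index in range(n)
    appears exactly once per epoch, and the last batch keeps the leftover
    items (like the default drop_last=False).
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)  # a fresh random permutation for this epoch
    for start in range(0, n, batch_size):
        yield order[start:start + batch_size]


batches = list(shuffled_batches(10, batch_size=4, seed=0))
print([len(b) for b in batches])  # [4, 4, 2]
print(sorted(i for b in batches for i in b) == list(range(10)))  # True
```

Calling it again with a different seed gives a different order but the same set of indices, which is the property that matters for training.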
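The `accuracy()` function in notebook 5 counts a prediction as correct when the index of the highest score, `scores.max(1)[1]`, equals the label, and then divides by the number of validation images. A plain-Python sketch of the same top-1 rule; `top1_accuracy` and the scores below are made up for illustration:

```python
def top1_accuracy(score_rows, labels):
    """Fraction of rows whose highest-scoring class index equals the label.

    Plain-Python analogue of (scores.max(1)[1] == labels).float().mean().
    """
    correct = 0
    for row, label in zip(score_rows, labels):
        # index of the largest score in this row, like tensor.max(1)[1]
        pred = max(range(len(row)), key=lambda k: row[k])
        correct += int(pred == label)
    return correct / len(labels)


scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]
labels = [1, 0, 0]
print(top1_accuracy(scores, labels))  # 2 of the 3 predictions match
```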
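The `fc6` layer in `make_untrained_alexnet_places` expects `w[5] * 6 * 6` input features. The 6x6 comes from the usual output-size rule for convolution and max-pooling layers without dilation and with floor rounding (the `MaxPool2d` default), `floor((n + 2*padding - kernel) / stride) + 1`, applied to the 227-pixel crop. A quick check of that arithmetic, with kernel, stride, and padding copied from the notebook; `out_size` is a hypothetical helper:

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv or max-pool layer (no dilation, floor)."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for the layers that change spatial size,
# in the order they appear in make_untrained_alexnet_places.
layers = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

n = 227  # default crop_size in get_places_data_set
sizes = {}
for name, k, s, p in layers:
    n = out_size(n, k, s, p)
    sizes[name] = n
print(sizes["pool5"])  # 6
```

So the flattened `pool5` output has 256 * 6 * 6 = 9216 features, matching the first argument of `fc6`.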
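`AverageMeter.update(val, n)` in notebook 6 weights each value by `n` (the batch size), so `avg` is a per-sample average even when the last batch is smaller. The class below is copied from the notebook, followed by a small worked example. Values passed to `update` are best kept as plain Python numbers such as `loss.item()`: accumulating a tensor that requires grad would make `sum` a tensor whose autograd history grows with every batch.

```python
class AverageMeter(object):
    '''
    To keep running averages (copied from notebook 6).
    '''
    def __init__(self):
        self.reset()

    def reset(self):
        self.val = 0.
        self.avg = 0.
        self.sum = 0.
        self.count = 0

    def update(self, val, n=1):
        # val is a per-batch mean; weight it by the batch size n
        self.val = val
        self.sum += val * n
        self.count += n
        if self.count:
            self.avg = self.sum / self.count


meter = AverageMeter()
meter.update(0.5, 10)  # a batch of 10 samples with mean accuracy 0.5
meter.update(1.0, 30)  # a batch of 30 samples with mean accuracy 1.0
print(meter.avg)       # 0.875, i.e. (0.5*10 + 1.0*30) / 40
```

A plain mean of the two batch values would give 0.75, which over-weights the smaller batch.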
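`torchvision.transforms.Normalize(mean, std)` maps each channel value x to (x - mean) / std, which is why the viewing cell in notebook 6 multiplies by `std` and adds `mean` to recover displayable colors. A per-pixel sketch with plain floats and the same mean/std constants used in `get_places_data_set`; the helper names are hypothetical, and clamping to [0, 1] is an added step so that an image viewer receives in-range values:

```python
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize(rgb):
    """What Normalize(MEAN, STD) does to one RGB pixel."""
    return tuple((x - m) / s for x, m, s in zip(rgb, MEAN, STD))


def unnormalize(rgb):
    """Inverse of normalize, clamped to [0, 1] for display."""
    return tuple(min(max(x * s + m, 0.0), 1.0)
                 for x, m, s in zip(rgb, MEAN, STD))


pixel = (0.2, 0.5, 0.9)
restored = unnormalize(normalize(pixel))
print(all(abs(a - b) < 1e-9 for a, b in zip(pixel, restored)))  # True
```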
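`ipynb_drop_output.py` reads a notebook from stdin and, unless its metadata contains `"git": {"keep_outputs": true}`, empties every cell's outputs and resets execution counts before git stores it. The same rule for nbformat-4 notebooks, restated as a pure function on a parsed dict so it can be checked without going through git. `strip_notebook` is a hypothetical name, and the older nbformat-3 `worksheets` layout that the script also handles is deliberately left out:

```python
import copy
import json


def strip_notebook(nb):
    """Return an nbformat-4 notebook dict with outputs removed.

    Mirrors the rule in ipynb_drop_output.py: when metadata["git"]["keep_outputs"]
    is truthy the notebook is returned unchanged; otherwise a stripped copy
    is returned and the caller's dict is left untouched.
    """
    if nb.get("metadata", {}).get("git", {}).get("keep_outputs"):
        return nb
    nb = copy.deepcopy(nb)
    for cell in nb.get("cells", []):
        if "outputs" in cell:
            cell["outputs"] = []
        if "execution_count" in cell:
            cell["execution_count"] = None
        cell.pop("prompt_number", None)  # legacy key, removed if present
    return nb


notebook = json.loads("""{
  "metadata": {}, "nbformat": 4, "nbformat_minor": 4,
  "cells": [{"cell_type": "code", "execution_count": 3, "metadata": {},
             "outputs": [{"output_type": "stream", "text": "hi"}],
             "source": ["print('hi')"]}]
}""")
cleaned = strip_notebook(notebook)
print(cleaned["cells"][0]["outputs"], cleaned["cells"][0]["execution_count"])
# [] None
```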