├── .gitignore ├── 00-intro.ipynb ├── 01-pytorch-basics.ipynb ├── 02-linear-regression.ipynb ├── 03-modules-and-mlps.ipynb ├── 04-optional-word2vec.ipynb ├── README.md ├── bonus-computational-efficiency.ipynb ├── broadcasting_real_examples.ipynb ├── challenges-for-true-pytorch-heroes-solutions.ipynb ├── challenges-for-true-pytorch-heroes.ipynb ├── img ├── common_mistakes.png ├── dynamic_graph.gif ├── pytorch-logo.png ├── pytorch_logo.png ├── pytorch_logo_flame.png └── the_real_reason.png ├── requirements.txt └── spec.py /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | .idea 3 | env/ 4 | -------------------------------------------------------------------------------- /00-intro.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Introduction\n", 8 | "\n", 9 | "The material for this course is here: https://github.com/mtreviso/pytorch-lecture. \n", 10 | "\n", 11 | "
\n", 12 | "
\n", 13 | " What we are NOT going to cover in this course:
\n", 14 | " How to implement SOTA models\n", 15 | "
\n", 16 | " How to optimize our code\n", 17 | "
\n", 18 | " How autograd is implemented\n", 19 | "
\n", 20 | " How to use the new fancy stuff: mobile support, distributed training, quantization, sparse tensors, etc.\n", 21 | "

\n", 22 | " Instead, we are going to:
\n", 23 | " Understand the key PyTorch concepts (e.g., tensors, modules, autograd, broadcasting, ...)\n", 24 | "
\n", 25 | " Understand what PyTorch can and cannot do\n", 26 | "
\n", 27 | " Create simple neural networks and get an idea of how we can implement more complex models in the future\n", 28 | "
\n", 29 | " Kick off with PyTorch 🚀\n", 30 | "
\n", 31 | "\n", 32 | "> If you use PyTorch on a daily basis, you will most probably not learn a lot during this lecture." 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": {}, 38 | "source": [ 39 | "---" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "# Quick Recap of Jupyter Notebooks\n", 47 | "\n", 48 | "A Jupyter notebook document has the `.ipynb` extension and is composed of a number of cells. In cells, you can write Python code, create notes in markdown style, or keep raw text. The three cell types are:\n", 49 | " \n", 50 | " 1. code\n", 51 | " 2. markdown\n", 52 | " 3. raw\n", 53 | " \n", 54 | "To work with the contents of a cell, use *edit mode* (entered by pressing **Enter** after selecting a cell); to navigate between cells, use *command mode* (entered by pressing **Esc**).\n", 55 | "\n", 56 | "The cell type can be set in command mode either using hotkeys (**y** for code, **m** for markdown, **r** for raw text), or in the menu *Cell -> Cell type* ... 
" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": {}, 62 | "source": [ 63 | "### Example" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": null, 69 | "metadata": {}, 70 | "outputs": [], 71 | "source": [ 72 | "# cell with code\n", 73 | "a = 1" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": null, 79 | "metadata": {}, 80 | "outputs": [], 81 | "source": [ 82 | "a = 2" 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "a\n", 92 | "print(a)" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "Cell with markdown text" 100 | ] 101 | }, 102 | { 103 | "cell_type": "raw", 104 | "metadata": {}, 105 | "source": [ 106 | "Cell with raw text" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "Next, press `Shift + Enter` to process the contents of the cell:\n", 114 | "run the code or render the marked-up text." 115 | ] 116 | }, 117 | { 118 | "cell_type": "markdown", 119 | "metadata": {}, 120 | "source": [ 121 | "### Basic shortcuts\n", 122 | "\n", 123 | "- `a` creates a cell above the current cell\n", 124 | "- `b` creates a cell below the current cell\n", 125 | "- `dd` deletes the current cell\n", 126 | "- `Enter` enters edit mode\n", 127 | "- `Esc` exits edit mode\n", 128 | "- `Ctrl` + `Enter` runs the cell\n", 129 | "- `Shift` + `Enter` runs the cell and creates (or jumps to) the next one\n", 130 | "- `m` converts the current cell to markdown\n", 131 | "- `y` converts the current cell to code" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "> ***Word of caution***
\n", 139 | "> Jupyter-notebook is a great tool for data science since we can see the direct effect of a snippet of code, either by plotting the result or by inspecting the direct output. However, we should be careful with the order in which we run cells (this is a common source of errors).\n" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "---" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": {}, 152 | "source": [ 153 | "# PyTorch Overview\n", 154 | "\n", 155 | "\n", 156 | "> \"PyTorch - From Research To Production\n", 157 | "> \n", 158 | "> An open source machine learning framework that accelerates the path from research prototyping to production deployment.\"\n", 159 | "> -- https://pytorch.org/" 160 | ] 161 | }, 162 | { 163 | "cell_type": "markdown", 164 | "metadata": {}, 165 | "source": [ 166 | "## \"Build by run\" - what is that and why do I care?\n", 167 | "\n" 168 | ] 169 | }, 170 | { 171 | "cell_type": "markdown", 172 | "metadata": {}, 173 | "source": [ 174 | "" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "A very practical reason to use PyTorch:" 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "import torch\n", 191 | "import ipdb\n", 192 | "\n", 193 | "def f(x):\n", 194 | " res = x + x\n", 195 | " ipdb.set_trace() # <-- :o\n", 196 | " return res\n", 197 | "\n", 198 | "x = torch.randn(1, 8)\n", 199 | "f(x)" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": {}, 205 | "source": [ 206 | "## Other reasons for using PyTorch\n" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "\n", 221 | "- Seamless GPU integration\n", 222 | "- Production ready\n", 
223 | "- Distributed training\n", 224 | "- Mobile support\n", 225 | "- Cloud support\n", 226 | "- Robust ecosystem\n", 227 | "- C++ front-end\n" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": {}, 233 | "source": [ 234 | "## Other neural network toolkits you might want to check out\n", 235 | "- TensorFlow\n", 236 | "- JAX\n", 237 | "- MXNet\n", 238 | "- Keras\n", 239 | "- CNTK\n", 240 | "- Chainer\n", 241 | "- Caffe\n", 242 | "- Caffe2\n", 243 | "- DyNet\n", 244 | "- many, many more\n", 245 | "\n", 246 | "Which one to choose? There is no silver bullet. All of them are good!\n" 247 | ] 248 | }, 249 | { 250 | "cell_type": "markdown", 251 | "metadata": {}, 252 | "source": [ 253 | "---" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "# Useful Links\n", 261 | "\n", 262 | "- Twitter: https://twitter.com/PyTorch\n", 263 | "- Forum: https://discuss.pytorch.org/\n", 264 | "- Tutorials: https://pytorch.org/tutorials/\n", 265 | "- Examples: https://github.com/pytorch/examples\n", 266 | "- API Reference: https://pytorch.org/docs/stable/index.html\n", 267 | "- Torchvision: https://pytorch.org/docs/stable/torchvision/index.html\n", 268 | "- PyTorch Text: https://github.com/pytorch/text\n", 269 | "- PyTorch Audio: https://github.com/pytorch/audio\n", 270 | "\n", 271 | "\n", 272 | "More tutorials:\n", 273 | "- https://github.com/sotte/pytorch_tutorial\n", 274 | "- https://github.com/erickrf/pytorch-lecture\n", 275 | "- https://github.com/goncalomcorreia/pytorch-lecture" 276 | ] 277 | } 278 | ], 279 | "metadata": { 280 | "kernelspec": { 281 | "display_name": "Python 3 (ipykernel)", 282 | "language": "python", 283 | "name": "python3" 284 | }, 285 | "language_info": { 286 | "codemirror_mode": { 287 | "name": "ipython", 288 | "version": 3 289 | }, 290 | "file_extension": ".py", 291 | "mimetype": "text/x-python", 292 | "name": "python", 293 | "nbconvert_exporter": "python", 294 | "pygments_lexer": 
"ipython3", 295 | "version": "3.9.7" 296 | } 297 | }, 298 | "nbformat": 4, 299 | "nbformat_minor": 2 300 | } 301 | -------------------------------------------------------------------------------- /01-pytorch-basics.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# An introduction to PyTorch\n", 8 | "\n", 9 | "PyTorch is a platform for deep learning in Python or C++. In this lecture we will focus on the **Python** API. " 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": {}, 15 | "source": [ 16 | "# Tensors\n", 17 | "\n", 18 | "Tensors are the elementary units of PyTorch. They are very similar to numpy arrays." 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "import numpy as np\n", 28 | "np.random.seed(0)\n", 29 | "\n", 30 | "import torch\n", 31 | "torch.manual_seed(0)" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": null, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "x = np.array([1.0, 2.0, 3.0])\n", 41 | "y = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": null, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "x" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": null, 56 | "metadata": {}, 57 | "outputs": [], 58 | "source": [ 59 | "y" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": null, 65 | "metadata": {}, 66 | "outputs": [], 67 | "source": [ 68 | "z = y ** 2\n", 69 | "z" 70 | ] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": {}, 75 | "source": [ 76 | "Broadly speaking, a tensor is like a numpy array that can carry gradient information from the chain of operations applied on top of it. 
There are other differences, but this is the key distinction." 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": {}, 82 | "source": [ 83 | "## Creating tensors " 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": {}, 90 | "outputs": [], 91 | "source": [ 92 | "# directly from data\n", 93 | "data = [[0, 1], [1, 0]]\n", 94 | "x_data = torch.tensor(data)\n", 95 | "x_data" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": null, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "# from a numpy array (the tensor shares memory with the array)\n", 105 | "x_numpy = np.array([[1, 2], [3, 4]])\n", 106 | "x_torch = torch.from_numpy(x_numpy)\n", 107 | "x_torch" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "# convert it back to a numpy array\n", 117 | "x_numpy = x_torch.numpy()\n", 118 | "x_numpy" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": null, 124 | "metadata": {}, 125 | "outputs": [], 126 | "source": [ 127 | "# with constant data\n", 128 | "x = torch.ones(2, 3) # 2 rows and 3 columns\n", 129 | "print(x)\n", 130 | "y = torch.zeros(3, 2) # 3 rows and 2 columns\n", 131 | "print(y)\n", 132 | "z = torch.full((3, 1), -5) # 3 rows and 1 column (aka column vector)\n", 133 | "print(z)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": null, 139 | "metadata": {}, 140 | "outputs": [], 141 | "source": [ 142 | "# with random data\n", 143 | "x = torch.rand(2, 3) # uniform distribution U(0, 1)\n", 144 | "print(x)\n", 145 | "y = torch.randn(2, 3) # standard gaussian N(0, 1)\n", 146 | "print(y)\n", 147 | "z = torch.randint(0, 10, size=(2, 3)) # random integers in [0, 10)\n", 148 | "print(z)" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": null, 154 | "metadata": {}, 155 | "outputs": [], 156 | "source": [ 157 | "# other 
initializations\n", 158 | "print(torch.arange(5)) # from 0 (inclusive) to 5 (exclusive)\n", 159 | "print(torch.arange(2, 8)) # from 2 (inclusive) to 8 (exclusive)\n", 160 | "print(torch.arange(2, 8, 2)) # from 2 to 8 (exclusive), with step size 2\n", 161 | "\n", 162 | "print(torch.linspace(0, 1, 6)) # returns 6 linearly spaced numbers from 0 to 1 (inclusive)\n", 163 | "print(torch.linspace(-1, 1, 8)) # returns 8 linearly spaced numbers from -1 to 1 (inclusive)\n", 164 | "\n", 165 | "print(torch.eye(3)) # identity matrix" 166 | ] 167 | }, 168 | { 169 | "cell_type": "markdown", 170 | "metadata": {}, 171 | "source": [ 172 | "See the full set of creation ops [here](https://pytorch.org/docs/stable/torch.html#creation-ops)." 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "## Tensor attributes" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": null, 185 | "metadata": {}, 186 | "outputs": [], 187 | "source": [ 188 | "x = torch.rand(3, 4, requires_grad=True)\n", 189 | "print(x.device)\n", 190 | "print(x.shape)\n", 191 | "print(x.dtype)\n", 192 | "print(x)\n", 193 | "print(x.data)\n", 194 | "print(x[0, 0])\n", 195 | "print(x[0, 0].item())" 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "Tensor data types:\n", 203 | "\n", 204 | "\n", 205 | "\n", 206 | "\n", 207 | "\n", 208 | "\n", 209 | "\n", 210 | "\n", 211 | "\n", 212 | "\n", 213 | "\n", 214 | "\n", 215 | "\n", 216 | "\n", 217 | "\n", 218 | "\n", 219 | "\n", 220 | "\n", 221 | "\n", 222 | "\n", 223 | "\n", 224 | "\n", 225 | "\n", 226 | "\n", 227 | "\n", 228 | "\n", 229 | "\n", 230 | "\n", 231 | "\n", 232 | "\n", 233 | "\n", 234 | "\n", 235 | "\n", 236 | "\n", 237 | "\n", 238 | "\n", 239 | "\n", 240 | "\n", 241 | "\n", 242 | "\n", 243 | "\n", 244 | "\n", 245 | "\n", 246 | "\n", 247 | "\n", 248 | "\n", 249 | "\n", 250 | "\n", 251 | "\n", 252 | "\n", 253 | "\n", 254 | "\n", 255 | "\n", 256 | "\n", 257 | "\n", 258 | "\n", 259 | "\n", 260 | "\n", 
261 | "

| Data type | dtype | Legacy Constructors |
| --- | --- | --- |
| 32-bit floating point | `torch.float32` or `torch.float` | `torch.*.FloatTensor` |
| 64-bit floating point | `torch.float64` or `torch.double` | `torch.*.DoubleTensor` |
| 64-bit complex | `torch.complex64` or `torch.cfloat` | |
| 128-bit complex | `torch.complex128` or `torch.cdouble` | |
| 16-bit floating point (half) | `torch.float16` or `torch.half` | `torch.*.HalfTensor` |
| 16-bit floating point (bfloat16) | `torch.bfloat16` | `torch.*.BFloat16Tensor` |
| 8-bit integer (unsigned) | `torch.uint8` | `torch.*.ByteTensor` |
| 8-bit integer (signed) | `torch.int8` | `torch.*.CharTensor` |
| 16-bit integer (signed) | `torch.int16` or `torch.short` | `torch.*.ShortTensor` |
| 32-bit integer (signed) | `torch.int32` or `torch.int` | `torch.*.IntTensor` |
| 64-bit integer (signed) | `torch.int64` or `torch.long` | `torch.*.LongTensor` |
| Boolean | `torch.bool` | `torch.*.BoolTensor` |

\n" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "Mixing dtypes promotes the result according to the regular Python ordering:\n", 269 | "```\n", 270 | "complex > floating > integral > boolean\n", 271 | "```\n", 272 | "\n", 273 | "Also, be careful when casting to a narrower dtype (e.g., `long` to `int`) or from floating point to integer, since values can overflow or be truncated:" 274 | ] 275 | }, 276 | { 277 | "cell_type": "code", 278 | "execution_count": null, 279 | "metadata": {}, 280 | "outputs": [], 281 | "source": [ 282 | "float_tensor = torch.randn(2, 2, dtype=torch.float)\n", 283 | "int_tensor = torch.ones(1, dtype=torch.int)\n", 284 | "long_tensor = torch.ones(1, dtype=torch.long)\n", 285 | "uint_tensor = torch.ones(1, dtype=torch.uint8)" 286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": null, 291 | "metadata": {}, 292 | "outputs": [], 293 | "source": [ 294 | "long_tensor_big_number = long_tensor * 2**33\n", 295 | "long_tensor_big_number, long_tensor_big_number.int()" 296 | ] 297 | }, 298 | { 299 | "cell_type": "code", 300 | "execution_count": null, 301 | "metadata": {}, 302 | "outputs": [], 303 | "source": [ 304 | "float_tensor, float_tensor.long()" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": {}, 310 | "source": [ 311 | "See the full list of attributes [here](https://pytorch.org/docs/stable/tensor_attributes.html)" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "## Examples" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "metadata": {}, 325 | "outputs": [], 326 | "source": [ 327 | "# scalar\n", 328 | "x = torch.tensor(2)\n", 329 | "print(x)\n", 330 | "print(x.shape)\n", 331 | "print(x.item()) # access the (single) element inside the tensor\n", 332 | "print('')\n", 333 | "\n", 334 | "# vector\n", 335 | "x = torch.rand(4)\n", 336 | "print(x)\n", 337 | "print(x.shape)\n", 338 | "print('')\n", 339 | "\n", 340 | "# matrix\n", 341 | "x = torch.rand(4, 
3)\n", 342 | "print(x)\n", 343 | "print(x.shape)\n", 344 | "print('')\n", 345 | "\n", 346 | "# n-dimensional array\n", 347 | "x = torch.rand(3, 4, 3) # e.g., image with height=3, width=4, and channels=3\n", 348 | "print(x)\n", 349 | "print(x.shape)\n", 350 | "print('')\n", 351 | "\n", 352 | "from matplotlib import pyplot as plt; plt.imshow(x)" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": {}, 358 | "source": [ 359 | "## Tensor operations" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": null, 365 | "metadata": {}, 366 | "outputs": [], 367 | "source": [ 368 | "v1 = torch.arange(8)\n", 369 | "v2 = torch.arange(10, 18)\n", 370 | "\n", 371 | "print(\"v1: %s\" % v1)\n", 372 | "print(\"v2: %s\" % v2)\n", 373 | "print(\"Dot product: %d\" % v1.dot(v2))" 374 | ] 375 | }, 376 | { 377 | "cell_type": "markdown", 378 | "metadata": {}, 379 | "source": [ 380 | "#### You can also change a value inside the tensor manually" 381 | ] 382 | }, 383 | { 384 | "cell_type": "code", 385 | "execution_count": null, 386 | "metadata": {}, 387 | "outputs": [], 388 | "source": [ 389 | "v2[1] = 25\n", 390 | "print(v2)" 391 | ] 392 | }, 393 | { 394 | "cell_type": "markdown", 395 | "metadata": {}, 396 | "source": [ 397 | "**Accessing values:**" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": {}, 403 | "source": [ 404 | "Individual tensor positions are scalars, i.e., 0-dimensional tensors:" 405 | ] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "execution_count": null, 410 | "metadata": {}, 411 | "outputs": [], 412 | "source": [ 413 | "print(v1[0])\n", 414 | "print(v1[0].shape)" 415 | ] 416 | }, 417 | { 418 | "cell_type": "markdown", 419 | "metadata": {}, 420 | "source": [ 421 | "`.item()` returns a Python number:" 422 | ] 423 | }, 424 | { 425 | "cell_type": "code", 426 | "execution_count": null, 427 | "metadata": {}, 428 | "outputs": [], 429 | "source": [ 430 | "number = v1[0].item()\n", 431 | "print(number)\n", 432 | 
"print(isinstance(number, int))" 433 | ] 434 | }, 435 | { 436 | "cell_type": "markdown", 437 | "metadata": {}, 438 | "source": [ 439 | "**Numpy-style indexing:**" 440 | ] 441 | }, 442 | { 443 | "cell_type": "code", 444 | "execution_count": null, 445 | "metadata": {}, 446 | "outputs": [], 447 | "source": [ 448 | "m = torch.randn(3, 4, 3)\n", 449 | "m" 450 | ] 451 | }, 452 | { 453 | "cell_type": "code", 454 | "execution_count": null, 455 | "metadata": { 456 | "scrolled": true 457 | }, 458 | "outputs": [], 459 | "source": [ 460 | "m[0,1,0]" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": null, 466 | "metadata": {}, 467 | "outputs": [], 468 | "source": [ 469 | "m[:, 1, 0]" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "execution_count": null, 475 | "metadata": {}, 476 | "outputs": [], 477 | "source": [ 478 | "m[0, :, -1]" 479 | ] 480 | }, 481 | { 482 | "cell_type": "code", 483 | "execution_count": null, 484 | "metadata": {}, 485 | "outputs": [], 486 | "source": [ 487 | "m[:, :, -1]" 488 | ] 489 | }, 490 | { 491 | "cell_type": "code", 492 | "execution_count": null, 493 | "metadata": {}, 494 | "outputs": [], 495 | "source": [ 496 | "m[..., -1]" 497 | ] 498 | }, 499 | { 500 | "cell_type": "markdown", 501 | "metadata": {}, 502 | "source": [ 503 | "## Elementwise operations" 504 | ] 505 | }, 506 | { 507 | "cell_type": "code", 508 | "execution_count": null, 509 | "metadata": {}, 510 | "outputs": [], 511 | "source": [ 512 | "v1" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": null, 518 | "metadata": {}, 519 | "outputs": [], 520 | "source": [ 521 | "v2" 522 | ] 523 | }, 524 | { 525 | "cell_type": "code", 526 | "execution_count": null, 527 | "metadata": {}, 528 | "outputs": [], 529 | "source": [ 530 | "v1 + v2" 531 | ] 532 | }, 533 | { 534 | "cell_type": "code", 535 | "execution_count": null, 536 | "metadata": {}, 537 | "outputs": [], 538 | "source": [ 539 | "v1 * v2" 540 | ] 541 | }, 542 | { 543 | 
"cell_type": "markdown", 544 | "metadata": {}, 545 | "source": [ 546 | "Some caveats when working with integer values!" 547 | ] 548 | }, 549 | { 550 | "cell_type": "code", 551 | "execution_count": null, 552 | "metadata": {}, 553 | "outputs": [], 554 | "source": [ 555 | "v1 / v2 " 556 | ] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "execution_count": null, 561 | "metadata": {}, 562 | "outputs": [], 563 | "source": [ 564 | "x = v1.float()\n", 565 | "y = v2.float()\n", 566 | "x / y" 567 | ] 568 | }, 569 | { 570 | "cell_type": "markdown", 571 | "metadata": {}, 572 | "source": [ 573 | "#### Operations with constants" 574 | ] 575 | }, 576 | { 577 | "cell_type": "code", 578 | "execution_count": null, 579 | "metadata": {}, 580 | "outputs": [], 581 | "source": [ 582 | "x" 583 | ] 584 | }, 585 | { 586 | "cell_type": "code", 587 | "execution_count": null, 588 | "metadata": {}, 589 | "outputs": [], 590 | "source": [ 591 | "x + 1" 592 | ] 593 | }, 594 | { 595 | "cell_type": "code", 596 | "execution_count": null, 597 | "metadata": {}, 598 | "outputs": [], 599 | "source": [ 600 | "x ** 2" 601 | ] 602 | }, 603 | { 604 | "cell_type": "markdown", 605 | "metadata": {}, 606 | "source": [ 607 | "## Aggregating tensors" 608 | ] 609 | }, 610 | { 611 | "cell_type": "code", 612 | "execution_count": null, 613 | "metadata": {}, 614 | "outputs": [], 615 | "source": [ 616 | "(x ** 2).sum().sqrt()" 617 | ] 618 | }, 619 | { 620 | "cell_type": "code", 621 | "execution_count": null, 622 | "metadata": {}, 623 | "outputs": [], 624 | "source": [ 625 | "x.mean(), x.std()" 626 | ] 627 | }, 628 | { 629 | "cell_type": "code", 630 | "execution_count": null, 631 | "metadata": {}, 632 | "outputs": [], 633 | "source": [ 634 | "x.min(), x.max()" 635 | ] 636 | }, 637 | { 638 | "cell_type": "code", 639 | "execution_count": null, 640 | "metadata": {}, 641 | "outputs": [], 642 | "source": [ 643 | "x.norm(p=3)" 644 | ] 645 | }, 646 | { 647 | "cell_type": "markdown", 648 | "metadata": {}, 649 | "source": [ 
650 | "## Joining tensors" 651 | ] 652 | }, 653 | { 654 | "cell_type": "code", 655 | "execution_count": null, 656 | "metadata": {}, 657 | "outputs": [], 658 | "source": [ 659 | "torch.cat([x, y]) # concatenates along an existing dimension (dim=0 by default)" 660 | ] 661 | }, 662 | { 663 | "cell_type": "code", 664 | "execution_count": null, 665 | "metadata": {}, 666 | "outputs": [], 667 | "source": [ 668 | "z = torch.stack([x, y]) # stacks along a new dimension\n", 669 | "z" 670 | ] 671 | }, 672 | { 673 | "cell_type": "code", 674 | "execution_count": null, 675 | "metadata": {}, 676 | "outputs": [], 677 | "source": [ 678 | "torch.vstack([z, x])" 679 | ] 680 | }, 681 | { 682 | "cell_type": "markdown", 683 | "metadata": {}, 684 | "source": [ 685 | "## Tensor multiplication" 686 | ] 687 | }, 688 | { 689 | "cell_type": "code", 690 | "execution_count": null, 691 | "metadata": {}, 692 | "outputs": [], 693 | "source": [ 694 | "m1 = torch.rand(5, 4)\n", 695 | "m2 = torch.rand(4, 5)\n", 696 | "\n", 697 | "print(\"m1: %s\\n\" % m1)\n", 698 | "print(\"m2: %s\\n\" % m2)\n", 699 | "print(m1.dot(m2)) # raises a RuntimeError: dot expects 1D tensors" 700 | ] 701 | }, 702 | { 703 | "cell_type": "markdown", 704 | "metadata": {}, 705 | "source": [ 706 | "Oops... this fails, which can be surprising if you are used to numpy. In PyTorch, `dot` is reserved for vectors (1D tensors) only.\n", 707 | "For matrices, call `mm`:" 708 | ] 709 | }, 710 | { 711 | "cell_type": "code", 712 | "execution_count": null, 713 | "metadata": { 714 | "scrolled": true 715 | }, 716 | "outputs": [], 717 | "source": [ 718 | "print(m1.mm(m2))" 719 | ] 720 | }, 721 | { 722 | "cell_type": "markdown", 723 | "metadata": {}, 724 | "source": [ 725 | "Or use `@`, Python's built-in operator for matrix multiplication:" 726 | ] 727 | }, 728 | { 729 | "cell_type": "code", 730 | "execution_count": null, 731 | "metadata": {}, 732 | "outputs": [], 733 | "source": [ 734 | "print(m1 @ m2)" 735 | ] 736 | }, 737 | { 738 | "cell_type": "markdown", 739 | "metadata": {}, 740 | "source": [ 741 | "What if I have batched data? 
Use `.bmm()`, which multiplies the matrices batch-wise (batched shapes are a common source of errors)" 742 | ] 743 | }, 744 | { 745 | "cell_type": "code", 746 | "execution_count": null, 747 | "metadata": {}, 748 | "outputs": [], 749 | "source": [ 750 | "m1 = torch.rand(2, 5, 4)\n", 751 | "m2 = torch.rand(2, 4, 5)\n", 752 | "\n", 753 | "print(m1.bmm(m2))" 754 | ] 755 | }, 756 | { 757 | "cell_type": "markdown", 758 | "metadata": {}, 759 | "source": [ 760 | "`@` also works like `.bmm()` here!" 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": null, 766 | "metadata": { 767 | "scrolled": true 768 | }, 769 | "outputs": [], 770 | "source": [ 771 | "print(m1 @ m2)" 772 | ] 773 | }, 774 | { 775 | "cell_type": "markdown", 776 | "metadata": {}, 777 | "source": [ 778 | "What if I have even more dimensions?" 779 | ] 780 | }, 781 | { 782 | "cell_type": "code", 783 | "execution_count": null, 784 | "metadata": {}, 785 | "outputs": [], 786 | "source": [ 787 | "m1 = torch.rand(2, 3, 5, 4)\n", 788 | "m2 = torch.rand(2, 3, 4, 5)\n", 789 | "\n", 790 | "print(m1.bmm(m2)) # raises a RuntimeError: bmm expects 3D tensors" 791 | ] 792 | }, 793 | { 794 | "cell_type": "markdown", 795 | "metadata": {}, 796 | "source": [ 797 | "`.bmm` works only with 3D tensors. For tensors with more dimensions, we can use the more general `matmul`. In fact, the `@` operator is a shorthand for `matmul` (which is implemented in the magic method `__matmul__`)." 798 | ] 799 | }, 800 | { 801 | "cell_type": "code", 802 | "execution_count": null, 803 | "metadata": { 804 | "scrolled": true 805 | }, 806 | "outputs": [], 807 | "source": [ 808 | "print(m1.matmul(m2).shape)\n", 809 | "print(m1.matmul(m2))" 810 | ] 811 | }, 812 | { 813 | "cell_type": "markdown", 814 | "metadata": {}, 815 | "source": [ 816 | "Another option is to use the powerful `einsum` function. 
Let's say our inputs have the following dimensions:\n", 817 | "- `b` = batch size \n", 818 | "- `c` = channels\n", 819 | "- `i` = `m1` timesteps\n", 820 | "- `j` = `m2` timesteps\n", 821 | "- `d` = hidden size" 822 | ] 823 | }, 824 | { 825 | "cell_type": "code", 826 | "execution_count": null, 827 | "metadata": { 828 | "scrolled": true 829 | }, 830 | "outputs": [], 831 | "source": [ 832 | "torch.einsum('bcid,bcdj->bcij', m1, m2)" 833 | ] 834 | }, 835 | { 836 | "cell_type": "markdown", 837 | "metadata": {}, 838 | "source": [ 839 | "See more about `einsum` here: https://pytorch.org/docs/master/generated/torch.einsum.html#torch.einsum" 840 | ] 841 | }, 842 | { 843 | "cell_type": "markdown", 844 | "metadata": {}, 845 | "source": [ 846 | "## Broadcasting\n", 847 | "\n", 848 | "Broadcasting means performing an arithmetic operation on tensors of different shapes (for example, different ranks), as if the smaller one were expanded, or broadcast, to match the larger.\n", 849 | "\n", 850 | "Let's experiment with a matrix (rank 2 tensor) and a vector (rank 1)." 
851 | ] 852 | }, 853 | { 854 | "cell_type": "code", 855 | "execution_count": null, 856 | "metadata": {}, 857 | "outputs": [], 858 | "source": [ 859 | "m = torch.rand(5, 4)\n", 860 | "v = torch.arange(4)" 861 | ] 862 | }, 863 | { 864 | "cell_type": "code", 865 | "execution_count": null, 866 | "metadata": {}, 867 | "outputs": [], 868 | "source": [ 869 | "print(\"m:\", m)\n", 870 | "print(\"v:\", v)" 871 | ] 872 | }, 873 | { 874 | "cell_type": "code", 875 | "execution_count": null, 876 | "metadata": {}, 877 | "outputs": [], 878 | "source": [ 879 | "m_plus_v = m + v\n", 880 | "print(\"m + v:\\n\", m_plus_v)" 881 | ] 882 | }, 883 | { 884 | "cell_type": "markdown", 885 | "metadata": {}, 886 | "source": [ 887 | "Sanity check: the first row of `m + v` should equal `m[0] + v`" 888 | ] 889 | }, 890 | { 891 | "cell_type": "code", 892 | "execution_count": null, 893 | "metadata": {}, 894 | "outputs": [], 895 | "source": [ 896 | "print(\"m[0] = %s\\n\" % m[0])\n", 897 | "print(\"v = %s\\n\" % v)\n", 898 | "\n", 899 | "row_sum = m[0] + v\n", 900 | "print(\"m[0] + v = %s\\n\" % row_sum)\n", 901 | "print(\"(m + v)[0] = %s\" % m_plus_v[0])" 902 | ] 903 | }, 904 | { 905 | "cell_type": "markdown", 906 | "metadata": {}, 907 | "source": [ 908 | "We can also reshape tensors" 909 | ] 910 | }, 911 | { 912 | "cell_type": "code", 913 | "execution_count": null, 914 | "metadata": {}, 915 | "outputs": [], 916 | "source": [ 917 | "v.shape" 918 | ] 919 | }, 920 | { 921 | "cell_type": "code", 922 | "execution_count": null, 923 | "metadata": {}, 924 | "outputs": [], 925 | "source": [ 926 | "v" 927 | ] 928 | }, 929 | { 930 | "cell_type": "code", 931 | "execution_count": null, 932 | "metadata": {}, 933 | "outputs": [], 934 | "source": [ 935 | "v = v.view(2, 2)\n", 936 | "v" 937 | ] 938 | }, 939 | { 940 | "cell_type": "code", 941 | "execution_count": null, 942 | "metadata": {}, 943 | "outputs": [], 944 | "source": [ 945 | "v = v.view(4, 1)\n", 946 | "v" 947 | ] 948 | }, 949 | { 950 | "cell_type": "markdown", 951 | "metadata": {}, 952 | "source": [ 953 | "Note that shape `[4, 1]` is not broadcastable to match `[5, 4]` (in the first dimension, 4 != 5 and neither size is 1)!" 954 | ] 955 | }, 956 | { 957 | "cell_type": "code", 958 | "execution_count": null, 959 | "metadata": {}, 960 | "outputs": [], 961 | "source": [ 962 | "m + v # raises a RuntimeError" 963 | ] 964 | }, 965 | { 966 | "cell_type": "markdown", 967 | "metadata": {}, 968 | "source": [ 969 | "... but `[1, 4]` is!" 970 | ] 971 | }, 972 | { 973 | "cell_type": "code", 974 | "execution_count": null, 975 | "metadata": {}, 976 | "outputs": [], 977 | "source": [ 978 | "v = v.view(1, 4)\n", 979 | "m + v" 980 | ] 981 | }, 982 | { 983 | "cell_type": "markdown", 984 | "metadata": {}, 985 | "source": [ 986 | "## Squeezing and Unsqueezing\n", 987 | "\n", 988 | "Broadcasting is one of the most important concepts for manipulating n-dimensional arrays. PyTorch offers several ways to add or remove dimensions of size 1, which changes the rank of a tensor. " 989 | ] 990 | }, 991 | { 992 | "cell_type": "code", 993 | "execution_count": null, 994 | "metadata": {}, 995 | "outputs": [], 996 | "source": [ 997 | "v = torch.rand(4).view(1, 4, 1)\n", 998 | "print(v)\n", 999 | "print(v.shape)" 1000 | ] 1001 | }, 1002 | { 1003 | "cell_type": "code", 1004 | "execution_count": null, 1005 | "metadata": {}, 1006 | "outputs": [], 1007 | "source": [ 1008 | "v.squeeze().shape # remove all dimensions of size 1" 1009 | ] 1010 | }, 1011 | { 1012 | "cell_type": "code", 1013 | "execution_count": null, 1014 | "metadata": {}, 1015 | "outputs": [], 1016 | "source": [ 1017 | "v.squeeze(0).shape # remove only dimension 0 (which has size 1)" 1018 | ] 1019 | }, 1020 | { 1021 | "cell_type": "code", 1022 | "execution_count": null, 1023 | "metadata": {}, 1024 | "outputs": [], 1025 | "source": [ 1026 | "v.unsqueeze(1).shape # insert a new dimension of size 1 at index 1" 1027 | ] 1028 | }, 1029 | { 1030 | "cell_type": "code", 1031 | "execution_count": null, 1032 | "metadata": {}, 1033 | "outputs": [], 1034 | "source": [ 1035 | "# using numpy notation (better since it explicitly says where a new 
dimension is being created)\n", 1036 | "v[:, None].shape" 1037 | ] 1038 | }, 1039 | { 1040 | "cell_type": "code", 1041 | "execution_count": null, 1042 | "metadata": {}, 1043 | "outputs": [], 1044 | "source": [ 1045 | "v.unsqueeze(1).unsqueeze(-1).unsqueeze(1).shape # what does unsqueeze(1).unsqueeze(1) do?" 1046 | ] 1047 | }, 1048 | { 1049 | "cell_type": "code", 1050 | "execution_count": null, 1051 | "metadata": {}, 1052 | "outputs": [], 1053 | "source": [ 1054 | "v[:, None, None, ..., None].shape" 1055 | ] 1056 | }, 1057 | { 1058 | "cell_type": "code", 1059 | "execution_count": null, 1060 | "metadata": {}, 1061 | "outputs": [], 1062 | "source": [ 1063 | "# we can also use .view(dims) as long as the specified dims are valid\n", 1064 | "v.view(1, 1, 1, 4, 1, 1).shape" 1065 | ] 1066 | }, 1067 | { 1068 | "cell_type": "markdown", 1069 | "metadata": {}, 1070 | "source": [ 1071 | "## General Broadcast Semantics" 1072 | ] 1073 | }, 1074 | { 1075 | "cell_type": "markdown", 1076 | "metadata": {}, 1077 | "source": [ 1078 | "Two tensors are “broadcastable” if the following rules hold:\n", 1079 | "\n", 1080 | "- Each tensor has at least one dimension.\n", 1081 | "\n", 1082 | "- When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist." 1083 | ] 1084 | }, 1085 | { 1086 | "cell_type": "code", 1087 | "execution_count": null, 1088 | "metadata": {}, 1089 | "outputs": [], 1090 | "source": [ 1091 | "x = torch.rand(5,7,3)\n", 1092 | "y = torch.rand(5,7,3)\n", 1093 | "z = x + y\n", 1094 | "# same shapes are always broadcastable (i.e. 
the above rules always hold)" 1095 | ] 1096 | }, 1097 | { 1098 | "cell_type": "code", 1099 | "execution_count": null, 1100 | "metadata": {}, 1101 | "outputs": [], 1102 | "source": [ 1103 | "x = torch.rand((0,))\n", 1104 | "y = torch.rand(2,2)\n", 1105 | "print(x.shape)\n", 1106 | "z = x + y\n", 1107 | "# x and y are not broadcastable: x is empty (shape [0]), and its size 0 is neither 1 nor equal to y's trailing size 2" 1108 | ] 1109 | }, 1110 | { 1111 | "cell_type": "code", 1112 | "execution_count": null, 1113 | "metadata": {}, 1114 | "outputs": [], 1115 | "source": [ 1116 | "# can line up trailing dimensions\n", 1117 | "x = torch.empty(5,3,4,1)\n", 1118 | "y = torch.empty(  3,1,1)\n", 1119 | "z = x + y\n", 1120 | "# x and y are broadcastable.\n", 1121 | "# 1st trailing dimension: both have size 1\n", 1122 | "# 2nd trailing dimension: y has size 1\n", 1123 | "# 3rd trailing dimension: x size == y size\n", 1124 | "# 4th trailing dimension: y dimension doesn't exist" 1125 | ] 1126 | }, 1127 | { 1128 | "cell_type": "code", 1129 | "execution_count": null, 1130 | "metadata": {}, 1131 | "outputs": [], 1132 | "source": [ 1133 | "# but:\n", 1134 | "x = torch.empty(5,2,4,1)\n", 1135 | "y = torch.empty(  3,1,1)\n", 1136 | "z = x + y\n", 1137 | "# x and y are not broadcastable, because in the 3rd trailing dimension 2 != 3" 1138 | ] 1139 | }, 1140 | { 1141 | "cell_type": "markdown", 1142 | "metadata": {}, 1143 | "source": [ 1144 | "Always take care of tensor shapes! It is good practice to check how an expression is evaluated before adding it to your codebase. \n", 1145 | "\n", 1146 | "\n", 1147 | "\n", 1148 | "See more here: https://pytorch.org/docs/master/notes/broadcasting.html" 1149 | ] 1150 | }, 1151 | { 1152 | "cell_type": "markdown", 1153 | "metadata": {}, 1154 | "source": [ 1155 | "## Useful Functions\n", 1156 | "\n", 1157 | "PyTorch (and other libraries) have many functions that operate on tensors. Let's try some of them and plot the results." 
1158 | ] 1159 | }, 1160 | { 1161 | "cell_type": "code", 1162 | "execution_count": null, 1163 | "metadata": {}, 1164 | "outputs": [], 1165 | "source": [ 1166 | "import matplotlib.pyplot as plt" 1167 | ] 1168 | }, 1169 | { 1170 | "cell_type": "markdown", 1171 | "metadata": {}, 1172 | "source": [ 1173 | "Create a vector x with values from -10 up to (but not including) 10, in steps of 0.1." 1174 | ] 1175 | }, 1176 | { 1177 | "cell_type": "code", 1178 | "execution_count": null, 1179 | "metadata": {}, 1180 | "outputs": [], 1181 | "source": [ 1182 | "x = torch.arange(-10, 10, 0.1, dtype=torch.float)" 1183 | ] 1184 | }, 1185 | { 1186 | "cell_type": "code", 1187 | "execution_count": null, 1188 | "metadata": {}, 1189 | "outputs": [], 1190 | "source": [ 1191 | "x.shape" 1192 | ] 1193 | }, 1194 | { 1195 | "cell_type": "code", 1196 | "execution_count": null, 1197 | "metadata": {}, 1198 | "outputs": [], 1199 | "source": [ 1200 | "y = x.sin()\n", 1201 | "plt.plot(x.numpy(), y.numpy())" 1202 | ] 1203 | }, 1204 | { 1205 | "cell_type": "code", 1206 | "execution_count": null, 1207 | "metadata": {}, 1208 | "outputs": [], 1209 | "source": [ 1210 | "y = x.tanh()\n", 1211 | "plt.plot(x.numpy(), y.numpy())" 1212 | ] 1213 | }, 1214 | { 1215 | "cell_type": "code", 1216 | "execution_count": null, 1217 | "metadata": {}, 1218 | "outputs": [], 1219 | "source": [ 1220 | "y = x.exp()\n", 1221 | "plt.plot(x.numpy(), y.numpy())" 1222 | ] 1223 | }, 1224 | { 1225 | "cell_type": "code", 1226 | "execution_count": null, 1227 | "metadata": {}, 1228 | "outputs": [], 1229 | "source": [ 1230 | "y = torch.log(x) # log is only defined for x > 0; non-positive inputs give nan or -inf and are not drawn\n", 1231 | "plt.plot(x.numpy(), y.numpy())" 1232 | ] 1233 | }, 1234 | { 1235 | "cell_type": "markdown", 1236 | "metadata": {}, 1237 | "source": [ 1238 | "# But what about GPUs?\n", 1239 | "How do I use a GPU?" 
1240 | ] 1241 | }, 1242 | { 1243 | "cell_type": "code", 1244 | "execution_count": null, 1245 | "metadata": {}, 1246 | "outputs": [], 1247 | "source": [ 1248 | "my_device = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n", 1249 | "my_device" 1250 | ] 1251 | }, 1252 | { 1253 | "cell_type": "markdown", 1254 | "metadata": {}, 1255 | "source": [ 1256 | "If you have a GPU you should get something like: \n", 1257 | "`device(type='cuda', index=0)`" 1258 | ] 1259 | }, 1260 | { 1261 | "cell_type": "code", 1262 | "execution_count": null, 1263 | "metadata": {}, 1264 | "outputs": [], 1265 | "source": [ 1266 | "# you can initialize a tensor on a specific device\n", 1267 | "torch.ones(5, device=my_device)" 1268 | ] 1269 | }, 1270 | { 1271 | "cell_type": "code", 1272 | "execution_count": null, 1273 | "metadata": {}, 1274 | "outputs": [], 1275 | "source": [ 1276 | "# you can move data to the GPU by doing .to(device)\n", 1277 | "data = torch.eye(3) # data is on the cpu \n", 1278 | "data = data.to(my_device) # .to() returns a tensor on my_device (it does not move data in place), so reassign it" 1279 | ] 1280 | }, 1281 | { 1282 | "cell_type": "markdown", 1283 | "metadata": {}, 1284 | "source": [ 1285 | "Now the computation happens on `my_device` (the GPU, if one is available)." 1286 | ] 1287 | }, 1288 | { 1289 | "cell_type": "code", 1290 | "execution_count": null, 1291 | "metadata": {}, 1292 | "outputs": [], 1293 | "source": [ 1294 | "res = data + data\n", 1295 | "res" 1296 | ] 1297 | }, 1298 | { 1299 | "cell_type": "code", 1300 | "execution_count": null, 1301 | "metadata": {}, 1302 | "outputs": [], 1303 | "source": [ 1304 | "# you can get a tensor's device via the .device attribute\n", 1305 | "print(res.device)\n", 1306 | "z = torch.arange(10)\n", 1307 | "z = z.to(res.device)\n", 1308 | "print(z.device)" 1309 | ] 1310 | }, 1311 | { 1312 | "cell_type": "markdown", 1313 | "metadata": {}, 1314 | "source": [ 1315 | "# Automatic differentiation with `autograd`\n", 1316 | "\n", 1317 | "Central to all neural networks in PyTorch is the `autograd` package. 
\n", 1318 | "\n", 1319 | "We can say that it is the _true_ power behind PyTorch. The autograd package provides automatic differentiation for all operations on Tensors. It is a **define-by-run** framework, which means that your backprop is defined by how your code is run, and that **every single iteration can be different**." 1320 | ] 1321 | }, 1322 | { 1323 | "cell_type": "markdown", 1324 | "metadata": {}, 1325 | "source": [ 1326 | "`torch.Tensor` is the central class of the package. If you set its attribute `.requires_grad` to `True`, it starts to track all operations applied to it. When you finish your computation you can call `.backward()` and have all the gradients computed automatically. The gradient for this tensor will be accumulated into the `.grad` attribute." 1327 | ] 1328 | }, 1329 | { 1330 | "cell_type": "code", 1331 | "execution_count": null, 1332 | "metadata": {}, 1333 | "outputs": [], 1334 | "source": [ 1335 | "x = torch.tensor(2.)\n", 1336 | "print(x)" 1337 | ] 1338 | }, 1339 | { 1340 | "cell_type": "code", 1341 | "execution_count": null, 1342 | "metadata": {}, 1343 | "outputs": [], 1344 | "source": [ 1345 | "# setting requires_grad directly via the tensor's constructor\n", 1346 | "x = torch.tensor(2., requires_grad=True)\n", 1347 | "\n", 1348 | "# or by setting the .requires_grad attribute\n", 1349 | "# you can do this at any moment to track operations on x\n", 1350 | "x.requires_grad = True \n", 1351 | "\n", 1352 | "print(x)" 1353 | ] 1354 | }, 1355 | { 1356 | "cell_type": "code", 1357 | "execution_count": null, 1358 | "metadata": {}, 1359 | "outputs": [], 1360 | "source": [ 1361 | "print(x.requires_grad)\n", 1362 | "print(x.grad) # no gradient yet" 1363 | ] 1364 | }, 1365 | { 1366 | "cell_type": "code", 1367 | "execution_count": null, 1368 | "metadata": {}, 1369 | "outputs": [], 1370 | "source": [ 1371 | "# let's perform a simple operation on x\n", 1372 | "y = x ** 2\n", 1373 | "\n", 1374 | "print(\"Grad of x:\", x.grad)" 1375 | ] 1376 | }, 1377 | { 
1378 | "cell_type": "code", 1379 | "execution_count": null, 1380 | "metadata": {}, 1381 | "outputs": [], 1382 | "source": [ 1383 | "# if you want to compute the derivatives, you can call .backward() on a Tensor\n", 1384 | "y.backward()\n", 1385 | "print(\"Grad of y with respect to x:\", x.grad)" 1386 | ] 1387 | }, 1388 | { 1389 | "cell_type": "markdown", 1390 | "metadata": {}, 1391 | "source": [ 1392 | "To stop a tensor from tracking history, you can call `.detach()`, which returns a tensor detached from the computation history, so that future computation on it is not tracked." 1393 | ] 1394 | }, 1395 | { 1396 | "cell_type": "code", 1397 | "execution_count": null, 1398 | "metadata": {}, 1399 | "outputs": [], 1400 | "source": [ 1401 | "x = torch.tensor(2., requires_grad=True)\n", 1402 | "print(x)\n", 1403 | "\n", 1404 | "y = x ** 2\n", 1405 | "print(y)\n", 1406 | "\n", 1407 | "c = y.detach() # c will be treated as a constant! c has the same contents as y but requires_grad=False\n", 1408 | "print(c)\n", 1409 | "\n", 1410 | "z = c * y.exp() \n", 1411 | "print(z)\n", 1412 | "\n", 1413 | "z.backward()\n", 1414 | "print(x.grad)" 1415 | ] 1416 | }, 1417 | { 1418 | "cell_type": "markdown", 1419 | "metadata": {}, 1420 | "source": [ 1421 | "To prevent tracking history (and using memory), you can also wrap the code block in `with torch.no_grad():`. This can be particularly helpful when evaluating a model, because the model may have trainable parameters with `requires_grad=True` for which we don’t need the gradients." 
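A minimal sketch of this behavior, assuming only that `torch` is installed (the variable names are illustrative): a result computed inside the `torch.no_grad()` block carries no gradient history, while the same operation outside the block does, and the flag on the input tensor itself is left untouched.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)

# Outside no_grad: the operation is recorded, so y has a grad_fn.
y = x ** 2

with torch.no_grad():
    # Inside no_grad: nothing is recorded, so z is a plain tensor.
    z = x ** 2

print("y:", y.requires_grad, y.grad_fn is not None)
print("z:", z.requires_grad, z.grad_fn is None)
print("x.requires_grad:", x.requires_grad)  # the flag on x itself is unchanged
```

Only tensors created inside the block are affected, which is why wrapping an evaluation loop in `torch.no_grad()` does not interfere with later training.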
1422 | ] 1423 | }, 1424 | { 1425 | "cell_type": "code", 1426 | "execution_count": null, 1427 | "metadata": {}, 1428 | "outputs": [], 1429 | "source": [ 1430 | "x = torch.tensor(2.)\n", 1431 | "x.requires_grad = True\n", 1432 | "print('x:', x)\n", 1433 | "\n", 1434 | "y = x ** 2\n", 1435 | "print('y:', y)\n", 1436 | "\n", 1437 | "with torch.no_grad():\n", 1438 | "    y = 2 * y\n", 1439 | "    print('x:', x) # Try to think why x.requires_grad is True\n", 1440 | "    print('y:', y)" 1441 | ] 1442 | }, 1443 | { 1444 | "cell_type": "markdown", 1445 | "metadata": {}, 1446 | "source": [ 1447 | "There’s one more class which is very important for autograd implementation - a `Function`.\n", 1448 | "\n", 1449 | "`Tensor` and `Function` are interconnected and build up an acyclic graph, that encodes a complete history of computation. Each tensor has a `.grad_fn` attribute that references a `Function` that has created the `Tensor` (except for `Tensor`s created by the user - their `grad_fn` is `None`).\n", 1450 | "\n", 1451 | "Let's go back and see the `grad_fn` in our previous example:\n", 1452 | "```\n", 1453 | "input -> x -> Pow(2) -> y -> Exp() -> Mul(constant) -> output\n", 1454 | "```\n", 1455 | "\n", 1456 | "We can create a `Function` and manually define its gradient (this is particularly useful for originally non-differentiable operations)" 1457 | ] 1458 | }, 1459 | { 1460 | "cell_type": "code", 1461 | "execution_count": null, 1462 | "metadata": {}, 1463 | "outputs": [], 1464 | "source": [ 1465 | "class Exp(torch.autograd.Function):\n", 1466 | "    @staticmethod\n", 1467 | "    def forward(ctx, i):\n", 1468 | "        result = i.exp()\n", 1469 | "        ctx.save_for_backward(result)\n", 1470 | "        return result\n", 1471 | " \n", 1472 | "    @staticmethod\n", 1473 | "    def backward(ctx, grad_output):\n", 1474 | "        result, = ctx.saved_tensors\n", 1475 | "        return grad_output * result\n", 1476 | "\n", 1477 | "# Use it by calling the apply method:\n", 1478 | "x = torch.arange(4)\n", 1479 | "output = 
Exp.apply(x)\n", 1480 | "output" 1481 | ] 1482 | }, 1483 | { 1484 | "cell_type": "markdown", 1485 | "metadata": {}, 1486 | "source": [ 1487 | "If you still don't believe autograd works, here's something that I think will change your mind --- we're going to compute the derivative of an unnecessarily complicated function:\n", 1488 | "\n", 1489 | "$$ y(x) = \\sum_{x_i} \\big[ e^{0.001 x_i^2} + \\sin(x_i^3) \\cdot \\log(x_i) \\big]$$" 1490 | ] 1491 | }, 1492 | { 1493 | "cell_type": "code", 1494 | "execution_count": null, 1495 | "metadata": {}, 1496 | "outputs": [], 1497 | "source": [ 1498 | "def complicated_func(X):\n", 1499 | "    return torch.sum(torch.exp(0.001 * X ** 2) + torch.sin(X ** 3) * torch.log(X))" 1500 | ] 1501 | }, 1502 | { 1503 | "cell_type": "code", 1504 | "execution_count": null, 1505 | "metadata": {}, 1506 | "outputs": [], 1507 | "source": [ 1508 | "x = torch.arange(1, 10, 0.1, dtype=torch.float, requires_grad=True)\n", 1509 | "x" 1510 | ] 1511 | }, 1512 | { 1513 | "cell_type": "code", 1514 | "execution_count": null, 1515 | "metadata": {}, 1516 | "outputs": [], 1517 | "source": [ 1518 | "y = complicated_func(x)\n", 1519 | "y.backward()" 1520 | ] 1521 | }, 1522 | { 1523 | "cell_type": "code", 1524 | "execution_count": null, 1525 | "metadata": {}, 1526 | "outputs": [], 1527 | "source": [ 1528 | "x.grad" 1529 | ] 1530 | }, 1531 | { 1532 | "cell_type": "markdown", 1533 | "metadata": {}, 1534 | "source": [ 1535 | "### Concepts not covered in this lecture\n", 1536 | "\n", 1537 | "PyTorch's `autograd` is a very powerful tool. For instance, it can calculate the Jacobian and Hessian of any given function! 
Here is a list of more advanced things that you can accomplish with `autograd`:\n", 1538 | "\n", 1539 | "- Vector-Jacobian products for non-scalar outputs (e.g., when `y` is a vector)\n", 1540 | "- Compute Jacobian and Hessian\n", 1541 | "- Retain the computation graph (useful for inspecting gradients inside a model)\n", 1542 | "- Sparse gradients\n", 1543 | "- Register and remove hooks (useful for saving gradients)\n", 1544 | "- How to set up user-designed `Function`s properly\n", 1545 | "- Numerical gradient checking\n", 1546 | "\n", 1547 | "\n", 1548 | "More info: https://pytorch.org/docs/stable/autograd.html" 1549 | ] 1550 | }, 1551 | { 1552 | "cell_type": "markdown", 1553 | "metadata": {}, 1554 | "source": [ 1555 | "### The interaction of `autograd` with `nn.Module`s and `nn.Parameters`" 1556 | ] 1557 | }, 1558 | { 1559 | "cell_type": "markdown", 1560 | "metadata": {}, 1561 | "source": [ 1562 | "In the next notebook we will see how to build a linear regression model using PyTorch's `nn.Module`. You will see that you don't need to worry about gradients when using `nn.Module` and `nn.Parameter`. This is because they automatically keep track of gradients for you." 1563 | ] 1564 | }, 1565 | { 1566 | "cell_type": "code", 1567 | "execution_count": null, 1568 | "metadata": {}, 1569 | "outputs": [], 1570 | "source": [ 1571 | "# w.x + b\n", 1572 | "lin = torch.nn.Linear(2, 1, bias=True) # nn.Linear is a nn.Module\n", 1573 | "lin.weight # lin.weight is a nn.Parameter!" 1574 | ] 1575 | }, 1576 | { 1577 | "cell_type": "code", 1578 | "execution_count": null, 1579 | "metadata": {}, 1580 | "outputs": [], 1581 | "source": [ 1582 | "type(lin.weight)" 1583 | ] 1584 | }, 1585 | { 1586 | "cell_type": "markdown", 1587 | "metadata": {}, 1588 | "source": [ 1589 | "---" 1590 | ] 1591 | }, 1592 | { 1593 | "cell_type": "markdown", 1594 | "metadata": {}, 1595 | "source": [ 1596 | "
\n", 1597 | " Exercise: Derive the gradient \n", 1598 | "

\n", 1599 | " $$\n", 1600 | " \\dfrac{\\partial \\big[\\sum_{x_i} e^{0.001 x_i^2} + \\sin(x_i^3) \\cdot \\log(x_i)\\big]}{\\partial x}\n", 1601 | " $$\n", 1602 | "
\n", 1603 | " and make a function that computes it. Check that it gives the same output as `x.grad` in our previous example.\n", 1604 | "
" 1605 | ] 1606 | } 1607 | ], 1608 | "metadata": { 1609 | "kernelspec": { 1610 | "display_name": "Python 3 (ipykernel)", 1611 | "language": "python", 1612 | "name": "python3" 1613 | }, 1614 | "language_info": { 1615 | "codemirror_mode": { 1616 | "name": "ipython", 1617 | "version": 3 1618 | }, 1619 | "file_extension": ".py", 1620 | "mimetype": "text/x-python", 1621 | "name": "python", 1622 | "nbconvert_exporter": "python", 1623 | "pygments_lexer": "ipython3", 1624 | "version": "3.9.7" 1625 | } 1626 | }, 1627 | "nbformat": 4, 1628 | "nbformat_minor": 2 1629 | } 1630 | -------------------------------------------------------------------------------- /02-linear-regression.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Linear Regression and Gradient Descent" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "In this notebook we will see how we can perform linear regression in three different ways: \n", 15 | "1. pure numpy\n", 16 | "2. numpy + pytorch's autograd \n", 17 | "3. pure pytorch" 18 | ] 19 | }, 20 | { 21 | "cell_type": "code", 22 | "execution_count": null, 23 | "metadata": {}, 24 | "outputs": [], 25 | "source": [ 26 | "import numpy as np\n", 27 | "import torch\n", 28 | "import matplotlib.pyplot as plt\n", 29 | "from pprint import pprint" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": {}, 36 | "outputs": [], 37 | "source": [ 38 | "np.random.seed(0)\n", 39 | "torch.manual_seed(0);" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "## The Problem" 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "Suppose that we want to predict a real-valued quantity $y \\in \\mathbb{R}$ for a given input $\\mathbf{x} \\in \\mathbb{R}^d$. 
This is known as **regression**. \n", 54 | "\n", 55 | "The most common loss function for regression is the **quadratic loss** or **$\\ell_2$ loss**:\n", 56 | "\n", 57 | "$$\n", 58 | "\\ell_2(y, \\hat{y}) = (y - \\hat{y})^2\n", 59 | "$$\n", 60 | "\n", 61 | "The empirical risk becomes the **mean squared error (MSE)**:\n", 62 | "\n", 63 | "$$\n", 64 | "MSE(\\theta) = \\frac{1}{N} \\sum\\limits_{n=1}^{N} (y_n - f(\\mathbf{x}_n; \\theta))^2\n", 65 | "$$\n", 66 | "\n", 67 | "The model $f(\\mathbf{x}_n; \\theta)$ can be parameterized in many ways. In this lecture we will focus on a linear parameterization, leading to the well-known **Linear Regression** formulation:\n", 68 | "\n", 69 | "$$\n", 70 | "f(\\mathbf{x}; \\theta) = \\mathbf{w}^\\top \\mathbf{x} + b = w_1 x_1 + w_2 x_2 + \\cdots + w_d x_d + b\n", 71 | "$$\n", 72 | "\n", 73 | "where $\\theta = (b, \\mathbf{w})$ are the parameters of the model." 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": {}, 79 | "source": [ 80 | "## Example\n", 81 | "\n", 82 | "Let's create a synthetic regression dataset using `sklearn`'s `make_regression` function. For better visualization, we will use only a single feature." 83 | ] 84 | }, 85 | { 86 | "cell_type": "code", 87 | "execution_count": null, 88 | "metadata": {}, 89 | "outputs": [], 90 | "source": [ 91 | "from sklearn.datasets import make_regression\n", 92 | "\n", 93 | "\n", 94 | "n_features = 1\n", 95 | "n_samples = 100\n", 96 | "\n", 97 | "X, y = make_regression(\n", 98 | "    n_samples=n_samples,\n", 99 | "    n_features=n_features,\n", 100 | "    noise=20,\n", 101 | "    random_state=42,\n", 102 | ")\n", 103 | "\n", 104 | "fig, ax = plt.subplots()\n", 105 | "ax.plot(X, y, \".\")\n", 106 | "print(X.shape, y.shape)" 107 | ] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "metadata": {}, 112 | "source": [ 113 | "For instance, by looking at the plot above, let's say that $w \\approx 40$ and $b \\approx 2$. 
Then, we would arrive at the following predictions (with vertical bars indicating the errors)." 114 | ] 115 | }, 116 | { 117 | "cell_type": "code", 118 | "execution_count": null, 119 | "metadata": {}, 120 | "outputs": [], 121 | "source": [ 122 | "# our estimate\n", 123 | "w = 40.0\n", 124 | "b = 2.0\n", 125 | "y_pred = w*X + b\n", 126 | "\n", 127 | "# subplots\n", 128 | "fig, axs = plt.subplots(1, 2, figsize=(16, 4))\n", 129 | "\n", 130 | "# left plot\n", 131 | "axs[0].plot(X, y, 'o')\n", 132 | "axs[0].plot(X, y_pred, '-')\n", 133 | "\n", 134 | "# right plot\n", 135 | "axs[1].vlines(X, y, y_pred, color='black')\n", 136 | "axs[1].plot(X, y, 'o')\n", 137 | "axs[1].plot(X, y_pred, '-')" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": {}, 143 | "source": [ 144 | "By adjusting our parameters $\\theta=(w, b)$, we can minimize the sum of squared errors to find the **least squares solution**\n", 145 | "\n", 146 | "$$\n", 147 | "\\begin{align}\n", 148 | "\\hat{\\theta} &= \\arg\\min_\\theta MSE(\\theta) \\\\\n", 149 | "&= \\arg\\min_\\theta \\frac{1}{N} \\sum\\limits_{n=1}^{N} (y_n - f(\\mathbf{x}_n; \\theta))^2 \\\\\n", 150 | "&= \\arg\\min_{w,b} \\frac{1}{N} \\sum\\limits_{n=1}^{N} (y_n - (w \\cdot x_n + b))^2\n", 151 | "\\end{align}\n", 152 | "$$\n", 153 | "\n", 154 | "Which can be found by taking the gradient of the loss function w.r.t. $\\theta$. 
\n", 155 | "\n", 156 | "\n" 202 | ] 203 | }, 204 | { 205 | "cell_type": "markdown", 206 | "metadata": {}, 207 | "source": [ 208 | "In general, for inputs with higher dimensionality $d$, we have $\\mathbf{w} \\in \\mathbb{R}^d$, and thus we have the following gradient (assuming that $b$ is absorbed by $\\mathbf{w}$):\n", 209 | "\n", 210 | "$$\n", 211 | "\\begin{align}\n", 212 | "\\nabla_\\mathbf{w} MSE(\\theta) &= \\nabla_\\mathbf{w} \\frac{1}{N} \\sum\\limits_{n=1}^{N} (y_n - f(\\mathbf{x}_n; \\theta))^2 \\\\\n", 213 | "&= \\frac{-2}{N} \\sum\\limits_{n=1}^{N} (y_n - f(\\mathbf{x}_n; \\theta)) \\cdot \\nabla_\\mathbf{w} f(\\mathbf{x}_n; \\theta) \\\\\n", 214 | "&= \\frac{-2}{N} \\sum\\limits_{n=1}^{N} (y_n - (\\mathbf{w}^\\top \\mathbf{x}_n + b)) \\cdot \\mathbf{x}_n\n", 215 | "\\end{align}\n", 216 | "$$\n", 217 | "\n", 218 | "Now, we just have to follow the gradient descent rule to update $\\mathbf{w}$: \n", 219 | "\n", 220 | "$$\n", 221 | "\\mathbf{w}_{t+1} = \\mathbf{w}_{t} - \\alpha \\nabla_{\\mathbf{w}} MSE(\\theta)\n", 222 | "$$\n", 223 | "\n", 224 | "where $\\alpha$ represents the learning rate. So, let's implement this in numpy to see what happens." 
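Before implementing it, the gradient formula above can be sanity-checked against finite differences. The sketch below is a dependency-free illustration for the single-feature case, written in plain Python rather than numpy so it runs anywhere. The toy data and the helper names `mse`, `mse_grad`, and `numeric_grad` are hypothetical and not part of this notebook; the gradient for `b` is simply the component of the formula for a constant feature equal to 1.

```python
import random

def mse(w, b, xs, ys):
    """Mean squared error of the line w*x + b."""
    return sum((y - (w * x + b)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def mse_grad(w, b, xs, ys):
    """Analytic gradient: (-2/N) * sum(residual * x) for w, (-2/N) * sum(residual) for b."""
    n = len(xs)
    residuals = [y - (w * x + b) for x, y in zip(xs, ys)]
    grad_w = -2.0 / n * sum(r * x for r, x in zip(residuals, xs))
    grad_b = -2.0 / n * sum(residuals)
    return grad_w, grad_b

def numeric_grad(w, b, xs, ys, eps=1e-6):
    """Central finite differences, used only as a sanity check."""
    grad_w = (mse(w + eps, b, xs, ys) - mse(w - eps, b, xs, ys)) / (2 * eps)
    grad_b = (mse(w, b + eps, xs, ys) - mse(w, b - eps, xs, ys)) / (2 * eps)
    return grad_w, grad_b

# Hypothetical toy data (not the dataset used in this notebook): y = 3x + 1 plus noise.
rng = random.Random(0)
xs = [rng.uniform(-2, 2) for _ in range(50)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 0.1) for x in xs]

w, b, alpha = 0.0, 0.0, 0.1
analytic = mse_grad(w, b, xs, ys)
numeric = numeric_grad(w, b, xs, ys)
print("analytic:", analytic)
print("numeric: ", numeric)

# A few steps of the update rule: parameter <- parameter - alpha * gradient.
for _ in range(200):
    grad_w, grad_b = mse_grad(w, b, xs, ys)
    w, b = w - alpha * grad_w, b - alpha * grad_b
print("w, b after 200 steps:", w, b)
```

If the analytic and numeric gradients disagree, the derivation (or its implementation) has a mistake; this kind of check is cheap and worth running whenever a gradient is derived by hand.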
225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "# Numpy Solution" 232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": null, 237 | "metadata": {}, 238 | "outputs": [], 239 | "source": [ 240 | "class LinearRegression(object):\n", 241 | "    def __init__(self, n_features, n_targets=1, lr=0.1):\n", 242 | "        self.W = np.zeros((n_targets, n_features))\n", 243 | "        self.lr = lr\n", 244 | "\n", 245 | "    def update_weight(self, X, y, y_hat):\n", 246 | "        N = X.shape[0]\n", 247 | "        W_grad = - 2 * np.dot(X.T, y - y_hat) / N\n", 248 | "        self.W = self.W - self.lr * W_grad\n", 249 | "\n", 250 | "    def loss(self, y_hat, y):\n", 251 | "        return np.mean(np.power(y - y_hat, 2))\n", 252 | "\n", 253 | "    def predict(self, X):\n", 254 | "        return np.dot(X, self.W.T).squeeze(-1)\n", 255 | "\n", 256 | "    def train(self, X, y, epochs=50):\n", 257 | "        \"\"\"\n", 258 | "        X (n_examples x n_features): input matrix\n", 259 | "        y (n_examples): gold labels\n", 260 | "        \"\"\"\n", 261 | "        loss_history = []\n", 262 | "        for _ in range(epochs):\n", 263 | "            # get prediction for computing the loss\n", 264 | "            y_hat = self.predict(X)\n", 265 | "            loss = self.loss(y_hat, y)\n", 266 | "\n", 267 | "            # update weights\n", 268 | "            self.update_weight(X, y, y_hat)\n", 269 | "            # (thought exercise): what happens if we do this instead?\n", 270 | "            # for x_i, y_i in zip(X, y):\n", 271 | "            #     self.update_weight(x_i, y_i)\n", 272 | "\n", 273 | "            # save loss value\n", 274 | "            loss_history.append(loss)\n", 275 | "        return loss_history" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": null, 281 | "metadata": {}, 282 | "outputs": [], 283 | "source": [ 284 | "# trick for handling the bias term:\n", 285 | "# concat a column of 1s to the original input matrix X\n", 286 | "use_bias = True\n", 287 | "if use_bias:\n", 288 | "    X_np = np.hstack([np.ones((n_samples,1)), X])\n", 289 | "    n_features += 1\n", 290 | "else:\n", 291 | "    X_np = 
X" 292 | ] 293 | }, 294 | { 295 | "cell_type": "code", 296 | "execution_count": null, 297 | "metadata": {}, 298 | "outputs": [], 299 | "source": [ 300 | "model = LinearRegression(n_features=n_features, n_targets=1, lr=0.1)\n", 301 | "loss_history = model.train(X_np, y, epochs=50)\n", 302 | "y_hat = model.predict(X_np)" 303 | ] 304 | }, 305 | { 306 | "cell_type": "code", 307 | "execution_count": null, 308 | "metadata": {}, 309 | "outputs": [], 310 | "source": [ 311 | "print('b:', model.W[0,0])\n", 312 | "print('W:', model.W[0,1])\n", 313 | "plt.plot(loss_history)\n", 314 | "plt.title('Loss per epoch')" 315 | ] 316 | }, 317 | { 318 | "cell_type": "code", 319 | "execution_count": null, 320 | "metadata": {}, 321 | "outputs": [], 322 | "source": [ 323 | "# Vis\n", 324 | "fig, axs = plt.subplots(1, 2, figsize=(16, 4))\n", 325 | "axs[0].plot(X, y, \"o\", label=\"data\")\n", 326 | "axs[0].plot(X, 40*X + 2, \"-\", label=\"pred\")\n", 327 | "axs[0].set_title(\"Guess\")\n", 328 | "axs[0].legend();\n", 329 | "\n", 330 | "axs[1].plot(X, y, \"o\", label=\"data\")\n", 331 | "axs[1].plot(X, y_hat, \"-\", label=\"pred\")\n", 332 | "axs[1].set_title(\"Numpy solution\")\n", 333 | "axs[1].legend();" 334 | ] 335 | }, 336 | { 337 | "cell_type": "markdown", 338 | "metadata": {}, 339 | "source": [ 340 | "# Numpy + Autograd Solution\n", 341 | "\n", 342 | "In the previous implementation, we had to derive the gradient $\\frac{\\partial MSE(\\theta)}{\\partial \\theta}$ manually. If the model $f(\\cdot;\\theta)$ is more complex, this might be a cumbersome and error-prone task. 
To avoid this, we will use PyTorch `autograd` to automatically compute gradients.\n" 343 | ] 344 | }, 345 | { 346 | "cell_type": "code", 347 | "execution_count": null, 348 | "metadata": {}, 349 | "outputs": [], 350 | "source": [ 351 | "class MixedLinearRegression(object):\n", 352 | "    def __init__(self, n_features, n_targets=1, lr=0.01):\n", 353 | "        # note requires_grad=True!\n", 354 | "        self.W = torch.zeros(n_targets, n_features, requires_grad=True)\n", 355 | "        self.lr = lr\n", 356 | " \n", 357 | "    def update_weight(self):\n", 358 | "        # Gradients are given to us by autograd!\n", 359 | "        self.W.data = self.W.data - self.lr * self.W.grad.data\n", 360 | "\n", 361 | "    def loss(self, y_hat, y):\n", 362 | "        return torch.mean(torch.pow(y - y_hat, 2))\n", 363 | "\n", 364 | "    def predict(self, X):\n", 365 | "        return torch.matmul(X, self.W.t()).squeeze(-1)\n", 366 | "\n", 367 | "    def train(self, X, y, epochs=50):\n", 368 | "        \"\"\"\n", 369 | "        X (n_examples x n_features): input matrix\n", 370 | "        y (n_examples): gold labels\n", 371 | "        \"\"\"\n", 372 | "        loss_history = []\n", 373 | "        for _ in range(epochs):\n", 374 | "            # Our neural net is a Line function!\n", 375 | "            y_hat = self.predict(X)\n", 376 | " \n", 377 | "            # Compute the loss using torch operations so they are saved in the gradient history.\n", 378 | "            loss = self.loss(y_hat, y)\n", 379 | " \n", 380 | "            # Computes the gradient of loss with respect to all Variables with requires_grad=True.\n", 381 | "            # where Variables are tensors with requires_grad=True\n", 382 | "            loss.backward()\n", 383 | "            loss_history.append(loss.item())\n", 384 | "\n", 385 | "            # Update weights using gradient descent; W.data is a Tensor.\n", 386 | "            self.update_weight()\n", 387 | "\n", 388 | "            # Reset the accumulated gradients\n", 389 | "            self.W.grad.data.zero_()\n", 390 | " \n", 391 | "        return loss_history" 392 | ] 393 | }, 394 | { 395 | "cell_type": "code", 396 | "execution_count": null, 397 | "metadata": {}, 398 | "outputs": [], 399 | 
"source": [ 400 | "X_pt = torch.from_numpy(X_np).float()\n", 401 | "y_pt = torch.from_numpy(y).float()" 402 | ] 403 | }, 404 | { 405 | "cell_type": "code", 406 | "execution_count": null, 407 | "metadata": {}, 408 | "outputs": [], 409 | "source": [ 410 | "model = MixedLinearRegression(n_features=n_features, n_targets=1, lr=0.1)\n", 411 | "loss_history = model.train(X_pt, y_pt, epochs=50)\n", 412 | "with torch.no_grad():\n", 413 | "    y_hat = model.predict(X_pt)" 414 | ] 415 | }, 416 | { 417 | "cell_type": "code", 418 | "execution_count": null, 419 | "metadata": {}, 420 | "outputs": [], 421 | "source": [ 422 | "print('b:', model.W[0,0].item())\n", 423 | "print('W:', model.W[0,1].item())\n", 424 | "plt.plot(loss_history)\n", 425 | "plt.title('Loss per epoch');" 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": null, 431 | "metadata": {}, 432 | "outputs": [], 433 | "source": [ 434 | "# Vis\n", 435 | "fig, axs = plt.subplots(1, 3, figsize=(16, 4))\n", 436 | "axs[0].plot(X, y, \"o\", label=\"data\")\n", 437 | "axs[0].plot(X, 40*X + 2, \"-\", label=\"pred\")\n", 438 | "axs[0].set_title(\"Guess\")\n", 439 | "axs[0].legend();\n", 440 | "\n", 441 | "axs[1].plot(X, y, \"o\", label=\"data\")\n", 442 | "axs[1].plot(X, 47.12483907744531*X + 2.3264433961431727, \"-\", label=\"pred\")\n", 443 | "axs[1].set_title(\"Numpy solution\")\n", 444 | "axs[1].legend();\n", 445 | "\n", 446 | "axs[2].plot(X, y, \"o\", label=\"data\")\n", 447 | "axs[2].plot(X, y_hat, \"-\", label=\"pred\")\n", 448 | "axs[2].set_title(\"Mixed solution\")\n", 449 | "axs[2].legend();" 450 | ] 451 | }, 452 | { 453 | "cell_type": "markdown", 454 | "metadata": {}, 455 | "source": [ 456 | "# PyTorch Solution\n", 457 | "\n", 458 | "Mixing PyTorch and Numpy is no fun. PyTorch is actually very powerful and provides most of the things we need to apply gradient descent for any model $f$, as long as all operations applied over the inputs are Torch operations (so gradients can be tracked). 
\n", 459 | "\n", 460 | "To this end, we will use the submodule `torch.nn`, which provides us a way of encapsulating our model into a `nn.Module`. With this, all we need to do is define our parameters in the `__init__` method and then the _forward_ pass of our model in the `forward` method. " 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": null, 466 | "metadata": {}, 467 | "outputs": [], 468 | "source": [ 469 | "from torch import nn\n", 470 | "from torch import optim\n", 471 | "\n", 472 | "# See the inheritance from nn.Module\n", 473 | "class TorchLinearRegression(nn.Module):\n", 474 | " \n", 475 | "    def __init__(self, n_features, n_targets=1):\n", 476 | "        super().__init__() # this is mandatory!\n", 477 | " \n", 478 | "        # encapsulate our weights into a nn.Parameter object\n", 479 | "        self.W = torch.nn.Parameter(torch.zeros(n_targets, n_features))\n", 480 | "\n", 481 | "    def forward(self, X):\n", 482 | "        \"\"\"\n", 483 | "        X (n_examples x n_features): input matrix\n", 484 | "        \"\"\"\n", 485 | "        #if self.training:\n", 486 | "        #    X = X ** 2\n", 487 | "        #else:\n", 488 | "        #    X = X ** 3\n", 489 | "        # import ipdb; ipdb.set_trace()\n", 490 | "        return X @ self.W.t()" 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": null, 496 | "metadata": {}, 497 | "outputs": [], 498 | "source": [ 499 | "# define model, loss function and optimizer\n", 500 | "model = TorchLinearRegression(n_features)\n", 501 | "loss_fn = nn.MSELoss()\n", 502 | "optimizer = optim.SGD(model.parameters(), lr=0.1)\n", 503 | "\n", 504 | "# move to CUDA if available\n", 505 | "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n", 506 | "model = model.to(device)\n", 507 | "X = X_pt.to(device)\n", 508 | "y = y_pt.to(device).unsqueeze(-1)" 509 | ] 510 | }, 511 | { 512 | "cell_type": "markdown", 513 | "metadata": {}, 514 | "source": [ 515 | "All done! 
Now we just have to write a training loop, which is more or less a standard set of steps for training all models:" 516 | ] 517 | }, 518 | { 519 | "cell_type": "code", 520 | "execution_count": null, 521 | "metadata": {}, 522 | "outputs": [], 523 | "source": [ 524 | "def train(model, X, y, epochs=50):\n", 525 | " # inform PyTorch that we are in \"training\" mode\n", 526 | " model.train()\n", 527 | " \n", 528 | " loss_history = []\n", 529 | " for _ in range(epochs):\n", 530 | " # reset gradients before learning\n", 531 | " optimizer.zero_grad()\n", 532 | " \n", 533 | " # get predictions and the loss value from the loss function\n", 534 | " y_hat = model(X)\n", 535 | " loss = loss_fn(y_hat, y)\n", 536 | " loss_history.append(loss.item())\n", 537 | " \n", 538 | " # compute gradients of the loss wrt parameters\n", 539 | " loss.backward()\n", 540 | " \n", 541 | " # perform gradient step to update the parameters\n", 542 | " optimizer.step()\n", 543 | "\n", 544 | " return loss_history" 545 | ] 546 | }, 547 | { 548 | "cell_type": "code", 549 | "execution_count": null, 550 | "metadata": {}, 551 | "outputs": [], 552 | "source": [ 553 | "def evaluate(model, X):\n", 554 | " # inform PyTorch that we are in \"evaluation\" mode\n", 555 | " model.eval()\n", 556 | " \n", 557 | " # disable gradient tracking\n", 558 | " with torch.no_grad():\n", 559 | " # get prediction\n", 560 | " y_hat = model(X)\n", 561 | " \n", 562 | " return y_hat" 563 | ] 564 | }, 565 | { 566 | "cell_type": "code", 567 | "execution_count": null, 568 | "metadata": {}, 569 | "outputs": [], 570 | "source": [ 571 | "loss_history = train(model, X, y, epochs=50)\n", 572 | "y_hat = evaluate(model, X)" 573 | ] 574 | }, 575 | { 576 | "cell_type": "code", 577 | "execution_count": null, 578 | "metadata": {}, 579 | "outputs": [], 580 | "source": [ 581 | "print('b:', model.W[0,0].item())\n", 582 | "print('W:', model.W[0,1].item())\n", 583 | "plt.plot(loss_history)\n", 584 | "plt.title('Loss per epoch');" 585 | ] 586 | 
}, 587 | { 588 | "cell_type": "code", 589 | "execution_count": null, 590 | "metadata": {}, 591 | "outputs": [], 592 | "source": [ 593 | "# Vis\n", 594 | "X = X_pt[:, 1:].numpy()\n", 595 | "y = y_pt.squeeze(-1).numpy()\n", 596 | "\n", 597 | "fig, axs = plt.subplots(1, 4, figsize=(16, 4))\n", 598 | "axs[0].plot(X, y, \"o\", label=\"data\")\n", 599 | "axs[0].plot(X, 40*X + 2, \"-\", label=\"pred\")\n", 600 | "axs[0].set_title(\"Guess\")\n", 601 | "axs[0].legend();\n", 602 | "\n", 603 | "axs[1].plot(X, y, \"o\", label=\"data\")\n", 604 | "axs[1].plot(X, 47.12483907744531*X + 2.3264433961431727, \"-\", label=\"pred\")\n", 605 | "axs[1].set_title(\"Numpy solution\")\n", 606 | "axs[1].legend();\n", 607 | "\n", 608 | "axs[2].plot(X, y, \"o\", label=\"data\")\n", 609 | "axs[2].plot(X, 47.12483596801758*X + 2.3264429569244385, \"-\", label=\"pred\")\n", 610 | "axs[2].set_title(\"Mixed solution\")\n", 611 | "axs[2].legend();\n", 612 | "\n", 613 | "axs[3].plot(X, y, \"o\", label=\"data\")\n", 614 | "axs[3].plot(X, y_hat, \"-\", label=\"pred\")\n", 615 | "axs[3].set_title(\"PyTorch solution\")\n", 616 | "axs[3].legend();" 617 | ] 618 | }, 619 | { 620 | "cell_type": "markdown", 621 | "metadata": {}, 622 | "source": [ 623 | "**Note:** I did gradient descent with the entire dataset rather than splitting the data into `train` and `valid` subsets, which should be done in practice!" 
624 | ] 625 | }, 626 | { 627 | "cell_type": "markdown", 628 | "metadata": {}, 629 | "source": [ 630 | "## Exercises" 631 | ] 632 | }, 633 | { 634 | "cell_type": "markdown", 635 | "metadata": {}, 636 | "source": [ 637 | "- Write a proper training loop for PyTorch:\n", 638 | " - add support for batches\n", 639 | " - add a stop criterion for the convergence of the model\n", 640 | " \n", 641 | "- Add L2 regularization" 642 | ] 643 | } 644 | ], 645 | "metadata": { 646 | "anaconda-cloud": {}, 647 | "kernelspec": { 648 | "display_name": "Python 3 (ipykernel)", 649 | "language": "python", 650 | "name": "python3" 651 | }, 652 | "language_info": { 653 | "codemirror_mode": { 654 | "name": "ipython", 655 | "version": 3 656 | }, 657 | "file_extension": ".py", 658 | "mimetype": "text/x-python", 659 | "name": "python", 660 | "nbconvert_exporter": "python", 661 | "pygments_lexer": "ipython3", 662 | "version": "3.9.7" 663 | } 664 | }, 665 | "nbformat": 4, 666 | "nbformat_minor": 1 667 | } 668 | -------------------------------------------------------------------------------- /03-modules-and-mlps.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Why Modules\n", 8 | "\n", 9 | "A typical training procedure for a neural net:\n", 10 | "\n", 11 | "0. Define a dataset ($X$ and $Y$)\n", 12 | "1. Define the neural network with some learnable weights\n", 13 | "2. Iterate over the dataset\n", 14 | "3. Pass inputs to the network (forward pass)\n", 15 | "4. Compute the loss\n", 16 | "5. Compute gradients w.r.t. network's weights (backward pass)\n", 17 | "6. Update weights (e.g., weight = weight - lr * gradient)\n", 18 | "\n", 19 | "PyTorch handles 1-6 for you via encapsulation, so you still have the flexibility to change something in between if you want! 
" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "## Example: MNIST classifier\n", 27 | "\n", 28 | "The MNIST dataset is composed of images of digits that must be classified with labels from 0 to 9. The inputs are 28x28 matrices containing the grayscale intensity in each pixel.\n", 29 | "\n", 30 | "We will download the MNIST dataset for training a classifier. PyTorch provides a convenient function for that." 31 | ] 32 | }, 33 | { 34 | "cell_type": "code", 35 | "execution_count": null, 36 | "metadata": {}, 37 | "outputs": [], 38 | "source": [ 39 | "import torch\n", 40 | "import torch.nn as nn\n", 41 | "import torch.optim as optim\n", 42 | "from torchvision import datasets\n", 43 | "import matplotlib.pyplot as plt\n", 44 | "torch.manual_seed(0);" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": {}, 50 | "source": [ 51 | "# Dataset\n", 52 | "It's easy to create your `Dataset`,\n", 53 | "but PyTorch comes with several built-in datasets for [vision](https://pytorch.org/vision/stable/datasets.html), [audio](https://pytorch.org/audio/stable/datasets.html), and [text](https://pytorch.org/text/stable/datasets.html) modalities.\n", 54 | "\n", 55 | "The class `Dataset` gives you information about the number of samples (implement `__len__`) and gives you the sample at a given index (implement `__getitem__`). It's a nice and simple abstraction to work with data. It has the following structure:\n", 56 | "\n", 57 | "```python\n", 58 | "class Dataset(object):\n", 59 | " def __getitem__(self, index):\n", 60 | " raise NotImplementedError\n", 61 | "\n", 62 | " def __len__(self):\n", 63 | " raise NotImplementedError\n", 64 | "\n", 65 | " def __add__(self, other):\n", 66 | " return ConcatDataset([self, other])\n", 67 | "```\n", 68 | "\n", 69 | "For now, let's use MNIST. But feel free to use another `Dataset` as an exercise." 
70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": null, 75 | "metadata": {}, 76 | "outputs": [], 77 | "source": [ 78 | "from torch.utils.data import Dataset" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": null, 84 | "metadata": {}, 85 | "outputs": [], 86 | "source": [ 87 | "# download MNIST and store it in \"../data\"\n", 88 | "# PyTorch.datasets also handles caching for you so you don't have to download the dataset twice\n", 89 | "train_data = datasets.MNIST('../data', train=True, download=True)\n", 90 | "test_data = datasets.MNIST('../data', train=False)\n", 91 | "\n", 92 | "train_x = train_data.data\n", 93 | "train_y = train_data.targets\n", 94 | "test_x = test_data.data\n", 95 | "test_y = test_data.targets" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": null, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "n_train_examples = train_x.shape[0]\n", 105 | "n_test_examples = test_x.shape[0]\n", 106 | "print('Training instances:', n_train_examples)\n", 107 | "print('Test instances:', n_test_examples)" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": {}, 113 | "source": [ 114 | "Check the shape of our training data to see how many input features we have:" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "train_x.shape, train_y.shape" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "And what the images looks like:" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": null, 136 | "metadata": { 137 | "scrolled": true 138 | }, 139 | "outputs": [], 140 | "source": [ 141 | "C = 8\n", 142 | "fig, axs = plt.subplots(3, C, figsize=(12, 4))\n", 143 | "for i in range(3):\n", 144 | " for j in range(C):\n", 145 | " axs[i, j].imshow(train_x[i*C + j], cmap='gray')\n", 146 | " axs[i, 
j].set_axis_off()\n", 147 | "print(train_y[:24].reshape(3, C))" 148 | ] 149 | }, 150 | { 151 | "cell_type": "markdown", 152 | "metadata": {}, 153 | "source": [ 154 | "### Formatting" 155 | ] 156 | }, 157 | { 158 | "cell_type": "markdown", 159 | "metadata": {}, 160 | "source": [ 161 | "Each sample is a 28x28 matrix. But we want to represent them as vectors, since our model (which will be a simple MLP) doesn't take any advantage of the 2D nature of the data.\n", 162 | "\n", 163 | "So, we reshape the data:" 164 | ] 165 | }, 166 | { 167 | "cell_type": "code", 168 | "execution_count": null, 169 | "metadata": {}, 170 | "outputs": [], 171 | "source": [ 172 | "num_features = 28 * 28\n", 173 | "train_x_vectors = train_x.view(n_train_examples, num_features)\n", 174 | "print(train_x_vectors.shape)" 175 | ] 176 | }, 177 | { 178 | "cell_type": "markdown", 179 | "metadata": {}, 180 | "source": [ 181 | "When we reshape an array (or torch tensor, for that matter), we don't need to specify all dimensions. We can leave one as -1, and it will be automatically determined from the size of the data. This is useful when we don't know a priori the shape of some array." 182 | ] 183 | }, 184 | { 185 | "cell_type": "code", 186 | "execution_count": null, 187 | "metadata": {}, 188 | "outputs": [], 189 | "source": [ 190 | "train_x_vectors = train_x.view(n_train_examples, -1)\n", 191 | "test_x_vectors = test_x.view(n_test_examples, -1)\n", 192 | "\n", 193 | "print(train_x_vectors.shape, test_x_vectors.shape)" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": {}, 199 | "source": [ 200 | "Also, the values are integers in the range $[0, 255]$. It is better to work with float values in a smaller interval, such as $[0, 1]$ or $[-1, 1]$. There are some more elaborate normalization techniques, but for now let's just normalize the data into $[0, 1]$." 
201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "execution_count": null, 206 | "metadata": {}, 207 | "outputs": [], 208 | "source": [ 209 | "train_x_norm = train_x_vectors / 255.0\n", 210 | "test_x_norm = test_x_vectors / 255.0\n", 211 | "print(train_x_norm.max(), train_x_norm.min(), train_x_norm.mean(), train_x_norm.std())" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": {}, 217 | "source": [ 218 | "Now, let's check all the available labels:" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": null, 224 | "metadata": {}, 225 | "outputs": [], 226 | "source": [ 227 | "print(torch.unique(train_y))\n", 228 | "num_classes = len(torch.unique(train_y))\n", 229 | "print('Num classes:', num_classes)" 230 | ] 231 | }, 232 | { 233 | "cell_type": "markdown", 234 | "metadata": {}, 235 | "source": [ 236 | "# Modules and MLPs\n", 237 | "\n", 238 | "We've seen how the internals of a simple linear classifier work. However, we still had to set a lot of things manually. It's much better to have a higher-level API that encapsulates the classifier.\n", 239 | "\n", 240 | "We are going to see that now, with pytorch Module objects. Then, it will allow us to build more complex models, like a multilayer perceptron." 
241 | ] 242 | }, 243 | { 244 | "cell_type": "markdown", 245 | "metadata": {}, 246 | "source": [ 247 | "We begin by loading, reshaping and normalizing the data again (so the code looks concise):" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": null, 253 | "metadata": {}, 254 | "outputs": [], 255 | "source": [ 256 | "from torchvision.transforms import ToTensor\n", 257 | "\n", 258 | "train_dataset = datasets.MNIST('../data', train=True, download=True, transform=ToTensor())\n", 259 | "test_dataset = datasets.MNIST('../data', train=False, transform=ToTensor())\n", 260 | "\n", 261 | "train_x = train_dataset.data\n", 262 | "train_y = train_dataset.targets\n", 263 | "test_x = test_dataset.data\n", 264 | "test_y = test_dataset.targets\n", 265 | "\n", 266 | "num_features = 28 * 28\n", 267 | "num_classes = len(torch.unique(train_y))\n", 268 | "new_shape = [-1, num_features]\n", 269 | "train_x_vectors = train_x.reshape(new_shape)\n", 270 | "test_x_vectors = test_x.reshape(new_shape)\n", 271 | "\n", 272 | "# shorten the names\n", 273 | "train_x = train_x_vectors.float() / 255\n", 274 | "test_x = test_x_vectors.float() / 255" 275 | ] 276 | }, 277 | { 278 | "cell_type": "markdown", 279 | "metadata": {}, 280 | "source": [ 281 | "## Using Modules\n", 282 | "\n", 283 | "PyTorch provides some basic building blocks for neural nets under `.nn` module. Here you can check the complete list of available blocks: https://pytorch.org/docs/stable/nn.html\n", 284 | "\n", 285 | "For now, let's recreate a simple linear model using `nn.Linear` (see [doc](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html#torch.nn.Linear))." 
286 | ] 287 | }, 288 | { 289 | "cell_type": "code", 290 | "execution_count": null, 291 | "metadata": {}, 292 | "outputs": [], 293 | "source": [ 294 | "class LinearModel(nn.Module):\n", 295 | " def __init__(self, n_features, n_classes):\n", 296 | " super().__init__()\n", 297 | " self.linear_layer = nn.Linear(n_features, n_classes)\n", 298 | " \n", 299 | " def forward(self, X):\n", 300 | " # This is the same as doing:\n", 301 | " # return X @ self.linear_layer.weight.t() + self.linear_layer.bias\n", 302 | " # where weight and bias are instances of nn.Parameter\n", 303 | " return self.linear_layer(X)\n", 304 | "\n", 305 | "linear_model = LinearModel(num_features, num_classes)" 306 | ] 307 | }, 308 | { 309 | "cell_type": "markdown", 310 | "metadata": {}, 311 | "source": [ 312 | "As before, the model can be called as function in order to produce an output:" 313 | ] 314 | }, 315 | { 316 | "cell_type": "code", 317 | "execution_count": null, 318 | "metadata": {}, 319 | "outputs": [], 320 | "source": [ 321 | "batch = train_x[:2]\n", 322 | "outputs = linear_model(batch)\n", 323 | "outputs" 324 | ] 325 | }, 326 | { 327 | "cell_type": "markdown", 328 | "metadata": {}, 329 | "source": [ 330 | "Same as doing the forward method $$w^T x + b$$" 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": null, 336 | "metadata": {}, 337 | "outputs": [], 338 | "source": [ 339 | "batch @ linear_model.linear_layer.weight.t() + linear_model.linear_layer.bias" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": {}, 345 | "source": [ 346 | "Now that we defined our model, we just have to: \n", 347 | "- define an iterator\n", 348 | "- define and compute the loss\n", 349 | "- compute gradients\n", 350 | "- define the strategy to update the parameters of our model\n", 351 | "- glue previous steps to form the training loop!" 
352 | ] 353 | }, 354 | { 355 | "cell_type": "markdown", 356 | "metadata": {}, 357 | "source": [ 358 | "#### Batching\n", 359 | "\n", 360 | "Batching can be boring to code. PyTorch provides the `DataLoader` class to help us! Dealing with data is one of the most important yet more time consuming tasks. Take a look in the PyTorch `data` submodule to [learn more](https://pytorch.org/docs/stable/data.html).\n", 361 | "\n", 362 | "In general, we just have to pass a torch `Dataset` object as input to the dataloader, and then set some hyperparams for the iterator: " 363 | ] 364 | }, 365 | { 366 | "cell_type": "code", 367 | "execution_count": null, 368 | "metadata": {}, 369 | "outputs": [], 370 | "source": [ 371 | "from torch.utils.data import DataLoader\n", 372 | "print(type(train_dataset))\n", 373 | "\n", 374 | "train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)" 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": {}, 380 | "source": [ 381 | "#### Loss\n", 382 | "\n", 383 | "Here is the complete list of available [loss functions](https://pytorch.org/docs/stable/nn.html#loss-functions).\n", 384 | "If the provided loss functions don't satisfy your constraints, it is easy to define your own loss function: just use torch operations (and be careful with differentiability issues). 
For example:" 385 | ] 386 | }, 387 | { 388 | "cell_type": "code", 389 | "execution_count": null, 390 | "metadata": {}, 391 | "outputs": [], 392 | "source": [ 393 | "with torch.no_grad(): # disable gradient-tracking\n", 394 | " \n", 395 | " dummy_loss = nn.CrossEntropyLoss()\n", 396 | " \n", 397 | " # try other losses!\n", 398 | " # multi-class classification hinge loss (margin-based loss):\n", 399 | " # dummy_loss = nn.MultiMarginLoss() \n", 400 | " batch = train_x[:2]\n", 401 | " targets = train_y[:2]\n", 402 | " predictions = linear_model(batch)\n", 403 | " \n", 404 | " print(predictions.shape, targets.shape)\n", 405 | " print(dummy_loss(predictions, targets))" 406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "And writing our own function (from the definition of the Cross Entropy loss):\n", 413 | "\n", 414 | "$$\n", 415 | "CE(p,y) = - \\log\\frac{\\exp(p_y)}{\\sum_c \\exp(p_c)}\n", 416 | "$$" 417 | ] 418 | }, 419 | { 420 | "cell_type": "code", 421 | "execution_count": null, 422 | "metadata": {}, 423 | "outputs": [], 424 | "source": [ 425 | "def dummy_loss(y_pred, y):\n", 426 | " one_hot = y.unsqueeze(1) == torch.arange(num_classes).unsqueeze(0)\n", 427 | " res = - torch.log(torch.exp(y_pred) / torch.exp(y_pred).sum(-1).unsqueeze(-1))[one_hot]\n", 428 | " return res.mean() # average per sample\n", 429 | "\n", 430 | "print(dummy_loss(predictions, targets))" 431 | ] 432 | }, 433 | { 434 | "cell_type": "markdown", 435 | "metadata": {}, 436 | "source": [ 437 | "We will use the CrossEntropy function as our loss" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": {}, 444 | "outputs": [], 445 | "source": [ 446 | "loss_function = nn.CrossEntropyLoss()" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "#### Optimizer\n", 454 | "\n", 455 | "The optimizer is the object which handles the update of the model's parameters. 
In the previous exercise, we were using the famous \"delta\" rule to update our weights:\n", 456 | "\n", 457 | "$$\\mathbf{w}_t = \\mathbf{w}_{t-1} - \\alpha \\frac{\\partial L}{\\partial \\mathbf{w}}.$$\n", 458 | "\n", 459 | "But there are more elaborate ways of updating our parameters: \n", 460 | "\n", 461 | "\n", 462 | "\n", 463 | "\n", 464 | "\n", 465 | "\n", 466 | "PyTorch provides an extensive list of optimizers: https://pytorch.org/docs/stable/optim.html. Notice that, as with everything else, it should be easy to define your own optimization procedure. \n", 467 | "\n", 468 | "We will use the simple yet powerful SGD optimizer. The optimizer needs to be told which parameters to optimize." 469 | ] 470 | }, 471 | { 472 | "cell_type": "code", 473 | "execution_count": null, 474 | "metadata": {}, 475 | "outputs": [], 476 | "source": [ 477 | "parameters = linear_model.parameters() # we will optimize all model's parameters!\n", 478 | "optimizer = torch.optim.SGD(parameters, lr=0.1)" 479 | ] 480 | }, 481 | { 482 | "cell_type": "markdown", 483 | "metadata": {}, 484 | "source": [ 485 | "#### Training loop\n", 486 | "\n", 487 | "Now we write the main training loop. This is the basic skeleton for training PyTorch models." 488 | ] 489 | }, 490 | { 491 | "cell_type": "code", 492 | "execution_count": null, 493 | "metadata": {}, 494 | "outputs": [], 495 | "source": [ 496 | "def train_model(model, dataloader, optimizer, loss_function, num_epochs=1):\n", 497 | " # Tell PyTorch that we are in training mode.\n", 498 | " # This is useful for mechanisms that work differently during training and test time, like Dropout. 
\n", 499 | " model.train()\n", 500 | " \n", 501 | " losses = []\n", 502 | " for epoch in range(1, num_epochs+1):\n", 503 | " print('Starting epoch %d' % epoch)\n", 504 | " total_loss = 0\n", 505 | " hits = 0\n", 506 | "\n", 507 | " for batch_x, batch_y in dataloader:\n", 508 | " # check shapes with:\n", 509 | " # import ipdb; ipdb.set_trace()\n", 510 | " # batch_x.shape is (batch_size, 1, 28, 28) because ToTensor() adds a channel dimension\n", 511 | " # batch_y.shape is (batch_size, )\n", 512 | " \n", 513 | " # Step 1. Remember that PyTorch accumulates gradients.\n", 514 | " # We need to clear them out before each step\n", 515 | " optimizer.zero_grad()\n", 516 | " \n", 517 | " # Step 2. Preprocess the data\n", 518 | " # (batch_size, 1, 28, 28) -> (batch_size, 784 = 28 * 28)\n", 519 | " batch_x = batch_x.reshape(batch_x.shape[0], -1)\n", 520 | " batch_x = batch_x.to(torch.float) # ToTensor() already scaled pixels to [0, 1], so no division by 255 here\n", 521 | "\n", 522 | " # Step 3. Run forward pass.\n", 523 | " logits = model(batch_x)\n", 524 | "\n", 525 | " # Step 4. Compute loss\n", 526 | " loss = loss_function(logits, batch_y)\n", 527 | " \n", 528 | " # Step 5. Compute gradients\n", 529 | " loss.backward()\n", 530 | " \n", 531 | " # Step 6. After determining the gradients, take a step in the opposite (negative gradient) direction\n", 532 | " optimizer.step()\n", 533 | " \n", 534 | " # Optional. 
Save statistics of your training\n", 535 | " loss_value = loss.item()\n", 536 | " total_loss += loss_value\n", 537 | " losses.append(loss_value)\n", 538 | " y_pred = logits.argmax(dim=1)\n", 539 | " hits += torch.sum(y_pred == batch_y).item()\n", 540 | " \n", 541 | " avg_loss = total_loss / len(dataloader) # each loss is already a batch mean, so average over batches\n", 542 | " print('Epoch loss: %.4f' % avg_loss)\n", 543 | " acc = hits / len(dataloader.dataset)\n", 544 | " print('Epoch accuracy: %.4f' % acc)\n", 545 | " \n", 546 | " print('Done!')\n", 547 | " return losses" 548 | ] 549 | }, 550 | { 551 | "cell_type": "code", 552 | "execution_count": null, 553 | "metadata": {}, 554 | "outputs": [], 555 | "source": [ 556 | "linear_losses = train_model(linear_model, train_dataloader, optimizer, loss_function, num_epochs=10)" 557 | ] 558 | }, 559 | { 560 | "cell_type": "markdown", 561 | "metadata": {}, 562 | "source": [ 563 | "Plots help us understand the performance of a model. Let's plot the loss curve by training step:" 564 | ] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": null, 569 | "metadata": {}, 570 | "outputs": [], 571 | "source": [ 572 | "fig, ax = plt.subplots()\n", 573 | "ax.plot(linear_losses, \"-\")\n", 574 | "ax.set_xlabel('Step')\n", 575 | "ax.set_ylabel('Loss');" 576 | ] 577 | }, 578 | { 579 | "cell_type": "markdown", 580 | "metadata": {}, 581 | "source": [ 582 | "What can you conclude from this?" 583 | ] 584 | }, 585 | { 586 | "cell_type": "markdown", 587 | "metadata": {}, 588 | "source": [ 589 | "## Multilayer Perceptron\n", 590 | "\n", 591 | "We can now proceed to a more sophisticated classifier: a multilayer perceptron. Let's build one using the Sequential API." 
592 | ] 593 | }, 594 | { 595 | "cell_type": "code", 596 | "execution_count": null, 597 | "metadata": {}, 598 | "outputs": [], 599 | "source": [ 600 | "class MLP(nn.Module):\n", 601 | " def __init__(self, n_features, hidden_size, n_classes):\n", 602 | " super().__init__()\n", 603 | " linear_layer1 = nn.Linear(n_features, hidden_size)\n", 604 | " linear_layer2 = nn.Linear(hidden_size, hidden_size)\n", 605 | " linear_layer3 = nn.Linear(hidden_size, n_classes)\n", 606 | " self.feedforward = nn.Sequential(\n", 607 | " linear_layer1, \n", 608 | " nn.Tanh(), \n", 609 | " linear_layer2, \n", 610 | " nn.Tanh(),\n", 611 | " linear_layer3\n", 612 | " )\n", 613 | "\n", 614 | " def forward(self, X):\n", 615 | " return self.feedforward(X)\n", 616 | "\n", 617 | "hidden_size = 200\n", 618 | "mlp = MLP(num_features, hidden_size, num_classes)\n", 619 | "loss_function = nn.CrossEntropyLoss()\n", 620 | "optimizer = torch.optim.SGD(mlp.parameters(), lr=0.1)" 621 | ] 622 | }, 623 | { 624 | "cell_type": "markdown", 625 | "metadata": {}, 626 | "source": [ 627 | "Now let's train the model." 628 | ] 629 | }, 630 | { 631 | "cell_type": "code", 632 | "execution_count": null, 633 | "metadata": {}, 634 | "outputs": [], 635 | "source": [ 636 | "mlp_losses = train_model(mlp, train_dataloader, optimizer, loss_function, num_epochs=5)" 637 | ] 638 | }, 639 | { 640 | "cell_type": "markdown", 641 | "metadata": {}, 642 | "source": [ 643 | "How do the loss and accuracy compare with the linear model?\n", 644 | "\n", 645 | "You probably also noticed a difference in running time!" 
646 | ] 647 | }, 648 | { 649 | "cell_type": "code", 650 | "execution_count": null, 651 | "metadata": {}, 652 | "outputs": [], 653 | "source": [ 654 | "fig, ax = plt.subplots()\n", 655 | "ax.plot(linear_losses, \".\", label=\"linear\")\n", 656 | "ax.plot(mlp_losses, \".\", label=\"mlp\")\n", 657 | "ax.legend()" 658 | ] 659 | }, 660 | { 661 | "cell_type": "markdown", 662 | "metadata": {}, 663 | "source": [ 664 | "Note the different concentration of dots in the MLP and linear plots!" 665 | ] 666 | }, 667 | { 668 | "cell_type": "markdown", 669 | "metadata": {}, 670 | "source": [ 671 | "### Validation data\n", 672 | "\n", 673 | "Evaluating the performance on training data is important to understand if the model is actually learning, but if we want to know whether our model is actually useful, we should evaluate its performance on validation or test data.\n", 674 | "\n" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": null, 680 | "metadata": {}, 681 | "outputs": [], 682 | "source": [ 683 | "def evaluate_model(model, test_x, test_y):\n", 684 | " # Tell PyTorch that we are in evaluation mode.\n", 685 | " model.eval()\n", 686 | "\n", 687 | " with torch.no_grad():\n", 688 | " loss_function = torch.nn.CrossEntropyLoss()\n", 689 | " logits = model(test_x)\n", 690 | " loss = loss_function(logits, test_y)\n", 691 | "\n", 692 | " y_pred = logits.argmax(dim=1)\n", 693 | " hits = torch.sum(y_pred == test_y).item()\n", 694 | " # CrossEntropyLoss already averages over the samples; only the hit count needs normalizing\n", 695 | " return loss.item(), hits / len(test_x)" 696 | ] 697 | }, 698 | { 699 | "cell_type": "code", 700 | "execution_count": null, 701 | "metadata": {}, 702 | "outputs": [], 703 | "source": [ 704 | "evaluate_model(mlp, train_x, train_y)" 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": null, 710 | "metadata": {}, 711 | "outputs": [], 712 | "source": [ 713 | "evaluate_model(mlp, test_x, test_y)" 714 | ] 715 | }, 716 | { 717 | "cell_type": "code", 718 | "execution_count": null, 719 | 
"metadata": {}, 720 | "outputs": [], 721 | "source": [ 722 | "evaluate_model(linear_model, train_x, train_y)" 723 | ] 724 | }, 725 | { 726 | "cell_type": "code", 727 | "execution_count": null, 728 | "metadata": {}, 729 | "outputs": [], 730 | "source": [ 731 | "evaluate_model(linear_model, test_x, test_y)" 732 | ] 733 | }, 734 | { 735 | "cell_type": "markdown", 736 | "metadata": {}, 737 | "source": [ 738 | "How can we make our model better? There are two things to be done:\n", 739 | "\n", 740 | "1. **Hyperparameter search**. Do a grid search or random search on the hyperparameters (hidden size, learning rate, batch size, activation function, type of optimizer, ...)\n", 741 | "2. **Generalize better**. This include either finding some better feature representation or regularizing, i.e., add some kind of penalty to the model weights that encourages it to find a more general solution. Examples: L2-norm weight regularization, dropout.\n", 742 | "3. **Early stop**. Evaluate the model on validation data after each epoch or some number of batches; only save it when validation performance increases. This means detecting when the model achieved its performance peak." 743 | ] 744 | }, 745 | { 746 | "cell_type": "markdown", 747 | "metadata": {}, 748 | "source": [ 749 | "#### Dropout\n", 750 | "\n", 751 | "We could try dropout. It effectivelly deactivates some neural connections at random, forcing the network to avoid depending on specific inputs." 
752 | ] 753 | }, 754 | { 755 | "cell_type": "code", 756 | "execution_count": null, 757 | "metadata": {}, 758 | "outputs": [], 759 | "source": [ 760 | "class MLPDropout(nn.Module):\n", 761 | " def __init__(self, n_features, hidden_size, n_classes, p_dropout):\n", 762 | " super().__init__()\n", 763 | " linear_layer1 = nn.Linear(n_features, hidden_size)\n", 764 | " linear_layer2 = nn.Linear(hidden_size, n_classes)\n", 765 | " self.feedforward = nn.Sequential(\n", 766 | " linear_layer1,\n", 767 | " nn.Tanh(),\n", 768 | " nn.Dropout(p_dropout),\n", 769 | " linear_layer2\n", 770 | " )\n", 771 | "\n", 772 | " def forward(self, X):\n", 773 | " return self.feedforward(X)\n", 774 | "\n", 775 | "hidden_size = 200\n", 776 | "p_dropout = 0.5\n", 777 | "mlp_dropout = MLPDropout(num_features, hidden_size, num_classes, p_dropout)\n", 778 | "loss_function = nn.CrossEntropyLoss()\n", 779 | "optimizer = torch.optim.SGD(mlp_dropout.parameters(), lr=0.1)" 780 | ] 781 | }, 782 | { 783 | "cell_type": "code", 784 | "execution_count": null, 785 | "metadata": {}, 786 | "outputs": [], 787 | "source": [ 788 | "losses = train_model(mlp_dropout, train_dataloader, optimizer, loss_function, num_epochs=5)" 789 | ] 790 | }, 791 | { 792 | "cell_type": "markdown", 793 | "metadata": {}, 794 | "source": [ 795 | "Training loss is a bit worse, as expected. After all, we are obstructing some connections.\n", 796 | "\n", 797 | "Now let's check validation performance:" 798 | ] 799 | }, 800 | { 801 | "cell_type": "code", 802 | "execution_count": null, 803 | "metadata": {}, 804 | "outputs": [], 805 | "source": [ 806 | "evaluate_model(mlp, test_x, test_y)" 807 | ] 808 | }, 809 | { 810 | "cell_type": "code", 811 | "execution_count": null, 812 | "metadata": {}, 813 | "outputs": [], 814 | "source": [ 815 | "evaluate_model(mlp_dropout, test_x, test_y)" 816 | ] 817 | }, 818 | { 819 | "cell_type": "markdown", 820 | "metadata": {}, 821 | "source": [ 822 | "No improvement. 
Ideally, we should retrain our model with different hyperparameters (learning rates, layer sizes, number of layers, dropout rate) as well as some changes in the structure (different optimizers, activation functions, losses). However, data representation also plays a key role. \n", 823 | "\n", 824 | "
\n", 825 | "
\n", 826 | "Do you think representing the input as independent pixels is a good idea for recognizing digits?\n", 827 | "
" 828 | ] 829 | }, 830 | { 831 | "cell_type": "markdown", 832 | "metadata": {}, 833 | "source": [ 834 | "### Saving\n", 835 | "\n", 836 | "Persisting the model after training is obviously important to reuse it later. In Pytorch, we can save the model calling `save()` and passing the model's `state_dict` (a Python dict that maps all parameters name to their actual tensors)." 837 | ] 838 | }, 839 | { 840 | "cell_type": "code", 841 | "execution_count": null, 842 | "metadata": {}, 843 | "outputs": [], 844 | "source": [ 845 | "torch.save(mlp.state_dict(), 'mlp.model')" 846 | ] 847 | }, 848 | { 849 | "cell_type": "markdown", 850 | "metadata": {}, 851 | "source": [ 852 | "Later, recreate the model and load the data." 853 | ] 854 | }, 855 | { 856 | "cell_type": "code", 857 | "execution_count": null, 858 | "metadata": {}, 859 | "outputs": [], 860 | "source": [ 861 | "mlp2 = MLP(num_features, hidden_size, num_classes)\n", 862 | "mlp2.load_state_dict(torch.load('mlp.model'))" 863 | ] 864 | }, 865 | { 866 | "cell_type": "markdown", 867 | "metadata": {}, 868 | "source": [ 869 | "Let's check the performance to see if it's the same!" 
870 | ] 871 | }, 872 | { 873 | "cell_type": "code", 874 | "execution_count": null, 875 | "metadata": {}, 876 | "outputs": [], 877 | "source": [ 878 | "evaluate_model(mlp2, test_x, test_y)" 879 | ] 880 | }, 881 | { 882 | "cell_type": "markdown", 883 | "metadata": {}, 884 | "source": [ 885 | "# The End\n", 886 | "\n", 887 | "![https://twitter.com/karpathy/status/1013244313327681536](img/common_mistakes.png)\n", 888 | "https://twitter.com/karpathy/status/1013244313327681536" 889 | ] 890 | }, 891 | { 892 | "cell_type": "markdown", 893 | "metadata": {}, 894 | "source": [ 895 | "### Exercises\n", 896 | "\n", 897 | "- Try running the MLP example for more epochs\n", 898 | "- Try using CNNs: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html" 899 | ] 900 | } 901 | ], 902 | "metadata": { 903 | "kernelspec": { 904 | "display_name": "Python 3 (ipykernel)", 905 | "language": "python", 906 | "name": "python3" 907 | }, 908 | "language_info": { 909 | "codemirror_mode": { 910 | "name": "ipython", 911 | "version": 3 912 | }, 913 | "file_extension": ".py", 914 | "mimetype": "text/x-python", 915 | "name": "python", 916 | "nbconvert_exporter": "python", 917 | "pygments_lexer": "ipython3", 918 | "version": "3.9.7" 919 | } 920 | }, 921 | "nbformat": 4, 922 | "nbformat_minor": 2 923 | } 924 | -------------------------------------------------------------------------------- /04-optional-word2vec.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Word2Vec\n", 8 | "\n", 9 | "> \"Word2vec is a technique for natural language processing. The word2vec algorithm uses a neural network model to learn word associations from a large corpus of text. Once trained, such a model can detect synonymous words or suggest additional words for a partial sentence. 
As the name implies, word2vec represents each distinct word with a particular list of numbers called a vector. The vectors are chosen carefully such that a simple mathematical function (the cosine similarity between the vectors) indicates the level of semantic similarity between the words represented by those vectors.\" [ https://en.wikipedia.org/wiki/Word2vec ]\n", 10 | "\n", 11 | "\n", 12 | "There are two Word2Vec architectures: \n", 13 | "\n", 14 | "- **CBOW (Continuous Bag-of-Words)** predicts the central word from the sum of context vectors. This simple sum of word vectors is called a \"bag of words\", which gives the model its name.\n", 15 | "\n", 16 | "- **Skip-Gram** predicts context words given the central word. Skip-Gram with negative sampling is the most popular approach.\n", 17 | "\n", 18 | "Here we will build a PyTorch model that implements Word2Vec's CBOW strategy." 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": {}, 31 | "source": [ 32 | "## What can we do with it?\n", 33 | "\n", 34 | "To measure the proximity of words, the cosine or Euclidean distance between their vectors is usually used. 
Using word embeddings, we can build semantic proportions (also known as analogies) and solve examples like:\n", 35 | "\n", 36 | "$$\n", 37 | "\\textit{king: male = queen: female} \\\\\n", 38 | "\\Downarrow \\\\\n", 39 | "\\textit{king - man + woman = queen}\n", 40 | "$$\n", 41 | "\n", 42 | "" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "## Implementing Word2vec CBOW" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": null, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "import torch\n", 59 | "import torch.nn as nn\n", 60 | "import torch.optim as optim\n", 61 | "torch.manual_seed(0);" 62 | ] 63 | }, 64 | { 65 | "cell_type": "code", 66 | "execution_count": null, 67 | "metadata": {}, 68 | "outputs": [], 69 | "source": [ 70 | "class CBOW(nn.Module):\n", 71 | "\n", 72 | " def __init__(self, vocab_size, emb_size):\n", 73 | " super().__init__()\n", 74 | " self.word_emb = nn.Embedding(vocab_size, emb_size)\n", 75 | " self.linear = nn.Linear(emb_size, vocab_size)\n", 76 | "\n", 77 | " def forward(self, x):\n", 78 | " # (batch_size, context_size) -> (batch_size, context_size, emb_dim)\n", 79 | " x = self.word_emb(x)\n", 80 | " \n", 81 | " # (batch_size, context_size, emb_dim) -> (batch_size, emb_dim)\n", 82 | " x = x.sum(dim=1)\n", 83 | "\n", 84 | " # (batch_size, emb_dim) -> (batch_size, vocab_size)\n", 85 | " logits = self.linear(x)\n", 86 | "\n", 87 | " return torch.log_softmax(logits, dim=-1)" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "## Exercise\n", 95 | "Instantiate the model and write a proper training loop. 
Here are some functions to help you make the data ready for use:" 96 | ] 97 | }, 98 | { 99 | "cell_type": "markdown", 100 | "metadata": {}, 101 | "source": [ 102 | "## Data" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": null, 108 | "metadata": {}, 109 | "outputs": [], 110 | "source": [ 111 | "from torch.utils.data import Dataset, DataLoader\n", 112 | "\n", 113 | "class ContextDataset(Dataset):\n", 114 | " \n", 115 | " def __init__(self, tokenized_texts, context_size=2):\n", 116 | " super().__init__()\n", 117 | " # shifted by 2 due to special tokens for padding and unknown tokens\n", 118 | " self.word_to_ix = {}\n", 119 | " self.word_to_ix[''] = 0\n", 120 | " self.word_to_ix[''] = 1\n", 121 | " for text in tokenized_texts:\n", 122 | " self.add_to_vocab(text)\n", 123 | " self.context_size = context_size\n", 124 | " self.contexts = []\n", 125 | " self.targets = []\n", 126 | " for text in tokenized_texts:\n", 127 | " self.add_to_context_and_target(text)\n", 128 | " \n", 129 | " def add_to_vocab(self, text):\n", 130 | " for word in text:\n", 131 | " if word not in self.word_to_ix.keys():\n", 132 | " self.word_to_ix[word] = len(self.word_to_ix)\n", 133 | " \n", 134 | " def add_to_context_and_target(self, text):\n", 135 | " # k words to the left and k to the right\n", 136 | " k = self.context_size\n", 137 | " for i in range(len(text)):\n", 138 | " context = [text[i+j] if 0 <= i+j < len(text) else '' for j in range(-k, k+1) if j != 0]\n", 139 | " target = text[i]\n", 140 | " self.contexts.append(self.get_words_ids(context))\n", 141 | " self.targets.append(self.get_word_id(target))\n", 142 | " \n", 143 | " def get_word_id(self, word):\n", 144 | " if word in self.word_to_ix.keys():\n", 145 | " return self.word_to_ix[word]\n", 146 | " return self.word_to_ix['']\n", 147 | "\n", 148 | " def get_words_ids(self, words):\n", 149 | " return [self.get_word_id(w) for w in words]\n", 150 | " \n", 151 | " @property\n", 152 | " def ix_to_word(self):\n", 
153 | " return list(self.word_to_ix.keys())\n", 154 | " \n", 155 | " @property\n", 156 | " def vocab_size(self):\n", 157 | " return len(self.word_to_ix)\n", 158 | " \n", 159 | " def __getitem__(self, idx):\n", 160 | " context = torch.tensor(self.contexts[idx], dtype=torch.long)\n", 161 | " target = torch.tensor(self.targets[idx], dtype=torch.long)\n", 162 | " return context, target\n", 163 | " \n", 164 | " def __len__(self):\n", 165 | " return len(self.contexts)\n" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": {}, 172 | "outputs": [], 173 | "source": [ 174 | "raw_texts = [\n", 175 | " \"we are about to study the idea of a computational process .\",\n", 176 | " \"computational processes are abstract beings that inhabit computers .\",\n", 177 | " \"as they evolve, processes manipulate other abstract things called data .\",\n", 178 | " \"the evolution of a process is directed by a pattern of rules called a program .\",\n", 179 | " \"people create programs to direct processes .\", \n", 180 | " \"in effect , we conjure the spirits of the computer with our spells .\"\n", 181 | "]\n", 182 | "tokenized_texts = [text.lower().split() for text in raw_texts]\n", 183 | "\n", 184 | "train_dataset = ContextDataset(tokenized_texts, context_size=2)\n", 185 | "train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True)\n", 186 | "vocab = train_dataset.word_to_ix\n", 187 | "print('Dataset size:', len(train_dataset))\n", 188 | "print('Vocab size:', train_dataset.vocab_size)" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "## Model" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": null, 201 | "metadata": {}, 202 | "outputs": [], 203 | "source": [ 204 | "emb_size = 2\n", 205 | "lr = 0.1\n", 206 | "\n", 207 | "model = CBOW(train_dataset.vocab_size, emb_size)\n", 208 | "loss_function = nn.NLLLoss()\n", 209 | "optimizer = 
torch.optim.SGD(model.parameters(), lr=lr)" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": {}, 215 | "source": [ 216 | "## Training loop" 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": {}, 223 | "outputs": [], 224 | "source": [ 225 | "def train_model(model, dataloader, optimizer, loss_function, num_epochs=1):\n", 226 | " model.train()\n", 227 | " losses = []\n", 228 | " for epoch in range(1, num_epochs+1):\n", 229 | " print('Starting epoch %d' % epoch)\n", 230 | " total_loss = 0\n", 231 | " hits = 0\n", 232 | " for batch_x, batch_y in dataloader:\n", 233 | " optimizer.zero_grad()\n", 234 | " logits = model(batch_x)\n", 235 | " loss = loss_function(logits, batch_y)\n", 236 | " loss.backward()\n", 237 | " optimizer.step()\n", 238 | "\n", 239 | " loss_value = loss.item()\n", 240 | " total_loss += loss_value\n", 241 | " losses.append(loss_value)\n", 242 | " y_pred = logits.argmax(dim=1)\n", 243 | " hits += torch.sum(y_pred == batch_y).item()\n", 244 | " avg_loss = total_loss / len(dataloader)\n", 245 | " print('Epoch loss: %.4f' % avg_loss)\n", 246 | " acc = hits / len(dataloader.dataset)\n", 247 | " print('Epoch accuracy: %.4f' % acc)\n", 248 | " print('Done!')\n", 249 | " return losses" 250 | ] 251 | }, 252 | { 253 | "cell_type": "code", 254 | "execution_count": null, 255 | "metadata": { 256 | "scrolled": true 257 | }, 258 | "outputs": [], 259 | "source": [ 260 | "losses = train_model(model, train_dataloader, optimizer, loss_function, num_epochs=10)" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": {}, 267 | "outputs": [], 268 | "source": [ 269 | "from matplotlib import pyplot as plt\n", 270 | "fig, ax = plt.subplots()\n", 271 | "ax.plot(losses, \".\")" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "## Plot vectors\n", 279 | "\n", 280 | "Since we mapped 
words to 2D vectors, we can actually plot them. In the real world, however, we would use much larger vector dimensionalities, so we would need some sort of dimensionality reduction algorithm to see a plot like this." 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": null, 286 | "metadata": {}, 287 | "outputs": [], 288 | "source": [ 289 | "def get_vector(w):\n", 290 | " return model.word_emb(torch.tensor(vocab[w]))\n", 291 | "\n", 292 | "with torch.no_grad():\n", 293 | " fig, ax = plt.subplots(figsize=(12, 8))\n", 294 | " for w in train_dataset.word_to_ix:\n", 295 | " vec = get_vector(w)\n", 296 | " ax.plot(vec[0], vec[1], 'k.')\n", 297 | " ax.annotate(w, (vec[0], vec[1]), textcoords=\"offset points\", xytext=(0, 5), ha='center')" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "## Finding closest words" 305 | ] 306 | }, 307 | { 308 | "cell_type": "code", 309 | "execution_count": null, 310 | "metadata": {}, 311 | "outputs": [], 312 | "source": [ 313 | "def closest(word, n=10):\n", 314 | " vec = get_vector(word)\n", 315 | " all_dists = [(w, torch.dist(vec, get_vector(w)).item()) for w in vocab.keys()]\n", 316 | " return sorted(all_dists, key=lambda t: t[1])[:n]" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": null, 322 | "metadata": {}, 323 | "outputs": [], 324 | "source": [ 325 | "closest('program', n=10)" 326 | ] 327 | }, 328 | { 329 | "cell_type": "markdown", 330 | "metadata": {}, 331 | "source": [ 332 | "## Exercise\n", 333 | "\n", 334 | "Try to implement the SkipGram approach." 
335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": {}, 340 | "source": [ 341 | "## More information\n", 342 | "\n", 343 | "If you like, these PyTorch's NLP tutorials are a good place to start building NLP models:\n", 344 | "\n", 345 | "- https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html\n", 346 | "- https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html\n", 347 | "- https://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html\n", 348 | "- https://pytorch.org/tutorials/beginner/text_sentiment_ngrams_tutorial.html\n", 349 | "- https://pytorch.org/tutorials/beginner/transformer_tutorial.html" 350 | ] 351 | } 352 | ], 353 | "metadata": { 354 | "kernelspec": { 355 | "display_name": "Python 3 (ipykernel)", 356 | "language": "python", 357 | "name": "python3" 358 | }, 359 | "language_info": { 360 | "codemirror_mode": { 361 | "name": "ipython", 362 | "version": 3 363 | }, 364 | "file_extension": ".py", 365 | "mimetype": "text/x-python", 366 | "name": "python", 367 | "nbconvert_exporter": "python", 368 | "pygments_lexer": "ipython3", 369 | "version": "3.9.7" 370 | } 371 | }, 372 | "nbformat": 4, 373 | "nbformat_minor": 4 374 | } 375 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | An introductory course for PyTorch. 2 | 3 | Throughout this course we will be using: 4 | - Python 3.6+. 5 | - PyTorch 1.11.0 6 | 7 | 8 | # Lectures 9 | 10 | [Lecture 0](https://github.com/mtreviso/pytorch-lecture/blob/master/00-intro.ipynb): Hello world, introduction to Jupyter, and PyTorch high-level overview 11 |
12 | [Lecture 1](https://github.com/mtreviso/pytorch-lecture/blob/master/01-pytorch-basics.ipynb): Introduction to PyTorch: tensors, tensor operations, gradients, autodiff, and broadcasting 13 |
14 | [Lecture 2](https://github.com/mtreviso/pytorch-lecture/blob/master/02-linear-regression.ipynb): Linear Regression via Gradient Descent using Numpy, Numpy + Autodiff, and PyTorch 15 |
16 | [Lecture 3](https://github.com/mtreviso/pytorch-lecture/blob/master/03-modules-and-mlps.ipynb): PyTorch `nn.Modules` alongside training and evaluation loop 17 |
18 | [Lecture 4](https://github.com/mtreviso/pytorch-lecture/blob/master/04-optional-word2vec.ipynb): Implementation of a proof-of-concept Word2Vec in PyTorch
19 | ⏳ [Bonus](https://github.com/mtreviso/pytorch-lecture/blob/master/bonus-computational-efficiency.ipynb): Comparison of the computation efficiency between raw Python, Numpy, and PyTorch (+JIT) 20 |
21 | 🔥 [PyTorch Challenges](https://github.com/mtreviso/pytorch-lecture/blob/master/challenges-for-true-pytorch-heroes.ipynb): a set of 27 mini-puzzles (extension of the ones proposed by [Sasha Rush](https://github.com/srush/Tensor-Puzzles)) 22 |
23 | 🌎 [From Puzzles to Real Code](https://github.com/mtreviso/pytorch-lecture/blob/master/broadcasting_real_examples.ipynb): Examples of broadcasting in real-world applications: **wordpieces aggregation**, **clustered attention**, **attention statistics**. 24 | 25 | 26 | # Installation 27 | 28 | First, clone this repository using `git`: 29 | 30 | ```sh 31 | git clone https://github.com/mtreviso/pytorch-lecture.git 32 | cd pytorch-lecture 33 | ``` 34 | 35 | It is highly recommended that you work inside a Python virtualenv. You can create one and install all dependencies via: 36 | ```sh 37 | python3 -m venv env 38 | source env/bin/activate 39 | pip3 install -r requirements.txt 40 | ``` 41 | 42 | Run Jupyter: 43 | ```sh 44 | jupyter-notebook 45 | ``` 46 | 47 | After running the command above, your browser will automatically open the Jupyter homepage: `http://localhost:8888/tree`. 48 | 49 | 50 | 51 | -------------------------------------------------------------------------------- /bonus-computational-efficiency.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Computation Efficiency with Numpy, PyTorch, and JIT\n", 8 | "\n", 9 | "This notebook illustrates the computational efficiency of running linear algebra with the proper tools - such as numpy." 
10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": null, 15 | "metadata": {}, 16 | "outputs": [], 17 | "source": [ 18 | "from matplotlib import pyplot as plt\n", 19 | "\n", 20 | "def plot_times(labels, times):\n", 21 | " x = list(range(len(times)))\n", 22 | " fig, ax = plt.subplots()\n", 23 | " ax.grid(alpha=0.5, ls='--', which='both')\n", 24 | " ax.bar(x, times, log=True)\n", 25 | " ax.set_xticks(x, labels)\n", 26 | " ax.set_axisbelow(True)" 27 | ] 28 | }, 29 | { 30 | "cell_type": "markdown", 31 | "metadata": {}, 32 | "source": [ 33 | "Let's compute an array dot product in Python:" 34 | ] 35 | }, 36 | { 37 | "cell_type": "code", 38 | "execution_count": null, 39 | "metadata": {}, 40 | "outputs": [], 41 | "source": [ 42 | "def array_dot_product(v1, v2):\n", 43 | " result = 0\n", 44 | " for v1_i, v2_i in zip(v1, v2):\n", 45 | " result += v1_i * v2_i\n", 46 | " return result\n", 47 | "\n", 48 | "v1 = list(range(100))\n", 49 | "v2 = [1]*100\n", 50 | "\n", 51 | "print(\"v1 = {}\".format(v1))\n", 52 | "print(\"v2 = {}\\n\".format(v2))\n", 53 | "\n", 54 | "print(\"v1 dot v2: {}\".format(array_dot_product(v1, v2)))\n", 55 | "print(\"1+2+...+99:\", 99*100/2)" 56 | ] 57 | }, 58 | { 59 | "cell_type": "markdown", 60 | "metadata": {}, 61 | "source": [ 62 | "Okay, it works, but how long does it take?" 63 | ] 64 | }, 65 | { 66 | "cell_type": "code", 67 | "execution_count": null, 68 | "metadata": {}, 69 | "outputs": [], 70 | "source": [ 71 | "%timeit array_dot_product(v1, v2)" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "## Enters numpy\n", 79 | "\n", 80 | "Now let's try with numpy, which uses a C backend optimized for mathematical operations, alleviating the Python overhead." 
81 | ] 82 | }, 83 | { 84 | "cell_type": "code", 85 | "execution_count": null, 86 | "metadata": {}, 87 | "outputs": [], 88 | "source": [ 89 | "import numpy as np" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": null, 95 | "metadata": {}, 96 | "outputs": [], 97 | "source": [ 98 | "v1_np = np.arange(100)\n", 99 | "v2_np = np.ones(100)\n", 100 | "print(\"v1 dot v2: {}\".format(v1_np.dot(v2_np)))" 101 | ] 102 | }, 103 | { 104 | "cell_type": "markdown", 105 | "metadata": {}, 106 | "source": [ 107 | "Nice, aligned with our raw Python version. Now let's check the running time." 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": null, 113 | "metadata": {}, 114 | "outputs": [], 115 | "source": [ 116 | "%timeit v1_np.dot(v2_np)" 117 | ] 118 | }, 119 | { 120 | "cell_type": "markdown", 121 | "metadata": {}, 122 | "source": [ 123 | "We can already see the difference. Numpy was roughly 6x faster than raw Python for a very small array. Now let's check with matrices." 
124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": null, 129 | "metadata": {}, 130 | "outputs": [], 131 | "source": [ 132 | "def matrix_mul(m1, m2):\n", 133 | " num_rows = len(m1)\n", 134 | " num_columns = len(m2[0])\n", 135 | " internal_dim = len(m1[0])\n", 136 | " result = []\n", 137 | " for i in range(num_rows):\n", 138 | " new_row = []\n", 139 | " for j in range(num_columns):\n", 140 | " total = 0\n", 141 | " for k in range(internal_dim):\n", 142 | " total += m1[i][k] * m2[k][j]\n", 143 | " new_row.append(total)\n", 144 | " result.append(new_row)\n", 145 | " return result" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": null, 151 | "metadata": {}, 152 | "outputs": [], 153 | "source": [ 154 | "m1_np = np.random.randn(100, 200)\n", 155 | "m2_np = np.random.randn(200, 100)\n", 156 | "m1_list = m1_np.tolist()\n", 157 | "m2_list = m2_np.tolist()\n", 158 | "\n", 159 | "result_raw = matrix_mul(m1_list, m2_list)\n", 160 | "result_np = m1_np.dot(m2_np)" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": {}, 166 | "source": [ 167 | "Checking the results..." 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": null, 173 | "metadata": {}, 174 | "outputs": [], 175 | "source": [ 176 | "eps = np.abs(result_raw - result_np).sum()\n", 177 | "print('{} up to {}'.format(np.allclose(result_raw, result_np), eps))" 178 | ] 179 | }, 180 | { 181 | "cell_type": "markdown", 182 | "metadata": {}, 183 | "source": [ 184 | "Okay. Now let's time it again." 
185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": null, 190 | "metadata": {}, 191 | "outputs": [], 192 | "source": [ 193 | "time_raw = %timeit -o matrix_mul(m1_list, m2_list) " 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": null, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "time_np = %timeit -o m1_np.dot(m2_np)" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": null, 208 | "metadata": { 209 | "scrolled": true 210 | }, 211 | "outputs": [], 212 | "source": [ 213 | "time_ratio = time_raw.average / time_np.average\n", 214 | "print('Numpy is ~{:.0f}x faster than standard python'.format(time_ratio))\n", 215 | "print('Something that runs in 1h in numpy would need to run for {:.0f} days in raw python'.format(time_ratio / 24))" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [ 224 | "plot_times(['python', 'numpy'], [time_raw.average, time_np.average])" 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "## Enters PyTorch\n", 232 | "\n", 233 | "Now let's try with PyTorch. Note that PyTorch also uses a C backend to implement linear algebra methods. However, it can also run those operations on GPUs. Let's try both variants and compare them." 
234 | ] 235 | }, 236 | { 237 | "cell_type": "code", 238 | "execution_count": null, 239 | "metadata": {}, 240 | "outputs": [], 241 | "source": [ 242 | "import torch" 243 | ] 244 | }, 245 | { 246 | "cell_type": "code", 247 | "execution_count": null, 248 | "metadata": {}, 249 | "outputs": [], 250 | "source": [ 251 | "m1_pt = torch.from_numpy(m1_np)\n", 252 | "m2_pt = torch.from_numpy(m2_np)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": null, 258 | "metadata": {}, 259 | "outputs": [], 260 | "source": [ 261 | "time_pt = %timeit -o m1_pt @ m2_pt" 262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": null, 267 | "metadata": {}, 268 | "outputs": [], 269 | "source": [ 270 | "plot_times(['python', 'numpy', 'pytorch'], \n", 271 | " [time_raw.average, time_np.average, time_pt.average])" 272 | ] 273 | }, 274 | { 275 | "cell_type": "markdown", 276 | "metadata": {}, 277 | "source": [ 278 | "Seems about the same... Now let's try to use a GPU:" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": null, 284 | "metadata": {}, 285 | "outputs": [], 286 | "source": [ 287 | "m1_pt = m1_pt.to('cuda' if torch.cuda.is_available() else 'cpu')\n", 288 | "m2_pt = m2_pt.to('cuda' if torch.cuda.is_available() else 'cpu')\n", 289 | "time_pt_gpu = %timeit -o m1_pt @ m2_pt" 290 | ] 291 | }, 292 | { 293 | "cell_type": "code", 294 | "execution_count": null, 295 | "metadata": {}, 296 | "outputs": [], 297 | "source": [ 298 | "plot_times(['numpy', 'pytorch (cpu)', 'pytorch (gpu)'], \n", 299 | " [time_np.average, time_pt.average, time_pt_gpu.average])" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "## Enters JIT" 307 | ] 308 | }, 309 | { 310 | "cell_type": "markdown", 311 | "metadata": {}, 312 | "source": [ 313 | "Now suppose we have an even more complicated function that contains control flows (if-else statements). 
To handle that, we have to rely on the Python interpreter, which is slow. To circumvent that, we can \"compile\" our function/module into a fixed intermediate-level code representation. \n", 314 | "\n", 315 | "https://pytorch.org/docs/stable/jit.html" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": null, 321 | "metadata": {}, 322 | "outputs": [], 323 | "source": [ 324 | "@torch.jit.script\n", 325 | "def jit_mm(m1, m2):\n", 326 | " return m1 @ m2\n", 327 | "\n", 328 | "time_pt_jit = %timeit -o jit_mm(m1_pt, m2_pt)\n", 329 | "\n", 330 | "plot_times(['numpy', 'pt (cpu)', 'pt (gpu)', 'pt (gpu+jit)'], \n", 331 | " [time_np.average, time_pt.average, time_pt_gpu.average, time_pt_jit.average])" 332 | ] 333 | }, 334 | { 335 | "cell_type": "markdown", 336 | "metadata": {}, 337 | "source": [ 338 | "For more optimizations, check this blog post by Horace He:\n", 339 | "[Making Deep Learning Go Brrrr From First Principles](https://horace.io/brrr_intro.html)" 340 | ] 341 | } 342 | ], 343 | "metadata": { 344 | "anaconda-cloud": {}, 345 | "kernelspec": { 346 | "display_name": "Python 3 (ipykernel)", 347 | "language": "python", 348 | "name": "python3" 349 | }, 350 | "language_info": { 351 | "codemirror_mode": { 352 | "name": "ipython", 353 | "version": 3 354 | }, 355 | "file_extension": ".py", 356 | "mimetype": "text/x-python", 357 | "name": "python", 358 | "nbconvert_exporter": "python", 359 | "pygments_lexer": "ipython3", 360 | "version": "3.9.7" 361 | } 362 | }, 363 | "nbformat": 4, 364 | "nbformat_minor": 2 365 | } 366 | -------------------------------------------------------------------------------- /challenges-for-true-pytorch-heroes-solutions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# PyTorch Challenges\n", 8 | "\n", 9 | "This set of challenges is about broadcasting, one of the key concepts 
when dealing with tensors.\n", 10 | "\n", 11 | "[Sasha Rush](https://twitter.com/srush_nlp) compiled a set of [16 Tensor mini-puzzles](https://github.com/srush/Tensor-Puzzles) that involve reasoning about broadcasting in a constrained setting: people are allowed to use only a single PyTorch function: `torch.arange`. Can you do it?\n", 12 | "\n", 13 | "Here, I've extended his list to 26 puzzles! \n", 14 | "\n", 15 | "**Rules**\n", 16 | "\n", 17 | "- Each puzzle needs to be solved in 1 line (<80 columns) of code.\n", 18 | "- You are allowed @, arithmetic, comparison, shape, any indexing (e.g. `a[:j], a[:, None], a[arange(10)]`), and previous puzzle functions.\n", 19 | "- To start off, we give you an implementation for the `torch.arange` function.\n", 20 | "\n", 21 | "**Anti-Rules**\n", 22 | "- Nothing else. No `.view, .sum, .take, .squeeze, .tensor`.\n", 23 | "- No cheating. Stackoverflow is great, but this is about first-principles.\n", 24 | "- Hint... these puzzles are mostly about [Broadcasting](https://pytorch.org/docs/master/notes/broadcasting.html). Make sure you understand this rule.\n" 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "---" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": null, 37 | "metadata": {}, 38 | "outputs": [], 39 | "source": [ 40 | "%load_ext autoreload\n", 41 | "%autoreload 2" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 169, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "import torch\n", 51 | "from spec import make_test, run_test, TT" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "### arange\n", 59 | "\n", 60 | "This is given for free! 
Think about it as a \"for-loop\"" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 170, 66 | "metadata": {}, 67 | "outputs": [ 68 | { 69 | "data": { 70 | "text/plain": [ 71 | "tensor([0, 1, 2, 3, 4, 5])" 72 | ] 73 | }, 74 | "execution_count": 170, 75 | "metadata": {}, 76 | "output_type": "execute_result" 77 | } 78 | ], 79 | "source": [ 80 | "def arange(i: int):\n", 81 | " return torch.arange(i)\n", 82 | "\n", 83 | "arange(6)" 84 | ] 85 | }, 86 | { 87 | "cell_type": "markdown", 88 | "metadata": {}, 89 | "source": [ 90 | "### where" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 171, 96 | "metadata": {}, 97 | "outputs": [ 98 | { 99 | "data": { 100 | "text/plain": [ 101 | "tensor([ 0, -1, 2, -1])" 102 | ] 103 | }, 104 | "execution_count": 171, 105 | "metadata": {}, 106 | "output_type": "execute_result" 107 | } 108 | ], 109 | "source": [ 110 | "def where(q, a, b):\n", 111 | " return q * a + (~q) * b\n", 112 | "\n", 113 | "where(arange(4) % 2 == 0, arange(4), -1)" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "### ones" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 172, 126 | "metadata": {}, 127 | "outputs": [ 128 | { 129 | "data": { 130 | "text/plain": [ 131 | "tensor([1, 1, 1, 1])" 132 | ] 133 | }, 134 | "execution_count": 172, 135 | "metadata": {}, 136 | "output_type": "execute_result" 137 | } 138 | ], 139 | "source": [ 140 | "def ones(i: int):\n", 141 | " return where(arange(i) >= 0, 1, 0)\n", 142 | "\n", 143 | "ones(4)" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "### sum" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 173, 156 | "metadata": {}, 157 | "outputs": [ 158 | { 159 | "data": { 160 | "text/plain": [ 161 | "tensor(6)" 162 | ] 163 | }, 164 | "execution_count": 173, 165 | "metadata": {}, 166 | "output_type": "execute_result" 167 | } 168 | 
], 169 | "source": [ 170 | "def sum(a: torch.Tensor):\n", 171 | " return ones(a.shape[0]) @ a\n", 172 | "\n", 173 | "sum(arange(4))" 174 | ] 175 | }, 176 | { 177 | "cell_type": "markdown", 178 | "metadata": {}, 179 | "source": [ 180 | "### outer" 181 | ] 182 | }, 183 | { 184 | "cell_type": "code", 185 | "execution_count": 174, 186 | "metadata": {}, 187 | "outputs": [ 188 | { 189 | "data": { 190 | "text/plain": [ 191 | "tensor([[0, 0, 0],\n", 192 | " [1, 1, 1],\n", 193 | " [2, 2, 2],\n", 194 | " [3, 3, 3]])" 195 | ] 196 | }, 197 | "execution_count": 174, 198 | "metadata": {}, 199 | "output_type": "execute_result" 200 | } 201 | ], 202 | "source": [ 203 | "def outer(a: torch.Tensor, b: torch.Tensor):\n", 204 | " return a[:, None] * b[None, :]\n", 205 | "\n", 206 | "outer(arange(4), ones(3))" 207 | ] 208 | }, 209 | { 210 | "cell_type": "markdown", 211 | "metadata": {}, 212 | "source": [ 213 | "### diag" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 175, 219 | "metadata": {}, 220 | "outputs": [ 221 | { 222 | "data": { 223 | "text/plain": [ 224 | "tensor([0, 1, 2, 3])" 225 | ] 226 | }, 227 | "execution_count": 175, 228 | "metadata": {}, 229 | "output_type": "execute_result" 230 | } 231 | ], 232 | "source": [ 233 | "def diag(a: torch.Tensor):\n", 234 | " return a[arange(a.shape[0]), arange(a.shape[0])]\n", 235 | "\n", 236 | "diag(outer(arange(4), ones(4)))" 237 | ] 238 | }, 239 | { 240 | "cell_type": "markdown", 241 | "metadata": {}, 242 | "source": [ 243 | "### eye" 244 | ] 245 | }, 246 | { 247 | "cell_type": "code", 248 | "execution_count": 176, 249 | "metadata": {}, 250 | "outputs": [ 251 | { 252 | "data": { 253 | "text/plain": [ 254 | "tensor([[1, 0, 0, 0],\n", 255 | " [0, 1, 0, 0],\n", 256 | " [0, 0, 1, 0],\n", 257 | " [0, 0, 0, 1]])" 258 | ] 259 | }, 260 | "execution_count": 176, 261 | "metadata": {}, 262 | "output_type": "execute_result" 263 | } 264 | ], 265 | "source": [ 266 | "def eye(j: int):\n", 267 | " return (arange(j)[:, None] 
== arange(j)[None, :]) * 1\n", 268 | "\n", 269 | "eye(4)" 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": {}, 275 | "source": [ 276 | "### triu" 277 | ] 278 | }, 279 | { 280 | "cell_type": "code", 281 | "execution_count": 177, 282 | "metadata": {}, 283 | "outputs": [ 284 | { 285 | "data": { 286 | "text/plain": [ 287 | "tensor([[1, 1, 1, 1],\n", 288 | " [0, 1, 1, 1],\n", 289 | " [0, 0, 1, 1],\n", 290 | " [0, 0, 0, 1]])" 291 | ] 292 | }, 293 | "execution_count": 177, 294 | "metadata": {}, 295 | "output_type": "execute_result" 296 | } 297 | ], 298 | "source": [ 299 | "def triu(j: int):\n", 300 | " return (arange(j)[:,None] <= arange(j))*1\n", 301 | "\n", 302 | "triu(4)" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "### cumsum" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": 178, 315 | "metadata": {}, 316 | "outputs": [ 317 | { 318 | "data": { 319 | "text/plain": [ 320 | "tensor([0, 1, 3, 6])" 321 | ] 322 | }, 323 | "execution_count": 178, 324 | "metadata": {}, 325 | "output_type": "execute_result" 326 | } 327 | ], 328 | "source": [ 329 | "def cumsum(a: torch.Tensor):\n", 330 | " return (outer(ones(a.shape[0]), a) @ triu(a.shape[0]))[0]\n", 331 | "\n", 332 | "cumsum(torch.arange(4))" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | "metadata": {}, 338 | "source": [ 339 | "### diff" 340 | ] 341 | }, 342 | { 343 | "cell_type": "code", 344 | "execution_count": 179, 345 | "metadata": {}, 346 | "outputs": [ 347 | { 348 | "data": { 349 | "text/plain": [ 350 | "tensor([0, 1, 1, 1])" 351 | ] 352 | }, 353 | "execution_count": 179, 354 | "metadata": {}, 355 | "output_type": "execute_result" 356 | } 357 | ], 358 | "source": [ 359 | "def diff(a: torch.Tensor, i: int):\n", 360 | " return a - a[where(arange(i) > 0, arange(i)-1, 0)] + (a*(arange(i) <= 0))\n", 361 | "\n", 362 | "diff(arange(4), 4)" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | 
"metadata": {}, 368 | "source": [ 369 | "### vstack" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": 180, 375 | "metadata": {}, 376 | "outputs": [ 377 | { 378 | "data": { 379 | "text/plain": [ 380 | "tensor([[0, 1, 2, 3],\n", 381 | " [1, 1, 1, 1]])" 382 | ] 383 | }, 384 | "execution_count": 180, 385 | "metadata": {}, 386 | "output_type": "execute_result" 387 | } 388 | ], 389 | "source": [ 390 | "def vstack(a: torch.Tensor, b: torch.Tensor):\n", 391 | " return a * (1-arange(2)[:, None]) + b * arange(2)[:, None]\n", 392 | "\n", 393 | "vstack(arange(4), ones(4))" 394 | ] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "metadata": {}, 399 | "source": [ 400 | "### roll" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 181, 406 | "metadata": {}, 407 | "outputs": [ 408 | { 409 | "data": { 410 | "text/plain": [ 411 | "tensor([1, 2, 3, 0])" 412 | ] 413 | }, 414 | "execution_count": 181, 415 | "metadata": {}, 416 | "output_type": "execute_result" 417 | } 418 | ], 419 | "source": [ 420 | "def roll(a: torch.Tensor, i: int):\n", 421 | " return a[(arange(i) + 1) * ((arange(i) + 1) < i)]\n", 422 | "\n", 423 | "roll(arange(4), 4)" 424 | ] 425 | }, 426 | { 427 | "cell_type": "markdown", 428 | "metadata": {}, 429 | "source": [ 430 | "### flip" 431 | ] 432 | }, 433 | { 434 | "cell_type": "code", 435 | "execution_count": 182, 436 | "metadata": {}, 437 | "outputs": [ 438 | { 439 | "data": { 440 | "text/plain": [ 441 | "tensor([3, 2, 1, 0])" 442 | ] 443 | }, 444 | "execution_count": 182, 445 | "metadata": {}, 446 | "output_type": "execute_result" 447 | } 448 | ], 449 | "source": [ 450 | "def flip(a: torch.Tensor, i: int):\n", 451 | " return a[i - arange(i) - 1]\n", 452 | "\n", 453 | "flip(arange(4), 4)" 454 | ] 455 | }, 456 | { 457 | "cell_type": "markdown", 458 | "metadata": {}, 459 | "source": [ 460 | "### compress" 461 | ] 462 | }, 463 | { 464 | "cell_type": "code", 465 | "execution_count": 183, 466 | "metadata": {}, 467 
| "outputs": [ 468 | { 469 | "data": { 470 | "text/plain": [ 471 | "tensor([1, 2, 0])" 472 | ] 473 | }, 474 | "execution_count": 183, 475 | "metadata": {}, 476 | "output_type": "execute_result" 477 | } 478 | ], 479 | "source": [ 480 | "def compress(g: torch.Tensor, v: torch.Tensor, i: int):\n", 481 | " return sum(eye(i)[:sum(g*1)] * outer(v[g], ones(i)))\n", 482 | "\n", 483 | "compress(torch.tensor([False, True, True]), arange(3), 3)" 484 | ] 485 | }, 486 | { 487 | "cell_type": "markdown", 488 | "metadata": {}, 489 | "source": [ 490 | "### pad_to" 491 | ] 492 | }, 493 | { 494 | "cell_type": "code", 495 | "execution_count": 184, 496 | "metadata": {}, 497 | "outputs": [ 498 | { 499 | "data": { 500 | "text/plain": [ 501 | "tensor([0, 1, 2, 0, 0])" 502 | ] 503 | }, 504 | "execution_count": 184, 505 | "metadata": {}, 506 | "output_type": "execute_result" 507 | } 508 | ], 509 | "source": [ 510 | "def pad_to(a: torch.Tensor, i: int, j: int):\n", 511 | " return sum((arange(i)[:, None] == arange(j)[None, :]) * a[:, None])\n", 512 | "\n", 513 | "pad_to(arange(3), 3, 5)" 514 | ] 515 | }, 516 | { 517 | "cell_type": "markdown", 518 | "metadata": {}, 519 | "source": [ 520 | "### sequence_mask" 521 | ] 522 | }, 523 | { 524 | "cell_type": "code", 525 | "execution_count": 185, 526 | "metadata": {}, 527 | "outputs": [ 528 | { 529 | "data": { 530 | "text/plain": [ 531 | "tensor([[1, 1, 0],\n", 532 | " [1, 1, 0],\n", 533 | " [1, 0, 0],\n", 534 | " [1, 1, 1]])" 535 | ] 536 | }, 537 | "execution_count": 185, 538 | "metadata": {}, 539 | "output_type": "execute_result" 540 | } 541 | ], 542 | "source": [ 543 | "def sequence_mask(values: torch.Tensor, length: torch.Tensor):\n", 544 | " return values * (length[:, None] > arange(values.shape[-1])[None, :])\n", 545 | "\n", 546 | "sequence_mask(outer(ones(4), ones(3)), torch.tensor([2,2,1,3]))" 547 | ] 548 | }, 549 | { 550 | "cell_type": "markdown", 551 | "metadata": {}, 552 | "source": [ 553 | "### bincount" 554 | ] 555 | }, 556 | { 557 | 
"cell_type": "code", 558 | "execution_count": 186, 559 | "metadata": {}, 560 | "outputs": [ 561 | { 562 | "data": { 563 | "text/plain": [ 564 | "tensor([1, 3, 4, 2])" 565 | ] 566 | }, 567 | "execution_count": 186, 568 | "metadata": {}, 569 | "output_type": "execute_result" 570 | } 571 | ], 572 | "source": [ 573 | "def bincount(a: torch.Tensor, j: int):\n", 574 | " return ones(len(a)) @ ((a[:, None] == arange(j)[None, :]) * 1)\n", 575 | "\n", 576 | "bincount(torch.tensor([2, 1, 3, 3, 1, 2, 2, 2, 1, 0]), 4)" 577 | ] 578 | }, 579 | { 580 | "cell_type": "markdown", 581 | "metadata": {}, 582 | "source": [ 583 | "### scatter_add" 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": 187, 589 | "metadata": {}, 590 | "outputs": [ 591 | { 592 | "data": { 593 | "text/plain": [ 594 | "tensor([8, 7, 5, 4])" 595 | ] 596 | }, 597 | "execution_count": 187, 598 | "metadata": {}, 599 | "output_type": "execute_result" 600 | } 601 | ], 602 | "source": [ 603 | "def scatter_add(values: torch.Tensor, link: torch.Tensor, j: int):\n", 604 | " return sum((link[:, None] == arange(j)[None, :]) * outer(values, ones(j)))\n", 605 | "\n", 606 | "scatter_add(torch.tensor([5,1,7,2,3,2,1,3]), torch.tensor([0,0,1,0,2,2,3,3]), 4)" 607 | ] 608 | }, 609 | { 610 | "cell_type": "markdown", 611 | "metadata": {}, 612 | "source": [ 613 | "### flatten" 614 | ] 615 | }, 616 | { 617 | "cell_type": "code", 618 | "execution_count": 189, 619 | "metadata": {}, 620 | "outputs": [ 621 | { 622 | "data": { 623 | "text/plain": [ 624 | "tensor([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])" 625 | ] 626 | }, 627 | "execution_count": 189, 628 | "metadata": {}, 629 | "output_type": "execute_result" 630 | } 631 | ], 632 | "source": [ 633 | "def flatten(a: torch.Tensor, i:int, j:int):\n", 634 | " return a[outer(ones(i), ones(j)) == 1]\n", 635 | "\n", 636 | "flatten(arange(16).view(4, 4), 4, 4)" 637 | ] 638 | }, 639 | { 640 | "cell_type": "markdown", 641 | "metadata": {}, 642 | "source": [ 643 
| "### linspace" 644 | ] 645 | }, 646 | { 647 | "cell_type": "code", 648 | "execution_count": 190, 649 | "metadata": {}, 650 | "outputs": [ 651 | { 652 | "data": { 653 | "text/plain": [ 654 | "tensor([0.0000, 0.1111, 0.2222, 0.3333, 0.4444, 0.5556, 0.6667, 0.7778, 0.8889,\n", 655 | " 1.0000])" 656 | ] 657 | }, 658 | "execution_count": 190, 659 | "metadata": {}, 660 | "output_type": "execute_result" 661 | } 662 | ], 663 | "source": [ 664 | "def linspace(i: float, j: float, n: int):\n", 665 | " return i + (j - i) * arange(n) / max(1, (n - 1))\n", 666 | "\n", 667 | "linspace(0, 1, 10)" 668 | ] 669 | }, 670 | { 671 | "cell_type": "markdown", 672 | "metadata": {}, 673 | "source": [ 674 | "### heaviside" 675 | ] 676 | }, 677 | { 678 | "cell_type": "code", 679 | "execution_count": 191, 680 | "metadata": {}, 681 | "outputs": [ 682 | { 683 | "data": { 684 | "text/plain": [ 685 | "tensor([ 1.0000, -2.6444, 0.0000])" 686 | ] 687 | }, 688 | "execution_count": 191, 689 | "metadata": {}, 690 | "output_type": "execute_result" 691 | } 692 | ], 693 | "source": [ 694 | "def heaviside(a: torch.Tensor, b: torch.Tensor):\n", 695 | " return (a > 0) + (a == 0) * b\n", 696 | "\n", 697 | "heaviside(torch.tensor([1, 0, -2]), torch.randn(3))" 698 | ] 699 | }, 700 | { 701 | "cell_type": "markdown", 702 | "metadata": {}, 703 | "source": [ 704 | "### hstack" 705 | ] 706 | }, 707 | { 708 | "cell_type": "code", 709 | "execution_count": 192, 710 | "metadata": {}, 711 | "outputs": [ 712 | { 713 | "data": { 714 | "text/plain": [ 715 | "tensor([[0, 1],\n", 716 | " [1, 1],\n", 717 | " [2, 1]])" 718 | ] 719 | }, 720 | "execution_count": 192, 721 | "metadata": {}, 722 | "output_type": "execute_result" 723 | } 724 | ], 725 | "source": [ 726 | "def hstack(a: torch.Tensor, b: torch.Tensor):\n", 727 | " return a[:,None] * eye(2)[0] + b[:,None] * eye(2)[1]\n", 728 | "\n", 729 | "hstack(arange(3), ones(3))" 730 | ] 731 | }, 732 | { 733 | "cell_type": "markdown", 734 | "metadata": {}, 735 | "source": [ 736 | 
"### view (1d to 2d)" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": 193, 742 | "metadata": {}, 743 | "outputs": [ 744 | { 745 | "data": { 746 | "text/plain": [ 747 | "tensor([[0, 1],\n", 748 | " [2, 3],\n", 749 | " [4, 5]])" 750 | ] 751 | }, 752 | "execution_count": 193, 753 | "metadata": {}, 754 | "output_type": "execute_result" 755 | } 756 | ], 757 | "source": [ 758 | "def view(a: torch.Tensor, i: int, j: int):\n", 759 | " return a[(j * arange(i)[:,None] + arange(j)[None]) % len(a)][:i, :j]\n", 760 | "\n", 761 | "view(arange(6), 3, 2)" 762 | ] 763 | }, 764 | { 765 | "cell_type": "markdown", 766 | "metadata": {}, 767 | "source": [ 768 | "### repeat (1d)" 769 | ] 770 | }, 771 | { 772 | "cell_type": "code", 773 | "execution_count": 194, 774 | "metadata": {}, 775 | "outputs": [ 776 | { 777 | "data": { 778 | "text/plain": [ 779 | "tensor([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4])" 780 | ] 781 | }, 782 | "execution_count": 194, 783 | "metadata": {}, 784 | "output_type": "execute_result" 785 | } 786 | ], 787 | "source": [ 788 | "def repeat(a: torch.Tensor, d: int):\n", 789 | " return (ones(d)[:, None] * a)[outer(ones(d), ones(len(a))) == 1]\n", 790 | "\n", 791 | "repeat(arange(5), 3)" 792 | ] 793 | }, 794 | { 795 | "cell_type": "markdown", 796 | "metadata": {}, 797 | "source": [ 798 | "### repeat_interleave (1d)" 799 | ] 800 | }, 801 | { 802 | "cell_type": "code", 803 | "execution_count": 195, 804 | "metadata": {}, 805 | "outputs": [ 806 | { 807 | "data": { 808 | "text/plain": [ 809 | "tensor([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])" 810 | ] 811 | }, 812 | "execution_count": 195, 813 | "metadata": {}, 814 | "output_type": "execute_result" 815 | } 816 | ], 817 | "source": [ 818 | "def repeat_interleave(a: torch.Tensor, d: int):\n", 819 | " return (ones(d)[:, None] * a).T[outer(ones(len(a)), ones(d)) == 1]\n", 820 | "\n", 821 | "repeat_interleave(arange(5), 3)" 822 | ] 823 | }, 824 | { 825 | "cell_type": "markdown", 826 | 
"metadata": {}, 827 | "source": [ 828 | "### chunk" 829 | ] 830 | }, 831 | { 832 | "cell_type": "code", 833 | "execution_count": 198, 834 | "metadata": { 835 | "scrolled": true 836 | }, 837 | "outputs": [ 838 | { 839 | "data": { 840 | "text/plain": [ 841 | "[tensor([0, 1]),\n", 842 | " tensor([2, 3]),\n", 843 | " tensor([4, 5]),\n", 844 | " tensor([6, 7]),\n", 845 | " tensor([8, 9]),\n", 846 | " tensor([10, 11])]" 847 | ] 848 | }, 849 | "execution_count": 198, 850 | "metadata": {}, 851 | "output_type": "execute_result" 852 | } 853 | ], 854 | "source": [ 855 | "def chunk(a: torch.Tensor, c: int):\n", 856 | " return list(view(a, c, len(a)//c))\n", 857 | "\n", 858 | "chunk(torch.arange(12), 6)" 859 | ] 860 | }, 861 | { 862 | "cell_type": "markdown", 863 | "metadata": {}, 864 | "source": [ 865 | "### nonzero" 866 | ] 867 | }, 868 | { 869 | "cell_type": "code", 870 | "execution_count": 200, 871 | "metadata": {}, 872 | "outputs": [ 873 | { 874 | "data": { 875 | "text/plain": [ 876 | "tensor([[0, 0],\n", 877 | " [1, 1],\n", 878 | " [2, 2]])" 879 | ] 880 | }, 881 | "execution_count": 200, 882 | "metadata": {}, 883 | "output_type": "execute_result" 884 | } 885 | ], 886 | "source": [ 887 | "def nonzero(a: torch.Tensor, i: int, j: int):\n", 888 | " return hstack(outer(arange(i),ones(j))[a!=0],outer(ones(i),arange(j))[a!=0])\n", 889 | "\n", 890 | "nonzero(eye(3), 3, 3)" 891 | ] 892 | }, 893 | { 894 | "cell_type": "markdown", 895 | "metadata": {}, 896 | "source": [ 897 | "### bucketize" 898 | ] 899 | }, 900 | { 901 | "cell_type": "code", 902 | "execution_count": 201, 903 | "metadata": {}, 904 | "outputs": [ 905 | { 906 | "data": { 907 | "text/plain": [ 908 | "tensor([1, 3, 4])" 909 | ] 910 | }, 911 | "execution_count": 201, 912 | "metadata": {}, 913 | "output_type": "execute_result" 914 | } 915 | ], 916 | "source": [ 917 | "def bucketize(v: torch.Tensor, boundaries: torch.Tensor):\n", 918 | " return sum((v[:,None] > boundaries[None, :]).T * 1)\n", 919 | "\n", 920 | 
"bucketize(torch.tensor([3, 6, 9]), torch.tensor([1, 3, 5, 7, 9]))" 921 | ] 922 | } 923 | ], 924 | "metadata": { 925 | "anaconda-cloud": {}, 926 | "celltoolbar": "Raw Cell Format", 927 | "jupytext": { 928 | "formats": "ipynb,py:percent" 929 | }, 930 | "kernelspec": { 931 | "display_name": "Python 3 (ipykernel)", 932 | "language": "python", 933 | "name": "python3" 934 | }, 935 | "language_info": { 936 | "codemirror_mode": { 937 | "name": "ipython", 938 | "version": 3 939 | }, 940 | "file_extension": ".py", 941 | "mimetype": "text/x-python", 942 | "name": "python", 943 | "nbconvert_exporter": "python", 944 | "pygments_lexer": "ipython3", 945 | "version": "3.9.7" 946 | } 947 | }, 948 | "nbformat": 4, 949 | "nbformat_minor": 2 950 | } 951 | -------------------------------------------------------------------------------- /challenges-for-true-pytorch-heroes.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# PyTorch Challenges\n", 8 | "\n", 9 | "[Sasha Rush](https://twitter.com/srush_nlp) compiled a set of [16 Tensor mini-puzzles](https://github.com/srush/Tensor-Puzzles) that involve reasoning about broadcasting in a constrained setting: people are allowed to use only a single PyTorch function: `torch.arange`. Can you do it?\n", 10 | "\n", 11 | "Here, I've extended his list to 27 puzzles! \n", 12 | "\n", 13 | "**Rules**\n", 14 | "\n", 15 | "- Each puzzle needs to be solved in 1 line (<80 columns) of code.\n", 16 | "- You are allowed @, arithmetic, comparison, shape, any indexing (e.g. `a[:j], a[:, None], a[arange(10)]`), and previous puzzle functions.\n", 17 | "- To start off, we give you an implementation for the `torch.arange` function.\n", 18 | "\n", 19 | "**Anti-Rules**\n", 20 | "- Nothing else. No `.view, .sum, .take, .squeeze, .tensor`.\n", 21 | "- No cheating. 
Stack Overflow is great, but this is about first principles.\n", 22 | "- Hint: these puzzles are mostly about [Broadcasting](https://pytorch.org/docs/master/notes/broadcasting.html). Make sure you understand this rule, which is a key concept for dealing with n-dimensional arrays.\n", 23 | "\n", 24 | "🐶🐶🐶 After you convince yourself your code is correct, run the cell to test it. If the test succeeds, you will get a puppy 🐶🐶🐶." 25 | ] 26 | }, 27 | { 28 | "cell_type": "markdown", 29 | "metadata": {}, 30 | "source": [ 31 | "List of puzzles:\n", 32 | "\n", 33 | "1. [where](#1\\)-where)\n", 34 | "2. [ones](#2\\)-ones)\n", 35 | "3. [sum](#3\\)-sum)\n", 36 | "4. [outer](#4\\)-outer)\n", 37 | "5. [diag](#5\\)-diag)\n", 38 | "6. [eye](#6\\)-eye)\n", 39 | "7. [triu](#7\\)-triu)\n", 40 | "8. [cumsum](#8\\)-cumsum)\n", 41 | "9. [diff](#9\\)-diff)\n", 42 | "10. [vstack](#10\\)-vstack)\n", 43 | "11. [roll](#11\\)-roll)\n", 44 | "12. [flip](#12\\)-flip)\n", 45 | "13. [compress](#13\\)-compress)\n", 46 | "14. [pad_to](#14\\)-pad_to)\n", 47 | "15. [sequence_mask](#15\\)-sequence_mask)\n", 48 | "16. [bincount](#16\\)-bincount)\n", 49 | "17. [scatter_add](#17\\)-scatter_add)\n", 50 | "18. [flatten](#18\\)-flatten)\n", 51 | "19. [linspace](#19\\)-linspace)\n", 52 | "20. [heaviside](#20\\)-heaviside)\n", 53 | "21. [hstack](#21\\)-hstack)\n", 54 | "22. [view](#22\\)-view-\\(1d-to-2d\\))\n", 55 | "23. [repeat](#23\\)-repeat-\\(1d\\))\n", 56 | "24. [repeat_interleave](#24\\)-repeat_interleave-\\(1d\\))\n", 57 | "25. [chunk](#25\\)-chunk)\n", 58 | "26. [nonzero](#26\\)-nonzero)\n", 59 | "27. 
[bucketize](#27\\)-bucketize)" 60 | ] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "metadata": {}, 65 | "source": [ 66 | "---\n", 67 | "\n", 68 | "## Setup" 69 | ] 70 | }, 71 | { 72 | "cell_type": "code", 73 | "execution_count": null, 74 | "metadata": {}, 75 | "outputs": [], 76 | "source": [ 77 | "!pip install -qqq torchtyping hypothesis pytest" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": null, 83 | "metadata": {}, 84 | "outputs": [], 85 | "source": [ 86 | "import torch\n", 87 | "from spec import make_test, run_test, TT" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": {}, 93 | "source": [ 94 | "---" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": {}, 100 | "source": [ 101 | "### arange\n", 102 | "\n", 103 | "This one is given! Think of it as a \"for-loop\"." 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": null, 109 | "metadata": {}, 110 | "outputs": [], 111 | "source": [ 112 | "def arange(i: int):\n", 113 | " return torch.arange(i)\n", 114 | "\n", 115 | "arange(6)" 116 | ] 117 | }, 118 | { 119 | "cell_type": "markdown", 120 | "metadata": {}, 121 | "source": [ 122 | "### 1) where\n", 123 | "https://numpy.org/doc/stable/reference/generated/numpy.where.html" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": null, 129 | "metadata": {}, 130 | "outputs": [], 131 | "source": [ 132 | "def where_spec(q, a, b, out):\n", 133 | " for i in range(len(out)):\n", 134 | " out[i] = a[i] if q[i] else b[i]\n", 135 | "\n", 136 | "def where(q: TT[\"i\", bool], a: TT[\"i\"], b: TT[\"i\"]) -> TT[\"i\"]:\n", 137 | " raise NotImplementedError\n", 138 | "\n", 139 | "run_test(make_test(\"where\", where, where_spec))" 140 | ] 141 | }, 142 | { 143 | "cell_type": "markdown", 144 | "metadata": {}, 145 | "source": [ 146 | "### 2) ones\n", 147 | "https://numpy.org/doc/stable/reference/generated/numpy.ones.html" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | 
"execution_count": null, 153 | "metadata": {}, 154 | "outputs": [], 155 | "source": [ 156 | "def ones_spec(out):\n", 157 | " for i in range(len(out)):\n", 158 | " out[i] = 1\n", 159 | "\n", 160 | "def ones(i: int) -> TT[\"i\"]:\n", 161 | " raise NotImplementedError\n", 162 | "\n", 163 | "run_test(make_test(\"one\", ones, ones_spec, add_sizes=[\"i\"]))" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": {}, 169 | "source": [ 170 | "### 3) sum\n", 171 | "https://numpy.org/doc/stable/reference/generated/numpy.sum.html" 172 | ] 173 | }, 174 | { 175 | "cell_type": "code", 176 | "execution_count": null, 177 | "metadata": {}, 178 | "outputs": [], 179 | "source": [ 180 | "def sum_spec(a, out):\n", 181 | " out[0] = 0\n", 182 | " for i in range(len(a)):\n", 183 | " out[0] += a[i]\n", 184 | "\n", 185 | "def sum(a: TT[\"i\"]) -> TT[1]:\n", 186 | " raise NotImplementedError\n", 187 | "\n", 188 | "run_test(make_test(\"sum\", sum, sum_spec))" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "### 4) outer\n", 196 | "https://numpy.org/doc/stable/reference/generated/numpy.outer.html" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": null, 202 | "metadata": {}, 203 | "outputs": [], 204 | "source": [ 205 | "def outer_spec(a, b, out):\n", 206 | " for i in range(len(out)):\n", 207 | " for j in range(len(out[0])):\n", 208 | " out[i][j] = a[i] * b[j]\n", 209 | "\n", 210 | "def outer(a: TT[\"i\"], b: TT[\"j\"]) -> TT[\"i\", \"j\"]:\n", 211 | " raise NotImplementedError\n", 212 | "\n", 213 | "run_test(make_test(\"outer\", outer, outer_spec))" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": {}, 219 | "source": [ 220 | "### 5) diag\n", 221 | "https://numpy.org/doc/stable/reference/generated/numpy.diag.html" 222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": null, 227 | "metadata": {}, 228 | "outputs": [], 229 | "source": [ 230 | "def 
diag_spec(a, out):\n", 231 | " for i in range(len(a)):\n", 232 | " out[i] = a[i][i]\n", 233 | " \n", 234 | "def diag(a: TT[\"i\", \"i\"]) -> TT[\"i\"]:\n", 235 | " raise NotImplementedError\n", 236 | "\n", 237 | "run_test(make_test(\"diag\", diag, diag_spec))" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "### 6) eye\n", 245 | "https://numpy.org/doc/stable/reference/generated/numpy.eye.html" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": null, 251 | "metadata": {}, 252 | "outputs": [], 253 | "source": [ 254 | "def eye_spec(out):\n", 255 | " for i in range(len(out)):\n", 256 | " out[i][i] = 1\n", 257 | " \n", 258 | "def eye(j: int) -> TT[\"j\", \"j\"]:\n", 259 | " raise NotImplementedError\n", 260 | "\n", 261 | "run_test(make_test(\"eye\", eye, eye_spec, add_sizes=[\"j\"]))" 262 | ] 263 | }, 264 | { 265 | "cell_type": "markdown", 266 | "metadata": {}, 267 | "source": [ 268 | "### 7) triu\n", 269 | "https://numpy.org/doc/stable/reference/generated/numpy.triu.html" 270 | ] 271 | }, 272 | { 273 | "cell_type": "code", 274 | "execution_count": null, 275 | "metadata": {}, 276 | "outputs": [], 277 | "source": [ 278 | "def triu_spec(out):\n", 279 | " for i in range(len(out)):\n", 280 | " for j in range(len(out)):\n", 281 | " if i <= j:\n", 282 | " out[i][j] = 1\n", 283 | " else:\n", 284 | " out[i][j] = 0\n", 285 | " \n", 286 | "def triu(j: int) -> TT[\"j\", \"j\"]:\n", 287 | " raise NotImplementedError\n", 288 | "\n", 289 | "run_test(make_test(\"triu\", triu, triu_spec, add_sizes=[\"j\"]))" 290 | ] 291 | }, 292 | { 293 | "cell_type": "markdown", 294 | "metadata": {}, 295 | "source": [ 296 | "### 8) cumsum\n", 297 | "https://numpy.org/doc/stable/reference/generated/numpy.cumsum.html" 298 | ] 299 | }, 300 | { 301 | "cell_type": "code", 302 | "execution_count": null, 303 | "metadata": {}, 304 | "outputs": [], 305 | "source": [ 306 | "def cumsum_spec(a, out):\n", 307 | " total = 0\n", 308 | " 
for i in range(len(out)):\n", 309 | " out[i] = total + a[i]\n", 310 | " total += a[i]\n", 311 | "\n", 312 | "def cumsum(a: TT[\"i\"]) -> TT[\"i\"]:\n", 313 | " raise NotImplementedError\n", 314 | "\n", 315 | "run_test(make_test(\"cumsum\", cumsum, cumsum_spec))" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": {}, 321 | "source": [ 322 | "### 9) diff\n", 323 | "https://numpy.org/doc/stable/reference/generated/numpy.diff.html" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "execution_count": null, 329 | "metadata": {}, 330 | "outputs": [], 331 | "source": [ 332 | "def diff_spec(a, out):\n", 333 | " out[0] = a[0]\n", 334 | " for i in range(1, len(out)):\n", 335 | " out[i] = a[i] - a[i - 1]\n", 336 | "\n", 337 | "def diff(a: TT[\"i\"], i: int) -> TT[\"i\"]:\n", 338 | " raise NotImplementedError\n", 339 | "\n", 340 | "run_test(make_test(\"diff\", diff, diff_spec, add_sizes=[\"i\"]))" 341 | ] 342 | }, 343 | { 344 | "cell_type": "markdown", 345 | "metadata": {}, 346 | "source": [ 347 | "### 10) vstack\n", 348 | "https://numpy.org/doc/stable/reference/generated/numpy.vstack.html" 349 | ] 350 | }, 351 | { 352 | "cell_type": "code", 353 | "execution_count": null, 354 | "metadata": {}, 355 | "outputs": [], 356 | "source": [ 357 | "def vstack_spec(a, b, out):\n", 358 | " for i in range(len(out[0])):\n", 359 | " out[0][i] = a[i]\n", 360 | " out[1][i] = b[i]\n", 361 | "\n", 362 | "def vstack(a: TT[\"i\"], b: TT[\"i\"]) -> TT[2, \"i\"]:\n", 363 | " raise NotImplementedError\n", 364 | "\n", 365 | "run_test(make_test(\"vstack\", vstack, vstack_spec))" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": {}, 371 | "source": [ 372 | "### 11) roll\n", 373 | "https://numpy.org/doc/stable/reference/generated/numpy.roll.html" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": null, 379 | "metadata": {}, 380 | "outputs": [], 381 | "source": [ 382 | "def roll_spec(a, out):\n", 383 | " for i in 
range(len(out)):\n", 384 | " if i + 1 < len(out):\n", 385 | " out[i] = a[i + 1]\n", 386 | " else:\n", 387 | " out[i] = a[i + 1 - len(out)]\n", 388 | " \n", 389 | "def roll(a: TT[\"i\"], i: int) -> TT[\"i\"]:\n", 390 | " raise NotImplementedError\n", 391 | "\n", 392 | "run_test(make_test(\"roll\", roll, roll_spec, add_sizes=[\"i\"]))" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": {}, 398 | "source": [ 399 | "### 12) flip\n", 400 | "https://numpy.org/doc/stable/reference/generated/numpy.flip.html" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": null, 406 | "metadata": {}, 407 | "outputs": [], 408 | "source": [ 409 | "def flip_spec(a, out):\n", 410 | " for i in range(len(out)):\n", 411 | " out[i] = a[len(out) - i - 1]\n", 412 | " \n", 413 | "def flip(a: TT[\"i\"], i: int) -> TT[\"i\"]:\n", 414 | " raise NotImplementedError\n", 415 | "\n", 416 | "run_test(make_test(\"flip\", flip, flip_spec, add_sizes=[\"i\"]))" 417 | ] 418 | }, 419 | { 420 | "cell_type": "markdown", 421 | "metadata": {}, 422 | "source": [ 423 | "### 13) compress\n", 424 | "https://numpy.org/doc/stable/reference/generated/numpy.compress.html" 425 | ] 426 | }, 427 | { 428 | "cell_type": "code", 429 | "execution_count": null, 430 | "metadata": {}, 431 | "outputs": [], 432 | "source": [ 433 | "def compress_spec(g, v, out):\n", 434 | " j = 0\n", 435 | " for i in range(len(g)):\n", 436 | " if g[i]:\n", 437 | " out[j] = v[i]\n", 438 | " j += 1\n", 439 | " \n", 440 | "def compress(g: TT[\"i\", bool], v: TT[\"i\"], i:int) -> TT[\"i\"]:\n", 441 | " raise NotImplementedError\n", 442 | "\n", 443 | "run_test(make_test(\"compress\", compress, compress_spec, add_sizes=[\"i\"]))" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": {}, 449 | "source": [ 450 | "### 14) pad_to\n", 451 | "\n", 452 | "https://pytorch.org/docs/stable/generated/torch.nn.utils.rnn.pad_sequence.html?highlight=pad#torch.nn.utils.rnn.pad_sequence" 453 | ] 454 | }, 
455 | { 456 | "cell_type": "code", 457 | "execution_count": null, 458 | "metadata": {}, 459 | "outputs": [], 460 | "source": [ 461 | "def pad_to_spec(a, out):\n", 462 | " for i in range(min(len(out), len(a))):\n", 463 | " out[i] = a[i]\n", 464 | "\n", 465 | "def pad_to(a: TT[\"i\"], i: int, j: int) -> TT[\"j\"]:\n", 466 | " raise NotImplementedError\n", 467 | "\n", 468 | "run_test(make_test(\"pad_to\", pad_to, pad_to_spec, add_sizes=[\"i\", \"j\"]))" 469 | ] 470 | }, 471 | { 472 | "cell_type": "markdown", 473 | "metadata": {}, 474 | "source": [ 475 | "### 15) sequence_mask\n", 476 | "https://www.tensorflow.org/api_docs/python/tf/sequence_mask" 477 | ] 478 | }, 479 | { 480 | "cell_type": "code", 481 | "execution_count": null, 482 | "metadata": {}, 483 | "outputs": [], 484 | "source": [ 485 | "def sequence_mask_spec(values, length, out):\n", 486 | " for i in range(len(out)):\n", 487 | " for j in range(len(out[0])):\n", 488 | " if j < length[i]:\n", 489 | " out[i][j] = values[i][j]\n", 490 | " else:\n", 491 | " out[i][j] = 0\n", 492 | "\n", 493 | "def constraint_set_length(d, sizes=None):\n", 494 | " d[\"length\"] = d[\"length\"] % d[\"values\"].shape[1]\n", 495 | " return d\n", 496 | " \n", 497 | "def sequence_mask(values: TT[\"i\", \"j\"], length: TT[\"i\", int]) -> TT[\"i\", \"j\"]:\n", 498 | " raise NotImplementedError\n", 499 | "\n", 500 | "run_test(make_test(\"sequence_mask\",\n", 501 | " sequence_mask, sequence_mask_spec, constraint=constraint_set_length\n", 502 | "))" 503 | ] 504 | }, 505 | { 506 | "cell_type": "markdown", 507 | "metadata": {}, 508 | "source": [ 509 | "### 16) bincount\n", 510 | "https://numpy.org/doc/stable/reference/generated/numpy.bincount.html" 511 | ] 512 | }, 513 | { 514 | "cell_type": "code", 515 | "execution_count": null, 516 | "metadata": {}, 517 | "outputs": [], 518 | "source": [ 519 | "def bincount_spec(a, out):\n", 520 | " for i in range(len(a)):\n", 521 | " out[a[i]] += 1\n", 522 | " \n", 523 | "def constraint_set_max(d, 
sizes=None):\n", 524 | " d[\"a\"] = d[\"a\"] % d[\"return\"].shape[0]\n", 525 | " return d\n", 526 | " \n", 527 | "def bincount(a: TT[\"i\"], j: int) -> TT[\"j\"]:\n", 528 | " raise NotImplementedError\n", 529 | "\n", 530 | "run_test(make_test(\"bincount\",\n", 531 | " bincount, bincount_spec, add_sizes=[\"j\"], constraint=constraint_set_max\n", 532 | "))" 533 | ] 534 | }, 535 | { 536 | "cell_type": "markdown", 537 | "metadata": {}, 538 | "source": [ 539 | "### 17) scatter_add\n", 540 | "https://pytorch-scatter.readthedocs.io/en/1.3.0/functions/add.html" 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": null, 546 | "metadata": {}, 547 | "outputs": [], 548 | "source": [ 549 | "def scatter_add_spec(values, link, out):\n", 550 | " for j in range(len(values)):\n", 551 | " out[link[j]] += values[j]\n", 552 | "\n", 553 | "def constraint_set_max(d, sizes=None):\n", 554 | " d[\"link\"] = d[\"link\"] % d[\"return\"].shape[0]\n", 555 | " return d\n", 556 | "\n", 557 | "def scatter_add(values: TT[\"i\"], link: TT[\"i\"], j: int) -> TT[\"j\"]:\n", 558 | " raise NotImplementedError\n", 559 | "\n", 560 | "\n", 561 | "run_test(make_test(\"scatter_add\",\n", 562 | " scatter_add, scatter_add_spec, add_sizes=[\"j\"], constraint=constraint_set_max\n", 563 | "))" 564 | ] 565 | }, 566 | { 567 | "cell_type": "markdown", 568 | "metadata": {}, 569 | "source": [ 570 | "### 18) flatten\n", 571 | "\n", 572 | "https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html" 573 | ] 574 | }, 575 | { 576 | "cell_type": "code", 577 | "execution_count": null, 578 | "metadata": {}, 579 | "outputs": [], 580 | "source": [ 581 | "def flatten_spec(a, out):\n", 582 | " k = 0\n", 583 | " for i in range(len(a)):\n", 584 | " for j in range(len(a[0])):\n", 585 | " out[k] = a[i][j]\n", 586 | " k += 1\n", 587 | "\n", 588 | "def flatten(a: TT[\"i\", \"j\"], i:int, j:int) -> TT[\"i * j\"]:\n", 589 | " raise NotImplementedError\n", 590 | "\n", 591 | 
"run_test(make_test(\"flatten\", flatten, flatten_spec, add_sizes=[\"i\", \"j\"]))" 592 | ] 593 | }, 594 | { 595 | "cell_type": "markdown", 596 | "metadata": {}, 597 | "source": [ 598 | "### 19) linspace\n", 599 | "\n", 600 | "https://numpy.org/doc/stable/reference/generated/numpy.linspace.html" 601 | ] 602 | }, 603 | { 604 | "cell_type": "code", 605 | "execution_count": null, 606 | "metadata": {}, 607 | "outputs": [], 608 | "source": [ 609 | "def linspace_spec(i, j, out):\n", 610 | " for k in range(len(out)):\n", 611 | " out[k] = float(i + (j - i) * k / max(1, len(out) - 1))\n", 612 | "\n", 613 | "def linspace(i: TT[1], j: TT[1], n: int) -> TT[\"n\", float]:\n", 614 | " raise NotImplementedError\n", 615 | "\n", 616 | "run_test(make_test(\"linspace\", linspace, linspace_spec, add_sizes=[\"n\"]))" 617 | ] 618 | }, 619 | { 620 | "cell_type": "markdown", 621 | "metadata": {}, 622 | "source": [ 623 | "### 20) heaviside\n", 624 | "\n", 625 | "https://numpy.org/doc/stable/reference/generated/numpy.heaviside.html" 626 | ] 627 | }, 628 | { 629 | "cell_type": "code", 630 | "execution_count": null, 631 | "metadata": {}, 632 | "outputs": [], 633 | "source": [ 634 | "def heaviside_spec(a, b, out):\n", 635 | " for k in range(len(out)):\n", 636 | " if a[k] == 0:\n", 637 | " out[k] = b[k]\n", 638 | " else:\n", 639 | " out[k] = int(a[k] > 0)\n", 640 | "\n", 641 | "def heaviside(a: TT[\"i\"], b: TT[\"i\"]) -> TT[\"i\"]:\n", 642 | " raise NotImplementedError\n", 643 | "\n", 644 | "run_test(make_test(\"heaviside\", heaviside, heaviside_spec))" 645 | ] 646 | }, 647 | { 648 | "cell_type": "markdown", 649 | "metadata": {}, 650 | "source": [ 651 | "### 21) hstack\n", 652 | "\n", 653 | "https://numpy.org/doc/stable/reference/generated/numpy.hstack.html" 654 | ] 655 | }, 656 | { 657 | "cell_type": "code", 658 | "execution_count": null, 659 | "metadata": {}, 660 | "outputs": [], 661 | "source": [ 662 | "def hstack_spec(a, b, out):\n", 663 | " for i in range(len(out)):\n", 664 | " out[i][0] 
= a[i]\n", 665 | " out[i][1] = b[i]\n", 666 | " \n", 667 | "def hstack(a: TT[\"i\"], b: TT[\"i\"]) -> TT[\"i\", 2]:\n", 668 | " raise NotImplementedError\n", 669 | "\n", 670 | "run_test(make_test(\"hstack\", hstack, hstack_spec))" 671 | ] 672 | }, 673 | { 674 | "cell_type": "markdown", 675 | "metadata": {}, 676 | "source": [ 677 | "---\n", 678 | "\n", 679 | "No more puppies from now on... For now, check with the examples shown in the docs." 680 | ] 681 | }, 682 | { 683 | "cell_type": "markdown", 684 | "metadata": {}, 685 | "source": [ 686 | "### 22) view (1d to 2d)\n", 687 | "\n", 688 | "https://pytorch.org/docs/stable/generated/torch.Tensor.view.html" 689 | ] 690 | }, 691 | { 692 | "cell_type": "code", 693 | "execution_count": null, 694 | "metadata": {}, 695 | "outputs": [], 696 | "source": [ 697 | "def view(a: TT[\"i * j\"], i: int, j: int) -> TT[\"i\", \"j\"]:\n", 698 | " raise NotImplementedError" 699 | ] 700 | }, 701 | { 702 | "cell_type": "markdown", 703 | "metadata": {}, 704 | "source": [ 705 | "### 23) repeat (1d)\n", 706 | "\n", 707 | "https://pytorch.org/docs/stable/generated/torch.Tensor.repeat.html" 708 | ] 709 | }, 710 | { 711 | "cell_type": "code", 712 | "execution_count": null, 713 | "metadata": {}, 714 | "outputs": [], 715 | "source": [ 716 | "def repeat(a: TT[\"i\"], d: int) -> TT[\"d\"]:\n", 717 | " raise NotImplementedError" 718 | ] 719 | }, 720 | { 721 | "cell_type": "markdown", 722 | "metadata": {}, 723 | "source": [ 724 | "### 24) repeat_interleave (1d)\n", 725 | "\n", 726 | "https://pytorch.org/docs/stable/generated/torch.repeat_interleave.html" 727 | ] 728 | }, 729 | { 730 | "cell_type": "code", 731 | "execution_count": null, 732 | "metadata": {}, 733 | "outputs": [], 734 | "source": [ 735 | "def repeat_interleave(a: TT[\"i\"], d: int) -> TT[\"d\"]:\n", 736 | " raise NotImplementedError" 737 | ] 738 | }, 739 | { 740 | "cell_type": "markdown", 741 | "metadata": {}, 742 | "source": [ 743 | "### 25) chunk\n", 744 | 
"https://pytorch.org/docs/stable/generated/torch.chunk.html" 745 | ] 746 | }, 747 | { 748 | "cell_type": "code", 749 | "execution_count": null, 750 | "metadata": {}, 751 | "outputs": [], 752 | "source": [ 753 | "def chunk(a: TT[\"i\"], c: int) -> TT[\"c\", \"i // c\"]:\n", 754 | " raise NotImplementedError" 755 | ] 756 | }, 757 | { 758 | "cell_type": "markdown", 759 | "metadata": {}, 760 | "source": [ 761 | "### 26) nonzero\n", 762 | "https://pytorch.org/docs/stable/generated/torch.nonzero.html" 763 | ] 764 | }, 765 | { 766 | "cell_type": "code", 767 | "execution_count": null, 768 | "metadata": {}, 769 | "outputs": [], 770 | "source": [ 771 | "def nonzero(a: TT[\"i\",\"j\"], i: int, j: int) -> TT[\"k\", 2]:\n", 772 | " raise NotImplementedError" 773 | ] 774 | }, 775 | { 776 | "cell_type": "markdown", 777 | "metadata": {}, 778 | "source": [ 779 | "### 27) bucketize\n", 780 | "https://pytorch.org/docs/stable/generated/torch.bucketize.html" 781 | ] 782 | }, 783 | { 784 | "cell_type": "code", 785 | "execution_count": null, 786 | "metadata": {}, 787 | "outputs": [], 788 | "source": [ 789 | "def bucketize(v: TT[\"i\"], boundaries: TT[\"j\"]) -> TT[\"i\"]:\n", 790 | " raise NotImplementedError" 791 | ] 792 | } 793 | ], 794 | "metadata": { 795 | "anaconda-cloud": {}, 796 | "celltoolbar": "Raw Cell Format", 797 | "jupytext": { 798 | "formats": "ipynb,py:percent" 799 | }, 800 | "kernelspec": { 801 | "display_name": "Python 3 (ipykernel)", 802 | "language": "python", 803 | "name": "python3" 804 | }, 805 | "language_info": { 806 | "codemirror_mode": { 807 | "name": "ipython", 808 | "version": 3 809 | }, 810 | "file_extension": ".py", 811 | "mimetype": "text/x-python", 812 | "name": "python", 813 | "nbconvert_exporter": "python", 814 | "pygments_lexer": "ipython3", 815 | "version": "3.9.7" 816 | } 817 | }, 818 | "nbformat": 4, 819 | "nbformat_minor": 2 820 | } 821 | -------------------------------------------------------------------------------- /img/common_mistakes.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/srush/pytorch-lecture/8b348fe95ec3c1157c37cacbb8bd71894bd17895/img/common_mistakes.png -------------------------------------------------------------------------------- /img/dynamic_graph.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/srush/pytorch-lecture/8b348fe95ec3c1157c37cacbb8bd71894bd17895/img/dynamic_graph.gif -------------------------------------------------------------------------------- /img/pytorch-logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/srush/pytorch-lecture/8b348fe95ec3c1157c37cacbb8bd71894bd17895/img/pytorch-logo.png -------------------------------------------------------------------------------- /img/pytorch_logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/srush/pytorch-lecture/8b348fe95ec3c1157c37cacbb8bd71894bd17895/img/pytorch_logo.png -------------------------------------------------------------------------------- /img/pytorch_logo_flame.png: -------------------------------------------------------------------------------- 1 | --2019-03-18 14:43:17-- https://pytorch.org/assets/images/pytorch-logo.png 2 | Resolving pytorch.org (pytorch.org)... 185.199.108.153 3 | Connecting to pytorch.org (pytorch.org)|185.199.108.153|:443... connected. 4 | HTTP request sent, awaiting response... 200 OK 5 | Length: 22916 (22K) [image/png] 6 | Saving to: ‘pytorch-logo.png’ 7 | 8 | 0K .......... .......... .. 
100% 664K=0,03s 9 | 10 | 2019-03-18 14:43:18 (664 KB/s) - ‘pytorch-logo.png’ saved [22916/22916] 11 | 12 | -------------------------------------------------------------------------------- /img/the_real_reason.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/srush/pytorch-lecture/8b348fe95ec3c1157c37cacbb8bd71894bd17895/img/the_real_reason.png -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | celluloid 2 | ipython 3 | ipdb 4 | jupyter 5 | matplotlib 6 | numpy 7 | scikit-learn 8 | scipy 9 | seaborn 10 | pandas 11 | Pillow 12 | torch 13 | torchvision 14 | -------------------------------------------------------------------------------- /spec.py: -------------------------------------------------------------------------------- 1 | import torch 2 | from torchtyping import TensorType as TT 3 | from hypothesis.extra.numpy import arrays 4 | from hypothesis.strategies import integers, lists, composite, floats 5 | from hypothesis import given 6 | import numpy as np 7 | import random 8 | import sys 9 | import typing 10 | 11 | tensor = torch.tensor 12 | 13 | numpy_to_torch_dtype_dict = { 14 | bool: torch.bool, 15 | np.uint8: torch.uint8, 16 | np.int8: torch.int8, 17 | np.int16: torch.int16, 18 | np.int32: torch.int32, 19 | np.int64: torch.int64, 20 | np.float16: torch.float16, 21 | np.float32: torch.float32, 22 | np.float64: torch.float64, 23 | } 24 | torch_to_numpy_dtype_dict = {v: k for k, v in numpy_to_torch_dtype_dict.items()} 25 | 26 | 27 | @composite 28 | def spec(draw, x, min_size=1): 29 | # Get the type hints. 30 | if sys.version_info >= (3, 9): 31 | gth = typing.get_type_hints(x, include_extras=True) 32 | else: 33 | gth = typing.get_type_hints(x) 34 | 35 | # Collect all the dimension names. 
36 | names = set() 37 | for k in gth: 38 | if not hasattr(gth[k], "__metadata__"): 39 | continue 40 | dims = gth[k].__metadata__[0]["details"][0].dims 41 | names.update([d.name for d in dims if isinstance(d.name, str)]) 42 | names = list(names) 43 | 44 | # draw sizes for each dim. 45 | size = integers(min_value=min_size, max_value=5) 46 | arr = draw(arrays(shape=(len(names),), unique=True, elements=size, dtype=np.int32)).tolist() 47 | sizes = dict(zip(names, arr)) 48 | for n in list(sizes.keys()): 49 | if '*' in n or '+' in n or '-' in n or '//' in n: 50 | i, op, j = n.split() 51 | i_val = i if i.isdigit() else sizes[i] 52 | j_val = j if j.isdigit() else sizes[j] 53 | sizes[n] = eval('{}{}{}'.format(i_val, op,j_val)) 54 | 55 | # Create tensors for each size. 56 | ret = {} 57 | for k in gth: 58 | if not hasattr(gth[k], "__metadata__"): 59 | continue 60 | shape = tuple( 61 | [ 62 | sizes[d.name] if isinstance(d.name, str) else d.size 63 | for d in gth[k].__metadata__[0]["details"][0].dims 64 | ] 65 | ) 66 | dtype = (torch_to_numpy_dtype_dict[ 67 | gth[k].__metadata__[0]["details"][1].dtype 68 | ] 69 | if len(gth[k].__metadata__[0]["details"]) >= 2 70 | else int) 71 | ret[k] = draw( 72 | arrays( 73 | shape=shape, 74 | dtype=dtype, 75 | elements=integers(min_value=-5, max_value=5) if 76 | dtype == int else None, 77 | unique=False 78 | ) 79 | ) 80 | ret[k] = np.nan_to_num(ret[k], nan=0, neginf=0, posinf=0) 81 | ret["return"][:] = 0 82 | return ret, sizes 83 | 84 | 85 | def make_test(name, problem, problem_spec, add_sizes=[], constraint=lambda d, sizes: d): 86 | examples = [] 87 | for i in range(3): 88 | example, sizes = spec(problem, 3).example() 89 | example = constraint(example, sizes=sizes) 90 | out = example["return"].tolist() 91 | del example["return"] 92 | problem_spec(*example.values(), out) 93 | 94 | for size in add_sizes: 95 | example[size] = sizes[size] 96 | 97 | yours = None 98 | try: 99 | yours = problem(*map(tensor, example.values())) 100 | 101 | except 
AssertionError: 102 | pass 103 | for size in add_sizes: 104 | del example[size] 105 | example["target"] = tensor(out) 106 | if yours is not None: 107 | example["yours"] = yours 108 | examples.append(example) 109 | 110 | @given(spec(problem)) 111 | def test_problem(d): 112 | d, sizes = d 113 | 114 | d = constraint(d, sizes=sizes) 115 | out = d["return"].tolist() 116 | del d["return"] 117 | problem_spec(*d.values(), out) 118 | for size in add_sizes: 119 | d[size] = sizes[size] 120 | 121 | out2 = problem(*map(tensor, d.values())) 122 | out = tensor(out) 123 | out2 = torch.broadcast_to(out2, out.shape) 124 | assert torch.allclose( 125 | out, out2 126 | ), "Two tensors are not equal\n Spec: \n\t%s \n\t%s" % (out, out2) 127 | 128 | return test_problem 129 | 130 | 131 | def run_test(fn): 132 | fn() 133 | # Generate a random puppy video if you are correct. 134 | print("Correct!") 135 | from IPython.display import HTML 136 | pups = [ 137 | "2m78jPG", 138 | "pn1e9TO", 139 | "MQCIwzT", 140 | "udLK6FS", 141 | "ZNem5o3", 142 | "DS2IZ6K", 143 | "aydRUz8", 144 | "MVUdQYK", 145 | "kLvno0p", 146 | "wScLiVz", 147 | "Z0TII8i", 148 | "F1SChho", 149 | "9hRi2jN", 150 | "lvzRF3W", 151 | "fqHxOGI", 152 | "1xeUYme", 153 | "6tVqKyM", 154 | "CCxZ6Wr", 155 | "lMW0OPQ", 156 | "wHVpHVG", 157 | "Wj2PGRl", 158 | "HlaTE8H", 159 | "k5jALH0", 160 | "3V37Hqr", 161 | "Eq2uMTA", 162 | "Vy9JShx", 163 | "g9I2ZmK", 164 | "Nu4RH7f", 165 | "sWp0Dqd", 166 | "bRKfspn", 167 | "qawCMl5", 168 | "2F6j2B4", 169 | "fiJxCVA", 170 | "pCAIlxD", 171 | "zJx2skh", 172 | "2Gdl1u7", 173 | "aJJAY4c", 174 | "ros6RLC", 175 | "DKLBJh7", 176 | "eyxH0Wc", 177 | "rJEkEw4"] 178 | return HTML(""" 179 | 182 | """%(random.sample(pups, 1)[0])) 183 | --------------------------------------------------------------------------------
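The derived-size step in `spec` (dimension names such as `i * j` in the `view` puzzle or `i // c` in the `chunk` puzzle) can be sketched in isolation. This is a minimal sketch, not the code in `spec.py`: the helper name `resolve_derived_sizes` and the `_OPS` table are assumptions introduced for illustration, and the operator is looked up in a table rather than passed to `eval` as the original does. It assumes, as `spec.py` does, that a derived name has the form `<operand> <op> <operand>` separated by spaces.

```python
import operator

# Hypothetical helper (not part of spec.py): mirrors the derived-size step in
# `spec`, but looks the operator up in a table instead of calling eval().
_OPS = {
    "*": operator.mul,
    "+": operator.add,
    "-": operator.sub,
    "//": operator.floordiv,
}


def resolve_derived_sizes(sizes):
    """Return a copy of `sizes` where names like "i * j" hold the computed value."""
    resolved = dict(sizes)
    for name in sizes:
        parts = name.split()
        # Only names of the form "<operand> <op> <operand>" are derived sizes.
        if len(parts) != 3 or parts[1] not in _OPS:
            continue
        left, op, right = parts
        # An operand is either an integer literal or the name of a plain dimension.
        left_val = int(left) if left.isdigit() else sizes[left]
        right_val = int(right) if right.isdigit() else sizes[right]
        resolved[name] = _OPS[op](left_val, right_val)
    return resolved


# The randomly drawn values for derived names are overwritten by the computed ones.
drawn = {"i": 3, "j": 4, "i * j": 1, "c": 2, "i // c": 5}
print(resolve_derived_sizes(drawn))
```

Here the drawn value for `i * j` becomes 12 and the one for `i // c` becomes 1, so the tensors generated for `TT["i * j"]` and `TT["c", "i // c"]` stay consistent with the plain dimensions `i`, `j`, and `c`.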