├── floyd.yml ├── README.md └── nn_tutorial.ipynb /floyd.yml: -------------------------------------------------------------------------------- 1 | env: pytorch-1.0 2 | input: 3 | - destination: mnist 4 | source: redeipirati/datasets/pytorch-mnist/2 5 | machine: cpu -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # PyTorch NN Tutorial on FloydHub Workspace 2 | 3 | By clicking on the below button, you will create a new [FloydHub Workspace](https://www.floydhub.com/product/build) with the notebook tutorial: [WHAT IS TORCH.NN REALLY?](https://pytorch.org/tutorials/beginner/nn_tutorial.html) created by Jeremy Howard and the [fast.ai](https://www.fast.ai/) Team. 4 | 5 | [![Run on FH](https://static.floydhub.com/button/button.svg)](https://floydhub.com/run) -------------------------------------------------------------------------------- /nn_tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "%matplotlib inline" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "\n", 19 | "What is `torch.nn` *really*?\n", 20 | "============================\n", 21 | "by Jeremy Howard, `fast.ai `_. Thanks to Rachel Thomas and Francisco Ingham.\n", 22 | "\n" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "PyTorch provides the elegantly designed modules and classes `torch.nn `_ ,\n", 30 | "`torch.optim `_ ,\n", 31 | "`Dataset `_ ,\n", 32 | "and `DataLoader `_\n", 33 | "to help you create and train neural networks.\n", 34 | "In order to fully utilize their power and customize\n", 35 | "them for your problem, you need to really understand exactly what they're\n", 36 | "doing. To develop this understanding, we will first train basic neural net\n", 37 | "on the MNIST data set without using any features from these models; we will\n", 38 | "initially only use the most basic PyTorch tensor functionality. Then, we will\n", 39 | "incrementally add one feature from ``torch.nn``, ``torch.optim``, ``Dataset``, or\n", 40 | "``DataLoader`` at a time, showing exactly what each piece does, and how it\n", 41 | "works to make the code either more concise, or more flexible.\n", 42 | "\n", 43 | "**This tutorial assumes you already familiar\n", 44 | "with the basics of tensor operations.** (If you're familiar with Numpy array\n", 45 | "operations, you'll find the PyTorch tensor operations used here nearly identical).\n", 46 | "\n", 47 | "MNIST data setup\n", 48 | "----------------\n", 49 | "\n", 50 | "We will use the classic `MNIST `_ dataset,\n", 51 | "which consists of black-and-white images of hand-drawn digits (between 0 and 9).\n", 52 | "\n", 53 | "We will use `pathlib `_\n", 54 | "for dealing with paths (part of the Python 3 standard library). 
We will only\n", 55 | "import modules when we use them, so you can see exactly what's being\n", 56 | "used at each point.\n", 57 | "\n", 58 | "**Note: the dataset has been already attached to the Workspace.**" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "metadata": { 65 | "collapsed": false 66 | }, 67 | "outputs": [], 68 | "source": [ 69 | "from pathlib import Path\n", 70 | "import requests\n", 71 | "\n", 72 | "DATA_PATH = Path(\"/floyd/input\")\n", 73 | "PATH = DATA_PATH / \"mnist\"\n", 74 | "FILENAME = \"mnist.pkl.gz\"" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "This dataset is in numpy array format, and has been stored using pickle,\n", 82 | "a python-specific format for serializing data.\n", 83 | "\n" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": { 90 | "collapsed": false 91 | }, 92 | "outputs": [], 93 | "source": [ 94 | "import pickle\n", 95 | "import gzip\n", 96 | "\n", 97 | "with gzip.open((PATH / FILENAME).as_posix(), \"rb\") as f:\n", 98 | " ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding=\"latin-1\")" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "Each image is 28 x 28, and is being stored as a flattened row of length\n", 106 | "784 (=28x28). Let's take a look at one; we need to reshape it to 2d\n", 107 | "first.\n", 108 | "\n" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": { 115 | "collapsed": false 116 | }, 117 | "outputs": [], 118 | "source": [ 119 | "from matplotlib import pyplot\n", 120 | "import numpy as np\n", 121 | "\n", 122 | "pyplot.imshow(x_train[0].reshape((28, 28)), cmap=\"gray\")\n", 123 | "print(x_train.shape)" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "PyTorch uses ``torch.tensor``, rather than numpy arrays, so we need to\n", 131 | "convert our data.\n", 132 | "\n" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": { 139 | "collapsed": false 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "import torch\n", 144 | "\n", 145 | "x_train, y_train, x_valid, y_valid = map(\n", 146 | " torch.tensor, (x_train, y_train, x_valid, y_valid)\n", 147 | ")\n", 148 | "n, c = x_train.shape\n", 149 | "x_train, x_train.shape, y_train.min(), y_train.max()\n", 150 | "print(x_train, y_train)\n", 151 | "print(x_train.shape)\n", 152 | "print(y_train.min(), y_train.max())" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "Neural net from scratch (no torch.nn)\n", 160 | "---------------------------------------------\n", 161 | "\n", 162 | "Let's first create a model using nothing but PyTorch tensor operations. We're assuming\n", 163 | "you're already familiar with the basics of neural networks. (If you're not, you can\n", 164 | "learn them at `course.fast.ai `_).\n", 165 | "\n", 166 | "PyTorch provides methods to create random or zero-filled tensors, which we will\n", 167 | "use to create our weights and bias for a simple linear model. These are just regular\n", 168 | "tensors, with one very special addition: we tell PyTorch that they require a\n", 169 | "gradient. 
This causes PyTorch to record all of the operations done on the tensor,\n", 170 | "so that it can calculate the gradient during back-propagation *automatically*!\n", 171 | "\n", 172 | "For the weights, we set ``requires_grad`` **after** the initialization, since we\n", 173 | "don't want that step included in the gradient. (Note that a trailing ``_`` in\n", 174 | "PyTorch signifies that the operation is performed in-place.)\n", 175 | "\n", 176 | "
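To make that "recording" concrete, here is a tiny standalone sketch (not part of the original notebook; the tensor ``a`` and scalar ``b`` below are invented purely for illustration) showing autograd building a gradient for us:

```python
import torch

# Any operation on a tensor with requires_grad=True is recorded by autograd.
a = torch.ones(3, requires_grad=True)
b = (a * 2).sum()      # b "remembers" how it was computed from a

b.backward()           # back-propagation fills in a.grad automatically
print(a.grad)          # tensor([2., 2., 2.]) -- d(b)/d(a) for each element
```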
**Note:** We are initializing the weights here with `Xavier initialisation `_ (by multiplying with 1/sqrt(n)).
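As a rough optional sanity check (a standalone sketch, not from the tutorial; the stand-in batch ``x`` below is invented for illustration), you can see what the ``1/sqrt(784)`` scaling does to the spread of the pre-activations:

```python
import math
import torch

x = torch.rand(64, 784)              # stand-in batch with pixel-like values in [0, 1)
w_raw = torch.randn(784, 10)         # unscaled random weights
w_scaled = w_raw / math.sqrt(784)    # the scaling used in the tutorial

# Same weights, but the scaled version gives outputs with a spread
# sqrt(784) = 28 times smaller, keeping the linear layer's output modest.
print((x @ w_raw).std(), (x @ w_scaled).std())
```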
\n", 179 | "\n" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": null, 185 | "metadata": { 186 | "collapsed": false 187 | }, 188 | "outputs": [], 189 | "source": [ 190 | "import math\n", 191 | "\n", 192 | "weights = torch.randn(784, 10) / math.sqrt(784)\n", 193 | "weights.requires_grad_()\n", 194 | "bias = torch.zeros(10, requires_grad=True)" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "Thanks to PyTorch's ability to calculate gradients automatically, we can\n", 202 | "use any standard Python function (or callable object) as a model! So\n", 203 | "let's just write a plain matrix multiplication and broadcasted addition\n", 204 | "to create a simple linear model. We also need an activation function, so\n", 205 | "we'll write `log_softmax` and use it. Remember: although PyTorch\n", 206 | "provides lots of pre-written loss functions, activation functions, and\n", 207 | "so forth, you can easily write your own using plain python. PyTorch will\n", 208 | "even create fast GPU or vectorized CPU code for your function\n", 209 | "automatically.\n", 210 | "\n" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": null, 216 | "metadata": { 217 | "collapsed": false 218 | }, 219 | "outputs": [], 220 | "source": [ 221 | "def log_softmax(x):\n", 222 | " return x - x.exp().sum(-1).log().unsqueeze(-1)\n", 223 | "\n", 224 | "def model(xb):\n", 225 | " return log_softmax(xb @ weights + bias)" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "In the above, the ``@`` stands for the dot product operation. We will call\n", 233 | "our function on one batch of data (in this case, 64 images). This is\n", 234 | "one *forward pass*. Note that our predictions won't be any better than\n", 235 | "random at this stage, since we start with random weights.\n", 236 | "\n" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "metadata": { 243 | "collapsed": false 244 | }, 245 | "outputs": [], 246 | "source": [ 247 | "bs = 64 # batch size\n", 248 | "\n", 249 | "xb = x_train[0:bs] # a mini-batch from x\n", 250 | "preds = model(xb) # predictions\n", 251 | "preds[0], preds.shape\n", 252 | "print(preds[0], preds.shape)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "As you see, the ``preds`` tensor contains not only the tensor values, but also a\n", 260 | "gradient function. 
We'll use this later to do backprop.\n", 261 | "\n", 262 | "Let's implement negative log-likelihood to use as the loss function\n", 263 | "(again, we can just use standard Python):\n", 264 | "\n" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": null, 270 | "metadata": { 271 | "collapsed": false 272 | }, 273 | "outputs": [], 274 | "source": [ 275 | "def nll(input, target):\n", 276 | " return -input[range(target.shape[0]), target].mean()\n", 277 | "\n", 278 | "loss_func = nll" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "Let's check our loss with our random model, so we can see if we improve\n", 286 | "after a backprop pass later.\n", 287 | "\n" 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": null, 293 | "metadata": { 294 | "collapsed": false 295 | }, 296 | "outputs": [], 297 | "source": [ 298 | "yb = y_train[0:bs]\n", 299 | "print(loss_func(preds, yb))" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "Let's also implement a function to calculate the accuracy of our model.\n", 307 | "For each prediction, if the index with the largest value matches the\n", 308 | "target value, then the prediction was correct.\n", 309 | "\n" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": null, 315 | "metadata": { 316 | "collapsed": false 317 | }, 318 | "outputs": [], 319 | "source": [ 320 | "def accuracy(out, yb):\n", 321 | " preds = torch.argmax(out, dim=1)\n", 322 | " return (preds == yb).float().mean()" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "Let's check the accuracy of our random model, so we can see if our\n", 330 | "accuracy improves as our loss improves.\n", 331 | "\n" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": null, 337 | "metadata": { 338 | "collapsed": false 339 | }, 340 | "outputs": [], 341 | "source": [ 342 | "print(accuracy(preds, yb))" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "We can now run a training loop. For each iteration, we will:\n", 350 | "\n", 351 | "- select a mini-batch of data (of size ``bs``)\n", 352 | "- use the model to make predictions\n", 353 | "- calculate the loss\n", 354 | "- ``loss.backward()`` updates the gradients of the model, in this case, ``weights``\n", 355 | " and ``bias``.\n", 356 | "\n", 357 | "We now use these gradients to update the weights and bias. We do this\n", 358 | "within the ``torch.no_grad()`` context manager, because we do not want these\n", 359 | "actions to be recorded for our next calculation of the gradient. You can read\n", 360 | "more about how PyTorch's Autograd records operations\n", 361 | "`here `_.\n", 362 | "\n", 363 | "We then set the\n", 364 | "gradients to zero, so that we are ready for the next loop.\n", 365 | "Otherwise, our gradients would record a running tally of all the operations\n", 366 | "that had happened (i.e. ``loss.backward()`` *adds* the gradients to whatever is\n", 367 | "already stored, rather than replacing them).\n", 368 | "\n", 369 | ".. 
tip:: You can use the standard python debugger to step through PyTorch\n", 370 | " code, allowing you to check the various variable values at each step.\n", 371 | " Uncomment ``set_trace()`` below to try it out.\n", 372 | "\n", 373 | "\n" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": null, 379 | "metadata": { 380 | "collapsed": false 381 | }, 382 | "outputs": [], 383 | "source": [ 384 | "from IPython.core.debugger import set_trace\n", 385 | "\n", 386 | "lr = 0.5 # learning rate\n", 387 | "epochs = 2 # how many epochs to train for\n", 388 | "\n", 389 | "for epoch in range(epochs):\n", 390 | " for i in range((n - 1) // bs + 1):\n", 391 | " # set_trace()\n", 392 | " start_i = i * bs\n", 393 | " end_i = start_i + bs\n", 394 | " xb = x_train[start_i:end_i]\n", 395 | " yb = y_train[start_i:end_i]\n", 396 | " pred = model(xb)\n", 397 | " loss = loss_func(pred, yb)\n", 398 | "\n", 399 | " loss.backward()\n", 400 | " with torch.no_grad():\n", 401 | " weights -= weights.grad * lr\n", 402 | " bias -= bias.grad * lr\n", 403 | " weights.grad.zero_()\n", 404 | " bias.grad.zero_()" 405 | ] 406 | }, 407 | { 408 | "cell_type": "markdown", 409 | "metadata": {}, 410 | "source": [ 411 | "That's it: we've created and trained a minimal neural network (in this case, a\n", 412 | "logistic regression, since we have no hidden layers) entirely from scratch!\n", 413 | "\n", 414 | "Let's check the loss and accuracy and compare those to what we got\n", 415 | "earlier. We expect that the loss will have decreased and accuracy to\n", 416 | "have increased, and they have.\n", 417 | "\n" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": null, 423 | "metadata": { 424 | "collapsed": false 425 | }, 426 | "outputs": [], 427 | "source": [ 428 | "print(loss_func(model(xb), yb), accuracy(model(xb), yb))" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "Using torch.nn.functional\n", 436 | "------------------------------\n", 437 | "\n", 438 | "We will now refactor our code, so that it does the same thing as before, only\n", 439 | "we'll start taking advantage of PyTorch's ``nn`` classes to make it more concise\n", 440 | "and flexible. At each step from here, we should be making our code one or more\n", 441 | "of: shorter, more understandable, and/or more flexible.\n", 442 | "\n", 443 | "The first and easiest step is to make our code shorter by replacing our\n", 444 | "hand-written activation and loss functions with those from ``torch.nn.functional``\n", 445 | "(which is generally imported into the namespace ``F`` by convention). This module\n", 446 | "contains all the functions in the ``torch.nn`` library (whereas other parts of the\n", 447 | "library contain classes). As well as a wide range of loss and activation\n", 448 | "functions, you'll also find here some convenient functions for creating neural\n", 449 | "nets, such as pooling functions. (There are also functions for doing convolutions,\n", 450 | "linear layers, etc, but as we'll see, these are usually better handled using\n", 451 | "other parts of the library.)\n", 452 | "\n", 453 | "If you're using negative log likelihood loss and log softmax activation,\n", 454 | "then Pytorch provides a single function ``F.cross_entropy`` that combines\n", 455 | "the two. 
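As an optional check (a sketch assuming the ``log_softmax``, ``nll``, ``weights``, ``bias``, ``xb`` and ``yb`` defined earlier are still in scope), the combined function should agree with our hand-written pair up to floating-point noise:

```python
import torch.nn.functional as F

logits = xb @ weights + bias            # raw linear output, no activation applied
print(nll(log_softmax(logits), yb))     # hand-written log_softmax + negative log-likelihood
print(F.cross_entropy(logits, yb))      # PyTorch's combined version -- same value
```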
So we can even remove the activation function from our model.\n", 456 | "\n" 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": null, 462 | "metadata": { 463 | "collapsed": false 464 | }, 465 | "outputs": [], 466 | "source": [ 467 | "import torch.nn.functional as F\n", 468 | "\n", 469 | "loss_func = F.cross_entropy\n", 470 | "\n", 471 | "def model(xb):\n", 472 | " return xb @ weights + bias" 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": {}, 478 | "source": [ 479 | "Note that we no longer call ``log_softmax`` in the ``model`` function. Let's\n", 480 | "confirm that our loss and accuracy are the same as before:\n", 481 | "\n" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": null, 487 | "metadata": { 488 | "collapsed": false 489 | }, 490 | "outputs": [], 491 | "source": [ 492 | "print(loss_func(model(xb), yb), accuracy(model(xb), yb))" 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": {}, 498 | "source": [ 499 | "Refactor using nn.Module\n", 500 | "-----------------------------\n", 501 | "Next up, we'll use ``nn.Module`` and ``nn.Parameter``, for a clearer and more\n", 502 | "concise training loop. We subclass ``nn.Module`` (which itself is a class and\n", 503 | "able to keep track of state). In this case, we want to create a class that\n", 504 | "holds our weights, bias, and method for the forward step. ``nn.Module`` has a\n", 505 | "number of attributes and methods (such as ``.parameters()`` and ``.zero_grad()``)\n", 506 | "which we will be using.\n", 507 | "\n", 508 | "
**Note:** ``nn.Module`` (uppercase M) is a PyTorch-specific concept, and is a\n", 509 | "   class we'll be using a lot. ``nn.Module`` is not to be confused with the Python\n", 510 | "   concept of a (lowercase ``m``) `module `_,\n", 511 | "   which is a file of Python code that can be imported.
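To see what ``nn.Module`` buys us before applying it to MNIST, here is a minimal standalone sketch (the ``Tiny`` class below is invented for illustration): any ``nn.Parameter`` assigned as an attribute is registered automatically, so ``.parameters()`` and ``.zero_grad()`` can find it.

```python
import torch
from torch import nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning nn.Parameter attributes registers them with the module.
        self.w = nn.Parameter(torch.randn(3, 2))
        self.b = nn.Parameter(torch.zeros(2))

    def forward(self, x):
        return x @ self.w + self.b

tiny = Tiny()
print([p.shape for p in tiny.parameters()])   # [torch.Size([3, 2]), torch.Size([2])]
```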
\n", 512 | "\n" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": null, 518 | "metadata": { 519 | "collapsed": false 520 | }, 521 | "outputs": [], 522 | "source": [ 523 | "from torch import nn\n", 524 | "\n", 525 | "class Mnist_Logistic(nn.Module):\n", 526 | " def __init__(self):\n", 527 | " super().__init__()\n", 528 | " self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))\n", 529 | " self.bias = nn.Parameter(torch.zeros(10))\n", 530 | "\n", 531 | " def forward(self, xb):\n", 532 | " return xb @ self.weights + self.bias" 533 | ] 534 | }, 535 | { 536 | "cell_type": "markdown", 537 | "metadata": {}, 538 | "source": [ 539 | "Since we're now using an object instead of just using a function, we\n", 540 | "first have to instantiate our model:\n", 541 | "\n" 542 | ] 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": null, 547 | "metadata": { 548 | "collapsed": false 549 | }, 550 | "outputs": [], 551 | "source": [ 552 | "model = Mnist_Logistic()" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": {}, 558 | "source": [ 559 | "Now we can calculate the loss in the same way as before. Note that\n", 560 | "``nn.Module`` objects are used as if they are functions (i.e they are\n", 561 | "*callable*), but behind the scenes Pytorch will call our ``forward``\n", 562 | "method automatically.\n", 563 | "\n" 564 | ] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": null, 569 | "metadata": { 570 | "collapsed": false 571 | }, 572 | "outputs": [], 573 | "source": [ 574 | "print(loss_func(model(xb), yb))" 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": {}, 580 | "source": [ 581 | "Previously for our training loop we had to update the values for each parameter\n", 582 | "by name, and manually zero out the grads for each parameter separately, like this:\n", 583 | "::\n", 584 | " with torch.no_grad():\n", 585 | " weights -= weights.grad * lr\n", 586 | " bias -= bias.grad * lr\n", 587 | " weights.grad.zero_()\n", 588 | " bias.grad.zero_()\n", 589 | "\n", 590 | "\n", 591 | "Now we can take advantage of model.parameters() and model.zero_grad() (which\n", 592 | "are both defined by PyTorch for ``nn.Module``) to make those steps more concise\n", 593 | "and less prone to the error of forgetting some of our parameters, particularly\n", 594 | "if we had a more complicated model:\n", 595 | "::\n", 596 | " with torch.no_grad():\n", 597 | " for p in model.parameters(): p -= p.grad * lr\n", 598 | " model.zero_grad()\n", 599 | "\n", 600 | "\n", 601 | "We'll wrap our little training loop in a ``fit`` function so we can run it\n", 602 | "again later.\n", 603 | "\n" 604 | ] 605 | }, 606 | { 607 | "cell_type": "code", 608 | "execution_count": null, 609 | "metadata": { 610 | "collapsed": false 611 | }, 612 | "outputs": [], 613 | "source": [ 614 | "def fit():\n", 615 | " for epoch in range(epochs):\n", 616 | " for i in range((n - 1) // bs + 1):\n", 617 | " start_i = i * bs\n", 618 | " end_i = start_i + bs\n", 619 | " xb = x_train[start_i:end_i]\n", 620 | " yb = y_train[start_i:end_i]\n", 621 | " pred = model(xb)\n", 622 | " loss = loss_func(pred, yb)\n", 623 | "\n", 624 | " loss.backward()\n", 625 | " with torch.no_grad():\n", 626 | " for p in model.parameters():\n", 627 | " p -= p.grad * lr\n", 628 | " model.zero_grad()\n", 629 | "\n", 630 | "fit()" 631 | ] 632 | }, 633 | { 634 | "cell_type": "markdown", 635 | "metadata": {}, 636 | "source": [ 637 | "Let's double-check that our loss has gone down:\n", 638 | 
"\n" 639 | ] 640 | }, 641 | { 642 | "cell_type": "code", 643 | "execution_count": null, 644 | "metadata": { 645 | "collapsed": false 646 | }, 647 | "outputs": [], 648 | "source": [ 649 | "print(loss_func(model(xb), yb))" 650 | ] 651 | }, 652 | { 653 | "cell_type": "markdown", 654 | "metadata": {}, 655 | "source": [ 656 | "Refactor using nn.Linear\n", 657 | "-------------------------\n", 658 | "\n", 659 | "We continue to refactor our code. Instead of manually defining and\n", 660 | "initializing ``self.weights`` and ``self.bias``, and calculating ``xb @\n", 661 | "self.weights + self.bias``, we will instead use the Pytorch class\n", 662 | "`nn.Linear `_ for a\n", 663 | "linear layer, which does all that for us. Pytorch has many types of\n", 664 | "predefined layers that can greatly simplify our code, and often makes it\n", 665 | "faster too.\n", 666 | "\n" 667 | ] 668 | }, 669 | { 670 | "cell_type": "code", 671 | "execution_count": null, 672 | "metadata": { 673 | "collapsed": false 674 | }, 675 | "outputs": [], 676 | "source": [ 677 | "class Mnist_Logistic(nn.Module):\n", 678 | " def __init__(self):\n", 679 | " super().__init__()\n", 680 | " self.lin = nn.Linear(784, 10)\n", 681 | "\n", 682 | " def forward(self, xb):\n", 683 | " return self.lin(xb)" 684 | ] 685 | }, 686 | { 687 | "cell_type": "markdown", 688 | "metadata": {}, 689 | "source": [ 690 | "We instantiate our model and calculate the loss in the same way as before:\n", 691 | "\n" 692 | ] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "execution_count": null, 697 | "metadata": { 698 | "collapsed": false 699 | }, 700 | "outputs": [], 701 | "source": [ 702 | "model = Mnist_Logistic()\n", 703 | "print(loss_func(model(xb), yb))" 704 | ] 705 | }, 706 | { 707 | "cell_type": "markdown", 708 | "metadata": {}, 709 | "source": [ 710 | "We are still able to use our same ``fit`` method as before.\n", 711 | "\n" 712 | ] 713 | }, 714 | { 715 | "cell_type": "code", 716 | "execution_count": null, 717 | "metadata": { 718 | "collapsed": false 719 | }, 720 | "outputs": [], 721 | "source": [ 722 | "fit()\n", 723 | "\n", 724 | "print(loss_func(model(xb), yb))" 725 | ] 726 | }, 727 | { 728 | "cell_type": "markdown", 729 | "metadata": {}, 730 | "source": [ 731 | "Refactor using optim\n", 732 | "------------------------------\n", 733 | "\n", 734 | "Pytorch also has a package with various optimization algorithms, ``torch.optim``.\n", 735 | "We can use the ``step`` method from our optimizer to take a forward step, instead\n", 736 | "of manually updating each parameter.\n", 737 | "\n", 738 | "This will let us replace our previous manually coded optimization step:\n", 739 | "::\n", 740 | " with torch.no_grad():\n", 741 | " for p in model.parameters(): p -= p.grad * lr\n", 742 | " model.zero_grad()\n", 743 | "\n", 744 | "and instead use just:\n", 745 | "::\n", 746 | " opt.step()\n", 747 | " opt.zero_grad()\n", 748 | "\n", 749 | "(``optim.zero_grad()`` resets the gradient to 0 and we need to call it before\n", 750 | "computing the gradient for the next minibatch.)\n", 751 | "\n" 752 | ] 753 | }, 754 | { 755 | "cell_type": "code", 756 | "execution_count": null, 757 | "metadata": { 758 | "collapsed": false 759 | }, 760 | "outputs": [], 761 | "source": [ 762 | "from torch import optim" 763 | ] 764 | }, 765 | { 766 | "cell_type": "markdown", 767 | "metadata": {}, 768 | "source": [ 769 | "We'll define a little function to create our model and optimizer so we\n", 770 | "can reuse it in the future.\n", 771 | "\n" 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 
776 | "execution_count": null, 777 | "metadata": { 778 | "collapsed": false 779 | }, 780 | "outputs": [], 781 | "source": [ 782 | "def get_model():\n", 783 | " model = Mnist_Logistic()\n", 784 | " return model, optim.SGD(model.parameters(), lr=lr)\n", 785 | "\n", 786 | "model, opt = get_model()\n", 787 | "print(loss_func(model(xb), yb))\n", 788 | "\n", 789 | "for epoch in range(epochs):\n", 790 | " for i in range((n - 1) // bs + 1):\n", 791 | " start_i = i * bs\n", 792 | " end_i = start_i + bs\n", 793 | " xb = x_train[start_i:end_i]\n", 794 | " yb = y_train[start_i:end_i]\n", 795 | " pred = model(xb)\n", 796 | " loss = loss_func(pred, yb)\n", 797 | "\n", 798 | " loss.backward()\n", 799 | " opt.step()\n", 800 | " opt.zero_grad()\n", 801 | "\n", 802 | "print(loss_func(model(xb), yb))" 803 | ] 804 | }, 805 | { 806 | "cell_type": "markdown", 807 | "metadata": {}, 808 | "source": [ 809 | "Refactor using Dataset\n", 810 | "------------------------------\n", 811 | "\n", 812 | "PyTorch has an abstract Dataset class. A Dataset can be anything that has\n", 813 | "a ``__len__`` function (called by Python's standard ``len`` function) and\n", 814 | "a ``__getitem__`` function as a way of indexing into it.\n", 815 | "`This tutorial `_\n", 816 | "walks through a nice example of creating a custom ``FacialLandmarkDataset`` class\n", 817 | "as a subclass of ``Dataset``.\n", 818 | "\n", 819 | "PyTorch's `TensorDataset `_\n", 820 | "is a Dataset wrapping tensors. By defining a length and way of indexing,\n", 821 | "this also gives us a way to iterate, index, and slice along the first\n", 822 | "dimension of a tensor. This will make it easier to access both the\n", 823 | "independent and dependent variables in the same line as we train.\n", 824 | "\n" 825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "execution_count": null, 830 | "metadata": { 831 | "collapsed": false 832 | }, 833 | "outputs": [], 834 | "source": [ 835 | "from torch.utils.data import TensorDataset" 836 | ] 837 | }, 838 | { 839 | "cell_type": "markdown", 840 | "metadata": {}, 841 | "source": [ 842 | "Both ``x_train`` and ``y_train`` can be combined in a single ``TensorDataset``,\n", 843 | "which will be easier to iterate over and slice.\n", 844 | "\n" 845 | ] 846 | }, 847 | { 848 | "cell_type": "code", 849 | "execution_count": null, 850 | "metadata": { 851 | "collapsed": false 852 | }, 853 | "outputs": [], 854 | "source": [ 855 | "train_ds = TensorDataset(x_train, y_train)" 856 | ] 857 | }, 858 | { 859 | "cell_type": "markdown", 860 | "metadata": {}, 861 | "source": [ 862 | "Previously, we had to iterate through minibatches of x and y values separately:\n", 863 | "::\n", 864 | " xb = x_train[start_i:end_i]\n", 865 | " yb = y_train[start_i:end_i]\n", 866 | "\n", 867 | "\n", 868 | "Now, we can do these two steps together:\n", 869 | "::\n", 870 | " xb,yb = train_ds[i*bs : i*bs+bs]\n", 871 | "\n", 872 | "\n" 873 | ] 874 | }, 875 | { 876 | "cell_type": "code", 877 | "execution_count": null, 878 | "metadata": { 879 | "collapsed": false 880 | }, 881 | "outputs": [], 882 | "source": [ 883 | "model, opt = get_model()\n", 884 | "\n", 885 | "for epoch in range(epochs):\n", 886 | " for i in range((n - 1) // bs + 1):\n", 887 | " xb, yb = train_ds[i * bs: i * bs + bs]\n", 888 | " pred = model(xb)\n", 889 | " loss = loss_func(pred, yb)\n", 890 | "\n", 891 | " loss.backward()\n", 892 | " opt.step()\n", 893 | " opt.zero_grad()\n", 894 | "\n", 895 | "print(loss_func(model(xb), yb))" 896 | ] 897 | }, 898 | { 899 | "cell_type": "markdown", 900 | 
"metadata": {}, 901 | "source": [ 902 | "Refactor using DataLoader\n", 903 | "------------------------------\n", 904 | "\n", 905 | "Pytorch's ``DataLoader`` is responsible for managing batches. You can\n", 906 | "create a ``DataLoader`` from any ``Dataset``. ``DataLoader`` makes it easier\n", 907 | "to iterate over batches. Rather than having to use ``train_ds[i*bs : i*bs+bs]``,\n", 908 | "the DataLoader gives us each minibatch automatically.\n", 909 | "\n" 910 | ] 911 | }, 912 | { 913 | "cell_type": "code", 914 | "execution_count": null, 915 | "metadata": { 916 | "collapsed": false 917 | }, 918 | "outputs": [], 919 | "source": [ 920 | "from torch.utils.data import DataLoader\n", 921 | "\n", 922 | "train_ds = TensorDataset(x_train, y_train)\n", 923 | "train_dl = DataLoader(train_ds, batch_size=bs)" 924 | ] 925 | }, 926 | { 927 | "cell_type": "markdown", 928 | "metadata": {}, 929 | "source": [ 930 | "Previously, our loop iterated over batches (xb, yb) like this:\n", 931 | "::\n", 932 | " for i in range((n-1)//bs + 1):\n", 933 | " xb,yb = train_ds[i*bs : i*bs+bs]\n", 934 | " pred = model(xb)\n", 935 | "\n", 936 | "Now, our loop is much cleaner, as (xb, yb) are loaded automatically from the data loader:\n", 937 | "::\n", 938 | " for xb,yb in train_dl:\n", 939 | " pred = model(xb)\n", 940 | "\n" 941 | ] 942 | }, 943 | { 944 | "cell_type": "code", 945 | "execution_count": null, 946 | "metadata": { 947 | "collapsed": false 948 | }, 949 | "outputs": [], 950 | "source": [ 951 | "model, opt = get_model()\n", 952 | "\n", 953 | "for epoch in range(epochs):\n", 954 | " for xb, yb in train_dl:\n", 955 | " pred = model(xb)\n", 956 | " loss = loss_func(pred, yb)\n", 957 | "\n", 958 | " loss.backward()\n", 959 | " opt.step()\n", 960 | " opt.zero_grad()\n", 961 | "\n", 962 | "print(loss_func(model(xb), yb))" 963 | ] 964 | }, 965 | { 966 | "cell_type": "markdown", 967 | "metadata": {}, 968 | "source": [ 969 | "Thanks to Pytorch's ``nn.Module``, ``nn.Parameter``, ``Dataset``, and ``DataLoader``,\n", 970 | "our training loop is now dramatically smaller and easier to understand. Let's\n", 971 | "now try to add the basic features necessary to create effecive models in practice.\n", 972 | "\n", 973 | "Add validation\n", 974 | "-----------------------\n", 975 | "\n", 976 | "In section 1, we were just trying to get a reasonable training loop set up for\n", 977 | "use on our training data. In reality, you **always** should also have\n", 978 | "a `validation set `_, in order\n", 979 | "to identify if you are overfitting.\n", 980 | "\n", 981 | "Shuffling the training data is\n", 982 | "`important `_\n", 983 | "to prevent correlation between batches and overfitting. On the other hand, the\n", 984 | "validation loss will be identical whether we shuffle the validation set or not.\n", 985 | "Since shuffling takes extra time, it makes no sense to shuffle the validation data.\n", 986 | "\n", 987 | "We'll use a batch size for the validation set that is twice as large as\n", 988 | "that for the training set. This is because the validation set does not\n", 989 | "need backpropagation and thus takes less memory (it doesn't need to\n", 990 | "store the gradients). 
We take advantage of this to use a larger batch\n", 991 | "size and compute the loss more quickly.\n", 992 | "\n" 993 | ] 994 | }, 995 | { 996 | "cell_type": "code", 997 | "execution_count": null, 998 | "metadata": { 999 | "collapsed": false 1000 | }, 1001 | "outputs": [], 1002 | "source": [ 1003 | "train_ds = TensorDataset(x_train, y_train)\n", 1004 | "train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)\n", 1005 | "\n", 1006 | "valid_ds = TensorDataset(x_valid, y_valid)\n", 1007 | "valid_dl = DataLoader(valid_ds, batch_size=bs * 2)" 1008 | ] 1009 | }, 1010 | { 1011 | "cell_type": "markdown", 1012 | "metadata": {}, 1013 | "source": [ 1014 | "We will calculate and print the validation loss at the end of each epoch.\n", 1015 | "\n", 1016 | "(Note that we always call ``model.train()`` before training, and ``model.eval()``\n", 1017 | "before inference, because these are used by layers such as ``nn.BatchNorm2d``\n", 1018 | "and ``nn.Dropout`` to ensure appropriate behaviour for these different phases.)\n", 1019 | "\n" 1020 | ] 1021 | }, 1022 | { 1023 | "cell_type": "code", 1024 | "execution_count": null, 1025 | "metadata": { 1026 | "collapsed": false 1027 | }, 1028 | "outputs": [], 1029 | "source": [ 1030 | "model, opt = get_model()\n", 1031 | "\n", 1032 | "for epoch in range(epochs):\n", 1033 | " model.train()\n", 1034 | " for xb, yb in train_dl:\n", 1035 | " pred = model(xb)\n", 1036 | " loss = loss_func(pred, yb)\n", 1037 | "\n", 1038 | " loss.backward()\n", 1039 | " opt.step()\n", 1040 | " opt.zero_grad()\n", 1041 | "\n", 1042 | " model.eval()\n", 1043 | " with torch.no_grad():\n", 1044 | " valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)\n", 1045 | "\n", 1046 | " print(epoch, valid_loss / len(valid_dl))" 1047 | ] 1048 | }, 1049 | { 1050 | "cell_type": "markdown", 1051 | "metadata": {}, 1052 | "source": [ 1053 | "Create fit() and get_data()\n", 1054 | "----------------------------------\n", 1055 | "\n", 1056 | "We'll now do a little refactoring of our own. Since we go through a similar\n", 1057 | "process twice of calculating the loss for both the training set and the\n", 1058 | "validation set, let's make that into its own function, ``loss_batch``, which\n", 1059 | "computes the loss for one batch.\n", 1060 | "\n", 1061 | "We pass an optimizer in for the training set, and use it to perform\n", 1062 | "backprop. 
For the validation set, we don't pass an optimizer, so the\n", 1063 | "method doesn't perform backprop.\n", 1064 | "\n" 1065 | ] 1066 | }, 1067 | { 1068 | "cell_type": "code", 1069 | "execution_count": null, 1070 | "metadata": { 1071 | "collapsed": false 1072 | }, 1073 | "outputs": [], 1074 | "source": [ 1075 | "def loss_batch(model, loss_func, xb, yb, opt=None):\n", 1076 | " loss = loss_func(model(xb), yb)\n", 1077 | "\n", 1078 | " if opt is not None:\n", 1079 | " loss.backward()\n", 1080 | " opt.step()\n", 1081 | " opt.zero_grad()\n", 1082 | "\n", 1083 | " return loss.item(), len(xb)" 1084 | ] 1085 | }, 1086 | { 1087 | "cell_type": "markdown", 1088 | "metadata": {}, 1089 | "source": [ 1090 | "``fit`` runs the necessary operations to train our model and compute the\n", 1091 | "training and validation losses for each epoch.\n", 1092 | "\n" 1093 | ] 1094 | }, 1095 | { 1096 | "cell_type": "code", 1097 | "execution_count": null, 1098 | "metadata": { 1099 | "collapsed": false 1100 | }, 1101 | "outputs": [], 1102 | "source": [ 1103 | "import numpy as np\n", 1104 | "\n", 1105 | "def fit(epochs, model, loss_func, opt, train_dl, valid_dl):\n", 1106 | " for epoch in range(epochs):\n", 1107 | " model.train()\n", 1108 | " for xb, yb in train_dl:\n", 1109 | " loss_batch(model, loss_func, xb, yb, opt)\n", 1110 | "\n", 1111 | " model.eval()\n", 1112 | " with torch.no_grad():\n", 1113 | " losses, nums = zip(\n", 1114 | " *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]\n", 1115 | " )\n", 1116 | " val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)\n", 1117 | "\n", 1118 | " print(epoch, val_loss)" 1119 | ] 1120 | }, 1121 | { 1122 | "cell_type": "markdown", 1123 | "metadata": {}, 1124 | "source": [ 1125 | "``get_data`` returns dataloaders for the training and validation sets.\n", 1126 | "\n" 1127 | ] 1128 | }, 1129 | { 1130 | "cell_type": "code", 1131 | "execution_count": null, 1132 | "metadata": { 1133 | "collapsed": false 1134 | }, 1135 | "outputs": [], 1136 | "source": [ 1137 | "def get_data(train_ds, valid_ds, bs):\n", 1138 | " return (\n", 1139 | " DataLoader(train_ds, batch_size=bs, shuffle=True),\n", 1140 | " DataLoader(valid_ds, batch_size=bs * 2),\n", 1141 | " )" 1142 | ] 1143 | }, 1144 | { 1145 | "cell_type": "markdown", 1146 | "metadata": {}, 1147 | "source": [ 1148 | "Now, our whole process of obtaining the data loaders and fitting the\n", 1149 | "model can be run in 3 lines of code:\n", 1150 | "\n" 1151 | ] 1152 | }, 1153 | { 1154 | "cell_type": "code", 1155 | "execution_count": null, 1156 | "metadata": { 1157 | "collapsed": false 1158 | }, 1159 | "outputs": [], 1160 | "source": [ 1161 | "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", 1162 | "model, opt = get_model()\n", 1163 | "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" 1164 | ] 1165 | }, 1166 | { 1167 | "cell_type": "markdown", 1168 | "metadata": {}, 1169 | "source": [ 1170 | "You can use these basic 3 lines of code to train a wide variety of models.\n", 1171 | "Let's see if we can use them to train a convolutional neural network (CNN)!\n", 1172 | "\n", 1173 | "Switch to CNN\n", 1174 | "-------------\n", 1175 | "\n", 1176 | "We are now going to build our neural network with three convolutional layers.\n", 1177 | "Because none of the functions in the previous section assume anything about\n", 1178 | "the model form, we'll be able to use them to train a CNN without any modification.\n", 1179 | "\n", 1180 | "We will use Pytorch's predefined\n", 1181 | "`Conv2d `_ class\n", 1182 | "as our 
convolutional layer. We define a CNN with 3 convolutional layers.\n", 1183 | "Each convolution is followed by a ReLU. At the end, we perform an\n", 1184 | "average pooling. (Note that ``view`` is PyTorch's version of numpy's\n", 1185 | "``reshape``)\n", 1186 | "\n" 1187 | ] 1188 | }, 1189 | { 1190 | "cell_type": "code", 1191 | "execution_count": null, 1192 | "metadata": { 1193 | "collapsed": false 1194 | }, 1195 | "outputs": [], 1196 | "source": [ 1197 | "class Mnist_CNN(nn.Module):\n", 1198 | " def __init__(self):\n", 1199 | " super().__init__()\n", 1200 | " self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)\n", 1201 | " self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)\n", 1202 | " self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)\n", 1203 | "\n", 1204 | " def forward(self, xb):\n", 1205 | " xb = xb.view(-1, 1, 28, 28)\n", 1206 | " xb = F.relu(self.conv1(xb))\n", 1207 | " xb = F.relu(self.conv2(xb))\n", 1208 | " xb = F.relu(self.conv3(xb))\n", 1209 | " xb = F.avg_pool2d(xb, 4)\n", 1210 | " return xb.view(-1, xb.size(1))\n", 1211 | "\n", 1212 | "lr = 0.1" 1213 | ] 1214 | }, 1215 | { 1216 | "cell_type": "markdown", 1217 | "metadata": {}, 1218 | "source": [ 1219 | "`Momentum `_ is a variation on\n", 1220 | "stochastic gradient descent that takes previous updates into account as well\n", 1221 | "and generally leads to faster training.\n", 1222 | "\n" 1223 | ] 1224 | }, 1225 | { 1226 | "cell_type": "code", 1227 | "execution_count": null, 1228 | "metadata": { 1229 | "collapsed": false 1230 | }, 1231 | "outputs": [], 1232 | "source": [ 1233 | "model = Mnist_CNN()\n", 1234 | "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n", 1235 | "\n", 1236 | "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" 1237 | ] 1238 | }, 1239 | { 1240 | "cell_type": "markdown", 1241 | "metadata": {}, 1242 | "source": [ 1243 | "nn.Sequential\n", 1244 | "------------------------\n", 1245 | "\n", 1246 | "``torch.nn`` has another handy class we can use to simply our code:\n", 1247 | "`Sequential `_ .\n", 1248 | "A ``Sequential`` object runs each of the modules contained within it, in a\n", 1249 | "sequential manner. This is a simpler way of writing our neural network.\n", 1250 | "\n", 1251 | "To take advantage of this, we need to be able to easily define a\n", 1252 | "**custom layer** from a given function. For instance, PyTorch doesn't\n", 1253 | "have a `view` layer, and we need to create one for our network. 
``Lambda``\n", 1254 | "will create a layer that we can then use when defining a network with\n", 1255 | "``Sequential``.\n", 1256 | "\n" 1257 | ] 1258 | }, 1259 | { 1260 | "cell_type": "code", 1261 | "execution_count": null, 1262 | "metadata": { 1263 | "collapsed": false 1264 | }, 1265 | "outputs": [], 1266 | "source": [ 1267 | "class Lambda(nn.Module):\n", 1268 | " def __init__(self, func):\n", 1269 | " super().__init__()\n", 1270 | " self.func = func\n", 1271 | "\n", 1272 | " def forward(self, x):\n", 1273 | " return self.func(x)\n", 1274 | "\n", 1275 | "\n", 1276 | "def preprocess(x):\n", 1277 | " return x.view(-1, 1, 28, 28)" 1278 | ] 1279 | }, 1280 | { 1281 | "cell_type": "markdown", 1282 | "metadata": {}, 1283 | "source": [ 1284 | "The model created with ``Sequential`` is simply:\n", 1285 | "\n" 1286 | ] 1287 | }, 1288 | { 1289 | "cell_type": "code", 1290 | "execution_count": null, 1291 | "metadata": { 1292 | "collapsed": false 1293 | }, 1294 | "outputs": [], 1295 | "source": [ 1296 | "model = nn.Sequential(\n", 1297 | " Lambda(preprocess),\n", 1298 | " nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n", 1299 | " nn.ReLU(),\n", 1300 | " nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n", 1301 | " nn.ReLU(),\n", 1302 | " nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n", 1303 | " nn.ReLU(),\n", 1304 | " nn.AvgPool2d(4),\n", 1305 | " Lambda(lambda x: x.view(x.size(0), -1)),\n", 1306 | ")\n", 1307 | "\n", 1308 | "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n", 1309 | "\n", 1310 | "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" 1311 | ] 1312 | }, 1313 | { 1314 | "cell_type": "markdown", 1315 | "metadata": {}, 1316 | "source": [ 1317 | "Wrapping DataLoader\n", 1318 | "-----------------------------\n", 1319 | "\n", 1320 | "Our CNN is fairly concise, but it only works with MNIST, because:\n", 1321 | " - It assumes the input is a 28\\*28 long vector\n", 1322 | " - It assumes that the final CNN grid size is 4\\*4 (since that's the average\n", 1323 | "pooling kernel size we used)\n", 1324 | "\n", 1325 | "Let's get rid of these two assumptions, so our model works with any 2d\n", 1326 | "single channel image. First, we can remove the initial Lambda layer but\n", 1327 | "moving the data preprocessing into a generator:\n", 1328 | "\n" 1329 | ] 1330 | }, 1331 | { 1332 | "cell_type": "code", 1333 | "execution_count": null, 1334 | "metadata": { 1335 | "collapsed": false 1336 | }, 1337 | "outputs": [], 1338 | "source": [ 1339 | "def preprocess(x, y):\n", 1340 | " return x.view(-1, 1, 28, 28), y\n", 1341 | "\n", 1342 | "\n", 1343 | "class WrappedDataLoader:\n", 1344 | " def __init__(self, dl, func):\n", 1345 | " self.dl = dl\n", 1346 | " self.func = func\n", 1347 | "\n", 1348 | " def __len__(self):\n", 1349 | " return len(self.dl)\n", 1350 | "\n", 1351 | " def __iter__(self):\n", 1352 | " batches = iter(self.dl)\n", 1353 | " for b in batches:\n", 1354 | " yield (self.func(*b))\n", 1355 | "\n", 1356 | "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", 1357 | "train_dl = WrappedDataLoader(train_dl, preprocess)\n", 1358 | "valid_dl = WrappedDataLoader(valid_dl, preprocess)" 1359 | ] 1360 | }, 1361 | { 1362 | "cell_type": "markdown", 1363 | "metadata": {}, 1364 | "source": [ 1365 | "Next, we can replace ``nn.AvgPool2d`` with ``nn.AdaptiveAvgPool2d``, which\n", 1366 | "allows us to define the size of the *output* tensor we want, rather than\n", 1367 | "the *input* tensor we have. 
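A small standalone illustration (not part of the tutorial; the random inputs are invented): the adaptive pooling layer fixes the *output* spatial size, so differently sized inputs all come out the same shape.

```python
import torch
from torch import nn

pool = nn.AdaptiveAvgPool2d(1)                 # always produce a 1x1 spatial output
print(pool(torch.randn(1, 10, 4, 4)).shape)    # torch.Size([1, 10, 1, 1])
print(pool(torch.randn(1, 10, 7, 9)).shape)    # torch.Size([1, 10, 1, 1])
```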
As a result, our model will work with any\n", 1368 | "size input.\n", 1369 | "\n" 1370 | ] 1371 | }, 1372 | { 1373 | "cell_type": "code", 1374 | "execution_count": null, 1375 | "metadata": { 1376 | "collapsed": false 1377 | }, 1378 | "outputs": [], 1379 | "source": [ 1380 | "model = nn.Sequential(\n", 1381 | " nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n", 1382 | " nn.ReLU(),\n", 1383 | " nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n", 1384 | " nn.ReLU(),\n", 1385 | " nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n", 1386 | " nn.ReLU(),\n", 1387 | " nn.AdaptiveAvgPool2d(1),\n", 1388 | " Lambda(lambda x: x.view(x.size(0), -1)),\n", 1389 | ")\n", 1390 | "\n", 1391 | "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)" 1392 | ] 1393 | }, 1394 | { 1395 | "cell_type": "markdown", 1396 | "metadata": {}, 1397 | "source": [ 1398 | "Let's try it out:\n", 1399 | "\n" 1400 | ] 1401 | }, 1402 | { 1403 | "cell_type": "code", 1404 | "execution_count": null, 1405 | "metadata": { 1406 | "collapsed": false 1407 | }, 1408 | "outputs": [], 1409 | "source": [ 1410 | "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" 1411 | ] 1412 | }, 1413 | { 1414 | "cell_type": "markdown", 1415 | "metadata": {}, 1416 | "source": [ 1417 | "Using your GPU\n", 1418 | "---------------\n", 1419 | "\n", 1420 | "[Switch on a GPU machine](https://docs.floydhub.com/guides/workspace/#switching-between-cpu-and-gpu) to speed up the computation with the next lines of code.\n", 1421 | "\n", 1422 | "Note: You can run the next Code Cell on CPU machine as well since the below code is device agnostic.\n" 1423 | ] 1424 | }, 1425 | { 1426 | "cell_type": "code", 1427 | "execution_count": null, 1428 | "metadata": { 1429 | "collapsed": false 1430 | }, 1431 | "outputs": [], 1432 | "source": [ 1433 | "print(torch.cuda.is_available())" 1434 | ] 1435 | }, 1436 | { 1437 | "cell_type": "markdown", 1438 | "metadata": {}, 1439 | "source": [ 1440 | "And then create a device object for it:\n", 1441 | "\n" 1442 | ] 1443 | }, 1444 | { 1445 | "cell_type": "code", 1446 | "execution_count": null, 1447 | "metadata": { 1448 | "collapsed": false 1449 | }, 1450 | "outputs": [], 1451 | "source": [ 1452 | "dev = torch.device(\n", 1453 | " \"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")" 1454 | ] 1455 | }, 1456 | { 1457 | "cell_type": "markdown", 1458 | "metadata": {}, 1459 | "source": [ 1460 | "Let's update ``preprocess`` to move batches to the GPU:\n", 1461 | "\n" 1462 | ] 1463 | }, 1464 | { 1465 | "cell_type": "code", 1466 | "execution_count": null, 1467 | "metadata": { 1468 | "collapsed": false 1469 | }, 1470 | "outputs": [], 1471 | "source": [ 1472 | "def preprocess(x, y):\n", 1473 | " return x.view(-1, 1, 28, 28).to(dev), y.to(dev)\n", 1474 | "\n", 1475 | "\n", 1476 | "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", 1477 | "train_dl = WrappedDataLoader(train_dl, preprocess)\n", 1478 | "valid_dl = WrappedDataLoader(valid_dl, preprocess)" 1479 | ] 1480 | }, 1481 | { 1482 | "cell_type": "markdown", 1483 | "metadata": {}, 1484 | "source": [ 1485 | "Finally, we can move our model to the GPU.\n", 1486 | "\n" 1487 | ] 1488 | }, 1489 | { 1490 | "cell_type": "code", 1491 | "execution_count": null, 1492 | "metadata": { 1493 | "collapsed": false 1494 | }, 1495 | "outputs": [], 1496 | "source": [ 1497 | "model.to(dev)\n", 1498 | "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)" 1499 | ] 1500 | }, 1501 | { 1502 | "cell_type": "markdown", 1503 | "metadata": {}, 1504 | "source": [ 1505 | "You 
should find it runs faster now:\n", 1506 | "\n" 1507 | ] 1508 | }, 1509 | { 1510 | "cell_type": "code", 1511 | "execution_count": null, 1512 | "metadata": { 1513 | "collapsed": false 1514 | }, 1515 | "outputs": [], 1516 | "source": [ 1517 | "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" 1518 | ] 1519 | }, 1520 | { 1521 | "cell_type": "markdown", 1522 | "metadata": {}, 1523 | "source": [ 1524 | "Closing thoughts\n", 1525 | "-----------------\n", 1526 | "\n", 1527 | "We now have a general data pipeline and training loop which you can use for\n", 1528 | "training many types of models using Pytorch. To see how simple training a model\n", 1529 | "can now be, take a look at the `mnist_sample` sample notebook.\n", 1530 | "\n", 1531 | "Of course, there are many things you'll want to add, such as data augmentation,\n", 1532 | "hyperparameter tuning, monitoring training, transfer learning, and so forth.\n", 1533 | "These features are available in the fastai library, which has been developed\n", 1534 | "using the same design approach shown in this tutorial, providing a natural\n", 1535 | "next step for practitioners looking to take their models further.\n", 1536 | "\n", 1537 | "We promised at the start of this tutorial we'd explain through example each of\n", 1538 | "``torch.nn``, ``torch.optim``, ``Dataset``, and ``DataLoader``. So let's summarize\n", 1539 | "what we've seen:\n", 1540 | "\n", 1541 | " - **torch.nn**\n", 1542 | "\n", 1543 | " + ``Module``: creates a callable which behaves like a function, but can also\n", 1544 | " contain state(such as neural net layer weights). It knows what ``Parameter`` (s) it\n", 1545 | " contains and can zero all their gradients, loop through them for weight updates, etc.\n", 1546 | " + ``Parameter``: a wrapper for a tensor that tells a ``Module`` that it has weights\n", 1547 | " that need updating during backprop. Only tensors with the `requires_grad` attribute set are updated\n", 1548 | " + ``functional``: a module(usually imported into the ``F`` namespace by convention)\n", 1549 | " which contains activation functions, loss functions, etc, as well as non-stateful\n", 1550 | " versions of layers such as convolutional and linear layers.\n", 1551 | " - ``torch.optim``: Contains optimizers such as ``SGD``, which update the weights\n", 1552 | " of ``Parameter`` during the backward step\n", 1553 | " - ``Dataset``: An abstract interface of objects with a ``__len__`` and a ``__getitem__``,\n", 1554 | " including classes provided with Pytorch such as ``TensorDataset``\n", 1555 | " - ``DataLoader``: Takes any ``Dataset`` and creates an iterator which returns batches of data.\n", 1556 | "\n" 1557 | ] 1558 | } 1559 | ], 1560 | "metadata": { 1561 | "kernelspec": { 1562 | "display_name": "Python 3", 1563 | "language": "python", 1564 | "name": "python3" 1565 | }, 1566 | "language_info": { 1567 | "codemirror_mode": { 1568 | "name": "ipython", 1569 | "version": 3 1570 | }, 1571 | "file_extension": ".py", 1572 | "mimetype": "text/x-python", 1573 | "name": "python", 1574 | "nbconvert_exporter": "python", 1575 | "pygments_lexer": "ipython3", 1576 | "version": "3.6.5" 1577 | } 1578 | }, 1579 | "nbformat": 4, 1580 | "nbformat_minor": 2 1581 | } 1582 | --------------------------------------------------------------------------------