├── floyd.yml ├── README.md └── nn_tutorial.ipynb /floyd.yml: -------------------------------------------------------------------------------- 1 | env: pytorch-1.0 2 | input: 3 | - destination: mnist 4 | source: redeipirati/datasets/pytorch-mnist/2 5 | machine: cpu -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # PyTorch NN Tutorial on FloydHub Workspace 2 | 3 | By clicking on the below button, you will create a new [FloydHub Workspace](https://www.floydhub.com/product/build) with the notebook tutorial: [WHAT IS TORCH.NN REALLY?](https://pytorch.org/tutorials/beginner/nn_tutorial.html) created by Jeremy Howard and the [fast.ai](https://www.fast.ai/) Team. 4 | 5 | [![Run on FH](https://static.floydhub.com/button/button.svg)](https://floydhub.com/run) -------------------------------------------------------------------------------- /nn_tutorial.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "%matplotlib inline" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "\n", 19 | "What is `torch.nn` *really*?\n", 20 | "============================\n", 21 | "by Jeremy Howard, `fast.ai `_. Thanks to Rachel Thomas and Francisco Ingham.\n", 22 | "\n" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "PyTorch provides the elegantly designed modules and classes `torch.nn `_ ,\n", 30 | "`torch.optim `_ ,\n", 31 | "`Dataset `_ ,\n", 32 | "and `DataLoader `_\n", 33 | "to help you create and train neural networks.\n", 34 | "In order to fully utilize their power and customize\n", 35 | "them for your problem, you need to really understand exactly what they're\n", 36 | "doing. To develop this understanding, we will first train basic neural net\n", 37 | "on the MNIST data set without using any features from these models; we will\n", 38 | "initially only use the most basic PyTorch tensor functionality. Then, we will\n", 39 | "incrementally add one feature from ``torch.nn``, ``torch.optim``, ``Dataset``, or\n", 40 | "``DataLoader`` at a time, showing exactly what each piece does, and how it\n", 41 | "works to make the code either more concise, or more flexible.\n", 42 | "\n", 43 | "**This tutorial assumes you already familiar\n", 44 | "with the basics of tensor operations.** (If you're familiar with Numpy array\n", 45 | "operations, you'll find the PyTorch tensor operations used here nearly identical).\n", 46 | "\n", 47 | "MNIST data setup\n", 48 | "----------------\n", 49 | "\n", 50 | "We will use the classic `MNIST `_ dataset,\n", 51 | "which consists of black-and-white images of hand-drawn digits (between 0 and 9).\n", 52 | "\n", 53 | "We will use `pathlib `_\n", 54 | "for dealing with paths (part of the Python 3 standard library). 
We will only\n", 55 | "import modules when we use them, so you can see exactly what's being\n", 56 | "used at each point.\n", 57 | "\n", 58 | "**Note: the dataset has been already attached to the Workspace.**" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": null, 64 | "metadata": { 65 | "collapsed": false 66 | }, 67 | "outputs": [], 68 | "source": [ 69 | "from pathlib import Path\n", 70 | "import requests\n", 71 | "\n", 72 | "DATA_PATH = Path(\"/floyd/input\")\n", 73 | "PATH = DATA_PATH / \"mnist\"\n", 74 | "FILENAME = \"mnist.pkl.gz\"" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": {}, 80 | "source": [ 81 | "This dataset is in numpy array format, and has been stored using pickle,\n", 82 | "a python-specific format for serializing data.\n", 83 | "\n" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": { 90 | "collapsed": false 91 | }, 92 | "outputs": [], 93 | "source": [ 94 | "import pickle\n", 95 | "import gzip\n", 96 | "\n", 97 | "with gzip.open((PATH / FILENAME).as_posix(), \"rb\") as f:\n", 98 | " ((x_train, y_train), (x_valid, y_valid), _) = pickle.load(f, encoding=\"latin-1\")" 99 | ] 100 | }, 101 | { 102 | "cell_type": "markdown", 103 | "metadata": {}, 104 | "source": [ 105 | "Each image is 28 x 28, and is being stored as a flattened row of length\n", 106 | "784 (=28x28). Let's take a look at one; we need to reshape it to 2d\n", 107 | "first.\n", 108 | "\n" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": null, 114 | "metadata": { 115 | "collapsed": false 116 | }, 117 | "outputs": [], 118 | "source": [ 119 | "from matplotlib import pyplot\n", 120 | "import numpy as np\n", 121 | "\n", 122 | "pyplot.imshow(x_train[0].reshape((28, 28)), cmap=\"gray\")\n", 123 | "print(x_train.shape)" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": {}, 129 | "source": [ 130 | "PyTorch uses ``torch.tensor``, rather than numpy arrays, so we need to\n", 131 | "convert our data.\n", 132 | "\n" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": null, 138 | "metadata": { 139 | "collapsed": false 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "import torch\n", 144 | "\n", 145 | "x_train, y_train, x_valid, y_valid = map(\n", 146 | " torch.tensor, (x_train, y_train, x_valid, y_valid)\n", 147 | ")\n", 148 | "n, c = x_train.shape\n", 149 | "x_train, x_train.shape, y_train.min(), y_train.max()\n", 150 | "print(x_train, y_train)\n", 151 | "print(x_train.shape)\n", 152 | "print(y_train.min(), y_train.max())" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": {}, 158 | "source": [ 159 | "Neural net from scratch (no torch.nn)\n", 160 | "---------------------------------------------\n", 161 | "\n", 162 | "Let's first create a model using nothing but PyTorch tensor operations. We're assuming\n", 163 | "you're already familiar with the basics of neural networks. (If you're not, you can\n", 164 | "learn them at `course.fast.ai `_).\n", 165 | "\n", 166 | "PyTorch provides methods to create random or zero-filled tensors, which we will\n", 167 | "use to create our weights and bias for a simple linear model. These are just regular\n", 168 | "tensors, with one very special addition: we tell PyTorch that they require a\n", 169 | "gradient. 
This causes PyTorch to record all of the operations done on the tensor,\n", 170 | "so that it can calculate the gradient during back-propagation *automatically*!\n", 171 | "\n", 172 | "For the weights, we set ``requires_grad`` **after** the initialization, since we\n", 173 | "don't want that step included in the gradient. (Note that a trailing ``_`` in\n", 174 | "PyTorch signifies that the operation is performed in-place.)\n", 175 | "\n", 176 | "
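To make that "recording" concrete, here is a tiny standalone sketch (not part of the original notebook; the tensor ``a`` and scalar ``b`` below are invented purely for illustration) showing autograd building a gradient for us:

```python
import torch

# Any operation on a tensor with requires_grad=True is recorded by autograd.
a = torch.ones(3, requires_grad=True)
b = (a * 2).sum()      # b "remembers" how it was computed from a

b.backward()           # back-propagation fills in a.grad automatically
print(a.grad)          # tensor([2., 2., 2.]) -- d(b)/d(a) for each element
```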
**Note:** We are initializing the weights here with `Xavier initialisation `_ (by multiplying with 1/sqrt(n)).
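As a rough optional sanity check (a standalone sketch, not from the tutorial; the stand-in batch ``x`` below is invented for illustration), you can see what the ``1/sqrt(784)`` scaling does to the spread of the pre-activations:

```python
import math
import torch

x = torch.rand(64, 784)              # stand-in batch with pixel-like values in [0, 1)
w_raw = torch.randn(784, 10)         # unscaled random weights
w_scaled = w_raw / math.sqrt(784)    # the scaling used in the tutorial

# Same weights, but the scaled version gives outputs with a spread
# sqrt(784) = 28 times smaller, keeping the linear layer's output modest.
print((x @ w_raw).std(), (x @ w_scaled).std())
```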
\n", 179 | "\n" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": null, 185 | "metadata": { 186 | "collapsed": false 187 | }, 188 | "outputs": [], 189 | "source": [ 190 | "import math\n", 191 | "\n", 192 | "weights = torch.randn(784, 10) / math.sqrt(784)\n", 193 | "weights.requires_grad_()\n", 194 | "bias = torch.zeros(10, requires_grad=True)" 195 | ] 196 | }, 197 | { 198 | "cell_type": "markdown", 199 | "metadata": {}, 200 | "source": [ 201 | "Thanks to PyTorch's ability to calculate gradients automatically, we can\n", 202 | "use any standard Python function (or callable object) as a model! So\n", 203 | "let's just write a plain matrix multiplication and broadcasted addition\n", 204 | "to create a simple linear model. We also need an activation function, so\n", 205 | "we'll write `log_softmax` and use it. Remember: although PyTorch\n", 206 | "provides lots of pre-written loss functions, activation functions, and\n", 207 | "so forth, you can easily write your own using plain python. PyTorch will\n", 208 | "even create fast GPU or vectorized CPU code for your function\n", 209 | "automatically.\n", 210 | "\n" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": null, 216 | "metadata": { 217 | "collapsed": false 218 | }, 219 | "outputs": [], 220 | "source": [ 221 | "def log_softmax(x):\n", 222 | " return x - x.exp().sum(-1).log().unsqueeze(-1)\n", 223 | "\n", 224 | "def model(xb):\n", 225 | " return log_softmax(xb @ weights + bias)" 226 | ] 227 | }, 228 | { 229 | "cell_type": "markdown", 230 | "metadata": {}, 231 | "source": [ 232 | "In the above, the ``@`` stands for the dot product operation. We will call\n", 233 | "our function on one batch of data (in this case, 64 images). This is\n", 234 | "one *forward pass*. Note that our predictions won't be any better than\n", 235 | "random at this stage, since we start with random weights.\n", 236 | "\n" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": null, 242 | "metadata": { 243 | "collapsed": false 244 | }, 245 | "outputs": [], 246 | "source": [ 247 | "bs = 64 # batch size\n", 248 | "\n", 249 | "xb = x_train[0:bs] # a mini-batch from x\n", 250 | "preds = model(xb) # predictions\n", 251 | "preds[0], preds.shape\n", 252 | "print(preds[0], preds.shape)" 253 | ] 254 | }, 255 | { 256 | "cell_type": "markdown", 257 | "metadata": {}, 258 | "source": [ 259 | "As you see, the ``preds`` tensor contains not only the tensor values, but also a\n", 260 | "gradient function. 
We'll use this later to do backprop.\n", 261 | "\n", 262 | "Let's implement negative log-likelihood to use as the loss function\n", 263 | "(again, we can just use standard Python):\n", 264 | "\n" 265 | ] 266 | }, 267 | { 268 | "cell_type": "code", 269 | "execution_count": null, 270 | "metadata": { 271 | "collapsed": false 272 | }, 273 | "outputs": [], 274 | "source": [ 275 | "def nll(input, target):\n", 276 | " return -input[range(target.shape[0]), target].mean()\n", 277 | "\n", 278 | "loss_func = nll" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": {}, 284 | "source": [ 285 | "Let's check our loss with our random model, so we can see if we improve\n", 286 | "after a backprop pass later.\n", 287 | "\n" 288 | ] 289 | }, 290 | { 291 | "cell_type": "code", 292 | "execution_count": null, 293 | "metadata": { 294 | "collapsed": false 295 | }, 296 | "outputs": [], 297 | "source": [ 298 | "yb = y_train[0:bs]\n", 299 | "print(loss_func(preds, yb))" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": {}, 305 | "source": [ 306 | "Let's also implement a function to calculate the accuracy of our model.\n", 307 | "For each prediction, if the index with the largest value matches the\n", 308 | "target value, then the prediction was correct.\n", 309 | "\n" 310 | ] 311 | }, 312 | { 313 | "cell_type": "code", 314 | "execution_count": null, 315 | "metadata": { 316 | "collapsed": false 317 | }, 318 | "outputs": [], 319 | "source": [ 320 | "def accuracy(out, yb):\n", 321 | " preds = torch.argmax(out, dim=1)\n", 322 | " return (preds == yb).float().mean()" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "Let's check the accuracy of our random model, so we can see if our\n", 330 | "accuracy improves as our loss improves.\n", 331 | "\n" 332 | ] 333 | }, 334 | { 335 | "cell_type": "code", 336 | "execution_count": null, 337 | "metadata": { 338 | "collapsed": false 339 | }, 340 | "outputs": [], 341 | "source": [ 342 | "print(accuracy(preds, yb))" 343 | ] 344 | }, 345 | { 346 | "cell_type": "markdown", 347 | "metadata": {}, 348 | "source": [ 349 | "We can now run a training loop. For each iteration, we will:\n", 350 | "\n", 351 | "- select a mini-batch of data (of size ``bs``)\n", 352 | "- use the model to make predictions\n", 353 | "- calculate the loss\n", 354 | "- ``loss.backward()`` updates the gradients of the model, in this case, ``weights``\n", 355 | " and ``bias``.\n", 356 | "\n", 357 | "We now use these gradients to update the weights and bias. We do this\n", 358 | "within the ``torch.no_grad()`` context manager, because we do not want these\n", 359 | "actions to be recorded for our next calculation of the gradient. You can read\n", 360 | "more about how PyTorch's Autograd records operations\n", 361 | "`here `_.\n", 362 | "\n", 363 | "We then set the\n", 364 | "gradients to zero, so that we are ready for the next loop.\n", 365 | "Otherwise, our gradients would record a running tally of all the operations\n", 366 | "that had happened (i.e. ``loss.backward()`` *adds* the gradients to whatever is\n", 367 | "already stored, rather than replacing them).\n", 368 | "\n", 369 | ".. 
tip:: You can use the standard python debugger to step through PyTorch\n", 370 | " code, allowing you to check the various variable values at each step.\n", 371 | " Uncomment ``set_trace()`` below to try it out.\n", 372 | "\n", 373 | "\n" 374 | ] 375 | }, 376 | { 377 | "cell_type": "code", 378 | "execution_count": null, 379 | "metadata": { 380 | "collapsed": false 381 | }, 382 | "outputs": [], 383 | "source": [ 384 | "from IPython.core.debugger import set_trace\n", 385 | "\n", 386 | "lr = 0.5 # learning rate\n", 387 | "epochs = 2 # how many epochs to train for\n", 388 | "\n", 389 | "for epoch in range(epochs):\n", 390 | " for i in range((n - 1) // bs + 1):\n", 391 | " # set_trace()\n", 392 | " start_i = i * bs\n", 393 | " end_i = start_i + bs\n", 394 | " xb = x_train[start_i:end_i]\n", 395 | " yb = y_train[start_i:end_i]\n", 396 | " pred = model(xb)\n", 397 | " loss = loss_func(pred, yb)\n", 398 | "\n", 399 | " loss.backward()\n", 400 | " with torch.no_grad():\n", 401 | " weights -= weights.grad * lr\n", 402 | " bias -= bias.grad * lr\n", 403 | " weights.grad.zero_()\n", 404 | " bias.grad.zero_()" 405 | ] 406 | }, 407 | { 408 | "cell_type": "markdown", 409 | "metadata": {}, 410 | "source": [ 411 | "That's it: we've created and trained a minimal neural network (in this case, a\n", 412 | "logistic regression, since we have no hidden layers) entirely from scratch!\n", 413 | "\n", 414 | "Let's check the loss and accuracy and compare those to what we got\n", 415 | "earlier. We expect that the loss will have decreased and accuracy to\n", 416 | "have increased, and they have.\n", 417 | "\n" 418 | ] 419 | }, 420 | { 421 | "cell_type": "code", 422 | "execution_count": null, 423 | "metadata": { 424 | "collapsed": false 425 | }, 426 | "outputs": [], 427 | "source": [ 428 | "print(loss_func(model(xb), yb), accuracy(model(xb), yb))" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": {}, 434 | "source": [ 435 | "Using torch.nn.functional\n", 436 | "------------------------------\n", 437 | "\n", 438 | "We will now refactor our code, so that it does the same thing as before, only\n", 439 | "we'll start taking advantage of PyTorch's ``nn`` classes to make it more concise\n", 440 | "and flexible. At each step from here, we should be making our code one or more\n", 441 | "of: shorter, more understandable, and/or more flexible.\n", 442 | "\n", 443 | "The first and easiest step is to make our code shorter by replacing our\n", 444 | "hand-written activation and loss functions with those from ``torch.nn.functional``\n", 445 | "(which is generally imported into the namespace ``F`` by convention). This module\n", 446 | "contains all the functions in the ``torch.nn`` library (whereas other parts of the\n", 447 | "library contain classes). As well as a wide range of loss and activation\n", 448 | "functions, you'll also find here some convenient functions for creating neural\n", 449 | "nets, such as pooling functions. (There are also functions for doing convolutions,\n", 450 | "linear layers, etc, but as we'll see, these are usually better handled using\n", 451 | "other parts of the library.)\n", 452 | "\n", 453 | "If you're using negative log likelihood loss and log softmax activation,\n", 454 | "then Pytorch provides a single function ``F.cross_entropy`` that combines\n", 455 | "the two. 
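As an optional check (a sketch assuming the ``log_softmax``, ``nll``, ``weights``, ``bias``, ``xb`` and ``yb`` defined earlier are still in scope), the combined function should agree with our hand-written pair up to floating-point noise:

```python
import torch.nn.functional as F

logits = xb @ weights + bias            # raw linear output, no activation applied
print(nll(log_softmax(logits), yb))     # hand-written log_softmax + negative log-likelihood
print(F.cross_entropy(logits, yb))      # PyTorch's combined version -- same value
```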
So we can even remove the activation function from our model.\n", 456 | "\n" 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": null, 462 | "metadata": { 463 | "collapsed": false 464 | }, 465 | "outputs": [], 466 | "source": [ 467 | "import torch.nn.functional as F\n", 468 | "\n", 469 | "loss_func = F.cross_entropy\n", 470 | "\n", 471 | "def model(xb):\n", 472 | " return xb @ weights + bias" 473 | ] 474 | }, 475 | { 476 | "cell_type": "markdown", 477 | "metadata": {}, 478 | "source": [ 479 | "Note that we no longer call ``log_softmax`` in the ``model`` function. Let's\n", 480 | "confirm that our loss and accuracy are the same as before:\n", 481 | "\n" 482 | ] 483 | }, 484 | { 485 | "cell_type": "code", 486 | "execution_count": null, 487 | "metadata": { 488 | "collapsed": false 489 | }, 490 | "outputs": [], 491 | "source": [ 492 | "print(loss_func(model(xb), yb), accuracy(model(xb), yb))" 493 | ] 494 | }, 495 | { 496 | "cell_type": "markdown", 497 | "metadata": {}, 498 | "source": [ 499 | "Refactor using nn.Module\n", 500 | "-----------------------------\n", 501 | "Next up, we'll use ``nn.Module`` and ``nn.Parameter``, for a clearer and more\n", 502 | "concise training loop. We subclass ``nn.Module`` (which itself is a class and\n", 503 | "able to keep track of state). In this case, we want to create a class that\n", 504 | "holds our weights, bias, and method for the forward step. ``nn.Module`` has a\n", 505 | "number of attributes and methods (such as ``.parameters()`` and ``.zero_grad()``)\n", 506 | "which we will be using.\n", 507 | "\n", 508 | "
**Note:** ``nn.Module`` (uppercase M) is a PyTorch-specific concept, and is a\n", 509 | "   class we'll be using a lot. ``nn.Module`` is not to be confused with the Python\n", 510 | "   concept of a (lowercase ``m``) `module `_,\n", 511 | "   which is a file of Python code that can be imported.
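To see what ``nn.Module`` buys us before applying it to MNIST, here is a minimal standalone sketch (the ``Tiny`` class below is invented for illustration): any ``nn.Parameter`` assigned as an attribute is registered automatically, so ``.parameters()`` and ``.zero_grad()`` can find it.

```python
import torch
from torch import nn

class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        # Assigning nn.Parameter attributes registers them with the module.
        self.w = nn.Parameter(torch.randn(3, 2))
        self.b = nn.Parameter(torch.zeros(2))

    def forward(self, x):
        return x @ self.w + self.b

tiny = Tiny()
print([p.shape for p in tiny.parameters()])   # [torch.Size([3, 2]), torch.Size([2])]
```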
\n", 512 | "\n" 513 | ] 514 | }, 515 | { 516 | "cell_type": "code", 517 | "execution_count": null, 518 | "metadata": { 519 | "collapsed": false 520 | }, 521 | "outputs": [], 522 | "source": [ 523 | "from torch import nn\n", 524 | "\n", 525 | "class Mnist_Logistic(nn.Module):\n", 526 | " def __init__(self):\n", 527 | " super().__init__()\n", 528 | " self.weights = nn.Parameter(torch.randn(784, 10) / math.sqrt(784))\n", 529 | " self.bias = nn.Parameter(torch.zeros(10))\n", 530 | "\n", 531 | " def forward(self, xb):\n", 532 | " return xb @ self.weights + self.bias" 533 | ] 534 | }, 535 | { 536 | "cell_type": "markdown", 537 | "metadata": {}, 538 | "source": [ 539 | "Since we're now using an object instead of just using a function, we\n", 540 | "first have to instantiate our model:\n", 541 | "\n" 542 | ] 543 | }, 544 | { 545 | "cell_type": "code", 546 | "execution_count": null, 547 | "metadata": { 548 | "collapsed": false 549 | }, 550 | "outputs": [], 551 | "source": [ 552 | "model = Mnist_Logistic()" 553 | ] 554 | }, 555 | { 556 | "cell_type": "markdown", 557 | "metadata": {}, 558 | "source": [ 559 | "Now we can calculate the loss in the same way as before. Note that\n", 560 | "``nn.Module`` objects are used as if they are functions (i.e they are\n", 561 | "*callable*), but behind the scenes Pytorch will call our ``forward``\n", 562 | "method automatically.\n", 563 | "\n" 564 | ] 565 | }, 566 | { 567 | "cell_type": "code", 568 | "execution_count": null, 569 | "metadata": { 570 | "collapsed": false 571 | }, 572 | "outputs": [], 573 | "source": [ 574 | "print(loss_func(model(xb), yb))" 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": {}, 580 | "source": [ 581 | "Previously for our training loop we had to update the values for each parameter\n", 582 | "by name, and manually zero out the grads for each parameter separately, like this:\n", 583 | "::\n", 584 | " with torch.no_grad():\n", 585 | " weights -= weights.grad * lr\n", 586 | " bias -= bias.grad * lr\n", 587 | " weights.grad.zero_()\n", 588 | " bias.grad.zero_()\n", 589 | "\n", 590 | "\n", 591 | "Now we can take advantage of model.parameters() and model.zero_grad() (which\n", 592 | "are both defined by PyTorch for ``nn.Module``) to make those steps more concise\n", 593 | "and less prone to the error of forgetting some of our parameters, particularly\n", 594 | "if we had a more complicated model:\n", 595 | "::\n", 596 | " with torch.no_grad():\n", 597 | " for p in model.parameters(): p -= p.grad * lr\n", 598 | " model.zero_grad()\n", 599 | "\n", 600 | "\n", 601 | "We'll wrap our little training loop in a ``fit`` function so we can run it\n", 602 | "again later.\n", 603 | "\n" 604 | ] 605 | }, 606 | { 607 | "cell_type": "code", 608 | "execution_count": null, 609 | "metadata": { 610 | "collapsed": false 611 | }, 612 | "outputs": [], 613 | "source": [ 614 | "def fit():\n", 615 | " for epoch in range(epochs):\n", 616 | " for i in range((n - 1) // bs + 1):\n", 617 | " start_i = i * bs\n", 618 | " end_i = start_i + bs\n", 619 | " xb = x_train[start_i:end_i]\n", 620 | " yb = y_train[start_i:end_i]\n", 621 | " pred = model(xb)\n", 622 | " loss = loss_func(pred, yb)\n", 623 | "\n", 624 | " loss.backward()\n", 625 | " with torch.no_grad():\n", 626 | " for p in model.parameters():\n", 627 | " p -= p.grad * lr\n", 628 | " model.zero_grad()\n", 629 | "\n", 630 | "fit()" 631 | ] 632 | }, 633 | { 634 | "cell_type": "markdown", 635 | "metadata": {}, 636 | "source": [ 637 | "Let's double-check that our loss has gone down:\n", 638 | 
"\n" 639 | ] 640 | }, 641 | { 642 | "cell_type": "code", 643 | "execution_count": null, 644 | "metadata": { 645 | "collapsed": false 646 | }, 647 | "outputs": [], 648 | "source": [ 649 | "print(loss_func(model(xb), yb))" 650 | ] 651 | }, 652 | { 653 | "cell_type": "markdown", 654 | "metadata": {}, 655 | "source": [ 656 | "Refactor using nn.Linear\n", 657 | "-------------------------\n", 658 | "\n", 659 | "We continue to refactor our code. Instead of manually defining and\n", 660 | "initializing ``self.weights`` and ``self.bias``, and calculating ``xb @\n", 661 | "self.weights + self.bias``, we will instead use the Pytorch class\n", 662 | "`nn.Linear `_ for a\n", 663 | "linear layer, which does all that for us. Pytorch has many types of\n", 664 | "predefined layers that can greatly simplify our code, and often makes it\n", 665 | "faster too.\n", 666 | "\n" 667 | ] 668 | }, 669 | { 670 | "cell_type": "code", 671 | "execution_count": null, 672 | "metadata": { 673 | "collapsed": false 674 | }, 675 | "outputs": [], 676 | "source": [ 677 | "class Mnist_Logistic(nn.Module):\n", 678 | " def __init__(self):\n", 679 | " super().__init__()\n", 680 | " self.lin = nn.Linear(784, 10)\n", 681 | "\n", 682 | " def forward(self, xb):\n", 683 | " return self.lin(xb)" 684 | ] 685 | }, 686 | { 687 | "cell_type": "markdown", 688 | "metadata": {}, 689 | "source": [ 690 | "We instantiate our model and calculate the loss in the same way as before:\n", 691 | "\n" 692 | ] 693 | }, 694 | { 695 | "cell_type": "code", 696 | "execution_count": null, 697 | "metadata": { 698 | "collapsed": false 699 | }, 700 | "outputs": [], 701 | "source": [ 702 | "model = Mnist_Logistic()\n", 703 | "print(loss_func(model(xb), yb))" 704 | ] 705 | }, 706 | { 707 | "cell_type": "markdown", 708 | "metadata": {}, 709 | "source": [ 710 | "We are still able to use our same ``fit`` method as before.\n", 711 | "\n" 712 | ] 713 | }, 714 | { 715 | "cell_type": "code", 716 | "execution_count": null, 717 | "metadata": { 718 | "collapsed": false 719 | }, 720 | "outputs": [], 721 | "source": [ 722 | "fit()\n", 723 | "\n", 724 | "print(loss_func(model(xb), yb))" 725 | ] 726 | }, 727 | { 728 | "cell_type": "markdown", 729 | "metadata": {}, 730 | "source": [ 731 | "Refactor using optim\n", 732 | "------------------------------\n", 733 | "\n", 734 | "Pytorch also has a package with various optimization algorithms, ``torch.optim``.\n", 735 | "We can use the ``step`` method from our optimizer to take a forward step, instead\n", 736 | "of manually updating each parameter.\n", 737 | "\n", 738 | "This will let us replace our previous manually coded optimization step:\n", 739 | "::\n", 740 | " with torch.no_grad():\n", 741 | " for p in model.parameters(): p -= p.grad * lr\n", 742 | " model.zero_grad()\n", 743 | "\n", 744 | "and instead use just:\n", 745 | "::\n", 746 | " opt.step()\n", 747 | " opt.zero_grad()\n", 748 | "\n", 749 | "(``optim.zero_grad()`` resets the gradient to 0 and we need to call it before\n", 750 | "computing the gradient for the next minibatch.)\n", 751 | "\n" 752 | ] 753 | }, 754 | { 755 | "cell_type": "code", 756 | "execution_count": null, 757 | "metadata": { 758 | "collapsed": false 759 | }, 760 | "outputs": [], 761 | "source": [ 762 | "from torch import optim" 763 | ] 764 | }, 765 | { 766 | "cell_type": "markdown", 767 | "metadata": {}, 768 | "source": [ 769 | "We'll define a little function to create our model and optimizer so we\n", 770 | "can reuse it in the future.\n", 771 | "\n" 772 | ] 773 | }, 774 | { 775 | "cell_type": "code", 
776 | "execution_count": null, 777 | "metadata": { 778 | "collapsed": false 779 | }, 780 | "outputs": [], 781 | "source": [ 782 | "def get_model():\n", 783 | " model = Mnist_Logistic()\n", 784 | " return model, optim.SGD(model.parameters(), lr=lr)\n", 785 | "\n", 786 | "model, opt = get_model()\n", 787 | "print(loss_func(model(xb), yb))\n", 788 | "\n", 789 | "for epoch in range(epochs):\n", 790 | " for i in range((n - 1) // bs + 1):\n", 791 | " start_i = i * bs\n", 792 | " end_i = start_i + bs\n", 793 | " xb = x_train[start_i:end_i]\n", 794 | " yb = y_train[start_i:end_i]\n", 795 | " pred = model(xb)\n", 796 | " loss = loss_func(pred, yb)\n", 797 | "\n", 798 | " loss.backward()\n", 799 | " opt.step()\n", 800 | " opt.zero_grad()\n", 801 | "\n", 802 | "print(loss_func(model(xb), yb))" 803 | ] 804 | }, 805 | { 806 | "cell_type": "markdown", 807 | "metadata": {}, 808 | "source": [ 809 | "Refactor using Dataset\n", 810 | "------------------------------\n", 811 | "\n", 812 | "PyTorch has an abstract Dataset class. A Dataset can be anything that has\n", 813 | "a ``__len__`` function (called by Python's standard ``len`` function) and\n", 814 | "a ``__getitem__`` function as a way of indexing into it.\n", 815 | "`This tutorial `_\n", 816 | "walks through a nice example of creating a custom ``FacialLandmarkDataset`` class\n", 817 | "as a subclass of ``Dataset``.\n", 818 | "\n", 819 | "PyTorch's `TensorDataset `_\n", 820 | "is a Dataset wrapping tensors. By defining a length and way of indexing,\n", 821 | "this also gives us a way to iterate, index, and slice along the first\n", 822 | "dimension of a tensor. This will make it easier to access both the\n", 823 | "independent and dependent variables in the same line as we train.\n", 824 | "\n" 825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "execution_count": null, 830 | "metadata": { 831 | "collapsed": false 832 | }, 833 | "outputs": [], 834 | "source": [ 835 | "from torch.utils.data import TensorDataset" 836 | ] 837 | }, 838 | { 839 | "cell_type": "markdown", 840 | "metadata": {}, 841 | "source": [ 842 | "Both ``x_train`` and ``y_train`` can be combined in a single ``TensorDataset``,\n", 843 | "which will be easier to iterate over and slice.\n", 844 | "\n" 845 | ] 846 | }, 847 | { 848 | "cell_type": "code", 849 | "execution_count": null, 850 | "metadata": { 851 | "collapsed": false 852 | }, 853 | "outputs": [], 854 | "source": [ 855 | "train_ds = TensorDataset(x_train, y_train)" 856 | ] 857 | }, 858 | { 859 | "cell_type": "markdown", 860 | "metadata": {}, 861 | "source": [ 862 | "Previously, we had to iterate through minibatches of x and y values separately:\n", 863 | "::\n", 864 | " xb = x_train[start_i:end_i]\n", 865 | " yb = y_train[start_i:end_i]\n", 866 | "\n", 867 | "\n", 868 | "Now, we can do these two steps together:\n", 869 | "::\n", 870 | " xb,yb = train_ds[i*bs : i*bs+bs]\n", 871 | "\n", 872 | "\n" 873 | ] 874 | }, 875 | { 876 | "cell_type": "code", 877 | "execution_count": null, 878 | "metadata": { 879 | "collapsed": false 880 | }, 881 | "outputs": [], 882 | "source": [ 883 | "model, opt = get_model()\n", 884 | "\n", 885 | "for epoch in range(epochs):\n", 886 | " for i in range((n - 1) // bs + 1):\n", 887 | " xb, yb = train_ds[i * bs: i * bs + bs]\n", 888 | " pred = model(xb)\n", 889 | " loss = loss_func(pred, yb)\n", 890 | "\n", 891 | " loss.backward()\n", 892 | " opt.step()\n", 893 | " opt.zero_grad()\n", 894 | "\n", 895 | "print(loss_func(model(xb), yb))" 896 | ] 897 | }, 898 | { 899 | "cell_type": "markdown", 900 | 
"metadata": {}, 901 | "source": [ 902 | "Refactor using DataLoader\n", 903 | "------------------------------\n", 904 | "\n", 905 | "Pytorch's ``DataLoader`` is responsible for managing batches. You can\n", 906 | "create a ``DataLoader`` from any ``Dataset``. ``DataLoader`` makes it easier\n", 907 | "to iterate over batches. Rather than having to use ``train_ds[i*bs : i*bs+bs]``,\n", 908 | "the DataLoader gives us each minibatch automatically.\n", 909 | "\n" 910 | ] 911 | }, 912 | { 913 | "cell_type": "code", 914 | "execution_count": null, 915 | "metadata": { 916 | "collapsed": false 917 | }, 918 | "outputs": [], 919 | "source": [ 920 | "from torch.utils.data import DataLoader\n", 921 | "\n", 922 | "train_ds = TensorDataset(x_train, y_train)\n", 923 | "train_dl = DataLoader(train_ds, batch_size=bs)" 924 | ] 925 | }, 926 | { 927 | "cell_type": "markdown", 928 | "metadata": {}, 929 | "source": [ 930 | "Previously, our loop iterated over batches (xb, yb) like this:\n", 931 | "::\n", 932 | " for i in range((n-1)//bs + 1):\n", 933 | " xb,yb = train_ds[i*bs : i*bs+bs]\n", 934 | " pred = model(xb)\n", 935 | "\n", 936 | "Now, our loop is much cleaner, as (xb, yb) are loaded automatically from the data loader:\n", 937 | "::\n", 938 | " for xb,yb in train_dl:\n", 939 | " pred = model(xb)\n", 940 | "\n" 941 | ] 942 | }, 943 | { 944 | "cell_type": "code", 945 | "execution_count": null, 946 | "metadata": { 947 | "collapsed": false 948 | }, 949 | "outputs": [], 950 | "source": [ 951 | "model, opt = get_model()\n", 952 | "\n", 953 | "for epoch in range(epochs):\n", 954 | " for xb, yb in train_dl:\n", 955 | " pred = model(xb)\n", 956 | " loss = loss_func(pred, yb)\n", 957 | "\n", 958 | " loss.backward()\n", 959 | " opt.step()\n", 960 | " opt.zero_grad()\n", 961 | "\n", 962 | "print(loss_func(model(xb), yb))" 963 | ] 964 | }, 965 | { 966 | "cell_type": "markdown", 967 | "metadata": {}, 968 | "source": [ 969 | "Thanks to Pytorch's ``nn.Module``, ``nn.Parameter``, ``Dataset``, and ``DataLoader``,\n", 970 | "our training loop is now dramatically smaller and easier to understand. Let's\n", 971 | "now try to add the basic features necessary to create effecive models in practice.\n", 972 | "\n", 973 | "Add validation\n", 974 | "-----------------------\n", 975 | "\n", 976 | "In section 1, we were just trying to get a reasonable training loop set up for\n", 977 | "use on our training data. In reality, you **always** should also have\n", 978 | "a `validation set `_, in order\n", 979 | "to identify if you are overfitting.\n", 980 | "\n", 981 | "Shuffling the training data is\n", 982 | "`important `_\n", 983 | "to prevent correlation between batches and overfitting. On the other hand, the\n", 984 | "validation loss will be identical whether we shuffle the validation set or not.\n", 985 | "Since shuffling takes extra time, it makes no sense to shuffle the validation data.\n", 986 | "\n", 987 | "We'll use a batch size for the validation set that is twice as large as\n", 988 | "that for the training set. This is because the validation set does not\n", 989 | "need backpropagation and thus takes less memory (it doesn't need to\n", 990 | "store the gradients). 
We take advantage of this to use a larger batch\n", 991 | "size and compute the loss more quickly.\n", 992 | "\n" 993 | ] 994 | }, 995 | { 996 | "cell_type": "code", 997 | "execution_count": null, 998 | "metadata": { 999 | "collapsed": false 1000 | }, 1001 | "outputs": [], 1002 | "source": [ 1003 | "train_ds = TensorDataset(x_train, y_train)\n", 1004 | "train_dl = DataLoader(train_ds, batch_size=bs, shuffle=True)\n", 1005 | "\n", 1006 | "valid_ds = TensorDataset(x_valid, y_valid)\n", 1007 | "valid_dl = DataLoader(valid_ds, batch_size=bs * 2)" 1008 | ] 1009 | }, 1010 | { 1011 | "cell_type": "markdown", 1012 | "metadata": {}, 1013 | "source": [ 1014 | "We will calculate and print the validation loss at the end of each epoch.\n", 1015 | "\n", 1016 | "(Note that we always call ``model.train()`` before training, and ``model.eval()``\n", 1017 | "before inference, because these are used by layers such as ``nn.BatchNorm2d``\n", 1018 | "and ``nn.Dropout`` to ensure appropriate behaviour for these different phases.)\n", 1019 | "\n" 1020 | ] 1021 | }, 1022 | { 1023 | "cell_type": "code", 1024 | "execution_count": null, 1025 | "metadata": { 1026 | "collapsed": false 1027 | }, 1028 | "outputs": [], 1029 | "source": [ 1030 | "model, opt = get_model()\n", 1031 | "\n", 1032 | "for epoch in range(epochs):\n", 1033 | " model.train()\n", 1034 | " for xb, yb in train_dl:\n", 1035 | " pred = model(xb)\n", 1036 | " loss = loss_func(pred, yb)\n", 1037 | "\n", 1038 | " loss.backward()\n", 1039 | " opt.step()\n", 1040 | " opt.zero_grad()\n", 1041 | "\n", 1042 | " model.eval()\n", 1043 | " with torch.no_grad():\n", 1044 | " valid_loss = sum(loss_func(model(xb), yb) for xb, yb in valid_dl)\n", 1045 | "\n", 1046 | " print(epoch, valid_loss / len(valid_dl))" 1047 | ] 1048 | }, 1049 | { 1050 | "cell_type": "markdown", 1051 | "metadata": {}, 1052 | "source": [ 1053 | "Create fit() and get_data()\n", 1054 | "----------------------------------\n", 1055 | "\n", 1056 | "We'll now do a little refactoring of our own. Since we go through a similar\n", 1057 | "process twice of calculating the loss for both the training set and the\n", 1058 | "validation set, let's make that into its own function, ``loss_batch``, which\n", 1059 | "computes the loss for one batch.\n", 1060 | "\n", 1061 | "We pass an optimizer in for the training set, and use it to perform\n", 1062 | "backprop. 
For the validation set, we don't pass an optimizer, so the\n", 1063 | "method doesn't perform backprop.\n", 1064 | "\n" 1065 | ] 1066 | }, 1067 | { 1068 | "cell_type": "code", 1069 | "execution_count": null, 1070 | "metadata": { 1071 | "collapsed": false 1072 | }, 1073 | "outputs": [], 1074 | "source": [ 1075 | "def loss_batch(model, loss_func, xb, yb, opt=None):\n", 1076 | " loss = loss_func(model(xb), yb)\n", 1077 | "\n", 1078 | " if opt is not None:\n", 1079 | " loss.backward()\n", 1080 | " opt.step()\n", 1081 | " opt.zero_grad()\n", 1082 | "\n", 1083 | " return loss.item(), len(xb)" 1084 | ] 1085 | }, 1086 | { 1087 | "cell_type": "markdown", 1088 | "metadata": {}, 1089 | "source": [ 1090 | "``fit`` runs the necessary operations to train our model and compute the\n", 1091 | "training and validation losses for each epoch.\n", 1092 | "\n" 1093 | ] 1094 | }, 1095 | { 1096 | "cell_type": "code", 1097 | "execution_count": null, 1098 | "metadata": { 1099 | "collapsed": false 1100 | }, 1101 | "outputs": [], 1102 | "source": [ 1103 | "import numpy as np\n", 1104 | "\n", 1105 | "def fit(epochs, model, loss_func, opt, train_dl, valid_dl):\n", 1106 | " for epoch in range(epochs):\n", 1107 | " model.train()\n", 1108 | " for xb, yb in train_dl:\n", 1109 | " loss_batch(model, loss_func, xb, yb, opt)\n", 1110 | "\n", 1111 | " model.eval()\n", 1112 | " with torch.no_grad():\n", 1113 | " losses, nums = zip(\n", 1114 | " *[loss_batch(model, loss_func, xb, yb) for xb, yb in valid_dl]\n", 1115 | " )\n", 1116 | " val_loss = np.sum(np.multiply(losses, nums)) / np.sum(nums)\n", 1117 | "\n", 1118 | " print(epoch, val_loss)" 1119 | ] 1120 | }, 1121 | { 1122 | "cell_type": "markdown", 1123 | "metadata": {}, 1124 | "source": [ 1125 | "``get_data`` returns dataloaders for the training and validation sets.\n", 1126 | "\n" 1127 | ] 1128 | }, 1129 | { 1130 | "cell_type": "code", 1131 | "execution_count": null, 1132 | "metadata": { 1133 | "collapsed": false 1134 | }, 1135 | "outputs": [], 1136 | "source": [ 1137 | "def get_data(train_ds, valid_ds, bs):\n", 1138 | " return (\n", 1139 | " DataLoader(train_ds, batch_size=bs, shuffle=True),\n", 1140 | " DataLoader(valid_ds, batch_size=bs * 2),\n", 1141 | " )" 1142 | ] 1143 | }, 1144 | { 1145 | "cell_type": "markdown", 1146 | "metadata": {}, 1147 | "source": [ 1148 | "Now, our whole process of obtaining the data loaders and fitting the\n", 1149 | "model can be run in 3 lines of code:\n", 1150 | "\n" 1151 | ] 1152 | }, 1153 | { 1154 | "cell_type": "code", 1155 | "execution_count": null, 1156 | "metadata": { 1157 | "collapsed": false 1158 | }, 1159 | "outputs": [], 1160 | "source": [ 1161 | "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", 1162 | "model, opt = get_model()\n", 1163 | "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" 1164 | ] 1165 | }, 1166 | { 1167 | "cell_type": "markdown", 1168 | "metadata": {}, 1169 | "source": [ 1170 | "You can use these basic 3 lines of code to train a wide variety of models.\n", 1171 | "Let's see if we can use them to train a convolutional neural network (CNN)!\n", 1172 | "\n", 1173 | "Switch to CNN\n", 1174 | "-------------\n", 1175 | "\n", 1176 | "We are now going to build our neural network with three convolutional layers.\n", 1177 | "Because none of the functions in the previous section assume anything about\n", 1178 | "the model form, we'll be able to use them to train a CNN without any modification.\n", 1179 | "\n", 1180 | "We will use Pytorch's predefined\n", 1181 | "`Conv2d `_ class\n", 1182 | "as our 
convolutional layer. We define a CNN with 3 convolutional layers.\n", 1183 | "Each convolution is followed by a ReLU. At the end, we perform an\n", 1184 | "average pooling. (Note that ``view`` is PyTorch's version of numpy's\n", 1185 | "``reshape``)\n", 1186 | "\n" 1187 | ] 1188 | }, 1189 | { 1190 | "cell_type": "code", 1191 | "execution_count": null, 1192 | "metadata": { 1193 | "collapsed": false 1194 | }, 1195 | "outputs": [], 1196 | "source": [ 1197 | "class Mnist_CNN(nn.Module):\n", 1198 | " def __init__(self):\n", 1199 | " super().__init__()\n", 1200 | " self.conv1 = nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1)\n", 1201 | " self.conv2 = nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1)\n", 1202 | " self.conv3 = nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1)\n", 1203 | "\n", 1204 | " def forward(self, xb):\n", 1205 | " xb = xb.view(-1, 1, 28, 28)\n", 1206 | " xb = F.relu(self.conv1(xb))\n", 1207 | " xb = F.relu(self.conv2(xb))\n", 1208 | " xb = F.relu(self.conv3(xb))\n", 1209 | " xb = F.avg_pool2d(xb, 4)\n", 1210 | " return xb.view(-1, xb.size(1))\n", 1211 | "\n", 1212 | "lr = 0.1" 1213 | ] 1214 | }, 1215 | { 1216 | "cell_type": "markdown", 1217 | "metadata": {}, 1218 | "source": [ 1219 | "`Momentum `_ is a variation on\n", 1220 | "stochastic gradient descent that takes previous updates into account as well\n", 1221 | "and generally leads to faster training.\n", 1222 | "\n" 1223 | ] 1224 | }, 1225 | { 1226 | "cell_type": "code", 1227 | "execution_count": null, 1228 | "metadata": { 1229 | "collapsed": false 1230 | }, 1231 | "outputs": [], 1232 | "source": [ 1233 | "model = Mnist_CNN()\n", 1234 | "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n", 1235 | "\n", 1236 | "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" 1237 | ] 1238 | }, 1239 | { 1240 | "cell_type": "markdown", 1241 | "metadata": {}, 1242 | "source": [ 1243 | "nn.Sequential\n", 1244 | "------------------------\n", 1245 | "\n", 1246 | "``torch.nn`` has another handy class we can use to simply our code:\n", 1247 | "`Sequential `_ .\n", 1248 | "A ``Sequential`` object runs each of the modules contained within it, in a\n", 1249 | "sequential manner. This is a simpler way of writing our neural network.\n", 1250 | "\n", 1251 | "To take advantage of this, we need to be able to easily define a\n", 1252 | "**custom layer** from a given function. For instance, PyTorch doesn't\n", 1253 | "have a `view` layer, and we need to create one for our network. 
``Lambda``\n", 1254 | "will create a layer that we can then use when defining a network with\n", 1255 | "``Sequential``.\n", 1256 | "\n" 1257 | ] 1258 | }, 1259 | { 1260 | "cell_type": "code", 1261 | "execution_count": null, 1262 | "metadata": { 1263 | "collapsed": false 1264 | }, 1265 | "outputs": [], 1266 | "source": [ 1267 | "class Lambda(nn.Module):\n", 1268 | " def __init__(self, func):\n", 1269 | " super().__init__()\n", 1270 | " self.func = func\n", 1271 | "\n", 1272 | " def forward(self, x):\n", 1273 | " return self.func(x)\n", 1274 | "\n", 1275 | "\n", 1276 | "def preprocess(x):\n", 1277 | " return x.view(-1, 1, 28, 28)" 1278 | ] 1279 | }, 1280 | { 1281 | "cell_type": "markdown", 1282 | "metadata": {}, 1283 | "source": [ 1284 | "The model created with ``Sequential`` is simply:\n", 1285 | "\n" 1286 | ] 1287 | }, 1288 | { 1289 | "cell_type": "code", 1290 | "execution_count": null, 1291 | "metadata": { 1292 | "collapsed": false 1293 | }, 1294 | "outputs": [], 1295 | "source": [ 1296 | "model = nn.Sequential(\n", 1297 | " Lambda(preprocess),\n", 1298 | " nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n", 1299 | " nn.ReLU(),\n", 1300 | " nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n", 1301 | " nn.ReLU(),\n", 1302 | " nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n", 1303 | " nn.ReLU(),\n", 1304 | " nn.AvgPool2d(4),\n", 1305 | " Lambda(lambda x: x.view(x.size(0), -1)),\n", 1306 | ")\n", 1307 | "\n", 1308 | "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)\n", 1309 | "\n", 1310 | "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" 1311 | ] 1312 | }, 1313 | { 1314 | "cell_type": "markdown", 1315 | "metadata": {}, 1316 | "source": [ 1317 | "Wrapping DataLoader\n", 1318 | "-----------------------------\n", 1319 | "\n", 1320 | "Our CNN is fairly concise, but it only works with MNIST, because:\n", 1321 | " - It assumes the input is a 28\\*28 long vector\n", 1322 | " - It assumes that the final CNN grid size is 4\\*4 (since that's the average\n", 1323 | "pooling kernel size we used)\n", 1324 | "\n", 1325 | "Let's get rid of these two assumptions, so our model works with any 2d\n", 1326 | "single channel image. First, we can remove the initial Lambda layer but\n", 1327 | "moving the data preprocessing into a generator:\n", 1328 | "\n" 1329 | ] 1330 | }, 1331 | { 1332 | "cell_type": "code", 1333 | "execution_count": null, 1334 | "metadata": { 1335 | "collapsed": false 1336 | }, 1337 | "outputs": [], 1338 | "source": [ 1339 | "def preprocess(x, y):\n", 1340 | " return x.view(-1, 1, 28, 28), y\n", 1341 | "\n", 1342 | "\n", 1343 | "class WrappedDataLoader:\n", 1344 | " def __init__(self, dl, func):\n", 1345 | " self.dl = dl\n", 1346 | " self.func = func\n", 1347 | "\n", 1348 | " def __len__(self):\n", 1349 | " return len(self.dl)\n", 1350 | "\n", 1351 | " def __iter__(self):\n", 1352 | " batches = iter(self.dl)\n", 1353 | " for b in batches:\n", 1354 | " yield (self.func(*b))\n", 1355 | "\n", 1356 | "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", 1357 | "train_dl = WrappedDataLoader(train_dl, preprocess)\n", 1358 | "valid_dl = WrappedDataLoader(valid_dl, preprocess)" 1359 | ] 1360 | }, 1361 | { 1362 | "cell_type": "markdown", 1363 | "metadata": {}, 1364 | "source": [ 1365 | "Next, we can replace ``nn.AvgPool2d`` with ``nn.AdaptiveAvgPool2d``, which\n", 1366 | "allows us to define the size of the *output* tensor we want, rather than\n", 1367 | "the *input* tensor we have. 
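A small standalone illustration (not part of the tutorial; the random inputs are invented): the adaptive pooling layer fixes the *output* spatial size, so differently sized inputs all come out the same shape.

```python
import torch
from torch import nn

pool = nn.AdaptiveAvgPool2d(1)                 # always produce a 1x1 spatial output
print(pool(torch.randn(1, 10, 4, 4)).shape)    # torch.Size([1, 10, 1, 1])
print(pool(torch.randn(1, 10, 7, 9)).shape)    # torch.Size([1, 10, 1, 1])
```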
As a result, our model will work with any\n", 1368 | "size input.\n", 1369 | "\n" 1370 | ] 1371 | }, 1372 | { 1373 | "cell_type": "code", 1374 | "execution_count": null, 1375 | "metadata": { 1376 | "collapsed": false 1377 | }, 1378 | "outputs": [], 1379 | "source": [ 1380 | "model = nn.Sequential(\n", 1381 | " nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1),\n", 1382 | " nn.ReLU(),\n", 1383 | " nn.Conv2d(16, 16, kernel_size=3, stride=2, padding=1),\n", 1384 | " nn.ReLU(),\n", 1385 | " nn.Conv2d(16, 10, kernel_size=3, stride=2, padding=1),\n", 1386 | " nn.ReLU(),\n", 1387 | " nn.AdaptiveAvgPool2d(1),\n", 1388 | " Lambda(lambda x: x.view(x.size(0), -1)),\n", 1389 | ")\n", 1390 | "\n", 1391 | "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)" 1392 | ] 1393 | }, 1394 | { 1395 | "cell_type": "markdown", 1396 | "metadata": {}, 1397 | "source": [ 1398 | "Let's try it out:\n", 1399 | "\n" 1400 | ] 1401 | }, 1402 | { 1403 | "cell_type": "code", 1404 | "execution_count": null, 1405 | "metadata": { 1406 | "collapsed": false 1407 | }, 1408 | "outputs": [], 1409 | "source": [ 1410 | "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" 1411 | ] 1412 | }, 1413 | { 1414 | "cell_type": "markdown", 1415 | "metadata": {}, 1416 | "source": [ 1417 | "Using your GPU\n", 1418 | "---------------\n", 1419 | "\n", 1420 | "[Switch on a GPU machine](https://docs.floydhub.com/guides/workspace/#switching-between-cpu-and-gpu) to speed up the computation with the next lines of code.\n", 1421 | "\n", 1422 | "Note: You can run the next Code Cell on CPU machine as well since the below code is device agnostic.\n" 1423 | ] 1424 | }, 1425 | { 1426 | "cell_type": "code", 1427 | "execution_count": null, 1428 | "metadata": { 1429 | "collapsed": false 1430 | }, 1431 | "outputs": [], 1432 | "source": [ 1433 | "print(torch.cuda.is_available())" 1434 | ] 1435 | }, 1436 | { 1437 | "cell_type": "markdown", 1438 | "metadata": {}, 1439 | "source": [ 1440 | "And then create a device object for it:\n", 1441 | "\n" 1442 | ] 1443 | }, 1444 | { 1445 | "cell_type": "code", 1446 | "execution_count": null, 1447 | "metadata": { 1448 | "collapsed": false 1449 | }, 1450 | "outputs": [], 1451 | "source": [ 1452 | "dev = torch.device(\n", 1453 | " \"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")" 1454 | ] 1455 | }, 1456 | { 1457 | "cell_type": "markdown", 1458 | "metadata": {}, 1459 | "source": [ 1460 | "Let's update ``preprocess`` to move batches to the GPU:\n", 1461 | "\n" 1462 | ] 1463 | }, 1464 | { 1465 | "cell_type": "code", 1466 | "execution_count": null, 1467 | "metadata": { 1468 | "collapsed": false 1469 | }, 1470 | "outputs": [], 1471 | "source": [ 1472 | "def preprocess(x, y):\n", 1473 | " return x.view(-1, 1, 28, 28).to(dev), y.to(dev)\n", 1474 | "\n", 1475 | "\n", 1476 | "train_dl, valid_dl = get_data(train_ds, valid_ds, bs)\n", 1477 | "train_dl = WrappedDataLoader(train_dl, preprocess)\n", 1478 | "valid_dl = WrappedDataLoader(valid_dl, preprocess)" 1479 | ] 1480 | }, 1481 | { 1482 | "cell_type": "markdown", 1483 | "metadata": {}, 1484 | "source": [ 1485 | "Finally, we can move our model to the GPU.\n", 1486 | "\n" 1487 | ] 1488 | }, 1489 | { 1490 | "cell_type": "code", 1491 | "execution_count": null, 1492 | "metadata": { 1493 | "collapsed": false 1494 | }, 1495 | "outputs": [], 1496 | "source": [ 1497 | "model.to(dev)\n", 1498 | "opt = optim.SGD(model.parameters(), lr=lr, momentum=0.9)" 1499 | ] 1500 | }, 1501 | { 1502 | "cell_type": "markdown", 1503 | "metadata": {}, 1504 | "source": [ 1505 | "You 
should find it runs faster now:\n", 1506 | "\n" 1507 | ] 1508 | }, 1509 | { 1510 | "cell_type": "code", 1511 | "execution_count": null, 1512 | "metadata": { 1513 | "collapsed": false 1514 | }, 1515 | "outputs": [], 1516 | "source": [ 1517 | "fit(epochs, model, loss_func, opt, train_dl, valid_dl)" 1518 | ] 1519 | }, 1520 | { 1521 | "cell_type": "markdown", 1522 | "metadata": {}, 1523 | "source": [ 1524 | "Closing thoughts\n", 1525 | "-----------------\n", 1526 | "\n", 1527 | "We now have a general data pipeline and training loop which you can use for\n", 1528 | "training many types of models using Pytorch. To see how simple training a model\n", 1529 | "can now be, take a look at the `mnist_sample` sample notebook.\n", 1530 | "\n", 1531 | "Of course, there are many things you'll want to add, such as data augmentation,\n", 1532 | "hyperparameter tuning, monitoring training, transfer learning, and so forth.\n", 1533 | "These features are available in the fastai library, which has been developed\n", 1534 | "using the same design approach shown in this tutorial, providing a natural\n", 1535 | "next step for practitioners looking to take their models further.\n", 1536 | "\n", 1537 | "We promised at the start of this tutorial we'd explain through example each of\n", 1538 | "``torch.nn``, ``torch.optim``, ``Dataset``, and ``DataLoader``. So let's summarize\n", 1539 | "what we've seen:\n", 1540 | "\n", 1541 | " - **torch.nn**\n", 1542 | "\n", 1543 | " + ``Module``: creates a callable which behaves like a function, but can also\n", 1544 | " contain state(such as neural net layer weights). It knows what ``Parameter`` (s) it\n", 1545 | " contains and can zero all their gradients, loop through them for weight updates, etc.\n", 1546 | " + ``Parameter``: a wrapper for a tensor that tells a ``Module`` that it has weights\n", 1547 | " that need updating during backprop. Only tensors with the `requires_grad` attribute set are updated\n", 1548 | " + ``functional``: a module(usually imported into the ``F`` namespace by convention)\n", 1549 | " which contains activation functions, loss functions, etc, as well as non-stateful\n", 1550 | " versions of layers such as convolutional and linear layers.\n", 1551 | " - ``torch.optim``: Contains optimizers such as ``SGD``, which update the weights\n", 1552 | " of ``Parameter`` during the backward step\n", 1553 | " - ``Dataset``: An abstract interface of objects with a ``__len__`` and a ``__getitem__``,\n", 1554 | " including classes provided with Pytorch such as ``TensorDataset``\n", 1555 | " - ``DataLoader``: Takes any ``Dataset`` and creates an iterator which returns batches of data.\n", 1556 | "\n" 1557 | ] 1558 | } 1559 | ], 1560 | "metadata": { 1561 | "kernelspec": { 1562 | "display_name": "Python 3", 1563 | "language": "python", 1564 | "name": "python3" 1565 | }, 1566 | "language_info": { 1567 | "codemirror_mode": { 1568 | "name": "ipython", 1569 | "version": 3 1570 | }, 1571 | "file_extension": ".py", 1572 | "mimetype": "text/x-python", 1573 | "name": "python", 1574 | "nbconvert_exporter": "python", 1575 | "pygments_lexer": "ipython3", 1576 | "version": "3.6.5" 1577 | } 1578 | }, 1579 | "nbformat": 4, 1580 | "nbformat_minor": 2 1581 | } 1582 | --------------------------------------------------------------------------------