├── README.md
├── hw01
    └── 1_Hw_Students.ipynb
├── hw02
    └── 2_Hw_Students.ipynb
├── hw03
    └── 3_Hw_Students.ipynb
├── hw05
    └── 5_Hw_Students.ipynb
├── week01_intro
    ├── lecture.pdf
    └── seminar.ipynb
├── week02_init_regularization
    ├── lecture.pdf
    └── seminar.ipynb
├── week03_conv
    ├── lecture.pdf
    └── seminar.ipynb
├── week04_tricks
    ├── lecture.pdf
    └── seminar.ipynb
├── week05_segmentation
    ├── lecture.pdf
    └── seminar.ipynb
├── week06_detection
    └── lecture.pdf
├── week07_word_embeddings
    ├── lecture.pdf
    └── seminar.ipynb
├── week08_text_classification
    ├── lecture.pdf
    └── seminar.ipynb
├── week09_transformer
    ├── lecture.pdf
    └── seminar.ipynb
├── week10_gpt
    ├── lecture.pdf
    └── seminar.ipynb
├── week11_cv_transformers
    ├── lecture.pdf
    └── seminar.ipynb
├── week12_gan
    ├── lecture.pdf
    └── seminar.ipynb
├── week13_latent_models
    ├── lecture.pdf
    └── seminar.ipynb
└── week14_representation_learning
    ├── .DS_Store
    ├── lecture.pdf
    └── seminar.ipynb


/README.md:
--------------------------------------------------------------------------------
1 | # deep-learning-course


--------------------------------------------------------------------------------
/hw01/1_Hw_Students.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |   "nbformat": 4,
   3 |   "nbformat_minor": 0,
   4 |   "metadata": {
   5 |     "colab": {
   6 |       "provenance": [],
   7 |       "gpuType": "T4",
   8 |       "toc_visible": true
   9 |     },
  10 |     "kernelspec": {
  11 |       "name": "python3",
  12 |       "display_name": "Python 3"
  13 |     },
  14 |     "language_info": {
  15 |       "name": "python"
  16 |     },
  17 |     "accelerator": "GPU"
  18 |   },
  19 |   "cells": [
  20 |     {
  21 |       "cell_type": "code",
  22 |       "source": [
  23 |         "import torch\n",
  24 |         "import numpy as np\n",
  25 |         "import matplotlib.pyplot as plt\n",
  26 |         "from tqdm import tqdm\n",
  27 |         "from IPython.display import clear_output\n",
  28 |         "\n",
  29 |         "print(torch.__version__)"
  30 |       ],
  31 |       "metadata": {
  32 |         "id": "VFj8-qGfYA-2"
  33 |       },
  34 |       "execution_count": null,
  35 |       "outputs": []
  36 |     },
  37 |     {
  38 |       "cell_type": "code",
  39 |       "source": [
  40 |         "DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'"
  41 |       ],
  42 |       "metadata": {
  43 |         "id": "cSuFlZPnrT8O"
  44 |       },
  45 |       "execution_count": null,
  46 |       "outputs": []
  47 |     },
  48 |     {
  49 |       "cell_type": "code",
  50 |       "source": [
  51 |         "import sys, os\n",
  52 |         "if 'google.colab' in sys.modules and not os.path.exists('.setup_complete'):\n",
  53 |         "    !wget -q https://raw.githubusercontent.com/yandexdataschool/deep_vision_and_graphics/fall22/week01-pytorch_intro/notmnist.py\n",
  54 |         "    !touch .setup_complete"
  55 |       ],
  56 |       "metadata": {
  57 |         "id": "usRNEECdbR9F"
  58 |       },
  59 |       "execution_count": null,
  60 |       "outputs": []
  61 |     },
  62 |     {
  63 |       "cell_type": "markdown",
  64 |       "source": [
  65 |         "# Task 1. Tensors (1 point)"
  66 |       ],
  67 |       "metadata": {
  68 |         "id": "u7FaNwW2X_v0"
  69 |       }
  70 |     },
  71 |     {
  72 |       "cell_type": "markdown",
  73 |       "source": [
  74 |         "Let's write another function, this time in polar coordinates:\n",
  75 |         "$$\\rho(\\theta) = (1 + 0.9 \\cdot cos (8 \\cdot \\theta) ) \\cdot (1 + 0.1 \\cdot cos(24 \\cdot \\theta)) \\cdot (0.9 + 0.05 \\cdot cos(200 \\cdot \\theta)) \\cdot (1 + sin(\\theta))$$\n",
  76 |         "\n",
  77 |         "\n",
  78 |         "Then convert it into cartesian coordinates ([howto](http://www.mathsisfun.com/polar-cartesian-coordinates.html)) and plot the results.\n",
  79 |         "\n",
  80 |         "Use torch tensors only: no lists, loops, numpy arrays, etc."
  81 |       ],
  82 |       "metadata": {
  83 |         "id": "gUtSrsCYaRdA"
  84 |       }
  85 |     },
  86 |     {
  87 |       "cell_type": "code",
  88 |       "source": [
  89 |         "theta = torch.linspace(- np.pi, np.pi, steps=1000)\n",
  90 |         "\n",
  91 |         "# compute rho(theta) as per formula above\n",
  92 |         "rho = YOUR CODE HERE\n",
  93 |         "\n",
  94 |         "# Now convert polar (rho, theta) pairs into cartesian (x,y) to plot them.\n",
  95 |         "x = YOUR CODE HERE\n",
  96 |         "y = YOUR CODE HERE\n",
  97 |         "\n",
  98 |         "\n",
  99 |         "plt.figure(figsize=[6, 6])\n",
 100 |         "plt.fill(x.numpy(), y.numpy(), color='green')\n",
 101 |         "plt.grid()"
 102 |       ],
 103 |       "metadata": {
 104 |         "id": "sTtmnC-EaIr5"
 105 |       },
 106 |       "execution_count": null,
 107 |       "outputs": []
 108 |     },
 109 |     {
 110 |       "cell_type": "markdown",
 111 |       "source": [
 112 |         "# Task 2: Going deeper (6 points)\n",
 113 |         "\n",
 114 |         "Your ultimate task here is to build your first neural network [almost] from scratch and pure PyTorch.\n",
 115 |         "\n",
 116 |         "This time you will solve the same digit recognition problem, but at a larger scale\n",
 117 |         "\n",
 118 |         "* 10 different letters\n",
 119 |         "* 20k samples\n",
 120 |         "\n",
 121 |         "We want you to build a network that __reaches at least 80% accuracy__ and has __at least 2 linear layers__ in it.\n",
 122 |         "\n",
 123 |         "With 10 classes you need __categorical crossentropy__  (see [here](http://wiki.fast.ai/index.php/Log_Loss)) loss. You can write it any way you want, but we recommend to use log_softmax function from pytorch, since it is more numerically stable.\n",
 124 |         "\n",
 125 |         "Note that you are not required to build 152-layer monsters here. A 2-layer (one hidden, one output) neural network should already give you nice score.\n",
 126 |         "\n",
 127 |         "__Win conditions:__\n",
 128 |         "* __Your model must be nonlinear,__ but not necessarily deep.\n",
 129 |         "* __Train your model with your own SGD__ - which you will have to implement\n",
 130 |         "* __For this task only, please do not use the contents of `torch.nn` and `torch.optim`.__ That's for the next task.\n",
 131 |         "* __Do not use Conv layers__\n",
 132 |         "\n",
 133 |         "**Bonus:** For the best score in group you get +1.5, 1.0, 0.5 point(1st. 2nd, 3rd places)."
 134 |       ],
 135 |       "metadata": {
 136 |         "id": "vcfxlYBna3Ke"
 137 |       }
 138 |     },
 139 |     {
 140 |       "cell_type": "code",
 141 |       "source": [
 142 |         "from notmnist import load_notmnist\n",
 143 |         "X_train, y_train, X_val, y_val = load_notmnist(letters='ABCDEFGHIJ')\n",
 144 |         "X_train, X_val = X_train.reshape([-1, 784]), X_val.reshape([-1, 784])"
 145 |       ],
 146 |       "metadata": {
 147 |         "id": "_uZn0Bdba3pH"
 148 |       },
 149 |       "execution_count": null,
 150 |       "outputs": []
 151 |     },
 152 |     {
 153 |       "cell_type": "code",
 154 |       "source": [
 155 |         "%matplotlib inline\n",
 156 |         "plt.figure(figsize=[12, 4])\n",
 157 |         "for i in range(20):\n",
 158 |         "    plt.subplot(2, 10, i+1)\n",
 159 |         "    plt.imshow(X_train[i].reshape([28, 28]))\n",
 160 |         "    plt.title(str(y_train[i]))"
 161 |       ],
 162 |       "metadata": {
 163 |         "id": "3WJlL3PHbs2S"
 164 |       },
 165 |       "execution_count": null,
 166 |       "outputs": []
 167 |     },
 168 |     {
 169 |       "cell_type": "code",
 170 |       "source": [
 171 |         "X_train.shape, y_train.shape, X_val.shape, y_val.shape"
 172 |       ],
 173 |       "metadata": {
 174 |         "id": "_M4n3fcDbvqu"
 175 |       },
 176 |       "execution_count": null,
 177 |       "outputs": []
 178 |     },
 179 |     {
 180 |       "cell_type": "code",
 181 |       "source": [
 182 |         "classes = np.unique(y_train)\n",
 183 |         "n_classes = len(classes)\n",
 184 |         "classes"
 185 |       ],
 186 |       "metadata": {
 187 |         "id": "bIH6GPb3djr7"
 188 |       },
 189 |       "execution_count": null,
 190 |       "outputs": []
 191 |     },
 192 |     {
 193 |       "cell_type": "code",
 194 |       "source": [
 195 |         "class CustomNet:\n",
 196 |         "    def __init__(self, hidden_size, in_size=28*28, num_classes=n_classes):\n",
 197 |         "        # self.W = YOUR CODE HERE\n",
 198 |         "        pass\n",
 199 |         "    def forward(self, x):\n",
 200 |         "        # YOUR CODE HERE\n",
 201 |         "        pass"
 202 |       ],
 203 |       "metadata": {
 204 |         "id": "mc7bSgpCbzHN"
 205 |       },
 206 |       "execution_count": null,
 207 |       "outputs": []
 208 |     },
 209 |     {
 210 |       "cell_type": "code",
 211 |       "source": [
 212 |         "net = CustomNet()\n",
 213 |         "out = net.forward(torch.randn(2, 28*28, device=DEVICE))\n",
 214 |         "assert len(out.shape) == 2\n",
 215 |         "assert out.shape[-1] == n_classes"
 216 |       ],
 217 |       "metadata": {
 218 |         "id": "649qrlZXfUB6"
 219 |       },
 220 |       "execution_count": null,
 221 |       "outputs": []
 222 |     },
 223 |     {
 224 |       "cell_type": "code",
 225 |       "source": [
 226 |         "import torch.nn.functional as F\n",
 227 |         "\n",
 228 |         "def cross_entropy_loss(logits, target):\n",
 229 |         "    N = logits.size(0)\n",
 230 |         "    # Get the log probabilities\n",
 231 |         "    log_probs = # YOUR CODE HERE\n",
 232 |         "    # Gather the log probabilities at the target indices\n",
 233 |         "    log_probs_at_target = # YOUR CODE HERE\n",
 234 |         "    # Compute the negative log likelihood\n",
 235 |         "    nll = # YOUR CODE HERE\n",
 236 |         "    return nll / N\n",
 237 |         "\n",
 238 |         "y_tmp = torch.tensor(y_train[:2], device=DEVICE)\n",
 239 |         "cross_entropy_loss(out, y_tmp), torch.nn.CrossEntropyLoss()(out, y_tmp)"
 240 |       ],
 241 |       "metadata": {
 242 |         "id": "HnnVGZSwmyu_"
 243 |       },
 244 |       "execution_count": null,
 245 |       "outputs": []
 246 |     },
 247 |     {
 248 |       "cell_type": "code",
 249 |       "source": [
 250 |         "class CustomSGD:\n",
 251 |         "    def __init__(self, model, lr=1e-4):\n",
 252 |         "        self.model = model\n",
 253 |         "        self.lr = lr\n",
 254 |         "\n",
 255 |         "    def step(self):\n",
 256 |         "        with torch.no_grad():\n",
 257 |         "            for param in # YOUR CODE HERE:\n",
 258 |         "                # YOUR CODE HERE\n",
 259 |         "    def zero_grad(self):\n",
 260 |         "        for param in # YOUR CODE HERE:\n",
 261 |         "            # YOUR CODE HERE"
 262 |       ],
 263 |       "metadata": {
 264 |         "id": "70swrTmCeBZx"
 265 |       },
 266 |       "execution_count": null,
 267 |       "outputs": []
 268 |     },
 269 |     {
 270 |       "cell_type": "code",
 271 |       "source": [
 272 |         "def iterate_minibatches(X, y, batch_size):\n",
 273 |         "    indices = np.random.permutation(np.arange(len(X)))\n",
 274 |         "    for start in range(0, len(indices), batch_size):\n",
 275 |         "        ix = indices[start: start + batch_size]\n",
 276 |         "        yield torch.from_numpy(X[ix]), torch.from_numpy(y[ix])"
 277 |       ],
 278 |       "metadata": {
 279 |         "id": "nVvO14e90YjC"
 280 |       },
 281 |       "execution_count": null,
 282 |       "outputs": []
 283 |     },
 284 |     {
 285 |       "cell_type": "code",
 286 |       "source": [
 287 |         "def train(net, optimizer, loss_fn, n_epoch=20):\n",
 288 |         "    loss_history = []\n",
 289 |         "    acc_history = []\n",
 290 |         "    val_loss_history = []\n",
 291 |         "    val_acc_history = []\n",
 292 |         "\n",
 293 |         "    for i in range(n_epoch):\n",
 294 |         "        # Training\n",
 295 |         "        # net.train()\n",
 296 |         "        acc_batches=[]\n",
 297 |         "        loss_batches=[]\n",
 298 |         "        for x_batch, y_batch in iterate_minibatches(X_train, y_train, batch_size=64):\n",
 299 |         "            # x_batch = # YOUR CODE HERE\n",
 300 |         "            # y_batch = # YOUR CODE HERE\n",
 301 |         "            # Forward\n",
 302 |         "            # loss = # YOUR CODE HERE\n",
 303 |         "\n",
 304 |         "            # Backward\n",
 305 |         "            # YOUR CODE HERE\n",
 306 |         "\n",
 307 |         "            # Accuracy\n",
 308 |         "            acc_batches += (out.argmax(axis=1) == y_batch).detach().cpu().numpy().tolist()\n",
 309 |         "\n",
 310 |         "        loss_history.append(np.mean(loss_batches))\n",
 311 |         "        acc_history.append(np.mean(acc_batches))\n",
 312 |         "\n",
 313 |         "        # Validating\n",
 314 |         "        # net.eval()\n",
 315 |         "        with torch.no_grad():\n",
 316 |         "            acc_batches=[]\n",
 317 |         "            loss_batches=[]\n",
 318 |         "            for x_batch, y_batch in iterate_minibatches(X_val, y_val, batch_size=64):\n",
 319 |         "                # x_batch = # YOUR CODE HERE\n",
 320 |         "                # y_batch = # YOUR CODE HERE\n",
 321 |         "                # Forward\n",
 322 |         "                # loss = # YOUR CODE HERE\n",
 323 |         "                # Accuracy\n",
 324 |         "                acc_batches += (out.argmax(axis=1) == y_batch).detach().cpu().numpy().tolist()\n",
 325 |         "\n",
 326 |         "            val_loss_history.append(np.mean(loss_batches))\n",
 327 |         "            val_acc_history.append(np.mean(acc_batches))\n",
 328 |         "\n",
 329 |         "        clear_output(wait=True)\n",
 330 |         "        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))\n",
 331 |         "        ax1.set_xlabel(\"#epoch\")\n",
 332 |         "        ax1.set_ylabel(\"Loss\")\n",
 333 |         "        ax1.plot(loss_history, 'b', label='train loss')\n",
 334 |         "        ax1.plot(val_loss_history, 'r', label='val loss')\n",
 335 |         "\n",
 336 |         "        ax2.set_xlabel(\"#epoch\")\n",
 337 |         "        ax2.set_ylabel(\"Acc\")\n",
 338 |         "        ax2.plot(acc_history, 'b', label='train acc')\n",
 339 |         "        ax2.plot(val_acc_history, 'r', label='val acc')\n",
 340 |         "        plt.axhline(y = 0.8, color = 'g', linestyle = '--')\n",
 341 |         "\n",
 342 |         "        plt.legend()\n",
 343 |         "        plt.show()\n",
 344 |         "    return max(val_acc_history)"
 345 |       ],
 346 |       "metadata": {
 347 |         "id": "DjLxUiA1eLtZ"
 348 |       },
 349 |       "execution_count": null,
 350 |       "outputs": []
 351 |     },
 352 |     {
 353 |       "cell_type": "code",
 354 |       "source": [
 355 |         "net = # YOUR CODE HERE\n",
 356 |         "opt = # YOUR CODE HERE\n",
 357 |         "# train(net, opt, cross_entropy_loss)"
 358 |       ],
 359 |       "metadata": {
 360 |         "id": "Ps2HNwINqtVn"
 361 |       },
 362 |       "execution_count": null,
 363 |       "outputs": []
 364 |     },
 365 |     {
 366 |       "cell_type": "markdown",
 367 |       "source": [
 368 |         "### Hints:\n",
 369 |         "  - You'll have to use matrix W(feature_id x class_id)\n",
 370 |         "  - Softmax (exp over sum of exps) can be implemented manually or as `torch.softmax`\n",
 371 |         "  - Probably better to use STOCHASTIC gradient descent (minibatch) for greater speed\n",
 372 |         "  - You need to train both layers, not just the output layer :)\n",
 373 |         "  - 50 hidden neurons and a ReLU nonlinearity will do for a start. Many ways to improve.\n",
 374 |         "  - In ideal case this totals to 2 `torch.matmul`'s, 1 softmax and 1 ReLU/sigmoid  \n",
 375 |         "  - If anything seems wrong, try going through one step of training and printing everything you compute.\n",
 376 |         "  - If you see NaNs midway through optimization, you can estimate $\\log P(y \\mid x)$ as `torch.log_softmax(last_linear_layer_outputs)`."
 377 |       ],
 378 |       "metadata": {
 379 |         "id": "wwQYqNdugwmp"
 380 |       }
 381 |     },
 382 |     {
 383 |       "cell_type": "markdown",
 384 |       "source": [
 385 |         "# Task 3. Overfitting (4 points)\n"
 386 |       ],
 387 |       "metadata": {
 388 |         "id": "1fxSuZJwb1bx"
 389 |       }
 390 |     },
 391 |     {
 392 |       "cell_type": "markdown",
 393 |       "source": [
 394 |         "Today we work with [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist) (*hint: it is available in `torchvision`*).\n",
 395 |         "\n",
 396 |         "Your goal for today:\n",
 397 |         "0. Fill the gaps in training loop and architectures.\n",
 398 |         "1. Train a tiny __FC__ network.\n",
 399 |         "2. Cause considerable overfitting by modifying the network (e.g. increasing the number of network parameters and/or layers) and demonstrate in in the appropriate way (e.g. plot loss and accurasy on train and validation set w.r.t. network complexity).\n",
 400 |         "3. Try to deal with overfitting (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results.\n",
 401 |         "\n",
 402 |         "Train a network that achieves $\\geq 0.885$ test accuracy. Again you should use only Linear (`nn.Linear`) layers and activations/dropout/batchnorm. Convolutional layers might be a great use, but we will meet them a bit later.\n",
 403 |         "\n",
 404 |         "__Please, write a small report describing your ideas, tries and achieved results in the end of this task.__\n",
 405 |         "\n",
 406 |         "*Note*: in task 3 your goal is to make the network from task 2 less prone to overfitting. And then to train the network that achives $\\geq 0.885$ test accuracy, so it can be different.\n",
 407 |         "\n",
 408 |         "**Bonus:** For the best score in group you get +1.5, 1.0, 0.5 point(1st, 2nd, 3rd places)."
 409 |       ],
 410 |       "metadata": {
 411 |         "id": "g9j3vEc9rsQC"
 412 |       }
 413 |     },
 414 |     {
 415 |       "cell_type": "code",
 416 |       "source": [
 417 |         "import torch\n",
 418 |         "import torch.nn as nn\n",
 419 |         "import torchvision\n",
 420 |         "import torchvision.transforms as transforms\n",
 421 |         "import torchsummary\n",
 422 |         "\n",
 423 |         "from matplotlib import pyplot as plt\n",
 424 |         "from matplotlib.pyplot import figure\n",
 425 |         "import numpy as np\n",
 426 |         "import os\n",
 427 |         "from tqdm import tqdm\n",
 428 |         "from sklearn.model_selection import train_test_split"
 429 |       ],
 430 |       "metadata": {
 431 |         "id": "O1fxVkDOb3QX"
 432 |       },
 433 |       "execution_count": null,
 434 |       "outputs": []
 435 |     },
 436 |     {
 437 |       "cell_type": "code",
 438 |       "source": [
 439 |         "# Technical function\n",
 440 |         "def mkdir(path):\n",
 441 |         "    if not os.path.exists(root_path):\n",
 442 |         "        os.mkdir(root_path)\n",
 443 |         "        print('Directory', path, 'is created!')\n",
 444 |         "    else:\n",
 445 |         "        print('Directory', path, 'already exists!')\n",
 446 |         "\n",
 447 |         "root_path = 'fmnist'\n",
 448 |         "mkdir(root_path)"
 449 |       ],
 450 |       "metadata": {
 451 |         "id": "kzUtrIEgrwWi"
 452 |       },
 453 |       "execution_count": null,
 454 |       "outputs": []
 455 |     },
 456 |     {
 457 |       "cell_type": "code",
 458 |       "execution_count": null,
 459 |       "metadata": {
 460 |         "id": "qt6LE7XaTDT9"
 461 |       },
 462 |       "outputs": [],
 463 |       "source": [
 464 |         "download = True\n",
 465 |         "train_transform = transforms.ToTensor()\n",
 466 |         "test_transform = transforms.ToTensor()\n",
 467 |         "\n",
 468 |         "fmnist_dataset_train = torchvision.datasets.FashionMNIST(root_path,\n",
 469 |         "                                                        train=True,\n",
 470 |         "                                                        transform=train_transform,\n",
 471 |         "                                                        target_transform=None,\n",
 472 |         "                                                        download=download)\n",
 473 |         "fmnist_dataset_test = torchvision.datasets.FashionMNIST(root_path,\n",
 474 |         "                                                       train=False,\n",
 475 |         "                                                       transform=test_transform,\n",
 476 |         "                                                       target_transform=None,\n",
 477 |         "                                                       download=download)"
 478 |       ]
 479 |     },
 480 |     {
 481 |       "cell_type": "code",
 482 |       "source": [
 483 |         "fmnist_dataset_train, fmnist_dataset_val = train_test_split(fmnist_dataset_train, train_size=50000)"
 484 |       ],
 485 |       "metadata": {
 486 |         "id": "15B67G4iGMIr"
 487 |       },
 488 |       "execution_count": null,
 489 |       "outputs": []
 490 |     },
 491 |     {
 492 |       "cell_type": "code",
 493 |       "source": [
 494 |         "len(fmnist_dataset_train), len(fmnist_dataset_val), len(fmnist_dataset_test)"
 495 |       ],
 496 |       "metadata": {
 497 |         "id": "cpbEkSCM1n3Z"
 498 |       },
 499 |       "execution_count": null,
 500 |       "outputs": []
 501 |     },
 502 |     {
 503 |       "cell_type": "code",
 504 |       "execution_count": null,
 505 |       "metadata": {
 506 |         "id": "71YP0SPwTIxD"
 507 |       },
 508 |       "outputs": [],
 509 |       "source": [
 510 |         "train_loader = torch.utils.data.DataLoader(fmnist_dataset_train,\n",
 511 |         "                                           batch_size=128,\n",
 512 |         "                                           shuffle=True,\n",
 513 |         "                                           num_workers=2)\n",
 514 |         "\n",
 515 |         "val_loader = torch.utils.data.DataLoader(fmnist_dataset_val,\n",
 516 |         "                                           batch_size=256,\n",
 517 |         "                                           shuffle=True,\n",
 518 |         "                                           num_workers=2)\n",
 519 |         "\n",
 520 |         "test_loader = torch.utils.data.DataLoader(fmnist_dataset_test,\n",
 521 |         "                                          batch_size=256,\n",
 522 |         "                                          shuffle=False,\n",
 523 |         "                                          num_workers=2)"
 524 |       ]
 525 |     },
 526 |     {
 527 |       "cell_type": "code",
 528 |       "execution_count": null,
 529 |       "metadata": {
 530 |         "id": "aHca15bOTY4B"
 531 |       },
 532 |       "outputs": [],
 533 |       "source": [
 534 |         "for img, label in train_loader:\n",
 535 |         "    print(img.shape)\n",
 536 |         "    # print(img)\n",
 537 |         "    print(label.shape)\n",
 538 |         "    break\n",
 539 |         "\n",
 540 |         "plt.imshow(img[0, 0]);"
 541 |       ]
 542 |     },
 543 |     {
 544 |       "cell_type": "code",
 545 |       "source": [
 546 |         "def train_val_loop(net, train_loader, val_loader, name, optimizer, criterion, n_epoch=20):\n",
 547 |         "    loss_history = []\n",
 548 |         "    acc_history = []\n",
 549 |         "    val_loss_history = []\n",
 550 |         "    val_acc_history = []\n",
 551 |         "\n",
 552 |         "    for i in range(n_epoch):\n",
 553 |         "        net.train()\n",
 554 |         "        acc_batches=[]\n",
 555 |         "        loss_batches=[]\n",
 556 |         "        for x_batch, y_batch in train_loader:\n",
 557 |         "            # x_batch = YOUR CODE HERE\n",
 558 |         "            # y_batch = YOUR CODE HERE\n",
 559 |         "\n",
 560 |         "            # Forward\n",
 561 |         "            # loss = YOUR CODE HERE\n",
 562 |         "\n",
 563 |         "            # Backward\n",
 564 |         "            # ... YOUR CODE HERE\n",
 565 |         "\n",
 566 |         "            # Accuracy\n",
 567 |         "            # acc_batches = YOUR CODE HERE\n",
 568 |         "\n",
 569 |         "        loss_history.append(np.mean(loss_batches))\n",
 570 |         "        acc_history.append(np.mean(acc_batches))\n",
 571 |         "\n",
 572 |         "        # Validating\n",
 573 |         "        net.eval()\n",
 574 |         "        with torch.no_grad():\n",
 575 |         "            acc_batches=[]\n",
 576 |         "            loss_batches=[]\n",
 577 |         "            for x_batch, y_batch in val_loader:\n",
 578 |         "                # x_batch = YOUR CODE HERE\n",
 579 |         "                # y_batch = YOUR CODE HERE\n",
 580 |         "\n",
 581 |         "                # Forward\n",
 582 |         "                # loss = YOUR CODE HERE\n",
 583 |         "\n",
 584 |         "                # Accuracy\n",
 585 |         "                # acc_batches = YOUR CODE HERE\n",
 586 |         "\n",
 587 |         "            val_loss_history.append(np.mean(loss_batches))\n",
 588 |         "            val_acc_history.append(np.mean(acc_batches))\n",
 589 |         "\n",
 590 |         "        clear_output(wait=True)\n",
 591 |         "        plt.figure(figsize=(8, 5))\n",
 592 |         "        plt.title(f\"Training/validating loss {name}\")\n",
 593 |         "        plt.xlabel(\"#epoch\")\n",
 594 |         "        plt.ylabel(\"Loss\")\n",
 595 |         "        plt.plot(loss_history, 'b', label='train')\n",
 596 |         "        plt.plot(val_loss_history, 'r', label='validation')\n",
 597 |         "        plt.legend()\n",
 598 |         "\n",
 599 |         "        plt.figure(figsize=(8, 5))\n",
 600 |         "        plt.title(f\"Training/validating accuracy {name}\")\n",
 601 |         "        plt.xlabel(\"#epoch\")\n",
 602 |         "        plt.ylabel(\"Accuracy\")\n",
 603 |         "        plt.plot(acc_history, 'b', label='train')\n",
 604 |         "        plt.plot(val_acc_history, 'r', label='validation')\n",
 605 |         "        plt.legend()\n",
 606 |         "\n",
 607 |         "        plt.show()\n",
 608 |         "\n",
 609 |         "def test_accuracy(model):\n",
 610 |         "    model.eval()\n",
 611 |         "    test_acc_batches = []\n",
 612 |         "    with torch.no_grad():\n",
 613 |         "        for X_test, Y_test in test_loader:\n",
 614 |         "            X_test = X_test.to(DEVICE)\n",
 615 |         "            Y_test = Y_test.to(DEVICE)\n",
 616 |         "            out = model.forward(X_test)\n",
 617 |         "            test_acc_batches += (out.argmax(axis=1) == Y_test).detach().cpu().numpy().tolist()\n",
 618 |         "    print(f'Test accuracy {np.mean(test_acc_batches)}')"
 619 |       ],
 620 |       "metadata": {
 621 |         "id": "QBu9fg_dAymN"
 622 |       },
 623 |       "execution_count": null,
 624 |       "outputs": []
 625 |     },
 626 |     {
 627 |       "cell_type": "markdown",
 628 |       "metadata": {
 629 |         "id": "b6OOOffHTfX5"
 630 |       },
 631 |       "source": [
 632 |         "## Task 3.1 Tiny net\n",
 633 |         "Train a tiny network just to validate correctness of train loop, net architecture, params."
 634 |       ]
 635 |     },
 636 |     {
 637 |       "cell_type": "code",
 638 |       "execution_count": null,
 639 |       "metadata": {
 640 |         "id": "ftpkTjxlTcFx"
 641 |       },
 642 |       "outputs": [],
 643 |       "source": [
 644 |         "class TinyNeuralNetwork(nn.Module):\n",
 645 |         "    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):\n",
 646 |         "        super(self.__class__, self).__init__()\n",
 647 |         "        self.model = nn.Sequential(\n",
 648 |         "            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards\n",
 649 |         "            # YOUR CODE HERE\n",
 650 |         "        )\n",
 651 |         "\n",
 652 |         "    def forward(self, inp):\n",
 653 |         "        out = self.model(inp)\n",
 654 |         "        return out"
 655 |       ]
 656 |     },
 657 |     {
 658 |       "cell_type": "code",
 659 |       "source": [
 660 |         "model = TinyNeuralNetwork()\n",
 661 |         "out = model(torch.randn(2, 1, 28, 28))\n",
 662 |         "assert len(out.shape) == 2\n",
 663 |         "assert out.shape[-1] == 10"
 664 |       ],
 665 |       "metadata": {
 666 |         "id": "bFnYI29v4N1P"
 667 |       },
 668 |       "execution_count": null,
 669 |       "outputs": []
 670 |     },
 671 |     {
 672 |       "cell_type": "code",
 673 |       "execution_count": null,
 674 |       "metadata": {
 675 |         "id": "EAhMwySkrlpq"
 676 |       },
 677 |       "outputs": [],
 678 |       "source": [
 679 |         "torchsummary.summary(TinyNeuralNetwork().to(DEVICE), (28*28,))"
 680 |       ]
 681 |     },
 682 |     {
 683 |       "cell_type": "code",
 684 |       "execution_count": null,
 685 |       "metadata": {
 686 |         "id": "i3POFj90Ti-6"
 687 |       },
 688 |       "outputs": [],
 689 |       "source": [
 690 |         "# tiny_model = TinyNeuralNetwork().to(DEVICE)\n",
 691 |         "# opt = # YOUR CODE HERE\n",
 692 |         "# loss_func = # YOUR CODE HERE\n",
 693 |         "# n_epoch = # YOUR CODE HERE\n",
 694 |         "\n",
 695 |         "# Your experiments, come here\n",
 696 |         "# train_val_loop(tiny_model, train_loader=train_loader, val_loader=val_loader, name='tiny model', optimizer=opt, criterion=loss_func, n_epoch=n_epoch)"
 697 |       ]
 698 |     },
 699 |     {
 700 |       "cell_type": "code",
 701 |       "source": [
 702 |         "# test_accuracy(tiny_model)"
 703 |       ],
 704 |       "metadata": {
 705 |         "id": "XxRsu7pBIf5f"
 706 |       },
 707 |       "execution_count": null,
 708 |       "outputs": []
 709 |     },
 710 |     {
 711 |       "cell_type": "markdown",
 712 |       "metadata": {
 713 |         "id": "L7ISqkjmCPB1"
 714 |       },
 715 |       "source": [
 716 |         "## Task 3.2: Overfit it.\n",
 717 |         "Build a network that will overfit to this dataset. Demonstrate the overfitting in the appropriate way (e.g. plot loss and accurasy on train and test set w.r.t. network complexity).\n",
 718 |         "\n",
 719 |         "*Note:* you also might decrease the size of `train` dataset to enforce the overfitting and speed up the computations."
 720 |       ]
 721 |     },
 722 |     {
 723 |       "cell_type": "code",
 724 |       "execution_count": null,
 725 |       "metadata": {
 726 |         "id": "H12uAWiGBwJx"
 727 |       },
 728 |       "outputs": [],
 729 |       "source": [
 730 |         "class OverfittingNeuralNetwork(nn.Module):\n",
 731 |         "    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):\n",
 732 |         "        super(self.__class__, self).__init__()\n",
 733 |         "        self.model = nn.Sequential(\n",
 734 |         "            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards\n",
 735 |         "            # YOUR CODE HERE\n",
 736 |         "        )\n",
 737 |         "\n",
 738 |         "    def forward(self, inp):\n",
 739 |         "        out = self.model(inp)\n",
 740 |         "        return out"
 741 |       ]
 742 |     },
 743 |     {
 744 |       "cell_type": "code",
 745 |       "source": [
 746 |         "model = OverfittingNeuralNetwork()\n",
 747 |         "out = model(torch.randn(2, 1, 28, 28))\n",
 748 |         "assert len(out.shape) == 2\n",
 749 |         "assert out.shape[-1] == 10"
 750 |       ],
 751 |       "metadata": {
 752 |         "id": "zR4muHZr5k8I"
 753 |       },
 754 |       "execution_count": null,
 755 |       "outputs": []
 756 |     },
 757 |     {
 758 |       "cell_type": "code",
 759 |       "execution_count": null,
 760 |       "metadata": {
 761 |         "id": "JgXAKCpvCwqH"
 762 |       },
 763 |       "outputs": [],
 764 |       "source": [
 765 |         "torchsummary.summary(OverfittingNeuralNetwork().to(DEVICE), (28*28,))"
 766 |       ]
 767 |     },
 768 |     {
 769 |       "cell_type": "code",
 770 |       "execution_count": null,
 771 |       "metadata": {
 772 |         "id": "Iyuwd4ZLrlpr"
 773 |       },
 774 |       "outputs": [],
 775 |       "source": [
 776 |         "# overfit_model = OverfittingNeuralNetwork().to(DEVICE)\n",
 777 |         "# opt = # YOUR CODE HERE\n",
 778 |         "# loss_func = # YOUR CODE HERE\n",
 779 |         "# n_epoch = # YOUR CODE HERE\n",
 780 |         "\n",
 781 |         "# Your experiments, come here\n",
 782 |         "# train_val_loop(overfit_model, train_loader=train_loader, val_loader=val_loader, name='overfit model', optimizer=opt, criterion=loss_func, n_epoch=n_epoch)"
 783 |       ]
 784 |     },
 785 |     {
 786 |       "cell_type": "code",
 787 |       "source": [
 788 |         "# test_accuracy(overfit_model)"
 789 |       ],
 790 |       "metadata": {
 791 |         "id": "GqFQzpfJIem3"
 792 |       },
 793 |       "execution_count": null,
 794 |       "outputs": []
 795 |     },
 796 |     {
 797 |       "cell_type": "markdown",
 798 |       "metadata": {
 799 |         "id": "LG8mNHtPrlpr"
 800 |       },
 801 |       "source": [
 802 |         "## Task 3.3: Fix it.\n",
 803 |         "Fix the overfitted network from the previous step (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results."
 804 |       ]
 805 |     },
 806 |     {
 807 |       "cell_type": "code",
 808 |       "execution_count": null,
 809 |       "metadata": {
 810 |         "id": "42343iSyrlpr"
 811 |       },
 812 |       "outputs": [],
 813 |       "source": [
 814 |         "class FixedNeuralNetwork(nn.Module):\n",
 815 |         "    def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):\n",
 816 |         "        super(self.__class__, self).__init__()\n",
 817 |         "        self.model = nn.Sequential(\n",
 818 |         "            nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards\n",
 819 |         "            # YOUR CODE HERE\n",
 820 |         "        )\n",
 821 |         "\n",
 822 |         "    def forward(self, inp):\n",
 823 |         "        out = self.model(inp)\n",
 824 |         "        return out"
 825 |       ]
 826 |     },
 827 |     {
 828 |       "cell_type": "code",
 829 |       "source": [
 830 |         "model = FixedNeuralNetwork()\n",
 831 |         "out = model(torch.randn(2, 1, 28, 28))\n",
 832 |         "assert len(out.shape) == 2\n",
 833 |         "assert out.shape[-1] == 10"
 834 |       ],
 835 |       "metadata": {
 836 |         "id": "93Twxz_N6Ade"
 837 |       },
 838 |       "execution_count": null,
 839 |       "outputs": []
 840 |     },
 841 |     {
 842 |       "cell_type": "code",
 843 |       "execution_count": null,
 844 |       "metadata": {
 845 |         "id": "TR1xQBp9rlps"
 846 |       },
 847 |       "outputs": [],
 848 |       "source": [
 849 |         "torchsummary.summary(FixedNeuralNetwork().to(DEVICE), (28*28,))"
 850 |       ]
 851 |     },
 852 |     {
 853 |       "cell_type": "code",
 854 |       "execution_count": null,
 855 |       "metadata": {
 856 |         "id": "OMdEf9Kbrlps"
 857 |       },
 858 |       "outputs": [],
 859 |       "source": [
 860 |         "# fixed_model = FixedNeuralNetwork().to(device)\n",
 861 |         "# opt = # YOUR CODE HERE\n",
 862 |         "# loss_func = # YOUR CODE HERE\n",
 863 |         "# n_epoch = # YOUR CODE HERE\n",
 864 |         "\n",
 865 |         "# Your experiments, come here\n",
 866 |         "# train_val_loop(fixed_model, train_loader=train_loader, val_loader=val_loader, name='fixed model', optimizer=opt, criterion=loss_func, n_epoch=n_epoch)"
 867 |       ]
 868 |     },
 869 |     {
 870 |       "cell_type": "code",
 871 |       "source": [
 872 |         "# test_accuracy(fixed_model)"
 873 |       ],
 874 |       "metadata": {
 875 |         "id": "idv0HgvDIckN"
 876 |       },
 877 |       "execution_count": null,
 878 |       "outputs": []
 879 |     },
 880 |     {
 881 |       "cell_type": "markdown",
 882 |       "metadata": {
 883 |         "id": "dMui_uLJ7G0d"
 884 |       },
 885 |       "source": [
 886 |         "### Conclusions:\n",
 887 |         "_Write down small report with your conclusions and your ideas._\n",
 888 |         "\n",
 889 |         "YOUR WORDS HERE"
 890 |       ]
 891 |     },
 892 |     {
 893 |       "cell_type": "markdown",
 894 |       "source": [
 895 |         "# Task 4. Your own nn layer. (4 points)"
 896 |       ],
 897 |       "metadata": {
 898 |         "id": "7J7Tpbc9udSn"
 899 |       }
 900 |     },
 901 |     {
 902 |       "cell_type": "code",
 903 |       "source": [
 904 |         "class Module(object):\n",
 905 |         "    \"\"\"\n",
 906 |         "    Basically, you can think of a module as of a something (black box)\n",
 907 |         "    which can process `input` data and produce `ouput` data.\n",
 908 |         "    This is like applying a function which is called `forward`:\n",
 909 |         "\n",
 910 |         "        output = module.forward(input)\n",
 911 |         "\n",
 912 |         "    The module should be able to perform a backward pass: to differentiate the `forward` function.\n",
 913 |         "    More, it should be able to differentiate it if is a part of chain (chain rule).\n",
 914 |         "    The latter implies there is a gradient from previous step of a chain rule.\n",
 915 |         "\n",
 916 |         "        gradInput = module.backward(input, gradOutput)\n",
 917 |         "    \"\"\"\n",
 918 |         "    def __init__ (self):\n",
 919 |         "        self.output = None\n",
 920 |         "        self.gradInput = None\n",
 921 |         "        self.training = True\n",
 922 |         "\n",
 923 |         "    def forward(self, input):\n",
 924 |         "        \"\"\"\n",
 925 |         "        Takes an input object, and computes the corresponding output of the module.\n",
 926 |         "        \"\"\"\n",
 927 |         "        return self.updateOutput(input)\n",
 928 |         "\n",
 929 |         "    def backward(self,input, gradOutput):\n",
 930 |         "        \"\"\"\n",
 931 |         "        Performs a backpropagation step through the module, with respect to the given input.\n",
 932 |         "\n",
 933 |         "        This includes\n",
 934 |         "         - computing a gradient w.r.t. `input` (is needed for further backprop),\n",
 935 |         "         - computing a gradient w.r.t. parameters (to update parameters while optimizing).\n",
 936 |         "        \"\"\"\n",
 937 |         "        self.updateGradInput(input, gradOutput)\n",
 938 |         "        self.accGradParameters(input, gradOutput)\n",
 939 |         "        return self.gradInput\n",
 940 |         "\n",
 941 |         "\n",
 942 |         "    def updateOutput(self, input):\n",
 943 |         "        \"\"\"\n",
 944 |         "        Computes the output using the current parameter set of the class and input.\n",
 945 |         "        This function returns the result which is stored in the `output` field.\n",
 946 |         "\n",
 947 |         "        Make sure to both store the data in `output` field and return it.\n",
 948 |         "        \"\"\"\n",
 949 |         "\n",
 950 |         "        # The easiest case:\n",
 951 |         "\n",
 952 |         "        # self.output = input\n",
 953 |         "        # return self.output\n",
 954 |         "\n",
 955 |         "        pass\n",
 956 |         "\n",
 957 |         "    def updateGradInput(self, input, gradOutput):\n",
 958 |         "        \"\"\"\n",
 959 |         "        Computing the gradient of the module with respect to its own input.\n",
 960 |         "        This is returned in `gradInput`. Also, the `gradInput` state variable is updated accordingly.\n",
 961 |         "\n",
 962 |         "        The shape of `gradInput` is always the same as the shape of `input`.\n",
 963 |         "\n",
 964 |         "        Make sure to both store the gradients in `gradInput` field and return it.\n",
 965 |         "        \"\"\"\n",
 966 |         "\n",
 967 |         "        # The easiest case:\n",
 968 |         "\n",
 969 |         "        # self.gradInput = gradOutput\n",
 970 |         "        # return self.gradInput\n",
 971 |         "\n",
 972 |         "        pass\n",
 973 |         "\n",
 974 |         "    def accGradParameters(self, input, gradOutput):\n",
 975 |         "        \"\"\"\n",
 976 |         "        Computing the gradient of the module with respect to its own parameters.\n",
 977 |         "        No need to override if module has no parameters (e.g. ReLU).\n",
 978 |         "        \"\"\"\n",
 979 |         "        pass\n",
 980 |         "\n",
 981 |         "    def zeroGradParameters(self):\n",
 982 |         "        \"\"\"\n",
 983 |         "        Zeroes `gradParams` variable if the module has params.\n",
 984 |         "        \"\"\"\n",
 985 |         "        pass\n",
 986 |         "\n",
 987 |         "    def getParameters(self):\n",
 988 |         "        \"\"\"\n",
 989 |         "        Returns a list with its parameters.\n",
 990 |         "        If the module does not have parameters return empty list.\n",
 991 |         "        \"\"\"\n",
 992 |         "        return []\n",
 993 |         "\n",
 994 |         "    def getGradParameters(self):\n",
 995 |         "        \"\"\"\n",
 996 |         "        Returns a list with gradients with respect to its parameters.\n",
 997 |         "        If the module does not have parameters return empty list.\n",
 998 |         "        \"\"\"\n",
 999 |         "        return []\n",
1000 |         "\n",
1001 |         "    def train(self):\n",
1002 |         "        \"\"\"\n",
1003 |         "        Sets training mode for the module.\n",
1004 |         "        Training and testing behaviour differs for Dropout, BatchNorm.\n",
1005 |         "        \"\"\"\n",
1006 |         "        self.training = True\n",
1007 |         "\n",
1008 |         "    def evaluate(self):\n",
1009 |         "        \"\"\"\n",
1010 |         "        Sets evaluation mode for the module.\n",
1011 |         "        Training and testing behaviour differs for Dropout, BatchNorm.\n",
1012 |         "        \"\"\"\n",
1013 |         "        self.training = False\n",
1014 |         "\n",
1015 |         "    def __repr__(self):\n",
1016 |         "        \"\"\"\n",
1017 |         "        Pretty printing. Should be overrided in every module if you want\n",
1018 |         "        to have readable description.\n",
1019 |         "        \"\"\"\n",
1020 |         "        return \"Module\""
1021 |       ],
1022 |       "metadata": {
1023 |         "id": "vEb-QueVzvMq"
1024 |       },
1025 |       "execution_count": null,
1026 |       "outputs": []
1027 |     },
1028 |     {
1029 |       "cell_type": "markdown",
1030 |       "source": [
1031 |         "### Linear transform layer\n",
1032 |         "Also known as dense layer, fully-connected layer, FC-layer.\n",
1033 |         "You should implement it.\n",
1034 |         "\n",
1035 |         "- input:   **`batch_size x n_feats1`**\n",
1036 |         "- output: **`batch_size x n_feats2`**"
1037 |       ],
1038 |       "metadata": {
1039 |         "id": "HzohCP4qz42l"
1040 |       }
1041 |     },
1042 |     {
1043 |       "cell_type": "code",
1044 |       "source": [
1045 |         "class Linear(Module):\n",
1046 |         "    \"\"\"\n",
1047 |         "    A module which applies a linear transformation\n",
1048 |         "    A common name is fully-connected layer, InnerProductLayer in caffe.\n",
1049 |         "\n",
1050 |         "    The module should work with 2D input of shape (n_samples, n_feature).\n",
1051 |         "    \"\"\"\n",
1052 |         "    def __init__(self, n_in, n_out):\n",
1053 |         "        super(Linear, self).__init__()\n",
1054 |         "\n",
1055 |         "        # This is a nice initialization\n",
1056 |         "        stdv = 1./np.sqrt(n_in)\n",
1057 |         "        #it is important that we should multiply X @ W^T\n",
1058 |         "        self.W = np.random.uniform(-stdv, stdv, size = (n_out, n_in))\n",
1059 |         "        self.b = np.random.uniform(-stdv, stdv, size = n_out)\n",
1060 |         "\n",
1061 |         "        self.gradW = np.zeros_like(self.W)\n",
1062 |         "        self.gradb = np.zeros_like(self.b)\n",
1063 |         "\n",
1064 |         "    def updateOutput(self, input):\n",
1065 |         "        # YOUR CODE HERE\n",
1066 |         "        pass\n",
1067 |         "\n",
1068 |         "    def updateGradInput(self, input, gradOutput):\n",
1069 |         "        # YOUR CODE HERE\n",
1070 |         "        pass\n",
1071 |         "\n",
1072 |         "    def accGradParameters(self, input, gradOutput):\n",
1073 |         "        # YOUR CODE HERE\n",
1074 |         "        pass\n",
1075 |         "\n",
1076 |         "    def zeroGradParameters(self):\n",
1077 |         "        self.gradW.fill(0)\n",
1078 |         "        self.gradb.fill(0)\n",
1079 |         "\n",
1080 |         "    def getParameters(self):\n",
1081 |         "        return [self.W, self.b]\n",
1082 |         "\n",
1083 |         "    def getGradParameters(self):\n",
1084 |         "        return [self.gradW, self.gradb]\n",
1085 |         "\n",
1086 |         "    def __repr__(self):\n",
1087 |         "        s = self.W.shape\n",
1088 |         "        q = 'Linear %d -> %d' %(s[1],s[0])\n",
1089 |         "        return q"
1090 |       ],
1091 |       "metadata": {
1092 |         "id": "bbkgSwhpz1yw"
1093 |       },
1094 |       "execution_count": null,
1095 |       "outputs": []
1096 |     },
1097 |     {
1098 |       "cell_type": "code",
1099 |       "source": [
1100 |         "def test_Linear():\n",
1101 |         "    np.random.seed(42)\n",
1102 |         "    torch.manual_seed(42)\n",
1103 |         "\n",
1104 |         "    batch_size, n_in, n_out = 2, 3, 4\n",
1105 |         "    for _ in range(100):\n",
1106 |         "        # layers initialization\n",
1107 |         "        torch_layer = torch.nn.Linear(n_in, n_out)\n",
1108 |         "        custom_layer = Linear(n_in, n_out)\n",
1109 |         "        custom_layer.W = torch_layer.weight.data.numpy()\n",
1110 |         "        custom_layer.b = torch_layer.bias.data.numpy()\n",
1111 |         "\n",
1112 |         "        layer_input = np.random.uniform(-10, 10, (batch_size, n_in)).astype(np.float32)\n",
1113 |         "        next_layer_grad = np.random.uniform(-10, 10, (batch_size, n_out)).astype(np.float32)\n",
1114 |         "\n",
1115 |         "        # 1. check layer output\n",
1116 |         "        custom_layer_output = custom_layer.updateOutput(layer_input)\n",
1117 |         "        layer_input_var = torch.from_numpy(layer_input).requires_grad_(True)\n",
1118 |         "        torch_layer_output_var = torch_layer(layer_input_var)\n",
1119 |         "        assert np.allclose(torch_layer_output_var.data.numpy(), custom_layer_output, atol=1e-6)\n",
1120 |         "\n",
1121 |         "        # 2. check layer input grad\n",
1122 |         "        custom_layer_grad = custom_layer.updateGradInput(layer_input, next_layer_grad)\n",
1123 |         "        torch_layer_output_var.backward(torch.from_numpy(next_layer_grad))\n",
1124 |         "        torch_layer_grad_var = layer_input_var.grad\n",
1125 |         "        assert np.allclose(torch_layer_grad_var.data.numpy(), custom_layer_grad, atol=1e-6)\n",
1126 |         "\n",
1127 |         "        # 3. check layer parameters grad\n",
1128 |         "        custom_layer.accGradParameters(layer_input, next_layer_grad)\n",
1129 |         "        weight_grad = custom_layer.gradW\n",
1130 |         "        bias_grad = custom_layer.gradb\n",
1131 |         "        torch_weight_grad = torch_layer.weight.grad.data.numpy()\n",
1132 |         "        torch_bias_grad = torch_layer.bias.grad.data.numpy()\n",
1133 |         "        assert np.allclose(torch_weight_grad, weight_grad, atol=1e-6)\n",
1134 |         "        assert np.allclose(torch_bias_grad, bias_grad, atol=1e-6)"
1135 |       ],
1136 |       "metadata": {
1137 |         "id": "M0PEw9VVugYd"
1138 |       },
1139 |       "execution_count": null,
1140 |       "outputs": []
1141 |     },
1142 |     {
1143 |       "cell_type": "code",
1144 |       "source": [
1145 |         "%%time\n",
1146 |         "test_Linear()"
1147 |       ],
1148 |       "metadata": {
1149 |         "id": "xOt8kzgwz7Ev"
1150 |       },
1151 |       "execution_count": null,
1152 |       "outputs": []
1153 |     },
1154 |     {
1155 |       "cell_type": "code",
1156 |       "source": [],
1157 |       "metadata": {
1158 |         "id": "hBPzu7JAauKd"
1159 |       },
1160 |       "execution_count": null,
1161 |       "outputs": []
1162 |     }
1163 |   ]
1164 | }


--------------------------------------------------------------------------------
/hw02/2_Hw_Students.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |   "cells": [
  3 |     {
  4 |       "cell_type": "code",
  5 |       "execution_count": null,
  6 |       "metadata": {
  7 |         "colab": {
  8 |           "base_uri": "https://localhost:8080/"
  9 |         },
 10 |         "id": "tsZpFIaRfROD",
 11 |         "outputId": "e5e6a59d-d91b-41f7-a230-fa4e9bc3e449"
 12 |       },
 13 |       "outputs": [
 14 |         {
 15 |           "output_type": "stream",
 16 |           "name": "stderr",
 17 |           "text": [
 18 |             "/usr/local/lib/python3.10/dist-packages/albumentations/__init__.py:13: UserWarning: A new version of Albumentations is available: 1.4.18 (you have 1.4.15). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.\n",
 19 |             "  check_for_updates()\n"
 20 |           ]
 21 |         }
 22 |       ],
 23 |       "source": [
 24 |         "import torch\n",
 25 |         "import torch.nn as nn\n",
 26 |         "import torchvision.models\n",
 27 |         "from torch.utils.data import Dataset, DataLoader\n",
 28 |         "import torch.optim as optim\n",
 29 |         "import torch.nn.functional as F\n",
 30 |         "\n",
 31 |         "import albumentations as A\n",
 32 |         "from albumentations.pytorch import ToTensorV2\n",
 33 |         "\n",
 34 |         "from tqdm import tqdm\n",
 35 |         "from PIL import Image\n",
 36 |         "import cv2\n",
 37 |         "import matplotlib.pyplot as plt\n",
 38 |         "import numpy as np\n",
 39 |         "\n",
 40 |         "import os\n",
 41 |         "from time import time"
 42 |       ]
 43 |     },
 44 |     {
 45 |       "cell_type": "markdown",
 46 |       "source": [
 47 |         "### Get the data"
 48 |       ],
 49 |       "metadata": {
 50 |         "id": "U9HlqnlYoUJM"
 51 |       }
 52 |     },
 53 |     {
 54 |       "cell_type": "code",
 55 |       "source": [
 56 |         "import gdown\n",
 57 |         "url = 'https://drive.google.com/uc?id=10f1H2T-5W-BiqabHHtlZ4ASs19TZmg8R'\n",
 58 |         "output = 'data.zip'\n",
 59 |         "gdown.download(url, output, quiet=False)"
 60 |       ],
 61 |       "metadata": {
 62 |         "colab": {
 63 |           "base_uri": "https://localhost:8080/",
 64 |           "height": 122
 65 |         },
 66 |         "id": "AYTvLpTFfR9L",
 67 |         "outputId": "3baedbdb-2b28-4ed1-d627-647633ef1d94"
 68 |       },
 69 |       "execution_count": null,
 70 |       "outputs": [
 71 |         {
 72 |           "output_type": "stream",
 73 |           "name": "stderr",
 74 |           "text": [
 75 |             "Downloading...\n",
 76 |             "From (original): https://drive.google.com/uc?id=10f1H2T-5W-BiqabHHtlZ4ASs19TZmg8R\n",
 77 |             "From (redirected): https://drive.google.com/uc?id=10f1H2T-5W-BiqabHHtlZ4ASs19TZmg8R&confirm=t&uuid=8c91a23e-b723-404e-864e-01c84f6f72f9\n",
 78 |             "To: /content/data.zip\n",
 79 |             "100%|██████████| 979M/979M [00:19<00:00, 50.4MB/s]\n"
 80 |           ]
 81 |         },
 82 |         {
 83 |           "output_type": "execute_result",
 84 |           "data": {
 85 |             "text/plain": [
 86 |               "'data.zip'"
 87 |             ],
 88 |             "application/vnd.google.colaboratory.intrinsic+json": {
 89 |               "type": "string"
 90 |             }
 91 |           },
 92 |           "metadata": {},
 93 |           "execution_count": 2
 94 |         }
 95 |       ]
 96 |     },
 97 |     {
 98 |       "cell_type": "code",
 99 |       "source": [
100 |         "!unzip data.zip"
101 |       ],
102 |       "metadata": {
103 |         "id": "TLSvVki2fzUf"
104 |       },
105 |       "execution_count": null,
106 |       "outputs": []
107 |     },
108 |     {
109 |       "cell_type": "markdown",
110 |       "source": [
111 |         "### Utilities (0.5 point)\n",
112 |         "\n",
113 |         "Complete dataset to load prepared images and masks. Don't forget to use augmentations.\n",
114 |         "\n",
115 |         "Some of the images are 1 channels, so use `gray2rgb`."
116 |       ],
117 |       "metadata": {
118 |         "id": "w1g03B9mtZeb"
119 |       }
120 |     },
121 |     {
122 |       "cell_type": "code",
123 |       "source": [
124 |         "def gray2rgb(img):\n",
125 |         "    if len(img.shape) != 3:\n",
126 |         "        img = np.dstack([img, img, img])\n",
127 |         "    return img\n",
128 |         "\n",
129 |         "def get_iou(gt, pred):\n",
130 |         "    pred = pred > 0.5\n",
131 |         "    return (gt & pred).sum() / (gt | pred).sum()\n",
132 |         "\n",
133 |         "class BirdsDataset(Dataset):\n",
134 |         "    def __init__(self, folder, ...) -> None:\n",
135 |         "        images_folder = os.path.join(folder, 'images')\n",
136 |         "        gt_folder = os.path.join(folder, 'gt')\n",
137 |         "\n",
138 |         "        for class_name in os.listdir(images_folder):\n",
139 |         "            for fname in os.listdir(os.path.join(images_folder, class_name)):\n",
140 |         "                # YOUR CODE HERE\n",
141 |         "\n",
142 |         "        self.transform = A.Compose([\n",
143 |         "            # YOUR CODE HERE\n",
144 |         "            ToTensorV2()\n",
145 |         "        ])\n",
146 |         "\n",
147 |         "    def __getitem__(self, index):\n",
148 |         "        # YOUR CODE HERE\n",
149 |         "        img = ...\n",
150 |         "        mask = ...\n",
151 |         "        img = gray2rgb(img)\n",
152 |         "        # YOUR CODE HERE\n",
153 |         "        return transformed_img, transformed_mask\n",
154 |         "\n",
155 |         "    def __len__(self):\n",
156 |         "        # YOUR CODE HERE\n",
157 |         "        return"
158 |       ],
159 |       "metadata": {
160 |         "id": "YT2QUTqFooxJ"
161 |       },
162 |       "execution_count": null,
163 |       "outputs": []
164 |     },
165 |     {
166 |       "cell_type": "markdown",
167 |       "source": [
168 |         "### Architecture (1 point)\n",
169 |         "Your task for today is to build your own Unet to solve the segmentation problem.\n",
170 |         "\n",
171 |         "As an encoder, you can use pre-trained on IMAGENET models(or parts) from torchvision. The decoder must be trained from scratch.\n",
172 |         "It is forbidden to use data not from the `data` folder.\n",
173 |         "\n",
174 |         "I advise you to experiment with the number of blocks so as not to overfit on the training sample and get good quality on validation."
175 |       ],
176 |       "metadata": {
177 |         "id": "dss-ZnpTuI1V"
178 |       }
179 |     },
180 |     {
181 |       "cell_type": "code",
182 |       "source": [
183 |         "class DecoderBlock(nn.Module):\n",
184 |         "    def __init__(self, in_channels, mid_channels, out_channels):\n",
185 |         "        super().__init__()\n",
186 |         "        # YOUR CODE HERE\n",
187 |         "\n",
188 |         "    def forward(self,x):\n",
189 |         "        # YOUR CODE HERE\n",
190 |         "        return\n",
191 |         "\n",
192 |         "class Unet(nn.Module):\n",
193 |         "    def __init__(self):\n",
194 |         "        super().__init__()\n",
195 |         "        # YOUR CODE HERE\n",
196 |         "        # encoder blocks\n",
197 |         "        self.encoder1=\n",
198 |         "        self.encoder2=\n",
199 |         "        self.encoder3=\n",
200 |         "        # decoder blocks\n",
201 |         "        self.decoder1=\n",
202 |         "        self.decoder2=\n",
203 |         "        self.decoder3=\n",
204 |         "\n",
205 |         "\n",
206 |         "    def forward(self,x):\n",
207 |         "        # YOUR CODE HERE\n",
208 |         "        return"
209 |       ],
210 |       "metadata": {
211 |         "id": "_Elr1Uw3uITD"
212 |       },
213 |       "execution_count": null,
214 |       "outputs": []
215 |     },
216 |     {
217 |       "cell_type": "markdown",
218 |       "source": [
219 |         "### Train script (0.5 point)\n",
220 |         "\n",
221 |         "Complete the train and predict scripts."
222 |       ],
223 |       "metadata": {
224 |         "id": "7Sq4WwZsuMeD"
225 |       }
226 |     },
227 |     {
228 |       "cell_type": "code",
229 |       "execution_count": null,
230 |       "metadata": {
231 |         "id": "d_ha44iifROE"
232 |       },
233 |       "outputs": [],
234 |       "source": [
235 |         "def train_segmentation_model(data_path):\n",
236 |         "    BATCH_SIZE = 8\n",
237 |         "    N_EPOCH = 15\n",
238 |         "    DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
239 |         "\n",
240 |         "    train_dataset = BirdsDataset(data_path + 'train')\n",
241 |         "    val_dataset = BirdsDataset(data_path + 'val')\n",
242 |         "    train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)\n",
243 |         "    val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)\n",
244 |         "\n",
245 |         "    model = Unet().to(DEVICE)\n",
246 |         "    optimizer = # YOUR CODE HERE\n",
247 |         "    criterion = # YOUR CODE HERE\n",
248 |         "    losses_train, losses_val, ious_train, ious_val = [], [], [], []\n",
249 |         "\n",
250 |         "    for epoch in range(N_EPOCH):\n",
251 |         "        model.train()\n",
252 |         "\n",
253 |         "        for tqdm(inputs, masks) in train_dataloader:\n",
254 |         "            inputs = inputs.to(DEVICE)\n",
255 |         "            masks = masks.to(DEVICE)\n",
256 |         "            # YOUR CODE HERE\n",
257 |         "        losses_train.append(...)\n",
258 |         "        ious_train.append(...)\n",
259 |         "\n",
260 |         "        model.eval()\n",
261 |         "        with torch.no_grad():\n",
262 |         "            for inputs, masks in tqdm(val_dataloader):\n",
263 |         "                inputs = inputs.to(DEVICE)\n",
264 |         "                masks = masks.to(DEVICE)\n",
265 |         "                # YOUR CODE HERE\n",
266 |         "        losses_val.append(...)\n",
267 |         "        ious_val.append(...)\n",
268 |         "\n",
269 |         "        torch.save(model.state_dict(), f'model_{epoch}.pth')\n",
270 |         "\n",
271 |         "        print(f\"Epoch: {epoch}, train loss: {losses_train[-1]}, val loss: {losses_val[-1]}, train iou: {ious_train[-1]}, val iou: {ious_val[-1]}\")"
272 |       ]
273 |     },
274 |     {
275 |       "cell_type": "code",
276 |       "source": [
277 |         "def predict(model, img_path):\n",
278 |         "    with torch.no_grad():\n",
279 |         "        # YOUR CODE HERE TO PREPARE IMAGE\n",
280 |         "        # GET PREDICTIONS\n",
281 |         "        # POST PROCESS\n",
282 |         "        return segm\n",
283 |         "\n",
284 |         "def get_model(path):\n",
285 |         "    model = Unet()\n",
286 |         "    model.load_state_dict(torch.load(path))\n",
287 |         "    model.eval()\n",
288 |         "    return model"
289 |       ],
290 |       "metadata": {
291 |         "id": "96EkIQmutpdS"
292 |       },
293 |       "execution_count": null,
294 |       "outputs": []
295 |     },
296 |     {
297 |       "cell_type": "code",
298 |       "execution_count": null,
299 |       "metadata": {
300 |         "id": "LzZS9Z2jfROF"
301 |       },
302 |       "outputs": [],
303 |       "source": [
304 |         "train_segmentation_model('data/')"
305 |       ]
306 |     },
307 |     {
308 |       "cell_type": "markdown",
309 |       "source": [
310 |         "You can also experiment with models and write a small report about results. If the report will be meaningful, you will receive an extra point."
311 |       ],
312 |       "metadata": {
313 |         "id": "MWKD09whySKA"
314 |       }
315 |     },
316 |     {
317 |       "cell_type": "markdown",
318 |       "source": [
319 |         "### Testing (8 points)\n",
320 |         "Your model will be tested on the new data, similar to validation, so use techniques to prevent overfitting the model.\n",
321 |         "\n",
322 |         "* IoU > 0.85 — 8 points\n",
323 |         "* IoU > 0.80 — 7 points\n",
324 |         "* IoU > 0.75 — 6 points\n",
325 |         "* IoU > 0.70 — 5 points\n",
326 |         "* IoU > 0.60 — 4 points\n",
327 |         "* IoU > 0.50 — 3 points\n",
328 |         "* IoU > 0.40 — 2 points\n",
329 |         "* IoU > 0.30 — 1 points"
330 |       ],
331 |       "metadata": {
332 |         "id": "zCHacSHutHo4"
333 |       }
334 |     },
335 |     {
336 |       "cell_type": "code",
337 |       "source": [
338 |         "model = get_model('model_14.pth').to('cuda')"
339 |       ],
340 |       "metadata": {
341 |         "id": "DZ6h11Q0tUHN"
342 |       },
343 |       "execution_count": null,
344 |       "outputs": []
345 |     },
346 |     {
347 |       "cell_type": "code",
348 |       "execution_count": null,
349 |       "metadata": {
350 |         "id": "yV9zadusfROF"
351 |       },
352 |       "outputs": [],
353 |       "source": [
354 |         "ious, times = [], []\n",
355 |         "test_dir = 'data/val/'\n",
356 |         "\n",
357 |         "for class_name in tqdm(sorted(os.listdir(os.path.join(test_dir, 'images')))):\n",
358 |         "    for img_name in sorted(os.listdir(os.path.join(test_dir, 'images', class_name))):\n",
359 |         "\n",
360 |         "        t_start = time()\n",
361 |         "        pred = predict(model, os.path.join(test_dir, 'images', class_name, img_name))\n",
362 |         "        times.append(time() - t_start)\n",
363 |         "\n",
364 |         "        gt_name = img_name.replace('jpg', 'png')\n",
365 |         "        gt = np.asarray(Image.open(os.path.join(test_dir, 'gt', class_name, gt_name)), dtype = np.uint8)\n",
366 |         "        if len(gt.shape) > 2:\n",
367 |         "            gt = gt[:, :, 0]\n",
368 |         "\n",
369 |         "        iou = get_iou(gt==255, pred>0.5)\n",
370 |         "        ious.append(iou)\n",
371 |         "\n",
372 |         "np.mean(ious), np.mean(times)"
373 |       ]
374 |     },
375 |     {
376 |       "cell_type": "markdown",
377 |       "source": [
378 |         "### Compression (1 point)"
379 |       ],
380 |       "metadata": {
381 |         "id": "47KgrqdpvKWS"
382 |       }
383 |     },
384 |     {
385 |       "cell_type": "markdown",
386 |       "source": [
387 |         "Try to speed up the model in any way without losing more than 1% in iou score.\n",
388 |         "For example [torch2trt](https://github.com/NVIDIA-AI-IOT/torch2trt)"
389 |       ],
390 |       "metadata": {
391 |         "id": "4kJiLB__vTC3"
392 |       }
393 |     },
394 |     {
395 |       "cell_type": "code",
396 |       "source": [
397 |         "def get_fast_model():\n",
398 |         "    # YOUR CODE HERE\n",
399 |         "    return model"
400 |       ],
401 |       "metadata": {
402 |         "id": "UQyNHbt0vtMu"
403 |       },
404 |       "execution_count": null,
405 |       "outputs": []
406 |     },
407 |     {
408 |       "cell_type": "code",
409 |       "source": [
410 |         "fast_model = get_fast_model().to('cuda')"
411 |       ],
412 |       "metadata": {
413 |         "id": "f2DedST0v6aF"
414 |       },
415 |       "execution_count": null,
416 |       "outputs": []
417 |     },
418 |     {
419 |       "cell_type": "code",
420 |       "source": [
421 |         "ious, times = [], []\n",
422 |         "test_dir = 'data/val/'\n",
423 |         "\n",
424 |         "for class_name in tqdm(sorted(os.listdir(os.path.join(test_dir, 'images')))):\n",
425 |         "    for img_name in sorted(os.listdir(os.path.join(test_dir, 'images', class_name))):\n",
426 |         "\n",
427 |         "        t_start = time()\n",
428 |         "        pred = predict(fast_model, os.path.join(test_dir, 'images', class_name, img_name))\n",
429 |         "        times.append(time() - t_start)\n",
430 |         "\n",
431 |         "        gt_name = img_name.replace('jpg', 'png')\n",
432 |         "        gt = np.asarray(Image.open(os.path.join(test_dir, 'gt', class_name, gt_name)), dtype = np.uint8)\n",
433 |         "        if len(gt.shape) > 2:\n",
434 |         "            gt = gt[:, :, 0]\n",
435 |         "\n",
436 |         "        iou = get_iou(gt==255, pred>0.5)\n",
437 |         "        ious.append(iou)\n",
438 |         "\n",
439 |         "np.mean(ious), np.mean(times)"
440 |       ],
441 |       "metadata": {
442 |         "id": "ryWUekS2vlv8"
443 |       },
444 |       "execution_count": null,
445 |       "outputs": []
446 |     },
447 |     {
448 |       "cell_type": "markdown",
449 |       "source": [
450 |         "**Bonus:** For the best iou score on test(without compression) in group you will get 1.5, 1, 0.5 extra points(for 1st, 2nd, 3rd places)."
451 |       ],
452 |       "metadata": {
453 |         "id": "QCdMgBoOwXAb"
454 |       }
455 |     },
456 |     {
457 |       "cell_type": "code",
458 |       "source": [],
459 |       "metadata": {
460 |         "id": "daanikNkwo5t"
461 |       },
462 |       "execution_count": null,
463 |       "outputs": []
464 |     }
465 |   ],
466 |   "metadata": {
467 |     "kernelspec": {
468 |       "display_name": "Python 3",
469 |       "name": "python3"
470 |     },
471 |     "language_info": {
472 |       "codemirror_mode": {
473 |         "name": "ipython",
474 |         "version": 3
475 |       },
476 |       "file_extension": ".py",
477 |       "mimetype": "text/x-python",
478 |       "name": "python",
479 |       "nbconvert_exporter": "python",
480 |       "pygments_lexer": "ipython3",
481 |       "version": "3.12.2"
482 |     },
483 |     "colab": {
484 |       "provenance": [],
485 |       "gpuType": "T4"
486 |     },
487 |     "accelerator": "GPU"
488 |   },
489 |   "nbformat": 4,
490 |   "nbformat_minor": 0
491 | }


--------------------------------------------------------------------------------
/week01_intro/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week01_intro/lecture.pdf


--------------------------------------------------------------------------------
/week02_init_regularization/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week02_init_regularization/lecture.pdf


--------------------------------------------------------------------------------
/week03_conv/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week03_conv/lecture.pdf


--------------------------------------------------------------------------------
/week04_tricks/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week04_tricks/lecture.pdf


--------------------------------------------------------------------------------
/week05_segmentation/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week05_segmentation/lecture.pdf


--------------------------------------------------------------------------------
/week06_detection/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week06_detection/lecture.pdf


--------------------------------------------------------------------------------
/week07_word_embeddings/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week07_word_embeddings/lecture.pdf


--------------------------------------------------------------------------------
/week07_word_embeddings/seminar.ipynb:
--------------------------------------------------------------------------------
  1 | {
  2 |   "cells": [
  3 |     {
  4 |       "cell_type": "markdown",
  5 |       "metadata": {
  6 |         "id": "hchgr-3Nmn7o"
  7 |       },
  8 |       "source": [
  9 |         "## Seminar 1: Fun with Word Embeddings (3 points)\n",
 10 |         "\n",
 11 |         "Today we gonna play with word embeddings: train our own little embedding, load one from   gensim model zoo and use it to visualize text corpora.\n",
 12 |         "\n",
 13 |         "This whole thing is gonna happen on top of embedding dataset.\n",
 14 |         "\n",
 15 |         "__Requirements:__  `pip install --upgrade nltk gensim bokeh` , but only if you're running locally."
 16 |       ]
 17 |     },
 18 |     {
 19 |       "cell_type": "code",
 20 |       "execution_count": null,
 21 |       "metadata": {
 22 |         "collapsed": true,
 23 |         "id": "QmUCK9lVmn7q"
 24 |       },
 25 |       "outputs": [],
 26 |       "source": [
 27 |         "# download the data:\n",
 28 |         "!wget https://www.dropbox.com/s/obaitrix9jyu84r/quora.txt?dl=1 -O ./quora.txt\n",
 29 |         "# alternative download link: https://yadi.sk/i/BPQrUu1NaTduEw"
 30 |       ]
 31 |     },
 32 |     {
 33 |       "cell_type": "code",
 34 |       "execution_count": null,
 35 |       "metadata": {
 36 |         "scrolled": false,
 37 |         "id": "YyzusR4Lmn7r"
 38 |       },
 39 |       "outputs": [],
 40 |       "source": [
 41 |         "import numpy as np\n",
 42 |         "\n",
 43 |         "data = list(open(\"./quora.txt\", encoding=\"utf-8\"))\n",
 44 |         "data[50]"
 45 |       ]
 46 |     },
 47 |     {
 48 |       "cell_type": "markdown",
 49 |       "metadata": {
 50 |         "id": "jOTmojdtmn7r"
 51 |       },
 52 |       "source": [
 53 |         "__Tokenization:__ a typical first step for an nlp task is to split raw data into words.\n",
 54 |         "The text we're working with is in raw format: with all the punctuation and smiles attached to some words, so a simple str.split won't do.\n",
 55 |         "\n",
 56 |         "Let's use __`nltk`__ - a library that handles many nlp tasks like tokenization, stemming or part-of-speech tagging."
 57 |       ]
 58 |     },
 59 |     {
 60 |       "cell_type": "code",
 61 |       "execution_count": null,
 62 |       "metadata": {
 63 |         "id": "Jya8V2Skmn7r"
 64 |       },
 65 |       "outputs": [],
 66 |       "source": [
 67 |         "from nltk.tokenize import WordPunctTokenizer\n",
 68 |         "tokenizer = WordPunctTokenizer()\n",
 69 |         "\n",
 70 |         "print(tokenizer.tokenize(data[50]))"
 71 |       ]
 72 |     },
 73 |     {
 74 |       "cell_type": "code",
 75 |       "execution_count": null,
 76 |       "metadata": {
 77 |         "collapsed": true,
 78 |         "id": "kitrV92Amn7r"
 79 |       },
 80 |       "outputs": [],
 81 |       "source": [
 82 |         "# TASK: lowercase everything and extract tokens with tokenizer.\n",
 83 |         "# data_tok should be a list of lists of tokens for each line in data.\n",
 84 |         "\n",
 85 |         "data_tok = # YOUR CODE"
 86 |       ]
 87 |     },
 88 |     {
 89 |       "cell_type": "code",
 90 |       "execution_count": null,
 91 |       "metadata": {
 92 |         "collapsed": true,
 93 |         "id": "bD7uAQzgmn7r"
 94 |       },
 95 |       "outputs": [],
 96 |       "source": [
 97 |         "assert all(isinstance(row, (list, tuple)) for row in data_tok), \"please convert each line into a list of tokens (strings)\"\n",
 98 |         "assert all(all(isinstance(tok, str) for tok in row) for row in data_tok), \"please convert each line into a list of tokens (strings)\"\n",
 99 |         "is_latin = lambda tok: all('a' <= x.lower() <= 'z' for x in tok)\n",
100 |         "assert all(map(lambda l: not is_latin(l) or l.islower(), map(' '.join, data_tok))), \"please make sure to lowercase the data\""
101 |       ]
102 |     },
103 |     {
104 |       "cell_type": "code",
105 |       "execution_count": null,
106 |       "metadata": {
107 |         "id": "sm2nO5yzmn7s"
108 |       },
109 |       "outputs": [],
110 |       "source": [
111 |         "print([' '.join(row) for row in data_tok[:2]])"
112 |       ]
113 |     },
114 |     {
115 |       "cell_type": "markdown",
116 |       "metadata": {
117 |         "id": "RloDQkKSmn7s"
118 |       },
119 |       "source": [
120 |         "__Word vectors:__ as the saying goes, there's more than one way to train word embeddings. There's Word2Vec and GloVe with different objective functions. Then there's fasttext that uses character-level models to train word embeddings.\n",
121 |         "\n",
122 |         "The choice is huge, so let's start someplace small: __gensim__ is another nlp library that features many vector-based models incuding word2vec."
123 |       ]
124 |     },
125 |     {
126 |       "cell_type": "code",
127 |       "execution_count": null,
128 |       "metadata": {
129 |         "collapsed": true,
130 |         "id": "HT6ie7OWmn7s"
131 |       },
132 |       "outputs": [],
133 |       "source": [
134 |         "from gensim.models import Word2Vec\n",
135 |         "model = Word2Vec(data_tok,\n",
136 |         "                 vector_size=32,      # embedding vector size\n",
137 |         "                 min_count=5,  # consider words that occured at least 5 times\n",
138 |         "                 window=5).wv  # define context as a 5-word window around the target word"
139 |       ]
140 |     },
141 |     {
142 |       "cell_type": "code",
143 |       "execution_count": null,
144 |       "metadata": {
145 |         "id": "_utr_4ZEmn7s"
146 |       },
147 |       "outputs": [],
148 |       "source": [
149 |         "# now you can get word vectors !\n",
150 |         "model.get_vector('anything')"
151 |       ]
152 |     },
153 |     {
154 |       "cell_type": "code",
155 |       "execution_count": null,
156 |       "metadata": {
157 |         "id": "x7X2rBLImn7s"
158 |       },
159 |       "outputs": [],
160 |       "source": [
161 |         "# or query similar words directly. Go play with it!\n",
162 |         "model.most_similar('bread')"
163 |       ]
164 |     },
165 |     {
166 |       "cell_type": "markdown",
167 |       "metadata": {
168 |         "id": "varM16R3mn7t"
169 |       },
170 |       "source": [
171 |         "### Using pre-trained model\n",
172 |         "\n",
173 |         "Took it a while, huh? Now imagine training life-sized (100~300D) word embeddings on gigabytes of text: wikipedia articles or twitter posts.\n",
174 |         "\n",
175 |         "Thankfully, nowadays you can get a pre-trained word embedding model in 2 lines of code (no sms required, promise)."
176 |       ]
177 |     },
178 |     {
179 |       "cell_type": "code",
180 |       "execution_count": null,
181 |       "metadata": {
182 |         "collapsed": true,
183 |         "id": "oeiEoLrUmn7t"
184 |       },
185 |       "outputs": [],
186 |       "source": [
187 |         "import gensim.downloader as api\n",
188 |         "model = api.load('glove-twitter-100')"
189 |       ]
190 |     },
191 |     {
192 |       "cell_type": "code",
193 |       "execution_count": null,
194 |       "metadata": {
195 |         "id": "ysNoDw7Umn7t"
196 |       },
197 |       "outputs": [],
198 |       "source": [
199 |         "model.most_similar(positive=[\"coder\", \"money\"], negative=[\"brain\"])"
200 |       ]
201 |     },
202 |     {
203 |       "cell_type": "markdown",
204 |       "metadata": {
205 |         "id": "_Kde3hgNmn7t"
206 |       },
207 |       "source": [
208 |         "### Visualizing word vectors\n",
209 |         "\n",
210 |         "One way to see if our vectors are any good is to plot them. Thing is, those vectors are in 30D+ space and we humans are more used to 2-3D.\n",
211 |         "\n",
212 |         "Luckily, we machine learners know about __dimensionality reduction__ methods.\n",
213 |         "\n",
214 |         "Let's use that to plot 1000 most frequent words"
215 |       ]
216 |     },
217 |     {
218 |       "cell_type": "code",
219 |       "execution_count": null,
220 |       "metadata": {
221 |         "id": "l0yKTqYymn7t"
222 |       },
223 |       "outputs": [],
224 |       "source": [
225 |         "words = model.index_to_key[:1000]\n",
226 |         "\n",
227 |         "print(words[::100])"
228 |       ]
229 |     },
230 |     {
231 |       "cell_type": "code",
232 |       "execution_count": null,
233 |       "metadata": {
234 |         "id": "rLxEBnscmn7t"
235 |       },
236 |       "outputs": [],
237 |       "source": [
238 |         "# for each word, compute it's vector with model\n",
239 |         "word_vectors = # YOUR CODE"
240 |       ]
241 |     },
242 |     {
243 |       "cell_type": "code",
244 |       "execution_count": null,
245 |       "metadata": {
246 |         "collapsed": true,
247 |         "id": "lZ06vHSJmn7t"
248 |       },
249 |       "outputs": [],
250 |       "source": [
251 |         "assert isinstance(word_vectors, np.ndarray)\n",
252 |         "assert word_vectors.shape == (len(words), 100)\n",
253 |         "assert np.isfinite(word_vectors).all()"
254 |       ]
255 |     },
256 |     {
257 |       "cell_type": "markdown",
258 |       "metadata": {
259 |         "id": "S2wMcn29mn7t"
260 |       },
261 |       "source": [
262 |         "#### Linear projection: PCA\n",
263 |         "\n",
264 |         "The simplest linear dimensionality reduction method is __P__rincipial __C__omponent __A__nalysis.\n",
265 |         "\n",
266 |         "In geometric terms, PCA tries to find axes along which most of the variance occurs. The \"natural\" axes, if you wish.\n",
267 |         "\n",
268 |         "<img src=\"https://github.com/yandexdataschool/Practical_RL/raw/master/yet_another_week/_resource/pca_fish.png\" style=\"width:30%\">\n",
269 |         "\n",
270 |         "\n",
271 |         "Under the hood, it attempts to decompose object-feature matrix $X$ into two smaller matrices: $W$ and $\\hat W$ minimizing _mean squared error_:\n",
272 |         "\n",
273 |         "$$\\|(X W) \\hat{W} - X\\|^2_2 \\to_{W, \\hat{W}} \\min$$\n",
274 |         "- $X \\in \\mathbb{R}^{n \\times m}$ - object matrix (**centered**);\n",
275 |         "- $W \\in \\mathbb{R}^{m \\times d}$ - matrix of direct transformation;\n",
276 |         "- $\\hat{W} \\in \\mathbb{R}^{d \\times m}$ - matrix of reverse transformation;\n",
277 |         "- $n$ samples, $m$ original dimensions and $d$ target dimensions;\n",
278 |         "\n"
279 |       ]
280 |     },
281 |     {
282 |       "cell_type": "code",
283 |       "execution_count": null,
284 |       "metadata": {
285 |         "collapsed": true,
286 |         "id": "USPP-k-Imn7t"
287 |       },
288 |       "outputs": [],
289 |       "source": [
290 |         "from sklearn.decomposition import PCA\n",
291 |         "\n",
292 |         "# map word vectors onto 2d plane with PCA. Use good old sklearn api (fit, transform)\n",
293 |         "# after that, normalize vectors to make sure they have zero mean and unit variance\n",
294 |         "word_vectors_pca = # YOUR CODE\n",
295 |         "\n",
296 |         "# and maybe MORE OF YOUR CODE here :)"
297 |       ]
298 |     },
299 |     {
300 |       "cell_type": "code",
301 |       "execution_count": null,
302 |       "metadata": {
303 |         "collapsed": true,
304 |         "id": "NV_x7D4omn7t"
305 |       },
306 |       "outputs": [],
307 |       "source": [
308 |         "assert word_vectors_pca.shape == (len(word_vectors), 2), \"there must be a 2d vector for each word\"\n",
309 |         "assert max(abs(word_vectors_pca.mean(0))) < 1e-5, \"points must be zero-centered\""
310 |       ]
311 |     },
312 |     {
313 |       "cell_type": "markdown",
314 |       "metadata": {
315 |         "id": "VnybG7wHmn7t"
316 |       },
317 |       "source": [
318 |         "#### Let's draw it!"
319 |       ]
320 |     },
321 |     {
322 |       "cell_type": "code",
323 |       "execution_count": null,
324 |       "metadata": {
325 |         "id": "jo2-yN80mn7t"
326 |       },
327 |       "outputs": [],
328 |       "source": [
329 |         "import bokeh.models as bm, bokeh.plotting as pl\n",
330 |         "from bokeh.io import output_notebook\n",
331 |         "output_notebook()\n",
332 |         "\n",
333 |         "def draw_vectors(x, y, radius=10, alpha=0.25, color='blue',\n",
334 |         "                 width=600, height=400, show=True, **kwargs):\n",
335 |         "    \"\"\" draws an interactive plot for data points with auxilirary info on hover \"\"\"\n",
336 |         "    if isinstance(color, str): color = [color] * len(x)\n",
337 |         "    data_source = bm.ColumnDataSource({ 'x' : x, 'y' : y, 'color': color, **kwargs })\n",
338 |         "\n",
339 |         "    fig = pl.figure(active_scroll='wheel_zoom', width=width, height=height)\n",
340 |         "    fig.scatter('x', 'y', size=radius, color='color', alpha=alpha, source=data_source)\n",
341 |         "\n",
342 |         "    fig.add_tools(bm.HoverTool(tooltips=[(key, \"@\" + key) for key in kwargs.keys()]))\n",
343 |         "    if show: pl.show(fig)\n",
344 |         "    return fig"
345 |       ]
346 |     },
347 |     {
348 |       "cell_type": "code",
349 |       "execution_count": null,
350 |       "metadata": {
351 |         "id": "6J1c7Q9bmn7t"
352 |       },
353 |       "outputs": [],
354 |       "source": [
355 |         "draw_vectors(word_vectors_pca[:, 0], word_vectors_pca[:, 1], token=words)\n",
356 |         "\n",
357 |         "# hover a mouse over there and see if you can identify the clusters"
358 |       ]
359 |     },
360 |     {
361 |       "cell_type": "markdown",
362 |       "metadata": {
363 |         "id": "u9qhJAptmn7t"
364 |       },
365 |       "source": [
366 |         "### Visualizing neighbors with t-SNE\n",
367 |         "PCA is nice but it's strictly linear and thus only able to capture coarse high-level structure of the data.\n",
368 |         "\n",
369 |         "If we instead want to focus on keeping neighboring points near, we could use TSNE, which is itself an embedding method. Here you can read __[more on TSNE](https://distill.pub/2016/misread-tsne/)__."
370 |       ]
371 |     },
372 |     {
373 |       "cell_type": "code",
374 |       "execution_count": null,
375 |       "metadata": {
376 |         "id": "UeQ2ixkHmn7t"
377 |       },
378 |       "outputs": [],
379 |       "source": [
380 |         "from sklearn.manifold import TSNE\n",
381 |         "\n",
382 |         "# map word vectors onto 2d plane with TSNE. hint: don't panic it may take a minute or two to fit.\n",
383 |         "# normalize them as just lke with pca\n",
384 |         "\n",
385 |         "\n",
386 |         "word_tsne = #YOUR CODE"
387 |       ]
388 |     },
389 |     {
390 |       "cell_type": "code",
391 |       "execution_count": null,
392 |       "metadata": {
393 |         "collapsed": true,
394 |         "scrolled": false,
395 |         "id": "I5sA7faVmn7t"
396 |       },
397 |       "outputs": [],
398 |       "source": [
399 |         "draw_vectors(word_tsne[:, 0], word_tsne[:, 1], color='green', token=words)"
400 |       ]
401 |     },
402 |     {
403 |       "cell_type": "markdown",
404 |       "metadata": {
405 |         "id": "-j4S1Bwbmn7u"
406 |       },
407 |       "source": [
408 |         "### Visualizing phrases\n",
409 |         "\n",
410 |         "Word embeddings can also be used to represent short phrases. The simplest way is to take __an average__ of vectors for all tokens in the phrase with some weights.\n",
411 |         "\n",
412 |         "This trick is useful to identify what data are you working with: find if there are any outliers, clusters or other artefacts.\n",
413 |         "\n",
414 |         "Let's try this new hammer on our data!\n"
415 |       ]
416 |     },
417 |     {
418 |       "cell_type": "code",
419 |       "execution_count": null,
420 |       "metadata": {
421 |         "collapsed": true,
422 |         "id": "zWEBCqxQmn7u"
423 |       },
424 |       "outputs": [],
425 |       "source": [
426 |         "def get_phrase_embedding(phrase):\n",
427 |         "    \"\"\"\n",
428 |         "    Convert phrase to a vector by aggregating it's word embeddings. See description above.\n",
429 |         "    \"\"\"\n",
430 |         "    # 1. lowercase phrase\n",
431 |         "    # 2. tokenize phrase\n",
432 |         "    # 3. average word vectors for all words in tokenized phrase\n",
433 |         "    # skip words that are not in model's vocabulary\n",
434 |         "    # if all words are missing from vocabulary, return zeros\n",
435 |         "\n",
436 |         "    vector = np.zeros([model.vector_size], dtype='float32')\n",
437 |         "\n",
438 |         "    # YOUR CODE\n",
439 |         "\n",
440 |         "    return vector\n",
441 |         "\n"
442 |       ]
443 |     },
444 |     {
445 |       "cell_type": "code",
446 |       "execution_count": null,
447 |       "metadata": {
448 |         "collapsed": true,
449 |         "id": "Upwk1fsNmn7u"
450 |       },
451 |       "outputs": [],
452 |       "source": [
453 |         "vector = get_phrase_embedding(\"I'm very sure. This never happened to me before...\")\n",
454 |         "\n",
455 |         "assert np.allclose(vector[::10],\n",
456 |         "                   np.array([ 0.31807372, -0.02558171,  0.0933293 , -0.1002182 , -1.0278689 ,\n",
457 |         "                             -0.16621883,  0.05083408,  0.17989802,  1.3701859 ,  0.08655966],\n",
458 |         "                              dtype=np.float32))"
459 |       ]
460 |     },
461 |     {
462 |       "cell_type": "code",
463 |       "execution_count": null,
464 |       "metadata": {
465 |         "collapsed": true,
466 |         "id": "e1gQrVSVmn7u"
467 |       },
468 |       "outputs": [],
469 |       "source": [
470 |         "# let's only consider ~5k phrases for a first run.\n",
471 |         "chosen_phrases = data[::len(data) // 1000]\n",
472 |         "\n",
473 |         "# compute vectors for chosen phrases\n",
474 |         "phrase_vectors = # YOUR CODE"
475 |       ]
476 |     },
477 |     {
478 |       "cell_type": "code",
479 |       "execution_count": null,
480 |       "metadata": {
481 |         "collapsed": true,
482 |         "id": "pWXfU6rTmn7u"
483 |       },
484 |       "outputs": [],
485 |       "source": [
486 |         "assert isinstance(phrase_vectors, np.ndarray) and np.isfinite(phrase_vectors).all()\n",
487 |         "assert phrase_vectors.shape == (len(chosen_phrases), model.vector_size)"
488 |       ]
489 |     },
490 |     {
491 |       "cell_type": "code",
492 |       "execution_count": null,
493 |       "metadata": {
494 |         "collapsed": true,
495 |         "id": "g8P1tU0omn7u"
496 |       },
497 |       "outputs": [],
498 |       "source": [
499 |         "# map vectors into 2d space with pca, tsne or your other method of choice\n",
500 |         "# don't forget to normalize\n",
501 |         "\n",
502 |         "phrase_vectors_2d = TSNE().fit_transform(phrase_vectors)\n",
503 |         "\n",
504 |         "phrase_vectors_2d = (phrase_vectors_2d - phrase_vectors_2d.mean(axis=0)) / phrase_vectors_2d.std(axis=0)"
505 |       ]
506 |     },
507 |     {
508 |       "cell_type": "code",
509 |       "execution_count": null,
510 |       "metadata": {
511 |         "collapsed": true,
512 |         "id": "N_zCSz5Zmn7u"
513 |       },
514 |       "outputs": [],
515 |       "source": [
516 |         "draw_vectors(phrase_vectors_2d[:, 0], phrase_vectors_2d[:, 1],\n",
517 |         "             phrase=[phrase[:50] for phrase in chosen_phrases],\n",
518 |         "             radius=20,)"
519 |       ]
520 |     },
521 |     {
522 |       "cell_type": "markdown",
523 |       "metadata": {
524 |         "id": "ML_oG0Nlmn7u"
525 |       },
526 |       "source": [
527 |         "Finally, let's build a simple \"similar question\" engine with phrase embeddings we've built."
528 |       ]
529 |     },
530 |     {
531 |       "cell_type": "code",
532 |       "execution_count": null,
533 |       "metadata": {
534 |         "collapsed": true,
535 |         "id": "tfp3TEFpmn7u"
536 |       },
537 |       "outputs": [],
538 |       "source": [
539 |         "# compute vector embedding for all lines in data\n",
540 |         "data_vectors = np.array([get_phrase_embedding(l) for l in data])"
541 |       ]
542 |     },
543 |     {
544 |       "cell_type": "code",
545 |       "execution_count": null,
546 |       "metadata": {
547 |         "collapsed": true,
548 |         "id": "-F9ozB8umn7u"
549 |       },
550 |       "outputs": [],
551 |       "source": [
552 |         "def find_nearest(query, k=10):\n",
553 |         "    \"\"\"\n",
554 |         "    given text line (query), return k most similar lines from data, sorted from most to least similar\n",
555 |         "    similarity should be measured as cosine between query and line embedding vectors\n",
556 |         "    hint: it's okay to use global variables: data and data_vectors. see also: np.argpartition, np.argsort\n",
557 |         "    \"\"\"\n",
558 |         "    # YOUR CODE\n",
559 |         "\n",
560 |         "    return <YOUR CODE: top-k lines starting from most similar>"
561 |       ]
562 |     },
563 |     {
564 |       "cell_type": "code",
565 |       "execution_count": null,
566 |       "metadata": {
567 |         "collapsed": true,
568 |         "id": "9HKuytD-mn7u"
569 |       },
570 |       "outputs": [],
571 |       "source": [
572 |         "results = find_nearest(query=\"How do i enter the matrix?\", k=10)\n",
573 |         "\n",
574 |         "print(''.join(results))\n",
575 |         "\n",
576 |         "assert len(results) == 10 and isinstance(results[0], str)\n",
577 |         "assert results[0] == 'How do I get to the dark web?\\n'\n",
578 |         "assert results[3] == 'What can I do to save the world?\\n'"
579 |       ]
580 |     },
581 |     {
582 |       "cell_type": "code",
583 |       "execution_count": null,
584 |       "metadata": {
585 |         "collapsed": true,
586 |         "id": "YG1OIIQ3mn7y"
587 |       },
588 |       "outputs": [],
589 |       "source": [
590 |         "find_nearest(query=\"How does Trump?\", k=10)"
591 |       ]
592 |     },
593 |     {
594 |       "cell_type": "code",
595 |       "execution_count": null,
596 |       "metadata": {
597 |         "collapsed": true,
598 |         "id": "YZzWxmukmn7y"
599 |       },
600 |       "outputs": [],
601 |       "source": [
602 |         "find_nearest(query=\"Why don't i ask a question myself?\", k=10)"
603 |       ]
604 |     },
605 |     {
606 |       "cell_type": "markdown",
607 |       "metadata": {
608 |         "collapsed": true,
609 |         "id": "Oj76TY5Ymn7y"
610 |       },
611 |       "source": [
612 |         "__Now what?__\n",
613 |         "* Try running TSNE on all data, not just 1000 phrases\n",
614 |         "* See what other embeddings are there in the model zoo: `gensim.downloader.info()`\n",
615 |         "* Take a look at [FastText](https://github.com/facebookresearch/fastText) embeddings\n",
616 |         "* Optimize find_nearest with locality-sensitive hashing: use [nearpy](https://github.com/pixelogik/NearPy) or `sklearn.neighbors`."
617 |       ]
618 |     }
619 |   ],
620 |   "metadata": {
621 |     "kernelspec": {
622 |       "display_name": "Python 3",
623 |       "language": "python",
624 |       "name": "python3"
625 |     },
626 |     "language_info": {
627 |       "codemirror_mode": {
628 |         "name": "ipython",
629 |         "version": 3
630 |       },
631 |       "file_extension": ".py",
632 |       "mimetype": "text/x-python",
633 |       "name": "python",
634 |       "nbconvert_exporter": "python",
635 |       "pygments_lexer": "ipython3",
636 |       "version": "3.5.2"
637 |     },
638 |     "colab": {
639 |       "provenance": []
640 |     }
641 |   },
642 |   "nbformat": 4,
643 |   "nbformat_minor": 0
644 | }


--------------------------------------------------------------------------------
/week08_text_classification/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week08_text_classification/lecture.pdf


--------------------------------------------------------------------------------
/week09_transformer/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week09_transformer/lecture.pdf


--------------------------------------------------------------------------------
/week09_transformer/seminar.ipynb:
--------------------------------------------------------------------------------
   1 | {
   2 |   "cells": [
   3 |     {
   4 |       "cell_type": "code",
   5 |       "execution_count": null,
   6 |       "metadata": {
   7 |         "id": "zriTdjauH8iQ",
   8 |         "colab": {
   9 |           "base_uri": "https://localhost:8080/"
  10 |         },
  11 |         "outputId": "f21304a0-5eef-4ade-e088-948c5db9a171"
  12 |       },
  13 |       "outputs": [
  14 |         {
  15 |           "output_type": "stream",
  16 |           "name": "stdout",
  17 |           "text": [
  18 |             "\u001b[?25l   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/480.6 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K   \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[90m╺\u001b[0m \u001b[32m471.0/480.6 kB\u001b[0m \u001b[31m22.9 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m480.6/480.6 kB\u001b[0m \u001b[31m10.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
  19 |             "\u001b[?25h\u001b[?25l   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/84.0 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m84.0/84.0 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
  20 |             "\u001b[?25h\u001b[?25l   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/116.3 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m7.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
  21 |             "\u001b[?25h\u001b[?25l   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/179.3 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m179.3/179.3 kB\u001b[0m \u001b[31m12.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
  22 |             "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m10.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
  23 |             "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m12.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
  24 |             "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
  25 |             "gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible.\u001b[0m\u001b[31m\n",
  26 |             "\u001b[0m"
  27 |           ]
  28 |         }
  29 |       ],
  30 |       "source": [
  31 |         "!pip install transformers datasets evaluate -q\n",
  32 |         "import transformers"
  33 |       ]
  34 |     },
  35 |     {
  36 |       "cell_type": "markdown",
  37 |       "metadata": {
  38 |         "id": "xQiRPWWHlSgv"
  39 |       },
  40 |       "source": [
  41 |         "### Using pre-trained transformers (seminar is worth 2 points)\n",
  42 |         "_for fun and profit_\n",
  43 |         "\n",
  44 |         "There are many toolkits that let you access pre-trained transformer models, but the most powerful and convenient by far is [`huggingface/transformers`](https://github.com/huggingface/transformers). In this week's practice, you'll learn how to download, apply and modify pre-trained transformers for a range of tasks. Buckle up, we're going in!\n",
  45 |         "\n",
  46 |         "\n",
  47 |         "__Pipelines:__ if all you want is to apply a pre-trained model, you can do that in one line of code using pipeline. Huggingface/transformers has a selection of pre-configured pipelines for masked language modelling, sentiment classification, question aswering, etc. ([see full list here](https://huggingface.co/transformers/main_classes/pipelines.html))\n",
  48 |         "\n",
  49 |         "A typical pipeline includes:\n",
  50 |         "* pre-processing, e.g. tokenization, subword segmentation\n",
  51 |         "* a backbone model, e.g. bert finetuned for classification\n",
  52 |         "* output post-processing\n",
  53 |         "\n",
  54 |         "Let's see it in action:"
  55 |       ]
  56 |     },
  57 |     {
  58 |       "cell_type": "code",
  59 |       "execution_count": null,
  60 |       "metadata": {
  61 |         "id": "rP1KFtvLlJHR",
  62 |         "colab": {
  63 |           "base_uri": "https://localhost:8080/",
  64 |           "height": 284,
  65 |           "referenced_widgets": [
  66 |             "9fe16c621bd643318bc341864efb3e4d",
  67 |             "d930d2cef8ed4ef29db50c13b9103056",
  68 |             "e8bbd6a633f54344aced163ff020c6b9",
  69 |             "53fd63ef3e6940c3a1a7da0d26f5bf00",
  70 |             "05c022d131f1454e8d9674da8bba015d",
  71 |             "1b51b4f4463341c1acdb4c57dbf8130b",
  72 |             "9fa2a910690040aabcd2b05e449b0c45",
  73 |             "f0ff5f31afd4450ea185123a577df0cf",
  74 |             "be63ea3b752642b6bf7e74e7fdc87430",
  75 |             "8dffc51581f74d7c81c6bdb3feb7ae4f",
  76 |             "e9a828847dcf41b5bbc1eaa44fde2c95",
  77 |             "e4e616b460ef4b11b4c352d467f494dc",
  78 |             "7ce760b40b834494a1efa1b078985bff",
  79 |             "d6f3f9f5db014075bd4fbca506e943f8",
  80 |             "97716ed55a074ce0b52e68be926a09c3",
  81 |             "86551d955c684d43b7f51d52cec868ec",
  82 |             "368689f0e8974d7eae2fd497c996ba20",
  83 |             "826d46d71d35444a8123fffd107d1f1b",
  84 |             "82fb664d4d614a0f9bea5b297783f0d5",
  85 |             "28d6b5e50d4545559dd0d610b30c3b96",
  86 |             "33ee09dba6d5422896396a102f78dbba",
  87 |             "e0a71a1e15fc4070b3147560f0e7d694",
  88 |             "c57238fcb356435ea7ad8daf7761879a",
  89 |             "0e35c262e8d343f7b83016b8c61fd744",
  90 |             "9081142688d44540a28e4b08e4091270",
  91 |             "83fdaa28d4ae47fea15f4ec950f1825f",
  92 |             "493ff70e5b59414c8ff285fc50391b12",
  93 |             "f6eaee53284c427f99126187a1883081",
  94 |             "04c3cf767105473fb6756223ef2d0030",
  95 |             "7ffd07ee1d4d482d8a9f1e2d8ddb2772",
  96 |             "15898c9d06734d60b23ee92d03592c33",
  97 |             "92bef379c00f46e2b02ace047c50a9af",
  98 |             "9a0a067c17b64df2a1184986c2412b26",
  99 |             "668ed89f0cdd460498543af635f4dd68",
 100 |             "ae9729bae35748b9b757ab558d50bdc4",
 101 |             "31797fec525e43aa9f968b63e95b8aaa",
 102 |             "5ec34446ceeb4f37a9f9ca0e43103416",
 103 |             "f0d3166484384e15ae480e8d42504d52",
 104 |             "560e25624e6844c7b92f705b9a6d06f5",
 105 |             "521e0794f59a4d0c8a74858d8ca8802c",
 106 |             "d6ba19a6e2374b34bd7a954a0f29a95a",
 107 |             "d04c2e5c9a9d4dc59a4ef5d61f55faf7",
 108 |             "0b4f45ff04994accb494587a662a648a",
 109 |             "beb8dd797e774760b641d27ecde161bc"
 110 |           ]
 111 |         },
 112 |         "outputId": "fd2868b8-1a26-4873-da4d-011719825781"
 113 |       },
 114 |       "outputs": [
 115 |         {
 116 |           "output_type": "stream",
 117 |           "name": "stderr",
 118 |           "text": [
 119 |             "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: \n",
 120 |             "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
 121 |             "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
 122 |             "You will be able to reuse this secret in all of your notebooks.\n",
 123 |             "Please note that authentication is recommended but still optional to access public models or datasets.\n",
 124 |             "  warnings.warn(\n"
 125 |           ]
 126 |         },
 127 |         {
 128 |           "output_type": "display_data",
 129 |           "data": {
 130 |             "text/plain": [
 131 |               "config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]"
 132 |             ],
 133 |             "application/vnd.jupyter.widget-view+json": {
 134 |               "version_major": 2,
 135 |               "version_minor": 0,
 136 |               "model_id": "9fe16c621bd643318bc341864efb3e4d"
 137 |             }
 138 |           },
 139 |           "metadata": {}
 140 |         },
 141 |         {
 142 |           "output_type": "display_data",
 143 |           "data": {
 144 |             "text/plain": [
 145 |               "model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]"
 146 |             ],
 147 |             "application/vnd.jupyter.widget-view+json": {
 148 |               "version_major": 2,
 149 |               "version_minor": 0,
 150 |               "model_id": "e4e616b460ef4b11b4c352d467f494dc"
 151 |             }
 152 |           },
 153 |           "metadata": {}
 154 |         },
 155 |         {
 156 |           "output_type": "display_data",
 157 |           "data": {
 158 |             "text/plain": [
 159 |               "tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]"
 160 |             ],
 161 |             "application/vnd.jupyter.widget-view+json": {
 162 |               "version_major": 2,
 163 |               "version_minor": 0,
 164 |               "model_id": "c57238fcb356435ea7ad8daf7761879a"
 165 |             }
 166 |           },
 167 |           "metadata": {}
 168 |         },
 169 |         {
 170 |           "output_type": "display_data",
 171 |           "data": {
 172 |             "text/plain": [
 173 |               "vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]"
 174 |             ],
 175 |             "application/vnd.jupyter.widget-view+json": {
 176 |               "version_major": 2,
 177 |               "version_minor": 0,
 178 |               "model_id": "668ed89f0cdd460498543af635f4dd68"
 179 |             }
 180 |           },
 181 |           "metadata": {}
 182 |         },
 183 |         {
 184 |           "output_type": "stream",
 185 |           "name": "stderr",
 186 |           "text": [
 187 |             "Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.\n"
 188 |           ]
 189 |         },
 190 |         {
 191 |           "output_type": "stream",
 192 |           "name": "stdout",
 193 |           "text": [
 194 |             "[{'label': 'POSITIVE', 'score': 0.9998860359191895}]\n"
 195 |           ]
 196 |         }
 197 |       ],
 198 |       "source": [
 199 |         "from transformers import pipeline\n",
 200 |         "classifier = pipeline('sentiment-analysis', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
 201 |         "\n",
 202 |         "print(classifier(\"BERT is amazing!\"))"
 203 |       ]
 204 |     },
 205 |     {
 206 |       "cell_type": "code",
 207 |       "execution_count": null,
 208 |       "metadata": {
 209 |         "id": "nYUNuyXMn5l9"
 210 |       },
 211 |       "outputs": [],
 212 |       "source": [
 213 |         "import base64\n",
 214 |         "data = {\n",
 215 |         "    'arryn': 'As High as Honor.',\n",
 216 |         "    'baratheon': 'Ours is the fury.',\n",
 217 |         "    'stark': 'Winter is coming.',\n",
 218 |         "    'tyrell': 'Growing strong.'\n",
 219 |         "}\n",
 220 |         "\n",
 221 |         "# YOUR CODE: predict sentiment for each noble house and create outputs dict\n",
 222 |         "<...>\n",
 223 |         "outputs = <YOUR CODE: dict (house name) : True if positive, False if negative>\n",
 224 |         "\n",
 225 |         "assert sum(outputs.values()) == 3 and outputs[base64.decodebytes(b'YmFyYXRoZW9u\\n').decode()] == False\n",
 226 |         "print(\"Well done!\")"
 227 |       ]
 228 |     },
 229 |     {
 230 |       "cell_type": "markdown",
 231 |       "metadata": {
 232 |         "id": "BRDhIH-XpSNo"
 233 |       },
 234 |       "source": [
 235 |         "You can also access vanilla Masked Language Model that was trained to predict masked words. Here's how:"
 236 |       ]
 237 |     },
 238 |     {
 239 |       "cell_type": "code",
 240 |       "execution_count": null,
 241 |       "metadata": {
 242 |         "id": "pa-8noIllRbZ"
 243 |       },
 244 |       "outputs": [],
 245 |       "source": [
 246 |         "mlm_model = pipeline('fill-mask', model=\"bert-base-uncased\")\n",
 247 |         "MASK = mlm_model.tokenizer.mask_token\n",
 248 |         "\n",
 249 |         "for hypo in mlm_model(f\"Donald {MASK} is the president of the united states.\"):\n",
 250 |         "  print(f\"P={hypo['score']:.5f}\", hypo['sequence'])"
 251 |       ]
 252 |     },
 253 |     {
 254 |       "cell_type": "code",
 255 |       "execution_count": null,
 256 |       "metadata": {
 257 |         "id": "9NxeG1Y5pwX1"
 258 |       },
 259 |       "outputs": [],
 260 |       "source": [
 261 |         "# Your turn: use bert to recall what year was the Soviet Union founded in\n",
 262 |         "mlm_model(<YOUR PROMPT>)"
 263 |       ]
 264 |     },
 265 |     {
 266 |       "cell_type": "markdown",
 267 |       "metadata": {
 268 |         "id": "YJxRFzCSq903"
 269 |       },
 270 |       "source": [
 271 |         "```\n",
 272 |         "\n",
 273 |         "```\n",
 274 |         "\n",
 275 |         "```\n",
 276 |         "\n",
 277 |         "```\n",
 278 |         "\n",
 279 |         "\n",
 280 |         "Huggingface offers hundreds of pre-trained models that specialize on different tasks. You can quickly find the model you need using [this list](https://huggingface.co/models).\n"
 281 |       ]
 282 |     },
 283 |     {
 284 |       "cell_type": "code",
 285 |       "execution_count": null,
 286 |       "metadata": {
 287 |         "id": "HRux8Qp2hkXr"
 288 |       },
 289 |       "outputs": [],
 290 |       "source": [
 291 |         "text = \"\"\"Almost two-thirds of the 1.5 million people who viewed this liveblog had Googled to discover\n",
 292 |         " the latest on the Rosetta mission. They were treated to this detailed account by the Guardian’s science editor,\n",
 293 |         " Ian Sample, and astronomy writer Stuart Clark of the moment scientists landed a robotic spacecraft on a comet\n",
 294 |         " for the first time in history, and the delirious reaction it provoked at their headquarters in Germany.\n",
 295 |         "  “We are there. We are sitting on the surface. Philae is talking to us,” said one scientist.\n",
 296 |         "\"\"\"\n",
 297 |         "\n",
 298 |         "# Task: create a pipeline for named entity recognition, use task name 'ner' and search for the right model in the list\n",
 299 |         "ner_model = <YOUR CODE>\n",
 300 |         "\n",
 301 |         "named_entities = ner_model(text)"
 302 |       ]
 303 |     },
 304 |     {
 305 |       "cell_type": "code",
 306 |       "execution_count": null,
 307 |       "metadata": {
 308 |         "id": "hf57MRzSiSON"
 309 |       },
 310 |       "outputs": [],
 311 |       "source": [
 312 |         "print('OUTPUT:', named_entities)\n",
 313 |         "word_to_entity = {item['word']: item['entity'] for item in named_entities}\n",
 314 |         "assert 'org' in word_to_entity.get('Guardian').lower() and 'per' in word_to_entity.get('Stuart').lower()\n",
 315 |         "print(\"All tests passed\")"
 316 |       ]
 317 |     },
 318 |     {
 319 |       "cell_type": "markdown",
 320 |       "metadata": {
 321 |         "id": "ULMownz6sP9n"
 322 |       },
 323 |       "source": [
 324 |         "### The building blocks of a pipeline\n",
 325 |         "\n",
 326 |         "Huggingface also allows you to access its pipelines on a lower level. There are two main abstractions for you:\n",
 327 |         "* `Tokenizer` - converts from strings to token ids and back\n",
 328 |         "* `Model` - a pytorch `nn.Module` with pre-trained weights\n",
 329 |         "\n",
 330 |         "You can use such models as part of your regular pytorch code: insert is as a layer in your model, apply it to a batch of data, backpropagate, optimize, etc."
 331 |       ]
 332 |     },
 333 |     {
 334 |       "cell_type": "code",
 335 |       "execution_count": null,
 336 |       "metadata": {
 337 |         "id": "KMJbV0QVsO0Q"
 338 |       },
 339 |       "outputs": [],
 340 |       "source": [
 341 |         "import torch\n",
 342 |         "from transformers import AutoTokenizer, AutoModel, pipeline\n",
 343 |         "\n",
 344 |         "model_name = 'bert-base-uncased'\n",
 345 |         "tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
 346 |         "model = AutoModel.from_pretrained(model_name)\n"
 347 |       ]
 348 |     },
 349 |     {
 350 |       "cell_type": "code",
 351 |       "execution_count": null,
 352 |       "metadata": {
 353 |         "id": "ZgSPHKPRxG6U"
 354 |       },
 355 |       "outputs": [],
 356 |       "source": [
 357 |         "lines = [\n",
 358 |         "    \"Luke, I am your father.\",\n",
 359 |         "    \"Life is what happens when you're busy making other plans.\",\n",
 360 |         "    ]\n",
 361 |         "\n",
 362 |         "# tokenize a batch of inputs. \"pt\" means [p]y[t]orch tensors\n",
 363 |         "tokens_info = tokenizer(lines, padding=True, truncation=True, return_tensors=\"pt\")\n",
 364 |         "\n",
 365 |         "for key in tokens_info:\n",
 366 |         "    print(key, tokens_info[key])\n",
 367 |         "\n",
 368 |         "print(\"Detokenized:\")\n",
 369 |         "for i in range(2):\n",
 370 |         "    print(tokenizer.decode(tokens_info['input_ids'][i]))"
 371 |       ]
 372 |     },
 373 |     {
 374 |       "cell_type": "code",
 375 |       "execution_count": null,
 376 |       "metadata": {
 377 |         "id": "MJkbHxERyfL4"
 378 |       },
 379 |       "outputs": [],
 380 |       "source": [
 381 |         "# You can now apply the model to get embeddings\n",
 382 |         "with torch.no_grad():\n",
 383 |         "    out = model(**tokens_info)\n",
 384 |         "\n",
 385 |         "print(out['pooler_output'])"
 386 |       ]
 387 |     },
 388 |     {
 389 |       "cell_type": "code",
 390 |       "execution_count": null,
 391 |       "metadata": {
 392 |         "id": "vWCajBGcAern"
 393 |       },
 394 |       "outputs": [],
 395 |       "source": [
 396 |         "import torch\n",
 397 |         "import numpy as np\n",
 398 |         "from transformers import GPT2Tokenizer, GPT2LMHeadModel\n",
 399 |         "\n",
 400 |         "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
 401 |         "tokenizer = GPT2Tokenizer.from_pretrained('gpt2', add_prefix_space=True)\n",
 402 |         "model = GPT2LMHeadModel.from_pretrained('gpt2').train(False).to(device)\n",
 403 |         "\n",
 404 |         "text = \"The Fermi paradox \"\n",
 405 |         "tokens = tokenizer.encode(text)\n",
 406 |         "num_steps = 1024 - len(tokens) + 1\n",
 407 |         "line_length, max_length = 0, 70\n",
 408 |         "\n",
 409 |         "print(end=tokenizer.decode(tokens))\n",
 410 |         "\n",
 411 |         "for i in range(num_steps):\n",
 412 |         "    with torch.no_grad():\n",
 413 |         "        logits = model(torch.as_tensor([tokens], device=device))[0]\n",
 414 |         "    p_next = torch.softmax(logits[0, -1, :], dim=-1).data.cpu().numpy()\n",
 415 |         "\n",
 416 |         "    next_token_index = p_next.argmax() #<YOUR CODE: REPLACE THIS LINE>\n",
 417 |         "    # YOUR TASK: change the code so that it performs nucleus sampling\n",
 418 |         "\n",
 419 |         "    tokens.append(int(next_token_index))\n",
 420 |         "    print(end=tokenizer.decode(tokens[-1]))\n",
 421 |         "    line_length += len(tokenizer.decode(tokens[-1]))\n",
 422 |         "    if line_length >= max_length:\n",
 423 |         "        line_length = 0\n",
 424 |         "        print()\n",
 425 |         "\n"
 426 |       ]
 427 |     },
 428 |     {
 429 |       "cell_type": "markdown",
 430 |       "metadata": {
 431 |         "id": "_Vij7Gc1wOaq"
 432 |       },
 433 |       "source": [
 434 |         "Transformers knowledge hub: https://huggingface.co/transformers/"
 435 |       ]
 436 |     },
 437 |     {
 438 |       "cell_type": "markdown",
 439 |       "source": [
 440 |         "Just pytorch (in particular) models, so we can train then as usual"
 441 |       ],
 442 |       "metadata": {
 443 |         "id": "QxMgH1-OGfw7"
 444 |       }
 445 |     },
 446 |     {
 447 |       "cell_type": "code",
 448 |       "source": [
 449 |         "from datasets import load_dataset\n",
 450 |         "from transformers import AutoTokenizer\n",
 451 |         "\n",
 452 |         "raw_datasets = load_dataset(\"glue\", \"mrpc\")\n",
 453 |         "checkpoint = \"bert-base-uncased\"\n",
 454 |         "tokenizer = AutoTokenizer.from_pretrained(checkpoint)\n",
 455 |         "\n",
 456 |         "\n",
 457 |         "def tokenize_function(example):\n",
 458 |         "    return tokenizer(example[\"sentence1\"], example[\"sentence2\"], truncation=True)\n",
 459 |         "\n",
 460 |         "\n",
 461 |         "tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)\n",
 462 |         "tokenized_datasets = tokenized_datasets.remove_columns([\"sentence1\", \"sentence2\", \"idx\"])\n",
 463 |         "tokenized_datasets = tokenized_datasets.rename_column(\"label\", \"labels\")\n",
 464 |         "tokenized_datasets.set_format(\"torch\")\n",
 465 |         "tokenized_datasets[\"train\"].column_names"
 466 |       ],
 467 |       "metadata": {
 468 |         "id": "oWjfs7ZQGhv9"
 469 |       },
 470 |       "execution_count": null,
 471 |       "outputs": []
 472 |     },
 473 |     {
 474 |       "cell_type": "code",
 475 |       "source": [
 476 |         "from torch.utils.data import DataLoader\n",
 477 |         "from transformers import DataCollatorWithPadding\n",
 478 |         "\n",
 479 |         "data_collator = DataCollatorWithPadding(tokenizer=tokenizer)\n",
 480 |         "\n",
 481 |         "train_dataloader = DataLoader(\n",
 482 |         "    tokenized_datasets[\"train\"], shuffle=True, batch_size=8, collate_fn=data_collator\n",
 483 |         ")\n",
 484 |         "eval_dataloader = DataLoader(\n",
 485 |         "    tokenized_datasets[\"validation\"], batch_size=8, collate_fn=data_collator\n",
 486 |         ")"
 487 |       ],
 488 |       "metadata": {
 489 |         "id": "uspRhoQNGrE3"
 490 |       },
 491 |       "execution_count": null,
 492 |       "outputs": []
 493 |     },
 494 |     {
 495 |       "cell_type": "code",
 496 |       "source": [
 497 |         "from transformers import AdamW, AutoModelForSequenceClassification, get_scheduler\n",
 498 |         "import torch\n",
 499 |         "from tqdm import tqdm\n",
 500 |         "\n",
 501 |         "model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)\n",
 502 |         "optimizer = AdamW(model.parameters(), lr=3e-5)\n",
 503 |         "\n",
 504 |         "device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n",
 505 |         "model.to(device)\n",
 506 |         "\n",
 507 |         "num_epochs = 3\n",
 508 |         "num_training_steps = num_epochs * len(train_dataloader)\n",
 509 |         "lr_scheduler = get_scheduler(\n",
 510 |         "    \"linear\",\n",
 511 |         "    optimizer=optimizer,\n",
 512 |         "    num_warmup_steps=0,\n",
 513 |         "    num_training_steps=num_training_steps,\n",
 514 |         ")\n",
 515 |         "\n",
 516 |         "progress_bar = tqdm(range(num_training_steps))\n",
 517 |         "\n",
 518 |         "model.train()\n",
 519 |         "for epoch in range(num_epochs):\n",
 520 |         "    for batch in train_dataloader:\n",
 521 |         "        batch = {k: v.to(device) for k, v in batch.items()}\n",
 522 |         "        outputs = model(**batch)\n",
 523 |         "        loss = outputs.loss\n",
 524 |         "        loss.backward()\n",
 525 |         "\n",
 526 |         "        optimizer.step()\n",
 527 |         "        lr_scheduler.step()\n",
 528 |         "        optimizer.zero_grad()\n",
 529 |         "        progress_bar.update(1)"
 530 |       ],
 531 |       "metadata": {
 532 |         "id": "vo7MlmdiG1Jx"
 533 |       },
 534 |       "execution_count": null,
 535 |       "outputs": []
 536 |     },
 537 |     {
 538 |       "cell_type": "code",
 539 |       "source": [
 540 |         "import evaluate\n",
 541 |         "\n",
 542 |         "metric = evaluate.load(\"glue\", \"mrpc\")\n",
 543 |         "model.eval()\n",
 544 |         "for batch in eval_dataloader:\n",
 545 |         "    batch = {k: v.to(device) for k, v in batch.items()}\n",
 546 |         "    with torch.no_grad():\n",
 547 |         "        outputs = model(**batch)\n",
 548 |         "\n",
 549 |         "    logits = outputs.logits\n",
 550 |         "    predictions = torch.argmax(logits, dim=-1)\n",
 551 |         "    metric.add_batch(predictions=predictions, references=batch[\"labels\"])\n",
 552 |         "\n",
 553 |         "metric.compute()"
 554 |       ],
 555 |       "metadata": {
 556 |         "id": "wqtMj9JbH8-l"
 557 |       },
 558 |       "execution_count": null,
 559 |       "outputs": []
 560 |     }
 561 |   ],
 562 |   "metadata": {
 563 |     "accelerator": "GPU",
 564 |     "colab": {
 565 |       "provenance": []
 566 |     },
 567 |     "kernelspec": {
 568 |       "display_name": "Python 3",
 569 |       "language": "python",
 570 |       "name": "python3"
 571 |     },
 572 |     "language_info": {
 573 |       "codemirror_mode": {
 574 |         "name": "ipython",
 575 |         "version": 3
 576 |       },
 577 |       "file_extension": ".py",
 578 |       "mimetype": "text/x-python",
 579 |       "name": "python",
 580 |       "nbconvert_exporter": "python",
 581 |       "pygments_lexer": "ipython3",
 582 |       "version": "3.8.8"
 583 |     },
 584 |     "widgets": {
 585 |       "application/vnd.jupyter.widget-state+json": {
 586 |         "9fe16c621bd643318bc341864efb3e4d": {
 587 |           "model_module": "@jupyter-widgets/controls",
 588 |           "model_name": "HBoxModel",
 589 |           "model_module_version": "1.5.0",
 590 |           "state": {
 591 |             "_dom_classes": [],
 592 |             "_model_module": "@jupyter-widgets/controls",
 593 |             "_model_module_version": "1.5.0",
 594 |             "_model_name": "HBoxModel",
 595 |             "_view_count": null,
 596 |             "_view_module": "@jupyter-widgets/controls",
 597 |             "_view_module_version": "1.5.0",
 598 |             "_view_name": "HBoxView",
 599 |             "box_style": "",
 600 |             "children": [
 601 |               "IPY_MODEL_d930d2cef8ed4ef29db50c13b9103056",
 602 |               "IPY_MODEL_e8bbd6a633f54344aced163ff020c6b9",
 603 |               "IPY_MODEL_53fd63ef3e6940c3a1a7da0d26f5bf00"
 604 |             ],
 605 |             "layout": "IPY_MODEL_05c022d131f1454e8d9674da8bba015d"
 606 |           }
 607 |         },
 608 |         "d930d2cef8ed4ef29db50c13b9103056": {
 609 |           "model_module": "@jupyter-widgets/controls",
 610 |           "model_name": "HTMLModel",
 611 |           "model_module_version": "1.5.0",
 612 |           "state": {
 613 |             "_dom_classes": [],
 614 |             "_model_module": "@jupyter-widgets/controls",
 615 |             "_model_module_version": "1.5.0",
 616 |             "_model_name": "HTMLModel",
 617 |             "_view_count": null,
 618 |             "_view_module": "@jupyter-widgets/controls",
 619 |             "_view_module_version": "1.5.0",
 620 |             "_view_name": "HTMLView",
 621 |             "description": "",
 622 |             "description_tooltip": null,
 623 |             "layout": "IPY_MODEL_1b51b4f4463341c1acdb4c57dbf8130b",
 624 |             "placeholder": "​",
 625 |             "style": "IPY_MODEL_9fa2a910690040aabcd2b05e449b0c45",
 626 |             "value": "config.json: 100%"
 627 |           }
 628 |         },
 629 |         "e8bbd6a633f54344aced163ff020c6b9": {
 630 |           "model_module": "@jupyter-widgets/controls",
 631 |           "model_name": "FloatProgressModel",
 632 |           "model_module_version": "1.5.0",
 633 |           "state": {
 634 |             "_dom_classes": [],
 635 |             "_model_module": "@jupyter-widgets/controls",
 636 |             "_model_module_version": "1.5.0",
 637 |             "_model_name": "FloatProgressModel",
 638 |             "_view_count": null,
 639 |             "_view_module": "@jupyter-widgets/controls",
 640 |             "_view_module_version": "1.5.0",
 641 |             "_view_name": "ProgressView",
 642 |             "bar_style": "success",
 643 |             "description": "",
 644 |             "description_tooltip": null,
 645 |             "layout": "IPY_MODEL_f0ff5f31afd4450ea185123a577df0cf",
 646 |             "max": 629,
 647 |             "min": 0,
 648 |             "orientation": "horizontal",
 649 |             "style": "IPY_MODEL_be63ea3b752642b6bf7e74e7fdc87430",
 650 |             "value": 629
 651 |           }
 652 |         },
 653 |         "53fd63ef3e6940c3a1a7da0d26f5bf00": {
 654 |           "model_module": "@jupyter-widgets/controls",
 655 |           "model_name": "HTMLModel",
 656 |           "model_module_version": "1.5.0",
 657 |           "state": {
 658 |             "_dom_classes": [],
 659 |             "_model_module": "@jupyter-widgets/controls",
 660 |             "_model_module_version": "1.5.0",
 661 |             "_model_name": "HTMLModel",
 662 |             "_view_count": null,
 663 |             "_view_module": "@jupyter-widgets/controls",
 664 |             "_view_module_version": "1.5.0",
 665 |             "_view_name": "HTMLView",
 666 |             "description": "",
 667 |             "description_tooltip": null,
 668 |             "layout": "IPY_MODEL_8dffc51581f74d7c81c6bdb3feb7ae4f",
 669 |             "placeholder": "​",
 670 |             "style": "IPY_MODEL_e9a828847dcf41b5bbc1eaa44fde2c95",
 671 |             "value": " 629/629 [00:00&lt;00:00, 26.2kB/s]"
 672 |           }
 673 |         },
 674 |         "05c022d131f1454e8d9674da8bba015d": {
 675 |           "model_module": "@jupyter-widgets/base",
 676 |           "model_name": "LayoutModel",
 677 |           "model_module_version": "1.2.0",
 678 |           "state": {
 679 |             "_model_module": "@jupyter-widgets/base",
 680 |             "_model_module_version": "1.2.0",
 681 |             "_model_name": "LayoutModel",
 682 |             "_view_count": null,
 683 |             "_view_module": "@jupyter-widgets/base",
 684 |             "_view_module_version": "1.2.0",
 685 |             "_view_name": "LayoutView",
 686 |             "align_content": null,
 687 |             "align_items": null,
 688 |             "align_self": null,
 689 |             "border": null,
 690 |             "bottom": null,
 691 |             "display": null,
 692 |             "flex": null,
 693 |             "flex_flow": null,
 694 |             "grid_area": null,
 695 |             "grid_auto_columns": null,
 696 |             "grid_auto_flow": null,
 697 |             "grid_auto_rows": null,
 698 |             "grid_column": null,
 699 |             "grid_gap": null,
 700 |             "grid_row": null,
 701 |             "grid_template_areas": null,
 702 |             "grid_template_columns": null,
 703 |             "grid_template_rows": null,
 704 |             "height": null,
 705 |             "justify_content": null,
 706 |             "justify_items": null,
 707 |             "left": null,
 708 |             "margin": null,
 709 |             "max_height": null,
 710 |             "max_width": null,
 711 |             "min_height": null,
 712 |             "min_width": null,
 713 |             "object_fit": null,
 714 |             "object_position": null,
 715 |             "order": null,
 716 |             "overflow": null,
 717 |             "overflow_x": null,
 718 |             "overflow_y": null,
 719 |             "padding": null,
 720 |             "right": null,
 721 |             "top": null,
 722 |             "visibility": null,
 723 |             "width": null
 724 |           }
 725 |         },
 726 |         "1b51b4f4463341c1acdb4c57dbf8130b": {
 727 |           "model_module": "@jupyter-widgets/base",
 728 |           "model_name": "LayoutModel",
 729 |           "model_module_version": "1.2.0",
 730 |           "state": {
 731 |             "_model_module": "@jupyter-widgets/base",
 732 |             "_model_module_version": "1.2.0",
 733 |             "_model_name": "LayoutModel",
 734 |             "_view_count": null,
 735 |             "_view_module": "@jupyter-widgets/base",
 736 |             "_view_module_version": "1.2.0",
 737 |             "_view_name": "LayoutView",
 738 |             "align_content": null,
 739 |             "align_items": null,
 740 |             "align_self": null,
 741 |             "border": null,
 742 |             "bottom": null,
 743 |             "display": null,
 744 |             "flex": null,
 745 |             "flex_flow": null,
 746 |             "grid_area": null,
 747 |             "grid_auto_columns": null,
 748 |             "grid_auto_flow": null,
 749 |             "grid_auto_rows": null,
 750 |             "grid_column": null,
 751 |             "grid_gap": null,
 752 |             "grid_row": null,
 753 |             "grid_template_areas": null,
 754 |             "grid_template_columns": null,
 755 |             "grid_template_rows": null,
 756 |             "height": null,
 757 |             "justify_content": null,
 758 |             "justify_items": null,
 759 |             "left": null,
 760 |             "margin": null,
 761 |             "max_height": null,
 762 |             "max_width": null,
 763 |             "min_height": null,
 764 |             "min_width": null,
 765 |             "object_fit": null,
 766 |             "object_position": null,
 767 |             "order": null,
 768 |             "overflow": null,
 769 |             "overflow_x": null,
 770 |             "overflow_y": null,
 771 |             "padding": null,
 772 |             "right": null,
 773 |             "top": null,
 774 |             "visibility": null,
 775 |             "width": null
 776 |           }
 777 |         },
 778 |         "9fa2a910690040aabcd2b05e449b0c45": {
 779 |           "model_module": "@jupyter-widgets/controls",
 780 |           "model_name": "DescriptionStyleModel",
 781 |           "model_module_version": "1.5.0",
 782 |           "state": {
 783 |             "_model_module": "@jupyter-widgets/controls",
 784 |             "_model_module_version": "1.5.0",
 785 |             "_model_name": "DescriptionStyleModel",
 786 |             "_view_count": null,
 787 |             "_view_module": "@jupyter-widgets/base",
 788 |             "_view_module_version": "1.2.0",
 789 |             "_view_name": "StyleView",
 790 |             "description_width": ""
 791 |           }
 792 |         },
 793 |         "f0ff5f31afd4450ea185123a577df0cf": {
 794 |           "model_module": "@jupyter-widgets/base",
 795 |           "model_name": "LayoutModel",
 796 |           "model_module_version": "1.2.0",
 797 |           "state": {
 798 |             "_model_module": "@jupyter-widgets/base",
 799 |             "_model_module_version": "1.2.0",
 800 |             "_model_name": "LayoutModel",
 801 |             "_view_count": null,
 802 |             "_view_module": "@jupyter-widgets/base",
 803 |             "_view_module_version": "1.2.0",
 804 |             "_view_name": "LayoutView",
 805 |             "align_content": null,
 806 |             "align_items": null,
 807 |             "align_self": null,
 808 |             "border": null,
 809 |             "bottom": null,
 810 |             "display": null,
 811 |             "flex": null,
 812 |             "flex_flow": null,
 813 |             "grid_area": null,
 814 |             "grid_auto_columns": null,
 815 |             "grid_auto_flow": null,
 816 |             "grid_auto_rows": null,
 817 |             "grid_column": null,
 818 |             "grid_gap": null,
 819 |             "grid_row": null,
 820 |             "grid_template_areas": null,
 821 |             "grid_template_columns": null,
 822 |             "grid_template_rows": null,
 823 |             "height": null,
 824 |             "justify_content": null,
 825 |             "justify_items": null,
 826 |             "left": null,
 827 |             "margin": null,
 828 |             "max_height": null,
 829 |             "max_width": null,
 830 |             "min_height": null,
 831 |             "min_width": null,
 832 |             "object_fit": null,
 833 |             "object_position": null,
 834 |             "order": null,
 835 |             "overflow": null,
 836 |             "overflow_x": null,
 837 |             "overflow_y": null,
 838 |             "padding": null,
 839 |             "right": null,
 840 |             "top": null,
 841 |             "visibility": null,
 842 |             "width": null
 843 |           }
 844 |         },
 845 |         "be63ea3b752642b6bf7e74e7fdc87430": {
 846 |           "model_module": "@jupyter-widgets/controls",
 847 |           "model_name": "ProgressStyleModel",
 848 |           "model_module_version": "1.5.0",
 849 |           "state": {
 850 |             "_model_module": "@jupyter-widgets/controls",
 851 |             "_model_module_version": "1.5.0",
 852 |             "_model_name": "ProgressStyleModel",
 853 |             "_view_count": null,
 854 |             "_view_module": "@jupyter-widgets/base",
 855 |             "_view_module_version": "1.2.0",
 856 |             "_view_name": "StyleView",
 857 |             "bar_color": null,
 858 |             "description_width": ""
 859 |           }
 860 |         },
 861 |         "8dffc51581f74d7c81c6bdb3feb7ae4f": {
 862 |           "model_module": "@jupyter-widgets/base",
 863 |           "model_name": "LayoutModel",
 864 |           "model_module_version": "1.2.0",
 865 |           "state": {
 866 |             "_model_module": "@jupyter-widgets/base",
 867 |             "_model_module_version": "1.2.0",
 868 |             "_model_name": "LayoutModel",
 869 |             "_view_count": null,
 870 |             "_view_module": "@jupyter-widgets/base",
 871 |             "_view_module_version": "1.2.0",
 872 |             "_view_name": "LayoutView",
 873 |             "align_content": null,
 874 |             "align_items": null,
 875 |             "align_self": null,
 876 |             "border": null,
 877 |             "bottom": null,
 878 |             "display": null,
 879 |             "flex": null,
 880 |             "flex_flow": null,
 881 |             "grid_area": null,
 882 |             "grid_auto_columns": null,
 883 |             "grid_auto_flow": null,
 884 |             "grid_auto_rows": null,
 885 |             "grid_column": null,
 886 |             "grid_gap": null,
 887 |             "grid_row": null,
 888 |             "grid_template_areas": null,
 889 |             "grid_template_columns": null,
 890 |             "grid_template_rows": null,
 891 |             "height": null,
 892 |             "justify_content": null,
 893 |             "justify_items": null,
 894 |             "left": null,
 895 |             "margin": null,
 896 |             "max_height": null,
 897 |             "max_width": null,
 898 |             "min_height": null,
 899 |             "min_width": null,
 900 |             "object_fit": null,
 901 |             "object_position": null,
 902 |             "order": null,
 903 |             "overflow": null,
 904 |             "overflow_x": null,
 905 |             "overflow_y": null,
 906 |             "padding": null,
 907 |             "right": null,
 908 |             "top": null,
 909 |             "visibility": null,
 910 |             "width": null
 911 |           }
 912 |         },
 913 |         "e9a828847dcf41b5bbc1eaa44fde2c95": {
 914 |           "model_module": "@jupyter-widgets/controls",
 915 |           "model_name": "DescriptionStyleModel",
 916 |           "model_module_version": "1.5.0",
 917 |           "state": {
 918 |             "_model_module": "@jupyter-widgets/controls",
 919 |             "_model_module_version": "1.5.0",
 920 |             "_model_name": "DescriptionStyleModel",
 921 |             "_view_count": null,
 922 |             "_view_module": "@jupyter-widgets/base",
 923 |             "_view_module_version": "1.2.0",
 924 |             "_view_name": "StyleView",
 925 |             "description_width": ""
 926 |           }
 927 |         },
 928 |         "e4e616b460ef4b11b4c352d467f494dc": {
 929 |           "model_module": "@jupyter-widgets/controls",
 930 |           "model_name": "HBoxModel",
 931 |           "model_module_version": "1.5.0",
 932 |           "state": {
 933 |             "_dom_classes": [],
 934 |             "_model_module": "@jupyter-widgets/controls",
 935 |             "_model_module_version": "1.5.0",
 936 |             "_model_name": "HBoxModel",
 937 |             "_view_count": null,
 938 |             "_view_module": "@jupyter-widgets/controls",
 939 |             "_view_module_version": "1.5.0",
 940 |             "_view_name": "HBoxView",
 941 |             "box_style": "",
 942 |             "children": [
 943 |               "IPY_MODEL_7ce760b40b834494a1efa1b078985bff",
 944 |               "IPY_MODEL_d6f3f9f5db014075bd4fbca506e943f8",
 945 |               "IPY_MODEL_97716ed55a074ce0b52e68be926a09c3"
 946 |             ],
 947 |             "layout": "IPY_MODEL_86551d955c684d43b7f51d52cec868ec"
 948 |           }
 949 |         },
 950 |         "7ce760b40b834494a1efa1b078985bff": {
 951 |           "model_module": "@jupyter-widgets/controls",
 952 |           "model_name": "HTMLModel",
 953 |           "model_module_version": "1.5.0",
 954 |           "state": {
 955 |             "_dom_classes": [],
 956 |             "_model_module": "@jupyter-widgets/controls",
 957 |             "_model_module_version": "1.5.0",
 958 |             "_model_name": "HTMLModel",
 959 |             "_view_count": null,
 960 |             "_view_module": "@jupyter-widgets/controls",
 961 |             "_view_module_version": "1.5.0",
 962 |             "_view_name": "HTMLView",
 963 |             "description": "",
 964 |             "description_tooltip": null,
 965 |             "layout": "IPY_MODEL_368689f0e8974d7eae2fd497c996ba20",
 966 |             "placeholder": "​",
 967 |             "style": "IPY_MODEL_826d46d71d35444a8123fffd107d1f1b",
 968 |             "value": "model.safetensors: 100%"
 969 |           }
 970 |         },
 971 |         "d6f3f9f5db014075bd4fbca506e943f8": {
 972 |           "model_module": "@jupyter-widgets/controls",
 973 |           "model_name": "FloatProgressModel",
 974 |           "model_module_version": "1.5.0",
 975 |           "state": {
 976 |             "_dom_classes": [],
 977 |             "_model_module": "@jupyter-widgets/controls",
 978 |             "_model_module_version": "1.5.0",
 979 |             "_model_name": "FloatProgressModel",
 980 |             "_view_count": null,
 981 |             "_view_module": "@jupyter-widgets/controls",
 982 |             "_view_module_version": "1.5.0",
 983 |             "_view_name": "ProgressView",
 984 |             "bar_style": "success",
 985 |             "description": "",
 986 |             "description_tooltip": null,
 987 |             "layout": "IPY_MODEL_82fb664d4d614a0f9bea5b297783f0d5",
 988 |             "max": 267832558,
 989 |             "min": 0,
 990 |             "orientation": "horizontal",
 991 |             "style": "IPY_MODEL_28d6b5e50d4545559dd0d610b30c3b96",
 992 |             "value": 267832558
 993 |           }
 994 |         },
 995 |         "97716ed55a074ce0b52e68be926a09c3": {
 996 |           "model_module": "@jupyter-widgets/controls",
 997 |           "model_name": "HTMLModel",
 998 |           "model_module_version": "1.5.0",
 999 |           "state": {
1000 |             "_dom_classes": [],
1001 |             "_model_module": "@jupyter-widgets/controls",
1002 |             "_model_module_version": "1.5.0",
1003 |             "_model_name": "HTMLModel",
1004 |             "_view_count": null,
1005 |             "_view_module": "@jupyter-widgets/controls",
1006 |             "_view_module_version": "1.5.0",
1007 |             "_view_name": "HTMLView",
1008 |             "description": "",
1009 |             "description_tooltip": null,
1010 |             "layout": "IPY_MODEL_33ee09dba6d5422896396a102f78dbba",
1011 |             "placeholder": "​",
1012 |             "style": "IPY_MODEL_e0a71a1e15fc4070b3147560f0e7d694",
1013 |             "value": " 268M/268M [00:01&lt;00:00, 226MB/s]"
1014 |           }
1015 |         },
1016 |         "86551d955c684d43b7f51d52cec868ec": {
1017 |           "model_module": "@jupyter-widgets/base",
1018 |           "model_name": "LayoutModel",
1019 |           "model_module_version": "1.2.0",
1020 |           "state": {
1021 |             "_model_module": "@jupyter-widgets/base",
1022 |             "_model_module_version": "1.2.0",
1023 |             "_model_name": "LayoutModel",
1024 |             "_view_count": null,
1025 |             "_view_module": "@jupyter-widgets/base",
1026 |             "_view_module_version": "1.2.0",
1027 |             "_view_name": "LayoutView",
1028 |             "align_content": null,
1029 |             "align_items": null,
1030 |             "align_self": null,
1031 |             "border": null,
1032 |             "bottom": null,
1033 |             "display": null,
1034 |             "flex": null,
1035 |             "flex_flow": null,
1036 |             "grid_area": null,
1037 |             "grid_auto_columns": null,
1038 |             "grid_auto_flow": null,
1039 |             "grid_auto_rows": null,
1040 |             "grid_column": null,
1041 |             "grid_gap": null,
1042 |             "grid_row": null,
1043 |             "grid_template_areas": null,
1044 |             "grid_template_columns": null,
1045 |             "grid_template_rows": null,
1046 |             "height": null,
1047 |             "justify_content": null,
1048 |             "justify_items": null,
1049 |             "left": null,
1050 |             "margin": null,
1051 |             "max_height": null,
1052 |             "max_width": null,
1053 |             "min_height": null,
1054 |             "min_width": null,
1055 |             "object_fit": null,
1056 |             "object_position": null,
1057 |             "order": null,
1058 |             "overflow": null,
1059 |             "overflow_x": null,
1060 |             "overflow_y": null,
1061 |             "padding": null,
1062 |             "right": null,
1063 |             "top": null,
1064 |             "visibility": null,
1065 |             "width": null
1066 |           }
1067 |         },
1068 |         "368689f0e8974d7eae2fd497c996ba20": {
1069 |           "model_module": "@jupyter-widgets/base",
1070 |           "model_name": "LayoutModel",
1071 |           "model_module_version": "1.2.0",
1072 |           "state": {
1073 |             "_model_module": "@jupyter-widgets/base",
1074 |             "_model_module_version": "1.2.0",
1075 |             "_model_name": "LayoutModel",
1076 |             "_view_count": null,
1077 |             "_view_module": "@jupyter-widgets/base",
1078 |             "_view_module_version": "1.2.0",
1079 |             "_view_name": "LayoutView",
1080 |             "align_content": null,
1081 |             "align_items": null,
1082 |             "align_self": null,
1083 |             "border": null,
1084 |             "bottom": null,
1085 |             "display": null,
1086 |             "flex": null,
1087 |             "flex_flow": null,
1088 |             "grid_area": null,
1089 |             "grid_auto_columns": null,
1090 |             "grid_auto_flow": null,
1091 |             "grid_auto_rows": null,
1092 |             "grid_column": null,
1093 |             "grid_gap": null,
1094 |             "grid_row": null,
1095 |             "grid_template_areas": null,
1096 |             "grid_template_columns": null,
1097 |             "grid_template_rows": null,
1098 |             "height": null,
1099 |             "justify_content": null,
1100 |             "justify_items": null,
1101 |             "left": null,
1102 |             "margin": null,
1103 |             "max_height": null,
1104 |             "max_width": null,
1105 |             "min_height": null,
1106 |             "min_width": null,
1107 |             "object_fit": null,
1108 |             "object_position": null,
1109 |             "order": null,
1110 |             "overflow": null,
1111 |             "overflow_x": null,
1112 |             "overflow_y": null,
1113 |             "padding": null,
1114 |             "right": null,
1115 |             "top": null,
1116 |             "visibility": null,
1117 |             "width": null
1118 |           }
1119 |         },
1120 |         "826d46d71d35444a8123fffd107d1f1b": {
1121 |           "model_module": "@jupyter-widgets/controls",
1122 |           "model_name": "DescriptionStyleModel",
1123 |           "model_module_version": "1.5.0",
1124 |           "state": {
1125 |             "_model_module": "@jupyter-widgets/controls",
1126 |             "_model_module_version": "1.5.0",
1127 |             "_model_name": "DescriptionStyleModel",
1128 |             "_view_count": null,
1129 |             "_view_module": "@jupyter-widgets/base",
1130 |             "_view_module_version": "1.2.0",
1131 |             "_view_name": "StyleView",
1132 |             "description_width": ""
1133 |           }
1134 |         },
1135 |         "82fb664d4d614a0f9bea5b297783f0d5": {
1136 |           "model_module": "@jupyter-widgets/base",
1137 |           "model_name": "LayoutModel",
1138 |           "model_module_version": "1.2.0",
1139 |           "state": {
1140 |             "_model_module": "@jupyter-widgets/base",
1141 |             "_model_module_version": "1.2.0",
1142 |             "_model_name": "LayoutModel",
1143 |             "_view_count": null,
1144 |             "_view_module": "@jupyter-widgets/base",
1145 |             "_view_module_version": "1.2.0",
1146 |             "_view_name": "LayoutView",
1147 |             "align_content": null,
1148 |             "align_items": null,
1149 |             "align_self": null,
1150 |             "border": null,
1151 |             "bottom": null,
1152 |             "display": null,
1153 |             "flex": null,
1154 |             "flex_flow": null,
1155 |             "grid_area": null,
1156 |             "grid_auto_columns": null,
1157 |             "grid_auto_flow": null,
1158 |             "grid_auto_rows": null,
1159 |             "grid_column": null,
1160 |             "grid_gap": null,
1161 |             "grid_row": null,
1162 |             "grid_template_areas": null,
1163 |             "grid_template_columns": null,
1164 |             "grid_template_rows": null,
1165 |             "height": null,
1166 |             "justify_content": null,
1167 |             "justify_items": null,
1168 |             "left": null,
1169 |             "margin": null,
1170 |             "max_height": null,
1171 |             "max_width": null,
1172 |             "min_height": null,
1173 |             "min_width": null,
1174 |             "object_fit": null,
1175 |             "object_position": null,
1176 |             "order": null,
1177 |             "overflow": null,
1178 |             "overflow_x": null,
1179 |             "overflow_y": null,
1180 |             "padding": null,
1181 |             "right": null,
1182 |             "top": null,
1183 |             "visibility": null,
1184 |             "width": null
1185 |           }
1186 |         },
1187 |         "28d6b5e50d4545559dd0d610b30c3b96": {
1188 |           "model_module": "@jupyter-widgets/controls",
1189 |           "model_name": "ProgressStyleModel",
1190 |           "model_module_version": "1.5.0",
1191 |           "state": {
1192 |             "_model_module": "@jupyter-widgets/controls",
1193 |             "_model_module_version": "1.5.0",
1194 |             "_model_name": "ProgressStyleModel",
1195 |             "_view_count": null,
1196 |             "_view_module": "@jupyter-widgets/base",
1197 |             "_view_module_version": "1.2.0",
1198 |             "_view_name": "StyleView",
1199 |             "bar_color": null,
1200 |             "description_width": ""
1201 |           }
1202 |         },
1203 |         "33ee09dba6d5422896396a102f78dbba": {
1204 |           "model_module": "@jupyter-widgets/base",
1205 |           "model_name": "LayoutModel",
1206 |           "model_module_version": "1.2.0",
1207 |           "state": {
1208 |             "_model_module": "@jupyter-widgets/base",
1209 |             "_model_module_version": "1.2.0",
1210 |             "_model_name": "LayoutModel",
1211 |             "_view_count": null,
1212 |             "_view_module": "@jupyter-widgets/base",
1213 |             "_view_module_version": "1.2.0",
1214 |             "_view_name": "LayoutView",
1215 |             "align_content": null,
1216 |             "align_items": null,
1217 |             "align_self": null,
1218 |             "border": null,
1219 |             "bottom": null,
1220 |             "display": null,
1221 |             "flex": null,
1222 |             "flex_flow": null,
1223 |             "grid_area": null,
1224 |             "grid_auto_columns": null,
1225 |             "grid_auto_flow": null,
1226 |             "grid_auto_rows": null,
1227 |             "grid_column": null,
1228 |             "grid_gap": null,
1229 |             "grid_row": null,
1230 |             "grid_template_areas": null,
1231 |             "grid_template_columns": null,
1232 |             "grid_template_rows": null,
1233 |             "height": null,
1234 |             "justify_content": null,
1235 |             "justify_items": null,
1236 |             "left": null,
1237 |             "margin": null,
1238 |             "max_height": null,
1239 |             "max_width": null,
1240 |             "min_height": null,
1241 |             "min_width": null,
1242 |             "object_fit": null,
1243 |             "object_position": null,
1244 |             "order": null,
1245 |             "overflow": null,
1246 |             "overflow_x": null,
1247 |             "overflow_y": null,
1248 |             "padding": null,
1249 |             "right": null,
1250 |             "top": null,
1251 |             "visibility": null,
1252 |             "width": null
1253 |           }
1254 |         },
1255 |         "e0a71a1e15fc4070b3147560f0e7d694": {
1256 |           "model_module": "@jupyter-widgets/controls",
1257 |           "model_name": "DescriptionStyleModel",
1258 |           "model_module_version": "1.5.0",
1259 |           "state": {
1260 |             "_model_module": "@jupyter-widgets/controls",
1261 |             "_model_module_version": "1.5.0",
1262 |             "_model_name": "DescriptionStyleModel",
1263 |             "_view_count": null,
1264 |             "_view_module": "@jupyter-widgets/base",
1265 |             "_view_module_version": "1.2.0",
1266 |             "_view_name": "StyleView",
1267 |             "description_width": ""
1268 |           }
1269 |         },
1270 |         "c57238fcb356435ea7ad8daf7761879a": {
1271 |           "model_module": "@jupyter-widgets/controls",
1272 |           "model_name": "HBoxModel",
1273 |           "model_module_version": "1.5.0",
1274 |           "state": {
1275 |             "_dom_classes": [],
1276 |             "_model_module": "@jupyter-widgets/controls",
1277 |             "_model_module_version": "1.5.0",
1278 |             "_model_name": "HBoxModel",
1279 |             "_view_count": null,
1280 |             "_view_module": "@jupyter-widgets/controls",
1281 |             "_view_module_version": "1.5.0",
1282 |             "_view_name": "HBoxView",
1283 |             "box_style": "",
1284 |             "children": [
1285 |               "IPY_MODEL_0e35c262e8d343f7b83016b8c61fd744",
1286 |               "IPY_MODEL_9081142688d44540a28e4b08e4091270",
1287 |               "IPY_MODEL_83fdaa28d4ae47fea15f4ec950f1825f"
1288 |             ],
1289 |             "layout": "IPY_MODEL_493ff70e5b59414c8ff285fc50391b12"
1290 |           }
1291 |         },
1292 |         "0e35c262e8d343f7b83016b8c61fd744": {
1293 |           "model_module": "@jupyter-widgets/controls",
1294 |           "model_name": "HTMLModel",
1295 |           "model_module_version": "1.5.0",
1296 |           "state": {
1297 |             "_dom_classes": [],
1298 |             "_model_module": "@jupyter-widgets/controls",
1299 |             "_model_module_version": "1.5.0",
1300 |             "_model_name": "HTMLModel",
1301 |             "_view_count": null,
1302 |             "_view_module": "@jupyter-widgets/controls",
1303 |             "_view_module_version": "1.5.0",
1304 |             "_view_name": "HTMLView",
1305 |             "description": "",
1306 |             "description_tooltip": null,
1307 |             "layout": "IPY_MODEL_f6eaee53284c427f99126187a1883081",
1308 |             "placeholder": "​",
1309 |             "style": "IPY_MODEL_04c3cf767105473fb6756223ef2d0030",
1310 |             "value": "tokenizer_config.json: 100%"
1311 |           }
1312 |         },
1313 |         "9081142688d44540a28e4b08e4091270": {
1314 |           "model_module": "@jupyter-widgets/controls",
1315 |           "model_name": "FloatProgressModel",
1316 |           "model_module_version": "1.5.0",
1317 |           "state": {
1318 |             "_dom_classes": [],
1319 |             "_model_module": "@jupyter-widgets/controls",
1320 |             "_model_module_version": "1.5.0",
1321 |             "_model_name": "FloatProgressModel",
1322 |             "_view_count": null,
1323 |             "_view_module": "@jupyter-widgets/controls",
1324 |             "_view_module_version": "1.5.0",
1325 |             "_view_name": "ProgressView",
1326 |             "bar_style": "success",
1327 |             "description": "",
1328 |             "description_tooltip": null,
1329 |             "layout": "IPY_MODEL_7ffd07ee1d4d482d8a9f1e2d8ddb2772",
1330 |             "max": 48,
1331 |             "min": 0,
1332 |             "orientation": "horizontal",
1333 |             "style": "IPY_MODEL_15898c9d06734d60b23ee92d03592c33",
1334 |             "value": 48
1335 |           }
1336 |         },
1337 |         "83fdaa28d4ae47fea15f4ec950f1825f": {
1338 |           "model_module": "@jupyter-widgets/controls",
1339 |           "model_name": "HTMLModel",
1340 |           "model_module_version": "1.5.0",
1341 |           "state": {
1342 |             "_dom_classes": [],
1343 |             "_model_module": "@jupyter-widgets/controls",
1344 |             "_model_module_version": "1.5.0",
1345 |             "_model_name": "HTMLModel",
1346 |             "_view_count": null,
1347 |             "_view_module": "@jupyter-widgets/controls",
1348 |             "_view_module_version": "1.5.0",
1349 |             "_view_name": "HTMLView",
1350 |             "description": "",
1351 |             "description_tooltip": null,
1352 |             "layout": "IPY_MODEL_92bef379c00f46e2b02ace047c50a9af",
1353 |             "placeholder": "​",
1354 |             "style": "IPY_MODEL_9a0a067c17b64df2a1184986c2412b26",
1355 |             "value": " 48.0/48.0 [00:00&lt;00:00, 2.55kB/s]"
1356 |           }
1357 |         },
1358 |         "493ff70e5b59414c8ff285fc50391b12": {
1359 |           "model_module": "@jupyter-widgets/base",
1360 |           "model_name": "LayoutModel",
1361 |           "model_module_version": "1.2.0",
1362 |           "state": {
1363 |             "_model_module": "@jupyter-widgets/base",
1364 |             "_model_module_version": "1.2.0",
1365 |             "_model_name": "LayoutModel",
1366 |             "_view_count": null,
1367 |             "_view_module": "@jupyter-widgets/base",
1368 |             "_view_module_version": "1.2.0",
1369 |             "_view_name": "LayoutView",
1370 |             "align_content": null,
1371 |             "align_items": null,
1372 |             "align_self": null,
1373 |             "border": null,
1374 |             "bottom": null,
1375 |             "display": null,
1376 |             "flex": null,
1377 |             "flex_flow": null,
1378 |             "grid_area": null,
1379 |             "grid_auto_columns": null,
1380 |             "grid_auto_flow": null,
1381 |             "grid_auto_rows": null,
1382 |             "grid_column": null,
1383 |             "grid_gap": null,
1384 |             "grid_row": null,
1385 |             "grid_template_areas": null,
1386 |             "grid_template_columns": null,
1387 |             "grid_template_rows": null,
1388 |             "height": null,
1389 |             "justify_content": null,
1390 |             "justify_items": null,
1391 |             "left": null,
1392 |             "margin": null,
1393 |             "max_height": null,
1394 |             "max_width": null,
1395 |             "min_height": null,
1396 |             "min_width": null,
1397 |             "object_fit": null,
1398 |             "object_position": null,
1399 |             "order": null,
1400 |             "overflow": null,
1401 |             "overflow_x": null,
1402 |             "overflow_y": null,
1403 |             "padding": null,
1404 |             "right": null,
1405 |             "top": null,
1406 |             "visibility": null,
1407 |             "width": null
1408 |           }
1409 |         },
1410 |         "f6eaee53284c427f99126187a1883081": {
1411 |           "model_module": "@jupyter-widgets/base",
1412 |           "model_name": "LayoutModel",
1413 |           "model_module_version": "1.2.0",
1414 |           "state": {
1415 |             "_model_module": "@jupyter-widgets/base",
1416 |             "_model_module_version": "1.2.0",
1417 |             "_model_name": "LayoutModel",
1418 |             "_view_count": null,
1419 |             "_view_module": "@jupyter-widgets/base",
1420 |             "_view_module_version": "1.2.0",
1421 |             "_view_name": "LayoutView",
1422 |             "align_content": null,
1423 |             "align_items": null,
1424 |             "align_self": null,
1425 |             "border": null,
1426 |             "bottom": null,
1427 |             "display": null,
1428 |             "flex": null,
1429 |             "flex_flow": null,
1430 |             "grid_area": null,
1431 |             "grid_auto_columns": null,
1432 |             "grid_auto_flow": null,
1433 |             "grid_auto_rows": null,
1434 |             "grid_column": null,
1435 |             "grid_gap": null,
1436 |             "grid_row": null,
1437 |             "grid_template_areas": null,
1438 |             "grid_template_columns": null,
1439 |             "grid_template_rows": null,
1440 |             "height": null,
1441 |             "justify_content": null,
1442 |             "justify_items": null,
1443 |             "left": null,
1444 |             "margin": null,
1445 |             "max_height": null,
1446 |             "max_width": null,
1447 |             "min_height": null,
1448 |             "min_width": null,
1449 |             "object_fit": null,
1450 |             "object_position": null,
1451 |             "order": null,
1452 |             "overflow": null,
1453 |             "overflow_x": null,
1454 |             "overflow_y": null,
1455 |             "padding": null,
1456 |             "right": null,
1457 |             "top": null,
1458 |             "visibility": null,
1459 |             "width": null
1460 |           }
1461 |         },
1462 |         "04c3cf767105473fb6756223ef2d0030": {
1463 |           "model_module": "@jupyter-widgets/controls",
1464 |           "model_name": "DescriptionStyleModel",
1465 |           "model_module_version": "1.5.0",
1466 |           "state": {
1467 |             "_model_module": "@jupyter-widgets/controls",
1468 |             "_model_module_version": "1.5.0",
1469 |             "_model_name": "DescriptionStyleModel",
1470 |             "_view_count": null,
1471 |             "_view_module": "@jupyter-widgets/base",
1472 |             "_view_module_version": "1.2.0",
1473 |             "_view_name": "StyleView",
1474 |             "description_width": ""
1475 |           }
1476 |         },
1477 |         "7ffd07ee1d4d482d8a9f1e2d8ddb2772": {
1478 |           "model_module": "@jupyter-widgets/base",
1479 |           "model_name": "LayoutModel",
1480 |           "model_module_version": "1.2.0",
1481 |           "state": {
1482 |             "_model_module": "@jupyter-widgets/base",
1483 |             "_model_module_version": "1.2.0",
1484 |             "_model_name": "LayoutModel",
1485 |             "_view_count": null,
1486 |             "_view_module": "@jupyter-widgets/base",
1487 |             "_view_module_version": "1.2.0",
1488 |             "_view_name": "LayoutView",
1489 |             "align_content": null,
1490 |             "align_items": null,
1491 |             "align_self": null,
1492 |             "border": null,
1493 |             "bottom": null,
1494 |             "display": null,
1495 |             "flex": null,
1496 |             "flex_flow": null,
1497 |             "grid_area": null,
1498 |             "grid_auto_columns": null,
1499 |             "grid_auto_flow": null,
1500 |             "grid_auto_rows": null,
1501 |             "grid_column": null,
1502 |             "grid_gap": null,
1503 |             "grid_row": null,
1504 |             "grid_template_areas": null,
1505 |             "grid_template_columns": null,
1506 |             "grid_template_rows": null,
1507 |             "height": null,
1508 |             "justify_content": null,
1509 |             "justify_items": null,
1510 |             "left": null,
1511 |             "margin": null,
1512 |             "max_height": null,
1513 |             "max_width": null,
1514 |             "min_height": null,
1515 |             "min_width": null,
1516 |             "object_fit": null,
1517 |             "object_position": null,
1518 |             "order": null,
1519 |             "overflow": null,
1520 |             "overflow_x": null,
1521 |             "overflow_y": null,
1522 |             "padding": null,
1523 |             "right": null,
1524 |             "top": null,
1525 |             "visibility": null,
1526 |             "width": null
1527 |           }
1528 |         },
1529 |         "15898c9d06734d60b23ee92d03592c33": {
1530 |           "model_module": "@jupyter-widgets/controls",
1531 |           "model_name": "ProgressStyleModel",
1532 |           "model_module_version": "1.5.0",
1533 |           "state": {
1534 |             "_model_module": "@jupyter-widgets/controls",
1535 |             "_model_module_version": "1.5.0",
1536 |             "_model_name": "ProgressStyleModel",
1537 |             "_view_count": null,
1538 |             "_view_module": "@jupyter-widgets/base",
1539 |             "_view_module_version": "1.2.0",
1540 |             "_view_name": "StyleView",
1541 |             "bar_color": null,
1542 |             "description_width": ""
1543 |           }
1544 |         },
1545 |         "92bef379c00f46e2b02ace047c50a9af": {
1546 |           "model_module": "@jupyter-widgets/base",
1547 |           "model_name": "LayoutModel",
1548 |           "model_module_version": "1.2.0",
1549 |           "state": {
1550 |             "_model_module": "@jupyter-widgets/base",
1551 |             "_model_module_version": "1.2.0",
1552 |             "_model_name": "LayoutModel",
1553 |             "_view_count": null,
1554 |             "_view_module": "@jupyter-widgets/base",
1555 |             "_view_module_version": "1.2.0",
1556 |             "_view_name": "LayoutView",
1557 |             "align_content": null,
1558 |             "align_items": null,
1559 |             "align_self": null,
1560 |             "border": null,
1561 |             "bottom": null,
1562 |             "display": null,
1563 |             "flex": null,
1564 |             "flex_flow": null,
1565 |             "grid_area": null,
1566 |             "grid_auto_columns": null,
1567 |             "grid_auto_flow": null,
1568 |             "grid_auto_rows": null,
1569 |             "grid_column": null,
1570 |             "grid_gap": null,
1571 |             "grid_row": null,
1572 |             "grid_template_areas": null,
1573 |             "grid_template_columns": null,
1574 |             "grid_template_rows": null,
1575 |             "height": null,
1576 |             "justify_content": null,
1577 |             "justify_items": null,
1578 |             "left": null,
1579 |             "margin": null,
1580 |             "max_height": null,
1581 |             "max_width": null,
1582 |             "min_height": null,
1583 |             "min_width": null,
1584 |             "object_fit": null,
1585 |             "object_position": null,
1586 |             "order": null,
1587 |             "overflow": null,
1588 |             "overflow_x": null,
1589 |             "overflow_y": null,
1590 |             "padding": null,
1591 |             "right": null,
1592 |             "top": null,
1593 |             "visibility": null,
1594 |             "width": null
1595 |           }
1596 |         },
1597 |         "9a0a067c17b64df2a1184986c2412b26": {
1598 |           "model_module": "@jupyter-widgets/controls",
1599 |           "model_name": "DescriptionStyleModel",
1600 |           "model_module_version": "1.5.0",
1601 |           "state": {
1602 |             "_model_module": "@jupyter-widgets/controls",
1603 |             "_model_module_version": "1.5.0",
1604 |             "_model_name": "DescriptionStyleModel",
1605 |             "_view_count": null,
1606 |             "_view_module": "@jupyter-widgets/base",
1607 |             "_view_module_version": "1.2.0",
1608 |             "_view_name": "StyleView",
1609 |             "description_width": ""
1610 |           }
1611 |         },
1612 |         "668ed89f0cdd460498543af635f4dd68": {
1613 |           "model_module": "@jupyter-widgets/controls",
1614 |           "model_name": "HBoxModel",
1615 |           "model_module_version": "1.5.0",
1616 |           "state": {
1617 |             "_dom_classes": [],
1618 |             "_model_module": "@jupyter-widgets/controls",
1619 |             "_model_module_version": "1.5.0",
1620 |             "_model_name": "HBoxModel",
1621 |             "_view_count": null,
1622 |             "_view_module": "@jupyter-widgets/controls",
1623 |             "_view_module_version": "1.5.0",
1624 |             "_view_name": "HBoxView",
1625 |             "box_style": "",
1626 |             "children": [
1627 |               "IPY_MODEL_ae9729bae35748b9b757ab558d50bdc4",
1628 |               "IPY_MODEL_31797fec525e43aa9f968b63e95b8aaa",
1629 |               "IPY_MODEL_5ec34446ceeb4f37a9f9ca0e43103416"
1630 |             ],
1631 |             "layout": "IPY_MODEL_f0d3166484384e15ae480e8d42504d52"
1632 |           }
1633 |         },
1634 |         "ae9729bae35748b9b757ab558d50bdc4": {
1635 |           "model_module": "@jupyter-widgets/controls",
1636 |           "model_name": "HTMLModel",
1637 |           "model_module_version": "1.5.0",
1638 |           "state": {
1639 |             "_dom_classes": [],
1640 |             "_model_module": "@jupyter-widgets/controls",
1641 |             "_model_module_version": "1.5.0",
1642 |             "_model_name": "HTMLModel",
1643 |             "_view_count": null,
1644 |             "_view_module": "@jupyter-widgets/controls",
1645 |             "_view_module_version": "1.5.0",
1646 |             "_view_name": "HTMLView",
1647 |             "description": "",
1648 |             "description_tooltip": null,
1649 |             "layout": "IPY_MODEL_560e25624e6844c7b92f705b9a6d06f5",
1650 |             "placeholder": "​",
1651 |             "style": "IPY_MODEL_521e0794f59a4d0c8a74858d8ca8802c",
1652 |             "value": "vocab.txt: 100%"
1653 |           }
1654 |         },
1655 |         "31797fec525e43aa9f968b63e95b8aaa": {
1656 |           "model_module": "@jupyter-widgets/controls",
1657 |           "model_name": "FloatProgressModel",
1658 |           "model_module_version": "1.5.0",
1659 |           "state": {
1660 |             "_dom_classes": [],
1661 |             "_model_module": "@jupyter-widgets/controls",
1662 |             "_model_module_version": "1.5.0",
1663 |             "_model_name": "FloatProgressModel",
1664 |             "_view_count": null,
1665 |             "_view_module": "@jupyter-widgets/controls",
1666 |             "_view_module_version": "1.5.0",
1667 |             "_view_name": "ProgressView",
1668 |             "bar_style": "success",
1669 |             "description": "",
1670 |             "description_tooltip": null,
1671 |             "layout": "IPY_MODEL_d6ba19a6e2374b34bd7a954a0f29a95a",
1672 |             "max": 231508,
1673 |             "min": 0,
1674 |             "orientation": "horizontal",
1675 |             "style": "IPY_MODEL_d04c2e5c9a9d4dc59a4ef5d61f55faf7",
1676 |             "value": 231508
1677 |           }
1678 |         },
1679 |         "5ec34446ceeb4f37a9f9ca0e43103416": {
1680 |           "model_module": "@jupyter-widgets/controls",
1681 |           "model_name": "HTMLModel",
1682 |           "model_module_version": "1.5.0",
1683 |           "state": {
1684 |             "_dom_classes": [],
1685 |             "_model_module": "@jupyter-widgets/controls",
1686 |             "_model_module_version": "1.5.0",
1687 |             "_model_name": "HTMLModel",
1688 |             "_view_count": null,
1689 |             "_view_module": "@jupyter-widgets/controls",
1690 |             "_view_module_version": "1.5.0",
1691 |             "_view_name": "HTMLView",
1692 |             "description": "",
1693 |             "description_tooltip": null,
1694 |             "layout": "IPY_MODEL_0b4f45ff04994accb494587a662a648a",
1695 |             "placeholder": "​",
1696 |             "style": "IPY_MODEL_beb8dd797e774760b641d27ecde161bc",
1697 |             "value": " 232k/232k [00:00&lt;00:00, 1.69MB/s]"
1698 |           }
1699 |         },
1700 |         "f0d3166484384e15ae480e8d42504d52": {
1701 |           "model_module": "@jupyter-widgets/base",
1702 |           "model_name": "LayoutModel",
1703 |           "model_module_version": "1.2.0",
1704 |           "state": {
1705 |             "_model_module": "@jupyter-widgets/base",
1706 |             "_model_module_version": "1.2.0",
1707 |             "_model_name": "LayoutModel",
1708 |             "_view_count": null,
1709 |             "_view_module": "@jupyter-widgets/base",
1710 |             "_view_module_version": "1.2.0",
1711 |             "_view_name": "LayoutView",
1712 |             "align_content": null,
1713 |             "align_items": null,
1714 |             "align_self": null,
1715 |             "border": null,
1716 |             "bottom": null,
1717 |             "display": null,
1718 |             "flex": null,
1719 |             "flex_flow": null,
1720 |             "grid_area": null,
1721 |             "grid_auto_columns": null,
1722 |             "grid_auto_flow": null,
1723 |             "grid_auto_rows": null,
1724 |             "grid_column": null,
1725 |             "grid_gap": null,
1726 |             "grid_row": null,
1727 |             "grid_template_areas": null,
1728 |             "grid_template_columns": null,
1729 |             "grid_template_rows": null,
1730 |             "height": null,
1731 |             "justify_content": null,
1732 |             "justify_items": null,
1733 |             "left": null,
1734 |             "margin": null,
1735 |             "max_height": null,
1736 |             "max_width": null,
1737 |             "min_height": null,
1738 |             "min_width": null,
1739 |             "object_fit": null,
1740 |             "object_position": null,
1741 |             "order": null,
1742 |             "overflow": null,
1743 |             "overflow_x": null,
1744 |             "overflow_y": null,
1745 |             "padding": null,
1746 |             "right": null,
1747 |             "top": null,
1748 |             "visibility": null,
1749 |             "width": null
1750 |           }
1751 |         },
1752 |         "560e25624e6844c7b92f705b9a6d06f5": {
1753 |           "model_module": "@jupyter-widgets/base",
1754 |           "model_name": "LayoutModel",
1755 |           "model_module_version": "1.2.0",
1756 |           "state": {
1757 |             "_model_module": "@jupyter-widgets/base",
1758 |             "_model_module_version": "1.2.0",
1759 |             "_model_name": "LayoutModel",
1760 |             "_view_count": null,
1761 |             "_view_module": "@jupyter-widgets/base",
1762 |             "_view_module_version": "1.2.0",
1763 |             "_view_name": "LayoutView",
1764 |             "align_content": null,
1765 |             "align_items": null,
1766 |             "align_self": null,
1767 |             "border": null,
1768 |             "bottom": null,
1769 |             "display": null,
1770 |             "flex": null,
1771 |             "flex_flow": null,
1772 |             "grid_area": null,
1773 |             "grid_auto_columns": null,
1774 |             "grid_auto_flow": null,
1775 |             "grid_auto_rows": null,
1776 |             "grid_column": null,
1777 |             "grid_gap": null,
1778 |             "grid_row": null,
1779 |             "grid_template_areas": null,
1780 |             "grid_template_columns": null,
1781 |             "grid_template_rows": null,
1782 |             "height": null,
1783 |             "justify_content": null,
1784 |             "justify_items": null,
1785 |             "left": null,
1786 |             "margin": null,
1787 |             "max_height": null,
1788 |             "max_width": null,
1789 |             "min_height": null,
1790 |             "min_width": null,
1791 |             "object_fit": null,
1792 |             "object_position": null,
1793 |             "order": null,
1794 |             "overflow": null,
1795 |             "overflow_x": null,
1796 |             "overflow_y": null,
1797 |             "padding": null,
1798 |             "right": null,
1799 |             "top": null,
1800 |             "visibility": null,
1801 |             "width": null
1802 |           }
1803 |         },
1804 |         "521e0794f59a4d0c8a74858d8ca8802c": {
1805 |           "model_module": "@jupyter-widgets/controls",
1806 |           "model_name": "DescriptionStyleModel",
1807 |           "model_module_version": "1.5.0",
1808 |           "state": {
1809 |             "_model_module": "@jupyter-widgets/controls",
1810 |             "_model_module_version": "1.5.0",
1811 |             "_model_name": "DescriptionStyleModel",
1812 |             "_view_count": null,
1813 |             "_view_module": "@jupyter-widgets/base",
1814 |             "_view_module_version": "1.2.0",
1815 |             "_view_name": "StyleView",
1816 |             "description_width": ""
1817 |           }
1818 |         },
1819 |         "d6ba19a6e2374b34bd7a954a0f29a95a": {
1820 |           "model_module": "@jupyter-widgets/base",
1821 |           "model_name": "LayoutModel",
1822 |           "model_module_version": "1.2.0",
1823 |           "state": {
1824 |             "_model_module": "@jupyter-widgets/base",
1825 |             "_model_module_version": "1.2.0",
1826 |             "_model_name": "LayoutModel",
1827 |             "_view_count": null,
1828 |             "_view_module": "@jupyter-widgets/base",
1829 |             "_view_module_version": "1.2.0",
1830 |             "_view_name": "LayoutView",
1831 |             "align_content": null,
1832 |             "align_items": null,
1833 |             "align_self": null,
1834 |             "border": null,
1835 |             "bottom": null,
1836 |             "display": null,
1837 |             "flex": null,
1838 |             "flex_flow": null,
1839 |             "grid_area": null,
1840 |             "grid_auto_columns": null,
1841 |             "grid_auto_flow": null,
1842 |             "grid_auto_rows": null,
1843 |             "grid_column": null,
1844 |             "grid_gap": null,
1845 |             "grid_row": null,
1846 |             "grid_template_areas": null,
1847 |             "grid_template_columns": null,
1848 |             "grid_template_rows": null,
1849 |             "height": null,
1850 |             "justify_content": null,
1851 |             "justify_items": null,
1852 |             "left": null,
1853 |             "margin": null,
1854 |             "max_height": null,
1855 |             "max_width": null,
1856 |             "min_height": null,
1857 |             "min_width": null,
1858 |             "object_fit": null,
1859 |             "object_position": null,
1860 |             "order": null,
1861 |             "overflow": null,
1862 |             "overflow_x": null,
1863 |             "overflow_y": null,
1864 |             "padding": null,
1865 |             "right": null,
1866 |             "top": null,
1867 |             "visibility": null,
1868 |             "width": null
1869 |           }
1870 |         },
1871 |         "d04c2e5c9a9d4dc59a4ef5d61f55faf7": {
1872 |           "model_module": "@jupyter-widgets/controls",
1873 |           "model_name": "ProgressStyleModel",
1874 |           "model_module_version": "1.5.0",
1875 |           "state": {
1876 |             "_model_module": "@jupyter-widgets/controls",
1877 |             "_model_module_version": "1.5.0",
1878 |             "_model_name": "ProgressStyleModel",
1879 |             "_view_count": null,
1880 |             "_view_module": "@jupyter-widgets/base",
1881 |             "_view_module_version": "1.2.0",
1882 |             "_view_name": "StyleView",
1883 |             "bar_color": null,
1884 |             "description_width": ""
1885 |           }
1886 |         },
1887 |         "0b4f45ff04994accb494587a662a648a": {
1888 |           "model_module": "@jupyter-widgets/base",
1889 |           "model_name": "LayoutModel",
1890 |           "model_module_version": "1.2.0",
1891 |           "state": {
1892 |             "_model_module": "@jupyter-widgets/base",
1893 |             "_model_module_version": "1.2.0",
1894 |             "_model_name": "LayoutModel",
1895 |             "_view_count": null,
1896 |             "_view_module": "@jupyter-widgets/base",
1897 |             "_view_module_version": "1.2.0",
1898 |             "_view_name": "LayoutView",
1899 |             "align_content": null,
1900 |             "align_items": null,
1901 |             "align_self": null,
1902 |             "border": null,
1903 |             "bottom": null,
1904 |             "display": null,
1905 |             "flex": null,
1906 |             "flex_flow": null,
1907 |             "grid_area": null,
1908 |             "grid_auto_columns": null,
1909 |             "grid_auto_flow": null,
1910 |             "grid_auto_rows": null,
1911 |             "grid_column": null,
1912 |             "grid_gap": null,
1913 |             "grid_row": null,
1914 |             "grid_template_areas": null,
1915 |             "grid_template_columns": null,
1916 |             "grid_template_rows": null,
1917 |             "height": null,
1918 |             "justify_content": null,
1919 |             "justify_items": null,
1920 |             "left": null,
1921 |             "margin": null,
1922 |             "max_height": null,
1923 |             "max_width": null,
1924 |             "min_height": null,
1925 |             "min_width": null,
1926 |             "object_fit": null,
1927 |             "object_position": null,
1928 |             "order": null,
1929 |             "overflow": null,
1930 |             "overflow_x": null,
1931 |             "overflow_y": null,
1932 |             "padding": null,
1933 |             "right": null,
1934 |             "top": null,
1935 |             "visibility": null,
1936 |             "width": null
1937 |           }
1938 |         },
1939 |         "beb8dd797e774760b641d27ecde161bc": {
1940 |           "model_module": "@jupyter-widgets/controls",
1941 |           "model_name": "DescriptionStyleModel",
1942 |           "model_module_version": "1.5.0",
1943 |           "state": {
1944 |             "_model_module": "@jupyter-widgets/controls",
1945 |             "_model_module_version": "1.5.0",
1946 |             "_model_name": "DescriptionStyleModel",
1947 |             "_view_count": null,
1948 |             "_view_module": "@jupyter-widgets/base",
1949 |             "_view_module_version": "1.2.0",
1950 |             "_view_name": "StyleView",
1951 |             "description_width": ""
1952 |           }
1953 |         }
1954 |       }
1955 |     }
1956 |   },
1957 |   "nbformat": 4,
1958 |   "nbformat_minor": 0
1959 | }


--------------------------------------------------------------------------------
/week10_gpt/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week10_gpt/lecture.pdf


--------------------------------------------------------------------------------
/week11_cv_transformers/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week11_cv_transformers/lecture.pdf


--------------------------------------------------------------------------------
/week12_gan/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week12_gan/lecture.pdf


--------------------------------------------------------------------------------
/week13_latent_models/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week13_latent_models/lecture.pdf


--------------------------------------------------------------------------------
/week14_representation_learning/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week14_representation_learning/.DS_Store


--------------------------------------------------------------------------------
/week14_representation_learning/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week14_representation_learning/lecture.pdf


--------------------------------------------------------------------------------