├── README.md
├── hw01
│   └── 1_Hw_Students.ipynb
├── hw02
│   └── 2_Hw_Students.ipynb
├── hw03
│   └── 3_Hw_Students.ipynb
├── hw05
│   └── 5_Hw_Students.ipynb
├── week01_intro
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week02_init_regularization
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week03_conv
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week04_tricks
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week05_segmentation
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week06_detection
│   └── lecture.pdf
├── week07_word_embeddings
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week08_text_classification
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week09_transformer
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week10_gpt
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week11_cv_transformers
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week12_gan
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week13_latent_models
│   ├── lecture.pdf
│   └── seminar.ipynb
└── week14_representation_learning
    ├── .DS_Store
    ├── lecture.pdf
    └── seminar.ipynb
/README.md: -------------------------------------------------------------------------------- 1 | # deep-learning-course -------------------------------------------------------------------------------- /hw01/1_Hw_Students.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "provenance": [], 7 | "gpuType": "T4", 8 | "toc_visible": true 9 | }, 10 | "kernelspec": { 11 | "name": "python3", 12 | "display_name": "Python 3" 13 | }, 14 | "language_info": { 15 | "name": "python" 16 | }, 17 | "accelerator": "GPU" 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "code", 22 | "source": [ 23 | "import torch\n", 24 | "import numpy as np\n", 25 | "import matplotlib.pyplot as plt\n", 26 | "from tqdm import tqdm\n", 27 | "from IPython.display import clear_output\n", 28 | "\n", 29 | "print(torch.__version__)" 30 | ], 31 | "metadata": { 32 | "id": "VFj8-qGfYA-2" 33 | }, 34 | "execution_count": null, 35 | "outputs": [] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "source": [ 40 | "DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'" 41 | ], 42 | "metadata": { 43 | "id": "cSuFlZPnrT8O" 44 | }, 45 | "execution_count": null, 46 | "outputs": [] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "source": [ 51 | "import sys, os\n", 52 | "if 'google.colab' in sys.modules and not os.path.exists('.setup_complete'):\n", 53 | "    !wget -q https://raw.githubusercontent.com/yandexdataschool/deep_vision_and_graphics/fall22/week01-pytorch_intro/notmnist.py\n", 54 | "    !touch .setup_complete" 55 | ], 56 | "metadata": { 57 | "id": "usRNEECdbR9F" 58 | }, 59 | "execution_count": null, 60 | "outputs": [] 61 | }, 62 | { 63 | "cell_type": "markdown", 64 | "source": [ 65 | "# Task 1. Tensors (1 point)" 66 | ], 67 | "metadata": { 68 | "id": "u7FaNwW2X_v0" 69 | } 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "source": [ 74 | "Let's write another function, this time in polar coordinates:\n", 75 | "$$\rho(\theta) = (1 + 0.9 \cdot \cos(8 \cdot \theta)) \cdot (1 + 0.1 \cdot \cos(24 \cdot \theta)) \cdot (0.9 + 0.05 \cdot \cos(200 \cdot \theta)) \cdot (1 + \sin(\theta))$$\n", 76 | "\n", 77 | "\n", 78 | "Then convert it into cartesian coordinates ([howto](http://www.mathsisfun.com/polar-cartesian-coordinates.html)) and plot the results.\n", 79 | "\n", 80 | "Use torch tensors only: no lists, loops, numpy arrays, etc."
81 | ], 82 | "metadata": { 83 | "id": "gUtSrsCYaRdA" 84 | } 85 | }, 86 | { 87 | "cell_type": "code", 88 | "source": [ 89 | "theta = torch.linspace(- np.pi, np.pi, steps=1000)\n", 90 | "\n", 91 | "# compute rho(theta) as per formula above\n", 92 | "rho = YOUR CODE HERE\n", 93 | "\n", 94 | "# Now convert polar (rho, theta) pairs into cartesian (x,y) to plot them.\n", 95 | "x = YOUR CODE HERE\n", 96 | "y = YOUR CODE HERE\n", 97 | "\n", 98 | "\n", 99 | "plt.figure(figsize=[6, 6])\n", 100 | "plt.fill(x.numpy(), y.numpy(), color='green')\n", 101 | "plt.grid()" 102 | ], 103 | "metadata": { 104 | "id": "sTtmnC-EaIr5" 105 | }, 106 | "execution_count": null, 107 | "outputs": [] 108 | }, 109 | { 110 | "cell_type": "markdown", 111 | "source": [ 112 | "# Task 2: Going deeper (6 points)\n", 113 | "\n", 114 | "Your ultimate task here is to build your first neural network [almost] from scratch, using pure PyTorch.\n", 115 | "\n", 116 | "This time you will solve a similar character recognition problem, but at a larger scale:\n", 117 | "\n", 118 | "* 10 different letters\n", 119 | "* 20k samples\n", 120 | "\n", 121 | "We want you to build a network that __reaches at least 80% accuracy__ and has __at least 2 linear layers__ in it.\n", 122 | "\n", 123 | "With 10 classes you need a __categorical crossentropy__ loss (see [here](http://wiki.fast.ai/index.php/Log_Loss)). You can write it any way you want, but we recommend using the `log_softmax` function from PyTorch, since it is more numerically stable.\n", 124 | "\n", 125 | "Note that you are not required to build 152-layer monsters here. A 2-layer (one hidden, one output) neural network should already give you a nice score.\n", 126 | "\n", 127 | "__Win conditions:__\n", 128 | "* __Your model must be nonlinear,__ but not necessarily deep.\n", 129 | "* __Train your model with your own SGD__, which you will have to implement.\n", 130 | "* __For this task only, please do not use the contents of `torch.nn` and `torch.optim`.__ That's for the next task.\n", 131 | "* __Do not use Conv layers__\n", 132 | "\n", 133 | "**Bonus:** For the best score in the group you get +1.5, 1.0, 0.5 points (1st, 2nd, 3rd places)."
134 | ], 135 | "metadata": { 136 | "id": "vcfxlYBna3Ke" 137 | } 138 | }, 139 | { 140 | "cell_type": "code", 141 | "source": [ 142 | "from notmnist import load_notmnist\n", 143 | "X_train, y_train, X_val, y_val = load_notmnist(letters='ABCDEFGHIJ')\n", 144 | "X_train, X_val = X_train.reshape([-1, 784]), X_val.reshape([-1, 784])" 145 | ], 146 | "metadata": { 147 | "id": "_uZn0Bdba3pH" 148 | }, 149 | "execution_count": null, 150 | "outputs": [] 151 | }, 152 | { 153 | "cell_type": "code", 154 | "source": [ 155 | "%matplotlib inline\n", 156 | "plt.figure(figsize=[12, 4])\n", 157 | "for i in range(20):\n", 158 | " plt.subplot(2, 10, i+1)\n", 159 | " plt.imshow(X_train[i].reshape([28, 28]))\n", 160 | " plt.title(str(y_train[i]))" 161 | ], 162 | "metadata": { 163 | "id": "3WJlL3PHbs2S" 164 | }, 165 | "execution_count": null, 166 | "outputs": [] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "source": [ 171 | "X_train.shape, y_train.shape, X_val.shape, y_val.shape" 172 | ], 173 | "metadata": { 174 | "id": "_M4n3fcDbvqu" 175 | }, 176 | "execution_count": null, 177 | "outputs": [] 178 | }, 179 | { 180 | "cell_type": "code", 181 | "source": [ 182 | "classes = np.unique(y_train)\n", 183 | "n_classes = len(classes)\n", 184 | "classes" 185 | ], 186 | "metadata": { 187 | "id": "bIH6GPb3djr7" 188 | }, 189 | "execution_count": null, 190 | "outputs": [] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "source": [ 195 | "class CustomNet:\n", 196 | " def __init__(self, hidden_size, in_size=28*28, num_classes=n_classes):\n", 197 | " # self.W = YOUR CODE HERE\n", 198 | " pass\n", 199 | " def forward(self, x):\n", 200 | " # YOUR CODE HERE\n", 201 | " pass" 202 | ], 203 | "metadata": { 204 | "id": "mc7bSgpCbzHN" 205 | }, 206 | "execution_count": null, 207 | "outputs": [] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "source": [ 212 | "net = CustomNet()\n", 213 | "out = net.forward(torch.randn(2, 28*28, device=DEVICE))\n", 214 | "assert len(out.shape) == 2\n", 215 | "assert out.shape[-1] == n_classes" 216 | ], 217 | "metadata": { 218 | "id": "649qrlZXfUB6" 219 | }, 220 | "execution_count": null, 221 | "outputs": [] 222 | }, 223 | { 224 | "cell_type": "code", 225 | "source": [ 226 | "import torch.nn.functional as F\n", 227 | "\n", 228 | "def cross_entropy_loss(logits, target):\n", 229 | " N = logits.size(0)\n", 230 | " # Get the log probabilities\n", 231 | " log_probs = # YOUR CODE HERE\n", 232 | " # Gather the log probabilities at the target indices\n", 233 | " log_probs_at_target = # YOUR CODE HERE\n", 234 | " # Compute the negative log likelihood\n", 235 | " nll = # YOUR CODE HERE\n", 236 | " return nll / N\n", 237 | "\n", 238 | "y_tmp = torch.tensor(y_train[:2], device=DEVICE)\n", 239 | "cross_entropy_loss(out, y_tmp), torch.nn.CrossEntropyLoss()(out, y_tmp)" 240 | ], 241 | "metadata": { 242 | "id": "HnnVGZSwmyu_" 243 | }, 244 | "execution_count": null, 245 | "outputs": [] 246 | }, 247 | { 248 | "cell_type": "code", 249 | "source": [ 250 | "class CustomSGD:\n", 251 | " def __init__(self, model, lr=1e-4):\n", 252 | " self.model = model\n", 253 | " self.lr = lr\n", 254 | "\n", 255 | " def step(self):\n", 256 | " with torch.no_grad():\n", 257 | " for param in # YOUR CODE HERE:\n", 258 | " # YOUR CODE HERE\n", 259 | " def zero_grad(self):\n", 260 | " for param in # YOUR CODE HERE:\n", 261 | " # YOUR CODE HERE" 262 | ], 263 | "metadata": { 264 | "id": "70swrTmCeBZx" 265 | }, 266 | "execution_count": null, 267 | "outputs": [] 268 | }, 269 | { 270 | "cell_type": "code", 271 | "source": [ 272 | "def 
iterate_minibatches(X, y, batch_size):\n", 273 | " indices = np.random.permutation(np.arange(len(X)))\n", 274 | " for start in range(0, len(indices), batch_size):\n", 275 | " ix = indices[start: start + batch_size]\n", 276 | " yield torch.from_numpy(X[ix]), torch.from_numpy(y[ix])" 277 | ], 278 | "metadata": { 279 | "id": "nVvO14e90YjC" 280 | }, 281 | "execution_count": null, 282 | "outputs": [] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "source": [ 287 | "def train(net, optimizer, loss_fn, n_epoch=20):\n", 288 | " loss_history = []\n", 289 | " acc_history = []\n", 290 | " val_loss_history = []\n", 291 | " val_acc_history = []\n", 292 | "\n", 293 | " for i in range(n_epoch):\n", 294 | " # Training\n", 295 | " # net.train()\n", 296 | " acc_batches=[]\n", 297 | " loss_batches=[]\n", 298 | " for x_batch, y_batch in iterate_minibatches(X_train, y_train, batch_size=64):\n", 299 | " # x_batch = # YOUR CODE HERE\n", 300 | " # y_batch = # YOUR CODE HERE\n", 301 | " # Forward\n", 302 | " # loss = # YOUR CODE HERE\n", 303 | "\n", 304 | " # Backward\n", 305 | " # YOUR CODE HERE\n", 306 | "\n", 307 | " # Accuracy\n", 308 | " acc_batches += (out.argmax(axis=1) == y_batch).detach().cpu().numpy().tolist()\n", 309 | "\n", 310 | " loss_history.append(np.mean(loss_batches))\n", 311 | " acc_history.append(np.mean(acc_batches))\n", 312 | "\n", 313 | " # Validating\n", 314 | " # net.eval()\n", 315 | " with torch.no_grad():\n", 316 | " acc_batches=[]\n", 317 | " loss_batches=[]\n", 318 | " for x_batch, y_batch in iterate_minibatches(X_val, y_val, batch_size=64):\n", 319 | " # x_batch = # YOUR CODE HERE\n", 320 | " # y_batch = # YOUR CODE HERE\n", 321 | " # Forward\n", 322 | " # loss = # YOUR CODE HERE\n", 323 | " # Accuracy\n", 324 | " acc_batches += (out.argmax(axis=1) == y_batch).detach().cpu().numpy().tolist()\n", 325 | "\n", 326 | " val_loss_history.append(np.mean(loss_batches))\n", 327 | " val_acc_history.append(np.mean(acc_batches))\n", 328 | "\n", 329 | " clear_output(wait=True)\n", 330 | " fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))\n", 331 | " ax1.set_xlabel(\"#epoch\")\n", 332 | " ax1.set_ylabel(\"Loss\")\n", 333 | " ax1.plot(loss_history, 'b', label='train loss')\n", 334 | " ax1.plot(val_loss_history, 'r', label='val loss')\n", 335 | "\n", 336 | " ax2.set_xlabel(\"#epoch\")\n", 337 | " ax2.set_ylabel(\"Acc\")\n", 338 | " ax2.plot(acc_history, 'b', label='train acc')\n", 339 | " ax2.plot(val_acc_history, 'r', label='val acc')\n", 340 | " plt.axhline(y = 0.8, color = 'g', linestyle = '--')\n", 341 | "\n", 342 | " plt.legend()\n", 343 | " plt.show()\n", 344 | " return max(val_acc_history)" 345 | ], 346 | "metadata": { 347 | "id": "DjLxUiA1eLtZ" 348 | }, 349 | "execution_count": null, 350 | "outputs": [] 351 | }, 352 | { 353 | "cell_type": "code", 354 | "source": [ 355 | "net = # YOUR CODE HERE\n", 356 | "opt = # YOUR CODE HERE\n", 357 | "# train(net, opt, cross_entropy_loss)" 358 | ], 359 | "metadata": { 360 | "id": "Ps2HNwINqtVn" 361 | }, 362 | "execution_count": null, 363 | "outputs": [] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "source": [ 368 | "### Hints:\n", 369 | " - You'll have to use matrix W(feature_id x class_id)\n", 370 | " - Softmax (exp over sum of exps) can be implemented manually or as `torch.softmax`\n", 371 | " - Probably better to use STOCHASTIC gradient descent (minibatch) for greater speed\n", 372 | " - You need to train both layers, not just the output layer :)\n", 373 | " - 50 hidden neurons and a ReLU nonlinearity will do for a start. 
Many ways to improve.\n", 374 | " - In the ideal case this totals to 2 `torch.matmul`'s, 1 softmax and 1 ReLU/sigmoid\n", 375 | " - If anything seems wrong, try going through one step of training and printing everything you compute.\n", 376 | " - If you see NaNs midway through optimization, you can estimate $\log P(y \mid x)$ as `torch.log_softmax(last_linear_layer_outputs)`." 377 | ], 378 | "metadata": { 379 | "id": "wwQYqNdugwmp" 380 | } 381 | }, 382 | { 383 | "cell_type": "markdown", 384 | "source": [ 385 | "# Task 3. Overfitting (4 points)\n" 386 | ], 387 | "metadata": { 388 | "id": "1fxSuZJwb1bx" 389 | } 390 | }, 391 | { 392 | "cell_type": "markdown", 393 | "source": [ 394 | "Today we work with the [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist) (*hint: it is available in `torchvision`*).\n", 395 | "\n", 396 | "Your goal for today:\n", 397 | "0. Fill the gaps in the training loop and architectures.\n", 398 | "1. Train a tiny __FC__ network.\n", 399 | "2. Cause considerable overfitting by modifying the network (e.g. increasing the number of network parameters and/or layers) and demonstrate it in an appropriate way (e.g. plot loss and accuracy on the train and validation sets w.r.t. network complexity).\n", 400 | "3. Try to deal with overfitting (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results.\n", 401 | "\n", 402 | "Train a network that achieves $\geq 0.885$ test accuracy. Again, you should use only Linear (`nn.Linear`) layers and activations/dropout/batchnorm. Convolutional layers might be of great use, but we will meet them a bit later.\n", 403 | "\n", 404 | "__Please write a small report describing your ideas, attempts and achieved results at the end of this task.__\n", 405 | "\n", 406 | "*Note*: in task 3 your goal is to make the network from task 2 less prone to overfitting, and then to train a network that achieves $\geq 0.885$ test accuracy, so it can be a different one.\n", 407 | "\n", 408 | "**Bonus:** For the best score in the group you get +1.5, 1.0, 0.5 points (1st, 2nd, 3rd places)."
409 | ], 410 | "metadata": { 411 | "id": "g9j3vEc9rsQC" 412 | } 413 | }, 414 | { 415 | "cell_type": "code", 416 | "source": [ 417 | "import torch\n", 418 | "import torch.nn as nn\n", 419 | "import torchvision\n", 420 | "import torchvision.transforms as transforms\n", 421 | "import torchsummary\n", 422 | "\n", 423 | "from matplotlib import pyplot as plt\n", 424 | "from matplotlib.pyplot import figure\n", 425 | "import numpy as np\n", 426 | "import os\n", 427 | "from tqdm import tqdm\n", 428 | "from sklearn.model_selection import train_test_split" 429 | ], 430 | "metadata": { 431 | "id": "O1fxVkDOb3QX" 432 | }, 433 | "execution_count": null, 434 | "outputs": [] 435 | }, 436 | { 437 | "cell_type": "code", 438 | "source": [ 439 | "# Technical function\n", 440 | "def mkdir(path):\n", 441 | " if not os.path.exists(root_path):\n", 442 | " os.mkdir(root_path)\n", 443 | " print('Directory', path, 'is created!')\n", 444 | " else:\n", 445 | " print('Directory', path, 'already exists!')\n", 446 | "\n", 447 | "root_path = 'fmnist'\n", 448 | "mkdir(root_path)" 449 | ], 450 | "metadata": { 451 | "id": "kzUtrIEgrwWi" 452 | }, 453 | "execution_count": null, 454 | "outputs": [] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": null, 459 | "metadata": { 460 | "id": "qt6LE7XaTDT9" 461 | }, 462 | "outputs": [], 463 | "source": [ 464 | "download = True\n", 465 | "train_transform = transforms.ToTensor()\n", 466 | "test_transform = transforms.ToTensor()\n", 467 | "\n", 468 | "fmnist_dataset_train = torchvision.datasets.FashionMNIST(root_path,\n", 469 | " train=True,\n", 470 | " transform=train_transform,\n", 471 | " target_transform=None,\n", 472 | " download=download)\n", 473 | "fmnist_dataset_test = torchvision.datasets.FashionMNIST(root_path,\n", 474 | " train=False,\n", 475 | " transform=test_transform,\n", 476 | " target_transform=None,\n", 477 | " download=download)" 478 | ] 479 | }, 480 | { 481 | "cell_type": "code", 482 | "source": [ 483 | "fmnist_dataset_train, fmnist_dataset_val = train_test_split(fmnist_dataset_train, train_size=50000)" 484 | ], 485 | "metadata": { 486 | "id": "15B67G4iGMIr" 487 | }, 488 | "execution_count": null, 489 | "outputs": [] 490 | }, 491 | { 492 | "cell_type": "code", 493 | "source": [ 494 | "len(fmnist_dataset_train), len(fmnist_dataset_val), len(fmnist_dataset_test)" 495 | ], 496 | "metadata": { 497 | "id": "cpbEkSCM1n3Z" 498 | }, 499 | "execution_count": null, 500 | "outputs": [] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": null, 505 | "metadata": { 506 | "id": "71YP0SPwTIxD" 507 | }, 508 | "outputs": [], 509 | "source": [ 510 | "train_loader = torch.utils.data.DataLoader(fmnist_dataset_train,\n", 511 | " batch_size=128,\n", 512 | " shuffle=True,\n", 513 | " num_workers=2)\n", 514 | "\n", 515 | "val_loader = torch.utils.data.DataLoader(fmnist_dataset_val,\n", 516 | " batch_size=256,\n", 517 | " shuffle=True,\n", 518 | " num_workers=2)\n", 519 | "\n", 520 | "test_loader = torch.utils.data.DataLoader(fmnist_dataset_test,\n", 521 | " batch_size=256,\n", 522 | " shuffle=False,\n", 523 | " num_workers=2)" 524 | ] 525 | }, 526 | { 527 | "cell_type": "code", 528 | "execution_count": null, 529 | "metadata": { 530 | "id": "aHca15bOTY4B" 531 | }, 532 | "outputs": [], 533 | "source": [ 534 | "for img, label in train_loader:\n", 535 | " print(img.shape)\n", 536 | " # print(img)\n", 537 | " print(label.shape)\n", 538 | " break\n", 539 | "\n", 540 | "plt.imshow(img[0, 0]);" 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "source": [ 
546 | "def train_val_loop(net, train_loader, val_loader, name, optimizer, criterion, n_epoch=20):\n", 547 | " loss_history = []\n", 548 | " acc_history = []\n", 549 | " val_loss_history = []\n", 550 | " val_acc_history = []\n", 551 | "\n", 552 | " for i in range(n_epoch):\n", 553 | " net.train()\n", 554 | " acc_batches=[]\n", 555 | " loss_batches=[]\n", 556 | " for x_batch, y_batch in train_loader:\n", 557 | " # x_batch = YOUR CODE HERE\n", 558 | " # y_batch = YOUR CODE HERE\n", 559 | "\n", 560 | " # Forward\n", 561 | " # loss = YOUR CODE HERE\n", 562 | "\n", 563 | " # Backward\n", 564 | " # ... YOUR CODE HERE\n", 565 | "\n", 566 | " # Accuracy\n", 567 | " # acc_batches = YOUR CODE HERE\n", 568 | "\n", 569 | " loss_history.append(np.mean(loss_batches))\n", 570 | " acc_history.append(np.mean(acc_batches))\n", 571 | "\n", 572 | " # Validating\n", 573 | " net.eval()\n", 574 | " with torch.no_grad():\n", 575 | " acc_batches=[]\n", 576 | " loss_batches=[]\n", 577 | " for x_batch, y_batch in val_loader:\n", 578 | " # x_batch = YOUR CODE HERE\n", 579 | " # y_batch = YOUR CODE HERE\n", 580 | "\n", 581 | " # Forward\n", 582 | " # loss = YOUR CODE HERE\n", 583 | "\n", 584 | " # Accuracy\n", 585 | " # acc_batches = YOUR CODE HERE\n", 586 | "\n", 587 | " val_loss_history.append(np.mean(loss_batches))\n", 588 | " val_acc_history.append(np.mean(acc_batches))\n", 589 | "\n", 590 | " clear_output(wait=True)\n", 591 | " plt.figure(figsize=(8, 5))\n", 592 | " plt.title(f\"Training/validating loss {name}\")\n", 593 | " plt.xlabel(\"#epoch\")\n", 594 | " plt.ylabel(\"Loss\")\n", 595 | " plt.plot(loss_history, 'b', label='train')\n", 596 | " plt.plot(val_loss_history, 'r', label='validation')\n", 597 | " plt.legend()\n", 598 | "\n", 599 | " plt.figure(figsize=(8, 5))\n", 600 | " plt.title(f\"Training/validating accuracy {name}\")\n", 601 | " plt.xlabel(\"#epoch\")\n", 602 | " plt.ylabel(\"Accuracy\")\n", 603 | " plt.plot(acc_history, 'b', label='train')\n", 604 | " plt.plot(val_acc_history, 'r', label='validation')\n", 605 | " plt.legend()\n", 606 | "\n", 607 | " plt.show()\n", 608 | "\n", 609 | "def test_accuracy(model):\n", 610 | " model.eval()\n", 611 | " test_acc_batches = []\n", 612 | " with torch.no_grad():\n", 613 | " for X_test, Y_test in test_loader:\n", 614 | " X_test = X_test.to(DEVICE)\n", 615 | " Y_test = Y_test.to(DEVICE)\n", 616 | " out = model.forward(X_test)\n", 617 | " test_acc_batches += (out.argmax(axis=1) == Y_test).detach().cpu().numpy().tolist()\n", 618 | " print(f'Test accuracy {np.mean(test_acc_batches)}')" 619 | ], 620 | "metadata": { 621 | "id": "QBu9fg_dAymN" 622 | }, 623 | "execution_count": null, 624 | "outputs": [] 625 | }, 626 | { 627 | "cell_type": "markdown", 628 | "metadata": { 629 | "id": "b6OOOffHTfX5" 630 | }, 631 | "source": [ 632 | "## Task 3.1 Tiny net\n", 633 | "Train a tiny network just to validate correctness of train loop, net architecture, params." 
634 | ] 635 | }, 636 | { 637 | "cell_type": "code", 638 | "execution_count": null, 639 | "metadata": { 640 | "id": "ftpkTjxlTcFx" 641 | }, 642 | "outputs": [], 643 | "source": [ 644 | "class TinyNeuralNetwork(nn.Module):\n", 645 | " def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):\n", 646 | " super(self.__class__, self).__init__()\n", 647 | " self.model = nn.Sequential(\n", 648 | " nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards\n", 649 | " # YOUR CODE HERE\n", 650 | " )\n", 651 | "\n", 652 | " def forward(self, inp):\n", 653 | " out = self.model(inp)\n", 654 | " return out" 655 | ] 656 | }, 657 | { 658 | "cell_type": "code", 659 | "source": [ 660 | "model = TinyNeuralNetwork()\n", 661 | "out = model(torch.randn(2, 1, 28, 28))\n", 662 | "assert len(out.shape) == 2\n", 663 | "assert out.shape[-1] == 10" 664 | ], 665 | "metadata": { 666 | "id": "bFnYI29v4N1P" 667 | }, 668 | "execution_count": null, 669 | "outputs": [] 670 | }, 671 | { 672 | "cell_type": "code", 673 | "execution_count": null, 674 | "metadata": { 675 | "id": "EAhMwySkrlpq" 676 | }, 677 | "outputs": [], 678 | "source": [ 679 | "torchsummary.summary(TinyNeuralNetwork().to(DEVICE), (28*28,))" 680 | ] 681 | }, 682 | { 683 | "cell_type": "code", 684 | "execution_count": null, 685 | "metadata": { 686 | "id": "i3POFj90Ti-6" 687 | }, 688 | "outputs": [], 689 | "source": [ 690 | "# tiny_model = TinyNeuralNetwork().to(DEVICE)\n", 691 | "# opt = # YOUR CODE HERE\n", 692 | "# loss_func = # YOUR CODE HERE\n", 693 | "# n_epoch = # YOUR CODE HERE\n", 694 | "\n", 695 | "# Your experiments, come here\n", 696 | "# train_val_loop(tiny_model, train_loader=train_loader, val_loader=val_loader, name='tiny model', optimizer=opt, criterion=loss_func, n_epoch=n_epoch)" 697 | ] 698 | }, 699 | { 700 | "cell_type": "code", 701 | "source": [ 702 | "# test_accuracy(tiny_model)" 703 | ], 704 | "metadata": { 705 | "id": "XxRsu7pBIf5f" 706 | }, 707 | "execution_count": null, 708 | "outputs": [] 709 | }, 710 | { 711 | "cell_type": "markdown", 712 | "metadata": { 713 | "id": "L7ISqkjmCPB1" 714 | }, 715 | "source": [ 716 | "## Task 3.2: Overfit it.\n", 717 | "Build a network that will overfit to this dataset. Demonstrate the overfitting in the appropriate way (e.g. plot loss and accurasy on train and test set w.r.t. network complexity).\n", 718 | "\n", 719 | "*Note:* you also might decrease the size of `train` dataset to enforce the overfitting and speed up the computations." 
720 | ] 721 | }, 722 | { 723 | "cell_type": "code", 724 | "execution_count": null, 725 | "metadata": { 726 | "id": "H12uAWiGBwJx" 727 | }, 728 | "outputs": [], 729 | "source": [ 730 | "class OverfittingNeuralNetwork(nn.Module):\n", 731 | " def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):\n", 732 | " super(self.__class__, self).__init__()\n", 733 | " self.model = nn.Sequential(\n", 734 | " nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards\n", 735 | " # YOUR CODE HERE\n", 736 | " )\n", 737 | "\n", 738 | " def forward(self, inp):\n", 739 | " out = self.model(inp)\n", 740 | " return out" 741 | ] 742 | }, 743 | { 744 | "cell_type": "code", 745 | "source": [ 746 | "model = OverfittingNeuralNetwork()\n", 747 | "out = model(torch.randn(2, 1, 28, 28))\n", 748 | "assert len(out.shape) == 2\n", 749 | "assert out.shape[-1] == 10" 750 | ], 751 | "metadata": { 752 | "id": "zR4muHZr5k8I" 753 | }, 754 | "execution_count": null, 755 | "outputs": [] 756 | }, 757 | { 758 | "cell_type": "code", 759 | "execution_count": null, 760 | "metadata": { 761 | "id": "JgXAKCpvCwqH" 762 | }, 763 | "outputs": [], 764 | "source": [ 765 | "torchsummary.summary(OverfittingNeuralNetwork().to(DEVICE), (28*28,))" 766 | ] 767 | }, 768 | { 769 | "cell_type": "code", 770 | "execution_count": null, 771 | "metadata": { 772 | "id": "Iyuwd4ZLrlpr" 773 | }, 774 | "outputs": [], 775 | "source": [ 776 | "# overfit_model = OverfittingNeuralNetwork().to(DEVICE)\n", 777 | "# opt = # YOUR CODE HERE\n", 778 | "# loss_func = # YOUR CODE HERE\n", 779 | "# n_epoch = # YOUR CODE HERE\n", 780 | "\n", 781 | "# Your experiments, come here\n", 782 | "# train_val_loop(overfit_model, train_loader=train_loader, val_loader=val_loader, name='overfit model', optimizer=opt, criterion=loss_func, n_epoch=n_epoch)" 783 | ] 784 | }, 785 | { 786 | "cell_type": "code", 787 | "source": [ 788 | "# test_accuracy(overfit_model)" 789 | ], 790 | "metadata": { 791 | "id": "GqFQzpfJIem3" 792 | }, 793 | "execution_count": null, 794 | "outputs": [] 795 | }, 796 | { 797 | "cell_type": "markdown", 798 | "metadata": { 799 | "id": "LG8mNHtPrlpr" 800 | }, 801 | "source": [ 802 | "## Task 3.3: Fix it.\n", 803 | "Fix the overfitted network from the previous step (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results." 
804 | ] 805 | }, 806 | { 807 | "cell_type": "code", 808 | "execution_count": null, 809 | "metadata": { 810 | "id": "42343iSyrlpr" 811 | }, 812 | "outputs": [], 813 | "source": [ 814 | "class FixedNeuralNetwork(nn.Module):\n", 815 | " def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):\n", 816 | " super(self.__class__, self).__init__()\n", 817 | " self.model = nn.Sequential(\n", 818 | " nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards\n", 819 | " # YOUR CODE HERE\n", 820 | " )\n", 821 | "\n", 822 | " def forward(self, inp):\n", 823 | " out = self.model(inp)\n", 824 | " return out" 825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "source": [ 830 | "model = FixedNeuralNetwork()\n", 831 | "out = model(torch.randn(2, 1, 28, 28))\n", 832 | "assert len(out.shape) == 2\n", 833 | "assert out.shape[-1] == 10" 834 | ], 835 | "metadata": { 836 | "id": "93Twxz_N6Ade" 837 | }, 838 | "execution_count": null, 839 | "outputs": [] 840 | }, 841 | { 842 | "cell_type": "code", 843 | "execution_count": null, 844 | "metadata": { 845 | "id": "TR1xQBp9rlps" 846 | }, 847 | "outputs": [], 848 | "source": [ 849 | "torchsummary.summary(FixedNeuralNetwork().to(DEVICE), (28*28,))" 850 | ] 851 | }, 852 | { 853 | "cell_type": "code", 854 | "execution_count": null, 855 | "metadata": { 856 | "id": "OMdEf9Kbrlps" 857 | }, 858 | "outputs": [], 859 | "source": [ 860 | "# fixed_model = FixedNeuralNetwork().to(device)\n", 861 | "# opt = # YOUR CODE HERE\n", 862 | "# loss_func = # YOUR CODE HERE\n", 863 | "# n_epoch = # YOUR CODE HERE\n", 864 | "\n", 865 | "# Your experiments, come here\n", 866 | "# train_val_loop(fixed_model, train_loader=train_loader, val_loader=val_loader, name='fixed model', optimizer=opt, criterion=loss_func, n_epoch=n_epoch)" 867 | ] 868 | }, 869 | { 870 | "cell_type": "code", 871 | "source": [ 872 | "# test_accuracy(fixed_model)" 873 | ], 874 | "metadata": { 875 | "id": "idv0HgvDIckN" 876 | }, 877 | "execution_count": null, 878 | "outputs": [] 879 | }, 880 | { 881 | "cell_type": "markdown", 882 | "metadata": { 883 | "id": "dMui_uLJ7G0d" 884 | }, 885 | "source": [ 886 | "### Conclusions:\n", 887 | "_Write down small report with your conclusions and your ideas._\n", 888 | "\n", 889 | "YOUR WORDS HERE" 890 | ] 891 | }, 892 | { 893 | "cell_type": "markdown", 894 | "source": [ 895 | "# Task 4. Your own nn layer. 
(4 points)" 896 | ], 897 | "metadata": { 898 | "id": "7J7Tpbc9udSn" 899 | } 900 | }, 901 | { 902 | "cell_type": "code", 903 | "source": [ 904 | "class Module(object):\n", 905 | " \"\"\"\n", 906 | " Basically, you can think of a module as of a something (black box)\n", 907 | " which can process `input` data and produce `ouput` data.\n", 908 | " This is like applying a function which is called `forward`:\n", 909 | "\n", 910 | " output = module.forward(input)\n", 911 | "\n", 912 | " The module should be able to perform a backward pass: to differentiate the `forward` function.\n", 913 | " More, it should be able to differentiate it if is a part of chain (chain rule).\n", 914 | " The latter implies there is a gradient from previous step of a chain rule.\n", 915 | "\n", 916 | " gradInput = module.backward(input, gradOutput)\n", 917 | " \"\"\"\n", 918 | " def __init__ (self):\n", 919 | " self.output = None\n", 920 | " self.gradInput = None\n", 921 | " self.training = True\n", 922 | "\n", 923 | " def forward(self, input):\n", 924 | " \"\"\"\n", 925 | " Takes an input object, and computes the corresponding output of the module.\n", 926 | " \"\"\"\n", 927 | " return self.updateOutput(input)\n", 928 | "\n", 929 | " def backward(self,input, gradOutput):\n", 930 | " \"\"\"\n", 931 | " Performs a backpropagation step through the module, with respect to the given input.\n", 932 | "\n", 933 | " This includes\n", 934 | " - computing a gradient w.r.t. `input` (is needed for further backprop),\n", 935 | " - computing a gradient w.r.t. parameters (to update parameters while optimizing).\n", 936 | " \"\"\"\n", 937 | " self.updateGradInput(input, gradOutput)\n", 938 | " self.accGradParameters(input, gradOutput)\n", 939 | " return self.gradInput\n", 940 | "\n", 941 | "\n", 942 | " def updateOutput(self, input):\n", 943 | " \"\"\"\n", 944 | " Computes the output using the current parameter set of the class and input.\n", 945 | " This function returns the result which is stored in the `output` field.\n", 946 | "\n", 947 | " Make sure to both store the data in `output` field and return it.\n", 948 | " \"\"\"\n", 949 | "\n", 950 | " # The easiest case:\n", 951 | "\n", 952 | " # self.output = input\n", 953 | " # return self.output\n", 954 | "\n", 955 | " pass\n", 956 | "\n", 957 | " def updateGradInput(self, input, gradOutput):\n", 958 | " \"\"\"\n", 959 | " Computing the gradient of the module with respect to its own input.\n", 960 | " This is returned in `gradInput`. Also, the `gradInput` state variable is updated accordingly.\n", 961 | "\n", 962 | " The shape of `gradInput` is always the same as the shape of `input`.\n", 963 | "\n", 964 | " Make sure to both store the gradients in `gradInput` field and return it.\n", 965 | " \"\"\"\n", 966 | "\n", 967 | " # The easiest case:\n", 968 | "\n", 969 | " # self.gradInput = gradOutput\n", 970 | " # return self.gradInput\n", 971 | "\n", 972 | " pass\n", 973 | "\n", 974 | " def accGradParameters(self, input, gradOutput):\n", 975 | " \"\"\"\n", 976 | " Computing the gradient of the module with respect to its own parameters.\n", 977 | " No need to override if module has no parameters (e.g. 
ReLU).\n", 978 | " \"\"\"\n", 979 | " pass\n", 980 | "\n", 981 | " def zeroGradParameters(self):\n", 982 | " \"\"\"\n", 983 | " Zeroes `gradParams` variable if the module has params.\n", 984 | " \"\"\"\n", 985 | " pass\n", 986 | "\n", 987 | " def getParameters(self):\n", 988 | " \"\"\"\n", 989 | " Returns a list with its parameters.\n", 990 | " If the module does not have parameters return empty list.\n", 991 | " \"\"\"\n", 992 | " return []\n", 993 | "\n", 994 | " def getGradParameters(self):\n", 995 | " \"\"\"\n", 996 | " Returns a list with gradients with respect to its parameters.\n", 997 | " If the module does not have parameters return empty list.\n", 998 | " \"\"\"\n", 999 | " return []\n", 1000 | "\n", 1001 | " def train(self):\n", 1002 | " \"\"\"\n", 1003 | " Sets training mode for the module.\n", 1004 | " Training and testing behaviour differs for Dropout, BatchNorm.\n", 1005 | " \"\"\"\n", 1006 | " self.training = True\n", 1007 | "\n", 1008 | " def evaluate(self):\n", 1009 | " \"\"\"\n", 1010 | " Sets evaluation mode for the module.\n", 1011 | " Training and testing behaviour differs for Dropout, BatchNorm.\n", 1012 | " \"\"\"\n", 1013 | " self.training = False\n", 1014 | "\n", 1015 | " def __repr__(self):\n", 1016 | " \"\"\"\n", 1017 | " Pretty printing. Should be overrided in every module if you want\n", 1018 | " to have readable description.\n", 1019 | " \"\"\"\n", 1020 | " return \"Module\"" 1021 | ], 1022 | "metadata": { 1023 | "id": "vEb-QueVzvMq" 1024 | }, 1025 | "execution_count": null, 1026 | "outputs": [] 1027 | }, 1028 | { 1029 | "cell_type": "markdown", 1030 | "source": [ 1031 | "### Linear transform layer\n", 1032 | "Also known as dense layer, fully-connected layer, FC-layer.\n", 1033 | "You should implement it.\n", 1034 | "\n", 1035 | "- input: **`batch_size x n_feats1`**\n", 1036 | "- output: **`batch_size x n_feats2`**" 1037 | ], 1038 | "metadata": { 1039 | "id": "HzohCP4qz42l" 1040 | } 1041 | }, 1042 | { 1043 | "cell_type": "code", 1044 | "source": [ 1045 | "class Linear(Module):\n", 1046 | " \"\"\"\n", 1047 | " A module which applies a linear transformation\n", 1048 | " A common name is fully-connected layer, InnerProductLayer in caffe.\n", 1049 | "\n", 1050 | " The module should work with 2D input of shape (n_samples, n_feature).\n", 1051 | " \"\"\"\n", 1052 | " def __init__(self, n_in, n_out):\n", 1053 | " super(Linear, self).__init__()\n", 1054 | "\n", 1055 | " # This is a nice initialization\n", 1056 | " stdv = 1./np.sqrt(n_in)\n", 1057 | " #it is important that we should multiply X @ W^T\n", 1058 | " self.W = np.random.uniform(-stdv, stdv, size = (n_out, n_in))\n", 1059 | " self.b = np.random.uniform(-stdv, stdv, size = n_out)\n", 1060 | "\n", 1061 | " self.gradW = np.zeros_like(self.W)\n", 1062 | " self.gradb = np.zeros_like(self.b)\n", 1063 | "\n", 1064 | " def updateOutput(self, input):\n", 1065 | " # YOUR CODE HERE\n", 1066 | " pass\n", 1067 | "\n", 1068 | " def updateGradInput(self, input, gradOutput):\n", 1069 | " # YOUR CODE HERE\n", 1070 | " pass\n", 1071 | "\n", 1072 | " def accGradParameters(self, input, gradOutput):\n", 1073 | " # YOUR CODE HERE\n", 1074 | " pass\n", 1075 | "\n", 1076 | " def zeroGradParameters(self):\n", 1077 | " self.gradW.fill(0)\n", 1078 | " self.gradb.fill(0)\n", 1079 | "\n", 1080 | " def getParameters(self):\n", 1081 | " return [self.W, self.b]\n", 1082 | "\n", 1083 | " def getGradParameters(self):\n", 1084 | " return [self.gradW, self.gradb]\n", 1085 | "\n", 1086 | " def __repr__(self):\n", 1087 | " s = self.W.shape\n", 
1088 | " q = 'Linear %d -> %d' %(s[1],s[0])\n", 1089 | " return q" 1090 | ], 1091 | "metadata": { 1092 | "id": "bbkgSwhpz1yw" 1093 | }, 1094 | "execution_count": null, 1095 | "outputs": [] 1096 | }, 1097 | { 1098 | "cell_type": "code", 1099 | "source": [ 1100 | "def test_Linear():\n", 1101 | " np.random.seed(42)\n", 1102 | " torch.manual_seed(42)\n", 1103 | "\n", 1104 | " batch_size, n_in, n_out = 2, 3, 4\n", 1105 | " for _ in range(100):\n", 1106 | " # layers initialization\n", 1107 | " torch_layer = torch.nn.Linear(n_in, n_out)\n", 1108 | " custom_layer = Linear(n_in, n_out)\n", 1109 | " custom_layer.W = torch_layer.weight.data.numpy()\n", 1110 | " custom_layer.b = torch_layer.bias.data.numpy()\n", 1111 | "\n", 1112 | " layer_input = np.random.uniform(-10, 10, (batch_size, n_in)).astype(np.float32)\n", 1113 | " next_layer_grad = np.random.uniform(-10, 10, (batch_size, n_out)).astype(np.float32)\n", 1114 | "\n", 1115 | " # 1. check layer output\n", 1116 | " custom_layer_output = custom_layer.updateOutput(layer_input)\n", 1117 | " layer_input_var = torch.from_numpy(layer_input).requires_grad_(True)\n", 1118 | " torch_layer_output_var = torch_layer(layer_input_var)\n", 1119 | " assert np.allclose(torch_layer_output_var.data.numpy(), custom_layer_output, atol=1e-6)\n", 1120 | "\n", 1121 | " # 2. check layer input grad\n", 1122 | " custom_layer_grad = custom_layer.updateGradInput(layer_input, next_layer_grad)\n", 1123 | " torch_layer_output_var.backward(torch.from_numpy(next_layer_grad))\n", 1124 | " torch_layer_grad_var = layer_input_var.grad\n", 1125 | " assert np.allclose(torch_layer_grad_var.data.numpy(), custom_layer_grad, atol=1e-6)\n", 1126 | "\n", 1127 | " # 3. check layer parameters grad\n", 1128 | " custom_layer.accGradParameters(layer_input, next_layer_grad)\n", 1129 | " weight_grad = custom_layer.gradW\n", 1130 | " bias_grad = custom_layer.gradb\n", 1131 | " torch_weight_grad = torch_layer.weight.grad.data.numpy()\n", 1132 | " torch_bias_grad = torch_layer.bias.grad.data.numpy()\n", 1133 | " assert np.allclose(torch_weight_grad, weight_grad, atol=1e-6)\n", 1134 | " assert np.allclose(torch_bias_grad, bias_grad, atol=1e-6)" 1135 | ], 1136 | "metadata": { 1137 | "id": "M0PEw9VVugYd" 1138 | }, 1139 | "execution_count": null, 1140 | "outputs": [] 1141 | }, 1142 | { 1143 | "cell_type": "code", 1144 | "source": [ 1145 | "%%time\n", 1146 | "test_Linear()" 1147 | ], 1148 | "metadata": { 1149 | "id": "xOt8kzgwz7Ev" 1150 | }, 1151 | "execution_count": null, 1152 | "outputs": [] 1153 | }, 1154 | { 1155 | "cell_type": "code", 1156 | "source": [], 1157 | "metadata": { 1158 | "id": "hBPzu7JAauKd" 1159 | }, 1160 | "execution_count": null, 1161 | "outputs": [] 1162 | } 1163 | ] 1164 | } -------------------------------------------------------------------------------- /hw02/2_Hw_Students.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "colab": { 8 | "base_uri": "https://localhost:8080/" 9 | }, 10 | "id": "tsZpFIaRfROD", 11 | "outputId": "e5e6a59d-d91b-41f7-a230-fa4e9bc3e449" 12 | }, 13 | "outputs": [ 14 | { 15 | "output_type": "stream", 16 | "name": "stderr", 17 | "text": [ 18 | "/usr/local/lib/python3.10/dist-packages/albumentations/__init__.py:13: UserWarning: A new version of Albumentations is available: 1.4.18 (you have 1.4.15). Upgrade using: pip install -U albumentations. 
To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.\n", 19 | " check_for_updates()\n" 20 | ] 21 | } 22 | ], 23 | "source": [ 24 | "import torch\n", 25 | "import torch.nn as nn\n", 26 | "import torchvision.models\n", 27 | "from torch.utils.data import Dataset, DataLoader\n", 28 | "import torch.optim as optim\n", 29 | "import torch.nn.functional as F\n", 30 | "\n", 31 | "import albumentations as A\n", 32 | "from albumentations.pytorch import ToTensorV2\n", 33 | "\n", 34 | "from tqdm import tqdm\n", 35 | "from PIL import Image\n", 36 | "import cv2\n", 37 | "import matplotlib.pyplot as plt\n", 38 | "import numpy as np\n", 39 | "\n", 40 | "import os\n", 41 | "from time import time" 42 | ] 43 | }, 44 | { 45 | "cell_type": "markdown", 46 | "source": [ 47 | "### Get the data" 48 | ], 49 | "metadata": { 50 | "id": "U9HlqnlYoUJM" 51 | } 52 | }, 53 | { 54 | "cell_type": "code", 55 | "source": [ 56 | "import gdown\n", 57 | "url = 'https://drive.google.com/uc?id=10f1H2T-5W-BiqabHHtlZ4ASs19TZmg8R'\n", 58 | "output = 'data.zip'\n", 59 | "gdown.download(url, output, quiet=False)" 60 | ], 61 | "metadata": { 62 | "colab": { 63 | "base_uri": "https://localhost:8080/", 64 | "height": 122 65 | }, 66 | "id": "AYTvLpTFfR9L", 67 | "outputId": "3baedbdb-2b28-4ed1-d627-647633ef1d94" 68 | }, 69 | "execution_count": null, 70 | "outputs": [ 71 | { 72 | "output_type": "stream", 73 | "name": "stderr", 74 | "text": [ 75 | "Downloading...\n", 76 | "From (original): https://drive.google.com/uc?id=10f1H2T-5W-BiqabHHtlZ4ASs19TZmg8R\n", 77 | "From (redirected): https://drive.google.com/uc?id=10f1H2T-5W-BiqabHHtlZ4ASs19TZmg8R&confirm=t&uuid=8c91a23e-b723-404e-864e-01c84f6f72f9\n", 78 | "To: /content/data.zip\n", 79 | "100%|██████████| 979M/979M [00:19<00:00, 50.4MB/s]\n" 80 | ] 81 | }, 82 | { 83 | "output_type": "execute_result", 84 | "data": { 85 | "text/plain": [ 86 | "'data.zip'" 87 | ], 88 | "application/vnd.google.colaboratory.intrinsic+json": { 89 | "type": "string" 90 | } 91 | }, 92 | "metadata": {}, 93 | "execution_count": 2 94 | } 95 | ] 96 | }, 97 | { 98 | "cell_type": "code", 99 | "source": [ 100 | "!unzip data.zip" 101 | ], 102 | "metadata": { 103 | "id": "TLSvVki2fzUf" 104 | }, 105 | "execution_count": null, 106 | "outputs": [] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "source": [ 111 | "### Utilities (0.5 point)\n", 112 | "\n", 113 | "Complete dataset to load prepared images and masks. Don't forget to use augmentations.\n", 114 | "\n", 115 | "Some of the images are 1 channels, so use `gray2rgb`." 116 | ], 117 | "metadata": { 118 | "id": "w1g03B9mtZeb" 119 | } 120 | }, 121 | { 122 | "cell_type": "code", 123 | "source": [ 124 | "def gray2rgb(img):\n", 125 | " if len(img.shape) != 3:\n", 126 | " img = np.dstack([img, img, img])\n", 127 | " return img\n", 128 | "\n", 129 | "def get_iou(gt, pred):\n", 130 | " pred = pred > 0.5\n", 131 | " return (gt & pred).sum() / (gt | pred).sum()\n", 132 | "\n", 133 | "class BirdsDataset(Dataset):\n", 134 | " def __init__(self, folder, ...) 
-> None:\n", 135 | " images_folder = os.path.join(folder, 'images')\n", 136 | " gt_folder = os.path.join(folder, 'gt')\n", 137 | "\n", 138 | " for class_name in os.listdir(images_folder):\n", 139 | " for fname in os.listdir(os.path.join(images_folder, class_name)):\n", 140 | " # YOUR CODE HERE\n", 141 | "\n", 142 | " self.transform = A.Compose([\n", 143 | " # YOUR CODE HERE\n", 144 | " ToTensorV2()\n", 145 | " ])\n", 146 | "\n", 147 | " def __getitem__(self, index):\n", 148 | " # YOUR CODE HERE\n", 149 | " img = ...\n", 150 | " mask = ...\n", 151 | " img = gray2rgb(img)\n", 152 | " # YOUR CODE HERE\n", 153 | " return transformed_img, transformed_mask\n", 154 | "\n", 155 | " def __len__(self):\n", 156 | " # YOUR CODE HERE\n", 157 | " return" 158 | ], 159 | "metadata": { 160 | "id": "YT2QUTqFooxJ" 161 | }, 162 | "execution_count": null, 163 | "outputs": [] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "source": [ 168 | "### Architecture (1 point)\n", 169 | "Your task for today is to build your own Unet to solve the segmentation problem.\n", 170 | "\n", 171 | "As an encoder, you can use pre-trained on IMAGENET models(or parts) from torchvision. The decoder must be trained from scratch.\n", 172 | "It is forbidden to use data not from the `data` folder.\n", 173 | "\n", 174 | "I advise you to experiment with the number of blocks so as not to overfit on the training sample and get good quality on validation." 175 | ], 176 | "metadata": { 177 | "id": "dss-ZnpTuI1V" 178 | } 179 | }, 180 | { 181 | "cell_type": "code", 182 | "source": [ 183 | "class DecoderBlock(nn.Module):\n", 184 | " def __init__(self, in_channels, mid_channels, out_channels):\n", 185 | " super().__init__()\n", 186 | " # YOUR CODE HERE\n", 187 | "\n", 188 | " def forward(self,x):\n", 189 | " # YOUR CODE HERE\n", 190 | " return\n", 191 | "\n", 192 | "class Unet(nn.Module):\n", 193 | " def __init__(self):\n", 194 | " super().__init__()\n", 195 | " # YOUR CODE HERE\n", 196 | " # encoder blocks\n", 197 | " self.encoder1=\n", 198 | " self.encoder2=\n", 199 | " self.encoder3=\n", 200 | " # decoder blocks\n", 201 | " self.decoder1=\n", 202 | " self.decoder2=\n", 203 | " self.decoder3=\n", 204 | "\n", 205 | "\n", 206 | " def forward(self,x):\n", 207 | " # YOUR CODE HERE\n", 208 | " return" 209 | ], 210 | "metadata": { 211 | "id": "_Elr1Uw3uITD" 212 | }, 213 | "execution_count": null, 214 | "outputs": [] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "source": [ 219 | "### Train script (0.5 point)\n", 220 | "\n", 221 | "Complete the train and predict scripts." 
222 | ], 223 | "metadata": { 224 | "id": "7Sq4WwZsuMeD" 225 | } 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": null, 230 | "metadata": { 231 | "id": "d_ha44iifROE" 232 | }, 233 | "outputs": [], 234 | "source": [ 235 | "def train_segmentation_model(data_path):\n", 236 | "    BATCH_SIZE = 8\n", 237 | "    N_EPOCH = 15\n", 238 | "    DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'\n", 239 | "\n", 240 | "    train_dataset = BirdsDataset(data_path + 'train')\n", 241 | "    val_dataset = BirdsDataset(data_path + 'val')\n", 242 | "    train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)\n", 243 | "    val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)\n", 244 | "\n", 245 | "    model = Unet().to(DEVICE)\n", 246 | "    optimizer = # YOUR CODE HERE\n", 247 | "    criterion = # YOUR CODE HERE\n", 248 | "    losses_train, losses_val, ious_train, ious_val = [], [], [], []\n", 249 | "\n", 250 | "    for epoch in range(N_EPOCH):\n", 251 | "        model.train()\n", 252 | "\n", 253 | "        for inputs, masks in tqdm(train_dataloader):\n", 254 | "            inputs = inputs.to(DEVICE)\n", 255 | "            masks = masks.to(DEVICE)\n", 256 | "            # YOUR CODE HERE\n", 257 | "        losses_train.append(...)\n", 258 | "        ious_train.append(...)\n", 259 | "\n", 260 | "        model.eval()\n", 261 | "        with torch.no_grad():\n", 262 | "            for inputs, masks in tqdm(val_dataloader):\n", 263 | "                inputs = inputs.to(DEVICE)\n", 264 | "                masks = masks.to(DEVICE)\n", 265 | "                # YOUR CODE HERE\n", 266 | "            losses_val.append(...)\n", 267 | "            ious_val.append(...)\n", 268 | "\n", 269 | "        torch.save(model.state_dict(), f'model_{epoch}.pth')\n", 270 | "\n", 271 | "        print(f\"Epoch: {epoch}, train loss: {losses_train[-1]}, val loss: {losses_val[-1]}, train iou: {ious_train[-1]}, val iou: {ious_val[-1]}\")" 272 | ] 273 | }, 274 | { 275 | "cell_type": "code", 276 | "source": [ 277 | "def predict(model, img_path):\n", 278 | "    with torch.no_grad():\n", 279 | "        # YOUR CODE HERE TO PREPARE IMAGE\n", 280 | "        # GET PREDICTIONS\n", 281 | "        # POST PROCESS\n", 282 | "        return segm\n", 283 | "\n", 284 | "def get_model(path):\n", 285 | "    model = Unet()\n", 286 | "    model.load_state_dict(torch.load(path))\n", 287 | "    model.eval()\n", 288 | "    return model" 289 | ], 290 | "metadata": { 291 | "id": "96EkIQmutpdS" 292 | }, 293 | "execution_count": null, 294 | "outputs": [] 295 | }, 296 | { 297 | "cell_type": "code", 298 | "execution_count": null, 299 | "metadata": { 300 | "id": "LzZS9Z2jfROF" 301 | }, 302 | "outputs": [], 303 | "source": [ 304 | "train_segmentation_model('data/')" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "source": [ 310 | "You can also experiment with different models and write a small report about the results. If the report is meaningful, you will receive an extra point."
311 | ], 312 | "metadata": { 313 | "id": "MWKD09whySKA" 314 | } 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "source": [ 319 | "### Testing (8 points)\n", 320 | "Your model will be tested on the new data, similar to validation, so use techniques to prevent overfitting the model.\n", 321 | "\n", 322 | "* IoU > 0.85 — 8 points\n", 323 | "* IoU > 0.80 — 7 points\n", 324 | "* IoU > 0.75 — 6 points\n", 325 | "* IoU > 0.70 — 5 points\n", 326 | "* IoU > 0.60 — 4 points\n", 327 | "* IoU > 0.50 — 3 points\n", 328 | "* IoU > 0.40 — 2 points\n", 329 | "* IoU > 0.30 — 1 points" 330 | ], 331 | "metadata": { 332 | "id": "zCHacSHutHo4" 333 | } 334 | }, 335 | { 336 | "cell_type": "code", 337 | "source": [ 338 | "model = get_model('model_14.pth').to('cuda')" 339 | ], 340 | "metadata": { 341 | "id": "DZ6h11Q0tUHN" 342 | }, 343 | "execution_count": null, 344 | "outputs": [] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": null, 349 | "metadata": { 350 | "id": "yV9zadusfROF" 351 | }, 352 | "outputs": [], 353 | "source": [ 354 | "ious, times = [], []\n", 355 | "test_dir = 'data/val/'\n", 356 | "\n", 357 | "for class_name in tqdm(sorted(os.listdir(os.path.join(test_dir, 'images')))):\n", 358 | " for img_name in sorted(os.listdir(os.path.join(test_dir, 'images', class_name))):\n", 359 | "\n", 360 | " t_start = time()\n", 361 | " pred = predict(model, os.path.join(test_dir, 'images', class_name, img_name))\n", 362 | " times.append(time() - t_start)\n", 363 | "\n", 364 | " gt_name = img_name.replace('jpg', 'png')\n", 365 | " gt = np.asarray(Image.open(os.path.join(test_dir, 'gt', class_name, gt_name)), dtype = np.uint8)\n", 366 | " if len(gt.shape) > 2:\n", 367 | " gt = gt[:, :, 0]\n", 368 | "\n", 369 | " iou = get_iou(gt==255, pred>0.5)\n", 370 | " ious.append(iou)\n", 371 | "\n", 372 | "np.mean(ious), np.mean(times)" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "source": [ 378 | "### Compression (1 point)" 379 | ], 380 | "metadata": { 381 | "id": "47KgrqdpvKWS" 382 | } 383 | }, 384 | { 385 | "cell_type": "markdown", 386 | "source": [ 387 | "Try to speed up the model in any way without losing more than 1% in iou score.\n", 388 | "For example [torch2trt](https://github.com/NVIDIA-AI-IOT/torch2trt)" 389 | ], 390 | "metadata": { 391 | "id": "4kJiLB__vTC3" 392 | } 393 | }, 394 | { 395 | "cell_type": "code", 396 | "source": [ 397 | "def get_fast_model():\n", 398 | " # YOUR CODE HERE\n", 399 | " return model" 400 | ], 401 | "metadata": { 402 | "id": "UQyNHbt0vtMu" 403 | }, 404 | "execution_count": null, 405 | "outputs": [] 406 | }, 407 | { 408 | "cell_type": "code", 409 | "source": [ 410 | "fast_model = get_fast_model().to('cuda')" 411 | ], 412 | "metadata": { 413 | "id": "f2DedST0v6aF" 414 | }, 415 | "execution_count": null, 416 | "outputs": [] 417 | }, 418 | { 419 | "cell_type": "code", 420 | "source": [ 421 | "ious, times = [], []\n", 422 | "test_dir = 'data/val/'\n", 423 | "\n", 424 | "for class_name in tqdm(sorted(os.listdir(os.path.join(test_dir, 'images')))):\n", 425 | " for img_name in sorted(os.listdir(os.path.join(test_dir, 'images', class_name))):\n", 426 | "\n", 427 | " t_start = time()\n", 428 | " pred = predict(fast_model, os.path.join(test_dir, 'images', class_name, img_name))\n", 429 | " times.append(time() - t_start)\n", 430 | "\n", 431 | " gt_name = img_name.replace('jpg', 'png')\n", 432 | " gt = np.asarray(Image.open(os.path.join(test_dir, 'gt', class_name, gt_name)), dtype = np.uint8)\n", 433 | " if len(gt.shape) > 2:\n", 434 | " gt = gt[:, :, 0]\n", 435 
| "\n", 436 | " iou = get_iou(gt==255, pred>0.5)\n", 437 | " ious.append(iou)\n", 438 | "\n", 439 | "np.mean(ious), np.mean(times)" 440 | ], 441 | "metadata": { 442 | "id": "ryWUekS2vlv8" 443 | }, 444 | "execution_count": null, 445 | "outputs": [] 446 | }, 447 | { 448 | "cell_type": "markdown", 449 | "source": [ 450 | "**Bonus:** For the best iou score on test(without compression) in group you will get 1.5, 1, 0.5 extra points(for 1st, 2nd, 3rd places)." 451 | ], 452 | "metadata": { 453 | "id": "QCdMgBoOwXAb" 454 | } 455 | }, 456 | { 457 | "cell_type": "code", 458 | "source": [], 459 | "metadata": { 460 | "id": "daanikNkwo5t" 461 | }, 462 | "execution_count": null, 463 | "outputs": [] 464 | } 465 | ], 466 | "metadata": { 467 | "kernelspec": { 468 | "display_name": "Python 3", 469 | "name": "python3" 470 | }, 471 | "language_info": { 472 | "codemirror_mode": { 473 | "name": "ipython", 474 | "version": 3 475 | }, 476 | "file_extension": ".py", 477 | "mimetype": "text/x-python", 478 | "name": "python", 479 | "nbconvert_exporter": "python", 480 | "pygments_lexer": "ipython3", 481 | "version": "3.12.2" 482 | }, 483 | "colab": { 484 | "provenance": [], 485 | "gpuType": "T4" 486 | }, 487 | "accelerator": "GPU" 488 | }, 489 | "nbformat": 4, 490 | "nbformat_minor": 0 491 | } -------------------------------------------------------------------------------- /week01_intro/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week01_intro/lecture.pdf -------------------------------------------------------------------------------- /week02_init_regularization/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week02_init_regularization/lecture.pdf -------------------------------------------------------------------------------- /week03_conv/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week03_conv/lecture.pdf -------------------------------------------------------------------------------- /week04_tricks/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week04_tricks/lecture.pdf -------------------------------------------------------------------------------- /week05_segmentation/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week05_segmentation/lecture.pdf -------------------------------------------------------------------------------- /week06_detection/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week06_detection/lecture.pdf -------------------------------------------------------------------------------- /week07_word_embeddings/lecture.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week07_word_embeddings/lecture.pdf -------------------------------------------------------------------------------- /week07_word_embeddings/seminar.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "id": "hchgr-3Nmn7o" 7 | }, 8 | "source": [ 9 | "## Seminar 1: Fun with Word Embeddings (3 points)\n", 10 | "\n", 11 | "Today we gonna play with word embeddings: train our own little embedding, load one from gensim model zoo and use it to visualize text corpora.\n", 12 | "\n", 13 | "This whole thing is gonna happen on top of embedding dataset.\n", 14 | "\n", 15 | "__Requirements:__ `pip install --upgrade nltk gensim bokeh` , but only if you're running locally." 16 | ] 17 | }, 18 | { 19 | "cell_type": "code", 20 | "execution_count": null, 21 | "metadata": { 22 | "collapsed": true, 23 | "id": "QmUCK9lVmn7q" 24 | }, 25 | "outputs": [], 26 | "source": [ 27 | "# download the data:\n", 28 | "!wget https://www.dropbox.com/s/obaitrix9jyu84r/quora.txt?dl=1 -O ./quora.txt\n", 29 | "# alternative download link: https://yadi.sk/i/BPQrUu1NaTduEw" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": null, 35 | "metadata": { 36 | "scrolled": false, 37 | "id": "YyzusR4Lmn7r" 38 | }, 39 | "outputs": [], 40 | "source": [ 41 | "import numpy as np\n", 42 | "\n", 43 | "data = list(open(\"./quora.txt\", encoding=\"utf-8\"))\n", 44 | "data[50]" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "id": "jOTmojdtmn7r" 51 | }, 52 | "source": [ 53 | "__Tokenization:__ a typical first step for an nlp task is to split raw data into words.\n", 54 | "The text we're working with is in raw format: with all the punctuation and smiles attached to some words, so a simple str.split won't do.\n", 55 | "\n", 56 | "Let's use __`nltk`__ - a library that handles many nlp tasks like tokenization, stemming or part-of-speech tagging." 
57 | ] 58 | }, 59 | { 60 | "cell_type": "code", 61 | "execution_count": null, 62 | "metadata": { 63 | "id": "Jya8V2Skmn7r" 64 | }, 65 | "outputs": [], 66 | "source": [ 67 | "from nltk.tokenize import WordPunctTokenizer\n", 68 | "tokenizer = WordPunctTokenizer()\n", 69 | "\n", 70 | "print(tokenizer.tokenize(data[50]))" 71 | ] 72 | }, 73 | { 74 | "cell_type": "code", 75 | "execution_count": null, 76 | "metadata": { 77 | "collapsed": true, 78 | "id": "kitrV92Amn7r" 79 | }, 80 | "outputs": [], 81 | "source": [ 82 | "# TASK: lowercase everything and extract tokens with tokenizer.\n", 83 | "# data_tok should be a list of lists of tokens for each line in data.\n", 84 | "\n", 85 | "data_tok = # YOUR CODE" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": null, 91 | "metadata": { 92 | "collapsed": true, 93 | "id": "bD7uAQzgmn7r" 94 | }, 95 | "outputs": [], 96 | "source": [ 97 | "assert all(isinstance(row, (list, tuple)) for row in data_tok), \"please convert each line into a list of tokens (strings)\"\n", 98 | "assert all(all(isinstance(tok, str) for tok in row) for row in data_tok), \"please convert each line into a list of tokens (strings)\"\n", 99 | "is_latin = lambda tok: all('a' <= x.lower() <= 'z' for x in tok)\n", 100 | "assert all(map(lambda l: not is_latin(l) or l.islower(), map(' '.join, data_tok))), \"please make sure to lowercase the data\"" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": null, 106 | "metadata": { 107 | "id": "sm2nO5yzmn7s" 108 | }, 109 | "outputs": [], 110 | "source": [ 111 | "print([' '.join(row) for row in data_tok[:2]])" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": { 117 | "id": "RloDQkKSmn7s" 118 | }, 119 | "source": [ 120 | "__Word vectors:__ as the saying goes, there's more than one way to train word embeddings. There's Word2Vec and GloVe with different objective functions. Then there's fasttext that uses character-level models to train word embeddings.\n", 121 | "\n", 122 | "The choice is huge, so let's start someplace small: __gensim__ is another nlp library that features many vector-based models incuding word2vec." 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": null, 128 | "metadata": { 129 | "collapsed": true, 130 | "id": "HT6ie7OWmn7s" 131 | }, 132 | "outputs": [], 133 | "source": [ 134 | "from gensim.models import Word2Vec\n", 135 | "model = Word2Vec(data_tok,\n", 136 | " vector_size=32, # embedding vector size\n", 137 | " min_count=5, # consider words that occured at least 5 times\n", 138 | " window=5).wv # define context as a 5-word window around the target word" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": { 145 | "id": "_utr_4ZEmn7s" 146 | }, 147 | "outputs": [], 148 | "source": [ 149 | "# now you can get word vectors !\n", 150 | "model.get_vector('anything')" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": null, 156 | "metadata": { 157 | "id": "x7X2rBLImn7s" 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "# or query similar words directly. Go play with it!\n", 162 | "model.most_similar('bread')" 163 | ] 164 | }, 165 | { 166 | "cell_type": "markdown", 167 | "metadata": { 168 | "id": "varM16R3mn7t" 169 | }, 170 | "source": [ 171 | "### Using pre-trained model\n", 172 | "\n", 173 | "Took it a while, huh? 
Now imagine training life-sized (100~300D) word embeddings on gigabytes of text: wikipedia articles or twitter posts.\n", 174 | "\n", 175 | "Thankfully, nowadays you can get a pre-trained word embedding model in 2 lines of code (no sms required, promise)." 176 | ] 177 | }, 178 | { 179 | "cell_type": "code", 180 | "execution_count": null, 181 | "metadata": { 182 | "collapsed": true, 183 | "id": "oeiEoLrUmn7t" 184 | }, 185 | "outputs": [], 186 | "source": [ 187 | "import gensim.downloader as api\n", 188 | "model = api.load('glove-twitter-100')" 189 | ] 190 | }, 191 | { 192 | "cell_type": "code", 193 | "execution_count": null, 194 | "metadata": { 195 | "id": "ysNoDw7Umn7t" 196 | }, 197 | "outputs": [], 198 | "source": [ 199 | "model.most_similar(positive=[\"coder\", \"money\"], negative=[\"brain\"])" 200 | ] 201 | }, 202 | { 203 | "cell_type": "markdown", 204 | "metadata": { 205 | "id": "_Kde3hgNmn7t" 206 | }, 207 | "source": [ 208 | "### Visualizing word vectors\n", 209 | "\n", 210 | "One way to see if our vectors are any good is to plot them. Thing is, those vectors are in 30D+ space and we humans are more used to 2-3D.\n", 211 | "\n", 212 | "Luckily, we machine learners know about __dimensionality reduction__ methods.\n", 213 | "\n", 214 | "Let's use that to plot 1000 most frequent words" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": null, 220 | "metadata": { 221 | "id": "l0yKTqYymn7t" 222 | }, 223 | "outputs": [], 224 | "source": [ 225 | "words = model.index_to_key[:1000]\n", 226 | "\n", 227 | "print(words[::100])" 228 | ] 229 | }, 230 | { 231 | "cell_type": "code", 232 | "execution_count": null, 233 | "metadata": { 234 | "id": "rLxEBnscmn7t" 235 | }, 236 | "outputs": [], 237 | "source": [ 238 | "# for each word, compute it's vector with model\n", 239 | "word_vectors = # YOUR CODE" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": null, 245 | "metadata": { 246 | "collapsed": true, 247 | "id": "lZ06vHSJmn7t" 248 | }, 249 | "outputs": [], 250 | "source": [ 251 | "assert isinstance(word_vectors, np.ndarray)\n", 252 | "assert word_vectors.shape == (len(words), 100)\n", 253 | "assert np.isfinite(word_vectors).all()" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": { 259 | "id": "S2wMcn29mn7t" 260 | }, 261 | "source": [ 262 | "#### Linear projection: PCA\n", 263 | "\n", 264 | "The simplest linear dimensionality reduction method is __P__rincipial __C__omponent __A__nalysis.\n", 265 | "\n", 266 | "In geometric terms, PCA tries to find axes along which most of the variance occurs. 
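One straightforward way to fill in the `word_vectors` matrix above, assuming `model` is the `glove-twitter-100` `KeyedVectors` object loaded earlier (so `model[word]` returns a 100-dimensional vector):

```python
import numpy as np

# stack one 100-d vector per word into a (len(words), 100) array
word_vectors = np.array([model[word] for word in words])

assert word_vectors.shape == (len(words), 100)
```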
The \"natural\" axes, if you wish.\n", 267 | "\n", 268 | "\n", 269 | "\n", 270 | "\n", 271 | "Under the hood, it attempts to decompose object-feature matrix $X$ into two smaller matrices: $W$ and $\\hat W$ minimizing _mean squared error_:\n", 272 | "\n", 273 | "$$\\|(X W) \\hat{W} - X\\|^2_2 \\to_{W, \\hat{W}} \\min$$\n", 274 | "- $X \\in \\mathbb{R}^{n \\times m}$ - object matrix (**centered**);\n", 275 | "- $W \\in \\mathbb{R}^{m \\times d}$ - matrix of direct transformation;\n", 276 | "- $\\hat{W} \\in \\mathbb{R}^{d \\times m}$ - matrix of reverse transformation;\n", 277 | "- $n$ samples, $m$ original dimensions and $d$ target dimensions;\n", 278 | "\n" 279 | ] 280 | }, 281 | { 282 | "cell_type": "code", 283 | "execution_count": null, 284 | "metadata": { 285 | "collapsed": true, 286 | "id": "USPP-k-Imn7t" 287 | }, 288 | "outputs": [], 289 | "source": [ 290 | "from sklearn.decomposition import PCA\n", 291 | "\n", 292 | "# map word vectors onto 2d plane with PCA. Use good old sklearn api (fit, transform)\n", 293 | "# after that, normalize vectors to make sure they have zero mean and unit variance\n", 294 | "word_vectors_pca = # YOUR CODE\n", 295 | "\n", 296 | "# and maybe MORE OF YOUR CODE here :)" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": null, 302 | "metadata": { 303 | "collapsed": true, 304 | "id": "NV_x7D4omn7t" 305 | }, 306 | "outputs": [], 307 | "source": [ 308 | "assert word_vectors_pca.shape == (len(word_vectors), 2), \"there must be a 2d vector for each word\"\n", 309 | "assert max(abs(word_vectors_pca.mean(0))) < 1e-5, \"points must be zero-centered\"" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": { 315 | "id": "VnybG7wHmn7t" 316 | }, 317 | "source": [ 318 | "#### Let's draw it!" 
319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": null, 324 | "metadata": { 325 | "id": "jo2-yN80mn7t" 326 | }, 327 | "outputs": [], 328 | "source": [ 329 | "import bokeh.models as bm, bokeh.plotting as pl\n", 330 | "from bokeh.io import output_notebook\n", 331 | "output_notebook()\n", 332 | "\n", 333 | "def draw_vectors(x, y, radius=10, alpha=0.25, color='blue',\n", 334 | " width=600, height=400, show=True, **kwargs):\n", 335 | " \"\"\" draws an interactive plot for data points with auxilirary info on hover \"\"\"\n", 336 | " if isinstance(color, str): color = [color] * len(x)\n", 337 | " data_source = bm.ColumnDataSource({ 'x' : x, 'y' : y, 'color': color, **kwargs })\n", 338 | "\n", 339 | " fig = pl.figure(active_scroll='wheel_zoom', width=width, height=height)\n", 340 | " fig.scatter('x', 'y', size=radius, color='color', alpha=alpha, source=data_source)\n", 341 | "\n", 342 | " fig.add_tools(bm.HoverTool(tooltips=[(key, \"@\" + key) for key in kwargs.keys()]))\n", 343 | " if show: pl.show(fig)\n", 344 | " return fig" 345 | ] 346 | }, 347 | { 348 | "cell_type": "code", 349 | "execution_count": null, 350 | "metadata": { 351 | "id": "6J1c7Q9bmn7t" 352 | }, 353 | "outputs": [], 354 | "source": [ 355 | "draw_vectors(word_vectors_pca[:, 0], word_vectors_pca[:, 1], token=words)\n", 356 | "\n", 357 | "# hover a mouse over there and see if you can identify the clusters" 358 | ] 359 | }, 360 | { 361 | "cell_type": "markdown", 362 | "metadata": { 363 | "id": "u9qhJAptmn7t" 364 | }, 365 | "source": [ 366 | "### Visualizing neighbors with t-SNE\n", 367 | "PCA is nice but it's strictly linear and thus only able to capture coarse high-level structure of the data.\n", 368 | "\n", 369 | "If we instead want to focus on keeping neighboring points near, we could use TSNE, which is itself an embedding method. Here you can read __[more on TSNE](https://distill.pub/2016/misread-tsne/)__." 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": null, 375 | "metadata": { 376 | "id": "UeQ2ixkHmn7t" 377 | }, 378 | "outputs": [], 379 | "source": [ 380 | "from sklearn.manifold import TSNE\n", 381 | "\n", 382 | "# map word vectors onto 2d plane with TSNE. hint: don't panic it may take a minute or two to fit.\n", 383 | "# normalize them as just lke with pca\n", 384 | "\n", 385 | "\n", 386 | "word_tsne = #YOUR CODE" 387 | ] 388 | }, 389 | { 390 | "cell_type": "code", 391 | "execution_count": null, 392 | "metadata": { 393 | "collapsed": true, 394 | "scrolled": false, 395 | "id": "I5sA7faVmn7t" 396 | }, 397 | "outputs": [], 398 | "source": [ 399 | "draw_vectors(word_tsne[:, 0], word_tsne[:, 1], color='green', token=words)" 400 | ] 401 | }, 402 | { 403 | "cell_type": "markdown", 404 | "metadata": { 405 | "id": "-j4S1Bwbmn7u" 406 | }, 407 | "source": [ 408 | "### Visualizing phrases\n", 409 | "\n", 410 | "Word embeddings can also be used to represent short phrases. 
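For the t-SNE cell above, here is one possible sketch (t-SNE is stochastic, so the picture will differ from run to run) before we move on to phrase embeddings:

```python
from sklearn.manifold import TSNE

# embed the word vectors into 2-d; this can take a minute or two
word_tsne = TSNE(n_components=2).fit_transform(word_vectors)

# normalize to zero mean and unit variance, just like with PCA
word_tsne = (word_tsne - word_tsne.mean(axis=0)) / word_tsne.std(axis=0)
```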
The simplest way is to take __an average__ of vectors for all tokens in the phrase with some weights.\n", 411 | "\n", 412 | "This trick is useful to identify what data are you working with: find if there are any outliers, clusters or other artefacts.\n", 413 | "\n", 414 | "Let's try this new hammer on our data!\n" 415 | ] 416 | }, 417 | { 418 | "cell_type": "code", 419 | "execution_count": null, 420 | "metadata": { 421 | "collapsed": true, 422 | "id": "zWEBCqxQmn7u" 423 | }, 424 | "outputs": [], 425 | "source": [ 426 | "def get_phrase_embedding(phrase):\n", 427 | " \"\"\"\n", 428 | " Convert phrase to a vector by aggregating it's word embeddings. See description above.\n", 429 | " \"\"\"\n", 430 | " # 1. lowercase phrase\n", 431 | " # 2. tokenize phrase\n", 432 | " # 3. average word vectors for all words in tokenized phrase\n", 433 | " # skip words that are not in model's vocabulary\n", 434 | " # if all words are missing from vocabulary, return zeros\n", 435 | "\n", 436 | " vector = np.zeros([model.vector_size], dtype='float32')\n", 437 | "\n", 438 | " # YOUR CODE\n", 439 | "\n", 440 | " return vector\n", 441 | "\n" 442 | ] 443 | }, 444 | { 445 | "cell_type": "code", 446 | "execution_count": null, 447 | "metadata": { 448 | "collapsed": true, 449 | "id": "Upwk1fsNmn7u" 450 | }, 451 | "outputs": [], 452 | "source": [ 453 | "vector = get_phrase_embedding(\"I'm very sure. This never happened to me before...\")\n", 454 | "\n", 455 | "assert np.allclose(vector[::10],\n", 456 | " np.array([ 0.31807372, -0.02558171, 0.0933293 , -0.1002182 , -1.0278689 ,\n", 457 | " -0.16621883, 0.05083408, 0.17989802, 1.3701859 , 0.08655966],\n", 458 | " dtype=np.float32))" 459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": { 465 | "collapsed": true, 466 | "id": "e1gQrVSVmn7u" 467 | }, 468 | "outputs": [], 469 | "source": [ 470 | "# let's only consider ~5k phrases for a first run.\n", 471 | "chosen_phrases = data[::len(data) // 1000]\n", 472 | "\n", 473 | "# compute vectors for chosen phrases\n", 474 | "phrase_vectors = # YOUR CODE" 475 | ] 476 | }, 477 | { 478 | "cell_type": "code", 479 | "execution_count": null, 480 | "metadata": { 481 | "collapsed": true, 482 | "id": "pWXfU6rTmn7u" 483 | }, 484 | "outputs": [], 485 | "source": [ 486 | "assert isinstance(phrase_vectors, np.ndarray) and np.isfinite(phrase_vectors).all()\n", 487 | "assert phrase_vectors.shape == (len(chosen_phrases), model.vector_size)" 488 | ] 489 | }, 490 | { 491 | "cell_type": "code", 492 | "execution_count": null, 493 | "metadata": { 494 | "collapsed": true, 495 | "id": "g8P1tU0omn7u" 496 | }, 497 | "outputs": [], 498 | "source": [ 499 | "# map vectors into 2d space with pca, tsne or your other method of choice\n", 500 | "# don't forget to normalize\n", 501 | "\n", 502 | "phrase_vectors_2d = TSNE().fit_transform(phrase_vectors)\n", 503 | "\n", 504 | "phrase_vectors_2d = (phrase_vectors_2d - phrase_vectors_2d.mean(axis=0)) / phrase_vectors_2d.std(axis=0)" 505 | ] 506 | }, 507 | { 508 | "cell_type": "code", 509 | "execution_count": null, 510 | "metadata": { 511 | "collapsed": true, 512 | "id": "N_zCSz5Zmn7u" 513 | }, 514 | "outputs": [], 515 | "source": [ 516 | "draw_vectors(phrase_vectors_2d[:, 0], phrase_vectors_2d[:, 1],\n", 517 | " phrase=[phrase[:50] for phrase in chosen_phrases],\n", 518 | " radius=20,)" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": { 524 | "id": "ML_oG0Nlmn7u" 525 | }, 526 | "source": [ 527 | "Finally, let's build a simple \"similar question\" 
engine with phrase embeddings we've built." 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": null, 533 | "metadata": { 534 | "collapsed": true, 535 | "id": "tfp3TEFpmn7u" 536 | }, 537 | "outputs": [], 538 | "source": [ 539 | "# compute vector embedding for all lines in data\n", 540 | "data_vectors = np.array([get_phrase_embedding(l) for l in data])" 541 | ] 542 | }, 543 | { 544 | "cell_type": "code", 545 | "execution_count": null, 546 | "metadata": { 547 | "collapsed": true, 548 | "id": "-F9ozB8umn7u" 549 | }, 550 | "outputs": [], 551 | "source": [ 552 | "def find_nearest(query, k=10):\n", 553 | " \"\"\"\n", 554 | " given text line (query), return k most similar lines from data, sorted from most to least similar\n", 555 | " similarity should be measured as cosine between query and line embedding vectors\n", 556 | " hint: it's okay to use global variables: data and data_vectors. see also: np.argpartition, np.argsort\n", 557 | " \"\"\"\n", 558 | " # YOUR CODE\n", 559 | "\n", 560 | " return " 561 | ] 562 | }, 563 | { 564 | "cell_type": "code", 565 | "execution_count": null, 566 | "metadata": { 567 | "collapsed": true, 568 | "id": "9HKuytD-mn7u" 569 | }, 570 | "outputs": [], 571 | "source": [ 572 | "results = find_nearest(query=\"How do i enter the matrix?\", k=10)\n", 573 | "\n", 574 | "print(''.join(results))\n", 575 | "\n", 576 | "assert len(results) == 10 and isinstance(results[0], str)\n", 577 | "assert results[0] == 'How do I get to the dark web?\\n'\n", 578 | "assert results[3] == 'What can I do to save the world?\\n'" 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": null, 584 | "metadata": { 585 | "collapsed": true, 586 | "id": "YG1OIIQ3mn7y" 587 | }, 588 | "outputs": [], 589 | "source": [ 590 | "find_nearest(query=\"How does Trump?\", k=10)" 591 | ] 592 | }, 593 | { 594 | "cell_type": "code", 595 | "execution_count": null, 596 | "metadata": { 597 | "collapsed": true, 598 | "id": "YZzWxmukmn7y" 599 | }, 600 | "outputs": [], 601 | "source": [ 602 | "find_nearest(query=\"Why don't i ask a question myself?\", k=10)" 603 | ] 604 | }, 605 | { 606 | "cell_type": "markdown", 607 | "metadata": { 608 | "collapsed": true, 609 | "id": "Oj76TY5Ymn7y" 610 | }, 611 | "source": [ 612 | "__Now what?__\n", 613 | "* Try running TSNE on all data, not just 1000 phrases\n", 614 | "* See what other embeddings are there in the model zoo: `gensim.downloader.info()`\n", 615 | "* Take a look at [FastText](https://github.com/facebookresearch/fastText) embeddings\n", 616 | "* Optimize find_nearest with locality-sensitive hashing: use [nearpy](https://github.com/pixelogik/NearPy) or `sklearn.neighbors`." 
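For reference, one possible shape of the two functions left as exercises above, `get_phrase_embedding` and `find_nearest`. The plain unweighted average below may not reproduce the exact reference numbers in the asserts (a weighted variant is also acceptable), so treat it as a starting point rather than the canonical answer; it assumes `tokenizer`, `model`, `data` and `data_vectors` from the cells above.

```python
import numpy as np

def get_phrase_embedding(phrase):
    """Unweighted mean of the vectors of all in-vocabulary tokens in the phrase."""
    vector = np.zeros([model.vector_size], dtype='float32')
    tokens = tokenizer.tokenize(phrase.lower())
    # keep only tokens the embedding model actually knows about
    known = [tok for tok in tokens if tok in model.key_to_index]
    if known:
        vector = np.mean([model[tok] for tok in known], axis=0)
    return vector

def find_nearest(query, k=10):
    """Return k lines from `data` most cosine-similar to the query."""
    query_vec = get_phrase_embedding(query)
    # cosine similarity = dot product divided by the product of norms
    norms = np.linalg.norm(data_vectors, axis=1) * np.linalg.norm(query_vec) + 1e-9
    similarities = data_vectors @ query_vec / norms
    best = np.argsort(-similarities)[:k]      # indices, most similar first
    return [data[i] for i in best]
```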
617 | ] 618 | } 619 | ], 620 | "metadata": { 621 | "kernelspec": { 622 | "display_name": "Python 3", 623 | "language": "python", 624 | "name": "python3" 625 | }, 626 | "language_info": { 627 | "codemirror_mode": { 628 | "name": "ipython", 629 | "version": 3 630 | }, 631 | "file_extension": ".py", 632 | "mimetype": "text/x-python", 633 | "name": "python", 634 | "nbconvert_exporter": "python", 635 | "pygments_lexer": "ipython3", 636 | "version": "3.5.2" 637 | }, 638 | "colab": { 639 | "provenance": [] 640 | } 641 | }, 642 | "nbformat": 4, 643 | "nbformat_minor": 0 644 | } -------------------------------------------------------------------------------- /week08_text_classification/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week08_text_classification/lecture.pdf -------------------------------------------------------------------------------- /week09_transformer/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week09_transformer/lecture.pdf -------------------------------------------------------------------------------- /week09_transformer/seminar.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "id": "zriTdjauH8iQ", 8 | "colab": { 9 | "base_uri": "https://localhost:8080/" 10 | }, 11 | "outputId": "f21304a0-5eef-4ade-e088-948c5db9a171" 12 | }, 13 | "outputs": [ 14 | { 15 | "output_type": "stream", 16 | "name": "stdout", 17 | "text": [ 18 | "\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/480.6 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[90m╺\u001b[0m \u001b[32m471.0/480.6 kB\u001b[0m \u001b[31m22.9 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m480.6/480.6 kB\u001b[0m \u001b[31m10.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 19 | "\u001b[?25h\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/84.0 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m84.0/84.0 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 20 | "\u001b[?25h\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/116.3 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m7.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 21 | "\u001b[?25h\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/179.3 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m179.3/179.3 kB\u001b[0m \u001b[31m12.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 22 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m10.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 23 | "\u001b[2K 
\u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m12.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n", 24 | "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", 25 | "gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible.\u001b[0m\u001b[31m\n", 26 | "\u001b[0m" 27 | ] 28 | } 29 | ], 30 | "source": [ 31 | "!pip install transformers datasets evaluate -q\n", 32 | "import transformers" 33 | ] 34 | }, 35 | { 36 | "cell_type": "markdown", 37 | "metadata": { 38 | "id": "xQiRPWWHlSgv" 39 | }, 40 | "source": [ 41 | "### Using pre-trained transformers (seminar is worth 2 points)\n", 42 | "_for fun and profit_\n", 43 | "\n", 44 | "There are many toolkits that let you access pre-trained transformer models, but the most powerful and convenient by far is [`huggingface/transformers`](https://github.com/huggingface/transformers). In this week's practice, you'll learn how to download, apply and modify pre-trained transformers for a range of tasks. Buckle up, we're going in!\n", 45 | "\n", 46 | "\n", 47 | "__Pipelines:__ if all you want is to apply a pre-trained model, you can do that in one line of code using pipeline. Huggingface/transformers has a selection of pre-configured pipelines for masked language modelling, sentiment classification, question aswering, etc. ([see full list here](https://huggingface.co/transformers/main_classes/pipelines.html))\n", 48 | "\n", 49 | "A typical pipeline includes:\n", 50 | "* pre-processing, e.g. tokenization, subword segmentation\n", 51 | "* a backbone model, e.g. bert finetuned for classification\n", 52 | "* output post-processing\n", 53 | "\n", 54 | "Let's see it in action:" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": null, 60 | "metadata": { 61 | "id": "rP1KFtvLlJHR", 62 | "colab": { 63 | "base_uri": "https://localhost:8080/", 64 | "height": 284, 65 | "referenced_widgets": [ 66 | "9fe16c621bd643318bc341864efb3e4d", 67 | "d930d2cef8ed4ef29db50c13b9103056", 68 | "e8bbd6a633f54344aced163ff020c6b9", 69 | "53fd63ef3e6940c3a1a7da0d26f5bf00", 70 | "05c022d131f1454e8d9674da8bba015d", 71 | "1b51b4f4463341c1acdb4c57dbf8130b", 72 | "9fa2a910690040aabcd2b05e449b0c45", 73 | "f0ff5f31afd4450ea185123a577df0cf", 74 | "be63ea3b752642b6bf7e74e7fdc87430", 75 | "8dffc51581f74d7c81c6bdb3feb7ae4f", 76 | "e9a828847dcf41b5bbc1eaa44fde2c95", 77 | "e4e616b460ef4b11b4c352d467f494dc", 78 | "7ce760b40b834494a1efa1b078985bff", 79 | "d6f3f9f5db014075bd4fbca506e943f8", 80 | "97716ed55a074ce0b52e68be926a09c3", 81 | "86551d955c684d43b7f51d52cec868ec", 82 | "368689f0e8974d7eae2fd497c996ba20", 83 | "826d46d71d35444a8123fffd107d1f1b", 84 | "82fb664d4d614a0f9bea5b297783f0d5", 85 | "28d6b5e50d4545559dd0d610b30c3b96", 86 | "33ee09dba6d5422896396a102f78dbba", 87 | "e0a71a1e15fc4070b3147560f0e7d694", 88 | "c57238fcb356435ea7ad8daf7761879a", 89 | "0e35c262e8d343f7b83016b8c61fd744", 90 | "9081142688d44540a28e4b08e4091270", 91 | "83fdaa28d4ae47fea15f4ec950f1825f", 92 | "493ff70e5b59414c8ff285fc50391b12", 93 | "f6eaee53284c427f99126187a1883081", 94 | "04c3cf767105473fb6756223ef2d0030", 95 | "7ffd07ee1d4d482d8a9f1e2d8ddb2772", 96 | "15898c9d06734d60b23ee92d03592c33", 97 | "92bef379c00f46e2b02ace047c50a9af", 98 | "9a0a067c17b64df2a1184986c2412b26", 99 | "668ed89f0cdd460498543af635f4dd68", 100 | "ae9729bae35748b9b757ab558d50bdc4", 101 | 
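As a minimal illustration of the pipeline idea described above, a pre-configured sentiment classifier can be built and applied in two lines; the example sentence here is arbitrary.

```python
from transformers import pipeline

# one line: preprocessing + fine-tuned backbone + output postprocessing
classifier = pipeline('sentiment-analysis')

print(classifier("BERT is amazing!"))
# -> [{'label': 'POSITIVE', 'score': ...}]
```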
"31797fec525e43aa9f968b63e95b8aaa", 102 | "5ec34446ceeb4f37a9f9ca0e43103416", 103 | "f0d3166484384e15ae480e8d42504d52", 104 | "560e25624e6844c7b92f705b9a6d06f5", 105 | "521e0794f59a4d0c8a74858d8ca8802c", 106 | "d6ba19a6e2374b34bd7a954a0f29a95a", 107 | "d04c2e5c9a9d4dc59a4ef5d61f55faf7", 108 | "0b4f45ff04994accb494587a662a648a", 109 | "beb8dd797e774760b641d27ecde161bc" 110 | ] 111 | }, 112 | "outputId": "fd2868b8-1a26-4873-da4d-011719825781" 113 | }, 114 | "outputs": [ 115 | { 116 | "output_type": "stream", 117 | "name": "stderr", 118 | "text": [ 119 | "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: \n", 120 | "The secret `HF_TOKEN` does not exist in your Colab secrets.\n", 121 | "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n", 122 | "You will be able to reuse this secret in all of your notebooks.\n", 123 | "Please note that authentication is recommended but still optional to access public models or datasets.\n", 124 | " warnings.warn(\n" 125 | ] 126 | }, 127 | { 128 | "output_type": "display_data", 129 | "data": { 130 | "text/plain": [ 131 | "config.json: 0%| | 0.00/629 [00:00\n", 223 | "outputs = \n", 224 | "\n", 225 | "assert sum(outputs.values()) == 3 and outputs[base64.decodebytes(b'YmFyYXRoZW9u\\n').decode()] == False\n", 226 | "print(\"Well done!\")" 227 | ] 228 | }, 229 | { 230 | "cell_type": "markdown", 231 | "metadata": { 232 | "id": "BRDhIH-XpSNo" 233 | }, 234 | "source": [ 235 | "You can also access vanilla Masked Language Model that was trained to predict masked words. Here's how:" 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": null, 241 | "metadata": { 242 | "id": "pa-8noIllRbZ" 243 | }, 244 | "outputs": [], 245 | "source": [ 246 | "mlm_model = pipeline('fill-mask', model=\"bert-base-uncased\")\n", 247 | "MASK = mlm_model.tokenizer.mask_token\n", 248 | "\n", 249 | "for hypo in mlm_model(f\"Donald {MASK} is the president of the united states.\"):\n", 250 | " print(f\"P={hypo['score']:.5f}\", hypo['sequence'])" 251 | ] 252 | }, 253 | { 254 | "cell_type": "code", 255 | "execution_count": null, 256 | "metadata": { 257 | "id": "9NxeG1Y5pwX1" 258 | }, 259 | "outputs": [], 260 | "source": [ 261 | "# Your turn: use bert to recall what year was the Soviet Union founded in\n", 262 | "mlm_model()" 263 | ] 264 | }, 265 | { 266 | "cell_type": "markdown", 267 | "metadata": { 268 | "id": "YJxRFzCSq903" 269 | }, 270 | "source": [ 271 | "```\n", 272 | "\n", 273 | "```\n", 274 | "\n", 275 | "```\n", 276 | "\n", 277 | "```\n", 278 | "\n", 279 | "\n", 280 | "Huggingface offers hundreds of pre-trained models that specialize on different tasks. You can quickly find the model you need using [this list](https://huggingface.co/models).\n" 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": null, 286 | "metadata": { 287 | "id": "HRux8Qp2hkXr" 288 | }, 289 | "outputs": [], 290 | "source": [ 291 | "text = \"\"\"Almost two-thirds of the 1.5 million people who viewed this liveblog had Googled to discover\n", 292 | " the latest on the Rosetta mission. 
They were treated to this detailed account by the Guardian’s science editor,\n", 293 | " Ian Sample, and astronomy writer Stuart Clark of the moment scientists landed a robotic spacecraft on a comet\n", 294 | " for the first time in history, and the delirious reaction it provoked at their headquarters in Germany.\n", 295 | " “We are there. We are sitting on the surface. Philae is talking to us,” said one scientist.\n", 296 | "\"\"\"\n", 297 | "\n", 298 | "# Task: create a pipeline for named entity recognition, use task name 'ner' and search for the right model in the list\n", 299 | "ner_model = \n", 300 | "\n", 301 | "named_entities = ner_model(text)" 302 | ] 303 | }, 304 | { 305 | "cell_type": "code", 306 | "execution_count": null, 307 | "metadata": { 308 | "id": "hf57MRzSiSON" 309 | }, 310 | "outputs": [], 311 | "source": [ 312 | "print('OUTPUT:', named_entities)\n", 313 | "word_to_entity = {item['word']: item['entity'] for item in named_entities}\n", 314 | "assert 'org' in word_to_entity.get('Guardian').lower() and 'per' in word_to_entity.get('Stuart').lower()\n", 315 | "print(\"All tests passed\")" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": { 321 | "id": "ULMownz6sP9n" 322 | }, 323 | "source": [ 324 | "### The building blocks of a pipeline\n", 325 | "\n", 326 | "Huggingface also allows you to access its pipelines on a lower level. There are two main abstractions for you:\n", 327 | "* `Tokenizer` - converts from strings to token ids and back\n", 328 | "* `Model` - a pytorch `nn.Module` with pre-trained weights\n", 329 | "\n", 330 | "You can use such models as part of your regular pytorch code: insert is as a layer in your model, apply it to a batch of data, backpropagate, optimize, etc." 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": null, 336 | "metadata": { 337 | "id": "KMJbV0QVsO0Q" 338 | }, 339 | "outputs": [], 340 | "source": [ 341 | "import torch\n", 342 | "from transformers import AutoTokenizer, AutoModel, pipeline\n", 343 | "\n", 344 | "model_name = 'bert-base-uncased'\n", 345 | "tokenizer = AutoTokenizer.from_pretrained(model_name)\n", 346 | "model = AutoModel.from_pretrained(model_name)\n" 347 | ] 348 | }, 349 | { 350 | "cell_type": "code", 351 | "execution_count": null, 352 | "metadata": { 353 | "id": "ZgSPHKPRxG6U" 354 | }, 355 | "outputs": [], 356 | "source": [ 357 | "lines = [\n", 358 | " \"Luke, I am your father.\",\n", 359 | " \"Life is what happens when you're busy making other plans.\",\n", 360 | " ]\n", 361 | "\n", 362 | "# tokenize a batch of inputs. 
\"pt\" means [p]y[t]orch tensors\n", 363 | "tokens_info = tokenizer(lines, padding=True, truncation=True, return_tensors=\"pt\")\n", 364 | "\n", 365 | "for key in tokens_info:\n", 366 | " print(key, tokens_info[key])\n", 367 | "\n", 368 | "print(\"Detokenized:\")\n", 369 | "for i in range(2):\n", 370 | " print(tokenizer.decode(tokens_info['input_ids'][i]))" 371 | ] 372 | }, 373 | { 374 | "cell_type": "code", 375 | "execution_count": null, 376 | "metadata": { 377 | "id": "MJkbHxERyfL4" 378 | }, 379 | "outputs": [], 380 | "source": [ 381 | "# You can now apply the model to get embeddings\n", 382 | "with torch.no_grad():\n", 383 | " out = model(**tokens_info)\n", 384 | "\n", 385 | "print(out['pooler_output'])" 386 | ] 387 | }, 388 | { 389 | "cell_type": "code", 390 | "execution_count": null, 391 | "metadata": { 392 | "id": "vWCajBGcAern" 393 | }, 394 | "outputs": [], 395 | "source": [ 396 | "import torch\n", 397 | "import numpy as np\n", 398 | "from transformers import GPT2Tokenizer, GPT2LMHeadModel\n", 399 | "\n", 400 | "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n", 401 | "tokenizer = GPT2Tokenizer.from_pretrained('gpt2', add_prefix_space=True)\n", 402 | "model = GPT2LMHeadModel.from_pretrained('gpt2').train(False).to(device)\n", 403 | "\n", 404 | "text = \"The Fermi paradox \"\n", 405 | "tokens = tokenizer.encode(text)\n", 406 | "num_steps = 1024 - len(tokens) + 1\n", 407 | "line_length, max_length = 0, 70\n", 408 | "\n", 409 | "print(end=tokenizer.decode(tokens))\n", 410 | "\n", 411 | "for i in range(num_steps):\n", 412 | " with torch.no_grad():\n", 413 | " logits = model(torch.as_tensor([tokens], device=device))[0]\n", 414 | " p_next = torch.softmax(logits[0, -1, :], dim=-1).data.cpu().numpy()\n", 415 | "\n", 416 | " next_token_index = p_next.argmax() #\n", 417 | " # YOUR TASK: change the code so that it performs nucleus sampling\n", 418 | "\n", 419 | " tokens.append(int(next_token_index))\n", 420 | " print(end=tokenizer.decode(tokens[-1]))\n", 421 | " line_length += len(tokenizer.decode(tokens[-1]))\n", 422 | " if line_length >= max_length:\n", 423 | " line_length = 0\n", 424 | " print()\n", 425 | "\n" 426 | ] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": { 431 | "id": "_Vij7Gc1wOaq" 432 | }, 433 | "source": [ 434 | "Transformers knowledge hub: https://huggingface.co/transformers/" 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "source": [ 440 | "Just pytorch (in particular) models, so we can train then as usual" 441 | ], 442 | "metadata": { 443 | "id": "QxMgH1-OGfw7" 444 | } 445 | }, 446 | { 447 | "cell_type": "code", 448 | "source": [ 449 | "from datasets import load_dataset\n", 450 | "from transformers import AutoTokenizer\n", 451 | "\n", 452 | "raw_datasets = load_dataset(\"glue\", \"mrpc\")\n", 453 | "checkpoint = \"bert-base-uncased\"\n", 454 | "tokenizer = AutoTokenizer.from_pretrained(checkpoint)\n", 455 | "\n", 456 | "\n", 457 | "def tokenize_function(example):\n", 458 | " return tokenizer(example[\"sentence1\"], example[\"sentence2\"], truncation=True)\n", 459 | "\n", 460 | "\n", 461 | "tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)\n", 462 | "tokenized_datasets = tokenized_datasets.remove_columns([\"sentence1\", \"sentence2\", \"idx\"])\n", 463 | "tokenized_datasets = tokenized_datasets.rename_column(\"label\", \"labels\")\n", 464 | "tokenized_datasets.set_format(\"torch\")\n", 465 | "tokenized_datasets[\"train\"].column_names" 466 | ], 467 | "metadata": { 468 | "id": "oWjfs7ZQGhv9" 469 | }, 470 | 
"execution_count": null, 471 | "outputs": [] 472 | }, 473 | { 474 | "cell_type": "code", 475 | "source": [ 476 | "from torch.utils.data import DataLoader\n", 477 | "from transformers import DataCollatorWithPadding\n", 478 | "\n", 479 | "data_collator = DataCollatorWithPadding(tokenizer=tokenizer)\n", 480 | "\n", 481 | "train_dataloader = DataLoader(\n", 482 | " tokenized_datasets[\"train\"], shuffle=True, batch_size=8, collate_fn=data_collator\n", 483 | ")\n", 484 | "eval_dataloader = DataLoader(\n", 485 | " tokenized_datasets[\"validation\"], batch_size=8, collate_fn=data_collator\n", 486 | ")" 487 | ], 488 | "metadata": { 489 | "id": "uspRhoQNGrE3" 490 | }, 491 | "execution_count": null, 492 | "outputs": [] 493 | }, 494 | { 495 | "cell_type": "code", 496 | "source": [ 497 | "from transformers import AdamW, AutoModelForSequenceClassification, get_scheduler\n", 498 | "import torch\n", 499 | "from tqdm import tqdm\n", 500 | "\n", 501 | "model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)\n", 502 | "optimizer = AdamW(model.parameters(), lr=3e-5)\n", 503 | "\n", 504 | "device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n", 505 | "model.to(device)\n", 506 | "\n", 507 | "num_epochs = 3\n", 508 | "num_training_steps = num_epochs * len(train_dataloader)\n", 509 | "lr_scheduler = get_scheduler(\n", 510 | " \"linear\",\n", 511 | " optimizer=optimizer,\n", 512 | " num_warmup_steps=0,\n", 513 | " num_training_steps=num_training_steps,\n", 514 | ")\n", 515 | "\n", 516 | "progress_bar = tqdm(range(num_training_steps))\n", 517 | "\n", 518 | "model.train()\n", 519 | "for epoch in range(num_epochs):\n", 520 | " for batch in train_dataloader:\n", 521 | " batch = {k: v.to(device) for k, v in batch.items()}\n", 522 | " outputs = model(**batch)\n", 523 | " loss = outputs.loss\n", 524 | " loss.backward()\n", 525 | "\n", 526 | " optimizer.step()\n", 527 | " lr_scheduler.step()\n", 528 | " optimizer.zero_grad()\n", 529 | " progress_bar.update(1)" 530 | ], 531 | "metadata": { 532 | "id": "vo7MlmdiG1Jx" 533 | }, 534 | "execution_count": null, 535 | "outputs": [] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "source": [ 540 | "import evaluate\n", 541 | "\n", 542 | "metric = evaluate.load(\"glue\", \"mrpc\")\n", 543 | "model.eval()\n", 544 | "for batch in eval_dataloader:\n", 545 | " batch = {k: v.to(device) for k, v in batch.items()}\n", 546 | " with torch.no_grad():\n", 547 | " outputs = model(**batch)\n", 548 | "\n", 549 | " logits = outputs.logits\n", 550 | " predictions = torch.argmax(logits, dim=-1)\n", 551 | " metric.add_batch(predictions=predictions, references=batch[\"labels\"])\n", 552 | "\n", 553 | "metric.compute()" 554 | ], 555 | "metadata": { 556 | "id": "wqtMj9JbH8-l" 557 | }, 558 | "execution_count": null, 559 | "outputs": [] 560 | } 561 | ], 562 | "metadata": { 563 | "accelerator": "GPU", 564 | "colab": { 565 | "provenance": [] 566 | }, 567 | "kernelspec": { 568 | "display_name": "Python 3", 569 | "language": "python", 570 | "name": "python3" 571 | }, 572 | "language_info": { 573 | "codemirror_mode": { 574 | "name": "ipython", 575 | "version": 3 576 | }, 577 | "file_extension": ".py", 578 | "mimetype": "text/x-python", 579 | "name": "python", 580 | "nbconvert_exporter": "python", 581 | "pygments_lexer": "ipython3", 582 | "version": "3.8.8" 583 | }, 584 | "widgets": { 585 | "application/vnd.jupyter.widget-state+json": { 586 | "9fe16c621bd643318bc341864efb3e4d": { 587 | "model_module": "@jupyter-widgets/controls", 588 | 
"model_name": "HBoxModel", 589 | "model_module_version": "1.5.0", 590 | "state": { 591 | "_dom_classes": [], 592 | "_model_module": "@jupyter-widgets/controls", 593 | "_model_module_version": "1.5.0", 594 | "_model_name": "HBoxModel", 595 | "_view_count": null, 596 | "_view_module": "@jupyter-widgets/controls", 597 | "_view_module_version": "1.5.0", 598 | "_view_name": "HBoxView", 599 | "box_style": "", 600 | "children": [ 601 | "IPY_MODEL_d930d2cef8ed4ef29db50c13b9103056", 602 | "IPY_MODEL_e8bbd6a633f54344aced163ff020c6b9", 603 | "IPY_MODEL_53fd63ef3e6940c3a1a7da0d26f5bf00" 604 | ], 605 | "layout": "IPY_MODEL_05c022d131f1454e8d9674da8bba015d" 606 | } 607 | }, 608 | "d930d2cef8ed4ef29db50c13b9103056": { 609 | "model_module": "@jupyter-widgets/controls", 610 | "model_name": "HTMLModel", 611 | "model_module_version": "1.5.0", 612 | "state": { 613 | "_dom_classes": [], 614 | "_model_module": "@jupyter-widgets/controls", 615 | "_model_module_version": "1.5.0", 616 | "_model_name": "HTMLModel", 617 | "_view_count": null, 618 | "_view_module": "@jupyter-widgets/controls", 619 | "_view_module_version": "1.5.0", 620 | "_view_name": "HTMLView", 621 | "description": "", 622 | "description_tooltip": null, 623 | "layout": "IPY_MODEL_1b51b4f4463341c1acdb4c57dbf8130b", 624 | "placeholder": "​", 625 | "style": "IPY_MODEL_9fa2a910690040aabcd2b05e449b0c45", 626 | "value": "config.json: 100%" 627 | } 628 | }, 629 | "e8bbd6a633f54344aced163ff020c6b9": { 630 | "model_module": "@jupyter-widgets/controls", 631 | "model_name": "FloatProgressModel", 632 | "model_module_version": "1.5.0", 633 | "state": { 634 | "_dom_classes": [], 635 | "_model_module": "@jupyter-widgets/controls", 636 | "_model_module_version": "1.5.0", 637 | "_model_name": "FloatProgressModel", 638 | "_view_count": null, 639 | "_view_module": "@jupyter-widgets/controls", 640 | "_view_module_version": "1.5.0", 641 | "_view_name": "ProgressView", 642 | "bar_style": "success", 643 | "description": "", 644 | "description_tooltip": null, 645 | "layout": "IPY_MODEL_f0ff5f31afd4450ea185123a577df0cf", 646 | "max": 629, 647 | "min": 0, 648 | "orientation": "horizontal", 649 | "style": "IPY_MODEL_be63ea3b752642b6bf7e74e7fdc87430", 650 | "value": 629 651 | } 652 | }, 653 | "53fd63ef3e6940c3a1a7da0d26f5bf00": { 654 | "model_module": "@jupyter-widgets/controls", 655 | "model_name": "HTMLModel", 656 | "model_module_version": "1.5.0", 657 | "state": { 658 | "_dom_classes": [], 659 | "_model_module": "@jupyter-widgets/controls", 660 | "_model_module_version": "1.5.0", 661 | "_model_name": "HTMLModel", 662 | "_view_count": null, 663 | "_view_module": "@jupyter-widgets/controls", 664 | "_view_module_version": "1.5.0", 665 | "_view_name": "HTMLView", 666 | "description": "", 667 | "description_tooltip": null, 668 | "layout": "IPY_MODEL_8dffc51581f74d7c81c6bdb3feb7ae4f", 669 | "placeholder": "​", 670 | "style": "IPY_MODEL_e9a828847dcf41b5bbc1eaa44fde2c95", 671 | "value": " 629/629 [00:00<00:00, 26.2kB/s]" 672 | } 673 | }, 674 | "05c022d131f1454e8d9674da8bba015d": { 675 | "model_module": "@jupyter-widgets/base", 676 | "model_name": "LayoutModel", 677 | "model_module_version": "1.2.0", 678 | "state": { 679 | "_model_module": "@jupyter-widgets/base", 680 | "_model_module_version": "1.2.0", 681 | "_model_name": "LayoutModel", 682 | "_view_count": null, 683 | "_view_module": "@jupyter-widgets/base", 684 | "_view_module_version": "1.2.0", 685 | "_view_name": "LayoutView", 686 | "align_content": null, 687 | "align_items": null, 688 | "align_self": null, 689 | "border": 
null, 690 | "bottom": null, 691 | "display": null, 692 | "flex": null, 693 | "flex_flow": null, 694 | "grid_area": null, 695 | "grid_auto_columns": null, 696 | "grid_auto_flow": null, 697 | "grid_auto_rows": null, 698 | "grid_column": null, 699 | "grid_gap": null, 700 | "grid_row": null, 701 | "grid_template_areas": null, 702 | "grid_template_columns": null, 703 | "grid_template_rows": null, 704 | "height": null, 705 | "justify_content": null, 706 | "justify_items": null, 707 | "left": null, 708 | "margin": null, 709 | "max_height": null, 710 | "max_width": null, 711 | "min_height": null, 712 | "min_width": null, 713 | "object_fit": null, 714 | "object_position": null, 715 | "order": null, 716 | "overflow": null, 717 | "overflow_x": null, 718 | "overflow_y": null, 719 | "padding": null, 720 | "right": null, 721 | "top": null, 722 | "visibility": null, 723 | "width": null 724 | } 725 | }, 726 | "1b51b4f4463341c1acdb4c57dbf8130b": { 727 | "model_module": "@jupyter-widgets/base", 728 | "model_name": "LayoutModel", 729 | "model_module_version": "1.2.0", 730 | "state": { 731 | "_model_module": "@jupyter-widgets/base", 732 | "_model_module_version": "1.2.0", 733 | "_model_name": "LayoutModel", 734 | "_view_count": null, 735 | "_view_module": "@jupyter-widgets/base", 736 | "_view_module_version": "1.2.0", 737 | "_view_name": "LayoutView", 738 | "align_content": null, 739 | "align_items": null, 740 | "align_self": null, 741 | "border": null, 742 | "bottom": null, 743 | "display": null, 744 | "flex": null, 745 | "flex_flow": null, 746 | "grid_area": null, 747 | "grid_auto_columns": null, 748 | "grid_auto_flow": null, 749 | "grid_auto_rows": null, 750 | "grid_column": null, 751 | "grid_gap": null, 752 | "grid_row": null, 753 | "grid_template_areas": null, 754 | "grid_template_columns": null, 755 | "grid_template_rows": null, 756 | "height": null, 757 | "justify_content": null, 758 | "justify_items": null, 759 | "left": null, 760 | "margin": null, 761 | "max_height": null, 762 | "max_width": null, 763 | "min_height": null, 764 | "min_width": null, 765 | "object_fit": null, 766 | "object_position": null, 767 | "order": null, 768 | "overflow": null, 769 | "overflow_x": null, 770 | "overflow_y": null, 771 | "padding": null, 772 | "right": null, 773 | "top": null, 774 | "visibility": null, 775 | "width": null 776 | } 777 | }, 778 | "9fa2a910690040aabcd2b05e449b0c45": { 779 | "model_module": "@jupyter-widgets/controls", 780 | "model_name": "DescriptionStyleModel", 781 | "model_module_version": "1.5.0", 782 | "state": { 783 | "_model_module": "@jupyter-widgets/controls", 784 | "_model_module_version": "1.5.0", 785 | "_model_name": "DescriptionStyleModel", 786 | "_view_count": null, 787 | "_view_module": "@jupyter-widgets/base", 788 | "_view_module_version": "1.2.0", 789 | "_view_name": "StyleView", 790 | "description_width": "" 791 | } 792 | }, 793 | "f0ff5f31afd4450ea185123a577df0cf": { 794 | "model_module": "@jupyter-widgets/base", 795 | "model_name": "LayoutModel", 796 | "model_module_version": "1.2.0", 797 | "state": { 798 | "_model_module": "@jupyter-widgets/base", 799 | "_model_module_version": "1.2.0", 800 | "_model_name": "LayoutModel", 801 | "_view_count": null, 802 | "_view_module": "@jupyter-widgets/base", 803 | "_view_module_version": "1.2.0", 804 | "_view_name": "LayoutView", 805 | "align_content": null, 806 | "align_items": null, 807 | "align_self": null, 808 | "border": null, 809 | "bottom": null, 810 | "display": null, 811 | "flex": null, 812 | "flex_flow": null, 813 | "grid_area": null, 
814 | "grid_auto_columns": null, 815 | "grid_auto_flow": null, 816 | "grid_auto_rows": null, 817 | "grid_column": null, 818 | "grid_gap": null, 819 | "grid_row": null, 820 | "grid_template_areas": null, 821 | "grid_template_columns": null, 822 | "grid_template_rows": null, 823 | "height": null, 824 | "justify_content": null, 825 | "justify_items": null, 826 | "left": null, 827 | "margin": null, 828 | "max_height": null, 829 | "max_width": null, 830 | "min_height": null, 831 | "min_width": null, 832 | "object_fit": null, 833 | "object_position": null, 834 | "order": null, 835 | "overflow": null, 836 | "overflow_x": null, 837 | "overflow_y": null, 838 | "padding": null, 839 | "right": null, 840 | "top": null, 841 | "visibility": null, 842 | "width": null 843 | } 844 | }, 845 | "be63ea3b752642b6bf7e74e7fdc87430": { 846 | "model_module": "@jupyter-widgets/controls", 847 | "model_name": "ProgressStyleModel", 848 | "model_module_version": "1.5.0", 849 | "state": { 850 | "_model_module": "@jupyter-widgets/controls", 851 | "_model_module_version": "1.5.0", 852 | "_model_name": "ProgressStyleModel", 853 | "_view_count": null, 854 | "_view_module": "@jupyter-widgets/base", 855 | "_view_module_version": "1.2.0", 856 | "_view_name": "StyleView", 857 | "bar_color": null, 858 | "description_width": "" 859 | } 860 | }, 861 | "8dffc51581f74d7c81c6bdb3feb7ae4f": { 862 | "model_module": "@jupyter-widgets/base", 863 | "model_name": "LayoutModel", 864 | "model_module_version": "1.2.0", 865 | "state": { 866 | "_model_module": "@jupyter-widgets/base", 867 | "_model_module_version": "1.2.0", 868 | "_model_name": "LayoutModel", 869 | "_view_count": null, 870 | "_view_module": "@jupyter-widgets/base", 871 | "_view_module_version": "1.2.0", 872 | "_view_name": "LayoutView", 873 | "align_content": null, 874 | "align_items": null, 875 | "align_self": null, 876 | "border": null, 877 | "bottom": null, 878 | "display": null, 879 | "flex": null, 880 | "flex_flow": null, 881 | "grid_area": null, 882 | "grid_auto_columns": null, 883 | "grid_auto_flow": null, 884 | "grid_auto_rows": null, 885 | "grid_column": null, 886 | "grid_gap": null, 887 | "grid_row": null, 888 | "grid_template_areas": null, 889 | "grid_template_columns": null, 890 | "grid_template_rows": null, 891 | "height": null, 892 | "justify_content": null, 893 | "justify_items": null, 894 | "left": null, 895 | "margin": null, 896 | "max_height": null, 897 | "max_width": null, 898 | "min_height": null, 899 | "min_width": null, 900 | "object_fit": null, 901 | "object_position": null, 902 | "order": null, 903 | "overflow": null, 904 | "overflow_x": null, 905 | "overflow_y": null, 906 | "padding": null, 907 | "right": null, 908 | "top": null, 909 | "visibility": null, 910 | "width": null 911 | } 912 | }, 913 | "e9a828847dcf41b5bbc1eaa44fde2c95": { 914 | "model_module": "@jupyter-widgets/controls", 915 | "model_name": "DescriptionStyleModel", 916 | "model_module_version": "1.5.0", 917 | "state": { 918 | "_model_module": "@jupyter-widgets/controls", 919 | "_model_module_version": "1.5.0", 920 | "_model_name": "DescriptionStyleModel", 921 | "_view_count": null, 922 | "_view_module": "@jupyter-widgets/base", 923 | "_view_module_version": "1.2.0", 924 | "_view_name": "StyleView", 925 | "description_width": "" 926 | } 927 | }, 928 | "e4e616b460ef4b11b4c352d467f494dc": { 929 | "model_module": "@jupyter-widgets/controls", 930 | "model_name": "HBoxModel", 931 | "model_module_version": "1.5.0", 932 | "state": { 933 | "_dom_classes": [], 934 | "_model_module": 
"@jupyter-widgets/controls", 935 | "_model_module_version": "1.5.0", 936 | "_model_name": "HBoxModel", 937 | "_view_count": null, 938 | "_view_module": "@jupyter-widgets/controls", 939 | "_view_module_version": "1.5.0", 940 | "_view_name": "HBoxView", 941 | "box_style": "", 942 | "children": [ 943 | "IPY_MODEL_7ce760b40b834494a1efa1b078985bff", 944 | "IPY_MODEL_d6f3f9f5db014075bd4fbca506e943f8", 945 | "IPY_MODEL_97716ed55a074ce0b52e68be926a09c3" 946 | ], 947 | "layout": "IPY_MODEL_86551d955c684d43b7f51d52cec868ec" 948 | } 949 | }, 950 | "7ce760b40b834494a1efa1b078985bff": { 951 | "model_module": "@jupyter-widgets/controls", 952 | "model_name": "HTMLModel", 953 | "model_module_version": "1.5.0", 954 | "state": { 955 | "_dom_classes": [], 956 | "_model_module": "@jupyter-widgets/controls", 957 | "_model_module_version": "1.5.0", 958 | "_model_name": "HTMLModel", 959 | "_view_count": null, 960 | "_view_module": "@jupyter-widgets/controls", 961 | "_view_module_version": "1.5.0", 962 | "_view_name": "HTMLView", 963 | "description": "", 964 | "description_tooltip": null, 965 | "layout": "IPY_MODEL_368689f0e8974d7eae2fd497c996ba20", 966 | "placeholder": "​", 967 | "style": "IPY_MODEL_826d46d71d35444a8123fffd107d1f1b", 968 | "value": "model.safetensors: 100%" 969 | } 970 | }, 971 | "d6f3f9f5db014075bd4fbca506e943f8": { 972 | "model_module": "@jupyter-widgets/controls", 973 | "model_name": "FloatProgressModel", 974 | "model_module_version": "1.5.0", 975 | "state": { 976 | "_dom_classes": [], 977 | "_model_module": "@jupyter-widgets/controls", 978 | "_model_module_version": "1.5.0", 979 | "_model_name": "FloatProgressModel", 980 | "_view_count": null, 981 | "_view_module": "@jupyter-widgets/controls", 982 | "_view_module_version": "1.5.0", 983 | "_view_name": "ProgressView", 984 | "bar_style": "success", 985 | "description": "", 986 | "description_tooltip": null, 987 | "layout": "IPY_MODEL_82fb664d4d614a0f9bea5b297783f0d5", 988 | "max": 267832558, 989 | "min": 0, 990 | "orientation": "horizontal", 991 | "style": "IPY_MODEL_28d6b5e50d4545559dd0d610b30c3b96", 992 | "value": 267832558 993 | } 994 | }, 995 | "97716ed55a074ce0b52e68be926a09c3": { 996 | "model_module": "@jupyter-widgets/controls", 997 | "model_name": "HTMLModel", 998 | "model_module_version": "1.5.0", 999 | "state": { 1000 | "_dom_classes": [], 1001 | "_model_module": "@jupyter-widgets/controls", 1002 | "_model_module_version": "1.5.0", 1003 | "_model_name": "HTMLModel", 1004 | "_view_count": null, 1005 | "_view_module": "@jupyter-widgets/controls", 1006 | "_view_module_version": "1.5.0", 1007 | "_view_name": "HTMLView", 1008 | "description": "", 1009 | "description_tooltip": null, 1010 | "layout": "IPY_MODEL_33ee09dba6d5422896396a102f78dbba", 1011 | "placeholder": "​", 1012 | "style": "IPY_MODEL_e0a71a1e15fc4070b3147560f0e7d694", 1013 | "value": " 268M/268M [00:01<00:00, 226MB/s]" 1014 | } 1015 | }, 1016 | "86551d955c684d43b7f51d52cec868ec": { 1017 | "model_module": "@jupyter-widgets/base", 1018 | "model_name": "LayoutModel", 1019 | "model_module_version": "1.2.0", 1020 | "state": { 1021 | "_model_module": "@jupyter-widgets/base", 1022 | "_model_module_version": "1.2.0", 1023 | "_model_name": "LayoutModel", 1024 | "_view_count": null, 1025 | "_view_module": "@jupyter-widgets/base", 1026 | "_view_module_version": "1.2.0", 1027 | "_view_name": "LayoutView", 1028 | "align_content": null, 1029 | "align_items": null, 1030 | "align_self": null, 1031 | "border": null, 1032 | "bottom": null, 1033 | "display": null, 1034 | "flex": null, 1035 | 
"flex_flow": null, 1036 | "grid_area": null, 1037 | "grid_auto_columns": null, 1038 | "grid_auto_flow": null, 1039 | "grid_auto_rows": null, 1040 | "grid_column": null, 1041 | "grid_gap": null, 1042 | "grid_row": null, 1043 | "grid_template_areas": null, 1044 | "grid_template_columns": null, 1045 | "grid_template_rows": null, 1046 | "height": null, 1047 | "justify_content": null, 1048 | "justify_items": null, 1049 | "left": null, 1050 | "margin": null, 1051 | "max_height": null, 1052 | "max_width": null, 1053 | "min_height": null, 1054 | "min_width": null, 1055 | "object_fit": null, 1056 | "object_position": null, 1057 | "order": null, 1058 | "overflow": null, 1059 | "overflow_x": null, 1060 | "overflow_y": null, 1061 | "padding": null, 1062 | "right": null, 1063 | "top": null, 1064 | "visibility": null, 1065 | "width": null 1066 | } 1067 | }, 1068 | "368689f0e8974d7eae2fd497c996ba20": { 1069 | "model_module": "@jupyter-widgets/base", 1070 | "model_name": "LayoutModel", 1071 | "model_module_version": "1.2.0", 1072 | "state": { 1073 | "_model_module": "@jupyter-widgets/base", 1074 | "_model_module_version": "1.2.0", 1075 | "_model_name": "LayoutModel", 1076 | "_view_count": null, 1077 | "_view_module": "@jupyter-widgets/base", 1078 | "_view_module_version": "1.2.0", 1079 | "_view_name": "LayoutView", 1080 | "align_content": null, 1081 | "align_items": null, 1082 | "align_self": null, 1083 | "border": null, 1084 | "bottom": null, 1085 | "display": null, 1086 | "flex": null, 1087 | "flex_flow": null, 1088 | "grid_area": null, 1089 | "grid_auto_columns": null, 1090 | "grid_auto_flow": null, 1091 | "grid_auto_rows": null, 1092 | "grid_column": null, 1093 | "grid_gap": null, 1094 | "grid_row": null, 1095 | "grid_template_areas": null, 1096 | "grid_template_columns": null, 1097 | "grid_template_rows": null, 1098 | "height": null, 1099 | "justify_content": null, 1100 | "justify_items": null, 1101 | "left": null, 1102 | "margin": null, 1103 | "max_height": null, 1104 | "max_width": null, 1105 | "min_height": null, 1106 | "min_width": null, 1107 | "object_fit": null, 1108 | "object_position": null, 1109 | "order": null, 1110 | "overflow": null, 1111 | "overflow_x": null, 1112 | "overflow_y": null, 1113 | "padding": null, 1114 | "right": null, 1115 | "top": null, 1116 | "visibility": null, 1117 | "width": null 1118 | } 1119 | }, 1120 | "826d46d71d35444a8123fffd107d1f1b": { 1121 | "model_module": "@jupyter-widgets/controls", 1122 | "model_name": "DescriptionStyleModel", 1123 | "model_module_version": "1.5.0", 1124 | "state": { 1125 | "_model_module": "@jupyter-widgets/controls", 1126 | "_model_module_version": "1.5.0", 1127 | "_model_name": "DescriptionStyleModel", 1128 | "_view_count": null, 1129 | "_view_module": "@jupyter-widgets/base", 1130 | "_view_module_version": "1.2.0", 1131 | "_view_name": "StyleView", 1132 | "description_width": "" 1133 | } 1134 | }, 1135 | "82fb664d4d614a0f9bea5b297783f0d5": { 1136 | "model_module": "@jupyter-widgets/base", 1137 | "model_name": "LayoutModel", 1138 | "model_module_version": "1.2.0", 1139 | "state": { 1140 | "_model_module": "@jupyter-widgets/base", 1141 | "_model_module_version": "1.2.0", 1142 | "_model_name": "LayoutModel", 1143 | "_view_count": null, 1144 | "_view_module": "@jupyter-widgets/base", 1145 | "_view_module_version": "1.2.0", 1146 | "_view_name": "LayoutView", 1147 | "align_content": null, 1148 | "align_items": null, 1149 | "align_self": null, 1150 | "border": null, 1151 | "bottom": null, 1152 | "display": null, 1153 | "flex": null, 1154 | 
"flex_flow": null, 1155 | "grid_area": null, 1156 | "grid_auto_columns": null, 1157 | "grid_auto_flow": null, 1158 | "grid_auto_rows": null, 1159 | "grid_column": null, 1160 | "grid_gap": null, 1161 | "grid_row": null, 1162 | "grid_template_areas": null, 1163 | "grid_template_columns": null, 1164 | "grid_template_rows": null, 1165 | "height": null, 1166 | "justify_content": null, 1167 | "justify_items": null, 1168 | "left": null, 1169 | "margin": null, 1170 | "max_height": null, 1171 | "max_width": null, 1172 | "min_height": null, 1173 | "min_width": null, 1174 | "object_fit": null, 1175 | "object_position": null, 1176 | "order": null, 1177 | "overflow": null, 1178 | "overflow_x": null, 1179 | "overflow_y": null, 1180 | "padding": null, 1181 | "right": null, 1182 | "top": null, 1183 | "visibility": null, 1184 | "width": null 1185 | } 1186 | }, 1187 | "28d6b5e50d4545559dd0d610b30c3b96": { 1188 | "model_module": "@jupyter-widgets/controls", 1189 | "model_name": "ProgressStyleModel", 1190 | "model_module_version": "1.5.0", 1191 | "state": { 1192 | "_model_module": "@jupyter-widgets/controls", 1193 | "_model_module_version": "1.5.0", 1194 | "_model_name": "ProgressStyleModel", 1195 | "_view_count": null, 1196 | "_view_module": "@jupyter-widgets/base", 1197 | "_view_module_version": "1.2.0", 1198 | "_view_name": "StyleView", 1199 | "bar_color": null, 1200 | "description_width": "" 1201 | } 1202 | }, 1203 | "33ee09dba6d5422896396a102f78dbba": { 1204 | "model_module": "@jupyter-widgets/base", 1205 | "model_name": "LayoutModel", 1206 | "model_module_version": "1.2.0", 1207 | "state": { 1208 | "_model_module": "@jupyter-widgets/base", 1209 | "_model_module_version": "1.2.0", 1210 | "_model_name": "LayoutModel", 1211 | "_view_count": null, 1212 | "_view_module": "@jupyter-widgets/base", 1213 | "_view_module_version": "1.2.0", 1214 | "_view_name": "LayoutView", 1215 | "align_content": null, 1216 | "align_items": null, 1217 | "align_self": null, 1218 | "border": null, 1219 | "bottom": null, 1220 | "display": null, 1221 | "flex": null, 1222 | "flex_flow": null, 1223 | "grid_area": null, 1224 | "grid_auto_columns": null, 1225 | "grid_auto_flow": null, 1226 | "grid_auto_rows": null, 1227 | "grid_column": null, 1228 | "grid_gap": null, 1229 | "grid_row": null, 1230 | "grid_template_areas": null, 1231 | "grid_template_columns": null, 1232 | "grid_template_rows": null, 1233 | "height": null, 1234 | "justify_content": null, 1235 | "justify_items": null, 1236 | "left": null, 1237 | "margin": null, 1238 | "max_height": null, 1239 | "max_width": null, 1240 | "min_height": null, 1241 | "min_width": null, 1242 | "object_fit": null, 1243 | "object_position": null, 1244 | "order": null, 1245 | "overflow": null, 1246 | "overflow_x": null, 1247 | "overflow_y": null, 1248 | "padding": null, 1249 | "right": null, 1250 | "top": null, 1251 | "visibility": null, 1252 | "width": null 1253 | } 1254 | }, 1255 | "e0a71a1e15fc4070b3147560f0e7d694": { 1256 | "model_module": "@jupyter-widgets/controls", 1257 | "model_name": "DescriptionStyleModel", 1258 | "model_module_version": "1.5.0", 1259 | "state": { 1260 | "_model_module": "@jupyter-widgets/controls", 1261 | "_model_module_version": "1.5.0", 1262 | "_model_name": "DescriptionStyleModel", 1263 | "_view_count": null, 1264 | "_view_module": "@jupyter-widgets/base", 1265 | "_view_module_version": "1.2.0", 1266 | "_view_name": "StyleView", 1267 | "description_width": "" 1268 | } 1269 | }, 1270 | "c57238fcb356435ea7ad8daf7761879a": { 1271 | "model_module": 
"@jupyter-widgets/controls", 1272 | "model_name": "HBoxModel", 1273 | "model_module_version": "1.5.0", 1274 | "state": { 1275 | "_dom_classes": [], 1276 | "_model_module": "@jupyter-widgets/controls", 1277 | "_model_module_version": "1.5.0", 1278 | "_model_name": "HBoxModel", 1279 | "_view_count": null, 1280 | "_view_module": "@jupyter-widgets/controls", 1281 | "_view_module_version": "1.5.0", 1282 | "_view_name": "HBoxView", 1283 | "box_style": "", 1284 | "children": [ 1285 | "IPY_MODEL_0e35c262e8d343f7b83016b8c61fd744", 1286 | "IPY_MODEL_9081142688d44540a28e4b08e4091270", 1287 | "IPY_MODEL_83fdaa28d4ae47fea15f4ec950f1825f" 1288 | ], 1289 | "layout": "IPY_MODEL_493ff70e5b59414c8ff285fc50391b12" 1290 | } 1291 | }, 1292 | "0e35c262e8d343f7b83016b8c61fd744": { 1293 | "model_module": "@jupyter-widgets/controls", 1294 | "model_name": "HTMLModel", 1295 | "model_module_version": "1.5.0", 1296 | "state": { 1297 | "_dom_classes": [], 1298 | "_model_module": "@jupyter-widgets/controls", 1299 | "_model_module_version": "1.5.0", 1300 | "_model_name": "HTMLModel", 1301 | "_view_count": null, 1302 | "_view_module": "@jupyter-widgets/controls", 1303 | "_view_module_version": "1.5.0", 1304 | "_view_name": "HTMLView", 1305 | "description": "", 1306 | "description_tooltip": null, 1307 | "layout": "IPY_MODEL_f6eaee53284c427f99126187a1883081", 1308 | "placeholder": "​", 1309 | "style": "IPY_MODEL_04c3cf767105473fb6756223ef2d0030", 1310 | "value": "tokenizer_config.json: 100%" 1311 | } 1312 | }, 1313 | "9081142688d44540a28e4b08e4091270": { 1314 | "model_module": "@jupyter-widgets/controls", 1315 | "model_name": "FloatProgressModel", 1316 | "model_module_version": "1.5.0", 1317 | "state": { 1318 | "_dom_classes": [], 1319 | "_model_module": "@jupyter-widgets/controls", 1320 | "_model_module_version": "1.5.0", 1321 | "_model_name": "FloatProgressModel", 1322 | "_view_count": null, 1323 | "_view_module": "@jupyter-widgets/controls", 1324 | "_view_module_version": "1.5.0", 1325 | "_view_name": "ProgressView", 1326 | "bar_style": "success", 1327 | "description": "", 1328 | "description_tooltip": null, 1329 | "layout": "IPY_MODEL_7ffd07ee1d4d482d8a9f1e2d8ddb2772", 1330 | "max": 48, 1331 | "min": 0, 1332 | "orientation": "horizontal", 1333 | "style": "IPY_MODEL_15898c9d06734d60b23ee92d03592c33", 1334 | "value": 48 1335 | } 1336 | }, 1337 | "83fdaa28d4ae47fea15f4ec950f1825f": { 1338 | "model_module": "@jupyter-widgets/controls", 1339 | "model_name": "HTMLModel", 1340 | "model_module_version": "1.5.0", 1341 | "state": { 1342 | "_dom_classes": [], 1343 | "_model_module": "@jupyter-widgets/controls", 1344 | "_model_module_version": "1.5.0", 1345 | "_model_name": "HTMLModel", 1346 | "_view_count": null, 1347 | "_view_module": "@jupyter-widgets/controls", 1348 | "_view_module_version": "1.5.0", 1349 | "_view_name": "HTMLView", 1350 | "description": "", 1351 | "description_tooltip": null, 1352 | "layout": "IPY_MODEL_92bef379c00f46e2b02ace047c50a9af", 1353 | "placeholder": "​", 1354 | "style": "IPY_MODEL_9a0a067c17b64df2a1184986c2412b26", 1355 | "value": " 48.0/48.0 [00:00<00:00, 2.55kB/s]" 1356 | } 1357 | }, 1358 | "493ff70e5b59414c8ff285fc50391b12": { 1359 | "model_module": "@jupyter-widgets/base", 1360 | "model_name": "LayoutModel", 1361 | "model_module_version": "1.2.0", 1362 | "state": { 1363 | "_model_module": "@jupyter-widgets/base", 1364 | "_model_module_version": "1.2.0", 1365 | "_model_name": "LayoutModel", 1366 | "_view_count": null, 1367 | "_view_module": "@jupyter-widgets/base", 1368 | "_view_module_version": 
"1.2.0", 1369 | "_view_name": "LayoutView", 1370 | "align_content": null, 1371 | "align_items": null, 1372 | "align_self": null, 1373 | "border": null, 1374 | "bottom": null, 1375 | "display": null, 1376 | "flex": null, 1377 | "flex_flow": null, 1378 | "grid_area": null, 1379 | "grid_auto_columns": null, 1380 | "grid_auto_flow": null, 1381 | "grid_auto_rows": null, 1382 | "grid_column": null, 1383 | "grid_gap": null, 1384 | "grid_row": null, 1385 | "grid_template_areas": null, 1386 | "grid_template_columns": null, 1387 | "grid_template_rows": null, 1388 | "height": null, 1389 | "justify_content": null, 1390 | "justify_items": null, 1391 | "left": null, 1392 | "margin": null, 1393 | "max_height": null, 1394 | "max_width": null, 1395 | "min_height": null, 1396 | "min_width": null, 1397 | "object_fit": null, 1398 | "object_position": null, 1399 | "order": null, 1400 | "overflow": null, 1401 | "overflow_x": null, 1402 | "overflow_y": null, 1403 | "padding": null, 1404 | "right": null, 1405 | "top": null, 1406 | "visibility": null, 1407 | "width": null 1408 | } 1409 | }, 1410 | "f6eaee53284c427f99126187a1883081": { 1411 | "model_module": "@jupyter-widgets/base", 1412 | "model_name": "LayoutModel", 1413 | "model_module_version": "1.2.0", 1414 | "state": { 1415 | "_model_module": "@jupyter-widgets/base", 1416 | "_model_module_version": "1.2.0", 1417 | "_model_name": "LayoutModel", 1418 | "_view_count": null, 1419 | "_view_module": "@jupyter-widgets/base", 1420 | "_view_module_version": "1.2.0", 1421 | "_view_name": "LayoutView", 1422 | "align_content": null, 1423 | "align_items": null, 1424 | "align_self": null, 1425 | "border": null, 1426 | "bottom": null, 1427 | "display": null, 1428 | "flex": null, 1429 | "flex_flow": null, 1430 | "grid_area": null, 1431 | "grid_auto_columns": null, 1432 | "grid_auto_flow": null, 1433 | "grid_auto_rows": null, 1434 | "grid_column": null, 1435 | "grid_gap": null, 1436 | "grid_row": null, 1437 | "grid_template_areas": null, 1438 | "grid_template_columns": null, 1439 | "grid_template_rows": null, 1440 | "height": null, 1441 | "justify_content": null, 1442 | "justify_items": null, 1443 | "left": null, 1444 | "margin": null, 1445 | "max_height": null, 1446 | "max_width": null, 1447 | "min_height": null, 1448 | "min_width": null, 1449 | "object_fit": null, 1450 | "object_position": null, 1451 | "order": null, 1452 | "overflow": null, 1453 | "overflow_x": null, 1454 | "overflow_y": null, 1455 | "padding": null, 1456 | "right": null, 1457 | "top": null, 1458 | "visibility": null, 1459 | "width": null 1460 | } 1461 | }, 1462 | "04c3cf767105473fb6756223ef2d0030": { 1463 | "model_module": "@jupyter-widgets/controls", 1464 | "model_name": "DescriptionStyleModel", 1465 | "model_module_version": "1.5.0", 1466 | "state": { 1467 | "_model_module": "@jupyter-widgets/controls", 1468 | "_model_module_version": "1.5.0", 1469 | "_model_name": "DescriptionStyleModel", 1470 | "_view_count": null, 1471 | "_view_module": "@jupyter-widgets/base", 1472 | "_view_module_version": "1.2.0", 1473 | "_view_name": "StyleView", 1474 | "description_width": "" 1475 | } 1476 | }, 1477 | "7ffd07ee1d4d482d8a9f1e2d8ddb2772": { 1478 | "model_module": "@jupyter-widgets/base", 1479 | "model_name": "LayoutModel", 1480 | "model_module_version": "1.2.0", 1481 | "state": { 1482 | "_model_module": "@jupyter-widgets/base", 1483 | "_model_module_version": "1.2.0", 1484 | "_model_name": "LayoutModel", 1485 | "_view_count": null, 1486 | "_view_module": "@jupyter-widgets/base", 1487 | "_view_module_version": 
"1.2.0", 1488 | "_view_name": "LayoutView", 1489 | "align_content": null, 1490 | "align_items": null, 1491 | "align_self": null, 1492 | "border": null, 1493 | "bottom": null, 1494 | "display": null, 1495 | "flex": null, 1496 | "flex_flow": null, 1497 | "grid_area": null, 1498 | "grid_auto_columns": null, 1499 | "grid_auto_flow": null, 1500 | "grid_auto_rows": null, 1501 | "grid_column": null, 1502 | "grid_gap": null, 1503 | "grid_row": null, 1504 | "grid_template_areas": null, 1505 | "grid_template_columns": null, 1506 | "grid_template_rows": null, 1507 | "height": null, 1508 | "justify_content": null, 1509 | "justify_items": null, 1510 | "left": null, 1511 | "margin": null, 1512 | "max_height": null, 1513 | "max_width": null, 1514 | "min_height": null, 1515 | "min_width": null, 1516 | "object_fit": null, 1517 | "object_position": null, 1518 | "order": null, 1519 | "overflow": null, 1520 | "overflow_x": null, 1521 | "overflow_y": null, 1522 | "padding": null, 1523 | "right": null, 1524 | "top": null, 1525 | "visibility": null, 1526 | "width": null 1527 | } 1528 | }, 1529 | "15898c9d06734d60b23ee92d03592c33": { 1530 | "model_module": "@jupyter-widgets/controls", 1531 | "model_name": "ProgressStyleModel", 1532 | "model_module_version": "1.5.0", 1533 | "state": { 1534 | "_model_module": "@jupyter-widgets/controls", 1535 | "_model_module_version": "1.5.0", 1536 | "_model_name": "ProgressStyleModel", 1537 | "_view_count": null, 1538 | "_view_module": "@jupyter-widgets/base", 1539 | "_view_module_version": "1.2.0", 1540 | "_view_name": "StyleView", 1541 | "bar_color": null, 1542 | "description_width": "" 1543 | } 1544 | }, 1545 | "92bef379c00f46e2b02ace047c50a9af": { 1546 | "model_module": "@jupyter-widgets/base", 1547 | "model_name": "LayoutModel", 1548 | "model_module_version": "1.2.0", 1549 | "state": { 1550 | "_model_module": "@jupyter-widgets/base", 1551 | "_model_module_version": "1.2.0", 1552 | "_model_name": "LayoutModel", 1553 | "_view_count": null, 1554 | "_view_module": "@jupyter-widgets/base", 1555 | "_view_module_version": "1.2.0", 1556 | "_view_name": "LayoutView", 1557 | "align_content": null, 1558 | "align_items": null, 1559 | "align_self": null, 1560 | "border": null, 1561 | "bottom": null, 1562 | "display": null, 1563 | "flex": null, 1564 | "flex_flow": null, 1565 | "grid_area": null, 1566 | "grid_auto_columns": null, 1567 | "grid_auto_flow": null, 1568 | "grid_auto_rows": null, 1569 | "grid_column": null, 1570 | "grid_gap": null, 1571 | "grid_row": null, 1572 | "grid_template_areas": null, 1573 | "grid_template_columns": null, 1574 | "grid_template_rows": null, 1575 | "height": null, 1576 | "justify_content": null, 1577 | "justify_items": null, 1578 | "left": null, 1579 | "margin": null, 1580 | "max_height": null, 1581 | "max_width": null, 1582 | "min_height": null, 1583 | "min_width": null, 1584 | "object_fit": null, 1585 | "object_position": null, 1586 | "order": null, 1587 | "overflow": null, 1588 | "overflow_x": null, 1589 | "overflow_y": null, 1590 | "padding": null, 1591 | "right": null, 1592 | "top": null, 1593 | "visibility": null, 1594 | "width": null 1595 | } 1596 | }, 1597 | "9a0a067c17b64df2a1184986c2412b26": { 1598 | "model_module": "@jupyter-widgets/controls", 1599 | "model_name": "DescriptionStyleModel", 1600 | "model_module_version": "1.5.0", 1601 | "state": { 1602 | "_model_module": "@jupyter-widgets/controls", 1603 | "_model_module_version": "1.5.0", 1604 | "_model_name": "DescriptionStyleModel", 1605 | "_view_count": null, 1606 | "_view_module": 
"@jupyter-widgets/base", 1607 | "_view_module_version": "1.2.0", 1608 | "_view_name": "StyleView", 1609 | "description_width": "" 1610 | } 1611 | }, 1612 | "668ed89f0cdd460498543af635f4dd68": { 1613 | "model_module": "@jupyter-widgets/controls", 1614 | "model_name": "HBoxModel", 1615 | "model_module_version": "1.5.0", 1616 | "state": { 1617 | "_dom_classes": [], 1618 | "_model_module": "@jupyter-widgets/controls", 1619 | "_model_module_version": "1.5.0", 1620 | "_model_name": "HBoxModel", 1621 | "_view_count": null, 1622 | "_view_module": "@jupyter-widgets/controls", 1623 | "_view_module_version": "1.5.0", 1624 | "_view_name": "HBoxView", 1625 | "box_style": "", 1626 | "children": [ 1627 | "IPY_MODEL_ae9729bae35748b9b757ab558d50bdc4", 1628 | "IPY_MODEL_31797fec525e43aa9f968b63e95b8aaa", 1629 | "IPY_MODEL_5ec34446ceeb4f37a9f9ca0e43103416" 1630 | ], 1631 | "layout": "IPY_MODEL_f0d3166484384e15ae480e8d42504d52" 1632 | } 1633 | }, 1634 | "ae9729bae35748b9b757ab558d50bdc4": { 1635 | "model_module": "@jupyter-widgets/controls", 1636 | "model_name": "HTMLModel", 1637 | "model_module_version": "1.5.0", 1638 | "state": { 1639 | "_dom_classes": [], 1640 | "_model_module": "@jupyter-widgets/controls", 1641 | "_model_module_version": "1.5.0", 1642 | "_model_name": "HTMLModel", 1643 | "_view_count": null, 1644 | "_view_module": "@jupyter-widgets/controls", 1645 | "_view_module_version": "1.5.0", 1646 | "_view_name": "HTMLView", 1647 | "description": "", 1648 | "description_tooltip": null, 1649 | "layout": "IPY_MODEL_560e25624e6844c7b92f705b9a6d06f5", 1650 | "placeholder": "​", 1651 | "style": "IPY_MODEL_521e0794f59a4d0c8a74858d8ca8802c", 1652 | "value": "vocab.txt: 100%" 1653 | } 1654 | }, 1655 | "31797fec525e43aa9f968b63e95b8aaa": { 1656 | "model_module": "@jupyter-widgets/controls", 1657 | "model_name": "FloatProgressModel", 1658 | "model_module_version": "1.5.0", 1659 | "state": { 1660 | "_dom_classes": [], 1661 | "_model_module": "@jupyter-widgets/controls", 1662 | "_model_module_version": "1.5.0", 1663 | "_model_name": "FloatProgressModel", 1664 | "_view_count": null, 1665 | "_view_module": "@jupyter-widgets/controls", 1666 | "_view_module_version": "1.5.0", 1667 | "_view_name": "ProgressView", 1668 | "bar_style": "success", 1669 | "description": "", 1670 | "description_tooltip": null, 1671 | "layout": "IPY_MODEL_d6ba19a6e2374b34bd7a954a0f29a95a", 1672 | "max": 231508, 1673 | "min": 0, 1674 | "orientation": "horizontal", 1675 | "style": "IPY_MODEL_d04c2e5c9a9d4dc59a4ef5d61f55faf7", 1676 | "value": 231508 1677 | } 1678 | }, 1679 | "5ec34446ceeb4f37a9f9ca0e43103416": { 1680 | "model_module": "@jupyter-widgets/controls", 1681 | "model_name": "HTMLModel", 1682 | "model_module_version": "1.5.0", 1683 | "state": { 1684 | "_dom_classes": [], 1685 | "_model_module": "@jupyter-widgets/controls", 1686 | "_model_module_version": "1.5.0", 1687 | "_model_name": "HTMLModel", 1688 | "_view_count": null, 1689 | "_view_module": "@jupyter-widgets/controls", 1690 | "_view_module_version": "1.5.0", 1691 | "_view_name": "HTMLView", 1692 | "description": "", 1693 | "description_tooltip": null, 1694 | "layout": "IPY_MODEL_0b4f45ff04994accb494587a662a648a", 1695 | "placeholder": "​", 1696 | "style": "IPY_MODEL_beb8dd797e774760b641d27ecde161bc", 1697 | "value": " 232k/232k [00:00<00:00, 1.69MB/s]" 1698 | } 1699 | }, 1700 | "f0d3166484384e15ae480e8d42504d52": { 1701 | "model_module": "@jupyter-widgets/base", 1702 | "model_name": "LayoutModel", 1703 | "model_module_version": "1.2.0", 1704 | "state": { 1705 | 
"_model_module": "@jupyter-widgets/base", 1706 | "_model_module_version": "1.2.0", 1707 | "_model_name": "LayoutModel", 1708 | "_view_count": null, 1709 | "_view_module": "@jupyter-widgets/base", 1710 | "_view_module_version": "1.2.0", 1711 | "_view_name": "LayoutView", 1712 | "align_content": null, 1713 | "align_items": null, 1714 | "align_self": null, 1715 | "border": null, 1716 | "bottom": null, 1717 | "display": null, 1718 | "flex": null, 1719 | "flex_flow": null, 1720 | "grid_area": null, 1721 | "grid_auto_columns": null, 1722 | "grid_auto_flow": null, 1723 | "grid_auto_rows": null, 1724 | "grid_column": null, 1725 | "grid_gap": null, 1726 | "grid_row": null, 1727 | "grid_template_areas": null, 1728 | "grid_template_columns": null, 1729 | "grid_template_rows": null, 1730 | "height": null, 1731 | "justify_content": null, 1732 | "justify_items": null, 1733 | "left": null, 1734 | "margin": null, 1735 | "max_height": null, 1736 | "max_width": null, 1737 | "min_height": null, 1738 | "min_width": null, 1739 | "object_fit": null, 1740 | "object_position": null, 1741 | "order": null, 1742 | "overflow": null, 1743 | "overflow_x": null, 1744 | "overflow_y": null, 1745 | "padding": null, 1746 | "right": null, 1747 | "top": null, 1748 | "visibility": null, 1749 | "width": null 1750 | } 1751 | }, 1752 | "560e25624e6844c7b92f705b9a6d06f5": { 1753 | "model_module": "@jupyter-widgets/base", 1754 | "model_name": "LayoutModel", 1755 | "model_module_version": "1.2.0", 1756 | "state": { 1757 | "_model_module": "@jupyter-widgets/base", 1758 | "_model_module_version": "1.2.0", 1759 | "_model_name": "LayoutModel", 1760 | "_view_count": null, 1761 | "_view_module": "@jupyter-widgets/base", 1762 | "_view_module_version": "1.2.0", 1763 | "_view_name": "LayoutView", 1764 | "align_content": null, 1765 | "align_items": null, 1766 | "align_self": null, 1767 | "border": null, 1768 | "bottom": null, 1769 | "display": null, 1770 | "flex": null, 1771 | "flex_flow": null, 1772 | "grid_area": null, 1773 | "grid_auto_columns": null, 1774 | "grid_auto_flow": null, 1775 | "grid_auto_rows": null, 1776 | "grid_column": null, 1777 | "grid_gap": null, 1778 | "grid_row": null, 1779 | "grid_template_areas": null, 1780 | "grid_template_columns": null, 1781 | "grid_template_rows": null, 1782 | "height": null, 1783 | "justify_content": null, 1784 | "justify_items": null, 1785 | "left": null, 1786 | "margin": null, 1787 | "max_height": null, 1788 | "max_width": null, 1789 | "min_height": null, 1790 | "min_width": null, 1791 | "object_fit": null, 1792 | "object_position": null, 1793 | "order": null, 1794 | "overflow": null, 1795 | "overflow_x": null, 1796 | "overflow_y": null, 1797 | "padding": null, 1798 | "right": null, 1799 | "top": null, 1800 | "visibility": null, 1801 | "width": null 1802 | } 1803 | }, 1804 | "521e0794f59a4d0c8a74858d8ca8802c": { 1805 | "model_module": "@jupyter-widgets/controls", 1806 | "model_name": "DescriptionStyleModel", 1807 | "model_module_version": "1.5.0", 1808 | "state": { 1809 | "_model_module": "@jupyter-widgets/controls", 1810 | "_model_module_version": "1.5.0", 1811 | "_model_name": "DescriptionStyleModel", 1812 | "_view_count": null, 1813 | "_view_module": "@jupyter-widgets/base", 1814 | "_view_module_version": "1.2.0", 1815 | "_view_name": "StyleView", 1816 | "description_width": "" 1817 | } 1818 | }, 1819 | "d6ba19a6e2374b34bd7a954a0f29a95a": { 1820 | "model_module": "@jupyter-widgets/base", 1821 | "model_name": "LayoutModel", 1822 | "model_module_version": "1.2.0", 1823 | "state": { 1824 | 
"_model_module": "@jupyter-widgets/base", 1825 | "_model_module_version": "1.2.0", 1826 | "_model_name": "LayoutModel", 1827 | "_view_count": null, 1828 | "_view_module": "@jupyter-widgets/base", 1829 | "_view_module_version": "1.2.0", 1830 | "_view_name": "LayoutView", 1831 | "align_content": null, 1832 | "align_items": null, 1833 | "align_self": null, 1834 | "border": null, 1835 | "bottom": null, 1836 | "display": null, 1837 | "flex": null, 1838 | "flex_flow": null, 1839 | "grid_area": null, 1840 | "grid_auto_columns": null, 1841 | "grid_auto_flow": null, 1842 | "grid_auto_rows": null, 1843 | "grid_column": null, 1844 | "grid_gap": null, 1845 | "grid_row": null, 1846 | "grid_template_areas": null, 1847 | "grid_template_columns": null, 1848 | "grid_template_rows": null, 1849 | "height": null, 1850 | "justify_content": null, 1851 | "justify_items": null, 1852 | "left": null, 1853 | "margin": null, 1854 | "max_height": null, 1855 | "max_width": null, 1856 | "min_height": null, 1857 | "min_width": null, 1858 | "object_fit": null, 1859 | "object_position": null, 1860 | "order": null, 1861 | "overflow": null, 1862 | "overflow_x": null, 1863 | "overflow_y": null, 1864 | "padding": null, 1865 | "right": null, 1866 | "top": null, 1867 | "visibility": null, 1868 | "width": null 1869 | } 1870 | }, 1871 | "d04c2e5c9a9d4dc59a4ef5d61f55faf7": { 1872 | "model_module": "@jupyter-widgets/controls", 1873 | "model_name": "ProgressStyleModel", 1874 | "model_module_version": "1.5.0", 1875 | "state": { 1876 | "_model_module": "@jupyter-widgets/controls", 1877 | "_model_module_version": "1.5.0", 1878 | "_model_name": "ProgressStyleModel", 1879 | "_view_count": null, 1880 | "_view_module": "@jupyter-widgets/base", 1881 | "_view_module_version": "1.2.0", 1882 | "_view_name": "StyleView", 1883 | "bar_color": null, 1884 | "description_width": "" 1885 | } 1886 | }, 1887 | "0b4f45ff04994accb494587a662a648a": { 1888 | "model_module": "@jupyter-widgets/base", 1889 | "model_name": "LayoutModel", 1890 | "model_module_version": "1.2.0", 1891 | "state": { 1892 | "_model_module": "@jupyter-widgets/base", 1893 | "_model_module_version": "1.2.0", 1894 | "_model_name": "LayoutModel", 1895 | "_view_count": null, 1896 | "_view_module": "@jupyter-widgets/base", 1897 | "_view_module_version": "1.2.0", 1898 | "_view_name": "LayoutView", 1899 | "align_content": null, 1900 | "align_items": null, 1901 | "align_self": null, 1902 | "border": null, 1903 | "bottom": null, 1904 | "display": null, 1905 | "flex": null, 1906 | "flex_flow": null, 1907 | "grid_area": null, 1908 | "grid_auto_columns": null, 1909 | "grid_auto_flow": null, 1910 | "grid_auto_rows": null, 1911 | "grid_column": null, 1912 | "grid_gap": null, 1913 | "grid_row": null, 1914 | "grid_template_areas": null, 1915 | "grid_template_columns": null, 1916 | "grid_template_rows": null, 1917 | "height": null, 1918 | "justify_content": null, 1919 | "justify_items": null, 1920 | "left": null, 1921 | "margin": null, 1922 | "max_height": null, 1923 | "max_width": null, 1924 | "min_height": null, 1925 | "min_width": null, 1926 | "object_fit": null, 1927 | "object_position": null, 1928 | "order": null, 1929 | "overflow": null, 1930 | "overflow_x": null, 1931 | "overflow_y": null, 1932 | "padding": null, 1933 | "right": null, 1934 | "top": null, 1935 | "visibility": null, 1936 | "width": null 1937 | } 1938 | }, 1939 | "beb8dd797e774760b641d27ecde161bc": { 1940 | "model_module": "@jupyter-widgets/controls", 1941 | "model_name": "DescriptionStyleModel", 1942 | "model_module_version": 
"1.5.0", 1943 | "state": { 1944 | "_model_module": "@jupyter-widgets/controls", 1945 | "_model_module_version": "1.5.0", 1946 | "_model_name": "DescriptionStyleModel", 1947 | "_view_count": null, 1948 | "_view_module": "@jupyter-widgets/base", 1949 | "_view_module_version": "1.2.0", 1950 | "_view_name": "StyleView", 1951 | "description_width": "" 1952 | } 1953 | } 1954 | } 1955 | } 1956 | }, 1957 | "nbformat": 4, 1958 | "nbformat_minor": 0 1959 | } -------------------------------------------------------------------------------- /week10_gpt/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week10_gpt/lecture.pdf -------------------------------------------------------------------------------- /week11_cv_transformers/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week11_cv_transformers/lecture.pdf -------------------------------------------------------------------------------- /week12_gan/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week12_gan/lecture.pdf -------------------------------------------------------------------------------- /week13_latent_models/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week13_latent_models/lecture.pdf -------------------------------------------------------------------------------- /week14_representation_learning/.DS_Store: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week14_representation_learning/.DS_Store -------------------------------------------------------------------------------- /week14_representation_learning/lecture.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week14_representation_learning/lecture.pdf --------------------------------------------------------------------------------