├── README.md
├── hw01
│   └── 1_Hw_Students.ipynb
├── hw02
│   └── 2_Hw_Students.ipynb
├── hw03
│   └── 3_Hw_Students.ipynb
├── hw05
│   └── 5_Hw_Students.ipynb
├── week01_intro
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week02_init_regularization
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week03_conv
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week04_tricks
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week05_segmentation
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week06_detection
│   └── lecture.pdf
├── week07_word_embeddings
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week08_text_classification
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week09_transformer
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week10_gpt
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week11_cv_transformers
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week12_gan
│   ├── lecture.pdf
│   └── seminar.ipynb
├── week13_latent_models
│   ├── lecture.pdf
│   └── seminar.ipynb
└── week14_representation_learning
    ├── .DS_Store
    ├── lecture.pdf
    └── seminar.ipynb
/README.md:
--------------------------------------------------------------------------------
1 | # deep-learning-course
--------------------------------------------------------------------------------
/hw01/1_Hw_Students.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "nbformat": 4,
3 | "nbformat_minor": 0,
4 | "metadata": {
5 | "colab": {
6 | "provenance": [],
7 | "gpuType": "T4",
8 | "toc_visible": true
9 | },
10 | "kernelspec": {
11 | "name": "python3",
12 | "display_name": "Python 3"
13 | },
14 | "language_info": {
15 | "name": "python"
16 | },
17 | "accelerator": "GPU"
18 | },
19 | "cells": [
20 | {
21 | "cell_type": "code",
22 | "source": [
23 | "import torch\n",
24 | "import numpy as np\n",
25 | "import matplotlib.pyplot as plt\n",
26 | "from tqdm import tqdm\n",
27 | "from IPython.display import clear_output\n",
28 | "\n",
29 | "print(torch.__version__)"
30 | ],
31 | "metadata": {
32 | "id": "VFj8-qGfYA-2"
33 | },
34 | "execution_count": null,
35 | "outputs": []
36 | },
37 | {
38 | "cell_type": "code",
39 | "source": [
40 | "DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'"
41 | ],
42 | "metadata": {
43 | "id": "cSuFlZPnrT8O"
44 | },
45 | "execution_count": null,
46 | "outputs": []
47 | },
48 | {
49 | "cell_type": "code",
50 | "source": [
51 | "import sys, os\n",
52 | "if 'google.colab' in sys.modules and not os.path.exists('.setup_complete'):\n",
53 | " !wget -q https://raw.githubusercontent.com/yandexdataschool/deep_vision_and_graphics/fall22/week01-pytorch_intro/notmnist.py\n",
54 | " !touch .setup_complete"
55 | ],
56 | "metadata": {
57 | "id": "usRNEECdbR9F"
58 | },
59 | "execution_count": null,
60 | "outputs": []
61 | },
62 | {
63 | "cell_type": "markdown",
64 | "source": [
65 | "# Task 1. Tensors (1 point)"
66 | ],
67 | "metadata": {
68 | "id": "u7FaNwW2X_v0"
69 | }
70 | },
71 | {
72 | "cell_type": "markdown",
73 | "source": [
74 | "Let's write another function, this time in polar coordinates:\n",
75 | "$$\\rho(\\theta) = (1 + 0.9 \\cdot cos (8 \\cdot \\theta) ) \\cdot (1 + 0.1 \\cdot cos(24 \\cdot \\theta)) \\cdot (0.9 + 0.05 \\cdot cos(200 \\cdot \\theta)) \\cdot (1 + sin(\\theta))$$\n",
76 | "\n",
77 | "\n",
78 | "Then convert it into cartesian coordinates ([howto](http://www.mathsisfun.com/polar-cartesian-coordinates.html)) and plot the results.\n",
79 | "\n",
80 | "Use torch tensors only: no lists, loops, numpy arrays, etc."
81 | ],
82 | "metadata": {
83 | "id": "gUtSrsCYaRdA"
84 | }
85 | },
86 | {
87 | "cell_type": "code",
88 | "source": [
89 | "theta = torch.linspace(- np.pi, np.pi, steps=1000)\n",
90 | "\n",
91 | "# compute rho(theta) as per formula above\n",
92 | "rho = YOUR CODE HERE\n",
93 | "\n",
94 | "# Now convert polar (rho, theta) pairs into cartesian (x,y) to plot them.\n",
95 | "x = YOUR CODE HERE\n",
96 | "y = YOUR CODE HERE\n",
97 | "\n",
98 | "\n",
99 | "plt.figure(figsize=[6, 6])\n",
100 | "plt.fill(x.numpy(), y.numpy(), color='green')\n",
101 | "plt.grid()"
102 | ],
103 | "metadata": {
104 | "id": "sTtmnC-EaIr5"
105 | },
106 | "execution_count": null,
107 | "outputs": []
108 | },
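{
"cell_type": "code",
"source": [
"# Illustrative sketch (not part of the graded task): the same polar-to-cartesian\n",
"# pattern on a much simpler curve, a unit circle, using torch ops only.\n",
"# The conversion is x = rho * cos(theta), y = rho * sin(theta).\n",
"demo_theta = torch.linspace(-np.pi, np.pi, steps=100)\n",
"demo_rho = torch.ones_like(demo_theta)\n",
"demo_x = demo_rho * torch.cos(demo_theta)\n",
"demo_y = demo_rho * torch.sin(demo_theta)\n",
"plt.figure(figsize=[4, 4])\n",
"plt.plot(demo_x.numpy(), demo_y.numpy())\n",
"plt.grid()"
],
"metadata": {},
"execution_count": null,
"outputs": []
},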
109 | {
110 | "cell_type": "markdown",
111 | "source": [
112 | "# Task 2: Going deeper (6 points)\n",
113 | "\n",
114 | "Your ultimate task here is to build your first neural network [almost] from scratch and pure PyTorch.\n",
115 | "\n",
116 | "This time you will solve the same digit recognition problem, but at a larger scale\n",
117 | "\n",
118 | "* 10 different letters\n",
119 | "* 20k samples\n",
120 | "\n",
121 | "We want you to build a network that __reaches at least 80% accuracy__ and has __at least 2 linear layers__ in it.\n",
122 | "\n",
123 | "With 10 classes you need __categorical crossentropy__ (see [here](http://wiki.fast.ai/index.php/Log_Loss)) loss. You can write it any way you want, but we recommend to use log_softmax function from pytorch, since it is more numerically stable.\n",
124 | "\n",
125 | "Note that you are not required to build 152-layer monsters here. A 2-layer (one hidden, one output) neural network should already give you nice score.\n",
126 | "\n",
127 | "__Win conditions:__\n",
128 | "* __Your model must be nonlinear,__ but not necessarily deep.\n",
129 | "* __Train your model with your own SGD__ - which you will have to implement\n",
130 | "* __For this task only, please do not use the contents of `torch.nn` and `torch.optim`.__ That's for the next task.\n",
131 | "* __Do not use Conv layers__\n",
132 | "\n",
133 | "**Bonus:** For the best score in group you get +1.5, 1.0, 0.5 point(1st. 2nd, 3rd places)."
134 | ],
135 | "metadata": {
136 | "id": "vcfxlYBna3Ke"
137 | }
138 | },
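{
"cell_type": "code",
"source": [
"# Short demo of why log_softmax is recommended above (illustration only):\n",
"# with large logits, log(softmax(x)) underflows to log(0) = -inf for the\n",
"# small-probability classes, while torch.log_softmax stays finite.\n",
"demo_logits = torch.tensor([[1000.0, 0.0, -1000.0]])\n",
"print(torch.log(torch.softmax(demo_logits, dim=1)))  # contains -inf entries\n",
"print(torch.log_softmax(demo_logits, dim=1))         # finite values"
],
"metadata": {},
"execution_count": null,
"outputs": []
},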
139 | {
140 | "cell_type": "code",
141 | "source": [
142 | "from notmnist import load_notmnist\n",
143 | "X_train, y_train, X_val, y_val = load_notmnist(letters='ABCDEFGHIJ')\n",
144 | "X_train, X_val = X_train.reshape([-1, 784]), X_val.reshape([-1, 784])"
145 | ],
146 | "metadata": {
147 | "id": "_uZn0Bdba3pH"
148 | },
149 | "execution_count": null,
150 | "outputs": []
151 | },
152 | {
153 | "cell_type": "code",
154 | "source": [
155 | "%matplotlib inline\n",
156 | "plt.figure(figsize=[12, 4])\n",
157 | "for i in range(20):\n",
158 | " plt.subplot(2, 10, i+1)\n",
159 | " plt.imshow(X_train[i].reshape([28, 28]))\n",
160 | " plt.title(str(y_train[i]))"
161 | ],
162 | "metadata": {
163 | "id": "3WJlL3PHbs2S"
164 | },
165 | "execution_count": null,
166 | "outputs": []
167 | },
168 | {
169 | "cell_type": "code",
170 | "source": [
171 | "X_train.shape, y_train.shape, X_val.shape, y_val.shape"
172 | ],
173 | "metadata": {
174 | "id": "_M4n3fcDbvqu"
175 | },
176 | "execution_count": null,
177 | "outputs": []
178 | },
179 | {
180 | "cell_type": "code",
181 | "source": [
182 | "classes = np.unique(y_train)\n",
183 | "n_classes = len(classes)\n",
184 | "classes"
185 | ],
186 | "metadata": {
187 | "id": "bIH6GPb3djr7"
188 | },
189 | "execution_count": null,
190 | "outputs": []
191 | },
192 | {
193 | "cell_type": "code",
194 | "source": [
195 | "class CustomNet:\n",
196 | " def __init__(self, hidden_size, in_size=28*28, num_classes=n_classes):\n",
197 | " # self.W = YOUR CODE HERE\n",
198 | " pass\n",
199 | " def forward(self, x):\n",
200 | " # YOUR CODE HERE\n",
201 | " pass"
202 | ],
203 | "metadata": {
204 | "id": "mc7bSgpCbzHN"
205 | },
206 | "execution_count": null,
207 | "outputs": []
208 | },
209 | {
210 | "cell_type": "code",
211 | "source": [
212 | "net = CustomNet()\n",
213 | "out = net.forward(torch.randn(2, 28*28, device=DEVICE))\n",
214 | "assert len(out.shape) == 2\n",
215 | "assert out.shape[-1] == n_classes"
216 | ],
217 | "metadata": {
218 | "id": "649qrlZXfUB6"
219 | },
220 | "execution_count": null,
221 | "outputs": []
222 | },
223 | {
224 | "cell_type": "code",
225 | "source": [
226 | "import torch.nn.functional as F\n",
227 | "\n",
228 | "def cross_entropy_loss(logits, target):\n",
229 | " N = logits.size(0)\n",
230 | " # Get the log probabilities\n",
231 | " log_probs = # YOUR CODE HERE\n",
232 | " # Gather the log probabilities at the target indices\n",
233 | " log_probs_at_target = # YOUR CODE HERE\n",
234 | " # Compute the negative log likelihood\n",
235 | " nll = # YOUR CODE HERE\n",
236 | " return nll / N\n",
237 | "\n",
238 | "y_tmp = torch.tensor(y_train[:2], device=DEVICE)\n",
239 | "cross_entropy_loss(out, y_tmp), torch.nn.CrossEntropyLoss()(out, y_tmp)"
240 | ],
241 | "metadata": {
242 | "id": "HnnVGZSwmyu_"
243 | },
244 | "execution_count": null,
245 | "outputs": []
246 | },
247 | {
248 | "cell_type": "code",
249 | "source": [
250 | "class CustomSGD:\n",
251 | " def __init__(self, model, lr=1e-4):\n",
252 | " self.model = model\n",
253 | " self.lr = lr\n",
254 | "\n",
255 | " def step(self):\n",
256 | " with torch.no_grad():\n",
257 | " for param in # YOUR CODE HERE:\n",
258 | " # YOUR CODE HERE\n",
259 | " def zero_grad(self):\n",
260 | " for param in # YOUR CODE HERE:\n",
261 | " # YOUR CODE HERE"
262 | ],
263 | "metadata": {
264 | "id": "70swrTmCeBZx"
265 | },
266 | "execution_count": null,
267 | "outputs": []
268 | },
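{
"cell_type": "code",
"source": [
"# A minimal, self-contained sketch of the manual-SGD mechanics expected above\n",
"# (an assumed pattern, not a solution to CustomSGD): backward() fills .grad,\n",
"# the update happens inside torch.no_grad(), and .grad is reset before the\n",
"# next step. The tensor w here is purely illustrative.\n",
"w = torch.randn(3, requires_grad=True)\n",
"loss = (w ** 2).sum()\n",
"loss.backward()           # w.grad now holds d(loss)/dw\n",
"with torch.no_grad():\n",
"    w -= 0.1 * w.grad     # one gradient descent step\n",
"w.grad = None             # or w.grad.zero_() before the next backward()\n",
"print(w)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},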
269 | {
270 | "cell_type": "code",
271 | "source": [
272 | "def iterate_minibatches(X, y, batch_size):\n",
273 | " indices = np.random.permutation(np.arange(len(X)))\n",
274 | " for start in range(0, len(indices), batch_size):\n",
275 | " ix = indices[start: start + batch_size]\n",
276 | " yield torch.from_numpy(X[ix]), torch.from_numpy(y[ix])"
277 | ],
278 | "metadata": {
279 | "id": "nVvO14e90YjC"
280 | },
281 | "execution_count": null,
282 | "outputs": []
283 | },
284 | {
285 | "cell_type": "code",
286 | "source": [
287 | "def train(net, optimizer, loss_fn, n_epoch=20):\n",
288 | " loss_history = []\n",
289 | " acc_history = []\n",
290 | " val_loss_history = []\n",
291 | " val_acc_history = []\n",
292 | "\n",
293 | " for i in range(n_epoch):\n",
294 | " # Training\n",
295 | " # net.train()\n",
296 | " acc_batches=[]\n",
297 | " loss_batches=[]\n",
298 | " for x_batch, y_batch in iterate_minibatches(X_train, y_train, batch_size=64):\n",
299 | " # x_batch = # YOUR CODE HERE\n",
300 | " # y_batch = # YOUR CODE HERE\n",
301 | " # Forward\n",
302 | " # loss = # YOUR CODE HERE\n",
303 | "\n",
304 | " # Backward\n",
305 | " # YOUR CODE HERE\n",
306 | "\n",
307 | " # Accuracy\n",
308 | " acc_batches += (out.argmax(axis=1) == y_batch).detach().cpu().numpy().tolist()\n",
309 | "\n",
310 | " loss_history.append(np.mean(loss_batches))\n",
311 | " acc_history.append(np.mean(acc_batches))\n",
312 | "\n",
313 | " # Validating\n",
314 | " # net.eval()\n",
315 | " with torch.no_grad():\n",
316 | " acc_batches=[]\n",
317 | " loss_batches=[]\n",
318 | " for x_batch, y_batch in iterate_minibatches(X_val, y_val, batch_size=64):\n",
319 | " # x_batch = # YOUR CODE HERE\n",
320 | " # y_batch = # YOUR CODE HERE\n",
321 | " # Forward\n",
322 | " # loss = # YOUR CODE HERE\n",
323 | " # Accuracy\n",
324 | " acc_batches += (out.argmax(axis=1) == y_batch).detach().cpu().numpy().tolist()\n",
325 | "\n",
326 | " val_loss_history.append(np.mean(loss_batches))\n",
327 | " val_acc_history.append(np.mean(acc_batches))\n",
328 | "\n",
329 | " clear_output(wait=True)\n",
330 | " fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 8))\n",
331 | " ax1.set_xlabel(\"#epoch\")\n",
332 | " ax1.set_ylabel(\"Loss\")\n",
333 | " ax1.plot(loss_history, 'b', label='train loss')\n",
334 | " ax1.plot(val_loss_history, 'r', label='val loss')\n",
335 | "\n",
336 | " ax2.set_xlabel(\"#epoch\")\n",
337 | " ax2.set_ylabel(\"Acc\")\n",
338 | " ax2.plot(acc_history, 'b', label='train acc')\n",
339 | " ax2.plot(val_acc_history, 'r', label='val acc')\n",
340 | " plt.axhline(y = 0.8, color = 'g', linestyle = '--')\n",
341 | "\n",
342 | " plt.legend()\n",
343 | " plt.show()\n",
344 | " return max(val_acc_history)"
345 | ],
346 | "metadata": {
347 | "id": "DjLxUiA1eLtZ"
348 | },
349 | "execution_count": null,
350 | "outputs": []
351 | },
352 | {
353 | "cell_type": "code",
354 | "source": [
355 | "net = # YOUR CODE HERE\n",
356 | "opt = # YOUR CODE HERE\n",
357 | "# train(net, opt, cross_entropy_loss)"
358 | ],
359 | "metadata": {
360 | "id": "Ps2HNwINqtVn"
361 | },
362 | "execution_count": null,
363 | "outputs": []
364 | },
365 | {
366 | "cell_type": "markdown",
367 | "source": [
368 | "### Hints:\n",
369 | " - You'll have to use matrix W(feature_id x class_id)\n",
370 | " - Softmax (exp over sum of exps) can be implemented manually or as `torch.softmax`\n",
371 | " - Probably better to use STOCHASTIC gradient descent (minibatch) for greater speed\n",
372 | " - You need to train both layers, not just the output layer :)\n",
373 | " - 50 hidden neurons and a ReLU nonlinearity will do for a start. Many ways to improve.\n",
374 | " - In ideal case this totals to 2 `torch.matmul`'s, 1 softmax and 1 ReLU/sigmoid \n",
375 | " - If anything seems wrong, try going through one step of training and printing everything you compute.\n",
376 | " - If you see NaNs midway through optimization, you can estimate $\\log P(y \\mid x)$ as `torch.log_softmax(last_linear_layer_outputs)`."
377 | ],
378 | "metadata": {
379 | "id": "wwQYqNdugwmp"
380 | }
381 | },
382 | {
383 | "cell_type": "markdown",
384 | "source": [
385 | "# Task 3. Overfitting (4 points)\n"
386 | ],
387 | "metadata": {
388 | "id": "1fxSuZJwb1bx"
389 | }
390 | },
391 | {
392 | "cell_type": "markdown",
393 | "source": [
394 | "Today we work with [Fashion-MNIST dataset](https://github.com/zalandoresearch/fashion-mnist) (*hint: it is available in `torchvision`*).\n",
395 | "\n",
396 | "Your goal for today:\n",
397 | "0. Fill the gaps in training loop and architectures.\n",
398 | "1. Train a tiny __FC__ network.\n",
399 | "2. Cause considerable overfitting by modifying the network (e.g. increasing the number of network parameters and/or layers) and demonstrate in in the appropriate way (e.g. plot loss and accurasy on train and validation set w.r.t. network complexity).\n",
400 | "3. Try to deal with overfitting (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results.\n",
401 | "\n",
402 | "Train a network that achieves $\\geq 0.885$ test accuracy. Again you should use only Linear (`nn.Linear`) layers and activations/dropout/batchnorm. Convolutional layers might be a great use, but we will meet them a bit later.\n",
403 | "\n",
404 | "__Please, write a small report describing your ideas, tries and achieved results in the end of this task.__\n",
405 | "\n",
406 | "*Note*: in task 3 your goal is to make the network from task 2 less prone to overfitting. And then to train the network that achives $\\geq 0.885$ test accuracy, so it can be different.\n",
407 | "\n",
408 | "**Bonus:** For the best score in group you get +1.5, 1.0, 0.5 point(1st, 2nd, 3rd places)."
409 | ],
410 | "metadata": {
411 | "id": "g9j3vEc9rsQC"
412 | }
413 | },
414 | {
415 | "cell_type": "code",
416 | "source": [
417 | "import torch\n",
418 | "import torch.nn as nn\n",
419 | "import torchvision\n",
420 | "import torchvision.transforms as transforms\n",
421 | "import torchsummary\n",
422 | "\n",
423 | "from matplotlib import pyplot as plt\n",
424 | "from matplotlib.pyplot import figure\n",
425 | "import numpy as np\n",
426 | "import os\n",
427 | "from tqdm import tqdm\n",
428 | "from sklearn.model_selection import train_test_split"
429 | ],
430 | "metadata": {
431 | "id": "O1fxVkDOb3QX"
432 | },
433 | "execution_count": null,
434 | "outputs": []
435 | },
436 | {
437 | "cell_type": "code",
438 | "source": [
439 | "# Technical function\n",
440 | "def mkdir(path):\n",
441 | " if not os.path.exists(root_path):\n",
442 | " os.mkdir(root_path)\n",
443 | " print('Directory', path, 'is created!')\n",
444 | " else:\n",
445 | " print('Directory', path, 'already exists!')\n",
446 | "\n",
447 | "root_path = 'fmnist'\n",
448 | "mkdir(root_path)"
449 | ],
450 | "metadata": {
451 | "id": "kzUtrIEgrwWi"
452 | },
453 | "execution_count": null,
454 | "outputs": []
455 | },
456 | {
457 | "cell_type": "code",
458 | "execution_count": null,
459 | "metadata": {
460 | "id": "qt6LE7XaTDT9"
461 | },
462 | "outputs": [],
463 | "source": [
464 | "download = True\n",
465 | "train_transform = transforms.ToTensor()\n",
466 | "test_transform = transforms.ToTensor()\n",
467 | "\n",
468 | "fmnist_dataset_train = torchvision.datasets.FashionMNIST(root_path,\n",
469 | " train=True,\n",
470 | " transform=train_transform,\n",
471 | " target_transform=None,\n",
472 | " download=download)\n",
473 | "fmnist_dataset_test = torchvision.datasets.FashionMNIST(root_path,\n",
474 | " train=False,\n",
475 | " transform=test_transform,\n",
476 | " target_transform=None,\n",
477 | " download=download)"
478 | ]
479 | },
480 | {
481 | "cell_type": "code",
482 | "source": [
483 | "fmnist_dataset_train, fmnist_dataset_val = train_test_split(fmnist_dataset_train, train_size=50000)"
484 | ],
485 | "metadata": {
486 | "id": "15B67G4iGMIr"
487 | },
488 | "execution_count": null,
489 | "outputs": []
490 | },
491 | {
492 | "cell_type": "code",
493 | "source": [
494 | "len(fmnist_dataset_train), len(fmnist_dataset_val), len(fmnist_dataset_test)"
495 | ],
496 | "metadata": {
497 | "id": "cpbEkSCM1n3Z"
498 | },
499 | "execution_count": null,
500 | "outputs": []
501 | },
502 | {
503 | "cell_type": "code",
504 | "execution_count": null,
505 | "metadata": {
506 | "id": "71YP0SPwTIxD"
507 | },
508 | "outputs": [],
509 | "source": [
510 | "train_loader = torch.utils.data.DataLoader(fmnist_dataset_train,\n",
511 | " batch_size=128,\n",
512 | " shuffle=True,\n",
513 | " num_workers=2)\n",
514 | "\n",
515 | "val_loader = torch.utils.data.DataLoader(fmnist_dataset_val,\n",
516 | " batch_size=256,\n",
517 | " shuffle=True,\n",
518 | " num_workers=2)\n",
519 | "\n",
520 | "test_loader = torch.utils.data.DataLoader(fmnist_dataset_test,\n",
521 | " batch_size=256,\n",
522 | " shuffle=False,\n",
523 | " num_workers=2)"
524 | ]
525 | },
526 | {
527 | "cell_type": "code",
528 | "execution_count": null,
529 | "metadata": {
530 | "id": "aHca15bOTY4B"
531 | },
532 | "outputs": [],
533 | "source": [
534 | "for img, label in train_loader:\n",
535 | " print(img.shape)\n",
536 | " # print(img)\n",
537 | " print(label.shape)\n",
538 | " break\n",
539 | "\n",
540 | "plt.imshow(img[0, 0]);"
541 | ]
542 | },
543 | {
544 | "cell_type": "code",
545 | "source": [
546 | "def train_val_loop(net, train_loader, val_loader, name, optimizer, criterion, n_epoch=20):\n",
547 | " loss_history = []\n",
548 | " acc_history = []\n",
549 | " val_loss_history = []\n",
550 | " val_acc_history = []\n",
551 | "\n",
552 | " for i in range(n_epoch):\n",
553 | " net.train()\n",
554 | " acc_batches=[]\n",
555 | " loss_batches=[]\n",
556 | " for x_batch, y_batch in train_loader:\n",
557 | " # x_batch = YOUR CODE HERE\n",
558 | " # y_batch = YOUR CODE HERE\n",
559 | "\n",
560 | " # Forward\n",
561 | " # loss = YOUR CODE HERE\n",
562 | "\n",
563 | " # Backward\n",
564 | " # ... YOUR CODE HERE\n",
565 | "\n",
566 | " # Accuracy\n",
567 | " # acc_batches = YOUR CODE HERE\n",
568 | "\n",
569 | " loss_history.append(np.mean(loss_batches))\n",
570 | " acc_history.append(np.mean(acc_batches))\n",
571 | "\n",
572 | " # Validating\n",
573 | " net.eval()\n",
574 | " with torch.no_grad():\n",
575 | " acc_batches=[]\n",
576 | " loss_batches=[]\n",
577 | " for x_batch, y_batch in val_loader:\n",
578 | " # x_batch = YOUR CODE HERE\n",
579 | " # y_batch = YOUR CODE HERE\n",
580 | "\n",
581 | " # Forward\n",
582 | " # loss = YOUR CODE HERE\n",
583 | "\n",
584 | " # Accuracy\n",
585 | " # acc_batches = YOUR CODE HERE\n",
586 | "\n",
587 | " val_loss_history.append(np.mean(loss_batches))\n",
588 | " val_acc_history.append(np.mean(acc_batches))\n",
589 | "\n",
590 | " clear_output(wait=True)\n",
591 | " plt.figure(figsize=(8, 5))\n",
592 | " plt.title(f\"Training/validating loss {name}\")\n",
593 | " plt.xlabel(\"#epoch\")\n",
594 | " plt.ylabel(\"Loss\")\n",
595 | " plt.plot(loss_history, 'b', label='train')\n",
596 | " plt.plot(val_loss_history, 'r', label='validation')\n",
597 | " plt.legend()\n",
598 | "\n",
599 | " plt.figure(figsize=(8, 5))\n",
600 | " plt.title(f\"Training/validating accuracy {name}\")\n",
601 | " plt.xlabel(\"#epoch\")\n",
602 | " plt.ylabel(\"Accuracy\")\n",
603 | " plt.plot(acc_history, 'b', label='train')\n",
604 | " plt.plot(val_acc_history, 'r', label='validation')\n",
605 | " plt.legend()\n",
606 | "\n",
607 | " plt.show()\n",
608 | "\n",
609 | "def test_accuracy(model):\n",
610 | " model.eval()\n",
611 | " test_acc_batches = []\n",
612 | " with torch.no_grad():\n",
613 | " for X_test, Y_test in test_loader:\n",
614 | " X_test = X_test.to(DEVICE)\n",
615 | " Y_test = Y_test.to(DEVICE)\n",
616 | " out = model.forward(X_test)\n",
617 | " test_acc_batches += (out.argmax(axis=1) == Y_test).detach().cpu().numpy().tolist()\n",
618 | " print(f'Test accuracy {np.mean(test_acc_batches)}')"
619 | ],
620 | "metadata": {
621 | "id": "QBu9fg_dAymN"
622 | },
623 | "execution_count": null,
624 | "outputs": []
625 | },
626 | {
627 | "cell_type": "markdown",
628 | "metadata": {
629 | "id": "b6OOOffHTfX5"
630 | },
631 | "source": [
632 | "## Task 3.1 Tiny net\n",
633 | "Train a tiny network just to validate correctness of train loop, net architecture, params."
634 | ]
635 | },
636 | {
637 | "cell_type": "code",
638 | "execution_count": null,
639 | "metadata": {
640 | "id": "ftpkTjxlTcFx"
641 | },
642 | "outputs": [],
643 | "source": [
644 | "class TinyNeuralNetwork(nn.Module):\n",
645 | " def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):\n",
646 | " super(self.__class__, self).__init__()\n",
647 | " self.model = nn.Sequential(\n",
648 | " nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards\n",
649 | " # YOUR CODE HERE\n",
650 | " )\n",
651 | "\n",
652 | " def forward(self, inp):\n",
653 | " out = self.model(inp)\n",
654 | " return out"
655 | ]
656 | },
657 | {
658 | "cell_type": "code",
659 | "source": [
660 | "model = TinyNeuralNetwork()\n",
661 | "out = model(torch.randn(2, 1, 28, 28))\n",
662 | "assert len(out.shape) == 2\n",
663 | "assert out.shape[-1] == 10"
664 | ],
665 | "metadata": {
666 | "id": "bFnYI29v4N1P"
667 | },
668 | "execution_count": null,
669 | "outputs": []
670 | },
671 | {
672 | "cell_type": "code",
673 | "execution_count": null,
674 | "metadata": {
675 | "id": "EAhMwySkrlpq"
676 | },
677 | "outputs": [],
678 | "source": [
679 | "torchsummary.summary(TinyNeuralNetwork().to(DEVICE), (28*28,))"
680 | ]
681 | },
682 | {
683 | "cell_type": "code",
684 | "execution_count": null,
685 | "metadata": {
686 | "id": "i3POFj90Ti-6"
687 | },
688 | "outputs": [],
689 | "source": [
690 | "# tiny_model = TinyNeuralNetwork().to(DEVICE)\n",
691 | "# opt = # YOUR CODE HERE\n",
692 | "# loss_func = # YOUR CODE HERE\n",
693 | "# n_epoch = # YOUR CODE HERE\n",
694 | "\n",
695 | "# Your experiments, come here\n",
696 | "# train_val_loop(tiny_model, train_loader=train_loader, val_loader=val_loader, name='tiny model', optimizer=opt, criterion=loss_func, n_epoch=n_epoch)"
697 | ]
698 | },
699 | {
700 | "cell_type": "code",
701 | "source": [
702 | "# test_accuracy(tiny_model)"
703 | ],
704 | "metadata": {
705 | "id": "XxRsu7pBIf5f"
706 | },
707 | "execution_count": null,
708 | "outputs": []
709 | },
710 | {
711 | "cell_type": "markdown",
712 | "metadata": {
713 | "id": "L7ISqkjmCPB1"
714 | },
715 | "source": [
716 | "## Task 3.2: Overfit it.\n",
717 | "Build a network that will overfit to this dataset. Demonstrate the overfitting in the appropriate way (e.g. plot loss and accurasy on train and test set w.r.t. network complexity).\n",
718 | "\n",
719 | "*Note:* you also might decrease the size of `train` dataset to enforce the overfitting and speed up the computations."
720 | ]
721 | },
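{
"cell_type": "code",
"source": [
"# Optional helper sketch for the note above (an assumed approach, not required):\n",
"# torch.utils.data.Subset is one way to shrink the train set and make\n",
"# overfitting easier to provoke. small_train_size is an arbitrary number.\n",
"small_train_size = 5000\n",
"small_train_dataset = torch.utils.data.Subset(fmnist_dataset_train, range(small_train_size))\n",
"small_train_loader = torch.utils.data.DataLoader(small_train_dataset, batch_size=128, shuffle=True, num_workers=2)\n",
"len(small_train_dataset)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},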
722 | {
723 | "cell_type": "code",
724 | "execution_count": null,
725 | "metadata": {
726 | "id": "H12uAWiGBwJx"
727 | },
728 | "outputs": [],
729 | "source": [
730 | "class OverfittingNeuralNetwork(nn.Module):\n",
731 | " def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):\n",
732 | " super(self.__class__, self).__init__()\n",
733 | " self.model = nn.Sequential(\n",
734 | " nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards\n",
735 | " # YOUR CODE HERE\n",
736 | " )\n",
737 | "\n",
738 | " def forward(self, inp):\n",
739 | " out = self.model(inp)\n",
740 | " return out"
741 | ]
742 | },
743 | {
744 | "cell_type": "code",
745 | "source": [
746 | "model = OverfittingNeuralNetwork()\n",
747 | "out = model(torch.randn(2, 1, 28, 28))\n",
748 | "assert len(out.shape) == 2\n",
749 | "assert out.shape[-1] == 10"
750 | ],
751 | "metadata": {
752 | "id": "zR4muHZr5k8I"
753 | },
754 | "execution_count": null,
755 | "outputs": []
756 | },
757 | {
758 | "cell_type": "code",
759 | "execution_count": null,
760 | "metadata": {
761 | "id": "JgXAKCpvCwqH"
762 | },
763 | "outputs": [],
764 | "source": [
765 | "torchsummary.summary(OverfittingNeuralNetwork().to(DEVICE), (28*28,))"
766 | ]
767 | },
768 | {
769 | "cell_type": "code",
770 | "execution_count": null,
771 | "metadata": {
772 | "id": "Iyuwd4ZLrlpr"
773 | },
774 | "outputs": [],
775 | "source": [
776 | "# overfit_model = OverfittingNeuralNetwork().to(DEVICE)\n",
777 | "# opt = # YOUR CODE HERE\n",
778 | "# loss_func = # YOUR CODE HERE\n",
779 | "# n_epoch = # YOUR CODE HERE\n",
780 | "\n",
781 | "# Your experiments, come here\n",
782 | "# train_val_loop(overfit_model, train_loader=train_loader, val_loader=val_loader, name='overfit model', optimizer=opt, criterion=loss_func, n_epoch=n_epoch)"
783 | ]
784 | },
785 | {
786 | "cell_type": "code",
787 | "source": [
788 | "# test_accuracy(overfit_model)"
789 | ],
790 | "metadata": {
791 | "id": "GqFQzpfJIem3"
792 | },
793 | "execution_count": null,
794 | "outputs": []
795 | },
796 | {
797 | "cell_type": "markdown",
798 | "metadata": {
799 | "id": "LG8mNHtPrlpr"
800 | },
801 | "source": [
802 | "## Task 3.3: Fix it.\n",
803 | "Fix the overfitted network from the previous step (at least partially) by using regularization techniques (Dropout/Batchnorm/...) and demonstrate the results."
804 | ]
805 | },
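{
"cell_type": "code",
"source": [
"# One common placement pattern for the regularizers mentioned above (a sketch\n",
"# with arbitrary sizes, not a prescribed architecture for this task):\n",
"# BatchNorm right after a Linear layer, Dropout after the activation.\n",
"example_block = nn.Sequential(\n",
"    nn.Linear(512, 256),\n",
"    nn.BatchNorm1d(256),\n",
"    nn.ReLU(),\n",
"    nn.Dropout(p=0.3),\n",
")\n",
"example_block(torch.randn(4, 512)).shape"
],
"metadata": {},
"execution_count": null,
"outputs": []
},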
806 | {
807 | "cell_type": "code",
808 | "execution_count": null,
809 | "metadata": {
810 | "id": "42343iSyrlpr"
811 | },
812 | "outputs": [],
813 | "source": [
814 | "class FixedNeuralNetwork(nn.Module):\n",
815 | " def __init__(self, input_shape=28*28, num_classes=10, input_channels=1):\n",
816 | " super(self.__class__, self).__init__()\n",
817 | " self.model = nn.Sequential(\n",
818 | " nn.Flatten(), # This layer converts image into a vector to use Linear layers afterwards\n",
819 | " # YOUR CODE HERE\n",
820 | " )\n",
821 | "\n",
822 | " def forward(self, inp):\n",
823 | " out = self.model(inp)\n",
824 | " return out"
825 | ]
826 | },
827 | {
828 | "cell_type": "code",
829 | "source": [
830 | "model = FixedNeuralNetwork()\n",
831 | "out = model(torch.randn(2, 1, 28, 28))\n",
832 | "assert len(out.shape) == 2\n",
833 | "assert out.shape[-1] == 10"
834 | ],
835 | "metadata": {
836 | "id": "93Twxz_N6Ade"
837 | },
838 | "execution_count": null,
839 | "outputs": []
840 | },
841 | {
842 | "cell_type": "code",
843 | "execution_count": null,
844 | "metadata": {
845 | "id": "TR1xQBp9rlps"
846 | },
847 | "outputs": [],
848 | "source": [
849 | "torchsummary.summary(FixedNeuralNetwork().to(DEVICE), (28*28,))"
850 | ]
851 | },
852 | {
853 | "cell_type": "code",
854 | "execution_count": null,
855 | "metadata": {
856 | "id": "OMdEf9Kbrlps"
857 | },
858 | "outputs": [],
859 | "source": [
860 | "# fixed_model = FixedNeuralNetwork().to(device)\n",
861 | "# opt = # YOUR CODE HERE\n",
862 | "# loss_func = # YOUR CODE HERE\n",
863 | "# n_epoch = # YOUR CODE HERE\n",
864 | "\n",
865 | "# Your experiments, come here\n",
866 | "# train_val_loop(fixed_model, train_loader=train_loader, val_loader=val_loader, name='fixed model', optimizer=opt, criterion=loss_func, n_epoch=n_epoch)"
867 | ]
868 | },
869 | {
870 | "cell_type": "code",
871 | "source": [
872 | "# test_accuracy(fixed_model)"
873 | ],
874 | "metadata": {
875 | "id": "idv0HgvDIckN"
876 | },
877 | "execution_count": null,
878 | "outputs": []
879 | },
880 | {
881 | "cell_type": "markdown",
882 | "metadata": {
883 | "id": "dMui_uLJ7G0d"
884 | },
885 | "source": [
886 | "### Conclusions:\n",
887 | "_Write down small report with your conclusions and your ideas._\n",
888 | "\n",
889 | "YOUR WORDS HERE"
890 | ]
891 | },
892 | {
893 | "cell_type": "markdown",
894 | "source": [
895 | "# Task 4. Your own nn layer. (4 points)"
896 | ],
897 | "metadata": {
898 | "id": "7J7Tpbc9udSn"
899 | }
900 | },
901 | {
902 | "cell_type": "code",
903 | "source": [
904 | "class Module(object):\n",
905 | " \"\"\"\n",
906 | " Basically, you can think of a module as of a something (black box)\n",
907 | " which can process `input` data and produce `ouput` data.\n",
908 | " This is like applying a function which is called `forward`:\n",
909 | "\n",
910 | " output = module.forward(input)\n",
911 | "\n",
912 | " The module should be able to perform a backward pass: to differentiate the `forward` function.\n",
913 | " More, it should be able to differentiate it if is a part of chain (chain rule).\n",
914 | " The latter implies there is a gradient from previous step of a chain rule.\n",
915 | "\n",
916 | " gradInput = module.backward(input, gradOutput)\n",
917 | " \"\"\"\n",
918 | " def __init__ (self):\n",
919 | " self.output = None\n",
920 | " self.gradInput = None\n",
921 | " self.training = True\n",
922 | "\n",
923 | " def forward(self, input):\n",
924 | " \"\"\"\n",
925 | " Takes an input object, and computes the corresponding output of the module.\n",
926 | " \"\"\"\n",
927 | " return self.updateOutput(input)\n",
928 | "\n",
929 | " def backward(self,input, gradOutput):\n",
930 | " \"\"\"\n",
931 | " Performs a backpropagation step through the module, with respect to the given input.\n",
932 | "\n",
933 | " This includes\n",
934 | " - computing a gradient w.r.t. `input` (is needed for further backprop),\n",
935 | " - computing a gradient w.r.t. parameters (to update parameters while optimizing).\n",
936 | " \"\"\"\n",
937 | " self.updateGradInput(input, gradOutput)\n",
938 | " self.accGradParameters(input, gradOutput)\n",
939 | " return self.gradInput\n",
940 | "\n",
941 | "\n",
942 | " def updateOutput(self, input):\n",
943 | " \"\"\"\n",
944 | " Computes the output using the current parameter set of the class and input.\n",
945 | " This function returns the result which is stored in the `output` field.\n",
946 | "\n",
947 | " Make sure to both store the data in `output` field and return it.\n",
948 | " \"\"\"\n",
949 | "\n",
950 | " # The easiest case:\n",
951 | "\n",
952 | " # self.output = input\n",
953 | " # return self.output\n",
954 | "\n",
955 | " pass\n",
956 | "\n",
957 | " def updateGradInput(self, input, gradOutput):\n",
958 | " \"\"\"\n",
959 | " Computing the gradient of the module with respect to its own input.\n",
960 | " This is returned in `gradInput`. Also, the `gradInput` state variable is updated accordingly.\n",
961 | "\n",
962 | " The shape of `gradInput` is always the same as the shape of `input`.\n",
963 | "\n",
964 | " Make sure to both store the gradients in `gradInput` field and return it.\n",
965 | " \"\"\"\n",
966 | "\n",
967 | " # The easiest case:\n",
968 | "\n",
969 | " # self.gradInput = gradOutput\n",
970 | " # return self.gradInput\n",
971 | "\n",
972 | " pass\n",
973 | "\n",
974 | " def accGradParameters(self, input, gradOutput):\n",
975 | " \"\"\"\n",
976 | " Computing the gradient of the module with respect to its own parameters.\n",
977 | " No need to override if module has no parameters (e.g. ReLU).\n",
978 | " \"\"\"\n",
979 | " pass\n",
980 | "\n",
981 | " def zeroGradParameters(self):\n",
982 | " \"\"\"\n",
983 | " Zeroes `gradParams` variable if the module has params.\n",
984 | " \"\"\"\n",
985 | " pass\n",
986 | "\n",
987 | " def getParameters(self):\n",
988 | " \"\"\"\n",
989 | " Returns a list with its parameters.\n",
990 | " If the module does not have parameters return empty list.\n",
991 | " \"\"\"\n",
992 | " return []\n",
993 | "\n",
994 | " def getGradParameters(self):\n",
995 | " \"\"\"\n",
996 | " Returns a list with gradients with respect to its parameters.\n",
997 | " If the module does not have parameters return empty list.\n",
998 | " \"\"\"\n",
999 | " return []\n",
1000 | "\n",
1001 | " def train(self):\n",
1002 | " \"\"\"\n",
1003 | " Sets training mode for the module.\n",
1004 | " Training and testing behaviour differs for Dropout, BatchNorm.\n",
1005 | " \"\"\"\n",
1006 | " self.training = True\n",
1007 | "\n",
1008 | " def evaluate(self):\n",
1009 | " \"\"\"\n",
1010 | " Sets evaluation mode for the module.\n",
1011 | " Training and testing behaviour differs for Dropout, BatchNorm.\n",
1012 | " \"\"\"\n",
1013 | " self.training = False\n",
1014 | "\n",
1015 | " def __repr__(self):\n",
1016 | " \"\"\"\n",
1017 | " Pretty printing. Should be overrided in every module if you want\n",
1018 | " to have readable description.\n",
1019 | " \"\"\"\n",
1020 | " return \"Module\""
1021 | ],
1022 | "metadata": {
1023 | "id": "vEb-QueVzvMq"
1024 | },
1025 | "execution_count": null,
1026 | "outputs": []
1027 | },
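{
"cell_type": "code",
"source": [
"# Reference example of a parameter-free module in the framework above (the\n",
"# graded layer to implement is Linear below; this cell only illustrates the\n",
"# updateOutput / updateGradInput contract). ReLU keeps positive inputs and\n",
"# zeroes the rest; its input gradient is gradOutput masked by (input > 0).\n",
"class ReLU(Module):\n",
"    def __init__(self):\n",
"        super(ReLU, self).__init__()\n",
"\n",
"    def updateOutput(self, input):\n",
"        self.output = np.maximum(input, 0)\n",
"        return self.output\n",
"\n",
"    def updateGradInput(self, input, gradOutput):\n",
"        self.gradInput = gradOutput * (input > 0)\n",
"        return self.gradInput\n",
"\n",
"    def __repr__(self):\n",
"        return 'ReLU'\n",
"\n",
"relu = ReLU()\n",
"relu.forward(np.array([[-1.0, 2.0]])), relu.backward(np.array([[-1.0, 2.0]]), np.array([[1.0, 1.0]]))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},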
1028 | {
1029 | "cell_type": "markdown",
1030 | "source": [
1031 | "### Linear transform layer\n",
1032 | "Also known as dense layer, fully-connected layer, FC-layer.\n",
1033 | "You should implement it.\n",
1034 | "\n",
1035 | "- input: **`batch_size x n_feats1`**\n",
1036 | "- output: **`batch_size x n_feats2`**"
1037 | ],
1038 | "metadata": {
1039 | "id": "HzohCP4qz42l"
1040 | }
1041 | },
1042 | {
1043 | "cell_type": "code",
1044 | "source": [
1045 | "class Linear(Module):\n",
1046 | " \"\"\"\n",
1047 | " A module which applies a linear transformation\n",
1048 | " A common name is fully-connected layer, InnerProductLayer in caffe.\n",
1049 | "\n",
1050 | " The module should work with 2D input of shape (n_samples, n_feature).\n",
1051 | " \"\"\"\n",
1052 | " def __init__(self, n_in, n_out):\n",
1053 | " super(Linear, self).__init__()\n",
1054 | "\n",
1055 | " # This is a nice initialization\n",
1056 | " stdv = 1./np.sqrt(n_in)\n",
1057 | " #it is important that we should multiply X @ W^T\n",
1058 | " self.W = np.random.uniform(-stdv, stdv, size = (n_out, n_in))\n",
1059 | " self.b = np.random.uniform(-stdv, stdv, size = n_out)\n",
1060 | "\n",
1061 | " self.gradW = np.zeros_like(self.W)\n",
1062 | " self.gradb = np.zeros_like(self.b)\n",
1063 | "\n",
1064 | " def updateOutput(self, input):\n",
1065 | " # YOUR CODE HERE\n",
1066 | " pass\n",
1067 | "\n",
1068 | " def updateGradInput(self, input, gradOutput):\n",
1069 | " # YOUR CODE HERE\n",
1070 | " pass\n",
1071 | "\n",
1072 | " def accGradParameters(self, input, gradOutput):\n",
1073 | " # YOUR CODE HERE\n",
1074 | " pass\n",
1075 | "\n",
1076 | " def zeroGradParameters(self):\n",
1077 | " self.gradW.fill(0)\n",
1078 | " self.gradb.fill(0)\n",
1079 | "\n",
1080 | " def getParameters(self):\n",
1081 | " return [self.W, self.b]\n",
1082 | "\n",
1083 | " def getGradParameters(self):\n",
1084 | " return [self.gradW, self.gradb]\n",
1085 | "\n",
1086 | " def __repr__(self):\n",
1087 | " s = self.W.shape\n",
1088 | " q = 'Linear %d -> %d' %(s[1],s[0])\n",
1089 | " return q"
1090 | ],
1091 | "metadata": {
1092 | "id": "bbkgSwhpz1yw"
1093 | },
1094 | "execution_count": null,
1095 | "outputs": []
1096 | },
1097 | {
1098 | "cell_type": "code",
1099 | "source": [
1100 | "def test_Linear():\n",
1101 | " np.random.seed(42)\n",
1102 | " torch.manual_seed(42)\n",
1103 | "\n",
1104 | " batch_size, n_in, n_out = 2, 3, 4\n",
1105 | " for _ in range(100):\n",
1106 | " # layers initialization\n",
1107 | " torch_layer = torch.nn.Linear(n_in, n_out)\n",
1108 | " custom_layer = Linear(n_in, n_out)\n",
1109 | " custom_layer.W = torch_layer.weight.data.numpy()\n",
1110 | " custom_layer.b = torch_layer.bias.data.numpy()\n",
1111 | "\n",
1112 | " layer_input = np.random.uniform(-10, 10, (batch_size, n_in)).astype(np.float32)\n",
1113 | " next_layer_grad = np.random.uniform(-10, 10, (batch_size, n_out)).astype(np.float32)\n",
1114 | "\n",
1115 | " # 1. check layer output\n",
1116 | " custom_layer_output = custom_layer.updateOutput(layer_input)\n",
1117 | " layer_input_var = torch.from_numpy(layer_input).requires_grad_(True)\n",
1118 | " torch_layer_output_var = torch_layer(layer_input_var)\n",
1119 | " assert np.allclose(torch_layer_output_var.data.numpy(), custom_layer_output, atol=1e-6)\n",
1120 | "\n",
1121 | " # 2. check layer input grad\n",
1122 | " custom_layer_grad = custom_layer.updateGradInput(layer_input, next_layer_grad)\n",
1123 | " torch_layer_output_var.backward(torch.from_numpy(next_layer_grad))\n",
1124 | " torch_layer_grad_var = layer_input_var.grad\n",
1125 | " assert np.allclose(torch_layer_grad_var.data.numpy(), custom_layer_grad, atol=1e-6)\n",
1126 | "\n",
1127 | " # 3. check layer parameters grad\n",
1128 | " custom_layer.accGradParameters(layer_input, next_layer_grad)\n",
1129 | " weight_grad = custom_layer.gradW\n",
1130 | " bias_grad = custom_layer.gradb\n",
1131 | " torch_weight_grad = torch_layer.weight.grad.data.numpy()\n",
1132 | " torch_bias_grad = torch_layer.bias.grad.data.numpy()\n",
1133 | " assert np.allclose(torch_weight_grad, weight_grad, atol=1e-6)\n",
1134 | " assert np.allclose(torch_bias_grad, bias_grad, atol=1e-6)"
1135 | ],
1136 | "metadata": {
1137 | "id": "M0PEw9VVugYd"
1138 | },
1139 | "execution_count": null,
1140 | "outputs": []
1141 | },
1142 | {
1143 | "cell_type": "code",
1144 | "source": [
1145 | "%%time\n",
1146 | "test_Linear()"
1147 | ],
1148 | "metadata": {
1149 | "id": "xOt8kzgwz7Ev"
1150 | },
1151 | "execution_count": null,
1152 | "outputs": []
1153 | },
1154 | {
1155 | "cell_type": "code",
1156 | "source": [],
1157 | "metadata": {
1158 | "id": "hBPzu7JAauKd"
1159 | },
1160 | "execution_count": null,
1161 | "outputs": []
1162 | }
1163 | ]
1164 | }
--------------------------------------------------------------------------------
/hw02/2_Hw_Students.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {
7 | "colab": {
8 | "base_uri": "https://localhost:8080/"
9 | },
10 | "id": "tsZpFIaRfROD",
11 | "outputId": "e5e6a59d-d91b-41f7-a230-fa4e9bc3e449"
12 | },
13 | "outputs": [
14 | {
15 | "output_type": "stream",
16 | "name": "stderr",
17 | "text": [
18 | "/usr/local/lib/python3.10/dist-packages/albumentations/__init__.py:13: UserWarning: A new version of Albumentations is available: 1.4.18 (you have 1.4.15). Upgrade using: pip install -U albumentations. To disable automatic update checks, set the environment variable NO_ALBUMENTATIONS_UPDATE to 1.\n",
19 | " check_for_updates()\n"
20 | ]
21 | }
22 | ],
23 | "source": [
24 | "import torch\n",
25 | "import torch.nn as nn\n",
26 | "import torchvision.models\n",
27 | "from torch.utils.data import Dataset, DataLoader\n",
28 | "import torch.optim as optim\n",
29 | "import torch.nn.functional as F\n",
30 | "\n",
31 | "import albumentations as A\n",
32 | "from albumentations.pytorch import ToTensorV2\n",
33 | "\n",
34 | "from tqdm import tqdm\n",
35 | "from PIL import Image\n",
36 | "import cv2\n",
37 | "import matplotlib.pyplot as plt\n",
38 | "import numpy as np\n",
39 | "\n",
40 | "import os\n",
41 | "from time import time"
42 | ]
43 | },
44 | {
45 | "cell_type": "markdown",
46 | "source": [
47 | "### Get the data"
48 | ],
49 | "metadata": {
50 | "id": "U9HlqnlYoUJM"
51 | }
52 | },
53 | {
54 | "cell_type": "code",
55 | "source": [
56 | "import gdown\n",
57 | "url = 'https://drive.google.com/uc?id=10f1H2T-5W-BiqabHHtlZ4ASs19TZmg8R'\n",
58 | "output = 'data.zip'\n",
59 | "gdown.download(url, output, quiet=False)"
60 | ],
61 | "metadata": {
62 | "colab": {
63 | "base_uri": "https://localhost:8080/",
64 | "height": 122
65 | },
66 | "id": "AYTvLpTFfR9L",
67 | "outputId": "3baedbdb-2b28-4ed1-d627-647633ef1d94"
68 | },
69 | "execution_count": null,
70 | "outputs": [
71 | {
72 | "output_type": "stream",
73 | "name": "stderr",
74 | "text": [
75 | "Downloading...\n",
76 | "From (original): https://drive.google.com/uc?id=10f1H2T-5W-BiqabHHtlZ4ASs19TZmg8R\n",
77 | "From (redirected): https://drive.google.com/uc?id=10f1H2T-5W-BiqabHHtlZ4ASs19TZmg8R&confirm=t&uuid=8c91a23e-b723-404e-864e-01c84f6f72f9\n",
78 | "To: /content/data.zip\n",
79 | "100%|██████████| 979M/979M [00:19<00:00, 50.4MB/s]\n"
80 | ]
81 | },
82 | {
83 | "output_type": "execute_result",
84 | "data": {
85 | "text/plain": [
86 | "'data.zip'"
87 | ],
88 | "application/vnd.google.colaboratory.intrinsic+json": {
89 | "type": "string"
90 | }
91 | },
92 | "metadata": {},
93 | "execution_count": 2
94 | }
95 | ]
96 | },
97 | {
98 | "cell_type": "code",
99 | "source": [
100 | "!unzip data.zip"
101 | ],
102 | "metadata": {
103 | "id": "TLSvVki2fzUf"
104 | },
105 | "execution_count": null,
106 | "outputs": []
107 | },
108 | {
109 | "cell_type": "markdown",
110 | "source": [
111 | "### Utilities (0.5 point)\n",
112 | "\n",
113 | "Complete dataset to load prepared images and masks. Don't forget to use augmentations.\n",
114 | "\n",
115 | "Some of the images are 1 channels, so use `gray2rgb`."
116 | ],
117 | "metadata": {
118 | "id": "w1g03B9mtZeb"
119 | }
120 | },
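{
"cell_type": "code",
"source": [
"# A minimal augmentation pipeline sketch for the dataset below (assumed choices,\n",
"# not the required ones): resize to a fixed size, a light flip for training,\n",
"# ImageNet-style normalization, then conversion to a tensor.\n",
"example_transform = A.Compose([\n",
"    A.Resize(256, 256),\n",
"    A.HorizontalFlip(p=0.5),\n",
"    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),\n",
"    ToTensorV2(),\n",
"])"
],
"metadata": {},
"execution_count": null,
"outputs": []
},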
121 | {
122 | "cell_type": "code",
123 | "source": [
124 | "def gray2rgb(img):\n",
125 | " if len(img.shape) != 3:\n",
126 | " img = np.dstack([img, img, img])\n",
127 | " return img\n",
128 | "\n",
129 | "def get_iou(gt, pred):\n",
130 | " pred = pred > 0.5\n",
131 | " return (gt & pred).sum() / (gt | pred).sum()\n",
132 | "\n",
133 | "class BirdsDataset(Dataset):\n",
134 | " def __init__(self, folder, ...) -> None:\n",
135 | " images_folder = os.path.join(folder, 'images')\n",
136 | " gt_folder = os.path.join(folder, 'gt')\n",
137 | "\n",
138 | " for class_name in os.listdir(images_folder):\n",
139 | " for fname in os.listdir(os.path.join(images_folder, class_name)):\n",
140 | " # YOUR CODE HERE\n",
141 | "\n",
142 | " self.transform = A.Compose([\n",
143 | " # YOUR CODE HERE\n",
144 | " ToTensorV2()\n",
145 | " ])\n",
146 | "\n",
147 | " def __getitem__(self, index):\n",
148 | " # YOUR CODE HERE\n",
149 | " img = ...\n",
150 | " mask = ...\n",
151 | " img = gray2rgb(img)\n",
152 | " # YOUR CODE HERE\n",
153 | " return transformed_img, transformed_mask\n",
154 | "\n",
155 | " def __len__(self):\n",
156 | " # YOUR CODE HERE\n",
157 | " return"
158 | ],
159 | "metadata": {
160 | "id": "YT2QUTqFooxJ"
161 | },
162 | "execution_count": null,
163 | "outputs": []
164 | },
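{
"cell_type": "code",
"source": [
"# Tiny sanity check of the IoU metric defined above on toy masks (illustration\n",
"# only): the intersection has 1 pixel and the union has 3, so IoU = 1/3.\n",
"toy_gt = np.array([[True, True], [False, False]])\n",
"toy_pred = np.array([[1.0, 0.0], [1.0, 0.0]])  # floats, thresholded at 0.5 inside get_iou\n",
"get_iou(toy_gt, toy_pred)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},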
165 | {
166 | "cell_type": "markdown",
167 | "source": [
168 | "### Architecture (1 point)\n",
169 | "Your task for today is to build your own Unet to solve the segmentation problem.\n",
170 | "\n",
171 | "As an encoder, you can use pre-trained on IMAGENET models(or parts) from torchvision. The decoder must be trained from scratch.\n",
172 | "It is forbidden to use data not from the `data` folder.\n",
173 | "\n",
174 | "I advise you to experiment with the number of blocks so as not to overfit on the training sample and get good quality on validation."
175 | ],
176 | "metadata": {
177 | "id": "dss-ZnpTuI1V"
178 | }
179 | },
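{
"cell_type": "code",
"source": [
"# Sketch of reusing pre-trained torchvision weights for the encoder, as\n",
"# mentioned above (one possible choice, not a requirement): slice an\n",
"# ImageNet-pretrained ResNet-18 into feature blocks. The weights= argument\n",
"# follows recent torchvision; older versions use pretrained=True instead.\n",
"backbone = torchvision.models.resnet18(weights='IMAGENET1K_V1')\n",
"stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)\n",
"stage1, stage2, stage3 = backbone.layer1, backbone.layer2, backbone.layer3\n",
"stem(torch.randn(1, 3, 224, 224)).shape  # expected: torch.Size([1, 64, 56, 56])"
],
"metadata": {},
"execution_count": null,
"outputs": []
},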
180 | {
181 | "cell_type": "code",
182 | "source": [
183 | "class DecoderBlock(nn.Module):\n",
184 | " def __init__(self, in_channels, mid_channels, out_channels):\n",
185 | " super().__init__()\n",
186 | " # YOUR CODE HERE\n",
187 | "\n",
188 | " def forward(self,x):\n",
189 | " # YOUR CODE HERE\n",
190 | " return\n",
191 | "\n",
192 | "class Unet(nn.Module):\n",
193 | " def __init__(self):\n",
194 | " super().__init__()\n",
195 | " # YOUR CODE HERE\n",
196 | " # encoder blocks\n",
197 | " self.encoder1=\n",
198 | " self.encoder2=\n",
199 | " self.encoder3=\n",
200 | " # decoder blocks\n",
201 | " self.decoder1=\n",
202 | " self.decoder2=\n",
203 | " self.decoder3=\n",
204 | "\n",
205 | "\n",
206 | " def forward(self,x):\n",
207 | " # YOUR CODE HERE\n",
208 | " return"
209 | ],
210 | "metadata": {
211 | "id": "_Elr1Uw3uITD"
212 | },
213 | "execution_count": null,
214 | "outputs": []
215 | },
216 | {
217 | "cell_type": "markdown",
218 | "source": [
219 | "### Train script (0.5 point)\n",
220 | "\n",
221 | "Complete the train and predict scripts."
222 | ],
223 | "metadata": {
224 | "id": "7Sq4WwZsuMeD"
225 | }
226 | },
227 | {
228 | "cell_type": "code",
229 | "execution_count": null,
230 | "metadata": {
231 | "id": "d_ha44iifROE"
232 | },
233 | "outputs": [],
234 | "source": [
235 | "def train_segmentation_model(data_path):\n",
236 | " BATCH_SIZE = 8\n",
237 | " N_EPOCH = 15\n",
238 | " DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
239 | "\n",
240 | " train_dataset = BirdsDataset(data_path + 'train')\n",
241 | " val_dataset = BirdsDataset(data_path + 'val')\n",
242 | " train_dataloader = DataLoader(train_dataset, batch_size=BATCH_SIZE, shuffle=True)\n",
243 | " val_dataloader = DataLoader(val_dataset, batch_size=BATCH_SIZE, shuffle=False)\n",
244 | "\n",
245 | " model = Unet().to(DEVICE)\n",
246 | " optimizer = # YOUR CODE HERE\n",
247 | " criterion = # YOUR CODE HERE\n",
248 | " losses_train, losses_val, ious_train, ious_val = [], [], [], []\n",
249 | "\n",
250 | " for epoch in range(N_EPOCH):\n",
251 | " model.train()\n",
252 | "\n",
253 | " for tqdm(inputs, masks) in train_dataloader:\n",
254 | " inputs = inputs.to(DEVICE)\n",
255 | " masks = masks.to(DEVICE)\n",
256 | " # YOUR CODE HERE\n",
257 | " losses_train.append(...)\n",
258 | " ious_train.append(...)\n",
259 | "\n",
260 | " model.eval()\n",
261 | " with torch.no_grad():\n",
262 | " for inputs, masks in tqdm(val_dataloader):\n",
263 | " inputs = inputs.to(DEVICE)\n",
264 | " masks = masks.to(DEVICE)\n",
265 | " # YOUR CODE HERE\n",
266 | " losses_val.append(...)\n",
267 | " ious_val.append(...)\n",
268 | "\n",
269 | " torch.save(model.state_dict(), f'model_{epoch}.pth')\n",
270 | "\n",
271 | " print(f\"Epoch: {epoch}, train loss: {losses_train[-1]}, val loss: {losses_val[-1]}, train iou: {ious_train[-1]}, val iou: {ious_val[-1]}\")"
272 | ]
273 | },
274 | {
275 | "cell_type": "code",
276 | "source": [
277 | "def predict(model, img_path):\n",
278 | " with torch.no_grad():\n",
279 | " # YOUR CODE HERE TO PREPARE IMAGE\n",
280 | " # GET PREDICTIONS\n",
281 | " # POST PROCESS\n",
282 | " return segm\n",
283 | "\n",
284 | "def get_model(path):\n",
285 | " model = Unet()\n",
286 | " model.load_state_dict(torch.load(path))\n",
287 | " model.eval()\n",
288 | " return model"
289 | ],
290 | "metadata": {
291 | "id": "96EkIQmutpdS"
292 | },
293 | "execution_count": null,
294 | "outputs": []
295 | },
296 | {
297 | "cell_type": "code",
298 | "execution_count": null,
299 | "metadata": {
300 | "id": "LzZS9Z2jfROF"
301 | },
302 | "outputs": [],
303 | "source": [
304 | "train_segmentation_model('data/')"
305 | ]
306 | },
307 | {
308 | "cell_type": "markdown",
309 | "source": [
310 | "You can also experiment with models and write a small report about results. If the report will be meaningful, you will receive an extra point."
311 | ],
312 | "metadata": {
313 | "id": "MWKD09whySKA"
314 | }
315 | },
316 | {
317 | "cell_type": "markdown",
318 | "source": [
319 | "### Testing (8 points)\n",
320 | "Your model will be tested on the new data, similar to validation, so use techniques to prevent overfitting the model.\n",
321 | "\n",
322 | "* IoU > 0.85 — 8 points\n",
323 | "* IoU > 0.80 — 7 points\n",
324 | "* IoU > 0.75 — 6 points\n",
325 | "* IoU > 0.70 — 5 points\n",
326 | "* IoU > 0.60 — 4 points\n",
327 | "* IoU > 0.50 — 3 points\n",
328 | "* IoU > 0.40 — 2 points\n",
329 | "* IoU > 0.30 — 1 points"
330 | ],
331 | "metadata": {
332 | "id": "zCHacSHutHo4"
333 | }
334 | },
335 | {
336 | "cell_type": "code",
337 | "source": [
338 | "model = get_model('model_14.pth').to('cuda')"
339 | ],
340 | "metadata": {
341 | "id": "DZ6h11Q0tUHN"
342 | },
343 | "execution_count": null,
344 | "outputs": []
345 | },
346 | {
347 | "cell_type": "code",
348 | "execution_count": null,
349 | "metadata": {
350 | "id": "yV9zadusfROF"
351 | },
352 | "outputs": [],
353 | "source": [
354 | "ious, times = [], []\n",
355 | "test_dir = 'data/val/'\n",
356 | "\n",
357 | "for class_name in tqdm(sorted(os.listdir(os.path.join(test_dir, 'images')))):\n",
358 | " for img_name in sorted(os.listdir(os.path.join(test_dir, 'images', class_name))):\n",
359 | "\n",
360 | " t_start = time()\n",
361 | " pred = predict(model, os.path.join(test_dir, 'images', class_name, img_name))\n",
362 | " times.append(time() - t_start)\n",
363 | "\n",
364 | " gt_name = img_name.replace('jpg', 'png')\n",
365 | " gt = np.asarray(Image.open(os.path.join(test_dir, 'gt', class_name, gt_name)), dtype = np.uint8)\n",
366 | " if len(gt.shape) > 2:\n",
367 | " gt = gt[:, :, 0]\n",
368 | "\n",
369 | " iou = get_iou(gt==255, pred>0.5)\n",
370 | " ious.append(iou)\n",
371 | "\n",
372 | "np.mean(ious), np.mean(times)"
373 | ]
374 | },
375 | {
376 | "cell_type": "markdown",
377 | "source": [
378 | "### Compression (1 point)"
379 | ],
380 | "metadata": {
381 | "id": "47KgrqdpvKWS"
382 | }
383 | },
384 | {
385 | "cell_type": "markdown",
386 | "source": [
387 | "Try to speed up the model in any way without losing more than 1% in iou score.\n",
388 | "For example [torch2trt](https://github.com/NVIDIA-AI-IOT/torch2trt)"
389 | ],
390 | "metadata": {
391 | "id": "4kJiLB__vTC3"
392 | }
393 | },
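{
"cell_type": "code",
"source": [
"# One simple speed-up sketch in the spirit of the note above (an assumption,\n",
"# not the required approach): run inference under automatic mixed precision,\n",
"# which usually cuts GPU latency with a negligible change in IoU. The helper\n",
"# name amp_predict and the batch argument are illustrative; a CUDA device is assumed.\n",
"def amp_predict(model, batch):\n",
"    with torch.no_grad(), torch.autocast(device_type='cuda', dtype=torch.float16):\n",
"        return model(batch)"
],
"metadata": {},
"execution_count": null,
"outputs": []
},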
394 | {
395 | "cell_type": "code",
396 | "source": [
397 | "def get_fast_model():\n",
398 | " # YOUR CODE HERE\n",
399 | " return model"
400 | ],
401 | "metadata": {
402 | "id": "UQyNHbt0vtMu"
403 | },
404 | "execution_count": null,
405 | "outputs": []
406 | },
407 | {
408 | "cell_type": "code",
409 | "source": [
410 | "fast_model = get_fast_model().to('cuda')"
411 | ],
412 | "metadata": {
413 | "id": "f2DedST0v6aF"
414 | },
415 | "execution_count": null,
416 | "outputs": []
417 | },
418 | {
419 | "cell_type": "code",
420 | "source": [
421 | "ious, times = [], []\n",
422 | "test_dir = 'data/val/'\n",
423 | "\n",
424 | "for class_name in tqdm(sorted(os.listdir(os.path.join(test_dir, 'images')))):\n",
425 | " for img_name in sorted(os.listdir(os.path.join(test_dir, 'images', class_name))):\n",
426 | "\n",
427 | " t_start = time()\n",
428 | " pred = predict(fast_model, os.path.join(test_dir, 'images', class_name, img_name))\n",
429 | " times.append(time() - t_start)\n",
430 | "\n",
431 | " gt_name = img_name.replace('jpg', 'png')\n",
432 | " gt = np.asarray(Image.open(os.path.join(test_dir, 'gt', class_name, gt_name)), dtype = np.uint8)\n",
433 | " if len(gt.shape) > 2:\n",
434 | " gt = gt[:, :, 0]\n",
435 | "\n",
436 | " iou = get_iou(gt==255, pred>0.5)\n",
437 | " ious.append(iou)\n",
438 | "\n",
439 | "np.mean(ious), np.mean(times)"
440 | ],
441 | "metadata": {
442 | "id": "ryWUekS2vlv8"
443 | },
444 | "execution_count": null,
445 | "outputs": []
446 | },
447 | {
448 | "cell_type": "markdown",
449 | "source": [
450 | "**Bonus:** For the best iou score on test(without compression) in group you will get 1.5, 1, 0.5 extra points(for 1st, 2nd, 3rd places)."
451 | ],
452 | "metadata": {
453 | "id": "QCdMgBoOwXAb"
454 | }
455 | },
456 | {
457 | "cell_type": "code",
458 | "source": [],
459 | "metadata": {
460 | "id": "daanikNkwo5t"
461 | },
462 | "execution_count": null,
463 | "outputs": []
464 | }
465 | ],
466 | "metadata": {
467 | "kernelspec": {
468 | "display_name": "Python 3",
469 | "name": "python3"
470 | },
471 | "language_info": {
472 | "codemirror_mode": {
473 | "name": "ipython",
474 | "version": 3
475 | },
476 | "file_extension": ".py",
477 | "mimetype": "text/x-python",
478 | "name": "python",
479 | "nbconvert_exporter": "python",
480 | "pygments_lexer": "ipython3",
481 | "version": "3.12.2"
482 | },
483 | "colab": {
484 | "provenance": [],
485 | "gpuType": "T4"
486 | },
487 | "accelerator": "GPU"
488 | },
489 | "nbformat": 4,
490 | "nbformat_minor": 0
491 | }
--------------------------------------------------------------------------------
/week01_intro/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week01_intro/lecture.pdf
--------------------------------------------------------------------------------
/week02_init_regularization/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week02_init_regularization/lecture.pdf
--------------------------------------------------------------------------------
/week03_conv/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week03_conv/lecture.pdf
--------------------------------------------------------------------------------
/week04_tricks/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week04_tricks/lecture.pdf
--------------------------------------------------------------------------------
/week05_segmentation/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week05_segmentation/lecture.pdf
--------------------------------------------------------------------------------
/week06_detection/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week06_detection/lecture.pdf
--------------------------------------------------------------------------------
/week07_word_embeddings/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week07_word_embeddings/lecture.pdf
--------------------------------------------------------------------------------
/week07_word_embeddings/seminar.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {
6 | "id": "hchgr-3Nmn7o"
7 | },
8 | "source": [
9 | "## Seminar 1: Fun with Word Embeddings (3 points)\n",
10 | "\n",
11 | "Today we gonna play with word embeddings: train our own little embedding, load one from gensim model zoo and use it to visualize text corpora.\n",
12 | "\n",
13 | "This whole thing is gonna happen on top of embedding dataset.\n",
14 | "\n",
15 | "__Requirements:__ `pip install --upgrade nltk gensim bokeh` , but only if you're running locally."
16 | ]
17 | },
18 | {
19 | "cell_type": "code",
20 | "execution_count": null,
21 | "metadata": {
22 | "collapsed": true,
23 | "id": "QmUCK9lVmn7q"
24 | },
25 | "outputs": [],
26 | "source": [
27 | "# download the data:\n",
28 | "!wget https://www.dropbox.com/s/obaitrix9jyu84r/quora.txt?dl=1 -O ./quora.txt\n",
29 | "# alternative download link: https://yadi.sk/i/BPQrUu1NaTduEw"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": null,
35 | "metadata": {
36 | "scrolled": false,
37 | "id": "YyzusR4Lmn7r"
38 | },
39 | "outputs": [],
40 | "source": [
41 | "import numpy as np\n",
42 | "\n",
43 | "data = list(open(\"./quora.txt\", encoding=\"utf-8\"))\n",
44 | "data[50]"
45 | ]
46 | },
47 | {
48 | "cell_type": "markdown",
49 | "metadata": {
50 | "id": "jOTmojdtmn7r"
51 | },
52 | "source": [
53 | "__Tokenization:__ a typical first step for an nlp task is to split raw data into words.\n",
54 | "The text we're working with is in raw format: with all the punctuation and smiles attached to some words, so a simple str.split won't do.\n",
55 | "\n",
56 | "Let's use __`nltk`__ - a library that handles many nlp tasks like tokenization, stemming or part-of-speech tagging."
57 | ]
58 | },
59 | {
60 | "cell_type": "code",
61 | "execution_count": null,
62 | "metadata": {
63 | "id": "Jya8V2Skmn7r"
64 | },
65 | "outputs": [],
66 | "source": [
67 | "from nltk.tokenize import WordPunctTokenizer\n",
68 | "tokenizer = WordPunctTokenizer()\n",
69 | "\n",
70 | "print(tokenizer.tokenize(data[50]))"
71 | ]
72 | },
73 | {
74 | "cell_type": "code",
75 | "execution_count": null,
76 | "metadata": {
77 | "collapsed": true,
78 | "id": "kitrV92Amn7r"
79 | },
80 | "outputs": [],
81 | "source": [
82 | "# TASK: lowercase everything and extract tokens with tokenizer.\n",
83 | "# data_tok should be a list of lists of tokens for each line in data.\n",
84 | "\n",
85 | "data_tok = # YOUR CODE"
86 | ]
87 | },
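A minimal sketch of one way to fill in `data_tok` (one of several valid solutions; it assumes the `tokenizer` defined above):

```python
# hedged sketch: lowercase each line, then split it into tokens with WordPunctTokenizer
data_tok = [tokenizer.tokenize(line.lower()) for line in data]
```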
88 | {
89 | "cell_type": "code",
90 | "execution_count": null,
91 | "metadata": {
92 | "collapsed": true,
93 | "id": "bD7uAQzgmn7r"
94 | },
95 | "outputs": [],
96 | "source": [
97 | "assert all(isinstance(row, (list, tuple)) for row in data_tok), \"please convert each line into a list of tokens (strings)\"\n",
98 | "assert all(all(isinstance(tok, str) for tok in row) for row in data_tok), \"please convert each line into a list of tokens (strings)\"\n",
99 | "is_latin = lambda tok: all('a' <= x.lower() <= 'z' for x in tok)\n",
100 | "assert all(map(lambda l: not is_latin(l) or l.islower(), map(' '.join, data_tok))), \"please make sure to lowercase the data\""
101 | ]
102 | },
103 | {
104 | "cell_type": "code",
105 | "execution_count": null,
106 | "metadata": {
107 | "id": "sm2nO5yzmn7s"
108 | },
109 | "outputs": [],
110 | "source": [
111 | "print([' '.join(row) for row in data_tok[:2]])"
112 | ]
113 | },
114 | {
115 | "cell_type": "markdown",
116 | "metadata": {
117 | "id": "RloDQkKSmn7s"
118 | },
119 | "source": [
120 | "__Word vectors:__ as the saying goes, there's more than one way to train word embeddings. There's Word2Vec and GloVe with different objective functions. Then there's fasttext that uses character-level models to train word embeddings.\n",
121 | "\n",
122 | "The choice is huge, so let's start someplace small: __gensim__ is another nlp library that features many vector-based models incuding word2vec."
123 | ]
124 | },
125 | {
126 | "cell_type": "code",
127 | "execution_count": null,
128 | "metadata": {
129 | "collapsed": true,
130 | "id": "HT6ie7OWmn7s"
131 | },
132 | "outputs": [],
133 | "source": [
134 | "from gensim.models import Word2Vec\n",
135 | "model = Word2Vec(data_tok,\n",
136 | " vector_size=32, # embedding vector size\n",
137 | " min_count=5, # consider words that occured at least 5 times\n",
138 | " window=5).wv # define context as a 5-word window around the target word"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": null,
144 | "metadata": {
145 | "id": "_utr_4ZEmn7s"
146 | },
147 | "outputs": [],
148 | "source": [
149 | "# now you can get word vectors !\n",
150 | "model.get_vector('anything')"
151 | ]
152 | },
153 | {
154 | "cell_type": "code",
155 | "execution_count": null,
156 | "metadata": {
157 | "id": "x7X2rBLImn7s"
158 | },
159 | "outputs": [],
160 | "source": [
161 | "# or query similar words directly. Go play with it!\n",
162 | "model.most_similar('bread')"
163 | ]
164 | },
165 | {
166 | "cell_type": "markdown",
167 | "metadata": {
168 | "id": "varM16R3mn7t"
169 | },
170 | "source": [
171 | "### Using pre-trained model\n",
172 | "\n",
173 | "Took it a while, huh? Now imagine training life-sized (100~300D) word embeddings on gigabytes of text: wikipedia articles or twitter posts.\n",
174 | "\n",
175 | "Thankfully, nowadays you can get a pre-trained word embedding model in 2 lines of code (no sms required, promise)."
176 | ]
177 | },
178 | {
179 | "cell_type": "code",
180 | "execution_count": null,
181 | "metadata": {
182 | "collapsed": true,
183 | "id": "oeiEoLrUmn7t"
184 | },
185 | "outputs": [],
186 | "source": [
187 | "import gensim.downloader as api\n",
188 | "model = api.load('glove-twitter-100')"
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": null,
194 | "metadata": {
195 | "id": "ysNoDw7Umn7t"
196 | },
197 | "outputs": [],
198 | "source": [
199 | "model.most_similar(positive=[\"coder\", \"money\"], negative=[\"brain\"])"
200 | ]
201 | },
202 | {
203 | "cell_type": "markdown",
204 | "metadata": {
205 | "id": "_Kde3hgNmn7t"
206 | },
207 | "source": [
208 | "### Visualizing word vectors\n",
209 | "\n",
210 | "One way to see if our vectors are any good is to plot them. Thing is, those vectors are in 30D+ space and we humans are more used to 2-3D.\n",
211 | "\n",
212 | "Luckily, we machine learners know about __dimensionality reduction__ methods.\n",
213 | "\n",
214 | "Let's use that to plot 1000 most frequent words"
215 | ]
216 | },
217 | {
218 | "cell_type": "code",
219 | "execution_count": null,
220 | "metadata": {
221 | "id": "l0yKTqYymn7t"
222 | },
223 | "outputs": [],
224 | "source": [
225 | "words = model.index_to_key[:1000]\n",
226 | "\n",
227 | "print(words[::100])"
228 | ]
229 | },
230 | {
231 | "cell_type": "code",
232 | "execution_count": null,
233 | "metadata": {
234 | "id": "rLxEBnscmn7t"
235 | },
236 | "outputs": [],
237 | "source": [
238 | "# for each word, compute it's vector with model\n",
239 | "word_vectors = # YOUR CODE"
240 | ]
241 | },
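A minimal sketch for `word_vectors`, assuming the pre-trained `model` and the `words` list from the cells above:

```python
# hedged sketch: stack one 100-d GloVe vector per word
word_vectors = np.array([model.get_vector(word) for word in words])
```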
242 | {
243 | "cell_type": "code",
244 | "execution_count": null,
245 | "metadata": {
246 | "collapsed": true,
247 | "id": "lZ06vHSJmn7t"
248 | },
249 | "outputs": [],
250 | "source": [
251 | "assert isinstance(word_vectors, np.ndarray)\n",
252 | "assert word_vectors.shape == (len(words), 100)\n",
253 | "assert np.isfinite(word_vectors).all()"
254 | ]
255 | },
256 | {
257 | "cell_type": "markdown",
258 | "metadata": {
259 | "id": "S2wMcn29mn7t"
260 | },
261 | "source": [
262 | "#### Linear projection: PCA\n",
263 | "\n",
264 | "The simplest linear dimensionality reduction method is __P__rincipial __C__omponent __A__nalysis.\n",
265 | "\n",
266 | "In geometric terms, PCA tries to find axes along which most of the variance occurs. The \"natural\" axes, if you wish.\n",
267 | "\n",
268 | "\n",
269 | "\n",
270 | "\n",
271 | "Under the hood, it attempts to decompose object-feature matrix $X$ into two smaller matrices: $W$ and $\\hat W$ minimizing _mean squared error_:\n",
272 | "\n",
273 | "$$\\|(X W) \\hat{W} - X\\|^2_2 \\to_{W, \\hat{W}} \\min$$\n",
274 | "- $X \\in \\mathbb{R}^{n \\times m}$ - object matrix (**centered**);\n",
275 | "- $W \\in \\mathbb{R}^{m \\times d}$ - matrix of direct transformation;\n",
276 | "- $\\hat{W} \\in \\mathbb{R}^{d \\times m}$ - matrix of reverse transformation;\n",
277 | "- $n$ samples, $m$ original dimensions and $d$ target dimensions;\n",
278 | "\n"
279 | ]
280 | },
281 | {
282 | "cell_type": "code",
283 | "execution_count": null,
284 | "metadata": {
285 | "collapsed": true,
286 | "id": "USPP-k-Imn7t"
287 | },
288 | "outputs": [],
289 | "source": [
290 | "from sklearn.decomposition import PCA\n",
291 | "\n",
292 | "# map word vectors onto 2d plane with PCA. Use good old sklearn api (fit, transform)\n",
293 | "# after that, normalize vectors to make sure they have zero mean and unit variance\n",
294 | "word_vectors_pca = # YOUR CODE\n",
295 | "\n",
296 | "# and maybe MORE OF YOUR CODE here :)"
297 | ]
298 | },
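A minimal sketch for the PCA step; `StandardScaler` is just one way to get zero mean and unit variance:

```python
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# hedged sketch: project the word vectors onto 2 principal components, then standardize
word_vectors_pca = PCA(n_components=2).fit_transform(word_vectors)
word_vectors_pca = StandardScaler().fit_transform(word_vectors_pca)
```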
299 | {
300 | "cell_type": "code",
301 | "execution_count": null,
302 | "metadata": {
303 | "collapsed": true,
304 | "id": "NV_x7D4omn7t"
305 | },
306 | "outputs": [],
307 | "source": [
308 | "assert word_vectors_pca.shape == (len(word_vectors), 2), \"there must be a 2d vector for each word\"\n",
309 | "assert max(abs(word_vectors_pca.mean(0))) < 1e-5, \"points must be zero-centered\""
310 | ]
311 | },
312 | {
313 | "cell_type": "markdown",
314 | "metadata": {
315 | "id": "VnybG7wHmn7t"
316 | },
317 | "source": [
318 | "#### Let's draw it!"
319 | ]
320 | },
321 | {
322 | "cell_type": "code",
323 | "execution_count": null,
324 | "metadata": {
325 | "id": "jo2-yN80mn7t"
326 | },
327 | "outputs": [],
328 | "source": [
329 | "import bokeh.models as bm, bokeh.plotting as pl\n",
330 | "from bokeh.io import output_notebook\n",
331 | "output_notebook()\n",
332 | "\n",
333 | "def draw_vectors(x, y, radius=10, alpha=0.25, color='blue',\n",
334 | " width=600, height=400, show=True, **kwargs):\n",
335 | " \"\"\" draws an interactive plot for data points with auxilirary info on hover \"\"\"\n",
336 | " if isinstance(color, str): color = [color] * len(x)\n",
337 | " data_source = bm.ColumnDataSource({ 'x' : x, 'y' : y, 'color': color, **kwargs })\n",
338 | "\n",
339 | " fig = pl.figure(active_scroll='wheel_zoom', width=width, height=height)\n",
340 | " fig.scatter('x', 'y', size=radius, color='color', alpha=alpha, source=data_source)\n",
341 | "\n",
342 | " fig.add_tools(bm.HoverTool(tooltips=[(key, \"@\" + key) for key in kwargs.keys()]))\n",
343 | " if show: pl.show(fig)\n",
344 | " return fig"
345 | ]
346 | },
347 | {
348 | "cell_type": "code",
349 | "execution_count": null,
350 | "metadata": {
351 | "id": "6J1c7Q9bmn7t"
352 | },
353 | "outputs": [],
354 | "source": [
355 | "draw_vectors(word_vectors_pca[:, 0], word_vectors_pca[:, 1], token=words)\n",
356 | "\n",
357 | "# hover a mouse over there and see if you can identify the clusters"
358 | ]
359 | },
360 | {
361 | "cell_type": "markdown",
362 | "metadata": {
363 | "id": "u9qhJAptmn7t"
364 | },
365 | "source": [
366 | "### Visualizing neighbors with t-SNE\n",
367 | "PCA is nice but it's strictly linear and thus only able to capture coarse high-level structure of the data.\n",
368 | "\n",
369 | "If we instead want to focus on keeping neighboring points near, we could use TSNE, which is itself an embedding method. Here you can read __[more on TSNE](https://distill.pub/2016/misread-tsne/)__."
370 | ]
371 | },
372 | {
373 | "cell_type": "code",
374 | "execution_count": null,
375 | "metadata": {
376 | "id": "UeQ2ixkHmn7t"
377 | },
378 | "outputs": [],
379 | "source": [
380 | "from sklearn.manifold import TSNE\n",
381 | "\n",
382 | "# map word vectors onto 2d plane with TSNE. hint: don't panic it may take a minute or two to fit.\n",
383 | "# normalize them as just lke with pca\n",
384 | "\n",
385 | "\n",
386 | "word_tsne = #YOUR CODE"
387 | ]
388 | },
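A minimal sketch for the t-SNE step, normalized the same way as the PCA projection above:

```python
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# hedged sketch: 2-d t-SNE embedding of the word vectors (may take a minute or two)
word_tsne = TSNE(n_components=2).fit_transform(word_vectors)
word_tsne = StandardScaler().fit_transform(word_tsne)
```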
389 | {
390 | "cell_type": "code",
391 | "execution_count": null,
392 | "metadata": {
393 | "collapsed": true,
394 | "scrolled": false,
395 | "id": "I5sA7faVmn7t"
396 | },
397 | "outputs": [],
398 | "source": [
399 | "draw_vectors(word_tsne[:, 0], word_tsne[:, 1], color='green', token=words)"
400 | ]
401 | },
402 | {
403 | "cell_type": "markdown",
404 | "metadata": {
405 | "id": "-j4S1Bwbmn7u"
406 | },
407 | "source": [
408 | "### Visualizing phrases\n",
409 | "\n",
410 | "Word embeddings can also be used to represent short phrases. The simplest way is to take __an average__ of vectors for all tokens in the phrase with some weights.\n",
411 | "\n",
412 | "This trick is useful to identify what data are you working with: find if there are any outliers, clusters or other artefacts.\n",
413 | "\n",
414 | "Let's try this new hammer on our data!\n"
415 | ]
416 | },
417 | {
418 | "cell_type": "code",
419 | "execution_count": null,
420 | "metadata": {
421 | "collapsed": true,
422 | "id": "zWEBCqxQmn7u"
423 | },
424 | "outputs": [],
425 | "source": [
426 | "def get_phrase_embedding(phrase):\n",
427 | " \"\"\"\n",
428 | " Convert phrase to a vector by aggregating it's word embeddings. See description above.\n",
429 | " \"\"\"\n",
430 | " # 1. lowercase phrase\n",
431 | " # 2. tokenize phrase\n",
432 | " # 3. average word vectors for all words in tokenized phrase\n",
433 | " # skip words that are not in model's vocabulary\n",
434 | " # if all words are missing from vocabulary, return zeros\n",
435 | "\n",
436 | " vector = np.zeros([model.vector_size], dtype='float32')\n",
437 | "\n",
438 | " # YOUR CODE\n",
439 | "\n",
440 | " return vector\n",
441 | "\n"
442 | ]
443 | },
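A minimal sketch of the aggregation described above (unweighted average of in-vocabulary tokens; it assumes the `tokenizer` and `model` defined earlier):

```python
def get_phrase_embedding(phrase):
    """Average the word vectors of all in-vocabulary tokens; zeros if none are known."""
    vector = np.zeros([model.vector_size], dtype='float32')
    tokens = tokenizer.tokenize(phrase.lower())
    known = [model.get_vector(tok) for tok in tokens if tok in model.key_to_index]
    if known:
        vector = np.mean(known, axis=0)
    return vector
```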
444 | {
445 | "cell_type": "code",
446 | "execution_count": null,
447 | "metadata": {
448 | "collapsed": true,
449 | "id": "Upwk1fsNmn7u"
450 | },
451 | "outputs": [],
452 | "source": [
453 | "vector = get_phrase_embedding(\"I'm very sure. This never happened to me before...\")\n",
454 | "\n",
455 | "assert np.allclose(vector[::10],\n",
456 | " np.array([ 0.31807372, -0.02558171, 0.0933293 , -0.1002182 , -1.0278689 ,\n",
457 | " -0.16621883, 0.05083408, 0.17989802, 1.3701859 , 0.08655966],\n",
458 | " dtype=np.float32))"
459 | ]
460 | },
461 | {
462 | "cell_type": "code",
463 | "execution_count": null,
464 | "metadata": {
465 | "collapsed": true,
466 | "id": "e1gQrVSVmn7u"
467 | },
468 | "outputs": [],
469 | "source": [
470 | "# let's only consider ~5k phrases for a first run.\n",
471 | "chosen_phrases = data[::len(data) // 1000]\n",
472 | "\n",
473 | "# compute vectors for chosen phrases\n",
474 | "phrase_vectors = # YOUR CODE"
475 | ]
476 | },
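A minimal sketch for `phrase_vectors`, reusing the function above:

```python
# hedged sketch: one embedding row per chosen phrase
phrase_vectors = np.array([get_phrase_embedding(phrase) for phrase in chosen_phrases])
```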
477 | {
478 | "cell_type": "code",
479 | "execution_count": null,
480 | "metadata": {
481 | "collapsed": true,
482 | "id": "pWXfU6rTmn7u"
483 | },
484 | "outputs": [],
485 | "source": [
486 | "assert isinstance(phrase_vectors, np.ndarray) and np.isfinite(phrase_vectors).all()\n",
487 | "assert phrase_vectors.shape == (len(chosen_phrases), model.vector_size)"
488 | ]
489 | },
490 | {
491 | "cell_type": "code",
492 | "execution_count": null,
493 | "metadata": {
494 | "collapsed": true,
495 | "id": "g8P1tU0omn7u"
496 | },
497 | "outputs": [],
498 | "source": [
499 | "# map vectors into 2d space with pca, tsne or your other method of choice\n",
500 | "# don't forget to normalize\n",
501 | "\n",
502 | "phrase_vectors_2d = TSNE().fit_transform(phrase_vectors)\n",
503 | "\n",
504 | "phrase_vectors_2d = (phrase_vectors_2d - phrase_vectors_2d.mean(axis=0)) / phrase_vectors_2d.std(axis=0)"
505 | ]
506 | },
507 | {
508 | "cell_type": "code",
509 | "execution_count": null,
510 | "metadata": {
511 | "collapsed": true,
512 | "id": "N_zCSz5Zmn7u"
513 | },
514 | "outputs": [],
515 | "source": [
516 | "draw_vectors(phrase_vectors_2d[:, 0], phrase_vectors_2d[:, 1],\n",
517 | " phrase=[phrase[:50] for phrase in chosen_phrases],\n",
518 | " radius=20,)"
519 | ]
520 | },
521 | {
522 | "cell_type": "markdown",
523 | "metadata": {
524 | "id": "ML_oG0Nlmn7u"
525 | },
526 | "source": [
527 | "Finally, let's build a simple \"similar question\" engine with phrase embeddings we've built."
528 | ]
529 | },
530 | {
531 | "cell_type": "code",
532 | "execution_count": null,
533 | "metadata": {
534 | "collapsed": true,
535 | "id": "tfp3TEFpmn7u"
536 | },
537 | "outputs": [],
538 | "source": [
539 | "# compute vector embedding for all lines in data\n",
540 | "data_vectors = np.array([get_phrase_embedding(l) for l in data])"
541 | ]
542 | },
543 | {
544 | "cell_type": "code",
545 | "execution_count": null,
546 | "metadata": {
547 | "collapsed": true,
548 | "id": "-F9ozB8umn7u"
549 | },
550 | "outputs": [],
551 | "source": [
552 | "def find_nearest(query, k=10):\n",
553 | " \"\"\"\n",
554 | " given text line (query), return k most similar lines from data, sorted from most to least similar\n",
555 | " similarity should be measured as cosine between query and line embedding vectors\n",
556 | " hint: it's okay to use global variables: data and data_vectors. see also: np.argpartition, np.argsort\n",
557 | " \"\"\"\n",
558 | " # YOUR CODE\n",
559 | "\n",
560 | " return "
561 | ]
562 | },
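A minimal sketch of `find_nearest` using plain cosine similarity and `np.argsort` (the `np.argpartition` hint is an optimization of the same idea):

```python
def find_nearest(query, k=10):
    """Return the k lines from data most cosine-similar to the query, best first."""
    query_vector = get_phrase_embedding(query)
    norms = np.linalg.norm(data_vectors, axis=1) * np.linalg.norm(query_vector) + 1e-9
    similarities = data_vectors @ query_vector / norms
    best = np.argsort(-similarities)[:k]
    return [data[i] for i in best]
```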
563 | {
564 | "cell_type": "code",
565 | "execution_count": null,
566 | "metadata": {
567 | "collapsed": true,
568 | "id": "9HKuytD-mn7u"
569 | },
570 | "outputs": [],
571 | "source": [
572 | "results = find_nearest(query=\"How do i enter the matrix?\", k=10)\n",
573 | "\n",
574 | "print(''.join(results))\n",
575 | "\n",
576 | "assert len(results) == 10 and isinstance(results[0], str)\n",
577 | "assert results[0] == 'How do I get to the dark web?\\n'\n",
578 | "assert results[3] == 'What can I do to save the world?\\n'"
579 | ]
580 | },
581 | {
582 | "cell_type": "code",
583 | "execution_count": null,
584 | "metadata": {
585 | "collapsed": true,
586 | "id": "YG1OIIQ3mn7y"
587 | },
588 | "outputs": [],
589 | "source": [
590 | "find_nearest(query=\"How does Trump?\", k=10)"
591 | ]
592 | },
593 | {
594 | "cell_type": "code",
595 | "execution_count": null,
596 | "metadata": {
597 | "collapsed": true,
598 | "id": "YZzWxmukmn7y"
599 | },
600 | "outputs": [],
601 | "source": [
602 | "find_nearest(query=\"Why don't i ask a question myself?\", k=10)"
603 | ]
604 | },
605 | {
606 | "cell_type": "markdown",
607 | "metadata": {
608 | "collapsed": true,
609 | "id": "Oj76TY5Ymn7y"
610 | },
611 | "source": [
612 | "__Now what?__\n",
613 | "* Try running TSNE on all data, not just 1000 phrases\n",
614 | "* See what other embeddings are there in the model zoo: `gensim.downloader.info()`\n",
615 | "* Take a look at [FastText](https://github.com/facebookresearch/fastText) embeddings\n",
616 | "* Optimize find_nearest with locality-sensitive hashing: use [nearpy](https://github.com/pixelogik/NearPy) or `sklearn.neighbors`."
617 | ]
618 | }
619 | ],
620 | "metadata": {
621 | "kernelspec": {
622 | "display_name": "Python 3",
623 | "language": "python",
624 | "name": "python3"
625 | },
626 | "language_info": {
627 | "codemirror_mode": {
628 | "name": "ipython",
629 | "version": 3
630 | },
631 | "file_extension": ".py",
632 | "mimetype": "text/x-python",
633 | "name": "python",
634 | "nbconvert_exporter": "python",
635 | "pygments_lexer": "ipython3",
636 | "version": "3.5.2"
637 | },
638 | "colab": {
639 | "provenance": []
640 | }
641 | },
642 | "nbformat": 4,
643 | "nbformat_minor": 0
644 | }
--------------------------------------------------------------------------------
/week08_text_classification/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week08_text_classification/lecture.pdf
--------------------------------------------------------------------------------
/week09_transformer/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week09_transformer/lecture.pdf
--------------------------------------------------------------------------------
/week09_transformer/seminar.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {
7 | "id": "zriTdjauH8iQ",
8 | "colab": {
9 | "base_uri": "https://localhost:8080/"
10 | },
11 | "outputId": "f21304a0-5eef-4ade-e088-948c5db9a171"
12 | },
13 | "outputs": [
14 | {
15 | "output_type": "stream",
16 | "name": "stdout",
17 | "text": [
18 | "\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/480.6 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[90m╺\u001b[0m \u001b[32m471.0/480.6 kB\u001b[0m \u001b[31m22.9 MB/s\u001b[0m eta \u001b[36m0:00:01\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m480.6/480.6 kB\u001b[0m \u001b[31m10.4 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
19 | "\u001b[?25h\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/84.0 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m84.0/84.0 kB\u001b[0m \u001b[31m5.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
20 | "\u001b[?25h\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/116.3 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m116.3/116.3 kB\u001b[0m \u001b[31m7.3 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
21 | "\u001b[?25h\u001b[?25l \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m0.0/179.3 kB\u001b[0m \u001b[31m?\u001b[0m eta \u001b[36m-:--:--\u001b[0m\r\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m179.3/179.3 kB\u001b[0m \u001b[31m12.9 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
22 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m134.8/134.8 kB\u001b[0m \u001b[31m10.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
23 | "\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m194.1/194.1 kB\u001b[0m \u001b[31m12.7 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
24 | "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
25 | "gcsfs 2024.10.0 requires fsspec==2024.10.0, but you have fsspec 2024.9.0 which is incompatible.\u001b[0m\u001b[31m\n",
26 | "\u001b[0m"
27 | ]
28 | }
29 | ],
30 | "source": [
31 | "!pip install transformers datasets evaluate -q\n",
32 | "import transformers"
33 | ]
34 | },
35 | {
36 | "cell_type": "markdown",
37 | "metadata": {
38 | "id": "xQiRPWWHlSgv"
39 | },
40 | "source": [
41 | "### Using pre-trained transformers (seminar is worth 2 points)\n",
42 | "_for fun and profit_\n",
43 | "\n",
44 | "There are many toolkits that let you access pre-trained transformer models, but the most powerful and convenient by far is [`huggingface/transformers`](https://github.com/huggingface/transformers). In this week's practice, you'll learn how to download, apply and modify pre-trained transformers for a range of tasks. Buckle up, we're going in!\n",
45 | "\n",
46 | "\n",
47 | "__Pipelines:__ if all you want is to apply a pre-trained model, you can do that in one line of code using pipeline. Huggingface/transformers has a selection of pre-configured pipelines for masked language modelling, sentiment classification, question aswering, etc. ([see full list here](https://huggingface.co/transformers/main_classes/pipelines.html))\n",
48 | "\n",
49 | "A typical pipeline includes:\n",
50 | "* pre-processing, e.g. tokenization, subword segmentation\n",
51 | "* a backbone model, e.g. bert finetuned for classification\n",
52 | "* output post-processing\n",
53 | "\n",
54 | "Let's see it in action:"
55 | ]
56 | },
57 | {
58 | "cell_type": "code",
59 | "execution_count": null,
60 | "metadata": {
61 | "id": "rP1KFtvLlJHR",
62 | "colab": {
63 | "base_uri": "https://localhost:8080/",
64 | "height": 284,
65 | "referenced_widgets": [
66 | "9fe16c621bd643318bc341864efb3e4d",
67 | "d930d2cef8ed4ef29db50c13b9103056",
68 | "e8bbd6a633f54344aced163ff020c6b9",
69 | "53fd63ef3e6940c3a1a7da0d26f5bf00",
70 | "05c022d131f1454e8d9674da8bba015d",
71 | "1b51b4f4463341c1acdb4c57dbf8130b",
72 | "9fa2a910690040aabcd2b05e449b0c45",
73 | "f0ff5f31afd4450ea185123a577df0cf",
74 | "be63ea3b752642b6bf7e74e7fdc87430",
75 | "8dffc51581f74d7c81c6bdb3feb7ae4f",
76 | "e9a828847dcf41b5bbc1eaa44fde2c95",
77 | "e4e616b460ef4b11b4c352d467f494dc",
78 | "7ce760b40b834494a1efa1b078985bff",
79 | "d6f3f9f5db014075bd4fbca506e943f8",
80 | "97716ed55a074ce0b52e68be926a09c3",
81 | "86551d955c684d43b7f51d52cec868ec",
82 | "368689f0e8974d7eae2fd497c996ba20",
83 | "826d46d71d35444a8123fffd107d1f1b",
84 | "82fb664d4d614a0f9bea5b297783f0d5",
85 | "28d6b5e50d4545559dd0d610b30c3b96",
86 | "33ee09dba6d5422896396a102f78dbba",
87 | "e0a71a1e15fc4070b3147560f0e7d694",
88 | "c57238fcb356435ea7ad8daf7761879a",
89 | "0e35c262e8d343f7b83016b8c61fd744",
90 | "9081142688d44540a28e4b08e4091270",
91 | "83fdaa28d4ae47fea15f4ec950f1825f",
92 | "493ff70e5b59414c8ff285fc50391b12",
93 | "f6eaee53284c427f99126187a1883081",
94 | "04c3cf767105473fb6756223ef2d0030",
95 | "7ffd07ee1d4d482d8a9f1e2d8ddb2772",
96 | "15898c9d06734d60b23ee92d03592c33",
97 | "92bef379c00f46e2b02ace047c50a9af",
98 | "9a0a067c17b64df2a1184986c2412b26",
99 | "668ed89f0cdd460498543af635f4dd68",
100 | "ae9729bae35748b9b757ab558d50bdc4",
101 | "31797fec525e43aa9f968b63e95b8aaa",
102 | "5ec34446ceeb4f37a9f9ca0e43103416",
103 | "f0d3166484384e15ae480e8d42504d52",
104 | "560e25624e6844c7b92f705b9a6d06f5",
105 | "521e0794f59a4d0c8a74858d8ca8802c",
106 | "d6ba19a6e2374b34bd7a954a0f29a95a",
107 | "d04c2e5c9a9d4dc59a4ef5d61f55faf7",
108 | "0b4f45ff04994accb494587a662a648a",
109 | "beb8dd797e774760b641d27ecde161bc"
110 | ]
111 | },
112 | "outputId": "fd2868b8-1a26-4873-da4d-011719825781"
113 | },
114 | "outputs": [
115 | {
116 | "output_type": "stream",
117 | "name": "stderr",
118 | "text": [
119 | "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_auth.py:94: UserWarning: \n",
120 | "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
121 | "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
122 | "You will be able to reuse this secret in all of your notebooks.\n",
123 | "Please note that authentication is recommended but still optional to access public models or datasets.\n",
124 | " warnings.warn(\n"
125 | ]
126 | },
127 | {
128 | "output_type": "display_data",
129 | "data": {
130 | "text/plain": [
131 | "config.json: 0%| | 0.00/629 [00:00, ?B/s]"
132 | ],
133 | "application/vnd.jupyter.widget-view+json": {
134 | "version_major": 2,
135 | "version_minor": 0,
136 | "model_id": "9fe16c621bd643318bc341864efb3e4d"
137 | }
138 | },
139 | "metadata": {}
140 | },
141 | {
142 | "output_type": "display_data",
143 | "data": {
144 | "text/plain": [
145 | "model.safetensors: 0%| | 0.00/268M [00:00, ?B/s]"
146 | ],
147 | "application/vnd.jupyter.widget-view+json": {
148 | "version_major": 2,
149 | "version_minor": 0,
150 | "model_id": "e4e616b460ef4b11b4c352d467f494dc"
151 | }
152 | },
153 | "metadata": {}
154 | },
155 | {
156 | "output_type": "display_data",
157 | "data": {
158 | "text/plain": [
159 | "tokenizer_config.json: 0%| | 0.00/48.0 [00:00, ?B/s]"
160 | ],
161 | "application/vnd.jupyter.widget-view+json": {
162 | "version_major": 2,
163 | "version_minor": 0,
164 | "model_id": "c57238fcb356435ea7ad8daf7761879a"
165 | }
166 | },
167 | "metadata": {}
168 | },
169 | {
170 | "output_type": "display_data",
171 | "data": {
172 | "text/plain": [
173 | "vocab.txt: 0%| | 0.00/232k [00:00, ?B/s]"
174 | ],
175 | "application/vnd.jupyter.widget-view+json": {
176 | "version_major": 2,
177 | "version_minor": 0,
178 | "model_id": "668ed89f0cdd460498543af635f4dd68"
179 | }
180 | },
181 | "metadata": {}
182 | },
183 | {
184 | "output_type": "stream",
185 | "name": "stderr",
186 | "text": [
187 | "Hardware accelerator e.g. GPU is available in the environment, but no `device` argument is passed to the `Pipeline` object. Model will be on CPU.\n"
188 | ]
189 | },
190 | {
191 | "output_type": "stream",
192 | "name": "stdout",
193 | "text": [
194 | "[{'label': 'POSITIVE', 'score': 0.9998860359191895}]\n"
195 | ]
196 | }
197 | ],
198 | "source": [
199 | "from transformers import pipeline\n",
200 | "classifier = pipeline('sentiment-analysis', model=\"distilbert-base-uncased-finetuned-sst-2-english\")\n",
201 | "\n",
202 | "print(classifier(\"BERT is amazing!\"))"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": null,
208 | "metadata": {
209 | "id": "nYUNuyXMn5l9"
210 | },
211 | "outputs": [],
212 | "source": [
213 | "import base64\n",
214 | "data = {\n",
215 | " 'arryn': 'As High as Honor.',\n",
216 | " 'baratheon': 'Ours is the fury.',\n",
217 | " 'stark': 'Winter is coming.',\n",
218 | " 'tyrell': 'Growing strong.'\n",
219 | "}\n",
220 | "\n",
221 | "# YOUR CODE: predict sentiment for each noble house and create outputs dict\n",
222 | "<...>\n",
223 | "outputs = \n",
224 | "\n",
225 | "assert sum(outputs.values()) == 3 and outputs[base64.decodebytes(b'YmFyYXRoZW9u\\n').decode()] == False\n",
226 | "print(\"Well done!\")"
227 | ]
228 | },
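A minimal sketch for the noble-houses task, assuming a boolean per house ("is the motto positive?") is what the assert below it expects:

```python
# hedged sketch: True iff the sentiment pipeline labels the motto POSITIVE
outputs = {house: classifier(motto)[0]['label'] == 'POSITIVE'
           for house, motto in data.items()}
```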
229 | {
230 | "cell_type": "markdown",
231 | "metadata": {
232 | "id": "BRDhIH-XpSNo"
233 | },
234 | "source": [
235 | "You can also access vanilla Masked Language Model that was trained to predict masked words. Here's how:"
236 | ]
237 | },
238 | {
239 | "cell_type": "code",
240 | "execution_count": null,
241 | "metadata": {
242 | "id": "pa-8noIllRbZ"
243 | },
244 | "outputs": [],
245 | "source": [
246 | "mlm_model = pipeline('fill-mask', model=\"bert-base-uncased\")\n",
247 | "MASK = mlm_model.tokenizer.mask_token\n",
248 | "\n",
249 | "for hypo in mlm_model(f\"Donald {MASK} is the president of the united states.\"):\n",
250 | " print(f\"P={hypo['score']:.5f}\", hypo['sequence'])"
251 | ]
252 | },
253 | {
254 | "cell_type": "code",
255 | "execution_count": null,
256 | "metadata": {
257 | "id": "9NxeG1Y5pwX1"
258 | },
259 | "outputs": [],
260 | "source": [
261 | "# Your turn: use bert to recall what year was the Soviet Union founded in\n",
262 | "mlm_model()"
263 | ]
264 | },
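A minimal sketch for the masked-LM task; the prompt wording is just one reasonable choice:

```python
# hedged sketch: let BERT fill in the founding year
for hypo in mlm_model(f"The Soviet Union was founded in {MASK}."):
    print(f"P={hypo['score']:.5f}", hypo['sequence'])
```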
265 | {
266 | "cell_type": "markdown",
267 | "metadata": {
268 | "id": "YJxRFzCSq903"
269 | },
270 | "source": [
271 | "```\n",
272 | "\n",
273 | "```\n",
274 | "\n",
275 | "```\n",
276 | "\n",
277 | "```\n",
278 | "\n",
279 | "\n",
280 | "Huggingface offers hundreds of pre-trained models that specialize on different tasks. You can quickly find the model you need using [this list](https://huggingface.co/models).\n"
281 | ]
282 | },
283 | {
284 | "cell_type": "code",
285 | "execution_count": null,
286 | "metadata": {
287 | "id": "HRux8Qp2hkXr"
288 | },
289 | "outputs": [],
290 | "source": [
291 | "text = \"\"\"Almost two-thirds of the 1.5 million people who viewed this liveblog had Googled to discover\n",
292 | " the latest on the Rosetta mission. They were treated to this detailed account by the Guardian’s science editor,\n",
293 | " Ian Sample, and astronomy writer Stuart Clark of the moment scientists landed a robotic spacecraft on a comet\n",
294 | " for the first time in history, and the delirious reaction it provoked at their headquarters in Germany.\n",
295 | " “We are there. We are sitting on the surface. Philae is talking to us,” said one scientist.\n",
296 | "\"\"\"\n",
297 | "\n",
298 | "# Task: create a pipeline for named entity recognition, use task name 'ner' and search for the right model in the list\n",
299 | "ner_model = \n",
300 | "\n",
301 | "named_entities = ner_model(text)"
302 | ]
303 | },
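A minimal sketch for the NER step; `dslim/bert-base-NER` is one checkpoint from the model list, not the only valid choice:

```python
# hedged sketch: token-classification pipeline with an off-the-shelf NER checkpoint
ner_model = pipeline('ner', model='dslim/bert-base-NER')
named_entities = ner_model(text)
```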
304 | {
305 | "cell_type": "code",
306 | "execution_count": null,
307 | "metadata": {
308 | "id": "hf57MRzSiSON"
309 | },
310 | "outputs": [],
311 | "source": [
312 | "print('OUTPUT:', named_entities)\n",
313 | "word_to_entity = {item['word']: item['entity'] for item in named_entities}\n",
314 | "assert 'org' in word_to_entity.get('Guardian').lower() and 'per' in word_to_entity.get('Stuart').lower()\n",
315 | "print(\"All tests passed\")"
316 | ]
317 | },
318 | {
319 | "cell_type": "markdown",
320 | "metadata": {
321 | "id": "ULMownz6sP9n"
322 | },
323 | "source": [
324 | "### The building blocks of a pipeline\n",
325 | "\n",
326 | "Huggingface also allows you to access its pipelines on a lower level. There are two main abstractions for you:\n",
327 | "* `Tokenizer` - converts from strings to token ids and back\n",
328 | "* `Model` - a pytorch `nn.Module` with pre-trained weights\n",
329 | "\n",
330 | "You can use such models as part of your regular pytorch code: insert is as a layer in your model, apply it to a batch of data, backpropagate, optimize, etc."
331 | ]
332 | },
333 | {
334 | "cell_type": "code",
335 | "execution_count": null,
336 | "metadata": {
337 | "id": "KMJbV0QVsO0Q"
338 | },
339 | "outputs": [],
340 | "source": [
341 | "import torch\n",
342 | "from transformers import AutoTokenizer, AutoModel, pipeline\n",
343 | "\n",
344 | "model_name = 'bert-base-uncased'\n",
345 | "tokenizer = AutoTokenizer.from_pretrained(model_name)\n",
346 | "model = AutoModel.from_pretrained(model_name)\n"
347 | ]
348 | },
349 | {
350 | "cell_type": "code",
351 | "execution_count": null,
352 | "metadata": {
353 | "id": "ZgSPHKPRxG6U"
354 | },
355 | "outputs": [],
356 | "source": [
357 | "lines = [\n",
358 | " \"Luke, I am your father.\",\n",
359 | " \"Life is what happens when you're busy making other plans.\",\n",
360 | " ]\n",
361 | "\n",
362 | "# tokenize a batch of inputs. \"pt\" means [p]y[t]orch tensors\n",
363 | "tokens_info = tokenizer(lines, padding=True, truncation=True, return_tensors=\"pt\")\n",
364 | "\n",
365 | "for key in tokens_info:\n",
366 | " print(key, tokens_info[key])\n",
367 | "\n",
368 | "print(\"Detokenized:\")\n",
369 | "for i in range(2):\n",
370 | " print(tokenizer.decode(tokens_info['input_ids'][i]))"
371 | ]
372 | },
373 | {
374 | "cell_type": "code",
375 | "execution_count": null,
376 | "metadata": {
377 | "id": "MJkbHxERyfL4"
378 | },
379 | "outputs": [],
380 | "source": [
381 | "# You can now apply the model to get embeddings\n",
382 | "with torch.no_grad():\n",
383 | " out = model(**tokens_info)\n",
384 | "\n",
385 | "print(out['pooler_output'])"
386 | ]
387 | },
388 | {
389 | "cell_type": "code",
390 | "execution_count": null,
391 | "metadata": {
392 | "id": "vWCajBGcAern"
393 | },
394 | "outputs": [],
395 | "source": [
396 | "import torch\n",
397 | "import numpy as np\n",
398 | "from transformers import GPT2Tokenizer, GPT2LMHeadModel\n",
399 | "\n",
400 | "device = 'cuda' if torch.cuda.is_available() else 'cpu'\n",
401 | "tokenizer = GPT2Tokenizer.from_pretrained('gpt2', add_prefix_space=True)\n",
402 | "model = GPT2LMHeadModel.from_pretrained('gpt2').train(False).to(device)\n",
403 | "\n",
404 | "text = \"The Fermi paradox \"\n",
405 | "tokens = tokenizer.encode(text)\n",
406 | "num_steps = 1024 - len(tokens) + 1\n",
407 | "line_length, max_length = 0, 70\n",
408 | "\n",
409 | "print(end=tokenizer.decode(tokens))\n",
410 | "\n",
411 | "for i in range(num_steps):\n",
412 | " with torch.no_grad():\n",
413 | " logits = model(torch.as_tensor([tokens], device=device))[0]\n",
414 | " p_next = torch.softmax(logits[0, -1, :], dim=-1).data.cpu().numpy()\n",
415 | "\n",
416 | " next_token_index = p_next.argmax() #\n",
417 | " # YOUR TASK: change the code so that it performs nucleus sampling\n",
418 | "\n",
419 | " tokens.append(int(next_token_index))\n",
420 | " print(end=tokenizer.decode(tokens[-1]))\n",
421 | " line_length += len(tokenizer.decode(tokens[-1]))\n",
422 | " if line_length >= max_length:\n",
423 | " line_length = 0\n",
424 | " print()\n",
425 | "\n"
426 | ]
427 | },
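A minimal sketch of nucleus (top-p) sampling for the task above; `top_p=0.9` is an arbitrary choice and `nucleus_sample` is a helper name introduced here:

```python
def nucleus_sample(p_next, top_p=0.9):
    """Sample from the smallest set of tokens whose cumulative probability exceeds top_p."""
    order = np.argsort(-p_next)                      # tokens from most to least probable
    cumulative = np.cumsum(p_next[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1  # size of the nucleus
    nucleus = order[:cutoff]
    probs = p_next[nucleus] / p_next[nucleus].sum()  # renormalize inside the nucleus
    return np.random.choice(nucleus, p=probs)

# in the loop above, the argmax line would become:
# next_token_index = nucleus_sample(p_next, top_p=0.9)
```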
428 | {
429 | "cell_type": "markdown",
430 | "metadata": {
431 | "id": "_Vij7Gc1wOaq"
432 | },
433 | "source": [
434 | "Transformers knowledge hub: https://huggingface.co/transformers/"
435 | ]
436 | },
437 | {
438 | "cell_type": "markdown",
439 | "source": [
440 | "Just pytorch (in particular) models, so we can train then as usual"
441 | ],
442 | "metadata": {
443 | "id": "QxMgH1-OGfw7"
444 | }
445 | },
446 | {
447 | "cell_type": "code",
448 | "source": [
449 | "from datasets import load_dataset\n",
450 | "from transformers import AutoTokenizer\n",
451 | "\n",
452 | "raw_datasets = load_dataset(\"glue\", \"mrpc\")\n",
453 | "checkpoint = \"bert-base-uncased\"\n",
454 | "tokenizer = AutoTokenizer.from_pretrained(checkpoint)\n",
455 | "\n",
456 | "\n",
457 | "def tokenize_function(example):\n",
458 | " return tokenizer(example[\"sentence1\"], example[\"sentence2\"], truncation=True)\n",
459 | "\n",
460 | "\n",
461 | "tokenized_datasets = raw_datasets.map(tokenize_function, batched=True)\n",
462 | "tokenized_datasets = tokenized_datasets.remove_columns([\"sentence1\", \"sentence2\", \"idx\"])\n",
463 | "tokenized_datasets = tokenized_datasets.rename_column(\"label\", \"labels\")\n",
464 | "tokenized_datasets.set_format(\"torch\")\n",
465 | "tokenized_datasets[\"train\"].column_names"
466 | ],
467 | "metadata": {
468 | "id": "oWjfs7ZQGhv9"
469 | },
470 | "execution_count": null,
471 | "outputs": []
472 | },
473 | {
474 | "cell_type": "code",
475 | "source": [
476 | "from torch.utils.data import DataLoader\n",
477 | "from transformers import DataCollatorWithPadding\n",
478 | "\n",
479 | "data_collator = DataCollatorWithPadding(tokenizer=tokenizer)\n",
480 | "\n",
481 | "train_dataloader = DataLoader(\n",
482 | " tokenized_datasets[\"train\"], shuffle=True, batch_size=8, collate_fn=data_collator\n",
483 | ")\n",
484 | "eval_dataloader = DataLoader(\n",
485 | " tokenized_datasets[\"validation\"], batch_size=8, collate_fn=data_collator\n",
486 | ")"
487 | ],
488 | "metadata": {
489 | "id": "uspRhoQNGrE3"
490 | },
491 | "execution_count": null,
492 | "outputs": []
493 | },
494 | {
495 | "cell_type": "code",
496 | "source": [
497 | "from transformers import AdamW, AutoModelForSequenceClassification, get_scheduler\n",
498 | "import torch\n",
499 | "from tqdm import tqdm\n",
500 | "\n",
501 | "model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)\n",
502 | "optimizer = AdamW(model.parameters(), lr=3e-5)\n",
503 | "\n",
504 | "device = torch.device(\"cuda\") if torch.cuda.is_available() else torch.device(\"cpu\")\n",
505 | "model.to(device)\n",
506 | "\n",
507 | "num_epochs = 3\n",
508 | "num_training_steps = num_epochs * len(train_dataloader)\n",
509 | "lr_scheduler = get_scheduler(\n",
510 | " \"linear\",\n",
511 | " optimizer=optimizer,\n",
512 | " num_warmup_steps=0,\n",
513 | " num_training_steps=num_training_steps,\n",
514 | ")\n",
515 | "\n",
516 | "progress_bar = tqdm(range(num_training_steps))\n",
517 | "\n",
518 | "model.train()\n",
519 | "for epoch in range(num_epochs):\n",
520 | " for batch in train_dataloader:\n",
521 | " batch = {k: v.to(device) for k, v in batch.items()}\n",
522 | " outputs = model(**batch)\n",
523 | " loss = outputs.loss\n",
524 | " loss.backward()\n",
525 | "\n",
526 | " optimizer.step()\n",
527 | " lr_scheduler.step()\n",
528 | " optimizer.zero_grad()\n",
529 | " progress_bar.update(1)"
530 | ],
531 | "metadata": {
532 | "id": "vo7MlmdiG1Jx"
533 | },
534 | "execution_count": null,
535 | "outputs": []
536 | },
537 | {
538 | "cell_type": "code",
539 | "source": [
540 | "import evaluate\n",
541 | "\n",
542 | "metric = evaluate.load(\"glue\", \"mrpc\")\n",
543 | "model.eval()\n",
544 | "for batch in eval_dataloader:\n",
545 | " batch = {k: v.to(device) for k, v in batch.items()}\n",
546 | " with torch.no_grad():\n",
547 | " outputs = model(**batch)\n",
548 | "\n",
549 | " logits = outputs.logits\n",
550 | " predictions = torch.argmax(logits, dim=-1)\n",
551 | " metric.add_batch(predictions=predictions, references=batch[\"labels\"])\n",
552 | "\n",
553 | "metric.compute()"
554 | ],
555 | "metadata": {
556 | "id": "wqtMj9JbH8-l"
557 | },
558 | "execution_count": null,
559 | "outputs": []
560 | }
561 | ],
562 | "metadata": {
563 | "accelerator": "GPU",
564 | "colab": {
565 | "provenance": []
566 | },
567 | "kernelspec": {
568 | "display_name": "Python 3",
569 | "language": "python",
570 | "name": "python3"
571 | },
572 | "language_info": {
573 | "codemirror_mode": {
574 | "name": "ipython",
575 | "version": 3
576 | },
577 | "file_extension": ".py",
578 | "mimetype": "text/x-python",
579 | "name": "python",
580 | "nbconvert_exporter": "python",
581 | "pygments_lexer": "ipython3",
582 | "version": "3.8.8"
583 | },
584 | "widgets": {
585 | "application/vnd.jupyter.widget-state+json": {
586 | "9fe16c621bd643318bc341864efb3e4d": {
587 | "model_module": "@jupyter-widgets/controls",
588 | "model_name": "HBoxModel",
589 | "model_module_version": "1.5.0",
590 | "state": {
591 | "_dom_classes": [],
592 | "_model_module": "@jupyter-widgets/controls",
593 | "_model_module_version": "1.5.0",
594 | "_model_name": "HBoxModel",
595 | "_view_count": null,
596 | "_view_module": "@jupyter-widgets/controls",
597 | "_view_module_version": "1.5.0",
598 | "_view_name": "HBoxView",
599 | "box_style": "",
600 | "children": [
601 | "IPY_MODEL_d930d2cef8ed4ef29db50c13b9103056",
602 | "IPY_MODEL_e8bbd6a633f54344aced163ff020c6b9",
603 | "IPY_MODEL_53fd63ef3e6940c3a1a7da0d26f5bf00"
604 | ],
605 | "layout": "IPY_MODEL_05c022d131f1454e8d9674da8bba015d"
606 | }
607 | },
608 | "d930d2cef8ed4ef29db50c13b9103056": {
609 | "model_module": "@jupyter-widgets/controls",
610 | "model_name": "HTMLModel",
611 | "model_module_version": "1.5.0",
612 | "state": {
613 | "_dom_classes": [],
614 | "_model_module": "@jupyter-widgets/controls",
615 | "_model_module_version": "1.5.0",
616 | "_model_name": "HTMLModel",
617 | "_view_count": null,
618 | "_view_module": "@jupyter-widgets/controls",
619 | "_view_module_version": "1.5.0",
620 | "_view_name": "HTMLView",
621 | "description": "",
622 | "description_tooltip": null,
623 | "layout": "IPY_MODEL_1b51b4f4463341c1acdb4c57dbf8130b",
624 | "placeholder": "",
625 | "style": "IPY_MODEL_9fa2a910690040aabcd2b05e449b0c45",
626 | "value": "config.json: 100%"
627 | }
628 | },
629 | "e8bbd6a633f54344aced163ff020c6b9": {
630 | "model_module": "@jupyter-widgets/controls",
631 | "model_name": "FloatProgressModel",
632 | "model_module_version": "1.5.0",
633 | "state": {
634 | "_dom_classes": [],
635 | "_model_module": "@jupyter-widgets/controls",
636 | "_model_module_version": "1.5.0",
637 | "_model_name": "FloatProgressModel",
638 | "_view_count": null,
639 | "_view_module": "@jupyter-widgets/controls",
640 | "_view_module_version": "1.5.0",
641 | "_view_name": "ProgressView",
642 | "bar_style": "success",
643 | "description": "",
644 | "description_tooltip": null,
645 | "layout": "IPY_MODEL_f0ff5f31afd4450ea185123a577df0cf",
646 | "max": 629,
647 | "min": 0,
648 | "orientation": "horizontal",
649 | "style": "IPY_MODEL_be63ea3b752642b6bf7e74e7fdc87430",
650 | "value": 629
651 | }
652 | },
653 | "53fd63ef3e6940c3a1a7da0d26f5bf00": {
654 | "model_module": "@jupyter-widgets/controls",
655 | "model_name": "HTMLModel",
656 | "model_module_version": "1.5.0",
657 | "state": {
658 | "_dom_classes": [],
659 | "_model_module": "@jupyter-widgets/controls",
660 | "_model_module_version": "1.5.0",
661 | "_model_name": "HTMLModel",
662 | "_view_count": null,
663 | "_view_module": "@jupyter-widgets/controls",
664 | "_view_module_version": "1.5.0",
665 | "_view_name": "HTMLView",
666 | "description": "",
667 | "description_tooltip": null,
668 | "layout": "IPY_MODEL_8dffc51581f74d7c81c6bdb3feb7ae4f",
669 | "placeholder": "",
670 | "style": "IPY_MODEL_e9a828847dcf41b5bbc1eaa44fde2c95",
671 | "value": " 629/629 [00:00<00:00, 26.2kB/s]"
672 | }
673 | },
674 | "05c022d131f1454e8d9674da8bba015d": {
675 | "model_module": "@jupyter-widgets/base",
676 | "model_name": "LayoutModel",
677 | "model_module_version": "1.2.0",
678 | "state": {
679 | "_model_module": "@jupyter-widgets/base",
680 | "_model_module_version": "1.2.0",
681 | "_model_name": "LayoutModel",
682 | "_view_count": null,
683 | "_view_module": "@jupyter-widgets/base",
684 | "_view_module_version": "1.2.0",
685 | "_view_name": "LayoutView",
686 | "align_content": null,
687 | "align_items": null,
688 | "align_self": null,
689 | "border": null,
690 | "bottom": null,
691 | "display": null,
692 | "flex": null,
693 | "flex_flow": null,
694 | "grid_area": null,
695 | "grid_auto_columns": null,
696 | "grid_auto_flow": null,
697 | "grid_auto_rows": null,
698 | "grid_column": null,
699 | "grid_gap": null,
700 | "grid_row": null,
701 | "grid_template_areas": null,
702 | "grid_template_columns": null,
703 | "grid_template_rows": null,
704 | "height": null,
705 | "justify_content": null,
706 | "justify_items": null,
707 | "left": null,
708 | "margin": null,
709 | "max_height": null,
710 | "max_width": null,
711 | "min_height": null,
712 | "min_width": null,
713 | "object_fit": null,
714 | "object_position": null,
715 | "order": null,
716 | "overflow": null,
717 | "overflow_x": null,
718 | "overflow_y": null,
719 | "padding": null,
720 | "right": null,
721 | "top": null,
722 | "visibility": null,
723 | "width": null
724 | }
725 | },
726 | "1b51b4f4463341c1acdb4c57dbf8130b": {
727 | "model_module": "@jupyter-widgets/base",
728 | "model_name": "LayoutModel",
729 | "model_module_version": "1.2.0",
730 | "state": {
731 | "_model_module": "@jupyter-widgets/base",
732 | "_model_module_version": "1.2.0",
733 | "_model_name": "LayoutModel",
734 | "_view_count": null,
735 | "_view_module": "@jupyter-widgets/base",
736 | "_view_module_version": "1.2.0",
737 | "_view_name": "LayoutView",
738 | "align_content": null,
739 | "align_items": null,
740 | "align_self": null,
741 | "border": null,
742 | "bottom": null,
743 | "display": null,
744 | "flex": null,
745 | "flex_flow": null,
746 | "grid_area": null,
747 | "grid_auto_columns": null,
748 | "grid_auto_flow": null,
749 | "grid_auto_rows": null,
750 | "grid_column": null,
751 | "grid_gap": null,
752 | "grid_row": null,
753 | "grid_template_areas": null,
754 | "grid_template_columns": null,
755 | "grid_template_rows": null,
756 | "height": null,
757 | "justify_content": null,
758 | "justify_items": null,
759 | "left": null,
760 | "margin": null,
761 | "max_height": null,
762 | "max_width": null,
763 | "min_height": null,
764 | "min_width": null,
765 | "object_fit": null,
766 | "object_position": null,
767 | "order": null,
768 | "overflow": null,
769 | "overflow_x": null,
770 | "overflow_y": null,
771 | "padding": null,
772 | "right": null,
773 | "top": null,
774 | "visibility": null,
775 | "width": null
776 | }
777 | },
778 | "9fa2a910690040aabcd2b05e449b0c45": {
779 | "model_module": "@jupyter-widgets/controls",
780 | "model_name": "DescriptionStyleModel",
781 | "model_module_version": "1.5.0",
782 | "state": {
783 | "_model_module": "@jupyter-widgets/controls",
784 | "_model_module_version": "1.5.0",
785 | "_model_name": "DescriptionStyleModel",
786 | "_view_count": null,
787 | "_view_module": "@jupyter-widgets/base",
788 | "_view_module_version": "1.2.0",
789 | "_view_name": "StyleView",
790 | "description_width": ""
791 | }
792 | },
793 | "f0ff5f31afd4450ea185123a577df0cf": {
794 | "model_module": "@jupyter-widgets/base",
795 | "model_name": "LayoutModel",
796 | "model_module_version": "1.2.0",
797 | "state": {
798 | "_model_module": "@jupyter-widgets/base",
799 | "_model_module_version": "1.2.0",
800 | "_model_name": "LayoutModel",
801 | "_view_count": null,
802 | "_view_module": "@jupyter-widgets/base",
803 | "_view_module_version": "1.2.0",
804 | "_view_name": "LayoutView",
805 | "align_content": null,
806 | "align_items": null,
807 | "align_self": null,
808 | "border": null,
809 | "bottom": null,
810 | "display": null,
811 | "flex": null,
812 | "flex_flow": null,
813 | "grid_area": null,
814 | "grid_auto_columns": null,
815 | "grid_auto_flow": null,
816 | "grid_auto_rows": null,
817 | "grid_column": null,
818 | "grid_gap": null,
819 | "grid_row": null,
820 | "grid_template_areas": null,
821 | "grid_template_columns": null,
822 | "grid_template_rows": null,
823 | "height": null,
824 | "justify_content": null,
825 | "justify_items": null,
826 | "left": null,
827 | "margin": null,
828 | "max_height": null,
829 | "max_width": null,
830 | "min_height": null,
831 | "min_width": null,
832 | "object_fit": null,
833 | "object_position": null,
834 | "order": null,
835 | "overflow": null,
836 | "overflow_x": null,
837 | "overflow_y": null,
838 | "padding": null,
839 | "right": null,
840 | "top": null,
841 | "visibility": null,
842 | "width": null
843 | }
844 | },
845 | "be63ea3b752642b6bf7e74e7fdc87430": {
846 | "model_module": "@jupyter-widgets/controls",
847 | "model_name": "ProgressStyleModel",
848 | "model_module_version": "1.5.0",
849 | "state": {
850 | "_model_module": "@jupyter-widgets/controls",
851 | "_model_module_version": "1.5.0",
852 | "_model_name": "ProgressStyleModel",
853 | "_view_count": null,
854 | "_view_module": "@jupyter-widgets/base",
855 | "_view_module_version": "1.2.0",
856 | "_view_name": "StyleView",
857 | "bar_color": null,
858 | "description_width": ""
859 | }
860 | },
861 | "8dffc51581f74d7c81c6bdb3feb7ae4f": {
862 | "model_module": "@jupyter-widgets/base",
863 | "model_name": "LayoutModel",
864 | "model_module_version": "1.2.0",
865 | "state": {
866 | "_model_module": "@jupyter-widgets/base",
867 | "_model_module_version": "1.2.0",
868 | "_model_name": "LayoutModel",
869 | "_view_count": null,
870 | "_view_module": "@jupyter-widgets/base",
871 | "_view_module_version": "1.2.0",
872 | "_view_name": "LayoutView",
873 | "align_content": null,
874 | "align_items": null,
875 | "align_self": null,
876 | "border": null,
877 | "bottom": null,
878 | "display": null,
879 | "flex": null,
880 | "flex_flow": null,
881 | "grid_area": null,
882 | "grid_auto_columns": null,
883 | "grid_auto_flow": null,
884 | "grid_auto_rows": null,
885 | "grid_column": null,
886 | "grid_gap": null,
887 | "grid_row": null,
888 | "grid_template_areas": null,
889 | "grid_template_columns": null,
890 | "grid_template_rows": null,
891 | "height": null,
892 | "justify_content": null,
893 | "justify_items": null,
894 | "left": null,
895 | "margin": null,
896 | "max_height": null,
897 | "max_width": null,
898 | "min_height": null,
899 | "min_width": null,
900 | "object_fit": null,
901 | "object_position": null,
902 | "order": null,
903 | "overflow": null,
904 | "overflow_x": null,
905 | "overflow_y": null,
906 | "padding": null,
907 | "right": null,
908 | "top": null,
909 | "visibility": null,
910 | "width": null
911 | }
912 | },
913 | "e9a828847dcf41b5bbc1eaa44fde2c95": {
914 | "model_module": "@jupyter-widgets/controls",
915 | "model_name": "DescriptionStyleModel",
916 | "model_module_version": "1.5.0",
917 | "state": {
918 | "_model_module": "@jupyter-widgets/controls",
919 | "_model_module_version": "1.5.0",
920 | "_model_name": "DescriptionStyleModel",
921 | "_view_count": null,
922 | "_view_module": "@jupyter-widgets/base",
923 | "_view_module_version": "1.2.0",
924 | "_view_name": "StyleView",
925 | "description_width": ""
926 | }
927 | },
928 | "e4e616b460ef4b11b4c352d467f494dc": {
929 | "model_module": "@jupyter-widgets/controls",
930 | "model_name": "HBoxModel",
931 | "model_module_version": "1.5.0",
932 | "state": {
933 | "_dom_classes": [],
934 | "_model_module": "@jupyter-widgets/controls",
935 | "_model_module_version": "1.5.0",
936 | "_model_name": "HBoxModel",
937 | "_view_count": null,
938 | "_view_module": "@jupyter-widgets/controls",
939 | "_view_module_version": "1.5.0",
940 | "_view_name": "HBoxView",
941 | "box_style": "",
942 | "children": [
943 | "IPY_MODEL_7ce760b40b834494a1efa1b078985bff",
944 | "IPY_MODEL_d6f3f9f5db014075bd4fbca506e943f8",
945 | "IPY_MODEL_97716ed55a074ce0b52e68be926a09c3"
946 | ],
947 | "layout": "IPY_MODEL_86551d955c684d43b7f51d52cec868ec"
948 | }
949 | },
950 | "7ce760b40b834494a1efa1b078985bff": {
951 | "model_module": "@jupyter-widgets/controls",
952 | "model_name": "HTMLModel",
953 | "model_module_version": "1.5.0",
954 | "state": {
955 | "_dom_classes": [],
956 | "_model_module": "@jupyter-widgets/controls",
957 | "_model_module_version": "1.5.0",
958 | "_model_name": "HTMLModel",
959 | "_view_count": null,
960 | "_view_module": "@jupyter-widgets/controls",
961 | "_view_module_version": "1.5.0",
962 | "_view_name": "HTMLView",
963 | "description": "",
964 | "description_tooltip": null,
965 | "layout": "IPY_MODEL_368689f0e8974d7eae2fd497c996ba20",
966 | "placeholder": "",
967 | "style": "IPY_MODEL_826d46d71d35444a8123fffd107d1f1b",
968 | "value": "model.safetensors: 100%"
969 | }
970 | },
971 | "d6f3f9f5db014075bd4fbca506e943f8": {
972 | "model_module": "@jupyter-widgets/controls",
973 | "model_name": "FloatProgressModel",
974 | "model_module_version": "1.5.0",
975 | "state": {
976 | "_dom_classes": [],
977 | "_model_module": "@jupyter-widgets/controls",
978 | "_model_module_version": "1.5.0",
979 | "_model_name": "FloatProgressModel",
980 | "_view_count": null,
981 | "_view_module": "@jupyter-widgets/controls",
982 | "_view_module_version": "1.5.0",
983 | "_view_name": "ProgressView",
984 | "bar_style": "success",
985 | "description": "",
986 | "description_tooltip": null,
987 | "layout": "IPY_MODEL_82fb664d4d614a0f9bea5b297783f0d5",
988 | "max": 267832558,
989 | "min": 0,
990 | "orientation": "horizontal",
991 | "style": "IPY_MODEL_28d6b5e50d4545559dd0d610b30c3b96",
992 | "value": 267832558
993 | }
994 | },
995 | "97716ed55a074ce0b52e68be926a09c3": {
996 | "model_module": "@jupyter-widgets/controls",
997 | "model_name": "HTMLModel",
998 | "model_module_version": "1.5.0",
999 | "state": {
1000 | "_dom_classes": [],
1001 | "_model_module": "@jupyter-widgets/controls",
1002 | "_model_module_version": "1.5.0",
1003 | "_model_name": "HTMLModel",
1004 | "_view_count": null,
1005 | "_view_module": "@jupyter-widgets/controls",
1006 | "_view_module_version": "1.5.0",
1007 | "_view_name": "HTMLView",
1008 | "description": "",
1009 | "description_tooltip": null,
1010 | "layout": "IPY_MODEL_33ee09dba6d5422896396a102f78dbba",
1011 | "placeholder": "",
1012 | "style": "IPY_MODEL_e0a71a1e15fc4070b3147560f0e7d694",
1013 | "value": " 268M/268M [00:01<00:00, 226MB/s]"
1014 | }
1015 | },
1016 | "86551d955c684d43b7f51d52cec868ec": {
1017 | "model_module": "@jupyter-widgets/base",
1018 | "model_name": "LayoutModel",
1019 | "model_module_version": "1.2.0",
1020 | "state": {
1021 | "_model_module": "@jupyter-widgets/base",
1022 | "_model_module_version": "1.2.0",
1023 | "_model_name": "LayoutModel",
1024 | "_view_count": null,
1025 | "_view_module": "@jupyter-widgets/base",
1026 | "_view_module_version": "1.2.0",
1027 | "_view_name": "LayoutView",
1028 | "align_content": null,
1029 | "align_items": null,
1030 | "align_self": null,
1031 | "border": null,
1032 | "bottom": null,
1033 | "display": null,
1034 | "flex": null,
1035 | "flex_flow": null,
1036 | "grid_area": null,
1037 | "grid_auto_columns": null,
1038 | "grid_auto_flow": null,
1039 | "grid_auto_rows": null,
1040 | "grid_column": null,
1041 | "grid_gap": null,
1042 | "grid_row": null,
1043 | "grid_template_areas": null,
1044 | "grid_template_columns": null,
1045 | "grid_template_rows": null,
1046 | "height": null,
1047 | "justify_content": null,
1048 | "justify_items": null,
1049 | "left": null,
1050 | "margin": null,
1051 | "max_height": null,
1052 | "max_width": null,
1053 | "min_height": null,
1054 | "min_width": null,
1055 | "object_fit": null,
1056 | "object_position": null,
1057 | "order": null,
1058 | "overflow": null,
1059 | "overflow_x": null,
1060 | "overflow_y": null,
1061 | "padding": null,
1062 | "right": null,
1063 | "top": null,
1064 | "visibility": null,
1065 | "width": null
1066 | }
1067 | },
1068 | "368689f0e8974d7eae2fd497c996ba20": {
1069 | "model_module": "@jupyter-widgets/base",
1070 | "model_name": "LayoutModel",
1071 | "model_module_version": "1.2.0",
1072 | "state": {
1073 | "_model_module": "@jupyter-widgets/base",
1074 | "_model_module_version": "1.2.0",
1075 | "_model_name": "LayoutModel",
1076 | "_view_count": null,
1077 | "_view_module": "@jupyter-widgets/base",
1078 | "_view_module_version": "1.2.0",
1079 | "_view_name": "LayoutView",
1080 | "align_content": null,
1081 | "align_items": null,
1082 | "align_self": null,
1083 | "border": null,
1084 | "bottom": null,
1085 | "display": null,
1086 | "flex": null,
1087 | "flex_flow": null,
1088 | "grid_area": null,
1089 | "grid_auto_columns": null,
1090 | "grid_auto_flow": null,
1091 | "grid_auto_rows": null,
1092 | "grid_column": null,
1093 | "grid_gap": null,
1094 | "grid_row": null,
1095 | "grid_template_areas": null,
1096 | "grid_template_columns": null,
1097 | "grid_template_rows": null,
1098 | "height": null,
1099 | "justify_content": null,
1100 | "justify_items": null,
1101 | "left": null,
1102 | "margin": null,
1103 | "max_height": null,
1104 | "max_width": null,
1105 | "min_height": null,
1106 | "min_width": null,
1107 | "object_fit": null,
1108 | "object_position": null,
1109 | "order": null,
1110 | "overflow": null,
1111 | "overflow_x": null,
1112 | "overflow_y": null,
1113 | "padding": null,
1114 | "right": null,
1115 | "top": null,
1116 | "visibility": null,
1117 | "width": null
1118 | }
1119 | },
1120 | "826d46d71d35444a8123fffd107d1f1b": {
1121 | "model_module": "@jupyter-widgets/controls",
1122 | "model_name": "DescriptionStyleModel",
1123 | "model_module_version": "1.5.0",
1124 | "state": {
1125 | "_model_module": "@jupyter-widgets/controls",
1126 | "_model_module_version": "1.5.0",
1127 | "_model_name": "DescriptionStyleModel",
1128 | "_view_count": null,
1129 | "_view_module": "@jupyter-widgets/base",
1130 | "_view_module_version": "1.2.0",
1131 | "_view_name": "StyleView",
1132 | "description_width": ""
1133 | }
1134 | },
1135 | "82fb664d4d614a0f9bea5b297783f0d5": {
1136 | "model_module": "@jupyter-widgets/base",
1137 | "model_name": "LayoutModel",
1138 | "model_module_version": "1.2.0",
1139 | "state": {
1140 | "_model_module": "@jupyter-widgets/base",
1141 | "_model_module_version": "1.2.0",
1142 | "_model_name": "LayoutModel",
1143 | "_view_count": null,
1144 | "_view_module": "@jupyter-widgets/base",
1145 | "_view_module_version": "1.2.0",
1146 | "_view_name": "LayoutView",
1147 | "align_content": null,
1148 | "align_items": null,
1149 | "align_self": null,
1150 | "border": null,
1151 | "bottom": null,
1152 | "display": null,
1153 | "flex": null,
1154 | "flex_flow": null,
1155 | "grid_area": null,
1156 | "grid_auto_columns": null,
1157 | "grid_auto_flow": null,
1158 | "grid_auto_rows": null,
1159 | "grid_column": null,
1160 | "grid_gap": null,
1161 | "grid_row": null,
1162 | "grid_template_areas": null,
1163 | "grid_template_columns": null,
1164 | "grid_template_rows": null,
1165 | "height": null,
1166 | "justify_content": null,
1167 | "justify_items": null,
1168 | "left": null,
1169 | "margin": null,
1170 | "max_height": null,
1171 | "max_width": null,
1172 | "min_height": null,
1173 | "min_width": null,
1174 | "object_fit": null,
1175 | "object_position": null,
1176 | "order": null,
1177 | "overflow": null,
1178 | "overflow_x": null,
1179 | "overflow_y": null,
1180 | "padding": null,
1181 | "right": null,
1182 | "top": null,
1183 | "visibility": null,
1184 | "width": null
1185 | }
1186 | },
1187 | "28d6b5e50d4545559dd0d610b30c3b96": {
1188 | "model_module": "@jupyter-widgets/controls",
1189 | "model_name": "ProgressStyleModel",
1190 | "model_module_version": "1.5.0",
1191 | "state": {
1192 | "_model_module": "@jupyter-widgets/controls",
1193 | "_model_module_version": "1.5.0",
1194 | "_model_name": "ProgressStyleModel",
1195 | "_view_count": null,
1196 | "_view_module": "@jupyter-widgets/base",
1197 | "_view_module_version": "1.2.0",
1198 | "_view_name": "StyleView",
1199 | "bar_color": null,
1200 | "description_width": ""
1201 | }
1202 | },
1203 | "33ee09dba6d5422896396a102f78dbba": {
1204 | "model_module": "@jupyter-widgets/base",
1205 | "model_name": "LayoutModel",
1206 | "model_module_version": "1.2.0",
1207 | "state": {
1208 | "_model_module": "@jupyter-widgets/base",
1209 | "_model_module_version": "1.2.0",
1210 | "_model_name": "LayoutModel",
1211 | "_view_count": null,
1212 | "_view_module": "@jupyter-widgets/base",
1213 | "_view_module_version": "1.2.0",
1214 | "_view_name": "LayoutView",
1215 | "align_content": null,
1216 | "align_items": null,
1217 | "align_self": null,
1218 | "border": null,
1219 | "bottom": null,
1220 | "display": null,
1221 | "flex": null,
1222 | "flex_flow": null,
1223 | "grid_area": null,
1224 | "grid_auto_columns": null,
1225 | "grid_auto_flow": null,
1226 | "grid_auto_rows": null,
1227 | "grid_column": null,
1228 | "grid_gap": null,
1229 | "grid_row": null,
1230 | "grid_template_areas": null,
1231 | "grid_template_columns": null,
1232 | "grid_template_rows": null,
1233 | "height": null,
1234 | "justify_content": null,
1235 | "justify_items": null,
1236 | "left": null,
1237 | "margin": null,
1238 | "max_height": null,
1239 | "max_width": null,
1240 | "min_height": null,
1241 | "min_width": null,
1242 | "object_fit": null,
1243 | "object_position": null,
1244 | "order": null,
1245 | "overflow": null,
1246 | "overflow_x": null,
1247 | "overflow_y": null,
1248 | "padding": null,
1249 | "right": null,
1250 | "top": null,
1251 | "visibility": null,
1252 | "width": null
1253 | }
1254 | },
1255 | "e0a71a1e15fc4070b3147560f0e7d694": {
1256 | "model_module": "@jupyter-widgets/controls",
1257 | "model_name": "DescriptionStyleModel",
1258 | "model_module_version": "1.5.0",
1259 | "state": {
1260 | "_model_module": "@jupyter-widgets/controls",
1261 | "_model_module_version": "1.5.0",
1262 | "_model_name": "DescriptionStyleModel",
1263 | "_view_count": null,
1264 | "_view_module": "@jupyter-widgets/base",
1265 | "_view_module_version": "1.2.0",
1266 | "_view_name": "StyleView",
1267 | "description_width": ""
1268 | }
1269 | },
1270 | "c57238fcb356435ea7ad8daf7761879a": {
1271 | "model_module": "@jupyter-widgets/controls",
1272 | "model_name": "HBoxModel",
1273 | "model_module_version": "1.5.0",
1274 | "state": {
1275 | "_dom_classes": [],
1276 | "_model_module": "@jupyter-widgets/controls",
1277 | "_model_module_version": "1.5.0",
1278 | "_model_name": "HBoxModel",
1279 | "_view_count": null,
1280 | "_view_module": "@jupyter-widgets/controls",
1281 | "_view_module_version": "1.5.0",
1282 | "_view_name": "HBoxView",
1283 | "box_style": "",
1284 | "children": [
1285 | "IPY_MODEL_0e35c262e8d343f7b83016b8c61fd744",
1286 | "IPY_MODEL_9081142688d44540a28e4b08e4091270",
1287 | "IPY_MODEL_83fdaa28d4ae47fea15f4ec950f1825f"
1288 | ],
1289 | "layout": "IPY_MODEL_493ff70e5b59414c8ff285fc50391b12"
1290 | }
1291 | },
1292 | "0e35c262e8d343f7b83016b8c61fd744": {
1293 | "model_module": "@jupyter-widgets/controls",
1294 | "model_name": "HTMLModel",
1295 | "model_module_version": "1.5.0",
1296 | "state": {
1297 | "_dom_classes": [],
1298 | "_model_module": "@jupyter-widgets/controls",
1299 | "_model_module_version": "1.5.0",
1300 | "_model_name": "HTMLModel",
1301 | "_view_count": null,
1302 | "_view_module": "@jupyter-widgets/controls",
1303 | "_view_module_version": "1.5.0",
1304 | "_view_name": "HTMLView",
1305 | "description": "",
1306 | "description_tooltip": null,
1307 | "layout": "IPY_MODEL_f6eaee53284c427f99126187a1883081",
1308 | "placeholder": "",
1309 | "style": "IPY_MODEL_04c3cf767105473fb6756223ef2d0030",
1310 | "value": "tokenizer_config.json: 100%"
1311 | }
1312 | },
1313 | "9081142688d44540a28e4b08e4091270": {
1314 | "model_module": "@jupyter-widgets/controls",
1315 | "model_name": "FloatProgressModel",
1316 | "model_module_version": "1.5.0",
1317 | "state": {
1318 | "_dom_classes": [],
1319 | "_model_module": "@jupyter-widgets/controls",
1320 | "_model_module_version": "1.5.0",
1321 | "_model_name": "FloatProgressModel",
1322 | "_view_count": null,
1323 | "_view_module": "@jupyter-widgets/controls",
1324 | "_view_module_version": "1.5.0",
1325 | "_view_name": "ProgressView",
1326 | "bar_style": "success",
1327 | "description": "",
1328 | "description_tooltip": null,
1329 | "layout": "IPY_MODEL_7ffd07ee1d4d482d8a9f1e2d8ddb2772",
1330 | "max": 48,
1331 | "min": 0,
1332 | "orientation": "horizontal",
1333 | "style": "IPY_MODEL_15898c9d06734d60b23ee92d03592c33",
1334 | "value": 48
1335 | }
1336 | },
1337 | "83fdaa28d4ae47fea15f4ec950f1825f": {
1338 | "model_module": "@jupyter-widgets/controls",
1339 | "model_name": "HTMLModel",
1340 | "model_module_version": "1.5.0",
1341 | "state": {
1342 | "_dom_classes": [],
1343 | "_model_module": "@jupyter-widgets/controls",
1344 | "_model_module_version": "1.5.0",
1345 | "_model_name": "HTMLModel",
1346 | "_view_count": null,
1347 | "_view_module": "@jupyter-widgets/controls",
1348 | "_view_module_version": "1.5.0",
1349 | "_view_name": "HTMLView",
1350 | "description": "",
1351 | "description_tooltip": null,
1352 | "layout": "IPY_MODEL_92bef379c00f46e2b02ace047c50a9af",
1353 | "placeholder": "",
1354 | "style": "IPY_MODEL_9a0a067c17b64df2a1184986c2412b26",
1355 | "value": " 48.0/48.0 [00:00<00:00, 2.55kB/s]"
1356 | }
1357 | },
1358 | "493ff70e5b59414c8ff285fc50391b12": {
1359 | "model_module": "@jupyter-widgets/base",
1360 | "model_name": "LayoutModel",
1361 | "model_module_version": "1.2.0",
1362 | "state": {
1363 | "_model_module": "@jupyter-widgets/base",
1364 | "_model_module_version": "1.2.0",
1365 | "_model_name": "LayoutModel",
1366 | "_view_count": null,
1367 | "_view_module": "@jupyter-widgets/base",
1368 | "_view_module_version": "1.2.0",
1369 | "_view_name": "LayoutView",
1370 | "align_content": null,
1371 | "align_items": null,
1372 | "align_self": null,
1373 | "border": null,
1374 | "bottom": null,
1375 | "display": null,
1376 | "flex": null,
1377 | "flex_flow": null,
1378 | "grid_area": null,
1379 | "grid_auto_columns": null,
1380 | "grid_auto_flow": null,
1381 | "grid_auto_rows": null,
1382 | "grid_column": null,
1383 | "grid_gap": null,
1384 | "grid_row": null,
1385 | "grid_template_areas": null,
1386 | "grid_template_columns": null,
1387 | "grid_template_rows": null,
1388 | "height": null,
1389 | "justify_content": null,
1390 | "justify_items": null,
1391 | "left": null,
1392 | "margin": null,
1393 | "max_height": null,
1394 | "max_width": null,
1395 | "min_height": null,
1396 | "min_width": null,
1397 | "object_fit": null,
1398 | "object_position": null,
1399 | "order": null,
1400 | "overflow": null,
1401 | "overflow_x": null,
1402 | "overflow_y": null,
1403 | "padding": null,
1404 | "right": null,
1405 | "top": null,
1406 | "visibility": null,
1407 | "width": null
1408 | }
1409 | },
1410 | "f6eaee53284c427f99126187a1883081": {
1411 | "model_module": "@jupyter-widgets/base",
1412 | "model_name": "LayoutModel",
1413 | "model_module_version": "1.2.0",
1414 | "state": {
1415 | "_model_module": "@jupyter-widgets/base",
1416 | "_model_module_version": "1.2.0",
1417 | "_model_name": "LayoutModel",
1418 | "_view_count": null,
1419 | "_view_module": "@jupyter-widgets/base",
1420 | "_view_module_version": "1.2.0",
1421 | "_view_name": "LayoutView",
1422 | "align_content": null,
1423 | "align_items": null,
1424 | "align_self": null,
1425 | "border": null,
1426 | "bottom": null,
1427 | "display": null,
1428 | "flex": null,
1429 | "flex_flow": null,
1430 | "grid_area": null,
1431 | "grid_auto_columns": null,
1432 | "grid_auto_flow": null,
1433 | "grid_auto_rows": null,
1434 | "grid_column": null,
1435 | "grid_gap": null,
1436 | "grid_row": null,
1437 | "grid_template_areas": null,
1438 | "grid_template_columns": null,
1439 | "grid_template_rows": null,
1440 | "height": null,
1441 | "justify_content": null,
1442 | "justify_items": null,
1443 | "left": null,
1444 | "margin": null,
1445 | "max_height": null,
1446 | "max_width": null,
1447 | "min_height": null,
1448 | "min_width": null,
1449 | "object_fit": null,
1450 | "object_position": null,
1451 | "order": null,
1452 | "overflow": null,
1453 | "overflow_x": null,
1454 | "overflow_y": null,
1455 | "padding": null,
1456 | "right": null,
1457 | "top": null,
1458 | "visibility": null,
1459 | "width": null
1460 | }
1461 | },
1462 | "04c3cf767105473fb6756223ef2d0030": {
1463 | "model_module": "@jupyter-widgets/controls",
1464 | "model_name": "DescriptionStyleModel",
1465 | "model_module_version": "1.5.0",
1466 | "state": {
1467 | "_model_module": "@jupyter-widgets/controls",
1468 | "_model_module_version": "1.5.0",
1469 | "_model_name": "DescriptionStyleModel",
1470 | "_view_count": null,
1471 | "_view_module": "@jupyter-widgets/base",
1472 | "_view_module_version": "1.2.0",
1473 | "_view_name": "StyleView",
1474 | "description_width": ""
1475 | }
1476 | },
1477 | "7ffd07ee1d4d482d8a9f1e2d8ddb2772": {
1478 | "model_module": "@jupyter-widgets/base",
1479 | "model_name": "LayoutModel",
1480 | "model_module_version": "1.2.0",
1481 | "state": {
1482 | "_model_module": "@jupyter-widgets/base",
1483 | "_model_module_version": "1.2.0",
1484 | "_model_name": "LayoutModel",
1485 | "_view_count": null,
1486 | "_view_module": "@jupyter-widgets/base",
1487 | "_view_module_version": "1.2.0",
1488 | "_view_name": "LayoutView",
1489 | "align_content": null,
1490 | "align_items": null,
1491 | "align_self": null,
1492 | "border": null,
1493 | "bottom": null,
1494 | "display": null,
1495 | "flex": null,
1496 | "flex_flow": null,
1497 | "grid_area": null,
1498 | "grid_auto_columns": null,
1499 | "grid_auto_flow": null,
1500 | "grid_auto_rows": null,
1501 | "grid_column": null,
1502 | "grid_gap": null,
1503 | "grid_row": null,
1504 | "grid_template_areas": null,
1505 | "grid_template_columns": null,
1506 | "grid_template_rows": null,
1507 | "height": null,
1508 | "justify_content": null,
1509 | "justify_items": null,
1510 | "left": null,
1511 | "margin": null,
1512 | "max_height": null,
1513 | "max_width": null,
1514 | "min_height": null,
1515 | "min_width": null,
1516 | "object_fit": null,
1517 | "object_position": null,
1518 | "order": null,
1519 | "overflow": null,
1520 | "overflow_x": null,
1521 | "overflow_y": null,
1522 | "padding": null,
1523 | "right": null,
1524 | "top": null,
1525 | "visibility": null,
1526 | "width": null
1527 | }
1528 | },
1529 | "15898c9d06734d60b23ee92d03592c33": {
1530 | "model_module": "@jupyter-widgets/controls",
1531 | "model_name": "ProgressStyleModel",
1532 | "model_module_version": "1.5.0",
1533 | "state": {
1534 | "_model_module": "@jupyter-widgets/controls",
1535 | "_model_module_version": "1.5.0",
1536 | "_model_name": "ProgressStyleModel",
1537 | "_view_count": null,
1538 | "_view_module": "@jupyter-widgets/base",
1539 | "_view_module_version": "1.2.0",
1540 | "_view_name": "StyleView",
1541 | "bar_color": null,
1542 | "description_width": ""
1543 | }
1544 | },
1545 | "92bef379c00f46e2b02ace047c50a9af": {
1546 | "model_module": "@jupyter-widgets/base",
1547 | "model_name": "LayoutModel",
1548 | "model_module_version": "1.2.0",
1549 | "state": {
1550 | "_model_module": "@jupyter-widgets/base",
1551 | "_model_module_version": "1.2.0",
1552 | "_model_name": "LayoutModel",
1553 | "_view_count": null,
1554 | "_view_module": "@jupyter-widgets/base",
1555 | "_view_module_version": "1.2.0",
1556 | "_view_name": "LayoutView",
1557 | "align_content": null,
1558 | "align_items": null,
1559 | "align_self": null,
1560 | "border": null,
1561 | "bottom": null,
1562 | "display": null,
1563 | "flex": null,
1564 | "flex_flow": null,
1565 | "grid_area": null,
1566 | "grid_auto_columns": null,
1567 | "grid_auto_flow": null,
1568 | "grid_auto_rows": null,
1569 | "grid_column": null,
1570 | "grid_gap": null,
1571 | "grid_row": null,
1572 | "grid_template_areas": null,
1573 | "grid_template_columns": null,
1574 | "grid_template_rows": null,
1575 | "height": null,
1576 | "justify_content": null,
1577 | "justify_items": null,
1578 | "left": null,
1579 | "margin": null,
1580 | "max_height": null,
1581 | "max_width": null,
1582 | "min_height": null,
1583 | "min_width": null,
1584 | "object_fit": null,
1585 | "object_position": null,
1586 | "order": null,
1587 | "overflow": null,
1588 | "overflow_x": null,
1589 | "overflow_y": null,
1590 | "padding": null,
1591 | "right": null,
1592 | "top": null,
1593 | "visibility": null,
1594 | "width": null
1595 | }
1596 | },
1597 | "9a0a067c17b64df2a1184986c2412b26": {
1598 | "model_module": "@jupyter-widgets/controls",
1599 | "model_name": "DescriptionStyleModel",
1600 | "model_module_version": "1.5.0",
1601 | "state": {
1602 | "_model_module": "@jupyter-widgets/controls",
1603 | "_model_module_version": "1.5.0",
1604 | "_model_name": "DescriptionStyleModel",
1605 | "_view_count": null,
1606 | "_view_module": "@jupyter-widgets/base",
1607 | "_view_module_version": "1.2.0",
1608 | "_view_name": "StyleView",
1609 | "description_width": ""
1610 | }
1611 | },
1612 | "668ed89f0cdd460498543af635f4dd68": {
1613 | "model_module": "@jupyter-widgets/controls",
1614 | "model_name": "HBoxModel",
1615 | "model_module_version": "1.5.0",
1616 | "state": {
1617 | "_dom_classes": [],
1618 | "_model_module": "@jupyter-widgets/controls",
1619 | "_model_module_version": "1.5.0",
1620 | "_model_name": "HBoxModel",
1621 | "_view_count": null,
1622 | "_view_module": "@jupyter-widgets/controls",
1623 | "_view_module_version": "1.5.0",
1624 | "_view_name": "HBoxView",
1625 | "box_style": "",
1626 | "children": [
1627 | "IPY_MODEL_ae9729bae35748b9b757ab558d50bdc4",
1628 | "IPY_MODEL_31797fec525e43aa9f968b63e95b8aaa",
1629 | "IPY_MODEL_5ec34446ceeb4f37a9f9ca0e43103416"
1630 | ],
1631 | "layout": "IPY_MODEL_f0d3166484384e15ae480e8d42504d52"
1632 | }
1633 | },
1634 | "ae9729bae35748b9b757ab558d50bdc4": {
1635 | "model_module": "@jupyter-widgets/controls",
1636 | "model_name": "HTMLModel",
1637 | "model_module_version": "1.5.0",
1638 | "state": {
1639 | "_dom_classes": [],
1640 | "_model_module": "@jupyter-widgets/controls",
1641 | "_model_module_version": "1.5.0",
1642 | "_model_name": "HTMLModel",
1643 | "_view_count": null,
1644 | "_view_module": "@jupyter-widgets/controls",
1645 | "_view_module_version": "1.5.0",
1646 | "_view_name": "HTMLView",
1647 | "description": "",
1648 | "description_tooltip": null,
1649 | "layout": "IPY_MODEL_560e25624e6844c7b92f705b9a6d06f5",
1650 | "placeholder": "",
1651 | "style": "IPY_MODEL_521e0794f59a4d0c8a74858d8ca8802c",
1652 | "value": "vocab.txt: 100%"
1653 | }
1654 | },
1655 | "31797fec525e43aa9f968b63e95b8aaa": {
1656 | "model_module": "@jupyter-widgets/controls",
1657 | "model_name": "FloatProgressModel",
1658 | "model_module_version": "1.5.0",
1659 | "state": {
1660 | "_dom_classes": [],
1661 | "_model_module": "@jupyter-widgets/controls",
1662 | "_model_module_version": "1.5.0",
1663 | "_model_name": "FloatProgressModel",
1664 | "_view_count": null,
1665 | "_view_module": "@jupyter-widgets/controls",
1666 | "_view_module_version": "1.5.0",
1667 | "_view_name": "ProgressView",
1668 | "bar_style": "success",
1669 | "description": "",
1670 | "description_tooltip": null,
1671 | "layout": "IPY_MODEL_d6ba19a6e2374b34bd7a954a0f29a95a",
1672 | "max": 231508,
1673 | "min": 0,
1674 | "orientation": "horizontal",
1675 | "style": "IPY_MODEL_d04c2e5c9a9d4dc59a4ef5d61f55faf7",
1676 | "value": 231508
1677 | }
1678 | },
1679 | "5ec34446ceeb4f37a9f9ca0e43103416": {
1680 | "model_module": "@jupyter-widgets/controls",
1681 | "model_name": "HTMLModel",
1682 | "model_module_version": "1.5.0",
1683 | "state": {
1684 | "_dom_classes": [],
1685 | "_model_module": "@jupyter-widgets/controls",
1686 | "_model_module_version": "1.5.0",
1687 | "_model_name": "HTMLModel",
1688 | "_view_count": null,
1689 | "_view_module": "@jupyter-widgets/controls",
1690 | "_view_module_version": "1.5.0",
1691 | "_view_name": "HTMLView",
1692 | "description": "",
1693 | "description_tooltip": null,
1694 | "layout": "IPY_MODEL_0b4f45ff04994accb494587a662a648a",
1695 | "placeholder": "",
1696 | "style": "IPY_MODEL_beb8dd797e774760b641d27ecde161bc",
1697 | "value": " 232k/232k [00:00<00:00, 1.69MB/s]"
1698 | }
1699 | },
1700 | "f0d3166484384e15ae480e8d42504d52": {
1701 | "model_module": "@jupyter-widgets/base",
1702 | "model_name": "LayoutModel",
1703 | "model_module_version": "1.2.0",
1704 | "state": {
1705 | "_model_module": "@jupyter-widgets/base",
1706 | "_model_module_version": "1.2.0",
1707 | "_model_name": "LayoutModel",
1708 | "_view_count": null,
1709 | "_view_module": "@jupyter-widgets/base",
1710 | "_view_module_version": "1.2.0",
1711 | "_view_name": "LayoutView",
1712 | "align_content": null,
1713 | "align_items": null,
1714 | "align_self": null,
1715 | "border": null,
1716 | "bottom": null,
1717 | "display": null,
1718 | "flex": null,
1719 | "flex_flow": null,
1720 | "grid_area": null,
1721 | "grid_auto_columns": null,
1722 | "grid_auto_flow": null,
1723 | "grid_auto_rows": null,
1724 | "grid_column": null,
1725 | "grid_gap": null,
1726 | "grid_row": null,
1727 | "grid_template_areas": null,
1728 | "grid_template_columns": null,
1729 | "grid_template_rows": null,
1730 | "height": null,
1731 | "justify_content": null,
1732 | "justify_items": null,
1733 | "left": null,
1734 | "margin": null,
1735 | "max_height": null,
1736 | "max_width": null,
1737 | "min_height": null,
1738 | "min_width": null,
1739 | "object_fit": null,
1740 | "object_position": null,
1741 | "order": null,
1742 | "overflow": null,
1743 | "overflow_x": null,
1744 | "overflow_y": null,
1745 | "padding": null,
1746 | "right": null,
1747 | "top": null,
1748 | "visibility": null,
1749 | "width": null
1750 | }
1751 | },
1752 | "560e25624e6844c7b92f705b9a6d06f5": {
1753 | "model_module": "@jupyter-widgets/base",
1754 | "model_name": "LayoutModel",
1755 | "model_module_version": "1.2.0",
1756 | "state": {
1757 | "_model_module": "@jupyter-widgets/base",
1758 | "_model_module_version": "1.2.0",
1759 | "_model_name": "LayoutModel",
1760 | "_view_count": null,
1761 | "_view_module": "@jupyter-widgets/base",
1762 | "_view_module_version": "1.2.0",
1763 | "_view_name": "LayoutView",
1764 | "align_content": null,
1765 | "align_items": null,
1766 | "align_self": null,
1767 | "border": null,
1768 | "bottom": null,
1769 | "display": null,
1770 | "flex": null,
1771 | "flex_flow": null,
1772 | "grid_area": null,
1773 | "grid_auto_columns": null,
1774 | "grid_auto_flow": null,
1775 | "grid_auto_rows": null,
1776 | "grid_column": null,
1777 | "grid_gap": null,
1778 | "grid_row": null,
1779 | "grid_template_areas": null,
1780 | "grid_template_columns": null,
1781 | "grid_template_rows": null,
1782 | "height": null,
1783 | "justify_content": null,
1784 | "justify_items": null,
1785 | "left": null,
1786 | "margin": null,
1787 | "max_height": null,
1788 | "max_width": null,
1789 | "min_height": null,
1790 | "min_width": null,
1791 | "object_fit": null,
1792 | "object_position": null,
1793 | "order": null,
1794 | "overflow": null,
1795 | "overflow_x": null,
1796 | "overflow_y": null,
1797 | "padding": null,
1798 | "right": null,
1799 | "top": null,
1800 | "visibility": null,
1801 | "width": null
1802 | }
1803 | },
1804 | "521e0794f59a4d0c8a74858d8ca8802c": {
1805 | "model_module": "@jupyter-widgets/controls",
1806 | "model_name": "DescriptionStyleModel",
1807 | "model_module_version": "1.5.0",
1808 | "state": {
1809 | "_model_module": "@jupyter-widgets/controls",
1810 | "_model_module_version": "1.5.0",
1811 | "_model_name": "DescriptionStyleModel",
1812 | "_view_count": null,
1813 | "_view_module": "@jupyter-widgets/base",
1814 | "_view_module_version": "1.2.0",
1815 | "_view_name": "StyleView",
1816 | "description_width": ""
1817 | }
1818 | },
1819 | "d6ba19a6e2374b34bd7a954a0f29a95a": {
1820 | "model_module": "@jupyter-widgets/base",
1821 | "model_name": "LayoutModel",
1822 | "model_module_version": "1.2.0",
1823 | "state": {
1824 | "_model_module": "@jupyter-widgets/base",
1825 | "_model_module_version": "1.2.0",
1826 | "_model_name": "LayoutModel",
1827 | "_view_count": null,
1828 | "_view_module": "@jupyter-widgets/base",
1829 | "_view_module_version": "1.2.0",
1830 | "_view_name": "LayoutView",
1831 | "align_content": null,
1832 | "align_items": null,
1833 | "align_self": null,
1834 | "border": null,
1835 | "bottom": null,
1836 | "display": null,
1837 | "flex": null,
1838 | "flex_flow": null,
1839 | "grid_area": null,
1840 | "grid_auto_columns": null,
1841 | "grid_auto_flow": null,
1842 | "grid_auto_rows": null,
1843 | "grid_column": null,
1844 | "grid_gap": null,
1845 | "grid_row": null,
1846 | "grid_template_areas": null,
1847 | "grid_template_columns": null,
1848 | "grid_template_rows": null,
1849 | "height": null,
1850 | "justify_content": null,
1851 | "justify_items": null,
1852 | "left": null,
1853 | "margin": null,
1854 | "max_height": null,
1855 | "max_width": null,
1856 | "min_height": null,
1857 | "min_width": null,
1858 | "object_fit": null,
1859 | "object_position": null,
1860 | "order": null,
1861 | "overflow": null,
1862 | "overflow_x": null,
1863 | "overflow_y": null,
1864 | "padding": null,
1865 | "right": null,
1866 | "top": null,
1867 | "visibility": null,
1868 | "width": null
1869 | }
1870 | },
1871 | "d04c2e5c9a9d4dc59a4ef5d61f55faf7": {
1872 | "model_module": "@jupyter-widgets/controls",
1873 | "model_name": "ProgressStyleModel",
1874 | "model_module_version": "1.5.0",
1875 | "state": {
1876 | "_model_module": "@jupyter-widgets/controls",
1877 | "_model_module_version": "1.5.0",
1878 | "_model_name": "ProgressStyleModel",
1879 | "_view_count": null,
1880 | "_view_module": "@jupyter-widgets/base",
1881 | "_view_module_version": "1.2.0",
1882 | "_view_name": "StyleView",
1883 | "bar_color": null,
1884 | "description_width": ""
1885 | }
1886 | },
1887 | "0b4f45ff04994accb494587a662a648a": {
1888 | "model_module": "@jupyter-widgets/base",
1889 | "model_name": "LayoutModel",
1890 | "model_module_version": "1.2.0",
1891 | "state": {
1892 | "_model_module": "@jupyter-widgets/base",
1893 | "_model_module_version": "1.2.0",
1894 | "_model_name": "LayoutModel",
1895 | "_view_count": null,
1896 | "_view_module": "@jupyter-widgets/base",
1897 | "_view_module_version": "1.2.0",
1898 | "_view_name": "LayoutView",
1899 | "align_content": null,
1900 | "align_items": null,
1901 | "align_self": null,
1902 | "border": null,
1903 | "bottom": null,
1904 | "display": null,
1905 | "flex": null,
1906 | "flex_flow": null,
1907 | "grid_area": null,
1908 | "grid_auto_columns": null,
1909 | "grid_auto_flow": null,
1910 | "grid_auto_rows": null,
1911 | "grid_column": null,
1912 | "grid_gap": null,
1913 | "grid_row": null,
1914 | "grid_template_areas": null,
1915 | "grid_template_columns": null,
1916 | "grid_template_rows": null,
1917 | "height": null,
1918 | "justify_content": null,
1919 | "justify_items": null,
1920 | "left": null,
1921 | "margin": null,
1922 | "max_height": null,
1923 | "max_width": null,
1924 | "min_height": null,
1925 | "min_width": null,
1926 | "object_fit": null,
1927 | "object_position": null,
1928 | "order": null,
1929 | "overflow": null,
1930 | "overflow_x": null,
1931 | "overflow_y": null,
1932 | "padding": null,
1933 | "right": null,
1934 | "top": null,
1935 | "visibility": null,
1936 | "width": null
1937 | }
1938 | },
1939 | "beb8dd797e774760b641d27ecde161bc": {
1940 | "model_module": "@jupyter-widgets/controls",
1941 | "model_name": "DescriptionStyleModel",
1942 | "model_module_version": "1.5.0",
1943 | "state": {
1944 | "_model_module": "@jupyter-widgets/controls",
1945 | "_model_module_version": "1.5.0",
1946 | "_model_name": "DescriptionStyleModel",
1947 | "_view_count": null,
1948 | "_view_module": "@jupyter-widgets/base",
1949 | "_view_module_version": "1.2.0",
1950 | "_view_name": "StyleView",
1951 | "description_width": ""
1952 | }
1953 | }
1954 | }
1955 | }
1956 | },
1957 | "nbformat": 4,
1958 | "nbformat_minor": 0
1959 | }
--------------------------------------------------------------------------------
/week10_gpt/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week10_gpt/lecture.pdf
--------------------------------------------------------------------------------
/week11_cv_transformers/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week11_cv_transformers/lecture.pdf
--------------------------------------------------------------------------------
/week12_gan/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week12_gan/lecture.pdf
--------------------------------------------------------------------------------
/week13_latent_models/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week13_latent_models/lecture.pdf
--------------------------------------------------------------------------------
/week14_representation_learning/lecture.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ShieldVP/deep-learning-course/aafaa4d3a61ed36ebcfb789196b4014477fe5b55/week14_representation_learning/lecture.pdf
--------------------------------------------------------------------------------