├── Adam and weight decay
│   ├── Tests SGD with Adam and wd.ipynb
│   └── cifar10-dawn-adam.ipynb
├── Bug with frozen LSTM layer.ipynb
├── Building a French LM.ipynb
├── Cache pointer.ipynb
├── Cyclical LR and momentums.ipynb
├── DeepPainterlyHarmonization.ipynb
├── Experiments
│   ├── Cifar10-mixup-cutout.ipynb
│   ├── Post process logs.ipynb
│   ├── multiGPU
│   │   ├── callbacks.py
│   │   ├── databunch.py
│   │   ├── sampler.py
│   │   ├── train_cifar10.py
│   │   └── utils.py
│   └── record_logs.py
├── First neural net in pytorch.ipynb
├── Initialize the bias in the final layer of an SSD.ipynb
├── LM_wikitext.ipynb
├── LM_wikitext_MOTAS.ipynb
├── LM_wikitext_mixup.ipynb
├── Learning rate finder.ipynb
├── Lesson 9 loss function
│   ├── The loss function from scratch.ipynb
│   ├── overlaps0.npy
│   ├── overlaps4.npy
│   ├── pred_bb.npy
│   ├── pred_bb1.npy
│   ├── pred_cls.npy
│   ├── pred_cls1.npy
│   ├── targ_bb.npy
│   └── targ_cls.npy
├── README.md
├── Resnet 50 and Darknet 53.ipynb
├── Retina net Pascal.ipynb
├── Retina net Pascal1.ipynb
├── Understanding the new fastai API for scheduling training.ipynb
├── Using the callback system in fastai.ipynb
├── img
│   ├── FPN.png
│   └── RetinaHead.png
├── mAP
│   ├── Computing the mAP metric.ipynb
│   └── focus-4b.h5
└── wikitext_103.ipynb

--------------------------------------------------------------------------------
/Bug with frozen LSTM layer.ipynb:
--------------------------------------------------------------------------------
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import torch\n",
    "from torch.autograd import Variable as V\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Simple model for repro. We have a pretrained Language Model and want to freeze all of it except the embeddings in a first phase."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = nn.Sequential(nn.Linear(10,20), nn.ReLU(inplace=True), nn.LSTM(20,5, 1)).cuda()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Freeze the parameters linked to the LSTM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "for param in list(model.parameters())[2:]: param.requires_grad=False"
   ]
  },
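  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Added check, not part of the original repro.) To see exactly what the slice `[2:]` froze, list the parameters by name: the first two are the `nn.Linear` weight and bias, and everything after them belongs to the LSTM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sanity check: only the LSTM weights/biases should report requires_grad=False.\n",
    "for name, param in model.named_parameters(): print(name, param.requires_grad)"
   ]
  },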
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Grab a random tensor and feed it to the model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "x = torch.randn(2,4,10).cuda()\n",
    "x.requires_grad = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "z = model(x)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z[0].requires_grad"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "torch.Size([2, 4, 5])"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "z[0].size()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a random target to get some loss."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "y = torch.Tensor([0,1,2,3, 0,1,2,3]).long().cuda()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "loss = F.cross_entropy(z[0].view(-1,5),y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "ename": "RuntimeError",
     "evalue": "inconsistent range for TensorList output",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mRuntimeError\u001b[0m Traceback (most recent call last)",
      "\u001b[1;32m\u001b[0m in \u001b[0;36m\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mloss\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;32m~\\Anaconda3\\envs\\fastai\\lib\\site-packages\\torch\\tensor.py\u001b[0m in \u001b[0;36mbackward\u001b[1;34m(self, gradient, retain_graph, create_graph)\u001b[0m\n\u001b[0;32m 91\u001b[0m \u001b[0mproducts\u001b[0m\u001b[1;33m.\u001b[0m \u001b[0mDefaults\u001b[0m \u001b[0mto\u001b[0m\u001b[0;31m \u001b[0m\u001b[0;31m`\u001b[0m\u001b[0;31m`\u001b[0m\u001b[1;32mFalse\u001b[0m\u001b[0;31m`\u001b[0m\u001b[0;31m`\u001b[0m\u001b[1;33m.\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 92\u001b[0m \"\"\"\n\u001b[1;32m---> 93\u001b[1;33m \u001b[0mtorch\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mautograd\u001b[0m\u001b[1;33m.\u001b[0m\u001b[0mbackward\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgradient\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m 94\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 95\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mregister_hook\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mself\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mhook\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;32m~\\Anaconda3\\envs\\fastai\\lib\\site-packages\\torch\\autograd\\__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[1;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables)\u001b[0m\n\u001b[0;32m 87\u001b[0m Variable._execution_engine.run_backward(\n\u001b[0;32m 88\u001b[0m \u001b[0mtensors\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgrad_tensors\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 89\u001b[1;33m allow_unreachable=True) # allow_unreachable flag\n\u001b[0m\u001b[0;32m 90\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 91\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mRuntimeError\u001b[0m: inconsistent range for TensorList output"
     ]
    }
   ],
   "source": [
    "loss.backward()"
   ]
  },
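  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Added note, a sketch rather than a verified fix.) Assuming the failure is specific to the cuDNN LSTM backward, which seems to expect every RNN weight to require gradients, a possible workaround is to fall back to the native kernels by disabling cuDNN and rerunning the forward/backward pass."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Workaround sketch (assumption: the error comes from the cuDNN code path).\n",
    "torch.backends.cudnn.enabled = False  # fall back to the native (non-cuDNN) RNN kernels\n",
    "z = model(x)                          # rebuild the graph without cuDNN ops\n",
    "loss = F.cross_entropy(z[0].view(-1,5), y)\n",
    "loss.backward()\n",
    "torch.backends.cudnn.enabled = True"
   ]
  },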
"\u001b[1;32m~\\Anaconda3\\envs\\fastai\\lib\\site-packages\\torch\\autograd\\__init__.py\u001b[0m in \u001b[0;36mbackward\u001b[1;34m(tensors, grad_tensors, retain_graph, create_graph, grad_variables)\u001b[0m\n\u001b[0;32m 87\u001b[0m Variable._execution_engine.run_backward(\n\u001b[0;32m 88\u001b[0m \u001b[0mtensors\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mgrad_tensors\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mretain_graph\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0mcreate_graph\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m---> 89\u001b[1;33m allow_unreachable=True) # allow_unreachable flag\n\u001b[0m\u001b[0;32m 90\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m 91\u001b[0m \u001b[1;33m\u001b[0m\u001b[0m\n", 153 | "\u001b[1;31mRuntimeError\u001b[0m: inconsistent range for TensorList output" 154 | ] 155 | } 156 | ], 157 | "source": [ 158 | "loss.backward()" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": null, 164 | "metadata": {}, 165 | "outputs": [], 166 | "source": [] 167 | } 168 | ], 169 | "metadata": { 170 | "kernelspec": { 171 | "display_name": "Python 3", 172 | "language": "python", 173 | "name": "python3" 174 | }, 175 | "language_info": { 176 | "codemirror_mode": { 177 | "name": "ipython", 178 | "version": 3 179 | }, 180 | "file_extension": ".py", 181 | "mimetype": "text/x-python", 182 | "name": "python", 183 | "nbconvert_exporter": "python", 184 | "pygments_lexer": "ipython3", 185 | "version": "3.6.4" 186 | } 187 | }, 188 | "nbformat": 4, 189 | "nbformat_minor": 2 190 | } 191 | -------------------------------------------------------------------------------- /Cache pointer.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "This notebook goes with [this blog post](https://sgugger.github.io/pointer-cache-for-language-model.html#pointer-cache-for-language-model) that explains what the continuous cache pointer is. This technique was introduce by Grave et al. in [this article](https://arxiv.org/pdf/1612.04426.pdf)." 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "%matplotlib inline\n", 17 | "%reload_ext autoreload\n", 18 | "%autoreload 2" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "This notebook uses the [fastai](https://github.com/fastai/fastai) library." 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "outputs": [], 33 | "source": [ 34 | "from fastai.text import *" 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": {}, 40 | "source": [ 41 | "Be sure to change the path to where the data is on your hard drive. The wikitext-2 can be downloaded [here](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/)." 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 3, 47 | "metadata": {}, 48 | "outputs": [], 49 | "source": [ 50 | "EOS = ''\n", 51 | "PATH=Path('../data/wikitext')" 52 | ] 53 | }, 54 | { 55 | "cell_type": "markdown", 56 | "metadata": {}, 57 | "source": [ 58 | "As indicated on their website, we just had the EOS token at the end of each line." 
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%matplotlib inline\n",
    "%reload_ext autoreload\n",
    "%autoreload 2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This notebook uses the [fastai](https://github.com/fastai/fastai) library."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from fastai.text import *"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Be sure to change the path to where the data is on your hard drive. The wikitext-2 dataset can be downloaded [here](https://www.salesforce.com/products/einstein/ai-research/the-wikitext-dependency-language-modeling-dataset/)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "EOS = '<eos>'\n",
    "PATH=Path('../data/wikitext')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As indicated on their website, we just add the EOS token at the end of each line."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "def read_file(filename):\n",
    "    tokens = []\n",
    "    with open(PATH/filename, encoding='utf8') as f:\n",
    "        for line in f:\n",
    "            tokens.append(line.split() + [EOS])\n",
    "    return np.array(tokens)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "trn_tok = read_file('wiki.train.tokens')\n",
    "val_tok = read_file('wiki.valid.tokens')\n",
    "tst_tok = read_file('wiki.test.tokens')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "36718"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(trn_tok)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We numericalize the tokens into ids."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "cnt = Counter(word for sent in trn_tok for word in sent)\n",
    "itos = [o for o,c in cnt.most_common()]\n",
    "itos.insert(0,'_pad_')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "33279"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "vocab_size = len(itos); vocab_size"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And here is the mapping from tokens to ids."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Any word absent from the vocab is mapped to id 5 by default.\n",
    "stoi = collections.defaultdict(lambda : 5, {w:i for i,w in enumerate(itos)})"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "trn_ids = np.array([([stoi[w] for w in s]) for s in trn_tok])\n",
    "val_ids = np.array([([stoi[w] for w in s]) for s in val_tok])\n",
    "tst_ids = np.array([([stoi[w] for w in s]) for s in tst_tok])"
   ]
  },
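  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(Added sanity check.) Mapping the first few ids of a sentence back to tokens with `itos` confirms the round trip."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Round trip: ids -> tokens should reproduce the start of the first training sentence.\n",
    "' '.join(itos[i] for i in trn_ids[0][:10])"
   ]
  },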
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These are the parameters of our model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "em_sz,nh,nl = 400,1150,3\n",
    "drops = np.array([0.6,0.4,0.5,0.05,0.2])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This is just to create a learner object: we won't use it for training since we don't train here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "bptt, bs = 5,2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "trn_dl = LanguageModelLoader(np.concatenate(trn_ids), bs, bptt)\n",
    "val_dl = LanguageModelLoader(np.concatenate(val_ids), bs, bptt)\n",
    "md = LanguageModelData(PATH, 0, vocab_size, trn_dl, val_dl, bs=bs, bptt=bptt)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "opt_fn = partial(optim.SGD, momentum=0.9)\n",
    "learner = md.get_model(opt_fn, em_sz, nh, nl,\n",
    "                       dropouti=drops[0], dropout=drops[1], wdrop=drops[2], dropoute=drops[3], dropouth=drops[4])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model I use as an example is stored [here](https://s3.us-east-2.amazonaws.com/sgugger/best.h5). Be sure to have the file best.h5 in a directory called models inside the directory the variable PATH points to (or replace it with any model you've saved)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "learner.load('best')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's begin by computing how well our model is doing before anything else. To do that we need a way to go through all of our text, but instead of using the fastai LanguageModelLoader (which randomly modifies the bptt) we'll change the code to use a fixed bptt.\n",
    "\n",
    "Also, we don't want to use mini-batches for this validation because that resets the hidden state at each batch, making us lose valuable information. As we will see, it makes a tiny bit of difference."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Comes from the LanguageModelLoader class; I just removed the minibatching and fixed the bptt.\n",
    "# It gives an iterator that spits out chunks of size bptt.\n",
    "class TextReader():\n",
    "    def __init__(self, nums, bptt, backwards=False):\n",
    "        self.bptt,self.backwards = bptt,backwards\n",
    "        self.data = self.batchify(nums)\n",
    "        self.i,self.iter = 0,0\n",
    "        self.n = len(self.data)\n",
    "\n",
    "    def __iter__(self):\n",
    "        self.i,self.iter = 0,0\n",
    "        while self.i < self.n-1 and self.iter