├── AutoEncoders.ipynb ├── BigGAN.ipynb ├── BigGanEx.ipynb ├── Custom_Data_Generator_in_Keras.ipynb ├── Deep_GCN_Spam.ipynb ├── Eager_Execution_Enabled.ipynb ├── Eager_Execution_Gradient_.ipynb ├── GPUvsTPU.ipynb ├── GradientFlow.ipynb ├── LICENSE ├── Localizer.ipynb ├── Mask_RCNN.ipynb ├── ONePlace.ipynb ├── QuickDraw10.ipynb ├── README.md ├── SC_FEGAN.ipynb ├── Sketcher.ipynb ├── Strokes_QuickDraw.ipynb ├── Swift4TF_CIFAR10.ipynb ├── Swift4TF_DeepDream.ipynb ├── TF4ST_MNIST.ipynb ├── TF_2_0.ipynb ├── TF_Swift.ipynb ├── WeightTransfer.ipynb ├── images ├── tmp └── weightrasnfer.png ├── tf_ClassficationLocalization.ipynb ├── tf_Face_SSD.ipynb ├── tf_TransferLearning.ipynb ├── tf_handBbox_esitmation.ipynb ├── tf_pix2pix.ipynb └── unet.ipynb /Deep_GCN_Spam.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Deep GCN Spam.ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "include_colab_link": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "accelerator": "GPU" 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "\"Open" 26 | ] 27 | }, 28 | { 29 | "metadata": { 30 | "id": "UjoTbUQVnCz8", 31 | "colab_type": "code", 32 | "colab": {} 33 | }, 34 | "cell_type": "code", 35 | "source": [ 36 | "!pip install --upgrade torch-scatter\n", 37 | "!pip install --upgrade torch-sparse\n", 38 | "!pip install --upgrade torch-cluster\n", 39 | "!pip install --upgrade torch-spline-conv \n", 40 | "!pip install torch-geometric" 41 | ], 42 | "execution_count": 0, 43 | "outputs": [] 44 | }, 45 | { 46 | "metadata": { 47 | "id": "xjY9vtO9MgoL", 48 | "colab_type": "text" 49 | }, 50 | "cell_type": "markdown", 51 | "source": [ 52 | "![alt text](https://raw.githubusercontent.com/rusty1s/pytorch_geometric/master/docs/source/_static/img/pyg_logo_text.svg?sanitize=true)" 53 | ] 54 | }, 55 | { 56 | "metadata": { 57 | "id": "P3UffAf8M2Gw", 58 | "colab_type": "text" 59 | }, 60 | "cell_type": "markdown", 61 | "source": [ 62 | "# Intorduction" 63 | ] 64 | }, 65 | { 66 | "metadata": { 67 | "id": "_4_eVOI2M4Uo", 68 | "colab_type": "text" 69 | }, 70 | "cell_type": "markdown", 71 | "source": [ 72 | "PyTorch Geometric [PyG](https://github.com/rusty1s/pytorch_geometric) is a geometric deep learning (GDN) extension library for PyTorch. In general GDN is used to generalize deep learning for non-Ecludian data. For the most part, CNN doesn't work very good for 3D shapes, point clouds and graph structures. Moreover, many real life datasets are inherently non-ecludian like social communicatin datasets, molecular structures, network traffic . etc ... \n", 73 | "\n", 74 | "Graph convolutional networks (GCN) come to the rescue to generalize CNNs to work for non-ecludian datasets. The basic architecture is illustrated below \n", 75 | "\n", 76 | "![alt text](https://tkipf.github.io/graph-convolutional-networks/images/gcn_web.png)\n", 77 | "\n", 78 | "where the input is a graph $G = (V,E)$ represented as \n", 79 | "\n", 80 | "* Feature repsentation for each node $N \\times D$ where N is the number of nodes in the graph and $D$ is the number of features per node. \n", 81 | "* A matrix repsentation of the graph in the form $2\\times L$ where $L$ is the number of edges in the graph. Each column in the matrix represents an edge between two nodes. 
\n", 82 | "* Edge attributes of the form $L \\times R$ where R is the number of features per each edge. \n", 83 | "\n", 84 | "The output is of form $N \\times F$ where $F$ is the number of features per each node in the graph. \n", 85 | "\n", 86 | "\n" 87 | ] 88 | }, 89 | { 90 | "metadata": { 91 | "id": "YeA0slcJnQik", 92 | "colab_type": "code", 93 | "colab": {} 94 | }, 95 | "cell_type": "code", 96 | "source": [ 97 | "import numpy as np\n", 98 | "import os.path as osp\n", 99 | "import torch\n", 100 | "import torch.nn.functional as F\n", 101 | "from torch_geometric.nn import SplineConv\n", 102 | "from torch_geometric.data import Data\n", 103 | "from random import shuffle, randint\n", 104 | "import networkx as nx\n", 105 | "import matplotlib.pyplot as plt\n", 106 | "import random " 107 | ], 108 | "execution_count": 0, 109 | "outputs": [] 110 | }, 111 | { 112 | "metadata": { 113 | "id": "6pQ-c3ftL_gp", 114 | "colab_type": "text" 115 | }, 116 | "cell_type": "markdown", 117 | "source": [ 118 | "# Dataset\n", 119 | "\n", 120 | "We will simulate a spammer vs non-spammer graph network. Given a node which represents a client that can send emails to different node (another client). \n", 121 | "\n", 122 | "Spammers have some similarities \n", 123 | "\n", 124 | "* More likely to send lots of emails (more edges)\n", 125 | "* More likely to send lots of data through email (we will represent an edge feature is the number of bytes where the value [0, 1] where 1 represents more bytes sent)\n", 126 | "* Each spammer has an associated trust value which is given by the server. If the node is more likely to be a spammer then the value will be closer to 1. \n", 127 | "\n", 128 | "Non-spammers have the opposite features. In the next code snippet will try to simulate all of these features through randomization\n", 129 | "\n" 130 | ] 131 | }, 132 | { 133 | "metadata": { 134 | "id": "MhlVjcdM7l6H", 135 | "colab_type": "code", 136 | "colab": {} 137 | }, 138 | "cell_type": "code", 139 | "source": [ 140 | "labels = []\n", 141 | "N = 1000 \n", 142 | "nodes = range(0, N)\n", 143 | "node_features = []\n", 144 | "edge_features = []\n", 145 | "\n", 146 | "for node in nodes:\n", 147 | " \n", 148 | " #spammer \n", 149 | " if random.random() > 0.5:\n", 150 | " #more likely to have many connections with a maximum of 1/5 of the nodes in the graph \n", 151 | " nb_nbrs = int(random.random() * (N/5))\n", 152 | " #more likely to have sent many bytes\n", 153 | " node_features.append((random.random()+1) / 2.)\n", 154 | " #more likely to have a high trust value \n", 155 | " edge_features += [(random.random()+2)/3.] 
* nb_nbrs\n", 156 | " #associate a label \n", 157 | " labels.append(1)\n", 158 | " \n", 159 | " #non-spammer \n", 160 | " else:\n", 161 | " #at most connected to 10 nbrs \n", 162 | " nb_nbrs = int(random.random() * 10 + 1)\n", 163 | " #associate more bytes and random bytes \n", 164 | " node_features.append(random.random())\n", 165 | " edge_features += [random.random()] * nb_nbrs\n", 166 | " labels.append(0)\n", 167 | " \n", 168 | " #connect to some random nodes \n", 169 | " nbrs = np.random.choice(nodes, size = nb_nbrs)\n", 170 | " nbrs = nbrs.reshape((1, nb_nbrs))\n", 171 | " \n", 172 | " #add the edges of nbrs \n", 173 | " node_edges = np.concatenate([np.ones((1, nb_nbrs), dtype = np.int32) * node, nbrs], axis = 0)\n", 174 | " \n", 175 | " #add the overall edges \n", 176 | " if node == 0:\n", 177 | " edges = node_edges\n", 178 | " else:\n", 179 | " edges = np.concatenate([edges, node_edges], axis = 1)" 180 | ], 181 | "execution_count": 0, 182 | "outputs": [] 183 | }, 184 | { 185 | "metadata": { 186 | "id": "qvfuQZv5lcM8", 187 | "colab_type": "text" 188 | }, 189 | "cell_type": "markdown", 190 | "source": [ 191 | "Create a data structure " 192 | ] 193 | }, 194 | { 195 | "metadata": { 196 | "id": "W1tyghgVFinu", 197 | "colab_type": "code", 198 | "outputId": "2c970876-76f9-4ef7-c8a5-d9a222b0a768", 199 | "colab": { 200 | "base_uri": "https://localhost:8080/", 201 | "height": 34 202 | } 203 | }, 204 | "cell_type": "code", 205 | "source": [ 206 | "x = torch.tensor(np.expand_dims(node_features, 1), dtype=torch.float)\n", 207 | "y = torch.tensor(labels, dtype=torch.long)\n", 208 | "\n", 209 | "edge_index = torch.tensor(edges, dtype=torch.long)\n", 210 | "edge_attr = torch.tensor(np.expand_dims(edge_features, 1), dtype=torch.float)\n", 211 | "\n", 212 | "data = Data(x = x, edge_index=edge_index, y =y, edge_attr=edge_attr )\n", 213 | "print(data)" 214 | ], 215 | "execution_count": 22, 216 | "outputs": [ 217 | { 218 | "output_type": "stream", 219 | "text": [ 220 | "Data(edge_attr=[49077, 1], edge_index=[2, 49077], x=[1000, 1], y=[1000])\n" 221 | ], 222 | "name": "stdout" 223 | } 224 | ] 225 | }, 226 | { 227 | "metadata": { 228 | "id": "bGcoGWzKlkHy", 229 | "colab_type": "text" 230 | }, 231 | "cell_type": "markdown", 232 | "source": [ 233 | "We will create a trian/test mask where we split the data into training and test. This is necessary because during optimizing the loss when training we don't want to include the nodes part of the testing process " 234 | ] 235 | }, 236 | { 237 | "metadata": { 238 | "id": "WRwBaYmyoLDX", 239 | "colab_type": "code", 240 | "colab": {} 241 | }, 242 | "cell_type": "code", 243 | "source": [ 244 | "data.train_mask = torch.zeros(data.num_nodes, dtype=torch.uint8)\n", 245 | "data.train_mask[:int(0.8 * data.num_nodes)] = 1 #train only on the 80% nodes\n", 246 | "data.test_mask = torch.zeros(data.num_nodes, dtype=torch.uint8) #test on 20 % nodes \n", 247 | "data.test_mask[- int(0.2 * data.num_nodes):] = 1" 248 | ], 249 | "execution_count": 0, 250 | "outputs": [] 251 | }, 252 | { 253 | "metadata": { 254 | "id": "H2YFmL6kl5Dh", 255 | "colab_type": "text" 256 | }, 257 | "cell_type": "markdown", 258 | "source": [ 259 | "# Deep GCN\n", 260 | "\n", 261 | "We will use [SplineConv](https://arxiv.org/abs/1711.08920) layer for the convolution. 
We will illsue exponential ReLU as an activation function and dropout for regulaization" 262 | ] 263 | }, 264 | { 265 | "metadata": { 266 | "id": "MTlX4IBkoOnm", 267 | "colab_type": "code", 268 | "colab": {} 269 | }, 270 | "cell_type": "code", 271 | "source": [ 272 | "class Net(torch.nn.Module):\n", 273 | " def __init__(self):\n", 274 | " super(Net, self).__init__()\n", 275 | " self.conv1 = SplineConv(1, 16, dim=1, kernel_size=5)\n", 276 | " self.conv2 = SplineConv(16, 32, dim=1, kernel_size=5)\n", 277 | " self.conv3 = SplineConv(32, 64, dim=1, kernel_size=7)\n", 278 | " self.conv4 = SplineConv(64, 128, dim=1, kernel_size=7)\n", 279 | " self.conv5 = SplineConv(128, 128, dim=1, kernel_size=11)\n", 280 | " self.conv6 = SplineConv(128, 2, dim=1, kernel_size=11)\n", 281 | "\n", 282 | " def forward(self):\n", 283 | " x, edge_index, edge_attr = data.x, data.edge_index, data.edge_attr\n", 284 | " x = F.elu(self.conv1(x, edge_index, edge_attr))\n", 285 | " x = self.conv2(x, edge_index, edge_attr)\n", 286 | " x = F.elu(self.conv3(x, edge_index, edge_attr))\n", 287 | " x = self.conv4(x, edge_index, edge_attr)\n", 288 | " x = F.elu(self.conv5(x, edge_index, edge_attr))\n", 289 | " x = self.conv6(x, edge_index, edge_attr)\n", 290 | " x = F.dropout(x, training = self.training)\n", 291 | " return F.log_softmax(x, dim=1)" 292 | ], 293 | "execution_count": 0, 294 | "outputs": [] 295 | }, 296 | { 297 | "metadata": { 298 | "id": "pULYL97tmYel", 299 | "colab_type": "text" 300 | }, 301 | "cell_type": "markdown", 302 | "source": [ 303 | "# Optimization \n", 304 | "\n", 305 | "We will use nll_loss which can be used for classification of arbitrary classes" 306 | ] 307 | }, 308 | { 309 | "metadata": { 310 | "id": "Hhabp4QvoP6V", 311 | "colab_type": "code", 312 | "colab": {} 313 | }, 314 | "cell_type": "code", 315 | "source": [ 316 | "def evaluate_loss(mode = 'train'):\n", 317 | " \n", 318 | " #use masking for loss evaluation \n", 319 | " if mode == 'train':\n", 320 | " loss = F.nll_loss(model()[data.train_mask], data.y[data.train_mask])\n", 321 | " else:\n", 322 | " loss = F.nll_loss(model()[data.test_mask], data.y[data.test_mask])\n", 323 | " return loss\n", 324 | "\n", 325 | "def train():\n", 326 | " #training \n", 327 | " model.train()\n", 328 | " optimizer.zero_grad()\n", 329 | " loss = evaluate_loss()\n", 330 | " loss.backward()\n", 331 | " optimizer.step()\n", 332 | " return loss.detach().cpu().numpy() \n", 333 | "\n", 334 | "def test():\n", 335 | " #testing \n", 336 | " model.eval()\n", 337 | " logits, accs = model(), []\n", 338 | " loss = evaluate_loss(mode = 'test').detach().cpu().numpy() \n", 339 | "\n", 340 | " for _, mask in data('train_mask', 'test_mask'):\n", 341 | " pred = logits[mask].max(1)[1]\n", 342 | " acc = pred.eq(data.y[mask]).sum().item() / mask.sum().item()\n", 343 | " accs.append(acc)\n", 344 | " return [loss] + accs" 345 | ], 346 | "execution_count": 0, 347 | "outputs": [] 348 | }, 349 | { 350 | "metadata": { 351 | "id": "y0XicLqpmqwR", 352 | "colab_type": "text" 353 | }, 354 | "cell_type": "markdown", 355 | "source": [ 356 | "# Setup the model \n", 357 | "We will create the model and setup training using adam optimizer" 358 | ] 359 | }, 360 | { 361 | "metadata": { 362 | "id": "sDvcl5eLoRb3", 363 | "colab_type": "code", 364 | "colab": {} 365 | }, 366 | "cell_type": "code", 367 | "source": [ 368 | "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", 369 | "model, data = Net().to(device), data.to(device)\n", 370 | "optimizer = torch.optim.Adam(model.parameters(), 
lr=1e-3)" 371 | ], 372 | "execution_count": 0, 373 | "outputs": [] 374 | }, 375 | { 376 | "metadata": { 377 | "id": "FyyfCGZimtX2", 378 | "colab_type": "text" 379 | }, 380 | "cell_type": "markdown", 381 | "source": [ 382 | "# Training and Testing" 383 | ] 384 | }, 385 | { 386 | "metadata": { 387 | "id": "qsslw_68oS52", 388 | "colab_type": "code", 389 | "colab": { 390 | "base_uri": "https://localhost:8080/", 391 | "height": 3400 392 | }, 393 | "outputId": "aa47e6c7-8985-4409-bfe5-805c0e3df3ae" 394 | }, 395 | "cell_type": "code", 396 | "source": [ 397 | "losses = []\n", 398 | "for epoch in range(1, 200):\n", 399 | " train_loss = train()\n", 400 | " log = 'Epoch: {:03d}, train_loss: {:.3f}, test_loss:{:.3f}, train_acc: {:.2f}, test_acc: {:.2f}'\n", 401 | " test_loss = test()[0]\n", 402 | " losses.append([train_loss,test_loss])\n", 403 | " print(log.format(epoch, train_loss, *test()))" 404 | ], 405 | "execution_count": 27, 406 | "outputs": [ 407 | { 408 | "output_type": "stream", 409 | "text": [ 410 | "Epoch: 001, train_loss: 0.692, test_loss:0.687, train_acc: 0.54, test_acc: 0.51\n", 411 | "Epoch: 002, train_loss: 0.686, test_loss:0.680, train_acc: 0.80, test_acc: 0.81\n", 412 | "Epoch: 003, train_loss: 0.680, test_loss:0.670, train_acc: 0.82, test_acc: 0.83\n", 413 | "Epoch: 004, train_loss: 0.671, test_loss:0.656, train_acc: 0.82, test_acc: 0.84\n", 414 | "Epoch: 005, train_loss: 0.657, test_loss:0.635, train_acc: 0.82, test_acc: 0.84\n", 415 | "Epoch: 006, train_loss: 0.639, test_loss:0.606, train_acc: 0.82, test_acc: 0.84\n", 416 | "Epoch: 007, train_loss: 0.613, test_loss:0.570, train_acc: 0.82, test_acc: 0.84\n", 417 | "Epoch: 008, train_loss: 0.585, test_loss:0.525, train_acc: 0.82, test_acc: 0.84\n", 418 | "Epoch: 009, train_loss: 0.554, test_loss:0.477, train_acc: 0.82, test_acc: 0.84\n", 419 | "Epoch: 010, train_loss: 0.513, test_loss:0.433, train_acc: 0.82, test_acc: 0.84\n", 420 | "Epoch: 011, train_loss: 0.503, test_loss:0.403, train_acc: 0.82, test_acc: 0.84\n", 421 | "Epoch: 012, train_loss: 0.524, test_loss:0.388, train_acc: 0.82, test_acc: 0.84\n", 422 | "Epoch: 013, train_loss: 0.501, test_loss:0.385, train_acc: 0.82, test_acc: 0.83\n", 423 | "Epoch: 014, train_loss: 0.478, test_loss:0.393, train_acc: 0.82, test_acc: 0.83\n", 424 | "Epoch: 015, train_loss: 0.466, test_loss:0.414, train_acc: 0.81, test_acc: 0.82\n", 425 | "Epoch: 016, train_loss: 0.476, test_loss:0.427, train_acc: 0.81, test_acc: 0.77\n", 426 | "Epoch: 017, train_loss: 0.472, test_loss:0.416, train_acc: 0.81, test_acc: 0.80\n", 427 | "Epoch: 018, train_loss: 0.480, test_loss:0.398, train_acc: 0.82, test_acc: 0.83\n", 428 | "Epoch: 019, train_loss: 0.483, test_loss:0.389, train_acc: 0.82, test_acc: 0.84\n", 429 | "Epoch: 020, train_loss: 0.463, test_loss:0.385, train_acc: 0.82, test_acc: 0.84\n", 430 | "Epoch: 021, train_loss: 0.460, test_loss:0.386, train_acc: 0.82, test_acc: 0.84\n", 431 | "Epoch: 022, train_loss: 0.467, test_loss:0.392, train_acc: 0.82, test_acc: 0.84\n", 432 | "Epoch: 023, train_loss: 0.462, test_loss:0.400, train_acc: 0.82, test_acc: 0.84\n", 433 | "Epoch: 024, train_loss: 0.462, test_loss:0.406, train_acc: 0.81, test_acc: 0.83\n", 434 | "Epoch: 025, train_loss: 0.459, test_loss:0.405, train_acc: 0.82, test_acc: 0.84\n", 435 | "Epoch: 026, train_loss: 0.468, test_loss:0.400, train_acc: 0.82, test_acc: 0.84\n", 436 | "Epoch: 027, train_loss: 0.472, test_loss:0.393, train_acc: 0.82, test_acc: 0.84\n", 437 | "Epoch: 028, train_loss: 0.461, test_loss:0.389, train_acc: 0.82, test_acc: 0.84\n", 
438 | "Epoch: 029, train_loss: 0.476, test_loss:0.387, train_acc: 0.82, test_acc: 0.84\n", 439 | "Epoch: 030, train_loss: 0.476, test_loss:0.387, train_acc: 0.82, test_acc: 0.84\n", 440 | "Epoch: 031, train_loss: 0.459, test_loss:0.387, train_acc: 0.82, test_acc: 0.84\n", 441 | "Epoch: 032, train_loss: 0.466, test_loss:0.389, train_acc: 0.82, test_acc: 0.84\n", 442 | "Epoch: 033, train_loss: 0.457, test_loss:0.388, train_acc: 0.83, test_acc: 0.84\n", 443 | "Epoch: 034, train_loss: 0.463, test_loss:0.386, train_acc: 0.82, test_acc: 0.84\n", 444 | "Epoch: 035, train_loss: 0.454, test_loss:0.385, train_acc: 0.82, test_acc: 0.85\n", 445 | "Epoch: 036, train_loss: 0.456, test_loss:0.385, train_acc: 0.82, test_acc: 0.85\n", 446 | "Epoch: 037, train_loss: 0.466, test_loss:0.386, train_acc: 0.82, test_acc: 0.85\n", 447 | "Epoch: 038, train_loss: 0.452, test_loss:0.387, train_acc: 0.83, test_acc: 0.85\n", 448 | "Epoch: 039, train_loss: 0.462, test_loss:0.389, train_acc: 0.83, test_acc: 0.85\n", 449 | "Epoch: 040, train_loss: 0.456, test_loss:0.389, train_acc: 0.83, test_acc: 0.85\n", 450 | "Epoch: 041, train_loss: 0.455, test_loss:0.388, train_acc: 0.83, test_acc: 0.85\n", 451 | "Epoch: 042, train_loss: 0.446, test_loss:0.387, train_acc: 0.83, test_acc: 0.85\n", 452 | "Epoch: 043, train_loss: 0.466, test_loss:0.386, train_acc: 0.83, test_acc: 0.85\n", 453 | "Epoch: 044, train_loss: 0.456, test_loss:0.386, train_acc: 0.83, test_acc: 0.85\n", 454 | "Epoch: 045, train_loss: 0.450, test_loss:0.384, train_acc: 0.83, test_acc: 0.85\n", 455 | "Epoch: 046, train_loss: 0.446, test_loss:0.383, train_acc: 0.83, test_acc: 0.85\n", 456 | "Epoch: 047, train_loss: 0.440, test_loss:0.383, train_acc: 0.83, test_acc: 0.85\n", 457 | "Epoch: 048, train_loss: 0.469, test_loss:0.383, train_acc: 0.83, test_acc: 0.85\n", 458 | "Epoch: 049, train_loss: 0.456, test_loss:0.383, train_acc: 0.83, test_acc: 0.84\n", 459 | "Epoch: 050, train_loss: 0.445, test_loss:0.381, train_acc: 0.83, test_acc: 0.84\n", 460 | "Epoch: 051, train_loss: 0.456, test_loss:0.378, train_acc: 0.83, test_acc: 0.84\n", 461 | "Epoch: 052, train_loss: 0.453, test_loss:0.376, train_acc: 0.83, test_acc: 0.84\n", 462 | "Epoch: 053, train_loss: 0.444, test_loss:0.374, train_acc: 0.83, test_acc: 0.84\n", 463 | "Epoch: 054, train_loss: 0.451, test_loss:0.372, train_acc: 0.83, test_acc: 0.84\n", 464 | "Epoch: 055, train_loss: 0.447, test_loss:0.371, train_acc: 0.83, test_acc: 0.85\n", 465 | "Epoch: 056, train_loss: 0.446, test_loss:0.370, train_acc: 0.83, test_acc: 0.85\n", 466 | "Epoch: 057, train_loss: 0.433, test_loss:0.369, train_acc: 0.83, test_acc: 0.85\n", 467 | "Epoch: 058, train_loss: 0.446, test_loss:0.365, train_acc: 0.83, test_acc: 0.85\n", 468 | "Epoch: 059, train_loss: 0.438, test_loss:0.363, train_acc: 0.83, test_acc: 0.85\n", 469 | "Epoch: 060, train_loss: 0.443, test_loss:0.360, train_acc: 0.83, test_acc: 0.85\n", 470 | "Epoch: 061, train_loss: 0.446, test_loss:0.359, train_acc: 0.83, test_acc: 0.85\n", 471 | "Epoch: 062, train_loss: 0.431, test_loss:0.360, train_acc: 0.84, test_acc: 0.86\n", 472 | "Epoch: 063, train_loss: 0.437, test_loss:0.360, train_acc: 0.84, test_acc: 0.86\n", 473 | "Epoch: 064, train_loss: 0.419, test_loss:0.357, train_acc: 0.85, test_acc: 0.86\n", 474 | "Epoch: 065, train_loss: 0.416, test_loss:0.353, train_acc: 0.85, test_acc: 0.87\n", 475 | "Epoch: 066, train_loss: 0.429, test_loss:0.349, train_acc: 0.85, test_acc: 0.87\n", 476 | "Epoch: 067, train_loss: 0.415, test_loss:0.343, train_acc: 0.86, test_acc: 0.88\n", 477 | 
"Epoch: 068, train_loss: 0.401, test_loss:0.337, train_acc: 0.87, test_acc: 0.88\n", 478 | "Epoch: 069, train_loss: 0.403, test_loss:0.330, train_acc: 0.87, test_acc: 0.88\n", 479 | "Epoch: 070, train_loss: 0.408, test_loss:0.325, train_acc: 0.88, test_acc: 0.89\n", 480 | "Epoch: 071, train_loss: 0.414, test_loss:0.324, train_acc: 0.89, test_acc: 0.89\n", 481 | "Epoch: 072, train_loss: 0.389, test_loss:0.323, train_acc: 0.89, test_acc: 0.89\n", 482 | "Epoch: 073, train_loss: 0.406, test_loss:0.318, train_acc: 0.91, test_acc: 0.90\n", 483 | "Epoch: 074, train_loss: 0.388, test_loss:0.310, train_acc: 0.91, test_acc: 0.90\n", 484 | "Epoch: 075, train_loss: 0.394, test_loss:0.304, train_acc: 0.91, test_acc: 0.90\n", 485 | "Epoch: 076, train_loss: 0.372, test_loss:0.300, train_acc: 0.90, test_acc: 0.91\n", 486 | "Epoch: 077, train_loss: 0.353, test_loss:0.298, train_acc: 0.90, test_acc: 0.91\n", 487 | "Epoch: 078, train_loss: 0.378, test_loss:0.298, train_acc: 0.90, test_acc: 0.91\n", 488 | "Epoch: 079, train_loss: 0.370, test_loss:0.299, train_acc: 0.91, test_acc: 0.91\n", 489 | "Epoch: 080, train_loss: 0.375, test_loss:0.303, train_acc: 0.91, test_acc: 0.91\n", 490 | "Epoch: 081, train_loss: 0.374, test_loss:0.309, train_acc: 0.89, test_acc: 0.90\n", 491 | "Epoch: 082, train_loss: 0.381, test_loss:0.312, train_acc: 0.89, test_acc: 0.90\n", 492 | "Epoch: 083, train_loss: 0.382, test_loss:0.304, train_acc: 0.91, test_acc: 0.90\n", 493 | "Epoch: 084, train_loss: 0.397, test_loss:0.304, train_acc: 0.91, test_acc: 0.90\n", 494 | "Epoch: 085, train_loss: 0.363, test_loss:0.306, train_acc: 0.90, test_acc: 0.90\n", 495 | "Epoch: 086, train_loss: 0.369, test_loss:0.309, train_acc: 0.90, test_acc: 0.90\n", 496 | "Epoch: 087, train_loss: 0.389, test_loss:0.316, train_acc: 0.89, test_acc: 0.89\n", 497 | "Epoch: 088, train_loss: 0.396, test_loss:0.314, train_acc: 0.90, test_acc: 0.90\n", 498 | "Epoch: 089, train_loss: 0.398, test_loss:0.312, train_acc: 0.90, test_acc: 0.90\n", 499 | "Epoch: 090, train_loss: 0.359, test_loss:0.312, train_acc: 0.91, test_acc: 0.91\n", 500 | "Epoch: 091, train_loss: 0.378, test_loss:0.316, train_acc: 0.90, test_acc: 0.90\n", 501 | "Epoch: 092, train_loss: 0.370, test_loss:0.319, train_acc: 0.90, test_acc: 0.91\n", 502 | "Epoch: 093, train_loss: 0.379, test_loss:0.321, train_acc: 0.90, test_acc: 0.90\n", 503 | "Epoch: 094, train_loss: 0.341, test_loss:0.313, train_acc: 0.91, test_acc: 0.91\n", 504 | "Epoch: 095, train_loss: 0.377, test_loss:0.312, train_acc: 0.91, test_acc: 0.91\n", 505 | "Epoch: 096, train_loss: 0.371, test_loss:0.315, train_acc: 0.90, test_acc: 0.91\n", 506 | "Epoch: 097, train_loss: 0.374, test_loss:0.322, train_acc: 0.91, test_acc: 0.90\n", 507 | "Epoch: 098, train_loss: 0.367, test_loss:0.319, train_acc: 0.91, test_acc: 0.91\n", 508 | "Epoch: 099, train_loss: 0.359, test_loss:0.315, train_acc: 0.91, test_acc: 0.91\n", 509 | "Epoch: 100, train_loss: 0.365, test_loss:0.312, train_acc: 0.91, test_acc: 0.91\n", 510 | "Epoch: 101, train_loss: 0.377, test_loss:0.315, train_acc: 0.91, test_acc: 0.91\n", 511 | "Epoch: 102, train_loss: 0.352, test_loss:0.323, train_acc: 0.91, test_acc: 0.91\n", 512 | "Epoch: 103, train_loss: 0.362, test_loss:0.327, train_acc: 0.90, test_acc: 0.88\n", 513 | "Epoch: 104, train_loss: 0.362, test_loss:0.312, train_acc: 0.91, test_acc: 0.91\n", 514 | "Epoch: 105, train_loss: 0.356, test_loss:0.307, train_acc: 0.91, test_acc: 0.92\n", 515 | "Epoch: 106, train_loss: 0.346, test_loss:0.309, train_acc: 0.92, test_acc: 0.92\n", 516 | 
"Epoch: 107, train_loss: 0.366, test_loss:0.318, train_acc: 0.91, test_acc: 0.90\n", 517 | "Epoch: 108, train_loss: 0.360, test_loss:0.322, train_acc: 0.91, test_acc: 0.89\n", 518 | "Epoch: 109, train_loss: 0.377, test_loss:0.311, train_acc: 0.92, test_acc: 0.92\n", 519 | "Epoch: 110, train_loss: 0.362, test_loss:0.304, train_acc: 0.91, test_acc: 0.92\n", 520 | "Epoch: 111, train_loss: 0.396, test_loss:0.304, train_acc: 0.91, test_acc: 0.92\n", 521 | "Epoch: 112, train_loss: 0.360, test_loss:0.314, train_acc: 0.91, test_acc: 0.90\n", 522 | "Epoch: 113, train_loss: 0.368, test_loss:0.332, train_acc: 0.89, test_acc: 0.85\n", 523 | "Epoch: 114, train_loss: 0.352, test_loss:0.309, train_acc: 0.92, test_acc: 0.91\n", 524 | "Epoch: 115, train_loss: 0.355, test_loss:0.302, train_acc: 0.91, test_acc: 0.92\n", 525 | "Epoch: 116, train_loss: 0.352, test_loss:0.301, train_acc: 0.91, test_acc: 0.92\n", 526 | "Epoch: 117, train_loss: 0.371, test_loss:0.307, train_acc: 0.92, test_acc: 0.91\n", 527 | "Epoch: 118, train_loss: 0.364, test_loss:0.336, train_acc: 0.90, test_acc: 0.84\n", 528 | "Epoch: 119, train_loss: 0.361, test_loss:0.325, train_acc: 0.90, test_acc: 0.86\n", 529 | "Epoch: 120, train_loss: 0.345, test_loss:0.304, train_acc: 0.92, test_acc: 0.92\n", 530 | "Epoch: 121, train_loss: 0.359, test_loss:0.306, train_acc: 0.91, test_acc: 0.92\n", 531 | "Epoch: 122, train_loss: 0.357, test_loss:0.304, train_acc: 0.92, test_acc: 0.92\n", 532 | "Epoch: 123, train_loss: 0.367, test_loss:0.312, train_acc: 0.92, test_acc: 0.91\n", 533 | "Epoch: 124, train_loss: 0.346, test_loss:0.325, train_acc: 0.92, test_acc: 0.87\n", 534 | "Epoch: 125, train_loss: 0.350, test_loss:0.317, train_acc: 0.92, test_acc: 0.89\n", 535 | "Epoch: 126, train_loss: 0.354, test_loss:0.307, train_acc: 0.92, test_acc: 0.92\n", 536 | "Epoch: 127, train_loss: 0.325, test_loss:0.304, train_acc: 0.93, test_acc: 0.92\n", 537 | "Epoch: 128, train_loss: 0.363, test_loss:0.306, train_acc: 0.93, test_acc: 0.92\n", 538 | "Epoch: 129, train_loss: 0.348, test_loss:0.312, train_acc: 0.93, test_acc: 0.91\n", 539 | "Epoch: 130, train_loss: 0.330, test_loss:0.320, train_acc: 0.91, test_acc: 0.89\n", 540 | "Epoch: 131, train_loss: 0.346, test_loss:0.314, train_acc: 0.92, test_acc: 0.90\n", 541 | "Epoch: 132, train_loss: 0.336, test_loss:0.309, train_acc: 0.93, test_acc: 0.91\n", 542 | "Epoch: 133, train_loss: 0.336, test_loss:0.306, train_acc: 0.92, test_acc: 0.92\n", 543 | "Epoch: 134, train_loss: 0.346, test_loss:0.307, train_acc: 0.93, test_acc: 0.92\n", 544 | "Epoch: 135, train_loss: 0.359, test_loss:0.312, train_acc: 0.93, test_acc: 0.91\n", 545 | "Epoch: 136, train_loss: 0.336, test_loss:0.317, train_acc: 0.92, test_acc: 0.89\n", 546 | "Epoch: 137, train_loss: 0.332, test_loss:0.314, train_acc: 0.93, test_acc: 0.91\n", 547 | "Epoch: 138, train_loss: 0.353, test_loss:0.311, train_acc: 0.93, test_acc: 0.91\n", 548 | "Epoch: 139, train_loss: 0.333, test_loss:0.311, train_acc: 0.93, test_acc: 0.92\n", 549 | "Epoch: 140, train_loss: 0.359, test_loss:0.313, train_acc: 0.93, test_acc: 0.92\n", 550 | "Epoch: 141, train_loss: 0.362, test_loss:0.323, train_acc: 0.91, test_acc: 0.88\n", 551 | "Epoch: 142, train_loss: 0.359, test_loss:0.327, train_acc: 0.91, test_acc: 0.88\n", 552 | "Epoch: 143, train_loss: 0.336, test_loss:0.312, train_acc: 0.93, test_acc: 0.91\n", 553 | "Epoch: 144, train_loss: 0.337, test_loss:0.309, train_acc: 0.93, test_acc: 0.91\n", 554 | "Epoch: 145, train_loss: 0.356, test_loss:0.310, train_acc: 0.93, test_acc: 0.92\n", 555 | 
"Epoch: 146, train_loss: 0.336, test_loss:0.315, train_acc: 0.93, test_acc: 0.90\n", 556 | "Epoch: 147, train_loss: 0.357, test_loss:0.327, train_acc: 0.92, test_acc: 0.87\n", 557 | "Epoch: 148, train_loss: 0.340, test_loss:0.320, train_acc: 0.93, test_acc: 0.90\n", 558 | "Epoch: 149, train_loss: 0.349, test_loss:0.314, train_acc: 0.93, test_acc: 0.91\n", 559 | "Epoch: 150, train_loss: 0.352, test_loss:0.314, train_acc: 0.93, test_acc: 0.91\n", 560 | "Epoch: 151, train_loss: 0.330, test_loss:0.315, train_acc: 0.94, test_acc: 0.91\n", 561 | "Epoch: 152, train_loss: 0.331, test_loss:0.318, train_acc: 0.93, test_acc: 0.91\n", 562 | "Epoch: 153, train_loss: 0.341, test_loss:0.321, train_acc: 0.93, test_acc: 0.90\n", 563 | "Epoch: 154, train_loss: 0.340, test_loss:0.323, train_acc: 0.93, test_acc: 0.89\n", 564 | "Epoch: 155, train_loss: 0.351, test_loss:0.319, train_acc: 0.93, test_acc: 0.91\n", 565 | "Epoch: 156, train_loss: 0.340, test_loss:0.318, train_acc: 0.94, test_acc: 0.91\n", 566 | "Epoch: 157, train_loss: 0.360, test_loss:0.320, train_acc: 0.93, test_acc: 0.91\n", 567 | "Epoch: 158, train_loss: 0.336, test_loss:0.324, train_acc: 0.94, test_acc: 0.91\n", 568 | "Epoch: 159, train_loss: 0.336, test_loss:0.324, train_acc: 0.94, test_acc: 0.91\n", 569 | "Epoch: 160, train_loss: 0.331, test_loss:0.323, train_acc: 0.94, test_acc: 0.90\n", 570 | "Epoch: 161, train_loss: 0.335, test_loss:0.324, train_acc: 0.94, test_acc: 0.92\n", 571 | "Epoch: 162, train_loss: 0.359, test_loss:0.324, train_acc: 0.94, test_acc: 0.90\n", 572 | "Epoch: 163, train_loss: 0.338, test_loss:0.326, train_acc: 0.94, test_acc: 0.91\n", 573 | "Epoch: 164, train_loss: 0.321, test_loss:0.328, train_acc: 0.93, test_acc: 0.90\n", 574 | "Epoch: 165, train_loss: 0.328, test_loss:0.326, train_acc: 0.93, test_acc: 0.90\n", 575 | "Epoch: 166, train_loss: 0.322, test_loss:0.324, train_acc: 0.94, test_acc: 0.91\n", 576 | "Epoch: 167, train_loss: 0.342, test_loss:0.325, train_acc: 0.94, test_acc: 0.91\n", 577 | "Epoch: 168, train_loss: 0.312, test_loss:0.326, train_acc: 0.94, test_acc: 0.91\n", 578 | "Epoch: 169, train_loss: 0.331, test_loss:0.328, train_acc: 0.94, test_acc: 0.90\n", 579 | "Epoch: 170, train_loss: 0.312, test_loss:0.326, train_acc: 0.94, test_acc: 0.91\n", 580 | "Epoch: 171, train_loss: 0.341, test_loss:0.326, train_acc: 0.94, test_acc: 0.91\n", 581 | "Epoch: 172, train_loss: 0.318, test_loss:0.326, train_acc: 0.94, test_acc: 0.91\n", 582 | "Epoch: 173, train_loss: 0.333, test_loss:0.326, train_acc: 0.94, test_acc: 0.91\n", 583 | "Epoch: 174, train_loss: 0.333, test_loss:0.326, train_acc: 0.94, test_acc: 0.91\n", 584 | "Epoch: 175, train_loss: 0.354, test_loss:0.331, train_acc: 0.93, test_acc: 0.89\n", 585 | "Epoch: 176, train_loss: 0.321, test_loss:0.326, train_acc: 0.94, test_acc: 0.89\n", 586 | "Epoch: 177, train_loss: 0.321, test_loss:0.318, train_acc: 0.95, test_acc: 0.91\n", 587 | "Epoch: 178, train_loss: 0.341, test_loss:0.316, train_acc: 0.94, test_acc: 0.91\n", 588 | "Epoch: 179, train_loss: 0.320, test_loss:0.317, train_acc: 0.94, test_acc: 0.91\n", 589 | "Epoch: 180, train_loss: 0.301, test_loss:0.322, train_acc: 0.94, test_acc: 0.89\n", 590 | "Epoch: 181, train_loss: 0.303, test_loss:0.318, train_acc: 0.94, test_acc: 0.90\n", 591 | "Epoch: 182, train_loss: 0.310, test_loss:0.317, train_acc: 0.95, test_acc: 0.92\n", 592 | "Epoch: 183, train_loss: 0.320, test_loss:0.321, train_acc: 0.94, test_acc: 0.91\n", 593 | "Epoch: 184, train_loss: 0.313, test_loss:0.325, train_acc: 0.94, test_acc: 0.91\n", 594 | 
"Epoch: 185, train_loss: 0.335, test_loss:0.327, train_acc: 0.93, test_acc: 0.89\n", 595 | "Epoch: 186, train_loss: 0.316, test_loss:0.321, train_acc: 0.94, test_acc: 0.91\n", 596 | "Epoch: 187, train_loss: 0.302, test_loss:0.319, train_acc: 0.94, test_acc: 0.92\n", 597 | "Epoch: 188, train_loss: 0.306, test_loss:0.320, train_acc: 0.94, test_acc: 0.91\n", 598 | "Epoch: 189, train_loss: 0.343, test_loss:0.318, train_acc: 0.94, test_acc: 0.91\n", 599 | "Epoch: 190, train_loss: 0.314, test_loss:0.311, train_acc: 0.94, test_acc: 0.91\n", 600 | "Epoch: 191, train_loss: 0.310, test_loss:0.315, train_acc: 0.94, test_acc: 0.90\n", 601 | "Epoch: 192, train_loss: 0.334, test_loss:0.315, train_acc: 0.94, test_acc: 0.91\n", 602 | "Epoch: 193, train_loss: 0.316, test_loss:0.312, train_acc: 0.94, test_acc: 0.92\n", 603 | "Epoch: 194, train_loss: 0.328, test_loss:0.315, train_acc: 0.94, test_acc: 0.90\n", 604 | "Epoch: 195, train_loss: 0.303, test_loss:0.324, train_acc: 0.94, test_acc: 0.89\n", 605 | "Epoch: 196, train_loss: 0.304, test_loss:0.318, train_acc: 0.94, test_acc: 0.91\n", 606 | "Epoch: 197, train_loss: 0.328, test_loss:0.309, train_acc: 0.95, test_acc: 0.91\n", 607 | "Epoch: 198, train_loss: 0.323, test_loss:0.309, train_acc: 0.95, test_acc: 0.91\n", 608 | "Epoch: 199, train_loss: 0.296, test_loss:0.312, train_acc: 0.95, test_acc: 0.91\n" 609 | ], 610 | "name": "stdout" 611 | } 612 | ] 613 | }, 614 | { 615 | "metadata": { 616 | "id": "adWu02_enxNp", 617 | "colab_type": "text" 618 | }, 619 | "cell_type": "markdown", 620 | "source": [ 621 | "# References\n", 622 | "[1] https://github.com/rusty1s/pytorch_geometric\n", 623 | "\n", 624 | "[2] https://rusty1s.github.io/pytorch_geometric/build/html/notes/introduction.html\n", 625 | "\n", 626 | "[3] https://tkipf.github.io/graph-convolutional-networks/" 627 | ] 628 | } 629 | ] 630 | } -------------------------------------------------------------------------------- /Eager_Execution_Gradient_.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Eager Execution - Gradient .ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "include_colab_link": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "accelerator": "GPU" 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "[View in Colaboratory](https://colab.research.google.com/github/zaidalyafeai/Notebooks/blob/master/Eager_Execution_Gradient_.ipynb)" 26 | ] 27 | }, 28 | { 29 | "metadata": { 30 | "id": "KSrDBc-DNg8x", 31 | "colab_type": "text" 32 | }, 33 | "cell_type": "markdown", 34 | "source": [ 35 | "# Enable Eager Execution " 36 | ] 37 | }, 38 | { 39 | "metadata": { 40 | "id": "dHHHtRg0NdGx", 41 | "colab_type": "code", 42 | "colab": { 43 | "base_uri": "https://localhost:8080/", 44 | "height": 34 45 | }, 46 | "outputId": "960741ef-ef3e-499e-98a1-9c0928f34743" 47 | }, 48 | "cell_type": "code", 49 | "source": [ 50 | "import tensorflow as tf\n", 51 | "\n", 52 | "tf.enable_eager_execution()\n", 53 | "\n", 54 | "print('Is it enabled ? ', tf.executing_eagerly())" 55 | ], 56 | "execution_count": 1, 57 | "outputs": [ 58 | { 59 | "output_type": "stream", 60 | "text": [ 61 | "Is it enabled ? 
True\n" 62 | ], 63 | "name": "stdout" 64 | } 65 | ] 66 | }, 67 | { 68 | "metadata": { 69 | "id": "Y7CmID5qNzAY", 70 | "colab_type": "text" 71 | }, 72 | "cell_type": "markdown", 73 | "source": [ 74 | "As we know that TenosrFlow works with static graphs. So, first you have to create the graph then execute it later. This makes debugging a bit complicated. With Eager Execution you can now evalute operations directly without creating a session." 75 | ] 76 | }, 77 | { 78 | "metadata": { 79 | "id": "zuVssr1XNt0z", 80 | "colab_type": "code", 81 | "colab": { 82 | "base_uri": "https://localhost:8080/", 83 | "height": 34 84 | }, 85 | "outputId": "70d0eb8c-70c8-4c42-96f5-b0625a5d122a" 86 | }, 87 | "cell_type": "code", 88 | "source": [ 89 | "x = 2\n", 90 | "m = tf.square(x)\n", 91 | "print(\"x^2 = {}\".format(m))" 92 | ], 93 | "execution_count": 6, 94 | "outputs": [ 95 | { 96 | "output_type": "stream", 97 | "text": [ 98 | "x^2 = 4\n" 99 | ], 100 | "name": "stdout" 101 | } 102 | ] 103 | }, 104 | { 105 | "metadata": { 106 | "id": "QhV6mfuJOdgb", 107 | "colab_type": "text" 108 | }, 109 | "cell_type": "markdown", 110 | "source": [ 111 | "Cool, isn't it ? Now, let us look at a very important function called [tf.GradientTape ](https://https://www.tensorflow.org/api_docs/python/tf/GradientTape). This function allows you to record automatic differentiation operations. So If you evaluted a loss function $L$ included inside the scope of that function, you can evaluate the gradient with respect to the input. Seems complicated ? Let us see an example. Suppose, we want to find the derivative of a simple function $f(x) = x^2$ at $x = 2$. Then do the following " 112 | ] 113 | }, 114 | { 115 | "metadata": { 116 | "id": "0CPedGTyPcKy", 117 | "colab_type": "code", 118 | "colab": { 119 | "base_uri": "https://localhost:8080/", 120 | "height": 34 121 | }, 122 | "outputId": "cbaf0557-00e3-45c5-8327-27dc2f8c8720" 123 | }, 124 | "cell_type": "code", 125 | "source": [ 126 | "x = tf.Variable([[2.0]])\n", 127 | "with tf.GradientTape() as tape:\n", 128 | " loss = x * x\n", 129 | "\n", 130 | "grad = tape.gradient(loss, x)\n", 131 | "print(grad)" 132 | ], 133 | "execution_count": 2, 134 | "outputs": [ 135 | { 136 | "output_type": "stream", 137 | "text": [ 138 | "tf.Tensor([[4.]], shape=(1, 1), dtype=float32)\n" 139 | ], 140 | "name": "stdout" 141 | } 142 | ] 143 | }, 144 | { 145 | "metadata": { 146 | "id": "lOrG97j3Pos8", 147 | "colab_type": "text" 148 | }, 149 | "cell_type": "markdown", 150 | "source": [ 151 | "As you can see the derivative is evaluted correctly as $f'(x = 2) = 2 \\times 2 = 4$. This function is very important as it allows you to closely watch the gradient of the graph you are working on. Moreoever, you can calcuate the gradient of an arbitrary loss function and update the parameters accordingly. Let us take a look at a complicated example. " 152 | ] 153 | }, 154 | { 155 | "metadata": { 156 | "id": "SiQOSWTWQHaG", 157 | "colab_type": "text" 158 | }, 159 | "cell_type": "markdown", 160 | "source": [ 161 | "# Evaluate the Gradient of a CNN\n", 162 | "In this example we will create a loss function on a CNN model. Then we will evaluate the gradient and update the parameters of the model. We will train the model on mnist dataset." 
163 | ] 164 | }, 165 | { 166 | "metadata": { 167 | "id": "djhdKtR4QfKY", 168 | "colab_type": "text" 169 | }, 170 | "cell_type": "markdown", 171 | "source": [ 172 | "## Load Data" 173 | ] 174 | }, 175 | { 176 | "metadata": { 177 | "id": "qtNUKwbGQjUG", 178 | "colab_type": "code", 179 | "colab": { 180 | "base_uri": "https://localhost:8080/", 181 | "height": 52 182 | }, 183 | "outputId": "9efd36a8-d2f4-4173-cd32-98d7d13f82f0" 184 | }, 185 | "cell_type": "code", 186 | "source": [ 187 | "(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()" 188 | ], 189 | "execution_count": 3, 190 | "outputs": [ 191 | { 192 | "output_type": "stream", 193 | "text": [ 194 | "Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz\n", 195 | "11493376/11490434 [==============================] - 1s 0us/step\n" 196 | ], 197 | "name": "stdout" 198 | } 199 | ] 200 | }, 201 | { 202 | "metadata": { 203 | "id": "Oq-tfTzPQslc", 204 | "colab_type": "text" 205 | }, 206 | "cell_type": "markdown", 207 | "source": [ 208 | "## Preprocess the Dataset" 209 | ] 210 | }, 211 | { 212 | "metadata": { 213 | "id": "cQDTxumRQxVW", 214 | "colab_type": "code", 215 | "colab": {} 216 | }, 217 | "cell_type": "code", 218 | "source": [ 219 | "import numpy as np\n", 220 | "\n", 221 | "\n", 222 | "x_train = tf.expand_dims(np.float32(x_train)/ 255., 3)\n", 223 | "x_test = tf.expand_dims(np.float32(x_test )/ 255., 3)\n", 224 | "\n", 225 | "y_train = tf.one_hot(y_train, 10)\n", 226 | "y_test = tf.one_hot(y_test , 10)" 227 | ], 228 | "execution_count": 0, 229 | "outputs": [] 230 | }, 231 | { 232 | "metadata": { 233 | "id": "exd7YnKBQ4-r", 234 | "colab_type": "text" 235 | }, 236 | "cell_type": "markdown", 237 | "source": [ 238 | "## Look at the Data\n", 239 | "\n", 240 | "Now since we use Eager Execution we can easily look at the data by calling `.numpy()` on the tenosr. 
This will return the numpy array " 241 | ] 242 | }, 243 | { 244 | "metadata": { 245 | "id": "Nk7O5A2WRLST", 246 | "colab_type": "code", 247 | "colab": { 248 | "base_uri": "https://localhost:8080/", 249 | "height": 364 250 | }, 251 | "outputId": "e339c356-4b57-4116-ab05-22e105b33d55" 252 | }, 253 | "cell_type": "code", 254 | "source": [ 255 | "import matplotlib.pyplot as plt\n", 256 | "\n", 257 | "img = x_train[0].numpy().squeeze()\n", 258 | "lbl = np.argmax(y_train[0])\n", 259 | "print('Label :', lbl)\n", 260 | "plt.imshow(img)\n", 261 | "plt.show()" 262 | ], 263 | "execution_count": 5, 264 | "outputs": [ 265 | { 266 | "output_type": "stream", 267 | "text": [ 268 | "Label : 5\n" 269 | ], 270 | "name": "stdout" 271 | }, 272 | { 273 | "output_type": "display_data", 274 | "data": { 275 | "image/png": "iVBORw0KGgoAAAANSUhEUgAAAUsAAAFKCAYAAACU6307AAAABHNCSVQICAgIfAhkiAAAAAlwSFlz\nAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBo\ndHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAEyJJREFUeJzt3X1MlfX/x/HXiRPCGTgEOWxu3c2p\nsdQ5GxaaJjezdGt5UxkMXcstrUneZI5R0o2bKGFLpE2htCZrnUW2anOD7GYzhzhZo0ErzC1HZohF\n5g0anPj98dv3TBTlzeEcrgM9H391PufN57yvrnrtc53rXNfl6unp6REA4KZucboBABgOCEsAMCAs\nAcCAsAQAA8ISAAwISwAwICwBwICwBAADd7B/uGXLFjU2NsrlcqmwsFBTp04NZV8AEFGCCsujR4/q\n5MmT8vl8OnHihAoLC+Xz+ULdGwBEjKAOw+vq6pSdnS1JGj9+vM6dO6cLFy6EtDEAiCRBheXZs2c1\nZsyYwOvExES1t7eHrCkAiDQhOcHDvTgAjHRBhaXX69XZs2cDr8+cOaPk5OSQNQUAkSaosJw1a5Zq\namokSc3NzfJ6vYqLiwtpYwAQSYI6Gz59+nTdc889evLJJ+VyufTKK6+Eui8AiCgubv4LAP3jCh4A\nMCAsAcCAsAQAA8ISAAwISwAwICwBwICwBAADwhIADAhLADAgLAHAgLAEAAPCEgAMCEsAMCAsAcCA\nsAQAA8ISAAwISwAwICwBwICwBAADwhIADAhLADAgLAHAgLAEAAPCEgAMCEsAMCAsAcCAsAQAA8IS\nAAwISwAwICwBwICwBAADwhIADAhLADAgLAHAgLAEAAPCEgAMCEsAMCAsAcCAsAQAA8ISAAwISwAw\nICwBwMDtdAMY+f79919z7ZUrV8LYSW+xsbHq7OzsNfb++++b/vbixYvmz/nhhx/MtW+99Za5trCw\n8LqxnTt3Kj8/v9dYeXm5ec7Y2Fhz7fbt2011zz77rHnOSMbKEgAMglpZ1tfXa82aNZowYYIkaeLE\nidq0aVNIGwOASBL0YfiMGTNUVlYWyl4AIGJxGA4ABkGH5c8//6xVq1YpJydHhw8fDmVPABBxXD09\nPT0D/aO2tjY1NDRo/vz5am1t1fLly1VbW6vo6Ohw9AgAjgvqO8uUlBQtWLBAknT77bdr7Nixamtr\n02233RbS5jAy8NMhfjo0EgR1GP7ZZ5/p3XfflSS1t7frjz/+UEpKSkgbA4BIEtTKMjMzUxs2bNCX\nX36prq4uvfrqqxyCAxjRggrLuLg47dq1K9S9AEDECuoED5x37tw5c63f7zfXNjY29jmekZGhr7/+\nOvC6trbWPOdff/1lrq2oqDDXDpbf71dUVFTYP+fOO+8012ZlZZlr//dV2NX62qb4+HjznLNnzzbX\nlpaWmuomTZpknjOS8TtLADAgLAHAgLAEAAPCEgAMCEsAMCAsAcCAsAQAA8ISAAwISwAwICwBwIDL\nHSPMr7/+aqqbNm2aec6Ojo5g2wkYqksDh9JgtumWW+zrjC+++MJcO5BbpPXlvvvuU319fa8xr9dr\n/vu4uDhzbXJysrl2JGBlCQAGhCUAGBCWAGBAWAKAAWEJAAaEJQAYEJYAYEBYAoABYQkABkE93RHh\nk5SUZKobyHPaQ3EFT6SZN2+eufZm/05zcnJ6vd6/f79pzlGjRpk/f+7cuebaULjvvvuG9PP+K1hZ\nAoABYQkABoQlABgQlgBgQFgCgAFhCQAGhCUAGBCWAGBAWAKAAWEJAAZc7hhhrA+seu+998xzVldX\nm2vT09Nv+N7HH38c+OclS5aY5xyIBx54wFT36aefmueMjo6+4XtVVVW9Xv/++++mOXfs2GH+fIwM\nrCwBwICwBAADwhIADAhLADAgLAHAgLAEAAPCEgAMCEsAMCAsAcCAsAQAA1dPT0+P000gvK5cuWKu\nvdGlgS6XS1f/p1JYWGies6SkxFz79ddfm+rmzJljnhMIBdPKsqWlRdnZ2YHraE+fPq1ly5YpNzdX\na9as0T///BPWJgHAaf2G5aVLl7R58+ZeN1goKytTbm6uPvjgA91xxx0DulEDAAxH/YZldHS0Kisr\n5fV6A2P19fXKysqSJGVkZKiuri58HQJABOj3Fm1ut1tud++yzs7OwHdbSUlJam9vD093ABAhBn0/\nS84PRb5Ro0aFZB6XyxX45+LiYvPfDaQWiFRBhaXH49Hly5cVExOjtra2XofoiDycDQcGL6jfWc6c\nOVM1NTWSpNraWs2ePTukTQFApOl3ZdnU1KRt27bp1KlTcrvdqqmpUWlpqQoKCuTz+TRu3DgtXLhw\nKHoFAMf0G5aTJ0/Wvn37rhvfu3dvWBoCgEjEA8v+A8JxgmfMmDEhmfNaZWVlprqBfPVzdd9AsLg2\nHAAMCEsAMCAsAcCAsAQAA8ISAAwISwAwICwBwICwBAADwhIADAhLADDggWUIykCeu5Sbm2uu/eST\nT0x1jY2N5jknT55srgVuhJUlABgQlgBgQFgCgAFhCQAGhCUAGBCWAGBAWAKAAWEJAAaEJQAYEJYA\nYMDljgi7P//801w7fvx4U11iYqJ5zhs913779u164YUXeo3NmjXLNOeiRYvMn8/TJUcGVpYAYEBY\nAoABYQkABoQlABg
QlgBgQFgCgAFhCQAGhCUAGBCWAGDAFTyIKEePHjXVPfzww+Y5z5071+e43+9X\nVFSUeZ6r7dmzx1y7ZMkSc21cXFww7WAIsLIEAAPCEgAMCEsAMCAsAcCAsAQAA8ISAAwISwAwICwB\nwICwBAADwhIADNxONwBcbcaMGaa65uZm85zr1q274XuPP/54r9cfffSRac6nn37a/PknTpww1774\n4ovm2vj4eHMtBo+VJQAYmMKypaVF2dnZqqqqkiQVFBTokUce0bJly7Rs2TJ988034ewRABzX72H4\npUuXtHnzZqWnp/caX79+vTIyMsLWGABEkn5XltHR0aqsrJTX6x2KfgAgIpnvZ7lz506NGTNGeXl5\nKigoUHt7u7q6upSUlKRNmzYpMTEx3L0CgGOCOhv+6KOPKiEhQampqaqoqFB5ebmKiopC3RtwQ6dP\nnzbX3uhs+Icffqgnn3yy15j1bPhAvPTSS+ZazoZHrqDOhqenpys1NVWSlJmZqZaWlpA2BQCRJqiw\nzM/PV2trqySpvr5eEyZMCGlTABBp+j0Mb2pq0rZt23Tq1Cm53W7V1NQoLy9Pa9euVWxsrDwej4qL\ni4eiVwBwTL9hOXnyZO3bt++68YceeigsDQFAJOLpjhjxLl++3Od4TEzMde8dOXLENGd2drb58wfy\nv9hjjz1mrvX5fOZaDB6XOwKAAWEJAAaEJQAYEJYAYEBYAoABYQkABoQlABgQlgBgQFgCgAFhCQAG\nXO4IBGHUqFHm2u7ubnOt222/xez3339/3dikSZP0008/XTeGwWNlCQAGhCUAGBCWAGBAWAKAAWEJ\nAAaEJQAYEJYAYEBYAoABYQkABvbLBYAI8ttvv5lr9+/f3+f46tWrVV5e3musrq7ONOdArsoZiLS0\nNHPtxIkTBzSOwWFlCQAGhCUAGBCWAGBAWAKAAWEJAAaEJQAYEJYAYEBYAoABYQkABoQlABjwwDKE\nXXt7u7n27bffNtXt3bvXPOevv/7a57jf71dUVJR5nmAN5DOeeOIJc21VVVUw7SBIrCwBwICwBAAD\nwhIADAhLADAgLAHAgLAEAAPCEgAMCEsAMCAsAcCAsAQAA57uiF4uXLjQ53hcXFyv9z7//HPznK+/\n/rq5tqWlxVzrpMzMTHPt1q1bzbX33ntvMO1gCJjCsqSkRA0NDeru7tbKlSs1ZcoUbdy4UX6/X8nJ\nyXrjjTcUHR0d7l4BwDH9huWRI0d0/Phx+Xw+dXR0aNGiRUpPT1dubq7mz5+vN998U9XV1crNzR2K\nfgHAEf1+Z5mWlqYdO3ZIkkaPHq3Ozk7V19crKytLkpSRkWF+MD0ADFf9hmVUVJQ8Ho8kqbq6WnPm\nzFFnZ2fgsDspKWlAt+ACgOHIfILn4MGDqq6u1p49ezRv3rzAOLfDHFni4uJM7+Xk5JjnHEjtUPP7\n/U63gGHCFJaHDh3Srl279M477yg+Pl4ej0eXL19WTEyM2tra5PV6w90nhsh/6Wz4YG7+y9nw/55+\nD8PPnz+vkpIS7d69WwkJCZKkmTNnqqamRpJUW1ur2bNnh7dLAHBYvyvLAwcOqKOjQ2vXrg2Mbd26\nVS+//LJ8Pp/GjRunhQsXhrVJAHBav2G5dOlSLV269LrxgTwDBQCGO67gGaYuXrxorm1tbTXX5uXl\n9Tl+7NgxzZ07N/D6u+++M8/ptKtPSPb33muvvWaaMy0tzfz5LpfLXIvIxbXhAGBAWAKAAWEJAAaE\nJQAYEJYAYEBYAoABYQkABoQlABgQlgBgQFgCgIGrhxtShl1nZ6e59uobltzMt99+a57zxx9/NNfe\nyGBuZzYQCxYsMNUVFRWZ55w2bVqf47feequ6urquGwP6wsoSAAwISwAwICwBwICwBAADwhIADAhL\nADAgLAHAgLAEAAPCEgAMCEsAMODpjtf45ZdfTHVbtmzpc7yiokLPPPNMr7GDBw+aP//kyZPmWid5\nPB5z7ebNm821zz33nKkuOjraPOfNcHkjrFhZAoABYQkABoQlABgQlgBgQFgCgAFhCQAGhCUAGBCW\nAGBAWAKAAQ8su8b27dtNdRs3buxzfKge7DV9+nRzbU5OjrnW7e77oq7nn39eZWVlgdfXXqV0MzEx\nMeZaIFKxsgQAA8ISAAwISwAwICwBwICwBAADwhIADAhLADAgLAHAgLAEAAPCEgAMuNwRAAxMT3cs\nKSlRQ0ODuru7tXLlSn311Vdqbm5WQkKCJGnFihWaO3duOPsEAEf1G5ZHjhzR8ePH5fP51NHRoUWL\nFun+++/X+vXrlZGRMRQ9AoDj+g3LtLQ0TZ06VZI0evRodXZ2yu/3h70xAIgkA/rO0ufz6dixY4qK\nilJ7e7u6urqUlJSkTZs2KTExMZx9AoCjzGF58OBB7d69W3v27FFTU5MSEhKUmpqqiooK/f777yoq\nKgp3rwDgGNNPhw4dOqRdu3apsrJS8fHxSk9PV2pqqiQpMzNTLS0tYW0SAJzWb1ieP39eJSUl2r17\nd+Dsd35+vlpbWyVJ9fX1mjBhQni7BACH9XuC58CBA+ro6NDatWsDY4sXL9batWsVGxsrj8ej4uLi\nsDYJAE7jR+kAYMDljgBgQFgCgAFhCQAGhCUAGBCWAGBAWAKAAWEJAAaEJQAYEJYAYEBYAoABYQkA\nBoQlABgQlgBgQFgCgAFhCQAGhCUAGBCWAGBAWAKAAWEJAAaEJQAYEJYAYEBYAoABYQkABoQlABgQ\nlgBgQFgCgAFhCQAGhCUAGBCWAGDgduJDt2zZosbGRrlcLhUWFmrq1KlOtBFS9fX1WrNmjSZMmCBJ\nmjhxojZt2uRwV8FraWnRc889p6eeekp5eXk6ffq0Nm7cKL/fr+TkZL3xxhuKjo52us0BuXabCgoK\n1NzcrISEBEnSihUrNHfuXGebHKCSkhI1NDSou7tbK1eu1JQpU4b9fpKu366vvvrK8X015GF59OhR\nnTx5Uj6fTydOnFBhYaF8Pt9QtxEWM2bMUFlZmdNtDNqlS5e0efNmpaenB8bKysqUm5ur+fPn6803\n31R1dbVyc3Md7HJg+tomSVq/fr0yMjIc6mpwjhw5ouPHj8vn86mjo0OLFi1Senr6sN5PUt/bdf/9\n9zu+r4b8MLyurk7Z2dmSpPHjx+vcuXO6cOHCULeBm4iOjlZlZaW8Xm9grL6+XllZWZKkjIwM1dXV\nOdVeUPrapuEuLS1NO3bskCSNHj1anZ2dw34/SX1vl9/vd7grB8Ly7NmzGjNmTOB1YmKi2tvbh7qN\nsPj555+1atUq5eTk6PDhw063EzS3262YmJheY52dnYHDuaSkpGG3z/raJkmqqqrS8uXLtW7dOv35\n558OdBa8qKgoeTweSVJ1dbXmzJkz7PeT1Pd2RUVFOb6vHPnO8mo9PT1OtxASd955p1avXq358+er\ntbVVy5cvV21t7bD8vqg/I2WfPfroo0pISFBqaqoqKipUXl6uoqIip9sasIMH
D6q6ulp79uzRvHnz\nAuPDfT9dvV1NTU2O76shX1l6vV6dPXs28PrMmTNKTk4e6jZCLiUlRQsWLJDL5dLtt9+usWPHqq2t\nzem2Qsbj8ejy5cuSpLa2thFxOJuenq7U1FRJUmZmplpaWhzuaOAOHTqkXbt2qbKyUvHx8SNmP127\nXZGwr4Y8LGfNmqWamhpJUnNzs7xer+Li4oa6jZD77LPP9O6770qS2tvb9ccffyglJcXhrkJn5syZ\ngf1WW1ur2bNnO9zR4OXn56u1tVXS/38n+79fMgwX58+fV0lJiXbv3h04SzwS9lNf2xUJ+8rV48Ba\nvbS0VMeOHZPL5dIrr7yiu+++e6hbCLkLFy5ow4YN+vvvv9XV1aXVq1frwQcfdLqtoDQ1NWnbtm06\ndeqU3G63UlJSVFpaqoKCAl25ckXjxo1TcXGxbr31VqdbNetrm/Ly8lRRUaHY2Fh5PB4VFxcrKSnJ\n6VbNfD6fdu7cqbvuuiswtnXrVr388svDdj9JfW/X4sWLVVVV5ei+ciQsAWC44QoeADAgLAHAgLAE\nAAPCEgAMCEsAMCAsAcCAsAQAA8ISAAz+D4GsMlewG9H3AAAAAElFTkSuQmCC\n", 276 | "text/plain": [ 277 | "" 278 | ] 279 | }, 280 | "metadata": { 281 | "tags": [] 282 | } 283 | } 284 | ] 285 | }, 286 | { 287 | "metadata": { 288 | "id": "6g0CKyv3SFxp", 289 | "colab_type": "text" 290 | }, 291 | "cell_type": "markdown", 292 | "source": [ 293 | "## Get a Batch of the Data" 294 | ] 295 | }, 296 | { 297 | "metadata": { 298 | "id": "L0dt5XIqSK_p", 299 | "colab_type": "code", 300 | "colab": {} 301 | }, 302 | "cell_type": "code", 303 | "source": [ 304 | "import numpy as np\n", 305 | "def get_batch(batch_size = 32):\n", 306 | " #get a random index to extract a batch \n", 307 | " r = np.random.randint(0, 60000-batch_size)\n", 308 | " return x_train[r: r + batch_size], y_train[r: r + batch_size]" 309 | ], 310 | "execution_count": 0, 311 | "outputs": [] 312 | }, 313 | { 314 | "metadata": { 315 | "id": "1Ao4i46DR2M9", 316 | "colab_type": "text" 317 | }, 318 | "cell_type": "markdown", 319 | "source": [ 320 | "## Simple Model" 321 | ] 322 | }, 323 | { 324 | "metadata": { 325 | "id": "oUcac6HAR5-S", 326 | "colab_type": "code", 327 | "colab": { 328 | "base_uri": "https://localhost:8080/", 329 | "height": 486 330 | }, 331 | "outputId": "2512bacb-ce78-42dd-ca56-0650ae2d7886" 332 | }, 333 | "cell_type": "code", 334 | "source": [ 335 | "from tensorflow.keras.layers import Dense, Convolution2D, MaxPooling2D, Flatten, BatchNormalization, Dropout\n", 336 | "from tensorflow.keras.models import Sequential\n", 337 | "\n", 338 | "def create_model():\n", 339 | " model = Sequential()\n", 340 | " model.add(Convolution2D(filters = 16, kernel_size = 3, padding = 'same', input_shape = [28, 28, 1], activation = 'relu'))\n", 341 | " model.add(MaxPooling2D(pool_size = (2,2)))\n", 342 | " model.add(BatchNormalization())\n", 343 | " model.add(Convolution2D(filters = 32, kernel_size = 3, padding = 'same', activation = 'relu'))\n", 344 | " model.add(MaxPooling2D(pool_size = (2,2)))\n", 345 | " model.add(BatchNormalization())\n", 346 | " model.add(Flatten())\n", 347 | " model.add(Dense(units = 100, activation = 'relu'))\n", 348 | " model.add(Dropout(0.5))\n", 349 | " model.add(Dense(units = 10 , activation = 'softmax'))\n", 350 | " return model\n", 351 | "\n", 352 | "model = create_model()\n", 353 | "model.summary()" 354 | ], 355 | "execution_count": 7, 356 | "outputs": [ 357 | { 358 | "output_type": "stream", 359 | "text": [ 360 | "_________________________________________________________________\n", 361 | "Layer (type) Output Shape Param # \n", 362 | "=================================================================\n", 363 | "conv2d (Conv2D) (None, 28, 28, 16) 160 \n", 364 | "_________________________________________________________________\n", 365 | "max_pooling2d (MaxPooling2D) (None, 14, 14, 16) 0 \n", 366 | "_________________________________________________________________\n", 367 | "batch_normalization (BatchNo (None, 14, 14, 16) 64 \n", 368 | 
"_________________________________________________________________\n", 369 | "conv2d_1 (Conv2D) (None, 14, 14, 32) 4640 \n", 370 | "_________________________________________________________________\n", 371 | "max_pooling2d_1 (MaxPooling2 (None, 7, 7, 32) 0 \n", 372 | "_________________________________________________________________\n", 373 | "batch_normalization_1 (Batch (None, 7, 7, 32) 128 \n", 374 | "_________________________________________________________________\n", 375 | "flatten (Flatten) (None, 1568) 0 \n", 376 | "_________________________________________________________________\n", 377 | "dense (Dense) (None, 100) 156900 \n", 378 | "_________________________________________________________________\n", 379 | "dropout (Dropout) (None, 100) 0 \n", 380 | "_________________________________________________________________\n", 381 | "dense_1 (Dense) (None, 10) 1010 \n", 382 | "=================================================================\n", 383 | "Total params: 162,902\n", 384 | "Trainable params: 162,806\n", 385 | "Non-trainable params: 96\n", 386 | "_________________________________________________________________\n" 387 | ], 388 | "name": "stdout" 389 | } 390 | ] 391 | }, 392 | { 393 | "metadata": { 394 | "id": "DH6vYwpXScJG", 395 | "colab_type": "text" 396 | }, 397 | "cell_type": "markdown", 398 | "source": [ 399 | "## Calculate the Loss and Gradient " 400 | ] 401 | }, 402 | { 403 | "metadata": { 404 | "id": "ZTBv2wsJSnFz", 405 | "colab_type": "code", 406 | "colab": {} 407 | }, 408 | "cell_type": "code", 409 | "source": [ 410 | "def loss(model, x, y):\n", 411 | " #use cross entropy\n", 412 | " prediction = model(x)\n", 413 | " return tf.losses.softmax_cross_entropy(y, logits=prediction)\n", 414 | "\n", 415 | "def grad(model, x, y):\n", 416 | " #use tf.GradientTape() to record the gradient \n", 417 | " with tf.GradientTape() as tape:\n", 418 | " loss_value = loss(model, x, y)\n", 419 | " return tape.gradient(loss_value, model.variables)" 420 | ], 421 | "execution_count": 0, 422 | "outputs": [] 423 | }, 424 | { 425 | "metadata": { 426 | "id": "aU8oNX2vSva3", 427 | "colab_type": "text" 428 | }, 429 | "cell_type": "markdown", 430 | "source": [ 431 | "## Calcuate Accuracy of the Model " 432 | ] 433 | }, 434 | { 435 | "metadata": { 436 | "id": "icQFu_AFSzcs", 437 | "colab_type": "code", 438 | "colab": {} 439 | }, 440 | "cell_type": "code", 441 | "source": [ 442 | "def accuracy(model, x, y):\n", 443 | " #calcuate the model prediction eagerly \n", 444 | " yhat = model(x)\n", 445 | " \n", 446 | " #compare the predictions to the truth\n", 447 | " yhat = tf.argmax(yhat, 1).numpy()\n", 448 | " y = tf.argmax(y , 1).numpy()\n", 449 | " return np.sum(y == yhat)/len(y)" 450 | ], 451 | "execution_count": 0, 452 | "outputs": [] 453 | }, 454 | { 455 | "metadata": { 456 | "id": "zei0Ga5jTEJC", 457 | "colab_type": "text" 458 | }, 459 | "cell_type": "markdown", 460 | "source": [ 461 | "## Training\n", 462 | "Initialize the variables for training. Note that we need to know the length of an epoch in order to evaluate the metrics of the model. 
Also, we are using SGD to update the model paramteres " 463 | ] 464 | }, 465 | { 466 | "metadata": { 467 | "id": "X-iPCeBwTHJv", 468 | "colab_type": "code", 469 | "colab": {} 470 | }, 471 | "cell_type": "code", 472 | "source": [ 473 | "#iterations and epochs variables \n", 474 | "i = 1\n", 475 | "epoch = 0\n", 476 | "epochs = 4 \n", 477 | "\n", 478 | "#running loss and accuracy\n", 479 | "running_loss = 0.\n", 480 | "running_acc = 0.\n", 481 | "\n", 482 | "#calcuate the length of the epoch \n", 483 | "batch_size = 32 \n", 484 | "epoch_length = x_train.numpy().shape[0] // batch_size\n", 485 | "\n", 486 | "#optimizer to update the parmeters of the model \n", 487 | "optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.05)" 488 | ], 489 | "execution_count": 0, 490 | "outputs": [] 491 | }, 492 | { 493 | "metadata": { 494 | "id": "yJTCnxusTczk", 495 | "colab_type": "code", 496 | "colab": { 497 | "base_uri": "https://localhost:8080/", 498 | "height": 104 499 | }, 500 | "outputId": "bdd9609a-25f9-4e82-8cf1-7651ffebdad0" 501 | }, 502 | "cell_type": "code", 503 | "source": [ 504 | "while epoch <= epochs:\n", 505 | " #get next batch\n", 506 | " x, y = get_batch(batch_size = batch_size)\n", 507 | "\n", 508 | " # Calculate derivatives of the input function with respect to its parameters.\n", 509 | " grads = grad(model, x, y)\n", 510 | "\n", 511 | " # Apply the gradient to the model\n", 512 | " optimizer.apply_gradients(zip(grads, model.variables),\n", 513 | " global_step=tf.train.get_or_create_global_step())\n", 514 | " loss_value = loss(model, x, y)\n", 515 | " \n", 516 | " #calcuate running loss and accuracy \n", 517 | " running_loss += loss_value\n", 518 | " running_acc += accuracy(model, x, y)\n", 519 | " \n", 520 | " #report values at the end of the epoch \n", 521 | " if i % epoch_length == 0:\n", 522 | " print(\"Epoch: {:d} Loss: {:.3f}, Acc: {:.3f}\".format(epoch, running_loss/epoch_length, running_acc/epoch_length))\n", 523 | " \n", 524 | " #reset the running loss an accuracy \n", 525 | " running_loss = 0 \n", 526 | " running_acc = 0 \n", 527 | " \n", 528 | " epoch += 1\n", 529 | " i += 1" 530 | ], 531 | "execution_count": 11, 532 | "outputs": [ 533 | { 534 | "output_type": "stream", 535 | "text": [ 536 | "Epoch: 0 Loss: 1.706, Acc: 0.780\n", 537 | "Epoch: 1 Loss: 1.498, Acc: 0.965\n", 538 | "Epoch: 2 Loss: 1.485, Acc: 0.977\n", 539 | "Epoch: 3 Loss: 1.480, Acc: 0.982\n", 540 | "Epoch: 4 Loss: 1.477, Acc: 0.985\n" 541 | ], 542 | "name": "stdout" 543 | } 544 | ] 545 | }, 546 | { 547 | "metadata": { 548 | "id": "SN2lmMTOTx1L", 549 | "colab_type": "text" 550 | }, 551 | "cell_type": "markdown", 552 | "source": [ 553 | "## Testing" 554 | ] 555 | }, 556 | { 557 | "metadata": { 558 | "id": "NsOsTt-WT461", 559 | "colab_type": "code", 560 | "colab": { 561 | "base_uri": "https://localhost:8080/", 562 | "height": 34 563 | }, 564 | "outputId": "23891420-1306-4190-edac-079fcd638ab1" 565 | }, 566 | "cell_type": "code", 567 | "source": [ 568 | "print('Accuracy on the test set , ', accuracy(model, x_test, y_test))" 569 | ], 570 | "execution_count": 12, 571 | "outputs": [ 572 | { 573 | "output_type": "stream", 574 | "text": [ 575 | "Accuracy on the test set , 0.9788\n" 576 | ], 577 | "name": "stdout" 578 | } 579 | ] 580 | }, 581 | { 582 | "metadata": { 583 | "id": "S09UdnjoXvE9", 584 | "colab_type": "text" 585 | }, 586 | "cell_type": "markdown", 587 | "source": [ 588 | "**Read more ** https://www.tensorflow.org/guide/eager" 589 | ] 590 | } 591 | ] 592 | } 
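The loop above draws each batch from a random offset with `get_batch`, so an epoch is approximate bookkeeping rather than a guaranteed full pass over the data. A minimal alternative sketch using `tf.data` (assuming the `model`, `grad`, `optimizer`, `x_train`/`y_train` and `epochs` objects defined above, and that datasets can be iterated directly while eager execution is enabled):

```python
import tensorflow as tf

# Build a dataset that shuffles and batches the prepared training tensors.
dataset = (tf.data.Dataset.from_tensor_slices((x_train, y_train))
           .shuffle(buffer_size=60000)
           .batch(32))

for epoch in range(epochs):
    for x, y in dataset:   # direct iteration works under eager execution
        grads = grad(model, x, y)
        optimizer.apply_gradients(zip(grads, model.variables),
                                  global_step=tf.train.get_or_create_global_step())
```

Each pass of the outer loop now visits every training example exactly once, so the per-epoch loss and accuracy reported above would become exact averages over the training set.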
-------------------------------------------------------------------------------- /GPUvsTPU.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "GPUvsTPU.ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "include_colab_link": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "accelerator": "TPU" 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "[View in Colaboratory](https://colab.research.google.com/github/zaidalyafeai/Notebooks/blob/master/GPUvsTPU.ipynb)" 26 | ] 27 | }, 28 | { 29 | "metadata": { 30 | "id": "QueVjeESsyKe", 31 | "colab_type": "text" 32 | }, 33 | "cell_type": "markdown", 34 | "source": [ 35 | "# Why TPUs ? " 36 | ] 37 | }, 38 | { 39 | "metadata": { 40 | "id": "5moeHHv4shGw", 41 | "colab_type": "text" 42 | }, 43 | "cell_type": "markdown", 44 | "source": [ 45 | "TPUs are tensor processing units developed by Google to accelerate operations on a Tensorflow Graph. Each TPU packs up to 180 teraflops of floating-point performance and 64 GB of high-bandwidth memory onto a single board. Here is a comparions between TPUs and Nvidia GPUs. The y axis represents # images per seconds and the x axis is different models.\n", 46 | "\n", 47 | "\"Drawing\"" 48 | ] 49 | }, 50 | { 51 | "metadata": { 52 | "id": "_SXoMcRs8aRs", 53 | "colab_type": "text" 54 | }, 55 | "cell_type": "markdown", 56 | "source": [ 57 | "# Experiement \n", 58 | "\n", 59 | "TPUs were only available on Google cloud but now they are available for free in Colab. We will be comparing TPU vs GPU here on colab using mnist dataset. We will compare the time of each step and epoch against different batch sizes. " 60 | ] 61 | }, 62 | { 63 | "metadata": { 64 | "id": "4ECTupP8warH", 65 | "colab_type": "text" 66 | }, 67 | "cell_type": "markdown", 68 | "source": [ 69 | "# Downoad MNIST " 70 | ] 71 | }, 72 | { 73 | "metadata": { 74 | "id": "oX7DOjhUlCLb", 75 | "colab_type": "code", 76 | "colab": {} 77 | }, 78 | "cell_type": "code", 79 | "source": [ 80 | "import tensorflow as tf\n", 81 | "import os\n", 82 | "import numpy as np\n", 83 | "from tensorflow.keras.utils import to_categorical\n", 84 | "\n", 85 | "def get_data():\n", 86 | "\n", 87 | " #Load mnist data set\n", 88 | " (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()\n", 89 | "\n", 90 | " x_train = x_train.astype('float32') / 255\n", 91 | " x_test = x_test.astype('float32') / 255\n", 92 | "\n", 93 | " x_train = np.expand_dims(x_train, 3)\n", 94 | " x_test = np.expand_dims(x_test, 3)\n", 95 | "\n", 96 | " y_train = to_categorical(y_train)\n", 97 | " y_test = to_categorical(y_test)\n", 98 | "\n", 99 | " return x_train, y_train, x_test, y_test " 100 | ], 101 | "execution_count": 0, 102 | "outputs": [] 103 | }, 104 | { 105 | "metadata": { 106 | "id": "AtEJD_s1wdty", 107 | "colab_type": "text" 108 | }, 109 | "cell_type": "markdown", 110 | "source": [ 111 | "# Basic CNN" 112 | ] 113 | }, 114 | { 115 | "metadata": { 116 | "id": "IaZZ2OwmwhKQ", 117 | "colab_type": "text" 118 | }, 119 | "cell_type": "markdown", 120 | "source": [ 121 | "Note that since we need to run the code on TPU we need to do more work. 
We need to specify the address of the TPU and tell tensorflow to run the model on the TPU cluster " 122 | ] 123 | }, 124 | { 125 | "metadata": { 126 | "id": "cUYn3VomnQDL", 127 | "colab_type": "code", 128 | "colab": {} 129 | }, 130 | "cell_type": "code", 131 | "source": [ 132 | "from tensorflow.contrib.tpu.python.tpu import keras_support\n", 133 | "\n", 134 | "def get_model(tpu = False):\n", 135 | " model = tf.keras.Sequential()\n", 136 | "\n", 137 | " #add layers to the model \n", 138 | " model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=2, padding='same', activation='relu', input_shape=(28,28,1))) \n", 139 | " model.add(tf.keras.layers.MaxPooling2D(pool_size=2))\n", 140 | " model.add(tf.keras.layers.Dropout(0.3))\n", 141 | "\n", 142 | " model.add(tf.keras.layers.Conv2D(filters=32, kernel_size=2, padding='same', activation='relu'))\n", 143 | " model.add(tf.keras.layers.MaxPooling2D(pool_size=2))\n", 144 | " model.add(tf.keras.layers.Dropout(0.3))\n", 145 | "\n", 146 | " model.add(tf.keras.layers.Flatten())\n", 147 | " model.add(tf.keras.layers.Dense(256, activation='relu'))\n", 148 | " model.add(tf.keras.layers.Dropout(0.5))\n", 149 | " model.add(tf.keras.layers.Dense(10, activation='softmax'))\n", 150 | "\n", 151 | " #compile the model \n", 152 | " model.compile(loss='categorical_crossentropy',\n", 153 | " optimizer='adam',\n", 154 | " metrics=['accuracy'])\n", 155 | "\n", 156 | " #flag to run on tpu \n", 157 | " if tpu:\n", 158 | " tpu_grpc_url = \"grpc://\"+os.environ[\"COLAB_TPU_ADDR\"]\n", 159 | " \n", 160 | " #connect the TPU cluster using the address \n", 161 | " tpu_cluster_resolver = tf.contrib.cluster_resolver.TPUClusterResolver(tpu_grpc_url)\n", 162 | " \n", 163 | " #run the model on different clusters \n", 164 | " strategy = keras_support.TPUDistributionStrategy(tpu_cluster_resolver)\n", 165 | " \n", 166 | " #convert the model to run on tpu \n", 167 | " model = tf.contrib.tpu.keras_to_tpu_model(model, strategy=strategy)\n", 168 | " return model" 169 | ], 170 | "execution_count": 0, 171 | "outputs": [] 172 | }, 173 | { 174 | "metadata": { 175 | "id": "cSoBDg4PwylQ", 176 | "colab_type": "text" 177 | }, 178 | "cell_type": "markdown", 179 | "source": [ 180 | "#GPU vs TPU\n" 181 | ] 182 | }, 183 | { 184 | "metadata": { 185 | "id": "yluf1xqjsILa", 186 | "colab_type": "code", 187 | "colab": {} 188 | }, 189 | "cell_type": "code", 190 | "source": [ 191 | "x_train, y_train, x_test, y_test = get_data()" 192 | ], 193 | "execution_count": 0, 194 | "outputs": [] 195 | }, 196 | { 197 | "metadata": { 198 | "id": "Wn9zXSBUw8m1", 199 | "colab_type": "text" 200 | }, 201 | "cell_type": "markdown", 202 | "source": [ 203 | "Each time you want to run the model on TPU make sure to set the tpu flag and change the enviornment runtime via Edit> Notebook Setting > Hardware Accelerator > TPU and then click save. 
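Before building the model it can help to confirm that a TPU runtime is actually attached. The TPU branch of `get_model` above reads the cluster address from the `COLAB_TPU_ADDR` environment variable, so a quick check is possible; this small snippet is a sketch and is not part of the original notebook.

```python
import os

if "COLAB_TPU_ADDR" in os.environ:
    print("TPU runtime detected at grpc://" + os.environ["COLAB_TPU_ADDR"])
else:
    print("No TPU attached; use get_model(tpu=False) or switch the runtime first.")
```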
" 204 | ] 205 | }, 206 | { 207 | "metadata": { 208 | "id": "4vAM7pBPxVbm", 209 | "colab_type": "code", 210 | "colab": {} 211 | }, 212 | "cell_type": "code", 213 | "source": [ 214 | "#set tpu = True if you want to run the model on TPU\n", 215 | "model = get_model(tpu = False)" 216 | ], 217 | "execution_count": 0, 218 | "outputs": [] 219 | }, 220 | { 221 | "metadata": { 222 | "id": "_m67tWDhnm7f", 223 | "colab_type": "code", 224 | "colab": { 225 | "base_uri": "https://localhost:8080/", 226 | "height": 139 227 | }, 228 | "outputId": "59645203-25f4-4e4f-87a2-72da5fa1cf48" 229 | }, 230 | "cell_type": "code", 231 | "source": [ 232 | "model.fit(x_train,\n", 233 | " y_train,\n", 234 | " batch_size=1024,\n", 235 | " epochs=10,\n", 236 | " validation_data=(x_test, y_test))" 237 | ], 238 | "execution_count": 18, 239 | "outputs": [ 240 | { 241 | "output_type": "stream", 242 | "text": [ 243 | "Train on 60000 samples, validate on 10000 samples\n", 244 | "Epoch 1/3\n", 245 | "60000/60000 [==============================] - 2s 38us/step - loss: 0.1639 - acc: 0.9513 - val_loss: 0.0677 - val_acc: 0.9752\n", 246 | "Epoch 2/3\n", 247 | "60000/60000 [==============================] - 2s 35us/step - loss: 0.1345 - acc: 0.9573 - val_loss: 0.0552 - val_acc: 0.9808\n", 248 | "Epoch 3/3\n", 249 | "60000/60000 [==============================] - 2s 40us/step - loss: 0.1189 - acc: 0.9619 - val_loss: 0.0443 - val_acc: 0.9848\n" 250 | ], 251 | "name": "stdout" 252 | } 253 | ] 254 | }, 255 | { 256 | "metadata": { 257 | "id": "QGof6K46zXfq", 258 | "colab_type": "text" 259 | }, 260 | "cell_type": "markdown", 261 | "source": [ 262 | "# Benchmarks \n", 263 | "\n", 264 | "Note that TPU setup takes some time when compiling the model and distributing the data in the clusters, so the first epoch will take alonger time. I only reported the time for the later epochs. I calculated the average time accross different epochs." 
265 | ] 266 | }, 267 | { 268 | "metadata": { 269 | "id": "4cbKs72g00sQ", 270 | "colab_type": "text" 271 | }, 272 | "cell_type": "markdown", 273 | "source": [ 274 | "### Epoch Time ($s$)" 275 | ] 276 | }, 277 | { 278 | "metadata": { 279 | "id": "QNh64VMDz1Ks", 280 | "colab_type": "text" 281 | }, 282 | "cell_type": "markdown", 283 | "source": [ 284 | "$$\\left[\\begin{array}{c|c|c} \n", 285 | " \\textbf{Batch Size} & \\textbf{GPU} & \\textbf{TPU} \\\\\n", 286 | " 256 & 6s & 6s\\\\ \n", 287 | " 512 & 5s & 3s\\\\\n", 288 | " 1024 & 4s & 2s\\\\\n", 289 | "\\end{array}\\right]$$" 290 | ] 291 | }, 292 | { 293 | "metadata": { 294 | "id": "Q8eMm1GD1Mu5", 295 | "colab_type": "text" 296 | }, 297 | "cell_type": "markdown", 298 | "source": [ 299 | "### Step Time ($\\mu s$)" 300 | ] 301 | }, 302 | { 303 | "metadata": { 304 | "id": "q1hElmjr05Ah", 305 | "colab_type": "text" 306 | }, 307 | "cell_type": "markdown", 308 | "source": [ 309 | "$$\\left[\\begin{array}{c|c|c} \n", 310 | " \\textbf{Batch Size} & \\textbf{GPU} & \\textbf{TPU} \\\\\n", 311 | " 256 & 94 \\mu s & 97 \\mu s\\\\ \n", 312 | " 512 & 82 \\mu s& 58 \\mu s \\\\\n", 313 | " 1024 & 79 \\mu s & 37 \\mu s\\\\\n", 314 | "\\end{array}\\right]$$" 315 | ] 316 | }, 317 | { 318 | "metadata": { 319 | "id": "J6eOKfyW38rN", 320 | "colab_type": "text" 321 | }, 322 | "cell_type": "markdown", 323 | "source": [ 324 | "# References\n", 325 | "\n", 326 | "\n", 327 | "\n", 328 | "* https://qiita.com/koshian2/items/25a6341c035e8a260a01\n", 329 | "* https://medium.com/tensorflow/hello-deep-learning-fashion-mnist-with-keras-50fcff8cd74a\n", 330 | "* https://blog.riseml.com/benchmarking-googles-new-tpuv2-121c03b71384\n", 331 | "* https://cloudplatform.googleblog.com/2018/02/Cloud-TPU-machine-learning-accelerators-now-available-in-beta.html\n", 332 | "\n" 333 | ] 334 | } 335 | ] 336 | } -------------------------------------------------------------------------------- /GradientFlow.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "GradientFlow.ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "include_colab_link": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "accelerator": "GPU" 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "\"Open" 26 | ] 27 | }, 28 | { 29 | "cell_type": "markdown", 30 | "metadata": { 31 | "id": "nLytHilIDc-y", 32 | "colab_type": "text" 33 | }, 34 | "source": [ 35 | "## Introduction \n", 36 | "\n", 37 | " Real valued functions with real inputs are defined as $f: \\mathbb{R} \\to \\mathbb{R}$. So the mapping is from the real numbers to the real numbers. Most of the famous functions follow this rule. For instance $\\sin, \\cos, \\log, \\exp$ . The derivatives with respect to these functions is well-known in all calculus books. 
\n", 38 | "\n", 39 | "$$\\begin{array}{|c|c|} \n", 40 | "\\hline\n", 41 | " \\textbf{$f(x)$} & \\textbf{$f'(x)$} \\\\\n", 42 | " \\hline\n", 43 | " \\sin(x) & \\cos(x)\\\\ \n", 44 | " \\cos(x) & - \\sin(x)\\\\\n", 45 | " \\log(x) & \\frac{1}{x}\\\\\n", 46 | " e^x & e^x\\\\\n", 47 | " \\hline\n", 48 | "\\end{array}$$\n", 49 | "\n", 50 | "In calculus the derivative with respect to $x$ is evaluated using this limit \n", 51 | "\n", 52 | "$$ f(x_0) = \\lim_{x \\to x_0} \\frac{f(x) - f(x_0)}{x - x_0}$$\n", 53 | "\n", 54 | "Geometrically the derivative at a point approximates the slope of the tangent line at that point\n", 55 | "\n", 56 | "
\n", 57 | "\n", 58 | "The slope of a function is a representation of how fast the function accelerates/decelerates at a certain point $x$. If the derivative/slope at a certain point is positive then the function is increasing, if it is negative then the function is decreasing and if it is zero then the function is constant. \n", 59 | "\n", 60 | "This concept is very important in function **optimization**. *Usually*, we want to find the point where the function achieves its minimum. We can set $f'(x) = 0 $ and find the zeros of such function. We can check the neighborhood of each root to decide if the function is a local minimum or maximum. The minimum of all local minma is called a global minimum. \n", 61 | "\n", 62 | "
\n" 63 | ] 64 | }, 65 | { 66 | "cell_type": "markdown", 67 | "metadata": { 68 | "id": "3sKWblDsKdi5", 69 | "colab_type": "text" 70 | }, 71 | "source": [ 72 | "**Problems** \n", 73 | "\n", 74 | "\n", 75 | "1. The derivative can be undefined for instance $|x|$ has no derivative at $x = 0$ because the limit from the left and the right are not equal \n", 76 | "$$f'(0^+) = \\lim_{x \\to 0 ^+} \\frac{- x - 0 }{x - 0} = -1 \\neq f'(0^-) = \\lim_{x \\to 0^-} \\frac{x - 0}{x - 0} = 1$$\n", 77 | "2. The point could be a suddle point where the neighborhood of the function could be increasing from the left and decreasing from the right or vice virsa. For instance, the function $x^3$ has a saddle point at $x = 0 $. \n", 78 | "\n" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": { 84 | "id": "DR3Lexe1JVgo", 85 | "colab_type": "text" 86 | }, 87 | "source": [ 88 | "## Derivative Evaluation Algorithms\n", 89 | " There are bascially three approaches to implement gradients in computers \n", 90 | "\n", 91 | "1. **Numerical Differentiation**, which basically uses the finite difference rule for small $h$\n", 92 | "$$f'(x) \\approx \\frac{f(x+h)-f(x)}{h}$$ This formula suffers for numerical instability for small values of $h$.\n", 93 | "2. **Symbolic Differentiation**, this calculates a symbolic expression for the derivative of the function. This approach is bascially used in matlab and mathematica. This approach is quite slow and requires symbols parsing and manipulation. \n", 94 | "\n", 95 | "3. **Automatic Differentiation**, this approach is the base that is used in most deep learning libraries like TensorFlow and Pytorch. Basically, the mathematical expressions are divided into primitve blocks and the derivative is evaluated using the chain rule. \n", 96 | "\n" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": { 102 | "id": "Txp2Po0JU_9_", 103 | "colab_type": "text" 104 | }, 105 | "source": [ 106 | "## Automatic Differentiation\n", 107 | "\n", 108 | "Mainly, there are two main parts for automatic differentiation \n", 109 | "\n", 110 | "* **Forward-mode automatic differentiation** Evaluates the gradient in a froward manner for all the input variables. \n", 111 | "\n", 112 | "* **Reverse-mode automatic differentiation** Evaluates the gradient with respect to the ouput first then back-probogate the gradient to the input. \n", 113 | "\n", 114 | "In most machine learning libraries the reverse-mode is mostly used because the forward mode is more expensive in terms of evaluating the gradient with respect to many inputs. For instance, Tensorflow uses reverse-mode differentiation as explained in this [post](https://www.tensorflow.org/tutorials/eager/automatic_differentiation). \n", 115 | "\n", 116 | "### Computation Graph\n", 117 | "\n", 118 | "Machine learning libraries like Tensorflow build computational graphs where each node represents a simple computation function. This makes reverse-mode differentiation very easy to compute as we back-progbogate the gradieht from the outputs to the inputs. \n", 119 | "\n", 120 | "
\n", 121 | "\n", 122 | "Basically, TensorFlow defines the primitive functions which are the functions that cannot be reduced further. This includes $x^n, e^x, \\sin(x), \\log(x), \\frac{1}{x}\\cdots$. We assume that any other function is a composition of these functions. The composition function is defined as \n", 123 | "\n", 124 | "$$g(x) = f_1 \\circ f_2 \\circ \\cdots f_n(x) = f_1(f_2(\\cdots f_n(x)))$$\n", 125 | "\n", 126 | "Note that the evaluation of the primitives start from the inner function to the outer function. In addition, the derivaitve of each of these operations are already defined inside TensorFlow. After that TensorFlow uses the chain rule which states that the derivative of $g(x) = f_1 \\circ f_2 \\cdots \\circ f_n(x)$ is \n", 127 | "\n", 128 | "$$g'(x_0) = \\frac{df_n(x_0)}{dx} \\times \\frac{df_{n-1}(f_n(x_0))}{dx} \\times \\cdots \\times \\frac{df_1(f_2(\\cdots(f_n(x_0)))}{dx}$$\n" 129 | ] 130 | }, 131 | { 132 | "cell_type": "markdown", 133 | "metadata": { 134 | "id": "oDd62xqvL_kM", 135 | "colab_type": "text" 136 | }, 137 | "source": [ 138 | "First we import the necessary libraries " 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "metadata": { 144 | "id": "tr1NmrQhXjb2", 145 | "colab_type": "code", 146 | "colab": {} 147 | }, 148 | "source": [ 149 | "import tensorflow as tf\n", 150 | "import tensorflow.contrib.eager as tfe" 151 | ], 152 | "execution_count": 0, 153 | "outputs": [] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": { 158 | "id": "CK9uqaV0MCOu", 159 | "colab_type": "text" 160 | }, 161 | "source": [ 162 | "We will use eager execution to evaluate the operations directly " 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "metadata": { 168 | "id": "jmbndYZYXmrR", 169 | "colab_type": "code", 170 | "colab": {} 171 | }, 172 | "source": [ 173 | "tf.enable_eager_execution()" 174 | ], 175 | "execution_count": 0, 176 | "outputs": [] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": { 181 | "id": "4wQYnPBCu8yP", 182 | "colab_type": "text" 183 | }, 184 | "source": [ 185 | "### Real-Valued Functions\n", 186 | "\n", 187 | "Here we look at the functions of the form $f: \\mathbb{R} \\to \\mathbb{R}$. So the input is a real number and the output is a real number. " 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": { 193 | "id": "VSYSycSKMG2z", 194 | "colab_type": "text" 195 | }, 196 | "source": [ 197 | "**Example** \n", 198 | "Let us evaluate the gradient of a simple function $f(x) = x^2$. 
In mathematics we evaluate the gradient using the limit of the difference\n", 199 | "\n", 200 | "$$ f'(x) = \\lim_{h \\to 0} \\frac{(x+h)^2 - x^2}{h} = \\lim_{h \\to 0} \\frac{x^2 + 2hx + h^ 2 - x^ 2}{h} = \\lim_{h \\to 0} \\frac{h(2x + h)}{h} = 2x $$" 201 | ] 202 | }, 203 | { 204 | "cell_type": "code", 205 | "metadata": { 206 | "id": "zN18WUvjXqvN", 207 | "colab_type": "code", 208 | "colab": {} 209 | }, 210 | "source": [ 211 | "#define the function to differentiate\n", 212 | "def f(x):\n", 213 | " return x**2\n", 214 | "\n", 215 | "#evaluate the gradient as a yet-to-be evaluated function\n", 216 | "g = tfe.gradients_function(f)" 217 | ], 218 | "execution_count": 0, 219 | "outputs": [] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": { 224 | "id": "jepfaZYQMTMS", 225 | "colab_type": "text" 226 | }, 227 | "source": [ 228 | "Evaluate the derivative of the function at $x = 5.0$" 229 | ] 230 | }, 231 | { 232 | "cell_type": "code", 233 | "metadata": { 234 | "id": "QuuXuyY7X-xr", 235 | "colab_type": "code", 236 | "outputId": "026eefc4-b0d5-4539-bb13-d925eb24236e", 237 | "colab": { 238 | "base_uri": "https://localhost:8080/", 239 | "height": 34 240 | } 241 | }, 242 | "source": [ 243 | "x = tf.Variable(5.)\n", 244 | "\n", 245 | "g(x)[0].numpy()" 246 | ], 247 | "execution_count": 12, 248 | "outputs": [ 249 | { 250 | "output_type": "execute_result", 251 | "data": { 252 | "text/plain": [ 253 | "10.0" 254 | ] 255 | }, 256 | "metadata": { 257 | "tags": [] 258 | }, 259 | "execution_count": 12 260 | } 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": { 266 | "id": "1RzK4zPXTFan", 267 | "colab_type": "text" 268 | }, 269 | "source": [ 270 | "**Example**\n", 271 | "\n", 272 | "Suppose that we want to evaluate the gradient of the sigmoid function. \n", 273 | "\n", 274 | "$$ \\sigma (x) = \\frac{1}{1+e^{-x}} $$\n", 275 | "\n", 276 | "We can easily proof using calculus that \n", 277 | "\n", 278 | "$$\\sigma'(x) = \\sigma(x) (1-\\sigma(x))$$\n", 279 | "\n", 280 | "But, we will go the long way! We first decompose it into the primitives\n", 281 | "\n", 282 | "$$ \\sigma (x) = \\frac{1}{1+e^{-x}} = f_1 \\circ f_2 \\circ f_3\\circ f_4(x)$$\n", 283 | "\n", 284 | "where we have $f_1 = 1/ x, f_2 = 1+x , f_3 = e^{x}, f_4 = -1 \\cdot x$. \n", 285 | "\n", 286 | "Then TensorFlow will construct the following graph of these primitives. 
\n", 287 | "\n", 288 | "![alt text](https://www.researchgate.net/profile/Igor_Macedo_Quintanilha/publication/325694563/figure/fig3/AS:636339552784386@1528726580172/Computational-graph-of-sigmoid-The-values-in-black-red-on-the-top-bottom-of-the-arrows.png)\n", 289 | "\n", 290 | "\n", 291 | "Now we can evaluate the forward pass by composition and the backward pass by chain rule according to the following table where we evaluate $\\sigma'(1)$\n", 292 | "$$\\begin{array}{|c|c|} \n", 293 | "\\hline\n", 294 | " \\textbf{Forward} & \\textbf{Backward} \\\\\n", 295 | " \\hline\n", 296 | " f_4(1) = -1 & f'_1(f_2(f_3(f_4(1)))) =-0.534 \\\\ \n", 297 | " f_3(f_4(1)) = 0.368 & f'_2(f_3(f_4(1))) = 1\\\\\n", 298 | " f_2(f_3(f_4(1))) = 1.368 & f'_3(f_4(1)) = 0.368\\\\\n", 299 | " f_1(f_2(f_3(f_4(1)))) = 0.731 & f'_4(1) = -1\\\\\n", 300 | " \\hline\n", 301 | "\\end{array}$$\n", 302 | "\n", 303 | "From the table we see that \n", 304 | "\n", 305 | "$$\\sigma'(1) \\approx -0.534 \\times 1 \\times 0.368 \\times 1 = 0.19664$$\n", 306 | "\n", 307 | "Now let us evaluate the derivative using TensorFlow" 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "metadata": { 313 | "id": "6koBsFy1WIsE", 314 | "colab_type": "code", 315 | "colab": {} 316 | }, 317 | "source": [ 318 | "def sigmoid(x):\n", 319 | " return 1/(1+tf.exp(-x))\n", 320 | "\n", 321 | "g = tfe.gradients_function(sigmoid)" 322 | ], 323 | "execution_count": 0, 324 | "outputs": [] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "metadata": { 329 | "id": "fQrKT9kjWTBi", 330 | "colab_type": "code", 331 | "outputId": "40e41b33-ad48-4faa-f61a-95409797ef79", 332 | "colab": { 333 | "base_uri": "https://localhost:8080/", 334 | "height": 34 335 | } 336 | }, 337 | "source": [ 338 | "g(1.)[0].numpy()" 339 | ], 340 | "execution_count": 14, 341 | "outputs": [ 342 | { 343 | "output_type": "execute_result", 344 | "data": { 345 | "text/plain": [ 346 | "0.19661194" 347 | ] 348 | }, 349 | "metadata": { 350 | "tags": [] 351 | }, 352 | "execution_count": 14 353 | } 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": { 359 | "id": "2zS7yGofIovU", 360 | "colab_type": "text" 361 | }, 362 | "source": [ 363 | "## Shape Input Convention \n", 364 | "Most machine learning libraries force the gradient to have the same dimension as the input to update the parameters. Conventionally, this might contradict with the basic rules of the matrix calculus. In a calculus form given $x \\in \\mathbb{R}^{n_1 \\cdots n_k}$ and suppose that $f(x) = y$ where $y \\in \\mathbb{R}$ then $f'(x) = w$ then we enforce that $w \\in \\mathbb{R}^{n_1 \\cdots n_k}$. \n", 365 | "\n" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": { 371 | "id": "AtxvntayY1zj", 372 | "colab_type": "text" 373 | }, 374 | "source": [ 375 | "## High Dimension Gradient\n", 376 | "\n", 377 | "### Gradient of vectors \n", 378 | "\n", 379 | "In mathematics we usually use the gradient term to generalize the derivative to higher dimensions. Mainly we define a real valued function with vector inputs as \n", 380 | "\n", 381 | "$$f: \\mathbb{R}^n \\to \\mathbb{R} $$\n", 382 | "\n", 383 | "Hence we could say $y = f(x)$ where $x = (x_1, x_2, \\cdots, x_n)$ and $y \\in \\mathbb{R}$. 
Then we can define the derivative as \n", 384 | "\n", 385 | "$$\\nabla f(x) = \\left( \\frac{ \\partial y}{\\partial x_1}, \\frac{\\partial y}{\\partial x_2}, \\cdots, \\frac{\\partial y}{\\partial x_n}\\right)$$\n" 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "metadata": { 391 | "id": "zaEBuV77xo9C", 392 | "colab_type": "text" 393 | }, 394 | "source": [ 395 | "**Example**\n", 396 | "\n", 397 | "The norm of a function operates on vectors \n", 398 | "\n", 399 | "$$\\Vert x \\Vert = \\sqrt{\\sum_{i=1}^n x^2_i}$$\n", 400 | "\n", 401 | "So this function sums the squares of the components and takes the root. What is the gradient of the squared norm $f(x) = \\Vert x \\Vert ^2$ ? \n", 402 | "\n", 403 | "\n", 404 | "$$\\frac{\\partial f}{\\partial x_1} = 2x_1, \\frac{\\partial f}{\\partial x_2} = 2x_2, \\cdots , \\frac{\\partial f}{\\partial x_n} = 2x_n$$\n", 405 | "\n", 406 | "In a simpler format we have \n", 407 | "\n", 408 | "$$\\nabla f = (2x_1, \\cdots 2 x_n ) = 2 x $$" 409 | ] 410 | }, 411 | { 412 | "cell_type": "code", 413 | "metadata": { 414 | "id": "MDxOdbOeCqZt", 415 | "colab_type": "code", 416 | "colab": {} 417 | }, 418 | "source": [ 419 | "#create a variale with three components \n", 420 | "x = tf.Variable([1., 2. , 3.])\n", 421 | "\n", 422 | "#define the norm \n", 423 | "def norm(x):\n", 424 | " return tf.reduce_sum(tf.square(x))\n", 425 | "\n", 426 | "#evaluate the gradient\n", 427 | "g = tfe.gradients_function(norm)" 428 | ], 429 | "execution_count": 0, 430 | "outputs": [] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "metadata": { 435 | "id": "L71YecDVDLvT", 436 | "colab_type": "code", 437 | "outputId": "589223d9-1d88-4f2c-dbc6-568ec82f1111", 438 | "colab": { 439 | "base_uri": "https://localhost:8080/", 440 | "height": 34 441 | } 442 | }, 443 | "source": [ 444 | "g(x)[0].numpy()" 445 | ], 446 | "execution_count": 16, 447 | "outputs": [ 448 | { 449 | "output_type": "execute_result", 450 | "data": { 451 | "text/plain": [ 452 | "array([2., 4., 6.], dtype=float32)" 453 | ] 454 | }, 455 | "metadata": { 456 | "tags": [] 457 | }, 458 | "execution_count": 16 459 | } 460 | ] 461 | }, 462 | { 463 | "cell_type": "markdown", 464 | "metadata": { 465 | "id": "KJaqZXMUz2EZ", 466 | "colab_type": "text" 467 | }, 468 | "source": [ 469 | "We can compute the second derivative in a similar approach" 470 | ] 471 | }, 472 | { 473 | "cell_type": "code", 474 | "metadata": { 475 | "id": "oeuf_grPZOwj", 476 | "colab_type": "code", 477 | "colab": {} 478 | }, 479 | "source": [ 480 | "#create a variale with three components \n", 481 | "x = tf.Variable([[1.], [2.], [3.]])\n", 482 | "\n", 483 | "#define the operation \n", 484 | "def op(x):\n", 485 | " return tf.square(x)\n", 486 | "\n", 487 | "dx = tfe.gradients_function\n", 488 | "\n", 489 | "#compute the second order derivative\n", 490 | "g = dx(dx(op))" 491 | ], 492 | "execution_count": 0, 493 | "outputs": [] 494 | }, 495 | { 496 | "cell_type": "code", 497 | "metadata": { 498 | "id": "SxtANKE752WV", 499 | "colab_type": "code", 500 | "colab": { 501 | "base_uri": "https://localhost:8080/", 502 | "height": 68 503 | }, 504 | "outputId": "07ae5548-5692-4ca3-8d3e-5c0c17bb43c4" 505 | }, 506 | "source": [ 507 | "g(x)[0].numpy()" 508 | ], 509 | "execution_count": 18, 510 | "outputs": [ 511 | { 512 | "output_type": "execute_result", 513 | "data": { 514 | "text/plain": [ 515 | "array([[2.],\n", 516 | " [2.],\n", 517 | " [2.]], dtype=float32)" 518 | ] 519 | }, 520 | "metadata": { 521 | "tags": [] 522 | }, 523 | "execution_count": 18 524 | } 525 | ] 
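Nesting `tfe.gradients_function` in this way also works for other primitives. As a quick sanity check, reusing the `tf` and `tfe` imports from the cells above, the second derivative of $x^3$ should be $6x$:

```python
# second derivative of x**3 evaluated elementwise; expected output [[6.], [12.], [18.]]
d2 = tfe.gradients_function(tfe.gradients_function(lambda t: t ** 3))

x = tf.Variable([[1.], [2.], [3.]])
print(d2(x)[0].numpy())
```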
526 | }, 527 | { 528 | "cell_type": "markdown", 529 | "metadata": { 530 | "id": "rU1hE2vJa944", 531 | "colab_type": "text" 532 | }, 533 | "source": [ 534 | "We can also compute the gradient of functions with two variables" 535 | ] 536 | }, 537 | { 538 | "cell_type": "code", 539 | "metadata": { 540 | "id": "mAdhmGTlZTtA", 541 | "colab_type": "code", 542 | "colab": {} 543 | }, 544 | "source": [ 545 | "#create a variale with three components \n", 546 | "x = tf.Variable([[1.], [2.], [3.]])\n", 547 | "y = tf.Variable([[2.], [4.], [6.]])\n", 548 | "\n", 549 | "#define the operation \n", 550 | "def op(x, y):\n", 551 | " return x + y \n", 552 | "\n", 553 | "g = tfe.gradients_function(op)" 554 | ], 555 | "execution_count": 0, 556 | "outputs": [] 557 | }, 558 | { 559 | "cell_type": "code", 560 | "metadata": { 561 | "id": "NaPeOHcsa4bf", 562 | "colab_type": "code", 563 | "outputId": "7464e1b6-0b0a-43ee-ef74-9951eb35316b", 564 | "colab": { 565 | "base_uri": "https://localhost:8080/", 566 | "height": 153 567 | } 568 | }, 569 | "source": [ 570 | "g(x, y)" 571 | ], 572 | "execution_count": 20, 573 | "outputs": [ 574 | { 575 | "output_type": "execute_result", 576 | "data": { 577 | "text/plain": [ 578 | "[,\n", 582 | " ]" 586 | ] 587 | }, 588 | "metadata": { 589 | "tags": [] 590 | }, 591 | "execution_count": 20 592 | } 593 | ] 594 | }, 595 | { 596 | "cell_type": "markdown", 597 | "metadata": { 598 | "id": "JkzTLwl7OhuE", 599 | "colab_type": "text" 600 | }, 601 | "source": [ 602 | "## The gradient of a matrix\n", 603 | "\n", 604 | " Mainly we define a real valued function with matrix inputs as \n", 605 | "\n", 606 | "$$f: \\mathbb{R}^{n \\times m} \\to \\mathbb{R} $$\n", 607 | "\n", 608 | "where we define $n$ as the number of rows and $m$ as the number of columns. We usually prefer to work with square matrices because of their nice properties. However, this approach should generalize easily to rectangular matrices \n", 609 | "\n", 610 | "**Example**\n", 611 | "\n", 612 | "Let us define the [frobenious norm](http://mathworld.wolfram.com/FrobeniusNorm.html) for a given matix $A$\n", 613 | "\n", 614 | "$$\\Vert A \\Vert_F = \\sqrt{\\sum_{i=1}^n \\sum_{j=1}^m |a_{ij}|^2} $$\n", 615 | "\n", 616 | "To make things simpler we could rewrite $\\Vert A \\Vert_F = a_{11}^2 + \\dots + a_{nm}^2$. Using this format we can easily deduce that the gradient with respec to the matrix can be evaluated as \n", 617 | "\n", 618 | "$$\\frac{\\partial \\Vert A \\Vert_F}{\\partial a_{ij}} = 2 a_{ij}$$\n", 619 | "\n", 620 | "Or in simpler terms \n", 621 | "\n", 622 | "$$\\frac{\\partial \\Vert A \\Vert_F}{\\partial A} = 2 A$$\n", 623 | "\n", 624 | "\n", 625 | "\n" 626 | ] 627 | }, 628 | { 629 | "cell_type": "code", 630 | "metadata": { 631 | "id": "dSUYfad4RvOy", 632 | "colab_type": "code", 633 | "colab": {} 634 | }, 635 | "source": [ 636 | "#create a variale with three components \n", 637 | "A = tf.constant([[2., 3., 4.], [5., 6., 7.], [8., 9. 
, 10.]])\n", 638 | "\n", 639 | "#define the norm \n", 640 | "def frobenious_norm(A):\n", 641 | " return tf.reduce_sum(tf.square(A))\n", 642 | "\n", 643 | "#evaluate the gradient\n", 644 | "g = tfe.gradients_function(frobenious_norm)" 645 | ], 646 | "execution_count": 0, 647 | "outputs": [] 648 | }, 649 | { 650 | "cell_type": "code", 651 | "metadata": { 652 | "id": "JRDHxQtGR_NQ", 653 | "colab_type": "code", 654 | "outputId": "4d374028-3a00-458c-899f-5250aa61f774", 655 | "colab": { 656 | "base_uri": "https://localhost:8080/", 657 | "height": 68 658 | } 659 | }, 660 | "source": [ 661 | "g(A)[0].numpy()" 662 | ], 663 | "execution_count": 22, 664 | "outputs": [ 665 | { 666 | "output_type": "execute_result", 667 | "data": { 668 | "text/plain": [ 669 | "array([[ 4., 6., 8.],\n", 670 | " [10., 12., 14.],\n", 671 | " [16., 18., 20.]], dtype=float32)" 672 | ] 673 | }, 674 | "metadata": { 675 | "tags": [] 676 | }, 677 | "execution_count": 22 678 | } 679 | ] 680 | }, 681 | { 682 | "cell_type": "markdown", 683 | "metadata": { 684 | "id": "SAOMg-5pTCIO", 685 | "colab_type": "text" 686 | }, 687 | "source": [ 688 | "## Jaccobian Matrix\n", 689 | "\n", 690 | "The Jaccobian matrix is the first order derivatives of a vector valued function. Vector valued functions are defined as $f: \\mathbb{R}^n \\to \\mathbb{R}^m$.\n", 691 | "\n", 692 | "\n", 693 | "Given $x \\in \\mathbb{R}^n$ and $f_j : \\mathbb{R}^n \\to \\mathbb{R}$ we have\n", 694 | "\n", 695 | "$$f(x) = \\begin{bmatrix}\n", 696 | " f_1(x) \\\\\n", 697 | " f_2(x) \\\\\n", 698 | " \\vdots \\\\\n", 699 | " f_m(x)\n", 700 | "\\end{bmatrix}$$\n", 701 | " \n", 702 | " We could then define the jaccobian as \n", 703 | " \n", 704 | " $$J(x) = \\begin{bmatrix}\n", 705 | " \\frac{\\partial f_1}{\\partial x_1} & \\frac{\\partial f_1}{\\partial x_2} & \\dots & \\frac{\\partial f_1}{\\partial x_n} \\\\\n", 706 | " \\frac{\\partial f_2}{\\partial x_1} & \\frac{\\partial f_2}{\\partial x_2} & \\dots & \\frac{\\partial f_2}{\\partial x_n} \\\\\n", 707 | " \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", 708 | " \\frac{\\partial f_m}{\\partial x_1} & \\frac{\\partial f_m}{\\partial x_2} & \\dots & \\frac{\\partial f_m}{\\partial x_n} \n", 709 | "\\end{bmatrix} $$\n", 710 | " \n", 711 | "Hence each row represents the derivative of a real valued function with input vectors. Note that using shape convention we must reshape that to have the same output as the vector input. " 712 | ] 713 | }, 714 | { 715 | "cell_type": "markdown", 716 | "metadata": { 717 | "id": "8bD7dtEMyuSI", 718 | "colab_type": "text" 719 | }, 720 | "source": [ 721 | "**Example**\n", 722 | "\n", 723 | "define $x \\in \\mathbb{R}^n$ and $f: \\mathbb{R}^n \\to \\mathbb{R}^2$ where $f(x) = ( x_1 , x_2)$. 
It is always customary to write it in an expanded form\n", 724 | "\n", 725 | "$$f \\left( \\begin{bmatrix}\n", 726 | " x_{1} \\\\\n", 727 | " x_{2} \\\\\n", 728 | " \\vdots \\\\\n", 729 | " x_{n}\n", 730 | "\\end{bmatrix} \\right) = \\begin{bmatrix}\n", 731 | " x_1 \\\\\n", 732 | " x_2 \\\\\n", 733 | "\\end{bmatrix}$$\n", 734 | "\n", 735 | "The gradient can be evaluated as \n", 736 | "\n", 737 | "$$J \\left(x \\right) = \\begin{bmatrix}\n", 738 | " \\frac{\\partial x_1}{\\partial x} \\\\\n", 739 | " \\frac{\\partial x_2}{\\partial x} \\\\\n", 740 | "\\end{bmatrix} = \\begin{bmatrix}\n", 741 | " 1 & 0 & 0 & \\cdots & 0 \\\\\n", 742 | " 0 & 1 & 0 & \\cdots & 0 \\\\\n", 743 | "\\end{bmatrix}$$\n", 744 | "\n", 745 | "Since we must have the same shape as the input we sum the rows and pad zeros for the other variables\n", 746 | "\n", 747 | "$$J(x) = \\begin{bmatrix}\n", 748 | " 1 & 1 & 0 & \\cdots & 0 \n", 749 | "\\end{bmatrix}^ T$$\n" 750 | ] 751 | }, 752 | { 753 | "cell_type": "code", 754 | "metadata": { 755 | "id": "0O6_ZYzaWmJW", 756 | "colab_type": "code", 757 | "colab": {} 758 | }, 759 | "source": [ 760 | "#create a variale with three components \n", 761 | "x = tf.Variable([[1.], [2.], [3.]])\n", 762 | "\n", 763 | "#define the operation \n", 764 | "def slice(x):\n", 765 | " return x[0:2]\n", 766 | "\n", 767 | "g = tfe.gradients_function(slice)" 768 | ], 769 | "execution_count": 0, 770 | "outputs": [] 771 | }, 772 | { 773 | "cell_type": "code", 774 | "metadata": { 775 | "id": "b2suoSGsXQWq", 776 | "colab_type": "code", 777 | "outputId": "2fb4a6f4-66df-4811-e735-7d84b06894fe", 778 | "colab": { 779 | "base_uri": "https://localhost:8080/", 780 | "height": 68 781 | } 782 | }, 783 | "source": [ 784 | "g(x)[0].numpy()" 785 | ], 786 | "execution_count": 24, 787 | "outputs": [ 788 | { 789 | "output_type": "execute_result", 790 | "data": { 791 | "text/plain": [ 792 | "array([[1.],\n", 793 | " [1.],\n", 794 | " [0.]], dtype=float32)" 795 | ] 796 | }, 797 | "metadata": { 798 | "tags": [] 799 | }, 800 | "execution_count": 24 801 | } 802 | ] 803 | }, 804 | { 805 | "cell_type": "markdown", 806 | "metadata": { 807 | "id": "YUIUaX1zRJpv", 808 | "colab_type": "text" 809 | }, 810 | "source": [ 811 | "**Example**\n", 812 | "\n", 813 | "Let $A$ be an $m \\times n$ constant matrix and $x$ be $n\\times 1$ vector \n", 814 | "\n", 815 | "$$f(x) = A x $$\n", 816 | "\n", 817 | "In an expanded form this is like \n", 818 | "\n", 819 | "$$f(x) = \\begin{bmatrix}\n", 820 | " a_{11} & a_{12} & a_{13} & \\dots & a_{1n} \\\\\n", 821 | " a_{21} & a_{22} & a_{23} & \\dots & a_{2n} \\\\\n", 822 | " \\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n", 823 | " a_{m1} & a_{m2} & a_{m3} & \\dots & a_{mn}\n", 824 | "\\end{bmatrix} \\times \\begin{bmatrix}\n", 825 | " x_{1} \\\\\n", 826 | " x_{2} \\\\\n", 827 | " \\vdots \\\\\n", 828 | " x_{n}\n", 829 | "\\end{bmatrix} = \\begin{bmatrix}\n", 830 | " A_1 \\cdot x \\\\\n", 831 | " A_2 \\cdot x \\\\\n", 832 | " \\vdots \\\\\n", 833 | " A_n \\cdot x\n", 834 | "\\end{bmatrix} $$\n", 835 | "\n", 836 | "Where $A_i$ is the ith row and $\\cdot$ is the dot product of vectors. If we fix the other variables and took the derivative with $x_1$ for instance then we see that the partial derivative is in terms of the column vector of the form $A^T_1$ which is the 1st column of the array. Since we have multiple values we just sum them togehter. Generally $$ \\frac{\\partial f}{\\partial x_i} = \\sum_{j=1}^m a_{ji}$$. 
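A quick NumPy check of this column-sum rule, using the same matrix $A$ that appears in the next code cell; the TensorFlow version of the computation follows below.

```python
import numpy as np

A = np.array([[2., 3., 4.], [5., 6., 7.], [8., 9., 10.]])
print(A.sum(axis=0))       # column sums: [15. 18. 21.]
print(A.T @ np.ones(3))    # the same values written as A^T times a vector of ones
```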
" 837 | ] 838 | }, 839 | { 840 | "cell_type": "markdown", 841 | "metadata": { 842 | "id": "z4d_BEqvXam1", 843 | "colab_type": "text" 844 | }, 845 | "source": [ 846 | "We see that the derivative with respect to each variable is just the sum of the corrosponding column vector in the matrix" 847 | ] 848 | }, 849 | { 850 | "cell_type": "code", 851 | "metadata": { 852 | "id": "s55vrSL2UhDC", 853 | "colab_type": "code", 854 | "colab": {} 855 | }, 856 | "source": [ 857 | "#create a variale with three components \n", 858 | "x = tf.Variable([[1.], [2.], [3.]])\n", 859 | "\n", 860 | "#define the multiplication \n", 861 | "def op(y):\n", 862 | " A = tf.constant([[2., 3., 4.], [5., 6., 7.], [8., 9. , 10.]])\n", 863 | " return tf.matmul(A, y)\n", 864 | "\n", 865 | "#evaluate the gradient\n", 866 | "g = tfe.gradients_function(op)" 867 | ], 868 | "execution_count": 0, 869 | "outputs": [] 870 | }, 871 | { 872 | "cell_type": "code", 873 | "metadata": { 874 | "id": "gL4jIKzFbF2p", 875 | "colab_type": "code", 876 | "outputId": "efaa74aa-42aa-4de9-d77c-394e2f5750ad", 877 | "colab": { 878 | "base_uri": "https://localhost:8080/", 879 | "height": 68 880 | } 881 | }, 882 | "source": [ 883 | "g(x)[0].numpy()" 884 | ], 885 | "execution_count": 26, 886 | "outputs": [ 887 | { 888 | "output_type": "execute_result", 889 | "data": { 890 | "text/plain": [ 891 | "array([[15.],\n", 892 | " [18.],\n", 893 | " [21.]], dtype=float32)" 894 | ] 895 | }, 896 | "metadata": { 897 | "tags": [] 898 | }, 899 | "execution_count": 26 900 | } 901 | ] 902 | }, 903 | { 904 | "cell_type": "markdown", 905 | "metadata": { 906 | "id": "-VBSUGKyZQwr", 907 | "colab_type": "text" 908 | }, 909 | "source": [ 910 | "### References\n", 911 | "http://www.columbia.edu/~ahd2125/post/2015/12/5/\n", 912 | "\n", 913 | "https://www.easy-tensorflow.com/tf-tutorials/basics/graph-and-session\n", 914 | "\n", 915 | "https://rufflewind.com/2016-12-30/reverse-mode-automatic-differentiation" 916 | ] 917 | } 918 | ] 919 | } -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Zaid Alyafeai 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /ONePlace.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "ONePlace.ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "include_colab_link": true 10 | }, 11 | "kernelspec": { 12 | "name": "python3", 13 | "display_name": "Python 3" 14 | }, 15 | "accelerator": "GPU" 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "[View in Colaboratory](https://colab.research.google.com/github/zaidalyafeai/Notebooks/blob/master/ONePlace.ipynb)" 26 | ] 27 | }, 28 | { 29 | "metadata": { 30 | "id": "bSaqAJczA8GS", 31 | "colab_type": "text" 32 | }, 33 | "cell_type": "markdown", 34 | "source": [ 35 | "# Contents" 36 | ] 37 | }, 38 | { 39 | "metadata": { 40 | "id": "SolX8T8iAnVz", 41 | "colab_type": "toc" 42 | }, 43 | "cell_type": "markdown", 44 | "source": [ 45 | ">[Introduction](#scrollTo=taEElfKSBHge)\n", 46 | "\n", 47 | ">[Setup GitHub](#scrollTo=IKr3caNnTCqh)\n", 48 | "\n", 49 | ">[Creating the model using keras](#scrollTo=tVqBtdDDPqtM)\n", 50 | "\n", 51 | ">[Convert the Model](#scrollTo=glkP5CvySfgK)\n", 52 | "\n", 53 | ">[Create a Web Page](#scrollTo=dr8MnQUbUBY7)\n", 54 | "\n", 55 | ">[Upload the model to GitHub](#scrollTo=RGI4Lj3IS1kC)\n", 56 | "\n" 57 | ] 58 | }, 59 | { 60 | "metadata": { 61 | "id": "taEElfKSBHge", 62 | "colab_type": "text" 63 | }, 64 | "cell_type": "markdown", 65 | "source": [ 66 | "# Introduction \n", 67 | "In this tutorial we learn how to \n", 68 | "\n", 69 | "\n", 70 | "1. Train a model with Keras with GPU\n", 71 | "2. Convert a model to web format \n", 72 | "3. Upload the model to GitHub Pages \n", 73 | "4. Prediction using TensorFlow.js \n", 74 | "\n" 75 | ] 76 | }, 77 | { 78 | "metadata": { 79 | "id": "IKr3caNnTCqh", 80 | "colab_type": "text" 81 | }, 82 | "cell_type": "markdown", 83 | "source": [ 84 | "# Setup GitHub" 85 | ] 86 | }, 87 | { 88 | "metadata": { 89 | "id": "Yu-qRz_SOAoQ", 90 | "colab_type": "text" 91 | }, 92 | "cell_type": "markdown", 93 | "source": [ 94 | "Enter your email and user name " 95 | ] 96 | }, 97 | { 98 | "metadata": { 99 | "id": "mlxScGFrN7PZ", 100 | "colab_type": "code", 101 | "colab": {} 102 | }, 103 | "cell_type": "code", 104 | "source": [ 105 | "!git config --global user.email \"email\"\n", 106 | "!git config --global user.name \"user\"" 107 | ], 108 | "execution_count": 0, 109 | "outputs": [] 110 | }, 111 | { 112 | "metadata": { 113 | "id": "jBgrC0BCOjnX", 114 | "colab_type": "text" 115 | }, 116 | "cell_type": "markdown", 117 | "source": [ 118 | "Clone the directory of github pages. If you don't have one, check this [GitHub Pages](https://pages.github.com/). 
For instance mine is\n", 119 | "\n", 120 | "`https://github.com/zaidalyafeai/zaidalyafeai.github.io`" 121 | ] 122 | }, 123 | { 124 | "metadata": { 125 | "id": "U0aQzFskOi4L", 126 | "colab_type": "code", 127 | "colab": {} 128 | }, 129 | "cell_type": "code", 130 | "source": [ 131 | "!git clone " 132 | ], 133 | "execution_count": 0, 134 | "outputs": [] 135 | }, 136 | { 137 | "metadata": { 138 | "id": "muiepE0p7f46", 139 | "colab_type": "text" 140 | }, 141 | "cell_type": "markdown", 142 | "source": [ 143 | "Change directory " 144 | ] 145 | }, 146 | { 147 | "metadata": { 148 | "id": "JeFg-o5b7d99", 149 | "colab_type": "code", 150 | "colab": {} 151 | }, 152 | "cell_type": "code", 153 | "source": [ 154 | "import os\n", 155 | "os.chdir('')" 156 | ], 157 | "execution_count": 0, 158 | "outputs": [] 159 | }, 160 | { 161 | "metadata": { 162 | "id": "qah-8mhPPAiy", 163 | "colab_type": "text" 164 | }, 165 | "cell_type": "markdown", 166 | "source": [ 167 | "Create a new foulder for the project and change the current directory to be inside " 168 | ] 169 | }, 170 | { 171 | "metadata": { 172 | "id": "O7IIAbu6O4uO", 173 | "colab_type": "code", 174 | "colab": {} 175 | }, 176 | "cell_type": "code", 177 | "source": [ 178 | "!mkdir XOR\n", 179 | "os.chdir('XOR')" 180 | ], 181 | "execution_count": 0, 182 | "outputs": [] 183 | }, 184 | { 185 | "metadata": { 186 | "id": "GwvM9LncPdh-", 187 | "colab_type": "text" 188 | }, 189 | "cell_type": "markdown", 190 | "source": [ 191 | "Create a directory for saving the keras model and the web model " 192 | ] 193 | }, 194 | { 195 | "metadata": { 196 | "id": "wOyiFQOfPWYR", 197 | "colab_type": "code", 198 | "colab": {} 199 | }, 200 | "cell_type": "code", 201 | "source": [ 202 | "!mkdir web_model \n", 203 | "!mkdir saved_model" 204 | ], 205 | "execution_count": 0, 206 | "outputs": [] 207 | }, 208 | { 209 | "metadata": { 210 | "id": "tVqBtdDDPqtM", 211 | "colab_type": "text" 212 | }, 213 | "cell_type": "markdown", 214 | "source": [ 215 | "# Creating the model using keras " 216 | ] 217 | }, 218 | { 219 | "metadata": { 220 | "id": "b1JCrGrePvKp", 221 | "colab_type": "text" 222 | }, 223 | "cell_type": "markdown", 224 | "source": [ 225 | "We will create a simple model that models XOR operation. 
Given two inputs $(x_0, x_1)$ it outputs $y$\n", 226 | "\n", 227 | "$$\\left[\\begin{array}{cc|c} \n", 228 | " x_0 & x_1 & y\\\\\n", 229 | " 0 & 0 & 0\\\\ \n", 230 | " 0 & 1 & 1\\\\\n", 231 | " 1 & 0 & 1\\\\\n", 232 | " 1 & 1 & 0\n", 233 | "\\end{array}\\right]$$" 234 | ] 235 | }, 236 | { 237 | "metadata": { 238 | "id": "WKYiL-oYR0yk", 239 | "colab_type": "text" 240 | }, 241 | "cell_type": "markdown", 242 | "source": [ 243 | "Imports " 244 | ] 245 | }, 246 | { 247 | "metadata": { 248 | "id": "a-UbSG-DR3ID", 249 | "colab_type": "code", 250 | "colab": { 251 | "base_uri": "https://localhost:8080/", 252 | "height": 34 253 | }, 254 | "outputId": "3c877413-375c-4eca-b7aa-9234cb927538" 255 | }, 256 | "cell_type": "code", 257 | "source": [ 258 | "from keras.models import Sequential\n", 259 | "from keras.layers.core import Dense, Dropout, Activation\n", 260 | "from keras.optimizers import SGD\n", 261 | "import numpy as np " 262 | ], 263 | "execution_count": 13, 264 | "outputs": [ 265 | { 266 | "output_type": "stream", 267 | "text": [ 268 | "Using TensorFlow backend.\n" 269 | ], 270 | "name": "stderr" 271 | } 272 | ] 273 | }, 274 | { 275 | "metadata": { 276 | "id": "UvBySAixR4Ca", 277 | "colab_type": "text" 278 | }, 279 | "cell_type": "markdown", 280 | "source": [ 281 | "Initialize the inputs " 282 | ] 283 | }, 284 | { 285 | "metadata": { 286 | "id": "Hj65iQS6R6pO", 287 | "colab_type": "code", 288 | "colab": {} 289 | }, 290 | "cell_type": "code", 291 | "source": [ 292 | "X = np.array([[0,0],[0,1],[1,0],[1,1]])\n", 293 | "y = np.array([[0],[1],[1],[0]])" 294 | ], 295 | "execution_count": 0, 296 | "outputs": [] 297 | }, 298 | { 299 | "metadata": { 300 | "id": "TTyQKnEgSBQb", 301 | "colab_type": "text" 302 | }, 303 | "cell_type": "markdown", 304 | "source": [ 305 | "Create the model " 306 | ] 307 | }, 308 | { 309 | "metadata": { 310 | "id": "ivnpyw3ZSAF9", 311 | "colab_type": "code", 312 | "colab": {} 313 | }, 314 | "cell_type": "code", 315 | "source": [ 316 | "model = Sequential()\n", 317 | "model.add(Dense(8, input_dim=2))\n", 318 | "model.add(Activation('tanh'))\n", 319 | "model.add(Dense(1))\n", 320 | "model.add(Activation('sigmoid'))\n", 321 | "\n", 322 | "sgd = SGD(lr=0.1)\n", 323 | "model.compile(loss='binary_crossentropy', optimizer=sgd)" 324 | ], 325 | "execution_count": 0, 326 | "outputs": [] 327 | }, 328 | { 329 | "metadata": { 330 | "id": "zzrpHO1XSIeJ", 331 | "colab_type": "text" 332 | }, 333 | "cell_type": "markdown", 334 | "source": [ 335 | "Train the model " 336 | ] 337 | }, 338 | { 339 | "metadata": { 340 | "id": "jRwYsPJxRrYT", 341 | "colab_type": "code", 342 | "colab": { 343 | "base_uri": "https://localhost:8080/", 344 | "height": 69 345 | }, 346 | "outputId": "b6d6e30d-b51f-4266-d468-88af8f7ff384" 347 | }, 348 | "cell_type": "code", 349 | "source": [ 350 | "model.fit(X, y, batch_size=1, nb_epoch=1000, verbose= 0)" 351 | ], 352 | "execution_count": 16, 353 | "outputs": [ 354 | { 355 | "output_type": "stream", 356 | "text": [ 357 | "/usr/local/lib/python3.6/dist-packages/keras/models.py:981: UserWarning: The `nb_epoch` argument in `fit` has been renamed `epochs`.\n", 358 | " warnings.warn('The `nb_epoch` argument in `fit` '\n" 359 | ], 360 | "name": "stderr" 361 | }, 362 | { 363 | "output_type": "execute_result", 364 | "data": { 365 | "text/plain": [ 366 | "" 367 | ] 368 | }, 369 | "metadata": { 370 | "tags": [] 371 | }, 372 | "execution_count": 16 373 | } 374 | ] 375 | }, 376 | { 377 | "metadata": { 378 | "id": "VHlJ2cmpSbZ7", 379 | "colab_type": "text" 380 | }, 381 | "cell_type": 
"markdown", 382 | "source": [ 383 | "Predict the output " 384 | ] 385 | }, 386 | { 387 | "metadata": { 388 | "id": "ky1bM2EiSHYt", 389 | "colab_type": "code", 390 | "colab": { 391 | "base_uri": "https://localhost:8080/", 392 | "height": 86 393 | }, 394 | "outputId": "c0e91991-4252-4649-99fd-64c044f8148d" 395 | }, 396 | "cell_type": "code", 397 | "source": [ 398 | "print(model.predict_proba(X))" 399 | ], 400 | "execution_count": 17, 401 | "outputs": [ 402 | { 403 | "output_type": "stream", 404 | "text": [ 405 | "[[0.00199795]\n", 406 | " [0.99443084]\n", 407 | " [0.99369615]\n", 408 | " [0.00691568]]\n" 409 | ], 410 | "name": "stdout" 411 | } 412 | ] 413 | }, 414 | { 415 | "metadata": { 416 | "id": "vvdWZCRslZUz", 417 | "colab_type": "text" 418 | }, 419 | "cell_type": "markdown", 420 | "source": [ 421 | "Save the model " 422 | ] 423 | }, 424 | { 425 | "metadata": { 426 | "id": "mxRke-l9lXfY", 427 | "colab_type": "code", 428 | "colab": {} 429 | }, 430 | "cell_type": "code", 431 | "source": [ 432 | "model.save('saved_model/keras.h5')" 433 | ], 434 | "execution_count": 0, 435 | "outputs": [] 436 | }, 437 | { 438 | "metadata": { 439 | "id": "glkP5CvySfgK", 440 | "colab_type": "text" 441 | }, 442 | "cell_type": "markdown", 443 | "source": [ 444 | "# Convert the Model " 445 | ] 446 | }, 447 | { 448 | "metadata": { 449 | "id": "q30sPc63lbvw", 450 | "colab_type": "text" 451 | }, 452 | "cell_type": "markdown", 453 | "source": [ 454 | "Download the library " 455 | ] 456 | }, 457 | { 458 | "metadata": { 459 | "id": "-FSJVtS9SiVi", 460 | "colab_type": "code", 461 | "colab": {} 462 | }, 463 | "cell_type": "code", 464 | "source": [ 465 | "!pip install tensorflowjs" 466 | ], 467 | "execution_count": 0, 468 | "outputs": [] 469 | }, 470 | { 471 | "metadata": { 472 | "id": "HWCP02udldLr", 473 | "colab_type": "text" 474 | }, 475 | "cell_type": "markdown", 476 | "source": [ 477 | "Convert the model " 478 | ] 479 | }, 480 | { 481 | "metadata": { 482 | "id": "DuQP_mkeSkKL", 483 | "colab_type": "code", 484 | "colab": { 485 | "base_uri": "https://localhost:8080/", 486 | "height": 34 487 | }, 488 | "outputId": "2ccdd0bd-4536-4923-c973-34df97eb1b87" 489 | }, 490 | "cell_type": "code", 491 | "source": [ 492 | "!tensorflowjs_converter --input_format keras saved_model/keras.h5 web_model" 493 | ], 494 | "execution_count": 19, 495 | "outputs": [ 496 | { 497 | "output_type": "stream", 498 | "text": [ 499 | "Using TensorFlow backend.\r\n" 500 | ], 501 | "name": "stdout" 502 | } 503 | ] 504 | }, 505 | { 506 | "metadata": { 507 | "id": "dr8MnQUbUBY7", 508 | "colab_type": "text" 509 | }, 510 | "cell_type": "markdown", 511 | "source": [ 512 | "# Create a Web Page " 513 | ] 514 | }, 515 | { 516 | "metadata": { 517 | "id": "SC9QaDQreTDr", 518 | "colab_type": "text" 519 | }, 520 | "cell_type": "markdown", 521 | "source": [ 522 | "Import TensorFlow.js " 523 | ] 524 | }, 525 | { 526 | "metadata": { 527 | "id": "iwJQerK2eA_u", 528 | "colab_type": "code", 529 | "colab": {} 530 | }, 531 | "cell_type": "code", 532 | "source": [ 533 | "header = '\\n'" 534 | ], 535 | "execution_count": 0, 536 | "outputs": [] 537 | }, 538 | { 539 | "metadata": { 540 | "id": "GE_Y5U3UeW6U", 541 | "colab_type": "text" 542 | }, 543 | "cell_type": "markdown", 544 | "source": [ 545 | "Code for loading the web model. We predict a tensor of zeros and show the result in the page. 
" 546 | ] 547 | }, 548 | { 549 | "metadata": { 550 | "id": "kpGEMkjJecBM", 551 | "colab_type": "code", 552 | "colab": {} 553 | }, 554 | "cell_type": "code", 555 | "source": [ 556 | "script = '\\\n", 557 | "\\n\\\n", 565 | " \\n'" 566 | ], 567 | "execution_count": 0, 568 | "outputs": [] 569 | }, 570 | { 571 | "metadata": { 572 | "id": "0TDOfXR6f9tp", 573 | "colab_type": "text" 574 | }, 575 | "cell_type": "markdown", 576 | "source": [ 577 | "Body of the page" 578 | ] 579 | }, 580 | { 581 | "metadata": { 582 | "id": "cf5VErepf9H0", 583 | "colab_type": "code", 584 | "colab": {} 585 | }, 586 | "cell_type": "code", 587 | "source": [ 588 | "body = '\\\n", 589 | "\\n\\\n", 590 | "

\\n\\\n", 591 | "'" 592 | ], 593 | "execution_count": 0, 594 | "outputs": [] 595 | }, 596 | { 597 | "metadata": { 598 | "id": "2DaBOiA-jTER", 599 | "colab_type": "text" 600 | }, 601 | "cell_type": "markdown", 602 | "source": [ 603 | "Save the code as html file" 604 | ] 605 | }, 606 | { 607 | "metadata": { 608 | "id": "pM6JIkRCglMu", 609 | "colab_type": "code", 610 | "colab": {} 611 | }, 612 | "cell_type": "code", 613 | "source": [ 614 | "with open('index.html','w') as f:\n", 615 | " f.write(header+script+body)\n", 616 | " f.close()" 617 | ], 618 | "execution_count": 0, 619 | "outputs": [] 620 | }, 621 | { 622 | "metadata": { 623 | "id": "RGI4Lj3IS1kC", 624 | "colab_type": "text" 625 | }, 626 | "cell_type": "markdown", 627 | "source": [ 628 | "# Upload the model to GitHub " 629 | ] 630 | }, 631 | { 632 | "metadata": { 633 | "id": "t9sqGiaOTXpq", 634 | "colab_type": "text" 635 | }, 636 | "cell_type": "markdown", 637 | "source": [ 638 | "Use the following to generate an access token https://github.com/settings/tokens/new\n", 639 | ". Once you do that you can push the commits to your repository using the following \n", 640 | "\n", 641 | "`https://:x-oauth-basic@github.com/ master` \n", 642 | "\n", 643 | "For instance \n", 644 | "\n", 645 | "`https://7ab3a8fe5742bf829f1a832a5f330a8365820:x-oauth-basic@github.com/zaidalyafeai/zaidalyafeai.github.io master`" 646 | ] 647 | }, 648 | { 649 | "metadata": { 650 | "id": "-H8rf2agOdGh", 651 | "colab_type": "code", 652 | "colab": {} 653 | }, 654 | "cell_type": "code", 655 | "source": [ 656 | "!git add . \n", 657 | "!git commit -m \"create xor project\"\n", 658 | "!git push :x-oauth-basic@github.com/ master" 659 | ], 660 | "execution_count": 0, 661 | "outputs": [] 662 | } 663 | ] 664 | } -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Notebooks 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 98 | 99 | 100 | 101 | 102 | 104 | 105 | 108 | 109 | 110 | 111 | 112 | 114 | 115 | 118 | 119 | 120 | 121 | 122 | 124 | 125 | 128 | 129 | 130 | 131 | 132 | 134 | 135 | 138 | 139 | 140 | 141 | 142 | 144 | 145 | 148 | 149 | 150 | 151 | 152 | 155 | 156 | 159 | 160 | 161 | 162 | 163 | 165 | 166 | 169 | 170 | 171 | 172 | 173 | 175 | 176 | 179 | 180 | 181 | 182 | 183 | 186 | 187 | 190 | 191 | 192 | 193 | 194 | 197 | 198 | 201 | 202 | 203 | 204 | 205 | 208 | 209 | 212 | 213 | 214 | 215 | 216 | 219 | 220 | 223 | 224 | 225 | 226 | 227 | 230 | 231 | 234 | 235 | 236 | 237 | 238 | 241 | 242 | 245 | 246 | 247 |
NameDescriptionCategory Link
Training pix2pixThis notebook shows a simple pipeline for training pix2pix on a simple dataset. Most of the code is based on this implementation. GAN 15 | 16 |
One PlaceThis notebook shows how to train, test then deploy models in the browser directly from one notebook. We use a simple XOR example to prove this simple concept.Deployment 24 | 25 |
TPU vs GPUGoogle recently allowed training on TPUs for free on colab. This notebook explains how to enable TPU training. Also, it reports some benchmarks using mnist dataset by comparing TPU and GPU performance.TPU 33 | 34 |
Keras Custom Data GeneratorThis notebook shows to create a custom data genertor in keras.Data Generatation 42 | 43 |
Eager Execution (1)As we know that TenosrFlow works with static graphs. So, first you have to create the graph then execute it later. This makes debugging a bit complicated. With Eager Execution you can now evalute operations directly without creating a session. Dynamic Graphs 51 | 52 |
Eager Execution (2)In this notebook I explain different concepts in eager execution. I go over variables, ops, gradients, custom gradients, callbacks, metrics and creating models with tf.keras and saving/restoring them.Dynamic Graphs 60 | 61 |
SketcherCreate a simple app to recognize 100 drawings from the quickdraw dataset. A simple CNN model is created and served to deoploy in the browser to create a sketch recognizer app. Deployment 69 | 70 |
QuickDraw10In this notebook we provide QuickDraw10 as an alternative for MNIST. A script is provided to download and load a preprocessed dataset for 10 classes with training and testing split. Also, a simple CNN model is implemented for training and testing. Data Preperation 78 | 79 |
AutoencodersAutoencoders consists of two structures: the encoder and the decoder. The encoder network downsamples the data into lower dimensions and the decoder network reconstructs the original data from the lower dimension representation. The lower dimension representation is usually called latent space representation. Auto-encoder 87 | 88 |
Weight TransferIn this tutorial we explain how to transfer weights from a static graph model built with TensorFlow to a dynamic graph built with Keras. We will first train a model using Tensorflow then we will create the same model in keras and transfer the trained weights between the two models. Weights Save and Load 96 | 97 |
BigGan (1)Create some cool gifs by interpolation in the latent space of the BigGan model. The model is imported from tensorflow hub. 103 | GAN 106 | 107 |
BigGan (2)In this notebook I give a basic introduction to bigGans. I also, how to interpolate between z-vector values. Moreover, I show the results of multiple experiments I made in the latent space of BigGans. 113 | GAN 116 | 117 |
Mask R-CNN In this notebook a pretrained Mask R-CNN model is used to predict the bounding box and the segmentation mask of objects. I used this notebook to create the dataset for training the pix2pix model. 123 | Segmentation 126 | 127 |
QuickDraw Strokes A notebook exploring the drawing data of quickdraw. I also illustrate how to make a cool animation of the drawing process in colab. 133 | Data Preperation 136 | 137 |
U-Net The U-Net model is a simple fully convolutional neural network that is used for binary segmentation i.e foreground and background pixel-wise classification. In this notebook we use it to segment cats and dogs from arbitrary images. 143 | Segmentation 146 | 147 |
Localizer A simple CNN with a regression branch to predict bounding box parameters. The model is trained on a dataset 153 | of dogs and cats with bounding box annotations around the head of the pets. 154 | Object Localization 157 | 158 |
Classification and Localization We create a simple CNN with two branches for classification and locazliation of cats and dogs. 164 | Classification, Localization 167 | 168 |
Transfer Learning A notebook about using Mobilenet for transfer learning in TensorFlow. The model is very fast and achieves 97% validation accuracy on a binary classification dataset. 174 | Transfer Learning 177 | 178 |
Hand Detection 184 | In this task we want to localize the right and left hands for each person that exists in a single frame. It acheives around 0.85 IoU. 185 | Detection 188 | 189 |
Face Detection 195 | In this task we used a simple version of SSD for face detection. The model was trained on less than 3K images using TensorFlow with eager execution 196 | Detection 199 | 200 |
TensorFlow 2.0 206 | In this task we use the brand new TF 2.0 with default eager execution. We explore, tensors, gradients, dataset and many more. 207 | Platform 210 | 211 |
SC-FEGAN 217 | In this notebook, you can play directly with the SC-FEGAN for face-editting directly in the browser. 218 | GAN 221 | 222 |
Swift for TensorFlow 228 | Swift for TensorFlow is a next-generation platform for machine learning that incorporates differentiable programming. In this notebook I go over its basics and also show how to create a simple NN and CNN. 229 | Platform 232 | 233 |
GCN 239 | Ever asked yourself how to use convolutional networks on non-Euclidean data, for instance graphs? GCNs are becoming increasingly popular for solving such problems. I used Deep GCNs to classify spammers & non-spammers. 240 | Platform 243 | 244 |
248 | 249 | 250 | -------------------------------------------------------------------------------- /Strokes_QuickDraw.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Strokes_QuickDraw.ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "collapsed_sections": [], 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "name": "python3", 14 | "display_name": "Python 3" 15 | }, 16 | "accelerator": "GPU" 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "view-in-github", 23 | "colab_type": "text" 24 | }, 25 | "source": [ 26 | "\"Open" 27 | ] 28 | }, 29 | { 30 | "metadata": { 31 | "id": "VRvAWhDx42vJ", 32 | "colab_type": "text" 33 | }, 34 | "cell_type": "markdown", 35 | "source": [ 36 | "# Introduction\n", 37 | "\n", 38 | "The Quick Draw Dataset is a collection of 50 million drawings across 345 categories, contributed by players of the game Quick Draw. \n", 39 | "\n", 40 | "![alt text](https://raw.githubusercontent.com/googlecreativelab/quickdraw-dataset/master/preview.jpg)" 41 | ] 42 | }, 43 | { 44 | "metadata": { 45 | "id": "x3mArkrmcIMD", 46 | "colab_type": "text" 47 | }, 48 | "cell_type": "markdown", 49 | "source": [ 50 | "# The Raw Dataset\n", 51 | "\n", 52 | "This table shows a description of the fields of each entry in the dataset" 53 | ] 54 | }, 55 | { 56 | "metadata": { 57 | "id": "18brrxIxchKc", 58 | "colab_type": "text" 59 | }, 60 | "cell_type": "markdown", 61 | "source": [ 62 | ">Key | Type | Description\n", 63 | ">--- | ---\n", 64 | ">key_id \t| 64-bit unsigned integer |\tA unique identifier across all drawings.\n", 65 | "> word |\tstring \t|Category the player was prompted to draw.\n", 66 | ">recognized |\tboolean \t|Whether the word was recognized by the game.\n", 67 | "> timestamp \t| datetime \t| When the drawing was created.\n", 68 | "> countrycode |\tstring |\tA two letter country code \n", 69 | "> drawing |\tstring |\tA JSON array representing the vector drawing" 70 | ] 71 | }, 72 | { 73 | "metadata": { 74 | "id": "tY1OeVFpeIk0", 75 | "colab_type": "text" 76 | }, 77 | "cell_type": "markdown", 78 | "source": [ 79 | "# Imports" 80 | ] 81 | }, 82 | { 83 | "metadata": { 84 | "id": "0ABX6O4kYwYS", 85 | "colab_type": "code", 86 | "colab": {} 87 | }, 88 | "cell_type": "code", 89 | "source": [ 90 | "import os\n", 91 | "import io\n", 92 | "import random\n", 93 | "import glob\n", 94 | "import math\n", 95 | "import base64\n", 96 | "import json\n", 97 | "import numpy as np\n", 98 | "import urllib.request\n", 99 | "import numpy as np\n", 100 | "import matplotlib.pyplot as plt\n", 101 | "from matplotlib import animation\n", 102 | "from IPython.display import HTML" 103 | ], 104 | "execution_count": 0, 105 | "outputs": [] 106 | }, 107 | { 108 | "metadata": { 109 | "id": "5NDfBHVjACAt", 110 | "colab_type": "text" 111 | }, 112 | "cell_type": "markdown", 113 | "source": [ 114 | "# Download the Dataset " 115 | ] 116 | }, 117 | { 118 | "metadata": { 119 | "id": "7MC_PUS-fKjH", 120 | "colab_type": "text" 121 | }, 122 | "cell_type": "markdown", 123 | "source": [ 124 | "Loop over the classes and download the currospondent data. We only download 10 classes for visualization. 
" 125 | ] 126 | }, 127 | { 128 | "metadata": { 129 | "id": "rdSUnpL0u22Q", 130 | "colab_type": "code", 131 | "colab": {} 132 | }, 133 | "cell_type": "code", 134 | "source": [ 135 | "!mkdir data\n", 136 | "classes = ['table', 'sun', 'laptop', 'face', 'pants', 'ladder', 'eyeglasses', 'camera', 'sword', 'cat']" 137 | ], 138 | "execution_count": 0, 139 | "outputs": [] 140 | }, 141 | { 142 | "metadata": { 143 | "id": "22DPhL5FtWcQ", 144 | "colab_type": "code", 145 | "colab": {} 146 | }, 147 | "cell_type": "code", 148 | "source": [ 149 | "def download(): \n", 150 | " #base link \n", 151 | " base = 'https://storage.googleapis.com/quickdraw_dataset/full/'\n", 152 | " \n", 153 | " #download each class as json files \n", 154 | " for c in classes:\n", 155 | " path = f'{base}raw/{c}.ndjson'\n", 156 | " print(path)\n", 157 | " urllib.request.urlretrieve(path, f'data/{c}.ndjson')" 158 | ], 159 | "execution_count": 0, 160 | "outputs": [] 161 | }, 162 | { 163 | "metadata": { 164 | "id": "O5jF6TXXu-Bu", 165 | "colab_type": "code", 166 | "outputId": "82935531-e229-4e8d-a9e1-b1d3078e6a12", 167 | "colab": { 168 | "base_uri": "https://localhost:8080/", 169 | "height": 191 170 | } 171 | }, 172 | "cell_type": "code", 173 | "source": [ 174 | "download() " 175 | ], 176 | "execution_count": 6, 177 | "outputs": [ 178 | { 179 | "output_type": "stream", 180 | "text": [ 181 | "https://storage.googleapis.com/quickdraw_dataset/full/raw/table.ndjson\n", 182 | "https://storage.googleapis.com/quickdraw_dataset/full/raw/sun.ndjson\n", 183 | "https://storage.googleapis.com/quickdraw_dataset/full/raw/laptop.ndjson\n", 184 | "https://storage.googleapis.com/quickdraw_dataset/full/raw/face.ndjson\n", 185 | "https://storage.googleapis.com/quickdraw_dataset/full/raw/pants.ndjson\n", 186 | "https://storage.googleapis.com/quickdraw_dataset/full/raw/ladder.ndjson\n", 187 | "https://storage.googleapis.com/quickdraw_dataset/full/raw/eyeglasses.ndjson\n", 188 | "https://storage.googleapis.com/quickdraw_dataset/full/raw/camera.ndjson\n", 189 | "https://storage.googleapis.com/quickdraw_dataset/full/raw/sword.ndjson\n", 190 | "https://storage.googleapis.com/quickdraw_dataset/full/raw/cat.ndjson\n" 191 | ], 192 | "name": "stdout" 193 | } 194 | ] 195 | }, 196 | { 197 | "metadata": { 198 | "id": "PIx1l3CGex4-", 199 | "colab_type": "text" 200 | }, 201 | "cell_type": "markdown", 202 | "source": [ 203 | "# Load to Memory \n", 204 | "\n", 205 | "Load the `drawing` information for each file. Each drawing contains a number of strokes and each stroke contain the array $[x, y, t]$ where $x,y$ are the coordinates as array and $t$ is the time stamps. 
" 206 | ] 207 | }, 208 | { 209 | "metadata": { 210 | "id": "-Fwjz3DRbhzj", 211 | "colab_type": "code", 212 | "colab": {} 213 | }, 214 | "cell_type": "code", 215 | "source": [ 216 | "drawings = []\n", 217 | "\n", 218 | "files = os.listdir('data')\n", 219 | "\n", 220 | "i = 0 \n", 221 | "\n", 222 | "for file in files:\n", 223 | " contents = open(f'data/{file}', \"r\").read() \n", 224 | " data = contents.split('\\n')\n", 225 | " \n", 226 | " #load samples for each class \n", 227 | " for h in data[:5]:\n", 228 | " drawings.append(json.loads(h)['drawing'])\n", 229 | " i += 1" 230 | ], 231 | "execution_count": 0, 232 | "outputs": [] 233 | }, 234 | { 235 | "metadata": { 236 | "id": "_O_7-oJOy2OD", 237 | "colab_type": "code", 238 | "colab": {} 239 | }, 240 | "cell_type": "code", 241 | "source": [ 242 | "#the first stroke of the first drawing\n", 243 | "[x, y, t] = drawings[0][0]" 244 | ], 245 | "execution_count": 0, 246 | "outputs": [] 247 | }, 248 | { 249 | "metadata": { 250 | "id": "3WNQCtLOfj9D", 251 | "colab_type": "text" 252 | }, 253 | "cell_type": "markdown", 254 | "source": [ 255 | "# Animation" 256 | ] 257 | }, 258 | { 259 | "metadata": { 260 | "id": "u5lOmx6mdD6L", 261 | "colab_type": "code", 262 | "colab": {} 263 | }, 264 | "cell_type": "code", 265 | "source": [ 266 | "def create_animation(drawing, fps = 30, idx = 0, lw = 5): \n", 267 | " \n", 268 | " seq_length = 0 \n", 269 | " \n", 270 | " xmax = 0 \n", 271 | " ymax = 0 \n", 272 | " \n", 273 | " xmin = math.inf\n", 274 | " ymin = math.inf\n", 275 | " \n", 276 | " #retreive min,max and the length of the drawing \n", 277 | " for k in range(0, len(drawing)):\n", 278 | " x = drawing[k][0]\n", 279 | " y = drawing[k][1]\n", 280 | "\n", 281 | " seq_length += len(x)\n", 282 | " xmax = max([max(x), xmax]) \n", 283 | " ymax = max([max(y), ymax]) \n", 284 | " \n", 285 | " xmin = min([min(x), xmin]) \n", 286 | " ymin = min([min(y), ymin]) \n", 287 | " \n", 288 | " i = 0 \n", 289 | " j = 0\n", 290 | " \n", 291 | " # First set up the figure, the axis, and the plot element we want to animate\n", 292 | " fig = plt.figure()\n", 293 | " ax = plt.axes(xlim=(xmax+lw, xmin-lw), ylim=(ymax+lw, ymin-lw))\n", 294 | " ax.set_facecolor(\"white\")\n", 295 | " line, = ax.plot([], [], lw=lw)\n", 296 | "\n", 297 | " #remove the axis \n", 298 | " ax.grid = False\n", 299 | " ax.set_xticks([])\n", 300 | " ax.set_yticks([])\n", 301 | " \n", 302 | " # initialization function: plot the background of each frame\n", 303 | " def init():\n", 304 | " line.set_data([], [])\n", 305 | " return line, \n", 306 | "\n", 307 | " # animation function. This is called sequentially\n", 308 | " def animate(frame): \n", 309 | " nonlocal i, j, line\n", 310 | " x = drawing[i][0]\n", 311 | " y = drawing[i][1]\n", 312 | " line.set_data(x[0:j], y[0:j])\n", 313 | " \n", 314 | " if j >= len(x):\n", 315 | " i +=1\n", 316 | " j = 0 \n", 317 | " line, = ax.plot([], [], lw=lw)\n", 318 | " \n", 319 | " else:\n", 320 | " j += 1\n", 321 | " return line,\n", 322 | " \n", 323 | " # call the animator. blit=True means only re-draw the parts that have changed.\n", 324 | " anim = animation.FuncAnimation(fig, animate, init_func=init,\n", 325 | " frames= seq_length + len(drawing), blit=True)\n", 326 | " plt.close()\n", 327 | " \n", 328 | " # save the animation as an mp4. 
\n", 329 | " anim.save(f'video.mp4', fps=fps, extra_args=['-vcodec', 'libx264'])" 330 | ], 331 | "execution_count": 0, 332 | "outputs": [] 333 | }, 334 | { 335 | "metadata": { 336 | "id": "sLTrgWG1-BrS", 337 | "colab_type": "code", 338 | "colab": {} 339 | }, 340 | "cell_type": "code", 341 | "source": [ 342 | "#create animation for a random drawing \n", 343 | "drawing = drawings[np.random.randint(0, len(drawings))]\n", 344 | "create_animation(drawing)" 345 | ], 346 | "execution_count": 0, 347 | "outputs": [] 348 | }, 349 | { 350 | "metadata": { 351 | "id": "av_gNHXIPHvu", 352 | "colab_type": "code", 353 | "outputId": "5933893e-3aa4-4418-93ed-b3bf0e678d4f", 354 | "colab": { 355 | "base_uri": "https://localhost:8080/", 356 | "height": 417 357 | } 358 | }, 359 | "cell_type": "code", 360 | "source": [ 361 | "video = io.open('video.mp4', 'r+b').read()\n", 362 | "\n", 363 | "encoded = base64.b64encode(video)\n", 364 | "HTML(data=''''''.format(encoded.decode('ascii')))" 367 | ], 368 | "execution_count": 46, 369 | "outputs": [ 370 | { 371 | "output_type": "execute_result", 372 | "data": { 373 | "text/html": [ 374 | "" 377 | ], 378 | "text/plain": [ 379 | "" 380 | ] 381 | }, 382 | "metadata": { 383 | "tags": [] 384 | }, 385 | "execution_count": 46 386 | } 387 | ] 388 | } 389 | ] 390 | } -------------------------------------------------------------------------------- /Swift4TF_CIFAR10.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "Swift4TF_CIFAR10.ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "collapsed_sections": [], 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "name": "swift", 14 | "display_name": "Swift" 15 | }, 16 | "accelerator": "GPU" 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "view-in-github", 23 | "colab_type": "text" 24 | }, 25 | "source": [ 26 | "\"Open" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "metadata": { 32 | "id": "2Rw1ZBHPxdsX", 33 | "colab_type": "code", 34 | "colab": {} 35 | }, 36 | "source": [ 37 | "import Python\n", 38 | "import TensorFlow" 39 | ], 40 | "execution_count": 0, 41 | "outputs": [] 42 | }, 43 | { 44 | "cell_type": "markdown", 45 | "metadata": { 46 | "id": "J2RItu06_wQJ", 47 | "colab_type": "text" 48 | }, 49 | "source": [ 50 | "Import some python libraries that we need " 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "metadata": { 56 | "id": "SslxOdhg3Num", 57 | "colab_type": "code", 58 | "colab": {} 59 | }, 60 | "source": [ 61 | "%include \"EnableIPythonDisplay.swift\"\n", 62 | "IPythonDisplay.shell.enable_matplotlib(\"inline\")\n", 63 | "\n", 64 | "let plt = Python.import(\"matplotlib.pyplot\")\n", 65 | "let np = Python.import(\"numpy\")\n", 66 | "let subprocess = Python.import(\"subprocess\")\n", 67 | "let path = Python.import(\"os.path\")" 68 | ], 69 | "execution_count": 0, 70 | "outputs": [] 71 | }, 72 | { 73 | "cell_type": "markdown", 74 | "metadata": { 75 | "id": "02PhB9dg_09o", 76 | "colab_type": "text" 77 | }, 78 | "source": [ 79 | "Download cifar 10 " 80 | ] 81 | }, 82 | { 83 | "cell_type": "code", 84 | "metadata": { 85 | "id": "sGzbZdJH3F9B", 86 | "colab_type": "code", 87 | "colab": { 88 | "base_uri": "https://localhost:8080/", 89 | "height": 54 90 | }, 91 | "outputId": "ae7aa258-70c2-4b80-df3e-c27c7429c4a5" 92 | }, 93 | "source": [ 94 | "//https://github.com/tensorflow/swift-models/tree/master/CIFAR\n", 95 | "\n", 96 | "let 
filepath = \"./cifar-10-batches-py\"\n", 97 | "let isdir = Bool(path.isdir(filepath))!\n", 98 | "if !isdir {\n", 99 | " print(\"Downloading CIFAR data...\")\n", 100 | " let command = \"wget -nv -O- https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz | tar xzf - -C .\"\n", 101 | " subprocess.call(command, shell: true)\n", 102 | "}" 103 | ], 104 | "execution_count": 3, 105 | "outputs": [ 106 | { 107 | "output_type": "stream", 108 | "text": [ 109 | "Downloading CIFAR data...\n", 110 | "2019-05-05 22:15:06 URL:https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz [170498071/170498071] -> \"-\" [1]\n" 111 | ], 112 | "name": "stdout" 113 | } 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": { 119 | "id": "C1iYm9INCh2A", 120 | "colab_type": "text" 121 | }, 122 | "source": [ 123 | "Setup the dataset " 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "metadata": { 129 | "id": "1aiJm7Ht1S9c", 130 | "colab_type": "code", 131 | "colab": {} 132 | }, 133 | "source": [ 134 | "//https://github.com/tensorflow/swift-models/tree/master/CIFAR\n", 135 | "\n", 136 | "var batchSize:Int = 64 \n", 137 | "\n", 138 | "func loadCIFARFile(named name: String, in directory: String = \".\") -> (Tensor, Tensor) {\n", 139 | " let np = Python.import(\"numpy\")\n", 140 | " let pickle = Python.import(\"pickle\")\n", 141 | " let path = \"\\(directory)/cifar-10-batches-py/\\(name)\"\n", 142 | " let f = Python.open(path, \"rb\")\n", 143 | " let res = pickle.load(f, encoding: \"bytes\")\n", 144 | "\n", 145 | " let bytes = res[Python.bytes(\"data\", encoding: \"utf8\")]\n", 146 | " let labels = res[Python.bytes(\"labels\", encoding: \"utf8\")]\n", 147 | "\n", 148 | " let labelTensor = Tensor(numpy: np.array(labels))!\n", 149 | " let images = Tensor(numpy: bytes)!\n", 150 | " let imageCount = images.shape[0]\n", 151 | "\n", 152 | " // reshape and transpose from the provided N(CHW) to TF default NHWC\n", 153 | " let imageTensor = Tensor(images\n", 154 | " .reshaped(to: [imageCount, 3, 32, 32])\n", 155 | " .transposed(withPermutations: [0, 2, 3, 1]))\n", 156 | "\n", 157 | " let mean = Tensor([0.485, 0.456, 0.406])\n", 158 | " let std = Tensor([0.229, 0.224, 0.225])\n", 159 | " let imagesNormalized = ((imageTensor / 255.0) - mean) / std\n", 160 | "\n", 161 | " return (imagesNormalized, Tensor(labelTensor))\n", 162 | "}\n", 163 | "\n", 164 | "/// helper functions \n", 165 | "\n", 166 | "// report accuracy of a batch \n", 167 | "func getAccuracy(y:Tensor, logits:Tensor) -> Float{\n", 168 | " let out = Tensor(logits.argmax(squeezingAxis: 1) .== y).sum().scalarized()\n", 169 | " return Float(out) / Float(y.shape[0])\n", 170 | "}\n", 171 | "\n", 172 | "//round two decimal places \n", 173 | "func roundTwo(_ input:Float) -> Float{\n", 174 | " return (input*100).rounded()/100\n", 175 | "}\n", 176 | "\n", 177 | "//crop to a certain size \n", 178 | "func crop(_ tensor:Tensor, _ size:Int) -> Tensor {\n", 179 | " let i = Int.random(in: 0..<32-size)\n", 180 | " let j = Int.random(in: 0..<32-size)\n", 181 | " let N = Int(tensor.shape[0])\n", 182 | " \n", 183 | " return tensor[0..) 
-> Tensor {\n", 188 | " var out = tensor\n", 189 | " \n", 190 | " //maybe flip\n", 191 | " if Float.random(in:0...1) < 0.5{\n", 192 | " out = tensor.transposed(withPermutations: [0, 1, 2, 3])\n", 193 | " }\n", 194 | " //maybe crop and resize \n", 195 | " if Float.random(in:0...1) < 0.5{\n", 196 | " let cropped = crop(tensor, 25)\n", 197 | " out = Raw.resizeArea(images:cropped , size:[32, 32] )\n", 198 | " }\n", 199 | " \n", 200 | " return out\n", 201 | "}" 202 | ], 203 | "execution_count": 0, 204 | "outputs": [] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": { 209 | "id": "RQCTZq_VC90x", 210 | "colab_type": "text" 211 | }, 212 | "source": [ 213 | "Create a dataset " 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "metadata": { 219 | "id": "h9nh7E_i7A9C", 220 | "colab_type": "code", 221 | "colab": {} 222 | }, 223 | "source": [ 224 | "struct Element: TensorGroup {\n", 225 | " var x: Tensor\n", 226 | " var y: Tensor\n", 227 | "}\n", 228 | "\n", 229 | "//cifar 10 training comes in 6 files we load/concatenate them [5000, 32, 32, 3]\n", 230 | "let train_data = (1..<6).map { loadCIFARFile(named: \"data_batch_\\($0)\") }\n", 231 | "\n", 232 | "let trainX = Raw.concat(concatDim: Tensor(0), train_data.map { $0.0})\n", 233 | "let trainY = Raw.concat(concatDim: Tensor(0), train_data.map { $0.1})\n", 234 | "\n", 235 | "//load testing images size [1000, 32, 32, 3]\n", 236 | "let (testX, testY) = loadCIFARFile(named: \"test_batch\")\n", 237 | "\n", 238 | "//create a dataset for training and testing \n", 239 | "let trainDataset = Dataset(elements: Element(x: trainX, y:trainY))\n", 240 | "let testDataset = Dataset(elements: Element(x:testX, y:testY))" 241 | ], 242 | "execution_count": 0, 243 | "outputs": [] 244 | }, 245 | { 246 | "cell_type": "markdown", 247 | "metadata": { 248 | "id": "8f_A1GMKDmsW", 249 | "colab_type": "text" 250 | }, 251 | "source": [ 252 | "Create the basic parts of the mode [convblocks + classifier]" 253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "metadata": { 258 | "id": "EiFHwXwiGohF", 259 | "colab_type": "code", 260 | "colab": {} 261 | }, 262 | "source": [ 263 | "struct ConvBlock:Layer{\n", 264 | "\n", 265 | " typealias Input = Tensor\n", 266 | " typealias Output = Tensor\n", 267 | " \n", 268 | " var conv1: Conv2D\n", 269 | " var conv2: Conv2D\n", 270 | " var pool: MaxPool2D\n", 271 | " var norm: BatchNorm\n", 272 | " \n", 273 | " init(filterShape:(Int, Int))\n", 274 | " {\n", 275 | " self.conv1 = Conv2D(filterShape: (3, 3,filterShape.0, filterShape.1), \n", 276 | " strides: (1, 1), padding : .same, activation: relu)\n", 277 | " \n", 278 | " self.conv2 = Conv2D(filterShape: (3, 3,filterShape.1, filterShape.1), \n", 279 | " strides: (1, 1), padding : .same, activation: relu)\n", 280 | " \n", 281 | " self.norm = BatchNorm(featureCount: filterShape.1)\n", 282 | " self.pool = MaxPool2D(poolSize: (2, 2), strides: (2, 2))\n", 283 | " }\n", 284 | " \n", 285 | " @differentiable\n", 286 | " func call(_ input: Input) -> Output {\n", 287 | " return input.sequenced(through: conv1, conv2, norm, pool)\n", 288 | " }\n", 289 | "}\n", 290 | "\n", 291 | "struct Classifier:Layer{\n", 292 | "\n", 293 | " typealias Input = Tensor\n", 294 | " typealias Output = Tensor\n", 295 | " \n", 296 | " var dense1: Dense\n", 297 | " var dense2: Dense\n", 298 | " var dropout: Dropout\n", 299 | " \n", 300 | " init(input:Int, mid:Int)\n", 301 | " {\n", 302 | " self.dense1 = Dense(inputSize: input , outputSize: mid, activation: relu)\n", 303 | " self.dropout = 
Dropout(probability: 0.5)\n", 304 | " self.dense2 = Dense(inputSize: mid , outputSize: 10)\n", 305 | " }\n", 306 | " \n", 307 | " @differentiable\n", 308 | " func call(_ input: Input) -> Output {\n", 309 | " return input.sequenced(through: dense1, dropout, dense2) \n", 310 | " }\n", 311 | "}" 312 | ], 313 | "execution_count": 0, 314 | "outputs": [] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": { 319 | "id": "g8JIXXijD2kV", 320 | "colab_type": "text" 321 | }, 322 | "source": [ 323 | "Create the overall model" 324 | ] 325 | }, 326 | { 327 | "cell_type": "code", 328 | "metadata": { 329 | "id": "3cwhwVuJcenw", 330 | "colab_type": "code", 331 | "colab": {} 332 | }, 333 | "source": [ 334 | "struct CNN: Layer {\n", 335 | " typealias Input = Tensor\n", 336 | " typealias Output = Tensor\n", 337 | "\n", 338 | " var conv1 = ConvBlock(filterShape:(3, 16))\n", 339 | " var conv2 = ConvBlock(filterShape:(16, 32))\n", 340 | " var conv3 = ConvBlock(filterShape:(32, 64))\n", 341 | " var conv4 = ConvBlock(filterShape:(64, 64))\n", 342 | " \n", 343 | " var dropout = Dropout(probability: 0.5)\n", 344 | " \n", 345 | " var flatten = Flatten()\n", 346 | " var classifier = Classifier(input:2*2*64, mid:128)\n", 347 | " \n", 348 | " @differentiable\n", 349 | " func call(_ input: Input) -> Output {\n", 350 | " let convolved = input.sequenced(through: conv1, conv2, conv3, conv4)\n", 351 | " return convolved.sequenced(through:dropout, flatten, classifier)\n", 352 | " }\n", 353 | "}" 354 | ], 355 | "execution_count": 0, 356 | "outputs": [] 357 | }, 358 | { 359 | "cell_type": "code", 360 | "metadata": { 361 | "id": "iqnb1PSJavPO", 362 | "colab_type": "code", 363 | "outputId": "3c95e9ea-5f1a-4daa-9af7-66e99f19afb5", 364 | "colab": { 365 | "base_uri": "https://localhost:8080/", 366 | "height": 35 367 | } 368 | }, 369 | "source": [ 370 | "var model = CNN()\n", 371 | "let optimizer = Adam(for: model)\n", 372 | "\n", 373 | "//warmup \n", 374 | "let tensor = Tensor(zeros: [1, 32, 32, 3])\n", 375 | "print(model(tensor))" 376 | ], 377 | "execution_count": 12, 378 | "outputs": [ 379 | { 380 | "output_type": "stream", 381 | "text": [ 382 | "[[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]]\r\n" 383 | ], 384 | "name": "stdout" 385 | } 386 | ] 387 | }, 388 | { 389 | "cell_type": "markdown", 390 | "metadata": { 391 | "id": "mEBAvChZD7DS", 392 | "colab_type": "text" 393 | }, 394 | "source": [ 395 | "Training and reporting the results" 396 | ] 397 | }, 398 | { 399 | "cell_type": "code", 400 | "metadata": { 401 | "id": "FSF0UUasAXVN", 402 | "colab_type": "code", 403 | "outputId": "612d10c9-97b8-4d55-a2de-23a3a54a5302", 404 | "colab": { 405 | "base_uri": "https://localhost:8080/", 406 | "height": 926 407 | } 408 | }, 409 | "source": [ 410 | "var trainLoss:Float = 0.0\n", 411 | "var trainAcc :Float = 0.0\n", 412 | "var testLoss:Float = 0.0\n", 413 | "var testAcc:Float = 0.0 \n", 414 | "\n", 415 | "var batchCount: Float = 0.0\n", 416 | "\n", 417 | "for epoch in 0..<50{\n", 418 | " \n", 419 | " //evaluate metrics\n", 420 | " trainLoss = 0.0\n", 421 | " trainAcc = 0.0\n", 422 | " batchCount = 0.0 \n", 423 | " \n", 424 | " let shuffled = trainDataset.shuffled(sampleCount:50000 , randomSeed: Int64(epoch))\n", 425 | " \n", 426 | " for batch in shuffled.batched(batchSize) {\n", 427 | " \n", 428 | " //get batches\n", 429 | " let X = augment(batch.x)\n", 430 | " let y = batch.y\n", 431 | " \n", 432 | " //calculate the loss and gradient\n", 433 | " let (loss, grads) = valueWithGradient(at: model) { model -> Tensor in\n", 434 
| " let logits = model(X)\n", 435 | " return softmaxCrossEntropy(logits: logits, labels: y)\n", 436 | " }\n", 437 | "\n", 438 | " //make an optimizer step \n", 439 | " optimizer.update(&model.allDifferentiableVariables, along: grads) \n", 440 | " \n", 441 | " let logits = model(X) //this is slowing down ? \n", 442 | " let acc = getAccuracy(y:y, logits:logits)\n", 443 | " \n", 444 | " trainLoss += Float(loss.scalarized())\n", 445 | " trainAcc += acc\n", 446 | " batchCount += 1\n", 447 | " } \n", 448 | " \n", 449 | " trainLoss /= batchCount\n", 450 | " trainAcc /= batchCount\n", 451 | " \n", 452 | " //training\n", 453 | " testLoss = 0.0\n", 454 | " testAcc = 0.0\n", 455 | " \n", 456 | " let logits = model(testX)\n", 457 | " let loss = softmaxCrossEntropy(logits: logits, labels: testY)\n", 458 | " let acc = getAccuracy(y:testY, logits:logits)\n", 459 | "\n", 460 | " testLoss += Float(loss.scalarized())\n", 461 | " testAcc += acc\n", 462 | " print(\"epoch: \\(epoch+1), train_loss: \\(roundTwo(trainLoss)), test_loss: \\(roundTwo(testLoss)), train_acc: \\(roundTwo(trainAcc)), test_acc: \\(roundTwo(testAcc))\" )\n", 463 | "\n", 464 | "}" 465 | ], 466 | "execution_count": 13, 467 | "outputs": [ 468 | { 469 | "output_type": "stream", 470 | "text": [ 471 | "epoch: 1, train_loss: 1.64, test_loss: 1.3, train_acc: 0.41, test_acc: 0.53\n", 472 | "epoch: 2, train_loss: 1.23, test_loss: 1.1, train_acc: 0.58, test_acc: 0.61\n", 473 | "epoch: 3, train_loss: 1.04, test_loss: 0.94, train_acc: 0.66, test_acc: 0.67\n", 474 | "epoch: 4, train_loss: 0.92, test_loss: 0.83, train_acc: 0.7, test_acc: 0.71\n", 475 | "epoch: 5, train_loss: 0.84, test_loss: 0.87, train_acc: 0.73, test_acc: 0.7\n", 476 | "epoch: 6, train_loss: 0.78, test_loss: 0.76, train_acc: 0.76, test_acc: 0.74\n", 477 | "epoch: 7, train_loss: 0.74, test_loss: 0.76, train_acc: 0.77, test_acc: 0.74\n", 478 | "epoch: 8, train_loss: 0.69, test_loss: 0.77, train_acc: 0.79, test_acc: 0.73\n", 479 | "epoch: 9, train_loss: 0.66, test_loss: 0.72, train_acc: 0.8, test_acc: 0.75\n", 480 | "epoch: 10, train_loss: 0.63, test_loss: 0.69, train_acc: 0.81, test_acc: 0.77\n", 481 | "epoch: 11, train_loss: 0.61, test_loss: 0.67, train_acc: 0.82, test_acc: 0.77\n", 482 | "epoch: 12, train_loss: 0.59, test_loss: 0.7, train_acc: 0.82, test_acc: 0.76\n", 483 | "epoch: 13, train_loss: 0.58, test_loss: 0.71, train_acc: 0.83, test_acc: 0.77\n", 484 | "epoch: 14, train_loss: 0.56, test_loss: 0.69, train_acc: 0.84, test_acc: 0.77\n", 485 | "epoch: 15, train_loss: 0.55, test_loss: 0.71, train_acc: 0.84, test_acc: 0.77\n", 486 | "epoch: 16, train_loss: 0.52, test_loss: 0.69, train_acc: 0.85, test_acc: 0.78\n", 487 | "epoch: 17, train_loss: 0.51, test_loss: 0.68, train_acc: 0.86, test_acc: 0.78\n", 488 | "epoch: 18, train_loss: 0.51, test_loss: 0.66, train_acc: 0.85, test_acc: 0.78\n", 489 | "epoch: 19, train_loss: 0.5, test_loss: 0.7, train_acc: 0.86, test_acc: 0.77\n", 490 | "epoch: 20, train_loss: 0.5, test_loss: 0.7, train_acc: 0.86, test_acc: 0.78\n", 491 | "epoch: 21, train_loss: 0.48, test_loss: 0.66, train_acc: 0.86, test_acc: 0.79\n", 492 | "epoch: 22, train_loss: 0.48, test_loss: 0.7, train_acc: 0.86, test_acc: 0.78\n", 493 | "epoch: 23, train_loss: 0.46, test_loss: 0.67, train_acc: 0.87, test_acc: 0.79\n", 494 | "epoch: 24, train_loss: 0.45, test_loss: 0.69, train_acc: 0.87, test_acc: 0.79\n", 495 | "epoch: 25, train_loss: 0.44, test_loss: 0.72, train_acc: 0.88, test_acc: 0.78\n", 496 | "epoch: 26, train_loss: 0.44, test_loss: 0.69, train_acc: 0.88, test_acc: 0.78\n", 
497 | "epoch: 27, train_loss: 0.43, test_loss: 0.72, train_acc: 0.88, test_acc: 0.78\n", 498 | "epoch: 28, train_loss: 0.42, test_loss: 0.68, train_acc: 0.88, test_acc: 0.79\n", 499 | "epoch: 29, train_loss: 0.43, test_loss: 0.67, train_acc: 0.88, test_acc: 0.8\n", 500 | "epoch: 30, train_loss: 0.42, test_loss: 0.71, train_acc: 0.89, test_acc: 0.78\n", 501 | "epoch: 31, train_loss: 0.4, test_loss: 0.72, train_acc: 0.89, test_acc: 0.78\n", 502 | "epoch: 32, train_loss: 0.41, test_loss: 0.69, train_acc: 0.89, test_acc: 0.79\n", 503 | "epoch: 33, train_loss: 0.39, test_loss: 0.73, train_acc: 0.9, test_acc: 0.79\n", 504 | "epoch: 34, train_loss: 0.39, test_loss: 0.7, train_acc: 0.9, test_acc: 0.79\n", 505 | "epoch: 35, train_loss: 0.38, test_loss: 0.7, train_acc: 0.9, test_acc: 0.79\n", 506 | "epoch: 36, train_loss: 0.38, test_loss: 0.71, train_acc: 0.9, test_acc: 0.79\n", 507 | "epoch: 37, train_loss: 0.37, test_loss: 0.68, train_acc: 0.9, test_acc: 0.8\n", 508 | "epoch: 38, train_loss: 0.37, test_loss: 0.69, train_acc: 0.9, test_acc: 0.8\n", 509 | "epoch: 39, train_loss: 0.37, test_loss: 0.72, train_acc: 0.9, test_acc: 0.8\n", 510 | "epoch: 40, train_loss: 0.36, test_loss: 0.72, train_acc: 0.91, test_acc: 0.79\n", 511 | "epoch: 41, train_loss: 0.34, test_loss: 0.71, train_acc: 0.91, test_acc: 0.79\n", 512 | "epoch: 42, train_loss: 0.35, test_loss: 0.74, train_acc: 0.91, test_acc: 0.79\n", 513 | "epoch: 43, train_loss: 0.35, test_loss: 0.73, train_acc: 0.91, test_acc: 0.79\n", 514 | "epoch: 44, train_loss: 0.35, test_loss: 0.74, train_acc: 0.91, test_acc: 0.79\n", 515 | "epoch: 45, train_loss: 0.34, test_loss: 0.74, train_acc: 0.91, test_acc: 0.79\n", 516 | "epoch: 46, train_loss: 0.34, test_loss: 0.72, train_acc: 0.91, test_acc: 0.8\n", 517 | "epoch: 47, train_loss: 0.34, test_loss: 0.71, train_acc: 0.91, test_acc: 0.8\n", 518 | "epoch: 48, train_loss: 0.32, test_loss: 0.73, train_acc: 0.92, test_acc: 0.79\n", 519 | "epoch: 49, train_loss: 0.34, test_loss: 0.69, train_acc: 0.91, test_acc: 0.8\n", 520 | "epoch: 50, train_loss: 0.32, test_loss: 0.73, train_acc: 0.92, test_acc: 0.8\n" 521 | ], 522 | "name": "stdout" 523 | } 524 | ] 525 | } 526 | ] 527 | } -------------------------------------------------------------------------------- /TF4ST_MNIST.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "TF4ST MNIST.ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "collapsed_sections": [], 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "display_name": "Swift", 14 | "language": "swift", 15 | "name": "swift" 16 | }, 17 | "accelerator": "GPU" 18 | }, 19 | "cells": [ 20 | { 21 | "cell_type": "markdown", 22 | "metadata": { 23 | "id": "view-in-github", 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "\"Open" 28 | ] 29 | }, 30 | { 31 | "metadata": { 32 | "id": "NuzYtyiH_jxK", 33 | "colab_type": "code", 34 | "colab": {} 35 | }, 36 | "cell_type": "code", 37 | "source": [ 38 | "import Foundation\n", 39 | "import TensorFlow\n", 40 | "import Python" 41 | ], 42 | "execution_count": 0, 43 | "outputs": [] 44 | }, 45 | { 46 | "metadata": { 47 | "id": "FU5Vko57dWnZ", 48 | "colab_type": "text" 49 | }, 50 | "cell_type": "markdown", 51 | "source": [ 52 | "## Download Data and Labels" 53 | ] 54 | }, 55 | { 56 | "metadata": { 57 | "id": "nU-JcQQabi5O", 58 | "colab_type": "code", 59 | "colab": { 60 | "base_uri": "https://localhost:8080/", 61 | "height": 51 
62 | }, 63 | "outputId": "43deb303-d6f1-49f1-de92-f65e96614336" 64 | }, 65 | "cell_type": "code", 66 | "source": [ 67 | "let urllib = Python.import(\"urllib.request\")\n", 68 | "let fileBaseURL = \"https://raw.githubusercontent.com/tensorflow/swift-models/stable/MNIST/\"\n", 69 | "let files = [\"train-images-idx3-ubyte\", \"train-labels-idx1-ubyte\"]\n", 70 | "\n", 71 | "for file in files {\n", 72 | " print(\"downloading ... \", file)\n", 73 | " urllib.urlretrieve(fileBaseURL + file, filename: file)\n", 74 | "}" 75 | ], 76 | "execution_count": 2, 77 | "outputs": [ 78 | { 79 | "output_type": "stream", 80 | "text": [ 81 | "downloading ... train-images-idx3-ubyte\n", 82 | "downloading ... train-labels-idx1-ubyte\n" 83 | ], 84 | "name": "stdout" 85 | } 86 | ] 87 | }, 88 | { 89 | "metadata": { 90 | "id": "gF303ze8dxXv", 91 | "colab_type": "text" 92 | }, 93 | "cell_type": "markdown", 94 | "source": [ 95 | "## Process Data " 96 | ] 97 | }, 98 | { 99 | "metadata": { 100 | "id": "dthLi1e191dQ", 101 | "colab_type": "code", 102 | "colab": {} 103 | }, 104 | "cell_type": "code", 105 | "source": [ 106 | "var batchSize:Int32 = 32 \n", 107 | "\n", 108 | "/// Reads a file into an array of bytes.\n", 109 | "func readFile(_ path: String) -> [UInt8] {\n", 110 | " let url = URL(fileURLWithPath: path)\n", 111 | " let data = try! Data(contentsOf: url, options: [])\n", 112 | " return [UInt8](data)\n", 113 | "}\n", 114 | "\n", 115 | "/// Reads MNIST images and labels from specified file paths.\n", 116 | "func readMNIST(imagesFile: String, labelsFile: String) -> (images: Tensor,\n", 117 | " labels: Tensor) {\n", 118 | " print(\"Reading data.\")\n", 119 | " let images = readFile(imagesFile).dropFirst(16).map(Float.init)\n", 120 | " let labels = readFile(labelsFile).dropFirst(8).map(Int32.init)\n", 121 | " let rowCount = Int32(labels.count)\n", 122 | " let imageHeight: Int32 = 28, imageWidth: Int32 = 28\n", 123 | "\n", 124 | " print(\"Constructing data tensors.\")\n", 125 | " return (\n", 126 | " images: Tensor(shape: [rowCount, 1, imageHeight, imageWidth], scalars: images)\n", 127 | " .transposed(withPermutations: [0, 2, 3, 1]) / 255, // NHWC\n", 128 | " labels: Tensor(labels)\n", 129 | " )\n", 130 | "}\n", 131 | "\n", 132 | "/// Split data into training and test\n", 133 | "func splitTrainTest(data: Tensor, labels: Tensor) -> (Tensor, Tensor, Tensor , Tensor) {\n", 134 | " \n", 135 | " let N = Int32(data.shape[0])\n", 136 | " let split = Int32(0.8 * Float(N))\n", 137 | " \n", 138 | " let trainX = data[0..(in x: Tensor, at index: Int32) -> Tensor {\n", 149 | " let start = Int32(index * batchSize)\n", 150 | " return x[start..(oneHotAtIndices: trainNumericLabels, depth: 10)\n", 171 | "\n", 172 | "// split into training and testing \n", 173 | "let (trainX, trainY, testX, testY) = splitTrainTest(data: data, labels: labels)" 174 | ], 175 | "execution_count": 4, 176 | "outputs": [ 177 | { 178 | "output_type": "stream", 179 | "text": [ 180 | "Reading data.\n", 181 | "Constructing data tensors.\n" 182 | ], 183 | "name": "stdout" 184 | } 185 | ] 186 | }, 187 | { 188 | "metadata": { 189 | "id": "UgBo0BC7d7bg", 190 | "colab_type": "text" 191 | }, 192 | "cell_type": "markdown", 193 | "source": [ 194 | "## CNN Model" 195 | ] 196 | }, 197 | { 198 | "metadata": { 199 | "id": "YZ-tLhdlJvc2", 200 | "colab_type": "code", 201 | "colab": {} 202 | }, 203 | "cell_type": "code", 204 | "source": [ 205 | "struct CNN: Layer {\n", 206 | " var conv1 = Conv2D(filterShape: (3, 3, 1, 16), activation: relu) \n", 207 | " var conv2 = 
Conv2D(filterShape: (3, 3, 16, 32), activation: relu) \n", 208 | " \n", 209 | " var pool = MaxPool2D(poolSize: (2, 2), strides: (2, 2))\n", 210 | " \n", 211 | " var flatten = Flatten()\n", 212 | " \n", 213 | " var dense1 = Dense(inputSize: 5*5*32 , outputSize: 128, activation: tanh)\n", 214 | " var dense2 = Dense(inputSize: 128 , outputSize: 10)\n", 215 | "\n", 216 | " @differentiable\n", 217 | " func applied(to input: Tensor, in context: Context) -> Tensor {\n", 218 | " var x = input\n", 219 | " x = conv1.applied(to: x, in: context)\n", 220 | " x = pool.applied(to: x, in: context)\n", 221 | " \n", 222 | " x = conv2.applied(to: x, in: context)\n", 223 | " x = pool.applied(to: x, in: context)\n", 224 | " \n", 225 | " x = flatten.applied(to: x, in: context)\n", 226 | " \n", 227 | " x = dense1.applied(to: x, in: context)\n", 228 | " x = dense2.applied(to: x, in: context)\n", 229 | " return x \n", 230 | " }\n", 231 | "}\n", 232 | "\n", 233 | "let optimizer = SGD(learningRate: 0.01)" 234 | ], 235 | "execution_count": 0, 236 | "outputs": [] 237 | }, 238 | { 239 | "metadata": { 240 | "id": "PDES_rECKXTk", 241 | "colab_type": "code", 242 | "colab": {} 243 | }, 244 | "cell_type": "code", 245 | "source": [ 246 | "func getAccuracy(y:Tensor, logits:Tensor) -> Float{\n", 247 | " let yhat = logits.argmax(squeezingAxis: 1) - y.argmax(squeezingAxis: 1)\n", 248 | " return Float(yhat.makeNumpyArray().count(where: { $0 == 0})) / Float(yhat.shape[0])\n", 249 | "}" 250 | ], 251 | "execution_count": 0, 252 | "outputs": [] 253 | }, 254 | { 255 | "metadata": { 256 | "id": "lV2RzLtMeBBH", 257 | "colab_type": "text" 258 | }, 259 | "cell_type": "markdown", 260 | "source": [ 261 | "## Training" 262 | ] 263 | }, 264 | { 265 | "metadata": { 266 | "id": "hzGVi7oqJ820", 267 | "colab_type": "code", 268 | "outputId": "07e56ed6-d36a-49ca-84d0-520c975e5586", 269 | "colab": { 270 | "base_uri": "https://localhost:8080/", 271 | "height": 102 272 | } 273 | }, 274 | "cell_type": "code", 275 | "source": [ 276 | "var model = CNN()\n", 277 | "\n", 278 | "let trainingContext = Context(learningPhase: .training)\n", 279 | "let inferenceContext = Context(learningPhase: .inference)\n", 280 | "\n", 281 | "let stepsInEpoch:Int32 = Int32(Float(testX.shape[0]) / Float(batchSize))\n", 282 | "var avgLoss:Float = 0.0\n", 283 | "var avgAcc :Float = 0.0\n", 284 | "\n", 285 | "for epoch in 0...4{\n", 286 | " \n", 287 | " //evaluate metrics\n", 288 | " avgLoss = 0.0\n", 289 | " avgAcc = 0.0\n", 290 | " \n", 291 | " for i in 0.. Tensor in\n", 299 | " let logits = model.applied(to: X, in: trainingContext)\n", 300 | " return softmaxCrossEntropy(logits: logits, oneHotLabels: y)\n", 301 | " }\n", 302 | "\n", 303 | " //make an optimizer step \n", 304 | " optimizer.update(&model.allDifferentiableVariables, along: grads) \n", 305 | " \n", 306 | " let logits = model.applied(to: X, in: inferenceContext)\n", 307 | " let acc = getAccuracy(y:y, logits:logits)\n", 308 | " \n", 309 | " avgLoss += (Float(loss) ?? 
0.0)/Float(stepsInEpoch)\n", 310 | " avgAcc += acc/Float(stepsInEpoch)\n", 311 | " }\n", 312 | " \n", 313 | " print(String(format:\"epoch: %d, train_loss: %.2f, train_acc: %.2f\", (epoch+1), avgLoss, avgAcc))\n", 314 | "}" 315 | ], 316 | "execution_count": 7, 317 | "outputs": [ 318 | { 319 | "output_type": "stream", 320 | "text": [ 321 | "epoch: 1, train_loss: 1.48, train_acc: 0.62\n", 322 | "epoch: 2, train_loss: 0.42, train_acc: 0.92\n", 323 | "epoch: 3, train_loss: 0.30, train_acc: 0.95\n", 324 | "epoch: 4, train_loss: 0.24, train_acc: 0.96\n", 325 | "epoch: 5, train_loss: 0.20, train_acc: 0.97\n" 326 | ], 327 | "name": "stdout" 328 | } 329 | ] 330 | }, 331 | { 332 | "metadata": { 333 | "id": "QqPujzSXeC8_", 334 | "colab_type": "text" 335 | }, 336 | "cell_type": "markdown", 337 | "source": [ 338 | "## Testing" 339 | ] 340 | }, 341 | { 342 | "metadata": { 343 | "id": "J56Zlfd0XLrD", 344 | "colab_type": "code", 345 | "colab": {} 346 | }, 347 | "cell_type": "code", 348 | "source": [ 349 | "let stepsInEpoch = Int32(Float(testX.shape[0]) / Float(32))\n", 350 | "\n", 351 | "var avgAcc:Float = 0.0 \n", 352 | "\n", 353 | "for i in 0..\"Open" 26 | ] 27 | }, 28 | { 29 | "metadata": { 30 | "id": "fqNvA3-pePdn", 31 | "colab_type": "text" 32 | }, 33 | "cell_type": "markdown", 34 | "source": [ 35 | "## [@Zaid Alyafeai](https://twitter.com/zaidalyafeai)" 36 | ] 37 | }, 38 | { 39 | "metadata": { 40 | "id": "tFZUCH-FS9PB", 41 | "colab_type": "text" 42 | }, 43 | "cell_type": "markdown", 44 | "source": [ 45 | "# Introduction\n", 46 | "In this tutorial we explain how to transfer weights from a static graph model built with TensorFlow to a dynamic graph built with Keras. We will first train a model using Tensorflow then we will create the same model in keras and transfer the trained weights between the two models. \n", 47 | "\n", 48 | "![alt text](https://raw.githubusercontent.com/zaidalyafeai/Notebooks/master/images/weightrasnfer.png)" 49 | ] 50 | }, 51 | { 52 | "metadata": { 53 | "id": "Bj4whuEqZhhs", 54 | "colab_type": "text" 55 | }, 56 | "cell_type": "markdown", 57 | "source": [ 58 | "# Dataset\n", 59 | "\n", 60 | "We will use [QuickDraw10](https://github.com/zaidalyafeai/QuickDraw10) which is a suggested alternative for mnist. QuickDraw10 constains 100K grayscale images with shapes (28 x 28)seperated into 80K for training and 20K for testing for labeling 10 classes. 
" 61 | ] 62 | }, 63 | { 64 | "metadata": { 65 | "id": "DI3koqs9ayfc", 66 | "colab_type": "text" 67 | }, 68 | "cell_type": "markdown", 69 | "source": [ 70 | "## Download the Data" 71 | ] 72 | }, 73 | { 74 | "metadata": { 75 | "id": "EEgAiZMAkYuI", 76 | "colab_type": "code", 77 | "colab": { 78 | "base_uri": "https://localhost:8080/", 79 | "height": 121 80 | }, 81 | "outputId": "24a28b3a-6676-4499-a56a-5f56dfcd79b0" 82 | }, 83 | "cell_type": "code", 84 | "source": [ 85 | "!git clone https://github.com/zaidalyafeai/QuickDraw10" 86 | ], 87 | "execution_count": 2, 88 | "outputs": [ 89 | { 90 | "output_type": "stream", 91 | "text": [ 92 | "Cloning into 'QuickDraw10'...\n", 93 | "remote: Enumerating objects: 53, done.\u001b[K\n", 94 | "remote: Counting objects: 100% (53/53), done.\u001b[K\n", 95 | "remote: Compressing objects: 100% (49/49), done.\u001b[K\n", 96 | "remote: Total 53 (delta 11), reused 0 (delta 0), pack-reused 0\u001b[K\n", 97 | "Unpacking objects: 100% (53/53), done.\n" 98 | ], 99 | "name": "stdout" 100 | } 101 | ] 102 | }, 103 | { 104 | "metadata": { 105 | "id": "veRsNHa0av9T", 106 | "colab_type": "text" 107 | }, 108 | "cell_type": "markdown", 109 | "source": [ 110 | "## Load the Data" 111 | ] 112 | }, 113 | { 114 | "metadata": { 115 | "id": "ICLDY1PpUtgO", 116 | "colab_type": "code", 117 | "colab": {} 118 | }, 119 | "cell_type": "code", 120 | "source": [ 121 | "import numpy as np\n", 122 | "\n", 123 | "train_data = np.load('QuickDraw10/dataset/train-ubyte.npz')\n", 124 | "test_data = np.load('QuickDraw10/dataset/test-ubyte.npz')\n", 125 | "\n", 126 | "x_train, y_train = train_data['a'], train_data['b']\n", 127 | "x_test, y_test = test_data['a'], test_data['b']" 128 | ], 129 | "execution_count": 0, 130 | "outputs": [] 131 | }, 132 | { 133 | "metadata": { 134 | "id": "QoV1CVXIc3J4", 135 | "colab_type": "code", 136 | "colab": {} 137 | }, 138 | "cell_type": "code", 139 | "source": [ 140 | "BATCH_SIZE = 32\n", 141 | "N = x_train.shape[0]\n", 142 | "\n", 143 | "x_train = np.reshape(x_train/ 255., (x_train.shape[0], 28, 28, 1)).astype('float32')\n", 144 | "x_test = np.reshape(x_test/255., (x_test.shape[0], 28, 28, 1)).astype('float32')" 145 | ], 146 | "execution_count": 0, 147 | "outputs": [] 148 | }, 149 | { 150 | "metadata": { 151 | "id": "iz6Bf5qPZyMl", 152 | "colab_type": "text" 153 | }, 154 | "cell_type": "markdown", 155 | "source": [ 156 | "# TensorFlow Graph" 157 | ] 158 | }, 159 | { 160 | "metadata": { 161 | "id": "wrDQ88OhSzvz", 162 | "colab_type": "code", 163 | "colab": {} 164 | }, 165 | "cell_type": "code", 166 | "source": [ 167 | "import tensorflow as tf " 168 | ], 169 | "execution_count": 0, 170 | "outputs": [] 171 | }, 172 | { 173 | "metadata": { 174 | "id": "xKPhzFPVca2P", 175 | "colab_type": "text" 176 | }, 177 | "cell_type": "markdown", 178 | "source": [ 179 | "Define the model inputs and outputs " 180 | ] 181 | }, 182 | { 183 | "metadata": { 184 | "id": "74L6G12VcQP6", 185 | "colab_type": "code", 186 | "colab": {} 187 | }, 188 | "cell_type": "code", 189 | "source": [ 190 | "#define the data\n", 191 | "with tf.name_scope(\"data\"):\n", 192 | " X = tf.placeholder(tf.float32, shape = [None, 28, 28, 1], name = 'X')\n", 193 | " y = tf.placeholder(tf.int32, shape = [None], name = 'y')" 194 | ], 195 | "execution_count": 0, 196 | "outputs": [] 197 | }, 198 | { 199 | "metadata": { 200 | "id": "11Kpxof2cYgP", 201 | "colab_type": "text" 202 | }, 203 | "cell_type": "markdown", 204 | "source": [ 205 | "Create the layers" 206 | ] 207 | }, 208 | { 209 | "metadata": { 210 | "id": 
"4LPIoeRVU4-Y", 211 | "colab_type": "code", 212 | "colab": {} 213 | }, 214 | "cell_type": "code", 215 | "source": [ 216 | "with tf.name_scope(\"block1\"):\n", 217 | " conv1 = tf.layers.conv2d(X, filters = 8, kernel_size = 3, \n", 218 | " activation = tf.nn.relu, padding = 'same', name = 'conv1')\n", 219 | " pool1 = tf.layers.max_pooling2d(conv1, pool_size = 2, strides = 2, name = 'pool1')\n", 220 | " \n", 221 | "with tf.name_scope(\"block2\"):\n", 222 | " conv2 = tf.layers.conv2d(pool1, filters = 16, kernel_size = 3, \n", 223 | " activation = tf.nn.relu, padding = 'same', name = 'conv2')\n", 224 | " pool2 = tf.layers.max_pooling2d(conv2, pool_size = 2, strides = 2, name = 'pool2')\n", 225 | " \n", 226 | "with tf.name_scope(\"block3\"):\n", 227 | " conv3 = tf.layers.conv2d(pool2, filters = 32, kernel_size = 3, \n", 228 | " activation = tf.nn.relu, padding = 'same', name = 'conv3')\n", 229 | " pool3 = tf.layers.max_pooling2d(conv3, pool_size = 2, strides = 2, name = 'pool3') \n", 230 | " \n", 231 | "with tf.name_scope(\"flatten\"):\n", 232 | " flatten = tf.reshape(pool3, shape = [-1, 3*3*32 ], name = 'flatten')\n", 233 | " \n", 234 | "with tf.name_scope(\"dense\"):\n", 235 | " logits = tf.layers.dense(flatten, units = 10)" 236 | ], 237 | "execution_count": 0, 238 | "outputs": [] 239 | }, 240 | { 241 | "metadata": { 242 | "id": "VodVRfNacdwG", 243 | "colab_type": "text" 244 | }, 245 | "cell_type": "markdown", 246 | "source": [ 247 | "Define the training procedure and the evaluation metrics " 248 | ] 249 | }, 250 | { 251 | "metadata": { 252 | "id": "G3VPMwTmfPca", 253 | "colab_type": "code", 254 | "colab": {} 255 | }, 256 | "cell_type": "code", 257 | "source": [ 258 | "with tf.name_scope(\"train\"):\n", 259 | " #cross entropy loss\n", 260 | " entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits = logits, labels = y)\n", 261 | " loss = tf.reduce_mean(entropy)\n", 262 | " \n", 263 | " #minimize adam optimizer \n", 264 | " optimizer = tf.train.AdamOptimizer()\n", 265 | " backprob = optimizer.minimize(loss)\n", 266 | " \n", 267 | "with tf.name_scope(\"eval\"):\n", 268 | " #calculate the accuracy \n", 269 | " correct = tf.nn.in_top_k(logits,y,1)\n", 270 | " accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))" 271 | ], 272 | "execution_count": 0, 273 | "outputs": [] 274 | }, 275 | { 276 | "metadata": { 277 | "id": "lnh3pC2phAeF", 278 | "colab_type": "code", 279 | "colab": {} 280 | }, 281 | "cell_type": "code", 282 | "source": [ 283 | "init = tf.global_variables_initializer()" 284 | ], 285 | "execution_count": 0, 286 | "outputs": [] 287 | }, 288 | { 289 | "metadata": { 290 | "id": "FoNWnCb-ez4F", 291 | "colab_type": "code", 292 | "colab": { 293 | "base_uri": "https://localhost:8080/", 294 | "height": 121 295 | }, 296 | "outputId": "693774e8-d94e-4307-fce0-888c6bc7aa93" 297 | }, 298 | "cell_type": "code", 299 | "source": [ 300 | "with tf.Session() as sess:\n", 301 | " epochs = 3\n", 302 | " \n", 303 | " #initialize all the variables \n", 304 | " sess.run(init)\n", 305 | " \n", 306 | " #training \n", 307 | " for epoch in range(0, epochs):\n", 308 | " i = 0 \n", 309 | " while i < N:\n", 310 | " \n", 311 | " #get the next batch \n", 312 | " x_batch = x_train[i: i+BATCH_SIZE]\n", 313 | " y_batch = y_train[i: i+BATCH_SIZE]\n", 314 | " \n", 315 | " #run the graph \n", 316 | " out = sess.run(backprob, feed_dict= {X:x_batch, y:y_batch})\n", 317 | " i = i + BATCH_SIZE\n", 318 | " print('------') \n", 319 | " acc_test = accuracy.eval(feed_dict={X: x_test, y: y_test})\n", 320 | " print(\"Epoch:\", 
epoch+1, \"test accuracy:\", acc_test)\n", 321 | " \n", 322 | " print('saving the weights ...')\n", 323 | " #extract and save the weights \n", 324 | " variables = [v for v in tf.trainable_variables()]\n", 325 | " idx = 0\n", 326 | " weights = []\n", 327 | " for v in variables:\n", 328 | " out = sess.run(v)\n", 329 | " weights.append(out)" 330 | ], 331 | "execution_count": 9, 332 | "outputs": [ 333 | { 334 | "output_type": "stream", 335 | "text": [ 336 | "------\n", 337 | "Epoch: 1 test accuracy: 0.923\n", 338 | "------\n", 339 | "Epoch: 2 test accuracy: 0.9353\n", 340 | "------\n", 341 | "Epoch: 3 test accuracy: 0.93875\n" 342 | ], 343 | "name": "stdout" 344 | } 345 | ] 346 | }, 347 | { 348 | "metadata": { 349 | "id": "zn79ONlZLx-L", 350 | "colab_type": "code", 351 | "colab": {} 352 | }, 353 | "cell_type": "code", 354 | "source": [ 355 | "tf.reset_default_graph()" 356 | ], 357 | "execution_count": 0, 358 | "outputs": [] 359 | }, 360 | { 361 | "metadata": { 362 | "id": "PE8rSI3cZ4qz", 363 | "colab_type": "text" 364 | }, 365 | "cell_type": "markdown", 366 | "source": [ 367 | "# Keras Model" 368 | ] 369 | }, 370 | { 371 | "metadata": { 372 | "id": "_Y4PTdLDL-WP", 373 | "colab_type": "code", 374 | "colab": { 375 | "base_uri": "https://localhost:8080/", 376 | "height": 34 377 | }, 378 | "outputId": "96c60d30-8f92-4d60-ed55-003ee443ab39" 379 | }, 380 | "cell_type": "code", 381 | "source": [ 382 | "from keras.layers import Dense, Input, Convolution2D, MaxPooling2D, Flatten\n", 383 | "from keras.models import Sequential" 384 | ], 385 | "execution_count": 11, 386 | "outputs": [ 387 | { 388 | "output_type": "stream", 389 | "text": [ 390 | "Using TensorFlow backend.\n" 391 | ], 392 | "name": "stderr" 393 | } 394 | ] 395 | }, 396 | { 397 | "metadata": { 398 | "id": "wldUpaGaMnPh", 399 | "colab_type": "code", 400 | "colab": {} 401 | }, 402 | "cell_type": "code", 403 | "source": [ 404 | "model = Sequential()\n", 405 | "model.add(Convolution2D(filters = 8, kernel_size = 3, activation = 'relu', padding = 'same' , input_shape = (28, 28, 1)))\n", 406 | "model.add(MaxPooling2D(pool_size = 2, strides = 2))\n", 407 | "model.add(Convolution2D(filters = 16, kernel_size = 3, activation = 'relu', padding = 'same' , input_shape = (28, 28, 1)))\n", 408 | "model.add(MaxPooling2D(pool_size = 2, strides = 2))\n", 409 | "model.add(Convolution2D(filters = 32, kernel_size = 3, activation = 'relu', padding = 'same' , input_shape = (28, 28, 1)))\n", 410 | "model.add(MaxPooling2D(pool_size = 2, strides = 2))\n", 411 | "model.add(Flatten())\n", 412 | "model.add(Dense(units = 10))\n", 413 | "model.compile(loss = 'categorical_crossentropy', optimizer = 'adam', metrics = ['accuracy'])" 414 | ], 415 | "execution_count": 0, 416 | "outputs": [] 417 | }, 418 | { 419 | "metadata": { 420 | "id": "AvasZolMZ-L_", 421 | "colab_type": "text" 422 | }, 423 | "cell_type": "markdown", 424 | "source": [ 425 | "## Load the Weights" 426 | ] 427 | }, 428 | { 429 | "metadata": { 430 | "id": "32aIz3ZcQ7eO", 431 | "colab_type": "code", 432 | "colab": {} 433 | }, 434 | "cell_type": "code", 435 | "source": [ 436 | "i = 0 \n", 437 | "for layer in model.layers:\n", 438 | " #load the weights to the model layers\n", 439 | " if 'conv2d' in layer.name or 'dense' in layer.name:\n", 440 | " W = weights[i]\n", 441 | " b = weights[i+1]\n", 442 | " layer.set_weights([W, b])\n", 443 | " i+=2" 444 | ], 445 | "execution_count": 0, 446 | "outputs": [] 447 | }, 448 | { 449 | "metadata": { 450 | "id": "niSyRsD0aDYj", 451 | "colab_type": "text" 452 | }, 453 | 
"cell_type": "markdown", 454 | "source": [ 455 | "## Prediction" 456 | ] 457 | }, 458 | { 459 | "metadata": { 460 | "id": "7NYY9XlCOWSZ", 461 | "colab_type": "code", 462 | "colab": { 463 | "base_uri": "https://localhost:8080/", 464 | "height": 52 465 | }, 466 | "outputId": "26bbb625-7c1a-4d38-a8e4-65734b12fdf1" 467 | }, 468 | "cell_type": "code", 469 | "source": [ 470 | "n_values = np.max(y_test) + 1\n", 471 | "y_one_hot = np.eye(n_values)[y_test]\n", 472 | "model.evaluate(x = x_test, y = y_one_hot)[1]" 473 | ], 474 | "execution_count": 14, 475 | "outputs": [ 476 | { 477 | "output_type": "stream", 478 | "text": [ 479 | "20000/20000 [==============================] - 2s 90us/step\n" 480 | ], 481 | "name": "stdout" 482 | }, 483 | { 484 | "output_type": "execute_result", 485 | "data": { 486 | "text/plain": [ 487 | "0.93875" 488 | ] 489 | }, 490 | "metadata": { 491 | "tags": [] 492 | }, 493 | "execution_count": 14 494 | } 495 | ] 496 | }, 497 | { 498 | "metadata": { 499 | "id": "CiNJkR8crWzl", 500 | "colab_type": "text" 501 | }, 502 | "cell_type": "markdown", 503 | "source": [ 504 | "# References\n", 505 | "https://www.kaggle.com/andrewrona22/an-example-of-cnn-using-tensorflow" 506 | ] 507 | } 508 | ] 509 | } -------------------------------------------------------------------------------- /images/tmp: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /images/weightrasnfer.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/zaidalyafeai/Notebooks/ade45eb95f53ae9a2bd26f3848744652bd0cdbd9/images/weightrasnfer.png --------------------------------------------------------------------------------