├── LICENSE ├── README.md ├── line_model.ipynb └── mnist_model.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2019 Machine Learning Tokyo 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Intro to Deep Learning 2 | 3 | Slides https://goo.gl/cPd9NJ 4 | 5 | MLT Workshop on Feb 2 with Dimitris Katsios and Suzana Ilic 6 | 7 | For ML engineers and researchers, join the community in Tokyo: 8 | * [Discourse] http://discuss.mltokyo.ai 9 | * [Slack] https://goo.gl/WnbYUP 10 | * [FB] https://www.facebook.com/machinelearningtokyo 11 | 12 | 13 | ## Write a Neural Network from scratch in NumPy 14 | 15 | The best way to understand a neural network is to code it up from scratch! 16 | 17 | [[Read more]](https://towardsdatascience.com/lets-code-a-neural-network-in-plain-numpy-ae7e74410795?fbclid=IwAR16PwZLqxnXE8kUUJWvu9Tmjf5OlKczRPUJENXNpuUTTz0iaKvS4Z7usa8) 18 | 19 | [
Let's code a Neural Network in plain NumPy
](https://towardsdatascience.com/lets-code-a-neural-network-in-plain-numpy-ae7e74410795?fbclid=IwAR16PwZLqxnXE8kUUJWvu9Tmjf5OlKczRPUJENXNpuUTTz0iaKvS4Z7usa8) 20 | 21 | -------------------------------------------------------------------------------- /line_model.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "line_model.ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "collapsed_sections": [], 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "name": "python3", 14 | "display_name": "Python 3" 15 | }, 16 | "accelerator": "GPU" 17 | }, 18 | "cells": [ 19 | { 20 | "cell_type": "markdown", 21 | "metadata": { 22 | "id": "view-in-github", 23 | "colab_type": "text" 24 | }, 25 | "source": [ 26 | "\"Open" 27 | ] 28 | }, 29 | { 30 | "metadata": { 31 | "id": "-Jv0xHZ3BtKp", 32 | "colab_type": "text" 33 | }, 34 | "cell_type": "markdown", 35 | "source": [ 36 | "## Line model\n", 37 | "\n", 38 | "This notebook contains a very simple example of how to build a one-layer model in Keras to find the parameters of a given line.\n", 39 | "\n", 40 | "First, we import the necessary packages: Keras for the model, pyplot for plotting the generated line segments and NumPy for the array calculations.\n" 41 | ] 42 | }, 43 | { 44 | "metadata": { 45 | "id": "S4W_uQ3ryGIx", 46 | "colab_type": "code", 47 | "colab": {} 48 | }, 49 | "cell_type": "code", 50 | "source": [ 51 | "import keras\n", 52 | "from keras.models import Sequential\n", 53 | "from keras.layers import Dense, Activation\n", 54 | "from keras.optimizers import Adam\n", 55 | "import keras.backend as K\n", 56 | "\n", 57 | "import matplotlib.pyplot as plt\n", 58 | "import numpy as np" 59 | ], 60 | "execution_count": 0, 61 | "outputs": [] 62 | }, 63 | { 64 | "metadata": { 65 | "id": "9H2hsqKODRCo", 66 | "colab_type": "text" 67 | }, 68 | "cell_type": "markdown", 69 | "source": [ 70 | 
"We start by creating a line segment.\n", 71 | "\n", 72 | "We actually create some pairs of x, y numbers and plot them as points.\n", 73 | "If we connect these points we get a line segment." 74 | ] 75 | }, 76 | { 77 | "metadata": { 78 | "id": "s56BcOXqChd7", 79 | "colab_type": "code", 80 | "colab": {} 81 | }, 82 | "cell_type": "code", 83 | "source": [ 84 | "x = np.linspace(0, 10, 11)\n", 85 | "y = np.linspace(-2, 3, 11)\n", 86 | "\n", 87 | "row_format = \"{:>6}\" * (len(x))\n", 88 | "print('y = a*x + b\\n')\n", 89 | "print('x:', row_format.format(*x))\n", 90 | "print('y:', row_format.format(*y), '\\n')\n", 91 | "print('y = 0.5*x - 2')\n", 92 | "\n", 93 | "plt.plot(x, y, '-')\n", 94 | "plt.show()" 95 | ], 96 | "execution_count": 0, 97 | "outputs": [] 98 | }, 99 | { 100 | "metadata": { 101 | "id": "-VN0BqLtDm8P", 102 | "colab_type": "text" 103 | }, 104 | "cell_type": "markdown", 105 | "source": [ 106 | "Now let's build our first Keras model. The model will have only one layer with one neuron.\n", 107 | "This means that the \"network\" will\n", 108 | "\n", 109 | "- take one number as input\n", 110 | "- pass the number through the neuron\n", 111 | "- apply the function f(x) = a*x + b inside the neuron\n", 112 | "- return the result of the neuron as output\n", 113 | "\n", 114 | "The summary of the model informs us that the model has one Dense layer with 2 parameters.\n", 115 | "\n", 116 | "These parameters are the \"a\" and \"b\" of the aforementioned equation." 117 | ] 118 | }, 119 | { 120 | "metadata": { 121 | "id": "rQ3PCXKV80I7", 122 | "colab_type": "code", 123 | "colab": {} 124 | }, 125 | "cell_type": "code", 126 | "source": [ 127 | "K.clear_session()\n", 128 | "model = Sequential([Dense(1, input_shape=(1,))])\n", 129 | "model.summary()" 130 | ], 131 | "execution_count": 0, 132 | "outputs": [] 133 | }, 134 | { 135 | "metadata": { 136 | "id": "aR0E5rJpEf30", 137 | "colab_type": "text" 138 | }, 139 | "cell_type": "markdown", 140 | "source": [ 141 | "When building the model 
these parameters are randomly initialized.\n", 142 | "\n", 143 | "In the next cell we pass the x values through the model and get the results (y_pred).\n", 144 | "\n", 145 | "Then we plot both the original points and the model's points." 146 | ] 147 | }, 148 | { 149 | "metadata": { 150 | "id": "FjHTGecLDBUq", 151 | "colab_type": "code", 152 | "colab": {} 153 | }, 154 | "cell_type": "code", 155 | "source": [ 156 | "a, b = model.get_weights()\n", 157 | "y_pred = model.predict(x)\n", 158 | "y_pred = np.round(y_pred.flatten().astype(float), 1)\n", 159 | "\n", 160 | "row_format = \"{:>6}\" * (len(x))\n", 161 | "\n", 162 | "print('y = 0.5*x - 2')\n", 163 | "print('x: ', row_format.format(*x))\n", 164 | "print('y: ', row_format.format(*y), '\\n')\n", 165 | "\n", 166 | "print('y_pred = %.1f*x + %.1f' % (a[0,0], b[0]))\n", 167 | "print('x: ', row_format.format(*x))\n", 168 | "print('y_pred:', row_format.format(*y_pred), '\\n')\n", 169 | "\n", 170 | "plt.plot(x, y, '-')\n", 171 | "plt.plot(x, y_pred, '--')\n", 172 | "plt.yticks(ticks=range(-14, 15, 2))\n", 173 | "plt.show()" 174 | ], 175 | "execution_count": 0, 176 | "outputs": [] 177 | }, 178 | { 179 | "metadata": { 180 | "id": "gV6woA_uG3GK", 181 | "colab_type": "text" 182 | }, 183 | "cell_type": "markdown", 184 | "source": [ 185 | "As we can see, the two line segments defined by the two sets of points are not the same.\n", 186 | "\n", 187 | "This means that the *y* values and the *y_pred* values are different.\n", 188 | "\n", 189 | "This happens because the *a* and *b* parameters that we defined are different from the *a* and *b* parameters of the neuron of the model.\n", 190 | "\n", 191 | "Thus we have to train the model to get these parameters as close as possible to ours.\n", 192 | "\n", 193 | "First of all, we have to select some (or all) of the points to use as training data.\n", 194 | "These points will be fed to the model during training.\n", 195 | "We take three of the eleven points as training 
data." 196 | ] 197 | }, 198 | { 199 | "metadata": { 200 | "id": "lm3hdWagl8ff", 201 | "colab_type": "code", 202 | "colab": {} 203 | }, 204 | "cell_type": "code", 205 | "source": [ 206 | "x_train = x[0::5]\n", 207 | "y_train = y[0::5]\n", 208 | "\n", 209 | "\n", 210 | "row_format = \"{:>6}\" * (len(x_train))\n", 211 | "print('x_train:', row_format.format(*x_train))\n", 212 | "print('y_train:', row_format.format(*y_train))" 213 | ], 214 | "execution_count": 0, 215 | "outputs": [] 216 | }, 217 | { 218 | "metadata": { 219 | "id": "Jz-krpSGIBd1", 220 | "colab_type": "text" 221 | }, 222 | "cell_type": "markdown", 223 | "source": [ 224 | "Before we start training the model we need to define the target and the way to achieve it.\n", 225 | "\n", 226 | "In our case we want the model's points to be as close as possible to the given points, which means that the distance between y and y_pred for a given x should be minimized.\n", 227 | "\n", 228 | "To achieve this we use a *loss function*. In our case the loss function will be the *mean squared error*.\n", 229 | "\n", 230 | "We also need to define a method by which the model will try to minimize the loss function.\n", 231 | "\n", 232 | "The method (also called an optimizer, since it optimizes the model's parameters) that we will use is Adam.\n", 233 | "\n", 234 | "We don't need to go into too much detail about this one." 
235 | ] 236 | }, 237 | { 238 | "metadata": { 239 | "id": "k_BGgf7Vz5jl", 240 | "colab_type": "code", 241 | "colab": {} 242 | }, 243 | "cell_type": "code", 244 | "source": [ 245 | "model.compile(optimizer=Adam(0.1), loss='mean_squared_error')" 246 | ], 247 | "execution_count": 0, 248 | "outputs": [] 249 | }, 250 | { 251 | "metadata": { 252 | "id": "IoyM1giGI3rp", 253 | "colab_type": "text" 254 | }, 255 | "cell_type": "markdown", 256 | "source": [ 257 | "Now we will train our model to minimize the distance between the given 3 points and the generated ones.\n", 258 | "\n", 259 | "We will feed the data to the model once and see how the parameters of the model change." 260 | ] 261 | }, 262 | { 263 | "metadata": { 264 | "id": "kGnMaKbXmoOs", 265 | "colab_type": "code", 266 | "colab": {} 267 | }, 268 | "cell_type": "code", 269 | "source": [ 270 | "model.train_on_batch(x_train, y_train)\n", 271 | "y_pred = model.predict(x)\n", 272 | "\n", 273 | "y_pred = np.round(y_pred.flatten().astype(float), 1)\n", 274 | "a, b = model.get_weights()\n", 275 | "\n", 276 | "row_format = \"{:>6}\" * (len(x))\n", 277 | "\n", 278 | "print('y = 0.5*x - 2')\n", 279 | "print('x: ', row_format.format(*x))\n", 280 | "print('y: ', row_format.format(*y), '\\n')\n", 281 | "\n", 282 | "print('y_pred = %.1f*x + %.1f' % (a[0,0], b[0]))\n", 283 | "print('x: ', row_format.format(*x))\n", 284 | "print('y_pred:', row_format.format(*y_pred), '\\n')\n", 285 | "\n", 286 | "plt.plot(x, y, '-')\n", 287 | "plt.plot(x, y_pred, '--')\n", 288 | "\n", 289 | "plt.yticks(ticks=range(-14, 15, 2))\n", 290 | "plt.show()" 291 | ], 292 | "execution_count": 0, 293 | "outputs": [] 294 | }, 295 | { 296 | "metadata": { 297 | "id": "4eA5DKYvJnhN", 298 | "colab_type": "text" 299 | }, 300 | "cell_type": "markdown", 301 | "source": [ 302 | "As we can see, it takes more than one repetition for the model to find the right set of parameters for the given problem.\n", 303 | "\n", 304 | "Let's train the model on the given data a few 
times more." 305 | ] 306 | }, 307 | { 308 | "metadata": { 309 | "id": "slZK3am5EfJd", 310 | "colab_type": "code", 311 | "colab": {} 312 | }, 313 | "cell_type": "code", 314 | "source": [ 315 | "for i in range(20):\n", 316 | " model.train_on_batch(x_train, y_train)\n", 317 | "\n", 318 | "\n", 319 | "y_pred = model.predict(x)\n", 320 | "\n", 321 | "a, b = model.get_weights()\n", 322 | "y_pred = np.round(y_pred.flatten().astype(float), 1)\n", 323 | "\n", 324 | "row_format = \"{:>6}\" * (len(x))\n", 325 | "\n", 326 | "print('y = 0.5*x - 2')\n", 327 | "print('x: ', row_format.format(*x))\n", 328 | "print('y: ', row_format.format(*y), '\\n')\n", 329 | "\n", 330 | "print('y_pred = %.1f*x + %.1f' % (a[0,0], b[0]))\n", 331 | "print('x: ', row_format.format(*x))\n", 332 | "print('y_pred:', row_format.format(*y_pred), '\\n')\n", 333 | "\n", 334 | "plt.plot(x, y, '-')\n", 335 | "plt.plot(x, y_pred, '--')\n", 336 | "\n", 337 | "plt.yticks(ticks=range(-20, 15, 2))\n", 338 | "plt.show()" 339 | ], 340 | "execution_count": 0, 341 | "outputs": [] 342 | }, 343 | { 344 | "metadata": { 345 | "id": "PehY2r3QKAPS", 346 | "colab_type": "text" 347 | }, 348 | "cell_type": "markdown", 349 | "source": [ 350 | "After a few more repetitions the model is able to generate new points on the line that we used to train it.\n", 351 | "\n", 352 | "That was it!\n", 353 | "\n", 354 | "We just trained a model to generate points on a given line.\n", 355 | "We did it not by giving the model the function of the line, but by giving it three points on the line and training it to learn the function by itself.\n", 356 | "\n", 357 | "Not that astonishing, but this is more or less the core of what goes on when training a model on any task.\n", 358 | "\n", 359 | "We change the internal parameters of the model so that it outputs the values that we want.\n", 360 | "\n", 361 | "The result is a (usually very complicated) function that takes some input and transforms it to the desired output.\n", 362 
| "\n", 363 | "### The end" 364 | ] 365 | } 366 | ] 367 | } -------------------------------------------------------------------------------- /mnist_model.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "nbformat": 4, 3 | "nbformat_minor": 0, 4 | "metadata": { 5 | "colab": { 6 | "name": "mnist_model.ipynb", 7 | "version": "0.3.2", 8 | "provenance": [], 9 | "collapsed_sections": [], 10 | "include_colab_link": true 11 | }, 12 | "kernelspec": { 13 | "name": "python3", 14 | "display_name": "Python 3" 15 | } 16 | }, 17 | "cells": [ 18 | { 19 | "cell_type": "markdown", 20 | "metadata": { 21 | "id": "view-in-github", 22 | "colab_type": "text" 23 | }, 24 | "source": [ 25 | "\"Open" 26 | ] 27 | }, 28 | { 29 | "metadata": { 30 | "id": "b3QokrtPMFMA", 31 | "colab_type": "text" 32 | }, 33 | "cell_type": "markdown", 34 | "source": [ 35 | "## MNIST model\n", 36 | "\n", 37 | "In this notebook we will see how to train a simple model on the MNIST dataset.\n", 38 | "\n", 39 | "First, we import the necessary packages: Keras for the model, pyplot for plotting the images and NumPy for the array calculations." 40 | ] 41 | }, 42 | { 43 | "metadata": { 44 | "id": "S4W_uQ3ryGIx", 45 | "colab_type": "code", 46 | "colab": {} 47 | }, 48 | "cell_type": "code", 49 | "source": [ 50 | "import keras\n", 51 | "from keras.datasets import mnist, fashion_mnist\n", 52 | "from keras.models import Sequential, Model\n", 53 | "from keras.layers import *\n", 54 | "from keras.activations import softmax, relu\n", 55 | "import keras.backend as K\n", 56 | "from keras.utils import to_categorical\n", 57 | "\n", 58 | "import matplotlib.pyplot as plt\n", 59 | "import numpy as np" 60 | ], 61 | "execution_count": 0, 62 | "outputs": [] 63 | }, 64 | { 65 | "metadata": { 66 | "id": "Rt1qyRCZMhZA", 67 | "colab_type": "text" 68 | }, 69 | "cell_type": "markdown", 70 | "source": [ 71 | "The first thing we need is the data. 
The data are small (28x28 pixel) grayscale images of handwritten digits.\n", 72 | "\n", 73 | "Notice the line\n", 74 | "\n", 75 | "\n", 76 | "```\n", 77 | "X_train, X_val = X_train / 255.0, X_val / 255.0\n", 78 | "```\n", 79 | "\n", 80 | "Originally the images' pixels have values in [0, 255].\n", 81 | "However, such large values are not easy for networks to handle, so we usually change the input values to something more \"*model friendly*\".\n", 82 | "\n", 83 | "This is called data preprocessing.\n", 84 | "\n", 85 | "In our case the preprocessing is just to map the values from [0, 255] to [0, 1].\n", 86 | "\n" 87 | ] 88 | }, 89 | { 90 | "metadata": { 91 | "id": "ewHzG5VyzWzw", 92 | "colab_type": "code", 93 | "colab": {} 94 | }, 95 | "cell_type": "code", 96 | "source": [ 97 | "(X_train, Y_train), (X_val, Y_val) = mnist.load_data()\n", 98 | "# (X_train, Y_train), (X_val, Y_val) = fashion_mnist.load_data()\n", 99 | "X_train, X_val = X_train / 255.0, X_val / 255.0\n", 100 | "labels_names = 'zero', 'one', 'two', 'three', 'four', 'five', 'six', 'seven', 'eight', 'nine'\n", 101 | "# labels_names = 'T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'\n", 102 | "r, c = 3, 3" 103 | ], 104 | "execution_count": 0, 105 | "outputs": [] 106 | }, 107 | { 108 | "metadata": { 109 | "id": "B1__zwt1NYmH", 110 | "colab_type": "text" 111 | }, 112 | "cell_type": "markdown", 113 | "source": [ 114 | "It is very important to know the shape of the arrays we use.\n", 115 | "\n", 116 | "All the data that we use when training DL models are actually n-dimensional arrays with some values. 
No matter whether it was originally a video, an image, a voice recording or a text, in the end everything is transformed into arrays.\n", 117 | "The shape of the array is important, since models are built to accept arrays of a specific shape." 118 | ] 119 | }, 120 | { 121 | "metadata": { 122 | "id": "5HSicIdLz9to", 123 | "colab_type": "code", 124 | "colab": {} 125 | }, 126 | "cell_type": "code", 127 | "source": [ 128 | "print('The shape of the training data array is:', X_train.shape)" 129 | ], 130 | "execution_count": 0, 131 | "outputs": [] 132 | }, 133 | { 134 | "metadata": { 135 | "id": "b4mPcD5QOLCn", 136 | "colab_type": "text" 137 | }, 138 | "cell_type": "markdown", 139 | "source": [ 140 | "In the next cell we define some functions for getting random images from the dataset and plotting them. Don't pay too much attention to them for the moment." 141 | ] 142 | }, 143 | { 144 | "metadata": { 145 | "id": "pqY001s4AdpE", 146 | "colab_type": "code", 147 | "colab": {} 148 | }, 149 | "cell_type": "code", 150 | "source": [ 151 | "def get_random_imgs_labels(X_set, Y_set, n_imgs):\n", 152 | " inds = np.random.randint(0, len(X_set), n_imgs)\n", 153 | " images, labels = X_set[inds], Y_set[inds]\n", 154 | " return images, labels\n", 155 | "\n", 156 | "def plot_images(images, labels, preds=None):\n", 157 | " fig, axs = plt.subplots(r, c)\n", 158 | " cnt = 0\n", 159 | " for i in range(r):\n", 160 | " for j in range(c):\n", 161 | " axs[i, j].imshow(images[cnt], cmap='gray')\n", 162 | " axs[i, j].axis('off')\n", 163 | " title = labels_names[labels[cnt]] if preds is None else '%s/%s' % (labels_names[labels[cnt]], labels_names[preds[cnt]])\n", 164 | " axs[i, j].set_title(title, fontsize=16)\n", 165 | " cnt += 1\n", 166 | " plt.show()" 167 | ], 168 | "execution_count": 0, 169 | "outputs": [] 170 | }, 171 | { 172 | "metadata": { 173 | "id": "NnO5cfBYOahx", 174 | "colab_type": "text" 175 | }, 176 | "cell_type": "markdown", 177 | "source": [ 178 | "Let's plot some 
of the images together with their labels to see what they look like." 179 | ] 180 | }, 181 | { 182 | "metadata": { 183 | "id": "qEBEFVxE5UPl", 184 | "colab_type": "code", 185 | "colab": {} 186 | }, 187 | "cell_type": "code", 188 | "source": [ 189 | "images, labels = get_random_imgs_labels(X_train, Y_train, r*c)\n", 190 | "plot_images(images, labels)" 191 | ], 192 | "execution_count": 0, 193 | "outputs": [] 194 | }, 195 | { 196 | "metadata": { 197 | "id": "9xmto0cDOkLx", 198 | "colab_type": "text" 199 | }, 200 | "cell_type": "markdown", 201 | "source": [ 202 | "Now we have to build the model we will use to predict the label of a given image.\n", 203 | "\n", 204 | "The model we will use has two fully connected layers and outputs numbers that can be interpreted as probabilities for each one of the ten labels (digits).\n", 205 | "\n", 206 | "We use the label with the highest probability as the predicted label." 207 | ] 208 | }, 209 | { 210 | "metadata": { 211 | "id": "7ZB0aex57iWJ", 212 | "colab_type": "code", 213 | "colab": {} 214 | }, 215 | "cell_type": "code", 216 | "source": [ 217 | "K.clear_session()\n", 218 | "model = Sequential([\n", 219 | " Flatten(input_shape=X_train.shape[1:]),\n", 220 | " Dense(32, activation='relu'),\n", 221 | " Dense(len(labels_names), activation='softmax'),\n", 222 | "])\n", 223 | "model.summary()" 224 | ], 225 | "execution_count": 0, 226 | "outputs": [] 227 | }, 228 | { 229 | "metadata": { 230 | "id": "Q_-S7nT9POea", 231 | "colab_type": "text" 232 | }, 233 | "cell_type": "markdown", 234 | "source": [ 235 | "Before we start training the model we need to define the target and the way to achieve it.\n", 236 | "\n", 237 | "In our case we want the model's predicted probabilities for each label to be as close as possible to the given probabilities for each label. 
And since in each case we have only one correct label, ideally we want the model to return probability 1 for the correct label and 0 for the rest.\n", 238 | "\n", 239 | "To achieve this we use a *loss function*. In our case the loss function will be the *categorical crossentropy*.\n", 240 | "\n", 241 | "We also need to define a method by which the model will try to minimize the loss function.\n", 242 | "\n", 243 | "The method (also called an optimizer, since it optimizes the model's parameters) that we will use is Adam.\n", 244 | "\n", 245 | "We don't need to go into too much detail about this one." 246 | ] 247 | }, 248 | { 249 | "metadata": { 250 | "id": "ahRvgdb0PTCM", 251 | "colab_type": "code", 252 | "colab": {} 253 | }, 254 | "cell_type": "code", 255 | "source": [ 256 | "model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['acc'])" 257 | ], 258 | "execution_count": 0, 259 | "outputs": [] 260 | }, 261 | { 262 | "metadata": { 263 | "id": "7cL-KUfrP6Hj", 264 | "colab_type": "text" 265 | }, 266 | "cell_type": "markdown", 267 | "source": [ 268 | "Now the model has been randomly initialized, which means that its output will be mostly wrong. Let's see some examples. 
We can run the next cell more than once to obtain different randomly selected images and the corresponding (true and predicted) labels." 269 | ] 270 | }, 271 | { 272 | "metadata": { 273 | "id": "utMajqL0_qET", 274 | "colab_type": "code", 275 | "colab": {} 276 | }, 277 | "cell_type": "code", 278 | "source": [ 279 | "images, labels = get_random_imgs_labels(X_val, Y_val, 9)\n", 280 | "predictions = model.predict_on_batch(images)\n", 281 | "predictions = np.argmax(predictions, 1)\n", 282 | "\n", 283 | "print('correct: %d out of %d' % (np.sum(labels == predictions), len(labels)))\n", 284 | "plot_images(images, labels, predictions)" 285 | ], 286 | "execution_count": 0, 287 | "outputs": [] 288 | }, 289 | { 290 | "metadata": { 291 | "id": "SgIiPVedQa3A", 292 | "colab_type": "text" 293 | }, 294 | "cell_type": "markdown", 295 | "source": [ 296 | "Now let's train the model for some epochs (one epoch is one pass through the whole training dataset) and see if we get better results.\n", 297 | "\n", 298 | "After training the model we can run the previous cell again to obtain the results of the trained model.\n", 299 | "\n", 300 | "Of course we can train the model more than once by repeatedly executing the next cell and obtain the results by running the previous one." 
301 | ] 302 | }, 303 | { 304 | "metadata": { 305 | "id": "FN_bprgkFw0T", 306 | "colab_type": "code", 307 | "colab": {} 308 | }, 309 | "cell_type": "code", 310 | "source": [ 311 | "history = model.fit(X_train, to_categorical(Y_train), validation_data=(X_val, to_categorical(Y_val)), epochs=10)" 312 | ], 313 | "execution_count": 0, 314 | "outputs": [] 315 | }, 316 | { 317 | "metadata": { 318 | "id": "tMkRFad_Q3mC", 319 | "colab_type": "text" 320 | }, 321 | "cell_type": "markdown", 322 | "source": [ 323 | "That was it!\n", 324 | "\n", 325 | "We trained a model to classify images of handwritten digits.\n", 326 | "\n", 327 | "A more challenging task is to classify images of clothes.\n", 328 | "\n", 329 | "In order to do this, uncomment (delete the #) these lines in the second cell:\n", 330 | "\n", 331 | "\n", 332 | "\n", 333 | "```\n", 334 | "# (X_train, Y_train), (X_val, Y_val) = fashion_mnist.load_data()\n", 335 | "\n", 336 | "# labels_names = 'T-shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot'\n", 337 | "```\n", 338 | "\n", 339 | "You will notice that it is harder for the model to reach very high accuracy, and training it more will not increase the accuracy after a point.\n", 340 | "\n", 341 | "In this case we need to make some changes to get better results.\n", 342 | "\n", 343 | "These changes could be related to the model (architecture, depth, width), the optimizer, the preprocessing, the regularization, the loss function, etc.\n", 344 | "\n", 345 | "## The end\n" 346 | ] 347 | } 348 | ] 349 | } --------------------------------------------------------------------------------