├── LICENSE
├── second_edition
│   ├── README.md
│   ├── chapter12_part02_deep-dream.ipynb
│   ├── chapter09_part01_image-segmentation.ipynb
│   ├── chapter09_part02_modern-convnet-architecture-patterns.ipynb
│   ├── chapter12_part04_variational-autoencoders.ipynb
│   ├── chapter12_part03_neural-style-transfer.ipynb
│   ├── chapter13_best-practices-for-the-real-world.ipynb
│   ├── chapter12_part05_gans.ipynb
│   ├── chapter11_part03_transformer.ipynb
│   ├── chapter11_part02_sequence-models.ipynb
│   ├── chapter14_conclusions.ipynb
│   └── chapter12_part01_text-generation.ipynb
├── README.md
├── first_edition
│   ├── 6.1-one-hot-encoding-of-words-or-characters.ipynb
│   ├── 5.1-introduction-to-convnets.ipynb
│   └── 2.1-a-first-look-at-a-neural-network.ipynb
├── chapter09_convnet-architecture-patterns.ipynb
└── chapter18_best-practices-for-the-real-world.ipynb /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2017-present François Chollet 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /second_edition/README.md: -------------------------------------------------------------------------------- 1 | # Second edition notebooks 2 | 3 | These are the notebooks for the second edition of the book, originally published in 2021. These notebooks use `tf.keras` with TensorFlow 2.16.
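If you run these notebooks outside Colab, it can help to verify your local install up front — a minimal sanity check (the exact version expectation is an assumption based on the note above):

```python
# Confirm that a compatible TensorFlow 2.x install is available.
import tensorflow as tf

print(tf.__version__)  # Expect something like "2.16.x"
```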
4 | 5 | ## Table of contents 6 | 7 | * [Chapter 2: The mathematical building blocks of neural networks](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter02_mathematical-building-blocks.ipynb) 8 | * [Chapter 3: Introduction to Keras and TensorFlow](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter03_introduction-to-keras-and-tf.ipynb) 9 | * [Chapter 4: Getting started with neural networks: classification and regression](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter04_getting-started-with-neural-networks.ipynb) 10 | * [Chapter 5: Fundamentals of machine learning](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter05_fundamentals-of-ml.ipynb) 11 | * [Chapter 7: Working with Keras: a deep dive](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter07_working-with-keras.ipynb) 12 | * [Chapter 8: Introduction to deep learning for computer vision](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter08_intro-to-dl-for-computer-vision.ipynb) 13 | * Chapter 9: Advanced deep learning for computer vision 14 | - [Part 1: Image segmentation](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter09_part01_image-segmentation.ipynb) 15 | - [Part 2: Modern convnet architecture patterns](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter09_part02_modern-convnet-architecture-patterns.ipynb) 16 | - [Part 3: Interpreting what convnets learn](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter09_part03_interpreting-what-convnets-learn.ipynb) 17 | * [Chapter 10: Deep learning for timeseries](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter10_dl-for-timeseries.ipynb) 18 | * Chapter 11: Deep learning for text 19 | - [Part 1: Introduction](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter11_part01_introduction.ipynb) 20 | - [Part 2: Sequence models](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter11_part02_sequence-models.ipynb) 21 | - [Part 3: Transformer](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter11_part03_transformer.ipynb) 22 | - [Part 4: Sequence-to-sequence learning](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter11_part04_sequence-to-sequence-learning.ipynb) 23 | * Chapter 12: Generative deep learning 24 | - [Part 1: Text generation](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter12_part01_text-generation.ipynb) 25 | - [Part 2: Deep Dream](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter12_part02_deep-dream.ipynb) 26 | - [Part 3: Neural style 
transfer](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter12_part03_neural-style-transfer.ipynb) 27 | - [Part 4: Variational autoencoders](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter12_part04_variational-autoencoders.ipynb) 28 | - [Part 5: Generative adversarial networks](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter12_part05_gans.ipynb) 29 | * [Chapter 13: Best practices for the real world](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter13_best-practices-for-the-real-world.ipynb) 30 | * [Chapter 14: Conclusions](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/second_edition/chapter14_conclusions.ipynb) 31 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Companion notebooks for Deep Learning with Python 2 | 3 | This repository contains Jupyter notebooks implementing the code samples found in the book [Deep Learning with Python, third edition (2025)](https://www.manning.com/books/deep-learning-with-python-third-edition?a_aid=keras&a_bid=76564dff) 4 | by Francois Chollet and Matthew Watson. In addition, you will also find the legacy notebooks for the [second edition (2021)](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff) 5 | and the [first edition (2017)](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). 6 | 7 | For readability, these notebooks only contain runnable code blocks and section titles, and omit everything else in the book: text paragraphs, figures, and pseudocode. 8 | **If you want to be able to follow what's going on, I recommend reading the notebooks side by side with your copy of the book.** 9 | 10 | ## Running the code 11 | 12 | We recommend running these notebooks on [Colab](https://colab.google), which 13 | provides a hosted runtime with all the dependencies you will need. You can also 14 | run these notebooks locally, either by setting up your own Jupyter environment 15 | or by using Colab's instructions for 16 | [running locally](https://research.google.com/colaboratory/local-runtimes.html). 17 | 18 | By default, all notebooks will run on Colab's free-tier GPU runtime, which 19 | is sufficient to run all code in this book. Chapters 8-18 will benefit 20 | from a faster GPU if you have a Colab Pro subscription. You can change your 21 | runtime type using **Runtime -> Change runtime type** in Colab's dropdown menus. 22 | 23 | ## Choosing a backend 24 | 25 | The code for the third edition is written using Keras 3. As such, it can be run with 26 | JAX, TensorFlow, or PyTorch as a backend. To set the backend, update the 27 | cell at the top of each notebook that looks like this: 28 | 29 | ```python 30 | import os 31 | os.environ["KERAS_BACKEND"] = "jax" 32 | ``` 33 | 34 | This must be done only once per session, before importing Keras. If you are 35 | in the middle of running a notebook, you will need to restart the notebook session 36 | and rerun all relevant notebook cells. This can be done using 37 | **Runtime -> Restart Session** in Colab's dropdown menus.
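To confirm which backend is active, you can print it after importing Keras — a quick sanity check (not from the book; `keras.backend.backend()` is the standard Keras 3 call):

```python
import keras

# Prints the active backend: "jax", "tensorflow", or "torch".
print(keras.backend.backend())
```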
38 | 39 | ## Using Kaggle data 40 | 41 | This book uses datasets and model weights provided by Kaggle, an online machine 42 | learning community and platform. You will need to create a Kaggle login to run 43 | Kaggle code in this book; instructions are given in Chapter 8. 44 | 45 | For chapters that need Kaggle data, you can log in to Kaggle once per session 46 | when you hit the notebook cell with `kagglehub.login()`. Alternatively, 47 | you can set up your Kaggle login information once as Colab secrets: 48 | 49 | * Go to https://www.kaggle.com/ and sign in. 50 | * Go to https://www.kaggle.com/settings and generate a Kaggle API key. 51 | * Open the secrets tab in Colab by clicking the key icon on the left. 52 | * Add two secrets, `KAGGLE_USERNAME` and `KAGGLE_KEY`, with the username and key 53 | you just created. 54 | 55 | Following this approach, you will only need to copy your Kaggle secret key once, 56 | though you will need to allow each notebook to access your secrets when running 57 | the relevant Kaggle code. 58 | 59 | ## Table of contents 60 | 61 | * [Chapter 2: The mathematical building blocks of neural networks](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter02_mathematical-building-blocks.ipynb) 62 | * [Chapter 3: Introduction to TensorFlow, PyTorch, JAX, and Keras](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter03_introduction-to-ml-frameworks.ipynb) 63 | * [Chapter 4: Classification and regression](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter04_classification-and-regression.ipynb) 64 | * [Chapter 5: Fundamentals of machine learning](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter05_fundamentals-of-ml.ipynb) 65 | * [Chapter 7: A deep dive on Keras](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter07_deep-dive-keras.ipynb) 66 | * [Chapter 8: Image Classification](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter08_image-classification.ipynb) 67 | * [Chapter 9: Convnet architecture patterns](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter09_convnet-architecture-patterns.ipynb) 68 | * [Chapter 10: Interpreting what ConvNets learn](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter10_interpreting-what-convnets-learn.ipynb) 69 | * [Chapter 11: Image Segmentation](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter11_image-segmentation.ipynb) 70 | * [Chapter 12: Object Detection](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter12_object-detection.ipynb) 71 | * [Chapter 13: Timeseries Forecasting](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter13_timeseries-forecasting.ipynb) 72 | * [Chapter 14: Text Classification](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter14_text-classification.ipynb) 73 | * [Chapter 15: Language Models and the Transformer](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter15_language-models-and-the-transformer.ipynb) 74 | * [Chapter 16: 
Text Generation](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter16_text-generation.ipynb) 75 | * [Chapter 17: Image Generation](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter17_image-generation.ipynb) 76 | * [Chapter 18: Best practices for the real world](https://colab.research.google.com/github/fchollet/deep-learning-with-python-notebooks/blob/master/chapter18_best-practices-for-the-real-world.ipynb) 77 | -------------------------------------------------------------------------------- /second_edition/chapter12_part02_deep-dream.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "## DeepDream" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "### Implementing DeepDream in Keras" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "**Fetching the test image**" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 0, 42 | "metadata": { 43 | "colab_type": "code" 44 | }, 45 | "outputs": [], 46 | "source": [ 47 | "from tensorflow import keras\n", 48 | "import matplotlib.pyplot as plt\n", 49 | "\n", 50 | "base_image_path = keras.utils.get_file(\n", 51 | " \"coast.jpg\", origin=\"https://img-datasets.s3.amazonaws.com/coast.jpg\")\n", 52 | "\n", 53 | "plt.axis(\"off\")\n", 54 | "plt.imshow(keras.utils.load_img(base_image_path))" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "colab_type": "text" 61 | }, 62 | "source": [ 63 | "**Instantiating a pretrained `InceptionV3` model**" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 0, 69 | "metadata": { 70 | "colab_type": "code" 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "from tensorflow.keras.applications import inception_v3\n", 75 | "model = inception_v3.InceptionV3(weights=\"imagenet\", include_top=False)" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": { 81 | "colab_type": "text" 82 | }, 83 | "source": [ 84 | "**Configuring the contribution of each layer to the DeepDream loss**" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 0, 90 | "metadata": { 91 | "colab_type": "code" 92 | }, 93 | "outputs": [], 94 | "source": [ 95 | "layer_settings = {\n", 96 | " \"mixed4\": 1.0,\n", 97 | " \"mixed5\": 1.5,\n", 98 | " \"mixed6\": 2.0,\n", 99 | " \"mixed7\": 2.5,\n", 100 | "}\n", 101 | "outputs_dict = dict(\n", 102 | " [\n", 103 | " (layer.name, layer.output)\n", 104 | " for layer in [model.get_layer(name) for name in layer_settings.keys()]\n", 105 | " ]\n", 106 | ")\n", 107 | 
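"# Build a model that returns the activation values of the target layers, keyed by layer name\n",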
"feature_extractor = keras.Model(inputs=model.inputs, outputs=outputs_dict)" 108 | ] 109 | }, 110 | { 111 | "cell_type": "markdown", 112 | "metadata": { 113 | "colab_type": "text" 114 | }, 115 | "source": [ 116 | "**The DeepDream loss**" 117 | ] 118 | }, 119 | { 120 | "cell_type": "code", 121 | "execution_count": 0, 122 | "metadata": { 123 | "colab_type": "code" 124 | }, 125 | "outputs": [], 126 | "source": [ 127 | "def compute_loss(input_image):\n", 128 | " features = feature_extractor(input_image)\n", 129 | " loss = tf.zeros(shape=())\n", 130 | " for name in features.keys():\n", 131 | " coeff = layer_settings[name]\n", 132 | " activation = features[name]\n", 133 | " loss += coeff * tf.reduce_mean(tf.square(activation[:, 2:-2, 2:-2, :]))\n", 134 | " return loss" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": { 140 | "colab_type": "text" 141 | }, 142 | "source": [ 143 | "**The DeepDream gradient ascent process**" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 0, 149 | "metadata": { 150 | "colab_type": "code" 151 | }, 152 | "outputs": [], 153 | "source": [ 154 | "import tensorflow as tf\n", 155 | "\n", 156 | "@tf.function\n", 157 | "def gradient_ascent_step(image, learning_rate):\n", 158 | " with tf.GradientTape() as tape:\n", 159 | " tape.watch(image)\n", 160 | " loss = compute_loss(image)\n", 161 | " grads = tape.gradient(loss, image)\n", 162 | " grads = tf.math.l2_normalize(grads)\n", 163 | " image += learning_rate * grads\n", 164 | " return loss, image\n", 165 | "\n", 166 | "\n", 167 | "def gradient_ascent_loop(image, iterations, learning_rate, max_loss=None):\n", 168 | " for i in range(iterations):\n", 169 | " loss, image = gradient_ascent_step(image, learning_rate)\n", 170 | " if max_loss is not None and loss > max_loss:\n", 171 | " break\n", 172 | " print(f\"... Loss value at step {i}: {loss:.2f}\")\n", 173 | " return image" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 0, 179 | "metadata": { 180 | "colab_type": "code" 181 | }, 182 | "outputs": [], 183 | "source": [ 184 | "step = 20.\n", 185 | "num_octave = 3\n", 186 | "octave_scale = 1.4\n", 187 | "iterations = 30\n", 188 | "max_loss = 15." 
189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": { 194 | "colab_type": "text" 195 | }, 196 | "source": [ 197 | "**Image processing utilities**" 198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 0, 203 | "metadata": { 204 | "colab_type": "code" 205 | }, 206 | "outputs": [], 207 | "source": [ 208 | "import numpy as np\n", 209 | "\n", 210 | "def preprocess_image(image_path):\n", 211 | " img = keras.utils.load_img(image_path)\n", 212 | " img = keras.utils.img_to_array(img)\n", 213 | " img = np.expand_dims(img, axis=0)\n", 214 | " img = keras.applications.inception_v3.preprocess_input(img)\n", 215 | " return img\n", 216 | "\n", 217 | "def deprocess_image(img):\n", 218 | " img = img.reshape((img.shape[1], img.shape[2], 3))\n", 219 | " img /= 2.0\n", 220 | " img += 0.5\n", 221 | " img *= 255.\n", 222 | " img = np.clip(img, 0, 255).astype(\"uint8\")\n", 223 | " return img" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": { 229 | "colab_type": "text" 230 | }, 231 | "source": [ 232 | "**Running gradient ascent over multiple successive \"octaves\"**" 233 | ] 234 | }, 235 | { 236 | "cell_type": "code", 237 | "execution_count": 0, 238 | "metadata": { 239 | "colab_type": "code" 240 | }, 241 | "outputs": [], 242 | "source": [ 243 | "original_img = preprocess_image(base_image_path)\n", 244 | "original_shape = original_img.shape[1:3]\n", 245 | "\n", 246 | "successive_shapes = [original_shape]\n", 247 | "for i in range(1, num_octave):\n", 248 | " shape = tuple([int(dim / (octave_scale ** i)) for dim in original_shape])\n", 249 | " successive_shapes.append(shape)\n", 250 | "successive_shapes = successive_shapes[::-1]\n", 251 | "\n", 252 | "shrunk_original_img = tf.image.resize(original_img, successive_shapes[0])\n", 253 | "\n", 254 | "img = tf.identity(original_img)\n", 255 | "for i, shape in enumerate(successive_shapes):\n", 256 | " print(f\"Processing octave {i} with shape {shape}\")\n", 257 | " img = tf.image.resize(img, shape)\n", 258 | " img = gradient_ascent_loop(\n", 259 | " img, iterations=iterations, learning_rate=step, max_loss=max_loss\n", 260 | " )\n", 261 | " upscaled_shrunk_original_img = tf.image.resize(shrunk_original_img, shape)\n", 262 | " same_size_original = tf.image.resize(original_img, shape)\n", 263 | " lost_detail = same_size_original - upscaled_shrunk_original_img\n", 264 | " img += lost_detail\n", 265 | " shrunk_original_img = tf.image.resize(original_img, shape)\n", 266 | "\n", 267 | "keras.utils.save_img(\"dream.png\", deprocess_image(img.numpy()))" 268 | ] 269 | }, 270 | { 271 | "cell_type": "markdown", 272 | "metadata": { 273 | "colab_type": "text" 274 | }, 275 | "source": [ 276 | "### Wrapping up" 277 | ] 278 | } 279 | ], 280 | "metadata": { 281 | "colab": { 282 | "collapsed_sections": [], 283 | "name": "chapter12_part02_deep-dream.i", 284 | "private_outputs": false, 285 | "provenance": [], 286 | "toc_visible": true 287 | }, 288 | "kernelspec": { 289 | "display_name": "Python 3", 290 | "language": "python", 291 | "name": "python3" 292 | }, 293 | "language_info": { 294 | "codemirror_mode": { 295 | "name": "ipython", 296 | "version": 3 297 | }, 298 | "file_extension": ".py", 299 | "mimetype": "text/x-python", 300 | "name": "python", 301 | "nbconvert_exporter": "python", 302 | "pygments_lexer": "ipython3", 303 | "version": "3.7.0" 304 | } 305 | }, 306 | "nbformat": 4, 307 | "nbformat_minor": 0 308 | } -------------------------------------------------------------------------------- 
/second_edition/chapter09_part01_image-segmentation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "# Advanced deep learning for computer vision" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "## Three essential computer vision tasks" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "## An image segmentation example" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 0, 42 | "metadata": { 43 | "colab_type": "code" 44 | }, 45 | "outputs": [], 46 | "source": [ 47 | "!wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz\n", 48 | "!wget http://www.robots.ox.ac.uk/~vgg/data/pets/data/annotations.tar.gz\n", 49 | "!tar -xf images.tar.gz\n", 50 | "!tar -xf annotations.tar.gz" 51 | ] 52 | }, 53 | { 54 | "cell_type": "code", 55 | "execution_count": 0, 56 | "metadata": { 57 | "colab_type": "code" 58 | }, 59 | "outputs": [], 60 | "source": [ 61 | "import os\n", 62 | "\n", 63 | "input_dir = \"images/\"\n", 64 | "target_dir = \"annotations/trimaps/\"\n", 65 | "\n", 66 | "input_img_paths = sorted(\n", 67 | " [os.path.join(input_dir, fname)\n", 68 | " for fname in os.listdir(input_dir)\n", 69 | " if fname.endswith(\".jpg\")])\n", 70 | "target_paths = sorted(\n", 71 | " [os.path.join(target_dir, fname)\n", 72 | " for fname in os.listdir(target_dir)\n", 73 | " if fname.endswith(\".png\") and not fname.startswith(\".\")])" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 0, 79 | "metadata": { 80 | "colab_type": "code" 81 | }, 82 | "outputs": [], 83 | "source": [ 84 | "import matplotlib.pyplot as plt\n", 85 | "from tensorflow.keras.utils import load_img, img_to_array\n", 86 | "\n", 87 | "plt.axis(\"off\")\n", 88 | "plt.imshow(load_img(input_img_paths[9]))" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": 0, 94 | "metadata": { 95 | "colab_type": "code" 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "def display_target(target_array):\n", 100 | " normalized_array = (target_array.astype(\"uint8\") - 1) * 127\n", 101 | " plt.axis(\"off\")\n", 102 | " plt.imshow(normalized_array[:, :, 0])\n", 103 | "\n", 104 | "img = img_to_array(load_img(target_paths[9], color_mode=\"grayscale\"))\n", 105 | "display_target(img)" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 0, 111 | "metadata": { 112 | "colab_type": "code" 113 | }, 114 | "outputs": [], 115 | "source": [ 116 | "import numpy as np\n", 117 | "import random\n", 118 | "\n", 119 | "img_size = (200, 200)\n", 120 | "num_imgs = len(input_img_paths)\n", 121 | "\n", 122 | 
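"# Shuffle the file paths (using the same seed) so each image stays paired with its mask\n",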
"random.Random(1337).shuffle(input_img_paths)\n", 123 | "random.Random(1337).shuffle(target_paths)\n", 124 | "\n", 125 | "def path_to_input_image(path):\n", 126 | " return img_to_array(load_img(path, target_size=img_size))\n", 127 | "\n", 128 | "def path_to_target(path):\n", 129 | " img = img_to_array(\n", 130 | " load_img(path, target_size=img_size, color_mode=\"grayscale\"))\n", 131 | " img = img.astype(\"uint8\") - 1\n", 132 | " return img\n", 133 | "\n", 134 | "input_imgs = np.zeros((num_imgs,) + img_size + (3,), dtype=\"float32\")\n", 135 | "targets = np.zeros((num_imgs,) + img_size + (1,), dtype=\"uint8\")\n", 136 | "for i in range(num_imgs):\n", 137 | " input_imgs[i] = path_to_input_image(input_img_paths[i])\n", 138 | " targets[i] = path_to_target(target_paths[i])\n", 139 | "\n", 140 | "num_val_samples = 1000\n", 141 | "train_input_imgs = input_imgs[:-num_val_samples]\n", 142 | "train_targets = targets[:-num_val_samples]\n", 143 | "val_input_imgs = input_imgs[-num_val_samples:]\n", 144 | "val_targets = targets[-num_val_samples:]" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 0, 150 | "metadata": { 151 | "colab_type": "code" 152 | }, 153 | "outputs": [], 154 | "source": [ 155 | "from tensorflow import keras\n", 156 | "from tensorflow.keras import layers\n", 157 | "\n", 158 | "def get_model(img_size, num_classes):\n", 159 | " inputs = keras.Input(shape=img_size + (3,))\n", 160 | " x = layers.Rescaling(1./255)(inputs)\n", 161 | "\n", 162 | " x = layers.Conv2D(64, 3, strides=2, activation=\"relu\", padding=\"same\")(x)\n", 163 | " x = layers.Conv2D(64, 3, activation=\"relu\", padding=\"same\")(x)\n", 164 | " x = layers.Conv2D(128, 3, strides=2, activation=\"relu\", padding=\"same\")(x)\n", 165 | " x = layers.Conv2D(128, 3, activation=\"relu\", padding=\"same\")(x)\n", 166 | " x = layers.Conv2D(256, 3, strides=2, padding=\"same\", activation=\"relu\")(x)\n", 167 | " x = layers.Conv2D(256, 3, activation=\"relu\", padding=\"same\")(x)\n", 168 | "\n", 169 | " x = layers.Conv2DTranspose(256, 3, activation=\"relu\", padding=\"same\")(x)\n", 170 | " x = layers.Conv2DTranspose(256, 3, activation=\"relu\", padding=\"same\", strides=2)(x)\n", 171 | " x = layers.Conv2DTranspose(128, 3, activation=\"relu\", padding=\"same\")(x)\n", 172 | " x = layers.Conv2DTranspose(128, 3, activation=\"relu\", padding=\"same\", strides=2)(x)\n", 173 | " x = layers.Conv2DTranspose(64, 3, activation=\"relu\", padding=\"same\")(x)\n", 174 | " x = layers.Conv2DTranspose(64, 3, activation=\"relu\", padding=\"same\", strides=2)(x)\n", 175 | "\n", 176 | " outputs = layers.Conv2D(num_classes, 3, activation=\"softmax\", padding=\"same\")(x)\n", 177 | "\n", 178 | " model = keras.Model(inputs, outputs)\n", 179 | " return model\n", 180 | "\n", 181 | "model = get_model(img_size=img_size, num_classes=3)\n", 182 | "model.summary()" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 0, 188 | "metadata": { 189 | "colab_type": "code" 190 | }, 191 | "outputs": [], 192 | "source": [ 193 | "model.compile(optimizer=\"rmsprop\", loss=\"sparse_categorical_crossentropy\")\n", 194 | "\n", 195 | "callbacks = [\n", 196 | " keras.callbacks.ModelCheckpoint(\"oxford_segmentation.keras\",\n", 197 | " save_best_only=True)\n", 198 | "]\n", 199 | "\n", 200 | "history = model.fit(train_input_imgs, train_targets,\n", 201 | " epochs=50,\n", 202 | " callbacks=callbacks,\n", 203 | " batch_size=64,\n", 204 | " validation_data=(val_input_imgs, val_targets))" 205 | ] 206 | }, 207 | { 208 | 
"cell_type": "code", 209 | "execution_count": 0, 210 | "metadata": { 211 | "colab_type": "code" 212 | }, 213 | "outputs": [], 214 | "source": [ 215 | "epochs = range(1, len(history.history[\"loss\"]) + 1)\n", 216 | "loss = history.history[\"loss\"]\n", 217 | "val_loss = history.history[\"val_loss\"]\n", 218 | "plt.figure()\n", 219 | "plt.plot(epochs, loss, \"bo\", label=\"Training loss\")\n", 220 | "plt.plot(epochs, val_loss, \"b\", label=\"Validation loss\")\n", 221 | "plt.title(\"Training and validation loss\")\n", 222 | "plt.legend()" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 0, 228 | "metadata": { 229 | "colab_type": "code" 230 | }, 231 | "outputs": [], 232 | "source": [ 233 | "from tensorflow.keras.utils import array_to_img\n", 234 | "\n", 235 | "model = keras.models.load_model(\"oxford_segmentation.keras\")\n", 236 | "\n", 237 | "i = 4\n", 238 | "test_image = val_input_imgs[i]\n", 239 | "plt.axis(\"off\")\n", 240 | "plt.imshow(array_to_img(test_image))\n", 241 | "\n", 242 | "mask = model.predict(np.expand_dims(test_image, 0))[0]\n", 243 | "\n", 244 | "def display_mask(pred):\n", 245 | " mask = np.argmax(pred, axis=-1)\n", 246 | " mask *= 127\n", 247 | " plt.axis(\"off\")\n", 248 | " plt.imshow(mask)\n", 249 | "\n", 250 | "display_mask(mask)" 251 | ] 252 | } 253 | ], 254 | "metadata": { 255 | "colab": { 256 | "collapsed_sections": [], 257 | "name": "chapter09_part01_image-segmentation.i", 258 | "private_outputs": false, 259 | "provenance": [], 260 | "toc_visible": true 261 | }, 262 | "kernelspec": { 263 | "display_name": "Python 3", 264 | "language": "python", 265 | "name": "python3" 266 | }, 267 | "language_info": { 268 | "codemirror_mode": { 269 | "name": "ipython", 270 | "version": 3 271 | }, 272 | "file_extension": ".py", 273 | "mimetype": "text/x-python", 274 | "name": "python", 275 | "nbconvert_exporter": "python", 276 | "pygments_lexer": "ipython3", 277 | "version": "3.7.0" 278 | } 279 | }, 280 | "nbformat": 4, 281 | "nbformat_minor": 0 282 | } -------------------------------------------------------------------------------- /first_edition/6.1-one-hot-encoding-of-words-or-characters.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | "Using TensorFlow backend.\n" 13 | ] 14 | }, 15 | { 16 | "data": { 17 | "text/plain": [ 18 | "'2.0.8'" 19 | ] 20 | }, 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "output_type": "execute_result" 24 | } 25 | ], 26 | "source": [ 27 | "import keras\n", 28 | "keras.__version__" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "# One-hot encoding of words or characters\n", 36 | "\n", 37 | "This notebook contains the first code sample found in Chapter 6, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.\n", 38 | "\n", 39 | "----\n", 40 | "\n", 41 | "One-hot encoding is the most common, most basic way to turn a token into a vector. You already saw it in action in our initial IMDB and \n", 42 | "Reuters examples from chapter 3 (done with words, in our case). 
It consists of associating a unique integer index with every word, then \n", 43 | "turning this integer index i into a binary vector of size N, the size of the vocabulary, that would be all-zeros except for the i-th \n", 44 | "entry, which would be 1.\n", 45 | "\n", 46 | "Of course, one-hot encoding can be done at the character level as well. To unambiguously drive home what one-hot encoding is and how to \n", 47 | "implement it, here are two toy examples of one-hot encoding: one for words, the other for characters.\n", 48 | "\n" 49 | ] 50 | }, 51 | { 52 | "cell_type": "markdown", 53 | "metadata": {}, 54 | "source": [ 55 | "Word level one-hot encoding (toy example):" 56 | ] 57 | }, 58 | { 59 | "cell_type": "code", 60 | "execution_count": 3, 61 | "metadata": {}, 62 | "outputs": [], 63 | "source": [ 64 | "import numpy as np\n", 65 | "\n", 66 | "# This is our initial data; one entry per \"sample\"\n", 67 | "# (in this toy example, a \"sample\" is just a sentence, but\n", 68 | "# it could be an entire document).\n", 69 | "samples = ['The cat sat on the mat.', 'The dog ate my homework.']\n", 70 | "\n", 71 | "# First, build an index of all tokens in the data.\n", 72 | "token_index = {}\n", 73 | "for sample in samples:\n", 74 | "    # We simply tokenize the samples via the `split` method.\n", 75 | "    # In real life, we would also strip punctuation and special characters\n", 76 | "    # from the samples.\n", 77 | "    for word in sample.split():\n", 78 | "        if word not in token_index:\n", 79 | "            # Assign a unique index to each unique word\n", 80 | "            token_index[word] = len(token_index) + 1\n", 81 | "            # Note that we don't attribute index 0 to anything.\n", 82 | "\n", 83 | "# Next, we vectorize our samples.\n", 84 | "# We will only consider the first `max_length` words in each sample.\n", 85 | "max_length = 10\n", 86 | "\n", 87 | "# This is where we store our results:\n", 88 | "results = np.zeros((len(samples), max_length, max(token_index.values()) + 1))\n", 89 | "for i, sample in enumerate(samples):\n", 90 | "    for j, word in list(enumerate(sample.split()))[:max_length]:\n", 91 | "        index = token_index.get(word)\n", 92 | "        results[i, j, index] = 1." 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "Character level one-hot encoding (toy example)" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 5, 105 | "metadata": {}, 106 | "outputs": [], 107 | "source": [ 108 | "import string\n", 109 | "\n", 110 | "samples = ['The cat sat on the mat.', 'The dog ate my homework.']\n", 111 | "characters = string.printable  # All printable ASCII characters.\n", 112 | "token_index = dict(zip(characters, range(1, len(characters) + 1)))\n", 113 | "\n", 114 | "max_length = 50\n", 115 | "results = np.zeros((len(samples), max_length, max(token_index.values()) + 1))\n", 116 | "for i, sample in enumerate(samples):\n", 117 | "    for j, character in enumerate(sample[:max_length]):\n", 118 | "        index = token_index.get(character)\n", 119 | "        results[i, j, index] = 1." 120 | ] 121 | }, 122 | { 123 | "cell_type": "markdown", 124 | "metadata": {}, 125 | "source": [ 126 | "Note that Keras has built-in utilities for doing one-hot encoding of text at the word level or character level, starting from raw text data. 
\n", 127 | "This is what you should actually be using, as it will take care of a number of important features, such as stripping special characters \n", 128 | "from strings, or only taking into the top N most common words in your dataset (a common restriction to avoid dealing with very large input \n", 129 | "vector spaces)." 130 | ] 131 | }, 132 | { 133 | "cell_type": "markdown", 134 | "metadata": {}, 135 | "source": [ 136 | "Using Keras for word-level one-hot encoding:" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": 7, 142 | "metadata": {}, 143 | "outputs": [ 144 | { 145 | "name": "stdout", 146 | "output_type": "stream", 147 | "text": [ 148 | "Found 9 unique tokens.\n" 149 | ] 150 | } 151 | ], 152 | "source": [ 153 | "from keras.preprocessing.text import Tokenizer\n", 154 | "\n", 155 | "samples = ['The cat sat on the mat.', 'The dog ate my homework.']\n", 156 | "\n", 157 | "# We create a tokenizer, configured to only take\n", 158 | "# into account the top-1000 most common words\n", 159 | "tokenizer = Tokenizer(num_words=1000)\n", 160 | "# This builds the word index\n", 161 | "tokenizer.fit_on_texts(samples)\n", 162 | "\n", 163 | "# This turns strings into lists of integer indices.\n", 164 | "sequences = tokenizer.texts_to_sequences(samples)\n", 165 | "\n", 166 | "# You could also directly get the one-hot binary representations.\n", 167 | "# Note that other vectorization modes than one-hot encoding are supported!\n", 168 | "one_hot_results = tokenizer.texts_to_matrix(samples, mode='binary')\n", 169 | "\n", 170 | "# This is how you can recover the word index that was computed\n", 171 | "word_index = tokenizer.word_index\n", 172 | "print('Found %s unique tokens.' % len(word_index))" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": {}, 178 | "source": [ 179 | "\n", 180 | "A variant of one-hot encoding is the so-called \"one-hot hashing trick\", which can be used when the number of unique tokens in your \n", 181 | "vocabulary is too large to handle explicitly. Instead of explicitly assigning an index to each word and keeping a reference of these \n", 182 | "indices in a dictionary, one may hash words into vectors of fixed size. This is typically done with a very lightweight hashing function. \n", 183 | "The main advantage of this method is that it does away with maintaining an explicit word index, which \n", 184 | "saves memory and allows online encoding of the data (starting to generate token vectors right away, before having seen all of the available \n", 185 | "data). The one drawback of this method is that it is susceptible to \"hash collisions\": two different words may end up with the same hash, \n", 186 | "and subsequently any machine learning model looking at these hashes won't be able to tell the difference between these words. The likelihood \n", 187 | "of hash collisions decreases when the dimensionality of the hashing space is much larger than the total number of unique tokens being hashed." 
188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": {}, 193 | "source": [ 194 | "Word-level one-hot encoding with hashing trick (toy example):" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 9, 200 | "metadata": {}, 201 | "outputs": [], 202 | "source": [ 203 | "samples = ['The cat sat on the mat.', 'The dog ate my homework.']\n", 204 | "\n", 205 | "# We will store our words as vectors of size 1000.\n", 206 | "# Note that if you have close to 1000 words (or more)\n", 207 | "# you will start seeing many hash collisions, which\n", 208 | "# will decrease the accuracy of this encoding method.\n", 209 | "dimensionality = 1000\n", 210 | "max_length = 10\n", 211 | "\n", 212 | "results = np.zeros((len(samples), max_length, dimensionality))\n", 213 | "for i, sample in enumerate(samples):\n", 214 | " for j, word in list(enumerate(sample.split()))[:max_length]:\n", 215 | " # Hash the word into a \"random\" integer index\n", 216 | " # that is between 0 and 1000\n", 217 | " index = abs(hash(word)) % dimensionality\n", 218 | " results[i, j, index] = 1." 219 | ] 220 | } 221 | ], 222 | "metadata": { 223 | "kernelspec": { 224 | "display_name": "Python 3", 225 | "language": "python", 226 | "name": "python3" 227 | }, 228 | "language_info": { 229 | "codemirror_mode": { 230 | "name": "ipython", 231 | "version": 3 232 | }, 233 | "file_extension": ".py", 234 | "mimetype": "text/x-python", 235 | "name": "python", 236 | "nbconvert_exporter": "python", 237 | "pygments_lexer": "ipython3", 238 | "version": "3.5.2" 239 | } 240 | }, 241 | "nbformat": 4, 242 | "nbformat_minor": 2 243 | } 244 | -------------------------------------------------------------------------------- /second_edition/chapter09_part02_modern-convnet-architecture-patterns.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 
10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "## Modern convnet architecture patterns" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "### Modularity, hierarchy, and reuse" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "### Residual connections" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "colab_type": "text" 43 | }, 44 | "source": [ 45 | "**Residual block where the number of filters changes**" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 0, 51 | "metadata": { 52 | "colab_type": "code" 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "from tensorflow import keras\n", 57 | "from tensorflow.keras import layers\n", 58 | "\n", 59 | "inputs = keras.Input(shape=(32, 32, 3))\n", 60 | "x = layers.Conv2D(32, 3, activation=\"relu\")(inputs)\n", 61 | "residual = x\n", 62 | "x = layers.Conv2D(64, 3, activation=\"relu\", padding=\"same\")(x)\n", 63 | "residual = layers.Conv2D(64, 1)(residual)\n", 64 | "x = layers.add([x, residual])" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": { 70 | "colab_type": "text" 71 | }, 72 | "source": [ 73 | "**Case where target block includes a max pooling layer**" 74 | ] 75 | }, 76 | { 77 | "cell_type": "code", 78 | "execution_count": 0, 79 | "metadata": { 80 | "colab_type": "code" 81 | }, 82 | "outputs": [], 83 | "source": [ 84 | "inputs = keras.Input(shape=(32, 32, 3))\n", 85 | "x = layers.Conv2D(32, 3, activation=\"relu\")(inputs)\n", 86 | "residual = x\n", 87 | "x = layers.Conv2D(64, 3, activation=\"relu\", padding=\"same\")(x)\n", 88 | "x = layers.MaxPooling2D(2, padding=\"same\")(x)\n", 89 | "residual = layers.Conv2D(64, 1, strides=2)(residual)\n", 90 | "x = layers.add([x, residual])" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 0, 96 | "metadata": { 97 | "colab_type": "code" 98 | }, 99 | "outputs": [], 100 | "source": [ 101 | "inputs = keras.Input(shape=(32, 32, 3))\n", 102 | "x = layers.Rescaling(1./255)(inputs)\n", 103 | "\n", 104 | "def residual_block(x, filters, pooling=False):\n", 105 | " residual = x\n", 106 | " x = layers.Conv2D(filters, 3, activation=\"relu\", padding=\"same\")(x)\n", 107 | " x = layers.Conv2D(filters, 3, activation=\"relu\", padding=\"same\")(x)\n", 108 | " if pooling:\n", 109 | " x = layers.MaxPooling2D(2, padding=\"same\")(x)\n", 110 | " residual = layers.Conv2D(filters, 1, strides=2)(residual)\n", 111 | " elif filters != residual.shape[-1]:\n", 112 | " residual = layers.Conv2D(filters, 1)(residual)\n", 113 | " x = layers.add([x, residual])\n", 114 | " return x\n", 115 | "\n", 116 | "x = residual_block(x, filters=32, pooling=True)\n", 117 | "x = residual_block(x, filters=64, pooling=True)\n", 118 | "x = residual_block(x, filters=128, pooling=False)\n", 119 | "\n", 120 | "x = layers.GlobalAveragePooling2D()(x)\n", 121 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 122 | "model = keras.Model(inputs=inputs, outputs=outputs)\n", 123 | "model.summary()" 124 | ] 125 | }, 126 | { 127 | "cell_type": "markdown", 128 | "metadata": { 129 | "colab_type": "text" 130 | }, 131 | "source": [ 132 | "### Batch normalization" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": { 138 | "colab_type": "text" 139 | }, 140 | "source": [ 141 | "### Depthwise separable convolutions" 
142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": { 147 | "colab_type": "text" 148 | }, 149 | "source": [ 150 | "### Putting it together: A mini Xception-like model" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 0, 156 | "metadata": { 157 | "colab_type": "code" 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "from google.colab import files\n", 162 | "files.upload()" 163 | ] 164 | }, 165 | { 166 | "cell_type": "code", 167 | "execution_count": 0, 168 | "metadata": { 169 | "colab_type": "code" 170 | }, 171 | "outputs": [], 172 | "source": [ 173 | "!mkdir ~/.kaggle\n", 174 | "!cp kaggle.json ~/.kaggle/\n", 175 | "!chmod 600 ~/.kaggle/kaggle.json\n", 176 | "!kaggle competitions download -c dogs-vs-cats\n", 177 | "!unzip -qq train.zip" 178 | ] 179 | }, 180 | { 181 | "cell_type": "code", 182 | "execution_count": 0, 183 | "metadata": { 184 | "colab_type": "code" 185 | }, 186 | "outputs": [], 187 | "source": [ 188 | "import os, shutil, pathlib\n", 189 | "from tensorflow.keras.utils import image_dataset_from_directory\n", 190 | "\n", 191 | "original_dir = pathlib.Path(\"train\")\n", 192 | "new_base_dir = pathlib.Path(\"cats_vs_dogs_small\")\n", 193 | "\n", 194 | "def make_subset(subset_name, start_index, end_index):\n", 195 | " for category in (\"cat\", \"dog\"):\n", 196 | " dir = new_base_dir / subset_name / category\n", 197 | " os.makedirs(dir)\n", 198 | " fnames = [f\"{category}.{i}.jpg\" for i in range(start_index, end_index)]\n", 199 | " for fname in fnames:\n", 200 | " shutil.copyfile(src=original_dir / fname,\n", 201 | " dst=dir / fname)\n", 202 | "\n", 203 | "make_subset(\"train\", start_index=0, end_index=1000)\n", 204 | "make_subset(\"validation\", start_index=1000, end_index=1500)\n", 205 | "make_subset(\"test\", start_index=1500, end_index=2500)\n", 206 | "\n", 207 | "train_dataset = image_dataset_from_directory(\n", 208 | " new_base_dir / \"train\",\n", 209 | " image_size=(180, 180),\n", 210 | " batch_size=32)\n", 211 | "validation_dataset = image_dataset_from_directory(\n", 212 | " new_base_dir / \"validation\",\n", 213 | " image_size=(180, 180),\n", 214 | " batch_size=32)\n", 215 | "test_dataset = image_dataset_from_directory(\n", 216 | " new_base_dir / \"test\",\n", 217 | " image_size=(180, 180),\n", 218 | " batch_size=32)" 219 | ] 220 | }, 221 | { 222 | "cell_type": "code", 223 | "execution_count": 0, 224 | "metadata": { 225 | "colab_type": "code" 226 | }, 227 | "outputs": [], 228 | "source": [ 229 | "data_augmentation = keras.Sequential(\n", 230 | " [\n", 231 | " layers.RandomFlip(\"horizontal\"),\n", 232 | " layers.RandomRotation(0.1),\n", 233 | " layers.RandomZoom(0.2),\n", 234 | " ]\n", 235 | ")" 236 | ] 237 | }, 238 | { 239 | "cell_type": "code", 240 | "execution_count": 0, 241 | "metadata": { 242 | "colab_type": "code" 243 | }, 244 | "outputs": [], 245 | "source": [ 246 | "inputs = keras.Input(shape=(180, 180, 3))\n", 247 | "x = data_augmentation(inputs)\n", 248 | "\n", 249 | "x = layers.Rescaling(1./255)(x)\n", 250 | "x = layers.Conv2D(filters=32, kernel_size=5, use_bias=False)(x)\n", 251 | "\n", 252 | "for size in [32, 64, 128, 256, 512]:\n", 253 | " residual = x\n", 254 | "\n", 255 | " x = layers.BatchNormalization()(x)\n", 256 | " x = layers.Activation(\"relu\")(x)\n", 257 | " x = layers.SeparableConv2D(size, 3, padding=\"same\", use_bias=False)(x)\n", 258 | "\n", 259 | " x = layers.BatchNormalization()(x)\n", 260 | " x = layers.Activation(\"relu\")(x)\n", 261 | " x = layers.SeparableConv2D(size, 3, 
padding=\"same\", use_bias=False)(x)\n", 262 | "\n", 263 | " x = layers.MaxPooling2D(3, strides=2, padding=\"same\")(x)\n", 264 | "\n", 265 | " residual = layers.Conv2D(\n", 266 | " size, 1, strides=2, padding=\"same\", use_bias=False)(residual)\n", 267 | " x = layers.add([x, residual])\n", 268 | "\n", 269 | "x = layers.GlobalAveragePooling2D()(x)\n", 270 | "x = layers.Dropout(0.5)(x)\n", 271 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 272 | "model = keras.Model(inputs=inputs, outputs=outputs)" 273 | ] 274 | }, 275 | { 276 | "cell_type": "code", 277 | "execution_count": 0, 278 | "metadata": { 279 | "colab_type": "code" 280 | }, 281 | "outputs": [], 282 | "source": [ 283 | "model.compile(loss=\"binary_crossentropy\",\n", 284 | " optimizer=\"rmsprop\",\n", 285 | " metrics=[\"accuracy\"])\n", 286 | "history = model.fit(\n", 287 | " train_dataset,\n", 288 | " epochs=100,\n", 289 | " validation_data=validation_dataset)" 290 | ] 291 | } 292 | ], 293 | "metadata": { 294 | "colab": { 295 | "collapsed_sections": [], 296 | "name": "chapter09_part02_modern-convnet-architecture-patterns.i", 297 | "private_outputs": false, 298 | "provenance": [], 299 | "toc_visible": true 300 | }, 301 | "kernelspec": { 302 | "display_name": "Python 3", 303 | "language": "python", 304 | "name": "python3" 305 | }, 306 | "language_info": { 307 | "codemirror_mode": { 308 | "name": "ipython", 309 | "version": 3 310 | }, 311 | "file_extension": ".py", 312 | "mimetype": "text/x-python", 313 | "name": "python", 314 | "nbconvert_exporter": "python", 315 | "pygments_lexer": "ipython3", 316 | "version": "3.7.0" 317 | } 318 | }, 319 | "nbformat": 4, 320 | "nbformat_minor": 0 321 | } -------------------------------------------------------------------------------- /second_edition/chapter12_part04_variational-autoencoders.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 
10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "## Generating images with variational autoencoders" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "### Sampling from latent spaces of images" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "### Concept vectors for image editing" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "colab_type": "text" 43 | }, 44 | "source": [ 45 | "### Variational autoencoders" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "colab_type": "text" 52 | }, 53 | "source": [ 54 | "### Implementing a VAE with Keras" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "colab_type": "text" 61 | }, 62 | "source": [ 63 | "**VAE encoder network**" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 0, 69 | "metadata": { 70 | "colab_type": "code" 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "from tensorflow import keras\n", 75 | "from tensorflow.keras import layers\n", 76 | "\n", 77 | "latent_dim = 2\n", 78 | "\n", 79 | "encoder_inputs = keras.Input(shape=(28, 28, 1))\n", 80 | "x = layers.Conv2D(32, 3, activation=\"relu\", strides=2, padding=\"same\")(encoder_inputs)\n", 81 | "x = layers.Conv2D(64, 3, activation=\"relu\", strides=2, padding=\"same\")(x)\n", 82 | "x = layers.Flatten()(x)\n", 83 | "x = layers.Dense(16, activation=\"relu\")(x)\n", 84 | "z_mean = layers.Dense(latent_dim, name=\"z_mean\")(x)\n", 85 | "z_log_var = layers.Dense(latent_dim, name=\"z_log_var\")(x)\n", 86 | "encoder = keras.Model(encoder_inputs, [z_mean, z_log_var], name=\"encoder\")" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": 0, 92 | "metadata": { 93 | "colab_type": "code" 94 | }, 95 | "outputs": [], 96 | "source": [ 97 | "encoder.summary()" 98 | ] 99 | }, 100 | { 101 | "cell_type": "markdown", 102 | "metadata": { 103 | "colab_type": "text" 104 | }, 105 | "source": [ 106 | "**Latent-space-sampling layer**" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 0, 112 | "metadata": { 113 | "colab_type": "code" 114 | }, 115 | "outputs": [], 116 | "source": [ 117 | "import tensorflow as tf\n", 118 | "\n", 119 | "class Sampler(layers.Layer):\n", 120 | " def call(self, z_mean, z_log_var):\n", 121 | " batch_size = tf.shape(z_mean)[0]\n", 122 | " z_size = tf.shape(z_mean)[1]\n", 123 | " epsilon = tf.random.normal(shape=(batch_size, z_size))\n", 124 | " return z_mean + tf.exp(0.5 * z_log_var) * epsilon" 125 | ] 126 | }, 127 | { 128 | "cell_type": "markdown", 129 | "metadata": { 130 | "colab_type": "text" 131 | }, 132 | "source": [ 133 | "**VAE decoder network, mapping latent space points to images**" 134 | ] 135 | }, 136 | { 137 | "cell_type": "code", 138 | "execution_count": 0, 139 | "metadata": { 140 | "colab_type": "code" 141 | }, 142 | "outputs": [], 143 | "source": [ 144 | "latent_inputs = keras.Input(shape=(latent_dim,))\n", 145 | "x = layers.Dense(7 * 7 * 64, activation=\"relu\")(latent_inputs)\n", 146 | "x = layers.Reshape((7, 7, 64))(x)\n", 147 | "x = layers.Conv2DTranspose(64, 3, activation=\"relu\", strides=2, padding=\"same\")(x)\n", 148 | "x = layers.Conv2DTranspose(32, 3, activation=\"relu\", strides=2, padding=\"same\")(x)\n", 149 | "decoder_outputs = layers.Conv2D(1, 3, activation=\"sigmoid\", 
padding=\"same\")(x)\n", 150 | "decoder = keras.Model(latent_inputs, decoder_outputs, name=\"decoder\")" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 0, 156 | "metadata": { 157 | "colab_type": "code" 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "decoder.summary()" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": { 167 | "colab_type": "text" 168 | }, 169 | "source": [ 170 | "**VAE model with custom `train_step()`**" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 0, 176 | "metadata": { 177 | "colab_type": "code" 178 | }, 179 | "outputs": [], 180 | "source": [ 181 | "class VAE(keras.Model):\n", 182 | " def __init__(self, encoder, decoder, **kwargs):\n", 183 | " super().__init__(**kwargs)\n", 184 | " self.encoder = encoder\n", 185 | " self.decoder = decoder\n", 186 | " self.sampler = Sampler()\n", 187 | " self.total_loss_tracker = keras.metrics.Mean(name=\"total_loss\")\n", 188 | " self.reconstruction_loss_tracker = keras.metrics.Mean(\n", 189 | " name=\"reconstruction_loss\")\n", 190 | " self.kl_loss_tracker = keras.metrics.Mean(name=\"kl_loss\")\n", 191 | "\n", 192 | " @property\n", 193 | " def metrics(self):\n", 194 | " return [self.total_loss_tracker,\n", 195 | " self.reconstruction_loss_tracker,\n", 196 | " self.kl_loss_tracker]\n", 197 | "\n", 198 | " def train_step(self, data):\n", 199 | " with tf.GradientTape() as tape:\n", 200 | " z_mean, z_log_var = self.encoder(data)\n", 201 | " z = self.sampler(z_mean, z_log_var)\n", 202 | " reconstruction = decoder(z)\n", 203 | " reconstruction_loss = tf.reduce_mean(\n", 204 | " tf.reduce_sum(\n", 205 | " keras.losses.binary_crossentropy(data, reconstruction),\n", 206 | " axis=(1, 2)\n", 207 | " )\n", 208 | " )\n", 209 | " kl_loss = -0.5 * (1 + z_log_var - tf.square(z_mean) - tf.exp(z_log_var))\n", 210 | " total_loss = reconstruction_loss + tf.reduce_mean(kl_loss)\n", 211 | " grads = tape.gradient(total_loss, self.trainable_weights)\n", 212 | " self.optimizer.apply_gradients(zip(grads, self.trainable_weights))\n", 213 | " self.total_loss_tracker.update_state(total_loss)\n", 214 | " self.reconstruction_loss_tracker.update_state(reconstruction_loss)\n", 215 | " self.kl_loss_tracker.update_state(kl_loss)\n", 216 | " return {\n", 217 | " \"total_loss\": self.total_loss_tracker.result(),\n", 218 | " \"reconstruction_loss\": self.reconstruction_loss_tracker.result(),\n", 219 | " \"kl_loss\": self.kl_loss_tracker.result(),\n", 220 | " }" 221 | ] 222 | }, 223 | { 224 | "cell_type": "markdown", 225 | "metadata": { 226 | "colab_type": "text" 227 | }, 228 | "source": [ 229 | "**Training the VAE**" 230 | ] 231 | }, 232 | { 233 | "cell_type": "code", 234 | "execution_count": 0, 235 | "metadata": { 236 | "colab_type": "code" 237 | }, 238 | "outputs": [], 239 | "source": [ 240 | "import numpy as np\n", 241 | "\n", 242 | "(x_train, _), (x_test, _) = keras.datasets.mnist.load_data()\n", 243 | "mnist_digits = np.concatenate([x_train, x_test], axis=0)\n", 244 | "mnist_digits = np.expand_dims(mnist_digits, -1).astype(\"float32\") / 255\n", 245 | "\n", 246 | "vae = VAE(encoder, decoder)\n", 247 | "vae.compile(optimizer=keras.optimizers.Adam(), run_eagerly=True)\n", 248 | "vae.fit(mnist_digits, epochs=30, batch_size=128)" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": { 254 | "colab_type": "text" 255 | }, 256 | "source": [ 257 | "**Sampling a grid of images from the 2D latent space**" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | 
"execution_count": 0, 263 | "metadata": { 264 | "colab_type": "code" 265 | }, 266 | "outputs": [], 267 | "source": [ 268 | "import matplotlib.pyplot as plt\n", 269 | "\n", 270 | "n = 30\n", 271 | "digit_size = 28\n", 272 | "figure = np.zeros((digit_size * n, digit_size * n))\n", 273 | "\n", 274 | "grid_x = np.linspace(-1, 1, n)\n", 275 | "grid_y = np.linspace(-1, 1, n)[::-1]\n", 276 | "\n", 277 | "for i, yi in enumerate(grid_y):\n", 278 | " for j, xi in enumerate(grid_x):\n", 279 | " z_sample = np.array([[xi, yi]])\n", 280 | " x_decoded = vae.decoder.predict(z_sample)\n", 281 | " digit = x_decoded[0].reshape(digit_size, digit_size)\n", 282 | " figure[\n", 283 | " i * digit_size : (i + 1) * digit_size,\n", 284 | " j * digit_size : (j + 1) * digit_size,\n", 285 | " ] = digit\n", 286 | "\n", 287 | "plt.figure(figsize=(15, 15))\n", 288 | "start_range = digit_size // 2\n", 289 | "end_range = n * digit_size + start_range\n", 290 | "pixel_range = np.arange(start_range, end_range, digit_size)\n", 291 | "sample_range_x = np.round(grid_x, 1)\n", 292 | "sample_range_y = np.round(grid_y, 1)\n", 293 | "plt.xticks(pixel_range, sample_range_x)\n", 294 | "plt.yticks(pixel_range, sample_range_y)\n", 295 | "plt.xlabel(\"z[0]\")\n", 296 | "plt.ylabel(\"z[1]\")\n", 297 | "plt.axis(\"off\")\n", 298 | "plt.imshow(figure, cmap=\"Greys_r\")" 299 | ] 300 | }, 301 | { 302 | "cell_type": "markdown", 303 | "metadata": { 304 | "colab_type": "text" 305 | }, 306 | "source": [ 307 | "### Wrapping up" 308 | ] 309 | } 310 | ], 311 | "metadata": { 312 | "colab": { 313 | "collapsed_sections": [], 314 | "name": "chapter12_part04_variational-autoencoders.i", 315 | "private_outputs": false, 316 | "provenance": [], 317 | "toc_visible": true 318 | }, 319 | "kernelspec": { 320 | "display_name": "Python 3", 321 | "language": "python", 322 | "name": "python3" 323 | }, 324 | "language_info": { 325 | "codemirror_mode": { 326 | "name": "ipython", 327 | "version": 3 328 | }, 329 | "file_extension": ".py", 330 | "mimetype": "text/x-python", 331 | "name": "python", 332 | "nbconvert_exporter": "python", 333 | "pygments_lexer": "ipython3", 334 | "version": "3.7.0" 335 | } 336 | }, 337 | "nbformat": 4, 338 | "nbformat_minor": 0 339 | } -------------------------------------------------------------------------------- /second_edition/chapter12_part03_neural-style-transfer.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 
10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "## Neural style transfer" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "### The content loss" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "### The style loss" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "colab_type": "text" 43 | }, 44 | "source": [ 45 | "### Neural style transfer in Keras" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "colab_type": "text" 52 | }, 53 | "source": [ 54 | "**Getting the style and content images**" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 0, 60 | "metadata": { 61 | "colab_type": "code" 62 | }, 63 | "outputs": [], 64 | "source": [ 65 | "from tensorflow import keras\n", 66 | "\n", 67 | "base_image_path = keras.utils.get_file(\n", 68 | " \"sf.jpg\", origin=\"https://img-datasets.s3.amazonaws.com/sf.jpg\")\n", 69 | "style_reference_image_path = keras.utils.get_file(\n", 70 | " \"starry_night.jpg\", origin=\"https://img-datasets.s3.amazonaws.com/starry_night.jpg\")\n", 71 | "\n", 72 | "original_width, original_height = keras.utils.load_img(base_image_path).size\n", 73 | "img_height = 400\n", 74 | "img_width = round(original_width * img_height / original_height)" 75 | ] 76 | }, 77 | { 78 | "cell_type": "markdown", 79 | "metadata": { 80 | "colab_type": "text" 81 | }, 82 | "source": [ 83 | "**Auxiliary functions**" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 0, 89 | "metadata": { 90 | "colab_type": "code" 91 | }, 92 | "outputs": [], 93 | "source": [ 94 | "import numpy as np\n", 95 | "\n", 96 | "def preprocess_image(image_path):\n", 97 | " img = keras.utils.load_img(\n", 98 | " image_path, target_size=(img_height, img_width))\n", 99 | " img = keras.utils.img_to_array(img)\n", 100 | " img = np.expand_dims(img, axis=0)\n", 101 | " img = keras.applications.vgg19.preprocess_input(img)\n", 102 | " return img\n", 103 | "\n", 104 | "def deprocess_image(img):\n", 105 | " img = img.reshape((img_height, img_width, 3))\n", 106 | " img[:, :, 0] += 103.939\n", 107 | " img[:, :, 1] += 116.779\n", 108 | " img[:, :, 2] += 123.68\n", 109 | " img = img[:, :, ::-1]\n", 110 | " img = np.clip(img, 0, 255).astype(\"uint8\")\n", 111 | " return img" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": { 117 | "colab_type": "text" 118 | }, 119 | "source": [ 120 | "**Using a pretrained VGG19 model to create a feature extractor**" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 0, 126 | "metadata": { 127 | "colab_type": "code" 128 | }, 129 | "outputs": [], 130 | "source": [ 131 | "model = keras.applications.vgg19.VGG19(weights=\"imagenet\", include_top=False)\n", 132 | "\n", 133 | "outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])\n", 134 | "feature_extractor = keras.Model(inputs=model.inputs, outputs=outputs_dict)" 135 | ] 136 | }, 137 | { 138 | "cell_type": "markdown", 139 | "metadata": { 140 | "colab_type": "text" 141 | }, 142 | "source": [ 143 | "**Content loss**" 144 | ] 145 | }, 146 | { 147 | "cell_type": "code", 148 | "execution_count": 0, 149 | "metadata": { 150 | "colab_type": "code" 151 | }, 152 | "outputs": [], 153 | "source": [ 154 | "def content_loss(base_img, combination_img):\n", 155 | " return 
tf.reduce_sum(tf.square(combination_img - base_img))" 156 | ] 157 | }, 158 | { 159 | "cell_type": "markdown", 160 | "metadata": { 161 | "colab_type": "text" 162 | }, 163 | "source": [ 164 | "**Style loss**" 165 | ] 166 | }, 167 | { 168 | "cell_type": "code", 169 | "execution_count": 0, 170 | "metadata": { 171 | "colab_type": "code" 172 | }, 173 | "outputs": [], 174 | "source": [ 175 | "def gram_matrix(x):\n", 176 | " x = tf.transpose(x, (2, 0, 1))\n", 177 | " features = tf.reshape(x, (tf.shape(x)[0], -1))\n", 178 | " gram = tf.matmul(features, tf.transpose(features))\n", 179 | " return gram\n", 180 | "\n", 181 | "def style_loss(style_img, combination_img):\n", 182 | " S = gram_matrix(style_img)\n", 183 | " C = gram_matrix(combination_img)\n", 184 | " channels = 3\n", 185 | " size = img_height * img_width\n", 186 | " return tf.reduce_sum(tf.square(S - C)) / (4.0 * (channels ** 2) * (size ** 2))" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": { 192 | "colab_type": "text" 193 | }, 194 | "source": [ 195 | "**Total variation loss**" 196 | ] 197 | }, 198 | { 199 | "cell_type": "code", 200 | "execution_count": 0, 201 | "metadata": { 202 | "colab_type": "code" 203 | }, 204 | "outputs": [], 205 | "source": [ 206 | "def total_variation_loss(x):\n", 207 | " a = tf.square(\n", 208 | " x[:, : img_height - 1, : img_width - 1, :] - x[:, 1:, : img_width - 1, :]\n", 209 | " )\n", 210 | " b = tf.square(\n", 211 | " x[:, : img_height - 1, : img_width - 1, :] - x[:, : img_height - 1, 1:, :]\n", 212 | " )\n", 213 | " return tf.reduce_sum(tf.pow(a + b, 1.25))" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": { 219 | "colab_type": "text" 220 | }, 221 | "source": [ 222 | "**Defining the final loss that you'll minimize**" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": 0, 228 | "metadata": { 229 | "colab_type": "code" 230 | }, 231 | "outputs": [], 232 | "source": [ 233 | "style_layer_names = [\n", 234 | " \"block1_conv1\",\n", 235 | " \"block2_conv1\",\n", 236 | " \"block3_conv1\",\n", 237 | " \"block4_conv1\",\n", 238 | " \"block5_conv1\",\n", 239 | "]\n", 240 | "content_layer_name = \"block5_conv2\"\n", 241 | "total_variation_weight = 1e-6\n", 242 | "style_weight = 1e-6\n", 243 | "content_weight = 2.5e-8\n", 244 | "\n", 245 | "def compute_loss(combination_image, base_image, style_reference_image):\n", 246 | " input_tensor = tf.concat(\n", 247 | " [base_image, style_reference_image, combination_image], axis=0\n", 248 | " )\n", 249 | " features = feature_extractor(input_tensor)\n", 250 | " loss = tf.zeros(shape=())\n", 251 | " layer_features = features[content_layer_name]\n", 252 | " base_image_features = layer_features[0, :, :, :]\n", 253 | " combination_features = layer_features[2, :, :, :]\n", 254 | " loss = loss + content_weight * content_loss(\n", 255 | " base_image_features, combination_features\n", 256 | " )\n", 257 | " for layer_name in style_layer_names:\n", 258 | " layer_features = features[layer_name]\n", 259 | " style_reference_features = layer_features[1, :, :, :]\n", 260 | " combination_features = layer_features[2, :, :, :]\n", 261 | " style_loss_value = style_loss(\n", 262 | " style_reference_features, combination_features)\n", 263 | " loss += (style_weight / len(style_layer_names)) * style_loss_value\n", 264 | "\n", 265 | " loss += total_variation_weight * total_variation_loss(combination_image)\n", 266 | " return loss" 267 | ] 268 | }, 269 | { 270 | "cell_type": "markdown", 271 | "metadata": { 272 | "colab_type": 
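A quick way to see what `gram_matrix()` above returns: it flattens each channel of a feature map into a vector and takes all pairwise inner products, so the result is channels x channels regardless of spatial size. A toy check (assumes the functions defined above):

```python
import tensorflow as tf

dummy = tf.random.normal((4, 5, 3))  # height=4, width=5, channels=3
g = gram_matrix(dummy)
print(g.shape)  # (3, 3): channel-to-channel correlations, summed over positions
```

This is also why the style loss is insensitive to where textures appear in the style image: all spatial information is summed out of the Gram matrix.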
"text" 273 | }, 274 | "source": [ 275 | "**Setting up the gradient-descent process**" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": 0, 281 | "metadata": { 282 | "colab_type": "code" 283 | }, 284 | "outputs": [], 285 | "source": [ 286 | "import tensorflow as tf\n", 287 | "\n", 288 | "@tf.function\n", 289 | "def compute_loss_and_grads(combination_image, base_image, style_reference_image):\n", 290 | " with tf.GradientTape() as tape:\n", 291 | " loss = compute_loss(combination_image, base_image, style_reference_image)\n", 292 | " grads = tape.gradient(loss, combination_image)\n", 293 | " return loss, grads\n", 294 | "\n", 295 | "optimizer = keras.optimizers.SGD(\n", 296 | " keras.optimizers.schedules.ExponentialDecay(\n", 297 | " initial_learning_rate=100.0, decay_steps=100, decay_rate=0.96\n", 298 | " )\n", 299 | ")\n", 300 | "\n", 301 | "base_image = preprocess_image(base_image_path)\n", 302 | "style_reference_image = preprocess_image(style_reference_image_path)\n", 303 | "combination_image = tf.Variable(preprocess_image(base_image_path))\n", 304 | "\n", 305 | "iterations = 4000\n", 306 | "for i in range(1, iterations + 1):\n", 307 | " loss, grads = compute_loss_and_grads(\n", 308 | " combination_image, base_image, style_reference_image\n", 309 | " )\n", 310 | " optimizer.apply_gradients([(grads, combination_image)])\n", 311 | " if i % 100 == 0:\n", 312 | " print(f\"Iteration {i}: loss={loss:.2f}\")\n", 313 | " img = deprocess_image(combination_image.numpy())\n", 314 | " fname = f\"combination_image_at_iteration_{i}.png\"\n", 315 | " keras.utils.save_img(fname, img)" 316 | ] 317 | }, 318 | { 319 | "cell_type": "markdown", 320 | "metadata": { 321 | "colab_type": "text" 322 | }, 323 | "source": [ 324 | "### Wrapping up" 325 | ] 326 | } 327 | ], 328 | "metadata": { 329 | "colab": { 330 | "collapsed_sections": [], 331 | "name": "chapter12_part03_neural-style-transfer.i", 332 | "private_outputs": false, 333 | "provenance": [], 334 | "toc_visible": true 335 | }, 336 | "kernelspec": { 337 | "display_name": "Python 3", 338 | "language": "python", 339 | "name": "python3" 340 | }, 341 | "language_info": { 342 | "codemirror_mode": { 343 | "name": "ipython", 344 | "version": 3 345 | }, 346 | "file_extension": ".py", 347 | "mimetype": "text/x-python", 348 | "name": "python", 349 | "nbconvert_exporter": "python", 350 | "pygments_lexer": "ipython3", 351 | "version": "3.7.0" 352 | } 353 | }, 354 | "nbformat": 4, 355 | "nbformat_minor": 0 356 | } -------------------------------------------------------------------------------- /chapter09_convnet-architecture-patterns.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Third Edition](https://www.manning.com/books/deep-learning-with-python-third-edition). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThe book's contents are available online at [deeplearningwithpython.io](https://deeplearningwithpython.io)." 
10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 0, 15 | "metadata": { 16 | "colab_type": "code" 17 | }, 18 | "outputs": [], 19 | "source": [ 20 | "!pip install keras keras-hub --upgrade -q" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 0, 26 | "metadata": { 27 | "colab_type": "code" 28 | }, 29 | "outputs": [], 30 | "source": [ 31 | "import os\n", 32 | "os.environ[\"KERAS_BACKEND\"] = \"jax\"" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 0, 38 | "metadata": { 39 | "cellView": "form", 40 | "colab_type": "code" 41 | }, 42 | "outputs": [], 43 | "source": [ 44 | "# @title\n", 45 | "import os\n", 46 | "from IPython.core.magic import register_cell_magic\n", 47 | "\n", 48 | "@register_cell_magic\n", 49 | "def backend(line, cell):\n", 50 | " current, required = os.environ.get(\"KERAS_BACKEND\", \"\"), line.split()[-1]\n", 51 | " if current == required:\n", 52 | " get_ipython().run_cell(cell)\n", 53 | " else:\n", 54 | " print(\n", 55 | " f\"This cell requires the {required} backend. To run it, change KERAS_BACKEND to \"\n", 56 | " f\"\\\"{required}\\\" at the top of the notebook, restart the runtime, and rerun the notebook.\"\n", 57 | " )" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": { 63 | "colab_type": "text" 64 | }, 65 | "source": [ 66 | "## ConvNet architecture patterns" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": { 72 | "colab_type": "text" 73 | }, 74 | "source": [ 75 | "### Modularity, hierarchy, and reuse" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": { 81 | "colab_type": "text" 82 | }, 83 | "source": [ 84 | "### Residual connections" 85 | ] 86 | }, 87 | { 88 | "cell_type": "code", 89 | "execution_count": 0, 90 | "metadata": { 91 | "colab_type": "code" 92 | }, 93 | "outputs": [], 94 | "source": [ 95 | "import keras\n", 96 | "from keras import layers\n", 97 | "\n", 98 | "inputs = keras.Input(shape=(32, 32, 3))\n", 99 | "x = layers.Conv2D(32, 3, activation=\"relu\")(inputs)\n", 100 | "residual = x\n", 101 | "x = layers.Conv2D(64, 3, activation=\"relu\", padding=\"same\")(x)\n", 102 | "residual = layers.Conv2D(64, 1)(residual)\n", 103 | "x = layers.add([x, residual])" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 0, 109 | "metadata": { 110 | "colab_type": "code" 111 | }, 112 | "outputs": [], 113 | "source": [ 114 | "inputs = keras.Input(shape=(32, 32, 3))\n", 115 | "x = layers.Conv2D(32, 3, activation=\"relu\")(inputs)\n", 116 | "residual = x\n", 117 | "x = layers.Conv2D(64, 3, activation=\"relu\", padding=\"same\")(x)\n", 118 | "x = layers.MaxPooling2D(2, padding=\"same\")(x)\n", 119 | "residual = layers.Conv2D(64, 1, strides=2)(residual)\n", 120 | "x = layers.add([x, residual])" 121 | ] 122 | }, 123 | { 124 | "cell_type": "code", 125 | "execution_count": 0, 126 | "metadata": { 127 | "colab_type": "code" 128 | }, 129 | "outputs": [], 130 | "source": [ 131 | "inputs = keras.Input(shape=(32, 32, 3))\n", 132 | "x = layers.Rescaling(1.0 / 255)(inputs)\n", 133 | "\n", 134 | "def residual_block(x, filters, pooling=False):\n", 135 | " residual = x\n", 136 | " x = layers.Conv2D(filters, 3, activation=\"relu\", padding=\"same\")(x)\n", 137 | " x = layers.Conv2D(filters, 3, activation=\"relu\", padding=\"same\")(x)\n", 138 | " if pooling:\n", 139 | " x = layers.MaxPooling2D(2, padding=\"same\")(x)\n", 140 | " residual = layers.Conv2D(filters, 1, strides=2)(residual)\n", 141 | " elif filters != residual.shape[-1]:\n", 142 | " 
residual = layers.Conv2D(filters, 1)(residual)\n", 143 | " x = layers.add([x, residual])\n", 144 | " return x\n", 145 | "\n", 146 | "x = residual_block(x, filters=32, pooling=True)\n", 147 | "x = residual_block(x, filters=64, pooling=True)\n", 148 | "x = residual_block(x, filters=128, pooling=False)\n", 149 | "\n", 150 | "x = layers.GlobalAveragePooling2D()(x)\n", 151 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 152 | "model = keras.Model(inputs=inputs, outputs=outputs)" 153 | ] 154 | }, 155 | { 156 | "cell_type": "markdown", 157 | "metadata": { 158 | "colab_type": "text" 159 | }, 160 | "source": [ 161 | "### Batch normalization" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": { 167 | "colab_type": "text" 168 | }, 169 | "source": [ 170 | "### Depthwise separable convolutions" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "colab_type": "text" 177 | }, 178 | "source": [ 179 | "### Putting it together: A mini Xception-like model" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 0, 185 | "metadata": { 186 | "colab_type": "code" 187 | }, 188 | "outputs": [], 189 | "source": [ 190 | "import kagglehub\n", 191 | "\n", 192 | "kagglehub.login()" 193 | ] 194 | }, 195 | { 196 | "cell_type": "code", 197 | "execution_count": 0, 198 | "metadata": { 199 | "colab_type": "code" 200 | }, 201 | "outputs": [], 202 | "source": [ 203 | "import zipfile\n", 204 | "\n", 205 | "download_path = kagglehub.competition_download(\"dogs-vs-cats\")\n", 206 | "\n", 207 | "with zipfile.ZipFile(download_path + \"/train.zip\", \"r\") as zip_ref:\n", 208 | " zip_ref.extractall(\".\")" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 0, 214 | "metadata": { 215 | "colab_type": "code" 216 | }, 217 | "outputs": [], 218 | "source": [ 219 | "import os, shutil, pathlib\n", 220 | "from keras.utils import image_dataset_from_directory\n", 221 | "\n", 222 | "original_dir = pathlib.Path(\"train\")\n", 223 | "new_base_dir = pathlib.Path(\"dogs_vs_cats_small\")\n", 224 | "\n", 225 | "def make_subset(subset_name, start_index, end_index):\n", 226 | " for category in (\"cat\", \"dog\"):\n", 227 | " dir = new_base_dir / subset_name / category\n", 228 | " os.makedirs(dir)\n", 229 | " fnames = [f\"{category}.{i}.jpg\" for i in range(start_index, end_index)]\n", 230 | " for fname in fnames:\n", 231 | " shutil.copyfile(src=original_dir / fname, dst=dir / fname)\n", 232 | "\n", 233 | "make_subset(\"train\", start_index=0, end_index=1000)\n", 234 | "make_subset(\"validation\", start_index=1000, end_index=1500)\n", 235 | "make_subset(\"test\", start_index=1500, end_index=2500)\n", 236 | "\n", 237 | "batch_size = 64\n", 238 | "image_size = (180, 180)\n", 239 | "train_dataset = image_dataset_from_directory(\n", 240 | " new_base_dir / \"train\",\n", 241 | " image_size=image_size,\n", 242 | " batch_size=batch_size,\n", 243 | ")\n", 244 | "validation_dataset = image_dataset_from_directory(\n", 245 | " new_base_dir / \"validation\",\n", 246 | " image_size=image_size,\n", 247 | " batch_size=batch_size,\n", 248 | ")\n", 249 | "test_dataset = image_dataset_from_directory(\n", 250 | " new_base_dir / \"test\",\n", 251 | " image_size=image_size,\n", 252 | " batch_size=batch_size,\n", 253 | ")" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "execution_count": 0, 259 | "metadata": { 260 | "colab_type": "code" 261 | }, 262 | "outputs": [], 263 | "source": [ 264 | "import tensorflow as tf\n", 265 | "from keras import 
layers\n", 266 | "\n", 267 | "data_augmentation_layers = [\n", 268 | " layers.RandomFlip(\"horizontal\"),\n", 269 | " layers.RandomRotation(0.1),\n", 270 | " layers.RandomZoom(0.2),\n", 271 | "]\n", 272 | "\n", 273 | "def data_augmentation(images, targets):\n", 274 | " for layer in data_augmentation_layers:\n", 275 | " images = layer(images)\n", 276 | " return images, targets\n", 277 | "\n", 278 | "augmented_train_dataset = train_dataset.map(\n", 279 | " data_augmentation, num_parallel_calls=8\n", 280 | ")\n", 281 | "augmented_train_dataset = augmented_train_dataset.prefetch(tf.data.AUTOTUNE)" 282 | ] 283 | }, 284 | { 285 | "cell_type": "code", 286 | "execution_count": 0, 287 | "metadata": { 288 | "colab_type": "code" 289 | }, 290 | "outputs": [], 291 | "source": [ 292 | "import keras\n", 293 | "\n", 294 | "inputs = keras.Input(shape=(180, 180, 3))\n", 295 | "x = layers.Rescaling(1.0 / 255)(inputs)\n", 296 | "x = layers.Conv2D(filters=32, kernel_size=5, use_bias=False)(x)\n", 297 | "\n", 298 | "for size in [32, 64, 128, 256, 512]:\n", 299 | " residual = x\n", 300 | "\n", 301 | " x = layers.BatchNormalization()(x)\n", 302 | " x = layers.Activation(\"relu\")(x)\n", 303 | " x = layers.SeparableConv2D(size, 3, padding=\"same\", use_bias=False)(x)\n", 304 | "\n", 305 | " x = layers.BatchNormalization()(x)\n", 306 | " x = layers.Activation(\"relu\")(x)\n", 307 | " x = layers.SeparableConv2D(size, 3, padding=\"same\", use_bias=False)(x)\n", 308 | "\n", 309 | " x = layers.MaxPooling2D(3, strides=2, padding=\"same\")(x)\n", 310 | "\n", 311 | " residual = layers.Conv2D(\n", 312 | " size, 1, strides=2, padding=\"same\", use_bias=False\n", 313 | " )(residual)\n", 314 | " x = layers.add([x, residual])\n", 315 | "\n", 316 | "x = layers.GlobalAveragePooling2D()(x)\n", 317 | "x = layers.Dropout(0.5)(x)\n", 318 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 319 | "model = keras.Model(inputs=inputs, outputs=outputs)" 320 | ] 321 | }, 322 | { 323 | "cell_type": "code", 324 | "execution_count": 0, 325 | "metadata": { 326 | "colab_type": "code" 327 | }, 328 | "outputs": [], 329 | "source": [ 330 | "model.compile(\n", 331 | " loss=\"binary_crossentropy\",\n", 332 | " optimizer=\"adam\",\n", 333 | " metrics=[\"accuracy\"],\n", 334 | ")\n", 335 | "history = model.fit(\n", 336 | " augmented_train_dataset,\n", 337 | " epochs=100,\n", 338 | " validation_data=validation_dataset,\n", 339 | ")" 340 | ] 341 | }, 342 | { 343 | "cell_type": "markdown", 344 | "metadata": { 345 | "colab_type": "text" 346 | }, 347 | "source": [ 348 | "### Beyond convolution: Vision Transformers" 349 | ] 350 | } 351 | ], 352 | "metadata": { 353 | "accelerator": "GPU", 354 | "colab": { 355 | "collapsed_sections": [], 356 | "name": "chapter09_convnet-architecture-patterns", 357 | "private_outputs": false, 358 | "provenance": [], 359 | "toc_visible": true 360 | }, 361 | "kernelspec": { 362 | "display_name": "Python 3", 363 | "language": "python", 364 | "name": "python3" 365 | }, 366 | "language_info": { 367 | "codemirror_mode": { 368 | "name": "ipython", 369 | "version": 3 370 | }, 371 | "file_extension": ".py", 372 | "mimetype": "text/x-python", 373 | "name": "python", 374 | "nbconvert_exporter": "python", 375 | "pygments_lexer": "ipython3", 376 | "version": "3.10.0" 377 | } 378 | }, 379 | "nbformat": 4, 380 | "nbformat_minor": 0 381 | } -------------------------------------------------------------------------------- /first_edition/5.1-introduction-to-convnets.ipynb: 
-------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | "Using TensorFlow backend.\n" 13 | ] 14 | }, 15 | { 16 | "data": { 17 | "text/plain": [ 18 | "'2.0.8'" 19 | ] 20 | }, 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "output_type": "execute_result" 24 | } 25 | ], 26 | "source": [ 27 | "import keras\n", 28 | "keras.__version__" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": { 34 | "collapsed": true 35 | }, 36 | "source": [ 37 | "# 5.1 - Introduction to convnets\n", 38 | "\n", 39 | "This notebook contains the code sample found in Chapter 5, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.\n", 40 | "\n", 41 | "----\n", 42 | "\n", 43 | "First, let's take a practical look at a very simple convnet example. We will use our convnet to classify MNIST digits, a task that you've already been \n", 44 | "through in Chapter 2, using a densely-connected network (our test accuracy then was 97.8%). Even though our convnet will be very basic, its \n", 45 | "accuracy will still blow out of the water that of the densely-connected model from Chapter 2.\n", 46 | "\n", 47 | "The 6 lines of code below show you what a basic convnet looks like. It's a stack of `Conv2D` and `MaxPooling2D` layers. We'll see in a \n", 48 | "minute what they do concretely.\n", 49 | "Importantly, a convnet takes as input tensors of shape `(image_height, image_width, image_channels)` (not including the batch dimension). \n", 50 | "In our case, we will configure our convnet to process inputs of size `(28, 28, 1)`, which is the format of MNIST images. We do this via \n", 51 | "passing the argument `input_shape=(28, 28, 1)` to our first layer." 
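A useful habit when reading the `model.summary()` output below: with the default `padding='valid'` and stride 1, a 3x3 convolution shrinks each spatial dimension by 2, and 2x2 max pooling halves it (rounding down). Tracing the stack by hand reproduces every output shape in the summary:

```python
h = w = 28
h, w = h - 2, w - 2    # conv2d_1 (3x3, valid padding): 26 x 26
h, w = h // 2, w // 2  # max_pooling2d_1 (2x2):         13 x 13
h, w = h - 2, w - 2    # conv2d_2:                      11 x 11
h, w = h // 2, w // 2  # max_pooling2d_2:                5 x 5
h, w = h - 2, w - 2    # conv2d_3:                       3 x 3
print(h, w)            # 3 3
```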
52 | ] 53 | }, 54 | { 55 | "cell_type": "code", 56 | "execution_count": 2, 57 | "metadata": {}, 58 | "outputs": [], 59 | "source": [ 60 | "from keras import layers\n", 61 | "from keras import models\n", 62 | "\n", 63 | "model = models.Sequential()\n", 64 | "model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))\n", 65 | "model.add(layers.MaxPooling2D((2, 2)))\n", 66 | "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", 67 | "model.add(layers.MaxPooling2D((2, 2)))\n", 68 | "model.add(layers.Conv2D(64, (3, 3), activation='relu'))" 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": {}, 74 | "source": [ 75 | "Let's display the architecture of our convnet so far:" 76 | ] 77 | }, 78 | { 79 | "cell_type": "code", 80 | "execution_count": 3, 81 | "metadata": {}, 82 | "outputs": [ 83 | { 84 | "name": "stdout", 85 | "output_type": "stream", 86 | "text": [ 87 | "_________________________________________________________________\n", 88 | "Layer (type) Output Shape Param # \n", 89 | "=================================================================\n", 90 | "conv2d_1 (Conv2D) (None, 26, 26, 32) 320 \n", 91 | "_________________________________________________________________\n", 92 | "max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32) 0 \n", 93 | "_________________________________________________________________\n", 94 | "conv2d_2 (Conv2D) (None, 11, 11, 64) 18496 \n", 95 | "_________________________________________________________________\n", 96 | "max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64) 0 \n", 97 | "_________________________________________________________________\n", 98 | "conv2d_3 (Conv2D) (None, 3, 3, 64) 36928 \n", 99 | "=================================================================\n", 100 | "Total params: 55,744\n", 101 | "Trainable params: 55,744\n", 102 | "Non-trainable params: 0\n", 103 | "_________________________________________________________________\n" 104 | ] 105 | } 106 | ], 107 | "source": [ 108 | "model.summary()" 109 | ] 110 | }, 111 | { 112 | "cell_type": "markdown", 113 | "metadata": { 114 | "collapsed": true 115 | }, 116 | "source": [ 117 | "You can see above that the output of every `Conv2D` and `MaxPooling2D` layer is a 3D tensor of shape `(height, width, channels)`. The width \n", 118 | "and height dimensions tend to shrink as we go deeper in the network. The number of channels is controlled by the first argument passed to \n", 119 | "the `Conv2D` layers (e.g. 32 or 64).\n", 120 | "\n", 121 | "The next step would be to feed our last output tensor (of shape `(3, 3, 64)`) into a densely-connected classifier network like those you are \n", 122 | "already familiar with: a stack of `Dense` layers. These classifiers process vectors, which are 1D, whereas our current output is a 3D tensor. \n", 123 | "So first, we will have to flatten our 3D outputs to 1D, and then add a few `Dense` layers on top:" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 4, 129 | "metadata": { 130 | "collapsed": true 131 | }, 132 | "outputs": [], 133 | "source": [ 134 | "model.add(layers.Flatten())\n", 135 | "model.add(layers.Dense(64, activation='relu'))\n", 136 | "model.add(layers.Dense(10, activation='softmax'))" 137 | ] 138 | }, 139 | { 140 | "cell_type": "markdown", 141 | "metadata": {}, 142 | "source": [ 143 | "We are going to do 10-way classification, so we use a final layer with 10 outputs and a softmax activation. 
Now here's what our network \n", 144 | "looks like:" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 5, 150 | "metadata": {}, 151 | "outputs": [ 152 | { 153 | "name": "stdout", 154 | "output_type": "stream", 155 | "text": [ 156 | "_________________________________________________________________\n", 157 | "Layer (type) Output Shape Param # \n", 158 | "=================================================================\n", 159 | "conv2d_1 (Conv2D) (None, 26, 26, 32) 320 \n", 160 | "_________________________________________________________________\n", 161 | "max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32) 0 \n", 162 | "_________________________________________________________________\n", 163 | "conv2d_2 (Conv2D) (None, 11, 11, 64) 18496 \n", 164 | "_________________________________________________________________\n", 165 | "max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64) 0 \n", 166 | "_________________________________________________________________\n", 167 | "conv2d_3 (Conv2D) (None, 3, 3, 64) 36928 \n", 168 | "_________________________________________________________________\n", 169 | "flatten_1 (Flatten) (None, 576) 0 \n", 170 | "_________________________________________________________________\n", 171 | "dense_1 (Dense) (None, 64) 36928 \n", 172 | "_________________________________________________________________\n", 173 | "dense_2 (Dense) (None, 10) 650 \n", 174 | "=================================================================\n", 175 | "Total params: 93,322\n", 176 | "Trainable params: 93,322\n", 177 | "Non-trainable params: 0\n", 178 | "_________________________________________________________________\n" 179 | ] 180 | } 181 | ], 182 | "source": [ 183 | "model.summary()" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "As you can see, our `(3, 3, 64)` outputs were flattened into vectors of shape `(576,)`, before going through two `Dense` layers.\n", 191 | "\n", 192 | "Now, let's train our convnet on the MNIST digits. We will reuse a lot of the code we have already covered in the MNIST example from Chapter \n", 193 | "2." 
194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 6, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "from keras.datasets import mnist\n", 203 | "from keras.utils import to_categorical\n", 204 | "\n", 205 | "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n", 206 | "\n", 207 | "train_images = train_images.reshape((60000, 28, 28, 1))\n", 208 | "train_images = train_images.astype('float32') / 255\n", 209 | "\n", 210 | "test_images = test_images.reshape((10000, 28, 28, 1))\n", 211 | "test_images = test_images.astype('float32') / 255\n", 212 | "\n", 213 | "train_labels = to_categorical(train_labels)\n", 214 | "test_labels = to_categorical(test_labels)" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 7, 220 | "metadata": {}, 221 | "outputs": [ 222 | { 223 | "name": "stdout", 224 | "output_type": "stream", 225 | "text": [ 226 | "Epoch 1/5\n", 227 | "60000/60000 [==============================] - 8s - loss: 0.1766 - acc: 0.9440 \n", 228 | "Epoch 2/5\n", 229 | "60000/60000 [==============================] - 7s - loss: 0.0462 - acc: 0.9855 \n", 230 | "Epoch 3/5\n", 231 | "60000/60000 [==============================] - 7s - loss: 0.0322 - acc: 0.9902 \n", 232 | "Epoch 4/5\n", 233 | "60000/60000 [==============================] - 7s - loss: 0.0241 - acc: 0.9926 \n", 234 | "Epoch 5/5\n", 235 | "60000/60000 [==============================] - 7s - loss: 0.0187 - acc: 0.9943 \n" 236 | ] 237 | }, 238 | { 239 | "data": { 240 | "text/plain": [ 241 | "" 242 | ] 243 | }, 244 | "execution_count": 7, 245 | "metadata": {}, 246 | "output_type": "execute_result" 247 | } 248 | ], 249 | "source": [ 250 | "model.compile(optimizer='rmsprop',\n", 251 | " loss='categorical_crossentropy',\n", 252 | " metrics=['accuracy'])\n", 253 | "model.fit(train_images, train_labels, epochs=5, batch_size=64)" 254 | ] 255 | }, 256 | { 257 | "cell_type": "markdown", 258 | "metadata": {}, 259 | "source": [ 260 | "Let's evaluate the model on the test data:" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": 8, 266 | "metadata": {}, 267 | "outputs": [ 268 | { 269 | "name": "stdout", 270 | "output_type": "stream", 271 | "text": [ 272 | " 9536/10000 [===========================>..] - ETA: 0s" 273 | ] 274 | } 275 | ], 276 | "source": [ 277 | "test_loss, test_acc = model.evaluate(test_images, test_labels)" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 9, 283 | "metadata": {}, 284 | "outputs": [ 285 | { 286 | "data": { 287 | "text/plain": [ 288 | "0.99129999999999996" 289 | ] 290 | }, 291 | "execution_count": 9, 292 | "metadata": {}, 293 | "output_type": "execute_result" 294 | } 295 | ], 296 | "source": [ 297 | "test_acc" 298 | ] 299 | }, 300 | { 301 | "cell_type": "markdown", 302 | "metadata": {}, 303 | "source": [ 304 | "While our densely-connected network from Chapter 2 had a test accuracy of 97.8%, our basic convnet has a test accuracy of 99.3%: we \n", 305 | "decreased our error rate by 68% (relative). Not bad! 
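The "68% (relative)" figure above compares error rates rather than accuracies. With the rounded numbers quoted in the text:

```python
dense_error = 1 - 0.978    # 2.2% test error for the chapter 2 dense model
convnet_error = 1 - 0.993  # 0.7% test error for this convnet (rounded)
print(1 - convnet_error / dense_error)  # ~0.68, i.e. a 68% relative reduction
```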
" 306 | ] 307 | } 308 | ], 309 | "metadata": { 310 | "kernelspec": { 311 | "display_name": "Python 3", 312 | "language": "python", 313 | "name": "python3" 314 | }, 315 | "language_info": { 316 | "codemirror_mode": { 317 | "name": "ipython", 318 | "version": 3 319 | }, 320 | "file_extension": ".py", 321 | "mimetype": "text/x-python", 322 | "name": "python", 323 | "nbconvert_exporter": "python", 324 | "pygments_lexer": "ipython3", 325 | "version": "3.5.2" 326 | } 327 | }, 328 | "nbformat": 4, 329 | "nbformat_minor": 2 330 | } 331 | -------------------------------------------------------------------------------- /second_edition/chapter13_best-practices-for-the-real-world.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "# Best practices for the real world" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "## Getting the most out of your models" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "### Hyperparameter optimization" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "colab_type": "text" 43 | }, 44 | "source": [ 45 | "#### Using KerasTuner" 46 | ] 47 | }, 48 | { 49 | "cell_type": "code", 50 | "execution_count": 0, 51 | "metadata": { 52 | "colab_type": "code" 53 | }, 54 | "outputs": [], 55 | "source": [ 56 | "!pip install keras-tuner -q" 57 | ] 58 | }, 59 | { 60 | "cell_type": "markdown", 61 | "metadata": { 62 | "colab_type": "text" 63 | }, 64 | "source": [ 65 | "**A KerasTuner model-building function**" 66 | ] 67 | }, 68 | { 69 | "cell_type": "code", 70 | "execution_count": 0, 71 | "metadata": { 72 | "colab_type": "code" 73 | }, 74 | "outputs": [], 75 | "source": [ 76 | "from tensorflow import keras\n", 77 | "from tensorflow.keras import layers\n", 78 | "\n", 79 | "def build_model(hp):\n", 80 | " units = hp.Int(name=\"units\", min_value=16, max_value=64, step=16)\n", 81 | " model = keras.Sequential([\n", 82 | " layers.Dense(units, activation=\"relu\"),\n", 83 | " layers.Dense(10, activation=\"softmax\")\n", 84 | " ])\n", 85 | " optimizer = hp.Choice(name=\"optimizer\", values=[\"rmsprop\", \"adam\"])\n", 86 | " model.compile(\n", 87 | " optimizer=optimizer,\n", 88 | " loss=\"sparse_categorical_crossentropy\",\n", 89 | " metrics=[\"accuracy\"])\n", 90 | " return model" 91 | ] 92 | }, 93 | { 94 | "cell_type": "markdown", 95 | "metadata": { 96 | "colab_type": "text" 97 | }, 98 | "source": [ 99 | "**A KerasTuner `HyperModel`**" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 0, 105 | "metadata": { 106 | "colab_type": "code" 107 | }, 108 | "outputs": [], 109 | "source": [ 110 | 
"import kerastuner as kt\n", 111 | "\n", 112 | "class SimpleMLP(kt.HyperModel):\n", 113 | " def __init__(self, num_classes):\n", 114 | " self.num_classes = num_classes\n", 115 | "\n", 116 | " def build(self, hp):\n", 117 | " units = hp.Int(name=\"units\", min_value=16, max_value=64, step=16)\n", 118 | " model = keras.Sequential([\n", 119 | " layers.Dense(units, activation=\"relu\"),\n", 120 | " layers.Dense(self.num_classes, activation=\"softmax\")\n", 121 | " ])\n", 122 | " optimizer = hp.Choice(name=\"optimizer\", values=[\"rmsprop\", \"adam\"])\n", 123 | " model.compile(\n", 124 | " optimizer=optimizer,\n", 125 | " loss=\"sparse_categorical_crossentropy\",\n", 126 | " metrics=[\"accuracy\"])\n", 127 | " return model\n", 128 | "\n", 129 | "hypermodel = SimpleMLP(num_classes=10)" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": 0, 135 | "metadata": { 136 | "colab_type": "code" 137 | }, 138 | "outputs": [], 139 | "source": [ 140 | "tuner = kt.BayesianOptimization(\n", 141 | " build_model,\n", 142 | " objective=\"val_accuracy\",\n", 143 | " max_trials=100,\n", 144 | " executions_per_trial=2,\n", 145 | " directory=\"mnist_kt_test\",\n", 146 | " overwrite=True,\n", 147 | ")" 148 | ] 149 | }, 150 | { 151 | "cell_type": "code", 152 | "execution_count": 0, 153 | "metadata": { 154 | "colab_type": "code" 155 | }, 156 | "outputs": [], 157 | "source": [ 158 | "tuner.search_space_summary()" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 0, 164 | "metadata": { 165 | "colab_type": "code" 166 | }, 167 | "outputs": [], 168 | "source": [ 169 | "(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()\n", 170 | "x_train = x_train.reshape((-1, 28 * 28)).astype(\"float32\") / 255\n", 171 | "x_test = x_test.reshape((-1, 28 * 28)).astype(\"float32\") / 255\n", 172 | "x_train_full = x_train[:]\n", 173 | "y_train_full = y_train[:]\n", 174 | "num_val_samples = 10000\n", 175 | "x_train, x_val = x_train[:-num_val_samples], x_train[-num_val_samples:]\n", 176 | "y_train, y_val = y_train[:-num_val_samples], y_train[-num_val_samples:]\n", 177 | "callbacks = [\n", 178 | " keras.callbacks.EarlyStopping(monitor=\"val_loss\", patience=5),\n", 179 | "]\n", 180 | "tuner.search(\n", 181 | " x_train, y_train,\n", 182 | " batch_size=128,\n", 183 | " epochs=100,\n", 184 | " validation_data=(x_val, y_val),\n", 185 | " callbacks=callbacks,\n", 186 | " verbose=2,\n", 187 | ")" 188 | ] 189 | }, 190 | { 191 | "cell_type": "markdown", 192 | "metadata": { 193 | "colab_type": "text" 194 | }, 195 | "source": [ 196 | "**Querying the best hyperparameter configurations**" 197 | ] 198 | }, 199 | { 200 | "cell_type": "code", 201 | "execution_count": 0, 202 | "metadata": { 203 | "colab_type": "code" 204 | }, 205 | "outputs": [], 206 | "source": [ 207 | "top_n = 4\n", 208 | "best_hps = tuner.get_best_hyperparameters(top_n)" 209 | ] 210 | }, 211 | { 212 | "cell_type": "code", 213 | "execution_count": 0, 214 | "metadata": { 215 | "colab_type": "code" 216 | }, 217 | "outputs": [], 218 | "source": [ 219 | "def get_best_epoch(hp):\n", 220 | " model = build_model(hp)\n", 221 | " callbacks=[\n", 222 | " keras.callbacks.EarlyStopping(\n", 223 | " monitor=\"val_loss\", mode=\"min\", patience=10)\n", 224 | " ]\n", 225 | " history = model.fit(\n", 226 | " x_train, y_train,\n", 227 | " validation_data=(x_val, y_val),\n", 228 | " epochs=100,\n", 229 | " batch_size=128,\n", 230 | " callbacks=callbacks)\n", 231 | " val_loss_per_epoch = history.history[\"val_loss\"]\n", 232 | " best_epoch 
= val_loss_per_epoch.index(min(val_loss_per_epoch)) + 1\n", 233 | " print(f\"Best epoch: {best_epoch}\")\n", 234 | " return best_epoch" 235 | ] 236 | }, 237 | { 238 | "cell_type": "code", 239 | "execution_count": 0, 240 | "metadata": { 241 | "colab_type": "code" 242 | }, 243 | "outputs": [], 244 | "source": [ 245 | "def get_best_trained_model(hp):\n", 246 | " best_epoch = get_best_epoch(hp)\n", 247 | " model = build_model(hp)\n", 248 | " model.fit(\n", 249 | " x_train_full, y_train_full,\n", 250 | " batch_size=128, epochs=int(best_epoch * 1.2))\n", 251 | " return model\n", 252 | "\n", 253 | "best_models = []\n", 254 | "for hp in best_hps:\n", 255 | " model = get_best_trained_model(hp)\n", 256 | " model.evaluate(x_test, y_test)\n", 257 | " best_models.append(model)" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 0, 263 | "metadata": { 264 | "colab_type": "code" 265 | }, 266 | "outputs": [], 267 | "source": [ 268 | "best_models = tuner.get_best_models(top_n)" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": { 274 | "colab_type": "text" 275 | }, 276 | "source": [ 277 | "#### The art of crafting the right search space" 278 | ] 279 | }, 280 | { 281 | "cell_type": "markdown", 282 | "metadata": { 283 | "colab_type": "text" 284 | }, 285 | "source": [ 286 | "#### The future of hyperparameter tuning: automated machine learning" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": { 292 | "colab_type": "text" 293 | }, 294 | "source": [ 295 | "### Model ensembling" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": { 301 | "colab_type": "text" 302 | }, 303 | "source": [ 304 | "## Scaling-up model training" 305 | ] 306 | }, 307 | { 308 | "cell_type": "markdown", 309 | "metadata": { 310 | "colab_type": "text" 311 | }, 312 | "source": [ 313 | "### Speeding up training on GPU with mixed precision" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": { 319 | "colab_type": "text" 320 | }, 321 | "source": [ 322 | "#### Understanding floating-point precision" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 0, 328 | "metadata": { 329 | "colab_type": "code" 330 | }, 331 | "outputs": [], 332 | "source": [ 333 | "import tensorflow as tf\n", 334 | "import numpy as np\n", 335 | "np_array = np.zeros((2, 2))\n", 336 | "tf_tensor = tf.convert_to_tensor(np_array)\n", 337 | "tf_tensor.dtype" 338 | ] 339 | }, 340 | { 341 | "cell_type": "code", 342 | "execution_count": 0, 343 | "metadata": { 344 | "colab_type": "code" 345 | }, 346 | "outputs": [], 347 | "source": [ 348 | "np_array = np.zeros((2, 2))\n", 349 | "tf_tensor = tf.convert_to_tensor(np_array, dtype=\"float32\")\n", 350 | "tf_tensor.dtype" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": { 356 | "colab_type": "text" 357 | }, 358 | "source": [ 359 | "#### Mixed-precision training in practice" 360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": 0, 365 | "metadata": { 366 | "colab_type": "code" 367 | }, 368 | "outputs": [], 369 | "source": [ 370 | "from tensorflow import keras\n", 371 | "keras.mixed_precision.set_global_policy(\"mixed_float16\")" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": { 377 | "colab_type": "text" 378 | }, 379 | "source": [ 380 | "### Multi-GPU training" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": { 386 | "colab_type": "text" 387 | }, 388 | "source": [ 389 | "#### Getting your hands on two 
or more GPUs" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": { 395 | "colab_type": "text" 396 | }, 397 | "source": [ 398 | "#### Single-host, multi-device synchronous training" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": { 404 | "colab_type": "text" 405 | }, 406 | "source": [ 407 | "### TPU training" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": { 413 | "colab_type": "text" 414 | }, 415 | "source": [ 416 | "#### Using a TPU via Google Colab" 417 | ] 418 | }, 419 | { 420 | "cell_type": "markdown", 421 | "metadata": { 422 | "colab_type": "text" 423 | }, 424 | "source": [ 425 | "#### Leveraging step fusing to improve TPU utilization" 426 | ] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": { 431 | "colab_type": "text" 432 | }, 433 | "source": [ 434 | "## Summary" 435 | ] 436 | } 437 | ], 438 | "metadata": { 439 | "colab": { 440 | "collapsed_sections": [], 441 | "name": "chapter13_best-practices-for-the-real-world.i", 442 | "private_outputs": false, 443 | "provenance": [], 444 | "toc_visible": true 445 | }, 446 | "kernelspec": { 447 | "display_name": "Python 3", 448 | "language": "python", 449 | "name": "python3" 450 | }, 451 | "language_info": { 452 | "codemirror_mode": { 453 | "name": "ipython", 454 | "version": 3 455 | }, 456 | "file_extension": ".py", 457 | "mimetype": "text/x-python", 458 | "name": "python", 459 | "nbconvert_exporter": "python", 460 | "pygments_lexer": "ipython3", 461 | "version": "3.7.0" 462 | } 463 | }, 464 | "nbformat": 4, 465 | "nbformat_minor": 0 466 | } -------------------------------------------------------------------------------- /second_edition/chapter12_part05_gans.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 
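The "Model ensembling" section of the chapter 13 notebook above has headers but no code cell. A minimal sketch of the simplest scheme, uniform probability averaging, assuming the `best_models` list and the MNIST arrays built in that notebook:

```python
import numpy as np

def ensemble_predict(models, x):
    # Average the predicted class probabilities across models, then argmax.
    # The uniform weights could instead be tuned on validation data.
    probs = np.mean([m.predict(x, verbose=0) for m in models], axis=0)
    return probs.argmax(axis=-1)

# Usage (hypothetical, reusing names from the chapter 13 notebook):
# preds = ensemble_predict(best_models, x_test)
```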
10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "## Introduction to generative adversarial networks" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "### A schematic GAN implementation" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "### A bag of tricks" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "colab_type": "text" 43 | }, 44 | "source": [ 45 | "### Getting our hands on the CelebA dataset" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "colab_type": "text" 52 | }, 53 | "source": [ 54 | "**Getting the CelebA data**" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 0, 60 | "metadata": { 61 | "colab_type": "code" 62 | }, 63 | "outputs": [], 64 | "source": [ 65 | "!mkdir celeba_gan\n", 66 | "!gdown --id 1O7m1010EJjLE5QxLZiM9Fpjs7Oj6e684 -O celeba_gan/data.zip\n", 67 | "!unzip -qq celeba_gan/data.zip -d celeba_gan" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": { 73 | "colab_type": "text" 74 | }, 75 | "source": [ 76 | "**Creating a dataset from a directory of images**" 77 | ] 78 | }, 79 | { 80 | "cell_type": "code", 81 | "execution_count": 0, 82 | "metadata": { 83 | "colab_type": "code" 84 | }, 85 | "outputs": [], 86 | "source": [ 87 | "from tensorflow import keras\n", 88 | "dataset = keras.utils.image_dataset_from_directory(\n", 89 | " \"celeba_gan\",\n", 90 | " label_mode=None,\n", 91 | " image_size=(64, 64),\n", 92 | " batch_size=32,\n", 93 | " smart_resize=True)" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": { 99 | "colab_type": "text" 100 | }, 101 | "source": [ 102 | "**Rescaling the images**" 103 | ] 104 | }, 105 | { 106 | "cell_type": "code", 107 | "execution_count": 0, 108 | "metadata": { 109 | "colab_type": "code" 110 | }, 111 | "outputs": [], 112 | "source": [ 113 | "dataset = dataset.map(lambda x: x / 255.)" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": { 119 | "colab_type": "text" 120 | }, 121 | "source": [ 122 | "**Displaying the first image**" 123 | ] 124 | }, 125 | { 126 | "cell_type": "code", 127 | "execution_count": 0, 128 | "metadata": { 129 | "colab_type": "code" 130 | }, 131 | "outputs": [], 132 | "source": [ 133 | "import matplotlib.pyplot as plt\n", 134 | "for x in dataset:\n", 135 | " plt.axis(\"off\")\n", 136 | " plt.imshow((x.numpy() * 255).astype(\"int32\")[0])\n", 137 | " break" 138 | ] 139 | }, 140 | { 141 | "cell_type": "markdown", 142 | "metadata": { 143 | "colab_type": "text" 144 | }, 145 | "source": [ 146 | "### The discriminator" 147 | ] 148 | }, 149 | { 150 | "cell_type": "markdown", 151 | "metadata": { 152 | "colab_type": "text" 153 | }, 154 | "source": [ 155 | "**The GAN discriminator network**" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 0, 161 | "metadata": { 162 | "colab_type": "code" 163 | }, 164 | "outputs": [], 165 | "source": [ 166 | "from tensorflow.keras import layers\n", 167 | "\n", 168 | "discriminator = keras.Sequential(\n", 169 | " [\n", 170 | " keras.Input(shape=(64, 64, 3)),\n", 171 | " layers.Conv2D(64, kernel_size=4, strides=2, padding=\"same\"),\n", 172 | " layers.LeakyReLU(alpha=0.2),\n", 173 | " layers.Conv2D(128, kernel_size=4, strides=2, padding=\"same\"),\n", 174 | " layers.LeakyReLU(alpha=0.2),\n", 175 | " 
layers.Conv2D(128, kernel_size=4, strides=2, padding=\"same\"),\n", 176 | " layers.LeakyReLU(alpha=0.2),\n", 177 | " layers.Flatten(),\n", 178 | " layers.Dropout(0.2),\n", 179 | " layers.Dense(1, activation=\"sigmoid\"),\n", 180 | " ],\n", 181 | " name=\"discriminator\",\n", 182 | ")" 183 | ] 184 | }, 185 | { 186 | "cell_type": "code", 187 | "execution_count": 0, 188 | "metadata": { 189 | "colab_type": "code" 190 | }, 191 | "outputs": [], 192 | "source": [ 193 | "discriminator.summary()" 194 | ] 195 | }, 196 | { 197 | "cell_type": "markdown", 198 | "metadata": { 199 | "colab_type": "text" 200 | }, 201 | "source": [ 202 | "### The generator" 203 | ] 204 | }, 205 | { 206 | "cell_type": "markdown", 207 | "metadata": { 208 | "colab_type": "text" 209 | }, 210 | "source": [ 211 | "**GAN generator network**" 212 | ] 213 | }, 214 | { 215 | "cell_type": "code", 216 | "execution_count": 0, 217 | "metadata": { 218 | "colab_type": "code" 219 | }, 220 | "outputs": [], 221 | "source": [ 222 | "latent_dim = 128\n", 223 | "\n", 224 | "generator = keras.Sequential(\n", 225 | " [\n", 226 | " keras.Input(shape=(latent_dim,)),\n", 227 | " layers.Dense(8 * 8 * 128),\n", 228 | " layers.Reshape((8, 8, 128)),\n", 229 | " layers.Conv2DTranspose(128, kernel_size=4, strides=2, padding=\"same\"),\n", 230 | " layers.LeakyReLU(alpha=0.2),\n", 231 | " layers.Conv2DTranspose(256, kernel_size=4, strides=2, padding=\"same\"),\n", 232 | " layers.LeakyReLU(alpha=0.2),\n", 233 | " layers.Conv2DTranspose(512, kernel_size=4, strides=2, padding=\"same\"),\n", 234 | " layers.LeakyReLU(alpha=0.2),\n", 235 | " layers.Conv2D(3, kernel_size=5, padding=\"same\", activation=\"sigmoid\"),\n", 236 | " ],\n", 237 | " name=\"generator\",\n", 238 | ")" 239 | ] 240 | }, 241 | { 242 | "cell_type": "code", 243 | "execution_count": 0, 244 | "metadata": { 245 | "colab_type": "code" 246 | }, 247 | "outputs": [], 248 | "source": [ 249 | "generator.summary()" 250 | ] 251 | }, 252 | { 253 | "cell_type": "markdown", 254 | "metadata": { 255 | "colab_type": "text" 256 | }, 257 | "source": [ 258 | "### The adversarial network" 259 | ] 260 | }, 261 | { 262 | "cell_type": "markdown", 263 | "metadata": { 264 | "colab_type": "text" 265 | }, 266 | "source": [ 267 | "**The GAN `Model`**" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": 0, 273 | "metadata": { 274 | "colab_type": "code" 275 | }, 276 | "outputs": [], 277 | "source": [ 278 | "import tensorflow as tf\n", 279 | "class GAN(keras.Model):\n", 280 | " def __init__(self, discriminator, generator, latent_dim):\n", 281 | " super().__init__()\n", 282 | " self.discriminator = discriminator\n", 283 | " self.generator = generator\n", 284 | " self.latent_dim = latent_dim\n", 285 | " self.d_loss_metric = keras.metrics.Mean(name=\"d_loss\")\n", 286 | " self.g_loss_metric = keras.metrics.Mean(name=\"g_loss\")\n", 287 | "\n", 288 | " def compile(self, d_optimizer, g_optimizer, loss_fn):\n", 289 | " super(GAN, self).compile()\n", 290 | " self.d_optimizer = d_optimizer\n", 291 | " self.g_optimizer = g_optimizer\n", 292 | " self.loss_fn = loss_fn\n", 293 | "\n", 294 | " @property\n", 295 | " def metrics(self):\n", 296 | " return [self.d_loss_metric, self.g_loss_metric]\n", 297 | "\n", 298 | " def train_step(self, real_images):\n", 299 | " batch_size = tf.shape(real_images)[0]\n", 300 | " random_latent_vectors = tf.random.normal(\n", 301 | " shape=(batch_size, self.latent_dim))\n", 302 | " generated_images = self.generator(random_latent_vectors)\n", 303 | " combined_images = 
tf.concat([generated_images, real_images], axis=0)\n", 304 | " labels = tf.concat(\n", 305 | " [tf.ones((batch_size, 1)), tf.zeros((batch_size, 1))],\n", 306 | " axis=0\n", 307 | " )\n", 308 | " labels += 0.05 * tf.random.uniform(tf.shape(labels))\n", 309 | "\n", 310 | " with tf.GradientTape() as tape:\n", 311 | " predictions = self.discriminator(combined_images)\n", 312 | " d_loss = self.loss_fn(labels, predictions)\n", 313 | " grads = tape.gradient(d_loss, self.discriminator.trainable_weights)\n", 314 | " self.d_optimizer.apply_gradients(\n", 315 | " zip(grads, self.discriminator.trainable_weights)\n", 316 | " )\n", 317 | "\n", 318 | " random_latent_vectors = tf.random.normal(\n", 319 | " shape=(batch_size, self.latent_dim))\n", 320 | "\n", 321 | " misleading_labels = tf.zeros((batch_size, 1))\n", 322 | "\n", 323 | " with tf.GradientTape() as tape:\n", 324 | " predictions = self.discriminator(\n", 325 | " self.generator(random_latent_vectors))\n", 326 | " g_loss = self.loss_fn(misleading_labels, predictions)\n", 327 | " grads = tape.gradient(g_loss, self.generator.trainable_weights)\n", 328 | " self.g_optimizer.apply_gradients(\n", 329 | " zip(grads, self.generator.trainable_weights))\n", 330 | "\n", 331 | " self.d_loss_metric.update_state(d_loss)\n", 332 | " self.g_loss_metric.update_state(g_loss)\n", 333 | " return {\"d_loss\": self.d_loss_metric.result(),\n", 334 | " \"g_loss\": self.g_loss_metric.result()}" 335 | ] 336 | }, 337 | { 338 | "cell_type": "markdown", 339 | "metadata": { 340 | "colab_type": "text" 341 | }, 342 | "source": [ 343 | "**A callback that samples generated images during training**" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": 0, 349 | "metadata": { 350 | "colab_type": "code" 351 | }, 352 | "outputs": [], 353 | "source": [ 354 | "class GANMonitor(keras.callbacks.Callback):\n", 355 | " def __init__(self, num_img=3, latent_dim=128):\n", 356 | " self.num_img = num_img\n", 357 | " self.latent_dim = latent_dim\n", 358 | "\n", 359 | " def on_epoch_end(self, epoch, logs=None):\n", 360 | " random_latent_vectors = tf.random.normal(shape=(self.num_img, self.latent_dim))\n", 361 | " generated_images = self.model.generator(random_latent_vectors)\n", 362 | " generated_images *= 255\n", 363 | " generated_images.numpy()\n", 364 | " for i in range(self.num_img):\n", 365 | " img = keras.utils.array_to_img(generated_images[i])\n", 366 | " img.save(f\"generated_img_{epoch:03d}_{i}.png\")" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": { 372 | "colab_type": "text" 373 | }, 374 | "source": [ 375 | "**Compiling and training the GAN**" 376 | ] 377 | }, 378 | { 379 | "cell_type": "code", 380 | "execution_count": 0, 381 | "metadata": { 382 | "colab_type": "code" 383 | }, 384 | "outputs": [], 385 | "source": [ 386 | "epochs = 100\n", 387 | "\n", 388 | "gan = GAN(discriminator=discriminator, generator=generator, latent_dim=latent_dim)\n", 389 | "gan.compile(\n", 390 | " d_optimizer=keras.optimizers.Adam(learning_rate=0.0001),\n", 391 | " g_optimizer=keras.optimizers.Adam(learning_rate=0.0001),\n", 392 | " loss_fn=keras.losses.BinaryCrossentropy(),\n", 393 | ")\n", 394 | "\n", 395 | "gan.fit(\n", 396 | " dataset, epochs=epochs, callbacks=[GANMonitor(num_img=10, latent_dim=latent_dim)]\n", 397 | ")" 398 | ] 399 | }, 400 | { 401 | "cell_type": "markdown", 402 | "metadata": { 403 | "colab_type": "text" 404 | }, 405 | "source": [ 406 | "### Wrapping up" 407 | ] 408 | }, 409 | { 410 | "cell_type": "markdown", 411 | "metadata": { 412 | 
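Once `fit()` finishes, the generator can be used on its own to sample new images, just as `GANMonitor` does at the end of each epoch. A short sketch, assuming the trained `gan` and `latent_dim` from the cells above:

```python
import tensorflow as tf
from tensorflow import keras

# Decode 16 fresh latent vectors with the trained generator.
random_latent_vectors = tf.random.normal(shape=(16, latent_dim))
fake_images = gan.generator(random_latent_vectors)
for i in range(16):
    img = keras.utils.array_to_img(fake_images[i].numpy() * 255)
    img.save(f"generated_face_{i}.png")
```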
"colab_type": "text" 413 | }, 414 | "source": [ 415 | "## Summary" 416 | ] 417 | } 418 | ], 419 | "metadata": { 420 | "colab": { 421 | "collapsed_sections": [], 422 | "name": "chapter12_part05_gans.i", 423 | "private_outputs": false, 424 | "provenance": [], 425 | "toc_visible": true 426 | }, 427 | "kernelspec": { 428 | "display_name": "Python 3", 429 | "language": "python", 430 | "name": "python3" 431 | }, 432 | "language_info": { 433 | "codemirror_mode": { 434 | "name": "ipython", 435 | "version": 3 436 | }, 437 | "file_extension": ".py", 438 | "mimetype": "text/x-python", 439 | "name": "python", 440 | "nbconvert_exporter": "python", 441 | "pygments_lexer": "ipython3", 442 | "version": "3.7.0" 443 | } 444 | }, 445 | "nbformat": 4, 446 | "nbformat_minor": 0 447 | } 448 | -------------------------------------------------------------------------------- /second_edition/chapter11_part03_transformer.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "## The Transformer architecture" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "### Understanding self-attention" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "#### Generalized self-attention: the query-key-value model" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "colab_type": "text" 43 | }, 44 | "source": [ 45 | "### Multi-head attention" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "colab_type": "text" 52 | }, 53 | "source": [ 54 | "### The Transformer encoder" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "colab_type": "text" 61 | }, 62 | "source": [ 63 | "**Getting the data**" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 0, 69 | "metadata": { 70 | "colab_type": "code" 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "!curl -O https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz\n", 75 | "!tar -xf aclImdb_v1.tar.gz\n", 76 | "!rm -r aclImdb/train/unsup" 77 | ] 78 | }, 79 | { 80 | "cell_type": "markdown", 81 | "metadata": { 82 | "colab_type": "text" 83 | }, 84 | "source": [ 85 | "**Preparing the data**" 86 | ] 87 | }, 88 | { 89 | "cell_type": "code", 90 | "execution_count": 0, 91 | "metadata": { 92 | "colab_type": "code" 93 | }, 94 | "outputs": [], 95 | "source": [ 96 | "import os, pathlib, shutil, random\n", 97 | "from tensorflow import keras\n", 98 | "batch_size = 32\n", 99 | "base_dir = pathlib.Path(\"aclImdb\")\n", 100 | "val_dir = base_dir / \"val\"\n", 101 | "train_dir = base_dir / \"train\"\n", 102 | "for category in (\"neg\", \"pos\"):\n", 103 | " 
os.makedirs(val_dir / category)\n", 104 | " files = os.listdir(train_dir / category)\n", 105 | " random.Random(1337).shuffle(files)\n", 106 | " num_val_samples = int(0.2 * len(files))\n", 107 | " val_files = files[-num_val_samples:]\n", 108 | " for fname in val_files:\n", 109 | " shutil.move(train_dir / category / fname,\n", 110 | " val_dir / category / fname)\n", 111 | "\n", 112 | "train_ds = keras.utils.text_dataset_from_directory(\n", 113 | " \"aclImdb/train\", batch_size=batch_size\n", 114 | ")\n", 115 | "val_ds = keras.utils.text_dataset_from_directory(\n", 116 | " \"aclImdb/val\", batch_size=batch_size\n", 117 | ")\n", 118 | "test_ds = keras.utils.text_dataset_from_directory(\n", 119 | " \"aclImdb/test\", batch_size=batch_size\n", 120 | ")\n", 121 | "text_only_train_ds = train_ds.map(lambda x, y: x)" 122 | ] 123 | }, 124 | { 125 | "cell_type": "markdown", 126 | "metadata": { 127 | "colab_type": "text" 128 | }, 129 | "source": [ 130 | "**Vectorizing the data**" 131 | ] 132 | }, 133 | { 134 | "cell_type": "code", 135 | "execution_count": 0, 136 | "metadata": { 137 | "colab_type": "code" 138 | }, 139 | "outputs": [], 140 | "source": [ 141 | "from tensorflow.keras import layers\n", 142 | "\n", 143 | "max_length = 600\n", 144 | "max_tokens = 20000\n", 145 | "text_vectorization = layers.TextVectorization(\n", 146 | " max_tokens=max_tokens,\n", 147 | " output_mode=\"int\",\n", 148 | " output_sequence_length=max_length,\n", 149 | ")\n", 150 | "text_vectorization.adapt(text_only_train_ds)\n", 151 | "\n", 152 | "int_train_ds = train_ds.map(\n", 153 | " lambda x, y: (text_vectorization(x), y),\n", 154 | " num_parallel_calls=4)\n", 155 | "int_val_ds = val_ds.map(\n", 156 | " lambda x, y: (text_vectorization(x), y),\n", 157 | " num_parallel_calls=4)\n", 158 | "int_test_ds = test_ds.map(\n", 159 | " lambda x, y: (text_vectorization(x), y),\n", 160 | " num_parallel_calls=4)" 161 | ] 162 | }, 163 | { 164 | "cell_type": "markdown", 165 | "metadata": { 166 | "colab_type": "text" 167 | }, 168 | "source": [ 169 | "**Transformer encoder implemented as a subclassed `Layer`**" 170 | ] 171 | }, 172 | { 173 | "cell_type": "code", 174 | "execution_count": 0, 175 | "metadata": { 176 | "colab_type": "code" 177 | }, 178 | "outputs": [], 179 | "source": [ 180 | "import tensorflow as tf\n", 181 | "from tensorflow import keras\n", 182 | "from tensorflow.keras import layers\n", 183 | "\n", 184 | "class TransformerEncoder(layers.Layer):\n", 185 | " def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):\n", 186 | " super().__init__(**kwargs)\n", 187 | " self.embed_dim = embed_dim\n", 188 | " self.dense_dim = dense_dim\n", 189 | " self.num_heads = num_heads\n", 190 | " self.attention = layers.MultiHeadAttention(\n", 191 | " num_heads=num_heads, key_dim=embed_dim)\n", 192 | " self.dense_proj = keras.Sequential(\n", 193 | " [layers.Dense(dense_dim, activation=\"relu\"),\n", 194 | " layers.Dense(embed_dim),]\n", 195 | " )\n", 196 | " self.layernorm_1 = layers.LayerNormalization()\n", 197 | " self.layernorm_2 = layers.LayerNormalization()\n", 198 | "\n", 199 | " def call(self, inputs, mask=None):\n", 200 | " if mask is not None:\n", 201 | " mask = mask[:, tf.newaxis, :]\n", 202 | " attention_output = self.attention(\n", 203 | " inputs, inputs, attention_mask=mask)\n", 204 | " proj_input = self.layernorm_1(inputs + attention_output)\n", 205 | " proj_output = self.dense_proj(proj_input)\n", 206 | " return self.layernorm_2(proj_input + proj_output)\n", 207 | "\n", 208 | " def get_config(self):\n", 209 | " config = 
super().get_config()\n", 210 | " config.update({\n", 211 | " \"embed_dim\": self.embed_dim,\n", 212 | " \"num_heads\": self.num_heads,\n", 213 | " \"dense_dim\": self.dense_dim,\n", 214 | " })\n", 215 | " return config" 216 | ] 217 | }, 218 | { 219 | "cell_type": "markdown", 220 | "metadata": { 221 | "colab_type": "text" 222 | }, 223 | "source": [ 224 | "**Using the Transformer encoder for text classification**" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 0, 230 | "metadata": { 231 | "colab_type": "code" 232 | }, 233 | "outputs": [], 234 | "source": [ 235 | "vocab_size = 20000\n", 236 | "embed_dim = 256\n", 237 | "num_heads = 2\n", 238 | "dense_dim = 32\n", 239 | "\n", 240 | "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", 241 | "x = layers.Embedding(vocab_size, embed_dim)(inputs)\n", 242 | "x = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)\n", 243 | "x = layers.GlobalMaxPooling1D()(x)\n", 244 | "x = layers.Dropout(0.5)(x)\n", 245 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 246 | "model = keras.Model(inputs, outputs)\n", 247 | "model.compile(optimizer=\"rmsprop\",\n", 248 | " loss=\"binary_crossentropy\",\n", 249 | " metrics=[\"accuracy\"])\n", 250 | "model.summary()" 251 | ] 252 | }, 253 | { 254 | "cell_type": "markdown", 255 | "metadata": { 256 | "colab_type": "text" 257 | }, 258 | "source": [ 259 | "**Training and evaluating the Transformer encoder based model**" 260 | ] 261 | }, 262 | { 263 | "cell_type": "code", 264 | "execution_count": 0, 265 | "metadata": { 266 | "colab_type": "code" 267 | }, 268 | "outputs": [], 269 | "source": [ 270 | "callbacks = [\n", 271 | " keras.callbacks.ModelCheckpoint(\"transformer_encoder.keras\",\n", 272 | " save_best_only=True)\n", 273 | "]\n", 274 | "model.fit(int_train_ds, validation_data=int_val_ds, epochs=20, callbacks=callbacks)\n", 275 | "model = keras.models.load_model(\n", 276 | " \"transformer_encoder.keras\",\n", 277 | " custom_objects={\"TransformerEncoder\": TransformerEncoder})\n", 278 | "print(f\"Test acc: {model.evaluate(int_test_ds)[1]:.3f}\")" 279 | ] 280 | }, 281 | { 282 | "cell_type": "markdown", 283 | "metadata": { 284 | "colab_type": "text" 285 | }, 286 | "source": [ 287 | "#### Using positional encoding to re-inject order information" 288 | ] 289 | }, 290 | { 291 | "cell_type": "markdown", 292 | "metadata": { 293 | "colab_type": "text" 294 | }, 295 | "source": [ 296 | "**Implementing positional embedding as a subclassed layer**" 297 | ] 298 | }, 299 | { 300 | "cell_type": "code", 301 | "execution_count": 0, 302 | "metadata": { 303 | "colab_type": "code" 304 | }, 305 | "outputs": [], 306 | "source": [ 307 | "class PositionalEmbedding(layers.Layer):\n", 308 | " def __init__(self, sequence_length, input_dim, output_dim, **kwargs):\n", 309 | " super().__init__(**kwargs)\n", 310 | " self.token_embeddings = layers.Embedding(\n", 311 | " input_dim=input_dim, output_dim=output_dim)\n", 312 | " self.position_embeddings = layers.Embedding(\n", 313 | " input_dim=sequence_length, output_dim=output_dim)\n", 314 | " self.sequence_length = sequence_length\n", 315 | " self.input_dim = input_dim\n", 316 | " self.output_dim = output_dim\n", 317 | "\n", 318 | " def call(self, inputs):\n", 319 | " length = tf.shape(inputs)[-1]\n", 320 | " positions = tf.range(start=0, limit=length, delta=1)\n", 321 | " embedded_tokens = self.token_embeddings(inputs)\n", 322 | " embedded_positions = self.position_embeddings(positions)\n", 323 | " return embedded_tokens + embedded_positions\n", 324 | 
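"\n", "    # Added note (not in the book's code): compute_mask below flags padded positions --\n", "    # TextVectorization reserves token index 0 for padding -- so downstream layers can ignore them.\n",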
"\n", 325 | " def compute_mask(self, inputs, mask=None):\n", 326 | " return tf.math.not_equal(inputs, 0)\n", 327 | "\n", 328 | " def get_config(self):\n", 329 | " config = super().get_config()\n", 330 | " config.update({\n", 331 | " \"output_dim\": self.output_dim,\n", 332 | " \"sequence_length\": self.sequence_length,\n", 333 | " \"input_dim\": self.input_dim,\n", 334 | " })\n", 335 | " return config" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": { 341 | "colab_type": "text" 342 | }, 343 | "source": [ 344 | "#### Putting it all together: A text-classification Transformer" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": { 350 | "colab_type": "text" 351 | }, 352 | "source": [ 353 | "**Combining the Transformer encoder with positional embedding**" 354 | ] 355 | }, 356 | { 357 | "cell_type": "code", 358 | "execution_count": 0, 359 | "metadata": { 360 | "colab_type": "code" 361 | }, 362 | "outputs": [], 363 | "source": [ 364 | "vocab_size = 20000\n", 365 | "sequence_length = 600\n", 366 | "embed_dim = 256\n", 367 | "num_heads = 2\n", 368 | "dense_dim = 32\n", 369 | "\n", 370 | "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", 371 | "x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(inputs)\n", 372 | "x = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)\n", 373 | "x = layers.GlobalMaxPooling1D()(x)\n", 374 | "x = layers.Dropout(0.5)(x)\n", 375 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 376 | "model = keras.Model(inputs, outputs)\n", 377 | "model.compile(optimizer=\"rmsprop\",\n", 378 | " loss=\"binary_crossentropy\",\n", 379 | " metrics=[\"accuracy\"])\n", 380 | "model.summary()\n", 381 | "\n", 382 | "callbacks = [\n", 383 | " keras.callbacks.ModelCheckpoint(\"full_transformer_encoder.keras\",\n", 384 | " save_best_only=True)\n", 385 | "]\n", 386 | "model.fit(int_train_ds, validation_data=int_val_ds, epochs=20, callbacks=callbacks)\n", 387 | "model = keras.models.load_model(\n", 388 | " \"full_transformer_encoder.keras\",\n", 389 | " custom_objects={\"TransformerEncoder\": TransformerEncoder,\n", 390 | " \"PositionalEmbedding\": PositionalEmbedding})\n", 391 | "print(f\"Test acc: {model.evaluate(int_test_ds)[1]:.3f}\")" 392 | ] 393 | }, 394 | { 395 | "cell_type": "markdown", 396 | "metadata": { 397 | "colab_type": "text" 398 | }, 399 | "source": [ 400 | "### When to use sequence models over bag-of-words models?" 
401 | ] 402 | } 403 | ], 404 | "metadata": { 405 | "colab": { 406 | "collapsed_sections": [], 407 | "name": "chapter11_part03_transformer.i", 408 | "private_outputs": false, 409 | "provenance": [], 410 | "toc_visible": true 411 | }, 412 | "kernelspec": { 413 | "display_name": "Python 3", 414 | "language": "python", 415 | "name": "python3" 416 | }, 417 | "language_info": { 418 | "codemirror_mode": { 419 | "name": "ipython", 420 | "version": 3 421 | }, 422 | "file_extension": ".py", 423 | "mimetype": "text/x-python", 424 | "name": "python", 425 | "nbconvert_exporter": "python", 426 | "pygments_lexer": "ipython3", 427 | "version": "3.7.0" 428 | } 429 | }, 430 | "nbformat": 4, 431 | "nbformat_minor": 0 432 | } -------------------------------------------------------------------------------- /second_edition/chapter11_part02_sequence-models.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "### Processing words as a sequence: The sequence model approach" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "#### A first practical example" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "**Downloading the data**" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": 0, 42 | "metadata": { 43 | "colab_type": "code" 44 | }, 45 | "outputs": [], 46 | "source": [ 47 | "!curl -O https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz\n", 48 | "!tar -xf aclImdb_v1.tar.gz\n", 49 | "!rm -r aclImdb/train/unsup" 50 | ] 51 | }, 52 | { 53 | "cell_type": "markdown", 54 | "metadata": { 55 | "colab_type": "text" 56 | }, 57 | "source": [ 58 | "**Preparing the data**" 59 | ] 60 | }, 61 | { 62 | "cell_type": "code", 63 | "execution_count": 0, 64 | "metadata": { 65 | "colab_type": "code" 66 | }, 67 | "outputs": [], 68 | "source": [ 69 | "import os, pathlib, shutil, random\n", 70 | "from tensorflow import keras\n", 71 | "batch_size = 32\n", 72 | "base_dir = pathlib.Path(\"aclImdb\")\n", 73 | "val_dir = base_dir / \"val\"\n", 74 | "train_dir = base_dir / \"train\"\n", 75 | "for category in (\"neg\", \"pos\"):\n", 76 | " os.makedirs(val_dir / category)\n", 77 | " files = os.listdir(train_dir / category)\n", 78 | " random.Random(1337).shuffle(files)\n", 79 | " num_val_samples = int(0.2 * len(files))\n", 80 | " val_files = files[-num_val_samples:]\n", 81 | " for fname in val_files:\n", 82 | " shutil.move(train_dir / category / fname,\n", 83 | " val_dir / category / fname)\n", 84 | "\n", 85 | "train_ds = keras.utils.text_dataset_from_directory(\n", 86 | " \"aclImdb/train\", batch_size=batch_size\n", 87 | ")\n", 88 | "val_ds = 
keras.utils.text_dataset_from_directory(\n", 89 | " \"aclImdb/val\", batch_size=batch_size\n", 90 | ")\n", 91 | "test_ds = keras.utils.text_dataset_from_directory(\n", 92 | " \"aclImdb/test\", batch_size=batch_size\n", 93 | ")\n", 94 | "text_only_train_ds = train_ds.map(lambda x, y: x)" 95 | ] 96 | }, 97 | { 98 | "cell_type": "markdown", 99 | "metadata": { 100 | "colab_type": "text" 101 | }, 102 | "source": [ 103 | "**Preparing integer sequence datasets**" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 0, 109 | "metadata": { 110 | "colab_type": "code" 111 | }, 112 | "outputs": [], 113 | "source": [ 114 | "from tensorflow.keras import layers\n", 115 | "\n", 116 | "max_length = 600\n", 117 | "max_tokens = 20000\n", 118 | "text_vectorization = layers.TextVectorization(\n", 119 | " max_tokens=max_tokens,\n", 120 | " output_mode=\"int\",\n", 121 | " output_sequence_length=max_length,\n", 122 | ")\n", 123 | "text_vectorization.adapt(text_only_train_ds)\n", 124 | "\n", 125 | "int_train_ds = train_ds.map(\n", 126 | " lambda x, y: (text_vectorization(x), y),\n", 127 | " num_parallel_calls=4)\n", 128 | "int_val_ds = val_ds.map(\n", 129 | " lambda x, y: (text_vectorization(x), y),\n", 130 | " num_parallel_calls=4)\n", 131 | "int_test_ds = test_ds.map(\n", 132 | " lambda x, y: (text_vectorization(x), y),\n", 133 | " num_parallel_calls=4)" 134 | ] 135 | }, 136 | { 137 | "cell_type": "markdown", 138 | "metadata": { 139 | "colab_type": "text" 140 | }, 141 | "source": [ 142 | "**A sequence model built on one-hot encoded vector sequences**" 143 | ] 144 | }, 145 | { 146 | "cell_type": "code", 147 | "execution_count": 0, 148 | "metadata": { 149 | "colab_type": "code" 150 | }, 151 | "outputs": [], 152 | "source": [ 153 | "import tensorflow as tf\n", 154 | "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", 155 | "embedded = tf.one_hot(inputs, depth=max_tokens)\n", 156 | "x = layers.Bidirectional(layers.LSTM(32))(embedded)\n", 157 | "x = layers.Dropout(0.5)(x)\n", 158 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 159 | "model = keras.Model(inputs, outputs)\n", 160 | "model.compile(optimizer=\"rmsprop\",\n", 161 | " loss=\"binary_crossentropy\",\n", 162 | " metrics=[\"accuracy\"])\n", 163 | "model.summary()" 164 | ] 165 | }, 166 | { 167 | "cell_type": "markdown", 168 | "metadata": { 169 | "colab_type": "text" 170 | }, 171 | "source": [ 172 | "**Training a first basic sequence model**" 173 | ] 174 | }, 175 | { 176 | "cell_type": "code", 177 | "execution_count": 0, 178 | "metadata": { 179 | "colab_type": "code" 180 | }, 181 | "outputs": [], 182 | "source": [ 183 | "callbacks = [\n", 184 | " keras.callbacks.ModelCheckpoint(\"one_hot_bidir_lstm.keras\",\n", 185 | " save_best_only=True)\n", 186 | "]\n", 187 | "model.fit(int_train_ds, validation_data=int_val_ds, epochs=10, callbacks=callbacks)\n", 188 | "model = keras.models.load_model(\"one_hot_bidir_lstm.keras\")\n", 189 | "print(f\"Test acc: {model.evaluate(int_test_ds)[1]:.3f}\")" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": { 195 | "colab_type": "text" 196 | }, 197 | "source": [ 198 | "#### Understanding word embeddings" 199 | ] 200 | }, 201 | { 202 | "cell_type": "markdown", 203 | "metadata": { 204 | "colab_type": "text" 205 | }, 206 | "source": [ 207 | "#### Learning word embeddings with the Embedding layer" 208 | ] 209 | }, 210 | { 211 | "cell_type": "markdown", 212 | "metadata": { 213 | "colab_type": "text" 214 | }, 215 | "source": [ 216 | "**Instantiating an `Embedding` layer**" 
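, "\n\nAfter running the next cell, a quick shape check makes the layer's behavior concrete (hypothetical snippet, not from the book): each integer token index is mapped to a 256-dimensional vector.\n\n```python\nimport tensorflow as tf\n\n# A batch of one sequence with three token indices -> three 256-d vectors.\nprint(embedding_layer(tf.constant([[4, 25, 7]])).shape)  # (1, 3, 256)\n```"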
217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": 0, 222 | "metadata": { 223 | "colab_type": "code" 224 | }, 225 | "outputs": [], 226 | "source": [ 227 | "embedding_layer = layers.Embedding(input_dim=max_tokens, output_dim=256)" 228 | ] 229 | }, 230 | { 231 | "cell_type": "markdown", 232 | "metadata": { 233 | "colab_type": "text" 234 | }, 235 | "source": [ 236 | "**Model that uses an `Embedding` layer trained from scratch**" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 0, 242 | "metadata": { 243 | "colab_type": "code" 244 | }, 245 | "outputs": [], 246 | "source": [ 247 | "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", 248 | "embedded = layers.Embedding(input_dim=max_tokens, output_dim=256)(inputs)\n", 249 | "x = layers.Bidirectional(layers.LSTM(32))(embedded)\n", 250 | "x = layers.Dropout(0.5)(x)\n", 251 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 252 | "model = keras.Model(inputs, outputs)\n", 253 | "model.compile(optimizer=\"rmsprop\",\n", 254 | " loss=\"binary_crossentropy\",\n", 255 | " metrics=[\"accuracy\"])\n", 256 | "model.summary()\n", 257 | "\n", 258 | "callbacks = [\n", 259 | " keras.callbacks.ModelCheckpoint(\"embeddings_bidir_gru.keras\",\n", 260 | " save_best_only=True)\n", 261 | "]\n", 262 | "model.fit(int_train_ds, validation_data=int_val_ds, epochs=10, callbacks=callbacks)\n", 263 | "model = keras.models.load_model(\"embeddings_bidir_gru.keras\")\n", 264 | "print(f\"Test acc: {model.evaluate(int_test_ds)[1]:.3f}\")" 265 | ] 266 | }, 267 | { 268 | "cell_type": "markdown", 269 | "metadata": { 270 | "colab_type": "text" 271 | }, 272 | "source": [ 273 | "#### Understanding padding and masking" 274 | ] 275 | }, 276 | { 277 | "cell_type": "markdown", 278 | "metadata": { 279 | "colab_type": "text" 280 | }, 281 | "source": [ 282 | "**Using an `Embedding` layer with masking enabled**" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": 0, 288 | "metadata": { 289 | "colab_type": "code" 290 | }, 291 | "outputs": [], 292 | "source": [ 293 | "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", 294 | "embedded = layers.Embedding(\n", 295 | " input_dim=max_tokens, output_dim=256, mask_zero=True)(inputs)\n", 296 | "x = layers.Bidirectional(layers.LSTM(32))(embedded)\n", 297 | "x = layers.Dropout(0.5)(x)\n", 298 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 299 | "model = keras.Model(inputs, outputs)\n", 300 | "model.compile(optimizer=\"rmsprop\",\n", 301 | " loss=\"binary_crossentropy\",\n", 302 | " metrics=[\"accuracy\"])\n", 303 | "model.summary()\n", 304 | "\n", 305 | "callbacks = [\n", 306 | " keras.callbacks.ModelCheckpoint(\"embeddings_bidir_gru_with_masking.keras\",\n", 307 | " save_best_only=True)\n", 308 | "]\n", 309 | "model.fit(int_train_ds, validation_data=int_val_ds, epochs=10, callbacks=callbacks)\n", 310 | "model = keras.models.load_model(\"embeddings_bidir_gru_with_masking.keras\")\n", 311 | "print(f\"Test acc: {model.evaluate(int_test_ds)[1]:.3f}\")" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": { 317 | "colab_type": "text" 318 | }, 319 | "source": [ 320 | "#### Using pretrained word embeddings" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": 0, 326 | "metadata": { 327 | "colab_type": "code" 328 | }, 329 | "outputs": [], 330 | "source": [ 331 | "!wget http://nlp.stanford.edu/data/glove.6B.zip\n", 332 | "!unzip -q glove.6B.zip" 333 | ] 334 | }, 335 | { 336 | "cell_type": "markdown", 337 | 
"metadata": { 338 | "colab_type": "text" 339 | }, 340 | "source": [ 341 | "**Parsing the GloVe word-embeddings file**" 342 | ] 343 | }, 344 | { 345 | "cell_type": "code", 346 | "execution_count": 0, 347 | "metadata": { 348 | "colab_type": "code" 349 | }, 350 | "outputs": [], 351 | "source": [ 352 | "import numpy as np\n", 353 | "path_to_glove_file = \"glove.6B.100d.txt\"\n", 354 | "\n", 355 | "embeddings_index = {}\n", 356 | "with open(path_to_glove_file) as f:\n", 357 | " for line in f:\n", 358 | " word, coefs = line.split(maxsplit=1)\n", 359 | " coefs = np.fromstring(coefs, \"f\", sep=\" \")\n", 360 | " embeddings_index[word] = coefs\n", 361 | "\n", 362 | "print(f\"Found {len(embeddings_index)} word vectors.\")" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": { 368 | "colab_type": "text" 369 | }, 370 | "source": [ 371 | "**Preparing the GloVe word-embeddings matrix**" 372 | ] 373 | }, 374 | { 375 | "cell_type": "code", 376 | "execution_count": 0, 377 | "metadata": { 378 | "colab_type": "code" 379 | }, 380 | "outputs": [], 381 | "source": [ 382 | "embedding_dim = 100\n", 383 | "\n", 384 | "vocabulary = text_vectorization.get_vocabulary()\n", 385 | "word_index = dict(zip(vocabulary, range(len(vocabulary))))\n", 386 | "\n", 387 | "embedding_matrix = np.zeros((max_tokens, embedding_dim))\n", 388 | "for word, i in word_index.items():\n", 389 | " if i < max_tokens:\n", 390 | " embedding_vector = embeddings_index.get(word)\n", 391 | " if embedding_vector is not None:\n", 392 | " embedding_matrix[i] = embedding_vector" 393 | ] 394 | }, 395 | { 396 | "cell_type": "code", 397 | "execution_count": 0, 398 | "metadata": { 399 | "colab_type": "code" 400 | }, 401 | "outputs": [], 402 | "source": [ 403 | "embedding_layer = layers.Embedding(\n", 404 | " max_tokens,\n", 405 | " embedding_dim,\n", 406 | " embeddings_initializer=keras.initializers.Constant(embedding_matrix),\n", 407 | " trainable=False,\n", 408 | " mask_zero=True,\n", 409 | ")" 410 | ] 411 | }, 412 | { 413 | "cell_type": "markdown", 414 | "metadata": { 415 | "colab_type": "text" 416 | }, 417 | "source": [ 418 | "**Model that uses a pretrained Embedding layer**" 419 | ] 420 | }, 421 | { 422 | "cell_type": "code", 423 | "execution_count": 0, 424 | "metadata": { 425 | "colab_type": "code" 426 | }, 427 | "outputs": [], 428 | "source": [ 429 | "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", 430 | "embedded = embedding_layer(inputs)\n", 431 | "x = layers.Bidirectional(layers.LSTM(32))(embedded)\n", 432 | "x = layers.Dropout(0.5)(x)\n", 433 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 434 | "model = keras.Model(inputs, outputs)\n", 435 | "model.compile(optimizer=\"rmsprop\",\n", 436 | " loss=\"binary_crossentropy\",\n", 437 | " metrics=[\"accuracy\"])\n", 438 | "model.summary()\n", 439 | "\n", 440 | "callbacks = [\n", 441 | " keras.callbacks.ModelCheckpoint(\"glove_embeddings_sequence_model.keras\",\n", 442 | " save_best_only=True)\n", 443 | "]\n", 444 | "model.fit(int_train_ds, validation_data=int_val_ds, epochs=10, callbacks=callbacks)\n", 445 | "model = keras.models.load_model(\"glove_embeddings_sequence_model.keras\")\n", 446 | "print(f\"Test acc: {model.evaluate(int_test_ds)[1]:.3f}\")" 447 | ] 448 | } 449 | ], 450 | "metadata": { 451 | "colab": { 452 | "collapsed_sections": [], 453 | "name": "chapter11_part02_sequence-models.i", 454 | "private_outputs": false, 455 | "provenance": [], 456 | "toc_visible": true 457 | }, 458 | "kernelspec": { 459 | "display_name": "Python 3", 460 | "language": 
"python", 461 | "name": "python3" 462 | }, 463 | "language_info": { 464 | "codemirror_mode": { 465 | "name": "ipython", 466 | "version": 3 467 | }, 468 | "file_extension": ".py", 469 | "mimetype": "text/x-python", 470 | "name": "python", 471 | "nbconvert_exporter": "python", 472 | "pygments_lexer": "ipython3", 473 | "version": "3.7.0" 474 | } 475 | }, 476 | "nbformat": 4, 477 | "nbformat_minor": 0 478 | } 479 | -------------------------------------------------------------------------------- /second_edition/chapter14_conclusions.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "# Conclusions" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "## Key concepts in review" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "### Various approaches to AI" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "colab_type": "text" 43 | }, 44 | "source": [ 45 | "### What makes deep learning special within the field of machine learning" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "colab_type": "text" 52 | }, 53 | "source": [ 54 | "### How to think about deep learning" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "colab_type": "text" 61 | }, 62 | "source": [ 63 | "### Key enabling technologies" 64 | ] 65 | }, 66 | { 67 | "cell_type": "markdown", 68 | "metadata": { 69 | "colab_type": "text" 70 | }, 71 | "source": [ 72 | "### The universal machine-learning workflow" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": { 78 | "colab_type": "text" 79 | }, 80 | "source": [ 81 | "### Key network architectures" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": { 87 | "colab_type": "text" 88 | }, 89 | "source": [ 90 | "#### Densely connected networks" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 0, 96 | "metadata": { 97 | "colab_type": "code" 98 | }, 99 | "outputs": [], 100 | "source": [ 101 | "from tensorflow import keras\n", 102 | "from tensorflow.keras\u00a0import\u00a0layers\n", 103 | "inputs = keras.Input(shape=(num_input_features,))\n", 104 | "x = layers.Dense(32,\u00a0activation=\"relu\")(inputs)\n", 105 | "x = layers.Dense(32,\u00a0activation=\"relu\")(x)\n", 106 | "outputs = layers.Dense(1,\u00a0activation=\"sigmoid\")(x)\n", 107 | "model = keras.Model(inputs, outputs)\n", 108 | "model.compile(optimizer=\"rmsprop\",\u00a0loss=\"binary_crossentropy\")" 109 | ] 110 | }, 111 | { 112 | "cell_type": "code", 113 | "execution_count": 0, 114 | "metadata": { 115 | "colab_type": "code" 116 | }, 117 | "outputs": 
111 | { 112 | "cell_type": "code", 113 | "execution_count": 0, 114 | "metadata": { 115 | "colab_type": "code" 116 | }, 117 | "outputs": [], 118 | "source": [ 119 | "inputs = keras.Input(shape=(num_input_features,))\n", 120 | "x = layers.Dense(32, activation=\"relu\")(inputs)\n", 121 | "x = layers.Dense(32, activation=\"relu\")(x)\n", 122 | "outputs = layers.Dense(num_classes, activation=\"softmax\")(x)\n", 123 | "model = keras.Model(inputs, outputs)\n", 124 | "model.compile(optimizer=\"rmsprop\", loss=\"categorical_crossentropy\")" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 0, 130 | "metadata": { 131 | "colab_type": "code" 132 | }, 133 | "outputs": [], 134 | "source": [ 135 | "inputs = keras.Input(shape=(num_input_features,))\n", 136 | "x = layers.Dense(32, activation=\"relu\")(inputs)\n", 137 | "x = layers.Dense(32, activation=\"relu\")(x)\n", 138 | "outputs = layers.Dense(num_classes, activation=\"sigmoid\")(x)\n", 139 | "model = keras.Model(inputs, outputs)\n", 140 | "model.compile(optimizer=\"rmsprop\", loss=\"binary_crossentropy\")" 141 | ] 142 | }, 143 | { 144 | "cell_type": "code", 145 | "execution_count": 0, 146 | "metadata": { 147 | "colab_type": "code" 148 | }, 149 | "outputs": [], 150 | "source": [ 151 | "inputs = keras.Input(shape=(num_input_features,))\n", 152 | "x = layers.Dense(32, activation=\"relu\")(inputs)\n", 153 | "x = layers.Dense(32, activation=\"relu\")(x)\n", 154 | "outputs = layers.Dense(num_values)(x)\n", 155 | "model = keras.Model(inputs, outputs)\n", 156 | "model.compile(optimizer=\"rmsprop\", loss=\"mse\")" 157 | ] 158 | }, 159 | { 160 | "cell_type": "markdown", 161 | "metadata": { 162 | "colab_type": "text" 163 | }, 164 | "source": [ 165 | "#### Convnets" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 0, 171 | "metadata": { 172 | "colab_type": "code" 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "inputs = keras.Input(shape=(height, width, channels))\n", 177 | "x = layers.SeparableConv2D(32, 3, activation=\"relu\")(inputs)\n", 178 | "x = layers.SeparableConv2D(64, 3, activation=\"relu\")(x)\n", 179 | "x = layers.MaxPooling2D(2)(x)\n", 180 | "x = layers.SeparableConv2D(64, 3, activation=\"relu\")(x)\n", 181 | "x = layers.SeparableConv2D(128, 3, activation=\"relu\")(x)\n", 182 | "x = layers.MaxPooling2D(2)(x)\n", 183 | "x = layers.SeparableConv2D(64, 3, activation=\"relu\")(x)\n", 184 | "x = layers.SeparableConv2D(128, 3, activation=\"relu\")(x)\n", 185 | "x = layers.GlobalAveragePooling2D()(x)\n", 186 | "x = layers.Dense(32, activation=\"relu\")(x)\n", 187 | "outputs = layers.Dense(num_classes, activation=\"softmax\")(x)\n", 188 | "model = keras.Model(inputs, outputs)\n", 189 | "model.compile(optimizer=\"rmsprop\", loss=\"categorical_crossentropy\")" 190 | ] 191 | }, 192 | { 193 | "cell_type": "markdown", 194 | "metadata": { 195 | "colab_type": "text" 196 | }, 197 | "source": [ 198 | "#### RNNs" 199 | ] 200 | }, 201 | { 202 | "cell_type": "code", 203 | "execution_count": 0, 204 | "metadata": { 205 | "colab_type": "code" 206 | }, 207 | "outputs": [], 208 | "source": [ 209 | "inputs = keras.Input(shape=(num_timesteps, num_features))\n", 210 | "x = layers.LSTM(32)(inputs)\n", 211 | "outputs = layers.Dense(num_classes, activation=\"sigmoid\")(x)\n", 212 | "model = keras.Model(inputs, outputs)\n", 213 | "model.compile(optimizer=\"rmsprop\", loss=\"binary_crossentropy\")" 214 | ] 215 | }, 216 | { 217 | "cell_type": "code", 218 | "execution_count": 0, 219 | "metadata": { 220 | "colab_type": "code" 221 | }, 222 | "outputs": [], 223 | "source": [ 224 | "inputs = keras.Input(shape=(num_timesteps, num_features))\n",
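"# Intermediate recurrent layers must return their full output sequence\n", "# (one vector per timestep) so that the next LSTM has a sequence to\n", "# consume; only the final LSTM collapses its input to a single vector.\n",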
"colab_type": "code" 221 | }, 222 | "outputs": [], 223 | "source": [ 224 | "inputs = keras.Input(shape=(num_timesteps,\u00a0num_features))\n", 225 | "x = layers.LSTM(32,\u00a0return_sequences=True)(inputs)\n", 226 | "x = layers.LSTM(32,\u00a0return_sequences=True)(x)\n", 227 | "x = layers.LSTM(32)(x)\n", 228 | "outputs = layers.Dense(num_classes,\u00a0activation=\"sigmoid\")(x)\n", 229 | "model = keras.Model(inputs, outputs)\n", 230 | "model.compile(optimizer=\"rmsprop\",\u00a0loss=\"binary_crossentropy\")" 231 | ] 232 | }, 233 | { 234 | "cell_type": "markdown", 235 | "metadata": { 236 | "colab_type": "text" 237 | }, 238 | "source": [ 239 | "#### Transformers" 240 | ] 241 | }, 242 | { 243 | "cell_type": "code", 244 | "execution_count": 0, 245 | "metadata": { 246 | "colab_type": "code" 247 | }, 248 | "outputs": [], 249 | "source": [ 250 | "encoder_inputs = keras.Input(shape=(sequence_length,), dtype=\"int64\")\n", 251 | "x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(encoder_inputs)\n", 252 | "encoder_outputs = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)\n", 253 | "decoder_inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", 254 | "x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(decoder_inputs)\n", 255 | "x = TransformerDecoder(embed_dim, dense_dim, num_heads)(x, encoder_outputs)\n", 256 | "decoder_outputs = layers.Dense(vocab_size, activation=\"softmax\")(x)\n", 257 | "transformer = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)\n", 258 | "transformer.compile(optimizer=\"rmsprop\", loss=\"categorical_crossentropy\")" 259 | ] 260 | }, 261 | { 262 | "cell_type": "code", 263 | "execution_count": 0, 264 | "metadata": { 265 | "colab_type": "code" 266 | }, 267 | "outputs": [], 268 | "source": [ 269 | "inputs = keras.Input(shape=(sequence_length,), dtype=\"int64\")\n", 270 | "x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(inputs)\n", 271 | "x = TransformerEncoder(embed_dim, dense_dim, num_heads)(x)\n", 272 | "x = layers.GlobalMaxPooling1D()(x)\n", 273 | "outputs = layers.Dense(1, activation=\"sigmoid\")(x)\n", 274 | "model = keras.Model(inputs, outputs)\n", 275 | "model.compile(optimizer=\"rmsprop\", loss=\"binary_crossentropy\")" 276 | ] 277 | }, 278 | { 279 | "cell_type": "markdown", 280 | "metadata": { 281 | "colab_type": "text" 282 | }, 283 | "source": [ 284 | "### The space of possibilities" 285 | ] 286 | }, 287 | { 288 | "cell_type": "markdown", 289 | "metadata": { 290 | "colab_type": "text" 291 | }, 292 | "source": [ 293 | "## The limitations of deep learning" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": { 299 | "colab_type": "text" 300 | }, 301 | "source": [ 302 | "### The risk of anthropomorphizing machine-learning models" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": { 308 | "colab_type": "text" 309 | }, 310 | "source": [ 311 | "### Automatons vs. intelligent agents" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": { 317 | "colab_type": "text" 318 | }, 319 | "source": [ 320 | "### Local generalization vs. 
extreme generalization" 321 | ] 322 | }, 323 | { 324 | "cell_type": "markdown", 325 | "metadata": { 326 | "colab_type": "text" 327 | }, 328 | "source": [ 329 | "### The purpose of intelligence" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": { 335 | "colab_type": "text" 336 | }, 337 | "source": [ 338 | "### Climbing the spectrum of generalization" 339 | ] 340 | }, 341 | { 342 | "cell_type": "markdown", 343 | "metadata": { 344 | "colab_type": "text" 345 | }, 346 | "source": [ 347 | "## Setting the course toward greater generality in AI" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": { 353 | "colab_type": "text" 354 | }, 355 | "source": [ 356 | "### On the importance of setting the right objective: The shortcut rule" 357 | ] 358 | }, 359 | { 360 | "cell_type": "markdown", 361 | "metadata": { 362 | "colab_type": "text" 363 | }, 364 | "source": [ 365 | "### A new target" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": { 371 | "colab_type": "text" 372 | }, 373 | "source": [ 374 | "## Implementing intelligence: The missing ingredients" 375 | ] 376 | }, 377 | { 378 | "cell_type": "markdown", 379 | "metadata": { 380 | "colab_type": "text" 381 | }, 382 | "source": [ 383 | "### Intelligence as sensitivity to abstract analogies" 384 | ] 385 | }, 386 | { 387 | "cell_type": "markdown", 388 | "metadata": { 389 | "colab_type": "text" 390 | }, 391 | "source": [ 392 | "### The two poles of abstraction" 393 | ] 394 | }, 395 | { 396 | "cell_type": "markdown", 397 | "metadata": { 398 | "colab_type": "text" 399 | }, 400 | "source": [ 401 | "#### Value-centric analogy" 402 | ] 403 | }, 404 | { 405 | "cell_type": "markdown", 406 | "metadata": { 407 | "colab_type": "text" 408 | }, 409 | "source": [ 410 | "#### Program-centric analogy" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": { 416 | "colab_type": "text" 417 | }, 418 | "source": [ 419 | "#### Cognition as a combination of both kinds of abstraction" 420 | ] 421 | }, 422 | { 423 | "cell_type": "markdown", 424 | "metadata": { 425 | "colab_type": "text" 426 | }, 427 | "source": [ 428 | "### The missing half of the picture" 429 | ] 430 | }, 431 | { 432 | "cell_type": "markdown", 433 | "metadata": { 434 | "colab_type": "text" 435 | }, 436 | "source": [ 437 | "## The future of deep learning" 438 | ] 439 | }, 440 | { 441 | "cell_type": "markdown", 442 | "metadata": { 443 | "colab_type": "text" 444 | }, 445 | "source": [ 446 | "### Models as programs" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": { 452 | "colab_type": "text" 453 | }, 454 | "source": [ 455 | "### Blending together deep learning and program synthesis" 456 | ] 457 | }, 458 | { 459 | "cell_type": "markdown", 460 | "metadata": { 461 | "colab_type": "text" 462 | }, 463 | "source": [ 464 | "#### Integrating deep-learning modules and algorithmic modules into hybrid systems" 465 | ] 466 | }, 467 | { 468 | "cell_type": "markdown", 469 | "metadata": { 470 | "colab_type": "text" 471 | }, 472 | "source": [ 473 | "#### Using deep learning to guide program search" 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": { 479 | "colab_type": "text" 480 | }, 481 | "source": [ 482 | "### Lifelong learning and modular subroutine reuse" 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": { 488 | "colab_type": "text" 489 | }, 490 | "source": [ 491 | "### The long-term vision" 492 | ] 493 | }, 494 | { 495 | "cell_type": "markdown", 496 | "metadata": { 
497 | "colab_type": "text" 498 | }, 499 | "source": [ 500 | "## Staying up to date in a fast-moving field" 501 | ] 502 | }, 503 | { 504 | "cell_type": "markdown", 505 | "metadata": { 506 | "colab_type": "text" 507 | }, 508 | "source": [ 509 | "### Practice on real-world problems using Kaggle" 510 | ] 511 | }, 512 | { 513 | "cell_type": "markdown", 514 | "metadata": { 515 | "colab_type": "text" 516 | }, 517 | "source": [ 518 | "### Read about the latest developments on arXiv" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": { 524 | "colab_type": "text" 525 | }, 526 | "source": [ 527 | "### Explore the Keras ecosystem" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": { 533 | "colab_type": "text" 534 | }, 535 | "source": [ 536 | "## Final words" 537 | ] 538 | } 539 | ], 540 | "metadata": { 541 | "colab": { 542 | "collapsed_sections": [], 543 | "name": "chapter14_conclusions.i", 544 | "private_outputs": false, 545 | "provenance": [], 546 | "toc_visible": true 547 | }, 548 | "kernelspec": { 549 | "display_name": "Python 3", 550 | "language": "python", 551 | "name": "python3" 552 | }, 553 | "language_info": { 554 | "codemirror_mode": { 555 | "name": "ipython", 556 | "version": 3 557 | }, 558 | "file_extension": ".py", 559 | "mimetype": "text/x-python", 560 | "name": "python", 561 | "nbconvert_exporter": "python", 562 | "pygments_lexer": "ipython3", 563 | "version": "3.7.0" 564 | } 565 | }, 566 | "nbformat": 4, 567 | "nbformat_minor": 0 568 | } -------------------------------------------------------------------------------- /first_edition/2.1-a-first-look-at-a-neural-network.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | "Using TensorFlow backend.\n" 13 | ] 14 | }, 15 | { 16 | "data": { 17 | "text/plain": [ 18 | "'2.0.8'" 19 | ] 20 | }, 21 | "execution_count": 1, 22 | "metadata": {}, 23 | "output_type": "execute_result" 24 | } 25 | ], 26 | "source": [ 27 | "import keras\n", 28 | "keras.__version__" 29 | ] 30 | }, 31 | { 32 | "cell_type": "markdown", 33 | "metadata": {}, 34 | "source": [ 35 | "# A first look at a neural network\n", 36 | "\n", 37 | "This notebook contains the code samples found in Chapter 2, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.\n", 38 | "\n", 39 | "----\n", 40 | "\n", 41 | "We will now take a look at a first concrete example of a neural network, which makes use of the Python library Keras to learn to classify \n", 42 | "hand-written digits. Unless you already have experience with Keras or similar libraries, you will not understand everything about this \n", 43 | "first example right away. You probably haven't even installed Keras yet. Don't worry, that is perfectly fine. In the next chapter, we will \n", 44 | "review each element in our example and explain them in detail. So don't worry if some steps seem arbitrary or look like magic to you! 
\n", 45 | "We've got to start somewhere.\n", 46 | "\n", 47 | "The problem we are trying to solve here is to classify grayscale images of handwritten digits (28 pixels by 28 pixels), into their 10 \n", 48 | "categories (0 to 9). The dataset we will use is the MNIST dataset, a classic dataset in the machine learning community, which has been \n", 49 | "around for almost as long as the field itself and has been very intensively studied. It's a set of 60,000 training images, plus 10,000 test \n", 50 | "images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. You can think of \"solving\" MNIST \n", 51 | "as the \"Hello World\" of deep learning -- it's what you do to verify that your algorithms are working as expected. As you become a machine \n", 52 | "learning practitioner, you will see MNIST come up over and over again, in scientific papers, blog posts, and so on." 53 | ] 54 | }, 55 | { 56 | "cell_type": "markdown", 57 | "metadata": {}, 58 | "source": [ 59 | "The MNIST dataset comes pre-loaded in Keras, in the form of a set of four Numpy arrays:" 60 | ] 61 | }, 62 | { 63 | "cell_type": "code", 64 | "execution_count": 2, 65 | "metadata": { 66 | "collapsed": true 67 | }, 68 | "outputs": [], 69 | "source": [ 70 | "from keras.datasets import mnist\n", 71 | "\n", 72 | "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()" 73 | ] 74 | }, 75 | { 76 | "cell_type": "markdown", 77 | "metadata": {}, 78 | "source": [ 79 | "`train_images` and `train_labels` form the \"training set\", the data that the model will learn from. The model will then be tested on the \n", 80 | "\"test set\", `test_images` and `test_labels`. Our images are encoded as Numpy arrays, and the labels are simply an array of digits, ranging \n", 81 | "from 0 to 9. 
There is a one-to-one correspondence between the images and the labels.\n", 82 | "\n", 83 | "Let's have a look at the training data:" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 3, 89 | "metadata": {}, 90 | "outputs": [ 91 | { 92 | "data": { 93 | "text/plain": [ 94 | "(60000, 28, 28)" 95 | ] 96 | }, 97 | "execution_count": 3, 98 | "metadata": {}, 99 | "output_type": "execute_result" 100 | } 101 | ], 102 | "source": [ 103 | "train_images.shape" 104 | ] 105 | }, 106 | { 107 | "cell_type": "code", 108 | "execution_count": 4, 109 | "metadata": {}, 110 | "outputs": [ 111 | { 112 | "data": { 113 | "text/plain": [ 114 | "60000" 115 | ] 116 | }, 117 | "execution_count": 4, 118 | "metadata": {}, 119 | "output_type": "execute_result" 120 | } 121 | ], 122 | "source": [ 123 | "len(train_labels)" 124 | ] 125 | }, 126 | { 127 | "cell_type": "code", 128 | "execution_count": 5, 129 | "metadata": {}, 130 | "outputs": [ 131 | { 132 | "data": { 133 | "text/plain": [ 134 | "array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)" 135 | ] 136 | }, 137 | "execution_count": 5, 138 | "metadata": {}, 139 | "output_type": "execute_result" 140 | } 141 | ], 142 | "source": [ 143 | "train_labels" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "Let's have a look at the test data:" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 6, 156 | "metadata": {}, 157 | "outputs": [ 158 | { 159 | "data": { 160 | "text/plain": [ 161 | "(10000, 28, 28)" 162 | ] 163 | }, 164 | "execution_count": 6, 165 | "metadata": {}, 166 | "output_type": "execute_result" 167 | } 168 | ], 169 | "source": [ 170 | "test_images.shape" 171 | ] 172 | }, 173 | { 174 | "cell_type": "code", 175 | "execution_count": 7, 176 | "metadata": {}, 177 | "outputs": [ 178 | { 179 | "data": { 180 | "text/plain": [ 181 | "10000" 182 | ] 183 | }, 184 | "execution_count": 7, 185 | "metadata": {}, 186 | "output_type": "execute_result" 187 | } 188 | ], 189 | "source": [ 190 | "len(test_labels)" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": 8, 196 | "metadata": {}, 197 | "outputs": [ 198 | { 199 | "data": { 200 | "text/plain": [ 201 | "array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)" 202 | ] 203 | }, 204 | "execution_count": 8, 205 | "metadata": {}, 206 | "output_type": "execute_result" 207 | } 208 | ], 209 | "source": [ 210 | "test_labels" 211 | ] 212 | }, 213 | { 214 | "cell_type": "markdown", 215 | "metadata": {}, 216 | "source": [ 217 | "Our workflow will be as follow: first we will present our neural network with the training data, `train_images` and `train_labels`. The \n", 218 | "network will then learn to associate images and labels. Finally, we will ask the network to produce predictions for `test_images`, and we \n", 219 | "will verify if these predictions match the labels from `test_labels`.\n", 220 | "\n", 221 | "Let's build our network -- again, remember that you aren't supposed to understand everything about this example just yet." 
222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 9, 227 | "metadata": { 228 | "collapsed": true 229 | }, 230 | "outputs": [], 231 | "source": [ 232 | "from keras import models\n", 233 | "from keras import layers\n", 234 | "\n", 235 | "network = models.Sequential()\n", 236 | "network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))\n", 237 | "network.add(layers.Dense(10, activation='softmax'))" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": {}, 243 | "source": [ 244 | "\n", 245 | "The core building block of neural networks is the \"layer\", a data-processing module which you can conceive as a \"filter\" for data. Some \n", 246 | "data comes in, and comes out in a more useful form. Precisely, layers extract _representations_ out of the data fed into them -- hopefully \n", 247 | "representations that are more meaningful for the problem at hand. Most of deep learning really consists of chaining together simple layers \n", 248 | "which will implement a form of progressive \"data distillation\". A deep learning model is like a sieve for data processing, made of a \n", 249 | "succession of increasingly refined data filters -- the \"layers\".\n", 250 | "\n", 251 | "Here our network consists of a sequence of two `Dense` layers, which are densely-connected (also called \"fully-connected\") neural layers. \n", 252 | "The second (and last) layer is a 10-way \"softmax\" layer, which means it will return an array of 10 probability scores (summing to 1). Each \n", 253 | "score will be the probability that the current digit image belongs to one of our 10 digit classes.\n", 254 | "\n", 255 | "To make our network ready for training, we need to pick three more things, as part of the \"compilation\" step:\n", 256 | "\n", 257 | "* A loss function: this is how the network will be able to measure how good a job it is doing on its training data, and thus how it will be \n", 258 | "able to steer itself in the right direction.\n", 259 | "* An optimizer: this is the mechanism through which the network will update itself based on the data it sees and its loss function.\n", 260 | "* Metrics to monitor during training and testing. Here we will only care about accuracy (the fraction of the images that were correctly \n", 261 | "classified).\n", 262 | "\n", 263 | "The exact purpose of the loss function and the optimizer will be made clear throughout the next two chapters." 264 | ] 265 | }, 266 | { 267 | "cell_type": "code", 268 | "execution_count": 10, 269 | "metadata": { 270 | "collapsed": true 271 | }, 272 | "outputs": [], 273 | "source": [ 274 | "network.compile(optimizer='rmsprop',\n", 275 | " loss='categorical_crossentropy',\n", 276 | " metrics=['accuracy'])" 277 | ] 278 | }, 279 | { 280 | "cell_type": "markdown", 281 | "metadata": {}, 282 | "source": [ 283 | "\n", 284 | "Before training, we will preprocess our data by reshaping it into the shape that the network expects, and scaling it so that all values are in \n", 285 | "the `[0, 1]` interval. Previously, our training images for instance were stored in an array of shape `(60000, 28, 28)` of type `uint8` with \n", 286 | "values in the `[0, 255]` interval. We transform it into a `float32` array of shape `(60000, 28 * 28)` with values between 0 and 1."
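, "\n", "\n", "For instance, a single pixel value of 128 becomes `128 / 255 ≈ 0.502` after rescaling, and each image is flattened from `(28, 28)` into a vector of `28 * 28 = 784` values."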
287 | ] 288 | }, 289 | { 290 | "cell_type": "code", 291 | "execution_count": 11, 292 | "metadata": { 293 | "collapsed": true 294 | }, 295 | "outputs": [], 296 | "source": [ 297 | "train_images = train_images.reshape((60000, 28 * 28))\n", 298 | "train_images = train_images.astype('float32') / 255\n", 299 | "\n", 300 | "test_images = test_images.reshape((10000, 28 * 28))\n", 301 | "test_images = test_images.astype('float32') / 255" 302 | ] 303 | }, 304 | { 305 | "cell_type": "markdown", 306 | "metadata": {}, 307 | "source": [ 308 | "We also need to categorically encode the labels, a step which we explain in chapter 3:" 309 | ] 310 | }, 311 | { 312 | "cell_type": "code", 313 | "execution_count": 12, 314 | "metadata": { 315 | "collapsed": true 316 | }, 317 | "outputs": [], 318 | "source": [ 319 | "from keras.utils import to_categorical\n", 320 | "\n", 321 | "train_labels = to_categorical(train_labels)\n", 322 | "test_labels = to_categorical(test_labels)" 323 | ] 324 | }, 325 | { 326 | "cell_type": "markdown", 327 | "metadata": {}, 328 | "source": [ 329 | "We are now ready to train our network, which in Keras is done via a call to the `fit` method of the network: \n", 330 | "we \"fit\" the model to its training data." 331 | ] 332 | }, 333 | { 334 | "cell_type": "code", 335 | "execution_count": 13, 336 | "metadata": {}, 337 | "outputs": [ 338 | { 339 | "name": "stdout", 340 | "output_type": "stream", 341 | "text": [ 342 | "Epoch 1/5\n", 343 | "60000/60000 [==============================] - 2s - loss: 0.2577 - acc: 0.9245 \n", 344 | "Epoch 2/5\n", 345 | "60000/60000 [==============================] - 1s - loss: 0.1042 - acc: 0.9690 \n", 346 | "Epoch 3/5\n", 347 | "60000/60000 [==============================] - 1s - loss: 0.0687 - acc: 0.9793 \n", 348 | "Epoch 4/5\n", 349 | "60000/60000 [==============================] - 1s - loss: 0.0508 - acc: 0.9848 \n", 350 | "Epoch 5/5\n", 351 | "60000/60000 [==============================] - 1s - loss: 0.0382 - acc: 0.9890 \n" 352 | ] 353 | }, 354 | { 355 | "data": { 356 | "text/plain": [ 357 | "" 358 | ] 359 | }, 360 | "execution_count": 13, 361 | "metadata": {}, 362 | "output_type": "execute_result" 363 | } 364 | ], 365 | "source": [ 366 | "network.fit(train_images, train_labels, epochs=5, batch_size=128)" 367 | ] 368 | }, 369 | { 370 | "cell_type": "markdown", 371 | "metadata": {}, 372 | "source": [ 373 | "Two quantities are being displayed during training: the \"loss\" of the network over the training data, and the accuracy of the network over \n", 374 | "the training data.\n", 375 | "\n", 376 | "We quickly reach an accuracy of 0.989 (i.e. 98.9%) on the training data. Now let's check that our model performs well on the test set too:" 377 | ] 378 | }, 379 | { 380 | "cell_type": "code", 381 | "execution_count": 14, 382 | "metadata": {}, 383 | "outputs": [ 384 | { 385 | "name": "stdout", 386 | "output_type": "stream", 387 | "text": [ 388 | " 9536/10000 [===========================>..] 
- ETA: 0s" 389 | ] 390 | } 391 | ], 392 | "source": [ 393 | "test_loss, test_acc = network.evaluate(test_images, test_labels)" 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 15, 399 | "metadata": {}, 400 | "outputs": [ 401 | { 402 | "name": "stdout", 403 | "output_type": "stream", 404 | "text": [ 405 | "test_acc: 0.9777\n" 406 | ] 407 | } 408 | ], 409 | "source": [ 410 | "print('test_acc:', test_acc)" 411 | ] 412 | }, 413 | { 414 | "cell_type": "markdown", 415 | "metadata": {}, 416 | "source": [ 417 | "\n", 418 | "Our test set accuracy turns out to be 97.8% -- that's quite a bit lower than the training set accuracy. \n", 419 | "This gap between training accuracy and test accuracy is an example of \"overfitting\", \n", 420 | "the fact that machine learning models tend to perform worse on new data than on their training data. \n", 421 | "Overfitting will be a central topic in chapter 3.\n", 422 | "\n", 423 | "This concludes our very first example -- you just saw how we could build and train a neural network to classify handwritten digits, in \n", 424 | "less than 20 lines of Python code. In the next chapter, we will go in detail over every moving piece we just previewed, and clarify what is really \n", 425 | "going on behind the scenes. You will learn about \"tensors\", the data-storing objects going into the network, about tensor operations, which \n", 426 | "layers are made of, and about gradient descent, which allows our network to learn from its training examples." 427 | ] 428 | } 429 | ], 430 | "metadata": { 431 | "kernelspec": { 432 | "display_name": "Python 3", 433 | "language": "python", 434 | "name": "python3" 435 | }, 436 | "language_info": { 437 | "codemirror_mode": { 438 | "name": "ipython", 439 | "version": 3 440 | }, 441 | "file_extension": ".py", 442 | "mimetype": "text/x-python", 443 | "name": "python", 444 | "nbconvert_exporter": "python", 445 | "pygments_lexer": "ipython3", 446 | "version": "3.5.2" 447 | } 448 | }, 449 | "nbformat": 4, 450 | "nbformat_minor": 2 451 | } 452 | -------------------------------------------------------------------------------- /chapter18_best-practices-for-the-real-world.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Third Edition](https://www.manning.com/books/deep-learning-with-python-third-edition). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThe book's contents are available online at [deeplearningwithpython.io](https://deeplearningwithpython.io)." 
10 | ] 11 | }, 12 | { 13 | "cell_type": "code", 14 | "execution_count": 0, 15 | "metadata": { 16 | "colab_type": "code" 17 | }, 18 | "outputs": [], 19 | "source": [ 20 | "!pip install keras keras-hub --upgrade -q" 21 | ] 22 | }, 23 | { 24 | "cell_type": "code", 25 | "execution_count": 0, 26 | "metadata": { 27 | "colab_type": "code" 28 | }, 29 | "outputs": [], 30 | "source": [ 31 | "import os\n", 32 | "os.environ[\"KERAS_BACKEND\"] = \"jax\"" 33 | ] 34 | }, 35 | { 36 | "cell_type": "code", 37 | "execution_count": 0, 38 | "metadata": { 39 | "cellView": "form", 40 | "colab_type": "code" 41 | }, 42 | "outputs": [], 43 | "source": [ 44 | "# @title\n", 45 | "import os\n", 46 | "from IPython.core.magic import register_cell_magic\n", 47 | "\n", 48 | "@register_cell_magic\n", 49 | "def backend(line, cell):\n", 50 | " current, required = os.environ.get(\"KERAS_BACKEND\", \"\"), line.split()[-1]\n", 51 | " if current == required:\n", 52 | " get_ipython().run_cell(cell)\n", 53 | " else:\n", 54 | " print(\n", 55 | " f\"This cell requires the {required} backend. To run it, change KERAS_BACKEND to \"\n", 56 | " f\"\\\"{required}\\\" at the top of the notebook, restart the runtime, and rerun the notebook.\"\n", 57 | " )" 58 | ] 59 | }, 60 | { 61 | "cell_type": "markdown", 62 | "metadata": { 63 | "colab_type": "text" 64 | }, 65 | "source": [ 66 | "## Best practices for the real world" 67 | ] 68 | }, 69 | { 70 | "cell_type": "markdown", 71 | "metadata": { 72 | "colab_type": "text" 73 | }, 74 | "source": [ 75 | "### Getting the most out of your models" 76 | ] 77 | }, 78 | { 79 | "cell_type": "markdown", 80 | "metadata": { 81 | "colab_type": "text" 82 | }, 83 | "source": [ 84 | "#### Hyperparameter optimization" 85 | ] 86 | }, 87 | { 88 | "cell_type": "markdown", 89 | "metadata": { 90 | "colab_type": "text" 91 | }, 92 | "source": [ 93 | "##### Using KerasTuner" 94 | ] 95 | }, 96 | { 97 | "cell_type": "code", 98 | "execution_count": 0, 99 | "metadata": { 100 | "colab_type": "code" 101 | }, 102 | "outputs": [], 103 | "source": [ 104 | "!pip install keras-tuner -q" 105 | ] 106 | }, 107 | { 108 | "cell_type": "code", 109 | "execution_count": 0, 110 | "metadata": { 111 | "colab_type": "code" 112 | }, 113 | "outputs": [], 114 | "source": [ 115 | "import keras\n", 116 | "from keras import layers\n", 117 | "\n", 118 | "def build_model(hp):\n", 119 | " units = hp.Int(name=\"units\", min_value=16, max_value=64, step=16)\n", 120 | " model = keras.Sequential(\n", 121 | " [\n", 122 | " layers.Dense(units, activation=\"relu\"),\n", 123 | " layers.Dense(10, activation=\"softmax\"),\n", 124 | " ]\n", 125 | " )\n", 126 | " optimizer = hp.Choice(name=\"optimizer\", values=[\"rmsprop\", \"adam\"])\n", 127 | " model.compile(\n", 128 | " optimizer=optimizer,\n", 129 | " loss=\"sparse_categorical_crossentropy\",\n", 130 | " metrics=[\"accuracy\"],\n", 131 | " )\n", 132 | " return model" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 0, 138 | "metadata": { 139 | "colab_type": "code" 140 | }, 141 | "outputs": [], 142 | "source": [ 143 | "import keras_tuner as kt\n", 144 | "\n", 145 | "class SimpleMLP(kt.HyperModel):\n", 146 | " def __init__(self, num_classes):\n", 147 | " self.num_classes = num_classes\n", 148 | "\n", 149 | " def build(self, hp):\n", 150 | " units = hp.Int(name=\"units\", min_value=16, max_value=64, step=16)\n", 151 | " model = keras.Sequential(\n", 152 | " [\n", 153 | " layers.Dense(units, activation=\"relu\"),\n", 154 | " layers.Dense(self.num_classes, activation=\"softmax\"),\n", 155 
| " ]\n", 156 | " )\n", 157 | " optimizer = hp.Choice(name=\"optimizer\", values=[\"rmsprop\", \"adam\"])\n", 158 | " model.compile(\n", 159 | " optimizer=optimizer,\n", 160 | " loss=\"sparse_categorical_crossentropy\",\n", 161 | " metrics=[\"accuracy\"],\n", 162 | " )\n", 163 | " return model\n", 164 | "\n", 165 | "hypermodel = SimpleMLP(num_classes=10)" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": 0, 171 | "metadata": { 172 | "colab_type": "code" 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "tuner = kt.BayesianOptimization(\n", 177 | " build_model,\n", 178 | " objective=\"val_accuracy\",\n", 179 | " max_trials=20,\n", 180 | " executions_per_trial=2,\n", 181 | " directory=\"mnist_kt_test\",\n", 182 | " overwrite=True,\n", 183 | ")" 184 | ] 185 | }, 186 | { 187 | "cell_type": "code", 188 | "execution_count": 0, 189 | "metadata": { 190 | "colab_type": "code" 191 | }, 192 | "outputs": [], 193 | "source": [ 194 | "tuner.search_space_summary()" 195 | ] 196 | }, 197 | { 198 | "cell_type": "code", 199 | "execution_count": 0, 200 | "metadata": { 201 | "colab_type": "code" 202 | }, 203 | "outputs": [], 204 | "source": [ 205 | "(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()\n", 206 | "x_train = x_train.reshape((-1, 28 * 28)).astype(\"float32\") / 255\n", 207 | "x_test = x_test.reshape((-1, 28 * 28)).astype(\"float32\") / 255\n", 208 | "x_train_full = x_train[:]\n", 209 | "y_train_full = y_train[:]\n", 210 | "num_val_samples = 10000\n", 211 | "x_train, x_val = x_train[:-num_val_samples], x_train[-num_val_samples:]\n", 212 | "y_train, y_val = y_train[:-num_val_samples], y_train[-num_val_samples:]\n", 213 | "callbacks = [\n", 214 | " keras.callbacks.EarlyStopping(monitor=\"val_loss\", patience=5),\n", 215 | "]\n", 216 | "tuner.search(\n", 217 | " x_train,\n", 218 | " y_train,\n", 219 | " batch_size=128,\n", 220 | " epochs=100,\n", 221 | " validation_data=(x_val, y_val),\n", 222 | " callbacks=callbacks,\n", 223 | " verbose=2,\n", 224 | ")" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 0, 230 | "metadata": { 231 | "colab_type": "code" 232 | }, 233 | "outputs": [], 234 | "source": [ 235 | "top_n = 4\n", 236 | "best_hps = tuner.get_best_hyperparameters(top_n)" 237 | ] 238 | }, 239 | { 240 | "cell_type": "code", 241 | "execution_count": 0, 242 | "metadata": { 243 | "colab_type": "code" 244 | }, 245 | "outputs": [], 246 | "source": [ 247 | "def get_best_epoch(hp):\n", 248 | " model = build_model(hp)\n", 249 | " callbacks = [\n", 250 | " keras.callbacks.EarlyStopping(\n", 251 | " monitor=\"val_loss\", mode=\"min\", patience=10\n", 252 | " )\n", 253 | " ]\n", 254 | " history = model.fit(\n", 255 | " x_train,\n", 256 | " y_train,\n", 257 | " validation_data=(x_val, y_val),\n", 258 | " epochs=100,\n", 259 | " batch_size=128,\n", 260 | " callbacks=callbacks,\n", 261 | " )\n", 262 | " val_loss_per_epoch = history.history[\"val_loss\"]\n", 263 | " best_epoch = val_loss_per_epoch.index(min(val_loss_per_epoch)) + 1\n", 264 | " print(f\"Best epoch: {best_epoch}\")\n", 265 | " return best_epoch" 266 | ] 267 | }, 268 | { 269 | "cell_type": "code", 270 | "execution_count": 0, 271 | "metadata": { 272 | "colab_type": "code" 273 | }, 274 | "outputs": [], 275 | "source": [ 276 | "def get_best_trained_model(hp):\n", 277 | " best_epoch = get_best_epoch(hp)\n", 278 | " model = build_model(hp)\n", 279 | " model.fit(\n", 280 | " x_train_full, y_train_full, batch_size=128, epochs=int(best_epoch * 1.2)\n", 281 | " )\n", 282 | " return 
model\n", 283 | "\n", 284 | "best_models = []\n", 285 | "for hp in best_hps:\n", 286 | " model = get_best_trained_model(hp)\n", 287 | " model.evaluate(x_test, y_test)\n", 288 | " best_models.append(model)" 289 | ] 290 | }, 291 | { 292 | "cell_type": "code", 293 | "execution_count": 0, 294 | "metadata": { 295 | "colab_type": "code" 296 | }, 297 | "outputs": [], 298 | "source": [ 299 | "best_models = tuner.get_best_models(top_n)" 300 | ] 301 | }, 302 | { 303 | "cell_type": "markdown", 304 | "metadata": { 305 | "colab_type": "text" 306 | }, 307 | "source": [ 308 | "##### The art of crafting the right search space" 309 | ] 310 | }, 311 | { 312 | "cell_type": "markdown", 313 | "metadata": { 314 | "colab_type": "text" 315 | }, 316 | "source": [ 317 | "##### The future of hyperparameter tuning: automated machine learning" 318 | ] 319 | }, 320 | { 321 | "cell_type": "markdown", 322 | "metadata": { 323 | "colab_type": "text" 324 | }, 325 | "source": [ 326 | "#### Model ensembling" 327 | ] 328 | }, 329 | { 330 | "cell_type": "markdown", 331 | "metadata": { 332 | "colab_type": "text" 333 | }, 334 | "source": [ 335 | "### Scaling up model training with multiple devices" 336 | ] 337 | }, 338 | { 339 | "cell_type": "markdown", 340 | "metadata": { 341 | "colab_type": "text" 342 | }, 343 | "source": [ 344 | "#### Multi-GPU training" 345 | ] 346 | }, 347 | { 348 | "cell_type": "markdown", 349 | "metadata": { 350 | "colab_type": "text" 351 | }, 352 | "source": [ 353 | "##### Data parallelism: Replicating your model on each GPU" 354 | ] 355 | }, 356 | { 357 | "cell_type": "markdown", 358 | "metadata": { 359 | "colab_type": "text" 360 | }, 361 | "source": [ 362 | "##### Model parallelism: Splitting your model across multiple GPUs" 363 | ] 364 | }, 365 | { 366 | "cell_type": "markdown", 367 | "metadata": { 368 | "colab_type": "text" 369 | }, 370 | "source": [ 371 | "#### Distributed training in practice" 372 | ] 373 | }, 374 | { 375 | "cell_type": "markdown", 376 | "metadata": { 377 | "colab_type": "text" 378 | }, 379 | "source": [ 380 | "##### Getting your hands on two or more GPUs" 381 | ] 382 | }, 383 | { 384 | "cell_type": "markdown", 385 | "metadata": { 386 | "colab_type": "text" 387 | }, 388 | "source": [ 389 | "##### Using data parallelism with JAX" 390 | ] 391 | }, 392 | { 393 | "cell_type": "markdown", 394 | "metadata": { 395 | "colab_type": "text" 396 | }, 397 | "source": [ 398 | "##### Using model parallelism with JAX" 399 | ] 400 | }, 401 | { 402 | "cell_type": "markdown", 403 | "metadata": { 404 | "colab_type": "text" 405 | }, 406 | "source": [ 407 | "###### The DeviceMesh API" 408 | ] 409 | }, 410 | { 411 | "cell_type": "markdown", 412 | "metadata": { 413 | "colab_type": "text" 414 | }, 415 | "source": [ 416 | "###### The LayoutMap API" 417 | ] 418 | }, 419 | { 420 | "cell_type": "markdown", 421 | "metadata": { 422 | "colab_type": "text" 423 | }, 424 | "source": [ 425 | "#### TPU training" 426 | ] 427 | }, 428 | { 429 | "cell_type": "markdown", 430 | "metadata": { 431 | "colab_type": "text" 432 | }, 433 | "source": [ 434 | "##### Using step fusing to improve TPU utilization" 435 | ] 436 | }, 437 | { 438 | "cell_type": "markdown", 439 | "metadata": { 440 | "colab_type": "text" 441 | }, 442 | "source": [ 443 | "### Speeding up training and inference with lower-precision computation" 444 | ] 445 | }, 446 | { 447 | "cell_type": "markdown", 448 | "metadata": { 449 | "colab_type": "text" 450 | }, 451 | "source": [ 452 | "##### Understanding floating-point precision" 453 | ] 454 | }, 455 | { 456 | 
"cell_type": "markdown", 457 | "metadata": { 458 | "colab_type": "text" 459 | }, 460 | "source": [ 461 | "##### Float16 inference" 462 | ] 463 | }, 464 | { 465 | "cell_type": "markdown", 466 | "metadata": { 467 | "colab_type": "text" 468 | }, 469 | "source": [ 470 | "##### Mixed-precision training" 471 | ] 472 | }, 473 | { 474 | "cell_type": "markdown", 475 | "metadata": { 476 | "colab_type": "text" 477 | }, 478 | "source": [ 479 | "##### Using loss scaling with mixed precision" 480 | ] 481 | }, 482 | { 483 | "cell_type": "markdown", 484 | "metadata": { 485 | "colab_type": "text" 486 | }, 487 | "source": [ 488 | "##### Beyond mixed precision: float8 training" 489 | ] 490 | }, 491 | { 492 | "cell_type": "markdown", 493 | "metadata": { 494 | "colab_type": "text" 495 | }, 496 | "source": [ 497 | "#### Faster inference with quantization" 498 | ] 499 | }, 500 | { 501 | "cell_type": "code", 502 | "execution_count": 0, 503 | "metadata": { 504 | "colab_type": "code" 505 | }, 506 | "outputs": [], 507 | "source": [ 508 | "from keras import ops\n", 509 | "\n", 510 | "x = ops.array([[0.1, 0.9], [1.2, -0.8]])\n", 511 | "kernel = ops.array([[-0.1, -2.2], [1.1, 0.7]])" 512 | ] 513 | }, 514 | { 515 | "cell_type": "code", 516 | "execution_count": 0, 517 | "metadata": { 518 | "colab_type": "code" 519 | }, 520 | "outputs": [], 521 | "source": [ 522 | "def abs_max_quantize(value):\n", 523 | " abs_max = ops.max(ops.abs(value), keepdims=True)\n", 524 | " scale = ops.divide(127, abs_max + 1e-7)\n", 525 | " scaled_value = value * scale\n", 526 | " scaled_value = ops.clip(ops.round(scaled_value), -127, 127)\n", 527 | " scaled_value = ops.cast(scaled_value, dtype=\"int8\")\n", 528 | " return scaled_value, scale\n", 529 | "\n", 530 | "int_x, x_scale = abs_max_quantize(x)\n", 531 | "int_kernel, kernel_scale = abs_max_quantize(kernel)" 532 | ] 533 | }, 534 | { 535 | "cell_type": "code", 536 | "execution_count": 0, 537 | "metadata": { 538 | "colab_type": "code" 539 | }, 540 | "outputs": [], 541 | "source": [ 542 | "int_y = ops.matmul(int_x, int_kernel)\n", 543 | "y = ops.cast(int_y, dtype=\"float32\") / (x_scale * kernel_scale)" 544 | ] 545 | }, 546 | { 547 | "cell_type": "code", 548 | "execution_count": 0, 549 | "metadata": { 550 | "colab_type": "code" 551 | }, 552 | "outputs": [], 553 | "source": [ 554 | "y" 555 | ] 556 | }, 557 | { 558 | "cell_type": "code", 559 | "execution_count": 0, 560 | "metadata": { 561 | "colab_type": "code" 562 | }, 563 | "outputs": [], 564 | "source": [ 565 | "ops.matmul(x, kernel)" 566 | ] 567 | } 568 | ], 569 | "metadata": { 570 | "accelerator": "GPU", 571 | "colab": { 572 | "collapsed_sections": [], 573 | "name": "chapter18_best-practices-for-the-real-world", 574 | "private_outputs": false, 575 | "provenance": [], 576 | "toc_visible": true 577 | }, 578 | "kernelspec": { 579 | "display_name": "Python 3", 580 | "language": "python", 581 | "name": "python3" 582 | }, 583 | "language_info": { 584 | "codemirror_mode": { 585 | "name": "ipython", 586 | "version": 3 587 | }, 588 | "file_extension": ".py", 589 | "mimetype": "text/x-python", 590 | "name": "python", 591 | "nbconvert_exporter": "python", 592 | "pygments_lexer": "ipython3", 593 | "version": "3.10.0" 594 | } 595 | }, 596 | "nbformat": 4, 597 | "nbformat_minor": 0 598 | } -------------------------------------------------------------------------------- /second_edition/chapter12_part01_text-generation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": 
"markdown", 5 | "metadata": { 6 | "colab_type": "text" 7 | }, 8 | "source": [ 9 | "This is a companion notebook for the book [Deep Learning with Python, Second Edition](https://www.manning.com/books/deep-learning-with-python-second-edition?a_aid=keras&a_bid=76564dff). For readability, it only contains runnable code blocks and section titles, and omits everything else in the book: text paragraphs, figures, and pseudocode.\n\n**If you want to be able to follow what's going on, I recommend reading the notebook side by side with your copy of the book.**\n\nThis notebook was generated for TensorFlow 2.6." 10 | ] 11 | }, 12 | { 13 | "cell_type": "markdown", 14 | "metadata": { 15 | "colab_type": "text" 16 | }, 17 | "source": [ 18 | "# Generative deep learning" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": { 24 | "colab_type": "text" 25 | }, 26 | "source": [ 27 | "## Text generation" 28 | ] 29 | }, 30 | { 31 | "cell_type": "markdown", 32 | "metadata": { 33 | "colab_type": "text" 34 | }, 35 | "source": [ 36 | "### A brief history of generative deep learning for sequence generation" 37 | ] 38 | }, 39 | { 40 | "cell_type": "markdown", 41 | "metadata": { 42 | "colab_type": "text" 43 | }, 44 | "source": [ 45 | "### How do you generate sequence data?" 46 | ] 47 | }, 48 | { 49 | "cell_type": "markdown", 50 | "metadata": { 51 | "colab_type": "text" 52 | }, 53 | "source": [ 54 | "### The importance of the sampling strategy" 55 | ] 56 | }, 57 | { 58 | "cell_type": "markdown", 59 | "metadata": { 60 | "colab_type": "text" 61 | }, 62 | "source": [ 63 | "**Reweighting a probability distribution to a different temperature**" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 0, 69 | "metadata": { 70 | "colab_type": "code" 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "import numpy as np\n", 75 | "def reweight_distribution(original_distribution, temperature=0.5):\n", 76 | " distribution = np.log(original_distribution) / temperature\n", 77 | " distribution = np.exp(distribution)\n", 78 | " return distribution / np.sum(distribution)" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": { 84 | "colab_type": "text" 85 | }, 86 | "source": [ 87 | "### Implementing text generation with Keras" 88 | ] 89 | }, 90 | { 91 | "cell_type": "markdown", 92 | "metadata": { 93 | "colab_type": "text" 94 | }, 95 | "source": [ 96 | "#### Preparing the data" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": { 102 | "colab_type": "text" 103 | }, 104 | "source": [ 105 | "**Downloading and uncompressing the IMDB movie reviews dataset**" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 0, 111 | "metadata": { 112 | "colab_type": "code" 113 | }, 114 | "outputs": [], 115 | "source": [ 116 | "!wget https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz\n", 117 | "!tar -xf aclImdb_v1.tar.gz" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": { 123 | "colab_type": "text" 124 | }, 125 | "source": [ 126 | "**Creating a dataset from text files (one file = one sample)**" 127 | ] 128 | }, 129 | { 130 | "cell_type": "code", 131 | "execution_count": 0, 132 | "metadata": { 133 | "colab_type": "code" 134 | }, 135 | "outputs": [], 136 | "source": [ 137 | "import tensorflow as tf\n", 138 | "from tensorflow import keras\n", 139 | "dataset = keras.utils.text_dataset_from_directory(\n", 140 | " directory=\"aclImdb\", label_mode=None, batch_size=256)\n", 141 | "dataset = dataset.map(lambda x: 
tf.strings.regex_replace(x, \"<br />
\", \" \"))" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": { 147 | "colab_type": "text" 148 | }, 149 | "source": [ 150 | "**Preparing a `TextVectorization` layer**" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 0, 156 | "metadata": { 157 | "colab_type": "code" 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "from tensorflow.keras.layers import TextVectorization\n", 162 | "\n", 163 | "sequence_length = 100\n", 164 | "vocab_size = 15000\n", 165 | "text_vectorization = TextVectorization(\n", 166 | " max_tokens=vocab_size,\n", 167 | " output_mode=\"int\",\n", 168 | " output_sequence_length=sequence_length,\n", 169 | ")\n", 170 | "text_vectorization.adapt(dataset)" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "colab_type": "text" 177 | }, 178 | "source": [ 179 | "**Setting up a language modeling dataset**" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 0, 185 | "metadata": { 186 | "colab_type": "code" 187 | }, 188 | "outputs": [], 189 | "source": [ 190 | "def prepare_lm_dataset(text_batch):\n", 191 | " vectorized_sequences = text_vectorization(text_batch)\n", 192 | " x = vectorized_sequences[:, :-1]\n", 193 | " y = vectorized_sequences[:, 1:]\n", 194 | " return x, y\n", 195 | "\n", 196 | "lm_dataset = dataset.map(prepare_lm_dataset, num_parallel_calls=4)" 197 | ] 198 | }, 199 | { 200 | "cell_type": "markdown", 201 | "metadata": { 202 | "colab_type": "text" 203 | }, 204 | "source": [ 205 | "#### A Transformer-based sequence-to-sequence model" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": 0, 211 | "metadata": { 212 | "colab_type": "code" 213 | }, 214 | "outputs": [], 215 | "source": [ 216 | "import tensorflow as tf\n", 217 | "from tensorflow.keras import layers\n", 218 | "\n", 219 | "class PositionalEmbedding(layers.Layer):\n", 220 | " def __init__(self, sequence_length, input_dim, output_dim, **kwargs):\n", 221 | " super().__init__(**kwargs)\n", 222 | " self.token_embeddings = layers.Embedding(\n", 223 | " input_dim=input_dim, output_dim=output_dim)\n", 224 | " self.position_embeddings = layers.Embedding(\n", 225 | " input_dim=sequence_length, output_dim=output_dim)\n", 226 | " self.sequence_length = sequence_length\n", 227 | " self.input_dim = input_dim\n", 228 | " self.output_dim = output_dim\n", 229 | "\n", 230 | " def call(self, inputs):\n", 231 | " length = tf.shape(inputs)[-1]\n", 232 | " positions = tf.range(start=0, limit=length, delta=1)\n", 233 | " embedded_tokens = self.token_embeddings(inputs)\n", 234 | " embedded_positions = self.position_embeddings(positions)\n", 235 | " return embedded_tokens + embedded_positions\n", 236 | "\n", 237 | " def compute_mask(self, inputs, mask=None):\n", 238 | " return tf.math.not_equal(inputs, 0)\n", 239 | "\n", 240 | " def get_config(self):\n", 241 | " config = super(PositionalEmbedding, self).get_config()\n", 242 | " config.update({\n", 243 | " \"output_dim\": self.output_dim,\n", 244 | " \"sequence_length\": self.sequence_length,\n", 245 | " \"input_dim\": self.input_dim,\n", 246 | " })\n", 247 | " return config\n", 248 | "\n", 249 | "\n", 250 | "class TransformerDecoder(layers.Layer):\n", 251 | " def __init__(self, embed_dim, dense_dim, num_heads, **kwargs):\n", 252 | " super().__init__(**kwargs)\n", 253 | " self.embed_dim = embed_dim\n", 254 | " self.dense_dim = dense_dim\n", 255 | " self.num_heads = num_heads\n", 256 | " self.attention_1 = layers.MultiHeadAttention(\n", 257 | " 
num_heads=num_heads, key_dim=embed_dim)\n", 258 | " self.attention_2 = layers.MultiHeadAttention(\n", 259 | " num_heads=num_heads, key_dim=embed_dim)\n", 260 | " self.dense_proj = keras.Sequential(\n", 261 | " [layers.Dense(dense_dim, activation=\"relu\"),\n", 262 | " layers.Dense(embed_dim),]\n", 263 | " )\n", 264 | " self.layernorm_1 = layers.LayerNormalization()\n", 265 | " self.layernorm_2 = layers.LayerNormalization()\n", 266 | " self.layernorm_3 = layers.LayerNormalization()\n", 267 | " self.supports_masking = True\n", 268 | "\n", 269 | " def get_config(self):\n", 270 | " config = super(TransformerDecoder, self).get_config()\n", 271 | " config.update({\n", 272 | " \"embed_dim\": self.embed_dim,\n", 273 | " \"num_heads\": self.num_heads,\n", 274 | " \"dense_dim\": self.dense_dim,\n", 275 | " })\n", 276 | " return config\n", 277 | "\n", 278 | " def get_causal_attention_mask(self, inputs):\n", 279 | " input_shape = tf.shape(inputs)\n", 280 | " batch_size, sequence_length = input_shape[0], input_shape[1]\n", 281 | " i = tf.range(sequence_length)[:, tf.newaxis]\n", 282 | " j = tf.range(sequence_length)\n", 283 | " mask = tf.cast(i >= j, dtype=\"int32\")\n", 284 | " mask = tf.reshape(mask, (1, input_shape[1], input_shape[1]))\n", 285 | " mult = tf.concat(\n", 286 | " [tf.expand_dims(batch_size, -1),\n", 287 | " tf.constant([1, 1], dtype=tf.int32)], axis=0)\n", 288 | " return tf.tile(mask, mult)\n", 289 | "\n", 290 | " def call(self, inputs, encoder_outputs, mask=None):\n", 291 | " causal_mask = self.get_causal_attention_mask(inputs)\n", 292 | " if mask is not None:\n", 293 | " padding_mask = tf.cast(\n", 294 | " mask[:, tf.newaxis, :], dtype=\"int32\")\n", 295 | " padding_mask = tf.minimum(padding_mask, causal_mask)\n", 296 | " else:\n", 297 | " padding_mask = mask\n", 298 | " attention_output_1 = self.attention_1(\n", 299 | " query=inputs,\n", 300 | " value=inputs,\n", 301 | " key=inputs,\n", 302 | " attention_mask=causal_mask)\n", 303 | " attention_output_1 = self.layernorm_1(inputs + attention_output_1)\n", 304 | " attention_output_2 = self.attention_2(\n", 305 | " query=attention_output_1,\n", 306 | " value=encoder_outputs,\n", 307 | " key=encoder_outputs,\n", 308 | " attention_mask=padding_mask,\n", 309 | " )\n", 310 | " attention_output_2 = self.layernorm_2(\n", 311 | " attention_output_1 + attention_output_2)\n", 312 | " proj_output = self.dense_proj(attention_output_2)\n", 313 | " return self.layernorm_3(attention_output_2 + proj_output)" 314 | ] 315 | }, 316 | { 317 | "cell_type": "markdown", 318 | "metadata": { 319 | "colab_type": "text" 320 | }, 321 | "source": [ 322 | "**A simple Transformer-based language model**" 323 | ] 324 | }, 325 | { 326 | "cell_type": "code", 327 | "execution_count": 0, 328 | "metadata": { 329 | "colab_type": "code" 330 | }, 331 | "outputs": [], 332 | "source": [ 333 | "from tensorflow.keras import layers\n", 334 | "embed_dim = 256\n", 335 | "latent_dim = 2048\n", 336 | "num_heads = 2\n", 337 | "\n", 338 | "inputs = keras.Input(shape=(None,), dtype=\"int64\")\n", 339 | "x = PositionalEmbedding(sequence_length, vocab_size, embed_dim)(inputs)\n", 340 | "x = TransformerDecoder(embed_dim, latent_dim, num_heads)(x, x)\n", 341 | "outputs = layers.Dense(vocab_size, activation=\"softmax\")(x)\n", 342 | "model = keras.Model(inputs, outputs)\n", 343 | "model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=\"rmsprop\")" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": { 349 | "colab_type": "text" 350 | }, 351 | "source": [ 352 | 
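"A small added demo, not from the book's code, of the `reweight_distribution` function defined at the top of this notebook: temperatures below 1 sharpen the distribution around the most likely token, while temperatures above 1 flatten it toward uniform." ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab_type": "code" }, "outputs": [], "source": [ "import numpy as np\n", "\n", "# Added demo: reweight a toy next-token distribution at several temperatures.\n", "original = np.array([0.5, 0.3, 0.15, 0.05])\n", "for t in (0.25, 0.5, 1.0, 1.5):\n", "    print(t, reweight_distribution(original, temperature=t))" ] }, { "cell_type": "markdown", "metadata": { "colab_type": "text" }, "source": [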
"### A text-generation callback with variable-temperature sampling" 353 | ] 354 | }, 355 | { 356 | "cell_type": "markdown", 357 | "metadata": { 358 | "colab_type": "text" 359 | }, 360 | "source": [ 361 | "**The text-generation callback**" 362 | ] 363 | }, 364 | { 365 | "cell_type": "code", 366 | "execution_count": 0, 367 | "metadata": { 368 | "colab_type": "code" 369 | }, 370 | "outputs": [], 371 | "source": [ 372 | "import numpy as np\n", 373 | "\n", 374 | "tokens_index = dict(enumerate(text_vectorization.get_vocabulary()))\n", 375 | "\n", 376 | "def sample_next(predictions, temperature=1.0):\n", 377 | " predictions = np.asarray(predictions).astype(\"float64\")\n", 378 | " predictions = np.log(predictions) / temperature\n", 379 | " exp_preds = np.exp(predictions)\n", 380 | " predictions = exp_preds / np.sum(exp_preds)\n", 381 | " probas = np.random.multinomial(1, predictions, 1)\n", 382 | " return np.argmax(probas)\n", 383 | "\n", 384 | "class TextGenerator(keras.callbacks.Callback):\n", 385 | " def __init__(self,\n", 386 | " prompt,\n", 387 | " generate_length,\n", 388 | " model_input_length,\n", 389 | " temperatures=(1.,),\n", 390 | " print_freq=1):\n", 391 | " self.prompt = prompt\n", 392 | " self.generate_length = generate_length\n", 393 | " self.model_input_length = model_input_length\n", 394 | " self.temperatures = temperatures\n", 395 | " self.print_freq = print_freq\n", 396 | " vectorized_prompt = text_vectorization([prompt])[0].numpy()\n", 397 | " self.prompt_length = np.nonzero(vectorized_prompt == 0)[0][0]\n", 398 | "\n", 399 | " def on_epoch_end(self, epoch, logs=None):\n", 400 | " if (epoch + 1) % self.print_freq != 0:\n", 401 | " return\n", 402 | " for temperature in self.temperatures:\n", 403 | " print(\"== Generating with temperature\", temperature)\n", 404 | " sentence = self.prompt\n", 405 | " for i in range(self.generate_length):\n", 406 | " tokenized_sentence = text_vectorization([sentence])\n", 407 | " predictions = self.model(tokenized_sentence)\n", 408 | " next_token = sample_next(\n", 409 | " predictions[0, self.prompt_length - 1 + i, :]\n", 410 | " )\n", 411 | " sampled_token = tokens_index[next_token]\n", 412 | " sentence += \" \" + sampled_token\n", 413 | " print(sentence)\n", 414 | "\n", 415 | "prompt = \"This movie\"\n", 416 | "text_gen_callback = TextGenerator(\n", 417 | " prompt,\n", 418 | " generate_length=50,\n", 419 | " model_input_length=sequence_length,\n", 420 | " temperatures=(0.2, 0.5, 0.7, 1., 1.5))" 421 | ] 422 | }, 423 | { 424 | "cell_type": "markdown", 425 | "metadata": { 426 | "colab_type": "text" 427 | }, 428 | "source": [ 429 | "**Fitting the language model**" 430 | ] 431 | }, 432 | { 433 | "cell_type": "code", 434 | "execution_count": 0, 435 | "metadata": { 436 | "colab_type": "code" 437 | }, 438 | "outputs": [], 439 | "source": [ 440 | "model.fit(lm_dataset, epochs=200, callbacks=[text_gen_callback])" 441 | ] 442 | }, 443 | { 444 | "cell_type": "markdown", 445 | "metadata": { 446 | "colab_type": "text" 447 | }, 448 | "source": [ 449 | "### Wrapping up" 450 | ] 451 | } 452 | ], 453 | "metadata": { 454 | "colab": { 455 | "collapsed_sections": [], 456 | "name": "chapter12_part01_text-generation.i", 457 | "private_outputs": false, 458 | "provenance": [], 459 | "toc_visible": true 460 | }, 461 | "kernelspec": { 462 | "display_name": "Python 3", 463 | "language": "python", 464 | "name": "python3" 465 | }, 466 | "language_info": { 467 | "codemirror_mode": { 468 | "name": "ipython", 469 | "version": 3 470 | }, 471 | "file_extension": ".py", 472 | 
"mimetype": "text/x-python", 473 | "name": "python", 474 | "nbconvert_exporter": "python", 475 | "pygments_lexer": "ipython3", 476 | "version": "3.7.0" 477 | } 478 | }, 479 | "nbformat": 4, 480 | "nbformat_minor": 0 481 | } --------------------------------------------------------------------------------