├── .gitignore ├── README.md ├── Test.ipynb ├── basic_classification.ipynb ├── custom_layers.ipynb ├── cvae.ipynb ├── cyclegan.ipynb ├── dcgan.ipynb ├── feature_columns.ipynb ├── image_captioning.ipynb ├── images.ipynb ├── intro_to_cnns.ipynb ├── overfit_and_underfit.ipynb ├── save_and_restore_models.ipynb ├── tensorboard_nersc_helper.py ├── test_tensorboard.ipynb └── transformer.ipynb /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # DL4Sci School TensorFlow tutorials 2 | 3 | This repository contains the hands-on introductory deep learning tutorial examples for the 4 | Deep Learning for Science school at Berkeley Lab: https://dl4sci-school.lbl.gov/ 5 | 6 | These jupyter notebooks come from the official TensorFlow 2.0 tutorials at 7 | https://www.tensorflow.org/beta. 8 | 9 | We made minor updates so attendees could run them on Cori GPU without modification. 10 | 11 | ## Contents 12 | 13 | * [Setup on Cori GPU](https://github.com/NERSC/dl4sci-tf-tutorials#getting-setup-on-cori-gpu) 14 | * [Setup on Collab](https://github.com/NERSC/dl4sci-tf-tutorials#running-on-collab) 15 | * [Introductory examples](https://github.com/NERSC/dl4sci-tf-tutorials#introductory-hands-on-notebooks) 16 | * [Advanced examples](https://github.com/NERSC/dl4sci-tf-tutorials#optional-advanced-notebooks) 17 | 18 | ## Getting setup on Cori GPU 19 | 20 | Open https://jupyter-dl.nersc.gov/ and log in with your training account 21 | credentials. 22 | 23 | Start a terminal by scrolling to the bottom of the Launcher window and clicking 24 | the `Terminal` button under `Other`. 
25 | 26 | Using the terminal, clone this repository to download all of the tutorial 27 | notebooks: 28 | 29 | `git clone https://github.com/NERSC/dl4sci-tf-tutorials.git` 30 | 31 | Now you can use the Jupyter file browser to navigate the repository and launch 32 | notebooks. 33 | 34 | You can test that things are working on a Cori GPU node by running the 35 | [Test.ipynb](Test.ipynb) notebook. 36 | 37 | ## Running on Collab 38 | 39 | If you have issues with Cori GPU, or if you simply prefer, you can run these 40 | examples in the cloud on Google's Colab service. Simply go to the TF webpage 41 | for the specific example (links below) and click `Run in Google Colab`. 42 | Note that you may not get access to a GPU on Colab, but the TF tutorials are 43 | designed to execute quickly regardless. 44 | 45 | ## Introductory hands-on notebooks 46 | 47 | For a good introduction to implementing models in TensorFlow using the recommended 48 | Keras API, we recommend working through at least the first few examples below. 49 | 50 | The overfitting/underfitting and save/restore examples also demonstrate very 51 | practical use-cases that you may want to work through. 52 | 53 | Finally, depending on time, you can also try out the advanced examples according 54 | to your preference. 55 | 56 | For each example, see if you can successfully modify the code and take note of how results change. 57 | 58 | ### Basic classification 59 | 60 | [basic_classification.ipynb](basic_classification.ipynb) 61 | 62 | https://www.tensorflow.org/beta/tutorials/keras/basic_classification 63 | 64 | Quiz questions: 65 | 1. Why did we divide the image data by 255? 66 | 2. Which activation function did we use for our hidden layer of the network? Could we have used a different one? 67 | 3. Which activation function did we use for the output layer of the network? Could we have used a different one? 68 | 69 | Challenges: 70 | 1.
Try to modify the network architecture by adding/removing layers, changing the size of the layers, etc. 71 | 2. Try changing the optimizer algorithm; can you figure out how to modify the default *learning rate*? 72 | 3. See if you can improve the test set accuracy. How good of a model can you train? 73 | 74 | 75 | ### Convolutional neural networks 76 | 77 | [intro_to_cnns.ipynb](intro_to_cnns.ipynb) 78 | 79 | https://www.tensorflow.org/beta/tutorials/images/intro_to_cnns 80 | 81 | This example is similar to the previous one but demonstrates how to set up CNNs, so it is valuable to work through as well. 82 | 83 | Quiz questions: 84 | 1. What benefit do we get from using max-pooling in our network? 85 | 2. Why do we add the dense layers only at the end? 86 | 3. Does the model converge with the specified settings? 87 | 88 | Challenges: 89 | 1. Try to modify the network architecture: the number of layers, the number of filters in the layers, the sizes of the filters, etc. 90 | 2. What's the best test accuracy you can achieve? 91 | 3. See if you can add data augmentation like the examples here: https://keras.io/preprocessing/image/ 92 | 4. Try to add BatchNormalization to the model: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/layers/BatchNormalization 93 | 94 | ### Classify structured data 95 | 96 | [feature_columns.ipynb](feature_columns.ipynb) 97 | 98 | https://www.tensorflow.org/beta/tutorials/keras/feature_columns 99 | 100 | Quiz questions: 101 | 1. This tutorial is just meant to teach you some mechanics and doesn't give an impressive result. What are some of the reasons why this model under-performs? 102 | 2. What are the situations in which you should consider using embedding or hashed-feature columns? Can you think of a good use-case for bucketized features?
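
If the term "bucketized" in the quiz above is unfamiliar, the following is a minimal pure-Python sketch of the idea, not the TensorFlow implementation. The helper names `bucketize` and `one_hot` and the age boundaries are illustrative assumptions; the sketch only shows how a numeric value is mapped to a bucket index and then to a one-hot vector.

```python
from bisect import bisect_right


def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into.

    For sorted boundaries [b0, b1, ...], bucket 0 is (-inf, b0),
    bucket 1 is [b0, b1), and the last bucket is [b_last, +inf).
    """
    return bisect_right(boundaries, value)


def one_hot(index, depth):
    """Encode a bucket index as a one-hot list of length `depth`."""
    return [1 if i == index else 0 for i in range(depth)]


# Hypothetical age boundaries, chosen only for illustration.
age_boundaries = [18, 25, 30, 35, 40, 45, 50, 55, 60, 65]

for age in (17, 18, 42, 70):
    bucket = bucketize(age, age_boundaries)
    # N boundaries produce N + 1 buckets.
    print(age, bucket, one_hot(bucket, len(age_boundaries) + 1))
```

Note that the model then sees only the bucket index, not the raw value, so the choice of boundaries is itself a modelling decision worth experimenting with.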
103 | 104 | ### Overfitting and underfitting 105 | 106 | [overfit_and_underfit.ipynb](overfit_and_underfit.ipynb) 107 | 108 | https://www.tensorflow.org/beta/tutorials/keras/overfit_and_underfit 109 | 110 | Quiz questions: 111 | 1. Try to summarize how you diagnose under- and overfitting. 112 | 2. If your model is overfitting, what is the most ideal way to improve it? 113 | 3. How can you fix under-fitting? 114 | 115 | Challenge: 116 | 1. All the models in this example are overfitting. Can you build a model that underfits? 117 | 118 | ### Saving and restoring models 119 | 120 | [save_and_restore_models.ipynb](save_and_restore_models.ipynb) 121 | 122 | https://www.tensorflow.org/beta/tutorials/keras/save_and_restore_models 123 | 124 | ## Optional advanced notebooks 125 | 126 | ### Defining custom layers 127 | 128 | [custom_layers.ipynb](custom_layers.ipynb) 129 | 130 | https://www.tensorflow.org/beta/tutorials/eager/custom_layers 131 | 132 | ### Loading and preprocessing images 133 | 134 | [images.ipynb](images.ipynb) 135 | 136 | https://www.tensorflow.org/beta/tutorials/load_data/images 137 | 138 | ### Deep Convolutional Generative Adversarial Networks (DCGAN) 139 | 140 | [dcgan.ipynb](dcgan.ipynb) 141 | 142 | https://www.tensorflow.org/beta/tutorials/generative/dcgan 143 | 144 | ### Variational auto-encoders (VAE) 145 | 146 | [cvae.ipynb](cvae.ipynb) 147 | 148 | https://www.tensorflow.org/beta/tutorials/generative/cvae 149 | 150 | ### Image to image translation with CycleGAN 151 | 152 | [cyclegan.ipynb](cyclegan.ipynb) 153 | 154 | https://www.tensorflow.org/beta/tutorials/generative/cyclegan 155 | 156 | **Note**: I didn't have time to install tensorflow-examples, so this is done in the notebook. 157 | You will have to restart the kernel after running the `pip install` cell to pick up the new library. 
158 | 159 | ### Transformers for language understanding 160 | 161 | [transformer.ipynb](transformer.ipynb) 162 | 163 | https://www.tensorflow.org/beta/tutorials/text/transformer 164 | 165 | ### Image captioning 166 | 167 | [image_captioning.ipynb](image_captioning.ipynb) 168 | 169 | https://www.tensorflow.org/beta/tutorials/text/image_captioning 170 | 171 | This example takes quite a while to run. It uses a fairly slow feature caching method, and the model training has poor GPU utilization and takes a while. 172 | 173 | ## TensorBoard 174 | 175 | If you'd like to try TensorBoard in Jupyter, you can take a look at the example 176 | code in [test_tensorboard.ipynb](test_tensorboard.ipynb). See if you can get 177 | TensorBoard working with one of the example notebooks! 178 | -------------------------------------------------------------------------------- /Test.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Basic tests of the Cori GPU partition" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "Verify that we're on a Cori GPU node (with a name like `cgpuXX`) and that we can see a V100 GPU." 
15 | ] 16 | }, 17 | { 18 | "cell_type": "code", 19 | "execution_count": 5, 20 | "metadata": {}, 21 | "outputs": [ 22 | { 23 | "name": "stdout", 24 | "output_type": "stream", 25 | "text": [ 26 | "cgpu11\n" 27 | ] 28 | } 29 | ], 30 | "source": [ 31 | "!hostname" 32 | ] 33 | }, 34 | { 35 | "cell_type": "code", 36 | "execution_count": 6, 37 | "metadata": {}, 38 | "outputs": [ 39 | { 40 | "name": "stdout", 41 | "output_type": "stream", 42 | "text": [ 43 | "Mon Jul 8 17:28:28 2019 \n", 44 | "+-----------------------------------------------------------------------------+\n", 45 | "| NVIDIA-SMI 418.67 Driver Version: 418.67 CUDA Version: 10.1 |\n", 46 | "|-------------------------------+----------------------+----------------------+\n", 47 | "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", 48 | "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", 49 | "|===============================+======================+======================|\n", 50 | "| 0 Tesla V100-SXM2... 
On | 00000000:1A:00.0 Off | 0 |\n", 51 | "| N/A 27C P0 38W / 300W | 0MiB / 16130MiB | 0% Default |\n", 52 | "+-------------------------------+----------------------+----------------------+\n", 53 | " \n", 54 | "+-----------------------------------------------------------------------------+\n", 55 | "| Processes: GPU Memory |\n", 56 | "| GPU PID Type Process name Usage |\n", 57 | "|=============================================================================|\n", 58 | "| No running processes found |\n", 59 | "+-----------------------------------------------------------------------------+\n" 60 | ] 61 | } 62 | ], 63 | "source": [ 64 | "!nvidia-smi" 65 | ] 66 | }, 67 | { 68 | "cell_type": "code", 69 | "execution_count": 1, 70 | "metadata": {}, 71 | "outputs": [ 72 | { 73 | "name": "stdout", 74 | "output_type": "stream", 75 | "text": [ 76 | "\u001b[1m\u001b[37mcgpu11\u001b[m Wed Jul 10 18:12:57 2019\n", 77 | "\u001b[36m[0]\u001b[m \u001b[34mTesla V100-SXM2-16GB\u001b[m |\u001b[31m 36'C\u001b[m, \u001b[32m 0 %\u001b[m | \u001b[36m\u001b[1m\u001b[33m 0\u001b[m / \u001b[33m16130\u001b[m MB |\n" 78 | ] 79 | } 80 | ], 81 | "source": [ 82 | "!gpustat" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "## Test CUDA-enabled TensorFlow" 90 | ] 91 | }, 92 | { 93 | "cell_type": "code", 94 | "execution_count": 4, 95 | "metadata": {}, 96 | "outputs": [], 97 | "source": [ 98 | "import tensorflow as tf" 99 | ] 100 | }, 101 | { 102 | "cell_type": "code", 103 | "execution_count": 7, 104 | "metadata": {}, 105 | "outputs": [ 106 | { 107 | "data": { 108 | "text/plain": [ 109 | "" 110 | ] 111 | }, 112 | "execution_count": 7, 113 | "metadata": {}, 114 | "output_type": "execute_result" 115 | } 116 | ], 117 | "source": [ 118 | "tf" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 10, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "data": { 128 | "text/plain": [ 129 | "True" 130 | ] 131 | }, 132 | 
"execution_count": 10, 133 | "metadata": {}, 134 | "output_type": "execute_result" 135 | } 136 | ], 137 | "source": [ 138 | "tf.test.is_built_with_cuda()" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 11, 144 | "metadata": {}, 145 | "outputs": [ 146 | { 147 | "data": { 148 | "text/plain": [ 149 | "True" 150 | ] 151 | }, 152 | "execution_count": 11, 153 | "metadata": {}, 154 | "output_type": "execute_result" 155 | } 156 | ], 157 | "source": [ 158 | "tf.test.is_gpu_available()" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 12, 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "data": { 168 | "text/plain": [ 169 | "'/device:GPU:0'" 170 | ] 171 | }, 172 | "execution_count": 12, 173 | "metadata": {}, 174 | "output_type": "execute_result" 175 | } 176 | ], 177 | "source": [ 178 | "tf.test.gpu_device_name()" 179 | ] 180 | } 181 | ], 182 | "metadata": { 183 | "kernelspec": { 184 | "display_name": "tensorflow-gpu/2.0.0-beta-py36", 185 | "language": "python", 186 | "name": "tensorflow_gpu_2.0.0-beta-py36" 187 | }, 188 | "language_info": { 189 | "codemirror_mode": { 190 | "name": "ipython", 191 | "version": 3 192 | }, 193 | "file_extension": ".py", 194 | "mimetype": "text/x-python", 195 | "name": "python", 196 | "nbconvert_exporter": "python", 197 | "pygments_lexer": "ipython3", 198 | "version": "3.6.8" 199 | } 200 | }, 201 | "nbformat": 4, 202 | "nbformat_minor": 2 203 | } 204 | -------------------------------------------------------------------------------- /custom_layers.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "tDnwEv8FtJm7" 8 | }, 9 | "source": [ 10 | "##### Copyright 2018 The TensorFlow Authors." 
11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": null, 16 | "metadata": { 17 | "cellView": "form", 18 | "colab": {}, 19 | "colab_type": "code", 20 | "id": "JlknJBWQtKkI" 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 25 | "# you may not use this file except in compliance with the License.\n", 26 | "# You may obtain a copy of the License at\n", 27 | "#\n", 28 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 29 | "#\n", 30 | "# Unless required by applicable law or agreed to in writing, software\n", 31 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 32 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 33 | "# See the License for the specific language governing permissions and\n", 34 | "# limitations under the License." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "colab_type": "text", 41 | "id": "60RdWsg1tETW" 42 | }, 43 | "source": [ 44 | "# Custom layers" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "colab_type": "text", 51 | "id": "BcJg7Enms86w" 52 | }, 53 | "source": [ 54 | "\n", 55 | " \n", 58 | " \n", 61 | " \n", 64 | " \n", 67 | "
\n", 56 | " View on TensorFlow.org\n", 57 | " \n", 59 | " Run in Google Colab\n", 60 | " \n", 62 | " View source on GitHub\n", 63 | " \n", 65 | " Download notebook\n", 66 | "
" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": { 73 | "colab_type": "text", 74 | "id": "UEu3q4jmpKVT" 75 | }, 76 | "source": [ 77 | "We recommend using `tf.keras` as a high-level API for building neural networks. That said, most TensorFlow APIs are usable with eager execution.\n" 78 | ] 79 | }, 80 | { 81 | "cell_type": "code", 82 | "execution_count": 2, 83 | "metadata": { 84 | "colab": {}, 85 | "colab_type": "code", 86 | "id": "-sXDg19Q691F" 87 | }, 88 | "outputs": [], 89 | "source": [ 90 | "from __future__ import absolute_import, division, print_function, unicode_literals" 91 | ] 92 | }, 93 | { 94 | "cell_type": "code", 95 | "execution_count": 3, 96 | "metadata": {}, 97 | "outputs": [], 98 | "source": [ 99 | "# Tell TF not to consume all GPU memory so you can run more than one notebook at once\n", 100 | "import os\n", 101 | "os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 4, 107 | "metadata": {}, 108 | "outputs": [ 109 | { 110 | "name": "stdout", 111 | "output_type": "stream", 112 | "text": [ 113 | "\u001b[1m\u001b[37mcgpu11\u001b[m Wed Jul 10 16:40:23 2019\n", 114 | "\u001b[36m[0]\u001b[m \u001b[34mTesla V100-SXM2-16GB\u001b[m |\u001b[31m 36'C\u001b[m, \u001b[32m 0 %\u001b[m | \u001b[36m\u001b[1m\u001b[33m 0\u001b[m / \u001b[33m16130\u001b[m MB |\n" 115 | ] 116 | } 117 | ], 118 | "source": [ 119 | "!gpustat" 120 | ] 121 | }, 122 | { 123 | "cell_type": "code", 124 | "execution_count": 3, 125 | "metadata": { 126 | "colab": {}, 127 | "colab_type": "code", 128 | "id": "Py0m-N6VgQFJ" 129 | }, 130 | "outputs": [], 131 | "source": [ 132 | "import tensorflow as tf" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": { 138 | "colab_type": "text", 139 | "id": "zSFfVVjkrrsI" 140 | }, 141 | "source": [ 142 | "## Layers: common sets of useful operations\n", 143 | "\n", 144 | "Most of the time when writing code for machine learning models you want to 
operate at a higher level of abstraction than individual operations and manipulation of individual variables.\n", 145 | "\n", 146 | "Many machine learning models are expressible as the composition and stacking of relatively simple layers, and TensorFlow provides both a set of many common layers as well as easy ways for you to write your own application-specific layers either from scratch or as the composition of existing layers.\n", 147 | "\n", 148 | "TensorFlow includes the full [Keras](https://keras.io) API in the tf.keras package, and the Keras layers are very useful when building your own models.\n" 149 | ] 150 | }, 151 | { 152 | "cell_type": "code", 153 | "execution_count": 4, 154 | "metadata": { 155 | "colab": {}, 156 | "colab_type": "code", 157 | "id": "8PyXlPl-4TzQ" 158 | }, 159 | "outputs": [], 160 | "source": [ 161 | "# In the tf.keras.layers package, layers are objects. To construct a layer,\n", 162 | "# simply construct the object. Most layers take as a first argument the number\n", 163 | "# of output dimensions / channels.\n", 164 | "layer = tf.keras.layers.Dense(100)\n", 165 | "# The number of input dimensions is often unnecessary, as it can be inferred\n", 166 | "# the first time the layer is used, but it can be provided if you want to\n", 167 | "# specify it manually, which is useful in some complex models.\n", 168 | "layer = tf.keras.layers.Dense(10, input_shape=(None, 5))" 169 | ] 170 | }, 171 | { 172 | "cell_type": "markdown", 173 | "metadata": { 174 | "colab_type": "text", 175 | "id": "Fn69xxPO5Psr" 176 | }, 177 | "source": [ 178 | "The full list of pre-existing layers can be seen in [the documentation](https://www.tensorflow.org/api_docs/python/tf/keras/layers). It includes Dense (a fully-connected layer),\n", 179 | "Conv2D, LSTM, BatchNormalization, Dropout, and many others."
180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 5, 185 | "metadata": { 186 | "colab": {}, 187 | "colab_type": "code", 188 | "id": "E3XKNknP5Mhb" 189 | }, 190 | "outputs": [ 191 | { 192 | "data": { 193 | "text/plain": [ 194 | "" 205 | ] 206 | }, 207 | "execution_count": 5, 208 | "metadata": {}, 209 | "output_type": "execute_result" 210 | } 211 | ], 212 | "source": [ 213 | "# To use a layer, simply call it.\n", 214 | "layer(tf.zeros([10, 5]))" 215 | ] 216 | }, 217 | { 218 | "cell_type": "code", 219 | "execution_count": 6, 220 | "metadata": { 221 | "colab": {}, 222 | "colab_type": "code", 223 | "id": "Wt_Nsv-L5t2s" 224 | }, 225 | "outputs": [ 226 | { 227 | "data": { 228 | "text/plain": [ 229 | "[,\n", 241 | " ]" 242 | ] 243 | }, 244 | "execution_count": 6, 245 | "metadata": {}, 246 | "output_type": "execute_result" 247 | } 248 | ], 249 | "source": [ 250 | "# Layers have many useful methods. For example, you can inspect all variables\n", 251 | "# in a layer using `layer.variables` and trainable variables using\n", 252 | "# `layer.trainable_variables`. 
In this case a fully-connected layer\n", 253 | "# will have variables for weights and biases.\n", 254 | "layer.variables" 255 | ] 256 | }, 257 | { 258 | "cell_type": "code", 259 | "execution_count": 7, 260 | "metadata": { 261 | "colab": {}, 262 | "colab_type": "code", 263 | "id": "6ilvKjz8_4MQ" 264 | }, 265 | "outputs": [ 266 | { 267 | "data": { 268 | "text/plain": [ 269 | "(,\n", 281 | " )" 282 | ] 283 | }, 284 | "execution_count": 7, 285 | "metadata": {}, 286 | "output_type": "execute_result" 287 | } 288 | ], 289 | "source": [ 290 | "# The variables are also accessible through nice accessors\n", 291 | "layer.kernel, layer.bias" 292 | ] 293 | }, 294 | { 295 | "cell_type": "markdown", 296 | "metadata": { 297 | "colab_type": "text", 298 | "id": "O0kDbE54-5VS" 299 | }, 300 | "source": [ 301 | "## Implementing custom layers\n", 302 | "The best way to implement your own layer is extending the `tf.keras.layers.Layer` class and implementing:\n", 303 | " * `__init__`, where you can do all input-independent initialization\n", 304 | " * `build`, where you know the shapes of the input tensors and can do the rest of the initialization\n", 305 | " * `call`, where you do the forward computation\n", 306 | "\n", 307 | "Note that you don't have to wait until `build` is called to create your variables; you can also create them in `__init__`. However, the advantage of creating them in `build` is that it enables late variable creation based on the shape of the inputs the layer will operate on. On the other hand, creating variables in `__init__` would mean that shapes required to create the variables will need to be explicitly specified." 308 | ] 309 | }, 310 | { 311 | "cell_type": "code", 312 | "execution_count": 8, 313 | "metadata": { 314 | "colab": {}, 315 | "colab_type": "code", 316 | "id": "5Byl3n1k5kIy" 317 | }, 318 | "outputs": [ 319 | { 320 | "name": "stdout", 321 | "output_type": "stream", 322 | "text": [ 323 | "tf.Tensor(\n", 324 | "[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", 325 | " [0.
0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", 326 | " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", 327 | " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", 328 | " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", 329 | " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", 330 | " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", 331 | " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", 332 | " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]\n", 333 | " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]], shape=(10, 10), dtype=float32)\n", 334 | "[]\n" 346 | ] 347 | } 348 | ], 349 | "source": [ 350 | "class MyDenseLayer(tf.keras.layers.Layer):\n", 351 | " def __init__(self, num_outputs):\n", 352 | " super(MyDenseLayer, self).__init__()\n", 353 | " self.num_outputs = num_outputs\n", 354 | "\n", 355 | " def build(self, input_shape):\n", 356 | " self.kernel = self.add_variable(\"kernel\",\n", 357 | " shape=[int(input_shape[-1]),\n", 358 | " self.num_outputs])\n", 359 | "\n", 360 | " def call(self, input):\n", 361 | " return tf.matmul(input, self.kernel)\n", 362 | "\n", 363 | "layer = MyDenseLayer(10)\n", 364 | "print(layer(tf.zeros([10, 5])))\n", 365 | "print(layer.trainable_variables)" 366 | ] 367 | }, 368 | { 369 | "cell_type": "markdown", 370 | "metadata": { 371 | "colab_type": "text", 372 | "id": "tk8E2vY0-z4Z" 373 | }, 374 | "source": [ 375 | "Overall code is easier to read and maintain if it uses standard layers whenever possible, as other readers will be familiar with the behavior of standard layers. If you want to use a layer which is not present in `tf.keras.layers`, consider filing a [github issue](http://github.com/tensorflow/tensorflow/issues/new) or, even better, sending us a pull request!" 376 | ] 377 | }, 378 | { 379 | "cell_type": "markdown", 380 | "metadata": { 381 | "colab_type": "text", 382 | "id": "Qhg4KlbKrs3G" 383 | }, 384 | "source": [ 385 | "## Models: composing layers\n", 386 | "\n", 387 | "Many interesting layer-like things in machine learning models are implemented by composing existing layers. 
For example, each residual block in a resnet is a composition of convolutions, batch normalizations, and a shortcut.\n", 388 | "\n", 389 | "The main class used when creating a layer-like thing which contains other layers is tf.keras.Model. Implementing one is done by inheriting from tf.keras.Model." 390 | ] 391 | }, 392 | { 393 | "cell_type": "code", 394 | "execution_count": 9, 395 | "metadata": { 396 | "colab": {}, 397 | "colab_type": "code", 398 | "id": "N30DTXiRASlb" 399 | }, 400 | "outputs": [ 401 | { 402 | "name": "stdout", 403 | "output_type": "stream", 404 | "text": [ 405 | "tf.Tensor(\n", 406 | "[[[[0. 0. 0.]\n", 407 | " [0. 0. 0.]\n", 408 | " [0. 0. 0.]]\n", 409 | "\n", 410 | " [[0. 0. 0.]\n", 411 | " [0. 0. 0.]\n", 412 | " [0. 0. 0.]]]], shape=(1, 2, 3, 3), dtype=float32)\n", 413 | "['resnet_identity_block/conv2d/kernel:0', 'resnet_identity_block/conv2d/bias:0', 'resnet_identity_block/batch_normalization/gamma:0', 'resnet_identity_block/batch_normalization/beta:0', 'resnet_identity_block/conv2d_1/kernel:0', 'resnet_identity_block/conv2d_1/bias:0', 'resnet_identity_block/batch_normalization_1/gamma:0', 'resnet_identity_block/batch_normalization_1/beta:0', 'resnet_identity_block/conv2d_2/kernel:0', 'resnet_identity_block/conv2d_2/bias:0', 'resnet_identity_block/batch_normalization_2/gamma:0', 'resnet_identity_block/batch_normalization_2/beta:0']\n" 414 | ] 415 | } 416 | ], 417 | "source": [ 418 | "class ResnetIdentityBlock(tf.keras.Model):\n", 419 | " def __init__(self, kernel_size, filters):\n", 420 | " super(ResnetIdentityBlock, self).__init__(name='')\n", 421 | " filters1, filters2, filters3 = filters\n", 422 | "\n", 423 | " self.conv2a = tf.keras.layers.Conv2D(filters1, (1, 1))\n", 424 | " self.bn2a = tf.keras.layers.BatchNormalization()\n", 425 | "\n", 426 | " self.conv2b = tf.keras.layers.Conv2D(filters2, kernel_size, padding='same')\n", 427 | " self.bn2b = tf.keras.layers.BatchNormalization()\n", 428 | "\n", 429 | " self.conv2c = 
tf.keras.layers.Conv2D(filters3, (1, 1))\n", 430 | " self.bn2c = tf.keras.layers.BatchNormalization()\n", 431 | "\n", 432 | " def call(self, input_tensor, training=False):\n", 433 | " x = self.conv2a(input_tensor)\n", 434 | " x = self.bn2a(x, training=training)\n", 435 | " x = tf.nn.relu(x)\n", 436 | "\n", 437 | " x = self.conv2b(x)\n", 438 | " x = self.bn2b(x, training=training)\n", 439 | " x = tf.nn.relu(x)\n", 440 | "\n", 441 | " x = self.conv2c(x)\n", 442 | " x = self.bn2c(x, training=training)\n", 443 | "\n", 444 | " x += input_tensor\n", 445 | " return tf.nn.relu(x)\n", 446 | "\n", 447 | "block = ResnetIdentityBlock(1, [1, 2, 3])\n", 448 | "print(block(tf.zeros([1, 2, 3, 3])))\n", 449 | "print([x.name for x in block.trainable_variables])" 450 | ] 451 | }, 452 | { 453 | "cell_type": "markdown", 454 | "metadata": { 455 | "colab_type": "text", 456 | "id": "wYfucVw65PMj" 457 | }, 458 | "source": [ 459 | "Much of the time, however, models which compose many layers simply call one layer after the other. 
This can be done in very little code using tf.keras.Sequential" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": 10, 465 | "metadata": { 466 | "colab": {}, 467 | "colab_type": "code", 468 | "id": "L9frk7Ur4uvJ" 469 | }, 470 | "outputs": [ 471 | { 472 | "data": { 473 | "text/plain": [ 474 | "" 482 | ] 483 | }, 484 | "execution_count": 10, 485 | "metadata": {}, 486 | "output_type": "execute_result" 487 | } 488 | ], 489 | "source": [ 490 | "my_seq = tf.keras.Sequential([tf.keras.layers.Conv2D(1, (1, 1),\n", 491 | " input_shape=(\n", 492 | " None, None, 3)),\n", 493 | " tf.keras.layers.BatchNormalization(),\n", 494 | " tf.keras.layers.Conv2D(2, 1,\n", 495 | " padding='same'),\n", 496 | " tf.keras.layers.BatchNormalization(),\n", 497 | " tf.keras.layers.Conv2D(3, (1, 1)),\n", 498 | " tf.keras.layers.BatchNormalization()])\n", 499 | "my_seq(tf.zeros([1, 2, 3, 3]))" 500 | ] 501 | } 502 | ], 503 | "metadata": { 504 | "colab": { 505 | "collapsed_sections": [], 506 | "name": "custom_layers.ipynb", 507 | "private_outputs": true, 508 | "provenance": [], 509 | "toc_visible": true, 510 | "version": "0.3.2" 511 | }, 512 | "kernelspec": { 513 | "display_name": "tensorflow-gpu/2.0.0-beta-py36", 514 | "language": "python", 515 | "name": "tensorflow_gpu_2.0.0-beta-py36" 516 | }, 517 | "language_info": { 518 | "codemirror_mode": { 519 | "name": "ipython", 520 | "version": 3 521 | }, 522 | "file_extension": ".py", 523 | "mimetype": "text/x-python", 524 | "name": "python", 525 | "nbconvert_exporter": "python", 526 | "pygments_lexer": "ipython3", 527 | "version": "3.6.8" 528 | } 529 | }, 530 | "nbformat": 4, 531 | "nbformat_minor": 2 532 | } 533 | -------------------------------------------------------------------------------- /cvae.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "Ndo4ERqnwQOU" 8 | }, 9 | 
"source": [ 10 | "##### Copyright 2018 The TensorFlow Authors." 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": { 17 | "cellView": "form", 18 | "colab": {}, 19 | "colab_type": "code", 20 | "id": "MTKwbguKwT4R" 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 25 | "# you may not use this file except in compliance with the License.\n", 26 | "# You may obtain a copy of the License at\n", 27 | "#\n", 28 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 29 | "#\n", 30 | "# Unless required by applicable law or agreed to in writing, software\n", 31 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 32 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 33 | "# See the License for the specific language governing permissions and\n", 34 | "# limitations under the License." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "colab_type": "text", 41 | "id": "xfNT-mlFwxVM" 42 | }, 43 | "source": [ 44 | "# Convolutional Variational Autoencoder" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "colab_type": "text", 51 | "id": "0TD5ZrvEMbhZ" 52 | }, 53 | "source": [ 54 | "\n", 55 | " \n", 60 | " \n", 65 | " \n", 70 | " \n", 73 | "
\n", 56 | " \n", 57 | " \n", 58 | " View on TensorFlow.org\n", 59 | " \n", 61 | " \n", 62 | " \n", 63 | " Run in Google Colab\n", 64 | " \n", 66 | " \n", 67 | " \n", 68 | " View source on GitHub\n", 69 | " \n", 71 | " Download notebook\n", 72 | "
" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": { 79 | "colab_type": "text", 80 | "id": "ITZuApL56Mny" 81 | }, 82 | "source": [ 83 | "![evolution of output during training](https://tensorflow.org/images/autoencoders/cvae.gif)\n", 84 | "\n", 85 | "This notebook demonstrates how to generate images of handwritten digits by training a Variational Autoencoder ([1](https://arxiv.org/abs/1312.6114), [2](https://arxiv.org/abs/1401.4082)).\n", 86 | "\n" 87 | ] 88 | }, 89 | { 90 | "cell_type": "markdown", 91 | "metadata": { 92 | "colab_type": "text", 93 | "id": "e1_Y75QXJS6h" 94 | }, 95 | "source": [ 96 | "## Import TensorFlow and other libraries" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 1, 102 | "metadata": {}, 103 | "outputs": [], 104 | "source": [ 105 | "# Tell TF not to consume all GPU memory so you can run more than one notebook at once\n", 106 | "import os\n", 107 | "os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'" 108 | ] 109 | }, 110 | { 111 | "cell_type": "code", 112 | "execution_count": 2, 113 | "metadata": {}, 114 | "outputs": [ 115 | { 116 | "name": "stdout", 117 | "output_type": "stream", 118 | "text": [ 119 | "\u001b[1m\u001b[37mcgpu11\u001b[m Wed Jul 10 17:33:28 2019\n", 120 | "\u001b[36m[0]\u001b[m \u001b[34mTesla V100-SXM2-16GB\u001b[m |\u001b[31m 38'C\u001b[m, \u001b[32m 0 %\u001b[m | \u001b[36m\u001b[1m\u001b[33m 536\u001b[m / \u001b[33m16130\u001b[m MB | \u001b[1m\u001b[30msfarrell\u001b[m(\u001b[33m525M\u001b[m)\n" 121 | ] 122 | } 123 | ], 124 | "source": [ 125 | "!gpustat" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 3, 131 | "metadata": { 132 | "colab": {}, 133 | "colab_type": "code", 134 | "id": "YfIk2es3hJEd" 135 | }, 136 | "outputs": [], 137 | "source": [ 138 | "from __future__ import absolute_import, division, print_function, unicode_literals\n", 139 | "\n", 140 | "import tensorflow as tf\n", 141 | "\n", 142 | "import time\n", 143 | "import numpy as np\n", 144 
| "import glob\n", 145 | "import matplotlib.pyplot as plt\n", 146 | "import PIL\n", 147 | "import imageio\n", 148 | "\n", 149 | "from IPython import display" 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": { 155 | "colab_type": "text", 156 | "id": "iYn4MdZnKCey" 157 | }, 158 | "source": [ 159 | "## Load the MNIST dataset\n", 160 | "Each MNIST image is originally a vector of 784 integers, each between 0 and 255, representing the intensity of a pixel. We model each pixel with a Bernoulli distribution and statically binarize the dataset." 161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": 4, 166 | "metadata": { 167 | "colab": {}, 168 | "colab_type": "code", 169 | "id": "a4fYMGxGhrna" 170 | }, 171 | "outputs": [], 172 | "source": [ 173 | "(train_images, _), (test_images, _) = tf.keras.datasets.mnist.load_data()" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 5, 179 | "metadata": { 180 | "colab": {}, 181 | "colab_type": "code", 182 | "id": "NFC2ghIdiZYE" 183 | }, 184 | "outputs": [], 185 | "source": [ 186 | "train_images = train_images.reshape(train_images.shape[0], 28, 28, 1).astype('float32')\n", 187 | "test_images = test_images.reshape(test_images.shape[0], 28, 28, 1).astype('float32')\n", 188 | "\n", 189 | "# Normalizing the images to the range of [0., 1.]\n", 190 | "train_images /= 255.\n", 191 | "test_images /= 255.\n", 192 | "\n", 193 | "# Binarization\n", 194 | "train_images[train_images >= .5] = 1.\n", 195 | "train_images[train_images < .5] = 0.\n", 196 | "test_images[test_images >= .5] = 1.\n", 197 | "test_images[test_images < .5] = 0." 
198 | ] 199 | }, 200 | { 201 | "cell_type": "code", 202 | "execution_count": 6, 203 | "metadata": { 204 | "colab": {}, 205 | "colab_type": "code", 206 | "id": "S4PIDhoDLbsZ" 207 | }, 208 | "outputs": [], 209 | "source": [ 210 | "TRAIN_BUF = 60000\n", 211 | "BATCH_SIZE = 100\n", 212 | "\n", 213 | "TEST_BUF = 10000" 214 | ] 215 | }, 216 | { 217 | "cell_type": "markdown", 218 | "metadata": { 219 | "colab_type": "text", 220 | "id": "PIGN6ouoQxt3" 221 | }, 222 | "source": [ 223 | "## Use *tf.data* to create batches and shuffle the dataset" 224 | ] 225 | }, 226 | { 227 | "cell_type": "code", 228 | "execution_count": 7, 229 | "metadata": { 230 | "colab": {}, 231 | "colab_type": "code", 232 | "id": "-yKCCQOoJ7cn" 233 | }, 234 | "outputs": [], 235 | "source": [ 236 | "train_dataset = tf.data.Dataset.from_tensor_slices(train_images).shuffle(TRAIN_BUF).batch(BATCH_SIZE)\n", 237 | "test_dataset = tf.data.Dataset.from_tensor_slices(test_images).shuffle(TEST_BUF).batch(BATCH_SIZE)" 238 | ] 239 | }, 240 | { 241 | "cell_type": "markdown", 242 | "metadata": { 243 | "colab_type": "text", 244 | "id": "THY-sZMiQ4UV" 245 | }, 246 | "source": [ 247 | "## Wire up the generative and inference network with *tf.keras.Sequential*\n", 248 | "\n", 249 | "In our VAE example, we use two small ConvNets for the generative and inference network. Since these neural nets are small, we use `tf.keras.Sequential` to simplify our code. Let $x$ and $z$ denote the observation and latent variable respectively in the following descriptions.\n", 250 | "\n", 251 | "### Generative Network\n", 252 | "This defines the generative model which takes a latent encoding as input, and outputs the parameters for a conditional distribution of the observation, i.e. $p(x|z)$. 
Additionally, we use a unit Gaussian prior $p(z)$ for the latent variable.\n", 253 | "\n", 254 | "### Inference Network\n", 255 | "This defines an approximate posterior distribution $q(z|x)$, which takes as input an observation and outputs a set of parameters for the conditional distribution of the latent representation. In this example, we simply model this distribution as a diagonal Gaussian. In this case, the inference network outputs the mean and log-variance parameters of a factorized Gaussian (we output the log-variance rather than the variance directly for numerical stability).\n", 256 | "\n", 257 | "### Reparameterization Trick\n", 258 | "During optimization, we can sample from $q(z|x)$ by first sampling from a unit Gaussian, and then multiplying by the standard deviation and adding the mean. This ensures that gradients can pass through the sample to the inference network parameters.\n", 259 | "\n", 260 | "### Network architecture\n", 261 | "For the inference network, we use two convolutional layers followed by a fully-connected layer. In the generative network, we mirror this architecture by using a fully-connected layer followed by three convolution transpose layers (a.k.a. deconvolutional layers in some contexts). Note that it's common practice to avoid using batch normalization when training VAEs, since the additional stochasticity due to using mini-batches may aggravate instability on top of the stochasticity from sampling." 
262 | ] 263 | }, 264 | { 265 | "cell_type": "code", 266 | "execution_count": 8, 267 | "metadata": { 268 | "colab": {}, 269 | "colab_type": "code", 270 | "id": "VGLbvBEmjK0a" 271 | }, 272 | "outputs": [], 273 | "source": [ 274 | "class CVAE(tf.keras.Model):\n", 275 | "  def __init__(self, latent_dim):\n", 276 | "    super(CVAE, self).__init__()\n", 277 | "    self.latent_dim = latent_dim\n", 278 | "    self.inference_net = tf.keras.Sequential([\n", 279 | "        tf.keras.layers.InputLayer(input_shape=(28, 28, 1)),\n", 280 | "        tf.keras.layers.Conv2D(\n", 281 | "            filters=32, kernel_size=3, strides=(2, 2), activation='relu'),\n", 282 | "        tf.keras.layers.Conv2D(\n", 283 | "            filters=64, kernel_size=3, strides=(2, 2), activation='relu'),\n", 284 | "        tf.keras.layers.Flatten(),\n", 285 | "        # No activation\n", 286 | "        tf.keras.layers.Dense(latent_dim + latent_dim),\n", 287 | "    ])\n", 288 | "\n", 289 | "    self.generative_net = tf.keras.Sequential([\n", 290 | "        tf.keras.layers.InputLayer(input_shape=(latent_dim,)),\n", 291 | "        tf.keras.layers.Dense(units=7*7*32, activation=tf.nn.relu),\n", 292 | "        tf.keras.layers.Reshape(target_shape=(7, 7, 32)),\n", 293 | "        tf.keras.layers.Conv2DTranspose(\n", 294 | "            filters=64,\n", 295 | "            kernel_size=3,\n", 296 | "            strides=(2, 2),\n", 297 | "            padding=\"SAME\",\n", 298 | "            activation='relu'),\n", 299 | "        tf.keras.layers.Conv2DTranspose(\n", 300 | "            filters=32,\n", 301 | "            kernel_size=3,\n", 302 | "            strides=(2, 2),\n", 303 | "            padding=\"SAME\",\n", 304 | "            activation='relu'),\n", 305 | "        # No activation\n", 306 | "        tf.keras.layers.Conv2DTranspose(\n", 307 | "            filters=1, kernel_size=3, strides=(1, 1), padding=\"SAME\"),\n", 308 | "    ])\n", 309 | "\n", 310 | "  def sample(self, eps=None):\n", 311 | "    if eps is None:\n", 312 | "      eps = tf.random.normal(shape=(100, self.latent_dim))\n", 313 | "    return self.decode(eps, apply_sigmoid=True)\n", 314 | "\n", 315 | "  def encode(self, x):\n", 316 | "    mean, logvar = tf.split(self.inference_net(x), num_or_size_splits=2, axis=1)\n", 317 | "    return mean, logvar\n", 318 | "\n", 319 | "  def reparameterize(self, mean, logvar):\n", 320 | "    eps = tf.random.normal(shape=mean.shape)\n", 321 | "    return eps * tf.exp(logvar * .5) + mean\n", 322 | "\n", 323 | "  def decode(self, z, apply_sigmoid=False):\n", 324 | "    logits = self.generative_net(z)\n", 325 | "    if apply_sigmoid:\n", 326 | "      probs = tf.sigmoid(logits)\n", 327 | "      return probs\n", 328 | "\n", 329 | "    return logits" 330 | ] 331 | }, 332 | { 333 | "cell_type": "markdown", 334 | "metadata": { 335 | "colab_type": "text", 336 | "id": "0FMYgY_mPfTi" 337 | }, 338 | "source": [ 339 | "## Define the loss function and the optimizer\n", 340 | "\n", 341 | "VAEs train by maximizing the evidence lower bound (ELBO) on the marginal log-likelihood:\n", 342 | "\n", 343 | "$$\\log p(x) \\ge \\text{ELBO} = \\mathbb{E}_{q(z|x)}\\left[\\log \\frac{p(x, z)}{q(z|x)}\\right].$$\n", 344 | "\n", 345 | "In practice, we optimize a single-sample Monte Carlo estimate of this expectation:\n", 346 | "\n", 347 | "$$\\log p(x| z) + \\log p(z) - \\log q(z|x),$$\n", 348 | "where $z$ is sampled from $q(z|x)$.\n", 349 | "\n", 350 | "**Note**: we could also analytically compute the KL term, but here we incorporate all three terms in the Monte Carlo estimator for simplicity." 351 | ] 352 | }, 353 | { 354 | "cell_type": "code", 355 | "execution_count": 9, 356 | "metadata": { 357 | "colab": {}, 358 | "colab_type": "code", 359 | "id": "iWCn_PVdEJZ7" 360 | }, 361 | "outputs": [], 362 | "source": [ 363 | "optimizer = tf.keras.optimizers.Adam(1e-4)\n", 364 | "\n", 365 | "def log_normal_pdf(sample, mean, logvar, raxis=1):\n", 366 | "  log2pi = tf.math.log(2. * np.pi)\n", 367 | "  return tf.reduce_sum(\n", 368 | "      -.5 * ((sample - mean) ** 2. * tf.exp(-logvar) + logvar + log2pi),\n", 369 | "      axis=raxis)\n", 370 | "\n", 371 | "def compute_loss(model, x):\n", 372 | "  mean, logvar = model.encode(x)\n", 373 | "  z = model.reparameterize(mean, logvar)\n", 374 | "  x_logit = model.decode(z)\n", 375 | "\n", 376 | "  cross_ent = tf.nn.sigmoid_cross_entropy_with_logits(logits=x_logit, labels=x)\n", 377 | "  logpx_z = -tf.reduce_sum(cross_ent, axis=[1, 2, 3])\n", 378 | "  logpz = log_normal_pdf(z, 0., 0.)\n", 379 | "  logqz_x = log_normal_pdf(z, mean, logvar)\n", 380 | "  return -tf.reduce_mean(logpx_z + logpz - logqz_x)\n", 381 | "\n", 382 | "def compute_gradients(model, x):\n", 383 | "  with tf.GradientTape() as tape:\n", 384 | "    loss = compute_loss(model, x)\n", 385 | "  return tape.gradient(loss, model.trainable_variables), loss\n", 386 | "\n", 387 | "def apply_gradients(optimizer, gradients, variables):\n", 388 | "  optimizer.apply_gradients(zip(gradients, variables))" 389 | ] 390 | }, 391 | { 392 | "cell_type": "markdown", 393 | "metadata": { 394 | "colab_type": "text", 395 | "id": "Rw1fkAczTQYh" 396 | }, 397 | "source": [ 398 | "## Training\n", 399 | "\n", 400 | "* We start by iterating over the dataset\n", 401 | "* During each iteration, we pass the image to the encoder to obtain a set of mean and log-variance parameters of the approximate posterior $q(z|x)$\n", 402 | "* We then apply the *reparameterization trick* to sample from $q(z|x)$\n", 403 | "* Finally, we pass the reparameterized samples to the decoder to obtain the logits of the generative distribution $p(x|z)$\n", 404 | "* **Note:** Since we use the dataset loaded by Keras with 60k datapoints in the training set and 10k datapoints in the test set, our resulting ELBO on the test set is slightly higher than results reported in the literature, which use dynamic binarization of Larochelle's MNIST.\n", 405 | "\n", 406 | "## Generate Images\n", 407 | "\n", 408 | "* After training, it is time to generate some images\n", 409 | "* We start by sampling a set of 
latent vectors from the unit Gaussian prior distribution $p(z)$\n", 410 | "* The generator will then convert the latent sample $z$ to logits of the observation, giving a distribution $p(x|z)$\n", 411 | "* Here we plot the probabilities of Bernoulli distributions\n" 412 | ] 413 | }, 414 | { 415 | "cell_type": "code", 416 | "execution_count": 10, 417 | "metadata": { 418 | "colab": {}, 419 | "colab_type": "code", 420 | "id": "NS2GWywBbAWo" 421 | }, 422 | "outputs": [], 423 | "source": [ 424 | "epochs = 32\n", 425 | "latent_dim = 50\n", 426 | "num_examples_to_generate = 16\n", 427 | "\n", 428 | "# keeping the random vector constant for generation (prediction) so\n", 429 | "# it will be easier to see the improvement.\n", 430 | "random_vector_for_generation = tf.random.normal(\n", 431 | "    shape=[num_examples_to_generate, latent_dim])\n", 432 | "model = CVAE(latent_dim)" 433 | ] 434 | }, 435 | { 436 | "cell_type": "code", 437 | "execution_count": 11, 438 | "metadata": { 439 | "colab": {}, 440 | "colab_type": "code", 441 | "id": "RmdVsmvhPxyy" 442 | }, 443 | "outputs": [], 444 | "source": [ 445 | "def generate_and_save_images(model, epoch, test_input):\n", 446 | "  predictions = model.sample(test_input)\n", 447 | "  fig = plt.figure(figsize=(4,4))\n", 448 | "\n", 449 | "  for i in range(predictions.shape[0]):\n", 450 | "    plt.subplot(4, 4, i+1)\n", 451 | "    plt.imshow(predictions[i, :, :, 0], cmap='gray')\n", 452 | "    plt.axis('off')\n", 453 | "\n", 454 | "  # Save the 4x4 grid of samples for this epoch, then display it\n", 455 | "  plt.savefig('image_at_epoch_{:04d}.png'.format(epoch))\n", 456 | "  plt.show()" 457 | ] 458 | }, 459 | { 460 | "cell_type": "code", 461 | "execution_count": null, 462 | "metadata": { 463 | "colab": {}, 464 | "colab_type": "code", 465 | "id": "2M7LmLtGEMQJ" 466 | }, 467 | "outputs": [], 468 | "source": [ 469 | "generate_and_save_images(model, 0, random_vector_for_generation)\n", 470 | "\n", 471 | "for epoch in range(1, epochs + 1):\n", 472 | "  start_time = time.time()\n", 473 | "  for train_x in train_dataset:\n", 474 | "    gradients, loss = compute_gradients(model, train_x)\n", 475 | "    apply_gradients(optimizer, gradients, model.trainable_variables)\n", 476 | "  end_time = time.time()\n", 477 | "\n", 478 | "  if epoch % 1 == 0:\n", 479 | "    loss = tf.keras.metrics.Mean()\n", 480 | "    for test_x in test_dataset:\n", 481 | "      loss(compute_loss(model, test_x))\n", 482 | "    elbo = -loss.result()\n", 483 | "    display.clear_output(wait=False)\n", 484 | "    print('Epoch: {}, Test set ELBO: {}, '\n", 485 | "          'time elapsed for current epoch {}'.format(epoch,\n", 486 | "                                                     elbo,\n", 487 | "                                                     end_time - start_time))\n", 488 | "    generate_and_save_images(\n", 489 | "        model, epoch, random_vector_for_generation)" 490 | ] 491 | }, 492 | { 493 | "cell_type": "markdown", 494 | "metadata": { 495 | "colab_type": "text", 496 | "id": "P4M_vIbUi7c0" 497 | }, 498 | "source": [ 499 | "### Display an image using the epoch number" 500 | ] 501 | }, 502 | { 503 | "cell_type": "code", 504 | "execution_count": null, 505 | "metadata": { 506 | "colab": {}, 507 | "colab_type": "code", 508 | "id": "WfO5wCdclHGL" 509 | }, 510 | "outputs": [], 511 | "source": [ 512 | "def display_image(epoch_no):\n", 513 | "  return PIL.Image.open('image_at_epoch_{:04d}.png'.format(epoch_no))" 514 | ] 515 | }, 516 | { 517 | "cell_type": "code", 518 | "execution_count": null, 519 | "metadata": { 520 | "colab": {}, 521 | "colab_type": "code", 522 | "id": "5x3q9_Oe5q0A" 523 | }, 524 | "outputs": [], 525 | "source": [ 526 | "plt.imshow(display_image(epochs))\n", 527 | "plt.axis('off')  # Display images" 528 | ] 529 | }, 530 | { 531 | "cell_type": "markdown", 532 | "metadata": { 533 | "colab_type": "text", 534 | "id": "NywiH3nL8guF" 535 | }, 536 | "source": [ 537 | "### Generate a GIF of all the saved images." 
538 | ] 539 | }, 540 | { 541 | "cell_type": "code", 542 | "execution_count": null, 543 | "metadata": { 544 | "colab": {}, 545 | "colab_type": "code", 546 | "id": "IGKQgENQ8lEI" 547 | }, 548 | "outputs": [], 549 | "source": [ 550 | "anim_file = 'cvae.gif'\n", 551 | "\n", 552 | "with imageio.get_writer(anim_file, mode='I') as writer:\n", 553 | "  filenames = glob.glob('image*.png')\n", 554 | "  filenames = sorted(filenames)\n", 555 | "  last = -1\n", 556 | "  for i,filename in enumerate(filenames):\n", 557 | "    frame = 2*(i**0.5)\n", 558 | "    if round(frame) > round(last):\n", 559 | "      last = frame\n", 560 | "    else:\n", 561 | "      continue\n", 562 | "    image = imageio.imread(filename)\n", 563 | "    writer.append_data(image)\n", 564 | "  image = imageio.imread(filename)\n", 565 | "  writer.append_data(image)\n", 566 | "\n", 567 | "import IPython\n", 568 | "if IPython.version_info >= (6,2,0,''):\n", 569 | "  # An expression inside an if block is not auto-rendered, so display it explicitly\n", 569 | "  display.display(display.Image(filename=anim_file))" 570 | ] 571 | }, 572 | { 573 | "cell_type": "markdown", 574 | "metadata": { 575 | "colab_type": "text", 576 | "id": "yQXO_dlXkKsT" 577 | }, 578 | "source": [ 579 | "If you're working in Colab you can download the animation with the code below:" 580 | ] 581 | }, 582 | { 583 | "cell_type": "code", 584 | "execution_count": null, 585 | "metadata": { 586 | "colab": {}, 587 | "colab_type": "code", 588 | "id": "4fSJS3m5HLFM" 589 | }, 590 | "outputs": [], 591 | "source": [ 592 | "try:\n", 593 | "  from google.colab import files\n", 594 | "except ImportError:\n", 595 | "  pass\n", 596 | "else:\n", 597 | "  files.download(anim_file)" 598 | ] 599 | } 600 | ], 601 | "metadata": { 602 | "accelerator": "GPU", 603 | "colab": { 604 | "collapsed_sections": [], 605 | "name": "cvae.ipynb", 606 | "private_outputs": true, 607 | "provenance": [], 608 | "toc_visible": true, 609 | "version": "0.3.2" 610 | }, 611 | "kernelspec": { 612 | "display_name": "tensorflow-gpu/2.0.0-beta-py36", 613 | "language": "python", 614 | "name": "tensorflow_gpu_2.0.0-beta-py36" 615 | }, 616 | 
"language_info": { 617 | "codemirror_mode": { 618 | "name": "ipython", 619 | "version": 3 620 | }, 621 | "file_extension": ".py", 622 | "mimetype": "text/x-python", 623 | "name": "python", 624 | "nbconvert_exporter": "python", 625 | "pygments_lexer": "ipython3", 626 | "version": "3.6.8" 627 | } 628 | }, 629 | "nbformat": 4, 630 | "nbformat_minor": 2 631 | } 632 | -------------------------------------------------------------------------------- /feature_columns.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "rNdWfPXCjTjY" 8 | }, 9 | "source": [ 10 | "##### Copyright 2019 The TensorFlow Authors." 11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": { 17 | "cellView": "form", 18 | "colab": {}, 19 | "colab_type": "code", 20 | "id": "I1dUQ0GejU8N" 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 25 | "# you may not use this file except in compliance with the License.\n", 26 | "# You may obtain a copy of the License at\n", 27 | "#\n", 28 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 29 | "#\n", 30 | "# Unless required by applicable law or agreed to in writing, software\n", 31 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 32 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 33 | "# See the License for the specific language governing permissions and\n", 34 | "# limitations under the License." 
35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "colab_type": "text", 41 | "id": "c05P9g5WjizZ" 42 | }, 43 | "source": [ 44 | "# Classify structured data" 45 | ] 46 | }, 47 | { 48 | "cell_type": "markdown", 49 | "metadata": { 50 | "colab_type": "text", 51 | "id": "zofH_gCzgplN" 52 | }, 53 | "source": [ 54 | "\n", 55 | " \n", 60 | " \n", 65 | " \n", 70 | " \n", 73 | "
" 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": { 79 | "colab_type": "text", 80 | "id": "K1y4OHpGgss7" 81 | }, 82 | "source": [ 83 | "This tutorial demonstrates how to classify structured data (e.g. tabular data in a CSV). We will use [Keras](https://www.tensorflow.org/guide/keras) to define the model, and [feature columns](https://www.tensorflow.org/guide/feature_columns) as a bridge to map from columns in a CSV to features used to train the model. This tutorial contains complete code to:\n", 84 | "\n", 85 | "* Load a CSV file using [Pandas](https://pandas.pydata.org/).\n", 86 | "* Build an input pipeline to batch and shuffle the rows using [tf.data](https://www.tensorflow.org/guide/datasets).\n", 87 | "* Map from columns in the CSV to features used to train the model using feature columns.\n", 88 | "* Build, train, and evaluate a model using Keras.\n", 89 | "\n", 90 | "## The Dataset\n", 91 | "\n", 92 | "We will use a small [dataset](https://archive.ics.uci.edu/ml/datasets/heart+Disease) provided by the Cleveland Clinic Foundation for Heart Disease. There are several hundred rows in the CSV. Each row describes a patient, and each column describes an attribute. We will use this information to predict whether a patient has heart disease, which in this dataset is a binary classification task.\n", 93 | "\n", 94 | "Following is a [description](https://archive.ics.uci.edu/ml/machine-learning-databases/heart-disease/heart-disease.names) of this dataset. 
Notice there are both numeric and categorical columns.\n", 95 | "\n", 96 | ">Column | Description | Feature Type | Data Type\n", 97 | ">------------|--------------------|----------------------|-----------------\n", 98 | ">Age | Age in years | Numerical | integer\n", 99 | ">Sex | (1 = male; 0 = female) | Categorical | integer\n", 100 | ">CP | Chest pain type (0, 1, 2, 3, 4) | Categorical | integer\n", 101 | ">Trestbps | Resting blood pressure (in mm Hg on admission to the hospital) | Numerical | integer\n", 102 | ">Chol | Serum cholesterol in mg/dl | Numerical | integer\n", 103 | ">FBS | (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false) | Categorical | integer\n", 104 | ">RestECG | Resting electrocardiographic results (0, 1, 2) | Categorical | integer\n", 105 | ">Thalach | Maximum heart rate achieved | Numerical | integer\n", 106 | ">Exang | Exercise induced angina (1 = yes; 0 = no) | Categorical | integer\n", 107 | ">Oldpeak | ST depression induced by exercise relative to rest | Numerical | float\n", 108 | ">Slope | The slope of the peak exercise ST segment | Numerical | integer\n", 109 | ">CA | Number of major vessels (0-3) colored by fluoroscopy | Numerical | integer\n", 110 | ">Thal | 3 = normal; 6 = fixed defect; 7 = reversible defect | Categorical | string\n", 111 | ">Target | Diagnosis of heart disease (1 = true; 0 = false) | Classification | integer" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": { 117 | "colab_type": "text", 118 | "id": "VxyBFc_kKazA" 119 | }, 120 | "source": [ 121 | "## Import TensorFlow and other libraries" 122 | ] 123 | }, 124 | { 125 | "cell_type": "code", 126 | "execution_count": 2, 127 | "metadata": {}, 128 | "outputs": [], 129 | "source": [ 130 | "# Tell TF not to consume all GPU memory so you can run more than one notebook at once\n", 131 | "import os\n", 132 | "os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 3, 138 | 
"metadata": {}, 139 | "outputs": [ 140 | { 141 | "name": "stdout", 142 | "output_type": "stream", 143 | "text": [ 144 | "\u001b[1m\u001b[37mcgpu01\u001b[m Fri Jul 12 16:39:23 2019\n", 145 | "\u001b[36m[0]\u001b[m \u001b[34mTesla V100-SXM2-16GB\u001b[m |\u001b[31m 33'C\u001b[m, \u001b[32m 0 %\u001b[m | \u001b[36m\u001b[1m\u001b[33m 0\u001b[m / \u001b[33m16130\u001b[m MB |\n" 146 | ] 147 | } 148 | ], 149 | "source": [ 150 | "!gpustat" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 4, 156 | "metadata": { 157 | "colab": {}, 158 | "colab_type": "code", 159 | "id": "9dEreb4QKizj" 160 | }, 161 | "outputs": [], 162 | "source": [ 163 | "from __future__ import absolute_import, division, print_function, unicode_literals\n", 164 | "\n", 165 | "import numpy as np\n", 166 | "import pandas as pd\n", 167 | "\n", 168 | "import tensorflow as tf\n", 169 | "\n", 170 | "from tensorflow import feature_column\n", 171 | "from tensorflow.keras import layers\n", 172 | "from sklearn.model_selection import train_test_split" 173 | ] 174 | }, 175 | { 176 | "cell_type": "markdown", 177 | "metadata": { 178 | "colab_type": "text", 179 | "id": "KCEhSZcULZ9n" 180 | }, 181 | "source": [ 182 | "## Use Pandas to create a dataframe\n", 183 | "\n", 184 | "[Pandas](https://pandas.pydata.org/) is a Python library with many helpful utilities for loading and working with structured data. We will use Pandas to download the dataset from a URL, and load it into a dataframe." 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 5, 190 | "metadata": { 191 | "colab": {}, 192 | "colab_type": "code", 193 | "id": "REZ57BXCLdfG" 194 | }, 195 | "outputs": [ 196 | { 197 | "data": { 198 | "text/html": [ 199 | "
\n", 200 | "\n", 213 | "\n", 214 | " \n", 215 | " \n", 216 | " \n", 217 | " \n", 218 | " \n", 219 | " \n", 220 | " \n", 221 | " \n", 222 | " \n", 223 | " \n", 224 | " \n", 225 | " \n", 226 | " \n", 227 | " \n", 228 | " \n", 229 | " \n", 230 | " \n", 231 | " \n", 232 | " \n", 233 | " \n", 234 | " \n", 235 | " \n", 236 | " \n", 237 | " \n", 238 | " \n", 239 | " \n", 240 | " \n", 241 | " \n", 242 | " \n", 243 | " \n", 244 | " \n", 245 | " \n", 246 | " \n", 247 | " \n", 248 | " \n", 249 | " \n", 250 | " \n", 251 | " \n", 252 | " \n", 253 | " \n", 254 | " \n", 255 | " \n", 256 | " \n", 257 | " \n", 258 | " \n", 259 | " \n", 260 | " \n", 261 | " \n", 262 | " \n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | " \n", 298 | " \n", 299 | " \n", 300 | " \n", 301 | " \n", 302 | " \n", 303 | " \n", 304 | " \n", 305 | " \n", 306 | " \n", 307 | " \n", 308 | " \n", 309 | " \n", 310 | " \n", 311 | " \n", 312 | " \n", 313 | " \n", 314 | " \n", 315 | " \n", 316 | " \n", 317 | " \n", 318 | " \n", 319 | " \n", 320 | "
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal        target
0  63   1    1   145       233   1    2        150      0      2.3      3      0   fixed       0
1  67   1    4   160       286   0    2        108      1      1.5      2      3   normal      1
2  67   1    4   120       229   0    2        129      1      2.6      2      2   reversible  0
3  37   1    3   130       250   0    0        187      0      3.5      3      0   normal      0
4  41   0    2   130       204   0    2        172      0      1.4      1      0   normal      0
\n", 321 | "
" 322 | ], 323 | "text/plain": [ 324 | " age sex cp trestbps chol fbs restecg thalach exang oldpeak slope \\\n", 325 | "0 63 1 1 145 233 1 2 150 0 2.3 3 \n", 326 | "1 67 1 4 160 286 0 2 108 1 1.5 2 \n", 327 | "2 67 1 4 120 229 0 2 129 1 2.6 2 \n", 328 | "3 37 1 3 130 250 0 0 187 0 3.5 3 \n", 329 | "4 41 0 2 130 204 0 2 172 0 1.4 1 \n", 330 | "\n", 331 | " ca thal target \n", 332 | "0 0 fixed 0 \n", 333 | "1 3 normal 1 \n", 334 | "2 2 reversible 0 \n", 335 | "3 0 normal 0 \n", 336 | "4 0 normal 0 " 337 | ] 338 | }, 339 | "execution_count": 5, 340 | "metadata": {}, 341 | "output_type": "execute_result" 342 | } 343 | ], 344 | "source": [ 345 | "URL = 'https://storage.googleapis.com/applied-dl/heart.csv'\n", 346 | "dataframe = pd.read_csv(URL)\n", 347 | "dataframe.head()" 348 | ] 349 | }, 350 | { 351 | "cell_type": "markdown", 352 | "metadata": { 353 | "colab_type": "text", 354 | "id": "u0zhLtQqMPem" 355 | }, 356 | "source": [ 357 | "## Split the dataframe into train, validation, and test\n", 358 | "\n", 359 | "The dataset we downloaded was a single CSV file. We will split this into train, validation, and test sets." 
360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": 6, 365 | "metadata": { 366 | "colab": {}, 367 | "colab_type": "code", 368 | "id": "YEOpw7LhMYsI" 369 | }, 370 | "outputs": [ 371 | { 372 | "name": "stdout", 373 | "output_type": "stream", 374 | "text": [ 375 | "193 train examples\n", 376 | "49 validation examples\n", 377 | "61 test examples\n" 378 | ] 379 | } 380 | ], 381 | "source": [ 382 | "train, test = train_test_split(dataframe, test_size=0.2)\n", 383 | "train, val = train_test_split(train, test_size=0.2)\n", 384 | "print(len(train), 'train examples')\n", 385 | "print(len(val), 'validation examples')\n", 386 | "print(len(test), 'test examples')" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": { 392 | "colab_type": "text", 393 | "id": "84ef46LXMfvu" 394 | }, 395 | "source": [ 396 | "## Create an input pipeline using tf.data\n", 397 | "\n", 398 | "Next, we will wrap the dataframes with [tf.data](https://www.tensorflow.org/guide/datasets). This will enable us to use feature columns as a bridge to map from the columns in the Pandas dataframe to features used to train the model. If we were working with a very large CSV file (so large that it does not fit into memory), we would use tf.data to read it from disk directly. That is not covered in this tutorial." 
399 | ] 400 | }, 401 | { 402 | "cell_type": "code", 403 | "execution_count": 7, 404 | "metadata": { 405 | "colab": {}, 406 | "colab_type": "code", 407 | "id": "NkcaMYP-MsRe" 408 | }, 409 | "outputs": [], 410 | "source": [ 411 | "# A utility method to create a tf.data dataset from a Pandas Dataframe\n", 412 | "def df_to_dataset(dataframe, shuffle=True, batch_size=32):\n", 413 | "  dataframe = dataframe.copy()\n", 414 | "  labels = dataframe.pop('target')\n", 415 | "  ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))\n", 416 | "  if shuffle:\n", 417 | "    ds = ds.shuffle(buffer_size=len(dataframe))\n", 418 | "  ds = ds.batch(batch_size)\n", 419 | "  return ds" 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": 8, 425 | "metadata": { 426 | "colab": {}, 427 | "colab_type": "code", 428 | "id": "CXbbXkJvMy34" 429 | }, 430 | "outputs": [], 431 | "source": [ 432 | "batch_size = 5 # A small batch size is used for demonstration purposes\n", 433 | "train_ds = df_to_dataset(train, batch_size=batch_size)\n", 434 | "val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)\n", 435 | "test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)" 436 | ] 437 | }, 438 | { 439 | "cell_type": "markdown", 440 | "metadata": { 441 | "colab_type": "text", 442 | "id": "qRLGSMDzM-dl" 443 | }, 444 | "source": [ 445 | "## Understand the input pipeline\n", 446 | "\n", 447 | "Now that we have created the input pipeline, let's call it to see the format of the data it returns. We have used a small batch size to keep the output readable." 
448 | ] 449 | }, 450 | { 451 | "cell_type": "code", 452 | "execution_count": 13, 453 | "metadata": { 454 | "colab": {}, 455 | "colab_type": "code", 456 | "id": "CSBo3dUVNFc9" 457 | }, 458 | "outputs": [ 459 | { 460 | "name": "stdout", 461 | "output_type": "stream", 462 | "text": [ 463 | "Every feature: ['age', 'sex', 'cp', 'trestbps', 'chol', 'fbs', 'restecg', 'thalach', 'exang', 'oldpeak', 'slope', 'ca', 'thal']\n", 464 | "A batch of ages: tf.Tensor([65 45 65 67 67], shape=(5,), dtype=int32)\n", 465 | "A batch of targets: tf.Tensor([0 0 0 0 1], shape=(5,), dtype=int32)\n" 466 | ] 467 | } 468 | ], 469 | "source": [ 470 | "for feature_batch, label_batch in train_ds.take(1):\n", 471 | "  print('Every feature:', list(feature_batch.keys()))\n", 472 | "  print('A batch of ages:', feature_batch['age'])\n", 473 | "  print('A batch of targets:', label_batch)" 474 | ] 475 | }, 476 | { 477 | "cell_type": "markdown", 478 | "metadata": { 479 | "colab_type": "text", 480 | "id": "OT5N6Se-NQsC" 481 | }, 482 | "source": [ 483 | "We can see that the dataset returns a dictionary of column names (from the dataframe) that map to column values from rows in the dataframe." 484 | ] 485 | }, 486 | { 487 | "cell_type": "markdown", 488 | "metadata": { 489 | "colab_type": "text", 490 | "id": "ttIvgLRaNoOQ" 491 | }, 492 | "source": [ 493 | "## Demonstrate several types of feature columns\n", 494 | "TensorFlow provides many types of feature columns. In this section, we will create several types of feature columns, and demonstrate how they transform a column from the dataframe." 
495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": 14, 500 | "metadata": { 501 | "colab": {}, 502 | "colab_type": "code", 503 | "id": "mxwiHFHuNhmf" 504 | }, 505 | "outputs": [], 506 | "source": [ 507 | "# We will use this batch to demonstrate several types of feature columns\n", 508 | "example_batch = next(iter(train_ds))[0]" 509 | ] 510 | }, 511 | { 512 | "cell_type": "code", 513 | "execution_count": 12, 514 | "metadata": { 515 | "colab": {}, 516 | "colab_type": "code", 517 | "id": "0wfLB8Q3N3UH" 518 | }, 519 | "outputs": [], 520 | "source": [ 521 | "# A utility method that wraps a feature column in a DenseFeatures layer\n", 522 | "# and uses it to transform the example batch\n", 523 | "def demo(feature_column):\n", 524 | "  feature_layer = layers.DenseFeatures(feature_column)\n", 525 | "  print(feature_layer(example_batch).numpy())" 526 | ] 527 | }, 528 | { 529 | "cell_type": "markdown", 530 | "metadata": { 531 | "colab_type": "text", 532 | "id": "Q7OEKe82N-Qb" 533 | }, 534 | "source": [ 535 | "### Numeric columns\n", 536 | "The output of a feature column becomes the input to the model (using the demo function defined above, we will be able to see exactly how each column from the dataframe is transformed). A [numeric column](https://www.tensorflow.org/api_docs/python/tf/feature_column/numeric_column) is the simplest type of column. It is used to represent real-valued features. When using this column, your model will receive the column value from the dataframe unchanged." 
537 | ] 538 | }, 539 | { 540 | "cell_type": "code", 541 | "execution_count": 13, 542 | "metadata": { 543 | "colab": {}, 544 | "colab_type": "code", 545 | "id": "QZTZ0HnHOCxC" 546 | }, 547 | "outputs": [ 548 | { 549 | "name": "stdout", 550 | "output_type": "stream", 551 | "text": [ 552 | "[[66.]\n", 553 | " [62.]\n", 554 | " [65.]\n", 555 | " [54.]\n", 556 | " [55.]]\n" 557 | ] 558 | } 559 | ], 560 | "source": [ 561 | "age = feature_column.numeric_column(\"age\")\n", 562 | "demo(age)" 563 | ] 564 | }, 565 | { 566 | "cell_type": "markdown", 567 | "metadata": { 568 | "colab_type": "text", 569 | "id": "7a6ddSyzOKpq" 570 | }, 571 | "source": [ 572 | "In the heart disease dataset, most columns from the dataframe are numeric." 573 | ] 574 | }, 575 | { 576 | "cell_type": "markdown", 577 | "metadata": { 578 | "colab_type": "text", 579 | "id": "IcSxUoYgOlA1" 580 | }, 581 | "source": [ 582 | "### Bucketized columns\n", 583 | "Often, you don't want to feed a number directly into the model, but instead split its value into different categories based on numerical ranges. Consider raw data that represents a person's age. Instead of representing age as a numeric column, we could split the age into several buckets using a [bucketized column](https://www.tensorflow.org/api_docs/python/tf/feature_column/bucketized_column). Notice the one-hot values below describe which age range each row matches." 584 | ] 585 | }, 586 | { 587 | "cell_type": "code", 588 | "execution_count": 14, 589 | "metadata": { 590 | "colab": {}, 591 | "colab_type": "code", 592 | "id": "wJ4Wt3SAOpTQ" 593 | }, 594 | "outputs": [ 595 | { 596 | "name": "stdout", 597 | "output_type": "stream", 598 | "text": [ 599 | "[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]\n", 600 | " [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]\n", 601 | " [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]\n", 602 | " [0. 0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]\n", 603 | " [0. 0. 0. 0. 0. 0. 0. 0. 1. 0. 
0.]]\n" 604 | ] 605 | } 606 | ], 607 | "source": [ 608 | "age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])\n", 609 | "demo(age_buckets)" 610 | ] 611 | }, 612 | { 613 | "cell_type": "markdown", 614 | "metadata": { 615 | "colab_type": "text", 616 | "id": "r1tArzewPb-b" 617 | }, 618 | "source": [ 619 | "### Categorical columns\n", 620 | "In this dataset, thal is represented as a string (e.g. 'fixed', 'normal', or 'reversible'). We cannot feed strings directly to a model. Instead, we must first map them to numeric values. The categorical vocabulary columns provide a way to represent strings as a one-hot vector (much like you have seen above with age buckets). The vocabulary can be passed as a list using [categorical_column_with_vocabulary_list](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_list), or loaded from a file using [categorical_column_with_vocabulary_file](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_vocabulary_file)." 
621 | ] 622 | }, 623 | { 624 | "cell_type": "code", 625 | "execution_count": 15, 626 | "metadata": { 627 | "colab": {}, 628 | "colab_type": "code", 629 | "id": "DJ6QnSHkPtOC" 630 | }, 631 | "outputs": [ 632 | { 633 | "name": "stderr", 634 | "output_type": "stream", 635 | "text": [ 636 | "WARNING: Logging before flag parsing goes to stderr.\n", 637 | "W0708 17:44:04.147350 46912496740800 deprecation.py:323] From /usr/common/software/tensorflow/gpu-tensorflow/2.0.0-beta-py36/lib/python3.6/site-packages/tensorflow/python/feature_column/feature_column_v2.py:2655: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", 638 | "Instructions for updating:\n", 639 | "Use tf.where in 2.0, which has the same broadcast rule as np.where\n", 640 | "W0708 17:44:04.177947 46912496740800 deprecation.py:323] From /usr/common/software/tensorflow/gpu-tensorflow/2.0.0-beta-py36/lib/python3.6/site-packages/tensorflow/python/feature_column/feature_column_v2.py:4215: IndicatorColumn._variable_shape (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.\n", 641 | "Instructions for updating:\n", 642 | "The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.\n", 643 | "W0708 17:44:04.178982 46912496740800 deprecation.py:323] From /usr/common/software/tensorflow/gpu-tensorflow/2.0.0-beta-py36/lib/python3.6/site-packages/tensorflow/python/feature_column/feature_column_v2.py:4270: VocabularyListCategoricalColumn._num_buckets (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.\n", 644 | "Instructions for updating:\n", 645 | "The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.\n" 646 | ] 647 | }, 648 | { 649 | "name": "stdout", 650 | "output_type": "stream", 651 | "text": [ 652 | "[[0. 1. 0.]\n", 653 | " [0. 1. 
0.]\n", 654 | " [0. 1. 0.]\n", 655 | " [0. 1. 0.]\n", 656 | " [0. 1. 0.]]\n" 657 | ] 658 | } 659 | ], 660 | "source": [ 661 | "thal = feature_column.categorical_column_with_vocabulary_list(\n", 662 | " 'thal', ['fixed', 'normal', 'reversible'])\n", 663 | "\n", 664 | "thal_one_hot = feature_column.indicator_column(thal)\n", 665 | "demo(thal_one_hot)" 666 | ] 667 | }, 668 | { 669 | "cell_type": "markdown", 670 | "metadata": { 671 | "colab_type": "text", 672 | "id": "dxQloQ9jOoXL" 673 | }, 674 | "source": [ 675 | "In a more complex dataset, many columns would be categorical (e.g. strings). Feature columns are most valuable when working with categorical data. Although there is only one categorical column in this dataset, we will use it to demonstrate several important types of feature columns that you could use when working with other datasets." 676 | ] 677 | }, 678 | { 679 | "cell_type": "markdown", 680 | "metadata": { 681 | "colab_type": "text", 682 | "id": "LEFPjUr6QmwS" 683 | }, 684 | "source": [ 685 | "### Embedding columns\n", 686 | "Suppose instead of having just a few possible strings, we have thousands (or more) values per category. For a number of reasons, as the number of categories grows large, it becomes infeasible to train a neural network using one-hot encodings. We can use an embedding column to overcome this limitation. Instead of representing the data as a one-hot vector of many dimensions, an [embedding column](https://www.tensorflow.org/api_docs/python/tf/feature_column/embedding_column) represents that data as a lower-dimensional, dense vector in which each cell can contain any number, not just 0 or 1. The size of the embedding (8, in the example below) is a parameter that must be tuned.\n", 687 | "\n", 688 | "Key point: using an embedding column is best when a categorical column has many possible values. We are using one here for demonstration purposes, so you have a complete example you can modify for a different dataset in the future." 
689 | ] 690 | }, 691 | { 692 | "cell_type": "code", 693 | "execution_count": 16, 694 | "metadata": { 695 | "colab": {}, 696 | "colab_type": "code", 697 | "id": "hSlohmr2Q_UU" 698 | }, 699 | "outputs": [ 700 | { 701 | "name": "stdout", 702 | "output_type": "stream", 703 | "text": [ 704 | "[[ 0.29484138 0.26313597 0.04232811 0.5781962 -0.00889751 0.11868574\n", 705 | " 0.16841811 -0.34431875]\n", 706 | " [ 0.29484138 0.26313597 0.04232811 0.5781962 -0.00889751 0.11868574\n", 707 | " 0.16841811 -0.34431875]\n", 708 | " [ 0.29484138 0.26313597 0.04232811 0.5781962 -0.00889751 0.11868574\n", 709 | " 0.16841811 -0.34431875]\n", 710 | " [ 0.29484138 0.26313597 0.04232811 0.5781962 -0.00889751 0.11868574\n", 711 | " 0.16841811 -0.34431875]\n", 712 | " [ 0.29484138 0.26313597 0.04232811 0.5781962 -0.00889751 0.11868574\n", 713 | " 0.16841811 -0.34431875]]\n" 714 | ] 715 | } 716 | ], 717 | "source": [ 718 | "# Notice the input to the embedding column is the categorical column\n", 719 | "# we previously created\n", 720 | "thal_embedding = feature_column.embedding_column(thal, dimension=8)\n", 721 | "demo(thal_embedding)" 722 | ] 723 | }, 724 | { 725 | "cell_type": "markdown", 726 | "metadata": { 727 | "colab_type": "text", 728 | "id": "urFCAvTVRMpB" 729 | }, 730 | "source": [ 731 | "### Hashed feature columns\n", 732 | "\n", 733 | "Another way to represent a categorical column with a large number of values is to use a [categorical_column_with_hash_bucket](https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_hash_bucket). This feature column calculates a hash value of the input, then selects one of the `hash_bucket_size` buckets to encode a string. 
When using this column, you do not need to provide the vocabulary, and you can choose to make the number of hash_buckets significantly smaller than the number of actual categories to save space.\n", 734 | "\n", 735 | "Key point: An important downside of this technique is that there may be collisions in which different strings are mapped to the same bucket. In practice, this can work well for some datasets regardless." 736 | ] 737 | }, 738 | { 739 | "cell_type": "code", 740 | "execution_count": 17, 741 | "metadata": { 742 | "colab": {}, 743 | "colab_type": "code", 744 | "id": "YHU_Aj2nRRDC" 745 | }, 746 | "outputs": [ 747 | { 748 | "name": "stderr", 749 | "output_type": "stream", 750 | "text": [ 751 | "W0708 17:44:25.522425 46912496740800 deprecation.py:323] From /usr/common/software/tensorflow/gpu-tensorflow/2.0.0-beta-py36/lib/python3.6/site-packages/tensorflow/python/feature_column/feature_column_v2.py:4270: HashedCategoricalColumn._num_buckets (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.\n", 752 | "Instructions for updating:\n", 753 | "The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.\n" 754 | ] 755 | }, 756 | { 757 | "name": "stdout", 758 | "output_type": "stream", 759 | "text": [ 760 | "[[0. 0. 0. ... 0. 0. 0.]\n", 761 | " [0. 0. 0. ... 0. 0. 0.]\n", 762 | " [0. 0. 0. ... 0. 0. 0.]\n", 763 | " [0. 0. 0. ... 0. 0. 0.]\n", 764 | " [0. 0. 0. ... 0. 0. 
0.]]\n" 765 | ] 766 | } 767 | ], 768 | "source": [ 769 | "thal_hashed = feature_column.categorical_column_with_hash_bucket(\n", 770 | " 'thal', hash_bucket_size=1000)\n", 771 | "demo(feature_column.indicator_column(thal_hashed))" 772 | ] 773 | }, 774 | { 775 | "cell_type": "markdown", 776 | "metadata": { 777 | "colab_type": "text", 778 | "id": "fB94M27DRXtZ" 779 | }, 780 | "source": [ 781 | "### Crossed feature columns\n", 782 | "Combining features into a single feature, better known as [feature crosses](https://developers.google.com/machine-learning/glossary/#feature_cross), enables a model to learn separate weights for each combination of features. Here, we will create a new feature that is the cross of the age buckets and thal. Note that `crossed_column` does not build the full table of all possible combinations (which could be very large). Instead, it hashes each combination into one of `hash_bucket_size` buckets, so you can choose how large the table is." 783 | ] 784 | }, 785 | { 786 | "cell_type": "code", 787 | "execution_count": 18, 788 | "metadata": { 789 | "colab": {}, 790 | "colab_type": "code", 791 | "id": "oaPVERd9Rep6" 792 | }, 793 | "outputs": [ 794 | { 795 | "name": "stderr", 796 | "output_type": "stream", 797 | "text": [ 798 | "W0708 17:44:28.542766 46912496740800 deprecation.py:323] From /usr/common/software/tensorflow/gpu-tensorflow/2.0.0-beta-py36/lib/python3.6/site-packages/tensorflow/python/feature_column/feature_column_v2.py:4270: CrossedColumn._num_buckets (from tensorflow.python.feature_column.feature_column_v2) is deprecated and will be removed in a future version.\n", 799 | "Instructions for updating:\n", 800 | "The old _FeatureColumn APIs are being deprecated. Please use the new FeatureColumn APIs instead.\n" 801 | ] 802 | }, 803 | { 804 | "name": "stdout", 805 | "output_type": "stream", 806 | "text": [ 807 | "[[0. 0. 0. ... 0. 0. 0.]\n", 808 | " [0. 0. 0. ... 0. 0. 0.]\n", 809 | " [0. 0. 0. ... 0. 0. 0.]\n", 810 | " [0. 0. 0. ... 0. 0. 0.]\n", 811 | " [0. 0. 0. ... 0. 0. 
0.]]\n" 812 | ] 813 | } 814 | ], 815 | "source": [ 816 | "crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)\n", 817 | "demo(feature_column.indicator_column(crossed_feature))" 818 | ] 819 | }, 820 | { 821 | "cell_type": "markdown", 822 | "metadata": { 823 | "colab_type": "text", 824 | "id": "ypkI9zx6Rj1q" 825 | }, 826 | "source": [ 827 | "## Choose which columns to use\n", 828 | "We have seen how to use several types of feature columns. Now we will use them to train a model. The goal of this tutorial is to show you the complete code (i.e., the mechanics) needed to work with feature columns. We have arbitrarily selected a few columns to train our model below.\n", 829 | "\n", 830 | "Key point: If your aim is to build an accurate model, try a larger dataset of your own, and think carefully about which features are the most meaningful to include, and how they should be represented." 831 | ] 832 | }, 833 | { 834 | "cell_type": "code", 835 | "execution_count": 19, 836 | "metadata": { 837 | "colab": {}, 838 | "colab_type": "code", 839 | "id": "4PlLY7fORuzA" 840 | }, 841 | "outputs": [], 842 | "source": [ 843 | "feature_columns = []\n", 844 | "\n", 845 | "# numeric cols\n", 846 | "for header in ['age', 'trestbps', 'chol', 'thalach', 'oldpeak', 'slope', 'ca']:\n", 847 | " feature_columns.append(feature_column.numeric_column(header))\n", 848 | "\n", 849 | "# bucketized cols\n", 850 | "age_buckets = feature_column.bucketized_column(age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65])\n", 851 | "feature_columns.append(age_buckets)\n", 852 | "\n", 853 | "# indicator cols\n", 854 | "thal = feature_column.categorical_column_with_vocabulary_list(\n", 855 | " 'thal', ['fixed', 'normal', 'reversible'])\n", 856 | "thal_one_hot = feature_column.indicator_column(thal)\n", 857 | "feature_columns.append(thal_one_hot)\n", 858 | "\n", 859 | "# embedding cols\n", 860 | "thal_embedding = feature_column.embedding_column(thal, dimension=8)\n", 861 | 
"feature_columns.append(thal_embedding)\n", 862 | "\n", 863 | "# crossed cols\n", 864 | "crossed_feature = feature_column.crossed_column([age_buckets, thal], hash_bucket_size=1000)\n", 865 | "crossed_feature = feature_column.indicator_column(crossed_feature)\n", 866 | "feature_columns.append(crossed_feature)" 867 | ] 868 | }, 869 | { 870 | "cell_type": "markdown", 871 | "metadata": { 872 | "colab_type": "text", 873 | "id": "M-nDp8krS_ts" 874 | }, 875 | "source": [ 876 | "### Create a feature layer\n", 877 | "Now that we have defined our feature columns, we will use a [DenseFeatures](https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/keras/layers/DenseFeatures) layer to input them to our Keras model." 878 | ] 879 | }, 880 | { 881 | "cell_type": "code", 882 | "execution_count": 20, 883 | "metadata": { 884 | "colab": {}, 885 | "colab_type": "code", 886 | "id": "6o-El1R2TGQP" 887 | }, 888 | "outputs": [], 889 | "source": [ 890 | "feature_layer = tf.keras.layers.DenseFeatures(feature_columns)" 891 | ] 892 | }, 893 | { 894 | "cell_type": "markdown", 895 | "metadata": { 896 | "colab_type": "text", 897 | "id": "8cf6vKfgTH0U" 898 | }, 899 | "source": [ 900 | "Earlier, we used a small batch size to demonstrate how feature columns worked. We create a new input pipeline with a larger batch size." 
901 | ] 902 | }, 903 | { 904 | "cell_type": "code", 905 | "execution_count": 21, 906 | "metadata": { 907 | "colab": {}, 908 | "colab_type": "code", 909 | "id": "gcemszoGSse_" 910 | }, 911 | "outputs": [], 912 | "source": [ 913 | "batch_size = 32\n", 914 | "train_ds = df_to_dataset(train, batch_size=batch_size)\n", 915 | "val_ds = df_to_dataset(val, shuffle=False, batch_size=batch_size)\n", 916 | "test_ds = df_to_dataset(test, shuffle=False, batch_size=batch_size)" 917 | ] 918 | }, 919 | { 920 | "cell_type": "markdown", 921 | "metadata": { 922 | "colab_type": "text", 923 | "id": "bBx4Xu0eTXWq" 924 | }, 925 | "source": [ 926 | "## Create, compile, and train the model" 927 | ] 928 | }, 929 | { 930 | "cell_type": "code", 931 | "execution_count": 22, 932 | "metadata": { 933 | "colab": {}, 934 | "colab_type": "code", 935 | "id": "_YJPPb3xTPeZ" 936 | }, 937 | "outputs": [ 938 | { 939 | "name": "stdout", 940 | "output_type": "stream", 941 | "text": [ 942 | "Epoch 1/5\n", 943 | "7/7 [==============================] - 1s 136ms/step - loss: 1.7815 - accuracy: 0.6680 - val_loss: 2.5096 - val_accuracy: 0.6735\n", 944 | "Epoch 2/5\n", 945 | "7/7 [==============================] - 0s 35ms/step - loss: 2.2612 - accuracy: 0.7451 - val_loss: 0.9468 - val_accuracy: 0.6735\n", 946 | "Epoch 3/5\n", 947 | "7/7 [==============================] - 0s 34ms/step - loss: 0.7198 - accuracy: 0.6553 - val_loss: 0.9441 - val_accuracy: 0.6735\n", 948 | "Epoch 4/5\n", 949 | "7/7 [==============================] - 0s 34ms/step - loss: 1.0339 - accuracy: 0.7451 - val_loss: 0.4733 - val_accuracy: 0.7551\n", 950 | "Epoch 5/5\n", 951 | "7/7 [==============================] - 0s 34ms/step - loss: 0.4802 - accuracy: 0.6961 - val_loss: 0.5859 - val_accuracy: 0.6735\n" 952 | ] 953 | }, 954 | { 955 | "data": { 956 | "text/plain": [ 957 | "" 958 | ] 959 | }, 960 | "execution_count": 22, 961 | "metadata": {}, 962 | "output_type": "execute_result" 963 | } 964 | ], 965 | "source": [ 966 | "model = 
tf.keras.Sequential([\n", 967 | " feature_layer,\n", 968 | " layers.Dense(128, activation='relu'),\n", 969 | " layers.Dense(128, activation='relu'),\n", 970 | " layers.Dense(1, activation='sigmoid')\n", 971 | "])\n", 972 | "\n", 973 | "model.compile(optimizer='adam',\n", 974 | " loss='binary_crossentropy',\n", 975 | " metrics=['accuracy'],\n", 976 | " run_eagerly=True)\n", 977 | "\n", 978 | "model.fit(train_ds,\n", 979 | " validation_data=val_ds,\n", 980 | " epochs=5)" 981 | ] 982 | }, 983 | { 984 | "cell_type": "code", 985 | "execution_count": 23, 986 | "metadata": { 987 | "colab": {}, 988 | "colab_type": "code", 989 | "id": "GnFmMOW0Tcaa" 990 | }, 991 | "outputs": [ 992 | { 993 | "name": "stdout", 994 | "output_type": "stream", 995 | "text": [ 996 | "2/2 [==============================] - 0s 21ms/step - loss: 0.6455 - accuracy: 0.7213\n", 997 | "Accuracy 0.72131145\n" 998 | ] 999 | } 1000 | ], 1001 | "source": [ 1002 | "loss, accuracy = model.evaluate(test_ds)\n", 1003 | "print(\"Accuracy\", accuracy)" 1004 | ] 1005 | }, 1006 | { 1007 | "cell_type": "markdown", 1008 | "metadata": { 1009 | "colab_type": "text", 1010 | "id": "3bdfbq20V6zu" 1011 | }, 1012 | "source": [ 1013 | "Key point: Deep learning typically gives the best results on much larger and more complex datasets. When working with a small dataset like this one, we recommend using a decision tree or random forest as a strong baseline. The goal of this tutorial is not to train an accurate model, but to demonstrate the mechanics of working with structured data, so you have code to use as a starting point when working with your own datasets in the future." 1014 | ] 1015 | }, 1016 | { 1017 | "cell_type": "markdown", 1018 | "metadata": { 1019 | "colab_type": "text", 1020 | "id": "SotnhVWuHQCw" 1021 | }, 1022 | "source": [ 1023 | "## Next steps\n", 1024 | "The best way to learn more about classifying structured data is to try it yourself. 
We suggest finding another dataset to work with, and training a model to classify it using code similar to the above. To improve accuracy, think carefully about which features to include in your model, and how they should be represented." 1025 | ] 1026 | } 1027 | ], 1028 | "metadata": { 1029 | "colab": { 1030 | "collapsed_sections": [], 1031 | "name": "feature_columns.ipynb", 1032 | "private_outputs": true, 1033 | "provenance": [], 1034 | "toc_visible": true, 1035 | "version": "0.3.2" 1036 | }, 1037 | "kernelspec": { 1038 | "display_name": "tensorflow-gpu/2.0.0-beta-py36", 1039 | "language": "python", 1040 | "name": "tensorflow_gpu_2.0.0-beta-py36" 1041 | }, 1042 | "language_info": { 1043 | "codemirror_mode": { 1044 | "name": "ipython", 1045 | "version": 3 1046 | }, 1047 | "file_extension": ".py", 1048 | "mimetype": "text/x-python", 1049 | "name": "python", 1050 | "nbconvert_exporter": "python", 1051 | "pygments_lexer": "ipython3", 1052 | "version": "3.6.8" 1053 | } 1054 | }, 1055 | "nbformat": 4, 1056 | "nbformat_minor": 2 1057 | } 1058 | -------------------------------------------------------------------------------- /intro_to_cnns.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "x4HI2mpwlrcn" 8 | }, 9 | "source": [ 10 | "##### Copyright 2019 The TensorFlow Authors." 
11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": { 17 | "cellView": "form", 18 | "colab": {}, 19 | "colab_type": "code", 20 | "id": "679Lmwt3l1Bk" 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 25 | "# you may not use this file except in compliance with the License.\n", 26 | "# You may obtain a copy of the License at\n", 27 | "#\n", 28 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 29 | "#\n", 30 | "# Unless required by applicable law or agreed to in writing, software\n", 31 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 32 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 33 | "# See the License for the specific language governing permissions and\n", 34 | "# limitations under the License." 35 | ] 36 | }, 37 | { 38 | "cell_type": "markdown", 39 | "metadata": { 40 | "colab_type": "text", 41 | "id": "DSPCom-KmApV" 42 | }, 43 | "source": [ 44 | "# Convolutional Neural Networks" 45 | ] 46 | },
76 | { 77 | "cell_type": "markdown", 78 | "metadata": { 79 | "colab_type": "text", 80 | "id": "qLGkt5qiyz4E" 81 | }, 82 | "source": [ 83 | "This tutorial demonstrates training a simple [Convolutional Neural Network](https://developers.google.com/machine-learning/glossary/#convolutional_neural_network) (CNN) to classify MNIST digits. This simple network will achieve over 99% accuracy on the MNIST test set. Because this tutorial uses the [Keras Sequential API](https://www.tensorflow.org/guide/keras), creating and training our model will take just a few lines of code.\n", 84 | "\n", 85 | "Note: CNNs train faster with a GPU. If you are running this notebook with Colab, you can enable the free GPU via *Edit -> Notebook settings -> Hardware accelerator -> GPU*." 86 | ] 87 | }, 88 | { 89 | "cell_type": "markdown", 90 | "metadata": { 91 | "colab_type": "text", 92 | "id": "m7KBpffWzlxH" 93 | }, 94 | "source": [ 95 | "### Import TensorFlow" 96 | ] 97 | }, 98 | { 99 | "cell_type": "code", 100 | "execution_count": 3, 101 | "metadata": {}, 102 | "outputs": [], 103 | "source": [ 104 | "# Tell TF not to consume all GPU memory so you can run more than one notebook at once\n", 105 | "import os\n", 106 | "os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'" 107 | ] 108 | }, 109 | { 110 | "cell_type": "code", 111 | "execution_count": 32, 112 | "metadata": {}, 113 | "outputs": [ 114 | { 115 | "name": "stdout", 116 | "output_type": "stream", 117 | "text": [ 118 | "cgpu11 Wed Jul 10 16:23:19 2019\n", 119 | "[0] Tesla V100-SXM2-16GB | 39'C, 0 % | 536 / 16130 MB | sfarrell(525M)\n" 120 | ] 121 | } 122 | ], 123 | "source": [ 124 | "!gpustat" 125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": 2, 130 | "metadata": { 131 | "colab": {}, 132 | "colab_type": 
"code", 133 | "id": "iAve6DCL4JH4" 134 | }, 135 | "outputs": [], 136 | "source": [ 137 | "from __future__ import absolute_import, division, print_function, unicode_literals\n", 138 | "\n", 139 | "import tensorflow as tf\n", 140 | "\n", 141 | "from tensorflow.keras import datasets, layers, models" 142 | ] 143 | }, 144 | { 145 | "cell_type": "markdown", 146 | "metadata": { 147 | "colab_type": "text", 148 | "id": "jRFxccghyMVo" 149 | }, 150 | "source": [ 151 | "### Download and prepare the MNIST dataset" 152 | ] 153 | }, 154 | { 155 | "cell_type": "code", 156 | "execution_count": 3, 157 | "metadata": { 158 | "colab": {}, 159 | "colab_type": "code", 160 | "id": "JWoEqyMuXFF4" 161 | }, 162 | "outputs": [], 163 | "source": [ 164 | "(train_images, train_labels), (test_images, test_labels) = datasets.mnist.load_data()\n", 165 | "\n", 166 | "train_images = train_images.reshape((60000, 28, 28, 1))\n", 167 | "test_images = test_images.reshape((10000, 28, 28, 1))\n", 168 | "\n", 169 | "# Normalize pixel values to be between 0 and 1\n", 170 | "train_images, test_images = train_images / 255.0, test_images / 255.0" 171 | ] 172 | }, 173 | { 174 | "cell_type": "markdown", 175 | "metadata": { 176 | "colab_type": "text", 177 | "id": "Oewp-wYg31t9" 178 | }, 179 | "source": [ 180 | "### Create the convolutional base" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": { 186 | "colab_type": "text", 187 | "id": "3hQvqXpNyN3x" 188 | }, 189 | "source": [ 190 | "The 6 lines of code below define the convolutional base using a common pattern: a stack of [Conv2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D) and [MaxPooling2D](https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D) layers.\n", 191 | "\n", 192 | "As input, a CNN takes tensors of shape (image_height, image_width, color_channels), ignoring the batch size. 
If you are new to color channels, MNIST has one (because the images are grayscale), whereas a color image has three (R, G, B). In this example, we will configure our CNN to process inputs of shape (28, 28, 1), which is the format of MNIST images. We do this by passing the argument `input_shape` to our first layer.\n", 193 | "\n" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 4, 199 | "metadata": { 200 | "colab": {}, 201 | "colab_type": "code", 202 | "id": "L9YmGQBQPrdn" 203 | }, 204 | "outputs": [], 205 | "source": [ 206 | "model = models.Sequential()\n", 207 | "model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))\n", 208 | "model.add(layers.MaxPooling2D((2, 2)))\n", 209 | "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", 210 | "model.add(layers.MaxPooling2D((2, 2)))\n", 211 | "model.add(layers.Conv2D(64, (3, 3), activation='relu'))" 212 | ] 213 | }, 214 | { 215 | "cell_type": "markdown", 216 | "metadata": { 217 | "colab_type": "text", 218 | "id": "lvDVFkg-2DPm" 219 | }, 220 | "source": [ 221 | "Let's display the architecture of our model so far." 
222 | ] 223 | }, 224 | { 225 | "cell_type": "code", 226 | "execution_count": 5, 227 | "metadata": { 228 | "colab": {}, 229 | "colab_type": "code", 230 | "id": "8-C4XBg4UTJy" 231 | }, 232 | "outputs": [ 233 | { 234 | "name": "stdout", 235 | "output_type": "stream", 236 | "text": [ 237 | "Model: \"sequential\"\n", 238 | "_________________________________________________________________\n", 239 | "Layer (type) Output Shape Param # \n", 240 | "=================================================================\n", 241 | "conv2d (Conv2D) (None, 26, 26, 32) 320 \n", 242 | "_________________________________________________________________\n", 243 | "max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0 \n", 244 | "_________________________________________________________________\n", 245 | "conv2d_1 (Conv2D) (None, 11, 11, 64) 18496 \n", 246 | "_________________________________________________________________\n", 247 | "max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0 \n", 248 | "_________________________________________________________________\n", 249 | "conv2d_2 (Conv2D) (None, 3, 3, 64) 36928 \n", 250 | "=================================================================\n", 251 | "Total params: 55,744\n", 252 | "Trainable params: 55,744\n", 253 | "Non-trainable params: 0\n", 254 | "_________________________________________________________________\n" 255 | ] 256 | } 257 | ], 258 | "source": [ 259 | "model.summary()" 260 | ] 261 | }, 262 | { 263 | "cell_type": "markdown", 264 | "metadata": { 265 | "colab_type": "text", 266 | "id": "_j-AXYeZ2GO5" 267 | }, 268 | "source": [ 269 | "Above, you can see that the output of every Conv2D and MaxPooling2D layer is a 3D tensor of shape (height, width, channels). The width and height dimensions tend to shrink as we go deeper in the network. The number of output channels for each Conv2D layer is controlled by the first argument (e.g., 32 or 64). 
Typically, as the width and height shrink, we can afford (computationally) to add more output channels in each Conv2D layer." 270 | ] 271 | }, 272 | { 273 | "cell_type": "markdown", 274 | "metadata": { 275 | "colab_type": "text", 276 | "id": "_v8sVOtG37bT" 277 | }, 278 | "source": [ 279 | "### Add Dense layers on top\n", 280 | "To complete our model, we will feed the last output tensor from the convolutional base (of shape (3, 3, 64)) into one or more Dense layers to perform classification. Dense layers take vectors as input (which are 1D), while the current output is a 3D tensor. First, we will flatten (or unroll) the 3D output to 1D, then add one or more Dense layers on top. MNIST has 10 output classes, so we use a final Dense layer with 10 outputs and a softmax activation." 281 | ] 282 | }, 283 | { 284 | "cell_type": "code", 285 | "execution_count": 6, 286 | "metadata": { 287 | "colab": {}, 288 | "colab_type": "code", 289 | "id": "mRs95d6LUVEi" 290 | }, 291 | "outputs": [], 292 | "source": [ 293 | "model.add(layers.Flatten())\n", 294 | "model.add(layers.Dense(64, activation='relu'))\n", 295 | "model.add(layers.Dense(10, activation='softmax'))" 296 | ] 297 | }, 298 | { 299 | "cell_type": "markdown", 300 | "metadata": { 301 | "colab_type": "text", 302 | "id": "ipGiQMcR4Gtq" 303 | }, 304 | "source": [ 305 | "Here's the complete architecture of our model." 
306 | ] 307 | }, 308 | { 309 | "cell_type": "code", 310 | "execution_count": 7, 311 | "metadata": { 312 | "colab": {}, 313 | "colab_type": "code", 314 | "id": "8Yu_m-TZUWGX" 315 | }, 316 | "outputs": [ 317 | { 318 | "name": "stdout", 319 | "output_type": "stream", 320 | "text": [ 321 | "Model: \"sequential\"\n", 322 | "_________________________________________________________________\n", 323 | "Layer (type) Output Shape Param # \n", 324 | "=================================================================\n", 325 | "conv2d (Conv2D) (None, 26, 26, 32) 320 \n", 326 | "_________________________________________________________________\n", 327 | "max_pooling2d (MaxPooling2D) (None, 13, 13, 32) 0 \n", 328 | "_________________________________________________________________\n", 329 | "conv2d_1 (Conv2D) (None, 11, 11, 64) 18496 \n", 330 | "_________________________________________________________________\n", 331 | "max_pooling2d_1 (MaxPooling2 (None, 5, 5, 64) 0 \n", 332 | "_________________________________________________________________\n", 333 | "conv2d_2 (Conv2D) (None, 3, 3, 64) 36928 \n", 334 | "_________________________________________________________________\n", 335 | "flatten (Flatten) (None, 576) 0 \n", 336 | "_________________________________________________________________\n", 337 | "dense (Dense) (None, 64) 36928 \n", 338 | "_________________________________________________________________\n", 339 | "dense_1 (Dense) (None, 10) 650 \n", 340 | "=================================================================\n", 341 | "Total params: 93,322\n", 342 | "Trainable params: 93,322\n", 343 | "Non-trainable params: 0\n", 344 | "_________________________________________________________________\n" 345 | ] 346 | } 347 | ], 348 | "source": [ 349 | "model.summary()" 350 | ] 351 | }, 352 | { 353 | "cell_type": "markdown", 354 | "metadata": { 355 | "colab_type": "text", 356 | "id": "xNKXi-Gy3RO-" 357 | }, 358 | "source": [ 359 | "As you can see, our (3, 3, 64) outputs were 
flattened into vectors of shape (576) before going through two Dense layers." 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": { 365 | "colab_type": "text", 366 | "id": "P3odqfHP4M67" 367 | }, 368 | "source": [ 369 | "### Compile and train the model" 370 | ] 371 | }, 372 | { 373 | "cell_type": "code", 374 | "execution_count": 8, 375 | "metadata": { 376 | "colab": {}, 377 | "colab_type": "code", 378 | "id": "MdDzI75PUXrG" 379 | }, 380 | "outputs": [ 381 | { 382 | "name": "stderr", 383 | "output_type": "stream", 384 | "text": [ 385 | "WARNING: Logging before flag parsing goes to stderr.\n", 386 | "W0708 21:30:03.791706 46912496740800 deprecation.py:323] From /usr/common/software/tensorflow/gpu-tensorflow/2.0.0-beta-py36/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", 387 | "Instructions for updating:\n", 388 | "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" 389 | ] 390 | }, 391 | { 392 | "name": "stdout", 393 | "output_type": "stream", 394 | "text": [ 395 | "Train on 60000 samples\n", 396 | "Epoch 1/5\n", 397 | "60000/60000 [==============================] - 7s 123us/sample - loss: 0.1433 - accuracy: 0.9549\n", 398 | "Epoch 2/5\n", 399 | "60000/60000 [==============================] - 5s 87us/sample - loss: 0.0454 - accuracy: 0.9857\n", 400 | "Epoch 3/5\n", 401 | "60000/60000 [==============================] - 5s 87us/sample - loss: 0.0324 - accuracy: 0.9895\n", 402 | "Epoch 4/5\n", 403 | "60000/60000 [==============================] - 5s 86us/sample - loss: 0.0245 - accuracy: 0.9919\n", 404 | "Epoch 5/5\n", 405 | "60000/60000 [==============================] - 5s 87us/sample - loss: 0.0175 - accuracy: 0.9945\n" 406 | ] 407 | }, 408 | { 409 | "data": { 410 | "text/plain": [ 411 | "" 412 | ] 413 | }, 414 | "execution_count": 8, 415 | "metadata": {}, 416 | "output_type": 
"execute_result" 417 | } 418 | ], 419 | "source": [ 420 | "model.compile(optimizer='adam',\n", 421 | " loss='sparse_categorical_crossentropy',\n", 422 | " metrics=['accuracy'])\n", 423 | "\n", 424 | "model.fit(train_images, train_labels, epochs=5)" 425 | ] 426 | }, 427 | { 428 | "cell_type": "markdown", 429 | "metadata": { 430 | "colab_type": "text", 431 | "id": "jKgyC5K_4O0d" 432 | }, 433 | "source": [ 434 | "### Evaluate the model" 435 | ] 436 | }, 437 | { 438 | "cell_type": "code", 439 | "execution_count": 9, 440 | "metadata": { 441 | "colab": {}, 442 | "colab_type": "code", 443 | "id": "gtyDF0MKUcM7" 444 | }, 445 | "outputs": [ 446 | { 447 | "name": "stdout", 448 | "output_type": "stream", 449 | "text": [ 450 | "10000/10000 [==============================] - 0s 41us/sample - loss: 0.0336 - accuracy: 0.9910\n" 451 | ] 452 | } 453 | ], 454 | "source": [ 455 | "test_loss, test_acc = model.evaluate(test_images, test_labels)" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 10, 461 | "metadata": { 462 | "colab": {}, 463 | "colab_type": "code", 464 | "id": "0LvwaKhtUdOo" 465 | }, 466 | "outputs": [ 467 | { 468 | "name": "stdout", 469 | "output_type": "stream", 470 | "text": [ 471 | "0.991\n" 472 | ] 473 | } 474 | ], 475 | "source": [ 476 | "print(test_acc)" 477 | ] 478 | }, 479 | { 480 | "cell_type": "markdown", 481 | "metadata": { 482 | "colab_type": "text", 483 | "id": "8cfJ8AR03gT5" 484 | }, 485 | "source": [ 486 | "As you can see, our simple CNN has achieved a test accuracy of over 99%. Not bad for a few lines of code! For another style of writing a CNN (using the Keras Subclassing API and a GradientTape) head [here](https://github.com/tensorflow/docs/blob/master/site/en/r2/tutorials/quickstart/advanced.ipynb)." 
487 | ] 488 | } 489 | ], 490 | "metadata": { 491 | "accelerator": "GPU", 492 | "colab": { 493 | "collapsed_sections": [], 494 | "name": "intro_to_cnns.ipynb", 495 | "private_outputs": true, 496 | "provenance": [], 497 | "toc_visible": true, 498 | "version": "0.3.2" 499 | }, 500 | "kernelspec": { 501 | "display_name": "tensorflow-gpu/2.0.0-beta-py36", 502 | "language": "python", 503 | "name": "tensorflow_gpu_2.0.0-beta-py36" 504 | }, 505 | "language_info": { 506 | "codemirror_mode": { 507 | "name": "ipython", 508 | "version": 3 509 | }, 510 | "file_extension": ".py", 511 | "mimetype": "text/x-python", 512 | "name": "python", 513 | "nbconvert_exporter": "python", 514 | "pygments_lexer": "ipython3", 515 | "version": "3.6.8" 516 | } 517 | }, 518 | "nbformat": 4, 519 | "nbformat_minor": 2 520 | } 521 | -------------------------------------------------------------------------------- /save_and_restore_models.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": { 6 | "colab_type": "text", 7 | "id": "g_nWetWWd_ns" 8 | }, 9 | "source": [ 10 | "##### Copyright 2018 The TensorFlow Authors." 
11 | ] 12 | }, 13 | { 14 | "cell_type": "code", 15 | "execution_count": 1, 16 | "metadata": { 17 | "cellView": "form", 18 | "colab": {}, 19 | "colab_type": "code", 20 | "id": "2pHVBk_seED1" 21 | }, 22 | "outputs": [], 23 | "source": [ 24 | "#@title Licensed under the Apache License, Version 2.0 (the \"License\");\n", 25 | "# you may not use this file except in compliance with the License.\n", 26 | "# You may obtain a copy of the License at\n", 27 | "#\n", 28 | "# https://www.apache.org/licenses/LICENSE-2.0\n", 29 | "#\n", 30 | "# Unless required by applicable law or agreed to in writing, software\n", 31 | "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", 32 | "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", 33 | "# See the License for the specific language governing permissions and\n", 34 | "# limitations under the License." 35 | ] 36 | }, 37 | { 38 | "cell_type": "code", 39 | "execution_count": 2, 40 | "metadata": { 41 | "cellView": "form", 42 | "colab": {}, 43 | "colab_type": "code", 44 | "id": "N_fMsQ-N8I7j" 45 | }, 46 | "outputs": [], 47 | "source": [ 48 | "#@title MIT License\n", 49 | "#\n", 50 | "# Copyright (c) 2017 François Chollet\n", 51 | "#\n", 52 | "# Permission is hereby granted, free of charge, to any person obtaining a\n", 53 | "# copy of this software and associated documentation files (the \"Software\"),\n", 54 | "# to deal in the Software without restriction, including without limitation\n", 55 | "# the rights to use, copy, modify, merge, publish, distribute, sublicense,\n", 56 | "# and/or sell copies of the Software, and to permit persons to whom the\n", 57 | "# Software is furnished to do so, subject to the following conditions:\n", 58 | "#\n", 59 | "# The above copyright notice and this permission notice shall be included in\n", 60 | "# all copies or substantial portions of the Software.\n", 61 | "#\n", 62 | "# THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
OR\n", 63 | "# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n", 64 | "# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL\n", 65 | "# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n", 66 | "# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n", 67 | "# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER\n", 68 | "# DEALINGS IN THE SOFTWARE." 69 | ] 70 | }, 71 | { 72 | "cell_type": "markdown", 73 | "metadata": { 74 | "colab_type": "text", 75 | "id": "pZJ3uY9O17VN" 76 | }, 77 | "source": [ 78 | "# Save and restore models" 79 | ] 80 | }, 81 | { 82 | "cell_type": "markdown", 83 | "metadata": { 84 | "colab_type": "text", 85 | "id": "M4Ata7_wMul1" 86 | }, 87 | "source": [ 88 | "\n", 89 | " \n", 92 | " \n", 95 | " \n", 98 | " \n", 101 | "
" 102 | ] 103 | }, 104 | { 105 | "cell_type": "markdown", 106 | "metadata": { 107 | "colab_type": "text", 108 | "id": "mBdde4YJeJKF" 109 | }, 110 | "source": [ 111 | "Model progress can be saved during—and after—training. This means a model can resume where it left off and avoid long training times. Saving also means you can share your model and others can recreate your work. When publishing research models and techniques, most machine learning practitioners share:\n", 112 | "\n", 113 | "* code to create the model, and\n", 114 | "* the trained weights, or parameters, for the model\n", 115 | "\n", 116 | "Sharing this data helps others understand how the model works and try it themselves with new data.\n", 117 | "\n", 118 | "Caution: Be careful with untrusted code—TensorFlow models are code. See [Using TensorFlow Securely](https://github.com/tensorflow/tensorflow/blob/master/SECURITY.md) for details.\n", 119 | "\n", 120 | "### Options\n", 121 | "\n", 122 | "There are different ways to save TensorFlow models—depending on the API you're using. This guide uses [tf.keras](https://www.tensorflow.org/guide/keras), a high-level API to build and train models in TensorFlow. For other approaches, see the TensorFlow [Save and Restore](https://www.tensorflow.org/guide/saved_model) guide or [Saving in eager](https://www.tensorflow.org/guide/eager#object-based_saving).\n" 123 | ] 124 | }, 125 | { 126 | "cell_type": "markdown", 127 | "metadata": { 128 | "colab_type": "text", 129 | "id": "xCUREq7WXgvg" 130 | }, 131 | "source": [ 132 | "## Setup" 133 | ] 134 | }, 135 | { 136 | "cell_type": "markdown", 137 | "metadata": { 138 | "colab_type": "text", 139 | "id": "SbGsznErXWt6" 140 | }, 141 | "source": [ 142 | "### Get an example dataset\n", 143 | "\n", 144 | "We'll use the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) to train our model to demonstrate saving weights. 
To speed up these demonstration runs, only use the first 1000 examples:" 145 | ] 146 | }, 147 | { 148 | "cell_type": "code", 149 | "execution_count": 3, 150 | "metadata": {}, 151 | "outputs": [], 152 | "source": [ 153 | "# Tell TF not to consume all GPU memory so you can run more than one notebook at once\n", 154 | "import os\n", 155 | "os.environ['TF_FORCE_GPU_ALLOW_GROWTH'] = 'true'" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 4, 161 | "metadata": {}, 162 | "outputs": [ 163 | { 164 | "name": "stdout", 165 | "output_type": "stream", 166 | "text": [ 167 | "\u001b[1m\u001b[37mcgpu03\u001b[m Sat Jul 13 16:28:25 2019\n", 168 | "\u001b[36m[0]\u001b[m \u001b[34mTesla V100-SXM2-16GB\u001b[m |\u001b[31m 36'C\u001b[m, \u001b[32m 0 %\u001b[m | \u001b[36m\u001b[1m\u001b[33m 0\u001b[m / \u001b[33m16130\u001b[m MB |\n" 169 | ] 170 | } 171 | ], 172 | "source": [ 173 | "!gpustat" 174 | ] 175 | }, 176 | { 177 | "cell_type": "code", 178 | "execution_count": 5, 179 | "metadata": { 180 | "colab": {}, 181 | "colab_type": "code", 182 | "id": "7Nm7Tyb-gRt-" 183 | }, 184 | "outputs": [ 185 | { 186 | "data": { 187 | "text/plain": [ 188 | "'2.0.0-beta1'" 189 | ] 190 | }, 191 | "execution_count": 5, 192 | "metadata": {}, 193 | "output_type": "execute_result" 194 | } 195 | ], 196 | "source": [ 197 | "from __future__ import absolute_import, division, print_function, unicode_literals\n", 198 | "\n", 199 | "import tensorflow as tf\n", 200 | "from tensorflow import keras\n", 201 | "\n", 202 | "tf.__version__" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 6, 208 | "metadata": { 209 | "colab": {}, 210 | "colab_type": "code", 211 | "id": "9rGfFwE9XVwz" 212 | }, 213 | "outputs": [], 214 | "source": [ 215 | "(train_images, train_labels), (test_images, test_labels) = tf.keras.datasets.mnist.load_data()\n", 216 | "\n", 217 | "train_labels = train_labels[:1000]\n", 218 | "test_labels = test_labels[:1000]\n", 219 | "\n", 220 | "train_images 
= train_images[:1000].reshape(-1, 28 * 28) / 255.0\n", 221 | "test_images = test_images[:1000].reshape(-1, 28 * 28) / 255.0" 222 | ] 223 | }, 224 | { 225 | "cell_type": "markdown", 226 | "metadata": { 227 | "colab_type": "text", 228 | "id": "anG3iVoXyZGI" 229 | }, 230 | "source": [ 231 | "### Define a model" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": { 237 | "colab_type": "text", 238 | "id": "wynsOBfby0Pa" 239 | }, 240 | "source": [ 241 | "Let's build a simple model we'll use to demonstrate saving and loading weights." 242 | ] 243 | }, 244 | { 245 | "cell_type": "code", 246 | "execution_count": 7, 247 | "metadata": { 248 | "colab": {}, 249 | "colab_type": "code", 250 | "id": "0HZbJIjxyX1S" 251 | }, 252 | "outputs": [ 253 | { 254 | "name": "stdout", 255 | "output_type": "stream", 256 | "text": [ 257 | "Model: \"sequential\"\n", 258 | "_________________________________________________________________\n", 259 | "Layer (type) Output Shape Param # \n", 260 | "=================================================================\n", 261 | "dense (Dense) (None, 512) 401920 \n", 262 | "_________________________________________________________________\n", 263 | "dropout (Dropout) (None, 512) 0 \n", 264 | "_________________________________________________________________\n", 265 | "dense_1 (Dense) (None, 10) 5130 \n", 266 | "=================================================================\n", 267 | "Total params: 407,050\n", 268 | "Trainable params: 407,050\n", 269 | "Non-trainable params: 0\n", 270 | "_________________________________________________________________\n" 271 | ] 272 | } 273 | ], 274 | "source": [ 275 | "# Returns a short sequential model\n", 276 | "def create_model():\n", 277 | " model = tf.keras.models.Sequential([\n", 278 | " keras.layers.Dense(512, activation='relu', input_shape=(784,)),\n", 279 | " keras.layers.Dropout(0.2),\n", 280 | " keras.layers.Dense(10, activation='softmax')\n", 281 | " ])\n", 282 | "\n", 283 | " 
model.compile(optimizer='adam',\n", 284 | " loss='sparse_categorical_crossentropy',\n", 285 | " metrics=['accuracy'])\n", 286 | "\n", 287 | " return model\n", 288 | "\n", 289 | "\n", 290 | "# Create a basic model instance\n", 291 | "model = create_model()\n", 292 | "model.summary()" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": { 298 | "colab_type": "text", 299 | "id": "soDE0W_KH8rG" 300 | }, 301 | "source": [ 302 | "## Save checkpoints during training" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": { 308 | "colab_type": "text", 309 | "id": "mRyd5qQQIXZm" 310 | }, 311 | "source": [ 312 | "The primary use case is to automatically save checkpoints *during* and at *the end* of training. This way you can use a trained model without having to retrain it, or pick up training where you left off, in case the training process was interrupted.\n", 313 | "\n", 314 | "`tf.keras.callbacks.ModelCheckpoint` is a callback that performs this task. The callback takes a couple of arguments to configure checkpointing.\n", 315 | "\n", 316 | "### Checkpoint callback usage\n", 317 | "\n", 318 | "Train the model and pass it the `ModelCheckpoint` callback:" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 8, 324 | "metadata": { 325 | "colab": {}, 326 | "colab_type": "code", 327 | "id": "IFPuhwntH8VH" 328 | }, 329 | "outputs": [ 330 | { 331 | "name": "stderr", 332 | "output_type": "stream", 333 | "text": [ 334 | "WARNING: Logging before flag parsing goes to stderr.\n", 335 | "W0713 16:29:02.212527 46912496740800 deprecation.py:323] From /usr/common/software/tensorflow/gpu-tensorflow/2.0.0-beta-py36/lib/python3.6/site-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support..wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.\n", 336 | "Instructions for updating:\n", 337 | "Use tf.where in 2.0, which has the same broadcast rule as np.where\n" 338 | 
] 339 | }, 340 | { 341 | "name": "stdout", 342 | "output_type": "stream", 343 | "text": [ 344 | "Train on 1000 samples, validate on 1000 samples\n", 345 | "Epoch 1/10\n", 346 | " 832/1000 [=======================>......] - ETA: 0s - loss: 1.2869 - accuracy: 0.6214 \n", 347 | "Epoch 00001: saving model to training_1/cp.ckpt\n", 348 | "1000/1000 [==============================] - 1s 641us/sample - loss: 1.1847 - accuracy: 0.6510 - val_loss: 0.7144 - val_accuracy: 0.7840\n", 349 | "Epoch 2/10\n", 350 | " 832/1000 [=======================>......] - ETA: 0s - loss: 0.4252 - accuracy: 0.8690\n", 351 | "Epoch 00002: saving model to training_1/cp.ckpt\n", 352 | "1000/1000 [==============================] - 0s 123us/sample - loss: 0.4301 - accuracy: 0.8700 - val_loss: 0.5157 - val_accuracy: 0.8430\n", 353 | "Epoch 3/10\n", 354 | " 832/1000 [=======================>......] - ETA: 0s - loss: 0.2988 - accuracy: 0.9255\n", 355 | "Epoch 00003: saving model to training_1/cp.ckpt\n", 356 | "1000/1000 [==============================] - 0s 117us/sample - loss: 0.2923 - accuracy: 0.9270 - val_loss: 0.4626 - val_accuracy: 0.8530\n", 357 | "Epoch 4/10\n", 358 | " 832/1000 [=======================>......] - ETA: 0s - loss: 0.2020 - accuracy: 0.9615\n", 359 | "Epoch 00004: saving model to training_1/cp.ckpt\n", 360 | "1000/1000 [==============================] - 0s 117us/sample - loss: 0.2067 - accuracy: 0.9590 - val_loss: 0.4642 - val_accuracy: 0.8520\n", 361 | "Epoch 5/10\n", 362 | " 800/1000 [=======================>......] - ETA: 0s - loss: 0.1493 - accuracy: 0.9700\n", 363 | "Epoch 00005: saving model to training_1/cp.ckpt\n", 364 | "1000/1000 [==============================] - 0s 123us/sample - loss: 0.1578 - accuracy: 0.9680 - val_loss: 0.4094 - val_accuracy: 0.8630\n", 365 | "Epoch 6/10\n", 366 | " 800/1000 [=======================>......] 
- ETA: 0s - loss: 0.1092 - accuracy: 0.9800\n", 367 | "Epoch 00006: saving model to training_1/cp.ckpt\n", 368 | "1000/1000 [==============================] - 0s 121us/sample - loss: 0.1173 - accuracy: 0.9780 - val_loss: 0.4104 - val_accuracy: 0.8650\n", 369 | "Epoch 7/10\n", 370 | " 800/1000 [=======================>......] - ETA: 0s - loss: 0.0867 - accuracy: 0.9887\n", 371 | "Epoch 00007: saving model to training_1/cp.ckpt\n", 372 | "1000/1000 [==============================] - 0s 118us/sample - loss: 0.0874 - accuracy: 0.9870 - val_loss: 0.4074 - val_accuracy: 0.8570\n", 373 | "Epoch 8/10\n", 374 | " 832/1000 [=======================>......] - ETA: 0s - loss: 0.0591 - accuracy: 0.9964\n", 375 | "Epoch 00008: saving model to training_1/cp.ckpt\n", 376 | "1000/1000 [==============================] - 0s 117us/sample - loss: 0.0602 - accuracy: 0.9950 - val_loss: 0.4031 - val_accuracy: 0.8560\n", 377 | "Epoch 9/10\n", 378 | " 832/1000 [=======================>......] - ETA: 0s - loss: 0.0586 - accuracy: 0.9952\n", 379 | "Epoch 00009: saving model to training_1/cp.ckpt\n", 380 | "1000/1000 [==============================] - 0s 117us/sample - loss: 0.0562 - accuracy: 0.9960 - val_loss: 0.4018 - val_accuracy: 0.8630\n", 381 | "Epoch 10/10\n", 382 | " 832/1000 [=======================>......] 
- ETA: 0s - loss: 0.0404 - accuracy: 0.9988\n", 383 | "Epoch 00010: saving model to training_1/cp.ckpt\n", 384 | "1000/1000 [==============================] - 0s 117us/sample - loss: 0.0428 - accuracy: 0.9990 - val_loss: 0.4028 - val_accuracy: 0.8680\n" 385 | ] 386 | }, 387 | { 388 | "data": { 389 | "text/plain": [ 390 | "" 391 | ] 392 | }, 393 | "execution_count": 8, 394 | "metadata": {}, 395 | "output_type": "execute_result" 396 | } 397 | ], 398 | "source": [ 399 | "checkpoint_path = \"training_1/cp.ckpt\"\n", 400 | "checkpoint_dir = os.path.dirname(checkpoint_path)\n", 401 | "\n", 402 | "# Create checkpoint callback\n", 403 | "cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,\n", 404 | " save_weights_only=True,\n", 405 | " verbose=1)\n", 406 | "\n", 407 | "model = create_model()\n", 408 | "\n", 409 | "model.fit(train_images, train_labels, epochs = 10,\n", 410 | " validation_data = (test_images,test_labels),\n", 411 | " callbacks = [cp_callback]) # pass callback to training\n", 412 | "\n", 413 | "# This may generate warnings related to saving the state of the optimizer.\n", 414 | "# These warnings (and similar warnings throughout this notebook)\n", 415 | "# are in place to discourage outdated usage, and can be ignored." 
416 | ] 417 | }, 418 | { 419 | "cell_type": "markdown", 420 | "metadata": { 421 | "colab_type": "text", 422 | "id": "rlM-sgyJO084" 423 | }, 424 | "source": [ 425 | "This creates a single collection of TensorFlow checkpoint files that are updated at the end of each epoch:" 426 | ] 427 | }, 428 | { 429 | "cell_type": "code", 430 | "execution_count": 9, 431 | "metadata": { 432 | "colab": {}, 433 | "colab_type": "code", 434 | "id": "gXG5FVKFOVQ3" 435 | }, 436 | "outputs": [ 437 | { 438 | "name": "stdout", 439 | "output_type": "stream", 440 | "text": [ 441 | "checkpoint\t\t cp.ckpt.data-00001-of-00002\n", 442 | "cp.ckpt.data-00000-of-00002 cp.ckpt.index\n" 443 | ] 444 | } 445 | ], 446 | "source": [ 447 | "!ls {checkpoint_dir}" 448 | ] 449 | }, 450 | { 451 | "cell_type": "markdown", 452 | "metadata": { 453 | "colab_type": "text", 454 | "id": "wlRN_f56Pqa9" 455 | }, 456 | "source": [ 457 | "Create a new, untrained model. When restoring a model from only weights, you must have a model with the same architecture as the original model. Because the architecture is the same, we can share weights even though it is a different *instance* of the model.\n", 458 | "\n", 459 | "Now rebuild a fresh, untrained model, and evaluate it on the test set. 
An untrained model will perform at chance levels (~10% accuracy):" 460 | ] 461 | }, 462 | { 463 | "cell_type": "code", 464 | "execution_count": 10, 465 | "metadata": { 466 | "colab": {}, 467 | "colab_type": "code", 468 | "id": "Fp5gbuiaPqCT" 469 | }, 470 | "outputs": [ 471 | { 472 | "name": "stdout", 473 | "output_type": "stream", 474 | "text": [ 475 | "1000/1000 [==============================] - 0s 92us/sample - loss: 2.3596 - accuracy: 0.1370\n", 476 | "Untrained model, accuracy: 13.70%\n" 477 | ] 478 | } 479 | ], 480 | "source": [ 481 | "model = create_model()\n", 482 | "\n", 483 | "loss, acc = model.evaluate(test_images, test_labels)\n", 484 | "print(\"Untrained model, accuracy: {:5.2f}%\".format(100*acc))" 485 | ] 486 | }, 487 | { 488 | "cell_type": "markdown", 489 | "metadata": { 490 | "colab_type": "text", 491 | "id": "1DTKpZssRSo3" 492 | }, 493 | "source": [ 494 | "Then load the weights from the checkpoint, and re-evaluate:" 495 | ] 496 | }, 497 | { 498 | "cell_type": "code", 499 | "execution_count": 11, 500 | "metadata": { 501 | "colab": {}, 502 | "colab_type": "code", 503 | "id": "2IZxbwiRRSD2" 504 | }, 505 | "outputs": [ 506 | { 507 | "name": "stdout", 508 | "output_type": "stream", 509 | "text": [ 510 | "1000/1000 [==============================] - 0s 35us/sample - loss: 0.4028 - accuracy: 0.8680\n", 511 | "Restored model, accuracy: 86.80%\n" 512 | ] 513 | } 514 | ], 515 | "source": [ 516 | "model.load_weights(checkpoint_path)\n", 517 | "loss,acc = model.evaluate(test_images, test_labels)\n", 518 | "print(\"Restored model, accuracy: {:5.2f}%\".format(100*acc))" 519 | ] 520 | }, 521 | { 522 | "cell_type": "markdown", 523 | "metadata": { 524 | "colab_type": "text", 525 | "id": "bpAbKkAyVPV8" 526 | }, 527 | "source": [ 528 | "### Checkpoint callback options\n", 529 | "\n", 530 | "The callback provides several options to give the resulting checkpoints unique names, and adjust the checkpointing frequency.\n", 531 | "\n", 532 | "Train a new model, and save 
uniquely named checkpoints once every 5 epochs:\n" 533 | ] 534 | }, 535 | { 536 | "cell_type": "code", 537 | "execution_count": 12, 538 | "metadata": { 539 | "colab": {}, 540 | "colab_type": "code", 541 | "id": "mQF_dlgIVOvq" 542 | }, 543 | "outputs": [ 544 | { 545 | "name": "stderr", 546 | "output_type": "stream", 547 | "text": [ 548 | "W0713 16:29:51.184920 46912496740800 callbacks.py:859] `period` argument is deprecated. Please use `save_freq` to specify the frequency in number of samples seen.\n" 549 | ] 550 | }, 551 | { 552 | "name": "stdout", 553 | "output_type": "stream", 554 | "text": [ 555 | "\n", 556 | "Epoch 00005: saving model to training_2/cp-0005.ckpt\n" 557 | ] 558 | }, 559 | { 560 | "name": "stderr", 561 | "output_type": "stream", 562 | "text": [ 563 | "W0713 16:29:52.894087 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.iter\n", 564 | "W0713 16:29:52.894643 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_1\n", 565 | "W0713 16:29:52.897105 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_2\n", 566 | "W0713 16:29:52.897518 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.decay\n", 567 | "W0713 16:29:52.897911 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.learning_rate\n", 568 | "W0713 16:29:52.898756 46912496740800 util.py:252] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. 
See https://www.tensorflow.org/alpha/guide/checkpoints#loading_mechanics for details.\n" 569 | ] 570 | }, 571 | { 572 | "name": "stdout", 573 | "output_type": "stream", 574 | "text": [ 575 | "\n", 576 | "Epoch 00010: saving model to training_2/cp-0010.ckpt\n", 577 | "\n", 578 | "Epoch 00015: saving model to training_2/cp-0015.ckpt\n", 579 | "\n", 580 | "Epoch 00020: saving model to training_2/cp-0020.ckpt\n", 581 | "\n", 582 | "Epoch 00025: saving model to training_2/cp-0025.ckpt\n", 583 | "\n", 584 | "Epoch 00030: saving model to training_2/cp-0030.ckpt\n", 585 | "\n", 586 | "Epoch 00035: saving model to training_2/cp-0035.ckpt\n", 587 | "\n", 588 | "Epoch 00040: saving model to training_2/cp-0040.ckpt\n", 589 | "\n", 590 | "Epoch 00045: saving model to training_2/cp-0045.ckpt\n", 591 | "\n", 592 | "Epoch 00050: saving model to training_2/cp-0050.ckpt\n" 593 | ] 594 | }, 595 | { 596 | "data": { 597 | "text/plain": [ 598 | "" 599 | ] 600 | }, 601 | "execution_count": 12, 602 | "metadata": {}, 603 | "output_type": "execute_result" 604 | } 605 | ], 606 | "source": [ 607 | "# include the epoch in the file name. 
(uses `str.format`)\n", 608 | "checkpoint_path = \"training_2/cp-{epoch:04d}.ckpt\"\n", 609 | "checkpoint_dir = os.path.dirname(checkpoint_path)\n", 610 | "\n", 611 | "cp_callback = tf.keras.callbacks.ModelCheckpoint(\n", 612 | " checkpoint_path, verbose=1, save_weights_only=True,\n", 613 | " # Save weights every 5 epochs.\n", 614 | " period=5)\n", 615 | "\n", 616 | "model = create_model()\n", 617 | "model.save_weights(checkpoint_path.format(epoch=0))\n", 618 | "model.fit(train_images, train_labels,\n", 619 | " epochs=50, callbacks=[cp_callback],\n", 620 | " validation_data=(test_images, test_labels),\n", 621 | " verbose=0)" 622 | ] 623 | }, 624 | { 625 | "cell_type": "markdown", 626 | "metadata": { 627 | "colab_type": "text", 628 | "id": "1zFrKTjjavWI" 629 | }, 630 | "source": [ 631 | "Now, look at the resulting checkpoints and choose the latest one:" 632 | ] 633 | }, 634 | { 635 | "cell_type": "code", 636 | "execution_count": 13, 637 | "metadata": { 638 | "colab": {}, 639 | "colab_type": "code", 640 | "id": "p64q3-V4sXt0" 641 | }, 642 | "outputs": [ 643 | { 644 | "name": "stdout", 645 | "output_type": "stream", 646 | "text": [ 647 | "checkpoint\t\t\t cp-0025.ckpt.data-00001-of-00002\n", 648 | "cp-0000.ckpt.data-00000-of-00002 cp-0025.ckpt.index\n", 649 | "cp-0000.ckpt.data-00001-of-00002 cp-0030.ckpt.data-00000-of-00002\n", 650 | "cp-0000.ckpt.index\t\t cp-0030.ckpt.data-00001-of-00002\n", 651 | "cp-0005.ckpt.data-00000-of-00002 cp-0030.ckpt.index\n", 652 | "cp-0005.ckpt.data-00001-of-00002 cp-0035.ckpt.data-00000-of-00002\n", 653 | "cp-0005.ckpt.index\t\t cp-0035.ckpt.data-00001-of-00002\n", 654 | "cp-0010.ckpt.data-00000-of-00002 cp-0035.ckpt.index\n", 655 | "cp-0010.ckpt.data-00001-of-00002 cp-0040.ckpt.data-00000-of-00002\n", 656 | "cp-0010.ckpt.index\t\t cp-0040.ckpt.data-00001-of-00002\n", 657 | "cp-0015.ckpt.data-00000-of-00002 cp-0040.ckpt.index\n", 658 | "cp-0015.ckpt.data-00001-of-00002 cp-0045.ckpt.data-00000-of-00002\n", 659 | 
"cp-0015.ckpt.index\t\t cp-0045.ckpt.data-00001-of-00002\n", 660 | "cp-0020.ckpt.data-00000-of-00002 cp-0045.ckpt.index\n", 661 | "cp-0020.ckpt.data-00001-of-00002 cp-0050.ckpt.data-00000-of-00002\n", 662 | "cp-0020.ckpt.index\t\t cp-0050.ckpt.data-00001-of-00002\n", 663 | "cp-0025.ckpt.data-00000-of-00002 cp-0050.ckpt.index\n" 664 | ] 665 | } 666 | ], 667 | "source": [ 668 | "! ls {checkpoint_dir}" 669 | ] 670 | }, 671 | { 672 | "cell_type": "code", 673 | "execution_count": 14, 674 | "metadata": { 675 | "colab": {}, 676 | "colab_type": "code", 677 | "id": "1AN_fnuyR41H" 678 | }, 679 | "outputs": [ 680 | { 681 | "data": { 682 | "text/plain": [ 683 | "'training_2/cp-0050.ckpt'" 684 | ] 685 | }, 686 | "execution_count": 14, 687 | "metadata": {}, 688 | "output_type": "execute_result" 689 | } 690 | ], 691 | "source": [ 692 | "latest = tf.train.latest_checkpoint(checkpoint_dir)\n", 693 | "latest" 694 | ] 695 | }, 696 | { 697 | "cell_type": "markdown", 698 | "metadata": { 699 | "colab_type": "text", 700 | "id": "Zk2ciGbKg561" 701 | }, 702 | "source": [ 703 | "Note: the default TensorFlow format only saves the 5 most recent checkpoints.\n", 704 | "\n", 705 | "To test, reset the model and load the latest checkpoint:" 706 | ] 707 | }, 708 | { 709 | "cell_type": "code", 710 | "execution_count": 15, 711 | "metadata": { 712 | "colab": {}, 713 | "colab_type": "code", 714 | "id": "3M04jyK-H3QK" 715 | }, 716 | "outputs": [ 717 | { 718 | "name": "stdout", 719 | "output_type": "stream", 720 | "text": [ 721 | "1000/1000 [==============================] - 0s 90us/sample - loss: 0.5014 - accuracy: 0.8730\n", 722 | "Restored model, accuracy: 87.30%\n" 723 | ] 724 | } 725 | ], 726 | "source": [ 727 | "model = create_model()\n", 728 | "model.load_weights(latest)\n", 729 | "loss, acc = model.evaluate(test_images, test_labels)\n", 730 | "print(\"Restored model, accuracy: {:5.2f}%\".format(100*acc))" 731 | ] 732 | }, 733 | { 734 | "cell_type": "markdown", 735 | "metadata": { 736 | 
"colab_type": "text", 737 | "id": "c2OxsJOTHxia" 738 | }, 739 | "source": [ 740 | "## What are these files?" 741 | ] 742 | }, 743 | { 744 | "cell_type": "markdown", 745 | "metadata": { 746 | "colab_type": "text", 747 | "id": "JtdYhvWnH2ib" 748 | }, 749 | "source": [ 750 | "The above code stores the weights in a collection of [checkpoint](https://www.tensorflow.org/guide/saved_model#save_and_restore_variables)-formatted files that contain only the trained weights in a binary format. Checkpoints contain:\n", 751 | "* One or more shards that contain your model's weights.\n", 752 | "* An index file that indicates which weights are stored in which shard.\n", 753 | "\n", 754 | "If you are only training a model on a single machine, you'll typically have one shard with the suffix `.data-00000-of-00001`; the run shown above produced two shards (`-of-00002`)." 755 | ] 756 | }, 757 | { 758 | "cell_type": "markdown", 759 | "metadata": { 760 | "colab_type": "text", 761 | "id": "S_FA-ZvxuXQV" 762 | }, 763 | "source": [ 764 | "## Manually save weights\n", 765 | "\n", 766 | "Above you saw how to load the weights into a model.\n", 767 | "\n", 768 | "Manually saving the weights is just as simple: use the `Model.save_weights` method." 
769 | ] 770 | }, 771 | { 772 | "cell_type": "code", 773 | "execution_count": 16, 774 | "metadata": { 775 | "colab": {}, 776 | "colab_type": "code", 777 | "id": "R7W5plyZ-u9X" 778 | }, 779 | "outputs": [ 780 | { 781 | "name": "stdout", 782 | "output_type": "stream", 783 | "text": [ 784 | "1000/1000 [==============================] - 0s 92us/sample - loss: 0.5014 - accuracy: 0.8730\n", 785 | "Restored model, accuracy: 87.30%\n" 786 | ] 787 | } 788 | ], 789 | "source": [ 790 | "# Save the weights\n", 791 | "model.save_weights('./checkpoints/my_checkpoint')\n", 792 | "\n", 793 | "# Restore the weights\n", 794 | "model = create_model()\n", 795 | "model.load_weights('./checkpoints/my_checkpoint')\n", 796 | "\n", 797 | "loss,acc = model.evaluate(test_images, test_labels)\n", 798 | "print(\"Restored model, accuracy: {:5.2f}%\".format(100*acc))" 799 | ] 800 | }, 801 | { 802 | "cell_type": "markdown", 803 | "metadata": { 804 | "colab_type": "text", 805 | "id": "kOGlxPRBEvV1" 806 | }, 807 | "source": [ 808 | "## Save the entire model\n", 809 | "\n", 810 | "The model and optimizer can be saved to a file that contains both their state (weights and variables), and the model configuration. This allows you to export a model so it can be used without access to the original python code. 
Since the optimizer state is recovered, you can even resume training from exactly where you left off.\n", 811 | "\n", 812 | "Saving a fully functional model is very useful: you can load it in TensorFlow.js ([HDF5](https://js.tensorflow.org/tutorials/import-keras.html), [Saved Model](https://js.tensorflow.org/tutorials/import-saved-model.html)) and then train and run it in web browsers, or convert it to run on mobile devices using TensorFlow Lite ([HDF5](https://www.tensorflow.org/lite/convert/python_api#exporting_a_tfkeras_file_), [Saved Model](https://www.tensorflow.org/lite/convert/python_api#exporting_a_savedmodel_))." 813 | ] 814 | }, 815 | { 816 | "cell_type": "markdown", 817 | "metadata": { 818 | "colab_type": "text", 819 | "id": "SkGwf-50zLNn" 820 | }, 821 | "source": [ 822 | "### As an HDF5 file\n", 823 | "\n", 824 | "Keras provides a basic save format using the [HDF5](https://en.wikipedia.org/wiki/Hierarchical_Data_Format) standard. For our purposes, the saved model can be treated as a single binary blob."
825 | ] 826 | }, 827 | { 828 | "cell_type": "code", 829 | "execution_count": 18, 830 | "metadata": { 831 | "colab": {}, 832 | "colab_type": "code", 833 | "id": "m2dkmJVCGUia" 834 | }, 835 | "outputs": [ 836 | { 837 | "name": "stdout", 838 | "output_type": "stream", 839 | "text": [ 840 | "Train on 1000 samples\n", 841 | "Epoch 1/5\n", 842 | "1000/1000 [==============================] - 0s 228us/sample - loss: 1.1308 - accuracy: 0.6570\n", 843 | "Epoch 2/5\n", 844 | "1000/1000 [==============================] - 0s 65us/sample - loss: 0.4309 - accuracy: 0.8740\n", 845 | "Epoch 3/5\n", 846 | "1000/1000 [==============================] - 0s 65us/sample - loss: 0.2839 - accuracy: 0.9320\n", 847 | "Epoch 4/5\n", 848 | "1000/1000 [==============================] - 0s 64us/sample - loss: 0.2064 - accuracy: 0.9460\n", 849 | "Epoch 5/5\n", 850 | "1000/1000 [==============================] - 0s 65us/sample - loss: 0.1507 - accuracy: 0.9690\n" 851 | ] 852 | } 853 | ], 854 | "source": [ 855 | "model = create_model()\n", 856 | "\n", 857 | "model.fit(train_images, train_labels, epochs=5)\n", 858 | "\n", 859 | "# Save entire model to a HDF5 file\n", 860 | "model.save('my_model.h5')" 861 | ] 862 | }, 863 | { 864 | "cell_type": "markdown", 865 | "metadata": { 866 | "colab_type": "text", 867 | "id": "GWmttMOqS68S" 868 | }, 869 | "source": [ 870 | "Now recreate the model from that file:" 871 | ] 872 | }, 873 | { 874 | "cell_type": "code", 875 | "execution_count": 19, 876 | "metadata": { 877 | "colab": {}, 878 | "colab_type": "code", 879 | "id": "5NDMO_7kS6Do" 880 | }, 881 | "outputs": [ 882 | { 883 | "name": "stderr", 884 | "output_type": "stream", 885 | "text": [ 886 | "W0713 16:34:17.752168 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.iter\n", 887 | "W0713 16:34:17.752822 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_1\n", 888 | "W0713 16:34:17.753264 46912496740800 util.py:244] Unresolved object in checkpoint: 
(root).optimizer.beta_2\n", 889 | "W0713 16:34:17.754821 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.decay\n", 890 | "W0713 16:34:17.756290 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.learning_rate\n", 891 | "W0713 16:34:17.757513 46912496740800 util.py:252] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/alpha/guide/checkpoints#loading_mechanics for details.\n" 892 | ] 893 | }, 894 | { 895 | "name": "stdout", 896 | "output_type": "stream", 897 | "text": [ 898 | "Model: \"sequential_6\"\n", 899 | "_________________________________________________________________\n", 900 | "Layer (type) Output Shape Param # \n", 901 | "=================================================================\n", 902 | "dense_12 (Dense) (None, 512) 401920 \n", 903 | "_________________________________________________________________\n", 904 | "dropout_6 (Dropout) (None, 512) 0 \n", 905 | "_________________________________________________________________\n", 906 | "dense_13 (Dense) (None, 10) 5130 \n", 907 | "=================================================================\n", 908 | "Total params: 407,050\n", 909 | "Trainable params: 407,050\n", 910 | "Non-trainable params: 0\n", 911 | "_________________________________________________________________\n" 912 | ] 913 | } 914 | ], 915 | "source": [ 916 | "# Recreate the exact same model, including weights and optimizer.\n", 917 | "new_model = keras.models.load_model('my_model.h5')\n", 918 | "new_model.summary()" 919 | ] 920 | }, 921 | { 922 | "cell_type": "markdown", 923 | "metadata": { 924 | "colab_type": "text", 925 | "id": "JXQpbTicTBwt" 926 | }, 
927 | "source": [ 928 | "Check its accuracy:" 929 | ] 930 | }, 931 | { 932 | "cell_type": "code", 933 | "execution_count": 20, 934 | "metadata": { 935 | "colab": {}, 936 | "colab_type": "code", 937 | "id": "jwEaj9DnTCVA" 938 | }, 939 | "outputs": [ 940 | { 941 | "name": "stdout", 942 | "output_type": "stream", 943 | "text": [ 944 | "1000/1000 [==============================] - 0s 95us/sample - loss: 0.4257 - accuracy: 0.8540\n", 945 | "Restored model, accuracy: 85.40%\n" 946 | ] 947 | } 948 | ], 949 | "source": [ 950 | "loss, acc = new_model.evaluate(test_images, test_labels)\n", 951 | "print(\"Restored model, accuracy: {:5.2f}%\".format(100*acc))" 952 | ] 953 | }, 954 | { 955 | "cell_type": "markdown", 956 | "metadata": { 957 | "colab_type": "text", 958 | "id": "dGXqd4wWJl8O" 959 | }, 960 | "source": [ 961 | "This technique saves everything:\n", 962 | "\n", 963 | "* The weight values\n", 964 | "* The model's configuration (architecture)\n", 965 | "* The optimizer configuration\n", 966 | "\n", 967 | "Keras saves models by inspecting the architecture. Currently, it is not able to save TensorFlow optimizers (from `tf.train`). When using those, you will need to recompile the model after loading, and you will lose the state of the optimizer.\n" 968 | ] 969 | }, 970 | { 971 | "cell_type": "markdown", 972 | "metadata": { 973 | "colab_type": "text", 974 | "id": "kPyhgcoVzqUB" 975 | }, 976 | "source": [ 977 | "### As a `saved_model`" 978 | ] 979 | }, 980 | { 981 | "cell_type": "markdown", 982 | "metadata": { 983 | "colab_type": "text", 984 | "id": "LtcN4VIb7JkK" 985 | }, 986 | "source": [ 987 | "Caution: This method of saving a `tf.keras` model is experimental and may change in future versions."
988 | ] 989 | }, 990 | { 991 | "cell_type": "markdown", 992 | "metadata": { 993 | "colab_type": "text", 994 | "id": "DSWiSB0Q8c46" 995 | }, 996 | "source": [ 997 | "Build a fresh model:" 998 | ] 999 | }, 1000 | { 1001 | "cell_type": "code", 1002 | "execution_count": 21, 1003 | "metadata": { 1004 | "colab": {}, 1005 | "colab_type": "code", 1006 | "id": "sI1YvCDFzpl3" 1007 | }, 1008 | "outputs": [ 1009 | { 1010 | "name": "stdout", 1011 | "output_type": "stream", 1012 | "text": [ 1013 | "Train on 1000 samples\n", 1014 | "Epoch 1/5\n", 1015 | "1000/1000 [==============================] - 0s 237us/sample - loss: 1.1295 - accuracy: 0.6870\n", 1016 | "Epoch 2/5\n", 1017 | "1000/1000 [==============================] - 0s 72us/sample - loss: 0.4160 - accuracy: 0.8890\n", 1018 | "Epoch 3/5\n", 1019 | "1000/1000 [==============================] - 0s 67us/sample - loss: 0.2783 - accuracy: 0.9330\n", 1020 | "Epoch 4/5\n", 1021 | "1000/1000 [==============================] - 0s 65us/sample - loss: 0.2048 - accuracy: 0.9510\n", 1022 | "Epoch 5/5\n", 1023 | "1000/1000 [==============================] - 0s 65us/sample - loss: 0.1506 - accuracy: 0.9690\n" 1024 | ] 1025 | }, 1026 | { 1027 | "data": { 1028 | "text/plain": [ 1029 | "" 1030 | ] 1031 | }, 1032 | "execution_count": 21, 1033 | "metadata": {}, 1034 | "output_type": "execute_result" 1035 | } 1036 | ], 1037 | "source": [ 1038 | "model = create_model()\n", 1039 | "\n", 1040 | "model.fit(train_images, train_labels, epochs=5)" 1041 | ] 1042 | }, 1043 | { 1044 | "cell_type": "markdown", 1045 | "metadata": { 1046 | "colab_type": "text", 1047 | "id": "iUvT_3qE8hV5" 1048 | }, 1049 | "source": [ 1050 | "Create a `saved_model`, and place it in a time-stamped directory:" 1051 | ] 1052 | }, 1053 | { 1054 | "cell_type": "code", 1055 | "execution_count": 22, 1056 | "metadata": { 1057 | "colab": {}, 1058 | "colab_type": "code", 1059 | "id": "sq8fPglI1RWA" 1060 | }, 1061 | "outputs": [ 1062 | { 1063 | "name": "stderr", 1064 | "output_type": 
"stream", 1065 | "text": [ 1066 | "W0713 16:34:53.631388 46912496740800 deprecation.py:323] From /usr/common/software/tensorflow/gpu-tensorflow/2.0.0-beta-py36/lib/python3.6/site-packages/tensorflow/python/saved_model/signature_def_utils_impl.py:253: build_tensor_info (from tensorflow.python.saved_model.utils_impl) is deprecated and will be removed in a future version.\n", 1067 | "Instructions for updating:\n", 1068 | "This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.utils.build_tensor_info or tf.compat.v1.saved_model.build_tensor_info.\n", 1069 | "W0713 16:34:53.633486 46912496740800 export_utils.py:182] Export includes no default signature!\n", 1070 | "W0713 16:34:53.839863 46912496740800 export_utils.py:182] Export includes no default signature!\n" 1071 | ] 1072 | }, 1073 | { 1074 | "data": { 1075 | "text/plain": [ 1076 | "'./saved_models/1563060893'" 1077 | ] 1078 | }, 1079 | "execution_count": 22, 1080 | "metadata": {}, 1081 | "output_type": "execute_result" 1082 | } 1083 | ], 1084 | "source": [ 1085 | "import time\n", 1086 | "saved_model_path = \"./saved_models/{}\".format(int(time.time()))\n", 1087 | "\n", 1088 | "tf.keras.experimental.export_saved_model(model, saved_model_path)\n", 1089 | "saved_model_path" 1090 | ] 1091 | }, 1092 | { 1093 | "cell_type": "markdown", 1094 | "metadata": { 1095 | "colab_type": "text", 1096 | "id": "MjpmyPfh8-1n" 1097 | }, 1098 | "source": [ 1099 | "List your saved models:" 1100 | ] 1101 | }, 1102 | { 1103 | "cell_type": "code", 1104 | "execution_count": 23, 1105 | "metadata": { 1106 | "colab": {}, 1107 | "colab_type": "code", 1108 | "id": "ZtOvxA7V0iTv" 1109 | }, 1110 | "outputs": [ 1111 | { 1112 | "name": "stdout", 1113 | "output_type": "stream", 1114 | "text": [ 1115 | "1563060893\n" 1116 | ] 1117 | } 1118 | ], 1119 | "source": [ 1120 | "!ls saved_models/" 1121 | ] 1122 | }, 1123 | { 1124 | "cell_type": "markdown", 1125 | "metadata": { 1126 | "colab_type": "text", 1127 | 
"id": "B7qfpvpY9HCe" 1128 | }, 1129 | "source": [ 1130 | "Reload a fresh keras model from the saved model." 1131 | ] 1132 | }, 1133 | { 1134 | "cell_type": "code", 1135 | "execution_count": 24, 1136 | "metadata": { 1137 | "colab": {}, 1138 | "colab_type": "code", 1139 | "id": "0YofwHdN0pxa" 1140 | }, 1141 | "outputs": [ 1142 | { 1143 | "name": "stderr", 1144 | "output_type": "stream", 1145 | "text": [ 1146 | "W0713 16:35:09.242412 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_1\n", 1147 | "W0713 16:35:09.243044 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_2\n", 1148 | "W0713 16:35:09.243519 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.decay\n", 1149 | "W0713 16:35:09.243970 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.learning_rate\n", 1150 | "W0713 16:35:09.244395 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-0.kernel\n", 1151 | "W0713 16:35:09.244815 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-0.bias\n", 1152 | "W0713 16:35:09.245213 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.kernel\n", 1153 | "W0713 16:35:09.245635 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.bias\n", 1154 | "W0713 16:35:09.246025 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-0.kernel\n", 1155 | "W0713 16:35:09.246461 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-0.bias\n", 1156 | "W0713 16:35:09.250754 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for 
(root).layer_with_weights-1.kernel\n", 1157 | "W0713 16:35:09.251188 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.bias\n", 1158 | "W0713 16:35:09.251616 46912496740800 util.py:252] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/alpha/guide/checkpoints#loading_mechanics for details.\n", 1159 | "W0713 16:35:09.254225 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer\n", 1160 | "W0713 16:35:09.254666 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.iter\n", 1161 | "W0713 16:35:09.255103 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_1\n", 1162 | "W0713 16:35:09.255517 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.beta_2\n", 1163 | "W0713 16:35:09.255945 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.decay\n", 1164 | "W0713 16:35:09.256351 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer.learning_rate\n", 1165 | "W0713 16:35:09.262776 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-0.kernel\n", 1166 | "W0713 16:35:09.263218 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-0.bias\n", 1167 | "W0713 16:35:09.263681 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'm' for (root).layer_with_weights-1.kernel\n", 1168 | "W0713 16:35:09.264089 46912496740800 util.py:244] Unresolved object in checkpoint: 
(root).optimizer's state 'm' for (root).layer_with_weights-1.bias\n", 1169 | "W0713 16:35:09.264487 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-0.kernel\n", 1170 | "W0713 16:35:09.264889 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-0.bias\n", 1171 | "W0713 16:35:09.265260 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.kernel\n", 1172 | "W0713 16:35:09.265689 46912496740800 util.py:244] Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).layer_with_weights-1.bias\n", 1173 | "W0713 16:35:09.266100 46912496740800 util.py:252] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. 
See https://www.tensorflow.org/alpha/guide/checkpoints#loading_mechanics for details.\n" 1174 | ] 1175 | }, 1176 | { 1177 | "name": "stdout", 1178 | "output_type": "stream", 1179 | "text": [ 1180 | "Model: \"sequential_7\"\n", 1181 | "_________________________________________________________________\n", 1182 | "Layer (type) Output Shape Param # \n", 1183 | "=================================================================\n", 1184 | "dense_14 (Dense) (None, 512) 401920 \n", 1185 | "_________________________________________________________________\n", 1186 | "dropout_7 (Dropout) (None, 512) 0 \n", 1187 | "_________________________________________________________________\n", 1188 | "dense_15 (Dense) (None, 10) 5130 \n", 1189 | "=================================================================\n", 1190 | "Total params: 407,050\n", 1191 | "Trainable params: 407,050\n", 1192 | "Non-trainable params: 0\n", 1193 | "_________________________________________________________________\n" 1194 | ] 1195 | } 1196 | ], 1197 | "source": [ 1198 | "new_model = tf.keras.experimental.load_from_saved_model(saved_model_path)\n", 1199 | "new_model.summary()" 1200 | ] 1201 | }, 1202 | { 1203 | "cell_type": "markdown", 1204 | "metadata": { 1205 | "colab_type": "text", 1206 | "id": "uWwgNaz19TH2" 1207 | }, 1208 | "source": [ 1209 | "Run the restored model." 
1210 | ] 1211 | }, 1212 | { 1213 | "cell_type": "code", 1214 | "execution_count": 25, 1215 | "metadata": { 1216 | "colab": {}, 1217 | "colab_type": "code", 1218 | "id": "Yh5Mu0yOgE5J" 1219 | }, 1220 | "outputs": [ 1221 | { 1222 | "data": { 1223 | "text/plain": [ 1224 | "(1000, 10)" 1225 | ] 1226 | }, 1227 | "execution_count": 25, 1228 | "metadata": {}, 1229 | "output_type": "execute_result" 1230 | } 1231 | ], 1232 | "source": [ 1233 | "new_model.predict(test_images).shape" 1234 | ] 1235 | }, 1236 | { 1237 | "cell_type": "code", 1238 | "execution_count": 26, 1239 | "metadata": { 1240 | "colab": {}, 1241 | "colab_type": "code", 1242 | "id": "Pc9e6G6w1AWG" 1243 | }, 1244 | "outputs": [ 1245 | { 1246 | "name": "stdout", 1247 | "output_type": "stream", 1248 | "text": [ 1249 | "1000/1000 [==============================] - 0s 90us/sample - loss: 0.4166 - accuracy: 0.8640\n", 1250 | "Restored model, accuracy: 86.40%\n" 1251 | ] 1252 | } 1253 | ], 1254 | "source": [ 1255 | "# The model has to be compiled before evaluating.\n", 1256 | "# This step is not required if the saved model is only being deployed.\n", 1257 | "\n", 1258 | "new_model.compile(optimizer=model.optimizer, # reuse the optimizer from the original model\n", 1259 | " loss='sparse_categorical_crossentropy',\n", 1260 | " metrics=['accuracy'])\n", 1261 | "\n", 1262 | "# Evaluate the restored model.\n", 1263 | "loss, acc = new_model.evaluate(test_images, test_labels)\n", 1264 | "print(\"Restored model, accuracy: {:5.2f}%\".format(100*acc))" 1265 | ] 1266 | }, 1267 | { 1268 | "cell_type": "markdown", 1269 | "metadata": { 1270 | "colab_type": "text", 1271 | "id": "eUYTzSz5VxL2" 1272 | }, 1273 | "source": [ 1274 | "## What's Next\n", 1275 | "\n", 1276 | "That was a quick guide to saving and loading with `tf.keras`.\n", 1277 | "\n", 1278 | "* The [tf.keras guide](https://www.tensorflow.org/guide/keras) shows more about saving and loading models with `tf.keras`.\n", 1279 | "\n", 1280 | "* See [Saving in
eager](https://www.tensorflow.org/guide/eager#object_based_saving) for saving during eager execution.\n", 1281 | "\n", 1282 | "* The [Save and Restore](https://www.tensorflow.org/guide/saved_model) guide has low-level details about TensorFlow saving." 1283 | ] 1284 | } 1285 | ], 1286 | "metadata": { 1287 | "accelerator": "GPU", 1288 | "colab": { 1289 | "collapsed_sections": [], 1290 | "name": "save_and_restore_models.ipynb", 1291 | "private_outputs": true, 1292 | "provenance": [], 1293 | "toc_visible": true, 1294 | "version": "0.3.2" 1295 | }, 1296 | "kernelspec": { 1297 | "display_name": "tensorflow-gpu/2.0.0-beta-py36", 1298 | "language": "python", 1299 | "name": "tensorflow_gpu_2.0.0-beta-py36" 1300 | }, 1301 | "language_info": { 1302 | "codemirror_mode": { 1303 | "name": "ipython", 1304 | "version": 3 1305 | }, 1306 | "file_extension": ".py", 1307 | "mimetype": "text/x-python", 1308 | "name": "python", 1309 | "nbconvert_exporter": "python", 1310 | "pygments_lexer": "ipython3", 1311 | "version": "3.6.8" 1312 | } 1313 | }, 1314 | "nbformat": 4, 1315 | "nbformat_minor": 2 1316 | } 1317 | -------------------------------------------------------------------------------- /tensorboard_nersc_helper.py: -------------------------------------------------------------------------------- 1 | import os, pwd 2 | from tensorboard import notebook 3 | import getpass 4 | from IPython.core.display import display, HTML 5 | 6 | def get_pid_owner(pid): 7 | # the /proc/PID is owned by process creator 8 | proc_stat_file = os.stat("/proc/%d" % pid) 9 | # get UID via stat call 10 | uid = proc_stat_file.st_uid 11 | # look up the username from uid 12 | username = pwd.getpwuid(uid)[0] 13 | 14 | return username 15 | 16 | def get_tb_port(username): 17 | 18 | for tb_nb in notebook.manager.get_all(): 19 | if get_pid_owner(tb_nb.pid) == username: 20 | return tb_nb.port 21 | 22 | def show_tb_address(): 23 | 24 | username = getpass.getuser() 25 | tb_port = get_tb_port(username) 26 | 27 | address = 
"https://jupyter-dl.nersc.gov/user/" + "username" + "/proxy/" + str(tb_port) + "/" 28 | address = address.strip() 29 | 30 | display(HTML('%s'%(address,address))) 31 | -------------------------------------------------------------------------------- /test_tensorboard.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import tensorflow as tf\n", 10 | "\n", 11 | "### To run tensorboard in Jupyter you first import the extension\n", 12 | "%load_ext tensorboard" 13 | ] 14 | }, 15 | { 16 | "cell_type": "code", 17 | "execution_count": 2, 18 | "metadata": {}, 19 | "outputs": [ 20 | { 21 | "data": { 22 | "text/html": [ 23 | "\n", 24 | " \n", 31 | " " 32 | ], 33 | "text/plain": [ 34 | "" 35 | ] 36 | }, 37 | "metadata": {}, 38 | "output_type": "display_data" 39 | } 40 | ], 41 | "source": [ 42 | "### Now you can run a tensorboard server. 
Note that port 0 asks tensorboard to use a port not already in use\n", 43 | "%tensorboard --logdir logs --port 0" 44 | ] 45 | }, 46 | { 47 | "cell_type": "code", 48 | "execution_count": 3, 49 | "metadata": {}, 50 | "outputs": [ 51 | { 52 | "data": { 53 | "text/html": [ 54 | "https://jupyter-dl.nersc.gov/user/username/proxy/40817/" 55 | ], 56 | "text/plain": [ 57 | "" 58 | ] 59 | }, 60 | "metadata": {}, 61 | "output_type": "display_data" 62 | } 63 | ], 64 | "source": [ 65 | "### The following function will provide you with an address to connect to the tensorboard instance\n", 66 | "import tensorboard_nersc_helper\n", 67 | "tensorboard_nersc_helper.show_tb_address()" 68 | ] 69 | }, 70 | { 71 | "cell_type": "raw", 72 | "metadata": {}, 73 | "source": [ 74 | "# Now you need to add tensorboard Keras callback to your code\n", 75 | "# example below:\n", 76 | "\n", 77 | "logdir = os.path.join(\"logs\", datetime.datetime.now().strftime(\"%Y%m%d-%H%M%S\"))\n", 78 | "tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)\n", 79 | "\n", 80 | "model.fit(x=x_train,\n", 81 | " y=y_train,\n", 82 | " epochs=5,\n", 83 | " validation_data=(x_test, y_test),\n", 84 | " callbacks=[tensorboard_callback])" 85 | ] 86 | } 87 | ], 88 | "metadata": { 89 | "kernelspec": { 90 | "display_name": "tensorflow-gpu/2.0.0-beta-py36", 91 | "language": "python", 92 | "name": "tensorflow_gpu_2.0.0-beta-py36" 93 | }, 94 | "language_info": { 95 | "codemirror_mode": { 96 | "name": "ipython", 97 | "version": 3 98 | }, 99 | "file_extension": ".py", 100 | "mimetype": "text/x-python", 101 | "name": "python", 102 | "nbconvert_exporter": "python", 103 | "pygments_lexer": "ipython3", 104 | "version": "3.6.8" 105 | } 106 | }, 107 | "nbformat": 4, 108 | "nbformat_minor": 2 109 | } 110 | --------------------------------------------------------------------------------