├── Readme.md
├── Reviewed
│   ├── batch.png
│   ├── lstm.png
│   └── lstm_stock_market_prediction.ipynb
├── requirements.txt
├── requirements_gpu.txt
└── tensorboard_tutorial
    ├── Tensorboard Tutorial.ipynb
    ├── tensorboard_1.PNG
    ├── tensorboard_2.png
    ├── tensorboard_3_1.PNG
    ├── tensorboard_3_2.PNG
    ├── tensorboard_3_3.PNG
    ├── tensorboard_graph.PNG
    └── tensorboard_histogram_vs_distribution_views.png
/Readme.md:
--------------------------------------------------------------------------------
1 | ## Introduction
2 | This repository contains various tutorials written for DataCamp.
3 | 
4 | ## How to run code?
5 | I like to keep the Python setup in my OS very simple and create virtual environments with the required custom libraries depending on the project I want to run. I would very much like to use Docker for this purpose, as that is the de facto standard for process isolation. However, since I am using Windows, there is no way for me to expose my GPU to Docker. Thus, I opt for `conda` and Python virtual environments.
6 | 
7 | ### Using conda environment
8 | 1. Download and install [Anaconda](https://www.anaconda.com/download/#windows)
9 | 2. Make sure conda is in the system PATH by trying `conda --version` on a terminal
10 | 3. Create a conda virtual environment using `conda create -n datacamp.tutorials python=3.5`
11 | 4. `cd` into the project directory
12 | 5. Install TensorFlow as follows
13 |     * If you **do not** have a GPU use: `conda install -n datacamp.tutorials --yes --file requirements.txt`
14 |     * If you **do** have a GPU use: `conda install -n datacamp.tutorials --yes --file requirements_gpu.txt`
15 | 6. Activate the newly created environment with `activate datacamp.tutorials`
16 | 
17 | Further reading on how to set up conda environments: [Here](https://uoa-eresearch.github.io/eresearch-cookbook/recipe/2014/11/20/conda/)
18 | 
19 | ### Using Python virtualenv
20 | I prefer conda because numpy, pandas and tensorflow CPU operations are much faster than when installed with pip, according to this [article](https://towardsdatascience.com/stop-installing-tensorflow-using-pip-for-performance-sake-5854f9d9eb0c). But if you prefer to use Python virtualenv, use the following steps.
21 | 
22 | 1. Download and install Python 3.5
23 | 2. Now install `virtualenv` with `pip3 install virtualenv`
24 | 3. `cd` into the project directory
25 | 4. Create a virtual environment with `virtualenv -p python3 datacamp.tutorials`
26 | 5. Activate the virtual environment as follows
27 |     * If you are on **Windows**: `datacamp.tutorials\Scripts\activate`
28 |     * If you are on **Ubuntu**: `source datacamp.tutorials/bin/activate`
29 | 6. 
Install TensorFlow as follows
30 |     * If you **do not** have a GPU use: `pip3 install -r requirements.txt`
31 |     * If you **do** have a GPU use: `pip3 install -r requirements_gpu.txt`
--------------------------------------------------------------------------------
/Reviewed/batch.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thushv89/datacamp_tutorials/5d2461bf49a4029b27d1f5afa8570aa5234dd1ca/Reviewed/batch.png
--------------------------------------------------------------------------------
/Reviewed/lstm.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/thushv89/datacamp_tutorials/5d2461bf49a4029b27d1f5afa8570aa5234dd1ca/Reviewed/lstm.png
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | pandas
3 | tensorflow
4 | matplotlib
5 | scikit-learn
6 | jupyter
--------------------------------------------------------------------------------
/requirements_gpu.txt:
--------------------------------------------------------------------------------
1 | numpy
2 | pandas
3 | tensorflow-gpu
4 | matplotlib
5 | scikit-learn
6 | jupyter
--------------------------------------------------------------------------------
/tensorboard_tutorial/Tensorboard Tutorial.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## Tensorboard Tutorial\n",
8 | "\n",
9 | "This tutorial will guide you on how to use the Tensorboard. Tensorboard is an amazing utility that allows us to visualize data and how it behaves. In this tutorial, you will see what sort of purposes you can use the Tensorboard for when training a neural network. \n",
10 | "\n",
11 | "First you will learn how to start the Tensorboard, followed by an overview of the different views offered in the Tensorboard. Next you will learn how you can visualize scalar values produced during computations on the Tensorboard. You will also see how this provides insights into the model and helps you fix potential errors during learning. Thereafter you will investigate how you can visualize vectors or collections of data as histograms using the Tensorboard. With this view you will compare how the weight initialization of the neural network affects its weight updates during learning."
12 | ]
13 | },
14 | {
15 | "cell_type": "code",
16 | "execution_count": 1,
17 | "metadata": {},
18 | "outputs": [
19 | {
20 | "name": "stderr",
21 | "output_type": "stream",
22 | "text": [
23 | "c:\\users\\thushan\\documents\\python_virtualenvs\\tensorflow_venv\\lib\\site-packages\\h5py\\__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. 
In future, it will be treated as `np.float64 == np.dtype(float).type`.\n",
24 | "  from ._conv import register_converters as _register_converters\n"
25 | ]
26 | }
27 | ],
28 | "source": [
29 | "# Make sure that you have all these libraries available to run the code successfully\n",
30 | "from pandas_datareader import data\n",
31 | "import matplotlib.pyplot as plt\n",
32 | "import pandas as pd\n",
33 | "import datetime as dt\n",
34 | "import urllib.request, json \n",
35 | "import os\n",
36 | "import numpy as np\n",
37 | "import tensorflow as tf # This code has been tested with TensorFlow 1.6\n",
38 | "from tensorflow.examples.tutorials.mnist import input_data"
39 | ]
40 | },
41 | {
42 | "cell_type": "markdown",
43 | "metadata": {},
44 | "source": [
45 | "## Starting Tensorboard\n",
46 | "To visualize things via Tensorboard, you first need to start the Tensorboard service. For that,\n",
47 | "\n",
48 | "1. Open up the command prompt (Windows) or a terminal (Ubuntu/Mac)\n",
49 | "2. Go into the project home directory\n",
50 | "3. If you are using a Python virtualenv, activate the virtual environment in which you have installed TensorFlow\n",
51 | "4. Make sure you can see the TensorFlow library through Python. For that,\n",
52 | "    * Type in `python3`; you will get a \">>>\" looking prompt\n",
53 | "    * Try `import tensorflow as tf`\n",
54 | "    * If you can run this successfully you are fine\n",
55 | "5. Exit the Python prompt (that is, \">>>\") by typing `exit()` and type in the following command\n",
56 | "    * `tensorboard --logdir=summaries`\n",
57 | "    * `--logdir` points to the directory where you will write the data to be visualized\n",
58 | "    * Files that Tensorboard saves data into are called *event files*\n",
59 | "    * The type of data saved into the event files is called *summary data*\n",
60 | "    * Optionally you can use `--port=<port>` to change the port Tensorboard runs on \n",
61 | "6. You should now get the following message\n",
62 | "    * TensorBoard 1.6.0 at <url>:6006 (Press CTRL+C to quit)\n",
63 | "7. Enter the <url>:6006 into the web browser\n",
64 | "    * You should be able to see an orange dashboard at this point. You won't have anything to display yet because you haven't generated data.\n",
65 | "\n",
66 | "**Note**: Tensorboard does not like to see multiple event files in the same directory. This can lead to very messy-looking curves on the display. So create a separate folder for each example (e.g. summaries/first, summaries/second, ...) to save the data. Another thing to keep in mind is that if you want to re-run an experiment (that is, save event files to an already populated folder), you have to make sure to first delete the existing event files.\n",
67 | "\n",
68 | "## Different Views of Tensorboard\n",
69 | "\n",
70 | "Different views take inputs of different formats and display them differently. You can change the view on the top orange bar of the Tensorboard.\n",
71 | "* **Scalars** - Visualize scalar values (e.g. classification accuracy)\n",
72 | "* **Graph** - Visualize the computational graph of your model (e.g. neural network model)\n",
73 | "* **Distributions** - Visualize how data changes over time (e.g. 
weights of a neural network)\n",
74 | "* **Histograms** - A fancier version of the Distributions view that shows distributions from a 3-dimensional perspective\n",
75 | "* Projector - Can be used to visualize word embeddings (that is, word embeddings are numerical representations of words that capture their semantic relationships)\n",
76 | "* Image - Visualizing image data\n",
77 | "* Audio - Visualizing audio data\n",
78 | "* Text - Visualizing text (string) data\n",
79 | "\n",
80 | "In this tutorial you will cover the views shown in bold."
81 | ]
82 | },
83 | {
84 | "cell_type": "markdown",
85 | "metadata": {},
86 | "source": [
87 | "## Understanding the Benefit of Scalar Visualization through Tensorboard\n",
88 | "\n",
89 | "In this section, you will first understand why visualizing certain things (e.g. loss or accuracy) is beneficial. When training deep neural networks, one of the crucial issues that strikes beginners is the lack of understanding of the effects of various design choices and hyperparameters. For example, if you carelessly initialize the weights of a deep neural network with a very large variance, your model will quickly diverge and collapse. On the other hand, things can go wrong even when you are quite competent in taming neural networks to make use of them. For example, not paying attention to the learning rate can lead to either the divergence of the model or prematurely saturating at sub-optimal performance. \n",
90 | "\n",
91 | "One way to quickly detect problems with your model is to have a graphical visualization of what's going on in your model in real time (for example, every 100 iterations). So if your model is behaving oddly, it will be clearly visible. That is exactly what Tensorboard provides you with. You can decide which values need to be displayed on the Tensorboard, and Tensorboard will maintain a real-time visualization of those values during learning.\n",
92 | "\n",
93 | "You start by first creating a five-layer neural network that you will use to classify hand-written digit images. For that you will use the famous MNIST dataset. TensorFlow provides a simple API to load MNIST data, so you don't have to download the data manually. Before that, you define a simple method (that is, `accuracy()`) that calculates the accuracy of a set of predictions with respect to the true labels."
94 | ]
95 | },
96 | {
97 | "cell_type": "code",
98 | "execution_count": 2,
99 | "metadata": {
100 | "collapsed": true
101 | },
102 | "outputs": [],
103 | "source": [
104 | "def accuracy(predictions,labels):\n",
105 | "    '''\n",
106 | "    Accuracy of a given set of predictions of size (N x n_classes) and\n",
107 | "    labels of size (N x n_classes)\n",
108 | "    '''\n",
109 | "    return np.sum(np.argmax(predictions,axis=1)==np.argmax(labels,axis=1))*100.0/labels.shape[0]"
110 | ]
111 | },
112 | {
113 | "cell_type": "markdown",
114 | "metadata": {},
115 | "source": [
116 | "### Define Inputs, Outputs, Weights and Biases\n",
117 | "\n",
118 | "First you define a `batch_size` denoting the amount of data you sample in a single optimization/validation or testing step. Then you define `layer_ids`, which gives an identifier for each of the layers of the neural network you will be defining. You can then define `layer_sizes`. Note that `len(layer_sizes)` should be `len(layer_ids)+1`, because `layer_sizes` includes the size of the input at the beginning. MNIST has images of size 28x28, which will be 784 when unwrapped to a single dimension. 
Then you can define the input and label placeholders that you will later use to train the model. Finally you define two TensorFlow variables for each layer (that is, `weights` and `bias`).\n",
119 | "\n",
120 | "You can use variable scoping (more information [here](https://www.tensorflow.org/programmers_guide/variables)) so that the variables will be nicely named and will be much easier to access later."
121 | ]
122 | },
123 | {
124 | "cell_type": "code",
125 | "execution_count": 3,
126 | "metadata": {
127 | "collapsed": true
128 | },
129 | "outputs": [],
130 | "source": [
131 | "batch_size = 100\n",
132 | "layer_ids = ['hidden1','hidden2','hidden3','hidden4','hidden5','out']\n",
133 | "layer_sizes = [784, 500, 400, 300, 200, 100, 10]\n",
134 | "\n",
135 | "tf.reset_default_graph()\n",
136 | "\n",
137 | "# Inputs and Labels\n",
138 | "train_inputs = tf.placeholder(tf.float32, shape=[batch_size, layer_sizes[0]], name='train_inputs')\n",
139 | "train_labels = tf.placeholder(tf.float32, shape=[batch_size, layer_sizes[-1]], name='train_labels')\n",
140 | "\n",
141 | "# Weight and Bias definitions\n",
142 | "for idx, lid in enumerate(layer_ids):\n",
143 | "    \n",
144 | "    with tf.variable_scope(lid):\n",
145 | "        w = tf.get_variable('weights',shape=[layer_sizes[idx], layer_sizes[idx+1]], \n",
146 | "                            initializer=tf.truncated_normal_initializer(stddev=0.05))\n",
147 | "        b = tf.get_variable('bias',shape= [layer_sizes[idx+1]], \n",
148 | "                            initializer=tf.random_uniform_initializer(-0.1,0.1))\n",
149 | "        "
150 | ]
151 | },
152 | {
153 | "cell_type": "markdown",
154 | "metadata": {},
155 | "source": [
156 | "### Calculate Logits, Predictions, Loss and Optimization\n",
157 | "\n",
158 | "With the input/output placeholders and the weights and biases of each layer defined, you can now define the computations that produce the logits of the neural network. Logits are the unnormalized values produced at the last layer of the neural network. When normalized, you call them predictions. This involves iterating through each layer in the neural network and computing `tf.matmul(h,w) +b`. You also need to apply an activation function, as in `tf.nn.relu(tf.matmul(h,w) +b)`, for all layers except the last layer.\n",
159 | "\n",
160 | "Next you define the loss function that is used to optimize the neural network. In this example, you can use the cross-entropy loss, which often delivers better results in classification problems than the mean squared error.\n",
161 | "\n",
162 | "Finally you will need to define an optimizer that takes in the loss and updates the weights of the neural network in the direction that minimizes the loss."
163 | ]
164 | },
165 | {
166 | "cell_type": "code",
167 | "execution_count": 4,
168 | "metadata": {
169 | "collapsed": true
170 | },
171 | "outputs": [],
172 | "source": [
173 | "# Calculating Logits\n",
174 | "h = train_inputs\n",
175 | "for lid in layer_ids:\n",
176 | "    with tf.variable_scope(lid,reuse=True):\n",
177 | "        w, b = tf.get_variable('weights'), tf.get_variable('bias')\n",
178 | "        if lid != 'out':\n",
179 | "            h = tf.nn.relu(tf.matmul(h,w)+b,name=lid+'_output')\n",
180 | "        else:\n",
181 | "            h = tf.nn.xw_plus_b(h,w,b,name=lid+'_output')\n",
182 | "\n",
183 | "tf_predictions = tf.nn.softmax(h, name='predictions')\n",
184 | "# Calculating Loss\n",
185 | "tf_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=train_labels, logits=h),name='loss')\n",
186 | "\n",
187 | "# Optimizer \n",
188 | "tf_learning_rate = tf.placeholder(tf.float32, shape=None, name='learning_rate')\n",
189 | "optimizer = tf.train.MomentumOptimizer(tf_learning_rate,momentum=0.9)\n",
190 | "grads_and_vars = optimizer.compute_gradients(tf_loss)\n",
191 | "tf_loss_minimize = optimizer.minimize(tf_loss)\n"
192 | ]
193 | },
194 | {
195 | "cell_type": "markdown",
196 | "metadata": {},
197 | "source": [
198 | "### Defining Tensorboard Summaries\n",
199 | "\n",
200 | "Here you can define the `tf.summary` objects. `tf.summary` objects are the type of entities understood by the Tensorboard. This means that whatever value you'd like to have displayed on the Tensorboard, you should encapsulate it as a `tf.summary` object. There are several different types of summaries. Here, as you are visualizing only scalars, you can define `tf.summary.scalar` objects. Furthermore, you can use `tf.name_scope` to group scalars on the Tensorboard. That is, scalars having the same name scope will be displayed on the same row on the Tensorboard. Here you define three different summaries.\n",
201 | "\n",
202 | "* `tf_loss_summary` : You feed in a value by means of a placeholder, whenever you need to publish this to the Tensorboard\n",
203 | "* `tf_accuracy_summary` : You feed in a value by means of a placeholder, whenever you need to publish this to the Tensorboard\n",
204 | "* `tf_gradnorm_summary` : This calculates the L2 norm of the gradients of the last layer of your neural network. The gradient norm is a good indicator of whether the weights of the neural network are being properly updated. A gradient norm that is too small can indicate the *vanishing gradient* phenomenon, while one that is too large can indicate the *exploding gradient* phenomenon."
205 | ]
206 | },
207 | {
208 | "cell_type": "code",
209 | "execution_count": 5,
210 | "metadata": {
211 | "collapsed": true
212 | },
213 | "outputs": [],
214 | "source": [
215 | "# Name scope allows you to group various summaries together\n",
216 | "# Summaries having the same name_scope will be displayed on the same row on the Tensorboard\n",
217 | "with tf.name_scope('performance'):\n",
218 | "    # Summaries that need to be displayed on the Tensorboard\n",
219 | "    # Whenever you need to record the loss, feed the mean loss to this placeholder\n",
220 | "    tf_loss_ph = tf.placeholder(tf.float32,shape=None,name='loss_summary') \n",
221 | "    # Create a scalar summary object for the loss so Tensorboard knows how to display it\n",
222 | "    tf_loss_summary = tf.summary.scalar('loss', tf_loss_ph)\n",
223 | "\n",
224 | "    # Whenever you need to record the accuracy, feed the mean test accuracy to this placeholder\n",
225 | "    tf_accuracy_ph = tf.placeholder(tf.float32,shape=None, name='accuracy_summary') \n",
226 | "    # Create a scalar summary object for the accuracy so Tensorboard knows how to display it\n",
227 | "    tf_accuracy_summary = tf.summary.scalar('accuracy', tf_accuracy_ph)\n",
228 | "\n",
229 | "# Gradient norm summary\n",
230 | "for g,v in grads_and_vars:\n",
231 | "    if 'hidden5' in v.name and 'weights' in v.name:\n",
232 | "        with tf.name_scope('gradients'):\n",
233 | "            tf_last_grad_norm = tf.sqrt(tf.reduce_mean(g**2))\n",
234 | "            tf_gradnorm_summary = tf.summary.scalar('grad_norm', tf_last_grad_norm)\n",
235 | "            break\n",
236 | "# Merge all summaries together\n",
237 | "performance_summaries = tf.summary.merge([tf_loss_summary,tf_accuracy_summary])\n"
238 | ]
239 | },
240 | {
241 | "cell_type": "markdown",
242 | "metadata": {},
243 | "source": [
244 | "### Executing the neural network model: Loading Data, Training, Validation and Testing\n",
245 | "\n",
246 | "In the code below you do the following. First you create a session, in which you execute the operations you defined above. Then you create a folder for saving the summary data. You next create a summary writer, `summ_writer`. You can now initialize all variables. This will be followed by loading the MNIST dataset.\n",
247 | "\n",
248 | "Then, for each epoch, you iterate over each batch of training data (that is, each iteration). If it is the first iteration of the epoch, you execute `gradnorm_summary` and write it to the event file with the summary writer. You then execute the model optimization and calculate the loss. After you go through the full training dataset for a single epoch, you calculate the average training loss.\n",
249 | "\n",
250 | "You follow a similar treatment for the validation dataset as well. Specifically, you calculate the validation accuracy for each batch of validation data. Thereafter you calculate the average validation accuracy over the full validation set.\n",
251 | "\n",
252 | "Finally, the testing phase is executed. In this, you calculate the test accuracy for each batch of test data. With that, you calculate the average test accuracy over the full test set. At the very end you execute `performance_summaries` and write them to the event file with the summary writer."
253 | ] 254 | }, 255 | { 256 | "cell_type": "code", 257 | "execution_count": 6, 258 | "metadata": {}, 259 | "outputs": [ 260 | { 261 | "name": "stdout", 262 | "output_type": "stream", 263 | "text": [ 264 | "Successfully downloaded train-images-idx3-ubyte.gz 9912422 bytes.\n", 265 | "Extracting MNIST_data\\train-images-idx3-ubyte.gz\n", 266 | "Successfully downloaded train-labels-idx1-ubyte.gz 28881 bytes.\n", 267 | "Extracting MNIST_data\\train-labels-idx1-ubyte.gz\n", 268 | "Successfully downloaded t10k-images-idx3-ubyte.gz 1648877 bytes.\n", 269 | "Extracting MNIST_data\\t10k-images-idx3-ubyte.gz\n", 270 | "Successfully downloaded t10k-labels-idx1-ubyte.gz 4542 bytes.\n", 271 | "Extracting MNIST_data\\t10k-labels-idx1-ubyte.gz\n", 272 | "Average loss in epoch 0: 2.30480\n", 273 | "\tAverage Valid Accuracy in epoch 0: 8.24000\n", 274 | "\tAverage Test Accuracy in epoch 0: 8.67000\n", 275 | "\n", 276 | "Average loss in epoch 1: 2.30277\n", 277 | "\tAverage Valid Accuracy in epoch 1: 8.52000\n", 278 | "\tAverage Test Accuracy in epoch 1: 8.87000\n", 279 | "\n", 280 | "Average loss in epoch 2: 2.30079\n", 281 | "\tAverage Valid Accuracy in epoch 2: 8.84000\n", 282 | "\tAverage Test Accuracy in epoch 2: 9.21000\n", 283 | "\n", 284 | "Average loss in epoch 3: 2.29883\n", 285 | "\tAverage Valid Accuracy in epoch 3: 9.36000\n", 286 | "\tAverage Test Accuracy in epoch 3: 9.64000\n", 287 | "\n", 288 | "Average loss in epoch 4: 2.29682\n", 289 | "\tAverage Valid Accuracy in epoch 4: 10.68000\n", 290 | "\tAverage Test Accuracy in epoch 4: 10.63000\n", 291 | "\n", 292 | "Average loss in epoch 5: 2.29473\n", 293 | "\tAverage Valid Accuracy in epoch 5: 17.64000\n", 294 | "\tAverage Test Accuracy in epoch 5: 16.85000\n", 295 | "\n", 296 | "Average loss in epoch 6: 2.29249\n", 297 | "\tAverage Valid Accuracy in epoch 6: 24.02000\n", 298 | "\tAverage Test Accuracy in epoch 6: 23.65000\n", 299 | "\n", 300 | "Average loss in epoch 7: 2.29005\n", 301 | "\tAverage Valid Accuracy in epoch 7: 26.64000\n", 302 | "\tAverage Test Accuracy in epoch 7: 25.64000\n", 303 | "\n", 304 | "Average loss in epoch 8: 2.28732\n", 305 | "\tAverage Valid Accuracy in epoch 8: 27.64000\n", 306 | "\tAverage Test Accuracy in epoch 8: 26.53000\n", 307 | "\n", 308 | "Average loss in epoch 9: 2.28420\n", 309 | "\tAverage Valid Accuracy in epoch 9: 28.06000\n", 310 | "\tAverage Test Accuracy in epoch 9: 27.22000\n", 311 | "\n", 312 | "Average loss in epoch 10: 2.28056\n", 313 | "\tAverage Valid Accuracy in epoch 10: 28.84000\n", 314 | "\tAverage Test Accuracy in epoch 10: 28.27000\n", 315 | "\n", 316 | "Average loss in epoch 11: 2.27626\n", 317 | "\tAverage Valid Accuracy in epoch 11: 29.80000\n", 318 | "\tAverage Test Accuracy in epoch 11: 29.24000\n", 319 | "\n", 320 | "Average loss in epoch 12: 2.27104\n", 321 | "\tAverage Valid Accuracy in epoch 12: 31.48000\n", 322 | "\tAverage Test Accuracy in epoch 12: 31.13000\n", 323 | "\n", 324 | "Average loss in epoch 13: 2.26458\n", 325 | "\tAverage Valid Accuracy in epoch 13: 34.22000\n", 326 | "\tAverage Test Accuracy in epoch 13: 33.76000\n", 327 | "\n", 328 | "Average loss in epoch 14: 2.25642\n", 329 | "\tAverage Valid Accuracy in epoch 14: 38.02000\n", 330 | "\tAverage Test Accuracy in epoch 14: 37.85000\n", 331 | "\n", 332 | "Average loss in epoch 15: 2.24581\n", 333 | "\tAverage Valid Accuracy in epoch 15: 41.84000\n", 334 | "\tAverage Test Accuracy in epoch 15: 41.75000\n", 335 | "\n", 336 | "Average loss in epoch 16: 2.23146\n", 337 | "\tAverage Valid Accuracy in epoch 16: 
45.04000\n", 338 | "\tAverage Test Accuracy in epoch 16: 44.63000\n", 339 | "\n", 340 | "Average loss in epoch 17: 2.21123\n", 341 | "\tAverage Valid Accuracy in epoch 17: 47.72000\n", 342 | "\tAverage Test Accuracy in epoch 17: 46.60000\n", 343 | "\n", 344 | "Average loss in epoch 18: 2.18142\n", 345 | "\tAverage Valid Accuracy in epoch 18: 48.84000\n", 346 | "\tAverage Test Accuracy in epoch 18: 47.44000\n", 347 | "\n", 348 | "Average loss in epoch 19: 2.13528\n", 349 | "\tAverage Valid Accuracy in epoch 19: 48.54000\n", 350 | "\tAverage Test Accuracy in epoch 19: 47.68000\n", 351 | "\n", 352 | "Average loss in epoch 20: 2.06068\n", 353 | "\tAverage Valid Accuracy in epoch 20: 45.86000\n", 354 | "\tAverage Test Accuracy in epoch 20: 45.84000\n", 355 | "\n", 356 | "Average loss in epoch 21: 1.94296\n", 357 | "\tAverage Valid Accuracy in epoch 21: 43.86000\n", 358 | "\tAverage Test Accuracy in epoch 21: 43.45000\n", 359 | "\n", 360 | "Average loss in epoch 22: 1.78510\n", 361 | "\tAverage Valid Accuracy in epoch 22: 47.84000\n", 362 | "\tAverage Test Accuracy in epoch 22: 47.24000\n", 363 | "\n", 364 | "Average loss in epoch 23: 1.60649\n", 365 | "\tAverage Valid Accuracy in epoch 23: 52.72000\n", 366 | "\tAverage Test Accuracy in epoch 23: 52.87000\n", 367 | "\n", 368 | "Average loss in epoch 24: 1.42325\n", 369 | "\tAverage Valid Accuracy in epoch 24: 59.08000\n", 370 | "\tAverage Test Accuracy in epoch 24: 58.69000\n", 371 | "\n" 372 | ] 373 | } 374 | ], 375 | "source": [ 376 | "\n", 377 | "image_size = 28\n", 378 | "n_channels = 1\n", 379 | "n_classes = 10\n", 380 | "n_train = 55000\n", 381 | "n_valid = 5000\n", 382 | "n_test = 10000\n", 383 | "n_epochs = 25\n", 384 | "\n", 385 | "config = tf.ConfigProto(allow_soft_placement=True)\n", 386 | "config.gpu_options.allow_growth = True\n", 387 | "config.gpu_options.per_process_gpu_memory_fraction = 0.9 # making sure Tensorflow doesn't overflow the GPU\n", 388 | "\n", 389 | "session = tf.InteractiveSession(config=config)\n", 390 | "\n", 391 | "if not os.path.exists('summaries'):\n", 392 | " os.mkdir('summaries')\n", 393 | "if not os.path.exists(os.path.join('summaries','first')):\n", 394 | " os.mkdir(os.path.join('summaries','first'))\n", 395 | "\n", 396 | "summ_writer = tf.summary.FileWriter(os.path.join('summaries','first'), session.graph)\n", 397 | "\n", 398 | "tf.global_variables_initializer().run()\n", 399 | "\n", 400 | "accuracy_per_epoch = []\n", 401 | "mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)\n", 402 | "\n", 403 | "\n", 404 | "for epoch in range(n_epochs):\n", 405 | " loss_per_epoch = []\n", 406 | " for i in range(n_train//batch_size):\n", 407 | " \n", 408 | " # =================================== Training for one step ========================================\n", 409 | " batch = mnist_data.train.next_batch(batch_size) # Get one batch of training data\n", 410 | " if i == 0:\n", 411 | " # Only for the first epoch, get the summary data\n", 412 | " # Otherwise, it can clutter the visualization\n", 413 | " l,_,gn_summ = session.run([tf_loss,tf_loss_minimize,tf_gradnorm_summary],\n", 414 | " feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),\n", 415 | " train_labels: batch[1],\n", 416 | " tf_learning_rate: 0.0001})\n", 417 | " summ_writer.add_summary(gn_summ, epoch)\n", 418 | " else:\n", 419 | " # Optimize with training data\n", 420 | " l,_ = session.run([tf_loss,tf_loss_minimize],\n", 421 | " feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),\n", 422 | " 
                                          train_labels: batch[1],\n",
423 | "                                          tf_learning_rate: 0.0001})\n",
424 | "        loss_per_epoch.append(l)\n",
425 | "    \n",
426 | "    print('Average loss in epoch %d: %.5f'%(epoch,np.mean(loss_per_epoch)))    \n",
427 | "    avg_loss = np.mean(loss_per_epoch)\n",
428 | "    \n",
429 | "    # ====================== Calculate the Validation Accuracy ==========================\n",
430 | "    valid_accuracy_per_epoch = []\n",
431 | "    for i in range(n_valid//batch_size):\n",
432 | "        valid_images,valid_labels = mnist_data.validation.next_batch(batch_size)\n",
433 | "        valid_batch_predictions = session.run(\n",
434 | "            tf_predictions,feed_dict={train_inputs: valid_images.reshape(batch_size,image_size*image_size)})\n",
435 | "        valid_accuracy_per_epoch.append(accuracy(valid_batch_predictions,valid_labels))\n",
436 | "    \n",
437 | "    mean_v_acc = np.mean(valid_accuracy_per_epoch)\n",
438 | "    print('\\tAverage Valid Accuracy in epoch %d: %.5f'%(epoch,np.mean(valid_accuracy_per_epoch)))\n",
439 | "    \n",
440 | "    # ===================== Calculate the Test Accuracy ===============================\n",
441 | "    accuracy_per_epoch = []\n",
442 | "    for i in range(n_test//batch_size):\n",
443 | "        test_images, test_labels = mnist_data.test.next_batch(batch_size)\n",
444 | "        test_batch_predictions = session.run(\n",
445 | "            tf_predictions,feed_dict={train_inputs: test_images.reshape(batch_size,image_size*image_size)}\n",
446 | "        )\n",
447 | "        accuracy_per_epoch.append(accuracy(test_batch_predictions,test_labels))\n",
448 | "    \n",
449 | "    print('\\tAverage Test Accuracy in epoch %d: %.5f\\n'%(epoch,np.mean(accuracy_per_epoch)))\n",
450 | "    avg_test_accuracy = np.mean(accuracy_per_epoch)\n",
451 | "    \n",
452 | "    # Execute the summaries defined above\n",
453 | "    summ = session.run(performance_summaries, feed_dict={tf_loss_ph:avg_loss, tf_accuracy_ph:avg_test_accuracy})\n",
454 | "\n",
455 | "    # Write the obtained summaries to the file, so they can be displayed on the Tensorboard\n",
456 | "    summ_writer.add_summary(summ, epoch)\n",
457 | "    \n",
458 | "session.close()"
459 | ]
460 | },
461 | {
462 | "cell_type": "markdown",
463 | "metadata": {},
464 | "source": [
465 | "### Visualizing the Computational Graph\n",
466 | "First you will see what the computational graph for our model looks like. You can access this view by clicking on the **Graphs** view on the Tensorboard. It should look something like the image below. You can see that you have a nice flow from `train_inputs` to `loss` and `predictions` flowing through the **hidden layers** **1** to **5**.\n",
467 | "\n",
468 | "\n",
469 | "\n",
470 | "### Visualizing the Summary Data: Is Everything A-Okay Here?\n",
471 | "\n",
472 | "MNIST classification is one of the simplest examples, and yet you still cannot solve it with a 5-layer neural network. For MNIST, it's not difficult to achieve an accuracy of more than 90% in less than 5 epochs. So what is going on here? Let's turn to the Tensorboard.\n",
473 | "\n",
474 | "This is what the Tensorboard looks like for our example.\n",
475 | "\n",
476 | "\n",
477 | "### Observations and Conclusions from the Tensorboard\n",
478 | "You can see that the accuracy is going up, but very slowly. You can also see that the gradient updates are increasing over time. This is odd behavior. If you're approaching convergence, you should see the gradients diminishing (approaching zero), not increasing. But because the accuracy is going up, we're on the right path. *You probably need a higher learning rate*. You can now try a learning rate of **0.01**."
479 | ] 480 | }, 481 | { 482 | "cell_type": "markdown", 483 | "metadata": {}, 484 | "source": [ 485 | "### Using a Higher Learning Rate and Executing the Neural Network Model\n", 486 | "\n", 487 | "In the code below you do the following. First you create a session, in which you execute the operations you defined above. Then you create folder for saving summary data. You next create a summary write `summ_writer`. You can now initialize all variables. This will be followed by loading the MNIST dataset.\n", 488 | "\n", 489 | "Then for each epoch, and each batch in training data (that is, each iteration). Execute `gradnorm_summary` if it is the first iteration and write `gradnorm_summary` to event file with summary writer. You now execute model optimization and calculating the loss. After you go through the full training dataset for a single epoch, calculate average training loss.\n", 490 | "\n", 491 | "You follow a similar treatment for the validation dataset as well. Specifically, for each batch in validation data, you calculate validation accuracy for each batch. Thereafter calculate average validation accuracy for full validation set.\n", 492 | "\n", 493 | "Finally, the testing phase is executed. In this, for each batch in test data, you calculate test accuracy for each batch. With that, you calculate average test accuracy for full test set. At the very end you execute `performance_summaries` and write them to event file with the summary writer." 494 | ] 495 | }, 496 | { 497 | "cell_type": "code", 498 | "execution_count": 7, 499 | "metadata": {}, 500 | "outputs": [ 501 | { 502 | "name": "stdout", 503 | "output_type": "stream", 504 | "text": [ 505 | "Extracting MNIST_data\\train-images-idx3-ubyte.gz\n", 506 | "Extracting MNIST_data\\train-labels-idx1-ubyte.gz\n", 507 | "Extracting MNIST_data\\t10k-images-idx3-ubyte.gz\n", 508 | "Extracting MNIST_data\\t10k-labels-idx1-ubyte.gz\n", 509 | "Average loss in epoch 0: 0.96963\n", 510 | "\tAverage Valid Accuracy in epoch 0: 93.56000\n", 511 | "\tAverage Test Accuracy in epoch 0: 92.70000\n", 512 | "\n", 513 | "Average loss in epoch 1: 0.18322\n", 514 | "\tAverage Valid Accuracy in epoch 1: 94.68000\n", 515 | "\tAverage Test Accuracy in epoch 1: 94.10000\n", 516 | "\n", 517 | "Average loss in epoch 2: 0.11146\n", 518 | "\tAverage Valid Accuracy in epoch 2: 97.10000\n", 519 | "\tAverage Test Accuracy in epoch 2: 96.59000\n", 520 | "\n", 521 | "Average loss in epoch 3: 0.07916\n", 522 | "\tAverage Valid Accuracy in epoch 3: 97.02000\n", 523 | "\tAverage Test Accuracy in epoch 3: 96.59000\n", 524 | "\n", 525 | "Average loss in epoch 4: 0.05842\n", 526 | "\tAverage Valid Accuracy in epoch 4: 97.74000\n", 527 | "\tAverage Test Accuracy in epoch 4: 97.38000\n", 528 | "\n", 529 | "Average loss in epoch 5: 0.04314\n", 530 | "\tAverage Valid Accuracy in epoch 5: 97.72000\n", 531 | "\tAverage Test Accuracy in epoch 5: 97.45000\n", 532 | "\n", 533 | "Average loss in epoch 6: 0.03279\n", 534 | "\tAverage Valid Accuracy in epoch 6: 98.08000\n", 535 | "\tAverage Test Accuracy in epoch 6: 97.83000\n", 536 | "\n", 537 | "Average loss in epoch 7: 0.02241\n", 538 | "\tAverage Valid Accuracy in epoch 7: 97.94000\n", 539 | "\tAverage Test Accuracy in epoch 7: 97.72000\n", 540 | "\n", 541 | "Average loss in epoch 8: 0.01907\n", 542 | "\tAverage Valid Accuracy in epoch 8: 97.98000\n", 543 | "\tAverage Test Accuracy in epoch 8: 97.72000\n", 544 | "\n", 545 | "Average loss in epoch 9: 0.01381\n", 546 | "\tAverage Valid Accuracy in epoch 9: 98.08000\n", 547 | "\tAverage 
Test Accuracy in epoch 9: 97.81000\n", 548 | "\n", 549 | "Average loss in epoch 10: 0.01153\n", 550 | "\tAverage Valid Accuracy in epoch 10: 97.80000\n", 551 | "\tAverage Test Accuracy in epoch 10: 97.40000\n", 552 | "\n", 553 | "Average loss in epoch 11: 0.00779\n", 554 | "\tAverage Valid Accuracy in epoch 11: 98.20000\n", 555 | "\tAverage Test Accuracy in epoch 11: 97.76000\n", 556 | "\n", 557 | "Average loss in epoch 12: 0.00602\n", 558 | "\tAverage Valid Accuracy in epoch 12: 97.92000\n", 559 | "\tAverage Test Accuracy in epoch 12: 97.79000\n", 560 | "\n", 561 | "Average loss in epoch 13: 0.00622\n", 562 | "\tAverage Valid Accuracy in epoch 13: 98.08000\n", 563 | "\tAverage Test Accuracy in epoch 13: 97.97000\n", 564 | "\n", 565 | "Average loss in epoch 14: 0.00187\n", 566 | "\tAverage Valid Accuracy in epoch 14: 98.18000\n", 567 | "\tAverage Test Accuracy in epoch 14: 97.99000\n", 568 | "\n", 569 | "Average loss in epoch 15: 0.00191\n", 570 | "\tAverage Valid Accuracy in epoch 15: 98.24000\n", 571 | "\tAverage Test Accuracy in epoch 15: 97.99000\n", 572 | "\n", 573 | "Average loss in epoch 16: 0.00057\n", 574 | "\tAverage Valid Accuracy in epoch 16: 98.36000\n", 575 | "\tAverage Test Accuracy in epoch 16: 98.22000\n", 576 | "\n", 577 | "Average loss in epoch 17: 0.00037\n", 578 | "\tAverage Valid Accuracy in epoch 17: 98.34000\n", 579 | "\tAverage Test Accuracy in epoch 17: 98.14000\n", 580 | "\n", 581 | "Average loss in epoch 18: 0.00017\n", 582 | "\tAverage Valid Accuracy in epoch 18: 98.34000\n", 583 | "\tAverage Test Accuracy in epoch 18: 98.14000\n", 584 | "\n", 585 | "Average loss in epoch 19: 0.00014\n", 586 | "\tAverage Valid Accuracy in epoch 19: 98.36000\n", 587 | "\tAverage Test Accuracy in epoch 19: 98.20000\n", 588 | "\n", 589 | "Average loss in epoch 20: 0.00012\n", 590 | "\tAverage Valid Accuracy in epoch 20: 98.38000\n", 591 | "\tAverage Test Accuracy in epoch 20: 98.16000\n", 592 | "\n", 593 | "Average loss in epoch 21: 0.00010\n", 594 | "\tAverage Valid Accuracy in epoch 21: 98.40000\n", 595 | "\tAverage Test Accuracy in epoch 21: 98.18000\n", 596 | "\n", 597 | "Average loss in epoch 22: 0.00009\n", 598 | "\tAverage Valid Accuracy in epoch 22: 98.36000\n", 599 | "\tAverage Test Accuracy in epoch 22: 98.18000\n", 600 | "\n", 601 | "Average loss in epoch 23: 0.00009\n", 602 | "\tAverage Valid Accuracy in epoch 23: 98.36000\n", 603 | "\tAverage Test Accuracy in epoch 23: 98.17000\n", 604 | "\n", 605 | "Average loss in epoch 24: 0.00008\n", 606 | "\tAverage Valid Accuracy in epoch 24: 98.38000\n", 607 | "\tAverage Test Accuracy in epoch 24: 98.17000\n", 608 | "\n" 609 | ] 610 | } 611 | ], 612 | "source": [ 613 | "\n", 614 | "image_size = 28\n", 615 | "n_channels = 1\n", 616 | "n_classes = 10\n", 617 | "n_train = 55000\n", 618 | "n_valid = 5000\n", 619 | "n_test = 10000\n", 620 | "n_epochs = 25\n", 621 | "\n", 622 | "config = tf.ConfigProto(allow_soft_placement=True)\n", 623 | "config.gpu_options.allow_growth = True\n", 624 | "config.gpu_options.per_process_gpu_memory_fraction = 0.9 # making sure Tensorflow doesn't overflow the GPU\n", 625 | "\n", 626 | "session = tf.InteractiveSession(config=config)\n", 627 | "\n", 628 | "if not os.path.exists('summaries'):\n", 629 | " os.mkdir('summaries')\n", 630 | "if not os.path.exists(os.path.join('summaries','second')):\n", 631 | " os.mkdir(os.path.join('summaries','second'))\n", 632 | " \n", 633 | "summ_writer_2 = tf.summary.FileWriter(os.path.join('summaries','second'), session.graph)\n", 634 | "\n", 635 | 
"tf.global_variables_initializer().run()\n", 636 | "\n", 637 | "accuracy_per_epoch = []\n", 638 | "mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)\n", 639 | "\n", 640 | "\n", 641 | "for epoch in range(n_epochs):\n", 642 | " loss_per_epoch = []\n", 643 | " for i in range(n_train//batch_size):\n", 644 | " \n", 645 | " # =================================== Training for one step ========================================\n", 646 | " batch = mnist_data.train.next_batch(batch_size) # Get one batch of training data\n", 647 | " if i == 0:\n", 648 | " # Only for the first epoch, get the summary data\n", 649 | " # Otherwise, it can clutter the visualization\n", 650 | " l,_,gn_summ = session.run([tf_loss,tf_loss_minimize,tf_gradnorm_summary],\n", 651 | " feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),\n", 652 | " train_labels: batch[1],\n", 653 | " tf_learning_rate: 0.01})\n", 654 | " summ_writer_2.add_summary(gn_summ, epoch)\n", 655 | " else:\n", 656 | " # Optimize with training data\n", 657 | " l,_ = session.run([tf_loss,tf_loss_minimize],\n", 658 | " feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),\n", 659 | " train_labels: batch[1],\n", 660 | " tf_learning_rate: 0.01})\n", 661 | " loss_per_epoch.append(l)\n", 662 | " \n", 663 | " print('Average loss in epoch %d: %.5f'%(epoch,np.mean(loss_per_epoch))) \n", 664 | " avg_loss = np.mean(loss_per_epoch)\n", 665 | " \n", 666 | " # ====================== Calculate the Validation Accuracy ==========================\n", 667 | " valid_accuracy_per_epoch = []\n", 668 | " for i in range(n_valid//batch_size):\n", 669 | " valid_images,valid_labels = mnist_data.validation.next_batch(batch_size)\n", 670 | " valid_batch_predictions = session.run(\n", 671 | " tf_predictions,feed_dict={train_inputs: valid_images.reshape(batch_size,image_size*image_size)})\n", 672 | " valid_accuracy_per_epoch.append(accuracy(valid_batch_predictions,valid_labels))\n", 673 | " \n", 674 | " mean_v_acc = np.mean(valid_accuracy_per_epoch)\n", 675 | " print('\\tAverage Valid Accuracy in epoch %d: %.5f'%(epoch,np.mean(valid_accuracy_per_epoch)))\n", 676 | " \n", 677 | " # ===================== Calculate the Test Accuracy ===============================\n", 678 | " accuracy_per_epoch = []\n", 679 | " for i in range(n_test//batch_size):\n", 680 | " test_images, test_labels = mnist_data.test.next_batch(batch_size)\n", 681 | " test_batch_predictions = session.run(\n", 682 | " tf_predictions,feed_dict={train_inputs: test_images.reshape(batch_size,image_size*image_size)}\n", 683 | " )\n", 684 | " accuracy_per_epoch.append(accuracy(test_batch_predictions,test_labels))\n", 685 | " \n", 686 | " print('\\tAverage Test Accuracy in epoch %d: %.5f\\n'%(epoch,np.mean(accuracy_per_epoch)))\n", 687 | " avg_test_accuracy = np.mean(accuracy_per_epoch)\n", 688 | " \n", 689 | " # Execute the summaries defined above\n", 690 | " summ = session.run(performance_summaries, feed_dict={tf_loss_ph:avg_loss, tf_accuracy_ph:avg_test_accuracy})\n", 691 | "\n", 692 | " # Write the obtained summaries to the file, so it can be displayed in the Tensorboard\n", 693 | " summ_writer_2.add_summary(summ, epoch)\n", 694 | " \n", 695 | "session.close()" 696 | ] 697 | }, 698 | { 699 | "cell_type": "markdown", 700 | "metadata": {}, 701 | "source": [ 702 | "### Second Look at the Tensorboard: Looks Much Better Now\n", 703 | "\n", 704 | "Now you can see that the accuracy starts close to 100 and continues to go up. 
And you can see that the gradient updates are also diminishing over time and approaching zero. Things seem much better with the learning rate of 0.01.\n",
705 | "\n",
706 | "\n",
707 | "\n",
708 | "Next let's move beyond scalars. You will see how you can analyze vectors/collections of scalars with the Tensorboard."
709 | ]
710 | },
711 | {
712 | "cell_type": "markdown",
713 | "metadata": {},
714 | "source": [
715 | "## Beyond Scalars: Visualizing Histograms/Distributions through Tensorboard\n",
716 | "\n",
717 | "You saw the benefit of visualizing scalars through Tensorboard, which allowed us to see how the model behaves and fix any potential issues with the model. Moreover, visualizing the graph allowed us to see that there is an uninterrupted link from the inputs to the predictions, which is necessary for gradient calculations. \n",
718 | "\n",
719 | "Now we're going to see another useful view in Tensorboard: histograms, or distributions. A histogram is exactly what its name implies: a collection of values represented by the frequency/density with which each value appears in the collection. How can you use histograms to visualize something in the neural network? You can use histograms to visualize the network weight values over time. Visualizing network weights is important, because if the weights jump around wildly during learning, it indicates that something is wrong with the weight initialization or the learning rate. You will see how the weights change in our example. If you look at the code, it uses a *truncated_normal_initializer(...)* to initialize the weights."
720 | ]
721 | },
722 | {
723 | "cell_type": "markdown",
724 | "metadata": {},
725 | "source": [
726 | "### Defining Histogram Summaries to Visualize Weights and Biases\n",
727 | "\n",
728 | "Here you again define `tf.summary` objects. However, now you are visualizing vectors of scalars, so you need to define `tf.summary.histogram` objects. In this case, you define two histogram objects (namely, `tf_w_hist` and `tf_b_hist`) that contain the weights and biases of a given layer. You will define such histogram objects for all the layers, and each layer will have its own name scope. Finally you can use the `tf.summary.merge` operation to create a grouped operation that executes all these summaries at once."
729 | ]
730 | },
731 | {
732 | "cell_type": "code",
733 | "execution_count": 8,
734 | "metadata": {
735 | "collapsed": true
736 | },
737 | "outputs": [],
738 | "source": [
739 | "\n",
740 | "# Summaries that need to be displayed on the Tensorboard\n",
741 | "# Create a summary for the weights and the bias in each layer\n",
742 | "all_summaries = []\n",
743 | "for lid in layer_ids:\n",
744 | "    with tf.name_scope(lid+'_hist'):\n",
745 | "        with tf.variable_scope(lid,reuse=True):\n",
746 | "            w,b = tf.get_variable('weights'), tf.get_variable('bias')\n",
747 | "\n",
748 | "            # Create histogram summary objects for the weights and bias so Tensorboard knows how to display them\n",
749 | "            tf_w_hist = tf.summary.histogram('weights_hist', tf.reshape(w,[-1]))\n",
750 | "            tf_b_hist = tf.summary.histogram('bias_hist', b)\n",
751 | "            all_summaries.extend([tf_w_hist, tf_b_hist])\n",
752 | "\n",
753 | "# Merge all parameter histogram summaries together\n",
754 | "tf_param_summaries = tf.summary.merge(all_summaries)\n",
755 | "\n"
756 | ]
757 | },
758 | {
759 | "cell_type": "markdown",
760 | "metadata": {},
761 | "source": [
762 | "### Executing the neural network model (with Histogram Summaries)\n",
763 | "\n",
764 | "In the code below you do the following. 
First you create a session, in which you execute the operations you defined above. Then you create folder for saving summary data. You next create a summary write `summ_writer`. You can now initialize all variables. This will be followed by loading the MNIST dataset.\n", 765 | "\n", 766 | "Then for each epoch, and each batch in training data (that is, each iteration). Execute `gradnorm_summary` and `tf_param_summaries` if it is the first iteration and write `gradnorm_summary` and `tf_param_summaries` to event file with summary writer. You now execute model optimization and calculating the loss. After you go through the full training dataset for a single epoch, calculate average training loss.\n", 767 | "\n", 768 | "You follow a similar treatment for the validation dataset as well. Specifically, for each batch in validation data, you calculate validation accuracy for each batch. Thereafter calculate average validation accuracy for full validation set.\n", 769 | "\n", 770 | "Finally, the testing phase is executed. In this, for each batch in test data, you calculate test accuracy for each batch. With that, you calculate average test accuracy for full test set. At the very end you execute `performance_summaries` and write them to event file with the summary writer.\n", 771 | "\n", 772 | "**Note**: This is as same as you did before, but here you have few additional line to compute the histogram summaries (that is, `tf_param_summaries`).\n" 773 | ] 774 | }, 775 | { 776 | "cell_type": "code", 777 | "execution_count": 9, 778 | "metadata": {}, 779 | "outputs": [ 780 | { 781 | "name": "stdout", 782 | "output_type": "stream", 783 | "text": [ 784 | "Extracting MNIST_data\\train-images-idx3-ubyte.gz\n", 785 | "Extracting MNIST_data\\train-labels-idx1-ubyte.gz\n", 786 | "Extracting MNIST_data\\t10k-images-idx3-ubyte.gz\n", 787 | "Extracting MNIST_data\\t10k-labels-idx1-ubyte.gz\n", 788 | "Average loss in epoch 0: 0.94838\n", 789 | "\tAverage Valid Accuracy in epoch 0: 93.76000\n", 790 | "\tAverage Test Accuracy in epoch 0: 93.41000\n", 791 | "\n", 792 | "Average loss in epoch 1: 0.17797\n", 793 | "\tAverage Valid Accuracy in epoch 1: 96.00000\n", 794 | "\tAverage Test Accuracy in epoch 1: 95.43000\n", 795 | "\n", 796 | "Average loss in epoch 2: 0.11237\n", 797 | "\tAverage Valid Accuracy in epoch 2: 96.84000\n", 798 | "\tAverage Test Accuracy in epoch 2: 96.67000\n", 799 | "\n", 800 | "Average loss in epoch 3: 0.07718\n", 801 | "\tAverage Valid Accuracy in epoch 3: 97.36000\n", 802 | "\tAverage Test Accuracy in epoch 3: 97.08000\n", 803 | "\n", 804 | "Average loss in epoch 4: 0.05755\n", 805 | "\tAverage Valid Accuracy in epoch 4: 97.64000\n", 806 | "\tAverage Test Accuracy in epoch 4: 97.63000\n", 807 | "\n", 808 | "Average loss in epoch 5: 0.04365\n", 809 | "\tAverage Valid Accuracy in epoch 5: 97.78000\n", 810 | "\tAverage Test Accuracy in epoch 5: 97.41000\n", 811 | "\n", 812 | "Average loss in epoch 6: 0.03195\n", 813 | "\tAverage Valid Accuracy in epoch 6: 97.60000\n", 814 | "\tAverage Test Accuracy in epoch 6: 97.42000\n", 815 | "\n", 816 | "Average loss in epoch 7: 0.02522\n", 817 | "\tAverage Valid Accuracy in epoch 7: 97.88000\n", 818 | "\tAverage Test Accuracy in epoch 7: 97.74000\n", 819 | "\n", 820 | "Average loss in epoch 8: 0.01883\n", 821 | "\tAverage Valid Accuracy in epoch 8: 97.94000\n", 822 | "\tAverage Test Accuracy in epoch 8: 97.71000\n", 823 | "\n", 824 | "Average loss in epoch 9: 0.01504\n", 825 | "\tAverage Valid Accuracy in epoch 9: 97.70000\n", 826 | "\tAverage Test Accuracy 
in epoch 9: 97.39000\n", 827 | "\n", 828 | "Average loss in epoch 10: 0.01283\n", 829 | "\tAverage Valid Accuracy in epoch 10: 98.00000\n", 830 | "\tAverage Test Accuracy in epoch 10: 97.77000\n", 831 | "\n", 832 | "Average loss in epoch 11: 0.00796\n", 833 | "\tAverage Valid Accuracy in epoch 11: 98.24000\n", 834 | "\tAverage Test Accuracy in epoch 11: 97.91000\n", 835 | "\n", 836 | "Average loss in epoch 12: 0.00872\n", 837 | "\tAverage Valid Accuracy in epoch 12: 97.98000\n", 838 | "\tAverage Test Accuracy in epoch 12: 97.79000\n", 839 | "\n", 840 | "Average loss in epoch 13: 0.00390\n", 841 | "\tAverage Valid Accuracy in epoch 13: 98.18000\n", 842 | "\tAverage Test Accuracy in epoch 13: 98.03000\n", 843 | "\n", 844 | "Average loss in epoch 14: 0.00113\n", 845 | "\tAverage Valid Accuracy in epoch 14: 98.32000\n", 846 | "\tAverage Test Accuracy in epoch 14: 98.16000\n", 847 | "\n", 848 | "Average loss in epoch 15: 0.00071\n", 849 | "\tAverage Valid Accuracy in epoch 15: 98.16000\n", 850 | "\tAverage Test Accuracy in epoch 15: 98.07000\n", 851 | "\n", 852 | "Average loss in epoch 16: 0.00057\n", 853 | "\tAverage Valid Accuracy in epoch 16: 98.26000\n", 854 | "\tAverage Test Accuracy in epoch 16: 98.17000\n", 855 | "\n", 856 | "Average loss in epoch 17: 0.00039\n", 857 | "\tAverage Valid Accuracy in epoch 17: 98.10000\n", 858 | "\tAverage Test Accuracy in epoch 17: 98.19000\n", 859 | "\n", 860 | "Average loss in epoch 18: 0.00018\n", 861 | "\tAverage Valid Accuracy in epoch 18: 98.18000\n", 862 | "\tAverage Test Accuracy in epoch 18: 98.13000\n", 863 | "\n", 864 | "Average loss in epoch 19: 0.00015\n", 865 | "\tAverage Valid Accuracy in epoch 19: 98.14000\n", 866 | "\tAverage Test Accuracy in epoch 19: 98.14000\n", 867 | "\n", 868 | "Average loss in epoch 20: 0.00013\n", 869 | "\tAverage Valid Accuracy in epoch 20: 98.20000\n", 870 | "\tAverage Test Accuracy in epoch 20: 98.10000\n", 871 | "\n", 872 | "Average loss in epoch 21: 0.00011\n", 873 | "\tAverage Valid Accuracy in epoch 21: 98.16000\n", 874 | "\tAverage Test Accuracy in epoch 21: 98.13000\n", 875 | "\n", 876 | "Average loss in epoch 22: 0.00010\n", 877 | "\tAverage Valid Accuracy in epoch 22: 98.14000\n", 878 | "\tAverage Test Accuracy in epoch 22: 98.12000\n", 879 | "\n", 880 | "Average loss in epoch 23: 0.00009\n", 881 | "\tAverage Valid Accuracy in epoch 23: 98.16000\n", 882 | "\tAverage Test Accuracy in epoch 23: 98.15000\n", 883 | "\n", 884 | "Average loss in epoch 24: 0.00009\n", 885 | "\tAverage Valid Accuracy in epoch 24: 98.18000\n", 886 | "\tAverage Test Accuracy in epoch 24: 98.14000\n", 887 | "\n" 888 | ] 889 | } 890 | ], 891 | "source": [ 892 | "\n", 893 | "image_size = 28\n", 894 | "n_channels = 1\n", 895 | "n_classes = 10\n", 896 | "n_train = 55000\n", 897 | "n_valid = 5000\n", 898 | "n_test = 10000\n", 899 | "n_epochs = 25\n", 900 | "\n", 901 | "config = tf.ConfigProto(allow_soft_placement=True)\n", 902 | "config.gpu_options.allow_growth = True\n", 903 | "config.gpu_options.per_process_gpu_memory_fraction = 0.9 # making sure Tensorflow doesn't overflow the GPU\n", 904 | "\n", 905 | "session = tf.InteractiveSession(config=config)\n", 906 | "\n", 907 | "if not os.path.exists('summaries'):\n", 908 | " os.mkdir('summaries')\n", 909 | "if not os.path.exists(os.path.join('summaries','third')):\n", 910 | " os.mkdir(os.path.join('summaries','third'))\n", 911 | " \n", 912 | "summ_writer_3 = tf.summary.FileWriter(os.path.join('summaries','third'), session.graph)\n", 913 | "\n", 914 | 
"tf.global_variables_initializer().run()\n", 915 | "\n", 916 | "accuracy_per_epoch = []\n", 917 | "mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)\n", 918 | "\n", 919 | "\n", 920 | "for epoch in range(n_epochs):\n", 921 | " loss_per_epoch = []\n", 922 | " for i in range(n_train//batch_size):\n", 923 | " \n", 924 | " # =================================== Training for one step ========================================\n", 925 | " batch = mnist_data.train.next_batch(batch_size) # Get one batch of training data\n", 926 | " if i == 0:\n", 927 | " # Only for the first epoch, get the summary data\n", 928 | " # Otherwise, it can clutter the visualization\n", 929 | " l,_,gn_summ, wb_summ = session.run([tf_loss,tf_loss_minimize,tf_gradnorm_summary, tf_param_summaries],\n", 930 | " feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),\n", 931 | " train_labels: batch[1],\n", 932 | " tf_learning_rate: 0.00001})\n", 933 | " summ_writer_3.add_summary(gn_summ, epoch)\n", 934 | " summ_writer_3.add_summary(wb_summ, epoch)\n", 935 | " else:\n", 936 | " # Optimize with training data\n", 937 | " l,_ = session.run([tf_loss,tf_loss_minimize],\n", 938 | " feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),\n", 939 | " train_labels: batch[1],\n", 940 | " tf_learning_rate: 0.01})\n", 941 | " loss_per_epoch.append(l)\n", 942 | " \n", 943 | " print('Average loss in epoch %d: %.5f'%(epoch,np.mean(loss_per_epoch))) \n", 944 | " avg_loss = np.mean(loss_per_epoch)\n", 945 | " \n", 946 | " # ====================== Calculate the Validation Accuracy ==========================\n", 947 | " valid_accuracy_per_epoch = []\n", 948 | " for i in range(n_valid//batch_size):\n", 949 | " valid_images,valid_labels = mnist_data.validation.next_batch(batch_size)\n", 950 | " valid_batch_predictions = session.run(\n", 951 | " tf_predictions,feed_dict={train_inputs: valid_images.reshape(batch_size,image_size*image_size)})\n", 952 | " valid_accuracy_per_epoch.append(accuracy(valid_batch_predictions,valid_labels))\n", 953 | " \n", 954 | " mean_v_acc = np.mean(valid_accuracy_per_epoch)\n", 955 | " print('\\tAverage Valid Accuracy in epoch %d: %.5f'%(epoch,np.mean(valid_accuracy_per_epoch)))\n", 956 | " \n", 957 | " # ===================== Calculate the Test Accuracy ===============================\n", 958 | " accuracy_per_epoch = []\n", 959 | " for i in range(n_test//batch_size):\n", 960 | " test_images, test_labels = mnist_data.test.next_batch(batch_size)\n", 961 | " test_batch_predictions = session.run(\n", 962 | " tf_predictions,feed_dict={train_inputs: test_images.reshape(batch_size,image_size*image_size)}\n", 963 | " )\n", 964 | " accuracy_per_epoch.append(accuracy(test_batch_predictions,test_labels))\n", 965 | " \n", 966 | " print('\\tAverage Test Accuracy in epoch %d: %.5f\\n'%(epoch,np.mean(accuracy_per_epoch)))\n", 967 | " avg_test_accuracy = np.mean(accuracy_per_epoch)\n", 968 | " \n", 969 | " # Execute the summaries defined above\n", 970 | " summ = session.run(performance_summaries, feed_dict={tf_loss_ph:avg_loss, tf_accuracy_ph:avg_test_accuracy})\n", 971 | "\n", 972 | " # Write the obtained summaries to the file, so it can be displayed in the Tensorboard\n", 973 | " summ_writer_3.add_summary(summ, epoch)\n", 974 | " \n", 975 | "session.close()" 976 | ] 977 | }, 978 | { 979 | "cell_type": "markdown", 980 | "metadata": {}, 981 | "source": [ 982 | "### Visualizing Histogram Data of Weights and Biases\n", 983 | "\n", 984 | "Here's what our weights and biases look like. 
First, you have three axes: time (x axis), value (y axis) and frequency/density of values (z axis). Darker histograms represent old data and lighter histograms represent newer data. A higher value on the z axis means that the vector contains more values near that specific value.\n",
985 | "\n",
986 | "**Note**: You also have an \"overlay\" view of the histograms over time. You can change the type of display on the left-side option panel. \n",
987 | "\n",
988 | "\n",
989 | "\n",
990 | ""
991 | ]
992 | },
993 | {
994 | "cell_type": "markdown",
995 | "metadata": {},
996 | "source": [
997 | "### Effect of Different Initializers: Changing the Initialization of Weights and Re-Defining the Model\n",
998 | "\n",
999 | "Now, instead of using `truncated_normal_initializer()`, you will use the `xavier_initializer()` to initialize the weights. Xavier initialization is a much better initialization technique, especially for deep neural networks. This is because, instead of using a user-defined standard deviation (as you did when using the `truncated_normal_initializer()`), the Xavier initializer automatically decides the standard deviation based on the number of input and output connections to a layer. This helps gradients flow from top to bottom without issues like the *vanishing gradient*. You then define the model again.\n",
1000 | "\n",
1001 | "First you define a `batch_size` denoting the amount of data you sample in a single optimization/validation or testing step. Then you define `layer_ids`, which gives an identifier for each of the layers of the neural network you will be defining. You can then define `layer_sizes`. Note that `len(layer_sizes)` should be `len(layer_ids)+1`, because `layer_sizes` includes the size of the input at the beginning. MNIST has images of size 28x28, which will be 784 when unwrapped to a single dimension. Then you can define the input and label placeholders that you will later use to train the model. 
Finally you define two TensorFlow variables for each layer (that is, `weights` and `bias`).\n", 1002 | "\n", 1003 | "**Note**: This is identical to the code you used the first time, except for the initialization technique used for the weights." 1004 | ] 1005 | }, 1006 | { 1007 | "cell_type": "code", 1008 | "execution_count": 10, 1009 | "metadata": { 1010 | "collapsed": true 1011 | }, 1012 | "outputs": [], 1013 | "source": [ 1014 | "batch_size = 100\n", 1015 | "layer_ids = ['hidden1','hidden2','hidden3','hidden4','hidden5','out']\n", 1016 | "layer_sizes = [784, 500, 400, 300, 200, 100, 10]\n", 1017 | "\n", 1018 | "tf.reset_default_graph()\n", 1019 | "\n", 1020 | "# Inputs and Labels\n", 1021 | "train_inputs = tf.placeholder(tf.float32, shape=[batch_size, layer_sizes[0]], name='train_inputs')\n", 1022 | "train_labels = tf.placeholder(tf.float32, shape=[batch_size, layer_sizes[-1]], name='train_labels')\n", 1023 | "\n", 1024 | "# Weight and Bias definitions\n", 1025 | "for idx, lid in enumerate(layer_ids):\n", 1026 | " \n", 1027 | " with tf.variable_scope(lid):\n", 1028 | " w = tf.get_variable('weights',shape=[layer_sizes[idx], layer_sizes[idx+1]], \n", 1029 | " initializer=tf.contrib.layers.xavier_initializer())\n", 1030 | " b = tf.get_variable('bias',shape= [layer_sizes[idx+1]], \n", 1031 | " initializer=tf.random_uniform_initializer(-0.1,0.1))\n", 1032 | " " 1033 | ] 1034 | }, 1035 | { 1036 | "cell_type": "markdown", 1037 | "metadata": {}, 1038 | "source": [ 1039 | "### Calculate Logits, Predictions, Loss and Optimization\n", 1040 | "\n", 1041 | "With the input/output placeholders and the weights and biases of each layer defined, you can now define the computations that produce the logits of the neural network. Logits are the unnormalized values produced at the last layer of the neural network. When normalized, you call them predictions. This involves iterating through each layer of the neural network and computing `tf.matmul(h,w) + b`. You also need to apply an activation function, as in `tf.nn.relu(tf.matmul(h,w) + b)`, for all layers except the last layer.\n", 1042 | "\n", 1043 | "Next you define the loss function that is used to optimize the neural network. In this example, you can use the cross entropy loss, which often delivers better results in classification problems than the mean squared error.\n", 1044 | "\n", 1045 | "Finally you will need to define an optimizer that takes in the loss and updates the weights of the neural network in the direction that minimizes the loss.\n", 1046 | "\n", 1047 | "**Note**: This is identical to the code you used the first time you defined these operations and tensors.\n",
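"\n", "For reference, the softmax predictions and the cross entropy loss computed in the cell below can be written as\n", "\n", "$$p_i = \\frac{e^{h_i}}{\\sum_j e^{h_j}}, \\qquad \\text{loss} = -\\sum_i y_i \\log p_i,$$\n", "\n", "where $h$ are the logits, $p$ the (normalized) predictions and $y$ the one-hot labels."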
1048 | ] 1049 | }, 1050 | { 1051 | "cell_type": "code", 1052 | "execution_count": 11, 1053 | "metadata": { 1054 | "collapsed": true 1055 | }, 1056 | "outputs": [], 1057 | "source": [ 1058 | "# Calculating Logits\n", 1059 | "h = train_inputs\n", 1060 | "for lid in layer_ids:\n", 1061 | " with tf.variable_scope(lid,reuse=True):\n", 1062 | " w, b = tf.get_variable('weights'), tf.get_variable('bias')\n", 1063 | " if lid != 'out':\n", 1064 | " h = tf.nn.relu(tf.matmul(h,w)+b,name=lid+'_output')\n", 1065 | " else:\n", 1066 | " h = tf.nn.xw_plus_b(h,w,b,name=lid+'_output')\n", 1067 | "\n", 1068 | "tf_predictions = tf.nn.softmax(h, name='predictions')\n", 1069 | "# Calculating Loss\n", 1070 | "tf_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits_v2(labels=train_labels, logits=h),name='loss')\n", 1071 | "\n", 1072 | "# Optimizer \n", 1073 | "tf_learning_rate = tf.placeholder(tf.float32, shape=None, name='learning_rate')\n", 1074 | "optimizer = tf.train.MomentumOptimizer(tf_learning_rate,momentum=0.9)\n", 1075 | "grads_and_vars = optimizer.compute_gradients(tf_loss)\n", 1076 | "tf_loss_minimize = optimizer.minimize(tf_loss)\n" 1077 | ] 1078 | }, 1079 | { 1080 | "cell_type": "markdown", 1081 | "metadata": {}, 1082 | "source": [ 1083 | "### Defining Tensorboard Summaries\n", 1084 | "\n", 1085 | "Here you can define the `tf.summary` objects. `tf.summary` objects are the type of entities understood by the Tensorboard. This means that whatever value you would like to display on the Tensorboard, you should encapsulate it as a `tf.summary` object. There are several different types of summaries. Here, as you are visualizing only scalars, you can define `tf.summary.scalar` objects. Furthermore, you can use `tf.name_scope` to group scalars on the Tensorboard. That is, scalars having the same name scope will be displayed on the same row on the Tensorboard. Here you define three different summaries.\n", 1086 | "\n", 1087 | "* `tf_loss_summary` : You feed in a value by means of a placeholder, whenever you need to publish this to the Tensorboard\n", 1088 | "* `tf_accuracy_summary` : You feed in a value by means of a placeholder, whenever you need to publish this to the Tensorboard\n", 1089 | "* `tf_gradnorm_summary` : This calculates the l2 norm of the gradients of the last hidden layer of your neural network (see the remark after this list for the exact quantity recorded). The gradient norm is a good indicator of whether the weights of the neural network are being updated properly. A very small gradient norm can indicate the *vanishing gradient* problem, while a very large one can imply the *exploding gradient* phenomenon.\n", 1090 | "\n", 1091 | "**Note**: This is identical to the code you used the first time you defined these operations and tensors.\n",
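"\n", "As a remark on `tf_gradnorm_summary`: the quantity recorded in the cell below is the root mean square of the last hidden layer's weight gradients,\n", "\n", "$$\\sqrt{\\frac{1}{n}\\sum_{i=1}^{n} g_i^2},$$\n", "\n", "which is the l2 norm of the gradient vector $g$ scaled by $1/\\sqrt{n}$ (where $n$ is the number of weights in that layer), so its value does not grow with the layer size."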
1092 | ] 1093 | }, 1094 | { 1095 | "cell_type": "code", 1096 | "execution_count": 12, 1097 | "metadata": { 1098 | "collapsed": true 1099 | }, 1100 | "outputs": [], 1101 | "source": [ 1102 | "# Name scope allows you to group various summaries together\n", 1103 | "# Summaries having the same name_scope will be displayed on the same row on the Tensorboard\n", 1104 | "with tf.name_scope('performance'):\n", 1105 | " # Summaries to be displayed on the Tensorboard\n", 1106 | " # Whenever you need to record the loss, feed the mean loss to this placeholder\n", 1107 | " tf_loss_ph = tf.placeholder(tf.float32,shape=None,name='loss_summary') \n", 1108 | " # Create a scalar summary object for the loss so Tensorboard knows how to display it\n", 1109 | " tf_loss_summary = tf.summary.scalar('loss', tf_loss_ph)\n", 1110 | "\n", 1111 | " # Whenever you need to record the accuracy, feed the mean test accuracy to this placeholder\n", 1112 | " tf_accuracy_ph = tf.placeholder(tf.float32,shape=None, name='accuracy_summary') \n", 1113 | " # Create a scalar summary object for the accuracy so Tensorboard knows how to display it\n", 1114 | " tf_accuracy_summary = tf.summary.scalar('accuracy', tf_accuracy_ph)\n", 1115 | "\n", 1116 | "# Gradient norm summary\n", 1117 | "for g,v in grads_and_vars:\n", 1118 | " if 'hidden5' in v.name and 'weights' in v.name:\n", 1119 | " with tf.name_scope('gradients'):\n", 1120 | " tf_last_grad_norm = tf.sqrt(tf.reduce_mean(g**2))\n", 1121 | " tf_gradnorm_summary = tf.summary.scalar('grad_norm', tf_last_grad_norm)\n", 1122 | " break\n", 1123 | "# Merge all summaries together\n", 1124 | "performance_summaries = tf.summary.merge([tf_loss_summary,tf_accuracy_summary])" 1125 | ] 1126 | }, 1127 | { 1128 | "cell_type": "markdown", 1129 | "metadata": {}, 1130 | "source": [ 1131 | "### Defining Histogram Summaries to Visualize Weights and Biases \n", 1132 | "\n", 1133 | "Here you again define `tf.summary` objects. However, now you are visualizing vectors of scalars, so you need to define `tf.summary.histogram` objects. In this case, you define two histogram objects (namely, `tf_w_hist` and `tf_b_hist`) that contain the weights and biases of a given layer. You will define such histogram objects for all the layers, and each layer will have its own name scope. Finally you can use the `tf.summary.merge` operation to create a grouped operation that executes all these summaries at once.\n", 1134 | "\n", 1135 | "**Note**: This is identical to the code you used the first time you defined these operations and tensors." 
1136 | ] 1137 | }, 1138 | { 1139 | "cell_type": "code", 1140 | "execution_count": 13, 1141 | "metadata": {}, 1142 | "outputs": [], 1143 | "source": [ 1144 | "\n", 1145 | "# Summaries to be displayed on the Tensorboard\n", 1146 | "# Create a histogram summary for the weights and biases of each layer\n", 1147 | "all_summaries = []\n", 1148 | "for lid in layer_ids:\n", 1149 | " with tf.name_scope(lid+'_hist'):\n", 1150 | " with tf.variable_scope(lid,reuse=True):\n", 1151 | " w,b = tf.get_variable('weights'), tf.get_variable('bias')\n", 1152 | "\n", 1153 | " # Create histogram summary objects for the weights and bias so Tensorboard knows how to display them\n", 1154 | " tf_w_hist = tf.summary.histogram('weights_hist', tf.reshape(w,[-1]))\n", 1155 | " tf_b_hist = tf.summary.histogram('bias_hist', b)\n", 1156 | " all_summaries.extend([tf_w_hist, tf_b_hist])\n", 1157 | "\n", 1158 | "# Merge all parameter histogram summaries together\n", 1159 | "tf_param_summaries = tf.summary.merge(all_summaries)\n" 1160 | ] 1161 | }, 1162 | { 1163 | "cell_type": "markdown", 1164 | "metadata": {}, 1165 | "source": [ 1166 | "### Executing the neural network model\n", 1167 | "\n", 1168 | "In the code below you do the following. First you create a session, in which you execute the operations you defined above. Then you create a folder for saving the summary data. You next create a summary writer `summ_writer_4`. You can now initialize all the variables. This is followed by loading the MNIST dataset.\n", 1169 | "\n", 1170 | "Then, for each epoch and each batch of training data (that is, each iteration), you execute `tf_gradnorm_summary` and `tf_param_summaries` if it is the first iteration of the epoch and write them to the event file with the summary writer; otherwise you only run the model optimization and compute the loss. After you go through the full training dataset in a single epoch, you calculate the average training loss.\n", 1171 | "\n", 1172 | "You follow a similar treatment for the validation dataset. Specifically, for each batch of validation data you calculate the validation accuracy, and thereafter you calculate the average validation accuracy over the full validation set.\n", 1173 | "\n", 1174 | "Finally, the testing phase is executed. For each batch of test data you calculate the test accuracy, and from that the average test accuracy over the full test set. At the very end you execute `performance_summaries` and write them to the event file with the summary writer.\n", 1175 | "\n", 1176 | "**Note**: This is the same procedure as before." 
1177 | ] 1178 | }, 1179 | { 1180 | "cell_type": "code", 1181 | "execution_count": 14, 1182 | "metadata": {}, 1183 | "outputs": [ 1184 | { 1185 | "name": "stdout", 1186 | "output_type": "stream", 1187 | "text": [ 1188 | "Extracting MNIST_data\\train-images-idx3-ubyte.gz\n", 1189 | "Extracting MNIST_data\\train-labels-idx1-ubyte.gz\n", 1190 | "Extracting MNIST_data\\t10k-images-idx3-ubyte.gz\n", 1191 | "Extracting MNIST_data\\t10k-labels-idx1-ubyte.gz\n", 1192 | "Average loss in epoch 0: 0.44695\n", 1193 | "\tAverage Valid Accuracy in epoch 0: 95.92000\n", 1194 | "\tAverage Test Accuracy in epoch 0: 95.50000\n", 1195 | "\n", 1196 | "Average loss in epoch 1: 0.13685\n", 1197 | "\tAverage Valid Accuracy in epoch 1: 96.78000\n", 1198 | "\tAverage Test Accuracy in epoch 1: 96.28000\n", 1199 | "\n", 1200 | "Average loss in epoch 2: 0.08945\n", 1201 | "\tAverage Valid Accuracy in epoch 2: 97.18000\n", 1202 | "\tAverage Test Accuracy in epoch 2: 97.14000\n", 1203 | "\n", 1204 | "Average loss in epoch 3: 0.06410\n", 1205 | "\tAverage Valid Accuracy in epoch 3: 97.54000\n", 1206 | "\tAverage Test Accuracy in epoch 3: 97.56000\n", 1207 | "\n", 1208 | "Average loss in epoch 4: 0.04689\n", 1209 | "\tAverage Valid Accuracy in epoch 4: 98.06000\n", 1210 | "\tAverage Test Accuracy in epoch 4: 97.75000\n", 1211 | "\n", 1212 | "Average loss in epoch 5: 0.03310\n", 1213 | "\tAverage Valid Accuracy in epoch 5: 97.98000\n", 1214 | "\tAverage Test Accuracy in epoch 5: 97.83000\n", 1215 | "\n", 1216 | "Average loss in epoch 6: 0.02627\n", 1217 | "\tAverage Valid Accuracy in epoch 6: 97.96000\n", 1218 | "\tAverage Test Accuracy in epoch 6: 97.72000\n", 1219 | "\n", 1220 | "Average loss in epoch 7: 0.02006\n", 1221 | "\tAverage Valid Accuracy in epoch 7: 98.10000\n", 1222 | "\tAverage Test Accuracy in epoch 7: 97.96000\n", 1223 | "\n", 1224 | "Average loss in epoch 8: 0.01436\n", 1225 | "\tAverage Valid Accuracy in epoch 8: 98.34000\n", 1226 | "\tAverage Test Accuracy in epoch 8: 98.20000\n", 1227 | "\n", 1228 | "Average loss in epoch 9: 0.00902\n", 1229 | "\tAverage Valid Accuracy in epoch 9: 98.26000\n", 1230 | "\tAverage Test Accuracy in epoch 9: 98.04000\n", 1231 | "\n", 1232 | "Average loss in epoch 10: 0.00522\n", 1233 | "\tAverage Valid Accuracy in epoch 10: 98.36000\n", 1234 | "\tAverage Test Accuracy in epoch 10: 98.20000\n", 1235 | "\n", 1236 | "Average loss in epoch 11: 0.00262\n", 1237 | "\tAverage Valid Accuracy in epoch 11: 98.40000\n", 1238 | "\tAverage Test Accuracy in epoch 11: 98.27000\n", 1239 | "\n", 1240 | "Average loss in epoch 12: 0.00252\n", 1241 | "\tAverage Valid Accuracy in epoch 12: 98.46000\n", 1242 | "\tAverage Test Accuracy in epoch 12: 98.26000\n", 1243 | "\n", 1244 | "Average loss in epoch 13: 0.00184\n", 1245 | "\tAverage Valid Accuracy in epoch 13: 98.52000\n", 1246 | "\tAverage Test Accuracy in epoch 13: 98.36000\n", 1247 | "\n", 1248 | "Average loss in epoch 14: 0.00059\n", 1249 | "\tAverage Valid Accuracy in epoch 14: 98.46000\n", 1250 | "\tAverage Test Accuracy in epoch 14: 98.22000\n", 1251 | "\n", 1252 | "Average loss in epoch 15: 0.00048\n", 1253 | "\tAverage Valid Accuracy in epoch 15: 98.62000\n", 1254 | "\tAverage Test Accuracy in epoch 15: 98.43000\n", 1255 | "\n", 1256 | "Average loss in epoch 16: 0.00038\n", 1257 | "\tAverage Valid Accuracy in epoch 16: 98.58000\n", 1258 | "\tAverage Test Accuracy in epoch 16: 98.38000\n", 1259 | "\n", 1260 | "Average loss in epoch 17: 0.00032\n", 1261 | "\tAverage Valid Accuracy in epoch 17: 98.60000\n", 1262 | "\tAverage Test 
Accuracy in epoch 17: 98.41000\n", 1263 | "\n", 1264 | "Average loss in epoch 18: 0.00024\n", 1265 | "\tAverage Valid Accuracy in epoch 18: 98.58000\n", 1266 | "\tAverage Test Accuracy in epoch 18: 98.40000\n", 1267 | "\n", 1268 | "Average loss in epoch 19: 0.00019\n", 1269 | "\tAverage Valid Accuracy in epoch 19: 98.56000\n", 1270 | "\tAverage Test Accuracy in epoch 19: 98.43000\n", 1271 | "\n", 1272 | "Average loss in epoch 20: 0.00016\n", 1273 | "\tAverage Valid Accuracy in epoch 20: 98.56000\n", 1274 | "\tAverage Test Accuracy in epoch 20: 98.43000\n", 1275 | "\n", 1276 | "Average loss in epoch 21: 0.00014\n", 1277 | "\tAverage Valid Accuracy in epoch 21: 98.56000\n", 1278 | "\tAverage Test Accuracy in epoch 21: 98.41000\n", 1279 | "\n", 1280 | "Average loss in epoch 22: 0.00013\n", 1281 | "\tAverage Valid Accuracy in epoch 22: 98.56000\n", 1282 | "\tAverage Test Accuracy in epoch 22: 98.44000\n", 1283 | "\n", 1284 | "Average loss in epoch 23: 0.00012\n", 1285 | "\tAverage Valid Accuracy in epoch 23: 98.56000\n", 1286 | "\tAverage Test Accuracy in epoch 23: 98.42000\n", 1287 | "\n", 1288 | "Average loss in epoch 24: 0.00011\n", 1289 | "\tAverage Valid Accuracy in epoch 24: 98.58000\n", 1290 | "\tAverage Test Accuracy in epoch 24: 98.40000\n", 1291 | "\n" 1292 | ] 1293 | } 1294 | ], 1295 | "source": [ 1296 | "\n", 1297 | "image_size = 28\n", 1298 | "n_channels = 1\n", 1299 | "n_classes = 10\n", 1300 | "n_train = 55000\n", 1301 | "n_valid = 5000\n", 1302 | "n_test = 10000\n", 1303 | "n_epochs = 25\n", 1304 | "\n", 1305 | "config = tf.ConfigProto(allow_soft_placement=True)\n", 1306 | "config.gpu_options.allow_growth = True\n", 1307 | "config.gpu_options.per_process_gpu_memory_fraction = 0.9 # making sure Tensorflow doesn't overflow the GPU\n", 1308 | "\n", 1309 | "session = tf.InteractiveSession(config=config)\n", 1310 | "\n", 1311 | "if not os.path.exists('summaries'):\n", 1312 | " os.mkdir('summaries')\n", 1313 | "if not os.path.exists(os.path.join('summaries','fourth')):\n", 1314 | " os.mkdir(os.path.join('summaries','fourth'))\n", 1315 | " \n", 1316 | "summ_writer_4 = tf.summary.FileWriter(os.path.join('summaries','fourth'), session.graph)\n", 1317 | "\n", 1318 | "tf.global_variables_initializer().run()\n", 1319 | "\n", 1320 | "accuracy_per_epoch = []\n", 1321 | "mnist_data = input_data.read_data_sets('MNIST_data', one_hot=True)\n", 1322 | "\n", 1323 | "\n", 1324 | "for epoch in range(n_epochs):\n", 1325 | " loss_per_epoch = []\n", 1326 | " for i in range(n_train//batch_size):\n", 1327 | " \n", 1328 | " # =================================== Training for one step ========================================\n", 1329 | " batch = mnist_data.train.next_batch(batch_size) # Get one batch of training data\n", 1330 | " if i == 0:\n", 1331 | " # Only for the first epoch, get the summary data\n", 1332 | " # Otherwise, it can clutter the visualization\n", 1333 | " l,_,gn_summ, wb_summ = session.run([tf_loss,tf_loss_minimize,tf_gradnorm_summary, tf_param_summaries],\n", 1334 | " feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),\n", 1335 | " train_labels: batch[1],\n", 1336 | " tf_learning_rate: 0.01})\n", 1337 | " summ_writer_4.add_summary(gn_summ, epoch)\n", 1338 | " summ_writer_4.add_summary(wb_summ, epoch)\n", 1339 | " else:\n", 1340 | " # Optimize with training data\n", 1341 | " l,_ = session.run([tf_loss,tf_loss_minimize],\n", 1342 | " feed_dict={train_inputs: batch[0].reshape(batch_size,image_size*image_size),\n", 1343 | " train_labels: batch[1],\n", 1344 | " 
tf_learning_rate: 0.01})\n", 1345 | " loss_per_epoch.append(l)\n", 1346 | " \n", 1347 | " print('Average loss in epoch %d: %.5f'%(epoch,np.mean(loss_per_epoch))) \n", 1348 | " avg_loss = np.mean(loss_per_epoch)\n", 1349 | " \n", 1350 | " # ====================== Calculate the Validation Accuracy ==========================\n", 1351 | " valid_accuracy_per_epoch = []\n", 1352 | " for i in range(n_valid//batch_size):\n", 1353 | " valid_images,valid_labels = mnist_data.validation.next_batch(batch_size)\n", 1354 | " valid_batch_predictions = session.run(\n", 1355 | " tf_predictions,feed_dict={train_inputs: valid_images.reshape(batch_size,image_size*image_size)})\n", 1356 | " valid_accuracy_per_epoch.append(accuracy(valid_batch_predictions,valid_labels))\n", 1357 | " \n", 1358 | " mean_v_acc = np.mean(valid_accuracy_per_epoch)\n", 1359 | " print('\\tAverage Valid Accuracy in epoch %d: %.5f'%(epoch,np.mean(valid_accuracy_per_epoch)))\n", 1360 | " \n", 1361 | " # ===================== Calculate the Test Accuracy ===============================\n", 1362 | " accuracy_per_epoch = []\n", 1363 | " for i in range(n_test//batch_size):\n", 1364 | " test_images, test_labels = mnist_data.test.next_batch(batch_size)\n", 1365 | " test_batch_predictions = session.run(\n", 1366 | " tf_predictions,feed_dict={train_inputs: test_images.reshape(batch_size,image_size*image_size)}\n", 1367 | " )\n", 1368 | " accuracy_per_epoch.append(accuracy(test_batch_predictions,test_labels))\n", 1369 | " \n", 1370 | " print('\\tAverage Test Accuracy in epoch %d: %.5f\\n'%(epoch,np.mean(accuracy_per_epoch)))\n", 1371 | " avg_test_accuracy = np.mean(accuracy_per_epoch)\n", 1372 | " \n", 1373 | " # Execute the summaries defined above\n", 1374 | " summ = session.run(performance_summaries, feed_dict={tf_loss_ph:avg_loss, tf_accuracy_ph:avg_test_accuracy})\n", 1375 | "\n", 1376 | " # Write the obtained summaries to the file, so it can be displayed in the Tensorboard\n", 1377 | " summ_writer_4.add_summary(summ, epoch)\n", 1378 | " \n", 1379 | "session.close()" 1380 | ] 1381 | }, 1382 | { 1383 | "cell_type": "markdown", 1384 | "metadata": {}, 1385 | "source": [ 1386 | "### Comparing Different Initialization Techniques\n", 1387 | "\n", 1388 | "Here you compare how the weights evolve over time for the two different initializations: *truncated_normal_initializer* (red) and *xavier_initializer* (blue). You can see that the Xavier initializer keeps more weights away from zero than the truncated normal initializer, which is desirable. This potentially allows the Xavier-initialized network to converge faster, as evidenced by the loss/accuracy curves. \n", 1389 | "\n", 1390 | "\n" 1391 | ] 1392 | }, 1393 | { 1394 | "cell_type": "markdown", 1395 | "metadata": {}, 1396 | "source": [ 1397 | "## Distribution View of Histograms\n", 1398 | "\n", 1399 | "You can now compare the difference between the two views: the histogram view and the distribution view. The distribution view is essentially a different way of looking at the histograms. If you look at the image below, you can easily see that the distribution view is a top view of the histogram view (the histogram graphs have been rotated to make the resemblance easier to see).\n", 1400 | "\n", 1401 | "\n" 1402 | ] 1403 | }, 1404 | { 1405 | "cell_type": "markdown", 1406 | "metadata": {}, 1407 | "source": [ 1408 | "## Conclusion\n", 1409 | "\n", 1410 | "In this tutorial you saw how to use the Tensorboard. First you learnt how to start the Tensorboard through the command prompt (Windows) or terminal (Ubuntu/Mac). 
Next you looked at different views of data provided by the Tensorboard. You then looked at code that visualizes scalar values (e.g. loss / accuracy). You used a feed-forward neural network model to concretely understand the use of the scalar value visualization. Thereafter, you explored how you can visualize collections/vectors of scalars using the histogram view. This was followed by a comparison highlighting the differences between neural network weight initialization techniques using the histogram view. Finally you discussed the similarities between the distribution view and the histogram view.\n", 1411 | "\n", 1412 | "* Author: Thushan Ganegedara\n", 1413 | "* Email: thushv@gmail.com\n", 1414 | "* Website: http://www.thushv.com/\n", 1415 | "* LinkedIn: https://www.linkedin.com/in/thushanganegedara/" 1416 | ] 1417 | }, 1418 | { 1419 | "cell_type": "code", 1420 | "execution_count": null, 1421 | "metadata": { 1422 | "collapsed": true 1423 | }, 1424 | "outputs": [], 1425 | "source": [] 1426 | } 1427 | ], 1428 | "metadata": { 1429 | "kernelspec": { 1430 | "display_name": "Python 3", 1431 | "language": "python", 1432 | "name": "python3" 1433 | }, 1434 | "language_info": { 1435 | "codemirror_mode": { 1436 | "name": "ipython", 1437 | "version": 3 1438 | }, 1439 | "file_extension": ".py", 1440 | "mimetype": "text/x-python", 1441 | "name": "python", 1442 | "nbconvert_exporter": "python", 1443 | "pygments_lexer": "ipython3", 1444 | "version": "3.5.2" 1445 | } 1446 | }, 1447 | "nbformat": 4, 1448 | "nbformat_minor": 2 1449 | } 1450 | -------------------------------------------------------------------------------- /tensorboard_tutorial/tensorboard_1.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thushv89/datacamp_tutorials/5d2461bf49a4029b27d1f5afa8570aa5234dd1ca/tensorboard_tutorial/tensorboard_1.PNG -------------------------------------------------------------------------------- /tensorboard_tutorial/tensorboard_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thushv89/datacamp_tutorials/5d2461bf49a4029b27d1f5afa8570aa5234dd1ca/tensorboard_tutorial/tensorboard_2.png -------------------------------------------------------------------------------- /tensorboard_tutorial/tensorboard_3_1.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thushv89/datacamp_tutorials/5d2461bf49a4029b27d1f5afa8570aa5234dd1ca/tensorboard_tutorial/tensorboard_3_1.PNG -------------------------------------------------------------------------------- /tensorboard_tutorial/tensorboard_3_2.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thushv89/datacamp_tutorials/5d2461bf49a4029b27d1f5afa8570aa5234dd1ca/tensorboard_tutorial/tensorboard_3_2.PNG -------------------------------------------------------------------------------- /tensorboard_tutorial/tensorboard_3_3.PNG: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thushv89/datacamp_tutorials/5d2461bf49a4029b27d1f5afa8570aa5234dd1ca/tensorboard_tutorial/tensorboard_3_3.PNG -------------------------------------------------------------------------------- /tensorboard_tutorial/tensorboard_graph.PNG: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/thushv89/datacamp_tutorials/5d2461bf49a4029b27d1f5afa8570aa5234dd1ca/tensorboard_tutorial/tensorboard_graph.PNG -------------------------------------------------------------------------------- /tensorboard_tutorial/tensorboard_histogram_vs_distribution_views.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/thushv89/datacamp_tutorials/5d2461bf49a4029b27d1f5afa8570aa5234dd1ca/tensorboard_tutorial/tensorboard_histogram_vs_distribution_views.png --------------------------------------------------------------------------------