├── .gitignore
├── README.md
├── download-and-setup
│   ├── .gitignore
│   └── README.md
├── lab1_FFN
│   ├── .gitignore
│   ├── confusionmatrix.py
│   ├── lab1_FFN.ipynb
│   └── mnist.npz
├── lab2_CNN
│   ├── .gitignore
│   ├── confusionmatrix.py
│   ├── lab2_CNN.ipynb
│   ├── mnist.npz
│   └── spatial_transformer.py
├── lab3_RNN
│   ├── .gitignore
│   ├── confusionmatrix.py
│   ├── data_generator.py
│   ├── enc-dec.png
│   ├── lab3_RNN.ipynb
│   └── tf_utils.py
├── lab4_Kaggle
│   ├── .gitignore
│   ├── README.md
│   └── lab4_Kaggle.ipynb
└── lab5_AE
    ├── .gitignore
    └── lab5_AE.ipynb

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.npz
3 | *.csv
4 | *.jpg
5 | *.ipynb_checkpoints
6 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # TensorFlow Tutorial - used by Nvidia
2 | 
3 | Learn TensorFlow from scratch with examples and visualizations in interactive jupyter notebooks. Learn to compete in the [Kaggle leaf detection challenge](https://www.kaggle.com/c/leaf-classification)!
4 | 
5 | All exercises are designed to be run on a laptop CPU, but can be accelerated with GPU resources.
6 | 
7 | Labs 1-4 were used in the [Deep Learning using TensorFlow](http://www.eventbrite.com/e/deep-learning-using-tensorflow-tickets-27071720244#) event in London by Nvidia and Persontyle.
8 | 
9 | ## Credits
10 | 
11 | Labs 1, 2, 3 and 5 have been translated from Theano/Lasagne, with minor modifications, from the following repositories: [Nvidia Summer Camp](https://github.com/DeepLearningDTU/nvidia_deep_learning_summercamp_2016) and [02456 deep learning](https://github.com/DeepLearningDTU/02456-deep-learning). Original authors: [skaae](https://github.com/skaae), [casperkaae](https://github.com/casperkaae) and [larsmaaloee](https://github.com/larsmaaloee).
12 | 
13 | Thanks to professor [Ole Winther](http://cogsys.imm.dtu.dk/staff/winther/) for supervision and for sponsoring the labs.
14 | 
15 | ## Setup and Installation
16 | 
17 | Guides for downloading and installing TensorFlow on Linux, OSX and Windows using Docker can be found [here](https://github.com/alrojo/tensorflow-tutorial/tree/master/download-and-setup).
18 | 
19 | ## Material
20 | 
21 | The material consists of 5 labs.
22 | 
23 | ### [Lab1 - FFN](https://github.com/alrojo/tensorflow-tutorial/tree/master/lab1_FFN)
24 | 
25 | Logistic regression and a feed-forward neural network (FFN) on the (in)famous MNIST!
26 | 
27 | Optional reading material from [Michael Nielsen](http://neuralnetworksanddeeplearning.com/), chapters 1-4 (do 3-5 of the optional exercises).
28 | 
29 | ### [Lab2 - CNN](https://github.com/alrojo/tensorflow-tutorial/tree/master/lab2_CNN)
30 | 
31 | Convolutional Neural Network (CNN) and Spatial Transformer on images.
32 | 
33 | Optional reading material from [Michael Nielsen](http://neuralnetworksanddeeplearning.com/), chapter 6 (stop when you reach the section called "Other approaches to deep neural nets").
34 | 
35 | ### [Lab3 - RNN](https://github.com/alrojo/tensorflow-tutorial/tree/master/lab3_RNN)
36 | 
37 | Recurrent Neural Network (RNN) for translation, using an encoder-decoder model and an encoder-decoder with attention.
38 | 
39 | Optional reading material from [Alex Graves](https://www.cs.toronto.edu/~graves/preprint.pdf), chapters 3.1, 3.2 and 4.
40 | 
41 | ### [Lab4 - Kaggle](https://github.com/alrojo/tensorflow-tutorial/tree/master/lab4_Kaggle)
42 | 
43 | Compete in the Kaggle competition [Leaf Classification](https://www.kaggle.com/c/leaf-classification) using FFN, CNN and RNN.
44 | 
45 | ### [Lab5 - AE](https://github.com/alrojo/tensorflow-tutorial/tree/master/lab5_AE)
46 | 
47 | Unsupervised learning with an autoencoder (AE), reconstructing MNIST digits from only two latent variables.
48 | 
49 | Optional reading material from [deeplearningbook.org](http://www.deeplearningbook.org/contents/autoencoders.html), chapter 14.
50 | 
--------------------------------------------------------------------------------
/download-and-setup/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/download-and-setup/.gitignore
--------------------------------------------------------------------------------
/download-and-setup/README.md:
--------------------------------------------------------------------------------
1 | # Download and Setup
2 | 
3 | This tutorial will guide you through installing TensorFlow on Linux, OSX and Windows.
4 | 
5 | # Docker
6 | 
7 | In this tutorial we will use [docker](https://www.docker.com/) containers to handle dependencies and run our code.
8 | Docker allows us to run our code in an encapsulated container.
9 | The language of choice will be Python 2.
10 | 
11 | ## 1. Installation of docker (all operating systems)
12 | 
13 | Instructions for installing docker can be found [here](https://docs.docker.com/engine/installation/#installation); the instructions contain guides for most operating systems.
14 | 
15 | ## 2. Using dockerhub
16 | 
17 | After installing docker you are ready to go! The docker image that you will use for this tutorial is an extension of TensorFlow's own nightly-build docker image (with sklearn, wget, scikit-image etc.).
18 | 
19 | Getting access to docker images on dockerhub (`hub.docker.com`) is easy! When choosing your docker image, just type the dockerhub username followed by the project. In our case the username will be `alrojo` and the repository `tf-sklearn-cpu`. I encourage you to learn the fundamentals of docker; in the [project folder](https://hub.docker.com/r/alrojo/docker-whale/) (on docker hub) I have supplied the `Dockerfile` commands from which the image was created.
20 | 
21 | To run the docker image, type
22 | 
23 | >docker run -it alrojo/tf-sklearn-cpu
24 | 
25 | This starts up a docker container from the `alrojo/tf-sklearn-cpu` image.
26 | The `-it` flag is required for an interactive experience with the docker bash environment.
27 | To exit the interactive environment of the docker container, type
28 | 
29 | >exit
30 | 
31 | (Don't worry! We need to rerun it with some other flags in just a moment.)
32 | 
33 | ## 3. Forwarding port
34 | 
35 | As the docker system runs independently of your host system, we need to enable port forwarding (for jupyter notebook) and sharing of directories.
36 | 
37 | First, make sure that you have downloaded this repository. If not, you can go to `github.com/alrojo/tensorflow_tutorial`, click `Clone or download`, download it as a zip and extract it to your desired folder.
38 | Alternatively, you can run the command
39 | 
40 | >git clone https://github.com/alrojo/tensorflow_tutorial.git
41 | 
42 | In the following, `$PATH\_TO\_FOLDER` should be replaced by the path to your desired folder; an example of a path could be `~/deep\_learning\_courses`.
43 | The name of the repository will be denoted as tensorflow_tutorial.
44 | Given these names, run the following line in your shell.
45 | 
46 | NOTE: windows users might not have a docker-friendly windows-style path; type `pwd` in your docker command window to find your docker-friendly path.
47 | 
48 | >docker run -p 8888:8888 -v $PATH\_TO\_FOLDER/tensorflow_tutorial:/mnt/myproject -it alrojo/tf-sklearn-cpu
49 | 
50 | So if you are using `~/deep\_learning\_courses` as your `$PATH\_TO\_FOLDER`, the command will look like this
51 | 
52 | >docker run -p 8888:8888 -v ~/deep\_learning\_courses/tensorflow_tutorial:/mnt/myproject -it alrojo/tf-sklearn-cpu
53 | 
54 | where `-it` is required for an interactive experience with the docker bash environment, `-p` is for port forwarding and `-v` is for mounting your given folder to the docker container.
55 | 
56 | This should leave you in the root directory of your docker container, with the port forwarded and the directory shared. Run the command
57 | 
58 | >./run\_jupyter.sh
59 | 
60 | Your volume should be available through the `/mnt` folder.
61 | 
62 | Open a new tab in your browser and type localhost:8888 in the browser address bar. Note that you cannot have any other notebooks running simultaneously.
63 | 
64 | NOTE: when using docker toolbox on windows, the port will probably not bind to localhost; instead you must find the IP it binds to by typing the following in your docker prompt
65 | 
66 | >docker-machine ip
67 | 
68 | This should give you an IP that you can use in place of localhost.
69 | 
70 | From within the notebook, click on `/mnt`, then on `myproject`, and now you can start the exercises!
71 | 
72 | ## Installation of nvidia-docker for GPU
73 | 
74 | Use the following [guide](http://cs224d.stanford.edu/) for AWS setup.
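
For a local Nvidia GPU, a rough sketch of the equivalent workflow would be the following (both assumptions, not part of this repository: you have installed [nvidia-docker](https://github.com/NVIDIA/nvidia-docker), and you substitute TensorFlow's official GPU image, since `alrojo/tf-sklearn-cpu` is CPU-only)

>nvidia-docker run -p 8888:8888 -v $PATH\_TO\_FOLDER/tensorflow_tutorial:/mnt/myproject -it tensorflow/tensorflow:latest-gpu

The flags mean the same as in the CPU command above; only the launcher and the image change.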
75 | 
--------------------------------------------------------------------------------
/lab1_FFN/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab1_FFN/.gitignore
--------------------------------------------------------------------------------
/lab1_FFN/confusionmatrix.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | 
3 | 
4 | class ConfusionMatrix:
5 |     """
6 |     Simple confusion matrix class
7 |     row is the true class, column is the predicted class
8 |     """
9 |     def __init__(self, num_classes, class_names=None):
10 |         self.n_classes = num_classes
11 |         if class_names is None:
12 |             self.class_names = map(str, range(num_classes))
13 |         else:
14 |             self.class_names = class_names
15 | 
16 |         # find max class_name and pad
17 |         max_len = max(map(len, self.class_names))
18 |         self.max_len = max_len
19 |         for idx, name in enumerate(self.class_names):
20 |             if len(name) < max_len:  # pad each name to the longest name's length
21 |                 self.class_names[idx] = name + " "*(max_len-len(name))
22 | 
23 |         self.mat = np.zeros((num_classes,num_classes),dtype='int')
24 | 
25 |     def __str__(self):
26 |         # calculate row and column sums
27 |         col_sum = np.sum(self.mat, axis=1)
28 |         row_sum = np.sum(self.mat, axis=0)
29 | 
30 |         s = []
31 | 
32 |         mat_str = self.mat.__str__()
33 |         mat_str = mat_str.replace('[','').replace(']','').split('\n')
34 | 
35 |         for idx, row in enumerate(mat_str):
36 |             if idx == 0:
37 |                 pad = " "
38 |             else:
39 |                 pad = ""
40 |             class_name = self.class_names[idx]
41 |             class_name = " " + class_name + " |"
42 |             row_str = class_name + pad + row
43 |             row_str += " |" + str(col_sum[idx])
44 |             s.append(row_str)
45 | 
46 |         row_sum = [(self.max_len+4)*" "+" ".join(map(str, row_sum))]
47 |         hline = [(1+self.max_len)*" "+"-"*len(row_sum[0])]
48 | 
49 |         s = hline + s + hline + row_sum
50 | 
51 |         # add linebreaks
52 |         s_out = [line+'\n' for line in s]
53 |         return "".join(s_out)
54 | 
55 |     def batch_add(self, targets, preds):
56 |         assert targets.shape == preds.shape
57 |         assert len(targets) == len(preds)
58 |         assert max(targets) < self.n_classes
59 |         assert max(preds) < self.n_classes
60 |         targets = targets.flatten()
61 |         preds = preds.flatten()
62 |         for i in range(len(targets)):
63 |             self.mat[targets[i], preds[i]] += 1
64 | 
65 |     def get_errors(self):
66 |         tp = np.asarray(np.diag(self.mat).flatten(),dtype='float')
67 |         fn = np.asarray(np.sum(self.mat, axis=1).flatten(),dtype='float') - tp
68 |         fp = np.asarray(np.sum(self.mat, axis=0).flatten(),dtype='float') - tp
69 |         tn = np.asarray(np.sum(self.mat)*np.ones(self.n_classes).flatten(),
70 |                         dtype='float') - tp - fn - fp
71 |         return tp, fn, fp, tn
72 | 
73 |     def accuracy(self):
74 |         """
75 |         Calculates global accuracy
76 |         :return: accuracy
77 |         :example: >>> conf = ConfusionMatrix(3)
78 |                   >>> conf.batch_add([0,0,1],[0,0,2])
79 |                   >>> print conf.accuracy()
80 |         """
81 |         tp, _, _, _ = self.get_errors()
82 |         n_samples = np.sum(self.mat)
83 |         return np.sum(tp) / n_samples
84 | 
85 |     def sensitivity(self):
86 |         tp, fn, fp, tn = self.get_errors()  # get_errors returns (tp, fn, fp, tn)
87 |         res = tp / (tp + fn)
88 |         res = res[~np.isnan(res)]
89 |         return res
90 | 
91 |     def specificity(self):
92 |         tp, fn, fp, tn = self.get_errors()
93 |         res = tn / (tn + fp)
94 |         res = res[~np.isnan(res)]
95 |         return res
96 | 
97 |     def positive_predictive_value(self):
98 |         tp, fn, fp, tn = self.get_errors()
99 |         res = tp / (tp + fp)
100 |         res = res[~np.isnan(res)]
101 |         return res
102 | 
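    # Example usage (an illustrative sketch, not part of the original file):
    # build a 3-class matrix, add a batch of (targets, predictions), query metrics.
    #
    #   >>> cm = ConfusionMatrix(3)
    #   >>> cm.batch_add(np.array([0, 1, 2, 2]), np.array([0, 1, 1, 2]))
    #   >>> print cm.accuracy()
    #   0.75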
103 |     def negative_predictive_value(self):
104 |         tp, fn, fp, tn = self.get_errors()
105 |         res = tn / (tn + fn)
106 |         res = res[~np.isnan(res)]
107 |         return res
108 | 
109 |     def false_positive_rate(self):
110 |         tp, fn, fp, tn = self.get_errors()
111 |         res = fp / (fp + tn)
112 |         res = res[~np.isnan(res)]
113 |         return res
114 | 
115 |     def false_discovery_rate(self):
116 |         tp, fn, fp, tn = self.get_errors()
117 |         res = fp / (tp + fp)
118 |         res = res[~np.isnan(res)]
119 |         return res
120 | 
121 |     def F1(self):
122 |         tp, fn, fp, tn = self.get_errors()
123 |         res = (2*tp) / (2*tp + fp + fn)
124 |         res = res[~np.isnan(res)]
125 |         return res
126 | 
127 |     def matthews_correlation(self):
128 |         tp, fn, fp, tn = self.get_errors()
129 |         numerator = tp*tn - fp*fn
130 |         denominator = np.sqrt((tp + fp)*(tp + fn)*(tn + fp)*(tn + fn))
131 |         res = numerator / denominator
132 |         res = res[~np.isnan(res)]
133 |         return res
134 | 
--------------------------------------------------------------------------------
/lab1_FFN/lab1_FFN.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "# Credits\n",
8 |     "TensorFlow translation of [Lasagne tutorial](https://github.com/DeepLearningDTU/nvidia_deep_learning_summercamp_2016/blob/master/lab1/lab1_FFN.ipynb). Thanks to [skaae](https://github.com/skaae), [casperkaae](https://github.com/casperkaae) and [larsmaaloee](https://github.com/larsmaaloee)."
9 |    ]
10 |   },
11 |   {
12 |    "cell_type": "markdown",
13 |    "metadata": {},
14 |    "source": [
15 |     "# Dependencies and supporting functions\n",
16 |     "Load the dependencies and supporting functions by running the code block below."
17 |    ]
18 |   },
19 |   {
20 |    "cell_type": "code",
21 |    "execution_count": null,
22 |    "metadata": {
23 |     "collapsed": false
24 |    },
25 |    "outputs": [],
26 |    "source": [
27 |     "%matplotlib inline\n",
28 |     "import matplotlib\n",
29 |     "import numpy as np\n",
30 |     "import matplotlib.pyplot as plt\n",
31 |     "import sklearn.datasets\n",
32 |     "import tensorflow as tf\n",
33 |     "from tensorflow.python.framework.ops import reset_default_graph\n",
34 |     "\n",
35 |     "# Do not worry about the code below for now, it is used for plotting later\n",
36 |     "def plot_decision_boundary(pred_func, X, y):\n",
37 |     "    #from https://github.com/dennybritz/nn-from-scratch/blob/master/nn-from-scratch.ipynb\n",
38 |     "    # Set min and max values and give it some padding\n",
39 |     "    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5\n",
40 |     "    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5\n",
41 |     "    \n",
42 |     "    h = 0.01\n",
43 |     "    # Generate a grid of points with distance h between them\n",
44 |     "    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
45 |     "    \n",
46 |     "    yy = yy.astype('float32')\n",
47 |     "    xx = xx.astype('float32')\n",
48 |     "    # Predict the function value for the whole grid\n",
49 |     "    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])[:,0]\n",
50 |     "    Z = Z.reshape(xx.shape)\n",
51 |     "    # Plot the contour and training examples\n",
52 |     "    plt.figure()\n",
53 |     "    plt.contourf(xx, yy, Z, cmap=plt.cm.RdBu)\n",
54 |     "    plt.scatter(X[:, 0], X[:, 1], c=-y, cmap=plt.cm.Spectral)\n",
55 |     "\n",
56 |     "def onehot(t, num_classes):\n",
57 |     "    out = np.zeros((t.shape[0], num_classes))\n",
58 |     "    for row, col in enumerate(t):\n",
59 |     "        out[row, col] = 1\n",
60 |     "    return out"
61 |    ]
62 |   },
63 |   {
64 |    "cell_type": "markdown",
65 |    "metadata": {},
66 |    "source": [
67 |     "# Neural networks 101\n",
"In this notebook you will implement a simple neural network in TensorFlow utilizing the graph building and automatic differentiation engine of TensorFlow. We assume that you are already familiar with backpropagation (if not please see [Andrej Karpathy](http://cs.stanford.edu/people/karpathy/) or [Michal Nielsen](http://neuralnetworksanddeeplearning.com/chap2.html).\n", 69 | "We'll not spend much time on how TensorFlow works, but you can refer to [this short tutorial](https://www.tensorflow.org/versions/r0.10/get_started/basic_usage.html) if you are interested, or [the python documentation](https://www.tensorflow.org/versions/r0.10/api_docs/index.html).\n", 70 | "\n", 71 | "(Additionally, for the ambitious people we have previously made an assignment where you will implement both the forward and backpropagation in a neural network by hand, https://github.com/DTU-deeplearning/day1-NN/blob/master/exercises_1.ipynb)(Ole, skal jeg også implementere det?)\n", 72 | "\n", 73 | "In this exercise we'll start right away by defining logistic regression model in TensorFlow. Some details of TensorFlow can be a bit confusing, however you'll pick them up when you worked with it for some time. We'll initially start with a simple 2-D and 2-class classification problem where the class decision boundary can be visualized. Initially we show that logistic regression can only separate classes linearly. Adding a Non-linear hidden layer to the algorithm permits nonlinear class separation. If time permits we'll continue on to implement a fully conencted neural network to classify the (in)famous MNIST dataset consisting of images of hand written digits." 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": { 79 | "collapsed": true 80 | }, 81 | "source": [ 82 | "# Problem \n", 83 | "We'll initally demonstrate the that MLPs can classify non-linear problems whereas simple logistic regression cannot. For ease of visualization and computationl speed we initially experiment on the simple 2D half-moon dataset." 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": { 90 | "collapsed": false 91 | }, 92 | "outputs": [], 93 | "source": [ 94 | "# Generate a dataset and plot it\n", 95 | "np.random.seed(0)\n", 96 | "num_samples = 300\n", 97 | "\n", 98 | "X, y = sklearn.datasets.make_moons(num_samples, noise=0.20)\n", 99 | "\n", 100 | "X_tr = X[:100].astype('float32')\n", 101 | "X_val = X[100:200].astype('float32')\n", 102 | "X_te = X[200:].astype('float32')\n", 103 | "\n", 104 | "y_tr = y[:100].astype('int32')\n", 105 | "y_val = y[100:200].astype('int32')\n", 106 | "y_te = y[200:].astype('int32')\n", 107 | "\n", 108 | "plt.scatter(X_tr[:,0], X_tr[:,1], s=40, c=y_tr, cmap=plt.cm.BuGn)\n", 109 | "\n", 110 | "print X.shape, y.shape\n", 111 | "\n", 112 | "num_features = X_tr.shape[-1]\n", 113 | "num_output = 2" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "# From Logistic Regression to \"Deep Learning\" in Lasagne\n", 121 | "The code implements logistic regression in TensorFlow. In section __Assignments Half Moon__ you are asked to modify the code into a neural network.\n", 122 | "\n", 123 | "The building blocks of TensorFlow are variables and operations, with these we can form computational graphs that form neural networks.\n", 124 | "\n", 125 | "The [tf.placeholder](https://www.tensorflow.org/versions/r0.10/api_docs/python/io_ops.html#placeholder) allows us to feed our input data to the computational graph. 
125 |     "The [tf.placeholder](https://www.tensorflow.org/versions/r0.10/api_docs/python/io_ops.html#placeholder) allows us to feed our input data into the computational graph. We can constrain the shape of a placeholder so that it only takes a tensor of that shape. Note that it is common to provide ``None`` for the first dimension, which allows us to vary the batch size at runtime.\n",
126 |     "\n",
127 |     "The [tf.Variable](https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#Variable) allows us to store and update Tensors in our graph. Variables are used to build the weights of our neural network. Note that we will use a wrapper called [`tf.get_variable`](https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#get_variable) throughout this tutorial.\n",
128 |     "\n",
129 |     "The [tf.Operation](https://www.tensorflow.org/versions/r0.10/api_docs/python/framework.html#Operation) allows us to perform operations on tensors, resulting in new tensors, such as when computing the logistic regression implemented below:\n",
130 |     "\n",
131 |     "$$y = nonlinearity(xW + b)$$\n",
132 |     "\n",
133 |     "where $x$ is the input tensor, $y$ is the output tensor and $\\{W, b\\}$ are the weights (variable tensors). The weights are initialized with an initializer of our choice (check [tensorflow's API](https://www.tensorflow.org/versions/r0.10/api_docs/index.html) for more).\n",
134 |     "```x``` has shape ```[batch_size, num_features]```, ```W``` has shape ```[num_features, num_units]```, ```b``` has shape ```[num_units]```, and ```y``` then has shape ```[batch_size, num_units]```. For example, with a batch of 32 half-moon points we have ```num_features = 2``` and ```num_output = 2```, so ```x``` is ```[32, 2]```, ```W``` is ```[2, 2]```, ```b``` is ```[2]``` and ```y``` is ```[32, 2]```.\n",
135 |     "\n",
136 |     "NOTE: to make building neural networks easier, TensorFlow's [contrib](https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#layers-contrib) wraps TensorFlow functionality to support various operations, such as [convolutions](https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#convolution2d), [batch_norm](https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#batch_norm) and [fully_connected](https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#fully_connected).\n",
137 |     "\n",
138 |     "In this first exercise we will use basic TensorFlow functions so that you can learn how to build networks from scratch. This will help you later if you want to build your own custom operations."
139 |    ]
140 |   },
141 |   {
142 |    "cell_type": "markdown",
143 |    "metadata": {},
144 |    "source": [
145 |     "## TensorFlow Playground\n",
146 |     "\n",
147 |     "If you are new to Neural Networks, start by using the [TensorFlow playground](http://playground.tensorflow.org/) to familiarize yourself with hidden layers, hidden units, activations, learning rate, etc."
148 |    ]
149 |   },
150 |   {
151 |    "cell_type": "code",
152 |    "execution_count": null,
153 |    "metadata": {
154 |     "collapsed": false
155 |    },
156 |    "outputs": [],
157 |    "source": [
158 |     "# resets the graph, needed when initializing weights multiple times, like in this notebook\n",
159 |     "reset_default_graph()\n",
160 |     "\n",
161 |     "# Setting up placeholder, this is where your data enters the graph!\n",
162 |     "x_pl = tf.placeholder(tf.float32, [None, num_features])\n",
163 |     "\n",
164 |     "# Setting up variables, these variables are weights in your network that can be updated while running our graph.\n",
165 |     "# Notice, to make a hidden layer, the weights need to have the following dimensionality\n",
166 |     "# W[number_of_units_going_in, number_of_units_going_out]\n",
167 |     "# b[number_of_units_going_out]\n",
168 |     "# in the example below we have 2 input units (num_features) and 2 output units (num_output)\n",
169 |     "# so our weights become W[2, 2], b[2]\n",
170 |     "# if we want to make a hidden layer with 100 units, we need to define the shape of the\n",
171 |     "# first weight to W[2, 100], b[100] and the shape of the second weight to W[100, 2], b[2]\n",
172 |     "\n",
173 |     "# defining our initializer for our weights from a normal distribution (mean=0, std=0.1)\n",
174 |     "weight_initializer = tf.truncated_normal_initializer(stddev=0.1)\n",
175 |     "with tf.variable_scope('l_1'): # if you run it more than once, reuse has to be True\n",
176 |     "    W_1 = tf.get_variable('W', [num_features, num_output], # change num_output to 100 for mlp\n",
177 |     "                          initializer=weight_initializer)\n",
178 |     "    b_1 = tf.get_variable('b', [num_output], # change num_output to 100 for mlp\n",
179 |     "                          initializer=tf.constant_initializer(0.0))\n",
180 |     "# with tf.variable_scope('l_2'):\n",
181 |     "#     W_2 = tf.get_variable('W', [100, num_output],\n",
182 |     "#                           initializer=weight_initializer)\n",
183 |     "#     b_2 = tf.get_variable('b', [num_output],\n",
184 |     "#                           initializer=tf.constant_initializer(0.0))\n",
185 |     "\n",
186 |     "# Setting up ops, these ops will define edges along our computational graph\n",
187 |     "# The below ops will compute a logistic regression, but can be modified to compute\n",
188 |     "# a neural network\n",
189 |     "\n",
190 |     "l_1 = tf.matmul(x_pl, W_1) + b_1\n",
191 |     "# to make a hidden layer we need a nonlinearity\n",
192 |     "# l_1_nonlinear = tf.nn.relu(l_1)\n",
193 |     "# the layer before the softmax should not have a nonlinearity\n",
194 |     "# l_2 = tf.matmul(l_1_nonlinear, W_2) + b_2\n",
195 |     "y = tf.nn.softmax(l_1) # change to l_2 for MLP"
196 |    ]
197 |   },
198 |   {
199 |    "cell_type": "code",
200 |    "execution_count": null,
201 |    "metadata": {
202 |     "collapsed": false
203 |    },
204 |    "outputs": [],
205 |    "source": [
206 |     "# knowing how to print your tensors and ops is useful, here are some examples\n",
207 |     "print(\"---placeholders---\")\n",
208 |     "print(x_pl.name)\n",
209 |     "print(x_pl)\n",
210 |     "print\n",
211 |     "print(\"---weights---\")\n",
212 |     "print(W_1.name)\n",
213 |     "print(W_1.get_shape())\n",
214 |     "print(W_1)\n",
215 |     "print\n",
216 |     "print(b_1.name)\n",
217 |     "print(b_1)\n",
218 |     "print(b_1.get_shape())\n",
219 |     "print\n",
220 |     "print(\"---ops---\")\n",
221 |     "print(l_1.name)\n",
222 |     "print(l_1)\n",
223 |     "print\n",
224 |     "print(y.name)\n",
225 |     "print(y)"
226 |    ]
227 |   },
228 |   {
229 |    "cell_type": "markdown",
230 |    "metadata": {},
231 |    "source": [
232 |     "After we have built the network, we have our tensors in our default [graph](https://www.tensorflow.org/versions/r0.10/api_docs/python/framework.html#Graph), which we can use to build the cost function and training part.\n",
233 |     "\n",
234 |     "Further, using the default graph we can print its operations and variables."
235 |    ]
236 |   },
237 |   {
238 |    "cell_type": "code",
239 |    "execution_count": null,
240 |    "metadata": {
241 |     "collapsed": false
242 |    },
243 |    "outputs": [],
244 |    "source": [
245 |     "# y_ is a placeholder variable taking on the value of the target batch.\n",
246 |     "y_ = tf.placeholder(tf.float32, [None, num_output])\n",
247 |     "\n",
248 |     "# computing cross entropy per sample\n",
249 |     "cross_entropy = -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])\n",
250 |     "\n",
251 |     "# averaging over samples\n",
252 |     "cross_entropy = tf.reduce_mean(cross_entropy)"
253 |    ]
254 |   },
255 |   {
256 |    "cell_type": "markdown",
257 |    "metadata": {},
258 |    "source": [
259 |     "Notice that the weights and operations defined in the `l_1` variable scope are saved under the `l_1` name prefix in the graph."
260 |    ]
261 |   },
262 |   {
263 |    "cell_type": "code",
264 |    "execution_count": null,
265 |    "metadata": {
266 |     "collapsed": false
267 |    },
268 |    "outputs": [],
269 |    "source": [
270 |     "# using the graph to print ops\n",
271 |     "print(\"operations\")\n",
272 |     "operations = [op.name for op in tf.get_default_graph().get_operations()]\n",
273 |     "print(operations)\n",
274 |     "print\n",
275 |     "# variables are accessed through tensorflow\n",
276 |     "print(\"variables\")\n",
277 |     "variables = [var.name for var in tf.all_variables()]\n",
278 |     "print(variables)"
279 |    ]
280 |   },
281 |   {
282 |    "cell_type": "markdown",
283 |    "metadata": {},
284 |    "source": [
285 |     "To train our neural network, we need to update the parameters in the direction of the negative gradient w.r.t. the cost function we defined earlier.\n",
286 |     "We can use `tf.train.Optimizer` to get the gradients (using `compute_gradients`) for all parameters in the network w.r.t. ``cross_entropy``.\n",
287 |     "Imagine that `cross_entropy` is a function and we want to go downhill. We go downhill by changing the value of the parameters in the direction of the negative gradient. \n",
288 |     "\n",
289 |     "Finally, we can use the built-in `minimize` to calculate the stochastic gradient descent (SGD) update rule for each parameter in the network.\n",
290 |     "\n",
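    "Concretely, for a parameter $\\theta$ with learning rate $\\eta$ and cost $J(\\theta)$, one SGD step computes\n",
    "\n",
    "$$\\theta \\leftarrow \\theta - \\eta \\nabla_{\\theta} J(\\theta)$$\n",
    "\n",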
291 |     "Here's a small animation of gradient descent: http://imgur.com/a/Hqolp, showing e.g. why saddle points might be difficult.\n",
292 |     "To use other optimizers, check out which optimizers TensorFlow [supports](https://www.tensorflow.org/versions/r0.10/api_docs/python/train.html#optimizers)."
293 |    ]
294 |   },
295 |   {
296 |    "cell_type": "code",
297 |    "execution_count": null,
298 |    "metadata": {
299 |     "collapsed": false
300 |    },
301 |    "outputs": [],
302 |    "source": [
303 |     "# Defining our optimizer (try with different optimizers here!)\n",
304 |     "optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)\n",
305 |     "\n",
306 |     "# Computing our gradients\n",
307 |     "grads_and_vars = optimizer.compute_gradients(cross_entropy)\n",
308 |     "\n",
309 |     "# Applying the gradients\n",
310 |     "train_op = optimizer.apply_gradients(grads_and_vars)\n",
311 |     "\n",
312 |     "# Notice, alternatively you can use train_op = optimizer.minimize(cross_entropy)\n",
313 |     "# instead of the two steps above"
314 |    ]
315 |   },
316 |   {
317 |    "cell_type": "markdown",
318 |    "metadata": {},
319 |    "source": [
320 |     "Next, we make the prediction functions, such that we can get an accuracy measure over a batch."
321 |    ]
322 |   },
323 |   {
324 |    "cell_type": "code",
325 |    "execution_count": null,
326 |    "metadata": {
327 |     "collapsed": true
328 |    },
329 |    "outputs": [],
330 |    "source": [
331 |     "# making a binary vector of correct (1) and incorrect (0) predictions\n",
332 |     "correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))\n",
333 |     "\n",
334 |     "# averaging the binary vector gives the accuracy\n",
335 |     "accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))"
336 |    ]
337 |   },
338 |   {
339 |    "cell_type": "markdown",
340 |    "metadata": {},
341 |    "source": [
342 |     "The next step is to utilize our `train_op` function repeatedly in order to optimize our weights `W_1` and `b_1` to make the best possible linear separation of the half-moon dataset.\n",
343 |     "\n",
344 |     "It is worth reading a short introduction to TensorFlow [sessions](https://www.tensorflow.org/versions/r0.10/api_docs/python/client.html#Session) before continuing to the next code block. Sessions are used to run TensorFlow graphs; they use `fetches` to decide which parts of the graph to compute and `feed_dicts` to load data into the graph."
345 |    ]
346 |   },
347 |   {
348 |    "cell_type": "code",
349 |    "execution_count": null,
350 |    "metadata": {
351 |     "collapsed": false
352 |    },
353 |    "outputs": [],
354 |    "source": [
355 |     "# defining a function to make predictions using our classifier\n",
356 |     "def pred(X_in, sess):\n",
357 |     "    # first we must define what data to give it\n",
358 |     "    feed_dict = {x_pl: X_in}\n",
359 |     "    # secondly our fetches\n",
360 |     "    fetches = [y]\n",
361 |     "    # utilizing the given session (ref. sess) to compute results\n",
362 |     "    res = sess.run(fetches, feed_dict)\n",
363 |     "    # res is a list, with each index corresponding to the matching element in fetches\n",
364 |     "    return res[0]\n",
365 |     "\n",
366 |     "# Training loop\n",
367 |     "num_epochs = 1000\n",
368 |     "\n",
369 |     "train_cost, val_cost, val_acc = [],[],[]\n",
370 |     "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n",
371 |     "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n",
372 |     "with tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts)) as sess:\n",
373 |     "    \n",
374 |     "    # initializing all variables\n",
375 |     "    init = tf.initialize_all_variables()\n",
376 |     "    sess.run(init)\n",
377 |     "    plot_decision_boundary(lambda x: pred(x, sess), X_val, y_val)\n",
378 |     "    plt.title(\"Untrained Classifier\")\n",
379 |     "    for e in range(num_epochs):\n",
380 |     "        ### TRAINING ###\n",
381 |     "        # what to feed to our train_op\n",
382 |     "        # notice we onehot encode our targets to change shape from (batch,) -> (batch, num_output)\n",
383 |     "        feed_dict_train = {x_pl: X_tr, y_: onehot(y_tr, num_output)}\n",
384 |     "        \n",
385 |     "        # deciding which parts to fetch, train_op makes the classifier \"train\"\n",
386 |     "        fetches_train = [train_op, cross_entropy]\n",
387 |     "        \n",
388 |     "        # running the train_op\n",
389 |     "        res = sess.run(fetches=fetches_train, feed_dict=feed_dict_train)\n",
390 |     "        # storing cross entropy (second fetch argument, so index=1)\n",
391 |     "        train_cost += [res[1]]\n",
392 |     "        \n",
393 |     "        ### VALIDATING ###\n",
394 |     "        # what to feed our accuracy op\n",
395 |     "        feed_dict_valid = {x_pl: X_val, y_: onehot(y_val, num_output)}\n",
396 |     "\n",
397 |     "        # deciding which parts to fetch\n",
398 |     "        fetches_valid = [cross_entropy, accuracy]\n",
399 |     "\n",
400 |     "        # running the validation\n",
401 |     "        res = sess.run(fetches=fetches_valid, feed_dict=feed_dict_valid)\n",
402 |     "        val_cost += [res[0]]\n",
403 |     "        val_acc += [res[1]]\n",
404 |     "\n",
405 |     "        if e % 100 == 0:\n",
406 |     "            print \"Epoch %i, Train Cost: %0.3f\\tVal Cost: %0.3f\\t Val acc: %0.3f\"%(e, train_cost[-1],val_cost[-1],val_acc[-1])\n",
407 |     "\n",
408 |     "    ### TESTING ###\n",
409 |     "    # what to feed our accuracy op\n",
410 |     "    feed_dict_test = {x_pl: X_te, y_: onehot(y_te, num_output)}\n",
411 |     "\n",
412 |     "    # deciding which parts to fetch\n",
413 |     "    fetches_test = [cross_entropy, accuracy]\n",
414 |     "\n",
415 |     "    # running the test\n",
416 |     "    res = sess.run(fetches=fetches_test, feed_dict=feed_dict_test)\n",
417 |     "    test_cost = res[0]\n",
418 |     "    test_acc = res[1]\n",
419 |     "    print \"\\nTest Cost: %0.3f\\tTest Accuracy: %0.3f\"%(test_cost, test_acc)\n",
420 |     "    \n",
421 |     "    # For plotting purposes\n",
422 |     "    plot_decision_boundary(lambda x: pred(x, sess), X_te, y_te)\n",
423 |     "\n",
424 |     "# notice: we do not need to use the session environment anymore, so returning from it.\n",
425 |     "plt.title(\"Trained Classifier\")\n",
426 |     "\n",
427 |     "epoch = np.arange(len(train_cost))\n",
428 |     "plt.figure()\n",
429 |     "plt.plot(epoch,train_cost,'r',epoch,val_cost,'b')\n",
430 |     "plt.legend(['Train Loss','Val Loss'])\n",
431 |     "plt.xlabel('Updates'), plt.ylabel('Loss')"
432 |    ]
433 |   },
434 |   {
435 |    "cell_type": "markdown",
436 |    "metadata": {},
437 |    "source": [
438 |     "# Assignments Half Moon\n",
439 |     "\n",
440 |     "    1) A linear logistic classifier is only able to create a linear decision boundary. Change the logistic classifier into a (non-linear) neural network by inserting a dense hidden layer between the input and output layers of the model.\n",
441 |     "    \n",
442 |     "    2) Experiment with multiple hidden layers or more / fewer hidden units. What happens to the decision boundary?\n",
443 |     "    \n",
444 |     "    3) Overfitting: When increasing the number of hidden layers / units, the neural network will fit the training data better by creating a highly nonlinear decision boundary. If the model is too complex, it will often generalize poorly to new data (validation and test sets). Can you observe this from the training and validation errors? \n",
445 |     "    \n",
446 |     "    4) We used the vanilla stochastic gradient descent algorithm for parameter updates. This is usually slow to converge, and more sophisticated pseudo-second-order methods usually work better. Try changing the optimizer to [adam or momentum](https://www.tensorflow.org/versions/r0.10/api_docs/python/train.html#AdamOptimizer)."
447 |    ]
448 |   },
449 |   {
450 |    "cell_type": "markdown",
451 |    "metadata": {},
452 |    "source": [
453 |     "# Optional: MNIST dataset\n",
454 |     "MNIST is a dataset that is often used for benchmarking. The MNIST dataset consists of 70,000 images of handwritten digits from 0-9. The dataset is split into a 50,000-image training set, a 10,000-image validation set and a 10,000-image test set. The images are 28x28 pixels, where each pixel represents a grayscale value between 0 and 255 (0=black and 255=white).\n",
455 |     "\n",
456 |     "### Primer for the afternoon...\n",
457 |     "We use a feedforward neural network to classify the 28x28 MNIST images. ``num_features`` is therefore 28x28=784.\n",
458 |     "That is, we represent each image as a vector. The ordering of the pixels in the vector does not matter, so we could permute all images using the same permutation and still get the same performance. (You are of course encouraged to try this using ``numpy.random.permutation`` to get a random permutation :)). This task is therefore called the _permutation invariant_ MNIST. Obviously this throws away a lot of structure in the data. After lunch we'll fix this with the convolutional neural network, which encodes prior knowledge about data that has either spatial or temporal structure. \n",
459 |     "\n",
460 |     "### Ballpark estimates of hyperparameters\n",
461 |     "__Optimizers:__\n",
462 |     "    1. SGD + Momentum: learning rate 1.0 - 0.1 \n",
463 |     "    2. ADAM: learning rate 3*1e-4 - 1e-5\n",
464 |     "    3. RMSPROP: somewhere between SGD and ADAM\n",
465 |     "\n",
466 |     "__Regularization:__\n",
467 |     "    1. Dropout. Dropout rate 0.1-0.5\n",
468 |     "    2. L2 and L1 regularization - https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#regularizers.\n",
469 |     "       Not used that often in deep learning; typical scales are 1e-4 - 1e-8.\n",
470 |     "    3. Batchnorm: Batchnorm also acts as a regularizer - https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#batch_norm\n",
471 |     "       Often very useful (faster and better convergence)\n",
472 |     "    \n",
473 |     "    \n",
474 |     "__Parameter initialization__\n",
475 |     "    Parameter initialization is extremely important. TensorFlow has a lot of different initializers; check the TensorFlow API [documentation](https://www.tensorflow.org/versions/r0.10/api_docs/index.html). Often used initializers are\n",
476 |     "    1. He - (not available in TensorFlow's API)\n",
477 |     "    2. Glorot - https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#xavier_initializer\n",
478 |     "    3. Uniform or Normal with small scale (0.1 - 0.01) - https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#random_normal_initializer\n",
479 |     "    4. Orthogonal (I find that this works very well for RNNs) - (not available in TensorFlow's API)\n",
480 |     "\n",
481 |     "Bias is nearly always initialized to zero, using the [tf.constant_initializer](https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#constant_initializer).\n",
482 |     "\n",
483 |     "__Number of hidden units and network structure__\n",
484 |     "    Probably as big a network as possible, and then apply regularization. You'll have to experiment :). One rarely goes below 512 units for feedforward networks, unless you are training on a CPU...\n",
485 |     "    There is some research into stochastic depth networks: https://arxiv.org/pdf/1603.09382v2.pdf, but in general this is trial and error.\n",
486 |     "\n",
487 |     "__Nonlinearity__: [The most commonly used nonlinearities are](https://www.tensorflow.org/versions/r0.10/api_docs/python/nn.html#activation-functions)\n",
488 |     "    \n",
489 |     "    1. ReLU\n",
490 |     "    2. Leaky ReLU. Same as ReLU, but with a small slope (0.1 in the plot below) for negative inputs\n",
491 |     "    3. Elu\n",
492 |     "    4. Sigmoid. Used if your output is binary; it is not used in the hidden layers. Squashes the output between 0 and 1\n",
493 |     "    5. Softmax. Used as the output if you have a classification problem. Normalizes the output to sum to 1.\n",
494 |     "\n",
495 |     "\n",
496 |     "See the plot below.\n",
497 |     "\n",
498 |     "__mini-batch size__\n",
499 |     "    Usually people use 16-256. Bigger is not always better. With a smaller mini-batch size you get more updates, and your model might converge faster. Also, small batch sizes use less memory -> you can train a model with more parameters.\n",
500 |     "\n",
501 |     "Hyperparameters can be found by experience (guessing) or by some search procedure. Random search is easy to implement and performs decently: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf .\n",
502 |     "More advanced search procedures include [SPEARMINT](https://github.com/JasperSnoek/spearmint) and many others.\n",
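    "\n",
    "Below is a minimal sketch of random search over two of the hyperparameters above; the helper ``train_and_validate`` is hypothetical and stands in for building, training and validating one model:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def random_search(num_trials=10):\n",
    "    best_config, best_acc = None, -np.inf\n",
    "    for _ in range(num_trials):\n",
    "        # sample the learning rate log-uniformly, the batch size from a small set\n",
    "        lr = 10**np.random.uniform(-5, -1)\n",
    "        batch_size = int(np.random.choice([16, 32, 64, 128, 256]))\n",
    "        val_acc = train_and_validate(lr, batch_size)  # hypothetical helper\n",
    "        if val_acc > best_acc:\n",
    "            best_config, best_acc = (lr, batch_size), val_acc\n",
    "    return best_config, best_acc\n",
    "```"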
503 |    ]
504 |   },
505 |   {
506 |    "cell_type": "code",
507 |    "execution_count": null,
508 |    "metadata": {
509 |     "collapsed": false
510 |    },
511 |    "outputs": [],
512 |    "source": [
513 |     "# PLOT OF DIFFERENT OUTPUT UNITS\n",
514 |     "x = np.linspace(-6, 6, 100)\n",
515 |     "relu = lambda x: np.maximum(0, x)\n",
516 |     "leaky_relu = lambda x: np.maximum(0, x) + 0.1*np.minimum(0, x) # probably a slow implementation....\n",
517 |     "elu = lambda x: (x > 0)*x + (1 - (x > 0))*(np.exp(x) - 1) \n",
518 |     "sigmoid = lambda x: (1+np.exp(-x))**(-1)\n",
519 |     "def softmax(w, t = 1.0):\n",
520 |     "    e = np.exp(w)\n",
521 |     "    dist = e / np.sum(e)\n",
522 |     "    return dist\n",
523 |     "x_softmax = softmax(x)\n",
524 |     "\n",
525 |     "plt.figure(figsize=(6,6))\n",
526 |     "plt.plot(x, relu(x), label='ReLU', lw=2)\n",
527 |     "plt.plot(x, leaky_relu(x), label='Leaky ReLU',lw=2)\n",
528 |     "plt.plot(x, elu(x), label='Elu', lw=2)\n",
529 |     "plt.plot(x, sigmoid(x), label='Sigmoid',lw=2)\n",
530 |     "plt.legend(loc=2, fontsize=16)\n",
531 |     "plt.title('Non-linearities', fontsize=20)\n",
532 |     "plt.ylim([-2, 5])\n",
533 |     "plt.xlim([-6, 6])\n",
534 |     "\n",
535 |     "# softmax\n",
536 |     "# assert that all class probabilities sum to one\n",
537 |     "print np.sum(x_softmax)\n",
538 |     "assert abs(1.0 - x_softmax.sum()) < 1e-8"
539 |    ]
540 |   },
541 |   {
542 |    "cell_type": "markdown",
543 |    "metadata": {
544 |     "collapsed": true
545 |    },
546 |    "source": [
547 |     "## MNIST\n",
548 |     "First let's load the MNIST dataset and plot a few examples:"
549 |    ]
550 |   },
551 |   {
552 |    "cell_type": "code",
553 |    "execution_count": null,
554 |    "metadata": {
555 |     "collapsed": false
556 |    },
557 |    "outputs": [],
558 |    "source": [
559 |     "# To speed up training we'll only work on a subset of the data\n",
560 |     "data = np.load('mnist.npz')\n",
561 |     "num_classes = 10\n",
562 |     "x_train = data['X_train'][:1000].astype('float32')\n",
563 |     "targets_train = data['y_train'][:1000].astype('int32')\n",
564 |     "\n",
565 |     "x_valid = data['X_valid'][:500].astype('float32')\n",
566 |     "targets_valid = data['y_valid'][:500].astype('int32')\n",
567 |     "\n",
568 |     "x_test = data['X_test'][:500].astype('float32')\n",
569 |     "targets_test = data['y_test'][:500].astype('int32')\n",
570 |     "\n",
571 |     "print \"Information on dataset\"\n",
572 |     "print \"x_train\", x_train.shape\n",
573 |     "print \"targets_train\", targets_train.shape\n",
574 |     "print \"x_valid\", x_valid.shape\n",
575 |     "print \"targets_valid\", targets_valid.shape\n",
576 |     "print \"x_test\", x_test.shape\n",
577 |     "print \"targets_test\", targets_test.shape"
578 |    ]
579 |   },
580 |   {
581 |    "cell_type": "code",
582 |    "execution_count": null,
583 |    "metadata": {
584 |     "collapsed": false,
585 |     "scrolled": true
586 |    },
587 |    "outputs": [],
588 |    "source": [
589 |     "# plot a few MNIST examples\n",
590 |     "idx = 0\n",
591 |     "canvas = np.zeros((28*10, 10*28))\n",
592 |     "for i in range(10):\n",
593 |     "    for j in range(10):\n",
594 |     "        canvas[i*28:(i+1)*28, j*28:(j+1)*28] = x_train[idx].reshape((28, 28))\n",
595 |     "        idx += 1\n",
596 |     "plt.figure(figsize=(7, 7))\n",
597 |     "plt.axis('off')\n",
598 |     "plt.imshow(canvas, cmap='gray')\n",
599 |     "plt.title('MNIST handwritten digits')\n",
600 |     "plt.show()"
601 |    ]
602 |   },
603 |   {
604 |    "cell_type": "code",
605 |    "execution_count": null,
606 |    "metadata": {
607 |     "collapsed": false
608 |    },
609 |    "outputs": [],
610 |    "source": [
611 |     "# Hyperparameters\n",
612 |     "\n",
613 |     "num_classes = 10\n",
614 |     "num_l1 = 512\n",
615 |     "num_features = x_train.shape[1]\n",
616 |     "\n",
617 |     "# resetting the graph ...\n",
618 |     "reset_default_graph()\n",
619 |     "\n",
620 |     "# Setting up placeholder, this is where your data enters the graph!\n",
621 |     "x_pl = tf.placeholder(tf.float32, [None, num_features])\n",
622 |     "\n",
623 |     "# defining our weight initializer\n",
624 |     "weight_initializer = tf.truncated_normal_initializer(stddev=0.1)\n",
625 |     "\n",
626 |     "# Setting up the trainable weights of the network\n",
627 |     "with tf.variable_scope('l_1'):\n",
628 |     "    W_1 = tf.get_variable('W', [num_features, num_l1],\n",
629 |     "                          initializer=weight_initializer)\n",
630 |     "    b_1 = tf.get_variable('b', [num_l1],\n",
631 |     "                          initializer=tf.constant_initializer(0.0))\n",
632 |     "\n",
633 |     "with tf.variable_scope('l_2'):\n",
634 |     "    W_2 = tf.get_variable('W', [num_l1, num_classes],\n",
635 |     "                          initializer=weight_initializer)\n",
636 |     "    b_2 = tf.get_variable('b', [num_classes],\n",
637 |     "                          initializer=tf.constant_initializer(0.0))\n",
638 |     "\n",
639 |     "\n",
640 |     "# Building the layers of the neural network\n",
641 |     "l1 = tf.matmul(x_pl, W_1) + b_1\n",
642 |     "l1_nonlinear = tf.nn.elu(l1) # you can try with various activation functions!\n",
643 |     "l2 = tf.matmul(l1_nonlinear, W_2) + b_2 # note: the hidden layer's nonlinearity must be used here\n",
644 |     "y = tf.nn.softmax(l2)"
645 |    ]
646 |   },
647 |   {
648 |    "cell_type": "code",
649 |    "execution_count": null,
650 |    "metadata": {
651 |     "collapsed": true
652 |    },
653 |    "outputs": [],
654 |    "source": [
655 |     "# y_ is a placeholder variable taking on the value of the target batch.\n",
656 |     "y_ = tf.placeholder(tf.float32, [None, num_classes])\n",
657 |     "\n",
658 |     "# computing cross entropy per sample\n",
659 |     "cross_entropy = -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])\n",
660 |     "\n",
661 |     "# averaging over samples\n",
662 |     "loss_tn = tf.reduce_mean(cross_entropy)\n",
663 |     "\n",
664 |     "# L2 regularization\n",
665 |     "#reg_scale = 0.0001\n",
666 |     "#regularize = tf.contrib.layers.l2_regularizer(reg_scale)\n",
667 |     "#params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)\n",
668 |     "#reg_term = sum([regularize(param) for param in params])\n",
669 |     "#loss_tn += reg_term\n",
670 |     "\n",
671 |     "# defining our optimizer\n",
672 |     "optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)\n",
673 |     "\n",
674 |     "# computing and applying the gradients in one step\n",
675 |     "train_op = optimizer.minimize(loss_tn)\n",
676 |     "\n",
677 |     "# notice, alternatively you can use compute_gradients and apply_gradients\n",
678 |     "# explicitly, as in the half-moon example above"
679 |    ]
680 |   },
681 |   {
682 |    "cell_type": "code",
683 |    "execution_count": null,
684 |    "metadata": {
685 |     "collapsed": false
686 |    },
687 |    "outputs": [],
688 |    "source": [
689 |     "# Test the forward pass\n",
690 |     "x = np.random.normal(0,1, (45, 28*28)).astype('float32') # dummy data\n",
691 |     "\n",
692 |     "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n",
693 |     "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n",
694 |     "# initialize the Session\n",
695 |     "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n",
696 |     "sess.run(tf.initialize_all_variables())\n",
697 |     "res = sess.run(fetches=[y], feed_dict={x_pl: x})\n",
698 |     "print \"y\", res[0].shape"
699 |    ]
700 |   },
701 |   {
702 |    "cell_type": "markdown",
703 |    "metadata": {},
704 |    "source": [
705 |     "# Build the training loop.\n",
706 |     "We train the network by calculating the gradient w.r.t. the cost function and updating the parameters in the direction of the negative gradient.\n",
\n", 707 | "\n", 708 | "\n", 709 | "When training neural network you always use mini batches. Instead of calculating the average gradient using the entire dataset you approximate the gradient using a mini-batch of typically 16 to 256 samples. The paramters are updated after each mini batch. Networks converges much faster using minibatches because the paramters are updated more often.\n", 710 | "\n", 711 | "We build a loop that iterates over the training data. Remember that the parameters are updated each time ``f_train`` is called." 712 | ] 713 | }, 714 | { 715 | "cell_type": "code", 716 | "execution_count": null, 717 | "metadata": { 718 | "collapsed": false 719 | }, 720 | "outputs": [], 721 | "source": [ 722 | "# using confusionmatrix to handle \n", 723 | "from confusionmatrix import ConfusionMatrix\n", 724 | "\n", 725 | "# setting hyperparameters and gettings epoch sizes\n", 726 | "batch_size = 100\n", 727 | "num_epochs = 100\n", 728 | "num_samples_train = x_train.shape[0]\n", 729 | "num_batches_train = num_samples_train // batch_size\n", 730 | "num_samples_valid = x_valid.shape[0]\n", 731 | "num_batches_valid = num_samples_valid // batch_size\n", 732 | "\n", 733 | "# setting up lists for handling loss/accuracy\n", 734 | "train_acc, train_loss = [], []\n", 735 | "valid_acc, valid_loss = [], []\n", 736 | "test_acc, test_loss = [], []\n", 737 | "cur_loss = 0\n", 738 | "loss = []\n", 739 | "## TRAINING ##\n", 740 | "for epoch in range(num_epochs):\n", 741 | " #Forward->Backprob->Update params\n", 742 | " cur_loss = 0\n", 743 | " for i in range(num_batches_train):\n", 744 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 745 | " x_batch = x_train[idx]\n", 746 | " target_batch = targets_train[idx]\n", 747 | " feed_dict_train = {x_pl: x_batch, y_: onehot(target_batch, num_classes)}\n", 748 | " fetches_train = [train_op, loss_tn]\n", 749 | " res = sess.run(fetches=fetches_train, feed_dict=feed_dict_train)\n", 750 | " batch_loss = res[1]\n", 751 | " cur_loss += batch_loss\n", 752 | " loss += [cur_loss/batch_size]\n", 753 | " \n", 754 | " confusion_valid = ConfusionMatrix(num_classes)\n", 755 | " confusion_train = ConfusionMatrix(num_classes)\n", 756 | "\n", 757 | " ### EVAL - TRAIN ###\n", 758 | " for i in range(num_batches_train):\n", 759 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 760 | " x_batch = x_train[idx]\n", 761 | " targets_batch = targets_train[idx]\n", 762 | " # what to feed our accuracy op\n", 763 | " feed_dict_eval_train = {x_pl: x_batch, y_: onehot(targets_batch, num_classes)}\n", 764 | " # deciding which parts to fetch\n", 765 | " fetches_eval_train = [y]\n", 766 | " # running the validation\n", 767 | " res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n", 768 | " # collecting and storing predictions\n", 769 | " net_out = res[0]\n", 770 | " preds = np.argmax(net_out, axis=-1)\n", 771 | " confusion_train.batch_add(targets_batch, preds)\n", 772 | "\n", 773 | " ### EVAL - VALIDATION ###\n", 774 | " confusion_valid = ConfusionMatrix(num_classes)\n", 775 | " for i in range(num_batches_valid):\n", 776 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 777 | " x_batch = x_valid[idx]\n", 778 | " targets_batch = targets_valid[idx]\n", 779 | " # what to feed our accuracy op\n", 780 | " feed_dict_eval_train = {x_pl: x_batch, y_: onehot(targets_batch, num_classes)}\n", 781 | " # deciding which parts to fetch\n", 782 | " fetches_eval_train = [y]\n", 783 | " # running the validation\n", 784 | " res = sess.run(fetches=fetches_eval_train, 
784 |     "        res = sess.run(fetches=fetches_eval_valid, feed_dict=feed_dict_eval_valid)\n",
785 |     "        # collecting and storing predictions\n",
786 |     "        net_out = res[0]\n",
787 |     "        preds = np.argmax(net_out, axis=-1) \n",
788 |     "        confusion_valid.batch_add(targets_batch, preds)\n",
789 |     "    \n",
790 |     "    train_acc_cur = confusion_train.accuracy()\n",
791 |     "    valid_acc_cur = confusion_valid.accuracy()\n",
792 |     "\n",
793 |     "    train_acc += [train_acc_cur]\n",
794 |     "    valid_acc += [valid_acc_cur]\n",
795 |     "    print \"Epoch %i : Train Loss %e , Train acc %f, Valid acc %f \" \\\n",
796 |     "    % (epoch+1, loss[-1], train_acc_cur, valid_acc_cur)\n",
797 |     "    \n",
798 |     "    \n",
799 |     "epoch = np.arange(len(train_acc))\n",
800 |     "plt.figure()\n",
801 |     "plt.plot(epoch,train_acc,'r',epoch,valid_acc,'b')\n",
802 |     "plt.legend(['Train Acc','Val Acc'])\n",
803 |     "plt.xlabel('Epochs'), plt.ylabel('Acc')"
804 |    ]
805 |   },
806 |   {
807 |    "cell_type": "markdown",
808 |    "metadata": {
809 |     "collapsed": true
810 |    },
811 |    "source": [
812 |     "# More questions\n",
813 |     "\n",
814 |     "1. Do you see overfitting? Google overfitting if you don't know how to spot it.\n",
815 |     "2. Try to regularize your network by penalizing the L2 or L1 norm of the network parameters. [Read the docs for more info](https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#regularizers)"
816 |    ]
817 |   }
818 |  ],
819 |  "metadata": {
820 |   "kernelspec": {
821 |    "display_name": "Python 2",
822 |    "language": "python",
823 |    "name": "python2"
824 |   },
825 |   "language_info": {
826 |    "codemirror_mode": {
827 |     "name": "ipython",
828 |     "version": 2
829 |    },
830 |    "file_extension": ".py",
831 |    "mimetype": "text/x-python",
832 |    "name": "python",
833 |    "nbconvert_exporter": "python",
834 |    "pygments_lexer": "ipython2",
835 |    "version": "2.7.6"
836 |   }
837 |  },
838 |  "nbformat": 4,
839 |  "nbformat_minor": 0
840 | }
841 | 
--------------------------------------------------------------------------------
/lab1_FFN/mnist.npz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab1_FFN/mnist.npz
--------------------------------------------------------------------------------
/lab2_CNN/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab2_CNN/.gitignore
--------------------------------------------------------------------------------
/lab2_CNN/confusionmatrix.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | 
3 | 
4 | class ConfusionMatrix:
5 |     """
6 |     Simple confusion matrix class
7 |     row is the true class, column is the predicted class
8 |     """
9 |     def __init__(self, num_classes, class_names=None):
10 |         self.n_classes = num_classes
11 |         if class_names is None:
12 |             self.class_names = map(str, range(num_classes))
13 |         else:
14 |             self.class_names = class_names
15 | 
16 |         # find max class_name and pad
17 |         max_len = max(map(len, self.class_names))
18 |         self.max_len = max_len
19 |         for idx, name in enumerate(self.class_names):
20 |             if len(name) < max_len:  # pad each name to the longest name's length
21 |                 self.class_names[idx] = name + " "*(max_len-len(name))
22 | 
23 |         self.mat = np.zeros((num_classes,num_classes),dtype='int')
24 | 
25 |     def __str__(self):
26 |         # calculate row and column sums
27 |         col_sum = np.sum(self.mat, axis=1)
28 |         row_sum = np.sum(self.mat, axis=0)
29 | 
30 |         s = []
31 | 
32 |         mat_str = self.mat.__str__()
33 |         mat_str = mat_str.replace('[','').replace(']','').split('\n')
34 | 
35 |         for idx, row in enumerate(mat_str):
36 |             if idx == 0:
37 |                 pad = " "
38 |             else:
39 |                 pad = ""
40 |             class_name = self.class_names[idx]
41 |             class_name = " " + class_name + " |"
42 |             row_str = class_name + pad + row
43 |             row_str += " |" + str(col_sum[idx])
44 |             s.append(row_str)
45 | 
46 |         row_sum = [(self.max_len+4)*" "+" ".join(map(str, row_sum))]
47 |         hline = [(1+self.max_len)*" "+"-"*len(row_sum[0])]
48 | 
49 |         s = hline + s + hline + row_sum
50 | 
51 |         # add linebreaks
52 |         s_out = [line+'\n' for line in s]
53 |         return "".join(s_out)
54 | 
55 |     def batch_add(self, targets, preds):
56 |         assert targets.shape == preds.shape
57 |         assert len(targets) == len(preds)
58 |         assert max(targets) < self.n_classes
59 |         assert max(preds) < self.n_classes
60 |         targets = targets.flatten()
61 |         preds = preds.flatten()
62 |         for i in range(len(targets)):
63 |             self.mat[targets[i], preds[i]] += 1
64 | 
65 |     def get_errors(self):
66 |         tp = np.asarray(np.diag(self.mat).flatten(),dtype='float')
67 |         fn = np.asarray(np.sum(self.mat, axis=1).flatten(),dtype='float') - tp
68 |         fp = np.asarray(np.sum(self.mat, axis=0).flatten(),dtype='float') - tp
69 |         tn = np.asarray(np.sum(self.mat)*np.ones(self.n_classes).flatten(),
70 |                         dtype='float') - tp - fn - fp
71 |         return tp, fn, fp, tn
72 | 
73 |     def accuracy(self):
74 |         """
75 |         Calculates global accuracy
76 |         :return: accuracy
77 |         :example: >>> conf = ConfusionMatrix(3)
78 |                   >>> conf.batch_add([0,0,1],[0,0,2])
79 |                   >>> print conf.accuracy()
80 |         """
81 |         tp, _, _, _ = self.get_errors()
82 |         n_samples = np.sum(self.mat)
83 |         return np.sum(tp) / n_samples
84 | 
85 |     def sensitivity(self):
86 |         tp, fn, fp, tn = self.get_errors()  # get_errors returns (tp, fn, fp, tn)
87 |         res = tp / (tp + fn)
88 |         res = res[~np.isnan(res)]
89 |         return res
90 | 
91 |     def specificity(self):
92 |         tp, fn, fp, tn = self.get_errors()
93 |         res = tn / (tn + fp)
94 |         res = res[~np.isnan(res)]
95 |         return res
96 | 
97 |     def positive_predictive_value(self):
98 |         tp, fn, fp, tn = self.get_errors()
99 |         res = tp / (tp + fp)
100 |         res = res[~np.isnan(res)]
101 |         return res
102 | 
103 |     def negative_predictive_value(self):
104 |         tp, fn, fp, tn = self.get_errors()
105 |         res = tn / (tn + fn)
106 |         res = res[~np.isnan(res)]
107 |         return res
108 | 
109 |     def false_positive_rate(self):
110 |         tp, fn, fp, tn = self.get_errors()
111 |         res = fp / (fp + tn)
112 |         res = res[~np.isnan(res)]
113 |         return res
114 | 
115 |     def false_discovery_rate(self):
116 |         tp, fn, fp, tn = self.get_errors()
117 |         res = fp / (tp + fp)
118 |         res = res[~np.isnan(res)]
119 |         return res
120 | 
121 |     def F1(self):
122 |         tp, fn, fp, tn = self.get_errors()
123 |         res = (2*tp) / (2*tp + fp + fn)
124 |         res = res[~np.isnan(res)]
125 |         return res
126 | 
127 |     def matthews_correlation(self):
128 |         tp, fn, fp, tn = self.get_errors()
129 |         numerator = tp*tn - fp*fn
130 |         denominator = np.sqrt((tp + fp)*(tp + fn)*(tn + fp)*(tn + fn))
131 |         res = numerator / denominator
132 |         res = res[~np.isnan(res)]
133 |         return res
134 | 
--------------------------------------------------------------------------------
/lab2_CNN/lab2_CNN.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "# Credits\n",
Thanks to [skaae](https://github.com/skaae), [casperkaae](https://github.com/casperkaae) and [larsmaaloee](https://github.com/larsmaaloee)." 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "metadata": {}, 14 | "source": [ 15 | "# Dependencies and supporting functions\n", 16 | "Load dependencies and supporting functions by running the code block below." 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": null, 22 | "metadata": { 23 | "collapsed": false 24 | }, 25 | "outputs": [], 26 | "source": [ 27 | "%matplotlib inline\n", 28 | "import matplotlib\n", 29 | "import numpy as np\n", 30 | "import matplotlib.pyplot as plt\n", 31 | "import sklearn.datasets\n", 32 | "import tensorflow as tf\n", 33 | "from tensorflow.python.framework.ops import reset_default_graph\n", 34 | "\n", 35 | "def onehot(t, num_classes):\n", 36 | " out = np.zeros((t.shape[0], num_classes))\n", 37 | " for row, col in enumerate(t):\n", 38 | " out[row, col] = 1\n", 39 | " return out" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "# Convolutional Neural Networks 101\n", 47 | "\n", 48 | "Convolutional neural networks are one of the most successful types of neural networks for image recognition and were an integral part of reigniting the interest in neural networks. \n", 49 | "\n", 50 | "In this lab we'll experiment with inserting 2D-convolution layers in the fully connected neural networks introduced in LAB1. We'll further experiment with stacking of convolution layers, max pooling and strided convolutions, which are all important techniques in current convolutional neural network architectures. Lastly we'll try to visualize the learned convolution filters and try to understand what kind of features they learn to recognize.\n", 51 | "\n", 52 | "\n", 53 | "If you are unfamiliar with the convolution operation, https://github.com/vdumoulin/conv_arithmetic has a nice visualization of different convolution variants. For a more in-depth tutorial please see http://cs231n.github.io/convolutional-networks/ or http://neuralnetworksanddeeplearning.com/chap6.html. Lastly, if you are ambitious and want to implement a convolutional neural network from scratch, please see this exercise from our deep learning summer school last year: https://github.com/DTU-deeplearning/day2-Conv" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "metadata": { 60 | "collapsed": false 61 | }, 62 | "outputs": [], 63 | "source": [ 64 | "#LOAD the mnist data. 
To speed up training we'll only work on a subset of the data.\n", 65 | "#Note that we reshape the data from (nsamples, num_features)= (nsamples, nchannels*rows*cols) -> (nsamples, nchannels, rows, cols)\n", 66 | "# in order to retain the spatial arrangements of the pixels\n", 67 | "data = np.load('mnist.npz')\n", 68 | "num_classes = 10\n", 69 | "nchannels,rows,cols = 1,28,28\n", 70 | "x_train = data['X_train'][:1000].astype('float32')\n", 71 | "x_train = x_train.reshape((-1,nchannels,rows,cols))\n", 72 | "targets_train = data['y_train'][:1000].astype('int32')\n", 73 | "\n", 74 | "x_valid = data['X_valid'][:500].astype('float32')\n", 75 | "x_valid = x_valid.reshape((-1,nchannels,rows,cols))\n", 76 | "targets_valid = data['y_valid'][:500].astype('int32')\n", 77 | "\n", 78 | "x_test = data['X_test'][:500].astype('float32')\n", 79 | "x_test = x_test.reshape((-1,nchannels,rows,cols))\n", 80 | "targets_test = data['y_test'][:500].astype('int32')\n", 81 | "\n", 82 | "print \"Information on dataset\"\n", 83 | "print \"x_train\", x_train.shape\n", 84 | "print \"targets_train\", targets_train.shape\n", 85 | "print \"x_valid\", x_valid.shape\n", 86 | "print \"targets_valid\", targets_valid.shape\n", 87 | "print \"x_test\", x_test.shape\n", 88 | "print \"targets_test\", targets_test.shape" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": { 95 | "collapsed": false 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "#plot a few MNIST examples\n", 100 | "idx = 0\n", 101 | "canvas = np.zeros((28*10, 10*28))\n", 102 | "for i in range(10):\n", 103 | " for j in range(10):\n", 104 | " canvas[i*28:(i+1)*28, j*28:(j+1)*28] = x_train[idx].reshape((28, 28))\n", 105 | " idx += 1\n", 106 | "plt.figure(figsize=(7, 7))\n", 107 | "plt.imshow(canvas, cmap='gray')\n", 108 | "plt.title('MNIST handwritten digits')\n", 109 | "plt.show()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "### Documentation on contrib layers\n", 117 | "Check out the [github page](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py) for information on contrib layers (not well documented in their api)." 
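The cell above links to the contrib source because the 0.x-era API reference is sparse. As a quick reference, here is a minimal sketch of how these contrib layers chain together; it only uses the calls and argument names that appear later in this notebook, and the shapes in the comments are illustrative (28x28 NHWC input, 'SAME' padding):

```python
# Sketch only: 0.x-era tf.contrib.layers API, mirroring the calls used below.
import tensorflow as tf
from tensorflow.contrib.layers import convolution2d, max_pool2d, flatten, fully_connected
from tensorflow.python.ops.nn import relu, softmax

x = tf.placeholder(tf.float32, [None, 28, 28, 1])           # NHWC input
h = convolution2d(x, num_outputs=16, kernel_size=[5, 5],
                  stride=[1, 1], scope="conv")               # -> (None, 28, 28, 16)
h = max_pool2d(h, kernel_size=[2, 2], scope="pool")          # -> (None, 14, 14, 16)
h = flatten(h, scope="flat")                                 # -> (None, 14*14*16)
y = fully_connected(h, 10, activation_fn=softmax, scope="y") # class probabilities
print(y.get_shape())                                         # (?, 10)
```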
118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "collapsed": false 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "from tensorflow.contrib.layers import fully_connected, convolution2d, flatten, batch_norm, max_pool2d, dropout\n", 129 | "from tensorflow.python.ops.nn import relu, elu, relu6, sigmoid, tanh, softmax" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": { 136 | "collapsed": false 137 | }, 138 | "outputs": [], 139 | "source": [ 140 | "# define a simple feed forward neural network\n", 141 | "\n", 142 | "# hyperameters of the model\n", 143 | "num_classes = 10\n", 144 | "channels = x_train.shape[1]\n", 145 | "height = x_train.shape[2]\n", 146 | "width = x_train.shape[3]\n", 147 | "num_filters_conv1 = 16\n", 148 | "kernel_size_conv1 = [5, 5] # [height, width]\n", 149 | "stride_conv1 = [1, 1] # [stride_height, stride_width]\n", 150 | "num_l1 = 100\n", 151 | "# resetting the graph ...\n", 152 | "reset_default_graph()\n", 153 | "\n", 154 | "# Setting up placeholder, this is where your data enters the graph!\n", 155 | "x_pl = tf.placeholder(tf.float32, [None, channels, height, width])\n", 156 | "l_reshape = tf.transpose(x_pl, [0, 2, 3, 1]) # TensorFlow uses NHWC instead of NCHW\n", 157 | "#is_training = tf.placeholder(tf.bool)#used for dropout\n", 158 | "\n", 159 | "# Building the layers of the neural network\n", 160 | "# we define the variable scope, so we more easily can recognise our variables later\n", 161 | "#l_conv1 = convolution2d(l_reshape, num_filters_conv1, kernel_size_conv1, stride_conv1, scope=\"l_conv1\")\n", 162 | "l_flatten = flatten(l_reshape, scope=\"flatten\") # use l_conv1 instead of l_reshape\n", 163 | "l1 = fully_connected(l_flatten, num_l1, activation_fn=relu, scope=\"l1\")\n", 164 | "#l1 = dropout(l1, is_training=is_training, scope=\"dropout\")\n", 165 | "y = fully_connected(l1, num_classes, activation_fn=softmax, scope=\"y\")" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": { 172 | "collapsed": true 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "# y_ is a placeholder variable taking on the value of the target batch.\n", 177 | "y_ = tf.placeholder(tf.float32, [None, num_classes])\n", 178 | "\n", 179 | "# computing cross entropy per sample\n", 180 | "cross_entropy = -tf.reduce_sum(y_ * tf.log(y+1e-8), reduction_indices=[1])\n", 181 | "\n", 182 | "# averaging over samples\n", 183 | "cross_entropy = tf.reduce_mean(cross_entropy)\n", 184 | "\n", 185 | "# defining our optimizer\n", 186 | "optimizer = tf.train.AdamOptimizer(learning_rate=0.001)\n", 187 | "\n", 188 | "# applying the gradients\n", 189 | "train_op = optimizer.minimize(cross_entropy)" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": { 196 | "collapsed": false 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "#Test the forward pass\n", 201 | "x = np.random.normal(0,1, (45, 1,28,28)).astype('float32') #dummy data\n", 202 | "\n", 203 | "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n", 204 | "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n", 205 | "# initialize the Session\n", 206 | "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n", 207 | "sess.run(tf.initialize_all_variables())\n", 208 | "res = sess.run(fetches=[y], feed_dict={x_pl: x})\n", 209 | "#res = sess.run(fetches=[y], feed_dict={x_pl: x, is_training: 
False}) # for when using dropout\n", 210 | "print \"y\", res[0].shape" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": null, 216 | "metadata": { 217 | "collapsed": false 218 | }, 219 | "outputs": [], 220 | "source": [ 221 | "#Training Loop\n", 222 | "from confusionmatrix import ConfusionMatrix\n", 223 | "batch_size = 100\n", 224 | "num_epochs = 10\n", 225 | "num_samples_train = x_train.shape[0]\n", 226 | "num_batches_train = num_samples_train // batch_size\n", 227 | "num_samples_valid = x_valid.shape[0]\n", 228 | "num_batches_valid = num_samples_valid // batch_size\n", 229 | "\n", 230 | "train_acc, train_loss = [], []\n", 231 | "valid_acc, valid_loss = [], []\n", 232 | "test_acc, test_loss = [], []\n", 233 | "cur_loss = 0\n", 234 | "loss = []\n", 235 | "try:\n", 236 | " for epoch in range(num_epochs):\n", 237 | " #Forward->Backprob->Update params\n", 238 | " cur_loss = 0\n", 239 | " for i in range(num_batches_train):\n", 240 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 241 | " x_batch = x_train[idx]\n", 242 | " target_batch = targets_train[idx]\n", 243 | " feed_dict_train = {x_pl: x_batch, y_: onehot(target_batch, num_classes)}\n", 244 | " #feed_dict_train = {x_pl: x_batch, y_: onehot(target_batch, num_classes), is_training: True}\n", 245 | " fetches_train = [train_op, cross_entropy]\n", 246 | " res = sess.run(fetches=fetches_train, feed_dict=feed_dict_train)\n", 247 | " batch_loss = res[1] #this will do the complete backprob pass\n", 248 | " cur_loss += batch_loss\n", 249 | " loss += [cur_loss/batch_size]\n", 250 | "\n", 251 | " confusion_valid = ConfusionMatrix(num_classes)\n", 252 | " confusion_train = ConfusionMatrix(num_classes)\n", 253 | "\n", 254 | " for i in range(num_batches_train):\n", 255 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 256 | " x_batch = x_train[idx]\n", 257 | " targets_batch = targets_train[idx]\n", 258 | " # what to feed our accuracy op\n", 259 | " feed_dict_eval_train = {x_pl: x_batch}\n", 260 | " #feed_dict_eval_train = {x_pl: x_batch, is_training: False}\n", 261 | " # deciding which parts to fetch\n", 262 | " fetches_eval_train = [y]\n", 263 | " # running the validation\n", 264 | " res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n", 265 | " # collecting and storing predictions\n", 266 | " net_out = res[0] \n", 267 | " preds = np.argmax(net_out, axis=-1) \n", 268 | " confusion_train.batch_add(targets_batch, preds)\n", 269 | "\n", 270 | " confusion_valid = ConfusionMatrix(num_classes)\n", 271 | " for i in range(num_batches_valid):\n", 272 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 273 | " x_batch = x_valid[idx]\n", 274 | " targets_batch = targets_valid[idx]\n", 275 | " # what to feed our accuracy op\n", 276 | " feed_dict_eval_train = {x_pl: x_batch}\n", 277 | " #feed_dict_eval_train = {x_pl: x_batch, is_training: False}\n", 278 | " # deciding which parts to fetch\n", 279 | " fetches_eval_train = [y]\n", 280 | " # running the validation\n", 281 | " res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n", 282 | " # collecting and storing predictions\n", 283 | " net_out = res[0]\n", 284 | " preds = np.argmax(net_out, axis=-1) \n", 285 | "\n", 286 | " confusion_valid.batch_add(targets_batch, preds)\n", 287 | "\n", 288 | " train_acc_cur = confusion_train.accuracy()\n", 289 | " valid_acc_cur = confusion_valid.accuracy()\n", 290 | "\n", 291 | " train_acc += [train_acc_cur]\n", 292 | " valid_acc += [valid_acc_cur]\n", 293 | " print \"Epoch %i : Train Loss %e , Train acc 
%f, Valid acc %f \" \\\n", 294 | " % (epoch+1, loss[-1], train_acc_cur, valid_acc_cur)\n", 295 | "except KeyboardInterrupt:\n", 296 | " pass\n", 297 | " \n", 298 | "\n", 299 | "#get test set score\n", 300 | "confusion_test = ConfusionMatrix(num_classes)\n", 301 | "# what to feed our accuracy op\n", 302 | "feed_dict_eval_train = {x_pl: x_test}\n", 303 | "#feed_dict_eval_train = {x_pl: x_test, is_training: False}\n", 304 | "# deciding which parts to fetch\n", 305 | "fetches_eval_train = [y]\n", 306 | "# running the validation\n", 307 | "res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n", 308 | "# collecting and storing predictions\n", 309 | "net_out = res[0] \n", 310 | "preds = np.argmax(net_out, axis=-1) \n", 311 | "confusion_test.batch_add(targets_test, preds)\n", 312 | "print \"\\nTest set Acc: %f\" %(confusion_test.accuracy())\n", 313 | "\n", 314 | "\n", 315 | "epoch = np.arange(len(train_acc))\n", 316 | "plt.figure()\n", 317 | "plt.plot(epoch,train_acc,'r',epoch,valid_acc,'b')\n", 318 | "plt.legend(['Train Acc','Val Acc'])\n", 319 | "plt.xlabel('Epochs'), plt.ylabel('Acc'), plt.ylim([0.75,1.03])" 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "# Assignment 1\n", 327 | "\n", 328 | " 1) Note the performance of the standard feedforward neural network. Add a 2D convolution layer before the dense hidden layer and confirm that it increases the generalization performance of the network (try num_filters=16 and filter_size=5 as a starting point). \n", 329 | " \n", 330 | " 2) Can the performance be increased even further by stacking more convolution layers?\n", 331 | " \n", 332 | " 3) Maxpooling is a technique for decreasing the spatial resolution of an image while retaining the important features. Effectively this gives local translational invariance and reduces the computation by a factor of four, which is usually desirable in a classification setting. Try to either: \n", 333 | " \n", 334 | " a) add a maxpool layer (add argument pool_size=2) after the convolution layer, or\n", 335 | " b) add stride=2 to the arguments of the convolution layer. \n", 336 | " Verify that this decreases the spatial dimension of the image (print l_conv1.get_shape() or l_maxpool.get_shape()). Does this increase the performance of the network (you may need to stack multiple layers or increase the number of filters to increase performance)?\n", 337 | " \n" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "# Visualization of filters\n", 345 | "Convolution filters can be interpreted as spatial feature detectors picking up different image features such as edges, corners etc. Below we provide code for visualization of the filters. The best results are obtained with fairly large filters of size 9 and either 16 or 36 filters. 
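Before running the visualization, here is a minimal sketch of what Assignment 1 above asks for. It reuses the notebook's own names (l_reshape, flatten, convolution2d, max_pool2d); the kernel and stride values are just the suggested starting points, and the shape comments assume the 28x28 NHWC input with 'SAME' padding:

```python
# Sketch for Assignment 1 (not the reference solution): insert a conv layer,
# optionally followed by pooling or striding, before the flatten.
l_conv1 = convolution2d(l_reshape, num_outputs=16, kernel_size=[5, 5],
                        stride=[1, 1], scope="l_conv1")          # -> (None, 28, 28, 16)
l_maxpool = max_pool2d(l_conv1, kernel_size=[2, 2],
                       scope="l_maxpool")                        # -> (None, 14, 14, 16)
print(l_conv1.get_shape())   # verify the spatial dimensions
print(l_maxpool.get_shape()) # halved by the 2x2 pooling
l_flatten = flatten(l_maxpool, scope="flatten")
```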
" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": null, 351 | "metadata": { 352 | "collapsed": false 353 | }, 354 | "outputs": [], 355 | "source": [ 356 | "# to start with we print the names of the weights in our graph\n", 357 | "# to see what operations we are allowed to perform on the variables in our graph, try:\n", 358 | "#print(dir(tf.all_variables()[0]))\n", 359 | "# you will notice it has \"name\" and \"value\", which we will build a dictionary from\n", 360 | "names_and_vars = {var.name: sess.run(var.value()) for var in tf.all_variables()}\n", 361 | "print(names_and_vars.keys())\n", 362 | "# getting the name was easy, just use .name on the variable object\n", 363 | "# getting the value in a numpy array format is slightly more tricky\n", 364 | "# we need to first get a variable object, then turn it into a tensor with .value()\n", 365 | "# and the evaluate the tensor with sess.run(...)" 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": null, 371 | "metadata": { 372 | "collapsed": false 373 | }, 374 | "outputs": [], 375 | "source": [ 376 | "### ERROR - If you get a key error, then you need to define l_conv1 in your model!\n", 377 | "if not u'l_conv1/weights:0' in names_and_vars:\n", 378 | " print \"You need to go back and define a convolutional layer in the network.\"\n", 379 | "else:\n", 380 | " np_W = names_and_vars[u'l_conv1/weights:0'] # get the filter values from the first conv layer\n", 381 | " print np_W.shape, \"i.e. the shape is filter_size, filter_size, num_channels, num_filters\"\n", 382 | " filter_size, _, num_channels, num_filters = np_W.shape\n", 383 | " n = int(num_filters**0.5)\n", 384 | "\n", 385 | " # reshaping the last dimension to be n by n\n", 386 | " np_W_res = np_W.reshape(filter_size, filter_size, num_channels, n, n)\n", 387 | " fig, ax = plt.subplots(n,n)\n", 388 | " print \"learned filter values\"\n", 389 | " for i in range(n):\n", 390 | " for j in range(n):\n", 391 | " ax[i,j].imshow(np_W_res[:,:,0,i,j], cmap='gray',interpolation='none')\n", 392 | " ax[i,j].xaxis.set_major_formatter(plt.NullFormatter())\n", 393 | " ax[i,j].yaxis.set_major_formatter(plt.NullFormatter())\n", 394 | "\n", 395 | "\n", 396 | " idx = 1\n", 397 | " plt.figure()\n", 398 | " plt.imshow(x_train[idx,0],cmap='gray',interpolation='none')\n", 399 | " plt.title('Inut Image')\n", 400 | " plt.show()\n", 401 | "\n", 402 | " #visalize the filters convolved with an input image\n", 403 | " from scipy.signal import convolve2d\n", 404 | " np_W_res = np_W.reshape(filter_size, filter_size, num_channels, n, n)\n", 405 | " fig, ax = plt.subplots(n,n,figsize=(9,9))\n", 406 | " print \"Response from input image convolved with the filters\"\n", 407 | " for i in range(n):\n", 408 | " for j in range(n):\n", 409 | " ax[i,j].imshow(convolve2d(x_train[1,0],np_W_res[:,:,0,i,j],mode='same'),\n", 410 | " cmap='gray',interpolation='none')\n", 411 | " ax[i,j].xaxis.set_major_formatter(plt.NullFormatter())\n", 412 | " ax[i,j].yaxis.set_major_formatter(plt.NullFormatter())" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "# Assignment 2\n", 420 | "\n", 421 | "The visualized filters will likely look most like noise due to the small amount of training data.\n", 422 | "\n", 423 | " 1) Try to use 10000 traning examples instead and visualise the filters again\n", 424 | " \n", 425 | " 2) Dropout is a very usefull technique for preventing overfitting. 
Try to add a DropoutLayer after the convolution layer and hidden layer. This should increase both performance and the \"visual appeal\" of the filters\n", 426 | " \n", 427 | " 3) Batch normalization is a recent innovation for improving generalization performance. Try to insert batch normalization layers into the network to improve performance. \n", 428 | " \n", 429 | " \n" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "# More Fun with convolutional networks\n", 437 | "### Get the data" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": { 444 | "collapsed": false 445 | }, 446 | "outputs": [], 447 | "source": [ 448 | "!wget -N https://s3.amazonaws.com/lasagne/recipes/datasets/mnist_cluttered_60x60_6distortions.npz" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "In the data the each mnist digit (20x20 pixels) has been placed randomly in a 60x60 canvas. To make the task harder each canvas has then been cluttered with small pieces of digits. In this task it is helpfull for a network if it can focus only on the digit and ignore the rest.\n", 456 | "\n", 457 | "The ``TransformerLayer`` lets us do this. The transformer layer learns an affine transformation which lets the network zoom, rotate and skew. If you are interested you should read the paper, but the main idea is that you can let a small convolutional network determine the the parameters of the affine transformation. You then apply the affine transformation to the input data. Usually this also involves downsampling which forces the model to zoom in on the relevant parts of the data. After the affine transformation we can use a larger conv net to do the classification. \n", 458 | "This is possible because you can backprop through a an affine transformation if you use bilinear interpolation." 
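To make the affine transformation concrete before diving into the code, here is a small numpy sketch (illustrative only, using the normalized [-1, 1] coordinates the transformer works in) of how the six parameters map an output grid point to the source location it samples from:

```python
import numpy as np

# theta = [t1, t2, t3, t4, t5, t6] fills the 2x3 affine matrix
# [[t1, t2, t3],
#  [t4, t5, t6]]
# which maps output (target) coordinates to input (source) coordinates.
identity = np.array([[1., 0., 0.],
                     [0., 1., 0.]])  # no zoom, no skew, no shift
xy1 = np.array([0.5, -0.25, 1.0])    # homogeneous point (x_t, y_t, 1)
print(identity.dot(xy1))             # [ 0.5  -0.25] -> samples the same point

zoom = np.array([[0.5, 0., 0.],      # t1 = t5 = 0.5: sample from a smaller
                 [0., 0.5, 0.]])     # region, i.e. zoom in around the center
print(zoom.dot(xy1))                 # [ 0.25  -0.125]
```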
459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": { 465 | "collapsed": false 466 | }, 467 | "outputs": [], 468 | "source": [ 469 | "import os\n", 470 | "import matplotlib\n", 471 | "import numpy as np\n", 472 | "np.random.seed(123)\n", 473 | "import matplotlib.pyplot as plt\n", 474 | "import tensorflow as tf\n", 475 | "from tensorflow.contrib.layers import fully_connected, convolution2d, flatten, max_pool2d\n", 476 | "pool = max_pool2d\n", 477 | "conv = convolution2d\n", 478 | "dense = fully_connected\n", 479 | "from tensorflow.python.ops.nn import relu, softmax\n", 480 | "from tensorflow.python.framework.ops import reset_default_graph\n", 481 | "\n", 482 | "from spatial_transformer import transformer\n", 483 | "\n", 484 | "def onehot(t, num_classes):\n", 485 | " out = np.zeros((t.shape[0], num_classes))\n", 486 | " for row, col in enumerate(t):\n", 487 | " out[row, col] = 1\n", 488 | " return out\n", 489 | "\n", 490 | "NUM_EPOCHS = 500\n", 491 | "BATCH_SIZE = 256\n", 492 | "LEARNING_RATE = 0.001\n", 493 | "DIM = 60\n", 494 | "NUM_CLASSES = 10\n", 495 | "mnist_cluttered = \"mnist_cluttered_60x60_6distortions.npz\"" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": null, 501 | "metadata": { 502 | "collapsed": false 503 | }, 504 | "outputs": [], 505 | "source": [ 506 | "def load_data():\n", 507 | " data = np.load(mnist_cluttered)\n", 508 | " X_train, y_train = data['x_train'], np.argmax(data['y_train'], axis=-1)\n", 509 | " X_valid, y_valid = data['x_valid'], np.argmax(data['y_valid'], axis=-1)\n", 510 | " X_test, y_test = data['x_test'], np.argmax(data['y_test'], axis=-1)\n", 511 | "\n", 512 | " # reshape for convolutions\n", 513 | " X_train = X_train.reshape((X_train.shape[0], 1, DIM, DIM))\n", 514 | " X_valid = X_valid.reshape((X_valid.shape[0], 1, DIM, DIM))\n", 515 | " X_test = X_test.reshape((X_test.shape[0], 1, DIM, DIM))\n", 516 | " \n", 517 | " print \"Train samples:\", X_train.shape\n", 518 | " print \"Validation samples:\", X_valid.shape\n", 519 | " print \"Test samples:\", X_test.shape\n", 520 | "\n", 521 | " return dict(\n", 522 | " X_train=np.asarray(X_train, dtype='float32'),\n", 523 | " y_train=y_train.astype('int32'),\n", 524 | " X_valid=np.asarray(X_valid, dtype='float32'),\n", 525 | " y_valid=y_valid.astype('int32'),\n", 526 | " X_test=np.asarray(X_test, dtype='float32'),\n", 527 | " y_test=y_test.astype('int32'),\n", 528 | " num_examples_train=X_train.shape[0],\n", 529 | " num_examples_valid=X_valid.shape[0],\n", 530 | " num_examples_test=X_test.shape[0],\n", 531 | " input_height=X_train.shape[2],\n", 532 | " input_width=X_train.shape[3],\n", 533 | " output_dim=10,)\n", 534 | "data = load_data()\n", 535 | "\n", 536 | "idx = 0\n", 537 | "canvas = np.zeros((DIM*10, 10*DIM))\n", 538 | "for i in range(10):\n", 539 | " for j in range(10):\n", 540 | " canvas[i*DIM:(i+1)*DIM, j*DIM:(j+1)*DIM] = data['X_train'][idx].reshape((DIM, DIM))\n", 541 | " idx += 1\n", 542 | "plt.figure(figsize=(10, 10))\n", 543 | "plt.imshow(canvas, cmap='gray')\n", 544 | "plt.title('Cluttered handwritten digits')\n", 545 | "plt.axis('off')\n", 546 | "\n", 547 | "plt.show()" 548 | ] 549 | }, 550 | { 551 | "cell_type": "markdown", 552 | "metadata": {}, 553 | "source": [ 554 | "## Building the model\n", 555 | "\n", 556 | "We use a model where the localization network is a two layer convolution network which operates directly on the image input. 
The output from the localization network is a 6 dimensional vector specifying the parameters in the affine transformation.\n", 557 | "\n", 558 | "We set up the transformer layer to initially do the identity transform, similarly to [1]. If the output from the localization networks is [t1, t2, t3, t4, t5, t6] then t1 and t5 determines zoom, t2 and t4 determines skewness, and t3 and t6 move the center position. By setting the initial values of the bias vector to \n", 559 | "\n", 560 | "```\n", 561 | "|1, 0, 0|\n", 562 | "|0, 1, 0|\n", 563 | "```\n", 564 | "and the final W of the localization network to all zeros we ensure that in the beginning of training the network works as a pooling layer. \n", 565 | "\n", 566 | "The output of the localization layer feeds into the transformer layer which applies the transformation to the image input. In our setup the transformer layer downsamples the input by a factor 3.\n", 567 | "\n", 568 | "Finally a 2 layer convolution layer and 2 fully connected layers calculates the output probabilities.\n", 569 | "\n", 570 | "\n", 571 | "### The model\n", 572 | "```\n", 573 | "Input -> localization_network -> TransformerLayer -> output_network -> predictions\n", 574 | " | |\n", 575 | " >--------------------------------^\n", 576 | "```\n", 577 | "\n", 578 | "\n" 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": null, 584 | "metadata": { 585 | "collapsed": false 586 | }, 587 | "outputs": [], 588 | "source": [ 589 | "reset_default_graph()\n", 590 | "def build_model(x_pl, input_width, input_height, output_dim,\n", 591 | " batch_size=BATCH_SIZE):\n", 592 | " # Setting up placeholder, this is where your data enters the graph!\n", 593 | " l_reshape = tf.transpose(x_pl, [0, 2, 3, 1]) # TensorFlow uses NHWC instead of NCHW\n", 594 | "\n", 595 | " # make distributed representation of input image for localization network\n", 596 | " loc_l1 = pool(l_reshape, kernel_size=[2, 2], scope=\"loc_l1\")\n", 597 | " loc_l2 = conv(loc_l1, num_outputs=8, kernel_size=[5, 5], stride=[1, 1], padding=\"SAME\", scope=\"loc_l2\")\n", 598 | " loc_l3 = pool(loc_l2, kernel_size=[2, 2], scope=\"loc_l3\")\n", 599 | " loc_l4 = conv(loc_l3, num_outputs=8, kernel_size=[5, 5], stride=[1, 1], padding=\"SAME\", scope=\"loc_l4\")\n", 600 | " loc_l4_flatten = flatten(loc_l4, scope=\"loc_l4_flatten\")\n", 601 | " loc_l5 = dense(loc_l4_flatten, num_outputs=50, activation_fn=relu, scope=\"loc_l5\")\n", 602 | " # set up weights for transformation (notice we always need 6 output neurons)\n", 603 | " W_loc_out = tf.get_variable(\"W_loc_out\", [50, 6], initializer=tf.constant_initializer(0.0))\n", 604 | " initial = np.array([[1, 0, 0], [0, 1, 0]])\n", 605 | " initial = initial.astype('float32')\n", 606 | " initial = initial.flatten()\n", 607 | " b_loc_out = tf.Variable(initial_value=initial, name='b_loc_out')\n", 608 | " loc_out = tf.matmul(loc_l5, W_loc_out) + b_loc_out\n", 609 | "\n", 610 | " # spatial transformer\n", 611 | " l_trans1 = transformer(l_reshape, loc_out, out_size=(DIM//3, DIM//3))\n", 612 | " l_trans1.set_shape([None, DIM//3, DIM//3, 1])\n", 613 | " l_trans1_valid = tf.transpose(l_trans1, [0, 2, 3, 1]) # Back into NCHW for validation\n", 614 | "\n", 615 | " print \"Transformer network output shape: \", l_trans1.get_shape()\n", 616 | "\n", 617 | " # classification network\n", 618 | " class_l1 = conv(l_trans1, num_outputs=16, kernel_size=[3, 3], scope=\"class_l1\")\n", 619 | " class_l2 = pool(class_l1, kernel_size=[2, 2], scope=\"class_l2\")\n", 620 | " class_l3 = 
conv(class_l2, num_outputs=16, kernel_size=[3, 3], scope=\"class_l3\")\n", 621 | " class_l4 = pool(class_l3, kernel_size=[2, 2], scope=\"class_l4\")\n", 622 | " class_l4_flatten = flatten(class_l4, scope=\"class_l4_flatten\")\n", 623 | " class_l5 = dense(class_l4_flatten, num_outputs=256, activation_fn=relu, scope=\"class_l5\")\n", 624 | " l_out = dense(class_l5, num_outputs=output_dim, activation_fn=softmax, scope=\"l_out\")\n", 625 | "\n", 626 | " return l_out, l_trans1_valid\n", 627 | "\n", 628 | "x_pl = tf.placeholder(tf.float32, [None, 1, DIM, DIM])\n", 629 | "model, l_transform = build_model(x_pl, DIM, DIM, NUM_CLASSES)\n", 630 | "#model_params = lasagne.layers.get_all_params(model, trainable=True)" 631 | ] 632 | }, 633 | { 634 | "cell_type": "code", 635 | "execution_count": null, 636 | "metadata": { 637 | "collapsed": false 638 | }, 639 | "outputs": [], 640 | "source": [ 641 | "# y_ is a placeholder variable taking on the value of the target batch.\n", 642 | "y_pl = tf.placeholder(tf.float32, shape=[None, NUM_CLASSES])\n", 643 | "lr_pl = tf.placeholder(tf.float32, shape=[])\n", 644 | "\n", 645 | "# computing cross entropy per sample\n", 646 | "cross_entropy = -tf.reduce_sum(y_pl * tf.log(model+1e-8), reduction_indices=[1])\n", 647 | "\n", 648 | "# averaging over samples\n", 649 | "cross_entropy = tf.reduce_mean(cross_entropy)\n", 650 | "\n", 651 | "# defining our optimizer\n", 652 | "optimizer = tf.train.AdamOptimizer(learning_rate=lr_pl)\n", 653 | "\n", 654 | "# applying the gradients\n", 655 | "train_op = optimizer.minimize(cross_entropy)" 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": null, 661 | "metadata": { 662 | "collapsed": false 663 | }, 664 | "outputs": [], 665 | "source": [ 666 | "# test the forward pass\n", 667 | "x = np.random.normal(0,1, (45, 1,60,60)).astype('float32') #dummy data\n", 668 | "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n", 669 | "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n", 670 | "# initialize the Session\n", 671 | "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n", 672 | "sess.run(tf.initialize_all_variables())\n", 673 | "res = sess.run(fetches=[model], feed_dict={x_pl: x})\n", 674 | "print \"y\", res[0].shape" 675 | ] 676 | }, 677 | { 678 | "cell_type": "markdown", 679 | "metadata": {}, 680 | "source": [ 681 | "### Training the model\n", 682 | "Unfortunately NVIDIA has yet to squeeze a TitanX into a labtop and training convnets on CPU is painfully slow. After 10 epochs you should see that model starts to zoom in on the digits. 
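The training loop below anneals the learning rate by a factor 0.7 every 20 epochs, starting from 1e-4. A quick sanity check of what that schedule does (same arithmetic as the training code):

```python
# Sanity check of the annealing schedule used in the training loop below.
lr = 1e-4
for n in range(100):
    if (n + 1) % 20 == 0:
        lr = lr * 0.7
print(lr)  # ~1.68e-05, i.e. 1e-4 * 0.7**5 after 100 epochs
```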
" 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "execution_count": null, 688 | "metadata": { 689 | "collapsed": false 690 | }, 691 | "outputs": [], 692 | "source": [ 693 | "def train_epoch(X, y, learning_rate):\n", 694 | " num_samples = X.shape[0]\n", 695 | " num_batches = int(np.ceil(num_samples / float(BATCH_SIZE)))\n", 696 | " costs = []\n", 697 | " correct = 0\n", 698 | " for i in range(num_batches):\n", 699 | " if i % 10 == 0:\n", 700 | " print i,\n", 701 | " idx = range(i*BATCH_SIZE, np.minimum((i+1)*BATCH_SIZE, num_samples))\n", 702 | " X_batch_tr = X[idx]\n", 703 | " y_batch_tr = y[idx]\n", 704 | " fetches_tr = [train_op, cross_entropy, model]\n", 705 | " feed_dict_tr = {x_pl: X_batch_tr, y_pl: onehot(y_batch_tr, NUM_CLASSES), lr_pl: learning_rate}\n", 706 | " res = sess.run(fetches=fetches_tr, feed_dict=feed_dict_tr)\n", 707 | " cost_batch, output_train = tuple(res[1:3])\n", 708 | " costs += [cost_batch]\n", 709 | " preds = np.argmax(output_train, axis=-1)\n", 710 | " correct += np.sum(y_batch_tr == preds)\n", 711 | " print \"\"\n", 712 | " return np.mean(costs), correct / float(num_samples)\n", 713 | "\n", 714 | "\n", 715 | "def eval_epoch(X, y):\n", 716 | " num_samples = X.shape[0]\n", 717 | " num_batches = int(np.ceil(num_samples / float(BATCH_SIZE)))\n", 718 | " pred_list = []\n", 719 | " transform_list = []\n", 720 | " for i in range(num_batches):\n", 721 | " if i % 10 == 0:\n", 722 | " print i,\n", 723 | " idx = range(i*BATCH_SIZE, np.minimum((i+1)*BATCH_SIZE, num_samples))\n", 724 | " X_batch_val = X[idx]\n", 725 | " fetches_val = [model, l_transform]\n", 726 | " feed_dict_val = {x_pl: X_batch_val}\n", 727 | " res = sess.run(fetches=fetches_val, feed_dict=feed_dict_val)\n", 728 | " output_eval, transform_eval = tuple(res)\n", 729 | " pred_list.append(output_eval)\n", 730 | " transform_list.append(transform_eval)\n", 731 | " transform_eval = np.concatenate(transform_list, axis=0)\n", 732 | " preds = np.concatenate(pred_list, axis=0)\n", 733 | " preds = np.argmax(preds, axis=-1)\n", 734 | " acc = np.mean(preds == y)\n", 735 | " print \"\"\n", 736 | " return acc, transform_eval" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": null, 742 | "metadata": { 743 | "collapsed": false 744 | }, 745 | "outputs": [], 746 | "source": [ 747 | "valid_accs, train_accs, test_accs = [], [], []\n", 748 | "learning_rate=0.0001\n", 749 | "try:\n", 750 | " for n in range(NUM_EPOCHS):\n", 751 | " print \"Epoch %d:\" % n\n", 752 | " print 'train ',\n", 753 | " train_cost, train_acc = train_epoch(data['X_train'], data['y_train'], learning_rate)\n", 754 | " print 'valid ',\n", 755 | " valid_acc, valid_trainsform = eval_epoch(data['X_valid'], data['y_valid'])\n", 756 | " print 'test ',\n", 757 | " test_acc, test_transform = eval_epoch(data['X_test'], data['y_test'])\n", 758 | " valid_accs += [valid_acc]\n", 759 | " test_accs += [test_acc]\n", 760 | " train_accs += [train_acc]\n", 761 | "\n", 762 | " # learning rate annealing\n", 763 | " if (n+1) % 20 == 0:\n", 764 | " learning_rate = learning_rate * 0.7\n", 765 | " print \"New LR:\", learning_rate\n", 766 | "\n", 767 | " print \"train cost {0:.2}, train acc {1:.2}, val acc {2:.2}, test acc {3:.2}\".format(\n", 768 | " train_cost, train_acc, valid_acc, test_acc)\n", 769 | "except KeyboardInterrupt:\n", 770 | " pass" 771 | ] 772 | }, 773 | { 774 | "cell_type": "markdown", 775 | "metadata": {}, 776 | "source": [ 777 | "### Plot errors and zoom" 778 | ] 779 | }, 780 | { 781 | "cell_type": "code", 782 | 
"execution_count": null, 783 | "metadata": { 784 | "collapsed": false 785 | }, 786 | "outputs": [], 787 | "source": [ 788 | "plt.figure(figsize=(9,9))\n", 789 | "plt.plot(1-np.array(train_accs), label='Training Error')\n", 790 | "plt.plot(1-np.array(valid_accs), label='Validation Error')\n", 791 | "plt.legend(fontsize=20)\n", 792 | "plt.xlabel('Epoch', fontsize=20)\n", 793 | "plt.ylabel('Error', fontsize=20)\n", 794 | "plt.show()" 795 | ] 796 | }, 797 | { 798 | "cell_type": "code", 799 | "execution_count": null, 800 | "metadata": { 801 | "collapsed": false 802 | }, 803 | "outputs": [], 804 | "source": [ 805 | "plt.figure(figsize=(7,14))\n", 806 | "for i in range(3):\n", 807 | " plt.subplot(321+i*2)\n", 808 | " plt.imshow(data['X_test'][i].reshape(DIM, DIM), cmap='gray', interpolation='none')\n", 809 | " if i == 0:\n", 810 | " plt.title('Original 60x60', fontsize=20)\n", 811 | " plt.axis('off')\n", 812 | " plt.subplot(322+i*2)\n", 813 | " plt.imshow(test_transform[i].reshape(DIM//3, DIM//3).T, cmap='gray', interpolation='none')\n", 814 | " if i == 0:\n", 815 | " plt.title('Transformed 20x20', fontsize=20)\n", 816 | " plt.axis('off')\n", 817 | " \n", 818 | " \n", 819 | "plt.tight_layout()\n", 820 | "plt.show()" 821 | ] 822 | }, 823 | { 824 | "cell_type": "markdown", 825 | "metadata": { 826 | "collapsed": true 827 | }, 828 | "source": [ 829 | "# A few pointers for image classification\n", 830 | "If you want do image classification using a pretrained model is often a good choice, especially if you have limited amounts of labeled data.\n", 831 | "\n", 832 | "An often used pretrained network is the Google Inception model. TensorFlow has a guide for using their current state-of-the-art pretrained model in their [model repository](https://github.com/tensorflow/models/tree/master/inception). Torch7 and Theano have similar pretrained models that you can find with google. \n", 833 | "\n", 834 | "Currently the best performing image network is the [ResNet](https://arxiv.org/pdf/1512.03385v1.pdf) model. Torch7 has an interesting blog post about residual nets. http://torch.ch/blog/2016/02/04/resnets.html" 835 | ] 836 | } 837 | ], 838 | "metadata": { 839 | "kernelspec": { 840 | "display_name": "Python 2", 841 | "language": "python", 842 | "name": "python2" 843 | }, 844 | "language_info": { 845 | "codemirror_mode": { 846 | "name": "ipython", 847 | "version": 2 848 | }, 849 | "file_extension": ".py", 850 | "mimetype": "text/x-python", 851 | "name": "python", 852 | "nbconvert_exporter": "python", 853 | "pygments_lexer": "ipython2", 854 | "version": "2.7.6" 855 | } 856 | }, 857 | "nbformat": 4, 858 | "nbformat_minor": 0 859 | } 860 | -------------------------------------------------------------------------------- /lab2_CNN/mnist.npz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab2_CNN/mnist.npz -------------------------------------------------------------------------------- /lab2_CNN/spatial_transformer.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | import tensorflow as tf 16 | 17 | 18 | def transformer(U, theta, out_size, name='SpatialTransformer', **kwargs): 19 | """Spatial Transformer Layer 20 | 21 | Implements a spatial transformer layer as described in [1]_. 22 | Based on [2]_ and edited by David Dao for Tensorflow. 23 | 24 | Parameters 25 | ---------- 26 | U : float 27 | The output of a convolutional net should have the 28 | shape [num_batch, height, width, num_channels]. 29 | theta: float 30 | The output of the 31 | localisation network should be [num_batch, 6]. 32 | out_size: tuple of two ints 33 | The size of the output of the network (height, width) 34 | 35 | References 36 | ---------- 37 | .. [1] Spatial Transformer Networks 38 | Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu 39 | Submitted on 5 Jun 2015 40 | .. [2] https://github.com/skaae/transformer_network/blob/master/transformerlayer.py 41 | 42 | Notes 43 | ----- 44 | To initialize the network to the identity transform init 45 | ``theta`` to : 46 | identity = np.array([[1., 0., 0.], 47 | [0., 1., 0.]]) 48 | identity = identity.flatten() 49 | theta = tf.Variable(initial_value=identity) 50 | 51 | """ 52 | 53 | def _repeat(x, n_repeats): 54 | with tf.variable_scope('_repeat'): 55 | rep = tf.transpose( 56 | tf.expand_dims(tf.ones(shape=tf.pack([n_repeats, ])), 1), [1, 0]) 57 | rep = tf.cast(rep, 'int32') 58 | x = tf.matmul(tf.reshape(x, (-1, 1)), rep) 59 | return tf.reshape(x, [-1]) 60 | 61 | def _interpolate(im, x, y, out_size): 62 | with tf.variable_scope('_interpolate'): 63 | # constants 64 | num_batch = tf.shape(im)[0] 65 | height = tf.shape(im)[1] 66 | width = tf.shape(im)[2] 67 | channels = tf.shape(im)[3] 68 | 69 | x = tf.cast(x, 'float32') 70 | y = tf.cast(y, 'float32') 71 | height_f = tf.cast(height, 'float32') 72 | width_f = tf.cast(width, 'float32') 73 | out_height = out_size[0] 74 | out_width = out_size[1] 75 | zero = tf.zeros([], dtype='int32') 76 | max_y = tf.cast(tf.shape(im)[1] - 1, 'int32') 77 | max_x = tf.cast(tf.shape(im)[2] - 1, 'int32') 78 | 79 | # scale indices from [-1, 1] to [0, width/height] 80 | x = (x + 1.0)*(width_f) / 2.0 81 | y = (y + 1.0)*(height_f) / 2.0 82 | 83 | # do sampling 84 | x0 = tf.cast(tf.floor(x), 'int32') 85 | x1 = x0 + 1 86 | y0 = tf.cast(tf.floor(y), 'int32') 87 | y1 = y0 + 1 88 | 89 | x0 = tf.clip_by_value(x0, zero, max_x) 90 | x1 = tf.clip_by_value(x1, zero, max_x) 91 | y0 = tf.clip_by_value(y0, zero, max_y) 92 | y1 = tf.clip_by_value(y1, zero, max_y) 93 | dim2 = width 94 | dim1 = width*height 95 | base = _repeat(tf.range(num_batch)*dim1, out_height*out_width) 96 | base_y0 = base + y0*dim2 97 | base_y1 = base + y1*dim2 98 | idx_a = base_y0 + x0 99 | idx_b = base_y1 + x0 100 | idx_c = base_y0 + x1 101 | idx_d = base_y1 + x1 102 | 103 | # use indices to lookup pixels in the flat image and restore 104 | # channels dim 105 | im_flat = tf.reshape(im, tf.pack([-1, channels])) 106 | im_flat = tf.cast(im_flat, 'float32') 107 | Ia = tf.gather(im_flat, idx_a) 108 | Ib = tf.gather(im_flat, 
idx_b) 109 | Ic = tf.gather(im_flat, idx_c) 110 | Id = tf.gather(im_flat, idx_d) 111 | 112 | # and finally calculate interpolated values 113 | x0_f = tf.cast(x0, 'float32') 114 | x1_f = tf.cast(x1, 'float32') 115 | y0_f = tf.cast(y0, 'float32') 116 | y1_f = tf.cast(y1, 'float32') 117 | wa = tf.expand_dims(((x1_f-x) * (y1_f-y)), 1) 118 | wb = tf.expand_dims(((x1_f-x) * (y-y0_f)), 1) 119 | wc = tf.expand_dims(((x-x0_f) * (y1_f-y)), 1) 120 | wd = tf.expand_dims(((x-x0_f) * (y-y0_f)), 1) 121 | output = tf.add_n([wa*Ia, wb*Ib, wc*Ic, wd*Id]) 122 | return output 123 | 124 | def _meshgrid(height, width): 125 | with tf.variable_scope('_meshgrid'): 126 | # This should be equivalent to: 127 | # x_t, y_t = np.meshgrid(np.linspace(-1, 1, width), 128 | # np.linspace(-1, 1, height)) 129 | # ones = np.ones(np.prod(x_t.shape)) 130 | # grid = np.vstack([x_t.flatten(), y_t.flatten(), ones]) 131 | x_t = tf.matmul(tf.ones(shape=tf.pack([height, 1])), 132 | tf.transpose(tf.expand_dims(tf.linspace(-1.0, 1.0, width), 1), [1, 0])) 133 | y_t = tf.matmul(tf.expand_dims(tf.linspace(-1.0, 1.0, height), 1), 134 | tf.ones(shape=tf.pack([1, width]))) 135 | 136 | x_t_flat = tf.reshape(x_t, (1, -1)) 137 | y_t_flat = tf.reshape(y_t, (1, -1)) 138 | 139 | ones = tf.ones_like(x_t_flat) 140 | grid = tf.concat(0, [x_t_flat, y_t_flat, ones]) 141 | return grid 142 | 143 | def _transform(theta, input_dim, out_size): 144 | with tf.variable_scope('_transform'): 145 | num_batch = tf.shape(input_dim)[0] 146 | height = tf.shape(input_dim)[1] 147 | width = tf.shape(input_dim)[2] 148 | num_channels = tf.shape(input_dim)[3] 149 | theta = tf.reshape(theta, (-1, 2, 3)) 150 | theta = tf.cast(theta, 'float32') 151 | 152 | # grid of (x_t, y_t, 1), eq (1) in ref [1] 153 | height_f = tf.cast(height, 'float32') 154 | width_f = tf.cast(width, 'float32') 155 | out_height = out_size[0] 156 | out_width = out_size[1] 157 | grid = _meshgrid(out_height, out_width) 158 | grid = tf.expand_dims(grid, 0) 159 | grid = tf.reshape(grid, [-1]) 160 | grid = tf.tile(grid, tf.pack([num_batch])) 161 | grid = tf.reshape(grid, tf.pack([num_batch, 3, -1])) 162 | 163 | # Transform A x (x_t, y_t, 1)^T -> (x_s, y_s) 164 | T_g = tf.batch_matmul(theta, grid) 165 | x_s = tf.slice(T_g, [0, 0, 0], [-1, 1, -1]) 166 | y_s = tf.slice(T_g, [0, 1, 0], [-1, 1, -1]) 167 | x_s_flat = tf.reshape(x_s, [-1]) 168 | y_s_flat = tf.reshape(y_s, [-1]) 169 | 170 | input_transformed = _interpolate( 171 | input_dim, x_s_flat, y_s_flat, 172 | out_size) 173 | 174 | output = tf.reshape( 175 | input_transformed, tf.pack([num_batch, out_height, out_width, num_channels])) 176 | return output 177 | 178 | with tf.variable_scope(name): 179 | output = _transform(theta, U, out_size) 180 | return output 181 | 182 | 183 | def batch_transformer(U, thetas, out_size, name='BatchSpatialTransformer'): 184 | """Batch Spatial Transformer Layer 185 | 186 | Parameters 187 | ---------- 188 | 189 | U : float 190 | tensor of inputs [num_batch,height,width,num_channels] 191 | thetas : float 192 | a set of transformations for each input [num_batch,num_transforms,6] 193 | out_size : int 194 | the size of the output [out_height,out_width] 195 | 196 | Returns: float 197 | Tensor of size [num_batch*num_transforms,out_height,out_width,num_channels] 198 | """ 199 | with tf.variable_scope(name): 200 | num_batch, num_transforms = map(int, thetas.get_shape().as_list()[:2]) 201 | indices = [[i]*num_transforms for i in xrange(num_batch)] 202 | input_repeated = tf.gather(U, tf.reshape(indices, [-1])) 203 | return 
transformer(input_repeated, thetas, out_size) 204 | -------------------------------------------------------------------------------- /lab3_RNN/.gitignore: -------------------------------------------------------------------------------- 1 | *.jpg 2 | *.png 3 | -------------------------------------------------------------------------------- /lab3_RNN/confusionmatrix.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class ConfusionMatrix: 5 | """ 6 | Simple confusion matrix class 7 | row is the true class, column is the predicted class 8 | """ 9 | def __init__(self, num_classes, class_names=None): 10 | self.n_classes = num_classes 11 | if class_names is None: 12 | self.class_names = map(str, range(num_classes)) 13 | else: 14 | self.class_names = class_names 15 | 16 | # find max class_name and pad 17 | max_len = max(map(len, self.class_names)) 18 | self.max_len = max_len 19 | for idx, name in enumerate(self.class_names): 20 | if len(self.class_names) < max_len: 21 | self.class_names[idx] = name + " "*(max_len-len(name)) 22 | 23 | self.mat = np.zeros((num_classes,num_classes),dtype='int') 24 | 25 | def __str__(self): 26 | # calucate row and column sums 27 | col_sum = np.sum(self.mat, axis=1) 28 | row_sum = np.sum(self.mat, axis=0) 29 | 30 | s = [] 31 | 32 | mat_str = self.mat.__str__() 33 | mat_str = mat_str.replace('[','').replace(']','').split('\n') 34 | 35 | for idx, row in enumerate(mat_str): 36 | if idx == 0: 37 | pad = " " 38 | else: 39 | pad = "" 40 | class_name = self.class_names[idx] 41 | class_name = " " + class_name + " |" 42 | row_str = class_name + pad + row 43 | row_str += " |" + str(col_sum[idx]) 44 | s.append(row_str) 45 | 46 | row_sum = [(self.max_len+4)*" "+" ".join(map(str, row_sum))] 47 | hline = [(1+self.max_len)*" "+"-"*len(row_sum[0])] 48 | 49 | s = hline + s + hline + row_sum 50 | 51 | # add linebreaks 52 | s_out = [line+'\n' for line in s] 53 | return "".join(s_out) 54 | 55 | def batch_add(self, targets, preds): 56 | assert targets.shape == preds.shape 57 | assert len(targets) == len(preds) 58 | assert max(targets) < self.n_classes 59 | assert max(preds) < self.n_classes 60 | targets = targets.flatten() 61 | preds = preds.flatten() 62 | for i in range(len(targets)): 63 | self.mat[targets[i], preds[i]] += 1 64 | 65 | def get_errors(self): 66 | tp = np.asarray(np.diag(self.mat).flatten(),dtype='float') 67 | fn = np.asarray(np.sum(self.mat, axis=1).flatten(),dtype='float') - tp 68 | fp = np.asarray(np.sum(self.mat, axis=0).flatten(),dtype='float') - tp 69 | tn = np.asarray(np.sum(self.mat)*np.ones(self.n_classes).flatten(), 70 | dtype='float') - tp - fn - fp 71 | return tp, fn, fp, tn 72 | 73 | def accuracy(self): 74 | """ 75 | Calculates global accuracy 76 | :return: accuracy 77 | :example: >>> conf = ConfusionMatrix(3) 78 | >>> conf.batchAdd([0,0,1],[0,0,2]) 79 | >>> print conf.accuracy() 80 | """ 81 | tp, _, _, _ = self.get_errors() 82 | n_samples = np.sum(self.mat) 83 | return np.sum(tp) / n_samples 84 | 85 | def sensitivity(self): 86 | tp, tn, fp, fn = self.get_errors() 87 | res = tp / (tp + fn) 88 | res = res[~np.isnan(res)] 89 | return res 90 | 91 | def specificity(self): 92 | tp, tn, fp, fn = self.get_errors() 93 | res = tn / (tn + fp) 94 | res = res[~np.isnan(res)] 95 | return res 96 | 97 | def positive_predictive_value(self): 98 | tp, tn, fp, fn = self.get_errors() 99 | res = tp / (tp + fp) 100 | res = res[~np.isnan(res)] 101 | return res 102 | 103 | def negative_predictive_value(self): 104 | 
tp, tn, fp, fn = self.get_errors() 105 | res = tn / (tn + fn) 106 | res = res[~np.isnan(res)] 107 | return res 108 | 109 | def false_positive_rate(self): 110 | tp, tn, fp, fn = self.get_errors() 111 | res = fp / (fp + tn) 112 | res = res[~np.isnan(res)] 113 | return res 114 | 115 | def false_discovery_rate(self): 116 | tp, tn, fp, fn = self.get_errors() 117 | res = fp / (tp + fp) 118 | res = res[~np.isnan(res)] 119 | return res 120 | 121 | def F1(self): 122 | tp, tn, fp, fn = self.get_errors() 123 | res = (2*tp) / (2*tp + fp + fn) 124 | res = res[~np.isnan(res)] 125 | return res 126 | 127 | def matthews_correlation(self): 128 | tp, tn, fp, fn = self.get_errors() 129 | numerator = tp*tn - fp*fn 130 | denominator = np.sqrt((tp + fp)*(tp + fn)*(tn + fp)*(tn + fn)) 131 | res = numerator / denominator 132 | res = res[~np.isnan(res)] 133 | return res 134 | -------------------------------------------------------------------------------- /lab3_RNN/data_generator.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import numpy as np 3 | 4 | target_to_text = { 5 | '0':'zero', 6 | '1':'one', 7 | '2':'two', 8 | '3':'three', 9 | '4':'four', 10 | '5':'five', 11 | '6':'six', 12 | '7':'seven', 13 | '8':'eight', 14 | '9':'nine', 15 | } 16 | 17 | stop_character = start_character = '#' 18 | 19 | input_characters = " ".join(target_to_text.values()) 20 | valid_characters = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '#'] + \ 21 | list(set(input_characters)) 22 | 23 | def print_valid_characters(): 24 | l = '' 25 | for i,c in enumerate(valid_characters): 26 | l += "\'%s\'=%i,\t" % (c,i) 27 | print("Number of valid characters:", len(valid_characters)) 28 | print(l) 29 | 30 | ninput_chars = len(valid_characters) 31 | def get_batch(batch_size=100, min_digits = 3, max_digits=3): 32 | ''' 33 | Generates random sequences of integers and translates them to text i.e. 1->'one'. 34 | :param batch_size: number of samples to return 35 | :param min_digits: minimum length of target 36 | :param max_digits: maximum length of target 37 | ''' 38 | text_inputs = [] 39 | int_inputs = [] 40 | text_targets_in = [] 41 | text_targets_out = [] 42 | int_targets_in = [] 43 | int_targets_out = [] 44 | for i in range(batch_size): 45 | #convert integer into a list of digits 46 | tar_len = np.random.randint(min_digits,max_digits+1) 47 | text_target = inp_str = "".join(map(str,np.random.randint(0,10,tar_len))) 48 | text_target_in = start_character + text_target 49 | text_target_out = text_target + stop_character 50 | 51 | #generate the targets as a list of intergers 52 | int_target_in = map(lambda c: valid_characters.index(c), text_target_in) 53 | int_target_out = map(lambda c: valid_characters.index(c), text_target_out) 54 | 55 | #generate the text input 56 | text_input = " ".join(map(lambda k: target_to_text[k], inp_str)) 57 | #generate the inputs as a list of intergers 58 | int_input = map(lambda c: valid_characters.index(c), text_input) 59 | 60 | text_inputs.append(text_input) 61 | int_inputs.append(int_input) 62 | text_targets_in.append(text_target_in) 63 | text_targets_out.append(text_target_out) 64 | int_targets_in.append(int_target_in) 65 | int_targets_out.append(int_target_out) 66 | 67 | #create the input matrix, mask and seq_len - note that we zero pad the shorter sequences. 
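# (Added illustration with assumed example values -- not in the original file.)
# With encoded inputs of unequal length such as [[3, 7], [3]], the padding
# below produces
#   inputs = [[3, 7],
#             [3, 0]]
# and inputs_seqlen = [2, 1], so downstream code can ignore the zero padding.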
68 | max_input_len = max(map(len, int_inputs)) 69 | inputs = np.zeros((batch_size, max_input_len)) 70 | # input_masks = np.zeros((batch_size,max_input_len)) 71 | for (i,inp) in enumerate(int_inputs): 72 | cur_len = len(inp) 73 | inputs[i,:cur_len] = inp 74 | # input_masks[i,:cur_len] = 1 75 | inputs_seqlen = np.asarray(map(len, int_inputs)) 76 | 77 | max_target_in_len = max(map(len, int_targets_in)) 78 | targets_in = np.zeros((batch_size, max_target_in_len)) 79 | targets_mask = np.zeros((batch_size, max_target_in_len)) 80 | for (i, tar) in enumerate(int_targets_in): 81 | cur_len = len(tar) 82 | targets_in[i, :cur_len] = tar 83 | targets_seqlen = np.asarray(map(len, int_targets_in)) 84 | 85 | max_target_out_len = max(map(len, int_targets_out)) 86 | targets_out = np.zeros((batch_size, max_target_in_len)) 87 | for (i,tar) in enumerate(int_targets_out): 88 | cur_len = len(tar) 89 | targets_out[i,:cur_len] = tar 90 | targets_mask[i,:cur_len] = 1 91 | 92 | return inputs.astype('int32'), \ 93 | inputs_seqlen.astype('int32'), \ 94 | targets_in.astype('int32'), \ 95 | targets_out.astype('int32'), \ 96 | targets_seqlen.astype('int32'), \ 97 | targets_mask.astype('float32'), \ 98 | text_inputs, \ 99 | text_targets_in, \ 100 | text_targets_out 101 | 102 | if __name__ == '__main__': 103 | batch_size = 3 104 | inputs, inputs_seqlen, targets_in, targets_out, targets_seqlen, targets_mask, \ 105 | text_inputs, text_targets_in, text_targets_out = \ 106 | get_batch(batch_size=batch_size, max_digits=2, min_digits=1) 107 | 108 | print("input types:", inputs.dtype, inputs_seqlen.dtype, targets_in.dtype, targets_out.dtype, targets_seqlen.dtype) 109 | print(print_valid_characters()) 110 | print("Stop/start character = #") 111 | 112 | for i in range(batch_size): 113 | print("\nSAMPLE",i) 114 | print("TEXT INPUTS:\t\t\t", text_inputs[i]) 115 | print("TEXT TARGETS INPUT:\t\t", text_targets_in[i]) 116 | print("TEXT TARGETS OUTPUT:\t\t", text_targets_out[i]) 117 | print("ENCODED INPUTS:\t\t\t", inputs[i]) 118 | print("INPUTS SEQUENCE LENGTH:\t\t", inputs_seqlen[i]) 119 | print("ENCODED TARGETS INPUT:\t\t", targets_in[i]) 120 | print("ENCODED TARGETS OUTPUT:\t\t", targets_out[i]) 121 | print("TARGETS SEQUENCE LENGTH:\t", targets_seqlen[i]) 122 | print("TARGETS MASK:\t\t\t", targets_mask[i]) -------------------------------------------------------------------------------- /lab3_RNN/enc-dec.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab3_RNN/enc-dec.png -------------------------------------------------------------------------------- /lab3_RNN/lab3_RNN.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "%matplotlib inline \n", 12 | "%matplotlib nbagg\n", 13 | "import tensorflow as tf\n", 14 | "import matplotlib\n", 15 | "import numpy as np\n", 16 | "import matplotlib.pyplot as plt\n", 17 | "from IPython import display\n", 18 | "from data_generator import get_batch, print_valid_characters\n", 19 | "from tensorflow.python.framework.ops import reset_default_graph\n", 20 | "\n", 21 | "import tf_utils" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "# Recurrent Neural Networks\n", 29 | "\n", 30 | "Recurrent neural networks are the 
natural type of neural network to use for sequential data, i.e. time series analysis, translation, speech recognition, biological sequence analysis, etc. Recurrent neural networks work by recursively applying the same operation at each time step of the data sequence and having layers that pass information from the previous time step to the current one. They can therefore naturally handle input of varying length. Recurrent networks can be used for several prediction tasks including: sequence-to-class, sequence tagging, and sequence-to-sequence predictions.\n", 31 | "\n", 32 | "In this exercise we'll implement an Encoder-Decoder RNN based on the GRU unit for a simple sequence-to-sequence translation task. This type of model has shown impressive performance in Neural Machine Translation and Image Caption generation. \n", 33 | "\n", 34 | "For more in-depth background material on RNNs please see [Supervised Sequence Labelling with Recurrent\n", 35 | "Neural Networks](https://www.cs.toronto.edu/~graves/preprint.pdf) by Alex Graves.\n", 36 | "\n", 37 | "We know that LSTMs and GRUs are difficult to understand. A very good non-mathematical introduction is [Chris Olah's blog](http://colah.github.io/posts/2015-08-Understanding-LSTMs/). (All the posts are nice and cover various topics within machine learning)." 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "# Encoder-Decoder\n", 45 | "In the encoder-decoder structure one RNN (blue) encodes the input and a second RNN (red) calculates the target values. One essential step is to let the encoder and decoder communicate. In the simplest approach you use the last hidden state of the encoder to initialize the decoder. Other approaches let the decoder attend to different parts of the encoded input at different timesteps in the decoding process. \n", 46 | "\n", 47 | "![Encoder-Decoder model](enc-dec.png)\n", 48 | "\n", 49 | "In our implementation we use an RNN with gated recurrent units (GRU) as encoder. We then use the last hidden state of the encoder ($h^{enc}_T$) as input to the decoder, which is also a GRU RNN. \n", 50 | "\n", 51 | "### RNNs in TensorFlow\n", 52 | "TensorFlow has implementations of LSTM and GRU units. Both implementations assume that the input tensor has the shape **(batch_size, seq_len, num_features)**, unless you have `time\_major=True`. In this exercise we will use the GRU unit since it only stores a single hidden value per neuron (LSTMs store two) and is approximately twice as fast as the LSTM unit.\n", 53 | "\n", 54 | "As stated above we will implement an Encoder-Decoder model. The simplest way to do this is to encode the input sequence using the Encoder model. We will then use the last hidden state of the Encoder $h^{enc}_T$ as input to the decoder model, which then uses this information (simply a fixed-length vector of numbers) to produce the targets. There are (at least) two ways to input $h^{enc}_T$ into the decoder, as sketched after this list:\n", 55 | "\n", 56 | "1. Repeatedly use $h^{enc}_T$ as input to the Decoder at each decode time step, as well as the previously computed word\n", 57 | "2. Initialize the decoder using $h^{enc}_T$ and run the decoder without any inputs\n", 58 | "\n", 59 | "In this exercise we will follow the second approach because it's easier to implement. To do this we need to create a TensorFlow layer that takes $h^{enc}_T$ as its initial state." 
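As a conceptual sketch of approach 2 (variable names are assumed to match the model cell further down; the exercise itself uses the `tf_utils.decoder` wrapper rather than this plain `dynamic_rnn` call), seeding the decoder with the encoder's final state looks roughly like:

```python
# Sketch only: the encoder's final state becomes the decoder's initial state.
enc_cell = tf.nn.rnn_cell.GRUCell(num_units)
_, enc_state = tf.nn.dynamic_rnn(cell=enc_cell, inputs=X_embedded,
                                 sequence_length=X_len, dtype=tf.float32)

dec_cell = tf.nn.rnn_cell.GRUCell(num_units)
dec_out, _ = tf.nn.dynamic_rnn(cell=dec_cell, inputs=t_embedded,
                               sequence_length=t_len,
                               initial_state=enc_state,  # h_T^enc seeds the decoder
                               scope="decoder")
```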
60 | ]
61 | },
62 | {
63 | "cell_type": "markdown",
64 | "metadata": {},
65 | "source": [
66 | "### The Data\n",
67 | "Since RNN models can be very slow to train on large real-world datasets we will generate some simpler training data for this exercise. The task for the RNN is simply to translate a string of letters spelling the numbers between 0 and 9 into the corresponding digits, e.g.\n",
68 | "\n",
69 | "\"one two five\" --> \"125#\" (we use # as a special end-of-sequence character)\n",
70 | "\n",
71 | "To input the strings into the RNN model we translate the characters into a vector of integers using a simple translation table (e.g. 'h'->16, 'o'->17, etc.). The code below prints a few input/output pairs using the *get_batch* function, which randomly generates the data.\n",
72 | "\n",
73 | "Note that, as shown in the illustration above, the end-of-sequence tag is flipped for the input to the decoder and used in the beginning instead of the end. This tag is known as the start-of-sequence tag, but often the end-of-sequence tag is just reused for this purpose.\n",
74 | "\n",
75 | "In the data loader below you will see two targets: the target input and the target output. The input is used to compute the translation and the output is used for the loss function."
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": null,
81 | "metadata": {
82 | "collapsed": false
83 | },
84 | "outputs": [],
85 | "source": [
86 | "batch_size = 3\n",
87 | "inputs, inputs_seqlen, targets_in, targets_out, targets_seqlen, targets_mask, \\\n",
88 | "text_inputs, text_targets_in, text_targets_out = \\\n",
89 | "    get_batch(batch_size=batch_size, max_digits=2, min_digits=1)\n",
90 | "\n",
91 | "print \"input types:\", inputs.dtype, inputs_seqlen.dtype, targets_in.dtype, targets_out.dtype, targets_seqlen.dtype\n",
92 | "print print_valid_characters()\n",
93 | "print \"Stop/start character = #\"\n",
94 | "\n",
95 | "for i in range(batch_size):\n",
96 | "    print \"\\nSAMPLE\",i\n",
97 | "    print \"TEXT INPUTS:\\t\\t\\t\", text_inputs[i]\n",
98 | "    print \"TEXT TARGETS INPUT:\\t\\t\", text_targets_in[i]\n",
99 | "    print \"TEXT TARGETS OUTPUT:\\t\\t\", text_targets_out[i]\n",
100 | "    print \"ENCODED INPUTS:\\t\\t\\t\", inputs[i]\n",
101 | "    print \"INPUTS SEQUENCE LENGTH:\\t\\t\", inputs_seqlen[i]\n",
102 | "    print \"ENCODED TARGETS INPUT:\\t\\t\", targets_in[i]\n",
103 | "    print \"ENCODED TARGETS OUTPUT:\\t\\t\", targets_out[i]\n",
104 | "    print \"TARGETS SEQUENCE LENGTH:\\t\", targets_seqlen[i]\n",
105 | "    print \"TARGETS MASK:\\t\\t\\t\", targets_mask[i]"
106 | ]
107 | },
108 | {
109 | "cell_type": "markdown",
110 | "metadata": {},
111 | "source": [
112 | "### Encoder Decoder model setup\n",
113 | "Below is the TensorFlow model definition. We use an embedding layer to go from integer representation to vector representation of the input.\n",
114 | "\n",
115 | "Note that we have made use of a custom decoder wrapper, which can be found in `tf_utils.py`.\n",
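"\n",
"As a concrete (illustrative) reminder of the format the decoder consumes: for the text input \"one two\" the decoder input would be \"#12\" while the loss is computed against \"12#\", i.e. the same sequence shifted one step. Feeding the ground-truth previous symbol to the decoder during training like this is commonly known as teacher forcing."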
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": null,
121 | "metadata": {
122 | "collapsed": false
123 | },
124 | "outputs": [],
125 | "source": [
126 | "# resetting the graph\n",
127 | "reset_default_graph()\n",
128 | "\n",
129 | "# Setting up hyperparameters and general configs\n",
130 | "MAX_DIGITS = 5\n",
131 | "MIN_DIGITS = 5\n",
132 | "NUM_INPUTS = 27\n",
133 | "NUM_OUTPUTS = 11 #(0-9 + '#')\n",
134 | "\n",
135 | "BATCH_SIZE = 100\n",
136 | "# try various learning rates 1e-2 to 1e-5\n",
137 | "LEARNING_RATE = 0.005\n",
138 | "X_EMBEDDINGS = 8\n",
139 | "t_EMBEDDINGS = 8\n",
140 | "NUM_UNITS_ENC = 10\n",
141 | "NUM_UNITS_DEC = 10\n",
142 | "\n",
143 | "\n",
144 | "# Setting up placeholders, these are the tensors that we \"feed\" to our network\n",
145 | "Xs = tf.placeholder(tf.int32, shape=[None, None], name='X_input')\n",
146 | "ts_in = tf.placeholder(tf.int32, shape=[None, None], name='t_input_in')\n",
147 | "ts_out = tf.placeholder(tf.int32, shape=[None, None], name='t_input_out')\n",
148 | "X_len = tf.placeholder(tf.int32, shape=[None], name='X_len')\n",
149 | "t_len = tf.placeholder(tf.int32, shape=[None], name='t_len')\n",
150 | "t_mask = tf.placeholder(tf.float32, shape=[None, None], name='t_mask')\n",
151 | "\n",
152 | "# Building the model\n",
153 | "\n",
154 | "# first we build the embeddings to make our characters into dense, trainable vectors\n",
155 | "X_embeddings = tf.get_variable('X_embeddings', [NUM_INPUTS, X_EMBEDDINGS],\n",
156 | "                               initializer=tf.random_normal_initializer(stddev=0.1))\n",
157 | "t_embeddings = tf.get_variable('t_embeddings', [NUM_OUTPUTS, t_EMBEDDINGS],\n",
158 | "                               initializer=tf.random_normal_initializer(stddev=0.1))\n",
159 | "\n",
160 | "# setting up weights for computing the final output\n",
161 | "W_out = tf.get_variable('W_out', [NUM_UNITS_DEC, NUM_OUTPUTS])\n",
162 | "b_out = tf.get_variable('b_out', [NUM_OUTPUTS])\n",
163 | "\n",
164 | "X_embedded = tf.gather(X_embeddings, Xs, name='embed_X')\n",
165 | "t_embedded = tf.gather(t_embeddings, ts_in, name='embed_t')\n",
166 | "\n",
167 | "# forward encoding\n",
168 | "enc_cell = tf.nn.rnn_cell.GRUCell(NUM_UNITS_ENC)\n",
169 | "_, enc_state = tf.nn.dynamic_rnn(cell=enc_cell, inputs=X_embedded,\n",
170 | "                                 sequence_length=X_len, dtype=tf.float32)\n",
171 | "# use the lines below in case TF's dynamic_rnn gives issues\n",
172 | "#enc_state, _ = tf_utils.encoder(X_embedded, X_len, 'encoder', NUM_UNITS_ENC)\n",
173 | "#\n",
174 | "#enc_state = tf.concat(1, [enc_state, enc_state])\n",
175 | "\n",
176 | "# decoding\n",
177 | "# note that we are using a wrapper for decoding here, this wrapper is hardcoded to only use GRU\n",
178 | "# check out tf_utils to see how you make your own decoder\n",
179 | "dec_out, valid_dec_out = tf_utils.decoder(enc_state, t_embedded, t_len, \n",
180 | "                                          NUM_UNITS_DEC, t_embeddings,\n",
181 | "                                          W_out, b_out)\n",
182 | "\n",
183 | "# reshaping to have [batch_size*seqlen, num_units]\n",
184 | "out_tensor = tf.reshape(dec_out, [-1, NUM_UNITS_DEC])\n",
185 | "valid_out_tensor = tf.reshape(valid_dec_out, [-1, NUM_UNITS_DEC])\n",
186 | "# computing output\n",
187 | "out_tensor = tf.matmul(out_tensor, W_out) + b_out\n",
188 | "valid_out_tensor = tf.matmul(valid_out_tensor, W_out) + b_out\n",
189 | "# reshaping back to sequence\n",
190 | "b_size = tf.shape(X_len)[0] # use a variable we know has batch_size in [0]\n",
191 | "seq_len = tf.shape(t_embedded)[1] # variable we know has sequence length in [1]\n",
192 | "num_out = tf.constant(NUM_OUTPUTS) # casting NUM_OUTPUTS to a tensor variable\n",
193 | "out_shape = tf.concat(0, [tf.expand_dims(b_size, 0),\n",
194 | "                          tf.expand_dims(seq_len, 0),\n",
195 | "                          tf.expand_dims(num_out, 0)])\n",
196 | "out_tensor = tf.reshape(out_tensor, out_shape)\n",
197 | "valid_out_tensor = tf.reshape(valid_out_tensor, out_shape)\n",
198 | "# restoring the static shape information lost in the reshape\n",
199 | "#out_tensor.set_shape([None, None, NUM_OUTPUTS])\n",
200 | "y = out_tensor\n",
201 | "y_valid = valid_out_tensor"
202 | ]
203 | },
204 | {
205 | "cell_type": "code",
206 | "execution_count": null,
207 | "metadata": {
208 | "collapsed": false
209 | },
210 | "outputs": [],
211 | "source": [
212 | "# print all the variable names and shapes\n",
213 | "for var in tf.all_variables():\n",
214 | "    s = var.name + \" \"*(40-len(var.name))\n",
215 | "    print s, var.value().get_shape()"
216 | ]
217 | },
218 | {
219 | "cell_type": "markdown",
220 | "metadata": {},
221 | "source": [
222 | "### Defining the cost function, gradient clipping and accuracy\n",
223 | "Because the targets are categorical we use the cross entropy error.\n",
224 | "As the data is sequential we use the sequence-to-sequence cross entropy supplied in `tf_utils.py`.\n",
225 | "We use the Adam optimizer but you can experiment with the different optimizers implemented in [TensorFlow](https://www.tensorflow.org/versions/r0.10/api_docs/python/train.html#optimizers)."
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": null,
231 | "metadata": {
232 | "collapsed": false
233 | },
234 | "outputs": [],
235 | "source": [
236 | "def loss_and_acc(preds):\n",
237 | "    # sequence_loss_tensor is a modification of TensorFlow's own sequence_to_sequence_loss\n",
238 | "    # TensorFlow's seq2seq loss works with a 2D list instead of a 3D tensor\n",
239 | "    loss = tf_utils.sequence_loss_tensor(preds, ts_out, t_mask, NUM_OUTPUTS) # notice that we use ts_out here!\n",
240 | "    # if you want regularization\n",
241 | "    #reg_scale = 0.00001\n",
242 | "    #regularize = tf.contrib.layers.l2_regularizer(reg_scale)\n",
243 | "    #params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)\n",
244 | "    #reg_term = sum([regularize(param) for param in params])\n",
245 | "    #loss += reg_term\n",
246 | "    # calculate accuracy\n",
247 | "    argmax = tf.to_int32(tf.argmax(preds, 2))\n",
248 | "    correct = tf.to_float(tf.equal(argmax, ts_out)) * t_mask\n",
249 | "    accuracy = tf.reduce_sum(correct) / tf.reduce_sum(t_mask)\n",
250 | "    return loss, accuracy, argmax\n",
251 | "\n",
252 | "loss, accuracy, predictions = loss_and_acc(y)\n",
253 | "loss_valid, accuracy_valid, predictions_valid = loss_and_acc(y_valid)\n",
254 | "\n",
255 | "# use global_step to keep track of our iterations\n",
256 | "global_step = tf.Variable(0, name='global_step', trainable=False)\n",
257 | "# pick optimizer, try momentum or adadelta\n",
258 | "optimizer = tf.train.AdamOptimizer(LEARNING_RATE)\n",
259 | "# extract gradients for each variable\n",
260 | "grads_and_vars = optimizer.compute_gradients(loss)\n",
261 | "# add below for clipping by norm\n",
262 | "#gradients, variables = zip(*grads_and_vars) # unzip list of tuples\n",
263 | "#clipped_gradients, global_norm = (\n",
264 | "#    tf.clip_by_global_norm(gradients, clip_norm)) # with e.g. clip_norm = 1\n",
265 | "#grads_and_vars = zip(clipped_gradients, variables)\n",
266 | "# apply gradients and make trainable function\n",
267 | "train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)"
268 | ]
269 | },
270 | {
271 | "cell_type": "code",
272 | "execution_count": null,
273 | 
"metadata": { 274 | "collapsed": false 275 | }, 276 | "outputs": [], 277 | "source": [ 278 | "# print all the variable names and shapes\n", 279 | "# notice that we now have the optimizer Adam as well!\n", 280 | "for var in tf.all_variables():\n", 281 | " s = var.name + \" \"*(40-len(var.name))\n", 282 | " print s, var.value().get_shape()" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": null, 288 | "metadata": { 289 | "collapsed": false 290 | }, 291 | "outputs": [], 292 | "source": [ 293 | "# as always, test the forward pass and initialize the tf.Session!\n", 294 | "# here is some dummy data\n", 295 | "batch_size=3\n", 296 | "inputs, inputs_seqlen, targets_in, targets_out, targets_seqlen, targets_mask, \\\n", 297 | "text_inputs, text_targets_in, text_targets_out = \\\n", 298 | " get_batch(batch_size=batch_size, max_digits=7, min_digits=2)\n", 299 | "\n", 300 | "for i in range(batch_size):\n", 301 | " print \"\\nSAMPLE\",i\n", 302 | " print \"TEXT INPUTS:\\t\\t\\t\", text_inputs[i]\n", 303 | " print \"TEXT TARGETS INPUT:\\t\\t\", text_targets_in[i]\n", 304 | "\n", 305 | "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n", 306 | "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n", 307 | "# initialize the Session\n", 308 | "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n", 309 | "# test train part\n", 310 | "sess.run(tf.initialize_all_variables())\n", 311 | "feed_dict = {Xs: inputs, X_len: inputs_seqlen, ts_in: targets_in,\n", 312 | " ts_out: targets_out, t_len: targets_seqlen}\n", 313 | "fetches = [y]\n", 314 | "res = sess.run(fetches=fetches, feed_dict=feed_dict)\n", 315 | "print \"y\", res[0].shape\n", 316 | "\n", 317 | "# test validation part\n", 318 | "fetches = [y_valid]\n", 319 | "res = sess.run(fetches=fetches, feed_dict=feed_dict)\n", 320 | "print \"y_valid\", res[0].shape" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "metadata": { 327 | "collapsed": false 328 | }, 329 | "outputs": [], 330 | "source": [ 331 | "#Generate some validation data\n", 332 | "X_val, X_len_val, t_in_val, t_out_val, t_len_val, t_mask_val, \\\n", 333 | "text_inputs_val, text_targets_in_val, text_targets_out_val = \\\n", 334 | " get_batch(batch_size=5000, max_digits=MAX_DIGITS,min_digits=MIN_DIGITS)\n", 335 | "print \"X_val\", X_val.shape\n", 336 | "print \"t_out_val\", t_out_val.shape" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "# Training" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": null, 349 | "metadata": { 350 | "collapsed": false 351 | }, 352 | "outputs": [], 353 | "source": [ 354 | "# setting up running parameters\n", 355 | "val_interval = 5000\n", 356 | "samples_to_process = 3e5\n", 357 | "samples_processed = 0\n", 358 | "samples_val = []\n", 359 | "costs, accs_val = [], []\n", 360 | "plt.figure()\n", 361 | "try:\n", 362 | " while samples_processed < samples_to_process:\n", 363 | " # load data\n", 364 | " X_tr, X_len_tr, t_in_tr, t_out_tr, t_len_tr, t_mask_tr, \\\n", 365 | " text_inputs_tr, text_targets_in_tr, text_targets_out_tr = \\\n", 366 | " get_batch(batch_size=BATCH_SIZE,max_digits=MAX_DIGITS,min_digits=MIN_DIGITS)\n", 367 | " # make fetches\n", 368 | " fetches_tr = [train_op, loss, accuracy]\n", 369 | " # set up feed dict\n", 370 | " feed_dict_tr = {Xs: X_tr, X_len: X_len_tr, ts_in: t_in_tr,\n", 371 | " ts_out: t_out_tr, t_len: t_len_tr, t_mask: t_mask_tr}\n", 372 | " # 
run the model\n",
373 | "        res = tuple(sess.run(fetches=fetches_tr, feed_dict=feed_dict_tr))\n",
374 | "        _, batch_cost, batch_acc = res\n",
375 | "        costs += [batch_cost]\n",
376 | "        samples_processed += BATCH_SIZE\n",
377 | "        #if samples_processed % 1000 == 0: print batch_cost, batch_acc\n",
378 | "        #validation data\n",
379 | "        if samples_processed % val_interval == 0:\n",
380 | "            #print \"validating\"\n",
381 | "            fetches_val = [accuracy_valid, y_valid]\n",
382 | "            feed_dict_val = {Xs: X_val, X_len: X_len_val, ts_in: t_in_val,\n",
383 | "                             ts_out: t_out_val, t_len: t_len_val, t_mask: t_mask_val}\n",
384 | "            res = tuple(sess.run(fetches=fetches_val, feed_dict=feed_dict_val))\n",
385 | "            acc_val, output_val = res\n",
386 | "            samples_val += [samples_processed]\n",
387 | "            accs_val += [acc_val]\n",
388 | "            plt.plot(samples_val, accs_val, 'g-')\n",
389 | "            plt.ylabel('Validation Accuracy', fontsize=15)\n",
390 | "            plt.xlabel('Processed samples', fontsize=15)\n",
391 | "            plt.title('', fontsize=20)\n",
392 | "            plt.grid('on')\n",
393 | "            plt.savefig(\"out.png\")\n",
394 | "            display.display(display.Image(filename=\"out.png\"))\n",
395 | "            display.clear_output(wait=True)\n",
396 | "except KeyboardInterrupt:\n",
397 | "    pass"
398 | ]
399 | },
400 | {
401 | "cell_type": "code",
402 | "execution_count": null,
403 | "metadata": {
404 | "collapsed": false,
405 | "scrolled": true
406 | },
407 | "outputs": [],
408 | "source": [
409 | "#plot of validation accuracy for each target position\n",
410 | "plt.figure(figsize=(7,7))\n",
411 | "plt.plot(np.mean(np.argmax(output_val,axis=2)==t_out_val,axis=0))\n",
412 | "plt.ylabel('Accuracy', fontsize=15)\n",
413 | "plt.xlabel('Target position', fontsize=15)\n",
414 | "#plt.title('', fontsize=20)\n",
415 | "plt.grid('on')\n",
416 | "plt.show()\n",
417 | "#why does the plot look like this?"
418 | ]
419 | },
420 | {
421 | "cell_type": "markdown",
422 | "metadata": {},
423 | "source": [
424 | "# Exercises:\n",
425 | "\n",
426 | "1. The model has two GRU networks. The ```GRUEncoder``` and the ```GRUDecoder```.\n",
427 | "A GRU is parameterized by an update gate `z`, a reset gate `r` and the cell `c`.\n",
428 | "Under normal circumstances, such as in the TensorFlow GRUCell implementation, these gates have been stacked for faster computation, but in the custom decoder each weight and bias is kept separate, as described in the original [GRU article](https://arxiv.org/abs/1406.1078).\n",
429 | "Thus we have the following weights and biases: ```{decoder/W_z_x:0, decoder/W_z_h:0, decoder/b_z:0, decoder/W_r_x:0, decoder/W_r_h:0, decoder/b_r:0, decoder/W_c_x:0, decoder/W_c_h:0, decoder/b_h:0}```.\n",
430 | "Try to explain the shape of ```decoder/W_z_x:0``` and ```decoder/W_z_h:0```. Why are they different? You can find the equations for the GRU at: [GRU](http://lasagne.readthedocs.io/en/latest/modules/layers/recurrent.html#lasagne.layers.GRULayer). \n",
431 | "\n",
432 | "2. The GRU unit is able to ignore the input and just copy the previous hidden state. In the beginning of training this might be desirable behaviour because it helps the model learn long-range dependencies. You can make the model ignore the input by modifying initial bias values. What bias would you modify and how would you modify it? Again you'll need to refer to the GRU equations: [GRU](http://lasagne.readthedocs.io/en/latest/modules/layers/recurrent.html#lasagne.layers.GRULayer)\n",
433 | "Further, if you look into `tf_utils.py` and search for the `decoder(...)` function, you will see that the initializer for each weight and bias can be changed.\n",
434 | "\n",
435 | "3. Try setting MIN_DIGITS and MAX_DIGITS to 20\n",
436 | "\n",
437 | "4. What is the final validation performance? Why do you think it is not better? Comment on the accuracy for each position of the output symbols.\n",
438 | "\n",
439 | "5. Why do you think the validation performance looks more \"jig-saw\"-like compared to the FFN and CNN models?\n",
440 | "\n",
441 | "6. In the example we stack a softmax layer on top of a recurrent layer. Explain how the code snippet below achieves that."
442 | ]
443 | },
444 | {
445 | "cell_type": "code",
446 | "execution_count": null,
447 | "metadata": {
448 | "collapsed": false
449 | },
450 | "outputs": [],
451 | "source": [
452 | "reset_default_graph()\n",
453 | "\n",
454 | "bs_, seqlen_, numinputs_ = 16, 140, 40\n",
455 | "x_pl_ = tf.placeholder(tf.float32, [bs_, seqlen_, numinputs_])\n",
456 | "gru_cell_ = tf.nn.rnn_cell.GRUCell(10)\n",
457 | "l_gru_, gru_state_ = tf.nn.dynamic_rnn(gru_cell_, x_pl_, dtype=tf.float32)\n",
458 | "l_reshape_ = tf.reshape(l_gru_, [-1, 10])\n",
459 | "\n",
460 | "l_softmax_ = tf.contrib.layers.fully_connected(l_reshape_, 11, activation_fn=tf.nn.softmax)\n",
461 | "l_softmax_seq_ = tf.reshape(l_softmax_, [bs_, seqlen_, -1])\n",
462 | "\n",
463 | "print \"l_input_\", x_pl_.get_shape()\n",
464 | "print \"l_gru_\", l_gru_.get_shape()\n",
465 | "print \"l_reshape_\", l_reshape_.get_shape()\n",
466 | "print \"l_softmax_\", l_softmax_.get_shape()\n",
467 | "print \"l_softmax_seq_\", l_softmax_seq_.get_shape()"
468 | ]
469 | },
470 | {
471 | "cell_type": "markdown",
472 | "metadata": {},
473 | "source": [
474 | "7. Optional: You are interested in doing sentiment analysis on tweets, i.e. classification as positive or negative. You decide to read over the tweet sequence and use the last hidden state to do the classification. How can you modify the small network above to output only a single classification for the whole sequence? Hints: look at `gru_state_` or [tf.slice](https://www.tensorflow.org/versions/r0.10/api_docs/python/array_ops.html#slice) in the API.\n",
475 | "\n",
476 | "\n",
477 | "8. Optional: Bidirectional Encoder. Bidirectional encoders are usually implemented by running a forward model and a backward model (a forward model on a reversed sequence) separately and then concatenating them before passing them on to the next layer. To reverse the sequence try looking at [tf.reverse_sequence](https://www.tensorflow.org/versions/r0.10/api_docs/python/array_ops.html#reverse_sequence)\n",
478 | "\n",
479 | "```\n",
480 | "enc_cell = tf.nn.rnn_cell.GRUCell(NUM_UNITS_ENC)\n",
481 | "_, enc_state = tf.nn.dynamic_rnn(cell=enc_cell, inputs=X_embedded,\n",
482 | "                                 sequence_length=X_len, dtype=tf.float32, scope=\"rnn_forward\")\n",
483 | "\n",
484 | "X_embedded_backwards = tf.reverse_sequence(X_embedded, tf.to_int64(X_len), 1)\n",
485 | "enc_cell_backwards = tf.nn.rnn_cell.GRUCell(NUM_UNITS_ENC)\n",
486 | "_, enc_state_backwards = tf.nn.dynamic_rnn(cell=enc_cell_backwards, inputs=X_embedded_backwards,\n",
487 | "                                           sequence_length=X_len, dtype=tf.float32, scope=\"rnn_backward\")\n",
488 | "\n",
489 | "enc_state = tf.concat(1, [enc_state, enc_state_backwards])\n",
490 | "```\n",
491 | "\n",
492 | "Note: you will need to double NUM_UNITS_DEC, as the decoder wrapper currently does not support different encoder and decoder sizes."
493 | ]
494 | },
495 | {
496 | "cell_type": "markdown",
497 | "metadata": {},
498 | "source": [
499 | "## Attention Decoder (GRU)\n",
500 | "Soft attention for recurrent neural networks has recently attracted a lot of interest.\n",
501 | "These methods let the decoder model selectively focus on which part of the encoder sequence it will use for each decoded output symbol.\n",
502 | "This relieves the encoder from having to compress the input sequence into a fixed-size vector representation passed on to the decoder.\n",
503 | "Secondly, we can interrogate the decoder network about where it attends while producing the outputs.\n",
504 | "Below we'll implement a GRU decoder with selective attention and show that it significantly improves the performance of the toy translation task.\n",
505 | "\n",
506 | "The seminal attention paper is https://arxiv.org/pdf/1409.0473v7.pdf\n",
507 | "\n",
508 | "The principle of attention models is simple. \n",
509 | "\n",
510 | "1. Use the encoder to get the hidden representation $\\{h^1_e, \\dots, h^n_e\\}$ for each position in the input sequence. \n",
511 | "2. For timestep $t$ in the decoder, compute $a_m = f(h^m_e, h^d_t)$ for $m = 1, \\dots, n$, where $f$ is a function returning a scalar value. \n",
512 | "3. You can then normalize the sequence of scalars $\\{a_1, \\dots, a_n\\}$ to get probabilities $\\{p_1, \\dots, p_n\\}$.\n",
513 | "4. Weight each $h^m_e$ by its probability $p_m$ and sum to get $h_{in}$.\n",
514 | "5. Use $h_{in}$ as an additional input to the decoder. $h_{in}$ is recalculated each time the decoder is updated. A minimal sketch of steps 2-4 is shown after this list.\n",
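"\n",
"The following is an illustrative NumPy sketch of one attention step (additive, Bahdanau-style scoring; not the exact `tf_utils.attention_decoder` implementation, and the parameter arrays `W_a`, `U_a` and `v_a` are assumed initialized elsewhere):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def attention_step(h_enc, h_dec, W_a, U_a, v_a):\n",
"    # h_enc: (n, enc_dim) encoder states; h_dec: (dec_dim,) current decoder state\n",
"    e = np.tanh(h_enc.dot(U_a) + h_dec.dot(W_a)).dot(v_a)  # (n,) scalar scores, step 2\n",
"    p = np.exp(e - e.max()); p /= p.sum()                  # (n,) probabilities, step 3\n",
"    return p.dot(h_enc)                                    # (enc_dim,) context h_in, step 4\n",
"```"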
515 | ]
516 | },
517 | {
518 | "cell_type": "code",
519 | "execution_count": null,
520 | "metadata": {
521 | "collapsed": false
522 | },
523 | "outputs": [],
524 | "source": [
525 | "# resetting the graph\n",
526 | "reset_default_graph()\n",
527 | "\n",
528 | "# Setting up hyperparameters and general configs\n",
529 | "MAX_DIGITS = 10\n",
530 | "MIN_DIGITS = 10\n",
531 | "NUM_INPUTS = 27\n",
532 | "NUM_OUTPUTS = 11 #(0-9 + '#')\n",
533 | "\n",
534 | "BATCH_SIZE = 100\n",
535 | "# try various learning rates 1e-2 to 1e-5\n",
536 | "LEARNING_RATE = 0.005\n",
537 | "X_EMBEDDINGS = 8\n",
538 | "t_EMBEDDINGS = 8\n",
539 | "NUM_UNITS_ENC = 10\n",
540 | "NUM_UNITS_DEC = 10\n",
541 | "NUM_UNITS_ATTN = 20\n",
542 | "\n",
543 | "\n",
544 | "# Setting up placeholders, these are the tensors that we \"feed\" to our network\n",
545 | "Xs = tf.placeholder(tf.int32, shape=[None, None], name='X_input')\n",
546 | "ts_in = tf.placeholder(tf.int32, shape=[None, None], name='t_input_in')\n",
547 | "ts_out = tf.placeholder(tf.int32, shape=[None, None], name='t_input_out')\n",
548 | "X_len = tf.placeholder(tf.int32, shape=[None], name='X_len')\n",
549 | "t_len = tf.placeholder(tf.int32, shape=[None], name='t_len')\n",
550 | "t_mask = tf.placeholder(tf.float32, shape=[None, None], name='t_mask')\n",
551 | "\n",
552 | "# Building the model\n",
553 | "\n",
554 | "# first we build the embeddings to make our characters into dense, trainable vectors\n",
555 | "X_embeddings = tf.get_variable('X_embeddings', [NUM_INPUTS, X_EMBEDDINGS],\n",
556 | "                               initializer=tf.random_normal_initializer(stddev=0.1))\n",
557 | "t_embeddings = tf.get_variable('t_embeddings', [NUM_OUTPUTS, t_EMBEDDINGS],\n",
558 | "                               initializer=tf.random_normal_initializer(stddev=0.1))\n",
559 | "\n",
560 | "# setting up weights for computing the final output\n",
561 | "W_out = tf.get_variable('W_out', [NUM_UNITS_DEC, NUM_OUTPUTS])\n",
562 | "b_out = tf.get_variable('b_out', [NUM_OUTPUTS])\n",
563 | "\n",
564 | "X_embedded = tf.gather(X_embeddings, Xs, name='embed_X')\n",
565 | "t_embedded = tf.gather(t_embeddings, ts_in, name='embed_t')\n",
566 | "\n",
567 | "# forward encoding\n",
568 | "enc_cell = tf.nn.rnn_cell.GRUCell(NUM_UNITS_ENC)\n",
569 | "enc_out, enc_state = tf.nn.dynamic_rnn(cell=enc_cell, inputs=X_embedded,\n",
570 | "                                       sequence_length=X_len, dtype=tf.float32)\n",
571 | "# use the lines below in case TF's dynamic_rnn does not work as intended\n",
572 | "#enc_state, _ = tf_utils.encoder(X_embedded, X_len, 'encoder', NUM_UNITS_ENC)\n",
573 | "#\n",
574 | "#enc_state = tf.concat(1, [enc_state, enc_state])\n",
575 | "\n",
576 | "# decoding\n",
577 | "# note that we are using a wrapper for decoding here, this wrapper is hardcoded to only use GRU\n",
578 | "# check out tf_utils to see how you make your own decoder\n",
579 | "dec_out, dec_out_valid, alpha_valid = \\\n",
580 | "    tf_utils.attention_decoder(enc_out, X_len, enc_state, t_embedded, t_len,\n",
581 | "                               NUM_UNITS_DEC, NUM_UNITS_ATTN, t_embeddings,\n",
582 | "                               W_out, b_out)\n",
583 | "\n",
584 | "# reshaping to have [batch_size*seqlen, num_units]\n",
585 | "out_tensor = tf.reshape(dec_out, [-1, NUM_UNITS_DEC])\n",
586 | "out_tensor_valid = tf.reshape(dec_out_valid, [-1, NUM_UNITS_DEC])\n",
587 | "# computing output\n",
588 | "out_tensor = tf.matmul(out_tensor, W_out) + b_out\n",
589 | "out_tensor_valid = tf.matmul(out_tensor_valid, W_out) + b_out\n",
590 | "# reshaping back to sequence\n",
591 | "b_size = tf.shape(X_len)[0] # use a variable we know has batch_size in [0]\n",
592 | "seq_len = tf.shape(t_embedded)[1] # variable we know has sequence length in [1]\n",
593 | "num_out = tf.constant(NUM_OUTPUTS) # casting NUM_OUTPUTS to a tensor variable\n",
594 | "out_shape = tf.concat(0, [tf.expand_dims(b_size, 0),\n",
595 | "                          tf.expand_dims(seq_len, 0),\n",
596 | "                          tf.expand_dims(num_out, 0)])\n",
597 | "out_tensor = tf.reshape(out_tensor, out_shape)\n",
598 | "out_tensor_valid = tf.reshape(out_tensor_valid, out_shape)\n",
599 | "# restoring the static shape information lost in the reshape\n",
600 | "#out_tensor.set_shape([None, None, NUM_OUTPUTS])\n",
601 | "y = out_tensor\n",
602 | "y_valid = out_tensor_valid"
603 | ]
604 | },
605 | {
606 | "cell_type": "code",
607 | "execution_count": null,
608 | "metadata": {
609 | "collapsed": false
610 | },
611 | "outputs": [],
612 | "source": [
613 | "def loss_and_acc(preds):\n",
614 | "    # sequence_loss_tensor is a modification of TensorFlow's own sequence_to_sequence_loss\n",
615 | "    # TensorFlow's seq2seq loss works with a 2D list instead of a 3D tensor\n",
616 | "    loss = tf_utils.sequence_loss_tensor(preds, ts_out, t_mask, NUM_OUTPUTS) # notice that we use ts_out here!\n",
617 | "    # if you want regularization\n",
618 | "    reg_scale = 0.00001\n",
619 | "    regularize = tf.contrib.layers.l2_regularizer(reg_scale)\n",
620 | "    params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)\n",
621 | "    reg_term = sum([regularize(param) for param in params])\n",
622 | "    loss += reg_term\n",
623 | "    # calculate accuracy\n",
624 | "    argmax = tf.to_int32(tf.argmax(preds, 2))\n",
625 | "    correct = tf.to_float(tf.equal(argmax, ts_out)) * t_mask\n",
626 | "    accuracy = tf.reduce_sum(correct) / tf.reduce_sum(t_mask)\n",
627 | "    return loss, accuracy, argmax\n",
628 | "\n",
629 | "loss, accuracy, predictions = loss_and_acc(y)\n",
630 | "loss_valid, accuracy_valid, predictions_valid = loss_and_acc(y_valid)\n",
631 | "\n",
632 | "# use global_step to keep track of our iterations\n",
633 | "global_step = tf.Variable(0, name='global_step', trainable=False)\n",
634 | "# pick optimizer, try momentum or adadelta\n",
635 | "optimizer = tf.train.AdamOptimizer(LEARNING_RATE)\n",
636 | "# extract gradients for each variable\n",
637 | "grads_and_vars = optimizer.compute_gradients(loss)\n",
638 | "# add below for clipping by norm\n",
639 | "#gradients, variables = zip(*grads_and_vars) # unzip list of tuples\n",
640 | "#clipped_gradients, global_norm = (\n",
641 | "#    tf.clip_by_global_norm(gradients, clip_norm)) # with e.g. clip_norm = 1\n",
642 | "#grads_and_vars = zip(clipped_gradients, variables)\n",
643 | "# apply gradients and make trainable function\n",
644 | "train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)"
645 | ]
646 | },
647 | {
648 | "cell_type": "code",
649 | "execution_count": null,
650 | "metadata": {
651 | "collapsed": false
652 | },
653 | "outputs": [],
654 | "source": [
655 | "# as always, test the forward pass and start the tf.Session!\n",
656 | "# here is some dummy data\n",
657 | "batch_size = 3\n",
658 | "inputs, inputs_seqlen, targets_in, targets_out, targets_seqlen, targets_mask, \\\n",
659 | "text_inputs, text_targets_in, text_targets_out = \\\n",
660 | "    get_batch(batch_size=batch_size, max_digits=7, min_digits=2)\n",
661 | "\n",
662 | "for i in range(batch_size):\n",
663 | "    print \"\\nSAMPLE\",i\n",
664 | "    print \"TEXT INPUTS:\\t\\t\\t\", text_inputs[i]\n",
665 | "    print \"TEXT TARGETS INPUT:\\t\\t\", text_targets_in[i]\n",
666 | "\n",
667 | "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n",
668 | "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n",
669 | "# initialize the Session\n",
670 | "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n",
671 | "# test train part\n",
672 | "sess.run(tf.initialize_all_variables())\n",
673 | "feed_dict = {Xs: inputs, X_len: inputs_seqlen, ts_in: targets_in,\n",
674 | "             ts_out: targets_out, t_len: targets_seqlen}\n",
675 | "fetches = [y]\n",
676 | "res = sess.run(fetches=fetches, feed_dict=feed_dict)\n",
677 | "print \"y\", res[0].shape\n",
678 | "\n",
679 | "# test validation part\n",
680 | "fetches = [y_valid]\n",
681 | "res = sess.run(fetches=fetches, feed_dict=feed_dict)\n",
682 | "print \"y_valid\", res[0].shape"
683 | ]
684 | },
685 | {
686 | "cell_type": "code",
687 | "execution_count": null,
688 | "metadata": {
689 | "collapsed": false,
690 | "scrolled": true
691 | },
692 | "outputs": [],
693 | "source": [
694 | "# print all the variable names and shapes\n",
695 | "# notice that W_z is now packed, such that it contains the input, hidden and context weights; this is for optimization\n",
696 | "# further, we now have W_s, b_s. This is so NUM_UNITS_ENC and NUM_UNITS_DEC do not have to share shape ..!\n",
697 | "for var in tf.all_variables():\n",
698 | "    s = var.name + \" \"*(40-len(var.name))\n",
699 | "    print s, var.value().get_shape()"
700 | ]
701 | },
702 | {
703 | "cell_type": "code",
704 | "execution_count": null,
705 | "metadata": {
706 | "collapsed": false
707 | },
708 | "outputs": [],
709 | "source": [
710 | "#Generate some validation data\n",
711 | "X_val, X_len_val, t_in_val, t_out_val, t_len_val, t_mask_val, \\\n",
712 | "text_inputs_val, text_targets_in_val, text_targets_out_val = \\\n",
713 | "    get_batch(batch_size=5000, max_digits=MAX_DIGITS,min_digits=MIN_DIGITS)\n",
714 | "print \"X_val\", X_val.shape\n",
715 | "print \"t_out_val\", t_out_val.shape"
716 | ]
717 | },
718 | {
719 | "cell_type": "code",
720 | "execution_count": null,
721 | "metadata": {
722 | "collapsed": false,
723 | "scrolled": true
724 | },
725 | "outputs": [],
726 | "source": [
727 | "# NOTICE - THIS MIGHT TAKE UP TO 30 MINUTES ON CPU..!\n",
728 | "# setting up running parameters\n",
729 | "val_interval = 5000\n",
730 | "samples_to_process = 3e5\n",
731 | "samples_processed = 0\n",
732 | "samples_val = []\n",
733 | "costs, accs = [], []\n",
734 | "plt.figure()\n",
735 | "try:\n",
736 | "    while samples_processed < samples_to_process:\n",
737 | "        # load data\n",
738 | "        X_tr, X_len_tr, t_in_tr, t_out_tr, t_len_tr, t_mask_tr, \\\n",
739 | "        text_inputs_tr, text_targets_in_tr, text_targets_out_tr = \\\n",
740 | "            get_batch(batch_size=BATCH_SIZE,max_digits=MAX_DIGITS,min_digits=MIN_DIGITS)\n",
741 | "        # make fetches\n",
742 | "        fetches_tr = [train_op, loss, accuracy]\n",
743 | "        # set up feed dict\n",
744 | "        feed_dict_tr = {Xs: X_tr, X_len: X_len_tr, ts_in: t_in_tr,\n",
745 | "                        ts_out: t_out_tr, t_len: t_len_tr, t_mask: t_mask_tr}\n",
746 | "        # run the model\n",
747 | "        res = tuple(sess.run(fetches=fetches_tr, feed_dict=feed_dict_tr))\n",
748 | "        _, batch_cost, batch_acc = res\n",
749 | "        costs += [batch_cost]\n",
750 | "        samples_processed += BATCH_SIZE\n",
751 | "        #if samples_processed % 1000 == 0: print batch_cost, batch_acc\n",
752 | "        #validation data\n",
753 | "        if samples_processed % val_interval == 0:\n",
754 | "            #print \"validating\"\n",
755 | "            fetches_val = [accuracy_valid, y_valid, alpha_valid]\n",
756 | "            feed_dict_val = {Xs: X_val, X_len: X_len_val, ts_in: t_in_val,\n",
757 | "                             ts_out: t_out_val, t_len: t_len_val, t_mask: t_mask_val}\n",
758 | "            res = tuple(sess.run(fetches=fetches_val, feed_dict=feed_dict_val))\n",
759 | "            acc_val, output_val, alp_val = res\n",
760 | "            samples_val += [samples_processed]\n",
761 | "            accs += [acc_val]\n",
762 | "            plt.plot(samples_val, accs, 'b-')\n",
763 | "            plt.ylabel('Validation Accuracy', fontsize=15)\n",
764 | "            plt.xlabel('Processed samples', fontsize=15)\n",
765 | "            plt.title('', fontsize=20)\n",
766 | "            plt.grid('on')\n",
767 | "            plt.savefig(\"out_attention.png\")\n",
768 | "            display.display(display.Image(filename=\"out_attention.png\"))\n",
769 | "            display.clear_output(wait=True)\n",
770 | "# NOTICE - THIS MIGHT TAKE UP TO 30 MINUTES ON CPU..!\n",
771 | "except KeyboardInterrupt:\n",
772 | "    pass"
773 | ]
774 | },
775 | {
776 | "cell_type": "code",
777 | "execution_count": null,
778 | "metadata": {
779 | "collapsed": false
780 | },
781 | "outputs": [],
782 | "source": [
783 | "#plot of validation accuracy for each target position\n",
784 | "plt.figure(figsize=(7,7))\n",
785 | "plt.plot(np.mean(np.argmax(output_val,axis=2)==t_out_val,axis=0))\n",
786 | "plt.ylabel('Accuracy', fontsize=15)\n",
787 | "plt.xlabel('Target position', fontsize=15)\n",
788 | "#plt.title('', fontsize=20)\n",
789 | "plt.grid('on')\n",
790 | "plt.show()\n",
791 | "#why does the plot look like this?"
792 | ]
793 | },
794 | {
795 | "cell_type": "code",
796 | "execution_count": null,
797 | "metadata": {
798 | "collapsed": false
799 | },
800 | "outputs": [],
801 | "source": [
802 | "### attention plot, try with different i = 1, 2, ..., 1000\n",
803 | "i = 42\n",
804 | "\n",
805 | "column_labels = map(str, list(t_out_val[i]))\n",
806 | "row_labels = map(str, (list(X_val[i])))\n",
807 | "data = alp_val[i]\n",
808 | "fig, ax = plt.subplots()\n",
809 | "heatmap = ax.pcolor(data, cmap=plt.cm.Blues)\n",
810 | "\n",
811 | "# put the major ticks at the middle of each cell\n",
812 | "ax.set_xticks(np.arange(data.shape[1])+0.5, minor=False)\n",
813 | "ax.set_yticks(np.arange(data.shape[0])+0.5, minor=False)\n",
814 | "\n",
815 | "# want a more natural, table-like display\n",
816 | "ax.invert_yaxis()\n",
817 | "ax.xaxis.tick_top()\n",
818 | "\n",
819 | "ax.set_xticklabels(row_labels, minor=False)\n",
820 | "ax.set_yticklabels(column_labels, minor=False)\n",
821 | "\n",
822 | "plt.ylabel('output', fontsize=15)\n",
823 | "plt.xlabel('input sequence', fontsize=15)\n",
824 | "\n",
825 | "plt.show()"
826 | ]
827 | },
828 | {
829 | "cell_type": "code",
830 | "execution_count": null,
831 | "metadata": {
832 | "collapsed": false
833 | },
834 | "outputs": [],
835 | "source": [
836 | "#Plot of average attention weight as a function of the sequence position for each of \n",
837 | "#the 21 targets in the output sequence i.e. each line is the mean position of the \n",
838 | "#attention for each target position.\n",
839 | "\n",
840 | "np.mean(alp_val, axis=0).shape\n",
841 | "plt.figure()\n",
842 | "plt.plot(np.mean(alp_val, axis=0).T)\n",
843 | "plt.ylabel('alpha', fontsize=15)\n",
844 | "plt.xlabel('Input Sequence position', fontsize=15)\n",
845 | "plt.title('Alpha weights', fontsize=20)\n",
846 | "plt.legend(map(str,range(1,22)), bbox_to_anchor=(1.125,1.0), fontsize=10)\n",
847 | "plt.show()\n"
848 | ]
849 | },
850 | {
851 | "cell_type": "markdown",
852 | "metadata": {
853 | "collapsed": true
854 | },
855 | "source": [
856 | "## Assignments for the attention decoder\n",
857 | "1. Explain what the attention plot shows.\n",
858 | "2. Explain what the alpha weights show.\n",
859 | "3. Why is the alpha curve for the first digit narrow and peaked, while later digits have alpha curves that are wider and less peaked?\n",
860 | "4. Why is attention a good idea for this problem? Can you think of other problems where attention is a good choice?\n",
861 | "5. Try setting MIN_DIGITS and MAX_DIGITS to 20\n",
862 | "6. Enable gradient clipping (under the loss code block)"
863 | ]
864 | }
865 | ],
866 | "metadata": {
867 | "kernelspec": {
868 | "display_name": "Python 2",
869 | "language": "python",
870 | "name": "python2"
871 | },
872 | "language_info": {
873 | "codemirror_mode": {
874 | "name": "ipython",
875 | "version": 2
876 | },
877 | "file_extension": ".py",
878 | "mimetype": "text/x-python",
879 | "name": "python",
880 | "nbconvert_exporter": "python",
881 | "pygments_lexer": "ipython2",
882 | "version": "2.7.6"
883 | }
884 | },
885 | "nbformat": 4,
886 | "nbformat_minor": 0
887 | }
888 | 
--------------------------------------------------------------------------------
/lab3_RNN/tf_utils.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | from tensorflow.python.ops import tensor_array_ops
3 | from tensorflow.python.framework import ops
4 | from tensorflow.python.ops import nn_ops
5 | from tensorflow.python.ops import math_ops
6 | 
7 | 
8 | ###
9 | # custom loss function, similar to TensorFlow's but uses 3D tensors
10 | # instead of a list of 2D tensors
11 | def sequence_loss_tensor(logits, targets, weights, num_classes,
12 |                          average_across_timesteps=True,
13 |                          softmax_loss_function=None, name=None):
14 |     """Weighted cross-entropy loss for a sequence of logits (per example).
15 |     """
16 |     with ops.op_scope([logits, targets, weights], name, "sequence_loss_by_example"):
17 |         probs_flat = tf.reshape(logits, [-1, num_classes])
18 |         targets = tf.reshape(targets, [-1])
19 |         if softmax_loss_function is None:
20 |             crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
21 |                 probs_flat, targets)
22 |         else:
23 |             crossent = softmax_loss_function(probs_flat, targets)
24 |         crossent = crossent * tf.reshape(weights, [-1])
25 |         crossent = tf.reduce_sum(crossent)
26 |         total_size = math_ops.reduce_sum(weights)
27 |         total_size += 1e-12 # to avoid division by zero
28 |         crossent /= total_size
29 |         return crossent
30 | 
31 | 
32 | ###
33 | # a custom masking function, takes sequence lengths and makes masks
34 | def mask(sequence_lengths):
35 |     # based on this SO answer: http://stackoverflow.com/a/34138336/118173
36 |     batch_size = tf.shape(sequence_lengths)[0]
37 |     max_len = tf.reduce_max(sequence_lengths)
38 | 
39 |     lengths_transposed = tf.expand_dims(sequence_lengths, 1)
40 | 
41 |     rng = tf.range(max_len)
42 |     rng_row = tf.expand_dims(rng, 0)
43 | 
44 |     return tf.less(rng_row, lengths_transposed)
45 | 
46 | 
47 | ###
48 | # a custom encoder function (in case we can't get TensorFlow's to work)
49 | 
50 | def encoder(inputs, lengths, name, num_units, reverse=False, swap=False):
51 |     with tf.variable_scope(name):
52 |         weight_initializer = tf.truncated_normal_initializer(stddev=0.1)
53 |         input_units = inputs.get_shape()[2]
54 |         W_z = tf.get_variable('W_z',
55 |                               shape=[input_units+num_units, num_units],
56 |                               initializer=weight_initializer)
57 |         W_r = tf.get_variable('W_r',
58 |                               shape=[input_units+num_units, num_units],
59 |                               initializer=weight_initializer)
60 |         W_h = tf.get_variable('W_h',
61 |                               shape=[input_units+num_units, num_units],
62 |                               initializer=weight_initializer)
63 |         b_z = tf.get_variable('b_z',
64 |                               shape=[num_units],
65 | 
initializer=tf.constant_initializer(1.0)) 66 | b_r = tf.get_variable('b_r', 67 | shape=[num_units], 68 | initializer=tf.constant_initializer(1.0)) 69 | b_h = tf.get_variable('b_h', 70 | shape=[num_units], 71 | initializer=tf.constant_initializer()) 72 | 73 | max_sequence_length = tf.reduce_max(lengths) 74 | min_sequence_length = tf.reduce_min(lengths) 75 | 76 | time = tf.constant(0) 77 | 78 | state_shape = tf.concat(0, [tf.expand_dims(tf.shape(lengths)[0], 0), 79 | tf.expand_dims(tf.constant(num_units), 0)]) 80 | # state_shape = tf.Print(state_shape, [state_shape]) 81 | state = tf.zeros(state_shape, dtype=tf.float32) 82 | 83 | if reverse: 84 | inputs = tf.reverse(inputs, dims=[False, True, False]) 85 | inputs = tf.transpose(inputs, perm=[1, 0, 2]) 86 | input_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True) 87 | input_ta = input_ta.unpack(inputs) 88 | 89 | output_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True) 90 | 91 | def encoder_cond(time, state, output_ta_t): 92 | return tf.less(time, max_sequence_length) 93 | 94 | def encoder_body(time, old_state, output_ta_t): 95 | x_t = input_ta.read(time) 96 | 97 | con = tf.concat(1, [x_t, old_state]) 98 | z = tf.sigmoid(tf.matmul(con, W_z) + b_z) 99 | r = tf.sigmoid(tf.matmul(con, W_r) + b_r) 100 | con = tf.concat(1, [x_t, r*old_state]) 101 | h = tf.tanh(tf.matmul(con, W_h) + b_h) 102 | new_state = (1-z)*h + z*old_state 103 | 104 | output_ta_t = output_ta_t.write(time, new_state) 105 | 106 | def updateall(): 107 | return new_state 108 | 109 | def updatesome(): 110 | if reverse: 111 | return tf.select( 112 | tf.greater_equal(time, max_sequence_length-lengths), 113 | new_state, 114 | old_state) 115 | else: 116 | return tf.select(tf.less(time, lengths), new_state, old_state) 117 | 118 | if reverse: 119 | state = tf.cond( 120 | tf.greater_equal(time, max_sequence_length-min_sequence_length), 121 | updateall, 122 | updatesome) 123 | else: 124 | state = tf.cond(tf.less(time, min_sequence_length), updateall, updatesome) 125 | 126 | return (time + 1, state, output_ta_t) 127 | 128 | loop_vars = [time, state, output_ta] 129 | 130 | time, state, output_ta = tf.while_loop(encoder_cond, encoder_body, loop_vars, swap_memory=swap) 131 | 132 | enc_state = state 133 | enc_out = tf.transpose(output_ta.pack(), perm=[1, 0, 2]) 134 | 135 | if reverse: 136 | enc_out = tf.reverse(enc_out, dims=[False, True, False]) 137 | 138 | enc_out.set_shape([None, None, num_units]) 139 | 140 | return enc_state, enc_out 141 | 142 | 143 | ### 144 | # a custom decoder function 145 | 146 | def decoder(initial_state, target_input, target_len, num_units, 147 | embeddings, W_out, b_out, 148 | W_z_x_init = tf.truncated_normal_initializer(stddev=0.1), 149 | W_z_h_init = tf.truncated_normal_initializer(stddev=0.1), 150 | W_r_x_init = tf.truncated_normal_initializer(stddev=0.1), 151 | W_r_h_init = tf.truncated_normal_initializer(stddev=0.1), 152 | W_c_x_init = tf.truncated_normal_initializer(stddev=0.1), 153 | W_c_h_init = tf.truncated_normal_initializer(stddev=0.1), 154 | b_z_init = tf.constant_initializer(0.0), 155 | b_r_init = tf.constant_initializer(0.0), 156 | b_c_init = tf.constant_initializer(0.0), 157 | name='decoder', swap=False): 158 | """decoder 159 | TODO 160 | """ 161 | 162 | 163 | with tf.variable_scope(name): 164 | # we need the max seq len to optimize our RNN computation later on 165 | max_sequence_length = tf.reduce_max(target_len) 166 | # target_dims is just the embedding size 167 | target_dims = target_input.get_shape()[2] 168 
| # set up weights for the GRU gates
169 |         var = tf.get_variable # for ease of use
170 |         # unlike TF's GRUCell, the weights for the input and the hidden state
171 |         # are kept separate here, matching the original GRU article
172 |         W_z_x = var('W_z_x', shape=[target_dims, num_units], initializer=W_z_x_init)
173 |         W_z_h = var('W_z_h', shape=[num_units, num_units], initializer=W_z_h_init)
174 |         b_z = var('b_z', shape=[num_units], initializer=b_z_init)
175 |         W_r_x = var('W_r_x', shape=[target_dims, num_units], initializer=W_r_x_init)
176 |         W_r_h = var('W_r_h', shape=[num_units, num_units], initializer=W_r_h_init)
177 |         b_r = var('b_r', shape=[num_units], initializer=b_r_init)
178 |         W_c_x = var('W_c_x', shape=[target_dims, num_units], initializer=W_c_x_init)
179 |         W_c_h = var('W_c_h', shape=[num_units, num_units], initializer=W_c_h_init)
180 |         b_c = var('b_h', shape=[num_units], initializer=b_c_init)
181 | 
182 |         # make inputs time-major
183 |         inputs = tf.transpose(target_input, perm=[1, 0, 2])
184 |         # make tensor array for inputs, these are dynamic and used in the while-loop
185 |         # these are not in the api documentation yet, you will have to look at github.com/tensorflow
186 |         input_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True)
187 |         input_ta = input_ta.unpack(inputs)
188 | 
189 |         # condition for the while-loop, for early stopping
190 |         def decoder_cond(time, state, output_ta_t):
191 |             return tf.less(time, max_sequence_length)
192 | 
193 |         # the body_builder is just a wrapper to pass the feedback flag
194 |         def decoder_body_builder(feedback=False):
195 |             # the decoder body, this is where the RNN magic happens!
196 |             def decoder_body(time, old_state, output_ta_t):
197 |                 # when validating we need the previous prediction, handled via feedback
198 |                 if feedback:
199 |                     def from_previous():
200 |                         prev_1 = tf.matmul(old_state, W_out) + b_out
201 |                         return tf.gather(embeddings, tf.argmax(prev_1, 1))
202 |                     x_t = tf.cond(tf.greater(time, 0), from_previous, lambda: input_ta.read(0))
203 |                 else:
204 |                     # else we just read the next timestep
205 |                     x_t = input_ta.read(time)
206 | 
207 |                 # calculate the GRU
208 |                 z = tf.sigmoid(tf.matmul(x_t, W_z_x) + tf.matmul(old_state, W_z_h) + b_z) # update gate
209 |                 r = tf.sigmoid(tf.matmul(x_t, W_r_x) + tf.matmul(old_state, W_r_h) + b_r) # reset gate
210 |                 c = tf.tanh(tf.matmul(x_t, W_c_x) + tf.matmul(r*old_state, W_c_h) + b_c) # proposed new state
211 |                 new_state = (1-z)*c + z*old_state # new state
212 | 
213 |                 # writing output
214 |                 output_ta_t = output_ta_t.write(time, new_state)
215 | 
216 |                 # return in "input-to-next-step" style
217 |                 return (time + 1, new_state, output_ta_t)
218 |             return decoder_body
219 |         # set up variables to loop with
220 |         output_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False)
221 |         time = tf.constant(0)
222 |         loop_vars = [time, initial_state, output_ta]
223 | 
224 |         # run the while-loop for training
225 |         _, state, output_ta = tf.while_loop(decoder_cond,
226 |                                             decoder_body_builder(),
227 |                                             loop_vars,
228 |                                             swap_memory=swap)
229 |         # run the while-loop for validation
230 |         _, valid_state, valid_output_ta = tf.while_loop(decoder_cond,
231 |                                                         decoder_body_builder(feedback=True),
232 |                                                         loop_vars,
233 |                                                         swap_memory=swap)
234 |         # returning to batch major
235 |         dec_out = tf.transpose(output_ta.pack(), perm=[1, 0, 2])
236 |         valid_dec_out = tf.transpose(valid_output_ta.pack(), perm=[1, 0, 2])
237 |         return dec_out, valid_dec_out
238 | 
239 | 
240 | ###
241 | # decoder with attention
242 | 
243 | def attention_decoder(attention_input, attention_lengths, initial_state, target_input,
244 |                       target_input_lengths, num_units, num_attn_units, embeddings, W_out, b_out,
245 |                       name='decoder', swap=False):
246 |     """Decoder with attention.
247 |     The initial state is projected to num_units with W_s/b_s, so the
248 |     encoder and the decoder do not have to use the same number of units.
249 |     Keyword arguments:
250 |         attention_input: the input to put attention on. expected dims: [batch_size, attention_length, attention_dims]
251 |         initial_state: The initial state for the decoder RNN.
252 |         target_input: The target to replicate. Expected: [batch_size, max_target_sequence_len, embedding_dims]
253 |         num_attn_units: Number of units in the alignment layer that produces the context vectors.
254 |     """
255 |     with tf.variable_scope(name):
256 |         target_dims = target_input.get_shape()[2]
257 |         attention_dims = attention_input.get_shape()[2]
258 |         attn_len = tf.shape(attention_input)[1]
259 |         max_sequence_length = tf.reduce_max(target_input_lengths)
260 | 
261 |         weight_initializer = tf.truncated_normal_initializer(stddev=0.1)
262 |         # map initial state to num_units
263 |         W_s = tf.get_variable('W_s',
264 |                               shape=[attention_dims, num_units],
265 |                               initializer=weight_initializer)
266 |         b_s = tf.get_variable('b_s',
267 |                               shape=[num_units],
268 |                               initializer=tf.constant_initializer())
269 | 
270 |         # GRU
271 |         W_z = tf.get_variable('W_z',
272 |                               shape=[target_dims+num_units+attention_dims, num_units],
273 |                               initializer=weight_initializer)
274 |         W_r = tf.get_variable('W_r',
275 |                               shape=[target_dims+num_units+attention_dims, num_units],
276 |                               initializer=weight_initializer)
277 |         W_c = tf.get_variable('W_c',
278 |                               shape=[target_dims+num_units+attention_dims, num_units],
279 |                               initializer=weight_initializer)
280 |         b_z = tf.get_variable('b_z',
281 |                               shape=[num_units],
282 |                               initializer=tf.constant_initializer(1.0))
283 |         b_r = tf.get_variable('b_r',
284 |                               shape=[num_units],
285 |                               initializer=tf.constant_initializer(1.0))
286 |         b_c = tf.get_variable('b_c',
287 |                               shape=[num_units],
288 |                               initializer=tf.constant_initializer())
289 | 
290 |         # for attention
291 |         W_a = tf.get_variable('W_a',
292 |                               shape=[num_units, num_attn_units],  # multiplies the decoder state
293 |                               initializer=weight_initializer)
294 |         U_a = tf.get_variable('U_a',
295 |                               shape=[1, 1, attention_dims, num_attn_units],
296 |                               initializer=weight_initializer)
297 |         b_a = tf.get_variable('b_a',
298 |                               shape=[num_attn_units],
299 |                               initializer=tf.constant_initializer())
300 |         v_a = tf.get_variable('v_a',
301 |                               shape=[num_attn_units],
302 |                               initializer=weight_initializer)
303 | 
304 |         # project initial state
305 |         initial_state = tf.nn.tanh(tf.matmul(initial_state, W_s) + b_s)
306 | 
307 |         # TODO: don't use convolutions!
308 | # TODO: fix the bias (b_a) 309 | hidden = tf.reshape(attention_input, tf.pack([-1, attn_len, 1, attention_dims])) 310 | part1 = tf.nn.conv2d(hidden, U_a, [1, 1, 1, 1], "SAME") 311 | part1 = tf.squeeze(part1, [2]) # squeeze out the third dimension 312 | 313 | inputs = tf.transpose(target_input, perm=[1, 0, 2]) 314 | input_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True) 315 | input_ta = input_ta.unpack(inputs) 316 | 317 | def decoder_cond(time, state, output_ta_t, attention_tracker): 318 | return tf.less(time, max_sequence_length) 319 | 320 | def decoder_body_builder(feedback=False): 321 | def decoder_body(time, old_state, output_ta_t, attention_tracker): 322 | if feedback: 323 | def from_previous(): 324 | prev_1 = tf.matmul(old_state, W_out) + b_out 325 | return tf.gather(embeddings, tf.argmax(prev_1, 1)) 326 | x_t = tf.cond(tf.greater(time, 0), from_previous, lambda: input_ta.read(0)) 327 | else: 328 | x_t = input_ta.read(time) 329 | 330 | # attention 331 | part2 = tf.matmul(old_state, W_a) + b_a 332 | part2 = tf.expand_dims(part2, 1) 333 | john = part1 + part2 334 | e = tf.reduce_sum(v_a * tf.tanh(john), [2]) 335 | alpha = tf.nn.softmax(e) 336 | alpha = tf.to_float(mask(attention_lengths)) * alpha 337 | alpha = alpha / tf.reduce_sum(alpha, [1], keep_dims=True) 338 | attention_tracker = attention_tracker.write(time, alpha) 339 | context = tf.reduce_sum(tf.expand_dims(alpha, 2) * tf.squeeze(hidden), [1]) 340 | 341 | # GRU 342 | con = tf.concat(1, [x_t, old_state, context]) 343 | z = tf.sigmoid(tf.matmul(con, W_z) + b_z) 344 | r = tf.sigmoid(tf.matmul(con, W_r) + b_r) 345 | con = tf.concat(1, [x_t, r*old_state, context]) 346 | c = tf.tanh(tf.matmul(con, W_c) + b_c) 347 | new_state = (1-z)*c + z*old_state 348 | 349 | output_ta_t = output_ta_t.write(time, new_state) 350 | 351 | return (time + 1, new_state, output_ta_t, attention_tracker) 352 | return decoder_body 353 | 354 | 355 | output_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False) 356 | attention_tracker = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False) 357 | time = tf.constant(0) 358 | loop_vars = [time, initial_state, output_ta, attention_tracker] 359 | 360 | _, state, output_ta, _ = tf.while_loop(decoder_cond, 361 | decoder_body_builder(), 362 | loop_vars, 363 | swap_memory=swap) 364 | _, valid_state, valid_output_ta, valid_attention_tracker = tf.while_loop(decoder_cond, 365 | decoder_body_builder(feedback=True), 366 | loop_vars, 367 | swap_memory=swap) 368 | 369 | dec_out = tf.transpose(output_ta.pack(), perm=[1, 0, 2]) 370 | valid_dec_out = tf.transpose(valid_output_ta.pack(), perm=[1, 0, 2]) 371 | valid_attention_tracker = tf.transpose(valid_attention_tracker.pack(), perm=[1, 0, 2]) 372 | 373 | return dec_out, valid_dec_out, valid_attention_tracker 374 | -------------------------------------------------------------------------------- /lab4_Kaggle/.gitignore: -------------------------------------------------------------------------------- 1 | tensorboard 2 | -------------------------------------------------------------------------------- /lab4_Kaggle/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab4_Kaggle/README.md -------------------------------------------------------------------------------- /lab5_AE/.gitignore: 
--------------------------------------------------------------------------------
1 | *.jpg
2 | *.png
--------------------------------------------------------------------------------
/lab5_AE/lab5_AE.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Credits\n",
8 | "TensorFlow translation of [Lasagne tutorial](https://github.com/DeepLearningDTU/02456-deep-learning/blob/master/week5/lab51_AE.ipynb). Thanks to [skaae](https://github.com/skaae), [casperkaae](https://github.com/casperkaae) and [larsmaaloee](https://github.com/larsmaaloee)."
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "# Dependencies and supporting functions\n",
16 | "Load dependencies and supporting functions by running the code block below."
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": null,
22 | "metadata": {
23 | "collapsed": false
24 | },
25 | "outputs": [],
26 | "source": [
27 | "from __future__ import division, print_function\n",
28 | "import matplotlib\n",
29 | "import matplotlib.pyplot as plt\n",
30 | "from IPython.display import Image, display, clear_output\n",
31 | "%matplotlib nbagg\n",
32 | "%matplotlib inline \n",
33 | "import numpy as np\n",
34 | "import matplotlib.pyplot as plt\n",
35 | "import sklearn.datasets\n",
36 | "import tensorflow as tf\n",
37 | "from tensorflow.python.framework.ops import reset_default_graph"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {},
43 | "source": [
44 | "# Auto-encoders 101\n",
45 | "In this notebook you will implement a simple auto-encoder (AE). We assume that you are already familiar with the basics of neural networks. We'll start by defining an AE similar to the one used for the finetuning step by [Geoffrey Hinton and Ruslan Salakhutdinov](https://www.cs.toronto.edu/~hinton/science.pdf). We'll experiment with the AE setup and try to run it on the MNIST dataset. There has been a wide variety of research into the field of auto-encoders and the technique that you're about to learn is very simple compared to recent advances (e.g. [the Ladder network](https://arxiv.org/abs/1507.02672) and [VAEs](https://arxiv.org/abs/1312.6114)). However, the basic idea stays the same.\n",
46 | "\n",
47 | "AEs are used within unsupervised learning, in which you do not have a target $y$. Instead, the AE *encodes* an input $x$ into a latent state $z$ and decodes $z$ into a reconstruction $\\hat{x}$. This way the parameters of the network can be optimized w.r.t. the difference between $x$ and $\\hat{x}$. Depending on the input distribution, the difference can be measured in various ways, e.g. mean squared error (MSE). In many applications the auto-encoder will find an internal state for each data point corresponding to a feature. So if we are to model the MNIST dataset, one could expect that the internal state would correspond to a digit class and/or the shape.\n",
48 | "\n",
49 | "*The exercises are found at the bottom of the notebook*"
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "## MNIST\n",
57 | "First let us load the MNIST dataset and plot a few examples. We only load a limited number of classes to speed up training."
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": null,
63 | "metadata": {
64 | "collapsed": false
65 | },
66 | "outputs": [],
67 | "source": [
68 | "from sklearn.utils import shuffle\n",
69 | "\n",
70 | "# To speed up training we'll only work on a subset of the data containing only the numbers 0, 1.\n",
71 | "data = np.load('../lab1_FFN/mnist.npz')\n",
72 | "num_classes = 2\n",
73 | "idxs_train = []\n",
74 | "idxs_valid = []\n",
75 | "idxs_test = []\n",
76 | "for i in range(num_classes):\n",
77 | "    idxs_train += np.where(data['y_train'] == i)[0].tolist()\n",
78 | "    idxs_valid += np.where(data['y_valid'] == i)[0].tolist()\n",
79 | "    idxs_test += np.where(data['y_test'] == i)[0].tolist()\n",
80 | "\n",
81 | "x_train = data['X_train'][idxs_train].astype('float32')\n",
82 | "# Since this is unsupervised, the targets are only used for validation.\n",
83 | "targets_train = data['y_train'][idxs_train].astype('int32')\n",
84 | "x_train, targets_train = shuffle(x_train, targets_train, random_state=1234)\n",
85 | "\n",
86 | "\n",
87 | "x_valid = data['X_valid'][idxs_valid].astype('float32')\n",
88 | "targets_valid = data['y_valid'][idxs_valid].astype('int32')\n",
89 | "\n",
90 | "x_test = data['X_test'][idxs_test].astype('float32')\n",
91 | "targets_test = data['y_test'][idxs_test].astype('int32')\n",
92 | "\n",
93 | "print(\"training set dim(%i, %i).\" % x_train.shape)\n",
94 | "print(\"validation set dim(%i, %i).\" % x_valid.shape)\n",
95 | "print(\"test set dim(%i, %i).\" % x_test.shape)"
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": null,
101 | "metadata": {
102 | "collapsed": false
103 | },
104 | "outputs": [],
105 | "source": [
106 | "#plot a few MNIST examples\n",
107 | "idx = 0\n",
108 | "canvas = np.zeros((28*10, 10*28))\n",
109 | "for i in range(10):\n",
110 | "    for j in range(10):\n",
111 | "        canvas[i*28:(i+1)*28, j*28:(j+1)*28] = x_train[idx].reshape((28, 28))\n",
112 | "        idx += 1\n",
113 | "plt.figure(figsize=(7, 7))\n",
114 | "plt.axis('off')\n",
115 | "plt.imshow(canvas, cmap='gray')\n",
116 | "plt.title('MNIST handwritten digits')"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "metadata": {},
122 | "source": [
123 | "### Building the model\n",
124 | "When defining the model, the latent layer $z$ must act as an information bottleneck. We initialize the AE with one hidden layer in both the encoder and the decoder, using ReLU units as non-linearities. The latent layer has a dimensionality of 2 in order to make it easy to visualise. Since $x$ consists of pixel intensities that are normalized between 0 and 1, we use the sigmoid non-linearity to model the reconstruction; the objective we minimize is written out below.\n",
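"\n",
"With MSE as the measure (a standard formulation; it matches the loss code further down, which averages over both examples and pixels), the objective is\n",
"\n",
"$$\\mathcal{L} = \\frac{1}{N D} \\sum_{i=1}^{N} \\| x_i - \\hat{x}_i \\|^2, \\qquad \\hat{x}_i = \\mathrm{dec}(\\mathrm{enc}(x_i)),$$\n",
"\n",
"where $N$ is the number of examples and $D$ the number of pixels (here $28 \\times 28 = 784$)."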
125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": { 131 | "collapsed": true 132 | }, 133 | "outputs": [], 134 | "source": [ 135 | "from tensorflow.contrib.layers import fully_connected\n", 136 | "from tensorflow.python.ops.nn import relu, sigmoid" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": { 143 | "collapsed": false 144 | }, 145 | "outputs": [], 146 | "source": [ 147 | "# define input/output size\n", 148 | "num_features = x_train.shape[1]\n", 149 | "\n", 150 | "# reset the graph so this cell can be re-run without name collisions\n", 151 | "reset_default_graph()\n", 152 | "\n", 153 | "# define the model\n", 154 | "x_pl = tf.placeholder(tf.float32, [None, num_features], 'x_pl')\n", 155 | "l_enc = fully_connected(inputs=x_pl, num_outputs=128, activation_fn=relu, scope='l_enc')\n", 156 | "l_z = fully_connected(inputs=l_enc, num_outputs=2, activation_fn=None, scope='l_z') # None indicates a linear output.\n", 157 | "l_dec = fully_connected(inputs=l_z, num_outputs=128, activation_fn=relu, scope='l_dec')\n", 158 | "l_out = fully_connected(inputs=l_dec, num_outputs=num_features, activation_fn=sigmoid) # iid pixel intensities between 0 and 1." 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "Next, we define the TensorFlow operations for training and evaluation." 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": { 172 | "collapsed": false 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "# calculate the loss: mean squared error per pixel\n", 177 | "loss_per_pixel = tf.square(tf.sub(l_out, x_pl))\n", 178 | "loss = tf.reduce_mean(loss_per_pixel, name=\"mean_square_error\")\n", 179 | "# if you want regularization\n", 180 | "#reg_scale = 0.0005\n", 181 | "#regularize = tf.contrib.layers.l2_regularizer(reg_scale)\n", 182 | "#params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)\n", 183 | "#reg_term = sum([regularize(param) for param in params])\n", 184 | "#loss += reg_term\n", 185 | "\n", 186 | "# define our optimizer\n", 187 | "optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.25)\n", 188 | "\n", 189 | "# make a training op that applies the gradients\n", 190 | "train_op = optimizer.minimize(loss)" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "collapsed": false 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "# test the forward pass\n", 202 | "_x_test = np.zeros(shape=(32, num_features))\n", 203 | "# initialize the Session\n", 204 | "sess = tf.Session()\n", 205 | "# initialize all the variables\n", 206 | "sess.run(tf.initialize_all_variables())\n", 207 | "feed_dict = {x_pl: _x_test}\n", 208 | "res_forward_pass = sess.run(fetches=[l_out], feed_dict=feed_dict)\n", 209 | "print(\"l_out\", res_forward_pass[0].shape)" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": {}, 215 | "source": [ 216 | "In the training loop we sample a batch of images, take a gradient step, and once per epoch evaluate the error, the latent space and the reconstructions."
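A side note on the optimizer cell above before we get to the loop: `minimize(loss)` is shorthand for computing gradients and then applying them. If you need access to the gradients, for instance to clip them, the call can be split in two. This is a minimal sketch reusing the `optimizer` and `loss` defined above; the clipping range of [-1, 1] is an arbitrary assumption, not a value used by the notebook.

```python
# equivalent to `train_op = optimizer.minimize(loss)`, but exposes the gradients
grads_and_vars = optimizer.compute_gradients(loss)
# optionally clip each gradient to a fixed range before the update is applied
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)
```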
217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": { 223 | "collapsed": false 224 | }, 225 | "outputs": [], 226 | "source": [ 227 | "batch_size = 100\n", 228 | "num_epochs = 100\n", 229 | "num_samples_train = x_train.shape[0]\n", 230 | "num_batches_train = num_samples_train // batch_size\n", 231 | "num_samples_valid = x_valid.shape[0]\n", 232 | "num_batches_valid = num_samples_valid // batch_size\n", 233 | "updates = []\n", 234 | "\n", 235 | "train_loss = []\n", 236 | "valid_loss = []\n", 237 | "cur_loss = 0\n", 238 | "plt.figure(figsize=(12, 24))\n", 239 | "\n", 240 | "try:\n", 241 | "    for epoch in range(num_epochs):\n", 242 | "        # Forward -> Backprop -> Update params\n", 243 | "        cur_loss = []\n", 244 | "        for i in range(num_batches_train):\n", 245 | "            idxs = np.random.choice(range(x_train.shape[0]), size=(batch_size), replace=False)\n", 246 | "            x_batch = x_train[idxs]\n", 247 | "            # set up what to fetch; besides the train op and loss we fetch l_out and l_z for plotting\n", 248 | "            fetches_train = [train_op, loss, l_out, l_z]\n", 249 | "            feed_dict_train = {x_pl: x_batch}\n", 250 | "            # run the complete forward and backprop pass\n", 251 | "            res_train = sess.run(fetches_train, feed_dict_train)\n", 252 | "            _, batch_loss, train_out, train_z = tuple(res_train)\n", 253 | "            cur_loss += [batch_loss]\n", 254 | "        train_loss += [np.mean(cur_loss)]\n", 255 | "        updates += [batch_size*num_batches_train*(epoch+1)]\n", 256 | "\n", 257 | "        # evaluate\n", 258 | "        fetches_eval = [loss, l_out, l_z]\n", 259 | "        feed_dict_eval = {x_pl: x_valid}\n", 260 | "        res_valid = sess.run(fetches_eval, feed_dict_eval)\n", 261 | "        eval_loss, eval_out, eval_z = tuple(res_valid)\n", 262 | "        valid_loss += [eval_loss]\n", 263 | "\n", 264 | "        if epoch == 0:\n", 265 | "            continue\n", 266 | "\n", 267 | "        # Plotting\n", 268 | "        plt.subplot(num_classes+1,2,1)\n", 269 | "        plt.title('Error')\n", 270 | "        plt.xlabel('Updates'), plt.ylabel('Error')\n", 271 | "        plt.plot(updates, train_loss, color=\"black\")\n", 272 | "        plt.plot(updates, valid_loss, color=\"grey\")\n", 273 | "        plt.legend(['Train Error', 'Valid Error']) # legend must come after the plot calls\n", 274 | "        plt.ticklabel_format(style='sci', axis='x', scilimits=(0,0))\n", 275 | "        plt.grid('on')\n", 276 | "\n", 277 | "        plt.subplot(num_classes+1,2,2)\n", 278 | "        plt.cla()\n", 279 | "        plt.title('Latent space')\n", 280 | "        plt.xlabel('z0'), plt.ylabel('z1')\n", 281 | "        color = iter(plt.get_cmap('brg')(np.linspace(0, 1.0, num_classes)))\n", 282 | "        for i in range(num_classes):\n", 283 | "            clr = next(color)\n", 284 | "            plt.scatter(eval_z[targets_valid==i, 0], eval_z[targets_valid==i, 1], c=clr, s=5., lw=0, marker='o')\n", 285 | "        plt.grid('on')\n", 286 | "        \n", 287 | "        c = 0\n", 288 | "        for k in range(3, 3 + num_classes*2, 2):\n", 289 | "            plt.subplot(num_classes+1,2,k)\n", 290 | "            plt.cla()\n", 291 | "            plt.title('Inputs for %i' % c)\n", 292 | "            plt.axis('off')\n", 293 | "            idx = 0\n", 294 | "            canvas = np.zeros((28*10, 10*28))\n", 295 | "            for i in range(10):\n", 296 | "                for j in range(10):\n", 297 | "                    canvas[i*28:(i+1)*28, j*28:(j+1)*28] = x_valid[targets_valid==c][idx].reshape((28, 28))\n", 298 | "                    idx += 1\n", 299 | "            plt.imshow(canvas, cmap='gray')\n", 300 | "            \n", 301 | "            plt.subplot(num_classes+1,2,k+1)\n", 302 | "            plt.cla()\n", 303 | "            plt.title('Reconstructions for %i' % c)\n", 304 | "            plt.axis('off')\n", 305 | "            idx = 0\n", 306 | "            canvas = np.zeros((28*10, 10*28))\n", 307 | "            for i in range(10):\n", 308 | "                for j in range(10):\n", 309 | "                    canvas[i*28:(i+1)*28, j*28:(j+1)*28] = eval_out[targets_valid==c][idx].reshape((28, 28))\n", 310 | "                    idx += 
1\n", 311 | "            plt.imshow(canvas, cmap='gray')\n", 312 | "            c += 1\n", 313 | "        \n", 314 | "        \n", 315 | "        plt.savefig(\"out51.png\")\n", 316 | "        display(Image(filename=\"out51.png\"))\n", 317 | "        clear_output(wait=True)\n", 318 | "        \n", 319 | "except KeyboardInterrupt:\n", 320 | "    pass\n", 321 | "    " 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": { 327 | "collapsed": true 328 | }, 329 | "source": [ 330 | "### Exercise 1 - Analyzing the AE\n", 331 | "1. The above implementation of an AE is very simple.\n", 332 | "    - *Experiment with the number of layers and non-linearities in order to improve the reconstructions.*\n", 333 | "    - *What happens to the network when we change the non-linearity in the latent layer (e.g. to sigmoid)?*\n", 334 | "    - *Try to increase the number of digit classes in the training set and analyze the results.*\n", 335 | "    - *Test different optimization algorithms and decide whether you should use regularizers.*\n", 336 | "    \n", 337 | "2. Currently we optimize w.r.t. mean squared error.\n", 338 | "    - *Find another error function that could fit this problem better.*\n", 339 | "    - *Evaluate whether the error function is a better choice and explain your findings.*\n", 340 | "\n", 341 | "3. Complexity of the bottleneck.\n", 342 | "    - *Increase the number of units in the latent layer and train.*\n", 343 | "    - *Visualize by using [PCA](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) or [t-SNE](http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html); a t-SNE sketch is given after the exercises.*" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": { 349 | "collapsed": true 350 | }, 351 | "source": [ 352 | "### Exercise 2 - Adding classification (for the ambitious)\n", 353 | "The above training has been performed unsupervised. Now let us assume that we only have a small number of labeled data points from each class (generated by the cell below). Semi-supervised learning combines unsupervised and supervised learning, so your task is to analyze whether a trained AE from the above exercise can aid a classifier.\n", 354 | "\n", 355 | "1. Build a simple classifier (like the ones from lab1) where you:\n", 356 | "    - *Train on the labeled dataset and evaluate the results.*\n", 357 | "2. Build a second classifier and train on the latent output $z$ of the AE (a minimal sketch is given after the exercises).\n", 358 | "3. Build a third classifier and train on the reconstructions of the AE.\n", 359 | "4. Evaluate the classifiers against each other and implement a model that improves the classification by combining the input, latent output and reconstruction."
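For exercise 1.3, here is a minimal t-SNE sketch. It assumes you have retrained the model with a larger latent layer (say 32 units), and it reuses `sess`, `x_pl`, `l_z`, `x_valid`, `targets_valid` and `num_classes` from the notebook; the t-SNE settings are arbitrary assumptions.

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# encode the validation set into the (now higher-dimensional) latent space
z_valid = sess.run(l_z, feed_dict={x_pl: x_valid})

# project the latent codes down to 2-d for plotting
z_2d = TSNE(n_components=2, random_state=0).fit_transform(z_valid)

plt.figure(figsize=(6, 6))
for i in range(num_classes):
    plt.scatter(z_2d[targets_valid == i, 0], z_2d[targets_valid == i, 1],
                s=5., lw=0, marker='o', label=str(i))
plt.legend()
plt.title('t-SNE of the latent space')
plt.show()
```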
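For step 2 of exercise 2, a minimal sketch that fits a classifier on the latent codes. It assumes the labeled subset `x_train_l`/`targets_train_l` produced by the cell just below, plus `sess`, `x_pl`, `l_z`, `x_test` and `targets_test` from earlier cells; the choice of logistic regression is our assumption, and scikit-learn is available in the tutorial's docker image.

```python
from sklearn.linear_model import LogisticRegression

# encode the labeled subset and the test set into the latent space
z_train_l = sess.run(l_z, feed_dict={x_pl: x_train_l})
z_test = sess.run(l_z, feed_dict={x_pl: x_test})

# fit a simple classifier on the latent codes and evaluate it
clf = LogisticRegression()
clf.fit(z_train_l, targets_train_l)
print("test accuracy on latent codes: %.3f" % clf.score(z_test, targets_test))
```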
360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": null, 365 | "metadata": { 366 | "collapsed": false 367 | }, 368 | "outputs": [], 369 | "source": [ 370 | "# Generate a subset of labeled data points\n", 371 | "\n", 372 | "num_labeled = 10 # You decide on the size of the fraction...\n", 373 | "\n", 374 | "def onehot(t, num_classes):\n", 375 | "    out = np.zeros((t.shape[0], num_classes))\n", 376 | "    for row, col in enumerate(t):\n", 377 | "        out[row, col] = 1\n", 378 | "    return out\n", 379 | "\n", 380 | "idxs_train_l = []\n", 381 | "for i in range(num_classes):\n", 382 | "    idxs = np.where(targets_train == i)[0]\n", 383 | "    idxs_train_l += np.random.choice(idxs, size=num_labeled, replace=False).tolist() # sample without replacement so the labeled points are distinct\n", 384 | "\n", 385 | "x_train_l = x_train[idxs_train_l]\n", 386 | "targets_train_l = targets_train[idxs_train_l]\n", 387 | "print(\"labeled training set dim(%i, %i).\" % x_train_l.shape)\n", 388 | "\n", 389 | "plt.figure(figsize=(12, 7))\n", 390 | "for i in range(num_classes*num_labeled):\n", 391 | "    im = x_train_l[i].reshape((28, 28))\n", 392 | "    plt.subplot(1, num_classes*num_labeled, i + 1)\n", 393 | "    plt.imshow(im, cmap='gray')\n", 394 | "    plt.axis('off')" 395 | ] 396 | } 397 | ], 398 | "metadata": { 399 | "kernelspec": { 400 | "display_name": "Python 2", 401 | "language": "python", 402 | "name": "python2" 403 | }, 404 | "language_info": { 405 | "codemirror_mode": { 406 | "name": "ipython", 407 | "version": 2 408 | }, 409 | "file_extension": ".py", 410 | "mimetype": "text/x-python", 411 | "name": "python", 412 | "nbconvert_exporter": "python", 413 | "pygments_lexer": "ipython2", 414 | "version": "2.7.6" 415 | } 416 | }, 417 | "nbformat": 4, 418 | "nbformat_minor": 0 419 | } 420 | --------------------------------------------------------------------------------