├── .gitignore
├── README.md
├── download-and-setup
│   ├── .gitignore
│   └── README.md
├── lab1_FFN
│   ├── .gitignore
│   ├── confusionmatrix.py
│   ├── lab1_FFN.ipynb
│   └── mnist.npz
├── lab2_CNN
│   ├── .gitignore
│   ├── confusionmatrix.py
│   ├── lab2_CNN.ipynb
│   ├── mnist.npz
│   └── spatial_transformer.py
├── lab3_RNN
│   ├── .gitignore
│   ├── confusionmatrix.py
│   ├── data_generator.py
│   ├── enc-dec.png
│   ├── lab3_RNN.ipynb
│   └── tf_utils.py
├── lab4_Kaggle
│   ├── .gitignore
│   ├── README.md
│   └── lab4_Kaggle.ipynb
└── lab5_AE
    ├── .gitignore
    └── lab5_AE.ipynb

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.npz
3 | *.csv
4 | *.jpg
5 | *.ipynb_checkpoints
6 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # TensorFlow Tutorial - used by Nvidia
2 | 
3 | Learn TensorFlow from scratch with examples and visualizations in interactive jupyter notebooks. Learn to compete in the [Kaggle leaf detection challenge](https://www.kaggle.com/c/leaf-classification)!
4 | 
5 | All exercises are designed to be run on a laptop CPU, but can be accelerated with GPU resources.
6 | 
7 | Labs 1-4 were used in the [Deep Learning using TensorFlow](http://www.eventbrite.com/e/deep-learning-using-tensorflow-tickets-27071720244#) event in London by Nvidia and Persontyle.
8 | 
9 | ## Credits
10 | 
11 | Labs 1, 2, 3 and 5 have been translated from Theano/Lasagne, with minor modifications, from the following repositories: [Nvidia Summer Camp](https://github.com/DeepLearningDTU/nvidia_deep_learning_summercamp_2016) and [02456 deep learning](https://github.com/DeepLearningDTU/02456-deep-learning). Original authors: [skaae](https://github.com/skaae), [casperkaae](https://github.com/casperkaae) and [larsmaaloee](https://github.com/larsmaaloee).
12 | 
13 | Thanks to professor [Ole Winther](http://cogsys.imm.dtu.dk/staff/winther/) for supervision and for sponsoring the labs.
14 | 
15 | ## Setup and Installation
16 | 
17 | Guides for downloading and installing TensorFlow on Linux, OSX and Windows using Docker can be found [here](https://github.com/alrojo/tensorflow-tutorial/tree/master/download-and-setup).
18 | 
19 | ## Material
20 | 
21 | The material consists of 5 labs.
22 | 
23 | ### [Lab1 - FFN](https://github.com/alrojo/tensorflow-tutorial/tree/master/lab1_FFN)
24 | 
25 | Logistic regression and a feed-forward neural network (FFN) on the (in)famous MNIST!
26 | 
27 | Optional reading material from [Michael Nielsen](http://neuralnetworksanddeeplearning.com/), chapters 1-4 (do 3-5 of the optional exercises).
28 | 
29 | ### [Lab2 - CNN](https://github.com/alrojo/tensorflow-tutorial/tree/master/lab2_CNN)
30 | 
31 | Convolutional Neural Network (CNN) and Spatial Transformer on images.
32 | 
33 | Optional reading material from [Michael Nielsen](http://neuralnetworksanddeeplearning.com/), chapter 6 (stop when you reach the section called "Other approaches to deep neural nets").
34 | 
35 | ### [Lab3 - RNN](https://github.com/alrojo/tensorflow-tutorial/tree/master/lab3_RNN)
36 | 
37 | Recurrent Neural Network (RNN) for translation, using an encoder-decoder model and an encoder-decoder with attention.
38 | 
39 | Optional reading material from [Alex Graves](https://www.cs.toronto.edu/~graves/preprint.pdf), chapters 3.1, 3.2 and 4.
40 | 
41 | ### [Lab4 - Kaggle](https://github.com/alrojo/tensorflow-tutorial/tree/master/lab4_Kaggle)
42 | 
43 | Compete in the Kaggle competition [Leaf Classification](https://www.kaggle.com/c/leaf-classification) using FFN, CNN and RNN.
44 | 
45 | ### [Lab5 - AE](https://github.com/alrojo/tensorflow-tutorial/tree/master/lab5_AE)
46 | 
47 | Unsupervised learning with an autoencoder (AE), reconstructing MNIST digits from only two latent variables.
48 | 
49 | Optional reading material from [deeplearningbook.org](http://www.deeplearningbook.org/contents/autoencoders.html), chapter 14.
50 | 
--------------------------------------------------------------------------------
/download-and-setup/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/download-and-setup/.gitignore
--------------------------------------------------------------------------------
/download-and-setup/README.md:
--------------------------------------------------------------------------------
1 | # Download and Setup
2 | 
3 | This tutorial will guide you through installing TensorFlow on Linux, OSX and Windows.
4 | 
5 | # Docker
6 | 
7 | In this tutorial we will use [docker](https://www.docker.com/) containers to handle dependencies and run our code.
8 | Docker allows us to run our code in an encapsulated container.
9 | The language of choice will be Python 2.
10 | 
11 | ## 1. Installation of docker (all operating systems)
12 | 
13 | Instructions for installing docker can be found [here](https://docs.docker.com/engine/installation/#installation); the instructions contain guides for most operating systems.
14 | 
15 | ## 2. Using dockerhub
16 | 
17 | After installing docker you are ready to go! The docker image that you will use for this tutorial is an extension of TensorFlow's own nightly-build docker image (with sklearn, wget, scikit-image etc.).
18 | 
19 | Getting access to docker images on dockerhub (`hub.docker.com`) is easy! When choosing your docker image, just type the dockerhub username followed by the project. In our case the username will be `alrojo` and the repository `tf-sklearn-cpu`. I encourage you to learn the fundamentals of docker; in the [project folder](https://hub.docker.com/r/alrojo/docker-whale/) (on docker hub) I have supplied the `Dockerfile` commands from which the image was created.
20 | 
21 | To run the docker image, type
22 | 
23 | >docker run -it alrojo/tf-sklearn-cpu
24 | 
25 | This starts up a docker container from the `alrojo/tf-sklearn-cpu` image.
26 | The `-it` flag is required for an interactive experience with the docker bash environment.
27 | To exit the interactive environment of the docker container, type
28 | 
29 | >exit
30 | 
31 | (Don't worry! We need to rerun it with some other flags in just a moment.)
32 | 
33 | ## 3. Forwarding port
34 | 
35 | As the docker system runs independently of your host system, we need to enable port forwarding (for jupyter notebook) and sharing of directories.
36 | 
37 | First, make sure that you have downloaded this repository. If not, you can go to `github.com/alrojo/tensorflow_tutorial`, click `Clone or download`, download it as a zip and extract it to your desired folder.
38 | Alternatively, you can run the command
39 | 
40 | >git clone https://github.com/alrojo/tensorflow_tutorial.git
41 | 
42 | In the following, `$PATH\_TO\_FOLDER` should be replaced by the path to your desired folder; an example of a path could be `~/deep\_learning\_courses`.
43 | The name of the repository will be denoted as tensorflow_tutorial.
44 | Given these names, run the following line in your shell.
45 | 
46 | NOTE: windows users might not have a docker-friendly windows-style path; type `pwd` in your docker command window to find your docker-friendly path.
47 | 
48 | >docker run -p 8888:8888 -v $PATH\_TO\_FOLDER/tensorflow_tutorial:/mnt/myproject -it alrojo/tf-sklearn-cpu
49 | 
50 | So if you are using `~/deep\_learning\_courses` as your `$PATH\_TO\_FOLDER`, the command will look like this
51 | 
52 | >docker run -p 8888:8888 -v ~/deep\_learning\_courses/tensorflow_tutorial:/mnt/myproject -it alrojo/tf-sklearn-cpu
53 | 
54 | where `-it` is required for an interactive experience with the docker bash environment, `-p` is for port forwarding and `-v` is for mounting your given folder to the docker container.
55 | 
56 | This should leave you in the root directory of your docker container, with the port forwarded and the directory shared. Run the command
57 | 
58 | >./run\_jupyter.sh
59 | 
60 | Your volume should be available through the `/mnt` folder.
61 | 
62 | Open a new tab in your browser and type localhost:8888 in the browser address bar. Note that you cannot have any other notebooks running simultaneously.
63 | 
64 | NOTE: when using docker toolbox on windows, the port will probably not bind to localhost; instead you must find the IP it binds to by typing the following in your docker prompt
65 | 
66 | >docker-machine ip
67 | 
68 | This should give you an IP that you can use in place of localhost.
69 | 
70 | From within the notebook, click on `/mnt`, then on `myproject`, and now you can start the exercises!
71 | 
72 | ## Installation of nvidia-docker for GPU
73 | 
74 | Use the following [guide](http://cs224d.stanford.edu/) for AWS setup.
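
For a local Nvidia GPU, a rough sketch of the equivalent workflow would be the following (both assumptions, not part of this repository: you have installed [nvidia-docker](https://github.com/NVIDIA/nvidia-docker), and you substitute TensorFlow's official GPU image, since `alrojo/tf-sklearn-cpu` is CPU-only)

>nvidia-docker run -p 8888:8888 -v $PATH\_TO\_FOLDER/tensorflow_tutorial:/mnt/myproject -it tensorflow/tensorflow:latest-gpu

The flags mean the same as in the CPU command above; only the launcher and the image change.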
75 | 
--------------------------------------------------------------------------------
/lab1_FFN/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab1_FFN/.gitignore
--------------------------------------------------------------------------------
/lab1_FFN/confusionmatrix.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | 
3 | 
4 | class ConfusionMatrix:
5 |     """
6 |     Simple confusion matrix class
7 |     row is the true class, column is the predicted class
8 |     """
9 |     def __init__(self, num_classes, class_names=None):
10 |         self.n_classes = num_classes
11 |         if class_names is None:
12 |             self.class_names = map(str, range(num_classes))
13 |         else:
14 |             self.class_names = class_names
15 | 
16 |         # find max class_name and pad
17 |         max_len = max(map(len, self.class_names))
18 |         self.max_len = max_len
19 |         for idx, name in enumerate(self.class_names):
20 |             if len(name) < max_len:  # pad each name to the longest name's length
21 |                 self.class_names[idx] = name + " "*(max_len-len(name))
22 | 
23 |         self.mat = np.zeros((num_classes,num_classes),dtype='int')
24 | 
25 |     def __str__(self):
26 |         # calculate row and column sums
27 |         col_sum = np.sum(self.mat, axis=1)
28 |         row_sum = np.sum(self.mat, axis=0)
29 | 
30 |         s = []
31 | 
32 |         mat_str = self.mat.__str__()
33 |         mat_str = mat_str.replace('[','').replace(']','').split('\n')
34 | 
35 |         for idx, row in enumerate(mat_str):
36 |             if idx == 0:
37 |                 pad = " "
38 |             else:
39 |                 pad = ""
40 |             class_name = self.class_names[idx]
41 |             class_name = " " + class_name + " |"
42 |             row_str = class_name + pad + row
43 |             row_str += " |" + str(col_sum[idx])
44 |             s.append(row_str)
45 | 
46 |         row_sum = [(self.max_len+4)*" "+" ".join(map(str, row_sum))]
47 |         hline = [(1+self.max_len)*" "+"-"*len(row_sum[0])]
48 | 
49 |         s = hline + s + hline + row_sum
50 | 
51 |         # add linebreaks
52 |         s_out = [line+'\n' for line in s]
53 |         return "".join(s_out)
54 | 
55 |     def batch_add(self, targets, preds):
56 |         assert targets.shape == preds.shape
57 |         assert len(targets) == len(preds)
58 |         assert max(targets) < self.n_classes
59 |         assert max(preds) < self.n_classes
60 |         targets = targets.flatten()
61 |         preds = preds.flatten()
62 |         for i in range(len(targets)):
63 |             self.mat[targets[i], preds[i]] += 1
64 | 
65 |     def get_errors(self):
66 |         tp = np.asarray(np.diag(self.mat).flatten(),dtype='float')
67 |         fn = np.asarray(np.sum(self.mat, axis=1).flatten(),dtype='float') - tp
68 |         fp = np.asarray(np.sum(self.mat, axis=0).flatten(),dtype='float') - tp
69 |         tn = np.asarray(np.sum(self.mat)*np.ones(self.n_classes).flatten(),
70 |                         dtype='float') - tp - fn - fp
71 |         return tp, fn, fp, tn
72 | 
73 |     def accuracy(self):
74 |         """
75 |         Calculates global accuracy
76 |         :return: accuracy
77 |         :example: >>> conf = ConfusionMatrix(3)
78 |                   >>> conf.batch_add([0,0,1],[0,0,2])
79 |                   >>> print conf.accuracy()
80 |         """
81 |         tp, _, _, _ = self.get_errors()
82 |         n_samples = np.sum(self.mat)
83 |         return np.sum(tp) / n_samples
84 | 
85 |     def sensitivity(self):
86 |         tp, fn, fp, tn = self.get_errors()  # get_errors returns (tp, fn, fp, tn)
87 |         res = tp / (tp + fn)
88 |         res = res[~np.isnan(res)]
89 |         return res
90 | 
91 |     def specificity(self):
92 |         tp, fn, fp, tn = self.get_errors()
93 |         res = tn / (tn + fp)
94 |         res = res[~np.isnan(res)]
95 |         return res
96 | 
97 |     def positive_predictive_value(self):
98 |         tp, fn, fp, tn = self.get_errors()
99 |         res = tp / (tp + fp)
100 |         res = res[~np.isnan(res)]
101 |         return res
102 | 
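    # Example usage (an illustrative sketch, not part of the original file):
    # build a 3-class matrix, add a batch of (targets, predictions), query metrics.
    #
    #   >>> cm = ConfusionMatrix(3)
    #   >>> cm.batch_add(np.array([0, 1, 2, 2]), np.array([0, 1, 1, 2]))
    #   >>> print cm.accuracy()
    #   0.75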
103 |     def negative_predictive_value(self):
104 |         tp, fn, fp, tn = self.get_errors()
105 |         res = tn / (tn + fn)
106 |         res = res[~np.isnan(res)]
107 |         return res
108 | 
109 |     def false_positive_rate(self):
110 |         tp, fn, fp, tn = self.get_errors()
111 |         res = fp / (fp + tn)
112 |         res = res[~np.isnan(res)]
113 |         return res
114 | 
115 |     def false_discovery_rate(self):
116 |         tp, fn, fp, tn = self.get_errors()
117 |         res = fp / (tp + fp)
118 |         res = res[~np.isnan(res)]
119 |         return res
120 | 
121 |     def F1(self):
122 |         tp, fn, fp, tn = self.get_errors()
123 |         res = (2*tp) / (2*tp + fp + fn)
124 |         res = res[~np.isnan(res)]
125 |         return res
126 | 
127 |     def matthews_correlation(self):
128 |         tp, fn, fp, tn = self.get_errors()
129 |         numerator = tp*tn - fp*fn
130 |         denominator = np.sqrt((tp + fp)*(tp + fn)*(tn + fp)*(tn + fn))
131 |         res = numerator / denominator
132 |         res = res[~np.isnan(res)]
133 |         return res
134 | 
--------------------------------------------------------------------------------
/lab1_FFN/lab1_FFN.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "# Credits\n",
8 |     "TensorFlow translation of [Lasagne tutorial](https://github.com/DeepLearningDTU/nvidia_deep_learning_summercamp_2016/blob/master/lab1/lab1_FFN.ipynb). Thanks to [skaae](https://github.com/skaae), [casperkaae](https://github.com/casperkaae) and [larsmaaloee](https://github.com/larsmaaloee)."
9 |    ]
10 |   },
11 |   {
12 |    "cell_type": "markdown",
13 |    "metadata": {},
14 |    "source": [
15 |     "# Dependencies and supporting functions\n",
16 |     "Load the dependencies and supporting functions by running the code block below."
17 |    ]
18 |   },
19 |   {
20 |    "cell_type": "code",
21 |    "execution_count": null,
22 |    "metadata": {
23 |     "collapsed": false
24 |    },
25 |    "outputs": [],
26 |    "source": [
27 |     "%matplotlib inline\n",
28 |     "import matplotlib\n",
29 |     "import numpy as np\n",
30 |     "import matplotlib.pyplot as plt\n",
31 |     "import sklearn.datasets\n",
32 |     "import tensorflow as tf\n",
33 |     "from tensorflow.python.framework.ops import reset_default_graph\n",
34 |     "\n",
35 |     "# Do not worry about the code below for now, it is used for plotting later\n",
36 |     "def plot_decision_boundary(pred_func, X, y):\n",
37 |     "    #from https://github.com/dennybritz/nn-from-scratch/blob/master/nn-from-scratch.ipynb\n",
38 |     "    # Set min and max values and give it some padding\n",
39 |     "    x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5\n",
40 |     "    y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5\n",
41 |     "    \n",
42 |     "    h = 0.01\n",
43 |     "    # Generate a grid of points with distance h between them\n",
44 |     "    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))\n",
45 |     "    \n",
46 |     "    yy = yy.astype('float32')\n",
47 |     "    xx = xx.astype('float32')\n",
48 |     "    # Predict the function value for the whole grid\n",
49 |     "    Z = pred_func(np.c_[xx.ravel(), yy.ravel()])[:,0]\n",
50 |     "    Z = Z.reshape(xx.shape)\n",
51 |     "    # Plot the contour and training examples\n",
52 |     "    plt.figure()\n",
53 |     "    plt.contourf(xx, yy, Z, cmap=plt.cm.RdBu)\n",
54 |     "    plt.scatter(X[:, 0], X[:, 1], c=-y, cmap=plt.cm.Spectral)\n",
55 |     "\n",
56 |     "def onehot(t, num_classes):\n",
57 |     "    out = np.zeros((t.shape[0], num_classes))\n",
58 |     "    for row, col in enumerate(t):\n",
59 |     "        out[row, col] = 1\n",
60 |     "    return out"
61 |    ]
62 |   },
63 |   {
64 |    "cell_type": "markdown",
65 |    "metadata": {},
66 |    "source": [
67 |     "# Neural networks 101\n",
"In this notebook you will implement a simple neural network in TensorFlow utilizing the graph building and automatic differentiation engine of TensorFlow. We assume that you are already familiar with backpropagation (if not please see [Andrej Karpathy](http://cs.stanford.edu/people/karpathy/) or [Michal Nielsen](http://neuralnetworksanddeeplearning.com/chap2.html).\n", 69 | "We'll not spend much time on how TensorFlow works, but you can refer to [this short tutorial](https://www.tensorflow.org/versions/r0.10/get_started/basic_usage.html) if you are interested, or [the python documentation](https://www.tensorflow.org/versions/r0.10/api_docs/index.html).\n", 70 | "\n", 71 | "(Additionally, for the ambitious people we have previously made an assignment where you will implement both the forward and backpropagation in a neural network by hand, https://github.com/DTU-deeplearning/day1-NN/blob/master/exercises_1.ipynb)(Ole, skal jeg også implementere det?)\n", 72 | "\n", 73 | "In this exercise we'll start right away by defining logistic regression model in TensorFlow. Some details of TensorFlow can be a bit confusing, however you'll pick them up when you worked with it for some time. We'll initially start with a simple 2-D and 2-class classification problem where the class decision boundary can be visualized. Initially we show that logistic regression can only separate classes linearly. Adding a Non-linear hidden layer to the algorithm permits nonlinear class separation. If time permits we'll continue on to implement a fully conencted neural network to classify the (in)famous MNIST dataset consisting of images of hand written digits." 74 | ] 75 | }, 76 | { 77 | "cell_type": "markdown", 78 | "metadata": { 79 | "collapsed": true 80 | }, 81 | "source": [ 82 | "# Problem \n", 83 | "We'll initally demonstrate the that MLPs can classify non-linear problems whereas simple logistic regression cannot. For ease of visualization and computationl speed we initially experiment on the simple 2D half-moon dataset." 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": null, 89 | "metadata": { 90 | "collapsed": false 91 | }, 92 | "outputs": [], 93 | "source": [ 94 | "# Generate a dataset and plot it\n", 95 | "np.random.seed(0)\n", 96 | "num_samples = 300\n", 97 | "\n", 98 | "X, y = sklearn.datasets.make_moons(num_samples, noise=0.20)\n", 99 | "\n", 100 | "X_tr = X[:100].astype('float32')\n", 101 | "X_val = X[100:200].astype('float32')\n", 102 | "X_te = X[200:].astype('float32')\n", 103 | "\n", 104 | "y_tr = y[:100].astype('int32')\n", 105 | "y_val = y[100:200].astype('int32')\n", 106 | "y_te = y[200:].astype('int32')\n", 107 | "\n", 108 | "plt.scatter(X_tr[:,0], X_tr[:,1], s=40, c=y_tr, cmap=plt.cm.BuGn)\n", 109 | "\n", 110 | "print X.shape, y.shape\n", 111 | "\n", 112 | "num_features = X_tr.shape[-1]\n", 113 | "num_output = 2" 114 | ] 115 | }, 116 | { 117 | "cell_type": "markdown", 118 | "metadata": {}, 119 | "source": [ 120 | "# From Logistic Regression to \"Deep Learning\" in Lasagne\n", 121 | "The code implements logistic regression in TensorFlow. In section __Assignments Half Moon__ you are asked to modify the code into a neural network.\n", 122 | "\n", 123 | "The building blocks of TensorFlow are variables and operations, with these we can form computational graphs that form neural networks.\n", 124 | "\n", 125 | "The [tf.placeholder](https://www.tensorflow.org/versions/r0.10/api_docs/python/io_ops.html#placeholder) allows us to feed our input data to the computational graph. 
125 |     "The [tf.placeholder](https://www.tensorflow.org/versions/r0.10/api_docs/python/io_ops.html#placeholder) allows us to feed our input data into the computational graph. We can constrain the shape of a placeholder so that it only takes a tensor of that shape. Note that it is common to provide ``None`` for the first dimension, which allows us to vary the batch size at runtime.\n",
126 |     "\n",
127 |     "The [tf.Variable](https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#Variable) allows us to store and update Tensors in our graph. Variables are used to build the weights of our neural network. Note that we will use a wrapper called [`tf.get_variable`](https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#get_variable) throughout this tutorial.\n",
128 |     "\n",
129 |     "The [tf.Operation](https://www.tensorflow.org/versions/r0.10/api_docs/python/framework.html#Operation) allows us to perform operations on tensors, resulting in new tensors, such as when computing the logistic regression implemented below:\n",
130 |     "\n",
131 |     "$$y = nonlinearity(xW + b)$$\n",
132 |     "\n",
133 |     "where $x$ is the input tensor, $y$ is the output tensor and $\\{W, b\\}$ are the weights (variable tensors). The weights are initialized with an initializer of our choice (check [tensorflow's API](https://www.tensorflow.org/versions/r0.10/api_docs/index.html) for more).\n",
134 |     "```x``` has shape ```[batch_size, num_features]```, ```W``` has shape ```[num_features, num_units]```, ```b``` has shape ```[num_units]```, and ```y``` then has shape ```[batch_size, num_units]```. For example, with a batch of 32 half-moon points we have ```num_features = 2``` and ```num_output = 2```, so ```x``` is ```[32, 2]```, ```W``` is ```[2, 2]```, ```b``` is ```[2]``` and ```y``` is ```[32, 2]```.\n",
135 |     "\n",
136 |     "NOTE: to make building neural networks easier, TensorFlow's [contrib](https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#layers-contrib) wraps TensorFlow functionality to support various operations, such as [convolutions](https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#convolution2d), [batch_norm](https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#batch_norm) and [fully_connected](https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#fully_connected).\n",
137 |     "\n",
138 |     "In this first exercise we will use basic TensorFlow functions so that you can learn how to build networks from scratch. This will help you later if you want to build your own custom operations."
139 |    ]
140 |   },
141 |   {
142 |    "cell_type": "markdown",
143 |    "metadata": {},
144 |    "source": [
145 |     "## TensorFlow Playground\n",
146 |     "\n",
147 |     "If you are new to Neural Networks, start by using the [TensorFlow playground](http://playground.tensorflow.org/) to familiarize yourself with hidden layers, hidden units, activations, learning rate, etc."
148 |    ]
149 |   },
150 |   {
151 |    "cell_type": "code",
152 |    "execution_count": null,
153 |    "metadata": {
154 |     "collapsed": false
155 |    },
156 |    "outputs": [],
157 |    "source": [
158 |     "# resets the graph, needed when initializing weights multiple times, like in this notebook\n",
159 |     "reset_default_graph()\n",
160 |     "\n",
161 |     "# Setting up placeholder, this is where your data enters the graph!\n",
162 |     "x_pl = tf.placeholder(tf.float32, [None, num_features])\n",
163 |     "\n",
164 |     "# Setting up variables, these variables are weights in your network that can be updated while running our graph.\n",
165 |     "# Notice, to make a hidden layer, the weights need to have the following dimensionality\n",
166 |     "# W[number_of_units_going_in, number_of_units_going_out]\n",
167 |     "# b[number_of_units_going_out]\n",
168 |     "# in the example below we have 2 input units (num_features) and 2 output units (num_output)\n",
169 |     "# so our weights become W[2, 2], b[2]\n",
170 |     "# if we want to make a hidden layer with 100 units, we need to define the shape of the\n",
171 |     "# first weight to W[2, 100], b[100] and the shape of the second weight to W[100, 2], b[2]\n",
172 |     "\n",
173 |     "# defining our initializer for our weights from a normal distribution (mean=0, std=0.1)\n",
174 |     "weight_initializer = tf.truncated_normal_initializer(stddev=0.1)\n",
175 |     "with tf.variable_scope('l_1'): # if you run it more than once, reuse has to be True\n",
176 |     "    W_1 = tf.get_variable('W', [num_features, num_output], # change num_output to 100 for mlp\n",
177 |     "                          initializer=weight_initializer)\n",
178 |     "    b_1 = tf.get_variable('b', [num_output], # change num_output to 100 for mlp\n",
179 |     "                          initializer=tf.constant_initializer(0.0))\n",
180 |     "# with tf.variable_scope('l_2'):\n",
181 |     "#     W_2 = tf.get_variable('W', [100, num_output],\n",
182 |     "#                           initializer=weight_initializer)\n",
183 |     "#     b_2 = tf.get_variable('b', [num_output],\n",
184 |     "#                           initializer=tf.constant_initializer(0.0))\n",
185 |     "\n",
186 |     "# Setting up ops, these ops will define edges along our computational graph\n",
187 |     "# The below ops will compute a logistic regression, but can be modified to compute\n",
188 |     "# a neural network\n",
189 |     "\n",
190 |     "l_1 = tf.matmul(x_pl, W_1) + b_1\n",
191 |     "# to make a hidden layer we need a nonlinearity\n",
192 |     "# l_1_nonlinear = tf.nn.relu(l_1)\n",
193 |     "# the layer before the softmax should not have a nonlinearity\n",
194 |     "# l_2 = tf.matmul(l_1_nonlinear, W_2) + b_2\n",
195 |     "y = tf.nn.softmax(l_1) # change to l_2 for MLP"
196 |    ]
197 |   },
198 |   {
199 |    "cell_type": "code",
200 |    "execution_count": null,
201 |    "metadata": {
202 |     "collapsed": false
203 |    },
204 |    "outputs": [],
205 |    "source": [
206 |     "# knowing how to print your tensors and ops is useful, here are some examples\n",
207 |     "print(\"---placeholders---\")\n",
208 |     "print(x_pl.name)\n",
209 |     "print(x_pl)\n",
210 |     "print\n",
211 |     "print(\"---weights---\")\n",
212 |     "print(W_1.name)\n",
213 |     "print(W_1.get_shape())\n",
214 |     "print(W_1)\n",
215 |     "print\n",
216 |     "print(b_1.name)\n",
217 |     "print(b_1)\n",
218 |     "print(b_1.get_shape())\n",
219 |     "print\n",
220 |     "print(\"---ops---\")\n",
221 |     "print(l_1.name)\n",
222 |     "print(l_1)\n",
223 |     "print\n",
224 |     "print(y.name)\n",
225 |     "print(y)"
226 |    ]
227 |   },
228 |   {
229 |    "cell_type": "markdown",
230 |    "metadata": {},
231 |    "source": [
232 |     "After we have built the network, we have our tensors in our default [graph](https://www.tensorflow.org/versions/r0.10/api_docs/python/framework.html#Graph), which we can use to build the cost function and training part.\n",
233 |     "\n",
234 |     "Further, using the default graph we can print its operations and variables."
235 |    ]
236 |   },
237 |   {
238 |    "cell_type": "code",
239 |    "execution_count": null,
240 |    "metadata": {
241 |     "collapsed": false
242 |    },
243 |    "outputs": [],
244 |    "source": [
245 |     "# y_ is a placeholder variable taking on the value of the target batch.\n",
246 |     "y_ = tf.placeholder(tf.float32, [None, num_output])\n",
247 |     "\n",
248 |     "# computing cross entropy per sample\n",
249 |     "cross_entropy = -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])\n",
250 |     "\n",
251 |     "# averaging over samples\n",
252 |     "cross_entropy = tf.reduce_mean(cross_entropy)"
253 |    ]
254 |   },
255 |   {
256 |    "cell_type": "markdown",
257 |    "metadata": {},
258 |    "source": [
259 |     "Notice that the weights and operations defined in the `l_1` variable scope are saved under the `l_1` name prefix in the graph."
260 |    ]
261 |   },
262 |   {
263 |    "cell_type": "code",
264 |    "execution_count": null,
265 |    "metadata": {
266 |     "collapsed": false
267 |    },
268 |    "outputs": [],
269 |    "source": [
270 |     "# using the graph to print ops\n",
271 |     "print(\"operations\")\n",
272 |     "operations = [op.name for op in tf.get_default_graph().get_operations()]\n",
273 |     "print(operations)\n",
274 |     "print\n",
275 |     "# variables are accessed through tensorflow\n",
276 |     "print(\"variables\")\n",
277 |     "variables = [var.name for var in tf.all_variables()]\n",
278 |     "print(variables)"
279 |    ]
280 |   },
281 |   {
282 |    "cell_type": "markdown",
283 |    "metadata": {},
284 |    "source": [
285 |     "To train our neural network, we need to update the parameters in the direction of the negative gradient w.r.t. the cost function we defined earlier.\n",
286 |     "We can use `tf.train.Optimizer` to get the gradients (using `compute_gradients`) for all parameters in the network w.r.t. ``cross_entropy``.\n",
287 |     "Imagine that `cross_entropy` is a function and we want to go downhill. We go downhill by changing the value of the parameters in the direction of the negative gradient. \n",
288 |     "\n",
289 |     "Finally, we can use the built-in `minimize` to calculate the stochastic gradient descent (SGD) update rule for each parameter in the network.\n",
290 |     "\n",
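    "Concretely, for a parameter $\\theta$ with learning rate $\\eta$ and cost $J(\\theta)$, one SGD step computes\n",
    "\n",
    "$$\\theta \\leftarrow \\theta - \\eta \\nabla_{\\theta} J(\\theta)$$\n",
    "\n",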
291 |     "Here's a small animation of gradient descent: http://imgur.com/a/Hqolp, showing e.g. why saddle points might be difficult.\n",
292 |     "To use other optimizers, check out which optimizers TensorFlow [supports](https://www.tensorflow.org/versions/r0.10/api_docs/python/train.html#optimizers)."
293 |    ]
294 |   },
295 |   {
296 |    "cell_type": "code",
297 |    "execution_count": null,
298 |    "metadata": {
299 |     "collapsed": false
300 |    },
301 |    "outputs": [],
302 |    "source": [
303 |     "# Defining our optimizer (try with different optimizers here!)\n",
304 |     "optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)\n",
305 |     "\n",
306 |     "# Computing our gradients\n",
307 |     "grads_and_vars = optimizer.compute_gradients(cross_entropy)\n",
308 |     "\n",
309 |     "# Applying the gradients\n",
310 |     "train_op = optimizer.apply_gradients(grads_and_vars)\n",
311 |     "\n",
312 |     "# Notice, alternatively you can use train_op = optimizer.minimize(cross_entropy)\n",
313 |     "# instead of the two steps above"
314 |    ]
315 |   },
316 |   {
317 |    "cell_type": "markdown",
318 |    "metadata": {},
319 |    "source": [
320 |     "Next, we make the prediction functions, such that we can get an accuracy measure over a batch."
321 |    ]
322 |   },
323 |   {
324 |    "cell_type": "code",
325 |    "execution_count": null,
326 |    "metadata": {
327 |     "collapsed": true
328 |    },
329 |    "outputs": [],
330 |    "source": [
331 |     "# making a binary vector of correct (1) and incorrect (0) predictions\n",
332 |     "correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))\n",
333 |     "\n",
334 |     "# averaging the binary vector gives the accuracy\n",
335 |     "accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))"
336 |    ]
337 |   },
338 |   {
339 |    "cell_type": "markdown",
340 |    "metadata": {},
341 |    "source": [
342 |     "The next step is to utilize our `train_op` function repeatedly in order to optimize our weights `W_1` and `b_1` to make the best possible linear separation of the half-moon dataset.\n",
343 |     "\n",
344 |     "It is worth reading a short introduction to TensorFlow [sessions](https://www.tensorflow.org/versions/r0.10/api_docs/python/client.html#Session) before continuing to the next code block. Sessions are used to run TensorFlow graphs; they use `fetches` to decide which parts of the graph to compute and `feed_dicts` to load data into the graph."
345 |    ]
346 |   },
347 |   {
348 |    "cell_type": "code",
349 |    "execution_count": null,
350 |    "metadata": {
351 |     "collapsed": false
352 |    },
353 |    "outputs": [],
354 |    "source": [
355 |     "# defining a function to make predictions using our classifier\n",
356 |     "def pred(X_in, sess):\n",
357 |     "    # first we must define what data to give it\n",
358 |     "    feed_dict = {x_pl: X_in}\n",
359 |     "    # secondly our fetches\n",
360 |     "    fetches = [y]\n",
361 |     "    # utilizing the given session (ref. sess) to compute results\n",
362 |     "    res = sess.run(fetches, feed_dict)\n",
363 |     "    # res is a list, with each index corresponding to the matching element in fetches\n",
364 |     "    return res[0]\n",
365 |     "\n",
366 |     "# Training loop\n",
367 |     "num_epochs = 1000\n",
368 |     "\n",
369 |     "train_cost, val_cost, val_acc = [],[],[]\n",
370 |     "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n",
371 |     "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n",
372 |     "with tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts)) as sess:\n",
373 |     "    \n",
374 |     "    # initializing all variables\n",
375 |     "    init = tf.initialize_all_variables()\n",
376 |     "    sess.run(init)\n",
377 |     "    plot_decision_boundary(lambda x: pred(x, sess), X_val, y_val)\n",
378 |     "    plt.title(\"Untrained Classifier\")\n",
379 |     "    for e in range(num_epochs):\n",
380 |     "        ### TRAINING ###\n",
381 |     "        # what to feed to our train_op\n",
382 |     "        # notice we onehot encode our targets to change shape from (batch,) -> (batch, num_output)\n",
383 |     "        feed_dict_train = {x_pl: X_tr, y_: onehot(y_tr, num_output)}\n",
384 |     "        \n",
385 |     "        # deciding which parts to fetch, train_op makes the classifier \"train\"\n",
386 |     "        fetches_train = [train_op, cross_entropy]\n",
387 |     "        \n",
388 |     "        # running the train_op\n",
389 |     "        res = sess.run(fetches=fetches_train, feed_dict=feed_dict_train)\n",
390 |     "        # storing cross entropy (second fetch argument, so index=1)\n",
391 |     "        train_cost += [res[1]]\n",
392 |     "        \n",
393 |     "        ### VALIDATING ###\n",
394 |     "        # what to feed our accuracy op\n",
395 |     "        feed_dict_valid = {x_pl: X_val, y_: onehot(y_val, num_output)}\n",
396 |     "\n",
397 |     "        # deciding which parts to fetch\n",
398 |     "        fetches_valid = [cross_entropy, accuracy]\n",
399 |     "\n",
400 |     "        # running the validation\n",
401 |     "        res = sess.run(fetches=fetches_valid, feed_dict=feed_dict_valid)\n",
402 |     "        val_cost += [res[0]]\n",
403 |     "        val_acc += [res[1]]\n",
404 |     "\n",
405 |     "        if e % 100 == 0:\n",
406 |     "            print \"Epoch %i, Train Cost: %0.3f\\tVal Cost: %0.3f\\t Val acc: %0.3f\"%(e, train_cost[-1],val_cost[-1],val_acc[-1])\n",
407 |     "\n",
408 |     "    ### TESTING ###\n",
409 |     "    # what to feed our accuracy op\n",
410 |     "    feed_dict_test = {x_pl: X_te, y_: onehot(y_te, num_output)}\n",
411 |     "\n",
412 |     "    # deciding which parts to fetch\n",
413 |     "    fetches_test = [cross_entropy, accuracy]\n",
414 |     "\n",
415 |     "    # running the test\n",
416 |     "    res = sess.run(fetches=fetches_test, feed_dict=feed_dict_test)\n",
417 |     "    test_cost = res[0]\n",
418 |     "    test_acc = res[1]\n",
419 |     "    print \"\\nTest Cost: %0.3f\\tTest Accuracy: %0.3f\"%(test_cost, test_acc)\n",
420 |     "    \n",
421 |     "    # For plotting purposes\n",
422 |     "    plot_decision_boundary(lambda x: pred(x, sess), X_te, y_te)\n",
423 |     "\n",
424 |     "# notice: we do not need to use the session environment anymore, so returning from it.\n",
425 |     "plt.title(\"Trained Classifier\")\n",
426 |     "\n",
427 |     "epoch = np.arange(len(train_cost))\n",
428 |     "plt.figure()\n",
429 |     "plt.plot(epoch,train_cost,'r',epoch,val_cost,'b')\n",
430 |     "plt.legend(['Train Loss','Val Loss'])\n",
431 |     "plt.xlabel('Updates'), plt.ylabel('Loss')"
432 |    ]
433 |   },
434 |   {
435 |    "cell_type": "markdown",
436 |    "metadata": {},
437 |    "source": [
438 |     "# Assignments Half Moon\n",
439 |     "\n",
440 |     "    1) A linear logistic classifier is only able to create a linear decision boundary. Change the logistic classifier into a (non-linear) neural network by inserting a dense hidden layer between the input and output layers of the model.\n",
441 |     "    \n",
442 |     "    2) Experiment with multiple hidden layers or more / fewer hidden units. What happens to the decision boundary?\n",
443 |     "    \n",
444 |     "    3) Overfitting: When increasing the number of hidden layers / units, the neural network will fit the training data better by creating a highly nonlinear decision boundary. If the model is too complex, it will often generalize poorly to new data (validation and test sets). Can you observe this from the training and validation errors? \n",
445 |     "    \n",
446 |     "    4) We used the vanilla stochastic gradient descent algorithm for parameter updates. This is usually slow to converge, and more sophisticated pseudo-second-order methods usually work better. Try changing the optimizer to [adam or momentum](https://www.tensorflow.org/versions/r0.10/api_docs/python/train.html#AdamOptimizer)."
447 |    ]
448 |   },
449 |   {
450 |    "cell_type": "markdown",
451 |    "metadata": {},
452 |    "source": [
453 |     "# Optional: MNIST dataset\n",
454 |     "MNIST is a dataset that is often used for benchmarking. The MNIST dataset consists of 70,000 images of handwritten digits from 0-9. The dataset is split into a 50,000-image training set, a 10,000-image validation set and a 10,000-image test set. The images are 28x28 pixels, where each pixel represents a grayscale value between 0 and 255 (0=black and 255=white).\n",
455 |     "\n",
456 |     "### Primer for the afternoon...\n",
457 |     "We use a feedforward neural network to classify the 28x28 MNIST images. ``num_features`` is therefore 28x28=784.\n",
458 |     "That is, we represent each image as a vector. The ordering of the pixels in the vector does not matter, so we could permute all images using the same permutation and still get the same performance. (You are of course encouraged to try this using ``numpy.random.permutation`` to get a random permutation :)). This task is therefore called the _permutation invariant_ MNIST. Obviously this throws away a lot of structure in the data. After lunch we'll fix this with the convolutional neural network, which encodes prior knowledge about data that has either spatial or temporal structure. \n",
459 |     "\n",
460 |     "### Ballpark estimates of hyperparameters\n",
461 |     "__Optimizers:__\n",
462 |     "    1. SGD + Momentum: learning rate 1.0 - 0.1 \n",
463 |     "    2. ADAM: learning rate 3*1e-4 - 1e-5\n",
464 |     "    3. RMSPROP: somewhere between SGD and ADAM\n",
465 |     "\n",
466 |     "__Regularization:__\n",
467 |     "    1. Dropout. Dropout rate 0.1-0.5\n",
468 |     "    2. L2 and L1 regularization - https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#regularizers.\n",
469 |     "       Not used that often in deep learning; typical scales are 1e-4 - 1e-8.\n",
470 |     "    3. Batchnorm: Batchnorm also acts as a regularizer - https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#batch_norm\n",
471 |     "       Often very useful (faster and better convergence)\n",
472 |     "    \n",
473 |     "    \n",
474 |     "__Parameter initialization__\n",
475 |     "    Parameter initialization is extremely important. TensorFlow has a lot of different initializers; check the TensorFlow API [documentation](https://www.tensorflow.org/versions/r0.10/api_docs/index.html). Often used initializers are\n",
476 |     "    1. He - (not available in TensorFlow's API)\n",
477 |     "    2. Glorot - https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#xavier_initializer\n",
478 |     "    3. Uniform or Normal with small scale (0.1 - 0.01) - https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#random_normal_initializer\n",
479 |     "    4. Orthogonal (I find that this works very well for RNNs) - (not available in TensorFlow's API)\n",
480 |     "\n",
481 |     "Bias is nearly always initialized to zero, using the [tf.constant_initializer](https://www.tensorflow.org/versions/r0.10/api_docs/python/state_ops.html#constant_initializer).\n",
482 |     "\n",
483 |     "__Number of hidden units and network structure__\n",
484 |     "    Probably as big a network as possible, and then apply regularization. You'll have to experiment :). One rarely goes below 512 units for feedforward networks, unless you are training on a CPU...\n",
485 |     "    There is some research into stochastic depth networks: https://arxiv.org/pdf/1603.09382v2.pdf, but in general this is trial and error.\n",
486 |     "\n",
487 |     "__Nonlinearity__: [The most commonly used nonlinearities are](https://www.tensorflow.org/versions/r0.10/api_docs/python/nn.html#activation-functions)\n",
488 |     "    \n",
489 |     "    1. ReLU\n",
490 |     "    2. Leaky ReLU. Same as ReLU, but with a small slope (0.1 in the plot below) for negative inputs\n",
491 |     "    3. Elu\n",
492 |     "    4. Sigmoid. Used if your output is binary; it is not used in the hidden layers. Squashes the output between 0 and 1\n",
493 |     "    5. Softmax. Used as the output if you have a classification problem. Normalizes the output to sum to 1.\n",
494 |     "\n",
495 |     "\n",
496 |     "See the plot below.\n",
497 |     "\n",
498 |     "__mini-batch size__\n",
499 |     "    Usually people use 16-256. Bigger is not always better. With a smaller mini-batch size you get more updates, and your model might converge faster. Also, small batch sizes use less memory -> you can train a model with more parameters.\n",
500 |     "\n",
501 |     "Hyperparameters can be found by experience (guessing) or by some search procedure. Random search is easy to implement and performs decently: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf .\n",
502 |     "More advanced search procedures include [SPEARMINT](https://github.com/JasperSnoek/spearmint) and many others.\n",
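    "\n",
    "Below is a minimal sketch of random search over two of the hyperparameters above; the helper ``train_and_validate`` is hypothetical and stands in for building, training and validating one model:\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def random_search(num_trials=10):\n",
    "    best_config, best_acc = None, -np.inf\n",
    "    for _ in range(num_trials):\n",
    "        # sample the learning rate log-uniformly, the batch size from a small set\n",
    "        lr = 10**np.random.uniform(-5, -1)\n",
    "        batch_size = int(np.random.choice([16, 32, 64, 128, 256]))\n",
    "        val_acc = train_and_validate(lr, batch_size)  # hypothetical helper\n",
    "        if val_acc > best_acc:\n",
    "            best_config, best_acc = (lr, batch_size), val_acc\n",
    "    return best_config, best_acc\n",
    "```"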
503 |    ]
504 |   },
505 |   {
506 |    "cell_type": "code",
507 |    "execution_count": null,
508 |    "metadata": {
509 |     "collapsed": false
510 |    },
511 |    "outputs": [],
512 |    "source": [
513 |     "# PLOT OF DIFFERENT OUTPUT UNITS\n",
514 |     "x = np.linspace(-6, 6, 100)\n",
515 |     "relu = lambda x: np.maximum(0, x)\n",
516 |     "leaky_relu = lambda x: np.maximum(0, x) + 0.1*np.minimum(0, x) # probably a slow implementation....\n",
517 |     "elu = lambda x: (x > 0)*x + (1 - (x > 0))*(np.exp(x) - 1) \n",
518 |     "sigmoid = lambda x: (1+np.exp(-x))**(-1)\n",
519 |     "def softmax(w, t = 1.0):\n",
520 |     "    e = np.exp(w)\n",
521 |     "    dist = e / np.sum(e)\n",
522 |     "    return dist\n",
523 |     "x_softmax = softmax(x)\n",
524 |     "\n",
525 |     "plt.figure(figsize=(6,6))\n",
526 |     "plt.plot(x, relu(x), label='ReLU', lw=2)\n",
527 |     "plt.plot(x, leaky_relu(x), label='Leaky ReLU',lw=2)\n",
528 |     "plt.plot(x, elu(x), label='Elu', lw=2)\n",
529 |     "plt.plot(x, sigmoid(x), label='Sigmoid',lw=2)\n",
530 |     "plt.legend(loc=2, fontsize=16)\n",
531 |     "plt.title('Non-linearities', fontsize=20)\n",
532 |     "plt.ylim([-2, 5])\n",
533 |     "plt.xlim([-6, 6])\n",
534 |     "\n",
535 |     "# softmax\n",
536 |     "# assert that all class probabilities sum to one\n",
537 |     "print np.sum(x_softmax)\n",
538 |     "assert abs(1.0 - x_softmax.sum()) < 1e-8"
539 |    ]
540 |   },
541 |   {
542 |    "cell_type": "markdown",
543 |    "metadata": {
544 |     "collapsed": true
545 |    },
546 |    "source": [
547 |     "## MNIST\n",
548 |     "First let's load the MNIST dataset and plot a few examples:"
549 |    ]
550 |   },
551 |   {
552 |    "cell_type": "code",
553 |    "execution_count": null,
554 |    "metadata": {
555 |     "collapsed": false
556 |    },
557 |    "outputs": [],
558 |    "source": [
559 |     "# To speed up training we'll only work on a subset of the data\n",
560 |     "data = np.load('mnist.npz')\n",
561 |     "num_classes = 10\n",
562 |     "x_train = data['X_train'][:1000].astype('float32')\n",
563 |     "targets_train = data['y_train'][:1000].astype('int32')\n",
564 |     "\n",
565 |     "x_valid = data['X_valid'][:500].astype('float32')\n",
566 |     "targets_valid = data['y_valid'][:500].astype('int32')\n",
567 |     "\n",
568 |     "x_test = data['X_test'][:500].astype('float32')\n",
569 |     "targets_test = data['y_test'][:500].astype('int32')\n",
570 |     "\n",
571 |     "print \"Information on dataset\"\n",
572 |     "print \"x_train\", x_train.shape\n",
573 |     "print \"targets_train\", targets_train.shape\n",
574 |     "print \"x_valid\", x_valid.shape\n",
575 |     "print \"targets_valid\", targets_valid.shape\n",
576 |     "print \"x_test\", x_test.shape\n",
577 |     "print \"targets_test\", targets_test.shape"
578 |    ]
579 |   },
580 |   {
581 |    "cell_type": "code",
582 |    "execution_count": null,
583 |    "metadata": {
584 |     "collapsed": false,
585 |     "scrolled": true
586 |    },
587 |    "outputs": [],
588 |    "source": [
589 |     "# plot a few MNIST examples\n",
590 |     "idx = 0\n",
591 |     "canvas = np.zeros((28*10, 10*28))\n",
592 |     "for i in range(10):\n",
593 |     "    for j in range(10):\n",
594 |     "        canvas[i*28:(i+1)*28, j*28:(j+1)*28] = x_train[idx].reshape((28, 28))\n",
595 |     "        idx += 1\n",
596 |     "plt.figure(figsize=(7, 7))\n",
597 |     "plt.axis('off')\n",
598 |     "plt.imshow(canvas, cmap='gray')\n",
599 |     "plt.title('MNIST handwritten digits')\n",
600 |     "plt.show()"
601 |    ]
602 |   },
603 |   {
604 |    "cell_type": "code",
605 |    "execution_count": null,
606 |    "metadata": {
607 |     "collapsed": false
608 |    },
609 |    "outputs": [],
610 |    "source": [
611 |     "# Hyperparameters\n",
612 |     "\n",
613 |     "num_classes = 10\n",
614 |     "num_l1 = 512\n",
615 |     "num_features = x_train.shape[1]\n",
616 |     "\n",
617 |     "# resetting the graph ...\n",
618 |     "reset_default_graph()\n",
619 |     "\n",
620 |     "# Setting up placeholder, this is where your data enters the graph!\n",
621 |     "x_pl = tf.placeholder(tf.float32, [None, num_features])\n",
622 |     "\n",
623 |     "# defining our weight initializer\n",
624 |     "weight_initializer = tf.truncated_normal_initializer(stddev=0.1)\n",
625 |     "\n",
626 |     "# Setting up the trainable weights of the network\n",
627 |     "with tf.variable_scope('l_1'):\n",
628 |     "    W_1 = tf.get_variable('W', [num_features, num_l1],\n",
629 |     "                          initializer=weight_initializer)\n",
630 |     "    b_1 = tf.get_variable('b', [num_l1],\n",
631 |     "                          initializer=tf.constant_initializer(0.0))\n",
632 |     "\n",
633 |     "with tf.variable_scope('l_2'):\n",
634 |     "    W_2 = tf.get_variable('W', [num_l1, num_classes],\n",
635 |     "                          initializer=weight_initializer)\n",
636 |     "    b_2 = tf.get_variable('b', [num_classes],\n",
637 |     "                          initializer=tf.constant_initializer(0.0))\n",
638 |     "\n",
639 |     "\n",
640 |     "# Building the layers of the neural network\n",
641 |     "l1 = tf.matmul(x_pl, W_1) + b_1\n",
642 |     "l1_nonlinear = tf.nn.elu(l1) # you can try with various activation functions!\n",
643 |     "l2 = tf.matmul(l1_nonlinear, W_2) + b_2 # note: the hidden layer's nonlinearity must be used here\n",
644 |     "y = tf.nn.softmax(l2)"
645 |    ]
646 |   },
647 |   {
648 |    "cell_type": "code",
649 |    "execution_count": null,
650 |    "metadata": {
651 |     "collapsed": true
652 |    },
653 |    "outputs": [],
654 |    "source": [
655 |     "# y_ is a placeholder variable taking on the value of the target batch.\n",
656 |     "y_ = tf.placeholder(tf.float32, [None, num_classes])\n",
657 |     "\n",
658 |     "# computing cross entropy per sample\n",
659 |     "cross_entropy = -tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1])\n",
660 |     "\n",
661 |     "# averaging over samples\n",
662 |     "loss_tn = tf.reduce_mean(cross_entropy)\n",
663 |     "\n",
664 |     "# L2 regularization\n",
665 |     "#reg_scale = 0.0001\n",
666 |     "#regularize = tf.contrib.layers.l2_regularizer(reg_scale)\n",
667 |     "#params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)\n",
668 |     "#reg_term = sum([regularize(param) for param in params])\n",
669 |     "#loss_tn += reg_term\n",
670 |     "\n",
671 |     "# defining our optimizer\n",
672 |     "optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.1)\n",
673 |     "\n",
674 |     "# computing and applying the gradients in one step\n",
675 |     "train_op = optimizer.minimize(loss_tn)\n",
676 |     "\n",
677 |     "# notice, alternatively you can use compute_gradients and apply_gradients\n",
678 |     "# explicitly, as in the half-moon example above"
679 |    ]
680 |   },
681 |   {
682 |    "cell_type": "code",
683 |    "execution_count": null,
684 |    "metadata": {
685 |     "collapsed": false
686 |    },
687 |    "outputs": [],
688 |    "source": [
689 |     "# Test the forward pass\n",
690 |     "x = np.random.normal(0,1, (45, 28*28)).astype('float32') # dummy data\n",
691 |     "\n",
692 |     "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n",
693 |     "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n",
694 |     "# initialize the Session\n",
695 |     "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n",
696 |     "sess.run(tf.initialize_all_variables())\n",
697 |     "res = sess.run(fetches=[y], feed_dict={x_pl: x})\n",
698 |     "print \"y\", res[0].shape"
699 |    ]
700 |   },
701 |   {
702 |    "cell_type": "markdown",
703 |    "metadata": {},
704 |    "source": [
705 |     "# Build the training loop.\n",
706 |     "We train the network by calculating the gradient w.r.t. the cost function and updating the parameters in the direction of the negative gradient.\n",
\n", 707 | "\n", 708 | "\n", 709 | "When training neural network you always use mini batches. Instead of calculating the average gradient using the entire dataset you approximate the gradient using a mini-batch of typically 16 to 256 samples. The paramters are updated after each mini batch. Networks converges much faster using minibatches because the paramters are updated more often.\n", 710 | "\n", 711 | "We build a loop that iterates over the training data. Remember that the parameters are updated each time ``f_train`` is called." 712 | ] 713 | }, 714 | { 715 | "cell_type": "code", 716 | "execution_count": null, 717 | "metadata": { 718 | "collapsed": false 719 | }, 720 | "outputs": [], 721 | "source": [ 722 | "# using confusionmatrix to handle \n", 723 | "from confusionmatrix import ConfusionMatrix\n", 724 | "\n", 725 | "# setting hyperparameters and gettings epoch sizes\n", 726 | "batch_size = 100\n", 727 | "num_epochs = 100\n", 728 | "num_samples_train = x_train.shape[0]\n", 729 | "num_batches_train = num_samples_train // batch_size\n", 730 | "num_samples_valid = x_valid.shape[0]\n", 731 | "num_batches_valid = num_samples_valid // batch_size\n", 732 | "\n", 733 | "# setting up lists for handling loss/accuracy\n", 734 | "train_acc, train_loss = [], []\n", 735 | "valid_acc, valid_loss = [], []\n", 736 | "test_acc, test_loss = [], []\n", 737 | "cur_loss = 0\n", 738 | "loss = []\n", 739 | "## TRAINING ##\n", 740 | "for epoch in range(num_epochs):\n", 741 | " #Forward->Backprob->Update params\n", 742 | " cur_loss = 0\n", 743 | " for i in range(num_batches_train):\n", 744 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 745 | " x_batch = x_train[idx]\n", 746 | " target_batch = targets_train[idx]\n", 747 | " feed_dict_train = {x_pl: x_batch, y_: onehot(target_batch, num_classes)}\n", 748 | " fetches_train = [train_op, loss_tn]\n", 749 | " res = sess.run(fetches=fetches_train, feed_dict=feed_dict_train)\n", 750 | " batch_loss = res[1]\n", 751 | " cur_loss += batch_loss\n", 752 | " loss += [cur_loss/batch_size]\n", 753 | " \n", 754 | " confusion_valid = ConfusionMatrix(num_classes)\n", 755 | " confusion_train = ConfusionMatrix(num_classes)\n", 756 | "\n", 757 | " ### EVAL - TRAIN ###\n", 758 | " for i in range(num_batches_train):\n", 759 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 760 | " x_batch = x_train[idx]\n", 761 | " targets_batch = targets_train[idx]\n", 762 | " # what to feed our accuracy op\n", 763 | " feed_dict_eval_train = {x_pl: x_batch, y_: onehot(targets_batch, num_classes)}\n", 764 | " # deciding which parts to fetch\n", 765 | " fetches_eval_train = [y]\n", 766 | " # running the validation\n", 767 | " res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n", 768 | " # collecting and storing predictions\n", 769 | " net_out = res[0]\n", 770 | " preds = np.argmax(net_out, axis=-1)\n", 771 | " confusion_train.batch_add(targets_batch, preds)\n", 772 | "\n", 773 | " ### EVAL - VALIDATION ###\n", 774 | " confusion_valid = ConfusionMatrix(num_classes)\n", 775 | " for i in range(num_batches_valid):\n", 776 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 777 | " x_batch = x_valid[idx]\n", 778 | " targets_batch = targets_valid[idx]\n", 779 | " # what to feed our accuracy op\n", 780 | " feed_dict_eval_train = {x_pl: x_batch, y_: onehot(targets_batch, num_classes)}\n", 781 | " # deciding which parts to fetch\n", 782 | " fetches_eval_train = [y]\n", 783 | " # running the validation\n", 784 | " res = sess.run(fetches=fetches_eval_train, 
784 |     "        res = sess.run(fetches=fetches_eval_valid, feed_dict=feed_dict_eval_valid)\n",
785 |     "        # collecting and storing predictions\n",
786 |     "        net_out = res[0]\n",
787 |     "        preds = np.argmax(net_out, axis=-1) \n",
788 |     "        confusion_valid.batch_add(targets_batch, preds)\n",
789 |     "    \n",
790 |     "    train_acc_cur = confusion_train.accuracy()\n",
791 |     "    valid_acc_cur = confusion_valid.accuracy()\n",
792 |     "\n",
793 |     "    train_acc += [train_acc_cur]\n",
794 |     "    valid_acc += [valid_acc_cur]\n",
795 |     "    print \"Epoch %i : Train Loss %e , Train acc %f, Valid acc %f \" \\\n",
796 |     "    % (epoch+1, loss[-1], train_acc_cur, valid_acc_cur)\n",
797 |     "    \n",
798 |     "    \n",
799 |     "epoch = np.arange(len(train_acc))\n",
800 |     "plt.figure()\n",
801 |     "plt.plot(epoch,train_acc,'r',epoch,valid_acc,'b')\n",
802 |     "plt.legend(['Train Acc','Val Acc'])\n",
803 |     "plt.xlabel('Epochs'), plt.ylabel('Acc')"
804 |    ]
805 |   },
806 |   {
807 |    "cell_type": "markdown",
808 |    "metadata": {
809 |     "collapsed": true
810 |    },
811 |    "source": [
812 |     "# More questions\n",
813 |     "\n",
814 |     "1. Do you see overfitting? Google overfitting if you don't know how to spot it.\n",
815 |     "2. Try to regularize your network by penalizing the L2 or L1 norm of the network parameters. [Read the docs for more info](https://www.tensorflow.org/versions/r0.10/api_docs/python/contrib.layers.html#regularizers)"
816 |    ]
817 |   }
818 |  ],
819 |  "metadata": {
820 |   "kernelspec": {
821 |    "display_name": "Python 2",
822 |    "language": "python",
823 |    "name": "python2"
824 |   },
825 |   "language_info": {
826 |    "codemirror_mode": {
827 |     "name": "ipython",
828 |     "version": 2
829 |    },
830 |    "file_extension": ".py",
831 |    "mimetype": "text/x-python",
832 |    "name": "python",
833 |    "nbconvert_exporter": "python",
834 |    "pygments_lexer": "ipython2",
835 |    "version": "2.7.6"
836 |   }
837 |  },
838 |  "nbformat": 4,
839 |  "nbformat_minor": 0
840 | }
841 | 
--------------------------------------------------------------------------------
/lab1_FFN/mnist.npz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab1_FFN/mnist.npz
--------------------------------------------------------------------------------
/lab2_CNN/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab2_CNN/.gitignore
--------------------------------------------------------------------------------
/lab2_CNN/confusionmatrix.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | 
3 | 
4 | class ConfusionMatrix:
5 |     """
6 |     Simple confusion matrix class
7 |     row is the true class, column is the predicted class
8 |     """
9 |     def __init__(self, num_classes, class_names=None):
10 |         self.n_classes = num_classes
11 |         if class_names is None:
12 |             self.class_names = map(str, range(num_classes))
13 |         else:
14 |             self.class_names = class_names
15 | 
16 |         # find max class_name and pad
17 |         max_len = max(map(len, self.class_names))
18 |         self.max_len = max_len
19 |         for idx, name in enumerate(self.class_names):
20 |             if len(name) < max_len:  # pad each name to the longest name's length
21 |                 self.class_names[idx] = name + " "*(max_len-len(name))
22 | 
23 |         self.mat = np.zeros((num_classes,num_classes),dtype='int')
24 | 
25 |     def __str__(self):
26 |         # calculate row and column sums
27 |         col_sum = np.sum(self.mat, axis=1)
28 |         row_sum = np.sum(self.mat, axis=0)
29 | 
30 |         s = []
31 | 
32 |         mat_str = self.mat.__str__()
33 |         mat_str = mat_str.replace('[','').replace(']','').split('\n')
34 | 
35 |         for idx, row in enumerate(mat_str):
36 |             if idx == 0:
37 |                 pad = " "
38 |             else:
39 |                 pad = ""
40 |             class_name = self.class_names[idx]
41 |             class_name = " " + class_name + " |"
42 |             row_str = class_name + pad + row
43 |             row_str += " |" + str(col_sum[idx])
44 |             s.append(row_str)
45 | 
46 |         row_sum = [(self.max_len+4)*" "+" ".join(map(str, row_sum))]
47 |         hline = [(1+self.max_len)*" "+"-"*len(row_sum[0])]
48 | 
49 |         s = hline + s + hline + row_sum
50 | 
51 |         # add linebreaks
52 |         s_out = [line+'\n' for line in s]
53 |         return "".join(s_out)
54 | 
55 |     def batch_add(self, targets, preds):
56 |         assert targets.shape == preds.shape
57 |         assert len(targets) == len(preds)
58 |         assert max(targets) < self.n_classes
59 |         assert max(preds) < self.n_classes
60 |         targets = targets.flatten()
61 |         preds = preds.flatten()
62 |         for i in range(len(targets)):
63 |             self.mat[targets[i], preds[i]] += 1
64 | 
65 |     def get_errors(self):
66 |         tp = np.asarray(np.diag(self.mat).flatten(),dtype='float')
67 |         fn = np.asarray(np.sum(self.mat, axis=1).flatten(),dtype='float') - tp
68 |         fp = np.asarray(np.sum(self.mat, axis=0).flatten(),dtype='float') - tp
69 |         tn = np.asarray(np.sum(self.mat)*np.ones(self.n_classes).flatten(),
70 |                         dtype='float') - tp - fn - fp
71 |         return tp, fn, fp, tn
72 | 
73 |     def accuracy(self):
74 |         """
75 |         Calculates global accuracy
76 |         :return: accuracy
77 |         :example: >>> conf = ConfusionMatrix(3)
78 |                   >>> conf.batch_add([0,0,1],[0,0,2])
79 |                   >>> print conf.accuracy()
80 |         """
81 |         tp, _, _, _ = self.get_errors()
82 |         n_samples = np.sum(self.mat)
83 |         return np.sum(tp) / n_samples
84 | 
85 |     def sensitivity(self):
86 |         tp, fn, fp, tn = self.get_errors()  # get_errors returns (tp, fn, fp, tn)
87 |         res = tp / (tp + fn)
88 |         res = res[~np.isnan(res)]
89 |         return res
90 | 
91 |     def specificity(self):
92 |         tp, fn, fp, tn = self.get_errors()
93 |         res = tn / (tn + fp)
94 |         res = res[~np.isnan(res)]
95 |         return res
96 | 
97 |     def positive_predictive_value(self):
98 |         tp, fn, fp, tn = self.get_errors()
99 |         res = tp / (tp + fp)
100 |         res = res[~np.isnan(res)]
101 |         return res
102 | 
103 |     def negative_predictive_value(self):
104 |         tp, fn, fp, tn = self.get_errors()
105 |         res = tn / (tn + fn)
106 |         res = res[~np.isnan(res)]
107 |         return res
108 | 
109 |     def false_positive_rate(self):
110 |         tp, fn, fp, tn = self.get_errors()
111 |         res = fp / (fp + tn)
112 |         res = res[~np.isnan(res)]
113 |         return res
114 | 
115 |     def false_discovery_rate(self):
116 |         tp, fn, fp, tn = self.get_errors()
117 |         res = fp / (tp + fp)
118 |         res = res[~np.isnan(res)]
119 |         return res
120 | 
121 |     def F1(self):
122 |         tp, fn, fp, tn = self.get_errors()
123 |         res = (2*tp) / (2*tp + fp + fn)
124 |         res = res[~np.isnan(res)]
125 |         return res
126 | 
127 |     def matthews_correlation(self):
128 |         tp, fn, fp, tn = self.get_errors()
129 |         numerator = tp*tn - fp*fn
130 |         denominator = np.sqrt((tp + fp)*(tp + fn)*(tn + fp)*(tn + fn))
131 |         res = numerator / denominator
132 |         res = res[~np.isnan(res)]
133 |         return res
134 | 
--------------------------------------------------------------------------------
/lab2_CNN/lab2_CNN.ipynb:
--------------------------------------------------------------------------------
1 | {
2 |  "cells": [
3 |   {
4 |    "cell_type": "markdown",
5 |    "metadata": {},
6 |    "source": [
7 |     "# Credits\n",
Thanks to [skaae](https://github.com/skaae), [casperkaae](https://github.com/casperkaae) and [larsmaaloee](https://github.com/larsmaaloee)." 9 | ] 10 | }, 11 | { 12 | "cell_type": "markdown", 13 | "metadata": {}, 14 | "source": [ 15 | "# Dependencies and supporting functions\n", 16 | "Load dependencies and supporting functions by running the code block below." 17 | ] 18 | }, 19 | { 20 | "cell_type": "code", 21 | "execution_count": null, 22 | "metadata": { 23 | "collapsed": false 24 | }, 25 | "outputs": [], 26 | "source": [ 27 | "%matplotlib inline\n", 28 | "import matplotlib\n", 29 | "import numpy as np\n", 30 | "import matplotlib.pyplot as plt\n", 31 | "import sklearn.datasets\n", 32 | "import tensorflow as tf\n", 33 | "from tensorflow.python.framework.ops import reset_default_graph\n", 34 | "\n", 35 | "def onehot(t, num_classes):\n", 36 | " out = np.zeros((t.shape[0], num_classes))\n", 37 | " for row, col in enumerate(t):\n", 38 | " out[row, col] = 1\n", 39 | " return out" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "# Convolutional Neural Networks 101\n", 47 | "\n", 48 | "Convolutional neural networks are one of the most successful types of neural networks for image recognition and were an integral part of reigniting the interest in neural networks. \n", 49 | "\n", 50 | "In this lab we'll experiment with inserting 2D-convolution layers in the fully connected neural networks introduced in LAB1. We'll further experiment with stacking of convolution layers, max pooling and strided convolutions, which are all important techniques in current convolutional neural network architectures. Lastly we'll try to visualize the learned convolution filters and try to understand what kind of features they learn to recognize.\n", 51 | "\n", 52 | "\n", 53 | "If you are unfamiliar with the convolution operation, https://github.com/vdumoulin/conv_arithmetic has a nice visualization of different convolution variants. For a more in-depth tutorial please see http://cs231n.github.io/convolutional-networks/ or http://neuralnetworksanddeeplearning.com/chap6.html. Lastly, if you are ambitious and want to implement a convolutional neural network from scratch, please see this exercise from our deep learning summer school last year: https://github.com/DTU-deeplearning/day2-Conv" 54 | ] 55 | }, 56 | { 57 | "cell_type": "code", 58 | "execution_count": null, 59 | "metadata": { 60 | "collapsed": false 61 | }, 62 | "outputs": [], 63 | "source": [ 64 | "#LOAD the mnist data. 
To speed up training we'll only work on a subset of the data.\n", 65 | "#Note that we reshape the data from (nsamples, num_features)= (nsamples, nchannels*rows*cols) -> (nsamples, nchannels, rows, cols)\n", 66 | "# in order to retain the spatial arrangements of the pixels\n", 67 | "data = np.load('mnist.npz')\n", 68 | "num_classes = 10\n", 69 | "nchannels,rows,cols = 1,28,28\n", 70 | "x_train = data['X_train'][:1000].astype('float32')\n", 71 | "x_train = x_train.reshape((-1,nchannels,rows,cols))\n", 72 | "targets_train = data['y_train'][:1000].astype('int32')\n", 73 | "\n", 74 | "x_valid = data['X_valid'][:500].astype('float32')\n", 75 | "x_valid = x_valid.reshape((-1,nchannels,rows,cols))\n", 76 | "targets_valid = data['y_valid'][:500].astype('int32')\n", 77 | "\n", 78 | "x_test = data['X_test'][:500].astype('float32')\n", 79 | "x_test = x_test.reshape((-1,nchannels,rows,cols))\n", 80 | "targets_test = data['y_test'][:500].astype('int32')\n", 81 | "\n", 82 | "print \"Information on dataset\"\n", 83 | "print \"x_train\", x_train.shape\n", 84 | "print \"targets_train\", targets_train.shape\n", 85 | "print \"x_valid\", x_valid.shape\n", 86 | "print \"targets_valid\", targets_valid.shape\n", 87 | "print \"x_test\", x_test.shape\n", 88 | "print \"targets_test\", targets_test.shape" 89 | ] 90 | }, 91 | { 92 | "cell_type": "code", 93 | "execution_count": null, 94 | "metadata": { 95 | "collapsed": false 96 | }, 97 | "outputs": [], 98 | "source": [ 99 | "#plot a few MNIST examples\n", 100 | "idx = 0\n", 101 | "canvas = np.zeros((28*10, 10*28))\n", 102 | "for i in range(10):\n", 103 | " for j in range(10):\n", 104 | " canvas[i*28:(i+1)*28, j*28:(j+1)*28] = x_train[idx].reshape((28, 28))\n", 105 | " idx += 1\n", 106 | "plt.figure(figsize=(7, 7))\n", 107 | "plt.imshow(canvas, cmap='gray')\n", 108 | "plt.title('MNIST handwritten digits')\n", 109 | "plt.show()" 110 | ] 111 | }, 112 | { 113 | "cell_type": "markdown", 114 | "metadata": {}, 115 | "source": [ 116 | "### Documentation on contrib layers\n", 117 | "Check out the [github page](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/layers/python/layers/layers.py) for information on contrib layers (not well documented in their api)." 
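The cell above links to the contrib source because the 0.x-era API reference is sparse. As a quick reference, here is a minimal sketch of how these contrib layers chain together; it only uses the calls and argument names that appear later in this notebook, and the shapes in the comments are illustrative (28x28 NHWC input, 'SAME' padding):

```python
# Sketch only: 0.x-era tf.contrib.layers API, mirroring the calls used below.
import tensorflow as tf
from tensorflow.contrib.layers import convolution2d, max_pool2d, flatten, fully_connected
from tensorflow.python.ops.nn import relu, softmax

x = tf.placeholder(tf.float32, [None, 28, 28, 1])           # NHWC input
h = convolution2d(x, num_outputs=16, kernel_size=[5, 5],
                  stride=[1, 1], scope="conv")               # -> (None, 28, 28, 16)
h = max_pool2d(h, kernel_size=[2, 2], scope="pool")          # -> (None, 14, 14, 16)
h = flatten(h, scope="flat")                                 # -> (None, 14*14*16)
y = fully_connected(h, 10, activation_fn=softmax, scope="y") # class probabilities
print(y.get_shape())                                         # (?, 10)
```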
118 | ] 119 | }, 120 | { 121 | "cell_type": "code", 122 | "execution_count": null, 123 | "metadata": { 124 | "collapsed": false 125 | }, 126 | "outputs": [], 127 | "source": [ 128 | "from tensorflow.contrib.layers import fully_connected, convolution2d, flatten, batch_norm, max_pool2d, dropout\n", 129 | "from tensorflow.python.ops.nn import relu, elu, relu6, sigmoid, tanh, softmax" 130 | ] 131 | }, 132 | { 133 | "cell_type": "code", 134 | "execution_count": null, 135 | "metadata": { 136 | "collapsed": false 137 | }, 138 | "outputs": [], 139 | "source": [ 140 | "# define a simple feed forward neural network\n", 141 | "\n", 142 | "# hyperameters of the model\n", 143 | "num_classes = 10\n", 144 | "channels = x_train.shape[1]\n", 145 | "height = x_train.shape[2]\n", 146 | "width = x_train.shape[3]\n", 147 | "num_filters_conv1 = 16\n", 148 | "kernel_size_conv1 = [5, 5] # [height, width]\n", 149 | "stride_conv1 = [1, 1] # [stride_height, stride_width]\n", 150 | "num_l1 = 100\n", 151 | "# resetting the graph ...\n", 152 | "reset_default_graph()\n", 153 | "\n", 154 | "# Setting up placeholder, this is where your data enters the graph!\n", 155 | "x_pl = tf.placeholder(tf.float32, [None, channels, height, width])\n", 156 | "l_reshape = tf.transpose(x_pl, [0, 2, 3, 1]) # TensorFlow uses NHWC instead of NCHW\n", 157 | "#is_training = tf.placeholder(tf.bool)#used for dropout\n", 158 | "\n", 159 | "# Building the layers of the neural network\n", 160 | "# we define the variable scope, so we more easily can recognise our variables later\n", 161 | "#l_conv1 = convolution2d(l_reshape, num_filters_conv1, kernel_size_conv1, stride_conv1, scope=\"l_conv1\")\n", 162 | "l_flatten = flatten(l_reshape, scope=\"flatten\") # use l_conv1 instead of l_reshape\n", 163 | "l1 = fully_connected(l_flatten, num_l1, activation_fn=relu, scope=\"l1\")\n", 164 | "#l1 = dropout(l1, is_training=is_training, scope=\"dropout\")\n", 165 | "y = fully_connected(l1, num_classes, activation_fn=softmax, scope=\"y\")" 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": { 172 | "collapsed": true 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "# y_ is a placeholder variable taking on the value of the target batch.\n", 177 | "y_ = tf.placeholder(tf.float32, [None, num_classes])\n", 178 | "\n", 179 | "# computing cross entropy per sample\n", 180 | "cross_entropy = -tf.reduce_sum(y_ * tf.log(y+1e-8), reduction_indices=[1])\n", 181 | "\n", 182 | "# averaging over samples\n", 183 | "cross_entropy = tf.reduce_mean(cross_entropy)\n", 184 | "\n", 185 | "# defining our optimizer\n", 186 | "optimizer = tf.train.AdamOptimizer(learning_rate=0.001)\n", 187 | "\n", 188 | "# applying the gradients\n", 189 | "train_op = optimizer.minimize(cross_entropy)" 190 | ] 191 | }, 192 | { 193 | "cell_type": "code", 194 | "execution_count": null, 195 | "metadata": { 196 | "collapsed": false 197 | }, 198 | "outputs": [], 199 | "source": [ 200 | "#Test the forward pass\n", 201 | "x = np.random.normal(0,1, (45, 1,28,28)).astype('float32') #dummy data\n", 202 | "\n", 203 | "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n", 204 | "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n", 205 | "# initialize the Session\n", 206 | "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n", 207 | "sess.run(tf.initialize_all_variables())\n", 208 | "res = sess.run(fetches=[y], feed_dict={x_pl: x})\n", 209 | "#res = sess.run(fetches=[y], feed_dict={x_pl: x, is_training: 
False}) # for when using dropout\n", 210 | "print \"y\", res[0].shape" 211 | ] 212 | }, 213 | { 214 | "cell_type": "code", 215 | "execution_count": null, 216 | "metadata": { 217 | "collapsed": false 218 | }, 219 | "outputs": [], 220 | "source": [ 221 | "#Training Loop\n", 222 | "from confusionmatrix import ConfusionMatrix\n", 223 | "batch_size = 100\n", 224 | "num_epochs = 10\n", 225 | "num_samples_train = x_train.shape[0]\n", 226 | "num_batches_train = num_samples_train // batch_size\n", 227 | "num_samples_valid = x_valid.shape[0]\n", 228 | "num_batches_valid = num_samples_valid // batch_size\n", 229 | "\n", 230 | "train_acc, train_loss = [], []\n", 231 | "valid_acc, valid_loss = [], []\n", 232 | "test_acc, test_loss = [], []\n", 233 | "cur_loss = 0\n", 234 | "loss = []\n", 235 | "try:\n", 236 | " for epoch in range(num_epochs):\n", 237 | " #Forward->Backprob->Update params\n", 238 | " cur_loss = 0\n", 239 | " for i in range(num_batches_train):\n", 240 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 241 | " x_batch = x_train[idx]\n", 242 | " target_batch = targets_train[idx]\n", 243 | " feed_dict_train = {x_pl: x_batch, y_: onehot(target_batch, num_classes)}\n", 244 | " #feed_dict_train = {x_pl: x_batch, y_: onehot(target_batch, num_classes), is_training: True}\n", 245 | " fetches_train = [train_op, cross_entropy]\n", 246 | " res = sess.run(fetches=fetches_train, feed_dict=feed_dict_train)\n", 247 | " batch_loss = res[1] #this will do the complete backprob pass\n", 248 | " cur_loss += batch_loss\n", 249 | " loss += [cur_loss/batch_size]\n", 250 | "\n", 251 | " confusion_valid = ConfusionMatrix(num_classes)\n", 252 | " confusion_train = ConfusionMatrix(num_classes)\n", 253 | "\n", 254 | " for i in range(num_batches_train):\n", 255 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 256 | " x_batch = x_train[idx]\n", 257 | " targets_batch = targets_train[idx]\n", 258 | " # what to feed our accuracy op\n", 259 | " feed_dict_eval_train = {x_pl: x_batch}\n", 260 | " #feed_dict_eval_train = {x_pl: x_batch, is_training: False}\n", 261 | " # deciding which parts to fetch\n", 262 | " fetches_eval_train = [y]\n", 263 | " # running the validation\n", 264 | " res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n", 265 | " # collecting and storing predictions\n", 266 | " net_out = res[0] \n", 267 | " preds = np.argmax(net_out, axis=-1) \n", 268 | " confusion_train.batch_add(targets_batch, preds)\n", 269 | "\n", 270 | " confusion_valid = ConfusionMatrix(num_classes)\n", 271 | " for i in range(num_batches_valid):\n", 272 | " idx = range(i*batch_size, (i+1)*batch_size)\n", 273 | " x_batch = x_valid[idx]\n", 274 | " targets_batch = targets_valid[idx]\n", 275 | " # what to feed our accuracy op\n", 276 | " feed_dict_eval_train = {x_pl: x_batch}\n", 277 | " #feed_dict_eval_train = {x_pl: x_batch, is_training: False}\n", 278 | " # deciding which parts to fetch\n", 279 | " fetches_eval_train = [y]\n", 280 | " # running the validation\n", 281 | " res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n", 282 | " # collecting and storing predictions\n", 283 | " net_out = res[0]\n", 284 | " preds = np.argmax(net_out, axis=-1) \n", 285 | "\n", 286 | " confusion_valid.batch_add(targets_batch, preds)\n", 287 | "\n", 288 | " train_acc_cur = confusion_train.accuracy()\n", 289 | " valid_acc_cur = confusion_valid.accuracy()\n", 290 | "\n", 291 | " train_acc += [train_acc_cur]\n", 292 | " valid_acc += [valid_acc_cur]\n", 293 | " print \"Epoch %i : Train Loss %e , Train acc 
%f, Valid acc %f \" \\\n", 294 | " % (epoch+1, loss[-1], train_acc_cur, valid_acc_cur)\n", 295 | "except KeyboardInterrupt:\n", 296 | " pass\n", 297 | " \n", 298 | "\n", 299 | "#get test set score\n", 300 | "confusion_test = ConfusionMatrix(num_classes)\n", 301 | "# what to feed our accuracy op\n", 302 | "feed_dict_eval_train = {x_pl: x_test}\n", 303 | "#feed_dict_eval_train = {x_pl: x_test, is_training: False}\n", 304 | "# deciding which parts to fetch\n", 305 | "fetches_eval_train = [y]\n", 306 | "# running the validation\n", 307 | "res = sess.run(fetches=fetches_eval_train, feed_dict=feed_dict_eval_train)\n", 308 | "# collecting and storing predictions\n", 309 | "net_out = res[0] \n", 310 | "preds = np.argmax(net_out, axis=-1) \n", 311 | "confusion_test.batch_add(targets_test, preds)\n", 312 | "print \"\\nTest set Acc: %f\" %(confusion_test.accuracy())\n", 313 | "\n", 314 | "\n", 315 | "epoch = np.arange(len(train_acc))\n", 316 | "plt.figure()\n", 317 | "plt.plot(epoch,train_acc,'r',epoch,valid_acc,'b')\n", 318 | "plt.legend(['Train Acc','Val Acc'])\n", 319 | "plt.xlabel('Epochs'), plt.ylabel('Acc'), plt.ylim([0.75,1.03])" 320 | ] 321 | }, 322 | { 323 | "cell_type": "markdown", 324 | "metadata": {}, 325 | "source": [ 326 | "# Assignment 1\n", 327 | "\n", 328 | " 1) Note the performance of the standard feedforward neural network. Add a 2D convolution layer before the dense hidden layer and confirm that it increases the generalization performance of the network (try num_filters=16 and filter_size=5 as a starting point). \n", 329 | " \n", 330 | " 2) Can the performance be increased even further by stacking more convolution layers?\n", 331 | " \n", 332 | " 3) Maxpooling is a technique for decreasing the spatial resolution of an image while retaining the important features. Effectively this gives local translational invariance and reduces the computation by a factor of four, which is usually desirable in a classification setting. Try to either: \n", 333 | " \n", 334 | " a) add a maxpool layer (add argument pool_size=2) after the convolution layer, or\n", 335 | " b) add stride=2 to the arguments of the convolution layer. \n", 336 | " Verify that this decreases the spatial dimension of the image (print l_conv1.get_shape() or l_maxpool.get_shape()). Does this increase the performance of the network (you may need to stack multiple layers or increase the number of filters to increase performance)?\n", 337 | " \n" 338 | ] 339 | }, 340 | { 341 | "cell_type": "markdown", 342 | "metadata": {}, 343 | "source": [ 344 | "# Visualization of filters\n", 345 | "Convolution filters can be interpreted as spatial feature detectors picking up different image features such as edges, corners etc. Below we provide code for visualization of the filters. The best results are obtained with fairly large filters of size 9 and either 16 or 36 filters. 
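Before running the visualization, here is a minimal sketch of what Assignment 1 above asks for. It reuses the notebook's own names (l_reshape, flatten, convolution2d, max_pool2d); the kernel and stride values are just the suggested starting points, and the shape comments assume the 28x28 NHWC input with 'SAME' padding:

```python
# Sketch for Assignment 1 (not the reference solution): insert a conv layer,
# optionally followed by pooling or striding, before the flatten.
l_conv1 = convolution2d(l_reshape, num_outputs=16, kernel_size=[5, 5],
                        stride=[1, 1], scope="l_conv1")          # -> (None, 28, 28, 16)
l_maxpool = max_pool2d(l_conv1, kernel_size=[2, 2],
                       scope="l_maxpool")                        # -> (None, 14, 14, 16)
print(l_conv1.get_shape())   # verify the spatial dimensions
print(l_maxpool.get_shape()) # halved by the 2x2 pooling
l_flatten = flatten(l_maxpool, scope="flatten")
```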
" 346 | ] 347 | }, 348 | { 349 | "cell_type": "code", 350 | "execution_count": null, 351 | "metadata": { 352 | "collapsed": false 353 | }, 354 | "outputs": [], 355 | "source": [ 356 | "# to start with we print the names of the weights in our graph\n", 357 | "# to see what operations we are allowed to perform on the variables in our graph, try:\n", 358 | "#print(dir(tf.all_variables()[0]))\n", 359 | "# you will notice it has \"name\" and \"value\", which we will build a dictionary from\n", 360 | "names_and_vars = {var.name: sess.run(var.value()) for var in tf.all_variables()}\n", 361 | "print(names_and_vars.keys())\n", 362 | "# getting the name was easy, just use .name on the variable object\n", 363 | "# getting the value in a numpy array format is slightly more tricky\n", 364 | "# we need to first get a variable object, then turn it into a tensor with .value()\n", 365 | "# and the evaluate the tensor with sess.run(...)" 366 | ] 367 | }, 368 | { 369 | "cell_type": "code", 370 | "execution_count": null, 371 | "metadata": { 372 | "collapsed": false 373 | }, 374 | "outputs": [], 375 | "source": [ 376 | "### ERROR - If you get a key error, then you need to define l_conv1 in your model!\n", 377 | "if not u'l_conv1/weights:0' in names_and_vars:\n", 378 | " print \"You need to go back and define a convolutional layer in the network.\"\n", 379 | "else:\n", 380 | " np_W = names_and_vars[u'l_conv1/weights:0'] # get the filter values from the first conv layer\n", 381 | " print np_W.shape, \"i.e. the shape is filter_size, filter_size, num_channels, num_filters\"\n", 382 | " filter_size, _, num_channels, num_filters = np_W.shape\n", 383 | " n = int(num_filters**0.5)\n", 384 | "\n", 385 | " # reshaping the last dimension to be n by n\n", 386 | " np_W_res = np_W.reshape(filter_size, filter_size, num_channels, n, n)\n", 387 | " fig, ax = plt.subplots(n,n)\n", 388 | " print \"learned filter values\"\n", 389 | " for i in range(n):\n", 390 | " for j in range(n):\n", 391 | " ax[i,j].imshow(np_W_res[:,:,0,i,j], cmap='gray',interpolation='none')\n", 392 | " ax[i,j].xaxis.set_major_formatter(plt.NullFormatter())\n", 393 | " ax[i,j].yaxis.set_major_formatter(plt.NullFormatter())\n", 394 | "\n", 395 | "\n", 396 | " idx = 1\n", 397 | " plt.figure()\n", 398 | " plt.imshow(x_train[idx,0],cmap='gray',interpolation='none')\n", 399 | " plt.title('Inut Image')\n", 400 | " plt.show()\n", 401 | "\n", 402 | " #visalize the filters convolved with an input image\n", 403 | " from scipy.signal import convolve2d\n", 404 | " np_W_res = np_W.reshape(filter_size, filter_size, num_channels, n, n)\n", 405 | " fig, ax = plt.subplots(n,n,figsize=(9,9))\n", 406 | " print \"Response from input image convolved with the filters\"\n", 407 | " for i in range(n):\n", 408 | " for j in range(n):\n", 409 | " ax[i,j].imshow(convolve2d(x_train[1,0],np_W_res[:,:,0,i,j],mode='same'),\n", 410 | " cmap='gray',interpolation='none')\n", 411 | " ax[i,j].xaxis.set_major_formatter(plt.NullFormatter())\n", 412 | " ax[i,j].yaxis.set_major_formatter(plt.NullFormatter())" 413 | ] 414 | }, 415 | { 416 | "cell_type": "markdown", 417 | "metadata": {}, 418 | "source": [ 419 | "# Assignment 2\n", 420 | "\n", 421 | "The visualized filters will likely look most like noise due to the small amount of training data.\n", 422 | "\n", 423 | " 1) Try to use 10000 traning examples instead and visualise the filters again\n", 424 | " \n", 425 | " 2) Dropout is a very usefull technique for preventing overfitting. 
Try to add a DropoutLayer after the convolution layer and hidden layer. This should increase both performance and the \"visual appeal\" of the filters\n", 426 | " \n", 427 | " 3) Batch normalization is a recent innovation for improving generalization performance. Try to insert batch normalization layers into the network to improve performance. \n", 428 | " \n", 429 | " \n" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "# More Fun with convolutional networks\n", 437 | "### Get the data" 438 | ] 439 | }, 440 | { 441 | "cell_type": "code", 442 | "execution_count": null, 443 | "metadata": { 444 | "collapsed": false 445 | }, 446 | "outputs": [], 447 | "source": [ 448 | "!wget -N https://s3.amazonaws.com/lasagne/recipes/datasets/mnist_cluttered_60x60_6distortions.npz" 449 | ] 450 | }, 451 | { 452 | "cell_type": "markdown", 453 | "metadata": {}, 454 | "source": [ 455 | "In the data the each mnist digit (20x20 pixels) has been placed randomly in a 60x60 canvas. To make the task harder each canvas has then been cluttered with small pieces of digits. In this task it is helpfull for a network if it can focus only on the digit and ignore the rest.\n", 456 | "\n", 457 | "The ``TransformerLayer`` lets us do this. The transformer layer learns an affine transformation which lets the network zoom, rotate and skew. If you are interested you should read the paper, but the main idea is that you can let a small convolutional network determine the the parameters of the affine transformation. You then apply the affine transformation to the input data. Usually this also involves downsampling which forces the model to zoom in on the relevant parts of the data. After the affine transformation we can use a larger conv net to do the classification. \n", 458 | "This is possible because you can backprop through a an affine transformation if you use bilinear interpolation." 
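To make the affine transformation concrete before diving into the code, here is a small numpy sketch (illustrative only, using the normalized [-1, 1] coordinates the transformer works in) of how the six parameters map an output grid point to the source location it samples from:

```python
import numpy as np

# theta = [t1, t2, t3, t4, t5, t6] fills the 2x3 affine matrix
# [[t1, t2, t3],
#  [t4, t5, t6]]
# which maps output (target) coordinates to input (source) coordinates.
identity = np.array([[1., 0., 0.],
                     [0., 1., 0.]])  # no zoom, no skew, no shift
xy1 = np.array([0.5, -0.25, 1.0])    # homogeneous point (x_t, y_t, 1)
print(identity.dot(xy1))             # [ 0.5  -0.25] -> samples the same point

zoom = np.array([[0.5, 0., 0.],      # t1 = t5 = 0.5: sample from a smaller
                 [0., 0.5, 0.]])     # region, i.e. zoom in around the center
print(zoom.dot(xy1))                 # [ 0.25  -0.125]
```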
459 | ] 460 | }, 461 | { 462 | "cell_type": "code", 463 | "execution_count": null, 464 | "metadata": { 465 | "collapsed": false 466 | }, 467 | "outputs": [], 468 | "source": [ 469 | "import os\n", 470 | "import matplotlib\n", 471 | "import numpy as np\n", 472 | "np.random.seed(123)\n", 473 | "import matplotlib.pyplot as plt\n", 474 | "import tensorflow as tf\n", 475 | "from tensorflow.contrib.layers import fully_connected, convolution2d, flatten, max_pool2d\n", 476 | "pool = max_pool2d\n", 477 | "conv = convolution2d\n", 478 | "dense = fully_connected\n", 479 | "from tensorflow.python.ops.nn import relu, softmax\n", 480 | "from tensorflow.python.framework.ops import reset_default_graph\n", 481 | "\n", 482 | "from spatial_transformer import transformer\n", 483 | "\n", 484 | "def onehot(t, num_classes):\n", 485 | " out = np.zeros((t.shape[0], num_classes))\n", 486 | " for row, col in enumerate(t):\n", 487 | " out[row, col] = 1\n", 488 | " return out\n", 489 | "\n", 490 | "NUM_EPOCHS = 500\n", 491 | "BATCH_SIZE = 256\n", 492 | "LEARNING_RATE = 0.001\n", 493 | "DIM = 60\n", 494 | "NUM_CLASSES = 10\n", 495 | "mnist_cluttered = \"mnist_cluttered_60x60_6distortions.npz\"" 496 | ] 497 | }, 498 | { 499 | "cell_type": "code", 500 | "execution_count": null, 501 | "metadata": { 502 | "collapsed": false 503 | }, 504 | "outputs": [], 505 | "source": [ 506 | "def load_data():\n", 507 | " data = np.load(mnist_cluttered)\n", 508 | " X_train, y_train = data['x_train'], np.argmax(data['y_train'], axis=-1)\n", 509 | " X_valid, y_valid = data['x_valid'], np.argmax(data['y_valid'], axis=-1)\n", 510 | " X_test, y_test = data['x_test'], np.argmax(data['y_test'], axis=-1)\n", 511 | "\n", 512 | " # reshape for convolutions\n", 513 | " X_train = X_train.reshape((X_train.shape[0], 1, DIM, DIM))\n", 514 | " X_valid = X_valid.reshape((X_valid.shape[0], 1, DIM, DIM))\n", 515 | " X_test = X_test.reshape((X_test.shape[0], 1, DIM, DIM))\n", 516 | " \n", 517 | " print \"Train samples:\", X_train.shape\n", 518 | " print \"Validation samples:\", X_valid.shape\n", 519 | " print \"Test samples:\", X_test.shape\n", 520 | "\n", 521 | " return dict(\n", 522 | " X_train=np.asarray(X_train, dtype='float32'),\n", 523 | " y_train=y_train.astype('int32'),\n", 524 | " X_valid=np.asarray(X_valid, dtype='float32'),\n", 525 | " y_valid=y_valid.astype('int32'),\n", 526 | " X_test=np.asarray(X_test, dtype='float32'),\n", 527 | " y_test=y_test.astype('int32'),\n", 528 | " num_examples_train=X_train.shape[0],\n", 529 | " num_examples_valid=X_valid.shape[0],\n", 530 | " num_examples_test=X_test.shape[0],\n", 531 | " input_height=X_train.shape[2],\n", 532 | " input_width=X_train.shape[3],\n", 533 | " output_dim=10,)\n", 534 | "data = load_data()\n", 535 | "\n", 536 | "idx = 0\n", 537 | "canvas = np.zeros((DIM*10, 10*DIM))\n", 538 | "for i in range(10):\n", 539 | " for j in range(10):\n", 540 | " canvas[i*DIM:(i+1)*DIM, j*DIM:(j+1)*DIM] = data['X_train'][idx].reshape((DIM, DIM))\n", 541 | " idx += 1\n", 542 | "plt.figure(figsize=(10, 10))\n", 543 | "plt.imshow(canvas, cmap='gray')\n", 544 | "plt.title('Cluttered handwritten digits')\n", 545 | "plt.axis('off')\n", 546 | "\n", 547 | "plt.show()" 548 | ] 549 | }, 550 | { 551 | "cell_type": "markdown", 552 | "metadata": {}, 553 | "source": [ 554 | "## Building the model\n", 555 | "\n", 556 | "We use a model where the localization network is a two layer convolution network which operates directly on the image input. 
The output from the localization network is a 6 dimensional vector specifying the parameters in the affine transformation.\n", 557 | "\n", 558 | "We set up the transformer layer to initially do the identity transform, similarly to [1]. If the output from the localization networks is [t1, t2, t3, t4, t5, t6] then t1 and t5 determines zoom, t2 and t4 determines skewness, and t3 and t6 move the center position. By setting the initial values of the bias vector to \n", 559 | "\n", 560 | "```\n", 561 | "|1, 0, 0|\n", 562 | "|0, 1, 0|\n", 563 | "```\n", 564 | "and the final W of the localization network to all zeros we ensure that in the beginning of training the network works as a pooling layer. \n", 565 | "\n", 566 | "The output of the localization layer feeds into the transformer layer which applies the transformation to the image input. In our setup the transformer layer downsamples the input by a factor 3.\n", 567 | "\n", 568 | "Finally a 2 layer convolution layer and 2 fully connected layers calculates the output probabilities.\n", 569 | "\n", 570 | "\n", 571 | "### The model\n", 572 | "```\n", 573 | "Input -> localization_network -> TransformerLayer -> output_network -> predictions\n", 574 | " | |\n", 575 | " >--------------------------------^\n", 576 | "```\n", 577 | "\n", 578 | "\n" 579 | ] 580 | }, 581 | { 582 | "cell_type": "code", 583 | "execution_count": null, 584 | "metadata": { 585 | "collapsed": false 586 | }, 587 | "outputs": [], 588 | "source": [ 589 | "reset_default_graph()\n", 590 | "def build_model(x_pl, input_width, input_height, output_dim,\n", 591 | " batch_size=BATCH_SIZE):\n", 592 | " # Setting up placeholder, this is where your data enters the graph!\n", 593 | " l_reshape = tf.transpose(x_pl, [0, 2, 3, 1]) # TensorFlow uses NHWC instead of NCHW\n", 594 | "\n", 595 | " # make distributed representation of input image for localization network\n", 596 | " loc_l1 = pool(l_reshape, kernel_size=[2, 2], scope=\"loc_l1\")\n", 597 | " loc_l2 = conv(loc_l1, num_outputs=8, kernel_size=[5, 5], stride=[1, 1], padding=\"SAME\", scope=\"loc_l2\")\n", 598 | " loc_l3 = pool(loc_l2, kernel_size=[2, 2], scope=\"loc_l3\")\n", 599 | " loc_l4 = conv(loc_l3, num_outputs=8, kernel_size=[5, 5], stride=[1, 1], padding=\"SAME\", scope=\"loc_l4\")\n", 600 | " loc_l4_flatten = flatten(loc_l4, scope=\"loc_l4_flatten\")\n", 601 | " loc_l5 = dense(loc_l4_flatten, num_outputs=50, activation_fn=relu, scope=\"loc_l5\")\n", 602 | " # set up weights for transformation (notice we always need 6 output neurons)\n", 603 | " W_loc_out = tf.get_variable(\"W_loc_out\", [50, 6], initializer=tf.constant_initializer(0.0))\n", 604 | " initial = np.array([[1, 0, 0], [0, 1, 0]])\n", 605 | " initial = initial.astype('float32')\n", 606 | " initial = initial.flatten()\n", 607 | " b_loc_out = tf.Variable(initial_value=initial, name='b_loc_out')\n", 608 | " loc_out = tf.matmul(loc_l5, W_loc_out) + b_loc_out\n", 609 | "\n", 610 | " # spatial transformer\n", 611 | " l_trans1 = transformer(l_reshape, loc_out, out_size=(DIM//3, DIM//3))\n", 612 | " l_trans1.set_shape([None, DIM//3, DIM//3, 1])\n", 613 | " l_trans1_valid = tf.transpose(l_trans1, [0, 2, 3, 1]) # Back into NCHW for validation\n", 614 | "\n", 615 | " print \"Transformer network output shape: \", l_trans1.get_shape()\n", 616 | "\n", 617 | " # classification network\n", 618 | " class_l1 = conv(l_trans1, num_outputs=16, kernel_size=[3, 3], scope=\"class_l1\")\n", 619 | " class_l2 = pool(class_l1, kernel_size=[2, 2], scope=\"class_l2\")\n", 620 | " class_l3 = 
conv(class_l2, num_outputs=16, kernel_size=[3, 3], scope=\"class_l3\")\n", 621 | " class_l4 = pool(class_l3, kernel_size=[2, 2], scope=\"class_l4\")\n", 622 | " class_l4_flatten = flatten(class_l4, scope=\"class_l4_flatten\")\n", 623 | " class_l5 = dense(class_l4_flatten, num_outputs=256, activation_fn=relu, scope=\"class_l5\")\n", 624 | " l_out = dense(class_l5, num_outputs=output_dim, activation_fn=softmax, scope=\"l_out\")\n", 625 | "\n", 626 | " return l_out, l_trans1_valid\n", 627 | "\n", 628 | "x_pl = tf.placeholder(tf.float32, [None, 1, DIM, DIM])\n", 629 | "model, l_transform = build_model(x_pl, DIM, DIM, NUM_CLASSES)\n", 630 | "#model_params = lasagne.layers.get_all_params(model, trainable=True)" 631 | ] 632 | }, 633 | { 634 | "cell_type": "code", 635 | "execution_count": null, 636 | "metadata": { 637 | "collapsed": false 638 | }, 639 | "outputs": [], 640 | "source": [ 641 | "# y_ is a placeholder variable taking on the value of the target batch.\n", 642 | "y_pl = tf.placeholder(tf.float32, shape=[None, NUM_CLASSES])\n", 643 | "lr_pl = tf.placeholder(tf.float32, shape=[])\n", 644 | "\n", 645 | "# computing cross entropy per sample\n", 646 | "cross_entropy = -tf.reduce_sum(y_pl * tf.log(model+1e-8), reduction_indices=[1])\n", 647 | "\n", 648 | "# averaging over samples\n", 649 | "cross_entropy = tf.reduce_mean(cross_entropy)\n", 650 | "\n", 651 | "# defining our optimizer\n", 652 | "optimizer = tf.train.AdamOptimizer(learning_rate=lr_pl)\n", 653 | "\n", 654 | "# applying the gradients\n", 655 | "train_op = optimizer.minimize(cross_entropy)" 656 | ] 657 | }, 658 | { 659 | "cell_type": "code", 660 | "execution_count": null, 661 | "metadata": { 662 | "collapsed": false 663 | }, 664 | "outputs": [], 665 | "source": [ 666 | "# test the forward pass\n", 667 | "x = np.random.normal(0,1, (45, 1,60,60)).astype('float32') #dummy data\n", 668 | "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n", 669 | "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n", 670 | "# initialize the Session\n", 671 | "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n", 672 | "sess.run(tf.initialize_all_variables())\n", 673 | "res = sess.run(fetches=[model], feed_dict={x_pl: x})\n", 674 | "print \"y\", res[0].shape" 675 | ] 676 | }, 677 | { 678 | "cell_type": "markdown", 679 | "metadata": {}, 680 | "source": [ 681 | "### Training the model\n", 682 | "Unfortunately NVIDIA has yet to squeeze a TitanX into a labtop and training convnets on CPU is painfully slow. After 10 epochs you should see that model starts to zoom in on the digits. 
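The training loop below anneals the learning rate by a factor 0.7 every 20 epochs, starting from 1e-4. A quick sanity check of what that schedule does (same arithmetic as the training code):

```python
# Sanity check of the annealing schedule used in the training loop below.
lr = 1e-4
for n in range(100):
    if (n + 1) % 20 == 0:
        lr = lr * 0.7
print(lr)  # ~1.68e-05, i.e. 1e-4 * 0.7**5 after 100 epochs
```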
" 683 | ] 684 | }, 685 | { 686 | "cell_type": "code", 687 | "execution_count": null, 688 | "metadata": { 689 | "collapsed": false 690 | }, 691 | "outputs": [], 692 | "source": [ 693 | "def train_epoch(X, y, learning_rate):\n", 694 | " num_samples = X.shape[0]\n", 695 | " num_batches = int(np.ceil(num_samples / float(BATCH_SIZE)))\n", 696 | " costs = []\n", 697 | " correct = 0\n", 698 | " for i in range(num_batches):\n", 699 | " if i % 10 == 0:\n", 700 | " print i,\n", 701 | " idx = range(i*BATCH_SIZE, np.minimum((i+1)*BATCH_SIZE, num_samples))\n", 702 | " X_batch_tr = X[idx]\n", 703 | " y_batch_tr = y[idx]\n", 704 | " fetches_tr = [train_op, cross_entropy, model]\n", 705 | " feed_dict_tr = {x_pl: X_batch_tr, y_pl: onehot(y_batch_tr, NUM_CLASSES), lr_pl: learning_rate}\n", 706 | " res = sess.run(fetches=fetches_tr, feed_dict=feed_dict_tr)\n", 707 | " cost_batch, output_train = tuple(res[1:3])\n", 708 | " costs += [cost_batch]\n", 709 | " preds = np.argmax(output_train, axis=-1)\n", 710 | " correct += np.sum(y_batch_tr == preds)\n", 711 | " print \"\"\n", 712 | " return np.mean(costs), correct / float(num_samples)\n", 713 | "\n", 714 | "\n", 715 | "def eval_epoch(X, y):\n", 716 | " num_samples = X.shape[0]\n", 717 | " num_batches = int(np.ceil(num_samples / float(BATCH_SIZE)))\n", 718 | " pred_list = []\n", 719 | " transform_list = []\n", 720 | " for i in range(num_batches):\n", 721 | " if i % 10 == 0:\n", 722 | " print i,\n", 723 | " idx = range(i*BATCH_SIZE, np.minimum((i+1)*BATCH_SIZE, num_samples))\n", 724 | " X_batch_val = X[idx]\n", 725 | " fetches_val = [model, l_transform]\n", 726 | " feed_dict_val = {x_pl: X_batch_val}\n", 727 | " res = sess.run(fetches=fetches_val, feed_dict=feed_dict_val)\n", 728 | " output_eval, transform_eval = tuple(res)\n", 729 | " pred_list.append(output_eval)\n", 730 | " transform_list.append(transform_eval)\n", 731 | " transform_eval = np.concatenate(transform_list, axis=0)\n", 732 | " preds = np.concatenate(pred_list, axis=0)\n", 733 | " preds = np.argmax(preds, axis=-1)\n", 734 | " acc = np.mean(preds == y)\n", 735 | " print \"\"\n", 736 | " return acc, transform_eval" 737 | ] 738 | }, 739 | { 740 | "cell_type": "code", 741 | "execution_count": null, 742 | "metadata": { 743 | "collapsed": false 744 | }, 745 | "outputs": [], 746 | "source": [ 747 | "valid_accs, train_accs, test_accs = [], [], []\n", 748 | "learning_rate=0.0001\n", 749 | "try:\n", 750 | " for n in range(NUM_EPOCHS):\n", 751 | " print \"Epoch %d:\" % n\n", 752 | " print 'train ',\n", 753 | " train_cost, train_acc = train_epoch(data['X_train'], data['y_train'], learning_rate)\n", 754 | " print 'valid ',\n", 755 | " valid_acc, valid_trainsform = eval_epoch(data['X_valid'], data['y_valid'])\n", 756 | " print 'test ',\n", 757 | " test_acc, test_transform = eval_epoch(data['X_test'], data['y_test'])\n", 758 | " valid_accs += [valid_acc]\n", 759 | " test_accs += [test_acc]\n", 760 | " train_accs += [train_acc]\n", 761 | "\n", 762 | " # learning rate annealing\n", 763 | " if (n+1) % 20 == 0:\n", 764 | " learning_rate = learning_rate * 0.7\n", 765 | " print \"New LR:\", learning_rate\n", 766 | "\n", 767 | " print \"train cost {0:.2}, train acc {1:.2}, val acc {2:.2}, test acc {3:.2}\".format(\n", 768 | " train_cost, train_acc, valid_acc, test_acc)\n", 769 | "except KeyboardInterrupt:\n", 770 | " pass" 771 | ] 772 | }, 773 | { 774 | "cell_type": "markdown", 775 | "metadata": {}, 776 | "source": [ 777 | "### Plot errors and zoom" 778 | ] 779 | }, 780 | { 781 | "cell_type": "code", 782 | 
"execution_count": null, 783 | "metadata": { 784 | "collapsed": false 785 | }, 786 | "outputs": [], 787 | "source": [ 788 | "plt.figure(figsize=(9,9))\n", 789 | "plt.plot(1-np.array(train_accs), label='Training Error')\n", 790 | "plt.plot(1-np.array(valid_accs), label='Validation Error')\n", 791 | "plt.legend(fontsize=20)\n", 792 | "plt.xlabel('Epoch', fontsize=20)\n", 793 | "plt.ylabel('Error', fontsize=20)\n", 794 | "plt.show()" 795 | ] 796 | }, 797 | { 798 | "cell_type": "code", 799 | "execution_count": null, 800 | "metadata": { 801 | "collapsed": false 802 | }, 803 | "outputs": [], 804 | "source": [ 805 | "plt.figure(figsize=(7,14))\n", 806 | "for i in range(3):\n", 807 | " plt.subplot(321+i*2)\n", 808 | " plt.imshow(data['X_test'][i].reshape(DIM, DIM), cmap='gray', interpolation='none')\n", 809 | " if i == 0:\n", 810 | " plt.title('Original 60x60', fontsize=20)\n", 811 | " plt.axis('off')\n", 812 | " plt.subplot(322+i*2)\n", 813 | " plt.imshow(test_transform[i].reshape(DIM//3, DIM//3).T, cmap='gray', interpolation='none')\n", 814 | " if i == 0:\n", 815 | " plt.title('Transformed 20x20', fontsize=20)\n", 816 | " plt.axis('off')\n", 817 | " \n", 818 | " \n", 819 | "plt.tight_layout()\n", 820 | "plt.show()" 821 | ] 822 | }, 823 | { 824 | "cell_type": "markdown", 825 | "metadata": { 826 | "collapsed": true 827 | }, 828 | "source": [ 829 | "# A few pointers for image classification\n", 830 | "If you want do image classification using a pretrained model is often a good choice, especially if you have limited amounts of labeled data.\n", 831 | "\n", 832 | "An often used pretrained network is the Google Inception model. TensorFlow has a guide for using their current state-of-the-art pretrained model in their [model repository](https://github.com/tensorflow/models/tree/master/inception). Torch7 and Theano have similar pretrained models that you can find with google. \n", 833 | "\n", 834 | "Currently the best performing image network is the [ResNet](https://arxiv.org/pdf/1512.03385v1.pdf) model. Torch7 has an interesting blog post about residual nets. http://torch.ch/blog/2016/02/04/resnets.html" 835 | ] 836 | } 837 | ], 838 | "metadata": { 839 | "kernelspec": { 840 | "display_name": "Python 2", 841 | "language": "python", 842 | "name": "python2" 843 | }, 844 | "language_info": { 845 | "codemirror_mode": { 846 | "name": "ipython", 847 | "version": 2 848 | }, 849 | "file_extension": ".py", 850 | "mimetype": "text/x-python", 851 | "name": "python", 852 | "nbconvert_exporter": "python", 853 | "pygments_lexer": "ipython2", 854 | "version": "2.7.6" 855 | } 856 | }, 857 | "nbformat": 4, 858 | "nbformat_minor": 0 859 | } 860 | -------------------------------------------------------------------------------- /lab2_CNN/mnist.npz: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab2_CNN/mnist.npz -------------------------------------------------------------------------------- /lab2_CNN/spatial_transformer.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 
5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | import tensorflow as tf 16 | 17 | 18 | def transformer(U, theta, out_size, name='SpatialTransformer', **kwargs): 19 | """Spatial Transformer Layer 20 | 21 | Implements a spatial transformer layer as described in [1]_. 22 | Based on [2]_ and edited by David Dao for Tensorflow. 23 | 24 | Parameters 25 | ---------- 26 | U : float 27 | The output of a convolutional net should have the 28 | shape [num_batch, height, width, num_channels]. 29 | theta: float 30 | The output of the 31 | localisation network should be [num_batch, 6]. 32 | out_size: tuple of two ints 33 | The size of the output of the network (height, width) 34 | 35 | References 36 | ---------- 37 | .. [1] Spatial Transformer Networks 38 | Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu 39 | Submitted on 5 Jun 2015 40 | .. [2] https://github.com/skaae/transformer_network/blob/master/transformerlayer.py 41 | 42 | Notes 43 | ----- 44 | To initialize the network to the identity transform init 45 | ``theta`` to : 46 | identity = np.array([[1., 0., 0.], 47 | [0., 1., 0.]]) 48 | identity = identity.flatten() 49 | theta = tf.Variable(initial_value=identity) 50 | 51 | """ 52 | 53 | def _repeat(x, n_repeats): 54 | with tf.variable_scope('_repeat'): 55 | rep = tf.transpose( 56 | tf.expand_dims(tf.ones(shape=tf.pack([n_repeats, ])), 1), [1, 0]) 57 | rep = tf.cast(rep, 'int32') 58 | x = tf.matmul(tf.reshape(x, (-1, 1)), rep) 59 | return tf.reshape(x, [-1]) 60 | 61 | def _interpolate(im, x, y, out_size): 62 | with tf.variable_scope('_interpolate'): 63 | # constants 64 | num_batch = tf.shape(im)[0] 65 | height = tf.shape(im)[1] 66 | width = tf.shape(im)[2] 67 | channels = tf.shape(im)[3] 68 | 69 | x = tf.cast(x, 'float32') 70 | y = tf.cast(y, 'float32') 71 | height_f = tf.cast(height, 'float32') 72 | width_f = tf.cast(width, 'float32') 73 | out_height = out_size[0] 74 | out_width = out_size[1] 75 | zero = tf.zeros([], dtype='int32') 76 | max_y = tf.cast(tf.shape(im)[1] - 1, 'int32') 77 | max_x = tf.cast(tf.shape(im)[2] - 1, 'int32') 78 | 79 | # scale indices from [-1, 1] to [0, width/height] 80 | x = (x + 1.0)*(width_f) / 2.0 81 | y = (y + 1.0)*(height_f) / 2.0 82 | 83 | # do sampling 84 | x0 = tf.cast(tf.floor(x), 'int32') 85 | x1 = x0 + 1 86 | y0 = tf.cast(tf.floor(y), 'int32') 87 | y1 = y0 + 1 88 | 89 | x0 = tf.clip_by_value(x0, zero, max_x) 90 | x1 = tf.clip_by_value(x1, zero, max_x) 91 | y0 = tf.clip_by_value(y0, zero, max_y) 92 | y1 = tf.clip_by_value(y1, zero, max_y) 93 | dim2 = width 94 | dim1 = width*height 95 | base = _repeat(tf.range(num_batch)*dim1, out_height*out_width) 96 | base_y0 = base + y0*dim2 97 | base_y1 = base + y1*dim2 98 | idx_a = base_y0 + x0 99 | idx_b = base_y1 + x0 100 | idx_c = base_y0 + x1 101 | idx_d = base_y1 + x1 102 | 103 | # use indices to lookup pixels in the flat image and restore 104 | # channels dim 105 | im_flat = tf.reshape(im, tf.pack([-1, channels])) 106 | im_flat = tf.cast(im_flat, 'float32') 107 | Ia = tf.gather(im_flat, idx_a) 108 | Ib = tf.gather(im_flat, 
idx_b) 109 | Ic = tf.gather(im_flat, idx_c) 110 | Id = tf.gather(im_flat, idx_d) 111 | 112 | # and finally calculate interpolated values 113 | x0_f = tf.cast(x0, 'float32') 114 | x1_f = tf.cast(x1, 'float32') 115 | y0_f = tf.cast(y0, 'float32') 116 | y1_f = tf.cast(y1, 'float32') 117 | wa = tf.expand_dims(((x1_f-x) * (y1_f-y)), 1) 118 | wb = tf.expand_dims(((x1_f-x) * (y-y0_f)), 1) 119 | wc = tf.expand_dims(((x-x0_f) * (y1_f-y)), 1) 120 | wd = tf.expand_dims(((x-x0_f) * (y-y0_f)), 1) 121 | output = tf.add_n([wa*Ia, wb*Ib, wc*Ic, wd*Id]) 122 | return output 123 | 124 | def _meshgrid(height, width): 125 | with tf.variable_scope('_meshgrid'): 126 | # This should be equivalent to: 127 | # x_t, y_t = np.meshgrid(np.linspace(-1, 1, width), 128 | # np.linspace(-1, 1, height)) 129 | # ones = np.ones(np.prod(x_t.shape)) 130 | # grid = np.vstack([x_t.flatten(), y_t.flatten(), ones]) 131 | x_t = tf.matmul(tf.ones(shape=tf.pack([height, 1])), 132 | tf.transpose(tf.expand_dims(tf.linspace(-1.0, 1.0, width), 1), [1, 0])) 133 | y_t = tf.matmul(tf.expand_dims(tf.linspace(-1.0, 1.0, height), 1), 134 | tf.ones(shape=tf.pack([1, width]))) 135 | 136 | x_t_flat = tf.reshape(x_t, (1, -1)) 137 | y_t_flat = tf.reshape(y_t, (1, -1)) 138 | 139 | ones = tf.ones_like(x_t_flat) 140 | grid = tf.concat(0, [x_t_flat, y_t_flat, ones]) 141 | return grid 142 | 143 | def _transform(theta, input_dim, out_size): 144 | with tf.variable_scope('_transform'): 145 | num_batch = tf.shape(input_dim)[0] 146 | height = tf.shape(input_dim)[1] 147 | width = tf.shape(input_dim)[2] 148 | num_channels = tf.shape(input_dim)[3] 149 | theta = tf.reshape(theta, (-1, 2, 3)) 150 | theta = tf.cast(theta, 'float32') 151 | 152 | # grid of (x_t, y_t, 1), eq (1) in ref [1] 153 | height_f = tf.cast(height, 'float32') 154 | width_f = tf.cast(width, 'float32') 155 | out_height = out_size[0] 156 | out_width = out_size[1] 157 | grid = _meshgrid(out_height, out_width) 158 | grid = tf.expand_dims(grid, 0) 159 | grid = tf.reshape(grid, [-1]) 160 | grid = tf.tile(grid, tf.pack([num_batch])) 161 | grid = tf.reshape(grid, tf.pack([num_batch, 3, -1])) 162 | 163 | # Transform A x (x_t, y_t, 1)^T -> (x_s, y_s) 164 | T_g = tf.batch_matmul(theta, grid) 165 | x_s = tf.slice(T_g, [0, 0, 0], [-1, 1, -1]) 166 | y_s = tf.slice(T_g, [0, 1, 0], [-1, 1, -1]) 167 | x_s_flat = tf.reshape(x_s, [-1]) 168 | y_s_flat = tf.reshape(y_s, [-1]) 169 | 170 | input_transformed = _interpolate( 171 | input_dim, x_s_flat, y_s_flat, 172 | out_size) 173 | 174 | output = tf.reshape( 175 | input_transformed, tf.pack([num_batch, out_height, out_width, num_channels])) 176 | return output 177 | 178 | with tf.variable_scope(name): 179 | output = _transform(theta, U, out_size) 180 | return output 181 | 182 | 183 | def batch_transformer(U, thetas, out_size, name='BatchSpatialTransformer'): 184 | """Batch Spatial Transformer Layer 185 | 186 | Parameters 187 | ---------- 188 | 189 | U : float 190 | tensor of inputs [num_batch,height,width,num_channels] 191 | thetas : float 192 | a set of transformations for each input [num_batch,num_transforms,6] 193 | out_size : int 194 | the size of the output [out_height,out_width] 195 | 196 | Returns: float 197 | Tensor of size [num_batch*num_transforms,out_height,out_width,num_channels] 198 | """ 199 | with tf.variable_scope(name): 200 | num_batch, num_transforms = map(int, thetas.get_shape().as_list()[:2]) 201 | indices = [[i]*num_transforms for i in xrange(num_batch)] 202 | input_repeated = tf.gather(U, tf.reshape(indices, [-1])) 203 | return 
transformer(input_repeated, thetas, out_size) 204 | -------------------------------------------------------------------------------- /lab3_RNN/.gitignore: -------------------------------------------------------------------------------- 1 | *.jpg 2 | *.png 3 | -------------------------------------------------------------------------------- /lab3_RNN/confusionmatrix.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class ConfusionMatrix: 5 | """ 6 | Simple confusion matrix class 7 | row is the true class, column is the predicted class 8 | """ 9 | def __init__(self, num_classes, class_names=None): 10 | self.n_classes = num_classes 11 | if class_names is None: 12 | self.class_names = map(str, range(num_classes)) 13 | else: 14 | self.class_names = class_names 15 | 16 | # find max class_name and pad 17 | max_len = max(map(len, self.class_names)) 18 | self.max_len = max_len 19 | for idx, name in enumerate(self.class_names): 20 | if len(self.class_names) < max_len: 21 | self.class_names[idx] = name + " "*(max_len-len(name)) 22 | 23 | self.mat = np.zeros((num_classes,num_classes),dtype='int') 24 | 25 | def __str__(self): 26 | # calucate row and column sums 27 | col_sum = np.sum(self.mat, axis=1) 28 | row_sum = np.sum(self.mat, axis=0) 29 | 30 | s = [] 31 | 32 | mat_str = self.mat.__str__() 33 | mat_str = mat_str.replace('[','').replace(']','').split('\n') 34 | 35 | for idx, row in enumerate(mat_str): 36 | if idx == 0: 37 | pad = " " 38 | else: 39 | pad = "" 40 | class_name = self.class_names[idx] 41 | class_name = " " + class_name + " |" 42 | row_str = class_name + pad + row 43 | row_str += " |" + str(col_sum[idx]) 44 | s.append(row_str) 45 | 46 | row_sum = [(self.max_len+4)*" "+" ".join(map(str, row_sum))] 47 | hline = [(1+self.max_len)*" "+"-"*len(row_sum[0])] 48 | 49 | s = hline + s + hline + row_sum 50 | 51 | # add linebreaks 52 | s_out = [line+'\n' for line in s] 53 | return "".join(s_out) 54 | 55 | def batch_add(self, targets, preds): 56 | assert targets.shape == preds.shape 57 | assert len(targets) == len(preds) 58 | assert max(targets) < self.n_classes 59 | assert max(preds) < self.n_classes 60 | targets = targets.flatten() 61 | preds = preds.flatten() 62 | for i in range(len(targets)): 63 | self.mat[targets[i], preds[i]] += 1 64 | 65 | def get_errors(self): 66 | tp = np.asarray(np.diag(self.mat).flatten(),dtype='float') 67 | fn = np.asarray(np.sum(self.mat, axis=1).flatten(),dtype='float') - tp 68 | fp = np.asarray(np.sum(self.mat, axis=0).flatten(),dtype='float') - tp 69 | tn = np.asarray(np.sum(self.mat)*np.ones(self.n_classes).flatten(), 70 | dtype='float') - tp - fn - fp 71 | return tp, fn, fp, tn 72 | 73 | def accuracy(self): 74 | """ 75 | Calculates global accuracy 76 | :return: accuracy 77 | :example: >>> conf = ConfusionMatrix(3) 78 | >>> conf.batchAdd([0,0,1],[0,0,2]) 79 | >>> print conf.accuracy() 80 | """ 81 | tp, _, _, _ = self.get_errors() 82 | n_samples = np.sum(self.mat) 83 | return np.sum(tp) / n_samples 84 | 85 | def sensitivity(self): 86 | tp, tn, fp, fn = self.get_errors() 87 | res = tp / (tp + fn) 88 | res = res[~np.isnan(res)] 89 | return res 90 | 91 | def specificity(self): 92 | tp, tn, fp, fn = self.get_errors() 93 | res = tn / (tn + fp) 94 | res = res[~np.isnan(res)] 95 | return res 96 | 97 | def positive_predictive_value(self): 98 | tp, tn, fp, fn = self.get_errors() 99 | res = tp / (tp + fp) 100 | res = res[~np.isnan(res)] 101 | return res 102 | 103 | def negative_predictive_value(self): 104 | 
tp, tn, fp, fn = self.get_errors() 105 | res = tn / (tn + fn) 106 | res = res[~np.isnan(res)] 107 | return res 108 | 109 | def false_positive_rate(self): 110 | tp, tn, fp, fn = self.get_errors() 111 | res = fp / (fp + tn) 112 | res = res[~np.isnan(res)] 113 | return res 114 | 115 | def false_discovery_rate(self): 116 | tp, tn, fp, fn = self.get_errors() 117 | res = fp / (tp + fp) 118 | res = res[~np.isnan(res)] 119 | return res 120 | 121 | def F1(self): 122 | tp, tn, fp, fn = self.get_errors() 123 | res = (2*tp) / (2*tp + fp + fn) 124 | res = res[~np.isnan(res)] 125 | return res 126 | 127 | def matthews_correlation(self): 128 | tp, tn, fp, fn = self.get_errors() 129 | numerator = tp*tn - fp*fn 130 | denominator = np.sqrt((tp + fp)*(tp + fn)*(tn + fp)*(tn + fn)) 131 | res = numerator / denominator 132 | res = res[~np.isnan(res)] 133 | return res 134 | -------------------------------------------------------------------------------- /lab3_RNN/data_generator.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import numpy as np 3 | 4 | target_to_text = { 5 | '0':'zero', 6 | '1':'one', 7 | '2':'two', 8 | '3':'three', 9 | '4':'four', 10 | '5':'five', 11 | '6':'six', 12 | '7':'seven', 13 | '8':'eight', 14 | '9':'nine', 15 | } 16 | 17 | stop_character = start_character = '#' 18 | 19 | input_characters = " ".join(target_to_text.values()) 20 | valid_characters = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '#'] + \ 21 | list(set(input_characters)) 22 | 23 | def print_valid_characters(): 24 | l = '' 25 | for i,c in enumerate(valid_characters): 26 | l += "\'%s\'=%i,\t" % (c,i) 27 | print("Number of valid characters:", len(valid_characters)) 28 | print(l) 29 | 30 | ninput_chars = len(valid_characters) 31 | def get_batch(batch_size=100, min_digits = 3, max_digits=3): 32 | ''' 33 | Generates random sequences of integers and translates them to text i.e. 1->'one'. 34 | :param batch_size: number of samples to return 35 | :param min_digits: minimum length of target 36 | :param max_digits: maximum length of target 37 | ''' 38 | text_inputs = [] 39 | int_inputs = [] 40 | text_targets_in = [] 41 | text_targets_out = [] 42 | int_targets_in = [] 43 | int_targets_out = [] 44 | for i in range(batch_size): 45 | #convert integer into a list of digits 46 | tar_len = np.random.randint(min_digits,max_digits+1) 47 | text_target = inp_str = "".join(map(str,np.random.randint(0,10,tar_len))) 48 | text_target_in = start_character + text_target 49 | text_target_out = text_target + stop_character 50 | 51 | #generate the targets as a list of intergers 52 | int_target_in = map(lambda c: valid_characters.index(c), text_target_in) 53 | int_target_out = map(lambda c: valid_characters.index(c), text_target_out) 54 | 55 | #generate the text input 56 | text_input = " ".join(map(lambda k: target_to_text[k], inp_str)) 57 | #generate the inputs as a list of intergers 58 | int_input = map(lambda c: valid_characters.index(c), text_input) 59 | 60 | text_inputs.append(text_input) 61 | int_inputs.append(int_input) 62 | text_targets_in.append(text_target_in) 63 | text_targets_out.append(text_target_out) 64 | int_targets_in.append(int_target_in) 65 | int_targets_out.append(int_target_out) 66 | 67 | #create the input matrix, mask and seq_len - note that we zero pad the shorter sequences. 
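# (Added illustration with assumed example values -- not in the original file.)
# With encoded inputs of unequal length such as [[3, 7], [3]], the padding
# below produces
#   inputs = [[3, 7],
#             [3, 0]]
# and inputs_seqlen = [2, 1], so downstream code can ignore the zero padding.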
68 | max_input_len = max(map(len, int_inputs)) 69 | inputs = np.zeros((batch_size, max_input_len)) 70 | # input_masks = np.zeros((batch_size,max_input_len)) 71 | for (i,inp) in enumerate(int_inputs): 72 | cur_len = len(inp) 73 | inputs[i,:cur_len] = inp 74 | # input_masks[i,:cur_len] = 1 75 | inputs_seqlen = np.asarray(map(len, int_inputs)) 76 | 77 | max_target_in_len = max(map(len, int_targets_in)) 78 | targets_in = np.zeros((batch_size, max_target_in_len)) 79 | targets_mask = np.zeros((batch_size, max_target_in_len)) 80 | for (i, tar) in enumerate(int_targets_in): 81 | cur_len = len(tar) 82 | targets_in[i, :cur_len] = tar 83 | targets_seqlen = np.asarray(map(len, int_targets_in)) 84 | 85 | max_target_out_len = max(map(len, int_targets_out)) 86 | targets_out = np.zeros((batch_size, max_target_in_len)) 87 | for (i,tar) in enumerate(int_targets_out): 88 | cur_len = len(tar) 89 | targets_out[i,:cur_len] = tar 90 | targets_mask[i,:cur_len] = 1 91 | 92 | return inputs.astype('int32'), \ 93 | inputs_seqlen.astype('int32'), \ 94 | targets_in.astype('int32'), \ 95 | targets_out.astype('int32'), \ 96 | targets_seqlen.astype('int32'), \ 97 | targets_mask.astype('float32'), \ 98 | text_inputs, \ 99 | text_targets_in, \ 100 | text_targets_out 101 | 102 | if __name__ == '__main__': 103 | batch_size = 3 104 | inputs, inputs_seqlen, targets_in, targets_out, targets_seqlen, targets_mask, \ 105 | text_inputs, text_targets_in, text_targets_out = \ 106 | get_batch(batch_size=batch_size, max_digits=2, min_digits=1) 107 | 108 | print("input types:", inputs.dtype, inputs_seqlen.dtype, targets_in.dtype, targets_out.dtype, targets_seqlen.dtype) 109 | print(print_valid_characters()) 110 | print("Stop/start character = #") 111 | 112 | for i in range(batch_size): 113 | print("\nSAMPLE",i) 114 | print("TEXT INPUTS:\t\t\t", text_inputs[i]) 115 | print("TEXT TARGETS INPUT:\t\t", text_targets_in[i]) 116 | print("TEXT TARGETS OUTPUT:\t\t", text_targets_out[i]) 117 | print("ENCODED INPUTS:\t\t\t", inputs[i]) 118 | print("INPUTS SEQUENCE LENGTH:\t\t", inputs_seqlen[i]) 119 | print("ENCODED TARGETS INPUT:\t\t", targets_in[i]) 120 | print("ENCODED TARGETS OUTPUT:\t\t", targets_out[i]) 121 | print("TARGETS SEQUENCE LENGTH:\t", targets_seqlen[i]) 122 | print("TARGETS MASK:\t\t\t", targets_mask[i]) -------------------------------------------------------------------------------- /lab3_RNN/enc-dec.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab3_RNN/enc-dec.png -------------------------------------------------------------------------------- /lab3_RNN/lab3_RNN.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": { 7 | "collapsed": false 8 | }, 9 | "outputs": [], 10 | "source": [ 11 | "%matplotlib inline \n", 12 | "%matplotlib nbagg\n", 13 | "import tensorflow as tf\n", 14 | "import matplotlib\n", 15 | "import numpy as np\n", 16 | "import matplotlib.pyplot as plt\n", 17 | "from IPython import display\n", 18 | "from data_generator import get_batch, print_valid_characters\n", 19 | "from tensorflow.python.framework.ops import reset_default_graph\n", 20 | "\n", 21 | "import tf_utils" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "# Recurrent Neural Networks\n", 29 | "\n", 30 | "Recurrent neural networks are the 
natural type of neural network to use for sequential data, i.e. time series analysis, translation, speech recognition, biological sequence analysis, etc. Recurrent neural networks work by recursively applying the same operation at each time step of the data sequence and having layers that pass information from the previous time step to the current one. They can therefore naturally handle input of varying length. Recurrent networks can be used for several prediction tasks including: sequence-to-class, sequence tagging, and sequence-to-sequence predictions.\n", 31 | "\n", 32 | "In this exercise we'll implement an Encoder-Decoder RNN based on the GRU unit for a simple sequence-to-sequence translation task. This type of model has shown impressive performance in Neural Machine Translation and Image Caption generation. \n", 33 | "\n", 34 | "For more in-depth background material on RNNs please see [Supervised Sequence Labelling with Recurrent\n", 35 | "Neural Networks](https://www.cs.toronto.edu/~graves/preprint.pdf) by Alex Graves.\n", 36 | "\n", 37 | "We know that LSTMs and GRUs are difficult to understand. A very good non-mathematical introduction is [Chris Olah's blog](http://colah.github.io/posts/2015-08-Understanding-LSTMs/). (All the posts are nice and cover various topics within machine learning)." 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "# Encoder-Decoder\n", 45 | "In the encoder-decoder structure one RNN (blue) encodes the input and a second RNN (red) calculates the target values. One essential step is to let the encoder and decoder communicate. In the simplest approach you use the last hidden state of the encoder to initialize the decoder. Other approaches let the decoder attend to different parts of the encoded input at different timesteps in the decoding process. \n", 46 | "\n", 47 | "![Encoder-Decoder model](enc-dec.png)\n", 48 | "\n", 49 | "In our implementation we use an RNN with gated recurrent units (GRU) as encoder. We then use the last hidden state of the encoder ($h^{enc}_T$) as input to the decoder, which is also a GRU RNN. \n", 50 | "\n", 51 | "### RNNs in TensorFlow\n", 52 | "TensorFlow has implementations of LSTM and GRU units. Both implementations assume that the input tensor has the shape **(batch_size, seq_len, num_features)**, unless you have `time\_major=True`. In this exercise we will use the GRU unit since it only stores a single hidden value per neuron (LSTMs store two) and is approximately twice as fast as the LSTM unit.\n", 53 | "\n", 54 | "As stated above we will implement an Encoder-Decoder model. The simplest way to do this is to encode the input sequence using the Encoder model. We will then use the last hidden state of the Encoder $h^{enc}_T$ as input to the decoder model, which then uses this information (simply a fixed-length vector of numbers) to produce the targets. There are (at least) two ways to input $h^{enc}_T$ into the decoder, as sketched after this list:\n", 55 | "\n", 56 | "1. Repeatedly use $h^{enc}_T$ as input to the Decoder at each decode time step, as well as the previously computed word\n", 57 | "2. Initialize the decoder using $h^{enc}_T$ and run the decoder without any inputs\n", 58 | "\n", 59 | "In this exercise we will follow the second approach because it's easier to implement. To do this we need to create a TensorFlow layer that takes $h^{enc}_T$ as its initial state." 
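As a conceptual sketch of approach 2 (variable names are assumed to match the model cell further down; the exercise itself uses the `tf_utils.decoder` wrapper rather than this plain `dynamic_rnn` call), seeding the decoder with the encoder's final state looks roughly like:

```python
# Sketch only: the encoder's final state becomes the decoder's initial state.
enc_cell = tf.nn.rnn_cell.GRUCell(num_units)
_, enc_state = tf.nn.dynamic_rnn(cell=enc_cell, inputs=X_embedded,
                                 sequence_length=X_len, dtype=tf.float32)

dec_cell = tf.nn.rnn_cell.GRUCell(num_units)
dec_out, _ = tf.nn.dynamic_rnn(cell=dec_cell, inputs=t_embedded,
                               sequence_length=t_len,
                               initial_state=enc_state,  # h_T^enc seeds the decoder
                               scope="decoder")
```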
60 | ]
61 | },
62 | {
63 | "cell_type": "markdown",
64 | "metadata": {},
65 | "source": [
66 | "### The Data\n",
67 | "Since RNN models can be very slow to train on large real-world datasets we will generate some simpler training data for this exercise. The task for the RNN is simply to translate a string of letters spelling the numbers between 0 and 9 into the corresponding digits, e.g.\n",
68 | "\n",
69 | "\"one two five\" --> \"125#\" (we use # as a special end-of-sequence character)\n",
70 | "\n",
71 | "To input the strings into the RNN model we translate the characters into a vector of integers using a simple translation table (e.g. 'h'->16, 'o'->17, etc.). The code below prints a few input/output pairs using the *get_batch* function, which randomly generates the data.\n",
72 | "\n",
73 | "Note that, as shown in the illustration above, the end-of-sequence tag is flipped for the input to the decoder and used in the beginning instead of the end. This tag is known as the start-of-sequence tag, but often the end-of-sequence tag is just reused for this purpose.\n",
74 | "\n",
75 | "In the data loader below you will see two targets: the target input and the target output. The input is used to compute the translation and the output is used for the loss function."
76 | ]
77 | },
78 | {
79 | "cell_type": "code",
80 | "execution_count": null,
81 | "metadata": {
82 | "collapsed": false
83 | },
84 | "outputs": [],
85 | "source": [
86 | "batch_size = 3\n",
87 | "inputs, inputs_seqlen, targets_in, targets_out, targets_seqlen, targets_mask, \\\n",
88 | "text_inputs, text_targets_in, text_targets_out = \\\n",
89 | "    get_batch(batch_size=batch_size, max_digits=2, min_digits=1)\n",
90 | "\n",
91 | "print \"input types:\", inputs.dtype, inputs_seqlen.dtype, targets_in.dtype, targets_out.dtype, targets_seqlen.dtype\n",
92 | "print print_valid_characters()\n",
93 | "print \"Stop/start character = #\"\n",
94 | "\n",
95 | "for i in range(batch_size):\n",
96 | "    print \"\\nSAMPLE\",i\n",
97 | "    print \"TEXT INPUTS:\\t\\t\\t\", text_inputs[i]\n",
98 | "    print \"TEXT TARGETS INPUT:\\t\\t\", text_targets_in[i]\n",
99 | "    print \"TEXT TARGETS OUTPUT:\\t\\t\", text_targets_out[i]\n",
100 | "    print \"ENCODED INPUTS:\\t\\t\\t\", inputs[i]\n",
101 | "    print \"INPUTS SEQUENCE LENGTH:\\t\\t\", inputs_seqlen[i]\n",
102 | "    print \"ENCODED TARGETS INPUT:\\t\\t\", targets_in[i]\n",
103 | "    print \"ENCODED TARGETS OUTPUT:\\t\\t\", targets_out[i]\n",
104 | "    print \"TARGETS SEQUENCE LENGTH:\\t\", targets_seqlen[i]\n",
105 | "    print \"TARGETS MASK:\\t\\t\\t\", targets_mask[i]"
106 | ]
107 | },
108 | {
109 | "cell_type": "markdown",
110 | "metadata": {},
111 | "source": [
112 | "### Encoder Decoder model setup\n",
113 | "Below is the TensorFlow model definition. We use an embedding layer to go from integer representation to vector representation of the input.\n",
114 | "\n",
115 | "Note that we have made use of a custom decoder wrapper, which can be found in `tf_utils.py`.\n",
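"\n",
"As a concrete (illustrative) reminder of the format the decoder consumes: for the text input \"one two\" the decoder input would be \"#12\" while the loss is computed against \"12#\", i.e. the same sequence shifted one step. Feeding the ground-truth previous symbol to the decoder during training like this is commonly known as teacher forcing."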
116 | ]
117 | },
118 | {
119 | "cell_type": "code",
120 | "execution_count": null,
121 | "metadata": {
122 | "collapsed": false
123 | },
124 | "outputs": [],
125 | "source": [
126 | "# resetting the graph\n",
127 | "reset_default_graph()\n",
128 | "\n",
129 | "# Setting up hyperparameters and general configs\n",
130 | "MAX_DIGITS = 5\n",
131 | "MIN_DIGITS = 5\n",
132 | "NUM_INPUTS = 27\n",
133 | "NUM_OUTPUTS = 11 #(0-9 + '#')\n",
134 | "\n",
135 | "BATCH_SIZE = 100\n",
136 | "# try various learning rates 1e-2 to 1e-5\n",
137 | "LEARNING_RATE = 0.005\n",
138 | "X_EMBEDDINGS = 8\n",
139 | "t_EMBEDDINGS = 8\n",
140 | "NUM_UNITS_ENC = 10\n",
141 | "NUM_UNITS_DEC = 10\n",
142 | "\n",
143 | "\n",
144 | "# Setting up placeholders, these are the tensors that we \"feed\" to our network\n",
145 | "Xs = tf.placeholder(tf.int32, shape=[None, None], name='X_input')\n",
146 | "ts_in = tf.placeholder(tf.int32, shape=[None, None], name='t_input_in')\n",
147 | "ts_out = tf.placeholder(tf.int32, shape=[None, None], name='t_input_out')\n",
148 | "X_len = tf.placeholder(tf.int32, shape=[None], name='X_len')\n",
149 | "t_len = tf.placeholder(tf.int32, shape=[None], name='t_len')\n",
150 | "t_mask = tf.placeholder(tf.float32, shape=[None, None], name='t_mask')\n",
151 | "\n",
152 | "# Building the model\n",
153 | "\n",
154 | "# first we build the embeddings to make our characters into dense, trainable vectors\n",
155 | "X_embeddings = tf.get_variable('X_embeddings', [NUM_INPUTS, X_EMBEDDINGS],\n",
156 | "                               initializer=tf.random_normal_initializer(stddev=0.1))\n",
157 | "t_embeddings = tf.get_variable('t_embeddings', [NUM_OUTPUTS, t_EMBEDDINGS],\n",
158 | "                               initializer=tf.random_normal_initializer(stddev=0.1))\n",
159 | "\n",
160 | "# setting up weights for computing the final output\n",
161 | "W_out = tf.get_variable('W_out', [NUM_UNITS_DEC, NUM_OUTPUTS])\n",
162 | "b_out = tf.get_variable('b_out', [NUM_OUTPUTS])\n",
163 | "\n",
164 | "X_embedded = tf.gather(X_embeddings, Xs, name='embed_X')\n",
165 | "t_embedded = tf.gather(t_embeddings, ts_in, name='embed_t')\n",
166 | "\n",
167 | "# forward encoding\n",
168 | "enc_cell = tf.nn.rnn_cell.GRUCell(NUM_UNITS_ENC)\n",
169 | "_, enc_state = tf.nn.dynamic_rnn(cell=enc_cell, inputs=X_embedded,\n",
170 | "                                 sequence_length=X_len, dtype=tf.float32)\n",
171 | "# use the lines below in case TF's dynamic_rnn gives issues\n",
172 | "#enc_state, _ = tf_utils.encoder(X_embedded, X_len, 'encoder', NUM_UNITS_ENC)\n",
173 | "#\n",
174 | "#enc_state = tf.concat(1, [enc_state, enc_state])\n",
175 | "\n",
176 | "# decoding\n",
177 | "# note that we are using a wrapper for decoding here, this wrapper is hardcoded to only use GRU\n",
178 | "# check out tf_utils to see how you make your own decoder\n",
179 | "dec_out, valid_dec_out = tf_utils.decoder(enc_state, t_embedded, t_len, \n",
180 | "                                          NUM_UNITS_DEC, t_embeddings,\n",
181 | "                                          W_out, b_out)\n",
182 | "\n",
183 | "# reshaping to have [batch_size*seqlen, num_units]\n",
184 | "out_tensor = tf.reshape(dec_out, [-1, NUM_UNITS_DEC])\n",
185 | "valid_out_tensor = tf.reshape(valid_dec_out, [-1, NUM_UNITS_DEC])\n",
186 | "# computing output\n",
187 | "out_tensor = tf.matmul(out_tensor, W_out) + b_out\n",
188 | "valid_out_tensor = tf.matmul(valid_out_tensor, W_out) + b_out\n",
189 | "# reshaping back to sequence\n",
190 | "b_size = tf.shape(X_len)[0] # use a variable we know has batch_size in [0]\n",
191 | "seq_len = tf.shape(t_embedded)[1] # variable we know has sequence length in [1]\n",
192 | "num_out = tf.constant(NUM_OUTPUTS) # casting NUM_OUTPUTS to a tensor variable\n",
193 | "out_shape = tf.concat(0, [tf.expand_dims(b_size, 0),\n",
194 | "                          tf.expand_dims(seq_len, 0),\n",
195 | "                          tf.expand_dims(num_out, 0)])\n",
196 | "out_tensor = tf.reshape(out_tensor, out_shape)\n",
197 | "valid_out_tensor = tf.reshape(valid_out_tensor, out_shape)\n",
198 | "# restoring the static shape information lost in the reshape\n",
199 | "#out_tensor.set_shape([None, None, NUM_OUTPUTS])\n",
200 | "y = out_tensor\n",
201 | "y_valid = valid_out_tensor"
202 | ]
203 | },
204 | {
205 | "cell_type": "code",
206 | "execution_count": null,
207 | "metadata": {
208 | "collapsed": false
209 | },
210 | "outputs": [],
211 | "source": [
212 | "# print all the variable names and shapes\n",
213 | "for var in tf.all_variables():\n",
214 | "    s = var.name + \" \"*(40-len(var.name))\n",
215 | "    print s, var.value().get_shape()"
216 | ]
217 | },
218 | {
219 | "cell_type": "markdown",
220 | "metadata": {},
221 | "source": [
222 | "### Defining the cost function, gradient clipping and accuracy\n",
223 | "Because the targets are categorical we use the cross entropy error.\n",
224 | "As the data is sequential we use the sequence-to-sequence cross entropy supplied in `tf_utils.py`.\n",
225 | "We use the Adam optimizer but you can experiment with the different optimizers implemented in [TensorFlow](https://www.tensorflow.org/versions/r0.10/api_docs/python/train.html#optimizers)."
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": null,
231 | "metadata": {
232 | "collapsed": false
233 | },
234 | "outputs": [],
235 | "source": [
236 | "def loss_and_acc(preds):\n",
237 | "    # sequence_loss_tensor is a modification of TensorFlow's own sequence_to_sequence_loss\n",
238 | "    # TensorFlow's seq2seq loss works with a 2D list instead of a 3D tensor\n",
239 | "    loss = tf_utils.sequence_loss_tensor(preds, ts_out, t_mask, NUM_OUTPUTS) # notice that we use ts_out here!\n",
240 | "    # if you want regularization\n",
241 | "    #reg_scale = 0.00001\n",
242 | "    #regularize = tf.contrib.layers.l2_regularizer(reg_scale)\n",
243 | "    #params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)\n",
244 | "    #reg_term = sum([regularize(param) for param in params])\n",
245 | "    #loss += reg_term\n",
246 | "    # calculate accuracy\n",
247 | "    argmax = tf.to_int32(tf.argmax(preds, 2))\n",
248 | "    correct = tf.to_float(tf.equal(argmax, ts_out)) * t_mask\n",
249 | "    accuracy = tf.reduce_sum(correct) / tf.reduce_sum(t_mask)\n",
250 | "    return loss, accuracy, argmax\n",
251 | "\n",
252 | "loss, accuracy, predictions = loss_and_acc(y)\n",
253 | "loss_valid, accuracy_valid, predictions_valid = loss_and_acc(y_valid)\n",
254 | "\n",
255 | "# use global_step to keep track of our iterations\n",
256 | "global_step = tf.Variable(0, name='global_step', trainable=False)\n",
257 | "# pick optimizer, try momentum or adadelta\n",
258 | "optimizer = tf.train.AdamOptimizer(LEARNING_RATE)\n",
259 | "# extract gradients for each variable\n",
260 | "grads_and_vars = optimizer.compute_gradients(loss)\n",
261 | "# add below for clipping by norm\n",
262 | "#gradients, variables = zip(*grads_and_vars) # unzip list of tuples\n",
263 | "#clipped_gradients, global_norm = (\n",
264 | "#    tf.clip_by_global_norm(gradients, clip_norm)) # with e.g. clip_norm = 1\n",
265 | "#grads_and_vars = zip(clipped_gradients, variables)\n",
266 | "# apply gradients and make trainable function\n",
267 | "train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)"
268 | ]
269 | },
270 | {
271 | "cell_type": "code",
272 | "execution_count": null,
273 | 
"metadata": { 274 | "collapsed": false 275 | }, 276 | "outputs": [], 277 | "source": [ 278 | "# print all the variable names and shapes\n", 279 | "# notice that we now have the optimizer Adam as well!\n", 280 | "for var in tf.all_variables():\n", 281 | " s = var.name + \" \"*(40-len(var.name))\n", 282 | " print s, var.value().get_shape()" 283 | ] 284 | }, 285 | { 286 | "cell_type": "code", 287 | "execution_count": null, 288 | "metadata": { 289 | "collapsed": false 290 | }, 291 | "outputs": [], 292 | "source": [ 293 | "# as always, test the forward pass and initialize the tf.Session!\n", 294 | "# here is some dummy data\n", 295 | "batch_size=3\n", 296 | "inputs, inputs_seqlen, targets_in, targets_out, targets_seqlen, targets_mask, \\\n", 297 | "text_inputs, text_targets_in, text_targets_out = \\\n", 298 | " get_batch(batch_size=batch_size, max_digits=7, min_digits=2)\n", 299 | "\n", 300 | "for i in range(batch_size):\n", 301 | " print \"\\nSAMPLE\",i\n", 302 | " print \"TEXT INPUTS:\\t\\t\\t\", text_inputs[i]\n", 303 | " print \"TEXT TARGETS INPUT:\\t\\t\", text_targets_in[i]\n", 304 | "\n", 305 | "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n", 306 | "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n", 307 | "# initialize the Session\n", 308 | "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n", 309 | "# test train part\n", 310 | "sess.run(tf.initialize_all_variables())\n", 311 | "feed_dict = {Xs: inputs, X_len: inputs_seqlen, ts_in: targets_in,\n", 312 | " ts_out: targets_out, t_len: targets_seqlen}\n", 313 | "fetches = [y]\n", 314 | "res = sess.run(fetches=fetches, feed_dict=feed_dict)\n", 315 | "print \"y\", res[0].shape\n", 316 | "\n", 317 | "# test validation part\n", 318 | "fetches = [y_valid]\n", 319 | "res = sess.run(fetches=fetches, feed_dict=feed_dict)\n", 320 | "print \"y_valid\", res[0].shape" 321 | ] 322 | }, 323 | { 324 | "cell_type": "code", 325 | "execution_count": null, 326 | "metadata": { 327 | "collapsed": false 328 | }, 329 | "outputs": [], 330 | "source": [ 331 | "#Generate some validation data\n", 332 | "X_val, X_len_val, t_in_val, t_out_val, t_len_val, t_mask_val, \\\n", 333 | "text_inputs_val, text_targets_in_val, text_targets_out_val = \\\n", 334 | " get_batch(batch_size=5000, max_digits=MAX_DIGITS,min_digits=MIN_DIGITS)\n", 335 | "print \"X_val\", X_val.shape\n", 336 | "print \"t_out_val\", t_out_val.shape" 337 | ] 338 | }, 339 | { 340 | "cell_type": "markdown", 341 | "metadata": {}, 342 | "source": [ 343 | "# Training" 344 | ] 345 | }, 346 | { 347 | "cell_type": "code", 348 | "execution_count": null, 349 | "metadata": { 350 | "collapsed": false 351 | }, 352 | "outputs": [], 353 | "source": [ 354 | "# setting up running parameters\n", 355 | "val_interval = 5000\n", 356 | "samples_to_process = 3e5\n", 357 | "samples_processed = 0\n", 358 | "samples_val = []\n", 359 | "costs, accs_val = [], []\n", 360 | "plt.figure()\n", 361 | "try:\n", 362 | " while samples_processed < samples_to_process:\n", 363 | " # load data\n", 364 | " X_tr, X_len_tr, t_in_tr, t_out_tr, t_len_tr, t_mask_tr, \\\n", 365 | " text_inputs_tr, text_targets_in_tr, text_targets_out_tr = \\\n", 366 | " get_batch(batch_size=BATCH_SIZE,max_digits=MAX_DIGITS,min_digits=MIN_DIGITS)\n", 367 | " # make fetches\n", 368 | " fetches_tr = [train_op, loss, accuracy]\n", 369 | " # set up feed dict\n", 370 | " feed_dict_tr = {Xs: X_tr, X_len: X_len_tr, ts_in: t_in_tr,\n", 371 | " ts_out: t_out_tr, t_len: t_len_tr, t_mask: t_mask_tr}\n", 372 | " # 
run the model\n",
373 | "        res = tuple(sess.run(fetches=fetches_tr, feed_dict=feed_dict_tr))\n",
374 | "        _, batch_cost, batch_acc = res\n",
375 | "        costs += [batch_cost]\n",
376 | "        samples_processed += BATCH_SIZE\n",
377 | "        #if samples_processed % 1000 == 0: print batch_cost, batch_acc\n",
378 | "        #validation data\n",
379 | "        if samples_processed % val_interval == 0:\n",
380 | "            #print \"validating\"\n",
381 | "            fetches_val = [accuracy_valid, y_valid]\n",
382 | "            feed_dict_val = {Xs: X_val, X_len: X_len_val, ts_in: t_in_val,\n",
383 | "                             ts_out: t_out_val, t_len: t_len_val, t_mask: t_mask_val}\n",
384 | "            res = tuple(sess.run(fetches=fetches_val, feed_dict=feed_dict_val))\n",
385 | "            acc_val, output_val = res\n",
386 | "            samples_val += [samples_processed]\n",
387 | "            accs_val += [acc_val]\n",
388 | "            plt.plot(samples_val, accs_val, 'g-')\n",
389 | "            plt.ylabel('Validation Accuracy', fontsize=15)\n",
390 | "            plt.xlabel('Processed samples', fontsize=15)\n",
391 | "            plt.title('', fontsize=20)\n",
392 | "            plt.grid('on')\n",
393 | "            plt.savefig(\"out.png\")\n",
394 | "            display.display(display.Image(filename=\"out.png\"))\n",
395 | "            display.clear_output(wait=True)\n",
396 | "except KeyboardInterrupt:\n",
397 | "    pass"
398 | ]
399 | },
400 | {
401 | "cell_type": "code",
402 | "execution_count": null,
403 | "metadata": {
404 | "collapsed": false,
405 | "scrolled": true
406 | },
407 | "outputs": [],
408 | "source": [
409 | "#plot of validation accuracy for each target position\n",
410 | "plt.figure(figsize=(7,7))\n",
411 | "plt.plot(np.mean(np.argmax(output_val,axis=2)==t_out_val,axis=0))\n",
412 | "plt.ylabel('Accuracy', fontsize=15)\n",
413 | "plt.xlabel('Target position', fontsize=15)\n",
414 | "#plt.title('', fontsize=20)\n",
415 | "plt.grid('on')\n",
416 | "plt.show()\n",
417 | "#why does the plot look like this?"
418 | ]
419 | },
420 | {
421 | "cell_type": "markdown",
422 | "metadata": {},
423 | "source": [
424 | "# Exercises:\n",
425 | "\n",
426 | "1. The model has two GRU networks. The ```GRUEncoder``` and the ```GRUDecoder```.\n",
427 | "A GRU is parameterized by an update gate `z`, a reset gate `r` and the cell `c`.\n",
428 | "Under normal circumstances, such as in the TensorFlow GRUCell implementation, these gates have been stacked for faster computation, but in the custom decoder each weight and bias is kept separate, as described in the original [GRU article](https://arxiv.org/abs/1406.1078).\n",
429 | "Thus we have the following weights and biases: ```{decoder/W_z_x:0, decoder/W_z_h:0, decoder/b_z:0, decoder/W_r_x:0, decoder/W_r_h:0, decoder/b_r:0, decoder/W_c_x:0, decoder/W_c_h:0, decoder/b_h:0}```.\n",
430 | "Try to explain the shape of ```decoder/W_z_x:0``` and ```decoder/W_z_h:0```. Why are they different? You can find the equations for the GRU at: [GRU](http://lasagne.readthedocs.io/en/latest/modules/layers/recurrent.html#lasagne.layers.GRULayer). \n",
431 | "\n",
432 | "2. The GRU unit is able to ignore the input and just copy the previous hidden state. In the beginning of training this might be desirable behaviour because it helps the model learn long-range dependencies. You can make the model ignore the input by modifying initial bias values. What bias would you modify and how would you modify it? Again you'll need to refer to the GRU equations: [GRU](http://lasagne.readthedocs.io/en/latest/modules/layers/recurrent.html#lasagne.layers.GRULayer)\n",
433 | "Further, if you look into `tf_utils.py` and search for the `decoder(...)` function, you will see that the initializer for each weight and bias can be changed.\n",
434 | "\n",
435 | "3. Try setting MIN_DIGITS and MAX_DIGITS to 20\n",
436 | "\n",
437 | "4. What is the final validation performance? Why do you think it is not better? Comment on the accuracy for each position of the output symbols.\n",
438 | "\n",
439 | "5. Why do you think the validation performance looks more \"jig-saw\"-like compared to the FFN and CNN models?\n",
440 | "\n",
441 | "6. In the example we stack a softmax layer on top of a recurrent layer. Explain how the code snippet below achieves that."
442 | ]
443 | },
444 | {
445 | "cell_type": "code",
446 | "execution_count": null,
447 | "metadata": {
448 | "collapsed": false
449 | },
450 | "outputs": [],
451 | "source": [
452 | "reset_default_graph()\n",
453 | "\n",
454 | "bs_, seqlen_, numinputs_ = 16, 140, 40\n",
455 | "x_pl_ = tf.placeholder(tf.float32, [bs_, seqlen_, numinputs_])\n",
456 | "gru_cell_ = tf.nn.rnn_cell.GRUCell(10)\n",
457 | "l_gru_, gru_state_ = tf.nn.dynamic_rnn(gru_cell_, x_pl_, dtype=tf.float32)\n",
458 | "l_reshape_ = tf.reshape(l_gru_, [-1, 10])\n",
459 | "\n",
460 | "l_softmax_ = tf.contrib.layers.fully_connected(l_reshape_, 11, activation_fn=tf.nn.softmax)\n",
461 | "l_softmax_seq_ = tf.reshape(l_softmax_, [bs_, seqlen_, -1])\n",
462 | "\n",
463 | "print \"l_input_\", x_pl_.get_shape()\n",
464 | "print \"l_gru_\", l_gru_.get_shape()\n",
465 | "print \"l_reshape_\", l_reshape_.get_shape()\n",
466 | "print \"l_softmax_\", l_softmax_.get_shape()\n",
467 | "print \"l_softmax_seq_\", l_softmax_seq_.get_shape()"
468 | ]
469 | },
470 | {
471 | "cell_type": "markdown",
472 | "metadata": {},
473 | "source": [
474 | "7. Optional: You are interested in doing sentiment analysis on tweets, i.e. classification as positive or negative. You decide to read over the tweet sequence and use the last hidden state to do the classification. How can you modify the small network above to output only a single classification for the whole sequence? Hints: look at `gru_state_` or [tf.slice](https://www.tensorflow.org/versions/r0.10/api_docs/python/array_ops.html#slice) in the API.\n",
475 | "\n",
476 | "\n",
477 | "8. Optional: Bidirectional Encoder. Bidirectional encoders are usually implemented by running a forward model and a backward model (a forward model on a reversed sequence) separately and then concatenating them before passing them on to the next layer. To reverse the sequence try looking at [tf.reverse_sequence](https://www.tensorflow.org/versions/r0.10/api_docs/python/array_ops.html#reverse_sequence)\n",
478 | "\n",
479 | "```\n",
480 | "enc_cell = tf.nn.rnn_cell.GRUCell(NUM_UNITS_ENC)\n",
481 | "_, enc_state = tf.nn.dynamic_rnn(cell=enc_cell, inputs=X_embedded,\n",
482 | "                                 sequence_length=X_len, dtype=tf.float32, scope=\"rnn_forward\")\n",
483 | "\n",
484 | "X_embedded_backwards = tf.reverse_sequence(X_embedded, tf.to_int64(X_len), 1)\n",
485 | "enc_cell_backwards = tf.nn.rnn_cell.GRUCell(NUM_UNITS_ENC)\n",
486 | "_, enc_state_backwards = tf.nn.dynamic_rnn(cell=enc_cell_backwards, inputs=X_embedded_backwards,\n",
487 | "                                           sequence_length=X_len, dtype=tf.float32, scope=\"rnn_backward\")\n",
488 | "\n",
489 | "enc_state = tf.concat(1, [enc_state, enc_state_backwards])\n",
490 | "```\n",
491 | "\n",
492 | "Note: you will need to double NUM_UNITS_DEC, as the decoder wrapper currently does not support different encoder and decoder sizes."
493 | ]
494 | },
495 | {
496 | "cell_type": "markdown",
497 | "metadata": {},
498 | "source": [
499 | "## Attention Decoder (GRU)\n",
500 | "Soft attention for recurrent neural networks has recently attracted a lot of interest.\n",
501 | "These methods let the decoder model selectively focus on which part of the encoder sequence it will use for each decoded output symbol.\n",
502 | "This relieves the encoder from having to compress the input sequence into a fixed-size vector representation passed on to the decoder.\n",
503 | "Secondly, we can interrogate the decoder network about where it attends while producing the outputs.\n",
504 | "Below we'll implement a GRU decoder with selective attention and show that it significantly improves the performance of the toy translation task.\n",
505 | "\n",
506 | "The seminal attention paper is https://arxiv.org/pdf/1409.0473v7.pdf\n",
507 | "\n",
508 | "The principle of attention models is simple. \n",
509 | "\n",
510 | "1. Use the encoder to get the hidden representation $\\{h^1_e, \\dots, h^n_e\\}$ for each position in the input sequence. \n",
511 | "2. For timestep $t$ in the decoder, compute $a_m = f(h^m_e, h^d_t)$ for $m = 1, \\dots, n$, where $f$ is a function returning a scalar value. \n",
512 | "3. You can then normalize the sequence of scalars $\\{a_1, \\dots, a_n\\}$ to get probabilities $\\{p_1, \\dots, p_n\\}$.\n",
513 | "4. Weight each $h^m_e$ by its probability $p_m$ and sum to get $h_{in}$.\n",
514 | "5. Use $h_{in}$ as an additional input to the decoder. $h_{in}$ is recalculated each time the decoder is updated. A minimal sketch of steps 2-4 is shown after this list.\n",
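"\n",
"The following is an illustrative NumPy sketch of one attention step (additive, Bahdanau-style scoring; not the exact `tf_utils.attention_decoder` implementation, and the parameter arrays `W_a`, `U_a` and `v_a` are assumed initialized elsewhere):\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def attention_step(h_enc, h_dec, W_a, U_a, v_a):\n",
"    # h_enc: (n, enc_dim) encoder states; h_dec: (dec_dim,) current decoder state\n",
"    e = np.tanh(h_enc.dot(U_a) + h_dec.dot(W_a)).dot(v_a)  # (n,) scalar scores, step 2\n",
"    p = np.exp(e - e.max()); p /= p.sum()                  # (n,) probabilities, step 3\n",
"    return p.dot(h_enc)                                    # (enc_dim,) context h_in, step 4\n",
"```"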
515 | ]
516 | },
517 | {
518 | "cell_type": "code",
519 | "execution_count": null,
520 | "metadata": {
521 | "collapsed": false
522 | },
523 | "outputs": [],
524 | "source": [
525 | "# resetting the graph\n",
526 | "reset_default_graph()\n",
527 | "\n",
528 | "# Setting up hyperparameters and general configs\n",
529 | "MAX_DIGITS = 10\n",
530 | "MIN_DIGITS = 10\n",
531 | "NUM_INPUTS = 27\n",
532 | "NUM_OUTPUTS = 11 #(0-9 + '#')\n",
533 | "\n",
534 | "BATCH_SIZE = 100\n",
535 | "# try various learning rates 1e-2 to 1e-5\n",
536 | "LEARNING_RATE = 0.005\n",
537 | "X_EMBEDDINGS = 8\n",
538 | "t_EMBEDDINGS = 8\n",
539 | "NUM_UNITS_ENC = 10\n",
540 | "NUM_UNITS_DEC = 10\n",
541 | "NUM_UNITS_ATTN = 20\n",
542 | "\n",
543 | "\n",
544 | "# Setting up placeholders, these are the tensors that we \"feed\" to our network\n",
545 | "Xs = tf.placeholder(tf.int32, shape=[None, None], name='X_input')\n",
546 | "ts_in = tf.placeholder(tf.int32, shape=[None, None], name='t_input_in')\n",
547 | "ts_out = tf.placeholder(tf.int32, shape=[None, None], name='t_input_out')\n",
548 | "X_len = tf.placeholder(tf.int32, shape=[None], name='X_len')\n",
549 | "t_len = tf.placeholder(tf.int32, shape=[None], name='t_len')\n",
550 | "t_mask = tf.placeholder(tf.float32, shape=[None, None], name='t_mask')\n",
551 | "\n",
552 | "# Building the model\n",
553 | "\n",
554 | "# first we build the embeddings to make our characters into dense, trainable vectors\n",
555 | "X_embeddings = tf.get_variable('X_embeddings', [NUM_INPUTS, X_EMBEDDINGS],\n",
556 | "                               initializer=tf.random_normal_initializer(stddev=0.1))\n",
557 | "t_embeddings = tf.get_variable('t_embeddings', [NUM_OUTPUTS, t_EMBEDDINGS],\n",
558 | "                               initializer=tf.random_normal_initializer(stddev=0.1))\n",
559 | "\n",
560 | "# setting up weights for computing the final output\n",
561 | "W_out = tf.get_variable('W_out', [NUM_UNITS_DEC, NUM_OUTPUTS])\n",
562 | "b_out = tf.get_variable('b_out', [NUM_OUTPUTS])\n",
563 | "\n",
564 | "X_embedded = tf.gather(X_embeddings, Xs, name='embed_X')\n",
565 | "t_embedded = tf.gather(t_embeddings, ts_in, name='embed_t')\n",
566 | "\n",
567 | "# forward encoding\n",
568 | "enc_cell = tf.nn.rnn_cell.GRUCell(NUM_UNITS_ENC)\n",
569 | "enc_out, enc_state = tf.nn.dynamic_rnn(cell=enc_cell, inputs=X_embedded,\n",
570 | "                                       sequence_length=X_len, dtype=tf.float32)\n",
571 | "# use the lines below in case TF's dynamic_rnn does not work as intended\n",
572 | "#enc_state, _ = tf_utils.encoder(X_embedded, X_len, 'encoder', NUM_UNITS_ENC)\n",
573 | "#\n",
574 | "#enc_state = tf.concat(1, [enc_state, enc_state])\n",
575 | "\n",
576 | "# decoding\n",
577 | "# note that we are using a wrapper for decoding here, this wrapper is hardcoded to only use GRU\n",
578 | "# check out tf_utils to see how you make your own decoder\n",
579 | "dec_out, dec_out_valid, alpha_valid = \\\n",
580 | "    tf_utils.attention_decoder(enc_out, X_len, enc_state, t_embedded, t_len,\n",
581 | "                               NUM_UNITS_DEC, NUM_UNITS_ATTN, t_embeddings,\n",
582 | "                               W_out, b_out)\n",
583 | "\n",
584 | "# reshaping to have [batch_size*seqlen, num_units]\n",
585 | "out_tensor = tf.reshape(dec_out, [-1, NUM_UNITS_DEC])\n",
586 | "out_tensor_valid = tf.reshape(dec_out_valid, [-1, NUM_UNITS_DEC])\n",
587 | "# computing output\n",
588 | "out_tensor = tf.matmul(out_tensor, W_out) + b_out\n",
589 | "out_tensor_valid = tf.matmul(out_tensor_valid, W_out) + b_out\n",
590 | "# reshaping back to sequence\n",
591 | "b_size = tf.shape(X_len)[0] # use a variable we know has batch_size in [0]\n",
592 | "seq_len = tf.shape(t_embedded)[1] # variable we know has sequence length in [1]\n",
593 | "num_out = tf.constant(NUM_OUTPUTS) # casting NUM_OUTPUTS to a tensor variable\n",
594 | "out_shape = tf.concat(0, [tf.expand_dims(b_size, 0),\n",
595 | "                          tf.expand_dims(seq_len, 0),\n",
596 | "                          tf.expand_dims(num_out, 0)])\n",
597 | "out_tensor = tf.reshape(out_tensor, out_shape)\n",
598 | "out_tensor_valid = tf.reshape(out_tensor_valid, out_shape)\n",
599 | "# restoring the static shape information lost in the reshape\n",
600 | "#out_tensor.set_shape([None, None, NUM_OUTPUTS])\n",
601 | "y = out_tensor\n",
602 | "y_valid = out_tensor_valid"
603 | ]
604 | },
605 | {
606 | "cell_type": "code",
607 | "execution_count": null,
608 | "metadata": {
609 | "collapsed": false
610 | },
611 | "outputs": [],
612 | "source": [
613 | "def loss_and_acc(preds):\n",
614 | "    # sequence_loss_tensor is a modification of TensorFlow's own sequence_to_sequence_loss\n",
615 | "    # TensorFlow's seq2seq loss works with a 2D list instead of a 3D tensor\n",
616 | "    loss = tf_utils.sequence_loss_tensor(preds, ts_out, t_mask, NUM_OUTPUTS) # notice that we use ts_out here!\n",
617 | "    # if you want regularization\n",
618 | "    reg_scale = 0.00001\n",
619 | "    regularize = tf.contrib.layers.l2_regularizer(reg_scale)\n",
620 | "    params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)\n",
621 | "    reg_term = sum([regularize(param) for param in params])\n",
622 | "    loss += reg_term\n",
623 | "    # calculate accuracy\n",
624 | "    argmax = tf.to_int32(tf.argmax(preds, 2))\n",
625 | "    correct = tf.to_float(tf.equal(argmax, ts_out)) * t_mask\n",
626 | "    accuracy = tf.reduce_sum(correct) / tf.reduce_sum(t_mask)\n",
627 | "    return loss, accuracy, argmax\n",
628 | "\n",
629 | "loss, accuracy, predictions = loss_and_acc(y)\n",
630 | "loss_valid, accuracy_valid, predictions_valid = loss_and_acc(y_valid)\n",
631 | "\n",
632 | "# use global_step to keep track of our iterations\n",
633 | "global_step = tf.Variable(0, name='global_step', trainable=False)\n",
634 | "# pick optimizer, try momentum or adadelta\n",
635 | "optimizer = tf.train.AdamOptimizer(LEARNING_RATE)\n",
636 | "# extract gradients for each variable\n",
637 | "grads_and_vars = optimizer.compute_gradients(loss)\n",
638 | "# add below for clipping by norm\n",
639 | "#gradients, variables = zip(*grads_and_vars) # unzip list of tuples\n",
640 | "#clipped_gradients, global_norm = (\n",
641 | "#    tf.clip_by_global_norm(gradients, clip_norm)) # with e.g. clip_norm = 1\n",
642 | "#grads_and_vars = zip(clipped_gradients, variables)\n",
643 | "# apply gradients and make trainable function\n",
644 | "train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)"
645 | ]
646 | },
647 | {
648 | "cell_type": "code",
649 | "execution_count": null,
650 | "metadata": {
651 | "collapsed": false
652 | },
653 | "outputs": [],
654 | "source": [
655 | "# as always, test the forward pass and start the tf.Session!\n",
656 | "# here is some dummy data\n",
657 | "batch_size = 3\n",
658 | "inputs, inputs_seqlen, targets_in, targets_out, targets_seqlen, targets_mask, \\\n",
659 | "text_inputs, text_targets_in, text_targets_out = \\\n",
660 | "    get_batch(batch_size=batch_size, max_digits=7, min_digits=2)\n",
661 | "\n",
662 | "for i in range(batch_size):\n",
663 | "    print \"\\nSAMPLE\",i\n",
664 | "    print \"TEXT INPUTS:\\t\\t\\t\", text_inputs[i]\n",
665 | "    print \"TEXT TARGETS INPUT:\\t\\t\", text_targets_in[i]\n",
666 | "\n",
667 | "# restricting memory usage, TensorFlow is greedy and will use all memory otherwise\n",
668 | "gpu_opts = tf.GPUOptions(per_process_gpu_memory_fraction=0.2)\n",
669 | "# initialize the Session\n",
670 | "sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_opts))\n",
671 | "# test train part\n",
672 | "sess.run(tf.initialize_all_variables())\n",
673 | "feed_dict = {Xs: inputs, X_len: inputs_seqlen, ts_in: targets_in,\n",
674 | "             ts_out: targets_out, t_len: targets_seqlen}\n",
675 | "fetches = [y]\n",
676 | "res = sess.run(fetches=fetches, feed_dict=feed_dict)\n",
677 | "print \"y\", res[0].shape\n",
678 | "\n",
679 | "# test validation part\n",
680 | "fetches = [y_valid]\n",
681 | "res = sess.run(fetches=fetches, feed_dict=feed_dict)\n",
682 | "print \"y_valid\", res[0].shape"
683 | ]
684 | },
685 | {
686 | "cell_type": "code",
687 | "execution_count": null,
688 | "metadata": {
689 | "collapsed": false,
690 | "scrolled": true
691 | },
692 | "outputs": [],
693 | "source": [
694 | "# print all the variable names and shapes\n",
695 | "# notice that W_z is now packed, such that it contains the input, hidden and context weights; this is for optimization\n",
696 | "# further, we now have W_s, b_s. This is so NUM_UNITS_ENC and NUM_UNITS_DEC do not have to share shape ..!\n",
697 | "for var in tf.all_variables():\n",
698 | "    s = var.name + \" \"*(40-len(var.name))\n",
699 | "    print s, var.value().get_shape()"
700 | ]
701 | },
702 | {
703 | "cell_type": "code",
704 | "execution_count": null,
705 | "metadata": {
706 | "collapsed": false
707 | },
708 | "outputs": [],
709 | "source": [
710 | "#Generate some validation data\n",
711 | "X_val, X_len_val, t_in_val, t_out_val, t_len_val, t_mask_val, \\\n",
712 | "text_inputs_val, text_targets_in_val, text_targets_out_val = \\\n",
713 | "    get_batch(batch_size=5000, max_digits=MAX_DIGITS,min_digits=MIN_DIGITS)\n",
714 | "print \"X_val\", X_val.shape\n",
715 | "print \"t_out_val\", t_out_val.shape"
716 | ]
717 | },
718 | {
719 | "cell_type": "code",
720 | "execution_count": null,
721 | "metadata": {
722 | "collapsed": false,
723 | "scrolled": true
724 | },
725 | "outputs": [],
726 | "source": [
727 | "# NOTICE - THIS MIGHT TAKE UP TO 30 MINUTES ON CPU..!\n",
728 | "# setting up running parameters\n",
729 | "val_interval = 5000\n",
730 | "samples_to_process = 3e5\n",
731 | "samples_processed = 0\n",
732 | "samples_val = []\n",
733 | "costs, accs = [], []\n",
734 | "plt.figure()\n",
735 | "try:\n",
736 | "    while samples_processed < samples_to_process:\n",
737 | "        # load data\n",
738 | "        X_tr, X_len_tr, t_in_tr, t_out_tr, t_len_tr, t_mask_tr, \\\n",
739 | "        text_inputs_tr, text_targets_in_tr, text_targets_out_tr = \\\n",
740 | "            get_batch(batch_size=BATCH_SIZE,max_digits=MAX_DIGITS,min_digits=MIN_DIGITS)\n",
741 | "        # make fetches\n",
742 | "        fetches_tr = [train_op, loss, accuracy]\n",
743 | "        # set up feed dict\n",
744 | "        feed_dict_tr = {Xs: X_tr, X_len: X_len_tr, ts_in: t_in_tr,\n",
745 | "                        ts_out: t_out_tr, t_len: t_len_tr, t_mask: t_mask_tr}\n",
746 | "        # run the model\n",
747 | "        res = tuple(sess.run(fetches=fetches_tr, feed_dict=feed_dict_tr))\n",
748 | "        _, batch_cost, batch_acc = res\n",
749 | "        costs += [batch_cost]\n",
750 | "        samples_processed += BATCH_SIZE\n",
751 | "        #if samples_processed % 1000 == 0: print batch_cost, batch_acc\n",
752 | "        #validation data\n",
753 | "        if samples_processed % val_interval == 0:\n",
754 | "            #print \"validating\"\n",
755 | "            fetches_val = [accuracy_valid, y_valid, alpha_valid]\n",
756 | "            feed_dict_val = {Xs: X_val, X_len: X_len_val, ts_in: t_in_val,\n",
757 | "                             ts_out: t_out_val, t_len: t_len_val, t_mask: t_mask_val}\n",
758 | "            res = tuple(sess.run(fetches=fetches_val, feed_dict=feed_dict_val))\n",
759 | "            acc_val, output_val, alp_val = res\n",
760 | "            samples_val += [samples_processed]\n",
761 | "            accs += [acc_val]\n",
762 | "            plt.plot(samples_val, accs, 'b-')\n",
763 | "            plt.ylabel('Validation Accuracy', fontsize=15)\n",
764 | "            plt.xlabel('Processed samples', fontsize=15)\n",
765 | "            plt.title('', fontsize=20)\n",
766 | "            plt.grid('on')\n",
767 | "            plt.savefig(\"out_attention.png\")\n",
768 | "            display.display(display.Image(filename=\"out_attention.png\"))\n",
769 | "            display.clear_output(wait=True)\n",
770 | "# NOTICE - THIS MIGHT TAKE UP TO 30 MINUTES ON CPU..!\n",
771 | "except KeyboardInterrupt:\n",
772 | "    pass"
773 | ]
774 | },
775 | {
776 | "cell_type": "code",
777 | "execution_count": null,
778 | "metadata": {
779 | "collapsed": false
780 | },
781 | "outputs": [],
782 | "source": [
783 | "#plot of validation accuracy for each target position\n",
784 | "plt.figure(figsize=(7,7))\n",
785 | "plt.plot(np.mean(np.argmax(output_val,axis=2)==t_out_val,axis=0))\n",
786 | "plt.ylabel('Accuracy', fontsize=15)\n",
787 | "plt.xlabel('Target position', fontsize=15)\n",
788 | "#plt.title('', fontsize=20)\n",
789 | "plt.grid('on')\n",
790 | "plt.show()\n",
791 | "#why does the plot look like this?"
792 | ]
793 | },
794 | {
795 | "cell_type": "code",
796 | "execution_count": null,
797 | "metadata": {
798 | "collapsed": false
799 | },
800 | "outputs": [],
801 | "source": [
802 | "### attention plot, try with different i = 1, 2, ..., 1000\n",
803 | "i = 42\n",
804 | "\n",
805 | "column_labels = map(str, list(t_out_val[i]))\n",
806 | "row_labels = map(str, (list(X_val[i])))\n",
807 | "data = alp_val[i]\n",
808 | "fig, ax = plt.subplots()\n",
809 | "heatmap = ax.pcolor(data, cmap=plt.cm.Blues)\n",
810 | "\n",
811 | "# put the major ticks at the middle of each cell\n",
812 | "ax.set_xticks(np.arange(data.shape[1])+0.5, minor=False)\n",
813 | "ax.set_yticks(np.arange(data.shape[0])+0.5, minor=False)\n",
814 | "\n",
815 | "# want a more natural, table-like display\n",
816 | "ax.invert_yaxis()\n",
817 | "ax.xaxis.tick_top()\n",
818 | "\n",
819 | "ax.set_xticklabels(row_labels, minor=False)\n",
820 | "ax.set_yticklabels(column_labels, minor=False)\n",
821 | "\n",
822 | "plt.ylabel('output', fontsize=15)\n",
823 | "plt.xlabel('input sequence', fontsize=15)\n",
824 | "\n",
825 | "plt.show()"
826 | ]
827 | },
828 | {
829 | "cell_type": "code",
830 | "execution_count": null,
831 | "metadata": {
832 | "collapsed": false
833 | },
834 | "outputs": [],
835 | "source": [
836 | "#Plot of average attention weight as a function of the sequence position for each of \n",
837 | "#the 21 targets in the output sequence i.e. each line is the mean position of the \n",
838 | "#attention for each target position.\n",
839 | "\n",
840 | "np.mean(alp_val, axis=0).shape\n",
841 | "plt.figure()\n",
842 | "plt.plot(np.mean(alp_val, axis=0).T)\n",
843 | "plt.ylabel('alpha', fontsize=15)\n",
844 | "plt.xlabel('Input Sequence position', fontsize=15)\n",
845 | "plt.title('Alpha weights', fontsize=20)\n",
846 | "plt.legend(map(str,range(1,22)), bbox_to_anchor=(1.125,1.0), fontsize=10)\n",
847 | "plt.show()\n"
848 | ]
849 | },
850 | {
851 | "cell_type": "markdown",
852 | "metadata": {
853 | "collapsed": true
854 | },
855 | "source": [
856 | "## Assignments for the attention decoder\n",
857 | "1. Explain what the attention plot shows.\n",
858 | "2. Explain what the alpha weights show.\n",
859 | "3. Why is the alpha curve for the first digit narrow and peaked, while later digits have alpha curves that are wider and less peaked?\n",
860 | "4. Why is attention a good idea for this problem? Can you think of other problems where attention is a good choice?\n",
861 | "5. Try setting MIN_DIGITS and MAX_DIGITS to 20\n",
862 | "6. Enable gradient clipping (under the loss code block)"
863 | ]
864 | }
865 | ],
866 | "metadata": {
867 | "kernelspec": {
868 | "display_name": "Python 2",
869 | "language": "python",
870 | "name": "python2"
871 | },
872 | "language_info": {
873 | "codemirror_mode": {
874 | "name": "ipython",
875 | "version": 2
876 | },
877 | "file_extension": ".py",
878 | "mimetype": "text/x-python",
879 | "name": "python",
880 | "nbconvert_exporter": "python",
881 | "pygments_lexer": "ipython2",
882 | "version": "2.7.6"
883 | }
884 | },
885 | "nbformat": 4,
886 | "nbformat_minor": 0
887 | }
888 | 
--------------------------------------------------------------------------------
/lab3_RNN/tf_utils.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | from tensorflow.python.ops import tensor_array_ops
3 | from tensorflow.python.framework import ops
4 | from tensorflow.python.ops import nn_ops
5 | from tensorflow.python.ops import math_ops
6 | 
7 | 
8 | ###
9 | # custom loss function, similar to TensorFlow's but uses 3D tensors
10 | # instead of a list of 2D tensors
11 | def sequence_loss_tensor(logits, targets, weights, num_classes,
12 |                          average_across_timesteps=True,
13 |                          softmax_loss_function=None, name=None):
14 |     """Weighted cross-entropy loss for a sequence of logits (per example).
15 |     """
16 |     with ops.op_scope([logits, targets, weights], name, "sequence_loss_by_example"):
17 |         probs_flat = tf.reshape(logits, [-1, num_classes])
18 |         targets = tf.reshape(targets, [-1])
19 |         if softmax_loss_function is None:
20 |             crossent = nn_ops.sparse_softmax_cross_entropy_with_logits(
21 |                 probs_flat, targets)
22 |         else:
23 |             crossent = softmax_loss_function(probs_flat, targets)
24 |         crossent = crossent * tf.reshape(weights, [-1])
25 |         crossent = tf.reduce_sum(crossent)
26 |         total_size = math_ops.reduce_sum(weights)
27 |         total_size += 1e-12 # to avoid division by zero
28 |         crossent /= total_size
29 |         return crossent
30 | 
31 | 
32 | ###
33 | # a custom masking function, takes sequence lengths and makes masks
34 | def mask(sequence_lengths):
35 |     # based on this SO answer: http://stackoverflow.com/a/34138336/118173
36 |     batch_size = tf.shape(sequence_lengths)[0]
37 |     max_len = tf.reduce_max(sequence_lengths)
38 | 
39 |     lengths_transposed = tf.expand_dims(sequence_lengths, 1)
40 | 
41 |     rng = tf.range(max_len)
42 |     rng_row = tf.expand_dims(rng, 0)
43 | 
44 |     return tf.less(rng_row, lengths_transposed)
45 | 
46 | 
47 | ###
48 | # a custom encoder function (in case we can't get TensorFlow's to work)
49 | 
50 | def encoder(inputs, lengths, name, num_units, reverse=False, swap=False):
51 |     with tf.variable_scope(name):
52 |         weight_initializer = tf.truncated_normal_initializer(stddev=0.1)
53 |         input_units = inputs.get_shape()[2]
54 |         W_z = tf.get_variable('W_z',
55 |                               shape=[input_units+num_units, num_units],
56 |                               initializer=weight_initializer)
57 |         W_r = tf.get_variable('W_r',
58 |                               shape=[input_units+num_units, num_units],
59 |                               initializer=weight_initializer)
60 |         W_h = tf.get_variable('W_h',
61 |                               shape=[input_units+num_units, num_units],
62 |                               initializer=weight_initializer)
63 |         b_z = tf.get_variable('b_z',
64 |                               shape=[num_units],
65 | 
initializer=tf.constant_initializer(1.0)) 66 | b_r = tf.get_variable('b_r', 67 | shape=[num_units], 68 | initializer=tf.constant_initializer(1.0)) 69 | b_h = tf.get_variable('b_h', 70 | shape=[num_units], 71 | initializer=tf.constant_initializer()) 72 | 73 | max_sequence_length = tf.reduce_max(lengths) 74 | min_sequence_length = tf.reduce_min(lengths) 75 | 76 | time = tf.constant(0) 77 | 78 | state_shape = tf.concat(0, [tf.expand_dims(tf.shape(lengths)[0], 0), 79 | tf.expand_dims(tf.constant(num_units), 0)]) 80 | # state_shape = tf.Print(state_shape, [state_shape]) 81 | state = tf.zeros(state_shape, dtype=tf.float32) 82 | 83 | if reverse: 84 | inputs = tf.reverse(inputs, dims=[False, True, False]) 85 | inputs = tf.transpose(inputs, perm=[1, 0, 2]) 86 | input_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True) 87 | input_ta = input_ta.unpack(inputs) 88 | 89 | output_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True) 90 | 91 | def encoder_cond(time, state, output_ta_t): 92 | return tf.less(time, max_sequence_length) 93 | 94 | def encoder_body(time, old_state, output_ta_t): 95 | x_t = input_ta.read(time) 96 | 97 | con = tf.concat(1, [x_t, old_state]) 98 | z = tf.sigmoid(tf.matmul(con, W_z) + b_z) 99 | r = tf.sigmoid(tf.matmul(con, W_r) + b_r) 100 | con = tf.concat(1, [x_t, r*old_state]) 101 | h = tf.tanh(tf.matmul(con, W_h) + b_h) 102 | new_state = (1-z)*h + z*old_state 103 | 104 | output_ta_t = output_ta_t.write(time, new_state) 105 | 106 | def updateall(): 107 | return new_state 108 | 109 | def updatesome(): 110 | if reverse: 111 | return tf.select( 112 | tf.greater_equal(time, max_sequence_length-lengths), 113 | new_state, 114 | old_state) 115 | else: 116 | return tf.select(tf.less(time, lengths), new_state, old_state) 117 | 118 | if reverse: 119 | state = tf.cond( 120 | tf.greater_equal(time, max_sequence_length-min_sequence_length), 121 | updateall, 122 | updatesome) 123 | else: 124 | state = tf.cond(tf.less(time, min_sequence_length), updateall, updatesome) 125 | 126 | return (time + 1, state, output_ta_t) 127 | 128 | loop_vars = [time, state, output_ta] 129 | 130 | time, state, output_ta = tf.while_loop(encoder_cond, encoder_body, loop_vars, swap_memory=swap) 131 | 132 | enc_state = state 133 | enc_out = tf.transpose(output_ta.pack(), perm=[1, 0, 2]) 134 | 135 | if reverse: 136 | enc_out = tf.reverse(enc_out, dims=[False, True, False]) 137 | 138 | enc_out.set_shape([None, None, num_units]) 139 | 140 | return enc_state, enc_out 141 | 142 | 143 | ### 144 | # a custom decoder function 145 | 146 | def decoder(initial_state, target_input, target_len, num_units, 147 | embeddings, W_out, b_out, 148 | W_z_x_init = tf.truncated_normal_initializer(stddev=0.1), 149 | W_z_h_init = tf.truncated_normal_initializer(stddev=0.1), 150 | W_r_x_init = tf.truncated_normal_initializer(stddev=0.1), 151 | W_r_h_init = tf.truncated_normal_initializer(stddev=0.1), 152 | W_c_x_init = tf.truncated_normal_initializer(stddev=0.1), 153 | W_c_h_init = tf.truncated_normal_initializer(stddev=0.1), 154 | b_z_init = tf.constant_initializer(0.0), 155 | b_r_init = tf.constant_initializer(0.0), 156 | b_c_init = tf.constant_initializer(0.0), 157 | name='decoder', swap=False): 158 | """decoder 159 | TODO 160 | """ 161 | 162 | 163 | with tf.variable_scope(name): 164 | # we need the max seq len to optimize our RNN computation later on 165 | max_sequence_length = tf.reduce_max(target_len) 166 | # target_dims is just the embedding size 167 | target_dims = target_input.get_shape()[2] 168 
| # set up weights for the GRU gates
169 |         var = tf.get_variable # for ease of use
170 |         # unlike TF's GRUCell, the weights for the input and the hidden state
171 |         # are kept separate here, matching the original GRU article
172 |         W_z_x = var('W_z_x', shape=[target_dims, num_units], initializer=W_z_x_init)
173 |         W_z_h = var('W_z_h', shape=[num_units, num_units], initializer=W_z_h_init)
174 |         b_z = var('b_z', shape=[num_units], initializer=b_z_init)
175 |         W_r_x = var('W_r_x', shape=[target_dims, num_units], initializer=W_r_x_init)
176 |         W_r_h = var('W_r_h', shape=[num_units, num_units], initializer=W_r_h_init)
177 |         b_r = var('b_r', shape=[num_units], initializer=b_r_init)
178 |         W_c_x = var('W_c_x', shape=[target_dims, num_units], initializer=W_c_x_init)
179 |         W_c_h = var('W_c_h', shape=[num_units, num_units], initializer=W_c_h_init)
180 |         b_c = var('b_h', shape=[num_units], initializer=b_c_init)
181 | 
182 |         # make inputs time-major
183 |         inputs = tf.transpose(target_input, perm=[1, 0, 2])
184 |         # make tensor array for inputs, these are dynamic and used in the while-loop
185 |         # these are not in the api documentation yet, you will have to look at github.com/tensorflow
186 |         input_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True)
187 |         input_ta = input_ta.unpack(inputs)
188 | 
189 |         # condition for the while-loop, for early stopping
190 |         def decoder_cond(time, state, output_ta_t):
191 |             return tf.less(time, max_sequence_length)
192 | 
193 |         # the body_builder is just a wrapper to pass the feedback flag
194 |         def decoder_body_builder(feedback=False):
195 |             # the decoder body, this is where the RNN magic happens!
196 |             def decoder_body(time, old_state, output_ta_t):
197 |                 # when validating we need the previous prediction, handled via feedback
198 |                 if feedback:
199 |                     def from_previous():
200 |                         prev_1 = tf.matmul(old_state, W_out) + b_out
201 |                         return tf.gather(embeddings, tf.argmax(prev_1, 1))
202 |                     x_t = tf.cond(tf.greater(time, 0), from_previous, lambda: input_ta.read(0))
203 |                 else:
204 |                     # else we just read the next timestep
205 |                     x_t = input_ta.read(time)
206 | 
207 |                 # calculate the GRU
208 |                 z = tf.sigmoid(tf.matmul(x_t, W_z_x) + tf.matmul(old_state, W_z_h) + b_z) # update gate
209 |                 r = tf.sigmoid(tf.matmul(x_t, W_r_x) + tf.matmul(old_state, W_r_h) + b_r) # reset gate
210 |                 c = tf.tanh(tf.matmul(x_t, W_c_x) + tf.matmul(r*old_state, W_c_h) + b_c) # proposed new state
211 |                 new_state = (1-z)*c + z*old_state # new state
212 | 
213 |                 # writing output
214 |                 output_ta_t = output_ta_t.write(time, new_state)
215 | 
216 |                 # return in "input-to-next-step" style
217 |                 return (time + 1, new_state, output_ta_t)
218 |             return decoder_body
219 |         # set up variables to loop with
220 |         output_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False)
221 |         time = tf.constant(0)
222 |         loop_vars = [time, initial_state, output_ta]
223 | 
224 |         # run the while-loop for training
225 |         _, state, output_ta = tf.while_loop(decoder_cond,
226 |                                             decoder_body_builder(),
227 |                                             loop_vars,
228 |                                             swap_memory=swap)
229 |         # run the while-loop for validation
230 |         _, valid_state, valid_output_ta = tf.while_loop(decoder_cond,
231 |                                                         decoder_body_builder(feedback=True),
232 |                                                         loop_vars,
233 |                                                         swap_memory=swap)
234 |         # returning to batch major
235 |         dec_out = tf.transpose(output_ta.pack(), perm=[1, 0, 2])
236 |         valid_dec_out = tf.transpose(valid_output_ta.pack(), perm=[1, 0, 2])
237 |         return dec_out, valid_dec_out
238 | 
239 | 
240 | ###
241 | # decoder with attention
242 | 
243 | def attention_decoder(attention_input, attention_lengths, initial_state, target_input,
244 |                       target_input_lengths, num_units, num_attn_units, embeddings, W_out, b_out,
245 |                       name='decoder', swap=False):
246 |     """Decoder with attention.
247 |     The initial state is projected to num_units with W_s/b_s, so the
248 |     encoder and the decoder do not have to use the same number of units.
249 |     Keyword arguments:
250 |         attention_input: the input to put attention on. expected dims: [batch_size, attention_length, attention_dims]
251 |         initial_state: The initial state for the decoder RNN.
252 |         target_input: The target to replicate. Expected: [batch_size, max_target_sequence_len, embedding_dims]
253 |         num_attn_units: Number of units in the alignment layer that produces the context vectors.
254 |     """
255 |     with tf.variable_scope(name):
256 |         target_dims = target_input.get_shape()[2]
257 |         attention_dims = attention_input.get_shape()[2]
258 |         attn_len = tf.shape(attention_input)[1]
259 |         max_sequence_length = tf.reduce_max(target_input_lengths)
260 | 
261 |         weight_initializer = tf.truncated_normal_initializer(stddev=0.1)
262 |         # map initial state to num_units
263 |         W_s = tf.get_variable('W_s',
264 |                               shape=[attention_dims, num_units],
265 |                               initializer=weight_initializer)
266 |         b_s = tf.get_variable('b_s',
267 |                               shape=[num_units],
268 |                               initializer=tf.constant_initializer())
269 | 
270 |         # GRU
271 |         W_z = tf.get_variable('W_z',
272 |                               shape=[target_dims+num_units+attention_dims, num_units],
273 |                               initializer=weight_initializer)
274 |         W_r = tf.get_variable('W_r',
275 |                               shape=[target_dims+num_units+attention_dims, num_units],
276 |                               initializer=weight_initializer)
277 |         W_c = tf.get_variable('W_c',
278 |                               shape=[target_dims+num_units+attention_dims, num_units],
279 |                               initializer=weight_initializer)
280 |         b_z = tf.get_variable('b_z',
281 |                               shape=[num_units],
282 |                               initializer=tf.constant_initializer(1.0))
283 |         b_r = tf.get_variable('b_r',
284 |                               shape=[num_units],
285 |                               initializer=tf.constant_initializer(1.0))
286 |         b_c = tf.get_variable('b_c',
287 |                               shape=[num_units],
288 |                               initializer=tf.constant_initializer())
289 | 
290 |         # for attention
291 |         W_a = tf.get_variable('W_a',
292 |                               shape=[num_units, num_attn_units],  # multiplies the decoder state
293 |                               initializer=weight_initializer)
294 |         U_a = tf.get_variable('U_a',
295 |                               shape=[1, 1, attention_dims, num_attn_units],
296 |                               initializer=weight_initializer)
297 |         b_a = tf.get_variable('b_a',
298 |                               shape=[num_attn_units],
299 |                               initializer=tf.constant_initializer())
300 |         v_a = tf.get_variable('v_a',
301 |                               shape=[num_attn_units],
302 |                               initializer=weight_initializer)
303 | 
304 |         # project initial state
305 |         initial_state = tf.nn.tanh(tf.matmul(initial_state, W_s) + b_s)
306 | 
307 |         # TODO: don't use convolutions!
308 | # TODO: fix the bias (b_a) 309 | hidden = tf.reshape(attention_input, tf.pack([-1, attn_len, 1, attention_dims])) 310 | part1 = tf.nn.conv2d(hidden, U_a, [1, 1, 1, 1], "SAME") 311 | part1 = tf.squeeze(part1, [2]) # squeeze out the third dimension 312 | 313 | inputs = tf.transpose(target_input, perm=[1, 0, 2]) 314 | input_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True) 315 | input_ta = input_ta.unpack(inputs) 316 | 317 | def decoder_cond(time, state, output_ta_t, attention_tracker): 318 | return tf.less(time, max_sequence_length) 319 | 320 | def decoder_body_builder(feedback=False): 321 | def decoder_body(time, old_state, output_ta_t, attention_tracker): 322 | if feedback: 323 | def from_previous(): 324 | prev_1 = tf.matmul(old_state, W_out) + b_out 325 | return tf.gather(embeddings, tf.argmax(prev_1, 1)) 326 | x_t = tf.cond(tf.greater(time, 0), from_previous, lambda: input_ta.read(0)) 327 | else: 328 | x_t = input_ta.read(time) 329 | 330 | # attention 331 | part2 = tf.matmul(old_state, W_a) + b_a 332 | part2 = tf.expand_dims(part2, 1) 333 | john = part1 + part2 334 | e = tf.reduce_sum(v_a * tf.tanh(john), [2]) 335 | alpha = tf.nn.softmax(e) 336 | alpha = tf.to_float(mask(attention_lengths)) * alpha 337 | alpha = alpha / tf.reduce_sum(alpha, [1], keep_dims=True) 338 | attention_tracker = attention_tracker.write(time, alpha) 339 | context = tf.reduce_sum(tf.expand_dims(alpha, 2) * tf.squeeze(hidden), [1]) 340 | 341 | # GRU 342 | con = tf.concat(1, [x_t, old_state, context]) 343 | z = tf.sigmoid(tf.matmul(con, W_z) + b_z) 344 | r = tf.sigmoid(tf.matmul(con, W_r) + b_r) 345 | con = tf.concat(1, [x_t, r*old_state, context]) 346 | c = tf.tanh(tf.matmul(con, W_c) + b_c) 347 | new_state = (1-z)*c + z*old_state 348 | 349 | output_ta_t = output_ta_t.write(time, new_state) 350 | 351 | return (time + 1, new_state, output_ta_t, attention_tracker) 352 | return decoder_body 353 | 354 | 355 | output_ta = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False) 356 | attention_tracker = tensor_array_ops.TensorArray(tf.float32, size=1, dynamic_size=True, infer_shape=False) 357 | time = tf.constant(0) 358 | loop_vars = [time, initial_state, output_ta, attention_tracker] 359 | 360 | _, state, output_ta, _ = tf.while_loop(decoder_cond, 361 | decoder_body_builder(), 362 | loop_vars, 363 | swap_memory=swap) 364 | _, valid_state, valid_output_ta, valid_attention_tracker = tf.while_loop(decoder_cond, 365 | decoder_body_builder(feedback=True), 366 | loop_vars, 367 | swap_memory=swap) 368 | 369 | dec_out = tf.transpose(output_ta.pack(), perm=[1, 0, 2]) 370 | valid_dec_out = tf.transpose(valid_output_ta.pack(), perm=[1, 0, 2]) 371 | valid_attention_tracker = tf.transpose(valid_attention_tracker.pack(), perm=[1, 0, 2]) 372 | 373 | return dec_out, valid_dec_out, valid_attention_tracker 374 | -------------------------------------------------------------------------------- /lab4_Kaggle/.gitignore: -------------------------------------------------------------------------------- 1 | tensorboard 2 | -------------------------------------------------------------------------------- /lab4_Kaggle/README.md: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/alrojo/tensorflow-tutorial/deae7354412a52d1874a03a34fe8d3a65d541d8f/lab4_Kaggle/README.md -------------------------------------------------------------------------------- /lab5_AE/.gitignore: 
--------------------------------------------------------------------------------
1 | *.jpg
2 | *.png
--------------------------------------------------------------------------------
/lab5_AE/lab5_AE.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "# Credits\n",
8 | "TensorFlow translation of [Lasagne tutorial](https://github.com/DeepLearningDTU/02456-deep-learning/blob/master/week5/lab51_AE.ipynb). Thanks to [skaae](https://github.com/skaae), [casperkaae](https://github.com/casperkaae) and [larsmaaloee](https://github.com/larsmaaloee)."
9 | ]
10 | },
11 | {
12 | "cell_type": "markdown",
13 | "metadata": {},
14 | "source": [
15 | "# Dependencies and supporting functions\n",
16 | "Load dependencies and supporting functions by running the code block below."
17 | ]
18 | },
19 | {
20 | "cell_type": "code",
21 | "execution_count": null,
22 | "metadata": {
23 | "collapsed": false
24 | },
25 | "outputs": [],
26 | "source": [
27 | "from __future__ import division, print_function\n",
28 | "import matplotlib\n",
29 | "import matplotlib.pyplot as plt\n",
30 | "from IPython.display import Image, display, clear_output\n",
31 | "%matplotlib nbagg\n",
32 | "%matplotlib inline \n",
33 | "import numpy as np\n",
34 | "import matplotlib.pyplot as plt\n",
35 | "import sklearn.datasets\n",
36 | "import tensorflow as tf\n",
37 | "from tensorflow.python.framework.ops import reset_default_graph"
38 | ]
39 | },
40 | {
41 | "cell_type": "markdown",
42 | "metadata": {},
43 | "source": [
44 | "# Auto-encoders 101\n",
45 | "In this notebook you will implement a simple auto-encoder (AE). We assume that you are already familiar with the basics of neural networks. We'll start by defining an AE similar to the one used for the finetuning step by [Geoffrey Hinton and Ruslan Salakhutdinov](https://www.cs.toronto.edu/~hinton/science.pdf). We'll experiment with the AE setup and try to run it on the MNIST dataset. There has been a wide variety of research into the field of auto-encoders and the technique that you're about to learn is very simple compared to recent advances (e.g. [the Ladder network](https://arxiv.org/abs/1507.02672) and [VAEs](https://arxiv.org/abs/1312.6114)). However, the basic idea stays the same.\n",
46 | "\n",
47 | "AEs are used within unsupervised learning, in which you do not have a target $y$. Instead, the AE *encodes* an input $x$ into a latent state $z$ and decodes $z$ into a reconstruction $\\hat{x}$. This way the parameters of the network can be optimized w.r.t. the difference between $x$ and $\\hat{x}$. Depending on the input distribution, the difference can be measured in various ways, e.g. mean squared error (MSE). In many applications the auto-encoder will find an internal state for each data point corresponding to a feature. So if we are to model the MNIST dataset, one could expect that the internal state would correspond to a digit class and/or the shape.\n",
48 | "\n",
49 | "*The exercises are found at the bottom of the notebook*"
50 | ]
51 | },
52 | {
53 | "cell_type": "markdown",
54 | "metadata": {},
55 | "source": [
56 | "## MNIST\n",
57 | "First let us load the MNIST dataset and plot a few examples. We only load a limited number of classes to speed up training."
58 | ]
59 | },
60 | {
61 | "cell_type": "code",
62 | "execution_count": null,
63 | "metadata": {
64 | "collapsed": false
65 | },
66 | "outputs": [],
67 | "source": [
68 | "from sklearn.utils import shuffle\n",
69 | "\n",
70 | "# To speed up training we'll only work on a subset of the data containing only the numbers 0, 1.\n",
71 | "data = np.load('../lab1_FFN/mnist.npz')\n",
72 | "num_classes = 2\n",
73 | "idxs_train = []\n",
74 | "idxs_valid = []\n",
75 | "idxs_test = []\n",
76 | "for i in range(num_classes):\n",
77 | "    idxs_train += np.where(data['y_train'] == i)[0].tolist()\n",
78 | "    idxs_valid += np.where(data['y_valid'] == i)[0].tolist()\n",
79 | "    idxs_test += np.where(data['y_test'] == i)[0].tolist()\n",
80 | "\n",
81 | "x_train = data['X_train'][idxs_train].astype('float32')\n",
82 | "# Since this is unsupervised, the targets are only used for validation.\n",
83 | "targets_train = data['y_train'][idxs_train].astype('int32')\n",
84 | "x_train, targets_train = shuffle(x_train, targets_train, random_state=1234)\n",
85 | "\n",
86 | "\n",
87 | "x_valid = data['X_valid'][idxs_valid].astype('float32')\n",
88 | "targets_valid = data['y_valid'][idxs_valid].astype('int32')\n",
89 | "\n",
90 | "x_test = data['X_test'][idxs_test].astype('float32')\n",
91 | "targets_test = data['y_test'][idxs_test].astype('int32')\n",
92 | "\n",
93 | "print(\"training set dim(%i, %i).\" % x_train.shape)\n",
94 | "print(\"validation set dim(%i, %i).\" % x_valid.shape)\n",
95 | "print(\"test set dim(%i, %i).\" % x_test.shape)"
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": null,
101 | "metadata": {
102 | "collapsed": false
103 | },
104 | "outputs": [],
105 | "source": [
106 | "#plot a few MNIST examples\n",
107 | "idx = 0\n",
108 | "canvas = np.zeros((28*10, 10*28))\n",
109 | "for i in range(10):\n",
110 | "    for j in range(10):\n",
111 | "        canvas[i*28:(i+1)*28, j*28:(j+1)*28] = x_train[idx].reshape((28, 28))\n",
112 | "        idx += 1\n",
113 | "plt.figure(figsize=(7, 7))\n",
114 | "plt.axis('off')\n",
115 | "plt.imshow(canvas, cmap='gray')\n",
116 | "plt.title('MNIST handwritten digits')"
117 | ]
118 | },
119 | {
120 | "cell_type": "markdown",
121 | "metadata": {},
122 | "source": [
123 | "### Building the model\n",
124 | "When defining the model, the latent layer $z$ must act as an information bottleneck. We initialize the AE with one hidden layer in both the encoder and the decoder, using ReLU units as non-linearities. The latent layer has a dimensionality of 2 in order to make it easy to visualise. Since $x$ consists of pixel intensities that are normalized between 0 and 1, we use the sigmoid non-linearity to model the reconstruction; the objective we minimize is written out below.\n",
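"\n",
"With MSE as the measure (a standard formulation; it matches the loss code further down, which averages over both examples and pixels), the objective is\n",
"\n",
"$$\\mathcal{L} = \\frac{1}{N D} \\sum_{i=1}^{N} \\| x_i - \\hat{x}_i \\|^2, \\qquad \\hat{x}_i = \\mathrm{dec}(\\mathrm{enc}(x_i)),$$\n",
"\n",
"where $N$ is the number of examples and $D$ the number of pixels (here $28 \\times 28 = 784$)."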
125 | ] 126 | }, 127 | { 128 | "cell_type": "code", 129 | "execution_count": null, 130 | "metadata": { 131 | "collapsed": true 132 | }, 133 | "outputs": [], 134 | "source": [ 135 | "from tensorflow.contrib.layers import fully_connected\n", 136 | "from tensorflow.python.ops.nn import relu, sigmoid" 137 | ] 138 | }, 139 | { 140 | "cell_type": "code", 141 | "execution_count": null, 142 | "metadata": { 143 | "collapsed": false 144 | }, 145 | "outputs": [], 146 | "source": [ 147 | "# define input/output size\n", 148 | "num_features = x_train.shape[1]\n", 149 | "\n", 150 | "# reset the graph so this cell can be re-run without name collisions\n", 151 | "reset_default_graph()\n", 152 | "\n", 153 | "# define the model\n", 154 | "x_pl = tf.placeholder(tf.float32, [None, num_features], 'x_pl')\n", 155 | "l_enc = fully_connected(inputs=x_pl, num_outputs=128, activation_fn=relu, scope='l_enc')\n", 156 | "l_z = fully_connected(inputs=l_enc, num_outputs=2, activation_fn=None, scope='l_z') # None indicates a linear output.\n", 157 | "l_dec = fully_connected(inputs=l_z, num_outputs=128, activation_fn=relu, scope='l_dec')\n", 158 | "l_out = fully_connected(inputs=l_dec, num_outputs=num_features, activation_fn=sigmoid) # iid pixel intensities between 0 and 1." 159 | ] 160 | }, 161 | { 162 | "cell_type": "markdown", 163 | "metadata": {}, 164 | "source": [ 165 | "Next, we define the TensorFlow operations for training and evaluation." 166 | ] 167 | }, 168 | { 169 | "cell_type": "code", 170 | "execution_count": null, 171 | "metadata": { 172 | "collapsed": false 173 | }, 174 | "outputs": [], 175 | "source": [ 176 | "# calculate the loss: mean squared error per pixel\n", 177 | "loss_per_pixel = tf.square(tf.sub(l_out, x_pl))\n", 178 | "loss = tf.reduce_mean(loss_per_pixel, name=\"mean_square_error\")\n", 179 | "# if you want regularization\n", 180 | "#reg_scale = 0.0005\n", 181 | "#regularize = tf.contrib.layers.l2_regularizer(reg_scale)\n", 182 | "#params = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES)\n", 183 | "#reg_term = sum([regularize(param) for param in params])\n", 184 | "#loss += reg_term\n", 185 | "\n", 186 | "# define our optimizer\n", 187 | "optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.25)\n", 188 | "\n", 189 | "# make a training op that applies the gradients\n", 190 | "train_op = optimizer.minimize(loss)" 191 | ] 192 | }, 193 | { 194 | "cell_type": "code", 195 | "execution_count": null, 196 | "metadata": { 197 | "collapsed": false 198 | }, 199 | "outputs": [], 200 | "source": [ 201 | "# test the forward pass\n", 202 | "_x_test = np.zeros(shape=(32, num_features))\n", 203 | "# initialize the Session\n", 204 | "sess = tf.Session()\n", 205 | "# initialize all the variables\n", 206 | "sess.run(tf.initialize_all_variables())\n", 207 | "feed_dict = {x_pl: _x_test}\n", 208 | "res_forward_pass = sess.run(fetches=[l_out], feed_dict=feed_dict)\n", 209 | "print(\"l_out\", res_forward_pass[0].shape)" 210 | ] 211 | }, 212 | { 213 | "cell_type": "markdown", 214 | "metadata": {}, 215 | "source": [ 216 | "In the training loop we sample a batch of images, take a gradient step, and once per epoch evaluate the error, the latent space and the reconstructions."
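A side note on the optimizer cell above before we get to the loop: `minimize(loss)` is shorthand for computing gradients and then applying them. If you need access to the gradients, for instance to clip them, the call can be split in two. This is a minimal sketch reusing the `optimizer` and `loss` defined above; the clipping range of [-1, 1] is an arbitrary assumption, not a value used by the notebook.

```python
# equivalent to `train_op = optimizer.minimize(loss)`, but exposes the gradients
grads_and_vars = optimizer.compute_gradients(loss)
# optionally clip each gradient to a fixed range before the update is applied
clipped = [(tf.clip_by_value(g, -1.0, 1.0), v)
           for g, v in grads_and_vars if g is not None]
train_op = optimizer.apply_gradients(clipped)
```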
217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": null, 222 | "metadata": { 223 | "collapsed": false 224 | }, 225 | "outputs": [], 226 | "source": [ 227 | "batch_size = 100\n", 228 | "num_epochs = 100\n", 229 | "num_samples_train = x_train.shape[0]\n", 230 | "num_batches_train = num_samples_train // batch_size\n", 231 | "num_samples_valid = x_valid.shape[0]\n", 232 | "num_batches_valid = num_samples_valid // batch_size\n", 233 | "updates = []\n", 234 | "\n", 235 | "train_loss = []\n", 236 | "valid_loss = []\n", 237 | "cur_loss = 0\n", 238 | "plt.figure(figsize=(12, 24))\n", 239 | "\n", 240 | "try:\n", 241 | "    for epoch in range(num_epochs):\n", 242 | "        # Forward -> Backprop -> Update params\n", 243 | "        cur_loss = []\n", 244 | "        for i in range(num_batches_train):\n", 245 | "            idxs = np.random.choice(range(x_train.shape[0]), size=(batch_size), replace=False)\n", 246 | "            x_batch = x_train[idxs]\n", 247 | "            # set up what to fetch; besides the train op and loss we fetch l_out and l_z for plotting\n", 248 | "            fetches_train = [train_op, loss, l_out, l_z]\n", 249 | "            feed_dict_train = {x_pl: x_batch}\n", 250 | "            # run the complete forward and backprop pass\n", 251 | "            res_train = sess.run(fetches_train, feed_dict_train)\n", 252 | "            _, batch_loss, train_out, train_z = tuple(res_train)\n", 253 | "            cur_loss += [batch_loss]\n", 254 | "        train_loss += [np.mean(cur_loss)]\n", 255 | "        updates += [batch_size*num_batches_train*(epoch+1)]\n", 256 | "\n", 257 | "        # evaluate\n", 258 | "        fetches_eval = [loss, l_out, l_z]\n", 259 | "        feed_dict_eval = {x_pl: x_valid}\n", 260 | "        res_valid = sess.run(fetches_eval, feed_dict_eval)\n", 261 | "        eval_loss, eval_out, eval_z = tuple(res_valid)\n", 262 | "        valid_loss += [eval_loss]\n", 263 | "\n", 264 | "        if epoch == 0:\n", 265 | "            continue\n", 266 | "\n", 267 | "        # Plotting\n", 268 | "        plt.subplot(num_classes+1,2,1)\n", 269 | "        plt.title('Error')\n", 270 | "        plt.xlabel('Updates'), plt.ylabel('Error')\n", 271 | "        plt.plot(updates, train_loss, color=\"black\")\n", 272 | "        plt.plot(updates, valid_loss, color=\"grey\")\n", 273 | "        plt.legend(['Train Error', 'Valid Error']) # legend must come after the plot calls\n", 274 | "        plt.ticklabel_format(style='sci', axis='x', scilimits=(0,0))\n", 275 | "        plt.grid('on')\n", 276 | "\n", 277 | "        plt.subplot(num_classes+1,2,2)\n", 278 | "        plt.cla()\n", 279 | "        plt.title('Latent space')\n", 280 | "        plt.xlabel('z0'), plt.ylabel('z1')\n", 281 | "        color = iter(plt.get_cmap('brg')(np.linspace(0, 1.0, num_classes)))\n", 282 | "        for i in range(num_classes):\n", 283 | "            clr = next(color)\n", 284 | "            plt.scatter(eval_z[targets_valid==i, 0], eval_z[targets_valid==i, 1], c=clr, s=5., lw=0, marker='o')\n", 285 | "        plt.grid('on')\n", 286 | "        \n", 287 | "        c = 0\n", 288 | "        for k in range(3, 3 + num_classes*2, 2):\n", 289 | "            plt.subplot(num_classes+1,2,k)\n", 290 | "            plt.cla()\n", 291 | "            plt.title('Inputs for %i' % c)\n", 292 | "            plt.axis('off')\n", 293 | "            idx = 0\n", 294 | "            canvas = np.zeros((28*10, 10*28))\n", 295 | "            for i in range(10):\n", 296 | "                for j in range(10):\n", 297 | "                    canvas[i*28:(i+1)*28, j*28:(j+1)*28] = x_valid[targets_valid==c][idx].reshape((28, 28))\n", 298 | "                    idx += 1\n", 299 | "            plt.imshow(canvas, cmap='gray')\n", 300 | "            \n", 301 | "            plt.subplot(num_classes+1,2,k+1)\n", 302 | "            plt.cla()\n", 303 | "            plt.title('Reconstructions for %i' % c)\n", 304 | "            plt.axis('off')\n", 305 | "            idx = 0\n", 306 | "            canvas = np.zeros((28*10, 10*28))\n", 307 | "            for i in range(10):\n", 308 | "                for j in range(10):\n", 309 | "                    canvas[i*28:(i+1)*28, j*28:(j+1)*28] = eval_out[targets_valid==c][idx].reshape((28, 28))\n", 310 | "                    idx += 
1\n", 311 | "            plt.imshow(canvas, cmap='gray')\n", 312 | "            c += 1\n", 313 | "        \n", 314 | "        \n", 315 | "        plt.savefig(\"out51.png\")\n", 316 | "        display(Image(filename=\"out51.png\"))\n", 317 | "        clear_output(wait=True)\n", 318 | "        \n", 319 | "except KeyboardInterrupt:\n", 320 | "    pass\n", 321 | "    " 322 | ] 323 | }, 324 | { 325 | "cell_type": "markdown", 326 | "metadata": { 327 | "collapsed": true 328 | }, 329 | "source": [ 330 | "### Exercise 1 - Analyzing the AE\n", 331 | "1. The above implementation of an AE is very simple.\n", 332 | "    - *Experiment with the number of layers and non-linearities in order to improve the reconstructions.*\n", 333 | "    - *What happens to the network when we change the non-linearity in the latent layer (e.g. to sigmoid)?*\n", 334 | "    - *Try to increase the number of digit classes in the training set and analyze the results.*\n", 335 | "    - *Test different optimization algorithms and decide whether you should use regularizers.*\n", 336 | "    \n", 337 | "2. Currently we optimize w.r.t. mean squared error.\n", 338 | "    - *Find another error function that could fit this problem better.*\n", 339 | "    - *Evaluate whether the error function is a better choice and explain your findings.*\n", 340 | "\n", 341 | "3. Complexity of the bottleneck.\n", 342 | "    - *Increase the number of units in the latent layer and train.*\n", 343 | "    - *Visualize by using [PCA](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html) or [t-SNE](http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html); a t-SNE sketch is given after the exercises.*" 344 | ] 345 | }, 346 | { 347 | "cell_type": "markdown", 348 | "metadata": { 349 | "collapsed": true 350 | }, 351 | "source": [ 352 | "### Exercise 2 - Adding classification (for the ambitious)\n", 353 | "The above training has been performed unsupervised. Now let us assume that we only have a small number of labeled data points from each class (generated by the cell below). Semi-supervised learning combines unsupervised and supervised learning, so your task is to analyze whether a trained AE from the above exercise can aid a classifier.\n", 354 | "\n", 355 | "1. Build a simple classifier (like the ones from lab1) where you:\n", 356 | "    - *Train on the labeled dataset and evaluate the results.*\n", 357 | "2. Build a second classifier and train on the latent output $z$ of the AE (a minimal sketch is given after the exercises).\n", 358 | "3. Build a third classifier and train on the reconstructions of the AE.\n", 359 | "4. Evaluate the classifiers against each other and implement a model that improves the classification by combining the input, latent output and reconstruction."
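For exercise 1.3, here is a minimal t-SNE sketch. It assumes you have retrained the model with a larger latent layer (say 32 units), and it reuses `sess`, `x_pl`, `l_z`, `x_valid`, `targets_valid` and `num_classes` from the notebook; the t-SNE settings are arbitrary assumptions.

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# encode the validation set into the (now higher-dimensional) latent space
z_valid = sess.run(l_z, feed_dict={x_pl: x_valid})

# project the latent codes down to 2-d for plotting
z_2d = TSNE(n_components=2, random_state=0).fit_transform(z_valid)

plt.figure(figsize=(6, 6))
for i in range(num_classes):
    plt.scatter(z_2d[targets_valid == i, 0], z_2d[targets_valid == i, 1],
                s=5., lw=0, marker='o', label=str(i))
plt.legend()
plt.title('t-SNE of the latent space')
plt.show()
```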
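For step 2 of exercise 2, a minimal sketch that fits a classifier on the latent codes. It assumes the labeled subset `x_train_l`/`targets_train_l` produced by the cell just below, plus `sess`, `x_pl`, `l_z`, `x_test` and `targets_test` from earlier cells; the choice of logistic regression is our assumption, and scikit-learn is available in the tutorial's docker image.

```python
from sklearn.linear_model import LogisticRegression

# encode the labeled subset and the test set into the latent space
z_train_l = sess.run(l_z, feed_dict={x_pl: x_train_l})
z_test = sess.run(l_z, feed_dict={x_pl: x_test})

# fit a simple classifier on the latent codes and evaluate it
clf = LogisticRegression()
clf.fit(z_train_l, targets_train_l)
print("test accuracy on latent codes: %.3f" % clf.score(z_test, targets_test))
```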
360 | ] 361 | }, 362 | { 363 | "cell_type": "code", 364 | "execution_count": null, 365 | "metadata": { 366 | "collapsed": false 367 | }, 368 | "outputs": [], 369 | "source": [ 370 | "# Generate a subset of labeled data points\n", 371 | "\n", 372 | "num_labeled = 10 # You decide on the size of the fraction...\n", 373 | "\n", 374 | "def onehot(t, num_classes):\n", 375 | "    out = np.zeros((t.shape[0], num_classes))\n", 376 | "    for row, col in enumerate(t):\n", 377 | "        out[row, col] = 1\n", 378 | "    return out\n", 379 | "\n", 380 | "idxs_train_l = []\n", 381 | "for i in range(num_classes):\n", 382 | "    idxs = np.where(targets_train == i)[0]\n", 383 | "    idxs_train_l += np.random.choice(idxs, size=num_labeled, replace=False).tolist() # sample without replacement so the labeled points are distinct\n", 384 | "\n", 385 | "x_train_l = x_train[idxs_train_l]\n", 386 | "targets_train_l = targets_train[idxs_train_l]\n", 387 | "print(\"labeled training set dim(%i, %i).\" % x_train_l.shape)\n", 388 | "\n", 389 | "plt.figure(figsize=(12, 7))\n", 390 | "for i in range(num_classes*num_labeled):\n", 391 | "    im = x_train_l[i].reshape((28, 28))\n", 392 | "    plt.subplot(1, num_classes*num_labeled, i + 1)\n", 393 | "    plt.imshow(im, cmap='gray')\n", 394 | "    plt.axis('off')" 395 | ] 396 | } 397 | ], 398 | "metadata": { 399 | "kernelspec": { 400 | "display_name": "Python 2", 401 | "language": "python", 402 | "name": "python2" 403 | }, 404 | "language_info": { 405 | "codemirror_mode": { 406 | "name": "ipython", 407 | "version": 2 408 | }, 409 | "file_extension": ".py", 410 | "mimetype": "text/x-python", 411 | "name": "python", 412 | "nbconvert_exporter": "python", 413 | "pygments_lexer": "ipython2", 414 | "version": "2.7.6" 415 | } 416 | }, 417 | "nbformat": 4, 418 | "nbformat_minor": 0 419 | } 420 | --------------------------------------------------------------------------------