├── .gitignore ├── LICENSE ├── README.md ├── cifar10.py ├── facial_recognition ├── __init__.py ├── network.py └── util.py ├── img ├── ConvolutionalNeuralNetworks_11_1.png ├── ConvolutionalNeuralNetworks_12_1.png ├── ConvolutionalNeuralNetworks_5_0.png ├── ConvolutionalNeuralNetworks_7_1.png ├── ConvolutionalNeuralNetworks_9_2.png ├── cnn_layer.png ├── convolution_schematic.gif ├── dropout.jpeg ├── face.jpg ├── mlp.png ├── obama.jpg ├── overfitting.png ├── pooling_schematic.gif ├── shared_weights.png └── sparse_connectivity.png └── sample_solution ├── __init__.py └── sample_cnn.py /.gitignore: -------------------------------------------------------------------------------- 1 | *.pkl 2 | data/* 3 | *.iml 4 | .idea/* 5 | __pycache__/ -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "{}" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright 2017, Alfredo Clemente 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | # Convolutional Neural Networks 3 | A convolutional neural network, or convnet, is a type of feed-forward artifcial neural network that is used to process grid structured data, for example 2D images, and time series data. Convnets obtain their name from the fact that they use convolution instead of matrix mutiplication on at least one of its layers. 
Convolutional neural networks are currently the state of the art for many image and sound processing tasks. 4 | 5 | ![Convolutional layer](img/cnn_layer.png) 6 |
Image taken from the Stanford convolutional network tutorial
 7 | 8 | The basic building blocks of convolutional neural networks are convolutional layers. A convolutional layer is defined by a set of filters, which are applied to the input image to produce an output. 9 | 10 | In the above image, neurons of the same color belong to the same filter; there are several neurons with the same color because the filter is applied across the whole image. Each filter creates an output of size WxHx1 called a feature map. The feature maps from all the filters are then stacked together to construct a WxHxF output volume, where F is the number of filters in the layer. 11 | 12 | The input to a convolutional layer is also a volume. The input volume is of size WxHxD, where W is the width of the input, H is the height and D is the feature dimension. In the above image there is a single feature dimension, and it represents the greyscale value of the pixel. 13 | 14 | Given that convolutional layers both take a volume as input and produce a volume as output, they can be stacked one after another, and this is the general architecture of a convolutional neural network. In practice, however, convnets also include pooling layers and fully connected layers. 15 | 16 | If we used filters that spanned the entire input image, our convnet would be equivalent to a normal fully connected network. 17 | 18 | ## Sparse connectivity 19 | 20 | ![Multi layer perceptron](img/mlp.png) 21 |
Image taken from the DeepLearning.net MLP tutorial
 22 | 23 | In traditional multi-layer perceptrons, as shown above, each layer is fully connected to the next. This means that a network whose input layer has **100x100** neurons and whose second layer has **64** neurons will require **640000** weights. 24 | 25 | ![Sparse connectivity](img/sparse_connectivity.png) 26 |
Image taken from the DeepLearning.net LeNet tutorial
 27 | 28 | On the other hand, convnets are sparsely connected, meaning that each neuron in a layer is only connected to a subset of the previous layer. Using the same example as above, a network with **100x100** input neurons and **64** neurons in the following layer would only need **64xK** weights, where **K** is the filter size. Filters are usually small: 2x2, 3x3, 5x5, etc. 29 | 30 | ## Weight sharing 31 | 32 | ![shared_weights](img/shared_weights.png) 33 |
Image taken from the DeepLearning.net LeNet tutorial
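To make the parameter counts from the sparse-connectivity section concrete, here is a quick back-of-the-envelope calculation (assuming a 3x3 filter, so K = 9); the "shared" count anticipates the weight sharing described in the next paragraph, where one small filter is reused at every position.

```python
# Rough parameter counts for a layer with 100x100 inputs and 64 output neurons.
input_size = 100 * 100
output_neurons = 64
filter_size = 3 * 3  # K = 9 for a 3x3 filter

fully_connected = input_size * output_neurons  # every input connected to every neuron
sparse = output_neurons * filter_size          # each neuron only sees a 3x3 patch
shared = filter_size                           # one 3x3 filter reused at all positions

print(fully_connected)  # 640000
print(sparse)           # 576
print(shared)           # 9
```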
 34 | 35 | A convolutional layer is composed of several sets of neurons, where each set is restricted to having the same weights. In the image above, layer m uses one set of weights for all three neurons; lines of the same color represent the same weight. Each neuron above can be thought of as applying a 1x3 filter and producing a 1x1 output. When we consider the three neurons together, they apply the same 1x3 filter on three different locations of the image and produce a 1x3 feature map. 36 | 37 | In practice, convolutional layers usually have several of these sets, and each set produces its own feature map. 38 | 39 | ## Convolution 40 | 41 | A convolution is a mathematical operation on two functions that produces a third function, which can be viewed as a modification of one of the original functions. 42 | A discrete convolution is defined as $$(f * g)[i]=\sum_{m}f[m]\,g[i-m]$$ 43 | Then the two-dimensional discrete convolution is $$(f * g)[i, j]=\sum_{m}\sum_{n}f[m, n]\,g[i-m, j-n]$$ where the sums run over all indices for which both factors are defined. 44 | In the case of a convnet $g[i,j]$ represents the input at location i,j, while $f[i,j]$ represents the kernel weights connected to the input at location i, j. 45 | 46 | ## Pooling 47 | 48 | ![shared_weights](img/pooling_schematic.gif) 49 |
Animation taken from the Stanford deep learning tutorial
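The max-pooling operation animated above, and described in the paragraphs that follow, can be written in a few lines of NumPy. The feature-map values below are made up for the example.

```python
import numpy as np

feature_map = np.array([
    [1, 3, 2, 1],
    [4, 6, 5, 2],
    [7, 2, 1, 0],
    [1, 8, 3, 4],
], dtype=np.float32)

# Non-overlapping 2x2 max pooling: split into 2x2 blocks and keep the maximum of each.
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6. 5.]
#  [8. 4.]]
```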
 50 | 51 | In addition to convolutional layers, convnets usually have pooling layers. Pooling performs an operation, usually taking the maximum, on one subset of the image at a time, creating a feature map. Pooling layers serve two main purposes: one is to provide some translational invariance of features, and the other is to reduce the dimensionality of the layers. 52 | 53 | The translational invariance comes from the fact that pooling assigns a single value to a subset of the input. For example, if we have a 5x5 max pooling neuron and the maximum value is in the top left corner of its input, then even if the input is translated to the right by four units the pooling neuron will still give the same output. 54 | 55 | The dimensionality reduction plays a big role in keeping convnets tractable; for example, a 2x2 max pooling layer will reduce the size of its input by a factor of 2x2 = 4. 56 | 57 | 58 | ```python 59 | %matplotlib inline 60 | 61 | import util 62 | from nolearn.lasagne import visualize 63 | import numpy as np 64 | ``` 65 | 66 | DEBUG: nvcc STDOUT mod.cu 67 | Creating library C:/Users/erpa_/AppData/Local/Theano/compiledir_Windows-10-10.0.14295-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.4.4-64/tmp3cbxfn57/m91973e5c136ea49268a916ff971b7377.lib and object C:/Users/erpa_/AppData/Local/Theano/compiledir_Windows-10-10.0.14295-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.4.4-64/tmp3cbxfn57/m91973e5c136ea49268a916ff971b7377.exp 68 | 69 | Using gpu device 0: GeForce GTX 980 Ti (CNMeM is disabled, cuDNN 5004) 70 | 71 | 72 | To better visualize how convolution works, and what kinds of filters are learned, we will dissect a fully trained convnet with over 8 million parameters, trained on the task of detecting facial features. The network architecture is taken from [Daniel Nouri](http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/) and trained for ten hours on Kaggle's [facial keypoint detection data](https://www.kaggle.com/c/facial-keypoints-detection). 73 | 74 | The network is created using Lasagne, a Python library built on top of Theano. 75 | 76 | The network is composed of three alternating convolution and max-pooling layers, followed by two fully connected layers. The network is trained on about 2000 96x96 black and white images, and outputs the x and y locations of 15 different facial keypoints, for example left_eye_center, right_eye_center, nose_tip, mouth_center_bottom, etc. 77 | 78 | Below is the definition of the network used here. Some details of the network, such as dropout, input augmentation, etc., have been omitted for clarity.
 79 | 80 | 81 | 82 | ```python 83 | from nolearn.lasagne import NeuralNet 84 | from lasagne.layers import InputLayer, Conv2DLayer, MaxPool2DLayer, DenseLayer 85 | 86 | convnet = NeuralNet( 87 | layers=[ 88 | ('input', InputLayer), 89 | ('conv1', Conv2DLayer), 90 | ('pool1', MaxPool2DLayer), 91 | ('conv2', Conv2DLayer), 92 | ('pool2', MaxPool2DLayer), 93 | ('conv3', Conv2DLayer), 94 | ('pool3', MaxPool2DLayer), 95 | ('hidden4', DenseLayer), 96 | ('hidden5', DenseLayer), 97 | ('output', DenseLayer), 98 | ], 99 | input_shape=(None, 1, 96, 96), 100 | conv1_num_filters=32, conv1_filter_size=(3, 3), pool1_pool_size=(2, 2), 101 | conv2_num_filters=64, conv2_filter_size=(2, 2), pool2_pool_size=(2, 2), 102 | conv3_num_filters=128, conv3_filter_size=(2, 2), pool3_pool_size=(2, 2), 103 | hidden4_num_units=1000, 104 | hidden5_num_units=1000, 105 | output_num_units=30, output_nonlinearity=None, 106 | regression=True, 107 | ) 108 | ``` 109 | 110 | First we load the fully trained network. It was trained for 3000 epochs, which took around 10 hours on my GTX 570. 111 | 112 | 113 | ```python 114 | net = util.unpickle_network("../networks/n7.pkl") 115 | util.visualize_predictions(net) 116 | ``` 117 | 118 | 119 | ![png](img/ConvolutionalNeuralNetworks_5_0.png) 120 | 121 | 122 | As can be seen above, the network does a very good job of locating the 15 different facial keypoints. In order to understand how the network does this, we will open it up and look inside. 123 | 124 | ## The first layer 125 | 126 | The first layer of a convnet is different from all other layers in the sense that it is the only layer that works in the same dimensions and representation as the input. The output of the first layer represents how similar a subset of the image is to each filter. In our case the first pixel of the first feature map represents how similar the first 3x3 square of the input is to the first filter. The feature map as a whole then represents how similar each 3x3 square of the image is to the filter. This is done for all 32 filters, resulting in 32 feature maps. 127 | 128 | ![shared_weights](img/convolution_schematic.gif) 129 |
Animation taken from the Stanford deep learning tutorial
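To make the animation above concrete, the following NumPy sketch slides a 3x3 filter over a 5x5 input with stride 1, producing a 3x3 feature map. Like most deep learning libraries, it does not flip the kernel (strictly speaking a cross-correlation), which is what convolutional layers compute in practice; the input and filter values are made up.

```python
import numpy as np

image = np.arange(25, dtype=np.float32).reshape(5, 5)  # toy 5x5 "image"
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]], dtype=np.float32)       # toy 3x3 filter

out_h, out_w = image.shape[0] - 2, image.shape[1] - 2  # 3x3 output for a 3x3 filter
feature_map = np.zeros((out_h, out_w), dtype=np.float32)
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 3, j:j + 3]
        feature_map[i, j] = np.sum(patch * kernel)     # dot product of patch and filter

print(feature_map.shape)  # (3, 3)
```

Stacking the feature maps produced by all 32 first-layer filters gives the 32-deep output volume described below.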
 130 | 131 | The first layer takes as its input a 1x96x96 volume, where the one indicates that there is only one value per pixel, in our case a 0-1 grayscale intensity. If the input were an RGB image it would be a 3x96x96 volume. The layer then maps the input from the greyscale dimension to a feature space of 32 features, where each pixel has 32 dimensions, one for each feature. The result is then a 32x96x96 volume. 132 | 133 | The reduction of the image size after a convolution is not addressed here (above 5x5 to 3x3), but in practice it is usually ignored or remedied with zero-padding. 134 | 135 | 136 | ```python 137 | visualize.plot_conv_weights(net.layers_['conv1']) 138 | ``` 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | ![png](img/ConvolutionalNeuralNetworks_7_1.png) 149 | 150 | 151 | Above are all 32 3x3 filters learned by the network on the first layer. The weights of the filters are individually scaled to the interval 0 - 255 for visualization. 152 | 153 | Feature maps are then created with the above filters by convolving them with the image. 154 | 155 | 156 | ```python 157 | # we load the data in a 2d representation 158 | x, _ = util.load2d(util.FTEST) 159 | visualize.plot_conv_activity(net.layers_['conv1'], x[0:1, 0:1, : ,:]) 160 | ``` 161 | 162 | DEBUG: nvcc STDOUT mod.cu 163 | Creating library C:/Users/erpa_/AppData/Local/Theano/compiledir_Windows-10-10.0.14295-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.4.4-64/tmprt73od52/m9a6bd0eb5ed5c92e91261282fc495cb4.lib and object C:/Users/erpa_/AppData/Local/Theano/compiledir_Windows-10-10.0.14295-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.4.4-64/tmprt73od52/m9a6bd0eb5ed5c92e91261282fc495cb4.exp 164 | 165 | 166 | 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | ![png](img/ConvolutionalNeuralNetworks_9_2.png) 176 | 177 | 178 | Above are the feature maps created by the first layer of the network when given an image. In this case black represents a high activation value. Each feature map is the result of convolving the original image with one of the filters. The different feature maps are tuned to recognize different features: for example, some detect the nostrils, the contours of the face, or the contours of the eyes. 179 | 180 | # Variable size input 181 | 182 | The parameters of a convolutional layer depend only on the filter size and the number of filters: 32 filters of size 3x3 require 32x3x3 weights (in addition to the biases). This means that a convolutional layer can be given any image size as an input and will give an output whose dimensions are proportional to the input dimensions. Below we take the feature detectors from the first layer of our network and apply them to an image of size 313x250. 183 | 184 | This, however, does not mean that we can simply supply our full convnet with any size image and expect an output. The final layers of the network are fully connected, which means they must have a fixed-size input. This limitation can be solved by adding special pooling layers before the fully connected layers that reduce the dimensions to a fixed size.
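One way to obtain such a fixed-size representation in Lasagne is a global pooling layer, which collapses each feature map to a single value regardless of the spatial size of the input. The snippet below is only a minimal sketch of that idea; it is not part of the network used in this tutorial.

```python
import lasagne

def build_variable_size_net(input_var=None):
    # Spatial dimensions are left as None, so images of any size are accepted.
    net = lasagne.layers.InputLayer(shape=(None, 1, None, None), input_var=input_var)
    net = lasagne.layers.Conv2DLayer(
        net, num_filters=32, filter_size=(3, 3),
        nonlinearity=lasagne.nonlinearities.rectify, pad='same')
    # Global pooling reduces each of the 32 feature maps to one number,
    # giving a fixed-size (batch, 32) output whatever the input resolution.
    net = lasagne.layers.GlobalPoolLayer(net)
    # A dense layer can now be attached because its input size is fixed.
    return lasagne.layers.DenseLayer(net, num_units=30, nonlinearity=None)
```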
 185 | 186 | 187 | ```python 188 | img = util.load_image('img/obama.jpg') 189 | print("Image dimensions: ",img.shape) 190 | util.show_images([img]) 191 | ``` 192 | 193 | Image dimensions: (313, 250) 194 | 195 | 196 | 197 | ![png](img/ConvolutionalNeuralNetworks_11_1.png) 198 | 199 | 200 | 201 | ```python 202 | # the image is loaded with values from 0 to 255, we must scale them down to 0 - 1 203 | img /= 255.0 204 | visualize.plot_conv_activity(net.layers_['conv1'], img[None, None, :, :]) 205 | ``` 206 | 207 | 208 | 209 | 210 | 211 | 212 | 213 | 214 | 215 | ![png](img/ConvolutionalNeuralNetworks_12_1.png) 216 | 217 | 218 | ## The inner convolutional layers 219 | 220 | Unlike the first convolutional layer, the inner convolutional layers no longer work in the pixel space. They work on the feature space created by the previous layer. They map the input volume to a new feature space defined by that layer's feature detectors. 221 | Much like how the first layer transforms the input image into a set of features of the image, the inner layers transform their input volumes into a set of features of the features from the previous layer. Because of this, a hierarchical structure of features is formed, where the deeper layers detect combinations of features from the lower layers. 222 | 223 | Once the input image has passed through all the convolutional and pooling layers, it can be thought of as having been mapped from the pixel space to the feature space: the image is no longer composed of grayscale pixels, but of combinations of high- and low-level features at different locations. 224 | 225 | ## The fully connected layers 226 | 227 | The final pooling layer of our network outputs a 128x11x11 volume representing the features of the input image. These features must then be analyzed in order to produce the 30 outputs of the network, corresponding to the x and y locations of the facial features. 228 | 229 | In practice, it is common to take the output of a set of convolutional and pooling layers and use it as the input of a fully connected network. The fully connected network can then process the features and give the required output. 230 | 231 | ## Model reusability 232 | 233 | One big advantage of this kind of convolutional network architecture is that, once the convolutional layers are fully trained, they output a feature representation of their input. These feature representations do not have to be used exclusively for the task they were trained for. 234 | 235 | The convolutional layers can be detached from the fully connected layers they were trained with, and attached to a new set of fully connected layers that can then be trained to perform a new task. By doing this, the network does not have to learn how to extract features from images and can learn classification or regression on these features much faster. 236 | 237 | This reusability has led to very powerful models that take months to train being published for [download](https://github.com/BVLC/caffe/wiki/Model-Zoo). They can then be attached to new dense layers to perform state-of-the-art classification or regression. 238 | 239 | # Practical 240 | 241 | ## Prerequisites 242 | In order to do this practical, you must have Lasagne and Theano installed. This can be tricky, but if you follow the right guide it will probably work.
Here are some guides for [Windows](https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne-on-Windows-7-%2864-bit%29), [Ubuntu](https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne-on-Ubuntu-14.04) and a [general guide](http://lasagne.readthedocs.org/en/latest/user/installation.html). 243 | I would also recommend getting Theano to run on your GPU if you have one; it will make everything _much_ faster. 244 | 245 | You can download the code for this practical from https://github.com/Alfredvc/cnn_workshop. In the same project, under the folder *facial_recognition*, is the network for facial feature recognition presented in the first part. 246 | 247 | ## The task 248 | 249 | We will create a convolutional neural network to recognize images from the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html). The CIFAR-10 dataset consists of 60,000 32x32 color images that can be classified into 10 different categories. Given our limited time and compute resources, we will only be working with a subset of the dataset. 250 | 251 | Inside the **cifar10.py** file there are several utility functions for downloading the data and visualizing the results. In addition there is a **build_cnn()** function; this is the function that configures the convnet, and it is the function we will be modifying. 252 | 253 | Currently the network we will improve achieves around 57% test accuracy after 15 epochs of training, and is overfitting quite strongly. 254 | 255 | ## How do we do this? 256 | 257 | One way to approach the tuning of a particular neural network architecture is to first make the network overfit on the training data, and then add regularization to attenuate the overfitting. 258 | 259 | Since our network is already overfitting, we can begin by regularizing it, then increase its capacity, and then regularize again. This cycle is repeated until the desired accuracy is achieved. 260 | 261 | ### Overfitting 262 | Overfitting is when the network has low error on the training data and high error on the test data. This tends to happen because the network learns to "remember" the training data and what it should give as output, instead of learning the patterns in the data that lead to the outputs. The network then generalizes poorly to the test data, which it has never seen. 263 | ![shared_weights](img/overfitting.png) 264 |
Image taken from the wikipedia article on overfitting
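The effect described below can be reproduced with a few lines of NumPy. The data here is made up (a line plus noise) and the exact numbers will vary, but the qualitative behaviour is the same: the low-degree fit extrapolates sensibly while the high-degree fit does not.

```python
import numpy as np

rng = np.random.RandomState(0)
x_train = np.linspace(0, 1, 12)
y_train = 2 * x_train + rng.normal(scale=0.1, size=x_train.shape)  # line plus noise

line = np.polyfit(x_train, y_train, deg=1)     # captures the underlying trend
wiggly = np.polyfit(x_train, y_train, deg=11)  # passes (nearly) through every point

x_new = 1.2  # a point outside the training range
print(np.polyval(line, x_new))    # close to 2 * 1.2 = 2.4
print(np.polyval(wiggly, x_new))  # typically far off
```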
 265 | 266 | An easy way of visualizing this is with polynomial regression. The above points are sampled from the distribution given by the black line, and then some noise is added to them; here we attempt to predict a point's y position given its x position. If we fit the points with a degree 1 polynomial (a line), then we can make very good predictions on new points that we have not seen. If instead we use a degree 11 polynomial as shown above, we will get a training error of zero, as the curve goes through all the points; however, when we get a new x value, our prediction will be completely wrong. We can still model the above data with a degree 11 polynomial, but strong regularization must be applied for it to work well; otherwise the polynomial ends up modelling the noise around the line rather than the underlying trend. 267 | 268 | The way we achieve overfitting in neural networks is the same as with polynomials: we simply increase the number of parameters. In the case of the convolutional layers, we can increase the size of the filters, the number of filters per layer, or the number of layers. For the fully connected layers, we can increase the number of neurons per layer, or the number of layers. 269 | 270 | 271 | ### Regularization by dropout 272 | ![shared_weights](img/dropout.jpeg) 273 |
Image taken from the Stanford class CS231n webpage
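Dropout itself is explained in the next paragraph; as a rough illustration of the mechanics, here is a minimal NumPy sketch of the "inverted dropout" formulation, in which activations are rescaled during training so that nothing needs to change at evaluation time. The array values are made up.

```python
import numpy as np

def dropout(activations, p=0.5, train=True, rng=np.random):
    """Randomly zero a fraction p of the activations during training."""
    if not train:
        return activations                    # at evaluation time all neurons are active
    mask = rng.uniform(size=activations.shape) >= p
    return activations * mask / (1.0 - p)     # rescale so the expected value is unchanged

h = np.ones((4, 5), dtype=np.float32)
print(dropout(h, p=0.5))
```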
 274 | 275 | Regularization is the process by which we help the network obtain better generalization by forcing some constraint on the network. Dropout is a powerful and now common regularization technique developed by [researchers at the University of Toronto](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf). The idea behind dropout is simple: during training we randomly deactivate some neurons, and during evaluation we activate all the neurons. The reasoning behind this is to prevent neurons from collectively "remembering" the inputs, since it is highly unlikely that the same set of neurons will be active together more than once. Instead, knowledge must be distributed, and the network must rely on the detection of patterns. 276 | 277 | ## Suggested solution 278 | 279 | **sample_cnn.py** contains a CNN architecture that I have improved with the techniques mentioned above, and by increasing the network's capacity. It can achieve a 67% classification rate on the test data after 15 epochs. 280 | You can construct this network directly by calling `main(model='suggested_cnn')`. 281 | 282 | # Further reading 283 | 284 | If you liked this topic and would like to learn more about it, you can take a look at the reference section. I would personally recommend [Andrej Karpathy's lectures on convnets](https://www.youtube.com/watch?v=NfnWJUyUJYU&list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC), and [this](http://www.deeplearningbook.org) upcoming deep learning book. 285 | 286 | # References 287 | 288 | * http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/ 289 | * http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/ 290 | * http://deeplearning.net/tutorial/mlp.html 291 | * http://deeplearning.net/tutorial/lenet.html 292 | * https://www.coursera.org/course/neuralnets 293 | * https://en.wikipedia.org/wiki/Convolution 294 | * http://lasagne.readthedocs.org/en/latest/user/tutorial.html 295 | * https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne-on-Windows-7-%2864-bit%29 296 | * http://deeplearning.stanford.edu/tutorial/ 297 | * https://en.wikipedia.org/wiki/Affine_transformation 298 | * https://github.com/BVLC/caffe/wiki/Model-Zoo 299 | * http://cs231n.github.io/neural-networks-2/ 300 | * https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf 301 | * https://www.youtube.com/watch?v=NfnWJUyUJYU&list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC 302 | * http://www.deeplearningbook.org Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, 2016 303 | -------------------------------------------------------------------------------- /cifar10.py: -------------------------------------------------------------------------------- 1 | import pickle 2 | import os.path 3 | import sys 4 | import numpy as np 5 | import lasagne 6 | import theano 7 | import theano.tensor as T 8 | import time 9 | from sample_solution import sample_cnn 10 | 11 | from matplotlib import pyplot as plt 12 | if sys.version_info[0] == 2: 13 | from urllib import urlretrieve 14 | else: 15 | from urllib.request import urlretrieve 16 | 17 | 18 | def pkl(file_name, object): 19 | with open(file_name, 'wb') as f: 20 | pickle.dump(object, f, -1) 21 | 22 | 23 | def un_pkl_l(file_name): 24 | with open(file_name, 'rb') as f: 25 | return pickle.load(f, encoding='latin1') 26 | 27 | 28 | def un_pkl(file_name): 29 | with open(file_name, 'rb') as f: 30 | return pickle.load(f, encoding='latin1') 31 | 32 | 33 | def make_image(X): 34 | im = np.swapaxes(X.T, 0, 1) 35 | im = im
- im.min() 36 | im = im * 1.0 / im.max() 37 | return im 38 | 39 | 40 | def show_images(data, predicted, labels, classes): 41 | plt.figure(figsize=(16, 5)) 42 | for i in range(0, 10): 43 | plt.subplot(1, 10, i+1) 44 | plt.imshow(make_image(data[i]), interpolation='nearest') 45 | true = classes[labels[i]] 46 | pred = classes[predicted[i]] 47 | color = 'green' if true == pred else 'red' 48 | plt.text(0, 0, true, color='black', bbox=dict(facecolor='white', alpha=1)) 49 | plt.text(0, 32, pred, color=color, bbox=dict(facecolor='white', alpha=1)) 50 | 51 | plt.axis('off') 52 | 53 | DATA = 'data.pkl' 54 | 55 | 56 | def load_file(file): 57 | def url(file): 58 | if file is DATA: 59 | return 'http://folk.ntnu.no/alfredvc/workshop/data/data.pkl' 60 | 61 | def download(file): 62 | print("Downloading %s" % file) 63 | urlretrieve(url(file), file) 64 | 65 | if not os.path.exists(file): 66 | download(file) 67 | return un_pkl_l(file) 68 | 69 | 70 | def iterate_minibatches(inputs, targets, batchsize, shuffle=False): 71 | assert len(inputs) == len(targets) 72 | if shuffle: 73 | indices = np.arange(len(inputs)) 74 | np.random.shuffle(indices) 75 | for start_idx in range(0, len(inputs) - batchsize + 1, batchsize): 76 | if shuffle: 77 | excerpt = indices[start_idx:start_idx + batchsize] 78 | else: 79 | excerpt = slice(start_idx, start_idx + batchsize) 80 | yield inputs[excerpt], targets[excerpt] 81 | 82 | 83 | def build_cnn(input_var=None): 84 | network = lasagne.layers.InputLayer(shape=(None, 3, 32, 32), 85 | input_var=input_var) 86 | network = lasagne.layers.Conv2DLayer( 87 | network, num_filters=32, filter_size=(3, 3), 88 | nonlinearity=lasagne.nonlinearities.rectify, 89 | pad='same') 90 | 91 | network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2)) 92 | 93 | network = lasagne.layers.DenseLayer( 94 | network, 95 | num_units=128, 96 | nonlinearity=lasagne.nonlinearities.rectify) 97 | 98 | network = lasagne.layers.DenseLayer( 99 | network, 100 | num_units=10, 101 | nonlinearity=lasagne.nonlinearities.softmax) 102 | 103 | return network 104 | 105 | 106 | def main(model='cnn', num_epochs=10): 107 | # Load the dataset 108 | print("Loading data...") 109 | X_train, y_train, X_test, y_test, classes = load_file(DATA) 110 | 111 | # Prepare Theano variables for inputs and targets 112 | input_var = T.tensor4('inputs') 113 | target_var = T.ivector('targets') 114 | 115 | # Create neural network model (depending on first command line parameter) 116 | print("Building model and compiling functions...") 117 | if model == 'cnn': 118 | network = build_cnn(input_var) 119 | elif model == 'suggested_cnn': 120 | network = sample_cnn.build_cnn(input_var) 121 | else: 122 | print("Unrecognized model type %r." % model) 123 | return 124 | 125 | # Create a loss expression for training, i.e., a scalar objective we want 126 | # to minimize (for our multi-class problem, it is the cross-entropy loss): 127 | prediction = lasagne.layers.get_output(network) 128 | loss = lasagne.objectives.categorical_crossentropy(prediction, target_var) 129 | loss = loss.mean() 130 | # We could add some weight decay as well here, see lasagne.regularization. 131 | 132 | params = lasagne.layers.get_all_params(network, trainable=True) 133 | updates = lasagne.updates.adam(loss, params, learning_rate=0.001) 134 | 135 | # Create a loss expression for validation/testing. The crucial difference 136 | # here is that we do a deterministic forward pass through the network, 137 | # disabling dropout layers. 
138 | test_prediction = lasagne.layers.get_output(network, deterministic=True) 139 | test_loss = lasagne.objectives.categorical_crossentropy(test_prediction, 140 | target_var) 141 | test_loss = test_loss.mean() 142 | # As a bonus, also create an expression for the classification accuracy: 143 | test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var), 144 | dtype=theano.config.floatX) 145 | 146 | # Compile a function performing a training step on a mini-batch (by giving 147 | # the updates dictionary) and returning the corresponding training loss: 148 | train_fn = theano.function([input_var, target_var], loss, updates=updates) 149 | 150 | # Compile a second function computing the validation loss and accuracy: 151 | val_fn = theano.function([input_var, target_var], [test_loss, test_acc]) 152 | 153 | # Compile a third function computing a prediction 154 | eval_fn = theano.function([input_var], [T.argmax(test_prediction, axis=1)]) 155 | 156 | # Finally, launch the training loop. 157 | print("Starting training...") 158 | # We iterate over epochs: 159 | training_error = [] 160 | test_error = [] 161 | test_accuracy = [] 162 | for epoch in range(num_epochs): 163 | # In each epoch, we do a full pass over the training data: 164 | train_err = 0 165 | train_batches = 0 166 | start_time = time.time() 167 | for batch in iterate_minibatches(X_train, y_train, 64, shuffle=True): 168 | inputs, targets = batch 169 | train_err += train_fn(inputs, targets) 170 | train_batches += 1 171 | 172 | # And a full pass over the validation data: 173 | val_err = 0 174 | val_acc = 0 175 | val_batches = 0 176 | for batch in iterate_minibatches(X_test, y_test, 500, shuffle=False): 177 | inputs, targets = batch 178 | err, acc = val_fn(inputs, targets) 179 | val_err += err 180 | val_acc += acc 181 | val_batches += 1 182 | 183 | # Then we print the results for this epoch: 184 | print("Epoch {} of {} took {:.3f}s".format( 185 | epoch + 1, num_epochs, time.time() - start_time)) 186 | print(" training loss:\t\t{:.6f}".format(train_err / train_batches)) 187 | print(" validation loss:\t\t{:.6f}".format(val_err / val_batches)) 188 | print(" validation accuracy:\t\t{:.2f} %".format( 189 | val_acc / val_batches * 100)) 190 | 191 | training_error.append(train_err / train_batches) 192 | test_error.append(val_err / val_batches) 193 | test_accuracy.append(val_acc / val_batches) 194 | 195 | data = X_test[123:133] 196 | labels = y_test[123:133] 197 | predicted = eval_fn(data)[0] 198 | show_images(data, predicted, labels, classes) 199 | fig, ax1 = plt.subplots() 200 | ax1.plot(training_error, color='b', label='Training error') 201 | ax1.plot(test_error, color='g', label='Test error') 202 | ax2 = ax1.twinx() 203 | ax2.plot(test_accuracy, color='r', label='Test accuracy') 204 | ax1.legend(loc='upper left', numpoints=1) 205 | ax2.legend(loc='upper right', numpoints=1) 206 | plt.xlabel("Epoch") 207 | 208 | plt.show() 209 | 210 | 211 | 212 | # Optionally, you could now dump the network weights to a file like this: 213 | # np.savez('model.npz', *lasagne.layers.get_all_param_values(network)) 214 | # 215 | # And load them again later on like this: 216 | # with np.load('model.npz') as f: 217 | # param_values = [f['arr_%d' % i] for i in range(len(f.files))] 218 | # lasagne.layers.set_all_param_values(network, param_values) 219 | main(num_epochs=15) 220 | -------------------------------------------------------------------------------- /facial_recognition/__init__.py: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/facial_recognition/__init__.py -------------------------------------------------------------------------------- /facial_recognition/network.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import facial_recognition.util as util 3 | from lasagne import layers 4 | from nolearn.lasagne import NeuralNet 5 | from facial_recognition.util import AdjustVariable 6 | from facial_recognition.util import EarlyStopping 7 | from facial_recognition.util import FlipBatchIterator 8 | from facial_recognition.util import float32 9 | 10 | try: 11 | from lasagne.layers.cuda_convnet import Conv2DCCLayer as Conv2DLayer 12 | from lasagne.layers.cuda_convnet import MaxPool2DCCLayer as MaxPool2DLayer 13 | except ImportError: 14 | Conv2DLayer = layers.Conv2DLayer 15 | MaxPool2DLayer = layers.MaxPool2DLayer 16 | 17 | 18 | def get_net(): 19 | return NeuralNet( 20 | layers=[ 21 | ('input', layers.InputLayer), 22 | ('conv1', Conv2DLayer), 23 | ('pool1', MaxPool2DLayer), 24 | ('dropout1', layers.DropoutLayer), 25 | ('conv2', Conv2DLayer), 26 | ('pool2', MaxPool2DLayer), 27 | ('dropout2', layers.DropoutLayer), 28 | ('conv3', Conv2DLayer), 29 | ('pool3', MaxPool2DLayer), 30 | ('dropout3', layers.DropoutLayer), 31 | ('hidden4', layers.DenseLayer), 32 | ('dropout4', layers.DropoutLayer), 33 | ('hidden5', layers.DenseLayer), 34 | ('output', layers.DenseLayer), 35 | ], 36 | input_shape=(None, 1, 96, 96), 37 | conv1_num_filters=32, conv1_filter_size=(3, 3), pool1_pool_size=(2, 2), 38 | dropout1_p=0.1, 39 | conv2_num_filters=64, conv2_filter_size=(2, 2), pool2_pool_size=(2, 2), 40 | dropout2_p=0.2, 41 | conv3_num_filters=128, conv3_filter_size=(2, 2), pool3_pool_size=(2, 2), 42 | dropout3_p=0.3, 43 | hidden4_num_units=1000, 44 | dropout4_p=0.5, 45 | hidden5_num_units=1000, 46 | output_num_units=30, output_nonlinearity=None, 47 | 48 | update_learning_rate=theano.shared(float32(0.03)), 49 | update_momentum=theano.shared(float32(0.9)), 50 | 51 | regression=True, 52 | batch_iterator_train=FlipBatchIterator(batch_size=128), 53 | on_epoch_finished=[ 54 | AdjustVariable('update_learning_rate', start=0.03, stop=0.0001), 55 | AdjustVariable('update_momentum', start=0.9, stop=0.999), 56 | EarlyStopping(patience=200), 57 | ], 58 | max_epochs=3000, 59 | verbose=1, 60 | ) 61 | 62 | 63 | def train_network(net, save_name=''): 64 | print("Loading data...") 65 | X, y = util.load2d(util.FTRAIN) 66 | print("Building network...") 67 | print("Started training...") 68 | net.fit(X, y) 69 | print("Finished training...") 70 | print("Saving network...") 71 | util.pickle_network(save_name + ".pkl", net) 72 | util.visualize_learning(net) 73 | 74 | 75 | def load_and_visualize_network(file): 76 | print("Loading data...") 77 | X, y = util.load2d(util.FTEST) 78 | print("Loading model...") 79 | net = util.unpickle_network(file) 80 | print("Finished training...") 81 | # util.visualize_learning(net) 82 | util.visualize_predictions(net) 83 | 84 | net = get_net() 85 | 86 | train_network(net, "net") -------------------------------------------------------------------------------- /facial_recognition/util.py: -------------------------------------------------------------------------------- 1 | import pickle as pickle 2 | from nolearn.lasagne import BatchIterator 3 | 4 | from datetime import datetime 5 | from pandas import DataFrame 6 | 
from pandas.io.parsers import read_csv 7 | import numpy as np 8 | from PIL import Image 9 | from matplotlib import pyplot 10 | from scipy.ndimage.filters import convolve 11 | from math import ceil 12 | import theano.tensor as T 13 | import theano 14 | from lasagne.layers import get_output 15 | from scipy.ndimage import rotate 16 | import sys 17 | import os 18 | import zipfile 19 | if sys.version_info[0] == 2: 20 | from urllib import urlretrieve 21 | else: 22 | from urllib.request import urlretrieve 23 | 24 | 25 | FTRAIN = 'data/training.csv' 26 | FTEST = 'data/test.csv' 27 | FLOOKUP = 'data/IdLookupTable.csv' 28 | 29 | 30 | def float32(k): 31 | return np.cast['float32'](k) 32 | 33 | 34 | class RotateBatchIterator(BatchIterator): 35 | def transform(self, Xb, yb): 36 | Xb, yb = super(RotateBatchIterator, self).transform(Xb, yb) 37 | 38 | angle = np.random.randint(-10,11) 39 | Xb_rotated = rotate(Xb, angle, axes=(2, 3), reshape=False) 40 | 41 | return Xb_rotated, yb 42 | 43 | 44 | class PreSplitTrainSplit(object): 45 | 46 | def __init__(self, X_train, y_train, X_valid, y_valid): 47 | self.X_train = X_train 48 | self.y_train = y_train 49 | self.X_valid = X_valid 50 | self.y_valid = y_valid 51 | 52 | def __call__(self, X, y, net): 53 | return self.X_train, self.X_valid, self.y_train, self.y_valid 54 | 55 | 56 | class AdjustVariable(object): 57 | def __init__(self, name, start=0.03, stop=0.001): 58 | self.name = name 59 | self.start, self.stop = start, stop 60 | self.ls = None 61 | 62 | def __call__(self, nn, train_history): 63 | if self.ls is None: 64 | self.ls = np.linspace(self.start, self.stop, nn.max_epochs) 65 | 66 | epoch = train_history[-1]['epoch'] 67 | if epoch >= nn.max_epochs: 68 | return 69 | new_value = float32(self.ls[epoch - 1]) 70 | getattr(nn, self.name).set_value(new_value) 71 | 72 | 73 | def load_file(file): 74 | 75 | def url(file): 76 | if file is FTRAIN: 77 | return 'http://folk.ntnu.no/alfredvc/workshop/data/training.zip' 78 | if file is FTEST: 79 | return 'http://folk.ntnu.no/alfredvc/workshop/data/test.zip' 80 | if file is FLOOKUP: 81 | return 'http://folk.ntnu.no/alfredvc/workshop/data/test.zip' 82 | 83 | def zip(file): 84 | if file is FTRAIN: 85 | return 'data/training.zip' 86 | if file is FTEST: 87 | return 'data/test.zip' 88 | 89 | def download(file): 90 | print("Downloading %s" % file) 91 | urlretrieve(url(file), zip(file)) 92 | print("Unzipping data %s" % file) 93 | if file is FTRAIN or file is FTEST: 94 | with zipfile.ZipFile(zip(file), "r") as z: 95 | z.extractall('data/') 96 | print("Deleting zip file " + zip(file)) 97 | os.remove(zip(file)) 98 | 99 | if not os.path.exists(file): 100 | download(file) 101 | 102 | return read_csv(file) 103 | 104 | 105 | def load(file_path): 106 | """Loads data from FTEST if *test* is True, otherwise from FTRAIN. 107 | Pass a list of *cols* if you're only interested in a subset of the 108 | target columns. 109 | """ 110 | 111 | df = load_file(file_path) 112 | 113 | # The Image column has pixel values separated by space; convert 114 | # the values to numpy arrays: 115 | df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' ')) 116 | 117 | df = df.dropna() # drop all rows that have missing values in them 118 | 119 | X = np.vstack(df['Image'].values) / 255. 
# scale pixel values to [0, 1] 120 | X = X.astype(np.float32) 121 | 122 | if file_path is FTRAIN: # only FTRAIN has any target columns 123 | y = df[df.columns[:-1]].values 124 | y = (y - 48) / 48 # scale target coordinates to [-1, 1] 125 | y = y.astype(np.float32) 126 | else: 127 | y = None 128 | 129 | # print("X.shape == {}; X.min == {:.3f}; X.max == {:.3f}".format( 130 | # X.shape, X.min(), X.max())) 131 | # print("y.shape == {}; y.min == {:.3f}; y.max == {:.3f}".format( 132 | # y.shape, y.min(), y.max())) 133 | 134 | return X, y 135 | 136 | 137 | def load2d(file_path): 138 | X, y = load(file_path) 139 | X = X.reshape(-1, 1, 96, 96) 140 | return X, y 141 | 142 | 143 | def pickle_network(file_name, network): 144 | # in case the model is very big 145 | sys.setrecursionlimit(10000) 146 | with open(file_name, 'wb') as f: 147 | pickle.dump(network, f, -1) 148 | 149 | 150 | def unpickle_network(file_name): 151 | with open(file_name, 'rb') as f: # ! 152 | return pickle.load(f) 153 | 154 | 155 | class EarlyStopping(object): 156 | def __init__(self, patience=100): 157 | self.patience = patience 158 | self.best_valid = np.inf 159 | self.best_valid_epoch = 0 160 | self.best_weights = None 161 | 162 | def __call__(self, nn, train_history): 163 | current_valid = train_history[-1]['valid_loss'] 164 | current_epoch = train_history[-1]['epoch'] 165 | if current_valid < self.best_valid: 166 | self.best_valid = current_valid 167 | self.best_valid_epoch = current_epoch 168 | self.best_weights = nn.get_all_params_values() 169 | elif self.best_valid_epoch + self.patience < current_epoch: 170 | print("Early stopping.") 171 | print("Best valid loss was {:.6f} at epoch {}.".format( 172 | self.best_valid, self.best_valid_epoch)) 173 | nn.load_params_from(self.best_weights) 174 | raise StopIteration() 175 | 176 | 177 | class FlipBatchIterator(BatchIterator): 178 | flip_indices = [ 179 | (0, 2), (1, 3), 180 | (4, 8), (5, 9), (6, 10), (7, 11), 181 | (12, 16), (13, 17), (14, 18), (15, 19), 182 | (22, 24), (23, 25), 183 | ] 184 | 185 | def transform(self, Xb, yb): 186 | Xb, yb = super(FlipBatchIterator, self).transform(Xb, yb) 187 | 188 | # Flip half of the images in this batch at random: 189 | bs = Xb.shape[0] 190 | indices = np.random.choice(bs, bs / 2, replace=False) 191 | Xb[indices] = Xb[indices, :, :, ::-1] 192 | 193 | if yb is not None: 194 | # Horizontal flip of all x coordinates: 195 | yb[indices, ::2] = yb[indices, ::2] * -1 196 | 197 | # Swap places, e.g. 
left_eye_center_x -> right_eye_center_x 198 | for a, b in self.flip_indices: 199 | yb[indices, a], yb[indices, b] = ( 200 | yb[indices, b], yb[indices, a]) 201 | 202 | return Xb, yb 203 | 204 | 205 | def plot_sample(x, y, axis): 206 | img = x.reshape(96, 96) 207 | axis.imshow(img, cmap='gray') 208 | axis.scatter(y[0::2] * 48 + 48, y[1::2] * 48 + 48, marker='x', s=10) 209 | 210 | 211 | def visualize_predictions(net): 212 | X, _ = load2d(FTEST) 213 | y_pred = net.predict(X) 214 | 215 | fig = pyplot.figure(figsize=(6, 6)) 216 | fig.subplots_adjust( 217 | left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05) 218 | 219 | for i in range(16): 220 | ax = fig.add_subplot(4, 4, i + 1, xticks=[], yticks=[]) 221 | plot_sample(X[i], y_pred[i], ax) 222 | 223 | pyplot.show() 224 | 225 | 226 | def load_and_plot_layer(layer): 227 | with open(layer, 'rb') as f: 228 | layer0 = np.load(f) 229 | fig = pyplot.figure() 230 | fig.subplots_adjust( 231 | left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05) 232 | for i in range(32): 233 | img = layer0[i, :, :] 234 | img -= np.min(img) 235 | img /= np.max(img) / 255.0 236 | ax = fig.add_subplot(4, 8, i + 1, xticks=[], yticks=[]) 237 | ax.imshow(img, cmap='gray', interpolation='none') 238 | pyplot.show() 239 | 240 | def create_submition(net): 241 | X = load2d(FTEST)[0] 242 | y_pred = net.predict(X) 243 | 244 | y_pred2 = y_pred * 48 + 48 245 | y_pred2 = y_pred2.clip(0, 96) 246 | 247 | cols = ("left_eye_center_x","left_eye_center_y","right_eye_center_x","right_eye_center_y","left_eye_inner_corner_x","left_eye_inner_corner_y","left_eye_outer_corner_x","left_eye_outer_corner_y","right_eye_inner_corner_x","right_eye_inner_corner_y","right_eye_outer_corner_x","right_eye_outer_corner_y","left_eyebrow_inner_end_x","left_eyebrow_inner_end_y","left_eyebrow_outer_end_x","left_eyebrow_outer_end_y","right_eyebrow_inner_end_x","right_eyebrow_inner_end_y","right_eyebrow_outer_end_x","right_eyebrow_outer_end_y","nose_tip_x","nose_tip_y","mouth_left_corner_x","mouth_left_corner_y","mouth_right_corner_x","mouth_right_corner_y","mouth_center_top_lip_x","mouth_center_top_lip_y","mouth_center_bottom_lip_x","mouth_center_bottom_lip_y") 248 | 249 | df = DataFrame(y_pred2, columns=cols) 250 | 251 | lookup_table = load_file(FLOOKUP) 252 | values = [] 253 | 254 | for index, row in lookup_table.iterrows(): 255 | values.append(( 256 | row['RowId'], 257 | df.ix[row.ImageId - 1][row.FeatureName], 258 | )) 259 | 260 | now_str = datetime.now().isoformat().replace(':', '-') 261 | submission = DataFrame(values, columns=('RowId', 'Location')) 262 | filename = 'submission-{}.csv'.format(now_str) 263 | submission.to_csv(filename, index=False) 264 | print("Wrote {}".format(filename)) 265 | 266 | def visualize_learning(net): 267 | train_loss = np.array([i["train_loss"] for i in net.train_history_]) 268 | valid_loss = np.array([i["valid_loss"] for i in net.train_history_]) 269 | pyplot.plot(train_loss, linewidth=3, label="train") 270 | pyplot.plot(valid_loss, linewidth=3, label="valid") 271 | pyplot.grid() 272 | pyplot.legend() 273 | pyplot.xlabel("epoch") 274 | pyplot.ylabel("loss") 275 | ymax = max(np.max(valid_loss), np.max(train_loss)) 276 | ymin = min(np.min(valid_loss), np.min(train_loss)) 277 | pyplot.ylim(ymin * 0.8, ymax * 1.2) 278 | pyplot.yscale("log") 279 | pyplot.show() 280 | 281 | def conv(input, weights): 282 | return convolve(input, weights) 283 | 284 | 285 | def show_kernels(kernels, cols=8): 286 | rows = ceil(len(kernels)*1.0/cols) 287 | fig = pyplot.figure(figsize=(cols+2, 
rows+1)) 288 | 289 | fig.subplots_adjust( 290 | left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05) 291 | for i in range(len(kernels)): 292 | img = np.copy(kernels[i]) 293 | img -= np.min(img) 294 | img /= np.max(img) 295 | ax = fig.add_subplot(rows, cols, i + 1, xticks=[], yticks=[]) 296 | ax.imshow(img, cmap='gray', interpolation='none') 297 | pyplot.axis('off') 298 | pyplot.show() 299 | 300 | 301 | def get_activations(layer, x): 302 | # compile theano function 303 | xs = T.tensor4('xs').astype(theano.config.floatX) 304 | get_activity = theano.function([xs], get_output(layer, xs)) 305 | 306 | return get_activity(x) 307 | 308 | 309 | def show_images(list, cols=1): 310 | rows = ceil(len(list)*1.0/cols) 311 | fig = pyplot.figure(figsize=(cols+2, rows+1)) 312 | fig.subplots_adjust( 313 | left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05) 314 | for i in range(len(list)): 315 | ax = fig.add_subplot(rows, cols, i+1, xticks=[], yticks=[]) 316 | ax.imshow(list[i], cmap='gray') 317 | pyplot.axis('off') 318 | pyplot.show() 319 | 320 | 321 | def get_conv_weights(net): 322 | layers = net.get_all_layers() 323 | layercounter = 0 324 | w = [] 325 | b = [] 326 | for l in layers: 327 | if('Conv2DLayer' in str(type(l))): 328 | weights = l.W.get_value() 329 | biases = l.b.get_value() 330 | b.append(biases) 331 | weights = weights.reshape(weights.shape[0]*weights.shape[1],weights.shape[2],weights.shape[3]) 332 | w.append(weights) 333 | layercounter += 1 334 | return w, b 335 | 336 | 337 | def load_image(file): 338 | x=Image.open(file,'r') 339 | x=x.convert('L') 340 | y=np.asarray(x.getdata(),dtype=np.float32).reshape((x.size[1],x.size[0])) 341 | return y 342 | -------------------------------------------------------------------------------- /img/ConvolutionalNeuralNetworks_11_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/ConvolutionalNeuralNetworks_11_1.png -------------------------------------------------------------------------------- /img/ConvolutionalNeuralNetworks_12_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/ConvolutionalNeuralNetworks_12_1.png -------------------------------------------------------------------------------- /img/ConvolutionalNeuralNetworks_5_0.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/ConvolutionalNeuralNetworks_5_0.png -------------------------------------------------------------------------------- /img/ConvolutionalNeuralNetworks_7_1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/ConvolutionalNeuralNetworks_7_1.png -------------------------------------------------------------------------------- /img/ConvolutionalNeuralNetworks_9_2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/ConvolutionalNeuralNetworks_9_2.png -------------------------------------------------------------------------------- /img/cnn_layer.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/cnn_layer.png -------------------------------------------------------------------------------- /img/convolution_schematic.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/convolution_schematic.gif -------------------------------------------------------------------------------- /img/dropout.jpeg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/dropout.jpeg -------------------------------------------------------------------------------- /img/face.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/face.jpg -------------------------------------------------------------------------------- /img/mlp.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/mlp.png -------------------------------------------------------------------------------- /img/obama.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/obama.jpg -------------------------------------------------------------------------------- /img/overfitting.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/overfitting.png -------------------------------------------------------------------------------- /img/pooling_schematic.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/pooling_schematic.gif -------------------------------------------------------------------------------- /img/shared_weights.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/shared_weights.png -------------------------------------------------------------------------------- /img/sparse_connectivity.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/sparse_connectivity.png -------------------------------------------------------------------------------- /sample_solution/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/sample_solution/__init__.py -------------------------------------------------------------------------------- /sample_solution/sample_cnn.py: -------------------------------------------------------------------------------- 1 | import lasagne 2 | 3 | 4 | def build_cnn(input_var=None): 5 | network = lasagne.layers.InputLayer(shape=(None, 3, 32, 32), 6 | 
input_var=input_var) 7 | network = lasagne.layers.Conv2DLayer( 8 | network, num_filters=32, filter_size=(3, 3), 9 | nonlinearity=lasagne.nonlinearities.rectify, 10 | pad='same') 11 | 12 | network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2)) 13 | 14 | network = lasagne.layers.DropoutLayer(network, p=.2) 15 | 16 | network = lasagne.layers.Conv2DLayer( 17 | network, num_filters=64, filter_size=(3, 3), 18 | nonlinearity=lasagne.nonlinearities.rectify, 19 | pad='same') 20 | 21 | network = lasagne.layers.DropoutLayer(network, p=.2) 22 | 23 | network = lasagne.layers.Conv2DLayer( 24 | network, num_filters=64, filter_size=(3, 3), 25 | nonlinearity=lasagne.nonlinearities.rectify, 26 | pad='same') 27 | network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2)) 28 | 29 | network = lasagne.layers.DenseLayer( 30 | lasagne.layers.dropout(network, p=.5), 31 | num_units=512, 32 | nonlinearity=lasagne.nonlinearities.rectify) 33 | 34 | network = lasagne.layers.DenseLayer( 35 | lasagne.layers.dropout(network, p=.5), 36 | num_units=10, 37 | nonlinearity=lasagne.nonlinearities.softmax) 38 | 39 | return network 40 | --------------------------------------------------------------------------------