├── .gitignore
├── LICENSE
├── README.md
├── cifar10.py
├── facial_recognition
├── __init__.py
├── network.py
└── util.py
├── img
├── ConvolutionalNeuralNetworks_11_1.png
├── ConvolutionalNeuralNetworks_12_1.png
├── ConvolutionalNeuralNetworks_5_0.png
├── ConvolutionalNeuralNetworks_7_1.png
├── ConvolutionalNeuralNetworks_9_2.png
├── cnn_layer.png
├── convolution_schematic.gif
├── dropout.jpeg
├── face.jpg
├── mlp.png
├── obama.jpg
├── overfitting.png
├── pooling_schematic.gif
├── shared_weights.png
└── sparse_connectivity.png
└── sample_solution
├── __init__.py
└── sample_cnn.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pkl
2 | data/*
3 | *.iml
4 | .idea/*
5 | __pycache__/
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "{}"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright 2017, Alfredo Clemente
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 |
2 | # Convolutional Neural Networks
3 | A convolutional neural network, or convnet, is a type of feed-forward artificial neural network used to process grid-structured data such as 2D images and time series. Convnets get their name from the fact that they use convolution instead of matrix multiplication in at least one of their layers. Convolutional neural networks are currently the best-performing techniques for image and sound processing.
4 |
5 | 
6 |
Image taken from the Stanford convolutional network tutorial
7 |
8 | The basic building block of convolutional neural networks are convolutional layers. A convolutional layer is defined by a set of filters, which are then applied to the input image to produce an output.
9 |
10 | In the above image, neurons of the same color belong to the same filter; there are several neurons of each color because the filter is applied across the whole image. Each filter creates an output of size WxHx1 called a feature map. The feature maps from all the filters are then stacked together to construct a WxHxF output volume, where F is the number of filters in the layer.
11 |
12 | The input to a convolutional layer is also a volume. The input volume is of size WxHxD where W is the width dimension of the input, H is the height dimension and D is the feature dimension. In the above image there is a single feature dimension, and it represents the greyscale value of the pixel.
13 |
14 | Given that convolutional layers both take a volume as input and produce a volume as output, they can be stacked one after another, and this is the general architecture of a convolutional neural network. In practice, however, convnets also include pooling layers and fully connected layers.
15 |
16 | If we used filters that spanned the entirety of the input image, our convnet would be equivalent to a normal fully connected network.
17 |
18 | ## Sparse connectivity
19 |
20 | 
21 | Image taken from the DeepLearning.net MLP tutorial
22 |
23 | In traditional multi-layer perceptrons as shown above, all layers are fully connected to the next layer. This means that a network in which the input layer has **100x100** neurons, and the second layer has **64** neurons will require **640000** weights.
24 |
25 | 
26 | Image taken from the DeepLearning.net LeNet tutorial
27 |
28 | On the other hand, convnets are sparsely connected, meaning that each neuron in a layer is only connected to a subset of the previous layer. Using the same example as above, a layer with **100x100** input neurons and **64** filters would only need **64xK** weights, where **K** is the number of weights in a filter. Filters are usually small: 2x2, 3x3, 5x5, etc., as the quick comparison below illustrates.
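To make the difference concrete, here is a quick back-of-the-envelope comparison in Python (bias terms are ignored, and a 3x3 filter is assumed as an example):

```python
# Fully connected: every one of the 100x100 inputs connects to each of the 64 neurons.
input_neurons = 100 * 100
hidden_neurons = 64
fully_connected_weights = input_neurons * hidden_neurons
print(fully_connected_weights)  # 640000

# Convolutional: 64 filters, each sharing the same K = 3x3 = 9 weights everywhere in the image.
num_filters = 64
weights_per_filter = 3 * 3
convolutional_weights = num_filters * weights_per_filter
print(convolutional_weights)  # 576
```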
29 |
30 | ## Weight sharing
31 |
32 | 
33 | Image taken from the DeepLearning.net LeNet tutorial
34 |
35 | A convolutional layer is composed of several sets of neurons, where each set is restricted to having the same weights. In the image above, layer m uses a single set of weights shared by all three neurons; lines of the same color represent the same weights. Each neuron above can be thought of as applying a 1x3 filter and producing a 1x1 output. When we consider the three neurons together, they apply the same 1x3 filter at three different locations of the image and produce a 1x3 feature map.
36 |
37 | In practice, convolutional layers usually have several of these sets, and each set produces its own feature map.
38 |
39 | ## Convolution
40 |
41 | A convolution is a mathematical operation on two functions that produces a third function, which can be viewed as a modification of one of the original functions.
42 | A discrete convolution is defined as $$(f * g)[i]=\sum_{m=0}^{i}f[m]\,g[i-m]$$
43 | The two-dimensional discrete convolution is then $$(f * g)[i, j]=\sum_{m=0}^{i}\sum_{n=0}^{j}f[m, n]\,g[i-m, j-n]$$
44 | In the case of a convnet, $g[i,j]$ represents the input at location $(i, j)$, while $f[i,j]$ represents the kernel weight connected to the input at location $(i, j)$.
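To make the definition concrete, here is a minimal NumPy sketch of a "valid" 2D convolution written out with explicit loops; the 5x5 input and 3x3 kernel are arbitrary example values, and in practice a library routine (such as `scipy.signal.convolve2d`) or a convolutional layer would be used instead:

```python
import numpy as np

def convolve2d_valid(image, kernel):
    # Convolution flips the kernel; without the flip this would be cross-correlation.
    flipped = kernel[::-1, ::-1]
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Elementwise product of the flipped kernel with one image patch, summed up.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

image = np.arange(25, dtype=np.float32).reshape(5, 5)                    # toy 5x5 "image"
kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]], dtype=np.float32)  # toy 3x3 filter
print(convolve2d_valid(image, kernel).shape)  # (3, 3): a 5x5 input and a 3x3 filter give a 3x3 output
```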
45 |
46 | ## Pooling
47 |
48 | 
49 | Animation taken from the Stanford deep learning tutorial
50 |
51 | In addition to convolutional layers, convnets usually have pooling layers. Pooling applies an operation, usually taking the maximum, to one subset of the input at a time, creating a feature map. Pooling layers serve two main purposes: one is to provide some translational invariance of features, and the other is to reduce the dimensionality of the layers.
52 |
53 | The translational invariance comes from the fact that pooling assigns a single value to a subset of the input. For example, if we have a 5x5 max-pooling neuron and the maximum value is in the top left corner of its input, then even if the input is translated to the right by four units the pooling neuron will still give the same output.
54 |
55 | The dimensionality reduction plays a big role in keeping convnets tractable; for example, a 2x2 max-pooling layer will reduce the size of its input by a factor of 2x2 = 4.
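As an illustration, here is a minimal NumPy sketch of non-overlapping 2x2 max pooling on a single feature map; the 96x96 shape is just an example, and in the networks below this is handled by Lasagne's `MaxPool2DLayer`:

```python
import numpy as np

def max_pool_2x2(feature_map):
    # Group the pixels into non-overlapping 2x2 blocks and keep the maximum of each block.
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3))

fmap = np.random.rand(96, 96).astype(np.float32)  # e.g. one feature map from a convolutional layer
print(max_pool_2x2(fmap).shape)  # (48, 48): the input size is reduced by a factor of 2x2 = 4
```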
56 |
57 |
58 | ```python
59 | %matplotlib inline
60 |
61 | import util
62 | from nolearn.lasagne import visualize
63 | import numpy as np
64 | ```
65 |
66 | DEBUG: nvcc STDOUT mod.cu
67 | Creating library C:/Users/erpa_/AppData/Local/Theano/compiledir_Windows-10-10.0.14295-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.4.4-64/tmp3cbxfn57/m91973e5c136ea49268a916ff971b7377.lib and object C:/Users/erpa_/AppData/Local/Theano/compiledir_Windows-10-10.0.14295-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.4.4-64/tmp3cbxfn57/m91973e5c136ea49268a916ff971b7377.exp
68 |
69 | Using gpu device 0: GeForce GTX 980 Ti (CNMeM is disabled, cuDNN 5004)
70 |
71 |
72 | To better visualize how convolution works, and what kinds of filters are learned, we will dissect a fully trained convnet with over 8 million parameters trained on the task of detecting facial features. The network architecture is taken from [Daniel Nouri](http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/) and trained for ten hours on kaggle's [facial keypoint detection data](https://www.kaggle.com/c/facial-keypoints-detection).
73 |
74 | The network is created using Lasagne, a Python library built on top of Theano.
75 |
76 | The network is composed of three alternating convolution and max-pooling layers, followed by two fully connected layers and the output layer. The network is trained on about 2000 96x96 grayscale images, and outputs the x and y locations of 15 different facial keypoints, for example left_eye_center, right_eye_center, nose_tip, mouth_center_bottom, etc.
77 |
78 | Below is the definition of the network used here. Some details of the network, such as dropout, input augmentation, etc., have been omitted for clarity.
79 |
80 |
81 |
82 | ```python
83 | from nolearn.lasagne import NeuralNet
84 | from lasagne.layers import InputLayer, Conv2DLayer, MaxPool2DLayer, DenseLayer
85 |
86 | convnet = NeuralNet(
87 | layers=[
88 | ('input', InputLayer),
89 | ('conv1', Conv2DLayer),
90 | ('pool1', MaxPool2DLayer),
91 | ('conv2', Conv2DLayer),
92 | ('pool2', MaxPool2DLayer),
93 | ('conv3', Conv2DLayer),
94 | ('pool3', MaxPool2DLayer),
95 | ('hidden4', DenseLayer),
96 | ('hidden5', DenseLayer),
97 | ('output', DenseLayer),
98 | ],
99 | input_shape=(None, 1, 96, 96),
100 | conv1_num_filters=32, conv1_filter_size=(3, 3), pool1_pool_size=(2, 2),
101 | conv2_num_filters=64, conv2_filter_size=(2, 2), pool2_pool_size=(2, 2),
102 | conv3_num_filters=128, conv3_filter_size=(2, 2), pool3_pool_size=(2, 2),
103 | hidden4_num_units=1000,
104 | hidden5_num_units=1000,
105 | output_num_units=30, output_nonlinearity=None,
106 | regression=True,
107 | )
108 | ```
109 |
110 | First we load the fully trained network. It was trained for 3000 epochs, which took around 10 hours on my GTX 570.
111 |
112 |
113 | ```python
114 | net = util.unpickle_network("../networks/n7.pkl")
115 | util.visualize_predictions(net)
116 | ```
117 |
118 |
119 | 
120 |
121 |
122 | As can be seen above, the network does a very good job of locating the 15 different facial keypoints. In order to understand how the network does this, we will open it up and see how it does so.
123 |
124 | ## The first layer
125 |
126 | The first layer of a convnet is different from all other layers in the sense that it is the only layer that works in the same dimensions and representation as the input. The output of the first layer represents how similar a subset of the image is to each filter. In our case, the first pixel of the first feature map represents how similar the first 3x3 square of the input is to the first filter. The feature map as a whole then represents how similar each 3x3 square of the image is to that filter. This is done for all 32 filters, resulting in 32 feature maps.
127 |
128 | 
129 | Animation taken from the Stanford deep learning tutorial
130 |
131 | The first layer takes as its input a 1x96x96 volume, where the 1 indicates that there is only one dimension per pixel, in our case a 0-1 grayscale value. If the input were an RGB image it would be a 3x96x96 volume. The layer then maps the input from the grayscale dimension to a feature space of 32 features, where each pixel has 32 dimensions, one for each feature. The result is a 32x96x96 volume.
132 |
133 | The reduction of the image size after a convolution (5x5 to 3x3 in the animation above) is not addressed here, but in practice it is usually either ignored or remedied with zero-padding.
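A small sketch of this size effect (the 96x96 input and 3x3 kernel sizes mirror the first layer above; the kernel values are random): "valid" convolution shrinks the output, while "same" convolution zero-pads the input so the output keeps the original size.

```python
import numpy as np
from scipy.signal import convolve2d

image = np.random.rand(96, 96).astype(np.float32)  # one grayscale input image
kernel = np.random.rand(3, 3).astype(np.float32)   # one 3x3 filter

print(convolve2d(image, kernel, mode='valid').shape)  # (94, 94): the image shrinks by 2 in each dimension
print(convolve2d(image, kernel, mode='same').shape)   # (96, 96): zero-padding preserves the input size
```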
134 |
135 |
136 | ```python
137 | visualize.plot_conv_weights(net.layers_['conv1'])
138 | ```
139 |
140 |
141 |
142 |
143 |
144 |
145 |
146 |
147 |
148 | 
149 |
150 |
151 | Above are all 32 3x3 filters learned by the network on the first layer. The weights of the filters are individually scaled to the interval 0 - 255 for visualization.
152 |
153 | Feature maps are then created with the above filters by convolving them with the image.
154 |
155 |
156 | ```python
157 | # we load the data in a 2d representation
158 | x, _ = util.load2d(util.FTEST)
159 | visualize.plot_conv_activity(net.layers_['conv1'], x[0:1, 0:1, : ,:])
160 | ```
161 |
162 | DEBUG: nvcc STDOUT mod.cu
163 | Creating library C:/Users/erpa_/AppData/Local/Theano/compiledir_Windows-10-10.0.14295-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.4.4-64/tmprt73od52/m9a6bd0eb5ed5c92e91261282fc495cb4.lib and object C:/Users/erpa_/AppData/Local/Theano/compiledir_Windows-10-10.0.14295-Intel64_Family_6_Model_60_Stepping_3_GenuineIntel-3.4.4-64/tmprt73od52/m9a6bd0eb5ed5c92e91261282fc495cb4.exp
164 |
165 |
166 |
167 |
168 |
169 |
170 |
171 |
172 |
173 |
174 |
175 | 
176 |
177 |
178 | Above are the feature maps created by the first layer of the network when given an image. In this case black represents a high activation value. Each feature map is the result of convolving the original image with one of the filters. The different feature maps are tuned to recognize different features: for example, some detect the nostrils, others the contours of the face or of the eyes.
179 |
180 | # Variable size input
181 |
182 | The parameters of a convolutional layer depend only on the filter size and the number of filters: 32 filters of size 3x3 require 32x3x3 weights (in addition to the biases). This means that a convolutional layer can be given an image of any size as input and will give an output whose dimensions are proportional to the input dimensions. Below we take the feature detectors from the first layer of our network and apply them to an image of size 313x250.
183 |
184 | This, however, does not mean that we can simply supply our full convnet with an image of any size and expect an output. The final layers of the network are fully connected, which means they must have a fixed-size input. This limitation can be worked around by adding special pooling layers before the fully connected layers that reduce the dimensions to a fixed size.
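A minimal NumPy sketch of that idea: global max pooling keeps only the maximum of each feature map, so the output length depends only on the number of filters and not on the spatial size of the input (the shapes below are arbitrary examples).

```python
import numpy as np

def global_max_pool(feature_maps):
    # Reduce an (F, H, W) volume to a fixed-length vector with one value per feature map.
    return feature_maps.max(axis=(1, 2))

# The same 32 filters applied to differently sized inputs give differently sized volumes...
small = np.random.rand(32, 96, 96).astype(np.float32)
large = np.random.rand(32, 313, 250).astype(np.float32)

# ...but global pooling maps both to a fixed-size vector for the fully connected layers.
print(global_max_pool(small).shape, global_max_pool(large).shape)  # (32,) (32,)
```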
185 |
186 |
187 | ```python
188 | img = util.load_image('img/obama.jpg')
189 | print("Image dimensions: ",img.shape)
190 | util.show_images([img])
191 | ```
192 |
193 | Image dimensions: (313, 250)
194 |
195 |
196 |
197 | 
198 |
199 |
200 |
201 | ```python
202 | # the image is loaded with values from 0 to 255, we must scale them down to 0 - 1
203 | img /= 255.0
204 | visualize.plot_conv_activity(net.layers_['conv1'], img[None, None, :, :])
205 | ```
206 |
207 |
208 |
209 |
210 |
211 |
212 |
213 |
214 |
215 | 
216 |
217 |
218 | ## The inner convolutional layers
219 |
220 | Unlike the first convolutional layer, the inner convolutional layers no longer work in the pixel space. They work on the feature space created by the previous layer. They map the input volume to a new feature space defined by that layer's feature detectors.
221 | Much like how the first layer transforms the input image into a set of features of the image, the inner layers transform their input volumes into a set of features of the features from the previous layer. Because of this, a hierarchical structure of features is formed, in which the deeper layers detect combinations of features from the lower layers.
222 |
223 | Once the input image has passed through all the convolutional and pooling layers, it can be thought of as having been mapped from the pixel space to the feature space, where the image is no longer composed of grayscale pixels but of combinations of high- and low-level features at different locations.
224 |
225 | ## The fully connected layers
226 |
227 | The final pooling layer of our network outputs a 128x11x11 volume representing the features of the input image. These features must then be analyzed in order to produce the 30 outputs of the network, corresponding to the x and y locations of the 15 facial keypoints.
228 |
229 | In practice it is common to take the output of a set of convolutional and pooling layers, and use it as the input of a fully connected network. The fully connected network can then process the features and give the required output.
230 |
231 | ## Model reusability
232 |
233 | One big advantage of this kind of convolutional network architecture is that, once the convolutional layers are fully trained, they output a feature representation of their input. These feature representations do not have to be used exclusively for the task they were trained for.
234 |
235 | The convolutional layers can be detached from the fully connected layers they were trained with, and attached to a new set of fully connected layers that can then be trained to perform a new task. By doing this, the network does not have to learn how to extract features from images and can learn classification or regression on these features much faster.
236 |
237 | This reusability has led to very capable models that take months to train being published for [download](https://github.com/BVLC/caffe/wiki/Model-Zoo). They can then be attached to new dense layers to perform state-of-the-art classification or regression.
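A rough sketch of how this could look in Lasagne (the layers and shapes below are hypothetical examples, not the network from the first part): keep a trained convolution/pooling stack, attach a fresh dense layer for the new task, and pass only the new layer's parameters to the update rule so the reused feature extractor stays frozen.

```python
import lasagne
import theano
import theano.tensor as T

input_var = T.tensor4('inputs')
target_var = T.ivector('targets')

# Stand-in for the pretrained convolutional layers we want to reuse.
net = lasagne.layers.InputLayer((None, 1, 96, 96), input_var=input_var)
net = lasagne.layers.Conv2DLayer(net, num_filters=32, filter_size=(3, 3))
net = lasagne.layers.MaxPool2DLayer(net, pool_size=(2, 2))

# New, untrained head for a new task (e.g. 10-way classification).
new_head = lasagne.layers.DenseLayer(net, num_units=10,
                                     nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(new_head)
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()

# Only the new layer's parameters are updated; the convolutional weights stay as they are.
new_params = new_head.get_params(trainable=True)
updates = lasagne.updates.adam(loss, new_params, learning_rate=0.001)
train_fn = theano.function([input_var, target_var], loss, updates=updates)
```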
238 |
239 | # Practical
240 |
241 | ## Prerequisites
242 | In order to do this practical, you must have Lasagne and Theano installed. This can be tricky, but if you follow the right guide it will probably work. Here are some guides for [Windows](https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne-on-Windows-7-%2864-bit%29), [Ubuntu](https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne-on-Ubuntu-14.04) and a [general guide](http://lasagne.readthedocs.org/en/latest/user/installation.html).
243 | I would also recommend getting Theano to run on your GPU if you have one; it will make everything _much_ faster.
244 |
245 | You can download the code for this practical from https://github.com/Alfredvc/cnn_workshop. In the same project under the folder *facial_recognition* is the network for facial feature recognition presented in the first part.
246 |
247 | ## The task
248 |
249 | We will create a convolutional neural network to recognize images from the [CIFAR-10 dataset](https://www.cs.toronto.edu/~kriz/cifar.html). The CIFAR-10 dataset consists of 60,000 32x32 color images divided into 10 categories. Given our limited time and compute resources, we will only be working with a subset of the dataset.
250 |
251 | Inside the **cifar10.py** file there are several utility functions for downloading the data and visualizing the results. In addition there is a **build_cnn()** function; this is the function that configures the convnet, and it is the one we will be modifying.
252 |
253 | Currently the network we will improve achieves around 57% test accuracy after 15 epochs of training, and is overfitting quite strongly.
254 |
255 | ## How do we do this?
256 |
257 | One way to approach the tuning of a particular neural network architecture is to first make the network overfit on the training data, and then add regularization to attenuate the overfitting.
258 |
259 | Since our network is already overfitting, we can begin by regularizing it, then increase its capacity, and then regularize again. This cycle is repeated until the desired accuracy is achieved.
260 |
261 | ### Overfitting
262 | Overfitting is when the network has low error on the training data and high error on the test data. This tends to happen because the network learns to "remember" the training data and what it should give as output, instead of learning the patterns in the data that lead to the outputs. The network then generalizes poorly to the test data, which it has never seen.
263 | 
264 | Image taken from the wikipedia article on overfitting
265 |
266 | An easy way of visualizing this is with polynomial regression. The points above are sampled from the distribution given by the black line, with some noise added, and we attempt to predict a point's y position given its x position. If we fit the points with a degree-1 polynomial (a line), we can make very good predictions on new points that we have not seen. If instead we use a degree-11 polynomial as shown above, we get a training error of zero because the curve goes through every point, but when we are given a new x value our prediction will be completely wrong. We could still model the above data with a degree-11 polynomial, but strong regularization would have to be applied for it to work well; otherwise the extra degrees of freedom simply end up modeling the noise about the line.
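A minimal NumPy sketch of this effect (the underlying line, the noise level and the polynomial degrees are arbitrary example choices):

```python
import numpy as np

rng = np.random.RandomState(0)
x_train = np.linspace(0, 1, 12)
y_train = 2.0 * x_train + rng.normal(scale=0.1, size=x_train.shape)  # noisy samples of a line

x_test = np.linspace(0, 1, 100)
y_test = 2.0 * x_test  # the true underlying line, without noise

for degree in (1, 11):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # The degree-11 fit drives the training error to (almost) zero,
    # but typically does much worse than the straight line on unseen x values.
    print("degree %2d  train MSE %.5f  test MSE %.5f" % (degree, train_mse, test_mse))
```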
267 |
268 | The way we achieve overfitting in neural networks is the same as with polynomials: we simply increase the number of parameters. In the case of the convolutional layers, we can increase the filter size, the number of filters per layer, or the number of layers. For the fully connected layers, we can increase the number of neurons per layer, or the number of layers.
269 |
270 |
271 | ### Regularization by dropout
272 | 
273 | Image taken from the Stanford class CS231n webpage
274 |
275 | Regularization is the process by which we help the network generalize better by imposing some constraint on it. Dropout is a powerful and now common regularization technique developed by [researchers at the University of Toronto](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf). The idea behind dropout is simple: during training we randomly deactivate some neurons, and during evaluation we activate all the neurons. The reasoning behind this is to prevent neurons from collectively "remembering" the inputs, since it is highly unlikely that the same set of neurons will be active together more than once. Instead, knowledge must be distributed, and the network must rely on detecting patterns.
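In Lasagne, applying dropout amounts to inserting a `DropoutLayer` between existing layers. Below is a minimal sketch of how the practical's `build_cnn()` could be regularized this way; the dropout probabilities are arbitrary starting points, not tuned values. Note that **cifar10.py** already builds its test expressions with `lasagne.layers.get_output(network, deterministic=True)`, which disables dropout during evaluation.

```python
import lasagne

def build_cnn_with_dropout(input_var=None):
    network = lasagne.layers.InputLayer(shape=(None, 3, 32, 32), input_var=input_var)
    network = lasagne.layers.Conv2DLayer(
        network, num_filters=32, filter_size=(3, 3),
        nonlinearity=lasagne.nonlinearities.rectify, pad='same')
    network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
    # Randomly drop 25% of the pooled activations, but only during training.
    network = lasagne.layers.DropoutLayer(network, p=0.25)
    network = lasagne.layers.DenseLayer(
        network, num_units=128, nonlinearity=lasagne.nonlinearities.rectify)
    # Randomly drop 50% of the hidden units, but only during training.
    network = lasagne.layers.DropoutLayer(network, p=0.5)
    network = lasagne.layers.DenseLayer(
        network, num_units=10, nonlinearity=lasagne.nonlinearities.softmax)
    return network
```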
276 |
277 | ## Suggested solution
278 |
279 | **sample_cnn.py** contains a CNN architecture that I have improved with the techniques mentioned above and by increasing the network's capacity. It can achieve about 67% classification accuracy on the test data after 15 epochs.
280 | You can construct this network directly by calling `main(model='suggested_cnn')`.
281 |
282 | # Further reading
283 |
284 | If you liked this topic and would like to learn more about it, you can take a look at the references section. I would personally recommend [Andrej Karpathy's lectures on convnets](https://www.youtube.com/watch?v=NfnWJUyUJYU&list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC), and [this](http://www.deeplearningbook.org) upcoming deep learning book.
285 |
286 | # References
287 |
288 | * http://danielnouri.org/notes/2014/12/17/using-convolutional-neural-nets-to-detect-facial-keypoints-tutorial/
289 | * http://ufldl.stanford.edu/tutorial/supervised/ConvolutionalNeuralNetwork/
290 | * http://deeplearning.net/tutorial/mlp.html
291 | * http://deeplearning.net/tutorial/lenet.html
292 | * https://www.coursera.org/course/neuralnets
293 | * https://en.wikipedia.org/wiki/Convolution
294 | * http://lasagne.readthedocs.org/en/latest/user/tutorial.html
295 | * https://github.com/Lasagne/Lasagne/wiki/From-Zero-to-Lasagne-on-Windows-7-%2864-bit%29
296 | * http://deeplearning.stanford.edu/tutorial/
297 | * https://en.wikipedia.org/wiki/Affine_transformation
298 | * https://github.com/BVLC/caffe/wiki/Model-Zoo
299 | * http://cs231n.github.io/neural-networks-2/
300 | * https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf
301 | * https://www.youtube.com/watch?v=NfnWJUyUJYU&list=PLkt2uSq6rBVctENoVBg1TpCC7OQi31AlC
302 | * http://www.deeplearningbook.org Ian Goodfellow, Yoshua Bengio, and Aaron Courville, Deep Learning, 2016
303 |
--------------------------------------------------------------------------------
/cifar10.py:
--------------------------------------------------------------------------------
1 | import pickle
2 | import os.path
3 | import sys
4 | import numpy as np
5 | import lasagne
6 | import theano
7 | import theano.tensor as T
8 | import time
9 | from sample_solution import sample_cnn
10 |
11 | from matplotlib import pyplot as plt
12 | if sys.version_info[0] == 2:
13 | from urllib import urlretrieve
14 | else:
15 | from urllib.request import urlretrieve
16 |
17 |
18 | def pkl(file_name, object):
19 | with open(file_name, 'wb') as f:
20 | pickle.dump(object, f, -1)
21 |
22 |
23 | def un_pkl_l(file_name):
24 | with open(file_name, 'rb') as f:
25 | return pickle.load(f, encoding='latin1')
26 |
27 |
28 | def un_pkl(file_name):
29 | with open(file_name, 'rb') as f:
30 | return pickle.load(f, encoding='latin1')
31 |
32 |
33 | def make_image(X):
34 | im = np.swapaxes(X.T, 0, 1)
35 | im = im - im.min()
36 | im = im * 1.0 / im.max()
37 | return im
38 |
39 |
40 | def show_images(data, predicted, labels, classes):
41 | plt.figure(figsize=(16, 5))
42 | for i in range(0, 10):
43 | plt.subplot(1, 10, i+1)
44 | plt.imshow(make_image(data[i]), interpolation='nearest')
45 | true = classes[labels[i]]
46 | pred = classes[predicted[i]]
47 | color = 'green' if true == pred else 'red'
48 | plt.text(0, 0, true, color='black', bbox=dict(facecolor='white', alpha=1))
49 | plt.text(0, 32, pred, color=color, bbox=dict(facecolor='white', alpha=1))
50 |
51 | plt.axis('off')
52 |
53 | DATA = 'data.pkl'
54 |
55 |
56 | def load_file(file):
57 | def url(file):
58 | if file is DATA:
59 | return 'http://folk.ntnu.no/alfredvc/workshop/data/data.pkl'
60 |
61 | def download(file):
62 | print("Downloading %s" % file)
63 | urlretrieve(url(file), file)
64 |
65 | if not os.path.exists(file):
66 | download(file)
67 | return un_pkl_l(file)
68 |
69 |
70 | def iterate_minibatches(inputs, targets, batchsize, shuffle=False):
71 | assert len(inputs) == len(targets)
72 | if shuffle:
73 | indices = np.arange(len(inputs))
74 | np.random.shuffle(indices)
75 | for start_idx in range(0, len(inputs) - batchsize + 1, batchsize):
76 | if shuffle:
77 | excerpt = indices[start_idx:start_idx + batchsize]
78 | else:
79 | excerpt = slice(start_idx, start_idx + batchsize)
80 | yield inputs[excerpt], targets[excerpt]
81 |
82 |
83 | def build_cnn(input_var=None):
84 | network = lasagne.layers.InputLayer(shape=(None, 3, 32, 32),
85 | input_var=input_var)
86 | network = lasagne.layers.Conv2DLayer(
87 | network, num_filters=32, filter_size=(3, 3),
88 | nonlinearity=lasagne.nonlinearities.rectify,
89 | pad='same')
90 |
91 | network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
92 |
93 | network = lasagne.layers.DenseLayer(
94 | network,
95 | num_units=128,
96 | nonlinearity=lasagne.nonlinearities.rectify)
97 |
98 | network = lasagne.layers.DenseLayer(
99 | network,
100 | num_units=10,
101 | nonlinearity=lasagne.nonlinearities.softmax)
102 |
103 | return network
104 |
105 |
106 | def main(model='cnn', num_epochs=10):
107 | # Load the dataset
108 | print("Loading data...")
109 | X_train, y_train, X_test, y_test, classes = load_file(DATA)
110 |
111 | # Prepare Theano variables for inputs and targets
112 | input_var = T.tensor4('inputs')
113 | target_var = T.ivector('targets')
114 |
115 | # Create neural network model (depending on first command line parameter)
116 | print("Building model and compiling functions...")
117 | if model == 'cnn':
118 | network = build_cnn(input_var)
119 | elif model == 'suggested_cnn':
120 | network = sample_cnn.build_cnn(input_var)
121 | else:
122 | print("Unrecognized model type %r." % model)
123 | return
124 |
125 | # Create a loss expression for training, i.e., a scalar objective we want
126 | # to minimize (for our multi-class problem, it is the cross-entropy loss):
127 | prediction = lasagne.layers.get_output(network)
128 | loss = lasagne.objectives.categorical_crossentropy(prediction, target_var)
129 | loss = loss.mean()
130 | # We could add some weight decay as well here, see lasagne.regularization.
131 |
132 | params = lasagne.layers.get_all_params(network, trainable=True)
133 | updates = lasagne.updates.adam(loss, params, learning_rate=0.001)
134 |
135 | # Create a loss expression for validation/testing. The crucial difference
136 | # here is that we do a deterministic forward pass through the network,
137 | # disabling dropout layers.
138 | test_prediction = lasagne.layers.get_output(network, deterministic=True)
139 | test_loss = lasagne.objectives.categorical_crossentropy(test_prediction,
140 | target_var)
141 | test_loss = test_loss.mean()
142 | # As a bonus, also create an expression for the classification accuracy:
143 | test_acc = T.mean(T.eq(T.argmax(test_prediction, axis=1), target_var),
144 | dtype=theano.config.floatX)
145 |
146 | # Compile a function performing a training step on a mini-batch (by giving
147 | # the updates dictionary) and returning the corresponding training loss:
148 | train_fn = theano.function([input_var, target_var], loss, updates=updates)
149 |
150 | # Compile a second function computing the validation loss and accuracy:
151 | val_fn = theano.function([input_var, target_var], [test_loss, test_acc])
152 |
153 | # Compile a third function computing a prediction
154 | eval_fn = theano.function([input_var], [T.argmax(test_prediction, axis=1)])
155 |
156 | # Finally, launch the training loop.
157 | print("Starting training...")
158 | # We iterate over epochs:
159 | training_error = []
160 | test_error = []
161 | test_accuracy = []
162 | for epoch in range(num_epochs):
163 | # In each epoch, we do a full pass over the training data:
164 | train_err = 0
165 | train_batches = 0
166 | start_time = time.time()
167 | for batch in iterate_minibatches(X_train, y_train, 64, shuffle=True):
168 | inputs, targets = batch
169 | train_err += train_fn(inputs, targets)
170 | train_batches += 1
171 |
172 | # And a full pass over the validation data:
173 | val_err = 0
174 | val_acc = 0
175 | val_batches = 0
176 | for batch in iterate_minibatches(X_test, y_test, 500, shuffle=False):
177 | inputs, targets = batch
178 | err, acc = val_fn(inputs, targets)
179 | val_err += err
180 | val_acc += acc
181 | val_batches += 1
182 |
183 | # Then we print the results for this epoch:
184 | print("Epoch {} of {} took {:.3f}s".format(
185 | epoch + 1, num_epochs, time.time() - start_time))
186 | print(" training loss:\t\t{:.6f}".format(train_err / train_batches))
187 | print(" validation loss:\t\t{:.6f}".format(val_err / val_batches))
188 | print(" validation accuracy:\t\t{:.2f} %".format(
189 | val_acc / val_batches * 100))
190 |
191 | training_error.append(train_err / train_batches)
192 | test_error.append(val_err / val_batches)
193 | test_accuracy.append(val_acc / val_batches)
194 |
195 | data = X_test[123:133]
196 | labels = y_test[123:133]
197 | predicted = eval_fn(data)[0]
198 | show_images(data, predicted, labels, classes)
199 | fig, ax1 = plt.subplots()
200 | ax1.plot(training_error, color='b', label='Training error')
201 | ax1.plot(test_error, color='g', label='Test error')
202 | ax2 = ax1.twinx()
203 | ax2.plot(test_accuracy, color='r', label='Test accuracy')
204 | ax1.legend(loc='upper left', numpoints=1)
205 | ax2.legend(loc='upper right', numpoints=1)
206 | plt.xlabel("Epoch")
207 |
208 | plt.show()
209 |
210 |
211 |
212 | # Optionally, you could now dump the network weights to a file like this:
213 | # np.savez('model.npz', *lasagne.layers.get_all_param_values(network))
214 | #
215 | # And load them again later on like this:
216 | # with np.load('model.npz') as f:
217 | # param_values = [f['arr_%d' % i] for i in range(len(f.files))]
218 | # lasagne.layers.set_all_param_values(network, param_values)
219 | main(num_epochs=15)
220 |
--------------------------------------------------------------------------------
/facial_recognition/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/facial_recognition/__init__.py
--------------------------------------------------------------------------------
/facial_recognition/network.py:
--------------------------------------------------------------------------------
1 | import theano
2 | import facial_recognition.util as util
3 | from lasagne import layers
4 | from nolearn.lasagne import NeuralNet
5 | from facial_recognition.util import AdjustVariable
6 | from facial_recognition.util import EarlyStopping
7 | from facial_recognition.util import FlipBatchIterator
8 | from facial_recognition.util import float32
9 |
10 | try:
11 | from lasagne.layers.cuda_convnet import Conv2DCCLayer as Conv2DLayer
12 | from lasagne.layers.cuda_convnet import MaxPool2DCCLayer as MaxPool2DLayer
13 | except ImportError:
14 | Conv2DLayer = layers.Conv2DLayer
15 | MaxPool2DLayer = layers.MaxPool2DLayer
16 |
17 |
18 | def get_net():
19 | return NeuralNet(
20 | layers=[
21 | ('input', layers.InputLayer),
22 | ('conv1', Conv2DLayer),
23 | ('pool1', MaxPool2DLayer),
24 | ('dropout1', layers.DropoutLayer),
25 | ('conv2', Conv2DLayer),
26 | ('pool2', MaxPool2DLayer),
27 | ('dropout2', layers.DropoutLayer),
28 | ('conv3', Conv2DLayer),
29 | ('pool3', MaxPool2DLayer),
30 | ('dropout3', layers.DropoutLayer),
31 | ('hidden4', layers.DenseLayer),
32 | ('dropout4', layers.DropoutLayer),
33 | ('hidden5', layers.DenseLayer),
34 | ('output', layers.DenseLayer),
35 | ],
36 | input_shape=(None, 1, 96, 96),
37 | conv1_num_filters=32, conv1_filter_size=(3, 3), pool1_pool_size=(2, 2),
38 | dropout1_p=0.1,
39 | conv2_num_filters=64, conv2_filter_size=(2, 2), pool2_pool_size=(2, 2),
40 | dropout2_p=0.2,
41 | conv3_num_filters=128, conv3_filter_size=(2, 2), pool3_pool_size=(2, 2),
42 | dropout3_p=0.3,
43 | hidden4_num_units=1000,
44 | dropout4_p=0.5,
45 | hidden5_num_units=1000,
46 | output_num_units=30, output_nonlinearity=None,
47 |
48 | update_learning_rate=theano.shared(float32(0.03)),
49 | update_momentum=theano.shared(float32(0.9)),
50 |
51 | regression=True,
52 | batch_iterator_train=FlipBatchIterator(batch_size=128),
53 | on_epoch_finished=[
54 | AdjustVariable('update_learning_rate', start=0.03, stop=0.0001),
55 | AdjustVariable('update_momentum', start=0.9, stop=0.999),
56 | EarlyStopping(patience=200),
57 | ],
58 | max_epochs=3000,
59 | verbose=1,
60 | )
61 |
62 |
63 | def train_network(net, save_name=''):
64 | print("Loading data...")
65 | X, y = util.load2d(util.FTRAIN)
66 | print("Building network...")
67 | print("Started training...")
68 | net.fit(X, y)
69 | print("Finished training...")
70 | print("Saving network...")
71 | util.pickle_network(save_name + ".pkl", net)
72 | util.visualize_learning(net)
73 |
74 |
75 | def load_and_visualize_network(file):
76 | print("Loading data...")
77 | X, y = util.load2d(util.FTEST)
78 | print("Loading model...")
79 | net = util.unpickle_network(file)
80 | print("Finished training...")
81 | # util.visualize_learning(net)
82 | util.visualize_predictions(net)
83 |
84 | net = get_net()
85 |
86 | train_network(net, "net")
--------------------------------------------------------------------------------
/facial_recognition/util.py:
--------------------------------------------------------------------------------
1 | import pickle as pickle
2 | from nolearn.lasagne import BatchIterator
3 |
4 | from datetime import datetime
5 | from pandas import DataFrame
6 | from pandas.io.parsers import read_csv
7 | import numpy as np
8 | from PIL import Image
9 | from matplotlib import pyplot
10 | from scipy.ndimage.filters import convolve
11 | from math import ceil
12 | import theano.tensor as T
13 | import theano
14 | from lasagne.layers import get_output
15 | from scipy.ndimage import rotate
16 | import sys
17 | import os
18 | import zipfile
19 | if sys.version_info[0] == 2:
20 | from urllib import urlretrieve
21 | else:
22 | from urllib.request import urlretrieve
23 |
24 |
25 | FTRAIN = 'data/training.csv'
26 | FTEST = 'data/test.csv'
27 | FLOOKUP = 'data/IdLookupTable.csv'
28 |
29 |
30 | def float32(k):
31 | return np.cast['float32'](k)
32 |
33 |
34 | class RotateBatchIterator(BatchIterator):
35 | def transform(self, Xb, yb):
36 | Xb, yb = super(RotateBatchIterator, self).transform(Xb, yb)
37 |
38 | angle = np.random.randint(-10,11)
39 | Xb_rotated = rotate(Xb, angle, axes=(2, 3), reshape=False)
40 |
41 | return Xb_rotated, yb
42 |
43 |
44 | class PreSplitTrainSplit(object):
45 |
46 | def __init__(self, X_train, y_train, X_valid, y_valid):
47 | self.X_train = X_train
48 | self.y_train = y_train
49 | self.X_valid = X_valid
50 | self.y_valid = y_valid
51 |
52 | def __call__(self, X, y, net):
53 | return self.X_train, self.X_valid, self.y_train, self.y_valid
54 |
55 |
56 | class AdjustVariable(object):
57 | def __init__(self, name, start=0.03, stop=0.001):
58 | self.name = name
59 | self.start, self.stop = start, stop
60 | self.ls = None
61 |
62 | def __call__(self, nn, train_history):
63 | if self.ls is None:
64 | self.ls = np.linspace(self.start, self.stop, nn.max_epochs)
65 |
66 | epoch = train_history[-1]['epoch']
67 | if epoch >= nn.max_epochs:
68 | return
69 | new_value = float32(self.ls[epoch - 1])
70 | getattr(nn, self.name).set_value(new_value)
71 |
72 |
73 | def load_file(file):
74 |
75 | def url(file):
76 | if file is FTRAIN:
77 | return 'http://folk.ntnu.no/alfredvc/workshop/data/training.zip'
78 | if file is FTEST:
79 | return 'http://folk.ntnu.no/alfredvc/workshop/data/test.zip'
80 | if file is FLOOKUP:
81 | return 'http://folk.ntnu.no/alfredvc/workshop/data/test.zip'
82 |
83 | def zip(file):
84 | if file is FTRAIN:
85 | return 'data/training.zip'
86 | if file is FTEST:
87 | return 'data/test.zip'
88 |
89 | def download(file):
90 | print("Downloading %s" % file)
91 | urlretrieve(url(file), zip(file))
92 | print("Unzipping data %s" % file)
93 | if file is FTRAIN or file is FTEST:
94 | with zipfile.ZipFile(zip(file), "r") as z:
95 | z.extractall('data/')
96 | print("Deleting zip file " + zip(file))
97 | os.remove(zip(file))
98 |
99 | if not os.path.exists(file):
100 | download(file)
101 |
102 | return read_csv(file)
103 |
104 |
105 | def load(file_path):
106 | """Loads data from FTEST if *test* is True, otherwise from FTRAIN.
107 | Pass a list of *cols* if you're only interested in a subset of the
108 | target columns.
109 | """
110 |
111 | df = load_file(file_path)
112 |
113 | # The Image column has pixel values separated by space; convert
114 | # the values to numpy arrays:
115 | df['Image'] = df['Image'].apply(lambda im: np.fromstring(im, sep=' '))
116 |
117 | df = df.dropna() # drop all rows that have missing values in them
118 |
119 | X = np.vstack(df['Image'].values) / 255. # scale pixel values to [0, 1]
120 | X = X.astype(np.float32)
121 |
122 | if file_path is FTRAIN: # only FTRAIN has any target columns
123 | y = df[df.columns[:-1]].values
124 | y = (y - 48) / 48 # scale target coordinates to [-1, 1]
125 | y = y.astype(np.float32)
126 | else:
127 | y = None
128 |
129 | # print("X.shape == {}; X.min == {:.3f}; X.max == {:.3f}".format(
130 | # X.shape, X.min(), X.max()))
131 | # print("y.shape == {}; y.min == {:.3f}; y.max == {:.3f}".format(
132 | # y.shape, y.min(), y.max()))
133 |
134 | return X, y
135 |
136 |
137 | def load2d(file_path):
138 | X, y = load(file_path)
139 | X = X.reshape(-1, 1, 96, 96)
140 | return X, y
141 |
142 |
143 | def pickle_network(file_name, network):
144 | # in case the model is very big
145 | sys.setrecursionlimit(10000)
146 | with open(file_name, 'wb') as f:
147 | pickle.dump(network, f, -1)
148 |
149 |
150 | def unpickle_network(file_name):
151 | with open(file_name, 'rb') as f: # !
152 | return pickle.load(f)
153 |
154 |
155 | class EarlyStopping(object):
156 | def __init__(self, patience=100):
157 | self.patience = patience
158 | self.best_valid = np.inf
159 | self.best_valid_epoch = 0
160 | self.best_weights = None
161 |
162 | def __call__(self, nn, train_history):
163 | current_valid = train_history[-1]['valid_loss']
164 | current_epoch = train_history[-1]['epoch']
165 | if current_valid < self.best_valid:
166 | self.best_valid = current_valid
167 | self.best_valid_epoch = current_epoch
168 | self.best_weights = nn.get_all_params_values()
169 | elif self.best_valid_epoch + self.patience < current_epoch:
170 | print("Early stopping.")
171 | print("Best valid loss was {:.6f} at epoch {}.".format(
172 | self.best_valid, self.best_valid_epoch))
173 | nn.load_params_from(self.best_weights)
174 | raise StopIteration()
175 |
176 |
177 | class FlipBatchIterator(BatchIterator):
178 | flip_indices = [
179 | (0, 2), (1, 3),
180 | (4, 8), (5, 9), (6, 10), (7, 11),
181 | (12, 16), (13, 17), (14, 18), (15, 19),
182 | (22, 24), (23, 25),
183 | ]
184 |
185 | def transform(self, Xb, yb):
186 | Xb, yb = super(FlipBatchIterator, self).transform(Xb, yb)
187 |
188 | # Flip half of the images in this batch at random:
189 | bs = Xb.shape[0]
190 | indices = np.random.choice(bs, bs // 2, replace=False)
191 | Xb[indices] = Xb[indices, :, :, ::-1]
192 |
193 | if yb is not None:
194 | # Horizontal flip of all x coordinates:
195 | yb[indices, ::2] = yb[indices, ::2] * -1
196 |
197 | # Swap places, e.g. left_eye_center_x -> right_eye_center_x
198 | for a, b in self.flip_indices:
199 | yb[indices, a], yb[indices, b] = (
200 | yb[indices, b], yb[indices, a])
201 |
202 | return Xb, yb
203 |
204 |
205 | def plot_sample(x, y, axis):
206 | img = x.reshape(96, 96)
207 | axis.imshow(img, cmap='gray')
208 | axis.scatter(y[0::2] * 48 + 48, y[1::2] * 48 + 48, marker='x', s=10)
209 |
210 |
211 | def visualize_predictions(net):
212 | X, _ = load2d(FTEST)
213 | y_pred = net.predict(X)
214 |
215 | fig = pyplot.figure(figsize=(6, 6))
216 | fig.subplots_adjust(
217 | left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
218 |
219 | for i in range(16):
220 | ax = fig.add_subplot(4, 4, i + 1, xticks=[], yticks=[])
221 | plot_sample(X[i], y_pred[i], ax)
222 |
223 | pyplot.show()
224 |
225 |
226 | def load_and_plot_layer(layer):
227 | with open(layer, 'rb') as f:
228 | layer0 = np.load(f)
229 | fig = pyplot.figure()
230 | fig.subplots_adjust(
231 | left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
232 | for i in range(32):
233 | img = layer0[i, :, :]
234 | img -= np.min(img)
235 | img /= np.max(img) / 255.0
236 | ax = fig.add_subplot(4, 8, i + 1, xticks=[], yticks=[])
237 | ax.imshow(img, cmap='gray', interpolation='none')
238 | pyplot.show()
239 |
240 | def create_submition(net):
241 | X = load2d(FTEST)[0]
242 | y_pred = net.predict(X)
243 |
244 | y_pred2 = y_pred * 48 + 48
245 | y_pred2 = y_pred2.clip(0, 96)
246 |
247 | cols = ("left_eye_center_x","left_eye_center_y","right_eye_center_x","right_eye_center_y","left_eye_inner_corner_x","left_eye_inner_corner_y","left_eye_outer_corner_x","left_eye_outer_corner_y","right_eye_inner_corner_x","right_eye_inner_corner_y","right_eye_outer_corner_x","right_eye_outer_corner_y","left_eyebrow_inner_end_x","left_eyebrow_inner_end_y","left_eyebrow_outer_end_x","left_eyebrow_outer_end_y","right_eyebrow_inner_end_x","right_eyebrow_inner_end_y","right_eyebrow_outer_end_x","right_eyebrow_outer_end_y","nose_tip_x","nose_tip_y","mouth_left_corner_x","mouth_left_corner_y","mouth_right_corner_x","mouth_right_corner_y","mouth_center_top_lip_x","mouth_center_top_lip_y","mouth_center_bottom_lip_x","mouth_center_bottom_lip_y")
248 |
249 | df = DataFrame(y_pred2, columns=cols)
250 |
251 | lookup_table = load_file(FLOOKUP)
252 | values = []
253 |
254 | for index, row in lookup_table.iterrows():
255 | values.append((
256 | row['RowId'],
257 | df.ix[row.ImageId - 1][row.FeatureName],
258 | ))
259 |
260 | now_str = datetime.now().isoformat().replace(':', '-')
261 | submission = DataFrame(values, columns=('RowId', 'Location'))
262 | filename = 'submission-{}.csv'.format(now_str)
263 | submission.to_csv(filename, index=False)
264 | print("Wrote {}".format(filename))
265 |
266 | def visualize_learning(net):
267 | train_loss = np.array([i["train_loss"] for i in net.train_history_])
268 | valid_loss = np.array([i["valid_loss"] for i in net.train_history_])
269 | pyplot.plot(train_loss, linewidth=3, label="train")
270 | pyplot.plot(valid_loss, linewidth=3, label="valid")
271 | pyplot.grid()
272 | pyplot.legend()
273 | pyplot.xlabel("epoch")
274 | pyplot.ylabel("loss")
275 | ymax = max(np.max(valid_loss), np.max(train_loss))
276 | ymin = min(np.min(valid_loss), np.min(train_loss))
277 | pyplot.ylim(ymin * 0.8, ymax * 1.2)
278 | pyplot.yscale("log")
279 | pyplot.show()
280 |
281 | def conv(input, weights):
282 | return convolve(input, weights)
283 |
284 |
285 | def show_kernels(kernels, cols=8):
286 | rows = ceil(len(kernels)*1.0/cols)
287 | fig = pyplot.figure(figsize=(cols+2, rows+1))
288 |
289 | fig.subplots_adjust(
290 | left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
291 | for i in range(len(kernels)):
292 | img = np.copy(kernels[i])
293 | img -= np.min(img)
294 | img /= np.max(img)
295 | ax = fig.add_subplot(rows, cols, i + 1, xticks=[], yticks=[])
296 | ax.imshow(img, cmap='gray', interpolation='none')
297 | pyplot.axis('off')
298 | pyplot.show()
299 |
300 |
301 | def get_activations(layer, x):
302 | # compile theano function
303 | xs = T.tensor4('xs').astype(theano.config.floatX)
304 | get_activity = theano.function([xs], get_output(layer, xs))
305 |
306 | return get_activity(x)
307 |
308 |
309 | def show_images(list, cols=1):
310 | rows = ceil(len(list)*1.0/cols)
311 | fig = pyplot.figure(figsize=(cols+2, rows+1))
312 | fig.subplots_adjust(
313 | left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
314 | for i in range(len(list)):
315 | ax = fig.add_subplot(rows, cols, i+1, xticks=[], yticks=[])
316 | ax.imshow(list[i], cmap='gray')
317 | pyplot.axis('off')
318 | pyplot.show()
319 |
320 |
321 | def get_conv_weights(net):
322 | layers = net.get_all_layers()
323 | layercounter = 0
324 | w = []
325 | b = []
326 | for l in layers:
327 | if('Conv2DLayer' in str(type(l))):
328 | weights = l.W.get_value()
329 | biases = l.b.get_value()
330 | b.append(biases)
331 | weights = weights.reshape(weights.shape[0]*weights.shape[1],weights.shape[2],weights.shape[3])
332 | w.append(weights)
333 | layercounter += 1
334 | return w, b
335 |
336 |
337 | def load_image(file):
338 | x=Image.open(file,'r')
339 | x=x.convert('L')
340 | y=np.asarray(x.getdata(),dtype=np.float32).reshape((x.size[1],x.size[0]))
341 | return y
342 |
--------------------------------------------------------------------------------
/img/ConvolutionalNeuralNetworks_11_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/ConvolutionalNeuralNetworks_11_1.png
--------------------------------------------------------------------------------
/img/ConvolutionalNeuralNetworks_12_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/ConvolutionalNeuralNetworks_12_1.png
--------------------------------------------------------------------------------
/img/ConvolutionalNeuralNetworks_5_0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/ConvolutionalNeuralNetworks_5_0.png
--------------------------------------------------------------------------------
/img/ConvolutionalNeuralNetworks_7_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/ConvolutionalNeuralNetworks_7_1.png
--------------------------------------------------------------------------------
/img/ConvolutionalNeuralNetworks_9_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/ConvolutionalNeuralNetworks_9_2.png
--------------------------------------------------------------------------------
/img/cnn_layer.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/cnn_layer.png
--------------------------------------------------------------------------------
/img/convolution_schematic.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/convolution_schematic.gif
--------------------------------------------------------------------------------
/img/dropout.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/dropout.jpeg
--------------------------------------------------------------------------------
/img/face.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/face.jpg
--------------------------------------------------------------------------------
/img/mlp.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/mlp.png
--------------------------------------------------------------------------------
/img/obama.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/obama.jpg
--------------------------------------------------------------------------------
/img/overfitting.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/overfitting.png
--------------------------------------------------------------------------------
/img/pooling_schematic.gif:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/pooling_schematic.gif
--------------------------------------------------------------------------------
/img/shared_weights.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/shared_weights.png
--------------------------------------------------------------------------------
/img/sparse_connectivity.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/img/sparse_connectivity.png
--------------------------------------------------------------------------------
/sample_solution/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Alfredvc/cnn_workshop/294ced2ec3dd6e15b140cca45cfc234ad2eb02cb/sample_solution/__init__.py
--------------------------------------------------------------------------------
/sample_solution/sample_cnn.py:
--------------------------------------------------------------------------------
1 | import lasagne
2 |
3 |
4 | def build_cnn(input_var=None):
5 | network = lasagne.layers.InputLayer(shape=(None, 3, 32, 32),
6 | input_var=input_var)
7 | network = lasagne.layers.Conv2DLayer(
8 | network, num_filters=32, filter_size=(3, 3),
9 | nonlinearity=lasagne.nonlinearities.rectify,
10 | pad='same')
11 |
12 | network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
13 |
14 | network = lasagne.layers.DropoutLayer(network, p=.2)
15 |
16 | network = lasagne.layers.Conv2DLayer(
17 | network, num_filters=64, filter_size=(3, 3),
18 | nonlinearity=lasagne.nonlinearities.rectify,
19 | pad='same')
20 |
21 | network = lasagne.layers.DropoutLayer(network, p=.2)
22 |
23 | network = lasagne.layers.Conv2DLayer(
24 | network, num_filters=64, filter_size=(3, 3),
25 | nonlinearity=lasagne.nonlinearities.rectify,
26 | pad='same')
27 | network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2))
28 |
29 | network = lasagne.layers.DenseLayer(
30 | lasagne.layers.dropout(network, p=.5),
31 | num_units=512,
32 | nonlinearity=lasagne.nonlinearities.rectify)
33 |
34 | network = lasagne.layers.DenseLayer(
35 | lasagne.layers.dropout(network, p=.5),
36 | num_units=10,
37 | nonlinearity=lasagne.nonlinearities.softmax)
38 |
39 | return network
40 |
--------------------------------------------------------------------------------