├── .gitignore ├── A Semisupervised Approach for Language Identification based on Ladder Networks.ipynb ├── LICENSE ├── README.md ├── The dark knowledge of tongues.ipynb ├── fuel.ipynb ├── ladder.py ├── language-tree.jpg ├── nn.py ├── run.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | 5 | # C extensions 6 | *.so 7 | 8 | # Distribution / packaging 9 | .Python 10 | env/ 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | *.egg-info/ 23 | .installed.cfg 24 | *.egg 25 | 26 | # PyInstaller 27 | # Usually these files are written by a python script from a template 28 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 29 | *.manifest 30 | *.spec 31 | 32 | # Installer logs 33 | pip-log.txt 34 | pip-delete-this-directory.txt 35 | 36 | # Unit test / coverage reports 37 | htmlcov/ 38 | .tox/ 39 | .coverage 40 | .coverage.* 41 | .cache 42 | nosetests.xml 43 | coverage.xml 44 | *,cover 45 | 46 | # Translations 47 | *.mo 48 | *.pot 49 | 50 | # Django stuff: 51 | *.log 52 | 53 | # Sphinx documentation 54 | docs/_build/ 55 | 56 | # PyBuilder 57 | target/ 58 | -------------------------------------------------------------------------------- /A Semisupervised Approach for Language Identification based on Ladder Networks.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "The original dataset is made from training (15000 samples), dev (6431) and testing (6500) files. Only the 400 *i-vector* features where used. A process to whiten the entire dataset was applied before using the feature set $x_i$ and only the dev set was used to train the whitening parameters (see code suplied with the data by the competition organizers). Each sample is either unlabeled (all dev and testing samples) and we will label it as $y_i=0$ or is one of $n=50$ different categories $y_i \\in \\{1 \\ldots n \\}$ (all training samples.)" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "## Cross validation" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "In order to select hyper parameters of the model a modified [cross validation dataset was built](./fuel.ipynb) from the training dataset.\n", 29 | "\n", 30 | "In the modified dataset, the $n$ known original training labels are considered to be the entire label space of the modified dataset and from them a subset is assumed to be known.\n", 31 | "The other labeles are assumed to be out-of-set for the purpose of the modified dataset.\n", 32 | "The number of assumed known labels is such that the ratio of known and unknown labels in the modified set is:\n", 33 | "\n", 34 | "$Q = \\lfloor \\left( 1 - P_{\\text{oos}} \\right) * n \\rfloor = 38 \\quad P_{\\text{oos}} = 0.23$\n", 35 | "\n", 36 | "The labels of the modified dataset are re-indexed such that the labels assumed to be known are $y_i \\in \\{1 \\ldots Q \\}$\n", 37 | "\n", 38 | "A part, $1-r$, of the training data with labels assumed to be known is used for training as labeled data. 
The rest, $r$, of the samples with labels assumed to be known are mixed with $r$ of the rest of the training which has labels that are assumed to be out-of-set. The mix is used for training as unlabeled data.\n", 39 | "For having an the number of unlabeled samples to be $u=0.5$ from the number of labeled samples (the ratio between `dev` and `training` sizes):\n", 40 | "\n", 41 | "\n", 42 | "$r = Q*u/(50+Q*u)$\n", 43 | "\n", 44 | "The remaining $1-r$ samples with labels assumed to be unknown are dropped.\n", 45 | "\n", 46 | "Each of the steps above, in building the modified dataset, uses a random selection process. The process of creating a modified dataset can be repeated many times giving each label an opportunity to be out-of-set." 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "# Model Training" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "When training a model all samples are used, labeled and unlabeled. For cross validation, this is the modified dataset and for submission this is training and dev datasets, the test dataset is only used to make final prediction for submission.\n", 61 | "\n", 62 | "The model generates probability for each sample, $x_i$, to be out-of-set or in one of the categories. When doing cross validation the model will generate $Q+1=39$ categories and when training on the entire available data the model will generate $n+1=51$ categories. The label $l=0$ is used for out-of-set prediction (not to be confused with unlabeled sample.)\n", 63 | "\n", 64 | "$p(l) = p(l \\mid x_i) \\quad l \\in \\{ 0 \\ldots Q \\} \\quad \\text{or} \\quad l \\in \\{ 0 \\ldots n \\} $" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "## Final score" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "According to the [evaluation plan](http://www.nist.gov/itl/iad/mig/upload/lre_ivectorchallenge_rel_v2.pdf) of the competition, the goal is to minimize:\n", 79 | "\n", 80 | "$\\text{Cost} = \\frac{1-P_{\\text{oos}}}{n} * \\sum_{k=1}^n P_{\\text{error}}(k) + P_{\\text{oos}} * P_{\\text{error}}(\\text{oos}) \\qquad [1]$\n", 81 | "\n", 82 | "$P_{\\text{error}}(k) = \\left( \\frac{\\text{#errors_class_k}}{\\text{#trials_class_k}} \\right) $" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "In the cross validation stage we can compute this cost directly, by replacing $n$ with $Q$, and using the information we have on the validation part of the modified dataset. We will use this score to select the best hyper parameters." 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "## Loss function" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "The training process optimize the model internal parameters (weights) minimizing a loss function. We describe the loss function used in cross validation training, when training for a submission, substitute $Q=n$.\n", 104 | "\n", 105 | "The loss is computed as a sum of loss on batches of samples. Each batch has ($N=1024$) samples. For each sample, $x_i$, the loss function accepts as input the $Q+1$ probabilities, $p(l \\mid x_i)$ from the model and the label information, $y_i$. Note that $p(0 \\mid x_i)$ gives the probability of the model to out-of-set label and $y_i = 0$ is used to indicate that the sample $x_i$ is not labeled." 
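    "\n",
    "As a rough illustration only, here is a minimal NumPy sketch of the batch loss [2] whose terms are defined in the next cell. The names used here (`batch_loss`, `p` as the $N \\times (Q+1)$ matrix of predicted probabilities, `y` as the label vector) are assumptions for this sketch and not part of the competition code; the loss actually used for training is the Theano `objective` function in `ladder.py`. The sketch also assumes the batch contains both labeled and unlabeled samples.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def batch_loss(p, y, Q, beta=0.15, gamma=0.01, P_oos=0.23, eps=1e-6):\n",
    "    # a-priori distribution P^a: P_oos for label 0, (1-P_oos)/Q for labels 1..Q\n",
    "    P_a = np.full(Q + 1, (1.0 - P_oos) / Q)\n",
    "    P_a[0] = P_oos\n",
    "    labeled = (y >= 1) & (y <= Q)\n",
    "    unlabeled = ~labeled\n",
    "    # cross_entropy: average of -log p(y_i | x_i) over the labeled samples\n",
    "    idx = np.flatnonzero(labeled)\n",
    "    cross_entropy = -np.log(p[idx, y[idx]] + eps).mean()\n",
    "    # aprior_average_cross_entropy: average the unlabeled predictions first,\n",
    "    # then take the cross entropy of that average with P^a\n",
    "    p_bar = p[unlabeled].mean(axis=0)\n",
    "    aprior_average_ce = -(P_a * np.log(p_bar + eps)).sum()\n",
    "    # binary_cross_entropy on p(0 | x_i): labeled samples are known to be in-set,\n",
    "    # unlabeled samples are out-of-set with probability P_oos\n",
    "    p0 = np.clip(p[:, 0], eps, 1.0 - eps)\n",
    "    bce = (P_oos * np.log(p0[unlabeled]) + (1.0 - P_oos) * np.log(1.0 - p0[unlabeled])).sum()\n",
    "    bce += np.log(1.0 - p0[labeled]).sum()\n",
    "    binary_cross_entropy = -bce / len(y)\n",
    "    return cross_entropy + beta * aprior_average_ce + gamma * binary_cross_entropy\n",
    "```"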
106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "The loss of a batch is made from several parts:\n", 113 | "\n", 114 | "$\\text{loss} = \\text{cross_entopy} + \\beta \\cdot \\text{aprior_average_cross_entropy} + \\gamma \\cdot \\text{binary_cross_entropy} \\qquad [2]$\n", 115 | "\n", 116 | "where $\\beta$ and $\\gamma$ are hyper-parameters. After running cross validation tests the values $\\beta=0.15$ and $\\gamma=0.01$ were selected.\n", 117 | "\n", 118 | "### cross entropy\n", 119 | "for the labeled samples in the batch the loss is\n", 120 | "\n", 121 | "$\\text{cross_entopy} = \\frac{1}{N_l} \\sum_{i : y_i \\in \\{1 \\ldots Q \\}} -\\log p(y_i \\mid x_i)$\n", 122 | "\n", 123 | "were $N_l$ is the number of labeled samples in the batch\n", 124 | "\n", 125 | "$N_l = \\sum_{i : y_i \\in \\{1 \\ldots Q \\}} 1$\n", 126 | "\n", 127 | "### aprior cross entropy\n", 128 | "Aprior, we assume that the predicted probabilities of unlabeled samples should have the distribution:\n", 129 | "\n", 130 | "$P^a (0) = P_\\text{oos} \\quad P^a (l) = \\frac{1-P_\\text{oos}}{Q} \\quad \\forall l \\in \\{1 \\ldots Q \\}$\n", 131 | "\n", 132 | "This distribution is correct for the cross validation modified dataset and we assume it is correct for the dev dataset.\n", 133 | "\n", 134 | "Armed with the apriori distribution, we can add a loss term which measure the cross entropy between predictions made on unlabeled samples and this\n", 135 | "apriori distribution:\n", 136 | "\n", 137 | "$\\text{aprior_cross_entropy} = \\frac{1}{N_u} \\sum_{i : y_i \\notin \\{1 \\ldots Q \\}} -P^a(l)\\log(p(l \\mid x_i))$\n", 138 | "\n", 139 | "were $N_u$ is the number of labeled samples in the batch\n", 140 | "\n", 141 | "$N_u = \\sum_{i : y_i \\notin \\{1 \\ldots Q \\}} 1$\n", 142 | "\n", 143 | "### aprior average cross entropy\n", 144 | "However it was found that a much better result is achieved by first averaging all the predicted probabilities over the unlabeled samples in the batch and only then \n", 145 | "measuring its cross entropy with the aprior probability:\n", 146 | "\n", 147 | "$\\bar{p}(l) = \\frac{1}{N_u} \\sum_{i : y_i \\notin \\{1 \\ldots Q \\}} p(l \\mid x_i) \\\\\n", 148 | "\\text{aprior_average_cross_entropy} = - \\sum_{l=0}^Q P^a (l) \\log(\\bar{p}(l \\mid x_i))$\n", 149 | "\n", 150 | "### aprior average Dirichlet\n", 151 | "\n", 152 | "$C_2 = -􀀀p_\\text{oos} \\log p_\\text{av}(\\text{oos}) 􀀀- \\frac{1 - p_\\text{oos}}{k} \\sum_{i=1}^k \\log p_\\text{av}(i)$\n", 153 | "\n", 154 | "changes to \n", 155 | "\n", 156 | "$\\text{NLLK}(p_\\text{av}) = -(\\alpha_\\text{oos} - 1) \\log p_\\text{av}(\\text{oos}) 􀀀- \\sum_{i=1}^k (\\alpha_i - 1) \\log p_\\text{av}(i) \\quad + \\text{constant}$\n", 157 | "\n", 158 | "such that \n", 159 | "\n", 160 | "$p_\\text{oos} = \\frac{\\alpha_\\text{oos}}{\\alpha_\\text{sum}} \\qquad \\frac{1 - p_\\text{oos}}{k} = \\frac{\\alpha_i}{\\alpha_\\text{sum}}$\n", 161 | "\n", 162 | "where\n", 163 | "\n", 164 | "$\\alpha_\\text{sum} = \\alpha_\\text{oos} + \\sum_{i=1}^k \\alpha_i$\n", 165 | "\n", 166 | "redefine $C_2$ as\n", 167 | "\n", 168 | "$C_2 = -(􀀀p_\\text{oos} - \\delta) \\log p_\\text{av}(\\text{oos}) 􀀀- \\left( \\frac{1 - p_\\text{oos}}{k} - \\delta \\right) \\sum_{i=1}^k \\log p_\\text{av}(i)$\n", 169 | "\n", 170 | "where $\\alpha_\\text{sum}$ is moved outside into $C_2$ scale factor $\\alpha$ and $\\delta = 1/\\alpha_\\text{sum}$\n", 171 | "\n", 172 | "\n", 173 | "### binary cross entropy\n", 174 | "We will use $p(0 \\mid x_i)$ 
to predict if $x_i$ is out-of-set or not. If $x_i$ happens to be a labeled sample, we know it is not out-of-set and if it is unlabeled we know there is $P_\\text{oos}$ chance that it is out-of-set.\n", 175 | "Again this is something which is true for the corss validation modified dataset and assumed to be true for the dev dataset:\n", 176 | "\n", 177 | "$\\text{binary_cross_entropy} = -\\frac{1}{N} \\left[ \\sum_{i:y_i \\notin \\{1 \\ldots Q \\}} \\left( P_\\text{oos} \\log(p_0(i)) + (1-P_\\text{oos}) \\log(p_1(i)) \\right) + \\sum_{i:y_i \\in \\{1 \\ldots Q \\}} \\log(p_1(i)) \\right]$\n", 178 | "\n", 179 | "were\n", 180 | "\n", 181 | "$p_0(i) = p(0 \\mid x_i) \\quad p_1(i) = 1-p_0(i)$" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "# Model" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "The loss function we use [2] is applied to all available data: training and dev datasets. However the strongest signal is from the training (labeled) part and effectively we are in a situation in which 1/3 of the available data is unlabeled. It is therefore beneficial to use semi-supervised technique which will utilize the information available in all the data and not just in the training set." 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "Predictions, $p(y_i \\mid x_i)$, are made using a modified [Ladder Network](http://arxiv.org/abs/1507.02672). The original Ladder Network [code](https://github.com/CuriousAI/ladder) was slightly modified. The code was modified to accept the training and dev data of the competition and was used in its entire both for supervised and unsupervised parts of the ladder method. The objective function used in computing the cost of the supervised part of the ladder method was replaced from a simple Cross Entropy to the loss function [2]. In addition, the error rate [1] was monitored while training on cross-validation dataset to determine the optimal number of epochs for training. The setup used for training that gave the best results are as follows:\n", 203 | "\n", 204 | "```bash\n", 205 | "python run.py train --lr 1e-3 --labeled-samples 21431 --unlabeled-samples 21431 --encoder-layers 500-500-500-100-51 --decoder-spec gauss,relu,relu,relu,relu,relu --denoising-cost-x 1,1,.3,.3,.3,.3 --dseed 0 --seed 2 --super-noise-std 0.5 --f-local-noise-std 0.5 --batch-size 1024 --valid-batch-size 1024 --num-epochs 1000 --dataset 160111-fuel.test -- test.\n", 206 | "```\n", 207 | "\n", 208 | "The interpretation of each of the parameters is as follows:\n", 209 | "\n", 210 | "parameter | value | description\n", 211 | "--- | --- | ---\n", 212 | "dataset | 160111-fuel.test | Both training and dev datasets were used as input. For cross validation this was changed to `160111-fuel.train`\n", 213 | "labeled-samples | 21431 | All samples in training and dev were used for training the supervised part of the ladder method. This is made possible because the modified loss function has a part which is applied on unlabeled samples. For cross validation this was modified to `10000` and the rest of the modified dataset was used for validation\n", 214 | "unlabeled-samples | 21431 | All samples in training and dev were used in the unsupervised parts of the ladder method. 
For cross validation this was modified to `10000`\n", 215 | "encoder-layers | 500-500-500-100-51 | The network has an input of dimension 400 which pass through 4 hidden layers of size 500, 500, 500 and 100 and a final output layer of 51. For cross validation this was modified to 39.\n", 216 | "decoder-spec | gauss,relu,relu,relu,relu,relu | A direct skip of information from the encoder to the decoder was used only on the input layer using the gaussian method described in ladder paper.\n", 217 | "denoising-cost-x | 1,1,.3,.3,.3,.3 | The L2 error of the de-noising layers compared with an un-noised clean encoder was weighted with a weight of 1 for the input layer and the first hidden layer and 0.3 for all other layers.\n", 218 | "super-noise-std | 0.5 | std of gaussian noise added to the input of the courrputed encoder\n", 219 | "f-local-noise-std | 0.5 | std of gaussian noise added to output of all layers courrputed encoder\n", 220 | "lr | 1e-3 | Learning rate\n", 221 | "num-epochs | 1000 | Number of epoch iterations for which training was made. Before each iteration the order of the samples was shuffled. It turns out that because of the unsupervised learning the ladder method is insensitive to the number of epochs and having between 800 to 2000 epoch iterations would give similar results\n", 222 | "batch-size | 1024| batch size used for training. this size has a secondary effect through the loss function which performed an average of predictions before computing the loss\n", 223 | "lrate-decay | 0.67 (default) | the learning rate starts to decay linearly to zero after passing 0.67 of the epoch iterations\n", 224 | "act | relu (default) | the activation of the encoder layers except for the last layer which is always softmax" 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "# Results" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "the results where measured by generating predictions on the test dataset using the model found in the training process. The prediction were then submitted to the competition web site which used an unknown subset of 30% of the samples to compute a score for the PROGESS SET (results for the 70% eval set are not reported by the web site.) " 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "Score | Description | Command line\n", 246 | "--- | --- | ---\n", 247 | "24.000 | The best configuration which was described above. This would have been translated to 11th place while the competition was in progress | --lr 1e-3 --labeled-samples 21431 --unlabeled-samples 21431 --encoder-layers 500-500-500-100-51 --decoder-spec gauss,relu,relu,relu,relu,relu --denoising-cost-x 1,1,.3,.3,.3,.3 --dseed 0 --seed 2 --super-noise-std 0.5 --f-local-noise-std 0.5 --batch-size 1024 --valid-batch-size 1024 --num-epochs 1000\n", 248 | "31.487 | In this configuration the unsupervisied part of the ladder algorithm is disabled. 
An early stopping after 138 epochs was needed to avoid overfiting | --lr 1e-3 --labeled-samples 21431 --unlabeled-samples 21431 --encoder-layers 500-500-500-100-51 --decoder-spec gauss,relu,relu,relu,relu,relu --denoising-cost-x 0,0,0,0,0,0 --decoder-spec 0-0-0-0-0-0 --dseed 0 --seed 2 --super-noise-std 0.5 --f-local-noise-std 0.5 --batch-size 1024 --valid-batch-size 1024 --num-epochs 138" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "To generate a submission file identify the directory in which the training stored its results. This is a subdirectory under `./results/` the subdirectory name has prefix determined by the last argument in the command line. In the example given above the prefix is `test.`. The suffix of the subdirectory is a number which is incremented after every training run. Below I assume that all of this results in `results/test.0`\n", 256 | "\n", 257 | "You then generate predictions with\n", 258 | "```bash\n", 259 | "run.py dump --layer -1 -- results/test.0\n", 260 | "```" 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "The submission is made from the predictions on the `test` part of the dataset file (last 6500 samples) that are saved in `bz2` file which can be submitted to the web site" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": null, 273 | "metadata": { 274 | "collapsed": true 275 | }, 276 | "outputs": [], 277 | "source": [ 278 | "import numpy as np\n", 279 | "import bz2\n", 280 | "\n", 281 | "yprob = np.load('results/test.0/layer-1.npy'%t)\n", 282 | "y_pred = np.argmax(yprob,axis=1)\n", 283 | "fn = 'submission.txt.bz2'\n", 284 | "with bz2.BZ2File('data/%s'%fn, 'w') as f:\n", 285 | " for i in y_pred[-6500:]:\n", 286 | " f.write('%s\\n' % idx2lang[i])" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "# Reference" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "1. https://ivectorchallenge.nist.gov/\n", 301 | "2. http://www.nist.gov/itl/iad/mig/upload/lre_ivectorchallenge_rel_v2.pdf\n", 302 | "2. http://arxiv.org/abs/1507.02672\n", 303 | "3. https://github.com/CuriousAI/ladder\n", 304 | "3. 
http://arxiv.org/abs/1511.06430v3" 305 | ] 306 | } 307 | ], 308 | "metadata": { 309 | "kernelspec": { 310 | "display_name": "Python 2", 311 | "language": "python", 312 | "name": "python2" 313 | }, 314 | "language_info": { 315 | "codemirror_mode": { 316 | "name": "ipython", 317 | "version": 2 318 | }, 319 | "file_extension": ".py", 320 | "mimetype": "text/x-python", 321 | "name": "python", 322 | "nbconvert_exporter": "python", 323 | "pygments_lexer": "ipython2", 324 | "version": "2.7.11" 325 | } 326 | }, 327 | "nbformat": 4, 328 | "nbformat_minor": 0 329 | } 330 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016 Ehud Ben-Reuven 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Twitter followers 2 | 3 | This repository contains source code for the experiments in a paper titled [A Semisupervised Approach for Language Identification based on Ladder Networks](http://arxiv.org/pdf/1604.00317v1.pdf) 4 | 5 | In 2015 NIST conducted a [LRE i-vector challenge](https://ivectorchallenge.nist.gov/evaluations/2). 6 | The challenge was to identify which language is spoken from a speech sample, given that the language belongs 7 | to one of 50 given language or is one of out-of-set languages. 8 | The speech samples were already processed into `i-vectors` and duration information. 9 | The data was split into `training`, `dev` and `test`. 10 | The `training` data included labeled samples from the 50 given languages. 11 | The `dev` data included unlabeled samples from both the 50 given languages and the out-of-set languages. 12 | The `test` was similar to `dev` but it could have been only used for making submissions to the competition. 13 | 14 | * [our solution](./A%20Semisupervised%20Approach%20for%20Language%20Identification%20based%20on%20Ladder%20Networks.ipynb) used a modification of the [Ladder Network](http://arxiv.org/abs/1507.02672) and [published code](https://github.com/CuriousAI/ladder). 15 | * [The dark knowledge of tongues](./The%20dark%20knowledge%20of%20tongues.ipynb), fun with the i-vector dataset supplied by the challenge. 
16 | * [Odyssey 2016, video lecture](https://www.superlectures.com/odyssey2016/a-semisupervised-approach-for-language-identification-based-on-ladder-networks) 17 | -------------------------------------------------------------------------------- /fuel.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# fuel" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": { 14 | "collapsed": true 15 | }, 16 | "outputs": [], 17 | "source": [ 18 | "import bz2\n", 19 | "import csv\n", 20 | "import numpy as np\n", 21 | "import sys" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 2, 27 | "metadata": { 28 | "collapsed": false 29 | }, 30 | "outputs": [ 31 | { 32 | "name": "stderr", 33 | "output_type": "stream", 34 | "text": [ 35 | "Using gpu device 0: GeForce GTX 980 (CNMeM is disabled)\n" 36 | ] 37 | }, 38 | { 39 | "data": { 40 | "text/plain": [ 41 | "'/Users/udi/Downloads/lisa'" 42 | ] 43 | }, 44 | "execution_count": 2, 45 | "metadata": {}, 46 | "output_type": "execute_result" 47 | } 48 | ], 49 | "source": [ 50 | "import fuel, os\n", 51 | "fuel_path = fuel.config.data_path[0]\n", 52 | "fuel_path" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 3, 58 | "metadata": { 59 | "collapsed": true 60 | }, 61 | "outputs": [], 62 | "source": [ 63 | "base = 'data/r146_1_1/ivec15-lre/'" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 4, 69 | "metadata": { 70 | "collapsed": true 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "def load_ivectors(filename):\n", 75 | " \"\"\"Loads ivectors\n", 76 | "\n", 77 | " Parameters\n", 78 | " ----------\n", 79 | " filename : string\n", 80 | " Path to ivector files (e.g. 
dev_ivectors.csv)\n", 81 | "\n", 82 | " Returns\n", 83 | " -------\n", 84 | " ids : list\n", 85 | " List of ivectorids\n", 86 | " durations : array, shaped('n_ivectors')\n", 87 | " Array of durations for each ivectorid\n", 88 | " languages : array, shaped('n_ivectors')\n", 89 | " Array of langs for each ivectorid (only applies to train)\n", 90 | " ivectors : array, shaped('n_ivectors', 600)\n", 91 | " Array of ivectors for each ivectorid\n", 92 | " \"\"\"\n", 93 | " ids = []\n", 94 | " durations = []\n", 95 | " languages = []\n", 96 | " ivectors = []\n", 97 | " with open(filename, 'rb') as infile:\n", 98 | " reader = csv.reader(infile, delimiter='\\t')\n", 99 | " reader.next()\n", 100 | "\n", 101 | " for row in csv.reader(infile, delimiter='\\t'):\n", 102 | " ids.append(row[0])\n", 103 | " durations.append(float(row[1]))\n", 104 | " languages.append(row[2])\n", 105 | " ivectors.append(np.asarray(row[3:], dtype=np.float32))\n", 106 | "\n", 107 | " sys.stdout.write(\"\\r %s \" % row[0])\n", 108 | " sys.stdout.flush()\n", 109 | "\n", 110 | " print \"\\n I- Adding Transformed ivectors \"\n", 111 | "\n", 112 | " return ids, np.array(durations, dtype=np.float32), np.array(languages), np.vstack(ivectors)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 5, 118 | "metadata": { 119 | "collapsed": false 120 | }, 121 | "outputs": [ 122 | { 123 | "name": "stdout", 124 | "output_type": "stream", 125 | "text": [ 126 | " ivec15-lre_zzzzabb \n", 127 | " I- Adding Transformed ivectors \n" 128 | ] 129 | } 130 | ], 131 | "source": [ 132 | "train_ids, train_durations, train_languages, train_ivec = load_ivectors(base+'data/ivec15_lre_train_ivectors.tsv')" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 6, 138 | "metadata": { 139 | "collapsed": false 140 | }, 141 | "outputs": [ 142 | { 143 | "data": { 144 | "text/plain": [ 145 | "15000" 146 | ] 147 | }, 148 | "execution_count": 6, 149 | "metadata": {}, 150 | "output_type": "execute_result" 151 | } 152 | ], 153 | "source": [ 154 | "Nt = len(train_ivec)\n", 155 | "Nt" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 7, 161 | "metadata": { 162 | "collapsed": false 163 | }, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | " ivec15-lre_zzyykqa \n", 170 | " I- Adding Transformed ivectors \n" 171 | ] 172 | }, 173 | { 174 | "data": { 175 | "text/plain": [ 176 | "6431" 177 | ] 178 | }, 179 | "execution_count": 7, 180 | "metadata": {}, 181 | "output_type": "execute_result" 182 | } 183 | ], 184 | "source": [ 185 | "dev_ids, dev_durations, dev_languages, dev_ivec = load_ivectors(base+'data/ivec15_lre_dev_ivectors.tsv')\n", 186 | "len(dev_ids)" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 8, 192 | "metadata": { 193 | "collapsed": false 194 | }, 195 | "outputs": [ 196 | { 197 | "name": "stdout", 198 | "output_type": "stream", 199 | "text": [ 200 | " ivec15-lre_zzshxfc \n", 201 | " I- Adding Transformed ivectors \n" 202 | ] 203 | }, 204 | { 205 | "data": { 206 | "text/plain": [ 207 | "6500" 208 | ] 209 | }, 210 | "execution_count": 8, 211 | "metadata": {}, 212 | "output_type": "execute_result" 213 | } 214 | ], 215 | "source": [ 216 | "test_ids, test_durations, test_languages, test_ivec = load_ivectors(base + 'data/ivec15_lre_test_ivectors.tsv')\n", 217 | "len(test_ids)" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "compute the mean and whitening 
transformation over dev set only. You are not allowed to use test and train does not have all languages" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 10, 230 | "metadata": { 231 | "collapsed": true 232 | }, 233 | "outputs": [], 234 | "source": [ 235 | "m = np.mean(dev_ivec, axis=0)\n", 236 | "S = np.cov(dev_ivec, rowvar=0)\n", 237 | "D, V = np.linalg.eig(S)\n", 238 | "W = (1 / np.sqrt(D) * V).transpose().astype('float32')" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "center and whiten" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 11, 251 | "metadata": { 252 | "collapsed": true 253 | }, 254 | "outputs": [], 255 | "source": [ 256 | "all_durations = np.hstack((train_durations,dev_durations,test_durations))\n", 257 | "all_data = np.vstack((train_ivec,dev_ivec,test_ivec))" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 12, 263 | "metadata": { 264 | "collapsed": true 265 | }, 266 | "outputs": [], 267 | "source": [ 268 | "all_data = np.dot(all_data - m, W.transpose())" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "convert labels to int. 'out_of_set' is 0" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": 15, 281 | "metadata": { 282 | "collapsed": false 283 | }, 284 | "outputs": [ 285 | { 286 | "name": "stdout", 287 | "output_type": "stream", 288 | "text": [ 289 | "upload: data/160111-fuel.idx2lang.pkl to s3://udikaggle/nist/160111-fuel.idx2lang.pkl\r\n" 290 | ] 291 | } 292 | ], 293 | "source": [ 294 | "idx2lang = dict(enumerate(['out_of_set']+sorted(np.unique(train_languages))))\n", 295 | "lang2idx = dict((l,i) for i,l in idx2lang.iteritems())\n", 296 | "import cPickle as pickle\n", 297 | "with open('data/160111-fuel.idx2lang.pkl','wb') as fp:\n", 298 | " pickle.dump(idx2lang,fp)\n", 299 | "!aws s3 cp data/160111-fuel.idx2lang.pkl s3://udikaggle/nist/" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 16, 305 | "metadata": { 306 | "collapsed": true 307 | }, 308 | "outputs": [], 309 | "source": [ 310 | "X = all_data\n", 311 | "y = np.array(map(lambda l: lang2idx[l], train_languages))" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "mark all data not coming from training set as out of set" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 17, 324 | "metadata": { 325 | "collapsed": true 326 | }, 327 | "outputs": [], 328 | "source": [ 329 | "y = np.hstack((y,lang2idx['out_of_set']*np.ones(len(X)-len(y),dtype=int)))" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 18, 335 | "metadata": { 336 | "collapsed": false 337 | }, 338 | "outputs": [ 339 | { 340 | "data": { 341 | "text/plain": [ 342 | "['/Users/udi/Downloads/lisa']" 343 | ] 344 | }, 345 | "execution_count": 18, 346 | "metadata": {}, 347 | "output_type": "execute_result" 348 | } 349 | ], 350 | "source": [ 351 | "import fuel\n", 352 | "fuel.config.data_path" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": 19, 358 | "metadata": { 359 | "collapsed": false 360 | }, 361 | "outputs": [ 362 | { 363 | "data": { 364 | "text/plain": [ 365 | "'/Users/udi/Downloads/lisa/160111-fuel.test/160111-fuel.test.hdf5'" 366 | ] 367 | }, 368 | "execution_count": 19, 369 | "metadata": {}, 370 | "output_type": "execute_result" 371 | } 372 | ], 373 | "source": [ 374 | "import 
os\n", 375 | "from fuel.datasets.hdf5 import H5PYDataset\n", 376 | "datasource = '160111-fuel.test'\n", 377 | "datasource_dir = os.path.join(fuel.config.data_path[0], datasource)\n", 378 | "datasource_fname = os.path.join(datasource_dir , datasource + '.hdf5')\n", 379 | "datasource_fname" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 20, 385 | "metadata": { 386 | "collapsed": true 387 | }, 388 | "outputs": [], 389 | "source": [ 390 | "!mkdir -p {datasource_dir}" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 21, 396 | "metadata": { 397 | "collapsed": false 398 | }, 399 | "outputs": [ 400 | { 401 | "name": "stdout", 402 | "output_type": "stream", 403 | "text": [ 404 | "-rw-r--r-- 1 udi staff 44915848 Jan 11 17:30 /Users/udi/Downloads/lisa/160111-fuel.test/160111-fuel.test.hdf5\r\n" 405 | ] 406 | } 407 | ], 408 | "source": [ 409 | "import h5py\n", 410 | "N, NF = X.shape\n", 411 | "with h5py.File(datasource_fname, mode='w') as fp:\n", 412 | " features = fp.create_dataset('features', (N, NF), dtype=np.float32)\n", 413 | " targets = fp.create_dataset('targets', (N,), dtype='int')\n", 414 | " features[...] = X.astype(np.float32)\n", 415 | " targets[...] = y\n", 416 | " from fuel.datasets.hdf5 import H5PYDataset\n", 417 | " split_dict = {\n", 418 | " 'train': {'features': (0, N), 'targets': (0, N)},\n", 419 | " 'test': {'features': (0, N), 'targets': (0, N)}\n", 420 | " }\n", 421 | " fp.attrs['split'] = H5PYDataset.create_split_array(split_dict)\n", 422 | "!ls -l {datasource_fname}" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "the samples are not shuffled" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "## simulate training" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": 24, 442 | "metadata": { 443 | "collapsed": true 444 | }, 445 | "outputs": [], 446 | "source": [ 447 | "import random\n", 448 | "\n", 449 | "def cv_modify(X, y, Q, seed=None, oos_labels=None, unlabel_label_ratio=0.5):\n", 450 | " assert Q < 50, \"Q has to be smaller than 50, try 38\"\n", 451 | " assert np.all(y>0), \"unlabeled data\"\n", 452 | " \n", 453 | " if seed is not None:\n", 454 | " random.seed(seed)\n", 455 | " np.random.seed(seed)\n", 456 | " \n", 457 | " oos_size = 50 - Q\n", 458 | " if oos_labels is None:\n", 459 | " oos_labels = random.sample(range(1,51), oos_size)\n", 460 | " else:\n", 461 | " n = len(oos_labels)\n", 462 | " assert n <= oos_size\n", 463 | " assert len(set(oos_labels)) == n\n", 464 | " assert all(0 < s <= 50 for s in oos_labels)\n", 465 | " if n < oos_size:\n", 466 | " oos_labels += random.sample(set(range(1,51)) - set(oos_labels), oos_size - n)\n", 467 | "\n", 468 | " # for each label build a map such that the known labels are at the start followed by the unknown labels\n", 469 | " label_map = [0] + sorted(set(range(1,51)) - set(oos_labels)) + sorted(oos_labels)\n", 470 | " y_train = np.array([label_map.index(yy) for yy in y])\n", 471 | "\n", 472 | " # index of all samples that are out-of-set\n", 473 | " oos = [i for i, yy in enumerate(y_train) if yy > Q or yy == 0]\n", 474 | " # index of all samples that are in-set\n", 475 | " in_set = [i for i, yy in enumerate(y_train) if 0 < yy <= Q]\n", 476 | " \n", 477 | " # take a part, r, of the samples that are in-set to be unlabeled, and leave 1-r\n", 478 | " # eventually the unlabeld set will be made from Q/50 in-set samples and 1-Q/50 oos samples\n", 479 | 
" # eventually the unlabeled size will be r*50/Q and we want\n", 480 | " # (1-r)*unlabel_label_ratio = r*50/Q\n", 481 | " # unlabel_label_ratio = r(50/Q + unlabel_label_ratio)\n", 482 | " # r = unlabel_label_ratio/(50./Q + unlabel_label_ratio)\n", 483 | " # r = Q*unlabel_label_ratio/(50. + Q*unlabel_label_ratio)\n", 484 | " Qu = Q*unlabel_label_ratio\n", 485 | " r = Qu/(50. + Qu)\n", 486 | " in_set_unlabeled = random.sample(in_set, int(len(in_set)*r))\n", 487 | " # the other half will be used as labeled\n", 488 | " in_set_labeled = list(set(in_set) - set(in_set_unlabeled))\n", 489 | " # give the unlabeled samples that are in-set have a high label (so the training will consider them to be unlabeled)\n", 490 | " # but keep their original identity for error measurement\n", 491 | " y_train[in_set_unlabeled] += 50\n", 492 | "\n", 493 | " # add out-of-set samples to the unlabeled set keeping the ratio to labeled as before\n", 494 | " oos_unlabeled = random.sample(oos,int(len(oos)*r))\n", 495 | "\n", 496 | " unlabeled = oos_unlabeled+in_set_unlabeled\n", 497 | "\n", 498 | " # all other (oos) samples are dropped (too bad but we want to keep the original ratios)\n", 499 | " keep = in_set_labeled + unlabeled\n", 500 | " random.shuffle(keep)\n", 501 | "\n", 502 | " y_train = y_train[keep]\n", 503 | " X_train = X[keep]\n", 504 | " return X_train, y_train, label_map" 505 | ] 506 | }, 507 | { 508 | "cell_type": "code", 509 | "execution_count": 25, 510 | "metadata": { 511 | "collapsed": true 512 | }, 513 | "outputs": [], 514 | "source": [ 515 | "Q=38\n", 516 | "poos=0.23" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": 26, 522 | "metadata": { 523 | "collapsed": true 524 | }, 525 | "outputs": [], 526 | "source": [ 527 | "all_oos = []" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 27, 533 | "metadata": { 534 | "collapsed": false 535 | }, 536 | "outputs": [ 537 | { 538 | "name": "stdout", 539 | "output_type": "stream", 540 | "text": [ 541 | "0 12391 /Users/udi/Downloads/lisa/160111-fuel.train.0/160111-fuel.train.0.hdf5\n", 542 | "-rw-r--r-- 1 udi staff 19928000 Jan 11 18:02 /Users/udi/Downloads/lisa/160111-fuel.train.0/160111-fuel.train.0.hdf5\n", 543 | "12\n", 544 | "1 12391 /Users/udi/Downloads/lisa/160111-fuel.train.1/160111-fuel.train.1.hdf5\n", 545 | "-rw-r--r-- 1 udi staff 19928000 Jan 11 18:02 /Users/udi/Downloads/lisa/160111-fuel.train.1/160111-fuel.train.1.hdf5\n", 546 | "24\n", 547 | "2 12391 /Users/udi/Downloads/lisa/160111-fuel.train.2/160111-fuel.train.2.hdf5\n", 548 | "-rw-r--r-- 1 udi staff 19928000 Jan 11 18:02 /Users/udi/Downloads/lisa/160111-fuel.train.2/160111-fuel.train.2.hdf5\n", 549 | "36\n", 550 | "3 12391 /Users/udi/Downloads/lisa/160111-fuel.train.3/160111-fuel.train.3.hdf5\n", 551 | "-rw-r--r-- 1 udi staff 19928000 Jan 11 18:02 /Users/udi/Downloads/lisa/160111-fuel.train.3/160111-fuel.train.3.hdf5\n", 552 | "48\n", 553 | "4 12391 /Users/udi/Downloads/lisa/160111-fuel.train.4/160111-fuel.train.4.hdf5\n", 554 | "-rw-r--r-- 1 udi staff 19928000 Jan 11 18:02 /Users/udi/Downloads/lisa/160111-fuel.train.4/160111-fuel.train.4.hdf5\n", 555 | "50\n" 556 | ] 557 | } 558 | ], 559 | "source": [ 560 | "for seed in range(5):\n", 561 | " # the first 5 seeds are used to cover all labels at least once\n", 562 | " oos_labels = range(1+12*seed,min(1+12*seed + 12,51))\n", 563 | " \n", 564 | " X_train, y_train, labels = cv_modify(X[:Nt], y[:Nt], Q, seed=seed, oos_labels=oos_labels)\n", 565 | " datasource = '160111-fuel.train.%d'%seed\n", 566 | 
" datasource_dir = os.path.join(fuel.config.data_path[0], datasource)\n", 567 | " datasource_fname = os.path.join(datasource_dir , datasource + '.hdf5')\n", 568 | " !mkdir -p {datasource_dir}\n", 569 | " N0 = len(X_train)\n", 570 | " print seed, N0, datasource_fname\n", 571 | "\n", 572 | " with h5py.File(datasource_fname, mode='w') as fp:\n", 573 | " features = fp.create_dataset('features', (N0, NF), dtype=np.float32)\n", 574 | " targets = fp.create_dataset('targets', (N0,), dtype='int')\n", 575 | " features[...] = X_train.astype(np.float32)\n", 576 | " targets[...] = y_train\n", 577 | " \n", 578 | " split_dict = {\n", 579 | " 'train': {'features': (0, N0), 'targets': (0, N0)},\n", 580 | " 'test': {'features': (0, N0), 'targets': (0, N0)}\n", 581 | " }\n", 582 | " fp.attrs['split'] = H5PYDataset.create_split_array(split_dict)\n", 583 | " fp.attrs['labels'] = labels\n", 584 | " !ls -l {datasource_fname}\n", 585 | " oos = labels[-12:]\n", 586 | " all_oos += oos\n", 587 | " print len(set(all_oos))" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": 28, 593 | "metadata": { 594 | "collapsed": true 595 | }, 596 | "outputs": [], 597 | "source": [ 598 | "!(cd /Users/udi/Downloads/lisa/ ; tar cfz 160111-fuel.tgz 160111-fuel.train.* )" 599 | ] 600 | }, 601 | { 602 | "cell_type": "code", 603 | "execution_count": 29, 604 | "metadata": { 605 | "collapsed": false 606 | }, 607 | "outputs": [ 608 | { 609 | "name": "stdout", 610 | "output_type": "stream", 611 | "text": [ 612 | "move: ../../../../../Downloads/lisa/160111-fuel.tgz to s3://udikaggle/nist/160111-fuel.tgz\n" 613 | ] 614 | } 615 | ], 616 | "source": [ 617 | "!aws s3 mv /Users/udi/Downloads/lisa/160111-fuel.tgz s3://udikaggle/nist/" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": null, 623 | "metadata": { 624 | "collapsed": true 625 | }, 626 | "outputs": [], 627 | "source": [] 628 | } 629 | ], 630 | "metadata": { 631 | "kernelspec": { 632 | "display_name": "Python 2", 633 | "language": "python", 634 | "name": "python2" 635 | }, 636 | "language_info": { 637 | "codemirror_mode": { 638 | "name": "ipython", 639 | "version": 2 640 | }, 641 | "file_extension": ".py", 642 | "mimetype": "text/x-python", 643 | "name": "python", 644 | "nbconvert_exporter": "python", 645 | "pygments_lexer": "ipython2", 646 | "version": "2.7.11" 647 | } 648 | }, 649 | "nbformat": 4, 650 | "nbformat_minor": 0 651 | } 652 | -------------------------------------------------------------------------------- /ladder.py: -------------------------------------------------------------------------------- 1 | import logging 2 | 3 | import numpy as np 4 | from collections import OrderedDict 5 | 6 | import theano 7 | import theano.tensor as T 8 | from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams 9 | from theano.tensor.nnet.conv import conv2d, ConvOp 10 | from theano.sandbox.cuda.blas import GpuCorrMM 11 | from theano.sandbox.cuda.basic_ops import gpu_contiguous 12 | 13 | from blocks.bricks.cost import SquaredError 14 | from blocks.bricks.cost import CategoricalCrossEntropy, MisclassificationRate, Cost 15 | from blocks.graph import add_annotation, Annotation 16 | from blocks.roles import add_role, PARAMETER, WEIGHT, BIAS 17 | 18 | from utils import shared_param, AttributeDict 19 | from nn import maxpool_2d, global_meanpool_2d, BNPARAM 20 | 21 | logger = logging.getLogger('main.model') 22 | floatX = theano.config.floatX 23 | 24 | from blocks.bricks.base import application 25 | from theano.tensor.extra_ops import 
to_one_hot 26 | class MisclassificationRateIV(Cost): 27 | def __init__(self, oos_thr=0., poos=0.23): 28 | self.oos_thr = oos_thr 29 | self.poos = poos 30 | super(MisclassificationRateIV, self).__init__() 31 | 32 | @application(outputs=["error_rate"]) 33 | def apply(self, y, y_hat): 34 | # find the unlabeled samples: a combination of oos and labeled samples that were moved by +50 35 | unlabeled = (y >= y_hat.shape[1]).nonzero() 36 | y = y[unlabeled] 37 | y_hat = y_hat[unlabeled] 38 | # return unlabeled samples that are in-set to their original value 39 | y = T.switch(y <= 50, y, y-50) 40 | # convert oos to 0 41 | y = T.switch(y < y_hat.shape[1], y, 0) 42 | 43 | # if maximal prob is below oos_thr then assume it is OOS 44 | y_hat_argmax = T.switch(y_hat.max(axis=1) >= self.oos_thr, y_hat.argmax(axis=1), 0) 45 | # locate mistakes 46 | mistakes = T.neq(y, y_hat_argmax) 47 | 48 | # compute the error rate for each label 49 | yhot = to_one_hot(y, y_hat.shape[1], dtype=floatX) 50 | yhot = yhot.T 51 | mistakes = T.dot(yhot, mistakes) / (yhot.sum(axis=1) + np.float32(1e-6)) 52 | return (1. - self.poos)*mistakes[1:].mean() + self.poos * mistakes[0] 53 | 54 | class OOSRateIV(Cost): 55 | @application(outputs=["oos_rate"]) 56 | def apply(self, y, y_hat): 57 | # find the unlabeled samples: a combination of oos and labeled samples that were moved by +50 58 | unlabeled = (y >= y_hat.shape[1]).nonzero() 59 | y_hat = y_hat[unlabeled] 60 | oos = T.eq(y_hat.argmax(axis=1), 0) 61 | return T.mean(oos) 62 | 63 | # Exactly like 160107-keras.ipynb 64 | def objective(y_true, y_pred, P, Q, alpha=0., beta=0.15, dbeta=0., gamma=0.01, gamma1=-1., poos=0.23, eps=1e-6): 65 | '''Expects a binary class matrix instead of a vector of scalar classes. 66 | ''' 67 | 68 | beta = np.float32(beta) 69 | dbeta = np.float32(dbeta) 70 | gamma = np.float32(gamma) 71 | poos = np.float32(poos) 72 | eps = np.float32(eps) 73 | 74 | # scale preds so that the class probas of each sample sum to 1 75 | y_pred += eps 76 | y_pred /= y_pred.sum(axis=-1, keepdims=True) 77 | 78 | y_true = T.cast(y_true.flatten(), 'int64') 79 | y1 = T.and_(T.gt(y_true, 0), T.le(y_true, Q)) # in-set 80 | y0 = T.or_(T.eq(y_true, 0), T.gt(y_true, Q)) # out-of-set or unlabeled 81 | y0sum = y0.sum() + eps # number of oos 82 | y1sum = y1.sum() + eps # number of in-set 83 | # we want to reduce cross entrophy of labeled data 84 | # convert all oos/unlabeled to label=0 85 | cost0 = T.nnet.categorical_crossentropy(y_pred, T.switch(y_true <= Q, y_true, 0)) 86 | cost0 = T.dot(y1, cost0) / y1sum # average cost per labeled example 87 | 88 | if alpha: 89 | cost1 = T.nnet.categorical_crossentropy(y_pred, y_pred) 90 | cost1 = T.dot(y0, cost1) / y0sum # average cost per labeled example 91 | cost0 += alpha*cost1 92 | 93 | # we want to increase the average entrophy in each batch 94 | # average over batch 95 | if beta: 96 | y_pred_avg0 = T.dot(y0, y_pred) / y0sum 97 | y_pred_avg0 = T.clip(y_pred_avg0, eps, np.float32(1) - eps) 98 | y_pred_avg0 /= y_pred_avg0.sum(axis=-1, keepdims=True) 99 | cost2 = T.nnet.categorical_crossentropy(y_pred_avg0.reshape((1,-1)), P-dbeta)[0] # [None,:] 100 | cost2 = T.switch(y0sum > 0.5, cost2, 0.) 
# ignore cost2 if no samples 101 | cost0 += beta*cost2 102 | 103 | # binary classifier score 104 | if gamma: 105 | y_pred0 = T.clip(y_pred[:,0], eps, np.float32(1) - eps) 106 | if gamma1 < 0.: 107 | cost3 = - T.dot(poos*y0,T.log(y_pred0)) - T.dot(np.float32(1)-poos*y0.T,T.log(np.float32(1)-y_pred0)) 108 | cost3 /= y_pred.shape[0] 109 | cost0 += gamma*cost3 110 | elif gamma1 > 0.: 111 | cost3 = - T.dot(poos*y0,T.log(y_pred0)) - T.dot((np.float32(1)-poos)*y0,T.log(np.float32(1)-y_pred0)) 112 | cost3 /= y0sum 113 | cost31 = - T.dot(y1,T.log(np.float32(1)-y_pred0)) 114 | cost3 /= y1sum 115 | cost0 += gamma*cost3 + gamma1*cost31 116 | else: # gamma1 == 0. 117 | cost3 = - T.dot(poos*y0,T.log(y_pred0)) - T.dot((np.float32(1)-poos)*y0, T.log(np.float32(1)-y_pred0)) 118 | cost3 /= y0sum 119 | cost0 += gamma*cost3 120 | return cost0 121 | 122 | 123 | 124 | class CategoricalCrossEntropyIV(Cost): 125 | def __init__(self, Q, poos=0.23, alpha=0., beta=0.15, dbeta=0., gamma=0.01, gamma1=-1.): 126 | self.poos = poos 127 | self.alpha = alpha 128 | self.beta = beta 129 | self.dbeta = dbeta 130 | self.gamma = gamma 131 | self.gamma1 = gamma1 132 | super(CategoricalCrossEntropyIV, self).__init__() 133 | 134 | self.Q = Q 135 | P = (1.-poos)/Q*np.ones(Q+1) 136 | P[0] = poos 137 | P = P.reshape((1,-1)) 138 | self.P = theano.shared(P.astype(theano.config.floatX), broadcastable=(True,False)) 139 | 140 | @application(outputs=["cost"]) 141 | def apply(self, y, y_hat): 142 | return T.sum(objective(y, y_hat, self.P, self.Q, self.alpha, self.beta, self.dbeta, self.gamma, self.gamma1)) 143 | 144 | class LadderAE(): 145 | def __init__(self, p): 146 | self.p = p 147 | self.init_weights_transpose = False 148 | self.default_lr = p.lr 149 | self.shareds = OrderedDict() 150 | self.rstream = RandomStreams(seed=p.seed) 151 | self.rng = np.random.RandomState(seed=p.seed) 152 | 153 | n_layers = len(p.encoder_layers) 154 | assert n_layers > 1, "Need to define encoder layers" 155 | assert n_layers == len(p.denoising_cost_x), ( 156 | "Number of denoising costs does not match with %d layers: %s" % 157 | (n_layers, str(p.denoising_cost_x))) 158 | 159 | def one_to_all(x): 160 | """ (5.,) -> 5 -> (5., 5., 5.) 161 | ('relu',) -> 'relu' -> ('relu', 'relu', 'relu') 162 | """ 163 | if type(x) is tuple and len(x) == 1: 164 | x = x[0] 165 | 166 | if type(x) is float: 167 | x = (np.float32(x),) * n_layers 168 | 169 | if type(x) is str: 170 | x = (x,) * n_layers 171 | return x 172 | 173 | p.decoder_spec = one_to_all(p.decoder_spec) 174 | p.f_local_noise_std = one_to_all(p.f_local_noise_std) 175 | acts = one_to_all(p.get('act', 'relu')) 176 | 177 | assert n_layers == len(p.decoder_spec), "f and g need to match" 178 | assert (n_layers == len(acts)), ( 179 | "Not enough activations given. Requires %d. 
Got: %s" % 180 | (n_layers, str(acts))) 181 | acts = acts[:-1] + ('softmax',) 182 | 183 | def parse_layer(spec): 184 | """ 'fc:5' -> ('fc', 5) 185 | '5' -> ('fc', 5) 186 | 5 -> ('fc', 5) 187 | 'convv:3:2:2' -> ('convv', [3,2,2]) 188 | """ 189 | if type(spec) is not str: 190 | return "fc", spec 191 | spec = spec.split(':') 192 | l_type = spec.pop(0) if len(spec) >= 2 else "fc" 193 | spec = map(int, spec) 194 | spec = spec[0] if len(spec) == 1 else spec 195 | return l_type, spec 196 | 197 | enc = map(parse_layer, p.encoder_layers) 198 | self.layers = list(enumerate(zip(enc, p.decoder_spec, acts))) 199 | 200 | def weight(self, init, name, cast_float32=True, for_conv=False): 201 | weight = self.shared(init, name, cast_float32, role=WEIGHT) 202 | if for_conv: 203 | return weight.dimshuffle('x', 0, 'x', 'x') 204 | return weight 205 | 206 | def bias(self, init, name, cast_float32=True, for_conv=False): 207 | b = self.shared(init, name, cast_float32, role=BIAS) 208 | if for_conv: 209 | return b.dimshuffle('x', 0, 'x', 'x') 210 | return b 211 | 212 | def shared(self, init, name, cast_float32=True, role=PARAMETER, **kwargs): 213 | p = self.shareds.get(name) 214 | if p is None: 215 | p = shared_param(init, name, cast_float32, role, **kwargs) 216 | self.shareds[name] = p 217 | return p 218 | 219 | def counter(self): 220 | name = 'counter' 221 | p = self.shareds.get(name) 222 | update = [] 223 | if p is None: 224 | p_max_val = np.float32(10) 225 | p = self.shared(np.float32(1), name, role=BNPARAM) 226 | p_max = self.shared(p_max_val, name + '_max', role=BNPARAM) 227 | update = [(p, T.clip(p + np.float32(1), np.float32(0), p_max)), 228 | (p_max, p_max_val)] 229 | return (p, update) 230 | 231 | def noise_like(self, x): 232 | noise = self.rstream.normal(size=x.shape, avg=0.0, std=1.0) 233 | return T.cast(noise, dtype=floatX) 234 | 235 | def rand_init(self, in_dim, out_dim): 236 | """ Random initialization for fully connected layers """ 237 | W = self.rng.randn(in_dim, out_dim) / np.sqrt(in_dim) 238 | return W 239 | 240 | def rand_init_conv(self, dim): 241 | """ Random initialization for convolution filters """ 242 | fan_in = np.prod(dtype=floatX, a=dim[1:]) 243 | bound = np.sqrt(3. / max(1.0, (fan_in))) 244 | W = np.asarray( 245 | self.rng.uniform(low=-bound, high=bound, size=dim), dtype=floatX) 246 | return W 247 | 248 | def new_activation_dict(self): 249 | return AttributeDict({'z': {}, 'h': {}, 's': {}, 'm': {}}) 250 | 251 | def annotate_update(self, update, tag_to): 252 | a = Annotation() 253 | for (var, up) in update: 254 | a.updates[var] = up 255 | add_annotation(tag_to, a) 256 | 257 | def apply(self, input_labeled, target_labeled, input_unlabeled): 258 | self.layer_counter = 0 259 | input_dim = self.p.encoder_layers[0] 260 | 261 | # Store the dimension tuples in the same order as layers. 
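        # (self.layers, built in __init__, has the form [(index, ((layer_type, spec), decoder_spec, activation)), ...])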
262 | layers = self.layers 263 | self.layer_dims = {0: input_dim} 264 | 265 | self.lr = self.shared(self.default_lr, 'learning_rate', role=None) 266 | 267 | self.costs = costs = AttributeDict() 268 | self.costs.denois = AttributeDict() 269 | 270 | self.act = AttributeDict() 271 | self.error = AttributeDict() 272 | self.oos = AttributeDict() 273 | 274 | top = len(layers) - 1 275 | 276 | N = input_labeled.shape[0] 277 | self.join = lambda l, u: T.concatenate([l, u], axis=0) 278 | self.labeled = lambda x: x[:N] if x is not None else x 279 | self.unlabeled = lambda x: x[N:] if x is not None else x 280 | self.split_lu = lambda x: (self.labeled(x), self.unlabeled(x)) 281 | 282 | input_concat = self.join(input_labeled, input_unlabeled) 283 | 284 | def encoder(input_, path_name, input_noise_std=0, noise_std=[]): 285 | h = input_ 286 | 287 | logger.info(' 0: noise %g' % input_noise_std) 288 | if input_noise_std > 0.: 289 | h = h + self.noise_like(h) * input_noise_std 290 | 291 | d = AttributeDict() 292 | d.unlabeled = self.new_activation_dict() 293 | d.labeled = self.new_activation_dict() 294 | d.labeled.z[0] = self.labeled(h) 295 | d.unlabeled.z[0] = self.unlabeled(h) 296 | prev_dim = input_dim 297 | for i, (spec, _, act_f) in layers[1:]: 298 | d.labeled.h[i - 1], d.unlabeled.h[i - 1] = self.split_lu(h) 299 | noise = noise_std[i] if i < len(noise_std) else 0. 300 | curr_dim, z, m, s, h = self.f(h, prev_dim, spec, i, act_f, 301 | path_name=path_name, 302 | noise_std=noise) 303 | assert self.layer_dims.get(i) in (None, curr_dim) 304 | self.layer_dims[i] = curr_dim 305 | d.labeled.z[i], d.unlabeled.z[i] = self.split_lu(z) 306 | d.unlabeled.s[i] = s 307 | d.unlabeled.m[i] = m 308 | prev_dim = curr_dim 309 | d.labeled.h[i], d.unlabeled.h[i] = self.split_lu(h) 310 | return d 311 | 312 | # Clean, supervised 313 | logger.info('Encoder: clean, labeled') 314 | clean = self.act.clean = encoder(input_concat, 'clean') 315 | 316 | # Corrupted, supervised 317 | logger.info('Encoder: corr, labeled') 318 | corr = self.act.corr = encoder(input_concat, 'corr', 319 | input_noise_std=self.p.super_noise_std, 320 | noise_std=self.p.f_local_noise_std) 321 | est = self.act.est = self.new_activation_dict() 322 | 323 | # Decoder path in opposite order 324 | logger.info('Decoder: z_corr -> z_est') 325 | for i, ((_, spec), l_type, act_f) in layers[::-1]: 326 | z_corr = corr.unlabeled.z[i] 327 | z_clean = clean.unlabeled.z[i] 328 | z_clean_s = clean.unlabeled.s.get(i) 329 | z_clean_m = clean.unlabeled.m.get(i) 330 | fspec = layers[i+1][1][0] if len(layers) > i+1 else (None, None) 331 | 332 | if i == top: 333 | ver = corr.unlabeled.h[i] 334 | ver_dim = self.layer_dims[i] 335 | top_g = True 336 | else: 337 | ver = est.z.get(i + 1) 338 | ver_dim = self.layer_dims.get(i + 1) 339 | top_g = False 340 | 341 | z_est = self.g(z_lat=z_corr, 342 | z_ver=ver, 343 | in_dims=ver_dim, 344 | out_dims=self.layer_dims[i], 345 | l_type=l_type, 346 | num=i, 347 | fspec=fspec, 348 | top_g=top_g) 349 | 350 | if z_est is not None: 351 | # Denoising cost 352 | 353 | if z_clean_s and self.p.zestbn == 'bugfix': 354 | z_est_norm = (z_est - z_clean_m) / T.sqrt(z_clean_s + np.float32(1e-10)) 355 | elif z_clean_s is None or self.p.zestbn == 'no': 356 | z_est_norm = z_est 357 | else: 358 | assert False, 'Not supported path' 359 | 360 | se = SquaredError('denois' + str(i)) 361 | costs.denois[i] = se.apply(z_est_norm.flatten(2), 362 | z_clean.flatten(2)) \ 363 | / np.prod(self.layer_dims[i], dtype=floatX) 364 | costs.denois[i].name = 'denois' + str(i) 365 | 
denois_print = 'denois %.2f' % self.p.denoising_cost_x[i] 366 | else: 367 | denois_print = '' 368 | 369 | # Store references for later use 370 | est.h[i] = self.apply_act(z_est, act_f) 371 | est.z[i] = z_est 372 | est.s[i] = None 373 | est.m[i] = None 374 | logger.info(' g%d: %10s, %s, dim %s -> %s' % ( 375 | i, l_type, 376 | denois_print, 377 | self.layer_dims.get(i+1), 378 | self.layer_dims.get(i) 379 | )) 380 | 381 | # Costs 382 | y = target_labeled.flatten() 383 | 384 | Q = int(self.layer_dims[top][0]) - 1 385 | logger.info('Q=%d'%Q) 386 | costs.class_clean = CategoricalCrossEntropyIV(Q=Q, 387 | alpha=self.p.alpha, 388 | beta=self.p.beta, 389 | dbeta=self.p.dbeta, 390 | gamma=self.p.gamma, 391 | gamma1=self.p.gamma1 392 | ).apply(y, clean.labeled.h[top]) 393 | costs.class_clean.name = 'cost_class_clean' 394 | 395 | costs.class_corr = CategoricalCrossEntropyIV(Q=Q, 396 | alpha=self.p.alpha, 397 | beta=self.p.beta, 398 | dbeta=self.p.dbeta, 399 | gamma=self.p.gamma, 400 | gamma1=self.p.gamma1, 401 | ).apply(y, corr.labeled.h[top]) 402 | costs.class_corr.name = 'cost_class_corr' 403 | 404 | # This will be used for training 405 | costs.total = costs.class_corr * 1.0 406 | for i in range(top + 1): 407 | if costs.denois.get(i) and self.p.denoising_cost_x[i] > 0: 408 | costs.total += costs.denois[i] * self.p.denoising_cost_x[i] 409 | if self.p.alpha_clean: 410 | y_true = y 411 | eps = np.float32(1e-6) 412 | 413 | # scale preds so that the class probas of each sample sum to 1 414 | y_pred = clean.labeled.h[top] + eps 415 | y_pred /= y_pred.sum(axis=-1, keepdims=True) 416 | 417 | y0 = T.or_(T.eq(y_true, 0), T.gt(y_true, Q)) # out-of-set or unlabeled 418 | y0sum = y0.sum() + eps # number of oos 419 | 420 | cost1 = T.nnet.categorical_crossentropy(y_pred, y_pred) 421 | cost1 = T.dot(y0, cost1) / y0sum # average cost per labeled example 422 | costs.total += self.p.alpha_clean * cost1 423 | 424 | costs.total.name = 'cost_total' 425 | 426 | # Classification error 427 | mr = MisclassificationRateIV(oos_thr=self.p.oos_thr) 428 | self.error.clean = mr.apply(y, clean.labeled.h[top]) * np.float32(100.) 429 | self.error.clean.name = 'error_rate_clean' 430 | oosr = OOSRateIV() 431 | self.oos.clean = oosr.apply(y, clean.labeled.h[top]) * np.float32(100.) 432 | self.oos.clean.name = 'oos_rate_clean' 433 | 434 | def apply_act(self, input, act_name): 435 | if input is None: 436 | return input 437 | act = { 438 | 'relu': lambda x: T.maximum(0, x), 439 | 'leakyrelu': lambda x: T.switch(x > 0., x, 0.1 * x), 440 | 'linear': lambda x: x, 441 | 'softplus': lambda x: T.log(1. 
+ T.exp(x)), 442 | 'sigmoid': lambda x: T.nnet.sigmoid(x), 443 | 'softmax': lambda x: T.nnet.softmax(x), 444 | }.get(act_name) 445 | assert act, 'unknown act %s' % act_name 446 | if act_name == 'softmax': 447 | input = input.flatten(2) 448 | return act(input) 449 | 450 | def annotate_bn(self, var, id, var_type, mb_size, size, norm_ax): 451 | var_shape = np.array((1,) + size) 452 | out_dim = np.prod(var_shape) / np.prod(var_shape[list(norm_ax)]) 453 | # Flatten the var - shared variable updating is not trivial otherwise, 454 | # as theano seems to believe a row vector is a matrix and will complain 455 | # about the updates 456 | orig_shape = var.shape 457 | var = var.flatten() 458 | # Here we add the name and role, the variables will later be identified 459 | # by these values 460 | var.name = id + '_%s_clean' % var_type 461 | add_role(var, BNPARAM) 462 | shared_var = self.shared(np.zeros(out_dim), 463 | name='shared_%s' % var.name, role=None) 464 | 465 | # Update running average estimates. When the counter is reset to 1, it 466 | # will clear its memory 467 | cntr, c_up = self.counter() 468 | one = np.float32(1) 469 | run_avg = lambda new, old: one / cntr * new + (one - one / cntr) * old 470 | if var_type == 'mean': 471 | new_value = run_avg(var, shared_var) 472 | elif var_type == 'var': 473 | mb_size = T.cast(mb_size, 'float32') 474 | new_value = run_avg(mb_size / (mb_size - one) * var, shared_var) 475 | else: 476 | raise NotImplemented('Unknown batch norm var %s' % var_type) 477 | # Add the counter update to the annotated update if it is the first 478 | # instance of a counter 479 | self.annotate_update([(shared_var, new_value)] + c_up, var) 480 | 481 | return var.reshape(orig_shape) 482 | 483 | def f(self, h, in_dim, spec, num, act_f, path_name, noise_std=0): 484 | assert path_name in ['clean', 'corr'] 485 | # Generates identifiers used for referencing shared variables. 486 | # E.g. clean and corrupted encoders will end up using the same 487 | # variable name and hence sharing parameters 488 | gen_id = lambda s: '_'.join(['f', str(num), s]) 489 | layer_type, _ = spec 490 | 491 | # Pooling 492 | if layer_type in ['maxpool', 'globalmeanpool']: 493 | z, output_size = self.f_pool(h, spec, in_dim) 494 | norm_ax = (0, -2, -1) 495 | # after pooling, no activation func for now unless its softmax 496 | act_f = "linear" if act_f != "softmax" else act_f 497 | 498 | # Convolution 499 | elif layer_type in ['convv', 'convf']: 500 | z, output_size = self.f_conv(h, spec, in_dim, gen_id('W')) 501 | norm_ax = (0, -2, -1) 502 | 503 | # Fully connected 504 | elif layer_type == "fc": 505 | h = h.flatten(2) if h.ndim > 2 else h 506 | _, dim = spec 507 | W = self.weight(self.rand_init(np.prod(in_dim), dim), gen_id('W')) 508 | z, output_size = T.dot(h, W), (dim,) 509 | norm_ax = (0,) 510 | else: 511 | raise ValueError("Unknown layer spec: %s" % layer_type) 512 | 513 | m = s = None 514 | is_normalizing = True 515 | if is_normalizing: 516 | keep_dims = True 517 | z_l = self.labeled(z) 518 | z_u = self.unlabeled(z) 519 | m = z_u.mean(norm_ax, keepdims=keep_dims) 520 | s = z_u.var(norm_ax, keepdims=keep_dims) 521 | 522 | m_l = z_l.mean(norm_ax, keepdims=keep_dims) 523 | s_l = z_l.var(norm_ax, keepdims=keep_dims) 524 | if path_name == 'clean': 525 | # Batch normalization estimates the mean and variance of 526 | # validation and test sets based on the training set 527 | # statistics. The following annotates the computation of 528 | # running average to the graph. 
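# Concretely (see annotate_bn above): the labeled-path batch statistic is
# flattened, tagged with the BNPARAM role, and paired with a zero-initialized
# shared variable. An update annotated on the graph accumulates a cumulative
# running average,
#     new_avg = (1/cntr) * new + (1 - 1/cntr) * old,
# and for the variance the new observation is first scaled by
# mb_size / (mb_size - 1) (Bessel's correction). At evaluation time
# ApproxTestMonitoring / FinalTestMonitoring in nn.py swap these shared
# averages in for the per-batch statistics.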
529 | m_l = self.annotate_bn(m_l, gen_id('bn'), 'mean', z_l.shape[0], 530 | output_size, norm_ax) 531 | s_l = self.annotate_bn(s_l, gen_id('bn'), 'var', z_l.shape[0], 532 | output_size, norm_ax) 533 | z = self.join( 534 | (z_l - m_l) / T.sqrt(s_l + np.float32(1e-10)), 535 | (z_u - m) / T.sqrt(s + np.float32(1e-10))) 536 | 537 | if noise_std > 0: 538 | z += self.noise_like(z) * noise_std 539 | 540 | # z for lateral connection 541 | z_lat = z 542 | b_init, c_init = 0.0, 1.0 543 | b_c_size = output_size[0] 544 | 545 | # Add bias 546 | if act_f != 'linear': 547 | z += self.bias(b_init * np.ones(b_c_size), gen_id('b'), 548 | for_conv=len(output_size) > 1) 549 | 550 | if is_normalizing: 551 | # Add free parameter (gamma in original Batch Normalization paper) 552 | # if needed by the activation. For instance ReLU does't need one 553 | # and we only add it to softmax if hyperparameter top_c is set. 554 | if (act_f not in ['relu', 'leakyrelu', 'linear', 'softmax'] or 555 | (act_f == 'softmax' and self.p.top_c is True)): 556 | c = self.weight(c_init * np.ones(b_c_size), gen_id('c'), 557 | for_conv=len(output_size) > 1) 558 | z *= c 559 | 560 | h = self.apply_act(z, act_f) 561 | 562 | logger.info(' f%d: %s, %s,%s noise %.2f, params %s, dim %s -> %s' % ( 563 | num, layer_type, act_f, ' BN,' if is_normalizing else '', 564 | noise_std, spec[1], in_dim, output_size)) 565 | return output_size, z_lat, m, s, h 566 | 567 | def f_pool(self, x, spec, in_dim): 568 | layer_type, dims = spec 569 | num_filters = in_dim[0] 570 | if "globalmeanpool" == layer_type: 571 | y, output_size = global_meanpool_2d(x, num_filters) 572 | # scale the variance to match normal conv layers with xavier init 573 | y = y * np.float32(in_dim[-1]) * np.float32(np.sqrt(3)) 574 | else: 575 | assert dims[0] != 1 or dims[1] != 1 576 | y, output_size = maxpool_2d(x, in_dim, 577 | poolsize=(dims[1], dims[1]), 578 | poolstride=(dims[0], dims[0])) 579 | return y, output_size 580 | 581 | def f_conv(self, x, spec, in_dim, weight_name): 582 | layer_type, dims = spec 583 | num_filters = dims[0] 584 | filter_size = (dims[1], dims[1]) 585 | stride = (dims[2], dims[2]) 586 | 587 | bm = 'full' if 'convf' in layer_type else 'valid' 588 | 589 | num_channels = in_dim[0] 590 | 591 | W = self.weight(self.rand_init_conv( 592 | (num_filters, num_channels) + filter_size), weight_name) 593 | 594 | if stride != (1, 1): 595 | f = GpuCorrMM(subsample=stride, border_mode=bm, pad=(0, 0)) 596 | y = f(gpu_contiguous(x), gpu_contiguous(W)) 597 | else: 598 | assert self.p.batch_size == self.p.valid_batch_size 599 | y = conv2d(x, W, image_shape=(2*self.p.batch_size, ) + in_dim, 600 | filter_shape=((num_filters, num_channels) + 601 | filter_size), border_mode=bm) 602 | output_size = ((num_filters,) + 603 | ConvOp.getOutputShape(in_dim[1:], filter_size, 604 | stride, bm)) 605 | 606 | return y, output_size 607 | 608 | def g(self, z_lat, z_ver, in_dims, out_dims, l_type, num, fspec, top_g): 609 | f_layer_type, dims = fspec 610 | is_conv = f_layer_type is not None and ('conv' in f_layer_type or 611 | 'pool' in f_layer_type) 612 | gen_id = lambda s: '_'.join(['g', str(num), s]) 613 | 614 | in_dim = np.prod(dtype=floatX, a=in_dims) 615 | out_dim = np.prod(dtype=floatX, a=out_dims) 616 | num_filters = out_dims[0] if is_conv else out_dim 617 | 618 | if l_type[-1] in ['0']: 619 | g_type, u_type = l_type[:-1], l_type[-1] 620 | else: 621 | g_type, u_type = l_type, None 622 | 623 | # Mapping from layer above: u 624 | if u_type in ['0'] or z_ver is None: 625 | if z_ver is None and 
u_type not in ['0']: 626 | logger.warn('Decoder %d:%s without vertical input' % 627 | (num, g_type)) 628 | u = None 629 | else: 630 | if top_g: 631 | u = z_ver 632 | elif is_conv: 633 | u = self.g_deconv(z_ver, in_dims, out_dims, gen_id('W'), fspec) 634 | else: 635 | W = self.weight(self.rand_init(in_dim, out_dim), gen_id('W')) 636 | u = T.dot(z_ver, W) 637 | 638 | # Batch-normalize u 639 | if u is not None: 640 | norm_ax = (0,) if u.ndim <= 2 else (0, -2, -1) 641 | keep_dims = True 642 | u -= u.mean(norm_ax, keepdims=keep_dims) 643 | u /= T.sqrt(u.var(norm_ax, keepdims=keep_dims) + 644 | np.float32(1e-10)) 645 | 646 | # Define the g function 647 | if not is_conv: 648 | z_lat = z_lat.flatten(2) 649 | bi = lambda inits, name: self.bias(inits * np.ones(num_filters), 650 | gen_id(name), for_conv=is_conv) 651 | wi = lambda inits, name: self.weight(inits * np.ones(num_filters), 652 | gen_id(name), for_conv=is_conv) 653 | 654 | if g_type == '': 655 | z_est = None 656 | 657 | elif g_type == 'i': 658 | z_est = z_lat 659 | 660 | elif g_type in ['sig']: 661 | sigval = bi(0., 'c1') + wi(1., 'c2') * z_lat 662 | if u is not None: 663 | sigval += wi(0., 'c3') * u + wi(0., 'c4') * z_lat * u 664 | sigval = T.nnet.sigmoid(sigval) 665 | 666 | z_est = bi(0., 'a1') + wi(1., 'a2') * z_lat + wi(1., 'b1') * sigval 667 | if u is not None: 668 | z_est += wi(0., 'a3') * u + wi(0., 'a4') * z_lat * u 669 | 670 | elif g_type in ['lin']: 671 | a1 = wi(1.0, 'a1') 672 | b = bi(0.0, 'b') 673 | 674 | z_est = a1 * z_lat + b 675 | 676 | elif g_type in ['relu']: 677 | assert u is not None 678 | b = bi(0., 'b') 679 | x = u + b 680 | z_est = self.apply_act(x, 'relu') 681 | 682 | elif g_type in ['sigmoid']: 683 | assert u is not None 684 | b = bi(0., 'b') 685 | c = wi(1., 'c') 686 | z_est = self.apply_act((u + b) * c, 'sigmoid') 687 | 688 | elif g_type in ['comparison_g2']: 689 | # sig without the uz cross term 690 | sigval = bi(0., 'c1') + wi(1., 'c2') * z_lat 691 | if u is not None: 692 | sigval += wi(0., 'c3') * u 693 | sigval = T.nnet.sigmoid(sigval) 694 | 695 | z_est = bi(0., 'a1') + wi(1., 'a2') * z_lat + wi(1., 'b1') * sigval 696 | if u is not None: 697 | z_est += wi(0., 'a3') * u 698 | 699 | elif g_type in ['comparison_g3']: 700 | # sig without the sigmoid nonlinearity 701 | z_est = bi(0., 'a1') + wi(1., 'a2') * z_lat 702 | if u is not None: 703 | z_est += wi(0., 'a3') * u + wi(0., 'a4') * z_lat * u 704 | 705 | elif g_type in ['comparison_g4']: 706 | # No mixing between z_lat and u before final sum, otherwise similar 707 | # to sig 708 | def nonlin(inp, in_name='input', add_bias=True): 709 | w1 = wi(1., 'w1_%s' % in_name) 710 | b1 = bi(0., 'b1') 711 | w2 = wi(1., 'w2_%s' % in_name) 712 | b2 = bi(0., 'b2') if add_bias else 0 713 | w3 = wi(0., 'w3_%s' % in_name) 714 | return w2 * T.nnet.sigmoid(b1 + w1 * inp) + w3 * inp + b2 715 | 716 | z_est = nonlin(z_lat, 'lat') if u is None else \ 717 | nonlin(z_lat, 'lat') + nonlin(u, 'ver', False) 718 | 719 | elif g_type in ['comparison_g5', 'gauss']: 720 | # Gaussian assumption on z: (z - mu) * v + mu 721 | if u is None: 722 | b1 = bi(0., 'b1') 723 | w1 = wi(1., 'w1') 724 | z_est = w1 * z_lat + b1 725 | else: 726 | a1 = bi(0., 'a1') 727 | a2 = wi(1., 'a2') 728 | a3 = bi(0., 'a3') 729 | a4 = bi(0., 'a4') 730 | a5 = bi(0., 'a5') 731 | 732 | a6 = bi(0., 'a6') 733 | a7 = wi(1., 'a7') 734 | a8 = bi(0., 'a8') 735 | a9 = bi(0., 'a9') 736 | a10 = bi(0., 'a10') 737 | 738 | mu = a1 * T.nnet.sigmoid(a2 * u + a3) + a4 * u + a5 739 | v = a6 * T.nnet.sigmoid(a7 * u + a8) + a9 * u + a10 740 | 741 
| z_est = (z_lat - mu) * v + mu 742 | 743 | else: 744 | raise NotImplementedError("unknown g type: %s" % str(g_type)) 745 | 746 | # Reshape the output if z is for conv but u from fc layer 747 | if (z_est is not None and type(out_dims) == tuple and 748 | len(out_dims) > 1.0 and z_est.ndim < 4): 749 | z_est = z_est.reshape((z_est.shape[0],) + out_dims) 750 | 751 | return z_est 752 | 753 | def g_deconv(self, z_ver, in_dims, out_dims, weight_name, fspec): 754 | """ Inverse operation for each type of f used in convnets """ 755 | f_type, f_dims = fspec 756 | assert z_ver is not None 757 | num_channels = in_dims[0] if in_dims is not None else None 758 | num_filters, width, height = out_dims[:3] 759 | 760 | if f_type in ['globalmeanpool']: 761 | u = T.addbroadcast(z_ver, 2, 3) 762 | assert in_dims[1] == 1 and in_dims[2] == 1, \ 763 | "global pooling needs in_dims (1,1): %s" % str(in_dims) 764 | 765 | elif f_type in ['maxpool']: 766 | sh, str, size = z_ver.shape, f_dims[0], f_dims[1] 767 | assert str == size, "depooling requires stride == size" 768 | u = T.zeros((sh[0], sh[1], sh[2] * str, sh[3] * str), 769 | dtype=z_ver.dtype) 770 | for x in xrange(str): 771 | for y in xrange(str): 772 | u = T.set_subtensor(u[:, :, x::str, y::str], z_ver) 773 | u = u[:, :, :width, :height] 774 | 775 | elif f_type in ['convv', 'convf']: 776 | filter_size, str = (f_dims[1], f_dims[1]), f_dims[2] 777 | W_shape = (num_filters, num_channels) + filter_size 778 | W = self.weight(self.rand_init_conv(W_shape), weight_name) 779 | if str > 1: 780 | # upsample if strided version 781 | sh = z_ver.shape 782 | u = T.zeros((sh[0], sh[1], sh[2] * str, sh[3] * str), 783 | dtype=z_ver.dtype) 784 | u = T.set_subtensor(u[:, :, ::str, ::str], z_ver) 785 | else: 786 | u = z_ver # no strides, only deconv 787 | u = conv2d(u, W, filter_shape=W_shape, 788 | border_mode='valid' if 'convf' in f_type else 'full') 789 | u = u[:, :, :width, :height] 790 | else: 791 | raise NotImplementedError('Layer %s has no convolutional decoder' 792 | % f_type) 793 | 794 | return u 795 | -------------------------------------------------------------------------------- /language-tree.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/udibr/LRE/2571ba133ec8ac276e36074915bfa7d2113e5baa/language-tree.jpg -------------------------------------------------------------------------------- /nn.py: -------------------------------------------------------------------------------- 1 | from collections import OrderedDict 2 | import logging 3 | 4 | import scipy 5 | import numpy as np 6 | from theano import tensor 7 | from theano.tensor.signal.downsample import max_pool_2d, DownsampleFactorMax 8 | 9 | from blocks.extensions import SimpleExtension 10 | from blocks.extensions.monitoring import (DataStreamMonitoring, 11 | MonitoringExtension) 12 | from blocks.filter import VariableFilter 13 | from blocks.graph import ComputationGraph 14 | from blocks.monitoring.evaluators import DatasetEvaluator 15 | from blocks.roles import AuxiliaryRole 16 | 17 | logger = logging.getLogger('main.nn') 18 | 19 | 20 | class BnParamRole(AuxiliaryRole): 21 | pass 22 | 23 | # Batch normalization parameters that have to be replaced when testing 24 | BNPARAM = BnParamRole() 25 | 26 | 27 | class ZCA(object): 28 | def __init__(self, n_components=None, data=None, filter_bias=0.1): 29 | self.filter_bias = np.float32(filter_bias) 30 | self.P = None 31 | self.P_inv = None 32 | self.n_components = 0 33 | self.is_fit = False 34 | if 
n_components and data: 35 | self.fit(n_components, data) 36 | 37 | def fit(self, n_components, data): 38 | if len(data.shape) == 2: 39 | self.reshape = None 40 | else: 41 | assert n_components == np.product(data.shape[1:]), \ 42 | 'ZCA whitening components should be %d for convolutional data'\ 43 | % np.product(data.shape[1:]) 44 | self.reshape = data.shape[1:] 45 | 46 | data = self._flatten_data(data) 47 | assert len(data.shape) == 2 48 | n, m = data.shape 49 | self.mean = np.mean(data, axis=0) 50 | 51 | bias = self.filter_bias * scipy.sparse.identity(m, 'float32') 52 | cov = np.cov(data, rowvar=0, bias=1) + bias 53 | eigs, eigv = scipy.linalg.eigh(cov) 54 | 55 | assert not np.isnan(eigs).any() 56 | assert not np.isnan(eigv).any() 57 | assert eigs.min() > 0 58 | 59 | if self.n_components: 60 | eigs = eigs[-self.n_components:] 61 | eigv = eigv[:, -self.n_components:] 62 | 63 | sqrt_eigs = np.sqrt(eigs) 64 | self.P = np.dot(eigv * (1.0 / sqrt_eigs), eigv.T) 65 | assert not np.isnan(self.P).any() 66 | self.P_inv = np.dot(eigv * sqrt_eigs, eigv.T) 67 | 68 | self.P = np.float32(self.P) 69 | self.P_inv = np.float32(self.P_inv) 70 | 71 | self.is_fit = True 72 | 73 | def apply(self, data, remove_mean=True): 74 | data = self._flatten_data(data) 75 | d = data - self.mean if remove_mean else data 76 | return self._reshape_data(np.dot(d, self.P)) 77 | 78 | def inv(self, data, add_mean=True): 79 | d = np.dot(self._flatten_data(data), self.P_inv) 80 | d += self.mean if add_mean else 0. 81 | return self._reshape_data(d) 82 | 83 | def _flatten_data(self, data): 84 | if self.reshape is None: 85 | return data 86 | assert data.shape[1:] == self.reshape 87 | return data.reshape(data.shape[0], np.product(data.shape[1:])) 88 | 89 | def _reshape_data(self, data): 90 | assert len(data.shape) == 2 91 | if self.reshape is None: 92 | return data 93 | return np.reshape(data, (data.shape[0],) + self.reshape) 94 | 95 | 96 | class ContrastNorm(object): 97 | def __init__(self, scale=55, epsilon=1e-8): 98 | self.scale = np.float32(scale) 99 | self.epsilon = np.float32(epsilon) 100 | 101 | def apply(self, data, copy=False): 102 | if copy: 103 | data = np.copy(data) 104 | data_shape = data.shape 105 | if len(data.shape) > 2: 106 | data = data.reshape(data.shape[0], np.product(data.shape[1:])) 107 | 108 | assert len(data.shape) == 2, 'Contrast norm on flattened data' 109 | 110 | data -= data.mean(axis=1)[:, np.newaxis] 111 | 112 | norms = np.sqrt(np.sum(data ** 2, axis=1)) / self.scale 113 | norms[norms < self.epsilon] = np.float32(1.) 
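# At this point each sample has been mean-centered and its L2 norm computed and
# divided by `scale` (default 55); near-zero norms were just clamped to 1 so
# that (nearly) constant samples pass through unchanged instead of being blown
# up by the division below.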
114 | 115 | data /= norms[:, np.newaxis] 116 | 117 | if data_shape != data.shape: 118 | data = data.reshape(data_shape) 119 | 120 | return data 121 | 122 | 123 | class TestMonitoring(object): 124 | def _get_bn_params(self, output_vars): 125 | # Pick out the nodes with batch normalization vars 126 | cg = ComputationGraph(output_vars) 127 | var_filter = VariableFilter(roles=[BNPARAM]) 128 | bn_ps = var_filter(cg.variables) 129 | 130 | if len(bn_ps) == 0: 131 | logger.warn('No batch normalization parameters found - is' + 132 | ' batch normalization turned off?') 133 | self._bn = False 134 | self._counter = None 135 | self._counter_max = None 136 | bn_share = [] 137 | output_vars_replaced = output_vars 138 | else: 139 | self._bn = True 140 | assert len(set([p.name for p in bn_ps])) == len(bn_ps), \ 141 | 'Some batch norm params have the same name' 142 | logger.info('Batch norm parameters: %s' % ', '.join([p.name for p in bn_ps])) 143 | 144 | # Filter out the shared variables from the model updates 145 | def filter_share(par): 146 | lst = [up for up in cg.updates if up.name == 'shared_%s' % par.name] 147 | assert len(lst) == 1 148 | return lst[0] 149 | bn_share = map(filter_share, bn_ps) 150 | 151 | # Replace the BN coefficients in the test data model - Replace the 152 | # theano variables in the test graph with the shareds 153 | output_vars_replaced = cg.replace(zip(bn_ps, bn_share)).outputs 154 | 155 | # Pick out the counter 156 | self._counter = self._param_from_updates(cg.updates, 'counter') 157 | self._counter_max = self._param_from_updates(cg.updates, 'counter_max') 158 | 159 | return bn_ps, bn_share, output_vars_replaced 160 | 161 | def _param_from_updates(self, updates, p_name): 162 | var_filter = VariableFilter(roles=[BNPARAM]) 163 | bn_ps = var_filter(updates.keys()) 164 | p = [p for p in bn_ps if p.name == p_name] 165 | assert len(p) == 1, 'No %s of more than one %s' % (p_name, p_name) 166 | return p[0] 167 | 168 | def reset_counter(self): 169 | if self._bn: 170 | self._counter.set_value(np.float32(1)) 171 | 172 | def replicate_vars(self, output_vars): 173 | # Problem in Blocks with multiple monitors monitoring the 174 | # same value in a graph. Therefore, they are all "replicated" to a new 175 | # Theano variable 176 | if isinstance(output_vars, (list, tuple)): 177 | return map(self.replicate_vars, output_vars) 178 | assert not hasattr(output_vars.tag, 'aggregation_scheme'), \ 179 | 'The variable %s already has an aggregator ' % output_vars.name + \ 180 | 'assigned to it - are you using a datasetmonitor with the same' + \ 181 | ' variable as output? 
This might cause trouble in Blocks' 182 | new_var = 1 * output_vars 183 | new_var.name = output_vars.name 184 | return new_var 185 | 186 | 187 | class ApproxTestMonitoring(DataStreamMonitoring, TestMonitoring): 188 | def __init__(self, output_vars, *args, **kwargs): 189 | output_vars = self.replicate_vars(output_vars) 190 | _, _, replaced_vars = self._get_bn_params(output_vars) 191 | super(ApproxTestMonitoring, self).__init__(replaced_vars, *args, 192 | **kwargs) 193 | 194 | def do(self, which_callback, *args, **kwargs): 195 | assert not which_callback == "after_batch", "Do not monitor each mb" 196 | self.reset_counter() 197 | super(ApproxTestMonitoring, self).do(which_callback, *args, **kwargs) 198 | 199 | 200 | class FinalTestMonitoring(SimpleExtension, MonitoringExtension, TestMonitoring): 201 | """Monitors validation and test set data with batch norm 202 | 203 | Calculates the training set statistics for batch normalization and adds 204 | them to the model before calculating the validation and test set values. 205 | This is done in two steps: First the training set is iterated and the 206 | statistics are saved in shared variables, then the model iterates through 207 | the test/validation set using the saved shared variables. 208 | When the training set is iterated, it is done for the full set, layer by 209 | layer so that the statistics are correct. This is expensive for very deep 210 | models, in which case some approximation could be in order 211 | """ 212 | def __init__(self, output_vars, train_data_stream, test_data_stream, 213 | **kwargs): 214 | output_vars = self.replicate_vars(output_vars) 215 | super(FinalTestMonitoring, self).__init__(**kwargs) 216 | self.trn_stream = train_data_stream 217 | self.tst_stream = test_data_stream 218 | 219 | bn_ps, bn_share, output_vars_replaced = self._get_bn_params(output_vars) 220 | 221 | if self._bn: 222 | updates = self._get_updates(bn_ps, bn_share) 223 | trn_evaluator = DatasetEvaluator(bn_ps, updates=updates) 224 | else: 225 | trn_evaluator = None 226 | 227 | self._trn_evaluator = trn_evaluator 228 | self._tst_evaluator = DatasetEvaluator(output_vars_replaced) 229 | 230 | def _get_updates(self, bn_ps, bn_share): 231 | cg = ComputationGraph(bn_ps) 232 | # Only store updates that relate to params or the counter 233 | updates = OrderedDict([(up, cg.updates[up]) for up in 234 | cg.updates if up.name == 'counter' or 235 | up in bn_share]) 236 | assert self._counter == self._param_from_updates(cg.updates, 'counter') 237 | assert self._counter_max == self._param_from_updates(cg.updates, 238 | 'counter_max') 239 | assert len(updates) == len(bn_ps) + 1, \ 240 | 'Counter or var missing from update' 241 | return updates 242 | 243 | def do(self, which_callback, *args): 244 | """Write the values of monitored variables to the log.""" 245 | assert not which_callback == "after_batch", "Do not monitor each mb" 246 | # Run on train data and get the statistics 247 | if self._bn: 248 | self._counter_max.set_value(np.float32(np.inf)) 249 | self.reset_counter() 250 | self._trn_evaluator.evaluate(self.trn_stream) 251 | self.reset_counter() 252 | 253 | value_dict = self._tst_evaluator.evaluate(self.tst_stream) 254 | self.add_records(self.main_loop.log, value_dict.items()) 255 | 256 | 257 | class LRDecay(SimpleExtension): 258 | def __init__(self, lr, decay_first, decay_last, lrmin=0., **kwargs): 259 | super(LRDecay, self).__init__(**kwargs) 260 | self.iter = 0 261 | self.decay_first = decay_first 262 | self.decay_last = decay_last 263 | self.lr = lr 264 | 
self.lrmin = lrmin 265 | self.lr_init = lr.get_value() 266 | 267 | def do(self, which_callback, *args): 268 | self.iter += 1 269 | if self.iter > self.decay_first: 270 | ratio = 1.0 * (self.decay_last - self.iter) 271 | ratio = np.maximum(0, ratio / (self.decay_last - self.decay_first + 1e-6)) 272 | self.lr.set_value(np.float32(ratio * (self.lr_init - self.lrmin) + self.lrmin)) 273 | logger.info("Iter %d, lr %f" % (self.iter, self.lr.get_value())) 274 | 275 | 276 | def global_meanpool_2d(x, num_filters): 277 | mean = tensor.mean(x.flatten(3), axis=2) 278 | mean = mean.dimshuffle(0, 1, 'x', 'x') 279 | return mean, (num_filters, 1, 1) 280 | 281 | 282 | def pool_2d(x, mode="average", ws=(2, 2), stride=(2, 2)): 283 | import theano.sandbox.cuda as cuda 284 | assert cuda.dnn.dnn_available() 285 | return cuda.dnn.dnn_pool(x, ws=ws, stride=stride, mode=mode) 286 | 287 | 288 | def maxpool_2d(z, in_dim, poolsize, poolstride): 289 | z = max_pool_2d(z, ds=poolsize, st=poolstride) 290 | output_size = tuple(DownsampleFactorMax.out_shape(in_dim, poolsize, 291 | st=poolstride)) 292 | return z, output_size 293 | -------------------------------------------------------------------------------- /run.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import functools 4 | import logging 5 | import os 6 | import subprocess 7 | from argparse import ArgumentParser, Action, SUPPRESS 8 | nodefaultargs = [] 9 | from collections import OrderedDict 10 | import sys 11 | 12 | import numpy 13 | import time 14 | import theano 15 | from theano.tensor.type import TensorType 16 | from pandas import DataFrame 17 | 18 | from blocks.algorithms import GradientDescent, Adam 19 | from blocks.extensions import FinishAfter 20 | from blocks.extensions.monitoring import TrainingDataMonitoring 21 | from blocks.filter import VariableFilter 22 | from blocks.graph import ComputationGraph 23 | from blocks.main_loop import MainLoop 24 | from blocks.model import Model 25 | from blocks.roles import PARAMETER 26 | from fuel.datasets import MNIST, CIFAR10 27 | from fuel.schemes import ShuffledScheme, SequentialScheme 28 | from fuel.streams import DataStream 29 | from fuel.transformers import Transformer 30 | 31 | from picklable_itertools import cycle, imap 32 | from itertools import izip, product, tee 33 | 34 | logger = logging.getLogger('main') 35 | 36 | from utils import ShortPrinting, prepare_dir, load_df, DummyLoop 37 | from utils import SaveExpParams, SaveLog, SaveParams, AttributeDict 38 | from nn import ZCA, ContrastNorm 39 | from nn import ApproxTestMonitoring, FinalTestMonitoring, TestMonitoring 40 | from nn import LRDecay 41 | from ladder import LadderAE 42 | 43 | debug = sys.gettrace() is not None 44 | if debug: 45 | theano.config.optimizer='fast_compile' 46 | theano.config.exception_verbosity='high' 47 | theano.config.compute_test_value = 'warn' 48 | floatX = theano.config.floatX 49 | 50 | class Whitening(Transformer): 51 | """ Makes a copy of the examples in the underlying dataset and whitens it 52 | if necessary. 
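    On construction the entire underlying dataset is pulled into memory:
    'features' are cast to float32 and passed through the optional ContrastNorm
    (`cnorm`) and ZCA (`whiten`) objects, while 'targets' are flattened by
    unify_labels. get_data() then simply slices these preprocessed arrays.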
53 | """ 54 | def __init__(self, data_stream, iteration_scheme, whiten, cnorm=None, 55 | **kwargs): 56 | super(Whitening, self).__init__(data_stream, 57 | iteration_scheme=iteration_scheme, 58 | **kwargs) 59 | data = data_stream.get_data(slice(data_stream.dataset.num_examples)) 60 | self.data = [] 61 | for s, d in zip(self.sources, data): 62 | if 'features' == s: 63 | # Fuel provides Cifar in uint8, convert to float32 64 | d = numpy.require(d, dtype=numpy.float32) 65 | if cnorm is not None: 66 | d = cnorm.apply(d) 67 | if whiten is not None: 68 | d = whiten.apply(d) 69 | self.data += [d] 70 | elif 'targets' == s: 71 | d = unify_labels(d) 72 | self.data += [d] 73 | else: 74 | raise Exception("Unsupported Fuel target: %s" % s) 75 | 76 | def get_data(self, request=None): 77 | return (s[request] for s in self.data) 78 | 79 | 80 | class SemiDataStream(Transformer): 81 | """ Combines two datastreams into one such that 'target' source (labels) 82 | is used only from the first one. The second one is renamed 83 | to avoid collision. Upon iteration, the first one is repeated until 84 | the second one depletes. 85 | """ 86 | def __init__(self, data_stream_labeled, data_stream_unlabeled, **kwargs): 87 | super(Transformer, self).__init__(**kwargs) 88 | self.ds_labeled = data_stream_labeled 89 | self.ds_unlabeled = data_stream_unlabeled 90 | # Rename the sources for clarity 91 | self.ds_labeled.sources = ('features_labeled', 'targets_labeled') 92 | # Rename the source for input pixels and hide its labels! 93 | self.ds_unlabeled.sources = ('features_unlabeled',) 94 | 95 | @property 96 | def sources(self): 97 | if hasattr(self, '_sources'): 98 | return self._sources 99 | return self.ds_labeled.sources + self.ds_unlabeled.sources 100 | 101 | @sources.setter 102 | def sources(self, value): 103 | self._sources = value 104 | 105 | def close(self): 106 | self.ds_labeled.close() 107 | self.ds_unlabeled.close() 108 | 109 | def reset(self): 110 | self.ds_labeled.reset() 111 | self.ds_unlabeled.reset() 112 | 113 | def next_epoch(self): 114 | self.ds_labeled.next_epoch() 115 | self.ds_unlabeled.next_epoch() 116 | 117 | def get_epoch_iterator(self, **kwargs): 118 | unlabeled = self.ds_unlabeled.get_epoch_iterator(**kwargs) 119 | labeled = self.ds_labeled.get_epoch_iterator(**kwargs) 120 | assert type(labeled) == type(unlabeled) 121 | 122 | return imap(self.mergedicts, cycle(labeled), unlabeled) 123 | 124 | def mergedicts(self, x, y): 125 | return dict(list(x.items()) + list(y.items())) 126 | 127 | 128 | def unify_labels(y): 129 | """ Work-around for Fuel bug where MNIST and Cifar-10 130 | datasets have different dimensionalities for the targets: 131 | e.g. (50000, 1) vs (60000,) """ 132 | yshape = y.shape 133 | y = y.flatten() 134 | assert y.shape[0] == yshape[0] 135 | return y 136 | 137 | 138 | def make_datastream(dataset, indices, batch_size, 139 | n_labeled=None, n_unlabeled=None, 140 | balanced_classes=True, whiten=None, cnorm=None, 141 | scheme=ShuffledScheme, dseed=None): 142 | """ 143 | 144 | :param dataset: 145 | :param indices: 146 | :param batch_size: 147 | :param n_labeled: None, int, list 148 | if None or 0 then all indices are used as labeled data. 149 | otherwise only the first n_labeled indices are used as labeled. 150 | If a list then balanced_classes must be true and the list specificy 151 | the number of examples to take from each category. 
If a category is 152 | too small than samples are repeated 153 | :param n_unlabeled: 154 | :param balanced_classes: 155 | :param whiten: 156 | :param cnorm: 157 | :param scheme: 158 | :return: 159 | """ 160 | if isinstance(n_labeled,tuple): 161 | assert balanced_classes 162 | n_labeled_list = n_labeled if len(n_labeled) > 1 else None 163 | n_labeled = sum(n_labeled) if len(n_labeled) > 0 else 0 164 | else: 165 | n_labeled_list = None 166 | if n_labeled is None or n_labeled <= 0: 167 | n_labeled = len(indices) 168 | if batch_size is None: 169 | batch_size = len(indices) 170 | if n_unlabeled is None or n_unlabeled < 0: 171 | n_unlabeled = len(indices) 172 | assert n_labeled <= n_unlabeled, 'need less labeled than unlabeled' 173 | 174 | all_data = dataset.data_sources[dataset.sources.index('targets')] 175 | y = unify_labels(all_data)[indices] 176 | if len(y): 177 | n_classes = y.max() + 1 178 | assert n_labeled_list is None or len(n_labeled_list) == n_classes 179 | logger.info('#samples %d #class %d' % (len(y),n_classes)) 180 | # for c in range(n_classes): 181 | # c_count = (y == c).sum() 182 | # logger.info('Class %d size %d %f%%' % (c, c_count, float(c_count)/len(y))) 183 | 184 | # Get unlabeled indices 185 | i_unlabeled = indices[:n_unlabeled] 186 | 187 | if balanced_classes and n_labeled < n_unlabeled: 188 | # Ensure each label is equally represented 189 | logger.info('Balancing %d labels...' % n_labeled) 190 | assert n_labeled % n_classes == 0 191 | n_from_each_class = n_labeled / n_classes 192 | 193 | i_labeled = [] 194 | for c in range(n_classes): 195 | n_from_class = n_from_each_class if n_labeled_list is None else n_labeled_list[c] 196 | # if a class does not have enough examples, then duplicate 197 | ids = [] 198 | while len(ids) < n_from_class: 199 | n = n_from_class - len(ids) 200 | i = (i_unlabeled[y[:n_unlabeled] == c])[:n] 201 | ids += list(i) 202 | i_labeled += ids 203 | # no need to shuffle the samples because latter 204 | # ds=SemiDataStream(...,iteration_scheme=ShuffledScheme,...) 
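# Without class balancing (or when every index is labeled) the labeled set is
# simply the first n_labeled of the given indices.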
205 | else: 206 | i_labeled = indices[:n_labeled] 207 | 208 | ds = SemiDataStream( 209 | data_stream_labeled=Whitening( 210 | DataStream(dataset), 211 | iteration_scheme=scheme(i_labeled, batch_size), 212 | whiten=whiten, cnorm=cnorm), 213 | data_stream_unlabeled=Whitening( 214 | DataStream(dataset), 215 | iteration_scheme=scheme(i_unlabeled, batch_size), 216 | whiten=whiten, cnorm=cnorm) 217 | ) 218 | return ds 219 | 220 | 221 | def setup_model(p): 222 | ladder = LadderAE(p) 223 | # Setup inputs 224 | input_type = TensorType('float32', [False] * (len(p.encoder_layers[0]) + 1)) 225 | x_only = input_type('features_unlabeled') 226 | if debug: 227 | x_only.tag.test_value = numpy.random.normal(size=(p.batch_size,)+p.encoder_layers[0]).astype(floatX) 228 | x = input_type('features_labeled') 229 | if debug: 230 | x.tag.test_value = numpy.random.normal(size=(p.batch_size,)+p.encoder_layers[0]).astype(floatX) 231 | y = theano.tensor.lvector('targets_labeled') 232 | if debug: 233 | y.tag.test_value = numpy.random.randint(1,int(p.encoder_layers[-1])+1,(p.batch_size)) 234 | ladder.apply(x, y, x_only) 235 | 236 | # Load parameters if requested 237 | if p.get('load_from'): 238 | with open(p.load_from + '/trained_params.npz') as f: 239 | loaded = numpy.load(f) 240 | cg = ComputationGraph([ladder.costs.total]) 241 | current_params = VariableFilter(roles=[PARAMETER])(cg.variables) 242 | logger.info('Loading parameters: %s' % ', '.join(loaded.keys())) 243 | for param in current_params: 244 | assert param.get_value().shape == loaded[param.name].shape 245 | param.set_value(loaded[param.name]) 246 | 247 | return ladder 248 | 249 | 250 | def load_and_log_params(cli_params): 251 | cli_params = AttributeDict(cli_params) 252 | if cli_params.get('load_from'): 253 | p = load_df(cli_params.load_from, 'params').to_dict()[0] 254 | p = AttributeDict(p) 255 | for key in cli_params.iterkeys(): 256 | if key not in p: 257 | p[key] = None 258 | new_params = cli_params 259 | loaded = True 260 | else: 261 | p = cli_params 262 | new_params = {} 263 | loaded = False 264 | 265 | # Make dseed seed unless specified explicitly 266 | if p.get('dseed') is None and p.get('seed') is not None: 267 | p['dseed'] = p['seed'] 268 | 269 | logger.info('== COMMAND LINE ==') 270 | logger.info(' '.join(sys.argv)) 271 | 272 | logger.info('== PARAMETERS ==') 273 | for k, v in p.iteritems(): 274 | replace_str = "" 275 | if loaded: 276 | if k in nodefaultargs: 277 | p[k] = new_params[k] 278 | replace_str = "<- " + str(new_params.get(k)) 279 | elif p.get(k) is None and new_params.get(k) is not None: 280 | p[k] = new_params[k] 281 | replace_str = "<- " + str(new_params.get(k)) 282 | else: 283 | if new_params.get(k) is not None: 284 | p[k] = new_params[k] 285 | replace_str = "<- " + str(new_params.get(k)) 286 | logger.info(" {:20}: {:<20} {}".format(k, v, replace_str)) 287 | return p, loaded 288 | 289 | 290 | def setup_data(p, test_set=False): 291 | if p.dataset in ['cifar10','mnist']: 292 | dataset_class, training_set_size = { 293 | 'cifar10': (CIFAR10, 40000), 294 | 'mnist': (MNIST, 50000), 295 | }[p.dataset] 296 | else: 297 | from fuel.datasets import H5PYDataset 298 | from fuel.utils import find_in_data_path 299 | from functools import partial 300 | fn=p.dataset 301 | fn=os.path.join(fn, fn + '.hdf5') 302 | def dataset_class(which_sets): 303 | return H5PYDataset(file_or_path=find_in_data_path(fn), 304 | which_sets=which_sets, 305 | load_in_memory=True) 306 | training_set_size = None 307 | 308 | train_set = dataset_class(["train"]) 309 | 310 | # 
Allow overriding the default from command line 311 | if p.get('unlabeled_samples') is not None and p.unlabeled_samples >= 0: 312 | training_set_size = p.unlabeled_samples 313 | elif training_set_size is None: 314 | training_set_size = train_set.num_examples 315 | 316 | # Make sure the MNIST data is in right format 317 | if p.dataset == 'mnist': 318 | d = train_set.data_sources[train_set.sources.index('features')] 319 | assert numpy.all(d <= 1.0) and numpy.all(d >= 0.0), \ 320 | 'Make sure data is in float format and in range 0 to 1' 321 | 322 | # Take all indices and permutate them 323 | all_ind = numpy.arange(train_set.num_examples) 324 | if p.get('dseed'): 325 | rng = numpy.random.RandomState(seed=p.dseed) 326 | rng.shuffle(all_ind) 327 | 328 | d = AttributeDict() 329 | 330 | # Choose the training set 331 | d.train = train_set 332 | d.train_ind = all_ind[:training_set_size] 333 | 334 | # Then choose validation set from the remaining indices 335 | d.valid = train_set 336 | d.valid_ind = numpy.setdiff1d(all_ind, d.train_ind)[:p.valid_set_size] 337 | logger.info('Using %d examples for validation' % len(d.valid_ind)) 338 | 339 | # Only touch test data if requested 340 | if test_set: 341 | d.test = dataset_class(["test"]) 342 | d.test_ind = numpy.arange(d.test.num_examples) 343 | 344 | # Setup optional whitening, only used for Cifar-10 345 | in_dim = train_set.data_sources[train_set.sources.index('features')].shape[1:] 346 | if len(in_dim) > 1 and p.whiten_zca > 0: 347 | assert numpy.product(in_dim) == p.whiten_zca, \ 348 | 'Need %d whitening dimensions, not %d' % (numpy.product(in_dim), 349 | p.whiten_zca) 350 | cnorm = ContrastNorm(p.contrast_norm) if p.contrast_norm != 0 else None 351 | 352 | def get_data(d, i): 353 | data = d.get_data(request=i)[d.sources.index('features')] 354 | # Fuel provides Cifar in uint8, convert to float32 355 | data = numpy.require(data, dtype=numpy.float32) 356 | return data if cnorm is None else cnorm.apply(data) 357 | 358 | if p.whiten_zca > 0: 359 | logger.info('Whitening using %d ZCA components' % p.whiten_zca) 360 | whiten = ZCA() 361 | whiten.fit(p.whiten_zca, get_data(d.train, d.train_ind)) 362 | else: 363 | whiten = None 364 | 365 | return in_dim, d, whiten, cnorm 366 | 367 | 368 | def get_error(args): 369 | """ Calculate the classification error 370 | called when evaluating 371 | """ 372 | args['data_type'] = args.get('data_type', 'test') 373 | args['no_load'] = 'g_' 374 | 375 | targets, acts = analyze(args) 376 | guess = numpy.argmax(acts, axis=1) 377 | correct = numpy.sum(numpy.equal(guess, targets.flatten())) 378 | 379 | return (1. - correct / float(len(guess))) * 100. 
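# Hypothetical usage sketch: get_error() can also be called directly from
# Python with the same keys the 'evaluate' sub-command would pass, assuming
# 'results/noname0' is a directory produced by an earlier `run.py train` run
# (the directory name is illustrative only):
#
#     err = get_error({'load_from': 'results/noname0', 'data_type': 'valid'})
#     print('validation error %.2f%%' % err)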
380 | 381 | 382 | def get_layer(args): 383 | """ Get the output of the layer just below softmax 384 | """ 385 | args['data_type'] = args.get('data_type', 'test') 386 | args['no_load'] = 'g_' 387 | args['layer'] = args.get('layer', -1) 388 | 389 | targets, acts = analyze(args) 390 | 391 | return acts 392 | 393 | 394 | def analyze(cli_params): 395 | """ 396 | called when evaluating 397 | :return: inputs, result 398 | """ 399 | p, _ = load_and_log_params(cli_params) 400 | _, data, whiten, cnorm = setup_data(p, test_set=(p.data_type == 'test')) 401 | ladder = setup_model(p) 402 | 403 | # Analyze activations 404 | if p.data_type == 'train': 405 | dset, indices, calc_batchnorm = data.train, data.train_ind, False 406 | elif p.data_type == 'valid': 407 | dset, indices, calc_batchnorm = data.valid, data.valid_ind, True 408 | elif p.data_type == 'test': 409 | dset, indices, calc_batchnorm = data.test, data.test_ind, True 410 | else: 411 | raise Exception("Unknown data-type %s"%p.data_type) 412 | 413 | if calc_batchnorm: 414 | logger.info('Calculating batch normalization for clean.labeled path') 415 | main_loop = DummyLoop( 416 | extensions=[ 417 | FinalTestMonitoring( 418 | [ladder.costs.class_clean, ladder.error.clean, ladder.oos.clean] 419 | + ladder.costs.denois.values(), 420 | make_datastream(data.train, data.train_ind, 421 | # These need to match with the training 422 | p.batch_size, 423 | n_labeled=p.labeled_samples, 424 | n_unlabeled=len(data.train_ind), 425 | cnorm=cnorm, 426 | balanced_classes=p.balanced_classes, 427 | whiten=whiten, scheme=ShuffledScheme), 428 | make_datastream(data.valid, data.valid_ind, 429 | p.valid_batch_size, 430 | n_labeled=len(data.valid_ind), 431 | n_unlabeled=len(data.valid_ind), 432 | balanced_classes=p.balanced_classes, 433 | cnorm=cnorm, 434 | whiten=whiten, scheme=ShuffledScheme), 435 | prefix="valid_final", before_training=True), 436 | ShortPrinting({ 437 | "valid_final": OrderedDict([ 438 | ('VF_C_class', ladder.costs.class_clean), 439 | ('VF_E', ladder.error.clean), 440 | ('VF_O', ladder.oos.clean), 441 | ('VF_C_de', [ladder.costs.denois.get(0), 442 | ladder.costs.denois.get(1), 443 | ladder.costs.denois.get(2), 444 | ladder.costs.denois.get(3)]), 445 | ]), 446 | }, after_training=True, use_log=False), 447 | ]) 448 | main_loop.run() 449 | # df = DataFrame.from_dict(main_loop.log, orient='index') 450 | # col = 'valid_final_error_rate_clean' 451 | # logger.info('%s %g' % (col, df[col].iloc[-1])) 452 | 453 | # Make a datastream that has all the indices in the labeled pathway 454 | ds = make_datastream(dset, indices, 455 | batch_size=p.get('batch_size'), 456 | n_labeled=len(indices), 457 | n_unlabeled=len(indices), 458 | balanced_classes=False, 459 | whiten=whiten, 460 | cnorm=cnorm, 461 | scheme=SequentialScheme) 462 | 463 | # If layer=-1 we want out the values after softmax 464 | outputs = ladder.act.clean.labeled.h[len(ladder.layers) - 1] 465 | 466 | # Replace the batch normalization paramameters with the shared variables 467 | if calc_batchnorm: 468 | outputreplacer = TestMonitoring() 469 | _, _, outputs = outputreplacer._get_bn_params(outputs) 470 | 471 | cg = ComputationGraph(outputs) 472 | f = cg.get_theano_function() 473 | 474 | it = ds.get_epoch_iterator(as_dict=True) 475 | res = [] 476 | inputs = {'features_labeled': [], 477 | 'targets_labeled': [], 478 | 'features_unlabeled': []} 479 | # Loop over one epoch 480 | for d in it: 481 | # Store all inputs 482 | for k, v in d.iteritems(): 483 | inputs[k] += [v] 484 | # Store outputs 485 | res += 
[f(*[d[str(inp)] for inp in cg.inputs])] 486 | 487 | # Concatenate all minibatches 488 | res = [numpy.vstack(minibatches) for minibatches in zip(*res)] 489 | inputs = {k: numpy.concatenate(v) for k, v in inputs.iteritems()} 490 | 491 | return inputs['targets_labeled'], res[0] 492 | 493 | def dump_unlabeled_encoder(cli_params): 494 | """ 495 | called when dumping 496 | :return: inputs, result 497 | """ 498 | p, _ = load_and_log_params(cli_params) 499 | _, data, whiten, cnorm = setup_data(p, test_set=(p.data_type == 'test')) 500 | ladder = setup_model(p) 501 | 502 | # Analyze activations 503 | if p.data_type == 'train': 504 | dset, indices, calc_batchnorm = data.train, data.train_ind, False 505 | elif p.data_type == 'valid': 506 | dset, indices, calc_batchnorm = data.valid, data.valid_ind, True 507 | elif p.data_type == 'test': 508 | dset, indices, calc_batchnorm = data.test, data.test_ind, True 509 | else: 510 | raise Exception("Unknown data-type %s"%p.data_type) 511 | 512 | if calc_batchnorm: 513 | logger.info('Calculating batch normalization for clean.labeled path') 514 | main_loop = DummyLoop( 515 | extensions=[ 516 | FinalTestMonitoring( 517 | [ladder.costs.class_clean, ladder.error.clean, ladder.oos.clean] 518 | + ladder.costs.denois.values(), 519 | make_datastream(data.train, data.train_ind, 520 | # These need to match with the training 521 | p.batch_size, 522 | n_labeled=p.labeled_samples, 523 | n_unlabeled=len(data.train_ind), 524 | balanced_classes=p.balanced_classes, 525 | cnorm=cnorm, 526 | whiten=whiten, scheme=ShuffledScheme), 527 | make_datastream(data.valid, data.valid_ind, 528 | p.valid_batch_size, 529 | n_labeled=len(data.valid_ind), 530 | n_unlabeled=len(data.valid_ind), 531 | balanced_classes=p.balanced_classes, 532 | cnorm=cnorm, 533 | whiten=whiten, scheme=ShuffledScheme), 534 | prefix="valid_final", before_training=True), 535 | ShortPrinting({ 536 | "valid_final": OrderedDict([ 537 | ('VF_C_class', ladder.costs.class_clean), 538 | ('VF_E', ladder.error.clean), 539 | ('VF_O', ladder.oos.clean), 540 | ('VF_C_de', [ladder.costs.denois.get(0), 541 | ladder.costs.denois.get(1), 542 | ladder.costs.denois.get(2), 543 | ladder.costs.denois.get(3)]), 544 | ]), 545 | }, after_training=True, use_log=False), 546 | ]) 547 | main_loop.run() 548 | 549 | all_ind = numpy.arange(dset.num_examples) 550 | # Make a datastream that has all the indices in the labeled pathway 551 | ds = make_datastream(dset, all_ind, 552 | batch_size=p.get('batch_size'), 553 | n_labeled=len(all_ind), 554 | n_unlabeled=len(all_ind), 555 | balanced_classes=False, 556 | whiten=whiten, 557 | cnorm=cnorm, 558 | scheme=SequentialScheme) 559 | 560 | # If layer=-1 we want out the values after softmax 561 | if p.layer < 0: 562 | # ladder.act.clean.unlabeled.h is a dict not a list 563 | outputs = ladder.act.clean.labeled.h[len(ladder.layers) + p.layer] 564 | else: 565 | outputs = ladder.act.clean.labeled.h[p.layer] 566 | 567 | # Replace the batch normalization paramameters with the shared variables 568 | if calc_batchnorm: 569 | outputreplacer = TestMonitoring() 570 | _, _, outputs = outputreplacer._get_bn_params(outputs) 571 | 572 | cg = ComputationGraph(outputs) 573 | f = cg.get_theano_function() 574 | 575 | it = ds.get_epoch_iterator(as_dict=True) 576 | res = [] 577 | 578 | # Loop over one epoch 579 | for d in it: 580 | # Store outputs 581 | res += [f(*[d[str(inp)] for inp in cg.inputs])] 582 | 583 | # Concatenate all minibatches 584 | res = [numpy.vstack(minibatches) for minibatches in zip(*res)] 585 | 586 | return 
res[0] 587 | 588 | 589 | def train(cli_params): 590 | fn = 'noname' 591 | if 'save_to' in nodefaultargs or not cli_params.get('load_from'): 592 | fn = cli_params['save_to'] 593 | cli_params['save_dir'] = prepare_dir(fn) 594 | nodefaultargs.append('save_dir') 595 | 596 | logfile = os.path.join(cli_params['save_dir'], 'log.txt') 597 | 598 | # Log also DEBUG to a file 599 | fh = logging.FileHandler(filename=logfile) 600 | fh.setLevel(logging.DEBUG) 601 | logger.addHandler(fh) 602 | 603 | logger.info('Logging into %s' % logfile) 604 | 605 | p, loaded = load_and_log_params(cli_params) 606 | 607 | in_dim, data, whiten, cnorm = setup_data(p, test_set=False) 608 | if not loaded: 609 | # Set the zero layer to match input dimensions 610 | p.encoder_layers = (in_dim,) + p.encoder_layers 611 | 612 | ladder = setup_model(p) 613 | 614 | # Training 615 | all_params = ComputationGraph([ladder.costs.total]).parameters 616 | logger.info('Found the following parameters: %s' % str(all_params)) 617 | 618 | # Fetch all batch normalization updates. They are in the clean path. 619 | # you can turn off BN by setting is_normalizing = False in ladder.py 620 | bn_updates = ComputationGraph([ladder.costs.class_clean]).updates 621 | assert not bn_updates or 'counter' in [u.name for u in bn_updates.keys()], \ 622 | 'No batch norm params in graph - the graph has been cut?' 623 | 624 | training_algorithm = GradientDescent( 625 | cost=ladder.costs.total, parameters=all_params, 626 | step_rule=Adam(learning_rate=ladder.lr)) 627 | # In addition to actual training, also do BN variable approximations 628 | if bn_updates: 629 | training_algorithm.add_updates(bn_updates) 630 | 631 | short_prints = { 632 | "train": OrderedDict([ 633 | ('T_E', ladder.error.clean), 634 | ('T_O', ladder.oos.clean), 635 | ('T_C_class', ladder.costs.class_corr), 636 | ('T_C_de', ladder.costs.denois.values()), 637 | ('T_T', ladder.costs.total), 638 | ]), 639 | "valid_approx": OrderedDict([ 640 | ('V_C_class', ladder.costs.class_clean), 641 | ('V_E', ladder.error.clean), 642 | ('V_O', ladder.oos.clean), 643 | ('V_C_de', ladder.costs.denois.values()), 644 | ('V_T', ladder.costs.total), 645 | ]), 646 | "valid_final": OrderedDict([ 647 | ('VF_C_class', ladder.costs.class_clean), 648 | ('VF_E', ladder.error.clean), 649 | ('VF_O', ladder.oos.clean), 650 | ('VF_C_de', ladder.costs.denois.values()), 651 | ('V_T', ladder.costs.total), 652 | ]), 653 | } 654 | 655 | if len(data.valid_ind): 656 | main_loop = MainLoop( 657 | training_algorithm, 658 | # Datastream used for training 659 | make_datastream(data.train, data.train_ind, 660 | p.batch_size, 661 | n_labeled=p.labeled_samples, 662 | n_unlabeled=p.unlabeled_samples, 663 | whiten=whiten, 664 | cnorm=cnorm, 665 | balanced_classes=p.balanced_classes, 666 | dseed=p.dseed), 667 | model=Model(ladder.costs.total), 668 | extensions=[ 669 | FinishAfter(after_n_epochs=p.num_epochs), 670 | 671 | # This will estimate the validation error using 672 | # running average estimates of the batch normalization 673 | # parameters, mean and variance 674 | ApproxTestMonitoring( 675 | [ladder.costs.class_clean, ladder.error.clean, ladder.oos.clean, ladder.costs.total] 676 | + ladder.costs.denois.values(), 677 | make_datastream(data.valid, data.valid_ind, 678 | p.valid_batch_size, whiten=whiten, cnorm=cnorm, 679 | balanced_classes=p.balanced_classes, 680 | scheme=ShuffledScheme), 681 | prefix="valid_approx"), 682 | 683 | # This Monitor is slower, but more accurate since it will first 684 | # estimate batch normalization parameters 
from training data and 685 | # then do another pass to calculate the validation error. 686 | FinalTestMonitoring( 687 | [ladder.costs.class_clean, ladder.error.clean, ladder.oos.clean, ladder.costs.total] 688 | + ladder.costs.denois.values(), 689 | make_datastream(data.train, data.train_ind, 690 | p.batch_size, 691 | n_labeled=p.labeled_samples, 692 | whiten=whiten, cnorm=cnorm, 693 | balanced_classes=p.balanced_classes, 694 | scheme=ShuffledScheme), 695 | make_datastream(data.valid, data.valid_ind, 696 | p.valid_batch_size, 697 | n_labeled=len(data.valid_ind), 698 | whiten=whiten, cnorm=cnorm, 699 | balanced_classes=p.balanced_classes, 700 | scheme=ShuffledScheme), 701 | prefix="valid_final", 702 | after_n_epochs=p.num_epochs, after_training=True), 703 | 704 | TrainingDataMonitoring( 705 | [ladder.error.clean, ladder.oos.clean, ladder.costs.total, ladder.costs.class_corr, 706 | training_algorithm.total_gradient_norm] 707 | + ladder.costs.denois.values(), 708 | prefix="train", after_epoch=True), 709 | # ladder.costs.class_clean - save model whenever we have best validation result another option `('train',ladder.costs.total)` 710 | SaveParams(('valid_approx', ladder.error.clean), all_params, p.save_dir, after_epoch=True), 711 | SaveExpParams(p, p.save_dir, before_training=True), 712 | SaveLog(p.save_dir, after_training=True), 713 | ShortPrinting(short_prints), 714 | LRDecay(ladder.lr, p.num_epochs * p.lrate_decay, p.num_epochs, lrmin=p.lrmin, 715 | after_epoch=True), 716 | ]) 717 | else: 718 | main_loop = MainLoop( 719 | training_algorithm, 720 | # Datastream used for training 721 | make_datastream(data.train, data.train_ind, 722 | p.batch_size, 723 | n_labeled=p.labeled_samples, 724 | n_unlabeled=p.unlabeled_samples, 725 | whiten=whiten, 726 | cnorm=cnorm, 727 | balanced_classes=p.balanced_classes, 728 | dseed=p.dseed), 729 | model=Model(ladder.costs.total), 730 | extensions=[ 731 | FinishAfter(after_n_epochs=p.num_epochs), 732 | TrainingDataMonitoring( 733 | [ladder.error.clean, ladder.oos.clean, ladder.costs.total, ladder.costs.class_corr, 734 | training_algorithm.total_gradient_norm] 735 | + ladder.costs.denois.values(), 736 | prefix="train", after_epoch=True), 737 | # ladder.costs.class_clean - save model whenever we have best validation result another option `('train',ladder.costs.total)` 738 | SaveParams(('train', ladder.error.clean), all_params, p.save_dir, after_epoch=True), 739 | SaveExpParams(p, p.save_dir, before_training=True), 740 | SaveLog(p.save_dir, after_training=True), 741 | ShortPrinting(short_prints), 742 | LRDecay(ladder.lr, p.num_epochs * p.lrate_decay, p.num_epochs, lrmin=p.lrmin, 743 | after_epoch=True), 744 | ]) 745 | main_loop.run() 746 | 747 | # Get results 748 | if len(data.valid_ind) == 0 : 749 | return None 750 | 751 | df = DataFrame.from_dict(main_loop.log, orient='index') 752 | col = 'valid_final_error_rate_clean' 753 | logger.info('%s %g' % (col, df[col].iloc[-1])) 754 | 755 | if main_loop.log.status['epoch_interrupt_received']: 756 | return None 757 | return df 758 | 759 | if __name__ == "__main__": 760 | logging.basicConfig(level=logging.INFO) 761 | 762 | rep = lambda s: s.replace('-', ',') 763 | chop = lambda s: s.split(',') 764 | to_int = lambda ss: [int(s) for s in ss if s.isdigit()] 765 | to_float = lambda ss: [float(s) for s in ss] 766 | 767 | def to_bool(s): 768 | if s.lower() in ['true', 't']: 769 | return True 770 | elif s.lower() in ['false', 'f']: 771 | return False 772 | else: 773 | raise Exception("Unknown bool value %s" % s) 774 | 775 | def 
compose(*funs): 776 | return functools.reduce(lambda f, g: lambda x: f(g(x)), funs) 777 | 778 | # Functional parsing logic to allow flexible function compositions 779 | # as actions for ArgumentParser 780 | def funcs(additional_arg): 781 | class customAction(Action): 782 | def __call__(self, parser, args, values, option_string=None): 783 | 784 | def process(arg, func_list): 785 | if arg is None: 786 | return None 787 | elif type(arg) is list: 788 | return map(compose(*func_list), arg) 789 | else: 790 | return compose(*func_list)(arg) 791 | 792 | setattr(args, self.dest, process(values, additional_arg)) 793 | return customAction 794 | 795 | def add_train_params(parser, use_defaults): 796 | a = parser.add_argument 797 | default = lambda x: x if use_defaults else None 798 | 799 | # General hyper parameters and settings 800 | a("save_to", help="Destination to save the state and results", 801 | default=default("noname"), nargs="?") 802 | a("--num-epochs", help="Number of training epochs", 803 | type=int, default=default(150)) 804 | a("--seed", help="Seed", 805 | type=int, default=default([1]), nargs='+') 806 | a("--dseed", help="Data permutation seed, defaults to 'seed'", 807 | type=int, default=default([None]), nargs='+') 808 | a("--labeled-samples", help="How many supervised samples are used. " 809 | "By default all indices are used as labeled data. " 810 | "If a number is given then only the first samples are used as labeled. " 811 | "If a list is given then the list specificy the number of samples to " 812 | "take from each category and if a category is too small than samples " 813 | "are repeated", 814 | type=str, default=default(None), nargs='+', action=funcs([tuple, to_int, chop])) 815 | a("--unlabeled-samples", help="How many unsupervised samples are used", 816 | type=int, default=default(None), nargs='+') 817 | a("--dataset", type=str, default=default(['mnist']), nargs='+', 818 | help="Which dataset to use. 
mnist, cifar10 or your own hdf5") 819 | a("--lr", help="Initial learning rate", 820 | type=float, default=default([0.002]), nargs='+') 821 | a("--lrmin", help="minimal learning rate", 822 | type=float, default=default([0.]), nargs='+') 823 | a("--lrate-decay", help="When to linearly start decaying lrate (0-1)", 824 | type=float, default=default([0.67]), nargs='+') 825 | a("--alpha", 826 | type=float, default=default([0.]), nargs='+', 827 | help='Weight of self-entropy cost applied to corrupted predictions') 828 | a("--alpha-clean", 829 | type=float, default=default([0.]), nargs='+', 830 | help='Weight of self-entropy cost applied to clean predictions') 831 | a("--beta", help="Weight of cross entropy cost between aprior and average", 832 | type=float, default=default([0.15]), nargs='+') 833 | a("--dbeta", help="Dirichlet correction", 834 | type=float, default=default([0.]), nargs='+') 835 | a("--gamma", help="Weight of binary classifier cost", 836 | type=float, default=default([0.01]), nargs='+') 837 | a("--gamma1", help="", 838 | type=float, default=default([-1.]), nargs='+') 839 | a("--batch-size", help="Minibatch size", 840 | type=int, default=default([100]), nargs='+') 841 | a("--valid-batch-size", help="Minibatch size for validation data", 842 | type=int, default=default([100]), nargs='+') 843 | a("--valid-set-size", help="Upper limit on number of examples in " 844 | "validation set, taken from the examples " 845 | "not used in unlabeled samples", 846 | type=int, default=default([10000]), nargs='+') 847 | 848 | # Hyperparameters controlling supervised path 849 | a("--super-noise-std", help="Noise added to supervised learning path", 850 | type=float, default=default([0.3]), nargs='+') 851 | a("--f-local-noise-std", help="Noise added encoder path", 852 | type=str, default=default([0.3]), nargs='+', 853 | action=funcs([tuple, to_float, chop])) 854 | a("--act", nargs='+', type=str, action=funcs([tuple, chop, rep]), 855 | default=default(["relu"]), help="List of activation functions") 856 | a("--encoder-layers", help="List of layers for f", 857 | type=str, action=funcs([tuple, chop, rep])) #default=default(()), 858 | 859 | # Hyperparameters controlling unsupervised training 860 | a("--denoising-cost-x", help="Weight of the denoising cost.", 861 | type=str, default=default([(0.,)]), nargs='+', 862 | action=funcs([tuple, to_float, chop])) 863 | a("--decoder-spec", help="List of decoding function types", nargs='+', 864 | type=str, default=default(['sig']), action=funcs([tuple, chop, rep])) 865 | a("--zestbn", type=str, default=default(['bugfix']), nargs='+', 866 | choices=['bugfix', 'no'], help="How to do zest bn") 867 | 868 | # Hyperparameters used for Cifar training 869 | a("--contrast-norm", help="Scale of contrast normalization (0=off)", 870 | type=int, default=default([0]), nargs='+') 871 | a("--top-c", help="Have c at softmax?", action=funcs([to_bool]), 872 | default=default([True]), nargs='+') 873 | a("--whiten-zca", help="Whether to whiten the data with ZCA", 874 | type=int, default=default([0]), nargs='+') 875 | a('--load_from', type=str, 876 | help="Destination to load the state from") 877 | a("--oos-thr", help="Minimal probability for maximal label, below which label is assumed to be OOS", 878 | type=float, default=default([0.]), nargs='+') 879 | a("-C", "--balanced_classes", 880 | help="DONT balance classes, relevant if labeled-samples < unlabeled-samples", 881 | action='store_false', 882 | default=True) 883 | 884 | ap = ArgumentParser("Semisupervised experiment") 885 | subparsers 
886 | 
887 | # TRAIN
888 | train_cmd = subparsers.add_parser('train', help='Train a new model')
889 | add_train_params(train_cmd, use_defaults=True)
890 | 
891 | # EVALUATE
892 | load_cmd = subparsers.add_parser('evaluate', help='Evaluate test error')
893 | load_cmd.add_argument('load_from', type=str,
894 |                       help="Destination to load the state from")
895 | load_cmd.add_argument('--data-type', type=str, default='test',
896 |                       help="Data set to evaluate on")
897 | load_cmd.add_argument("-C", "--balanced_classes",
898 |                       help="Don't balance classes; relevant if labeled-samples < unlabeled-samples",
899 |                       action='store_false',
900 |                       default=True)
901 | 
902 | # DUMP
903 | dump_cmd = subparsers.add_parser('dump', help='Store the output of an encoder layer for all inputs')
904 | dump_cmd.add_argument('load_from', type=str,
905 |                       help="Destination to load the state from, and where to save the dump")
906 | # dump_cmd.add_argument("--dataset", type=str, default=default(['mnist']), nargs='+',
907 | #                       help="Which dataset to use. mnist, cifar10 or your own hdf5")
908 | dump_cmd.add_argument('--data-type', type=str, default='test',
909 |                       help="Data set to evaluate on")
910 | dump_cmd.add_argument("--layer", type=int, default=-1,
911 |                       help="Which layer to dump (default: top)")
912 | dump_cmd.add_argument("--super-noise-std", help="Noise added to supervised learning path",
913 |                       type=float, default=0.3)
914 | dump_cmd.add_argument("--f-local-noise-std", help="Noise added to encoder path",
915 |                       type=str, default=0.3, nargs='+',
916 |                       action=funcs([tuple, to_float, chop]))
917 | dump_cmd.add_argument("-C", "--balanced_classes",
918 |                       help="Don't balance classes; relevant if labeled-samples < unlabeled-samples",
919 |                       action='store_false',
920 |                       default=True)
921 | args = ap.parse_args()
922 | 
923 | if args.load_from:
924 |     ap.set_defaults(**dict((k, None) for k in vars(args).iterkeys()))
925 |     nodefaultargs = [k for k, v in vars(ap.parse_args()).iteritems() if v is not None]
926 |     # dump the entire data-set. Override values loaded from the saved state
927 |     # if args.cmd == 'dump':
928 |     #     args.labeled_samples = -1
929 |     #     args.unlabeled_samples = -1
930 |     #     args.super_noise_std = 0.
931 |     #     args.f_local_noise_std = 0.
932 | 
933 | subp = subprocess.Popen(['git', 'rev-parse', 'HEAD'],
934 |                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
935 |                         stderr=subprocess.PIPE)
936 | out, err = subp.communicate()
937 | args.commit = out.strip()
938 | if err.strip():
939 |     logger.error('Subprocess returned %s' % err.strip())
940 | 
941 | t_start = time.time()
942 | if args.cmd == 'evaluate':
943 |     for k, v in vars(args).iteritems():
944 |         if type(v) is list:
945 |             assert len(v) == 1, "should not be a list when loading: %s" % k
946 |             logger.info("%s" % str(v[0]))
947 |             vars(args)[k] = v[0]
948 | 
949 |     err = get_error(vars(args))
950 |     logger.info('Test error: %f' % err)
951 | elif args.cmd == 'dump':
952 |     layer = dump_unlabeled_encoder(vars(args))
953 |     fname = os.path.join(args.load_from, 'layer%d' % args.layer)
954 |     logger.info("Saving dump to %s" % fname)
955 |     numpy.save(fname, layer)
956 | elif args.cmd == "train":
957 |     listdicts = {k: v for k, v in vars(args).iteritems() if type(v) is list}
958 |     therest = {k: v for k, v in vars(args).iteritems() if type(v) is not list}
959 | 
960 |     gen1, gen2 = tee(product(*listdicts.itervalues()))
961 | 
962 |     l = len(list(gen1))
963 |     for i, d in enumerate(dict(izip(listdicts, x)) for x in gen2):
964 |         if l > 1:
965 |             logger.info('Training configuration %d / %d' % (i+1, l))
966 |         d.update(therest)
967 |         if train(d) is None:
968 |             break
969 | logger.info('Took %.1f minutes' % ((time.time() - t_start) / 60.))
970 | 
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | 
3 | import logging
4 | import numpy as np
5 | import theano
6 | from pandas import DataFrame, read_hdf
7 | 
8 | from blocks.extensions import Printing, SimpleExtension
9 | from blocks.main_loop import MainLoop
10 | from blocks.roles import add_role
11 | 
12 | import sys
13 | debug = sys.gettrace() is not None
14 | 
15 | logger = logging.getLogger('main.utils')
16 | 
17 | 
18 | def shared_param(init, name, cast_float32, role, **kwargs):
19 |     # cast to float32 (Theano GPU dtype) when requested; otherwise keep the original dtype
20 |     v = np.float32(init) if cast_float32 else init
21 |     p = theano.shared(v, name=name, **kwargs)
22 |     if debug:
23 |         p.tag.test_value = v
24 |     add_role(p, role)
25 |     return p
26 | 
27 | 
28 | class AttributeDict(dict):
29 |     __getattr__ = dict.__getitem__
30 | 
31 |     def __setattr__(self, a, b):
32 |         self.__setitem__(a, b)
33 | 
34 | 
35 | class DummyLoop(MainLoop):
36 |     def __init__(self, extensions):
37 |         return super(DummyLoop, self).__init__(algorithm=None,
38 |                                                data_stream=None,
39 |                                                extensions=extensions)
40 | 
41 |     def run(self):
42 |         for extension in self.extensions:
43 |             extension.main_loop = self
44 |         self._run_extensions('before_training')
45 |         self._run_extensions('after_training')
46 | 
47 | 
48 | class ShortPrinting(Printing):
49 |     def __init__(self, to_print, use_log=True, **kwargs):
50 |         self.to_print = to_print
51 |         self.use_log = use_log
52 |         super(ShortPrinting, self).__init__(**kwargs)
53 | 
54 |     def do(self, which_callback, *args):
55 |         log = self.main_loop.log
56 | 
57 |         # Iteration
58 |         msg = "e {}, i {}:".format(
59 |             log.status['epochs_done'],
60 |             log.status['iterations_done'])
61 | 
62 |         # Requested channels
63 |         items = []
64 |         for k, vars in self.to_print.iteritems():
65 |             for shortname, vars in vars.iteritems():
66 |                 if vars is None:
67 |                     continue
68 |                 if type(vars) is not list:
69 |                     vars = [vars]
70 | 
71 |                 s = ""
72 |                 for var in vars:
73 |                     try:
74 |                         name = k + '_' + var.name
75 |                         val = log.current_row[name]
76 |                     except:
77 |                         continue
78 |                     try:
79 |                         s += ' ' + ' '.join(["%.3g" % v for v in val])
80 |                     except:
81 |                         s += " %.3g" % val
82 |                 if s != "":
83 |                     items += [shortname + s]
84 |         msg = msg + ", ".join(items)
85 |         if self.use_log:
86 |             logger.info(msg)
87 |         else:
88 |             print msg
89 | 
90 | 
91 | class SaveParams(SimpleExtension):
92 |     """Saves model parameters, keeping the best set according to `trigger_var`."""
93 |     def __init__(self, trigger_var, params, save_path, save_every=10, **kwargs):
94 |         super(SaveParams, self).__init__(**kwargs)
95 |         if trigger_var is None:
96 |             self.var_name = None
97 |         else:
98 |             self.var_name = trigger_var[0] + '_' + trigger_var[1].name
99 |         self.save_path = save_path
100 |         self.params = params
101 |         self.to_save = {}
102 |         self.best_value = None
103 |         self.add_condition(['after_training'], self.save)
104 |         self.add_condition(['on_interrupt'], self.save)
105 |         self.save_every = save_every
106 |         self.save_every_count = 0
107 | 
108 |     def save(self, which_callback, *args):
109 |         if self.var_name is None:
110 |             self.to_save = {v.name: v.get_value() for v in self.params}
111 |         path = self.save_path + '/trained_params'
112 |         logger.info('Saving to %s' % path)
113 |         np.savez_compressed(path, **self.to_save)
114 | 
115 |     def do(self, which_callback, *args):
116 |         self.save_every_count += 1
117 |         if self.save_every and self.save_every_count % self.save_every == 0:
118 |             self.save(which_callback, *args)
119 |         if self.var_name is None:
120 |             return
121 |         val = self.main_loop.log.current_row[self.var_name]
122 |         if self.best_value is None or val <= self.best_value:
123 |             self.best_value = val
124 |             logger.info('Best value %f' % val)
125 |             self.to_save = {v.name: v.get_value().copy() for v in self.params}
126 | 
127 | class SaveExpParams(SimpleExtension):
128 |     def __init__(self, experiment_params, dir, **kwargs):
129 |         super(SaveExpParams, self).__init__(**kwargs)
130 |         self.dir = dir
131 |         self.experiment_params = experiment_params
132 | 
133 |     def do(self, which_callback, *args):
134 |         df = DataFrame.from_dict(self.experiment_params, orient='index')
135 |         df.to_hdf(os.path.join(self.dir, 'params'), 'params', mode='w',
136 |                   complevel=5, complib='blosc')
137 | 
138 | 
139 | class SaveLog(SimpleExtension):
140 |     def __init__(self, dir, show=None, **kwargs):
141 |         super(SaveLog, self).__init__(**kwargs)
142 |         self.dir = dir
143 |         self.show = show if show is not None else []
144 | 
145 |     def do(self, which_callback, *args):
146 |         df = DataFrame.from_dict(self.main_loop.log, orient='index')
147 |         df.to_hdf(os.path.join(self.dir, 'log'), 'log', mode='w',
148 |                   complevel=5, complib='blosc')
149 | 
150 | 
151 | def prepare_dir(save_to, results_dir='results'):
152 |     base = os.path.join(results_dir, save_to)
153 |     i = 0
154 | 
155 |     while True:
156 |         name = base + str(i)
157 |         try:
158 |             os.makedirs(name)
159 |             break
160 |         except:
161 |             i += 1
162 | 
163 |     return name
164 | 
165 | 
166 | def load_df(dirpath, filename, varname=None):
167 |     varname = filename if varname is None else varname
168 |     fn = os.path.join(dirpath, filename)
169 |     return read_hdf(fn, varname)
170 | 
171 | 
172 | def filter_funcs_prefix(d, pfx='cmd_'):
173 |     # keep only the keys containing the prefix, stripping everything up to and including it
174 |     fp = lambda x: x.find(pfx)
175 |     return {n[fp(n) + len(pfx):]: v for n, v in d.iteritems() if fp(n) >= 0}
176 | 
--------------------------------------------------------------------------------
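
The CLI defined in run.py and the save/load helpers in utils.py form a small experiment workflow: `train` sweeps the cross product of any list-valued hyperparameters, and each run writes its hyperparameters, log, and weights into a directory created by `prepare_dir`. The sketch below is illustrative only: the run-directory name (`results/ladder0`) and the flag values in the comments are assumptions, not taken from the repository; the loading calls follow the SaveExpParams, SaveLog, and SaveParams extensions shown above.

# Illustrative sketch only -- 'results/ladder0' and the flag values are assumptions.
#
# Typical invocations of the run.py CLI defined above:
#   python run.py train --lr 0.002 0.001 --denoising-cost-x 1,0.1,0.1
#   python run.py evaluate results/ladder0 --data-type test
#   python run.py dump results/ladder0 --layer -1
#
# Reading back the artifacts written by the utils.py extensions
# (SaveExpParams -> 'params', SaveLog -> 'log', SaveParams -> 'trained_params.npz'):

import os

import numpy as np

from utils import load_df

run_dir = 'results/ladder0'           # hypothetical run directory created by prepare_dir()

params = load_df(run_dir, 'params')   # experiment hyperparameters as a pandas DataFrame
log = load_df(run_dir, 'log')         # training log, one row per logged iteration
weights = np.load(os.path.join(run_dir, 'trained_params.npz'))  # parameters saved by SaveParams

print(params.T)
print(log.tail())                     # last few rows of the monitored channels
print(sorted(weights.keys()))         # names of the saved parameter arrays

Note that `load_df` defaults the HDF5 key to the file name, which is why no key argument is needed when reading `params` and `log` back.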