├── .gitignore ├── A Semisupervised Approach for Language Identification based on Ladder Networks.ipynb ├── LICENSE ├── README.md ├── The dark knowledge of tongues.ipynb ├── fuel.ipynb ├── ladder.py ├── language-tree.jpg ├── nn.py ├── run.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | 5 | # C extensions 6 | *.so 7 | 8 | # Distribution / packaging 9 | .Python 10 | env/ 11 | build/ 12 | develop-eggs/ 13 | dist/ 14 | downloads/ 15 | eggs/ 16 | .eggs/ 17 | lib/ 18 | lib64/ 19 | parts/ 20 | sdist/ 21 | var/ 22 | *.egg-info/ 23 | .installed.cfg 24 | *.egg 25 | 26 | # PyInstaller 27 | # Usually these files are written by a python script from a template 28 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 29 | *.manifest 30 | *.spec 31 | 32 | # Installer logs 33 | pip-log.txt 34 | pip-delete-this-directory.txt 35 | 36 | # Unit test / coverage reports 37 | htmlcov/ 38 | .tox/ 39 | .coverage 40 | .coverage.* 41 | .cache 42 | nosetests.xml 43 | coverage.xml 44 | *,cover 45 | 46 | # Translations 47 | *.mo 48 | *.pot 49 | 50 | # Django stuff: 51 | *.log 52 | 53 | # Sphinx documentation 54 | docs/_build/ 55 | 56 | # PyBuilder 57 | target/ 58 | -------------------------------------------------------------------------------- /A Semisupervised Approach for Language Identification based on Ladder Networks.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# Data" 8 | ] 9 | }, 10 | { 11 | "cell_type": "markdown", 12 | "metadata": {}, 13 | "source": [ 14 | "The original dataset is made from training (15000 samples), dev (6431) and testing (6500) files. Only the 400 *i-vector* features where used. A process to whiten the entire dataset was applied before using the feature set $x_i$ and only the dev set was used to train the whitening parameters (see code suplied with the data by the competition organizers). Each sample is either unlabeled (all dev and testing samples) and we will label it as $y_i=0$ or is one of $n=50$ different categories $y_i \\in \\{1 \\ldots n \\}$ (all training samples.)" 15 | ] 16 | }, 17 | { 18 | "cell_type": "markdown", 19 | "metadata": {}, 20 | "source": [ 21 | "## Cross validation" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "In order to select hyper parameters of the model a modified [cross validation dataset was built](./fuel.ipynb) from the training dataset.\n", 29 | "\n", 30 | "In the modified dataset, the $n$ known original training labels are considered to be the entire label space of the modified dataset and from them a subset is assumed to be known.\n", 31 | "The other labeles are assumed to be out-of-set for the purpose of the modified dataset.\n", 32 | "The number of assumed known labels is such that the ratio of known and unknown labels in the modified set is:\n", 33 | "\n", 34 | "$Q = \\lfloor \\left( 1 - P_{\\text{oos}} \\right) * n \\rfloor = 38 \\quad P_{\\text{oos}} = 0.23$\n", 35 | "\n", 36 | "The labels of the modified dataset are re-indexed such that the labels assumed to be known are $y_i \\in \\{1 \\ldots Q \\}$\n", 37 | "\n", 38 | "A part, $1-r$, of the training data with labels assumed to be known is used for training as labeled data. 
The rest, $r$, of the samples with labels assumed to be known are mixed with $r$ of the rest of the training which has labels that are assumed to be out-of-set. The mix is used for training as unlabeled data.\n", 39 | "For having an the number of unlabeled samples to be $u=0.5$ from the number of labeled samples (the ratio between `dev` and `training` sizes):\n", 40 | "\n", 41 | "\n", 42 | "$r = Q*u/(50+Q*u)$\n", 43 | "\n", 44 | "The remaining $1-r$ samples with labels assumed to be unknown are dropped.\n", 45 | "\n", 46 | "Each of the steps above, in building the modified dataset, uses a random selection process. The process of creating a modified dataset can be repeated many times giving each label an opportunity to be out-of-set." 47 | ] 48 | }, 49 | { 50 | "cell_type": "markdown", 51 | "metadata": {}, 52 | "source": [ 53 | "# Model Training" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "When training a model all samples are used, labeled and unlabeled. For cross validation, this is the modified dataset and for submission this is training and dev datasets, the test dataset is only used to make final prediction for submission.\n", 61 | "\n", 62 | "The model generates probability for each sample, $x_i$, to be out-of-set or in one of the categories. When doing cross validation the model will generate $Q+1=39$ categories and when training on the entire available data the model will generate $n+1=51$ categories. The label $l=0$ is used for out-of-set prediction (not to be confused with unlabeled sample.)\n", 63 | "\n", 64 | "$p(l) = p(l \\mid x_i) \\quad l \\in \\{ 0 \\ldots Q \\} \\quad \\text{or} \\quad l \\in \\{ 0 \\ldots n \\} $" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "## Final score" 72 | ] 73 | }, 74 | { 75 | "cell_type": "markdown", 76 | "metadata": {}, 77 | "source": [ 78 | "According to the [evaluation plan](http://www.nist.gov/itl/iad/mig/upload/lre_ivectorchallenge_rel_v2.pdf) of the competition, the goal is to minimize:\n", 79 | "\n", 80 | "$\\text{Cost} = \\frac{1-P_{\\text{oos}}}{n} * \\sum_{k=1}^n P_{\\text{error}}(k) + P_{\\text{oos}} * P_{\\text{error}}(\\text{oos}) \\qquad [1]$\n", 81 | "\n", 82 | "$P_{\\text{error}}(k) = \\left( \\frac{\\text{#errors_class_k}}{\\text{#trials_class_k}} \\right) $" 83 | ] 84 | }, 85 | { 86 | "cell_type": "markdown", 87 | "metadata": {}, 88 | "source": [ 89 | "In the cross validation stage we can compute this cost directly, by replacing $n$ with $Q$, and using the information we have on the validation part of the modified dataset. We will use this score to select the best hyper parameters." 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "## Loss function" 97 | ] 98 | }, 99 | { 100 | "cell_type": "markdown", 101 | "metadata": {}, 102 | "source": [ 103 | "The training process optimize the model internal parameters (weights) minimizing a loss function. We describe the loss function used in cross validation training, when training for a submission, substitute $Q=n$.\n", 104 | "\n", 105 | "The loss is computed as a sum of loss on batches of samples. Each batch has ($N=1024$) samples. For each sample, $x_i$, the loss function accepts as input the $Q+1$ probabilities, $p(l \\mid x_i)$ from the model and the label information, $y_i$. Note that $p(0 \\mid x_i)$ gives the probability of the model to out-of-set label and $y_i = 0$ is used to indicate that the sample $x_i$ is not labeled." 
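    "\n",
    "As a rough illustration only, here is a minimal NumPy sketch of the batch loss [2] whose terms are defined in the next cell. The names used here (`batch_loss`, `p` as the $N \\times (Q+1)$ matrix of predicted probabilities, `y` as the label vector) are assumptions for this sketch and not part of the competition code; the loss actually used for training is the Theano `objective` function in `ladder.py`. The sketch also assumes the batch contains both labeled and unlabeled samples.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "def batch_loss(p, y, Q, beta=0.15, gamma=0.01, P_oos=0.23, eps=1e-6):\n",
    "    # a-priori distribution P^a: P_oos for label 0, (1-P_oos)/Q for labels 1..Q\n",
    "    P_a = np.full(Q + 1, (1.0 - P_oos) / Q)\n",
    "    P_a[0] = P_oos\n",
    "    labeled = (y >= 1) & (y <= Q)\n",
    "    unlabeled = ~labeled\n",
    "    # cross_entropy: average of -log p(y_i | x_i) over the labeled samples\n",
    "    idx = np.flatnonzero(labeled)\n",
    "    cross_entropy = -np.log(p[idx, y[idx]] + eps).mean()\n",
    "    # aprior_average_cross_entropy: average the unlabeled predictions first,\n",
    "    # then take the cross entropy of that average with P^a\n",
    "    p_bar = p[unlabeled].mean(axis=0)\n",
    "    aprior_average_ce = -(P_a * np.log(p_bar + eps)).sum()\n",
    "    # binary_cross_entropy on p(0 | x_i): labeled samples are known to be in-set,\n",
    "    # unlabeled samples are out-of-set with probability P_oos\n",
    "    p0 = np.clip(p[:, 0], eps, 1.0 - eps)\n",
    "    bce = (P_oos * np.log(p0[unlabeled]) + (1.0 - P_oos) * np.log(1.0 - p0[unlabeled])).sum()\n",
    "    bce += np.log(1.0 - p0[labeled]).sum()\n",
    "    binary_cross_entropy = -bce / len(y)\n",
    "    return cross_entropy + beta * aprior_average_ce + gamma * binary_cross_entropy\n",
    "```"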
106 | ] 107 | }, 108 | { 109 | "cell_type": "markdown", 110 | "metadata": {}, 111 | "source": [ 112 | "The loss of a batch is made from several parts:\n", 113 | "\n", 114 | "$\\text{loss} = \\text{cross_entopy} + \\beta \\cdot \\text{aprior_average_cross_entropy} + \\gamma \\cdot \\text{binary_cross_entropy} \\qquad [2]$\n", 115 | "\n", 116 | "where $\\beta$ and $\\gamma$ are hyper-parameters. After running cross validation tests the values $\\beta=0.15$ and $\\gamma=0.01$ were selected.\n", 117 | "\n", 118 | "### cross entropy\n", 119 | "for the labeled samples in the batch the loss is\n", 120 | "\n", 121 | "$\\text{cross_entopy} = \\frac{1}{N_l} \\sum_{i : y_i \\in \\{1 \\ldots Q \\}} -\\log p(y_i \\mid x_i)$\n", 122 | "\n", 123 | "were $N_l$ is the number of labeled samples in the batch\n", 124 | "\n", 125 | "$N_l = \\sum_{i : y_i \\in \\{1 \\ldots Q \\}} 1$\n", 126 | "\n", 127 | "### aprior cross entropy\n", 128 | "Aprior, we assume that the predicted probabilities of unlabeled samples should have the distribution:\n", 129 | "\n", 130 | "$P^a (0) = P_\\text{oos} \\quad P^a (l) = \\frac{1-P_\\text{oos}}{Q} \\quad \\forall l \\in \\{1 \\ldots Q \\}$\n", 131 | "\n", 132 | "This distribution is correct for the cross validation modified dataset and we assume it is correct for the dev dataset.\n", 133 | "\n", 134 | "Armed with the apriori distribution, we can add a loss term which measure the cross entropy between predictions made on unlabeled samples and this\n", 135 | "apriori distribution:\n", 136 | "\n", 137 | "$\\text{aprior_cross_entropy} = \\frac{1}{N_u} \\sum_{i : y_i \\notin \\{1 \\ldots Q \\}} -P^a(l)\\log(p(l \\mid x_i))$\n", 138 | "\n", 139 | "were $N_u$ is the number of labeled samples in the batch\n", 140 | "\n", 141 | "$N_u = \\sum_{i : y_i \\notin \\{1 \\ldots Q \\}} 1$\n", 142 | "\n", 143 | "### aprior average cross entropy\n", 144 | "However it was found that a much better result is achieved by first averaging all the predicted probabilities over the unlabeled samples in the batch and only then \n", 145 | "measuring its cross entropy with the aprior probability:\n", 146 | "\n", 147 | "$\\bar{p}(l) = \\frac{1}{N_u} \\sum_{i : y_i \\notin \\{1 \\ldots Q \\}} p(l \\mid x_i) \\\\\n", 148 | "\\text{aprior_average_cross_entropy} = - \\sum_{l=0}^Q P^a (l) \\log(\\bar{p}(l \\mid x_i))$\n", 149 | "\n", 150 | "### aprior average Dirichlet\n", 151 | "\n", 152 | "$C_2 = -􀀀p_\\text{oos} \\log p_\\text{av}(\\text{oos}) 􀀀- \\frac{1 - p_\\text{oos}}{k} \\sum_{i=1}^k \\log p_\\text{av}(i)$\n", 153 | "\n", 154 | "changes to \n", 155 | "\n", 156 | "$\\text{NLLK}(p_\\text{av}) = -(\\alpha_\\text{oos} - 1) \\log p_\\text{av}(\\text{oos}) 􀀀- \\sum_{i=1}^k (\\alpha_i - 1) \\log p_\\text{av}(i) \\quad + \\text{constant}$\n", 157 | "\n", 158 | "such that \n", 159 | "\n", 160 | "$p_\\text{oos} = \\frac{\\alpha_\\text{oos}}{\\alpha_\\text{sum}} \\qquad \\frac{1 - p_\\text{oos}}{k} = \\frac{\\alpha_i}{\\alpha_\\text{sum}}$\n", 161 | "\n", 162 | "where\n", 163 | "\n", 164 | "$\\alpha_\\text{sum} = \\alpha_\\text{oos} + \\sum_{i=1}^k \\alpha_i$\n", 165 | "\n", 166 | "redefine $C_2$ as\n", 167 | "\n", 168 | "$C_2 = -(􀀀p_\\text{oos} - \\delta) \\log p_\\text{av}(\\text{oos}) 􀀀- \\left( \\frac{1 - p_\\text{oos}}{k} - \\delta \\right) \\sum_{i=1}^k \\log p_\\text{av}(i)$\n", 169 | "\n", 170 | "where $\\alpha_\\text{sum}$ is moved outside into $C_2$ scale factor $\\alpha$ and $\\delta = 1/\\alpha_\\text{sum}$\n", 171 | "\n", 172 | "\n", 173 | "### binary cross entropy\n", 174 | "We will use $p(0 \\mid x_i)$ 
to predict if $x_i$ is out-of-set or not. If $x_i$ happens to be a labeled sample, we know it is not out-of-set and if it is unlabeled we know there is $P_\\text{oos}$ chance that it is out-of-set.\n", 175 | "Again this is something which is true for the corss validation modified dataset and assumed to be true for the dev dataset:\n", 176 | "\n", 177 | "$\\text{binary_cross_entropy} = -\\frac{1}{N} \\left[ \\sum_{i:y_i \\notin \\{1 \\ldots Q \\}} \\left( P_\\text{oos} \\log(p_0(i)) + (1-P_\\text{oos}) \\log(p_1(i)) \\right) + \\sum_{i:y_i \\in \\{1 \\ldots Q \\}} \\log(p_1(i)) \\right]$\n", 178 | "\n", 179 | "were\n", 180 | "\n", 181 | "$p_0(i) = p(0 \\mid x_i) \\quad p_1(i) = 1-p_0(i)$" 182 | ] 183 | }, 184 | { 185 | "cell_type": "markdown", 186 | "metadata": {}, 187 | "source": [ 188 | "# Model" 189 | ] 190 | }, 191 | { 192 | "cell_type": "markdown", 193 | "metadata": {}, 194 | "source": [ 195 | "The loss function we use [2] is applied to all available data: training and dev datasets. However the strongest signal is from the training (labeled) part and effectively we are in a situation in which 1/3 of the available data is unlabeled. It is therefore beneficial to use semi-supervised technique which will utilize the information available in all the data and not just in the training set." 196 | ] 197 | }, 198 | { 199 | "cell_type": "markdown", 200 | "metadata": {}, 201 | "source": [ 202 | "Predictions, $p(y_i \\mid x_i)$, are made using a modified [Ladder Network](http://arxiv.org/abs/1507.02672). The original Ladder Network [code](https://github.com/CuriousAI/ladder) was slightly modified. The code was modified to accept the training and dev data of the competition and was used in its entire both for supervised and unsupervised parts of the ladder method. The objective function used in computing the cost of the supervised part of the ladder method was replaced from a simple Cross Entropy to the loss function [2]. In addition, the error rate [1] was monitored while training on cross-validation dataset to determine the optimal number of epochs for training. The setup used for training that gave the best results are as follows:\n", 203 | "\n", 204 | "```bash\n", 205 | "python run.py train --lr 1e-3 --labeled-samples 21431 --unlabeled-samples 21431 --encoder-layers 500-500-500-100-51 --decoder-spec gauss,relu,relu,relu,relu,relu --denoising-cost-x 1,1,.3,.3,.3,.3 --dseed 0 --seed 2 --super-noise-std 0.5 --f-local-noise-std 0.5 --batch-size 1024 --valid-batch-size 1024 --num-epochs 1000 --dataset 160111-fuel.test -- test.\n", 206 | "```\n", 207 | "\n", 208 | "The interpretation of each of the parameters is as follows:\n", 209 | "\n", 210 | "parameter | value | description\n", 211 | "--- | --- | ---\n", 212 | "dataset | 160111-fuel.test | Both training and dev datasets were used as input. For cross validation this was changed to `160111-fuel.train`\n", 213 | "labeled-samples | 21431 | All samples in training and dev were used for training the supervised part of the ladder method. This is made possible because the modified loss function has a part which is applied on unlabeled samples. For cross validation this was modified to `10000` and the rest of the modified dataset was used for validation\n", 214 | "unlabeled-samples | 21431 | All samples in training and dev were used in the unsupervised parts of the ladder method. 
For cross validation this was modified to `10000`\n", 215 | "encoder-layers | 500-500-500-100-51 | The network has an input of dimension 400 which pass through 4 hidden layers of size 500, 500, 500 and 100 and a final output layer of 51. For cross validation this was modified to 39.\n", 216 | "decoder-spec | gauss,relu,relu,relu,relu,relu | A direct skip of information from the encoder to the decoder was used only on the input layer using the gaussian method described in ladder paper.\n", 217 | "denoising-cost-x | 1,1,.3,.3,.3,.3 | The L2 error of the de-noising layers compared with an un-noised clean encoder was weighted with a weight of 1 for the input layer and the first hidden layer and 0.3 for all other layers.\n", 218 | "super-noise-std | 0.5 | std of gaussian noise added to the input of the courrputed encoder\n", 219 | "f-local-noise-std | 0.5 | std of gaussian noise added to output of all layers courrputed encoder\n", 220 | "lr | 1e-3 | Learning rate\n", 221 | "num-epochs | 1000 | Number of epoch iterations for which training was made. Before each iteration the order of the samples was shuffled. It turns out that because of the unsupervised learning the ladder method is insensitive to the number of epochs and having between 800 to 2000 epoch iterations would give similar results\n", 222 | "batch-size | 1024| batch size used for training. this size has a secondary effect through the loss function which performed an average of predictions before computing the loss\n", 223 | "lrate-decay | 0.67 (default) | the learning rate starts to decay linearly to zero after passing 0.67 of the epoch iterations\n", 224 | "act | relu (default) | the activation of the encoder layers except for the last layer which is always softmax" 225 | ] 226 | }, 227 | { 228 | "cell_type": "markdown", 229 | "metadata": {}, 230 | "source": [ 231 | "# Results" 232 | ] 233 | }, 234 | { 235 | "cell_type": "markdown", 236 | "metadata": {}, 237 | "source": [ 238 | "the results where measured by generating predictions on the test dataset using the model found in the training process. The prediction were then submitted to the competition web site which used an unknown subset of 30% of the samples to compute a score for the PROGESS SET (results for the 70% eval set are not reported by the web site.) " 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "Score | Description | Command line\n", 246 | "--- | --- | ---\n", 247 | "24.000 | The best configuration which was described above. This would have been translated to 11th place while the competition was in progress | --lr 1e-3 --labeled-samples 21431 --unlabeled-samples 21431 --encoder-layers 500-500-500-100-51 --decoder-spec gauss,relu,relu,relu,relu,relu --denoising-cost-x 1,1,.3,.3,.3,.3 --dseed 0 --seed 2 --super-noise-std 0.5 --f-local-noise-std 0.5 --batch-size 1024 --valid-batch-size 1024 --num-epochs 1000\n", 248 | "31.487 | In this configuration the unsupervisied part of the ladder algorithm is disabled. 
An early stopping after 138 epochs was needed to avoid overfiting | --lr 1e-3 --labeled-samples 21431 --unlabeled-samples 21431 --encoder-layers 500-500-500-100-51 --decoder-spec gauss,relu,relu,relu,relu,relu --denoising-cost-x 0,0,0,0,0,0 --decoder-spec 0-0-0-0-0-0 --dseed 0 --seed 2 --super-noise-std 0.5 --f-local-noise-std 0.5 --batch-size 1024 --valid-batch-size 1024 --num-epochs 138" 249 | ] 250 | }, 251 | { 252 | "cell_type": "markdown", 253 | "metadata": {}, 254 | "source": [ 255 | "To generate a submission file identify the directory in which the training stored its results. This is a subdirectory under `./results/` the subdirectory name has prefix determined by the last argument in the command line. In the example given above the prefix is `test.`. The suffix of the subdirectory is a number which is incremented after every training run. Below I assume that all of this results in `results/test.0`\n", 256 | "\n", 257 | "You then generate predictions with\n", 258 | "```bash\n", 259 | "run.py dump --layer -1 -- results/test.0\n", 260 | "```" 261 | ] 262 | }, 263 | { 264 | "cell_type": "markdown", 265 | "metadata": {}, 266 | "source": [ 267 | "The submission is made from the predictions on the `test` part of the dataset file (last 6500 samples) that are saved in `bz2` file which can be submitted to the web site" 268 | ] 269 | }, 270 | { 271 | "cell_type": "code", 272 | "execution_count": null, 273 | "metadata": { 274 | "collapsed": true 275 | }, 276 | "outputs": [], 277 | "source": [ 278 | "import numpy as np\n", 279 | "import bz2\n", 280 | "\n", 281 | "yprob = np.load('results/test.0/layer-1.npy'%t)\n", 282 | "y_pred = np.argmax(yprob,axis=1)\n", 283 | "fn = 'submission.txt.bz2'\n", 284 | "with bz2.BZ2File('data/%s'%fn, 'w') as f:\n", 285 | " for i in y_pred[-6500:]:\n", 286 | " f.write('%s\\n' % idx2lang[i])" 287 | ] 288 | }, 289 | { 290 | "cell_type": "markdown", 291 | "metadata": {}, 292 | "source": [ 293 | "# Reference" 294 | ] 295 | }, 296 | { 297 | "cell_type": "markdown", 298 | "metadata": {}, 299 | "source": [ 300 | "1. https://ivectorchallenge.nist.gov/\n", 301 | "2. http://www.nist.gov/itl/iad/mig/upload/lre_ivectorchallenge_rel_v2.pdf\n", 302 | "2. http://arxiv.org/abs/1507.02672\n", 303 | "3. https://github.com/CuriousAI/ladder\n", 304 | "3. 
http://arxiv.org/abs/1511.06430v3" 305 | ] 306 | } 307 | ], 308 | "metadata": { 309 | "kernelspec": { 310 | "display_name": "Python 2", 311 | "language": "python", 312 | "name": "python2" 313 | }, 314 | "language_info": { 315 | "codemirror_mode": { 316 | "name": "ipython", 317 | "version": 2 318 | }, 319 | "file_extension": ".py", 320 | "mimetype": "text/x-python", 321 | "name": "python", 322 | "nbconvert_exporter": "python", 323 | "pygments_lexer": "ipython2", 324 | "version": "2.7.11" 325 | } 326 | }, 327 | "nbformat": 4, 328 | "nbformat_minor": 0 329 | } 330 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016 Ehud Ben-Reuven 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | Twitter followers 2 | 3 | This repository contains source code for the experiments in a paper titled [A Semisupervised Approach for Language Identification based on Ladder Networks](http://arxiv.org/pdf/1604.00317v1.pdf) 4 | 5 | In 2015 NIST conducted a [LRE i-vector challenge](https://ivectorchallenge.nist.gov/evaluations/2). 6 | The challenge was to identify which language is spoken from a speech sample, given that the language belongs 7 | to one of 50 given language or is one of out-of-set languages. 8 | The speech samples were already processed into `i-vectors` and duration information. 9 | The data was split into `training`, `dev` and `test`. 10 | The `training` data included labeled samples from the 50 given languages. 11 | The `dev` data included unlabeled samples from both the 50 given languages and the out-of-set languages. 12 | The `test` was similar to `dev` but it could have been only used for making submissions to the competition. 13 | 14 | * [our solution](./A%20Semisupervised%20Approach%20for%20Language%20Identification%20based%20on%20Ladder%20Networks.ipynb) used a modification of the [Ladder Network](http://arxiv.org/abs/1507.02672) and [published code](https://github.com/CuriousAI/ladder). 15 | * [The dark knowledge of tongues](./The%20dark%20knowledge%20of%20tongues.ipynb), fun with the i-vector dataset supplied by the challenge. 
16 | * [Odyssey 2016, video lecture](https://www.superlectures.com/odyssey2016/a-semisupervised-approach-for-language-identification-based-on-ladder-networks) 17 | -------------------------------------------------------------------------------- /fuel.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "# fuel" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": { 14 | "collapsed": true 15 | }, 16 | "outputs": [], 17 | "source": [ 18 | "import bz2\n", 19 | "import csv\n", 20 | "import numpy as np\n", 21 | "import sys" 22 | ] 23 | }, 24 | { 25 | "cell_type": "code", 26 | "execution_count": 2, 27 | "metadata": { 28 | "collapsed": false 29 | }, 30 | "outputs": [ 31 | { 32 | "name": "stderr", 33 | "output_type": "stream", 34 | "text": [ 35 | "Using gpu device 0: GeForce GTX 980 (CNMeM is disabled)\n" 36 | ] 37 | }, 38 | { 39 | "data": { 40 | "text/plain": [ 41 | "'/Users/udi/Downloads/lisa'" 42 | ] 43 | }, 44 | "execution_count": 2, 45 | "metadata": {}, 46 | "output_type": "execute_result" 47 | } 48 | ], 49 | "source": [ 50 | "import fuel, os\n", 51 | "fuel_path = fuel.config.data_path[0]\n", 52 | "fuel_path" 53 | ] 54 | }, 55 | { 56 | "cell_type": "code", 57 | "execution_count": 3, 58 | "metadata": { 59 | "collapsed": true 60 | }, 61 | "outputs": [], 62 | "source": [ 63 | "base = 'data/r146_1_1/ivec15-lre/'" 64 | ] 65 | }, 66 | { 67 | "cell_type": "code", 68 | "execution_count": 4, 69 | "metadata": { 70 | "collapsed": true 71 | }, 72 | "outputs": [], 73 | "source": [ 74 | "def load_ivectors(filename):\n", 75 | " \"\"\"Loads ivectors\n", 76 | "\n", 77 | " Parameters\n", 78 | " ----------\n", 79 | " filename : string\n", 80 | " Path to ivector files (e.g. 
dev_ivectors.csv)\n", 81 | "\n", 82 | " Returns\n", 83 | " -------\n", 84 | " ids : list\n", 85 | " List of ivectorids\n", 86 | " durations : array, shaped('n_ivectors')\n", 87 | " Array of durations for each ivectorid\n", 88 | " languages : array, shaped('n_ivectors')\n", 89 | " Array of langs for each ivectorid (only applies to train)\n", 90 | " ivectors : array, shaped('n_ivectors', 600)\n", 91 | " Array of ivectors for each ivectorid\n", 92 | " \"\"\"\n", 93 | " ids = []\n", 94 | " durations = []\n", 95 | " languages = []\n", 96 | " ivectors = []\n", 97 | " with open(filename, 'rb') as infile:\n", 98 | " reader = csv.reader(infile, delimiter='\\t')\n", 99 | " reader.next()\n", 100 | "\n", 101 | " for row in csv.reader(infile, delimiter='\\t'):\n", 102 | " ids.append(row[0])\n", 103 | " durations.append(float(row[1]))\n", 104 | " languages.append(row[2])\n", 105 | " ivectors.append(np.asarray(row[3:], dtype=np.float32))\n", 106 | "\n", 107 | " sys.stdout.write(\"\\r %s \" % row[0])\n", 108 | " sys.stdout.flush()\n", 109 | "\n", 110 | " print \"\\n I- Adding Transformed ivectors \"\n", 111 | "\n", 112 | " return ids, np.array(durations, dtype=np.float32), np.array(languages), np.vstack(ivectors)" 113 | ] 114 | }, 115 | { 116 | "cell_type": "code", 117 | "execution_count": 5, 118 | "metadata": { 119 | "collapsed": false 120 | }, 121 | "outputs": [ 122 | { 123 | "name": "stdout", 124 | "output_type": "stream", 125 | "text": [ 126 | " ivec15-lre_zzzzabb \n", 127 | " I- Adding Transformed ivectors \n" 128 | ] 129 | } 130 | ], 131 | "source": [ 132 | "train_ids, train_durations, train_languages, train_ivec = load_ivectors(base+'data/ivec15_lre_train_ivectors.tsv')" 133 | ] 134 | }, 135 | { 136 | "cell_type": "code", 137 | "execution_count": 6, 138 | "metadata": { 139 | "collapsed": false 140 | }, 141 | "outputs": [ 142 | { 143 | "data": { 144 | "text/plain": [ 145 | "15000" 146 | ] 147 | }, 148 | "execution_count": 6, 149 | "metadata": {}, 150 | "output_type": "execute_result" 151 | } 152 | ], 153 | "source": [ 154 | "Nt = len(train_ivec)\n", 155 | "Nt" 156 | ] 157 | }, 158 | { 159 | "cell_type": "code", 160 | "execution_count": 7, 161 | "metadata": { 162 | "collapsed": false 163 | }, 164 | "outputs": [ 165 | { 166 | "name": "stdout", 167 | "output_type": "stream", 168 | "text": [ 169 | " ivec15-lre_zzyykqa \n", 170 | " I- Adding Transformed ivectors \n" 171 | ] 172 | }, 173 | { 174 | "data": { 175 | "text/plain": [ 176 | "6431" 177 | ] 178 | }, 179 | "execution_count": 7, 180 | "metadata": {}, 181 | "output_type": "execute_result" 182 | } 183 | ], 184 | "source": [ 185 | "dev_ids, dev_durations, dev_languages, dev_ivec = load_ivectors(base+'data/ivec15_lre_dev_ivectors.tsv')\n", 186 | "len(dev_ids)" 187 | ] 188 | }, 189 | { 190 | "cell_type": "code", 191 | "execution_count": 8, 192 | "metadata": { 193 | "collapsed": false 194 | }, 195 | "outputs": [ 196 | { 197 | "name": "stdout", 198 | "output_type": "stream", 199 | "text": [ 200 | " ivec15-lre_zzshxfc \n", 201 | " I- Adding Transformed ivectors \n" 202 | ] 203 | }, 204 | { 205 | "data": { 206 | "text/plain": [ 207 | "6500" 208 | ] 209 | }, 210 | "execution_count": 8, 211 | "metadata": {}, 212 | "output_type": "execute_result" 213 | } 214 | ], 215 | "source": [ 216 | "test_ids, test_durations, test_languages, test_ivec = load_ivectors(base + 'data/ivec15_lre_test_ivectors.tsv')\n", 217 | "len(test_ids)" 218 | ] 219 | }, 220 | { 221 | "cell_type": "markdown", 222 | "metadata": {}, 223 | "source": [ 224 | "compute the mean and whitening 
transformation over dev set only. You are not allowed to use test and train does not have all languages" 225 | ] 226 | }, 227 | { 228 | "cell_type": "code", 229 | "execution_count": 10, 230 | "metadata": { 231 | "collapsed": true 232 | }, 233 | "outputs": [], 234 | "source": [ 235 | "m = np.mean(dev_ivec, axis=0)\n", 236 | "S = np.cov(dev_ivec, rowvar=0)\n", 237 | "D, V = np.linalg.eig(S)\n", 238 | "W = (1 / np.sqrt(D) * V).transpose().astype('float32')" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "center and whiten" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 11, 251 | "metadata": { 252 | "collapsed": true 253 | }, 254 | "outputs": [], 255 | "source": [ 256 | "all_durations = np.hstack((train_durations,dev_durations,test_durations))\n", 257 | "all_data = np.vstack((train_ivec,dev_ivec,test_ivec))" 258 | ] 259 | }, 260 | { 261 | "cell_type": "code", 262 | "execution_count": 12, 263 | "metadata": { 264 | "collapsed": true 265 | }, 266 | "outputs": [], 267 | "source": [ 268 | "all_data = np.dot(all_data - m, W.transpose())" 269 | ] 270 | }, 271 | { 272 | "cell_type": "markdown", 273 | "metadata": {}, 274 | "source": [ 275 | "convert labels to int. 'out_of_set' is 0" 276 | ] 277 | }, 278 | { 279 | "cell_type": "code", 280 | "execution_count": 15, 281 | "metadata": { 282 | "collapsed": false 283 | }, 284 | "outputs": [ 285 | { 286 | "name": "stdout", 287 | "output_type": "stream", 288 | "text": [ 289 | "upload: data/160111-fuel.idx2lang.pkl to s3://udikaggle/nist/160111-fuel.idx2lang.pkl\r\n" 290 | ] 291 | } 292 | ], 293 | "source": [ 294 | "idx2lang = dict(enumerate(['out_of_set']+sorted(np.unique(train_languages))))\n", 295 | "lang2idx = dict((l,i) for i,l in idx2lang.iteritems())\n", 296 | "import cPickle as pickle\n", 297 | "with open('data/160111-fuel.idx2lang.pkl','wb') as fp:\n", 298 | " pickle.dump(idx2lang,fp)\n", 299 | "!aws s3 cp data/160111-fuel.idx2lang.pkl s3://udikaggle/nist/" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 16, 305 | "metadata": { 306 | "collapsed": true 307 | }, 308 | "outputs": [], 309 | "source": [ 310 | "X = all_data\n", 311 | "y = np.array(map(lambda l: lang2idx[l], train_languages))" 312 | ] 313 | }, 314 | { 315 | "cell_type": "markdown", 316 | "metadata": {}, 317 | "source": [ 318 | "mark all data not coming from training set as out of set" 319 | ] 320 | }, 321 | { 322 | "cell_type": "code", 323 | "execution_count": 17, 324 | "metadata": { 325 | "collapsed": true 326 | }, 327 | "outputs": [], 328 | "source": [ 329 | "y = np.hstack((y,lang2idx['out_of_set']*np.ones(len(X)-len(y),dtype=int)))" 330 | ] 331 | }, 332 | { 333 | "cell_type": "code", 334 | "execution_count": 18, 335 | "metadata": { 336 | "collapsed": false 337 | }, 338 | "outputs": [ 339 | { 340 | "data": { 341 | "text/plain": [ 342 | "['/Users/udi/Downloads/lisa']" 343 | ] 344 | }, 345 | "execution_count": 18, 346 | "metadata": {}, 347 | "output_type": "execute_result" 348 | } 349 | ], 350 | "source": [ 351 | "import fuel\n", 352 | "fuel.config.data_path" 353 | ] 354 | }, 355 | { 356 | "cell_type": "code", 357 | "execution_count": 19, 358 | "metadata": { 359 | "collapsed": false 360 | }, 361 | "outputs": [ 362 | { 363 | "data": { 364 | "text/plain": [ 365 | "'/Users/udi/Downloads/lisa/160111-fuel.test/160111-fuel.test.hdf5'" 366 | ] 367 | }, 368 | "execution_count": 19, 369 | "metadata": {}, 370 | "output_type": "execute_result" 371 | } 372 | ], 373 | "source": [ 374 | "import 
os\n", 375 | "from fuel.datasets.hdf5 import H5PYDataset\n", 376 | "datasource = '160111-fuel.test'\n", 377 | "datasource_dir = os.path.join(fuel.config.data_path[0], datasource)\n", 378 | "datasource_fname = os.path.join(datasource_dir , datasource + '.hdf5')\n", 379 | "datasource_fname" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 20, 385 | "metadata": { 386 | "collapsed": true 387 | }, 388 | "outputs": [], 389 | "source": [ 390 | "!mkdir -p {datasource_dir}" 391 | ] 392 | }, 393 | { 394 | "cell_type": "code", 395 | "execution_count": 21, 396 | "metadata": { 397 | "collapsed": false 398 | }, 399 | "outputs": [ 400 | { 401 | "name": "stdout", 402 | "output_type": "stream", 403 | "text": [ 404 | "-rw-r--r-- 1 udi staff 44915848 Jan 11 17:30 /Users/udi/Downloads/lisa/160111-fuel.test/160111-fuel.test.hdf5\r\n" 405 | ] 406 | } 407 | ], 408 | "source": [ 409 | "import h5py\n", 410 | "N, NF = X.shape\n", 411 | "with h5py.File(datasource_fname, mode='w') as fp:\n", 412 | " features = fp.create_dataset('features', (N, NF), dtype=np.float32)\n", 413 | " targets = fp.create_dataset('targets', (N,), dtype='int')\n", 414 | " features[...] = X.astype(np.float32)\n", 415 | " targets[...] = y\n", 416 | " from fuel.datasets.hdf5 import H5PYDataset\n", 417 | " split_dict = {\n", 418 | " 'train': {'features': (0, N), 'targets': (0, N)},\n", 419 | " 'test': {'features': (0, N), 'targets': (0, N)}\n", 420 | " }\n", 421 | " fp.attrs['split'] = H5PYDataset.create_split_array(split_dict)\n", 422 | "!ls -l {datasource_fname}" 423 | ] 424 | }, 425 | { 426 | "cell_type": "markdown", 427 | "metadata": {}, 428 | "source": [ 429 | "the samples are not shuffled" 430 | ] 431 | }, 432 | { 433 | "cell_type": "markdown", 434 | "metadata": {}, 435 | "source": [ 436 | "## simulate training" 437 | ] 438 | }, 439 | { 440 | "cell_type": "code", 441 | "execution_count": 24, 442 | "metadata": { 443 | "collapsed": true 444 | }, 445 | "outputs": [], 446 | "source": [ 447 | "import random\n", 448 | "\n", 449 | "def cv_modify(X, y, Q, seed=None, oos_labels=None, unlabel_label_ratio=0.5):\n", 450 | " assert Q < 50, \"Q has to be smaller than 50, try 38\"\n", 451 | " assert np.all(y>0), \"unlabeled data\"\n", 452 | " \n", 453 | " if seed is not None:\n", 454 | " random.seed(seed)\n", 455 | " np.random.seed(seed)\n", 456 | " \n", 457 | " oos_size = 50 - Q\n", 458 | " if oos_labels is None:\n", 459 | " oos_labels = random.sample(range(1,51), oos_size)\n", 460 | " else:\n", 461 | " n = len(oos_labels)\n", 462 | " assert n <= oos_size\n", 463 | " assert len(set(oos_labels)) == n\n", 464 | " assert all(0 < s <= 50 for s in oos_labels)\n", 465 | " if n < oos_size:\n", 466 | " oos_labels += random.sample(set(range(1,51)) - set(oos_labels), oos_size - n)\n", 467 | "\n", 468 | " # for each label build a map such that the known labels are at the start followed by the unknown labels\n", 469 | " label_map = [0] + sorted(set(range(1,51)) - set(oos_labels)) + sorted(oos_labels)\n", 470 | " y_train = np.array([label_map.index(yy) for yy in y])\n", 471 | "\n", 472 | " # index of all samples that are out-of-set\n", 473 | " oos = [i for i, yy in enumerate(y_train) if yy > Q or yy == 0]\n", 474 | " # index of all samples that are in-set\n", 475 | " in_set = [i for i, yy in enumerate(y_train) if 0 < yy <= Q]\n", 476 | " \n", 477 | " # take a part, r, of the samples that are in-set to be unlabeled, and leave 1-r\n", 478 | " # eventually the unlabeld set will be made from Q/50 in-set samples and 1-Q/50 oos samples\n", 479 | 
" # eventually the unlabeled size will be r*50/Q and we want\n", 480 | " # (1-r)*unlabel_label_ratio = r*50/Q\n", 481 | " # unlabel_label_ratio = r(50/Q + unlabel_label_ratio)\n", 482 | " # r = unlabel_label_ratio/(50./Q + unlabel_label_ratio)\n", 483 | " # r = Q*unlabel_label_ratio/(50. + Q*unlabel_label_ratio)\n", 484 | " Qu = Q*unlabel_label_ratio\n", 485 | " r = Qu/(50. + Qu)\n", 486 | " in_set_unlabeled = random.sample(in_set, int(len(in_set)*r))\n", 487 | " # the other half will be used as labeled\n", 488 | " in_set_labeled = list(set(in_set) - set(in_set_unlabeled))\n", 489 | " # give the unlabeled samples that are in-set have a high label (so the training will consider them to be unlabeled)\n", 490 | " # but keep their original identity for error measurement\n", 491 | " y_train[in_set_unlabeled] += 50\n", 492 | "\n", 493 | " # add out-of-set samples to the unlabeled set keeping the ratio to labeled as before\n", 494 | " oos_unlabeled = random.sample(oos,int(len(oos)*r))\n", 495 | "\n", 496 | " unlabeled = oos_unlabeled+in_set_unlabeled\n", 497 | "\n", 498 | " # all other (oos) samples are dropped (too bad but we want to keep the original ratios)\n", 499 | " keep = in_set_labeled + unlabeled\n", 500 | " random.shuffle(keep)\n", 501 | "\n", 502 | " y_train = y_train[keep]\n", 503 | " X_train = X[keep]\n", 504 | " return X_train, y_train, label_map" 505 | ] 506 | }, 507 | { 508 | "cell_type": "code", 509 | "execution_count": 25, 510 | "metadata": { 511 | "collapsed": true 512 | }, 513 | "outputs": [], 514 | "source": [ 515 | "Q=38\n", 516 | "poos=0.23" 517 | ] 518 | }, 519 | { 520 | "cell_type": "code", 521 | "execution_count": 26, 522 | "metadata": { 523 | "collapsed": true 524 | }, 525 | "outputs": [], 526 | "source": [ 527 | "all_oos = []" 528 | ] 529 | }, 530 | { 531 | "cell_type": "code", 532 | "execution_count": 27, 533 | "metadata": { 534 | "collapsed": false 535 | }, 536 | "outputs": [ 537 | { 538 | "name": "stdout", 539 | "output_type": "stream", 540 | "text": [ 541 | "0 12391 /Users/udi/Downloads/lisa/160111-fuel.train.0/160111-fuel.train.0.hdf5\n", 542 | "-rw-r--r-- 1 udi staff 19928000 Jan 11 18:02 /Users/udi/Downloads/lisa/160111-fuel.train.0/160111-fuel.train.0.hdf5\n", 543 | "12\n", 544 | "1 12391 /Users/udi/Downloads/lisa/160111-fuel.train.1/160111-fuel.train.1.hdf5\n", 545 | "-rw-r--r-- 1 udi staff 19928000 Jan 11 18:02 /Users/udi/Downloads/lisa/160111-fuel.train.1/160111-fuel.train.1.hdf5\n", 546 | "24\n", 547 | "2 12391 /Users/udi/Downloads/lisa/160111-fuel.train.2/160111-fuel.train.2.hdf5\n", 548 | "-rw-r--r-- 1 udi staff 19928000 Jan 11 18:02 /Users/udi/Downloads/lisa/160111-fuel.train.2/160111-fuel.train.2.hdf5\n", 549 | "36\n", 550 | "3 12391 /Users/udi/Downloads/lisa/160111-fuel.train.3/160111-fuel.train.3.hdf5\n", 551 | "-rw-r--r-- 1 udi staff 19928000 Jan 11 18:02 /Users/udi/Downloads/lisa/160111-fuel.train.3/160111-fuel.train.3.hdf5\n", 552 | "48\n", 553 | "4 12391 /Users/udi/Downloads/lisa/160111-fuel.train.4/160111-fuel.train.4.hdf5\n", 554 | "-rw-r--r-- 1 udi staff 19928000 Jan 11 18:02 /Users/udi/Downloads/lisa/160111-fuel.train.4/160111-fuel.train.4.hdf5\n", 555 | "50\n" 556 | ] 557 | } 558 | ], 559 | "source": [ 560 | "for seed in range(5):\n", 561 | " # the first 5 seeds are used to cover all labels at least once\n", 562 | " oos_labels = range(1+12*seed,min(1+12*seed + 12,51))\n", 563 | " \n", 564 | " X_train, y_train, labels = cv_modify(X[:Nt], y[:Nt], Q, seed=seed, oos_labels=oos_labels)\n", 565 | " datasource = '160111-fuel.train.%d'%seed\n", 566 | 
" datasource_dir = os.path.join(fuel.config.data_path[0], datasource)\n", 567 | " datasource_fname = os.path.join(datasource_dir , datasource + '.hdf5')\n", 568 | " !mkdir -p {datasource_dir}\n", 569 | " N0 = len(X_train)\n", 570 | " print seed, N0, datasource_fname\n", 571 | "\n", 572 | " with h5py.File(datasource_fname, mode='w') as fp:\n", 573 | " features = fp.create_dataset('features', (N0, NF), dtype=np.float32)\n", 574 | " targets = fp.create_dataset('targets', (N0,), dtype='int')\n", 575 | " features[...] = X_train.astype(np.float32)\n", 576 | " targets[...] = y_train\n", 577 | " \n", 578 | " split_dict = {\n", 579 | " 'train': {'features': (0, N0), 'targets': (0, N0)},\n", 580 | " 'test': {'features': (0, N0), 'targets': (0, N0)}\n", 581 | " }\n", 582 | " fp.attrs['split'] = H5PYDataset.create_split_array(split_dict)\n", 583 | " fp.attrs['labels'] = labels\n", 584 | " !ls -l {datasource_fname}\n", 585 | " oos = labels[-12:]\n", 586 | " all_oos += oos\n", 587 | " print len(set(all_oos))" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": 28, 593 | "metadata": { 594 | "collapsed": true 595 | }, 596 | "outputs": [], 597 | "source": [ 598 | "!(cd /Users/udi/Downloads/lisa/ ; tar cfz 160111-fuel.tgz 160111-fuel.train.* )" 599 | ] 600 | }, 601 | { 602 | "cell_type": "code", 603 | "execution_count": 29, 604 | "metadata": { 605 | "collapsed": false 606 | }, 607 | "outputs": [ 608 | { 609 | "name": "stdout", 610 | "output_type": "stream", 611 | "text": [ 612 | "move: ../../../../../Downloads/lisa/160111-fuel.tgz to s3://udikaggle/nist/160111-fuel.tgz\n" 613 | ] 614 | } 615 | ], 616 | "source": [ 617 | "!aws s3 mv /Users/udi/Downloads/lisa/160111-fuel.tgz s3://udikaggle/nist/" 618 | ] 619 | }, 620 | { 621 | "cell_type": "code", 622 | "execution_count": null, 623 | "metadata": { 624 | "collapsed": true 625 | }, 626 | "outputs": [], 627 | "source": [] 628 | } 629 | ], 630 | "metadata": { 631 | "kernelspec": { 632 | "display_name": "Python 2", 633 | "language": "python", 634 | "name": "python2" 635 | }, 636 | "language_info": { 637 | "codemirror_mode": { 638 | "name": "ipython", 639 | "version": 2 640 | }, 641 | "file_extension": ".py", 642 | "mimetype": "text/x-python", 643 | "name": "python", 644 | "nbconvert_exporter": "python", 645 | "pygments_lexer": "ipython2", 646 | "version": "2.7.11" 647 | } 648 | }, 649 | "nbformat": 4, 650 | "nbformat_minor": 0 651 | } 652 | -------------------------------------------------------------------------------- /ladder.py: -------------------------------------------------------------------------------- 1 | import logging 2 | 3 | import numpy as np 4 | from collections import OrderedDict 5 | 6 | import theano 7 | import theano.tensor as T 8 | from theano.sandbox.rng_mrg import MRG_RandomStreams as RandomStreams 9 | from theano.tensor.nnet.conv import conv2d, ConvOp 10 | from theano.sandbox.cuda.blas import GpuCorrMM 11 | from theano.sandbox.cuda.basic_ops import gpu_contiguous 12 | 13 | from blocks.bricks.cost import SquaredError 14 | from blocks.bricks.cost import CategoricalCrossEntropy, MisclassificationRate, Cost 15 | from blocks.graph import add_annotation, Annotation 16 | from blocks.roles import add_role, PARAMETER, WEIGHT, BIAS 17 | 18 | from utils import shared_param, AttributeDict 19 | from nn import maxpool_2d, global_meanpool_2d, BNPARAM 20 | 21 | logger = logging.getLogger('main.model') 22 | floatX = theano.config.floatX 23 | 24 | from blocks.bricks.base import application 25 | from theano.tensor.extra_ops import 
to_one_hot 26 | class MisclassificationRateIV(Cost): 27 | def __init__(self, oos_thr=0., poos=0.23): 28 | self.oos_thr = oos_thr 29 | self.poos = poos 30 | super(MisclassificationRateIV, self).__init__() 31 | 32 | @application(outputs=["error_rate"]) 33 | def apply(self, y, y_hat): 34 | # find the unlabeled samples: a combination of oos and labeled samples that were moved by +50 35 | unlabeled = (y >= y_hat.shape[1]).nonzero() 36 | y = y[unlabeled] 37 | y_hat = y_hat[unlabeled] 38 | # return unlabeled samples that are in-set to their original value 39 | y = T.switch(y <= 50, y, y-50) 40 | # convert oos to 0 41 | y = T.switch(y < y_hat.shape[1], y, 0) 42 | 43 | # if maximal prob is below oos_thr then assume it is OOS 44 | y_hat_argmax = T.switch(y_hat.max(axis=1) >= self.oos_thr, y_hat.argmax(axis=1), 0) 45 | # locate mistakes 46 | mistakes = T.neq(y, y_hat_argmax) 47 | 48 | # compute the error rate for each label 49 | yhot = to_one_hot(y, y_hat.shape[1], dtype=floatX) 50 | yhot = yhot.T 51 | mistakes = T.dot(yhot, mistakes) / (yhot.sum(axis=1) + np.float32(1e-6)) 52 | return (1. - self.poos)*mistakes[1:].mean() + self.poos * mistakes[0] 53 | 54 | class OOSRateIV(Cost): 55 | @application(outputs=["oos_rate"]) 56 | def apply(self, y, y_hat): 57 | # find the unlabeled samples: a combination of oos and labeled samples that were moved by +50 58 | unlabeled = (y >= y_hat.shape[1]).nonzero() 59 | y_hat = y_hat[unlabeled] 60 | oos = T.eq(y_hat.argmax(axis=1), 0) 61 | return T.mean(oos) 62 | 63 | # Exactly like 160107-keras.ipynb 64 | def objective(y_true, y_pred, P, Q, alpha=0., beta=0.15, dbeta=0., gamma=0.01, gamma1=-1., poos=0.23, eps=1e-6): 65 | '''Expects a binary class matrix instead of a vector of scalar classes. 66 | ''' 67 | 68 | beta = np.float32(beta) 69 | dbeta = np.float32(dbeta) 70 | gamma = np.float32(gamma) 71 | poos = np.float32(poos) 72 | eps = np.float32(eps) 73 | 74 | # scale preds so that the class probas of each sample sum to 1 75 | y_pred += eps 76 | y_pred /= y_pred.sum(axis=-1, keepdims=True) 77 | 78 | y_true = T.cast(y_true.flatten(), 'int64') 79 | y1 = T.and_(T.gt(y_true, 0), T.le(y_true, Q)) # in-set 80 | y0 = T.or_(T.eq(y_true, 0), T.gt(y_true, Q)) # out-of-set or unlabeled 81 | y0sum = y0.sum() + eps # number of oos 82 | y1sum = y1.sum() + eps # number of in-set 83 | # we want to reduce cross entrophy of labeled data 84 | # convert all oos/unlabeled to label=0 85 | cost0 = T.nnet.categorical_crossentropy(y_pred, T.switch(y_true <= Q, y_true, 0)) 86 | cost0 = T.dot(y1, cost0) / y1sum # average cost per labeled example 87 | 88 | if alpha: 89 | cost1 = T.nnet.categorical_crossentropy(y_pred, y_pred) 90 | cost1 = T.dot(y0, cost1) / y0sum # average cost per labeled example 91 | cost0 += alpha*cost1 92 | 93 | # we want to increase the average entrophy in each batch 94 | # average over batch 95 | if beta: 96 | y_pred_avg0 = T.dot(y0, y_pred) / y0sum 97 | y_pred_avg0 = T.clip(y_pred_avg0, eps, np.float32(1) - eps) 98 | y_pred_avg0 /= y_pred_avg0.sum(axis=-1, keepdims=True) 99 | cost2 = T.nnet.categorical_crossentropy(y_pred_avg0.reshape((1,-1)), P-dbeta)[0] # [None,:] 100 | cost2 = T.switch(y0sum > 0.5, cost2, 0.) 
# ignore cost2 if no samples 101 | cost0 += beta*cost2 102 | 103 | # binary classifier score 104 | if gamma: 105 | y_pred0 = T.clip(y_pred[:,0], eps, np.float32(1) - eps) 106 | if gamma1 < 0.: 107 | cost3 = - T.dot(poos*y0,T.log(y_pred0)) - T.dot(np.float32(1)-poos*y0.T,T.log(np.float32(1)-y_pred0)) 108 | cost3 /= y_pred.shape[0] 109 | cost0 += gamma*cost3 110 | elif gamma1 > 0.: 111 | cost3 = - T.dot(poos*y0,T.log(y_pred0)) - T.dot((np.float32(1)-poos)*y0,T.log(np.float32(1)-y_pred0)) 112 | cost3 /= y0sum 113 | cost31 = - T.dot(y1,T.log(np.float32(1)-y_pred0)) 114 | cost3 /= y1sum 115 | cost0 += gamma*cost3 + gamma1*cost31 116 | else: # gamma1 == 0. 117 | cost3 = - T.dot(poos*y0,T.log(y_pred0)) - T.dot((np.float32(1)-poos)*y0, T.log(np.float32(1)-y_pred0)) 118 | cost3 /= y0sum 119 | cost0 += gamma*cost3 120 | return cost0 121 | 122 | 123 | 124 | class CategoricalCrossEntropyIV(Cost): 125 | def __init__(self, Q, poos=0.23, alpha=0., beta=0.15, dbeta=0., gamma=0.01, gamma1=-1.): 126 | self.poos = poos 127 | self.alpha = alpha 128 | self.beta = beta 129 | self.dbeta = dbeta 130 | self.gamma = gamma 131 | self.gamma1 = gamma1 132 | super(CategoricalCrossEntropyIV, self).__init__() 133 | 134 | self.Q = Q 135 | P = (1.-poos)/Q*np.ones(Q+1) 136 | P[0] = poos 137 | P = P.reshape((1,-1)) 138 | self.P = theano.shared(P.astype(theano.config.floatX), broadcastable=(True,False)) 139 | 140 | @application(outputs=["cost"]) 141 | def apply(self, y, y_hat): 142 | return T.sum(objective(y, y_hat, self.P, self.Q, self.alpha, self.beta, self.dbeta, self.gamma, self.gamma1)) 143 | 144 | class LadderAE(): 145 | def __init__(self, p): 146 | self.p = p 147 | self.init_weights_transpose = False 148 | self.default_lr = p.lr 149 | self.shareds = OrderedDict() 150 | self.rstream = RandomStreams(seed=p.seed) 151 | self.rng = np.random.RandomState(seed=p.seed) 152 | 153 | n_layers = len(p.encoder_layers) 154 | assert n_layers > 1, "Need to define encoder layers" 155 | assert n_layers == len(p.denoising_cost_x), ( 156 | "Number of denoising costs does not match with %d layers: %s" % 157 | (n_layers, str(p.denoising_cost_x))) 158 | 159 | def one_to_all(x): 160 | """ (5.,) -> 5 -> (5., 5., 5.) 161 | ('relu',) -> 'relu' -> ('relu', 'relu', 'relu') 162 | """ 163 | if type(x) is tuple and len(x) == 1: 164 | x = x[0] 165 | 166 | if type(x) is float: 167 | x = (np.float32(x),) * n_layers 168 | 169 | if type(x) is str: 170 | x = (x,) * n_layers 171 | return x 172 | 173 | p.decoder_spec = one_to_all(p.decoder_spec) 174 | p.f_local_noise_std = one_to_all(p.f_local_noise_std) 175 | acts = one_to_all(p.get('act', 'relu')) 176 | 177 | assert n_layers == len(p.decoder_spec), "f and g need to match" 178 | assert (n_layers == len(acts)), ( 179 | "Not enough activations given. Requires %d. 
Got: %s" % 180 | (n_layers, str(acts))) 181 | acts = acts[:-1] + ('softmax',) 182 | 183 | def parse_layer(spec): 184 | """ 'fc:5' -> ('fc', 5) 185 | '5' -> ('fc', 5) 186 | 5 -> ('fc', 5) 187 | 'convv:3:2:2' -> ('convv', [3,2,2]) 188 | """ 189 | if type(spec) is not str: 190 | return "fc", spec 191 | spec = spec.split(':') 192 | l_type = spec.pop(0) if len(spec) >= 2 else "fc" 193 | spec = map(int, spec) 194 | spec = spec[0] if len(spec) == 1 else spec 195 | return l_type, spec 196 | 197 | enc = map(parse_layer, p.encoder_layers) 198 | self.layers = list(enumerate(zip(enc, p.decoder_spec, acts))) 199 | 200 | def weight(self, init, name, cast_float32=True, for_conv=False): 201 | weight = self.shared(init, name, cast_float32, role=WEIGHT) 202 | if for_conv: 203 | return weight.dimshuffle('x', 0, 'x', 'x') 204 | return weight 205 | 206 | def bias(self, init, name, cast_float32=True, for_conv=False): 207 | b = self.shared(init, name, cast_float32, role=BIAS) 208 | if for_conv: 209 | return b.dimshuffle('x', 0, 'x', 'x') 210 | return b 211 | 212 | def shared(self, init, name, cast_float32=True, role=PARAMETER, **kwargs): 213 | p = self.shareds.get(name) 214 | if p is None: 215 | p = shared_param(init, name, cast_float32, role, **kwargs) 216 | self.shareds[name] = p 217 | return p 218 | 219 | def counter(self): 220 | name = 'counter' 221 | p = self.shareds.get(name) 222 | update = [] 223 | if p is None: 224 | p_max_val = np.float32(10) 225 | p = self.shared(np.float32(1), name, role=BNPARAM) 226 | p_max = self.shared(p_max_val, name + '_max', role=BNPARAM) 227 | update = [(p, T.clip(p + np.float32(1), np.float32(0), p_max)), 228 | (p_max, p_max_val)] 229 | return (p, update) 230 | 231 | def noise_like(self, x): 232 | noise = self.rstream.normal(size=x.shape, avg=0.0, std=1.0) 233 | return T.cast(noise, dtype=floatX) 234 | 235 | def rand_init(self, in_dim, out_dim): 236 | """ Random initialization for fully connected layers """ 237 | W = self.rng.randn(in_dim, out_dim) / np.sqrt(in_dim) 238 | return W 239 | 240 | def rand_init_conv(self, dim): 241 | """ Random initialization for convolution filters """ 242 | fan_in = np.prod(dtype=floatX, a=dim[1:]) 243 | bound = np.sqrt(3. / max(1.0, (fan_in))) 244 | W = np.asarray( 245 | self.rng.uniform(low=-bound, high=bound, size=dim), dtype=floatX) 246 | return W 247 | 248 | def new_activation_dict(self): 249 | return AttributeDict({'z': {}, 'h': {}, 's': {}, 'm': {}}) 250 | 251 | def annotate_update(self, update, tag_to): 252 | a = Annotation() 253 | for (var, up) in update: 254 | a.updates[var] = up 255 | add_annotation(tag_to, a) 256 | 257 | def apply(self, input_labeled, target_labeled, input_unlabeled): 258 | self.layer_counter = 0 259 | input_dim = self.p.encoder_layers[0] 260 | 261 | # Store the dimension tuples in the same order as layers. 
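        # (self.layers, built in __init__, has the form [(index, ((layer_type, spec), decoder_spec, activation)), ...])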
262 | layers = self.layers 263 | self.layer_dims = {0: input_dim} 264 | 265 | self.lr = self.shared(self.default_lr, 'learning_rate', role=None) 266 | 267 | self.costs = costs = AttributeDict() 268 | self.costs.denois = AttributeDict() 269 | 270 | self.act = AttributeDict() 271 | self.error = AttributeDict() 272 | self.oos = AttributeDict() 273 | 274 | top = len(layers) - 1 275 | 276 | N = input_labeled.shape[0] 277 | self.join = lambda l, u: T.concatenate([l, u], axis=0) 278 | self.labeled = lambda x: x[:N] if x is not None else x 279 | self.unlabeled = lambda x: x[N:] if x is not None else x 280 | self.split_lu = lambda x: (self.labeled(x), self.unlabeled(x)) 281 | 282 | input_concat = self.join(input_labeled, input_unlabeled) 283 | 284 | def encoder(input_, path_name, input_noise_std=0, noise_std=[]): 285 | h = input_ 286 | 287 | logger.info(' 0: noise %g' % input_noise_std) 288 | if input_noise_std > 0.: 289 | h = h + self.noise_like(h) * input_noise_std 290 | 291 | d = AttributeDict() 292 | d.unlabeled = self.new_activation_dict() 293 | d.labeled = self.new_activation_dict() 294 | d.labeled.z[0] = self.labeled(h) 295 | d.unlabeled.z[0] = self.unlabeled(h) 296 | prev_dim = input_dim 297 | for i, (spec, _, act_f) in layers[1:]: 298 | d.labeled.h[i - 1], d.unlabeled.h[i - 1] = self.split_lu(h) 299 | noise = noise_std[i] if i < len(noise_std) else 0. 300 | curr_dim, z, m, s, h = self.f(h, prev_dim, spec, i, act_f, 301 | path_name=path_name, 302 | noise_std=noise) 303 | assert self.layer_dims.get(i) in (None, curr_dim) 304 | self.layer_dims[i] = curr_dim 305 | d.labeled.z[i], d.unlabeled.z[i] = self.split_lu(z) 306 | d.unlabeled.s[i] = s 307 | d.unlabeled.m[i] = m 308 | prev_dim = curr_dim 309 | d.labeled.h[i], d.unlabeled.h[i] = self.split_lu(h) 310 | return d 311 | 312 | # Clean, supervised 313 | logger.info('Encoder: clean, labeled') 314 | clean = self.act.clean = encoder(input_concat, 'clean') 315 | 316 | # Corrupted, supervised 317 | logger.info('Encoder: corr, labeled') 318 | corr = self.act.corr = encoder(input_concat, 'corr', 319 | input_noise_std=self.p.super_noise_std, 320 | noise_std=self.p.f_local_noise_std) 321 | est = self.act.est = self.new_activation_dict() 322 | 323 | # Decoder path in opposite order 324 | logger.info('Decoder: z_corr -> z_est') 325 | for i, ((_, spec), l_type, act_f) in layers[::-1]: 326 | z_corr = corr.unlabeled.z[i] 327 | z_clean = clean.unlabeled.z[i] 328 | z_clean_s = clean.unlabeled.s.get(i) 329 | z_clean_m = clean.unlabeled.m.get(i) 330 | fspec = layers[i+1][1][0] if len(layers) > i+1 else (None, None) 331 | 332 | if i == top: 333 | ver = corr.unlabeled.h[i] 334 | ver_dim = self.layer_dims[i] 335 | top_g = True 336 | else: 337 | ver = est.z.get(i + 1) 338 | ver_dim = self.layer_dims.get(i + 1) 339 | top_g = False 340 | 341 | z_est = self.g(z_lat=z_corr, 342 | z_ver=ver, 343 | in_dims=ver_dim, 344 | out_dims=self.layer_dims[i], 345 | l_type=l_type, 346 | num=i, 347 | fspec=fspec, 348 | top_g=top_g) 349 | 350 | if z_est is not None: 351 | # Denoising cost 352 | 353 | if z_clean_s and self.p.zestbn == 'bugfix': 354 | z_est_norm = (z_est - z_clean_m) / T.sqrt(z_clean_s + np.float32(1e-10)) 355 | elif z_clean_s is None or self.p.zestbn == 'no': 356 | z_est_norm = z_est 357 | else: 358 | assert False, 'Not supported path' 359 | 360 | se = SquaredError('denois' + str(i)) 361 | costs.denois[i] = se.apply(z_est_norm.flatten(2), 362 | z_clean.flatten(2)) \ 363 | / np.prod(self.layer_dims[i], dtype=floatX) 364 | costs.denois[i].name = 'denois' + str(i) 365 | 
denois_print = 'denois %.2f' % self.p.denoising_cost_x[i] 366 | else: 367 | denois_print = '' 368 | 369 | # Store references for later use 370 | est.h[i] = self.apply_act(z_est, act_f) 371 | est.z[i] = z_est 372 | est.s[i] = None 373 | est.m[i] = None 374 | logger.info(' g%d: %10s, %s, dim %s -> %s' % ( 375 | i, l_type, 376 | denois_print, 377 | self.layer_dims.get(i+1), 378 | self.layer_dims.get(i) 379 | )) 380 | 381 | # Costs 382 | y = target_labeled.flatten() 383 | 384 | Q = int(self.layer_dims[top][0]) - 1 385 | logger.info('Q=%d'%Q) 386 | costs.class_clean = CategoricalCrossEntropyIV(Q=Q, 387 | alpha=self.p.alpha, 388 | beta=self.p.beta, 389 | dbeta=self.p.dbeta, 390 | gamma=self.p.gamma, 391 | gamma1=self.p.gamma1 392 | ).apply(y, clean.labeled.h[top]) 393 | costs.class_clean.name = 'cost_class_clean' 394 | 395 | costs.class_corr = CategoricalCrossEntropyIV(Q=Q, 396 | alpha=self.p.alpha, 397 | beta=self.p.beta, 398 | dbeta=self.p.dbeta, 399 | gamma=self.p.gamma, 400 | gamma1=self.p.gamma1, 401 | ).apply(y, corr.labeled.h[top]) 402 | costs.class_corr.name = 'cost_class_corr' 403 | 404 | # This will be used for training 405 | costs.total = costs.class_corr * 1.0 406 | for i in range(top + 1): 407 | if costs.denois.get(i) and self.p.denoising_cost_x[i] > 0: 408 | costs.total += costs.denois[i] * self.p.denoising_cost_x[i] 409 | if self.p.alpha_clean: 410 | y_true = y 411 | eps = np.float32(1e-6) 412 | 413 | # scale preds so that the class probas of each sample sum to 1 414 | y_pred = clean.labeled.h[top] + eps 415 | y_pred /= y_pred.sum(axis=-1, keepdims=True) 416 | 417 | y0 = T.or_(T.eq(y_true, 0), T.gt(y_true, Q)) # out-of-set or unlabeled 418 | y0sum = y0.sum() + eps # number of oos 419 | 420 | cost1 = T.nnet.categorical_crossentropy(y_pred, y_pred) 421 | cost1 = T.dot(y0, cost1) / y0sum # average cost per labeled example 422 | costs.total += self.p.alpha_clean * cost1 423 | 424 | costs.total.name = 'cost_total' 425 | 426 | # Classification error 427 | mr = MisclassificationRateIV(oos_thr=self.p.oos_thr) 428 | self.error.clean = mr.apply(y, clean.labeled.h[top]) * np.float32(100.) 429 | self.error.clean.name = 'error_rate_clean' 430 | oosr = OOSRateIV() 431 | self.oos.clean = oosr.apply(y, clean.labeled.h[top]) * np.float32(100.) 432 | self.oos.clean.name = 'oos_rate_clean' 433 | 434 | def apply_act(self, input, act_name): 435 | if input is None: 436 | return input 437 | act = { 438 | 'relu': lambda x: T.maximum(0, x), 439 | 'leakyrelu': lambda x: T.switch(x > 0., x, 0.1 * x), 440 | 'linear': lambda x: x, 441 | 'softplus': lambda x: T.log(1. 
+ T.exp(x)), 442 | 'sigmoid': lambda x: T.nnet.sigmoid(x), 443 | 'softmax': lambda x: T.nnet.softmax(x), 444 | }.get(act_name) 445 | assert act, 'unknown act %s' % act_name 446 | if act_name == 'softmax': 447 | input = input.flatten(2) 448 | return act(input) 449 | 450 | def annotate_bn(self, var, id, var_type, mb_size, size, norm_ax): 451 | var_shape = np.array((1,) + size) 452 | out_dim = np.prod(var_shape) / np.prod(var_shape[list(norm_ax)]) 453 | # Flatten the var - shared variable updating is not trivial otherwise, 454 | # as theano seems to believe a row vector is a matrix and will complain 455 | # about the updates 456 | orig_shape = var.shape 457 | var = var.flatten() 458 | # Here we add the name and role, the variables will later be identified 459 | # by these values 460 | var.name = id + '_%s_clean' % var_type 461 | add_role(var, BNPARAM) 462 | shared_var = self.shared(np.zeros(out_dim), 463 | name='shared_%s' % var.name, role=None) 464 | 465 | # Update running average estimates. When the counter is reset to 1, it 466 | # will clear its memory 467 | cntr, c_up = self.counter() 468 | one = np.float32(1) 469 | run_avg = lambda new, old: one / cntr * new + (one - one / cntr) * old 470 | if var_type == 'mean': 471 | new_value = run_avg(var, shared_var) 472 | elif var_type == 'var': 473 | mb_size = T.cast(mb_size, 'float32') 474 | new_value = run_avg(mb_size / (mb_size - one) * var, shared_var) 475 | else: 476 | raise NotImplemented('Unknown batch norm var %s' % var_type) 477 | # Add the counter update to the annotated update if it is the first 478 | # instance of a counter 479 | self.annotate_update([(shared_var, new_value)] + c_up, var) 480 | 481 | return var.reshape(orig_shape) 482 | 483 | def f(self, h, in_dim, spec, num, act_f, path_name, noise_std=0): 484 | assert path_name in ['clean', 'corr'] 485 | # Generates identifiers used for referencing shared variables. 486 | # E.g. clean and corrupted encoders will end up using the same 487 | # variable name and hence sharing parameters 488 | gen_id = lambda s: '_'.join(['f', str(num), s]) 489 | layer_type, _ = spec 490 | 491 | # Pooling 492 | if layer_type in ['maxpool', 'globalmeanpool']: 493 | z, output_size = self.f_pool(h, spec, in_dim) 494 | norm_ax = (0, -2, -1) 495 | # after pooling, no activation func for now unless its softmax 496 | act_f = "linear" if act_f != "softmax" else act_f 497 | 498 | # Convolution 499 | elif layer_type in ['convv', 'convf']: 500 | z, output_size = self.f_conv(h, spec, in_dim, gen_id('W')) 501 | norm_ax = (0, -2, -1) 502 | 503 | # Fully connected 504 | elif layer_type == "fc": 505 | h = h.flatten(2) if h.ndim > 2 else h 506 | _, dim = spec 507 | W = self.weight(self.rand_init(np.prod(in_dim), dim), gen_id('W')) 508 | z, output_size = T.dot(h, W), (dim,) 509 | norm_ax = (0,) 510 | else: 511 | raise ValueError("Unknown layer spec: %s" % layer_type) 512 | 513 | m = s = None 514 | is_normalizing = True 515 | if is_normalizing: 516 | keep_dims = True 517 | z_l = self.labeled(z) 518 | z_u = self.unlabeled(z) 519 | m = z_u.mean(norm_ax, keepdims=keep_dims) 520 | s = z_u.var(norm_ax, keepdims=keep_dims) 521 | 522 | m_l = z_l.mean(norm_ax, keepdims=keep_dims) 523 | s_l = z_l.var(norm_ax, keepdims=keep_dims) 524 | if path_name == 'clean': 525 | # Batch normalization estimates the mean and variance of 526 | # validation and test sets based on the training set 527 | # statistics. The following annotates the computation of 528 | # running average to the graph. 
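# Concretely (see annotate_bn above): the labeled-path batch statistic is
# flattened, tagged with the BNPARAM role, and paired with a zero-initialized
# shared variable. An update annotated on the graph accumulates a cumulative
# running average,
#     new_avg = (1/cntr) * new + (1 - 1/cntr) * old,
# and for the variance the new observation is first scaled by
# mb_size / (mb_size - 1) (Bessel's correction). At evaluation time
# ApproxTestMonitoring / FinalTestMonitoring in nn.py swap these shared
# averages in for the per-batch statistics.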
529 | m_l = self.annotate_bn(m_l, gen_id('bn'), 'mean', z_l.shape[0], 530 | output_size, norm_ax) 531 | s_l = self.annotate_bn(s_l, gen_id('bn'), 'var', z_l.shape[0], 532 | output_size, norm_ax) 533 | z = self.join( 534 | (z_l - m_l) / T.sqrt(s_l + np.float32(1e-10)), 535 | (z_u - m) / T.sqrt(s + np.float32(1e-10))) 536 | 537 | if noise_std > 0: 538 | z += self.noise_like(z) * noise_std 539 | 540 | # z for lateral connection 541 | z_lat = z 542 | b_init, c_init = 0.0, 1.0 543 | b_c_size = output_size[0] 544 | 545 | # Add bias 546 | if act_f != 'linear': 547 | z += self.bias(b_init * np.ones(b_c_size), gen_id('b'), 548 | for_conv=len(output_size) > 1) 549 | 550 | if is_normalizing: 551 | # Add free parameter (gamma in original Batch Normalization paper) 552 | # if needed by the activation. For instance ReLU does't need one 553 | # and we only add it to softmax if hyperparameter top_c is set. 554 | if (act_f not in ['relu', 'leakyrelu', 'linear', 'softmax'] or 555 | (act_f == 'softmax' and self.p.top_c is True)): 556 | c = self.weight(c_init * np.ones(b_c_size), gen_id('c'), 557 | for_conv=len(output_size) > 1) 558 | z *= c 559 | 560 | h = self.apply_act(z, act_f) 561 | 562 | logger.info(' f%d: %s, %s,%s noise %.2f, params %s, dim %s -> %s' % ( 563 | num, layer_type, act_f, ' BN,' if is_normalizing else '', 564 | noise_std, spec[1], in_dim, output_size)) 565 | return output_size, z_lat, m, s, h 566 | 567 | def f_pool(self, x, spec, in_dim): 568 | layer_type, dims = spec 569 | num_filters = in_dim[0] 570 | if "globalmeanpool" == layer_type: 571 | y, output_size = global_meanpool_2d(x, num_filters) 572 | # scale the variance to match normal conv layers with xavier init 573 | y = y * np.float32(in_dim[-1]) * np.float32(np.sqrt(3)) 574 | else: 575 | assert dims[0] != 1 or dims[1] != 1 576 | y, output_size = maxpool_2d(x, in_dim, 577 | poolsize=(dims[1], dims[1]), 578 | poolstride=(dims[0], dims[0])) 579 | return y, output_size 580 | 581 | def f_conv(self, x, spec, in_dim, weight_name): 582 | layer_type, dims = spec 583 | num_filters = dims[0] 584 | filter_size = (dims[1], dims[1]) 585 | stride = (dims[2], dims[2]) 586 | 587 | bm = 'full' if 'convf' in layer_type else 'valid' 588 | 589 | num_channels = in_dim[0] 590 | 591 | W = self.weight(self.rand_init_conv( 592 | (num_filters, num_channels) + filter_size), weight_name) 593 | 594 | if stride != (1, 1): 595 | f = GpuCorrMM(subsample=stride, border_mode=bm, pad=(0, 0)) 596 | y = f(gpu_contiguous(x), gpu_contiguous(W)) 597 | else: 598 | assert self.p.batch_size == self.p.valid_batch_size 599 | y = conv2d(x, W, image_shape=(2*self.p.batch_size, ) + in_dim, 600 | filter_shape=((num_filters, num_channels) + 601 | filter_size), border_mode=bm) 602 | output_size = ((num_filters,) + 603 | ConvOp.getOutputShape(in_dim[1:], filter_size, 604 | stride, bm)) 605 | 606 | return y, output_size 607 | 608 | def g(self, z_lat, z_ver, in_dims, out_dims, l_type, num, fspec, top_g): 609 | f_layer_type, dims = fspec 610 | is_conv = f_layer_type is not None and ('conv' in f_layer_type or 611 | 'pool' in f_layer_type) 612 | gen_id = lambda s: '_'.join(['g', str(num), s]) 613 | 614 | in_dim = np.prod(dtype=floatX, a=in_dims) 615 | out_dim = np.prod(dtype=floatX, a=out_dims) 616 | num_filters = out_dims[0] if is_conv else out_dim 617 | 618 | if l_type[-1] in ['0']: 619 | g_type, u_type = l_type[:-1], l_type[-1] 620 | else: 621 | g_type, u_type = l_type, None 622 | 623 | # Mapping from layer above: u 624 | if u_type in ['0'] or z_ver is None: 625 | if z_ver is None and 
u_type not in ['0']: 626 | logger.warn('Decoder %d:%s without vertical input' % 627 | (num, g_type)) 628 | u = None 629 | else: 630 | if top_g: 631 | u = z_ver 632 | elif is_conv: 633 | u = self.g_deconv(z_ver, in_dims, out_dims, gen_id('W'), fspec) 634 | else: 635 | W = self.weight(self.rand_init(in_dim, out_dim), gen_id('W')) 636 | u = T.dot(z_ver, W) 637 | 638 | # Batch-normalize u 639 | if u is not None: 640 | norm_ax = (0,) if u.ndim <= 2 else (0, -2, -1) 641 | keep_dims = True 642 | u -= u.mean(norm_ax, keepdims=keep_dims) 643 | u /= T.sqrt(u.var(norm_ax, keepdims=keep_dims) + 644 | np.float32(1e-10)) 645 | 646 | # Define the g function 647 | if not is_conv: 648 | z_lat = z_lat.flatten(2) 649 | bi = lambda inits, name: self.bias(inits * np.ones(num_filters), 650 | gen_id(name), for_conv=is_conv) 651 | wi = lambda inits, name: self.weight(inits * np.ones(num_filters), 652 | gen_id(name), for_conv=is_conv) 653 | 654 | if g_type == '': 655 | z_est = None 656 | 657 | elif g_type == 'i': 658 | z_est = z_lat 659 | 660 | elif g_type in ['sig']: 661 | sigval = bi(0., 'c1') + wi(1., 'c2') * z_lat 662 | if u is not None: 663 | sigval += wi(0., 'c3') * u + wi(0., 'c4') * z_lat * u 664 | sigval = T.nnet.sigmoid(sigval) 665 | 666 | z_est = bi(0., 'a1') + wi(1., 'a2') * z_lat + wi(1., 'b1') * sigval 667 | if u is not None: 668 | z_est += wi(0., 'a3') * u + wi(0., 'a4') * z_lat * u 669 | 670 | elif g_type in ['lin']: 671 | a1 = wi(1.0, 'a1') 672 | b = bi(0.0, 'b') 673 | 674 | z_est = a1 * z_lat + b 675 | 676 | elif g_type in ['relu']: 677 | assert u is not None 678 | b = bi(0., 'b') 679 | x = u + b 680 | z_est = self.apply_act(x, 'relu') 681 | 682 | elif g_type in ['sigmoid']: 683 | assert u is not None 684 | b = bi(0., 'b') 685 | c = wi(1., 'c') 686 | z_est = self.apply_act((u + b) * c, 'sigmoid') 687 | 688 | elif g_type in ['comparison_g2']: 689 | # sig without the uz cross term 690 | sigval = bi(0., 'c1') + wi(1., 'c2') * z_lat 691 | if u is not None: 692 | sigval += wi(0., 'c3') * u 693 | sigval = T.nnet.sigmoid(sigval) 694 | 695 | z_est = bi(0., 'a1') + wi(1., 'a2') * z_lat + wi(1., 'b1') * sigval 696 | if u is not None: 697 | z_est += wi(0., 'a3') * u 698 | 699 | elif g_type in ['comparison_g3']: 700 | # sig without the sigmoid nonlinearity 701 | z_est = bi(0., 'a1') + wi(1., 'a2') * z_lat 702 | if u is not None: 703 | z_est += wi(0., 'a3') * u + wi(0., 'a4') * z_lat * u 704 | 705 | elif g_type in ['comparison_g4']: 706 | # No mixing between z_lat and u before final sum, otherwise similar 707 | # to sig 708 | def nonlin(inp, in_name='input', add_bias=True): 709 | w1 = wi(1., 'w1_%s' % in_name) 710 | b1 = bi(0., 'b1') 711 | w2 = wi(1., 'w2_%s' % in_name) 712 | b2 = bi(0., 'b2') if add_bias else 0 713 | w3 = wi(0., 'w3_%s' % in_name) 714 | return w2 * T.nnet.sigmoid(b1 + w1 * inp) + w3 * inp + b2 715 | 716 | z_est = nonlin(z_lat, 'lat') if u is None else \ 717 | nonlin(z_lat, 'lat') + nonlin(u, 'ver', False) 718 | 719 | elif g_type in ['comparison_g5', 'gauss']: 720 | # Gaussian assumption on z: (z - mu) * v + mu 721 | if u is None: 722 | b1 = bi(0., 'b1') 723 | w1 = wi(1., 'w1') 724 | z_est = w1 * z_lat + b1 725 | else: 726 | a1 = bi(0., 'a1') 727 | a2 = wi(1., 'a2') 728 | a3 = bi(0., 'a3') 729 | a4 = bi(0., 'a4') 730 | a5 = bi(0., 'a5') 731 | 732 | a6 = bi(0., 'a6') 733 | a7 = wi(1., 'a7') 734 | a8 = bi(0., 'a8') 735 | a9 = bi(0., 'a9') 736 | a10 = bi(0., 'a10') 737 | 738 | mu = a1 * T.nnet.sigmoid(a2 * u + a3) + a4 * u + a5 739 | v = a6 * T.nnet.sigmoid(a7 * u + a8) + a9 * u + a10 740 | 741 
| z_est = (z_lat - mu) * v + mu 742 | 743 | else: 744 | raise NotImplementedError("unknown g type: %s" % str(g_type)) 745 | 746 | # Reshape the output if z is for conv but u from fc layer 747 | if (z_est is not None and type(out_dims) == tuple and 748 | len(out_dims) > 1.0 and z_est.ndim < 4): 749 | z_est = z_est.reshape((z_est.shape[0],) + out_dims) 750 | 751 | return z_est 752 | 753 | def g_deconv(self, z_ver, in_dims, out_dims, weight_name, fspec): 754 | """ Inverse operation for each type of f used in convnets """ 755 | f_type, f_dims = fspec 756 | assert z_ver is not None 757 | num_channels = in_dims[0] if in_dims is not None else None 758 | num_filters, width, height = out_dims[:3] 759 | 760 | if f_type in ['globalmeanpool']: 761 | u = T.addbroadcast(z_ver, 2, 3) 762 | assert in_dims[1] == 1 and in_dims[2] == 1, \ 763 | "global pooling needs in_dims (1,1): %s" % str(in_dims) 764 | 765 | elif f_type in ['maxpool']: 766 | sh, str, size = z_ver.shape, f_dims[0], f_dims[1] 767 | assert str == size, "depooling requires stride == size" 768 | u = T.zeros((sh[0], sh[1], sh[2] * str, sh[3] * str), 769 | dtype=z_ver.dtype) 770 | for x in xrange(str): 771 | for y in xrange(str): 772 | u = T.set_subtensor(u[:, :, x::str, y::str], z_ver) 773 | u = u[:, :, :width, :height] 774 | 775 | elif f_type in ['convv', 'convf']: 776 | filter_size, str = (f_dims[1], f_dims[1]), f_dims[2] 777 | W_shape = (num_filters, num_channels) + filter_size 778 | W = self.weight(self.rand_init_conv(W_shape), weight_name) 779 | if str > 1: 780 | # upsample if strided version 781 | sh = z_ver.shape 782 | u = T.zeros((sh[0], sh[1], sh[2] * str, sh[3] * str), 783 | dtype=z_ver.dtype) 784 | u = T.set_subtensor(u[:, :, ::str, ::str], z_ver) 785 | else: 786 | u = z_ver # no strides, only deconv 787 | u = conv2d(u, W, filter_shape=W_shape, 788 | border_mode='valid' if 'convf' in f_type else 'full') 789 | u = u[:, :, :width, :height] 790 | else: 791 | raise NotImplementedError('Layer %s has no convolutional decoder' 792 | % f_type) 793 | 794 | return u 795 | -------------------------------------------------------------------------------- /language-tree.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/udibr/LRE/2571ba133ec8ac276e36074915bfa7d2113e5baa/language-tree.jpg -------------------------------------------------------------------------------- /nn.py: -------------------------------------------------------------------------------- 1 | from collections import OrderedDict 2 | import logging 3 | 4 | import scipy 5 | import numpy as np 6 | from theano import tensor 7 | from theano.tensor.signal.downsample import max_pool_2d, DownsampleFactorMax 8 | 9 | from blocks.extensions import SimpleExtension 10 | from blocks.extensions.monitoring import (DataStreamMonitoring, 11 | MonitoringExtension) 12 | from blocks.filter import VariableFilter 13 | from blocks.graph import ComputationGraph 14 | from blocks.monitoring.evaluators import DatasetEvaluator 15 | from blocks.roles import AuxiliaryRole 16 | 17 | logger = logging.getLogger('main.nn') 18 | 19 | 20 | class BnParamRole(AuxiliaryRole): 21 | pass 22 | 23 | # Batch normalization parameters that have to be replaced when testing 24 | BNPARAM = BnParamRole() 25 | 26 | 27 | class ZCA(object): 28 | def __init__(self, n_components=None, data=None, filter_bias=0.1): 29 | self.filter_bias = np.float32(filter_bias) 30 | self.P = None 31 | self.P_inv = None 32 | self.n_components = 0 33 | self.is_fit = False 34 | if 
n_components and data: 35 | self.fit(n_components, data) 36 | 37 | def fit(self, n_components, data): 38 | if len(data.shape) == 2: 39 | self.reshape = None 40 | else: 41 | assert n_components == np.product(data.shape[1:]), \ 42 | 'ZCA whitening components should be %d for convolutional data'\ 43 | % np.product(data.shape[1:]) 44 | self.reshape = data.shape[1:] 45 | 46 | data = self._flatten_data(data) 47 | assert len(data.shape) == 2 48 | n, m = data.shape 49 | self.mean = np.mean(data, axis=0) 50 | 51 | bias = self.filter_bias * scipy.sparse.identity(m, 'float32') 52 | cov = np.cov(data, rowvar=0, bias=1) + bias 53 | eigs, eigv = scipy.linalg.eigh(cov) 54 | 55 | assert not np.isnan(eigs).any() 56 | assert not np.isnan(eigv).any() 57 | assert eigs.min() > 0 58 | 59 | if self.n_components: 60 | eigs = eigs[-self.n_components:] 61 | eigv = eigv[:, -self.n_components:] 62 | 63 | sqrt_eigs = np.sqrt(eigs) 64 | self.P = np.dot(eigv * (1.0 / sqrt_eigs), eigv.T) 65 | assert not np.isnan(self.P).any() 66 | self.P_inv = np.dot(eigv * sqrt_eigs, eigv.T) 67 | 68 | self.P = np.float32(self.P) 69 | self.P_inv = np.float32(self.P_inv) 70 | 71 | self.is_fit = True 72 | 73 | def apply(self, data, remove_mean=True): 74 | data = self._flatten_data(data) 75 | d = data - self.mean if remove_mean else data 76 | return self._reshape_data(np.dot(d, self.P)) 77 | 78 | def inv(self, data, add_mean=True): 79 | d = np.dot(self._flatten_data(data), self.P_inv) 80 | d += self.mean if add_mean else 0. 81 | return self._reshape_data(d) 82 | 83 | def _flatten_data(self, data): 84 | if self.reshape is None: 85 | return data 86 | assert data.shape[1:] == self.reshape 87 | return data.reshape(data.shape[0], np.product(data.shape[1:])) 88 | 89 | def _reshape_data(self, data): 90 | assert len(data.shape) == 2 91 | if self.reshape is None: 92 | return data 93 | return np.reshape(data, (data.shape[0],) + self.reshape) 94 | 95 | 96 | class ContrastNorm(object): 97 | def __init__(self, scale=55, epsilon=1e-8): 98 | self.scale = np.float32(scale) 99 | self.epsilon = np.float32(epsilon) 100 | 101 | def apply(self, data, copy=False): 102 | if copy: 103 | data = np.copy(data) 104 | data_shape = data.shape 105 | if len(data.shape) > 2: 106 | data = data.reshape(data.shape[0], np.product(data.shape[1:])) 107 | 108 | assert len(data.shape) == 2, 'Contrast norm on flattened data' 109 | 110 | data -= data.mean(axis=1)[:, np.newaxis] 111 | 112 | norms = np.sqrt(np.sum(data ** 2, axis=1)) / self.scale 113 | norms[norms < self.epsilon] = np.float32(1.) 
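# At this point each sample has been mean-centered and its L2 norm computed and
# divided by `scale` (default 55); near-zero norms were just clamped to 1 so
# that (nearly) constant samples pass through unchanged instead of being blown
# up by the division below.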
114 | 115 | data /= norms[:, np.newaxis] 116 | 117 | if data_shape != data.shape: 118 | data = data.reshape(data_shape) 119 | 120 | return data 121 | 122 | 123 | class TestMonitoring(object): 124 | def _get_bn_params(self, output_vars): 125 | # Pick out the nodes with batch normalization vars 126 | cg = ComputationGraph(output_vars) 127 | var_filter = VariableFilter(roles=[BNPARAM]) 128 | bn_ps = var_filter(cg.variables) 129 | 130 | if len(bn_ps) == 0: 131 | logger.warn('No batch normalization parameters found - is' + 132 | ' batch normalization turned off?') 133 | self._bn = False 134 | self._counter = None 135 | self._counter_max = None 136 | bn_share = [] 137 | output_vars_replaced = output_vars 138 | else: 139 | self._bn = True 140 | assert len(set([p.name for p in bn_ps])) == len(bn_ps), \ 141 | 'Some batch norm params have the same name' 142 | logger.info('Batch norm parameters: %s' % ', '.join([p.name for p in bn_ps])) 143 | 144 | # Filter out the shared variables from the model updates 145 | def filter_share(par): 146 | lst = [up for up in cg.updates if up.name == 'shared_%s' % par.name] 147 | assert len(lst) == 1 148 | return lst[0] 149 | bn_share = map(filter_share, bn_ps) 150 | 151 | # Replace the BN coefficients in the test data model - Replace the 152 | # theano variables in the test graph with the shareds 153 | output_vars_replaced = cg.replace(zip(bn_ps, bn_share)).outputs 154 | 155 | # Pick out the counter 156 | self._counter = self._param_from_updates(cg.updates, 'counter') 157 | self._counter_max = self._param_from_updates(cg.updates, 'counter_max') 158 | 159 | return bn_ps, bn_share, output_vars_replaced 160 | 161 | def _param_from_updates(self, updates, p_name): 162 | var_filter = VariableFilter(roles=[BNPARAM]) 163 | bn_ps = var_filter(updates.keys()) 164 | p = [p for p in bn_ps if p.name == p_name] 165 | assert len(p) == 1, 'No %s of more than one %s' % (p_name, p_name) 166 | return p[0] 167 | 168 | def reset_counter(self): 169 | if self._bn: 170 | self._counter.set_value(np.float32(1)) 171 | 172 | def replicate_vars(self, output_vars): 173 | # Problem in Blocks with multiple monitors monitoring the 174 | # same value in a graph. Therefore, they are all "replicated" to a new 175 | # Theano variable 176 | if isinstance(output_vars, (list, tuple)): 177 | return map(self.replicate_vars, output_vars) 178 | assert not hasattr(output_vars.tag, 'aggregation_scheme'), \ 179 | 'The variable %s already has an aggregator ' % output_vars.name + \ 180 | 'assigned to it - are you using a datasetmonitor with the same' + \ 181 | ' variable as output? 
This might cause trouble in Blocks' 182 | new_var = 1 * output_vars 183 | new_var.name = output_vars.name 184 | return new_var 185 | 186 | 187 | class ApproxTestMonitoring(DataStreamMonitoring, TestMonitoring): 188 | def __init__(self, output_vars, *args, **kwargs): 189 | output_vars = self.replicate_vars(output_vars) 190 | _, _, replaced_vars = self._get_bn_params(output_vars) 191 | super(ApproxTestMonitoring, self).__init__(replaced_vars, *args, 192 | **kwargs) 193 | 194 | def do(self, which_callback, *args, **kwargs): 195 | assert not which_callback == "after_batch", "Do not monitor each mb" 196 | self.reset_counter() 197 | super(ApproxTestMonitoring, self).do(which_callback, *args, **kwargs) 198 | 199 | 200 | class FinalTestMonitoring(SimpleExtension, MonitoringExtension, TestMonitoring): 201 | """Monitors validation and test set data with batch norm 202 | 203 | Calculates the training set statistics for batch normalization and adds 204 | them to the model before calculating the validation and test set values. 205 | This is done in two steps: First the training set is iterated and the 206 | statistics are saved in shared variables, then the model iterates through 207 | the test/validation set using the saved shared variables. 208 | When the training set is iterated, it is done for the full set, layer by 209 | layer so that the statistics are correct. This is expensive for very deep 210 | models, in which case some approximation could be in order 211 | """ 212 | def __init__(self, output_vars, train_data_stream, test_data_stream, 213 | **kwargs): 214 | output_vars = self.replicate_vars(output_vars) 215 | super(FinalTestMonitoring, self).__init__(**kwargs) 216 | self.trn_stream = train_data_stream 217 | self.tst_stream = test_data_stream 218 | 219 | bn_ps, bn_share, output_vars_replaced = self._get_bn_params(output_vars) 220 | 221 | if self._bn: 222 | updates = self._get_updates(bn_ps, bn_share) 223 | trn_evaluator = DatasetEvaluator(bn_ps, updates=updates) 224 | else: 225 | trn_evaluator = None 226 | 227 | self._trn_evaluator = trn_evaluator 228 | self._tst_evaluator = DatasetEvaluator(output_vars_replaced) 229 | 230 | def _get_updates(self, bn_ps, bn_share): 231 | cg = ComputationGraph(bn_ps) 232 | # Only store updates that relate to params or the counter 233 | updates = OrderedDict([(up, cg.updates[up]) for up in 234 | cg.updates if up.name == 'counter' or 235 | up in bn_share]) 236 | assert self._counter == self._param_from_updates(cg.updates, 'counter') 237 | assert self._counter_max == self._param_from_updates(cg.updates, 238 | 'counter_max') 239 | assert len(updates) == len(bn_ps) + 1, \ 240 | 'Counter or var missing from update' 241 | return updates 242 | 243 | def do(self, which_callback, *args): 244 | """Write the values of monitored variables to the log.""" 245 | assert not which_callback == "after_batch", "Do not monitor each mb" 246 | # Run on train data and get the statistics 247 | if self._bn: 248 | self._counter_max.set_value(np.float32(np.inf)) 249 | self.reset_counter() 250 | self._trn_evaluator.evaluate(self.trn_stream) 251 | self.reset_counter() 252 | 253 | value_dict = self._tst_evaluator.evaluate(self.tst_stream) 254 | self.add_records(self.main_loop.log, value_dict.items()) 255 | 256 | 257 | class LRDecay(SimpleExtension): 258 | def __init__(self, lr, decay_first, decay_last, lrmin=0., **kwargs): 259 | super(LRDecay, self).__init__(**kwargs) 260 | self.iter = 0 261 | self.decay_first = decay_first 262 | self.decay_last = decay_last 263 | self.lr = lr 264 | 
self.lrmin = lrmin 265 | self.lr_init = lr.get_value() 266 | 267 | def do(self, which_callback, *args): 268 | self.iter += 1 269 | if self.iter > self.decay_first: 270 | ratio = 1.0 * (self.decay_last - self.iter) 271 | ratio = np.maximum(0, ratio / (self.decay_last - self.decay_first + 1e-6)) 272 | self.lr.set_value(np.float32(ratio * (self.lr_init - self.lrmin) + self.lrmin)) 273 | logger.info("Iter %d, lr %f" % (self.iter, self.lr.get_value())) 274 | 275 | 276 | def global_meanpool_2d(x, num_filters): 277 | mean = tensor.mean(x.flatten(3), axis=2) 278 | mean = mean.dimshuffle(0, 1, 'x', 'x') 279 | return mean, (num_filters, 1, 1) 280 | 281 | 282 | def pool_2d(x, mode="average", ws=(2, 2), stride=(2, 2)): 283 | import theano.sandbox.cuda as cuda 284 | assert cuda.dnn.dnn_available() 285 | return cuda.dnn.dnn_pool(x, ws=ws, stride=stride, mode=mode) 286 | 287 | 288 | def maxpool_2d(z, in_dim, poolsize, poolstride): 289 | z = max_pool_2d(z, ds=poolsize, st=poolstride) 290 | output_size = tuple(DownsampleFactorMax.out_shape(in_dim, poolsize, 291 | st=poolstride)) 292 | return z, output_size 293 | -------------------------------------------------------------------------------- /run.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | 3 | import functools 4 | import logging 5 | import os 6 | import subprocess 7 | from argparse import ArgumentParser, Action, SUPPRESS 8 | nodefaultargs = [] 9 | from collections import OrderedDict 10 | import sys 11 | 12 | import numpy 13 | import time 14 | import theano 15 | from theano.tensor.type import TensorType 16 | from pandas import DataFrame 17 | 18 | from blocks.algorithms import GradientDescent, Adam 19 | from blocks.extensions import FinishAfter 20 | from blocks.extensions.monitoring import TrainingDataMonitoring 21 | from blocks.filter import VariableFilter 22 | from blocks.graph import ComputationGraph 23 | from blocks.main_loop import MainLoop 24 | from blocks.model import Model 25 | from blocks.roles import PARAMETER 26 | from fuel.datasets import MNIST, CIFAR10 27 | from fuel.schemes import ShuffledScheme, SequentialScheme 28 | from fuel.streams import DataStream 29 | from fuel.transformers import Transformer 30 | 31 | from picklable_itertools import cycle, imap 32 | from itertools import izip, product, tee 33 | 34 | logger = logging.getLogger('main') 35 | 36 | from utils import ShortPrinting, prepare_dir, load_df, DummyLoop 37 | from utils import SaveExpParams, SaveLog, SaveParams, AttributeDict 38 | from nn import ZCA, ContrastNorm 39 | from nn import ApproxTestMonitoring, FinalTestMonitoring, TestMonitoring 40 | from nn import LRDecay 41 | from ladder import LadderAE 42 | 43 | debug = sys.gettrace() is not None 44 | if debug: 45 | theano.config.optimizer='fast_compile' 46 | theano.config.exception_verbosity='high' 47 | theano.config.compute_test_value = 'warn' 48 | floatX = theano.config.floatX 49 | 50 | class Whitening(Transformer): 51 | """ Makes a copy of the examples in the underlying dataset and whitens it 52 | if necessary. 
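    On construction the entire underlying dataset is pulled into memory:
    'features' are cast to float32 and passed through the optional ContrastNorm
    (`cnorm`) and ZCA (`whiten`) objects, while 'targets' are flattened by
    unify_labels. get_data() then simply slices these preprocessed arrays.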
53 | """ 54 | def __init__(self, data_stream, iteration_scheme, whiten, cnorm=None, 55 | **kwargs): 56 | super(Whitening, self).__init__(data_stream, 57 | iteration_scheme=iteration_scheme, 58 | **kwargs) 59 | data = data_stream.get_data(slice(data_stream.dataset.num_examples)) 60 | self.data = [] 61 | for s, d in zip(self.sources, data): 62 | if 'features' == s: 63 | # Fuel provides Cifar in uint8, convert to float32 64 | d = numpy.require(d, dtype=numpy.float32) 65 | if cnorm is not None: 66 | d = cnorm.apply(d) 67 | if whiten is not None: 68 | d = whiten.apply(d) 69 | self.data += [d] 70 | elif 'targets' == s: 71 | d = unify_labels(d) 72 | self.data += [d] 73 | else: 74 | raise Exception("Unsupported Fuel target: %s" % s) 75 | 76 | def get_data(self, request=None): 77 | return (s[request] for s in self.data) 78 | 79 | 80 | class SemiDataStream(Transformer): 81 | """ Combines two datastreams into one such that 'target' source (labels) 82 | is used only from the first one. The second one is renamed 83 | to avoid collision. Upon iteration, the first one is repeated until 84 | the second one depletes. 85 | """ 86 | def __init__(self, data_stream_labeled, data_stream_unlabeled, **kwargs): 87 | super(Transformer, self).__init__(**kwargs) 88 | self.ds_labeled = data_stream_labeled 89 | self.ds_unlabeled = data_stream_unlabeled 90 | # Rename the sources for clarity 91 | self.ds_labeled.sources = ('features_labeled', 'targets_labeled') 92 | # Rename the source for input pixels and hide its labels! 93 | self.ds_unlabeled.sources = ('features_unlabeled',) 94 | 95 | @property 96 | def sources(self): 97 | if hasattr(self, '_sources'): 98 | return self._sources 99 | return self.ds_labeled.sources + self.ds_unlabeled.sources 100 | 101 | @sources.setter 102 | def sources(self, value): 103 | self._sources = value 104 | 105 | def close(self): 106 | self.ds_labeled.close() 107 | self.ds_unlabeled.close() 108 | 109 | def reset(self): 110 | self.ds_labeled.reset() 111 | self.ds_unlabeled.reset() 112 | 113 | def next_epoch(self): 114 | self.ds_labeled.next_epoch() 115 | self.ds_unlabeled.next_epoch() 116 | 117 | def get_epoch_iterator(self, **kwargs): 118 | unlabeled = self.ds_unlabeled.get_epoch_iterator(**kwargs) 119 | labeled = self.ds_labeled.get_epoch_iterator(**kwargs) 120 | assert type(labeled) == type(unlabeled) 121 | 122 | return imap(self.mergedicts, cycle(labeled), unlabeled) 123 | 124 | def mergedicts(self, x, y): 125 | return dict(list(x.items()) + list(y.items())) 126 | 127 | 128 | def unify_labels(y): 129 | """ Work-around for Fuel bug where MNIST and Cifar-10 130 | datasets have different dimensionalities for the targets: 131 | e.g. (50000, 1) vs (60000,) """ 132 | yshape = y.shape 133 | y = y.flatten() 134 | assert y.shape[0] == yshape[0] 135 | return y 136 | 137 | 138 | def make_datastream(dataset, indices, batch_size, 139 | n_labeled=None, n_unlabeled=None, 140 | balanced_classes=True, whiten=None, cnorm=None, 141 | scheme=ShuffledScheme, dseed=None): 142 | """ 143 | 144 | :param dataset: 145 | :param indices: 146 | :param batch_size: 147 | :param n_labeled: None, int, list 148 | if None or 0 then all indices are used as labeled data. 149 | otherwise only the first n_labeled indices are used as labeled. 150 | If a list then balanced_classes must be true and the list specificy 151 | the number of examples to take from each category. 
If a category is 152 | too small than samples are repeated 153 | :param n_unlabeled: 154 | :param balanced_classes: 155 | :param whiten: 156 | :param cnorm: 157 | :param scheme: 158 | :return: 159 | """ 160 | if isinstance(n_labeled,tuple): 161 | assert balanced_classes 162 | n_labeled_list = n_labeled if len(n_labeled) > 1 else None 163 | n_labeled = sum(n_labeled) if len(n_labeled) > 0 else 0 164 | else: 165 | n_labeled_list = None 166 | if n_labeled is None or n_labeled <= 0: 167 | n_labeled = len(indices) 168 | if batch_size is None: 169 | batch_size = len(indices) 170 | if n_unlabeled is None or n_unlabeled < 0: 171 | n_unlabeled = len(indices) 172 | assert n_labeled <= n_unlabeled, 'need less labeled than unlabeled' 173 | 174 | all_data = dataset.data_sources[dataset.sources.index('targets')] 175 | y = unify_labels(all_data)[indices] 176 | if len(y): 177 | n_classes = y.max() + 1 178 | assert n_labeled_list is None or len(n_labeled_list) == n_classes 179 | logger.info('#samples %d #class %d' % (len(y),n_classes)) 180 | # for c in range(n_classes): 181 | # c_count = (y == c).sum() 182 | # logger.info('Class %d size %d %f%%' % (c, c_count, float(c_count)/len(y))) 183 | 184 | # Get unlabeled indices 185 | i_unlabeled = indices[:n_unlabeled] 186 | 187 | if balanced_classes and n_labeled < n_unlabeled: 188 | # Ensure each label is equally represented 189 | logger.info('Balancing %d labels...' % n_labeled) 190 | assert n_labeled % n_classes == 0 191 | n_from_each_class = n_labeled / n_classes 192 | 193 | i_labeled = [] 194 | for c in range(n_classes): 195 | n_from_class = n_from_each_class if n_labeled_list is None else n_labeled_list[c] 196 | # if a class does not have enough examples, then duplicate 197 | ids = [] 198 | while len(ids) < n_from_class: 199 | n = n_from_class - len(ids) 200 | i = (i_unlabeled[y[:n_unlabeled] == c])[:n] 201 | ids += list(i) 202 | i_labeled += ids 203 | # no need to shuffle the samples because latter 204 | # ds=SemiDataStream(...,iteration_scheme=ShuffledScheme,...) 
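# Without class balancing (or when every index is labeled) the labeled set is
# simply the first n_labeled of the given indices.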
205 | else: 206 | i_labeled = indices[:n_labeled] 207 | 208 | ds = SemiDataStream( 209 | data_stream_labeled=Whitening( 210 | DataStream(dataset), 211 | iteration_scheme=scheme(i_labeled, batch_size), 212 | whiten=whiten, cnorm=cnorm), 213 | data_stream_unlabeled=Whitening( 214 | DataStream(dataset), 215 | iteration_scheme=scheme(i_unlabeled, batch_size), 216 | whiten=whiten, cnorm=cnorm) 217 | ) 218 | return ds 219 | 220 | 221 | def setup_model(p): 222 | ladder = LadderAE(p) 223 | # Setup inputs 224 | input_type = TensorType('float32', [False] * (len(p.encoder_layers[0]) + 1)) 225 | x_only = input_type('features_unlabeled') 226 | if debug: 227 | x_only.tag.test_value = numpy.random.normal(size=(p.batch_size,)+p.encoder_layers[0]).astype(floatX) 228 | x = input_type('features_labeled') 229 | if debug: 230 | x.tag.test_value = numpy.random.normal(size=(p.batch_size,)+p.encoder_layers[0]).astype(floatX) 231 | y = theano.tensor.lvector('targets_labeled') 232 | if debug: 233 | y.tag.test_value = numpy.random.randint(1,int(p.encoder_layers[-1])+1,(p.batch_size)) 234 | ladder.apply(x, y, x_only) 235 | 236 | # Load parameters if requested 237 | if p.get('load_from'): 238 | with open(p.load_from + '/trained_params.npz') as f: 239 | loaded = numpy.load(f) 240 | cg = ComputationGraph([ladder.costs.total]) 241 | current_params = VariableFilter(roles=[PARAMETER])(cg.variables) 242 | logger.info('Loading parameters: %s' % ', '.join(loaded.keys())) 243 | for param in current_params: 244 | assert param.get_value().shape == loaded[param.name].shape 245 | param.set_value(loaded[param.name]) 246 | 247 | return ladder 248 | 249 | 250 | def load_and_log_params(cli_params): 251 | cli_params = AttributeDict(cli_params) 252 | if cli_params.get('load_from'): 253 | p = load_df(cli_params.load_from, 'params').to_dict()[0] 254 | p = AttributeDict(p) 255 | for key in cli_params.iterkeys(): 256 | if key not in p: 257 | p[key] = None 258 | new_params = cli_params 259 | loaded = True 260 | else: 261 | p = cli_params 262 | new_params = {} 263 | loaded = False 264 | 265 | # Make dseed seed unless specified explicitly 266 | if p.get('dseed') is None and p.get('seed') is not None: 267 | p['dseed'] = p['seed'] 268 | 269 | logger.info('== COMMAND LINE ==') 270 | logger.info(' '.join(sys.argv)) 271 | 272 | logger.info('== PARAMETERS ==') 273 | for k, v in p.iteritems(): 274 | replace_str = "" 275 | if loaded: 276 | if k in nodefaultargs: 277 | p[k] = new_params[k] 278 | replace_str = "<- " + str(new_params.get(k)) 279 | elif p.get(k) is None and new_params.get(k) is not None: 280 | p[k] = new_params[k] 281 | replace_str = "<- " + str(new_params.get(k)) 282 | else: 283 | if new_params.get(k) is not None: 284 | p[k] = new_params[k] 285 | replace_str = "<- " + str(new_params.get(k)) 286 | logger.info(" {:20}: {:<20} {}".format(k, v, replace_str)) 287 | return p, loaded 288 | 289 | 290 | def setup_data(p, test_set=False): 291 | if p.dataset in ['cifar10','mnist']: 292 | dataset_class, training_set_size = { 293 | 'cifar10': (CIFAR10, 40000), 294 | 'mnist': (MNIST, 50000), 295 | }[p.dataset] 296 | else: 297 | from fuel.datasets import H5PYDataset 298 | from fuel.utils import find_in_data_path 299 | from functools import partial 300 | fn=p.dataset 301 | fn=os.path.join(fn, fn + '.hdf5') 302 | def dataset_class(which_sets): 303 | return H5PYDataset(file_or_path=find_in_data_path(fn), 304 | which_sets=which_sets, 305 | load_in_memory=True) 306 | training_set_size = None 307 | 308 | train_set = dataset_class(["train"]) 309 | 310 | # 
Allow overriding the default from command line 311 | if p.get('unlabeled_samples') is not None and p.unlabeled_samples >= 0: 312 | training_set_size = p.unlabeled_samples 313 | elif training_set_size is None: 314 | training_set_size = train_set.num_examples 315 | 316 | # Make sure the MNIST data is in right format 317 | if p.dataset == 'mnist': 318 | d = train_set.data_sources[train_set.sources.index('features')] 319 | assert numpy.all(d <= 1.0) and numpy.all(d >= 0.0), \ 320 | 'Make sure data is in float format and in range 0 to 1' 321 | 322 | # Take all indices and permutate them 323 | all_ind = numpy.arange(train_set.num_examples) 324 | if p.get('dseed'): 325 | rng = numpy.random.RandomState(seed=p.dseed) 326 | rng.shuffle(all_ind) 327 | 328 | d = AttributeDict() 329 | 330 | # Choose the training set 331 | d.train = train_set 332 | d.train_ind = all_ind[:training_set_size] 333 | 334 | # Then choose validation set from the remaining indices 335 | d.valid = train_set 336 | d.valid_ind = numpy.setdiff1d(all_ind, d.train_ind)[:p.valid_set_size] 337 | logger.info('Using %d examples for validation' % len(d.valid_ind)) 338 | 339 | # Only touch test data if requested 340 | if test_set: 341 | d.test = dataset_class(["test"]) 342 | d.test_ind = numpy.arange(d.test.num_examples) 343 | 344 | # Setup optional whitening, only used for Cifar-10 345 | in_dim = train_set.data_sources[train_set.sources.index('features')].shape[1:] 346 | if len(in_dim) > 1 and p.whiten_zca > 0: 347 | assert numpy.product(in_dim) == p.whiten_zca, \ 348 | 'Need %d whitening dimensions, not %d' % (numpy.product(in_dim), 349 | p.whiten_zca) 350 | cnorm = ContrastNorm(p.contrast_norm) if p.contrast_norm != 0 else None 351 | 352 | def get_data(d, i): 353 | data = d.get_data(request=i)[d.sources.index('features')] 354 | # Fuel provides Cifar in uint8, convert to float32 355 | data = numpy.require(data, dtype=numpy.float32) 356 | return data if cnorm is None else cnorm.apply(data) 357 | 358 | if p.whiten_zca > 0: 359 | logger.info('Whitening using %d ZCA components' % p.whiten_zca) 360 | whiten = ZCA() 361 | whiten.fit(p.whiten_zca, get_data(d.train, d.train_ind)) 362 | else: 363 | whiten = None 364 | 365 | return in_dim, d, whiten, cnorm 366 | 367 | 368 | def get_error(args): 369 | """ Calculate the classification error 370 | called when evaluating 371 | """ 372 | args['data_type'] = args.get('data_type', 'test') 373 | args['no_load'] = 'g_' 374 | 375 | targets, acts = analyze(args) 376 | guess = numpy.argmax(acts, axis=1) 377 | correct = numpy.sum(numpy.equal(guess, targets.flatten())) 378 | 379 | return (1. - correct / float(len(guess))) * 100. 
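# Hypothetical usage sketch: get_error() can also be called directly from
# Python with the same keys the 'evaluate' sub-command would pass, assuming
# 'results/noname0' is a directory produced by an earlier `run.py train` run
# (the directory name is illustrative only):
#
#     err = get_error({'load_from': 'results/noname0', 'data_type': 'valid'})
#     print('validation error %.2f%%' % err)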
380 | 381 | 382 | def get_layer(args): 383 | """ Get the output of the layer just below softmax 384 | """ 385 | args['data_type'] = args.get('data_type', 'test') 386 | args['no_load'] = 'g_' 387 | args['layer'] = args.get('layer', -1) 388 | 389 | targets, acts = analyze(args) 390 | 391 | return acts 392 | 393 | 394 | def analyze(cli_params): 395 | """ 396 | called when evaluating 397 | :return: inputs, result 398 | """ 399 | p, _ = load_and_log_params(cli_params) 400 | _, data, whiten, cnorm = setup_data(p, test_set=(p.data_type == 'test')) 401 | ladder = setup_model(p) 402 | 403 | # Analyze activations 404 | if p.data_type == 'train': 405 | dset, indices, calc_batchnorm = data.train, data.train_ind, False 406 | elif p.data_type == 'valid': 407 | dset, indices, calc_batchnorm = data.valid, data.valid_ind, True 408 | elif p.data_type == 'test': 409 | dset, indices, calc_batchnorm = data.test, data.test_ind, True 410 | else: 411 | raise Exception("Unknown data-type %s"%p.data_type) 412 | 413 | if calc_batchnorm: 414 | logger.info('Calculating batch normalization for clean.labeled path') 415 | main_loop = DummyLoop( 416 | extensions=[ 417 | FinalTestMonitoring( 418 | [ladder.costs.class_clean, ladder.error.clean, ladder.oos.clean] 419 | + ladder.costs.denois.values(), 420 | make_datastream(data.train, data.train_ind, 421 | # These need to match with the training 422 | p.batch_size, 423 | n_labeled=p.labeled_samples, 424 | n_unlabeled=len(data.train_ind), 425 | cnorm=cnorm, 426 | balanced_classes=p.balanced_classes, 427 | whiten=whiten, scheme=ShuffledScheme), 428 | make_datastream(data.valid, data.valid_ind, 429 | p.valid_batch_size, 430 | n_labeled=len(data.valid_ind), 431 | n_unlabeled=len(data.valid_ind), 432 | balanced_classes=p.balanced_classes, 433 | cnorm=cnorm, 434 | whiten=whiten, scheme=ShuffledScheme), 435 | prefix="valid_final", before_training=True), 436 | ShortPrinting({ 437 | "valid_final": OrderedDict([ 438 | ('VF_C_class', ladder.costs.class_clean), 439 | ('VF_E', ladder.error.clean), 440 | ('VF_O', ladder.oos.clean), 441 | ('VF_C_de', [ladder.costs.denois.get(0), 442 | ladder.costs.denois.get(1), 443 | ladder.costs.denois.get(2), 444 | ladder.costs.denois.get(3)]), 445 | ]), 446 | }, after_training=True, use_log=False), 447 | ]) 448 | main_loop.run() 449 | # df = DataFrame.from_dict(main_loop.log, orient='index') 450 | # col = 'valid_final_error_rate_clean' 451 | # logger.info('%s %g' % (col, df[col].iloc[-1])) 452 | 453 | # Make a datastream that has all the indices in the labeled pathway 454 | ds = make_datastream(dset, indices, 455 | batch_size=p.get('batch_size'), 456 | n_labeled=len(indices), 457 | n_unlabeled=len(indices), 458 | balanced_classes=False, 459 | whiten=whiten, 460 | cnorm=cnorm, 461 | scheme=SequentialScheme) 462 | 463 | # If layer=-1 we want out the values after softmax 464 | outputs = ladder.act.clean.labeled.h[len(ladder.layers) - 1] 465 | 466 | # Replace the batch normalization paramameters with the shared variables 467 | if calc_batchnorm: 468 | outputreplacer = TestMonitoring() 469 | _, _, outputs = outputreplacer._get_bn_params(outputs) 470 | 471 | cg = ComputationGraph(outputs) 472 | f = cg.get_theano_function() 473 | 474 | it = ds.get_epoch_iterator(as_dict=True) 475 | res = [] 476 | inputs = {'features_labeled': [], 477 | 'targets_labeled': [], 478 | 'features_unlabeled': []} 479 | # Loop over one epoch 480 | for d in it: 481 | # Store all inputs 482 | for k, v in d.iteritems(): 483 | inputs[k] += [v] 484 | # Store outputs 485 | res += 
[f(*[d[str(inp)] for inp in cg.inputs])] 486 | 487 | # Concatenate all minibatches 488 | res = [numpy.vstack(minibatches) for minibatches in zip(*res)] 489 | inputs = {k: numpy.concatenate(v) for k, v in inputs.iteritems()} 490 | 491 | return inputs['targets_labeled'], res[0] 492 | 493 | def dump_unlabeled_encoder(cli_params): 494 | """ 495 | called when dumping 496 | :return: inputs, result 497 | """ 498 | p, _ = load_and_log_params(cli_params) 499 | _, data, whiten, cnorm = setup_data(p, test_set=(p.data_type == 'test')) 500 | ladder = setup_model(p) 501 | 502 | # Analyze activations 503 | if p.data_type == 'train': 504 | dset, indices, calc_batchnorm = data.train, data.train_ind, False 505 | elif p.data_type == 'valid': 506 | dset, indices, calc_batchnorm = data.valid, data.valid_ind, True 507 | elif p.data_type == 'test': 508 | dset, indices, calc_batchnorm = data.test, data.test_ind, True 509 | else: 510 | raise Exception("Unknown data-type %s"%p.data_type) 511 | 512 | if calc_batchnorm: 513 | logger.info('Calculating batch normalization for clean.labeled path') 514 | main_loop = DummyLoop( 515 | extensions=[ 516 | FinalTestMonitoring( 517 | [ladder.costs.class_clean, ladder.error.clean, ladder.oos.clean] 518 | + ladder.costs.denois.values(), 519 | make_datastream(data.train, data.train_ind, 520 | # These need to match with the training 521 | p.batch_size, 522 | n_labeled=p.labeled_samples, 523 | n_unlabeled=len(data.train_ind), 524 | balanced_classes=p.balanced_classes, 525 | cnorm=cnorm, 526 | whiten=whiten, scheme=ShuffledScheme), 527 | make_datastream(data.valid, data.valid_ind, 528 | p.valid_batch_size, 529 | n_labeled=len(data.valid_ind), 530 | n_unlabeled=len(data.valid_ind), 531 | balanced_classes=p.balanced_classes, 532 | cnorm=cnorm, 533 | whiten=whiten, scheme=ShuffledScheme), 534 | prefix="valid_final", before_training=True), 535 | ShortPrinting({ 536 | "valid_final": OrderedDict([ 537 | ('VF_C_class', ladder.costs.class_clean), 538 | ('VF_E', ladder.error.clean), 539 | ('VF_O', ladder.oos.clean), 540 | ('VF_C_de', [ladder.costs.denois.get(0), 541 | ladder.costs.denois.get(1), 542 | ladder.costs.denois.get(2), 543 | ladder.costs.denois.get(3)]), 544 | ]), 545 | }, after_training=True, use_log=False), 546 | ]) 547 | main_loop.run() 548 | 549 | all_ind = numpy.arange(dset.num_examples) 550 | # Make a datastream that has all the indices in the labeled pathway 551 | ds = make_datastream(dset, all_ind, 552 | batch_size=p.get('batch_size'), 553 | n_labeled=len(all_ind), 554 | n_unlabeled=len(all_ind), 555 | balanced_classes=False, 556 | whiten=whiten, 557 | cnorm=cnorm, 558 | scheme=SequentialScheme) 559 | 560 | # If layer=-1 we want out the values after softmax 561 | if p.layer < 0: 562 | # ladder.act.clean.unlabeled.h is a dict not a list 563 | outputs = ladder.act.clean.labeled.h[len(ladder.layers) + p.layer] 564 | else: 565 | outputs = ladder.act.clean.labeled.h[p.layer] 566 | 567 | # Replace the batch normalization paramameters with the shared variables 568 | if calc_batchnorm: 569 | outputreplacer = TestMonitoring() 570 | _, _, outputs = outputreplacer._get_bn_params(outputs) 571 | 572 | cg = ComputationGraph(outputs) 573 | f = cg.get_theano_function() 574 | 575 | it = ds.get_epoch_iterator(as_dict=True) 576 | res = [] 577 | 578 | # Loop over one epoch 579 | for d in it: 580 | # Store outputs 581 | res += [f(*[d[str(inp)] for inp in cg.inputs])] 582 | 583 | # Concatenate all minibatches 584 | res = [numpy.vstack(minibatches) for minibatches in zip(*res)] 585 | 586 | return 
res[0] 587 | 588 | 589 | def train(cli_params): 590 | fn = 'noname' 591 | if 'save_to' in nodefaultargs or not cli_params.get('load_from'): 592 | fn = cli_params['save_to'] 593 | cli_params['save_dir'] = prepare_dir(fn) 594 | nodefaultargs.append('save_dir') 595 | 596 | logfile = os.path.join(cli_params['save_dir'], 'log.txt') 597 | 598 | # Log also DEBUG to a file 599 | fh = logging.FileHandler(filename=logfile) 600 | fh.setLevel(logging.DEBUG) 601 | logger.addHandler(fh) 602 | 603 | logger.info('Logging into %s' % logfile) 604 | 605 | p, loaded = load_and_log_params(cli_params) 606 | 607 | in_dim, data, whiten, cnorm = setup_data(p, test_set=False) 608 | if not loaded: 609 | # Set the zero layer to match input dimensions 610 | p.encoder_layers = (in_dim,) + p.encoder_layers 611 | 612 | ladder = setup_model(p) 613 | 614 | # Training 615 | all_params = ComputationGraph([ladder.costs.total]).parameters 616 | logger.info('Found the following parameters: %s' % str(all_params)) 617 | 618 | # Fetch all batch normalization updates. They are in the clean path. 619 | # you can turn off BN by setting is_normalizing = False in ladder.py 620 | bn_updates = ComputationGraph([ladder.costs.class_clean]).updates 621 | assert not bn_updates or 'counter' in [u.name for u in bn_updates.keys()], \ 622 | 'No batch norm params in graph - the graph has been cut?' 623 | 624 | training_algorithm = GradientDescent( 625 | cost=ladder.costs.total, parameters=all_params, 626 | step_rule=Adam(learning_rate=ladder.lr)) 627 | # In addition to actual training, also do BN variable approximations 628 | if bn_updates: 629 | training_algorithm.add_updates(bn_updates) 630 | 631 | short_prints = { 632 | "train": OrderedDict([ 633 | ('T_E', ladder.error.clean), 634 | ('T_O', ladder.oos.clean), 635 | ('T_C_class', ladder.costs.class_corr), 636 | ('T_C_de', ladder.costs.denois.values()), 637 | ('T_T', ladder.costs.total), 638 | ]), 639 | "valid_approx": OrderedDict([ 640 | ('V_C_class', ladder.costs.class_clean), 641 | ('V_E', ladder.error.clean), 642 | ('V_O', ladder.oos.clean), 643 | ('V_C_de', ladder.costs.denois.values()), 644 | ('V_T', ladder.costs.total), 645 | ]), 646 | "valid_final": OrderedDict([ 647 | ('VF_C_class', ladder.costs.class_clean), 648 | ('VF_E', ladder.error.clean), 649 | ('VF_O', ladder.oos.clean), 650 | ('VF_C_de', ladder.costs.denois.values()), 651 | ('V_T', ladder.costs.total), 652 | ]), 653 | } 654 | 655 | if len(data.valid_ind): 656 | main_loop = MainLoop( 657 | training_algorithm, 658 | # Datastream used for training 659 | make_datastream(data.train, data.train_ind, 660 | p.batch_size, 661 | n_labeled=p.labeled_samples, 662 | n_unlabeled=p.unlabeled_samples, 663 | whiten=whiten, 664 | cnorm=cnorm, 665 | balanced_classes=p.balanced_classes, 666 | dseed=p.dseed), 667 | model=Model(ladder.costs.total), 668 | extensions=[ 669 | FinishAfter(after_n_epochs=p.num_epochs), 670 | 671 | # This will estimate the validation error using 672 | # running average estimates of the batch normalization 673 | # parameters, mean and variance 674 | ApproxTestMonitoring( 675 | [ladder.costs.class_clean, ladder.error.clean, ladder.oos.clean, ladder.costs.total] 676 | + ladder.costs.denois.values(), 677 | make_datastream(data.valid, data.valid_ind, 678 | p.valid_batch_size, whiten=whiten, cnorm=cnorm, 679 | balanced_classes=p.balanced_classes, 680 | scheme=ShuffledScheme), 681 | prefix="valid_approx"), 682 | 683 | # This Monitor is slower, but more accurate since it will first 684 | # estimate batch normalization parameters 
from training data and 685 | # then do another pass to calculate the validation error. 686 | FinalTestMonitoring( 687 | [ladder.costs.class_clean, ladder.error.clean, ladder.oos.clean, ladder.costs.total] 688 | + ladder.costs.denois.values(), 689 | make_datastream(data.train, data.train_ind, 690 | p.batch_size, 691 | n_labeled=p.labeled_samples, 692 | whiten=whiten, cnorm=cnorm, 693 | balanced_classes=p.balanced_classes, 694 | scheme=ShuffledScheme), 695 | make_datastream(data.valid, data.valid_ind, 696 | p.valid_batch_size, 697 | n_labeled=len(data.valid_ind), 698 | whiten=whiten, cnorm=cnorm, 699 | balanced_classes=p.balanced_classes, 700 | scheme=ShuffledScheme), 701 | prefix="valid_final", 702 | after_n_epochs=p.num_epochs, after_training=True), 703 | 704 | TrainingDataMonitoring( 705 | [ladder.error.clean, ladder.oos.clean, ladder.costs.total, ladder.costs.class_corr, 706 | training_algorithm.total_gradient_norm] 707 | + ladder.costs.denois.values(), 708 | prefix="train", after_epoch=True), 709 | # ladder.costs.class_clean - save model whenever we have best validation result another option `('train',ladder.costs.total)` 710 | SaveParams(('valid_approx', ladder.error.clean), all_params, p.save_dir, after_epoch=True), 711 | SaveExpParams(p, p.save_dir, before_training=True), 712 | SaveLog(p.save_dir, after_training=True), 713 | ShortPrinting(short_prints), 714 | LRDecay(ladder.lr, p.num_epochs * p.lrate_decay, p.num_epochs, lrmin=p.lrmin, 715 | after_epoch=True), 716 | ]) 717 | else: 718 | main_loop = MainLoop( 719 | training_algorithm, 720 | # Datastream used for training 721 | make_datastream(data.train, data.train_ind, 722 | p.batch_size, 723 | n_labeled=p.labeled_samples, 724 | n_unlabeled=p.unlabeled_samples, 725 | whiten=whiten, 726 | cnorm=cnorm, 727 | balanced_classes=p.balanced_classes, 728 | dseed=p.dseed), 729 | model=Model(ladder.costs.total), 730 | extensions=[ 731 | FinishAfter(after_n_epochs=p.num_epochs), 732 | TrainingDataMonitoring( 733 | [ladder.error.clean, ladder.oos.clean, ladder.costs.total, ladder.costs.class_corr, 734 | training_algorithm.total_gradient_norm] 735 | + ladder.costs.denois.values(), 736 | prefix="train", after_epoch=True), 737 | # ladder.costs.class_clean - save model whenever we have best validation result another option `('train',ladder.costs.total)` 738 | SaveParams(('train', ladder.error.clean), all_params, p.save_dir, after_epoch=True), 739 | SaveExpParams(p, p.save_dir, before_training=True), 740 | SaveLog(p.save_dir, after_training=True), 741 | ShortPrinting(short_prints), 742 | LRDecay(ladder.lr, p.num_epochs * p.lrate_decay, p.num_epochs, lrmin=p.lrmin, 743 | after_epoch=True), 744 | ]) 745 | main_loop.run() 746 | 747 | # Get results 748 | if len(data.valid_ind) == 0 : 749 | return None 750 | 751 | df = DataFrame.from_dict(main_loop.log, orient='index') 752 | col = 'valid_final_error_rate_clean' 753 | logger.info('%s %g' % (col, df[col].iloc[-1])) 754 | 755 | if main_loop.log.status['epoch_interrupt_received']: 756 | return None 757 | return df 758 | 759 | if __name__ == "__main__": 760 | logging.basicConfig(level=logging.INFO) 761 | 762 | rep = lambda s: s.replace('-', ',') 763 | chop = lambda s: s.split(',') 764 | to_int = lambda ss: [int(s) for s in ss if s.isdigit()] 765 | to_float = lambda ss: [float(s) for s in ss] 766 | 767 | def to_bool(s): 768 | if s.lower() in ['true', 't']: 769 | return True 770 | elif s.lower() in ['false', 'f']: 771 | return False 772 | else: 773 | raise Exception("Unknown bool value %s" % s) 774 | 775 | def 
compose(*funs): 776 | return functools.reduce(lambda f, g: lambda x: f(g(x)), funs) 777 | 778 | # Functional parsing logic to allow flexible function compositions 779 | # as actions for ArgumentParser 780 | def funcs(additional_arg): 781 | class customAction(Action): 782 | def __call__(self, parser, args, values, option_string=None): 783 | 784 | def process(arg, func_list): 785 | if arg is None: 786 | return None 787 | elif type(arg) is list: 788 | return map(compose(*func_list), arg) 789 | else: 790 | return compose(*func_list)(arg) 791 | 792 | setattr(args, self.dest, process(values, additional_arg)) 793 | return customAction 794 | 795 | def add_train_params(parser, use_defaults): 796 | a = parser.add_argument 797 | default = lambda x: x if use_defaults else None 798 | 799 | # General hyper parameters and settings 800 | a("save_to", help="Destination to save the state and results", 801 | default=default("noname"), nargs="?") 802 | a("--num-epochs", help="Number of training epochs", 803 | type=int, default=default(150)) 804 | a("--seed", help="Seed", 805 | type=int, default=default([1]), nargs='+') 806 | a("--dseed", help="Data permutation seed, defaults to 'seed'", 807 | type=int, default=default([None]), nargs='+') 808 | a("--labeled-samples", help="How many supervised samples are used. " 809 | "By default all indices are used as labeled data. " 810 | "If a number is given then only the first samples are used as labeled. " 811 | "If a list is given then the list specificy the number of samples to " 812 | "take from each category and if a category is too small than samples " 813 | "are repeated", 814 | type=str, default=default(None), nargs='+', action=funcs([tuple, to_int, chop])) 815 | a("--unlabeled-samples", help="How many unsupervised samples are used", 816 | type=int, default=default(None), nargs='+') 817 | a("--dataset", type=str, default=default(['mnist']), nargs='+', 818 | help="Which dataset to use. 
mnist, cifar10 or your own hdf5") 819 | a("--lr", help="Initial learning rate", 820 | type=float, default=default([0.002]), nargs='+') 821 | a("--lrmin", help="minimal learning rate", 822 | type=float, default=default([0.]), nargs='+') 823 | a("--lrate-decay", help="When to linearly start decaying lrate (0-1)", 824 | type=float, default=default([0.67]), nargs='+') 825 | a("--alpha", 826 | type=float, default=default([0.]), nargs='+', 827 | help='Weight of self-entropy cost applied to corrupted predictions') 828 | a("--alpha-clean", 829 | type=float, default=default([0.]), nargs='+', 830 | help='Weight of self-entropy cost applied to clean predictions') 831 | a("--beta", help="Weight of cross entropy cost between aprior and average", 832 | type=float, default=default([0.15]), nargs='+') 833 | a("--dbeta", help="Dirichlet correction", 834 | type=float, default=default([0.]), nargs='+') 835 | a("--gamma", help="Weight of binary classifier cost", 836 | type=float, default=default([0.01]), nargs='+') 837 | a("--gamma1", help="", 838 | type=float, default=default([-1.]), nargs='+') 839 | a("--batch-size", help="Minibatch size", 840 | type=int, default=default([100]), nargs='+') 841 | a("--valid-batch-size", help="Minibatch size for validation data", 842 | type=int, default=default([100]), nargs='+') 843 | a("--valid-set-size", help="Upper limit on number of examples in " 844 | "validation set, taken from the examples " 845 | "not used in unlabeled samples", 846 | type=int, default=default([10000]), nargs='+') 847 | 848 | # Hyperparameters controlling supervised path 849 | a("--super-noise-std", help="Noise added to supervised learning path", 850 | type=float, default=default([0.3]), nargs='+') 851 | a("--f-local-noise-std", help="Noise added encoder path", 852 | type=str, default=default([0.3]), nargs='+', 853 | action=funcs([tuple, to_float, chop])) 854 | a("--act", nargs='+', type=str, action=funcs([tuple, chop, rep]), 855 | default=default(["relu"]), help="List of activation functions") 856 | a("--encoder-layers", help="List of layers for f", 857 | type=str, action=funcs([tuple, chop, rep])) #default=default(()), 858 | 859 | # Hyperparameters controlling unsupervised training 860 | a("--denoising-cost-x", help="Weight of the denoising cost.", 861 | type=str, default=default([(0.,)]), nargs='+', 862 | action=funcs([tuple, to_float, chop])) 863 | a("--decoder-spec", help="List of decoding function types", nargs='+', 864 | type=str, default=default(['sig']), action=funcs([tuple, chop, rep])) 865 | a("--zestbn", type=str, default=default(['bugfix']), nargs='+', 866 | choices=['bugfix', 'no'], help="How to do zest bn") 867 | 868 | # Hyperparameters used for Cifar training 869 | a("--contrast-norm", help="Scale of contrast normalization (0=off)", 870 | type=int, default=default([0]), nargs='+') 871 | a("--top-c", help="Have c at softmax?", action=funcs([to_bool]), 872 | default=default([True]), nargs='+') 873 | a("--whiten-zca", help="Whether to whiten the data with ZCA", 874 | type=int, default=default([0]), nargs='+') 875 | a('--load_from', type=str, 876 | help="Destination to load the state from") 877 | a("--oos-thr", help="Minimal probability for maximal label, below which label is assumed to be OOS", 878 | type=float, default=default([0.]), nargs='+') 879 | a("-C", "--balanced_classes", 880 | help="DONT balance classes, relevant if labeled-samples < unlabeled-samples", 881 | action='store_false', 882 | default=True) 883 | 884 | ap = ArgumentParser("Semisupervised experiment") 885 | subparsers 
886 | 
887 | # TRAIN
888 | train_cmd = subparsers.add_parser('train', help='Train a new model')
889 | add_train_params(train_cmd, use_defaults=True)
890 | 
891 | # EVALUATE
892 | load_cmd = subparsers.add_parser('evaluate', help='Evaluate test error')
893 | load_cmd.add_argument('load_from', type=str,
894 |                       help="Destination to load the state from")
895 | load_cmd.add_argument('--data-type', type=str, default='test',
896 |                       help="Data set to evaluate on")
897 | load_cmd.add_argument("-C", "--balanced_classes",
898 |                       help="Don't balance classes; relevant if labeled-samples < unlabeled-samples",
899 |                       action='store_false',
900 |                       default=True)
901 | 
902 | # DUMP
903 | dump_cmd = subparsers.add_parser('dump', help='Store the output of an encoder layer for all inputs')
904 | dump_cmd.add_argument('load_from', type=str,
905 |                       help="Destination to load the state from, and where to save the dump")
906 | # dump_cmd.add_argument("--dataset", type=str, default=default(['mnist']), nargs='+',
907 | #                       help="Which dataset to use. mnist, cifar10 or your own hdf5")
908 | dump_cmd.add_argument('--data-type', type=str, default='test',
909 |                       help="Data set to evaluate on")
910 | dump_cmd.add_argument("--layer", type=int, default=-1,
911 |                       help="Which layer to dump (default: top)")
912 | dump_cmd.add_argument("--super-noise-std", help="Noise added to supervised learning path",
913 |                       type=float, default=0.3)
914 | dump_cmd.add_argument("--f-local-noise-std", help="Noise added to encoder path",
915 |                       type=str, default=0.3, nargs='+',
916 |                       action=funcs([tuple, to_float, chop]))
917 | dump_cmd.add_argument("-C", "--balanced_classes",
918 |                       help="Don't balance classes; relevant if labeled-samples < unlabeled-samples",
919 |                       action='store_false',
920 |                       default=True)
921 | args = ap.parse_args()
922 | 
923 | if args.load_from:
924 |     ap.set_defaults(**dict((k, None) for k in vars(args).iterkeys()))
925 |     nodefaultargs = [k for k, v in vars(ap.parse_args()).iteritems() if v is not None]
926 |     # dump the entire data-set. Override values loaded from the saved state
927 |     # if args.cmd == 'dump':
928 |     #     args.labeled_samples = -1
929 |     #     args.unlabeled_samples = -1
930 |     #     args.super_noise_std = 0.
931 |     #     args.f_local_noise_std = 0.
932 | 
933 | subp = subprocess.Popen(['git', 'rev-parse', 'HEAD'],
934 |                         stdin=subprocess.PIPE, stdout=subprocess.PIPE,
935 |                         stderr=subprocess.PIPE)
936 | out, err = subp.communicate()
937 | args.commit = out.strip()
938 | if err.strip():
939 |     logger.error('Subprocess returned %s' % err.strip())
940 | 
941 | t_start = time.time()
942 | if args.cmd == 'evaluate':
943 |     for k, v in vars(args).iteritems():
944 |         if type(v) is list:
945 |             assert len(v) == 1, "should not be a list when loading: %s" % k
946 |             logger.info("%s" % str(v[0]))
947 |             vars(args)[k] = v[0]
948 | 
949 |     err = get_error(vars(args))
950 |     logger.info('Test error: %f' % err)
951 | elif args.cmd == 'dump':
952 |     layer = dump_unlabeled_encoder(vars(args))
953 |     fname = os.path.join(args.load_from, 'layer%d' % args.layer)
954 |     logger.info("Saving dump to %s" % fname)
955 |     numpy.save(fname, layer)
956 | elif args.cmd == "train":
957 |     listdicts = {k: v for k, v in vars(args).iteritems() if type(v) is list}
958 |     therest = {k: v for k, v in vars(args).iteritems() if type(v) is not list}
959 | 
960 |     gen1, gen2 = tee(product(*listdicts.itervalues()))
961 | 
962 |     l = len(list(gen1))
963 |     for i, d in enumerate(dict(izip(listdicts, x)) for x in gen2):
964 |         if l > 1:
965 |             logger.info('Training configuration %d / %d' % (i+1, l))
966 |         d.update(therest)
967 |         if train(d) is None:
968 |             break
969 | logger.info('Took %.1f minutes' % ((time.time() - t_start) / 60.))
970 | 
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | import os
2 | 
3 | import logging
4 | import numpy as np
5 | import theano
6 | from pandas import DataFrame, read_hdf
7 | 
8 | from blocks.extensions import Printing, SimpleExtension
9 | from blocks.main_loop import MainLoop
10 | from blocks.roles import add_role
11 | 
12 | import sys
13 | debug = sys.gettrace() is not None
14 | 
15 | logger = logging.getLogger('main.utils')
16 | 
17 | 
18 | def shared_param(init, name, cast_float32, role, **kwargs):
19 |     # cast to float32 (Theano GPU dtype) when requested; otherwise keep the original dtype
20 |     v = np.float32(init) if cast_float32 else init
21 |     p = theano.shared(v, name=name, **kwargs)
22 |     if debug:
23 |         p.tag.test_value = v
24 |     add_role(p, role)
25 |     return p
26 | 
27 | 
28 | class AttributeDict(dict):
29 |     __getattr__ = dict.__getitem__
30 | 
31 |     def __setattr__(self, a, b):
32 |         self.__setitem__(a, b)
33 | 
34 | 
35 | class DummyLoop(MainLoop):
36 |     def __init__(self, extensions):
37 |         return super(DummyLoop, self).__init__(algorithm=None,
38 |                                                data_stream=None,
39 |                                                extensions=extensions)
40 | 
41 |     def run(self):
42 |         for extension in self.extensions:
43 |             extension.main_loop = self
44 |         self._run_extensions('before_training')
45 |         self._run_extensions('after_training')
46 | 
47 | 
48 | class ShortPrinting(Printing):
49 |     def __init__(self, to_print, use_log=True, **kwargs):
50 |         self.to_print = to_print
51 |         self.use_log = use_log
52 |         super(ShortPrinting, self).__init__(**kwargs)
53 | 
54 |     def do(self, which_callback, *args):
55 |         log = self.main_loop.log
56 | 
57 |         # Iteration
58 |         msg = "e {}, i {}:".format(
59 |             log.status['epochs_done'],
60 |             log.status['iterations_done'])
61 | 
62 |         # Requested channels
63 |         items = []
64 |         for k, vars in self.to_print.iteritems():
65 |             for shortname, vars in vars.iteritems():
66 |                 if vars is None:
67 |                     continue
68 |                 if type(vars) is not list:
69 |                     vars = [vars]
70 | 
71 |                 s = ""
72 |                 for var in vars:
73 |                     try:
74 |                         name = k + '_' + var.name
75 |                         val = log.current_row[name]
76 |                     except:
77 |                         continue
78 |                     try:
79 |                         s += ' ' + ' '.join(["%.3g" % v for v in val])
80 |                     except:
81 |                         s += " %.3g" % val
82 |                 if s != "":
83 |                     items += [shortname + s]
84 |         msg = msg + ", ".join(items)
85 |         if self.use_log:
86 |             logger.info(msg)
87 |         else:
88 |             print msg
89 | 
90 | 
91 | class SaveParams(SimpleExtension):
92 |     """Saves model parameters, keeping the best set according to `trigger_var`."""
93 |     def __init__(self, trigger_var, params, save_path, save_every=10, **kwargs):
94 |         super(SaveParams, self).__init__(**kwargs)
95 |         if trigger_var is None:
96 |             self.var_name = None
97 |         else:
98 |             self.var_name = trigger_var[0] + '_' + trigger_var[1].name
99 |         self.save_path = save_path
100 |         self.params = params
101 |         self.to_save = {}
102 |         self.best_value = None
103 |         self.add_condition(['after_training'], self.save)
104 |         self.add_condition(['on_interrupt'], self.save)
105 |         self.save_every = save_every
106 |         self.save_every_count = 0
107 | 
108 |     def save(self, which_callback, *args):
109 |         if self.var_name is None:
110 |             self.to_save = {v.name: v.get_value() for v in self.params}
111 |         path = self.save_path + '/trained_params'
112 |         logger.info('Saving to %s' % path)
113 |         np.savez_compressed(path, **self.to_save)
114 | 
115 |     def do(self, which_callback, *args):
116 |         self.save_every_count += 1
117 |         if self.save_every and self.save_every_count % self.save_every == 0:
118 |             self.save(which_callback, *args)
119 |         if self.var_name is None:
120 |             return
121 |         val = self.main_loop.log.current_row[self.var_name]
122 |         if self.best_value is None or val <= self.best_value:
123 |             self.best_value = val
124 |             logger.info('Best value %f' % val)
125 |             self.to_save = {v.name: v.get_value().copy() for v in self.params}
126 | 
127 | class SaveExpParams(SimpleExtension):
128 |     def __init__(self, experiment_params, dir, **kwargs):
129 |         super(SaveExpParams, self).__init__(**kwargs)
130 |         self.dir = dir
131 |         self.experiment_params = experiment_params
132 | 
133 |     def do(self, which_callback, *args):
134 |         df = DataFrame.from_dict(self.experiment_params, orient='index')
135 |         df.to_hdf(os.path.join(self.dir, 'params'), 'params', mode='w',
136 |                   complevel=5, complib='blosc')
137 | 
138 | 
139 | class SaveLog(SimpleExtension):
140 |     def __init__(self, dir, show=None, **kwargs):
141 |         super(SaveLog, self).__init__(**kwargs)
142 |         self.dir = dir
143 |         self.show = show if show is not None else []
144 | 
145 |     def do(self, which_callback, *args):
146 |         df = DataFrame.from_dict(self.main_loop.log, orient='index')
147 |         df.to_hdf(os.path.join(self.dir, 'log'), 'log', mode='w',
148 |                   complevel=5, complib='blosc')
149 | 
150 | 
151 | def prepare_dir(save_to, results_dir='results'):
152 |     base = os.path.join(results_dir, save_to)
153 |     i = 0
154 | 
155 |     while True:
156 |         name = base + str(i)
157 |         try:
158 |             os.makedirs(name)
159 |             break
160 |         except:
161 |             i += 1
162 | 
163 |     return name
164 | 
165 | 
166 | def load_df(dirpath, filename, varname=None):
167 |     varname = filename if varname is None else varname
168 |     fn = os.path.join(dirpath, filename)
169 |     return read_hdf(fn, varname)
170 | 
171 | 
172 | def filter_funcs_prefix(d, pfx='cmd_'):
173 |     # keep only the keys containing the prefix, stripping everything up to and including it
174 |     fp = lambda x: x.find(pfx)
175 |     return {n[fp(n) + len(pfx):]: v for n, v in d.iteritems() if fp(n) >= 0}
176 | 
--------------------------------------------------------------------------------
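
The CLI defined in run.py and the save/load helpers in utils.py form a small experiment workflow: `train` sweeps the cross product of any list-valued hyperparameters, and each run writes its hyperparameters, log, and weights into a directory created by `prepare_dir`. The sketch below is illustrative only: the run-directory name (`results/ladder0`) and the flag values in the comments are assumptions, not taken from the repository; the loading calls follow the SaveExpParams, SaveLog, and SaveParams extensions shown above.

# Illustrative sketch only -- 'results/ladder0' and the flag values are assumptions.
#
# Typical invocations of the run.py CLI defined above:
#   python run.py train --lr 0.002 0.001 --denoising-cost-x 1,0.1,0.1
#   python run.py evaluate results/ladder0 --data-type test
#   python run.py dump results/ladder0 --layer -1
#
# Reading back the artifacts written by the utils.py extensions
# (SaveExpParams -> 'params', SaveLog -> 'log', SaveParams -> 'trained_params.npz'):

import os

import numpy as np

from utils import load_df

run_dir = 'results/ladder0'           # hypothetical run directory created by prepare_dir()

params = load_df(run_dir, 'params')   # experiment hyperparameters as a pandas DataFrame
log = load_df(run_dir, 'log')         # training log, one row per logged iteration
weights = np.load(os.path.join(run_dir, 'trained_params.npz'))  # parameters saved by SaveParams

print(params.T)
print(log.tail())                     # last few rows of the monitored channels
print(sorted(weights.keys()))         # names of the saved parameter arrays

Note that `load_df` defaults the HDF5 key to the file name, which is why no key argument is needed when reading `params` and `log` back.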